Text key information extraction method and device and medium

A text key information extraction method, applied in the field of text key information extraction, addresses the problems of highly similar or repeated key information and insufficiently concise text summaries. Its effects are to avoid repetition and omission, make the extracted key information more accurate, and ensure its diversity.

Active Publication Date: 2019-06-25
安徽省泰岳祥升软件有限公司
Cites: 3; Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] When the TextRank algorithm is used to extract key information from a text, limitations of the algorithm itself make the extracted key information prone to highly similar or repeated content.
For example, suppose the TextRank algorithm extracts 5 sentences from a text A of 100 sentences to form a text summary of text A. The summary may contain 3 highly similar sentences that express very similar semantic information, so the text summary is not concise enough.
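
The redundancy described above can be checked mechanically. The following is a minimal sketch and not part of the patent text: it assumes Jaccard overlap of word sets as the similarity measure and a hypothetical threshold of 0.6, and the names `jaccard` and `redundant_pairs` are illustrative only.

```python
import re


def jaccard(a, b):
    """Jaccard overlap between the word sets of two sentences (assumed measure)."""
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def redundant_pairs(summary, threshold=0.6):
    """Return index pairs (i, j), i < j, of summary sentences whose overlap
    reaches the threshold. The threshold value is a hypothetical choice."""
    pairs = []
    for i in range(len(summary)):
        for j in range(i + 1, len(summary)):
            if jaccard(summary[i], summary[j]) >= threshold:
                pairs.append((i, j))
    return pairs


summary = [
    "The company reported record profit this quarter",
    "The company reported record profit in this quarter",
    "A new factory will open next year",
]
print(redundant_pairs(summary))
```

A non-empty result indicates that the summary spends several of its slots on near-duplicate sentences, which is the conciseness problem paragraph [0004] describes.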

Examples

Embodiment Construction

[0051] Besides the aforementioned lack of conciseness, extracting key information from text with the TextRank algorithm also tends to miss key information. On the one hand, the key information (such as a text summary) is formed by combining the top-ranked composition units (such as sentences). If some of the top-ranked composition units are highly similar to one another, they not only duplicate content in the key information but also cause composition units with low content similarity and a relatively low ranking to be mistaken for unimportant information, so key information is omitted. On the other hand, even if the top-ranked composition units show little repetition, some relatively low-ranked composition units that actually express important information in the text are still mistaken for unimportant information, resulting ...

Abstract

The embodiment of the invention discloses a text key information extraction method, device and medium. The extraction method comprises the following steps: obtaining a text to be extracted, which comprises a title and a main body; generating a first list comprising at least one candidate key unit, the candidate key units being composition units extracted from the main body according to the similarity weights of the composition units in the main body; selecting a title similarity unit from the main body, the title similarity unit being the composition unit with the highest similarity to the title; if the title similarity unit differs from every candidate key unit, adding the title similarity unit to the first list; and generating the key information from the first list. The extraction method avoids omissions in the extracted key information and improves its accuracy, while keeping the extracted key information more concise and more comprehensive.
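
The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: sentences stand in for composition units, bag-of-words cosine similarity is an assumed similarity measure, the similarity weight is assumed to be each unit's summed similarity to the other units, and the names `cosine`, `extract_key_units` and `top_k` are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Bag-of-words cosine similarity between two strings (assumed measure)."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def extract_key_units(title, units, top_k=2):
    """Return key units of the main body, in their original order."""
    if not units:
        return []
    # Similarity weight of each unit: summed similarity to all other units
    # (an assumption; the abstract does not define the weight).
    weights = [
        sum(cosine(u, v) for j, v in enumerate(units) if j != i)
        for i, u in enumerate(units)
    ]
    ranked = sorted(range(len(units)), key=lambda i: weights[i], reverse=True)
    first_list = ranked[:top_k]  # candidate key units
    # Title similarity unit: the unit most similar to the title.
    title_idx = max(range(len(units)), key=lambda i: cosine(title, units[i]))
    # Add it only if it differs from every candidate key unit.
    if title_idx not in first_list:
        first_list.append(title_idx)
    # Generate the key information from the first list.
    return [units[i] for i in sorted(first_list)]


title = "Stock prices fell"
body = [
    "The cat sat on the mat",
    "The cat sat on the sofa",
    "The cat lay on the mat",
    "Stock prices fell sharply today",
]
print(extract_key_units(title, body))
```

In this toy input the last sentence shares no words with the others, so a purely similarity-weighted ranking leaves it out of the top two; the title check brings it back into the first list.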

Description

Technical field

[0001] The invention relates to the fields of information extraction and text mining, and in particular to a method, device and medium for extracting key information from text.

Background

[0002] With the continuous development of information technology, massive data has become the most valuable asset. Quickly and accurately grasping information and making accurate and reasonable decisions has become the only way for enterprises to survive and develop. This requires mining effective key information, such as text summaries and keywords, from massive amounts of text.

[0003] Many texts, such as news texts on the Internet, are unstructured. The TextRank algorithm can be used to mine effective, structured key information from such unstructured texts. TextRank is a graph-based ranking algorithm for text. Its basic idea is to divide the text into several composition units (such as sentences) and establish a graph model, use the voting mechanis...
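
A graph-based sentence ranking of the kind named in [0003] can be sketched as follows. This is a minimal illustration, not the source's implementation: it uses the standard weighted TextRank update with a damping factor of 0.85, assumes bag-of-words cosine similarity for edge weights (the original TextRank paper uses a different overlap measure), and runs a fixed number of iterations; the names `similarity` and `textrank` are illustrative.

```python
import re
from collections import Counter
from math import sqrt


def similarity(a, b):
    """Bag-of-words cosine similarity (an assumed edge weight)."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating the weighted TextRank update."""
    n = len(sentences)
    # Weighted, undirected sentence graph without self-loops.
    sim = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j "votes" for i in proportion to the edge weight.
            vote = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) + damping * vote)
        scores = new_scores
    return scores


sentences = [
    "The cat sat on the mat",
    "The cat sat on the sofa",
    "The cat lay on the mat",
    "Stock prices fell sharply today",
]
for score, sentence in sorted(zip(textrank(sentences), sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")
```

Sentences that resemble many others receive many votes and rank highest, while a sentence with no similar neighbours keeps only the baseline score of 1 - damping.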

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/31, G06F16/34
Inventors: 吴云鹤, 李德彦, 吴少军
Owner: 安徽省泰岳祥升软件有限公司