Method and device for extracting key information from announcement text

A key information extraction technology, applied to announcement texts, that addresses two shortcomings of prior solutions: they cannot extract key information quickly and accurately, and they do not disclose how to output announcement content in structured form. The method improves the efficiency of reading and analysis, avoids inconsistent extraction standards, and reduces the time needed to extract data.

Active Publication Date: 2019-06-25
厦门商集网络科技有限责任公司

AI Technical Summary

Problems solved by technology

The prior technical solution does not extract key information from the announcement text; what it pushes to the user is still the complete announcement file. It therefore cannot solve the technical problem addressed by the present invention, namely the fast and accurate extraction of key information, and it does not disclose how to output the information in each part of the announcement in a structured description form.


Examples


Embodiment 1

[0034] Referring to Figure 2, a method for extracting key information from an announcement text comprises the following steps: converting the announcement text into an HTML file, wherein the HTML file includes DIV controls and each DIV control corresponds to one line of text; extracting text information and table information according to the description style of the DIV controls, and, during extraction, merging adjacent semantically related lines into paragraphs while letting each line that has no semantic relationship with its adjacent lines form a paragraph on its own, so as to obtain structured text; establishing a key information form containing keywords (as shown in Figure 8); and obtaining the key information through feature engineering and writing it into the key information form, which completes the key information extraction of the announcement text (as shown in Figure 9). The invention discloses a method and device for extracting ...
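The line-merging step can be illustrated with a minimal Python sketch. It is not the patented implementation: the source does not say which style attributes or merge rules are used, so the sketch assumes flat, non-nested DIVs with inline `font-size` styles and treats a line as a continuation when it shares the previous line's font size and the previous line does not end in sentence-final punctuation. The names `DivLineParser`, `parse_style`, `merge_lines`, `structure_text`, and `SAMPLE_HTML` are hypothetical, and table extraction is omitted.

```python
import re
from html.parser import HTMLParser

# Assumed sentence-final marks; a line ending in one of these closes a paragraph.
TERMINALS = ("。", ".", "！", "？", "!", "?")


def parse_style(style):
    """Turn 'left:72px;font-size:12px' into {'left': 72.0, 'font-size': 12.0}."""
    pairs = re.finditer(r"([\w-]+)\s*:\s*(-?\d+(?:\.\d+)?)", style)
    return {m.group(1): float(m.group(2)) for m in pairs}


class DivLineParser(HTMLParser):
    """Collect (style, text) for each DIV; assumes one flat DIV per text line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._style = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._style = parse_style(dict(attrs).get("style") or "")
            self._buf = []

    def handle_data(self, data):
        if self._style is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._style is not None:
            text = "".join(self._buf).strip()
            if text:
                self.lines.append((self._style, text))
            self._style = None


def merge_lines(lines, joiner=" "):
    """Merge a line into the previous paragraph when the font size matches
    and the previous paragraph is still an open sentence (assumed heuristic)."""
    paragraphs = []
    prev_style = None
    for style, text in lines:
        same_font = (prev_style is not None
                     and style.get("font-size") == prev_style.get("font-size"))
        open_sentence = bool(paragraphs) and not paragraphs[-1].endswith(TERMINALS)
        if same_font and open_sentence:
            paragraphs[-1] += joiner + text
        else:
            paragraphs.append(text)
        prev_style = style
    return paragraphs


def structure_text(html, joiner=" "):
    """Parse DIV lines from HTML and return the merged paragraphs."""
    parser = DivLineParser()
    parser.feed(html)
    return merge_lines(parser.lines, joiner)


SAMPLE_HTML = (
    '<div style="left:72px;font-size:16px">Announcement on Share Repurchase</div>'
    '<div style="left:72px;font-size:12px">The company plans to repurchase</div>'
    '<div style="left:72px;font-size:12px">shares worth 50 million yuan.</div>'
    '<div style="left:72px;font-size:12px">The board approved the plan.</div>'
)

for paragraph in structure_text(SAMPLE_HTML):
    print(paragraph)
```

In the sample, the title stays a paragraph of its own because its font size differs from the body, and the two body lines that form one sentence are joined. For Chinese text, passing `joiner=""` avoids inserting spaces between merged lines.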

Embodiment 2

[0047] A device for extracting key information from an announcement text comprises a memory and a processor. The memory stores instructions that are suitable for being loaded by the processor to perform the following steps: converting the announcement text into an HTML file, wherein the HTML file contains DIV controls and each DIV control corresponds to one line of text; extracting text information and table information according to the description style of the DIV controls, and, during extraction, merging adjacent semantically related lines into paragraphs while letting each line that has no semantic relationship with its adjacent lines form a paragraph on its own, so as to obtain structured text; establishing a key information form containing keywords; and obtaining key information through feature engineering and writing the key information into the key information form.
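The last two steps, establishing a keyword form and filling it, might look roughly like the following Python sketch. The source does not describe its feature engineering, so the sketch substitutes a plain rule-based approach as a stand-in: each form field has trigger keywords that select a paragraph and a regular expression that captures the value. The field names, triggers, patterns, and the `FieldRule` and `fill_form` names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    triggers: tuple    # keywords that mark a paragraph as relevant to the field
    value: re.Pattern  # pattern whose first group captures the field value


# Hypothetical key information form: field name -> extraction rule.
RULES = {
    "repurchase_amount": FieldRule(
        triggers=("repurchase",),
        value=re.compile(r"(\d[\d,.]*\s*(?:million|billion)?\s*yuan)"),
    ),
    "approval_date": FieldRule(
        triggers=("approved", "resolution"),
        value=re.compile(r"(\d{4}-\d{2}-\d{2})"),
    ),
}


def fill_form(paragraphs, rules=RULES):
    """Fill each field from the first triggering paragraph that yields a value.
    Fields with no match stay None."""
    form = {name: None for name in rules}
    for name, rule in rules.items():
        for paragraph in paragraphs:
            lowered = paragraph.lower()
            if not any(trigger in lowered for trigger in rule.triggers):
                continue
            match = rule.value.search(paragraph)
            if match:
                form[name] = match.group(1).strip()
                break
    return form


paragraphs = [
    "Announcement on Share Repurchase",
    "The company plans to repurchase shares worth 50 million yuan.",
    "The board approved the plan on 2019-03-15.",
]
print(fill_form(paragraphs))
```

Working on merged paragraphs rather than raw lines matters here: a value such as an amount can sit on a different physical line from its trigger keyword, and the rule only sees both once the lines have been joined.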


Abstract

The invention relates to a method for extracting key information from an announcement text, which comprises the following steps: converting the announcement text into an HTML file, wherein the HTML file comprises DIV controls and each DIV control represents one line of text; extracting text information and table information according to the description style of the DIV controls, merging adjacent semantically associated lines into paragraphs during extraction, and letting each line without semantic association with its adjacent lines form a paragraph on its own, to obtain structured text; establishing a key information form containing keywords; and obtaining key information through feature engineering and writing it into the key information form to complete the key information extraction of the announcement text. The method analyzes the announcement text in depth and converts unstructured data into structured text, so that key information can be extracted quickly and accurately. This greatly shortens the time spent on manual data extraction, improves the efficiency and accuracy of investment research, and adds value to the analysis process.

Description

Technical field

[0001] The invention relates to a method and device for extracting key information from an announcement text, and belongs to the field of natural language processing.

Background

[0002] An announcement text, taking the announcement of a listed company as an example, is the means by which a listed company publishes relevant company information to the public through a designated platform in accordance with the requirements of the China Securities Regulatory Commission. In stock market investment research, the announcements and disclosures of listed companies are an important reference for investors. For professional institutional researchers in particular, mining important information from announcements is a necessary part of daily investment research. However, most announcement texts are expressed in unstructured natural language, and the description patterns and phrases vary greatly, making manual processing difficult, and some...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 邱涛, 吴胜杰, 翁安栋
Owner: 厦门商集网络科技有限责任公司