Automatic extraction method for abstracts based on public company announcements

An automatic abstract-extraction technology for announcements, applied in natural language data processing, special data processing applications, instruments, etc. It addresses problems such as inaccurate node (sentence) weights, ignored announcement-specific characteristics, and the resulting loss of abstract accuracy.

Active Publication Date: 2016-12-14
SUN YAT SEN UNIV
7 Cites, 36 Cited by

AI Technical Summary

Problems solved by technology

However, graph-ranking methods are flawed in how they calculate the similarity between sentences: they ignore the characteristics unique to listed-company announcements. The title of an announcement often contains a lot of key information, so a sentence that is highly similar to the title is more likely to belong in the abstract, and such a sentence has a greater influence on the surrounding sentences. In addition, listed-company announcements often contain many key terms (restructuring


Example Embodiment

[0051] To make the purpose, technical solutions, and advantages of the present invention clearer, further details are given below with reference to Figure 1.

[0052] A method for automatically extracting abstracts from listed-company announcements, which specifically includes the following steps:

[0053] S1: Crawl the announcement documents of listed companies from the stock exchange to form an announcement document database, where each document serves as a target document from which an abstract is to be extracted;

[0054] S2: Use the word2vec model to obtain word vectors from the text corpus;

[0055] The specific steps include:

[0056] (1) Word segmentation;

[0057] Perform word segmentation on the announcement document, filter out low-frequency words, and remove stop words, special symbols, punctuation marks and some marking information;

[0058] (2) Construct a Huffman tree;

[0059] In the constructed Huffman tree, all non-leaf nodes store a parameter vector, and all leaf nodes...
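As an illustration of step (2), the sketch below builds a Huffman tree from word frequencies and derives each word's binary code, which is the path a hierarchical-softmax word2vec model follows from the root to that word. It is a simplified example under stated assumptions: the function name `build_huffman_codes` and the sample frequencies are hypothetical, and the parameter vectors held by non-leaf nodes are omitted.

```python
import heapq
import itertools


def build_huffman_codes(freqs):
    """Return {word: binary code string} for a word-frequency dict.

    Leaves are words (strings); internal nodes are (left, right) tuples.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(f, next(tie), word) for word, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent nodes into a new parent.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: descend both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf node: record the word's code
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Hypothetical word counts for illustration only.
codes = build_huffman_codes(
    {"announcement": 5, "restructuring": 2, "dividend": 1, "board": 1}
)
```

Frequent words end up close to the root and receive short codes, so they need fewer binary decisions during training.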


Abstract

The invention relates to a method for automatically extracting abstracts from public company announcements. The method comprises the following steps: S1, obtaining public company announcement files from securities exchanges to form an announcement file database; S2, using a word2vec model to obtain word vectors from a text corpus; S3, calculating the similarity between sentences to construct a sentence graph model; S4, calculating the weights of the sentences; S5, adjusting the sentence weight matrix according to sentence positions; S6, selecting the sentences that have the highest weights and are free of redundancy to form an abstract. The method can provide financial market investors with accurate and more readable abstracts, help them understand announcements in less time and make better investment judgments, and provide important indicators for quantitative fund companies.
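Steps S3 to S6 can be sketched roughly as follows. The formulas the patent actually uses are not given in this excerpt, so every concrete choice here is an assumption: sentence vectors are plain averages of word vectors, similarity is cosine similarity, weights come from a TextRank-style power iteration, the position adjustment is a linear boost for earlier sentences, and redundancy is a similarity threshold. The names `summarize`, `position_boost`, and `max_sim` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, clamped at zero so edge weights stay non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / (nu * nv)) if nu and nv else 0.0


def sentence_vector(tokens, word_vecs, dim):
    """Average the word vectors of a sentence (an assumed composition rule)."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def rank_sentences(sim, damping=0.85, iters=50):
    """S4: TextRank-style power iteration over the similarity graph."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, word_vecs, dim, k=2, position_boost=0.2, max_sim=0.9):
    """Return the indices of the selected sentences, in document order."""
    vecs = [sentence_vector(s, word_vecs, dim) for s in sentences]
    n = len(vecs)
    # S3: weighted sentence graph without self-loops.
    sim = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = rank_sentences(sim)
    # S5: assumed position adjustment favouring earlier sentences.
    scores = [s * (1 + position_boost * (n - i) / n) for i, s in enumerate(scores)]
    # S6: greedily take high-weight sentences, skipping near-duplicates.
    chosen = []
    for i in sorted(range(n), key=lambda idx: scores[idx], reverse=True):
        if all(sim[i][j] < max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)


# Toy 2-dimensional word vectors, invented for illustration.
word_vecs = {"restructuring": [1.0, 0.0], "plan": [0.8, 0.2], "dividend": [0.0, 1.0]}
sentences = [["restructuring", "plan"], ["restructuring", "plan"], ["dividend"]]
summary = summarize(sentences, word_vecs, dim=2, k=2)
```

In the toy input the second sentence duplicates the first, so the redundancy check in S6 skips it and the selection falls through to the third sentence.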

Description

Technical field

[0001] The invention relates to the field of data extraction, in particular to a method for automatically extracting abstracts from listed-company announcements.

Background

[0002] As of mid-June 2016, there were a total of 2,832 stocks in the Shanghai and Shenzhen stock markets, and hundreds to thousands of announcements were issued every day. With the rapid development of the Internet, the cost of editing keeps falling, information spreads ever faster, and the number of daily announcements is increasing rapidly. At present, the announcements of listed companies are generally lengthy and use specialized terminology. However, most investors in China are retail investors who do not have enough time to read the announcements carefully. Moreover, it is difficult for ordinary investors to quickly identify the important content and make reasonable judgments. Therefore, it is very important and valuable to ...


Application Information

IPC(8): G06F17/27
CPC: G06F40/258
Inventor: 郑子彬, 李阳
Owner: SUN YAT SEN UNIV