Long-text structured text abstract extraction method

The extraction method applies to abstract extraction from long structured texts. It addresses, among others, the loss of important information in long texts, training data that is unfriendly to long-text abstract extraction, and encodings that break down when adjacent paragraphs are unrelated.

Pending Publication Date: 2020-02-11
南京星耀智能科技有限公司

AI Technical Summary

Problems solved by technology

[0004] However, current text abstract extraction has the following main problems: (1) Existing abstract extraction models do not handle long text well during encoding. The prior art mainly truncates the text directly and encodes only the truncated data, which is very likely to lose important information in the long text. Another technique adds encoded representations between paragraphs during encoding, but it has limitations, for example when the input text is not segmented or when adjacent paragraphs are unrelated.
(2) The data currently published for Chinese abstract extraction covers a single field, and each individual text is short. Such data is poorly suited to training long-text abstract extraction in specialized fields.

Examples

Embodiment Construction

[0028] The invention is further described below with reference to the accompanying drawings and embodiments. The present invention provides a method for extracting a structured abstract from long text; the flow chart of the method is shown in Figure 1. The specific implementation process is as follows:

[0029] First, the input long text is split into sentences at punctuation marks, and each sentence is converted into a vector matrix using BERT dynamic word embedding.
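
The sentence-splitting half of this step can be sketched as below. This is only an illustration: the source says "punctuation marks" without listing them, so the set in `SENTENCE_END` and the function name `split_sentences` are assumptions, and the BERT embedding of each sentence is not shown.

```python
import re

# Assumed set of sentence-ending marks (Chinese and Western). The source does
# not specify which punctuation marks are used.
SENTENCE_END = "。！？!?；;."


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping each closing mark with its sentence."""
    marks = re.escape(SENTENCE_END)
    # A run of non-terminal characters followed by any trailing terminal marks.
    pattern = f"[^{marks}]+[{marks}]*"
    pieces = (m.group().strip() for m in re.finditer(pattern, text))
    return [p for p in pieces if p]


sentences = split_sentences("今天天气很好。我们去公园吧！Is that ok? Yes.")
```

A plain character-class split like this also breaks at decimal points and abbreviations, so a production splitter would likely need extra rules.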

[0030] Second, the chapter structure of the text is analyzed. The model structure for this part is shown in Figure 2. Every two adjacent clauses are fed into two bidirectional GRU models, the hidden-layer information of the two models is spliced, and the spliced result is passed to a multi-layer perceptron for classification to obtain the predicted category probabilities; take the category label with the ...
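
A minimal PyTorch sketch of the pair classifier described in [0030] follows. The class name, the layer sizes, the number of relation categories, and the choice of the final hidden states as the "hidden layer information" are all assumptions; the source gives none of these, and its rule for choosing the final label is truncated, so the sketch stops at the category probabilities.

```python
import torch
import torch.nn as nn


class AdjacentClauseClassifier(nn.Module):
    """Hypothetical name: scores the relation between two adjacent clauses."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 128,
                 num_relations: int = 4):
        super().__init__()
        # One bidirectional GRU per clause of the pair (sizes are assumed).
        self.gru_left = nn.GRU(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.gru_right = nn.GRU(embed_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        # Multi-layer perceptron over the spliced hidden states.
        self.mlp = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),
        )

    @staticmethod
    def _final_state(h_n: torch.Tensor) -> torch.Tensor:
        # For a one-layer bidirectional GRU, h_n is (2, batch, hidden):
        # join the forward and backward final states of each example.
        return torch.cat([h_n[0], h_n[1]], dim=-1)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left, right: (batch, tokens, embed_dim) word-embedding matrices.
        _, h_left = self.gru_left(left)
        _, h_right = self.gru_right(right)
        spliced = torch.cat(
            [self._final_state(h_left), self._final_state(h_right)], dim=-1
        )
        # Predicted probability of each relation category.
        return torch.softmax(self.mlp(spliced), dim=-1)


model = AdjacentClauseClassifier()
left = torch.randn(3, 12, 768)   # three clauses of 12 tokens each
right = torch.randn(3, 9, 768)   # their right-hand neighbours, 9 tokens each
probs = model(left, right)       # shape (3, 4)
```

The two clauses of a pair may have different lengths because each GRU summarizes its own clause into a fixed-size state before the splice.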

Abstract

The invention provides a method for extracting structured abstracts from long text. A dynamic word embedding method obtains each word vector from its surrounding words, which addresses polysemy in the text. Text structure analysis divides the text into paragraphs according to the relations recognized between sentences, so that the computer can understand the text from a global perspective. Abstract extraction combines a model with rules and is carried out on each section on the basis of the chapter structure analysis. This avoids the direct truncation used in traditional long-text abstract extraction and supports abstract extraction for texts from multiple fields.

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a method for extracting structured abstracts from long text.

Background art

[0002] At present, abstract extraction from long text generally involves three parts: word embedding, text abstract extraction, and discourse structure analysis. Word embedding converts the words in the text data into numerical vectors that a machine can learn from. Traditional word embedding first applies one-hot encoding to the words in the text, then feeds them into a Word2Vec model for learning, and finally completes the mapping from text to numerical vectors. This method is simple and efficient, but it cannot handle polysemy, because under Word2Vec each character or word has only one fixed representation that is independent of its context.

[0003] Text summary extraction is a process in which the machine...
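
The polysemy limitation noted in [0002] can be illustrated with a toy static lookup. The vocabulary and the two-dimensional vectors below are made up for illustration and merely stand in for a trained Word2Vec table; the function names are hypothetical.

```python
VOCAB = ["the", "river", "bank", "money", "in", "by"]

# Made-up 2-d vectors, one row per vocabulary entry.
EMBEDDINGS = [
    [0.1, 0.0],  # the
    [0.7, 0.2],  # river
    [0.4, 0.9],  # bank
    [0.3, 0.8],  # money
    [0.0, 0.1],  # in
    [0.1, 0.1],  # by
]


def one_hot(word: str) -> list[float]:
    """One-hot encode a word against the toy vocabulary."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def static_vector(word: str) -> list[float]:
    """Multiply the one-hot vector by the table, which selects the word's row."""
    encoded = one_hot(word)
    dim = len(EMBEDDINGS[0])
    return [
        sum(encoded[i] * EMBEDDINGS[i][d] for i in range(len(VOCAB)))
        for d in range(dim)
    ]


def embed(sentence: str) -> list[list[float]]:
    return [static_vector(w) for w in sentence.split()]


riverside = embed("by the river bank")
finance = embed("money in the bank")
# "bank" gets the identical vector in both sentences, whatever the context.
same_vector = riverside[-1] == finance[-1]
```

A context-dependent embedding would instead produce different vectors for the two occurrences of "bank".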

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34, G06F40/205, G06F40/253
CPC: G06F16/345
Inventor: 杨理想王云甘周亚黄家君徐慧
Owner: 南京星耀智能科技有限公司