Official document abstract generation model combining extractive and generative summarization

A generative model and extraction technique, applied in unstructured text data retrieval, text database browsing/visualization, instruments, etc., that addresses the need for large amounts of manually labeled data and achieves more semantically meaningful and accurate representation.

Active Publication Date: 2019-08-13
CETC BIGDATA RES INST CO LTD

AI Technical Summary

Problems solved by technology

Training a generative summarization model requires a large amount of manually labeled data. With limited manpower, funding, and time, this requirement has restricted the application of generative algorithms. The combined document abstract generation method effectively solves this problem.



Embodiment Construction

[0036] The technical solution of the present invention is further described below, but the scope of protection is not limited to the description.

[0037] As shown in Figure 3, the invention is a document abstract generation model that combines extractive and generative methods. First, the content of the official documents is screened to remove noise data from the document abstracts, and the screened data is cleaned and preprocessed. Next, an extractive summarization model is used to generate weakly labeled dataset A. The quality of weakly labeled dataset A is then enhanced by means of summary coherence and by increasing the number of high-confidence samples. Finally, weakly labeled dataset A is used to train the generative summarization model, yielding the official document summary generation model.
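The weak-labeling stage described in [0037] can be illustrated with a minimal sketch. It is not the patent's implementation: the visible text does not specify the extractive model, the screening rule, or the confidence measure, so the word-frequency scorer, the `min_sentences` screen, and the score-share `confidence` below are all assumptions made for illustration, and every function name is hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on ASCII and CJK end punctuation (assumption)."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in pieces if p.strip()]


def score_sentences(sentences):
    """Score each sentence by the mean document frequency of its words.

    A simple stand-in for an extractive summarization model. It assumes
    whitespace-delimited text; Chinese documents would need a word segmenter.
    """
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    return [sum(freq[w] for w in toks) / len(toks) if toks else 0.0
            for toks in tokens]


def weak_label(text, k=2):
    """Return (weak_summary, confidence) for one document.

    The summary is the k highest-scoring sentences in original order.
    Confidence is the share of total sentence score captured by the
    selected sentences (a hypothetical heuristic, not from the patent).
    """
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    summary = " ".join(sentences[i] for i in sorted(top))
    total = sum(scores)
    confidence = sum(scores[i] for i in top) / total if total else 0.0
    return summary, confidence


def build_weak_dataset(docs, k=2, min_sentences=3, min_confidence=0.5):
    """Build (document, weak_summary) pairs, i.e. a weakly labeled dataset."""
    dataset = []
    for doc in docs:
        # Screening stand-in: skip documents too short to summarize.
        if len(split_sentences(doc)) < min_sentences:
            continue
        summary, confidence = weak_label(doc, k)
        # Keep only samples whose weak label is reasonably confident.
        if confidence >= min_confidence:
            dataset.append((doc, summary))
    return dataset
```

Calling `build_weak_dataset(list_of_document_strings)` returns pairs that could then serve as training data for a generative (for example, sequence-to-sequence) summarization model, which is outside the scope of this sketch.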

[0038] The method specifically includes the following steps:

[0039] ① Document content screening: the content of each document in the official document corpus is screened to remove summary noise data;

[0040] ②Data c...


Abstract

The invention provides an official document abstract generation model that combines extractive and generative summarization. Extractive and generative summarization are combined, and the official document data is screened and preprocessed. The semantics of the weakly labeled data produced by the extractive summarizer are also enhanced. An automatic abstract generation model for official document text is then learned, enabling official document abstracts to be generated automatically. Compared with traditional end-to-end abstract generation methods with an added attention mechanism, the method solves the problem of insufficient training data, performs data screening and semantic enhancement based on the characteristics of official document data, and represents the semantics of official document text more accurately.

Description

Technical field

[0001] The invention relates to an official document summary generation model that combines extractive and generative methods, and belongs to the technical field of natural language processing.

Background technique

[0002] The large volume of government document text makes targeted search and consultation very difficult, and the sheer amount of information forces people to spend a great deal of time browsing and reading. Quickly extracting key content from large amounts of official document information by automated means, and thereby relieving information overload, has therefore become an urgent need. Automatic document summarization is one feasible and effective solution.

[0003] Text summarization technology can be divided into extractive summarization and generative summarization according to the type of summary produced. The former is to sort the importance of the se...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34, G06F17/27
CPC: G06F16/345, G06F40/289, G06F40/30, Y02D10/00
Inventors: 宋荣伟, 王进, 王鹏
Owner: CETC BIGDATA RES INST CO LTD