
A generative text summarization method fusing a sequence grammar annotation framework

A technology fusing a sequence grammar annotation framework with generative summarization, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problem that generated summaries do not conform to grammar rules, achieving strong readability.

Inactive Publication Date: 2019-06-28
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0012] The purpose of the present invention is to solve the problem that summaries produced by existing text summarization methods do not meet grammatical rules, by proposing a generative text summarization method that fuses a sequence grammar annotation framework.

Method used




Embodiment Construction

[0024] To better illustrate the purpose and advantages of the present invention, the implementation of the method is described in further detail below in conjunction with examples.

[0025] Experiments use the CNN / Daily Mail dataset, which contains online news articles (781 tokens on average) and multi-sentence summaries (3.75 sentences or 56 tokens on average). The dataset comprises 287,226 training pairs, 13,368 validation pairs, and 11,490 test pairs.

[0026] In the experiments, the maximum source-side sentence length is 400 and the maximum target-side sentence length is 100. The source and target vocabularies are limited to the 50,000 most frequent words, and all out-of-vocabulary words are uniformly marked as UNK. In addition, the word embedding dimension is set to 128, the encoder and decoder hidden layer dimensions to 256, the encoder and decoder vocabulary size to 50,000, and the batch size to 16, a...
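The vocabulary handling described above can be sketched as follows: keep only the most frequent words up to the size limit and uniformly mark everything else as UNK. The function and variable names are illustrative assumptions, not the patent's code; a tiny vocabulary limit is used for the demo.

```python
# Sketch of vocabulary truncation with UNK marking (assumed implementation).
from collections import Counter

def build_vocab(tokens, max_size=50000):
    """Return the max_size most frequent tokens as the vocabulary."""
    return {w for w, _ in Counter(tokens).most_common(max_size)}

def mark_unk(tokens, vocab):
    """Uniformly mark out-of-vocabulary words as UNK."""
    return [w if w in vocab else "UNK" for w in tokens]

corpus = "the cat sat on the mat".split()
vocab = build_vocab(corpus, max_size=3)       # tiny limit for the demo
print(mark_unk("the dog sat".split(), vocab))  # ['the', 'UNK', 'sat']
```

In the experiments above the limit would be 50,000 per side rather than 3.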



Abstract

The invention relates to a generative text summarization method fusing a sequence grammar annotation framework, and belongs to the field of natural language processing. The method mainly aims to solve the problem that existing models do not consider grammatical structure when generating a summary, so that the generated summary does not meet grammar rules. The method comprises the following steps: first, performing constituency parsing on sentences with the open-source Berkeley Parser to generate a phrase parse tree; second, linearizing the phrase parse tree into a sequence of structure labels through a depth-first traversal algorithm; then, vectorizing the grammar annotation sequence with the word2vec tool; and finally, inputting the source syntactic structure information into an encoder, encoding and decoding through a summary generation module, and generating the summary. Experiments on the CNN / Daily Mail dataset show that the problems of out-of-vocabulary words, repeated phrases, and non-salient topics are alleviated, the generated summaries basically conform to grammar rules, readability is stronger, the summaries are more consistent with the grammar of the source text, and the ROUGE score is improved to a certain extent compared with state-of-the-art algorithms.
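The linearization step above can be sketched in a few lines: a depth-first traversal of the constituency parse tree that emits an opening structure label for each constituent, the terminal words, and a close marker. The nested-list tree representation and bracket format here are illustrative assumptions; the patent does not specify the exact label encoding.

```python
# Sketch: linearize a constituency parse tree into a structure-label sequence
# via depth-first traversal. Trees are nested lists: [label, child, ...];
# leaves are word strings.

def linearize(tree):
    """Depth-first traversal emitting opening labels, words, and close markers."""
    if isinstance(tree, str):     # leaf: a terminal word
        return [tree]
    label, *children = tree
    seq = ["(" + label]           # opening structural label, e.g. "(NP"
    for child in children:
        seq.extend(linearize(child))
    seq.append(")")               # close the constituent
    return seq

# A toy parse of "the cat sleeps" (hypothetical parser output for illustration).
tree = ["S", ["NP", ["DT", "the"], ["NN", "cat"]], ["VP", ["VBZ", "sleeps"]]]
print(" ".join(linearize(tree)))
# (S (NP (DT the ) (NN cat ) ) (VP (VBZ sleeps ) ) )
```

The resulting label sequence would then be embedded with word2vec and fed to the encoder alongside the source tokens.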

Description

Technical field

[0001] The invention relates to automatic text summarization technology, in particular to a generative text summarization method fused with a sequence grammar tagging framework, and belongs to the fields of natural language processing and machine learning.

Background technique

[0002] There are two types of text summarization techniques: extractive and generative. The extractive method forms a summary by selecting important sentences from the source text, while the generative method understands the semantics of the text and uses natural language processing techniques to generate new sentences as the summary. In recent years, extractive text summarization has entered a bottleneck period due to its inherent limitations, and generative summarization has gradually become the mainstream research direction.

[0003] The development and application of deep learning, especially the introduction of the Encoder-Decoder...
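For contrast with the generative approach the patent proposes, the extractive paradigm described above can be illustrated by a minimal baseline: score each sentence by the corpus-wide frequency of its words and keep the top-k sentences. This is a generic sketch of extractive summarization, not the patent's method; all names are hypothetical.

```python
# Minimal word-frequency extractive summarizer (illustrative baseline only).
from collections import Counter

def extract_summary(sentences, k=1):
    """Select the k sentences whose words are most frequent overall."""
    freq = Counter(w for s in sentences for w in s.lower().split())
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in s.lower().split()),
                    reverse=True)
    return scored[:k]

docs = [
    "Deep learning improves summarization",
    "Cats are cute",
    "Summarization with deep learning is popular",
]
print(extract_summary(docs))  # ['Summarization with deep learning is popular']
```

Because such a baseline can only reuse existing sentences, it cannot paraphrase or compress, which is one of the limitations that pushed research toward generative methods.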

Claims


Application Information

IPC(8): G06F17/27, G06F17/21
Inventor: 罗森林, 杨俊楠, 潘丽敏, 王睿怡, 吴舟婷
Owner: BEIJING INSTITUTE OF TECHNOLOGY