The invention relates to an abstractive text summarization method that incorporates a sequential syntactic annotation framework, and belongs to the field of natural language processing. The method mainly addresses the problem that existing models do not consider syntactic structure when generating a summary, so that the generated summary does not conform to grammar rules. The method comprises the following steps: first, performing constituency parsing on sentences using an
open-source syntactic parser, the Berkeley Parser, to generate a phrase-structure parse tree; second, linearizing the parse tree into a structural label sequence through a depth-first traversal algorithm; third, vectorizing the syntactic annotation sequence using the word2vec tool; and finally, inputting the syntactic structure information of the source text into an encoder, encoding and decoding through a summary generation module, and generating a summary. Experiments are carried out on the CNN/Daily Mail
data set, and the results show that the problems of out-of-vocabulary words, repeated phrases, and insufficiently salient topics are addressed; the generated summary basically conforms to grammar rules, is more readable, and is more consistent with the grammar of the source text; and the ROUGE score is improved to a certain extent compared with advanced algorithms.
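
The linearization step described above (turning a phrase-structure parse tree into a structural label sequence by depth-first traversal) can be illustrated with a minimal sketch. The function names `parse_bracketed` and `linearize` are hypothetical, the input is assumed to be a Penn Treebank style bracketed string such as a constituency parser typically emits, and the output label format (an opening and a closing marker per phrase, with words dropped) is an assumption; the exact label scheme used by the invention is not specified here.

```python
from typing import List, Tuple, Union

# A node is (label, children); a child is either another node or a word string.
Node = Tuple[str, List[Union["Node", str]]]


def parse_bracketed(text: str) -> Node:
    """Parse a bracketed tree string, e.g. "(S (NP (DT The)))", into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children: List[Union[Node, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested phrase or POS node
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def linearize(node: Node) -> List[str]:
    """Depth-first traversal that emits structural labels only (assumed format)."""
    label, children = node
    subtrees = [c for c in children if isinstance(c, tuple)]
    if not subtrees:
        # Pre-terminal node: keep the part-of-speech tag, drop the word.
        return [label]
    sequence = ["(" + label]
    for child in subtrees:
        sequence.extend(linearize(child))
    sequence.append(")" + label)
    return sequence


tree = parse_bracketed("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
labels = linearize(tree)
```

A label sequence of this kind is a list of discrete tokens, so it could then be embedded with a tool such as word2vec in the same way as a sequence of words.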