Automatic abstracting method and device based on XLNet

An automatic summarization technology applied in the field of information processing. It addresses the problem that sentence position information is not considered, which can adversely affect document summarization tasks, and it alleviates insufficient semantic extraction and improves the model's comprehension ability.

Active Publication Date: 2020-09-15
南京优慧信安科技有限公司

AI Technical Summary

Problems solved by technology

XLNet uses relative position encoding computed between words, which in theory supports modeling document sequences of arbitrary length. However, it does not consider the position of sentences within a document, which may adversely affect document summarization tasks.
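The missing information can be illustrated with a minimal sketch. It assumes each sentence is preceded by a boundary placeholder (the token name `<cls>` and the function `sentence_position_ids` are hypothetical, chosen only for illustration) and derives, for every token, the index of the sentence it belongs to. This is not the patent's global position encoding itself, only the kind of sentence-level index such an encoding could be built from.

```python
def sentence_position_ids(tokens, boundary="<cls>"):
    """Return, for each token, the 0-based index of its sentence.

    Assumption: every sentence starts with the `boundary` placeholder.
    Tokens that appear before the first placeholder are assigned to sentence 0.
    """
    ids = []
    current = -1
    for tok in tokens:
        if tok == boundary:
            current += 1  # a new sentence begins at each placeholder
        ids.append(max(current, 0))
    return ids


tokens = ["<cls>", "Cats", "sleep", ".", "<cls>", "Dogs", "bark", "."]
print(sentence_position_ids(tokens))
```

Relative offsets between words stay the same wherever a sentence sits in the document, whereas these indices change with sentence order, which is the signal the passage says XLNet lacks.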



Examples


Embodiment Construction

[0030] The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are only used to illustrate the present invention and are not intended to limit its scope. After reading the present invention, modifications of its various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0031] This embodiment uses the CNN / DailyMail dataset to illustrate the method. It is one of the largest and most widely used datasets in the field of text summarization, comprising 92,579 news articles from CNN and 219,506 news articles from DailyMail. Each news article is accompanied by a corresponding manually written summary. Following the split recommended for the dataset, it is divided into a training set, a validation set, and a test set (287,227, 13,368, and 11,490 ar...
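As a quick consistency check on the figures quoted in paragraph [0031], the sketch below sums the per-source article counts and the per-split counts and computes each split's share of the whole. The dictionary names are illustrative only; the numbers are the ones stated above.

```python
# Article counts as quoted in paragraph [0031].
sources = {"CNN": 92_579, "DailyMail": 219_506}
splits = {"train": 287_227, "validation": 13_368, "test": 11_490}

total = sum(sources.values())

# The per-source and per-split counts should describe the same corpus.
counts_agree = total == sum(splits.values())

# Share of the corpus assigned to each split.
fractions = {name: count / total for name, count in splits.items()}

print(total, counts_agree)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The two sets of counts agree, and the split assigns roughly 92% of the articles to training.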


Abstract

The invention discloses an automatic summarization method and device based on XLNet. The method comprises the following steps. First, the data is preprocessed: the text is split into sentences, each sentence is split into words, and an explicit placeholder marking the sentence boundary is added in front of each sentence. Next, a summarization model, XLNetSum, is constructed; the model adds global position encodings on top of XLNet, and the dynamic word vectors corresponding to the placeholders, together with the global position encodings, serve as sentence features. After the model is trained using the training data and the validation data, the trained model produces a score for every sentence in the test data, and a post-processing step selects several sentences as the summary. The summarization model extracts text information through the deep neural language model XLNet, mines the semantics of words and the syntactic structure of their context, and processes text sequences of variable length. It can flexibly and accurately judge the importance of each sentence in the text and then extract the summary sentences.
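The preprocessing and post-processing steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the placeholder name `<cls>`, the regex-based sentence and word splitting, and the "top-k by score, restored to document order" selection rule are all hypothetical choices. The sentence scores are supplied by hand here, standing in for the output of the trained XLNetSum model.

```python
import re

PLACEHOLDER = "<cls>"  # hypothetical sentence-boundary token


def preprocess(text):
    """Split text into sentences and words, prefixing each sentence with a placeholder."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = []
    for sentence in sentences:
        tokens.append(PLACEHOLDER)
        # Words and punctuation marks become separate tokens.
        tokens.extend(re.findall(r"\w+|[^\w\s]", sentence))
    return sentences, tokens


def select_summary(sentences, scores, k=3):
    """Pick the k highest-scoring sentences and return them in document order."""
    if len(sentences) != len(scores):
        raise ValueError("one score per sentence is required")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


sentences, tokens = preprocess("Cats sleep. Dogs bark loudly! Birds sing.")
scores = [0.2, 0.9, 0.5]  # stand-in for model output
print(select_summary(sentences, scores, k=2))
```

Returning the selected sentences in their original order keeps the extracted summary readable; other selection rules, such as a score threshold or redundancy filtering, would fit the same interface.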

Description

Technical field

[0001] The invention belongs to the technical field of information processing, and in particular relates to an automatic summarization method and device based on the XLNet model. It uses the neural language model XLNet to extract the semantic information of the text, which overcomes the insufficient extraction of word semantics and syntactic structure information in traditional methods. It does not limit the sequence length of the input text, and it can flexibly and accurately extract summary sentences from the text.

Background

[0002] With the continuous development of Internet technology, especially mobile Internet technology, people's study, work, and daily life are closely tied to the Internet. Information on the Internet has brought convenience to people, but in the face of a huge amount of online information, it is difficult for people to select the information most useful to them. Automatic text summarizatio...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/289, G06F40/30, G06N3/084, G06N3/045, Y02D10/00
Inventors: 杨鹏, 李文翰, 杨浩然
Owner: 南京优慧信安科技有限公司