Text abstract generation method based on advanced semantics

A text summarization technique based on high-level semantics, applied in the field of natural language processing. It addresses the loss of information about low-frequency vocabulary, reducing information loss and improving accuracy.

Active Publication Date: 2019-07-09
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

This method cannot better deal with the lo




Embodiment Construction

[0052] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be noted that the following embodiments are intended to facilitate the understanding of the present invention, but do not limit it in any way.

[0053] As shown in Figure 1, a text summarization method based on high-level semantics includes the following steps:

[0054] S01: As shown in part S01 of Figure 2, use a text segmentation tool such as CoreNLP or Jieba to segment the text corpus and convert it into sequences of semantic tags (such as part-of-speech sequences and named-entity sequences) that correspond one-to-one with the vocabulary. Because the model relies on the high-level semantic information of the vocabulary, the original text data must first be processed with such a tool. On the one hand, the text (especially Chinese) needs to be segmented first, and the smallest unit of the corpus i...
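The one-to-one alignment described in S01 can be illustrated with a minimal Python sketch. It assumes the text has already been segmented and tagged by a tool such as CoreNLP or Jieba; the sample `(word, tag)` pairs, the helper names `to_parallel_sequences` and `build_index`, and the `<unk>` placeholder are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple


def to_parallel_sequences(tagged: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (word, tag) pairs into a word sequence and an aligned tag sequence."""
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    return words, tags


def build_index(items: List[str], unk: str = "<unk>") -> Dict[str, int]:
    """Assign integer ids in order of first appearance; id 0 is reserved for unknowns."""
    index = {unk: 0}
    for item in items:
        index.setdefault(item, len(index))
    return index


# Hypothetical output of a segmenter plus part-of-speech tagger.
tagged = [
    ("Zhejiang", "NNP"),
    ("University", "NNP"),
    ("published", "VBD"),
    ("a", "DT"),
    ("patent", "NN"),
]

words, tags = to_parallel_sequences(tagged)
word_index = build_index(words)
tag_index = build_index(tags)

# Position i in word_ids and position i in tag_ids describe the same token.
word_ids = [word_index[w] for w in words]
tag_ids = [tag_index[t] for t in tags]
```

Because there are far fewer distinct tags than distinct words, a rare word that the word vocabulary cannot represent still has an informative entry in the tag sequence at the same position.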



Abstract

The invention discloses a text summary generation method based on high-level semantics. The method comprises the following steps: (1) performing word segmentation on the text corpus and converting it into semantic tag sequences in one-to-one correspondence with the vocabulary; (2) in the text summarization model, using a bidirectional recurrent network as an encoder to encode the vocabulary sequence and the semantic tag sequence, obtaining an abstract representation of the vocabulary and an abstract representation of the semantics; (3) merging the abstract representation of the vocabulary with the abstract representation of the semantics; (4) feeding the merged abstract representation into a decoder, calculating the vocabulary attention weights and the semantic attention weights respectively, and predicting the probability distribution over the word list at each step of the sequence; and (5) combining the attention weight distribution with the word-list probability distribution to obtain the final output probability distribution, converting it into readable words, and concatenating the words into sentences for output. The method improves the model's accuracy in predicting low-frequency words and in summarizing unlabeled data.
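Step (5) can be sketched in Python under stated assumptions. The abstract does not give the combination formula, so the sketch below uses a pointer-generator-style mixture as a stand-in: the two attention distributions are averaged with a hypothetical weight `lam`, and the result is mixed with the word-list distribution using a hypothetical generation probability `p_gen`. The function names and all numeric values are illustrative only.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_distributions(
    vocab_probs: Dict[str, float],
    source_words: List[str],
    word_attn: List[float],
    tag_attn: List[float],
    p_gen: float,
    lam: float = 0.5,
) -> Dict[str, float]:
    """Mix a word-list distribution with attention over the source words.

    Assumed form (not stated in the source):
    final(w) = p_gen * vocab(w) + (1 - p_gen) * sum of merged attention on w.
    """
    # Assumption: lexical and semantic attention are merged by a convex combination.
    merged = [lam * a + (1.0 - lam) * b for a, b in zip(word_attn, tag_attn)]
    final = {word: p_gen * prob for word, prob in vocab_probs.items()}
    for word, weight in zip(source_words, merged):
        # Source words missing from the word list still receive probability mass.
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * weight
    return final


# Hypothetical decoder state for one output step.
source_words = ["Zhejiang", "releases", "summarizer"]
vocab_probs = {"the": 0.5, "releases": 0.3, "<unk>": 0.2}
word_attn = softmax([2.0, 1.0, 0.5])
tag_attn = softmax([1.5, 0.2, 1.0])

final = combine_distributions(vocab_probs, source_words, word_attn, tag_attn, p_gen=0.6)
best_word = max(final, key=final.get)
```

In this sketch a low-frequency source word such as "Zhejiang", which is absent from the word list, can still be emitted because the attention term assigns it non-zero probability.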

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a method for generating text summaries based on high-level semantics.

Background

[0002] Text summarization in the field of natural language processing is a method of using computer technology to automatically compress a long text into a short text while retaining the gist of the original text. This technology is currently used on all major media websites. With it, long text content can be compressed into short text containing the key information, which saves screen space and shows more content to users. On media interfaces where space is at a premium, displaying more content brings more traffic to vendors, directly increases the exposure of advertisements and other information, increases user activity, and brings direct benefits to vendors.

[0003] Early text summarization techniques were based on textual rules, which are usua...


Application Information

IPC(8): G06F17/27, G06N3/04
CPC: G06F40/284, G06F40/30, G06N3/044
Inventor: 李昊, 蔡登, 潘博远, 雷陈奕, 王国鑫, 何晓飞
Owner: ZHEJIANG UNIV