
Automatic text abstracting method based on heterogeneous graph network

An automatic summarization technology based on heterogeneous graphs, applied in the field of text processing, which can solve problems such as the inability to effectively capture long-distance text interactions, semantic deviation, and the inability of sequence modeling to represent long-distance information.

Publication Date: 2021-05-18 (status: Inactive)
山西三友和智慧信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0002] Existing extractive text summarization techniques all rely on sequence modeling of the text. This approach ignores many of the deeper relationships within the text, resulting in poor extraction performance.
In fact, words in different sentences are also related to one another, but sequence modeling cannot accurately represent long-distance information, so it fails to associate the same information across two different sentences, leading to semantic deviation.
Current extractive summarization methods are all based on LSTM modeling, which processes the text from left to right. This approach learns a language model well, but it cannot effectively capture long-distance interactions within the text.



Examples


Embodiment

[0051] Given a document D = {s_1, ..., s_n}, meaning the document contains n sentences, extractive text summarization is defined as predicting a label sequence {y_1, ..., y_n}, where y_i = 1 means that the i-th sentence will be extracted as part of the summary. The model has two kinds of nodes, word nodes and sentence nodes, so the heterogeneous graph is defined as a graph network G = {V, E}, where V is the set of all node types and E is the set of edges between the nodes.
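As an aid to reading, the following is a minimal Python sketch of the structures defined in [0051]; the class and function names are illustrative, not part of the patent.

```python
# Minimal sketch of the structures in [0051]; names are illustrative, not from the patent.
from dataclasses import dataclass, field

@dataclass
class HeteroGraph:
    """Heterogeneous graph G = {V, E} with two node types: words and sentences."""
    word_nodes: list = field(default_factory=list)      # one node per distinct word
    sentence_nodes: list = field(default_factory=list)  # one node per sentence s_1..s_n
    edges: dict = field(default_factory=dict)           # E: (word_id, sentence_id) -> weight

def build_nodes(document):
    """document: a list of n sentences, each a list of word tokens."""
    graph = HeteroGraph()
    graph.sentence_nodes = list(range(len(document)))                  # sentence node ids 0..n-1
    graph.word_nodes = sorted({w for sent in document for w in sent})  # distinct words
    return graph

# Extractive summarization target: y_i = 1 means sentence i is selected for the summary.
document = [["graph", "networks", "model", "text"],
            ["an", "lstm", "reads", "text", "left", "to", "right"]]
labels = [1, 0]
graph = build_nodes(document)
```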

[0052] Raw documents are long pieces of text, such as news corpora, and this type of data has no obvious structural cues. If a sequence model such as an LSTM is used, the relationships between distant sentences and words cannot be modeled well. Therefore, the text is first modeled as a heterogeneous graph: word nodes are connected to word nodes, sentence nodes to sentence nodes, and word nodes to sentence nodes, and these nodes and connecting edges are initialized. A…
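A hedged sketch of the graph-construction step in [0052] follows: each word node is connected to every sentence node containing that word, and the edges receive initial weights. The TF-IDF-style weighting below is an illustrative assumption; the patent only states that nodes and connecting edges are initialized.

```python
# Illustrative edge construction; TF-IDF-style weights are an assumption, not from the patent.
import math
from collections import Counter

def build_word_sentence_edges(document):
    """document: list of n sentences (each a list of tokens).
    Returns {(word_id, sentence_id): weight} connecting words to the sentences containing them."""
    n = len(document)
    vocab = sorted({w for sent in document for w in sent})
    word2id = {w: i for i, w in enumerate(vocab)}
    df = Counter(w for sent in document for w in set(sent))      # document frequency of each word
    edges = {}
    for sent_id, sent in enumerate(document):
        tf = Counter(sent)                                       # term frequency in this sentence
        for w in set(sent):
            idf = math.log((n + 1) / (df[w] + 1)) + 1.0
            edges[(word2id[w], sent_id)] = tf[w] * idf
    return edges

document = [["graph", "networks", "model", "text"],
            ["an", "lstm", "reads", "text", "left", "to", "right"]]
edges = build_word_sentence_edges(document)
```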



Abstract

The invention relates to the technical field of text processing, and in particular to an automatic text summarization method based on a heterogeneous graph network. The method includes the following steps: S1, representing the initial word vector of each word using pre-trained Word2Vec; S2, capturing word context semantics with a CNN using a window of length n to obtain a vector representation of the j-th sentence, while also applying an LSTM to the word sequence of each sentence, finally obtaining a vector representation of each sentence; S3, iteratively updating the heterogeneous graph, where the heterogeneous graph network is updated with a graph attention mechanism; and S4, scoring and ranking the sentence vector representations obtained from the heterogeneous graph and selecting the sentences suitable to serve as the summary. Through these steps, each sentence in the original text receives an importance score, and suitable summary sentences can be extracted according to the score ranking. The method is mainly used for automatic text summarization.
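For concreteness, below is a hedged PyTorch-style sketch of steps S1-S4: pre-trained Word2Vec initialization, CNN-plus-LSTM sentence encoding, a simplified graph-attention update, and a scoring layer over the sentence vectors. Layer sizes, the number of update iterations, the fusion of the CNN and LSTM views, and the hand-rolled attention are illustrative assumptions, not the patent's exact architecture; only the word-to-sentence update direction is shown.

```python
# Hedged sketch of S1-S4; dimensions, iteration count, and fusion choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeteroGraphSummarizer(nn.Module):
    def __init__(self, word2vec_weights, emb_dim=300, hidden=128, n_window=3, iters=2):
        super().__init__()
        # S1: initial word vectors from pre-trained Word2Vec (frozen here as an assumption)
        self.embed = nn.Embedding.from_pretrained(word2vec_weights, freeze=True)
        # S2: CNN with a window of length n for local context + LSTM over each word sequence
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=n_window, padding=n_window // 2)
        self.lstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True, bidirectional=True)
        # S3: projections for a simplified word->sentence graph-attention update
        self.attn_w = nn.Linear(hidden, hidden, bias=False)
        self.word_proj = nn.Linear(emb_dim, hidden)
        self.iters = iters
        # S4: scoring layer over the final sentence vectors
        self.score = nn.Linear(hidden, 1)

    def encode_sentences(self, sent_token_ids):
        """sent_token_ids: list of 1-D LongTensors, one per sentence."""
        sent_vecs = []
        for ids in sent_token_ids:
            emb = self.embed(ids).unsqueeze(0)                       # (1, L, emb_dim)
            cnn = self.conv(emb.transpose(1, 2)).max(dim=2).values   # (1, hidden)
            lstm_out, _ = self.lstm(emb)                             # (1, L, hidden)
            sent_vecs.append(cnn + lstm_out.mean(dim=1))             # fuse CNN and LSTM views
        return torch.cat(sent_vecs, dim=0)                           # (n_sent, hidden)

    def forward(self, sent_token_ids, word_ids, edge_index):
        """word_ids: LongTensor of word-node vocabulary ids.
        edge_index: LongTensor (2, n_edges) of (word_node, sentence_node) index pairs."""
        s = self.encode_sentences(sent_token_ids)        # sentence node states
        w = self.word_proj(self.embed(word_ids))         # word node states
        for _ in range(self.iters):
            # simplified single-head attention: words attend into their containing sentences
            logits = (self.attn_w(w)[edge_index[0]] * s[edge_index[1]]).sum(-1)
            alpha = torch.zeros_like(logits)
            for sent in edge_index[1].unique():
                mask = edge_index[1] == sent
                alpha[mask] = F.softmax(logits[mask], dim=0)
            s = s.index_add(0, edge_index[1], alpha.unsqueeze(-1) * w[edge_index[0]])
        return torch.sigmoid(self.score(s)).squeeze(-1)  # S4: one importance score per sentence
```

At inference time, the returned scores would be sorted and the top-ranked sentences emitted as the summary, which corresponds to the selection described in step S4.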

Description

Technical field
[0001] The present invention relates to the technical field of text processing, and more specifically to an automatic text summarization method based on a heterogeneous graph network.
Background technique
[0002] Existing extractive text summarization techniques all rely on sequence modeling of the text. This approach ignores many of the deeper relationships within the text, resulting in poor extraction performance. In fact, words in different sentences are also related to one another, but sequence modeling cannot accurately represent long-distance information, so it fails to associate the same information across two different sentences, leading to semantic deviation. Current extractive summarization methods are all based on LSTM modeling, which processes the text from left to right. This approach learns a language model well, but it cannot effectively capture long-distance interactions within the text. Therefore, it is necessary …


Application Information

IPC (IPC(8)): G06F16/34; G06F40/216; G06F40/30; G06N3/04; G06N3/08
CPC: G06F40/216; G06F40/30; G06F16/345; G06N3/049; G06N3/08; G06N3/048; G06N3/045
Inventors: 潘晓光, 潘晓辉, 樊思佳, 陈亮, 董虎弟
Owner: 山西三友和智慧信息技术股份有限公司