
Global coding method for automatic abstracting of Chinese long texts

A global encoding and automatic summarization technology, applied in neural network learning methods, text database querying, unstructured text data retrieval, and related areas, that addresses the lack of a multi-level structure in the encoder-decoder architecture.

Publication date: 2020-06-16 (pending)
SUZHOU UNIV OF SCI & TECH
Cites: 3; Cited by: 9

AI Technical Summary

Problems solved by the technology

Text summarization relies on repeated input of the source text and on multi-level abstract information. Shuming Ma and Xu Sun [2017] found that text exhibits an obvious hierarchical structure, yet the encoder-decoder architecture has no corresponding multi-level structure.



Examples


Detailed Description of the Embodiments

[0064] To give a clearer understanding of the technical features, purposes, and effects of the present invention, specific implementations are now described in detail.

[0065] The present invention proposes a global encoding model with an attention mechanism for the task of summarizing long Chinese texts, that is, a global encoding method for long Chinese texts.

[0066] As shown in Figure 1 and Figure 2, the global encoding method for automatic summarization of long Chinese texts comprises the following specific steps:

[0067] 1) Data preprocessing: the long Chinese text (the source text) is preprocessed to obtain word vectors;

[0068] 11) First, the long Chinese text (the source text) is received and segmented with the jieba word segmentation tool, dividing the long text into individual words;

[0069] 12) Then, the segmented source text is converted into a text word vector (x_1, x_2, ..., x ...
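Steps 11) and 12) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the patent's implementation: the patent uses jieba (whose `jieba.lcut(text)` returns a list of words), whereas this self-contained sketch swaps in simple forward maximum matching over a toy dictionary (`TOY_DICT`) so that it runs without third-party packages. The vocabulary layout (id 0 for unknown words), the randomly initialised embedding table, and all function names are assumptions; the text above does not specify how the word vectors are produced.

```python
import random

# Hypothetical toy dictionary; jieba uses a large built-in dictionary instead.
TOY_DICT = {"中文", "长文本", "文本", "自动", "摘要", "自动摘要"}
MAX_WORD_LEN = max(len(w) for w in TOY_DICT)


def segment_fmm(text, dictionary=TOY_DICT, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position, take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Single characters are always accepted, so the loop always advances.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def build_vocab(token_lists):
    """Assign an integer id to each distinct token; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_embeddings(vocab_size, dim, seed=0):
    """Randomly initialised embedding table, one row of length `dim` per vocabulary id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def to_word_vectors(tokens, vocab, embeddings):
    """Map each token to its id, then look up its vector: (x_1, x_2, ..., x_n)."""
    return [embeddings[vocab.get(tok, 0)] for tok in tokens]


if __name__ == "__main__":
    tokens = segment_fmm("中文长文本自动摘要")
    vocab = build_vocab([tokens])
    vectors = to_word_vectors(tokens, vocab, make_embeddings(len(vocab), dim=4))
    print(tokens, len(vectors), len(vectors[0]))
```

With jieba installed, `segment_fmm(text)` could be replaced by `jieba.lcut(text)`. In a real system the embedding table would normally be trained together with the rest of the model or initialised from pretrained word vectors rather than drawn at random.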


Abstract

The invention relates to a global encoding method for automatic summarization of long Chinese texts, comprising the following steps. First, data preprocessing: the long Chinese text (the source text) is preprocessed to obtain word vectors. Next, encoding: a GRU (gated recurrent unit) receives the preprocessed word vectors and encodes each of them in turn, generating hidden states; the matrix H formed by all the hidden states serves as the input to the global encoding process. Then, global encoding: a convolutional neural network (CNN) receives the output matrix H of the encoding process and extracts features from it to obtain an attention matrix g, and an intermediate semantic vector C is obtained through the feature extraction of a global attention mechanism followed by a gating unit. Finally, decoding: a GRU processes the last hidden state h_t output by the encoding process and the intermediate semantic vector C output by the global encoding process to produce the summary text. The method objectively summarizes long Chinese documents and unstructured documents.
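The four stages in the abstract can be sketched end to end in plain Python. This is a hedged, toy illustration under stated assumptions, not the patent's implementation: the abstract gives no equations, so the sketch assumes a standard GRU cell without biases, a width-3 zero-padded 1-D convolution over H, scaled dot-product self-attention over the convolutional features to form g, a sigmoid gate applied element-wise to each hidden state, and mean pooling of the gated states to form C. Feeding C as the decoder input at every step, greedy token selection, random untrained weights, and all names (`GRUCell`, `global_encode`, `decode`, and so on) are hypothetical.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell; biases are omitted for brevity (assumption)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (W, U) pair each for the update gate z, reset gate r and candidate state c.
        self.W = {k: rand_matrix(hidden_dim, input_dim, rng) for k in "zrc"}
        self.U = {k: rand_matrix(hidden_dim, hidden_dim, rng) for k in "zrc"}

    def step(self, x, h):
        z = [sigmoid(a) for a in vadd(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a) for a in vadd(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in vadd(matvec(self.W["c"], x), matvec(self.U["c"], rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(cell, xs):
    """Encoding: run the GRU over the word vectors; H stacks every hidden state."""
    h, H = [0.0] * cell.hidden_dim, []
    for x in xs:
        h = cell.step(x, h)
        H.append(h)
    return H


def conv1d(H, kernels):
    """Width-3, zero-padded convolution over time; kernels has shape (d, 3d)."""
    zero = [0.0] * len(H[0])
    padded = [zero] + H + [zero]
    # Each window concatenates the states at t-1, t and t+1 into one 3d-vector.
    return [[math.tanh(a) for a in matvec(kernels, padded[t] + padded[t + 1] + padded[t + 2])]
            for t in range(len(H))]


def self_attention(F):
    """Scaled dot-product self-attention: every position attends to all positions."""
    d = len(F[0])
    out = []
    for q in F:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in F]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]  # softmax numerator, shifted for stability
        total = sum(w)
        out.append([sum(wi * f[j] for wi, f in zip(w, F)) / total for j in range(d)])
    return out


def global_encode(H, kernels):
    """Global encoding: CNN features -> attention matrix g -> sigmoid gate -> vector C."""
    g = self_attention(conv1d(H, kernels))
    gated = [[hj * sigmoid(gj) for hj, gj in zip(h, row)] for h, row in zip(H, g)]
    C = [sum(col) / len(gated) for col in zip(*gated)]  # mean pooling (assumption)
    return g, C


def decode(cell, W_out, h_t, C, steps):
    """Greedy decoding: start from h_t, feed C at every step, keep the best token id."""
    h, ids = h_t, []
    for _ in range(steps):
        h = cell.step(C, h)
        scores = matvec(W_out, h)
        ids.append(max(range(len(scores)), key=scores.__getitem__))
    return ids


if __name__ == "__main__":
    rng = random.Random(0)
    d_in, d_h, vocab_size = 4, 6, 10
    xs = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(5)]  # stand-in word vectors
    H = encode(GRUCell(d_in, d_h, rng), xs)
    g, C = global_encode(H, rand_matrix(d_h, 3 * d_h, rng))
    print(decode(GRUCell(d_h, d_h, rng), rand_matrix(vocab_size, d_h, rng), H[-1], C, steps=3))
```

Because the weights are random and untrained, the printed token ids carry no meaning; the sketch only illustrates the data flow (word vectors to H, H to g and C, then h_t and C into the decoder). In a trained system the weights would be learned and the ids mapped back to words through the vocabulary.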

Description

Technical field

[0001] The invention relates to a global encoding method for automatic summarization of long Chinese texts and belongs to the technical field of text information processing.

Background art

[0002] As an important branch of natural language processing, text summarization, which automatically converts text into short summaries, has been developed for decades. With the growth of massive data, research on text summarization has become a hot topic. Text summarization saves search time and simplifies the search process; especially in today's era of information explosion, it is particularly important for improving the efficiency of knowledge discovery tasks. Most published research focuses on short-text summarization, and because of the complexity of long Chinese texts, there is little research on their automatic summarization.

[0003] Alexander M Rush first applied deep learning methods to text summarization [Ru...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F40/126, G06F40/289, G06F40/30, G06F16/33, G06F16/36, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/367, G06N3/08, G06N3/047, G06N3/045
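Inventors: 奚雪峰 (Xi Xuefeng), 皮洲 (Pi Zhou), 曾诚 (Zeng Cheng), 张谦 (Zhang Qian), 王坚 (Wang Jian), 鲍观花 (Bao Guanhua), 吴宏杰 (Wu Hongjie), 付保川 (Fu Baochuan), 崔志明 (Cui Zhiming)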
Owner: SUZHOU UNIV OF SCI & TECH