
Unsupervised Chinese multi-document extractive summarization method

An extractive multi-document summarization technology, applied in neural learning methods, text database clustering/classification, unstructured text data retrieval, etc. It addresses problems such as the inability to uncover the rules of deep logical relationships between sentences.

Pending Publication Date: 2022-02-18
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

[0011] (2) Selection of summary sentences
[0013] (3) The sorting problem of summary sentences
Among existing approaches, traditional methods based on statistical rules can derive, through manual configuration, rules for judging sentence order in certain specific scenarios, but they cannot uncover the deep logical relationships between sentences.
Methods based on deep learning can learn the deep logical relationships between sentences through training, but in specific scenarios, especially when the logical relationship between sentences is relatively vague, they often perform poorly.

Embodiment Construction

[0101] The following describes a preferred embodiment of the present invention. The technical solutions of the invention are further described in conjunction with the accompanying drawings, but the invention is not limited to this embodiment.

[0102] The invention proposes an unsupervised Chinese multi-document extractive summarization method that covers the representation of sentence vectors, the selection of document summary sentences, and the ordering of the final summary sentences. The overall framework of the method is shown in Figure 1.

[0103] The input of the model is a collection of news documents M, and the output is a summary sequence composed of K sentences extracted from that collection. The model consists mainly of three modules: a sentence vector acquisition module, a summary sentence selection module, and a summary sentence sorting module. Specifically, the documents are first input into the sentence vector acquisition module. In the Embedding layer, add [CLS] and [SEP] ...
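The three-module flow in [0103] (sentence vectors, then sentence selection, then sentence ordering) can be illustrated with a minimal Python sketch. It is not the patented method: the excerpt does not disclose how the modules work internally, so each stage uses a simple stand-in that is an assumption of this example. Character-bigram counts replace the learned sentence vectors, similarity to the collection centroid with a redundancy filter replaces the selection module, and source order replaces the sorting module. All function names and the `redundancy_threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(doc):
    """Split Chinese text after sentence-ending punctuation (。！？)."""
    parts = re.split(r"(?<=[。！？])", doc)
    return [p.strip() for p in parts if p.strip()]


def sentence_vector(sentence):
    """Stand-in sentence vector: character-bigram counts.
    (Assumption: the source uses a learned encoder instead.)"""
    chars = [c for c in sentence if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarize(documents, k, redundancy_threshold=0.8):
    """Return up to k sentences extracted from the document collection."""
    # Module 1 (stand-in): one vector per sentence, with its position.
    candidates = []
    for doc_idx, doc in enumerate(documents):
        for sent_idx, sent in enumerate(split_sentences(doc)):
            candidates.append((doc_idx, sent_idx, sent, sentence_vector(sent)))

    # Module 2 (stand-in): rank by similarity to the collection centroid,
    # skipping sentences that nearly duplicate an already selected one.
    centroid = Counter()
    for _, _, _, vec in candidates:
        centroid.update(vec)
    ranked = sorted(candidates, key=lambda c: cosine(c[3], centroid), reverse=True)
    selected = []
    for cand in ranked:
        if len(selected) == k:
            break
        if all(cosine(cand[3], s[3]) < redundancy_threshold for s in selected):
            selected.append(cand)

    # Module 3 (stand-in): order by document index, then sentence position.
    selected.sort(key=lambda c: (c[0], c[1]))
    return [c[2] for c in selected]
```

Calling `summarize(docs, k=2)` on a list of news strings returns at most two sentences taken verbatim from the inputs. Swapping any stage for a learned component only requires keeping the same tuple layout of (document index, sentence index, text, vector).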

Abstract

The unsupervised Chinese multi-document extractive summarization method is realized through a method in the field of network security. The method consists of three modules: a sentence vector acquisition module, a summary sentence selection module, and a summary sentence sorting module. The sentence vector acquisition module comprises six parts (text preprocessing, long text processing, an Embedding layer, an SPT task, an SRT task, and multi-task learning) and produces sentence vector representations better suited to the extractive summarization task. The summary sentence selection module generates the final summary set, and the summary sentence sorting module outputs the final summary sequence. The method thus achieves automatic extraction and generation of summaries from multiple Chinese documents by combining deep learning with rule-based statistical methods.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to an unsupervised Chinese multi-document extractive summarization method.

Background technique

[0002] Internet technology is developing rapidly, and Internet applications have permeated all aspects of industry and social life. According to the 45th China Internet Development Report, as of March 2020, the number of Internet users in China had reached 904 million, and the Internet penetration rate had reached 64.5%. Among them, the number of online news users had reached 731 million, accounting for 80.9% of all Internet users. With the continuous development and updating of information technology, traditional manual text is gradually being replaced by electronic text, and the Internet has become closely connected with people's lives and is also an important channel for obtai...

Application Information

IPC(8): G06F16/34, G06F16/35, G06F40/258, G06N3/04, G06N3/08
CPC: G06F16/345, G06F16/35, G06F40/258, G06N3/088, G06N3/048, G06N3/044
Inventors: 马帅, 华轶名, 王惠芬
Owner BEIHANG UNIV