
Multi-document abstract sentence generating method

A multi-document summarization technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems of limited performance, a small amount of supervision information, and limited data set size, and aims at readable summaries with comprehensive information.

Inactive Publication Date: 2015-07-15
SOUTH CHINA UNIV OF TECH +2
Cites: 0; Cited by: 13

AI Technical Summary

Problems solved by technology

Machine learning methods depend heavily on their training set, and the existing data sets for automatic summarization are small, which limits the performance of machine-learning-based methods.
Obtaining supervision information is also a major problem.
Because manual effort is limited, existing data sets provide only a small number of reference summaries, so little supervision information can be obtained from them. In addition, most reference summaries are abstractive (written from an understanding of the text), so it is difficult to find original sentences in the multi-document set that match them one-to-one. How to solve this fuzzy matching problem and extract supervision information accurately and effectively is therefore also a technical difficulty for machine-learning-based methods.



Examples


Embodiment

[0030] As shown in Figure 1, a method for generating multi-document summary sentences includes the following steps:

[0031] S1. Taking the sentence feature vector space as input, cluster the sentences according to the similarity of their feature vectors, and record each resulting cluster as a sub-topic;
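Step S1 can be illustrated with a minimal sketch. The source does not name the clustering algorithm or the similarity measure, so the greedy single-pass scheme, the cosine similarity, the `threshold` value, and the function names `cosine` and `cluster_sentences` below are all assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_sentences(vectors, threshold=0.8):
    """Greedy single-pass clustering (an assumed stand-in for the patent's method).

    Each sentence joins the most similar existing cluster if the similarity to
    that cluster's centroid is at least `threshold`; otherwise it opens a new
    cluster. Returns a list of clusters, each a list of sentence indices.
    """
    clusters = []   # sentence indices per cluster (one cluster = one sub-topic)
    centroids = []  # running mean vector per cluster
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([idx])
            centroids.append(list(vec))
        else:
            clusters[best].append(idx)
            n = len(clusters[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [
                (old * (n - 1) + x) / n for old, x in zip(centroids[best], vec)
            ]
    return clusters


if __name__ == "__main__":
    toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(cluster_sentences(toy_vectors))
```

A single-pass scheme is order-dependent; a hierarchical or k-means style method could be swapped in without changing the rest of the pipeline, since later steps only need the list of clusters.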

[0032] S2. Determine the importance of each sub-topic from how much of the document collection it covers and how many sentences it contains, and sort the sub-topics by importance;

[0033] The importance of a sub-topic is evaluated by the number of documents it covers and the number of sentences it contains: the more documents a sub-topic involves and the more sentences it contains, the more important it is. Specifically, suppose the i-th sub-topic covers DC_i documents in total and contains SC_i sentences; the importance score of this sub-topic is:

[0034]

[0035] where λ_D + λ_S = 1, used to...
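The scoring formula itself is not reproduced in this excerpt, so the following is only a sketch under a stated assumption: that the score is a weighted sum of the sub-topic's share of documents and its share of sentences, with weights λ_D and λ_S summing to 1. The normalization by totals, the default weight of 0.5, and the names `subtopic_importance` and `rank_subtopics` are hypothetical.

```python
def subtopic_importance(doc_count, sent_count, total_docs, total_sents, lambda_d=0.5):
    """Assumed importance score: lambda_d * DC_i / |D| + lambda_s * SC_i / |S|.

    The exact formula is not given in the source; only the constraint
    lambda_d + lambda_s = 1 is. The normalization by totals is an assumption.
    """
    if not 0.0 <= lambda_d <= 1.0:
        raise ValueError("lambda_d must lie in [0, 1]")
    lambda_s = 1.0 - lambda_d  # enforces lambda_d + lambda_s = 1
    return lambda_d * doc_count / total_docs + lambda_s * sent_count / total_sents


def rank_subtopics(subtopics, total_docs, total_sents, lambda_d=0.5):
    """Sort (name, doc_count, sent_count) triples by descending assumed score."""
    scored = [
        (name, subtopic_importance(dc, sc, total_docs, total_sents, lambda_d))
        for name, dc, sc in subtopics
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = [("A", 2, 3), ("B", 4, 5), ("C", 1, 2)]
    for name, score in rank_subtopics(toy, total_docs=4, total_sents=10):
        print(name, round(score, 3))
```

Raising `lambda_d` favors sub-topics that recur across many documents, while lowering it favors sub-topics with many sentences, which is the trade-off the two weights appear intended to control.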



Abstract

The invention discloses a method for generating multi-document summary sentences, comprising the following steps: S1, taking the sentence feature vector space as input, the sentences are clustered according to the similarity of their feature vectors, and each resulting cluster is recorded as a sub-topic; S2, the importance of each sub-topic is determined from its coverage of the document set and the number of sentences it contains, and the sub-topics are sorted by importance; S3, the sentences under each sub-topic are scored and ranked; S4, the sentence with the highest importance score in each sub-topic is extracted as a summary sentence, demonstrative pronouns used as subjects in these sentences are replaced, the summary sentences are ordered by the importance scores of their sub-topics, and finally the summary is generated and output.
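Steps S3 and S4 can be sketched as follows, assuming sub-topic importance and per-sentence scores have already been computed. The data layout (dicts with `importance` and `sentences` keys) and the function name `build_summary` are hypothetical, and the pronoun replacement in S4 is deliberately left out because the excerpt does not describe how it is performed.

```python
def build_summary(subtopics):
    """Pick the top-scoring sentence of each sub-topic, ordered by sub-topic importance.

    `subtopics` is a list of dicts (an assumed layout, not from the source):
        {"importance": float, "sentences": [(text, score), ...]}
    Demonstrative-pronoun replacement (part of S4) is not implemented here.
    """
    # S4 ordering: most important sub-topic first.
    ranked = sorted(subtopics, key=lambda topic: topic["importance"], reverse=True)
    summary = []
    for topic in ranked:
        if not topic["sentences"]:
            continue  # an empty sub-topic contributes nothing
        # S3/S4 selection: highest-scoring sentence within the sub-topic.
        best_text, _ = max(topic["sentences"], key=lambda pair: pair[1])
        summary.append(best_text)
    return summary


if __name__ == "__main__":
    toy = [
        {"importance": 0.4, "sentences": [("a1", 0.2), ("a2", 0.9)]},
        {"importance": 0.75, "sentences": [("b1", 0.6), ("b2", 0.1)]},
        {"importance": 0.1, "sentences": []},
    ]
    print(build_summary(toy))
```

Taking one sentence per sub-topic keeps the summary free of near-duplicates from the same cluster, which matches the one-sentence-per-sub-topic extraction described in S4.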

Description

Technical field

[0001] The invention relates to the research field of automatic summarization, and in particular to a method for generating multi-document summary sentences.

Background art

[0002] With the popularization of the Internet and the rapid development of network applications, its convenient access and comprehensive coverage have made it the main channel through which people obtain information. Multi-document summarization processes multiple source texts on the same topic, extracts the main information from this large volume of content, and presents the resulting text to the user for reading. Summary sentence extraction selects, from the clusters describing related topics, sentences that express the topic information of the documents and have substantial content, and uses them as summary sentences. Sentences are selected according to the importance of the topic, so that the summary sentence summarizes the important content of the topic a...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 陈健赖旦冉
Owner: SOUTH CHINA UNIV OF TECH