Method for obtaining abstracts of multiple texts in same-topic text set

A method for obtaining summaries of multiple texts on the same topic, applied at the intersection of information science and natural language processing. It addresses problems such as long summary generation time, neglect of the importance of text content, and deviation in the number of topics, and aims to improve summary quality and produce accurate, comprehensive summaries.

Active Publication Date: 2018-09-28
NANJING UNIV OF POSTS & TELECOMM
3 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0005] (1) Research on multi-text automatic summarization relies on clustering methods. These methods share a common defect: they often cannot automatically estimate the number of cluster centers, so the number of clusters must be specified manually. A manually specified number of clusters biases the number of topics generated, so the natural latent subtopics hidden in the document set cannot be discovered automatically.
[0006] (2) Existing summary extraction considers only how often keywords occur and ignores how important the text content is in describing the related subtopic events. As a result, the extracted summaries suffer from high redundancy, inaccurate selection of important sentences, low coverage, and poor coherence, and they take a relatively long time to generate.
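
The limitation described in [0006] can be illustrated with a minimal sketch of a frequency-only extractor. This is not code from the patent: the function names `frequency_scores` and `top_k` and the toy sentences are hypothetical, and sentences are assumed to be already segmented into word lists.

```python
from collections import Counter


def frequency_scores(sentences):
    """Score each tokenized sentence by the summed corpus frequency of its words."""
    freq = Counter(word for sentence in sentences for word in sentence)
    return [sum(freq[word] for word in sentence) for sentence in sentences]


def top_k(sentences, k):
    """Return the indices of the k highest-scoring sentences, in document order."""
    scores = frequency_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Hypothetical toy input: two near-duplicate sentences and one distinct sub-event.
sentences = [
    ["flood", "hits", "city"],
    ["flood", "hits", "city", "again"],
    ["rescue", "teams", "arrive"],
]
selected = top_k(sentences, 2)
```

In this toy input the two near-duplicate sentences outscore the sentence about the rescue, so a two-sentence summary repeats itself and misses a sub-event, which is the redundancy and low-coverage problem the paragraph describes.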



Examples


Embodiment Construction

[0040] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0041] The present invention provides a multi-text automatic summarization method that addresses the following defects. (1) The existing clustering methods used in multi-text automatic summarization research share a common defect: they often cannot automatically estimate the number of cluster centers, so the number of clusters must be specified manually. A manually specified number of clusters biases the number of topics generated, so the natural latent subtopics hidden in the document set cannot be discovered automatically. (2) For extraction, existing summarization methods consider only the occurrence frequency of keywords and ignore the importance of the text content describing related sub-events. Therefore, the extracted summary has high redund...


Abstract

The invention relates to a method for obtaining summaries of multiple texts in a same-topic text set. First, the texts are preprocessed; preprocessing includes word segmentation, stop-word removal, feature selection, dimension reduction and the like. Second, a vector space model is built from the processed feature words, and a distance matrix is generated. Third, a sample density sorting step is added to the clustering method: a circle is constructed with a central vector as its center and the mean of the feature-value distances in the vector space as its radius, and the initial cluster centers are determined automatically from the sample density produced by the content similarity of the texts sorted within the circle, so that the number of latent sub-topic sets in the document set is discovered automatically. After the sub-topic sets are generated, supervised training is performed on the clustered sub-topic texts, sentences are scored and labeled, and central sentences are extracted from the different sub-topics to serve as the summary of the texts. Finally, the summary content is output. The quality of the text summaries is thereby improved.
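
The second and third steps of the abstract can be sketched roughly as follows. This is one possible reading, not the patent's own algorithm: it assumes plain term-frequency vectors, Euclidean distance, a radius equal to the mean pairwise distance, density defined as the number of other samples inside that radius, and a greedy rule that keeps a dense sample as a center only if it lies outside the radius of every center already chosen. All function names are hypothetical, and texts are assumed to be already segmented and filtered.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Build term-frequency vectors over a shared, sorted vocabulary."""
    vocab = sorted({word for doc in docs for word in doc})
    counts = [Counter(doc) for doc in docs]
    return [[float(c[word]) for word in vocab] for c in counts]


def distance_matrix(vectors):
    """Pairwise Euclidean distances between all vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    n = len(vectors)
    return [[dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def density_initial_centers(vectors):
    """Choose initial cluster centers by sample density (assumed interpretation)."""
    n = len(vectors)
    if n < 2:
        return list(range(n))
    dist = distance_matrix(vectors)
    # Assumption: the radius is the mean of all pairwise distances.
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    radius = sum(pairs) / len(pairs)
    # Assumption: density is the count of other samples inside the radius.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    # Visit samples from densest to sparsest; ties broken by index.
    order = sorted(range(n), key=lambda i: (-density[i], i))
    centers = []
    for i in order:
        if all(dist[i][c] > radius for c in centers):
            centers.append(i)
    return centers


# Hypothetical toy corpus with two obvious sub-topics.
docs = [
    ["a", "a", "b"], ["a", "b", "b"], ["a", "a", "b", "b"],
    ["x", "y", "y"], ["x", "x", "y"], ["x", "x", "y", "y"],
]
centers = density_initial_centers(tf_vectors(docs))
```

On this toy corpus the sketch returns one center per sub-topic without being told the number of clusters; the chosen indices could then seed a standard clustering pass. Real text would need the preprocessing the abstract lists, and the radius and density definitions here are stand-ins for details the abstract does not spell out.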

Description

Technical Field

[0001] The invention relates to a method for obtaining multi-text summaries from a text collection on the same topic, and belongs to the interdisciplinary technical field of natural language processing and information science.

Background

[0002] At present, a large amount of information appears on the Internet every day, and information is exploding in every field. The era of big data has arrived, and people need to find useful information quickly and accurately in massive amounts of data. Automatic summarization technology quickly condenses and refines large-scale electronic texts, extracts key information, and generates the central content of a given original text. This makes it an accurate and efficient means of addressing the current information overload problem and of speeding up reading and access to information resources. With the development of computer science and the continuous improvement of natural language processi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/258
Inventor: 徐小龙, 杨春春, 段卫华, 张洁, 朱洁, 刘茜萍
Owner: NANJING UNIV OF POSTS & TELECOMM