A method for obtaining multi-text summaries in the same subject text collection

A method for obtaining multi-text summaries from a collection of texts on the same subject, applied at the intersection of information science and natural language processing. It addresses long summary generation time, neglect of the importance of text content, and bias in the number of topics, and it yields higher-quality, more accurate, and more comprehensive summaries.

Active Publication Date: 2022-02-01
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0005] (1) Research on automatic multi-text summarization relies on clustering methods. These methods share a common defect: they usually cannot estimate the number of cluster centers automatically, so the number of clusters has to be set manually. A manually chosen number of clusters biases the number of topics produced, which makes it impossible to automatically discover the natural latent subtopics hidden in the document set.
[0006] (2) Existing summary extraction considers only how often keywords occur and ignores the importance of the text content that describes the related subtopic events. As a result, the extracted summaries are highly redundant, important sentences are extracted inaccurately, coverage is low, coherence is poor, and generating the summary takes a relatively long time.



Embodiment Construction

[0040] The specific implementation manners of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0041] The multi-text automatic summarization method designed by the present invention addresses two problems. (1) The existing clustering methods used in multi-text automatic summarization research share a common defect: they usually cannot estimate the number of cluster centers automatically, so the number of clusters has to be specified manually. A manually chosen number of clusters biases the number of topics produced, which makes it impossible to automatically discover the natural latent subtopics hidden in the document set. (2) In terms of extraction, existing summarization methods consider only the occurrence frequency of keywords and ignore the importance of the text content describing the related sub-events. Therefore, the extracted summary has high redund...



Abstract

The invention relates to a method for obtaining multi-text summaries from a collection of texts on the same subject. First, the text is preprocessed, including word segmentation, stop-word removal, feature selection, and dimensionality reduction. Next, the processed feature words are used to construct a vector space model, from which a distance matrix is generated. A step that sorts samples by density is then added to the clustering method: a circle is constructed with the center vector as its center and the average distance between feature values in the vector space as its radius, and the sample density, derived from the content similarity of the sorted texts inside the circle, is used to determine the initial cluster centers automatically. In this way the number of latent subtopic sets in the document collection is found automatically. The method then performs training, scores and labels sentences, and extracts central sentences from the different subtopics to form the multi-text summary. Finally, the method outputs the summary content. The approach improves the quality of multi-text summaries.
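The vector space model, distance matrix, and density-based choice of initial cluster centers described in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure. The abstract does not fix the details, so the following are assumptions made only for illustration: documents arrive already tokenized and filtered, features are weighted with smoothed TF-IDF, distance is cosine distance, the radius is the mean pairwise distance, a sample's density is the number of other samples inside its circle, and a sample becomes a new center when it lies outside the circle of every center chosen so far. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): terms found in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse vectors (1.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def distance_matrix(vectors):
    """Full pairwise cosine-distance matrix as a list of lists."""
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def select_initial_centers(dist):
    """Pick initial cluster centers by density; the count is not given in advance."""
    n = len(dist)
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    if not pairs:  # zero or one document: nothing to compare
        return list(range(n)), 0.0
    radius = sum(pairs) / len(pairs)  # assumed radius: mean pairwise distance
    # Density of a sample: how many other samples fall inside its circle.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    # Visit samples from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-density[i], i))
    centers = []
    for i in order:
        # A sample outside every chosen center's circle starts a new subtopic.
        if all(dist[i][c] > radius for c in centers):
            centers.append(i)
    return centers, radius


# Toy collection: three documents about a flood, three about an election.
docs = [
    ["flood", "rain", "river", "rescue"],
    ["flood", "river", "damage", "rain"],
    ["rain", "flood", "rescue", "damage"],
    ["election", "vote", "party", "poll"],
    ["vote", "election", "poll", "campaign"],
    ["party", "campaign", "election", "vote"],
]
dist = distance_matrix(tfidf_vectors(docs))
centers, radius = select_initial_centers(dist)
# Assign every document to its nearest initial center.
labels = [min(centers, key=lambda c: dist[i][c]) for i in range(len(docs))]
print(centers, labels)
```

On this toy collection the sketch selects one center per topic without being told the number of clusters. Under these assumptions an isolated outlier document would also become its own center, so a real pipeline would likely need a minimum-density threshold before passing the centers to a clustering step.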

Description

Technical field

[0001] The invention relates to a method for acquiring multi-text summaries from a text collection on the same subject, and belongs to the interdisciplinary technical field of natural language processing and information science.

Background

[0002] At present, a large amount of information emerges on the Internet every day, and information explosion is occurring in many fields. The era of big data has arrived, and people need to find useful information quickly and accurately in massive amounts of data. Automatic summarization technology quickly condenses and refines large-scale electronic texts, extracts key information, and generates the central content of a given original text, making it an accurate and efficient means of solving the current information overload problem, speeding up reading, and obtaining information resources. With the development of computer science and the continuous improvement of natural language processi...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/258, G06F16/35, G06N3/04, G06N3/08
CPC: G06F40/258
Inventor: 徐小龙, 杨春春, 段卫华, 张洁, 朱洁, 刘茜萍
Owner: NANJING UNIV OF POSTS & TELECOMM