Multi-document summarization method based on text segmentation

Text segmentation technology is applied to multi-document summarization to address the problem that secondary but still important topics are missed or ignored, so that information is lost from the summary.

Status: Inactive
Publication Date: 2013-02-27
广西超宏科技有限公司

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a multi-document summarization method based on text segmentation technology, aiming at solving the problem that the traditional text processing technology takes chapters as the basic processing unit, considers that an article only discuss...



Embodiment Construction

[0045] To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it.

[0046] Figure 1 shows the implementation flow of the multi-document summarization method based on text segmentation provided by the embodiment of the present invention.

[0047] The multi-document summarization method includes the following steps:

[0048] Step S101, using "HowNet" as a tool to acquire concepts, and using the acquired concepts as features to establish a concept vector space model;

[0049] Step S102, using the improved DotPlotting algorithm for text segmentation to obtain the topic division of the text;

[0050] Step S103, using the established concept vector spa...
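Steps S101 and S102 can be illustrated with a minimal Python sketch. It is not the patented algorithm: the `CONCEPTS` table is a hypothetical stand-in for a HowNet lookup, and `best_boundary` uses a simplified dotplot-style criterion (pick the single split with the lowest average similarity across the boundary) rather than the improved DotPlotting algorithm, whose details are not given in this excerpt. All function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table standing in for a HowNet lookup.
CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "rain": "weather", "storm": "weather", "snow": "weather",
}

def concept_vector(sentence):
    """Count concepts (not surface words) in a whitespace-tokenised sentence."""
    words = sentence.lower().split()
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_boundary(sentences):
    """Return the split index with the lowest average cross-segment similarity.

    Simplified assumption: exactly one boundary, at least two sentences.
    """
    vecs = [concept_vector(s) for s in sentences]
    n = len(vecs)

    def density(b):
        cross = sum(cosine(vecs[i], vecs[j])
                    for i in range(b) for j in range(b, n))
        return cross / (b * (n - b))

    return min(range(1, n), key=density)

sentences = [
    "the car hit a truck",
    "a bus passed the car",
    "rain and snow fell",
    "the storm brought rain",
]
boundary = best_boundary(sentences)  # index of the first sentence of segment 2
```

Because "car" and "truck" map to the same concept, sentences that share no surface words can still be similar, which is the motivation for using concepts rather than words as features.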


Abstract

The invention belongs to the technical field of multi-document summarization and provides a multi-document summarization method based on text segmentation. The method comprises the following steps: using HowNet to obtain concepts and building a concept vector space model; segmenting the text with an improved DotPlotting model and the sentence concept vector space; calculating sentence weights with the concept vector space model; generating a summary according to the sentence weights, the text segmentation and sentence similarity; and evaluating the generated summary with the ROUGE-N evaluation method, using F_Score as the evaluation index. The results indicate that multi-document summarization using text segmentation is effective. With this method, the related documents provided by a user can be gathered into a summary and presented to the user in a suitable form, which improves the efficiency of information acquisition; the method is practical and suitable for wide application.
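The evaluation step can be sketched as follows. This is a minimal illustration of ROUGE-N with an F-score, assuming whitespace tokenisation, a single reference summary, and equal weighting of precision and recall; the abstract does not state how F_Score is weighted, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f_score) from clipped n-gram overlap."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

p, r, f = rouge_n("the cat sat", "the cat sat on the mat", n=1)
```

In this example all three candidate unigrams appear in the six-word reference, so precision is 1.0 and recall is 0.5.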

Description

Technical field

[0001] The invention belongs to the technical field of multi-document summarization, and in particular relates to a multi-document summarization method based on text segmentation technology.

Background technique

[0002] In the Internet age, electronic text of all kinds is produced in large quantities. How to help users obtain the information they are interested in quickly and accurately from this ocean of information has become a research hotspot in the field of natural language understanding. Multi-document summarization is a technology that removes redundant information from multiple texts on the same topic and fuses the remainder into a coherent whole at a given compression ratio. It condenses a collection of related documents provided by the user into a summary and presents it to the user in an appropriate form, improving the efficiency of information acquisition, and with the continuous holding of various large-scale internati...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 王萌, 唐新来, 王晓荣
Owner: 广西超宏科技有限公司