Automatic abstract generation method based on concept semantic unit

An automatic abstract generation and semantic analysis technology, applied in the field of intelligent analysis of natural language text. It addresses data sparsity and the inability of word-frequency-based weights to reflect the deep semantic content of a document, and it improves the user experience.

Active Publication Date: 2016-02-10
INST OF ACOUSTICS CHINESE ACAD OF SCI
2 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to solve the problem described in the background section: sentence weights computed from word frequency do not reflect the deep semantic content of a document well. It also aims to mitigate the data sparsity that arises when a topic analysis method that uses words as its statistical processing unit is applied to a single document, and on that basis to generate the abstract of a single document.



Examples


Embodiment

[0136] In this embodiment, a news article is selected for abstract processing. The article comes from the Internet, and its title is "Obama Putin and Hollande will have two dinners without meeting in France" (link: http://news.sina.com.cn/w/2014-06-03/102030282941.shtml).

[0137] First, the body text of the news article is extracted, and advertising links, related-recommendation links, and picture and video link information are filtered out.
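The extraction step in [0137] can be sketched as follows. This is a minimal illustration and not the patent's procedure: it assumes the body text sits in `<p>` elements and treats everything inside `<a>`, `<script>`, and `<style>` as link or page residue to discard, which is a crude heuristic. The names `BodyTextExtractor` and `extract_body_text` are hypothetical.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text found inside <p> elements, dropping link and script text.

    Assumption: body text lives in <p>; <a>, <script>, <style> are noise.
    """

    SKIP = {"a", "script", "style"}

    def __init__(self):
        super().__init__()
        self.in_p = 0          # nesting depth of <p>
        self.skip_depth = 0    # nesting depth of skipped elements
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p":
            self.in_p += 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p -= 1
            # Collapse runs of whitespace left behind by removed links.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_p and not self.skip_depth:
            self._buf.append(data)


def extract_body_text(html):
    """Return the list of paragraph strings kept by the heuristic above."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

A real news page would likely need site-specific rules (for example, keeping inline links that are part of a sentence), so the skip list should be read as a placeholder.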

[0138] Second, the text is segmented into paragraphs and sentences. An example from this embodiment is given below.
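The segmentation in [0138] can be sketched as below. This is a naive illustration under stated assumptions: paragraphs are separated by newlines, sentences end at `.`, `!`, `?` or their Chinese counterparts, and the labeling (1-based sentence number, 0-based paragraph index) is chosen for illustration and is not necessarily the patent's scheme. The function name `segment` is hypothetical.

```python
import re

# A sentence is a run of non-terminator characters followed by any terminators.
# Naive: it will also split on decimal points and abbreviations.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def segment(text):
    """Return rows of (sentence_no, paragraph_index, sentence)."""
    rows = []
    sentence_no = 0
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    for para_idx, para in enumerate(paragraphs):
        for match in _SENTENCE.finditer(para):
            sentence = match.group().strip()
            if sentence:
                sentence_no += 1
                rows.append((sentence_no, para_idx, sentence))
    return rows
```

For Chinese news text, a production system would typically rely on a dedicated sentence splitter rather than a single regular expression.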

[0139]

[0140] Here, the content of the obtained document is organized, segmented, and labeled by sentence and paragraph. The numbers 1 to 10 in the leftmost column are the sentence numbers; the document in this embodiment has 10 sentences in total. The numbers from 0 to 9 in the second column from the left are serial numbers uniformly assigne...


Abstract

The invention relates to an automatic abstract generation method based on concept semantic units. The method uses the concept semantic unit as the carrier for semantic computation and for representing document content; obtains the semantic focus of the document by converging its semantic content; and generates the abstract automatically by selecting sentences that represent the document content according to that semantic focus. The method models document topics with a Latent Dirichlet Allocation (LDA) model to generate topics, and selects abstract sentences according to the importance of each topic. During computation, concept hierarchy network symbols are introduced, and semantic information is merged by means of the semantic hierarchy relations encoded in those symbols, which mitigates the data sparsity caused by using words as the unit of semantic computation.
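The two ideas in the abstract, merging words into concept-level units through a hierarchy and selecting sentences by topic importance, can be sketched as follows. This is a minimal illustration, not the patented method: the word-to-symbol table `CONCEPT_OF` and its dotted codes are invented stand-ins for concept hierarchy network symbols, merging is approximated by truncating a symbol to a fixed depth, and the topic weights and per-topic concept probabilities are assumed to come from an LDA model that is not implemented here.

```python
from collections import Counter

# Hypothetical word -> concept symbol table (NOT real symbol entries).
CONCEPT_OF = {
    "dinner": "a.1.1", "banquet": "a.1.2", "lunch": "a.1.3",
    "meet": "b.2.1", "talks": "b.2.2",
}


def to_concept(word, depth=2):
    """Map a word to its ancestor concept at the given hierarchy depth."""
    symbol = CONCEPT_OF.get(word)
    if symbol is None:
        return word  # unknown words stay as their own unit
    return ".".join(symbol.split(".")[:depth])


def concept_counts(tokens, depth=2):
    """Count concept units instead of surface words to reduce sparsity."""
    return Counter(to_concept(t, depth) for t in tokens)


def select_sentences(sentences, topic_weights, topic_concept_prob, k=1, depth=2):
    """Pick the k sentences with the highest topic-weighted score.

    sentences: list of token lists.
    topic_weights: topic -> importance (assumed LDA output).
    topic_concept_prob: topic -> {concept unit -> probability} (assumed).
    Returns the chosen sentence indices in document order.
    """
    scores = []
    for idx, tokens in enumerate(sentences):
        units = [to_concept(t, depth) for t in tokens]
        score = 0.0
        if units:
            for topic, weight in topic_weights.items():
                probs = topic_concept_prob[topic]
                score += weight * sum(probs.get(u, 0.0) for u in units) / len(units)
        scores.append((score, idx))
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return sorted(idx for _, idx in top)
```

In this toy table, "dinner", "banquet", and "lunch" collapse into the single unit `a.1`, so three words that each occur once contribute a count of three to one concept, which is the sparsity reduction the abstract describes.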

Description

Technical field

[0001] The invention relates to the field of intelligent analysis of natural language text, and in particular to an automatic abstract generation method based on concept semantic units.

Background technology

[0002] With the rapid development of information technology, the Internet has become part of people's daily life. The bottleneck of information transmission has been broken, and people can easily access massive amounts of information. How to understand that information quickly has become a development direction of intelligent information processing and a focus of technical research. In particular, as the volume of documents on the Internet expands rapidly, users urgently need tools that can process these documents effectively. Automatic text summarization, based on natural language processing technology, is an intelligent text processing application technology that...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 张全, 袁毅, 韦向峰, 丛培民, 杜义华, 池毓焕
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI