Automatic text summarization method based on fusion semantic clustering

An automatic text summarization technology, applied in semantic analysis, natural language data processing, structured data retrieval, and related fields. It addresses problems such as summaries that fail to meet user needs, read poorly, lack coherence, or contain grammatical errors, and it achieves high readability and coherence.

Active Publication Date: 2018-06-22
SOUTH CHINA UNIV OF TECH
2 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

However, such methods stay at the literal level: they do not exploit the semantic relationships in the context, so the generated summary lacks relevance.
At present, research on generative summarization mainly focuses on introducing deep learning and even reinforcement learning methods. However, because these technologies are still immature, the generated summaries contain grammatical errors, have poor readability and coherence, and cannot satisfy users' needs.

Examples

Embodiment

[0045] As shown in Figure 1, the automatic text summarization method based on fusion semantic clustering disclosed in this embodiment includes a text preprocessing step, a weight calculation step, a semantic analysis step, a clustering step, and a sentence selection step. Specifically:

[0046] The text preprocessing step segments the original document content into words and removes stop words, which reduces text noise and the influence of words unrelated to the text topic. The original document can be crawled from document data on the Internet; if it contains pictures, videos, or other such information, that content should be filtered out. After word segmentation yields the keywords, the number of times each keyword appears in the document is counted, which gives the word frequency information.
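
The preprocessing step described above can be illustrated with a minimal sketch. It is not the patent's implementation: the source works with word segmentation (presumably of Chinese text), whereas this example assumes English input and uses a regular expression as a stand-in tokenizer. The names `STOP_WORDS`, `split_sentences`, `tokenize`, and `preprocess` are hypothetical, and the stop-word list is a tiny illustrative one.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on"}

def split_sentences(text):
    """Split raw text into sentences at whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def preprocess(text):
    """Return the sentences, their keyword lists, and document-level word frequencies."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for sent in tokens for w in sent)
    return sentences, tokens, freq

sentences, tokens, freq = preprocess("The cat sat on the mat. The cat ate.")
```

Here `freq` plays the role of the word frequency information: it maps each surviving keyword to the number of times it appears in the document.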

[0047] The weight calculation step represents the text as a text matrix A. According to the established keyword library, the weight of each keyword in the sentence ...
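
The source truncates the weighting formula here, so the following sketch does not reproduce it. It only illustrates what a keyword-by-sentence text matrix A can look like, using a common stand-in scheme as an assumption: the local weight is the raw count of a keyword in a sentence, and the global weight is an inverse-sentence-frequency term. The function name `build_text_matrix` and the toy input are hypothetical, and the relevance weight mentioned in the abstract is not modeled.

```python
import math
from collections import Counter

def build_text_matrix(tokens):
    """Build a keyword-by-sentence matrix (rows: keywords, columns: sentences).

    ASSUMPTION: weight = raw count (local) * (log(n / df) + 1) (global).
    This is a generic stand-in, not the patent's weighting formula.
    """
    vocab = sorted({w for sent in tokens for w in sent})
    n = len(tokens)
    # df[w] = number of sentences that contain keyword w at least once.
    df = Counter(w for sent in tokens for w in set(sent))
    matrix = []
    for w in vocab:
        global_weight = math.log(n / df[w]) + 1.0
        matrix.append([sent.count(w) * global_weight for sent in tokens])
    return vocab, matrix

# Toy keyword lists for two sentences (illustrative data only).
tokens = [["cat", "sat", "mat"], ["cat", "ate"]]
vocab, A = build_text_matrix(tokens)
```

In this toy case the keyword that occurs in both sentences gets the smallest global weight, while keywords confined to a single sentence are weighted up.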

Abstract

The invention discloses an automatic text summarization method based on fusion semantic clustering. The method comprises the following steps: text preprocessing, in which the original documents are preprocessed and the word frequency information of the keywords in the text is counted; weight calculation, in which local weights, global weights, and introduced relevance weights are combined to determine the contribution of each keyword within a sentence; semantic analysis, in which singular value decomposition is applied to the text matrix to obtain a semantic analysis model that is used to calculate a semantic vector for each sentence; clustering, in which K sentence clusters are obtained by running a clustering algorithm in the semantic space on the calculated sentence semantic vectors; and sentence selection, in which sentence weights are calculated within each sentence cluster, the top n sentences by rank are selected to compose the abstract, and redundancy is removed. The method is simple and practical: it provides a feature representation for the text, integrates the semantic connections of the context, shows the co-occurrence relationship between sentences and words more fully, and generates an abstract that is better in line with the theme of the text.
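
The semantic analysis, clustering, and sentence selection steps can be sketched as follows, under stated assumptions. The abstract names singular value decomposition and "a clustering algorithm" but not which one, so plain k-means is used here as a stand-in. Scoring sentences by the norm of their semantic vector is likewise an assumption, and redundancy removal is not modeled. The function names and the toy matrix are hypothetical; NumPy is assumed to be available.

```python
import numpy as np

def sentence_vectors(A, rank):
    """Truncated SVD of a keyword-by-sentence matrix A.

    Returns one row per sentence in a rank-dimensional semantic space.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:rank]) @ Vt[:rank]).T

def kmeans(X, k, iters=20):
    """Plain k-means; the first k rows serve as initial centroids (deterministic)."""
    centroids = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Distance from every sentence vector to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def select_sentences(X, labels, n_per_cluster=1):
    """Pick the top-n sentences in each cluster, scored by vector norm (assumed score)."""
    chosen = []
    for j in sorted(set(labels.tolist())):
        idx = np.flatnonzero(labels == j)
        scores = np.linalg.norm(X[idx], axis=1)
        top = idx[np.argsort(-scores)[:n_per_cluster]]
        chosen.extend(top.tolist())
    return sorted(chosen)  # return indices in document order

# Toy matrix: 4 keywords (rows) by 4 sentences (columns), two obvious topics.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 1.0, 1.0],
])
X = sentence_vectors(A, rank=2)
labels = kmeans(X, k=2)
summary_idx = select_sentences(X, labels)
```

In this toy example the first two sentences share keywords and so do the last two, so the clustering groups them pairwise and one representative sentence index is chosen from each group.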

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to an automatic text summarization method based on fusion semantic clustering.

Background art

[0002] With the development of computer technology and the Internet, great changes have taken place in the way information is disseminated. The Internet has become an important channel for people to obtain resources. On the other hand, the volume of document data on the Internet is growing exponentially, which makes it necessary to resolve the conflict between information overload and people's need to read quickly. Automatic text summarization technology makes this possible.

[0003] Automatic text summarization technology uses a series of text processing technologies to analyze and process lengthy documents through computers, extracts the main ideas of documents, and generates a concise and general summary to...

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/285, G06F40/258, G06F40/289, G06F40/30
Inventor: 史景伦, 洪冬梅, 王桂鸿, 张福伟
Owner SOUTH CHINA UNIV OF TECH