Text clustering multi-document automatic abstracting method and system for improving word vector model

A text clustering and automatic summarization technology, applied in neural learning methods, biological neural network models, character and pattern recognition, etc. It can solve problems such as stiff contextual transitions, grammatical errors in summaries, and logical incoherence.

Pending Publication Date: 2019-11-05
上海晏鼠计算机技术股份有限公司
Cites 4 documents; cited by 29.

AI Technical Summary

Problems solved by technology

[0004] However, the related technologies of sentence fusion, sentence compression and language generation are not yet mature...



Examples


Embodiment Construction

[0080] The invention is further described below with reference to the accompanying drawings and specific embodiments:

[0081] Referring to Figures 1-5, according to an embodiment of the present invention, the text clustering multi-document automatic summarization method and system based on an improved word vector model comprises the following steps:

[0082] Step 1: preprocessing;

[0083] Step 2: training of the improved word vector model;

[0084] Step 3: sentence vector representation and clustering;

[0085] Step 4: extraction of summary sentences and generation of the abstract.
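The second step (word vector model training) is identified in the abstract as CBOW with hierarchical softmax. The two pieces that define that model can be sketched in plain Python: a Huffman tree over word frequencies, which gives frequent words short codes, and the CBOW probability of a target word as a product of sigmoid decisions along its tree path. This is a minimal sketch under stated assumptions, not the patent's TensorFlow implementation; the function names, toy frequencies and random vectors are hypothetical, and training (gradient updates) is omitted.

```python
import heapq
import itertools
import math
import random


def build_huffman_codes(freqs):
    """Build a Huffman tree over word frequencies.

    Returns (codes, paths): for each word, its binary code (0 = left,
    1 = right) and the ids of the inner nodes visited from the root.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {"word": w}) for w, f in freqs.items()]
    heapq.heapify(heap)
    next_id = 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        node = {"id": next_id, "left": left, "right": right}
        heapq.heappush(heap, (f1 + f2, next(tie), node))
        next_id += 1
    codes, paths = {}, {}

    def walk(node, code, path):
        if "word" in node:  # leaf: record the code and path that reach it
            codes[node["word"]] = code
            paths[node["word"]] = path
            return
        walk(node["left"], code + [0], path + [node["id"]])
        walk(node["right"], code + [1], path + [node["id"]])

    walk(heap[0][2], [], [])
    return codes, paths


def sigmoid(x):
    # Not hardened against overflow for very large |x|; fine for small vectors.
    return 1.0 / (1.0 + math.exp(-x))


def cbow_hs_prob(target, context, word_vecs, node_vecs, codes, paths):
    """p(target | context) under CBOW with hierarchical softmax.

    h is the mean of the context word vectors. At inner node n, going
    left has probability sigmoid(h . theta_n), going right sigmoid(-h . theta_n).
    """
    dim = len(word_vecs[context[0]])
    h = [sum(word_vecs[c][k] for c in context) / len(context) for k in range(dim)]
    p = 1.0
    for bit, node in zip(codes[target], paths[target]):
        score = sum(a * b for a, b in zip(h, node_vecs[node]))
        p *= sigmoid(score) if bit == 0 else sigmoid(-score)
    return p


if __name__ == "__main__":
    freqs = {"文本": 5, "聚类": 3, "摘要": 2, "方法": 1}  # hypothetical toy counts
    codes, paths = build_huffman_codes(freqs)
    rng = random.Random(0)
    word_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(4)] for w in freqs}
    node_vecs = {i: [rng.uniform(-0.5, 0.5) for _ in range(4)]
                 for i in range(len(freqs) - 1)}  # a binary tree has V - 1 inner nodes
    total = sum(cbow_hs_prob(w, ["聚类", "摘要"], word_vecs, node_vecs, codes, paths)
                for w in freqs)
    print(codes)
    print(total)  # leaf probabilities form a distribution, so this is ~1.0
```

Because each inner node splits its probability mass between its two children, scoring a word costs O(log V) sigmoid evaluations instead of a softmax over the whole vocabulary, which is why this variant suits large training sets.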

[0086] The preprocessing method of the first step is as follows: (1) Chinese word segmentation: each text sentence is divided into word units that can be segmented and processed independently and that carry meaning on their own; the segmented corpus can then be used for word vector training. The jieba word segmentation tool is used to segment the corpus;
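The patent performs segmentation with jieba, whose `jieba.lcut(sentence)` returns a list of tokens; jieba itself builds a word graph from a prefix dictionary, picks the most probable path by dynamic programming, and uses an HMM for unknown words. Because jieba is a third-party package, the sketch below swaps in a much simpler technique, forward maximum matching over a fixed dictionary, only to show the shape of the output (one token list per sentence) that word vector training consumes. The dictionary and function names are hypothetical.

```python
def fmm_segment(text, vocab, max_len=None):
    """Forward maximum matching segmentation over a fixed dictionary.

    At each position, take the longest dictionary word that matches;
    characters not covered by any word become single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in vocab), default=1)
    max_len = max(1, max_len)  # guard against an endless loop on max_len=0
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment_corpus(sentences, vocab):
    """Segment each non-empty sentence into a token list for word vector training."""
    return [fmm_segment(s, vocab) for s in sentences if s]


if __name__ == "__main__":
    vocab = {"文本", "聚类", "文本聚类", "自动", "摘要"}  # hypothetical toy dictionary
    print(segment_corpus(["文本聚类自动摘要", "文本的摘要"], vocab))
```

Forward maximum matching is greedy and cannot resolve ambiguous splits the way jieba's probabilistic path search does; it is shown only as a dependency-free stand-in.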

[0087...


Abstract

The invention discloses a text clustering multi-document automatic summarization method and system based on an improved word vector model. The word vector model is CBOW (continuous bag-of-words) with hierarchical softmax, which is used for large-scale model training. On this basis, the TensorFlow deep learning framework is introduced into word vector model training, and streaming computation solves the time-efficiency problem of large-scale training sets. For sentence vector representation, TF-IDF is introduced first; the semantic similarity of each semantic unit to be extracted is then calculated, weighting parameters are set to combine the two, and a semantically weighted sentence vector is generated. The beneficial effects are as follows. The method jointly considers the strengths and weaknesses of semantic, deep learning and machine learning approaches, and applies density clustering and convolutional neural network algorithms. It has a high degree of intelligence: it can quickly extract the sentences most relevant to the central content as the abstract of the text, and applying several machine learning algorithms to automatic text summarization achieves a better summarization effect, an approach that may become a main research direction in this field. In addition, the system of the invention provides a tool for automatically extracting document abstracts based on the method.
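The abstract describes semantically weighted sentence vectors (TF-IDF combined with a semantic similarity term through weighting parameters) followed by density clustering and extraction of the most central sentences. A minimal pure-Python sketch under stated assumptions: word vectors are given as a dict (for example, from the trained CBOW model); the semantic term is hypothetically taken as the word's cosine similarity to a corpus centroid, mixed with TF-IDF through a hypothetical parameter `lam`; density clustering is plain DBSCAN on cosine distance; and the summary keeps the sentence nearest each cluster mean. None of these specific choices is confirmed by the patent text, and the convolutional network the abstract also mentions is not modeled.

```python
import math
from collections import Counter


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def tfidf_weights(docs):
    """One {token: tf-idf} dict per tokenized sentence, with smoothed idf."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [{t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(doc).items()} for doc in docs]


def sentence_vector(tokens, weights, word_vecs, centroid, lam=0.7):
    """Weighted average of word vectors.

    Hypothetical weight: lam * tf-idf + (1 - lam) * max(0, cos(word, centroid)).
    """
    acc = [0.0] * len(centroid)
    total = 0.0
    for t in set(tokens):
        if t not in word_vecs:
            continue  # skip out-of-vocabulary tokens
        w = lam * weights[t] + (1 - lam) * max(0.0, cosine(word_vecs[t], centroid))
        acc = [a + w * x for a, x in zip(acc, word_vecs[t])]
        total += w
    return [a / total for a in acc] if total > 0 else acc


def dbscan(points, eps, min_pts):
    """Density clustering with cosine distance; label -1 marks noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if 1.0 - cosine(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # only core points expand the cluster
                queue.extend(more)
        cluster += 1
    return labels


def extract_summary(vectors, labels):
    """Pick, per cluster, the index of the sentence closest to the cluster mean."""
    picked = []
    for c in sorted({lab for lab in labels if lab >= 0}):
        members = [i for i, lab in enumerate(labels) if lab == c]
        mean = [sum(vectors[i][k] for i in members) / len(members)
                for k in range(len(vectors[0]))]
        picked.append(max(members, key=lambda i: cosine(vectors[i], mean)))
    return sorted(picked)  # keep the original sentence order
```

In use, one would compute `tfidf_weights` over the segmented sentences, build one `sentence_vector` per sentence, cluster them with `dbscan`, and map the indices returned by `extract_summary` back to the original sentences. `eps` and `min_pts` have to be tuned per corpus; DBSCAN is used here because it needs no preset number of clusters and sets outlier sentences aside as noise.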

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a text clustering multi-document automatic summarization method and system based on an improved word vector model.

Background art

[0002] The Internet belongs to the field of media. Also known as the international network, the Internet began with ARPANET in the United States in 1969. It is a huge network formed by connecting networks to one another; these networks are linked by a set of common protocols into a single, logically unified international network. In general usage, "internet" refers to any such interconnected network, while "Internet" refers specifically to the global one. This way of interconnecting computer networks can be called "network interconnection", and on this basis a global network covering the whole world, called the Internet, was developed. The Internet is not the same as the World Wide Web. The World Wide ...


Application Information

IPC(8): G06F17/27; G06N3/04; G06N3/08; G06K9/62
CPC: G06N3/08; G06N3/048; G06N3/045; G06F18/2321; G06F18/24; Y02D10/00
Inventor: 陈刚 (Chen Gang)
Owner: 上海晏鼠计算机技术股份有限公司