
Chinese short text clustering method

A clustering method for short texts, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, and similar areas. It addresses the problem that existing methods do not consider the semantic knowledge of short texts, with the effects of improving both efficiency and accuracy.

Active Publication Date: 2017-04-26
Assignees: FOCUS TECH and one other
Cites: 4; Cited by: 44

AI Technical Summary

Problems solved by technology

[0005] The present invention therefore addresses the overload of short-text information in current social media. Existing methods for computing short-text similarity mainly count the number of shared words or compute the Jaccard similarity coefficient or cosine similarity; none of these algorithms takes the semantic knowledge of short texts into account. To solve this problem, the invention provides a Chinese short text clustering method, specifically one based on word vectors and word-vector similarity calculation.
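The three surface-level measures listed in [0005] can be sketched as follows, assuming the short texts have already been segmented into word lists (segmentation itself is not shown). The two example texts are hypothetical: both are about phone battery life, but because 电池 ("battery") and 续航 ("battery life") are different tokens, every measure sees only the single shared word 手机 ("phone"). This is the missing-semantics problem the invention targets.

```python
import math
from collections import Counter


def common_word_count(a, b):
    """Number of distinct words shared by two segmented texts."""
    return len(set(a) & set(b))


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the two word sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a, b):
    """Cosine similarity of bag-of-words term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical pre-segmented texts: "phone battery durable" / "phone battery-life very-good".
t1 = ["手机", "电池", "耐用"]
t2 = ["手机", "续航", "很好"]

print(common_word_count(t1, t2))   # 1
print(jaccard(t1, t2))             # 0.2
print(round(cosine_bow(t1, t2), 3))  # 0.333
```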

Embodiment Construction

[0048] The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

[0049] In order to make the purpose, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0050] Figure 1 is a flow chart of a Chinese short text clustering method provided by an embodiment of the present invention. As shown in Figure 1, step S101 uses the Word2Vec word vector training model to obtain the required word vectors.

[0051] Step S102: Use the word we...
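The description of step S102 is truncated here; the abstract states only that word weights are computed iteratively on a graph model. As an illustration, the sketch below uses a TextRank-style scheme, which is an assumption and not necessarily the patent's algorithm: words are nodes, an edge links two words that occur within a hypothetical `window` of each other in some short text, and scores are updated with the damped PageRank rule until they stop changing. The function name `textrank_weights` and its parameters are hypothetical.

```python
from collections import defaultdict


def textrank_weights(texts, window=2, d=0.85, max_iter=100, tol=1e-6):
    """Score words iteratively on a co-occurrence graph (TextRank-style sketch).

    texts: list of pre-segmented short texts (lists of words).
    Two distinct words are linked when they occur within `window` positions
    of each other in the same text; the edge weight counts such co-occurrences.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for words in texts:
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + window + 1, len(words))):
                u = words[j]
                if u != w:
                    graph[w][u] += 1.0
                    graph[u][w] += 1.0

    vocab = {w for words in texts for w in words}
    score = {w: 1.0 for w in vocab}
    # Total edge weight leaving each word (0 for isolated words).
    out_sum = {w: sum(graph[w].values()) for w in vocab}

    for _ in range(max_iter):
        new = {}
        for v in vocab:
            # Each neighbour u passes on a share of its score proportional to edge weight.
            rank = sum(wt / out_sum[u] * score[u] for u, wt in graph[v].items())
            new[v] = (1 - d) + d * rank
        delta = max(abs(new[w] - score[w]) for w in vocab)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical short-text set, already segmented.
texts = [["手机", "电池", "耐用"], ["手机", "续航", "很好"], ["手机", "电池", "续航"]]
weights = textrank_weights(texts)
for word, wt in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(word, round(wt, 3))
```

The word 手机 co-occurs with every other word, so it receives the highest weight; rarely connected words such as 耐用 receive lower weights.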

Abstract

The invention relates to a Chinese short text clustering method, in particular one based on word vectors and their similarity calculation. The method comprises the following steps: obtaining the required word vectors with a Word2Vec word vector training model; obtaining the weights of all words in a short text set with a word weight calculation algorithm; calculating, from the word vectors and word weights, the similarity value between every pair of texts in the short text set with a short text similarity algorithm; and clustering the short texts according to these pairwise similarity values. The invention provides an optimized short text similarity calculation method that addresses problems such as the sparse syntactic features of short texts and the loss of semantics. Word weights are computed iteratively on a graph model, which increases the accuracy of sentence similarity calculation, and a density peak clustering method is applied to short text clustering, which effectively increases clustering efficiency.
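The last two steps of the abstract (pairwise similarity from word vectors and weights, then density peak clustering) can be sketched as below. Two assumptions are loud here: the similarity is taken to be the cosine between weight-averaged word vectors, which is one common choice and not confirmed as the patent's "short text similarity algorithm"; and density peak clustering follows the usual formulation of Rodriguez and Laio (2014), with a cutoff kernel for local density `rho`, `delta` as the distance to the nearest denser point, and centres chosen by the largest `rho * delta`. The 2-D word vectors, the cutoff `dc`, and all function names are hypothetical; real vectors would come from the Word2Vec step.

```python
import math


def text_vector(words, vectors, weights):
    """Weighted average of the word vectors of one segmented text."""
    dim = len(next(iter(vectors.values())))
    acc, total = [0.0] * dim, 0.0
    for w in words:
        if w in vectors:
            wt = weights.get(w, 1.0)  # default weight 1.0 if none supplied
            acc = [a + wt * x for a, x in zip(acc, vectors[w])]
            total += wt
    return [a / total for a in acc] if total else acc


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density_peaks(dist, dc, n_clusters):
    """Density peak clustering on a precomputed distance matrix."""
    n = len(dist)
    # Local density: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Densest first; ties broken by index so the order is deterministic.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, nearest_higher = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])  # nearest denser point
        delta[i], nearest_higher[i] = dist[i][j], j
    # Centres: largest gamma = rho * delta; the densest point is always a centre.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]
    label = [-1] * n
    for cid, c in enumerate(centers):
        label[c] = cid
    # Each remaining point joins the cluster of its nearest denser neighbour.
    for i in order:
        if label[i] == -1:
            label[i] = label[nearest_higher[i]]
    return label


# Hypothetical 2-D word vectors: phone/battery topic vs. football topic.
vectors = {"手机": [1.0, 0.1], "电池": [0.9, 0.2], "续航": [0.85, 0.25],
           "足球": [0.1, 1.0], "比赛": [0.2, 0.9], "进球": [0.15, 0.95]}
texts = [["手机", "电池"], ["手机", "续航"], ["电池", "续航"],
         ["足球", "比赛"], ["比赛", "进球"], ["足球", "进球"]]
weights = {}  # e.g. graph-based word weights; empty means every word weighs 1.0

vecs = [text_vector(t, vectors, weights) for t in texts]
dist = [[1.0 - cosine(a, b) for b in vecs] for a in vecs]
labels = density_peaks(dist, dc=0.1, n_clusters=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Density peak clustering needs only the pairwise distance matrix and no iterative reassignment, which fits the abstract's claim that it improves clustering efficiency; choosing `dc` and the number of centres remains a tuning decision.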

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a Chinese short text clustering method based on word vectors and similarity calculation.

Background art

[0002] In recent years, with the spread of mobile devices and the evolution of social media platforms, the form of social media has changed dramatically: content has shifted from long text to short text, and attention has moved from traditional long-text platforms such as blogs and forums to short-text platforms such as Sina Weibo and Twitter.

[0003] The immediacy and convenience of short-text social media platforms have greatly increased the volume of information. Compared with traditional texts, short texts in social networks are brief, cover diverse topics, contain a large amount of spam, and carry emotional tendencies, which po...

Application Information

IPC(8): G06F17/30; G06F17/27
CPC: G06F16/35; G06F40/30
Inventors: 崔莹 (Cui Ying), 曹杰 (Cao Jie), 姚瑞波 (Yao Ruibo), 叶婷 (Ye Ting), 伍之昂 (Wu Zhiang), 申冬琴 (Shen Dongqin)
Owner: FOCUS TECH