Short text clustering method, device, electronic equipment and storage medium

A short text clustering technology, applied in text database clustering/classification, unstructured text data retrieval, and electronic digital data processing. It addresses the low accuracy of existing short text clustering methods and improves the accuracy of clustering results.

Active Publication Date: 2021-06-29
北京沃丰时代数据科技有限公司
11 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0002] Existing short text clustering algorithms do not distinguish between texts of different lengths when constructing features. Sentences without word vectors are generally discarded, so those samples are lost. In addition, k-means is usually chosen as the clustering algorithm: the cluster distance measure does not change with the text, and the number of clusters cannot be adjusted according to the similarity between texts. As a result, both adjustability and accuracy are low.



Examples


Embodiment Construction

[0036] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0037] Figure 1 shows a flow chart of a short text clustering method provided by an embodiment of the present invention. As shown in Figure 1, in combination with Figure 2, the short text clustering method provided by the embodiment of the present invention comprises the following steps:

[0038] Step 101: Obtain word segmentation res...


Abstract

The embodiment of the present invention provides a short text clustering method, device, electronic equipment and storage medium. The short text clustering method includes: obtaining the word segmentation result of each text in the text set to be clustered; based on the length of the text, selectively using either all the words or only the keywords in the word segmentation result to construct the text features of the text; for texts in the text set to be clustered that include word vectors, clustering based on the edit distance between text features, and otherwise clustering based on the cosine similarity between text features. The embodiments of the present invention can effectively improve the accuracy of short text clustering results.
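
The feature construction and metric selection described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The length cutoff `LENGTH_THRESHOLD`, the direction of the length rule (short texts keep all words, longer texts keep only keywords), the bag-of-words vectors used for cosine similarity, and all function names are assumptions made for illustration. The pairing of each measure with texts that do or do not include word vectors follows the abstract's wording as given here.

```python
from collections import Counter
from math import sqrt

LENGTH_THRESHOLD = 6  # hypothetical cutoff; the source does not give a value


def build_features(tokens, keywords, threshold=LENGTH_THRESHOLD):
    """Use all words for short texts, only keywords for longer ones (assumed rule)."""
    if len(tokens) <= threshold:
        return list(tokens)
    return [t for t in tokens if t in keywords]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity over bag-of-words counts (an assumed feature vector)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def pairwise_score(feat_a, feat_b, has_word_vectors):
    """Pick the measure per the abstract's wording; returns (measure name, value)."""
    if has_word_vectors:
        return ("edit_distance", edit_distance(feat_a, feat_b))
    return ("cosine_similarity", cosine_similarity(feat_a, feat_b))
```

Note that the two measures point in opposite directions: a smaller edit distance means more similar texts, while a larger cosine similarity does. Any clustering step built on `pairwise_score` would need to account for that.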

Description

Technical field

[0001] The present invention relates to the technical field of short text clustering, in particular to a short text clustering method, device, electronic equipment and storage medium.

Background

[0002] Existing short text clustering algorithms do not distinguish between texts of different lengths when constructing features. Sentences without word vectors are generally discarded, so those samples are lost. In addition, k-means is usually chosen as the clustering algorithm: the cluster distance measure does not change with the text, and the number of clusters cannot be adjusted according to the similarity between texts. Adjustability and accuracy are therefore low.

Summary of the invention

[0003] To solve the problems in the prior art, the embodiments of the present invention provide a short text clustering method, device, electronic equipment and storage medium.

[0004] Specifically, the embodiments of the present invention provide the follo...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/284, G06K9/62
CPC: G06F16/35, G06F40/284, G06F18/23213
Inventor: 高亨德
Owner: 北京沃丰时代数据科技有限公司