Method, device and storage medium for text clustering

A text clustering technology applied in the field of information processing. It addresses the problem that existing methods can give different, inconsistent clustering results for the same data, and achieves fast, stable extraction with good results.

Active Publication Date: 2022-07-01
SUZHOU LANGDONG NET TEC CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the Single pass algorithm is input-order dependent: if the same clustering objects are input in a different order, different clustering results will appear.
Other clustering algorithms, such as Kmeans, require the number of categories to be specified in advance, and hierarchical clustering algorithms require choosing the level at which the hierarchy is cut. Different category counts or different level selections cause inconsistencies in the clustering results.
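The order dependence described in [0003] can be illustrated with a minimal sketch. The function below is a simplified single-pass variant written for this illustration only (one-dimensional points, nearest-centroid assignment, and the name `single_pass` are all assumptions, not taken from the patent); it shows the same three points producing different clusters when the input order is reversed.

```python
def single_pass(points, threshold):
    """Simplified single-pass clustering over 1-D points (illustrative only).

    Each point joins the cluster with the nearest centroid if that centroid
    is within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of points, in arrival order
    for p in points:
        best, best_gap = None, None
        for members in clusters:
            centroid = sum(members) / len(members)
            gap = abs(p - centroid)
            if best_gap is None or gap < best_gap:
                best, best_gap = members, gap
        if best is not None and best_gap <= threshold:
            best.append(p)
        else:
            clusters.append([p])
    return clusters


forward = single_pass([0.0, 1.0, 2.0], threshold=1.1)
backward = single_pass([2.0, 1.0, 0.0], threshold=1.1)

# Compare the partitions independently of ordering inside each cluster.
same_partition = (
    {frozenset(c) for c in forward} == {frozenset(c) for c in backward}
)
print(forward, backward, same_partition)
```

In the forward order the middle point is absorbed by the first cluster, whose centroid then sits too far from the last point; in the reverse order the middle point ends up grouped with the other end instead.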

Examples

Embodiment Construction

[0049] The present invention will be described in detail below with reference to the specific embodiments shown in the accompanying drawings. However, these embodiments do not limit the present invention, and structural, method, or functional changes made by those skilled in the art according to these embodiments are all included in the protection scope of the present invention.

[0050] Figure 1 is a schematic flowchart of the text clustering method in the first embodiment of the present invention. In this embodiment, the relationships between texts are represented by a connected graph, and the connected graph is then split into different sub-connected graphs in order to cluster the texts. The method includes:

[0051] Step S11: Obtain a list of text titles to be clustered.

[0052] The text title list may be a news title list related to a specific enterprise, or may be other types of text title lists. Each text title represents a text. ...

Abstract

The present invention discloses a text clustering method, device and storage medium. The method includes: acquiring a list of text titles to be clustered; constructing an initial connected graph between the text titles, with the text titles as vertices and the distances between the vectorized text titles as edges; removing the edges of the initial connected graph that are greater than an initial distance threshold to obtain one or more sub-connected graphs; and calculating the aggregation degree of each sub-connected graph, where, if the aggregation degree of a sub-connected graph is greater than or equal to the clustering threshold, the text set corresponding to that sub-connected graph is a text cluster. Compared with the prior art, the present invention can cluster text quickly and stably, and the same text data yields the same clustering every time. In addition, using this method to cluster enterprise-related news allows hot news related to an enterprise to be extracted quickly and stably, with good results.
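The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the titles are assumed to be already vectorized, Euclidean distance stands in for the unspecified distance measure, and the "aggregation degree" is replaced by a hypothetical stand-in (the share of possible vertex pairs in a sub-connected graph that are still joined by an edge), since its definition is not given in the text available here. The function name `cluster_titles` and the decision to skip single-vertex sub-graphs are likewise assumptions.

```python
from itertools import combinations
from math import dist


def cluster_titles(vectors, distance_threshold, clustering_threshold):
    """Return clusters as sorted lists of indices into `vectors`."""
    n = len(vectors)
    parent = list(range(n))  # union-find forest over the vertices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Keep only edges no longer than the distance threshold; this is
    # equivalent to building the full graph and removing the longer edges.
    kept_edges = []
    for i, j in combinations(range(n), 2):
        if dist(vectors[i], vectors[j]) <= distance_threshold:
            kept_edges.append((i, j))
            parent[find(i)] = find(j)

    # Group vertices into sub-connected graphs (connected components).
    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    edge_count = {}
    for i, _ in kept_edges:
        root = find(i)
        edge_count[root] = edge_count.get(root, 0) + 1

    clusters = []
    for root, members in components.items():
        size = len(members)
        if size < 2:
            continue  # assumption: a lone title is not reported as a cluster
        # Hypothetical aggregation degree: kept edges / possible edges.
        degree = edge_count.get(root, 0) / (size * (size - 1) / 2)
        if degree >= clustering_threshold:
            clusters.append(sorted(members))
    return sorted(clusters)


vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
clusters = cluster_titles(vectors, distance_threshold=1.5, clustering_threshold=0.5)
chain = cluster_titles([(0,), (1,), (2,), (3,)], 1.0, 0.6)
print(clusters, chain)
```

Because every pair of vertices is examined and the thresholds are fixed, the grouping does not depend on the order in which the titles are supplied, which is the stability property the abstract claims. The second call shows a loosely linked chain being rejected by the clustering threshold under the stand-in measure.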

Description

Technical field

[0001] The present invention relates to information processing technology, in particular to a text clustering method, device and storage medium.

Background

[0002] Text is the main carrier of information. With the development of the Internet, browsing news texts published promptly online has become an important means for people to obtain information. At present, there is a huge amount of news text on the Internet. To let people navigate and browse news quickly and easily, it is necessary to cluster news texts using text clustering techniques. Text clustering technology can automatically divide a text set into multiple clusters, so that the texts in the same cluster have a certain similarity and the similarity between texts in different clusters is as low as possible. At present, the commonly used clustering methods include Kmeans, hierarchical clustering, the Single pass algorithm and so on.

[000...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/258
CPC: G06F16/355
Inventor: 龚朝辉, 陈汝龙, 陈誉, 段成阁
Owner: SUZHOU LANGDONG NET TEC CO LTD