
Density-based text clustering method, device and equipment, and storage medium

A density-based text clustering technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc. It addresses problems such as slow convergence and poor clustering of non-spherical data.

Pending Publication Date: 2021-03-19
PING AN TECH (SHENZHEN) CO LTD
Cites: 0 · Cited by: 9

AI Technical Summary

Problems solved by technology

However, current text clustering methods mainly suffer from defects such as requiring iterative calculation, converging slowly, and clustering non-spherical data poorly.



Embodiment Construction

[0037] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used in this specification are intended only to describe specific embodiments, not to limit the application.

[0038] It should be noted that the terms "including", "comprising" and "having" in the specification, claims and above drawings of the present application, and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally further includes other steps or units inherent to such processes, methods, products or devices. In the claims, description and drawings of this application, re...


Abstract

The embodiment of the invention discloses a density-based text clustering method, device, equipment and storage medium, and relates to the technical field of text data analysis. The method comprises: receiving a target data set; determining a target distance formula; generating a distance matrix for the whole target data set; calculating the local density of each data point; for each data point, extracting the minimum of its distances to the data points in the sample point set and recording it as the minimum point distance; establishing a clustering decision diagram from the local density and the minimum point distance; determining the number of clusters and the cluster centers in the clustering decision diagram; and assigning each data point to a cluster of the clustering decision diagram. Throughout the clustering process, the method clusters non-spherical data by computing the distances between sample points only once, without iterative calculation, which greatly improves algorithm performance. The clustering decision diagram provides a principled way to select the number of clusters, avoiding setting it manually without any basis.
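The steps listed in the abstract follow the general shape of density-peak clustering (Rodriguez and Laio, "Clustering by fast search and find of density peaks", 2014). The sketch below is a minimal illustration of that technique, not the patented implementation, and rests on several loudly stated assumptions: Euclidean distance stands in for the unspecified "target distance formula"; local density is a simple cutoff count with a hypothetical radius `d_c`; the "sample point set" is read as the points of higher local density; and, instead of reading centers off a plotted decision diagram, the hypothetical parameter `n_clusters` picks the points with the largest product of density and minimum point distance. The names `density_peaks` and `euclidean` are illustrative only, and 2-D toy points are used in place of text vectors for readability.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: stands in for the unspecified "target distance formula".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_peaks(points: List[Sequence[float]], d_c: float,
                  n_clusters: int) -> Tuple[List[int], List[int]]:
    """Return (labels, centers); d_c and n_clusters are hypothetical parameters."""
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(points)")
    # Distance matrix over the whole data set, computed once (no iteration).
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Process points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    # Minimum point distance: distance to the nearest point of higher density
    # (i.e. earlier in `order`); the densest point gets its largest distance.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j
    # Decision diagram (rho, delta): take the points with the largest rho * delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], i))[:n_clusters]
    if order[0] not in centers:
        # The global density maximum has no denser neighbour, so it must be a center.
        centers[-1] = order[0]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Single pass in density order: each remaining point inherits the label of
    # its nearest higher-density neighbour, which is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peaks(pts, d_c=1.2, n_clusters=2)
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(centers)  # [4, 9]
```

Only the distance matrix is quadratic; assignment is a single pass in density order, which is why no iteration is needed. In practice, points with a large minimum point distance but low density tend to sit apart on the decision diagram and are often treated as outliers rather than centers.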

Description

Technical field

[0001] The present application relates to the technical field of text data analysis, and in particular to a density-based text clustering method, device, equipment and storage medium.

Background

[0002] Clustering is a typical unsupervised learning method: by learning from unlabeled training samples, it divides the samples in a data set into several, usually disjoint, subsets (clusters). The goal of cluster analysis is to group elements by their similarity, and it is widely used in fields such as bioinformatics and pattern recognition. Commonly used clustering algorithms include K-means, K-medoids, DBSCAN, etc.

[0003] Text clustering is the application of clustering algorithms to natural language processing. The usual approach is to build text feature vectors with tf-idf (term frequency–inverse document frequency), word2vec, etc., and then use various clustering ...
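Paragraph [0003] names tf-idf as a common way to turn texts into feature vectors before clustering. The following is a minimal sketch of plain tf-idf over pre-tokenised documents, using tf = term count / document length and an unsmoothed idf = log(N / df). It is one common textbook variant, not necessarily what the application uses; libraries such as scikit-learn's `TfidfVectorizer` default to a smoothed idf and L2-normalised rows, so their numbers differ. The cosine distance shown is likewise an assumption about what a text "distance formula" might look like, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse tf-idf vectors for pre-tokenised documents (plain, unsmoothed idf)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def cosine_distance(u: Dict[str, float], v: Dict[str, float]) -> float:
    """1 - cosine similarity between two sparse vectors (assumed distance formula)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # convention: an all-zero vector is treated as maximally distant
    return 1.0 - dot / (nu * nv)


docs = [["text", "clustering", "density"],
        ["text", "clustering", "kmeans"],
        ["protein", "sequence", "alignment"]]
vecs = tfidf_vectors(docs)
print(round(cosine_distance(vecs[0], vecs[1]), 3))  # 0.786
print(round(cosine_distance(vecs[0], vecs[2]), 3))  # 1.0
```

Terms that appear in every document get an idf of zero and so contribute nothing, which is why the two "text clustering" documents are close but not identical.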


Application Information

IPC(8): G06F16/35
CPC: G06F16/353
Inventor: Zeng Bin (曾斌)
Owner: PING AN TECH (SHENZHEN) CO LTD