Data clustering method and system, storage medium and equipment

A data clustering technology, applied in the field of data analysis, that addresses the problem that the output quality of clustering results cannot be guaranteed and thereby ensures that output quality.

Pending Publication Date: 2021-02-23
杭州安恒信息安全技术有限公司

AI Technical Summary

Problems solved by technology

[0004] Based on this, the purpose of the present invention is to provide a data clustering method, system, storage medium and equipment that solve the existing technical problem that the output quality of clustering results cannot be guaranteed.



Examples


Embodiment 1

[0046] Referring to Figure 1, which shows the data clustering method in Embodiment 1 of the present invention: the method can be applied to a data clustering device, can be implemented by software and/or hardware, and specifically includes steps S01 to S04.

[0047] Step S01, build a CF Tree of the database based on the BIRCH clustering algorithm.

[0048] The BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) clustering algorithm completes the clustering process by reading objects one by one to construct a CF Tree (Clustering Feature Tree). It is therefore an incremental method, and its clustering output is the CF Tree structure. Another feature of the BIRCH clustering algorithm is that it can control the size of the CF Tree so that storage requirements match the available memory, which makes it suitable for large databases. In a specific implementation, the BIRCH algorithm can be used to scan the database once, and then the C...
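
The incremental behaviour described in this paragraph can be illustrated with a minimal sketch of clustering features. It is not the patent's implementation: it keeps only a flat list of leaf entries (no internal nodes, no branching factor), and the names `CF`, `radius_if_added`, `build_leaf_entries` and the `threshold` parameter are hypothetical. What it does show is the standard BIRCH idea that each entry stores a point count, a linear sum and a squared sum, so a new object can be absorbed without revisiting earlier objects.

```python
from dataclasses import dataclass, field
from math import dist, sqrt


@dataclass
class CF:
    """Clustering feature: point count N, linear sum LS, squared sum SS."""
    n: int = 0
    ls: list = field(default_factory=list)
    ss: float = 0.0

    def add(self, point):
        # CF entries are additive, so absorbing a point is a constant-time update.
        if self.n == 0:
            self.ls = [0.0] * len(point)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius_if_added(self, point):
        # Radius of the entry after hypothetically absorbing `point`:
        # R^2 = SS/N - ||LS/N||^2. Only call this on a non-empty entry.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((x / n) ** 2 for x in ls)
        return sqrt(max(r2, 0.0))


def build_leaf_entries(points, threshold):
    """Single scan: each point joins the nearest entry if the radius stays
    within `threshold` (an assumed parameter), else it starts a new entry."""
    entries = []
    for p in points:
        best = min(entries, key=lambda e: dist(e.centroid(), p)) if entries else None
        if best is not None and best.radius_if_added(p) <= threshold:
            best.add(p)
        else:
            entry = CF()
            entry.add(p)
            entries.append(entry)
    return entries
```

Calling `build_leaf_entries(points, threshold)` on a list of coordinate tuples returns one `CF` per leaf entry. A larger `threshold` yields fewer, coarser entries, which is the lever BIRCH uses to keep the tree within memory.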

Embodiment 2

[0081] Embodiment 2 of the present invention also proposes a data clustering method. The difference between the data clustering method in this embodiment and the data clustering method in the first embodiment is:

[0082] After the step of constructing the CF Tree of the database based on the BIRCH clustering algorithm, it also includes:

[0083] Connect any two adjacent leaf nodes of the CF Tree with a line segment to determine the left and right neighbors of each leaf node except the head leaf node and the tail leaf node.

[0084] Specifically, when the first step of the BIRCH algorithm is executed, a CF Tree is stored in memory, and all the leaves of the CF Tree are connected at this point. As shown in Figure 2, each leaf node has left and right neighbors (except the head leaf and the tail leaf). It should be noted that an important property of cluster analysis is that it does not depend on the order of the data, whereas the BIRCH algorithm is very sensitive to...
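
The leaf chaining described in [0083] and [0084] can be pictured as a doubly linked list over the leaves. The sketch below is illustrative only; the `Leaf` class and `link_leaves` function are hypothetical names, and the leaves are assumed to be supplied already in left-to-right order.

```python
class Leaf:
    """A CF Tree leaf with pointers to its neighbours in the leaf chain."""

    def __init__(self, name):
        self.name = name
        self.left = None   # stays None for the head leaf
        self.right = None  # stays None for the tail leaf


def link_leaves(leaves):
    """Connect each pair of adjacent leaves in both directions."""
    for a, b in zip(leaves, leaves[1:]):
        a.right = b
        b.left = a
    return leaves
```

After linking, every leaf except the head has a left neighbour and every leaf except the tail has a right neighbour, so all leaves can be visited in order without descending from the root.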

Embodiment 3

[0090] Another aspect of the present invention provides a data clustering system. Referring to Figure 3, which shows the data clustering system in Embodiment 3 of the present invention: the system can be applied to data clustering equipment and specifically includes:

[0091] The first clustering module 11 is used to construct the CF Tree of the database based on the BIRCH clustering algorithm;

[0092] The second clustering module 12 is used to calculate the semantic center of each leaf node of the CF Tree based on the K-means center point algorithm, with each leaf node retaining only its semantic center, so as to construct a Core-Tree;

[0093] The clustering evaluation factor calculation module 13 is used to obtain the objects satisfying the preset condition attributes and the objects satisfying the preset decision attributes of the database, and according to the objects satisfying the preset condition attributes and the satisfying preset decision attributes ...
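
The reduction performed by the second clustering module can be sketched as follows. This excerpt does not spell out how the semantic center is computed, so the sketch assumes it is the arithmetic mean of the objects in a leaf, which is the center K-means converges to when a leaf is treated as a single cluster. The function names and the dictionary layout (leaf id mapped to a list of points) are hypothetical.

```python
def semantic_center(points):
    """Assumed semantic center: the coordinate-wise mean of a leaf's points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def to_core_tree_leaves(leaves):
    """Replace each leaf's contents with its single center, as the
    description says each leaf retains only its semantic center."""
    return {leaf_id: semantic_center(pts) for leaf_id, pts in leaves.items()}
```

For instance, a leaf holding `(0.0, 0.0)` and `(2.0, 0.0)` would be reduced to the single center `(1.0, 0.0)` under this assumption.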


Abstract

The invention provides a data clustering method and system, a storage medium and equipment. The method comprises the following steps: constructing a CF Tree of a database based on the BIRCH clustering algorithm; calculating a semantic center of each leaf node of the CF Tree based on the K-means center point algorithm, with each leaf node retaining only its semantic center, so as to construct a Core-Tree; obtaining the objects of the database that satisfy a preset condition attribute and the objects that satisfy a preset decision attribute, and calculating a confirmation factor and a containment factor from these two sets of objects; and evaluating the Core-Tree according to the confirmation factor and the containment factor. Because the two factors are calculated from the objects in the database that satisfy the preset condition attribute and the preset decision attribute, they can be used to evaluate the rules of the Core-Tree structure obtained by data clustering, and the output quality of the clustering result is thereby ensured.
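
This excerpt does not state how the confirmation factor and the containment factor are computed. Purely as an illustration, the sketch below assumes ratio definitions in the style of the certainty and coverage factors of rough set theory: the share of condition-satisfying objects that also satisfy the decision attribute, and the share of decision-satisfying objects that also satisfy the condition attribute. Both formulas and the function name `rule_factors` are assumptions, not taken from the patent text.

```python
def rule_factors(condition_objects, decision_objects):
    """Return (confirmation, containment) under the assumed definitions:
    confirmation = |C & D| / |C|, containment = |C & D| / |D|."""
    c = set(condition_objects)
    d = set(decision_objects)
    both = c & d
    confirmation = len(both) / len(c) if c else 0.0
    containment = len(both) / len(d) if d else 0.0
    return confirmation, containment
```

Under these assumed definitions, a rule whose condition set is fully contained in the decision set scores a confirmation of 1.0, while its containment shows how much of the decision set the rule accounts for.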

Description

Technical field

[0001] The invention relates to the technical field of data analysis, in particular to a data clustering method, system, storage medium and device.

Background

[0002] Clustering analysis refers to the process of dividing a given set of data objects into multiple categories according to certain rules. Clustering makes the data objects in the same cluster as similar to each other as possible, and the data objects in different clusters as different as possible. In the field of financial foreign exchange, data clustering methods have been used to cluster and analyze financial wire transfer data, and data detection and analysis are carried out based on the clustering results.

[0003] However, the currently used data clustering methods lack evaluation criteria, which makes it impossible to guarantee the output quality of the clustering results.

Summary of the invention

[0004] Based on this, the purpose of the present inventio...


Application Information

IPC(8): G06F16/28, G06F16/22, G06K9/62, G06Q10/06
CPC: G06F16/285, G06F16/2246, G06Q10/06393, G06F18/23213
Inventor: 杨暘, 戚华春
Owner: 杭州安恒信息安全技术有限公司