Data clustering method and system, computer equipment and storage medium

A data clustering and computer program technology in the field of data clustering. It addresses problems such as the difficulty of determining the K value and the increased number of splits in the K-split algorithm, both of which reduce clustering speed and efficiency.

Pending Publication Date: 2022-01-28
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

However, when the K-split block algorithm is used, it is difficult to determine an appropriate K value. In addition, when the K most mutually irrelevant data items are taken out of the data set, some of them may be "noise" data, which increases the number of splits performed by the K-split algorithm and reduces clustering speed and efficiency.
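The noise effect can be illustrated with a small sketch. The text does not define how the K-split block algorithm takes out "the K most irrelevant data", so this sketch assumes a greedy farthest-first selection over a precomputed similarity matrix; the function name `pick_least_related` and the toy matrix are hypothetical.

```python
def pick_least_related(sim: list[list[float]], k: int) -> list[int]:
    """Assumed stand-in: greedily pick k mutually least-similar items (farthest-first)."""
    n = len(sim)
    # First pick: the item with the lowest average similarity to the whole data set.
    chosen = [min(range(n), key=lambda i: sum(sim[i]) / n)]
    while len(chosen) < k:
        rest = [i for i in range(n) if i not in chosen]
        # Next pick: the item whose best match among the chosen items is weakest.
        chosen.append(min(rest, key=lambda i: max(sim[i][c] for c in chosen)))
    return chosen


# Hypothetical data: group A = {0, 1}, group B = {2, 3}, item 4 is noise.
sim = [
    [1.00, 0.95, 0.10, 0.05, 0.02],
    [0.95, 1.00, 0.08, 0.12, 0.03],
    [0.10, 0.08, 1.00, 0.90, 0.01],
    [0.05, 0.12, 0.90, 1.00, 0.04],
    [0.02, 0.03, 0.01, 0.04, 1.00],
]
print(pick_least_related(sim, 2))  # [4, 2]: the noise item uses up one seed
print(pick_least_related(sim, 3))  # [4, 2, 1]: one more seed is needed to reach group A
```

There are only two real groups, yet with K = 2 the noise item takes one seed and group A gets none; covering both groups requires K = 3, which mirrors the extra splits described above.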

Examples

Embodiment Construction

[0046] In the field of artificial intelligence, training or tuning a model requires collecting and processing a large amount of data. In some specific scenarios, however, no public and complete data set is available for training or tuning. Take a face recognition system at an intersection as an example: a group of people A passes through the intersection regularly, forming a fixed face data set, while a new group of people B also passes by. The face data of group B is new to the face recognition system at this intersection and has not yet been recognized by it, and such new groups appear every day. The system therefore often has to collect and process the data by itself to obtain the corresponding data set, which consumes a great deal of time and resources and greatly limits the practical scenarios in which deep learning can be applied.

[0047] To address this problem, one feasible existing technology is shown in Figure 1. ...

Abstract

The invention discloses a data clustering method and system, computer equipment, and a storage medium, and belongs to the field of data processing. The method comprises: obtaining a first similarity matrix of the original data; performing connected domain analysis on the first similarity matrix and, in combination with a statistical analysis of the similarity distribution, obtaining a first clustering result together with the relationships among potential to-be-merged categories in that result; merging the to-be-merged categories in the first clustering result according to those relationships to obtain a second clustering result; and performing strongly connected component analysis on the second clustering result and, according to the analysis result, cleaning noise data within each category to obtain a third clustering result. With the technical solution provided by the invention, a high-accuracy clustered data set can be obtained quickly from the original data and used for model training.
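The three stages in the abstract can be sketched on a precomputed similarity matrix. The abstract does not give thresholds, does not say how the similarity distribution is used, and does not specify the merge test or the graph on which strongly connected components are computed, so everything below is an assumption: fixed thresholds `t_link`, `t_merge`, and `t_confirm`; a mean cross-cluster similarity test for merging; and a directed k-nearest-neighbour graph inside each cluster whose largest strongly connected component is kept while the other members are marked as noise (`-1`). All function names are hypothetical.

```python
from collections import defaultdict

def union_find_labels(n, edges):
    """Return a root index for every node after unioning each (a, b) edge."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]

def group_members(labels):
    members = defaultdict(list)  # cluster id -> item indices
    for i, c in enumerate(labels):
        members[c].append(i)
    return members

def stage1(sim, t_link, t_merge):
    """Components of the graph sim >= t_link, plus cluster pairs linked at >= t_merge."""
    pairs = [(i, j) for i in range(len(sim)) for j in range(i + 1, len(sim))]
    labels = union_find_labels(len(sim), [(i, j) for i, j in pairs if sim[i][j] >= t_link])
    candidates = {tuple(sorted((labels[i], labels[j]))) for i, j in pairs
                  if labels[i] != labels[j] and sim[i][j] >= t_merge}
    return labels, candidates

def stage2(sim, labels, candidates, t_confirm):
    """Assumed merge test: mean cross-cluster similarity must reach t_confirm."""
    members = group_members(labels)
    confirmed = []
    for a, b in candidates:
        cross = [sim[i][j] for i in members[a] for j in members[b]]
        if sum(cross) / len(cross) >= t_confirm:
            confirmed.append((a, b))
    merged = union_find_labels(len(labels), confirmed)  # cluster ids are item indices
    return [merged[c] for c in labels]

def largest_scc(nodes, succ):
    """Kosaraju's algorithm; the first pass is recursive, fine for small clusters."""
    order, seen = [], set()
    def visit(u):
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                visit(v)
        order.append(u)  # record u once all its descendants are finished
    for u in nodes:
        if u not in seen:
            visit(u)
    pred = defaultdict(list)  # reversed edges
    for u in nodes:
        for v in succ[u]:
            pred[v].append(u)
    comps, assigned = [], set()
    for root in reversed(order):  # decreasing finish time
        if root in assigned:
            continue
        comp, stack = [], [root]
        while stack:  # collect unassigned nodes that can reach root
            u = stack.pop()
            if u in assigned:
                continue
            assigned.add(u)
            comp.append(u)
            stack.extend(pred[u])
        comps.append(comp)
    return max(comps, key=len)

def stage3(sim, labels, k):
    """Assumed cleaning rule: keep each cluster's largest SCC of a directed k-NN graph."""
    cleaned = list(labels)
    for nodes in group_members(labels).values():
        if len(nodes) <= k:
            continue  # too small to judge with k neighbours
        succ = {u: sorted((v for v in nodes if v != u), key=lambda v: -sim[u][v])[:k]
                for u in nodes}
        core = set(largest_scc(nodes, succ))
        for u in nodes:
            if u not in core:
                cleaned[u] = -1  # noise inside the category
    return cleaned

def relabel(labels):
    ids = {}  # renumber clusters 0, 1, 2, ... by first appearance; keep -1 for noise
    return [-1 if c == -1 else ids.setdefault(c, len(ids)) for c in labels]

# Toy data: A = {0, 1, 2}, item 3 chained onto A, B = {4, 5}, item 6 close to B.
sim = [[1.0 if i == j else 0.1 for j in range(7)] for i in range(7)]
for (i, j), s in {(0, 1): 0.9, (0, 2): 0.85, (1, 2): 0.88, (2, 3): 0.72,
                  (2, 4): 0.55, (4, 5): 0.9, (4, 6): 0.65, (5, 6): 0.6}.items():
    sim[i][j] = sim[j][i] = s
# Thresholds and k are illustrative assumptions, not values from the patent.
labels, candidates = stage1(sim, t_link=0.7, t_merge=0.5)
labels = stage2(sim, labels, candidates, t_confirm=0.6)
print(relabel(stage3(sim, labels, k=2)))  # [0, 0, 0, -1, 1, 1, 1]
```

In this toy run, connected domain analysis chains item 3 onto group A through a single link; the SCC step removes it because no member of A lists it among its two nearest neighbours. Item 6, which never clears `t_link` on its own, joins group B in the merge step.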

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to a data clustering method and system, computer equipment, and a storage medium.

Background

[0002] In recent years, deep learning has made impressive progress in various fields. In artificial intelligence, training or tuning a model with deep learning requires collecting and processing large amounts of data. Especially for scenario-specific tasks where no public data set exists, the corresponding data set can often be obtained only through collection, preprocessing, and manual error correction. This process consumes a lot of time and resources, which greatly limits the practical scenarios in which deep learning can be applied.

[0003] In one feasible existing technology, the similarity matrix of the original data is obtained, K data items are selected as the initial groups, and the K-split block clustering algorithm is then used to cluster the remaining data into the selected K in ...
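The prior-art flow in [0003] can be sketched as follows, assuming cosine similarity between feature vectors (for example, face embeddings) and a simple rule that assigns every remaining item to its most similar seed. The text is truncated before it describes the K-split block algorithm itself, so `cosine_similarity_matrix`, `assign_to_seeds`, and the assignment rule are illustrative assumptions rather than the patent's procedure.

```python
import math


def cosine_similarity_matrix(features: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity between feature vectors (one per row)."""
    unit = []
    for v in features:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def assign_to_seeds(sim: list[list[float]], seeds: list[int]) -> list[int]:
    """Assumed rule: give each item the index (0..K-1) of its most similar seed."""
    return [max(range(len(seeds)), key=lambda g: sim[i][seeds[g]])
            for i in range(len(sim))]


features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
sim = cosine_similarity_matrix(features)
print(assign_to_seeds(sim, seeds=[0, 2]))  # [0, 0, 1, 1]
```

Every item, including a noise item, is forced into one of the K groups under this rule, which is why the choice of K and of the seeds matters so much in this approach.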

Application Information

Patent type & authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23, G06F18/22
Inventors: 郑宜海, 田行辉, 胡琪, 张梦阳
Owner: HUAWEI TECH CO LTD