Data clustering method and system, computer equipment and storage medium

This data clustering and computer program technology, applied in the field of data clustering, addresses problems such as the difficulty of determining the K value, the growing number of splits performed by the K-split algorithm, and the resulting loss of clustering speed and efficiency.

Pending Publication Date: 2022-01-28

HUAWEI TECH CO LTD



Problems solved by technology

When using the K-split block algorithm, it is difficult to determine an appropriate K value. In addition, when the K most irrelevant data items are taken from the data set, "noise" data may be among them, which increases the number of splits the K-split algorithm performs and reduces clustering speed and efficiency.

Embodiment Construction

[0046] In the field of artificial intelligence, training or adjusting a model requires collecting and processing a large amount of data, but in some specific scenarios no public, complete data set is available for this purpose. Consider a face recognition system at an intersection: a group of people A passes through the intersection regularly and forms a fixed face data set, while a new group of people B also passes by. The face data of group B is new to the system and is not recognized by it, yet this new group appears every day. The system therefore often has to collect and process the data itself to obtain the corresponding data sets, a process that consumes a lot of time and resources and greatly limits the practical application scenarios of deep learning.

[0047] To address this problem, one feasible existing technique is shown in Figure 1. ...


Abstract

The invention discloses a data clustering method and system, computer equipment and a storage medium, and belongs to the field of data processing. The method comprises: obtaining a first similarity matrix of the original data; performing connected domain analysis on the first similarity matrix and, in combination with a statistical analysis of the similarity distribution, obtaining a first clustering result together with the relationships between categories in that result that are candidates for merging; merging those candidate categories in the first clustering result to obtain a second clustering result; and performing strongly connected component analysis on the second clustering result and, according to the analysis result, cleaning noise data within each category to obtain a third clustering result. With the technical scheme provided by the invention, a clustered, high-accuracy data set can be obtained quickly from the original data and used for model training.
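The first stage described in the abstract, connected domain analysis on a similarity matrix, can be illustrated with a minimal sketch. This is not the patented method: the fixed `threshold` parameter, the function name `connected_components`, and the toy matrix are assumptions made for illustration, and the statistical analysis of the similarity distribution, the merging stage, and the strongly connected component stage are not shown.

```python
def connected_components(sim, threshold):
    """Group indices that are linked, directly or transitively, by a
    pairwise similarity of at least `threshold` (an assumed fixed cutoff)."""
    n = len(sim)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy symmetric similarity matrix for four items (hypothetical values).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = connected_components(sim, threshold=0.7)
print(clusters)
```

Unlike a K-based approach, the number of groups here falls out of the graph structure rather than being chosen in advance; the cost is sensitivity to the cutoff, which is presumably why the abstract pairs this step with an analysis of the similarity distribution.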

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a data clustering method, system, computer equipment and storage medium.

Background

[0002] In recent years, deep learning has made impressive progress in various fields. In the field of artificial intelligence, training or tuning a model using deep learning requires the collection and processing of large amounts of data. For scene-specific tasks in particular, where no public data set exists, the corresponding data sets can often be obtained only by collection, preprocessing, and manual error correction. This process often consumes a lot of time and resources, which greatly limits the practical application scenarios of deep learning.

[0003] In one feasible existing technique, the similarity matrix of the original data is obtained, K data items are selected as the initial groups, and the K-split block clustering algorithm is then used to cluster the remaining data into the selected K In ...
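As a rough illustration of what "obtaining the similarity matrix of the original data" can look like, the sketch below builds a pairwise cosine similarity matrix from feature vectors. The visible text does not say which similarity measure is used, so cosine similarity, the function name `cosine_similarity_matrix`, and the sample vectors are all assumptions.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return an n x n symmetric matrix of pairwise cosine similarities.

    A zero-length vector has no direction, so any pair involving one
    (including the vector with itself) is given similarity 0.0.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                s = 0.0
            else:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                s = dot / (norms[i] * norms[j])
            sim[i][j] = sim[j][i] = s  # fill both halves of the matrix
    return sim


# Hypothetical 2-D feature vectors; real inputs would be e.g. face embeddings.
features = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sim = cosine_similarity_matrix(features)
```

The first two vectors point in the same direction and so receive a similarity of 1, while the third is orthogonal to both and receives 0.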


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/62

CPC: G06F18/23, G06F18/22

Inventor: 郑宜海, 田行辉, 胡琪, 张梦阳

Owner: HUAWEI TECH CO LTD
