
Data set partitioning method and device

A data set partitioning technology in the field of data clustering. It addresses the problem that the clustering results obtained by dividing a data set cannot positively affect subsequent model training and instead increase model overhead. It reduces the number of target clusters and improves efficiency.

Status: pending. Publication date: 2022-04-29
BEIJING QIANXIN TECH +1
Cites 0 documents; cited by 3.

AI Technical Summary

Problems solved by technology

[0005] The invention provides a data set division method and device intended to overcome a defect of the prior art: the multiple clustering results obtained after dividing a data set cannot positively affect the subsequent training of a model, which increases the model's overhead during training. The invention completes the data division process adaptively and improves the training efficiency of the model.



Examples


Embodiment Construction

[0034] To make the purpose, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0035] The data set division method provided by the embodiment of the present invention is described below with reference to Figure 1. It includes:

[0036] Step 101: Determine the target number of clusters for the data set to be divided, cluster the data set, and obtain the clustering results, where the target number of clusters equals the number of clustering results.
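Step 101 can be illustrated with a minimal sketch that clusters a data set into a caller-supplied number of clusters `k`. The excerpt does not say which clustering algorithm is used or how the target number is determined, so Lloyd's k-means, the function name `kmeans`, and the parameters `max_iter` and `seed` are all assumptions for illustration; `k` is simply passed in.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans(data: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Hypothetical sketch: plain Lloyd's k-means with a fixed number of clusters.

    Returns (labels, centroids); labels[i] is the cluster index of data[i],
    so the number of clustering results equals k (the target number).
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [data[i] for i in rng.sample(range(len(data)), k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c]))
                      for p in data]
        # Update step: recompute centroids; keep the old one if a cluster empties.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids
```

Each distinct value in `labels` corresponds to one clustering result. In practice a library implementation, such as scikit-learn's `KMeans`, would usually replace this hand-written loop.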

[0037] It ...



Abstract

The invention provides a data set division method and device. The method comprises: determining a target number of clusters for a data set to be divided, clustering the data set, and obtaining clustering results, where the target number of clusters is the number of clustering results; and, according to a preset division criterion, dividing the clustering results into valid and invalid clustering results and merging the data in the invalid clustering results into the valid ones. Because the invalid clustering results are merged into the valid clustering results, the number of target clusters is reduced and only valid clustering results are retained, which improves the efficiency of model training.
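The second step of the abstract (divide clustering results by a preset criterion, then merge invalid results into valid ones) can be sketched as below. The excerpt does not state the criterion or the merge rule, so the minimum-size threshold `min_size`, the nearest-centroid reassignment, and the function name `split_and_merge` are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def split_and_merge(data: List[Point], labels: List[int], min_size: int) -> List[int]:
    """Hypothetical sketch of the divide-and-merge step.

    Assumed criterion: a cluster is valid if it has at least `min_size` members.
    Each point of an invalid cluster is reassigned to the valid cluster whose
    centroid is nearest. Returns new labels that use only valid cluster ids.
    """
    # Group the points by their current cluster label.
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(p)
    valid = sorted(lab for lab, pts in clusters.items() if len(pts) >= min_size)
    if not valid:
        raise ValueError("no cluster satisfies the division criterion")
    centroids = {lab: _centroid(clusters[lab]) for lab in valid}
    merged = []
    for p, lab in zip(data, labels):
        if lab in centroids:
            merged.append(lab)  # point already belongs to a valid cluster
        else:
            # Merge into the nearest valid cluster (ties go to the smaller id).
            merged.append(min(valid, key=lambda v: _dist2(p, centroids[v])))
    return merged


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (9, 9)]
    print(split_and_merge(pts, [0, 0, 0, 1, 1, 1, 2], min_size=2))
    # prints [0, 0, 0, 1, 1, 1, 1]
```

In the usage example the single-point cluster 2 fails the assumed size criterion and is absorbed by the nearby cluster 1, so three clusters become two, which matches the abstract's point that merging reduces the number of target clusters.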

Description

Technical field

[0001] The invention relates to the field of data clustering, and in particular to a data set division method and device.

Background

[0002] Using machine learning algorithms to identify malware has become one of the current research and development trends in the security field. Applying machine learning requires a continuous supply of high-quality labeled samples for training and updating models.

[0003] Sample collection platforms such as VT use simple construction rules to obtain rough data sets, and the labels in such data sets have relatively low purity. To improve training efficiency, the prior art divides the training data set into several groups and trains a model on each group separately. However, if the distribution of the various kinds of data in the original data set is unbalanced, the multiple clustering results obtained after grouping the data set will not have a positive effect on the s...


Application Information

Patent type and authority: Application (China)
IPC (8th edition): G06F16/906; G06K9/62
CPC: G06F16/906; G06F18/23213; G06F18/214
Inventors: 赵毅强 (Zhao Yiqiang), 王志刚 (Wang Zhigang), 齐向东 (Qi Xiangdong), 吴云坤 (Wu Yunkun)
Owner: BEIJING QIANXIN TECH