Data classification method and device and electronic equipment

A data classification and data processing technology, applied in the computer field, that addresses the problems of unlabeled sample data, abnormal data that cannot be exhaustively enumerated, and outliers that are easily confused with normal values, thereby improving accuracy and reducing the final error.

Active Publication Date: 2020-05-19
TENCENT TECH (SHENZHEN) CO LTD
7 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

In application scenarios such as anomaly detection, the sample data often has no labels, or the abnormal data in the sample data cannot be exhaustively enumerated.
Existing unsupervised binning methods, however, tend to confuse outliers with normal values, which reduces the accuracy of abnormal data detection.

Embodiment Construction

[0044] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and fully convey the concept of example embodiments to those skilled in the art.

[0045] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of embodiments of the invention. However, those skilled in the art will appreciate that the technical solutions of the present invention may be practiced without one or more of the specific details, or other methods, components, means, steps, etc. may be employed. In other instances, well-known methods, apparatus, implem...

Abstract

The invention relates to the technical field of computers, and in particular to a data classification method, a data classification device, and electronic equipment. The method comprises: obtaining at least two attribute values of a target attribute from to-be-processed data, the to-be-processed data comprising a plurality of samples, and selecting one of the attribute values as an initial clustering center; calculating, according to the distance between each attribute value and the initial clustering center, a probability value that each attribute value can serve as a clustering center, and determining the clustering centers according to the probability values; clustering the attribute values of the target attribute based on the clustering centers, and dividing interval boundaries according to the clustering result; and classifying the attribute values corresponding to the target attribute of the samples in the to-be-processed data according to the interval division result. With this method, continuous values can be discretized while both normal and abnormal values are retained, and the normal values and abnormal values can be classified into different categories.
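The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not give the probability formula, so the sketch assumes k-means++-style weighting (probability proportional to squared distance to the nearest chosen center), plain Lloyd iterations for the clustering step, and interval boundaries placed at the midpoints between adjacent cluster centers. The function names `choose_centers`, `cluster_1d`, `interval_boundaries`, and `classify` are hypothetical.

```python
import bisect
import random


def choose_centers(values, k, rng):
    """Pick the first center at random, then pick the remaining centers with
    probability proportional to squared distance to the nearest chosen
    center (an assumed weighting; the abstract does not specify it)."""
    centers = [rng.choice(values)]
    while len(centers) < k:
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        if sum(d2) == 0:  # every value coincides with a chosen center
            break
        centers.append(rng.choices(values, weights=d2, k=1)[0])
    return centers


def cluster_1d(values, centers, max_iter=100):
    """Lloyd iterations on scalar values; returns the sorted final centers."""
    centers = sorted(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        updated = sorted(sum(g) / len(g) if g else c
                         for g, c in zip(groups, centers))
        if updated == centers:
            break
        centers = updated
    return centers


def interval_boundaries(centers):
    """Assumed rule: a boundary at the midpoint of adjacent sorted centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def classify(value, boundaries):
    """Return the index of the interval that contains value."""
    return bisect.bisect_right(boundaries, value)


if __name__ == "__main__":
    # Hypothetical attribute values: two normal groups and one outlier.
    values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 100.0]
    rng = random.Random(0)
    centers = cluster_1d(values, choose_centers(values, 3, rng))
    bounds = interval_boundaries(centers)
    print(bounds, [classify(v, bounds) for v in values])
```

Because far-away values receive a high selection probability under the assumed weighting, an isolated outlier is likely to become its own cluster center and therefore its own interval, which matches the abstract's stated goal of placing normal and abnormal values in different categories.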

Description

Technical field

[0001] The present invention relates to the field of computer technology, and in particular to a data classification method, a data classification device, and electronic equipment.

Background technique

[0002] Binning discretizes continuous data and coarsens its granularity. It can be used to denoise data, screen out abnormal data, and so on. Data binning methods can be divided into supervised methods and unsupervised methods. Supervised methods include chi-square binning, decision tree binning, and the like. Unsupervised methods include equal-frequency binning, equal-width (equidistant) binning, cluster binning, and so on.

[0003] However, the existing binning methods all have certain defects. For example, supervised methods require labels to be configured for the samples during model training. In application scenarios such as anomaly detection, the sample data often has no labels, or the abnormal data in the sample data cannot be exhaustively enumerated. However, the above-mentioned bin...
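To make the background point concrete, the following minimal sketch shows how the two simplest unsupervised methods named above behave when a single outlier is present. The data and the helper names `equal_width_edges`, `equal_frequency_edges`, and `assign` are hypothetical and chosen only for illustration.

```python
import bisect


def equal_width_edges(values, n_bins):
    """Interior bin edges that split [min, max] into n_bins equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior bin edges taken at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign(value, edges):
    """Return the index of the bin that contains value."""
    return bisect.bisect_right(edges, value)


# Hypothetical data: values near 1, values near 5, and one outlier at 100.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 100.0]
ew = equal_width_edges(data, 3)
ef = equal_frequency_edges(data, 3)
print([assign(v, ew) for v in data])
print([assign(v, ef) for v in data])
```

With this data, equal-width binning puts every value except 100.0 into bin 0, because the outlier stretches the range, so the groups near 1 and near 5 can no longer be told apart. Equal-frequency binning puts 100.0 into the same bin as 5.0 and 5.2, so the outlier is mixed with normal values.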

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/2321, G06F18/2415
Inventor: 程哲豪, 吕培立, 董井然, 黄文, 陈守志
Owner: TENCENT TECH (SHENZHEN) CO LTD