
Data mining method and device and computer readable storage medium

A data mining technology, applied in the field of communication, that addresses the high labor cost of manual search and the low hit rate for bad cases caused by diverse, widely differing data clusters. It reduces excessive dependence on feature representation and improves the hit rate.

Pending Publication Date: 2019-12-20
TENCENT CLOUD COMPUTING BEIJING CO LTD

AI Technical Summary

Problems solved by technology

[0003] In researching and practicing the prior art, the inventors of the present invention found that manual search consumes a large amount of labor, and that, because data clusters are diverse and differ widely, simply calculating the distance between two features in a data cluster makes it difficult to distinguish bad cases from normal data, which results in a low hit rate for bad cases.


Embodiment Construction

[0040] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative efforts fall within the protection scope of the present invention.

[0041] Embodiments of the present invention provide a data mining method, a data mining device and a computer-readable storage medium. The data mining device may be integrated in electronic equipment, which may be a server, a terminal or another device.

[0042] Data mining here refers to the process of extracting hidden, previously unknown but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy and random data. It can fi...


Abstract

The embodiment of the invention discloses a data mining method, a data mining device and a computer-readable storage medium. The method comprises the following steps: extracting features from a to-be-processed data set; constructing a feature space; extracting node features from the feature space; generating graph data of the to-be-processed data set, the graph data comprising at least one node; screening out the data cluster corresponding to the node from the graph data; calculating the data purity of the data cluster to obtain its intra-cluster purity; and, when the intra-cluster purity is lower than a preset purity threshold, obtaining the data corresponding to the node in the to-be-processed data set as the mined data. The scheme takes all feature information in the data cluster into account: bad cases are evaluated through the purity within the data cluster and then mined. This reduces excessive dependence on feature representation, so bad cases in the data can be mined more quickly, efficiently and accurately, which increases the hit rate for bad cases.
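The steps in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the abstract does not say how the graph is built or how purity is computed, so the k-nearest-neighbour graph, the label-agreement purity measure, and every name here (`knn_graph`, `neighbour_purity`, `mine_bad_cases`, `k`, `purity_threshold`) are hypothetical choices made for illustration.

```python
import math


def knn_graph(features, k):
    """Build graph data: map each node index to its k nearest neighbours.

    ASSUMPTION: a k-nearest-neighbour graph over Euclidean distance stands in
    for the graph data; the source does not specify the construction.
    """
    graph = {}
    for i, fi in enumerate(features):
        ranked = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        graph[i] = [j for _, j in ranked[:k]]
    return graph


def neighbour_purity(node, graph, labels):
    """Intra-cluster purity of the data cluster around `node`.

    ASSUMPTION: the cluster is the node's neighbour set, and purity is the
    fraction of neighbours that share the node's own label.
    """
    cluster = graph[node]
    if not cluster:
        return 1.0  # an isolated node has nothing to disagree with
    same = sum(1 for j in cluster if labels[j] == labels[node])
    return same / len(cluster)


def mine_bad_cases(features, labels, k, purity_threshold):
    """Return indices of nodes whose cluster purity is below the threshold."""
    graph = knn_graph(features, k)
    return [
        node
        for node in graph
        if neighbour_purity(node, graph, labels) < purity_threshold
    ]


# Toy data: two tight groups, plus one point placed inside group "A"
# that carries the label "B" (the planted bad case, index 8).
features = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # group A
    (10, 10), (10, 11), (11, 10), (11, 11),  # group B
    (0.5, 0.5),                              # sits in A, labelled B
]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

mined = mine_bad_cases(features, labels, k=3, purity_threshold=0.5)
print(mined)  # [8]
```

In this toy run only the planted point falls below the threshold, because all of its neighbours carry a different label. Raising `purity_threshold` would also flag the neighbouring "A" points whose clusters contain the bad case, which illustrates why the threshold has to be preset with care.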

Description

Technical field

[0001] The present invention relates to the field of communication technology, in particular to a data mining method, a data mining device and a computer-readable storage medium.

Background

[0002] In data mining scenarios, both image data and text data need to be pure. However, limited by the representation ability of the model, the data produced by classification, clustering and similar operations often contains bad cases, so the purity of the data within a cluster cannot be guaranteed. In the prior art, the purity of a data cluster is judged by manual search and by simply calculating the distance between two features in the data cluster.

[0003] In researching and practicing the prior art, the inventors of the present invention found that manual search consumes a large amount of labor, and that, because data clusters are diverse and differ widely, simply calculating the distance between two features in a data cluster makes it difficult to distinguish b...


Application Information

IPC(8): G06F16/903
CPC: G06F16/90335
Inventor: 余莉萍, 石楷弘, 王吉, 陈志博
Owner: TENCENT CLOUD COMPUTING BEIJING CO LTD