Large-scale data mining method capable of guaranteeing quality monotonicity

A technology concerning large-scale data and monotonicity, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the difficulty of balancing mining result quality against resource limitations and of guaranteeing the quality monotonicity of approximate results.

Active Publication Date: 2015-05-27
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0002] The volume and variety of big data make it attractive to mine big data with algorithms that generate approximate results. Traditional algorithms are difficult to achieve mining results when min



Examples


Embodiment Construction

[0046] The following examples describe the present invention in more detail.

[0047] The present invention provides a big data mining method that guarantees monotonic quality. The process of the method is shown in Figure 1. The specific implementation of the present invention is as follows:

[0048] The first stage: Preprocess the data set so that the data is in a representation that the mining part can process.

[0049] Step 1) Obtain the original iris data set (see Table 1).

[0050] Step 2) Use principal component analysis to reduce the dimensionality of the data and avoid the curse of dimensionality.
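The dimensionality reduction in Step 2 can be illustrated with a minimal sketch. The source does not say how the principal components are computed, so the power iteration with deflation used here is an assumption chosen to keep the example dependency-free; the function names (`covariance_matrix`, `top_eigenvector`, `pca_reduce`) are hypothetical, and the input is synthetic random data with the same shape as a 150 x 4 measurement table, not the real iris values.

```python
import math
import random

def covariance_matrix(rows):
    """Center the rows and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov

def top_eigenvector(matrix, iterations=500):
    """Approximate the dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue estimate for unit-length v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca_reduce(rows, n_components):
    """Project rows onto the first n_components principal directions."""
    centered, cov = covariance_matrix(rows)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        variances.append(eigenvalue)
        # Deflation: remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return projected, variances

# Synthetic stand-in for a 150 x 4 table (assumption: NOT the real iris measurements).
random.seed(0)
rows = [[random.gauss(0.0, s) for s in (3.0, 1.0, 0.5, 0.1)] for _ in range(150)]
projected, variances = pca_reduce(rows, n_components=2)
```

Each row of `projected` is a 2-dimensional representation of the corresponding input row, and `variances` holds the variance captured by each retained component. In practice a numerical library routine would likely replace the hand-written eigen-solver.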

[0051] In this example, the iris data set contains measurements of 150 iris flowers, 50 from each of three species: Setosa, Versicolour, and Virginica. Each flower is described by the following 5 attributes:

[0052] (1) Sepal length (cm)

[0053]...


Abstract

The invention provides a data mining method capable of guaranteeing quality monotonicity. The method comprises the following steps: the original big data set is compressed by PCA (principal component analysis) and then mapped onto an R-tree data structure; the data set is then mined with an improved K-nearest neighbor classification algorithm. The method consists of two parts, a coding part and a mining part. The coding part uses an R-tree to represent the data: highly similar data items are combined into a single R-tree node, which compresses the mass data and improves the efficiency of the mining part. The mining part applies the idea of the improved K-nearest neighbor classification algorithm to the data nodes and predicts the class of an input test point. The method addresses the problem that, when big data is mined by a traditional algorithm under limited time and resources, the quality of the mining result cannot be balanced against the resource constraints and the quality monotonicity of the approximate result cannot be guaranteed.
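The two-part structure described in the abstract can be sketched roughly as follows. This is an illustration under loud assumptions, not the patented procedure: the "coding part" here is a flat, greedy grouping of nearby points into bounding-box nodes (a real R-tree is hierarchical), and the "mining part" is a plain nearest-node vote, since the source does not spell out the improved K-nearest neighbor algorithm. The names `Node`, `build_nodes`, `classify` and the `radius` parameter are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    """A bounding box summarizing a group of similar points and their labels."""
    lo: list
    hi: list
    labels: Counter = field(default_factory=Counter)

    def center(self):
        return [(a + b) / 2 for a, b in zip(self.lo, self.hi)]

    def add(self, point, label):
        # Grow the box to cover the new point and record its class label.
        self.lo = [min(a, p) for a, p in zip(self.lo, point)]
        self.hi = [max(b, p) for b, p in zip(self.hi, point)]
        self.labels[label] += 1

    def min_distance(self, query):
        # Distance from the query to the closest point of the box (0 if inside).
        return math.sqrt(sum(max(a - q, 0.0, q - b) ** 2
                             for a, b, q in zip(self.lo, self.hi, query)))

def build_nodes(points, labels, radius):
    """Coding part (simplified): merge each point into the first node whose
    center lies within `radius`, otherwise start a new node."""
    nodes = []
    for point, label in zip(points, labels):
        target = next((n for n in nodes
                       if math.dist(point, n.center()) <= radius), None)
        if target is None:
            target = Node(lo=list(point), hi=list(point))
            nodes.append(target)
        target.add(point, label)
    return nodes

def classify(nodes, query, k):
    """Mining part (simplified): pool the label counts of the k nearest nodes."""
    nearest = sorted(nodes, key=lambda n: n.min_distance(query))[:k]
    votes = Counter()
    for node in nearest:
        votes.update(node.labels)
    return votes.most_common(1)[0][0]

# Toy data (assumption: made up for illustration, two well-separated groups).
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]
nodes = build_nodes(points, labels, radius=1.0)
prediction = classify(nodes, (0.3, 0.2), k=1)
```

Classification cost in this sketch depends on the number of nodes rather than the number of raw points, which is the efficiency argument the abstract makes for compressing similar data into nodes.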

Description

Technical field

[0001] The invention relates to a method for efficiently processing data that guarantees the monotonicity of the quality of large-scale data mining results. It belongs to the cross-disciplinary application field of data mining, big data and computer software.

Background technique

[0002] The volume and variety of big data make it attractive to use algorithms that produce approximate results when mining big data. Under limited time and resources, traditional algorithms struggle to balance the quality of the mining result against resource constraints and cannot guarantee the monotonicity of the quality of the approximate results. To solve this problem, we designed a big data mining method based on Shannon entropy that guarantees monotonic quality. The mining method is divided into two parts: the coding part and the mining part. By ensuring the monotonicity of the entropy of the algorithm coding part and the mining part The entropy preservati...
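Because the description bases the method on Shannon entropy, a short sketch of the standard definition may help. The text above is truncated before it explains how entropy is used in the coding and mining parts, so this example only computes the textbook quantity H = -Σ p·log2(p) over a list of class labels; the function name `shannon_entropy` and the sample labels are assumptions for illustration.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Three equally frequent classes: entropy equals log2(3) bits.
balanced = ["Setosa"] * 50 + ["Versicolour"] * 50 + ["Virginica"] * 50
h_balanced = shannon_entropy(balanced)

# A single class carries no uncertainty: entropy is zero.
h_pure = shannon_entropy(["Setosa"] * 10)
```

Entropy is largest when the classes are equally frequent and falls to zero when only one class is present.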


Application Information

IPC(8): G06F17/30
CPC: G06F16/2465
Inventor: 陈志, 党凯乐, 岳文静, 黄继鹏, 芮路
Owner: NANJING UNIV OF POSTS & TELECOMM