Large-scale data mining method capable of guaranteeing quality monotonicity

A technology relating to large-scale data and quality monotonicity, applied in fields such as electrical digital data processing and special data processing applications. It addresses the difficulty of balancing mining-result quality against resource limits, and of guaranteeing the monotonicity of approximate-result quality.

Active Publication Date: 2015-05-27

NANJING UNIV OF POSTS & TELECOMM

Problems solved by technology

[0002] The volume and variety of big data make it acceptable to mine it with algorithms that produce approximate results. Under limited time and resources, traditional algorithms struggle to balance the quality of the mining result against the resource constraints, and they cannot guarantee that the quality of the approximate result is monotonic.


Embodiment Construction

[0046] The following examples describe the present invention in more detail.

[0047] The present invention provides a big data mining method that guarantees monotonic quality. The process of the method is shown in Figure 1. The specific implementation is as follows:

[0048] The first stage: preprocess the data set so that the data is in a representation the mining part can process.

[0049] Step 1) Obtain the original iris data set (see Table 1).

[0050] Step 2) Use principal component analysis to reduce the dimensionality of the data and so avoid the curse of dimensionality.

[0051] In this example, the iris data set contains information about 150 iris flowers, 50 from each of three iris species: Setosa, Versicolour, and Virginica. Each flower is described by the following 5 attributes:

[0052] (1) Sepal length (cm)

[0053]...
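
Step 2 above can be illustrated with a minimal sketch of principal component analysis. It assumes power iteration on the sample covariance matrix as a stand-in for a library PCA routine and keeps only the first component; the function names `first_principal_component` and `project` and the toy rows are hypothetical and do not come from the patent or its Table 1.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (mean, unit direction) of the top principal component.

    Power iteration on the sample covariance matrix; an illustrative
    stand-in, not the patent's own procedure.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalise.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Project each row onto the direction, giving one coordinate per row."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Toy rows with four numeric columns (made-up values, NOT from Table 1).
toy_rows = [
    [1.0, 2.0, 0.5, 0.1],
    [2.0, 4.1, 1.1, 0.2],
    [3.0, 5.9, 1.4, 0.3],
    [4.0, 8.2, 2.1, 0.4],
]
mean, direction = first_principal_component(toy_rows)
reduced = project(toy_rows, mean, direction)  # 4 columns reduced to 1
```

Keeping more than one component would require deflation or a full eigendecomposition; the number of components the method retains is not stated in the text shown here.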


Abstract

The invention provides a data mining method capable of guaranteeing quality monotonicity. The method compresses an original big data set with PCA (principal component analysis), maps it onto an R-tree data structure, and then mines the data set with an improved K-nearest neighbor classification algorithm. The method has two parts, a coding part and a mining part. The coding part uses the R-tree to represent the data: highly similar data items are merged into a single R-tree node, which compresses the mass data and improves the efficiency of the mining part. The mining part applies the improved K-nearest neighbor classification algorithm to the data nodes to predict the class of an input test point. The method addresses two problems that arise when a traditional algorithm mines big data under limited time and resources: the quality of the mining result cannot be balanced against the resource constraints, and the monotonicity of the quality of the approximate result cannot be guaranteed.
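
The two parts described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented algorithm: grid bucketing stands in for R-tree construction, each node keeps only a bounding box and per-class counts, and the "improved" K-nearest neighbor rule is replaced by a plain vote over the k nearest nodes. All names (`build_nodes`, `box_distance`, `classify`, the `cell` parameter) and the toy points are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_nodes(points, labels, cell=1.0):
    """Coding part (sketch): merge nearby points into nodes.

    Grid cells stand in for R-tree leaves. Each node is a tuple
    (lo, hi, counts): bounding-box corners plus per-class counts.
    """
    buckets = defaultdict(list)
    for p, y in zip(points, labels):
        key = tuple(math.floor(x / cell) for x in p)
        buckets[key].append((p, y))
    nodes = []
    for members in buckets.values():
        coords = [p for p, _ in members]
        lo = tuple(min(c) for c in zip(*coords))
        hi = tuple(max(c) for c in zip(*coords))
        counts = Counter(y for _, y in members)
        nodes.append((lo, hi, counts))
    return nodes


def box_distance(q, lo, hi):
    """Euclidean distance from q to the nearest point of the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))


def classify(q, nodes, k=1):
    """Mining part (sketch): vote over the class counts of the k nearest nodes."""
    nearest = sorted(nodes, key=lambda n: box_distance(q, n[0], n[1]))[:k]
    votes = Counter()
    for _, _, counts in nearest:
        votes.update(counts)
    return votes.most_common(1)[0][0]


# Toy data (made-up): class "a" near the origin, class "b" near (5, 5).
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9),
          (5.1, 5.2), (5.4, 5.3), (5.2, 5.8)]
labels = ["a", "a", "a", "b", "b", "b"]
nodes = build_nodes(points, labels)   # six points compressed into two nodes
near_origin = classify((0.5, 0.5), nodes)
near_five = classify((5.0, 5.0), nodes)
```

Because the classifier touches nodes rather than raw points, a coarser `cell` means fewer nodes and less work at the cost of a more approximate answer, which is the quality-versus-resource trade-off the abstract refers to.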

Description

Technical field

[0001] The invention relates to a method for efficiently processing data that guarantees the monotonicity of the quality of large-scale data mining results. It belongs to the cross-disciplinary application field of data mining, big data, and computer software.

Background technique

[0002] The volume and variety of big data make it acceptable to mine it with algorithms that produce approximate results. Under limited time and resources, traditional algorithms struggle to balance the quality of the mining result against the resource constraints and to guarantee the monotonicity of the quality of the approximate result. To solve this problem, we designed a big data mining method, based on Shannon entropy, that guarantees monotonic quality. The mining method is divided into two parts: the coding part and the mining part. By ensuring the monotonicity of the entropy of the algorithm coding part and the mining part The entropy preservati...
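
The background names Shannon entropy as the basis of the quality guarantee. As a minimal sketch, the standard definition H = -Σ p_i log2 p_i can be computed over the class labels held in one node. How the method itself applies entropy is not shown in the text available here, so the function name `shannon_entropy` and the choice to measure label distributions are assumptions made only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy label lists (made-up): a mixed node carries more uncertainty
# than a node whose members all share one class.
mixed = shannon_entropy(["a", "a", "b", "b"])
pure = shannon_entropy(["a", "a", "a", "a"])
```

Under this reading, a node whose members share one class has zero entropy, while an evenly mixed two-class node has one bit.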


Application Information


IPC(8): G06F17/30

CPC: G06F16/2465

Inventor: 陈志, 党凯乐, 岳文静, 黄继鹏, 芮路

Owner: NANJING UNIV OF POSTS & TELECOMM
