Outlier data mining method based on feature weighting and MapReduce

A feature-weighted outlier data mining technique, applied in data mining, special data processing applications, electrical digital data processing, and related fields. It addresses blurred cluster structure, heavy computation, and large data volumes, and aims for high mining efficiency and precision with little dependence on human factors.

Active Publication Date: 2020-09-01

太原太工天宇教育科技有限公司




Problems solved by technology

In massive high-dimensional data, the sheer volume and dimensionality seriously degrade both the effectiveness and the efficiency of outlier data mining, and outliers hidden in a subspace or distributed at the edges may go undetected.

Because high-dimensional sparse data sets tend to cluster, outliers often exist in a particular subspace rather than in the entire feature space, and irrelevant features blur the cluster structure of the data. If the cluster structure of the data set cannot be discovered well, outliers become harder to detect and outlier data mining cannot be carried out.

[0003] In addition, although traditional outlier data mining algorithms have been improved considerably in their respective fields in recent years, they are no longer applicable to high-dimensional data sets: the calculation load is large, and mining efficiency and accuracy are low. How to mine outlier data accurately from large, high-dimensional data is therefore a major problem in outlier data mining.

Method used


Embodiment Construction

[0028] For the mining of high-dimensional and massive data, the scheme of the present invention provides the following method steps:

[0029] Step 1: Based on the feature-weighted subspace, the subspace data is separated into cluster centers, clusters, and a candidate outlier data set under the MapReduce programming model. Step 2: The global distance is calculated for the candidate outlier data set described in Step 1, and the outlier data is then defined.
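The separation in Step 1 can be illustrated with a minimal single-process sketch of density peak clustering. The patent text shown here does not give its exact rules, so everything below is an assumption for illustration: the function name `density_peak_split`, the cutoff `d_c`, and the thresholds `rho_min` and `delta_min` are hypothetical, and the MapReduce distribution of the work is left out.

```python
from math import dist


def density_peak_split(points, d_c, rho_min, delta_min):
    """Split points into (centers, cluster_members, candidate_outliers).

    Hypothetical rules, not taken from the patent:
      rho   = number of neighbours closer than the cutoff d_c
      delta = distance to the nearest point that ranks higher in density
    """
    n = len(points)
    rho = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < d_c)
        for i in range(n)
    ]
    # Rank by density, breaking ties by index so equal-density points
    # do not all look like density peaks.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist(points[i], q) for q in points)
        else:
            delta[i] = min(dist(points[i], points[j]) for j in order[:pos])

    centers, members, candidates = [], [], []
    for i, p in enumerate(points):
        if rho[i] < rho_min:
            candidates.append(p)   # sparse region: candidate outlier
        elif delta[i] >= delta_min:
            centers.append(p)      # dense and far from denser points: center
        else:
            members.append(p)      # dense and close to a denser point: member
    return centers, members, candidates
```

Only the candidate set needs to be passed on to Step 2, which is what keeps the global distance calculation small compared with the full data set.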

[0030] Preferably, in Step 1, the feature-weighted subspace is obtained after defining the feature-weighted estimation entropy on the attribute dimensions; then, under the MapReduce programming model, the subspace data set is quickly separated using the density peak algorithm. In Step 2, the calculation of the global distance includes calculating the global Weight_k distance, and also includes sorting the set of Weight_k distances in descending order and outputting the TOP-N data. Further, in the f...
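The Weight_k distance and TOP-N output in Step 2 can be sketched in a map/reduce shape as follows. The text above does not define Weight_k, so this sketch assumes a hypothetical definition: the sum of the feature-weighted Euclidean distances from a candidate to its k nearest neighbours. The function names and the in-memory "partitions" that stand in for MapReduce input splits are also assumptions.

```python
import heapq
from math import sqrt


def weighted_dist(x, y, w):
    """Feature-weighted Euclidean distance (assumed form, not from the patent)."""
    return sqrt(sum(wd * (a - b) ** 2 for wd, a, b in zip(w, x, y)))


def map_partition(partition, candidates, w, k):
    """Mapper: for each candidate, emit its k smallest distances in this partition."""
    for cid, c in enumerate(candidates):
        # Skip points with the same coordinates as the candidate itself.
        dists = (weighted_dist(c, p, w) for p in partition if p != c)
        yield cid, heapq.nsmallest(k, dists)


def reduce_candidate(partial_lists, k):
    """Reducer: merge per-partition lists into the global k smallest and sum them."""
    merged = heapq.nsmallest(k, (d for lst in partial_lists for d in lst))
    return sum(merged)


def top_n_outliers(partitions, candidates, w, k, n):
    """Return the n candidates with the largest assumed Weight_k distance."""
    grouped = {}
    for part in partitions:                      # each partition = one map task
        for cid, local in map_partition(part, candidates, w, k):
            grouped.setdefault(cid, []).append(local)
    scores = {cid: reduce_candidate(lists, k) for cid, lists in grouped.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(candidates[cid], score) for cid, score in ranked[:n]]
```

The decomposition works because the global k nearest distances of a candidate are always contained in the union of its per-partition k nearest distances, so each mapper only has to ship k numbers per candidate.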


Abstract

The invention relates to the technical field of data mining, in particular to an outlier data mining method based on feature weighting and MapReduce, which comprises the following steps: Step 1, based on a feature-weighted subspace, separating the subspace data into cluster centers, clusters, and a candidate outlier data set under the MapReduce programming model; and Step 2, calculating a global distance for the candidate outlier data set from Step 1, and then defining the outlier data. According to the invention, the calculation load of the method is reasonable, the influence of human factors is small, and mining efficiency and precision are high. For massive high-dimensional data, feature dimensions that provide no valuable information are automatically found and deleted, which effectively reduces the interference of the curse of dimensionality. The invention provides a mining method for massive high-dimensional outlier data that is simple in structure, relatively accurate, and performs well, so that the efficiency problem in outlier detection is largely overcome, and the method has broad application prospects in the field of big data.
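The idea of finding and deleting uninformative feature dimensions can be illustrated with a small entropy-based weighting sketch. The patent's "feature-weighted estimation entropy" is not spelled out in the text shown here, so the formula below is a stand-in assumption: each dimension is scored by one minus its normalised histogram entropy, so that near-uniform dimensions get weights close to zero. The names `feature_weights` and `select_subspace`, the bin count, and the `min_weight` threshold are all hypothetical.

```python
from math import log


def histogram_entropy(values, bins=10):
    """Shannon entropy of an equal-width histogram over one dimension."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * log(c / n) for c in counts if c)


def feature_weights(rows, bins=10):
    """Assumed weighting: 1 - H/H_max per dimension, normalised to sum to 1."""
    h_max = log(bins)
    raw = []
    for col in zip(*rows):
        if min(col) == max(col):
            raw.append(0.0)  # a constant dimension carries no structure
        else:
            raw.append(max(0.0, 1.0 - histogram_entropy(col, bins) / h_max))
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)  # no dimension stands out
    return [r / total for r in raw]


def select_subspace(rows, weights, min_weight):
    """Keep dimensions whose weight reaches min_weight and project the rows."""
    keep = [d for d, w in enumerate(weights) if w >= min_weight]
    return keep, [tuple(r[d] for d in keep) for r in rows]
```

For example, with rows whose first coordinate falls into two tight groups and whose second coordinate is spread evenly, the second dimension receives a weight near zero and is dropped from the subspace.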

Description

Technical field

[0001] The present invention relates to the technical field of data mining, in particular to a method for outlier data mining.

Background

[0002] Outlier data is data that clearly deviates from other data, does not fit the general pattern or behavior of the data, and is inconsistent with the other existing data. It often contains a great deal of valuable information that is not easily discovered. As an important branch of data mining, outlier data mining has been widely used in securities markets, astronomical spectrum data analysis, network intrusion detection, financial fraud detection, extreme weather analysis, and other fields. In massive high-dimensional data, the sheer volume and dimensionality seriously degrade both the effectiveness and the efficiency of outlier data mining, and outliers hidden in a subspace or distributed at the edges may go undetected. It is precisely because of the clustering characteristics of high...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/2458, G06K9/62, G06F16/215

CPC: G06F16/2465, G06F16/215, G06F2216/03, G06F18/2321, G06F18/22

Inventor: 朱晓军, 吕士钦, 娄圣金

Owner: 太原太工天宇教育科技有限公司
