Method for filtering noise data based on pattern mining

A noise data filtering technique based on pattern mining, applied in the field of data processing, that filters out noise data and improves data accuracy.

Active Publication Date: 2012-08-15
INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER

AI Technical Summary

Problems solved by technology

[0005] In order to overcome the existing problems of mining noise data using frequent patterns in the pr



Examples


Embodiment Construction

[0017] In the noise data filtering method based on pattern mining according to the present invention, a preprocessing data structure, the FP tree, is first built from a bag-of-words data set D; the FP tree contains the bag-of-words data set and its corresponding transaction data. Then, according to the FP-2INF algorithm, all 2-itemset interest patterns are added to the interest pattern set L to complete noise data filtering. Figure 1 is a detailed flowchart of the present invention; the specific steps are as follows:

[0018] 1) Preprocess the input data set. The input data consists of two-tuples: the bag-of-words data set is a Word_ID-instance matrix, each row of which consists of a Word_ID and its related instance data. It is then transformed into a transaction data set for constructing the FP tree;

[0019] 2) From the preprocessed data set, build the FP tree with items arranged in descending order of their frequency in the frequent item set, and the paramet...
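The two steps above can be sketched in Python as follows. This is a minimal illustration, not the patented procedure: it assumes each row has the form (Word_ID, list of instance IDs), that every instance becomes one transaction holding the Word_IDs that occur in it, and that the tree is a conventional FP tree. The names `to_transactions`, `FPNode`, `build_fp_tree`, and `min_count` are hypothetical and do not come from the source.

```python
from collections import Counter, defaultdict


def to_transactions(word_rows):
    """Turn (word_id, instance_ids) rows into one transaction per instance.

    Assumption: a transaction is the set of Word_IDs seen in one instance.
    """
    by_instance = defaultdict(set)
    for word_id, instance_ids in word_rows:
        for instance_id in instance_ids:
            by_instance[instance_id].add(word_id)
    return [sorted(by_instance[key]) for key in sorted(by_instance)]


class FPNode:
    """One node of the FP tree: an item, a count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count=1):
    """Build an FP tree with items inserted in descending frequency order."""
    freq = Counter(item for t in transactions for item in set(t))
    root = FPNode(None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for t in transactions:
        items = [i for i in set(t) if freq[i] >= min_count]
        # Most frequent first; ties broken by item value for determinism.
        items.sort(key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, freq


rows = [("a", [1, 2, 3]), ("b", [1, 3]), ("c", [2])]
transactions = to_transactions(rows)
root, header, freq = build_fp_tree(transactions)
```

Sorting each transaction by descending frequency before insertion lets transactions that share frequent items share a prefix path, which is what keeps the tree compact.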


Abstract

The invention discloses a method for filtering noise data based on pattern mining. The method comprises the following steps: a preprocessing data structure, the frequent pattern (FP) tree, is built from a bag-of-words data set D, and the FP tree contains the bag-of-words data set and its corresponding transaction data set; then, according to the FP-2INF algorithm, all 2-itemset interest patterns are added to an interest pattern set L to complete noise data filtering. The method prunes directly on the interest measure and mines the interest patterns in a single step according to that measure, which helps to filter noise data effectively, obtain high-quality data, and improve the accuracy and consistency of the data.
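As a rough illustration of collecting 2-itemset interest patterns into a set L, the sketch below scores every co-occurring item pair and keeps those above a threshold. It is not the FP-2INF algorithm, whose details are not given here: it counts pairs by brute force instead of traversing an FP tree, and it assumes cosine similarity as the interest measure. Treating items that appear in no retained pattern as noise candidates is also an assumption. The names `interesting_pairs`, `split_noise`, and `min_interest` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def interesting_pairs(transactions, min_interest):
    """Return {(a, b): score} for item pairs whose score meets the threshold.

    Assumed measure: cosine = supp(a, b) / sqrt(supp(a) * supp(b)).
    """
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    patterns = {}
    for (a, b), n_ab in pair_count.items():
        score = n_ab / sqrt(item_count[a] * item_count[b])
        if score >= min_interest:
            patterns[(a, b)] = score
    return patterns


def split_noise(transactions, patterns):
    """Split items into those covered by some pattern and the rest.

    Assumption: uncovered items are treated as noise candidates.
    """
    covered = {item for pair in patterns for item in pair}
    all_items = {item for t in transactions for item in t}
    return covered, all_items - covered


transactions = [
    ["a", "b"], ["a", "b"], ["a", "b", "x"],
    ["c", "d"], ["c", "d"], ["c", "y"],
]
L = interesting_pairs(transactions, min_interest=0.7)
kept, noise = split_noise(transactions, L)
```

In this toy data, `x` and `y` each co-occur only once with a frequent item, so no pair containing them reaches the threshold and both end up in `noise`.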

Description

Technical field

[0001] The invention relates to a data processing method, in particular to a noise data filtering method based on pattern mining.

Background art

[0002] Data quality is a measure of the degree to which data meets explicit or implicit requirements, and of how faithfully it portrays the real world. Data quality problems include not only incorrect data but also inconsistent data. As data volumes grow, the internal consistency of data becomes extremely important, and it is an issue that arises wherever data is used, across many disciplines. Noise is the random part of measurement error, which may involve distortion of values or the addition of falsified object data.

[0003] As one of the core issues in data mining, association analysis is used to find hidden associations between data items in a given set of data records and to describe meaningful connections between data. For association rule mining, it is often transformed into a framewo...


Application Information

IPC(8): G06F17/30
Inventor: 曹杰, 伍之昂, 李秀怡, 毛波, 杨风召
Owner: INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER