Hadoop-based fast neighborhood rough set attribute reduction method

A neighborhood rough set attribute reduction technique, applied in special data processing applications, instruments, electrical digital data processing, etc., that improves analysis efficiency, lowers time complexity, and reduces the output of intermediate results

Active Publication Date: 2013-10-02
HUZHOU TEACHERS COLLEGE

AI Technical Summary

Problems solved by technology

However, there is little research on distributed

Embodiment Construction

[0031] To achieve the above object, the present invention proposes a Hadoop-based fast neighborhood rough set attribute reduction method, comprising the following steps:

[0032] a) Set up a Hadoop-based distributed platform: set up the HDFS distributed file system and the MapReduce parallel programming model. The HDFS distributed file system adopts a master-slave architecture consisting of one manager and multiple workers. The manager manages the namespace of the file system and maintains the file system tree and all files and directories in that tree. A worker is a working node of the file system: it stores and retrieves data blocks as needed and periodically sends a "heartbeat" report to the manager. If the manager does not receive a worker's "heartbeat" report within the specified period, the manager starts a fault-tolerance mechanism to handle it. The MapReduce parallel programming model divides the task into several sma...
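
The heartbeat check described in step a) can be illustrated with a small simulation. This is only a sketch of the idea, not Hadoop's implementation: the class name `HeartbeatMonitor`, the worker identifiers, and the 30-second timeout are all hypothetical and do not come from the patent text.

```python
import time


class HeartbeatMonitor:
    """Manager-side record of when each worker last reported (hypothetical helper)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # worker id -> time of the most recent heartbeat

    def record_heartbeat(self, worker_id, now=None):
        # A worker calls this periodically; `now` can be injected for testing.
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def overdue_workers(self, now=None):
        # Workers whose last heartbeat is older than the allowed period.
        now = time.monotonic() if now is None else now
        return sorted(
            worker for worker, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30.0)  # assumed timeout value
monitor.record_heartbeat("worker-1", now=0.0)
monitor.record_heartbeat("worker-2", now=0.0)
monitor.record_heartbeat("worker-1", now=25.0)   # worker-2 stays silent
overdue = monitor.overdue_workers(now=40.0)
print(overdue)
```

In this sketch the manager would start its fault-tolerance handling for every worker returned by `overdue_workers`; what that handling does is not specified in the visible text.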

Abstract

The invention discloses a Hadoop-based fast neighborhood rough set attribute reduction method. The method comprises the following steps: a) establishing a distributed platform based on Hadoop; b) defining a neighborhood rough set; c) generating a candidate set; d) calculating the importance of each attribute; e) selecting the attribute with the largest importance and adding it to the candidate set; f) judging whether a stop condition is met; g) storing the condition attributes chosen by feature selection. The method analyzes how data mining algorithms can be parallelized on the Hadoop distributed platform and, on that basis, parallelizes the neighborhood rough set attribute reduction algorithm. The time complexity of the parallelized attribute reduction is greatly lowered, the output of intermediate results during execution is greatly reduced, and the analysis efficiency for large-scale data is improved, so that large volumes of varied data are converted into usable data with informational and business value, completing the mining and analysis of the data.
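
Steps b) to g) follow the shape of a greedy forward selection, which can be sketched serially as below. The visible text does not define the neighborhood, the importance measure, or the stop condition, so this sketch assumes common neighborhood rough set conventions: a Euclidean delta-neighborhood, dependency as the fraction of samples whose neighborhood carries a single decision label, importance as the gain in dependency, and stopping when no attribute yields a gain. The function names and the toy data are hypothetical, and the code runs on one machine rather than on Hadoop.

```python
import math


def neighborhood(data, i, attrs, delta):
    """Indices of samples within distance delta of sample i on the given attributes."""
    point = [data[i][a] for a in attrs]
    return [j for j in range(len(data))
            if math.dist(point, [data[j][a] for a in attrs]) <= delta]


def dependency(data, labels, attrs, delta):
    """Fraction of samples whose whole neighborhood shares their decision label."""
    if not attrs:
        return 0.0  # assumed convention for the empty attribute set
    consistent = sum(
        all(labels[j] == labels[i] for j in neighborhood(data, i, attrs, delta))
        for i in range(len(data))
    )
    return consistent / len(data)


def greedy_reduct(data, labels, delta, min_gain=1e-9):
    """Repeatedly add the attribute with the largest dependency gain."""
    selected = []                            # step c: start from an empty set
    remaining = list(range(len(data[0])))
    current = 0.0
    while remaining:
        # step d: importance of each remaining attribute (assumed: dependency gain)
        deps = {a: dependency(data, labels, selected + [a], delta) for a in remaining}
        best = max(remaining, key=lambda a: deps[a])
        if deps[best] - current <= min_gain:  # step f: assumed stop condition
            break
        selected.append(best)                 # step e: keep the most important attribute
        remaining.remove(best)
        current = deps[best]
    return selected                           # step g: the chosen condition attributes


# Toy data: attribute 0 separates the two classes, attribute 1 does not.
data = [[0.1, 0.5], [0.2, 0.9], [0.8, 0.4], [0.9, 1.0]]
labels = [0, 0, 1, 1]
reduct = greedy_reduct(data, labels, delta=0.3)
print(reduct)  # [0]
```

In a MapReduce setting, the per-sample consistency checks inside `dependency` are the natural unit to distribute, since each sample can be evaluated independently and the counts summed afterwards; how the patent actually partitions the work is not shown in the visible text.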

Description

【Technical field】

[0001] The invention relates to a data attribute reduction method, in particular to a distributed attribute reduction method for big data.

【Background art】

[0002] With the rapid development of the high-tech information industry, we have entered an era of data explosion and information expansion. Every second of every day, massive amounts of data are generated, processed, and used. The "big data era" is coming. Within one minute, the number of new items posted on Weibo exceeds 100,000. The New York Stock Exchange generates 1 TB of transaction data every day, and the world generates 2.5 EB of data every day (1 EB equals 10 to the 18th power bytes). IDC's recent Digital Universe study predicts that by 2020 the world's total data storage will reach 35 ZB (1 ZB equals 10 to the 21st power bytes). Faced with the rapid growth of massive data, how to more effectively analyze the massive dat...

Application Information

IPC(8): G06F17/30
Inventor 蒋云良, 杨建党, 刘勇, 范婧, 张雄涛
Owner HUZHOU TEACHERS COLLEGE