Data preprocessing method, system and device in data mining system

A data preprocessing technology for data mining, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the low efficiency of parallel data processing, which degrades the performance of data mining systems.

Active Publication Date: 2011-05-11
CHINA MOBILE COMM GRP CO LTD
Cites: 0; Cited by: 11

AI Technical Summary

Problems solved by technology

A complete ETL data processing flow usually chains dozens or even hundreds of data processing functional components to complete data preprocessing. Each component reads from and writes to hard disk, which causes a large number of I/O operations, and each read and write introduces data transmission between different data storage nodes. The result is low parallel data processing efficiency, which degrades the performance of the entire data mining system.
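The I/O cost described above can be illustrated with a small sketch. It is not taken from the patent: `CountingStore`, `run_step_by_step`, and the three toy components are hypothetical names, and the in-memory dictionary only stands in for shared disk storage. The sketch assumes every component reads its input from storage and writes its output back, so a flow of n components costs n reads and n writes.

```python
from typing import Callable, Dict, List

Record = dict
Step = Callable[[List[Record]], List[Record]]


class CountingStore:
    """Hypothetical stand-in for shared storage that counts reads and writes."""

    def __init__(self) -> None:
        self.data: Dict[str, List[Record]] = {}
        self.reads = 0
        self.writes = 0

    def read(self, key: str) -> List[Record]:
        self.reads += 1
        return list(self.data[key])

    def write(self, key: str, rows: List[Record]) -> None:
        self.writes += 1
        self.data[key] = list(rows)


def run_step_by_step(store: CountingStore, steps: List[Step]) -> List[Record]:
    """Run each component separately: read input from storage, write output back."""
    key = "input"
    for i, step in enumerate(steps):
        rows = store.read(key)
        key = f"stage_{i}"
        store.write(key, step(rows))
    return store.data[key]


# Three toy preprocessing components (illustrative only).
def drop_missing(rows: List[Record]) -> List[Record]:
    return [r for r in rows if r["value"] is not None]


def scale(rows: List[Record]) -> List[Record]:
    return [{**r, "value": r["value"] * 10} for r in rows]


def tag(rows: List[Record]) -> List[Record]:
    return [{**r, "ok": True} for r in rows]


store = CountingStore()
store.data["input"] = [{"value": 1}, {"value": None}, {"value": 3}]
result = run_step_by_step(store, [drop_missing, scale, tag])
print(store.reads, store.writes)  # 3 3
```

With dozens or hundreds of components, the read and write counts grow linearly with the number of components, and in a distributed store each of them may also involve a transfer between nodes.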


Examples

Embodiment Construction

[0032] To improve the efficiency of data preprocessing, the embodiments of the present invention provide a method, system and device for data preprocessing in a data mining system. The preferred embodiments are described below with reference to the accompanying drawings. The preferred embodiments described here only illustrate and explain the present invention and do not limit it. Where no conflict arises, the embodiments in the present application and the features in those embodiments can be combined with each other.

[0033] According to an embodiment of the present invention, a data preprocessing system in a data mining system is provided, wherein the data preprocessing corresponds to multiple preprocessing methods with a set execution order. As shown in Figure 2, the system includes:

[0034] a control node 201 and a plurality of ...

Abstract

The invention discloses a data preprocessing method, system and device in a data mining system. The data preprocessing consists of a plurality of preprocessing modes with a set execution sequence. The main technical scheme comprises: determining the current preprocessing mode of the data preprocessing; and, when the processing results obtained in the current preprocessing mode do not need to be combined and the current preprocessing mode is not the last preprocessing mode of the data preprocessing, processing the data in the current preprocessing mode at the operating nodes and controlling the operating nodes to process those results in the next preprocessing mode. This scheme avoids transmitting data between different nodes to read the data to be processed and to write the processing results, which improves the efficiency of data preprocessing and the performance of the entire data mining system.
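The decision rule in the abstract can be sketched as a stage planner. This is a minimal illustration under stated assumptions, not the patented implementation: `Mode`, `needs_merge`, `plan_stages`, and `run_stage_on_node` are hypothetical names, and the sketch only models which consecutive modes one operating node can run back to back in memory. A stage ends at a mode whose results must be combined or at the last mode; how results are combined is left out.

```python
from dataclasses import dataclass
from typing import Callable, List

Rows = List[dict]


@dataclass
class Mode:
    """One preprocessing mode; needs_merge is an assumed flag set by the planner's user."""
    name: str
    apply: Callable[[Rows], Rows]
    needs_merge: bool = False  # True if per-node results must be combined afterwards


def plan_stages(modes: List[Mode]) -> List[List[Mode]]:
    """Group consecutive modes that a node can run without intermediate transfer."""
    stages: List[List[Mode]] = []
    current: List[Mode] = []
    for index, mode in enumerate(modes):
        current.append(mode)
        is_last = index == len(modes) - 1
        if mode.needs_merge or is_last:
            stages.append(current)
            current = []
    return stages


def run_stage_on_node(partition: Rows, stage: List[Mode]) -> Rows:
    """Apply every mode of a stage in order, keeping intermediate rows in memory."""
    rows = partition
    for mode in stage:
        rows = mode.apply(rows)
    return rows


modes = [
    Mode("drop_missing", lambda rows: [r for r in rows if r["value"] is not None]),
    Mode("scale", lambda rows: [{**r, "value": r["value"] * 10} for r in rows]),
    Mode("sort", lambda rows: sorted(rows, key=lambda r: r["value"]), needs_merge=True),
    Mode("tag", lambda rows: [{**r, "ok": True} for r in rows]),
]

stages = plan_stages(modes)
stage_names = [[m.name for m in stage] for stage in stages]
print(stage_names)  # [['drop_missing', 'scale', 'sort'], ['tag']]

node_output = run_stage_on_node(
    [{"value": 2}, {"value": None}, {"value": 1}], stages[0]
)
print(node_output)  # [{'value': 10}, {'value': 20}]
```

In this sketch the first three modes run on the node in a single pass, so only the output of `sort` would need to leave the node before `tag` runs.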

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to a data preprocessing method, system and device in a data mining system.

Background

[0002] Data mining is the process of extracting potentially useful information and knowledge hidden in large amounts of incomplete, noisy, fuzzy and random real-world application data. The data mining process usually includes data loading, data preprocessing (ETL), data mining algorithm implementation, and result display. Among these, ETL (Extraction-Transformation-Loading) accounts for more than 60% of the workload in the data mining process.

[0003] ETL is responsible for extracting data from distributed and heterogeneous data sources, such as relational data and flat data files, to a temporary middle layer for cleaning, conversion, and integration, and finally loading them into data warehouses or data marts to become online analysis ...

Application Information

IPC(8): G06F17/30
Inventors: 高丹, 徐萌, 邓超, 郭磊涛, 罗治国, 周文辉, 孙少陵, 陶涛, 何鸿凌, 来晓阳
Owner: CHINA MOBILE COMM GRP CO LTD