Data preprocessing method for heterogeneous data sources

Data preprocessing for heterogeneous data sources, applied in the field of data processing. The technology addresses the lack of ease of use and scalability, the difficulty of modifying processing rules, and the lack of versatility in existing tools. It improves data processing efficiency and accuracy, is easy to extend, and is general-purpose.

Active Publication Date: 2017-02-01
GUANGDONG JINGAO INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

The existing tools all provide some data preprocessing functions, but they also have significant limitations: (1) lack of versatility: for example, DTS applies only to the Windows platform and can connect to data sources only through ODBC; (2) lack of ease of use and scalability: although OWB handles the preprocessing of names and addresses well, its process is too cumbersome and difficult to use, and it is difficult for users to write their own customized programs to adapt to data pr...



Examples


Detailed Description of the Embodiments

[0037] The invention is described in detail below with reference to Figures 1-4.

[0038] Figure 1 shows a data preprocessing method for heterogeneous data sources according to the present invention. The method includes the following steps:

[0039] S1: Read heterogeneous data from multiple heterogeneous data sources;

[0040] S2: Preprocess the heterogeneous data based on the preprocessing rule base to obtain normalized data; and

[0041] S3: Store the normalized data in a database for enterprise data integration, data mining and/or online analytical processing.
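
Steps S1 to S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two sources (a CSV string and a JSON string), the field names `case_id` and `name`, the contents of `RULE_BASE`, and the `cases` table are all hypothetical and do not come from the source text.

```python
import csv
import io
import json
import sqlite3

# Hypothetical rule base: maps a field name to a normalizing function.
RULE_BASE = {
    "name": lambda v: " ".join(str(v).split()).title(),
    "case_id": lambda v: str(v).strip().upper(),
}

def read_sources(csv_text, json_text):
    """S1: read records from two differently formatted sources."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    records.extend(json.loads(json_text))
    return records

def preprocess(records, rule_base):
    """S2: apply each rule to its field, producing normalized records."""
    normalized = []
    for record in records:
        clean = {}
        for field, rule in rule_base.items():
            clean[field] = rule(record.get(field, ""))
        normalized.append(clean)
    return normalized

def store(records, conn):
    """S3: store normalized records in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS cases (case_id TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO cases (case_id, name) VALUES (:case_id, :name)", records
    )
    conn.commit()

def run_pipeline(csv_text, json_text, conn):
    """Chain S1, S2 and S3 and return the normalized records."""
    records = read_sources(csv_text, json_text)
    normalized = preprocess(records, RULE_BASE)
    store(normalized, conn)
    return normalized
```

Keeping the rules in a separate mapping rather than in the pipeline code is one way to make them easy to modify, which is the property the method emphasizes.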

[0042] The heterogeneous data is political and legal business data stored in the information processing systems of public security, procuratorate, court, judicial and/or prison departments. Through the public security anti-terrorism cooperation platform, a political and legal business data sharing platform spanning political and legal departments at all levels is built to achieve the goals of information exchange, resource sharing, safety...


Abstract

The invention provides a data preprocessing method for heterogeneous data sources. The method comprises the following steps: reading heterogeneous data from a plurality of heterogeneous data sources; preprocessing the heterogeneous data based on a preprocessing rule base to obtain normalized data; and storing the normalized data in a database for enterprise data integration, data mining and/or online analytical processing. The method allows political and legal business data to be shared, is highly general, and is easy to extend. The data undergoes three progressive preprocessing passes and the processing can be traced, so processing rules are easy to modify and data processing efficiency and accuracy are improved. Extraction rules can be modified based on the error log, and the data can be stored in a unified way to provide services externally.
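
The idea of progressive, traceable passes with an error log can be sketched as below. The source does not say what each of the three passes does, so the stages here (`clean`, `convert`, `validate`), the `age` field, and the log layout are placeholders chosen only for illustration.

```python
def clean(record):
    # Pass 1 (placeholder): trim whitespace from every string value.
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def convert(record):
    # Pass 2 (placeholder): convert the hypothetical "age" field to an integer.
    converted = dict(record)
    converted["age"] = int(record["age"])
    return converted

def validate(record):
    # Pass 3 (placeholder): reject values outside a plausible range.
    if not 0 <= record["age"] <= 150:
        raise ValueError(f"age out of range: {record['age']}")
    return record

PASSES = [("clean", clean), ("convert", convert), ("validate", validate)]

def run_passes(records):
    """Run each record through the passes in order, keeping a trace of every
    intermediate result and an error log for records that fail a pass."""
    results, trace, error_log = [], [], []
    for index, record in enumerate(records):
        current = record
        try:
            for name, step in PASSES:
                current = step(current)
                trace.append((index, name, current))
        except (KeyError, ValueError) as exc:
            # Record which pass failed so the matching rule can be revised.
            error_log.append({"index": index, "pass": name, "error": str(exc)})
            continue
        results.append(current)
    return results, trace, error_log
```

Because each log entry names the pass that failed, a maintainer can see which rule to adjust without rerunning the earlier passes by hand.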

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a data preprocessing method for heterogeneous data sources.

Background art

[0002] When building an information system, even with good design and planning, there is no guarantee that the quality of the stored data will meet the user's requirements in all cases. Metadata is needed to represent data quality. Four indicators (data consistency, correctness, completeness and minimality) have been defined formally. Based on the extent to which an information system satisfies these indicators, data quality analysis and modeling in data engineering have been proposed, and many candidate data quality measurement indicators are believed to exist. Users should choose among them according to the needs of the application. Indicators are divided into two categories: data quality indicators and data quality parameters. The former is objec...


Application Information

IPC(8): G06F17/30
CPC: G06F16/90
Inventors: 李志敏, 梁柏超, 贺文锋
Owner: GUANGDONG JINGAO INFORMATION TECH CO LTD