Data information consistency processing method, system and device based on big data

A data information consistency technology applied in the field of big data, which addresses problems such as long execution time and complicated processing.

Status: Inactive
Publication Date: 2017-10-03
BEIJING HONGMA MEDIA CULTURE DEV
Cites: 3; Cited by: 11

AI Technical Summary

Problems solved by technology

During data cleaning, checking the consistency of data from each data source requires judging the uniqueness of records based on a combination of multiple fields in each table. The process is complicated and the execution time is too long.
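To make the problem concrete, here is a minimal sketch of a uniqueness check over a combination of fields. It is an illustration only: the source does not show the baseline implementation, and the function name `find_duplicates_by_fields` and the dictionary row layout are hypothetical.

```python
def find_duplicates_by_fields(rows, fields):
    """Return the indices of rows whose combination of `fields`
    repeats the combination of an earlier row."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        # The composite key is the tuple of all chosen field values.
        key = tuple(row[f] for f in fields)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates
```

A check like this has to be written per table, with the field combination chosen by hand for each data source, and it only matches values that are already formatted identically.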

Examples

Embodiment 1

[0056] Figure 1 shows a flow chart of an embodiment of the big-data-based data information consistency processing method provided by the present invention. The method includes steps S110 to S160.

[0057] In step S110, the business primary key of at least one data table to be processed is obtained.

[0058] In step S120, the business primary key is converted into a unified standard format to generate a verification code.

[0059] In step S130, the Hamming distance algorithm is used to determine the similarity of the verification code data.

[0060] In step S140, the identification codes of the verification code data are sequentially generated by using the pigeonhole principle (drawer principle) algorithm.

[0061] In step S150, the first identification code is compared with each subsequent identification code. When a subsequent identification code is the same as the first identification code, the distinguishing code of that subsequent identification code is recorded as the secon...
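
The steps above can be sketched in Python as follows. This is an illustrative reading, not the patent's implementation: the excerpt does not specify the standard format, how the verification code is computed, or how the pigeonhole principle produces the identification codes. The sketch assumes a trimmed, lowercased, `|`-joined key; a 64-bit verification code taken from SHA-256; and identification codes formed by splitting the verification code into `max_distance + 1` blocks, so that two codes differing in at most `max_distance` bits must agree on at least one block. All function names, and the values 1 and 2 for the distinguishing codes, are hypothetical.

```python
import hashlib

CODE_BITS = 64
FIRST, SECOND = 1, 2  # hypothetical values for the distinguishing codes


def normalize_key(fields):
    """S120 (assumed format): trim, lowercase and join the key fields."""
    return "|".join(str(f).strip().lower() for f in fields)


def verification_code(fields):
    """S120 (assumed): 64-bit integer from SHA-256 of the normalized key."""
    digest = hashlib.sha256(normalize_key(fields).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hamming(a, b):
    """S130: number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def identification_codes(code, max_distance):
    """S140 (assumed): split the code into max_distance + 1 blocks.
    Pigeonhole: at most max_distance differing bits cannot touch
    every block, so similar codes share at least one block."""
    blocks = max_distance + 1
    width = -(-CODE_BITS // blocks)  # ceiling division
    mask = (1 << width) - 1
    return [(i, (code >> (i * width)) & mask) for i in range(blocks)]


def mark_duplicates(codes, max_distance=3):
    """S150: the first occurrence gets FIRST; a later code that shares a
    block with a kept code and lies within max_distance gets SECOND."""
    buckets = {}  # (block index, block value) -> kept codes
    marks = []
    for code in codes:
        ids = identification_codes(code, max_distance)
        candidates = {c for key in ids for c in buckets.get(key, ())}
        if any(hamming(code, c) <= max_distance for c in candidates):
            marks.append(SECOND)
        else:
            marks.append(FIRST)
            for key in ids:
                buckets.setdefault(key, []).append(code)
    return marks


def deduplicate(rows, key_columns, max_distance=0):
    """S110 to S160 end to end: drop every row marked SECOND."""
    codes = [verification_code([row[c] for c in key_columns]) for row in rows]
    marks = mark_duplicates(codes, max_distance)
    return [row for row, mark in zip(rows, marks) if mark == FIRST]
```

Calling `deduplicate(rows, ["name", "date"])` keeps the first row for each normalized key. Because SHA-256 scatters similar inputs, a nonzero `max_distance` is only meaningful here when `mark_duplicates` is given codes from a similarity-preserving fingerprint, which this sketch does not provide.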

Embodiment 2

[0089] Figure 2 shows a structural block diagram of an embodiment of a big-data-based data information consistency processing system 200 provided by the present invention. The system includes:

[0090] An acquisition module 21, configured to acquire the business primary key of at least one data table to be processed;

[0091] A conversion module 22, configured to convert the business primary key into a unified standard format to generate a verification code;

[0092] A determining module 23, configured to determine the similarity of the verification code data by using the Hamming distance algorithm;

[0093] A generating module 24, configured to sequentially generate the identification codes of the verification code data by using the pigeonhole principle (drawer principle) algorithm;

[0094] A comparison module 25, configured to compare the first identification code with each subsequent identification code; when the subsequent identification code is the same as the first identifica...
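
One way to picture this module structure is a small class in which modules 21 to 24 are injected as functions and module 25 is the method that uses them. This is a hypothetical wiring for illustration: the class name `ConsistencySystem`, the field names, the callable signatures and the values 1 and 2 for the distinguishing codes are all assumptions, and the source does not describe how the modules are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence


@dataclass
class ConsistencySystem:
    acquire: Callable[[dict], str]          # module 21: row -> business primary key
    convert: Callable[[str], int]           # module 22: key -> verification code
    is_similar: Callable[[int, int], bool]  # module 23: Hamming-based similarity
    generate: Callable[[int], Hashable]     # module 24: code -> identification code

    def compare(self, rows: Sequence[dict]) -> List[int]:
        """Module 25: return 1 for a first occurrence and 2 for a row whose
        identification code matches an earlier, similar row."""
        first_seen = {}  # identification code -> verification code of first row
        marks = []
        for row in rows:
            code = self.convert(self.acquire(row))
            ident = self.generate(code)
            earlier = first_seen.get(ident)
            if earlier is not None and self.is_similar(earlier, code):
                marks.append(2)
            else:
                first_seen.setdefault(ident, code)
                marks.append(1)
        return marks
```

Injecting the modules as callables keeps the comparison logic independent of how keys are normalized or how codes are generated, so each module can be replaced or tested separately.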

Embodiment 3

[0113] Figure 3 shows a structural block diagram of an embodiment of a big-data-based data information consistency processing device 300 provided by the present invention. The device includes the system 200 described in Embodiment 2.

[0114] Embodiment 3 of the present invention provides a data information consistency processing device based on big data. The invention obtains the business primary key of at least one data table to be processed; converts the business primary key into a unified standard format to generate a verification code; uses the Hamming distance algorithm to determine the similarity of the verification code data; uses the pigeonhole principle (drawer principle) algorithm to sequentially generate the identification codes of the verification code data; compares the first identification code with each subsequent identification code; and, when a subsequent identification code is the same as the first identification code, record the identification code o...

Abstract

The invention provides a data information consistency processing method, system and device based on big data. The method comprises: obtaining the business primary key of at least one data table to be processed; converting the business primary key into a unified standard format and generating a verification code; determining the similarity of the verification code data by means of a Hamming distance algorithm; sequentially generating identification codes of the verification code data by means of a pigeonhole principle (drawer principle) algorithm; comparing the first identification code with each subsequent identification code, and marking the distinguishing code of a subsequent identification code as a second distinguishing code when that identification code is the same as the first identification code; and deleting the data whose identification code carries the second distinguishing code. When more than one hundred million records spanning multiple rows or multiple columns are processed, considerable processing time is saved and data processing efficiency is improved.

Description

Technical field

[0001] The present invention relates to the technical field of big data, and in particular to a method, system and device for processing data information consistency based on big data.

Background

[0002] With the development of the Internet and the mobile Internet, the continuous growth of data has become a defining feature of the big data era. Enterprises are paying more and more attention to big data and, whether in data storage, computation or application, have invested more manpower and material resources in experimentation and exploration.

[0003] One of the important prerequisites for producing and using big data is data cleaning. Data cleaning refers to the final process of finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. Because the data in the data warehouse is a collection of data oriented to a certain topic, these data are ...

Application Information

IPC(8): G06F17/30, G06F11/08
CPC: G06F16/2365, G06F11/08, G06F16/215
Inventor: 顾喜德
Owner: BEIJING HONGMA MEDIA CULTURE DEV