
Data cleaning method and system

A data cleaning technology, applied in the computer field, that addresses the high cost and low efficiency of manually handling data errors that machines cannot recognize or correct

Status: Inactive
Publication Date: 2017-09-26
SHENZHEN AUDAQUE DATA TECH

AI Technical Summary

Problems solved by technology

At present, most data cleaning work is done by machines, but errors in the data that machines cannot recognize or correct still need to be processed manually. Such errors are usually handled by a small, fixed group of people. For very large data systems, however, cleaning data with fixed personnel is costly and inefficient.

Examples

Embodiment 1

[0021] With reference to Figure 1, the data cleaning method provided in this embodiment includes:

[0022] Step S1: Perform task segmentation on the data that needs to be manually cleaned;

[0023] Step S2: Publish the segmented tasks to the crowdsourcing platform;

[0024] Step S3: Receive the manual cleaning result data returned by the task recipients through the crowdsourcing platform, and integrate the manual cleaning result data with the machine cleaning result data.

[0025] The data cleaning method provided by this embodiment of the present invention divides the data that needs to be manually cleaned into tasks and sends the divided tasks, through the crowdsourcing platform, to a large number of task recipients who are not a fixed group. Completing the data cleaning in this way can improve data cleaning efficiency and reduce costs.
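Steps S1 to S3 can be illustrated with a minimal Python sketch. The text does not say how tasks are segmented, what interface the crowdsourcing platform exposes, or how conflicts are resolved during integration, so everything below is an assumption made for illustration: fixed-size batches for segmentation, a hypothetical in-memory `InMemoryPlatform` standing in for the platform, records shaped as `{"id": ..., "value": ...}`, and a rule that a manual result replaces a machine result with the same id.

```python
from typing import Dict, List

Record = Dict[str, str]  # assumed record shape: {"id": ..., "value": ...}


def segment_tasks(records: List[Record], batch_size: int) -> List[List[Record]]:
    """Step S1 (assumed strategy): split records into fixed-size tasks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


class InMemoryPlatform:
    """Hypothetical stand-in for a crowdsourcing platform."""

    def __init__(self) -> None:
        self.published: List[List[Record]] = []

    def publish(self, task: List[Record]) -> int:
        """Step S2: accept a task and return its position in the queue."""
        self.published.append(task)
        return len(self.published) - 1


def integrate(machine: List[Record], manual: List[Record]) -> List[Record]:
    """Step S3 (assumed rule): merge by id; manual results take precedence."""
    merged = {r["id"]: r for r in machine}
    merged.update({r["id"]: r for r in manual})
    return [merged[key] for key in sorted(merged)]
```

Fixed-size batching is only one possible segmentation strategy. Splitting by error type or by data source would fit the same three-step outline.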

[0026] Preferably, the data that needs to be manually cleaned includes abnormal data submitted by the machine during the data cleaning process. These abnormal data include data...
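As a rough illustration of a machine pass that submits abnormal data for manual cleaning, the sketch below normalizes what it can and sets aside what it cannot. The `age` field, the 0 to 150 range check, and the function name `machine_clean` are hypothetical and not taken from the text, which does not specify how the machine detects abnormal data.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]  # assumed record shape with an "age" field


def machine_clean(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (machine-cleaned records, abnormal records for manual cleaning)."""
    cleaned: List[Record] = []
    abnormal: List[Record] = []
    for rec in records:
        raw = rec.get("age")
        text = raw.strip() if raw is not None else ""
        # Assumed rule: a valid age is a plain ASCII integer no larger than 150.
        if text.isascii() and text.isdigit() and int(text) <= 150:
            cleaned.append({**rec, "age": text})
        else:
            # Missing or unparseable values are left untouched and handed off.
            abnormal.append(rec)
    return cleaned, abnormal
```

The abnormal list is what would then go through task segmentation, while the cleaned list corresponds to the machine cleaning result data.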

Embodiment 2

[0035] With reference to Figure 2, the data cleaning system provided by this embodiment of the present invention includes: task segmentation module 1, used to perform task segmentation on the data that needs to be manually cleaned; task release module 2, used to release the segmented tasks to the crowdsourcing platform; and data integration module 3, used to receive the manual cleaning result data returned by the task recipients through the crowdsourcing platform and to integrate the manual cleaning result data with the machine cleaning result data.

[0036] The data cleaning system provided by this embodiment of the present invention divides the data that needs to be manually cleaned into tasks and sends the divided tasks, through the crowdsourcing platform, to a large number of task recipients who are not a fixed group. Completing the data cleaning in this way can improve data cleaning efficiency and reduce costs.
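One way to picture the three modules is as small, separately testable components. The sketch below is an assumption-laden illustration, not the system as claimed: the class names mirror the module names in the text, but the `batch_size` parameter, the `submit` callable that stands in for the crowdsourcing platform, and the "manual result overwrites machine result" rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]  # assumed record shape: {"id": ..., "value": ...}
Task = List[Record]


@dataclass
class TaskSegmentationModule:
    """Module 1: splits data needing manual cleaning into tasks."""
    batch_size: int = 2  # assumed fixed-size batching

    def segment(self, records: List[Record]) -> List[Task]:
        step = max(1, self.batch_size)
        return [records[i:i + step] for i in range(0, len(records), step)]


@dataclass
class TaskReleaseModule:
    """Module 2: releases tasks to a crowdsourcing platform."""
    submit: Callable[[Task], None]  # hypothetical platform entry point

    def release(self, tasks: List[Task]) -> int:
        for task in tasks:
            self.submit(task)
        return len(tasks)


@dataclass
class DataIntegrationModule:
    """Module 3: merges manual results into the machine results."""
    results: Dict[str, Record] = field(default_factory=dict)  # seeded with machine results

    def receive(self, manual_results: List[Record]) -> None:
        for rec in manual_results:
            self.results[rec["id"]] = rec  # assumed rule: manual result wins

    def integrated(self) -> List[Record]:
        return list(self.results.values())
```

In a quick trial, `TaskReleaseModule(submit=queue.append)` with a plain list `queue` is enough to stand in for the platform. A real deployment would pass a function that calls the platform's actual interface.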

[0037] Preferably, the data that needs to be manually cleaned includes abnormal data submitted by the...

Abstract

The invention provides a data cleaning method and system. The method comprises: performing task segmentation on data that needs to be manually cleaned; publishing the segmented tasks to a crowdsourcing platform; and receiving manual cleaning result data returned by task recipients through the crowdsourcing platform and integrating the manual cleaning result data with machine cleaning result data. With this method and system, data that needs to be manually cleaned is segmented into tasks and the segmented tasks are sent through the crowdsourcing platform to a large number of task recipients who are not a fixed group. Cleaning the data in this way can improve data cleaning efficiency and lower cost.

Description

Technical field

[0001] The invention relates to computer technology, and in particular to a data cleaning method and system.

Background

[0002] Data cleaning refers to finding and correcting identifiable errors in data files, mainly including checking data consistency and handling invalid and missing values. At present, most data cleaning work is done by machines, but errors in the data that machines cannot recognize or correct still need to be processed manually.

[0003] At present, errors in data that machines cannot recognize or correct are usually cleaned by a small, fixed group of people. For very large data systems, however, using fixed personnel to clean data is costly and inefficient.

Summary of the invention

[0004] The technical problem to be solved by the present invention is to provide a data cleaning method and system, which divides the data that needs to be manually cleaned into...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/215
Inventors: 于文渊, 贾西贝
Owner: SHENZHEN AUDAQUE DATA TECH