Method and device for dirty data detection

A dirty data detection technology for data items, applied in digital data information retrieval, electrical digital data processing, special data processing applications, and related fields, with the effect of improving warehousing efficiency and reducing the number of detection steps.

Active Publication Date: 2022-06-28
XIAMEN MEIYA PICO INFORMATION
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] 1) Dirty data detection relies on a single method: data can only be matched against a single template or regular expression, and the meaning of an original data item's attributes cannot be analyzed automatically in order to configure corresponding detection rules;
[0006] 2) The scope of dirty data detection is narrow: detection rules can only be preset for original data items of a specified type, and original data items of an unspecified type cannot be detected. As a result, a large amount of dirty data enters the big data system, which affects the quality of big data service business development;



Examples


Embodiment Construction

[0062] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the relevant invention.

[0063] It should be noted that, where no conflict arises, the embodiments in the present application and the features in those embodiments can be combined with each other. The present application is described in detail below with reference to the accompanying drawings and embodiments.

[0064] Figure 1 shows an exemplary system architecture 100 to which a dirty data detection method according to an embodiment of the present application can be applied.

[0065] As shown in Figure 1, the system architecture 100 may include terminal devices 101, 102, ...


Abstract

The present invention provides a dirty data detection method and device. The attribute type of the original data is normalized and its attribute characteristics are then analyzed, so as to distinguish original data items of a clear type from those of an unspecified type. According to the result of this distinction, the original data is matched with an appropriate dirty data detection scheme. In addition, the original data is classified using different classification methods, and the proportion of dirty data in each category is counted after detection with the matched scheme. The detection scheme in use is then adjusted according to the proportion obtained, and the proportion of dirty data in each category is counted again. Finally, the detection scheme that yields the highest proportion of dirty data is selected as the priority dirty data detection scheme for the same data item. The invention can quickly and accurately identify dirty data in massive amounts of original data, greatly improves the analysis and utilization value of big data, and reduces the construction cost of a big data system.
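The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the alias table, the scheme names (`phone_regex`, `non_empty`), the regular expression, and all function names are hypothetical and do not come from the source.

```python
import re
from typing import Callable, Dict, List

# A detection scheme returns True when a value looks dirty.
Scheme = Callable[[str], bool]

# Hypothetical alias table used to normalize raw attribute names.
ALIASES: Dict[str, str] = {"tel": "phone", "mobile": "phone", "phone_no": "phone"}

# Hypothetical detection schemes, keyed by name.
SCHEMES: Dict[str, Scheme] = {
    "phone_regex": lambda v: re.fullmatch(r"\+?\d{7,15}", v) is None,
    "non_empty": lambda v: v.strip() == "",
}

# Schemes registered for clear (known) types; anything else is "unspecified".
SCHEMES_BY_TYPE: Dict[str, List[str]] = {"phone": ["phone_regex", "non_empty"]}
GENERIC_SCHEMES: List[str] = ["non_empty"]


def normalize_attribute_type(name: str) -> str:
    """Map a raw attribute name to a normalized attribute type."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def candidate_schemes(attr_type: str) -> List[str]:
    """Clear types get their own schemes; unspecified types fall back to generic ones."""
    return SCHEMES_BY_TYPE.get(attr_type, GENERIC_SCHEMES)


def dirty_ratio(values: List[str], scheme: Scheme) -> float:
    """Fraction of values that the scheme flags as dirty."""
    if not values:
        return 0.0
    return sum(1 for v in values if scheme(v)) / len(values)


def select_priority_scheme(values: List[str], candidates: List[str]) -> str:
    """Pick the candidate scheme with the highest dirty-data ratio."""
    ratios = {name: dirty_ratio(values, SCHEMES[name]) for name in candidates}
    return max(ratios, key=ratios.get)


if __name__ == "__main__":
    attr = normalize_attribute_type(" Tel ")
    sample = ["13800138000", "abc", "", "+8613800138000"]
    print(attr, select_priority_scheme(sample, candidate_schemes(attr)))
```

In this sketch the candidate list is restricted to schemes registered for the normalized type, so that a high dirty ratio reflects a stricter applicable rule rather than a rule that does not fit the data at all. How the patent itself constrains the candidates is not stated in the abstract.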

Description

Technical field

[0001] The invention relates to the technical field of computers, in particular to a method and device for dirty data detection.

Background art

[0002] A big data system needs to ingest a large amount of raw data every day. This data is of many types and has weak causal relationships, and includes network logs, pictures, geographical locations, etc. The raw data is generated very quickly and contains a large amount of dirty data, which places very strict requirements on the processing speed of the big data system. Traditional dirty data detection can only rely on detection rules that are manually preset for known types of raw data. However, because raw data is poorly correlated and its attribute types are often unclear, much of it matches no detection rule when it is parsed and warehoused. As a result, a large amount of dirty data enters the big data system, which seriously affects the quality of big data service business development. Ther...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06F16/2458
CPC: G06F16/2462, G06F18/22, G06F18/24
Inventor: 林文楷, 连志阳, 陈文艺, 鄢小征, 魏超, 蓝坤宏
Owner: XIAMEN MEIYA PICO INFORMATION