Intelligent cleaning method for structured data

An intelligent cleaning technology for structured data, applied in the field of data processing. It addresses problems such as file conversion errors, and it improves data maintenance, enables database preservation, and avoids omissions.

Active Publication Date: 2019-11-15
河南开合软件技术有限公司

AI Technical Summary

Problems solved by technology

[0003] The present invention proposes an intelligent cleaning method for structured data that targets the prior-art problem of file conversion errors caused by too many file formats. The method can effectively improve the accuracy and execution efficiency of data conversion.



Examples


Embodiment 1

[0045] As shown in Figure 1, this embodiment of the present invention provides a structured data intelligent cleaning method, including the following steps:

[0046] S101. Obtain data files to be cleaned based on the local file read-write interface and create a file list;

[0047] S102. Merge all the data files to be cleaned into one file to be cleaned;

[0048] S103. Use a hash table to identify the data type and file format contained in the file to be cleaned, and mark the template type to which the identifiable file data belongs;

[0049] S104. Load the file list according to the marked template type, and sequentially perform the data cleaning steps on the file data: header identification, data verification, format screening, and duplicate checking;

[0050] S105. Enter the cleaned data into the database one by one using the SQL querier.
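
Steps S101 to S105 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the input files are assumed to be CSV text held in memory, the `TEMPLATES` dictionary stands in for the hash table of S103, and the template names, the `cleaned` table, and the function `clean_files` are all hypothetical. SQLite is used only because it ships with Python; the source does not name a database.

```python
import csv
import io
import sqlite3

# Hypothetical hash table: normalized header row -> template type (S103).
TEMPLATES = {
    ("account", "amount", "date"): "bank_bill",
    ("caller", "callee", "duration"): "call_list",
}


def clean_files(files, conn):
    """Run S101-S105 over {file name: CSV text}; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cleaned ("
        "id INTEGER PRIMARY KEY, template TEXT, payload TEXT)"
    )
    file_list = sorted(files)  # S101: build the file list
    merged = []  # S102: merge every file into one row stream
    for name in file_list:
        merged.extend(csv.reader(io.StringIO(files[name])))

    template, width, seen, stored = None, 0, set(), 0
    for raw in merged:
        row = tuple(cell.strip() for cell in raw)
        key = tuple(cell.lower() for cell in row)
        if key in TEMPLATES:  # S103/S104: header identification
            template, width = TEMPLATES[key], len(key)
            continue
        if template is None or len(row) != width:  # format screening
            continue
        if not all(row):  # data verification: reject rows with empty cells
            continue
        if (template, row) in seen:  # duplicate checking
            continue
        seen.add((template, row))
        conn.execute(  # S105: insert the cleaned rows one by one
            "INSERT INTO cleaned (template, payload) VALUES (?, ?)",
            (template, "|".join(row)),
        )
        stored += 1
    conn.commit()
    return stored
```

Calling `clean_files` with a dictionary of file names to CSV text and a `sqlite3` connection stores one record per surviving row. Rows that appear before any recognized header are dropped, since their template type cannot be identified.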

[0051] The steps in the above method are described and illustrated in detail below.

[0052] Understandably, the soluti...

Embodiment 2

[0087] Based on the foregoing embodiment, and as shown in Figure 2, Embodiment 2 of the present invention also provides a structured data intelligent cleaning system whose main components include a file read-write interface, a multi-file merge module, a screening and verification module, a deduplication module, and a database.

[0088] The user can import a single file or multiple files to be cleaned through the local file read-write interface. A single file is sent directly to the screening and verification module for cleaning; multiple files are first merged into one file to be cleaned by the multi-file merge module, and then sent to the screening and verification module for cleaning. In this process, the screening and verification module can obtain task parameters for the file data, obtain the main data file among the multiple files, and fill the main data file with other files of the same type as the array to...
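
The single-file versus multi-file routing described above might look like the sketch below. It assumes each imported file is already available as a list of text lines; the function names `merge_files` and `route_to_cleaning` are hypothetical, and the handling of the main data file is left out because the source text is truncated at that point.

```python
from typing import List


def merge_files(files: List[List[str]]) -> List[str]:
    """Hypothetical multi-file merge module: concatenate the lines of every file."""
    merged: List[str] = []
    for lines in files:
        merged.extend(lines)
    return merged


def route_to_cleaning(files: List[List[str]]) -> List[str]:
    """Return the single file to be cleaned that is handed to screening and verification."""
    if not files:
        raise ValueError("no files were imported")
    if len(files) == 1:
        # A single file bypasses the merge module.
        return list(files[0])
    # Multiple files are merged into one file to be cleaned first.
    return merge_files(files)
```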

Embodiment 3

[0097] Based on the foregoing embodiment, and as shown in Figure 3, Embodiment 3 of the present invention also provides a specific hardware structure of a structured data intelligent cleaning device. The structured data intelligent cleaning device 3 may include a memory 32 and a processor 33, and the components are coupled together through a communication bus 31. Understandably, the communication bus 31 is used to realize connection and communication between these components. In addition to the data bus, the communication bus 31 also includes a power bus, a control bus and a status signal bus. For clarity, however, the various buses are all denoted as communication bus 31 in Figure 3.

[0098] The memory 32 is used to store the structured data intelligent cleaning method program that can be run on the processor 33;

[0099] Processor 33, configured to perform the following steps when running the structured data intelligent cleaning method program:

[0100] St...


Abstract

The invention discloses an intelligent cleaning method for structured data, applied to the technical field of data processing. The intelligent cleaning method comprises the following steps: acquiring the to-be-cleaned data files based on a local file read-write interface, and establishing a file list; merging all the to-be-cleaned data files into one to-be-cleaned file; identifying the data type and file format contained in the to-be-cleaned file by using a hash table, and marking the template type to which the identifiable file data belongs; loading the file list according to the marked template type, and sequentially carrying out header identification, data verification, format screening and duplicate checking on the file data; and inputting the cleaned data into a database one by one by using an SQL querier. According to the intelligent cleaning method, the workload of manual secondary input in the multi-file data cleaning process can be effectively reduced, and the data cleaning efficiency is remarkably improved.

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an intelligent cleaning method for structured data.

Background technique

[0002] Currently, public security organs, procuratorial organs, disciplinary committees and other investigation departments have massive query requirements for data such as bills, call lists, whereabouts, and tax invoices. Generally, these data are retrieved directly through the internal database interface of the agency system to generate data files for people to consult. Because the data sources are uncontrollable, that is, they may differ greatly in technical architecture, version and operating environment, the formats of the generated electronic data files differ. Files in different formats therefore have to be cleaned with various file editors or converters before they can be used. In the process of data cleaning, there are often phenomena including c...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/242
CPC: G06F16/215, G06F16/2433
Inventor: 王国俊, 吴东贤, 王广峰
Owner: 河南开合软件技术有限公司