
A method for intelligent cleaning of structured data

An intelligent cleaning technology for structured data, applied in the field of data processing, that addresses problems such as file conversion errors, improves data maintainability, facilitates automatic processing of information, and speeds up data comparison.

Active Publication Date: 2022-04-29
河南开合软件技术有限公司

AI Technical Summary

Problems solved by technology

[0003] The present invention proposes an intelligent cleaning method for structured data. It targets the file conversion errors that arise in the prior art from the large number of file formats, and can effectively improve the accuracy and execution efficiency of data conversion.



Examples


Embodiment 1

[0045] Referring to Figure 1, this embodiment of the present invention provides an intelligent cleaning method for structured data, including the following steps:

[0046] S101. Obtain data files to be cleaned based on the local file read-write interface and create a file list;

[0047] S102. Merge all the data files to be cleaned into one file to be cleaned;

[0048] S103. Use a hash table to identify the data type and file format contained in the file to be cleaned, and mark the template type to which the identifiable file data belongs;

[0049] S104. Load the file list according to the marked template type, and sequentially perform data cleaning processing of header identification, data verification, format screening, and duplicate checking on the file data;

[0050] S105. Enter the cleaned data into the database one by one using the SQL queryer.
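
Steps S103 to S105 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the template names, header signatures, column names, and screening rules are hypothetical, a Python dict stands in for the hash table, and the standard-library SQLite module stands in for the database and the SQL queryer.

```python
import sqlite3

# Hypothetical hash table: a set of header names maps to a template type.
TEMPLATES = {
    frozenset({"account", "date", "amount"}): "bill",
    frozenset({"caller", "callee", "start_time"}): "call_list",
}


def identify_template(header):
    """S103: look up the template type for a header row; None if unrecognized."""
    key = frozenset(name.strip().lower() for name in header)
    return TEMPLATES.get(key)


def clean_rows(rows):
    """S104: header identification, verification, format screening, duplicate check."""
    header, *records = rows
    template = identify_template(header)
    if template is None:
        raise ValueError(f"unrecognized header: {header}")
    columns = [name.strip().lower() for name in header]
    seen = set()  # hash set used for duplicate checking
    cleaned = []
    for record in records:
        if len(record) != len(columns):  # data verification: wrong field count
            continue
        values = tuple(field.strip() for field in record)
        if any(value == "" for value in values):  # format screening: empty field
            continue
        if values in seen:  # duplicate checking
            continue
        seen.add(values)
        cleaned.append(values)
    return template, columns, cleaned


def store(conn, template, columns, cleaned):
    """S105: insert the cleaned rows one by one with parameterized SQL."""
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{template}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    for values in cleaned:
        conn.execute(f'INSERT INTO "{template}" VALUES ({placeholders})', values)
    conn.commit()


rows = [
    ["Account", "Date", "Amount"],
    ["A001", "2021-01-05", "120.50"],
    ["A001", "2021-01-05", "120.50"],  # exact duplicate, dropped
    ["A002", "2021-01-06", ""],        # empty field, dropped
    ["A003", "2021-01-07"],            # wrong field count, dropped
    ["A004", "2021-01-08", "75.00"],
]
template, columns, cleaned = clean_rows(rows)
conn = sqlite3.connect(":memory:")
store(conn, template, columns, cleaned)
stored = conn.execute('SELECT * FROM "bill"').fetchall()
```

Table and column names are interpolated into the SQL only because they come from the fixed `TEMPLATES` table; the row values themselves are always passed as parameters.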

[0051] The steps in the above method are described and illustrated in detail below.

[0052] Understandably, the soluti...

Embodiment 2

[0087] Based on the foregoing embodiment and referring to Figure 2, Embodiment 2 of the present invention also provides an intelligent cleaning system for structured data. Its main components include a file read-write interface, a multi-file merge module, a screening and verification module, a deduplication module, and a database.

[0088] The user can import a single file or multiple files to be cleaned through the local file read-write interface. A single file is sent directly to the screening and verification module for cleaning; multiple files are first merged into one file to be cleaned by the multi-file merge module and then sent to the screening and verification module for cleaning. In this process, the screening and verification module can obtain task parameters for the file data, obtain the main data file in the multi-file, and fill the main data file with other files of the same type as the array to...
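
The routing between the file interface, the multi-file merge module, and the screening and verification module might look like the sketch below. It assumes CSV input in which every file of the same type shares the header of the first (main) file; the function names `read_rows`, `merge_files`, and `import_files` are hypothetical, and in-memory text streams stand in for local files.

```python
import csv
import io


def read_rows(stream):
    """Read one CSV stream into a list of rows (the first row is the header)."""
    return [row for row in csv.reader(stream) if row]


def merge_files(streams):
    """Multi-file merge module: keep the first header, append all data rows."""
    merged = []
    for index, stream in enumerate(streams):
        rows = read_rows(stream)
        if not rows:
            continue  # skip empty files
        if not merged:
            merged.extend(rows)      # main file: header plus data
        elif rows[0] == merged[0]:
            merged.extend(rows[1:])  # same type: skip the repeated header
        else:
            raise ValueError(f"file {index} has a different header")
    return merged


def import_files(streams):
    """File interface: a single file passes straight through; several are merged."""
    if len(streams) == 1:
        return read_rows(streams[0])
    return merge_files(streams)


file_a = io.StringIO("account,date,amount\nA001,2021-01-05,120.50\n")
file_b = io.StringIO("account,date,amount\nA004,2021-01-08,75.00\n")
to_clean = import_files([file_a, file_b])
```

Rejecting a file whose header differs is one possible policy; the source text does not say how mismatched files are handled.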

Embodiment 3

[0097] Based on the foregoing embodiment and referring to Figure 3, Embodiment 3 of the present invention also provides a specific hardware structure for a structured data intelligent cleaning device. The structured data intelligent cleaning device 3 may include a memory 32 and a processor 33, coupled together through a communication bus 31. The communication bus 31 provides connection and communication between these components. In addition to a data bus, the communication bus 31 also includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all denoted as communication bus 31 in Figure 3.

[0098] The memory 32 is used to store the structured data intelligent cleaning method program that can be run on the processor 33;

[0099] Processor 33, configured to perform the following steps when running the structured data intelligent cleaning method program:

[0100] St...


Abstract

The invention discloses a method for intelligent cleaning of structured data, which is applied in the technical field of data processing and comprises the following steps: acquiring the data files to be cleaned based on a local file read-write interface and establishing a file list; merging all data files to be cleaned into one file to be cleaned; using a hash table to identify the data type and file format contained in the file to be cleaned, and marking the template type to which the identifiable file data belongs; loading the file list according to the marked template type, and sequentially performing header identification, data verification, format screening, and duplicate checking on the file data; and using the SQL queryer to enter the cleaned data into the database one by one. The invention can effectively reduce the workload of manual secondary entry in the multi-file data cleaning process and significantly improve data cleaning efficiency.

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an intelligent cleaning method for structured data.

Background

[0002] Currently, public security organs, procuratorial organs, disciplinary committees, and other investigation departments have massive query requirements for data such as bills, call lists, whereabouts, and tax invoices. Generally, these data are retrieved directly through the internal database interface of the agency system to generate data files for people to consult. Because the data source is uncontrollable, that is, data sources may differ greatly in technical architecture, version, and operating environment, the resulting electronic data files come in different formats. Files in these different formats must then be cleaned with various file editors or converters before they can be used. In the process of data cleaning, there are often phenomena including c...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G06F16/242
CPC: G06F16/215, G06F16/2433
Inventor: 王国俊, 吴东贤, 王广峰
Owner: 河南开合软件技术有限公司