Data cleaning method and device

A data cleaning technology, applied in electrical digital data processing, special data processing applications, digital data information retrieval, etc., that addresses the problems of over-reliance on cleaning rules, normal samples being mistakenly treated as abnormal, and inaccurate identification of abnormal samples.

Pending Publication Date: 2021-06-08
CHINA CONSTRUCTION BANK
AI Technical Summary

Problems solved by technology

[0003] In the process of realizing the present invention, the inventors found at least the following problems in the prior art: excessive reliance on cleaning rules, inaccurate identification of abnormal samples, and a tendency to mistakenly treat normal samples as abnormal. In addition, cleaning massive numbers of samples with a mechanical, rule-based mechanism is inefficient.

Embodiment Construction

[0030] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0031] It should be noted that, in the case of no conflict, the embodiments of the present invention and the technical features in the embodiments can be combined with each other.

[0032] FIG. 1 is a schematic diagram of the main steps of the data cleaning method according to an embodiment of the present invention.

[0033] As shown in FIG. 1, the data cleani...

Abstract

The invention discloses a data cleaning method and device, and relates to the technical field of automatic program design. A specific embodiment of the method comprises the following steps: acquiring a plurality of initial data samples without label values; inputting each initial data sample into a pre-trained sample classification model to obtain an initial label value of each initial data sample, wherein the initial label value comprises initial normal and initial abnormal, and each initial data sample and the initial label value of the initial data sample form an initial training sample; training a preset data cleaning model according to initial training samples corresponding to the plurality of initial data samples; acquiring to-be-cleaned data; and inputting to-be-cleaned data into the trained data cleaning model to obtain abnormal data in the to-be-cleaned data, and removing the abnormal data. According to the embodiment, the abnormal sample can be accurately identified based on the machine learning model, and the cleaning efficiency is relatively high.
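
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name the model types, so `pretrained_sample_classifier` (a simple range check) and `NearestCentroidCleaner` (a nearest-centroid classifier) are hypothetical stand-ins, and the numeric samples are invented for illustration only.

```python
from statistics import mean

NORMAL, ABNORMAL = "normal", "abnormal"


def pretrained_sample_classifier(sample):
    """Hypothetical stand-in for the pre-trained sample classification model.

    Assumption: a sample is a tuple of numbers, and any value outside
    [0, 100] makes its initial label value "abnormal".
    """
    return NORMAL if all(0 <= v <= 100 for v in sample) else ABNORMAL


class NearestCentroidCleaner:
    """Hypothetical data cleaning model: one centroid per label value."""

    def fit(self, training_samples):
        # training_samples is a list of (sample, initial_label) pairs.
        self.centroids = {}
        for label in {lbl for _, lbl in training_samples}:
            rows = [s for s, lbl in training_samples if lbl == label]
            self.centroids[label] = tuple(mean(col) for col in zip(*rows))
        return self

    def predict(self, sample):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(sample, centroid))

        # Return the label whose centroid is closest to the sample.
        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


def clean(model, data):
    """Keep only the samples the trained model does not flag as abnormal."""
    return [s for s in data if model.predict(s) == NORMAL]


# 1. Acquire initial data samples without label values (invented data).
initial_samples = [(10, 20), (12, 18), (11, 22), (500, 900), (480, 950)]
# 2. Obtain an initial label value for each sample from the pre-trained model.
initial_training = [(s, pretrained_sample_classifier(s)) for s in initial_samples]
# 3. Train the data cleaning model on the initial training samples.
cleaner = NearestCentroidCleaner().fit(initial_training)
# 4. Acquire the data to be cleaned, identify abnormal data, and remove it.
to_clean = [(9, 21), (490, 910), (13, 19)]
cleaned = clean(cleaner, to_clean)
print(cleaned)
```

In this sketch the cleaning model learns only from labels produced by the first model, which mirrors the two-stage structure in the abstract; any real system would substitute its own trained models for both stand-ins.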

Description

Technical field

[0001] The invention relates to the technical field of automatic program design, in particular to a data cleaning method and device.

Background

[0002] Data cleaning is an indispensable link in the data analysis process, and its results directly affect subsequent calculation results and conclusions. At present, data cleaning is generally carried out through the following steps: first, the key elements in the massive data to be cleaned are analyzed and fixed cleaning rules are formulated according to those key elements; then the samples to be cleaned are checked one by one against the established rules, with samples that meet the rules identified as normal samples and samples that do not conform identified as abnormal samples; finally, the abnormal samples are removed and the normal samples are retained.

[0003] In the process of realizing the present invention, the inventors found at least the following problems in the prior art: e...
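
For contrast, the conventional rule-based procedure described in paragraph [0002] might look like the following Python sketch. The field names `account_id` and `amount` and the rules themselves are assumptions made up for illustration; the source does not specify any key elements or rules.

```python
def conforms_to_rules(sample):
    """Hypothetical fixed cleaning rules built from assumed key elements.

    Assumption: a sample is a dict that must have a non-empty
    "account_id" and a non-negative numeric "amount".
    """
    amount = sample.get("amount")
    return (
        isinstance(amount, (int, float))
        and amount >= 0
        and bool(sample.get("account_id"))
    )


def rule_based_clean(samples):
    """Check samples one by one and split them into normal and abnormal."""
    normal, abnormal = [], []
    for sample in samples:
        (normal if conforms_to_rules(sample) else abnormal).append(sample)
    return normal, abnormal


# Invented example data.
samples = [
    {"account_id": "A1", "amount": 120.0},
    {"account_id": "", "amount": 50.0},
    {"account_id": "A3", "amount": -5.0},
]
normal, abnormal = rule_based_clean(samples)
print(len(normal), len(abnormal))
```

Every sample is judged solely by the hand-written rules, which is the dependence on cleaning rules that the background section criticizes.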

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06K9/62, G06N20/00
CPC: G06F16/215, G06N20/00, G06F18/24323, G06F18/214
Inventors: 林楚荣, 朱祖恩, 陈旭明
Owner: CHINA CONSTRUCTION BANK