
Data cleaning method and device based on deep reinforcement learning model

A reinforcement learning and data cleaning technique, applied to data cleaning based on a deep reinforcement learning model, that addresses problems such as noise in training data.

Pending Publication Date: 2021-08-31
INST OF ACOUSTICS CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to remedy the above-mentioned defects of existing data cleaning methods. The present invention proposes a data cleaning method based on a deep reinforcement learning model that addresses the problem of noise in text classification training data. Using reinforcement learning, it discards data with abnormal labels and keeps data with correct labels, thereby achieving the purpose of data cleaning.



Examples


Embodiment Construction

[0094] The present invention is further described below with reference to the accompanying drawings.

[0095] As shown in Figure 1, the present invention provides a data cleaning method based on a deep reinforcement learning model. For a data set used for text classification, the method introduces a deep reinforcement learning model and cleans the data according to the different categories output by the classification network in the deep reinforcement learning model and the actions and action set for the sample data: biased data is removed, valid data is retained, and classification performance is improved. The method includes:

[0096] Define the label set for text classification, and obtain the labeled data set to be cleaned;

[0097] Use a pre-screening algorithm to delete, from the labeled data set to be cleaned, the content-free data, the data whose labels are not in the label set, and the data with contradictory labels, and obtain the data set to be c...
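The pre-screening step in [0096] and [0097] can be illustrated with a minimal Python sketch. The source does not give the algorithm itself, so everything here is an assumption made for illustration: the function name `prescreen`, the representation of samples as `(text, label)` pairs, the reading of "content-free" as empty or whitespace-only text, and the reading of "contradictory labels" as the same text appearing with more than one label.

```python
from collections import defaultdict


def prescreen(samples, label_set):
    """Return the (text, label) pairs that pass three pre-screening checks.

    Hypothetical helper: the checks are one plausible reading of the
    pre-screening step, not the patent's actual algorithm.
    """
    # Checks 1 and 2: drop content-free text (assumed to mean empty or
    # whitespace-only) and samples whose label is not in the label set.
    kept = [
        (text.strip(), label)
        for text, label in samples
        if text and text.strip() and label in label_set
    ]

    # Check 3: drop every text that occurs with more than one label
    # (assumed meaning of "contradictory labels").
    labels_by_text = defaultdict(set)
    for text, label in kept:
        labels_by_text[text].add(label)
    return [(t, y) for t, y in kept if len(labels_by_text[t]) == 1]
```

Under these assumptions, a sample survives only if it has content, carries a label from the defined label set, and is not labeled inconsistently elsewhere in the data set. Dropping all copies of a contradictory text, rather than keeping a majority label, is a design choice of this sketch.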


Abstract

The invention belongs to the technical field of data communication and data processing, and in particular relates to a data cleaning method based on a deep reinforcement learning model. The method comprises: obtaining a labeled data set to be cleaned; using a pre-screening algorithm to delete the content-free data, the data whose labels are not in the label set, and the data with contradictory labels from the labeled data set to be cleaned, to obtain a data set to be classified; inputting the data set to be classified into a pre-trained deep reinforcement learning model to obtain delayed rewards of different types; discarding the biased data, retaining the valid data, and updating a state list S according to the obtained delayed rewards and the action set of the pre-trained deep reinforcement learning model, so as to maximize the delayed reward value of each type; and taking the labeled training data set corresponding to the maximum delayed reward value of each type as the cleaned labeled training data set, thereby completing the data cleaning.
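The idea of choosing keep or discard actions and learning from a reward that only arrives afterwards can be illustrated with a toy Python sketch. It is not the patented model: it swaps the deep network for one logit per sample, uses a single scalar reward instead of one reward per type, and applies a REINFORCE-style policy-gradient update with a running-mean baseline. The names `train_selector` and `reward_fn`, and all hyperparameter values, are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_selector(n_samples, reward_fn, episodes=2000, lr=0.5, seed=0):
    """Learn a keep probability per sample from a delayed episode reward.

    reward_fn takes the list of kept sample indices and returns a float.
    In practice it might be a classifier's held-out score; here it is
    an assumption supplied by the caller.
    """
    rng = random.Random(seed)
    logits = [0.0] * n_samples  # tabular stand-in for a deep policy
    baseline = 0.0              # running mean of episode rewards
    for episode in range(1, episodes + 1):
        probs = [sigmoid(z) for z in logits]
        # Action set: 1 = keep the sample, 0 = discard it.
        actions = [1 if rng.random() < p else 0 for p in probs]
        kept = [i for i, a in enumerate(actions) if a == 1]
        # Delayed reward: known only after every sample has an action.
        reward = reward_fn(kept)
        advantage = reward - baseline
        for i in range(n_samples):
            # REINFORCE gradient of log Bernoulli(action | prob).
            logits[i] += lr * advantage * (actions[i] - probs[i])
        baseline += (reward - baseline) / episode
    return [sigmoid(z) for z in logits]
```

After training, samples whose keep probability exceeds 0.5 would form the cleaned set in this sketch. The baseline is there to reduce the variance of the update, which matters because every sample shares the same episode-level reward.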

Description

Technical field

[0001] The invention belongs to the technical field of data communication and data processing, and in particular relates to a data cleaning method and device based on a deep reinforcement learning model.

Background technique

[0002] With the rapid development of computer technology and communication technology, people can obtain more and more digital information, but at the same time, they need to invest more time in organizing and sorting out the information. In order to alleviate this burden, people began to study the use of computers to automatically classify data.

[0003] In the research of text classification technology, the usual data cleaning method is to use labeled text data to train a deep neural network classifier to achieve the purpose of identifying text categories. In this process, the credibility and validity of the data directly affect the performance of the system. Therefore, it is necessary to clean the data to eliminate abnormal data. A...


Application Information

IPC(8): G06F40/205, G06F40/211, G06N20/00, G06K9/62
CPC: G06N20/00, G06F18/2415, G06F18/241, Y02D10/00
Inventor: 张学君, 林格平, 万辛, 沈亮, 宁珊, 颜永红
Owner INST OF ACOUSTICS CHINESE ACAD OF SCI