Noise reduction device and method for distantly supervised relation classification data sets based on natural language inference

A technology for natural language inference and relation classification data, applied in the field of data processing. It addresses the high computing overhead and limited performance of existing methods, and achieves good results with high computing efficiency.

Pending Publication Date: 2021-12-21
DONGHUA UNIV

AI Technical Summary

Problems solved by technology

Existing methods for handling noise in distantly supervised data are computationally expensive, and their performance needs to be improved.

Embodiment Construction

[0021] The technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and examples.

[0022] A denoising device for distantly supervised relation classification data sets based on natural language inference, comprising the following modules:

[0023] (1) Data format conversion module: constructs a template for each relation in the relation classification data set according to the semantics of that relation, converts the triples in the relation classification data set into hypotheses for natural language inference, and uses the text in the relation classification data set as the premises, thereby constructing the natural language inference training set.

[0024] (2) Natural language inference model training module: When the original data can provide a large amount of supervised data that is correctly labeled and whose distribution is consistent with the full distantly supervised ...
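
The conversion performed by module (1) can be illustrated with a minimal Python sketch. The relation names, template wording, and the class and function names (`RelationExample`, `NLIExample`, `to_nli`) are assumptions made for illustration; the source does not give the actual templates or data layout.

```python
from dataclasses import dataclass

# Hypothetical templates, one per relation. The relation names and the
# wording are illustrative assumptions, not taken from the patent.
TEMPLATES = {
    "founder_of": "{head} is the founder of {tail}.",
    "born_in": "{head} was born in {tail}.",
}


@dataclass(frozen=True)
class RelationExample:
    """One relation classification sample: a text plus a (head, relation, tail) triple."""
    text: str
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class NLIExample:
    """One natural language inference sample."""
    premise: str
    hypothesis: str


def to_nli(example: RelationExample) -> NLIExample:
    """Use the text as the premise and the templated triple as the hypothesis."""
    template = TEMPLATES[example.relation]
    hypothesis = template.format(head=example.head, tail=example.tail)
    return NLIExample(premise=example.text, hypothesis=hypothesis)


sample = RelationExample(
    text="Steve Jobs co-founded Apple in 1976.",
    head="Steve Jobs",
    relation="founder_of",
    tail="Apple",
)
print(to_nli(sample))
```

Keeping the templates in a plain mapping keyed by relation makes it straightforward to add one template per relation, which is what the module description calls for.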

Abstract

The invention discloses a noise reduction device and method for distantly supervised relation classification data sets based on natural language inference. A data format conversion module constructs templates for the various relations in a relation classification data set, converts the triples in the data set into hypotheses for natural language inference, and converts the corresponding text corpora into premises. If high-quality annotated data can be separated out of the original data set, that data is used directly as a training set, and the natural language inference model is trained by supervised learning. The noise reduction effect of the current model on the validation set serves as feedback, and the parameters of the natural language inference model are trained by a reinforcement learning method. The data set noise reduction module uses the trained natural language inference model to score the relation classification data set obtained by distant supervision and, according to the scores, selects the high-confidence data as the denoised data set.
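
The final selection step can be sketched as a simple score-and-threshold filter. This is a sketch under stated assumptions: `toy_entailment_score` is a word-overlap stand-in for the trained natural language inference model, and the `threshold` value is arbitrary; neither comes from the source.

```python
import string
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)


def _tokens(sentence: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    words = {w.strip(string.punctuation).lower() for w in sentence.split()}
    return words - {""}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (an assumption): fraction of hypothesis words found in the premise.

    A real system would return the entailment confidence of a trained NLI model.
    """
    hyp = _tokens(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & _tokens(premise)) / len(hyp)


def denoise(
    pairs: List[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[Pair]:
    """Keep only the pairs whose score reaches the threshold."""
    return [pair for pair in pairs if score_fn(*pair) >= threshold]


candidates = [
    ("Steve Jobs founded Apple in 1976.", "Steve Jobs founded Apple."),
    ("Steve Jobs ate an apple.", "Steve Jobs founded Apple."),
]
kept = denoise(candidates, toy_entailment_score, threshold=0.9)
print(kept)
```

Passing the scorer in as a function keeps the filter independent of how the model was trained, whether by supervised learning or by reinforcement learning.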

Description

Technical field

[0001] The invention belongs to the technical field of data processing methods, and in particular relates to a noise reduction device and method for distantly supervised relation classification data sets based on natural language inference.

Background

[0002] The task of relation classification is to predict the semantic relationship between two entities in a given text. Understanding the relationships between entities is essential for many downstream applications, such as knowledge graph completion and question answering. Relation classification usually relies on large-scale human-annotated data, which is expensive and time-consuming to produce. To address this problem, distant supervision is often used to label a large corpus automatically. Distant supervision is based on the assumption that if a sentence contains an entity pair from the knowledge base, then the entity pair in the sentence can be considered to also have ...
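
To show where the label noise comes from, the following Python sketch applies distant supervision in its commonly described form: a sentence that mentions both entities of a knowledge base pair inherits that pair's relation. The sentence above is truncated in the source, so this labeling rule, the toy knowledge base, and the function name `distant_label` are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Toy knowledge base (illustrative assumption): entity pair -> relation.
KNOWLEDGE_BASE: Dict[Tuple[str, str], str] = {
    ("Steve Jobs", "Apple"): "founder_of",
}


def distant_label(sentence: str, kb: Dict[Tuple[str, str], str]) -> List[Triple]:
    """Label a sentence with every KB relation whose two entities both appear in it."""
    labels = []
    for (head, tail), relation in kb.items():
        if head in sentence and tail in sentence:
            labels.append((head, relation, tail))
    return labels


# The first sentence expresses the relation; the second merely mentions both
# entities, yet it receives the same label. That second label is noise.
print(distant_label("Steve Jobs founded Apple.", KNOWLEDGE_BASE))
print(distant_label("Steve Jobs returned to Apple in 1997.", KNOWLEDGE_BASE))
```

Plain substring matching is used only to keep the sketch short; a practical pipeline would rely on entity linking rather than string containment.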

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/62, G06N5/04, G06F40/295, G06F40/30
CPC: G06N5/041, G06F40/295, G06F40/30, G06F18/214, G06F18/24
Inventors: 徐波, 赵象三, 宋晖
Owner: DONGHUA UNIV