Unsupervised machine reading comprehension method based on large-scale question self-learning

A reading comprehension and self-learning technology, applied in the field of unsupervised machine reading comprehension, which addresses the difficulty of obtaining large amounts of labeled data and achieves the effect of improving accuracy.

Pending Publication Date: 2021-12-24
宏龙科技(杭州)有限公司 and one other applicant

AI Technical Summary

Problems solved by technology

[0003] In many NLP applications, it is very difficult to obtain large amounts of labeled data


Examples


Embodiment

[0031] Example: We use a variety of pre-trained language models (such as GPT-2 and T5) to generate a large number of candidate question-answer pairs from unlabeled passages of in-domain text. This approach allows us to cold-start in a completely new domain. We then pre-train the model on these generated samples and finally fine-tune it on a specific labeled dataset.
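
As a rough illustration of this generation step, the sketch below uses an off-the-shelf T5 question-generation checkpoint; the model name, the <hl> highlight-token prompt format, and the helper function are illustrative assumptions, not the patent's exact setup.

    # Sketch of the question-generation step: a seq2seq model proposes a
    # question for a chosen answer span in an unlabeled in-domain passage.
    # "valhalla/t5-base-qg-hl" is a public question-generation checkpoint
    # used here for illustration only.
    from transformers import pipeline

    qg = pipeline("text2text-generation", model="valhalla/t5-base-qg-hl")

    def generate_qa_pair(passage, answer_span):
        # Highlight the answer span so the model knows what to ask about.
        highlighted = passage.replace(answer_span, f"<hl> {answer_span} <hl>", 1)
        question = qg("generate question: " + highlighted)[0]["generated_text"]
        return {"context": passage, "question": question, "answer": answer_span}

    passage = "BERT was introduced by researchers at Google in 2018."
    print(generate_qa_pair(passage, "2018"))  # e.g. "When was BERT introduced?"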

[0032] Although a model trained on the SQuAD1.1 training set achieves state-of-the-art performance (EM score of 85%) on the SQuAD1.1 dev set, it is unable to reach the same level of inference on a completely new domain, namely NewsQA (EM score of 32%). We have found that when pre-training a model on a synthetic dataset, preventing overfitting is critical, because such datasets often contain many noisy samples. However, these synthetic datasets are very useful in the early stage, when there is little or no in-domain training data, because we can use this method ...
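
One common way to realize the filtering described in step S3 of the abstract is round-trip consistency: a generated pair is kept only when a QA model trained on general-domain data reproduces the generated answer. The sketch below is an illustration under assumptions; the scoring model and the confidence threshold are placeholders, not the patent's specification.

    # Round-trip consistency filter for synthetic QA pairs. Pairs whose
    # answer a general-domain QA model (here a public SQuAD checkpoint)
    # can recover are kept as "high quality"; the rest fall into the
    # low-quality pool (cf. steps S3/S4 in the abstract).
    from transformers import pipeline

    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

    def split_by_quality(samples, threshold=0.5):
        high, low = [], []
        for s in samples:
            pred = qa(question=s["question"], context=s["context"])
            # Exact string match on the answer, gated by model confidence.
            if (pred["score"] >= threshold
                    and pred["answer"].strip() == s["answer"].strip()):
                high.append(s)  # used for continued pre-training (S3)
            else:
                low.append(s)   # re-labeled and mixed with labeled data (S4)
        return high, low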


Abstract

The invention discloses an unsupervised machine reading comprehension method based on large-scale question self-learning. The method first divides the data into four types and then proceeds as follows. S1: train on the unlabeled general data with a standard pre-training model to obtain a pre-trained language model. S2: train on the labeled general data with the pre-trained language model to obtain a question generator and a task-specific general-domain model. S3: use the question generator to generate synthetic in-domain data from the unlabeled in-domain data, filter it with the task-specific general-domain model, and train on the resulting high-quality synthetic in-domain dataset to obtain a new pre-trained model. S4: mix the labeled in-domain data with the low-quality synthetic dataset obtained from the filtering, label the answers, and train with the new pre-trained model to obtain the final model. Based on the final model, input data yields the machine reading comprehension result.
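
Read as a pipeline, the four steps compose roughly as in the sketch below; every callable is a placeholder standing in for one of the patent's components, injected by the caller, not a published API.

    # Schematic composition of steps S1-S4; all callables are placeholders.
    from typing import Callable

    def self_learning_pipeline(
        pretrain: Callable,            # S1/S3: language-model pre-training
        train_question_gen: Callable,  # S2: builds the question generator
        finetune: Callable,            # S2/S4: task-specific fine-tuning
        filter_by_quality: Callable,   # S3: splits synthetic data by quality
        relabel: Callable,             # S4: re-labels answers in low-quality data
        unlabeled_general, labeled_general,
        unlabeled_in_domain, labeled_in_domain,
    ):
        lm = pretrain(unlabeled_general)                             # S1
        question_gen = train_question_gen(lm, labeled_general)       # S2
        general_model = finetune(lm, labeled_general)                # S2
        synthetic = question_gen(unlabeled_in_domain)                # S3: generate
        high_q, low_q = filter_by_quality(general_model, synthetic)  # S3: filter
        new_lm = pretrain(high_q)                                    # S3: re-pretrain
        mixed = labeled_in_domain + relabel(low_q)                   # S4: mix
        return finetune(new_lm, mixed)                               # S4: final model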

Description

Technical Field

[0001] The invention relates to the field of machine reading comprehension, in particular to an unsupervised machine reading comprehension method based on large-scale question self-learning.

Background Technique

[0002] Many state-of-the-art algorithms for natural language processing (NLP) tasks require human-annotated data. At the outset we usually do not have any domain-specific labeled dataset, and annotating a sufficient amount of such data is usually expensive and laborious. Thus, for many NLP applications, even resource-rich languages such as English have labeled data in only a few domains.

[0003] In many NLP applications, obtaining large amounts of labeled data is difficult. Therefore, in many cases, we train a model from a small amount of data. However, such a model often overfits and fails to generalize to unseen data. Therefore, researchers take advantage of large unlabeled datasets by pre-training language models, which of...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F40/211G06F40/253G06F40/295G06F40/58G06N3/04G06N3/08
CPCG06F40/211G06F40/253G06F40/295G06F40/58G06N3/088G06N3/045
Inventor 赵天成
Owner 宏龙科技(杭州)有限公司