
Crowdsourcing text integration method based on multi-stage transfer learning strategy integration

A transfer learning and integration technology, applied in the field of natural language processing. It addresses the lack of ground-truth integrated text and the dependence of existing methods on large amounts of ground-truth data, with the aim of improving generalization performance, improving the quality of crowdsourced text integration, and reducing wasted manpower and material resources.

Active Publication Date: 2022-06-24
NANJING UNIV OF INFORMATION SCI & TECH
5 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

For example, almost all existing deep-learning-based methods use supervised training, which requires a large amount of labeled text. In addition, related methods do not make full use of the information that existing data in related fields can provide to improve crowdsourced text integration.



Examples


Embodiment Construction

[0047] The accompanying drawings constituting a part of the present invention are used to provide further understanding of the present invention, and the exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention.

[0048] This embodiment is based on an improved Transformer text generation model and combines several transfer learning strategies into a single integration framework. It obtains integrated text without relying on ground-truth data in the target domain to train the model, thereby improving the accuracy of crowdsourced text integration. As shown in Figure 1, the method includes the following steps:

[0049] Step 10: improve on the Transformer model to build a customized transfer generative crowdsourced text integration model, TTGCIF, whose structure is shown in Figure 2. The model TTGCIF is im...



Abstract

The invention provides a crowdsourced text integration method based on a combination of multi-stage transfer learning strategies. The method comprises the following steps: 1, constructing a transfer generative crowdsourced text integration model, TTGCIF; 2, obtaining semantic prototypes of the source domain text data set and the target domain text data set; 3, performing word embedding on the semantic prototypes; 4, aligning the data distributions according to the maximum mean discrepancy (MMD); 5, performing semantic prototype transduction model training on TTGCIF; 6, processing the source domain text data set into a training task set; 7, inputting the training task set into TTGCIF for fast domain adaptation model training; and 8, inputting a part of the target domain text data set into TTGCIF for model fine-tuning. Text integration is realized through this process. The method removes the requirement for data labels found in traditional methods, reduces the waste of manpower and material resources, and greatly facilitates crowdsourced text integration in data-scarce scenarios.
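Step 4 aligns the source and target distributions by the maximum mean discrepancy. As an illustration only, the following is a minimal sketch of a biased squared-MMD estimate between two sets of embedding vectors. The abstract does not say which kernel or estimator the invention uses, so the RBF kernel, the `gamma` bandwidth, and the names `rbf_kernel` and `mmd2` are all assumptions introduced here, and the random arrays merely stand in for embedded semantic prototypes.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    """Pairwise RBF kernel exp(-gamma * ||x_i - y_j||^2) (kernel choice is an assumption)."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(source, target, gamma=0.5):
    """Biased estimate of the squared MMD between two samples of shape (n, d) and (m, d)."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for embedded source and target semantic prototypes.
    source_emb = rng.normal(0.0, 1.0, size=(100, 4))
    target_emb = rng.normal(3.0, 1.0, size=(100, 4))
    print("squared MMD:", mmd2(source_emb, target_emb))
```

In a training setting, a quantity like this would typically be added to the loss so that minimizing it pulls the two embedding distributions together; how TTGCIF actually uses the discrepancy is not described in the text available here.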

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing.

Background

[0002] Many supervised models in natural language processing require labeled text for training. However, sources of labeled text in related fields are scarce. Apart from some standard datasets, new training datasets can only be obtained by having experts label text manually, which requires a lot of manpower and material resources. In a crowdsourcing environment, a large pool of idle manpower is used to label unlabeled text, so a large amount of labeled text can be obtained for training at very low cost. Relying on crowdsourcing to obtain labeled text data has become the main way to obtain training data and labels in machine learning.

[0003] The crowdsourcing model is a model in whic...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/08, G06K9/62, G06F40/284, G06F40/30, G06F40/216
CPC: G06N3/08, G06F40/284, G06F40/30, G06F40/216, G06N3/045, G06N3/048, G06N3/044, G06F18/214, Y02D10/00
Inventor: 荣欢, 于信, 马廷淮
Owner: NANJING UNIV OF INFORMATION SCI & TECH