Chinese question generation method based on a Unilm-optimized language model

A Chinese question generation method based on a Unilm-optimized language model. It addresses the problem that pointer networks perform poorly and is reported to achieve better results.

Pending Publication Date: 2022-04-12
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

Copying from the input lets part of the output come directly from the input, which improves the relevance of the model output to the input. However, because of inherent limitations of recurrent neural networks, pointer networks do not perform well.
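The copying idea mentioned above can be illustrated with a pointer-generator style mixture, which is one common formulation and not necessarily the one used in this patent. The function name `copy_distribution`, the mixing weight `p_gen`, and the toy numbers are assumptions made for illustration, and the sketch assumes every source token is already in the vocabulary.

```python
from typing import List


def copy_distribution(
    p_vocab: List[float],
    attention: List[float],
    source_ids: List[int],
    p_gen: float,
) -> List[float]:
    """Mix a generation distribution with a copy distribution.

    p_vocab    : probability of generating each vocabulary id
    attention  : attention weight on each source position (sums to 1)
    source_ids : vocabulary id of the token at each source position
    p_gen      : probability of generating rather than copying (hypothetical name)
    """
    # Scale the generation distribution by p_gen.
    final = [p_gen * p for p in p_vocab]
    # Add the copy mass: each source position contributes its attention weight
    # to the vocabulary id of the token found at that position.
    for weight, token_id in zip(attention, source_ids):
        final[token_id] += (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    # Toy example: vocabulary of 3 ids, source sentence made of id 2 twice.
    print(copy_distribution([0.5, 0.3, 0.2], [0.6, 0.4], [2, 2], 0.5))
```

Because tokens that appear in the source receive extra probability mass, the output is pulled toward words of the input sentence, which is the effect described above.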

Examples

Detailed Description of Embodiments

[0035] The Chinese question generation method based on the Unilm-optimized language model is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0036] Step 1: Pre-train the BERT model on a target-domain corpus crawled from the web, then transfer the parameters to the Unilm language model that uses the seq2seq mask matrix.
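The seq2seq mask matrix of Unilm (also translated as a "cover matrix") can be sketched as follows. This is a minimal illustration of the publicly described Unilm sequence-to-sequence attention pattern, not the patent's own code; the function name `seq2seq_mask` and the 0/1 encoding are assumptions. Source tokens attend to the whole source segment, while target tokens attend to the source segment plus themselves and earlier target tokens.

```python
from typing import List


def seq2seq_mask(src_len: int, tgt_len: int) -> List[List[int]]:
    """Return an n x n matrix where mask[i][j] == 1 means token i may attend to token j."""
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < src_len:
                # Every token, source or target, may see the whole source segment.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # A target token sees itself and earlier target tokens only.
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    # Three source tokens followed by two target tokens.
    for row in seq2seq_mask(3, 2):
        print(row)
```

With this pattern a single Transformer stack behaves like a bidirectional encoder on the source and a left-to-right decoder on the target, which is why parameters pre-trained with BERT can be reused directly.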

[0037] Step 1.1: Acquire the domain pre-training data.

[0038] For domain pre-training, the Transformer blocks of the model are initialized from the base BERT model trained on the Wikipedia corpus. The domain text crawled from the Internet is then split into pairs of consecutive sentences, and the resulting pre-training corpus is fed to the model. Pre-training uses BERT's bidirectional masked language modeling mechanism and its next sentence prediction mechanism. These two mechanisms are used together to optimize the pre-training of the model.
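The data preparation described above, splitting domain text into sentence pairs and masking characters, can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names are hypothetical, negative pairs are drawn from the same text rather than from another document, and every selected character is replaced by `[MASK]` (BERT itself mixes in random and unchanged tokens).

```python
import random
import re
from typing import List, Optional, Tuple


def split_sentences(text: str) -> List[str]:
    # Split after Chinese or ASCII sentence-ending punctuation, keeping the punctuation.
    parts = re.split(r"(?<=[。！？!?])", text)
    return [p.strip() for p in parts if p.strip()]


def make_nsp_pairs(sentences: List[str], rng: random.Random) -> List[Tuple[str, str, int]]:
    """Build (first, second, label) triples; label 1 means `second` really follows `first`."""
    pairs = []
    for i in range(len(sentences) - 1):
        first, true_next = sentences[i], sentences[i + 1]
        # Candidate negatives: any sentence other than the pair itself.
        candidates = [s for k, s in enumerate(sentences) if k not in (i, i + 1)]
        if candidates and rng.random() < 0.5:
            pairs.append((first, rng.choice(candidates), 0))
        else:
            pairs.append((first, true_next, 1))
    return pairs


def mask_chars(
    sentence: str, rng: random.Random, mask_prob: float = 0.15
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace each character by [MASK] with probability mask_prob; labels keep the originals."""
    tokens = list(sentence)
    labels: List[Optional[str]] = [None] * len(tokens)
    for idx, ch in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[idx] = ch
            tokens[idx] = "[MASK]"
    return tokens, labels


if __name__ == "__main__":
    rng = random.Random(0)
    sents = split_sentences("今天下雨。明天晴！后天呢？")
    for first, second, label in make_nsp_pairs(sents, rng):
        print(label, mask_chars(first, rng)[0], mask_chars(second, rng)[0])
```

Passing an explicit `random.Random` instance keeps the corpus construction reproducible across runs.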

[0039] Speci...

Abstract

The invention discloses a Chinese question generation method based on a Unilm-optimized language model. The method sets a relative position mask matrix when the relative position information of each single character and of the domain vocabulary is added to the Unilm model. Integrating this matrix into the model lets it learn more positional relations and also yields better results when generating questions from target-domain input. A copy mechanism is also used, so that part of the output can be copied from the original sentence, which improves the relevance of the output sentence to the original sentence. To enlarge the training data, a strategy combining back translation and entity word replacement is used for data augmentation. Domain pre-training is also applied to strengthen the model's inference ability in a particular domain. With these three strategies applied on the same question-answer data set, the proposed model achieves better results.
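The entity word replacement half of the data augmentation strategy can be sketched as follows. This is an illustrative assumption about how such a replacement might be done, not the patent's procedure: the function name `replace_entities`, the sample fields, and the tiny lexicon are all hypothetical. Each entity found in a sample is swapped for another entity of the same type, consistently across context, question, and answer.

```python
import random
import re
from typing import Dict, List


def replace_entities(
    sample: Dict[str, str], lexicon: Dict[str, List[str]], rng: random.Random
) -> Dict[str, str]:
    """Return a copy of `sample` with known entity words swapped for same-type alternatives."""
    mapping: Dict[str, str] = {}
    for words in lexicon.values():
        for word in words:
            if any(word in text for text in sample.values()):
                alternatives = [w for w in words if w != word]
                if alternatives:
                    mapping[word] = rng.choice(alternatives)
    if not mapping:
        return dict(sample)
    # Replace in a single pass so a newly inserted entity is never replaced again;
    # longer entity names are tried first so they win over their substrings.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    return {k: pattern.sub(lambda m: mapping[m.group(0)], v) for k, v in sample.items()}


if __name__ == "__main__":
    lexicon = {"city": ["西安", "北京"]}  # hypothetical entity lexicon
    sample = {"context": "小明在西安上学。", "question": "小明在哪里上学？", "answer": "西安"}
    print(replace_entities(sample, lexicon, random.Random(0)))
```

Applying the same mapping to every field keeps the answer consistent with the rewritten context, so the augmented triple remains a valid training example.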

Description

Technical field

[0001] The invention belongs to the technical field of question generation in Chinese natural language processing, and provides a Chinese question generation method based on a Unilm-optimized language model.

Background

[0002] In recent years, the explosive growth of information technology and the Internet has produced a large amount of information, and the availability of computers and large amounts of data has in turn spurred the development of artificial intelligence. Natural language processing is one of the fastest-developing and most widely applied areas. Common applications include intelligent dialogue systems, machine translation, spam filtering, information extraction, text sentiment analysis, and personalized recommendation.

[0003] In the field of natural language processing, question generation (QG) in intelligent question answering systems is one of the hot topics. The intelligent question answering...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F40/126, G06F40/211, G06F40/247, G06F40/295, G06F40/58, G06N3/04, G06N3/08
CPC: Y02D10/00
Inventors: 朱磊, 皎玖圆, 张亚玲, 姬文江, 晁冰, 苗文青
Owner: XIAN UNIV OF TECH