Method for constructing coreference resolution model, coreference resolution method and medium

Coreference resolution model technology, applied in the fields of instruments, electric digital data processing, computing, etc. It addresses the information loss that prevents existing methods from judging long-distance coreference relationships, and improves coreference judgment performance.

Active Publication Date: 2020-11-20
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites: 5; Cited by: 6

AI Technical Summary

Problems solved by technology

There are two commonly used methods for generating word vectors. The first queries a dictionary to obtain ordinary (static) word vectors, for example GoogleNews-vectors-negative300.bin, a dictionary of 300-dimensional word vectors that Google pre-trained with word2vec on a news corpus. Word vectors obtained this way do not sufficiently capture long-distance context information. When coreference is judged on top of these vectors, the information that long-distance coreference judgments depend on is lost, so long-distance coreference cannot be judged. The second method feeds the word into a pre-trained model, which generates the word's vector.
The pre-trained model commonly used in coreference resolution is BERT. BERT supports at most 512 tokens (including words and punctuation); longer input must be split (truncated) into segments. The long-term dependence on information in earlier segments is then lost, so this method also extracts long-distance context information insufficiently and cannot judge coreference relationships that span more than one segment.
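To make the segmentation problem concrete, here is a minimal sketch, assuming a plain non-overlapping split at the 512-token limit and ignoring both BERT's subword tokenization and special tokens such as [CLS] and [SEP]. The helper names (`split_into_segments`, `segment_of`, `same_window`) are hypothetical. The sketch checks whether an antecedent and a later mention land in the same segment, that is, whether an encoder that sees one segment at a time can relate them directly.

```python
from typing import List

# The 512-token limit stated above; this sketch ignores special tokens
# such as [CLS]/[SEP] and subword tokenization.
MAX_LEN = 512


def split_into_segments(tokens: List[str], max_len: int = MAX_LEN) -> List[List[str]]:
    """Split a token list into consecutive, non-overlapping segments of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def segment_of(position: int, max_len: int = MAX_LEN) -> int:
    """Index of the segment containing the token at `position`."""
    return position // max_len


def same_window(antecedent_pos: int, mention_pos: int, max_len: int = MAX_LEN) -> bool:
    """True if both tokens fall in the same segment, so a per-segment encoder sees them together."""
    return segment_of(antecedent_pos, max_len) == segment_of(mention_pos, max_len)


if __name__ == "__main__":
    # Hypothetical 1,200-token document: an entity appears at position 10,
    # a pronoun referring to it appears at position 700.
    doc = [f"tok{i}" for i in range(1200)]
    print([len(s) for s in split_into_segments(doc)])  # [512, 512, 176]
    print(same_window(10, 700))  # False: the pair straddles a segment boundary
    print(same_window(10, 300))  # True
```

Real systems often use overlapping windows to soften this boundary effect, but any fixed window still leaves some pairs split across segments.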
Although existing coreference resolution approaches have achieved practical results in some application fields, they cannot judge long-distance coreference relationships and are limited to coreference resolution within local contexts.



Examples


Embodiment Construction

[0025] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0026] As mentioned in the Background section, existing coreference resolution approaches cannot judge long-distance coreference relationships: word vectors obtained either from an ordinary word vector dictionary or from the BERT model restrict coreference resolution to the local context. The coreference resolution model constructed by the present invention uses a pre-trained XLNet model to extract the word vectors used for subsequent coreference judgments, and the two-stream attention mechanism and segment recurrence mechanism of the XLNet model can obtain long-distance ...
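The segment recurrence idea can be illustrated with a toy sketch. It is not XLNet's implementation: it assumes a single attention layer with no learned query/key/value projections, no relative positional encoding and no two-stream attention, and it caches the previous segment's input states as memory (for a single layer, those are the states of the layer below). The function names are hypothetical. What it shows is that queries in one segment can place attention weight on cached states from the previous segment, which is how a segment-recurrent model reaches context beyond one segment.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_memory(
    segment: List[Vector], memory: Optional[List[Vector]]
) -> Tuple[List[Vector], List[List[float]]]:
    """Queries come from the current segment; keys/values are the cached
    memory from the previous segment followed by the current segment."""
    context = (memory or []) + segment
    scale = 1.0 / math.sqrt(len(segment[0]))
    outputs, weights = [], []
    for q in segment:
        w = softmax([dot(q, k) * scale for k in context])
        # Weighted sum of context vectors (values equal keys: no projections).
        out = [sum(wi * v[d] for wi, v in zip(w, context)) for d in range(len(q))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def encode_segments(segments: List[List[Vector]], mem_len: int):
    """Process segments left to right, carrying the last mem_len states of
    the previous segment as memory (segment-level recurrence)."""
    memory: Optional[List[Vector]] = None
    results = []
    for seg in segments:
        outputs, weights = attend_with_memory(seg, memory)
        results.append((outputs, weights))
        # Single layer, so the cached states are this segment's inputs.
        # In training they would be treated as constants (no gradient).
        memory = seg[-mem_len:] if mem_len > 0 else None
    return results


if __name__ == "__main__":
    segs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]]
    res = encode_segments(segs, mem_len=2)
    print([len(w) for w in res[0][1]])  # [2, 2]: first segment has no memory
    print([len(w) for w in res[1][1]])  # [4, 4]: 2 cached + 2 current positions
```

With `mem_len=2`, every query in the second segment attends over four positions instead of two; stacking layers lets this reach grow further back, which is the property the paragraph attributes to XLNet.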



Abstract

The embodiment of the invention provides a method for constructing a coreference resolution model, a coreference resolution method and a medium. The method comprises the following steps: A1, constructing an initial coreference resolution model comprising a pre-training module, a text span vector module, a mention judgment module and a coreference judgment module, wherein the pre-training module adopts a pre-trained XLNet model, the mention judgment module comprises a first feedforward network, and the coreference judgment module comprises a second feedforward network; A2, obtaining a training data set comprising a plurality of sentences with manually annotated coreference relationships; and A3, training the initial coreference resolution model for multiple rounds on the training data set until convergence to obtain the coreference resolution model. The technical solution provided by the embodiment of the invention improves the performance of coreference resolution in judging coreference relationships with long-distance dependencies.
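The module layout in step A1 can be sketched as follows. Much of this is assumption, since the abstract only names the modules: XLNet is replaced by a list of random vectors, the span representation `[start; end; mean]`, the pair features `[g_i; g_j; g_i * g_j]` and the score `s_m(i) + s_m(j) + s_a(i, j)` are borrowed from the end-to-end span-ranking coreference model of Lee et al. (2017), and the feedforward weights are random rather than trained (step A3 is not shown). All names (`FeedForward`, `span_vector`, `CorefSketch`) are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class FeedForward:
    """Two-layer feedforward network: input -> ReLU hidden layer -> scalar score."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, x: Vector) -> float:
        h = relu([sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.w1, self.b1)])
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2


def span_vector(token_vecs: List[Vector], start: int, end: int) -> Vector:
    """Assumed span representation: [start token; end token; mean of span tokens], end inclusive."""
    inside = token_vecs[start:end + 1]
    mean = [sum(col) / len(inside) for col in zip(*inside)]
    return token_vecs[start] + token_vecs[end] + mean


class CorefSketch:
    def __init__(self, token_dim: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        span_dim = 3 * token_dim
        self.mention_ffn = FeedForward(span_dim, hidden, rng)   # stands in for the "first feedforward network"
        self.pair_ffn = FeedForward(3 * span_dim, hidden, rng)  # stands in for the "second feedforward network"

    def mention_score(self, g: Vector) -> float:
        """How likely the span is a mention at all."""
        return self.mention_ffn(g)

    def pair_score(self, g_i: Vector, g_j: Vector) -> float:
        """Coreference score for a span pair, using pair features [g_i; g_j; g_i * g_j]."""
        features = g_i + g_j + [a * b for a, b in zip(g_i, g_j)]
        return self.mention_score(g_i) + self.mention_score(g_j) + self.pair_ffn(features)


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in for XLNet token vectors: 10 tokens, 8 dimensions each.
    token_vecs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(10)]
    model = CorefSketch(token_dim=8)
    g1 = span_vector(token_vecs, 0, 1)  # e.g. a two-token name
    g2 = span_vector(token_vecs, 7, 7)  # e.g. a pronoun
    print(len(g1))  # 24
    print(model.pair_score(g1, g2))
```

Training (A2, A3) would fit both networks, and the encoder if fine-tuned, against the annotated coreference links; this sketch only fixes the shapes and the data flow between the four modules.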

Description

Technical Field

[0001] The present invention relates to natural language processing, specifically to coreference resolution, and more specifically to a method for constructing a coreference resolution model, a coreference resolution method and a medium.

Background

[0002] Natural language commonly uses different expressions to refer to the same entity; for example, pronouns often replace names, and abbreviations replace full nouns. Two expressions are coreferent if they have the same referent, that is, if both refer to the same entity. Entities are the names or symbols of things with specific meanings in the text, such as names of people, places and institutions, dates, and proper nouns.

[0003] Coreference resolution refers to the processing of text to identify which references in the text refer to the same entity in the...
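Following the definition in [0002], mentions that share a referent form a coreference chain. A common way to turn pairwise "these two mentions corefer" decisions into chains is union-find; the sketch below assumes the links are already given and uses a hypothetical helper name, `clusters_from_links`. It is a generic post-processing step, not a method stated in this patent.

```python
from typing import Dict, List, Tuple


def clusters_from_links(mentions: List[str], links: List[Tuple[int, int]]) -> List[List[str]]:
    """Group mentions into coreference chains from pairwise links, using union-find."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two chains

    groups: Dict[int, List[str]] = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical mentions: an abbreviation for a full name, a pronoun for a person.
    mentions = ["Chinese Academy of Sciences", "Alice", "CAS", "she"]
    print(clusters_from_links(mentions, [(0, 2), (1, 3)]))
    # [['Chinese Academy of Sciences', 'CAS'], ['Alice', 'she']]
```

Because union-find merges transitively, linking A to B and B to C places all three in one chain even if A and C were never compared directly.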


Application Information

IPC (8): G06F40/289; G06F40/216
CPC: G06F40/289; G06F40/216
Inventors: 郭嘉丰 (Guo Jiafeng), 范意兴 (Fan Yixing), 吴志达 (Wu Zhida), 张儒清 (Zhang Ruqing), 程学旗 (Cheng Xueqi)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI