Chinese zero anaphora resolution method based on LSTM

A Chinese zero anaphora resolution technology, applied in semantic analysis, natural language data processing, special data processing applications, etc., that addresses the low accuracy of existing Chinese zero anaphora resolution and the resulting low accuracy of semantic understanding.

Status: Inactive | Publication Date: 2017-01-04
HARBIN INST OF TECH
Cites: 1 | Cited by: 44

AI Technical Summary

Problems solved by technology

[0010] The purpose of the present invention is to address the low accuracy of existing Chinese zero anaphora resolu...



Examples


Specific Embodiment 1

[0028] Specific embodiment 1: This embodiment is described with reference to Figure 1. The Chinese zero anaphora resolution method based on word vectors and bidirectional LSTM in this embodiment is carried out according to the following steps:

[0029] Step 1. Apply simple preprocessing to each word in the existing text data, then train on the words in the processed text data with the word2vec tool to obtain a word vector dictionary in which each word corresponds to one word vector. (word2vec is open-source software that converts word-segmented text into corresponding vectors through its internal model.)

[0030] Step 2. Use the Chinese portion of the OntoNotes 5.0 corpus, in which the zero anaphors and their antecedents are explicitly annotated for each sentence. For sentence text in which the zero anaphor positions have been annotated, first use a syntactic parsing tool (a tool that converts sentences into a tr...

Specific Embodiment 2

[0036] Specific embodiment 2: This embodiment differs from specific embodiment 1 in that the simple processing of the existing text data in Step 1 is as follows: use a word segmentation program to segment the sentences in the existing text data into words, then remove special characters (such as Greek letters, Russian letters, phonetic symbols, and special symbols), keeping only Chinese characters, English, and punctuation.
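The character filtering described in this embodiment can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `clean_tokens`, the Unicode range used for "Chinese characters", and the punctuation set are all assumptions, and the input is assumed to be already segmented by some word segmentation program. Under this strict reading, digits are removed as well.

```python
import re

# Assumptions (not stated in the source): "Chinese characters" means the
# CJK Unified Ideographs block U+4E00-U+9FFF, "English" means ASCII letters,
# and "punctuation" is limited to the small set listed in the pattern.
_ALLOWED = re.compile(r"[\u4e00-\u9fffA-Za-z，。！？；：、,.!?;:]+")

def clean_tokens(tokens):
    """Strip disallowed characters from each token and drop emptied tokens."""
    cleaned = []
    for token in tokens:
        kept = "".join(_ALLOWED.findall(token))
        if kept:
            cleaned.append(kept)
    return cleaned

# Hypothetical segmented sentence containing Greek, Russian, and a symbol.
print(clean_tokens(["我们", "α", "love", "NLP★", "。", "мир"]))
```

Tokens that consist only of special characters disappear entirely, while mixed tokens keep their allowed characters.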

Specific Embodiment 3

[0037] Specific embodiment 3: This embodiment differs from specific embodiment 1 or 2 in that the antecedent candidate set in Step 2 is processed as follows:

[0038] Set the maximum number of words in the antecedent candidate set to n, 1≤n≤maxW, where maxW represents the maximum number of words in a sentence;

[0039] If the number of words in the antecedent candidate set is less than n, fill it with the symbol * until the number of words is equal to n;

[0040] If the number of words in the antecedent candidate set is greater than n, only the last n words are kept;

[0041] When words are mapped to word vectors, * is mapped to the zero vector.
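The fixed-length rule in [0038] to [0041] can be sketched as below. The names `fit_candidate` and `to_vectors` are hypothetical, and padding on the right is an assumption, since the text does not say on which side * is added. Handling of words missing from the word vector dictionary is not covered by the text and is not handled here.

```python
PAD = "*"  # padding symbol named in [0039]

def fit_candidate(words, n):
    """Return exactly n words: keep the last n, or pad with PAD up to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(words) > n:
        return words[-n:]  # [0040]: only the last n words are kept
    # [0039]: fill with * (right-side padding is an assumption)
    return words + [PAD] * (n - len(words))

def to_vectors(words, dictionary, dim):
    """Map words to vectors; PAD maps to the zero vector, as in [0041]."""
    return [[0.0] * dim if w == PAD else dictionary[w] for w in words]

# Hypothetical two-dimensional word vector dictionary.
vectors = {"小明": [0.1, 0.2], "同学": [0.3, 0.4]}
padded = fit_candidate(["小明", "同学"], 3)
print(padded)
print(to_vectors(padded, vectors, 2))
```

Fixing the length this way gives every candidate the same input shape, which a batch-trained LSTM typically needs.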


Abstract

The invention relates to a Chinese zero anaphora resolution method based on LSTM and aims to solve the problem that existing methods achieve low accuracy on the Chinese zero anaphora resolution task and therefore low accuracy in understanding semantic information. The method comprises the following steps: 1, processing each word in existing text data and training on the words in the processed text data with the word2vec tool, thereby obtaining a word vector dictionary; 2, selecting an antecedent candidate set for each zero anaphor; 3, if a candidate phrase in the current antecedent candidate set of the zero anaphor is a true antecedent of the zero anaphor, treating the training sample as a positive sample, and otherwise treating it as a negative sample; and 4, connecting a Dropout layer to a logistic regression layer, whose value represents the probability that the model's input sample is judged to be a positive sample, and using this value as the model output. The method is applicable to the field of natural language processing.
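Steps 3 and 4 of the abstract can be illustrated with a small pure-Python sketch. It is not the patent's model: the LSTM that would produce the feature vector is omitted, all function names are hypothetical, the weights are made-up values, and "inverted" dropout (rescaling the surviving features) is one common variant chosen here as an assumption.

```python
import math
import random

def label_samples(candidates, true_antecedents):
    """Step 3: label 1 if the candidate is a true antecedent, else 0."""
    gold = set(true_antecedents)
    return [(c, 1 if c in gold else 0) for c in candidates]

def dropout(features, rate, rng, training=True):
    """Inverted dropout: zero each feature with probability `rate` while
    training and rescale survivors; pass features through at inference."""
    if not training or rate == 0.0:
        return list(features)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in features]

def positive_probability(features, weights, bias):
    """Step 4: logistic regression output, the probability that the
    input sample is a positive sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical candidates and an illustrative feature vector.
print(label_samples(["他", "小明"], ["小明"]))
hidden = dropout([0.5, -1.0, 2.0], rate=0.5, rng=random.Random(0))
print(positive_probability(hidden, [0.2, 0.1, 0.4], 0.0))
```

At inference time `training=False` leaves the features unchanged, so the output probability is deterministic.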

Description

Technical field

[0001] The invention relates to a Chinese zero anaphora resolution method based on LSTM.

Background

[0002] Anaphora refers to the use of a referring pronoun in a text to refer back to a previously mentioned linguistic unit. In linguistics, the referring pronoun is called the anaphor, and the object or content referred to is called the antecedent. Anaphora is also a rhetorical term for the phenomenon of referring to the same word, person, or thing repeatedly in a passage or discourse. Anaphora resolution is the process of determining the relationship between anaphors and antecedents, and it is one of the key problems in natural language processing. In natural language, a part that the reader can infer from context is often omitted; the omitted part still fills a syntactic role in its sentence and refers back to a linguistic unit mentioned earlier. This phenomenon is called zero anaphora. generati...


Application Information

IPC(8): G06F17/27
CPC: G06F40/211, G06F40/279, G06F40/30
Inventor: 赵铁军
Owner: HARBIN INST OF TECH