Bayesian word sense disambiguation method based on mass pseudo-data

A word sense disambiguation technology based on pseudo-data, applied in the field of natural language processing. It addresses the time-consuming, labor-intensive acquisition of disambiguation knowledge and the poor disambiguation accuracy of existing methods, alleviates the data sparseness problem, improves accuracy, and has broad development prospects.

Status: Inactive
Publication Date: 2017-11-17

SHANXI UNIV

Cites: 6; Cited by: 4


Problems solved by technology

[0005] The present invention mainly addresses two problems of current word sense disambiguation methods: poor disambiguation accuracy and the time-consuming, laborious acquisition of disambiguation knowledge. To this end, it provides a Bayesian word sense disambiguation method based on mass pseudo-data.



Embodiment Construction

[0027] The specific implementation of the present invention is described below with an example. The sentence "Project One unit One-time ignition success" is the training corpus, the sentence "wind power unit System Analysis Key Technology Research" is the test corpus, and the ambiguous word "unit" in the test corpus is to be disambiguated. The candidate senses of "unit" are "machine" and "personnel".

[0028] The Bayesian word sense disambiguation method based on mass pseudo-data of the present invention comprises the following steps:

[0029] Step 1. Use a dependency parser to analyze the training examples, and collect tuples that have a dependency relationship with the target ambiguous word. The specific operations are as follows:

[0030] Parse the instance syntactically, as shown in Figure 2. This yields the dependency tuples (number, unit) and (unit, ignition). Take the second tuple, (unit, ignition), as an example to illustrate the working principle of...
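The tuple collection in Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parser output is written by hand rather than produced by a real dependency parser, the function name `collect_target_tuples` is hypothetical, and only the pairs (number, unit) and (unit, ignition) come from the text above; the remaining pairs are invented so that the filter has something to discard.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def collect_target_tuples(pairs: List[Pair], target: str) -> List[Pair]:
    """Keep only the dependency pairs in which the target word takes part."""
    return [pair for pair in pairs if target in pair]


# Hand-written stand-in for parser output on the training sentence.
# Only ("number", "unit") and ("unit", "ignition") are taken from the text;
# the other pairs are hypothetical filler.
parsed_pairs: List[Pair] = [
    ("project", "number"),
    ("number", "unit"),
    ("unit", "ignition"),
    ("one-time", "ignition"),
    ("ignition", "success"),
]

target_tuples = collect_target_tuples(parsed_pairs, "unit")
print(target_tuples)
```

In a real pipeline the `parsed_pairs` list would come from whichever dependency parser is in use; the filtering step itself stays the same.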


Abstract

The invention relates to a new Bayesian word sense disambiguation method based on mass pseudo-data. It addresses two problems of current word sense disambiguation methods: poor disambiguation accuracy and the time and labor required to obtain disambiguation knowledge. In the method, a dependency parser syntactically analyzes the training examples that contain ambiguous words in a training corpus, and the tuples that have a dependency relationship with the ambiguous words are collected. A machine translation system is then used to search a machine translation corpus for example sentences that contain these tuples. These steps are repeated, and the retrieved example sentences are added to a pseudo-training corpus. A Bayesian disambiguation model is then trained on the training corpus together with the pseudo-training corpus, and the model decides the senses of the ambiguous words. Starting from only a small amount of manually annotated corpus, the method can effectively alleviate the data sparsity problem of word sense disambiguation and increase disambiguation accuracy, and it has broad development prospects.
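The overall flow (collect tuples, retrieve sentences that contain them, add these to a pseudo-training corpus, train a Bayesian model) can be illustrated with a small sketch. It is not the patented implementation and makes several assumptions that the text does not confirm: retrieval is a plain word-containment check rather than a machine translation system, each retrieved sentence simply inherits the sense label of the seed tuple, the model is a multinomial naive Bayes classifier with add-one smoothing that ignores unseen words, and every sentence except the two quoted in the embodiment is invented. The names `retrieve_pseudo_examples` and `NaiveBayesWSD` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def retrieve_pseudo_examples(corpus, tuple_words, sense):
    """Return (tokens, sense) for each corpus sentence containing every tuple word.

    Assumption: plain containment stands in for the machine-translation search.
    """
    found = []
    for sentence in corpus:
        tokens = sentence.lower().split()
        if all(word in tokens for word in tuple_words):
            found.append((tokens, sense))
    return found


class NaiveBayesWSD:
    """Multinomial naive Bayes over context words, with add-one smoothing."""

    def __init__(self, target):
        self.target = target
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for tokens, sense in examples:
            self.sense_counts[sense] += 1
            for word in tokens:
                if word != self.target:
                    self.word_counts[sense][word] += 1
                    self.vocab.add(word)

    def predict(self, tokens):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior of the sense
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for word in tokens:
                # Words never seen in training are skipped (a design choice).
                if word != self.target and word in self.vocab:
                    score += math.log((self.word_counts[sense][word] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Small manually labeled seed set; the "personnel" sentence is invented.
seed = [
    ("project one unit one-time ignition success".split(), "machine"),
    ("flight unit crew rescued passengers".split(), "personnel"),
]

# Invented unlabeled corpus standing in for the machine translation corpus.
unlabeled_corpus = [
    "generator unit ignition test passed",
    "wind turbine unit ignition control system",
    "airline unit crew schedule announced",
    "stock market closed higher today",
]

pseudo = retrieve_pseudo_examples(unlabeled_corpus, ("unit", "ignition"), "machine")
pseudo += retrieve_pseudo_examples(unlabeled_corpus, ("unit", "crew"), "personnel")

model = NaiveBayesWSD(target="unit")
model.train(seed + pseudo)

test_tokens = "wind power unit system analysis key technology research".split()
prediction = model.predict(test_tokens)
print(prediction)
```

With the seed sentences alone, none of the context words in the test sentence has been seen in training, so the model can only fall back on the sense priors. The retrieved pseudo-examples add context words such as "wind" and "system" to the "machine" sense, which is the kind of data sparsity relief the abstract describes.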

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a new Bayesian word sense disambiguation method based on mass pseudo-data.

Technical background

[0002] Word sense disambiguation (WSD) refers to determining the meaning of a polysemous word in a specific natural-language context, and is a core problem in the field of natural language processing. When a machine processes natural language and an ambiguous word appears in a given context, lexical ambiguity arises; in the current Internet age of "information explosion", the problem is even more serious. Polysemy is common in Chinese and Western languages alike.

[0003] Currently, corpus-based word sense disambiguation methods can be divided into supervised and unsupervised methods. Unsupervised methods do not require training co...


Application Information


IPC(8): G06F17/27

CPC: G06F40/211, G06F40/216, G06F40/247, G06F40/284

Inventor: 杨陟卓, 张虎, 李茹, 谭红叶, 陈千

Owner: SHANXI UNIV
