Bayesian word sense disambiguation method based on synonym expansion

A word sense disambiguation technology based on the synonym thesaurus Cilin (Tongyici Cilin), applied in the field of natural language processing. It addresses the poor disambiguation performance of existing methods and the time-consuming, laborious acquisition of disambiguation knowledge. Its stated effects are improved accuracy and reduced data sparseness, and it is described as having broad development prospects.

Inactive Publication Date: 2017-04-26
SHANXI UNIV

AI Technical Summary

Problems solved by technology

[0005] The present invention mainly addresses problems in current word sense disambiguation methods, such as poor disambiguation performance and the time-consuming, laborious acquisition of disambiguation knowledge, and provides a Bayesian word sense disambiguation method based on synonym expansion.



Examples


Embodiment 1

[0040] Figure 1 is a schematic diagram of the overall method of the present invention. A specific implementation is given below with an example. The sentence "The work of all crew members has been completed" serves as the training corpus, the sentence "The crew personnel did not issue a distress signal" serves as the test corpus, and the ambiguous word "crew" in the test corpus is to be disambiguated.

[0041] The whole implementation process is as follows:

[0042] (1) Expand the context of the training corpus using the synonym thesaurus Cilin to generate a pseudo training corpus.

[0043] The context of the ambiguous sentence is expanded using Cilin to obtain sets of context synonyms, for example "all, all, everything, the whole", "personnel, team members, team members, party members", "tasks, responsibilities, full-time, vocation", and so on. These synonym sets, the ambiguous word "crew", and the sense "personnel" of the ambiguous word together constitute the...
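The expansion step described in [0042] and [0043] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `TOY_THESAURUS` dictionary is a hypothetical stand-in for Cilin filled with illustrative English glosses, and the function names `expand_context` and `pseudo_instances` are assumptions introduced here. Each context word is replaced by every member of its synonym set, and each resulting context keeps the sense label of the original sentence.

```python
from itertools import product

# Hypothetical toy thesaurus standing in for Cilin; the entries are
# illustrative English glosses, not real Cilin data.
TOY_THESAURUS = {
    "all": ["all", "everything", "whole"],
    "members": ["members", "personnel", "staff"],
    "work": ["work", "task", "duty"],
}


def expand_context(context, thesaurus):
    """Return one synonym set per context word (the word itself if unlisted)."""
    return [thesaurus.get(word, [word]) for word in context]


def pseudo_instances(context, sense, thesaurus):
    """Build (context, sense) pairs by substituting synonyms for context words.

    The original context is excluded so that only new, pseudo instances remain.
    """
    synonym_sets = expand_context(context, thesaurus)
    original = tuple(context)
    return [
        (list(combo), sense)
        for combo in product(*synonym_sets)
        if combo != original
    ]


# Context of the ambiguous word in the training sentence, labeled "personnel".
context = ["all", "members", "work", "completed"]
pseudo = pseudo_instances(context, "personnel", TOY_THESAURUS)
print(len(pseudo))  # 3 * 3 * 3 * 1 combinations minus the original = 26
```

Taking the full Cartesian product grows quickly with sentence length, so a real system would likely cap or sample the substitutions; the source does not say how this is handled.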



Abstract

The invention belongs to the technical field of natural language processing methods, and in particular relates to a Bayesian word sense disambiguation method based on synonym expansion. The method mainly addresses problems in current word sense disambiguation methods, such as poor disambiguation performance and the time-consuming, laborious acquisition of disambiguation knowledge. The method comprises the following steps: (1) expanding the context of a training corpus using the Chinese thesaurus Cilin to generate a large number of pseudo training instances; (2) removing noise from these pseudo training instances using a word collocation corpus to obtain a pseudo training corpus; (3) training a Bayesian disambiguation model on the training corpus and the pseudo training corpus together; and (4) inputting a test corpus into the Bayesian disambiguation model, which determines the senses of ambiguous words by jointly using the disambiguation knowledge in the two corpora.
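Steps (2) to (4) of the abstract can be illustrated with a small sketch. It rests on several assumptions that do not come from the source: the collocation corpus is modeled as a set of allowed adjacent word pairs, the model is a naive Bayes classifier with add-alpha smoothing, pseudo instances are down-weighted by a hypothetical `weight` parameter, and the sense "machine" and all example data are invented for illustration. The source does not state how the two corpora are actually combined.

```python
import math
from collections import Counter, defaultdict


def filter_noise(pseudo, collocations):
    """Keep pseudo instances whose adjacent word pairs all occur in the
    collocation set (an assumed stand-in for the word collocation corpus)."""
    return [
        (context, sense)
        for context, sense in pseudo
        if all(pair in collocations for pair in zip(context, context[1:]))
    ]


class NaiveBayesWSD:
    """Naive Bayes sense classifier with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def add(self, instances, weight=1.0):
        """Accumulate (context, sense) instances, scaled by weight."""
        for context, sense in instances:
            self.sense_counts[sense] += weight
            for word in context:
                self.word_counts[sense][word] += weight
                self.vocab.add(word)

    def predict(self, context):
        """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            counts = self.word_counts[sense]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)
            for word in context:
                score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Invented toy data: two senses of the ambiguous word "crew".
real = [
    (["all", "members", "work", "completed"], "personnel"),
    (["generator", "power", "output", "stable"], "machine"),
]
pseudo = [
    (["whole", "staff", "task", "completed"], "personnel"),
    (["whole", "power", "task", "stable"], "personnel"),  # implausible expansion
]
collocations = {("whole", "staff"), ("staff", "task"), ("task", "completed")}

clean = filter_noise(pseudo, collocations)   # step (2)
model = NaiveBayesWSD()
model.add(real)                              # step (3): real corpus
model.add(clean, weight=0.5)                 # step (3): pseudo corpus, down-weighted
prediction = model.predict(["staff", "distress", "signal"])  # step (4)
print(prediction)
```

In this toy setup the word "staff" never occurs in the real training sentence, so the only evidence linking it to the "personnel" sense comes from the filtered pseudo corpus, which is the data-sparseness effect the method targets.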

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing methods, and in particular relates to a Bayesian word sense disambiguation method based on synonym expansion.

[0002] Technical background

[0003] Word sense disambiguation (WSD) refers to determining the meaning of a polysemous word in a specific natural language context, and is a core problem in natural language processing. When a machine processes natural language and an ambiguous word appears in a specific context, lexical ambiguity arises, and in the current Internet age of "information explosion" the problem is even more serious. Polysemy is common in both Chinese and Western languages. Statistical research shows that in a large-scale corpus, the frequency of ambiguous words in both Chinese and English texts reaches about 40%. The ambiguous words with...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/62
CPC: G06F40/247, G06F40/30, G06F18/24155
Inventor: 杨陟卓, 张虎, 李茹, 陈千, 谭红叶
Owner: SHANXI UNIV