A Judgment Method of Semantic Relevance of Words Based on Wikipedia Bidirectional Links

A word semantic relevance judgment method based on Wikipedia bidirectional links, in the field of semantic relevance and Wikipedia technology. It addresses problems that limit the performance of the WOLM model, including the lack of any optimization for disambiguation pages. The method improves accuracy and simplifies the calculation.

Active Publication Date: 2021-06-15
芽米科技(广州)有限公司 (Yami Technology (Guangzhou) Co., Ltd.)
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

However, the WOLM model still has problems that limit its performance.
For example, when calculating the vector weights of concept pages, the WOLM model uses only one-way out-links and ignores the in-links in Wikipedia. It also does not optimize disambiguation pages and simply uses all the out-links on a disambiguation page.
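The following Python sketch shows why in-links matter. It uses a made-up three-page link graph, so the titles and links are hypothetical. The features here are plain unweighted sets rather than the weighted concept vectors WOLM computes.

```python
# Toy link graph (hypothetical data): each page maps to the set of pages it links to.
OUT_LINKS: dict[str, set[str]] = {
    "Jaguar (car)": {"Automobile", "Coventry"},
    "Ford Motor Company": {"Automobile", "Detroit"},
    "Car magazine": {"Jaguar (car)", "Ford Motor Company"},
}


def out_links(page: str) -> set[str]:
    """One-way features: only the pages that `page` links to."""
    return set(OUT_LINKS.get(page, set()))


def in_links(page: str) -> set[str]:
    """Pages that link to `page`, found by inverting OUT_LINKS."""
    return {src for src, targets in OUT_LINKS.items() if page in targets}


def bidirectional_links(page: str) -> set[str]:
    """Bidirectional features: out-links plus in-links."""
    return out_links(page) | in_links(page)


a, b = "Jaguar (car)", "Ford Motor Company"
print(out_links(a) & out_links(b))                      # features shared via out-links only
print(bidirectional_links(a) & bidirectional_links(b))  # also includes the shared in-link
```

With out-links alone, the two pages share only "Automobile". Adding in-links shows that both pages are also linked from "Car magazine". An out-link-only model never sees that evidence.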



Examples


Embodiment Construction

[0048] The present invention will be further described below in conjunction with specific examples, but the protection scope of the present invention is not limited to the following examples.

[0049] As shown in Figure 1, the method of this embodiment is as follows:

[0050] For any two words word1 and word2, semantic relevance is judged from Wikipedia bidirectional links using the following steps:

[0051] S1. Obtain the positioning page of each of the two words in the Wikipedia data repository;

[0052] S2. If the positioning page obtained in step S1 is a content page, that page is itself the meaning page and the word has exactly one meaning page; go to step S3. If the positioning page is a disambiguation page, first apply the disambiguation algorithm to obtain multiple meaning (semantic item) pages, then go to step S3;

[0053] S3. Calculate ...
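Steps S1 and S2 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. The in-memory `REPOSITORY`, the `Page` record, and the helpers `locate`, `disambiguate`, and `meaning_pages` are all hypothetical. Positioning pages are found by exact title lookup. `disambiguate` is only a placeholder, because this excerpt does not specify the disambiguation algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    is_disambiguation: bool = False
    # For a disambiguation page: the candidate meaning pages it lists.
    candidates: list[str] = field(default_factory=list)


# Hypothetical in-memory stand-in for the Wikipedia data repository.
REPOSITORY: dict[str, Page] = {
    "Python": Page("Python", True, ["Python (programming language)", "Pythonidae"]),
    "Python (programming language)": Page("Python (programming language)"),
    "Pythonidae": Page("Pythonidae"),
}


def locate(word: str) -> Page:
    """S1: fetch the positioning page for `word` (exact-title lookup assumed)."""
    return REPOSITORY[word]


def disambiguate(page: Page) -> list[str]:
    """Placeholder for the unspecified disambiguation algorithm: it keeps every
    listed candidate that exists in the repository. A real strategy would rank
    or filter the candidates instead of keeping them all."""
    return [title for title in page.candidates if title in REPOSITORY]


def meaning_pages(word: str) -> list[str]:
    """S1 + S2: return the sequence of meaning pages for `word`."""
    page = locate(word)
    if not page.is_disambiguation:
        return [page.title]  # content page: exactly one meaning page
    return disambiguate(page)  # disambiguation page: possibly several
```

Here `meaning_pages("Pythonidae")` yields one meaning page. `meaning_pages("Python")` goes through the disambiguation branch and yields two.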


Abstract

The invention discloses a method for judging the semantic relevance of words based on Wikipedia bidirectional links. For any two words: S1. Obtain the positioning pages of the two words in the Wikipedia data repository; S2. If a positioning page is a content page, it is the meaning page; go to S3. If it is a disambiguation page, first perform disambiguation processing, then go to S3; S3. Compute the conceptual semantic interpretation of each meaning page of the two words, where the conceptual semantic interpretation is a bidirectional link vector; S4. Compute the cosine between the bidirectional link vectors of each pair of meaning pages of the two words to obtain the semantic relevance of each pair of meaning concepts, and take the maximum value as the semantic relevance of the two words. The invention uses both the in-links and the out-links of Wikipedia pages as page features and builds a feature vector model that describes concept semantics. Combined with a disambiguation strategy based on social awareness, this improves the accuracy of Wikipedia link-based word semantic relevance computation.
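Steps S3 and S4 of the abstract can be illustrated with the following Python sketch. The link data, page titles, and helper names (`OUT_LINKS`, `link_vector`, `cosine`, `word_relevance`) are hypothetical. Binary feature weights are an assumption, because this excerpt does not give the S3 weighting. Each meaning page is represented by the union of its out-links and in-links. Every pair of meaning pages is compared by cosine, and the maximum is taken as the word relevance.

```python
import math
from itertools import product

# Hypothetical link data: out-links per page; in-links are derived by inversion.
OUT_LINKS: dict[str, set[str]] = {
    "Jaguar (car)": {"Automobile", "Coventry"},
    "Jaguar (animal)": {"Felidae", "South America"},
    "Ford Motor Company": {"Automobile", "Detroit"},
    "Car magazine": {"Jaguar (car)", "Ford Motor Company"},
    "Big cat": {"Jaguar (animal)", "Felidae"},
}


def link_vector(page: str) -> dict[str, float]:
    """S3 (sketch): bidirectional link vector over the union of out-links and
    in-links, with binary weights (an assumption of this sketch)."""
    outs = OUT_LINKS.get(page, set())
    ins = {src for src, targets in OUT_LINKS.items() if page in targets}
    return {p: 1.0 for p in outs | ins}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def word_relevance(senses1: list[str], senses2: list[str]) -> float:
    """S4: cosine for every pair of meaning pages; the maximum is the word relevance."""
    return max(
        (cosine(link_vector(a), link_vector(b)) for a, b in product(senses1, senses2)),
        default=0.0,
    )
```

With the toy data, `word_relevance(["Jaguar (car)", "Jaguar (animal)"], ["Ford Motor Company"])` returns about 0.667. The car sense shares two of its three link features with the company page, and the animal sense shares none, so the car sense supplies the maximum.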

Description

Technical field

[0001] The present invention relates to natural language processing within artificial intelligence. More specifically, it relates to a method for judging the semantic relevance of words based on Wikipedia bidirectional links.

Background art

[0002] Wikipedia is a user-editable open text corpus with a collaboratively edited taxonomy, maintained by volunteers around the world. On the one hand, it has a classification structure similar to WordNet. On the other hand, each concept or entry in Wikipedia has a corresponding web document that defines and describes it in detail. Because of its broad coverage and rapid updates, a growing number of researchers have used Wikipedia in recent years to compute lexical relevance. As an encyclopedia, Wikipedia contains many kinds of data, including categories, hierarchies, articles, and links between pages. It has grown rapidly since its launch in 2001. As of October 2, 2017, it covers a total of 299 l...


Application Information

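Patent type and authority: Patent (China)
IPC (8): G06F40/30; G06F40/216; G06F16/30
CPC: G06F40/216; G06F40/30
Inventors: 朱新华 (Zhu Xinhua), 郭青松 (Guo Qingsong), 张兰芳 (Zhang Lanfang), 陈宏朝 (Chen Hongchao)
Owner: 芽米科技(广州)有限公司 (Yami Technology (Guangzhou) Co., Ltd.)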