Entity linking system for resource-poor languages

An entity linking technology for resource-poor languages. It addresses the absence of an entity linking system for such languages and helps ensure that the correct entity is selected.

Active Publication Date: 2015-09-23
XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0007] 2) At present, there is no entity linking system av

Embodiment

[0059] a. Entity reference item acquisition module: identifies the entity reference items in the text that are to be linked to the entity database. An entity reference item is a piece of text to be linked to the entity database, such as the Uyghur word "kechiche" (translation: "all night"; in the rest of the patent text, Uyghur is written in Latin script);

[0060] b. Uyghur preprocessing module: uses a combination of rule-based and statistical methods to perform stemming and part-of-speech tagging on Uyghur words. Stemming splits a word into its stem and affixes and keeps only the stem. Part-of-speech tagging labels each word with its part of speech, such as noun, verb, or adjective. For example, stemming the Uyghur word "kechiche" yields "kech" (translation: "evening");

[0061] ...
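
The stemming step in module b can be illustrated with a minimal sketch. It assumes one possible way to combine rules with statistics: suffix rules propose candidate splits, and corpus counts choose among them. The suffix list, the counts, and the function names are hypothetical and are not taken from the patent, which does not disclose its rules or statistics here.

```python
from collections import Counter

# Hypothetical toy resources (assumptions for illustration only).
SUFFIXES = ["iche", "che", "lar", "ler", "ni"]                 # illustrative affix rules
STEM_COUNTS = Counter({"kech": 12, "kechi": 1, "kitab": 8})    # made-up corpus counts


def candidate_splits(word):
    """Rule step: propose every (stem, affix) split licensed by a suffix rule."""
    splits = [(word, "")]  # leaving the word unsplit is always a candidate
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            splits.append((word[: -len(suffix)], suffix))
    return splits


def stem(word):
    """Statistical step: keep the split whose stem is most frequent in the counts.

    Ties are broken in favour of the longer affix.
    """
    best_stem, _affix = max(
        candidate_splits(word),
        key=lambda split: (STEM_COUNTS[split[0]], len(split[1])),
    )
    return best_stem


print(stem("kechiche"))  # kech
```

With these toy resources, "kechiche" has two rule-licensed splits ("kech" + "iche" and "kechi" + "che"), and the counts favour "kech", matching the example in [0060]. A word that matches no rule is returned unchanged.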

Abstract

The invention relates to an entity linking system for a resource-poor language. The system combines rule-based and statistical methods to perform stemming and part-of-speech tagging on Uyghur words; expands each entity reference item according to its context; aligns Chinese words with Uyghur words using the bilingual alignment techniques of machine translation, so that the rich semantic resources of Chinese can be used to expand the Uyghur terms and obtain candidate entities; ranks the candidate entities by fusing entity context features, text topic features, and concept map features from a knowledge base; and links the entity reference items to the target entities determined by this ranking. The system addresses the entity linking problem for resource-poor languages and provides a practical Uyghur entity linking system. It achieves entity linking for Uyghur, a language with few language resources, and thereby meets the needs of intelligent information processing.
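
The ranking step in the abstract fuses three kinds of features. The sketch below assumes the simplest possible fusion, a weighted sum of per-feature scores. The abstract does not state how the features are combined, so the weights, the `Candidate` class, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    context: float  # entity context feature score, assumed to lie in [0, 1]
    topic: float    # text topic feature score, assumed to lie in [0, 1]
    concept: float  # concept map feature score, assumed to lie in [0, 1]


# Hypothetical weights; in practice these would be tuned or learned.
WEIGHTS = {"context": 0.5, "topic": 0.3, "concept": 0.2}


def fused_score(candidate):
    """Combine the three feature scores into one ranking score (linear fusion)."""
    return (
        WEIGHTS["context"] * candidate.context
        + WEIGHTS["topic"] * candidate.topic
        + WEIGHTS["concept"] * candidate.concept
    )


def rank(candidates):
    """Return the candidates sorted from highest to lowest fused score."""
    return sorted(candidates, key=fused_score, reverse=True)


ranked = rank([
    Candidate("A", context=0.9, topic=0.2, concept=0.1),
    Candidate("B", context=0.4, topic=0.8, concept=0.9),
])
print([c.name for c in ranked])
```

Here candidate B outranks A even though A has the stronger context score, because B's topic and concept scores outweigh it under the assumed weights. This is the kind of trade-off that fusing several features is meant to capture.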

Description

Technical field

[0001] The invention relates to information extraction and knowledge discovery within information technology, and in particular to an entity linking system for resource-poor languages.

Background

[0002] Entity linking, a branch of natural language processing, is the process of linking a given entity reference item to the corresponding entity concept in a knowledge base. It addresses the diversity and ambiguity of natural language: by linking natural-language text to knowledge base entries, it enables reading enhancement, accurate entity-centric information aggregation, knowledge base expansion, and similar applications.

[0003] There are two main approaches to candidate entity discovery. One is based on Wikipedia and uses the hyperlink relations of anchor text, together with disambiguation pages and redirect pages, to obtain candidate entities. The other is based on topic models. ...
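
The Wikipedia-based approach to candidate discovery mentioned in the background can be sketched as a lookup table from surface forms to entity titles. This is a minimal illustration of the general idea, not the patent's method. The function names and the toy data are assumptions, and a real system would extract the three inputs from a Wikipedia dump.

```python
from collections import defaultdict


def build_candidate_index(anchors, redirects, disambiguations):
    """Map each lower-cased surface form to the set of entity titles it may denote.

    anchors:          iterable of (anchor_text, target_title) pairs from hyperlinks
    redirects:        dict mapping a redirect title to its target title
    disambiguations:  dict mapping a disambiguation title to the titles it lists
    """
    index = defaultdict(set)
    for surface, target in anchors:
        index[surface.lower()].add(target)
    for alias, target in redirects.items():
        index[alias.lower()].add(target)
    for title, targets in disambiguations.items():
        index[title.lower()].update(targets)
    return index


def candidates(index, mention):
    """Return the candidate entities for a mention, or an empty list if unknown."""
    return sorted(index.get(mention.lower(), set()))


# Toy data standing in for material extracted from a Wikipedia dump.
index = build_candidate_index(
    anchors=[("Jaguar", "Jaguar (animal)"), ("jaguar", "Jaguar Cars")],
    redirects={"Jag": "Jaguar Cars"},
    disambiguations={"Jaguar": ["Jaguar (animal)", "Jaguar Cars", "SEPECAT Jaguar"]},
)
print(candidates(index, "Jaguar"))
```

This approach depends on a large, densely linked encyclopedia in the target language, which is the kind of resource a resource-poor language lacks.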

Application Information

IPC(8): G06F17/28
Inventors: 蒋同海, 李晓, 马博, 王磊, 周喜, 赵凡, 杨雅婷
Owner: XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI