Entity disambiguation method and system

An entity disambiguation technology, applied in the fields of deep learning and natural language processing, that addresses the problem of low disambiguation accuracy

Active Publication Date: 2020-08-25
AEROSPACE INFORMATION RES INST CAS
Cites: 10; Cited by: 13

AI Technical Summary

Problems solved by technology

[0005] In order to solve the problem of low disambiguation accuracy in existing entity disambiguation ...



Examples


Embodiment 1

[0081] As shown in Figure 1, the present invention provides an entity disambiguation method that mainly includes:

[0082] S1: Determine, based on the reference to be disambiguated, a plurality of mutually independent candidate entities that form a candidate entity set;

[0083] S2: Based on the hyperlink anchor texts in an online encyclopedia corpus, obtain the reference-candidate entity pair information corresponding to each candidate entity in the candidate entity set, as training data. The reference-candidate entity pair information includes the anchor text, the reference and reference context corresponding to the anchor text, the entity page corresponding to the anchor text, and the entity description text;
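The pair extraction in S2 can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the corpus uses MediaWiki-style links (`[[Target page|anchor text]]`). It also assumes that the reference context is a fixed window of words around the anchor, and that entity descriptions are supplied as a dictionary. `MentionPair`, `extract_pairs`, `strip_links`, and the `window` parameter are hypothetical names introduced here.

```python
import re
from dataclasses import dataclass

# Assumption: MediaWiki-style links, either [[Target]] or [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


@dataclass
class MentionPair:
    anchor: str        # visible anchor text, i.e. the reference
    context: str       # words around the anchor (reference context), anchor included
    entity_page: str   # title of the linked page, i.e. the candidate entity
    description: str   # entity description text ("" if unknown)


def strip_links(text: str) -> str:
    """Replace every link with its visible anchor text."""
    return LINK_RE.sub(lambda m: m.group(2) or m.group(1), text)


def extract_pairs(article: str, descriptions: dict[str, str], window: int = 5) -> list[MentionPair]:
    """Turn each hyperlink in an article into one reference-candidate entity pair."""
    pairs = []
    for m in LINK_RE.finditer(article):
        target = m.group(1).strip()
        anchor = (m.group(2) or m.group(1)).strip()
        # Take up to `window` plain-text words on each side of the link.
        left = strip_links(article[:m.start()]).split()[-window:]
        right = strip_links(article[m.end():]).split()[:window]
        context = " ".join(left + [anchor] + right)
        pairs.append(MentionPair(anchor, context, target, descriptions.get(target, "")))
    return pairs
```

In the sentence "The [[Apple Inc.|Apple]] keynote introduced a phone, not an [[Apple|apple]] pie.", the two anchors share the surface form "apple" but link to different pages. This gives one labeled example for each sense.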

[0084] S3: Use a bidirectional long short-term memory (BiLSTM) network to semantically encode the reference context and the entity description text in the training data, and extract and process the key semantic information in the reference context and the entity description text through...
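The mutual-attention idea behind S3 can be sketched in plain Python. The patent text here is truncated, so the exact multi-angle formulation is not given. This sketch assumes dot-product attention in two directions: the context attends to the description, and the description attends to the context. It then max-pools each result. The per-token vectors stand in for BiLSTM outputs. `attend`, `max_pool`, and `mutual_attention` are hypothetical names.

```python
import math

Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: list[Vector], keys: list[Vector]) -> list[Vector]:
    """For each query vector, return the attention-weighted sum of the key vectors.

    Weights are a softmax over dot-product scores, so each output is a convex
    combination of the keys, dominated by the keys most similar to the query.
    """
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out


def max_pool(vectors: list[Vector]) -> Vector:
    """Element-wise maximum over token positions, giving one fixed-size vector."""
    return [max(column) for column in zip(*vectors)]


def mutual_attention(context: list[Vector], description: list[Vector]) -> tuple[Vector, Vector]:
    """Hypothetical two-direction mutual attention between the two texts.

    context, description: per-token encodings (e.g. BiLSTM outputs; here plain lists).
    Returns a pooled context vector informed by the description, and vice versa.
    """
    ctx_to_desc = attend(context, description)  # context tokens read the description
    desc_to_ctx = attend(description, context)  # description tokens read the context
    return max_pool(ctx_to_desc), max_pool(desc_to_ctx)
```

Each direction emphasizes the tokens that correlate strongly with the other text. This is one concrete reading of "extracting key semantic information from different angles"; the patent may define its angles differently.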

Embodiment 2

[0134] To implement the above method, the present invention also provides an entity disambiguation system. As shown in Figure 3, it includes a candidate entity selection module, a training data construction module, and a disambiguation result determination module. Their functions are shown in Figure 4:

[0135] The candidate entity selection module is used to determine, based on the reference to be disambiguated, a plurality of mutually independent candidate entities that form a candidate entity set. Specifically, it obtains from the encyclopedia corpus the entities that have a referential relationship with the reference to be disambiguated, as first candidate entities. It uses a web search engine, or otherwise obtains, some entities that have a referential relationship with the reference to be disambiguated and do not belong to the first candidate entities, as second candidate entities. It then combines the first and second candidate entities to form a candidate entity colle...
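The two-source candidate generation above can be sketched as follows. This is a sketch under stated assumptions. The encyclopedia lookup is modeled as a dictionary from surface form to entity names. The web search engine is modeled as any callable that returns entity names. The `max_second` cap is an illustrative limit not taken from the patent. `build_candidate_set` and its parameters are hypothetical.

```python
from typing import Callable, Iterable


def build_candidate_set(
    reference: str,
    encyclopedia_index: dict[str, list[str]],
    search: Callable[[str], Iterable[str]],
    max_second: int = 5,
) -> list[str]:
    """Combine first (encyclopedia) and second (search) candidates, without duplicates."""
    # First candidate entities: entities the encyclopedia links to this reference.
    # dict.fromkeys removes duplicates while keeping the original order.
    first = list(dict.fromkeys(encyclopedia_index.get(reference, [])))
    seen = set(first)
    # Second candidate entities: search results not already among the first ones.
    second = []
    for entity in search(reference):
        if len(second) >= max_second:
            break
        if entity not in seen:
            seen.add(entity)
            second.append(entity)
    # Each entity appears once, so the candidates stay mutually independent.
    return first + second
```

For example, with an index mapping "Jordan" to ["Michael Jordan", "Jordan (country)"] and a search that returns ["Jordan (country)", "Jordan River"], the result is ["Michael Jordan", "Jordan (country)", "Jordan River"].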


Abstract

The invention discloses an entity disambiguation method and system. The entity disambiguation method comprises the following steps:

- determining, based on a reference to be disambiguated, a plurality of mutually independent candidate entities to form a candidate entity set;
- based on hyperlink anchor texts in an online encyclopedia corpus, obtaining the reference-candidate entity pair information corresponding to each candidate entity as training data;
- semantically encoding the reference context and the entity description text with a bidirectional long short-term memory network;
- extracting and processing the key semantic information in the reference context and the entity description text through a multi-angle attention mechanism;
- determining the disambiguation result from the candidate entities.

Extracting the key semantic information of a text from different angles finds more disambiguation criteria in the text and improves disambiguation precision. The mutual attention layer extracts and emphasizes the information with high cross-correlation in the text, which enriches the semantic features of the representation vectors from different angles. This further improves the accuracy of the reference-candidate entity similarity calculation and the overall disambiguation performance.
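The final step, choosing the disambiguation result by reference-candidate entity similarity, can be sketched as follows. The abstract does not name the similarity function. Cosine similarity is an assumption used here for illustration, and the input vectors stand in for the attention-enriched representations. `cosine` and `disambiguate` are hypothetical names.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def disambiguate(
    reference_vec: list[float], candidates: dict[str, list[float]]
) -> tuple[str, dict[str, float]]:
    """Pick the candidate entity whose representation is most similar to the reference.

    candidates maps each candidate entity name to its representation vector.
    Returns the best entity and the full score table (useful for inspection).
    """
    if not candidates:
        raise ValueError("candidate entity set is empty")
    scores = {name: cosine(reference_vec, vec) for name, vec in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning the whole score table along with the winner makes it easy to add a rejection threshold later, for example to flag references whose best score is too low to trust.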

Description

Technical field

[0001] The invention belongs to the technical field of deep learning and natural language processing, and in particular relates to an entity disambiguation method.

Background art

[0002] With the continuous development of computer science and Internet technology, the amount of information in human society, especially on the Internet, has grown explosively. A large amount of data is stored in web text and electronic documents in the form of natural language. Because natural language is ambiguous and vague, accurately extracting target information from massive text data and understanding and processing text at the semantic level is a major challenge in natural language processing.

[0003] Given a piece of text and the references to be disambiguated in it, the task of entity disambiguation is to link each reference to the correct entity in a knowledge base, thereby eliminating its ambiguity. Entity disambiguation converts sem...


Application Information

IPC(8): G06F40/295; G06F40/30; G06N3/04
CPC: G06F40/295; G06F40/30; G06N3/044; G06N3/045
Inventors: 付琨, 于泓峰, 张文凯, 苏武运, 姚康泽, 王承之, 姚方龙, 李沛光, 田雨
Owner: AEROSPACE INFORMATION RES INST CAS