A method and system for entity disambiguation

An entity disambiguation technology, applied in the fields of deep learning and natural language processing, that addresses the low accuracy of existing disambiguation methods.

Active Publication Date: 2020-12-29
AEROSPACE INFORMATION RES INST CAS

AI Technical Summary

Problems solved by technology

[0005] To address the low disambiguation accuracy of existing entity disambiguation models, the present invention provides an entity disambiguation method.

Examples

Embodiment 1

[0081] The present invention provides an entity disambiguation method which, as shown in Figure 1, mainly includes the following steps:

[0082] S1: based on the reference (mention) to be disambiguated, determine multiple mutually independent candidate entities to form a candidate entity set;

[0083] S2: based on the hyperlink anchor text in the web encyclopedia corpus, obtain the reference-candidate entity pair information corresponding to each candidate entity in the candidate entity set as training data; the reference-candidate entity pair information includes: the anchor text, the reference and reference context corresponding to the anchor text, the entity page corresponding to the anchor text, and the entity description text;
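The construction of training pairs in S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the corpus uses wiki-style link markup of the form [[Entity page|anchor text]], and the names LINK, Pair, and extract_pairs, as well as the descriptions lookup table, are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: links are written as [[Entity page|anchor text]] or [[Entity page]].
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


@dataclass
class Pair:
    anchor_text: str   # surface form shown in the text (the reference)
    context: str       # the sentence with link markup stripped
    entity_page: str   # title of the entity page the link points to
    description: str   # entity description text, "" if none is known


def extract_pairs(sentence, descriptions):
    """Turn every hyperlink in a sentence into one reference-candidate entity pair."""
    # Replace each link with its visible text to get the plain reference context.
    plain = LINK.sub(lambda m: m.group(2) or m.group(1), sentence)
    pairs = []
    for m in LINK.finditer(sentence):
        page = m.group(1).strip()
        anchor = (m.group(2) or m.group(1)).strip()
        pairs.append(Pair(anchor, plain, page, descriptions.get(page, "")))
    return pairs
```

Each hyperlink yields one labelled example for free, because the link target already names the correct entity for that anchor text.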

[0084] S3: use a bidirectional long short-term memory (BiLSTM) network to semantically encode the reference context and entity description text in the training data, and use a multi-angle attention mechanism to extract and process the key semantic information in the reference context and entity description text....
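The attention step in S3 can be illustrated with a generic dot-product cross-attention pass. This is only a sketch under stated assumptions: the token vectors are taken as already produced by an encoder (the BiLSTM itself is not shown), a single attention pass stands in for the multi-angle mechanism whose details are not given here, and the function names softmax, cross_attend, and mean_pool are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys):
    """For each query vector, return the attention-weighted sum of the key vectors.

    queries: token vectors of one text (e.g. the reference context)
    keys:    token vectors of the other text (e.g. the entity description)
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        # Dot-product relevance of every key token to this query token.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of token vectors into one representation vector."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]
```

Calling cross_attend(context_vectors, description_vectors) emphasizes the description tokens most related to each context token; swapping the arguments gives the other direction.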

Embodiment 2

[0134] To implement the above method, the present invention also provides an entity disambiguation system which, as shown in Figure 3, includes a candidate entity selection module, a training data construction module, and a disambiguation result determination module; the functions it implements are shown in Figure 4:

[0135] The candidate entity selection module determines multiple mutually independent candidate entities, based on the reference to be disambiguated, to form a candidate entity set. Specifically, it obtains from the web encyclopedia corpus the entities that have a referential relationship with the reference to be disambiguated as first candidate entities; uses a web search engine to obtain some of the entities that have a referential relationship with the reference to be disambiguated but do not belong to the first candidate entities as second candidate entities; and merges the first candidate entities and the second candidate entities to form a candidate En...
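The merge performed by the candidate entity selection module can be sketched as an order-preserving union. The function name build_candidate_set and the idea of passing both sources as plain lists of entity titles are assumptions made for illustration; how the encyclopedia lookup and the web search are actually performed is not shown.

```python
def build_candidate_set(first_candidates, search_results):
    """Merge encyclopedia candidates with web-search candidates.

    Entities from search_results are kept only if they are not already
    among the first candidates, so every entity appears exactly once.
    """
    merged = list(dict.fromkeys(first_candidates))  # drop repeats, keep order
    seen = set(merged)
    for entity in search_results:
        if entity not in seen:  # second candidates exclude first candidates
            merged.append(entity)
            seen.add(entity)
    return merged
```

For example, merging ["Apple Inc.", "Apple (fruit)"] with search results ["Apple (fruit)", "Apple Records"] adds only "Apple Records" to the set.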

Abstract

The entity disambiguation method and system disclosed in the present invention include: determining a plurality of mutually independent candidate entities, based on the reference to be disambiguated, to form a candidate entity set; using reference-candidate entity pair information as training data; semantically encoding the reference context and the entity description text with a bidirectional long short-term memory (BiLSTM) network; extracting and processing the key semantic information in the reference context and entity description text through a multi-angle attention mechanism; and then determining the disambiguation result from the candidate entities. Extracting the key semantic information of the text from different angles uncovers more disambiguation evidence in the text and improves disambiguation accuracy. By extracting and emphasizing information with high cross-correlation between the texts, the cross-attention layer enriches the semantic features of the representation vectors from different perspectives, further improving both the accuracy of the similarity calculation between references and candidate entities and the disambiguation performance.
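The final step, choosing a disambiguation result by comparing the reference with each candidate entity, can be sketched as a similarity ranking. Cosine similarity is an assumption here, since the text only speaks of a similarity calculation; the representation vectors are taken as given, and the names cosine and disambiguate are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(reference_vec, candidate_vecs):
    """Score every candidate against the reference and return the best one.

    candidate_vecs maps an entity name to its representation vector.
    Returns (best entity name, dict of all scores).
    """
    scores = {name: cosine(reference_vec, vec) for name, vec in candidate_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores
```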

Description

Technical field

[0001] The invention belongs to the technical field of deep learning and natural language processing, and specifically relates to an entity disambiguation method.

Background art

[0002] With the continuous development of computer science and Internet technology, the amount of information in human society, especially on the Internet, has exploded. A large amount of data is stored in web text and electronic documents in the form of natural language. Because natural language is inherently ambiguous, accurately extracting target information from massive text data, and understanding and processing text at the semantic level, is a major challenge in the field of natural language processing.

[0003] Given a piece of text and the references in it that are to be disambiguated, the task of entity disambiguation is to link each reference to the correct entity in the knowledge base, thereby eliminating its ambiguity. Entity disambiguation is an important basic link of ...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/295, G06F40/30, G06N3/04
CPC: G06F40/295, G06F40/30, G06N3/044, G06N3/045
Inventors: 付琨, 于泓峰, 张文凯, 苏武运, 姚康泽, 王承之, 姚方龙, 李沛光, 田雨
Owner: AEROSPACE INFORMATION RES INST CAS