Identification and disambiguation method of long-tail entity

A long-tail entity recognition method, applied in the field of disambiguation, that addresses problems such as poor disambiguation performance and lack of general applicability.

Pending Publication Date: 2020-11-27
GUANGDONG UNIV OF TECH
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] Existing technology offers few methods for identifying and disambiguating long-tail entities. Some existing approaches identify long-tail entities in a specific field, such as scientific publications, through semi-supervised methods: they must first find a corpus for that field and set relevant seeds, then continuously grow the corpus and improve seed quality through expansion and filtering mechanisms in order to identify the long-tail entities of that field. Such approaches are not generally applicable, and their disambiguation performance is poor.


Examples

Embodiment 1

[0119] Long-tail entity disambiguation is performed on a passage of industrial process text, as follows:

[0120] Long-tail entity recognition:

[0121] First, entity recognition is performed on the text with the Stanford NLP tool, identifying all entities in the text.

[0122] All entities are then linked with TagMe. An entity is identified as a long-tail entity if it does not appear among the linking results, cannot be found in Wikipedia, and occurs no more than 10 times in the text.
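The three-part rule in [0122] can be sketched as a simple filter. This is only an illustration under stated assumptions, not the patent's implementation: the function name `find_long_tail_entities`, the `linked` set (standing in for the entities TagMe managed to link), and the `in_wikipedia` lookup (standing in for a Wikipedia search) are all hypothetical, and no real TagMe or Wikipedia API is called.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def find_long_tail_entities(
    mentions: Iterable[str],
    linked: Set[str],
    in_wikipedia: Callable[[str], bool],
    max_count: int = 10,
) -> List[str]:
    """Return entities that satisfy all three conditions of [0122].

    mentions     -- every entity mention found by the recognizer (with repeats)
    linked       -- hypothetical stand-in for entities the linker resolved
    in_wikipedia -- hypothetical stand-in for a Wikipedia lookup
    max_count    -- frequency threshold ("no more than 10 times")
    """
    counts = Counter(mentions)  # occurrences of each entity in the text
    return [
        entity
        for entity, n in counts.items()
        if entity not in linked          # absent from the linking results
        and not in_wikipedia(entity)     # not found in Wikipedia
        and n <= max_count               # rare in the text
    ]


# Toy data: frequent, linkable entities mixed with rare, unlinked ones.
mentions = (
    ["General Electric"] * 12
    + ["PLC for Elevator"] * 2
    + ["CAD of Management Software for Mold"]
    + ["Elevator"] * 3
)
linked = {"General Electric", "Elevator"}
wikipedia_titles = {"General Electric", "Elevator"}

long_tail = find_long_tail_entities(
    mentions, linked, wikipedia_titles.__contains__
)
print(long_tail)
```

With this toy input, only the two rare, unlinked mentions pass the filter. Treating the three conditions as a conjunction follows the wording of [0122]; the threshold is kept as a parameter so that the value 10 can be adjusted for other corpora.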

[0123] Through the above process, some long-tail entities can be obtained as follows:

[0124] (1) "CAD of Management Software for Mold".

[0125] (2) "Programmable Logic Controller for Elevator".

[0126] (3) "PLC for Elevator".

[0127] (4) "Subsidiary Company of General Electric".

[0128] Dependency parsing and part-of-speech analysis are then performed on the long-tail entities above. Taking the first long-tail entity as an example, its dependency parse is as follows: image 3 As sh...

Abstract

The invention discloses a method for identifying and disambiguating long-tail entities. The disambiguation method comprises a step of performing candidate entity replacement on the identified long-tail entities. The method disambiguates long-tail entities accurately and efficiently, markedly improves the understanding of different references in the text, and supports better information tracking and information acquisition.

Description

Technical field

[0001] The present invention relates to the technical field of disambiguation methods.

Background

[0002] In natural language, meaning at the word, sentence, and discourse levels varies with context. Disambiguation is the process of determining the semantics of an object from its context, and it is one of the core problems in natural language understanding.

[0003] Long-tail entities are entities that are mentioned relatively rarely in large text collections. They are often characterized by having no summary, or only a limited one, in general knowledge bases, or by having only scarce resources outside the knowledge base.

[0004] Existing technology offers few methods for identifying and disambiguating long-tail entities. For some existing approaches, such as semi-supervised identification of long-tail entities in a specific field such as scientific publications, it is necessary to first find a corpus of a specific...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/295, G06N20/00, G06F40/211, G06F40/194, G06F40/216
CPC: G06F40/295, G06N20/00, G06F40/194, G06F40/211, G06F40/216
Inventor: 程良伦, 张鸿彬, 王德培, 张伟文
Owner: GUANGDONG UNIV OF TECH