CN-DBpedia-based entity identification and linking system and method

An entity recognition and entity linking technology, applied in special data processing applications, instruments, electrical digital data processing, and similar fields. It addresses the limited context information available in short texts and achieves good results in word segmentation and entity recognition.

Active Publication Date: 2018-09-04
FUDAN UNIV

AI Technical Summary

Problems solved by technology

The present invention addresses entity linking in short texts, where little contextual information is available.



Examples


Embodiment 1

[0056] The invention proposes a CN-DBpedia-based system and method for short-text entity recognition and linking. As shown in Figure 1, the framework of the proposed technical solution comprises an entity linking module and an entity recognition module. The entity linking module includes a synonym matching unit and an entity linking unit; the entity recognition module includes a tokenizer, a word probability calculation unit, and an entity discrimination unit. The synonym matching unit first uses the CN-DBpedia thesaurus to identify candidate entities in the input text sequence, that is, to find every possible entity synonym in the sequence. The probability of each entity corresponding to each entity synonym is then calculated. Finally, the text sequence, the identified candidate entities, and their probabilities are passed to the entity recognition module, and the tokenizer of the entity recognition module wil...
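The candidate-identification step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy `THESAURUS`, the function name `find_candidates`, the `max_len` limit, and the use of normalized counts as entity probabilities are all assumptions made for illustration, since the text does not say how the probabilities are computed.

```python
# Hypothetical toy thesaurus: surface form (synonym) -> {entity name: count}.
# A real system would load these mappings from CN-DBpedia; the counts here
# are invented purely for illustration.
THESAURUS = {
    "复旦": {"复旦大学": 9, "复旦中学": 1},
    "复旦大学": {"复旦大学": 10},
    "上海": {"上海市": 10},
}


def find_candidates(text, thesaurus, max_len=8):
    """Return every span of `text` that matches a synonym in the thesaurus.

    Each result is (start, end, surface, probs), where `probs` maps each
    entity linked to that surface form to a probability. Normalizing the
    counts is an assumption, not the method stated in the source.
    """
    candidates = []
    for start in range(len(text)):
        last = min(len(text), start + max_len)
        for end in range(start + 1, last + 1):
            surface = text[start:end]
            counts = thesaurus.get(surface)
            if not counts:
                continue
            total = sum(counts.values())
            probs = {entity: c / total for entity, c in counts.items()}
            candidates.append((start, end, surface, probs))
    return candidates


if __name__ == "__main__":
    for cand in find_candidates("上海复旦大学", THESAURUS):
        print(cand)
```

Overlapping spans such as "复旦" and "复旦大学" are both kept on purpose: under the linking-first design, choosing between them is left to the later recognition stage.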


Abstract

The present invention discloses a CN-DBpedia-based entity identification and linking system and method. The system comprises an entity linking module and an entity identification module; the entity linking module comprises a synonym matching unit and an entity linking unit; and the entity identification module comprises a tokenizer, a word probability calculation unit, and an entity discriminating unit. According to the technical scheme of the present invention, a semantic relationship between entities and words is constructed, so that the relationship with an entity can be mined even when little context is available. A machine-learning-based entity recognition algorithm is combined with an unsupervised word segmentation algorithm, so that the plausibility of entity name boundaries is judged globally, the vocabulary space of word segmentation is expanded, and the word formation probability of entity words can be calculated with a more reasonable algorithm. Because linking is performed first and identification second, the semantic information of the text is fully used during entity identification, yielding better word segmentation and entity identification.
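The effect of expanding the segmentation vocabulary with entity names can be sketched as follows. This is an illustrative stand-in only: the abstract does not name the segmentation algorithm, so a simple unigram maximum-probability segmentation (dynamic programming) is assumed here, and `segment`, `BASE_VOCAB`, `ENTITY_VOCAB`, and all probability values are hypothetical.

```python
import math


def segment(text, word_prob, unk_prob=1e-6, max_len=8):
    """Split `text` into the word sequence with the highest unigram probability.

    `word_prob` maps words to probabilities. Single characters missing from
    the vocabulary fall back to `unk_prob`, so every text can be segmented.
    """
    n = len(text)
    # best[i] = (best log-probability of text[:i], start index of last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            p = word_prob.get(word)
            if p is None:
                if len(word) > 1:
                    continue  # unknown multi-character strings are not words
                p = unk_prob
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical probabilities, invented for illustration.
BASE_VOCAB = {"上海": 0.02, "复旦": 0.005, "大学": 0.02, "在": 0.05}
ENTITY_VOCAB = {"复旦大学": 0.004}  # entity name with an assumed word formation probability

if __name__ == "__main__":
    print(segment("复旦大学在上海", BASE_VOCAB))
    print(segment("复旦大学在上海", {**BASE_VOCAB, **ENTITY_VOCAB}))
```

With only the base vocabulary the entity name is split into two words; once the entity name is added with its own probability, the globally best segmentation keeps it whole. That is the kind of behavior the abstract attributes to the expanded vocabulary space.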

Description

Technical field

[0001] The invention belongs to the technical field of data services, and in particular relates to a CN-DBpedia-based entity identification and linking system and method.

Background

[0002] The advent of the big data era has brought unprecedented data dividends to the rapid development of artificial intelligence. "Fed" by big data, artificial intelligence technology has made unprecedented progress, most prominently in knowledge engineering, represented by knowledge graphs, and in machine learning, represented by deep learning. As the dividends that big data offers deep learning are exhausted, deep learning models are increasingly approaching the ceiling of their effectiveness. On the other hand, a large number of knowledge graphs continue to emerge, but these treasure houses containing a large amount of human prior knowledge have not been effectively utilized by deep learning. Integrating knowledge graph...


Application Information

IPC(8): G06F17/27
CPC: G06F40/247, G06F40/295, G06F40/30
Inventor: 梁家卿, 陈砺寒, 肖仰华
Owner: FUDAN UNIV