Academic institution name entity alignment method based on text features

A technique for entity alignment of academic institution names, applicable to text database querying, unstructured text data retrieval, and special data processing applications. It addresses the inability to align academic institution entities, insufficient labeled data, and high algorithm implementation complexity, and achieves a good entity alignment effect.

Active Publication Date: 2020-12-01
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0006] In view of the shortcomings, difficulties, and challenges of the prior art, the purpose of the present invention is to provide a text-feature-based entity alignment method for academic institution names that is tailored to the characteristics of such names. The relationships among full names, Chinese and English names, geographic location, and institution are used to solve the problems that the existing technical so

Examples

Embodiment

[0093] Referring to Figure 1, four types of institutional data can be extracted from the English text data source: records containing only an English abbreviation, only an English full name, both an English abbreviation and a full name, and both Chinese and English names, marked ①, ②, ③, and ④ respectively. Two types can be extracted from the Chinese text data source: records containing only a Chinese name and records containing both Chinese and English names, marked ⑤ and ⑥ respectively. For data marked ①, step 1 is used to complete the English full name, and the result is marked ⑦. For data marked ③, ④, and ⑥, step 2 is used to correct the correspondence between the English abbreviation and the full name, and the results are marked ⑧, ⑨, and ⑨ respectively. For data marked ②, ⑦, and ⑧, step 3 is used to translate the English full name into a Chinese name and step 4 is used to correct the Chinese name, and the result is marked ⑨. For data marked ⑤, step 3 is used to translate the Chinese name into the full name ...
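The routing described in this paragraph can be sketched as a small lookup, shown here only as an illustration. The record fields (`source_lang`, `abbr`, `full_en`, `zh`), the function names, and the use of integers 1 to 6 for the marks ① to ⑥ are assumptions made for this sketch, not names from the patent. The rule that a record counts as type ④ or ⑥ when it has a Chinese name plus any English name is also an assumption. The route for type ⑤ stops at step 3 because the source text is cut off at that point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstitutionRecord:
    """Hypothetical record layout; the field names are assumptions."""
    source_lang: str               # "en" or "zh": which text source it came from
    abbr: Optional[str] = None     # English abbreviation
    full_en: Optional[str] = None  # English full name
    zh: Optional[str] = None       # Chinese name


def classify(rec: InstitutionRecord) -> int:
    """Return the data type 1..6, standing in for the marks ① to ⑥."""
    has_english = bool(rec.abbr or rec.full_en)
    if rec.source_lang == "en":
        if rec.zh and has_english:
            return 4  # Chinese and English names
        if rec.abbr and rec.full_en:
            return 3  # abbreviation and full name
        if rec.full_en:
            return 2  # full name only
        if rec.abbr:
            return 1  # abbreviation only
    elif rec.source_lang == "zh":
        if rec.zh and has_english:
            return 6  # Chinese and English names
        if rec.zh:
            return 5  # Chinese name only
    raise ValueError("record does not match any of the six described types")


# Step numbers applied to each type, following the paragraph above:
# ① -> step 1 (gives ⑦) -> steps 3, 4; ③ -> step 2 (gives ⑧) -> steps 3, 4;
# ④ and ⑥ -> step 2; ② -> steps 3, 4; ⑤ -> step 3 (later steps not shown
# because the source text is truncated).
ROUTES = {
    1: [1, 3, 4],
    2: [3, 4],
    3: [2, 3, 4],
    4: [2],
    5: [3],
    6: [2],
}


def plan_steps(rec: InstitutionRecord) -> List[int]:
    """Return the ordered step numbers to run before the final merge step."""
    return list(ROUTES[classify(rec)])
```

For example, `plan_steps(InstitutionRecord("en", abbr="ECNU"))` yields the steps for an abbreviation-only English record: complete the full name, translate it, then correct the Chinese name.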

Abstract

The invention discloses an academic institution name entity alignment method based on text features. The method comprises five steps: converting English abbreviations into English full names; correcting erroneous correspondences between English abbreviations and English full names; completing English full names and Chinese names through translation; correcting wrong Chinese names; and merging academic institutions based on text features. Entity alignment is carried out on academic institution data extracted from Chinese and English text data. Each piece of institution data may contain an English abbreviation, an English full name, a Chinese name, and a geographic position, with some content missing and a small number of errors. By completing the missing data, correcting the erroneous data, and merging the data of the same institution, the method finally obtains the multiple different names that correspond to the same institution. The method combines the text features of institution names with geographic position information to align academic institution names. It needs neither pre-labeled correspondences between institution names nor the contextual semantic information of the names, and it obtains a good entity alignment effect with low complexity.
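As a rough illustration of the final merging step, the sketch below groups records that share a normalized name and the same geographic position. The abstract does not give the merging algorithm, so the union-find grouping, the `normalize` rule, and the record layout (a dict with `names` and `location` keys) are all assumptions chosen for this example and should not be read as the patented method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Assumed text normalization: lowercase, drop periods, collapse spaces."""
    return " ".join(name.lower().replace(".", "").split())


def merge_records(records: List[dict]) -> List[List[str]]:
    """Group records that share a normalized name at the same location.

    Each record is a dict with "names" (list of str) and "location" (str).
    Returns one sorted list of names per merged institution.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Map each (normalized name, location) pair to the first record using it.
    first_seen: Dict[Tuple[str, str], int] = {}
    for idx, rec in enumerate(records):
        for name in rec["names"]:
            key = (normalize(name), rec["location"])
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    # Collect every name variant under the root of its group.
    groups = defaultdict(set)
    for idx, rec in enumerate(records):
        groups[find(idx)].update(rec["names"])
    return [sorted(names) for _, names in sorted(groups.items())]


records = [
    {"names": ["ECNU", "East China Normal University"], "location": "Shanghai"},
    {"names": ["east china normal university", "华东师范大学"], "location": "Shanghai"},
    {"names": ["ECNU"], "location": "Elsewhere"},
]
merged = merge_records(records)
```

In this toy input the first two records merge because they share a full name and a location, while the third stays separate: the same abbreviation at a different location is not treated as the same institution under the assumed rule.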

Description

Technical field

[0001] The technical fields involved in the present invention include entity alignment, entity disambiguation, knowledge graph construction, data preprocessing, and search algorithms. The invention relates in particular to entity alignment for academic institution names and the construction of academic knowledge graphs, and specifically to an academic institution name entity alignment method based on text features.

Background art

[0002] In recent years, with the development of computers and networks and the accumulation of data, more and more electronic data has become available to help computers complete more tasks. To capture the relationships between common items in everyday life, so that computers can learn more knowledge, a knowledge graph can be constructed for real-world entities. Each item corresponds to an entity node in the graph, and each relationship between items corresponds to an edge connecting entities in the graph. This method ...

Application Information

IPC(8): G06F40/295, G06F16/33, G06F16/36
CPC: G06F40/295, G06F16/3344, G06F16/367, Y02D10/00
Inventors: 林欣, 郭晨亮, 李继洲
Owner EAST CHINA NORMAL UNIV