Scholar name duplication disambiguation method and system

A technology for disambiguating scholars with duplicate names, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the scarcity of academic data and information.

Active Publication Date: 2014-10-22
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites: 3; Cited by: 31

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to provide a method and system for disambiguating scholars with duplica...

Embodiment Construction

[0065] Specific embodiments of the present invention are given below, and the present invention is described in detail in conjunction with the drawings.

[0066] The invention recasts the problem of disambiguating scholars with the same name as a collective classification problem. It makes full use of heterogeneous academic network data to extract features describing each scholar, including not only the scholar's homepage, email address, and affiliation, but also the titles, keywords, abstracts, and associated journals and conferences of published papers. A classification model is obtained from a manually annotated training data set by supervised learning, and, based on that model, an iterative classification algorithm is used to disambiguate scholars with duplicate names. The collective classification method effectively solves the problem of information sparsity, so it has a high accuracy and recall ra...
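The feature extraction and supervised training described above can be illustrated with a minimal sketch. It assumes the classifier operates on pairs of documents and decides whether both belong to the same scholar. The field names (`title`, `keywords`, `coauthors`, `venue`, `email`, `org`, `scholar_id`), the helper functions, the sample records, and the choice of a small logistic regression are all assumptions made for illustration; the text does not specify which features or which classification algorithm the invention uses.

```python
from itertools import combinations
import math

def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def pair_features(d1, d2):
    """Feature vector for a document pair (hypothetical field names)."""
    return [
        jaccard(d1["title"].lower().split(), d2["title"].lower().split()),
        jaccard(d1["keywords"], d2["keywords"]),
        jaccard(d1["coauthors"], d2["coauthors"]),
        1.0 if d1["venue"] == d2["venue"] else 0.0,
        1.0 if d1["email"] and d1["email"] == d2["email"] else 0.0,
        1.0 if d1["org"] and d1["org"] == d2["org"] else 0.0,
    ]

def build_training_set(labeled_docs):
    """Turn manually labeled documents into (features, same_scholar) pairs."""
    X, y = [], []
    for d1, d2 in combinations(labeled_docs, 2):
        X.append(pair_features(d1, d2))
        y.append(1 if d1["scholar_id"] == d2["scholar_id"] else 0)
    return X, y

def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent logistic regression (a stand-in classifier)."""
    w = [0.0] * (len(X[0]) + 1)  # the last weight is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
            grad[-1] += p - yi
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_same(w, features):
    """Estimated probability that a document pair shares one scholar."""
    z = sum(wj * xj for wj, xj in zip(w, features)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))

def doc(title, keywords, coauthors, venue, email, org, scholar_id):
    return dict(title=title, keywords=keywords, coauthors=coauthors,
                venue=venue, email=email, org=org, scholar_id=scholar_id)

# Invented sample records: two distinct scholars who share one name.
labeled_docs = [
    doc("graph mining for social networks", ["graph", "mining"],
        ["Li Na"], "KDD", "wang@ict.example", "ICT", "w1"),
    doc("mining large social graphs", ["graph", "social"],
        ["Li Na", "Zhang Yu"], "KDD", "", "ICT", "w1"),
    doc("protein folding simulation", ["protein"],
        ["Chen Bo"], "Bioinformatics", "", "Med School", "w2"),
    doc("simulation of protein structure", ["protein", "structure"],
        ["Chen Bo"], "Bioinformatics", "", "Med School", "w2"),
]

X, y = build_training_set(labeled_docs)
w = train_logistic(X, y)
```

Calling `predict_same(w, pair_features(a, b))` then gives a score for any new pair of documents. In practice a library classifier would likely replace `train_logistic`; the hand-written version only keeps the sketch dependency-free.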

Abstract

The invention discloses a method for disambiguating scholars with duplicate names, comprising a classification model establishment step and an iterative disambiguation step. In the model establishment step, a labeled data set is obtained by annotating heterogeneous academic network data; a document-pair binary classification training data set is built from the labeled data set; and a classification algorithm is trained on it to obtain a document-pair binary classification model. In the iterative disambiguation step, an iterative classification algorithm based on the binary classification model repeatedly judges the data set to be disambiguated until final clusters are obtained, each corresponding to one real scholar, which completes the disambiguation. The invention further discloses a corresponding scholar name disambiguation system.
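The iterative disambiguation step can be sketched as a merge loop, under several assumptions: each document starts as its own cluster, a binary judgment decides whether two clusters belong to the same scholar, and merged clusters pool their evidence before the next round of judgments. The names `make_profile`, `merge_profiles`, `same_scholar`, and `iterative_disambiguation` are hypothetical, and the rule inside `same_scholar` is only a placeholder for the trained binary classification model; the abstract does not give the exact iteration procedure.

```python
def make_profile(doc):
    """Start every document as its own single-document cluster."""
    return {
        "docs": [doc["id"]],
        "coauthors": set(doc["coauthors"]),
        "venues": {doc["venue"]},
    }

def merge_profiles(p1, p2):
    """Pool the evidence of two clusters into one richer profile."""
    return {
        "docs": p1["docs"] + p2["docs"],
        "coauthors": p1["coauthors"] | p2["coauthors"],
        "venues": p1["venues"] | p2["venues"],
    }

def same_scholar(p1, p2):
    """Placeholder for the trained binary model: needs two shared signals."""
    shared = len(p1["coauthors"] & p2["coauthors"]) + len(p1["venues"] & p2["venues"])
    return shared >= 2

def iterative_disambiguation(documents, judge):
    """Merge clusters until no pair is judged to be the same scholar."""
    clusters = [make_profile(d) for d in documents]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if judge(clusters[i], clusters[j]):
                    clusters[i] = merge_profiles(clusters[i], clusters[j])
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan with the updated clusters
    return [sorted(c["docs"]) for c in clusters]

# Invented sample documents that all carry the same author name.
documents = [
    {"id": "d1", "coauthors": ["Li Na", "Zhang Yu"], "venue": "KDD"},
    {"id": "d2", "coauthors": ["Li Na", "Zhang Yu", "Zhao Lei"], "venue": "WWW"},
    {"id": "d3", "coauthors": ["Zhao Lei"], "venue": "KDD"},
    {"id": "d4", "coauthors": ["Chen Bo"], "venue": "Bioinformatics"},
]

result = iterative_disambiguation(documents, same_scholar)
```

In this toy data, `d3` shares only one signal with `d1` (the venue) and only one with `d2` (a coauthor), so neither single pair passes the judgment. Once `d1` and `d2` are merged, the pooled profile shares both signals with `d3`, which is one way iteration can compensate for sparse per-document information.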

Description

Technical Field

[0001] The invention relates to the field of entity disambiguation, and in particular to a method and system for disambiguating scholars with the same name in the academic domain.

Background

[0002] A literature system is an important tool for researchers. Through it, researchers can obtain comprehensive literature and scholar information, follow the latest developments in related research, and thereby develop ideas and improve the quality of their research. However, current literature systems have an important problem: duplicate scholar names. The phenomenon mainly takes two forms: (1) different scholars share the same name; (2) the same scholar's name appears in different forms in different documents. For example, querying "Wang Wei" in the Wanfang literature system returns thousands of scholar records. The problem of duplicate names is part...

Application Information

IPC(8): G06F17/30
CPC: G06F16/335, G06F16/355
Inventors: 程学旗, 陈忠祥, 郭嘉丰, 曹雷
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI