Homonymous author disambiguation method based on network representation and semantic representation

A semantic-feature and representation-learning technique for disambiguating same-name paper authors based on network representation and semantic representation. It addresses problems such as ambiguity among authors who share a name and the resulting wrong assignment of papers, and improves disambiguation accuracy.

Active Publication Date: 2020-05-22
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
Cites: 14; Cited by: 9

AI Technical Summary

Problems solved by technology

However, because of the huge number of papers and the complexity and variety of paper information, a large number of papers are assigned to the wrong authors. Among the causes, ambiguity between authors with the same name is an important but thorny problem.

Embodiment Construction

[0037] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0038] The present invention aims to solve the problem of ambiguity among same-name authors of papers. It uses the main information of each paper, such as the title, abstract, authors, journal, author institutions, year of publication, and keywords, to learn a relational representation and a semantic representation of the paper, and then clusters the papers. Outlier papers produced during this process are handled by a method based on similarity-threshold matching, which yields the final division of papers: papers truly written by the same author fall into one cluster, and papers by different authors fall into different clusters. Figure 1 is the model architecture diagram of the present invention.
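The outlier-handling step (similarity-threshold matching) is not spelled out in this excerpt. Below is a minimal Python sketch under stated assumptions: each outlier is scored against every existing cluster by its mean similarity to that cluster's papers, it joins the best-scoring cluster if the score reaches a threshold, and otherwise it starts a new single-paper cluster. The function name `assign_outliers`, the mean-similarity score, the threshold value, and the new-cluster fallback are all hypothetical choices, not the patented procedure.

```python
def assign_outliers(sim, clusters, outliers, threshold=0.5):
    """Hypothetical similarity-threshold matching of outlier papers.

    sim       -- n x n paper similarity matrix (list of lists of floats)
    clusters  -- list of non-empty clusters, each a list of paper indices
    outliers  -- iterable of paper indices that belong to no cluster
    threshold -- assumed minimum mean similarity needed to join a cluster
    """
    result = [list(c) for c in clusters]
    for p in sorted(outliers):
        best_k, best_score = None, float("-inf")
        # Score only against the original clusters, so the outcome does not
        # depend on the order in which outliers are processed.
        for k, members in enumerate(clusters):
            score = sum(sim[p][q] for q in members) / len(members)
            if score > best_score:
                best_k, best_score = k, score
        if best_k is not None and best_score >= threshold:
            result[best_k].append(p)
        else:
            # Assumption: an unmatched outlier becomes its own cluster.
            result.append([p])
    return result


sim = [
    [1.0, 0.9, 0.1, 0.8, 0.0],
    [0.9, 1.0, 0.2, 0.6, 0.1],
    [0.1, 0.2, 1.0, 0.1, 0.0],
    [0.8, 0.6, 0.1, 1.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 1.0],
]
final = assign_outliers(sim, [[0, 1], [2]], {3, 4})
print(final)  # [[0, 1, 3], [2], [4]]
```

Here paper 3 has a mean similarity of 0.7 to cluster `[0, 1]` and joins it, while paper 4 matches nothing above 0.5 and stays on its own.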

[0039] Step 1: Perform feature analysis on the relevant information of the papers in the paper da...
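Step 1 is cut off in this excerpt, but the abstract says it produces semantic features and discrete features for each paper, and paragraph [0038] lists the fields involved. A minimal sketch of that split, assuming a hypothetical record layout (`title`, `abstract`, `keywords`, `authors`, `venue`, `year`): title, abstract and keywords become the text to be embedded semantically, while co-authors, the target author's institution, venue and year are kept as discrete features. The name-normalization rule and the function names are illustrative assumptions.

```python
import re


def normalize_name(name):
    """Lower-case a name and drop spaces, dots, hyphens, underscores and
    commas, so 'Wei-Jun Zhang' and 'weijun zhang' share one key.
    (A simplifying assumption; reordered names are not unified.)"""
    return re.sub(r"[\s.\-_,]", "", name.lower())


def extract_features(paper, target_name):
    """Split one paper record into (semantic_text, discrete_features).

    semantic_text is None when the paper has no title, abstract or keywords,
    i.e. the paper has no semantic features.
    """
    target = normalize_name(target_name)
    coauthors, orgs = set(), set()
    for author in paper.get("authors", []):
        key = normalize_name(author.get("name", ""))
        if key == target:
            # Institution of the ambiguous (target) author on this paper.
            if author.get("org"):
                orgs.add(author["org"].strip().lower())
        elif key:
            coauthors.add(key)
    parts = [paper.get("title") or "", paper.get("abstract") or ""]
    parts += list(paper.get("keywords") or [])
    text = " ".join(p for p in parts if p).strip()
    discrete = {
        "coauthors": coauthors,
        "orgs": orgs,
        "venue": paper.get("venue"),
        "year": paper.get("year"),
    }
    return (text or None), discrete


paper = {
    "title": "Author disambiguation",
    "abstract": "",
    "keywords": ["clustering"],
    "authors": [
        {"name": "Wei-Jun Zhang", "org": "Example Institute"},
        {"name": "Li Wang", "org": "Other Lab"},
    ],
    "venue": "Example Journal",
    "year": 2019,
}
text, discrete = extract_features(paper, "weijun zhang")
print(text)                   # Author disambiguation clustering
print(discrete["coauthors"])  # {'liwang'}
```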

Abstract

The invention discloses a homonymous author disambiguation method based on network representation and semantic representation. The method comprises the following steps: 1) extracting the semantic features and discrete features of each paper in a target paper library; 2) calculating the similarity between papers based on the discrete features to obtain a relationship similarity matrix of the papers, and adding any paper that shares no co-author or institution with the other papers to an outlier paper set; 3) calculating a semantic similarity matrix of the papers based on their semantic features, and adding papers in the target paper library that have no semantic features to the outlier paper set; 4) performing a weighted summation of the relationship similarity matrix and the semantic similarity matrix to obtain a paper similarity matrix, clustering on this matrix, and adding papers that belong to no cluster to the outlier paper set; and 5) assigning the papers in the outlier paper set to the corresponding clusters using a similarity-threshold matching method. The method achieves high-accuracy disambiguation of same-name paper authors.
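Steps 2 to 4 can be illustrated with a small pure-Python sketch. The abstract does not say which similarity measures, weights, or clustering algorithm are used (the description mentions heterogeneous network embedding and word vector embedding), so this sketch swaps in simple stand-ins: the averaged Jaccard overlap of co-author and institution sets for the relationship similarity, cosine similarity of precomputed vectors for the semantic similarity, an assumed weight `alpha`, and threshold-based connected components in place of the unspecified clustering method. All function names, weights and thresholds are hypothetical.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relation_similarity(papers):
    """Step 2 sketch: average Jaccard overlap of co-authors and institutions.
    A paper sharing no co-author and no institution with any other paper
    is flagged as an outlier."""
    n = len(papers)
    sim = [[0.0] * n for _ in range(n)]
    linked = [False] * n
    for i, j in combinations(range(n), 2):
        s = (jaccard(papers[i]["coauthors"], papers[j]["coauthors"])
             + jaccard(papers[i]["orgs"], papers[j]["orgs"])) / 2
        sim[i][j] = sim[j][i] = s
        if s > 0:
            linked[i] = linked[j] = True
    return sim, {i for i in range(n) if not linked[i]}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_similarity(vectors):
    """Step 3 sketch: cosine similarity of precomputed semantic vectors.
    A paper whose vector is None has no semantic features -> outlier."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    outliers = {i for i, v in enumerate(vectors) if v is None}
    for i, j in combinations(range(n), 2):
        if i not in outliers and j not in outliers:
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim, outliers


def fuse_and_cluster(rel, sem, exclude=frozenset(), alpha=0.5, threshold=0.5):
    """Step 4 sketch: weighted sum, then merge papers whose fused similarity
    is >= threshold (connected components via union-find). Singletons plus
    the already-flagged `exclude` set form the outlier set for step 5."""
    n = len(rel)
    fused = [[alpha * rel[i][j] + (1 - alpha) * sem[i][j] for j in range(n)]
             for i in range(n)]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if i in exclude or j in exclude:
            continue
        if fused[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        if i not in exclude:
            groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    singles = {g[0] for g in groups.values() if len(g) == 1}
    return fused, clusters, singles | set(exclude)


papers = [
    {"coauthors": {"li"}, "orgs": {"inst a"}},
    {"coauthors": {"li"}, "orgs": {"inst a"}},
    {"coauthors": {"wang"}, "orgs": {"inst b"}},
    {"coauthors": {"wang"}, "orgs": {"inst b"}},
    {"coauthors": {"chen"}, "orgs": {"inst c"}},
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], None]

rel, rel_out = relation_similarity(papers)
sem, sem_out = semantic_similarity(vectors)
fused, clusters, outliers = fuse_and_cluster(rel, sem, exclude=rel_out | sem_out)
print(clusters, outliers)  # [[0, 1], [2, 3]] {4}
```

Moving `alpha` toward 1 leans on co-author and institution links; toward 0 it leans on textual similarity. In this toy input, paper 4 shares no co-author or institution and has no semantic vector, so it is set aside for the outlier-matching step.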

Description

Technical field

[0001] The present invention relates mainly to the fields of entity disambiguation, heterogeneous network embedding, and word vector embedding, and in particular to a technique for disambiguating same-name authors of papers based on network representation and semantic representation.

Background art

[0002] Same-name disambiguation has long been regarded as a meaningful but challenging problem in many fields, such as document management and social network analysis. In academic networks, academic search systems such as Google Scholar and AMiner have made paper search and academic communication much more convenient. However, because of the huge number of papers and the complexity and variety of paper information, a large number of papers are assigned incorrectly. Among the causes, the ambiguity of authors with the same name is an important but thorny problem. The disambiguation...

Application Information

Patent type & authority: Application (China)
IPC (8): G06F40/35; G06F40/279; G06F16/35
CPC: G06F16/35; G06F16/3344; G06F40/30; G06F40/295; G06F16/93
Inventors: Du Yi (杜一), Wang Hanxue (王寒雪), Qiao Ziyue (乔子越), Zhou Yuanchun (周园春)
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI