Scholars name disambiguation method based on heterogeneous network embedding

A heterogeneous network embedding and disambiguation technology, applied in the field of big data, that addresses problems such as reduced accuracy of name retrieval, person relationship mining, and person similarity association, and achieves the effect of improving representativeness

Pending Publication Date: 2019-04-02
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

For example, in the massive documents in the knowledge base, there will be a large number of authors with the same n...

Embodiment Construction

[0018] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0019] The present invention disambiguates scholars' names with an unsupervised method that learns representation vectors for heterogeneous network nodes through meta-path-based random walks. In the following embodiments, a benchmark database of name disambiguation papers is used as the paper database, and the present invention is further described in conjunction with the accompanying drawings.

[0020] Step 1: Collect all the papers in the paper database that are related to the authors to be disambiguated, and build a heterogeneous network of paper relationships from information such as the papers' authors, journal names, titles, keywords, and abstracts.

[0021] Treat each paper as a node in the heterogeneous network. If two papers share a co-author, build a relationship named CoAuthor between them. At the same ti...
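The CoAuthor relationship described in Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `papers` dictionary layout, the function name `build_coauthor_edges`, the sample author names, and the choice to ignore the ambiguous name itself when looking for shared co-authors are all assumptions made for illustration. The other relationship types are elided in the text above and are not reconstructed here.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_edges(papers, target_name):
    """Return {(paper_a, paper_b): shared_coauthor_count}.

    Assumption: the ambiguous target name is skipped, because every
    collected paper carries it and it would link all papers together.
    """
    author_to_papers = defaultdict(set)
    for paper_id, info in papers.items():
        for author in info["authors"]:
            if author != target_name:
                author_to_papers[author].add(paper_id)

    edges = defaultdict(int)
    for paper_ids in author_to_papers.values():
        # Every pair of papers sharing this co-author gets a CoAuthor edge.
        for a, b in combinations(sorted(paper_ids), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical toy data: three papers signed by the same ambiguous name.
papers = {
    "p1": {"authors": ["Wei Wang", "Li Na"]},
    "p2": {"authors": ["Wei Wang", "Li Na", "Chen Bo"]},
    "p3": {"authors": ["Wei Wang", "Zhang San"]},
}
coauthor_edges = build_coauthor_edges(papers, "Wei Wang")
```

In this toy input only `p1` and `p2` share a co-author, so they are the only pair joined by a CoAuthor edge. The edge weight counts shared co-authors, which is one possible design choice rather than something the text specifies.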

Abstract

The invention discloses a scholar name disambiguation method based on heterogeneous network embedding, which comprises the following steps:
1) setting a plurality of authors needing disambiguation, collecting all papers related to those authors, and generating a paper relation heterogeneous network from the authors and the semantic information of the collected papers;
2) according to the paper relation heterogeneous network, generating paths containing text information of the neighbor nodes of paper nodes through a meta-path-based random walk strategy, and storing the paths as a training corpus;
3) training Skip-gram on the training corpus to generate a paper representation vector for each paper;
4) for each author needing disambiguation in step 1), obtaining the representation vectors of that author's papers from the generated paper representation vectors; and
5) clustering the paper representation vectors obtained in step 4) into a plurality of clusters to realize disambiguation of the author name.
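Step 2 of the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the adjacency layout of `graph`, the function name `meta_path_walk`, and the relation name `CoVenue` are hypothetical (only `CoAuthor` appears in the text), and the walk records paper IDs only, without the neighbor text information that the abstract mentions.

```python
import random

# Hypothetical meta-path: alternate between two relation types.
META_PATH = ["CoAuthor", "CoVenue"]


def meta_path_walk(graph, start, meta_path, walk_length, rng):
    """Random walk that follows the relation types of meta_path cyclically."""
    walk = [start]
    current = start
    for step in range(walk_length - 1):
        relation = meta_path[step % len(meta_path)]
        neighbors = graph.get(current, {}).get(relation, [])
        if not neighbors:
            break  # no edge of the required type: end this walk early
        current = rng.choice(neighbors)
        walk.append(current)
    return walk


# Toy heterogeneous network: node -> relation type -> neighbor papers.
graph = {
    "p1": {"CoAuthor": ["p2"], "CoVenue": ["p3"]},
    "p2": {"CoAuthor": ["p1"], "CoVenue": ["p3"]},
    "p3": {"CoVenue": ["p1", "p2"]},
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
corpus = [
    meta_path_walk(graph, node, META_PATH, 5, rng)
    for node in sorted(graph)
    for _ in range(2)  # two walks per start node
]
```

Each walk plays the role of a sentence in the training corpus. In step 3 such a corpus would be passed to a Skip-gram trainer to obtain one vector per paper; that stage is omitted here to keep the sketch free of third-party dependencies.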

Description

Technical field

[0001] The invention relates to the technical fields of big data, knowledge graphs, entity disambiguation, and heterogeneous network embedding, and specifically to an unsupervised method that disambiguates scholars' names by learning representation vectors of heterogeneous network nodes through meta-path-based random walks.

Background technique

[0002] When building a knowledge base of scientific and technological literature, the problem of author name disambiguation is often encountered. For example, among the massive number of documents in the knowledge base there will be many authors with the same name, which reduces the accuracy of name retrieval, person relationship mining, and person similarity association. When an author's name is retrieved, for instance, all papers written by authors with that name are returned. To solve this problem, clustering methods are usually used to divide these retrieved papers into different author entitie...
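The idea of dividing retrieved papers into author entities by clustering can be illustrated with a small sketch. The text shown here does not name a specific clustering algorithm, so the single-linkage grouping by cosine similarity below is a stand-in chosen for brevity. The function names, the `threshold` parameter, and the toy vectors are assumptions, not values from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_threshold(vectors, threshold):
    """Merge papers whose vectors have cosine similarity >= threshold."""
    ids = sorted(vectors)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical 2-dimensional paper vectors for one ambiguous name.
vectors = {"p1": [1.0, 0.1], "p2": [0.9, 0.2], "p3": [0.1, 1.0]}
clusters = cluster_by_threshold(vectors, 0.8)
```

Each resulting cluster is read as one distinct author entity. Here `p1` and `p2` point in nearly the same direction and are grouped together, while `p3` forms its own cluster.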

Application Information

IPC(8): G06F16/36, G06F16/35
Inventor: 杜一, 乔子越, 周园春
Owner COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI