Paper author name disambiguation method and device and storage medium

A technology for disambiguating the names of paper authors, applied in the fields of metadata text retrieval, unstructured text data retrieval, instruments, etc., to facilitate similarity calculation.

Pending Publication Date: 2022-07-05
ZHEJIANG SCI-TECH UNIV

AI Technical Summary

Problems solved by technology

[0003] The present invention mainly solves the technical problem, present in the prior art, of how to disambiguate paper authors during literature data retrieval.



Examples


Embodiment 1

[0053] As shown in Figure 1, this embodiment provides a method for disambiguating the author names of papers, comprising the following steps:

[0054] S1: Create an author information data set and a temporary email statistics table;
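Step S1 can be pictured with a minimal sketch, assuming each author information record carries a name, an institution and an optional email address, and that the temporary email statistics table simply counts how many records use each address. The AuthorRecord layout and the build_email_stats helper are hypothetical; the patent does not specify the table's columns.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AuthorRecord:
    # Hypothetical record layout; the patent does not list its fields.
    record_id: int
    name: str
    org: str
    email: Optional[str]


def build_email_stats(records: List[AuthorRecord]) -> List[Tuple[str, int]]:
    """Sketch of the S1 temporary email statistics table.

    Counts how many author records use each email address (trimmed and
    lower-cased); records without an email are skipped.
    """
    counts = Counter(
        r.email.strip().lower() for r in records if r.email and r.email.strip()
    )
    # Most frequently used addresses first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


records = [
    AuthorRecord(1, "Wu Wang", "Zhejiang Sci-Tech Univ", "wwang@a.example"),
    AuthorRecord(2, "Wu Wang", "Zhejiang Sci-Tech University", "WWang@a.example"),
    AuthorRecord(3, "Wu Wang", "Tsinghua Univ", "wu.wang@b.example"),
    AuthorRecord(4, "Si Li", "Tsinghua Univ", None),
]
print(build_email_stats(records))
```

Normalizing case here is a design assumption: it lets "WWang@a.example" and "wwang@a.example" count as the same address, which matters because later steps group records by email.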

[0055] S2: Calculate the weight of each author attribute feature in the author information data set;
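The source does not say how the S2 feature weights are computed. The sketch below swaps in a simple, hypothetical heuristic so the later steps have concrete weights to work with: a feature scores higher the larger the share of distinct values it takes across records, and the scores are normalized to sum to 1. This is an illustration only, not the patent's formula.

```python
from typing import Dict, List


def feature_weights(records: List[Dict[str, str]], features: List[str]) -> Dict[str, float]:
    """Hypothetical S2 heuristic (not the patent's stated formula).

    A feature's raw score is the share of distinct values among its non-empty
    values: a field that is nearly constant across records (such as the shared
    romanized name) separates authors poorly, so it gets a small weight.
    Scores are normalized so the weights sum to 1.
    """
    raw = {}
    for f in features:
        values = [r[f] for r in records if r.get(f)]
        raw[f] = len(set(values)) / len(values) if values else 0.0
    total = sum(raw.values())
    if total == 0:
        # No usable values at all: fall back to uniform weights.
        return {f: 1.0 / len(features) for f in features} if features else {}
    return {f: score / total for f, score in raw.items()}


records = [
    {"name": "Wu Wang", "org": "Univ A", "keywords": "graph"},
    {"name": "Wu Wang", "org": "Univ B", "keywords": "yarn"},
    {"name": "Wu Wang", "org": "Univ A", "keywords": "nlp"},
    {"name": "Wu Wang", "org": "Univ B", "keywords": "vision"},
]
print(feature_weights(records, ["name", "org", "keywords"]))
```

In this toy data the shared name gets 1/7 of the weight, the institution 2/7 and the keywords 4/7, matching the intuition that a name shared by every record carries little disambiguating signal.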

[0056] S3: Generate embedding representations of the author attribute features from word vectors, and fuse the embedded feature vectors, weighted by the feature weights obtained in S2, to obtain an overall embedding of the author information data set;
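A minimal sketch of the S3 weighted fusion, assuming each attribute value is embedded as the mean of its known word vectors and the fused vector is the weighted sum of the per-feature embeddings. The word_vectors table, the averaging step, and the final L2 normalization are assumptions made for illustration; the patent only states that word vectors are used and that the fusion is weighted by the S2 weights.

```python
import math
from typing import Dict, List

Vector = List[float]


def average_word_vectors(text: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Embed one attribute value as the mean of its known token vectors."""
    vecs = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    if not vecs:
        return [0.0] * dim  # no known tokens: zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fuse_embeddings(feature_vectors: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Weighted sum of per-feature embeddings, then L2-normalized (an assumption)."""
    dim = len(next(iter(feature_vectors.values())))
    fused = [0.0] * dim
    for feature, vec in feature_vectors.items():
        if len(vec) != dim:
            raise ValueError(f"feature {feature!r} has dimension {len(vec)}, expected {dim}")
        w = weights.get(feature, 0.0)
        for i, x in enumerate(vec):
            fused[i] += w * x
    norm = math.sqrt(sum(x * x for x in fused))
    return [x / norm for x in fused] if norm > 0 else fused


# Toy 2-dimensional word vectors (hypothetical).
word_vectors = {"graph": [1.0, 0.0], "neural": [0.0, 1.0], "zstu": [1.0, 0.0]}
features = {
    "org": average_word_vectors("ZSTU", word_vectors, 2),
    "keywords": average_word_vectors("Graph neural", word_vectors, 2),
}
print(fuse_embeddings(features, {"org": 0.5, "keywords": 0.5}))
```

Normalizing the fused vector is a convenience choice: it makes cosine comparisons between records in later steps depend only on direction, not on how many tokens each record happened to have.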

[0057] S4: Extract an email address from the temporary email statistics table;

[0058] S5: Filter the author information data set by the extracted email address to obtain a number of selected author information records, and construct a graph neural network in which each selected author information record is a node;
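Steps S4 and S5 can be sketched as follows, assuming the same hypothetical record layout as before. The code only builds the graph structure (one node per selected record plus an empty adjacency map for S6 to fill); it does not implement the graph neural network itself, and the build_graph_nodes name is illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class AuthorRecord:
    # Hypothetical record layout; the patent does not list its fields.
    record_id: int
    name: str
    org: str
    email: Optional[str]


def build_graph_nodes(
    records: List[AuthorRecord], email: str
) -> Tuple[Dict[int, AuthorRecord], Dict[int, Set[int]]]:
    """S5 sketch: keep the records that use `email`; each becomes one node.

    Returns node index -> record, plus an empty adjacency map that the
    edge-construction step (S6) fills in.
    """
    target = email.strip().lower()
    selected = [r for r in records if r.email and r.email.strip().lower() == target]
    nodes = dict(enumerate(selected))
    adjacency: Dict[int, Set[int]] = {i: set() for i in nodes}
    return nodes, adjacency


records = [
    AuthorRecord(1, "Wu Wang", "Zhejiang Sci-Tech Univ", "wwang@a.example"),
    AuthorRecord(2, "Wu Wang", "Zhejiang Sci-Tech University", "WWang@a.example"),
    AuthorRecord(3, "Wu Wang", "Tsinghua Univ", "wu.wang@b.example"),
]
# S4: take one address from the email statistics table (hard-coded here).
nodes, adjacency = build_graph_nodes(records, "wwang@a.example")
print([r.record_id for r in nodes.values()])
```

In a full pipeline this would run once per address in the email statistics table, so each graph only ever contains records that share an email address.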

[0059] S6: In the graph neural network, construct edges fro...
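The S6 text is truncated in this copy, but the Abstract states the edge rule: nodes whose author names and institutions are the same or in an inclusion relationship, and whose distance is smaller than a set threshold, are connected. The sketch below follows that rule under stated assumptions: "distance" is taken to be the cosine distance between the fused S3 embeddings, "inclusion" is a plain substring test after light normalization, and the threshold of 0.3 is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Node = Tuple[str, str, Sequence[float]]  # (name, org, fused embedding)


def _normalize(text: str) -> str:
    # Lower-case and collapse punctuation/whitespace so "Univ." matches "univ".
    return " ".join(text.lower().replace(".", " ").replace(",", " ").split())


def same_or_contains(a: str, b: str) -> bool:
    """True if the two strings are equal or one contains the other."""
    a, b = _normalize(a), _normalize(b)
    return bool(a) and bool(b) and (a == b or a in b or b in a)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat an empty embedding as maximally distant
    return 1.0 - dot / (nu * nv)


def build_edges(nodes: Dict[int, Node], threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Connect node pairs whose names and orgs match (equal or contained)
    and whose embedding distance is below the (hypothetical) threshold."""
    ids = sorted(nodes)
    edges = []
    for pos, i in enumerate(ids):
        for j in ids[pos + 1:]:
            name_i, org_i, emb_i = nodes[i]
            name_j, org_j, emb_j = nodes[j]
            if (same_or_contains(name_i, name_j)
                    and same_or_contains(org_i, org_j)
                    and cosine_distance(emb_i, emb_j) < threshold):
                edges.append((i, j))
    return edges


nodes = {
    0: ("Wu Wang", "Zhejiang Sci-Tech Univ", [1.0, 0.0]),
    1: ("Wu Wang", "Zhejiang Sci-Tech University", [0.9, 0.1]),
    2: ("Wu Wang", "Tsinghua Univ", [1.0, 0.0]),
}
print(build_edges(nodes))
```

Only nodes 0 and 1 are linked: their institutions stand in an inclusion relationship and their embeddings are close, while node 2 is at a different institution. A raw substring test is deliberately naive; for example, it would let a short name such as "Wu" match "Wu Wang".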

Embodiment 2

[0139] A computer device, comprising: one or more processors; and a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to execute the method described in Embodiment 1.

[0140] A storage medium storing a computer program that, when executed by a processor, implements the method described in Embodiment 1 above.

[0141] Figure 2 is a schematic structural diagram of the device provided in this embodiment.

[0142] As shown in Figure 2, as another aspect, the present application also provides a computer device 100 comprising one or more central processing units (CPU) 101, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 102 or a program loaded from a storage section 108 into a random access memory (RAM) 103. The RAM 103 also stores various programs and data required for the operation of the device 100. The CPU ...


Abstract

The invention discloses a paper author name disambiguation method, device, and storage medium. Nodes are judged on the basis of names and organizations; edges are constructed between nodes that have the same names and organizations and whose distance is smaller than a set threshold; and author information records connected by edges are merged through a graph autoencoder to obtain a related expert data set, which is used to disambiguate authors. The expert data set is associated with an email address: because different people who share the same name inevitably use different email addresses, same-name disambiguation can be achieved. Records are judged to belong to the same author when the author names and institutions are identical or one contains the other and the distance is smaller than the set threshold; such records are merged to form the expert data set, so that the paper data associated with the email address used by an author is grouped together and disambiguated. After all experts associated with the email address have been recorded, expert data sets whose organizations, author names, and research subjects coincide are combined, yielding an expert data set associated with the email address and thereby achieving author disambiguation.
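The merge step in the Abstract groups author information records that are linked by edges into expert data sets. The patent performs this merge with a graph autoencoder; the sketch below swaps in a much simpler technique, connected components via union-find, purely to show the grouping idea. It is not the patent's model, and merge_connected is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def merge_connected(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Group nodes into connected components with union-find.

    Stand-in for the patent's graph-autoencoder merge: every set of records
    linked by edges becomes one candidate expert group.
    """
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[int, List[int]] = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


print(merge_connected(5, [(0, 1), (1, 2), (3, 4)]))
```

Connected components treat every edge as a hard "same author" link, so one spurious edge can join two different people; a learned model such as the graph autoencoder the patent names can, in principle, weigh the whole neighborhood instead.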

Description

Technical field

[0001] The invention belongs to the technical field of document processing, and in particular relates to a method, device, and storage medium for disambiguating the author names of papers.

Background art

[0002] In recent years, as the number of international papers published by China has risen sharply, Chinese authors have received growing attention in the international academic community. At the same time, the problem of duplicate Chinese author names in English-language academic literature databases has become increasingly prominent. Once a Chinese name is converted into Pinyin (or an English name), the Chinese characters are lost and the probability of a name collision rises sharply. For example, two authors from different institutions may both be called "Li Si"; or two authors whose names are written with different Chinese characters may both be romanized as "Wang Wu", so that some foreign literature records both corresponding authors as "Wu Wang", which will cause great tro...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/36; G06F16/335; G06F16/38
CPC: G06F16/367; G06F16/335; G06F16/38
Inventors: 方志坚, 王露, 张华熊, 陈超颖, 汤哲冲, 贾子杰
Owner ZHEJIANG SCI-TECH UNIV