Literature author name duplication disambiguation method and literature author name duplication disambiguation construction system

A technique for disambiguating duplicate author names in documents. It addresses two problems of existing methods: they cannot be applied across multiple languages and document types, and they struggle to guarantee the accuracy and recall of disambiguation results. It also achieves good compatibility.

Pending Publication Date: 2020-12-25
三螺旋大数据科技(昆山)有限公司

AI Technical Summary

Problems solved by technology

[0003] Clustering can be used to disambiguate duplicate author names. Most existing methods rely on the information contained in the literature itself and fall mainly into three groups: methods based on feature distinction, methods based on graph segmentation, and classification based on network resources. Although these methods can disambiguate duplicate names, the divi

Examples

Embodiment

[0052] Embodiment: A method for disambiguating duplicate document author names, as shown in Figure 1, including the following steps:

[0053] Step 1: Read the literature data and scholar data from the database;

[0054] Step 2: Train a Word2Vec model and use it to predict the document vector of each document;

[0055] Step 3: Construct the collaborator relationship network graph of the author to be disambiguated, then calculate node similarity and cluster the nodes;

[0056] Step 4: Obtain the document vectors of the documents in the document clusters produced by the collaborator relationship graph, then calculate the similarity between document clusters and cluster them.
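The flow of Steps 2 to 4 can be illustrated with a minimal Python sketch. Nothing below is taken from the patent beyond the order of the steps: the toy `WORD_VECS` table stands in for a trained Word2Vec model, averaging word vectors into a document vector is an assumption, and the Jaccard coauthor similarity, cosine centroid similarity, greedy merging, and both thresholds are hypothetical choices made only for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors standing in for a trained Word2Vec model.
WORD_VECS = {
    "graph": [1.0, 0.0],
    "cluster": [0.9, 0.1],
    "protein": [0.0, 1.0],
    "enzyme": [0.1, 0.9],
}

# Hypothetical documents that all carry the same ambiguous author name.
DOCS = {
    "d1": {"coauthors": {"A", "B"}, "tokens": ["graph", "cluster"]},
    "d2": {"coauthors": {"B", "C"}, "tokens": ["cluster"]},
    "d3": {"coauthors": {"D"}, "tokens": ["graph"]},
    "d4": {"coauthors": {"E"}, "tokens": ["protein", "enzyme"]},
}


def doc_vector(tokens):
    """Step 2 (assumed form): average the vectors of the known tokens."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def coauthor_sim(c1, c2):
    """Step 3 (assumed form): Jaccard overlap of the clusters' coauthor sets."""
    a = set().union(*(DOCS[d]["coauthors"] for d in c1))
    b = set().union(*(DOCS[d]["coauthors"] for d in c2))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def centroid(cluster):
    """Mean document vector of a cluster."""
    vecs = [doc_vector(DOCS[d]["tokens"]) for d in sorted(cluster)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def content_sim(c1, c2):
    """Step 4 (assumed form): cosine similarity of cluster centroids."""
    return cosine(centroid(c1), centroid(c2))


def merge_clusters(clusters, similarity, threshold):
    """Greedily merge the first pair at or above the threshold until none is left."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if similarity(clusters[i], clusters[j]) >= threshold:
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break  # indices are stale after a merge, so restart the scan
    return clusters


def disambiguate(doc_ids, coauthor_threshold=0.3, content_threshold=0.9):
    """Cluster by collaborators first, then merge clusters by content."""
    singletons = [{d} for d in doc_ids]
    by_coauthor = merge_clusters(singletons, coauthor_sim, coauthor_threshold)
    return merge_clusters(by_coauthor, content_sim, content_threshold)
```

In this toy data, `d1` and `d2` share coauthor `B` and are grouped by the collaborator pass; `d3` has no shared coauthor and can only join them in the content pass, while `d4` stays separate because its vocabulary differs. A real system would replace `WORD_VECS` with vectors from a trained model and tune both thresholds on labeled data.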

[0057] Step 1 specifically includes:

[0058] Relevant data are read from the company's literature database and scholar database, including:

[0059] (1) Chinese paper data: ID, title, author, institution, abstract, periodical, year, and keywords;

[0060] (2) ID, title, author, institution, abstract...
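As an illustration only, the fields listed in item (1) for Chinese paper data could be held in a small record type like the one below. The class name, the dictionary keys, and the `;` separator for multi-valued fields are all assumptions; the source does not describe the database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChinesePaperRecord:
    """One Chinese paper, with the fields named in item (1)."""
    id: str
    title: str
    authors: List[str]
    institution: str
    abstract: str
    periodical: str
    year: int
    keywords: List[str] = field(default_factory=list)


def split_multi(value: str, sep: str = ";") -> List[str]:
    """Split a multi-valued field and drop empty entries (separator is assumed)."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def from_row(row: dict) -> ChinesePaperRecord:
    """Build a record from a hypothetical database row given as a dict."""
    return ChinesePaperRecord(
        id=str(row["id"]),
        title=row["title"].strip(),
        authors=split_multi(row["author"]),
        institution=row["institution"].strip(),
        abstract=row["abstract"].strip(),
        periodical=row["periodical"].strip(),
        year=int(row["year"]),
        keywords=split_multi(row.get("keywords", "")),
    )
```

Normalizing the author and keyword fields into lists at read time keeps the later graph and vector steps free of string parsing.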

Abstract

The invention discloses a method for disambiguating duplicate author names in literature and a system for constructing such disambiguation. The method comprises the following steps: (1) reading literature data and scholar data from a database; (2) training a Word2Vec model and using it to predict a document vector for each document; (3) constructing a collaborator relationship network graph for the author to be disambiguated, then calculating node similarity and clustering; and (4) obtaining the document vectors of the documents in the document clusters produced by the collaborator relationship graph, then calculating the similarity between document clusters and clustering them. The invention keeps the accuracy and recall of the disambiguation result at a relatively high level and is suitable for multiple languages and document types, such as Chinese literature, English literature, and patents.

Description

Technical field

[0001] The invention belongs to the technical field of document processing, and in particular relates to a method for disambiguating duplicate names of document authors.

Background

[0002] With the rapid development of science and technology and the continuous integration of information, duplicate names, which are widespread in the real world, greatly affect data retrieval and processing, especially for flexible and diverse natural language data. This gave rise to named entity disambiguation, which studies how to match ambiguous entity references with the correct entities in a knowledge base. Author disambiguation is a form of named entity disambiguation. In the real world, different people may have the same name. In many applications such as scientific literature management and information integration, people's names are used as identifiers for retrieving information...

Application Information

IPC(8): G06F40/284, G06F40/289, G06F40/295
CPC: G06F40/284, G06F40/289, G06F40/295
Inventors: 李微, 胡晟
Owner: 三螺旋大数据科技(昆山)有限公司