Literature author name duplication disambiguation method and literature author name duplication disambiguation construction system

A technique for disambiguating duplicate author names in documents. It addresses two weaknesses of existing methods: they cannot handle multiple languages and document types, and they struggle to guarantee high accuracy and recall in the disambiguation results. It also offers good compatibility across document types.

Pending Publication Date: 2020-12-25

三螺旋大数据科技(昆山)有限公司

Cites: 0; cited by: 14

AI Technical Summary

Problems solved by technology

[0003] Clustering can be used to disambiguate duplicate author names. Most existing methods rely on the information contained in the documents themselves and fall mainly into three groups: methods based on feature discrimination, methods based on graph partitioning, and classification methods based on network resources. Although these methods can disambiguate duplicate names, partitioning based solely on text features or solely on graph relationships cannot fully exploit the rich information in the documents, so it is difficult to guarantee high accuracy and recall in the disambiguation results. Moreover, existing methods cannot be applied across multiple languages and document types, such as Chinese papers, English papers, and patents.

Embodiment

[0052] Embodiment: A method for disambiguating duplicate author names in documents, as shown in Figure 1, comprises the following steps:

[0053] Step 1: Read the literature data and scholar data from the database;

[0054] Step 2: Train a Word2Vec model and use it to predict the document vector of each document;

[0055] Step 3: Construct the collaborator relationship network graph of the author to be disambiguated, then calculate node similarity and cluster the nodes;

[0056] Step 4: Obtain the document vectors of the documents in each document cluster produced from the collaborator relationship graph, then calculate the similarity between the document clusters and cluster them.
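Steps 2 to 4 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. A plain dictionary of word vectors stands in for a trained Word2Vec model, and a document vector is assumed to be the mean of its word vectors. The collaborator graph is simplified so that documents are linked directly when the Jaccard overlap of their collaborator sets reaches a threshold (the source does not specify the node-similarity measure), and step 4 is assumed to merge first-round clusters whose mean document vectors have a cosine similarity above a second threshold. The names `Doc`, `disambiguate`, `graph_threshold`, and `vec_threshold`, and both threshold values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    tokens: list[str]     # tokenized title/abstract/keywords (assumed input)
    coauthors: set[str]   # collaborators of the ambiguous author on this document


def doc_vector(tokens, word_vectors, dim):
    """Step 2 (assumed): average the word vectors of in-vocabulary tokens."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

    def groups(self):
        out = {}
        for i in range(len(self.parent)):
            out.setdefault(self.find(i), []).append(i)
        return list(out.values())


def disambiguate(docs, word_vectors, dim, graph_threshold=0.2, vec_threshold=0.8):
    # Step 3 (assumed): link documents whose collaborator sets overlap enough
    # and take the connected components as first-round clusters.
    uf = UnionFind(len(docs))
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i].coauthors, docs[j].coauthors) >= graph_threshold:
                uf.union(i, j)
    clusters = uf.groups()

    # Step 4 (assumed): represent each cluster by the mean of its document
    # vectors and merge clusters whose centroids are similar enough.
    vecs = [doc_vector(d.tokens, word_vectors, dim) for d in docs]
    centroids = [[sum(col) / len(c) for col in zip(*(vecs[i] for i in c))]
                 for c in clusters]
    cuf = UnionFind(len(clusters))
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            if cosine(centroids[a], centroids[b]) >= vec_threshold:
                cuf.union(a, b)
    return [sorted(docs[i].doc_id for g in group for i in clusters[g])
            for group in cuf.groups()]


# Toy data: hypothetical 2-dimensional "word vectors" and four documents.
wv = {"graph": [1.0, 0.0], "network": [0.9, 0.1],
      "protein": [0.0, 1.0], "cell": [0.1, 0.9]}
docs = [
    Doc("d1", ["graph", "network"], {"Zhang San"}),
    Doc("d2", ["network"], {"Zhang San", "Wang Wu"}),
    Doc("d3", ["graph"], {"Zhao Liu"}),
    Doc("d4", ["protein", "cell"], {"Qian Qi"}),
]
print(disambiguate(docs, wv, dim=2))  # [['d1', 'd2', 'd3'], ['d4']]
```

In the toy data, d3 shares no collaborator with d1 or d2, so the graph step keeps it apart, but the document-vector step merges it with them on topical similarity, while d4 stays separate. This mirrors the two-stage idea of combining graph relationships with text features; a real system would substitute a trained Word2Vec model and tuned thresholds.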

[0057] Step 1 specifically includes:

[0058] Reading the relevant data from the company's literature database and scholar database, including:

[0059] (1) Chinese paper data: ID, title, author, institution, abstract, periodical, year, and keywords;

[0060] (2) ID, title, author, institution, abstract...
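As an illustration of step 1, the Chinese paper fields listed in [0059] could be loaded into a typed record as sketched below. The table name `chinese_papers`, the column names, the `ChinesePaper` and `read_chinese_papers` names, and the convention that multi-valued fields are stored as `;`-separated strings are all hypothetical; an in-memory SQLite database stands in for the company's literature database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ChinesePaper:
    # Fields from [0059]; names and types here are assumptions.
    paper_id: str
    title: str
    authors: list[str]
    institution: str
    abstract: str
    periodical: str
    year: int
    keywords: list[str]


def split_field(value):
    """Split an assumed ';'-separated field into trimmed, non-empty items."""
    return [part.strip() for part in value.split(";") if part.strip()]


def read_chinese_papers(conn):
    """Step 1 (sketch): load Chinese paper records from a hypothetical table."""
    rows = conn.execute(
        "SELECT id, title, author, institution, abstract, periodical, year, keywords "
        "FROM chinese_papers ORDER BY id"
    )
    return [
        ChinesePaper(pid, title, split_field(author), inst, abstract,
                     periodical, int(year), split_field(keywords))
        for pid, title, author, inst, abstract, periodical, year, keywords in rows
    ]


# Usage with an in-memory database and made-up sample values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chinese_papers (id TEXT, title TEXT, author TEXT, institution TEXT, "
    "abstract TEXT, periodical TEXT, year INTEGER, keywords TEXT)"
)
conn.execute(
    "INSERT INTO chinese_papers VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("p1", "Sample title", "Zhang San; Li Si", "Sample institution",
     "Sample abstract", "Sample journal", 2020, "disambiguation; clustering"),
)
papers = read_chinese_papers(conn)
print(papers[0].authors)  # ['Zhang San', 'Li Si']
```

Splitting the author field into a list at load time keeps the later steps simple, since step 3 needs each document's collaborators as a set of individual names.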

Abstract

The invention discloses a method for disambiguating duplicate author names in documents and a corresponding construction system. The method comprises the following steps: 1, reading literature data and scholar data from a database; 2, training a Word2Vec model and using it to predict a document vector for each document; 3, constructing the collaborator relationship network graph of the author to be disambiguated, and calculating node similarity and clustering the nodes; and 4, obtaining the document vectors of the documents in the clusters produced from the collaborator relationship graph, and calculating the similarity between these document clusters and clustering them. The invention helps ensure that the disambiguation results achieve relatively high accuracy and recall, and it is applicable to multiple languages and document types, such as Chinese papers, English papers, and patents.

Description

Technical field

[0001] The invention belongs to the technical field of document processing and relates in particular to a method for disambiguating duplicate author names in documents.

Background

[0002] With the rapid development of science and technology and the continuing integration of information, informatization tasks, especially those involving flexible and diverse natural language data, are strongly affected by the widespread real-world phenomenon of duplicate names, which greatly hampers data retrieval and processing. This gave rise to named entity disambiguation, which studies how to match ambiguous entity mentions to the correct entities in a knowledge base. Author disambiguation is a form of named entity disambiguation. In the real world, different people may share the same name, yet in many applications, such as scientific literature management and information integration, people's names are used as identifiers for retrieving information...

Application Information

IPC(8): G06F40/284; G06F40/289; G06F40/295

CPC: G06F40/284; G06F40/289; G06F40/295

Inventors: 李微 (Li Wei), 胡晟 (Hu Sheng)

Owner: 三螺旋大数据科技(昆山)有限公司