A Method for Detecting Duplicate Names of Document Authors

A method for detecting document authors with duplicate names, applied in the field of data retrieval. It addresses the high cost and low identification accuracy of existing approaches, and aims to avoid misidentification, over-identification, and incomplete clustering.

Active Publication Date: 2019-05-28
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to overcome the low accuracy and high cost of duplicate-name identification in current online document knowledge base systems.


Examples


Detailed Description of Embodiments

[0051] The specific implementation of the present invention is described in further detail below with reference to the accompanying drawings.

[0052] To help the public understand the technical solutions of the present invention, the concepts and models involved are briefly introduced below.

[0053] 1. Single feature similarity

[0054] Let L denote a document, A_L the set of authors of the document, U_L the author's unit (affiliation), K_L the keywords, P_L the set of collaborators other than the author with the duplicate name, J_L the periodical, and T_L the title. We first analyze the role of the five single features in disambiguation:
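As a rough illustration of the document model above, the sketch below represents a document L as a Python dataclass whose fields mirror A_L, U_L, K_L, P_L, J_L, and T_L. The class name, the field names, the sample records, and the use of Jaccard overlap for set-valued features are all assumptions made for illustration; this excerpt does not state how single-feature similarity is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """One document L; field names are illustrative, not from the patent."""
    authors: frozenset   # A_L: set of author names
    unit: str            # U_L: author's unit (affiliation)
    keywords: frozenset  # K_L: keywords
    journal: str         # J_L: periodical
    title: str           # T_L: title

    def collaborators(self, name: str) -> frozenset:
        """P_L: co-authors other than the ambiguous author `name`."""
        return self.authors - {name}


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; an assumed similarity measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical records sharing the author name "Zhang Wei".
d1 = Document(frozenset({"Zhang Wei", "Li Ping"}), "Univ A",
              frozenset({"clustering", "disambiguation"}), "Journal X", "Title 1")
d2 = Document(frozenset({"Zhang Wei", "Li Ping", "Sun Fei"}), "Univ A",
              frozenset({"disambiguation", "retrieval"}), "Journal Y", "Title 2")

keyword_sim = jaccard(d1.keywords, d2.keywords)
collab_sim = jaccard(d1.collaborators("Zhang Wei"), d2.collaborators("Zhang Wei"))
print(keyword_sim, collab_sim)
```

Deriving P_L from A_L on demand, rather than storing it, keeps the two sets consistent when the same document is examined for different ambiguous names.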

[0055] Author unit (Unit): The author unit has strong disambiguating power, and unit information can be found in any document. If two articles have the same author name and the same author unit, it can be roughly assumed that the two ...
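The unit check described in this paragraph can be sketched as a simple string comparison. The normalization step (lower-casing, collapsing whitespace and punctuation) and the function names are assumptions; the excerpt is truncated before it says how unit strings are actually compared.

```python
import re


def normalize_unit(unit: str) -> str:
    """Lower-case and collapse whitespace, periods, and commas (assumed normalization)."""
    return re.sub(r"[\s\.,]+", " ", unit).strip().lower()


def same_unit(unit_a: str, unit_b: str) -> bool:
    """True when two non-empty unit strings match after normalization."""
    a, b = normalize_unit(unit_a), normalize_unit(unit_b)
    return bool(a) and a == b


match = same_unit("Nanjing Univ. of Posts & Telecomm",
                  "nanjing univ of  posts & telecomm")
mismatch = same_unit("Univ A", "Univ B")
print(match, mismatch)
```

Treating empty unit strings as a non-match avoids merging two records merely because both lack unit information.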

Abstract

The invention discloses a method for detecting duplicate names of document authors, which belongs to the technical field of data mining. The method makes full use of single-feature similarity and single-feature fusion to disambiguate authors with the same name in scientific and technological documents. First, the document objects are modeled. Second, a single-feature similarity detection method is used to calculate the similarity of two single features. Then, a disambiguation method based on single-feature similarity is used to calculate the discriminative power of each single feature. On this basis, disambiguation rules for multi-feature fusion are designed, and a method for detecting document authors with the same name is proposed. Because this detection method combines the strengths of each single feature in author entity disambiguation, it achieves better precision and recall in the recognition process.
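One possible reading of "discriminative power" is how well a single feature, used alone with a threshold, separates same-person pairs from different-person pairs. The sketch below scores one feature by pairwise precision, recall, and F1 over labeled document pairs. This reading, the threshold value, the function name, and the toy data are all assumptions; the excerpt does not give the patent's actual definition.

```python
def discriminative_power(pairs, threshold=0.5):
    """Score one feature on labeled pairs.

    pairs: iterable of (similarity, same_person) tuples for a single feature.
    Returns (precision, recall, f1) of the rule `similarity >= threshold`.
    """
    tp = fp = fn = 0
    for sim, same in pairs:
        predicted = sim >= threshold
        if predicted and same:
            tp += 1          # correctly merged pair
        elif predicted and not same:
            fp += 1          # over-identification: different people merged
        elif not predicted and same:
            fn += 1          # incomplete clustering: same person split
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labeled pairs (made-up values, for illustration only).
labeled = [(0.9, True), (0.7, True), (0.6, False), (0.2, True), (0.1, False)]
precision, recall, f1 = discriminative_power(labeled)
print(precision, recall, f1)
```

Scores of this kind could, for instance, be used to rank features before writing fusion rules, though the excerpt does not say whether the patent proceeds this way.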

Description

Technical field

[0001] The invention relates to a method for detecting duplicate names of document authors, belonging to the technical field of data retrieval.

Background technique

[0002] As the number of scientific and technological documents grows rapidly every year, the large number of authors with duplicate names reduces the accuracy of knowledge retrieval, hampers subsequent research, and lengthens the overall research cycle. However, current online literature knowledge base systems cannot yet identify authors with the same name. Taking the China National Knowledge Infrastructure (CNKI) as an example, when the search condition is limited to "author", entering an author's name often returns many unrelated authors with the same name, and the results can then only be sorted manually by the user, which wastes time and energy. There...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/35
CPC: G06F16/35
Inventor: 徐小龙, 李永萍, 孙雁飞, 杨维荣, 王勇
Owner: NANJING UNIV OF POSTS & TELECOMM