Method for detecting same name of document writers

A method for detecting same-name authors in literature, applied in the field of data retrieval. It addresses the high cost and low identification accuracy of existing approaches and helps avoid identification errors, over-identification, and incomplete clustering.

Active Publication Date: 2016-10-12
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to overcome the low identification accuracy and high cost of same-name author identification in current online literature knowledge base systems.



Examples


Detailed Description of the Embodiments

[0051] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0052] To help the public understand the technical solution of the present invention, the concepts and models involved are briefly introduced below.

[0053] 1. Single feature similarity

[0054] Let L denote a document. A_L denotes the set of authors of the document, U_L the author's unit (affiliation), K_L the keywords, P_L the set of collaborators excluding the author with the shared name, J_L the periodical, and T_L the title. We first analyze the role of the five single features in disambiguation:
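The document model in [0054] can be pictured as a small record type. The following is only a minimal sketch: the class name `Document`, its field names, and the sample values are hypothetical and do not come from the patent text; only the symbols in the comments do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """One document L, holding the features named in [0054] (field names are illustrative)."""
    authors: frozenset   # A_L: set of authors of the document
    unit: str            # U_L: author's unit (affiliation)
    keywords: frozenset  # K_L: keywords
    journal: str         # J_L: periodical
    title: str           # T_L: title

    def collaborators(self, shared_name: str) -> frozenset:
        """P_L: the co-authors, excluding the author with the shared name."""
        return self.authors - {shared_name}


# Hypothetical sample data, for illustration only.
doc = Document(
    authors=frozenset({"Author X", "Author Y", "Author Z"}),
    unit="Example University",
    keywords=frozenset({"data retrieval", "disambiguation"}),
    journal="Example Journal",
    title="An example title",
)
print(sorted(doc.collaborators("Author X")))
```

Deriving P_L from A_L rather than storing it separately keeps the two sets consistent, since the collaborator set depends on which shared name is being disambiguated.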

[0055] Author unit (Unit): The author's unit has strong disambiguating power, and unit information can be found in any document. If two articles share the same author name and the same author unit, it can be roughly assumed that the two ...


Abstract

The present invention discloses a method for detecting same-name document authors, belonging to the technical field of data mining. The method exploits the disambiguating power of single-feature similarity and of single-feature fusion in scientific literature. It first models the documents to be processed, then computes pairwise single-feature similarities using a single-feature similarity detection method, and evaluates the identification capability of each single feature using a disambiguation method based on single-feature similarity. On that basis, multi-feature fusion disambiguation rules are designed, yielding a method for detecting same-name document authors. The detection method combines the strengths of the individual features in disambiguating author names, so that it achieves high accuracy and recall in identification.
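To make the pairwise similarity step concrete, here is a minimal sketch for one set-valued feature. The text above does not say which similarity measure the method uses, so the Jaccard coefficient is an assumption chosen for illustration, and the function names and sample keyword sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets (an assumed measure, not taken from the patent)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # treat two empty feature sets as carrying no evidence
    return len(a & b) / len(a | b)


def pairwise_feature_similarity(docs, feature):
    """Return {(i, j): similarity} for every pair of documents on one feature."""
    return {
        (i, j): jaccard(docs[i][feature], docs[j][feature])
        for i, j in combinations(range(len(docs)), 2)
    }


# Hypothetical documents that share an author name; only keywords are shown.
docs = [
    {"keywords": {"cloud", "p2p"}},
    {"keywords": {"cloud", "security"}},
    {"keywords": {"textile"}},
]
sims = pairwise_feature_similarity(docs, "keywords")
for pair, score in sorted(sims.items()):
    print(pair, round(score, 3))
```

The same function could be applied to any other set-valued feature, such as the collaborator set. String-valued features such as the unit or periodical would need a different comparison, which is not sketched here.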

Description

Technical field

[0001] The invention relates to a method for detecting duplicate names of document authors, belonging to the technical field of data retrieval.

Background

[0002] As the number of scientific and technological documents grows rapidly every year, the large number of authors with duplicate names reduces the accuracy of knowledge retrieval, hampers subsequent research, and lengthens the overall research cycle. However, current online literature knowledge base systems cannot identify authors with the same name. Taking the China National Knowledge Infrastructure (CNKI) as an example, when the search condition is limited to "author" and an author's name is entered, the results often include many unrelated authors with the same name. The subsequent classification can only be done manually by the user, which wastes time and effort. Therefore...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 徐小龙, 李永萍, 孙雁飞, 杨维荣, 王勇
Owner: NANJING UNIV OF POSTS & TELECOMM