Name duplication disambiguation method of Chinese literature authors

A technology concerning authors and literature, applied in the field of disambiguating authors with the same name in Chinese literature. It addresses two problems: the number of clusters cannot be predicted in advance, and existing methods fail to consider multiple author features comprehensively. It improves disambiguation accuracy.

Active Publication Date: 2016-06-08
QINGDAO ACADEMY OF INTELLIGENT IND
Cites: 6; Cited by: 35

AI Technical Summary

Problems solved by technology

However, the number of clusters that must be preset for the clustering algorithms commonly used in unsupervised methods is usually unpredictable, and existing methods do not consider multiple features comprehensively when resolving author ambiguity.




Embodiment Construction

[0054] In the present invention, users can collect a Chinese literature data set covering the fields they care about from CNKI, Wanfang and other literature platforms, for example by setting keywords, specifying relevant journals, choosing a subject classification, or setting start and end years. The collected set is denoted PS. Normally, every article attribute visible on the platform can be collected, including the document's title, authors, institutions, abstract, keywords, journal and publication time. By default, all basic attributes are collected and the full text is not downloaded. Each document in PS is denoted P. Because individual attributes are expressed in diverse ways and the platforms themselves enter some documents irregularly, it is necessary to perform preliminary filtering on PS and then to process, for each valid document P that remains, the relevant attributes, including authors, institu...
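The collection and preliminary filtering step can be illustrated with a minimal Python sketch. The record field names (`title`, `author`, `institution`, `keywords`, `journal`, `year`), the separator set used to split multi-valued fields, and the rule that a record needs a title and at least one author to count as valid are all hypothetical; the text available here does not specify the platforms' export format or the exact filtering criteria.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical separator set: exports vary, so ASCII and full-width
# separators (including the Chinese enumeration comma) are all accepted.
_SEPARATORS = re.compile(r"[;,；，、]+")


@dataclass
class Paper:
    """One valid document P after filtering (field names are illustrative)."""
    title: str
    authors: list[str]
    institutions: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    journal: str = ""
    year: int | None = None


def _split(value: str | None) -> list[str]:
    """Split a raw multi-valued field and drop empty or whitespace-only items."""
    if not value:
        return []
    return [item.strip() for item in _SEPARATORS.split(value) if item.strip()]


def preprocess(ps: list[dict]) -> list[Paper]:
    """Filter the raw collection PS and normalize each remaining record.

    Assumed filtering rule (not from the source): keep a record only if it
    has a non-empty title and at least one author.
    """
    papers = []
    for raw in ps:
        title = (raw.get("title") or "").strip()
        authors = _split(raw.get("author"))
        if not title or not authors:
            continue  # preliminary filtering: drop incomplete records
        year_text = str(raw.get("year") or "").strip()
        papers.append(Paper(
            title=title,
            authors=authors,
            institutions=_split(raw.get("institution")),
            keywords=_split(raw.get("keywords")),
            journal=(raw.get("journal") or "").strip(),
            year=int(year_text) if year_text.isdigit() else None,
        ))
    return papers
```

For example, a record whose `author` field is "王伟；李娜" yields `authors == ["王伟", "李娜"]`, while a record with an empty title is dropped. Splitting on commas suits Chinese names but would break Western "Surname, Given" formats, so the separator set would need adjusting for mixed-language data.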


Abstract

The invention discloses a method for disambiguating duplicate names of Chinese literature authors. The method comprises the following steps: performing normalized preprocessing on a literature data set; performing preliminary author extraction and formal representation; for authors whose names are ambiguous, calculating a basic-attribute similarity score between every pair of authors from their basic attributes; constructing a keyword relationship correspondence table for the same-name authors and calculating a keyword similarity score; building a collaboration network for the same-name authors and calculating a collaboration similarity score between every pair of authors; calculating a comprehensive similarity index from the basic attributes, keywords and collaboration network to judge whether same-name authors are the same person; and updating the author's information according to the judgment. The method can disambiguate duplicate author names in Chinese literature and improves the precision of academic analytics in applications of literature analysis such as science and technology evaluation and academic research.
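The pairwise scoring and judgment steps can be sketched as follows. The abstract does not give the similarity formulas, so this sketch makes several hypothetical choices: institutions stand in for the basic attributes, a plain keyword set stands in for the keyword relationship correspondence table, and co-author names stand in for the collaboration network; each score is a Jaccard overlap; the comprehensive index is a weighted sum with made-up weights (0.4, 0.3, 0.3); and mentions scoring at or above a made-up threshold of 0.3 are merged with union-find. None of these choices is taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AuthorMention:
    """One occurrence of an ambiguous name in one paper (illustrative fields)."""
    paper_id: str
    institutions: frozenset[str]  # stand-in for the basic attributes
    keywords: frozenset[str]      # stand-in for the keyword table
    coauthors: frozenset[str]     # stand-in for the collaboration network


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(x: AuthorMention, y: AuthorMention,
                   weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Hypothetical comprehensive index: weighted sum of three similarity scores."""
    w_attr, w_kw, w_coop = weights
    return (w_attr * jaccard(x.institutions, y.institutions)
            + w_kw * jaccard(x.keywords, y.keywords)
            + w_coop * jaccard(x.coauthors, y.coauthors))


def disambiguate(mentions: list[AuthorMention],
                 threshold: float = 0.3) -> list[list[str]]:
    """Group mentions whose pairwise score reaches `threshold` (union-find).

    The number of groups is not fixed in advance; it follows from the threshold.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if combined_score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[str]] = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention.paper_id)
    return list(groups.values())
```

Because mentions are merged by thresholding pairwise scores rather than by a clustering algorithm with a preset number of clusters, the number of distinct authors is an output rather than an input. Merging is transitive, so one high-scoring pair can link two otherwise dissimilar mentions; a stricter threshold reduces that risk.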

Description

Technical field

[0001] The invention belongs to the field of document processing, and in particular relates to a method for disambiguating duplicate names of Chinese document authors.

Background

[0002] With the continuous growth and development of online literature databases, more and more scholars, institutions and businesses have begun to use literature analysis to follow the latest research trends in their fields and to track the scientific and technological activities of peers or competitors. On this basis, they further study and identify the key and hot issues in a field, grasp the overall state of its development, and support scientific and technological decision-making and academic evaluation. However, after documents in a given field are retrieved by setting specific keywords, authors, journal directions, etc., duplicate author names are commonly found in docu...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/30; G06F17/27
CPC: G06F16/334; G06F16/35; G06F40/295
Inventors: 孙星恺, 陆浩, 袁勇, 王飞跃, 关晓炟, 吕宏强
Owner: QINGDAO ACADEMY OF INTELLIGENT IND