A method for disambiguation of duplicate names of authors in Chinese literature

This technique, applied to the disambiguation of same-name authors in Chinese literature, addresses two problems: the number of clusters cannot be predicted in advance, and existing methods do not consider multiple author features together. It achieves the effect of improving accuracy.

Active Publication Date: 2019-03-26
QINGDAO ACADEMY OF INTELLIGENT IND

AI Technical Summary

Problems solved by technology

However, the clustering algorithms commonly used in unsupervised methods require the number of clusters to be set in advance, and this number usually cannot be predicted. In addition, existing methods do not combine multiple features when disambiguating authors.



Examples


Embodiment Construction

[0054] In the present invention, users collect a Chinese literature data set for the field they care about from platforms such as CNKI and Wanfang, for example by setting keywords, specifying relevant periodicals, setting a field classification, or setting start and end years. The data set is denoted PS. Normally, every article attribute visible on the platform can be collected, including the title, authors, institutions, abstract, keywords, journal, and publication time of the document. By default, all basic attributes are collected and the full text is not downloaded. Each document in PS is denoted P. Because individual attributes are expressed in varied ways and the platforms themselves record some documents irregularly, PS must first be filtered, and for each valid document P that remains, the relevant attributes, including authors, institu...
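The collection and preliminary filtering described in [0054] can be illustrated with a minimal Python sketch. The `Document` fields, the separator set in `normalize_names`, and the validity rule in `is_valid` are all assumptions made for illustration; the source does not specify them.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """One collected record P. Field names are hypothetical."""
    title: str = ""
    authors: List[str] = field(default_factory=list)
    institutions: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    journal: str = ""
    year: Optional[int] = None


def normalize_names(raw: str) -> List[str]:
    """Split a raw author string on common ASCII and full-width separators.

    The separator set is an assumption; real platform exports may differ.
    """
    parts = re.split(r"[;,;,、\s]+", raw)
    return [p for p in parts if p]


def is_valid(doc: Document) -> bool:
    """Assumed validity rule: a record needs a title and at least one author."""
    return bool(doc.title.strip()) and len(doc.authors) > 0


def preprocess(ps: List[Document]) -> List[Document]:
    """Preliminary filtering of the data set PS: keep only valid documents."""
    return [doc for doc in ps if is_valid(doc)]
```

For example, `preprocess` applied to a list holding one complete record and one record with no authors would keep only the complete record.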


Abstract

The invention discloses a method for disambiguating duplicate author names in Chinese documents. The method includes: performing normalized preprocessing on the document data set; initially extracting the authors and expressing them formally; calculating the similarity score of the basic attributes between two same-name authors; constructing a keyword correspondence table for the same-name authors and calculating the keyword similarity score; establishing a cooperative relationship network for the same-name authors and calculating the similarity score of the cooperative relationships between the two; calculating a comprehensive similarity index from the basic attributes, keywords, and cooperative relationship network to judge whether the same-name authors are the same person; and updating the authors' information according to the judgment result. The invention can resolve duplicate author names in Chinese documents and improves the accuracy of academic document analysis in applications such as scientific and technological evaluation and academic research.
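A rough Python sketch of how the three similarity scores might be combined into one index and used to group same-name records is given below. The abstract does not state the formulas, so everything here is an assumption: exact institution match as the basic-attribute score, Jaccard overlap for keywords and co-authors, the weights, the threshold, and the union-find grouping. One property of this design is that the number of groups follows from the pairwise decisions and does not need to be set in advance.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Set


@dataclass
class AuthorRecord:
    """One author occurrence in one document. Field names are hypothetical."""
    name: str
    institution: str
    keywords: Set[str] = field(default_factory=set)
    coauthors: Set[str] = field(default_factory=set)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def basic_similarity(x: AuthorRecord, y: AuthorRecord) -> float:
    """Assumed basic-attribute score: 1.0 on an exact institution match."""
    return 1.0 if x.institution and x.institution == y.institution else 0.0


def combined_similarity(x, y, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three scores; the weights are illustrative only."""
    w_basic, w_kw, w_co = weights
    return (w_basic * basic_similarity(x, y)
            + w_kw * jaccard(x.keywords, y.keywords)
            + w_co * jaccard(x.coauthors, y.coauthors))


def group_same_person(records: List[AuthorRecord], threshold=0.5):
    """Merge same-name records whose combined score reaches the threshold.

    Uses union-find, so no cluster count is needed up front.
    Returns sorted lists of record indices, one list per inferred person.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        same_name = records[i].name == records[j].name
        if same_name and combined_similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In practice the weights and threshold would need to be tuned on labelled data, and the basic-attribute score would likely need fuzzy matching, because institution names are written inconsistently across platforms.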

Description

Technical field

[0001] The invention belongs to the field of document processing, and in particular relates to a method for disambiguating duplicate author names in Chinese documents.

Background technique

[0002] At present, as online literature databases continue to grow, more and more scholars, institutions, and businesses use literature analysis to follow the latest research trends in their field and to track the scientific and technological activities of peers or competitors. On this basis, they further study and identify the key and hot issues in the field, grasp its overall development, and support scientific and technological decision-making and academic evaluation. However, after obtaining documents in related fields by setting specific keywords, authors, journal directions, etc., it is common to have the problem of duplicate authors' names in docu...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/38, G06F17/27
CPC: G06F16/334, G06F16/35, G06F40/295
Inventors: 孙星恺, 陆浩, 袁勇, 王飞跃, 关晓炟, 吕宏强
Owner: QINGDAO ACADEMY OF INTELLIGENT IND