Method for disambiguating same-named paper authors based on hierarchical clustering of high-confidence feature attributes

A feature-attribute, high-confidence technique applied in special data processing applications, instruments, and electrical digital data processing. It addresses same-name ambiguity, which confuses researchers, degrades search quality, and can affect final research results, and it aims to improve operating efficiency, recall, and accuracy.

Active Publication Date: 2018-01-16
HUBEI UNIV

AI Technical Summary

Problems solved by technology

However, same-name ambiguity has long degraded search quality.
As network resource data grows rapidly, traditional methods show limitations when faced with missing information, erroneous information, and deep-seated ambiguity. Searching by author name can return a large, disorganized set of papers, which confuses or even misleads researchers, reduces the efficiency of academic activities, and in severe cases affects the final results of academic research.

Examples

Embodiment

[0032] The embodiments of the present invention are described in detail below, and the technical solution is further explained with reference to the accompanying drawings and the data in the tables.

[0033] The specific operation steps of the paper same-name author disambiguation method of the present invention are as follows:

[0034] 1. Data preprocessing;

[0035] Note that the raw data extracted from academic search engines are rough and unstandardized. An author's name may appear either in full or with the given name reduced to a capitalized initial, and the publishing institution likewise appears either in full or abbreviated. This makes it difficult to compute similarity scores for clustering methods that rely only on string matching of attribute values. So in the data preprocessing step, the feature values in the pape...
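The normalization problem described in paragraph [0035] can be illustrated with a minimal Python sketch. It is not the patent's preprocessing procedure, which is truncated in this text. The function names, the "Surname, Given" / "Given Surname" input formats, and the `AFFILIATION_ALIASES` table are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical lookup table mapping abbreviated institution names to full names.
AFFILIATION_ALIASES = {"hubei univ": "hubei university"}


def normalize_name(raw):
    """Return (surname, given_tokens) in lowercase ASCII.

    Assumes the input is either "Surname, Given" or "Given Surname".
    """
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().strip()
    if "," in text:
        surname, given = [part.strip() for part in text.split(",", 1)]
    else:
        parts = text.split()
        if not parts:
            raise ValueError("empty author name")
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = " ".join(re.findall(r"[a-z]+", surname))
    return surname, tuple(re.findall(r"[a-z]+", given))


def names_compatible(a, b):
    """True if two name strings could denote the same written name."""
    surname_a, given_a = normalize_name(a)
    surname_b, given_b = normalize_name(b)
    if surname_a != surname_b:
        return False
    for x, y in zip(given_a, given_b):
        if len(x) == 1 or len(y) == 1:
            # An initial matches any given name starting with that letter.
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def normalize_affiliation(raw):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    key = " ".join(re.findall(r"[a-z0-9]+", raw.lower()))
    return AFFILIATION_ALIASES.get(key, key)
```

With this sketch, `names_compatible("J. Hu", "Jie Hu")` holds while `names_compatible("J. Hu", "Wei Hu")` does not, and "HUBEI UNIV" and "Hubei University" normalize to the same string, so later string comparison sees them as equal.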

Abstract

The invention relates to a method for disambiguating same-named paper authors based on hierarchical clustering of high-confidence feature attributes. The method mainly comprises the following steps: 1, raw data is extracted from an academic search engine, feature attribute values are extracted, and the values are normalized; 2, alias groups are formed according to rules, and same-name author ambiguity groups are generated from the alias groups; 3, similarity calculation and disambiguation method selection are performed for each feature attribute; 4, hierarchical clustering on high-confidence feature attributes is carried out using the attribute confidence assessment performed in step 3. The method maintains name disambiguation speed while improving disambiguation accuracy.
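Steps 3 and 4 of the abstract can be pictured with the following Python sketch. It is a generic greedy agglomerative merge using Jaccard similarity, substituted for the patent's own similarity measures and confidence assessment, which are not given in this text. The attribute names (`coauthors`, `venue`), their ordering by confidence, and the thresholds are assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_values(cluster, papers, attr):
    """Union of one attribute's values over all papers in a cluster."""
    values = set()
    for idx in cluster:
        values |= papers[idx][attr]
    return values


def disambiguate(papers, attr_thresholds):
    """Merge papers attribute by attribute, highest-confidence attribute first.

    attr_thresholds: list of (attribute, threshold) pairs, assumed to be
    ordered from the most trusted attribute to the least trusted one.
    """
    clusters = [{i} for i in range(len(papers))]
    for attr, threshold in attr_thresholds:
        while True:
            best, best_pair = threshold, None
            for a, b in combinations(range(len(clusters)), 2):
                sim = jaccard(
                    cluster_values(clusters[a], papers, attr),
                    cluster_values(clusters[b], papers, attr),
                )
                if sim >= best:
                    best, best_pair = sim, (a, b)
            if best_pair is None:
                break  # no pair reaches the threshold for this attribute
            a, b = best_pair
            clusters[a] |= clusters[b]
            del clusters[b]  # b > a, so index a stays valid
    return clusters


# Four papers in one same-name ambiguity group (made-up sample data).
papers = [
    {"coauthors": {"li wei", "zhang min"}, "venue": {"kdd"}},
    {"coauthors": {"li wei", "zhang min"}, "venue": {"icde"}},
    {"coauthors": {"chen yu"}, "venue": {"kdd"}},
    {"coauthors": {"wang fang"}, "venue": {"acl"}},
]
result = disambiguate(papers, [("coauthors", 0.5), ("venue", 0.5)])
```

In this example the first pass merges papers 0 and 1 on shared co-authors, and the second pass attaches paper 2 through the shared venue, leaving paper 3 in its own cluster. Processing the most trusted attribute first means that later, weaker attributes compare already-merged clusters rather than single papers.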

Description

Technical field

[0001] The invention relates to a method for disambiguating authors who share the same name in academic papers, in particular to a method based on hierarchical clustering of high-confidence feature attributes.

Background art

[0002] Today, people rely heavily on the Internet for academic activities, largely because the Internet makes resource sharing easy. The vast majority of academic papers are now stored in online databases as electronic resources, so users only need a legitimate network channel to find, read, and download the learning resources they need, especially academic papers. As research habits have changed, more and more academic search engines (DLs) have emerged and continue to develop. They let users search for papers by author and return the list of al...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor 胡婕
Owner HUBEI UNIV