Scholar name disambiguation method and device, storage medium and terminal

A scholar name disambiguation technology, applied in unstructured text data retrieval, instruments, calculations, etc., that addresses problems such as high algorithm implementation complexity, low evaluation scores, and inability to run efficiently.

Active Publication Date: 2020-08-25
SHANGHAI R&D PUBLIC SERVICE PLATFORM MANAGEMENT CENT +1

AI Technical Summary

Benefits of technology

The method makes full use of paper information and applies feature learning, feature fusion, and cluster analysis to disambiguate scholar names in scientific and technological literature. According to the abstract, this improves the related evaluation scores and the accuracy of author-based literature retrieval, and it supports building a literature knowledge base centered on scholar entities.

Problems solved by technology

The technical problem addressed is the disambiguation of scholar names in scientific and technological publications, where existing approaches suffer from high algorithm implementation complexity, low evaluation scores, and an inability to run efficiently. Papers attributed to the wrong scholar can lead to incorrect conclusions when the literature is searched by author name.

Examples

Embodiment 1

[0027] Figure 1 is a schematic flow chart of a scholar name disambiguation method according to an embodiment of the present invention, which includes the following steps:

[0028] Step S11. Acquire the paper data set for the names to be disambiguated. Optionally, the paper data set is obtained by grouped statistics, and the papers corresponding to each name to be disambiguated are organized into a two-level dictionary format; Hive local mode is used to obtain the relational data between papers and the names to be disambiguated; a generator reads the paper data in blocks, converts each paper into a dictionary, splits some fields into lists, and stores the result in the database. Preferably, this embodiment uses the Lightning Memory-Mapped Database (LMDB), a lightweight memory-mapped database. LMDB stores data as key-value pairs, with the data held as byte arrays, which has the following advantag...
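The block-wise ingestion in Step S11 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a plain Python dict with byte keys and byte values stands in for LMDB, the Hive query is replaced by a hard-coded name-to-paper mapping, and the record fields (`id`, `title`, `authors`, `keywords`), the `;` separator, and all function names are hypothetical.

```python
import json
from collections import defaultdict
from itertools import islice


def read_in_blocks(records, block_size):
    """Generator: yield lists of at most block_size records from any iterable."""
    it = iter(records)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def to_paper_dict(raw):
    """Convert one raw record to a dictionary, splitting some fields into lists."""
    return {
        "id": raw["id"],
        "title": raw["title"],
        # Assumption: authors and keywords arrive as ';'-separated strings.
        "authors": [a.strip() for a in raw["authors"].split(";") if a.strip()],
        "keywords": [k.strip() for k in raw["keywords"].split(";") if k.strip()],
    }


def build_store(records, name_to_paper_ids, block_size=2):
    """Return (store, name_index).

    store: bytes key -> bytes value (a dict standing in for a key-value database).
    name_index: name -> {paper_id -> paper dict}, i.e. a two-level dictionary.
    """
    store = {}
    papers = {}
    for block in read_in_blocks(records, block_size):
        for raw in block:
            paper = to_paper_dict(raw)
            papers[paper["id"]] = paper
            store[paper["id"].encode("utf-8")] = json.dumps(paper).encode("utf-8")

    name_index = defaultdict(dict)
    for name, paper_ids in name_to_paper_ids.items():
        for pid in paper_ids:
            if pid in papers:
                name_index[name][pid] = papers[pid]
    return store, dict(name_index)


# Toy data; in the embodiment the name-to-paper relation would come from Hive.
raw_records = [
    {"id": "p1", "title": "Graph embeddings", "authors": "Li Wei; Zhang San", "keywords": "graph; embedding"},
    {"id": "p2", "title": "Name disambiguation", "authors": "Li Wei", "keywords": "clustering"},
    {"id": "p3", "title": "Star catalogs", "authors": "Wang Wu; Li Wei", "keywords": ""},
]
name_to_paper_ids = {"Li Wei": ["p1", "p2", "p3"]}
store, name_index = build_store(raw_records, name_to_paper_ids)
```

Serializing each paper to bytes mirrors the byte-array storage the paragraph attributes to LMDB, so swapping the dict for a real key-value store would mainly change the write call.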

Embodiment 2

[0059] This embodiment provides a scholar name disambiguation device which, as shown in Figure 3, comprises: a paper data set acquisition module 31, which obtains the paper data set of the names to be disambiguated; a feature vector acquisition module 32, which uses a word vector model to obtain the paper relation feature vectors and paper semantic feature vectors of the paper data set; a feature fusion module 33, which calculates the similarity matrices of the paper relation feature vectors and the paper semantic feature vectors respectively and performs feature fusion to obtain a feature fusion matrix; and a clustering module 34, which performs clustering based on the feature fusion matrix to obtain a set of clustered papers and a set of outlier papers.
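The work of modules 33 and 34 can be illustrated with a small sketch. The text shown here does not specify the similarity measure, the fusion rule, or the clustering algorithm, so everything below is an assumption chosen for brevity: cosine similarity, a weighted average of the two similarity matrices, and threshold-based connected components in which single-paper groups are treated as outliers. The function names, the `weight` and `threshold` parameters, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity; rows with zero norm get similarity 0."""
    n = len(vectors)
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def fuse(sim_a, sim_b, weight=0.5):
    """Assumed fusion rule: element-wise weighted average of two matrices."""
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sim_a, sim_b)
    ]


def cluster(fused, threshold):
    """Link papers whose fused similarity >= threshold (union-find).

    Returns (clusters, outliers): groups of two or more paper indices,
    and the indices of papers that were linked to no other paper.
    """
    n = len(fused)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if fused[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    outliers = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, outliers


# Toy feature vectors for five papers sharing one author name.
relation_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]]
semantic_vectors = [[1.0, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 1.0], [-1.0, -0.5]]

fused = fuse(
    cosine_similarity_matrix(relation_vectors),
    cosine_similarity_matrix(semantic_vectors),
)
clusters, outliers = cluster(fused, threshold=0.8)
```

With these toy vectors, papers 0 and 1 form one group, papers 2 and 3 form another, and paper 4 is left as an outlier. A density-based or hierarchical algorithm could replace `cluster` without changing the fusion step.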

[0060] The modules provided in this embodiment are similar to the methods and implementations described above, so the details are not repeated here. In addition, it should be understood that the division of each module of the ab...

Embodiment 3

[0063] This embodiment provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for disambiguating the names of scholars is implemented.

[0064] Those of ordinary skill in the art can understand that all or part of the steps for implementing the above method embodiments can be completed by hardware related to computer programs. The aforementioned computer program can be stored in a computer-readable storage medium. When the program is executed, it executes the steps including the above-mentioned method embodiments; and the aforementioned storage medium includes: ROM, RAM, magnetic disk or optical disk and other various media that can store program codes.

Abstract

The invention provides a scholar name disambiguation method and device, a storage medium and a terminal. The method comprises the steps of obtaining a paper data set of names to be disambiguated; using a word vector model to obtain a paper relationship feature vector and a paper semantic feature vector of the paper data set; calculating the similarity matrices of the paper relationship feature vector and the paper semantic feature vector respectively, and carrying out feature fusion to obtain a feature fusion matrix; and clustering based on the feature fusion matrix to obtain a clustered paper set and an outlier paper set. According to the method and device, paper information is fully utilized, and scholar name disambiguation of scientific and technological literature is achieved through feature learning, feature fusion, cluster analysis and similar techniques. This improves the related evaluation scores and the accuracy of author-based retrieval of scientific and technological literature, and facilitates the construction of a literature knowledge base with scholar entities at its core.
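One common way to turn a word vector model into a per-paper semantic feature vector is to average the vectors of the paper's tokens. The abstract does not say how the vectors are derived, so the sketch below is only an assumed illustration: the tiny `toy_embeddings` table stands in for a trained word vector model, and `tokenize` and `average_vector` are hypothetical helpers.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would do more cleaning."""
    return [t for t in text.lower().split() if t]


def average_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens; zero vector if none are known."""
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for k in range(dim):
            total[k] += vec[k]
        count += 1
    if count == 0:
        return total
    return [x / count for x in total]


# Assumption: a two-dimensional toy table in place of a trained model.
toy_embeddings = {
    "graph": [1.0, 0.0],
    "embedding": [0.8, 0.2],
    "galaxy": [0.0, 1.0],
}

semantic_vector = average_vector(
    tokenize("Graph embedding survey"), toy_embeddings, dim=2
)
unknown_vector = average_vector(tokenize("unseen words only"), toy_embeddings, dim=2)
```

The same averaging could in principle be applied to relation tokens such as co-author or venue identifiers to obtain a relationship feature vector, although the abstract does not confirm that this is how the method works.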

Application Information

Owner SHANGHAI R&D PUBLIC SERVICE PLATFORM MANAGEMENT CENT