Historical error correction method and system for preventing over-splitting of a scholar paper library

A historical error correction technique for scholar paper libraries. It addresses over-splitting errors, erroneous merging, and insufficient precision and recall in existing approaches, and is described as highly general and fast.

Pending Publication Date: 2022-07-05
北京智谱华章科技有限公司

AI Technical Summary

Problems solved by technology

For example, among incremental disambiguation algorithms, the INC algorithm merges all over-split fragments directly by checking whether the similarity between them exceeds a specified threshold, but this method is prone to erroneous merging.
The MINDi algorithm specifically studies fragment merging: when multiple candidate authors meet the specified conditions, it calculates the distance between the candidate authors and merges the two closest ones. However, the accuracy and recall of this method are insufficient.
[0004] Therefore, the above solutions, which try to prevent over-splitting by improving the name disambiguation algorithm, follow a purity-first principle, so multiple papers by the same author are inevitably divided into multiple clusters. This produces over-splitting errors, and the accuracy of over-splitting error correction is low.
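
The two merging strategies criticized above can be contrasted with a minimal sketch. This is not the INC or MINDi implementation: the source does not say which similarity or distance those algorithms use, so Jaccard similarity over co-author names, the function names, and the sample fragments are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_merge(fragments: dict, threshold: float) -> list:
    """Merge every pair of fragments whose similarity exceeds the threshold.

    Merges are transitive (union-find), so a chain a~b, b~c puts a and c
    in one group even if a and c share nothing.
    """
    parent = {name: name for name in fragments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fragments, 2):
        if jaccard(fragments[a], fragments[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fragments:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def closest_pair(fragments: dict, threshold: float):
    """Return only the single most similar pair above the threshold, or None."""
    best, best_sim = None, threshold
    for a, b in combinations(fragments, 2):
        sim = jaccard(fragments[a], fragments[b])
        if sim > best_sim:
            best, best_sim = (a, b), sim
    return best


# Hypothetical fragments, each described by its set of co-author names.
frags = {
    "a": {"li", "wang", "zhao"},
    "b": {"li", "wang", "zhao", "chen"},
    "c": {"chen", "sun", "qian"},
}
merged_groups = threshold_merge(frags, threshold=0.1)
nearest = closest_pair(frags, threshold=0.1)
```

With a low threshold, the threshold-based merge chains all three fragments into one group although "a" and "c" share no co-author, which illustrates how erroneous merging can arise. The closest-pair strategy merges only "a" and "b" and leaves "c" unmerged, which is more conservative and can leave fragments behind.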



Embodiment Construction

[0051] The following describes in detail the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to explain the present invention and should not be construed as limiting the present invention.

[0052] The following describes, with reference to the accompanying drawings, the historical error correction method and system for preventing over-splitting of a scholar paper library proposed by the embodiments of the present invention.

[0053] Figure 1 is a flowchart of the historical error correction method for preventing over-splitting of a scholar paper library proposed in the embodiment of the present application. As shown in Figure 1, the method includes the followi...

Abstract

The invention provides a historical error correction method and system for preventing over-splitting of a scholar paper library. The method comprises the following steps: reconstructing the scholar name; directly matching the target scholar paper library with the paper clusters to be assigned, using information that uniquely identifies the author; for paper clusters that are not matched successfully, identifying the author-related information of each paper and the entity information in its abstract with a BERT-Bi-LSTM-CRF model; separately calculating the matching degree of the authors' institution information and of the journal information in the papers to be matched; calculating the similarity features between each candidate alignment paper cluster and the target scholar paper library, and judging whether each candidate alignment paper cluster is aligned with the target scholar paper library; and merging the candidate alignment paper clusters that the ensemble learning model judges to be aligned, while sending the paper clusters that are not aligned to manual annotation. The method is intended to resolve the over-splitting errors generated during disambiguation and to improve the speed, accuracy, and recall of over-splitting error correction.
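
The routing logic in the abstract (direct match first, then feature-based alignment, then manual annotation) can be sketched as follows. This is only an illustration under stated assumptions: entity extraction by the BERT-Bi-LSTM-CRF model is assumed to have already produced the institution and journal sets, a simple averaged overlap score stands in for the ensemble learning model, and `PaperCluster`, `overlap`, `route_cluster`, and the 0.5 threshold are hypothetical names and values that do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class PaperCluster:
    cluster_id: str
    # Information assumed to identify an author uniquely (e.g. an e-mail address).
    author_ids: set = field(default_factory=set)
    institutions: set = field(default_factory=set)
    journals: set = field(default_factory=set)


def overlap(a: set, b: set) -> float:
    """Share of the smaller set that also appears in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def route_cluster(target: PaperCluster, cluster: PaperCluster,
                  align_threshold: float = 0.5) -> str:
    """Decide what to do with one paper cluster relative to the target library."""
    # Step 1: direct match on uniquely identifying author information.
    if target.author_ids & cluster.author_ids:
        return "merge_direct"
    # Step 2: matching degree of institution and journal information.
    inst_score = overlap(target.institutions, cluster.institutions)
    journal_score = overlap(target.journals, cluster.journals)
    # Step 3: stand-in decision rule (NOT the patent's ensemble model).
    score = (inst_score + journal_score) / 2
    if score >= align_threshold:
        return "merge_aligned"
    # Step 4: clusters judged not aligned go to manual annotation.
    return "manual_annotation"


# Hypothetical data for illustration only.
target = PaperCluster("target", {"x@univ-a.example"}, {"univ a"}, {"journal p", "journal q"})
c1 = PaperCluster("c1", {"x@univ-a.example"}, set(), set())
c2 = PaperCluster("c2", set(), {"univ a"}, {"journal p"})
c3 = PaperCluster("c3", set(), {"univ b"}, {"journal r"})
decisions = {c.cluster_id: route_cluster(target, c) for c in (c1, c2, c3)}
```

Putting the cheap, high-confidence direct match first means only the remaining clusters need feature computation, and anything the decision rule rejects is left for human review rather than merged.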

Description

Technical field

[0001] The present application relates to the technical field of information processing, and in particular to a historical error correction method and system for preventing over-splitting of a scholar paper library.

Background

[0002] At present, over-splitting of scholar papers is a common problem in the operation of scholar paper libraries. The problem arises because the cold-start disambiguation algorithm and the incremental disambiguation algorithm for papers are designed to resolve the ambiguity between different authors with the same name, and the algorithms over-split while running. Over-splitting generates a large number of fragment clusters, that is, multiple scholar libraries for the same scholar. The fragment clusters keep growing as papers are added to the system, which will lead to a great reduction in the recall and accuracy of the incr...

Application Information

IPC(8): G06K9/62, G06F40/295
CPC: G06F40/295, G06F18/22, G06F18/24, G06F18/214
Inventor: 房小涵, 李晓彦, 宋健, 赵祎, 仇瑜, 刘德兵, 褚晓泉, 李青
Owner: 北京智谱华章科技有限公司