A paper relevance quantification method based on reference list overlap degree

A paper relevance quantification technique based on the degree of overlap between reference lists. It is applied in the fields of unstructured text data retrieval, text database clustering/classification, and special data processing applications, and addresses the problems of irregular citation, missing citations, and multiple citations.

Active Publication Date: 2019-02-22
DALIAN UNIV OF TECH
Cites: 3; Cited by: 6

AI Technical Summary

Problems solved by technology

Reference lists contain many implicit citation irregularities, such as missing citations and multiple citations.

Embodiment Construction

[0038] To make the purpose, technical solution, and advantages of the present invention clearer, specific embodiments of the invention are described in further detail below.

[0039] This embodiment of the present invention provides a method for quantifying paper correlation based on the degree of overlap between reference lists. The process is shown in Figure 1 and includes the following steps:

[0040] Step 1: Data preprocessing, delete irrelevant redundant data.

[0041] Select the papers in the data set whose field information is Computer Science and delete their redundant attributes, leaving only the identifying number (id), the title of the paper (title), and the reference list (references). For each paper, count the number of papers it cites and the number of papers that cite it. These counts and the information above are stored in a dictionary structure.
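The preprocessing in Step 1 might be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout, the `fos` field name for the field information, and the function name `preprocess` are assumptions; only `id`, `title`, and `references` come from the text above. Citation counts are computed only among the papers that survive the filter.

```python
from collections import defaultdict


def preprocess(records, field="Computer Science"):
    """Keep papers in `field`, reduced to id, title, references, and counts.

    `records` is assumed to be an iterable of dicts; the "fos" key holding
    the field information is a hypothetical name, not given in the source.
    """
    papers = {}
    for rec in records:
        if field not in rec.get("fos", []):
            continue  # drop papers outside the selected field
        papers[rec["id"]] = {
            "title": rec.get("title", ""),
            # dict.fromkeys removes duplicate references, keeping order
            "references": list(dict.fromkeys(rec.get("references", []))),
        }

    # Count how often each paper is cited by the retained papers.
    cited_by = defaultdict(int)
    for info in papers.values():
        for ref_id in info["references"]:
            cited_by[ref_id] += 1

    for pid, info in papers.items():
        info["n_references"] = len(info["references"])
        info["n_cited_by"] = cited_by.get(pid, 0)
    return papers


# Hypothetical sample records for illustration only.
sample = [
    {"id": "p1", "title": "A", "fos": ["Computer Science"],
     "references": ["p2", "p3", "p3"]},
    {"id": "p2", "title": "B", "fos": ["Computer Science"],
     "references": []},
    {"id": "p3", "title": "C", "fos": ["Biology"],
     "references": ["p2"]},
]
papers = preprocess(sample)
```

Storing everything in one dictionary keyed by paper id keeps later lookups of a paper's reference list at constant time, which matters when many paper pairs are compared.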

[0042] Step 2: Classify and simplify the data set, and calculate rel...

Abstract

The invention discloses a method for quantifying paper relevance based on the degree of reference list overlap. Using the idea of co-citation, the invention associates the degree of overlap between reference lists with the degree of similarity between papers, quantifies that similarity by combining a statistical verification network with the error detection rate, and obtains a threshold value for judging whether two articles are similar. The invention also provides several applications of the quantification method: detecting the homology of papers, searching for missing citations, and simplifying reference lists. On this basis, the invention can evaluate the relevance of papers, and its application to missed-citation detection provides a basis for the sorting and retrieval of papers, for clustering and classification, and for error detection in references.
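The core idea of the abstract, relating reference list overlap to paper similarity and using similar papers to suggest missing citations, can be illustrated with a small sketch. The listing does not state the overlap formula or the threshold, so the Jaccard index and the value `0.3` used here are stand-in assumptions, and the function names are hypothetical. The sketch does not implement the statistical verification network.

```python
def reference_overlap(refs_a, refs_b):
    """Return the shared references and a Jaccard overlap score in [0, 1].

    The Jaccard index is an assumed stand-in for the overlap measure,
    which the source does not define.
    """
    a, b = set(refs_a), set(refs_b)
    shared = a & b
    union = a | b
    score = len(shared) / len(union) if union else 0.0
    return shared, score


def are_similar(refs_a, refs_b, threshold=0.3):
    """Judge similarity against a threshold (0.3 is a hypothetical value)."""
    _, score = reference_overlap(refs_a, refs_b)
    return score >= threshold


def missing_citation_candidates(refs_target, refs_similar):
    """References of a similar paper that the target paper does not cite."""
    return sorted(set(refs_similar) - set(refs_target))


# Hypothetical reference lists for illustration only.
paper_a = ["r1", "r2", "r3", "r4"]
paper_b = ["r2", "r3", "r4", "r5"]

shared, score = reference_overlap(paper_a, paper_b)
similar = are_similar(paper_a, paper_b)
candidates = missing_citation_candidates(paper_a, paper_b)
```

Here the two lists share three of five distinct references, so the pair passes the assumed threshold and `r5` is flagged as a possible missing citation for the first paper. In practice such candidates would still need manual or statistical confirmation.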

Description

Technical field

[0001] The invention belongs to the technical field of measuring the similarity of papers in the academic domain, and in particular relates to a method for measuring the correlation of papers based on the co-citation idea and a statistical verification network.

Background

[0002] As science flourishes, the number of academic papers grows with it. Under these circumstances, quantifying the correlation between papers is of great value, and the correlation can serve as an important basis for document retrieval and for document classification and clustering. However, mainstream text analysis methods (such as those based on cosine similarity and TF-IDF) are poorly suited to academic papers, which contain large amounts of text: their computational complexity is high and their efficiency is very low. Classification and screening comparison methods (such as the Naive Bayes algorithm and the KNN algorithm) are not reliable ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/332
Inventor: 刘嘉莹, 张冬瑜, 肖心茹, 步晓楠, 宁兆龙, 夏锋
Owner: DALIAN UNIV OF TECH