
Method for automatically mining corresponding citing fragments and original content fragments of cited literature

A technology for automatically mining citing fragments and the corresponding original content fragments of cited documents. It is applied in special data processing applications, natural language data processing, instruments, etc. It addresses the strong influence of feature vector selection on results and the resulting large performance fluctuations, offering a simple method that resolves ambiguity and achieves good mining results.

Status: Inactive
Publication Date: 2016-11-16
同方知网数字出版技术股份有限公司
Cites: 4; Cited by: 8

AI Technical Summary

Problems solved by technology

The existing approach relies on a relatively mature technology, but it has the disadvantage of requiring training on a large-scale corpus in advance to obtain word feature vectors. In addition, the choice of feature vectors strongly affects the results, which leads to large performance fluctuations.



Detailed Description of the Embodiments

[0019] To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.

[0020] Description of related concepts:

[0021] Citing fragment: a sentence in the body of a paper that explicitly cites a reference. Sentences are delimited by Chinese or English periods.

[0022] References: the cited documents and related information listed in numbered sequence after the main text of the paper, excluding endnotes, footnotes, and other forms.

[0023] Original content fragment of a cited document: a sentence in the text of a reference. Sentences are delimited by Chinese or English periods.
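The definitions above can be illustrated with a minimal Python sketch. It splits text on Chinese and English periods, as the definitions state, and keeps the sentences that contain a citation marker. The bracketed-number marker format (such as [3] or [1, 2]) and the function names `split_sentences` and `citing_fragments` are assumptions made for illustration; the source does not specify how citations are marked.

```python
import re

# Sentence delimiters named in the definitions: Chinese "。" and English ".".
# Known limitation of this sketch: a period inside a decimal number or an
# abbreviation also ends a sentence.
_SENTENCE = re.compile(r"[^。.]+[。.]?")

# Assumption: citations appear as bracketed sequence numbers, e.g. [3] or [1, 2].
_CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def split_sentences(text):
    """Split text into sentences ending at a Chinese or English period."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def citing_fragments(text):
    """Return (sentence, reference_numbers) for each sentence that cites a reference."""
    fragments = []
    for sentence in split_sentences(text):
        numbers = [
            int(n)
            for group in _CITATION.findall(sentence)
            for n in group.split(",")
        ]
        if numbers:
            fragments.append((sentence, numbers))
    return fragments


if __name__ == "__main__":
    body = "Prior work used word vectors [1]. This is plain text。Two studies agree [2, 3]."
    for sentence, refs in citing_fragments(body):
        print(refs, sentence)
```

The same `split_sentences` helper could be applied to the text of a reference to obtain candidate original content fragments, since both definitions use the same delimiters.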

[0024] As shown in Figure 1, the method for automatically mining a citing fragment and the corresponding original content fragment of the cited document comprises:

[0025] Step 10 extracts ...


Abstract

The invention discloses a method for automatically mining a citing fragment and the corresponding original content fragment of the cited document. The method comprises the following steps: extracting each sentence that cites a reference as a citing fragment; splitting the references cited by the citing fragment into sentences and numbering them; performing word segmentation on each sentence of the citing fragment and of the references to form citing-fragment word groups and reference-sentence word groups, and calculating the similarity between the citing-fragment sentence and each reference sentence; and sorting the reference sentences by the calculated similarity, extracting the reference sentence with the highest similarity to the citing fragment, and taking it as the original content fragment of the cited document corresponding to that citing fragment. The method requires no corpus training in advance, has low computational complexity, can flexibly accommodate various similarity calculation methods, and achieves high accuracy and a high recall rate.
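The ranking step described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract leaves the similarity measure open, so Jaccard overlap of word sets is swapped in here as one simple choice, and the regex-based `tokenize` is a stand-in for a real word segmenter (it treats each CJK character as its own token). All function names are hypothetical.

```python
import re


def tokenize(sentence):
    """Stand-in for word segmentation (assumption): lowercase Latin words
    and digits, plus each CJK character as a separate token."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def jaccard(words_a, words_b):
    """Jaccard similarity of two word lists; 0.0 if either is empty."""
    set_a, set_b = set(words_a), set(words_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_original_fragment(citing_sentence, reference_sentences):
    """Score each numbered reference sentence against the citing sentence,
    sort by similarity, and return (score, sentence_number, sentence) for
    the top match, or None if the reference has no sentences."""
    citing_words = tokenize(citing_sentence)
    scored = [
        (jaccard(citing_words, tokenize(sentence)), number, sentence)
        for number, sentence in enumerate(reference_sentences, start=1)
    ]
    # Highest similarity first; ties go to the earlier sentence number.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[0] if scored else None


if __name__ == "__main__":
    citing = "Jaccard overlap ranks candidate sentences [1]."
    reference = [
        "We collected papers from journals.",
        "Jaccard overlap ranks candidate sentences well.",
        "Results are in table two.",
    ]
    print(best_original_fragment(citing, reference))
```

Because the similarity function is isolated in `jaccard`, another measure could be substituted without changing the ranking logic, which is in keeping with the abstract's remark that various similarity calculation methods can be used.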

Description

Technical Field

[0001] The invention belongs to the fields of information extraction in natural language processing and citation content extraction and analysis in bibliometrics, and in particular relates to a method for automatically mining citing fragments and the corresponding original content fragments of cited documents.

Background Art

[0002] At present, research on citation relationships in bibliometrics uses only information such as citation counts and the bibliographies of papers, without in-depth analysis or use of the specific cited content within the papers. To address this deficiency, the present invention uses natural language processing technology to analyze scientific and technical papers, and proposes and implements a method for analyzing the content of citing fragments in such papers and the original text of the cited documents, and for mining and discovering their corresponding relations...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/205, G06F40/279
Inventors: 王骏, 赵一方, 熊海涛, 伍军红
Owner: 同方知网数字出版技术股份有限公司