Scientific and technical literature picture extraction method based on Faster-RCNN

A picture extraction method in the fields of computer applications and object detection. It addresses the poor robustness and low accuracy of earlier approaches, reducing detection time and improving accuracy.

Status: Inactive; Publication Date: 2019-09-27
ZHEJIANG UNIV OF TECH
6 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

To extract pictures from documents, most previous methods rely on manual feature extraction, so extraction accuracy depends on the quality of the hand-crafted features and the results are not robust across different scientific and technical documents.
In addition, although a model trained directly with a simple convolutional neural network (CNN) can separate and extract pictures from scientific literature, its accuracy is still not high.

Examples

Embodiment Construction

[0024] The present invention will be further described below in conjunction with the accompanying drawings.

[0025] Referring to Figure 1 and Figure 2, the invention provides a Faster-RCNN-based method for extracting pictures from scientific and technical documents, using the arXiv preprint website as its empirical data set. The method comprises data collection, training data labeling, Faster-RCNN model training, and picture detection and extraction.

[0026] The present invention comprises the following steps:

[0027] S1: Use a web crawler to obtain scientific literature data and preprocess it;

[0028] S2: Divide the data set into a training set and a test set, label the data in the training set, and leave the data in the test set unprocessed;

[0029] S3: Input the data in the training set into the convolutional layers and extract the feature maps of the pictures;

[0030] S4: Input the obtained feature maps into the RPN (Region Proposal Network) module to obtain ...
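As an illustration of step S2, the following minimal sketch splits a list of page images into training and test sets and attaches labels only to the training pages. The file names, the 80/20 ratio, the box coordinates, and the helper names `split_dataset` and `attach_labels` are all assumptions for illustration; the source does not specify them.

```python
import random

def split_dataset(pages, train_fraction=0.8, seed=0):
    """Shuffle page identifiers and split them into training and test lists."""
    shuffled = list(pages)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def attach_labels(train_pages, annotations):
    """Pair each training page with its labeled picture boxes (x1, y1, x2, y2)."""
    return [{"page": p, "boxes": annotations.get(p, [])} for p in train_pages]

# Hypothetical page images rendered from crawled papers.
pages = [f"paper_{i:03d}.png" for i in range(10)]
train, test = split_dataset(pages)

# Hypothetical manual annotation for one training page; test pages stay unlabeled.
annotations = {train[0]: [(40, 60, 300, 220)]}
labeled_train = attach_labels(train, annotations)
```

The test list is returned as-is, matching the requirement that test data be left unprocessed.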

Abstract

The invention discloses a method for extracting pictures from scientific and technical literature based on Faster-RCNN. The method comprises the following steps: 1) acquiring scientific and technical literature data with a web crawler and preprocessing it; 2) dividing the data set into a training set and a test set, labeling the data in the training set, and leaving the data in the test set unprocessed; 3) inputting the training-set data into the convolutional layers and extracting feature maps of the pictures; 4) inputting the obtained feature maps into an RPN module to obtain fixed-size proposal feature maps; 5) classifying each proposal into a specific category with softmax, obtaining the precise position of the target, calculating the loss function, and updating the parameters of the whole network to obtain a trained model; 6) using the trained model to detect pictures in the data set and outputting the detected pictures. The method offers fast detection and high accuracy, facilitates further analysis and understanding of pictures in scientific and technical literature, and has high practical value.
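The softmax classification and loss calculation in step 5 can be sketched as follows. This is a minimal illustration only: the two-class setup (background versus picture) and the example scores are assumptions, and the box-regression term that Faster R-CNN also includes in its loss is omitted.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(probs[target_index])

# Hypothetical raw scores for one proposal: (background, picture).
logits = [0.5, 2.0]
probs = softmax(logits)
loss = cross_entropy(probs, 1)  # assume the labeled class is "picture"
```

During training, a loss of this kind would be minimized by updating the network parameters, so the predicted probability of the labeled class rises over time.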

Description

Technical field

[0001] The invention relates to the fields of computer application technology and object detection, and in particular to a method for extracting pictures from scientific and technical documents based on Faster-RCNN.

Background art

[0002] Pictures in scientific and technical literature carry the important ideas and results of that literature. Analyzing and understanding these pictures helps researchers understand the literature better. To extract pictures from documents, most previous methods rely on manual feature extraction, so extraction accuracy depends on the quality of the hand-crafted features and the results are not robust across different scientific and technical documents. In addition, although a model trained directly with a simple convolutional neural network (CNN) can separate and extract pictures from scientific and technical literature, the accuracy rat...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06K9/00, G06N3/04
CPC: G06V30/422, G06N3/045, G06F18/214
Inventor: 傅晨波, 李一帆, 夏镒楠, 沈彬, 潘星宇, 盛轩硕
Owner: ZHEJIANG UNIV OF TECH