Character detection and recognition method for Chinese historical literature dense texts

A technology for detecting text in historical documents, applied in character and pattern recognition, instruments, computer parts, etc. It addresses the problem that existing detection frameworks produce poor detection boxes on dense text, which in turn degrades the recognition of text in historical documents.

Active Publication Date: 2019-05-24
SOUTH CHINA UNIV OF TECH +1

AI Technical Summary

Problems solved by technology

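In recent years, deep learning algorithms have made a series of breakthroughs in the field of computer vision, and general object detection algorithms and scene text detection algorithms have improved greatly. However, for dense text in Chinese historical documents, general object detection algorithms and scene text detection frameworks do not perform well, and the resulting detection quality affects the subsequent recognition of text in historical documents.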



Examples


Embodiment

[0054] The present invention mainly solves the problem that general object detection and scene text detection frameworks are not accurate enough when detecting dense text. Drawing on how humans read, it uses a text recognition classifier to supply text information that helps train the text detector, thereby improving the detector's accuracy. With a relatively small number of parameters, it achieves tighter and more effective detection and localization.

[0055] As shown in Figure 1, a text detection and recognition method for dense text in Chinese historical documents includes the following steps:

[0056] S1. Data acquisition: collect pictures of historical documents and manually label them to form a labeled data set;

[0057] S2. Data preprocessing: perform vertical projection on the historical document pictures collected in step S1 for column segmentation, and cut the vertical text in the historical docume...
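
The column segmentation in step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the page has already been binarized into rows of 0 (background) and 1 (ink), and the function names `vertical_projection`, `split_columns`, and `crop_columns`, as well as the `min_ink` and `min_width` thresholds, are hypothetical choices made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background (assumed binarized)


def vertical_projection(binary: Image) -> List[int]:
    """Count ink pixels in every pixel column of the page."""
    if not binary:
        return []
    return [sum(col) for col in zip(*binary)]


def split_columns(binary: Image, min_ink: int = 1, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges, end exclusive, where the projection shows ink.

    min_ink and min_width are illustrative thresholds (assumptions) that
    suppress isolated noise pixels and very narrow runs.
    """
    profile = vertical_projection(binary)
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x  # a text column begins
        elif ink < min_ink and start is not None:
            if x - start >= min_width:
                spans.append((start, x))  # the text column ends at a gap
            start = None
    # Close a column that runs to the right edge of the page.
    if start is not None and len(profile) - start >= min_width:
        spans.append((start, len(profile)))
    return spans


def crop_columns(binary: Image, spans: List[Tuple[int, int]]) -> List[Image]:
    """Cut the page into one sub-image per detected text column."""
    return [[row[s:e] for row in binary] for s, e in spans]


# Tiny synthetic page with two ink columns separated by a blank gap.
page = [
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
]
spans = split_columns(page)
columns = crop_columns(page, spans)
```

On real scans the gaps between columns are rarely perfectly blank, so a threshold above 1 or a smoothed projection profile would likely be needed; those details are not specified in the text above.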


Abstract

The invention discloses a character detection and recognition method for dense text in Chinese historical documents, comprising the following steps: (1) data acquisition: acquiring historical document images and manually labeling them; (2) data preprocessing: performing column segmentation on the historical document images by vertical projection, and cutting the vertical text in the documents into columns; (3) constructing and pre-training a convolutional neural network for single-line text recognition; (4) constructing a convolutional neural network for character detection on single-line text, which shares its shallow parameters with the recognition network and is trained at the same time. The detection network uses the text information provided by the recognition network to fine-tune the detected positions, so that the position of each individual character in the dense text of a historical document can be detected accurately. The invention uses a convolutional neural network for text recognition and makes full use of the guidance provided by the text recognition classifier, so that detection becomes more accurate.
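
The idea in step (4), two networks sharing shallow parameters and training at the same time, can be illustrated with a deliberately tiny scalar model. This is a sketch under strong assumptions and not the patent's network: a single shared weight stands in for the shared shallow layers, `det_head` and `rec_head` are hypothetical one-weight task heads, and the squared-error losses, targets, and learning rate are invented for the illustration.

```python
def joint_step(shared, det_head, rec_head, x, det_target, rec_target, lr=0.01):
    """One gradient-descent step on the sum of a detection and a recognition loss.

    All quantities are scalars; `shared` plays the role of the shared shallow
    parameters and receives gradient from both tasks.
    """
    feat = shared * x                       # shared feature
    det_err = det_head * feat - det_target  # detection branch error
    rec_err = rec_head * feat - rec_target  # recognition branch error
    loss = det_err ** 2 + rec_err ** 2      # joint loss (assumed unweighted sum)

    # Analytic gradients of the joint loss.
    g_det = 2 * det_err * feat
    g_rec = 2 * rec_err * feat
    g_shared = 2 * det_err * det_head * x + 2 * rec_err * rec_head * x

    return (shared - lr * g_shared,
            det_head - lr * g_det,
            rec_head - lr * g_rec,
            loss)


shared, det_head, rec_head = 0.5, 0.5, 0.5
losses = []
for _ in range(200):
    shared, det_head, rec_head, loss = joint_step(
        shared, det_head, rec_head, x=1.0, det_target=1.0, rec_target=2.0
    )
    losses.append(loss)
```

The point of the sketch is only that the update to `shared` sums the gradients of both branches, so information from the recognition task shapes the features the detection branch sees. How the actual networks weight the two losses and pass recognition information to the detector is not described in the abstract.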

Description

Technical field

[0001] The invention relates to the technical fields of pattern recognition and artificial intelligence, in particular to a text detection and recognition method for dense text in Chinese historical documents.

Background

[0002] Historical documents are a precious heritage left by past civilizations, and they exist in large numbers. The most effective way to interpret and protect them is to digitize them, which includes recognizing and preserving the characters and symbols they contain. In recent years, deep learning algorithms have made a series of breakthroughs in the field of computer vision, and general object detection algorithms and scene text detection algorithms have improved greatly. However, for dense text in Chinese historical documents, general object detection algorithms and scene text detection frameworks do not perform well, and the resulting detection quality affects the subsequent recognition of text in historical documents. Therefore, accurate detecti...


Application Information

IPC(8): G06K9/32, G06K9/34, G06K9/62
Inventor: 黄伟国, 金连文, 杨海林
Owner: SOUTH CHINA UNIV OF TECH