Chinese ancient book character recognition method, Chinese ancient book character segmentation, layout reconstruction method, medium and equipment

A character recognition and character classification technology, applied in character recognition, character and pattern recognition, neural learning methods, etc. It addresses problems such as misjudgment, omission, and uneven distribution of character categories, and achieves a more uniform character size distribution, reduced negative interference, and improved accuracy.

Active Publication Date: 2021-07-23
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, Chinese ancient book documents contain special elements such as icons, seals, and double-column notes. Traditional text line detection algorithms, and algorithms that focus only on simple layouts such as single-line text, are not suitable for ancient book documents with complex layout structures and diverse contents.
At the same time, there are handwritten, variant or uncommon fonts in Chinese ancient book documents, and traditional algorithms that focus on common printed Chinese character recog

Examples

Example Embodiment

[0080] Example 1

[0081] This embodiment discloses a Chinese ancient book character recognition method, which can be performed by a smart device such as a computer. As shown in figure 1, the method specifically includes the following steps:

[0082] Step 1. Obtain Chinese ancient book document images annotated with character bounding boxes and character categories as the original training samples; at the same time, obtain the annotation file of the original training samples, which includes the character bounding box size, character position, and character category.

[0083] The character position can be obtained from the character bounding box. Specifically, the character position is given by the coordinates of two diagonally opposite corners of the bounding box, for example (x_left, y_top, x_right, y_bottom), where (x_left, y_top) is the coordinate of the upper left corner and (x_right, y_bottom) is the coordinate of the lower right corner of the bounding box.
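
The annotation described in [0082] and [0083] can be pictured as a small record per character. The following is a minimal sketch, not the patent's file format: the class name `CharBox`, the field names, and the use of a string for the category are all assumptions made for illustration, and image coordinates are assumed to have y growing downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """One annotated character (hypothetical record layout).

    Position is stored as two opposite corners of the bounding box;
    y is assumed to grow downward, as is usual for image coordinates.
    """
    x_left: float
    y_top: float
    x_right: float
    y_bottom: float
    category: str  # assumed: the character class given as a string

    @property
    def width(self) -> float:
        return self.x_right - self.x_left

    @property
    def height(self) -> float:
        return self.y_bottom - self.y_top

    @property
    def center(self) -> tuple:
        # Midpoint of the two opposite corners
        return ((self.x_left + self.x_right) / 2,
                (self.y_top + self.y_bottom) / 2)


box = CharBox(x_left=10, y_top=20, x_right=50, y_bottom=70, category="書")
print(box.width, box.height, box.center)
```

Storing only the two corners keeps the record minimal; the bounding box size mentioned in [0082] can then be derived from them rather than stored separately.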

[0084] The above character category refers to a specific ...

Example Embodiment

[0117] Example 2

[0118] This embodiment discloses a Chinese ancient book character grouping method, including the following steps:

[0119] Step 7. For the acquired Chinese ancient book document image, the predicted bounding box and predicted category of each character are obtained by the method described in Embodiment 1;

[0120] Step 8. The predicted bounding boxes of the characters are grouped by clustering and restored to reading order, according to the reading conventions of ancient books and the language order of the characters, to obtain unpunctuated ancient book text content. As shown in figure 2, the specific steps are as follows:

[0121] S1. Sort the predicted bounding boxes of the characters according to the customary reading order of ancient books, and calculate the geometric feature information of the character bounding boxes. Details are as follows:

[0122] S1A. The predicted bounding boxes of the characters are sorted in the ancient book reading order, from the left, and from...
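
The sorting and grouping idea in steps S1 and S1A can be illustrated with a simple sketch. The source text is truncated here, so the details below are assumptions rather than the patent's procedure: boxes are plain `(x_left, y_top, x_right, y_bottom)` tuples, columns are formed by a gap threshold on x-centers (a stand-in for the clustering the patent describes), and the column direction is left as a parameter (`right_to_left`, a hypothetical name) because the excerpt does not state it completely.

```python
def group_into_columns(boxes, gap_ratio=0.5, right_to_left=True):
    """Group character boxes into vertical columns, each read top to bottom.

    boxes: list of (x_left, y_top, x_right, y_bottom) tuples.
    gap_ratio: assumed threshold; a new column starts when consecutive
        x-centers differ by more than gap_ratio * mean box width.
    right_to_left: assumed column direction (not confirmed by the source).
    """
    if not boxes:
        return []

    def x_center(b):
        return (b[0] + b[2]) / 2

    # Order boxes along the horizontal axis in the chosen direction
    ordered = sorted(boxes, key=x_center, reverse=right_to_left)
    mean_width = sum(b[2] - b[0] for b in boxes) / len(boxes)

    columns = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        # A large horizontal jump between neighbours starts a new column
        if abs(x_center(cur) - x_center(prev)) > gap_ratio * mean_width:
            columns.append([])
        columns[-1].append(cur)

    # Within each column, read from top to bottom
    for col in columns:
        col.sort(key=lambda b: b[1])
    return columns


page = [(0, 0, 10, 10), (0, 12, 10, 22), (20, 0, 30, 10), (21, 12, 31, 22)]
for column in group_into_columns(page):
    print(column)
```

This threshold rule breaks down on the double-column notes and other irregular layouts mentioned earlier, which is presumably why the method also computes geometric features of the bounding boxes before grouping.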

Example Embodiment

[0170] Example 3

[0171] This embodiment discloses a Chinese ancient book reconstruction method, including steps:

[0172] Step 9. For the acquired Chinese ancient book document image, first restore the reading order of the characters recognized in the image by the Chinese ancient book character grouping method according to Embodiment 2, obtaining unpunctuated ancient book text content;

[0173] Step 10. Build a language-model-based ancient book reconstruction algorithm, comprising an error correction language model and a sentence segmentation language model, which perform error correction and sentence segmentation on the unpunctuated ancient book text content. As shown in Figure 5, the details are as follows:

[0174] (1) Starting from the bert-base-chinese language model, which is pre-trained on modern text, use the ancient text dataset 殆 Zhixi Huichen as the domain corpus and perform unsupervised training of the language model with the masked language modeling objective.

[0175] (2) Based on the above acquired error correction language model, the ancient...
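
The masked language modeling objective mentioned in (1) can be illustrated without any deep learning library. The sketch below is a simplification and not the patent's training code: the function name `mask_tokens` is hypothetical, each character is treated as one token, and every selected token is replaced by `[MASK]` (BERT's original recipe also leaves some selected tokens unchanged or swaps in random tokens, which is omitted here). The default masking rate of 15% follows BERT.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for one training example.

    labels[i] holds the original token where a mask was applied and None
    elsewhere, so a model is only scored on the masked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # remember what it should predict
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


tokens = list("子曰學而時習之")  # one character per token (assumption)
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

Because the objective needs no manual labels, the ancient text corpus can be used as-is, which matches the unsupervised training described in (1).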

Abstract

The invention discloses a Chinese ancient book character recognition method, a Chinese ancient book character segmentation method, a layout reconstruction method, a medium and equipment. The Chinese ancient book character recognition method comprises the following steps: first, obtaining Chinese ancient book document images annotated with character bounding boxes and character categories and taking them as original training samples; acquiring an annotation file of the original training samples; randomly selecting a plurality of original training samples and processing them to obtain new training samples; processing the original training samples and the new training samples by online random cropping to obtain a training sample set; training a character-level detection and classification model with the training samples in the training sample set; and inputting a Chinese ancient book document image whose characters are to be recognized into the character-level detection and classification model to obtain a predicted bounding box and a predicted category for each character in the image. The method recognizes common characters and can also recognize some uncommon special characters in ancient books very accurately, addressing the misjudgment, omission and similar problems of ancient book document recognition in the prior art.
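
The online random cropping step in the abstract can be sketched on the annotations alone. This is an illustration under stated assumptions, not the patent's procedure: the function name `random_crop_boxes` is hypothetical, boxes are `(x_left, y_top, x_right, y_bottom, category)` tuples, and characters cut by the crop border are simply dropped, a policy the excerpt does not specify.

```python
import random


def random_crop_boxes(img_w, img_h, boxes, crop_w, crop_h, rng=None):
    """Pick a random crop window and keep the boxes lying fully inside it.

    Returns (window, kept), where window is (x0, y0, x1, y1) in page
    coordinates and kept holds boxes shifted into crop coordinates.
    """
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop must not be larger than the image")
    rng = rng or random.Random()
    # Top-left corner of the crop, chosen uniformly over valid positions
    x0 = rng.randint(0, img_w - crop_w)
    y0 = rng.randint(0, img_h - crop_h)
    x1, y1 = x0 + crop_w, y0 + crop_h

    kept = []
    for xl, yt, xr, yb, cat in boxes:
        # Assumed policy: drop any character cut by the crop border
        if xl >= x0 and yt >= y0 and xr <= x1 and yb <= y1:
            kept.append((xl - x0, yt - y0, xr - x0, yb - y0, cat))
    return (x0, y0, x1, y1), kept


boxes = [(10, 10, 30, 30, "天"), (80, 80, 95, 95, "地")]
window, kept = random_crop_boxes(100, 100, boxes, 60, 60, rng=random.Random(0))
print(window, kept)
```

"Online" here means a fresh window is drawn each time a sample is loaded during training, so the model sees different crops of the same page across epochs.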

Description

Technical field

[0001] The invention relates to the technical field of ancient Chinese books research, in particular to a method, medium and equipment for character recognition, grouping and layout reconstruction of ancient Chinese books.

Background technique

[0002] With the research and development of deep learning, image text detection and recognition technology based on computer vision is playing an increasingly important role in daily life, business activities and scientific research, and has made good progress. The research results involve document recognition, bill recognition and specific scene text recognition. However, the existing research is only for text images with clear handwriting, obvious background contrast, and limited character categories. The distribution of characters follows the modern typesetting style from left to right and top to bottom. There are deficiencies in the research work on the recognition of ancient book documents that follow the arran...

Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08, G06K9/03, G06F40/232, G06F40/30
CPC: G06N3/08, G06F40/232, G06F40/30, G06V30/414, G06V30/40, G06V10/98, G06V30/287, G06V30/10, G06N3/045, G06F18/23, G06F18/24, G06F18/214, Y02D10/00
Inventor: 薛洋, 李智豪
Owner: SOUTH CHINA UNIV OF TECH