Tibetan ancient book document image binarization method and system

A document-image binarization technology applied in the field of image processing, which solves problems such as false adhesion between strokes and achieves accurate stroke reconstruction while suppressing false adhesion.

Active Publication Date: 2021-05-25
NORTHWEST UNIVERSITY FOR NATIONALITIES


Abstract

The invention relates to a Tibetan ancient book document image binarization method and system. The method comprises the following steps: acquiring a Tibetan ancient book document image and performing binarization processing on it to determine a preliminary binary image; determining an estimated binary image according to the preliminary binary image, annotating the estimated binary image, and determining a Tibetan ancient book document image annotation map; training an improved U-Net network model with the annotation map and the Tibetan ancient book document image to generate a trained U-Net network model, and storing the network model parameters; and slicing a to-be-processed Tibetan ancient book document image, enlarging the sliced sub-blocks, inputting them together with the complete to-be-processed image into the trained U-Net network model, and determining a final binarization result map. The invention effectively suppresses the generation of false adhesion.

Application Domain

Image enhancement; Image analysis

Technology Topic

Network model; Binary image



Example Embodiment

[0064] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
[0065] The purpose of the present invention is to provide a method and system for image binarization of Tibetan ancient book documents, which can effectively suppress the generation of false adhesion.
[0066] In order to make the above objects, features and advantages of the present invention more clearly understood, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
[0067] Terminology Explanation:
[0068] Channels: Usually, a digital image consists of three channels, R, G, and B, that is, red, green, and blue.
[0069] Binarization: The process of converting a color or grayscale image into a black and white image.
[0070] Binary image: A binary image is a black-and-white image in which each pixel is either black (value 0) or white (value 255, or 1 in a normalized representation).
[0071] Grayscale image: The image contains only one channel, and each pixel value can be any value from 0 to 255.
[0072] Pseudo-adhesion (false adhesion): Foreground strokes that are separate in the original document image become joined together in its binary image. The boxed region in Figure 1 marks where pseudo-adhesion occurs.
[0073] Convolutional Neural Networks: It is a class of feedforward neural networks that contain convolutional computations and have a deep structure.
[0074] Fully convolutional network: A type of convolutional neural network that does not contain fully connected layers.
[0075] U-Net: A convolutional neural network for biomedical image segmentation proposed in 2015, which has now been shown to be an efficient network. The input of the network can be a three-channel color image, and the output can be a single-channel grayscale image. It is a type of fully convolutional network.
[0076] Bilinear interpolation: Mathematically, bilinear interpolation is an extension of linear interpolation to functions of two variables; the core idea is to perform linear interpolation in each of the two directions.
[0077] Otsu method (OTSU method): a global binarization method.
[0078] Sauvola's method: a local binarization method.
[0079] Network parameters: The numerical values used in the computations inside the network.
[0080] Hyperparameters: Parameters of the network that cannot be obtained by training and usually need to be set manually.
[0081] Loss function: A function used to calculate the error between the actual output and the target output.
[0082] Optimizer: A tool that tunes network parameters based on error.
[0083] Tibetan ancient book document image: Different Tibetan ancient books have different page sizes, generally about 25-90 cm in length and 6-30 cm in width, and are commonly called long-strip books. The Tibetan ancient book document images of the Beijing edition of "Ganjur" used in the present invention are about 5300×1500 pixels in size.
[0084] False positive pixels: Pixels that should be white background in the binary image but are wrongly binarized as black.
[0085] False negative pixels: Pixels that should be black foreground in the binary image but are wrongly binarized as white.
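As a brief illustration of the terms defined above (grayscale image, binary image, binarization, and the global Otsu method), the following minimal Python sketch converts a one-channel grayscale page into a black-and-white binary image using OpenCV's Otsu thresholding; the file name is a placeholder, not from the patent.

    import cv2

    # Grayscale image: one channel, pixel values from 0 to 255.
    gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

    # Global binarization with the Otsu method: every pixel becomes 0 (black) or 255 (white).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # The resulting binary image contains only the two values 0 and 255.
    print(set(binary.flatten().tolist()) <= {0, 255})   # True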
[0086] Figure 2 is the flowchart of the Tibetan ancient book document image binarization method provided by the present invention. As shown in Figure 2, the method includes:
[0087] Step 201 : Acquire an image of an ancient Tibetan book document, and perform a binarization process on the image of the Tibetan ancient book document to determine a preliminary binarized image.
[0088] Step 202 : Determine an estimated binarization map according to the preliminary binarization map, and annotate the estimated binarization map to determine an image annotation map of the Tibetan ancient book document.
[0089] Step 202 specifically includes: comparing the estimated binarization map with the real label, marking and removing the false positive and false negative pixels in the estimated binarization map to determine a marked binarization map; judging whether the marked binarization map shows stroke edge expansion and obtaining a first judgment result; if it does, performing a morphological erosion operation on the marked binarization map and judging again; if it does not, determining that the marked binarization map is the Tibetan ancient book document image annotation map.
[0090] Well-labeled data is the basis for training neural networks. If an inappropriate method is used to label images of Tibetan ancient books, the work is neither time-saving nor practical. Therefore, the present invention explores a more efficient method to build an annotated dataset. The process consists of three main stages: obtaining a preliminary binary map, obtaining an estimated binary map, and relabeling. Figure 3 shows the flowchart of the data labeling process.
[0091] First, the Sauvola method or another binarization method is used to roughly generate the corresponding preliminary binary maps;
[0092] Then, feed the data into the network model and start the training process. Once the network is trained, it can be used to generate an estimated binary map.
[0093] Finally, since the estimated binary map may not be fully accurate compared with the real label, further manual correction is required: the false positive and false negative pixels are manually erased to make the labeled image more accurate. In addition, if any degree of stroke edge expansion is found, the map is also subjected to a morphological erosion operation that shrinks its edges inward by one pixel. If necessary, this step is repeated as many times as needed until the stroke thickness in the annotation map is exactly the same as that in the original image. A code sketch of these labeling operations follows.
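The labeling stages described above can be sketched as follows. This is a minimal illustrative example assuming grayscale pages with dark text on a light background; the file names, Sauvola window size and 3×3 erosion footprint are assumptions for illustration rather than values given in the patent, and the intermediate network-based estimation step is omitted.

    import numpy as np
    from skimage import io
    from skimage.filters import threshold_sauvola
    from skimage.morphology import binary_erosion

    # 1. Preliminary binary map: Sauvola local thresholding.
    gray = io.imread("page.png", as_gray=True)            # values in [0, 1]
    thresh = threshold_sauvola(gray, window_size=25)      # local threshold surface
    text_mask = gray < thresh                             # True where a pixel is (dark) text

    # 2. An estimated binary map would come from a first round of network
    #    training on these rough labels (omitted here).

    # 3. Relabeling: if stroke edges look expanded, erode the text mask by
    #    roughly one pixel so stroke thickness matches the original image.
    text_mask = binary_erosion(text_mask, np.ones((3, 3), bool))

    # Store the label as a black-on-white binary image (0 = text, 255 = background).
    label = np.where(text_mask, 0, 255).astype(np.uint8)
    io.imsave("page_label.png", label)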
[0094] Step 203: Use the Tibetan ancient book document image annotation map and the Tibetan ancient book document image to train the improved U-Net network model, generate a trained U-Net network model, and save the network model parameters. The improved U-Net network model introduces an attention mechanism in the skip connections of the original U-Net network model; the network model parameters include the network model structure, weight parameters and hyperparameters, the network model structure includes the selection of the optimizer and the definition of the loss function, and the hyperparameters include the number of training epochs and the learning rate.
[0095] Figure 4 is the structure diagram of the Attention U-Net network model, an improved U-Net network model generated on the basis of the original U-Net by introducing an attention mechanism in the skip-connection part, so that the network can better focus on salient regions and suppress irrelevant background regions such as noise and stains. In the diagram, F_i × H_j × W_j denotes a feature map with F_i channels and spatial size H_j × W_j, where i = 1, 2, 3 and j = 0, 1, 2, 3, 4.
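For reference, the following is a minimal PyTorch sketch of one possible additive attention gate of the kind used in Attention U-Net (Oktay et al., 2018), which re-weights skip-connection features so that salient regions are emphasized and background regions such as noise and stains are suppressed. The channel counts and the assumption that the gate and skip features share the same spatial size are illustrative; the patent does not publish the exact layer configuration.

    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        def __init__(self, gate_ch, skip_ch, inter_ch):
            super().__init__()
            self.w_g = nn.Sequential(nn.Conv2d(gate_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
            self.w_x = nn.Sequential(nn.Conv2d(skip_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
            self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())
            self.relu = nn.ReLU(inplace=True)

        def forward(self, g, x):
            # g: gating signal from the decoder; x: skip-connection feature map.
            # Both are assumed to have the same spatial size here.
            a = self.relu(self.w_g(g) + self.w_x(x))
            alpha = self.psi(a)        # attention coefficients in (0, 1)
            return x * alpha           # re-weighted skip features

    # Usage sketch: re-weight a 64-channel skip feature with a 128-channel gate.
    gate = torch.randn(1, 128, 64, 64)
    skip = torch.randn(1, 64, 64, 64)
    out = AttentionGate(128, 64, 32)(gate, skip)   # shape (1, 64, 64, 64)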
[0096] A neural network needs to be trained before it can be used. Figure 5 shows the training flowchart of the improved U-Net network model; the main steps are as follows (a code sketch follows the list):
[0097] (1) Loading the Tibetan ancient book document image dataset: reading the dataset from storage, that is, reading the original document images and their corresponding annotation maps.
[0098] (2) Dataset augmentation: data augmentation is performed on the loaded dataset, including adding noise, simulating stains, and random flipping. Noise addition and stain simulation are applied only to the document images, not to the annotation maps; random flipping must also be applied to the annotation maps at the same time.
[0099] (3) Initialize the network model and set hyperparameters: Define the structure of the network model, including the selection of the optimizer, the definition of the loss function, etc., and set the hyperparameters required by the network, such as the number of training rounds, learning rate, etc.
[0100] (4) Network training: start training, send the images into the network for forward propagation to obtain the actual output, calculate the loss between the actual output and the annotation map through the loss function, and use the optimizer to continuously adjust the network parameters so that the error between the actual output and the annotation map becomes smaller in the next training round.
[0101] (5) Save the network model: after training is finished, save the network model and its parameters to a .pth model file.
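Steps (1) to (5) can be summarized in a short training-loop sketch. The sketch below assumes PyTorch with a binary cross-entropy loss and the Adam optimizer; the dataset class, model, batch size and hyperparameter values are placeholders, since the patent does not fix these specific choices.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader

    def train(model, dataset, epochs=50, lr=1e-3, device="cuda"):
        loader = DataLoader(dataset, batch_size=4, shuffle=True)
        criterion = nn.BCEWithLogitsLoss()                   # loss vs. the annotation map
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.to(device).train()
        for epoch in range(epochs):
            for image, label in loader:                      # 3-channel page, 1-channel label
                image, label = image.to(device), label.to(device)
                output = model(image)                        # forward propagation
                loss = criterion(output, label)              # error between output and label
                optimizer.zero_grad()
                loss.backward()                              # back-propagate the error
                optimizer.step()                             # adjust network parameters
        # Step (5): save the model structure parameters to a .pth file.
        torch.save({"state_dict": model.state_dict(), "epochs": epochs, "lr": lr},
                   "attention_unet.pth")
        return model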
[0102] Step 203 specifically includes: using the Tibetan ancient book document images and their corresponding image annotation maps as a Tibetan ancient book document image dataset, and performing data augmentation on this dataset to determine an augmented dataset, which includes the augmented annotation maps and the augmented document images; initializing the network model parameters, inputting the augmented dataset into the improved U-Net network model for forward propagation, calculating via the loss function the loss between the actual output for the augmented document images and the augmented annotation maps, adjusting the network model parameters with the optimizer to generate a trained U-Net network model, and saving the network model parameters.
[0103] Using the Tibetan ancient book document images and their corresponding annotation maps as the dataset and performing data augmentation to determine the augmented dataset specifically includes: applying noise addition, stain simulation and random flipping to the Tibetan ancient book document images to determine the augmented document images; and applying random flipping to the corresponding annotation maps to determine the augmented annotation maps.
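A minimal sketch of this asymmetric augmentation is given below: noise and a simulated stain are applied only to the document image, while a random flip is applied jointly to the image and its annotation map. The stain model (a darkened random disc) and the parameter values are simple illustrative assumptions, not the patent's exact procedure.

    import numpy as np

    def augment(image, label, rng=np.random.default_rng()):
        image = image.astype(np.float32).copy()
        # 1. Additive Gaussian noise: applied to the input only.
        image += rng.normal(0, 10, image.shape)
        # 2. Simulated stain: darken a random disc in the input only.
        h, w = image.shape[:2]
        cy, cx, r = rng.integers(h), rng.integers(w), rng.integers(20, 60)
        yy, xx = np.ogrid[:h, :w]
        image[(yy - cy) ** 2 + (xx - cx) ** 2 < r ** 2] *= 0.6
        image = np.clip(image, 0, 255).astype(np.uint8)
        # 3. Random horizontal flip: applied to the image AND the label together.
        if rng.random() < 0.5:
            image, label = image[:, ::-1].copy(), label[:, ::-1].copy()
        return image, label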
[0104] Step 204: Perform slicing on the to-be-processed Tibetan ancient book document image, enlarge the sliced sub-blocks, input them together with the complete to-be-processed image into the trained U-Net network model, and determine the final binarization result map.
[0105] Step 204 specifically includes: inputting the to-be-processed Tibetan ancient book document image into the trained U-Net network model and binarizing the result with the Otsu algorithm to generate a first binarized Tibetan ancient book document image; slicing the to-be-processed image to generate a plurality of Tibetan ancient book document image sub-blocks; inputting the sub-blocks into the trained U-Net network model in turn and judging whether all sub-blocks have been input; if so, merging the sub-block outputs and reducing the merged image to the same size as the to-be-processed image; binarizing the reduced image with the Otsu algorithm to generate a second binarized Tibetan ancient book document image; and integrating the first and second binarized Tibetan ancient book document images to generate the final binarization result map.
[0106] When network training is completed, the network can be used to generate binary images. Figure 6 shows the prediction flowchart for the final binarized image.
[0107] First, the Tibetan ancient book document image to be binarized is read from storage, and then the network model and its parameters are loaded from the previously saved .pth model file.
[0108] Because of hardware limitations, it is often impossible to send a complete Tibetan ancient book document image to the graphics card (Graphics Processing Unit, GPU) in one piece, so it must be split into blocks before being sent to the GPU. The cost of doing so is that it limits the network's ability to perceive large stains in the document image, weakening its handling of noise, rough texture, and stains in the original image. In contrast, the Central Processing Unit (CPU), with its larger running memory (Random Access Memory, RAM), can usually load the entire image and save the result. Experiments show that the full-image CPU path suppresses stains better, but it takes longer and the text areas are not as detailed as when the image is sliced and fed to the graphics card. To combine the advantages of the sliced operation and the full-image operation, the present invention uses two branches: a GPU branch and a CPU branch.
[0109] For the GPU branch, the original document image is first sliced to obtain multiple image sub-blocks, and the sub-blocks are then enlarged using bilinear interpolation. The evaluation indicators corresponding to different enlargement ratios are shown in Table 1. It can be seen that 2× magnification achieves high accuracy; although the 2.8× magnification factor yields somewhat better evaluation indices, it takes longer and is prone to producing holes in the character strokes, so the magnification factor for the image sub-blocks in the present invention is set to 2.
[0110] Table 1: Evaluation indicators under different magnification factors (table contents not reproduced in this extraction).
[0113] The image sub-blocks are enlarged and sent to the network one after another until all sub-blocks have been processed. After the sub-blocks predicted by the network are combined according to certain rules, an enlarged predicted binary map is obtained; it is reduced to the original image size, and the OTSU method is applied to obtain the result map of the GPU branch. A minimal code sketch of this branch is given below.
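Below is a minimal sketch of the GPU branch under stated assumptions: the page is sliced into fixed-size sub-blocks, each sub-block is enlarged 2× by bilinear interpolation, passed through the network, the predictions are stitched back together, the merged map is reduced to the original size, and the OTSU method is applied. The block size, the sigmoid on the network output, and the simple stitching rule are illustrative; the patent's exact merging rules are not reproduced here.

    import cv2
    import numpy as np
    import torch

    def gpu_branch(page_bgr, model, block=512, scale=2, device="cuda"):
        h, w = page_bgr.shape[:2]
        out = np.zeros((h * scale, w * scale), dtype=np.float32)
        model.to(device).eval()
        with torch.no_grad():
            for y in range(0, h, block):
                for x in range(0, w, block):
                    sub = page_bgr[y:y + block, x:x + block]
                    sh, sw = sub.shape[:2]
                    # 2x bilinear enlargement of the sub-block.
                    big = cv2.resize(sub, (sw * scale, sh * scale),
                                     interpolation=cv2.INTER_LINEAR)
                    t = torch.from_numpy(big).permute(2, 0, 1).float().unsqueeze(0).to(device) / 255
                    pred = torch.sigmoid(model(t))[0, 0].cpu().numpy()   # single-channel map
                    out[y * scale:(y + sh) * scale, x * scale:(x + sw) * scale] = pred
        # Reduce the merged prediction back to the original page size, then apply OTSU.
        merged = cv2.resize(out, (w, h), interpolation=cv2.INTER_LINEAR)
        gray = (merged * 255).astype(np.uint8)
        _, result = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return result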
[0114] For the CPU branch, the image is not sliced: the larger-capacity RAM can hold the complete operation result, so the whole image is sent through the network and the corresponding output is obtained. The OTSU method is likewise applied to this output to get the result map of the CPU branch. As shown in Figure 7, a bitwise OR operation is then performed on the CPU result map and the GPU result map to obtain the final binary result map; a code sketch of this fusion step follows.
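The fusion step can be sketched as follows. Assuming both result maps are black-on-white binary images (text = 0, background = 255) of the same size, a pixel-wise OR keeps a pixel black only where both branches marked it as text, so foreground marked by only one branch (for example a stain) is discarded. The file names are placeholders.

    import cv2

    cpu_result = cv2.imread("cpu_result.png", cv2.IMREAD_GRAYSCALE)   # CPU-branch result map
    gpu_result = cv2.imread("gpu_result.png", cv2.IMREAD_GRAYSCALE)   # GPU-branch result map

    # Bitwise OR: a pixel stays 0 (black text) only if it is 0 in both maps.
    final = cv2.bitwise_or(cpu_result, gpu_result)
    cv2.imwrite("final_binary.png", final)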
[0115] Figure 8 is the structure diagram of the Tibetan ancient book document image binarization system provided by the present invention. As shown in Figure 8, the system includes:
[0116] The preliminary binarized map determination module 801 is used for acquiring an image of an ancient Tibetan book document, and performing a binarization process on the image of the Tibetan ancient book document to determine a preliminary binarized map.
[0117] An annotation module 802 is configured to determine an estimated binarization map according to the preliminary binarization map, and annotate the estimated binarization map to determine an image annotation map of the Tibetan ancient book document.
[0118] The labeling module 802 specifically includes: a marked binarization map determination unit, configured to compare the estimated binarization map with the real label and to mark and remove the false positive and false negative pixels in the estimated binarization map, determining a marked binarization map; a first judgment unit, configured to judge whether the marked binarization map shows stroke edge expansion and to obtain a first judgment result; a morphological erosion operation unit, configured to perform a morphological erosion operation on the marked binarization map and judge again if the first judgment result indicates that stroke edge expansion is present; and a Tibetan ancient book document image annotation map determination unit, configured to determine that the marked binarization map is the Tibetan ancient book document image annotation map if the first judgment result indicates that no stroke edge expansion is present.
[0119] The training module 803 is used to train the improved U-Net network model using the Tibetan ancient book document image annotation maps and the Tibetan ancient book document images, generate a trained U-Net network model, and save the network model parameters; the improved U-Net network model introduces an attention mechanism in the skip connections of the original U-Net network model; the network model parameters include the network model structure, weight parameters and hyperparameters, the network model structure includes the selection of the optimizer and the definition of the loss function, and the hyperparameters include the number of training epochs and the learning rate.
[0120] The training module 803 specifically includes: an augmentation processing unit, configured to use the Tibetan ancient book document images and their corresponding annotation maps as a Tibetan ancient book document image dataset and to perform data augmentation on it to determine an augmented dataset, the augmented dataset including the augmented annotation maps and the augmented document images; and a training unit, configured to initialize the network model parameters, input the augmented dataset into the improved U-Net network model for forward propagation, calculate via the loss function the loss between the actual output for the augmented document images and the augmented annotation maps, adjust the network model parameters with the optimizer to generate a trained U-Net network model, and save the network model parameters.
[0121] The augmentation processing unit specifically includes: an augmented document image determination subunit, configured to apply noise addition, stain simulation and random flipping to the Tibetan ancient book document images to determine the augmented document images; and an augmented annotation map determination subunit, configured to apply random flipping to the Tibetan ancient book document image annotation maps to determine the augmented annotation maps.
[0122] The final binarization result map determination module 804 is used for slicing the to-be-processed Tibetan ancient book document image, enlarging the sliced sub-blocks, inputting them together with the complete to-be-processed image into the trained U-Net network model, and determining the final binarization result map.
[0123] The final binarization result map determination module 804 specifically includes: a first binarized document image generation unit, configured to input the to-be-processed Tibetan ancient book document image into the trained U-Net network model and binarize the result with the Otsu algorithm to generate a first binarized Tibetan ancient book document image; a sub-block generation unit, configured to slice the to-be-processed image to generate a plurality of Tibetan ancient book document image sub-blocks; a second judgment unit, configured to input the sub-blocks into the trained U-Net network model in turn and judge whether all sub-blocks have been input, obtaining a second judgment result; a merging unit, configured to, if the second judgment result indicates that all sub-blocks have been enlarged and input into the trained U-Net network model, merge the sub-blocks and reduce the merged image to the same size as the to-be-processed image; a second binarized document image generation unit, configured to binarize the reduced image with the Otsu algorithm to generate a second binarized Tibetan ancient book document image; and a final binarization result map generation unit, configured to integrate the first and second binarized Tibetan ancient book document images to generate the final binarization result map.
[0124] Figure 9 shows a comparison of local binarization examples generated by the method of the present invention: the upper row is the original image, and the lower row is the corresponding binary image produced by the present invention. It can be seen from Figure 9 that the final binarized image obtained by the present invention displays the handwriting of the Tibetan ancient book document image more clearly and accurately, which is more helpful for the study of Tibetan ancient book documents.
[0125] The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same and similar parts between the various embodiments can be referred to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant part can be referred to the description of the method.
[0126] The principles and implementations of the present invention are described herein using specific examples. The descriptions of the above embodiments are only intended to help understand the method and core idea of the present invention; meanwhile, those skilled in the art may make changes to the specific implementation and application scope according to the idea of the present invention. In conclusion, the contents of this specification should not be construed as limiting the present invention.
