Deep learning-based optical character recognition method and system for printed Burmese characters

A Burmese OCR model built by combining Hough-transform preprocessing with a YOLOv2 detector on a ResNet18 backbone addresses the insufficient recognition accuracy on printed Burmese documents, achieving high-precision character recognition with low hardware requirements for practical use.

CN122290141A · Pending · Publication Date: 2026-06-26 · MYANMAR CENTRAL SOLUTIONS CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
MYANMAR CENTRAL SOLUTIONS CO LTD
Filing Date
2026-04-24
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies cannot effectively recognize characters in Burmese printed documents, especially in table scenarios where recognition accuracy is insufficient, and they have high hardware requirements, making them difficult to deploy in local office settings in Myanmar.

Method used

A deep learning-based approach is adopted, using Hough transform for image preprocessing and table segmentation. A Burmese OCR recognition model is constructed by combining the YOLOv2 object detection model and the ResNet18 residual network. Transfer adaptation and training are performed, and the model parameters are optimized to achieve high-precision recognition.

Benefits of technology

It achieves high-precision character recognition in complex office scenarios with an accuracy rate of 98.31%, adapts to noise, blur and rotation issues in scanned documents, reduces hardware requirements, and is suitable for deployment in local office scenarios in Myanmar.

Abstract

This invention discloses a method and system for optical character recognition of printed Burmese text based on deep learning, belonging to the field of optical character recognition technology. The method first preprocesses scanned images of printed Burmese documents, using the Hough transform to segment table and character regions and extract the target character regions. It then constructs an image dataset and builds a Burmese OCR recognition model. The model is trained using a stochastic gradient descent with momentum optimizer, and its parameters are optimized through 4-fold cross-validation to obtain a high-precision recognition model. Finally, the segmented character regions are input into the model to complete detection and classification, and the recognized text is output and imported into a data table. The invention accurately segments Burmese characters within tables, improves recognition accuracy, and has low deployment costs. It can be used to detect duplicate voter registrations in Myanmar elections, improving the efficiency of Burmese document digitization and information extraction.

Description

Technical Field

[0001] This invention relates to the field of optical character recognition technology, specifically to a method and system for optical character recognition of printed Burmese script based on deep learning.

Background Technology

[0002] Optical character recognition (OCR) is a technology that can identify printed and handwritten text in digital images of scanned paper documents and other physical documents. It is one of the core research directions in the fields of pattern recognition, artificial intelligence, and computer vision. Mature OCR technology can convert characters in images into editable Unicode encoding, significantly reducing the time cost and human error in digitizing paper documents. It has extremely high application value in office scenarios such as government affairs, finance, and public services. Therefore, there is a need for a method and system for optical character recognition of printed Burmese characters based on deep learning.

[0003] In Myanmar, technology for processing the Burmese script urgently needs development and improvement. Furthermore, a large amount of information exists only as paper documents, which need to be digitized, stored, and converted into editable form. Burmese is widely used as an official language in many states and regions of Myanmar, and in many office settings such as passport control, banking, sales tax, railways, and embassies.

[0004] Existing technologies for OCR systems targeting Southeast Asian languages have significant limitations: Malaysian character recognition systems can only recognize basic characters and cannot accurately segment and extract characters within tables; Sinhala OCR systems based on the Tesseract 4.0 engine have an accuracy rate of only 97%, which is still insufficient to meet the requirements of government-level identity verification; and printed character recognition schemes based on canonical correlation analysis have not been adapted to the morphological features of Burmese characters and table scenarios, resulting in insufficient recognition accuracy and anti-interference capabilities.

[0005] Meanwhile, Burmese printed document recognition faces several technical challenges: scanned documents generally suffer from noise, blurriness, and rotation, making accurate segmentation of characters within tables difficult; Burmese contains 33 consonant letters, 10 numerals, and several special characters, and these complex character forms make high-precision classification and recognition difficult for existing models; and the limited hardware in local offices in Myanmar makes existing high-precision OCR models too demanding to deploy effectively.

Summary of the Invention

[0006] To address the aforementioned technical shortcomings, the present invention aims to provide a method and system for optical character recognition of printed Burmese script based on deep learning.

[0007] To solve the above technical problems, the present invention adopts the following technical solution: The present invention provides a method for optical character recognition of printed Burmese script based on deep learning, including the following steps: Step 1, image preprocessing and table segmentation: The input scanned image of printed Burmese script is preprocessed, and Hough transform is used to detect horizontal and vertical lines in the image, thereby segmenting the table area and character area and extracting the character area containing the target information.

[0008] Step 2: Dataset Construction and Model Adaptation: Construct an image dataset containing 44 classes of Burmese printed characters, perform transfer adaptation based on the YOLOv2 object detection model, and construct a Burmese OCR recognition model using a pre-trained ResNet18 residual network as the feature extraction backbone.

[0009] Step 3: Model Training and Optimization: Divide the dataset into training and validation sets according to a preset ratio, train the OCR recognition model using the stochastic gradient descent with momentum (SGDM) optimizer, and optimize the model parameters through k-fold cross-validation to obtain a high-precision Burmese OCR recognition model.

[0010] Step 4: Character Recognition and Result Output: Input the character regions segmented in Step 1 into the trained OCR recognition model to complete the detection and classification of Burmese characters, then output the recognized text information and import it into the data table.

[0011] Preferably, the horizontal and vertical lines in the image are detected as follows: each line is described by the normal parametric equation ρ = x cos θ + y sin θ, where ρ is the perpendicular distance from the origin to the line, θ is the angle between that perpendicular and the x-axis, and x and y are the abscissa and ordinate of a point on the line. When sin θ ≠ 0, this is equivalent to the slope-intercept form of the line, with slope -(cos θ / sin θ) and intercept ρ / sin θ.

[0012] Preferably, the Burmese OCR recognition model is constructed as follows: the feature extraction module adopts a pre-trained ResNet18 architecture; the detection module, which replaces the detection sub-network of YOLOv2, is composed of 14 convolutional units stacked in sequence, each consisting of a convolutional layer, a batch normalization layer and a ReLU activation layer. At the end of the model, a YOLO convolutional layer, a YOLO transform layer and a YOLO output layer are arranged in sequence. The input of the model is an RGB image with a resolution of 224×224.

[0013] Preferably, the OCR recognition model is trained as follows: the programming environment for model training is MATLAB 2020a, and the training hyperparameters are an initial learning rate of 0.001, a mini-batch size of 5, and a maximum of 32 training epochs; 4-fold cross-validation is used to optimize the model parameters, with each fold containing 2,000 OCR images.

[0014] In the OCR stage, the detector model is first trained on the 8,000 images covering 44 categories. The detector model is then saved and used to detect 2,000 verification images. Images are randomly allocated 80% for training and 20% for verification, and training is completed when the detection accuracy reaches a preset value.

[0015] In another aspect, the present invention provides a deep learning-based optical character recognition system for printed Burmese script, including the following modules: an image preprocessing and table segmentation module, used to preprocess the input scanned image of a printed Burmese document, detect horizontal and vertical lines in the image using the Hough transform, segment the table area and character area, and extract the character area containing the target information.

[0016] The dataset construction and model adaptation module is used to construct an image dataset containing 44 classes of Burmese printed characters. It performs transfer adaptation based on the YOLOv2 object detection model and uses a pre-trained ResNet18 residual network as the feature extraction backbone to construct a Burmese OCR recognition model.

[0017] The model training and optimization module is used to divide the dataset into training and validation sets according to a preset ratio, train the OCR recognition model using the stochastic gradient descent with momentum (SGDM) optimizer, and optimize the model parameters through k-fold cross-validation to obtain a high-precision Burmese OCR recognition model.

[0018] The character recognition and output module is used to input the character regions segmented by the image preprocessing and table segmentation module into the trained OCR recognition model to complete the detection and classification of Burmese characters, then output the recognized text information and import it into the data table.

[0019] The beneficial effects of this invention are as follows: 1. This invention achieves accurate detection and segmentation of table lines through Hough transform, effectively solving the industry pain point of difficult character segmentation in printed documents. It can adapt to common problems such as noise, blur and rotation in scanned documents, and greatly improves the ability to extract character regions in complex office scenarios.

[0020] This invention is based on the YOLOv2 object detection model for transfer adaptation, combined with the ResNet18 residual network as the feature extraction backbone, and constructs a dedicated 14-layer convolutional neural network for 44 classes of Burmese characters. The model has fast convergence speed and high recognition accuracy; the overall accuracy of 4-fold cross-validation reaches 98.31%, the highest single character recognition accuracy can reach 100%, the average recognition accuracy exceeds 99%, and the average specificity of negative samples exceeds 99.9%, which has extremely strong recognition stability and anti-interference ability.

[0021] This invention can accurately extract birth dates and NRC numbers from Burmese documents and tables. It can be directly applied to the scenario of detecting duplicate voter registrations in the Myanmar general election, transforming manual verification into automated detection, greatly improving information extraction efficiency, reducing the error rate and workload of manual verification, and providing reliable technical support for the investigation of election fraud.

[0022] The model training of this invention can be completed in a conventional consumer-grade hardware environment with an Intel Core i3 processor, 4GB of memory and an NVIDIA GeForce 920MX graphics card. It has a low deployment threshold, is suitable for the hardware conditions of local office scenarios in Myanmar, and has strong practicality and promotional value.

Attached Figure Description

[0023] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0024] Figure 1 is a schematic diagram of the implementation steps of the method of the present invention.

[0025] Figure 2 is a schematic diagram illustrating the boundary detection principle based on the Hough transform.

[0026] Figure 3 is a system design flowchart of the detection model architecture of the present invention.

[0027] Figures 4a and 4b are schematic diagrams of the birth date recognition results of the present invention, wherein Figure 4a is the input birth date image and Figure 4b is the birth date recognition result.

[0028] Figures 5a and 5b are schematic diagrams of the NRC number recognition results of the present invention, wherein Figure 5a is the input NRC number image and Figure 5b is the NRC number recognition result.

[0029] Figure 6 is a schematic diagram of the implementation interface of the OCR system of the present invention.

[0030] Figure 7 is a schematic diagram illustrating the implementation effect of the OCR system of the present invention in a table.

[0031] Figure 8 is a schematic diagram of the system structure connection of the present invention.

Detailed Implementation

[0032] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0033] As shown in Figure 1, this invention provides a method for optical character recognition of printed Burmese script based on deep learning, including the following steps: Step 1, image preprocessing and table segmentation: preprocess the input scanned image of a printed Burmese document, use the Hough transform to detect horizontal and vertical lines in the image, segment the table area and character area, and extract the character area containing the target information.

[0034] After accurately locating the table borders and row / column lines using Hough transform, the table area and background area are segmented. Then, based on the coordinates of the table lines, the target character areas containing voters' birth dates and Myanmar National Identity Card (NRC) numbers are extracted from the table, completing the segmentation and extraction of single-character images. If the image is severely blurred or damaged, making segmentation impossible, an error message is output, requiring manual review.
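The cell extraction described above can be illustrated with a minimal sketch. It assumes the Hough step has already been reduced to two lists of pixel coordinates, one for horizontal lines and one for vertical lines; the function name `table_cells` and the sample coordinates are hypothetical and do not come from the patent.

```python
def table_cells(h_lines, v_lines):
    """Return {(row, col): (left, top, right, bottom)} for a ruled table.

    h_lines: y-coordinates of detected horizontal lines (pixels).
    v_lines: x-coordinates of detected vertical lines (pixels).
    """
    ys = sorted(set(h_lines))
    xs = sorted(set(v_lines))
    cells = {}
    # Each pair of neighbouring lines bounds one row (or one column).
    for row, (top, bottom) in enumerate(zip(ys, ys[1:])):
        for col, (left, right) in enumerate(zip(xs, xs[1:])):
            cells[(row, col)] = (left, top, right, bottom)
    return cells


# Hypothetical line positions for a table with 2 rows and 2 columns.
cells = table_cells(h_lines=[0, 40, 80], v_lines=[0, 100, 250])
print(cells[(1, 1)])  # bounding box of the cell in row 1, column 1
```

In a real pipeline the columns holding the birth date and NRC number would be selected by index, and each bounding box would be cropped from the scanned image before single-character segmentation.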

[0035] In one specific embodiment, the target information includes the birth date information and Myanmar National Identity Card (NRC) number information of the Myanmar citizen.

[0036] In one specific embodiment, the horizontal and vertical lines in the image are detected as follows: each line is described by the normal parametric equation ρ = x cos θ + y sin θ, where ρ is the perpendicular distance from the origin to the line, θ is the angle between that perpendicular and the x-axis, and x and y are the abscissa and ordinate of a point on the line. When sin θ ≠ 0, this is equivalent to the slope-intercept form of the line, with slope -(cos θ / sin θ) and intercept ρ / sin θ.

[0037] In another specific embodiment, the present invention obtains a scanned image of a paper document of voter registration for the Myanmar general election as input, and first performs basic preprocessing on the input image, including grayscale conversion, noise reduction and tilt correction.

[0038] As shown in Figure 2, the Hough transform is used to detect horizontal and vertical lines. The lines in the image are described by the normal parametric equation ρ = x cos θ + y sin θ, where ρ is the perpendicular distance from the origin to the line and θ is the angle between that perpendicular and the x-axis.
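As an illustration of how the normal form is used for voting, the following is a minimal sketch of a Hough accumulator restricted to θ = 0° (vertical lines, ρ = x) and θ = 90° (horizontal lines, ρ = y). The function name, the vote threshold and the toy edge points are assumptions made for the example; the patent does not specify the accumulator resolution or thresholds.

```python
import math
from collections import Counter


def hough_lines(points, thetas_deg=(0, 90), min_votes=3):
    """Vote in (rho, theta) space using rho = x*cos(theta) + y*sin(theta).

    Returns the sorted (rho, theta_deg) pairs that received at least
    min_votes votes from the given edge points.
    """
    votes = Counter()
    for x, y in points:
        for t in thetas_deg:
            theta = math.radians(t)
            # Quantise rho to whole pixels so collinear points share a bin.
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return sorted(key for key, n in votes.items() if n >= min_votes)


# Toy edge map: a horizontal line y = 2 and a vertical line x = 5.
edge_points = [(x, 2) for x in range(10)] + [(5, y) for y in range(10)]
lines = hough_lines(edge_points, min_votes=8)
print(lines)
```

Each returned pair with θ = 90° corresponds to a horizontal table line at y = ρ, and each pair with θ = 0° to a vertical table line at x = ρ.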

[0039] Step 2: Dataset Construction and Model Adaptation: Construct an image dataset containing 44 classes of Burmese printed characters, perform transfer adaptation based on the YOLOv2 object detection model, and construct a Burmese OCR recognition model using a pre-trained ResNet18 residual network as the feature extraction backbone.

[0040] In one specific embodiment, the image dataset includes 8,000 Burmese character images, which are divided into a training set and a validation set in an 8:2 ratio, with 6,400 images in the training set and 1,600 images in the validation set.

[0041] In one specific embodiment, the Burmese OCR recognition model is constructed as follows: the feature extraction module adopts a pre-trained ResNet18 architecture; the detection module, which replaces the detection sub-network of YOLOv2, is composed of 14 convolutional units stacked in sequence, each consisting of a convolutional layer, a batch normalization layer and a ReLU activation layer. At the end of the model, a YOLO convolutional layer, a YOLO transform layer and a YOLO output layer are arranged in sequence. The input of the model is an RGB image with a resolution of 224×224.

[0042] In another specific embodiment, the present invention constructs a dedicated dataset of Burmese printed characters. The dataset covers 33 basic consonant characters, 10 Burmese numerals, and 1 special character in Burmese, totaling 44 categories and containing 8,000 character images with different lighting, different levels of blur, and different printed fonts.

[0043] Data augmentation was performed on the dataset, including random rotation, random cropping, brightness adjustment, and Gaussian noise addition, to artificially increase the size of the dataset and reduce the risk of overfitting during model training. The augmented dataset was then randomly divided into a training set and a validation set in an 8:2 ratio, with 6,400 images used for model training and 1,600 images used for model accuracy validation.
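The 8:2 random split can be sketched as follows. This is an illustrative example only: the function name, the fixed seed and the use of integer indices in place of image files are assumptions, and the augmentation step itself is not shown.

```python
import random


def split_dataset(items, train_parts=8, val_parts=2, seed=0):
    """Shuffle items and split them train:val in the given integer ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]


# Integer indices stand in for the 8,000 character images.
train_set, val_set = split_dataset(range(8000))
print(len(train_set), len(val_set))
```

Using an integer ratio rather than a floating-point fraction keeps the cut point exact, which gives the 6,400 and 1,600 image counts stated in the text.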

[0044] As shown in Figure 3, the YOLOv2 object detection model is transferred and adapted to construct the Burmese OCR recognition model: a pre-trained ResNet18 residual network is used as the feature extraction backbone, exploiting the advantages of residual networks, such as ease of optimization and the ability to scale to greater depth, to improve the accuracy of character feature extraction.

[0045] The detection subnetwork of YOLOv2 is replaced, and the fully connected layers in the original network are removed. A detection network consisting of 14 convolutional units stacked in sequence is constructed, each unit consisting of a convolutional layer, a batch normalization layer and a ReLU activation layer in sequence.

[0046] The network is configured with YOLO convolutional layers, YOLO transform layers, and YOLO output layers in sequence. The model input is an RGB image with a resolution of 224×224, and the final output is the classification and detection results of 44 classes of Burmese characters.

[0047] Step 3: Model Training and Optimization: Divide the dataset into training and validation sets according to a preset ratio, train the OCR recognition model using the stochastic gradient descent with momentum (SGDM) optimizer, and optimize the model parameters through k-fold cross-validation to obtain a high-precision Burmese OCR recognition model.

[0048] In one specific embodiment, the OCR recognition model is trained as follows: the programming environment for model training is MATLAB 2020a, and the training hyperparameters are an initial learning rate of 0.001, a mini-batch size of 5, and a maximum of 32 training epochs; 4-fold cross-validation is used to optimize the model parameters, with each fold containing 2,000 OCR images.
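For readers unfamiliar with the optimizer, the following is a minimal sketch of a single SGDM parameter update, applied to a toy one-dimensional problem. The learning rate of 0.001 is taken from the text; the momentum coefficient of 0.9 is an assumption, since the patent does not state it, and the quadratic objective is purely illustrative.

```python
def sgdm_step(params, grads, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update; returns (new_params, new_velocity).

    momentum=0.9 is an assumed value, not one given in the source.
    """
    # The velocity accumulates a decaying sum of past gradient steps.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = [0.0], [0.0]
for _ in range(2000):
    grad = [2.0 * (w[0] - 3.0)]
    w, v = sgdm_step(w, grad, v)
print(w[0])  # close to the minimiser w = 3
```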

[0049] In the OCR stage, the detector model is first trained on the 8,000 images covering 44 categories. The detector model is then saved and used to detect 2,000 verification images. Images are randomly allocated 80% for training and 20% for verification, and training is completed when the detection accuracy reaches a preset value.

[0050] It should be noted that the preset detection accuracy is set by staff. To evaluate recognition accuracy, 100 tests were conducted for each category. Table 1 shows the recognition results for the 44 categories of Burmese characters. The data show that precision is higher than recall. The average negative-sample result across all categories exceeded 99.9%, demonstrating the specificity of the recognition model. The accuracy of character recognition exceeded 99%, and the average character recognition accuracy was 99.84%.

[0051] Table 1: OCR System Accuracy Test Results
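The per-category measures mentioned in the evaluation (precision, recall and specificity) follow the usual one-vs-rest definitions, sketched below. The counts passed in are invented purely to exercise the function and are not results from the patent; the function name is likewise hypothetical.

```python
def one_vs_rest_metrics(tp, fp, fn, tn):
    """Standard binary metrics for one character class against all others."""
    return {
        "precision": tp / (tp + fp),      # correct among predicted positives
        "recall": tp / (tp + fn),         # correct among actual positives
        "specificity": tn / (tn + fp),    # correct among actual negatives
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Made-up counts: 100 samples of one class, 4,300 samples of other classes.
metrics = one_vs_rest_metrics(tp=99, fp=0, fn=1, tn=4300)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With 44 categories, the negatives for any one class greatly outnumber its positives, which is why specificity can stay very high even when recall for that class is lower.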

[0052] In another specific embodiment, the present invention completes model training in the MATLAB 2020a programming environment, and the hardware environment is: Intel Core i3 CPU (2.40GHz), 4GB DDR3 memory, 64-bit Windows 10 operating system, and the training is accelerated by the GPU computing power of NVIDIA GeForce 920MX graphics card.

[0053] The SGDM optimizer was used during training, with hyperparameters set as follows: initial learning rate of 0.001, mini-batch size of 5, and a maximum of 32 training epochs. Four-fold cross-validation was employed to optimize model parameters: the 8,000 images were divided into four equal folds of 2,000 images each, and in each round three folds were used as the training set and the remaining fold as the validation set, for a total of four rounds of training and validation. Grid search was used to determine the optimal model parameters and avoid overfitting. The results of the four-fold cross-validation are shown in Table 2.
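The fold arrangement can be made concrete with a short sketch. It assumes the images are addressed by integer index and that the dataset size is divisible by the number of folds, as it is for 8,000 images and 4 folds; shuffling before fold assignment is omitted, and the function name is hypothetical.

```python
def k_fold_indices(n_items, k=4):
    """Yield (train_indices, val_indices) for each of k equal folds.

    Assumes n_items is divisible by k; any remainder would be dropped.
    """
    fold_size = n_items // k
    folds = [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        # Training data is every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(8000, k=4))
for round_no, (train, val) in enumerate(splits, start=1):
    print(round_no, len(train), len(val))
```

Each image appears in exactly one validation fold across the four rounds, so every image contributes once to the validation accuracy.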

[0054]

[0055] Table 2: Cross-validation results of the training dataset

Step 4: Character Recognition and Result Output: Input the character regions segmented in Step 1 into the trained OCR recognition model to complete the detection and classification of Burmese characters, then output the recognized text information and import it into the data table.

[0056] In another specific embodiment, as shown in Figures 4a, 4b, 5a and 5b, the present invention inputs the birth date and NRC number character regions obtained in Step 1 into the trained Burmese OCR recognition model. The model completes the feature extraction, detection and classification of the characters, outputs the corresponding Unicode encoded text, and finally automatically imports the recognized birth date and NRC number information into a structured data table.

[0057] Based on the above identification results, the duplicate voter registration detection for the Myanmar general election is achieved. The specific process is as follows: batch read the identification results of voter registration documents and extract the NRC number information of all voters.

[0058] The extracted NRC number information is compared with the official voter registration database to count the number of times the same NRC number appears.

[0059] When the same NRC number appears more than once, it is determined to be a duplicate voter registration. The system automatically marks and outputs all voter registration information corresponding to that NRC number, thus completing the automated detection of duplicate voter registrations for election fraud investigation. The interface of the resulting OCR system is shown in Figure 6, and an example of OCR results obtained within a table is shown in Figure 7.
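The duplicate check described in this process amounts to grouping records by NRC number and keeping the groups with more than one entry. The sketch below assumes each recognized record is a dictionary with hypothetical `nrc` and `birth_date` keys; the placeholder values are not real NRC numbers and do not follow the real NRC format.

```python
from collections import defaultdict


def find_duplicate_registrations(records):
    """Group records by NRC number and return the groups of size > 1."""
    by_nrc = defaultdict(list)
    for record in records:
        by_nrc[record["nrc"]].append(record)
    return {nrc: group for nrc, group in by_nrc.items() if len(group) > 1}


# Placeholder records standing in for OCR output rows.
records = [
    {"nrc": "NRC-0001", "birth_date": "1990-01-01", "source": "sheet_1"},
    {"nrc": "NRC-0002", "birth_date": "1985-05-12", "source": "sheet_1"},
    {"nrc": "NRC-0001", "birth_date": "1990-01-01", "source": "sheet_7"},
]
duplicates = find_duplicate_registrations(records)
for nrc, group in duplicates.items():
    print(nrc, [r["source"] for r in group])
```

Because OCR errors in a single character would hide a true duplicate, a production system would likely add fuzzy matching or manual review, which is outside what the text describes.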

[0060] As shown in Figure 8, this invention provides a deep learning-based optical character recognition system for printed Burmese script, comprising the following modules: an image preprocessing and table segmentation module, a dataset construction and model adaptation module, a model training and optimization module, and a character recognition and result output module.

[0061] The dataset construction and model adaptation module is connected to the image preprocessing and table segmentation module and the model training and optimization module, respectively, and the character recognition and result output module is connected to the model training and optimization module.

[0062] The image preprocessing and table segmentation module is used to preprocess the input scanned image of printed Burmese document. It uses Hough transform to detect horizontal and vertical lines in the image, segment table regions and character regions, and extract character regions containing target information.

[0063] The dataset construction and model adaptation module is used to construct an image dataset containing 44 classes of Burmese printed characters. It performs transfer adaptation based on the YOLOv2 object detection model and uses a pre-trained ResNet18 residual network as the feature extraction backbone to construct a Burmese OCR recognition model.

[0064] The model training and optimization module is used to divide the dataset into training and validation sets according to a preset ratio, train the OCR recognition model using the stochastic gradient descent with momentum (SGDM) optimizer, and optimize the model parameters through k-fold cross-validation to obtain a high-precision Burmese OCR recognition model.

[0065] The character recognition and output module is used to input the character regions segmented by the image preprocessing and table segmentation module into the trained OCR recognition model to complete the detection and classification of Burmese characters, then output the recognized text information and import it into the data table.

[0066] The examples described in this invention are not limited to the specific embodiments listed above. The examples are merely illustrative to facilitate understanding of the invention and do not constitute a limitation on the scope of protection of this invention. Any modifications, equivalent substitutions, etc., made within the spirit and principles of this invention should be included within the scope of protection.

[0067] The above description is merely an example and illustration of the concept of the present invention. Those skilled in the art can make various modifications or additions to the specific embodiments described or use similar methods to replace them, as long as they do not deviate from the concept of the invention or exceed the scope defined in this specification, they should all fall within the protection scope of the present invention.

Claims

1. A method for optical character recognition of printed Burmese script based on deep learning, characterized in that it includes the following steps:
Step 1: Image Preprocessing and Table Segmentation: The input scanned image of a printed Burmese document is preprocessed. The Hough transform is used to detect horizontal and vertical lines in the image, thereby segmenting the table area and character area and extracting the character area containing the target information.
Step 2: Dataset Construction and Model Adaptation: Construct an image dataset containing 44 classes of Burmese printed characters, perform transfer adaptation based on the YOLOv2 object detection model, and use a pre-trained ResNet18 residual network as the feature extraction backbone to construct a Burmese OCR recognition model.
Step 3: Model Training and Optimization: Divide the dataset into training and validation sets according to a preset ratio, train the OCR recognition model using the stochastic gradient descent with momentum (SGDM) optimizer, and optimize the model parameters through k-fold cross-validation to obtain a high-precision Burmese OCR recognition model.
Step 4: Character Recognition and Result Output: Input the character regions segmented in Step 1 into the trained OCR recognition model to complete the detection and classification of Burmese characters, then output the recognized text information and import it into the data table.

2. The method for optical character recognition of printed Burmese script based on deep learning according to claim 1, characterized in that the target information includes the birth date information and Myanmar National Identity Card (NRC) number information of Myanmar citizens.

3. The method for optical character recognition of printed Burmese script based on deep learning according to claim 1, characterized in that the horizontal and vertical lines in the image are detected as follows: each straight line is described by the normal parametric equation ρ = x cos θ + y sin θ, where ρ is the perpendicular distance from the origin to the line, θ is the angle between that perpendicular and the x-axis, and x and y are the abscissa and ordinate of a point on the line. When sin θ ≠ 0, this is equivalent to the slope-intercept form of the line, with slope -(cos θ / sin θ) and intercept ρ / sin θ.

4. The method for optical character recognition of printed Burmese script based on deep learning according to claim 1, characterized in that the image dataset includes 8,000 Burmese character images, divided into a training set and a validation set in an 8:2 ratio, with 6,400 images in the training set and 1,600 images in the validation set.

5. The method for optical character recognition of printed Burmese script based on deep learning according to claim 1, characterized in that the Burmese OCR recognition model is constructed as follows: the feature extraction module adopts a pre-trained ResNet18 architecture; the detection module, which replaces the detection sub-network of YOLOv2, consists of 14 convolutional units stacked in sequence, each consisting of a convolutional layer, a batch normalization layer and a ReLU activation layer. At the end of the model, a YOLO convolutional layer, a YOLO transform layer and a YOLO output layer are arranged in sequence. The input of the model is an RGB image with a resolution of 224×224.

6. The method for optical character recognition of printed Burmese script based on deep learning according to claim 5, characterized in that the OCR recognition model is trained as follows: the programming environment for model training is MATLAB 2020a; the training hyperparameters are an initial learning rate of 0.001, a mini-batch size of 5, and a maximum of 32 training epochs; four-fold cross-validation is used to optimize the model parameters, with each fold containing 2,000 OCR images. In the OCR stage, the detector model is first trained on the 8,000 images covering 44 categories. The detector model is then saved and used to detect 2,000 verification images. Images are randomly allocated 80% for training and 20% for verification, and training is completed when the detection accuracy reaches a preset value.

7. A recognition system utilizing the deep learning-based optical character recognition method for printed Burmese script as described in any one of claims 1-6, characterized in that it includes the following modules:
The image preprocessing and table segmentation module is used to preprocess the input scanned image of a printed Burmese document. It uses the Hough transform to detect horizontal and vertical lines in the image, segment table areas and character areas, and extract character areas containing target information.
The dataset construction and model adaptation module is used to construct an image dataset containing 44 classes of Burmese printed characters. It performs transfer adaptation based on the YOLOv2 object detection model and uses a pre-trained ResNet18 residual network as the feature extraction backbone to construct a Burmese OCR recognition model.
The model training and optimization module is used to divide the dataset into training and validation sets according to a preset ratio, train the OCR recognition model using the stochastic gradient descent with momentum (SGDM) optimizer, and optimize the model parameters through k-fold cross-validation to obtain a high-precision Burmese OCR recognition model.
The character recognition and output module is used to input the character regions segmented in Step 1 into the trained OCR recognition model to complete the detection and classification of Burmese characters, then output the recognized text information and import it into the data table.