An OCR recognition method and system with online automatic optimization function

The method optimizes OCR recognition online and automatically. An initial model is trained with object detection and text recognition algorithms, and abnormal-element statistics are then combined with generative adversarial neural networks to synthesize optimization data. This addresses the recognition errors that occur when real-world scenes change, and achieves fast, automatic model optimization with high recognition accuracy.

CN115690810B · Active · Publication Date: 2026-06-23 · BANK OF COMMUNICATIONS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BANK OF COMMUNICATIONS
Filing Date
2022-10-30
Publication Date
2026-06-23

Figure CN115690810B_ABST

Abstract

The application relates to an OCR recognition method and system with an online automatic optimization function. The method comprises the following steps: S1, obtaining an OCR recognition training image set to be recognized and preprocessing it to obtain an initial training dataset; S2, using a target detection algorithm and a text recognition algorithm to perform text localization and text recognition on the initial training dataset in sequence, and training an initial OCR recognition model; S3, deploying the initial OCR recognition model to actual production, comparing the recognition results with the correction results at set intervals, and collecting original image samples with recognition errors; when an optimization signal threshold is triggered, proceeding to S4; S4, performing abnormal element statistics and constructing an optimization dataset; S5, performing optimization training on the initial OCR recognition model based on the optimization dataset to obtain an optimized OCR recognition model, and deploying the optimized model to actual production as the initial OCR model of S3 for OCR recognition. Compared with the prior art, the application achieves online, automatically optimized OCR recognition of images.

Description

Technical Field

[0001] This invention relates to the field of OCR recognition technology, and in particular to an OCR recognition method and system with an online automatic optimization function.

Background Technology

[0002] Modern society, especially the financial industry, utilizes a large number of paper certificates and forms for various applications and management processes. As society becomes increasingly information-driven, there are more and more scenarios requiring the input of information from these paper documents into computers. Traditionally, this is done manually, a repetitive and tedious process. With the development of scanning and deep learning technologies, OCR (Optical Character Recognition) technology has also advanced to replace manual input. OCR text recognition, or optical character recognition, converts the grayscale of text on paper into electrical signals, which are then input into a computer. This technology significantly reduces repetitive work and provides a convenient way to convert images into text.

[0003] Current OCR recognition technology is primarily based on deep learning. The basic process is to train a text localization and recognition model on existing data and then deploy the trained model to a real-world production environment. The problem is that the training data may not fully meet actual production needs: some documents in real-world scenarios may not match what the model was trained to fit, and changing conditions, such as the document layout or the input environment, may affect recognition and cause errors. The usual remedy is to collect error samples and retrain the model once enough of them have accumulated. However, this optimization process is lengthy, requires significant manual intervention, and carries the risk of production data leakage.

[0004] To address these shortcomings, it is necessary to design an OCR recognition method and system that can be optimized automatically online.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of the existing technology and provide an OCR recognition method and system with online automatic optimization function.

[0006] The objective of this invention can be achieved through the following technical solutions:

[0007] According to a first aspect of the present invention, an OCR recognition method with online automatic optimization function is provided, the method comprising the following steps:

[0008] Step S1: Obtain the OCR recognition training image set to be recognized and label it to obtain the initial training dataset;

[0009] Step S2: Use an object detection algorithm and a text recognition algorithm to perform text localization and text recognition on the initial training dataset in sequence, training the initial OCR recognition model;

[0010] Step S3: Deploy the initial OCR recognition model into actual production, compare the recognition results with the correction results at set intervals, and collect original image samples with recognition errors; when the set optimization signal threshold is triggered, proceed to step S4 to start the model optimization process;

[0011] Step S4: Perform statistical analysis on abnormal elements, synthesize image samples according to a set probability, and combine these samples with the erroneous samples and the samples from the abnormal time period to form an optimized dataset;

[0012] Step S5: Based on the optimized dataset, optimize and train the initial OCR recognition model to obtain an optimized OCR recognition model, and deploy it to actual production as the initial OCR model of step S3 for OCR recognition.

[0013] Preferably, the annotation in step S1 includes annotating the text regions of the identified elements and the text content information corresponding to each text region.

[0014] Preferably, the text region is a rectangular area that completely covers the position of the text in the image; the annotation result of the text region is in the form of four coordinates, which correspond to the coordinates of the four corners of the rectangular area.

[0015] Preferably, the target detection algorithm in step S2 includes YOLO v3, YOLO v4, and Mask R-CNN algorithms.

[0016] Preferably, the text recognition algorithm in step S2 includes CRNN, SRN, and RARE algorithms.

[0017] Preferably, the optimization signal in step S3 is the recognition accuracy.

[0018] Preferably, the abnormal element statistics in step S4 include abnormal character statistics, erroneous corpus statistics, similar corpus search, text position range statistics, font-background separation, similar font collection, and similar background collection, as follows:

[0019] 1) Abnormal character statistics: Statistically analyze the verification results corresponding to the images with recognition errors, filter out characters that did not appear in the training samples or whose frequency of appearance was lower than the set value, and mark them as key characters to increase the frequency of these abnormal characters appearing in the optimization dataset in the subsequent synthesis process;

[0020] 2) Error corpus statistics: Record the verification result corpus corresponding to each image with a recognition error, generate an error corpus, and count the range of characters in the corpus;

[0021] 3) Similar corpus search: Based on the erroneous corpus data obtained from the statistics, a similarity retrieval algorithm is used to search the constructed corpus database for similar corpora;

[0022] 4) Text location interval statistics: Statistically analyze the locations of text recognition errors within the image;

[0023] 5) Font and background separation: Separate the background and the font from images with recognition errors;

[0024] 6) Similar font collection: The font images separated by the generative adversarial neural network are searched in a pre-set font image database through a similar font image retrieval network to obtain the most similar font;

[0025] 7) Similar background collection: The separated background images are searched in a pre-set background image database through a similar background image retrieval network to obtain the most similar background image.

[0026] Preferably, the text position interval statistics specifically include the following sub-steps:

[0027] 41) Use an image correction method based on convolutional neural networks to correct the original image to a normal horizontal region;

[0028] 42) Calculate the text edit distance between the recognition result and the verification result; if the edit distance is less than the set threshold, proceed to 43), otherwise proceed to 44);

[0029] 43) When the edit distance is less than the set threshold, it is considered a text content recognition error. The text location information is directly recorded in the text location area set, and the original image to which the corresponding erroneous text belongs is also recorded.

[0030] 44) When the edit distance is greater than or equal to the set threshold, it is considered a text recognition error caused by text positioning error. The trained general positioning model is used to find the specified text target region near the relative position. The accuracy of the text position interval is judged comprehensively based on the text length factor. When it meets the general rules, the target region is recorded in the text position region set, and the original image to which the corresponding erroneous text belongs is also recorded.

[0031] 45) Based on the target recognition region, each image containing the incorrectly recognized text is cropped to obtain the incorrect text recognition sub-image.
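The routing in sub-steps 42) to 44) can be sketched as follows. This is a minimal illustration, assuming plain Levenshtein distance with unit costs (the text does not name the edit distance variant) and an absolute threshold; `edit_distance`, `classify_error`, and the threshold value are hypothetical names and settings, not taken from the patent.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def classify_error(recognized: str, verified: str, threshold: int) -> str:
    """Route an error sample to sub-step 43) or 44) (threshold is hypothetical)."""
    if edit_distance(recognized, verified) < threshold:
        return "content_error"        # 43): record the located text region directly
    return "localization_error"       # 44): re-locate with the general positioning model
```

For example, a recognized amount `12345` checked against `12346` (distance 1) is treated as a content error under a threshold of 2, whereas a truncated result such as `12` (distance 3) is sent to re-localization.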

[0032] Preferably, the font-background separation specifically uses a generative adversarial neural network to separate the background and the font from an image with a recognition error.

[0033] According to a second aspect of the present invention, an OCR recognition system with online automatic optimization function is provided, employing any one of the methods described above, the system comprising:

[0034] The OCR recognition training set acquisition and annotation module is used to acquire and annotate the OCR recognition training image set to be recognized, and obtain the initial training dataset.

[0035] The initial OCR recognition model deployment module is used to build an initial OCR recognition model based on deep learning, train it, and then deploy it to the actual production environment.

[0036] The recognition rate monitoring and error sample collection module is used to compare the recognition results with the correct results at regular intervals, calculate the recognition rate and perform threshold-based monitoring, and at the same time collect the original image samples of the recognition errors.

[0037] The abnormal element statistics collection module is used to collect statistical information on abnormal elements;

[0038] The optimized dataset synthesis module is used to synthesize image samples from extracted abnormal features according to a set probability, and merge them with erroneous samples and abnormal time period samples to form an optimized dataset.

[0039] The optimized model training and deployment module is used to optimize and train the initial OCR recognition model using an optimized dataset, obtain the optimized OCR recognition model, and replace and deploy it in the actual production environment.

[0040] Compared with the prior art, the present invention has the following advantages:

[0041] 1) This invention statistically analyzes the recognition rate within a certain time period at regular intervals. When the recognition rate is lower than a set threshold, it automatically performs online sample analysis, corpus retrieval, sample synthesis, model optimization training, and automatic deployment. The entire process requires less manual intervention and has a short optimization cycle, which can meet the needs of actual production to the greatest extent.

[0042] 2) This invention adopts deep learning-based OCR recognition technology and uses the YOLO v4 object detection algorithm and the CRNN text recognition algorithm to train the initial recognition model, which ensures a baseline level of recognition accuracy. On top of this, the added online optimization function gives it better performance than previous OCR recognition models.

[0043] 3) This invention uses a generative adversarial neural network to separate the background image and the font image, and can perform separate retrieval based on the features of the background and the font, resulting in a synthetic sample that is more consistent with the actual situation;

[0044] 4) This invention uses multiple image and corpus databases for image synthesis, so optimization training can proceed without first accumulating a large number of error samples. It can also cover all factors that affect recognition accuracy, making the performance improvement more pronounced.

Attached Figure Description

[0045] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0047] Example

[0048] This embodiment takes the OCR recognition of check vouchers in actual production as an example. The system is divided into modules for initial training set collection and annotation, initial OCR recognition model training and deployment, recognition rate monitoring and error sample collection, abnormal element statistical collection, optimized dataset synthesis, and optimized model training and deployment.

[0049] 1) Label the check sample dataset to obtain the correct results for the corresponding elements of each image. Use YOLO v4 and CRNN technology to train the initial OCR recognition model and deploy it to the production environment.

[0050] 2) Monitor the check recognition accuracy at regular intervals and collect error samples in real time. When an abnormal recognition rate is detected or the number of error samples accumulates to a certain amount, start the optimization program.

[0051] 3) Perform online automatic error sample analysis and corpus retrieval, and obtain text region intervals, similar background images, and font image samples with the corresponding algorithms. Randomly extract similar corpora, background images, and font images, and synthesize samples to generate a check optimization training dataset.

[0052] 4) The initial OCR recognition model is further trained on the check optimization training dataset to obtain an optimized recognition model, which is then redeployed to the production environment. Recognition rate monitoring and model optimization are then repeated.

[0053] The method is described in detail below with reference to Figure 1.

[0054] S1: Collect image samples to be identified, manually label the image samples, and obtain the initial training dataset.

[0055] S11: The image samples to be identified mainly come from the historical records of this type of image. The image samples used for training need to meet a certain quantity requirement. If the quantity does not meet the requirement, relevant blank samples or similar background images, as well as relevant corpus information, can be collected. The information can be searched through a search engine or extracted from historical databases. Using the above materials, a portion of the sample images are synthesized using a specified font file and included in the samples to be identified.

[0056] S12: Manually annotate the samples to be identified. This manual annotation consists of two parts: one part is the text region of the element to be identified, typically a rectangular area that completely covers the text's position in the image. The annotation result is in the form of four coordinates, corresponding to the coordinates of the four corners of the rectangular area. The other part annotates the text content information corresponding to each text region. After annotation, the initial training dataset is obtained.

[0057] S2: Using the initial training dataset obtained in S1, perform deep learning training, use the YOLO v4 algorithm for text localization, and the CRNN algorithm for text recognition to obtain the initial OCR recognition model. Deploy the initial OCR recognition model to actual production.

[0058] S21: Text localization is performed with the YOLO v4 algorithm. The input consists of the images and labeled text regions of the initial training dataset, and ResNet50 is used as the backbone. For training, YOLO v4 weights pre-trained on the COCO dataset serve as the pre-trained model, which is then fine-tuned on the input images and labeled regions. The loss function comprises a category loss, a confidence loss, and a location loss. The trained text localization model locates the text regions in images to be recognized.

[0059] S22: Cropping the original image based on the coordinates of the labeled text regions yields a text recognition sub-image, which is used as a training dataset for text content recognition.
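A minimal sketch of S22, assuming the four annotated corners are inclusive integer pixel coordinates and that the image is held as a list of pixel rows (with NumPy or OpenCV the same crop would be the array slice `image[y0:y1 + 1, x0:x1 + 1]`). The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of one annotated corner


def corners_to_box(corners: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) enclosing the four annotated corners."""
    if len(corners) != 4:
        raise ValueError("a text region annotation has exactly four corners")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def crop_text_region(image: List[List[int]], corners: Sequence[Point]) -> List[List[int]]:
    """Cut the text recognition sub-image out of a row-major image."""
    x0, y0, x1, y1 = corners_to_box(corners)
    # The box edges are inclusive, so slice up to x1 + 1 and y1 + 1.
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]
```

Taking the min/max of the corners also tolerates annotations whose corners are listed in any order.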

[0060] S23: Text content recognition uses the CRNN method, which consists of image feature extraction, a recurrent neural network, and a transcription layer. The input is the text content recognition training dataset obtained in S22. Feature extraction uses an adjusted VGG network, the recurrent network is a deep bidirectional LSTM, and the transcription layer applies CTC decoding to the feature sequence produced by the recurrent network to output characters. Training minimizes the negative log-likelihood of the target text (the CTC loss). The trained text content recognition model infers the text content of each text region.
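At inference time the per-frame CTC output has to be collapsed into a string. The patent does not say which decoder is used; the sketch below shows greedy (best-path) CTC decoding, one common choice, assuming the blank symbol has index 0 and that `frame_probs` holds one probability vector per time step. The names are illustrative.

```python
from typing import Dict, List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], charset: Dict[int, str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != BLANK:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)
```

A blank frame between two identical symbols is what lets CTC emit a genuinely repeated character (for example "112") instead of merging it into one.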

[0061] S24: Deploy the trained text localization model and text content recognition model to the actual production environment. A recognition service requester calls the model by sending the binary image data to the model service's URL in an HTTP request, and the model returns the text recognition result to the requester, completing the text recognition process.
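The S24 call pattern might look like the client below, which uses only Python's standard `urllib.request`. The endpoint URL, the `application/octet-stream` content type, and the JSON response body are assumptions for illustration; the patent only states that binary image data is sent to the service URL over HTTP and that a recognition result comes back.

```python
import json
import urllib.request

# Hypothetical endpoint; the patent does not give the service URL or payload format.
OCR_SERVICE_URL = "http://ocr-service.example/recognize"


def build_recognition_request(image_bytes: bytes,
                              url: str = OCR_SERVICE_URL) -> urllib.request.Request:
    """Wrap raw image bytes in a POST request to the recognition service."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def recognize(image_bytes: bytes, url: str = OCR_SERVICE_URL, timeout: float = 10.0) -> dict:
    """Send the request and parse the reply, assumed here to be JSON."""
    req = build_recognition_request(image_bytes, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```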

[0062] S3: During the model usage process, the recognition results and the verification results are compared at regular intervals to calculate the recognition rate. At the same time, original image samples with recognition errors are collected, and recognition rate thresholds or other optimization conditions are set as optimization signals to start the model optimization process.

[0063] S31: During the model usage process, the model recognition results and recognition verification results are stored in the database. The recognition results and verification results are compared at regular intervals to obtain the model recognition accuracy within that time period. Samples with incorrect recognition and their corresponding correct verification results are stored.

[0064] S32: Set a model optimization signal. When the recognition accuracy falls below a pre-set threshold within a certain period, initiate the model optimization process and save the sample images from that period to the optimized training dataset. Other model optimization signals can also be set, such as forcing optimization when a certain number of erroneous samples accumulate or at fixed time intervals.
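The S32 trigger logic can be sketched as a single predicate combining the three signals named above: accuracy below a threshold, an accumulated error count, and a fixed cadence. The class name, field names, and example values are hypothetical; accuracy is taken here as the fraction of recognition results in the interval that match their verification results.

```python
from dataclasses import dataclass


@dataclass
class OptimizationTrigger:
    """Decide whether the current monitoring interval should start optimization."""
    accuracy_threshold: float        # e.g. 0.95 (hypothetical value)
    max_error_samples: int           # force optimization once this many errors accumulate
    max_intervals_without_opt: int   # force optimization at a fixed cadence

    def should_optimize(self, correct: int, total: int,
                        accumulated_errors: int, intervals_since_last: int) -> bool:
        # An interval with no traffic is not treated as an accuracy failure.
        accuracy = correct / total if total else 1.0
        return (
            accuracy < self.accuracy_threshold
            or accumulated_errors >= self.max_error_samples
            or intervals_since_last >= self.max_intervals_without_opt
        )
```

For example, `OptimizationTrigger(0.95, 500, 30).should_optimize(90, 100, 10, 1)` is true because 90 correct out of 100 is below 0.95.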

[0065] S4: Once S3 has started the model optimization process, statistics on abnormal elements are collected, including abnormal character statistics, erroneous corpus statistics, similar corpus search, text position interval statistics, font-background separation, similar font collection, and similar background collection.

[0066] S41: Abnormal character statistics mainly involves counting the verification results of images with recognition errors one by one, filtering out characters and symbols that did not appear or appeared very rarely in the training samples, and marking them as key characters to increase the frequency of these abnormal characters appearing in the optimized training dataset during subsequent synthesis.
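A minimal sketch of S41, assuming both the training labels and the verification results are available as plain strings and that "appeared very rarely" means a training frequency below a configurable `min_count`; the function name and threshold are illustrative.

```python
from collections import Counter
from typing import Iterable, Set


def find_key_characters(training_texts: Iterable[str],
                        error_verification_texts: Iterable[str],
                        min_count: int) -> Set[str]:
    """Characters in error samples that are unseen or rare (< min_count) in training."""
    train_freq = Counter(ch for text in training_texts for ch in text)
    error_chars = {ch for text in error_verification_texts for ch in text}
    # Counter returns 0 for unseen characters, so one comparison covers both cases.
    return {ch for ch in error_chars if train_freq[ch] < min_count}
```

With training labels `ABC123` and `AB12`, error texts `AB#9` and `C3`, and `min_count=2`, the key characters are `#` and `9` (unseen) plus `C` and `3` (seen once).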

[0067] S42: Error corpus statistics mainly involves recording the verification result corpus corresponding to images with recognition errors one by one, generating an error corpus, and statistically analyzing basic information such as the range of corpus characters.

[0068] S43: Similar corpus search retrieves corpora similar to the erroneous corpora from the constructed corpus database using a similarity retrieval algorithm. The corpus database is first screened using basic information such as the character-count range obtained in S42. The word2vec algorithm is then used to generate text vectors for both the corpora in the database and the erroneous corpus entries, and the large-scale vector retrieval library hnswlib is applied to retrieve the top n most similar corpus texts for each erroneous corpus entry (n is set flexibly according to actual training needs); these are collected into a similar corpus set. If the corpus database needs to be updated, a search engine can be connected, and similar text information can be obtained and added to the corpus database using web crawling and NLP named entity recognition.
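The flow of S43 (range pre-screen, vectorize, top-n retrieval) can be sketched with the standard library alone. This sketch deliberately swaps in simpler components: character-bigram count vectors instead of word2vec embeddings, and exact brute-force cosine ranking instead of hnswlib's approximate HNSW index. With hnswlib, the ranking loop would instead use an `hnswlib.Index` created with `space="cosine"`, sized with `init_index`, filled with `add_items`, and queried with `knn_query`. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def char_bigrams(text: str) -> Counter:
    """Character-bigram counts: a simple stand-in for a word2vec text vector."""
    if len(text) < 2:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_corpus(error_corpus: Sequence[str], database: Sequence[str], top_n: int) -> List[str]:
    """S42 length screen, then the top-n most similar database texts per error text."""
    if not error_corpus:
        return []
    lo = min(len(t) for t in error_corpus)
    hi = max(len(t) for t in error_corpus)
    candidates = [t for t in database if lo <= len(t) <= hi]
    vectors: Dict[str, Counter] = {t: char_bigrams(t) for t in candidates}
    result: List[str] = []
    for err in error_corpus:
        query = char_bigrams(err)
        # Exact ranking; an HNSW index would return approximate neighbours much faster.
        ranked = sorted(candidates, key=lambda t: cosine(query, vectors[t]), reverse=True)
        for t in ranked[:top_n]:
            if t not in result:          # the similar corpus set keeps each text once
                result.append(t)
    return result
```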

[0069] S44: Text location interval statistics analyzes where in the image the text recognition errors occur. First, an image correction method based on a convolutional neural network corrects the original image to a normal horizontal region. Because most verification results in actual production contain only text content and no text region information, two cases are handled. In the first case, the error is caused by text content recognition: the text edit distance between the recognition result and the verification result is calculated, and when it is less than a set threshold, the error is treated as a text content recognition error; the text location is recorded directly in the text location interval set, together with the original image to which the erroneous text belongs. In the second case, the error is caused by text localization: when the edit distance is greater than or equal to the set threshold, a Mask R-CNN general localization model trained on large-scale general text localization data searches for the specified text target region near the relative position. The accuracy of the text location interval is judged comprehensively using factors such as text length, and when it conforms to the general rules, the target region is recorded in the text location interval set, together with the original image to which the erroneous text belongs. Finally, each image containing erroneous text is cropped to the target recognition region to obtain an erroneous text recognition sub-image.

[0070] S45: Font-background separation separates the background and the font from an image with a recognition error, using a generative adversarial neural network (GAN). The input is the erroneous text recognition sub-image obtained in S44. The generator uses a U-Net encoder-decoder structure, and the discriminator is a convolutional neural network that judges whether the generated images are real. The network generates a background image without text and a text image without background. The losses are an L1 norm loss, a generative adversarial loss, and a VGG style loss; each of the three is applied to both generated images, and the results are summed to obtain the final loss. The training set consists of artificially synthesized text images with backgrounds; after training on it, the network can separate the text image from the background image.
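The loss described in S45 can be written out as follows. The notation is an assumption made for illustration: x is the input sub-image, G_b(x) and G_t(x) are the generated background and text images, y_b and y_t are their synthetic ground truths, D is the discriminator, φ_j are VGG feature maps, and Gram(·) is the Gram matrix used for the style loss. The terms are summed without weights, as the text states; many implementations add per-term weights, and the L1 norm in the style term is likewise an assumed choice.

```latex
\begin{aligned}
\mathcal{L} &= \sum_{k \in \{b,\,t\}} \left( \mathcal{L}_{1}^{(k)} + \mathcal{L}_{\mathrm{adv}}^{(k)} + \mathcal{L}_{\mathrm{style}}^{(k)} \right) \\
\mathcal{L}_{1}^{(k)} &= \left\lVert G_k(x) - y_k \right\rVert_1 \\
\mathcal{L}_{\mathrm{adv}}^{(k)} &= \mathbb{E}\left[\log D(y_k)\right] + \mathbb{E}\left[\log\left(1 - D(G_k(x))\right)\right] \\
\mathcal{L}_{\mathrm{style}}^{(k)} &= \sum_{j} \left\lVert \mathrm{Gram}\left(\phi_j(G_k(x))\right) - \mathrm{Gram}\left(\phi_j(y_k)\right) \right\rVert_1
\end{aligned}
```

The generator is trained to minimize this total, while the discriminator is trained to maximize the adversarial terms.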

[0071] S46: Similar font collection retrieves the font images obtained in S45 from a pre-defined font image database using a similar font image retrieval network to find the most similar fonts. The similar font image database is built by collecting commonly used font files and using image processing libraries such as OpenCV to render the misrecognized texts in each font; the resulting images form the database. The similar font image retrieval network consists of a ResNet50 feature extractor and a softmax classifier and is trained with cross-entropy loss. The font names corresponding to the recorded similar font images constitute the similar font set.
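S46 describes retrieval as classification: a ResNet50 feature extractor feeds a softmax classifier over the fonts in the database, trained with cross-entropy, and the predicted class is taken as the most similar font. The sketch below shows only the classifier head and the retrieval decision on given logits, which stand in for the network output; the font names are placeholders, and the same head would serve the background retrieval of S47 with background classes instead.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Training loss for one font image whose true font index is `label`."""
    return -math.log(softmax(logits)[label])


def most_similar_font(logits: Sequence[float], font_names: Sequence[str]) -> str:
    """Retrieval step: the font class with the highest softmax probability."""
    probs = softmax(logits)
    return font_names[max(range(len(probs)), key=probs.__getitem__)]
```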

[0072] S47: Similar background collection involves retrieving the background images obtained in S45 from a pre-set background image database using a similar background image retrieval network to find the most similar background image. The similar background image database uses web crawling technology to obtain various commonly used background images. The structure of the similar background image retrieval network is the same as the similar font image retrieval network in S46. The recorded similar background images constitute the similar background set.

[0073] S5: Synthesize image samples from the similar corpora, similar fonts, similar backgrounds, and text intervals extracted in S4 according to a set probability, and merge them with the erroneous samples and the samples from the abnormal time period to form an optimized dataset.

[0074] S51: Randomly extract similar corpora and similar fonts from the results of S4, and use image processing libraries such as OpenCV to render each extracted corpus text in the extracted font, producing a synthetic font image. Font attributes such as size, slant, color, and thickness are set randomly within the ranges required in practice.
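A sketch of how one S51 synthesis job might be sampled. The attribute ranges are hypothetical placeholders for "the ranges required in practice", and weighting corpus texts that contain S41 key characters by `key_boost` is one assumed way to raise their frequency; the patent does not specify the weighting. The rendering itself is left out: OpenCV's core `cv2.putText` draws only its built-in Hershey fonts, so TrueType font files are commonly rendered with Pillow's `ImageDraw.text` instead.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Set, Tuple


@dataclass
class RenderSpec:
    """Everything needed to render one synthetic font image (hypothetical structure)."""
    text: str
    font: str
    size_px: int
    slant_deg: float
    color_bgr: Tuple[int, int, int]
    thickness: int


def sample_render_spec(corpus: Sequence[str], fonts: Sequence[str], key_chars: Set[str],
                       rng: random.Random, key_boost: float = 3.0) -> RenderSpec:
    """Draw one synthesis job; texts containing key characters are drawn more often."""
    weights = [key_boost if key_chars & set(t) else 1.0 for t in corpus]
    text = rng.choices(corpus, weights=weights, k=1)[0]
    b, g, r = (rng.randint(0, 80) for _ in range(3))   # dark ink colours
    return RenderSpec(
        text=text,
        font=rng.choice(fonts),
        size_px=rng.randint(18, 40),          # placeholder attribute ranges
        slant_deg=rng.uniform(-10.0, 10.0),
        color_bgr=(b, g, r),
        thickness=rng.randint(1, 3),
    )
```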

[0075] S52: Randomly extract similar backgrounds from the results of S4, and use image processing libraries such as OpenCV to combine the synthetic font image from S51 with the extracted background image into a synthetic text sub-image. The background image is scaled to the size of the font image using bilinear interpolation so that the two sizes match. The synthetic sub-image can be further augmented as needed with techniques such as rotation and perspective transformation.
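The bilinear scaling in S52 can be illustrated with a small pure-Python resize of a grayscale patch. It maps each output pixel centre back into the source using the half-pixel convention, which is similar in spirit to `cv2.resize(src, (width, height), interpolation=cv2.INTER_LINEAR)`, although OpenCV's fixed-point arithmetic may differ in the last digits. Note that OpenCV's `dsize` is `(width, height)`, while this hypothetical helper takes the height first.

```python
from typing import List

Image = List[List[float]]  # grayscale, row-major


def resize_bilinear(img: Image, out_h: int, out_w: int) -> Image:
    """Scale a background patch to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    sy, sx = in_h / out_h, in_w / out_w
    out: Image = []
    for y in range(out_h):
        # Map the output pixel centre back into the source, clamped to the image.
        fy = min(max((y + 0.5) * sy - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = min(max((x + 0.5) * sx - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```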

[0076] S53: Randomly extract text intervals from the results of S4 whose sizes match the synthetic font image generated in S51, and use the EraseNet algorithm to erase the text in the original image at each interval. Specifically, crop the corresponding region of the original image according to the text interval, feed it into the EraseNet network, which erases the text while preserving the background, and use image processing libraries such as OpenCV to paste the generated background back into the corresponding position of the original image. Finally, paste the synthetic font image into the position of the text interval, again with OpenCV or a similar library, to obtain the synthetic localization image.
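The bookkeeping around EraseNet in S53 (choosing a size-matched interval, erasing the region, pasting the background back, then pasting the text) can be sketched as below. EraseNet itself is represented by a caller-supplied `erase` function, and the size tolerance `tol`, the `(x, y, width, height)` box layout, and the `TRANSPARENT` marker for un-inked font pixels are all assumptions; images are plain lists of pixel rows.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, width, height): assumed interval layout
TRANSPARENT = -1                 # assumed marker for un-inked pixels in the font image


def pick_matching_interval(intervals: Sequence[Box], w: int, h: int,
                           rng: random.Random, tol: float = 0.2) -> Optional[Box]:
    """Randomly pick a recorded text interval whose size is within `tol` of (w, h)."""
    fits = [b for b in intervals
            if abs(b[2] - w) <= tol * w and abs(b[3] - h) <= tol * h]
    return rng.choice(fits) if fits else None


def paste(dst: Image, patch: Image, x: int, y: int, skip: Optional[int] = None) -> None:
    """Copy patch into dst at top-left (x, y) in place, skipping pixels equal to `skip`."""
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            if skip is None or value != skip:
                dst[y + dy][x + dx] = value


def synthesize_localized_image(original: Image, box: Box, font_img: Image,
                               erase: Callable[[Image], Image]) -> Image:
    """Erase the old text inside `box`, restore the background, then draw the new text."""
    x, y, w, h = box
    out = [row[:] for row in original]                  # leave the original untouched
    region = [row[x:x + w] for row in out[y:y + h]]      # crop by the text interval
    paste(out, erase(region), x, y)                      # `erase` stands in for EraseNet
    paste(out, font_img, x, y, skip=TRANSPARENT)         # only inked pixels overwrite
    return out
```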

[0077] S6: Using the synthetic text sub-images and synthetic localization images obtained in S5, the initial OCR recognition model from S2 is further trained with the YOLO v4 and CRNN algorithms to obtain an optimized recognition model. The pre-trained model is replaced by the initial OCR recognition model from S2; all other training and deployment settings are the same as in S2.

[0078] S7: After deploying the optimized recognition model obtained in S6 to the actual production environment, repeat the process of S3-S6 for continuous optimization, where the initial OCR recognition model used in S6 is replaced with the optimized recognition model obtained in the previous round.

[0079] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. An OCR recognition method with online automatic optimization function, characterized in that the method includes the following steps: Step S1: Obtain the OCR recognition training image set to be recognized and label it to obtain the initial training dataset; Step S2: Use an object detection algorithm and a text recognition algorithm to perform text localization and text recognition on the initial training dataset in sequence, training the initial OCR recognition model; Step S3: Deploy the initial OCR recognition model into actual production, compare the recognition results with the correction results at set intervals, and collect original image samples with recognition errors; when the set optimization signal threshold is triggered, proceed to step S4 to start the model optimization process; the optimization signal is the recognition accuracy; Step S4: Perform statistical analysis on abnormal elements, synthesize image samples according to a set probability, and combine these samples with the erroneous samples and the samples from the abnormal time period to form an optimized dataset;
The abnormal element statistics include abnormal character statistics, erroneous corpus statistics, similar corpus search, text position range statistics, font-background separation, similar font collection, and similar background collection, as follows: 1) Abnormal character statistics: Statistically analyze the verification results corresponding to images with recognition errors, filter out characters that did not appear in the training samples or whose frequency of appearance was lower than the set value, and mark them as key characters to increase the frequency of abnormal characters appearing in the optimization dataset in the subsequent synthesis process; 2) Error corpus statistics: Record the verification result corpus corresponding to each image with a recognition error, generate an error corpus, and count the range of characters in the corpus; 3) Similar corpus search: Based on the erroneous corpus data obtained from the statistics, a similarity retrieval algorithm is used to search the constructed corpus database for similar corpora; 4) Text location range statistics: Statistically analyze the locations of text recognition errors within the image, specifically including the following sub-steps: 41) Use an image correction method based on convolutional neural networks to correct the original image to a normal horizontal region; 42) Calculate the text edit distance between the recognition result and the verification result; if the edit distance is less than the set threshold, proceed to 43); otherwise, proceed to 44); 43) When the edit distance is less than the set threshold, it is considered a text content recognition error. The text location information is directly recorded in the text location area set, and the original image to which the corresponding erroneous text belongs is also recorded.
44) When the edit distance is greater than or equal to the set threshold, the error is regarded as a text recognition error caused by a text positioning error: a trained general positioning model is used to find the specified text target region near the relative position, and the correctness of the text position interval is judged comprehensively with the text length as a factor; when the region conforms to the general rules, the target region is recorded in the text position region set, together with the original image to which the erroneous text belongs;
45) Based on the target recognition region, crop each image containing incorrectly recognized text to obtain the erroneous-text recognition sub-image;
5) Font-background separation: separate the background image and the font image from the erroneous-text recognition sub-image;
6) Similar font collection: using a generative adversarial neural network, search a preset font image database for the separated font image through a similar-font image retrieval network to obtain the most similar font;
7) Similar background collection: search a preset background image database for the separated background image through a similar-background image retrieval network to obtain the most similar background image.
The process of obtaining the synthetic image samples comprises:
1) Randomly extract similar text and a similar font, and generate a synthetic font image of the extracted text in the extracted font;
2) Randomly select a similar background, and generate a synthetic text sub-image from the synthetic font image and the selected background image;
3) Randomly select a text position interval whose size matches that of the generated synthetic font image, and crop the corresponding area of the original text image according to that text position interval.
Use the EraseNet algorithm to erase the text in the corresponding area of the original text image while retaining the background, paste the generated background image into the corresponding position of the original image, and paste the synthetic font image into the position corresponding to the text position interval to obtain a synthetic positioning image. The synthetic image samples comprise the synthetic positioning image and the synthetic text sub-image.
Step S5: Based on the optimized dataset, perform optimization training on the initial OCR recognition model to obtain an optimized OCR recognition model, and deploy it to actual production as the initial OCR recognition model of step S3 for OCR recognition.
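The monitoring in step S3 can be illustrated with a minimal Python sketch. The claim states only that recognition results are compared with corrected results at set intervals and that recognition accuracy is the optimization signal. The names `AccuracyMonitor` and `check_interval`, the exact-string-match criterion for a correct sample, and the rule "trigger when accuracy falls below the threshold" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (image_id, recognized_text, corrected_text)


@dataclass
class AccuracyMonitor:
    """Hypothetical step-S3 monitor for one comparison interval."""
    threshold: float  # assumption: optimization triggers when accuracy < threshold
    error_samples: List[Record] = field(default_factory=list)

    def check_interval(self, records: Iterable[Record]) -> Tuple[float, bool]:
        total = 0
        correct = 0
        for image_id, recognized, corrected in records:
            total += 1
            # Assumption: a sample is correct only if the whole string matches.
            if recognized == corrected:
                correct += 1
            else:
                # Keep a reference to the original image of every recognition error.
                self.error_samples.append((image_id, recognized, corrected))
        accuracy = correct / total if total else 1.0
        return accuracy, accuracy < self.threshold
```

A character-level accuracy would be an equally plausible signal; the claim does not say which granularity is used.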
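Abnormal character statistics (item 1 of the abnormal element statistics) amount to a frequency filter over the verified texts of misrecognized images. In this sketch, `find_key_characters` and its `min_freq` parameter are hypothetical names. The claim does not say how the set value is chosen or how frequencies are counted, so per-character counts over the training transcriptions are assumed.

```python
from collections import Counter
from typing import Iterable, List


def find_key_characters(error_truths: Iterable[str],
                        training_texts: Iterable[str],
                        min_freq: int) -> List[str]:
    """Return characters from error samples that are rare or absent in training data.

    Assumption: frequency = number of occurrences across all training transcriptions.
    """
    train_counts = Counter()
    for text in training_texts:
        train_counts.update(text)  # counts each character of the string
    key_chars = set()
    for truth in error_truths:
        for ch in truth:
            # Unseen characters have count 0, so they are caught by the same test.
            if train_counts[ch] < min_freq:
                key_chars.add(ch)
    return sorted(key_chars)
```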
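Sub-steps 42) to 44) branch on the edit distance between the recognition result and the verified result. The sketch below uses the standard Levenshtein distance. The function names, the returned labels, and the treatment of the threshold as an absolute character count (rather than, say, a length-normalized distance) are assumptions, since the claim defines none of them.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def classify_error(recognized: str, verified: str, threshold: int) -> str:
    """Hypothetical labels for the two branches of sub-step 42)."""
    if edit_distance(recognized, verified) < threshold:
        return "content"       # 43): record the detected text box directly
    return "localization"      # 44): re-locate with the general positioning model
```

A small distance suggests the box was right and only some characters were misread, while a large one suggests the box itself was misplaced.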
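Step S4 and synthesis steps 1) to 3) rely on random selection and a set probability, but the claim does not say how the probability is applied. This sketch assumes one interpretation. Each candidate synthetic sample is kept with probability `synth_prob`, and similar texts containing key characters are drawn more often through a hypothetical `key_boost` weight. `SynthesisRecipe`, `sample_recipes`, and `build_optimized_dataset` are illustrative names. The recipes only describe what to render. Rendering, EraseNet erasing, pasting, and matching the box size to the rendered font image are not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed format


@dataclass
class SynthesisRecipe:
    """Describes one synthetic sample to render (rendering itself not shown)."""
    text: str
    font: str
    background: str
    box: Box


def sample_recipes(similar_texts: Sequence[str], fonts: Sequence[str],
                   backgrounds: Sequence[str], boxes: Sequence[Box],
                   key_chars: Set[str], n: int, synth_prob: float,
                   key_boost: float, rng: random.Random) -> List[SynthesisRecipe]:
    # Assumption: texts containing key characters get weight key_boost, others 1.0,
    # so abnormal characters appear more often in the optimized dataset.
    weights = [key_boost if any(c in key_chars for c in t) else 1.0
               for t in similar_texts]
    recipes = []
    for _ in range(n):
        # Assumption: each candidate sample is kept with probability synth_prob.
        if rng.random() >= synth_prob:
            continue
        recipes.append(SynthesisRecipe(
            text=rng.choices(similar_texts, weights=weights, k=1)[0],
            font=rng.choice(fonts),
            background=rng.choice(backgrounds),
            box=rng.choice(boxes),
        ))
    return recipes


def build_optimized_dataset(recipes, error_samples, period_samples) -> list:
    # Merge synthetic samples with the collected error samples and the
    # samples from the abnormal time interval.
    return list(recipes) + list(error_samples) + list(period_samples)
```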

2. The OCR recognition method with online automatic optimization function according to claim 1, characterized in that the annotation in step S1 comprises annotating the text regions of the elements to be recognized and the text content corresponding to each text region.

3. The OCR recognition method with online automatic optimization function according to claim 2, characterized in that, The text region is a rectangular area that completely covers the position of the text in the image; the annotation result of the text region is in the form of four coordinates, which correspond to the coordinates of the four corners of the rectangular area.
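For claim 3, a four-corner annotation of a rectangle that completely covers the text can be derived from any set of points lying on the text, such as character box corners. This is an illustration under assumptions the claim does not fix: image coordinates with the y axis pointing down, an axis-aligned rectangle, and a clockwise corner order starting at the top-left. The name `covering_rectangle` is hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]  # (x, y) in image coordinates, y pointing down


def covering_rectangle(points: Iterable[Point]) -> List[Point]:
    """Smallest axis-aligned rectangle covering all points, as four corners."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    # Corners clockwise from top-left: TL, TR, BR, BL (assumed ordering).
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```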

4. The OCR recognition method with online automatic optimization function according to claim 1, characterized in that the object detection algorithm in step S2 includes YOLO v3, YOLO v4, and Mask R-CNN.

5. The OCR recognition method with online automatic optimization function according to claim 1, characterized in that, The text recognition algorithms in step S2 include CRNN, SRN, and RARE algorithms.

6. The OCR recognition method with online automatic optimization function according to claim 1, characterized in that the font-background separation specifically uses a generative adversarial neural network to separate the background and the font from the image with a recognition error.

7. An OCR recognition system with online automatic optimization function, characterized in that the system uses the method according to any one of claims 1 to 6 and comprises:
an OCR recognition training set acquisition and annotation module, used to acquire and annotate the OCR recognition training image set to be recognized and obtain the initial training dataset;
an initial OCR recognition model deployment module, used to build an initial OCR recognition model based on deep learning, train it, and deploy it to the actual production environment;
a recognition rate monitoring and error sample collection module, used to compare the recognition results with the correct results at set intervals, calculate the recognition rate and monitor it against the threshold, and collect the original image samples with recognition errors;
an abnormal element statistics collection module, used to collect statistical information on abnormal elements;
an optimized dataset synthesis module, used to synthesize image samples from the extracted abnormal features according to a set probability and merge them with the error samples and the abnormal time period samples to form an optimized dataset; and
an optimized model training and deployment module, used to perform optimization training on the initial OCR recognition model with the optimized dataset, obtain the optimized OCR recognition model, and deploy it in place of the existing model in the actual production environment.
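The interplay of the modules in claim 7 can be sketched as one monitoring cycle that triggers statistics, synthesis, training, and redeployment. `OptimizationLoop` and its callable fields are hypothetical stand-ins for the claimed modules. The sketch assumes optimization runs whenever the interval's recognition rate falls below the threshold, and that redeployment simply replaces the held model object.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Record = Tuple[str, str, str]  # (image_id, recognized_text, corrected_text)


@dataclass
class OptimizationLoop:
    threshold: float
    # Recognition rate monitoring and error sample collection module.
    monitor: Callable[[Sequence[Record]], Tuple[float, List[Record]]]
    # Abnormal element statistics collection module.
    collect_statistics: Callable[[List[Record]], dict]
    # Optimized dataset synthesis module.
    synthesize_dataset: Callable[[dict, List[Record]], list]
    # Optimized model training module.
    train: Callable[[object, list], object]
    model: object  # currently deployed model

    def run_interval(self, records: Sequence[Record]) -> bool:
        """Process one monitoring interval; return True if the model was replaced."""
        rate, errors = self.monitor(records)
        if rate >= self.threshold:
            return False
        stats = self.collect_statistics(errors)
        dataset = self.synthesize_dataset(stats, errors)
        # Deployment (assumed): the optimized model replaces the one in production.
        self.model = self.train(self.model, dataset)
        return True
```

Passing the modules in as callables keeps the loop independent of any particular detection or recognition framework.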