Numerical identification method and device, electronic equipment and storage medium

By employing geometric correction methods—including rotational target detection, perspective transformation correction, and angle correction—combined with a character recognition model, the problem of low accuracy in numerical recognition on home appliance displays has been solved. This enables efficient automated recognition in complex environments and reduces the cost of manual verification.

CN122244884A · Pending · Publication Date: 2026-06-19 · GD MIDEA AIR CONDITIONING EQUIP CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GD MIDEA AIR CONDITIONING EQUIP CO LTD
Filing Date
2026-05-21
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for numerical recognition on home appliance displays suffer from problems such as varying shooting angles, optical interference, blurring, and occlusion, resulting in low recognition accuracy and requiring extensive manual verification.

Method used

A geometric correction method combining rotational target detection, perspective transformation correction, and angle correction is adopted, and numerical recognition is performed by combining it with a character recognition model. Coordinate mapping is performed using the transformation matrices of perspective transformation correction and angle correction to alleviate the instability caused by weak texture in perspective and text areas. Furthermore, the recognition accuracy is improved by using confidence threshold and placeholder mechanism.

Benefits of technology

It improves the accuracy of numerical recognition on the display screens of home appliances, reduces the cost of manual verification, enhances the level of automation, and ensures the stability and accuracy of recognition in complex environments.

✦ Generated by Eureka AI based on patent content.

Figure CN122244884A_ABST

Abstract

This invention provides a numerical recognition method, apparatus, electronic device, and storage medium, relating to the field of home appliance installation service technology. The method includes: performing rotational target detection on an image to be recognized to determine a screen area and a numerical display area within the image; performing perspective transformation correction on the screen area to obtain a first corrected image; performing angle correction on the first corrected image to obtain a second corrected image; mapping the vertex coordinates of the numerical display area to the second corrected image to determine the area to be recognized; and performing character recognition on the area to be recognized to obtain a numerical recognition result. The method and apparatus provided by this invention can stably and accurately extract standardized areas to be recognized from complex original images; improve the accuracy of numerical recognition on home appliance displays, reduce the cost of manual review, and enhance the automation level of the overall business process.

Description

Technical Field

[0001] This invention relates to the field of home appliance installation service technology, and in particular to a numerical identification method, device, electronic device, and storage medium.

Background Technology

[0002] In after-sales installation and maintenance services for home appliances, service engineers typically need to take photos of the installation site and upload them to a back-end system for automated compliance checks on installation standards and equipment operating status. A key aspect of this check is verifying whether the values displayed on the device's screen meet the required standards. For example, in the after-sales service of products like water purifiers, service engineers need to photograph the Total Dissolved Solids (TDS) value displayed on the water purifier's screen to verify whether the purifier is operating within acceptable limits.

[0003] Existing technologies use image processing to identify and read values displayed on a device screen. However, due to issues such as varying image shooting angles, optical interference from the screen surface, image blurring, and partial occlusion in complex real-world application scenarios, the accuracy of value recognition is low, the false alarm rate is high, and significant manpower is required for subsequent verification.

[0004] Therefore, improving the accuracy of numerical recognition on the display screens of home appliances, solving problems such as rotation, reflection, blurring and occlusion, and reducing the cost of manual verification have become urgent technical problems for the industry.

Summary of the Invention

[0005] This invention provides a numerical recognition method, apparatus, electronic device, and storage medium to solve the technical problem of how to improve the accuracy of numerical recognition on the display screen of home appliances.

[0006] This invention provides a numerical recognition method, comprising:
performing rotated target detection on an image to be recognized to determine the screen area in the image to be recognized, the rotation angle of the screen area, and the numerical display area in the screen area;
performing perspective transformation correction on the screen area based on the rotation angle to obtain a first corrected image;
performing angle correction on the first corrected image based on the main direction classification result of the first corrected image to obtain a second corrected image;
mapping the vertex coordinates of the numerical display area to the second corrected image based on the transformation matrix corresponding to the perspective transformation correction and the transformation matrix corresponding to the angle correction, and determining the region to be recognized in the second corrected image based on the mapped vertex coordinates;
performing character recognition on the region to be recognized to obtain the numerical recognition result in the image to be recognized.

[0007] In some embodiments, performing character recognition on the region to be recognized to obtain the numerical recognition result in the image to be recognized includes:
inputting the region to be recognized into a character recognition model to obtain the numerical recognition result output by the character recognition model;
wherein the character recognition model is configured to output a preset placeholder at the position of any character in the numerical recognition result whose confidence level is lower than a preset confidence threshold.

[0008] In some embodiments, the character recognition model includes a visual feature extraction module and a connectionist temporal classification (CTC) module; the visual feature extraction module is constructed based on a residual network. Inputting the region to be recognized into the character recognition model and obtaining the numerical recognition result output by the character recognition model includes:
inputting the region to be recognized into the visual feature extraction module to obtain the visual feature vector output by the visual feature extraction module;
inputting the visual feature vector into the CTC module to obtain the numerical recognition result output by the CTC module.

[0009] In some embodiments, the character recognition model is trained through the following steps:
pre-training an initial model on simulated digit sequence images to obtain a base weight model;
fine-tuning the base weight model on the simulated digit sequence images and real digit sequence images to obtain the character recognition model.

[0010] In some embodiments, the simulated digit sequence image is generated through the following steps:
constructing a hybrid seed library that includes simulated digit character images and real digit character images;
extracting varying numbers of simulated digit character images and/or real digit character images from the hybrid seed library and synthesizing them into the simulated digit sequence image.

[0011] In some embodiments, synthesizing the simulated digit sequence image includes:
introducing random scene perturbation parameters during the synthesis process;
wherein the scene perturbation parameters include at least one of contrast change, rotation angle, spatial position offset, and scaling.

[0012] In some embodiments, the simulated digit character image is generated through the following steps:
generating positive-sample digit character images with a simulation generation program in which each stroke of a seven-segment display is controlled independently;
generating negative-sample digit character images with the same simulation generation program by simulating different types of display defects, and setting the sample label of each negative-sample digit character image to the preset placeholder;
generating the simulated digit character images from the positive-sample and negative-sample digit character images.
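The sample generation in [0012] can be sketched as follows. This is a minimal illustration, not the patent's generator: the 7x5 pixel layout, the function names, and the choice of a single unlit segment as the simulated display defect are all assumptions made for the example. Only the standard seven-segment digit encoding and the placeholder label come from known conventions and the text above.

```python
import random
import numpy as np

# Standard seven-segment encoding: which of segments a..g are lit per digit.
SEGMENTS = {"0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
            "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg"}
VALID_PATTERNS = {frozenset(s) for s in SEGMENTS.values()}

# Hypothetical 7x5 pixel layout: each segment is drawn independently.
SEGMENT_PIXELS = {
    "a": (slice(0, 1), slice(1, 4)), "b": (slice(1, 3), slice(4, 5)),
    "c": (slice(4, 6), slice(4, 5)), "d": (slice(6, 7), slice(1, 4)),
    "e": (slice(4, 6), slice(0, 1)), "f": (slice(1, 3), slice(0, 1)),
    "g": (slice(3, 4), slice(1, 4)),
}

def render(lit_segments):
    """Draw the given segments as white pixels on a black 7x5 image."""
    img = np.zeros((7, 5), dtype=np.uint8)
    for seg in lit_segments:
        img[SEGMENT_PIXELS[seg]] = 255
    return img

def positive_sample(digit):
    """Return (image, label) for a correctly displayed digit."""
    return render(SEGMENTS[digit]), digit

def negative_sample(digit, rng, placeholder="#"):
    """Return (image, placeholder) with one segment unlit (assumed defect type).

    Segments whose removal would produce another valid digit are skipped,
    so the defective pattern never matches a legitimate label.
    """
    lit = set(SEGMENTS[digit])
    candidates = [s for s in sorted(lit) if frozenset(lit - {s}) not in VALID_PATTERNS]
    lit.remove(rng.choice(candidates))
    return render(lit), placeholder
```

Filtering out removals that yield another valid digit (for example, 8 without its middle segment is a 0) keeps the placeholder label from being attached to an image that is indistinguishable from a positive sample.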

[0013] In some embodiments, performing perspective transformation correction on the screen area based on the rotation angle to obtain a first corrected image includes:
performing a perspective transformation based on the vertex coordinates of the screen area to map the screen area to an unrotated horizontal rectangle;
cropping the image to be recognized based on the unrotated horizontal rectangle to obtain the first corrected image.

[0014] In some embodiments, performing angle correction on the first corrected image based on the main direction classification result of the first corrected image to obtain a second corrected image includes:
inputting the first corrected image into an angle classification model to obtain the main direction classification result output by the angle classification model;
performing angle correction on the first corrected image based on the main direction classification result to obtain the second corrected image.

[0015] This invention provides a numerical recognition device, comprising:
a rotation detection module, used to perform rotated target detection on the image to be recognized and determine the screen area in the image to be recognized, the rotation angle of the screen area, and the numerical display area in the screen area;
a first-level correction module, used to perform perspective transformation correction on the screen area based on the rotation angle to obtain a first corrected image;
a second-level correction module, used to perform angle correction on the first corrected image based on the main direction classification result of the first corrected image to obtain a second corrected image;
a region determination module, used to map the vertex coordinates of the numerical display area to the second corrected image based on the transformation matrix corresponding to the perspective transformation correction and the transformation matrix corresponding to the angle correction, and to determine the region to be recognized in the second corrected image based on the mapped vertex coordinates;
a character recognition module, used to perform character recognition on the region to be recognized and obtain the numerical recognition result in the image to be recognized.

[0016] The present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the numerical recognition method.

[0017] The present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the numerical recognition method described above.

[0018] The present invention provides a computer program product, including a computer program that, when executed by a processor, implements the numerical recognition method as described above.

[0019] The numerical recognition method, apparatus, electronic device, and storage medium provided by this invention detect the screen area, the rotation angle of the screen area, and the numerical display area in the screen area of the image to be recognized. It performs two stages of geometric correction: perspective transformation correction and angle correction. Coordinate mapping is performed using the vertex coordinates of the original numerical display area, alleviating the instability caused by weak textures in perspective and text areas. This enables stable and accurate extraction of standardized regions to be recognized from complex original images. It significantly improves the accuracy of numerical recognition on home appliance displays, effectively solving the problem of numerical recognition difficulties caused by factors such as shooting angle, perspective distortion, and image rotation in home appliance after-sales service scenarios. It also reduces the cost of manual review and improves the automation level of the overall business process.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0021] Figure 1 is a first flowchart of the numerical recognition method provided by the present invention.

[0022] Figure 2 is a schematic diagram of the character recognition model provided by the present invention.

[0023] Figure 3 is a second flowchart of the numerical recognition method provided by the present invention.

[0024] Figure 4 is a schematic diagram of the numerical recognition device provided by the present invention.

[0025] Figure 5 is a schematic diagram of the structure of the electronic device provided by the present invention.

Detailed Implementation

[0026] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0027] It should be noted that the terms "first," "second," etc., used in this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps, units, or modules is not necessarily limited to those explicitly listed, but may include other steps, units, or modules not explicitly listed or inherent to such processes, methods, products, or devices.

[0028] Complex real-world application scenarios present the following problems: (1) Multi-angle shooting leads to recognition difficulties: Service engineers shoot from various angles on site, so the liquid crystal display (LCD) may appear tilted or rotated. Existing optical character recognition (OCR) and seven-segment display recognition methods adapt poorly to large-angle rotation and cannot distinguish text that has been rotated by 180 degrees.

[0029] (2) Reflection and protective film interference: Reflections on the screen surface, residual protective film, or stains form highlights or blurry areas in the image. This optical interference directly destroys the stroke structure of the digits, making it difficult for traditional optical character recognition models that rely on pixel-level features to extract character features accurately, which results in misread or missed digits.

[0030] (3) Insufficient ability to handle blur and partial occlusion: An excessive shooting distance, hand shake, or a dirty lens blurs the image, while scratches or dust on the screen itself cause parts of characters to be missing. Existing methods lack a mechanism for handling such incomplete information and often output incorrect recognition results, so their reliability is low.

[0031] To solve the above technical problems, Figure 1 is a first flowchart of the numerical recognition method provided by the present invention. As shown in Figure 1, the method includes steps 110, 120, 130, 140, and 150. The numerical recognition method provided in this embodiment of the invention is applicable to scenarios in home appliance after-sales installation services where the total dissolved solids value displayed on the screen of devices such as water purifiers is automatically recognized.

[0032] Step 110: Perform rotation target detection on the image to be recognized, determine the screen area in the image to be recognized, the rotation angle of the screen area, and the numerical display area in the screen area.

[0033] Specifically, the execution subject of the numerical recognition method provided in this embodiment of the invention is a numerical recognition device or system. This device can be implemented in software, such as a numerical recognition program; or it can be a device that executes the numerical recognition method, such as a terminal, computer, or server.

[0034] The images to be identified are typically photographs taken by field service engineers using mobile devices (such as smartphones and tablets) and include the display panels of home appliances (such as water purifiers). Due to the complexity of the field environment, the display panels in these photographs often exhibit varying degrees of tilt and rotation.

[0035] To handle such complex poses, embodiments of this invention employ rotated object detection. Rotated object detection adds a rotational degree of freedom to object detection, enabling accurate fitting of targets that have arbitrary orientations and are not axis-aligned. Rotated object detection outputs an oriented bounding box. In a specific embodiment, a pre-trained rotated object detection model can be used, such as a model built on mainstream object detection models like YOLO, RetinaNet, or Faster R-CNN, to predict the target's center point, width, height, and rotation angle.

[0036] The rotated target detection model is configured to simultaneously identify and output two types of target regions: (1) Screen area: refers to the physical boundary of the entire display screen or display panel. This area usually has relatively stable and obvious structural features such as borders and outlines. Even in the case of reflection or partial obstruction, its overall outline is relatively easy to detect accurately. The model outputs the coordinates of the four vertices of the screen area and the rotation angle calculated from these coordinates.

[0037] (2) Numerical display area: This refers to the local area within the screen area where the numerical value is specifically displayed, that is, the area where the number is located. The texture features of this area are relatively weak, but its position is the core of subsequent recognition. The model also outputs the coordinates of the four vertices of this numerical display area.

[0038] Simultaneous detection of these two regions allows for the use of more robust screen regions to calculate initial, more reliable rotation angles and perspective relationships, providing a precise basis for subsequent geometric correction.
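The relationship between a predicted oriented box (center point, width, height, rotation angle) and the four vertex coordinates used in later steps can be sketched as below. The function name and the angle convention (degrees, positive angles rotating the x axis toward the y axis, which appears clockwise in image coordinates where y points down) are assumptions for illustration; a real detector may use a different convention.

```python
import math

def rotated_box_vertices(cx, cy, w, h, angle_deg):
    """Return the four corners of an oriented box as (x, y) tuples.

    Corners are listed starting from the one that is top-left when the
    angle is 0, then proceeding around the box. The angle convention is
    an assumption of this sketch, not something stated by the patent.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_extents = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in half_extents]
```

For example, `rotated_box_vertices(100, 50, 40, 20, 0)` gives the axis-aligned corners (80, 40), (120, 40), (120, 60), (80, 60).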

[0039] Step 120: Perform perspective transformation correction on the screen area based on the rotation angle to obtain the first corrected image.

[0040] Specifically, the perspective transformation correction performed in this embodiment of the invention is also called first-level correction. The detected screen region, which may be any quadrilateral in the image to be recognized, is mapped to a preset target rectangle (e.g., a horizontally placed standard-sized rectangle without rotation) through a perspective transformation matrix. This transformation matrix can be solved by solving a system of equations, that is, finding a transformation relationship that accurately maps the four vertices of the source quadrilateral to the four vertices of the target rectangle.

[0041] By applying the perspective transformation matrix to the entire screen area of the image, the first corrected image can be obtained. In the first corrected image, the originally tilted screen area has been corrected into a horizontal or vertical rectangle, eliminating the distortion caused by perspective effects where objects appear larger when closer and smaller when farther away.
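The system of equations mentioned in [0040] can be written out directly. With the bottom-right entry of the 3x3 matrix fixed to 1, each of the four vertex correspondences contributes two linear equations in the remaining eight unknowns. The sketch below uses NumPy; the function names are hypothetical, and it assumes the four source vertices are in general position (no three collinear) so that the system has a unique solution.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) mapping src[i] to dst[i].

    src and dst are sequences of four (x, y) points.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1), rearranged to be linear
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

Usage: with `src` set to the detected screen vertices and `dst` set to the corners of the target rectangle, `apply_homography(perspective_matrix(src, dst), src)` reproduces `dst` up to floating-point error. Image libraries typically provide an equivalent routine together with the resampling step that produces the corrected image itself.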

[0042] Step 130: Based on the main direction classification result of the first corrected image, perform angle correction on the first corrected image to obtain the second corrected image.

[0043] Specifically, the main direction of an image is the angle by which the character arrangement direction in the image deviates from the horizontal direction of the image. It is usually divided into four categories {0°, 90°, 180°, 270°}, which correspond to upright (0° clockwise rotation), rotated right (90° clockwise rotation), inverted (180° clockwise rotation), and rotated left (270° clockwise rotation), respectively.

[0044] Although perspective distortion has been eliminated after perspective transformation correction, the first corrected image may still be rotated in its main direction, that is, it may be upright, rotated right, inverted, or rotated left. A 180-degree inversion in particular can easily cause confusion between digits shown on a seven-segment display (such as 6 and 9).

[0045] To address this issue, this embodiment of the invention employs angle correction, also known as secondary correction. Specifically, the first corrected image is input into a pre-trained angle classification model.

[0046] In a specific embodiment, the angle classification model can be a lightweight convolutional neural network (CNN) whose task is to classify input images and output the classification result of their dominant orientation. The output set of this model is {0°, 90°, 180°, 270°}. The training method of the angle classification model is as follows: An original image set can be obtained (the original images can be photos of home appliance display panels). Each original image is rotated by preset angles of 0°, 90°, 180°, and 270°, and a random small-angle perturbation is applied to the image after each rotation to generate training images labeled with the corresponding preset angle categories, thus constructing a training dataset. A CNN is used as the initial model. The initial model includes a feature extraction part and a feature classification part (also called a detection head). The feature classification part is configured to output four classification results, corresponding to the four angle categories of 0°, 90°, 180°, and 270°, respectively. The initial model is trained using the training dataset. The optimizer minimizes the cross-entropy loss between the model's predicted output and the angle category label to update the model parameters. After training, the final model parameters are saved, resulting in the angle classification model.
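The label construction and the loss described in [0046] can be sketched as follows. This is an illustration under stated assumptions: images are NumPy arrays, rotations are clockwise as in [0043], and the function names are hypothetical. The random small-angle perturbation is deliberately left out because it requires an interpolating rotation from an image library; the network and optimizer are also omitted.

```python
import numpy as np

ANGLE_CLASSES = (0, 90, 180, 270)  # class index i corresponds to ANGLE_CLASSES[i]

def make_rotation_samples(image):
    """Return [(rotated_image, class_index)] for the four preset angles.

    np.rot90 with a negative k rotates clockwise, matching the clockwise
    convention assumed here. Small-angle perturbation is not applied.
    """
    return [(np.rot90(image, k=-i), i) for i in range(len(ANGLE_CLASSES))]

def cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])
```

With four equally scored classes the loss equals ln 4, which is the value an untrained classifier would be expected to start near.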

[0047] In another specific embodiment, the angle classification model can be implemented using the YOLO (You Only Look Once) series of models. For example, the YOLOv10 Oriented Bounding Box (YOLOv10OBB) rotating object detection model can achieve four-class angle detection {0°, 90°, 180°, 270°}.

[0048] After obtaining the main orientation classification result, for example, if the model output is "180°", a corresponding angle correction operation is performed on the first corrected image, that is, the image is rotated 180 degrees. After performing the operation, the second corrected image is obtained. The numerical display content in the second corrected image is guaranteed to be upright (i.e., in the 0-degree direction), providing ideal input for subsequent character recognition.
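The correction step in [0048] amounts to rotating the first corrected image back by the classified angle. A minimal sketch, assuming the image is a NumPy array and that the classifier reports the clockwise rotation of the content as described in [0043]; the function name is hypothetical.

```python
import numpy as np

def correct_orientation(image, predicted_angle):
    """Undo a clockwise rotation of predicted_angle degrees (0, 90, 180 or 270).

    np.rot90 rotates counter-clockwise for positive k, so rotating by the
    same number of quarter turns restores an upright image.
    """
    if predicted_angle not in (0, 90, 180, 270):
        raise ValueError("predicted_angle must be one of 0, 90, 180, 270")
    return np.rot90(image, k=predicted_angle // 90)
```

Because only multiples of 90° are involved, the operation is lossless: no interpolation is needed and every pixel is preserved.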

[0049] Step 140: Based on the transformation matrix corresponding to perspective transformation correction and the transformation matrix corresponding to angle correction, map the vertex coordinates of the numerical display area to the second corrected image, and determine the region to be identified in the second corrected image based on the mapped vertex coordinates.

[0050] Specifically, first, the first transformation matrix used for perspective transformation correction in the first-level correction is obtained. Then, the second transformation matrix used for angle correction in the second-level correction is obtained. Angle correction is essentially a two-dimensional affine transformation, which can be represented as a matrix.

[0051] Next, the coordinates of the four vertices of the numerical display area in the original image to be recognized are extracted. These four vertex coordinates are then mapped by applying the first transformation matrix and the second transformation matrix in sequence. Through this cascaded transformation, the new vertex coordinates corresponding to the original numerical display area in the final second-corrected image after two corrections can be accurately calculated, i.e., the mapped vertex coordinates.

[0052] Finally, based on the coordinates of these four mapped vertices, the region to be identified is determined in the second corrected image. In one specific embodiment, the bounding horizontal rectangle of these four vertices is calculated, and this rectangular region is cropped as the final region to be identified.
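The cascaded mapping of [0050] to [0052] can be sketched with 3x3 matrices acting on homogeneous pixel coordinates. The names below are hypothetical. `rot90_matrix` gives the matrix of a quarter-turn rotation in the pixel convention of `np.rot90` (an assumption about how the angle correction is implemented), and the first matrix in the usage example is a simple translation standing in for a real perspective matrix.

```python
import numpy as np

def rot90_matrix(k, width, height):
    """3x3 matrix giving where pixel (x, y) of a width x height image
    lands after np.rot90(image, k) (k counter-clockwise quarter turns)."""
    k %= 4
    if k == 0:
        return np.eye(3)
    if k == 1:
        return np.array([[0, 1, 0], [-1, 0, width - 1], [0, 0, 1]], dtype=float)
    if k == 2:
        return np.array([[-1, 0, width - 1], [0, -1, height - 1], [0, 0, 1]], dtype=float)
    return np.array([[0, -1, height - 1], [1, 0, 0], [0, 0, 1]], dtype=float)

def map_points(matrices, pts):
    """Apply the matrices in order to (x, y) points, dividing by w at the end."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    for M in matrices:
        homog = homog @ M.T
    return homog[:, :2] / homog[:, 2:3]

def bounding_rect(pts):
    """Axis-aligned bounding rectangle (x0, y0, x1, y1) of the mapped vertices."""
    x0, y0 = np.floor(pts.min(axis=0))
    x1, y1 = np.ceil(pts.max(axis=0))
    return int(x0), int(y0), int(x1), int(y1)
```

Usage: `bounding_rect(map_points([H1, H2], display_vertices))` yields the crop rectangle in the second corrected image, where `H1` is the perspective matrix and `H2` the angle correction matrix. Deferring the homogeneous division to the end is valid because both transformations are linear in homogeneous coordinates.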

[0053] By using this coordinate mapping method instead of secondary detection, the difficulty of relocating the target on the corrected image is cleverly avoided, the instability caused by weak texture in perspective and text areas is alleviated, and the image containing the value to be identified can be accurately cropped even when the contrast of the numerical display area is low and the features are not obvious.

[0054] Step 150: Perform character recognition on the region to be recognized to obtain the numerical recognition result in the image to be recognized.

[0055] Specifically, after obtaining an upright, clear, and precisely cropped image of the region to be recognized, character recognition can be performed on it.

[0056] An image of the region to be identified is input into a pre-trained character recognition model. This character recognition model can be any optical character recognition model known to those skilled in the art capable of recognizing digit sequences, such as a deep learning-based neural network model. The model analyzes the input image and outputs the recognized character sequence, which is the numerical recognition result obtained by this invention.

[0057] The numerical recognition method provided in this invention detects the screen area, the rotation angle of the screen area, and the numerical display area in the screen area of the image to be recognized. It performs two stages of geometric correction: perspective transformation correction and angle correction. It uses the vertex coordinates of the original numerical display area for coordinate mapping, which alleviates the instability caused by weak textures in perspective and text areas. It can stably and accurately extract standardized areas to be recognized from complex original images. This greatly improves the accuracy of numerical recognition on the display screen of home appliances, effectively solves the problem of numerical recognition difficulties caused by factors such as shooting angle, perspective distortion, and image rotation in home appliance after-sales service scenarios, reduces the cost of manual review, and improves the automation level of the overall business process.

[0058] It should be noted that each embodiment of the present invention can be freely combined, rearranged, or executed individually, and does not need to rely on or depend on a fixed execution order.

[0059] In some embodiments, character recognition is performed on the region to be recognized to obtain numerical recognition results in the image to be recognized, including: Input the region to be recognized into the character recognition model and obtain the numerical recognition result output by the character recognition model; The character recognition model is configured to output a preset placeholder at the position of any character in the numerical recognition result if the confidence level of any character in the numerical recognition result is lower than a preset confidence threshold.

[0060] Specifically, a character recognition model is a trained computational model capable of receiving image input and outputting corresponding character sequences. Its core function is to map pixel features in an image into character sequences.

[0061] Confidence score refers to a quantifiable score assigned by a character recognition model to its prediction of a specific location in an image. Typically, this score is a probability value between 0 and 1. For example, when recognizing a character location, the model might calculate a probability of 0.98 for the digit "8", a probability of 0.01 for the digit "6", and extremely low probabilities for other digits. Here, 0.98 represents the model's confidence score for the prediction of "8".

[0062] The preset confidence threshold is a value set in advance and used to determine whether the model's recognition results are reliable. This threshold can be adjusted according to the accuracy requirements of the actual application; for example, it can be set to 0.9, 0.95, etc. It is used to distinguish high-confidence recognition results from low-confidence, uncertain results.

[0063] A preset placeholder is a special marker that does not belong to the normal set of digits (0-9). Its function is to indicate explicitly that a character is present at a position in the image but that, because of image quality issues, the model cannot determine its value with sufficiently high confidence. In this embodiment, the placeholder can be set to the symbol "#"; it can also be any other symbol that can be distinguished from normal digits. For example, suppose an image of the region to be recognized should display "128", but lens reflection during shooting makes the middle digit "2" very blurry. The model judges the most similar digit to be "7", but with a confidence of only 0.4. Because 0.4 is far below the preset confidence threshold of 0.9, the model treats this result as unreliable, and the character recognition model outputs the numerical recognition result "1#8".
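The thresholding rule from [0061] to [0063] can be sketched as below. The input format (one dictionary of candidate probabilities per character position) and the function name are assumptions made for the illustration; the threshold of 0.9, the "#" placeholder, and the "128" example come from the text above.

```python
def decode_with_placeholder(char_probs, threshold=0.9, placeholder="#"):
    """Keep the most probable digit at each position, or emit the placeholder
    when its confidence is lower than the threshold."""
    result = []
    for probs in char_probs:
        digit, confidence = max(probs.items(), key=lambda item: item[1])
        result.append(digit if confidence >= threshold else placeholder)
    return "".join(result)

# The "128" example: the middle digit is blurred, its best guess is "7" at 0.4.
example = [
    {"1": 0.97, "7": 0.02},
    {"7": 0.40, "2": 0.35, "1": 0.25},
    {"8": 0.98, "6": 0.01},
]
print(decode_with_placeholder(example))  # prints 1#8
```

A downstream system can then route any result containing the placeholder to manual review instead of trusting a forced guess.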

[0064] The numerical recognition method provided in this invention introduces a placeholder mechanism for insufficient confidence. When a character in an image is degraded by factors such as shooting angle, motion blur, reflection, or occlusion and the network cannot recognize it as a specific digit with high confidence, the network actively outputs the preset placeholder. This avoids the "hallucination" phenomenon in which traditional models forcibly output incorrect digits for blurred areas, thereby significantly improving the controllability of the recognition results and the convenience of subsequent data cleaning.

[0065] In some embodiments, the character recognition model includes a visual feature extraction module and a connectionist temporal classification (CTC) module; the visual feature extraction module is constructed based on a residual network. Inputting the region to be recognized into the character recognition model and obtaining the numerical recognition result output by the character recognition model includes: inputting the region to be recognized into the visual feature extraction module to obtain the visual feature vector output by the visual feature extraction module; and inputting the visual feature vector into the CTC module to obtain the numerical recognition result output by the CTC module.

[0066] Specifically, in general character recognition tasks, especially when recognizing natural language text (such as words and sentences), there are usually strong contextual semantic relationships between characters. In order to capture this temporal dependency, traditional mainstream recognition models usually insert a recurrent neural network, such as a Long Short-Term Memory (LSTM) network, after the visual feature extraction network and before the final decoding layer.

[0067] However, in-depth research revealed that in the scenario of numerical displays on home appliances, each digit in the displayed numerical sequence (e.g., "123") is physically and logically independent, lacking the contextual relationships found in natural language. For example, there is no necessary causal or probabilistic relationship between the hundreds digit being "1" and the tens digit being "2". In this situation, if a recognition model containing a recurrent neural network module is still used, the recurrent neural network module may incorrectly learn some false, non-existent numerical association patterns from the training data, thus introducing unnecessary prediction biases during the recognition process and ultimately affecting the accuracy of the recognition.

[0068] Based on the above research findings, Figure 2 is a schematic diagram of the character recognition model provided by the present invention. As shown in Figure 2, the character recognition model 200 includes a visual feature extraction module 210 and a connectionist temporal classification (CTC) module 220.

[0069] The visual feature extraction module is built on a residual network (ResNet).

[0070] Correspondingly, the process of inputting the region to be recognized into the character recognition model and obtaining its output numerical recognition result specifically includes: First, the image of the region to be recognized is input into the visual feature extraction module. The core function of this module is to extract deep, distinctive visual features from the image. In this embodiment of the invention, the module preferably uses a Residual Network (ResNet) as its backbone network. Depending on the requirements for model complexity and performance, ResNet variants of different depths can be selected, such as ResNet-18 or ResNet-34. After performing a series of convolution, pooling, and non-linear activation operations on the input image, the visual feature extraction module outputs a sequence of visual feature vectors. This sequence can be understood as the feature representation extracted at each horizontal position when the input image is scanned from left to right.

[0071] Then, the visual feature vector sequence is directly input into the Connectionist Temporal Classification (CTC) module. It is particularly noteworthy that, in this embodiment of the invention, the visual feature vector sequence is directly passed from the visual feature extraction module to the connectionist temporal classification module, without including any recurrent neural network module used for modeling temporal dependencies.

[0072] The CTC module acts as the decoding end of the model, receiving the sequence of visual feature vectors as input. It automatically merges consecutive, repeated predictions within the feature sequence and ultimately decodes a variable-length output sequence. The final output of this module is the numerical recognition result obtained by this invention.
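How a CTC decoder merges repeated predictions into a variable-length result can be illustrated with greedy (best-path) decoding. The sketch below makes several assumptions that the text does not state: a blank class at index 0, an alphabet that includes the placeholder "#" as its own class, and per-frame scores supplied as plain lists. The actual decoding strategy used in the embodiment may differ.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class
ALPHABET = ["<blank>"] + list("0123456789") + ["#"]  # assumed class layout


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Pick the best class per frame, merge repeats, then drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars = []
    previous = BLANK
    for idx in best:
        # Emit a character only when the class changes and is not blank.
        if idx != previous and idx != BLANK:
            chars.append(ALPHABET[idx])
        previous = idx
    return "".join(chars)


def one_hot(index: int, size: int = len(ALPHABET)) -> List[float]:
    """Helper that fakes a confident per-frame score vector."""
    return [1.0 if k == index else 0.0 for k in range(size)]


# Frames predict: 1 1 _ 2 _ 2 2 8  (underscore = blank)
frames = [one_hot(i) for i in [2, 2, 0, 3, 0, 3, 3, 9]]
decoded = ctc_greedy_decode(frames)
print(decoded)  # 1228
```

The blank between the two runs of "2" is what allows a genuinely repeated digit to survive the merge step.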

[0073] The numerical recognition method provided in this invention directly combines a visual feature extraction module based on a residual network with a CTC module to form a recurrence-free recognition network. This not only avoids the prediction bias that recurrent neural networks may introduce and improves the accuracy of independent digit recognition, but also significantly reduces the number of model parameters and the computational complexity, making the model more lightweight, easier to deploy, and better suited to fast inference.

[0074] In some embodiments, the character recognition model is trained based on the following steps: the initial model is pre-trained on simulated digit sequence images to obtain a basic weight model; and the basic weight model is fine-tuned on simulated and real digit sequence images to obtain the character recognition model.

[0075] Specifically, the training process of the character recognition model includes two stages.

[0076] The first stage is pre-training, which yields the basic weight model.

[0077] The goal of this stage is to enable the model to first learn the core, general structural features and stroke rules of the seven-segment display digits, and build a cognitive foundation for the ideal form of digits.

[0078] The initial model can be a model with a fixed network structure, but whose internal weight parameters are randomly initialized.

[0079] The training data at this stage consists entirely of simulated digit sequence images generated by a computer program. These images simulate digit sequences of varying lengths displayed on a seven-segment LED display. Because they are program-generated, a massive amount of training samples can be easily obtained, and each sample comes with an accurate label. The initial model is trained using these simulated digit sequence images. Through a standard model training process, the model learns how to associate the pixel features of digits with their corresponding labels from this vast amount of clean simulated data.

[0080] Training stops once the model's performance on the simulation validation set reaches a preset standard (e.g., accuracy convergence). The resulting model is the basic weight model, which possesses solid basic digit recognition capabilities.

[0081] The second stage is fine-tuning training to obtain the character recognition model.

[0082] The goal of this stage is to build upon the first stage by transferring the model from the source domain (simulation environment) to the target domain (real physical environment), enabling it to learn to adapt to various disturbances in the real-world scenario while maintaining its original structural cognitive abilities. This process is also known as domain adaptive fine-tuning.

[0083] This stage employs a hybrid data strategy. The training dataset consists of two parts: the simulated digit sequence images used in the first stage, and real digit sequence images. The real digit sequence images are actual photographs taken in real home appliance after-sales installation scenarios and manually labeled.

[0084] The weight parameters of the basic weight model obtained in the first stage are loaded as the starting point for training. Then, the model is further trained (i.e., fine-tuned) using the aforementioned mixed dataset. In a specific embodiment, to ensure that the model does not forget the basic structural knowledge it has learned while adapting to real-world scenarios, the ratio of simulated data to real data in the mixed dataset can be controlled. For example, this ratio can be controlled to 8:2, meaning that each training batch contains 80% simulated data and 20% real data.
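One way to hold the 8:2 ratio within every batch is to sample the two sources separately and then shuffle. The sketch below is only an illustration under assumptions: the function name `sample_mixed_batch`, the batch size of 10, and sampling with replacement are choices made for the example, not details given in the text.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_mixed_batch(simulated: Sequence[T],
                       real: Sequence[T],
                       batch_size: int = 10,
                       sim_fraction: float = 0.8,
                       rng: Optional[random.Random] = None) -> List[T]:
    """Draw one batch with a fixed share of simulated vs. real samples."""
    rng = rng or random.Random()
    n_sim = round(batch_size * sim_fraction)
    n_real = batch_size - n_sim
    # Sampling with replacement keeps the sketch valid for small datasets.
    batch = rng.choices(simulated, k=n_sim) + rng.choices(real, k=n_real)
    rng.shuffle(batch)  # avoid a fixed simulated-then-real ordering
    return batch


sim_pool = [("sim", i) for i in range(100)]   # stand-ins for (image, label)
real_pool = [("real", i) for i in range(20)]
batch = sample_mixed_batch(sim_pool, real_pool, rng=random.Random(0))
print(sum(1 for source, _ in batch if source == "sim"), "simulated of", len(batch))
```

Fixing the ratio per batch, rather than only across the whole dataset, means every gradient step sees both domains.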

[0085] Through this progressive knowledge transfer from the source domain to the target domain, the model can maintain stable recognition of core digital features under the guidance of a large amount of simulation data. At the same time, by being exposed to a small portion of real data, it learns to handle complex factors unique to the real world, such as changes in lighting, reflections, motion blur, and noise.

[0086] Training stops when the model achieves optimal performance on the real validation set. The resulting model is the character recognition model ultimately used in this embodiment of the invention and deployed in actual business applications.

[0087] The numerical recognition method provided in this invention adopts a phased training strategy, which transfers and adapts the general knowledge learned from easily accessible simulation data to complex real-world application scenarios. This significantly improves the generalization ability and scene adaptability of the character recognition model, enabling it to maintain high recognition accuracy and robustness even when faced with various low-quality images in the real world.

[0088] In some embodiments, the simulated digit sequence images are generated based on the following steps: a hybrid seed library is constructed, the hybrid seed library including simulated digit character images and real digit character images; and different numbers of simulated digit character images and/or real digit character images are extracted from the hybrid seed library to synthesize a simulated digit sequence image.

[0089] Specifically, in this embodiment of the invention, a seed refers to the most basic unit constituting a digit sequence, namely, a single digit character image. The hybrid seed library is a collection containing seeds from two different sources: (1) Simulated digit character images: These are single digit character images generated by computer programs. For example, images of the digits "0" to "9" can be generated by drawing a standard seven-segment display. The advantages of these images are that they are standardized in shape, have clear labels, are easy to generate in batches, and can provide the model with basic knowledge about the ideal digit structure.

[0090] (2) Real digit character images: These images originate from the real physical world. Individual digit characters can be manually or semi-automatically extracted from actual photographs of home appliances. Before these real character images are added to the seed library, some preprocessing is usually required, such as standardizing their size, converting them to grayscale, or normalizing their colors, to ensure that their format is consistent with the simulated digit character images. The advantage of these images is that they contain real-world lighting textures, screen material reflections, and possible minor imperfections or noise, providing valuable realism for the model.

[0091] By combining individual character images from these two sources, a rich and diverse hybrid seed library is constructed. This library possesses both the structural correctness of simulated characters and the textural realism of real characters.

[0092] After the hybrid seed library is constructed, the final digit sequence images used for training can be synthesized by combining and splicing seeds together.

[0093] First, determine the length of the sequence to be generated. This length can be fixed or randomly selected within a range (for example, generating a random digit sequence with a length between 2 and 5 digits) to simulate numerical values with different numbers of digits that may occur in reality.

[0094] Then, based on the determined length, a corresponding number of individual character images are randomly selected from the mixed seed library. Since the seed library is mixed, each selection may yield either simulated digit characters or real digit characters.

[0095] Finally, the extracted individual character images are horizontally stitched from left to right to synthesize a wider image containing the complete digit sequence. This newly synthesized image is a simulated digit sequence image. Because every component comes from the hybrid seed library with a known label, the label of the synthesized sequence is also known: it is the concatenation of the component labels.

[0096] By repeating this step, a virtually unlimited number of digit sequence images of varying lengths and contents can be generated using a finite number of seed characters, thus greatly expanding the size and diversity of the training dataset.
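The three synthesis steps (choose a length, draw seeds, stitch horizontally) can be sketched as below. Images are represented as plain lists of pixel rows to keep the example dependency-free; the names `hstack` and `synthesize_sequence`, and the requirement that all seeds share one height, are assumptions for this illustration.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]    # grayscale rows; a stand-in for a real image array
Seed = Tuple[Image, str]   # (single-character image, its label)


def hstack(images: List[Image]) -> Image:
    """Concatenate images of equal height from left to right."""
    height = len(images[0])
    if any(len(img) != height for img in images):
        raise ValueError("all seed images must share one height")
    return [sum((img[r] for img in images), []) for r in range(height)]


def synthesize_sequence(seed_library: List[Seed],
                        min_len: int = 2,
                        max_len: int = 5,
                        rng: Optional[random.Random] = None) -> Tuple[Image, str]:
    """Build one sequence image and its label from randomly drawn seeds."""
    rng = rng or random.Random()
    length = rng.randint(min_len, max_len)             # step 1: sequence length
    picks = [rng.choice(seed_library) for _ in range(length)]  # step 2: draw seeds
    image = hstack([img for img, _ in picks])          # step 3: stitch
    label = "".join(lbl for _, lbl in picks)           # label follows from the seeds
    return image, label


# A toy library: one "simulated" and one "real" seed, each 2 rows by 3 columns.
library: List[Seed] = [([[0, 0, 0], [0, 0, 0]], "1"),
                       ([[9, 9, 9], [9, 9, 9]], "8")]
image, label = synthesize_sequence(library, rng=random.Random(1))
print(label, len(image), len(image[0]))
```

Because the label is assembled from the seed labels, no manual annotation is needed for the synthesized images.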

[0097] The numerical recognition method provided in this invention integrates the advantages of simulated and real data at the most basic character level. The digit sequence images generated in this way inherit the clear structural features of simulated characters while incorporating the rich texture and lighting details of real characters, significantly reducing the difference between synthetic and real data. Training on such synthetic data exposes the model to complex features close to the real world during the learning phase, thereby improving its generalization ability and performance in practical applications.

[0098] In some embodiments, synthesizing the simulated digit sequence images includes: introducing random scene perturbation parameters during the synthesis process, where the scene perturbation parameters include at least one of contrast change, rotation angle, spatial position offset, and scaling.

[0099] Specifically, contrast variations are used to simulate different lighting conditions in a real-world environment. For example, images shot outdoors in bright light have higher contrast, while those shot indoors in dim light may have lower contrast. In practice, a contrast adjustment factor can be randomly selected within a preset range and applied to the entire sequence image to increase or decrease its overall contrast.

[0100] The rotation angle is used to simulate the slight tilt that occurs when a user holds the shooting device. This rotation is a small angular perturbation introduced to increase the model's robustness (e.g., an angle randomly selected within a small range, such as -15 degrees to +15 degrees). Introducing this small rotation trains the model to adapt to digit sequences that are not perfectly level.

[0101] Spatial position offset is used to simulate situations where a digit sequence is not perfectly centered in the captured image. In practice, the synthesized sequence image can be randomly translated in two dimensions (i.e., moved slightly in the horizontal and vertical directions) within a predefined range.

[0102] Scaling is used to simulate changes in digit size due to varying shooting distances. In practice, a scaling factor can be randomly selected (e.g., between 0.8 and 1.2) to enlarge or reduce the overall size of the synthesized sequence image.

[0103] In each independent digit sequence synthesis, one or more of the above perturbations can be randomly selected and applied. For example, one synthesis might only introduce a contrast change, while another synthesis might introduce both slight rotation and scaling. Similarly, the specific parameter values of the selected perturbation (such as the specific degree of rotation, the specific scale ratio, etc.) are also randomly selected within a preset range.
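The random selection of perturbation types and values can be sketched as a parameter sampler. The rotation range (-15 to +15 degrees) and the scale range (0.8 to 1.2) are the examples given in the text; the contrast and shift ranges, the key names, and the function name `sample_perturbations` are assumptions for illustration. Applying the sampled parameters to pixels is left to an image library and is not shown.

```python
import random
from typing import Dict, Optional

# Preset ranges per perturbation type.
RANGES = {
    "contrast": (0.6, 1.4),         # assumed range for the contrast factor
    "rotation_deg": (-15.0, 15.0),  # example range from the text
    "shift_px": (-4.0, 4.0),        # assumed range, applied per axis
    "scale": (0.8, 1.2),            # example range from the text
}


def sample_perturbations(rng: Optional[random.Random] = None) -> Dict[str, object]:
    """Pick one or more perturbation types, then a random value for each."""
    rng = rng or random.Random()
    chosen = rng.sample(list(RANGES), rng.randint(1, len(RANGES)))
    params: Dict[str, object] = {}
    for name in chosen:
        low, high = RANGES[name]
        if name == "shift_px":
            # Translation is two-dimensional: (horizontal, vertical).
            params[name] = (rng.uniform(low, high), rng.uniform(low, high))
        else:
            params[name] = rng.uniform(low, high)
    return params


params = sample_perturbations(random.Random(42))
print(params)
```

Each call yields a different subset, so one synthesized image may only change contrast while another is both rotated and scaled.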

[0104] The numerical recognition method provided in this invention introduces real-world random perturbations during the synthesis stage, forcing the model to learn to focus on the intrinsic structural features of the numbers themselves, while ignoring interference from surface factors such as illumination, angle, and scale. This greatly enhances the robustness and generalization ability of the final character recognition model, making it more stable and accurate when facing complex and ever-changing real-world scenarios.

[0105] In some embodiments, the simulated digit character images are generated based on the following steps: positive sample digit character images are generated by a simulation generation program in which each stroke segment of the seven-segment display is independently controlled; based on the simulation generation program, negative sample digit character images are generated by simulating different types of display defects, and the sample labels of the negative sample digit character images are set to the preset placeholder; and the simulated digit character images are generated based on the positive and negative sample digit character images.

[0106] Specifically, a simulation generation program can be built. The core capability of this program lies in its ability to independently control the various attributes of the seven segments that make up a seven-segment display. This independent control means that the on / off state, style, thickness, color, transparency, etc. of each segment can be determined individually.

[0107] According to the standard encoding rules of seven-segment displays, for each digit from 0 to 9, the corresponding stroke segment combination is activated. To increase the diversity of samples, the controllable attributes of the program can be randomized during generation, such as randomly selecting the stroke style (e.g., right angle or rounded corner), randomly adjusting the stroke thickness, and randomly setting the overall size of the character. Each standard digit character image generated in this way is then associated with its corresponding real numerical label (e.g., if the image is "2", the label is "2").

[0108] This step generates a large number of diverse positive sample digit character images with correct digit labels.

[0109] The same simulation generation program is used, but its purpose is no longer to generate standard numbers. Instead, it simulates various display defects that may occur on a real-world display screen, thereby generating character forms that cannot be clearly identified. These defects may include broken strokes, stroke adhesion / crosstalk, missing strokes, or excessive darkness. After generating an image with display defects, a crucial operation is performed: regardless of which number the image originally intended to simulate, its sample label is uniformly and forcibly set to a preset placeholder.

[0110] This step generates a large number of diverse negative sample digit character images with sample labels (preset placeholders).

[0111] Finally, all positive sample digit character images (labeled 0-9) and all negative sample digit character images (labeled with the preset placeholder) are combined to form the set of simulated digit character images.
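The positive/negative labeling scheme can be sketched at the level of segment on/off states. The mapping below is the standard seven-segment encoding (segments a to g); everything else is an assumption for illustration: only the "missing stroke" defect is modeled, rendering to pixels and attributes such as thickness or style are omitted, and defective patterns that happen to coincide with another valid digit are skipped so that the placeholder label stays unambiguous.

```python
import random
from typing import FrozenSet, List, Tuple

# Standard seven-segment encoding: which of segments a-g are lit per digit.
DIGIT_SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}
VALID_PATTERNS = {frozenset(s) for s in DIGIT_SEGMENTS.values()}
PLACEHOLDER = "#"

Sample = Tuple[FrozenSet[str], str]  # (lit segments, label)


def positive_sample(digit: str) -> Sample:
    """A clean digit, labeled with its true value."""
    return frozenset(DIGIT_SEGMENTS[digit]), digit


def negative_sample(digit: str, rng: random.Random) -> Sample:
    """Drop one lit segment and label the result with the placeholder."""
    lit = frozenset(DIGIT_SEGMENTS[digit])
    candidates = [lit - {seg} for seg in sorted(lit)]
    # Skip defects that would look exactly like another valid digit.
    candidates = [c for c in candidates if c not in VALID_PATTERNS]
    return rng.choice(candidates), PLACEHOLDER


def build_character_set(rng: random.Random) -> List[Sample]:
    """Combine positive and negative samples into one training set."""
    samples = [positive_sample(d) for d in DIGIT_SEGMENTS]
    samples += [negative_sample(d, rng) for d in DIGIT_SEGMENTS]
    return samples


dataset = build_character_set(random.Random(0))
print(len(dataset), sorted({label for _, label in dataset}))
```

Whatever digit a defective pattern started from, its label is the placeholder, which is what lets the model learn "unreadable" as a class of its own.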

[0112] The numerical recognition method provided in this invention actively generates and explicitly labels negative samples (i.e., defective images), enabling the model to recognize uncertain characters and output placeholders. Outputting a placeholder therefore no longer relies solely on indirect and unstable confidence threshold judgments; it becomes an active and reliable classification behavior learned through training on a large amount of data, fundamentally improving the controllability of the entire recognition system and the reliability of the final result.

[0113] In some embodiments, performing perspective transformation correction on the screen area based on the rotation angle to obtain a first corrected image includes: performing a perspective transformation based on the vertex coordinates of the screen area, mapping the screen area onto an unrotated horizontal rectangle; and cropping the image to be recognized according to the unrotated horizontal rectangle to obtain the first corrected image.

[0114] Specifically, the coordinates of four vertices of the screen region are obtained. These four vertex coordinates together define the source quadrilateral in the original image to be recognized.

[0115] The coordinates of the four vertices of a target rectangle are then defined. This target rectangle is an unrotated horizontal rectangle, representing the standard shape to which the source quadrilateral is to be corrected.

[0116] Based on the coordinates of the four vertices of the source quadrilateral and the four vertices of the target rectangle, a perspective transformation matrix is calculated. This matrix describes the mathematical relationship that maps any point in the source coordinate system to its corresponding point in the target coordinate system. Solving for this matrix is a standard computational procedure in this field: the four point correspondences yield a system of eight linear equations in the eight unknown matrix elements.

[0117] Then, the image to be recognized is cropped according to the unrotated horizontal rectangle to obtain the first corrected image.

[0118] The specific implementation of this process is as follows: The calculated perspective transformation matrix is applied to the original image to be recognized. This operation, according to the definition of the transformation matrix, "stretches" or "compresses" the image content within the source quadrilateral pixel by pixel and fills it into the target rectangular region. The final output of this operation is a new image. In this new image, the screen content that was originally tilted and distorted in the image to be recognized has now become a standard, unrotated, and distortion-free rectangular image. This newly generated image is the first corrected image.
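The eight-unknown linear system behind the perspective transformation matrix can be written out directly. The sketch below is an illustration, not the embodiment's code: it fixes the bottom-right matrix element to 1, solves the system with a small Gauss-Jordan routine, and maps points only; the vertex coordinates are made up for the example. Resampling the pixels themselves is normally delegated to an image library (OpenCV, for instance, provides `getPerspectiveTransform` and `warpPerspective`).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Build the 3x3 matrix mapping four source points to four target points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h: Matrix, p: Point) -> Point:
    """Apply the matrix to one point, including the perspective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical screen vertices (source quadrilateral) and a 100x50 target rectangle.
src = [(10.0, 20.0), (110.0, 30.0), (120.0, 90.0), (5.0, 80.0)]
dst = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
H = perspective_matrix(src, dst)
print([tuple(round(c, 3) for c in warp_point(H, p)) for p in src])
```

Each source vertex lands on its target corner, which is the defining property of the matrix described above.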

[0119] The numerical recognition method provided in this invention can accurately eliminate perspective distortion caused by shooting angle, stably restore screen areas of arbitrary shape into standardized rectangular images, effectively eliminate the interference of shape distortion on angle classification, and thus ensure the accuracy and stability of the entire correction process.

[0120] In some embodiments, the first corrected image is angle-corrected based on the main orientation classification result of the first corrected image to obtain a second corrected image, including: Input the first corrected image into the angle classification model to obtain the main direction classification result output by the angle classification model; The first corrected image is angle-corrected based on the main direction classification result to obtain the second corrected image.

[0121] Specifically, the angle classification model is a pre-trained deep learning model specifically designed to determine the principal orientation of an image. This model can be a lightweight convolutional neural network for fast inference. It is trained to identify and distinguish four main orientation categories.

[0122] After the first corrected image is input into the angle classification model, the model performs a forward propagation calculation and outputs a principal orientation classification result. This result is discrete and explicitly indicates the current orientation of the input image. In a specific embodiment, the classification result is one of the set {0°, 90°, 180°, 270°}. For example, if the first corrected image is inverted, the model should output "180°".

[0123] If the main direction classification result output by the angle classification model is 0°, the first corrected image is already upright and no further action is required.

[0124] If the classification result is 90°, then perform an image transformation operation on the first corrected image by rotating it 90 degrees counterclockwise (or 270 degrees clockwise).

[0125] If the classification result is 180°, then perform a 180-degree rotation image transformation operation on the first corrected image.

[0126] If the classification result is 270°, then perform an image transformation operation on the first corrected image by rotating it 270 degrees counterclockwise (or 90 degrees clockwise).

[0127] After the aforementioned angle correction operations, the resulting image is the second corrected image. Regardless of the original orientation of the first corrected image, the second corrected image obtained after processing is guaranteed to be an upright image with the content facing upwards and the orientation at 0 degrees.
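The four-way correction rule can be sketched with images as plain lists of pixel rows. The mapping from class to rotation follows the text (90° and 270° are applied as counterclockwise rotations by the same amount); the function names and the list-based image type are assumptions for illustration.

```python
from typing import List

Image = List[List[int]]  # rows of pixels; a stand-in for a real image array


def rot90_ccw(img: Image) -> Image:
    """Rotate an image 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def correct_orientation(img: Image, predicted_deg: int) -> Image:
    """Rotate counterclockwise by the predicted class to make the image upright."""
    if predicted_deg not in (0, 90, 180, 270):
        raise ValueError("expected one of 0, 90, 180, 270")
    for _ in range(predicted_deg // 90):
        img = rot90_ccw(img)
    return img


img = [[1, 2],
       [3, 4]]
print(correct_orientation(img, 0))    # [[1, 2], [3, 4]]
print(correct_orientation(img, 90))   # [[2, 4], [1, 3]]
print(correct_orientation(img, 180))  # [[4, 3], [2, 1]]
print(correct_orientation(img, 270))  # [[3, 1], [4, 2]]
```

Because only multiples of 90° are involved, the correction is a lossless rearrangement of pixels with no interpolation.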

[0128] The numerical recognition method provided in this invention solves the problem of main direction rotation that may still exist in the image after perspective transformation. In particular, it overcomes the 180-degree inversion confusion that traditional methods find difficult to distinguish. It ensures that the image finally sent to the character recognition module has a uniform, standardized upright orientation, eliminating recognition errors caused by image rotation and laying a solid foundation for the high accuracy and robustness of the entire numerical recognition method.

[0129] Figure 3 is the second flowchart of the numerical recognition method provided by the present invention. As shown in Figure 3, the method includes: Step 310: Perform rotation target detection on the image to be recognized to determine the screen area and the numerical display area.

[0130] Step 320: Perform perspective transformation correction on the screen area to obtain the first corrected image.

[0131] Step 330: Perform angle correction on the first corrected image to obtain the second corrected image.

[0132] Step 340: Based on the transformation matrix corresponding to perspective transformation correction and the transformation matrix corresponding to angle correction, map the vertex coordinates of the numerical display area to the second corrected image, and determine the region to be identified in the second corrected image.
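Step 340 amounts to composing the two transformation matrices and applying the product to the detected vertices. The sketch below assumes 3x3 homogeneous matrices, pixel-index coordinates (x to the right, y downward), and counterclockwise rotation for the angle correction; the perspective matrix used in the example is a hypothetical pure translation chosen so the result is easy to check by hand.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def rotation_matrix(deg: int, width: int, height: int) -> Matrix:
    """Pixel-coordinate matrix for rotating a width x height image counterclockwise."""
    if deg == 0:
        return [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    if deg == 90:
        return [[0, 1, 0], [-1, 0, width - 1], [0, 0, 1]]
    if deg == 180:
        return [[-1, 0, width - 1], [0, -1, height - 1], [0, 0, 1]]
    if deg == 270:
        return [[0, -1, height - 1], [1, 0, 0], [0, 0, 1]]
    raise ValueError("expected one of 0, 90, 180, 270")


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def map_points(m: Matrix, points: Sequence[Point]) -> List[Point]:
    """Apply a homogeneous 3x3 matrix to 2D points."""
    mapped = []
    for x, y in points:
        w = m[2][0] * x + m[2][1] * y + m[2][2]
        mapped.append(((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
                       (m[1][0] * x + m[1][1] * y + m[1][2]) / w))
    return mapped


# Hypothetical perspective matrix (a pure translation) and a 100x50 first
# corrected image that the angle classifier judged to be upside down (180).
H = [[1, 0, -10], [0, 1, -20], [0, 0, 1]]
R = rotation_matrix(180, width=100, height=50)
M = matmul3(R, H)  # perspective correction first, then angle correction

vertices = [(30.0, 40.0)]       # one vertex of the numerical display area
mapped = map_points(M, vertices)
print(mapped)  # [(79.0, 29.0)]
```

Mapping the original vertices through the composed matrix avoids running a second detection pass on the corrected image.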

[0133] Step 350: Perform character recognition on the region to be recognized based on the character recognition model to obtain the numerical recognition result.

[0134] The numerical identification method provided in this invention is applied to automatically identify the total dissolved solids value displayed on the screen of devices such as water purifiers, and has the following technical effects: (1) The recognition accuracy has been greatly improved.

[0135] In a test using 5,000 real installation site photos, the numerical recognition accuracy reached 99.8%, while the traditional method only achieved about 83%, representing an improvement of about 16.8 percentage points.

[0136] (2) Enhanced robustness at complex angles.

[0137] For on-site photos rotated by ±30°, the recognition accuracy rate is 99.5%, while the traditional method only achieves about 68%, representing an improvement of about 31.5 percentage points.

[0138] (3) It is highly adaptable to reflective / blurred scenes.

[0139] On images containing reflections or slight blur, the recognition success rate reaches 98%, while the traditional method only achieves about 43%, representing an improvement of about 54.6 percentage points.

[0140] (4) The cost of manual verification is significantly reduced.

[0141] The false alarm rate was reduced to 1%; the number of images requiring manual review each year fell from 1.36 million to about 16,000, and the manpower required fell from 1,360 person-days under the traditional solution to about 16 person-days, a reduction of about 98.8%.

[0142] (5) High adaptability and strong robustness.

[0143] This invention combines rotation target detection, angle correction, character recognition model construction, training data synthesis, and a phased training strategy to address the recognition defects of existing methods in large-angle, reflective, and blurred scenes, achieving high accuracy, a low false alarm rate, and high scene adaptability.

[0144] The apparatus provided in the embodiments of the present invention will be described below. The apparatus described below can be referred to in correspondence with the method described above.

[0145] Figure 4 is a schematic diagram of the numerical recognition device provided by the present invention. As shown in Figure 4, the device includes: the rotation detection module 410, used to perform rotation target detection on the image to be recognized and determine the screen area in the image to be recognized, the rotation angle of the screen area, and the numerical display area in the screen area; the first-level correction module 420, used to perform perspective transformation correction on the screen area based on the rotation angle to obtain the first corrected image; the second-level correction module 430, used to perform angle correction on the first corrected image based on the main direction classification result of the first corrected image to obtain the second corrected image; the region determination module 440, used to map the vertex coordinates of the numerical display area to the second corrected image based on the transformation matrix corresponding to perspective transformation correction and the transformation matrix corresponding to angle correction, and to determine the region to be recognized in the second corrected image based on the mapped vertex coordinates; and the character recognition module 450, used to perform character recognition on the region to be recognized and obtain the numerical recognition result in the image to be recognized.

[0146] The numerical recognition device provided in this invention detects the screen area, the rotation angle of the screen area, and the numerical display area in the screen area of the image to be recognized. It performs two stages of geometric correction: perspective transformation correction and angle correction. It uses the vertex coordinates of the original numerical display area for coordinate mapping, which alleviates the instability caused by weak textures in perspective and text areas. It can stably and accurately extract standardized areas to be recognized from complex original images. This greatly improves the accuracy of numerical recognition on the display screens of home appliances, effectively solves the numerical recognition difficulties caused by factors such as shooting angle, perspective distortion, and image rotation in home appliance after-sales service scenarios, reduces the cost of manual review, and improves the automation level of the overall business process.

[0147] Figure 5 is a schematic diagram of the structure of the electronic device provided by the present invention. As shown in Figure 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530, and a communications bus 540, wherein the processor, communications interface, and memory communicate with each other via the communications bus. The processor can invoke logical instructions stored in the memory to execute the methods described in the above embodiments, for example: Rotation target detection is performed on the image to be recognized to determine the screen area, the rotation angle of the screen area, and the numerical display area within the screen area. Perspective transformation correction is applied to the screen area based on the rotation angle to obtain a first corrected image. Angle correction is applied to the first corrected image based on its main direction classification result to obtain a second corrected image. The vertex coordinates of the numerical display area are mapped to the second corrected image based on the transformation matrices corresponding to the perspective transformation correction and the angle correction. The region to be recognized is determined in the second corrected image based on the mapped vertex coordinates. Character recognition is then performed on the region to be recognized to obtain the numerical recognition result in the image.

[0148] Furthermore, the logical instructions in the aforementioned memory can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0149] The processor in the electronic device provided in this embodiment of the invention can call the logical instructions in the memory to implement the above method. Its specific implementation is the same as that of the method embodiments described above and achieves the same beneficial effects, so it is not repeated here.

[0150] This invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the methods provided in the above embodiments.

[0151] Its specific implementation is the same as that of the method embodiments described above and achieves the same beneficial effects, so it is not repeated here.

[0152] This invention provides a computer program product, including a computer program that, when executed by a processor, implements the method described above.

[0153] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0154] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0155] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A numerical recognition method, characterized by comprising: performing rotational target detection on an image to be recognized to determine a screen region in the image to be recognized, a rotation angle of the screen region, and a numerical display area in the screen region; performing perspective transformation correction on the screen region based on the rotation angle to obtain a first corrected image; performing angle correction on the first corrected image based on a main direction classification result of the first corrected image to obtain a second corrected image; mapping the vertex coordinates of the numerical display area to the second corrected image based on the transformation matrix corresponding to the perspective transformation correction and the transformation matrix corresponding to the angle correction, and determining a region to be recognized in the second corrected image based on the mapped vertex coordinates; and performing character recognition on the region to be recognized to obtain a numerical recognition result for the image to be recognized.

2. The numerical recognition method according to claim 1, characterized in that the step of performing character recognition on the region to be recognized to obtain the numerical recognition result for the image to be recognized includes: inputting the region to be recognized into a character recognition model to obtain the numerical recognition result output by the character recognition model; wherein the character recognition model is configured to output a preset placeholder at the position of any character in the numerical recognition result when the confidence level of that character is lower than a preset confidence threshold.

3. The numerical recognition method according to claim 2, characterized in that the character recognition model includes a visual feature extraction module and a connectionist temporal classification (CTC) module, the visual feature extraction module being constructed based on a residual network; and the step of inputting the region to be recognized into the character recognition model and obtaining the numerical recognition result output by the character recognition model includes: inputting the region to be recognized into the visual feature extraction module to obtain the visual feature vector output by the visual feature extraction module; and inputting the visual feature vector into the CTC module to obtain the numerical recognition result output by the CTC module.

4. The numerical recognition method according to claim 2, characterized in that the character recognition model is trained based on the following steps: pre-training an initial model based on simulated digit sequence images to obtain a basic weight model; and fine-tuning the basic weight model based on the simulated digit sequence images and real digit sequence images to obtain the character recognition model.

5. The numerical recognition method according to claim 4, characterized in that the simulated digit sequence images are generated based on the following steps: constructing a hybrid seed library, the hybrid seed library including simulated digit character images and real digit character images; and extracting different numbers of simulated digit character images and/or real digit character images from the hybrid seed library to synthesize the simulated digit sequence images.

6. The numerical recognition method according to claim 5, characterized in that synthesizing the simulated digit sequence images includes: introducing random scene perturbation parameters during the synthesis process; wherein the scene perturbation parameters include at least one of contrast change, rotation angle, spatial position offset, and scaling.

7. The numerical recognition method according to claim 5, characterized in that the simulated digit character images are generated based on the following steps: generating positive-sample digit character images with a simulation generation program based on an independently controlled seven-segment display stroke structure; generating negative-sample digit character images with the simulation generation program by simulating different types of display defects, and setting the sample labels of the negative-sample digit character images to the preset placeholder; and generating the simulated digit character images based on the positive-sample digit character images and the negative-sample digit character images.

8. The numerical recognition method according to any one of claims 1 to 7, characterized in that the step of performing perspective transformation correction on the screen region based on the rotation angle to obtain a first corrected image includes: performing a perspective transformation based on the vertex coordinates of the screen region to map the screen region onto a horizontal rectangle with no rotation angle; and cropping the image to be recognized based on the horizontal rectangle to obtain the first corrected image.

9. The numerical recognition method according to any one of claims 1 to 7, characterized in that the step of performing angle correction on the first corrected image based on the main direction classification result of the first corrected image to obtain the second corrected image includes: inputting the first corrected image into an angle classification model to obtain the main direction classification result output by the angle classification model; and performing angle correction on the first corrected image based on the main direction classification result to obtain the second corrected image.

10. A numerical recognition device, characterized by comprising: a rotation detection module, used to perform rotational target detection on an image to be recognized to determine a screen region in the image to be recognized, a rotation angle of the screen region, and a numerical display area in the screen region; a first-level correction module, used to perform perspective transformation correction on the screen region based on the rotation angle to obtain a first corrected image; a second-level correction module, used to perform angle correction on the first corrected image based on a main direction classification result of the first corrected image to obtain a second corrected image; a region determination module, used to map the vertex coordinates of the numerical display area to the second corrected image based on the transformation matrix corresponding to the perspective transformation correction and the transformation matrix corresponding to the angle correction, and to determine a region to be recognized in the second corrected image based on the mapped vertex coordinates; and a character recognition module, used to perform character recognition on the region to be recognized to obtain a numerical recognition result for the image to be recognized.

11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the numerical recognition method according to any one of claims 1 to 9.

12. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that, when the computer program is executed by a processor, it implements the numerical recognition method according to any one of claims 1 to 9.

13. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the numerical recognition method according to any one of claims 1 to 9.
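
Claims 2 and 3 combine a connectionist temporal classification output stage with a placeholder for low-confidence characters. The sketch below shows one way these could fit together, using greedy CTC decoding over per-frame probabilities. The function name, the `#` placeholder, the 0.8 threshold, and the use of the highest frame probability within a run as the character confidence are assumptions; the claims do not specify how confidence is computed, and the claimed model may emit the placeholder directly rather than through post-processing.

```python
from typing import List, Sequence


def ctc_decode_with_placeholder(
    frame_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    blank: int = 0,
    threshold: float = 0.8,   # hypothetical preset confidence threshold
    placeholder: str = "#",   # hypothetical preset placeholder
) -> str:
    """Greedy CTC decoding: collapse repeated labels, drop blanks, mask weak characters."""
    chars: List[List] = []    # each entry is [character, confidence]
    prev = blank
    for probs in frame_probs:
        idx = max(range(len(probs)), key=probs.__getitem__)  # best class for this frame
        if idx != blank:
            if idx != prev:
                chars.append([labels[idx], probs[idx]])       # a new character starts
            else:
                chars[-1][1] = max(chars[-1][1], probs[idx])  # same character continues
        prev = idx
    return "".join(ch if conf >= threshold else placeholder for ch, conf in chars)


# Hypothetical label set: index 0 is the CTC blank.
labels = ["", "1", "2", "."]
frames = [
    [0.05, 0.90, 0.03, 0.02],  # "1"
    [0.10, 0.85, 0.03, 0.02],  # "1" again, collapsed into the previous frame
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.30, 0.10, 0.55, 0.05],  # "2", but below the threshold
    [0.05, 0.03, 0.02, 0.90],  # "."
    [0.02, 0.95, 0.02, 0.01],  # "1"
]
result = ctc_decode_with_placeholder(frames, labels)
```

With these example frames the weakly recognized "2" is replaced by the placeholder, so a downstream check can flag the reading for review instead of accepting a guessed digit.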
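
Claims 5 to 7 describe synthesizing training data from seven-segment glyphs, including defective glyphs labeled with the placeholder and random scene perturbations. The following sketch covers only the simulated part of the seed library and represents each glyph as a set of lit segments instead of a rendered image. The segment naming (a to g), the single defect type (one unlit segment), the `#` placeholder, the defect rate, and all parameter ranges are assumptions chosen for illustration.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

PLACEHOLDER = "#"  # hypothetical preset placeholder label
SEGMENTS: Dict[str, str] = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}
VALID = {frozenset(s) for s in SEGMENTS.values()}
Glyph = Tuple[FrozenSet[str], str]  # (lit segments, sample label)


def positive_sample(digit: str) -> Glyph:
    """A clean glyph labeled with its digit."""
    return frozenset(SEGMENTS[digit]), digit


def negative_sample(digit: str, rng: random.Random) -> Glyph:
    """A glyph with one dead segment, labeled with the placeholder.

    Segments whose removal yields another valid digit are skipped, so the
    defective glyph never collides with a positive sample (an assumed rule).
    """
    lit = frozenset(SEGMENTS[digit])
    candidates = [s for s in sorted(lit) if lit - {s} not in VALID]
    return lit - {rng.choice(candidates)}, PLACEHOLDER


def render(pattern: FrozenSet[str]) -> List[str]:
    """Render a glyph as three rows of ASCII art for quick inspection."""
    on = lambda seg, ch: ch if seg in pattern else " "
    return [
        " " + on("a", "_") + " ",
        on("f", "|") + on("g", "_") + on("b", "|"),
        on("e", "|") + on("d", "_") + on("c", "|"),
    ]


def sample_perturbation(rng: random.Random) -> Dict[str, object]:
    """Random scene perturbation parameters; the ranges are illustrative only."""
    return {
        "contrast": rng.uniform(0.6, 1.4),
        "rotation_deg": rng.uniform(-5.0, 5.0),
        "offset_px": (rng.randint(-3, 3), rng.randint(-3, 3)),
        "scale": rng.uniform(0.9, 1.1),
    }


def synthesize_sequence(rng: random.Random, max_len: int = 4, defect_rate: float = 0.2):
    """Draw a variable number of glyphs and one set of perturbation parameters."""
    glyphs = []
    for _ in range(rng.randint(1, max_len)):
        digit = rng.choice(sorted(SEGMENTS))
        make_defect = rng.random() < defect_rate
        glyphs.append(negative_sample(digit, rng) if make_defect else positive_sample(digit))
    label = "".join(lbl for _, lbl in glyphs)
    return glyphs, label, sample_perturbation(rng)


rng = random.Random(0)
glyphs, label, perturbation = synthesize_sequence(rng)
```

Calling `render(positive_sample("8")[0])` shows the full glyph, which makes it easy to eyeball what a given defect removes. A fuller version would also draw real digit character images from the hybrid seed library and apply the sampled perturbation when compositing the sequence image.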