Text recognition method and electronic device

By constructing a depth map and using phase difference calculations, the method addresses the low text recognition accuracy on documents with three-dimensional non-planar deformation in OCR scenarios, so that text distortion can be corrected before recognition.

CN122244882A · Pending · Publication Date: 2026-06-19 · LENOVO (BEIJING) LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
LENOVO (BEIJING) LTD
Filing Date
2026-03-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In optical character recognition (OCR) scenarios, existing technologies have low text recognition accuracy, especially when dealing with three-dimensional non-planar deformed paper documents, where they cannot accurately correct text distortion, resulting in poor recognition performance.

Method used

By constructing a depth map based on an image acquisition device, spatial distance information of the surface of the document to be identified is obtained, the first target parameter is determined, and the text to be identified in the second image is identified using this parameter. By combining the depth map and phase difference calculation, text distortion is accurately located and corrected.

Benefits of technology

It improves the accuracy and efficiency of text recognition and adapts to paper documents with three-dimensional non-planar deformation.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a text recognition method and an electronic device. The method includes: acquiring an image of a document to be recognized using an image acquisition device to obtain a first image; constructing a depth map corresponding to the document to be recognized based on the first image; the depth map includes the spatial distance between each sampling point on the surface of the document to be recognized and the image acquisition device; determining a first target parameter of the document to be recognized based on the depth map; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be recognized and the image acquisition device; and recognizing text to be recognized in a second image based at least on the first target parameter, the second image including the text to be recognized acquired from the document to be recognized using the image acquisition device.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a text recognition method and an electronic device.

Background Technology

[0002] In optical character recognition (OCR) scenarios, improving the accuracy of document text recognition has become an urgent problem to be solved.

Summary of the Invention

[0003] The technical solution provided in this application is as follows:

[0004] The first aspect of this application provides a text recognition method, including:

[0005] The image acquisition device acquires an image of the document to be recognized, thus obtaining a first image.

[0006] Based on the first image, a depth map corresponding to the document to be identified is constructed; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device;

[0007] Based on the depth map, a first target parameter of the document to be identified is determined; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0008] Based at least on the first target parameter, the text to be identified in the second image is identified, and the second image contains the text to be identified obtained from the document to be identified by the image acquisition device.

[0009] In one possible implementation, there are two image acquisition devices; the first image includes multiple first image pairs, each first image pair including the images synchronously acquired by the two image acquisition devices after a light signal with a target encoding format and a target phase is projected onto the surface of the document to be identified; different first image pairs correspond to different target phases;

[0010] Before constructing the depth map corresponding to the document to be identified, the method further includes:

[0011] Based on the plurality of first image pairs, determine the first phase value and the second phase value of each sampling point on the surface of the document to be identified in the coordinate system of different image acquisition devices;

[0012] The mean value of the first phase value corresponding to all sampling points on the surface of the document to be identified is determined as the first reference phase value, and the mean value of the second phase value corresponding to all sampling points on the surface of the document to be identified is determined as the second reference phase value.

[0013] If a target sampling point exists among the sampling points, a depth map corresponding to the document to be identified is constructed; a target sampling point is a sampling point for which the difference between its first phase value and the first reference phase value is greater than a preset threshold, and/or the difference between its second phase value and the second reference phase value is greater than the preset threshold.

[0014] In one possible implementation, there are two image acquisition devices; the first image includes multiple first image pairs, each first image pair including the images synchronously acquired by the two image acquisition devices after a light signal with a target encoding format and a target phase is projected onto the surface of the document to be identified; different first image pairs correspond to different target phases;

[0015] The step of constructing a depth map corresponding to the document to be identified based on the first image includes:

[0016] Based on the plurality of first image pairs, determine the first phase value and the second phase value of each sampling point on the surface of the document to be identified in the coordinate system of different image acquisition devices;

[0017] Based on the phase difference between the first phase value and the second phase value and the wavelength of the optical signal, the distance between the sampling point and the two image acquisition devices is determined;

[0018] A depth map corresponding to the document to be identified is constructed based on the distance between each sampling point and the two image acquisition devices.

[0019] In one possible implementation, determining the first target parameter of the document to be identified based on the depth map includes:

[0020] The mean value of the first phase value corresponding to all sampling points on the surface of the document to be identified is determined as the first reference phase value, and the mean value of the second phase value corresponding to all sampling points on the surface of the document to be identified is determined as the second reference phase value.

[0021] Based on the reference phase difference between the first reference phase value and the second reference phase value and the wavelength of the optical signal, the reference distance between the document to be identified and the two image acquisition devices is determined;

[0022] The difference between the reference distance and the distance from each sampling point in the depth map to the two image acquisition devices is determined as the first target parameter.

[0023] In one possible implementation, the recognition of the text to be recognized in the second image, based at least on the first target parameter, includes:

[0024] Select a target region from the first image whose first target parameter satisfies a set distance threshold.

[0025] Based on the first target parameters corresponding to each pixel in the target region, the distortion parameters of the pixel are determined; the distortion parameters characterize the degree of distortion of the pixel.

[0026] Based on the distortion parameters, the pixels are adjusted to obtain the adjusted second image;

[0027] The text to be identified in the adjusted second image is then identified.

[0028] In one possible implementation, the text recognition method further includes:

[0029] Based on the depth map, a second target parameter is determined; the second target parameter represents the angle between the normal vector of the plane of the document to be identified and the vertical direction of the image acquisition device.

[0030] Based on the depth map, a third target parameter is determined; the third target parameter represents the horizontal rotation angle of the document to be identified relative to the vertical direction of the image acquisition device.

[0031] Identifying the text to be identified in the second image based at least on the first target parameter includes:

[0032] Based on at least one of the second target parameter and the third target parameter, as well as the first target parameter, the text to be identified in the second image is identified.

[0033] In one possible implementation, the text recognition method further includes:

[0034] Based on the depth map, a second target parameter is determined; the second target parameter represents the angle between the normal vector of the plane of the document to be identified and the vertical direction of the image acquisition device.

[0035] Based on at least one of the first target parameter and the second target parameter, the image acquisition device is switched from a first working state to a second working state; the second working state corresponds to the three-dimensional pose of the document to be identified.

[0036] The second image is acquired based on the image acquisition device in the second working state.

[0037] In one possible implementation, switching the image acquisition device from the first working state to the second working state based on at least one of the first target parameter and the second target parameter includes at least one of the following:

[0038] Based on the first target parameter, the raised areas on the surface of the document to be identified are identified, and the supplementary light intensity of the image acquisition device corresponding to the raised areas is enhanced;

[0039] Based on the second target parameter, the pitch angle of the image acquisition device is adjusted so that the shooting direction of the image acquisition device is facing the plane of the document to be identified.

[0040] In one possible implementation, the optical signal is generated in the following manner:

[0041] The image acquisition device acquires the document to be identified to obtain a third image;

[0042] Based on the third image, identify the type of the document to be identified;

[0043] Based on the type of the document to be identified, an optical signal with a target encoding format is generated; the target encoding format matches the type of the document to be identified.

[0044] Another aspect of this application provides an electronic device, comprising:

[0045] An image acquisition device is used to acquire images of a document to be recognized, obtaining a first image and a second image; the second image contains the text to be recognized.

[0046] A processor, configured to:

[0047] Based on the first image, a depth map corresponding to the document to be identified is constructed; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device;

[0048] Based on the depth map, a first target parameter of the document to be identified is determined; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0049] Based at least on the first target parameter, the text to be identified in the second image is identified.

Description of the Drawings

[0050] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and that components and elements are not necessarily drawn to scale.

[0051] Figure 1 is a flowchart illustrating a text recognition method provided in Embodiment 1 of this application;

[0052] Figure 2 is a flowchart illustrating a text recognition method provided in Embodiment 2 of this application;

[0053] Figure 3 is a flowchart illustrating a text recognition method provided in Embodiment 3 of this application;

[0054] Figure 4 is a three-dimensional pose diagram of a document to be identified provided in this application;

[0055] Figure 5 is a flowchart illustrating a text recognition method provided in Embodiment 4 of this application;

[0056] Figure 6 is a flowchart illustrating a text recognition method provided in Embodiment 5 of this application;

[0057] Figure 7 is a flowchart illustrating a text recognition method provided in Embodiment 6 of this application;

[0058] Figure 8 is a schematic diagram of a light pattern provided in this application;

[0059] Figure 9 is a schematic diagram of another light pattern provided in this application;

[0060] Figure 10 is a schematic diagram of another light pattern provided in this application;

[0061] Figure 11 is a schematic diagram of the hardware coordination and execution process of a text recognition method provided in this application.

Detailed Implementation

[0062] The embodiments of this application are described below with reference to the accompanying drawings. The terminology used in the implementation section of this application is for explaining specific embodiments only and is not intended to limit the scope of this application.

[0063] The embodiments of this application will now be described with reference to the accompanying drawings. Those skilled in the art will recognize that, with technological advancements and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

[0064] The terms "first," "second," etc., used in this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms can be used interchangeably where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of units is not necessarily limited to those units, but may include other units not explicitly listed or inherent to those processes, methods, products, or apparatuses.

[0065] Refer to Figure 1, a flowchart illustrating the text recognition method provided in Embodiment 1 of this application. As shown in Figure 1, the method may include, but is not limited to, the following steps:

[0066] Step S101: The image acquisition device acquires an image of the document to be recognized to obtain a first image.

[0067] In this embodiment, the image acquisition device is a hardware device capable of optical imaging. This device can be a standalone dedicated imaging device, such as a high-definition camera, a document camera, an industrial imaging camera, or a scanner, or it can be a built-in imaging module integrated into an electronic device, such as the camera or imaging component of a computer, tablet, smartphone, or portable scanning terminal. Any of these can be used, provided it clearly captures the visual information of the physical document surface.

[0068] In practical applications, image acquisition devices can be flexibly selected according to specific scenarios to ensure clear capture of visual information from the physical document surface. For example, documents to be identified can include, but are not limited to, various paper documents placed on a desktop, such as contracts, files, and materials used in daily office work. For such documents, image acquisition can be performed either through a dedicated imaging device deployed on the desktop or by using electronic devices with built-in imaging modules (such as computer webcams, tablet webcams, etc.).

[0069] Alternatively, the documents to be identified can include various paper documents that appear in on-site environments outside of a fixed office desk, such as forms for outdoor business transactions, pages from books browsed offline, leaflets distributed on-site, and receipts from stores. For such documents, the built-in imaging modules of portable electronic devices such as mobile phones and tablets can be used for real-time optical imaging recognition, meeting flexible recognition needs for outdoor offices and on-site business processing.

[0070] Step S102: Based on the first image, construct a depth map corresponding to the document to be identified; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0071] In practical applications, documents are highly susceptible to three-dimensional non-planar deformation during use, browsing, storage, and transportation. In terms of material properties, a document that is soft, flexible, or easily deformed by external forces may deform during normal use. In terms of the external environment and human handling, repeated folding and squeezing, changes in temperature and humidity, and edge warping due to prolonged storage can all cause different types of three-dimensional non-planar deformation, such as wrinkles, bends, bulges, and depressions.

[0072] The first image, as an optical imaging result of the document to be identified, can intuitively present the visual characteristics of deformation on the document surface through visual differences in color, grayscale, and light and shadow, but it has obvious limitations. It cannot accurately quantify the three-dimensional deformation of the document surface. Specifically, it cannot accurately reflect the specific magnitude of deformation at different locations on the document surface, nor can it accurately determine the specific location of the deformation in three-dimensional space.

[0073] When performing text recognition on the document to be recognized, these three-dimensional deformations may cause serious text distortion. If only the visual information provided by the first image is relied upon, there is no quantitative data on the three-dimensional deformation to support accurate correction of the distorted text, and high-precision text recognition on documents with non-planar deformations is therefore difficult to achieve.

[0074] Therefore, constructing a depth map based on the first image has significant practical implications. By constructing a depth map, the three-dimensional physical morphology of a document surface can be transformed into quantifiable spatial data, moving from a purely visual perception level to a precise data quantification level. This lays a data foundation for subsequent work on the accurate correction and recognition of distorted text.

[0075] The sampling points on the surface of the document to be identified can include discrete physical measurement points selected at a certain density on the physical surface of the document to be identified in order to quantitatively characterize the three-dimensional shape of the document. These points are the basic units for collecting and calculating the spatial information of the document surface. There can be multiple correspondences between the sampling points and the pixel coordinates of the first image: either one sampling point corresponds to a single pixel position in the first image, or one sampling point corresponds to a local region composed of multiple adjacent pixels in the first image.
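
The two correspondences described above can be illustrated with a minimal sketch. It assumes sampling points lie on a regular grid and that each point covers a square block of adjacent pixels; the function name `sampling_point_pixels` and the `block` parameter are hypothetical and do not come from the application.

```python
def sampling_point_pixels(row, col, block=1):
    """Return the (y, x) pixel coordinates covered by sampling point (row, col).

    block == 1 is the one-point-to-one-pixel case.
    block > 1 is the one-point-to-local-region case: a block x block patch
    of adjacent pixels (an assumed layout, chosen only for illustration).
    """
    top, left = row * block, col * block
    return [(top + dy, left + dx) for dy in range(block) for dx in range(block)]


# One sampling point mapped to a single pixel.
single = sampling_point_pixels(1, 2)          # [(1, 2)]

# One sampling point mapped to a 2 x 2 region of adjacent pixels.
region = sampling_point_pixels(0, 1, block=2)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```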

[0076] The spatial distance between each sampling point and the image acquisition device can directly reflect the height position of the corresponding sampling point in three-dimensional space. For example, when the image acquisition device faces the document surface, the sampling points in raised areas of the document surface are closer to the image acquisition device, while the sampling points in recessed areas and fold valleys are farther from it.

[0077] Step S103: Based on the depth map, determine the first target parameter of the document to be identified; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0078] While depth maps provide spatial distance information between sampling points on the document surface and the image acquisition device, this distance data is relatively independent and scattered. Directly using this raw distance data makes it difficult to intuitively understand the overall deformation of the document surface, and it also fails to provide targeted guidance for subsequent text recognition and distortion correction. For example, it is impossible to quickly determine from this raw distance data which areas of the document have bulges or depressions, and how severe the deformation is.

[0079] To address the challenge of directly utilizing raw depth map data, this embodiment first collects the spatial distances of all valid sampling points in the depth map and selects from them a baseline distance for the entire document (for example, the mean, the median, or the distance of flat areas). This baseline distance represents the ideal reference distance when the document has no 3D deformation, providing a unified reference standard for subsequent analysis of the deformation of each sampling point.

[0080] By traversing each valid sampling point within the depth map, the difference between the actual spatial distance of each sampling point and the reference distance is calculated to obtain the distance difference for that sampling point. The sign of the distance difference can represent the convex / concave state of the sampling point relative to the reference surface, and the absolute value of the distance difference represents the degree of deformation at that location.

[0081] By using the distance difference as the first target parameter, the deformation of each sampling point on the document surface relative to the ideal reference state can be intuitively reflected.
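
The baseline selection and per-point differencing described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the depth map is a list of rows, that invalid sampling points are stored as `None`, and that the median is used as the baseline (the text equally allows the mean or a flat-area distance). The function name is hypothetical.

```python
from statistics import median


def first_target_parameter(depth_map):
    """Return (baseline, differences) for a depth map given as a list of rows.

    Each entry is the distance from a sampling point to the camera, or None
    for an invalid sampling point (an assumed convention).
    """
    valid = [d for row in depth_map for d in row if d is not None]
    baseline = median(valid)  # assumed choice; the mean is another option
    diffs = [
        [None if d is None else d - baseline for d in row]
        for row in depth_map
    ]
    return baseline, diffs


depth = [
    [300.0, 300.0, 300.0],
    [300.0, 296.0, 300.0],  # one point 4 units closer to the camera
    [300.0, 300.0, None],   # one invalid sampling point
]
baseline, diffs = first_target_parameter(depth)
# baseline is 300.0 and diffs[1][1] is -4.0
```

With this sign convention a negative value means the point is closer to the camera than the baseline and a positive value means it is farther away; the absolute value gives the degree of deformation.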

[0082] Step S104: Based at least on the first target parameter, identify the text to be identified in the second image, wherein the second image contains the text to be identified acquired from the document to be identified by the image acquisition device.

[0083] The second image can be the result of imaging the same document to be identified from the same perspective in the same scene as the first image. However, the two images differ in their functional focus. The main purpose of the first image is to provide visual information about the complete shape of the document for building a depth map, helping to understand the overall morphological features of the document surface, such as undulations and curvatures. The second image, on the other hand, focuses on preserving information that is crucial for text recognition, such as the strokes, outlines, and layout of the text.

[0084] The second image can be acquired simultaneously with the first image during the same acquisition and imaging process, or it can be acquired separately as a high-definition text image.

[0085] The three-dimensional non-planar deformation of the document to be recognized can cause text distortion, such as stroke bending, stretching, and compression, which seriously affects the accuracy of text recognition. If only the visual information of the text provided by the second image is relied upon, the direction and degree of text distortion cannot be accurately determined due to the lack of quantitative data on the three-dimensional deformation, making it difficult to accurately correct the distorted text.

[0086] The first target parameter can accurately reflect the deformation of each sampling point on the document surface, and can provide the necessary data support for the correction of distorted text.

[0087] In this embodiment, the text recognition strategy can also be optimized based on the first target parameter. For example, for areas with more severe deformation, a more complex and sophisticated recognition algorithm can be used to improve the accuracy of recognition when recognizing the text to be recognized in the second image; while for areas with less deformation, a relatively simple and fast recognition algorithm can be used to improve the recognition efficiency.

[0088] In addition, the first target parameter can also be used to determine the priority of text regions. During the recognition of the text to be recognized, regions with smaller deformation and lower recognition difficulty are processed first to ensure the smoothness and accuracy of recognition.
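
One way to picture the strategy selection and priority ordering in the two paragraphs above is the sketch below. The region names, the `severe_threshold` value, and the two tier labels are hypothetical; the application does not specify how regions are scored, so the largest absolute distance difference in a region is used here as an assumed severity measure.

```python
def plan_recognition(regions, severe_threshold):
    """Order regions by deformation and pick a recognizer tier for each.

    `regions` maps a region name to the first-target-parameter values
    (signed distance differences) of the sampling points inside it.
    Returns (name, severity, tier) tuples, least deformed first.
    """
    plan = []
    for name, diffs in regions.items():
        severity = max(abs(d) for d in diffs)  # assumed severity measure
        tier = "fine" if severity > severe_threshold else "fast"
        plan.append((name, severity, tier))
    plan.sort(key=lambda item: item[1])  # smaller deformation is processed first
    return plan


regions = {
    "title": [0.1, -0.2],
    "fold": [3.0, -5.5],
    "body": [0.5, 1.0],
}
plan = plan_recognition(regions, severe_threshold=2.0)
# [("title", 0.2, "fast"), ("body", 1.0, "fast"), ("fold", 5.5, "fine")]
```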

[0089] In optical character recognition scenarios, paper documents to be recognized are easily affected by material properties, human handling, and environmental factors, resulting in three-dimensional non-planar deformations such as wrinkles, bends, bulges, and depressions. These deformations directly cause perspective distortion, stroke stretching and compression, and contour deformation of the text on the document surface. A conventional text acquisition image (the second image) presents only the two-dimensional visual appearance of the text and carries no quantitative information on the three-dimensional deformation of the document. Relying on it alone, the specific location, direction, and severity of text distortion cannot be accurately determined and targeted correction cannot be carried out, which reduces text recognition accuracy and is the core problem limiting the effectiveness of text recognition.

[0090] To address this issue, this embodiment first acquires a first image using an image acquisition device. Based on the first image, a depth map is constructed containing the spatial distance between sampling points on the document surface and the acquisition device, transforming abstract three-dimensional deformation into quantifiable distance data. Then, a first target parameter is extracted based on the depth map. This parameter characterizes the location distribution and magnitude of document deformation through the difference in distance between sampling points, transforming scattered raw distance data into deformation features that can be directly used for analysis and providing a quantitative basis for text distortion correction. Based on this parameter, text distortion areas can be located and corrected at the pixel level to restore the true form of the text. The recognition algorithm can also be adapted to the degree of deformation, and the text recognition priority can be defined, balancing recognition accuracy and processing efficiency. Together, these measures counteract the interference of three-dimensional non-planar deformation with text recognition and improve the text recognition accuracy for various documents.

[0091] As another optional embodiment of this application, refer to Figure 2, a flowchart illustrating the text recognition method provided in Embodiment 2 of this application. As shown in Figure 2, the method may include, but is not limited to:

[0092] Step S201: Acquire an image of the document to be identified using the image acquisition devices to obtain a first image; there are two image acquisition devices; the first image includes multiple first image pairs, each first image pair including the images synchronously acquired by the two image acquisition devices after a light signal with a target encoding format and a target phase is projected onto the surface of the document to be identified; different first image pairs correspond to different target phases.

[0093] In this embodiment, two image acquisition devices can be deployed at different points on the left and right sides of the document to be identified to form a binocular stereo vision perspective, ensuring the synchronization of imaging time and perspective, and ensuring the accuracy of subsequent phase calculation data.

[0094] In this embodiment, a single structured light projection module can be configured and deployed between the two cameras, facing the document. The module has a built-in phase control chip, which can project light signals of the same target encoding format but different target phases (such as sinusoidal stripe light and grating light) in a time-division manner. Only one target phase light signal is projected at a time, completely covering the surface of the document to be identified.

[0095] Each time the structured light projection module projects a light signal of a specific target phase onto the surface of the document to be identified, two image acquisition devices can simultaneously acquire one frame of image. The two synchronized frames form a first image pair. By switching the projection of light signals of different target phases and repeating the acquisition operation, multiple first image pairs corresponding to different phases can be obtained. The imaging scene, document position, and acquisition perspective of all image pairs can remain unchanged, ensuring the consistency of subsequent data processing.

[0096] Step S201 is an implementation of step S101 in Embodiment 1.

[0097] Step S202: Based on the plurality of first image pairs, determine the first phase value and the second phase value of each sampling point on the surface of the document to be identified in the coordinate system of different image acquisition devices.

[0098] In this embodiment, the two image acquisition devices can be regarded as a first image acquisition device and a second image acquisition device. All first images acquired by the first image acquisition device (which can be regarded as a first group of first images) and all first images acquired by the second image acquisition device (which can be regarded as a second group of first images) can be separated from multiple first image pairs. Preprocessing operations are then performed on the two groups of first images: each frame of the first image is sequentially subjected to grayscale processing to remove color interference, and Gaussian filtering for noise reduction to eliminate ambient stray light and camera imaging noise, avoiding invalid interference from affecting the phase extraction accuracy and ensuring the effectiveness and reliability of subsequent phase data.
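
The grayscale and Gaussian filtering steps described above can be sketched in plain Python as follows. The application does not give the weights or kernel; this sketch assumes ITU-R BT.601 luminance weights and a 3 x 3 binomial approximation of a Gaussian kernel, with images stored as lists of rows. The function names are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]


# 3x3 binomial approximation of a Gaussian kernel; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur3(gray):
    """Smooth a grayscale image with KERNEL, replicating pixels at the border."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the image edge
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16.0
    return out


# A single bright pixel (for example, stray-light noise) is spread out and attenuated.
noisy = [[0.0, 0.0, 0.0], [0.0, 16.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = gaussian_blur3(noisy)
# smoothed[1][1] is 4.0, smoothed[0][1] is 2.0, smoothed[0][0] is 1.0
```

In practice an image library would typically be used for both steps; the loops here only make the arithmetic explicit.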

[0099] Secondly, based on pixel gradient detection and contour segmentation algorithms, effective regions are filtered for all first images of the first image acquisition device and all first images of the second image acquisition device after preprocessing. The document imaging area and background invalid area in a single frame image are accurately distinguished, irrelevant pixel interference is eliminated, and the effective pixel areas of the corresponding document surface in each frame of the first image acquisition device and each frame of the second image acquisition device are determined.

[0100] Finally, a phase-shifting algorithm is applied to the two groups of effective pixel regions obtained from this screening. For each sampling point, the first phase value in the coordinate system of the first image acquisition device and the second phase value in the coordinate system of the second image acquisition device are calculated.
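As an illustration of the phase extraction described in [0100], the following minimal sketch applies the standard N-step phase-shifting formula to a single pixel. It assumes equally spaced phase shifts of 2πk/N and a sinusoidal fringe model I_k = A + B·cos(φ + 2πk/N); the application does not state which phase-shifting variant is used, and the function name `wrapped_phase` is hypothetical.

```python
import math


def wrapped_phase(intensities):
    """Return the wrapped phase in [0, 2*pi) for one pixel.

    intensities[k] is the gray value recorded under the k-th projected
    pattern, whose phase shift is ASSUMED to be 2*pi*k/N (equal steps).
    """
    n = len(intensities)
    if n < 3:
        raise ValueError("at least three phase steps are needed")
    # Sine- and cosine-weighted sums; the constant background term cancels
    # because the shifts are spread evenly over one full period.
    num = sum(v * math.sin(2 * math.pi * k / n) for k, v in enumerate(intensities))
    den = sum(v * math.cos(2 * math.pi * k / n) for k, v in enumerate(intensities))
    return math.atan2(-num, den) % (2 * math.pi)


# Four frames of one pixel, simulated with a known phase of 1.0 rad.
frames = [100 + 50 * math.cos(1.0 + 2 * math.pi * k / 4) for k in range(4)]
print(wrapped_phase(frames))
```

Running the same function on the frames from each image acquisition device would give the first and second phase values of a sampling point, respectively.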

[0101] When the document to be identified is in a flat state without three-dimensional deformation, the phase distribution of the projected light signal is uniform and regular. When the document to be identified has three-dimensional non-planar deformations such as wrinkles, bends, protrusions, and depressions, the light signal deforms synchronously with the shape of the document, and its phase distribution becomes disordered and shifted. The phase value is the quantitative value that records the degree of phase shift. It is correlated with the spatial position of the sampling point on the surface of the document to be identified and can indirectly reflect the spatial distribution characteristics of the sampling point.

[0102] Step S203: Determine the average of the first phase values corresponding to all sampling points on the surface of the document to be identified as the first reference phase value, and the average of the second phase values corresponding to all sampling points on the surface of the document to be identified as the second reference phase value.

[0103] In this embodiment, the first phase value of all valid sampling points on the surface of the document to be identified can be traversed and the arithmetic mean can be calculated, which is the first reference phase value; similarly, the second phase value of all valid sampling points can be traversed and the arithmetic mean can be calculated, which is the second reference phase value.

[0104] Step S204: If a target sampling point exists among the sampling points, construct a depth map corresponding to the document to be identified. A target sampling point is a sampling point for which the difference between its first phase value and the first reference phase value is greater than a preset threshold, and/or the difference between its second phase value and the second reference phase value is greater than the preset threshold.

[0105] In this embodiment, the reference phase value corresponds to the ideal phase state when the document is flat and without three-dimensional deformation. The difference between the phase value and the reference phase value essentially quantifies the phase shift amplitude of the optical signal caused by protrusions / depressions / wrinkles on the document surface. When a three-dimensional deformation such as a protrusion, depression, or wrinkle occurs at a certain location in the document, the phase of the optical signal at the sampling point in that area will shift significantly, and the difference between the phase value and the reference phase value will increase accordingly. If the document surface is flat and without deformation, the phase value at the sampling point is basically the same as the reference phase value, and the difference approaches zero. Therefore, by comparing the difference with a preset threshold, it can be directly determined whether there is a three-dimensional non-planar deformation at the corresponding location.

[0106] For a single sampling point, the difference between its first phase value and the first reference phase value and the difference between its second phase value and the second reference phase value are calculated. If the absolute value of at least one difference is greater than the preset threshold (which can be configured according to the recognition accuracy requirements), the sampling point is determined to be a target sampling point, indicating a deformation such as a protrusion, depression, or wrinkle at that location. If neither difference exceeds the preset threshold, the sampling point is determined to have no such deformation.
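Steps S203 and S204 can be sketched as follows. This is a minimal illustration under the assumption that the phase values of the valid sampling points are available as two flat lists of equal length; the function names and the example threshold are hypothetical.

```python
def reference_phase(phases):
    """Arithmetic mean of the phase values of all valid sampling points."""
    return sum(phases) / len(phases)


def find_target_points(first_phases, second_phases, threshold):
    """Return the indices of sampling points whose first or second phase
    value deviates from its reference phase value by more than threshold."""
    ref1 = reference_phase(first_phases)
    ref2 = reference_phase(second_phases)
    return [
        idx
        for idx, (p1, p2) in enumerate(zip(first_phases, second_phases))
        if abs(p1 - ref1) > threshold or abs(p2 - ref2) > threshold
    ]


def needs_depth_map(first_phases, second_phases, threshold):
    """A depth map is constructed only if at least one target point exists."""
    return bool(find_target_points(first_phases, second_phases, threshold))


first = [1.0, 1.0, 1.0, 1.0, 2.0]   # last point deviates from the mean
second = [0.5, 0.5, 0.5, 0.5, 0.5]
print(find_target_points(first, second, threshold=0.5))
print(needs_depth_map(first, second, threshold=0.5))
```

For a flat document every deviation stays below the threshold, the list of target points is empty, and the depth map construction is skipped.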

[0107] The depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0108] Step S205: Based on the depth map, determine the first target parameter of the document to be identified; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device;

[0109] Step S206: Based at least on the first target parameters, identify the text to be identified in the second image, wherein the second image contains the text to be identified obtained from the document to be identified by the image acquisition device.

[0110] For a detailed description of steps S205-S206, please refer to the relevant description of steps S103-S104 in Embodiment 1, which will not be repeated here.

[0111] In this embodiment, a depth map is constructed only when a target sampling point exists. Compared with constructing depth maps for all documents, this avoids the redundant acquisition, calculation, and parsing of a depth map for flat (i.e., undeformed) documents, which reduces wasted computation and improves the overall text recognition speed. Thus, while depth maps are still used to quantify three-dimensional deformation and improve recognition accuracy, the processing flow balances recognition accuracy against execution efficiency. This approach is suitable for optical character recognition scenarios with large-scale document processing and high real-time requirements.

[0112] As another optional embodiment of this application, refer to Figure 3, which is a flowchart of a text recognition method provided in Embodiment 3 of this application. This embodiment mainly describes an implementation of steps S101 and S102 in Embodiment 1. As shown in Figure 3, step S101 may include, but is not limited to:

[0113] Step S1011: Acquire an image of the document to be identified with the image acquisition device to obtain a first image. There are two image acquisition devices. The first image includes multiple first image pairs; each first image pair includes the images synchronously acquired by the two image acquisition devices after a light signal with a target encoding format and a target phase is projected onto the surface of the document to be identified. Different first image pairs correspond to different target phases.

[0114] For a detailed description of step S1011, please refer to the relevant description of step S201 in Embodiment 2, which will not be repeated here.

[0115] Step S102 may include, but is not limited to:

[0116] Step S1021: Based on the plurality of first image pairs, determine the first phase value and the second phase value of each sampling point on the surface of the document to be identified in the coordinate system of different image acquisition devices.

[0117] For a detailed description of step S1021, please refer to the relevant description of step S202 in Embodiment 2, which will not be repeated here.

[0118] Step S1022: Based on the phase difference between the first phase value and the second phase value and the wavelength of the optical signal, determine the distance between the sampling point and the two image acquisition devices.

[0119] Owing to the structured light imaging principle, the first and second phase values determined by the phase-shifting algorithm are both confined to the periodic range 0 to 2π. They are wrapped, periodically repeating data and cannot directly reflect the true three-dimensional shape of the document. The true phase distribution can therefore be restored by unwrapping.

[0120] The process of unwrapping the first phase value and the second phase value may include, but is not limited to:

[0121] Step S11: Calculate the original phase difference as the difference between the first phase value and the second phase value corresponding to the same sampling point within the flat reference area of the document to be identified.

[0122] The original phase difference is free of interference from surface deformation and reflects only the inherent systematic deviation arising from the manufacture and installation of the two cameras. Using it as the calibration reference, offset compensation is applied to the first phase value and the second phase value of all sampling points of the document to be identified, which eliminates the systematic error caused by the camera hardware and completes the global phase calibration.

[0123] Step S12: Using the phase change gradient (i.e., the local slope of the phase) between adjacent sampling points, and relying on the fact that the document surface is physically continuous and its phase therefore changes continuously, unwrap the calibrated first phase value and second phase value to obtain the corrected first phase value and the corrected second phase value.

[0124] Step S12 can avoid the calculation errors of conventional unwrapping algorithms in phase transition regions and removes the 0 to 2π period limit.
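To make the idea of unwrapping concrete, the sketch below shows the classical one-dimensional (Itoh-style) approach along a row of adjacent sampling points. It is a stand-in rather than the gradient-based procedure of step S12, whose details are not given here; it assumes that the true phase changes by less than π between neighbouring sampling points, and the name `unwrap_phase` is hypothetical.

```python
import math


def unwrap_phase(wrapped):
    """Unwrap a 1-D sequence of phase values given in [0, 2*pi).

    ASSUMPTION: the true phase changes by less than pi between adjacent
    sampling points, so any larger jump is a 2*pi wrap-around.
    """
    if not wrapped:
        return []
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        step = cur - prev
        # Fold the step into [-pi, pi) to undo a possible wrap-around.
        step = (step + math.pi) % (2 * math.pi) - math.pi
        out.append(out[-1] + step)
    return out


# A steadily rising phase that wraps once past 2*pi.
truth = [0.5 * k for k in range(20)]
wrapped = [t % (2 * math.pi) for t in truth]
print(unwrap_phase(wrapped)[-1])
```

The continuity assumption is the same physical argument the step relies on: a paper surface has no abrupt height jumps, so large phase jumps between neighbours are attributed to wrapping.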

[0125] Based on the principle of optical phase measurement profilometry, and utilizing the wave properties of light, combined with the physical mapping relationship between the phase change and the distance from the sampling point to the image acquisition device, the distance between each sampling point and the two image acquisition devices is determined by the following formula:

[0126]

[0127] The quantities in the formula are: the distance between the sampling point and the two image acquisition devices; the phase difference between the first phase value and the second phase value corresponding to the sampling point; the wavelength of the light signal; and the constant π.

[0128] Step S1023: Based on the distance between each sampling point and the two image acquisition devices, construct the depth map corresponding to the document to be identified.

[0129] In this embodiment, the distance between each sampling point and the two image acquisition devices can be used as the depth value at the corresponding coordinate of the first image, following the actual position of the sampling point on the document surface and the pixel row and column order. In this way each depth value in the depth map corresponds to a sampling point on the surface of the document to be identified, without data misalignment or omitted sampling points, and the depth map covers the entire document to be identified.
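Steps S1022 and S1023 can be sketched as follows. The formula of the application is not reproduced in this text, so the sketch ASSUMES a simple linear relation, distance = phase difference × wavelength / (2π), which uses exactly the listed quantities; the actual constant factor may differ and is therefore exposed as the hypothetical parameter `cycle`. The function names are also hypothetical.

```python
import math


def phase_to_distance(phase_difference, wavelength, cycle=2 * math.pi):
    """Convert an unwrapped phase difference to a distance.

    ASSUMED relation: distance = phase_difference * wavelength / cycle.
    The constant factor used in the application is not known here.
    """
    return phase_difference * wavelength / cycle


def build_depth_map(phase_differences, wavelength):
    """Build a depth map with the same row/column layout as the grid of
    per-sampling-point phase differences (one depth value per point)."""
    return [
        [phase_to_distance(dphi, wavelength) for dphi in row]
        for row in phase_differences
    ]


grid = [[0.0, math.pi], [2 * math.pi, math.pi]]
print(build_depth_map(grid, wavelength=2.0))
```

Keeping the output grid in the same row and column order as the input preserves the one-to-one correspondence between depth values and sampling points described above.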

[0130] In this embodiment, after the light signal is projected onto the surface of the document to be identified, the three-dimensional deformation of the document will cause the light signal collected by the dual cameras to have a phase shift. The difference between the first phase value and the second phase value of the same sampling point in the dual camera coordinate system (i.e., the phase difference) can directly quantify the spatial position difference of the sampling point relative to the dual cameras. Moreover, the magnitude of the phase difference is strictly positively correlated with the actual distance from the sampling point to the dual cameras. Therefore, the true distance of the sampling point can be calculated by combining the inherent wavelength of the light signal, and the accurate mapping of phase features to spatial distance can be achieved without the need for redundant intermediate data.

[0131] Based on this principle, a depth map is constructed. On the one hand, this significantly simplifies the depth calculation process, eliminating cumbersome steps such as complex feature matching and error iteration correction. This reduces unnecessary computational power consumption and shortens the time required to generate the depth map, balancing the processing efficiency and real-time performance of text recognition. It is suitable for scenarios such as large-scale document processing and on-site instant recognition. On the other hand, phase difference is highly sensitive to minute deformations. Combined with the previous phase unwrapping correction operation, it can effectively avoid interference from camera system deviations, phase period folding, and ambient stray light. Compared with traditional methods, it significantly improves the accuracy of distance measurement, thereby constructing a high-quality depth map that closely matches the true three-dimensional shape of the document. This provides a reliable data source for subsequent deformation reduction and text distortion correction, alleviates the text recognition distortion problem caused by three-dimensional non-planar deformation, and enhances the recognition accuracy in deformation scenarios.

[0132] In this embodiment, the effect of constructing a depth map based on phase difference is explained with reference to Figure 4. As shown in Figure 4, after a light signal is projected onto the surface of the document to be identified, the light pattern formed on the surface undergoes corresponding three-dimensional deformation and phase shift owing to the document's true three-dimensional posture (including slight wrinkles, edge warping, local protrusions, and other deformations). By substituting the phase difference corresponding to this deformation into the relational formula to solve for the spatial distance, the generated depth map can accurately restore the three-dimensional contour reflected by the light pattern in Figure 4, converting a visible light deformation pattern into quantifiable depth data. This phase-difference-based distance calculation is simpler and yields more stable data. The generated depth map matches the actual three-dimensional posture of the document shown in Figure 4 and provides consistent morphological data for the subsequent extraction of the first target parameter and text distortion correction, thereby improving the accuracy of text recognition for documents with three-dimensional non-planar deformation.

[0133] As another optional embodiment of this application, refer to Figure 5, which is a flowchart of a text recognition method provided in Embodiment 4 of this application. This embodiment mainly describes one implementation of step S103 in Embodiment 3. As shown in Figure 5, the specific steps may include, but are not limited to, the following:

[0134] Step S1031: Determine the average of the first phase values corresponding to all sampling points on the surface of the document to be identified as the first reference phase value, and the average of the second phase values corresponding to all sampling points on the surface of the document to be identified as the second reference phase value.

[0135] In this embodiment, the arithmetic mean of the first phase values of all sampling points on the surface of the document to be identified in the coordinate system of the first image acquisition device can be calculated, and this mean value can be defined as the first reference phase value.

[0136] Since a large proportion of the document to be identified is flat and without three-dimensional deformation, the phase values of the sampling points in these areas are minimally affected by surface deformation and are close to the standard phase values of a flat document under binocular viewing. By taking the arithmetic mean of the first phase values of all sampling points, the phase shift caused by locally wrinkled, protruding, or otherwise deformed areas is smoothed out by the global mean. The mean therefore reflects the ideal phase level of the document as a whole under deformation-free conditions and limits the influence of local abnormal sampling points and single-point noise on the reference data, so that the resulting first reference phase value is stable, objective, and general.

[0137] Similarly, the arithmetic mean of the second phase values of all sampling points in the coordinate system of the second image acquisition device is calculated to obtain the second reference phase value.

[0138] Step S1032: Based on the reference phase difference between the first reference phase value and the second reference phase value and the wavelength of the optical signal, determine the reference distance between the document to be identified and the two image acquisition devices.

[0139] In this embodiment, the difference between the first reference phase value and the second reference phase value can be calculated to obtain the reference phase difference. This reference phase difference can characterize the inherent phase difference value acquired by the two image acquisition devices from two perspectives under the ideal state where the document to be identified is flat and without deformation or local shape interference.

[0140] In this embodiment, the reference distance between the document to be identified and the two image acquisition devices can be determined based on the distance relationship determined in Embodiment 3.

[0141] The reference distance is the standard distance between the document to be identified and the two image acquisition devices when the document is completely flat and without any three-dimensional deformation.

[0142] Step S1033: Determine the difference between the distance from each sampling point in the depth map to the two image acquisition devices and the reference distance, and use this distance difference as the first target parameter.

[0143] In this embodiment, the method for determining the distance between each sampling point in the depth map and the two image acquisition devices can be found in the relevant description of step S1022 in Embodiment 3, and will not be repeated here.

[0144] When the distance difference is positive, the first target parameter can characterize the bulge of the corresponding sampling point relative to the reference surface. The larger the value, the more severe the bulge.

[0145] When the distance difference is negative, the first target parameter can characterize the depression / wrinkle of the corresponding sampling point relative to the reference surface. The larger the absolute value, the more severe the depression and wrinkle.

[0146] When the distance difference approaches zero, the first target parameter can characterize the flatness and lack of deformation of the document in that region.
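The computation of the first target parameter and its interpretation in [0144] to [0146] can be sketched as follows. The order of subtraction is not stated in the text; the sketch ASSUMES reference distance minus point distance, so that a point nearer to the image acquisition devices (a bulge) yields a positive value, matching the stated sign convention. The tolerance used to decide "approaches zero" and all names are hypothetical.

```python
def first_target_parameter(depth_map, reference_distance):
    """Per-point distance difference relative to the reference distance.

    ASSUMED sign: reference_distance - depth, so that points closer to
    the cameras (bulges) are positive and farther points are negative.
    """
    return [[reference_distance - d for d in row] for row in depth_map]


def classify(distance_difference, tolerance):
    """Interpret one value of the first target parameter."""
    if abs(distance_difference) <= tolerance:
        return "flat"
    return "bulge" if distance_difference > 0 else "depression"


depth_map = [[10.0, 9.0, 11.0]]
params = first_target_parameter(depth_map, reference_distance=10.0)
print(params)
print([classify(v, tolerance=0.1) for v in params[0]])
```

With the opposite subtraction order the labels "bulge" and "depression" would simply swap, so the convention needs to be fixed once and used consistently in the later correction steps.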

[0147] In this embodiment, determining the reference distance from the reference phase difference and then determining the distance difference relative to that reference distance removes interference from non-deformation factors such as dual-camera system errors and optical path deviations. The first target parameter then reflects only the true local deformation of the document, which improves the accuracy of locating deformed areas and quantifying their amplitude. This provides a more reliable quantitative basis for the subsequent pixel-level correction of text distortion in the second image and for differentiated scheduling of recognition algorithms, further improving the text recognition accuracy for deformed documents.

[0148] As another optional embodiment of this application, refer to Figure 6, which is a flowchart of a text recognition method provided in Embodiment 5 of this application. This embodiment mainly describes one implementation of step S104 in Embodiment 1. As shown in Figure 6, the specific steps may include, but are not limited to, the following:

[0149] Step S1041: Select the target region from the first image whose first target parameters satisfy the set distance threshold.

[0150] A single sampling point on the surface of the document to be identified can correspond to a single pixel position in the first image, or a local pixel region composed of multiple adjacent pixels.

[0151] In this embodiment, all sampling points on the surface of the document to be identified can be traversed, and the first target parameter of each sampling point can be compared with the set distance threshold one by one. If the absolute value of the first target parameter of a sampling point is greater than the set distance threshold, the single pixel or local multi-pixel region corresponding to that sampling point is determined to be part of the target region.

[0152] If the absolute value of the first target parameter of the sampling point is not greater than the set distance threshold, the corresponding pixel area is determined to be a flat and deformation-free area, and no subsequent distortion correction process is required.

[0153] The distance threshold can be flexibly configured according to the actual recognition accuracy and the degree of document deformation, and is not limited in this application.
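Step S1041 amounts to thresholding the absolute value of the first target parameter. A minimal sketch, assuming one sampling point per pixel and a hypothetical function name:

```python
def select_target_region(first_target_params, distance_threshold):
    """Return a boolean mask with True where the absolute value of the
    first target parameter exceeds the distance threshold, i.e. where
    distortion correction is needed; False marks flat areas."""
    return [
        [abs(value) > distance_threshold for value in row]
        for row in first_target_params
    ]


params = [[0.0, 0.6, -0.7],
          [0.2, -0.1, 0.5]]
print(select_target_region(params, distance_threshold=0.5))
```

A value exactly equal to the threshold is treated as flat, in line with the "not greater than" condition in [0152].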

[0154] Step S1042: Based on the first target parameters corresponding to each pixel in the target region, determine the distortion parameters of the pixel; the distortion parameters characterize the degree of distortion of the pixel.

[0155] In this embodiment, the distortion parameters of each pixel within the target area can be determined based on the positive or negative attribute and absolute value of the first target parameter corresponding to each pixel.

[0156] When the first target parameter is positive, it indicates that the pixel is in a raised area, and the distortion parameter of the pixel is positive. The magnitude of this positive value corresponds to the degree of shrinkage distortion of the text at that pixel. That is, the larger the absolute value of the first target parameter, the higher the convexity, the more severe the shrinkage distortion of the text, and correspondingly, the larger the value of the distortion parameter.

[0157] Conversely, when the first target parameter is negative, it indicates that the pixel is in a concave or wrinkled area, and the distortion parameter of the pixel is negative. The absolute value of this negative value corresponds to the degree of text stretching distortion at that pixel. That is, the larger the absolute value of the first target parameter, the greater the concavity, the more severe the text stretching distortion, and the larger the absolute value of the distortion parameter.

[0158] Step S1043: Based on the distortion parameters, adjust the pixels to obtain the adjusted second image.

[0159] In this embodiment, for the raised area (the distortion parameter is a positive value), the text in this area exhibits shrinkage and stroke convergence distortion due to perspective. According to the value of the distortion parameter, the pixels are appropriately stretched to compensate. The larger the distortion parameter, the greater the stretching amplitude, which offsets the text shrinkage distortion caused by the raised area.

[0160] For concave / wrinkled areas (where the distortion parameter is negative), the text in these areas exhibits stretching and stroke dispersion distortion. Based on the absolute value of the distortion parameter, appropriate compression compensation is applied to the pixels. The larger the absolute value of the distortion parameter, the greater the compression, thus offsetting the text stretching distortion caused by the concavity.

[0161] For the transition zone at the edge of the target area, a smooth adjustment can be made by combining the gradient change of the distortion parameters to avoid pixel breaks and text fragmentation problems after correction.

[0162] After completing pixel adjustments for all target areas, the flattened area and the corrected target area can be integrated to obtain the adjusted second image. This image can eliminate text distortion caused by three-dimensional deformation and restore the standard text layout, stroke outlines, and proportions.
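Steps S1042 and S1043 can be illustrated with a deliberately simplified one-dimensional sketch. The application does not specify how the distortion parameter is computed or how pixels are resampled, so the following are ASSUMPTIONS for illustration only: the distortion parameter is proportional to the first target parameter (hypothetical `gain`), the local scale factor is 1 plus the distortion parameter, and a pixel row is resampled by nearest-neighbour lookup.

```python
def distortion_parameter(first_target, gain=0.1):
    """ASSUMED linear mapping; keeps the sign of the first target parameter
    (positive: raised area, negative: concave or wrinkled area)."""
    return gain * first_target


def scale_factor(distortion, min_scale=0.5):
    """ASSUMED mapping: positive distortion stretches (scale > 1),
    negative distortion compresses (scale < 1), bounded from below."""
    return max(min_scale, 1.0 + distortion)


def resample_row(pixels, scale):
    """Stretch or compress a row of pixel values by nearest-neighbour lookup."""
    new_len = max(1, round(len(pixels) * scale))
    last = len(pixels) - 1
    return [pixels[min(last, int(i / scale))] for i in range(new_len)]


row = [1, 2, 3, 4]
print(resample_row(row, scale_factor(distortion_parameter(5.0))))   # stretched
print(resample_row(row, scale_factor(distortion_parameter(-5.0))))  # compressed
```

A practical implementation would work in two dimensions and interpolate smoothly across the edge of the target region, as [0161] describes, instead of using nearest-neighbour lookup.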

[0163] Step S1044: Recognize the text to be recognized in the adjusted second image.

[0164] In this embodiment, the adjusted second image can first be preprocessed by grayscale conversion, Gaussian noise reduction, and binarization to enhance the contrast between text strokes and document background and remove image noise interference.

[0165] Next, according to the text layout direction of the image, the preprocessed second image is segmented into text lines and characters, and the contours, strokes and other features of individual characters are extracted.

[0166] The features are then input into the OCR recognition model to complete character matching and semantic verification, and output the text recognition result.

[0167] In this embodiment, by selecting the target area whose first target parameter meets the set distance threshold from the first image, redundant correction operations in flat areas can be eliminated, reducing unnecessary computing power consumption and balancing processing efficiency and correction targeting.

[0168] By determining the distortion parameters of each pixel based on the first target parameters corresponding to each pixel in the target region, a reliable quantitative basis can be provided for subsequent pixel correction, avoiding correction deviations.

[0169] Based on the aforementioned distortion parameters, the raised areas are stretched and the recessed / wrinkled areas are compressed to achieve pixel-level distortion compensation, restoring the true strokes and layout of the text and completely eliminating text distortion and proportional loss caused by wrinkles, curls, and unevenness. Finally, by performing OCR recognition on the corrected image, character misjudgment and missed detection can be avoided, further improving recognition accuracy.

[0170] As another optional embodiment of this application, refer to Figure 7, which is a flowchart of a text recognition method provided in Embodiment 6 of this application. As shown in Figure 7, the method may include, but is not limited to, the following steps:

[0171] Step S301: The image acquisition device acquires an image of the document to be recognized to obtain a first image.

[0172] Step S302: Based on the first image, construct a depth map corresponding to the document to be identified; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0173] Step S303: Based on the depth map, determine the first target parameter of the document to be identified; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0174] For a detailed description of steps S301-S303, please refer to the relevant description of steps S101-S103 in Embodiment 1, which will not be repeated here.

[0175] Step S304: Based on the depth map, determine the second target parameter; the second target parameter represents the angle between the normal vector of the plane of the document to be identified and the vertical direction of the image acquisition device.

[0176] In this embodiment, sampling points in flat areas of the document to be identified (i.e., sampling points with no local deformation where the first target parameter is close to 0) can be filtered from the depth map to remove interference points in wrinkled or uneven areas.

[0177] Based on the three-dimensional spatial coordinates of the sampling points in the flat area, the equation of the reference plane of the document to be identified is obtained by fitting, and then the normal vector of the reference plane is solved.

[0178] The vertical direction of the image acquisition device (vertical direction of the optical axis / vertical direction of the camera imaging plane) can be set as the standard reference vector, and the angle between the normal vector of the reference plane and the standard reference vector can be calculated, which is the tilt angle.

[0179] The larger the tilt angle, the more severe the overall tilt of the document to be recognized, and the more obvious the global perspective distortion. When the tilt angle is 0°, it means that the document to be recognized is in a horizontal standard posture with no tilt distortion.
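The plane fitting and tilt angle computation of step S304 can be sketched as follows. The sketch assumes that the flat-area sampling points are given as (x, y, z) coordinates in a camera coordinate system whose z axis is the optical axis, and that the optical axis is taken as the standard reference vector; both are assumptions, as are the function names. The plane z = a·x + b·y + c is fitted by ordinary least squares.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    normal_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal_matrix)
    if abs(d) < 1e-12:
        raise ValueError("need at least three non-collinear points")
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown per column
        m = [row[:] for row in normal_matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def tilt_angle_deg(points):
    """Angle between the plane normal (-a, -b, 1) and the z axis."""
    a, b, _ = fit_plane(points)
    return math.degrees(math.acos(1.0 / math.sqrt(a * a + b * b + 1.0)))


flat = [(0, 0, 5), (1, 0, 5), (0, 1, 5), (1, 1, 5)]
tilted = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1)]  # z = x
print(tilt_angle_deg(flat), tilt_angle_deg(tilted))
```

A document lying perpendicular to the optical axis gives a tilt angle of 0°, while the plane z = x gives 45°.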

[0180] Step S305: Based on the depth map, determine the third target parameter; the third target parameter represents the horizontal rotation angle of the document to be identified relative to the vertical direction of the image acquisition device.

[0181] In this embodiment, based on the reference plane (i.e. the plane corresponding to the reference plane equation) fitted in step S304, feature clues such as document edges and text line directions can be extracted, and the horizontal placement axis of the document can be determined by combining the pixel coordinates and spatial position mapping relationship of the sampling points in the depth map.

[0182] Using the vertical direction of the image acquisition device as the reference axis, calculate the horizontal angle between the horizontal placement axis of the document and the reference axis; this is the rotation angle.

[0183] The larger the rotation angle, the greater the degree of horizontal rotation distortion of the document to be recognized; when the rotation angle is 0°, it means that the text lines of the document to be recognized are parallel to the imaging border of the image acquisition device, and there is no horizontal rotation distortion.
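Once the horizontal placement axis of the document is known, the rotation angle of step S305 reduces to the angle of a line in the image plane. The sketch below assumes the axis is given by two points in pixel coordinates (for example, the ends of a detected document edge or text line) and measures it against the image x axis, so that 0° means the text lines are parallel to the image border; the function name and this choice of reference are assumptions.

```python
import math


def rotation_angle_deg(axis_start, axis_end):
    """Signed angle in degrees between the document's horizontal axis and
    the image x axis, folded into (-90, 90] because a line has no
    preferred direction."""
    dx = axis_end[0] - axis_start[0]
    dy = axis_end[1] - axis_start[1]
    if dx == 0 and dy == 0:
        raise ValueError("the two points must differ")
    angle = math.degrees(math.atan2(dy, dx))
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle


print(rotation_angle_deg((0, 0), (10, 0)))
print(rotation_angle_deg((0, 0), (10, 10)))
```

Rotating the second image by the negative of this angle would bring the text lines parallel to the image border.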

[0184] Step S306: Based on the first target parameter and at least one of the second target parameter and the third target parameter, identify the text to be identified in the second image.

[0185] In this embodiment, global perspective correction can be performed on the second image based on the second target parameters. Specifically, according to the tilt angle, reverse pitch and lateral tilt compensation can be performed on the image to eliminate global perspective stretching and compression distortion caused by the overall tilt of the document, and restore the image layout of the document in a standard horizontal posture.

[0186] In this embodiment, the second image can be horizontally rotated and corrected according to the third target parameter. Specifically, the image can be rotated in the opposite direction according to the rotation angle to make the text lines parallel to the image border, thus correcting problems such as skewed text layout and incorrect character orientation.

[0187] In this embodiment, pixel distortion correction can be performed based on the first target parameter. Specifically, the correction logic of Embodiment 5 can be used to stretch and compensate pixels in raised areas and compress and compensate pixels in recessed / wrinkled areas to eliminate the problems of text stroke distortion and contour distortion caused by local micro-deformation.

[0188] The correction process can be flexibly adapted to different scenarios, and the execution order of various correction steps is not strictly limited and can be adjusted according to the actual distortion situation. For example, when only local deformation exists, the first target parameter alone can be used to complete the local pixel distortion correction; when only global attitude shift exists, a single parameter or a combination of parameters from tilt angle and rotation angle can be selected to complete the global attitude correction; when local 3D deformation and global attitude shift exist simultaneously, global perspective correction, horizontal rotation correction, and local pixel distortion correction can be performed as needed to eliminate various distortion interferences.
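The scenario-dependent selection of corrections in [0188] can be sketched as a small planning function. The angle tolerance, the step names, and the order in which the steps are listed are all hypothetical; the text explicitly leaves the execution order open.

```python
def plan_corrections(max_abs_first_param, tilt_deg, rotation_deg,
                     distance_threshold, angle_tolerance_deg=1.0):
    """Return the list of correction steps that apply to a document.

    The order shown (global before local) is only one possible choice.
    """
    steps = []
    if abs(tilt_deg) > angle_tolerance_deg:
        steps.append("global_perspective")    # uses the second target parameter
    if abs(rotation_deg) > angle_tolerance_deg:
        steps.append("horizontal_rotation")   # uses the third target parameter
    if max_abs_first_param > distance_threshold:
        steps.append("local_pixel")           # uses the first target parameter
    return steps


print(plan_corrections(2.0, 0.2, 0.3, distance_threshold=0.5))
print(plan_corrections(2.0, 10.0, -5.0, distance_threshold=0.5))
```

A document with only local deformation receives only the local pixel correction, while a tilted, rotated, and wrinkled document receives all three.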

[0189] After correction, the text to be recognized in the corrected second image can be recognized. For the recognition process, refer to the description of step S1044 in Embodiment 5, which is not repeated here.

[0190] Step S306 is an implementation of step S104 in Embodiment 1.

[0191] In this embodiment, by determining the second and third target parameters, the overall tilt and horizontal rotation of the document can be quantitatively characterized, respectively. Combined with the quantification of local deformation by the first target parameter, various distortion conditions of the document can be comprehensively covered. Based on these parameters, global perspective correction, horizontal rotation correction, and pixel distortion correction are performed on the second image, which can eliminate text distortion caused by global perspective distortion, horizontal rotation skew, and local micro-deformation, restoring the true shape and layout of the text. Furthermore, by recognizing the corrected image, the accuracy of text recognition can be significantly improved, meeting the requirements of high-precision text recognition.

[0192] Furthermore, the correction process adapts flexibly to different distortion scenarios: the parameters can be used alone or in combination, which makes the correction more targeted and more efficient.

[0193] As another optional embodiment of this application, a text recognition method provided in embodiment 7 of this application may include, but is not limited to, the following steps:

[0194] Step S401: The image acquisition device acquires an image of the document to be recognized to obtain a first image.

[0195] Step S402: Based on the first image, construct a depth map corresponding to the document to be identified; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0196] Step S403: Based on the depth map, determine the first target parameter of the document to be identified; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0197] For the detailed procedures of steps S401-S403, refer to the description of steps S101-S103 in Embodiment 1, which is not repeated here.

[0198] Step S404: Based on the depth map, determine the second target parameter; the second target parameter represents the angle between the normal vector of the plane of the document to be identified and the vertical direction of the image acquisition device.

[0199] For a detailed description of step S404, refer to the description of step S304 in Embodiment 6, which is not repeated here.

[0200] Step S405: Based on at least one of the first target parameter and the second target parameter, switch the image acquisition device from a first working state to a second working state; the second working state corresponds to the three-dimensional pose of the document to be identified.

[0201] The first working state may include, but is not limited to: the initial working state preset when the image acquisition device leaves the factory, or the initial working state that is enabled by default under normal operation.

[0202] In the first working state, the parameters used by the image acquisition device for image acquisition are mainly suited to relatively ideal document scenarios, such as a flat document surface with no obvious tilt angle and no local deformation. These parameters are not restricted to ideal scenarios; rather, they achieve their best basic acquisition results under such conditions.

[0203] Limited by this design concept, the first working state often does not perform well when collecting documents in non-standard scenarios such as wrinkles, unevenness, and tilt.

[0204] In the second working state, the parameters used by the image acquisition device for image acquisition can include parameters adjusted based on the current degree of document deformation and its specific posture in space. Through this targeted optimization, the second working state can effectively improve various defects that occur in non-standard documents during the imaging process, such as image blurring and distortion, thereby obtaining higher quality image acquisition results.

[0205] Step S406: Acquire a second image based on the image acquisition device in the second working state.

[0206] By acquiring a second image based on the image acquisition device in the second working state, various imaging defects caused by document deformation and posture shift can be avoided, and the imaging quality is greatly improved, providing a clear, regular, and high-quality second image without obvious distortion for subsequent text recognition.

[0207] Step S407: Based at least on the first target parameters, identify the text to be identified in the second image.

[0208] For the detailed process of step S407, refer to the description of step S306 in Embodiment 6, which is not repeated here.

[0209] In this embodiment, based on at least one of the first target parameter and the second target parameter, the image acquisition device is switched from a first working state to a second working state. In the second working state, the image acquisition device can take pictures according to the optimized parameters. At this time, the acquired image has successfully avoided most of the distortion problems caused by document tilt and local deformation, and the imaging quality is significantly improved, laying a good foundation for subsequent processing.

[0210] To further improve the accuracy of text recognition, the second image acquired in the second working state is then processed using the first target parameters to correct any minor distortions that may remain in the image, thereby further improving the image quality.

[0211] Images that have undergone dual optimization processing can minimize the interference of imaging defects on text recognition, thereby significantly improving the accuracy of text recognition.

[0212] As another optional embodiment of this application, Embodiment 8 provides a text recognition method. This embodiment is mainly an implementation of step S405 in Embodiment 7, and may specifically include, but is not limited to, at least one of the following:

[0213] Step S4051: Based on the first target parameter, identify the raised area on the surface of the document to be identified, and enhance the supplementary light intensity of the image acquisition device corresponding to the raised area.

[0214] In this embodiment, the first target parameter corresponding to each sampling point can be traversed, the target sampling points whose first target parameter is positive and whose absolute value exceeds the preset deformation threshold can be selected, and the document area corresponding to these target sampling points can be determined as the raised area.

[0215] Image acquisition devices can be equipped with zoned adjustable fill light modules (such as zoned fill lights or matrix fill light components) to directionally enhance the fill light intensity of the located raised areas based on their coordinates.

[0216] Flat and recessed areas can maintain the default fill light parameters, achieving differentiated fill light adaptation.

[0217] In this embodiment, the raised areas of the document are prone to uneven local reflections, shadow occlusion, and dark strokes because their spatial height is higher than the reference plane. Simple global illumination cannot solve the problem of local brightness differences. By directionally enhancing the illumination of the raised areas, local shadows can be eliminated, the brightness of the image can be balanced, the clarity and contrast of the text strokes in the raised areas can be enhanced, the loss of text details due to lighting defects can be avoided, and the problem of abnormal imaging brightness caused by local deformation can be solved.
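The traversal and zoned fill-light adaptation of step S4051 can be sketched as follows. This is a minimal illustration under stated assumptions: the grid layout of sampling points, the function names, and the two fill-light levels are hypothetical, while the selection rule (positive value whose absolute value exceeds the deformation threshold) follows the text.

```python
def find_raised_points(first_target_params, deformation_threshold):
    """Return (row, col) indices of sampling points in raised areas."""
    return [
        (r, c)
        for r, row in enumerate(first_target_params)
        for c, value in enumerate(row)
        if value > 0 and abs(value) > deformation_threshold
    ]

def fill_light_levels(first_target_params, deformation_threshold,
                      default_level=0.5, boosted_level=0.9):
    """Per-zone fill-light level: boosted on raised zones, default elsewhere.

    The level values are hypothetical; flat and recessed zones keep the default.
    """
    raised = set(find_raised_points(first_target_params, deformation_threshold))
    return [
        [boosted_level if (r, c) in raised else default_level
         for c in range(len(row))]
        for r, row in enumerate(first_target_params)
    ]

# 3 x 3 grid of first target parameters (arbitrary units).
params = [
    [0.0, 0.1, 0.0],
    [0.0, 1.2, -1.5],   # 1.2 is raised, -1.5 is recessed
    [0.0, 0.0, 0.8],
]
raised_points = find_raised_points(params, deformation_threshold=0.5)
levels = fill_light_levels(params, deformation_threshold=0.5)
```

One zone per sampling point is used here for brevity; an actual zoned fill-light module would typically have far fewer zones than sampling points, so raised points would first be mapped to the zone that covers them.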

[0218] Step S4052: Based on the second target parameter, adjust the pitch angle of the image acquisition device so that the shooting direction of the image acquisition device is facing the plane of the document to be identified.

[0219] In this embodiment, the global tilt direction (pitch, yaw) and tilt angle of the document can be determined based on the second target parameter, thus clarifying the deviation direction and magnitude of the shooting angle.

[0220] The image acquisition device can be equipped with an angle-adjustable gimbal or a built-in angle adjustment component. Based on the tilt angle and direction of the second target parameter, the camera's pitch angle and tilt angle are finely adjusted in the opposite direction until the shooting direction is perpendicular to the document reference plane, thereby achieving direct calibration of the shooting perspective and eliminating perspective offset.

[0221] In this embodiment, global document tilt can cause perspective distortion, text stretching and compression, and skewed layout in the captured image, making pixel distortion likely during subsequent image correction. By adjusting the shooting angle to achieve a direct view, the perspective distortion caused by global tilt is offset at the source, ensuring the captured image remains regular and the text proportions are normal, reducing the difficulty of subsequent image correction and avoiding secondary distortion.
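The tilt angle and tilt direction used in step S4052 can be derived from the normal vector of the document plane. The sketch below assumes a camera coordinate system whose z axis is the shooting direction; the function names and the decomposition into pitch and lateral components are illustrative assumptions, and the sign with which a gimbal must be driven to cancel these components depends on the hardware convention, which the text does not specify.

```python
import math

def tilt_angle_deg(normal, axis=(0.0, 0.0, 1.0)):
    """Angle between the document-plane normal and the shooting axis."""
    dot = sum(n * a for n, a in zip(normal, axis))
    norm_n = math.sqrt(sum(n * n for n in normal))
    norm_a = math.sqrt(sum(a * a for a in axis))
    cos_angle = max(-1.0, min(1.0, dot / (norm_n * norm_a)))
    return math.degrees(math.acos(cos_angle))

def tilt_components_deg(normal):
    """Split the tilt into pitch (about x) and lateral (about y) components.

    Assumed convention: z is the shooting direction, y points "up" in the image.
    """
    nx, ny, nz = normal
    pitch = math.degrees(math.atan2(ny, nz))
    lateral = math.degrees(math.atan2(nx, nz))
    return pitch, lateral

# A document plane pitched by 20 degrees with no lateral tilt.
normal = (0.0, math.sin(math.radians(20.0)), math.cos(math.radians(20.0)))
tilt = tilt_angle_deg(normal)
pitch, lateral = tilt_components_deg(normal)
```

The adjustment loop would then drive the gimbal until both components fall below a tolerance, at which point the shooting direction is perpendicular to the document reference plane.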

[0222] As another optional embodiment of this application, a text recognition method is provided in Embodiment 9 of this application. This embodiment is mainly an implementation of the optical signal in Embodiment 2. The optical signal can be, but is not limited to, being generated in the following ways:

[0223] Step S21: Acquire an image of the document to be identified using the image acquisition device to obtain a third image.

[0224] In this embodiment, the third image can be obtained using the image acquisition device, for example (but not limited to) by capturing an overall visual image of the document to be identified. This image does not need to resolve text details; instead, it preserves global features such as the material texture, layout, background color depth, and font density of the document, providing a visual basis for subsequent document type determination.

[0225] The third image can be a low-resolution preview image, which takes less time to acquire and has a small data volume, without increasing the overall processing latency, thus balancing real-time performance and feature extraction requirements.

[0226] Step S22: Based on the third image, identify the type of the document to be identified.

[0227] In this embodiment, grayscale analysis, texture recognition, and edge detection can be performed on the third image to extract key features of the document to be identified, which may include dimensional features such as material reflectivity, paper thickness, background color uniformity, text layout density, whether it contains tables / barcodes / stamps, and whether it is colored paper.

[0228] The extracted features are matched with a preset document type library to determine the type of document to be identified. Common types may include, but are not limited to: ordinary matte paper documents, glossy coated paper documents, thin translucent paper, thick hard cards, high-density text documents, sparse layout documents, invoice documents with complex stamps / barcodes, wrinkled and aged documents, etc.

[0229] Step S23: Based on the type of the document to be identified, generate an optical signal with a target encoding format; the target encoding format matches the type of the document to be identified.

[0230] In this embodiment, a DLP micromirror projector or a laser interferometer can be used as the structured light projection module to generate a high-contrast, interference-resistant light signal.

[0231] Depending on the type of document to be identified, the structured light projection module can switch the encoding and wavelength mode of the optical signal to avoid material interference. For example, for thick hard cards or neatly formatted tickets, a checkerboard structured light pattern can be projected, as shown in Figure 8. The checkerboard grid is regular and its corner features are clear, which makes it easy to quickly locate the phase shift of each sampling point. This pattern can accurately capture subtle deformations such as card warping and local protrusions, simplifies the phase matching process, and improves the efficiency of depth measurement.

[0232] For glossy / reflective documents (such as plastic covers and coated paper forms), 940 nm infrared speckle can be used as the light signal, as shown in Figure 9. Using the infrared band instead of visible light avoids the pattern overexposure and phase shift caused by visible light reflection, and ensures that the light pattern is clear and complete.

[0233] For transparent / coated documents (such as documents covered with plastic film), a polarization-coded light pattern can be selected, as shown in Figure 10. Polarizers can be used to filter out ambient stray light and background transmission interference, eliminate signal distortion caused by transparent materials, and ensure the accuracy of phase extraction.

[0234] For ordinary matte paper documents, standard visible light stripe / spot pattern can be used, which balances projection efficiency and shape reproduction accuracy, and is suitable for general office document scenarios.
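The type-to-pattern matching of step S23 amounts to a lookup table. The following sketch records only the pairings stated above; the type keys, the `LightPattern` class, and the fallback to the matte-paper pattern for unknown types are assumptions for illustration, and a band of `None` means the text does not state one.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LightPattern:
    encoding: str          # target encoding format of the light signal
    band: Optional[str]    # wavelength band, None where the text is silent

# Pairings taken from the description; the key names are hypothetical.
PATTERNS = {
    "thick_hard_card": LightPattern("checkerboard", None),
    "glossy_reflective": LightPattern("speckle", "infrared_940nm"),
    "transparent_coated": LightPattern("polarization_coded", None),
    "ordinary_matte": LightPattern("stripe_or_spot", "visible"),
}

def select_pattern(document_type):
    """Pick the light pattern for a document type.

    Falling back to the ordinary matte pattern is an assumption, not
    something the description prescribes.
    """
    return PATTERNS.get(document_type, PATTERNS["ordinary_matte"])

glossy = select_pattern("glossy_reflective")
fallback = select_pattern("unknown_type")
```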

[0235] In this embodiment, the type of the document to be identified is obtained by acquiring a low-resolution third image of the document to be identified. Then, a matching target encoding format light signal is generated according to the document type. For example, a checkerboard light pattern is projected for thick hard cards, and infrared speckle is selected for highly reflective documents. This can avoid the interference of different materials on the light signal, ensure that the light pattern is clear and complete, improve the accuracy of phase extraction, and thus improve the accuracy of depth measurement. This provides a high-quality foundation for subsequent text recognition and effectively improves the accuracy of text recognition.

[0236] Next, the hardware coordination and execution process of the text recognition method in this application are described with reference to Figure 11. As shown in Figure 11, the text recognition method can operate in conjunction with four main modules, which may include, but are not limited to:

[0237] Structured light projection module: It can project structured light patterns (i.e., light signals) in target encoding format using a DLP micromirror projector.

[0238] The structured light projection module can switch between speckle, stripe, checkerboard and other light patterns according to the type of document to be identified (e.g., different document materials) to achieve precise matching between light signals and document materials.

[0239] Image acquisition module: It can deploy two high frame rate cameras to form a binocular stereo vision acquisition unit, which is used to capture the distortion pattern on the surface of the document to be identified, and acquire multiple first image pairs to provide raw imaging data for subsequent phase calculation.

[0240] 3D reconstruction algorithm module: It receives multiple first image pairs from the image acquisition module, performs phase calculation and phase difference calculation based on a phase-shifting algorithm, and calculates the spatial distance of each sampling point using phase measuring profilometry, thereby generating a depth map (i.e., depth mapping) corresponding to the document to be identified. It can also perform 3D pose calculation and output core data such as the first target parameters and the second target parameters.

[0241] 3D enhancement recognition execution module (i.e., the execution end based on 3D information enhancement): It can perform pixel distortion correction based on the first target parameters output by the 3D reconstruction algorithm module; and / or, perform global perspective correction on the second image based on the second target parameters; and / or, perform horizontal rotation correction on the second image based on the third target parameters.

[0242] Of course, the second image can be an image acquired after targeted supplemental lighting is applied to the raised areas of the surface of the document to be identified, based on the first target parameters.

[0243] After correcting the second image, OCR recognition can be performed on the text to be recognized in the corrected second image.
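The phase calculation performed by the 3D reconstruction algorithm module can be illustrated with the standard N-step phase-shifting formula. This is a sketch under assumptions the description does not confirm: a sinusoidal fringe model I_k = A + B cos(phi + 2*pi*k/N), equally spaced phase shifts, and N of at least 3. The function names are hypothetical, and phase unwrapping is not shown.

```python
import math

def wrapped_phase(intensities):
    """Recover the wrapped phase at one sampling point.

    intensities: N samples taken under phase shifts 2*pi*k/N, k = 0..N-1.
    Returns the phase in (-pi, pi].
    """
    n = len(intensities)
    if n < 3:
        raise ValueError("at least three phase-shifted samples are needed")
    sin_sum = sum(i * math.sin(2 * math.pi * k / n)
                  for k, i in enumerate(intensities))
    cos_sum = sum(i * math.cos(2 * math.pi * k / n)
                  for k, i in enumerate(intensities))
    return math.atan2(-sin_sum, cos_sum)

def simulate_samples(phase, n=4, offset=0.5, amplitude=0.4):
    """Synthetic fringe intensities for a known phase (test data only)."""
    return [offset + amplitude * math.cos(phase + 2 * math.pi * k / n)
            for k in range(n)]

# The same sampling point as seen by the two cameras of the binocular unit.
first_phase = wrapped_phase(simulate_samples(1.0))
second_phase = wrapped_phase(simulate_samples(0.25))

# Phase difference that would feed the subsequent distance calculation.
phase_difference = first_phase - second_phase
```

Each first image pair supplies one intensity sample per camera for a given target phase, so the list passed to `wrapped_phase` is assembled per camera across all pairs.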

[0244] The text recognition device provided in this application will be described below. The text recognition device described below can be referred to in correspondence with the text recognition method described above.

[0245] The text recognition device includes:

[0246] The acquisition module is used to acquire images of the document to be recognized based on the image acquisition device, and obtain the first image.

[0247] A construction module is used to construct a depth map corresponding to the document to be identified based on the first image; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0248] The first determining module is used to determine a first target parameter of the document to be identified based on the depth map; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0249] The recognition module is used to recognize the text to be recognized in the second image based at least on the first target parameters, wherein the second image contains the text to be recognized obtained from the document to be recognized by the image acquisition device.

[0250] There may be two image acquisition devices; the first image includes multiple first image pairs, each first image pair including images synchronously acquired by the two image acquisition devices at the same time after a light signal with a target encoding format and a target phase is projected onto the surface of the document to be identified; different first image pairs correspond to different target phases.

[0251] The text recognition device may also include:

[0252] The second determining module is used to determine, based on the plurality of first image pairs, the first phase value and the second phase value of each sampling point on the surface of the document to be identified in the coordinate system of different image acquisition devices.

[0253] The third determining module is used to determine the average of the first phase values corresponding to all sampling points on the surface of the document to be identified as the first reference phase value, and the average of the second phase values corresponding to all sampling points on the surface of the document to be identified as the second reference phase value.

[0254] The construction module can be specifically used to: if a target sampling point exists among the sampling points, construct a depth map corresponding to the document to be identified; the difference between the first phase value corresponding to the target sampling point and the first reference phase value is greater than a preset threshold, and / or the difference between the second phase value and the second reference phase value is greater than the preset threshold.
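The condition under which the construction module builds a depth map can be sketched as a simple check against the two reference phase values. The function name and the use of the absolute difference are assumptions; the text only says that the difference is greater than a preset threshold.

```python
from statistics import mean

def needs_depth_map(first_phases, second_phases, threshold):
    """Return True if any sampling point deviates from a reference phase.

    first_phases / second_phases: per-point phase values in the two
    camera coordinate systems, in the same point order.
    """
    first_reference = mean(first_phases)
    second_reference = mean(second_phases)
    return any(
        abs(p1 - first_reference) > threshold
        or abs(p2 - second_reference) > threshold
        for p1, p2 in zip(first_phases, second_phases)
    )

# A flat document: every point matches the mean, so no depth map is built.
flat = needs_depth_map([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], 0.5)

# One point deviates by 0.75 from the first reference phase of 1.25.
deformed = needs_depth_map([1.0, 1.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5], 0.5)
```

This check lets a flat document skip depth-map construction entirely.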

[0255] There may be two image acquisition devices; the first image includes multiple first image pairs, each first image pair comprising images synchronously acquired by the two image acquisition devices at the same time after a light signal with a target encoding format and a target phase is projected onto the surface of the document to be identified; different first image pairs correspond to different target phases; the construction module can specifically be used to:

[0256] Based on the plurality of first image pairs, determine the first phase value and the second phase value of each sampling point on the surface of the document to be identified in the coordinate system of different image acquisition devices;

[0257] Based on the phase difference between the first phase value and the second phase value and the wavelength of the optical signal, the distance between the sampling point and the two image acquisition devices is determined;

[0258] A depth map corresponding to the document to be identified is constructed based on the distance between each sampling point and the two image acquisition devices.

[0259] In this embodiment, the first determining module can be specifically used for:

[0260] The mean value of the first phase value corresponding to all sampling points on the surface of the document to be identified is determined as the first reference phase value, and the mean value of the second phase value corresponding to all sampling points on the surface of the document to be identified is determined as the second reference phase value.

[0261] Based on the reference phase difference between the first reference phase value and the second reference phase value and the wavelength of the optical signal, the reference distance between the document to be identified and the two image acquisition devices is determined;

[0262] The difference between the distance from each sampling point in the depth map to the two image acquisition devices and the reference distance is determined as the first target parameter.
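The computation performed by the first determining module can be sketched as follows. Several parts are assumptions and are marked as such: the phase-to-distance conversion is passed in as a function because this passage does not give its formula, the `toy_phase_to_distance` stand-in is purely illustrative and not a physical model, and the sign is chosen so that points closer to the devices than the reference come out positive, which matches the treatment of positive values as raised areas in Embodiment 8.

```python
from statistics import mean

def reference_distance(first_phases, second_phases, wavelength,
                       phase_to_distance):
    """Reference distance from the two reference (mean) phase values."""
    reference_phase_difference = mean(first_phases) - mean(second_phases)
    return phase_to_distance(reference_phase_difference, wavelength)

def first_target_parameter(distances, ref_distance):
    """Per-point deviation from the reference distance.

    Assumed sign: positive means closer to the devices than the reference.
    """
    return [ref_distance - d for d in distances]

def toy_phase_to_distance(phase_difference, wavelength):
    """Illustrative stand-in only; NOT the conversion used by the method."""
    return wavelength * phase_difference

first_phases = [2.0, 2.0, 2.0, 2.4]
second_phases = [1.0, 1.0, 1.0, 1.0]
wavelength = 1.0  # arbitrary units for the toy conversion

# Per-point distances, as they would be stored in the depth map.
distances = [toy_phase_to_distance(p1 - p2, wavelength)
             for p1, p2 in zip(first_phases, second_phases)]

ref = reference_distance(first_phases, second_phases, wavelength,
                         toy_phase_to_distance)
deviations = first_target_parameter(distances, ref)
```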

[0263] The recognition module can be specifically used for:

[0264] Select a target region from the first image whose first target parameters satisfy a set distance threshold;

[0265] Based on the first target parameters corresponding to each pixel in the target region, the distortion parameters of the pixel are determined; the distortion parameters characterize the degree of distortion of the pixel.

[0266] Based on the distortion parameters, the pixels are adjusted to obtain the adjusted second image;

[0267] The text to be identified in the adjusted second image is then identified.

[0268] In this embodiment, the text recognition device may further include:

[0269] The fourth determining module is used to determine a second target parameter based on the depth map; the second target parameter represents the angle between the normal vector of the plane of the document to be identified and the vertical direction of the image acquisition device.

[0270] The fifth determining module is used to determine a third target parameter based on the depth map; the third target parameter represents the horizontal rotation angle of the document to be identified relative to the vertical direction of the image acquisition device.

[0271] The recognition module can be specifically used for:

[0272] Based on at least one of the second target parameter and the third target parameter, as well as the first target parameter, the text to be identified in the second image is identified.

[0273] The text recognition device may further include:

[0274] The switching module is used to switch the image acquisition device from a first working state to a second working state based on at least one of the first target parameter and the second target parameter; the second working state corresponds to the three-dimensional pose of the document to be identified.

[0275] The acquisition module can also be used to acquire a second image based on the image acquisition device in the second working state.

[0276] The switching module switches the image acquisition device from a first operating state to a second operating state based on at least one of the first target parameter and the second target parameter, and may include at least one of the following:

[0277] Based on the first target parameter, the raised areas on the surface of the document to be identified are identified, and the supplementary light intensity of the image acquisition device corresponding to the raised areas is enhanced;

[0278] Based on the second target parameter, the pitch angle of the image acquisition device is adjusted so that the shooting direction of the image acquisition device is facing the plane of the document to be identified.

[0279] The text recognition device may also include:

[0280] A generation module, used to:

[0281] Acquire an image of the document to be identified using the image acquisition device to obtain a third image;

[0282] Based on the third image, identify the type of the document to be identified;

[0283] Based on the type of the document to be identified, an optical signal with a target encoding format is generated; the target encoding format matches the type of the document to be identified.

[0284] In another embodiment of this application, an electronic device is provided, which may include:

[0285] An image acquisition device is used to acquire images of a document to be recognized, obtaining a first image and a second image; the second image contains the text to be recognized.

[0286] A processor, used to:

[0287] Based on the first image, a depth map corresponding to the document to be identified is constructed; the depth map includes the spatial distance between each sampling point on the surface of the document to be identified and the image acquisition device;

[0288] Based on the depth map, a first target parameter of the document to be identified is determined; the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be identified and the image acquisition device.

[0289] Based at least on the first target parameters, the text to be identified in the second image is identified.

[0290] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0291] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0292] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0293] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can store or a data storage device such as a training device or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims

1. A text recognition method, comprising: acquiring an image of a document to be recognized using an image acquisition device to obtain a first image; constructing, based on the first image, a depth map corresponding to the document to be recognized, wherein the depth map includes the spatial distance between each sampling point on the surface of the document to be recognized and the image acquisition device; determining, based on the depth map, a first target parameter of the document to be recognized, wherein the first target parameter characterizes the difference in distance between each sampling point on the surface of the document to be recognized and the image acquisition device; and recognizing, based at least on the first target parameter, text to be recognized in a second image, wherein the second image contains the text to be recognized acquired from the document to be recognized using the image acquisition device.

2. The text recognition method according to claim 1, wherein there are two image acquisition devices; the first image comprises a plurality of first image pairs, each first image pair comprising images synchronously acquired by the two image acquisition devices at the same time after a light signal having a target encoding format and a target phase is projected onto the surface of the document to be recognized; different first image pairs correspond to different target phases; and before constructing the depth map corresponding to the document to be recognized, the method further comprises: determining, based on the plurality of first image pairs, the first phase value and the second phase value of each sampling point on the surface of the document to be recognized in the coordinate systems of the different image acquisition devices; determining the mean of the first phase values corresponding to all sampling points on the surface of the document to be recognized as a first reference phase value, and the mean of the second phase values corresponding to all sampling points on the surface of the document to be recognized as a second reference phase value; and if a target sampling point exists among the sampling points, constructing the depth map corresponding to the document to be recognized, wherein the difference between the first phase value corresponding to the target sampling point and the first reference phase value is greater than a preset threshold, and / or the difference between the second phase value and the second reference phase value is greater than the preset threshold.

3. The text recognition method according to claim 1, wherein there are two image acquisition devices; the first image comprises a plurality of first image pairs, each first image pair comprising images acquired synchronously by the two image acquisition devices after an optical signal having a target encoding format and a target phase is projected onto the surface of the document to be recognized; the target phase differs for each first image pair; and constructing the depth map corresponding to the document to be recognized based on the first image comprises: determining, based on the plurality of first image pairs, a first phase value and a second phase value of each sampling point on the surface of the document to be recognized in the coordinate systems of the respective image acquisition devices; determining the distance between each sampling point and the two image acquisition devices based on the phase difference between the first phase value and the second phase value and on the wavelength of the optical signal; and constructing the depth map corresponding to the document to be recognized based on the distance between each sampling point and the two image acquisition devices.
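
The per-sampling-point computation in claim 3 can be illustrated with a short sketch. The claim does not state how the phase values are recovered or how the phase difference maps to a distance, so two assumptions are made here: the phase is recovered with the standard N-step phase-shifting formula (equal phase steps, intensity modeled as `A + B*cos(phi + 2*pi*k/N)`), and the conversion to distance is delegated to a caller-supplied, hypothetical `phase_to_distance(phase_difference, wavelength)` function. All function names are illustrative.

```python
import math


def wrapped_phase(intensities):
    """Recover the wrapped phase at one sampling point.

    intensities[k] is assumed to be A + B*cos(phi + 2*pi*k/N), i.e. the
    point observed under N equally spaced target phases (N >= 3).
    """
    n = len(intensities)
    if n < 3:
        raise ValueError("at least three phase steps are needed")
    s = sum(i * math.sin(2 * math.pi * k / n) for k, i in enumerate(intensities))
    c = sum(i * math.cos(2 * math.pi * k / n) for k, i in enumerate(intensities))
    return math.atan2(-s, c)


def wrap(angle):
    """Wrap an angle back into the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def sampling_point_distance(intensities_cam1, intensities_cam2,
                            wavelength, phase_to_distance):
    """Distance for one sampling point from the two devices' observations.

    phase_to_distance is a hypothetical calibrated mapping; the claim only
    says the distance depends on the phase difference and the wavelength.
    """
    first_phase = wrapped_phase(intensities_cam1)
    second_phase = wrapped_phase(intensities_cam2)
    return phase_to_distance(wrap(first_phase - second_phase), wavelength)
```

Applying `sampling_point_distance` to every sampling point would give the entries of the depth map. The concrete mapping passed as `phase_to_distance` depends on calibration and is deliberately left open.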

4. The text recognition method according to claim 3, wherein determining the first target parameter of the document to be recognized based on the depth map comprises: determining the mean of the first phase values of all sampling points on the surface of the document to be recognized as a first reference phase value, and determining the mean of the second phase values of all sampling points on the surface of the document to be recognized as a second reference phase value; determining a reference distance between the document to be recognized and the two image acquisition devices based on the reference phase difference between the first reference phase value and the second reference phase value and on the wavelength of the optical signal; and determining, as the first target parameter, the difference between the distance from each sampling point in the depth map to the two image acquisition devices and the reference distance.
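
The steps of claim 4 can be sketched as follows. This is an illustration only: the function name `first_target_parameter`, the flat-list data layout, and the caller-supplied `phase_to_distance(phase_difference, wavelength)` mapping are assumptions, since the claim does not specify the conversion formula.

```python
from statistics import fmean


def first_target_parameter(distances, phases_1, phases_2,
                           wavelength, phase_to_distance):
    """Per-sampling-point deviation from the reference distance.

    distances: depth-map entries, one per sampling point.
    phases_1 / phases_2: first and second phase values of the same points.
    phase_to_distance: hypothetical mapping, assumed to be the same one
    used to build the depth map.
    """
    # Reference phase difference from the two mean (reference) phase values.
    reference_phase_difference = fmean(phases_1) - fmean(phases_2)
    reference_distance = phase_to_distance(reference_phase_difference, wavelength)
    # The first target parameter is each distance minus the reference distance.
    return [d - reference_distance for d in distances]
```

Under this sketch, values near zero correspond to points lying close to the reference distance, and values far from zero mark points that deviate from it. Which sign corresponds to a raised area depends on the mapping used.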

5. The text recognition method according to claim 1, wherein recognizing the text to be recognized in the second image based at least on the first target parameter comprises: selecting, from the first image, a target region whose first target parameter satisfies a set distance threshold; determining a distortion parameter of each pixel in the target region based on the first target parameter corresponding to that pixel, the distortion parameter characterizing the degree of distortion of the pixel; adjusting the pixels based on the distortion parameters to obtain an adjusted second image; and recognizing the text to be recognized in the adjusted second image.
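
One way to picture the region selection and distortion parameters of claim 5 is the sketch below. The claim does not define the distortion parameter, so this example substitutes a simple pinhole-camera assumption: apparent size is inversely proportional to distance, so a point at `reference_distance + delta` appears scaled by `reference_distance / (reference_distance + delta)` relative to the reference plane. Treating "satisfies a set distance threshold" as "absolute deviation exceeds the threshold" is likewise an assumption, as are all names used.

```python
def distortion_parameters(deltas, reference_distance, distance_threshold):
    """Return (in_target_region, scale) for each pixel.

    deltas: first target parameter per pixel (distance minus reference).
    reference_distance: assumed positive, and larger than any |delta|.
    scale: hypothetical distortion parameter under a pinhole model;
    1.0 means the pixel is left unchanged.
    """
    params = []
    for delta in deltas:
        in_region = abs(delta) > distance_threshold
        if in_region:
            scale = reference_distance / (reference_distance + delta)
        else:
            scale = 1.0
        params.append((in_region, scale))
    return params
```

A correction step could then resample each target-region pixel by the inverse of its scale before recognition. That resampling is omitted here because the claim leaves the adjustment method open.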

6. The text recognition method according to claim 1, further comprising: determining a second target parameter based on the depth map, the second target parameter representing the angle between the normal vector of the plane of the document to be recognized and the vertical direction of the image acquisition device; and determining a third target parameter based on the depth map, the third target parameter representing the horizontal rotation angle of the document to be recognized relative to the vertical direction of the image acquisition device; wherein recognizing the text to be recognized in the second image based at least on the first target parameter comprises: recognizing the text to be recognized in the second image based on the first target parameter and at least one of the second target parameter and the third target parameter.
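
The second target parameter of claim 6 can be illustrated by fitting a plane to the depth map and measuring the tilt of its normal. This is a sketch under assumptions the claim does not state: the depth map is a full rectangular grid with unit sample spacing in the same units as the depth values, the "vertical direction of the image acquisition device" is taken to be the depth (z) axis, and the plane `z = a*x + b*y + c` is fitted by ordinary least squares. The function name is hypothetical, and the third target parameter is not covered.

```python
import math


def tilt_angle_degrees(depth_map):
    """Angle between the fitted plane's normal and the depth axis.

    depth_map: list of rows, each a list of depth values (at least 2x2).
    """
    rows, cols = len(depth_map), len(depth_map[0])
    if rows < 2 or cols < 2:
        raise ValueError("depth map must be at least 2x2")
    x_mean, y_mean = (cols - 1) / 2, (rows - 1) / 2
    # On a full grid the centered x and y coordinates are orthogonal, so the
    # least-squares slopes separate into two one-dimensional regressions.
    sxz = sum((x - x_mean) * depth_map[y][x]
              for y in range(rows) for x in range(cols))
    syz = sum((y - y_mean) * depth_map[y][x]
              for y in range(rows) for x in range(cols))
    sxx = rows * sum((x - x_mean) ** 2 for x in range(cols))
    syy = cols * sum((y - y_mean) ** 2 for y in range(rows))
    a, b = sxz / sxx, syz / syy
    # The normal is (-a, -b, 1), so tan(angle to the z axis) = sqrt(a^2 + b^2).
    return math.degrees(math.atan(math.hypot(a, b)))
```

A page lying square to the device gives an angle near zero, and a depth that rises by one unit per sample along one axis gives 45 degrees.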

7. The text recognition method according to claim 1, further comprising: determining a second target parameter based on the depth map, the second target parameter representing the angle between the normal vector of the plane of the document to be recognized and the vertical direction of the image acquisition device; switching the image acquisition device from a first working state to a second working state based on at least one of the first target parameter and the second target parameter, the second working state corresponding to the three-dimensional pose of the document to be recognized; and acquiring the second image using the image acquisition device in the second working state.

8. The text recognition method according to claim 7, wherein switching the image acquisition device from the first working state to the second working state based on at least one of the first target parameter and the second target parameter comprises at least one of the following: identifying raised areas on the surface of the document to be recognized based on the first target parameter, and increasing the supplementary light intensity of the image acquisition device for the raised areas; and adjusting the pitch angle of the image acquisition device based on the second target parameter so that the shooting direction of the image acquisition device directly faces the plane of the document to be recognized.

9. The text recognition method according to claim 2, wherein the optical signal is generated by: acquiring an image of the document to be recognized using the image acquisition device to obtain a third image; identifying the type of the document to be recognized based on the third image; and generating, based on the type of the document to be recognized, an optical signal having the target encoding format, the target encoding format matching the type of the document to be recognized.

10. An electronic device, comprising: an image acquisition device configured to acquire images of a document to be recognized to obtain a first image and a second image, the second image including text to be recognized; and a processor configured to: construct, based on the first image, a depth map corresponding to the document to be recognized, the depth map including the spatial distance between each sampling point on the surface of the document to be recognized and the image acquisition device; determine, based on the depth map, a first target parameter of the document to be recognized, the first target parameter characterizing the difference in distance between each sampling point on the surface of the document to be recognized and the image acquisition device; and recognize, based at least on the first target parameter, the text to be recognized in the second image.