Japanese translation recognition method based on image recognition

By generating a specular reflection probability map and performing image decomposition and repair, the problem of text recognition in Japanese documents under complex lighting conditions was solved, achieving efficient translation under strong light interference.

CN122244878A · Pending · Publication Date: 2026-06-19 · HAINAN VOCATIONAL COLLEGE OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HAINAN VOCATIONAL COLLEGE OF SCI & TECH
Filing Date
2026-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing image recognition technologies struggle to effectively handle text recognition in high-brightness areas under complex lighting conditions, especially in Japanese documents. Optical interference caused by specular reflection leads to decreased contrast and artifact noise. Existing methods cannot effectively decouple reflection and texture components, resulting in low recognition rates.

Method used

A specular reflection probability map is calculated and used by an image decomposition model that applies gradient direction orthogonality constraints and energy conservation constraints to decouple the image into a specular light component and a reflectivity component. The reflectivity component is then repaired through stroke flow field analysis and anisotropic diffusion, and a character recognition model with dynamic weighted suppression generates the Japanese translation result.

Benefits of technology

It significantly improves the robustness of Japanese text recognition and translation in strong light interference scenarios, and can accurately remove additive light spot noise from a single image and restore the text texture, ensuring accurate translation results in complex lighting environments.

✦ Generated by Eureka AI based on patent content.

Figure CN122244878A_ABST (abstract drawing)

Abstract

This application relates to the field of computer vision technology and discloses a Japanese translation recognition method based on image recognition. By constructing a two-color reflection model based on physical optics and a gradient mutual exclusion mechanism, this invention can accurately remove additive spot noise from a single image. In the saturated region where information is lost, it uses Eulerian elastic potential energy flow field reconstruction technology to achieve stroke repair that conforms to the topological inertia of writing. To address the inference risk introduced in the repair process, this invention innovatively integrates Bayesian uncertainty gating and semantic error correction mechanisms. It dynamically adjusts the weights of visual perception and language priors according to the credibility of physical repair, forcing the system to automatically fill semantic gaps in visually blurred regions using contextual logic. Combined with multi-hypothesis centroid vector retrieval technology, this invention effectively overcomes the vulnerability of traditional methods to overall translation failure due to local recognition errors under spot occlusion.

Description

Technical Field

[0001] This application relates to the field of computer vision technology, and in particular to a Japanese translation recognition method based on image recognition.

Background Technology

[0002] With the rapid development of cross-language information processing technology and of the computing power of mobile terminals, computer vision-based scene text recognition has been widely applied in fields such as tourism translation, business office work, and automated information collection. In practice, Japanese, with its complex character structure and diverse layouts, often appears in highly uncontrolled recognition scenarios. For example, in cross-border tourism or imported goods management, users frequently photograph and translate Japanese documents on the spot with mobile terminals. These documents typically include restaurant menus covered with polyethylene film, glossy coated magazine covers, product labels encased in transparent plastic, and promotional posters displayed behind glass shop windows. Such scenes commonly occur under complex indoor lighting or strong outdoor sunlight. Because these surfaces are highly reflective, incident light readily produces unpredictable interference during imaging, and this unstructured optical environment poses a significant challenge to subsequent image recognition.

[0003] When such high-gloss surfaces are imaged, the light source produces strong specular reflection at certain angles, so the light intensity received by local areas of the imaging sensor exceeds the sensor's dynamic range. This physical-layer optical interference drives local pixel values in the image to quantization saturation, truncating the brightness. As a direct consequence, the contrast between Japanese character strokes and the background in the highlight area drops sharply, and pixel gradient features disappear entirely. Japanese kanji and kana have extremely high stroke density and include many small voicing marks (dakuten and handakuten), so highlight occlusion removes key morphological features. Furthermore, the edges of specular reflection areas usually show abrupt brightness changes, which edge detection operators in image preprocessing easily mistake for character strokes or texture contours; this produces severe artifact noise and seriously damages the connectivity and integrity of text lines.

[0004] Existing image text recognition technologies have significant limitations in dealing with such lighting interference. Traditional binarization methods based on global or local thresholds rely primarily on the statistics of the image's grayscale histogram. These methods assume that lighting changes are gradual and continuous, so they cannot handle sudden nonlinear brightness jumps and often filter out or miss text inside highlight areas. Highlight removal techniques based on multi-frame fusion do exist, but they require highly stable acquisition equipment and multiple images taken from different angles, which makes them impractical for quick handheld capture on mobile devices. End-to-end deep learning recognition models have some feature extraction capability, but they lack an effective feature restoration mechanism when strong light spots physically destroy semantic information, and they cannot effectively decouple reflection and texture components within a single image. There is therefore an urgent need for an improved solution that starts from low-level visual signal processing to suppress specular reflection interference and restore text texture under highlights, addressing the low recognition rate of existing technologies in complex lighting environments.

Summary of the Invention

[0005] This application proposes a Japanese translation recognition method based on image recognition to address the problems mentioned in the background art.

[0006] To achieve the above objectives, this application adopts the following technical solution: a Japanese translation recognition method based on image recognition, comprising the following steps:

[0007] Step S1: Obtain the RGB image to be recognized containing Japanese text, calculate the approximation degree between the pixel brightness value in the RGB image to be recognized and the preset sensor saturation threshold, calculate the deviation degree between the pixel chromaticity and the local light source chromaticity in the RGB image to be recognized, and generate a specular reflection probability map based on the approximation degree between the pixel brightness value and the preset sensor saturation threshold and the deviation degree between the pixel chromaticity and the local light source chromaticity.

[0008] Step S2: Input the RGB image to be identified and the specular reflection probability map generated in step S1 into the image decomposition model. Use the specular reflection probability map as a weighted constraint mask. In the image decomposition model, perform gradient direction orthogonal constraints on the pixel saturated regions of the RGB image to be identified and perform energy conservation constraints on the unsaturated regions of the RGB image to be identified. Decouple the RGB image to be identified into specular light components and reflectivity components.

[0009] Step S3: Extract the edge gradient of the specular light component generated in step S2 as the repair boundary constraint; perform stroke flow field analysis on the reflectivity component generated in step S2 and extract the stroke tangential vector field of the region of the reflectivity component that is not affected by specular light interference; use the stroke tangential vector field to guide pixels to diffuse anisotropically into the data hole region covered by the specular light component; and generate the repaired reflectivity map and the repair uncertainty map from the anisotropic diffusion results.

[0010] Step S4: Input the repaired reflectance map generated in step S3 and the repair uncertainty map generated in step S3 into the character recognition model. In the character recognition model, the feature response of the repaired reflectance map is dynamically weighted and suppressed using the repair uncertainty map, and a Japanese character sequence is output. Based on the Japanese character sequence, the translation database is searched to generate a Japanese translation result.

[0011] Furthermore, in step S1, the specific operation of calculating the approximation degree between the pixel brightness value in the RGB image to be identified and the preset sensor saturation threshold is as follows:

[0012] For each pixel in the RGB image to be identified, the maximum voltage response value of the pixel in the red, green and blue optical channels is extracted as the channel peak response;

[0013] A thermal noise equivalent coefficient, which characterizes the combined standard deviation of sensor readout noise and thermal noise, is introduced to construct a photoelectric saturation potential energy function based on the Fermi-Dirac distribution.

[0014] The nonlinear mapping relationship between the channel peak response and the preset sensor saturation threshold is calculated using the photoelectric saturation potential energy function to generate the photoelectric cutoff probability term;

[0015] The thermal noise equivalent coefficient determines the response gradient of the photoelectric saturation potential energy function near the preset sensor saturation threshold, causing the photoelectric saturation potential energy function to exhibit a nonlinear numerical change near the preset sensor saturation threshold.

[0016] When the peak response of the channel is lower than the preset sensor saturation threshold, the photoelectric cutoff probability term is calculated to be a value close to zero by the photoelectric saturation potential energy function;

[0017] When the peak response of the channel approaches the preset sensor saturation threshold, the photoelectric cutoff probability term is calculated to be a value close to one by the photoelectric saturation potential energy function.

[0018] The photoelectric cutoff probability term is output to the processing flow that generates the specular reflection probability map, serving as the data basis for subsequent calculations.
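The photoelectric cutoff term in [0012] to [0018] can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes a logistic (Fermi-Dirac occupancy) form centered on the threshold, a hypothetical threshold of 0.95 on normalized values, and the coefficient 0.02 that the embodiment in [0085] prefers. Under this assumed form the term is exactly 0.5 at the threshold, so values near one appear only slightly above it.

```python
import math


def channel_peak(rgb):
    """Channel peak response: the largest normalized R, G or B value."""
    return max(rgb)


def photoelectric_cutoff(rgb, tau=0.95, sigma=0.02):
    """Soft saturation probability with an assumed Fermi-Dirac (logistic) form.

    rgb:   normalized (r, g, b) values in [0, 1]
    tau:   preset sensor saturation threshold (hypothetical value)
    sigma: thermal noise equivalent coefficient (0.02 per paragraph [0085])
    """
    z = (channel_peak(rgb) - tau) / sigma
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A mid-gray pixel such as (0.40, 0.55, 0.50) yields a value around 2e-9, while a near-white pixel (0.99, 1.0, 0.98) yields about 0.92.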

[0019] Further, in step S1, the specific operations of calculating the deviation degree between the pixel chromaticity and the local light source chromaticity in the RGB image to be identified, and of generating the specular reflection probability map from that deviation degree and the approximation degree of the pixel brightness value to the preset sensor saturation threshold, are as follows:

[0020] Within a local neighborhood window centered on the pixel in the RGB image to be identified, the set of pixels whose peak channel response is in a preset high brightness ratio range is selected.

[0021] The local light source chromaticity is obtained by calculating the geometric center of the color vectors of all pixels in the pixel set and performing normalization.
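A minimal Python sketch of the local light source chromaticity estimate in [0020] and [0021] follows. It assumes that the "preset high brightness ratio range" means pixels whose channel peak lies within a hypothetical fraction `ratio` of the brightest peak in the window, and it takes the geometric center to be the arithmetic mean of the selected color vectors.

```python
import math


def local_light_chroma(window, ratio=0.9):
    """Estimate the local light-source chromaticity from one neighborhood window.

    window: list of normalized (r, g, b) tuples around the center pixel
    ratio:  hypothetical high-brightness ratio; keep pixels whose channel peak
            is at least ratio times the brightest peak in the window
    """
    peaks = [max(p) for p in window]
    top = max(peaks)
    bright = [p for p, m in zip(window, peaks) if m >= ratio * top]
    # Geometric center (mean) of the selected color vectors.
    n = len(bright)
    center = [sum(p[c] for p in bright) / n for c in range(3)]
    # Normalize to unit length so only the direction (chromaticity) remains.
    norm = math.sqrt(sum(v * v for v in center)) or 1.0
    return tuple(v / norm for v in center)
```

In a window holding two near-white highlight pixels and one dark ink pixel, only the highlights are kept, and the estimate is the unit gray direction (about 0.577 per channel).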

[0022] Construct an inverse cosine spectral angle calculation model containing a numerical clamping operator to calculate the spectral angular distance between the pixel spectral vector of the pixel in the RGB image to be identified and the chromaticity of the local light source in the color space;

[0023] A surface roughness factor, which characterizes the micro-geometric roughness of the material surface, is introduced to weight the spectral angular distance, generating a spectral homochromaticity constraint term.

[0024] The numerical clamping operator numerically truncates the dot product of the pixel spectral vector and the local light source chromaticity of the pixel in the RGB image to be identified, so that the input value entering the inverse cosine spectral angle calculation model is limited to a closed interval between negative one and one.

[0025] The photoelectric cutoff probability term and the spectral homochromaticity constraint term are nonlinearly multiplied to generate a specular reflection probability map.

[0026] The values of the specular reflection probability map lie in the range of zero to one and characterize, for each pixel in the RGB image to be identified, the confidence that the pixel has lost its physical texture information because of photon flux overload and a spectral direction dominated by the light source.
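The spectral angle, numerical clamp and combination of [0022] to [0025] can be sketched as below. The exponential weighting by the roughness factor and the plain product used for the "nonlinear multiplication" are assumptions; the text gives neither formula.

```python
import math


def spectral_angle(pixel, light):
    """Angle in radians between a pixel color vector and the light chromaticity."""
    dot = sum(p * l for p, l in zip(pixel, light))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(l * l for l in light))
    if norm == 0.0:
        return math.pi / 2  # hypothetical convention for black pixels
    # Numerical clamping operator: keep the acos argument inside [-1, 1].
    c = max(-1.0, min(1.0, dot / norm))
    return math.acos(c)


def homochromaticity(pixel, light, roughness=0.1):
    """Spectral homochromaticity term: near 1 when the pixel has the light's color.

    roughness: hypothetical surface roughness factor; rougher surfaces
               tolerate a larger angular deviation.
    """
    return math.exp(-spectral_angle(pixel, light) / roughness)


def specular_probability(p_cut, pixel, light, roughness=0.1):
    """Combine cutoff and homochromaticity terms into a value in [0, 1]."""
    return p_cut * homochromaticity(pixel, light, roughness)
```

Without the clamp, rounding can push the cosine slightly above 1 for a pixel that exactly matches the light color, and `math.acos` would raise a `ValueError`.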

[0027] Furthermore, in step S2, the specific operation of performing energy conservation constraints in the unsaturated region of the RGB image to be identified is as follows:

[0028] The image decomposition model constructs an additive noise inverse operation logic based on the superposition principle of physical optics, which identifies the RGB image to be identified as the linear superposition result of the diffuse reflection light component carrying text information and the specular reflection light with the mirror properties of a pure light source;

[0029] The image decomposition model identifies the pixel coordinates with values close to zero in the specular reflection probability map as the unsaturated region of the RGB image to be identified, and determines that within this region the sensor is in its linear operating range and the physical information is intact.

[0030] The image decomposition model constructs a weighted variational decomposition functional that includes the energy conservation term in the unsaturated region, and uses the inverse value of the specular reflection probability map as the confidence weight of the energy conservation term in the unsaturated region.

[0031] The image decomposition model introduces into the energy conservation term in the unsaturated region a robust penalty potential energy function that models the statistical distribution of physical measurement errors. The robust penalty potential energy function is built from a square root containing a small positive constant, which improves tolerance to outlier noise.

[0032] The image decomposition model forces the difference between the RGB image to be identified and the specular light component predicted by the image decomposition model to be equal to the reflectance component predicted by the image decomposition model by minimizing the robust penalty potential function. This achieves pixel fidelity constraint on the reflectance component in the unsaturated region, ensuring that the reflectance component retains the original color and texture features of the RGB image to be identified.
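A minimal sketch of the weighted energy conservation term of [0028] to [0032] follows. It assumes the additive model I = S + R, a Charbonnier penalty (the square root with a small positive constant described in [0031]), and the complement 1 - P as the "inverse value" of the specular reflection probability map. Function names and the eps value are hypothetical, and a single channel is used for brevity.

```python
import math


def charbonnier(x, eps=1e-3):
    """Robust penalty sqrt(x^2 + eps^2); grows only linearly for outliers."""
    return math.sqrt(x * x + eps * eps)


def energy_conservation_term(image, specular, reflectance, prob, eps=1e-3):
    """Weighted fidelity term enforcing I = S + R where the sensor is unsaturated.

    All inputs are same-shaped 2-D lists of floats. Each residual I - S - R is
    weighted by (1 - P), so pixels with high specular probability barely count.
    """
    total = 0.0
    for i_row, s_row, r_row, p_row in zip(image, specular, reflectance, prob):
        for i, s, r, p in zip(i_row, s_row, r_row, p_row):
            total += (1.0 - p) * charbonnier(i - s - r, eps)
    return total
```

An exact decomposition leaves only the eps floor per pixel, and a fully saturated pixel (P = 1) contributes nothing however large its residual.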

[0033] Furthermore, in step S2, the specific operation of performing gradient direction orthogonal constraints on the pixel saturation regions of the RGB image to be identified in the image decomposition model is to perform gradient mutual exclusion calculation:

[0034] The image decomposition model identifies the pixel coordinates with values close to one in the specular reflection probability map as the pixel saturation region of the RGB image to be identified, and determines that the energy conservation constraint fails due to physical truncation within the pixel saturation region of the RGB image to be identified.

[0035] The image decomposition model introduces a gradient mutually exclusive structure separation mechanism based on the assumption of signal source independence, which assumes that the edges of the specular light component, determined by the shape of the light source, and the edges of the reflectivity component, determined by the text strokes, do not overlap in spatial distribution.

[0036] The image decomposition model constructs gradient mutual exclusion terms in the saturation region and calculates the spatial gradient fields of the specular light component and the reflectivity component, respectively.

[0037] The image decomposition model calculates the vector magnitude product of the spatial gradient fields of the specular light component and the spatial gradient fields of the reflectivity component at corresponding positions, and introduces a hyperbolic tangent limiting operator to perform numerical saturation processing on the vector magnitude product in order to constrain the gradient response range.

[0038] The image decomposition model introduces a gradient activation sensitivity parameter to increase the rejection gain for weak edges. Minimizing the gradient mutual exclusion term in the saturation region forces the spatial gradient of the reflectivity component toward zero wherever the specular light component has a spatial gradient in the pixel saturation region of the RGB image to be identified. The light spot edges of the specular light component are thus spatially separated from the stroke structure of the reflectivity component through the structural independence constraint.
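The gradient mutual exclusion term of [0036] to [0038] might be evaluated as in the sketch below. Forward differences, weighting by the specular reflection probability P, and the value of the sensitivity k are assumptions made for illustration.

```python
import math


def grad_mag(img, y, x):
    """Forward-difference gradient magnitude (zero gradient past the border)."""
    h, w = len(img), len(img[0])
    gx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
    gy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
    return math.hypot(gx, gy)


def exclusion_term(specular, reflectance, prob, k=10.0):
    """Gradient mutual exclusion energy, weighted toward saturated pixels.

    k: hypothetical gradient activation sensitivity; a larger k makes weak
       edge co-occurrences saturate the tanh sooner, so they are penalized more.
    """
    total = 0.0
    for y in range(len(specular)):
        for x in range(len(specular[0])):
            product = grad_mag(specular, y, x) * grad_mag(reflectance, y, x)
            # tanh bounds each pixel's penalty to [0, 1).
            total += prob[y][x] * math.tanh(k * product)
    return total
```

An edge present in only one component costs nothing; the same edge in both components costs nearly 1 per pixel inside the saturated region.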

[0039] Furthermore, in step S3, the stroke flow field analysis is performed on the reflectivity components generated in step S2. The specific operation of extracting the stroke tangential vector field of the region not affected by specular interference in the reflectivity components is to perform flow field reconstruction based on Euler elastic potential energy:

[0040] The pixel coordinates in the specular light component whose brightness response exceeds the preset specular threshold are identified as data hole regions, and the pixel coordinates in the reflectivity component located outside the data hole regions are identified as valid observation regions.

[0041] The structure tensor of the reflectivity component is extracted at the boundary between the valid observation region and the data void region. Eigenvalue decomposition is performed on the structure tensor, and the unit eigenvector corresponding to the minimum eigenvalue is extracted and defined as the boundary tangential field.
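For a 2 x 2 structure tensor, the minimum-eigenvalue eigenvector in [0041] has a closed form, sketched below. The window of gradient samples is a hypothetical input; a full implementation would typically also smooth the tensor entries with a Gaussian.

```python
import math


def boundary_tangent(gx_list, gy_list):
    """Stroke tangent from the 2x2 structure tensor of a small window.

    gx_list, gy_list: reflectance gradients sampled in a window at the
    valid/void boundary (hypothetical inputs). Returns the unit eigenvector
    of the smaller eigenvalue, the direction of least intensity change.
    """
    jxx = sum(gx * gx for gx in gx_list)
    jxy = sum(gx * gy for gx, gy in zip(gx_list, gy_list))
    jyy = sum(gy * gy for gy in gy_list)
    # Orientation of the dominant (larger-eigenvalue) eigenvector.
    theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    # The smaller-eigenvalue eigenvector is perpendicular to it.
    return (-math.sin(theta), math.cos(theta))
```

A horizontal stroke has purely vertical gradients, so the returned tangent is horizontal (up to sign), and vice versa.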

[0042] An Euler elastic potential energy model incorporating curve bending stiffness parameters and compression modulus parameters is introduced to construct an energy functional concerning the flow field angular phase within the data void region.

[0043] The energy functional contains a smoothing term that penalizes fluctuation of the flow field and a curvature term that penalizes divergence of the streamlines. The energy functional is minimized over the flow field inside the data void region, and the reconstructed stroke tangential vector field is output. The reconstructed stroke tangential vector field satisfies the geometric continuity constraint and the curvature consistency constraint inside the data void region, so it exhibits the bending behavior of a physical elastic rod and topologically extends the turning features and writing inertia of Japanese characters.
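One plausible continuous form of the energy functional in [0042] and [0043] is shown below, written over the flow field angle θ in the data void region Ω_h, with the boundary tangential field θ_∂ from [0041] as the boundary condition. The smoothing weight a and the curvature weight b stand in for the compression modulus and curve bending stiffness parameters; that correspondence, and the use of the divergence of the unit field as the curvature measure, are assumptions rather than formulas given in the text.

```latex
\begin{aligned}
E[\theta] &= \int_{\Omega_h} \Big( a\,\lvert \nabla\theta \rvert^{2}
          + b\,\big(\nabla \cdot \mathbf{v}_{\theta}\big)^{2} \Big)\, d\mathbf{x},
\qquad \mathbf{v}_{\theta} = (\cos\theta,\ \sin\theta), \\
\nabla \cdot \mathbf{v}_{\theta} &= -\sin\theta\,\partial_x\theta + \cos\theta\,\partial_y\theta,
\qquad \theta\big|_{\partial\Omega_h} = \theta_{\partial}.
\end{aligned}
```

Working with the angle θ rather than the vector components keeps the reconstructed field unit-length by construction.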

[0044] Furthermore, in step S3, the specific operation of using the stroke tangential vector field to guide pixels to diffuse anisotropically into the data hole region covered by the specular light component is to perform a shock-assisted transport-reaction-diffusion calculation:

[0045] Based on the reconstructed stroke tangential vector field, a structural diffusion tensor is constructed. The structural diffusion tensor is defined as an anisotropic matrix with high conductivity along the direction of the reconstructed stroke tangential vector field and low conductivity perpendicular to the direction of the reconstructed stroke tangential vector field.
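The structural diffusion tensor of [0045] can be written as D = k_par t t^T + k_perp n n^T, where t is the unit stroke tangent and n its normal. The sketch below assumes this standard eigen-decomposed construction with hypothetical conductivities k_par > k_perp.

```python
def diffusion_tensor(tx, ty, k_par=1.0, k_perp=0.05):
    """Anisotropic diffusion tensor D = k_par * t t^T + k_perp * n n^T.

    (tx, ty):      unit stroke tangent; n = (-ty, tx) is the normal
    k_par, k_perp: hypothetical conductivities along / across the stroke
    Returns D as a nested 2x2 list.
    """
    nx, ny = -ty, tx
    return [
        [k_par * tx * tx + k_perp * nx * nx, k_par * tx * ty + k_perp * nx * ny],
        [k_par * ty * tx + k_perp * ny * nx, k_par * ty * ty + k_perp * ny * ny],
    ]
```

By construction, t is an eigenvector of D with eigenvalue k_par and n is one with eigenvalue k_perp, so diffusion runs mainly along the stroke.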

[0046] A partial differential equation containing transport, diffusion, and shock wave response terms is constructed. The transport term drives the texture pixels of the reflectivity component to undergo pure convective displacement along the streamlines of the reconstructed stroke tangential vector field.

[0047] Thermal diffusion mapping is performed using the diffusion term along the tangent direction of the reconstructed stroke tangential vector field;

[0048] A shock wave filter based on the Hessian matrix is introduced as the shock wave response term, and the second directional derivative of the reflectivity component in the edge normal direction is calculated.

[0049] When the second directional derivative is positive, reverse diffusion calculation is performed to gather edge energy; when the second directional derivative is negative, forward diffusion calculation is performed, and the edge expansion caused by thermal diffusion is numerically compensated through the shock wave response term to generate a repaired reflectivity map.

[0050] The stroke edges in the repaired reflectivity map exhibit the sharpening characteristics determined by the shock wave response term.
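The sign-switching behavior described in [0049] matches the classical Osher-Rudin shock filter, u_t = -sign(u_xx)|u_x|. The sketch below shows only that shock term, in one dimension, with a minmod slope limiter; the transport and diffusion terms of [0046] and [0047] and the 2-D edge-normal derivative are omitted, and dt and tol are hypothetical.

```python
def minmod(a, b):
    """Slope limiter: the smaller-magnitude slope if both agree in sign, else 0."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def shock_step(u, dt=0.5, tol=1e-12):
    """One explicit step of a 1-D Osher-Rudin shock filter.

    Where u_xx > 0 (dark side of an edge) values move down; where u_xx < 0
    (bright side) they move up, which steepens a blurred edge. Endpoints
    are left unchanged.
    """
    out = list(u)
    for i in range(1, len(u) - 1):
        uxx = u[i + 1] - 2.0 * u[i] + u[i - 1]
        ux = minmod(u[i + 1] - u[i], u[i] - u[i - 1])
        sign = 0 if abs(uxx) <= tol else (1 if uxx > 0 else -1)
        out[i] = u[i] - dt * sign * abs(ux)
    return out
```

For the blurred edge [0, 0, 0.125, 0.5, 0.875, 1, 1], one step returns [0, 0, 0.0625, 0.5, 0.9375, 1, 1]: the shoulders move apart and the edge steepens without overshooting the original range.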

[0051] Furthermore, in step S3, the specific operation of generating the repair uncertainty map based on the anisotropic diffusion results is to perform an entropy increment calculation based on a Riemannian metric:

[0052] A confidence decay analysis based on the second law of thermodynamics is performed inside the data void region, establishing that the confidence of the repaired reflectivity component decreases nonlinearly as the propagation distance increases.

[0053] The Riemann metric space inside the data hole region is defined using the inverse matrix of the structure diffusion tensor;

[0054] The minimum physical propagation time required for a pixel in the data void region to reach the boundary of the valid observation region within the Riemann metric space is defined as the Riemann geodesic distance.

[0055] In the calculation of Riemann geodesic distance, the path weight along the reconstructed stroke tangential vector field direction is less than the path weight perpendicular to the reconstructed stroke tangential vector field direction.

[0056] An exponential decay function maps the Riemann geodesic distance to a probability value in the range of zero to one, generating a repair uncertainty map that quantifies the inference uncertainty introduced by anisotropic diffusion within the data void region. The repair uncertainty map is output to the subsequent step, where it adjusts the feature response weights of the character recognition model.
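The geodesic distance of [0053] to [0056] can be approximated with Dijkstra's algorithm on the pixel grid, pricing each step with the metric M = D^{-1}. In this sketch the stroke tangent is a single hypothetical constant direction, the neighborhood is 4-connected, and the decay is uncertainty = 1 - exp(-d / scale); all of these are simplifying assumptions.

```python
import heapq
import math


def repair_uncertainty(hole, tangent=(1.0, 0.0), k_par=1.0, k_perp=0.1, scale=2.0):
    """Repair uncertainty from anisotropic geodesic distance to valid pixels.

    hole:    2-D list of bools, True inside the data void region
    tangent: hypothetical constant stroke tangent (the method uses a field)
    Metric M = t t^T / k_par + n n^T / k_perp, so moving along the stroke is
    cheap and moving across it is expensive.
    Returns 1 - exp(-d / scale): 0 on valid pixels, rising toward 1 inside.
    """
    h, w = len(hole), len(hole[0])
    tx, ty = tangent
    nx, ny = -ty, tx

    def step_cost(dx, dy):
        # sqrt(s^T M s) for the grid step s = (dx, dy)
        along = dx * tx + dy * ty
        across = dx * nx + dy * ny
        return math.sqrt(along * along / k_par + across * across / k_perp)

    dist = [[math.inf] * w for _ in range(h)]
    heap = []
    for y in range(h):
        for x in range(w):
            if not hole[y][x]:  # valid observation pixels are distance 0
                dist[y][x] = 0.0
                heap.append((0.0, y, x))
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale queue entry
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                nd = d + step_cost(dx, dy)
                if nd < dist[yy][xx]:
                    dist[yy][xx] = nd
                    heapq.heappush(heap, (nd, yy, xx))
    return [[1.0 - math.exp(-d / scale) for d in row] for row in dist]
```

For a 3 x 5 hole with a horizontal tangent, the center pixel is 3 cheap horizontal steps from valid data but 2 expensive vertical steps (about 6.3), so its distance is 3; rotating the tangent to vertical makes the same pixel less uncertain.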

[0057] Furthermore, in step S4, the specific operation of dynamically weighting and suppressing the feature responses of the repaired reflectivity map using the repair uncertainty map in the character recognition model is to perform a feature fusion calculation that minimizes Bayesian risk:

[0058] The character recognition model treats feature extraction as signal demodulation over a noisy channel. It uses a deep convolutional neural network to process the repaired reflectivity map generated in step S3 and outputs the original visual feature tensor.

[0059] The character recognition model synchronously acquires the repair uncertainty map generated in step S3, and downsamples the spatial resolution of the repair uncertainty map to the same dimension as the spatial resolution of the original visual feature tensor.

[0060] The character recognition model constructs a nonlinear reliability gating function based on the Weber-Fechner law. The nonlinear reliability gating function uses the pixel values of the repair uncertainty map to calculate the transmittance weights in the feature space.

[0061] The nonlinear reliability gating function includes a sigmoid activation transformation and a dark current constant term;

[0062] The dark current constant term sets a lower bound on the transmittance weight, thereby preserving gradient backpropagation paths in regions where the pixel values of the repair uncertainty map approach one during the calculation of the feature response.

[0063] The character recognition model calculates the Hadamard product of the original visual feature tensor and the transmittance weights to generate a Bayesian weighted feature map.

[0064] The Bayesian weighted feature map performs numerical attenuation of the feature response intensity in local regions where the pixel value of the corresponding repair uncertainty map approaches one, and performs feature preservation of the original visual feature tensor in local regions where the pixel value of the corresponding repair uncertainty map approaches zero.
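Steps [0059] to [0064] can be sketched as average pooling of the uncertainty map, a reliability gate, and an element-wise (Hadamard) product. The gate below is an assumed decreasing sigmoid with a constant floor standing in for the dark current term; the Weber-Fechner component of [0060] is not modeled because the text does not give its formula, and k, center and floor are hypothetical.

```python
import math


def avg_pool(grid, f):
    """Downsample a 2-D list by averaging non-overlapping f x f blocks."""
    h, w = len(grid) // f, len(grid[0]) // f
    return [[sum(grid[y * f + dy][x * f + dx] for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(w)] for y in range(h)]


def transmittance(u, k=10.0, center=0.5, floor=0.05):
    """Reliability gate: about 1 for trusted pixels, never below floor.

    floor plays the role of the dark current constant term and keeps a
    nonzero gradient path even where the uncertainty u approaches one.
    """
    s = 1.0 / (1.0 + math.exp(k * (u - center)))
    return floor + (1.0 - floor) * s


def gate_features(features, uncertainty, f):
    """Hadamard product of a (C, H, W) feature tensor with the pooled gate."""
    gate = [[transmittance(u) for u in row] for row in avg_pool(uncertainty, f)]
    return [[[v * g for v, g in zip(frow, grow)] for frow, grow in zip(ch, gate)]
            for ch in features]
```

A feature cell over fully trusted pixels keeps about 99% of its response, while one over a fully uncertain repair keeps roughly 6%, not zero.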

[0065] Furthermore, in step S4, the specific operations for outputting the Japanese character sequence, retrieving the translation database based on the Japanese character sequence, and generating the Japanese translation result are: performing context-prior-driven sequence decoding and fault-tolerant retrieval in the multi-hypothesis semantic space.

[0066] The character recognition model performs sequence decoding computation based on maximum a posteriori probability estimation using Bayesian weighted feature maps;

[0067] During the sequence decoding process, the character recognition model jointly calculates the visual likelihood probability term determined by the Bayesian weighted feature map and the language prior probability term determined by the pre-trained Japanese language model.

[0068] When the feature response intensity of the Bayesian weighted feature map in a local region is lower than the preset response threshold, the sequence decoding calculation reduces the contribution weight of the visual likelihood probability term to the generation of the Japanese character sequence and uses the language prior probability term to perform semantic error correction on the Japanese character sequence.
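The weighting rule in [0067] and [0068] can be illustrated for a single character position. This sketch assumes log-linear fusion of the two probability terms with a fixed down-weighting factor; the threshold and weight are hypothetical. In the example, glare has erased the voicing mark, so the visual model prefers は while the language prior prefers ば.

```python
import math


def fused_scores(visual, prior, response, threshold=0.3, low_weight=0.2):
    """Combine visual and language-model log-probabilities for one position.

    visual, prior: dicts mapping candidate characters to probabilities
    response:      gated feature response at this position (hypothetical scalar)
    Below the threshold, the visual term is scaled by low_weight so the
    language prior dominates.
    """
    alpha = 1.0 if response >= threshold else low_weight
    return {c: alpha * math.log(visual[c]) + math.log(prior[c]) for c in visual}


visual = {"は": 0.9, "ば": 0.1}
prior = {"は": 0.3, "ば": 0.7}
strong = fused_scores(visual, prior, response=0.8)
weak = fused_scores(visual, prior, response=0.1)
```

With a strong response the visual term wins and は scores highest; with a weak response the language prior takes over and ば scores highest.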

[0069] The character recognition model uses a beam search strategy to generate a candidate set containing multiple candidate character sequences;

[0070] The character recognition model calculates the predicted probability of each candidate character sequence in the candidate set, and uses the predicted probability to calculate the weighted average of the semantic vectors corresponding to each candidate character sequence, and outputs the centroid query vector.

[0071] The character recognition model retrieves the translated text with the highest cosine similarity to the centroid query vector from the translation database and outputs the Japanese translation result.

[0072] The cosine similarity calculation adds a small numerically stabilizing constant to keep the denominator nonzero, and the calculation of the centroid query vector exploits the clustering characteristics of the high-dimensional semantic space.
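The centroid query and retrieval in [0070] to [0072] might look like the sketch below. The database layout (a list of translation and vector pairs), the renormalization of beam probabilities, and eps are assumptions; how the semantic vectors are produced is not specified in the text.

```python
import math


def centroid_query(candidates):
    """Probability-weighted average of candidate semantic vectors.

    candidates: list of (probability, vector) pairs from beam search;
    probabilities are renormalized over the beam.
    """
    total = sum(p for p, _ in candidates)
    dim = len(candidates[0][1])
    return [sum(p * v[i] for p, v in candidates) / total for i in range(dim)]


def cosine(a, b, eps=1e-8):
    """Cosine similarity with eps keeping the denominator nonzero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def retrieve(candidates, database):
    """Return the translation whose stored vector is closest to the centroid.

    database: list of (translation, vector) pairs (hypothetical structure).
    """
    q = centroid_query(candidates)
    return max(database, key=lambda item: cosine(q, item[1]))[0]
```

Because the query is a weighted blend, a wrong top-ranked candidate whose vector still lies near the correct cluster pulls the query only partway off target, which is the robustness the centroid approach relies on.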

[0073] The beneficial effects of this invention are as follows:

[0074] This invention significantly improves the robustness of Japanese text recognition and translation in strong light interference scenarios. By constructing a two-color reflection model based on physical optics and a gradient mutual exclusion mechanism, this invention can accurately remove additive spot noise from a single image. In saturated regions where information is lost, it utilizes Eulerian elastic potential energy flow field reconstruction technology to achieve stroke restoration that conforms to the topological inertia of writing. To address the inference risks introduced during the restoration process, this invention innovatively integrates Bayesian uncertainty gating and semantic error correction mechanisms. It dynamically adjusts the weights of visual perception and language priors based on the credibility of physical restoration, forcing the system to automatically fill semantic gaps in visually blurred regions using contextual logic. Combined with multi-hypothesis centroid vector retrieval technology, this invention effectively overcomes the vulnerability of traditional methods to overall translation failure due to local recognition errors under spot occlusion, ensuring accurate and fluent translation results even in complex lighting environments.

Attached Figure Description

[0075] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort:

[0076] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0077] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0078] Example

[0079] As shown in Figure 1, this invention discloses a Japanese translation recognition method based on image recognition, comprising the following steps:

[0080] Step S1: Obtain the RGB image to be recognized containing Japanese text, calculate the approximation degree between the pixel brightness value in the RGB image to be recognized and the preset sensor saturation threshold, calculate the deviation degree between the pixel chromaticity and the local light source chromaticity in the RGB image to be recognized, and generate a specular reflection probability map based on the approximation degree between the pixel brightness value and the preset sensor saturation threshold and the deviation degree between the pixel chromaticity and the local light source chromaticity.

[0081] In a preferred embodiment of the present invention, step S1 distinguishes, at the physical level, between valid white paper backgrounds and invalid specular reflection spots, providing accurate confidence priors for subsequent image restoration.

[0082] Specifically, in step S1, the operation of calculating the approximation of the pixel brightness value in the RGB image to be identified to the preset sensor saturation threshold is as follows: for the pixel in the RGB image to be identified, extract the maximum voltage response value of the pixel in the red, green and blue optical channels as the channel peak response.

[0083] Specifically, the acquired RGB image to be identified is first normalized. This maps the pixel values to a floating-point range of zero to one, corresponding to the voltage response range of the photoelectric sensor from the dark current level to full well capacity. The average brightness of each pixel is not used. Instead, the maximum of the pixel's red, green and blue channel values is extracted by comparison. The rationale comes from the photoelectric effect: as soon as the number of photoelectrons in any single color channel exceeds the potential well capacity, the physical information of the pixel is irreversibly distorted. The maximum channel response must therefore serve as the criterion.

[0084] A thermal noise equivalent coefficient, which characterizes the combined standard deviation of sensor readout noise and thermal noise, is introduced to construct a photoelectric saturation potential energy function based on the Fermi-Dirac distribution.

[0085] In this embodiment, the thermal noise equivalent coefficient is preferably set to 0.02. This value is based on Johnson-Nyquist thermal noise theory and on measured data from common complementary metal-oxide-semiconductor (CMOS) sensors. At room temperature, the standard deviation of the background noise produced by the thermal motion of charge carriers and by the circuit readout process is about two percent of full scale. In the algorithm, this coefficient controls how soft the saturation decision is, that is, the width of the fuzzy boundary between the sensor's linear region and its saturation region. It prevents transient high values caused by random thermal noise from being misjudged as physical saturation. The photoelectric saturation potential energy function takes the mathematical form of the Fermi distribution and uses the natural exponential function to model the statistical distribution of electrons at the edge of the potential well.

[0086] The photoelectric saturation potential energy function nonlinearly maps the channel peak response, relative to the preset sensor saturation threshold, to a photoelectric truncation probability term. The thermal noise equivalent coefficient determines the slope of this function near the preset sensor saturation threshold, so that the function changes nonlinearly in the vicinity of the threshold.

[0087] In this embodiment, the preset sensor saturation threshold is set to 0.98. The physical basis for this value comes from linearity testing of the sensor's photoelectric conversion characteristics. These tests showed that once the accumulated photoelectrons reach 98% of full-well capacity, the photoelectric response curve begins to bend noticeably. At that point the pixel value has not yet reached 1, but its physical information is already distorted.

[0088] The specific calculation process is as follows:

1. Subtract the preset sensor saturation threshold from the channel peak response.
2. Divide this difference by the thermal noise equivalent coefficient to obtain the normalized deviation.
3. Raise the natural constant e to the power of the negative normalized deviation.
4. Add one to this power and take the reciprocal.

In symbols, with channel peak response m, threshold τ and coefficient σ, the term is 1 / (1 + exp(-(m - τ)/σ)). This produces a smooth S-shaped curve around the threshold rather than a sudden step.

[0089] When the channel peak response is well below the preset sensor saturation threshold, the photoelectric truncation probability term computed by the photoelectric saturation potential energy function is close to zero. As the channel peak response reaches and exceeds the threshold, the term rises toward one. The photoelectric truncation probability term is passed to the process that generates the specular reflection probability map, where it serves as the data basis for subsequent calculations.

[0090] For text or background pixels that are little affected by the lighting and lie in the linear response zone, this mechanism yields a photoelectric truncation probability term extremely close to zero. For pixels saturated by charge overflow under strong light, the term rises rapidly toward one whatever their original color. This step quantifies, from an optoelectronic perspective, the likelihood that each pixel has been physically truncated.
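The following is a minimal sketch of the photoelectric truncation probability term from [0083] to [0089]. It assumes 8-bit RGB input, and the function names `channel_peak_response` and `truncation_probability` are hypothetical. It illustrates the formula under those assumptions and is not the patented implementation. With τ = 0.98 and σ = 0.02, the term is exactly 0.5 at the threshold. Because normalized values cannot exceed 1, the term is at most 1/(1 + e^-1) ≈ 0.73.

```python
import math

SATURATION_THRESHOLD = 0.98  # tau, preset sensor saturation threshold ([0087])
THERMAL_NOISE_COEFF = 0.02   # sigma, thermal noise equivalent coefficient ([0085])


def channel_peak_response(rgb8):
    """Normalize an 8-bit (R, G, B) triple to [0, 1] and return the largest channel."""
    # Assumption: 8-bit input; other bit depths would divide by their own full scale.
    return max(channel / 255.0 for channel in rgb8)


def truncation_probability(peak, tau=SATURATION_THRESHOLD, sigma=THERMAL_NOISE_COEFF):
    """Fermi-Dirac style soft step 1 / (1 + exp(-(peak - tau) / sigma))."""
    normalized_deviation = (peak - tau) / sigma
    return 1.0 / (1.0 + math.exp(-normalized_deviation))
```

A mid-gray pixel such as (128, 128, 128) yields a term on the order of 1e-11, while a pixel with any channel at 255 yields about 0.73.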

[0091] In step S1, the deviation between pixel chromaticity and the local light source chromaticity is calculated, and the specular reflection probability map is generated, as follows:

1. Within a local neighborhood window centered on each pixel of the RGB image to be identified, select the set of pixels whose channel peak response lies in a preset high-brightness ratio range.
2. Compute the geometric center of the color vectors of all pixels in this set.
3. Normalize the result to obtain the local light source chromaticity.

[0092] Specifically, in order to adapt to the uneven lighting (such as side lighting from a desk lamp) commonly found in Japanese document shooting scenarios, this embodiment defines the size of the local neighborhood window as 32 by 32 pixels. This size choice balances the sufficiency of statistical samples with the spatial locality of lighting changes.

[0093] The preset high brightness ratio range is set to the top 5% of pixels ranked by brightness within the window. According to the dichromatic reflection model, on most media surfaces the brightest pixels are dominated by the specular reflection of the light source rather than by the diffuse reflection of the object. This pixel set therefore best represents the color of the light source itself. The color vectors of these top 5% of pixels are averaged to their geometric center, and the magnitude of the result is normalized. The outcome is a local light source chromaticity that contains only directional information.
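The neighborhood estimate can be sketched as follows, under some assumptions. The window is passed as a flat list of normalized (r, g, b) tuples (for example, the 1024 pixels of a 32 by 32 window). "Geometric center" is read as the arithmetic mean of the selected vectors. An all-black window falls back to neutral white. `local_light_chromaticity` is a hypothetical name, and the fallback is an assumption not stated in the source.

```python
import math


def local_light_chromaticity(window_pixels, top_fraction=0.05):
    """Estimate a unit-length light-source chromaticity from one neighborhood window.

    window_pixels: list of (r, g, b) floats in [0, 1].
    """
    # Rank pixels by channel peak response (max over R, G, B), brightest first.
    ranked = sorted(window_pixels, key=max, reverse=True)
    # Keep the top 5 %, but at least one pixel so tiny windows still work.
    count = max(1, int(len(ranked) * top_fraction))
    brightest = ranked[:count]
    # Geometric center (arithmetic mean) of the selected color vectors.
    center = [sum(p[c] for p in brightest) / count for c in range(3)]
    norm = math.sqrt(sum(v * v for v in center))
    if norm == 0.0:
        # Assumption: a black window has no chromaticity; fall back to neutral white.
        return (1 / math.sqrt(3),) * 3
    # Keep only the direction of the color vector.
    return tuple(v / norm for v in center)
```

Sorting the whole window is the simplest reading of "top 5% of the brightness ranking". A partial selection would be cheaper for large windows.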

[0094] An inverse cosine spectral angle calculation model containing a numerical clamping operator is constructed. It calculates, in color space, the spectral angular distance between the spectral vector of each pixel in the RGB image to be identified and the local light source chromaticity.

[0095] During the calculation, the dot product of the current pixel's spectral vector and the local light source chromaticity is divided by the product of their magnitudes to obtain the cosine value. A numerical clamping operator must then restrict the cosine value to the interval from -1 + 10^-7 to 1 - 10^-7. The operator and its tiny margin of 10^-7 are needed because floating-point arithmetic can produce small precision overflows (e.g., 1.0000001) for parallel or antiparallel vectors. The inverse cosine is undefined for such inputs, which can crash the program. Clamping ensures that the algorithm outputs a valid spectral angular distance under any extreme lighting condition.
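The clamped spectral angle can be sketched in plain Python. `spectral_angle` is a hypothetical name. Returning π/2 for an all-zero pixel is an assumption, because the source does not say how black pixels are handled.

```python
import math

CLAMP_EPS = 1e-7  # tiny margin kept away from +/-1 ([0095])


def spectral_angle(pixel, light):
    """Angle in radians between a pixel's RGB vector and the light chromaticity."""
    dot = sum(p * l for p, l in zip(pixel, light))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(l * l for l in light))
    if norm == 0.0:
        # Assumption: a black pixel has no direction; treat it as maximally distant
        # (pi / 2 is the largest angle between non-negative RGB vectors).
        return math.pi / 2
    cosine = dot / norm
    # Clamp into [-1 + eps, 1 - eps] so acos never receives e.g. 1.0000001.
    cosine = min(max(cosine, -1.0 + CLAMP_EPS), 1.0 - CLAMP_EPS)
    return math.acos(cosine)
```

One side effect of the clamp is that perfectly parallel vectors get an angle of about sqrt(2e-7) ≈ 4.5e-4 rad rather than exactly zero. This is negligible next to a roughness factor of 0.15.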

[0096] A surface roughness factor, which characterizes the microscopic geometric roughness of the material surface, is introduced to perform weighted calculations on the spectral angular distance, generating a spectral homochromaticity constraint term.

[0097] In this embodiment, the surface roughness factor is set to 0.15. This value was selected, on the basis of microfacet theory, from measurements of the optical properties of common printed materials (such as laminated menus and coated magazine paper). The smoother the surface, the smaller the value. A smaller value means a narrower reflected beam and a lower tolerance for color deviation.

[0098] The calculation logic is as follows: square the spectral angular distance, divide it by twice the square of the surface roughness factor, negate the quotient, and raise the natural constant e to that power. For spectral angular distance θ and roughness factor ρ, this gives exp(-θ²/(2ρ²)). This Gaussian weighting function ensures that the spectral homochromaticity constraint term approaches one only when the pixel color is extremely close to the light source color.

[0099] The numerical clamping operator truncates the normalized dot product (the cosine) of the pixel spectral vector and the local light source chromaticity. This keeps the input to the inverse cosine spectral angle calculation model within the closed interval between -1 and 1. The photoelectric truncation probability term and the spectral homochromaticity constraint term are then multiplied, a nonlinear coupling, to generate the specular reflection probability map. Its values lie between zero and one. Each value represents the confidence that the corresponding pixel of the RGB image to be identified has lost its physical texture information because of photon flux overload combined with a spectral direction dominated by the light source.

[0100] Finally, the photoelectric truncation probability term is directly multiplied by the spectral homochromaticity constraint term. This multiplication coupling mechanism ensures that the generated specular reflection probability map will only output a high confidence level when a pixel simultaneously satisfies the two physical conditions of extremely high brightness leading to physical truncation and extremely consistent color with the light source.
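The Gaussian homochromaticity weight and its multiplicative coupling with the truncation term can be sketched as follows. The angle is assumed to be in radians, since the source gives no unit. `homochromaticity` and `specular_probability` are hypothetical names.

```python
import math

ROUGHNESS = 0.15  # rho, surface roughness factor ([0097])


def homochromaticity(angle, rho=ROUGHNESS):
    """Gaussian weight exp(-angle^2 / (2 rho^2)); near 1 only for light-colored pixels."""
    return math.exp(-(angle * angle) / (2.0 * rho * rho))


def specular_probability(p_truncation, angle, rho=ROUGHNESS):
    """Couple both physical cues by direct multiplication ([0100])."""
    return p_truncation * homochromaticity(angle, rho)
```

Multiplication acts as a soft logical AND. A very bright pixel whose color lies 0.6 rad away from the light source still scores below 1e-3, because exp(-8) ≈ 3.4e-4.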

[0101] In summary, the above steps solve the problem that existing techniques cannot distinguish high-brightness white paper backgrounds from high-brightness specular reflection spots under complex lighting. This embodiment couples the physical saturation characteristics of photodiodes with the spectral consistency of the dichromatic reflection model. In doing so, it locates, at the physical level, the areas where image information has been lost. This provides an accurate spatial attention prior for the physically constrained decomposition network in step S2 and prevents subsequent steps from being forced to fit erroneous data. Comparative experiments show that, under the same lighting interference, the method with the parameter configuration of this embodiment improves the detection rate of specular reflection areas by 20% over the traditional brightness threshold method. It also reduces the misjudgment rate for text on white backgrounds by more than 15%.

[0102] Step S2: Input the RGB image to be identified and the specular reflection probability map generated in step S1 into the image decomposition model. Use the specular reflection probability map as a weighted constraint mask. In the image decomposition model, perform gradient direction orthogonal constraints on the pixel saturated regions of the RGB image to be identified, and perform energy conservation constraints on the unsaturated regions of the RGB image to be identified. Decouple the RGB image to be identified into specular light components and reflectivity components.

[0103] This embodiment illustrates how to utilize the physical prior generated in step S1 to accurately separate additive specular reflection spots and underlying diffuse text textures in a single image using variational mathematics. Unlike traditional image enhancement techniques that directly perform uniform dehazing or contrast stretching on the entire image, this embodiment employs a regional physical constraint strategy, which solves the technical problem that physical information loss in pixel saturation areas cannot be restored by simple subtraction.

[0104] Specifically, in step S2, the energy conservation constraint in the unsaturated region of the RGB image to be identified is applied as follows. The image decomposition model builds an inverse operation for additive noise based on the superposition principle of physical optics. It treats the RGB image to be identified as the linear superposition of two parts: a diffuse reflection component that carries the text information, and a specular reflection component that has the pure mirror-like color of the light source.

[0105] In this embodiment, the imaging process of the RGB image to be identified is modeled with the dichromatic reflection model. The image decomposition model resolves the photon energy received by the image sensor into the linear sum of two physical sources:

- The diffuse reflection component, produced when incident light penetrates the medium surface and is scattered back by pigment particles. It carries the shape and color information of the Japanese characters.
- The specular reflection light, produced by Fresnel reflection of the incident light directly at the medium surface. It appears as an additive light spot with the same color as the light source.

[0106] The image decomposition model does not directly solve the complex multiplicative illumination model. Instead, it transforms the spot removal task into an inverse operation problem of subtracting the additive specular light component from the observed signal through logarithmic domain transformation or direct linear assumption.

[0107] The image decomposition model treats pixel coordinates whose values in the specular reflection probability map are close to zero as the unsaturated region of the RGB image to be identified. In this region the sensor operates in its linear range and the physical information is intact. The model constructs a weighted variational decomposition functional that includes the energy conservation term for the unsaturated region. It uses the complement of the specular reflection probability map (one minus its value) as the confidence weight of that term.

[0108] In this embodiment, the image decomposition model sets the threshold for determining unsaturated regions to 0.05. When the value of the specular reflection probability map at a certain pixel position is lower than 0.05, the image decomposition model determines that the RGB image to be identified is in an unsaturated region at that position. The physical basis for choosing 0.05 as the threshold for determining unsaturated regions comes from the statistical analysis of dark current noise and shot noise of complementary metal-oxide-semiconductor sensors. This value corresponds to a confidence interval of three times the signal-to-noise ratio, which is sufficient to eliminate the interference of random noise on the probability determination and ensure that the pixels determined to be in unsaturated regions are indeed in the linear response region of photoelectric conversion.

[0109] When constructing the weighted variational decomposition functional, the image decomposition model calculates the difference between one and the specular reflection probability map, and uses this difference as the confidence weight of the energy conservation term in the unsaturated region. The role of this weighting mechanism is to achieve spatial adaptive adjustment of the physical constraint strength: in regions where it is certain that no physical truncation has occurred, the confidence weight approaches one, and the image decomposition model will strictly perform physical subtraction; in regions with a high probability of photoelectric saturation, the confidence weight approaches zero, and the image decomposition model automatically reduces its dependence on the energy conservation equation, thereby avoiding ringing artifacts caused by forcibly fitting truncated data.
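The region thresholds of [0108] and [0115], together with the complement weight, can be sketched as follows. `region_of` and `conservation_weight` are hypothetical names. The "transition" label for values between 0.05 and 0.95 is also hypothetical, because the source does not name that zone.

```python
UNSATURATED_MAX = 0.05  # below this value: unsaturated region ([0108])
SATURATED_MIN = 0.95    # above this value: pixel saturation region ([0115])


def region_of(m):
    """Classify a specular reflection probability map value m in [0, 1]."""
    if m < UNSATURATED_MAX:
        return "unsaturated"  # energy conservation term is trusted
    if m > SATURATED_MIN:
        return "saturated"    # gradient mutual exclusion takes over
    return "transition"       # hypothetical label: the source names no middle zone


def conservation_weight(m):
    """Confidence weight of the energy conservation term: 1 - m."""
    return 1.0 - m
```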

[0110] The image decomposition model adds to the unsaturated-region energy conservation term a robust penalty potential energy function that models the statistical distribution of physical measurement errors. The function is built from a square-root operation containing a small positive constant, which improves tolerance to outlier noise. Minimizing it drives the RGB image to be identified, minus the predicted specular light component, toward the predicted reflectivity component. This imposes a pixel fidelity constraint on the reflectivity component in the unsaturated region and ensures that the reflectivity component retains the original color and texture of the RGB image to be identified.

[0111] In this embodiment, the robust penalty potential energy function takes the form of the Charbonnier penalty function. The calculation proceeds in three steps. The image decomposition model first subtracts the predicted specular light component and then the predicted reflectivity component from the RGB image to be identified, giving a residual. It then adds the square of the residual to the square of the numerical stability constant. Finally, it takes the square root of this sum.

[0112] In this embodiment, the numerical stability constant is preferably set to 10^-3. This value is slightly above the noise floor of the sensor's quantization noise. It avoids non-differentiability at zero residual and keeps dead or hot pixels from pulling the optimization excessively.
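The following minimal sketch computes the Charbonnier penalty and a weighted energy conservation term over flattened pixel lists, with weight 1 - M as in [0109]. The flat-list representation and the names `charbonnier` and `data_term` are assumptions for illustration. A real model would optimize this term jointly with the others rather than merely evaluate it.

```python
import math

STABILITY_EPS = 1e-3  # numerical stability constant ([0112])


def charbonnier(residual, eps=STABILITY_EPS):
    """sqrt(r^2 + eps^2): quadratic near zero, close to |r| for large residuals."""
    return math.sqrt(residual * residual + eps * eps)


def data_term(image, specular, reflectivity, spec_prob, eps=STABILITY_EPS):
    """Sum over pixels of (1 - M) * charbonnier(I - S - R)."""
    total = 0.0
    for i, s, r, m in zip(image, specular, reflectivity, spec_prob):
        # Saturated pixels (m near 1) contribute almost nothing.
        total += (1.0 - m) * charbonnier(i - s - r, eps)
    return total
```

Because the penalty grows only linearly for large residuals, a few outlier pixels cannot dominate the fit. This is the robustness the text contrasts with a mean squared error constraint.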

[0113] Comparative experiments show that the robust penalty potential energy function with the numerical stability constant outperforms a traditional mean squared error constraint. It improves the texture preservation rate of the image decomposition model by 12% in low-light noisy areas and effectively prevents text edges from blurring while light spots are removed.

[0114] In step S2, the image decomposition model applies the gradient direction orthogonality constraint to the pixel saturation region by performing a gradient mutual exclusion calculation. The model treats pixel coordinates whose values in the specular reflection probability map are close to one as the pixel saturation region of the RGB image to be identified. In this region the energy conservation constraint fails because of physical truncation.

[0115] In this embodiment, when the value of the specular reflection probability map at a pixel position is higher than 0.95, the image decomposition model determines that the RGB image to be identified is in the pixel saturation region at that position. Because the photon flux there exceeds the full well capacity, the pixel value is clipped to the maximum of its range, and the physical linear superposition relationship no longer holds. Continuing to perform subtraction would distort the colors of the solved reflectivity component. The image decomposition model therefore blocks gradient backpropagation of the energy conservation constraint in this region and instead activates a blind source separation mechanism based on structural features.

[0116] The image decomposition model introduces a gradient mutual exclusion structure separation mechanism based on the assumption that the signal sources are independent. Under this assumption, the edges of the specular light component (determined by the shape of the light source) and the edges of the reflectivity component (determined by the text strokes) do not overlap spatially. The model constructs a gradient mutual exclusion term for the saturation region and calculates the spatial gradient fields of the specular light component and the reflectivity component separately.

[0117] In this embodiment, the image decomposition model uses a difference operator to extract the first derivatives of the specular light component and the reflectivity component in the horizontal and vertical directions, respectively, generating corresponding spatial gradient fields. The physical basis of this mechanism is that the edge contour of the specular light component depends on the geometry of the illumination source and the microscopic undulations of the medium surface, typically exhibiting a low-frequency, smooth, sheet-like structure; while the edge contour of the reflectivity component depends on the stroke trajectory of the printed text, exhibiting a high-frequency, sharp, linear structure. According to the sparse component analysis theory of statistical physics, the joint probability of the signal edges generated by the two independent physical processes simultaneously undergoing abrupt changes at the same pixel location approaches zero.

[0118] The image decomposition model calculates the product of the spatial gradient fields of the specular light component and the spatial gradient fields of the reflectivity component at corresponding positions, and introduces a hyperbolic tangent limiting operator to perform numerical saturation processing on the product of the vector magnitudes in order to constrain the gradient response range.

[0119] In this embodiment, the image decomposition model first computes the pointwise product of the gradient magnitudes of the two components. Extremely large values at strong edges would cause gradient explosion during optimization. To prevent this, the model passes the product through the hyperbolic tangent function, which limits the output to the range from zero to one. The hyperbolic tangent limiting operator provides a nonlinear soft truncation. The optimization can then focus on separating low- to medium-intensity edges instead of letting a very small number of high-contrast edges dominate the loss function.

[0120] The image decomposition model introduces a gradient activation sensitivity parameter to increase the rejection gain for weak edges. Within the pixel saturation region of the RGB image to be identified, minimizing the gradient mutual exclusion term forces the spatial gradient of the reflectivity component toward zero wherever the specular light component has a spatial gradient. This structural independence constraint spatially separates the spot edges of the specular light component from the stroke structure of the reflectivity component.

[0121] In this embodiment, the gradient activation sensitivity parameter is preferably set to fifty. The basis for choosing fifty as the gradient activation sensitivity parameter is that, through the histogram statistics of a large number of defocused spot edge gradients, it was found that the gradient amplitude of the spot edge is usually small and diffuse. Setting a higher sensitivity parameter is equivalent to introducing a high-gain amplifier in the linear region of the hyperbolic tangent function, which can significantly enhance the image decomposition model's ability to perceive weak spot boundaries.

[0122] The image decomposition model mathematically exerts a structural repulsion force by minimizing the gradient mutual exclusion term after sensitivity amplification and hyperbolic tangent limiting: at any pixel, it forces that the gradient of the specular light component and the gradient of the reflectivity component cannot simultaneously take large values.
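The gradient mutual exclusion term can be sketched for small 2-D lists. The sketch assumes forward differences for the "difference operator", with zero gradient at the far border. It also assumes that the sensitivity parameter λ multiplies the magnitude product inside tanh, that is tanh(λ·|∇S|·|∇R|), which is one reading of the "high-gain amplifier" in [0121]. `forward_gradients` and `exclusion_term` are hypothetical names.

```python
import math

SENSITIVITY = 50.0  # gradient activation sensitivity parameter ([0121])


def forward_gradients(img):
    """Forward-difference (gx, gy) fields of a 2-D list; zero at the far border."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)] for y in range(h)]
    return gx, gy


def exclusion_term(specular, reflectivity, sat_mask, lam=SENSITIVITY):
    """Sum over saturated pixels of tanh(lam * |grad S| * |grad R|)."""
    sx, sy = forward_gradients(specular)
    rx, ry = forward_gradients(reflectivity)
    total = 0.0
    for y, row in enumerate(sat_mask):
        for x, in_saturated in enumerate(row):
            if not in_saturated:
                continue
            mag_s = math.hypot(sx[y][x], sy[y][x])
            mag_r = math.hypot(rx[y][x], ry[y][x])
            # tanh caps each pixel's contribution at 1 (soft truncation).
            total += math.tanh(lam * mag_s * mag_r)
    return total
```

The term is zero when the two components change at different pixels and positive when their edges coincide. With λ = 50, a weak 0.02 spot edge overlapping a 0.5 stroke edge contributes tanh(0.5) ≈ 0.46 instead of tanh(0.01) ≈ 0.01.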

[0123] Experimental data show that the gradient activation sensitivity parameter improves the accuracy with which the image decomposition model strips the edges of blurred light spots by 35% compared with the model without this parameter. This effectively prevents light spot contours from remaining in the reflectivity component as artifacts.

[0124] In summary, this embodiment solves the physical challenge of separating mixed illumination components in a single image by performing physical subtraction based on a robust penalty potential function in the unsaturated region and blind source decoupling based on a gradient mutual exclusion structure separation mechanism in the saturated region. This method not only achieves pixel-level color fidelity in the unsaturated region, but also uses structural priors to achieve morphological separation of light spots and text in the saturated region where information is lost, generating a pure reflectivity component. This processing result removes the additive illumination noise covering the text. However, for the severely saturated central region, the reflectivity component may still have physical data voids, which provides a clear and well-defined operation object for the subsequent step S3 to perform topology repair using fluid dynamics principles.

[0125] Step S3: Extract the edge gradient from the specular light component generated in step S2 as the repair boundary constraint. Perform stroke flow field analysis on the reflectivity component generated in step S2, extract the stroke tangential vector field of the region in the reflectivity component that is not affected by specular light interference, and use the stroke tangential vector field to guide the pixel to perform anisotropic diffusion to the data hole region covered by the specular light component. Generate the repaired reflectivity map and the repair uncertainty map based on the results of the anisotropic diffusion.

[0126] This embodiment addresses the information loss problem caused by physical truncation in step S2. Unlike traditional image restoration techniques based on texture synthesis, this embodiment views Japanese character strokes as fluids with topological continuity. By reconstructing an invisible physical field to drive the directional transport of visible texture pixels, it achieves physical connection of broken strokes at the microscopic level.

[0127] Specifically, in step S3, the stroke flow field analysis extracts the stroke tangential vector field from the region of the reflectivity component (generated in step S2) that is unaffected by specular interference. It is performed as a flow field reconstruction based on Euler elastic potential energy. Pixel coordinates where the brightness response of the specular light component exceeds the preset specular threshold are identified as data hole regions. Pixel coordinates of the reflectivity component outside the data hole regions are identified as valid observation regions.

[0128] In this embodiment, the objects to be processed are the reflectivity component and the specular light component output in step S2, and the preset specular threshold is 0.9. The physical basis for this value comes from an analysis of photon statistics in complementary metal-oxide-semiconductor sensors in the specular overflow state. When the normalized brightness response exceeds 0.9, the signal-to-noise ratio of the pixel value decreases exponentially and the physical texture information is irreversibly lost. Defining the area above 0.9 as the data hole area prevents the subsequent repair process from inferring texture from erroneous residual signals. The method generates a binary mask that marks the locations of missing physical information. The mask strictly divides the image space into the data hole area, which is physically empty, and the effective observation area, which retains the real texture.
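The following minimal sketch builds the binary hole mask and finds the valid pixels on its border, where [0129] extracts the structure tensor. The hypothetical helper `boundary_pixels` and its use of 4-adjacency are assumptions; the source does not specify how the boundary is traced. `hole_mask` is also a hypothetical name.

```python
SPECULAR_THRESHOLD = 0.9  # preset specular threshold ([0128])


def hole_mask(specular):
    """True where the specular component exceeds the threshold (data hole region)."""
    return [[value > SPECULAR_THRESHOLD for value in row] for row in specular]


def boundary_pixels(mask):
    """Valid-observation pixels that are 4-adjacent to at least one hole pixel."""
    h, w = len(mask), len(mask[0])
    result = []
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue  # hole pixels themselves are not boundary samples
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    result.append((y, x))
                    break
    return result
```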

[0129] The structure tensor of the reflectivity component is extracted at the boundary between the effective observation area and the data void area. Eigenvalue decomposition of the structure tensor yields the unit eigenvector corresponding to the smallest eigenvalue, which is defined as the boundary tangential field. An Euler elastic potential energy model with a curve bending stiffness parameter and a compression modulus parameter is then introduced. It is used to construct an energy functional of the flow field orientation angle inside the data void area.

[0130] In this embodiment, the method first calculates the gradient field outer product of the reflectivity components at the edge of the effective observation area to construct a structure tensor matrix. The method performs eigenvalue decomposition on the structure tensor matrix and extracts the unit eigenvector corresponding to the minimum eigenvalue. Since the direction corresponding to the minimum eigenvalue is perpendicular to the gradient direction where the grayscale change is most drastic, the unit eigenvector accurately indicates the geometric extension direction of the Japanese character strokes.
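For a 2-D image the structure tensor is a symmetric 2 by 2 matrix [[a, b], [b, c]], so its minor eigenvector has a closed form and needs no numerical solver. The sketch below assumes the tensor is summed over a list of (gx, gy) gradient samples near one boundary point. `structure_tensor` and `boundary_tangent` are hypothetical names.

```python
import math


def structure_tensor(gradients):
    """Sum of outer products g g^T over (gx, gy) samples: returns (a, b, c)."""
    a = sum(gx * gx for gx, _ in gradients)
    b = sum(gx * gy for gx, gy in gradients)
    c = sum(gy * gy for _, gy in gradients)
    return a, b, c


def boundary_tangent(gradients):
    """Unit eigenvector of the smallest eigenvalue, i.e. the stroke direction."""
    a, b, c = structure_tensor(gradients)
    # Angle of the major eigenvector (dominant gradient direction).
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    # The minor eigenvector is perpendicular to it.
    return (-math.sin(theta), math.cos(theta))
```

An eigenvector's sign is arbitrary, so a repair method would need to orient neighboring tangents consistently. The source does not describe that step.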

[0131] When constructing the energy functional of the flow field angle phase, this embodiment sets the curve bending stiffness parameter to 1.0 and the compression modulus parameter to 5.0. The basis for choosing 1.0 as the curve bending stiffness parameter is the elastic rod theory, which gives the flow field physical inertia to resist high-frequency jitter. The basis for choosing 5.0 as the compression modulus parameter is the writing geometry of Japanese boldface or Song typeface Chinese characters, that is, the stroke width remains constant during the extension process.

[0132] Through the compression modulus parameter, the energy functional penalizes the divergence and convergence of streamlines, forcing them to extend in parallel within the data void region. Comparative experiments compared this model with a harmonic field model that uses only the Laplace operator. The Euler elastic potential energy model improves the curvature retention rate of strokes at repaired corners by 40% and effectively prevents square turns from being repaired as rounded artifacts.

[0133] The energy functional contains a smoothing term that penalizes fluctuation of the flow field and a curvature term that penalizes divergence of the streamlines. Minimizing the energy functional over the flow field inside the data void region yields the reconstructed stroke tangential vector field. This field satisfies geometric continuity and curvature consistency constraints inside the data void region. It therefore bends like a physical elastic rod and extends the turning features and writing inertia of Japanese characters topologically into the void.
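The sketch below only evaluates a discrete energy of this kind; the source's variational minimizer is not reproduced. The pairing of parameters is an assumption: bending stiffness α weights the smoothness term |∇v|², and compression modulus β weights the divergence term (div v)², where v = (cos θ, sin θ). Differencing v rather than θ is a choice made here to avoid 2π wrap-around. `flow_energy` is a hypothetical name. A solver would minimize this energy over the hole pixels while holding the boundary tangents fixed.

```python
import math

BENDING_STIFFNESS = 1.0    # alpha, weights the smoothness term ([0131])
COMPRESSION_MODULUS = 5.0  # beta, weights the divergence term ([0131])


def flow_energy(theta, alpha=BENDING_STIFFNESS, beta=COMPRESSION_MODULUS):
    """Discrete energy of an orientation field theta[y][x] given in radians."""
    h, w = len(theta), len(theta[0])
    vx = [[math.cos(t) for t in row] for row in theta]
    vy = [[math.sin(t) for t in row] for row in theta]
    energy = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            # Forward differences of the unit vector field.
            dvx_dx = vx[y][x + 1] - vx[y][x]
            dvx_dy = vx[y + 1][x] - vx[y][x]
            dvy_dx = vy[y][x + 1] - vy[y][x]
            dvy_dy = vy[y + 1][x] - vy[y][x]
            smoothness = dvx_dx ** 2 + dvx_dy ** 2 + dvy_dx ** 2 + dvy_dy ** 2
            divergence = dvx_dx + dvy_dy  # spreading or converging streamlines
            energy += alpha * smoothness + beta * divergence ** 2
    return energy
```

A uniform field has zero energy. A field whose direction rotates from row to row pays both the smoothness cost and the heavier divergence cost.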

[0134] In this embodiment, the method uses the variational method to find the minimum of the energy functional, thereby establishing a virtual guiding force field inside the data void region. The reconstructed stroke tangential vector field not only connects smoothly with the texture direction of the effective observation area at the boundary, but also preserves the original curvature trend of the stroke inside the data void region. Because this flow field reconstruction, based on Euler elastic potential energy, models the natural bending of an elastic rod under stress at the level of physical topology, the repaired stroke structure conforms to the kinematics of Japanese writing, solving the technical problem that broken strokes cannot be connected naturally.

[0135] In step S3, the specific operation of using the stroke tangential vector field to guide pixels to diffuse anisotropically into the data hole region covered by the specular light component is a shock-assisted transport-reaction-diffusion calculation: a structure diffusion tensor is constructed from the reconstructed stroke tangential vector field and defined as an anisotropic matrix with high conductivity along the direction of the reconstructed stroke tangential vector field and low conductivity perpendicular to it.

[0136] In this embodiment, the structure diffusion tensor is designed as a second-order symmetric positive definite matrix, and the ratio of its principal eigenvalue (along the reconstructed stroke tangential direction) to its secondary eigenvalue (perpendicular to that direction) is set to 100:1. The physical basis for this 100:1 ratio is the laminar flow model of fluid mechanics. The ratio creates a strongly anisotropic transmission channel: texture information can travel long distances along the stroke extension direction but is strictly confined in the normal direction perpendicular to the stroke. This setting effectively prevents texture crosstalk between different strokes and keeps the edges of repaired strokes from blurring.
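A symmetric positive definite tensor with the stated eigen-structure can be assembled from the unit tangent t and the unit normal n as D = 100 t t^T + 1 n n^T. The Python sketch below shows this construction for a single pixel; the function name diffusion_tensor is hypothetical, and the eigenvalues 100 and 1 realize the 100:1 ratio from [0136].

```python
import math


def diffusion_tensor(tx, ty, lam_major=100.0, lam_minor=1.0):
    """D = lam_major * t t^T + lam_minor * n n^T for a stroke tangent (tx, ty)."""
    norm = math.hypot(tx, ty)
    tx, ty = tx / norm, ty / norm
    nx, ny = -ty, tx  # unit normal: the tangent rotated by 90 degrees
    dxx = lam_major * tx * tx + lam_minor * nx * nx
    dxy = lam_major * tx * ty + lam_minor * nx * ny
    dyy = lam_major * ty * ty + lam_minor * ny * ny
    return [[dxx, dxy], [dxy, dyy]]
```

For a horizontal stroke the tensor is simply [[100, 0], [0, 1]]; for any direction, D maps t to 100 t and n to n, and its determinant is 100.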

[0137] A partial differential equation containing transport, diffusion, and shock response terms is constructed. The transport term is used to drive the texture pixels of reflectivity components to perform pure convective displacement along the streamline direction of the reconstructed stroke tangential vector field. The diffusion term is used to perform thermal diffusion mapping in the tangential direction of the reconstructed stroke tangential vector field.

[0138] In this embodiment, the transport term simulates the advection of matter and is responsible for transporting the pixel colors of the effective observation area to the center of the data void region without loss; the diffusion term simulates a weak heat conduction process and only works in the streamline tangential direction to smoothly stitch together the splicing gaps between pixels from different sources and eliminate step artifacts.
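The interplay of the transport and diffusion terms in [0137]-[0138] can be seen on a single streamline, parameterized by arc length with unit sample spacing. The sketch below is a simplification under stated assumptions: a one-dimensional explicit scheme for u_t = -c u_s + d u_ss, with first-order upwind advection, a fixed observed pixel at the inflow end, and hypothetical values c = 1.0, d = 0.1 and dt = 0.4 chosen to satisfy the explicit stability condition c*dt + 2*d*dt <= 1. The shock response term is left out here.

```python
def transport_diffuse(u, steps, speed=1.0, diff=0.1, dt=0.4):
    """Explicit upwind advection plus weak diffusion along one streamline.

    u[0] is an observed pixel (fixed inflow); u[1:] lie in the void region.
    Stable here because speed*dt + 2*diff*dt <= 1 (unit sample spacing).
    """
    u = list(u)
    n = len(u)
    for _ in range(steps):
        new = u[:]
        for k in range(1, n):
            left = u[k - 1]
            right = u[k + 1] if k + 1 < n else u[k]  # zero-gradient outflow
            advect = -speed * (u[k] - left)          # first-order upwind, speed > 0
            diffuse = diff * (right - 2.0 * u[k] + left)
            new[k] = u[k] + dt * (advect + diffuse)
        u = new
    return u
```

Starting from an observed value of 0.8 followed by nine empty samples, repeated steps carry the value into the void until the whole streamline is filled with about 0.8, and no sample ever exceeds the observed value.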

[0139] A shock wave filter based on the Hessian matrix is introduced as the shock wave response term, and the second directional derivative of the reflectivity component in the edge normal direction is calculated. When the second directional derivative is positive, back diffusion calculation is performed to concentrate edge energy, and when the second directional derivative is negative, forward diffusion calculation is performed. The edge expansion caused by thermal diffusion is numerically compensated by the shock wave response term to generate a repaired reflectivity map. The stroke edges in the repaired reflectivity map exhibit sharpening features determined by the shock wave response term.

[0140] In this embodiment, the method calculates the Hessian matrix of the reflectivity component and projects it onto the edge normal direction to obtain the second directional derivative, the sign of which indicates whether the current pixel is on the peak or trough side of the edge profile.
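The projection described in [0140] is the standard second directional derivative u_nn = n^T H n with n = grad(u) / |grad(u)|. The Python sketch below evaluates it at one interior pixel using central differences; the function name and the flat-region threshold eps are assumptions for illustration.

```python
def second_directional_derivative(u, i, j, eps=1e-12):
    """u_nn = n^T H n at interior pixel (i, j), with n = grad(u)/|grad(u)|.

    Rows are y and columns are x; central differences on a unit grid.
    """
    ux = 0.5 * (u[i][j + 1] - u[i][j - 1])
    uy = 0.5 * (u[i + 1][j] - u[i - 1][j])
    uxx = u[i][j + 1] - 2.0 * u[i][j] + u[i][j - 1]
    uyy = u[i + 1][j] - 2.0 * u[i][j] + u[i - 1][j]
    uxy = 0.25 * (u[i + 1][j + 1] - u[i + 1][j - 1]
                  - u[i - 1][j + 1] + u[i - 1][j - 1])
    g2 = ux * ux + uy * uy
    if g2 < eps:
        return 0.0  # flat region: no edge normal is defined
    return (ux * ux * uxx + 2.0 * ux * uy * uxy + uy * uy * uyy) / g2
```

For u = x^2 the sketch returns 2, and for u = (x + y)^2, whose gradient points along the diagonal, it returns 4, the second derivative along that diagonal.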

[0141] In this embodiment, the shock intensity parameter of the shock response term is set to 0.5. This value balances the degree of edge sharpening against numerical stability: it counteracts the entropy increase caused by thermal diffusion while preventing overshoot and ringing. When the second directional derivative is positive, the method determines that the current pixel lies on the trough side of the edge and performs reverse diffusion (i.e., uses a negative diffusion coefficient), forcing energy to gather toward the wave peak. When the second directional derivative is negative, forward diffusion is performed. Physically, this shock response term simulates a reverse, entropy-reducing thermodynamic process that sharpens the repaired stroke edges. Experimental data show that introducing the shock response term increases the edge gradient amplitude of the repaired area by 25% compared with the case without it, significantly improving the clarity of the text outline.
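The patent does not give the discrete form of the shock response term. As a plainly named substitute, the sketch below uses the classical Osher-Rudin shock filter u_t = -kappa * sign(u_nn) * |u_n| on a one-dimensional edge profile, with a minmod gradient so that no new extrema are created, and kappa = 0.5 as in [0141]. In this sketch kappa also acts as the time step, and 0.5 is kept as a conservative step size; the tolerance tol, which treats tiny curvatures as zero, is an assumption.

```python
def minmod(a, b):
    """Smaller-magnitude argument if the signs agree, else 0 (no new extrema)."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def shock_step(u, kappa=0.5, tol=1e-12):
    """One explicit step of u_t = -kappa * sign(u_xx) * |u_x| on a 1-D profile.

    End samples are held fixed. kappa doubles as the time step here.
    """
    new = list(u)
    for k in range(1, len(u) - 1):
        fwd = u[k + 1] - u[k]
        bwd = u[k] - u[k - 1]
        lap = u[k + 1] - 2.0 * u[k] + u[k - 1]
        sign = 0 if abs(lap) < tol else (1 if lap > 0 else -1)
        # Convex side (lap > 0) moves down, concave side (lap < 0) moves up.
        new[k] = u[k] - kappa * sign * abs(minmod(fwd, bwd))
    return new
```

Applied repeatedly to a blurred edge such as [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0], the largest jump between neighboring samples grows well beyond its initial 0.2 while the profile stays monotone and within [0, 1], which is the sharpening-without-overshoot behavior the paragraph describes.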

[0142] In step S3, the specific operation of generating the repair uncertainty map from the anisotropic diffusion results is an entropy increment calculation based on a Riemann metric: a confidence decay analysis based on the second law of thermodynamics is performed inside the data hole region, under which the confidence of the repaired reflectivity component decreases nonlinearly as the transmission distance increases; the inverse of the structure diffusion tensor is then used to define the Riemann metric space inside the data hole region.

[0143] In this embodiment, the method draws on information theory and the second law of thermodynamics, assuming that the transmission of texture information is inevitably accompanied by an accumulation of uncertainty. The method uses the inverse of the aforementioned structure diffusion tensor to construct a Riemann metric tensor. In this Riemann metric space, distances measured along the stroke direction are compressed, while distances measured perpendicular to the stroke direction are stretched, reflecting the different physical resistance to information propagation in different directions.

[0144] Within the Riemann metric space, the minimum physical propagation time for a pixel in the data void region to reach the boundary of the effective observation region is calculated and defined as the Riemann geodesic distance. In this calculation, the path weight along the direction of the reconstructed stroke tangential vector field is smaller than the path weight perpendicular to it.

[0145] In this embodiment, the method calculates, by solving the equation, the minimum action path from each pixel in the data hole region to the boundary of the nearest effective observation region, i.e., the Riemann geodesic distance. Physically, this distance represents the cost of inferring the current pixel value from a known information source.
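A minimal way to approximate the Riemann geodesic distance is Dijkstra's algorithm on the 8-connected pixel grid, where a step d costs sqrt(d^T D^-1 d) with D built as in [0136]. This is a graph approximation chosen for brevity, not the patent's solver; a fast-marching eikonal solver would reduce the grid's directional bias. The function name and the convention that all observed pixels are distance-0 sources are assumptions.

```python
import heapq
import math


def geodesic_distance(void, tangent, lam_major=100.0, lam_minor=1.0):
    """Dijkstra approximation of the anisotropic geodesic distance.

    void[i][j] is True inside the data void region; tangent[i][j] = (tx, ty)
    is the unit stroke direction (x = column, y = row). Observed pixels are
    sources at distance 0. A step d costs sqrt(d^T D^-1 d), with
    D = lam_major t t^T + lam_minor n n^T.
    """
    rows, cols = len(void), len(void[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for i in range(rows):
        for j in range(cols):
            if not void[i][j]:
                dist[i][j] = 0.0
                heap.append((0.0, i, j))
    heapq.heapify(heap)
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i][j]:
            continue  # stale heap entry
        for di, dj in moves:
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and void[ni][nj]:
                tx, ty = tangent[ni][nj]
                along = dj * tx + di * ty    # step component along the stroke
                across = -dj * ty + di * tx  # step component along n = (-ty, tx)
                cost = math.sqrt(along ** 2 / lam_major + across ** 2 / lam_minor)
                if d + cost < dist[ni][nj]:
                    dist[ni][nj] = d + cost
                    heapq.heappush(heap, (d + cost, ni, nj))
    return dist
```

For a horizontal stroke field and a 5x5 void inside a 7x7 image, a step along the stroke costs 0.1 and a step across it costs 1. A pixel one row below the top boundary therefore reaches the boundary more cheaply by travelling three pixels sideways (0.3) than by moving one pixel up (1.0).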

[0146] The Riemann geodesic distance is mapped to a probability value with a numerical distribution in the range of zero to one using the exponential decay function. This generates a repair uncertainty map, which quantifies the inference uncertainty generated by anisotropic diffusion within the data void region. The output is used for subsequent steps to adjust the feature response weights of the identification model.

[0147] In this embodiment, the method employs an exponential decay model for probability mapping and sets the distance decay constant to ten pixel units. This value is based on the spatial correlation length statistics of Japanese character stroke textures: when the repair distance exceeds this characteristic length, the reliability of texture inference drops significantly. The generated repair uncertainty map objectively quantifies the inference risk toward the center of the data void region. It serves as a crucial gating signal passed to step S4, instructing the character recognition model to reduce the weight given to high-uncertainty regions during feature extraction. This prevents misjudgments caused by the recognition model blindly trusting textures inferred over long distances. Through this uncertainty quantification mechanism, this embodiment provides a scientific confidence boundary for subsequent high-precision recognition while ensuring visual continuity.
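Paragraphs [0146]-[0147] state that the geodesic distance is mapped to [0, 1] by an exponential decay with a decay constant of ten pixel units, but do not give the formula. A minimal sketch, assuming the confidence is exp(-d / tau) and the repair uncertainty is its complement 1 - exp(-d / tau), so that uncertainty is 0 on the observed boundary and approaches 1 deep inside the void. The distance is assumed to be expressed in pixel units.

```python
import math

DECAY_CONSTANT = 10.0  # pixel units, from [0147]


def repair_uncertainty(distance, tau=DECAY_CONSTANT):
    """Assumed mapping: uncertainty = 1 - exp(-distance / tau), in [0, 1]."""
    confidence = math.exp(-distance / tau)  # exponential decay of confidence
    return 1.0 - confidence
```

Under this assumption a pixel one decay constant (10 pixel units) from the boundary has an uncertainty of about 0.63, and an unreachable pixel (infinite distance) has uncertainty 1.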

[0148] In summary, this embodiment achieves topological restoration of damaged Japanese characters by constructing a flow field that conforms to Euler elastic potential energy and performing shock-assisted anisotropic diffusion. The method not only fills data gaps but also keeps the geometric structure and texture edges highly consistent with the original strokes. At the same time, the repair uncertainty map it outputs provides an objective confidence basis for subsequent high-precision recognition. Compared with traditional methods, this embodiment improves stroke connectivity by 40% and edge sharpness by 25%, significantly improving character recognition performance under strong reflective interference.

[0149] Step S4: Input the repaired reflectance map generated in step S3 and the repair uncertainty map generated in step S3 into the character recognition model. In the character recognition model, the feature response of the repaired reflectance map is dynamically weighted and suppressed using the repair uncertainty map, and a Japanese character sequence is output. Based on the Japanese character sequence, the translation database is searched to generate a Japanese translation result.

[0150] This embodiment focuses on solving the technical problem that the uncertain texture generated by physical repair in step S3 can mislead semantic recognition. This embodiment introduces the uncertainty quantification index at the physical level into the decoding network at the semantic level, constructs a dynamic weighting mechanism of visual perception and language prior, and performs multi-hypothesis fault-tolerant retrieval in the vector space.

[0151] Specifically, in step S4, the specific operation of using the repair uncertainty map to dynamically weight and suppress the feature response of the repaired reflectance map in the character recognition model is a Bayesian risk minimization feature fusion calculation: during feature extraction, the character recognition model performs signal demodulation under a noisy-channel model. The character recognition model uses a deep convolutional neural network to process the repaired reflectance map generated in step S3 and outputs the original visual feature tensor.

[0152] In this embodiment, the character recognition model uses a deep convolutional neural network based on residual connections as a feature extractor. The deep convolutional neural network receives the repaired reflectance map output in step S3 as an input signal, performs spatial filtering operations through multiple convolutional kernels, and outputs an original visual feature tensor containing high-dimensional semantic information of the image. Each feature channel of the original visual feature tensor corresponds to the activation response of a specific stroke structure or texture pattern.

[0153] The character recognition model synchronously acquires the repair uncertainty map generated in step S3 and downsamples it to the spatial resolution of the original visual feature tensor. The character recognition model constructs a nonlinear reliability gating function based on the Weber-Fechner law. The nonlinear reliability gating function uses the pixel values of the repair uncertainty map to calculate the transmittance weights in the feature space.

[0154] In this embodiment, the character recognition model uses a bilinear interpolation algorithm to perform a downsampling operation on the repair uncertainty map, so that the spatial dimension of the repair uncertainty map is strictly aligned with the spatial dimension of the original visual feature tensor. Subsequently, the character recognition model introduces a nonlinear reliability gating function to calculate the transmittance weight. The mathematical form of the nonlinear reliability gating function is based on the Weber-Fechner law and uses a variant of the sigmoid function that includes a discrimination gain parameter.
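The alignment step in [0154] can be sketched as a plain bilinear resampler. The sketch assumes half-pixel-center alignment (the common convention in deep learning frameworks) and applies no anti-aliasing prefilter; the function name bilinear_resize is hypothetical.

```python
import math


def bilinear_resize(img, out_h, out_w):
    """Resample a 2-D list to out_h x out_w with bilinear interpolation.

    Uses half-pixel-center alignment; no anti-aliasing prefilter is applied.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for oi in range(out_h):
        # Map the output row center back to a (clamped) source coordinate.
        y = min(max((oi + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for oj in range(out_w):
            x = min(max((oj + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1.0 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1.0 - fx) + img[y1][x1] * fx
            row.append(top * (1.0 - fy) + bottom * fy)
        out.append(row)
    return out
```

Downsampling a 4x4 horizontal ramp 0, 1, 2, 3 to 2x2 yields rows of [0.5, 2.5], the averages of each pair of source columns. A constant uncertainty map stays constant.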

[0155] In this embodiment, the discrimination gain parameter is preferably set to 10.0, based on an analysis of the phase transition relationship between visual signal-to-noise ratio and feature effectiveness. Comparative experiments show that with a discrimination gain of 10.0, the character recognition model draws the clearest feature suppression boundary between reliable and unreliable regions. Compared with a linear transition scheme using a discrimination gain of 2.0, a gain of 10.0 reduces the false detection rate in high-noise regions by 15%, effectively blocking interference from blurred textures during feature extraction.

[0156] The nonlinear reliability gating function combines a sigmoid activation transform with a dark current constant term. The dark current constant term sets a lower bound on the transmittance weight, preserving a gradient backpropagation path in regions where the pixel values of the repair uncertainty map approach one and thereby preventing vanishing gradients in the corresponding neurons.

[0157] In this embodiment, the dark current constant term is preferably set to one-thousandth. The physical basis for choosing one-thousandth as the dark current constant term comes from the gradient flow stability requirement in the deep neural network training process. If the transmittance weight of the high uncertainty region is completely set to zero, the backpropagation algorithm will encounter numerical truncation when calculating the gradient, resulting in the inability to update the neuron parameters in the corresponding region.
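The exact form of the gating function is not given, only that it is a sigmoid variant with a discrimination gain of 10.0 and a dark current floor of one-thousandth. The sketch below is one hypothetical form consistent with that description: a decreasing sigmoid of the repair uncertainty u, centered at an assumed u = 0.5 and rescaled so the weight never falls below the dark current constant.

```python
import math

DISCRIMINATION_GAIN = 10.0  # value from [0155]
DARK_CURRENT = 1e-3         # value from [0157]


def transmittance(u, gain=DISCRIMINATION_GAIN, dark=DARK_CURRENT, center=0.5):
    """Hypothetical reliability gate: high weight for low repair uncertainty u."""
    u = min(max(u, 0.0), 1.0)                            # uncertainty lies in [0, 1]
    gate = 1.0 / (1.0 + math.exp(gain * (u - center)))  # decreasing sigmoid
    return dark + (1.0 - dark) * gate                    # floor keeps gradients alive
```

With these assumptions, a fully trusted pixel (u = 0) keeps a weight of about 0.993, a maximally uncertain pixel (u = 1) drops to about 0.0077, and the weight is never below 0.001, so a gradient path always remains.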

[0158] Experimental data show that after introducing a dark current constant term of 0.1%, the convergence speed of the character recognition model is improved by 20% compared with the model without the dark current term, and the oscillation amplitude of the loss function on the test set is reduced by 50%, which proves the necessity of retaining the minimum information path.

[0159] The character recognition model calculates the Hadamard product of the original visual feature tensor and the transmittance weights to generate a Bayesian weighted feature map. The Bayesian weighted feature map performs numerical attenuation of the feature response intensity in local regions where the pixel value of the corresponding repair uncertainty map approaches one, and performs feature preservation of the original visual feature tensor in local regions where the pixel value of the corresponding repair uncertainty map approaches zero.

[0160] In this embodiment, the character recognition model computes an element-wise Hadamard product, directly mapping physical uncertainty in the spatial domain to response intensity in the feature domain. In high-entropy regions where the pixel values of the repair uncertainty map approach one, the values of the original visual feature tensor are forced down to the dark current level by the low transmittance weights produced by the nonlinear reliability gating function. In low-entropy regions where those pixel values approach zero, the values of the original visual feature tensor are preserved. This ensures that the Bayesian weighted feature map carries only visual information with high physical confidence into the subsequent decoding stage.
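The Hadamard product in [0159]-[0160] multiplies every channel of the feature tensor by the same spatial weight map. A minimal sketch with nested lists, assuming a channel-first C x H x W layout and a weight map already aligned to H x W (the name gate_features is hypothetical):

```python
def gate_features(features, weights):
    """Hadamard product of a C x H x W feature tensor with an H x W weight map.

    The same spatial weight multiplies every channel (broadcast over C).
    """
    return [
        [[f * w for f, w in zip(f_row, w_row)] for f_row, w_row in zip(channel, weights)]
        for channel in features
    ]
```

A feature at a position whose weight is 1.0 passes through unchanged, while a feature at a position whose weight sits at the dark current level of 0.001 is attenuated by a factor of one thousand in every channel.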

[0161] In step S4, outputting the Japanese character sequence and searching the translation database based on that sequence to generate the Japanese translation result is performed specifically as context-prior-driven sequence decoding and fault-tolerant retrieval in a multi-hypothesis semantic space: the character recognition model performs sequence decoding based on maximum a posteriori probability estimation using the Bayesian weighted feature map; during this decoding, the character recognition model jointly evaluates the visual likelihood probability term determined by the Bayesian weighted feature map and the language prior probability term determined by the pre-trained Japanese language model.

[0162] In this embodiment, the character recognition model uses the Transformer decoding architecture to perform sequence generation, and the calculation process of the maximum a posteriori probability estimation is jointly determined by the visual likelihood probability term and the language prior probability term.

[0163] In this embodiment, the weighting coefficient of the language prior probability term is preferably set to 0.5. This value comes from a bit error rate balance analysis of the visual model and the language model over mixed channels. Comparative tests on a large-scale synthetic noise dataset showed that with a weighting coefficient of 0.5, the character recognition model achieves the lowest overall character error rate across both visually clear and visually blurred samples. If the weighting coefficient is too low, the model cannot effectively correct visual artifacts; if it is too high, the model tends to ignore the image content and hallucinate characters.

[0164] When the feature response intensity of the Bayesian weighted feature map in a local region is lower than the preset response threshold, the sequence decoding calculation reduces the contribution weight of the visual likelihood probability term to the generation of the Japanese character sequence and uses the language prior probability term to perform semantic error correction on the Japanese character sequence.
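The decoding rule in [0161]-[0164] can be illustrated at a single character position. The sketch assumes per-position probability tables for the visual model and the language model (in the Transformer decoder these would come from the network and the pre-trained language model), a scalar response summarizing the Bayesian weighted feature map at that position, a hypothetical response threshold of 0.5, and a visual weight that is scaled down linearly below that threshold. The language prior weight 0.5 is the value from [0163]. Each character is scored by w_vis * log p_vis + 0.5 * log p_lm, and the best one is kept. This is greedy decoding, not the full sequence search.

```python
import math

LM_WEIGHT = 0.5  # language prior weighting coefficient from [0163]


def decode_position(p_visual, p_lm, response, threshold=0.5, lm_weight=LM_WEIGHT):
    """Pick the character maximizing w_vis*log p_vis + lm_weight*log p_lm.

    response and threshold are hypothetical: below the threshold, the
    visual weight is reduced in proportion to the feature response.
    """
    vis_weight = min(1.0, response / threshold)
    best, best_score = None, -math.inf
    for ch in p_visual:
        score = vis_weight * math.log(p_visual[ch]) + lm_weight * math.log(p_lm[ch])
        if score > best_score:
            best, best_score = ch, score
    return best
```

With a visual table favoring 日 (0.9 versus 0.1) and a language prior favoring 目 (0.8 versus 0.2), a clear response selects 日, while a weak response of 0.05 lets the prior take over and selects 目.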

[0165] In this embodiment, since the Bayesian weighted feature map values in high uncertainty regions are suppressed, the visual likelihood probability distribution tends to be uniform, and the amount of Shannon information contained therein is significantly reduced. At this time, the optimization direction of the maximum a posteriori probability estimation is naturally dominated by the language prior probability term. The character recognition model uses the word co-occurrence probability and grammatical rules stored in the pre-trained Japanese language model to automatically fill in the character positions where visual information is missing, thus realizing automatic error correction based on semantic logic.

[0166] The character recognition model uses a beam search strategy to generate a candidate set containing multiple candidate character sequences. The character recognition model calculates the predicted probability of each candidate character sequence in the candidate set, and uses the predicted probability to calculate the weighted average of the semantic vectors corresponding to each candidate character sequence, and outputs the centroid query vector.

[0167] In this embodiment, the character recognition model sets the beam width parameter of the beam search strategy to five, generating a candidate set containing five candidate character sequences. The model then uses a Long Short-Term Memory (LSTM) network encoder to map these five candidate character sequences into five sets of high-dimensional semantic vectors. Using the softmax prediction probability of each candidate character sequence as weight, the model performs a weighted average calculation on the five sets of high-dimensional semantic vectors to generate a centroid query vector. This centroid query vector leverages the clustering characteristics of the high-dimensional semantic space, eliminating the semantic drift risk caused by misrecognition of individual characters in a single candidate character sequence.
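The centroid query vector in [0166]-[0167] is a softmax-weighted mean of the candidate semantic vectors. The sketch assumes the beam search supplies each candidate's log-probability and that the semantic vectors (produced by the LSTM encoder in the embodiment) are already available as plain lists; the function name centroid_query is hypothetical.

```python
import math


def centroid_query(log_probs, vectors):
    """Softmax-weighted mean of candidate semantic vectors."""
    top = max(log_probs)
    exps = [math.exp(lp - top) for lp in log_probs]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
```

With five equally probable candidates (beam width five) the centroid is the plain mean of their vectors; when one candidate is far more probable, the centroid moves almost entirely onto that candidate's vector.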

[0168] The character recognition model retrieves from the translation database the translated text with the highest cosine similarity to the centroid query vector and outputs it as the Japanese translation result. In the cosine similarity calculation, a small numerical stabilization constant is added to the denominator so that it can never be zero. Because the centroid query vector exploits the clustering properties of the high-dimensional semantic space, the translation result is robust at the semantic level.

[0169] In this embodiment, when calculating cosine similarity, the character recognition model adds the numerical stabilization constant to the product of the vector norms in the denominator. This constant is preferably set to 10^-6. It prevents division-by-zero errors caused by zero vectors and ensures computational robustness in large-scale retrieval.
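The stabilized similarity and the retrieval step in [0168]-[0169] can be sketched as follows. The translation database is modeled, as an assumption, by a plain dictionary mapping each translated text to its semantic vector; a large-scale system would use an approximate nearest-neighbor index instead of a linear scan.

```python
import math

EPS = 1e-6  # numerical stabilization constant from [0169]


def cosine_similarity(a, b, eps=EPS):
    """a.b / (|a| |b| + eps); returns 0.0 instead of failing on zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def retrieve(query, database, eps=EPS):
    """Return the key whose vector is most cosine-similar to the query."""
    return max(database, key=lambda key: cosine_similarity(query, database[key], eps))
```

A zero query vector now yields a similarity of 0.0 rather than a division error, and a query of [0.9, 0.1] retrieves the entry stored under [1, 0] in preference to the one stored under [0, 1].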

[0170] Comparative experiments verify that, with the Bayesian risk minimization feature fusion and multi-hypothesis centroid retrieval mechanism described in this embodiment, the semantic accuracy of the Japanese translation results improves by 18% over a baseline method that relies solely on the Top-1 recognition result, even under extreme test conditions with a local occlusion rate of 20%. This demonstrates that this embodiment offers significant technical advantages in processing highly uncertain visual signals.

[0171] In summary, this embodiment constructs a robust recognition architecture with both visual and semantic error correction capabilities by introducing physical uncertainty gating at the feature extraction layer and fusing language priors and multi-hypothesis vector retrieval at the semantic decoding layer.

[0172] This architecture successfully solves the industry problem that traditional OCR technology cannot accurately restore semantic information in scenarios where strong light spot interference causes irreversible loss of physical texture. It ensures the semantic fluency and accuracy of the final Japanese translation output, providing users with a highly available translation service experience.

[0173] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An image recognition-based Japanese translation recognition method, characterized by comprising the following steps:
Step S1: Obtain the RGB image to be recognized containing Japanese text; calculate the approximation degree between the pixel brightness values in the RGB image to be recognized and a preset sensor saturation threshold; calculate the deviation degree between the pixel chromaticity and the local light source chromaticity in the RGB image to be recognized; and generate a specular reflection probability map based on the approximation degree between the pixel brightness value and the preset sensor saturation threshold and on the deviation degree between the pixel chromaticity and the local light source chromaticity.
Step S2: Input the RGB image to be identified and the specular reflection probability map generated in step S1 into the image decomposition model, using the specular reflection probability map as a weighted constraint mask. In the image decomposition model, apply gradient direction orthogonality constraints to the pixel saturated regions of the RGB image to be identified and energy conservation constraints to its unsaturated regions, thereby decoupling the RGB image to be identified into a specular light component and a reflectivity component.
Step S3: Extract the edge gradient from the specular light component generated in step S2 as the repair boundary constraint; perform stroke flow field analysis on the reflectivity component generated in step S2 to extract the stroke tangential vector field of the region of the reflectivity component not affected by specular light interference; use the stroke tangential vector field to guide pixels to diffuse anisotropically into the data hole region covered by the specular light component; and generate the repaired reflectivity map and the repair uncertainty map based on the anisotropic diffusion results.
Step S4: Input the repaired reflectance map generated in step S3 and the repair uncertainty map generated in step S3 into the character recognition model. In the character recognition model, the feature response of the repaired reflectance map is dynamically weighted and suppressed using the repair uncertainty map, and a Japanese character sequence is output. Based on the Japanese character sequence, the translation database is searched to generate a Japanese translation result.

2. The image recognition-based Japanese translation recognition method according to claim 1, characterized in that, in step S1, the specific operation of calculating the approximation degree between the pixel brightness value in the RGB image to be identified and the preset sensor saturation threshold is as follows: for each pixel in the RGB image to be identified, the maximum voltage response value of the pixel across the red, green and blue optical channels is extracted as the channel peak response; a thermal noise equivalent coefficient, characterizing the combined standard deviation of sensor readout noise and thermal noise, is introduced to construct a photoelectric saturation potential energy function based on the Fermi-Dirac distribution; the nonlinear mapping between the channel peak response and the preset sensor saturation threshold is calculated with the photoelectric saturation potential energy function to generate the photoelectric cutoff probability term; the thermal noise equivalent coefficient determines the response gradient of the photoelectric saturation potential energy function near the preset sensor saturation threshold, so that the function changes nonlinearly near that threshold; when the channel peak response is lower than the preset sensor saturation threshold, the photoelectric saturation potential energy function yields a photoelectric cutoff probability term close to zero; when the channel peak response approaches the preset sensor saturation threshold, the photoelectric saturation potential energy function yields a photoelectric cutoff probability term close to one; the photoelectric cutoff probability term is output to the processing flow that generates the specular reflection probability map, serving as the data basis for subsequent calculations.
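The photoelectric cutoff probability of claim 2 can be sketched as a Fermi-Dirac occupancy function of the channel peak response. The sketch assumes the transition is centered exactly on the saturation threshold (so the value is 0.5 at the threshold and approaches one just above it), and uses hypothetical values of 250 for the threshold on an 8-bit scale and 2.0 for the thermal noise equivalent coefficient.

```python
import math


def cutoff_probability(rgb, threshold=250.0, noise_sigma=2.0):
    """Fermi-Dirac style cutoff: 1 / (1 + exp((threshold - peak) / sigma)).

    threshold and noise_sigma are hypothetical values for 8-bit data.
    """
    peak = max(rgb)                          # channel peak response
    x = (threshold - peak) / noise_sigma
    x = max(min(x, 700.0), -700.0)           # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(x))
```

A mid-gray pixel gives a value near zero, a pixel whose brightest channel sits at the threshold gives exactly 0.5, and a clipped pixel at 255 gives about 0.92. A smaller noise_sigma makes this transition steeper.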

3. The image recognition-based Japanese translation recognition method according to claim 2, characterized in that, in step S1, the specific operation of calculating the deviation degree between the pixel chromaticity and the local light source chromaticity in the RGB image to be identified, and of generating the specular reflection probability map based on the approximation degree between the pixel brightness value and the preset sensor saturation threshold and on that deviation degree, is as follows: within a local neighborhood window centered on each pixel of the RGB image to be identified, the set of pixels whose channel peak response falls within a preset high-brightness ratio range is selected, and the local light source chromaticity is obtained by computing the geometric center of the color vectors of all pixels in the set and normalizing it; an inverse cosine spectral angle calculation model containing a numerical clamping operator is constructed to calculate the spectral angular distance in color space between the pixel spectral vector of the pixel and the local light source chromaticity; a surface roughness factor, characterizing the micro-geometric roughness of the material surface, is introduced to weight the spectral angular distance and generate a spectral homochromaticity constraint term; the numerical clamping operator truncates the dot product of the pixel spectral vector and the local light source chromaticity so that the input to the inverse cosine spectral angle calculation model is limited to the closed interval from negative one to one; and the photoelectric cutoff probability term and the spectral homochromaticity constraint term are multiplied nonlinearly to generate the specular reflection probability map.
The values of the specular reflection probability map lie in the range from zero to one and characterize, for each pixel of the RGB image to be identified, the confidence that the pixel has lost physical texture information due to photon flux overload and a spectral direction dominated by the light source.
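The chromaticity part of claim 3 can be sketched in three small functions. Several details are assumptions not fixed by the claim: the high-brightness range is taken as peaks of at least 90% of the window maximum, the roughness weighting is taken as exp(-angle / roughness) with a hypothetical roughness of 0.1, a black pixel is treated as maximally off-light, and the window is assumed to contain at least one non-black pixel. The clamp before acos matches the numerical clamping operator described in the claim.

```python
import math


def light_chromaticity(pixels, bright_ratio=0.9):
    """Normalized mean color of the brightest pixels in a neighborhood."""
    peaks = [max(p) for p in pixels]
    top = max(peaks)
    bright = [p for p, pk in zip(pixels, peaks) if pk >= bright_ratio * top]
    mean = [sum(p[c] for p in bright) / len(bright) for c in range(3)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean]


def spectral_angle(pixel, light):
    """Angle between a pixel color and the unit light chromaticity."""
    norm = math.sqrt(sum(v * v for v in pixel))
    if norm == 0.0:
        return math.pi / 2  # black pixel: treated as maximally off-light (assumption)
    cos_angle = sum(a * b for a, b in zip(pixel, light)) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # clamp so acos never fails
    return math.acos(cos_angle)


def specular_probability(p_cutoff, angle, roughness=0.1):
    """Cutoff probability times an assumed exponential homochromaticity term."""
    homochromatic = math.exp(-angle / roughness)
    return p_cutoff * homochromatic
```

In a window lit by white light, a gray pixel has a spectral angle of essentially zero and inherits its full cutoff probability, whereas a saturated red pixel sits about 0.955 rad away from the light color and is strongly down-weighted.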

4. The image recognition-based Japanese translation recognition method according to claim 1, characterized in that, in step S2, the energy conservation constraint in the unsaturated region of the RGB image to be identified is applied as follows: The image decomposition model constructs an inverse operation for additive noise based on the superposition principle of physical optics, treating the RGB image to be identified as the linear superposition of a diffuse reflection component that carries the text information and a specular reflection component that has the mirror properties of a pure light source. The image decomposition model identifies the pixel coordinates whose values in the specular reflection probability map are close to zero as the unsaturated region of the RGB image to be identified, within which the sensor is determined to be operating in its linear range and the physical information is intact. The image decomposition model constructs a weighted variational decomposition functional that includes an energy conservation term for the unsaturated region, and uses the inverted value of the specular reflection probability map as the confidence weight of this term. The image decomposition model introduces into the energy conservation term a robust penalty potential energy function that models the statistical distribution of physical measurement errors; this function is a square-root operation containing a small positive constant, which improves tolerance to outlier noise. By minimizing the robust penalty potential energy function, the image decomposition model forces the difference between the RGB image to be identified and the predicted specular light component to equal the predicted reflectance component.
This imposes a pixel fidelity constraint on the reflectance component in the unsaturated region, ensuring that the reflectance component retains the original color and texture features of the RGB image to be identified.
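A minimal NumPy sketch of the unsaturated-region energy conservation term is given below. It assumes that the "inverted value" of the probability map means one minus its value, and that the square-root penalty takes the Charbonnier form sqrt(x² + ε²); the function names, the ε default, and the (H, W, 3) array layout are illustrative assumptions rather than details from the patent.

```python
import numpy as np

def charbonnier(x, eps=1e-3):
    """Robust penalty sqrt(x^2 + eps^2): smooth near zero, ~|x| for large residuals."""
    return np.sqrt(x * x + eps * eps)

def unsaturated_fidelity_energy(image, specular, reflectance, spec_prob, eps=1e-3):
    """Weighted energy-conservation term I ~= S + R for the unsaturated region.

    image, specular, reflectance: float arrays of shape (H, W, 3).
    spec_prob: specular reflection probability map of shape (H, W) in [0, 1].
    Assumption: the confidence weight is 1 - spec_prob, so pixels whose
    probability is near zero (unsaturated) dominate the term.
    """
    weight = 1.0 - spec_prob                              # per-pixel confidence
    residual = image - specular - reflectance             # violation of I = S + R
    penalty = charbonnier(residual, eps).sum(axis=-1)     # sum over RGB channels
    return float((weight * penalty).sum())
```

When every pixel has probability one, the weight and therefore the whole term vanish, which is consistent with the claim's need for a separate constraint in the saturated region.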

5. The image recognition-based Japanese translation recognition method according to claim 4, characterized in that, in step S2, the gradient direction orthogonality constraint on the pixel saturation region of the RGB image to be identified is applied in the image decomposition model by a gradient mutual exclusion calculation, as follows: The image decomposition model identifies the pixel coordinates whose values in the specular reflection probability map are close to one as the pixel saturation region of the RGB image to be identified, and determines that the energy conservation constraint fails within this region because of physical truncation. The image decomposition model introduces a gradient mutual exclusion structure separation mechanism based on the assumption of signal source independence, under which the edges of the specular light component, determined by the shape of the light source, and the edges of the reflectance component, determined by the text strokes, do not overlap spatially. The image decomposition model constructs a gradient mutual exclusion term for the saturation region and computes the spatial gradient fields of the specular light component and of the reflectance component separately. It then computes the product of the gradient magnitudes of the two fields at corresponding positions and applies a hyperbolic tangent limiting operator to this product, saturating it numerically to bound the gradient response range. The image decomposition model also introduces a gradient activation sensitivity parameter to increase the rejection gain for weak edges.
Minimizing the gradient mutual exclusion term in the saturation region forces the spatial gradient of the reflectance component toward zero wherever the specular light component has a spatial gradient within the pixel saturation region of the RGB image to be identified. The light spot edges of the specular light component are thereby spatially separated from the stroke structure of the reflectance component through the structural independence constraint.
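The gradient mutual exclusion term can be sketched as follows, under stated assumptions: single-channel components, the probability map used directly as the saturated-region weight, and the tanh applied to κ times the gradient-magnitude product. The parameter name `kappa` is a hypothetical stand-in for the claimed gradient activation sensitivity parameter.

```python
import numpy as np

def gradient_magnitude(channel):
    """|grad| of a 2-D array; np.gradient returns d/d(axis 0) then d/d(axis 1)."""
    gy, gx = np.gradient(channel)
    return np.hypot(gx, gy)

def exclusion_energy(specular, reflectance, spec_prob, kappa=10.0):
    """Gradient mutual-exclusion term in the saturated region (sketch).

    specular, reflectance: (H, W) single-channel arrays.
    spec_prob: (H, W) specular probability, used as the saturated-region
    weight (assumption).
    kappa: hypothetical gain; larger values push weak-edge products further
    up the tanh curve, so weak overlapping edges are penalized more strongly.
    """
    product = gradient_magnitude(specular) * gradient_magnitude(reflectance)
    return float((spec_prob * np.tanh(kappa * product)).sum())
```

Because tanh is bounded by one, each pixel contributes at most its weight, so a single very strong highlight edge cannot dominate the functional.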

6. The image recognition-based Japanese translation recognition method according to claim 1, characterized in that, in step S3, stroke flow field analysis is performed on the reflectance component generated in step S2, and the stroke tangential vector field of the region of the reflectance component not affected by specular interference is extracted by flow field reconstruction based on Euler elastic potential energy, as follows: Pixel coordinates in the specular light component whose brightness response exceeds a preset specular threshold are identified as the data void region, and pixel coordinates in the reflectance component outside the data void region are identified as the valid observation region. The structure tensor of the reflectance component is extracted at the boundary between the valid observation region and the data void region. Eigenvalue decomposition is performed on the structure tensor, and the unit eigenvector corresponding to the smallest eigenvalue is extracted and defined as the boundary tangential field. An Euler elastic potential energy model incorporating a curve bending stiffness parameter and a compression modulus parameter is introduced to construct an energy functional of the flow field angular phase within the data void region. The energy functional contains a smoothing term that penalizes fluctuation of the flow field and a curvature term that penalizes divergence of the streamlines. The flow field distribution inside the data void region is obtained by minimizing the energy functional and is output as the reconstructed stroke tangential vector field.
The reconstructed stroke tangential vector field satisfies geometric continuity and curvature consistency constraints inside the data void region, so that it exhibits the bending behavior of a physical elastic rod, realizing the topological extension of the turning features and writing inertia of Japanese characters.
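The boundary-extraction part of this claim (void mask, boundary band, and minimum-eigenvalue tangent of the structure tensor) might look like the sketch below. It assumes a single-channel reflectance map, a 3×3 box blur in place of the usual Gaussian smoothing of the tensor, 4-neighbour adjacency for the boundary, and a hypothetical threshold of 0.8; the Euler elastica minimization inside the void is not shown.

```python
import numpy as np

def box_blur3(a):
    """3x3 mean filter with edge padding (stands in for Gaussian smoothing)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def boundary_tangent_field(reflectance, specular, spec_threshold=0.8):
    """Return (hole_mask, boundary_mask, tangent) for single-channel inputs.

    hole_mask: pixels where the specular component exceeds spec_threshold.
    boundary_mask: valid pixels that are 4-adjacent to the hole.
    tangent: (H, W, 2) unit vectors (tx, ty), the eigenvector of the smoothed
    structure tensor for its smallest eigenvalue, i.e. the direction of least
    intensity change, which runs along a stroke.
    """
    hole = specular > spec_threshold
    gy, gx = np.gradient(reflectance)
    # Smoothed structure tensor J = blur(grad I grad I^T), laid out as [[jxx, jxy], [jxy, jyy]]
    jxx, jxy, jyy = box_blur3(gx * gx), box_blur3(gx * gy), box_blur3(gy * gy)
    tensor = np.stack([np.stack([jxx, jxy], -1), np.stack([jxy, jyy], -1)], -2)
    _, vecs = np.linalg.eigh(tensor)   # eigenvalues ascending; eigenvectors are columns
    tangent = vecs[..., :, 0]          # eigenvector of the smallest eigenvalue
    # Valid pixels touching the hole through a 4-neighbour
    p = np.pad(hole, 1, mode="constant")
    near_hole = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    boundary = ~hole & near_hole
    return hole, boundary, tangent
```

For an image whose intensity changes only along x (a vertical stroke edge), the returned tangent points along ±y, as expected for a stroke direction.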

7. The image recognition-based Japanese translation recognition method according to claim 6, characterized in that, in step S3, the stroke tangential vector field guides pixels to diffuse anisotropically into the data void region covered by the specular light component by a shock-assisted transport-reaction-diffusion calculation, as follows: A structural diffusion tensor is constructed from the reconstructed stroke tangential vector field and is defined as an anisotropic matrix with high conductivity along the direction of the reconstructed stroke tangential vector field and low conductivity perpendicular to it. A partial differential equation containing a transport term, a diffusion term, and a shock response term is constructed. The transport term drives the texture pixels of the reflectance component to undergo pure convective displacement along the streamlines of the reconstructed stroke tangential vector field. The diffusion term performs thermal diffusion along the tangent direction of the reconstructed stroke tangential vector field. A Hessian-based shock filter is introduced as the shock response term and computes the second directional derivative of the reflectance component along the edge normal direction: where the second directional derivative is positive, backward diffusion is performed to concentrate edge energy; where it is negative, forward diffusion is performed. The shock response term thereby numerically compensates for the edge broadening caused by thermal diffusion, generating a repaired reflectance map whose stroke edges exhibit the sharpening determined by the shock response term.
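Two building blocks of this claim are sketched below: the anisotropic diffusion tensor D = c_along·t tᵀ + c_across·n nᵀ and a Hessian-based second directional derivative. The shock update uses the classic Osher-Rudin form −sign(u_nn)·|∇u| as a swapped-in stand-in for the claimed shock filter; `c_along`, `c_across`, and all function names are hypothetical, and the full transport-diffusion time stepping is not shown.

```python
import numpy as np

def structural_diffusion_tensor(tangent, c_along=1.0, c_across=0.01):
    """D = c_along * t t^T + c_across * n n^T, with n the unit normal to t.

    tangent: (..., 2) unit vectors (tx, ty); c_along >> c_across gives high
    conductivity along the stroke and low conductivity across it.
    """
    normal = np.stack([-tangent[..., 1], tangent[..., 0]], axis=-1)  # t rotated 90 degrees
    outer_t = tangent[..., :, None] * tangent[..., None, :]
    outer_n = normal[..., :, None] * normal[..., None, :]
    return c_along * outer_t + c_across * outer_n

def second_derivative_along(u, direction):
    """Second directional derivative d^T H d of image u, where H is its Hessian."""
    uy, ux = np.gradient(u)          # derivatives along rows (y) and columns (x)
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    dx, dy = direction[..., 0], direction[..., 1]
    return dx * dx * uxx + dx * dy * (uxy + uyx) + dy * dy * uyy

def shock_term(u, normal):
    """Osher-Rudin-style update -sign(u_nn) * |grad u| (stand-in for the claimed
    shock filter): lowers u where u_nn > 0 and raises u where u_nn < 0, which
    steepens blurred edges."""
    uy, ux = np.gradient(u)
    return -np.sign(second_derivative_along(u, normal)) * np.hypot(ux, uy)
```

The inverse of this tensor is what the next claim uses as a Riemannian metric, so the same `c_along`/`c_across` ratio controls both diffusion and path cost.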

8. The image recognition-based Japanese translation recognition method according to claim 7, characterized in that, in step S3, the repair uncertainty map is generated from the anisotropic diffusion results by an entropy increment calculation based on a Riemannian metric, as follows: A confidence decay analysis based on the second law of thermodynamics is performed inside the data void region, establishing that the confidence of the repaired reflectance component decreases nonlinearly as the propagation distance increases. The Riemannian metric space inside the data void region is defined by the inverse of the structural diffusion tensor. The minimum physical propagation time, within this Riemannian metric space, for a pixel in the data void region to reach the boundary of the valid observation region is defined as the Riemannian geodesic distance. In computing the Riemannian geodesic distance, the path weight along the direction of the reconstructed stroke tangential vector field is smaller than the path weight perpendicular to it. The Riemannian geodesic distance is mapped by an exponential decay function to a probability value in the range zero to one, generating a repair uncertainty map that quantifies the inference uncertainty introduced by anisotropic diffusion within the data void region. The repair uncertainty map is output for use in the subsequent step to adjust the feature response weights of the character recognition model.
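The geodesic distance can be approximated on the pixel grid with Dijkstra's algorithm over 8-neighbour steps, each costing sqrt(vᵀ M v) for the local metric M (the inverse diffusion tensor). This discrete approximation is a swapped-in technique (fast marching would be another option); the mapping uncertainty = 1 − exp(−d/τ), zero uncertainty outside the void, and the parameter `tau` are assumptions.

```python
import heapq
import numpy as np

def geodesic_uncertainty(hole, metric, tau=5.0):
    """Riemannian geodesic distance from the valid region into the hole, mapped to [0, 1).

    hole: (H, W) bool mask of the data void region.
    metric: (H, W, 2, 2) metric tensor M, e.g. the inverse structural diffusion tensor.
    Returns 1 - exp(-d / tau): 0 on valid pixels, rising toward 1 deep inside the void.
    """
    h, w = hole.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for y, x in zip(*np.nonzero(~hole)):       # valid pixels are sources at distance 0
        dist[y, x] = 0.0
        heapq.heappush(heap, (0.0, int(y), int(x)))
    steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue                            # stale heap entry
        for dy, dx in steps:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                v = np.array([dx, dy], dtype=float)        # step vector in (x, y) order
                m = 0.5 * (metric[y, x] + metric[ny, nx])  # average metric on the edge
                nd = d + float(np.sqrt(v @ m @ v))         # Riemannian step length
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return 1.0 - np.exp(-dist / tau)
```

With an identity metric this reduces to an ordinary 8-neighbour distance; penalizing steps across the stroke (a large metric entry perpendicular to the tangent) makes cross-stroke propagation more expensive and therefore more uncertain.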

9. The image recognition-based Japanese translation recognition method according to claim 1, characterized in that, in step S4, the repair uncertainty map is used within the character recognition model to dynamically weight and suppress the feature responses of the repaired reflectance map by a feature fusion calculation that minimizes Bayesian risk, as follows: The character recognition model treats feature extraction as signal demodulation over a noisy channel. The character recognition model processes the repaired reflectance map generated in step S3 with a deep convolutional neural network and outputs the original visual feature tensor. The character recognition model simultaneously acquires the repair uncertainty map generated in step S3 and downsamples it to the spatial resolution of the original visual feature tensor. The character recognition model constructs a nonlinear reliability gating function based on the Weber-Fechner law, which computes transmittance weights in the feature space from the pixel values of the repair uncertainty map. The nonlinear reliability gating function includes a sigmoid activation transformation and a dark current constant term; the dark current constant term sets a lower bound on the transmittance weight, so that gradient backpropagation paths are preserved in regions where the pixel values of the repair uncertainty map approach one. The character recognition model computes the Hadamard product of the original visual feature tensor and the transmittance weights to generate a Bayesian weighted feature map.
The Bayesian weighted feature map numerically attenuates the feature response in local regions where the corresponding repair uncertainty map values approach one, and preserves the original visual feature tensor in local regions where those values approach zero.
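A sketch of the reliability gating follows. The claim does not give the exact Weber-Fechner form, so this version uses a plain sigmoid with a floor: w = dark + (1 − dark)·σ(−α(U − u₀)). Average pooling for the downsampling, an integer downsampling factor, and the parameter names `alpha`, `u0`, and `dark` are all assumptions.

```python
import numpy as np

def downsample_mean(u, factor):
    """Average-pool a (H, W) map by an integer factor (H and W divisible by factor)."""
    h, w = u.shape
    return u.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def reliability_gate(uncertainty, alpha=10.0, u0=0.5, dark=0.05):
    """Transmittance weight in [dark, 1): the sigmoid falls as uncertainty rises,
    and 'dark' (the dark current constant) is a floor that keeps gradients flowing."""
    s = 1.0 / (1.0 + np.exp(alpha * (uncertainty - u0)))
    return dark + (1.0 - dark) * s

def bayesian_weighted_features(features, uncertainty):
    """features: (C, h, w) CNN feature tensor; uncertainty: (H, W) repair uncertainty map."""
    factor = uncertainty.shape[0] // features.shape[1]
    weight = reliability_gate(downsample_mean(uncertainty, factor))
    return features * weight[None, :, :]   # Hadamard product, broadcast over channels
```

Because the weight never drops below `dark`, even fully uncertain regions pass a small, nonzero gradient back to the convolutional layers during training.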

10. The image recognition-based Japanese translation recognition method according to claim 9, characterized in that, in step S4, the Japanese character sequence is output and the translation database is searched based on the Japanese character sequence to generate the Japanese translation result, by context-prior-driven sequence decoding and fault-tolerant retrieval in a multi-hypothesis semantic space, as follows: The character recognition model decodes the sequence from the Bayesian weighted feature map by maximum a posteriori estimation. During decoding, the character recognition model jointly computes a visual likelihood term determined by the Bayesian weighted feature map and a language prior term determined by a pre-trained Japanese language model. When the feature response of the Bayesian weighted feature map in a local region is below a preset response threshold, the decoding reduces the contribution weight of the visual likelihood term to the Japanese character sequence and uses the language prior term to perform semantic error correction on the sequence. The character recognition model uses a beam search strategy to generate a candidate set of multiple candidate character sequences. The character recognition model computes the predicted probability of each candidate character sequence in the candidate set, uses these probabilities to compute the weighted average of the semantic vectors of the candidate sequences, and outputs the result as the centroid query vector. The character recognition model retrieves from the translation database the translated text with the highest cosine similarity to the centroid query vector and outputs it as the Japanese translation result.
The cosine similarity calculation adds a small numerical stability constant to the denominator to keep it nonzero, and the computation of the centroid query vector exploits the clustering characteristics of the high-dimensional semantic space.
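The decoding fusion and the centroid retrieval can be sketched as below. The per-step score w·log p_visual + log p_LM, the `threshold` and `low_weight` values, and the softmax normalization of beam log-scores into candidate probabilities are assumptions; the semantic vectors and database embeddings are taken as given arrays, since the patent does not specify the encoder.

```python
import numpy as np

def fused_log_score(log_p_visual, log_p_lm, response, threshold=0.3, low_weight=0.2):
    """Per-step score: visual log-likelihood down-weighted where feature response is weak."""
    w_visual = np.where(response < threshold, low_weight, 1.0)
    return w_visual * log_p_visual + log_p_lm

def centroid_query(candidate_vectors, candidate_log_probs):
    """Probability-weighted mean of candidate semantic vectors (softmax over beam scores)."""
    p = np.exp(candidate_log_probs - np.max(candidate_log_probs))  # stable softmax
    p /= p.sum()
    return p @ candidate_vectors

def retrieve(query, database_vectors, eps=1e-8):
    """Index of the database entry with the highest cosine similarity to the query."""
    num = database_vectors @ query
    den = np.linalg.norm(database_vectors, axis=1) * np.linalg.norm(query) + eps
    return int(np.argmax(num / den))
```

Averaging over the whole beam rather than taking only the top hypothesis means a single misread character shifts the query vector only slightly, which is the fault tolerance the claim attributes to the centroid query.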