A method for extracting and recognizing attachment information from multimodal financial documents

By combining image preprocessing and multi-dimensional degradation analysis with grayscale histogram and edge detection algorithms, the problem of text sticking or breaking after fading in thermally printed financial documents has been solved, achieving efficient text information extraction and recognition, and improving the accuracy and reliability of document information.

CN122244874A · Pending · Publication Date: 2026-06-19 · GUANGDONG POWER GRID CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGDONG POWER GRID CO LTD
Filing Date: 2026-03-30
Publication Date: 2026-06-19

IPC: G06V30/148; G06V30/146; G06V30/19; G06V30/164; G06V30/18; G06V10/82; G06V20/62; G06N3/0464

AI Tagging

Application Domain

Character and pattern recognition; Biological models

AI Technical Summary

Technical Problem

Existing technologies are insufficient to effectively address recognition errors caused by variations in stroke width and character spacing in thermally printed financial documents, especially when text sticks together or breaks after fading, leading to recognition mistakes.

Method used

Noise is removed through image preprocessing, and the degree of coating fading is quantified by combining grayscale histogram analysis. Edge detection and distance transformation algorithms are used to analyze the distribution of character stroke width and spacing uniformity. Grayscale attenuation and adhesion risk level are fused to separate character structure. Broken parts are completed by connecting components. Finally, the OCR path is adaptively switched to extract text.

Benefits of technology

It significantly improves the accuracy and reliability of information extraction from thermally printed financial documents at different degradation stages, provides multi-dimensional degradation analysis and adaptive recognition strategies, and supports the digital management of financial documents.

Abstract

This application provides a method for extracting and recognizing attachment information of multimodal financial documents, including: acquiring the original image data of thermally printed financial documents through a scanning device, removing noise interference using an image preprocessing algorithm to obtain a denoised document image; extracting the thermal coating area from the denoised document image, calculating the overall gray-level attenuation of the coating using a gray-level histogram analysis method to determine the fading level; extracting the pixel interval distance between adjacent characters from width distribution statistics and character regions, quantifying the interval uniformity using a distance transformation algorithm, and assessing the potential adhesion risk level; extracting stroke connectivity components from the separated character structure, identifying broken stroke defects, and completing the broken parts through connectivity components to obtain a complete character form; and adaptively switching the OCR path from the complete character form according to the adhesion degradation stage, and extracting the final text using an optical character recognition engine.

Description

Technical Field

[0001] This invention relates to the field of information technology, and in particular to a method for extracting and recognizing information from attachments to multimodal financial documents.

Background Technology

[0002] In the field of financial document processing, accurate identification and extraction of attachment information is particularly important, especially in scenarios involving a large number of transaction vouchers. This technology directly relates to the integrity and reliability of financial data. Whether it's a bank slip or a store receipt, these documents often contain crucial transaction records. Errors in identification can lead to accounting discrepancies and even financial losses. However, some existing methods often struggle to handle complex real-world situations when dealing with specific types of documents. Many solutions focus more on the surface features of the document content, ignoring the physical changes that occur over long-term storage or due to environmental influences. These changes are not simply a decrease in clarity, but involve deep degradation of the text's form, leading to various unpredictable errors during identification. In documents with wider strokes, theoretically, the text can retain some recognizable structure even after fading. However, if the character spacing is small, the strokes of adjacent characters may merge due to blurred edges, becoming a single unit and completely altering the original text form. Conversely, when the stroke width is small, although the merging problem is reduced, the text is prone to breakage after fading, with some strokes disappearing, resulting in incomplete information. These two factors intertwine, making it difficult for recognition systems to find a fixed rule to handle all situations. Specifically, this contradiction is particularly evident when processing thermally printed receipts. For example, on a supermarket receipt, after a period of storage, the text on the thermal paper may become blurred due to fading. 
If the strokes of the amount numerals on the receipt are thick and the spacing is narrow, the numbers "3" and "8" may stick together due to edge diffusion and be misinterpreted as an irregular shape, leading to an incorrect amount reading. This recognition bias caused by the interaction of stroke width and character spacing becomes a major obstacle in the practical application of recognition systems. Therefore, designing adaptive recognition strategies to address the dynamic changes in stroke width and character spacing at different stages of degradation is a key issue in improving the accuracy of financial document information extraction.

Summary of the Invention

[0003] This invention provides a method for extracting and recognizing information from attachments to multimodal financial documents, including:

[0004] The original image data of thermally printed financial documents is collected by scanning equipment, and image preprocessing algorithms are used to remove noise interference to obtain the denoised document image.

[0005] The thermal coating area is extracted from the denoised document image, and the overall gray-level attenuation of the coating is calculated using the gray-level histogram analysis method to determine the degree of fading.

[0006] Based on the fading level, character regions are identified and stroke outlines are segmented from the denoised document image. An edge detection algorithm is used to obtain the residual effective width of each character stroke, and width distribution statistics are obtained.

[0007] The pixel spacing distance between adjacent characters is extracted from the width distribution statistics and the character region. The distance transformation algorithm is used to quantify the spacing uniformity and assess the potential adhesion risk level.

[0008] By integrating the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing between adjacent characters, the adhesion-type degradation stage is determined, and the separated character structure is obtained based on the adhesion-type degradation stage.

[0009] Stroke connectivity components are extracted from the separated character structure, broken and degraded stroke defects are identified, and the broken parts are filled in by the connectivity components to obtain the complete character form;

[0010] Based on the adhesion-type degradation stage, the OCR path is adaptively switched from the complete character form, and the final text is extracted using an optical character recognition engine.

[0011] Furthermore, the process of acquiring the original image data of the thermally printed financial document through a scanning device, and using an image preprocessing algorithm to remove noise interference to obtain a denoised document image includes:

[0012] Based on the color channel information of the original image, the original image is converted into a single-channel grayscale image;

[0013] The grayscale image is processed using a median filtering algorithm. The median of the neighboring pixels at each pixel position in the grayscale image is taken as the new grayscale value of that pixel. Noise caused by scratches and dust particles on the surface of thermal paper is removed to obtain a denoised document image.

[0014] Furthermore, the step of extracting the thermal coating area from the denoised document image, calculating the overall grayscale attenuation of the coating using a grayscale histogram analysis method, and determining the fading level includes:

[0015] From the denoised document image, based on the feature that the thermal coating area is darker than the unprinted background area, that is, its gray value is lower, the thermal coating area is identified and extracted using a threshold segmentation method. The gray values of all pixels in the thermal coating area are obtained, the number of pixels corresponding to each gray level is counted, and a gray histogram is generated.

[0016] Locate the gray level position corresponding to the maximum number of pixels in the gray level histogram, take this gray level position as the current gray level peak of the coating, compare the current gray level peak of the coating with the standard gray level reference value, calculate the difference between the current gray level peak of the coating and the standard gray level reference value, and obtain the overall gray level attenuation of the coating.

[0017] The overall grayscale attenuation of the coating is matched with multiple attenuation threshold ranges. If the overall grayscale attenuation of the coating falls into a set attenuation threshold range, the fading level corresponding to that range is determined.

[0018] Furthermore, the step of identifying character regions and segmenting stroke outlines from the denoised document image based on the fading level, obtaining the residual effective width of each character stroke using an edge detection algorithm, and obtaining width distribution statistics includes:

[0019] According to the fading level, a corresponding binarization threshold is set for the denoised document image. The denoised document image is converted into a binary image through the binarization threshold. The connected regions of foreground pixels are identified from the binary image, and each connected region is used as a character region.

[0020] For the character region, the Sobel edge detection algorithm is used to perform gradient convolution operations in the horizontal and vertical directions on the pixels within the character region, and the gradient magnitude G is calculated as $G = \sqrt{G_x^2 + G_y^2}$, where $G_x$ is the horizontal gradient and $G_y$ is the vertical gradient;

[0021] A preset gradient threshold T is used to mark the boundary along the pixel positions where the gradient magnitude exceeds T, thus obtaining the stroke outline of each character.

[0022] The outer pixels of the stroke outline are peeled off layer by layer through a thinning process until the center line with a single pixel width is retained, thus obtaining the stroke skeleton. The vertical distance from each skeleton point of the stroke skeleton to the boundary of the stroke outline is measured to both sides, and the sum of the vertical distances on both sides is taken as the residual effective width at the skeleton point.

[0023] The residual effective widths of all skeleton points within the character region are summarized, and the frequency of each width value is counted to obtain width distribution statistics.

[0024] Furthermore, the step of extracting the pixel spacing distance between adjacent characters from the width distribution statistics and the character region, quantifying the spacing uniformity using a distance transformation algorithm, and assessing the potential adhesion risk level includes:

[0025] From the character region, two adjacent character regions are located sequentially according to the character arrangement order to form adjacent character pairs. The background pixel region between each pair of adjacent character pairs is obtained, and the pixel span of the background pixel region in the horizontal direction is measured to obtain the pixel interval distance between adjacent characters.

[0026] A distance transformation algorithm is used for the background pixel region. The distance value from each background pixel to the nearest character outline is marked layer by layer from the boundary of the background pixel region inward. The maximum and minimum distance values in the background pixel region are extracted, and the difference between the two is calculated as the interval uniformity index.
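The layer-by-layer distance marking and the uniformity index can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes a city-block (4-neighbor) metric, which the text does not specify, and the function names `distance_transform` and `uniformity_index` are hypothetical.

```python
from collections import deque

def distance_transform(mask):
    """mask[y][x] is truthy for character pixels, falsy for background.
    Returns the city-block distance from each pixel to the nearest character pixel.
    The metric is an assumption; the source does not name one."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    # Breadth-first expansion marks background pixels one layer at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def uniformity_index(mask):
    """Difference between the largest and smallest background distance values."""
    dist = distance_transform(mask)
    background = [
        d
        for dist_row, mask_row in zip(dist, mask)
        for d, m in zip(dist_row, mask_row)
        if not m and d is not None
    ]
    return max(background) - min(background) if background else 0

# Two character edges separated by a four-pixel background gap.
gap = [[True, False, False, False, False, True]]
gap_distances = distance_transform(gap)
gap_index = uniformity_index(gap)
```

A small index means the gap has nearly constant width; a large index means the gap narrows in places, which is where adhesion is more likely.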

[0027] The most frequently occurring width value is extracted from the width distribution statistics as the stroke width mode value. The stroke width mode value is compared with the pixel spacing distance. The potential adhesion risk level is determined based on the comparison result of the stroke width mode value and the pixel spacing distance and the spacing uniformity index.

[0028] The potential adhesion risk levels include high risk, medium risk, and low risk. High risk corresponds to the case where the stroke width mode value exceeds a preset ratio of the pixel spacing distance and the spacing uniformity index is lower than the preset uniformity threshold. Medium risk corresponds to the case where only one of the two conditions is met. Low risk corresponds to the case where neither condition is met.
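The two-condition rule above can be sketched as a small classifier. The default values for `ratio` and `uniformity_threshold` are placeholders chosen for illustration; the text only says that both are preset. The function names are hypothetical.

```python
from collections import Counter

def stroke_width_mode(widths):
    """Most frequent residual stroke width in the width distribution."""
    return Counter(widths).most_common(1)[0][0]

def adhesion_risk(widths, spacing, uniformity, ratio=0.5, uniformity_threshold=2):
    """Classify risk from the two conditions in the text.
    ratio and uniformity_threshold are hypothetical preset values."""
    width_condition = stroke_width_mode(widths) > ratio * spacing
    uniformity_condition = uniformity < uniformity_threshold
    conditions_met = int(width_condition) + int(uniformity_condition)
    return ("low", "medium", "high")[conditions_met]

risk_tight = adhesion_risk([3, 4, 4, 4, 5], spacing=6, uniformity=1)
risk_loose = adhesion_risk([2, 2, 3], spacing=10, uniformity=5)
```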

[0029] Furthermore, the step of fusing the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing distance between adjacent characters to determine the adhesion-type degradation stage, and obtaining the separated character structure based on the adhesion-type degradation stage, includes:

[0030] Based on a comprehensive determination of the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing distance between adjacent characters, if the overall grayscale attenuation of the coating is lower than a preset attenuation threshold and the potential adhesion risk level is low risk, then the adhesion-type degradation stage is determined to be mild adhesion.

[0031] If the potential adhesion risk level is medium risk, then the adhesion-type degradation stage is determined to be moderate adhesion.

[0032] If the potential adhesion risk level is high risk or the overall grayscale attenuation of the coating exceeds the preset attenuation threshold, then the adhesion-type degradation stage is determined to be severe adhesion.
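The three rules above overlap when the risk level is medium and the attenuation also exceeds the threshold, and they do not cover attenuation exactly equal to the threshold. The sketch below resolves both cases by assumption: the severe rule is checked first, and equality falls through to the lower stages. The threshold value and the function name `degradation_stage` are hypothetical.

```python
def degradation_stage(attenuation, risk, attenuation_threshold=60):
    """Map (attenuation, risk level) to an adhesion-type degradation stage.
    Checking 'severe' first is an assumption about rule precedence."""
    if risk == "high" or attenuation > attenuation_threshold:
        return "severe"
    if risk == "medium":
        return "moderate"
    return "mild"

stage_a = degradation_stage(20, "low")
stage_b = degradation_stage(20, "medium")
stage_c = degradation_stage(90, "low")
```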

[0033] For character regions whose adhesion-type degradation stage is mild adhesion, the gray value changes are scanned along the background pixel region between adjacent characters, and the pixel positions where the gray value jumps from low to high are located as adhesion boundaries. Pixels are cut along the adhesion boundaries to obtain the boundary-separated character regions.

[0034] For character regions whose adhesion-type degradation stage is moderate or severe adhesion, a watershed segmentation algorithm is used to process the adhesion region. The pixels with the local minimum gray value in the adhesion region are marked as seed points. Regions are grown layer by layer from the seed points to the surrounding pixels until adjacent grown regions meet and form a segmentation line. The adhered character regions are separated along the segmentation line to obtain the segmentation-line-separated character regions.

[0035] By combining the boundary-separated character regions and the segmentation-line-separated character regions, the pixel set and contour boundary of each independent character are extracted to obtain the separated character structure.

[0036] Furthermore, the step of extracting stroke connectivity components from the separated character structure, identifying fragmented and degenerate stroke defects, and completing the fragmented parts through connectivity components to obtain a complete character form includes:

[0037] From the separated character structure, the pixel set of each independent character is marked for connectivity. Pixels that are connected to each other are grouped into the same connected component, and pixels that are not connected to each other are grouped into different connected components, so as to obtain the set of stroke connected components inside each character.
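Connectivity marking of this kind is commonly done with a flood fill. The sketch below assumes 8-connectivity, which the text does not specify, and uses a hypothetical function name `label_components`.

```python
from collections import deque

def label_components(mask):
    """Label foreground pixels (truthy) so that connected pixels share a label.
    8-connectivity is an assumption. Returns (labels, component_count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if mask[ny][nx] and labels[ny][nx] == 0:
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count

# A vertical stroke with a one-pixel break yields two components.
broken_stroke = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
stroke_labels, stroke_count = label_components(broken_stroke)
```

A character whose pixel set yields more than one component is a candidate for the break repair described in the following paragraphs.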

[0038] For characters containing multiple connected components in the stroke connected component set, extract the pixels on the edge of each connected component that are connected to only a single adjacent pixel as endpoint pixels, calculate the Euclidean distance d and the angle θ between the stroke tangent directions of all different connected component endpoint pairs, sort the endpoint pairs in ascending order of d, and check them one by one. If d is less than a preset pixel threshold and θ is less than a preset angle threshold, it is determined that there are stroke defects between the connected components, and they are connected.

[0039] For two connected components determined to have a stroke defect between them, a straight line segment is drawn along the midpoint of the tangent directions of the two endpoint pixels, with the two endpoint pixels as its start and end points. The pixel positions covered by this connection path are filled with the foreground pixel value, joining the two originally disconnected components into one connected component. This is repeated until no endpoint pairs in the character meet the conditions.

[0040] After completing the break-and-repair process for all connected components within the character, the repaired connected components are merged into a unified set of pixels to obtain the complete character form.
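The endpoint pairing and bridging steps can be sketched as below. Several simplifications are assumed: endpoints and their tangent angles are supplied as inputs rather than detected, the angle between tangents is treated as the angle between undirected lines, the bridge is a plain straight segment between the two endpoints, and the thresholds `max_dist` and `max_angle` are placeholder values. All function names are hypothetical.

```python
import math

def angle_between(a_deg, b_deg):
    """Angle between two undirected tangent directions, in degrees (0 to 90)."""
    diff = abs(a_deg - b_deg) % 180
    return min(diff, 180 - diff)

def line_pixels(p, q):
    """Pixels on the straight segment from p to q, by linear interpolation."""
    (y0, x0), (y1, x1) = p, q
    steps = max(abs(y1 - y0), abs(x1 - x0))
    if steps == 0:
        return [p]
    return [
        (round(y0 + (y1 - y0) * t / steps), round(x0 + (x1 - x0) * t / steps))
        for t in range(steps + 1)
    ]

def bridge_breaks(mask, endpoints, max_dist=4.0, max_angle=30.0):
    """endpoints: list of (component_id, (y, x), tangent_deg).
    Fills bridges into mask in place and returns the number of bridges drawn."""
    parent = {cid: cid for cid, _, _ in endpoints}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    pairs = []
    for i, (ca, pa, ta) in enumerate(endpoints):
        for cb, pb, tb in endpoints[i + 1:]:
            if ca != cb:
                pairs.append((math.dist(pa, pb), angle_between(ta, tb), ca, cb, pa, pb))
    bridged = 0
    # Check pairs in ascending order of distance d, as the text describes.
    for d, theta, ca, cb, pa, pb in sorted(pairs, key=lambda pair: pair[0]):
        if d < max_dist and theta < max_angle and find(ca) != find(cb):
            for y, x in line_pixels(pa, pb):
                mask[y][x] = 1
            parent[find(ca)] = find(cb)  # the two components are now one
            bridged += 1
    return bridged

stroke = [[0, 1, 0], [0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0]]
stroke_endpoints = [
    (1, (0, 1), 90.0), (1, (1, 1), 90.0),
    (2, (3, 1), 90.0), (2, (4, 1), 90.0),
]
bridges = bridge_breaks(stroke, stroke_endpoints)
```

Merging component identifiers after each bridge keeps an already repaired pair from being joined a second time through a more distant endpoint pair.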

[0041] The complete character form includes a set of completed stroke connected components, wherein the pixel set and outline boundary of each character are connected by connected components.

[0042] Furthermore, the step of adaptively switching the OCR path from the complete character form based on the adhesive degradation stage, and extracting the final text using an optical character recognition engine, includes:

[0043] Based on the adhesion-type degradation stage, the corresponding identification parameter configuration is queried from the identification path mapping table to obtain the primary identification parameter configuration that matches the current degradation stage;

[0044] The optical character recognition is initialized using the primary selection recognition parameter configuration. Each character pixel set in the complete character form is compared with the pre-stored character template library one by one. The pixel overlap between the character pixel set and each character template is calculated. The character corresponding to the character template with the highest pixel overlap is selected as the candidate recognition result. The pixel overlap is used as the confidence value of the candidate recognition result.

[0045] For characters in the candidate recognition results whose confidence values are lower than a preset confidence threshold, the alternative recognition parameter configuration corresponding to the current degradation stage is obtained from the recognition path mapping table for secondary recognition. The confidence value of the secondary recognition result is compared with the confidence value of the original recognition result, and the result with the higher confidence value is selected as the final recognition result of the character. The final recognition results of all characters are summarized to obtain the final text.
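The template comparison and the low-confidence fallback can be sketched as follows. The text does not define "pixel overlap", so intersection over union is assumed here. Each recognition parameter configuration is modeled simply as its own template library, which is a simplification for illustration. The confidence threshold and all names are hypothetical.

```python
def pixel_overlap(a, b):
    """Intersection over union of two sets of (y, x) foreground pixels.
    This definition of 'pixel overlap' is an assumption."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recognize(char_pixels, templates):
    """Return (label, confidence) for the best-matching template."""
    return max(
        ((label, pixel_overlap(char_pixels, tpl)) for label, tpl in templates.items()),
        key=lambda result: result[1],
    )

def recognize_with_fallback(char_pixels, primary, alternate, min_confidence=0.8):
    """Try the primary configuration; on low confidence, retry with the alternate
    configuration and keep whichever result has the higher confidence."""
    first = recognize(char_pixels, primary)
    if first[1] >= min_confidence:
        return first
    second = recognize(char_pixels, alternate)
    return second if second[1] > first[1] else first

primary_templates = {
    "1": {(0, 0), (1, 0), (2, 0)},
    "7": {(0, 0), (0, 1), (1, 1), (2, 1)},
}
alternate_templates = {"7": {(0, 0), (0, 1), (1, 1)}}

clean_result = recognize({(0, 0), (1, 0), (2, 0)}, primary_templates)
faded_result = recognize_with_fallback(
    {(0, 0), (0, 1), (1, 1)}, primary_templates, alternate_templates
)
```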

[0046] The technical solutions provided by the embodiments of the present invention may include the following beneficial effects:

[0047] This invention discloses a method for extracting and recognizing information from attachments to multimodal financial documents. It proposes a systematic solution to address the unique business scenario where information extraction from thermally printed financial documents is difficult due to degradation issues such as fading, adhesion, and breakage. The invention removes noise through image preprocessing and quantifies the degree of coating fading using grayscale histogram analysis to accurately assess the degradation level. Based on this, it analyzes the distribution and spacing uniformity of character stroke widths using edge detection and distance transform algorithms, integrates grayscale attenuation and adhesion risk levels to determine the adhesion-type degradation stage and separate the character structure. Simultaneously, for breakage-type degradation, it restores complete character forms by completing missing strokes using connected components. Finally, it adaptively switches the OCR path based on the degradation stage to efficiently extract text information. The core innovation of this invention lies in the combination of multi-dimensional degradation analysis and adaptive recognition strategies, significantly improving the accuracy and reliability of information extraction from degraded documents, providing crucial technical support for the digital management of financial documents.

Attached Figure Description

[0048] Figure 1 is a flowchart of a method for extracting and recognizing attachment information of multimodal financial documents according to the present invention.

[0049] Figure 2 is a schematic diagram of a method for extracting and recognizing attachment information of multimodal financial documents according to the present invention.

[0050] Figure 3 is another schematic diagram of a method for extracting and recognizing attachment information of multimodal financial documents according to the present invention.

Detailed Implementation

[0051] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this specification.

[0052] As shown in Figures 1-3, this embodiment of a method for extracting and recognizing information from attachments to multimodal financial documents may specifically include:

[0053] S101. The original image data of the thermally printed financial documents is collected by the scanning device, and the image preprocessing algorithm is used to remove noise interference to obtain the denoised document image.

[0054] Based on the color channel information of the original image, the original image is converted into a single-channel grayscale image. A median filtering algorithm is then used to process the grayscale image, taking the median of its neighboring pixels as the new grayscale value for each pixel. This process removes noise caused by scratches and dust particles on the thermal paper surface, resulting in a denoised document image.

[0055] In one implementation, the scanning device uses a flatbed optical scanner to capture thermally printed financial documents. After the scanner's light source illuminates the document surface, the photosensitive element receives the reflected light and converts it into a digital signal, forming an original image containing three color channels: red, green, and blue. Since the text information on thermal documents is mainly manifested as grayscale differences, the three-channel pixel values of the original image are combined using a weighted average method, specifically calculated as Gray = 0.299R + 0.587G + 0.114B, where R, G, and B are the red, green, and blue channel values of the pixel, respectively, resulting in a single-channel grayscale image.
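The weighted average above can be sketched in a few lines. The image is represented as nested lists of (R, G, B) tuples purely for illustration, rounding to the nearest integer is an assumption, and the function name `rgb_to_gray` is hypothetical.

```python
def rgb_to_gray(image):
    """Convert rows of (R, G, B) tuples to rows of integer gray values in 0..255
    using Gray = 0.299R + 0.587G + 0.114B. Rounding to nearest is an assumption."""
    return [
        [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in image
    ]

sample = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
sample_gray = rgb_to_gray(sample)
```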

[0056] Specifically, the median filtering algorithm is performed pixel-by-pixel in a sliding window manner on the grayscale image. The window covers the current pixel and its surrounding neighboring pixels. The grayscale values of all pixels within the window are arranged in ascending order, and the grayscale value at the middle position after arrangement replaces the original grayscale value of the current pixel. Minor scratches and adhering dust particles on the surface of thermal paper caused by the storage environment appear as isolated abnormal bright spots or dark spots in the grayscale image. Median filtering replaces these isolated abnormal points with the grayscale level of surrounding normal pixels by taking the median of the neighborhood, thereby eliminating noise interference.

[0057] For example, after the above filtering process, the noise points caused by scratches and dust in the grayscale image are effectively suppressed, resulting in a denoised document image. The denoised document image retains the grayscale distribution characteristics of the thermally printed text.
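The sliding-window median filter described above can be sketched as follows. The window size (3×3 by default) and the border handling (the window is clipped to the image) are assumptions, since the text does not fix either. The function name `median_filter` is hypothetical.

```python
def median_filter(gray, radius=1):
    """Replace each pixel with the median of its (2*radius+1)^2 neighborhood.
    At the borders the window is clipped to the image (an assumption); for an
    even number of samples the upper of the two middle values is used."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                gray[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            out[y][x] = window[len(window) // 2]
    return out

# An isolated dark speck on light paper is replaced by the surrounding gray level.
speckled = [
    [200, 200, 200],
    [200, 0, 200],
    [200, 200, 200],
]
despeckled = median_filter(speckled)
```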

[0058] S102. Extract the thermal coating area from the denoised document image, and use the grayscale histogram analysis method to calculate the overall grayscale attenuation of the coating and determine the degree of fading.

[0059] From the denoised document image, the thermal coating area is identified and extracted using a threshold segmentation method, based on the characteristic that the thermal coating area is darker than the unprinted background area, that is, its grayscale value is lower. The grayscale values of all pixels within the thermal coating area are obtained, and the number of pixels at each grayscale level is counted to generate a grayscale histogram. The grayscale level with the maximum number of pixels in the histogram is located and taken as the current grayscale peak value of the coating. This peak is compared with a preset standard grayscale reference value, which is the typical grayscale peak value of an unfaded thermal coating, and the difference between the two is calculated to obtain the overall grayscale attenuation of the coating. The overall grayscale attenuation is then matched against multiple preset attenuation threshold intervals, which divide the attenuation magnitude into several continuous ranges. If the overall grayscale attenuation falls into a given attenuation threshold interval, the fading level corresponding to that interval is determined.

[0060] In one embodiment, the extraction of the thermal coating area is based on the identification of gray value differences. The printed area of the thermal paper presents a darker gray due to the color development of the thermal coating, while the unprinted background area maintains a lighter gray. By setting a gray value threshold, the two are separated, thereby extracting the thermal coating area containing text information.

[0061] Specifically, the grayscale histogram generation process involves statistically analyzing the grayscale value of each pixel within the thermal coating area. The grayscale value typically ranges from 0 to 255. During the analysis, all pixels within the coating area are traversed, and the number of pixels at each grayscale level is recorded, forming a histogram with grayscale level as the horizontal axis and pixel count as the vertical axis. In this grayscale histogram, the grayscale level with the highest pixel count represents the current grayscale peak of the coating, reflecting the dominant grayscale level in the thermal coating area. When the thermal document is in an unfaded state, the coating's grayscale peak is lower, corresponding to a darker color; as the fading deepens, the coating's grayscale peak gradually increases, shifting towards lighter colors.
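Histogram generation and peak location can be sketched as below. The coating pixels are given as a flat list of gray values for illustration. Resolving ties toward the lowest gray level is an assumption, and the function names are hypothetical.

```python
def gray_histogram(pixels):
    """Count how many pixels fall on each of the 256 gray levels."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    return hist

def coating_peak(hist):
    """Gray level with the largest pixel count.
    Ties resolve to the lowest gray level (an assumption)."""
    return max(range(256), key=lambda level: hist[level])

coating_pixels = [40, 42, 42, 42, 45, 45, 60]
coating_hist = gray_histogram(coating_pixels)
current_peak = coating_peak(coating_hist)
```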

[0062] It should be noted that the standard grayscale reference value is a typical grayscale peak value of the coating area of the same type of thermal document that has not faded, which is collected in advance and used as a reference standard to measure the degree of fading.

[0063] In one embodiment, the current grayscale peak of the coating is the grayscale level with the highest pixel count in the grayscale histogram of the thermal document image. The standard grayscale reference value is subtracted from this value, and the difference is the overall grayscale attenuation of the coating. The larger the attenuation value, the more severe the fading. Furthermore, the fading level is determined using an interval matching method. The preset attenuation threshold intervals divide the attenuation amount into several continuous ranges. For example, intervals with smaller attenuations correspond to a mild fading level, intervals with medium attenuations correspond to a moderate fading level, and intervals with larger attenuations correspond to a severe fading level. Based on the specific interval in which the attenuation amount falls, the current fading level of the document is determined. This level information reflects the degradation state of the thermal document.
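The attenuation calculation and interval matching can be sketched as follows. The attenuation is taken as the current peak minus the reference value, so that a larger value means more fading, consistent with the statement that the peak shifts toward lighter gray as fading deepens. The interval boundaries in `FADING_INTERVALS` are placeholder values, not figures from the application, and the function names are hypothetical.

```python
# Hypothetical half-open intervals [lower, upper) mapped to fading levels.
FADING_INTERVALS = [(0, 30, "mild"), (30, 80, "moderate"), (80, 256, "severe")]

def gray_attenuation(current_peak, reference_peak):
    """Overall attenuation: how far the coating peak has shifted toward lighter gray."""
    return current_peak - reference_peak

def fading_level(attenuation, intervals=FADING_INTERVALS):
    """Return the level whose interval contains the attenuation, else None."""
    for lower, upper, level in intervals:
        if lower <= attenuation < upper:
            return level
    return None  # outside every preset interval

attenuation = gray_attenuation(current_peak=95, reference_peak=40)
level = fading_level(attenuation)
```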

[0064] S103. Based on the degree of fading, identify character regions and segment stroke outlines from the denoised document image. Use an edge detection algorithm to obtain the residual effective width of each character stroke and obtain width distribution statistics.

[0065] A corresponding binarization threshold is set for the denoised document image based on the fading level. If the fading level is mild, a first binarization threshold is used; if the fading level is severe, a second binarization threshold is used. The denoised document image is converted into a binary image using the binarization threshold. Connected regions of foreground pixels are identified from the binary image, and each connected region is used as a character region. For each character region, the Sobel edge detection algorithm is used to perform horizontal and vertical gradient convolution operations on the pixels within the character region, calculating the gradient magnitude $G = \sqrt{G_x^2 + G_y^2}$, where $G_x$ represents the horizontal gradient and $G_y$ represents the vertical gradient. A preset gradient threshold T is set to 0.5 times the average gradient magnitude of all gradient values within the character region. Boundaries are marked along pixels where the gradient magnitude exceeds T, yielding the stroke outline of each character. From the stroke outline, pixels on the outer edges are peeled away layer by layer until a centerline of single-pixel width is retained, resulting in the stroke skeleton. The vertical distance from each skeleton point to the boundary of the stroke outline is measured in both directions, and the sum of these vertical distances is taken as the residual effective width at that skeleton point. The residual effective widths of all skeleton points within the character region are summarized, and the frequency of each width value is calculated to obtain width distribution statistics.

[0066] In one implementation, the correspondence between the fading level and the binarization threshold is established based on the fading characteristics of the thermal document. In a lightly fading state, the grayscale contrast between the text area and the background area remains high, allowing for accurate separation of the foreground and background using a higher binarization threshold. However, in a heavily fading state, the text grayscale has significantly lightened and is close to the background grayscale; using a higher threshold would misclassify some text pixels as background, while a lower binarization threshold preserves more text pixel information.

[0067] Specifically, the adaptive binarization process involves querying a pre-established threshold mapping table based on the fading level. This table records the binarization threshold corresponding to each fading level. When the fading level is mild, a higher threshold is obtained from the mapping table; when the fading level is moderate, a medium threshold is obtained; and when the fading level is severe, a lower threshold is obtained. After obtaining the threshold, the grayscale value of each pixel in the denoised document image is iterated. If the pixel grayscale value is lower than the binarization threshold, the pixel is marked as a foreground pixel and assigned the value black; if the pixel grayscale value is higher than or equal to the binarization threshold, the pixel is marked as a background pixel and assigned the value white, thus obtaining a binary image.
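
As a rough sketch of the table lookup and per-pixel comparison in this paragraph: the threshold values and the names `THRESHOLD_TABLE` and `binarize` are placeholders, since the source only states that mild fading maps to a higher threshold and severe fading to a lower one. Which thresholds actually retain more text depends on the gray scale polarity of the input, so the numbers should not be read as recommendations.

```python
# Hypothetical mapping table (fading level -> binarization threshold).
THRESHOLD_TABLE = {"mild": 160, "moderate": 128, "severe": 96}
BLACK, WHITE = 0, 255

def binarize(image, level, table=THRESHOLD_TABLE):
    """image: list of rows of gray values (0-255). Returns a black/white image."""
    threshold = table[level]
    # Below the threshold -> foreground (black); otherwise -> background (white).
    return [[BLACK if px < threshold else WHITE for px in row] for row in image]
```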

[0068] It should be noted that character region recognition adopts a connected component labeling method. In the binary image, adjacent foreground pixels form connected components, and each independent connected component corresponds to a character or a component of a character. These connected components are extracted and labeled as character regions. Further, the Sobel edge detection algorithm performs gradient calculations on the pixels within the character regions. The Sobel algorithm uses two convolutional kernels to calculate the horizontal and vertical gradients respectively. The horizontal convolutional kernel is used to detect vertical edges, and the vertical convolutional kernel is used to detect horizontal edges. During the calculation, a rectangular window is formed by the surrounding adjacent pixels, centered on each pixel within the character region. The grayscale value of the pixels within the window is multiplied element-wise by the horizontal convolutional kernel and then summed to obtain the horizontal gradient value of that pixel. Similarly, the grayscale value of the pixels within the window is multiplied element-wise by the vertical convolutional kernel and then summed to obtain the vertical gradient value of that pixel. The square root of the sum of the squares of the horizontal and vertical gradient values is used to obtain the gradient magnitude of the pixel, which reflects the degree of grayscale change at that pixel location.

[0069] For example, the gradient threshold is set based on the overall gradient distribution characteristics of the character region. Pixels with gradient magnitudes exceeding the preset gradient threshold are marked as boundary pixels, and the set of these boundary pixels constitutes the stroke outline of the character.
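
A minimal pure-Python sketch of the gradient computation and boundary marking, using the standard 3×3 Sobel kernels. The border handling (border pixels left at zero) and the function names are assumptions made for illustration; the factor 0.5 follows the threshold rule stated for T in step S103.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient (vertical edges)
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical gradient (horizontal edges)

def sobel_magnitude(region):
    """region: list of rows of gray values. Returns per-pixel gradient magnitude."""
    h, w = len(region), len(region[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are skipped in this sketch
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    px = region[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * px
                    gy += SOBEL_Y[dy][dx] * px
            mag[y][x] = math.hypot(gx, gy)   # sqrt(gx^2 + gy^2)
    return mag

def boundary_mask(region, factor=0.5):
    """True where the magnitude exceeds factor * mean magnitude of the region."""
    mag = sobel_magnitude(region)
    values = [m for row in mag for m in row]
    t = factor * sum(values) / len(values)
    return [[m > t for m in row] for row in mag]
```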

[0070] In one possible implementation, the stroke skeleton extraction employs a thinning process. The principle of thinning is to peel away the outer pixels of the stroke outline layer by layer while maintaining stroke connectivity. Specifically, the process involves: traversing the foreground pixels within the stroke outline, checking each foreground pixel to determine if it is a boundary pixel, and verifying that the number of connected components C in its 8-neighborhood foreground pixels is 1 and the Euler number E remains unchanged to ensure that deletion does not cause stroke breakage or topological changes. Here, C refers to the number of connected components, and E refers to the Euler number. If the conditions are met, the pixel is marked as a pixel to be deleted. After one round of traversal, all pixels to be deleted are removed from the foreground, resulting in the shrunken stroke region. This traversal and deletion process is repeated until the stroke region can no longer shrink. The remaining single-pixel-width foreground pixel sequence constitutes the stroke skeleton. The stroke skeleton preserves the stroke's topological structure and direction information.

[0071] Understandably, the measurement of the residual effective width is performed point-by-point along the stroke skeleton. For each skeleton point on the stroke skeleton, the local direction of the stroke at that point is determined. Extending to both sides in a direction perpendicular to the local direction, the distance from the skeleton point to the stroke outline boundary is measured respectively. The residual effective width at that skeleton point is obtained by adding the distances on both sides. When fading causes the stroke edges to blur or disappear, the stroke outline shrinks inward, and the residual effective width decreases accordingly.

[0072] In one embodiment, the width distribution statistics summarize the residual effective width of all skeleton points within the character area, count the frequency of each width value, and form a distribution chart with the width value on the horizontal axis and the frequency on the vertical axis. The width distribution statistics reflect the overall thickness of the character strokes in the document. A concentration of width values in a larger range indicates that the strokes are well preserved, while a concentration of width values in a smaller range or a scattered distribution indicates that the strokes have degraded to varying degrees.
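
The width statistics amount to a frequency count over the measured widths. The sketch below assumes the widths are already available as integers (one per skeleton point); the function names and the tie-breaking rule for the mode, which step S104 uses as the stroke width mode, are illustrative choices.

```python
from collections import Counter

def width_distribution(widths):
    """Frequency of each residual effective width, ordered by width value."""
    return dict(sorted(Counter(widths).items()))

def stroke_width_mode(distribution):
    # Most frequent width; ties go to the smaller width (an assumption).
    return min(distribution, key=lambda w: (-distribution[w], w))
```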

[0073] S104. Extract the pixel spacing distance between adjacent characters from the width distribution statistics and character region, use the distance transformation algorithm to quantify the spacing uniformity, and assess the potential adhesion risk level.

[0074] Adjacent character pairs are formed by sequentially locating two adjacent character regions in the character region according to their arrangement. The background pixel region between each pair is obtained, and the horizontal pixel span of the background pixel region is measured to obtain the pixel interval distance between adjacent characters. A distance transformation algorithm is applied to the background pixel region, marking the distance value from each background pixel to the nearest character outline layer by layer from the boundary of the background pixel region inwards. The maximum and minimum distance values within the background pixel region are extracted, and the difference between them is calculated as an interval uniformity index. The smaller the interval uniformity index value, the more uniform the interval distribution. The most frequently occurring width value is extracted from the width distribution statistics as the stroke width mode. The stroke width mode is compared with the pixel interval distance. If the stroke width mode exceeds a preset proportional threshold of the pixel interval distance, and the interval uniformity index is lower than the preset uniformity threshold, the potential adhesion risk level is determined to be high risk; if only one condition is met, it is determined to be medium risk; if neither condition is met, it is determined to be low risk.

[0075] In one implementation, adjacent character pairs are located based on the horizontal arrangement of the character regions in the document image, pairing adjacent character regions sequentially from left to right. Each pair of adjacent character pairs contains a background pixel region that does not contain character pixels; the horizontal pixel span of this background pixel region is the pixel interval distance between adjacent characters.

[0076] Specifically, the distance transformation algorithm involves pixel-by-pixel distance marking within the background pixel region. Starting from the boundary of the background pixel region, background pixels located at the boundary and immediately adjacent to the character outline are marked with a distance value of one. This process then progresses inwards into the background region, increasing the distance value by one for each pixel layer, until all pixels within the background region have been marked. After marking, each pixel within the background region has a distance value representing its distance to the nearest character outline. With uniform spacing, pixel distance values along the center line of the background region are similar; with uneven spacing, the distribution of distance values within the background region varies considerably.
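
The layer-by-layer marking can be approximated with a breadth-first search that starts from the character pixels. This sketch assumes 4-neighbour adjacency and a grid in which 1 marks character pixels and 0 marks background; both conventions, and the function names, are assumptions rather than details given in the source. The spacing uniformity index is then the difference between the largest and smallest background distance.

```python
from collections import deque

def layered_distance(grid):
    """grid: 1 = character pixel, 0 = background. Returns distance layer per pixel."""
    h, w = len(grid), len(grid[0])
    dist = [[0] * w for _ in range(h)]
    # Character pixels act as sources at distance 0.
    queue = deque((y, x) for y in range(h) for x in range(w) if grid[y][x] == 1)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 0 and dist[ny][nx] == 0:
                dist[ny][nx] = dist[y][x] + 1   # one layer further from the outline
                queue.append((ny, nx))
    return dist

def uniformity_index(grid):
    """Max minus min distance over background pixels (needs at least one of them)."""
    dist = layered_distance(grid)
    values = [dist[y][x] for y in range(len(grid))
              for x in range(len(grid[0])) if grid[y][x] == 0]
    return max(values) - min(values)
```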

[0077] It should be noted that the spacing uniformity index is quantified by extracting the difference between the maximum and minimum distance values within the background area. The smaller the difference, the more regular the shape of the background area and the more uniform the spacing distribution. Furthermore, the mode value of stroke width is extracted by selecting the most frequently occurring width value from the width distribution statistics. This mode value reflects the typical thickness of the character strokes in the document. The mode value of stroke width is compared with the pixel spacing distance to determine the proportion of stroke width relative to character spacing.

[0078] In one embodiment, the potential adhesion risk level is determined using a dual-condition comprehensive evaluation method. When the stroke width mode exceeds a preset proportional threshold of the pixel spacing distance, and the spacing uniformity index is lower than a preset uniformity threshold, it indicates that the strokes are thick and the spacing is regular. After fading, the stroke edges are likely to spread, causing adjacent characters to adhere, and this is judged as high risk. When only one condition is met, it is judged as medium risk. When neither condition is met, it is judged as low risk.
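
The dual-condition rule reduces to counting how many of the two conditions hold. In the sketch below, the default thresholds and the function name `adhesion_risk` are placeholders; the source does not give numeric values.

```python
def adhesion_risk(width_mode, spacing, uniformity,
                  ratio_threshold=0.5, uniformity_threshold=2):
    """Return 'low', 'medium' or 'high' from the two conditions of step S104."""
    width_condition = width_mode > ratio_threshold * spacing
    uniform_condition = uniformity < uniformity_threshold
    met = int(width_condition) + int(uniform_condition)
    return ("low", "medium", "high")[met]
```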

[0079] S105. Fuse the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing distance between adjacent characters to determine the adhesion-type degradation stage, and obtain the separated character structure through analysis based on the adhesion-type degradation stage.

[0080] The degradation stage is determined by comprehensively considering the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing between adjacent characters. If the overall grayscale attenuation of the coating is lower than a preset attenuation threshold and the potential adhesion risk level is low, the degradation stage is classified as mild adhesion. If the potential adhesion risk level is medium, the degradation stage is classified as moderate adhesion. If the potential adhesion risk level is high or the overall grayscale attenuation of the coating exceeds a preset attenuation threshold, the degradation stage is classified as severe adhesion. For the character region classified as mild adhesion, the grayscale value changes are scanned along the background pixel region between adjacent characters. The pixel position where the grayscale value jumps from low to high is located as the adhesion boundary. Pixel segmentation is performed along the adhesion boundary to obtain the character region after boundary separation. For character regions exhibiting moderate or severe adhesion in the adhesion-type degradation stage, a watershed segmentation algorithm is employed to process the adhesion regions. Within these adhesion regions, pixels with locally minimum grayscale values are marked as seed points. The algorithm then expands layer by layer from these seed points towards surrounding pixels until adjacent expanded regions meet, forming a dividing line. This dividing line separates the adhesion-boundary character regions, yielding the character regions separated by the dividing line. By combining the character regions separated by the boundary and those separated by the dividing line, the pixel set and contour boundary of each individual character are extracted to obtain the separated character structure.
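
The stage rule at the start of this paragraph can be written as a short decision function. This is only a sketch: the threshold value is a placeholder, and the order of the checks is an assumption, because the source does not say which rule wins when the risk is medium but the attenuation exceeds the threshold, nor what happens when the attenuation equals the threshold.

```python
def degradation_stage(attenuation, risk, attenuation_threshold=60):
    """risk: 'low', 'medium' or 'high'. Returns the adhesion-type degradation stage."""
    # Severe is checked first so a large attenuation overrides a medium risk
    # (assumed precedence, not stated in the source).
    if risk == "high" or attenuation > attenuation_threshold:
        return "severe"
    if risk == "medium":
        return "moderate"
    return "mild"
```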

[0081] In one implementation, the degradation evaluation vector is established by converting three different dimensions of indicators—overall grayscale attenuation of the coating, potential adhesion risk level, and pixel spacing distance between adjacent characters—into a vector representation in a unified format. The construction of this degradation evaluation vector makes multi-factor fusion judgment a feasible quantitative process.

[0082] Specifically, the degradation evaluation vector comprises three components. The normalized value of grayscale attenuation is obtained by dividing the overall grayscale attenuation of the coating by a preset maximum attenuation reference value, resulting in a value between zero and one. The level coding value of adhesion risk is obtained by encoding low risk, medium risk, and high risk as incremental values. The normalized value of pixel spacing distance is obtained by dividing the pixel spacing distance between adjacent characters by a preset standard character spacing reference value, resulting in a value between zero and one. These three components are arranged sequentially to form the degradation evaluation vector, which reflects the current overall degradation status of the document.
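
A small sketch of how the three components might be assembled. The reference values, the integer risk codes, and the clamping to the range zero to one are assumptions; the source only states that the normalized values lie between zero and one and that the risk codes increase with risk.

```python
RISK_CODES = {"low": 0, "medium": 1, "high": 2}   # hypothetical incremental coding

def clamp01(value):
    return max(0.0, min(1.0, value))

def degradation_vector(attenuation, risk, spacing,
                       max_attenuation=255.0, standard_spacing=20.0):
    """Return (normalized attenuation, risk code, normalized spacing)."""
    return (clamp01(attenuation / max_attenuation),
            RISK_CODES[risk],
            clamp01(spacing / standard_spacing))
```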

[0083] It should be noted that the determination of the adhesion-type degradation stage is achieved by comparing the degradation evaluation vector with a preset stage division boundary. The stage division boundary is a pre-established set of threshold combinations, each corresponding to a degradation stage, including mild adhesion, moderate adhesion, and severe adhesion. Furthermore, character separation for the mild adhesion stage employs boundary localization and pixel segmentation. In the mild adhesion state, adjacent characters are only connected by local pixels, and the adhesion region is narrow. The location of the grayscale value abrupt change is determined by scanning pixel by pixel along the connection region between adjacent characters to identify the pixel location where the grayscale value changes significantly; this location is the adhesion boundary between the two characters. The connecting pixels are then cut along the adhesion boundary, with the two parts of pixels belonging to the two adjacent characters respectively, thus obtaining a preliminarily separated character region. When the amount numbers on a supermarket receipt show slight fading, the connection between adjacent numbers is usually quite narrow, and boundary segmentation can separate them.

[0084] In one possible implementation, the watershed segmentation algorithm is based on the concept of a watershed in topography. The gray values of the image are regarded as the terrain height, the areas with low gray values are regarded as basins, and the areas with high gray values are regarded as ridges. The segmentation line is located at the ridge position between adjacent basins.

[0085] For example, the specific implementation process of watershed segmentation is to process character regions in the moderately or heavily adhered stages. Within the adhered region, each originally independent character contains a central area with a lower grayscale value, which corresponds to a basin in the terrain. Local grayscale minima are searched within the adhered region, and these minima are marked as seed points, each seed point representing a potential independent character. Starting from each seed point, the expansion extends to surrounding pixels, with the rule of prioritizing the inclusion of pixels with lower grayscale values in the current region. When the expanded regions from different seed points meet, the pixels at the meeting point are no longer included in either region, and these pixels at the meeting point are connected to form a dividing line. The dividing line divides the adhered character region into multiple independent regions, each independent region corresponding to a character.

[0086] Understandably, the extraction of the separated character structure involves integrating the results of the two separation methods mentioned above. For the initially separated character regions obtained from the mild adhesion stage, the pixel set and contour boundary of each region are directly extracted; for the character regions separated by dividing lines obtained from the moderate or severe adhesion stage, the pixel set and contour boundary of each independent region are also extracted.

[0087] In one embodiment, the separated character structure contains complete pixel position information and outer contour boundary coordinates of each character. The separated character structure retains the spatial morphological features of the characters, providing independent character units for subsequent character recognition and stroke integrity detection.

[0088] S106. Extract stroke connectivity components from the separated character structure, identify broken and degenerate stroke defects, and complete the broken parts through connectivity components to obtain the complete character form.

[0089] From the separated character structure, the pixel set of each independent character is marked for connectivity. Pixels that are connected to each other are grouped into the same connected component, and pixels that are not connected to each other are grouped into different connected components, resulting in a set of stroke connected components within each character. For characters whose set of stroke connected components contains multiple connected components, pixels on the edge of each connected component that are connected to only a single adjacent pixel are extracted as endpoint pixels. For every pair of endpoints belonging to different connected components, the Euclidean distance d and the angle θ between the stroke tangent directions are calculated. The endpoint pairs are sorted in ascending order of d and checked sequentially. If d is less than 5 pixels and θ is less than 30 degrees, it is determined that there is a stroke defect between the two connected components, and the two components are to be connected. For two connected components with a stroke defect between them, a straight line segment is drawn as the connection path, with the two endpoint pixels as its starting and ending points and the mean of their two tangent directions as its direction. The pixel positions covered by the connection path are filled with foreground pixel values, connecting the two originally disconnected connected components into one connected component. This process is repeated until there are no more endpoint pairs that meet the conditions within the character. After the break repair is completed for all connected components within the character, the repaired connected components are merged into a unified set of pixels to obtain the complete character form.

[0090] In one implementation, connectivity labeling employs a two-pass pixel-by-pixel scan and label propagation approach. The pixel set of each character in the separated character structure is traversed. During the first scan, when a foreground pixel is scanned, the labels of its already scanned adjacent pixels are checked. If none of them is labeled, a new label is assigned; if they carry one or more labels, the smallest label is assigned, and where multiple different labels exist an equivalent pair is recorded. The second scan merges labels based on the equivalent pairs. After the scan is complete, pixels with the same label form a connected component.
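
A compact sketch of two-pass labeling, with the equivalent pairs kept in a union-find array. It assumes 8-connectivity and a grid of 0/1 values; neither is stated explicitly for this step, and the function name `label_components` is illustrative.

```python
def label_components(grid):
    """grid: rows of 0/1. Returns a grid of labels (0 = background)."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]   # parent[i] is the union-find parent of label i; index 0 unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # First pass: provisional labels and equivalence records.
    for y in range(h):
        for x in range(w):
            if not grid[y][x]:
                continue
            prior = [labels[ny][nx]
                     for ny, nx in ((y, x - 1), (y - 1, x - 1), (y - 1, x), (y - 1, x + 1))
                     if 0 <= ny and 0 <= nx < w and labels[ny][nx]]
            if not prior:
                parent.append(len(parent))       # new label, its own root
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(find(n) for n in prior)
                labels[y][x] = smallest
                for n in prior:                  # record equivalences
                    parent[find(n)] = smallest
    # Second pass: replace every label by its representative.
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```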

[0091] It should be noted that the set of connected stroke components reflects the connectivity state of strokes within a character. In a normal character, all stroke pixels are interconnected, forming a single connected component; in a character experiencing fractured degradation, some strokes fade and disappear, causing pixels to break apart, forming multiple independent connected components.

[0092] Specifically, endpoint pixel extraction is a crucial step in identifying break locations. An endpoint pixel is defined as a pixel on the edge of a connected component that is connected to only a single adjacent foreground pixel; these pixels are located at the end of a stroke or at a break. During extraction, all pixels of each connected component are traversed, and the number of foreground pixels in the eight-neighborhood of each pixel is counted. If a pixel has only one foreground pixel in its eight-neighborhood, it is marked as an endpoint pixel. In cases where characters on thermal documents break due to fading, the originally continuous strokes are divided into multiple connected components. Endpoint pixels are generated at the break point of each connected component, indicating the location requiring repair. Furthermore, break determination employs a dual constraint of distance and direction conditions. The distance condition is achieved by calculating the Euclidean distance between endpoint pixels of different connected components. The Euclidean distance is the square root of the sum of the squares of the horizontal and vertical coordinate differences between two pixels. When the Euclidean distance between two endpoint pixels is less than a preset break distance threshold, it indicates that the two connected components are spatially close enough to suggest a possible original connection. The direction condition is achieved by calculating the tangent direction of the stroke at the two endpoint pixels. The tangent direction of the stroke is determined by the direction of the line connecting the endpoint pixel and its adjacent foreground pixel. If the angle between the tangent directions of the two endpoint pixels is less than a preset angle threshold, it indicates that the stroke direction of the two broken ends has continuity.
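
The endpoint test and the dual distance and direction constraint can be sketched as below. The thresholds of 5 pixels and 30 degrees are the values given in step S106. Treating the two tangents as undirected lines, so that two stroke ends facing each other count as aligned, is an assumption; the source does not say how the angle between the tangent directions is measured.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def endpoints(pixels):
    """pixels: set of (y, x) of one connected component.
    Returns [((y, x), tangent_angle)] for pixels with exactly one 8-neighbour."""
    result = []
    for (y, x) in sorted(pixels):
        adjacent = [(y + dy, x + dx) for dy, dx in NEIGHBOURS if (y + dy, x + dx) in pixels]
        if len(adjacent) == 1:
            ny, nx = adjacent[0]
            # Tangent: direction of the line joining the endpoint and its neighbour.
            result.append(((y, x), math.atan2(ny - y, nx - x)))
    return result

def is_break(p, theta_p, q, theta_q, max_dist=5.0, max_angle_deg=30.0):
    """True if endpoints p and q are close enough and their tangents roughly align."""
    d = math.dist(p, q)                              # Euclidean distance
    diff = abs(theta_p - theta_q) % math.pi          # undirected-line comparison
    angle = math.degrees(min(diff, math.pi - diff))
    return d < max_dist and angle < max_angle_deg
```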

[0093] For example, the break distance threshold is set based on the typical width of the character strokes on thermal documents. Typically, the pixel span of the break area does not exceed several times the stroke width, and two connected components exceeding this range are not considered as the break portion of the same stroke.

[0094] In one possible implementation, the connection path is drawn based on the positions of the two endpoint pixels and the tangent direction. Using the coordinates of the two endpoint pixels as the start and end points, the tangent angle values θ1 and θ2 of the two endpoint pixels are obtained respectively. The angle difference Δθ = atan2(sin(θ2 - θ1), cos(θ2 - θ1)) is calculated, and the average angle θ1 + Δθ / 2 is taken as the direction angle of the connection path. Starting from the starting endpoint pixel, a straight line segment is drawn along this direction angle to the ending endpoint pixel using the Bresenham algorithm; this straight line segment is the connection path. At each pixel position traversed by the connection path, the grayscale value of that pixel is set to the foreground pixel value to fill the broken portion. For horizontal or vertical breaks in monetary numbers caused by fading, the connection path is usually a short horizontal or vertical line segment; for diagonal breaks, the connection path extends along the original diagonal direction of the stroke.
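
For illustration, the wrap-safe average angle and an integer Bresenham line between the two endpoint pixels might look like this. The function names are hypothetical, and points are given as (x, y) pairs. Note that once both endpoints are fixed, the rasterized segment is fully determined by them; the average angle is computed here only as the direction value described in the paragraph.

```python
import math

def mean_direction(theta1, theta2):
    """Average of two angles, taking the shorter way around the circle."""
    delta = math.atan2(math.sin(theta2 - theta1), math.cos(theta2 - theta1))
    return theta1 + delta / 2

def bresenham(p0, p1):
    """Integer pixels (x, y) on the segment from p0 to p1, endpoints included."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:      # step in x
            err += dy
            x0 += sx
        if e2 <= dx:      # step in y
            err += dx
            y0 += sy
    return points
```

Filling the break then amounts to setting every returned pixel to the foreground value.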

[0095] Understandably, the width of the pixel fill is consistent with the width of the original stroke. When filling, the fill is centered on the connecting path and extends to both sides to a range equivalent to the width of the adjacent connected component strokes, so that the repaired strokes visually blend smoothly with the original strokes.

[0096] In one embodiment, after all connected component pairs with missing strokes in each character are connected, all connected components of each character after completion are merged into a character pixel set. The character pixel set contains all foreground pixel position information of the character, thereby obtaining a complete character shape. The complete character shape restores the pixel structure of multiple characters before the breakage.

[0097] S107. Based on the adhesion-type degradation stage, adaptively switch the OCR path applied to the complete character form, and extract the final text using an optical character recognition engine.

[0098] A recognition path mapping table is pre-established, recording the primary and alternative recognition parameter configurations for three degradation stages: mild adhesion, moderate adhesion, and severe adhesion. The recognition parameter configurations include a binarized grayscale threshold and a character template matching tolerance. Based on the adhesion-type degradation stage, the corresponding recognition parameter configuration is queried from the recognition path mapping table to obtain the primary recognition parameter configuration matching the current degradation stage. The optical character recognition engine is initialized using the primary recognition parameter configuration. Each character pixel set in the complete character form is compared one by one with a pre-stored character template library. The pixel overlap between the character pixel set and each character template is calculated. The character corresponding to the character template with the highest pixel overlap is selected as the candidate recognition result, and the pixel overlap is used as the confidence value of the candidate recognition result. For characters in the candidate recognition results whose confidence values are lower than a preset confidence threshold, the alternative recognition parameter configuration corresponding to the current degradation stage is obtained from the recognition path mapping table for secondary recognition. The confidence value of the secondary recognition result is compared with the confidence value of the original recognition result, and the result with the higher confidence value is selected as the final recognition result of the character. The final recognition results of all characters are summarized to obtain the final text.

[0099] In one implementation, the recognition path mapping table is stored using a key-value pair structure, where the key is the identifier of the adhesion-type degradation stage, and the value is the corresponding set of recognition parameter configurations. The set of recognition parameter configurations corresponding to each degradation stage includes a primary recognition parameter configuration and an alternative recognition parameter configuration. The primary configuration is used for the first recognition, and the alternative configuration is used for the second recognition when the confidence level is insufficient.

[0100] Specifically, the binarization grayscale threshold in the recognition parameter configuration controls the boundary between character pixels and background pixels; a lower grayscale threshold retains more character pixels. The character template matching tolerance controls the allowed positional offset range during pixel overlap calculation; a larger tolerance value allows for greater tolerance to character deformation. For severe adhesion degradation, the primary configuration typically uses a lower grayscale threshold and a larger matching tolerance. Further, the pixel overlap calculation process involves superimposing and comparing the character pixel set with the character template. The character template library pre-stores standard pixel distribution maps for numbers zero to nine and common symbols; each template records the relative position of foreground pixels in the standard character. During comparison, the geometric center of the character pixel set is aligned with the geometric center of the template, and the number of overlapping foreground pixels in the character pixel set and template foreground pixels is counted. The number of overlapping pixels is divided by the total number of foreground pixels in the template to obtain the pixel overlap value. The closer the pixel overlap value is to one, the higher the matching degree between the character pixel set and the template.
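
The overlap computation can be sketched with pixel sets. This minimal version aligns the geometric centres by an integer shift and does not model the matching tolerance; the rounding of the shift, the set-based representation, and the function names are assumptions made for illustration.

```python
def centroid(pixels):
    """Geometric centre (mean y, mean x) of a set of (y, x) pixels."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)

def pixel_overlap(char_pixels, template_pixels):
    """Share of template foreground pixels covered by the centre-aligned character."""
    cy, cx = centroid(char_pixels)
    ty, tx = centroid(template_pixels)
    dy, dx = round(ty - cy), round(tx - cx)          # integer shift (assumption)
    shifted = {(y + dy, x + dx) for y, x in char_pixels}
    return len(shifted & template_pixels) / len(template_pixels)
```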

[0101] It should be noted that the confidence score is the same as the pixel overlap score. When this score is lower than the preset confidence threshold, it indicates that the reliability of the first recognition result is insufficient, triggering the alternative recognition parameter configuration for a second recognition. The second recognition uses different grayscale threshold values and matching tolerance values to recalculate the pixel overlap score.
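
The primary and alternative path selection can be sketched as a lookup followed by a conditional second attempt. Everything numeric here is a placeholder, and `recognise` stands for any callable that returns a (character, confidence) pair for a given configuration; it is a hypothetical interface, not the API of a particular OCR engine.

```python
# Hypothetical recognition path mapping table: stage -> primary/alternative config.
RECOGNITION_PATHS = {
    "mild":     {"primary":     {"gray_threshold": 160, "tolerance": 1},
                 "alternative": {"gray_threshold": 140, "tolerance": 2}},
    "moderate": {"primary":     {"gray_threshold": 128, "tolerance": 2},
                 "alternative": {"gray_threshold": 112, "tolerance": 3}},
    "severe":   {"primary":     {"gray_threshold": 96, "tolerance": 3},
                 "alternative": {"gray_threshold": 80, "tolerance": 4}},
}

def recognise_with_fallback(recognise, char, stage,
                            confidence_threshold=0.8, paths=RECOGNITION_PATHS):
    """Try the primary configuration; retry with the alternative if confidence is low."""
    config = paths[stage]
    label, conf = recognise(char, config["primary"])
    if conf >= confidence_threshold:
        return label, conf
    alt_label, alt_conf = recognise(char, config["alternative"])
    # Keep whichever attempt is more confident.
    return (alt_label, alt_conf) if alt_conf > conf else (label, conf)
```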

[0102] In one embodiment, the final recognition results of all characters obtained through the aforementioned character recognition process are concatenated according to the original arrangement order of the characters in the document image to form a continuous string. The string is the final text extracted from the thermal printed financial document, which can be further used for privacy information desensitization processing.

[0103] If the technical solution of this application involves the collection, processing, or application of personal information, the relevant products have, before implementing any personal information processing activities, fully and clearly informed individuals of the processing rules in accordance with the "Personal Information Protection Law of the People's Republic of China" and other current laws and regulations, and obtained their voluntary and explicit consent. If sensitive personal information is involved, the product has obtained the individual's separate consent before processing, and such consent is given in an explicit manner. For example, prominent signs are set up in the area where information collection devices such as cameras are located, clearly indicating "Entering is considered as consent to the collection of personal information"; or through pop-ups, checkboxes, user-initiated uploads, etc., under the premise of clearly listing the processor's identity, processing purpose, processing method, and information type, the user actively completes the authorization operation. The above mechanisms ensure that all personal information processing activities are based on legal authorization and fully comply with national compliance requirements regarding personal information protection.

[0104] The above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. The present invention has been described in detail with reference to preferred embodiments. Those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications and substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for extracting and recognizing attachment information of multimodal financial documents, characterized in that the method includes: The original image data of thermally printed financial documents is collected by scanning equipment, and image preprocessing algorithms are used to remove noise interference to obtain the denoised document image; The thermal coating area is extracted from the denoised document image, and the overall gray-level attenuation of the coating is calculated using the gray-level histogram analysis method to determine the fading level; Based on the fading level, character regions are identified and stroke outlines are segmented from the denoised document image, an edge detection algorithm is used to obtain the residual effective width of each character stroke, and width distribution statistics are obtained; The pixel spacing distance between adjacent characters is extracted from the width distribution statistics and the character region, the distance transformation algorithm is used to quantify the spacing uniformity, and the potential adhesion risk level is assessed; By integrating the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing distance between adjacent characters, the adhesion-type degradation stage is determined, and the separated character structure is obtained based on the adhesion-type degradation stage; Stroke connectivity components are extracted from the separated character structure, broken and degraded stroke defects are identified, and the broken parts are filled in through the connectivity components to obtain the complete character form; Based on the adhesion-type degradation stage, the OCR path applied to the complete character form is adaptively switched, and the final text is extracted using an optical character recognition engine.

2. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of collecting original image data of thermally printed financial documents with a scanning device and removing noise interference with an image preprocessing algorithm to obtain a denoised document image includes: converting the original image into a single-channel grayscale image based on the color channel information of the original image; and processing the grayscale image with a median filtering algorithm, in which the median of the neighboring pixels at each pixel position in the grayscale image is taken as the new grayscale value of that pixel, so that noise caused by scratches and dust particles on the surface of the thermal paper is removed and the denoised document image is obtained.
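The grayscale conversion and median filtering of claim 2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the luma weights 0.299/0.587/0.114, the 3x3 default window, and the border handling are all assumptions, since the claim fixes none of them.

```python
from statistics import median

def to_gray(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to one grayscale channel.
    The luma weights are an assumption; the claim only says the conversion
    is based on the color channel information."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def median_filter(gray, radius=1):
    """Replace each pixel with the median of its neighborhood.
    Border pixels use only the neighbors that exist (an assumption)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = int(median(window))
    return out

# A single dark speck on a light background is removed by the filter.
speck = [[200, 200, 200],
         [200,   0, 200],
         [200, 200, 200]]
denoised = median_filter(speck)
```

A median is used rather than a mean because an isolated scratch or dust pixel never becomes the middle value of its window, so it is discarded instead of being smeared into its neighbors.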

3. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of extracting the thermal coating area from the denoised document image, calculating the overall grayscale attenuation of the coating by grayscale histogram analysis, and determining the fading level includes: identifying and extracting the thermal coating area from the denoised document image by a threshold segmentation method, based on the feature that the gray value of the thermal coating area is higher than that of the unprinted background area; obtaining the gray values of all pixels in the thermal coating area, counting the number of pixels corresponding to each gray level, and generating a grayscale histogram; locating the gray level with the maximum number of pixels in the grayscale histogram and taking it as the current gray level peak of the coating; calculating the difference between the current gray level peak of the coating and a standard gray level reference value to obtain the overall grayscale attenuation of the coating; and matching the overall grayscale attenuation of the coating against multiple attenuation threshold ranges, wherein, if the overall grayscale attenuation of the coating falls into a given attenuation threshold range, the fading level corresponding to that range is determined.
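The histogram peak comparison of claim 3 can be illustrated with a short sketch. The names `coating_attenuation`, `fading_level`, and `EXAMPLE_RANGES`, the numeric ranges, and the use of an absolute difference are assumptions made for illustration; the claim specifies only "the difference" and "multiple attenuation threshold ranges".

```python
def gray_histogram(pixels):
    """Count how many pixels fall on each of the 256 gray levels."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    return hist

def coating_attenuation(pixels, reference_gray):
    """Difference between the histogram peak and the reference gray level.
    Taking the absolute value is an assumption."""
    hist = gray_histogram(pixels)
    peak = max(range(256), key=hist.__getitem__)  # most populated gray level
    return abs(peak - reference_gray)

# Hypothetical attenuation ranges (half-open intervals) and their fading levels.
EXAMPLE_RANGES = {"mild": (0, 30), "moderate": (30, 80), "severe": (80, 256)}

def fading_level(attenuation, ranges=EXAMPLE_RANGES):
    """Return the level whose range contains the attenuation, else None."""
    for level, (low, high) in ranges.items():
        if low <= attenuation < high:
            return level
    return None

coating_pixels = [120] * 5 + [90] * 2 + [200]
attenuation = coating_attenuation(coating_pixels, reference_gray=60)
level = fading_level(attenuation)
```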

4. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of identifying character regions and segmenting stroke outlines from the denoised document image based on the fading level, obtaining the residual effective width of each character stroke with an edge detection algorithm, and obtaining width distribution statistics includes: setting a binarization threshold for the denoised document image according to the fading level, and converting the denoised document image into a binary image using the binarization threshold; identifying the connected regions of foreground pixels in the binary image and taking each connected region as a character region; for each character region, applying the Sobel edge detection algorithm to perform gradient convolution operations in the horizontal and vertical directions on the pixels within the character region, and calculating the gradient magnitude G from Gx and Gy, where Gx is the horizontal gradient and Gy is the vertical gradient; marking as boundary the pixel positions where the gradient magnitude exceeds a preset gradient threshold T, thereby obtaining the stroke outline of each character; peeling off the outer pixels of the stroke outline layer by layer through a thinning process until a center line with a width of a single pixel remains, thereby obtaining the stroke skeleton; measuring, at each skeleton point of the stroke skeleton, the perpendicular distance to the boundary of the stroke outline on both sides, and taking the sum of the two distances as the residual effective width at that skeleton point; and collecting the residual effective widths of all skeleton points within the character region and counting the frequency of each width value to obtain the width distribution statistics.
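The Sobel step and the width statistics of claim 4 can be sketched as below. The claim text does not reproduce the formula for G, so the Euclidean form sqrt(Gx^2 + Gy^2) used here is an assumption (it is the common choice, but |Gx| + |Gy| is also used in practice). Function names and the zero-valued border are likewise illustrative.

```python
import math
from collections import Counter

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # horizontal gradient kernel
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # vertical gradient kernel

def sobel_magnitude(img):
    """Gradient magnitude per pixel; border pixels are left at 0.0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)  # assumed: G = sqrt(Gx^2 + Gy^2)
    return mag

def outline_mask(mag, t):
    """Mark pixels whose gradient magnitude exceeds the threshold T."""
    return [[m > t for m in row] for row in mag]

def width_distribution(widths):
    """Frequency of each residual effective width value."""
    return Counter(widths)

# A vertical step edge between columns 1 and 2.
step = [[0, 0, 255, 255, 255] for _ in range(5)]
mag = sobel_magnitude(step)
edges = outline_mask(mag, t=500)
stats = width_distribution([3, 3, 2, 4, 3])
```

Thinning and the per-skeleton-point width measurement are omitted here; the `Counter` simply shows how measured widths would be aggregated into the distribution that claim 5 later reads its mode from.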

5. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of extracting the pixel spacing distance between adjacent characters from the width distribution statistics and the character regions, quantifying the spacing uniformity with a distance transformation algorithm, and assessing the potential adhesion risk level includes: locating, from the character regions, two adjacent character regions in sequence according to the character arrangement order to form adjacent character pairs; obtaining the background pixel region between each adjacent character pair, and measuring the pixel span of the background pixel region in the horizontal direction to obtain the pixel spacing distance between adjacent characters; applying a distance transformation algorithm to the background pixel region, in which the distance value from each background pixel to the nearest character outline is marked layer by layer from the boundary of the background pixel region inward; extracting the maximum and minimum distance values in the background pixel region and calculating their difference as the spacing uniformity index; extracting the most frequently occurring width value from the width distribution statistics as the stroke width mode value; and comparing the stroke width mode value with the pixel spacing distance, and determining the potential adhesion risk level based on the comparison result and the spacing uniformity index; wherein the potential adhesion risk levels include high risk, medium risk, and low risk: high risk corresponds to the case where the stroke width mode value exceeds a preset ratio threshold of the pixel spacing distance and the spacing uniformity index is lower than a preset uniformity threshold, medium risk corresponds to the case where only one of the two conditions is met, and low risk corresponds to the case where neither of the two conditions is met.
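A minimal sketch of the distance transformation and risk grading in claim 5 follows. It assumes a city-block (4-neighbor) distance computed by breadth-first expansion from the character pixels, and it reads "exceeds the preset ratio threshold of the pixel spacing distance" as `mode > ratio_threshold * spacing`; both readings, along with all names and threshold values, are assumptions.

```python
from collections import deque

def distance_transform(mask):
    """mask[y][x] is True for character pixels. Returns, for every pixel,
    the city-block distance to the nearest character pixel, filled in
    layer by layer. Assumes at least one character pixel exists."""
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[y][x] else None for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def uniformity_index(dist, mask):
    """Max minus min distance value over the background pixels."""
    values = [d for drow, mrow in zip(dist, mask)
              for d, m in zip(drow, mrow) if not m]
    return max(values) - min(values)

def adhesion_risk(width_mode, spacing, uniformity,
                  ratio_threshold, uniformity_threshold):
    """Both conditions met: high; exactly one: medium; neither: low."""
    cond_width = width_mode > ratio_threshold * spacing
    cond_uniform = uniformity < uniformity_threshold
    return ("low", "medium", "high")[cond_width + cond_uniform]

# Two character columns separated by a 3-pixel gap.
gap_mask = [[True, False, False, False, True]]
gap_dist = distance_transform(gap_mask)
index = uniformity_index(gap_dist, gap_mask)
risk = adhesion_risk(width_mode=3, spacing=3, uniformity=index,
                     ratio_threshold=0.8, uniformity_threshold=2)
```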

6. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of fusing the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing distance between adjacent characters to determine the adhesion-type degradation stage, and obtaining the separated character structure based on the adhesion-type degradation stage, includes: making a comprehensive determination based on the overall grayscale attenuation of the coating, the potential adhesion risk level, and the pixel spacing distance between adjacent characters, wherein, if the overall grayscale attenuation of the coating is lower than a preset attenuation threshold and the potential adhesion risk level is low risk, the adhesion-type degradation stage is determined to be mild adhesion; if the potential adhesion risk level is medium risk, the adhesion-type degradation stage is determined to be moderate adhesion; and if the potential adhesion risk level is high risk or the overall grayscale attenuation of the coating exceeds the preset attenuation threshold, the adhesion-type degradation stage is determined to be severe adhesion; for character regions whose adhesion-type degradation stage is mild adhesion, scanning the gray value changes along the background pixel region between adjacent characters, locating the pixel positions where the gray value jumps from low to high as adhesion boundaries, and cutting pixels along the adhesion boundaries to obtain boundary-separated character regions; for character regions whose adhesion-type degradation stage is moderate adhesion or severe adhesion, processing the adhesion region with a watershed segmentation algorithm, in which the pixels with local minimum gray values in the adhesion region are marked as seed points, regions are expanded layer by layer from the seed points to the surrounding pixels until adjacent expanding regions meet to form a segmentation line, and the adhered character regions are separated along the segmentation line to obtain segmentation-line-separated character regions; and combining the boundary-separated character regions and the segmentation-line-separated character regions, and extracting the pixel set and contour boundary of each independent character to obtain the separated character structure.
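The stage decision and the choice of separation path in claim 6 can be sketched as a small rule table. Two points are assumptions rather than claim content: the severe rule is checked first, so that medium risk combined with attenuation above the threshold resolves to severe, and an attenuation exactly equal to the threshold with low risk is treated as mild. The helper `low_to_high_jumps` and its `min_jump` parameter are hypothetical.

```python
def degradation_stage(attenuation, risk, attenuation_threshold):
    """Map attenuation and risk level to an adhesion-type degradation stage."""
    if risk == "high" or attenuation > attenuation_threshold:
        return "severe"      # checked first (assumed precedence)
    if risk == "medium":
        return "moderate"
    return "mild"            # low risk, attenuation not above the threshold

def separation_method(stage):
    """Mild adhesion uses a boundary cut; other stages use watershed."""
    return "boundary_cut" if stage == "mild" else "watershed"

def low_to_high_jumps(profile, min_jump):
    """Indices where the gray value rises by at least min_jump relative to
    the previous pixel; candidate adhesion boundaries for the boundary cut."""
    return [i + 1 for i in range(len(profile) - 1)
            if profile[i + 1] - profile[i] >= min_jump]

stage = degradation_stage(attenuation=20, risk="low", attenuation_threshold=50)
method = separation_method(stage)
boundaries = low_to_high_jumps([40, 42, 45, 180, 182, 60], min_jump=100)
```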

7. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of extracting stroke connected components from the separated character structure, identifying broken and degraded stroke defects, and completing the broken parts via the connected components to obtain the complete character form includes: performing connectivity labeling on the pixel set of each independent character in the separated character structure, in which pixels that are connected to each other are grouped into the same connected component and pixels that are not connected to each other are grouped into different connected components, thereby obtaining the set of stroke connected components inside each character; for characters whose set of stroke connected components contains multiple connected components, extracting the pixels on the edge of each connected component that are connected to only a single adjacent pixel as endpoint pixels, calculating the Euclidean distance d and the angle θ between the stroke tangent directions for all endpoint pairs belonging to different connected components, sorting the endpoint pairs in ascending order of d, and checking them one by one, wherein, if d is less than a preset pixel threshold and θ is less than a preset angle threshold, it is determined that a stroke break exists between the two connected components and they are to be connected; for two connected components with a stroke break, drawing a straight line segment along the tangent direction with the two endpoint pixels as its starting and ending points, and filling the pixel positions covered by the connection path with the foreground pixel value, so that the two originally disconnected connected components are joined into one connected component, until no endpoint pair in the character meets the conditions; and, after the break repair has been completed for all connected components within the character, merging the repaired connected components into a unified pixel set to obtain the complete character form, wherein the complete character form includes the set of completed stroke connected components together with the pixel set and outline boundary of each character after connection.
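The endpoint pairing and bridging of claim 7 can be sketched as follows. Endpoint detection and tangent estimation are assumed to have been done already, so each endpoint arrives as a hypothetical `(component_id, (y, x), tangent_deg)` tuple. Treating tangents as undirected lines and skipping pairs whose components were already joined by an earlier bridge are assumptions, as is the simple interpolation used to rasterize the segment.

```python
import math
from itertools import combinations

def line_pixels(p, q):
    """Integer pixels on the straight segment from p to q."""
    (y0, x0), (y1, x1) = p, q
    n = max(abs(y1 - y0), abs(x1 - x0))
    if n == 0:
        return [p]
    return [(round(y0 + (y1 - y0) * t / n), round(x0 + (x1 - x0) * t / n))
            for t in range(n + 1)]

def angle_between(a_deg, b_deg):
    """Smallest angle between two undirected tangent directions (degrees)."""
    diff = abs(a_deg - b_deg) % 180
    return min(diff, 180 - diff)

def repair_breaks(endpoints, max_dist, max_angle):
    """Bridge nearby, similarly oriented endpoints of different components.
    Returns (filled_pixels, mapping from component id to merged root id)."""
    parent = {cid: cid for cid, _, _ in endpoints}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    pairs = sorted(((math.dist(a[1], b[1]), a, b)
                    for a, b in combinations(endpoints, 2) if a[0] != b[0]),
                   key=lambda item: item[0])      # ascending distance d
    filled = set()
    for d, a, b in pairs:
        if find(a[0]) == find(b[0]):
            continue                              # already joined (assumption)
        if d < max_dist and angle_between(a[2], b[2]) < max_angle:
            filled.update(line_pixels(a[1], b[1]))
            parent[find(b[0])] = find(a[0])
    return filled, {c: find(c) for c in parent}

endpoints = [(0, (5, 2), 0.0), (1, (5, 6), 5.0), (2, (20, 20), 90.0)]
filled, merged = repair_breaks(endpoints, max_dist=6, max_angle=15)
```

In this example components 0 and 1 face each other across a 4-pixel gap with nearly parallel tangents, so they are bridged; component 2 is too far away and stays separate.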

8. The method for extracting and recognizing attachment information of multimodal financial documents according to claim 1, characterized in that the step of adaptively switching the OCR path for the complete character form according to the adhesion-type degradation stage, and extracting the final text with an optical character recognition engine, includes: querying a recognition path mapping table according to the adhesion-type degradation stage to obtain the primary recognition parameter configuration that matches the current degradation stage; initializing the optical character recognition engine with the primary recognition parameter configuration, comparing each character pixel set in the complete character form with a pre-stored character template library one by one, calculating the pixel overlap between the character pixel set and each character template, selecting the character corresponding to the character template with the highest pixel overlap as the candidate recognition result, and using the pixel overlap as the confidence value of the candidate recognition result; for characters in the candidate recognition results whose confidence values are lower than a preset confidence threshold, obtaining from the recognition path mapping table the alternative recognition parameter configuration corresponding to the current degradation stage and performing secondary recognition; comparing the confidence value of the secondary recognition result with the confidence value of the original recognition result, and selecting the recognition result corresponding to the target confidence value as the final recognition result of the character; and collecting the final recognition results of all characters to obtain the final text.
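The template matching and fallback path of claim 8 can be sketched as below. Several choices are assumptions: pixel overlap is computed as intersection over union, a "recognition parameter configuration" is modeled as a dictionary that merely names a template library, and the "target confidence value" is taken to be the higher of the two. The table layout and all names are hypothetical.

```python
def pixel_overlap(char_pixels, template_pixels):
    """Intersection over union of two pixel sets (assumed overlap measure)."""
    a, b = set(char_pixels), set(template_pixels)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def template_engine(char_pixels, config):
    """Return (best matching character, its overlap as confidence)."""
    scores = {ch: pixel_overlap(char_pixels, px)
              for ch, px in config["templates"].items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def recognize_char(char_pixels, stage, path_table, conf_threshold):
    """Primary recognition, then secondary recognition if confidence is low."""
    char, conf = template_engine(char_pixels, path_table[stage]["primary"])
    if conf < conf_threshold:
        alt_char, alt_conf = template_engine(
            char_pixels, path_table[stage]["alternate"])
        if alt_conf > conf:   # assumed: keep the higher-confidence result
            char, conf = alt_char, alt_conf
    return char, conf

# Hypothetical mapping table with one stage and two tiny template libraries.
full_templates = {"1": {(0, 1), (1, 1), (2, 1)},
                  "7": {(0, 0), (0, 1), (1, 1), (2, 1)}}
faded_templates = {"1": {(0, 1), (1, 1)},
                   "7": {(0, 0), (0, 1), (1, 1)}}
path_table = {"severe": {"primary": {"templates": full_templates},
                         "alternate": {"templates": faded_templates}}}

faded_one = {(0, 1), (1, 1)}
result = recognize_char(faded_one, "severe", path_table, conf_threshold=0.8)
```

Here the primary library matches the faded "1" with an overlap of only 2/3, which is below the threshold, so the alternate library is consulted and its exact match is kept.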