Image recognition method, device, equipment and storage medium

By performing contrast enhancement preprocessing and segmentation on bank card images, and combining a dual-channel image recognition model with depth and width, the problem of low accuracy and precision in bank card number recognition is solved, achieving more efficient bank card number recognition.

CN116343222B · Active · Publication Date: 2026-06-19 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date: 2023-03-27
Publication Date: 2026-06-19

Application Information

Patent Timeline

27 Mar 2023: Application
19 Jun 2026: Publication (CN116343222B)

IPC: G06V30/148; G06V30/16; G06T7/11

AI Tagging

Application Domain

Image analysis

AI Technical Summary

Technical Problem

Existing bank card number recognition methods have low accuracy and precision, mainly because the contrast of bank card numbers superimposed on complex backgrounds is poor, and OCR technology has great difficulty in recognizing them in complex environments.

Method used

The bank card image is preprocessed to enhance contrast, segmented using a preset image segmentation model, and then passed through multiple recognition processes in a pre-configured dual-channel (depth and width) image recognition model. This improves the contrast between the bank card number area and the background area and extracts features at both the depth and width levels.

Benefits of technology

It effectively improves the contrast between the bank card number area and the complex background area, enhances the segmentation efficiency of the image segmentation model for the card number area, and improves the accuracy and precision of bank card number recognition.

✦ Generated by Eureka AI based on patent content.

Abstract

This application provides an image recognition method, apparatus, device, and storage medium, relating to the field of image processing technology. The method includes: preprocessing a bank card image to enhance contrast, obtaining a bank card image to be recognized; segmenting the bank card image using a preset image segmentation model, obtaining an image segmentation result containing the bank card number region output by the preset image segmentation model; and performing multiple depth and width recognition processes on the image segmentation result using a pre-configured depth and width dual-channel image recognition model to obtain the recognition result of the bank card image to be recognized. The method of this application effectively improves the contrast between the bank card number region and the background region by enhancing the contrast of the bank card image, improving the segmentation efficiency of the preset image segmentation model for the card number region, and extracting features from the image segmentation results at two different levels of depth and width, effectively improving the accuracy and precision of recognition.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to an image recognition method, apparatus, device, and storage medium.

Background Technology

[0002] With the rapid development of mobile internet technology, mobile payment has become the mainstream payment method in people's daily lives, and many mobile payment-related businesses require the use of bank card numbers.

[0003] Currently, optical character recognition (OCR) technology is commonly used to identify bank card numbers. OCR technology captures images of bank cards with a camera, performs text detection on the bank card number in the image, and further recognizes the text content based on the text detection technology to obtain the text information in the image.

[0004] However, when users upload photos of their bank cards, there may be complex shooting environments, varied shooting angles, and distortion. Moreover, bank card numbers are usually superimposed on complex backgrounds, resulting in poor contrast. In addition, apart from the card number, which is a string of digits, the rest of the text in bank card photos is not the target. These problems undoubtedly increase the recognition difficulty of OCR technology, resulting in low accuracy and precision of OCR-based bank card number recognition methods, which cannot effectively recognize bank card numbers.

Summary of the Invention

[0005] This application provides an image recognition method, apparatus, device, and storage medium to solve the problem of low accuracy and precision in existing bank card number recognition methods.

[0006] In a first aspect, this application provides an image recognition method, comprising:

[0007] A bank card image is acquired, and the bank card image is preprocessed to enhance contrast, thereby obtaining the bank card image to be identified;

[0008] A preset image segmentation model is used to segment the bank card image to be identified, and the image segmentation result containing the bank card number region is obtained from the output of the preset image segmentation model.

[0009] A pre-configured dual-channel image recognition model with depth and width is used to perform multiple depth and width recognition processes on the image segmentation result to obtain the recognition result of the bank card image to be recognized.

[0010] In a second aspect, this application provides an image recognition device, comprising:

[0011] An image processing unit is used to acquire a bank card image, perform preprocessing on the bank card image to enhance contrast, and obtain a bank card image to be identified.

[0012] The image segmentation unit is used to segment the bank card image to be identified using a preset image segmentation model, and obtain the image segmentation result containing the bank card number region output by the preset image segmentation model.

[0013] The processing unit is also used to perform multiple depth and width recognition processes on the image segmentation result using a pre-configured dual-channel image recognition model with depth and width to obtain the recognition result of the bank card image to be recognized.

[0014] In a third aspect, this application provides an electronic device, including: a processor, a memory, and a transceiver;

[0015] The processor, the memory, and the transceiver are interconnected by circuitry;

[0016] The memory stores computer-executable instructions;

[0017] The transceiver is used to send and receive data;

[0018] The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method described in the first aspect.

[0019] In a fourth aspect, this application provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method described in the first aspect.

[0020] In a fifth aspect, this application provides a computer program product, including a computer program that, when executed by a processor, implements the method described in the first aspect.

[0021] The image recognition method, apparatus, device, and storage medium provided in this application acquire a bank card image, preprocess it to enhance contrast to obtain a bank card image to be recognized, and use a preset image segmentation model to segment the bank card image to be recognized, yielding a preliminary image segmentation result that contains the bank card number region. Further, a pre-configured dual-channel (depth and width) image recognition model is used to perform multiple depth and width recognition processes on the image segmentation result to obtain the bank card image recognition result. Enhancing the contrast of the bank card image effectively improves the contrast between the bank card number region and its complex background region, which also improves the segmentation efficiency of the preset image segmentation model for the card number region. Using a dual-channel image recognition model to extract features from the preliminary image segmentation result at two different levels, depth and width, effectively improves the accuracy and precision of bank card number recognition.

Attached Figure Description

[0022] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0023] Figure 1 is a schematic diagram of the network architecture of the image recognition method provided in this application;

[0024] Figure 2 is a flowchart illustrating an image recognition method provided in this application;

[0025] Figure 3 is a flowchart illustrating another image recognition method provided in this application;

[0026] Figure 4 is a schematic diagram of the structure of a pre-configured dual-channel image recognition model with depth and width provided in this application;

[0027] Figure 5 is a schematic diagram of the preset asymmetric convolutional residual block structure provided in this application;

[0028] Figure 6 is a schematic diagram of the structure of the preset asymmetric convolutional residual channel attention module provided in this application;

[0029] Figure 7 is a schematic diagram of the structure of another pre-configured dual-channel image recognition model with depth and width provided in this application;

[0030] Figure 8 is a schematic diagram of the structure of the asymmetric convolution provided in this application;

[0031] Figure 9 is a schematic diagram of the preset multi-scale width convolutional module provided in this application;

[0032] Figure 10 is a schematic diagram of the structure of the dual-channel convolution provided in this application;

[0033] Figure 11 is a schematic diagram of the structure of an image recognition device provided in this application;

[0034] Figure 12 is a first block diagram of an electronic device used to implement the image recognition method of this application;

[0035] Figure 13 is a second block diagram of an electronic device used to implement the image recognition method of this application.

[0036] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Implementation

[0037] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0038] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with relevant laws, regulations and standards, and corresponding operation entry points are provided for users to choose to authorize or refuse.

[0039] It should be noted that the image recognition method and apparatus of this application can be used in the field of fintech or other related fields, and can also be used in any field other than fintech. The application field of the image recognition method and apparatus of this application is not limited.

[0040] Currently, optical character recognition (OCR) technology is commonly used to identify bank card numbers. OCR technology captures images of bank cards with a camera, performs text detection on the bank card number in the image, and further recognizes the text content based on the text detection technology to obtain the text information in the image.

[0041] When users upload photos of their bank cards, there may be complex shooting environments, varied shooting angles, and distortion. Moreover, bank card numbers are usually superimposed on complex backgrounds, resulting in poor contrast. In addition, apart from the card number being a string of numbers, the rest of the text in bank card photos is not the target. These problems undoubtedly increase the difficulty of OCR technology recognition, resulting in low accuracy and precision of OCR-based bank card number recognition methods, which cannot effectively recognize bank card numbers.

[0042] Therefore, to address the low accuracy and precision of existing bank card number recognition methods, the inventors found in their research that the problem can be solved by enhancing the contrast of the bank card image, segmenting the bank card image, and combining these steps with a pre-configured dual-channel (depth and width) image recognition model. Specifically, a bank card image is acquired and preprocessed to enhance contrast, yielding the bank card image to be recognized; a preset image segmentation model then segments the bank card image to be recognized, yielding a preliminary image segmentation result that contains the bank card number region. Further, the pre-configured dual-channel image recognition model performs multiple depth and width recognition processes on the image segmentation result to obtain the final bank card image recognition result. Enhancing the contrast of the bank card image effectively improves the contrast between the bank card number region and its complex background region, which also improves the segmentation efficiency of the preset image segmentation model for the card number region. The dual-channel image recognition model extracts features from the preliminary image segmentation result at both the depth and width levels, effectively improving the accuracy and precision of bank card number recognition.

[0043] Therefore, based on the above-mentioned inventive discoveries, the inventors proposed the technical solutions of the embodiments of the present invention. The network architecture and application scenarios of the image recognition method provided by the embodiments of the present invention will be described below.

[0044] As shown in Figure 1, the network architecture of the image recognition method provided in this embodiment of the invention includes a user terminal 1 and a server 2, which are connected for communication. The user terminal 1 is configured with a client, in which the user handles relevant business. The user uploads a bank card image through the client. The server 2 obtains the bank card image and preprocesses it to enhance contrast, obtaining the bank card image to be recognized; the server 2 uses a preset image segmentation model to segment the bank card image to be recognized and obtains the image segmentation result, containing the bank card number area, output by the preset image segmentation model; the server 2 then uses a pre-configured dual-channel (depth and width) image recognition model to perform multiple depth and width recognition processes on the image segmentation result, obtaining the recognition result of the bank card image to be recognized. Enhancing the contrast of the bank card image effectively improves the contrast between the bank card number area and its complex background area, which also improves the segmentation efficiency of the preset image segmentation model for the card number area. Using a dual-channel image recognition model to extract features from the preliminary image segmentation result at two different levels, depth and width, effectively improves the accuracy and precision of bank card number recognition.

[0045] The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0046] Figure 2 is a flowchart illustrating an image recognition method provided in this application, which is applied to an electronic device. The electronic device can be a digital computer of various forms, such as a cellular phone, smartphone, laptop computer, desktop computer, workstation, personal digital assistant, server, blade server, mainframe computer, or other suitable computer. As shown in Figure 2, the method includes:

[0047] Step 201: Obtain a bank card image and perform preprocessing to enhance the contrast of the bank card image to obtain the bank card image to be identified.

[0048] In this embodiment, the bank card image is acquired either by the user transmitting the bank card image to the electronic device or by the electronic device capturing the bank card image through a camera unit. In a photographed bank card, the card number may be superimposed on a complex background, resulting in poor contrast of the card number and affecting subsequent recognition. Therefore, before recognition, the bank card image undergoes preprocessing to enhance contrast, yielding the bank card image to be recognized. Specifically, Contrast Limited Adaptive Histogram Equalization (CLAHE) can be used to enhance image contrast. The CLAHE algorithm divides the image into multiple small regions, equalizes the histogram of each region, and limits the contrast amplification within each region. This not only improves the contrast between the target card number area and the background in the bank card image but also limits noise amplification. Other contrast enhancement methods can also be used; the method is not limited to the one described above.
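The contrast-limiting step described in [0048] can be illustrated with a minimal sketch. It equalizes a single tile with a clipped histogram, which is the core operation of CLAHE; a full CLAHE implementation additionally processes a grid of tiles and interpolates between their mappings, which is omitted here. The function name `clipped_equalize` and the default `clip_limit` are hypothetical and are not taken from the patent.

```python
def clipped_equalize(tile, clip_limit=4, levels=256):
    """Contrast-limited histogram equalization of one tile.

    `tile` is a list of rows of integer gray levels in [0, levels).
    `clip_limit` (assumed value) caps each histogram bin; the clipped
    excess is redistributed evenly across all bins before equalizing.
    """
    pixels = [p for row in tile for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip the histogram and collect the excess counts.
    excess = 0
    for i, count in enumerate(hist):
        if count > clip_limit:
            excess += count - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess uniformly over all bins.
    bonus = excess / levels
    hist = [count + bonus for count in hist]

    # Build the cumulative distribution and map it to the output range.
    total = float(len(pixels))
    lut, running = [], 0.0
    for count in hist:
        running += count
        lut.append(min(levels - 1, int(round(running / total * (levels - 1)))))
    return [[lut[p] for p in row] for row in tile]
```

A lower `clip_limit` flattens the mapping and so limits how strongly noise in near-uniform regions is amplified, which is the behavior the paragraph attributes to CLAHE.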

[0049] Step 202: Using a preset image segmentation model, the bank card image to be identified is segmented to obtain the image segmentation result containing the bank card number region output by the preset image segmentation model.

[0050] In this embodiment, the preset image segmentation model is a pre-trained model that can detect the target region, which is the bank card number region in the bank card image. The preset image segmentation model is used to perform preliminary segmentation processing on the bank card image to be identified, and the image segmentation result containing the bank card number region is obtained from the output of the preset image segmentation model.

[0051] The preset image segmentation model can be based on the YOLO algorithm. The core of the YOLO algorithm is to take the entire image as input and treat object detection as a regression problem, directly regressing the position of the bounding box and its category at the output layer. The YOLO algorithm has gone through several versions, including YOLOv1, YOLOv2, YOLOv3, YOLOv4, and YOLOv5. For example, YOLOv1 borrowed from the GoogLeNet network and modified it by replacing the Inception modules with 1×1 reduction layers followed by 3×3 convolutional layers. The object detection layer takes the feature map extracted by the modified GoogLeNet and passes it through 4 convolutional layers and 2 fully connected layers, finally generating a 7×7×30 output. The purpose of passing it through 4 convolutional layers first is to improve the model's generalization ability.
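As a small worked example of the 7×7×30 output mentioned in [0051], the sketch below derives the depth of 30 from the original YOLOv1 configuration (a 7×7 grid, 2 boxes per cell, 20 classes); these values come from the published YOLOv1 setup, not from this patent. The ordering of values inside one cell's vector in `split_cell` is an assumption made for illustration only, and both function names are hypothetical.

```python
def yolo_v1_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the YOLOv1 prediction tensor for the given configuration."""
    # Each box carries x, y, w, h and a confidence score (5 values).
    depth = boxes_per_cell * 5 + num_classes
    return (grid, grid, depth)


def split_cell(cell, boxes_per_cell=2, num_classes=20):
    """Split one cell's prediction vector into boxes and class scores.

    Assumed layout: all boxes first (5 values each), then class scores.
    """
    boxes = [tuple(cell[i * 5:(i + 1) * 5]) for i in range(boxes_per_cell)]
    class_scores = list(cell[boxes_per_cell * 5:])
    if len(class_scores) != num_classes:
        raise ValueError("cell vector does not match the configuration")
    return boxes, class_scores
```

For a card-number detector with a single class, the same formula would give a depth of 2 × 5 + 1 = 11, although the patent does not state which configuration it uses.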

[0052] Step 203: Using a pre-configured dual-channel image recognition model with depth and width, perform multiple depth and width recognition processes on the image segmentation results to obtain the recognition result of the bank card image to be recognized.

[0053] In this embodiment, the initially established dual-channel image recognition model of depth and width is trained to obtain a pre-configured dual-channel image recognition model of depth and width. The pre-configured dual-channel image recognition model of depth and width is used to perform multiple depth and width recognition processes on the image segmentation results to obtain the recognition result of the bank card image to be recognized, and the bank card number can be obtained.

[0054] This application acquires a bank card image, preprocesses it to enhance contrast to obtain a bank card image to be identified, and uses a preset image segmentation model to segment the bank card image to be identified, yielding a preliminary image segmentation result that contains the bank card number region. Further, a pre-configured dual-channel (depth and width) image recognition model is used to perform multiple depth and width recognition processes on the image segmentation result to obtain the bank card image recognition result. Enhancing the contrast of the bank card image effectively improves the contrast between the bank card number region and its complex background region, which also improves the segmentation efficiency of the preset image segmentation model for the card number region. Using a dual-channel image recognition model to extract features from the preliminary image segmentation result at two different levels (depth and width) effectively improves the accuracy and precision of bank card number recognition.

[0055] Figure 3 is a flowchart illustrating another image recognition method provided in this application, which is applied to an electronic device. As shown in Figure 3, the method includes:

[0056] Step 301a: Perform contrast enhancement preprocessing on the bank card image to be trained to obtain a preprocessed image.

[0057] In this embodiment, a bank card image to be trained is acquired, and the image undergoes preprocessing to enhance contrast. Specifically, the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm can be used to enhance the image contrast, improving the contrast between the target card number area and the background in the bank card image. To better identify the target area of the bank card image, a secondary enhancement process can be performed. Specifically, the bank card image to be trained is first subjected to grayscale transformation to obtain a grayscale image; the CLAHE algorithm is then used to enhance the contrast of the grayscale image to obtain a contrast-enhanced image; finally, a gamma transform is used to perform a secondary contrast enhancement process on the contrast-enhanced image to obtain a preprocessed image.
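The first and last stages of the pipeline in [0057] (grayscale transformation and gamma transform) can be sketched as follows; the CLAHE stage would sit between them. The patent does not give the grayscale weights or the gamma value, so the ITU-R BT.601 luma weights and `gamma=0.5` used here are assumptions, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to gray levels (BT.601 weights, assumed)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]


def gamma_transform(gray_image, gamma=0.5, max_level=255):
    """Apply s = max_level * (r / max_level) ** gamma to every pixel."""
    # A lookup table avoids recomputing the power for repeated gray levels.
    lut = [int(round(max_level * (v / max_level) ** gamma))
           for v in range(max_level + 1)]
    return [[lut[p] for p in row] for row in gray_image]
```

With `gamma` below 1 the transform brightens dark regions and stretches their contrast; with `gamma` above 1 it does the opposite. Which direction suits embossed or printed card numbers is not specified in the text.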

[0058] Step 301b: Perform image augmentation on the preprocessed image to obtain training data.

[0059] In this embodiment, the quantity and diversity of training samples usually play a decisive role in the performance of the model. However, due to the sensitive nature of bank cards, it is difficult to collect a large number of training samples. Therefore, the existing preprocessed images are augmented to enrich the training data.

[0060] Optionally, the preprocessed image is subjected to image augmentation to obtain training data, including:

[0061] Each preprocessed image is rotated to obtain a rotated image; each preprocessed image is flipped to obtain a flipped image; each preprocessed image is then subjected to Gaussian noise addition and color conversion to obtain a color-converted image; the rotated, flipped, and color-converted images are used to augment the training data.

[0062] In this embodiment, the sample data is enriched primarily through image rotation, image flipping, and color conversion. Multiple rotation angles are pre-set, and each preprocessed image is rotated according to these angles to obtain rotated images. Flipping includes horizontal and vertical flipping; each preprocessed image is either horizontally or vertically flipped to obtain flipped images. Gaussian noise is added to each preprocessed image sequentially, followed by color conversion to obtain color-converted images. The training data is augmented using the rotated, flipped, and color-converted images. This augmentation process enriches the training data, thereby reducing data collection and labeling costs while effectively improving the overall performance and generalization ability of the model.
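The augmentation steps in [0062] can be sketched on a grayscale image stored as a list of rows. This is an illustration under stated assumptions: the patent mentions multiple preset rotation angles without listing them, so only a 90-degree rotation is shown; the noise standard deviation `sigma` is an assumed value; and the color conversion step is left out because the text does not say which conversion is used. All function names are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def add_gaussian_noise(image, sigma=5.0, rng=None, max_level=255):
    """Add zero-mean Gaussian noise and clamp to the valid gray range."""
    rng = rng or random.Random()
    return [[min(max_level, max(0, int(round(p + rng.gauss(0.0, sigma)))))
             for p in row] for row in image]


def augment(image, rng=None):
    """Return the augmented variants of one preprocessed image."""
    return [rotate90(image), flip_horizontal(image), flip_vertical(image),
            add_gaussian_noise(image, rng=rng)]
```

Passing a seeded `random.Random` instance as `rng` makes the noisy variant reproducible, which helps when comparing training runs.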

[0063] Step 301c: Using a preset image segmentation model, the training data is segmented to obtain the image segmentation result of the preset image segmentation model that contains the bank card number region.

[0064] In this embodiment, the preset image segmentation model is a pre-trained model that can detect the target region, which is the bank card number region in the bank card image. The preset image segmentation model is used to perform preliminary segmentation processing on the training data to obtain the image segmentation result of the preset image segmentation model that contains the bank card number region.

[0065] Step 301d: Based on the image segmentation results to be trained and the label data of the bank card number in the bank card image to be trained, the initially established dual-channel image recognition model of depth and width is trained to obtain the pre-configured dual-channel image recognition model of depth and width after training.

[0066] In this embodiment, based on the image segmentation results to be trained and the label data of the bank card number in the bank card image to be trained, the initially established dual-channel image recognition model of depth and width is trained to obtain the pre-configured dual-channel image recognition model of depth and width. The dual-channel image recognition model includes a depth classification network model and a width classification network model, and performs feature extraction on the image segmentation results to be trained from two different levels of depth and width.

[0067] Step 301: Obtain a bank card image, perform preprocessing to enhance the contrast of the bank card image, and obtain the bank card image to be identified.

[0068] Optionally, the bank card image undergoes contrast-enhancing preprocessing to obtain the bank card image to be identified, including:

[0069] The bank card image is processed by grayscale transformation to obtain a grayscale image of the bank card; the contrast-limited adaptive histogram equalization algorithm is used to enhance the contrast of the grayscale image of the bank card to obtain a contrast-enhanced image of the bank card; the contrast-enhanced image of the bank card is then subjected to a second contrast enhancement process using a gamma transform to obtain the bank card image to be identified.

[0070] In this embodiment, the bank card image undergoes preprocessing to enhance contrast. Specifically, the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm can be used to enhance the image contrast, improving the contrast between the target card number area and the background. To better identify the target area of the bank card image, a secondary enhancement process can be performed. Specifically, the bank card image is first subjected to grayscale transformation to obtain a grayscale image; the CLAHE algorithm is then used to enhance the contrast of the grayscale image, resulting in a contrast-enhanced image; finally, a gamma transform is used to perform a secondary contrast enhancement process on the contrast-enhanced image, yielding the bank card image to be identified.

[0071] Step 302: Using a preset image segmentation model, the bank card image to be identified is segmented to obtain the image segmentation result containing the bank card number region output by the preset image segmentation model.

[0072] In this embodiment, step 302 has the same technical features as step 202. For a detailed description, please refer to step 202, which will not be repeated here.

[0073] Step 303: Using a pre-configured dual-channel image recognition model with depth and width, perform multiple depth and width recognition processes on the image segmentation results to obtain the recognition result of the bank card image to be recognized.

[0074] In one possible implementation, a pre-configured dual-channel image recognition model with depth and width is used to perform multiple depth and width recognition processes on the image segmentation results to obtain the recognition result of the bank card image to be recognized, including:

[0075] The image segmentation results are processed multiple times using a pre-defined asymmetric convolutional residual block structure, a pre-defined asymmetric convolutional residual channel attention module, and a corresponding max pooling layer in a deep classification network model to obtain a first feature image. Similarly, a second feature image is obtained by performing multiple feature extraction and max pooling operations on the image segmentation results using a pre-defined multi-scale width convolutional module and a corresponding max pooling layer in a width classification network model. Finally, the first and second feature images are concatenated to obtain the bank card image recognition result.

[0076] In this embodiment, the pre-configured dual-channel image recognition model with depth and width includes a depth classification network model and a width classification network model. The depth classification network model includes a preset asymmetric convolutional residual block structure and a preset asymmetric convolutional residual channel attention module. The preset asymmetric convolutional residual block structure, the preset asymmetric convolutional residual channel attention module and the corresponding max pooling layer are used to perform multiple feature extraction operations and max pooling operations on the image segmentation results to obtain the first feature image.
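The overall data flow of the dual-channel model (two branches over the same segmentation result, each followed by max pooling, with the outputs concatenated) can be sketched as below. This is only a structural illustration: the branches are passed in as placeholder callables rather than the patent's actual network modules, the 2×2 pooling window with stride 2 is an assumption, and the function names are hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D map with even height and width."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]), 2)]
            for i in range(0, len(feature_map), 2)]


def dual_channel_features(x, depth_branch, width_branch):
    """Run both branches on the same input and concatenate their channels."""
    first = depth_branch(x)    # first feature image: list of 2-D channels
    second = width_branch(x)   # second feature image: list of 2-D channels
    return first + second      # channel-wise concatenation
```

Concatenating along the channel axis requires both branches to produce feature maps of the same spatial size, which is why each branch is paired with a corresponding pooling layer.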

[0077] Optionally, if two feature extraction operations are performed on the image segmentation result, a preset asymmetric convolutional residual block structure, a preset asymmetric convolutional residual channel attention module, and the corresponding max pooling layer in the deep classification network model are used to perform multiple feature extraction and max pooling operations on the image segmentation result to obtain the first feature image, including:

[0078] The image segmentation result is input to the first image processing layer to obtain a first result; the first image processing layer sequentially performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input image segmentation result. The first result is input to the second image processing layer to obtain a second result; the second image processing layer sequentially performs batch normalization, ReLU activation, and asymmetric convolution operations on the input first result. The second result and the image segmentation result are added to obtain a first summed result. The first summed result is input to the third image processing layer to obtain a third result; the third image processing layer sequentially performs batch normalization, ReLU activation, and asymmetric convolution operations on the input first summed result. The third result is input to the fourth image processing layer; the third result is input to the channel attention module to obtain an enhanced feature image; the first summed result and the enhanced feature image are added to obtain a second summed result; the second summed result is input to a max pooling layer to obtain a pooling result; and the first, second, third, and fourth image processing layers and the channel attention module are applied again to perform a second feature extraction on the pooling result, obtaining the first feature image.

[0079] Referring to Figures 4-7, and taking two feature extractions as an example, the deep classification network model includes: a preset asymmetric convolutional residual block structure and a preset asymmetric convolutional residual channel attention module. The preset asymmetric convolutional residual block structure includes: a first image processing layer and a second image processing layer; the preset asymmetric convolutional residual channel attention module includes: a third image processing layer, a fourth image processing layer, and a channel attention module.

[0080] Referring to Figure 5, the image segmentation result is input into the first image processing layer of the preset asymmetric convolutional residual block structure to obtain the first result output by the first image processing layer. The first image processing layer sequentially performs batch normalization (BN), ReLU activation, asymmetric convolution, and DropBlock operations on the input image segmentation result. Batch normalization stabilizes the distribution of the image data and mitigates the covariate shift caused by parameter updates. The ReLU function then applies a nonlinear activation, after which asymmetric convolution extracts features from the bank card image. The DropBlock operation provides regularization to prevent overfitting.

[0081] Still referring to Figure 5, the first result is input into the second image processing layer of the preset asymmetric convolutional residual block structure to obtain the second result output by the second image processing layer. The second image processing layer sequentially performs batch normalization (BN), ReLU activation, and asymmetric convolution operations on the input first result. The input image segmentation result is added to the second result obtained by the two consecutive convolutions to obtain the first summed result, which is the output of the preset asymmetric convolutional residual block structure.
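
The data flow of the residual block structure in [0080] and [0081] can be sketched as follows. This is a minimal illustration rather than the patented implementation: feature maps are plain nested lists, and `double` is a hypothetical stand-in for a learned, shape-preserving asymmetric convolution. Batch normalization and DropBlock are left out of the example layers for brevity; in a full version they would be further entries in `compose(...)`.

```python
from typing import Callable, List

FeatureMap = List[List[float]]
Layer = Callable[[FeatureMap], FeatureMap]


def compose(*ops: Layer) -> Layer:
    """Chain operations left to right, e.g. BN -> ReLU -> conv -> DropBlock."""
    def run(x: FeatureMap) -> FeatureMap:
        for op in ops:
            x = op(x)
        return x
    return run


def relu(x: FeatureMap) -> FeatureMap:
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in x]


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise addition of two maps of identical shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def residual_block(x: FeatureMap, first_layer: Layer, second_layer: Layer) -> FeatureMap:
    """Return the first summed result: second_layer(first_layer(x)) + x."""
    first_result = first_layer(x)
    second_result = second_layer(first_result)
    return add(second_result, x)  # skip connection from the block input


def double(x: FeatureMap) -> FeatureMap:
    """Hypothetical placeholder for a learned convolution (assumption)."""
    return [[2.0 * v for v in row] for row in x]


first_layer = compose(relu, double)
second_layer = compose(relu, double)
first_sum = residual_block([[-1.0, 2.0]], first_layer, second_layer)
```

The skip connection requires the second result to have the same shape as the block input, which is why the stand-in convolution preserves shape.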

[0082] Referring to Figure 6, the first summed result is input into the third image processing layer of the preset asymmetric convolutional residual channel attention module to obtain the third result output by the third image processing layer. The third image processing layer sequentially performs batch normalization (BN), ReLU activation, asymmetric convolution, and DropBlock operations on the input first summed result.

[0083] Still referring to Figure 6, the third result is input into the fourth image processing layer of the preset asymmetric convolutional residual channel attention module to obtain the fourth result output by the fourth image processing layer. The fourth image processing layer sequentially performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input third result. Together, the third and fourth image processing layers thus extract the features of the bank card image through two successive rounds of these operations.

[0084] Still referring to Figure 6, the fourth result is input into the channel attention module to obtain the enhanced feature image output by the channel attention module. The channel attention module extracts key information from the fourth result, emphasizing important features of the image, ignoring unnecessary features, and suppressing noise interference from the background region on the classification task. The enhanced feature image obtained through the channel attention mechanism is then added pixel-wise to the first summed result to obtain the second summed result, and a 1×1 convolution is used to adjust the number of channels before the result is output.
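
The internal structure of the channel attention module is not specified in the text, so the sketch below assumes a simplified squeeze-and-rescale gate: each channel is averaged, passed through a sigmoid with a hypothetical per-channel weight and bias, and rescaled by the resulting factor. The function names and parameters are illustrative assumptions, and the 1×1 convolution that adjusts the channel count is omitted.

```python
import math
from typing import List

Channel = List[List[float]]
FeatureMap = List[Channel]  # list of channels, each a 2D map


def channel_attention(fmap: FeatureMap, weights: List[float], biases: List[float]) -> FeatureMap:
    """Rescale each channel by sigmoid(w * mean(channel) + b) (assumed gate)."""
    out = []
    for channel, w, b in zip(fmap, weights, biases):
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count  # global average pooling
        gate = 1.0 / (1.0 + math.exp(-(w * mean + b)))
        out.append([[v * gate for v in row] for row in channel])
    return out


def attention_residual(first_sum: FeatureMap, fourth_result: FeatureMap,
                       weights: List[float], biases: List[float]) -> FeatureMap:
    """Second summed result: first summed result + attention-enhanced features."""
    enhanced = channel_attention(fourth_result, weights, biases)
    return [
        [[a + e for a, e in zip(row_a, row_e)] for row_a, row_e in zip(ch_a, ch_e)]
        for ch_a, ch_e in zip(first_sum, enhanced)
    ]


# With w = 0 and b = 0 the gate is exactly 0.5 for every channel.
enhanced = channel_attention([[[2.0, 2.0]]], [0.0], [0.0])
second_sum = attention_residual([[[1.0, 1.0]]], [[[2.0, 2.0]]], [0.0], [0.0])
```

A gate close to 1 passes a channel through almost unchanged, while a gate close to 0 suppresses it, which is one way to realize the "emphasize important features, ignore unnecessary ones" behaviour described above.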

[0085] Further, the second summed result is subjected to max pooling using the corresponding max pooling layer to obtain the pooling result. The first, second, third, and fourth image processing layers and the channel attention module are then used to perform secondary feature extraction on the pooling result to obtain the first feature image. Specifically, the pooling result is input to the first image processing layer to obtain a first result; the first image processing layer sequentially performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input pooling result. The first result is input to the second image processing layer to obtain a second result; the second image processing layer sequentially performs batch normalization, ReLU activation, and asymmetric convolution operations on the input first result. The second result and the pooling result are added to obtain a first summed result. The first summed result is input to the third image processing layer to obtain a third result; the third image processing layer sequentially performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input first summed result. The third result is input to the fourth image processing layer to obtain a fourth result; the fourth image processing layer sequentially performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input third result. The fourth result is input to the channel attention module to obtain the enhanced feature image. The first summed result and the enhanced feature image are then added to obtain the first feature image.

[0086] It should be noted that if two feature extraction operations are performed on the image segmentation result, one pooling operation is performed. Four feature extraction operations can also be performed. Referring to Figure 7, if four feature extraction operations are performed on the image segmentation result, then three pooling operations are performed, with one pooling operation inserted between every two consecutive feature extraction operations.
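
The relationship between the number of feature extractions and the number of pooling operations can be written down directly. The helper names below are hypothetical and serve only to illustrate the counting rule stated in [0086].

```python
from typing import List


def pooling_count(num_extractions: int) -> int:
    """One max pooling step sits between each pair of consecutive extractions."""
    if num_extractions < 1:
        raise ValueError("at least one feature extraction is required")
    return num_extractions - 1


def stage_sequence(num_extractions: int) -> List[str]:
    """List the stages in execution order, with pooling between extractions."""
    stages = []
    for i in range(num_extractions):
        if i > 0:
            stages.append("max_pool")
        stages.append("extract_%d" % (i + 1))
    return stages


two_stage = stage_sequence(2)
four_stage = stage_sequence(4)
```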

[0087] In this embodiment, a channel attention mechanism is used to model the interdependencies between feature channels and adaptively rescale the features of each channel. This not only enables the network model to learn key feature information along the channel dimension autonomously, but also enhances the network model's ability to discriminate and learn.

[0088] Optionally, the asymmetric convolution operation uses convolution kernels to extract features from the image after ReLU activation, obtaining a horizontal feature image, a vertical feature image, and a standard convolutional feature image. The horizontal and vertical feature images are then concatenated and fused, and the concatenated and fused result is added to the standard convolutional feature image.

[0089] Referring to Figure 8, convolution kernels are used to extract features from the image that has undergone ReLU activation. Specifically, the asymmetric convolution is an improved asymmetric convolution that combines a traditional 3×3 convolution with 1×3 and 3×1 asymmetric convolutions. A 1×3 convolution kernel extracts horizontal features, yielding a horizontal feature image. A 3×1 convolution kernel extracts vertical features, yielding a vertical feature image. A traditional 3×3 convolution extracts image features, yielding a standard convolutional feature image. The horizontal and vertical feature images are then concatenated and fused, and the fused result is added to the standard convolutional feature image. Because the improved asymmetric convolution uses two different convolution methods, traditional convolution and asymmetric convolution, batch normalization is applied after each convolution so that the feature map data follows a normalized distribution. Combining traditional convolution with asymmetric convolution not only enhances the expressive power of the standard square convolution kernel, but also strengthens the feature extraction capability of the network model by exploiting the robustness of asymmetric convolution to image flipping and rotation, thereby improving the classification performance of the network model.
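
The following single-channel sketch illustrates how the three kernel shapes could be combined. It rests on several assumptions not stated in the text: zero padding that keeps the output the same size as the input, one output channel each for the 1×3 and 3×1 branches, and two 3×3 kernels so that the standard branch matches the two channels produced by concatenation. The batch normalization after each convolution is omitted, and all kernel values are illustrative.

```python
from typing import List

Image = List[List[float]]


def conv2d_same(img: Image, kernel: Image) -> Image:
    """Single-channel cross-correlation with zero padding (same output size)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def add_maps(a: Image, b: Image) -> Image:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def asymmetric_conv(img: Image, k_1x3: Image, k_3x1: Image,
                    k_3x3_pair: List[Image]) -> List[Image]:
    """Concatenate horizontal and vertical features, then add the 3x3 features."""
    horizontal = conv2d_same(img, k_1x3)   # 1x3 kernel: horizontal features
    vertical = conv2d_same(img, k_3x1)     # 3x1 kernel: vertical features
    fused = [horizontal, vertical]         # channel-wise concatenation
    standard = [conv2d_same(img, k) for k in k_3x3_pair]  # two 3x3 channels
    return [add_maps(f, s) for f, s in zip(fused, standard)]


ones_img = [[1.0] * 3 for _ in range(3)]
ones_3x3 = [[1.0] * 3 for _ in range(3)]
result = asymmetric_conv(
    ones_img,
    [[1.0, 1.0, 1.0]],
    [[1.0], [1.0], [1.0]],
    [ones_3x3, ones_3x3],
)
```

The element-wise addition only works when the concatenated branch and the standard branch have the same number of channels, which is why the standard branch is given two kernels here.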

[0090] Optionally, if two feature extraction operations are performed on the image segmentation result, a pre-defined multi-scale width convolutional module and corresponding max pooling layer in the width classification network model are used to perform multiple feature extraction and max pooling operations on the image segmentation result to obtain a second feature image, including:

[0091] The image segmentation result is input into the first branch to obtain the first result. The first branch sequentially performs bi-channel convolution, batch normalization, and ReLU activation on the input image segmentation result. The image segmentation result is input into the second branch to obtain the second result. The second branch sequentially performs bi-channel convolution, batch normalization, ReLU activation, bi-channel convolution, batch normalization, and ReLU activation on the input image segmentation result. The image segmentation result is input into the third branch to obtain the third result. The first, second, and third results are concatenated and fused to obtain the concatenated and fused result. Max pooling is performed on the concatenated and fused result using a max pooling layer to obtain the pooling result. The pooling result is input into the first, second, and third branches respectively, and the corresponding results are concatenated and fused to obtain the second feature image.

[0092] Referring to Figure 4 and Figure 9, and taking two feature extractions as an example, the width classification network model includes a pre-defined multi-scale width convolutional module. This module increases the number of channels in each layer of the network to obtain features from the bank card image. The core idea is to concatenate the features extracted in parallel by different convolution kernels, ultimately obtaining a wider feature map. It should be noted that convolution kernels of different sizes capture receptive fields of different sizes, and fusing feature maps of multiple scales by concatenation yields richer feature information, which helps improve classification accuracy.

[0093] Specifically, referring to Figure 9, the pre-defined multi-scale width convolution module includes three branches: a first branch, a second branch, and a third branch. The first branch performs a single bi-channel convolution on the input image, the second branch performs two consecutive bi-channel convolutions on the input image, and the third branch performs a 1×1 conventional convolution.

[0094] The image segmentation results are input into the first branch to obtain the first result output by the first branch. The first branch performs bichannel convolution, batch normalization (BN), and ReLU activation operations on the input image segmentation results in sequence.

[0095] Specifically, the image segmentation result is input into the second branch to obtain the second result. The second branch performs bichannel convolution, batch normalization, ReLU activation, bichannel convolution, batch normalization and ReLU activation operations on the input image segmentation result in sequence.

[0096] Specifically, the image segmentation result is input into the third branch to obtain the third result. The first, second, and third results are then concatenated and fused to obtain the concatenated and fused result. Max pooling is performed on the concatenated result using the corresponding pooling layer to obtain the pooling result. The pooling result is then input into the first branch to obtain the first result. The pooling result is then input into the second branch to obtain the second result. The pooling result is then input into the third branch to obtain the third result. The three results are then concatenated and fused to obtain the second feature image.

[0097] In this embodiment, the output feature map width of the multi-scale width convolution module is increased. Instead of a uniform distribution, it clusters highly correlated features together, weakening the influence of non-critical features. This results in less redundant information in the network's propagation through layers, thus accelerating the network's convergence speed. Furthermore, in the specific implementation, batch normalization and ReLU activation operations are sequentially performed after each bi-channel convolution. Additionally, due to the information decay problem during forward propagation, using 1×1 traditional convolution preserves the information of the input image and enhances non-linear characteristics without increasing computational complexity.
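
The branch-and-concatenate flow of the width network for two feature extractions can be sketched as follows. The branches are passed in as callables so that the sketch stays independent of how bi-channel convolution is implemented; `identity_branch` is a hypothetical placeholder, and the 2×2 pooling window with stride 2 is an assumption, since the text does not state the pooling size.

```python
from typing import Callable, List

Channel = List[List[float]]
FeatureMap = List[Channel]
Branch = Callable[[FeatureMap], FeatureMap]


def max_pool_2x2(channel: Channel) -> Channel:
    """2x2 max pooling with stride 2 (window size is an assumption)."""
    h, w = len(channel), len(channel[0])
    return [
        [max(channel[y][x], channel[y][x + 1],
             channel[y + 1][x], channel[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def width_module(fmap: FeatureMap, branches: List[Branch]) -> FeatureMap:
    """Run every branch on the same input and concatenate along channels."""
    fused: FeatureMap = []
    for branch in branches:
        fused.extend(branch(fmap))
    return fused


def width_network(fmap: FeatureMap, branches: List[Branch]) -> FeatureMap:
    """Two feature extractions with one max pooling step in between."""
    fused = width_module(fmap, branches)
    pooled = [max_pool_2x2(c) for c in fused]
    return width_module(pooled, branches)  # second feature image


def identity_branch(fmap: FeatureMap) -> FeatureMap:
    """Hypothetical placeholder for a learned branch (assumption)."""
    return [[row[:] for row in c] for c in fmap]


grid = [[[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]]
second_feature = width_network(grid, [identity_branch] * 3)
```

With three channel-preserving branches, one input channel becomes three after the first extraction and nine after the second, which illustrates how concatenation widens the feature map.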

[0098] Optionally, the bi-channel convolution operation uses convolution kernels to extract features from the image output by the previous operation to obtain the corresponding feature images, and then concatenates and fuses the image input to the bi-channel convolution operation with the corresponding feature images.

[0099] Referring to Figure 10, the bi-channel convolution operation uses convolution kernels to extract features from the image output by the previous operation. Unlike traditional convolution, where all channels are convolved together at once, bi-channel convolution sets the number of convolution kernels in the current layer to N and divides them into two equal groups, denoted Group 1 and Group 2, each containing N / 2 kernels. The input image undergoes one 3×3 convolution with the N / 2 kernels in Group 1 and another 3×3 convolution with the N / 2 kernels in Group 2. The feature images obtained from these two convolution operations are then concatenated with the image input to the bi-channel convolution operation. Finally, a 1×1 convolution adjusts the number of channels in the concatenated feature map to N before it is output. Applying the two groups of kernels to the input image in parallel and fusing their outputs with the input not only obtains rich and diverse feature information from the bank card image, but also enhances the expressive power of feature extraction.
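
The channel bookkeeping implied by this description can be checked with a small helper. It is a sketch of the channel counts only, not of the convolution itself; the function name and dictionary keys are hypothetical, and the weight count assumes a bias-free 1×1 convolution.

```python
from typing import Dict


def bichannel_channel_plan(in_channels: int, n_kernels: int) -> Dict[str, int]:
    """Track channel counts through one bi-channel convolution."""
    if n_kernels <= 0 or n_kernels % 2 != 0:
        raise ValueError("N must be a positive even number to split into two groups")
    half = n_kernels // 2
    concat = half + half + in_channels  # Group 1 + Group 2 + original input
    return {
        "group1_out": half,
        "group2_out": half,
        "concat": concat,
        "output": n_kernels,                        # after the 1x1 convolution
        "pointwise_weights": concat * n_kernels,    # 1x1 conv, no bias (assumption)
    }


plan = bichannel_channel_plan(in_channels=3, n_kernels=64)
```

For a 3-channel input and N = 64, the concatenated map has 67 channels, and the 1×1 convolution brings it back to 64.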

[0100] Figure 11 is a schematic diagram of the structure of an image recognition device provided in this application. As shown in Figure 11, the image recognition device 1100 provided in this embodiment includes an image processing unit 1101 and an image segmentation unit 1102.

[0101] The image processing unit 1101 acquires a bank card image and performs contrast enhancement preprocessing on the bank card image to obtain the bank card image to be recognized. The image segmentation unit 1102 uses a preset image segmentation model to segment the bank card image to be recognized, obtaining the image segmentation result, output by the preset image segmentation model, that contains the bank card number region. The image processing unit 1101 also uses a pre-configured depth and width dual-channel image recognition model to perform multiple depth and width recognition processes on the image segmentation result, obtaining the recognition result of the bank card image to be recognized.

[0102] Optionally, the image recognition device may further include a training unit.

[0103] The training unit is used to preprocess the bank card image to be trained by enhancing its contrast, thereby obtaining a preprocessed image; to perform image augmentation on the preprocessed image to obtain training data; to segment the training data using a preset image segmentation model, thereby obtaining the image segmentation result of the preset image segmentation model that contains the bank card number region to be trained; and to train the initially established dual-channel image recognition model of depth and width based on the image segmentation result to be trained and the label data of the bank card number in the bank card image to be trained, thereby obtaining the pre-configured dual-channel image recognition model of depth and width after training.

[0104] Optionally, the image processing unit is further configured to perform multiple feature extraction and max pooling operations on the image segmentation result using a preset asymmetric convolutional residual block structure, a preset asymmetric convolutional residual channel attention module, and a corresponding max pooling layer in a deep classification network model to obtain a first feature image; and to perform multiple feature extraction and max pooling operations on the image segmentation result using a preset multi-scale width convolutional module and a corresponding max pooling layer in a width classification network model to obtain a second feature image; and to stitch the first feature image and the second feature image together to obtain the bank card image recognition result to be identified.

[0105] Optionally, the image processing unit is further configured to input the image segmentation result into the first image processing layer to obtain the first result. The first image processing layer sequentially performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input image segmentation result.

[0106] The first result is input to the second image processing layer to obtain the second result. The second image processing layer performs batch normalization, ReLU activation, and asymmetric convolution on the input first result in sequence. The second result is then added to the image segmentation result to obtain the first summed result. The first summed result is input to the third image processing layer to obtain the third result. The third image processing layer performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock on the input first summed result in sequence. The third result is input to the fourth image processing layer to obtain the fourth result. The fourth image processing layer performs batch normalization, ReLU activation, asymmetric convolution, and DropBlock on the input third result in sequence. The fourth result is input to the channel attention module to obtain the enhanced feature image. The first summed result and the enhanced feature image are then added to obtain the second summed result. A max pooling layer is used to perform max pooling on the second summed result to obtain the pooling result.

[0107] The pooling operation results are subjected to secondary feature extraction using a first image processing layer, a second image processing layer, a third image processing layer, a fourth image processing layer, and a channel attention module to obtain the first feature image.

[0108] Optionally, the image processing unit is further configured to input the image segmentation result into a first branch to obtain a first result, wherein the first branch sequentially performs bichannel convolution, batch normalization, and ReLU activation on the input image segmentation result; input the image segmentation result into a second branch to obtain a second result, wherein the second branch sequentially performs bichannel convolution, batch normalization, ReLU activation, bichannel convolution, batch normalization, and ReLU activation on the input image segmentation result; input the image segmentation result into a third branch to obtain a third result; concatenate and fuse the first, second, and third results to obtain a concatenated and fused result; perform max pooling on the concatenated and fused result using a max pooling layer to obtain a pooling result; and input the pooling result into the first, second, and third branches respectively, and concatenate and fuse the corresponding results to obtain a second feature image.

[0109] Optionally, the image processing unit is further configured to perform grayscale transformation on the bank card image to obtain a grayscale image of the bank card; to perform contrast enhancement processing on the grayscale image of the bank card using a contrast-limited adaptive histogram equalization (CLAHE) algorithm to obtain a contrast-enhanced image of the bank card; and to perform secondary contrast enhancement processing on the contrast-enhanced image of the bank card using a gamma transform to obtain the bank card image to be identified.
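
The grayscale conversion and gamma transform steps of this preprocessing chain can be sketched in plain Python. Several details are assumptions, because the text does not give them: the BT.601 luma weights used for grayscale conversion, the 8-bit value range, and the gamma value of 0.8. The contrast-limited adaptive histogram equalization step is deliberately left out of the sketch; it would sit between the two steps shown.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def clamp_u8(value: float) -> int:
    """Round and clamp to the 8-bit range 0..255."""
    return min(255, max(0, int(round(value))))


def to_gray(pixel: RGB) -> int:
    """Grayscale conversion with BT.601 luma weights (assumed weights)."""
    r, g, b = pixel
    return clamp_u8(0.299 * r + 0.587 * g + 0.114 * b)


def gamma_transform(value: int, gamma: float) -> int:
    """Gamma transform on an 8-bit intensity: 255 * (v / 255) ** gamma."""
    return clamp_u8(255.0 * (value / 255.0) ** gamma)


def preprocess(rgb_image: List[List[RGB]], gamma: float = 0.8) -> List[List[int]]:
    """Grayscale, then gamma; the equalization step in between is omitted."""
    gray = [[to_gray(p) for p in row] for row in rgb_image]
    # Contrast-limited adaptive histogram equalization would be applied here.
    return [[gamma_transform(v, gamma) for v in row] for row in gray]


processed = preprocess([[(0, 0, 0), (255, 255, 255)]])
```

A gamma below 1 brightens dark regions and a gamma above 1 darkens them, so the value would be chosen according to how the card number contrasts with its background.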

[0110] Optionally, the image processing unit is also used to perform image rotation processing on each preprocessed image to obtain a rotated image; perform image flipping processing on each preprocessed image to obtain a flipped image; perform Gaussian noise addition processing and color conversion processing on each preprocessed image in sequence to obtain a color-converted image; and use the rotated image, the flipped image, and the color-converted image to augment the training data.
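
A minimal sketch of the augmentation step is given below for a grayscale image stored as nested lists. The rotation angle (90 degrees clockwise), the flip direction (horizontal), and the noise standard deviation are assumptions, since the text does not fix them. The color conversion that follows the noise step is not specified in enough detail to sketch and is omitted.

```python
import random
from typing import List

Image = List[List[int]]


def rotate_90(img: Image) -> Image:
    """Rotate 90 degrees clockwise (angle is an assumption)."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right (direction is an assumption)."""
    return [row[::-1] for row in img]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise and clamp to the 8-bit range."""
    return [
        [min(255, max(0, int(round(v + rng.gauss(0.0, sigma))))) for v in row]
        for row in img
    ]


def augment(img: Image, sigma: float = 5.0, seed: int = 0) -> List[Image]:
    """Return the rotated, flipped, and noisy variants of one image."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    return [rotate_90(img), flip_horizontal(img), add_gaussian_noise(img, sigma, rng)]


sample = [[1, 2], [3, 4]]
variants = augment(sample)
```

Each preprocessed image yields three additional training samples in this sketch, which enlarges the training set without collecting new bank card images.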

[0111] Figure 12 is a first block diagram of an electronic device used to implement the image recognition method of this application. As shown in Figure 12, the electronic device 1200 includes: a memory 1201, a processor 1202, and a transceiver 1203.

[0112] The processor 1202, memory 1201, and transceiver 1203 are interconnected;

[0113] Transceiver 1203 is used for sending and receiving data;

[0114] Memory 1201 stores computer-executed instructions;

[0115] The processor 1202 executes the computer-executable instructions stored in the memory 1201, causing the processor 1202 to perform the method provided in any of the above embodiments.

[0116] Figure 13 is a second block diagram of an electronic device used to implement the image recognition method of this application. As shown in Figure 13, the electronic device can be a computer, digital broadcasting terminal, messaging device, tablet device, personal digital assistant, server, server cluster, etc.

[0117] Electronic device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input / output (I / O) interface 812, sensor component 814, and communication component 816.

[0118] Processing component 802 typically controls the overall operation of electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the methods described above. Furthermore, processing component 802 may include one or more modules to facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.

[0119] Memory 804 is configured to store various types of data to support the operation of electronic device 800. Examples of this data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, etc. Memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0120] Power supply component 806 provides power to various components of electronic device 800. Power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800.

[0121] Multimedia component 808 includes a screen that provides an output interface between electronic device 800 and user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of touch or swipe actions but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 808 includes a front-facing camera and / or a rear-facing camera. When electronic device 800 is in an operating mode, such as a shooting mode or video mode, the front-facing camera and / or rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0122] Audio component 810 is configured to output and / or input audio signals. For example, audio component 810 includes a microphone (MIC) configured to receive external audio signals when electronic device 800 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 804 or transmitted via communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

[0123] I / O interface 812 provides an interface between processing component 802 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0124] Sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of electronic device 800. For example, sensor assembly 814 can detect the on / off state of electronic device 800, the relative positioning of components such as the display and keypad of electronic device 800, changes in position of electronic device 800 or a component of electronic device 800, the presence or absence of user contact with electronic device 800, orientation or acceleration / deceleration of electronic device 800, and temperature changes of electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor assembly 814 may also include an accelerometer, gyroscope, magnetometer, pressure sensor, or temperature sensor.

[0125] Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices. Electronic device 800 can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0126] In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.

[0127] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 804 including instructions, which can be executed by a processor 820 of an electronic device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.

[0128] In an exemplary embodiment, a computer-readable storage medium is also provided, which stores computer-executable instructions that, when executed by a processor, implement the method of any of the above embodiments.

[0129] In an exemplary embodiment, a computer program product is also provided, including a computer program that, when executed by a processor, implements the method of any of the above embodiments.

[0130] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.

[0131] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims

1. An image recognition method, characterized in that the method includes: acquiring a bank card image, and performing contrast enhancement preprocessing on the bank card image to obtain a bank card image to be identified; segmenting the bank card image to be identified using a preset image segmentation model to obtain an image segmentation result, output by the preset image segmentation model, that contains the bank card number region; and performing multiple depth and width recognition processes on the image segmentation result using a pre-configured depth and width dual-channel image recognition model to obtain a recognition result of the bank card image to be identified; wherein the pre-configured depth and width dual-channel image recognition model includes a depth classification network model and a width classification network model; wherein performing multiple depth and width recognition processes on the image segmentation result using the pre-configured depth and width dual-channel image recognition model to obtain the recognition result of the bank card image to be identified includes: performing multiple feature extraction operations and max pooling operations on the image segmentation result using a preset asymmetric convolutional residual block structure, a preset asymmetric convolutional residual channel attention module, and a corresponding max pooling layer in the depth classification network model to obtain a first feature image; performing multiple feature extraction operations and max pooling operations on the image segmentation result using a preset multi-scale width convolution module and a corresponding max pooling layer in the width classification network model to obtain a second feature image; and
stitching the first feature image and the second feature image together to obtain the recognition result of the bank card image to be identified; wherein the preset multi-scale width convolution module includes a first branch, a second branch, and a third branch; and wherein, if two feature extraction operations are performed on the image segmentation result, performing multiple feature extraction operations and max pooling operations on the image segmentation result using the preset multi-scale width convolution module and the corresponding max pooling layer in the width classification network model to obtain the second feature image includes: inputting the image segmentation result into the first branch to obtain a first result, the first branch sequentially performing bi-channel convolution, batch normalization, and ReLU activation operations on the input image segmentation result; inputting the image segmentation result into the second branch to obtain a second result, the second branch sequentially performing bi-channel convolution, batch normalization, ReLU activation, bi-channel convolution, batch normalization, and ReLU activation operations on the input image segmentation result; inputting the image segmentation result into the third branch to obtain a third result; concatenating and fusing the first result, the second result, and the third result to obtain a concatenated and fused result; performing a max pooling operation on the concatenated and fused result using the max pooling layer to obtain a pooling result; and inputting the pooling result into the first branch, the second branch, and the third branch respectively, and concatenating and fusing the corresponding results to obtain the second feature image.

2. The method according to claim 1, characterized in that the pre-configured dual-channel image recognition model with depth and width is obtained as follows:
preprocessing the bank card images to be trained to enhance their contrast, thereby obtaining preprocessed images;
performing image augmentation processing on the preprocessed images to obtain training data;
segmenting the training data using the preset image segmentation model, and obtaining the image segmentation results to be trained, containing the bank card number region, that are output by the preset image segmentation model; and
training an initially established dual-channel image recognition model with depth and width, based on the image segmentation results to be trained and the label data of the bank card numbers in the bank card images to be trained, to obtain the pre-configured dual-channel image recognition model with depth and width.

3. The method according to claim 1, characterized in that the preset asymmetric convolutional residual block structure includes a first image processing layer and a second image processing layer, and the preset asymmetric convolutional residual channel attention module includes a third image processing layer, a fourth image processing layer, and a channel attention module;
wherein, if two feature extraction operations are performed on the image segmentation result, performing the multiple feature extraction operations and max pooling operations on the image segmentation result using the preset asymmetric convolutional residual block structure, the preset asymmetric convolutional residual channel attention module, and the corresponding max pooling layer in the depth classification network model, to obtain the first feature image, includes:
inputting the image segmentation result into the first image processing layer to obtain a first result, the first image processing layer performing, in sequence, batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input image segmentation result;
inputting the first result into the second image processing layer to obtain a second result, the second image processing layer performing, in sequence, batch normalization, ReLU activation, and asymmetric convolution operations on the input first result;
adding the second result and the image segmentation result to obtain a first summation result;
inputting the first summation result into the third image processing layer to obtain a third result, the third image processing layer performing, in sequence, batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input first summation result;
inputting the third result into the fourth image processing layer to obtain a fourth result, the fourth image processing layer performing, in sequence, batch normalization, ReLU activation, asymmetric convolution, and DropBlock operations on the input third result;
inputting the fourth result into the channel attention module to obtain an enhanced feature image;
adding the first summation result and the enhanced feature image to obtain a second summation result;
performing a max pooling operation on the second summation result using the max pooling layer to obtain a pooling operation result; and
performing a second feature extraction on the pooling operation result using the first image processing layer, the second image processing layer, the third image processing layer, the fourth image processing layer, and the channel attention module, to obtain the first feature image.
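As a non-limiting illustration, the order of operations and the two residual additions recited in claim 3 can be sketched as plain function wiring. This is a minimal sketch under stated assumptions: the layers, the channel attention module, the addition, and the pooling are passed in as callables, and the scalar stand-ins at the bottom are hypothetical and exist only to trace the data flow. The claim does not state whether the residual additions are repeated in the second feature extraction; the sketch assumes the same wiring is reused.

```python
def residual_stage(x, layer1, layer2, layer3, layer4, attention, add):
    """One pass through the four image processing layers and the attention module."""
    first = layer1(x)                   # BN -> ReLU -> asymmetric conv -> DropBlock
    second = layer2(first)              # BN -> ReLU -> asymmetric conv
    first_sum = add(second, x)          # first summation result (skip connection)
    third = layer3(first_sum)           # BN -> ReLU -> asymmetric conv -> DropBlock
    fourth = layer4(third)              # BN -> ReLU -> asymmetric conv -> DropBlock
    enhanced = attention(fourth)        # enhanced feature image
    return add(first_sum, enhanced)     # second summation result


def depth_branch(x, stage_ops, pool):
    """Two feature extraction passes separated by one max pooling step."""
    second_sum = residual_stage(x, **stage_ops)
    pooled = pool(second_sum)                    # pooling operation result
    # Assumption: the second extraction reuses the same wiring as the first.
    return residual_stage(pooled, **stage_ops)   # first feature image


# Hypothetical scalar stand-ins, used only to make the data flow traceable.
stage_ops = {
    "layer1": lambda v: v + 1,
    "layer2": lambda v: v + 1,
    "layer3": lambda v: v + 1,
    "layer4": lambda v: v + 1,
    "attention": lambda v: v * 2,
    "add": lambda a, b: a + b,
}

print(depth_branch(1, stage_ops, pool=lambda v: v // 2))
```

Passing the operations in as callables keeps the sketch independent of any deep learning framework; in a real model each callable would be a trained layer acting on a feature tensor.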

4. The method according to claim 3, characterized in that the asymmetric convolution operation comprises:
extracting features, using convolution kernels, from the image that has undergone ReLU activation, to obtain a horizontal feature image, a vertical feature image, and a standard convolutional feature image;
concatenating and fusing the horizontal feature image and the vertical feature image; and
adding the fused result to the standard convolutional feature image.
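As a non-limiting illustration, the asymmetric convolution operation of claim 4 can be sketched on a single-channel image. This is a minimal sketch, and several details are assumptions that the claim does not state: the horizontal, vertical, and standard kernels are taken to be 1x3, 3x1, and 3x3; zero padding keeps the output the same size as the input; and the "concatenate and fuse" step is modeled as a weighted sum of the two stacked maps (the effect of a 1x1 convolution over two channels) with hypothetical weights `fuse_weights`.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation; the kernel must have odd height and width."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def asymmetric_conv(image, k_horizontal, k_vertical, k_standard,
                    fuse_weights=(0.5, 0.5)):
    """Fuse horizontal and vertical responses, then add the standard response."""
    horizontal = conv2d_same(image, k_horizontal)   # assumed 1x3 kernel
    vertical = conv2d_same(image, k_vertical)       # assumed 3x1 kernel
    standard = conv2d_same(image, k_standard)       # assumed 3x3 kernel
    stacked = [horizontal, vertical]                # channel-wise concatenation
    h, w = len(image), len(image[0])
    # Assumed fusion: a 1x1 convolution over the two stacked channels.
    fused = [
        [sum(wt * ch[i][j] for wt, ch in zip(fuse_weights, stacked))
         for j in range(w)]
        for i in range(h)
    ]
    return [[fused[i][j] + standard[i][j] for j in range(w)] for i in range(h)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = asymmetric_conv(
    image,
    k_horizontal=[[1, 0, -1]],
    k_vertical=[[1], [0], [-1]],
    k_standard=[[0, 0, 0], [0, 1, 0], [0, 0, 0]],
)
print(result)
```

The fusion step has to bring the two concatenated maps back to the shape of the standard convolutional feature image before the addition, which is why the sketch reduces them to one map.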

5. The method according to claim 1, characterized in that the bi-channel convolution operation comprises:
extracting features, using a convolution kernel, from the image produced by the preceding operation, to obtain a corresponding feature image; and
concatenating and fusing the image input to the bi-channel convolution operation with the corresponding feature image.
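As a non-limiting illustration, the bi-channel convolution operation of claim 5 can be sketched as a convolution whose output is concatenated, channel-wise, with its own input. This is a minimal sketch under stated assumptions: a feature map is a list of 2D channels, zero padding preserves the spatial size, and, for brevity, each output filter applies the same 2D kernel to every input channel and sums the responses (a trained convolution would hold a separate kernel per input channel).

```python
def conv2d_same(channel, kernel):
    """Zero-padded 2D cross-correlation of one channel with an odd-sized kernel."""
    h, w = len(channel), len(channel[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += channel[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def bi_channel_conv(channels, kernels):
    """Return the input channels followed by one convolved channel per kernel."""
    h, w = len(channels[0]), len(channels[0][0])
    features = []
    for kernel in kernels:
        # Simplification: one shared kernel per filter, summed over input channels.
        responses = [conv2d_same(ch, kernel) for ch in channels]
        features.append(
            [[sum(r[i][j] for r in responses) for j in range(w)] for i in range(h)]
        )
    return list(channels) + features   # keep the input alongside the new features


x = [[[1, 2], [3, 4]]]                               # one 2x2 channel
blur = [[1 / 9] * 3 for _ in range(3)]               # hypothetical 3x3 kernels
edge = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
y = bi_channel_conv(x, [blur, edge])
print(len(x), "input channel ->", len(y), "output channels")
```

Because the input is carried through unchanged, the channel count grows by the number of kernels at every bi-channel convolution.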

6. The method according to any one of claims 1 to 5, characterized in that preprocessing the bank card image to enhance its contrast, thereby obtaining the bank card image to be recognized, includes:
performing grayscale transformation on the bank card image to obtain a bank card grayscale image;
enhancing the contrast of the bank card grayscale image using a contrast-limited adaptive histogram equalization (CLAHE) algorithm to obtain a bank card contrast-enhanced image; and
performing a second contrast enhancement on the bank card contrast-enhanced image using a gamma transform to obtain the bank card image to be recognized.
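As a non-limiting illustration, the three preprocessing steps of claim 6 can be sketched in plain Python. This is a minimal sketch with one deliberate simplification: `clipped_equalize` applies clip-limited histogram equalization over the whole image as a single tile, whereas CLAHE proper equalizes many tiles and interpolates between them (in practice a library routine such as OpenCV's `cv2.createCLAHE` would be used). The luma weights, the clip limit, and the gamma value are assumptions; the claim does not specify them.

```python
def to_gray(rgb):
    """Grayscale transformation using the common 0.299/0.587/0.114 luma weights."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb
    ]


def clipped_equalize(gray, clip_limit=4.0):
    """Single-tile stand-in for CLAHE: clip the histogram, then equalize."""
    pixels = [v for row in gray for v in row]
    n = len(pixels)
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    limit = max(1, int(clip_limit * n / 256))          # per-bin ceiling
    excess = sum(max(0, count - limit) for count in hist)
    # Redistribute the clipped counts evenly across all bins.
    hist = [min(count, limit) + excess / 256 for count in hist]
    lut, running = [], 0.0
    for count in hist:
        running += count
        lut.append(min(255, int(round(255 * running / n))))
    return [[lut[v] for v in row] for row in gray]


def gamma_transform(gray, gamma=0.8):
    """Second contrast enhancement: out = 255 * (in / 255) ** gamma."""
    lut = [int(round(255 * (v / 255) ** gamma)) for v in range(256)]
    return [[lut[v] for v in row] for row in gray]


def preprocess(rgb):
    """Grayscale -> clip-limited equalization -> gamma transform."""
    return gamma_transform(clipped_equalize(to_gray(rgb)))


card = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (200, 200, 200)]]
print(preprocess(card))
```

Both lookup tables are monotonically non-decreasing, so the brightness ordering of pixels is preserved while the intensity range is stretched.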

7. The method according to claim 2, characterized in that performing image augmentation processing on the preprocessed images to obtain the training data includes:
rotating each preprocessed image to obtain a rotated image;
flipping each preprocessed image to obtain a flipped image;
sequentially applying Gaussian noise addition and color conversion to each preprocessed image to obtain a color-converted image; and
augmenting the training data with the rotated images, the flipped images, and the color-converted images.
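As a non-limiting illustration, the augmentation steps of claim 7 can be sketched in plain Python on images stored as rows of `(r, g, b)` tuples. This is a minimal sketch under stated assumptions: the rotation angle (90 degrees clockwise), the flip direction (horizontal), the noise standard deviation `sigma`, and the specific color conversion (swapping the red and blue channels) are hypothetical choices, since the claim does not fix them.

```python
import random


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every channel, clamped to 0..255."""
    def clamp(value):
        return max(0, min(255, int(round(value))))
    return [
        [tuple(clamp(c + rng.gauss(0, sigma)) for c in px) for px in row]
        for row in img
    ]


def convert_color(img):
    """Hypothetical color conversion: swap the red and blue channels."""
    return [[(b, g, r) for (r, g, b) in row] for row in img]


def augment(images, sigma=5.0, seed=0):
    """Return the originals plus rotated, flipped, and noisy color-converted copies."""
    rng = random.Random(seed)            # seeded for reproducible noise
    training_data = list(images)
    for img in images:
        training_data.append(rotate90(img))
        training_data.append(flip_horizontal(img))
        # Noise is added first, then the color conversion is applied.
        training_data.append(convert_color(add_gaussian_noise(img, sigma, rng)))
    return training_data


sample = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)],
          [(15, 25, 35), (45, 55, 65), (75, 85, 95)]]
print(len(augment([sample])), "training images from 1 preprocessed image")
```

Each preprocessed image yields three additional samples here, so the training set grows fourfold.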

8. An image recognition apparatus, characterized in that the apparatus comprises:
an image processing unit, configured to acquire a bank card image and preprocess the bank card image to enhance its contrast, thereby obtaining a bank card image to be recognized; and
an image segmentation unit, configured to segment the bank card image to be recognized using a preset image segmentation model, and to obtain the image segmentation result, containing the bank card number region, that is output by the preset image segmentation model;
wherein the image processing unit is further configured to perform multiple depth and width recognition processes on the image segmentation result using a pre-configured dual-channel image recognition model with depth and width, to obtain the recognition result of the bank card image to be recognized;
wherein the pre-configured dual-channel image recognition model with depth and width includes a depth classification network model and a width classification network model;
wherein the image processing unit is specifically configured to:
perform multiple feature extraction operations and max pooling operations on the image segmentation result using the preset asymmetric convolutional residual block structure, the preset asymmetric convolutional residual channel attention module, and the corresponding max pooling layer in the depth classification network model, to obtain a first feature image;
perform multiple feature extraction operations and max pooling operations on the image segmentation result using the preset multi-scale width convolution module and the corresponding max pooling layer in the width classification network model, to obtain a second feature image; and
concatenate the first feature image and the second feature image to obtain the recognition result of the bank card image to be recognized;
wherein the preset multi-scale width convolution module includes a first branch, a second branch, and a third branch;
wherein the image processing unit is further configured to:
input the image segmentation result into the first branch to obtain a first result, the first branch performing, in sequence, a bi-channel convolution operation, a batch normalization operation, and a ReLU activation operation on the input image segmentation result;
input the image segmentation result into the second branch to obtain a second result, the second branch performing, in sequence, a bi-channel convolution operation, a batch normalization operation, a ReLU activation operation, a bi-channel convolution operation, a batch normalization operation, and a ReLU activation operation on the input image segmentation result;
input the image segmentation result into the third branch to obtain a third result;
concatenate and fuse the first result, the second result, and the third result to obtain a concatenation and fusion result;
perform a max pooling operation on the concatenation and fusion result using the max pooling layer to obtain a pooling operation result; and
input the pooling operation result into the first branch, the second branch, and the third branch respectively, and concatenate and fuse the corresponding results to obtain the second feature image.

9. An electronic device, comprising a processor and a memory communicatively connected to the processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 7.

11. A computer program product comprising a computer program that, when executed by a processor, implements the method of any one of claims 1 to 7.