Image recognition methods and systems

By constructing a deep learning network model based on CNN and LSTM, and directly inputting the entire CAPTCHA image for training, the problem of low recognition accuracy caused by character overlap, noise, and distortion is solved, achieving efficient CAPTCHA recognition and reducing the need for manual maintenance.

CN116416486B · Active · Publication Date: 2026-06-30 · CHINA MOBILE INFORMATION TECHNOLOGY CO LTD +1

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA MOBILE INFORMATION TECHNOLOGY CO LTD
Filing Date: 2022-01-04
Publication Date: 2026-06-30

Application Information

Patent Timeline

04 Jan 2022: Application
30 Jun 2026: Publication (CN116416486B)

IPC: G06V10/774; G06V10/82; G06N3/0464; G06N3/049; G06N3/08


AI Technical Summary

Technical Problem

Existing CAPTCHA recognition techniques suffer from low image-matching accuracy caused by character overlap, distortion, and noise, and both character segmentation and character-library maintenance require a great deal of manual effort.

Method used

An image recognition method based on deep learning network models is adopted. A deep learning network model is constructed using convolutional neural network (CNN) and long short-term memory network (LSTM). The model is directly trained by inputting the whole image, which solves the problems of character overlap, background noise and distortion. The CAPTCHA recognition is achieved through feature extraction, fusion and decoding.

Benefits of technology

No character segmentation is required; the model can be applied immediately after training, achieving an accuracy rate of over 97%. It saves manpower for maintenance and is adaptable to CAPTCHA images in different scenarios.

✦ Generated by Eureka AI based on patent content.

Figure CN116416486B_ABST (abstract drawing)

Abstract

This invention provides an image recognition method and system. The method includes: constructing a deep learning network model based on a convolutional neural network (CNN) and a long short-term memory network (LSTM); training the deep learning network model to determine a CAPTCHA recognition model; and inputting a CAPTCHA image to be recognized into the CAPTCHA recognition model to obtain a recognition result for the CAPTCHA image. The system is used to execute the above method. This invention, based on a deep learning network model, uses the entire image as input, eliminating the need for character segmentation. Problems such as high background noise, character overlap, and character distortion in CAPTCHA images can be solved through model training. Once trained, the model requires no maintenance or updates. Furthermore, the CAPTCHA recognition model, trained based on the deep learning network model constructed using CNN and LSTM, can achieve classification and extraction of the CAPTCHA image to be recognized.


Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to an image recognition method and system.

Background Technology

[0002] To intercept malicious requests, business systems are designed with relatively complex image CAPTCHAs, which the traditional recognition capability of the electronic-channel detection (probing) system cannot handle, so the system can no longer meet its access-monitoring requirements. As a result, the detection service is unable to perform certain functional availability tests.

[0003] The existing technology is based on image comparison. First, the CAPTCHA is segmented into individual characters. Second, each character is compared with the image of every character in a character database, and the character with the highest similarity is selected as the recognition result (maintaining this character database requires manual data collection and full annotation). Third, the recognized characters are concatenated in order to form the final result. The flowchart is shown in Figure 1.
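For contrast with the invention, the three-step prior-art pipeline can be sketched in Python as follows. This is a minimal illustration, not the prior-art system itself: it assumes binary bitmaps, a hypothetical column-projection segmenter (`segment_columns`), and a simple pixel-agreement similarity; real systems used their own segmentation and matching rules. It also shows why touching characters defeat the approach: they come out as a single segment.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_columns(img: Bitmap) -> List[Bitmap]:
    """Step 1 (hypothetical): split characters at empty columns (vertical projection)."""
    width = len(img[0])
    ink = [any(row[x] for row in img) for x in range(width)]
    chars, start = [], None
    for x, has_ink in enumerate(ink + [False]):  # sentinel closes the last run
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            chars.append([row[start:x] for row in img])
            start = None
    return chars


def similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of identical pixels; 0.0 if the shapes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / (len(a) * len(a[0]))


def recognize(img: Bitmap, library: Dict[str, Bitmap]) -> str:
    """Steps 2 and 3: best template match per segment, then concatenate in order."""
    result = []
    for glyph in segment_columns(img):
        best = max(library, key=lambda ch: similarity(glyph, library[ch]))
        result.append(best)
    return "".join(result)
```

Every character template in `library` must be collected and annotated by hand, which is the maintenance cost the invention aims to remove.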

[0004] Image comparison methods have the following problems:

[0005] 1. CAPTCHAs whose characters overlap, and therefore cannot be segmented, cannot be recognized;

[0006] 2. Distortion and noise reduce the accuracy of image comparison, which can easily lead to the inability to carry out detection operations;

[0007] 3. Segmenting images and annotating the entire character library require a great deal of manpower.

Summary of the Invention

[0008] The image recognition method and system provided by this invention are used to solve at least one of the above-mentioned problems in the prior art. Based on a deep learning network model, the input information is the entire image, and there is no need to perform character segmentation. Problems such as high background noise, character overlap, and character distortion in CAPTCHA images can all be solved through model training. Once the model is trained, there is no need for maintenance or updates. At the same time, the CAPTCHA recognition model is trained based on a deep learning network model constructed with CNN and LSTM, which can realize the classification and extraction of CAPTCHA images to be recognized.

[0009] The present invention provides an image recognition method, comprising:

[0010] A deep learning network model is constructed based on a convolutional neural network (CNN) and a long short-term memory network (LSTM);

[0011] The deep learning network model is trained to determine the CAPTCHA recognition model;

[0012] The CAPTCHA image to be recognized is input into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image to be recognized.

[0013] According to an image recognition method provided by the present invention, the step of constructing a deep learning network model based on a convolutional neural network (CNN) and a long short-term memory network (LSTM) includes:

[0014] Establish a simple network structure consisting of multiple stacked CNN layers;

[0015] Establish a complex network structure consisting of multiple convolutional blocks, multiple skip connection layers, and one fully connected layer;

[0016] Based on the simple network structure and the complex network structure, the feature extraction layer of the deep learning network model is constructed;

[0017] Based on the LSTM, construct the feature fusion layer of the deep learning network model;

[0018] The number of CNN layers required to build the simple network structure is determined by the application scenario of the CAPTCHA image to be recognized.

[0019] According to an image recognition method provided by the present invention, training the deep learning network model to determine a CAPTCHA recognition model includes:

[0020] The collected CAPTCHA images are divided into a training set, a test set, and a validation set, and each set is labeled;

[0021] The training set is input into the deep learning network model for training, and training is stopped when the loss function of the deep learning network model reaches a first preset value and the recognition accuracy of the deep learning network model reaches a second preset value;

[0022] The CAPTCHA recognition model is determined based on the trained deep learning network model;

[0023] The recognition accuracy is determined by inputting the test set into the trained deep learning network model;

[0024] The validation set is used to verify the recognition accuracy of the CAPTCHA recognition model.

[0025] According to an image recognition method provided by the present invention, the application scenario of the CAPTCHA image to be recognized is determined in the following manner:

[0026] The application scenario of the CAPTCHA image to be recognized is determined based on the image size, background, and character clarity of the CAPTCHA image to be recognized.

[0027] According to an image recognition method provided by the present invention, the loss function of the deep learning network model is obtained in the following manner:

[0028] Determine the application scenarios for each CAPTCHA image in the training set;

[0029] If the application scenario is a simple scenario, then the loss function is determined based on the cross-entropy function;

[0030] If the application scenario is a complex scenario, then the loss function is determined based on the CTC loss function.

[0031] According to an image recognition method provided by the present invention, the step of inputting a CAPTCHA image to be recognized into the CAPTCHA recognition model to obtain a recognition result of the CAPTCHA image to be recognized includes:

[0032] Based on the feature extraction layer in the CAPTCHA recognition model, feature extraction is performed on the CAPTCHA image to be recognized;

[0033] Based on the feature fusion layer in the CAPTCHA recognition model, feature fusion is performed on the feature extraction results;

[0034] Based on the CTC algorithm in the CAPTCHA recognition model, the feature fusion result is decoded;

[0035] The recognition result is determined based on the decoding result.

[0036] The present invention also provides an image recognition system, comprising: a model building module, a model determination module, and an image recognition module;

[0037] The model building module is used to build deep learning network models based on convolutional neural networks (CNN) and long short-term memory networks (LSTM).

[0038] The model determination module is used to train the deep learning network model to determine the CAPTCHA recognition model;

[0039] The image recognition module is used to input the CAPTCHA image to be recognized into the CAPTCHA recognition model in order to obtain the recognition result of the CAPTCHA image to be recognized.

[0040] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of any of the image recognition methods described above.

[0041] The present invention also provides a processor-readable storage medium storing a computer program for causing the processor to execute the steps of any of the image recognition methods described above.

[0042] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the image recognition methods described above.

[0043] The image recognition method and system provided by this invention are based on a deep learning network model. The input information is the entire image, which does not require character segmentation. Problems such as high background noise, character overlap, and character distortion in CAPTCHA images can be solved through model training. Once the model is trained, it does not require maintenance or updates. At the same time, the CAPTCHA recognition model is trained based on a deep learning network model constructed with CNN and LSTM, which can realize the classification and extraction of CAPTCHA images to be recognized.

Attached Figure Description

[0044] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0045] Figure 1 is a flowchart of the CAPTCHA recognition method of the prior art;

[0046] Figure 2 is the first flowchart of the image recognition method provided by the present invention;

[0047] Figure 3 is the second flowchart of the image recognition method provided by the present invention;

[0048] Figure 4 is a flowchart of the training process of the deep learning network model provided by the present invention;

[0049] Figure 5 is the first schematic diagram of the CAPTCHA image recognition results provided by the present invention;

[0050] Figure 6 is the second schematic diagram of the CAPTCHA image recognition results provided by the present invention;

[0051] Figure 7 shows the first application scenario of the CAPTCHA image provided by the present invention;

[0052] Figure 8 shows the second application scenario of the CAPTCHA image provided by the present invention;

[0053] Figure 9 is a flowchart of the process of verifying the accuracy of the CAPTCHA recognition model provided by the present invention;

[0054] Figure 10 is a schematic diagram of the image recognition system provided by the present invention;

[0055] Figure 11 is a schematic diagram of the physical structure of the electronic device provided by the present invention.

Detailed Implementation

[0056] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0057] Figure 2 is a flowchart of the image recognition method provided by the present invention. As shown in Figure 2, the method includes:

[0058] S1. Construct a deep learning network model based on the Convolutional Neural Network (CNN) and the Long Short-Term Memory Network (LSTM).

[0059] S2. Train the deep learning network model to determine the CAPTCHA recognition model;

[0060] S3. Input the CAPTCHA image to be recognized into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image to be recognized.

[0061] It should be noted that the above method can be implemented by computer equipment.

[0062] Optionally, a deep learning network model consisting of a convolutional neural network (CNN) and a long short-term memory network (LSTM) can be used as the CAPTCHA recognition model. Specifically, the CAPTCHA recognition model is obtained by training the deep learning network model based on CNN and LSTM.

[0063] Compared with traditional machine learning methods and traditional neural networks, convolutional neural networks (CNNs) feature local perception (local receptive fields) and parameter sharing, which makes them better suited to capturing image features. They are commonly used in image-related tasks such as feature extraction and image classification.

[0064] Compared with traditional neural networks and standard RNNs, the LSTM adds a cell state C that propagates from the beginning to the end of the sequence and carries the semantic information of the entire sequence. The LSTM can therefore capture longer-range dependencies within the sequence. In this invention, the LSTM layer is mainly responsible for encoding and fusing the features extracted by the CNN layers, combining them into more generalized features.
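The gating described above can be illustrated with a single LSTM time step in plain Python. This is a textbook LSTM cell offered as an illustrative sketch, not the model's actual TensorFlow layer; the parameter layout (a `(W, U, b)` tuple per gate) and the function names are assumptions. The list `c` is the cell state C that carries information along the sequence.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W for input x, U for previous h, bias b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W x + U h + b row by row."""
    return [
        sum(wi * xi for wi, xi in zip(W[r], x)) + sum(ui * hi for ui, hi in zip(U[r], h)) + b[r]
        for r in range(len(b))
    ]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One time step; params holds gates 'i' (input), 'f' (forget), 'o' (output), 'g' (candidate)."""
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]   # keep current information
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]   # keep previous information
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]   # expose the current state
    g = [math.tanh(v) for v in affine(*params["g"], x, h_prev)]
    c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]  # cell state C
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


def lstm_sequence(xs: List[Vector], hidden: int, params: Dict[str, GateParams]) -> List[Vector]:
    """Run the cell over a feature sequence (e.g. CNN feature columns), one output per step."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

With the forget gate saturated open and the input gate closed, `c` passes through a step unchanged, which is how the C state can carry information across long distances.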

[0065] By inputting the CAPTCHA image to be recognized into the CAPTCHA recognition model, the recognition result of the CAPTCHA image can be obtained, such as the characters in the CAPTCHA image.

[0066] The CAPTCHA recognition model can be encapsulated with the Tornado framework and exposed to the probe program through an HTTP interface. Specifically, TensorFlow's model-serving mechanism is used to hot-deploy the CAPTCHA recognition model: the model is loaded into memory, and the interface service is provided externally through Tornado's HTTP service. In this way, when the probe program collects a CAPTCHA image, it can call the encapsulated CAPTCHA recognition model directly through the HTTP interface to recognize that image.
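The interface described above uses Tornado and TensorFlow; the sketch below instead uses only Python's standard `http.server` module as a stand-in so that it runs without extra dependencies. The `/recognize` path, the JSON field `image` (base64-encoded bytes), and the `recognize` callable standing in for the loaded model are all hypothetical.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Tuple

Recognizer = Callable[[bytes], str]  # hypothetical wrapper around the loaded model


def handle_request(body: bytes, recognize: Recognizer) -> Tuple[int, bytes]:
    """Parse {"image": <base64>} and return (HTTP status, JSON response body)."""
    try:
        payload = json.loads(body)
        image = base64.b64decode(payload["image"], validate=True)
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected JSON with base64 'image'"}).encode()
    return 200, json.dumps({"result": recognize(image)}).encode()


def make_handler(recognize: Recognizer):
    """Build a request handler class bound to one model instance kept in memory."""
    class CaptchaHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/recognize":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            status, data = handle_request(self.rfile.read(length), recognize)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)
    return CaptchaHandler
```

To serve it, one could run `HTTPServer(("127.0.0.1", 8080), make_handler(model_fn)).serve_forever()` (with `HTTPServer` from `http.server`), where `model_fn` is a hypothetical function wrapping the loaded model. In the described deployment, Tornado's HTTP service plays this role, and the probe program posts each collected CAPTCHA image to it.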

[0067] The image recognition method provided by this invention is based on a deep learning network model. The input information is the entire image, eliminating the need for character segmentation. Problems such as high background noise, character overlap, and character distortion in CAPTCHA images can all be solved through model training. Once the model is trained, it requires no maintenance or updates. Furthermore, the CAPTCHA recognition model is trained based on a deep learning network model constructed using CNN and LSTM, enabling the classification and extraction of CAPTCHA images to be recognized.

[0068] Furthermore, in one embodiment, step S1 may specifically include:

[0069] S11. Establish a simple network structure consisting of multiple stacked CNN layers;

[0070] S12. Establish a complex network structure consisting of multiple convolutional blocks, multiple skip connection layers, and one fully connected layer;

[0071] S13. Construct the feature extraction layer of the deep learning network model based on the simple and complex network structures;

[0072] S14. Construct a feature fusion layer for a deep learning network model based on LSTM;

[0073] The number of CNN layers required to build a simple network structure is determined by the application scenario of the CAPTCHA image to be recognized.

[0074] Furthermore, in one embodiment, the application scenario of the CAPTCHA image to be recognized is determined as follows:

[0075] The application scenario of the CAPTCHA image to be recognized is determined based on its image size, background, and character clarity.
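The scenario rule above, together with the CNN-depth rule in [0083] and the loss-function rule in [0112]-[0113], can be expressed as a small configuration step. The patent names the criteria (image size, background, character clarity) but gives no thresholds, so the numeric thresholds, the 0-1 noise and clarity scores, and the `deep_layers` argument standing in for the unspecified X are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaptchaStats:
    width: int               # pixels
    height: int              # pixels
    background_noise: float  # assumed score: 0.0 plain .. 1.0 very busy
    char_clarity: float      # assumed score: 0.0 distorted/stuck .. 1.0 square and clear


# Hypothetical thresholds; the patent does not give numbers.
MAX_SIMPLE_AREA = 120 * 40
MAX_SIMPLE_NOISE = 0.2
MIN_SIMPLE_CLARITY = 0.8


def classify_scenario(s: CaptchaStats) -> str:
    """Small image, simple background, and clear characters -> simple; otherwise complex."""
    simple = (
        s.width * s.height <= MAX_SIMPLE_AREA
        and s.background_noise <= MAX_SIMPLE_NOISE
        and s.char_clarity >= MIN_SIMPLE_CLARITY
    )
    return "simple" if simple else "complex"


def training_config(scenario: str, deep_layers: int) -> dict:
    """Simple: 5 CNN layers with cross-entropy; complex: X CNN layers with CTC loss."""
    if scenario == "simple":
        return {"cnn_layers": 5, "loss": "cross_entropy"}
    return {"cnn_layers": deep_layers, "loss": "ctc"}
```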

[0076] Optionally, before constructing a deep learning network model based on a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, the development environment must first be configured, as follows:

[0077] When developing with Python, the following environment is required:

[0078] Python 3.8, Anaconda 3, CUDA 10.2, TensorFlow 1.14, and Tornado 6.0.

[0079] A deep learning network model, specifically a CNN+LSTM structure, was used to construct an end-to-end CAPTCHA recognition model with a unified overall network structure.

[0080] Constructing the feature extraction layer network structure:

[0081] The construction of the feature extraction layer network structure includes:

[0082] Step 1: Establish a simple network structure:

[0083] Two feature extraction network structures were developed according to the CAPTCHA style. The simple network is composed of multiple stacked CNN layers (e.g., 5 or X layers), and the number of layers is determined by the application scenario of the CAPTCHA image to be recognized. The application scenario is determined by the image size, background, and character clarity of the image: for a simple scenario, a simple network structure composed of 5 CNN layers is used; for a complex scenario, a simple network structure composed of X CNN layers is used.

[0084] The simple network structure consisting of 5 stacked CNN layers is as follows:

[0085] First layer: Convolution kernel: 7, Number of convolution kernels: 32, Stride: (1, 1);

[0086] Second layer: Convolution kernel: 5, Number of convolution kernels: 64, Stride: (1, 1);

[0087] Third layer: Convolution kernel: 3, Number of convolution kernels: 128, Stride: (1, 1);

[0088] Fourth layer: Convolution kernel: 3, Number of convolution kernels: 128, Stride: (1, 1);

[0089] Fifth layer: Convolution kernel: 3, Number of convolution kernels: 64, Stride: (1, 1).
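To make the five-layer stack concrete, the following sketch traces output shapes and trainable parameter counts. It assumes a single-channel (grayscale) input, "same" padding, and no pooling, none of which the patent specifies. Because each kernel's weights are shared across all image positions, the parameter count (348,096 for this configuration) does not depend on image size, which is the parameter-sharing property mentioned in [0063].

```python
# (kernel_size, num_kernels, stride) for the five stacked layers listed above.
CNN5 = [(7, 32, 1), (5, 64, 1), (3, 128, 1), (3, 128, 1), (3, 64, 1)]


def trace_cnn(layers: list, height: int, width: int, in_channels: int = 1) -> list:
    """Return (out_height, out_width, channels, params) per layer, assuming 'same' padding."""
    rows = []
    channels = in_channels
    for k, n, s in layers:
        height = -(-height // s)  # ceil(height / stride) under 'same' padding
        width = -(-width // s)
        params = k * k * channels * n + n  # shared kernel weights + one bias per kernel
        rows.append((height, width, n, params))
        channels = n
    return rows


for i, (h, w, c, p) in enumerate(trace_cnn(CNN5, 60, 160), start=1):
    print(f"layer {i}: {h}x{w}x{c}, {p} params")
```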

[0090] Step 2: Establish a complex network structure:

[0091] The complex network consists of multiple convolutional blocks, multiple skip connection blocks, and one fully connected layer. For example, it may contain two convolutional blocks, each containing a convolutional layer (kernel size 7, 16 kernels, stride 1) and a batch normalization (BN) layer, which speeds up training and convergence of the network and helps prevent vanishing gradients and overfitting; and four skip connection blocks, each containing two convolutional layers, two BN layers, and a 1×1 convolutional layer.
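A skip connection block of the kind listed above can be sketched in plain Python on small `[channels][height][width]` lists. Assumptions: ReLU activations, stride-1 convolutions with zero padding, BN omitted for brevity, and the 1×1 convolution placed on the shortcut so that its channel count matches the main path. This wiring is a common residual-network arrangement; the patent lists the layers but not how they are connected.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def conv2d_same(x: Tensor3, w, b) -> Tensor3:
    """Stride-1 convolution with zero 'same' padding; w: [c_out][c_in][k][k], odd k."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    p = k // 2
    out = []
    for co in range(len(w)):
        fmap = []
        for i in range(h):
            row = []
            for j in range(wd):
                s = b[co]
                for ci in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - p, j + dj - p
                            if 0 <= ii < h and 0 <= jj < wd:
                                s += w[co][ci][di][dj] * x[ci][ii][jj]
                row.append(s)
            fmap.append(row)
        out.append(fmap)
    return out


def relu(x: Tensor3) -> Tensor3:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def add(a: Tensor3, b: Tensor3) -> Tensor3:
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)] for ca, cb in zip(a, b)]


def skip_block(x: Tensor3, w1, b1, w2, b2, w_proj, b_proj) -> Tensor3:
    """Two convolutions on the main path plus a 1x1 projection on the shortcut (BN omitted)."""
    main = conv2d_same(relu(conv2d_same(x, w1, b1)), w2, b2)
    shortcut = conv2d_same(x, w_proj, b_proj)  # 1x1 conv: only mixes channels
    return relu(add(main, shortcut))
```

The shortcut lets the block's input reach its output directly, so gradients can flow past the two convolutions, which is why such blocks ease training of deeper feature extractors.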

[0092] Step 3: Perform feature fusion using an LSTM network structure:

[0093] When the characters in a CAPTCHA image are small, the features obtained from multiple CNN layers alone are of limited effectiveness (e.g., characters such as 's' and '5' cannot be recognized). To address this, an LSTM network structure is introduced after the CNN network. The LSTM consists of an input gate (selectively retaining current information), a forget gate (selectively retaining previous information), and an output gate (producing the state at the current time step). The output features of the convolutional modules are fed into the LSTM to obtain a sequence and its length. The sequence and its length are then passed to the `beam_search_decoder` function, which predicts a sparse tensor. Finally, the sparse tensor is converted back into a sequence and decoded into the characters A-Z, a-z, and 0-9. Fusing and re-encoding the CNN features with the LSTM in this way forms new features that effectively improve the model's recognition accuracy.
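In TensorFlow 1.x, the decoder referred to above is presumably `tf.nn.ctc_beam_search_decoder`, which returns the decoded paths as `SparseTensor`s with `indices`, `values`, and `dense_shape`. The plain-Python sketch below shows only the last step, turning such a sparse result into one string per image, with tuples standing in for the tensor fields. The character set and its index order are assumptions; the real mapping is fixed when the model is trained.

```python
import string
from typing import List, Sequence, Tuple

# Assumed character set and index order: A-Z, then a-z, then 0-9 (62 classes).
CHARSET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def sparse_to_strings(indices: Sequence[Tuple[int, int]], values: Sequence[int],
                      dense_shape: Tuple[int, int], charset: str = CHARSET) -> List[str]:
    """Map a decoded sparse result, (batch, position) -> class index, to strings."""
    batch, max_len = dense_shape
    rows = [[None] * max_len for _ in range(batch)]
    for (b, t), v in zip(indices, values):
        rows[b][t] = charset[v]
    # Positions never written are padding for shorter CAPTCHAs and are skipped.
    return ["".join(ch for ch in row if ch is not None) for row in rows]
```

Because each row may end early, this naturally handles variable-length CAPTCHAs in the same batch.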

[0094] The image recognition method provided by this invention introduces a deep learning network model and provides two network structures suited to capturing the character features of image CAPTCHAs. This solves problems that the existing technology cannot address and enables recognition of variable-length CAPTCHAs, including those with high background noise, overlapping characters, and distorted characters.

[0095] Furthermore, in one embodiment, step S2 may specifically include:

[0096] S21. Divide the collected CAPTCHA images into a training set, a test set, and a validation set, and label each of them;

[0097] S22. Input the training set into the deep learning network model for training, and stop training when the loss function of the deep learning network model reaches the first preset value and the recognition accuracy of the deep learning network model reaches the second preset value.

[0098] S23. Determine the CAPTCHA recognition model based on the trained deep learning network model;

[0099] The recognition accuracy is determined by inputting the test set into the trained deep learning network model;

[0100] The validation set is used to verify the recognition accuracy of the CAPTCHA recognition model.

[0101] Optionally, as shown in Figure 3, CAPTCHA images can be obtained by collecting data from the pages of various business systems. The filename format of each CAPTCHA image is checked and, if incorrect, corrected; the CAPTCHA images are then divided into training, test, and validation sets. Python web crawlers are used to generate a data script for each business system, which is then used to crawl CAPTCHA images from that system. At least 11,000 images are collected from each business system and divided into a test set of 2,000 images, a training set of 8,000 images, and a validation set of 1,000 images. The test, training, and validation sets are manually labeled (the text in each CAPTCHA image consists of 3 to 6 characters randomly selected from uppercase letters, lowercase letters, and the 10 digits).
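The filename check and the 8,000/2,000/1,000 split can be sketched as follows. The patent does not state the filename format, so the `<label>_<id>.png` convention (with a 3-6 character alphanumeric label), the function names, and the fixed shuffle seed are hypothetical.

```python
import random
import re
from typing import Dict, List, Optional

# Hypothetical naming convention: "<label>_<id>.png", label = 3-6 letters or digits.
FILENAME_RE = re.compile(r"^([A-Za-z0-9]{3,6})_[^/]+\.png$")


def label_from_filename(name: str) -> Optional[str]:
    """Return the label encoded in a filename, or None if the format is wrong."""
    m = FILENAME_RE.match(name)
    return m.group(1) if m else None


def split_dataset(files: List[str], n_train: int = 8000, n_test: int = 2000,
                  n_val: int = 1000, seed: int = 0) -> Dict[str, List[str]]:
    """Keep well-formed files, shuffle them, and cut disjoint train/test/validation sets."""
    valid = [f for f in files if label_from_filename(f) is not None]
    needed = n_train + n_test + n_val
    if len(valid) < needed:
        raise ValueError(f"need {needed} images, got {len(valid)}")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(valid)
    return {
        "train": valid[:n_train],
        "test": valid[n_train:n_train + n_test],
        "validation": valid[n_train + n_test:needed],
    }
```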

[0102] The deep learning network model is trained in a supervised manner using an iterative training scheme. First, the model is trained on a small amount of labeled data (the training set). The trained model is then used to label new data, which, after manual verification and correction, is added back to the training set for convergence training. The parameters of the deep learning network model are adjusted until convergence (i.e., until the loss function of the deep learning network model reaches the first preset value). After convergence, the model's recognition accuracy is checked against the target (i.e., whether it reaches the second preset value); once the target is reached, the model is deployed online. Different categories of CAPTCHA images are labeled and trained separately. If the model has not converged, a data script is generated from the model's output data and added to the original training set for further training.

[0103] It should be noted that the recognition accuracy of the deep learning network model is determined by inputting the test set into the deep learning network model and comparing the recognition results with the manually labeled results of the test set.
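The accuracy comparison described above could be computed as follows. The patent does not state whether whole strings or individual characters are scored; this sketch assumes a case-sensitive exact match on the whole CAPTCHA string.

```python
from typing import Sequence


def sequence_accuracy(predicted: Sequence[str], labeled: Sequence[str]) -> float:
    """Share of CAPTCHAs whose predicted string equals the manual label exactly."""
    if len(predicted) != len(labeled):
        raise ValueError("predictions and labels must be aligned")
    if not labeled:
        return 0.0
    return sum(p == t for p, t in zip(predicted, labeled)) / len(labeled)
```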

[0104] Furthermore, traditional training methods require preparing all the labeled data at once before training and validating the model. This approach means that complex CAPTCHAs require a large amount of labeled data, resulting in a significant investment of manpower in data labeling.

[0105] A novel iterative training method is adopted: first, a small amount of labeled data is used to train the deep learning network model; then, the deep learning network model is used to label the data; finally, manual verification and correction are performed before the data is combined with the training set to train the deep learning network model for convergence. This saves significant manpower and enables rapid deployment, shortening the deployment cycle. It supports production, expands the probing capabilities, and allows for functional availability probing of the entire network, helping to promptly identify problems in business systems.
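The iterative labeling loop of [0102] and [0105] can be outlined as a higher-order function. Every callable here (`train`, `predict`, `review`, `converged`) is a hypothetical placeholder: `review` stands for the manual verification and correction step, and `converged` for the two stopping conditions (loss at the first preset value, accuracy at the second).

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (image path, label)


def iterative_training(
    labeled: List[Sample],
    unlabeled_batches: List[List[str]],
    train: Callable[[List[Sample]], object],
    predict: Callable[[object, str], str],
    review: Callable[[str, str], str],
    converged: Callable[[object], bool],
) -> Tuple[object, List[Sample]]:
    """Train on a small labeled set, let the model label new images, have a person
    confirm or correct the labels, merge them, and retrain until `converged` holds."""
    training_set = list(labeled)
    model = train(training_set)
    for batch in unlabeled_batches:
        if converged(model):
            break
        # The model proposes labels; the reviewer only fixes mistakes.
        reviewed = [(img, review(img, predict(model, img))) for img in batch]
        training_set.extend(reviewed)
        model = train(training_set)
    return model, training_set
```

The manpower saving comes from `review` being cheaper than labeling from scratch: a person corrects the model's proposals instead of typing every label.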

[0106] Based on the above steps, a CAPTCHA recognition model capable of recognizing variable-length CAPTCHA images was implemented. The model is trained iteratively in a supervised manner, and each type of CAPTCHA is labeled, trained, and tuned separately. Depending on the CAPTCHA style, the model name, character type (uppercase letters; uppercase and lowercase letters and digits; lowercase letters; digits; uppercase alphanumeric; lowercase alphanumeric; uppercase and lowercase letters), image width, and image height are set to improve recognition accuracy for each single category of CAPTCHA. During training, parameters such as the number of CNN layers, filter size, number of pooling layers, and number of LSTM neurons are adjusted according to the complexity of each CAPTCHA style. The parameter-tuning method and its specific steps are shown in Figure 4:

[0107] Input the training set into the deep learning network model and configure the model parameters: select the number of CNN layers, configure the character categories according to the number of characters and the character style of the CAPTCHA, and set the training termination condition (e.g., a recognition accuracy of 95% and a loss function value of 0.5). Start training and check whether the model has converged. If not, regenerate the training set by mixing the model's output data with the original training set, and repeat until convergence. Then check whether the recognition accuracy is lower than 80%; if so, regenerate the training set by mixing new data labeled by the data script with the original training set. Otherwise, stop training once the recognition accuracy reaches 95%; if it is still below 95%, adjust the model parameters and continue the above training process.
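The Figure 4 decision flow can be written as a single function that returns the next action. The thresholds default to the example values in [0107] (loss 0.5, accuracy 95%, floor 80%), and treating "converged" as "loss at or below the loss target" follows [0102]. The action names are illustrative.

```python
def next_step(loss: float, accuracy: float, loss_target: float = 0.5,
              acc_target: float = 0.95, acc_floor: float = 0.80) -> str:
    """Return the next action of the tuning loop after a training round."""
    if loss > loss_target:
        # Not converged: mix the model's own output data into the training set.
        return "augment_with_model_output"
    if accuracy < acc_floor:
        # Converged but far below target: add new data labeled by the data script.
        return "augment_with_script_labels"
    if accuracy >= acc_target:
        return "stop"
    # Between floor and target: adjust CNN depth, filter size, LSTM units, and so on.
    return "tune_parameters"
```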

[0108] The trained deep learning network model is used as the CAPTCHA recognition model. Its recognition results for different types of CAPTCHAs are shown in Figure 5 and Figure 6.

[0109] The image recognition method provided by this invention is based on supervised deep learning network model training and iterative training. Different types of CAPTCHA images are individually labeled, trained and parameter-tuned, which improves the recognition accuracy of single-category CAPTCHAs.

[0110] Furthermore, in one embodiment, the loss function of the deep learning network model is obtained as follows:

[0111] Determine the application scenarios for each CAPTCHA image in the training set;

[0112] If the application scenario is simple, then the loss function is determined based on the cross-entropy function;

[0113] If the application scenario is complex, the loss function is determined based on the CTC loss function.

[0114] Optionally, during the convergence training of the deep learning network model, different loss functions are introduced according to the application scenarios of each CAPTCHA image in the training set. Specifically:

[0115] In the simple application scenario shown in Figure 7, CAPTCHA images are small, with simple backgrounds and square, clear characters. For training on this type of CAPTCHA image, the cross-entropy function can be used as the loss function of the deep learning network model.

[0116] In the complex application scenarios shown in Figure 8, CAPTCHA images are large or have complex backgrounds, and the characters may be distorted, stuck together, or deformed. This problem can be solved by introducing the CTC (Connectionist Temporal Classification) algorithm and using the CTC loss function as the loss function of the deep learning network model.

[0117] CTC is suited to sequence problems in which the alignment between input features and output labels is uncertain. It optimizes the model parameters and the alignment boundaries jointly and end-to-end. Compared with methods such as cross-entropy, CTC is better suited to variable-length decoding and is commonly used in scenarios such as speech recognition.
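To show how CTC sums over all possible alignments, here is a minimal forward-algorithm computation of the CTC loss for one sequence in plain Python. It is a textbook sketch, not TensorFlow's `tf.nn.ctc_loss`: it works in probability space (production implementations use log space for numerical stability), assumes the blank is class 0, and expects per-frame probabilities that already sum to 1.

```python
import math
from typing import Sequence


def ctc_loss(probs: Sequence[Sequence[float]], label: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood of `label` given T frames of class probabilities,
    summed over every alignment that collapses to `label`."""
    ext = [blank]
    for s in label:
        ext += [s, blank]  # extended label: blank, l1, blank, l2, ..., blank
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]  # start with a blank ...
    if S > 1:
        alpha[1] = probs[0][ext[1]]  # ... or with the first character
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]  # skip a blank between different characters
            new[s] = a * probs[t][ext[s]]
        alpha = new
    total = alpha[-1] + (alpha[-2] if S > 1 else 0.0)  # end on blank or last char
    return -math.log(total)
```

For example, with two frames of uniform probabilities over {blank, a}, the label "a" has three valid alignments (aa, a-, -a), so its likelihood is 0.75 and the loss is -ln 0.75. Repeated characters such as "aa" need a blank between them, so with three frames only "a-a" is valid.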

[0118] The LSTM produces one output for each input time step, so the number of outputs equals the number of inputs rather than the number of characters. CTC (Connectionist Temporal Classification) is introduced to solve the resulting alignment problem between input features and output labels. Because character spacing varies and characters are distorted in CAPTCHAs, the same character can appear in different forms, and distortion can cause one character to be recognized several times in a row. Therefore, after the deep learning network model is trained, the recognition results are post-processed by removing blank (spacing) characters and duplicate characters: if the same character appears in consecutive positions, it represents a single character; if blank characters separate the occurrences, the character appears multiple times.
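The post-processing rule above (merge consecutive repeats, then drop blanks) is standard CTC best-path decoding and can be sketched as follows. Using `-` as the blank symbol and an `alphabet` string indexed by class are assumptions for illustration.

```python
from typing import Sequence


def collapse(path: str, blank: str = "-") -> str:
    """Merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


def greedy_decode(probs: Sequence[Sequence[float]], alphabet: str, blank: str = "-") -> str:
    """Take the most likely class per frame; alphabet[k] is the symbol for class k."""
    path = "".join(alphabet[max(range(len(f)), key=f.__getitem__)] for f in probs)
    return collapse(path, blank)
```

For instance, the frame-level path `ss--55-5` collapses to `s55`: the repeated `s` merges into one, while the two `5` groups separated by a blank remain distinct characters.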

[0119] In the deep learning network model, CTC plays two roles. On the one hand, it serves as the loss function: the CTC loss is computed by a dynamic programming algorithm from the feature matrices obtained in Steps 2 and 3, and the model parameters are trained with the Adam algorithm. On the other hand, it acts as the decoder, converting the vectors produced by CNN and LSTM encoding of the CAPTCHA image into the corresponding characters, which enables decoding of variable-length CAPTCHA images and outputs the recognition result.

[0120] For simple scenarios, the deep learning network model is trained with a simple network structure of 5 CNN layers. For complex scenarios, it is trained with a simple network structure of X CNN layers. If a scenario is judged to be simple but the model with the 5-layer structure fails to reach the target accuracy after training, the model is retrained with the X-layer structure.
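The scenario-dependent choice of depth, with the fallback retraining described above, can be outlined as follows. This is a hypothetical sketch: `train_fn` stands in for an unspecified training routine that returns test accuracy for a given number of CNN layers, and `complex_depth` stands in for the unspecified X, so any concrete value passed for it is illustrative only.

```python
from typing import Callable, Tuple

SIMPLE_DEPTH = 5  # CNN5, as in the description


def train_for_scenario(
    scenario: str,
    train_fn: Callable[[int], float],
    target_accuracy: float,
    complex_depth: int,
) -> Tuple[int, float]:
    """Choose the CNN depth by scenario and fall back to the deeper structure.

    train_fn(depth) -> accuracy is a hypothetical hook that trains a model whose
    simple network structure has `depth` CNN layers and returns its test accuracy.
    complex_depth plays the role of the unspecified X.
    """
    if scenario not in ("simple", "complex"):
        raise ValueError(f"unknown scenario: {scenario!r}")
    depth = SIMPLE_DEPTH if scenario == "simple" else complex_depth
    accuracy = train_fn(depth)
    # Retrain a simple-scenario model with X layers if CNN5 misses the target
    if scenario == "simple" and accuracy < target_accuracy:
        depth = complex_depth
        accuracy = train_fn(depth)
    return depth, accuracy
```

With a hypothetical X of 8 and a 0.95 target, a simple-scenario model that reaches only 0.90 with CNN5 would be retrained with CNN8.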

[0121] The image recognition method provided by this invention selects different convergence conditions to train the model for CAPTCHA images in different application scenarios, thereby improving the convergence speed of the model.

[0122] Furthermore, in one embodiment, step S3 may specifically include:

[0123] S31. Based on the feature extraction layer in the CAPTCHA recognition model, perform feature extraction on the CAPTCHA image to be recognized;

[0124] S32. Based on the feature fusion layer in the CAPTCHA recognition model, perform feature fusion on the feature extraction results;

[0125] S33. Based on the CTC algorithm in the CAPTCHA recognition model, decode the feature fusion result;

[0126] S34. Determine the recognition result based on the decoding result.

[0127] Optionally, referring to Figure 9, the 1000 images in the validation set are input into the trained model to verify the final recognition result. Specifically, the labeled validation set is input into the CAPTCHA recognition model, and the application scenario of each CAPTCHA image in the validation set is determined. For a simple scenario, features are extracted by a feature extraction layer composed of the simple network structure with 5 CNN layers (CNN5) and the complex network structure. The feature extraction results are then fused by the LSTM-based feature fusion layer. Finally, the CTC algorithm in the CAPTCHA recognition model decodes the feature fusion result, and the recognition result obtained from the decoding is used to verify the recognition accuracy of the CAPTCHA recognition model.

[0128] For a complex scenario, features are extracted by a feature extraction layer composed of the simple network structure with X CNN layers (CNNX) and the complex network structure. The feature extraction results are then fused by the LSTM-based feature fusion layer. Finally, the CTC algorithm in the CAPTCHA recognition model decodes the feature fusion result, and the recognition result obtained from the decoding is used to verify the recognition accuracy of the CAPTCHA recognition model.
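The description does not define how recognition accuracy is measured. A common choice for CAPTCHAs, assumed here, is whole-string accuracy: an image counts as correct only if every character of the decoded string matches the label. The sketch also reports accuracy per application scenario, mirroring the separate simple and complex paths described above; the function names and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def sequence_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of CAPTCHAs whose entire decoded string equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("empty validation set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_by_scenario(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """records: (scenario, predicted string, labeled string) triples."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for scenario, predicted, label in records:
        totals[scenario] += 1
        if predicted == label:
            hits[scenario] += 1
    return {s: hits[s] / totals[s] for s in totals}
```

Whole-string accuracy is stricter than per-character accuracy, which matches how a CAPTCHA is judged in practice: one wrong character fails the whole code.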

[0129] The image recognition method provided by this invention is based on a deep learning network model whose input is the entire image, so no character segmentation is needed. Problems such as heavy background noise, character overlap, and character distortion can all be solved through model training. Once trained, the model can be deployed online without maintenance or updates, and its CAPTCHA recognition accuracy can exceed 97%.

[0130] The image recognition system provided by the present invention is described below. The image recognition system described below and the image recognition method described above may be referred to correspondingly.

[0131] Figure 10 is a schematic diagram of the image recognition system provided by the present invention. As shown in Figure 10, the system includes a model building module 1010, a model determination module 1011, and an image recognition module 1012;

[0132] The model building module 1010 is used to build a deep learning network model based on a convolutional neural network (CNN) and a long short-term memory network (LSTM).

[0133] The model determination module 1011 is used to train the deep learning network model to determine the CAPTCHA recognition model;

[0134] The image recognition module 1012 is used to input the CAPTCHA image to be recognized into the CAPTCHA recognition model in order to obtain the recognition result of the CAPTCHA image to be recognized.

[0135] The image recognition system provided by this invention is based on a deep learning network model whose input is the entire image, so no character segmentation is needed. Problems such as high background noise, character overlap, and character distortion in CAPTCHA images can all be solved through model training, and once the model is trained it requires no maintenance or updates. Furthermore, because the CAPTCHA recognition model is trained from a deep learning network model built on CNN and LSTM, it can extract and classify the features of the CAPTCHA image to be recognized.

[0136] Furthermore, in one embodiment, the model building module 1010 can also be specifically used for:

[0137] Establish a simple network structure consisting of multiple stacked CNNs;

[0138] Establish a complex network structure consisting of multiple convolutional blocks, multiple skip connection layers, and one fully connected layer;

[0139] Based on the simple network structure and the complex network structure, a feature extraction layer for a deep learning network model is constructed.

[0140] Based on LSTM, a feature fusion layer is constructed for a deep learning network model.

[0141] The number of CNN layers required to build a simple network structure is determined by the application scenario of the CAPTCHA image to be recognized.
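The LSTM-based feature fusion layer consumes the CNN feature map as a sequence and emits one hidden state per step, which is the sequence later passed to CTC. The following plain-Python sketch of a single-layer, unidirectional LSTM is illustrative only: the gate ordering (input, forget, candidate, output), the weight layout, and the treatment of each feature-map column as one time step are assumptions, since the description does not give the LSTM configuration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def lstm_forward(
    xs: Sequence[Sequence[float]],
    w: Sequence[Sequence[float]],
    b: Sequence[float],
    hidden: int,
) -> List[Vector]:
    """Run an LSTM over a feature sequence; return one hidden state per time step.

    xs[t]: feature vector for time step t (e.g. one column of the CNN feature map).
    w: 4*hidden rows, each of length len(xs[0]) + hidden (input, then previous h).
    b: 4*hidden biases, in the assumed gate order input, forget, candidate, output.
    """
    if len(w) != 4 * hidden or len(b) != 4 * hidden:
        raise ValueError("w and b must have 4*hidden rows")
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs: List[Vector] = []
    for x in xs:
        z = [zi + bi for zi, bi in zip(_matvec(w, list(x) + h), b)]
        i = [_sigmoid(v) for v in z[0:hidden]]               # input gate
        f = [_sigmoid(v) for v in z[hidden:2 * hidden]]      # forget gate
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]] # candidate cell state
        o = [_sigmoid(v) for v in z[3 * hidden:4 * hidden]]  # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs
```

The output list always has one entry per input step, which is the "as many outputs as inputs" property that makes an alignment method such as CTC necessary downstream.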

[0142] The application scenarios for the CAPTCHA images to be recognized are determined in the following way:

[0143] The application scenario of the CAPTCHA image to be recognized is determined based on its image size, background, and character clarity.
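The description names the criteria (image size, background, and character clarity) but gives no thresholds or measurement method. The sketch below shows one possible rule-based classifier; the `CaptchaTraits` fields, their 0-to-1 scales, and all threshold values are hypothetical placeholders that would have to be calibrated on real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptchaTraits:
    width: int                    # pixels
    height: int                   # pixels
    background_complexity: float  # 0 = plain .. 1 = cluttered (hypothetical measure)
    character_clarity: float      # 0 = heavily distorted .. 1 = crisp (hypothetical measure)


# Hypothetical thresholds; the description gives no numeric values.
MAX_SIMPLE_AREA = 200 * 80
MAX_SIMPLE_BACKGROUND = 0.3
MIN_SIMPLE_CLARITY = 0.7


def classify_scenario(traits: CaptchaTraits) -> str:
    """Return "simple" only if the image is small, plain, and clearly legible."""
    small = traits.width * traits.height <= MAX_SIMPLE_AREA
    plain = traits.background_complexity <= MAX_SIMPLE_BACKGROUND
    clear = traits.character_clarity >= MIN_SIMPLE_CLARITY
    return "simple" if (small and plain and clear) else "complex"
```

Treating any one failing criterion as "complex" follows the earlier characterization of complex scenarios as images that are large, or have complex backgrounds, or have distorted characters.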

[0144] The image recognition system provided by this invention introduces a deep learning network model with two network structures suited to capturing the character features of image CAPTCHAs. This addresses problems that existing technologies cannot, and enables recognition of variable-length CAPTCHAs, including those with high background noise, overlapping characters, and distorted characters.

[0145] Furthermore, in one embodiment, the model determination module 1011 may also be specifically used for:

[0146] The collected CAPTCHA images are divided into a training set, a test set, and a validation set, each of which is labeled;

[0147] The training set is input into the deep learning network model for training, and training is stopped when the loss function of the deep learning network model reaches the first preset value and the recognition accuracy of the deep learning network model reaches the second preset value.

[0148] Based on the trained deep learning network model, determine the CAPTCHA recognition model;

[0149] The recognition accuracy is determined by inputting the test set into the trained deep learning network model;

[0150] The validation set is used to verify the recognition accuracy of the CAPTCHA recognition model.
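The data split and the two-condition stopping rule can be sketched as follows. The 80/10/10 split ratios and the fixed random seed are assumptions (the description gives no ratios), and "the loss function reaches the first preset value" is interpreted here as the loss falling to or below that value.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Iterable[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # hypothetical train/test/validation
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split labeled samples into training, test, and validation sets."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values summing to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


def should_stop(loss: float, accuracy: float, max_loss: float, min_accuracy: float) -> bool:
    """Stop only when BOTH preset conditions hold (loss low enough AND accuracy high enough)."""
    return loss <= max_loss and accuracy >= min_accuracy
```

Requiring both conditions guards against stopping on a low training loss while test accuracy still lags.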

[0151] The image recognition system provided by this invention relies on supervised, iterative training of the deep learning network model. Each type of CAPTCHA image is labeled, trained, and parameter-tuned separately, which improves the recognition accuracy for each individual CAPTCHA category.

[0152] Furthermore, in one embodiment, the model determination module 1011 may also be specifically used for:

[0153] Determine the application scenarios for each CAPTCHA image in the training set;

[0154] If the application scenario is simple, then the loss function is determined based on the cross-entropy function;

[0155] If the application scenario is complex, the loss function is determined based on the CTC loss function.
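A sketch of the scenario-based loss selection follows. The description specifies cross-entropy for simple scenarios but not how it is applied; this example assumes a fixed-length output head that predicts one class distribution per character position, so cross-entropy is summed over positions. The function names are hypothetical, and the complex branch only returns a tag for the CTC loss rather than computing it.

```python
import math
from typing import Sequence


def fixed_length_cross_entropy(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """Sum of per-position cross-entropy terms.

    probs[k][c]: predicted probability of class c at character position k
    (assumed fixed-length head); label[k]: true class index at position k.
    """
    if len(probs) != len(label):
        raise ValueError("one probability row per character position is required")
    return -sum(math.log(probs[k][c]) for k, c in enumerate(label))


def select_loss(scenario: str) -> str:
    """Map the application scenario to the loss used for training."""
    if scenario == "simple":
        return "cross_entropy"
    if scenario == "complex":
        return "ctc"
    raise ValueError(f"unknown scenario: {scenario!r}")
```

Per-position cross-entropy presupposes that character k of the label always corresponds to output position k, which is plausible for fixed-length, well-spaced simple CAPTCHAs but not for the variable-length, distorted ones handled by CTC.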

[0156] The image recognition system provided by this invention selects different convergence conditions to train the model for CAPTCHA images in different application scenarios, thereby improving the convergence speed of the model.

[0157] Furthermore, in one embodiment, the image recognition module 1012 may also be specifically used for:

[0158] Based on the feature extraction layer in the CAPTCHA recognition model, feature extraction is performed on the CAPTCHA image to be recognized;

[0159] Based on the feature fusion layer in the CAPTCHA recognition model, feature fusion is performed on the feature extraction results;

[0160] Based on the CTC algorithm in the CAPTCHA recognition model, the feature fusion result is decoded;

[0161] The recognition result is determined based on the decoding result.

[0162] The image recognition system provided by this invention is based on a deep learning network model whose input is the entire image, so no character segmentation is needed. Problems such as heavy background noise, character overlap, and character distortion can all be solved through model training. Once trained, the model can be deployed online without maintenance or updates, and its CAPTCHA recognition accuracy can exceed 97%.

[0163] Figure 11 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in Figure 11, the electronic device may include a processor 1110, a communication interface 1111, a memory 1112, and a bus 1113, wherein the processor 1110, the communication interface 1111, and the memory 1112 communicate with each other via the bus 1113. The processor 1110 can call logical instructions in the memory 1112 to execute the following method:

[0164] Construct deep learning network models based on Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM);

[0165] Train a deep learning network model to determine the CAPTCHA recognition model;

[0166] The image of the CAPTCHA to be recognized is input into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image.

[0167] Furthermore, when the logical instructions in the aforementioned memory are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0168] Furthermore, this invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to perform the image recognition method provided in the above method embodiments, for example:

[0169] Construct deep learning network models based on Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM);

[0170] Train a deep learning network model to determine the CAPTCHA recognition model;

[0171] The image of the CAPTCHA to be recognized is input into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image.

[0172] In another aspect, the present invention also provides a processor-readable storage medium storing a computer program for causing the processor to execute the method provided in the above embodiments, for example:

[0173] Construct deep learning network models based on Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM);

[0174] Train a deep learning network model to determine the CAPTCHA recognition model;

[0175] The image of the CAPTCHA to be recognized is input into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image.

[0176] The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0177] From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments or in some parts of the embodiments.

[0178] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An image recognition method, characterized in that it includes the following steps: Construct deep learning network models based on Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM); The deep learning network model is trained to determine the CAPTCHA recognition model; The CAPTCHA image to be recognized is input into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image to be recognized; The loss function of the deep learning network model is obtained in the following way: Determine the application scenarios for each CAPTCHA image in the training set; If the application scenario is a simple scenario, then the loss function is determined based on the cross-entropy function; If the application scenario is a complex scenario, then the loss function is determined based on the CTC loss function; The construction of a deep learning network model based on Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM) includes: Establish a simple network structure consisting of multiple stacked CNNs; Establish a complex network structure consisting of multiple convolutional blocks, multiple skip connection layers, and one fully connected layer; Based on the simple network structure and the complex network structure, the feature extraction layer of the deep learning network model is constructed; Based on the LSTM, construct the feature fusion layer of the deep learning network model; The number of CNN layers required to build the simple network structure is determined by the application scenario of the CAPTCHA image to be recognized, which includes simple and complex scenarios.

2. The image recognition method according to claim 1, characterized in that training the deep learning network model to determine the CAPTCHA recognition model includes: The collected CAPTCHA images are divided into a training set, a test set, and a validation set, each of which is labeled; The training set is input into the deep learning network model for training, and training is stopped when the loss function of the deep learning network model reaches a first preset value and the recognition accuracy of the deep learning network model reaches a second preset value; The CAPTCHA recognition model is determined based on the trained deep learning network model; The recognition accuracy is determined by inputting the test set into the trained deep learning network model; The validation set is used to verify the recognition accuracy of the CAPTCHA recognition model.

3. The image recognition method according to claim 1, characterized in that the application scenario of the CAPTCHA image to be recognized is determined in the following way: The application scenario of the CAPTCHA image to be recognized is determined based on the image size, background, and character clarity of the CAPTCHA image to be recognized.

4. The image recognition method according to claim 1, characterized in that, The step of inputting the CAPTCHA image to be recognized into the CAPTCHA recognition model to obtain the recognition result of the CAPTCHA image to be recognized includes: Based on the feature extraction layer in the CAPTCHA recognition model, feature extraction is performed on the CAPTCHA image to be recognized; Based on the feature fusion layer in the CAPTCHA recognition model, feature fusion is performed on the feature extraction results; Based on the CTC algorithm in the CAPTCHA recognition model, the feature fusion result is decoded; The recognition result is determined based on the decoding result.

5. An image recognition system, characterized in that it includes a model building module, a model determination module, and an image recognition module; The model building module is used to build deep learning network models based on convolutional neural networks (CNN) and long short-term memory networks (LSTM); The model determination module is used to train the deep learning network model to determine the CAPTCHA recognition model; The image recognition module is used to input the CAPTCHA image to be recognized into the CAPTCHA recognition model in order to obtain the recognition result of the CAPTCHA image to be recognized; The loss function of the deep learning network model is obtained in the following way: Determine the application scenarios for each CAPTCHA image in the training set; If the application scenario is a simple scenario, then the loss function is determined based on the cross-entropy function; If the application scenario is a complex scenario, then the loss function is determined based on the CTC loss function; The construction of a deep learning network model based on Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM) includes: Establish a simple network structure consisting of multiple stacked CNNs; Establish a complex network structure consisting of multiple convolutional blocks, multiple skip connection layers, and one fully connected layer; Based on the simple network structure and the complex network structure, the feature extraction layer of the deep learning network model is constructed; Based on the LSTM, construct the feature fusion layer of the deep learning network model; The number of CNN layers required to build the simple network structure is determined by the application scenario of the CAPTCHA image to be recognized, which includes simple and complex scenarios.

6. An electronic device comprising a processor and a memory storing a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the image recognition method according to any one of claims 1 to 4.

7. A processor-readable storage medium, characterized in that, The processor-readable storage medium stores a computer program for causing the processor to perform the steps of the image recognition method according to any one of claims 1 to 4.

8. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the image recognition method as described in any one of claims 1 to 4.