Image recognition method and device, computer readable storage medium and electronic device

By adding a target network module to the image recognition model and improving the network structure, the problem of low accuracy in graphic logo recognition is solved, enabling efficient recognition of graphic logos on mobile terminals and improving recognition accuracy and user experience.

CN115471779B · Active · Publication Date: 2026-06-19 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date
2022-10-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing image recognition models have low accuracy in recognizing graphic logos, which fails to meet the performance requirements of mobile terminals.

Method used

A target recognition model is generated by adding a target network module to the first recognition model. Combined with image enhancement processing and network structure improvements, including improvements to the residual structure and the cascade structure, the target recognition model is applied on a mobile terminal platform.

Benefits of technology

While reducing the model size, the recognition accuracy of graphic logos and model performance have been improved, thus enhancing the user experience.


Smart Images

  • Figure CN115471779B_ABST
Patent Text Reader

Abstract

This invention discloses an image recognition method, apparatus, computer-readable storage medium, and electronic device, relating to the field of artificial intelligence technology. The method includes: acquiring at least one video frame, wherein the video frame includes at least one image to be recognized; inputting the image to be recognized into a target recognition model, outputting multiple recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module being used to expand the network branches of the first recognition model, and the recognition result representing the probability that the image to be recognized belongs to the category corresponding to the recognition result; determining the target category to which the image to be recognized belongs based on the multiple recognition results, and determining the image to be recognized as a target image if the target category meets preset conditions, wherein the target image includes at least a target graphic logo. This invention solves the technical problem of low accuracy in recognizing graphic logos in existing image recognition models.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and more specifically, to an image recognition method, apparatus, computer-readable storage medium, and electronic device.

Background Technology

[0002] To enhance the appeal of their applications, financial institutions have launched a series of activities for users to participate in. For example, they have introduced augmented reality (AR) lucky draws on their applications (such as mobile banking apps). Users can scan the graphic logos (e.g., the financial institution's icon) of nearby financial institutions through the application to participate in the lucky draw, thereby enhancing the user experience and improving user retention.

[0003] To meet these needs, a series of image recognition models for identifying graphic logos have emerged. However, since these models need to be applied to mobile terminals, there are certain requirements on the number of parameters and performance of the models. Currently, although the image recognition models used in existing technologies have a small number of parameters, their recognition effect is poor, which reduces the performance of the models and results in low accuracy in recognizing graphic logos.

[0004] There is currently no effective solution to the above problems.

Summary of the Invention

[0005] This invention provides an image recognition method, apparatus, computer-readable storage medium, and electronic device to at least solve the technical problem of low accuracy in recognizing graphic logos in existing image recognition models.

[0006] According to one aspect of the present invention, an image recognition method is provided, comprising: acquiring at least one video frame, wherein the video frame includes at least an image to be recognized; inputting the image to be recognized into a target recognition model and outputting multiple recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module being used to expand the network branches of the first recognition model, and the recognition result representing the probability that the image to be recognized belongs to the category corresponding to the recognition result; determining the target category to which the image to be recognized belongs based on the multiple recognition results, and determining the image to be recognized as a target image if the target category meets preset conditions, wherein the target image includes at least a target graphic logo.

[0007] Furthermore, the image recognition method also includes: acquiring a target sample dataset; training an initial recognition model based on the target sample dataset to obtain a trained recognition model; and fusing the target layer network structure of the trained recognition model to obtain a target recognition model.

[0008] Furthermore, the image recognition method further includes: obtaining a first recognition model, wherein the first recognition model includes at least an inverse residual structure, and the inverse residual structure includes at least a residual structure and a cascaded structure; adding a first network module and a second network module between the pointwise convolutional layer and the depthwise convolutional layer of the residual structure to obtain a first target initial network module, wherein the kernel size of the depthwise convolutional layer of the residual structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depthwise convolutional layer with a kernel size of a second size; adding a third network module in the cascaded structure to obtain a second target initial network module, wherein the third network module includes at least a first network branch and a second network branch, the kernel size of the depthwise convolutional layer of the first network branch is a third size, the kernel size of the depthwise convolutional layer of the second network branch is a fourth size, and the first size, second size, third size, and fourth size are all different; generating an initial recognition model based on the first target initial network module and the second target initial network module.

[0009] Furthermore, the image recognition method further includes: transforming the batch normalization layer in the first network module to obtain a first depthwise convolutional layer, wherein the kernel size of the first depthwise convolutional layer is the first size; transforming the depthwise convolutional layer with a kernel size of the second size to obtain a second depthwise convolutional layer, wherein the kernel size of the second depthwise convolutional layer is the first size; fusing the batch normalization layer in the second network module with the second depthwise convolutional layer to obtain a third depthwise convolutional layer, wherein the kernel size of the third depthwise convolutional layer is the first size; and fusing the first depthwise convolutional layer, the third depthwise convolutional layer, and the depthwise convolutional layer of the residual structure to obtain a target recognition model.

[0010] Furthermore, the image recognition method also includes: acquiring a sample dataset; performing image enhancement processing on the sample dataset to obtain a target sample dataset, wherein the image enhancement processing includes at least random obfuscation enhancement processing and background replacement enhancement processing.

[0011] Furthermore, the image recognition method also includes: converting the target recognition model into a target file and integrating the target file into the target platform, wherein the target file is in a file format that the target platform can recognize.

[0012] Furthermore, the image recognition method also includes: responding to a page navigation command and displaying a preset page, wherein the preset page is used to guide the target object to participate in activities on the preset page.

[0013] According to another aspect of the present invention, an image recognition apparatus is also provided, comprising: an acquisition module, configured to acquire at least one video frame, wherein the video frame includes at least an image to be recognized; a processing module, configured to input the image to be recognized into a target recognition model and output multiple recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module being used to expand the network branches of the first recognition model, and the recognition result representing the probability that the image to be recognized belongs to the category corresponding to the recognition result; and a determination module, configured to determine the target category to which the image to be recognized belongs based on the multiple recognition results, and, if the target category meets preset conditions, determine the image to be recognized as a target image, wherein the target image includes at least a target graphic logo.

[0014] According to another aspect of the present invention, a computer-readable storage medium is also provided, wherein a computer program is stored in the computer-readable storage medium, and the computer program is configured to execute the above-described image recognition method at runtime.

[0015] According to another aspect of the present invention, an electronic device is also provided, the electronic device including one or more processors; a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are configured to run the programs, wherein the programs are configured to execute the above-described image recognition method during runtime.

[0016] In this embodiment of the invention, a method is adopted to improve the first recognition model to identify the target image. First, at least one video frame is acquired. Then, the image to be identified is input into the target recognition model, and multiple recognition results are output. Then, the target category to which the image to be identified belongs is determined based on the multiple recognition results. If the target category meets preset conditions, the image to be identified is determined to be the target image. The target image includes at least a target graphic logo, the video frame includes at least the image to be identified, and the target recognition model is obtained by adding a target network module to the first recognition model. The target network module is used to expand the network branches of the first recognition model, and the recognition result represents the probability that the image to be identified belongs to the category corresponding to that recognition result.

[0017] In the above process, by acquiring at least one video frame, an accurate data foundation is provided for subsequent target image recognition; by adding a target network module to the first recognition model, the first recognition model is improved, and a target recognition model can be obtained; the target recognition model can realize the recognition of target images, thereby realizing the recognition of target graphic logos, which can improve the recognition accuracy and improve the performance of the model while reducing the model size.

[0018] Therefore, the technical solution of the present invention achieves the goal of accurately identifying target graphic logos, thereby achieving the technical effect of improving the accuracy with which image recognition models recognize graphic logos, and solving the technical problem of low recognition accuracy of image recognition models in the prior art.

Attached Figure Description

[0019] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, illustrate exemplary embodiments of the invention and, together with their description, serve to explain the invention and do not constitute an undue limitation thereof. In the drawings:

[0020] Figure 1 is a flowchart of an optional image recognition method according to an embodiment of the present invention;

[0021] Figure 2 is a schematic diagram of an optional residual structure according to an embodiment of the present invention;

[0022] Figure 3 is a schematic diagram of an optional cascaded structure according to an embodiment of the present invention;

[0023] Figure 4 is a training schematic diagram of an optional improved residual structure according to an embodiment of the present invention;

[0024] Figure 5 is a schematic diagram of an optional improved residual structure according to an embodiment of the present invention;

[0025] Figure 6 is a schematic diagram of an optional improved cascaded structure according to an embodiment of the present invention;

[0026] Figure 7 is a schematic diagram of an optional image recognition device according to an embodiment of the present invention;

[0027] Figure 8 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.

Detailed Implementation

[0028] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0029] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0030] It should be noted that all relevant information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for display, data used for analysis, etc.) involved in this invention are information and data authorized by the user or fully authorized by all parties. For example, this system has an interface with the relevant user or organization. Before obtaining relevant information, it needs to send an acquisition request to the aforementioned user or organization through the interface, and obtain the relevant information after receiving consent from the aforementioned user or organization.

[0031] Example 1

[0032] According to an embodiment of the present invention, an embodiment of an image recognition method is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0033] Figure 1 is a flowchart of an optional image recognition method according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0034] Step S101: Obtain at least one video frame, wherein the video frame includes at least the image to be identified.

[0035] In the above steps, at least one video frame can be acquired through devices such as application systems, processors, and electronic devices. Optionally, at least one video frame can be acquired through an image recognition system. For example, when a user participates in a lottery activity through a financial institution's application, after clicking the application's scan function, the image recognition system can acquire at least one video frame in real time through the user's mobile phone camera.

[0036] It should be noted that, in the above process, acquiring at least one video frame provides an accurate data foundation for subsequent target image recognition.

[0037] Step S102: Input the image to be recognized into the target recognition model and output multiple recognition results. The target recognition model is obtained by adding a target network module to the first recognition model. The target network module is used to expand the network branches of the first recognition model. The recognition result represents the probability that the image to be recognized belongs to the category corresponding to the recognition result.

[0038] In the above steps, the image to be recognized can be an image from at least one video frame acquired in real time by the user's mobile phone camera. The recognition results can correspond to multiple categories, such as a text category, an animal category, etc. Optionally, the first recognition model can be the MobileNet v2 model. The MobileNet model is a lightweight neural network proposed by the Google team for embedded devices such as mobile phones. Its basic unit is a depthwise separable convolution, which consists of a depthwise convolution (DW) and a pointwise convolution (PW). The MobileNet v2 model adds an inverse residual structure to the MobileNet model. This structure first uses a PW convolution to increase the number of feature channels, then uses a 3*3 DW convolution to extract features from each channel, and finally uses a PW convolution to reduce the number of feature channels. Optionally, Figure 2 is a schematic diagram of an optional residual structure according to an embodiment of the present invention, and Figure 3 is a schematic diagram of an optional cascaded structure according to an embodiment of the present invention. Optionally, the inverse residual structure includes at least a residual structure and a cascaded structure. As shown in Figure 2 and Figure 3, the stride of the residual structure is 1, and the stride of the cascaded structure is 2.
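As a rough illustration of why a depthwise separable convolution keeps the parameter count small, the following sketch compares weight counts with biases ignored. The function names, the expansion factor, and the channel numbers used with them are illustrative assumptions and are not taken from the patent.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k*k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k*k depthwise (DW) conv followed by a 1*1 pointwise (PW) conv."""
    dw = k * k * c_in   # one k*k filter per input channel
    pw = c_in * c_out   # the 1*1 conv mixes information across channels
    return dw + pw


def inverse_residual_params(c_in, c_out, expansion, k=3):
    """PW expansion -> k*k DW -> PW reduction, following the order described above.

    `expansion` (hidden channels = c_in * expansion) is an assumed hyperparameter.
    """
    hidden = c_in * expansion
    return c_in * hidden + k * k * hidden + hidden * c_out
```

For example, with 32 input channels, 64 output channels, and a 3*3 kernel, the standard convolution has 18432 weights while the depthwise separable version has 2336.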

[0039] Optionally, in this embodiment, after a user clicks the scan function of the application, the image recognition system can input the image to be recognized into the target recognition model, which outputs multiple recognition results. Optionally, the recognition results correspond to multiple categories. For example, financial institution A launches an augmented reality (AR) lottery activity on its own application (e.g., a mobile banking APP). Users scan the graphic logo (e.g., the logo of the financial institution) of nearby financial institution A through the application. The image recognition system inputs the image to be recognized into the target recognition model and outputs recognition results in four categories: the logo category of financial institution A, the Chinese character category corresponding to the logo of financial institution A, the logo category of other financial institutions, and other categories. Specifically, the output recognition results are probability values for the four categories. For example, after recognition by the target recognition model, the probability that the image to be recognized belongs to the logo category of financial institution A is 0.7, the probability that it belongs to the Chinese character category corresponding to the logo of financial institution A is 0.1, the probability that it belongs to the logo category of other financial institutions is 0.1, and the probability that it belongs to other categories is 0.1.
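The patent does not state how the four probability values are produced. A minimal sketch, assuming the model ends in a softmax over four raw scores (a common choice for classifiers), might look as follows. The category names and the function names are hypothetical labels introduced only for illustration.

```python
import math

# Hypothetical labels for the four categories described above.
CATEGORIES = [
    "logo_of_institution_A",
    "chinese_characters_of_institution_A",
    "logo_of_other_institutions",
    "other",
]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognition_results(scores):
    """Map each category to its probability (one recognition result per category)."""
    return dict(zip(CATEGORIES, softmax(scores)))


def most_likely_category(results):
    """Return the category with the highest probability."""
    return max(results, key=results.get)
```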

[0040] It should be noted that the above process improved the first recognition model; the target recognition model can recognize the target image, thereby realizing the recognition of the target graphic logo, and can improve the recognition accuracy of the model while reducing the model size.

[0041] Step S103: Determine the target category to which the image to be identified belongs based on multiple recognition results, and determine the image to be identified as a target image if the target category meets preset conditions, wherein the target image includes at least a target graphic logo.

[0042] In the above steps, the target category can be the logo category of financial institution A, the target graphic logo can be the logo of financial institution A, and the preset condition can be that the probability with which the target recognition model identifies the image to be recognized as belonging to the target category exceeds a preset probability threshold for a preset number of consecutive recognitions. For example, if the probability of identifying the image to be recognized as belonging to the logo category of financial institution A is greater than 0.8 for ten consecutive times, then the image to be recognized is determined to be the target image, that is, the logo of financial institution A is recognized.
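The example condition above (probability greater than 0.8 for ten consecutive recognitions) can be sketched as a small per-frame counter. The class name, the dictionary input format, and the reset-on-miss behavior are assumptions made for illustration; the patent only gives the threshold and the count as an example.

```python
class ConsecutiveHitDetector:
    """Tracks whether the target category has been recognized with a high
    enough probability for enough consecutive frames."""

    def __init__(self, target, threshold=0.8, required_hits=10):
        self.target = target
        self.threshold = threshold
        self.required_hits = required_hits
        self.hits = 0

    def update(self, probabilities):
        """Feed the per-category probabilities of one frame.

        Returns True once the preset condition is met.
        """
        top = max(probabilities, key=probabilities.get)
        if top == self.target and probabilities[top] > self.threshold:
            self.hits += 1
        else:
            self.hits = 0  # assumed: any miss restarts the consecutive count
        return self.hits >= self.required_hits
```

With these defaults, a frame whose top probability is 0.7 (as in the earlier four-category example) does not count as a hit, so the counter restarts.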

[0043] Based on the scheme defined in steps S101 to S103 above, it can be understood that in this embodiment of the invention, the method of improving the first recognition model to recognize the target image involves first acquiring at least one video frame, then inputting the image to be recognized into the target recognition model, outputting multiple recognition results, determining the target category to which the image to be recognized belongs based on the multiple recognition results, and determining the image to be recognized as the target image if the target category meets preset conditions. The target image includes at least a target graphic logo, the video frame includes at least the image to be recognized, the target recognition model is obtained by adding a target network module to the first recognition model, the target network module is used to expand the network branches of the first recognition model, and the recognition result represents the probability that the image to be recognized belongs to the category corresponding to the recognition result.

[0044] It is noteworthy that, in the above process, acquiring at least one video frame provides an accurate data foundation for subsequent target image recognition; by adding a target network module to the first recognition model, the first recognition model is improved, resulting in a target recognition model; the target recognition model enables the recognition of target images, thereby achieving the recognition of target graphic logos, which improves the recognition accuracy and performance of the model while reducing its size.

[0045] Therefore, the technical solution of the present invention achieves the goal of accurately identifying target graphic logos, thereby achieving the technical effect of improving the accuracy with which image recognition models recognize graphic logos, and solving the technical problem of low recognition accuracy of image recognition models in the prior art.

[0046] In one optional embodiment, the target recognition model is generated by the following method: obtaining a target sample dataset; training an initial recognition model based on the target sample dataset to obtain a trained recognition model; and fusing the target layer network structure of the trained recognition model to obtain the target recognition model.

[0047] Optionally, the target sample dataset can be obtained by image augmentation of the sample dataset, and the initial recognition model can be an improved version of the first recognition model, i.e., the MobileNet v2 model. Optionally, training the initial recognition model with the target sample dataset can yield a convergent recognition model, i.e., the trained recognition model. The model training method can be selected by the developers from widely used model training methods in the field, which will not be elaborated here.

[0048] Specifically, in one optional embodiment, before training the initial recognition model based on the target sample dataset to obtain the trained recognition model, a first recognition model is first obtained. Then, a first network module and a second network module are added between the pointwise convolutional layer and the depthwise convolutional layer of the residual structure to obtain a first target initial network module. Then, a third network module is added to the cascaded structure to obtain a second target initial network module. Finally, an initial recognition model is generated based on the first target initial network module and the second target initial network module. The first recognition model includes at least an inverse residual structure, which includes at least a residual structure and a cascaded structure. The kernel size of the depthwise convolutional layer of the residual structure is a first size. The first network module consists of a batch normalization layer. The second network module consists of a batch normalization layer and a depthwise convolutional layer with a kernel size of the second size. The third network module includes at least a first network branch and a second network branch. The kernel size of the depthwise convolutional layer of the first network branch is a third size, and the kernel size of the depthwise convolutional layer of the second network branch is a fourth size. The first, second, third, and fourth sizes are all different.

[0049] Optionally, the first size can refer to a kernel size of 3*3, the second size can refer to a kernel size of 1*1, the third size can refer to a kernel size of 5*5, and the fourth size can refer to a kernel size of 7*7. Specifically, the improvements to the first recognition model, i.e., the MobileNet v2 model, mainly involve improvements to the residual structure and the cascaded structure.

[0050] Optionally, Figure 4 is a training schematic diagram of an optional improved residual structure according to an embodiment of the present invention. As shown in Figure 4, the first network module can be a batch normalization layer, i.e., a BN layer, and the second network module can be composed of a BN layer and a depthwise convolutional layer with a kernel size of the second size, i.e., a DW1*1 layer.

[0051] Optionally, a BN layer, and a further BN layer together with a DW1*1 layer, are added between the pointwise convolutional layer (PW layer) and the depthwise convolutional layer (DW layer) of the residual structure of the first recognition model. The DW convolution (with a kernel size of 3*3), BN layer, BN layer, and DW1*1 layer shown in Figure 4 are used as the first target initial network module, which realizes the improvement of the residual structure of the MobileNet v2 model.

[0052] Optionally, Figure 6 is a schematic diagram of an optional improved cascaded structure according to an embodiment of the present invention. As shown in Figure 6, the third network module includes at least a first network branch and a second network branch, wherein the first network branch can be the branch containing DW5*5, and the second network branch can be the branch containing DW7*7. Optionally, the branches containing DW3*3, DW5*5, and DW7*7 shown in Figure 6 serve as the second target initial network module, which improves the cascaded structure of the MobileNet v2 model. Specifically, feature extraction can be performed using three parallel branches, i.e., multi-scale feature extraction at the 3*3, 5*5, and 7*7 scales respectively. Then, an output consistency mechanism can be employed to ensure that the branch outputs have the same shape. Finally, the multi-scale feature extraction results are added together to obtain the final result.
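The three-branch idea can be sketched with NumPy as below. The patent does not specify the output consistency mechanism; this sketch assumes zero padding of k // 2 for each odd kernel size k, which makes all stride-2 branches produce the same output shape. Batch normalization and activations are omitted, and all function names are hypothetical.

```python
import numpy as np


def depthwise_conv(x, kernel, stride):
    """Depthwise convolution.

    x: (C, H, W) feature map; kernel: (C, k, k) with odd k.
    Zero padding of k // 2 is an assumed way to keep branch outputs aligned.
    """
    c, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out_h = (h + 2 * p - k) // stride + 1
    out_w = (w + 2 * p - k) // stride + 1
    out = np.zeros((c, out_h, out_w))
    for i in range(k):
        for j in range(k):
            # Strided window of the padded input that meets kernel tap (i, j).
            patch = xp[:, i:i + stride * out_h:stride, j:j + stride * out_w:stride]
            out += kernel[:, i, j, None, None] * patch
    return out


def multi_scale_block(x, k3, k5, k7, stride=2):
    """Sum of three parallel depthwise branches (3*3, 5*5, 7*7)."""
    branches = [depthwise_conv(x, k, stride) for k in (k3, k5, k7)]
    assert all(b.shape == branches[0].shape for b in branches)
    return sum(branches)
```

Because the padding grows with the kernel size, an 8*8 input yields a 4*4 output in every branch at stride 2, so the element-wise addition is well defined.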

[0053] It should be noted that, in the above process, by adding network branches, feature maps of different scales can be fused, thereby improving the model's learning ability.

[0054] Specifically, in one optional embodiment, during the process of fusing the target layer network structure of the trained recognition model to obtain the target recognition model, the batch normalization layer in the first network module is transformed to obtain a first depthwise convolutional layer. Then, the depthwise convolutional layer with a kernel size of the second size is transformed to obtain a second depthwise convolutional layer. Next, the batch normalization layer in the second network module and the second depthwise convolutional layer are fused to obtain a third depthwise convolutional layer. Finally, the first depthwise convolutional layer, the third depthwise convolutional layer, and the depthwise convolutional layer of the residual structure are fused to obtain the target recognition model. Here, the kernel size of the first depthwise convolutional layer, the kernel size of the second depthwise convolutional layer, and the kernel size of the third depthwise convolutional layer are all the first size.

[0055] Optionally, in this embodiment, the model structure used during training is as shown in Figure 4. After training, the model undergoes fusion processing to obtain the improved residual structure shown in Figure 5. Specifically, first, the BN layer shown in Figure 4 is converted into a DW3*3 layer, i.e., the first depthwise convolutional layer. Then, the DW1*1 layer shown in Figure 4 is converted into a DW3*3 layer, i.e., the second depthwise convolutional layer, and the BN layer connected to the original DW1*1 layer in Figure 4 is merged with this DW3*3 layer into a single DW3*3 layer, i.e., the third depthwise convolutional layer. Three parallel DW3*3 layers are thus obtained: the depthwise convolutional layer of the residual structure, i.e., the DW convolution shown in Figure 4; the first depthwise convolutional layer, obtained by transforming the BN layer shown in Figure 4; and the third depthwise convolutional layer, obtained by transforming and fusing the BN layer and DW1*1 layer shown in Figure 4. Further, the three DW3*3 layers are fused into a single DW3*3 convolutional layer, i.e., the DW convolution shown in Figure 5, which yields the target recognition model for actual deployment. The transformation and fusion methods for the aforementioned network layers can be chosen by developers from widely used methods in the field and will not be elaborated upon here.
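Since the patent leaves the transformation and fusion methods open, the following NumPy sketch shows one widely used option, structural reparameterization, applied to the three branches described above. It assumes that the branch outputs are summed, that batch normalization is in inference mode (fixed mean and variance), that the DW1*1 layer has no bias, and that the stride is 1. All function and parameter names are hypothetical.

```python
import numpy as np

EPS = 1e-5  # assumed BN epsilon


def dw_conv3x3(x, kernel, bias):
    """Depthwise 3*3 conv, stride 1, zero padding 1. x: (C, H, W); kernel: (C, 3, 3)."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return out + bias[:, None, None]


def bn_scale_shift(gamma, beta, mean, var):
    """Inference-mode BN written as y = scale * x + shift (per channel)."""
    scale = gamma / np.sqrt(var + EPS)
    return scale, beta - scale * mean


def bn_to_dw3x3(bn):
    """BN-only branch -> DW3*3 whose only non-zero tap is the center."""
    scale, shift = bn_scale_shift(*bn)
    kernel = np.zeros((scale.shape[0], 3, 3))
    kernel[:, 1, 1] = scale
    return kernel, shift


def dw1x1_to_dw3x3(w1):
    """Zero-pad per-channel 1*1 weights (shape (C,)) into 3*3 kernels."""
    kernel = np.zeros((w1.shape[0], 3, 3))
    kernel[:, 1, 1] = w1
    return kernel


def fuse_bn_into_dw(kernel, bn):
    """Fold the BN that follows a bias-free depthwise conv into that conv."""
    scale, shift = bn_scale_shift(*bn)
    return kernel * scale[:, None, None], shift


def fuse_branches(k3, b3, bn_identity, w1, bn_after_1x1):
    """Merge DW3*3, BN, and DW1*1+BN branches into a single DW3*3 conv."""
    k_a, b_a = bn_to_dw3x3(bn_identity)
    k_b, b_b = fuse_bn_into_dw(dw1x1_to_dw3x3(w1), bn_after_1x1)
    return k3 + k_a + k_b, b3 + b_a + b_b


def training_forward(x, k3, b3, bn_identity, w1, bn_after_1x1):
    """Multi-branch forward pass used as the reference for the fused conv."""
    def bn(t, params):
        scale, shift = bn_scale_shift(*params)
        return scale[:, None, None] * t + shift[:, None, None]
    return (dw_conv3x3(x, k3, b3)
            + bn(x, bn_identity)
            + bn(w1[:, None, None] * x, bn_after_1x1))
```

Under these assumptions the fused layer is an exact rewrite of the multi-branch block, because each branch is linear in the input and convolution kernels with the same shape and stride can be added tap by tap.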

[0056] It should be noted that in the above process, a network model with a multi-branch topology is used during training, while the fused model is used during actual deployment. This can reduce the network size without losing network accuracy, reduce the amount of computation in the actual recognition process, and make the target recognition model more suitable for mobile terminals.

[0057] Optionally, in this embodiment, transfer learning can be used to avoid the model learning from scratch, thereby enabling the model to obtain better initialized weight parameters. Transfer learning is a widely used method in this field and will not be elaborated upon here.

[0058] In one optional embodiment, a sample dataset is obtained before obtaining the target sample dataset; the sample dataset is subjected to image enhancement processing to obtain the target sample dataset, wherein the image enhancement processing includes at least random obfuscation enhancement processing and background replacement enhancement processing.

[0059] Optionally, in addition to common image enhancement methods such as rotation, translation, cropping, and noise addition, this embodiment also employs random obfuscation enhancement and background replacement enhancement. Random obfuscation enhancement randomly combines n*n images from the sample dataset, without overlap, into new images, thereby expanding the dataset, enriching the distribution of training images, and improving the model's generalization ability. Specifically, a program script specifies the number n of original images in each row of the generated image, so that each generated image contains n*n original images in total. Then, n*n images are randomly selected from the sample dataset, and each image is enhanced in turn (rotation, whitening, etc.) and pasted onto a blank canvas to generate a new image. The new images are saved for training. Background replacement enhancement can be achieved by manually annotating the mask coordinates of financial institution A's logo. Optionally, the individual logo is extracted based on the mask coordinates, and the extracted logo is then placed on randomly chosen backgrounds. Specifically, the logo is copied to a random position on a random background, and a series of operations such as random size transformation, random lighting enhancement, and random scaling and cropping are performed on the logo to generate an arbitrary number of enhanced samples for training. This yields the target sample dataset.
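The random obfuscation enhancement described above can be sketched as follows. This is an illustrative NumPy sketch only: images are assumed to be H x W x 3 `uint8` arrays, the per-image enhancement is reduced to a random rotation by a multiple of 90 degrees (whitening is omitted), and the function names, tile size, and nearest-neighbour resize are assumptions not taken from the source.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an H x W x 3 image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def random_grid_mix(images, n, tile=64, rng=None):
    """Paste n*n randomly chosen images onto a blank canvas without overlap."""
    rng = np.random.default_rng() if rng is None else rng
    if len(images) < n * n:
        raise ValueError("need at least n*n source images")
    # Select n*n distinct source images at random.
    picks = rng.choice(len(images), size=n * n, replace=False)
    canvas = np.zeros((n * tile, n * tile, 3), dtype=np.uint8)
    for slot, idx in enumerate(picks):
        img = np.rot90(images[idx], k=int(rng.integers(0, 4)))  # random rotation
        img = resize_nearest(img, tile)
        r, c = divmod(slot, n)  # grid cell for this image
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = img
    return canvas
```

Calling `random_grid_mix(images, n=3)` repeatedly yields new composite images, each built from nine distinct source images; a script would save these alongside the original samples.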

[0060] It should be noted that the above process expands the training sample dataset, enriching the diversity and randomness of the samples, and further improving the model's recognition accuracy.
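The background replacement enhancement of paragraph [0059] can be sketched in the same spirit. This is a hedged illustration, not the embodiment's script: the logo is assumed to be already cropped to an h x w x 3 `uint8` array with a boolean mask of the same height and width, only random size, random brightness, and random position are modelled (random cropping is omitted), and the function names and parameter ranges are hypothetical.

```python
import numpy as np

def resize_nearest(arr, new_h, new_w):
    """Nearest-neighbour resize for an image or a mask."""
    h, w = arr.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return arr[rows][:, cols]

def replace_background(logo, mask, background, rng=None):
    """Paste a masked logo at a random position on a background image."""
    rng = np.random.default_rng() if rng is None else rng
    bh, bw = background.shape[:2]
    # Random size transformation (assumed range: 50% to 100%), capped to fit.
    scale = rng.uniform(0.5, 1.0)
    new_h = max(1, int(min(logo.shape[0], bh) * scale))
    new_w = max(1, int(min(logo.shape[1], bw) * scale))
    logo_r = resize_nearest(logo, new_h, new_w).astype(np.float64)
    mask_r = resize_nearest(mask, new_h, new_w).astype(bool)
    # Random lighting enhancement as a simple brightness gain (assumed range).
    gain = rng.uniform(0.7, 1.3)
    logo_r = np.clip(logo_r * gain, 0, 255).astype(np.uint8)
    # Random position such that the logo stays inside the background.
    top = int(rng.integers(0, bh - new_h + 1))
    left = int(rng.integers(0, bw - new_w + 1))
    out = background.copy()
    region = out[top:top + new_h, left:left + new_w]  # view into `out`
    region[mask_r] = logo_r[mask_r]                   # copy only logo pixels
    return out, (top, left, new_h, new_w)
```

The returned box `(top, left, height, width)` is a convenience for inspection; the source does not state whether position labels are kept.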

[0061] In one optional embodiment, after fusing the target layer network structure of the trained recognition model to obtain the target recognition model, the target recognition model is converted into a target file and the target file is integrated into the target platform, wherein the target file is in a file format that the target platform can recognize.

[0062] Optionally, the target platform can be an application on a mobile terminal such as a mobile phone, for example, a mobile banking app of financial institution A. Optionally, the target recognition model can be converted into a .lite format file, i.e., the target file, and then the target file can be integrated into the target platform.

[0063] It should be noted that by converting the target recognition model into a target file, the target recognition model can be flexibly applied to various platforms, thus improving the model's versatility.

[0064] In one optional embodiment, after determining that the image to be identified is the target image, a page jump instruction is responded to and a preset page is displayed, wherein the preset page is used to guide the target object to participate in activities on the preset page.

[0065] Optionally, after determining that the image to be recognized is the target image (for example, after determining that the image scanned by the user through the application is the logo of financial institution A), the application can redirect the user to a lottery page (a preset page) so that the user can participate in a lottery activity. It should be noted that this process enhances the user experience, thereby improving user retention.

[0066] Therefore, the technical solution of the present invention accurately identifies target graphic logos, thereby achieving the technical effect of improving the accuracy with which image recognition models recognize graphic logos and solving the technical problem of low recognition accuracy of image recognition models in the prior art.

[0067] Example 2

[0068] According to an embodiment of the present invention, an embodiment of an image recognition device is provided. Figure 7 is a schematic diagram of an optional image recognition device according to an embodiment of the present invention. As shown in Figure 7, the device includes: an acquisition module 701, configured to acquire at least one video frame, wherein the video frame includes at least an image to be identified; a processing module 702, configured to input the image to be identified into a target recognition model and output multiple recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is used to expand the network branches of the first recognition model, and each recognition result represents the probability that the image to be identified belongs to the category corresponding to that recognition result; and a determination module 703, configured to determine the target category to which the image to be identified belongs based on the multiple recognition results, and to determine the image to be identified as a target image if the target category meets preset conditions, wherein the target image includes at least a target graphic logo.

[0069] It should be noted that the above-mentioned acquisition module 701, processing module 702 and determination module 703 correspond to steps S101 to S103 in the above embodiments. The three modules and the corresponding steps implement the same examples and application scenarios, but are not limited to the content disclosed in the above embodiment 1.
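The division of work among the processing and determination modules can be sketched as below. This is a minimal illustration under assumptions: `model` is any callable returning one raw score per category (a hypothetical stand-in for the target recognition model), the scores are turned into probabilities with a softmax, and the "preset condition" is modelled as the top category equalling the target label with probability at or above a threshold. The threshold value and all names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def recognize_frame(frame, model, labels, target_label, threshold=0.8):
    """Processing step: one probability per category.
    Determination step: pick the top category and test the assumed condition."""
    probs = softmax(np.asarray(model(frame), dtype=np.float64))
    idx = int(np.argmax(probs))
    category = labels[idx]
    is_target = bool(category == target_label and probs[idx] >= threshold)
    return category, float(probs[idx]), is_target
```

An acquisition step would feed frames from the camera stream into `recognize_frame` one at a time and stop at the first frame for which `is_target` is true.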

[0070] Optionally, the image recognition device further includes: a first acquisition module for acquiring a target sample dataset; a training module for training an initial recognition model based on the target sample dataset to obtain a trained recognition model; and a first processing module for fusing the target layer network structure of the trained recognition model to obtain a target recognition model.

[0071] Optionally, the image recognition device further includes: a second acquisition module for acquiring a first recognition model, wherein the first recognition model includes at least an inverse residual structure, and the inverse residual structure includes at least a residual structure and a cascaded structure; a second processing module for adding a first network module and a second network module between the pointwise convolutional layer and the depthwise convolutional layer of the residual structure to obtain a first target initial network module, wherein the kernel size of the depthwise convolutional layer of the residual structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depthwise convolutional layer with a kernel size of a second size; a third processing module for adding a third network module in the cascaded structure to obtain a second target initial network module, wherein the third network module includes at least a first network branch and a second network branch, the kernel size of the depthwise convolutional layer of the first network branch is a third size, the kernel size of the depthwise convolutional layer of the second network branch is a fourth size, and the first size, the second size, the third size, and the fourth size are different; and a generation module for generating an initial recognition model based on the first target initial network module and the second target initial network module.

[0072] Optionally, the first processing module includes: a first transformation module for transforming the batch normalization layer in the first network module to obtain a first depthwise convolutional layer, wherein the kernel size of the first depthwise convolutional layer is a first size; a second transformation module for transforming the depthwise convolutional layer with a kernel size of a second size to obtain a second depthwise convolutional layer, wherein the kernel size of the second depthwise convolutional layer is the first size; a first fusion module for fusing the batch normalization layer and the second depthwise convolutional layer in the second network module to obtain a third depthwise convolutional layer, wherein the kernel size of the third depthwise convolutional layer is the first size; and a second fusion module for fusing the first depthwise convolutional layer, the third depthwise convolutional layer, and the depthwise convolutional layer of the residual structure to obtain a target recognition model.

[0073] Optionally, the image recognition device further includes: a third acquisition module for acquiring a sample dataset; and a fourth processing module for performing image enhancement processing on the sample dataset to obtain a target sample dataset, wherein the image enhancement processing includes at least random obfuscation enhancement processing and background replacement enhancement processing.

[0074] Optionally, the image recognition device further includes: a fifth processing module, used to convert the target recognition model into a target file and integrate the target file into the target platform, wherein the target file is in a file format that the target platform can recognize.

[0075] Optionally, the image recognition device further includes a response module for responding to a page jump instruction and displaying a preset page, wherein the preset page is used to guide the target object to participate in activities on the preset page.

[0076] Example 3

[0077] According to another aspect of the present invention, a computer-readable storage medium is also provided, wherein a computer program is stored in the computer-readable storage medium, and the computer program is configured to execute the above-described image recognition method at runtime.

[0078] Example 4

[0079] According to another aspect of the present invention, an electronic device is also provided. Figure 8 is a schematic diagram of an optional electronic device according to an embodiment of the present invention. As shown in Figure 8, the electronic device includes one or more processors and a memory for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to run the programs, wherein the programs are configured to execute the image recognition method described above. When the processor executes the program, it performs the following steps: acquiring at least one video frame, wherein the video frame includes at least an image to be recognized; inputting the image to be recognized into a target recognition model and outputting multiple recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is used to expand the network branches of the first recognition model, and each recognition result represents the probability that the image to be recognized belongs to the category corresponding to that recognition result; and determining the target category to which the image to be recognized belongs based on the multiple recognition results, and, if the target category meets preset conditions, determining the image to be recognized as a target image, wherein the target image includes at least a target graphic logo.

[0080] Optionally, the processor may also perform the following steps when executing the program: acquiring the target sample dataset; training the initial recognition model based on the target sample dataset to obtain the trained recognition model; and fusing the target layer network structure of the trained recognition model to obtain the target recognition model.

[0081] Optionally, the processor further implements the following steps when executing the program: obtaining a first recognition model, wherein the first recognition model includes at least an inverse residual structure, and the inverse residual structure includes at least a residual structure and a cascaded structure; adding a first network module and a second network module between the pointwise convolutional layer and the depthwise convolutional layer of the residual structure to obtain a first target initial network module, wherein the kernel size of the depthwise convolutional layer of the residual structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depthwise convolutional layer with a kernel size of a second size; adding a third network module in the cascaded structure to obtain a second target initial network module, wherein the third network module includes at least a first network branch and a second network branch, the kernel size of the depthwise convolutional layer of the first network branch is a third size, the kernel size of the depthwise convolutional layer of the second network branch is a fourth size, and the first size, the second size, the third size, and the fourth size are different; and generating an initial recognition model based on the first target initial network module and the second target initial network module.

[0082] Optionally, the processor further implements the following steps when executing the program: transforming the batch normalization layer in the first network module to obtain a first depthwise convolutional layer, wherein the kernel size of the first depthwise convolutional layer is a first size; transforming the depthwise convolutional layer with a kernel size of a second size to obtain a second depthwise convolutional layer, wherein the kernel size of the second depthwise convolutional layer is the first size; fusing the batch normalization layer in the second network module with the second depthwise convolutional layer to obtain a third depthwise convolutional layer, wherein the kernel size of the third depthwise convolutional layer is the first size; and fusing the first depthwise convolutional layer, the third depthwise convolutional layer, and the depthwise convolutional layer of the residual structure to obtain a target recognition model.

[0083] Optionally, the processor may also perform the following steps when executing the program: acquiring a sample dataset; performing image enhancement processing on the sample dataset to obtain a target sample dataset, wherein the image enhancement processing includes at least random obfuscation enhancement processing and background replacement enhancement processing.

[0084] Optionally, the processor may also perform the following steps when executing the program: converting the target recognition model into a target file and integrating the target file into the target platform, wherein the target file is in a file format that the target platform can recognize.

[0085] Optionally, when the processor executes the program, it also performs the following steps: responding to a page jump instruction and displaying a preset page, wherein the preset page is used to guide the target object to participate in activities on the preset page.

[0086] The devices mentioned in this document can be servers, PCs, tablets, mobile phones, etc.

[0087] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0088] In the above embodiments of the present invention, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0089] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual coupling, direct coupling, or communication connection may be through some interfaces; the indirect coupling or communication connection between units or modules may be electrical or other forms.

[0090] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0091] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0092] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0093] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. An image recognition method, characterized in that it comprises: acquiring at least one video frame, wherein the video frame includes at least an image to be identified; inputting the image to be identified into a target recognition model and outputting multiple recognition results, wherein each recognition result represents the probability that the image to be identified belongs to the category corresponding to that recognition result; and determining the target category to which the image to be identified belongs based on the multiple recognition results, and, if the target category meets preset conditions, determining the image to be identified as a target image, wherein the target image includes at least a target graphic logo; wherein the target recognition model is generated as follows: obtaining a target sample dataset; training an initial recognition model based on the target sample dataset to obtain a trained recognition model; and fusing the target layer network structure of the trained recognition model to obtain the target recognition model;
and wherein, before training the initial recognition model based on the target sample dataset to obtain the trained recognition model, the method further comprises: obtaining a first recognition model, wherein the first recognition model includes at least an inverse residual structure, and the inverse residual structure includes at least a residual structure and a cascaded structure; adding a first network module and a second network module between the pointwise convolutional layer and the depthwise convolutional layer of the residual structure to obtain a first target initial network module, wherein the kernel size of the depthwise convolutional layer of the residual structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depthwise convolutional layer with a kernel size of a second size; adding a third network module to the cascaded structure to obtain a second target initial network module, wherein the third network module includes at least a first network branch and a second network branch, the kernel size of the depthwise convolutional layer of the first network branch is a third size, the kernel size of the depthwise convolutional layer of the second network branch is a fourth size, and the first size, the second size, the third size, and the fourth size are different; and generating the initial recognition model based on the first target initial network module and the second target initial network module.

2. The method according to claim 1, characterized in that fusing the target layer network structure of the trained recognition model to obtain the target recognition model includes: transforming the batch normalization layer in the first network module to obtain a first depthwise convolutional layer, wherein the kernel size of the first depthwise convolutional layer is the first size; transforming the depthwise convolutional layer with a kernel size of the second size to obtain a second depthwise convolutional layer, wherein the kernel size of the second depthwise convolutional layer is the first size; fusing the batch normalization layer in the second network module with the second depthwise convolutional layer to obtain a third depthwise convolutional layer, wherein the kernel size of the third depthwise convolutional layer is the first size; and fusing the first depthwise convolutional layer, the third depthwise convolutional layer, and the depthwise convolutional layer of the residual structure to obtain the target recognition model.

3. The method according to claim 1, characterized in that, before obtaining the target sample dataset, the method further includes: obtaining a sample dataset; and performing image enhancement processing on the sample dataset to obtain the target sample dataset, wherein the image enhancement processing includes at least random obfuscation enhancement processing and background replacement enhancement processing.

4. The method according to claim 1, characterized in that, after fusing the target layer network structure of the trained recognition model to obtain the target recognition model, the method further includes: converting the target recognition model into a target file and integrating the target file into a target platform, wherein the target file is in a file format that the target platform can recognize.

5. The method according to claim 1, characterized in that, after determining that the image to be identified is the target image, the method further includes: responding to a page jump instruction and displaying a preset page, wherein the preset page is used to guide the target object to participate in activities on the preset page.

6. An image recognition device, characterized in that it comprises: an acquisition module, configured to acquire at least one video frame, wherein the video frame includes at least one image to be identified; a processing module, configured to input the image to be identified into a target recognition model and output multiple recognition results, wherein each recognition result represents the probability that the image to be identified belongs to the category corresponding to that recognition result; a determination module, configured to determine the target category to which the image to be identified belongs based on the multiple recognition results, and to determine the image to be identified as a target image if the target category meets preset conditions, wherein the target image includes at least a target graphic logo; a first acquisition module, configured to acquire a target sample dataset; a training module, configured to train an initial recognition model based on the target sample dataset to obtain a trained recognition model; and a first processing module, configured to fuse the target layer network structure of the trained recognition model to obtain the target recognition model;
wherein the image recognition device further includes: a second acquisition module for acquiring a first recognition model, wherein the first recognition model includes at least an inverse residual structure, and the inverse residual structure includes at least a residual structure and a cascaded structure; a second processing module for adding a first network module and a second network module between the pointwise convolutional layer and the depthwise convolutional layer of the residual structure to obtain a first target initial network module, wherein the kernel size of the depthwise convolutional layer of the residual structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depthwise convolutional layer with a kernel size of a second size; a third processing module for adding a third network module to the cascaded structure to obtain a second target initial network module, wherein the third network module includes at least a first network branch and a second network branch, the kernel size of the depthwise convolutional layer of the first network branch is a third size, the kernel size of the depthwise convolutional layer of the second network branch is a fourth size, and the first size, the second size, the third size, and the fourth size are different; and a generation module for generating the initial recognition model based on the first target initial network module and the second target initial network module.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program is configured to execute the image recognition method according to any one of claims 1 to 5 when it is run.

8. An electronic device, characterized in that the electronic device includes one or more processors and a memory for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to run the programs, wherein the programs are configured to execute the image recognition method according to any one of claims 1 to 5.