Image recognition method, device and computer readable storage medium
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: DONGGUAN POLYTECHNIC
- Filing Date: 2026-02-04
- Publication Date: 2026-06-26
Figure CN122289883A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image recognition technology, and in particular to an image recognition method, an apparatus, and a computer-readable storage medium.
Background Technology
[0002] With the rapid development of computer vision technology, image recognition has been widely used in many fields such as autonomous driving, medical image diagnosis, and remote sensing monitoring, and the requirements for recognition accuracy, generalization ability and scene adaptability continue to increase.
[0003] In recent years, image recognition models based on the self-attention mechanism, such as the Vision Transformer (ViT), have used global information interaction to overcome the reliance of traditional convolutional neural networks (CNNs) on local receptive fields. By computing correlation weights between all pairs of elements in the input sequence, these models capture global dependencies across the image directly, and they have shown performance advantages on large-scale image datasets.
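For readers unfamiliar with the classical mechanism that the quantum module later replaces, the sketch below shows standard scaled dot-product self-attention in NumPy. It is a generic textbook formulation, not code from this application; the projection matrices `Wq`, `Wk`, `Wv` are hypothetical learned parameters.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Classical scaled dot-product self-attention over a sequence X of shape (N, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # (N, N) matrix: a correlation weight between every pair of sequence elements
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V, weights
```

Because `weights` is an N x N matrix, every patch can attend to every other patch in a single layer, which is the "global dependency" property described above; the cost is quadratic in N.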
[0004] However, existing self-attention image recognition models still have limited recognition accuracy in practical applications, making it difficult to meet the requirements of high-precision scenarios.
Summary of the Invention
[0005] This application provides an image recognition method, apparatus, and computer-readable storage medium that can improve the accuracy of image recognition.
[0006] To achieve the above objective, this application adopts the following technical solutions. In a first aspect, an image recognition method is provided, the method comprising: acquiring an image to be recognized; performing vectorization processing on the image to be recognized to obtain an image vector to be recognized; and inputting the image vector to be recognized into a preset self-attention image recognition model to obtain an image recognition result, where the preset self-attention image recognition model includes a quantum Transformer module and a fully connected layer, and the quantum Transformer module includes a quantum self-attention module and a feedforward neural network.
[0007] This scheme improves image recognition accuracy by using the quantum Transformer module to combine the advantages of quantum computing and classical deep learning. First, the quantum self-attention module exploits the superposition and entanglement of quantum states, which in theory allows exponential parallelism when processing the high-dimensional features obtained by vectorizing the image. Long-range, complex global dependencies and subtle patterns in the image can therefore be captured more efficiently, which classical self-attention struggles to achieve fully under limited computational resources. Second, a classical feedforward neural network applies further nonlinear transformation and integration to the features output by the quantum module, enhancing the model's representational capability. The fully connected layer then acts as a classifier that maps the learned deep abstract features to specific categories. The architecture thus combines the potentially superior information-processing capability of quantum computing with the mature optimization mechanisms of classical neural networks, so that the two complement each other in feature extraction and relationship modeling. The result is a more refined and global understanding of the image, and ultimately higher classification accuracy.
[0008] In conjunction with the first aspect, in some embodiments of the first aspect, performing vectorization processing on the image to be recognized to obtain the image vector to be recognized includes: segmenting the image to be recognized into multiple sub-images; performing vectorization processing on each of the multiple sub-images to obtain multiple sub-image vectors; and concatenating the multiple sub-image vectors to obtain the image vector to be recognized.
[0009] This technical solution strengthens the image recognition model's ability to process complex visual information through a "divide, then integrate" strategy, with effects in three respects. First, segmenting the image to be recognized into multiple sub-images enables fine-grained extraction of local features, so the model can attend to image details (such as texture and edges) that might otherwise be lost during global processing. Second, each sub-image is vectorized independently and the results are concatenated, which preserves spatial structure (through ordered concatenation and positional encoding) and builds a hierarchical image representation whose units are well-defined local features for the subsequent self-attention mechanism. Third, this structured vector input lets the quantum Transformer module model local and global dependencies more efficiently: quantum self-attention can compute correlations between sub-image vectors in parallel, and the feedforward neural network further fuses the features, improving the model's understanding of the overall semantics of the image. Overall, the solution improves the model's robustness to image scaling, translation, and local perturbations, and provides a data structure suited to exploiting quantum computing, contributing to more accurate image recognition in complex scenarios.
[0010] In conjunction with the first aspect, in some embodiments of the first aspect, before acquiring the image to be recognized, the method further includes: acquiring a quantum Transformer module and an original self-attention image recognition model; the original self-attention image recognition model includes a classical Transformer module and a fully connected layer; replacing the classical Transformer module in the original self-attention image recognition model with a quantum Transformer module to obtain a preset self-attention image recognition model.
[0011] The quantum Transformer module can use quantum superposition and quantum entanglement to process high-dimensional feature spaces with exponential parallelism, model complex non-local relationships between pixels or image blocks more efficiently, and capture subtle patterns that traditional models may miss, thereby improving the image recognition accuracy of the preset self-attention image recognition model.
[0012] In conjunction with the first aspect, in some embodiments of the first aspect, obtaining a quantum Transformer module includes: connecting a quantum self-attention module and a feedforward neural network to obtain an initial quantum Transformer module; and stacking and connecting multiple initial quantum Transformer modules to obtain a quantum Transformer module.
[0013] Stacking quantum Transformer modules in depth uses depth and hierarchy in quantum computation to emulate, and potentially enhance, the feature-abstraction capability of classical deep learning. Connecting a quantum self-attention module and a feedforward network into a basic module achieves nonlinear transformation and re-normalization of features on quantum states. Stacking multiple such modules builds a deep quantum processing pipeline in which the input quantum state undergoes progressively more abstract and complex feature extraction and fusion, layer by layer. Such deep stacking lets the model capture deeper and more complex associations in the data (for example, multi-level, multi-scale visual semantic relationships in images), and the cumulative effects of quantum interference and entanglement may give it representational capability beyond that of classical deep models.
[0014] In a second aspect, an image recognition apparatus is provided for implementing the image recognition method of the first aspect. The apparatus includes modules, units, or means corresponding to the above method, which may be implemented in hardware, in software, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions.
[0015] In conjunction with the second aspect, in some embodiments of the second aspect, the apparatus includes: an acquisition module and a processing module; the acquisition module is used to acquire an image to be recognized; the processing module is used to perform vectorization processing on the image to be recognized to obtain an image vector to be recognized; the processing module is used to input the image vector to be recognized into a preset self-attention image recognition model to obtain an image recognition result; the preset self-attention image recognition model includes a quantum Transformer module and a fully connected layer, and the quantum Transformer module includes a quantum self-attention module and a feedforward neural network.
[0016] In conjunction with the second aspect, in some embodiments of the second aspect, the processing module is specifically configured to: segment the image to be recognized into multiple sub-images; perform vectorization processing on each of the multiple sub-images to obtain multiple sub-image vectors; and concatenate the multiple sub-image vectors to obtain the image vector to be recognized.
[0017] In conjunction with the second aspect, in some embodiments of the second aspect, before acquiring the image to be recognized, the processing module is further configured to: acquire a quantum Transformer module and an original self-attention image recognition model; the original self-attention image recognition model includes a classical Transformer module and a fully connected layer; replace the classical Transformer module in the original self-attention image recognition model with a quantum Transformer module to obtain a preset self-attention image recognition model.
[0018] In conjunction with the second aspect, in some embodiments of the second aspect, the processing module is specifically configured to: connect a quantum self-attention module and a feedforward neural network to obtain an initial quantum Transformer module; and stack and connect multiple initial quantum Transformer modules to obtain the quantum Transformer module.
[0019] In a third aspect, an image recognition apparatus is provided, comprising: at least one processor and a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method provided by the first aspect and any possible implementation thereof.
[0020] In a fourth aspect, a computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by a processor of an image recognition device, the image recognition device is enabled to perform the method provided by the first aspect and any possible implementation thereof.
[0021] In a fifth aspect, a computer program product containing instructions is provided; when the instructions are run on a computer, the computer is enabled to perform the method provided by the first aspect and any possible implementation thereof.
[0022] For the technical effects of any one of the second to fifth aspects, refer to the technical effects of the corresponding embodiments of the first aspect described above; details are not repeated here.
Attached Figure Description
[0023] Figure 1 is a schematic diagram of the architecture of an image recognition system provided in this application;
Figure 2 is a flowchart of an image recognition method provided in this application;
Figure 3 is a flowchart of another image recognition method provided in this application;
Figure 4 is a flowchart of another image recognition method provided in this application;
Figure 5 is a flowchart of another image recognition method provided in this application;
Figure 6 is a schematic structural diagram of an image recognition device provided in this application;
Figure 7 is a schematic structural diagram of another image recognition device provided in this application.
Detailed Implementation
[0024] In the description of this application, unless otherwise stated, "multiple" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.
[0025] Furthermore, to describe the technical solutions in the embodiments of this application clearly, the terms "first" and "second" are used to distinguish identical or similar items that have substantially the same function and effect. Those skilled in the art will understand that the terms "first" and "second" do not limit quantity or execution order, nor do they require the items they label to be different.
[0026] In this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as being better or more advantageous than other embodiments or designs. Specifically, the use of terms such as "exemplary" or "for example" is intended to present the relevant concepts in a specific manner to facilitate understanding.
[0027] It is understood that the term "embodiment" used throughout the specification means that a specific feature, structure, or characteristic related to an embodiment is included in at least one embodiment of this application. Therefore, various embodiments throughout the specification do not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. It is understood that in the various embodiments of this application, the sequence number of each process does not imply the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0028] It is understood that in this application, "when," "if," and "in case" all indicate that the corresponding processing is performed under certain objective circumstances; they do not denote a specific time, do not require a judgment action during implementation, and imply no other limitation.
[0029] It is understood that some optional features in the embodiments of this application can, in certain scenarios, be implemented independently of other features (such as the solution on which they are currently based) to solve the corresponding technical problems and achieve the corresponding effects; in other scenarios they can be combined with other features as needed. Correspondingly, the apparatus given in the embodiments of this application can also implement these features or functions; details are not repeated here.
[0030] In this application, unless otherwise specified, the same or similar parts of the various embodiments can be referred to each other. Within the various embodiments and their implementations, unless otherwise specified or logically conflicting, the terminology and/or descriptions of different embodiments and implementations are consistent and can be cross-referenced, and technical features from different embodiments and implementations can be combined according to their inherent logical relationships to form new embodiments or implementations. The following embodiments of this application do not limit the scope of protection of this application.
[0031] Figure 1 is a schematic diagram of the architecture of an image recognition system provided in this application. The technical solutions of the embodiments of this application can be applied to the image recognition system shown in Figure 1. As shown in Figure 1, the image recognition system 10 includes an image recognition device 11 and an electronic device 12.
[0032] The image recognition device 11 is directly or indirectly connected to the electronic device 12. This connection can be wired or wireless, and this application embodiment does not limit this.
[0033] The image recognition device 11 and the electronic device 12 can exchange data.
[0034] It should be noted that the image recognition device 11 and the electronic device 12 can be independent devices or integrated into the same device; this application does not make any specific limitation in this regard.
[0035] When the image recognition device 11 and the electronic device 12 are integrated into the same device, they communicate in the same way as internal modules of that device; otherwise, the communication flow between them is the same as when the image recognition device 11 and the electronic device 12 are independent of each other.
[0036] In the following embodiments provided in this application, the image recognition device 11 and the electronic device 12 are described as being configured independently of each other.
[0037] In practical applications, the image recognition method provided in this application embodiment can be applied to the image recognition device 11, or to the devices included in the image recognition device 11.
[0038] The image recognition method provided in this application embodiment will be described below with reference to the accompanying drawings, taking the application of the image recognition method to the image recognition device 11 as an example.
[0039] Figure 2 is a flowchart of an image recognition method provided in this application. As shown in Figure 2, the method includes the following steps:
S201. The image recognition device acquires the image to be recognized.
[0040] The image to be recognized may contain an object to be recognized, such as a street lamp, a car, or a pedestrian; this application imposes no specific restriction on this.
[0041] As one possible implementation, with reference to Figure 1, the image recognition device receives a message from the electronic device, the message including the image to be recognized, and the image recognition device obtains the image to be recognized from the message.
[0042] S202. The image recognition device performs vectorization processing on the image to be recognized to obtain the image vector to be recognized.
[0043] As one possible implementation, the image recognition device segments the image to be recognized into multiple sub-images; for each of the multiple sub-images, the sub-image is vectorized to obtain multiple sub-image vectors; the multiple sub-image vectors are concatenated to obtain the image vector to be recognized.
[0044] It should be noted that this possible implementation is described in detail in S301 to S303 below and is not described here.
[0045] S203. The image recognition device inputs the image vector to be recognized into a preset self-attention image recognition model to obtain the image recognition result.
[0046] The preset self-attention image recognition model includes a quantum Transformer module and a fully connected layer. The quantum Transformer module includes a quantum self-attention module and a feedforward neural network.
[0047] A fully connected layer is a core component of a neural network. Its function is to globally connect all features in the input data to each neuron, achieving feature integration and transformation. Specifically, each neuron in this layer is connected to all outputs of the previous layer, and a linear transformation (i.e., calculating a weighted sum) is performed through the weight matrix and bias vector. Then, non-linearity is introduced through activation functions (such as ReLU or Sigmoid), ultimately outputting a new feature representation. This structure can fuse global information from input features and is often used at the end of classification or regression tasks to map learned high-dimensional features to the target space.
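The computation described above (weighted sum, bias, then activation) can be written in a few lines. The NumPy sketch below is a generic illustration, not the application's implementation; the weight matrix `W`, bias `b`, and the `softmax` helper used for the classification case are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fully_connected(x, W, b, activation=relu):
    """y[j] = activation(sum_i W[j, i] * x[i] + b[j]); every input feeds every neuron."""
    return activation(W @ x + b)

def softmax(z):
    # Turns the final layer's raw scores into class probabilities
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

When the layer serves as the classifier at the end of the model, it is typically used with an identity activation and followed by `softmax`, e.g. `softmax(fully_connected(features, W, b, activation=lambda z: z))`, so that each output neuron scores one category.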
[0048] The quantum self-attention module is the core component of the quantum Transformer module; it re-implements or enhances classical self-attention using quantum computing mechanisms. The input sequence of classical embeddings (word embeddings in natural language processing; in this application, the sub-image vectors) is encoded into a series of quantum states, for example by representing each element as a superposition state of a group of qubits. Parameterized quantum circuits composed of tunable quantum gates then act on these states to compute "attention weights" between them. This process typically exploits quantum entanglement and quantum interference to evaluate complex correlations between sequence elements in parallel, so that the interactions between quantum states encode global dependencies. Finally, quantum measurement collapses the evolved quantum states into classical information, yielding a weighted, context-aware representation.
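The application does not specify its circuit, so the sketch below is only one hypothetical instance of the encode, evolve, measure, weight pipeline just described, simulated classically with a NumPy statevector. It assumes one qubit per feature, angle encoding with RY gates, a single variational layer (RY rotations plus a CNOT chain), and a Gaussian kernel on measured query/key expectation values, loosely following the Gaussian-projected quantum self-attention idea from the quantum machine learning literature. The parameter vectors `theta_q`, `theta_k`, `theta_v` are hypothetical trainable angles.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation (real-valued matrix)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit statevector."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, c, t, n):
    """CNOT: flip qubit t in the amplitudes where qubit c is |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[c] = 1
    t_axis = t if t < c else t - 1                 # axis of t once qubit c is fixed
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def expect_z(state, q, n):
    """Measured <Z> on qubit q, i.e. P(q = 0) - P(q = 1)."""
    probs = (np.abs(state) ** 2).reshape([2] * n)
    p = probs.sum(axis=tuple(a for a in range(n) if a != q))
    return float(p[0] - p[1])

def circuit(x, theta, n):
    """Angle-encode x, then one variational layer: RY(theta) and a CNOT chain."""
    state = np.zeros(2 ** n)
    state[0] = 1.0                                 # start in |0...0>
    for q in range(n):
        state = apply_1q(state, ry(x[q]), q, n)    # encoding (superposition)
    for q in range(n):
        state = apply_1q(state, ry(theta[q]), q, n)
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1, n)     # entangling gates
    return state

def quantum_self_attention(tokens, theta_q, theta_k, theta_v):
    """Attention weights from measured query/key values; outputs mix measured values."""
    n = tokens.shape[1]                            # one qubit per feature (assumption)
    q = np.array([expect_z(circuit(x, theta_q, n), 0, n) for x in tokens])
    k = np.array([expect_z(circuit(x, theta_k, n), 0, n) for x in tokens])
    v = np.array([[expect_z(circuit(x, theta_v, n), j, n) for j in range(n)]
                  for x in tokens])
    scores = np.exp(-(q[:, None] - k[None, :]) ** 2)   # Gaussian kernel, not dot product
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

A classical statevector simulation scales as 2^n in memory, so this is only practical for a handful of qubits; it illustrates the data flow, not any quantum speedup.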
[0049] A feedforward neural network (FNN) is one of the most basic and simplest neural network models. Its information flow is strictly unidirectional, from the input layer to the output layer, without any recurrent or feedback connections. It typically consists of an input layer, one or more hidden layers, and an output layer. Each layer contains several neurons (or nodes), and neurons in adjacent layers are fully connected through weighted connections. Each neuron receives the output of the previous layer, calculates its weighted sum, adds a bias, and then passes it through a non-linear activation function (such as ReLU, Sigmoid, or Tanh) to produce its output, which is then passed to the next layer. This hierarchical structure allows the network to progressively learn complex non-linear mappings in the data by combining features from lower layers, essentially implementing a universal function approximator from input to output. Feedforward neural networks are fundamental to deep learning and are widely used in tasks such as image recognition, speech processing, and regression prediction. Their training process typically uses backpropagation to optimize the loss function and adjust the network weights.
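To make the forward pass and the backpropagation training mentioned above concrete, here is a minimal two-layer network with a ReLU hidden layer trained on mean squared error. It is a generic sketch, not the application's network; the layer sizes, initialization scale, and learning rate are assumptions chosen for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_params(n_in, n_hidden, n_out, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, n_in ** -0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, n_hidden ** -0.5, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(x, p):
    h_pre = x @ p["W1"] + p["b1"]    # weighted sum plus bias (hidden layer)
    h = relu(h_pre)                  # nonlinear activation
    y = h @ p["W2"] + p["b2"]        # output layer (linear)
    return y, (x, h_pre, h)

def mse(y, t):
    return float(np.mean((y - t) ** 2))

def backward(y, t, cache, p):
    """Backpropagate the mean squared error through both layers."""
    x, h_pre, h = cache
    dy = 2.0 * (y - t) / y.size
    grads = {"W2": h.T @ dy, "b2": dy.sum(axis=0)}
    dh_pre = (dy @ p["W2"].T) * (h_pre > 0)   # chain rule through ReLU
    grads["W1"] = x.T @ dh_pre
    grads["b1"] = dh_pre.sum(axis=0)
    return grads

def sgd_step(p, grads, lr):
    return {k: p[k] - lr * grads[k] for k in p}
```

Information flows strictly from `x` to `y` in `forward`; only the gradient computation in `backward` runs in the reverse direction, and it is used solely to update the weights.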
[0050] Understandably, after the image recognition device inputs the image vector to be recognized into the preset self-attention image recognition model, the preset self-attention image recognition model can process the image vector to be recognized and obtain the image recognition result.
[0051] It should be noted that the specific process by which the preset self-attention image recognition model processes the vector of the image to be recognized can refer to existing solutions, and will not be described in detail here.
[0052] Based on S201-S203, this scheme improves image recognition accuracy by integrating the advantages of quantum computing and classical deep learning through a quantum Transformer module. First, the quantum self-attention module utilizes the superposition and entanglement properties of quantum states, theoretically enabling exponential parallelism in processing high-dimensional features after image vectorization. This allows for more efficient capture of long-range, complex global dependencies and subtle patterns in images, something classical self-attention struggles to achieve when computational resources are limited. Second, the features output by the quantum module undergo further nonlinear transformation and integration by a classical feedforward neural network, enhancing the model's representational capabilities. Simultaneously, the fully connected layer acts as a classifier, robustly mapping the learned deep abstract features to specific categories. This entire architecture combines the potentially superior information processing capabilities of quantum computing with the mature optimization mechanisms of classical neural networks, achieving complementarity and enhancement at the feature extraction and relationship modeling levels. This results in a more refined and global understanding of image recognition tasks, ultimately improving classification accuracy.
[0053] The above is a general description of the image recognition method provided in this application. The image recognition method provided in this application will be further described below with reference to the accompanying drawings.
[0054] In one design, as shown in Figure 3, S202 in the specific embodiments of this application may specifically include the following steps:
S301. The image recognition device divides the image to be recognized into multiple sub-images.
[0055] As one possible implementation, the image recognition device divides the image to be recognized evenly into sub-images according to a preset number of segments.
[0056] For example, the preset number of segments can be 8, 9, 10, etc., and this application does not impose a specific limit on the specific value of this preset number of segments.
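One way to realize an even split for a preset number of segments is sketched below. The application does not say how a count such as 8 or 10 maps to a layout, so this sketch assumes a grid of rows x cols tiles chosen as the most nearly square factorization (8 gives 2 x 4, 9 gives 3 x 3, 10 gives 2 x 5); `grid_shape` and `split_image` are hypothetical names. Tiles are returned in row-major order (left to right, top to bottom).

```python
import math
import numpy as np

def grid_shape(num_segments):
    """Most nearly square rows x cols factorization of num_segments (an assumption)."""
    rows = math.isqrt(num_segments)
    while num_segments % rows != 0:
        rows -= 1
    return rows, num_segments // rows

def split_image(image, num_segments):
    """Split an (H, W[, C]) array into num_segments tiles of near-equal size."""
    rows, cols = grid_shape(num_segments)
    tiles = []
    for band in np.array_split(image, rows, axis=0):          # top to bottom
        tiles.extend(np.array_split(band, cols, axis=1))      # left to right
    return tiles
```

`np.array_split` tolerates sizes that do not divide exactly (tiles then differ by at most one pixel), which is one reason the later vectorization step resizes every sub-image to a uniform size.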
[0057] S302. The image recognition device performs vectorization processing on each of the multiple sub-images to obtain multiple sub-image vectors.
[0058] As one possible implementation, the image recognition device performs standardization operations on each sub-image, including resizing to a uniform size, grayscale conversion, and color space transformation, followed by normalization to reduce the impact of illumination and scale differences. Subsequently, the device extracts pixel-level features from the sub-images or extracts basic features through operations such as shallow convolution, flattening the feature matrix into a one-dimensional continuous vector. The flattened vector is then standardized to improve numerical stability. Finally, the device applies principal component analysis or an autoencoder for dimensionality reduction, preserving key information while reducing computational load. This results in multiple sub-image vectors.
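The preprocessing chain in the paragraph above can be sketched as follows. To stay dependency-free, this sketch makes several assumptions not stated in the application: ITU-R BT.601 luma weights for grayscale, nearest-neighbour resizing, min-max normalization, flattening raw pixels (no shallow convolution), z-score standardization, and PCA via SVD fitted across the sub-images of one image. `vectorize_tiles`, `size`, and `k` are hypothetical names.

```python
import numpy as np

def to_gray(tile):
    """RGB -> grayscale with BT.601 luma weights; 2-D input is passed through."""
    if tile.ndim == 3:
        return tile[..., :3] @ np.array([0.299, 0.587, 0.114])
    return tile.astype(float)

def resize_nearest(tile, size):
    """Nearest-neighbour resize of a 2-D array to size x size."""
    h, w = tile.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return tile[np.ix_(rows, cols)]

def vectorize_tiles(tiles, size=8, k=4):
    flat = []
    for t in tiles:
        g = resize_nearest(to_gray(t), size)               # uniform size, grayscale
        g = (g - g.min()) / (g.max() - g.min() + 1e-8)     # min-max normalization
        v = g.reshape(-1)                                  # flatten to 1-D
        v = (v - v.mean()) / (v.std() + 1e-8)              # z-score standardization
        flat.append(v)
    X = np.stack(flat)                                     # (num_tiles, size * size)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)      # PCA via SVD
    return Xc @ Vt[:k].T                                   # (num_tiles, k)
```

PCA here can produce at most as many components as there are sub-images, so `k` should not exceed the number of tiles; a learned autoencoder, the alternative the text mentions, has no such limit but needs training data.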
[0059] S303. The image recognition device concatenates the multiple sub-image vectors to obtain the image vector to be recognized.
[0060] As one possible implementation, the image recognition device checks whether all sub-image vectors have the same dimension. If they do not, it unifies them to the same length by padding, truncation, or projection so that the concatenation is valid. The device then concatenates the sub-image vectors along a specified axis in the spatial order of the sub-images in the original image (for example, grid order from left to right and top to bottom). Next, it adds positional encoding information to the concatenated vector, using either classical sine/cosine positional encoding or a learnable positional embedding layer, so that the model can exploit spatial structure. The resulting image vector to be recognized is a continuous representation that integrates the local features of all sub-images and implicitly encodes their spatial order. It can be used directly as the input of the preset self-attention image recognition model, whose quantum Transformer module then performs global relationship modeling for classification.
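A minimal sketch of this step, assuming zero-padding/truncation (not projection) for length unification and the sine/cosine positional encoding from the original Transformer paper. It stacks the sub-image vectors as rows of an (N, d) sequence, the usual self-attention input layout; if a single flat vector is required, `seq.reshape(-1)` gives the concatenation in the same order. The function names are hypothetical.

```python
import numpy as np

def unify_length(vectors, d):
    """Truncate vectors longer than d and zero-pad shorter ones."""
    out = []
    for v in vectors:
        v = np.asarray(v, dtype=float)[:d]
        out.append(np.pad(v, (0, d - v.shape[0])))
    return out

def sinusoidal_pe(num_pos, d):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(num_pos)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def build_sequence(vectors, d):
    """Rows follow the sub-images' grid order; position is added, not appended."""
    seq = np.stack(unify_length(vectors, d))     # (num_tiles, d)
    return seq + sinusoidal_pe(seq.shape[0], d)
```

Adding (rather than concatenating) the positional encoding keeps the vector dimension unchanged, so the model input size does not depend on how many sub-images there are.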
[0061] Based on S301-S303, this technical solution significantly improves the image recognition model's ability to process complex visual information through a "divide and conquer-integrate" strategy. The specific technical effects are reflected in three aspects: First, by segmenting the image to be recognized into multiple sub-images, refined extraction of local features is achieved, allowing the model to focus on image details (such as texture and edges) and avoid the loss of detailed information during global processing. Second, each sub-image is independently vectorized and ultimately concatenated, preserving spatial structure information (through sequential concatenation and positional encoding) and constructing a hierarchical image representation, providing clear local feature units for subsequent self-attention mechanisms. Finally, this structured vector input enables the quantum Transformer module to more efficiently model local and global dependencies: quantum self-attention can perform parallel correlation calculations between sub-image vectors, and the feedforward neural network further fuses features, thereby enhancing the model's understanding of the overall semantics of the image. Overall, this solution improves the model's robustness to image scaling, translation, and local perturbations, while providing a suitable data structure for leveraging the advantages of quantum computing, contributing to more accurate image recognition in complex scenarios.
[0062] In one design, as shown in Figure 4, before S201 the image recognition method provided in this application may further include the following steps:
S401. The image recognition device acquires the quantum Transformer module and the original self-attention image recognition model.
[0063] The original self-attention image recognition model includes a classical Transformer module and a fully connected layer.
[0064] For example, the original self-attention image recognition model can be Vision Transformer, Swin Transformer, or Twins. The original self-attention image recognition model may also be a model with another name; this application imposes no specific restriction on this.
[0065] Vision Transformer (ViT) is a pioneering model that applies a pure Transformer architecture to image recognition. It dispenses with convolution entirely: the input image is split into fixed-size patches, each patch is linearly projected into an embedding (analogous to a word vector in NLP), positional encodings are added, and the resulting sequence is fed directly into a standard Transformer encoder composed of stacked multi-head self-attention and feedforward layers. Through self-attention, ViT establishes global dependencies among all image patches and thereby learns the global context of the image. When pre-trained on large datasets, it achieved performance comparable to or better than the state-of-the-art convolutional networks of its time on multiple image classification benchmarks.
[0066] Swin Transformer is a general-purpose vision Transformer backbone that introduces a hierarchical design and shifted windows to fuse local and global information efficiently. Unlike ViT, Swin Transformer builds hierarchical feature maps similar to a convolutional feature pyramid, progressing from small patches at high resolution to merged, larger patches at low resolution. Its core innovation is shifted-window self-attention: each layer divides the feature map into non-overlapping local windows and computes self-attention only within each window, which greatly reduces computational complexity; meanwhile, shifting the window boundaries between consecutive layers lets information flow between different windows. This design lets it process high-resolution images as efficiently as CNNs (with computational complexity linear in image size) while still modeling long-range dependencies.
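The window mechanics can be illustrated with array reshapes alone. The sketch below partitions an (H, W, C) feature map into non-overlapping M x M windows and, for the shifted variant, first cyclically rolls the map by M // 2 in the style of Swin's reference implementation. It omits the attention mask Swin uses for windows that wrap around after the shift, and `attention_pairs` is a hypothetical helper that counts query-key pairs to show the linear-versus-quadratic scaling.

```python
import numpy as np

def window_partition(x, M):
    """(H, W, C) feature map -> (num_windows, M*M, C) non-overlapping windows."""
    H, W, C = x.shape
    assert H % M == 0 and W % M == 0, "H and W must be multiples of the window size"
    x = x.reshape(H // M, M, W // M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, C)

def shifted_window_partition(x, M):
    """Cyclically shift by M // 2 before partitioning so windows straddle old borders."""
    s = M // 2
    return window_partition(np.roll(x, shift=(-s, -s), axis=(0, 1)), M)

def attention_pairs(H, W, M=None):
    """Query-key pairs scored per layer: global attention if M is None, else windowed."""
    n = H * W
    return n * n if M is None else (n // (M * M)) * (M * M) ** 2
```

For a 56 x 56 token map with M = 7, global attention scores 3136^2 = 9,834,496 pairs per layer, whereas windowed attention scores 3136 x 49 = 153,664; the windowed count grows linearly with the number of tokens.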
[0067] Twins is a visual Transformer architecture designed to balance high computational efficiency with a large receptive field. Its core design principle is spatially separable self-attention. It typically employs two parallel attention paths: one path computes self-attention within a fine-grained local window to efficiently capture detailed features; the other path downsamples the feature map and then performs self-attention computation at a global or coarse-grained level to quickly establish long-range dependencies. The outputs of the two paths are fused, achieving a highly efficient unification of fine-grained local modeling and global contextual understanding. Through this alternating or parallel local-global attention design, the Twins model family achieves near-linear computational complexity while maintaining near-global attention performance, demonstrating excellent performance in tasks such as image classification and object detection, exhibiting a good balance between speed and accuracy.
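The downsampled global path can be sketched as follows. The sketch assumes average pooling over non-overlapping k x k blocks to form the keys and values, uses identity query/key/value projections, and omits the local-window path entirely; the function names are illustrative and are not taken from any Twins implementation.

```python
import math
from typing import List

Vec = List[float]

def softmax(xs: Vec) -> Vec:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def avg_pool(grid: List[List[Vec]], k: int) -> List[Vec]:
    """Average each non-overlapping k x k block of an H x W map of vectors into one summary token."""
    h, w = len(grid), len(grid[0])
    pooled = []
    for r in range(0, h, k):
        for c in range(0, w, k):
            block = [grid[i][j] for i in range(r, r + k) for j in range(c, c + k)]
            pooled.append([sum(vals) / len(block) for vals in zip(*block)])
    return pooled

def global_subsampled_attention(grid: List[List[Vec]], k: int) -> List[Vec]:
    """Every token (query) attends only to the pooled summary tokens (keys/values).

    Assumes H and W are multiples of k. Learned projections are omitted for brevity.
    """
    kv = avg_pool(grid, k)
    d = len(kv[0])
    out = []
    for row in grid:
        for q in row:
            weights = softmax([sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in kv])
            out.append([sum(wt * v[i] for wt, v in zip(weights, kv)) for i in range(d)])
    return out
```

Each query is compared with only H x W / k^2 summary tokens, so the cost is roughly (H x W) x (H x W / k^2) dot products rather than (H x W)^2.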
[0068] As one possible implementation, the image recognition device connects a quantum self-attention module and a feedforward neural network to obtain an initial quantum Transformer module; multiple initial quantum Transformer modules are stacked and connected to obtain a quantum Transformer module.
[0069] Subsequently, the image recognition device obtains the original self-attention image recognition model.
[0070] S402, The image recognition device replaces the classical Transformer module in the original self-attention image recognition model with a quantum Transformer module to obtain a preset self-attention image recognition model.
[0071] As one possible implementation, the image recognition device locates the classical Transformer encoder module responsible for global context modeling in the original model. Subsequently, the image recognition device replaces the classical Transformer encoder module with a quantum Transformer module.
[0072] It should be noted that the specific scheme for replacing the classical Transformer encoder module with the quantum Transformer module in this possible implementation can refer to existing schemes, and will not be described in detail here.
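Although the replacement scheme is left to existing practice, the idea of S402 (keep the fully connected layer and swap only the encoder) can be sketched with a hypothetical container type. `SelfAttentionRecognizer` and `swap_encoder` are illustrative names, and the encoders are arbitrary callables standing in for the classical and quantum Transformer modules.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

Vec = List[float]
Encoder = Callable[[List[Vec]], List[Vec]]
Head = Callable[[List[Vec]], List[float]]

@dataclass(frozen=True)
class SelfAttentionRecognizer:
    """Hypothetical model container: an encoder followed by a fully connected head."""
    encoder: Encoder  # classical or quantum Transformer module
    head: Head        # fully connected layer

    def __call__(self, tokens: List[Vec]) -> List[float]:
        return self.head(self.encoder(tokens))

def swap_encoder(model: SelfAttentionRecognizer, quantum_encoder: Encoder) -> SelfAttentionRecognizer:
    """S402 sketch: return a copy whose encoder is replaced; the head is reused unchanged."""
    return replace(model, encoder=quantum_encoder)
```

Returning a new object leaves the original model intact, which makes it straightforward to compare the classical and quantum variants on the same inputs.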
[0073] Based on S401-S402, the quantum Transformer module can use quantum superposition and quantum entanglement to process high-dimensional feature spaces with exponential parallelism, model complex non-local relationships between pixels or image patches more efficiently, and capture subtle patterns that traditional models may overlook, thereby improving the image recognition accuracy of the preset self-attention image recognition model.
[0074] In one design, as shown in Figure 5, S401 provided in the specific embodiments of this application may include the following steps: S501, The image recognition device connects the quantum self-attention module and the feedforward neural network to obtain the initial quantum Transformer module.
[0075] It should be noted that both the quantum self-attention module and the feedforward neural network contain residual connections. The quantum self-attention module is the core of the preset self-attention image recognition model. It implements the self-attention mechanism through amplitude encoding and a parameterized quantum circuit (PQC), and outputs a set of weighted values.
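Amplitude encoding followed by a PQC can be illustrated with a small classical state-vector simulation. This is a sketch under several assumptions that this application does not specify: each circuit layer applies one RY rotation per qubit followed by a chain of CNOT gates, the circuit outputs are Pauli-Z expectation values, and attention weights are obtained from an assumed score -(q_i - k_j)^2 passed through a softmax. The simulation sets the encoded amplitudes directly and stores all 2^n of them, so its memory grows exponentially with the qubit count; on quantum hardware, a state-preparation circuit would be needed for the encoding.

```python
import math
from typing import List

Vec = List[float]

def amplitude_encode(x: Vec) -> Vec:
    """Zero-pad x to length 2**n and L2-normalize it: the amplitudes of an n-qubit state."""
    n = max(1, math.ceil(math.log2(len(x))))
    padded = list(x) + [0.0] * (2 ** n - len(x))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("the zero vector cannot be amplitude-encoded")
    return [v / norm for v in padded]

def apply_ry(state: Vec, qubit: int, theta: float) -> Vec:
    """Apply RY(theta) to one qubit; qubit k corresponds to bit k of the basis index."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << qubit
    out = state[:]
    for i in range(len(state)):
        if (i & bit) == 0:
            a0, a1 = state[i], state[i | bit]
            out[i] = c * a0 - s * a1
            out[i | bit] = s * a0 + c * a1
    return out

def apply_cnot(state: Vec, control: int, target: int) -> Vec:
    """Flip the target qubit on every basis state whose control qubit is 1."""
    cbit, tbit = 1 << control, 1 << target
    out = state[:]
    for i in range(len(state)):
        if (i & cbit) and not (i & tbit):
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out

def pqc_expectations(x: Vec, params: List[Vec]) -> Vec:
    """Amplitude-encode x, run RY + CNOT-chain layers, and return <Z> for each qubit."""
    state = amplitude_encode(x)
    n = int(math.log2(len(state)))
    for layer in params:
        if len(layer) != n:
            raise ValueError(f"each layer needs {n} angles")
        for q, theta in enumerate(layer):
            state = apply_ry(state, q, theta)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1)
    # Amplitudes stay real here, so the probability of basis state i is a**2.
    return [sum(a * a * (1 if ((i >> q) & 1) == 0 else -1) for i, a in enumerate(state))
            for q in range(n)]

def quantum_attention_weights(tokens: List[Vec], params_q: List[Vec], params_k: List[Vec]) -> List[Vec]:
    """Assumed scoring rule: score(i, j) = -(q_i - k_j)**2, read from qubit 0, then a row softmax."""
    qs = [pqc_expectations(t, params_q)[0] for t in tokens]
    ks = [pqc_expectations(t, params_k)[0] for t in tokens]
    weights = []
    for qi in qs:
        scores = [-(qi - kj) ** 2 for kj in ks]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```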
[0076] As one possible implementation, the image recognition device calls a quantum self-attention module and a feedforward neural network, connecting the output of the quantum self-attention module with the input of the feedforward neural network to obtain an initial quantum Transformer module.
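The wiring of one initial quantum Transformer module, including the residual connections mentioned in [0075], can be sketched as below. The attention step is passed in as a callable (for example, a quantum self-attention routine); the two-layer ReLU feedforward network, its weights, and the absence of layer normalization are assumptions made for brevity.

```python
from typing import Callable, List

Vec = List[float]
Mat = List[List[float]]

def linear(x: Vec, w: Mat, b: Vec) -> Vec:
    """y = x W + b for one vector, with w of shape len(x) x len(b)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def feed_forward(x: Vec, w1: Mat, b1: Vec, w2: Mat, b2: Vec) -> Vec:
    """Two-layer position-wise feedforward network with a ReLU in between."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

def initial_quantum_block(tokens: List[Vec],
                          attention: Callable[[List[Vec]], List[Vec]],
                          ffn: Callable[[Vec], Vec]) -> List[Vec]:
    """The attention output feeds the FFN input; each sub-layer has a residual connection."""
    attended = attention(tokens)
    h = [[a + b for a, b in zip(t, at)] for t, at in zip(tokens, attended)]  # x + Attn(x)
    return [[a + b for a, b in zip(v, ffn(v))] for v in h]                   # h + FFN(h)
```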
[0077] S502, The image recognition device stacks and connects multiple initial quantum Transformer modules to obtain a quantum Transformer module.
[0078] As one possible implementation, the image recognition device connects the output of each initial quantum Transformer module to the input of the next initial quantum Transformer module, and so on, to obtain the quantum Transformer module.
[0079] The last initial quantum Transformer module in the quantum Transformer module outputs multiple vectors, which are averaged to obtain a final vector. This final vector is then used as the input to the fully connected layer for the subsequent image recognition steps.
[0080] Based on S501-S502, the deep quantum Transformer module is constructed by stacking modules. Its core technical advantage lies in using the depth and hierarchical structure of quantum computing to simulate and enhance the feature abstraction capability of classical deep learning. Connecting the quantum self-attention module and the feedforward neural network into a basic module realizes nonlinear transformation and feature renormalization of quantum states. Stacking multiple such modules forms a deep quantum processing pipeline in which the input quantum state undergoes progressively more abstract and complex quantum feature extraction and fusion. This deep stacking allows the model to capture deeper and more complex association patterns in the data (such as multi-level, multi-scale visual semantic relationships in images), and the cumulative effects of quantum interference and entanglement may yield representational capability surpassing that of classical deep models.
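S502 and the averaging step described above can be sketched as follows: blocks are chained output-to-input, the vectors produced by the last block are averaged into one final vector, and that vector passes through a fully connected layer. The block callables and the weights `w`, `b` are hypothetical placeholders.

```python
from typing import Callable, List

Vec = List[float]
Block = Callable[[List[Vec]], List[Vec]]

def stack_blocks(blocks: List[Block]) -> Block:
    """Chain blocks so that each block's output is the next block's input (S502)."""
    def run(tokens: List[Vec]) -> List[Vec]:
        for block in blocks:
            tokens = block(tokens)
        return tokens
    return run

def mean_pool(tokens: List[Vec]) -> Vec:
    """Average the vectors output by the last block into a single final vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]

def fully_connected(x: Vec, w: List[Vec], b: Vec) -> Vec:
    """Logits = x W + b, with w of shape len(x) x num_classes (hypothetical weights)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def recognize(tokens: List[Vec], blocks: List[Block], w: List[Vec], b: Vec) -> int:
    """Return the index of the highest-scoring class."""
    logits = fully_connected(mean_pool(stack_blocks(blocks)(tokens)), w, b)
    return max(range(len(logits)), key=logits.__getitem__)
```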
[0081] The foregoing mainly describes the solutions provided by the embodiments of this application from the perspective of an image recognition device executing an image recognition method. To implement the above functions, the image recognition device includes corresponding hardware structures and/or software modules for executing each function. Those skilled in the art should readily recognize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
[0082] In the embodiments of this application, the image recognition device may be divided into functional modules according to the above method examples. For example, each function may be assigned to a separate functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. Optionally, the module division in the embodiments of this application is illustrative and represents only one logical functional division; other divisions may be used in actual implementations. Furthermore, a "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above functions.
[0083] When the device is divided into functional modules by function, Figure 6 shows a schematic diagram of an image recognition device. As shown in Figure 6, the image recognition device 60 includes an acquisition module 601 and a processing module 602.
[0084] In some embodiments, the image recognition device 60 may further include a storage module (not shown in Figure 6), which is used to store program instructions and data.
[0085] The acquisition module 601 is used to acquire the image to be recognized; the processing module 602 is used to vectorize the image to be recognized to obtain the image vector; the processing module 602 is used to input the image vector to be recognized into a preset self-attention image recognition model to obtain the image recognition result; the preset self-attention image recognition model includes a quantum Transformer module and a fully connected layer, and the quantum Transformer module includes a quantum self-attention module and a feedforward neural network.
[0086] Optionally, the processing module 602 is specifically configured to vectorize the image to be recognized to obtain the image vector to be recognized by: segmenting the image to be recognized into multiple sub-images; vectorizing each of the multiple sub-images to obtain multiple sub-image vectors; and concatenating the multiple sub-image vectors to obtain the image vector to be recognized.
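The vectorization described above (segment, vectorize each sub-image, concatenate) can be sketched as follows. The sketch assumes an H x W x C image stored as nested lists, a square patch size p that divides both H and W, and row-major flattening of each patch; ViT-style models usually also apply a learned linear projection to each patch vector, which is omitted here.

```python
from typing import List, Tuple

Pixel = List[float]          # C channel values
Image = List[List[Pixel]]    # H x W x C

def split_into_patches(img: Image, p: int) -> List[Image]:
    """Segment the image into non-overlapping p x p sub-images, row by row."""
    h, w = len(img), len(img[0])
    if h % p or w % p:
        raise ValueError("image height and width must be multiples of p")
    return [[row[c:c + p] for row in img[r:r + p]]
            for r in range(0, h, p) for c in range(0, w, p)]

def flatten(sub_image: Image) -> List[float]:
    """Vectorize one sub-image in row-major order (row, then column, then channel)."""
    return [v for row in sub_image for pixel in row for v in pixel]

def image_to_vector(img: Image, p: int) -> Tuple[List[List[float]], List[float]]:
    """Return the per-patch vectors and their concatenation."""
    patch_vectors = [flatten(s) for s in split_into_patches(img, p)]
    return patch_vectors, [v for vec in patch_vectors for v in vec]
```

Both the per-patch vectors and the concatenated vector are returned, since a Transformer encoder typically consumes the patches as a token sequence while the text describes their concatenation.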
[0087] Optionally, before acquiring the image to be recognized, the processing module 602 is further configured to: acquire a quantum Transformer module and an original self-attention image recognition model; the original self-attention image recognition model includes a classical Transformer module and a fully connected layer; and replace the classical Transformer module in the original self-attention image recognition model with a quantum Transformer module to obtain a preset self-attention image recognition model.
[0088] Optionally, the processing module 602 is specifically configured to obtain the quantum Transformer module by: connecting the quantum self-attention module and the feedforward neural network to obtain an initial quantum Transformer module; and stacking and connecting multiple initial quantum Transformer modules to obtain the quantum Transformer module.
[0089] All relevant content of each step involved in the above method embodiments can be referenced from the functional description of the corresponding functional module, and will not be repeated here.
[0090] When the functions of the above modules are implemented in hardware, Figure 7 shows a schematic diagram of yet another image recognition device. As shown in Figure 7, the image recognition device 70 includes a processor 701, a memory 702, and a bus 703. The processor 701 and the memory 702 can be connected via the bus 703.
[0091] The processor 701 is the control center of the image recognition device 70. It can be a single processor or a collective term for multiple processing elements. For example, the processor 701 can be a general-purpose central processing unit (CPU) or other general-purpose processors. Among them, the general-purpose processor can be a microprocessor or any conventional processor.
[0092] As one embodiment, the processor 701 may include one or more CPUs, for example, CPU 0 and CPU 1 shown in Figure 7.
[0093] The memory 702 may be a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, or electrically erasable programmable read-only memory (EEPROM), disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto.
[0094] As one possible implementation, the memory 702 can exist independently of the processor 701. The memory 702 can be connected to the processor 701 via a bus 703 and is used to store instructions or program code. When the processor 701 calls and executes the instructions or program code stored in the memory 702, it can implement the image recognition method provided in the embodiments of this application.
[0095] In another possible implementation, the memory 702 can also be integrated with the processor 701.
[0096] The bus 703 can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is drawn as a single thick line in Figure 7, but this does not mean that there is only one bus or one type of bus.
[0097] It should be noted that the structure shown in Figure 7 does not constitute a limitation on the image recognition device 70. In addition to the components shown in Figure 7, the image recognition device 70 may include more or fewer components than shown, combine certain components, or have a different component arrangement.
[0098] As an example, with reference to Figure 6, the functions implemented by the acquisition module 601 and the processing module 602 in the image recognition device 60 are the same as those of the processor 701 in Figure 7.
[0099] Optionally, as shown in Figure 7, the image recognition device 70 provided in the embodiments of this application may further include a communication interface 704.
[0100] Communication interface 704 is used to connect to other devices via a communication network. This communication network can be Ethernet, a wireless access network, a wireless local area network (WLAN), etc. Communication interface 704 may include a receiving unit for receiving data and a transmitting unit for transmitting data.
[0101] In one possible implementation, the communication interface 704 in the image recognition device 70 provided in this application embodiment can also be integrated into the processor 701, and this application embodiment does not specifically limit this.
[0102] As a possible product form, the image recognition device of this application embodiment can also be implemented using one or more field programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gate logic, discrete hardware components, any other suitable circuits, or any combination of circuits capable of performing the various functions described throughout this application.
[0103] Through the above description of the embodiments, those skilled in the art will clearly understand that, for the sake of convenience and brevity, only the division of the above functional units is used as an example. In practical applications, the above functions can be assigned to different functional units as needed, that is, the internal structure of the device can be divided into different functional units to complete all or part of the functions described above. The specific working process of the system, device, and unit described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0104] This application also provides a computer-readable storage medium storing a computer program or instructions thereon, which, when executed, causes a computer to perform the various steps in the method flow shown in the above method embodiments.
[0105] Embodiments of this application provide a computer program product containing instructions that, when executed on a computer, cause the computer to perform the various steps in the method flow shown in the above-described method embodiments.
[0106] This application provides a chip system, including: a processor and an interface circuit; the interface circuit is used to receive computer programs or instructions and transmit them to the processor; the processor is used to execute the computer programs or instructions so that the chip system performs each step in the method flow shown in the above method embodiments.
[0107] The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), registers, optical fibers, compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing, or any other form of computer-readable storage medium in the art. An exemplary storage medium is coupled to a processor so that the processor can read information from and write information to the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). In the embodiments of this application, the computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
[0108] Since the image recognition device, computer-readable storage medium, and computer program product provided in this embodiment can be applied to the image recognition method provided in this embodiment, the technical effects they can achieve can also be referred to the above method embodiments. The embodiments of this application will not be repeated here.
[0109] Although this application has been described herein in conjunction with various embodiments, in practicing the claimed application, those skilled in the art can understand and implement other variations of the disclosed embodiments by studying the accompanying drawings and the disclosure.
[0110] Although this application has been described in conjunction with specific features and embodiments, it is obvious that various modifications and combinations can be made thereto without departing from the spirit and scope of this application. Accordingly, this specification and the drawings are merely illustrative examples of this application and are considered to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Clearly, those skilled in the art can make various alterations and modifications to this application without departing from its spirit and scope. Thus, if such alterations and modifications of this application fall within the scope of equivalent technologies of this application, this application is also intended to include them.
Claims
1. An image recognition method, characterized in that the method comprises: acquiring an image to be recognized; vectorizing the image to be recognized to obtain an image vector to be recognized; and inputting the image vector to be recognized into a preset self-attention image recognition model to obtain an image recognition result; wherein the preset self-attention image recognition model comprises a quantum Transformer module and a fully connected layer, and the quantum Transformer module comprises a quantum self-attention module and a feedforward neural network.
2. The method of claim 1, wherein vectorizing the image to be recognized to obtain the image vector to be recognized comprises: segmenting the image to be recognized into multiple sub-images; vectorizing each of the multiple sub-images to obtain multiple sub-image vectors; and concatenating the multiple sub-image vectors to obtain the image vector to be recognized.
3. The method according to claim 1 or 2, characterized in that, before acquiring the image to be recognized, the method further comprises: obtaining the quantum Transformer module and an original self-attention image recognition model, the original self-attention image recognition model comprising a classical Transformer module and a fully connected layer; and replacing the classical Transformer module in the original self-attention image recognition model with the quantum Transformer module to obtain the preset self-attention image recognition model.
4. The method of claim 3, wherein obtaining the quantum Transformer module comprises: connecting the quantum self-attention module and the feedforward neural network to obtain an initial quantum Transformer module; and stacking and connecting multiple initial quantum Transformer modules to obtain the quantum Transformer module.
5. An image recognition apparatus, characterized by comprising an acquisition module and a processing module; the acquisition module is configured to acquire an image to be recognized; the processing module is configured to vectorize the image to be recognized to obtain an image vector to be recognized; and the processing module is further configured to input the image vector to be recognized into a preset self-attention image recognition model to obtain an image recognition result; wherein the preset self-attention image recognition model comprises a quantum Transformer module and a fully connected layer, and the quantum Transformer module comprises a quantum self-attention module and a feedforward neural network.
6. The apparatus of claim 5, wherein, to vectorize the image to be recognized to obtain the image vector to be recognized, the processing module is specifically configured to: segment the image to be recognized into multiple sub-images; vectorize each of the multiple sub-images to obtain multiple sub-image vectors; and concatenate the multiple sub-image vectors to obtain the image vector to be recognized.
7. The apparatus according to claim 5 or 6, characterized in that, before the image to be recognized is acquired, the processing module is further configured to: obtain the quantum Transformer module and an original self-attention image recognition model, the original self-attention image recognition model comprising a classical Transformer module and a fully connected layer; and replace the classical Transformer module in the original self-attention image recognition model with the quantum Transformer module to obtain the preset self-attention image recognition model.
8. The apparatus according to claim 7, characterized in that the processing module is specifically configured to obtain the quantum Transformer module by: connecting the quantum self-attention module and the feedforward neural network to obtain an initial quantum Transformer module; and stacking and connecting multiple initial quantum Transformer modules to obtain the quantum Transformer module.
9. An image recognition apparatus, characterized by comprising a processor coupled to a memory, the memory being configured to store programs or instructions which, when executed by the processor, cause the apparatus to perform the method according to any one of claims 1 to 4.
10. A computer-readable storage medium having a computer program or instructions stored thereon, characterized in that, When the computer program or instructions are executed, they cause the computer to perform the method as described in any one of claims 1 to 4.