Image recognition method, device, equipment and medium based on Bayesian linearity
By introducing Bayesian linearity into the channel attention module of the convolutional neural network model, the problems of numerous parameters and difficulty in providing a good prior distribution in the Bayesian convolutional neural network model are solved, enabling effective prediction of image uncertainty and efficient parameter learning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGZHOU SHIYUAN ELECTRONICS CO LTD
- Filing Date
- 2021-10-19
- Publication Date
- 2026-06-19
[Figure: CN116012598B_ABST]
Description
Technical Field
[0001] The embodiments of the present invention relate to the field of image processing technology, and in particular to an image recognition method, apparatus, electronic device and storage medium based on Bayesian linearity.
Background Technology
[0002] Convolutional Neural Networks (CNNs) train models using backpropagation based on observed data (training samples) to obtain optimal estimates of model parameters, supporting deterministic outputs. While CNNs trained based on these optimal parameter estimates can fit observed data well, they cannot predict unobserved data (test samples) effectively, resulting in overfitting to the existing training data. Although existing regularization methods can alleviate overfitting to some extent, such as early stopping, weight decay, L1-L2 regularization, and dropout, the model itself cannot measure uncertainty. In classification tasks, the softmax function maximizes the output probability score of a given class by compressing the output probability scores of other classes; this probability is not the model's confidence in outputting the given class.
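The remark that a softmax score is not a confidence measure can be illustrated with a minimal sketch. The logits and the scale factors below are hypothetical values chosen only for illustration: rescaling the same logits leaves the predicted class unchanged while the softmax score moves toward 1.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes; only their overall scale differs.
base_logits = [2.0, 1.0, 0.5]
for scale in (1.0, 5.0):
    probs = softmax([scale * z for z in base_logits])
    # The arg-max class is the same for both scales, but the top score
    # grows with the scale, so it cannot be read as calibrated confidence.
    print(scale, [round(p, 3) for p in probs])
```

The score of the top class depends on the magnitude of the logits, which the training procedure does not tie to how well the input is covered by the training data.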
[0003] To improve the generalization ability of CNN models and support model uncertainty measurement, existing research has introduced Bayesian methods, resulting in BCNN (Bayesian Convolutional Neural Network), which transforms the estimation of the optimal point of model parameters into the estimation of the distribution of model parameters. BCNN first provides a prior distribution for the parameters, then performs gradient approximation estimation using variational inference, learning the posterior distribution of the fitted parameters based on observed data (training samples). This learned posterior distribution is used to infer the results from unobserved data (test samples).
[0004] When the inventors used BCNN to build a network model for image recognition, they found that although the BCNN model supports uncertainty estimation, it still has problems such as many parameters, difficulty in giving a good prior distribution, and approximate gradient estimation.
Summary of the Invention
[0005] This invention provides a Bayesian linear image recognition method, apparatus, electronic device, and storage medium to solve the technical problems of existing Bayesian convolutional neural network models having many parameters, difficulty in giving a good prior distribution, and approximate gradient estimation.
[0006] In a first aspect, embodiments of the present invention provide an image recognition method based on Bayesian linearity, comprising:
[0007] Acquire the image to be recognized;
[0008] The image to be recognized is input into a pre-trained convolutional neural network model for target recognition. The convolutional neural network model includes a channel attention module based on Bayesian linearity. The channel attention module is used to perform average pooling on the initial multidimensional features of the input to obtain pooled features, perform Bayesian linearity on the pooled features to obtain linear features, expand the linear features to the same dimension as the initial multidimensional features, and multiply them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module. The convolutional neural network module performs image recognition on the image to be recognized based on the output multidimensional features.
[0009] Output the recognition result of the convolutional neural network model on the image to be recognized.
[0010] In a second aspect, embodiments of the present invention also provide an image recognition device based on Bayesian linearity, comprising:
[0011] An image acquisition unit is used to acquire the image to be recognized.
[0012] An image recognition unit is used to input the image to be recognized into a pre-trained convolutional neural network model for target recognition. The convolutional neural network model includes a channel attention module based on Bayesian linearity. The channel attention module is used to perform average pooling on the initial multidimensional features of the input to obtain pooled features, perform Bayesian linearity on the pooled features to obtain linear features, expand the linear features to the same dimension as the initial multidimensional features, and multiply them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module. The convolutional neural network module performs image recognition on the image to be recognized based on the output multidimensional features.
[0013] The result output unit is used to output the recognition result of the convolutional neural network model on the image to be recognized.
[0014] In a third aspect, embodiments of the present invention also provide an electronic device, comprising:
[0015] One or more processors;
[0016] Memory, used to store one or more programs;
[0017] When the one or more programs are executed by the one or more processors, the one or more processors implement the image recognition method based on Bayesian linearity as described in the first aspect.
[0018] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the image recognition method based on Bayesian linearity as described in the first aspect.
[0019] In the aforementioned image recognition method, apparatus, electronic device, and storage medium based on Bayesian linearity, the method acquires an image to be recognized and inputs it into a pre-trained convolutional neural network model for target recognition. The convolutional neural network model includes a channel attention module based on Bayesian linearity. The channel attention module performs average pooling on the input initial multidimensional features to obtain pooled features, applies Bayesian linearity to the pooled features to obtain linear features, expands the linear features to the same dimension as the initial multidimensional features, and multiplies them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module. The convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features and outputs the recognition result. By adding Bayesian linearity to the channel attention module of the model, a channel attention mechanism based on Bayesian linearity is constructed in the model, enabling the capture of the uncertainty of locally important information. This solves the problems of existing Bayesian convolutional neural network models having many parameters, difficulty in giving a good prior distribution, and approximate gradient estimation, and allows the uncertainty of the entire image to be predicted effectively.
Description of the Drawings
[0020] Figure 1 is a flowchart of an image recognition method based on Bayesian linearity provided in an embodiment of the present invention;
[0021] Figure 2 is a schematic diagram illustrating the principle of the channel attention module provided in an embodiment of the present invention;
[0022] Figure 3 is a schematic diagram of the structure of an image recognition device based on Bayesian linearity provided in an embodiment of the present invention;
[0023] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0024] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and not for limiting the invention. Furthermore, it should be noted that, for ease of description, the accompanying drawings show only the parts relevant to the present invention and not the entire structure.
[0025] In the field of image recognition processing, uncertainty indicates how certain an image recognition model is about its predictions. In Bayesian models, there are two main types of uncertainty: random (aleatoric) uncertainty and cognitive (epistemic) uncertainty. Random uncertainty stems from inherent noise in the observed data. It originates in the data acquisition process, for example sensor noise or motion noise distributed uniformly across the dataset, and cannot be eliminated by increasing the dataset size. Cognitive uncertainty, on the other hand, arises because the model has not learned from enough samples. It is ubiquitous because the observed data (training samples) cannot fully cover the features of the unobserved data (test samples), and it can be reduced by increasing the dataset size.
[0026] For example, when taking an image roughly in one direction of the road, the captured image may contain elements such as vehicles, intersections, sidewalks, traffic lights, pedestrians, trees, and buildings. When segmenting these elements using an image recognition model, segmentation biases may occur. For instance, due to variations in shooting distance and viewpoint, vehicle labels may contain noise; this random uncertainty can lead to poor vehicle segmentation. Similarly, if sidewalks are rarely seen in the training set, or if the image recognition model used to fit the training set is not well-suited for the segmentation task, this cognitive uncertainty may result in poor sidewalk segmentation.
[0027] For image recognition models, random uncertainty is inherent and objective; while cognitive uncertainty is subjective and can be eliminated. Generally, eliminating uncertainty in image recognition models involves two aspects: firstly, increasing the dataset, such as adding various types of pedestrian crossings to images captured in the road scene mentioned earlier; and secondly, designing a model that can better fit the training set, i.e., changing the model. In reality, no matter how much the number of samples is increased, the observable data is always finite and cannot be exhaustive, and data collection is costly. As for changing the model, it can only be a model that can better fit the observed data (training samples), and cannot support model uncertainty estimation.
[0028] From the perspective of data distribution, the model uncertainty inherent in cognitive uncertainty is essentially the result of training to fit a distribution that reflects the features of the training set, while the features of the test samples to be inferred may not conform to this distribution. In this case, the learned model parameters carry significant uncertainty for prediction. The introduction of Bayesian methods in existing technologies transforms the optimal estimation of parameters during image recognition model training into distribution estimation, leading to uncertain estimation during inference. Estimating all parameters of the entire image recognition model using distributions results in problems such as numerous parameters and approximate gradient estimation. Therefore, this proposal suggests another image recognition method based on Bayesian linearity, which learns the distribution of parameters only for the attention module in the image recognition model, thereby measuring the uncertainty of locally important information and addressing the problems of numerous parameters, difficulty in providing good prior distributions, and approximate gradient estimation in existing solutions.
[0029] It should be noted that, due to space limitations, this application specification does not exhaustively list all possible implementation methods. Those skilled in the art should be able to conceive after reading this application specification that, as long as the technical features do not contradict each other, any combination of technical features can constitute an optional implementation method.
[0030] The embodiments are described in detail below.
[0031] Figure 1 shows a flowchart of an image recognition method based on Bayesian linearity provided in an embodiment of the present invention. The method is used in an electronic device. As shown in Figure 1, the method includes:
[0032] Step S110: Obtain the image to be recognized.
[0033] This solution is used for image recognition, specifically real-time image recognition or recognition based on user needs. For real-time recognition, the image acquired by the image acquisition device is used as the image to be recognized, and the subsequent recognition process begins immediately after acquisition. The recognition result may lag slightly behind the acquisition time, but this is still considered real-time recognition. For example, in the security field, images captured by surveillance cameras are recognized in real time. For recognition based on user needs, the image to be recognized is input by the user and is usually prepared in advance. After the image is acquired, the subsequent recognition process is performed and the corresponding recognition result is output. Examples include a user searching for images based on an intent, or searching for information based on an image.
[0034] Step S120: Input the image to be recognized into a pre-trained convolutional neural network model for target recognition. The convolutional neural network model includes a channel attention module based on Bayesian linearity. The channel attention module is used to perform average pooling on the initial multidimensional features of the input to obtain pooled features, perform Bayesian linearity on the pooled features to obtain linear features, expand the linear features to the same dimension as the initial multidimensional features, and multiply them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module. The convolutional neural network module performs image recognition on the image to be recognized based on the output multidimensional features.
[0035] The channel attention module in the pre-trained convolutional neural network model incorporates Bayesian linearity. This channel attention mechanism based on Bayesian linearity supports the capture of uncertainty about locally important information. Even when the training dataset does not exhaustively contain all "non-target" samples, the model can still accurately identify the target in the image to be recognized. As shown in Figure 2, the channel attention module based on Bayesian linearity takes C×H×W dimensional features as input, performs average pooling to obtain C×1 dimensional features, applies Bayesian linearity, passes the result through a Sigmoid function, and expands it to the same size as the input features. The expanded features are then multiplied with the input features to give the output. This channel attention module can be embedded into any existing CNN backbone to learn the probability distribution of locally important information. The channel attention module in this scheme can be configured according to the selected network structure. Configuring channel attention modules in networks is common in existing technologies, so this scheme does not describe the other layer modules or how the channel attention module cooperates with them.
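The data flow described for Figure 2 can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function names, the C×C weight shape, the omission of a bias term, and the use of nested lists for the C×H×W features are all assumptions made for the sketch.

```python
import math
import random

def softplus(r):
    """sigma = log(1 + exp(rho)), which keeps the standard deviation positive."""
    return math.log1p(math.exp(r))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bayesian_linear(v, mu, rho, rng):
    """Sample W = mu + softplus(rho) * eps with eps ~ N(0, 1); return W @ v."""
    out = []
    for mu_row, rho_row in zip(mu, rho):
        acc = 0.0
        for x, m, r in zip(v, mu_row, rho_row):
            w = m + softplus(r) * rng.gauss(0.0, 1.0)
            acc += w * x
        out.append(acc)
    return out

def channel_attention(features, mu, rho, rng):
    """features: C x H x W nested lists; returns features of the same shape."""
    # 1) Average pooling over H x W: C x H x W -> C
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    # 2) Bayesian linear layer, assumed C -> C and without bias
    linear = bayesian_linear(pooled, mu, rho, rng)
    # 3) Sigmoid gate, one value per channel
    gates = [sigmoid(z) for z in linear]
    # 4) Expand each gate over H x W and multiply with the input features
    return [[[g * x for x in row] for row in ch]
            for g, ch in zip(gates, features)]

# Hypothetical 2-channel, 2x2 feature map and variational parameters.
demo_features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
demo_mu = [[0.1, 0.0], [0.0, 0.1]]
demo_rho = [[-5.0, -5.0], [-5.0, -5.0]]
demo_out = channel_attention(demo_features, demo_mu, demo_rho, random.Random(0))
```

Because the weights are resampled on every call, repeated forward passes over the same input give slightly different gates. Their spread is what carries the uncertainty information.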
[0036] Cognitive uncertainty (model uncertainty) essentially stems from the fact that the observed data (training samples) is always finite, and the features learned by the image recognition model are insufficient, leading to uncertainty in the model's predictions of unobserved data (test samples). For example, a current image recognition model, trained on an existing network, is used to identify "hot dogs" in images. However, this model, trained using existing methods, has not been trained on "non-hot dog" images. Compared to images of real hot dogs with ketchup, the existing model might predict a leg or banana with ketchup as a hot dog. In fact, because it's impossible to exhaustively represent all "non-hot dog" images, model uncertainty must be addressed through the model itself, not simply by increasing the dataset. The quantity of training samples determines the model's generalization ability. Bayesian models have an advantage with small datasets because they add a prior distribution to each weight and bias parameter of the model and approximate the posterior distribution during training with a limited number of samples. The posterior distribution more accurately reflects the characteristics of the overall sample. BCNN models learn the distribution of parameters on a limited set of training samples, making the model's parameter distribution approximate the distribution of the overall sample, thus enabling prediction of uncertainties in unobserved data. However, a good prior distribution definition is highly dependent on domain knowledge, and learning the parameter distribution of the entire model is relatively expensive in terms of efficiency. To address this high dependence on domain knowledge and efficiency, this approach only learns the attention module in the model, making the distribution of locally important information in the limited training samples approximate the distribution of the overall sample.
[0037] Attention mechanisms guide computational resources towards the most informative parts of the input signal. In CNNs, channel attention selectively enhances informative features and suppresses useless features by capturing dependencies between channels. Existing channel attention modules either use MLPs (multilayer perceptrons) to capture global dependencies between channels, but this reduces dimensionality; or they use convolutions to capture local dependencies between channels, resulting in unreduced dimensionality and fewer parameters. This approach uses Bayesian linearity to capture local channel relationships. The Bayesian linearity-based channel attention module can be embedded into any existing CNN backbone to learn the probability distribution of locally important information.
[0038] Compared to traditional convolutions that optimize parameter values based on training samples, this approach uses a channel attention module based on Bayesian linearity that learns the parameter distribution from training samples to approximate the distribution of the overall samples. This represents a shift from the point estimation of parameters in traditional convolutions to learning the parameter distribution. A Bayesian linear layer can be regarded as equivalent to an infinite number of traditional linear layers whose parameters follow the same distribution, with this distribution determined by the training samples and serving as an approximation of the overall sample distribution.
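The ensemble view above can be illustrated with a small Monte Carlo sketch. It assumes a single scalar weight with a Gaussian distribution N(mu, sigma^2); the function name and the numeric values are hypothetical. Averaging the outputs of many sampled linear maps approximates the expected prediction, and their spread gives an uncertainty estimate.

```python
import random
import statistics

def mc_predict(x, mu, sigma, n_samples, rng):
    """Sample weights w ~ N(mu, sigma^2); return mean and std of w * x."""
    preds = [rng.gauss(mu, sigma) * x for _ in range(n_samples)]
    return statistics.fmean(preds), statistics.stdev(preds)

# Hypothetical values: for y = w * x the exact predictive mean is mu * x
# and the exact predictive standard deviation is sigma * |x|.
mean, std = mc_predict(x=2.0, mu=0.5, sigma=0.1, n_samples=20000,
                       rng=random.Random(0))
print(round(mean, 2), round(std, 2))
```

A finite number of samples replaces the infinite ensemble, so the estimates carry sampling noise that shrinks as `n_samples` grows.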
[0039] In existing Bayesian-based models, neural network optimization strategies are divided into MLE (Maximum Likelihood Estimation) and MAP (Maximum A Posteriori estimation), with MAP additionally incorporating the prior distribution of the parameters. Bayesian linearity can be viewed as a probabilistic model p(y|x,ω), where y is the output given input x and parameter ω. Model training learns the parameter ω from observed data (training samples) D = (x,y) so that ω can be used to predict unobserved data. Learning ω from the observed data D can be expressed using Bayes' theorem as the relationship between the posterior, the likelihood function, and the prior, i.e.
$$p(\omega \mid x, y) = \frac{p(y \mid x, \omega)\, p(\omega)}{p(y \mid x)}$$
where p(y|x,ω) is the likelihood function, p(ω) is the prior distribution of the model parameter ω, and the marginal probability in the denominator can be regarded as a normalization constant and can be dropped.
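[0040] Based on Bayes' theorem, the goal of model training is to obtain the posterior distribution. The objective of the MLE method is to maximize the likelihood function, i.e.
$$\omega^{\mathrm{MLE}} = \arg\max_{\omega} \log p(y \mid x, \omega)$$
The goal of the MAP method is to maximize the posterior distribution, i.e.:
[0041] $$\omega^{\mathrm{MAP}} = \arg\max_{\omega} \log p(\omega \mid x, y) = \arg\max_{\omega} \left[ \log p(y \mid x, \omega) + \log p(\omega) \right]$$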
[0042] The first term corresponds to maximizing the likelihood function, and the second term is the regularization term for the parameters. If it's a Gaussian prior, the second term is equivalent to L2 regularization; if it's a Laplace prior, the second term is equivalent to L1 regularization. When log p(y|x,ω) is differentiable with respect to the parameter ω, gradient descent (backpropagation) can be used to update the parameters. The mean squared error loss is used as an example to illustrate the MAP method.
[0043] Suppose the likelihood function is p(y|x,ω) = N(y | f(x,ω), β^{-1}), a Gaussian distribution with mean f(x,ω) and variance β^{-1}. The likelihood function can be understood as a linear regression with constant variance, i.e., y = f(x,ω) + ε, ε ~ N(0, β^{-1}), where ε is the error between the model's predicted result and the actual result and follows a Gaussian distribution with mean 0 and variance β^{-1}. Generally, in the convolution operation, f(x,ω) = x·ω, that is, the matrix multiplication of the input and the parameter weights. Assume the parameter ω has a Gaussian prior, i.e., p(ω) = N(0, α^{-1}); then:
[0044] $$\omega^{\mathrm{MAP}} = \arg\max_{\omega} \left[ \log \mathcal{N}\!\left(y \mid f(x,\omega), \beta^{-1}\right) + \log \mathcal{N}\!\left(\omega \mid 0, \alpha^{-1}\right) \right] = \arg\min_{\omega} \left[ \frac{\beta}{2} \lVert y - f(x,\omega) \rVert^{2} + \frac{\alpha}{2} \lVert \omega \rVert^{2} \right]$$
[0045] In the loss function, the first term is classic linear regression, and the second term is L2 regularization, with the ratio α/β acting as the regularization coefficient. MLE and MAP parameter estimation methods rely on optimal point estimation, while Bayesian inference methods calculate the Bayesian posterior distribution p(ω|x,y) of the parameters on the observed data and use an expectation to predict the label ŷ of unobserved data x̂, i.e.
$$p(\hat{y} \mid \hat{x}) = \mathbb{E}_{p(\omega \mid x, y)}\left[ p(\hat{y} \mid \hat{x}, \omega) \right]$$
Using the expectation of the likelihood function to predict unobserved data is equivalent to an ensemble of infinitely many identically distributed maximum likelihood functions. Because this expectation is an integral over ω, the model is difficult to handle.
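The equivalence between a Gaussian prior and L2 regularization in the MAP example can be checked numerically. The sketch below assumes a scalar weight and f(x,ω) = x·ω; the function names, data values, learning rate and step count are hypothetical. Gradient descent on the MAP loss reaches the same value as the closed-form ridge solution, in which α/β appears as the regularization coefficient.

```python
def map_closed_form(xs, ys, alpha, beta):
    """Minimizer of (beta/2) * sum((y - w*x)^2) + (alpha/2) * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha / beta)

def map_gradient_descent(xs, ys, alpha, beta, lr=0.01, steps=2000):
    """Plain gradient descent on the same loss, starting from w = 0.

    The step size must stay below 2 / (beta * sum(x^2) + alpha)
    for the iteration to converge.
    """
    w = 0.0
    for _ in range(steps):
        grad = -beta * sum((y - w * x) * x for x, y in zip(xs, ys)) + alpha * w
        w -= lr * grad
    return w

# Hypothetical training data lying roughly on y = 2x.
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
w_mle = map_closed_form(xs, ys, alpha=0.0, beta=1.0)   # no prior: plain MLE
w_map = map_closed_form(xs, ys, alpha=1.0, beta=1.0)   # Gaussian prior
w_gd = map_gradient_descent(xs, ys, alpha=1.0, beta=1.0)
print(w_mle, w_map, w_gd)
```

The MAP weight is pulled toward zero relative to the MLE weight, which is the shrinkage effect of the L2 term.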
[0046] In this scheme, the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution.
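[0047] The training objective of the convolutional neural network model is to learn a distribution q(ω|θ) of the model parameters ω, where θ denotes the parameters of that distribution. KL divergence measures the difference between two distributions; therefore, the objective function is to minimize the KL divergence between the distribution q(ω|θ) of the model parameters ω and the true Bayesian posterior distribution p(ω|x,y). Specifically, the objective function is expressed as:
[0048] $$\theta^{*} = \arg\min_{\theta} \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega \mid x, y) \right] = \arg\min_{\theta} \left\{ \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega) \right] - \mathbb{E}_{q(\omega \mid \theta)}\left[ \log p(y \mid x, \omega) \right] \right\}$$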
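[0049] Where q(ω|θ) represents the distribution of the model parameters ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameters ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linearity, and y represents the output given the input x and the model parameters ω. The calculation process of the objective function is as follows:
[0050] $$\mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega \mid x, y) \right] = \int q(\omega \mid \theta) \log \frac{q(\omega \mid \theta)\, p(y \mid x)}{p(\omega)\, p(y \mid x, \omega)} \, d\omega = \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega) \right] - \mathbb{E}_{q(\omega \mid \theta)}\left[ \log p(y \mid x, \omega) \right] + \log p(y \mid x)$$ The last term, log p(y|x), does not depend on θ and can be dropped from the minimization.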
[0051] The purpose of this cost function is to learn the distribution parameters θ such that the distribution to be learned, q(ω|θ) (written p(ω|θ) in the formulas that follow), approximates the true Bayesian posterior distribution p(ω|x,y). The derivation transforms the objective into two terms: the first term is the KL divergence between the distribution to be learned and the prior p(ω) of the model parameters ω, and its cost is related to the prior; the second term is the expectation of the log-likelihood, and its cost is related to the data.
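The prior-related term of the cost can be made concrete under a simplifying assumption. The sketch below assumes a single weight, a Gaussian distribution N(mu, sigma^2) to be learned, and a zero-mean Gaussian prior with standard deviation `prior_sigma`. These choices and all function names are hypothetical and are not specified by the source. It compares the closed-form KL divergence with a Monte Carlo estimate of the same quantity.

```python
import math
import random

def log_normal_pdf(w, mean, std):
    """Log density of N(mean, std^2) evaluated at w."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((w - mean) / std) ** 2)

def kl_gauss(mu, sigma, prior_sigma):
    """Closed-form KL[N(mu, sigma^2) || N(0, prior_sigma^2)]."""
    return (math.log(prior_sigma / sigma)
            + (sigma ** 2 + mu ** 2) / (2.0 * prior_sigma ** 2) - 0.5)

def kl_monte_carlo(mu, sigma, prior_sigma, n_samples, rng):
    """Average of log q(w) - log p(w) over samples w drawn from q."""
    total = 0.0
    for _ in range(n_samples):
        w = rng.gauss(mu, sigma)
        total += log_normal_pdf(w, mu, sigma) - log_normal_pdf(w, 0.0, prior_sigma)
    return total / n_samples

exact = kl_gauss(0.5, 0.5, 1.0)
estimate = kl_monte_carlo(0.5, 0.5, 1.0, 20000, random.Random(0))
print(exact, estimate)
```

The Monte Carlo form is the one that generalizes to cases where no closed form is available, since it needs only samples and log densities.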
[0052] To further reduce the computational cost of minimizing this cost function, a variational approximation is employed. Under specific conditions, the derivative of an expectation can be expressed as the expectation of a derivative. Based on the unbiased Monte Carlo gradient, the objective function can be expressed as:
[0053] l(θ)≈log p(ω|θ)-log p(ω)-log p(y|x,ω)
[0054] Where p(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linearity, and y represents the output given the input x and the model parameter ω.
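The approximate objective l(θ) can be evaluated for one sampled weight as in the following sketch. It assumes a scalar weight, Gaussian forms for all three densities, and a linear model w·x. The Gaussian prior and likelihood widths (`prior_sigma`, `noise_sigma`) and the function names are hypothetical choices for illustration only.

```python
import math

def log_normal_pdf(v, mean, std):
    """Log density of N(mean, std^2) evaluated at v."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((v - mean) / std) ** 2)

def mc_objective(w, mu, sigma, prior_sigma, x, y, noise_sigma):
    """One-sample estimate l = log p(w|theta) - log p(w) - log p(y|x,w)."""
    log_q = log_normal_pdf(w, mu, sigma)             # distribution to be learned
    log_prior = log_normal_pdf(w, 0.0, prior_sigma)  # assumed zero-mean prior
    log_lik = log_normal_pdf(y, w * x, noise_sigma)  # assumed Gaussian likelihood
    return log_q - log_prior - log_lik

# Hypothetical sample: a weight near the one that explains the data gives a
# lower cost than a weight far from it, all else being equal.
print(mc_objective(2.0, 2.0, 0.2, 1.0, x=1.0, y=2.0, noise_sigma=0.5))
print(mc_objective(0.0, 2.0, 0.2, 1.0, x=1.0, y=2.0, noise_sigma=0.5))
```

Averaging this quantity over many sampled weights recovers the KL term plus the expected negative log-likelihood.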
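[0055] The distribution of the model parameter ω with respect to the parameter θ is a diagonal Gaussian distribution, the model parameter ω is sampled via a standard Gaussian, and the parameter θ = (μ, ρ) is updated as follows:
[0056] $$\mu \leftarrow \mu - \lambda \frac{\partial l(\theta)}{\partial \mu}$$
[0057] $$\rho \leftarrow \rho - \lambda \frac{\partial l(\theta)}{\partial \rho}$$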
[0058] Where λ is the learning rate, and the parameter θ includes the mean μ and the standard deviation σ = log(1 + exp(ρ)).
[0059] This representation of the objective function based on the unbiased Monte Carlo gradient is an approximate representation of the cost function. Assuming the target p(ω|θ) to be learned follows a diagonal Gaussian distribution, the model parameter ω can be sampled using a standard Gaussian distribution. The diagonal Gaussian distribution parameters θ include the mean μ and standard deviation σ = log(1 + exp(ρ)), so the variational posterior parameters to be learned are θ = (μ, ρ). Therefore, the sampling of the model parameter ω is transformed into:
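[0060] $$\omega = \mu + \log\left(1 + \exp(\rho)\right) \circ \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)$$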
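The sampling of ω from a standard Gaussian described above can be sketched as follows, assuming the weights are held in flat Python lists. The function names are hypothetical. Each weight is obtained by shifting and scaling a standard Gaussian draw, with the scale σ = log(1 + exp(ρ)).

```python
import math
import random

def softplus(rho):
    """sigma = log(1 + exp(rho)), always positive."""
    return math.log1p(math.exp(rho))

def sample_weights(mu, rho, rng):
    """Return (weights, eps) with weights[i] = mu[i] + softplus(rho[i]) * eps[i]."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    weights = [m + softplus(r) * e for m, r, e in zip(mu, rho, eps)]
    return weights, eps

# Hypothetical variational parameters for three weights.
weights, eps = sample_weights([0.0, 1.0, -1.0], [-3.0, 0.0, 1.0], random.Random(0))
print(weights)
```

Because the randomness is isolated in `eps`, the sampled weights are a deterministic function of μ and ρ, which is what allows gradients to flow back to both parameters.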
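[0061] In this way, each backpropagation step updates μ and ρ, so that under training with the observed data the distribution p(ω|θ) continuously approximates the true Bayesian posterior distribution. Thus, the parameters μ and ρ of the distribution p(ω|θ) to be learned can be calculated:
[0062] $$\mu \leftarrow \mu - \lambda \frac{\partial l(\theta)}{\partial \mu}$$
[0063] $$\rho \leftarrow \rho - \lambda \frac{\partial l(\theta)}{\partial \rho}$$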
[0064] Here, λ is the learning rate, the cost function l(θ) is a function of the model parameters ω, and ω is a function of the distribution parameters μ and ρ. Using the variational approximation, these parameters can be updated by ordinary network backpropagation. The difference is that traditional convolutional layers update the parameters ω directly, while Bayesian linear layers update the distribution parameters θ = (μ, ρ).
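The update of μ and ρ by backpropagation can be sketched for a single scalar weight with hand-derived gradients. The Gaussian prior N(0, prior_sigma^2), the Gaussian likelihood with standard deviation `noise_sigma`, the data, the starting values and the hyperparameters are all assumptions of this sketch. The gradient of the prior term through the sampled ω is kept here as the general chain-rule form; choosing a very large `prior_sigma` makes that contribution negligible.

```python
import math
import random

def softplus(rho):
    return math.log1p(math.exp(rho))

def train(xs, ys, prior_sigma, noise_sigma, lr, steps, rng):
    """Stochastic gradient descent on l = log q(w) - log p(w) - log p(y|x,w)."""
    mu, rho = 0.0, -3.0
    for _ in range(steps):
        eps = rng.gauss(0.0, 1.0)
        sigma = softplus(rho)
        dsigma_drho = 1.0 / (1.0 + math.exp(-rho))  # derivative of softplus
        w = mu + sigma * eps                         # sampled weight
        # Gradient of (-log prior - log likelihood) with respect to w.
        g_w = (w / prior_sigma ** 2
               - sum((y - w * x) * x for x, y in zip(xs, ys)) / noise_sigma ** 2)
        # With eps fixed, log q(w) = -log(sigma) - eps^2 / 2 + const,
        # so it contributes only to the rho gradient.
        grad_mu = g_w
        grad_rho = g_w * eps * dsigma_drho - dsigma_drho / sigma
        mu -= lr * grad_mu
        rho -= lr * grad_rho
    return mu, softplus(rho)

# Hypothetical data generated exactly by y = 2x.
xs, ys = [0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0]
mu_hat, sigma_hat = train(xs, ys, prior_sigma=1.0, noise_sigma=0.5,
                          lr=0.001, steps=20000, rng=random.Random(0))
print(mu_hat, sigma_hat)
```

In this conjugate Gaussian setting the exact posterior is known, with mean 60/31 and standard deviation 1/sqrt(31) for the values above, so the learned μ and σ can be compared against it as a sanity check.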
[0065] In the gradient of the cost function l(θ) with respect to θ = (μ, ρ), given the prior, the log p(ω) term is independent of the distribution parameters and has a gradient of 0. This means that the given prior distribution does not constrain the model training; that is, the distribution p(ω|θ) to be learned is data-driven and independent of the prior. Overall, in this scheme, with limited observation data, training drives the parameter distribution to approximate the true distribution of the overall samples.
[0066] Step S130: Output the recognition result of the convolutional neural network model for the image to be recognized.
[0067] In this scheme, a pre-trained convolutional neural network model is used to perform target recognition on the image to be recognized, and then the recognition result is output. Specifically, the region where the recognized target is located can be marked in the image to be recognized in different ways, or the image to be recognized can be segmented based on the recognized target. The specific presentation methods have been implemented in many existing technologies, and will not be repeated here.
[0068] Overall, cognitive uncertainty stems from the fact that observed data is always limited. The limited samples learned by the model cannot fully cover the features of the overall sample, so predictions about unobserved data are inherently uncertain. One approach to addressing this uncertainty is to introduce Bayesian parameter estimation methods. However, because the number of parameters inflates, existing Bayesian-based models are costly and inefficient at learning parameter distributions. This solution applies Bayesian linearity to the channel attention module of the model, guiding it to focus on locally discriminative information. This allows the uncertainty of locally important features to be captured and the uncertainty of the entire image to be predicted effectively. Furthermore, it can capture global channel dependencies without dimensionality reduction while performing uncertainty prediction at the same time.
[0069] The above method involves acquiring an image to be identified; inputting the image to be identified into a pre-trained convolutional neural network (CNN) model for target recognition. The CNN model includes a channel attention module based on Bayesian linearity. This module performs average pooling on the initial multidimensional features to obtain pooled features, applies Bayesian linearity to the pooled features to obtain linear features, expands the linear features to the same dimension as the initial multidimensional features, and multiplies them to obtain the output multidimensional features of the channel attention module. The CNN module then performs image recognition on the image to be identified based on these output multidimensional features, outputting the recognition result of the CNN model. By incorporating Bayesian linearity into the model's channel attention module, a Bayesian linearity-based channel attention mechanism is constructed within the model, enabling the capture of uncertainties in locally important information. This addresses the problems of existing Bayesian CNN models having many parameters, difficulty in providing a good prior distribution, and approximate gradient estimation, effectively predicting the uncertainty of the entire image.
[0070] Figure 3 is a schematic diagram of the structure of an image recognition device based on Bayesian linearity provided in an embodiment of the present invention. Referring to Figure 3, the device includes an image acquisition unit 210, an image recognition unit 220, and a result output unit 230.
[0071] The image acquisition unit 210 is used to acquire an image to be recognized; the image recognition unit 220 is used to input the image to be recognized into a pre-trained convolutional neural network model for target recognition. The convolutional neural network model includes a channel attention module based on Bayesian linearity. The channel attention module is used to perform average pooling on the initial multidimensional features of the input to obtain pooled features, perform Bayesian linearity on the pooled features to obtain linear features, expand the linear features to the same dimension as the initial multidimensional features, and multiply them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module. The convolutional neural network module performs image recognition on the image to be recognized based on the output multidimensional features; the result output unit 230 is used to output the recognition result of the convolutional neural network model on the image to be recognized.
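The division into three units can be sketched as a simple composition of callables. The class name, field names and the stub functions are hypothetical, and the recognition step is a placeholder rather than the convolutional neural network model itself.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RecognitionDevice:
    acquire: Callable[[], Any]       # role of image acquisition unit 210
    recognize: Callable[[Any], Any]  # role of image recognition unit 220
    output: Callable[[Any], Any]     # role of result output unit 230

    def run(self) -> Any:
        image = self.acquire()
        result = self.recognize(image)
        return self.output(result)

# Placeholder units: a fake 3-pixel "image" and a stub recognizer.
device = RecognitionDevice(
    acquire=lambda: [0.1, 0.5, 0.9],
    recognize=lambda img: {"label": "stub", "num_pixels": len(img)},
    output=lambda res: res,
)
print(device.run())
```

Keeping the units as interchangeable callables reflects the point that the division is by functional logic only, so each unit can be replaced without changing the others.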
[0072] Based on the above embodiments, the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution.
[0073] Based on the above embodiments, the objective function is expressed as:
[0074] $$\theta^{*} = \arg\min_{\theta} \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega \mid x, y) \right] = \arg\min_{\theta} \left\{ \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega) \right] - \mathbb{E}_{q(\omega \mid \theta)}\left[ \log p(y \mid x, \omega) \right] \right\}$$
[0075] Where q(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linearity, and y represents the output given the input x and the model parameter ω.
[0076] Based on the above embodiments, the objective function is expressed as:
[0077] l(θ)≈log p(ω|θ)-log p(ω)-log p(y|x,ω)
[0078] Where p(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the model parameter ω.
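The single-sample approximation above can be evaluated directly once concrete densities are chosen. The sketch below is a hedged illustration, not the patented implementation: it assumes a diagonal Gaussian for the variational term log p(ω|θ), a standard normal prior p(ω), and a unit-variance Gaussian likelihood around a linear prediction ω·x. The names `log_normal` and `sample_loss` are hypothetical.

```python
import math


def log_normal(v, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((v - mean) / std) ** 2


def sample_loss(mu, rho, x, y, eps):
    """One-sample estimate: log p(w|theta) - log p(w) - log p(y|x,w)."""
    # theta = (mu, rho); sigma = log(1 + exp(rho)) as stated in the description
    sigma = [math.log1p(math.exp(r)) for r in rho]
    # Draw the weights from the variational distribution using the supplied noise eps
    w = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    log_q = sum(log_normal(wi, m, s) for wi, m, s in zip(w, mu, sigma))
    # Assumed prior: independent standard normals
    log_prior = sum(log_normal(wi, 0.0, 1.0) for wi in w)
    # Assumed likelihood: Gaussian with unit variance centred on the linear prediction
    pred = sum(wi * xi for wi, xi in zip(w, x))
    log_lik = log_normal(y, pred, 1.0)
    return log_q - log_prior - log_lik
```

In practice the noise `eps` would be drawn from a standard Gaussian and the estimate averaged over mini-batches; passing it in explicitly keeps the function deterministic and easy to check.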
[0079] Based on the above embodiments, the distribution of the model parameter ω with respect to the parameter θ is a diagonal Gaussian distribution, the model parameter ω is sampled using standard Gaussian noise, and the parameter θ = (μ, ρ), where:
[0080]
[0081]
[0082] Where λ is the learning rate, and the parameter θ includes the mean μ and the standard deviation σ = log(1 + exp(ρ)).
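The update formulas themselves do not survive in this text, so the sketch below assumes the widely used Bayes by Backprop scheme, which matches the stated ingredients (diagonal Gaussian, standard Gaussian noise, θ = (μ, ρ), learning rate λ, σ = log(1 + exp(ρ))). It is an assumption, not a reproduction of the patent's equations. The function names and the convention that the caller supplies the partial derivatives of the loss f are hypothetical.

```python
import math


def sample_weight(mu, rho, eps):
    """Reparameterized sample: w = mu + log(1 + exp(rho)) * eps, with eps ~ N(0, 1)."""
    return mu + math.log1p(math.exp(rho)) * eps


def update_step(mu, rho, eps, df_dw, df_dmu, df_drho, lr):
    """One gradient step on theta = (mu, rho) for a scalar weight.

    df_dw, df_dmu, df_drho are the partial derivatives of the loss f(w, theta)
    with respect to w, mu and rho, evaluated at the sampled weight.
    """
    # dw/dmu = 1, so the gradient through the sample adds directly
    d_mu = df_dw + df_dmu
    # dw/drho = eps * d(sigma)/d(rho) = eps / (1 + exp(-rho))
    d_rho = df_dw * eps / (1.0 + math.exp(-rho)) + df_drho
    # Gradient descent with learning rate lr (lambda in the description)
    return mu - lr * d_mu, rho - lr * d_rho
```

Optimizing ρ rather than σ directly means the standard deviation stays positive without any constraint on the update.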
[0083] The Bayesian linear image recognition device provided in this embodiment of the invention is included in an electronic device, can be used to execute any of the Bayesian linear image recognition methods provided in the above embodiments, and has the corresponding functions and beneficial effects.
[0084] It is worth noting that in the above embodiments of the Bayesian linear image recognition device, the various units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be achieved; in addition, the specific names of each functional unit are only for easy differentiation and are not used to limit the scope of protection of the present invention.
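As a rough illustration of a purely functional division, the three units can be modelled as interchangeable callables composed in sequence. The class name `RecognitionDevice` and its field names are hypothetical and serve only to show that any implementation providing the three functions fits the described structure.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RecognitionDevice:
    acquire: Callable[[], Any]        # role of the image acquisition unit 210
    recognize: Callable[[Any], Any]   # role of the image recognition unit 220
    output: Callable[[Any], Any]      # role of the result output unit 230

    def run(self) -> Any:
        image = self.acquire()
        result = self.recognize(image)
        return self.output(result)
```

Swapping one callable for another, for example a different trained model in `recognize`, leaves the other two units untouched.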
[0085] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention. As shown in Figure 4, the electronic device includes a processor 310, a memory 320, an input device 330, an output device 340, and a communication device 350. The number of processors 310 in the electronic device can be one or more; Figure 4 takes one processor 310 as an example. The processor 310, memory 320, input device 330, output device 340, and communication device 350 in the electronic device can be connected via a bus or other means; Figure 4 takes connection via a bus as an example.
[0086] The memory 320, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions / modules corresponding to the Bayesian linear image recognition method in this embodiment of the invention (e.g., the image acquisition unit 210, image recognition unit 220, and result output unit 230 in the Bayesian linear image recognition device). The processor 310 executes various functional applications and data processing of the electronic device by running the software programs, instructions, and modules stored in the memory 320, thereby realizing the aforementioned Bayesian linear image recognition method.
[0087] The memory 320 may primarily include a program storage area and a data storage area. The program storage area may store the operating system and at least one application program required for a given function; the data storage area may store data created based on the use of the electronic device. Furthermore, the memory 320 may include high-speed random access memory and non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, the memory 320 may further include memory remotely located relative to the processor 310, which can be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0088] The input device 330 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 340 may include display devices such as a display screen.
[0089] The aforementioned electronic device includes a Bayesian linear image recognition device, which can be used to execute any Bayesian linear image recognition method and has corresponding functions and beneficial effects.
[0090] This invention also provides a storage medium containing computer-executable instructions. When executed by a computer processor, the computer-executable instructions are used to perform relevant operations in the Bayesian linear image recognition method provided in any embodiment of this application, and have corresponding functions and beneficial effects.
[0091] Those skilled in the art will understand that embodiments of this application may be provided as methods, systems, or computer program products.
[0092] Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram. The computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.
[0093] In a typical configuration, a computing device includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory. Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0094] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0095] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0096] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.
Claims
1. An image recognition method based on Bayesian linearity, characterized by comprising: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained convolutional neural network model for target recognition, wherein the convolutional neural network model includes a channel attention module based on Bayesian linearity, the channel attention module performs average pooling on the input initial multidimensional features to obtain pooled features, applies a Bayesian linear layer to the pooled features to obtain linear features, expands the linear features to the same dimension as the initial multidimensional features, and multiplies them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module, the convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features, and the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution; and outputting the recognition result of the convolutional neural network model for the image to be recognized.
2. The method of claim 1, wherein the objective function is expressed as: l(θ) = KL[q(ω|θ) || p(ω)] - E_q(ω|θ)[log p(y|x,ω)], where q(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, KL[q(ω|θ) || p(ω)] represents the KL divergence of q(ω|θ) and p(ω), p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, y represents the output given the input x and the model parameter ω, and E_q(ω|θ)[log p(y|x,ω)] represents the expectation of the likelihood function.
3. The method of claim 1, wherein the objective function is expressed as: l(θ)≈log p(ω|θ)-log p(ω)-log p(y|x,ω), where p(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the model parameter ω.
4. The method according to claim 3, characterized in that, the minimization model parameters about the parameters the distribution of the model parameters the sampling of the parameters wherein: wherein, is the learning rate, parameters include the mean and standard deviation , is the variance of the model's predicted results and the true results.
5. An image recognition apparatus based on Bayesian linearity, characterized by comprising: an image acquisition unit, used to acquire an image to be recognized; an image recognition unit, used to input the image to be recognized into a pre-trained convolutional neural network model for target recognition, wherein the convolutional neural network model includes a channel attention module based on Bayesian linearity, the channel attention module performs average pooling on the input initial multidimensional features to obtain pooled features, applies a Bayesian linear layer to the pooled features to obtain linear features, expands the linear features to the same dimension as the initial multidimensional features, and multiplies them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module, the convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features, and the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution; and a result output unit, used to output the recognition result of the convolutional neural network model for the image to be recognized.
6. The apparatus of claim 5, wherein the objective function is expressed as: l(θ) = KL[q(ω|θ) || p(ω)] - E_q(ω|θ)[log p(y|x,ω)], where q(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, KL[q(ω|θ) || p(ω)] represents the KL divergence of q(ω|θ) and p(ω), p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, y represents the output given the input x and the model parameter ω, and E_q(ω|θ)[log p(y|x,ω)] represents the expectation of the likelihood function.
7. An electronic device, characterized by comprising: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the electronic device implements the image recognition method based on Bayesian linearity according to any one of claims 1-4.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the image recognition method based on Bayesian linearity according to any one of claims 1-4.