Image enhancement method, apparatus, device, and medium

This image enhancement method, which utilizes a multi-scale feature fusion network and a self-attention mechanism, addresses the issues of detail loss and noise amplification in existing image enhancement algorithms, achieving efficient image quality improvement.

CN116797890B · Active · Publication Date: 2026-06-16 · BEIJING ZITIAO NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date
2022-03-11
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In existing image enhancement techniques, convolutional neural network algorithms with encoder-decoder structures are prone to losing image details, while transform-type algorithms are fast but have significant limitations and are prone to amplifying noise, resulting in poor image enhancement effects.

Method used

A multi-scale feature fusion network is used to extract and fuse features at multiple scales in the image. By dynamically combining feature maps of different scales through a self-attention mechanism, and combining a selective feature fusion module and an attention module, the image quality is gradually improved.

Benefits of technology

It effectively improves image quality, preserves detail information, reduces noise amplification, and increases image processing speed and efficiency.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present disclosure relate to an image enhancement method, apparatus, device and medium. The method comprises: obtaining an original image to be processed; inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model comprises a multi-scale feature fusion network; performing multi-scale feature extraction on an input image through the multi-scale feature fusion network to obtain initial feature maps of multiple scales; fusing the initial feature maps of multiple scales to obtain multiple intermediate feature maps; fusing the multiple intermediate feature maps to obtain an output feature map of the multi-scale feature fusion network; and obtaining a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image. The method can effectively improve image quality.

Description

Technical Field

[0001] This disclosure relates to the field of image processing technology, and in particular to an image enhancement method, apparatus, device and medium.

Background Technology

[0002] Image enhancement technology can improve image quality and enhance the visual experience of images, and is widely applicable to various image processing scenarios that require improved image quality.

[0003] There are two main approaches to existing image enhancement techniques. One approach uses convolutional neural network algorithms with an encoder-decoder structure, which requires frequent upsampling and downsampling and is prone to losing image details. The other approach uses transform-based algorithms, but transform algorithms have significant limitations and are prone to amplifying noise. Therefore, existing image enhancement techniques still need improvement.

Summary of the Invention

[0004] In order to solve the above-mentioned technical problems, or at least partially solve the above-mentioned technical problems, this disclosure provides an image enhancement method, apparatus, device and medium.

[0005] In a first aspect, embodiments of this disclosure provide an image enhancement method, the method comprising: acquiring an original image to be processed; inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model includes a multi-scale feature fusion network; performing multi-scale feature extraction on an input image through the multi-scale feature fusion network to obtain initial feature maps of multiple scales; fusing the initial feature maps of multiple scales to obtain multiple intermediate feature maps; fusing the multiple intermediate feature maps to obtain an output feature map of the multi-scale feature fusion network, wherein the input image is obtained based on the original image; and obtaining a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image.

[0006] In a second aspect, embodiments of this disclosure also provide an image enhancement apparatus, comprising: an image acquisition module for acquiring an original image to be processed; a model input module for inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model includes a multi-scale feature fusion network; a multi-scale fusion module for performing multi-scale feature extraction on an input image through the multi-scale feature fusion network to obtain initial feature maps of multiple scales, fusing the initial feature maps of multiple scales to obtain multiple intermediate feature maps, and fusing the multiple intermediate feature maps to obtain an output feature map of the multi-scale feature fusion network, wherein the input image is obtained based on the original image; and an enhanced image acquisition module for obtaining a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image.

[0007] In a third aspect, embodiments of this disclosure also provide an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute them to implement the image enhancement method provided in embodiments of this disclosure.

[0008] In a fourth aspect, embodiments of this disclosure also provide a computer-readable storage medium storing a computer program for performing the image enhancement method provided in embodiments of this disclosure.

[0009] The technical solution provided in this disclosure allows the original image to be input into a pre-trained image enhancement model, which includes a multi-scale feature fusion network. The multi-scale feature fusion network then extracts multi-scale features from the input image (obtained from the original image) to obtain initial feature maps at multiple scales. These initial feature maps are then fused to obtain multiple intermediate feature maps. The multiple intermediate feature maps are then fused to obtain the output feature map of the multi-scale feature fusion network. Finally, the image with enhanced quality is obtained based on the output feature map of the multi-scale feature fusion network and the original image. By using the multi-scale feature-based stepwise fusion method provided in this disclosure, image features can be fully extracted and utilized, effectively improving image quality.

[0010] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0011] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.

[0012] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a schematic flowchart of an image enhancement method provided in an embodiment of this disclosure;

[0014] Figure 2 is a schematic diagram of a selective feature fusion module provided in an embodiment of this disclosure;

[0015] Figure 3 is a schematic diagram of an attention module provided in an embodiment of this disclosure;

[0016] Figure 4 is a structural diagram of a multi-scale feature fusion network provided in an embodiment of this disclosure;

[0017] Figure 5 is a schematic diagram of the structure of an image enhancement model provided in an embodiment of this disclosure;

[0018] Figure 6 is a schematic diagram of the structure of an image enhancement apparatus provided in an embodiment of this disclosure;

[0019] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure.

Detailed Implementation

[0020] To better understand the above-mentioned objectives, features, and advantages of this disclosure, the solutions disclosed herein will be further described below. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.

[0021] Numerous specific details are set forth in the following description in order to provide a full understanding of this disclosure, but this disclosure may also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only some, and not all, of the embodiments of this disclosure.

[0022] Existing image enhancement algorithms are mainly divided into two categories. The first category is convolutional neural network algorithms with an encoder-decoder structure. These algorithms mainly use an encoder to convolve and downsample the original image to extract low-order and high-order features, and then use a decoder to upsample and restore spatial resolution, generating an enhanced image pixel by pixel. Although these algorithms can be used end-to-end for various tasks, they are computationally intensive, time-consuming, and difficult to run in real time. Furthermore, the frequent upsampling and downsampling can lead to a loss of detail and reduced sharpness in the enhanced image, resulting in unsatisfactory image quality. The second category is transform-based algorithms. These typically first downsample the original image, then use a lightweight convolutional neural network to extract features from the low-resolution image and predict transform coefficients (such as affine transform coefficients). These coefficients are then upsampled using methods such as bilateral grids to recover the transform coefficients of the entire image, which are finally applied to the original image to generate the enhanced image. Although fast, transform-based algorithms have significant limitations: their learning ability and robustness are poor, and they are prone to amplifying noise.

[0023] To improve at least one of the above problems, this disclosure provides an image enhancement method, apparatus, device, and medium, which are described below.

[0024] First, this disclosure provides an image enhancement method, which can be executed by an image enhancement apparatus. The apparatus can be implemented in software and/or hardware and is generally integrated into an electronic device. Figure 1 is a schematic flowchart of an image enhancement method provided in an embodiment of this disclosure. The method mainly includes the following steps S102 to S108:

[0025] Step S102: Obtain the original image to be processed. This original image is the image whose image quality needs to be improved. This embodiment of the disclosure does not limit the method of obtaining the original image. For example, the image captured by the camera can be directly used as the original image to be processed, or the image uploaded by the user (or the image selected from the image library) can be used as the original image to be processed.

[0026] Step S104: Input the original image into a pre-trained image enhancement model; wherein the image enhancement model includes a multi-scale feature fusion network. In some embodiments, the number of multi-scale feature fusion networks is one or more, and when the number of multi-scale feature fusion networks is multiple, the multiple multi-scale feature fusion networks are connected in series.

[0027] That is, the image enhancement model provided in this embodiment may include N cascaded multi-scale feature fusion networks, where N is a positive integer such as 1, 2, 4, 16, or another value. It is understood that a smaller N shortens the image processing time of the image enhancement model, while a larger N improves its image enhancement effect. In practical applications, the value of N can be set as needed and is not limited here.

[0028] Step S106: Extract multi-scale features from the input image using a multi-scale feature fusion network to obtain initial feature maps of multiple scales; fuse these initial feature maps to obtain multiple intermediate feature maps; and fuse these intermediate feature maps to obtain the output feature map of the multi-scale feature fusion network.

[0029] In this embodiment, the input image of the multi-scale feature fusion network is obtained based on the original image. In some implementations, the input image of the multi-scale feature fusion network is the original image, that is, the original image is directly used as the input image; or in other implementations, the input image of the multi-scale feature fusion network is obtained by processing the original image through a network module located before the multi-scale feature fusion network, that is, the processed image of the original image is used as the input image; the embodiments of this disclosure do not limit the network module before the multi-scale feature fusion network. For example, the network module can be a preprocessing module composed of convolutional layers, which can perform preliminary feature extraction on the original image in advance; or the network module can be an image adjustment module, which can crop the original image according to a preset size or adjust it according to a preset resolution; or the network module is a multi-scale feature fusion network before the current multi-scale feature fusion network, which can perform multi-stage multi-scale feature fusion on the original image.

[0030] In some implementations, there are multiple multi-scale feature fusion networks. The input image of the first multi-scale feature fusion network is obtained based on the original image. For example, the input image of the first multi-scale feature fusion network is a feature map of the original image after convolution. The input images of subsequent multi-scale feature fusion networks are obtained based on the output feature map of the previous multi-scale feature fusion network. For example, the input image of subsequent multi-scale feature fusion networks can be directly the output feature map of the previous multi-scale feature fusion network, or it can be obtained by performing additional processing such as convolution on the output feature map of the previous multi-scale feature fusion network.
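The chaining of several multi-scale feature fusion networks described above can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: `run_cascade`, `head`, and `blocks` are hypothetical names, and each multi-scale feature fusion network is replaced by an arbitrary callable.

```python
import numpy as np

def run_cascade(original, head, blocks):
    """Pass the original image through a head and N cascaded blocks.

    head   : callable that produces the input of the first block
             (assumed here to stand in for a convolution)
    blocks : list of N callables, each standing in for one
             multi-scale feature fusion network
    """
    x = head(original)
    for block in blocks:
        # Every later block consumes the previous block's output feature map.
        x = block(x)
    return x

# Toy stand-ins so the sketch runs: the head is the identity and each
# block adds 1, so N = 4 blocks add 4 in total.
image = np.zeros((3, 8, 8))
features = run_cascade(image, head=lambda im: im, blocks=[lambda f: f + 1.0] * 4)
```

Keeping the blocks in a plain list makes N a configuration choice, which matches the trade-off between processing time and enhancement effect noted in the text.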

[0031] The scale proposed in this embodiment can be used to characterize the spatial resolution of the feature map. When performing multi-scale feature extraction on the input image, a downsampling method can be used. By downsampling the input image by different factors, initial feature maps at various scales can be obtained. It is understood that the initial feature maps at different scales emphasize different feature information. For example, small-scale downsampling is more focused on local image features, while large-scale downsampling is more focused on global image features. Through the above-mentioned multi-scale feature extraction method, image features can be extracted more comprehensively and fully.

[0032] Furthermore, after extracting initial feature maps at multiple scales, these initial feature maps are fused to obtain multiple intermediate feature maps. For example, the initial feature maps at multiple scales can be fused using different methods to obtain multiple intermediate feature maps; alternatively, different initial feature maps can be extracted from the initial feature maps at multiple scales each time and fused to obtain multiple intermediate feature maps. Through these methods, multiple intermediate feature maps carrying different image features can be obtained, which helps to further extract richer and more comprehensive information.

[0033] In some implementations, the multiple intermediate feature maps obtained by fusing the initial feature maps of multiple scales have different spatial resolutions. Specifically, the initial feature maps of multiple scales can be fused separately under different scale branches to obtain an intermediate feature map for each scale branch. Each scale branch corresponds to one intermediate feature map, and different intermediate feature maps have different spatial resolutions. The intermediate feature map of a scale branch can also be called the branch feature map output by that branch. In practical applications, the scale branches in the multi-scale feature fusion network correspond to the scales of the initial feature maps. For example, if the network extracts initial feature maps at three scales from the input image (one with the same spatial resolution as the input image, one with half that resolution, and one with one-quarter of that resolution), then there are three scale branches in total. The three branches receive the same inputs, namely the three initial feature maps, but process them at different scales (spatial resolutions). For example, in the branch whose spatial resolution equals that of the input image, the initial feature maps at the other two scales can be upsampled to that resolution before fusion. That is, in each scale branch, the initial feature maps of multiple scales are first unified to the spatial resolution of that branch and then processed. The processing method is the same across scale branches: each branch fuses the initial feature maps of multiple scales according to a preset method to obtain an intermediate feature map at the corresponding scale.

[0034] After obtaining multiple intermediate feature maps, this embodiment can further fuse these intermediate feature maps to obtain the output feature map of the multi-scale feature fusion network. Since different intermediate feature maps can reflect different feature information, fusing different intermediate feature maps, such as fusing intermediate feature maps (i.e., branch feature maps) corresponding to different scale branches, allows the output feature map obtained based on the fusion result to further comprehensively and fully characterize image features while retaining the original feature information at each spatial resolution.

[0035] Step S108: Obtain a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image. For example, the output feature map of the multi-scale feature fusion network can be fused with the original image to obtain the quality-enhanced image.

[0036] In some implementations, multiple multi-scale feature fusion networks are used. The output feature map of the last multi-scale feature fusion network can be fused with the original image to obtain an enhanced image. For example, the output feature map of the last multi-scale feature fusion network can be convolved to make its dimension consistent with the dimension of the original image, and then fused with the original image point by point (Add processing) to obtain the enhanced image.
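A minimal sketch of this final fusion is given below. It assumes that the dimension-matching convolution is a 1x1 convolution and that the feature map already has the spatial size of the original image; neither detail is stated in the text. The names `project_channels` and `enhance` and the random weights are hypothetical.

```python
import numpy as np

def project_channels(features, weight, bias):
    """1x1 convolution: map (C, H, W) features to (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]

def enhance(original, features, weight, bias):
    """Match the feature dimension to the image, then add point by point."""
    residual = project_channels(features, weight, bias)
    assert residual.shape == original.shape
    return original + residual  # the "Add" processing

rng = np.random.default_rng(0)
original = rng.random((3, 8, 8))       # e.g. an RGB image
features = rng.random((16, 8, 8))      # output of the last fusion network
weight = rng.normal(scale=0.01, size=(3, 16))  # untrained, illustrative only
bias = np.zeros(3)
enhanced = enhance(original, features, weight, bias)
```

Because the network output is added to the original image, the model only has to predict a correction term; with all-zero weights the sketch returns the original image unchanged.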

[0037] The multi-scale feature-based stepwise fusion method provided in this embodiment can fully extract and utilize image features, effectively improving the image quality of the original image.

[0038] In some implementations, while this disclosure extracts multi-scale features, it controls the scale, extracting only feature maps of appropriate scales. Specifically, when extracting multi-scale features from an input image to obtain initial feature maps of multiple scales, the input image is downsampled at various preset multiples to obtain initial feature maps of multiple scales; wherein the preset multiples are lower than a preset threshold. For example, the preset multiples include one, two, and four times, resulting in three initial feature maps of different scales. Correspondingly, the initial feature maps of multiple scales include: an initial feature map with the same spatial resolution as the input image, an initial feature map with a spatial resolution half that of the input image, and an initial feature map with a spatial resolution one-quarter that of the input image. By controlling downsampling in the above manner, compared to methods such as 16x downsampling in related technologies, this disclosure obtains multi-scale features through appropriate downsampling, while preserving the original high-order features and accurate spatial resolution, avoiding the loss of image details through multiple downsampling steps.
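The controlled downsampling described above can be sketched as follows. The text does not say which downsampling operator is used, so average pooling is an assumption here (a strided convolution would be another option), and `max_factor` is a hypothetical name and value for the preset threshold.

```python
import numpy as np

def downsample(x, factor):
    """Average-pool a (C, H, W) array by an integer factor."""
    if factor == 1:
        return x.copy()
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0, "size must divide evenly"
    # Split H and W into (blocks, factor) and average inside each block.
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def extract_multiscale(x, factors=(1, 2, 4), max_factor=8):
    """Return one initial feature map per preset factor below the threshold."""
    if any(f >= max_factor for f in factors):
        raise ValueError("downsampling factor must stay below the threshold")
    return [downsample(x, f) for f in factors]

x = np.random.default_rng(0).random((8, 16, 16))
maps = extract_multiscale(x)
shapes = [m.shape for m in maps]   # full, half, and quarter resolution
```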

[0039] When fusing the initial feature maps of multiple scales under different scale branches to obtain the intermediate feature map of each scale branch, the following Steps 1 and 2 can be followed:

[0040] Step 1: Take each of the scale branches in turn as the target scale branch, and fuse the initial feature maps of multiple scales based on the self-attention mechanism to obtain a multi-scale fused map.

[0041] Step 2: Obtain the intermediate feature map corresponding to the target scale branch based on the multi-scale fused map.

[0042] By using the above method, each scale branch is taken as the target scale branch, and self-attention is used to fuse the initial feature maps of multiple scales. Finally, the intermediate feature map corresponding to each scale branch can be obtained. In practical applications, different scale branches can process initial feature maps of multiple scales simultaneously, and the processing method is the same. That is, the network structure contained in different scale branches is the same. The difference between different scale branches is mainly reflected in the scale (spatial resolution). Therefore, the scale of the intermediate feature maps corresponding to different scale branches is different. Considering that traditional feature fusion methods such as concatenation or addition provide limited expressive power to the network, this embodiment of the disclosure uses a self-attention mechanism to fuse initial feature maps of multiple scales. It can dynamically select features of different scales (features of multiple resolutions) for fusion based on the information of the feature maps. Specifically, based on the self-attention mechanism, fusing initial feature maps of multiple scales can provide different weight values to initial feature maps of different scales. These weight values are related to the content of the input image, and different images correspond to different weight values. Therefore, the above method can process the input image in a targeted manner, dynamically combine feature maps of different scales based on the image content for fusion, so that the final multi-scale fused map more reliably reflects useful image features, and achieves the effect of dynamically combining variable receptive fields while retaining the original feature information at each spatial resolution.

[0043] In some specific embodiments, this disclosure provides implementation examples of fusing initial feature maps of multiple scales based on a self-attention mechanism. That is, step one above can be implemented with reference to steps A to D below:

[0044] Step A: Unify the scale of the initial feature maps of various scales to the scale corresponding to the target scale branch, and then add and fuse the initial feature maps point by point to obtain the initial fused map.

[0045] In some implementations, bilinear interpolation can be used to unify the initial feature maps of multiple scales to the scale of the target scale branch. Take as an example a target scale branch whose feature maps have half the spatial resolution of the input image (i.e., the scale obtained by downsampling the input image by a factor of two), and assume the initial feature maps are: one with the same spatial resolution as the input image, one with half that resolution, and one with one-quarter of that resolution. Then the full-resolution initial feature map is downsampled by a factor of two, the half-resolution initial feature map remains unchanged, and the quarter-resolution initial feature map is upsampled by a factor of two. This unifies all three initial feature maps to the scale of the target scale branch. Both upsampling and downsampling can be implemented using bilinear interpolation to reduce computational load and improve image processing speed.

[0046] Step B: Compress the information in the initial fused map to obtain an information compression vector.

[0047] In some implementations, the initial fused map is subjected to global average pooling (GAP), convolution, and ReLU activation sequentially to obtain the information compression vector.

[0048] Specifically, firstly, a statistical vector s of the channel dimension can be obtained through global average pooling. Then, a convolution and activation process is performed on the statistical vector to obtain an information compression vector z. The length of the information compression vector z is less than the length of the statistical vector s.

[0049] Step C involves obtaining multiple feature vectors carrying attention information based on the information compression vector; wherein the number of feature vectors carrying attention information is the same as the number of scales.

[0050] For example, given the three scales mentioned above, three feature vectors carrying attention information are obtained here. In some implementations, the information compression vector can be subjected to multiple convolutional processes to expand the channels, resulting in multiple expanded feature vectors; then, softmax activation is performed on each of the multiple expanded feature vectors to obtain multiple feature vectors carrying attention information. For example, the information compression vector z can be passed through three convolutional layers to expand the channels, resulting in three vectors with the same length as the statistical vector s mentioned above, namely v1, v2, and v3, which are then activated to obtain three new vectors carrying attention information.

[0051] Step D: Use the multiple feature vectors carrying attention information to fuse the initial feature maps and obtain a multi-scale fused map.

[0052] In some implementations, each feature vector carrying attention information can be multiplied by the initial feature map of its corresponding scale to obtain the multiplication result for that scale. The multiplication results for all scales are then added together to obtain the multi-scale fused map. Through this stepwise fusion method, the final multi-scale fused map can fully and effectively reflect the image features, facilitating better image enhancement results in subsequent steps.
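The scale unification and point-by-point addition of Step A can be sketched as follows. The resizing uses bilinear interpolation with half-pixel sample centers, which is an assumption: the text names bilinear interpolation but not the sampling convention. `bilinear_resize` and `unify_and_sum` are hypothetical helper names.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Resize a (C, H, W) array with bilinear interpolation."""
    c, h, w = x.shape
    # Source coordinates of each output sample (half-pixel centers).
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def unify_and_sum(maps, target):
    """Resize every map to the target branch's scale, then add them."""
    _, th, tw = maps[target].shape
    unified = [bilinear_resize(m, th, tw) for m in maps]
    return unified, sum(unified)

rng = np.random.default_rng(0)
maps = [rng.random((4, 16, 16)), rng.random((4, 8, 8)), rng.random((4, 4, 4))]
# Target the half-resolution branch: downsample, keep, and upsample.
unified, initial_fused = unify_and_sum(maps, target=1)
```

With this sampling convention, downsampling by a factor of two reduces to averaging each 2x2 block, and resizing a map to its own size leaves it unchanged.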

[0053] In practical applications, steps A to D above can be performed using a selective feature fusion module. This disclosure provides a schematic diagram of such a selective feature fusion module, as shown in Figure 2; one selective feature fusion module can be set up in each scale branch. Taking three scales as an example, the selective feature fusion can be implemented with reference to steps 1) to 6) below:

[0054] 1) The input to the selective feature fusion module comes from three feature maps at different scales (spatial resolutions). The feature maps whose scales are unified with the scale branch where the selective feature fusion module is located are L1, L2, and L3, respectively. They are first fused by element-wise summation to obtain L = L1 + L2 + L3. Here, L is the aforementioned initial fused map.

[0055] 2) Perform global average pooling (GAP) on L to obtain the statistical vector s of the channel dimension, where s = GAP(L).

[0056] 3) Perform convolution and activation on the statistical vector s to compress the information, yielding the information compression vector z = ReLU(Conv(s)); the length of z is less than the length of s.

[0057] 4) Expand the channels of z by passing it through three convolutional layers, yielding three vectors v1, v2, and v3 with the same length as s, where vi = Conv_i(z), i = 1, 2, 3. Each vi is an expanded feature vector as mentioned above.

[0058] 5) Apply Softmax activation to v1, v2, and v3 respectively to obtain three new vectors s1, s2, and s3 carrying attention information, where si = Softmax(vi), i = 1, 2, 3.

[0059] 6) Multiply s1, s2, and s3 with the three feature maps L1, L2, and L3 respectively, and add the products to obtain the output feature map U of the selective feature fusion module, i.e., U = s1·L1 + s2·L2 + s3·L3. U is the aforementioned multi-scale fused map.
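Steps 1) to 6) can be sketched in NumPy as follows. Several details are assumptions rather than statements from the text: the convolutions on vectors are modeled as 1x1 convolutions (matrix products), the weights are random and untrained, and the Softmax is normalized across the three scales for each channel, as in selective-kernel style fusion. The text could also be read as normalizing each vi along its own channel axis. The function and parameter names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_feature_fusion(feats, w_squeeze, w_expand):
    """feats: list of K (C, H, W) maps already unified to one scale.
    w_squeeze: (R, C) with R < C; w_expand: (K, C, R)."""
    fused = sum(feats)                    # 1) L = L1 + L2 + L3
    s = fused.mean(axis=(1, 2))           # 2) s = GAP(L), shape (C,)
    z = relu(w_squeeze @ s)               # 3) z = ReLU(Conv(s)), shape (R,)
    v = w_expand @ z                      # 4) v_i = Conv_i(z), shape (K, C)
    attn = softmax(v, axis=0)             # 5) assumed: softmax over the K scales
    # 6) weight each map channel-wise and add the results
    out = sum(a[:, None, None] * f for a, f in zip(attn, feats))
    return out, attn

rng = np.random.default_rng(0)
C, R, K = 8, 2, 3
feats = [rng.random((C, 6, 6)) for _ in range(K)]
w_squeeze = rng.normal(size=(R, C))
w_expand = rng.normal(size=(K, C, R))
U, attn = selective_feature_fusion(feats, w_squeeze, w_expand)
```

Under the across-scale normalization assumed here, the three weights for any channel sum to one, so U is a per-channel convex combination of L1, L2, and L3 whose mixing depends on the image content through s.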

[0060] Traditional attention mechanisms process features at only a single scale. However, the selective feature fusion module provided in this embodiment employs a self-attention mechanism to process feature maps at different scales, fusing them based on the attention mechanism to achieve targeted dynamic combination of multi-scale features according to image content. The above is merely illustrative and should not be considered limiting. In practical applications, the number of scales used may not be limited to three, and steps 1) to 6) above can be adaptively adjusted.

[0061] To extract more useful feature information and further improve image quality, in a specific implementation of step two (i.e., obtaining the intermediate feature map corresponding to the target scale branch based on the multi-scale fusion map), an attention mechanism can be used to process the multi-scale fusion map corresponding to the target scale branch to obtain the intermediate feature map corresponding to the target scale branch. That is, based on obtaining a multi-scale fusion map that fuses features of different resolutions, an attention mechanism is further used to extract feature information within the multi-scale fusion map. The attention mechanism can suppress features that are not particularly important (useful) to the task and assign them smaller weights, while simultaneously enhancing features that are useful to the task and assigning them larger weights. In this way, effective features in the image can be further extracted, which helps to further improve image quality.

[0062] For example, processing the multi-scale fusion map corresponding to the target scale branch based on the attention mechanism can be implemented with reference to the following steps a to d:

[0063] Step a: Deep feature extraction is performed on the multi-scale fusion map corresponding to the target scale branch to obtain a deep feature map.

[0064] In some implementations, the multi-scale fusion map corresponding to the target scale branch can be subjected to a first convolution, a ReLU activation, and a second convolution sequentially to obtain a deep feature map. Step a allows for the initial extraction of deep features from the multi-scale fusion map.

[0065] Step b involves processing the deep feature map using a spatial attention mechanism to obtain a spatial attention feature map. In some implementations, this can be achieved by referring to steps b1 to b3 below:

[0066] Step b1: Perform global average pooling (GAP) on the deep feature map along the channel dimension to obtain the first feature map; and perform global max pooling (GMP) on the deep feature map along the channel dimension to obtain the second feature map.

[0067] Step b2: Concatenate the first feature map and the second feature map to obtain a concatenated feature map, which has two channels.

[0068] Step b3: Perform dimensionality compression and activation processing on the concatenated feature map to obtain a spatial attention feature map.

[0069] Step c involves processing the deep feature map using a channel attention mechanism to obtain the channel attention vector. In some implementations, this can be achieved by referring to steps c1 to c3 below:

[0070] Step c1: Perform global average pooling (GAP) on the deep feature map in the spatial dimension to obtain the first vector;

[0071] Step c2: Perform convolution and ReLU activation on the first vector to obtain the second vector; wherein the dimension of the second vector is smaller than the dimension of the first vector.

[0072] Step c3: Perform convolution and sigmoid activation on the second vector to obtain the channel attention vector; wherein the dimension of the channel attention vector is equal to the dimension of the first vector.

[0073] Step d involves fusing the deep feature map, spatial attention feature map, and channel attention vector to obtain the intermediate feature map corresponding to the target scale branch.

[0074] After obtaining the spatial attention feature map based on the spatial attention mechanism and the channel attention vector based on the channel attention mechanism, the intermediate state feature map corresponding to the target scale branch can be further obtained by combining the deep feature map. In some implementations, this can be achieved by referring to the following steps d1 to d3:

[0075] Step d1: Perform a dot product between the deep feature map and the spatial attention feature map to obtain the first dot product result;

[0076] Step d2: Perform a dot product between the deep feature map and the channel attention vector to obtain the second dot product result;

[0077] Step d3 involves fusing the first and second dot product results to obtain the intermediate feature map corresponding to the target scale branch. For example, the first and second dot product results can be concatenated to obtain a two-channel feature map; then, the two-channel feature map can be convolved to obtain a one-channel feature map; finally, the one-channel feature map can be added to the multi-scale fusion map corresponding to the target scale branch to obtain the intermediate feature map corresponding to the target scale branch.

[0078] In practical applications, an attention module can be used to execute steps a to d above. Each scale branch can be configured with an attention module, which is connected in series after the selective feature fusion module. Figure 3 of this disclosure is a schematic diagram of the principle of the attention module. The attention module processes the feature map M (i.e., the aforementioned multi-scale fusion map U) output by the selective feature fusion module, as shown in steps 1) to 5) below:

[0079] 1) Feature map M undergoes a convolution (denoted Conv in Figure 3), a ReLU activation, and another convolution to obtain the feature map M', where M' = Conv(ReLU(Conv(M))). M' is the aforementioned deep feature map.

[0080] Afterwards, M' enters two branches (channel attention branch and spatial attention branch).

[0081] 2) In the spatial attention branch, GAP and GMP are applied to M' along the channel dimension, and the two resulting feature maps are concatenated (denoted C in Figure 3), resulting in a 2-channel feature map f, where f = Concat(GAP(M'), GMP(M')); f is the concatenated feature map described above. Feature map f is then convolved once to compress its dimensionality into a 1-channel feature map, which is passed through the Sigmoid activation function (denoted S in Figure 3) to obtain the feature map f', where f' = Sigmoid(Conv(f)). The feature map f' is the aforementioned spatial attention feature map.

[0082] 3) In the channel attention branch, GAP is applied to M' over the spatial dimensions to obtain vector d, where d = GAP(M'); d is the first vector mentioned above. Vector d is then processed by a convolution and the ReLU activation function to compress its dimension, resulting in vector z; that is, the dimension of vector z is smaller than the dimension of vector d, and z = ReLU(Conv(d)), where z is the second vector mentioned above. Vector z is then expanded in dimension by a convolution and the Sigmoid function to obtain vector d' with the same length as vector d, where d' = Sigmoid(Conv(z)); d' is the aforementioned channel attention vector.

[0083] 4) Multiply the feature map M' from 1) by the spatial attention feature map f' and by the channel attention vector d', and concatenate the two products to obtain the two-channel feature map L = Concat(M'·f', M'·d').

[0084] 5) After L undergoes a convolutional layer, it is converted into a single-channel feature map. This feature map is then added to the aforementioned feature map M to obtain the output feature map O = M + Conv(L) of the attention module. O is the intermediate feature map mentioned above.
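Steps 1) to 5) above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: every convolution is taken to be a bias-free 1x1 convolution, the "dot product" is read as element-wise multiplication with broadcasting, the "two-channel" map L is read as the two products stacked along the channel axis (2C channels for a C-channel input), and the final convolution maps it back to C channels so that it can be added to M. The helper names (conv1x1, init_params, attention_module), the reduction ratio, and the random weights are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    # Bias-free 1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in).
    return np.einsum("oc,chw->ohw", weight, x)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(channels, reduction=2, seed=0):
    # Hypothetical random weights, useful only for checking shapes.
    rng = np.random.default_rng(seed)
    c, r = channels, channels // reduction
    shapes = {"a1": (c, c), "a2": (c, c), "s": (1, 2),
              "c1": (r, c), "c2": (c, r), "out": (c, 2 * c)}
    return {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}

def attention_module(M, p):
    # 1) M' = Conv(ReLU(Conv(M)))
    M1 = conv1x1(relu(conv1x1(M, p["a1"])), p["a2"])
    # 2) f = Concat(GAP(M'), GMP(M')) along channels, f' = Sigmoid(Conv(f))
    f = np.stack([M1.mean(axis=0), M1.max(axis=0)])         # (2, H, W)
    f1 = sigmoid(conv1x1(f, p["s"]))                        # (1, H, W)
    # 3) d = GAP(M') over space, z = ReLU(Conv(d)), d' = Sigmoid(Conv(z))
    d = M1.mean(axis=(1, 2))                                # (C,)
    z = relu(p["c1"] @ d)                                   # (C // reduction,)
    d1 = sigmoid(p["c2"] @ z)                               # (C,)
    # 4) L = Concat(M' * f', M' * d')
    L = np.concatenate([M1 * f1, M1 * d1[:, None, None]])   # (2C, H, W)
    # 5) O = M + Conv(L)
    return M + conv1x1(L, p["out"])
```

Because the module ends with a residual addition, the output O has the same shape as M, which is what allows the attention module to be connected in series after the selective feature fusion module on each scale branch.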

[0085] The above are merely illustrative examples and should not be considered as limitations.

[0086] After obtaining multiple intermediate feature maps using the above method, these intermediate feature maps can be fused to obtain the output feature map of the multi-scale feature fusion network. Specifically, steps 1 and 2 below can be used for implementation:

[0087] Step 1: Fuse multiple intermediate feature maps to obtain a fused feature map; for example, fuse the intermediate feature maps corresponding to the branches at different scales to obtain a fused feature map. The scale of the fused feature map is the same as the scale of the input image of the multi-scale feature fusion network. In some embodiments, the multiple intermediate feature maps are fused in the same way as the initial feature maps of multiple scales, for example using the selective feature fusion module shown in Figure 2.

[0088] Step 2: Perform point-by-point addition fusion based on the fused feature map and the input image of the multi-scale feature fusion network to obtain the output feature map of the multi-scale feature fusion network. In practice, the fused feature map can first be convolved, and the resulting feature map can then be added point-by-point to the input image to obtain the output feature map of the multi-scale feature fusion network.

[0089] For ease of understanding, based on the foregoing, Figure 4 of this disclosure is a schematic diagram of the structure of a multi-scale feature fusion network. Figure 4 illustrates only three scale branches, corresponding to the initial feature maps at three scales obtained by downsampling the input image by factors of one, two, and four. The scale characterizes the spatial resolution of a feature map. The three initial feature maps are: one with the same spatial resolution as the input image; one with half the spatial resolution of the input image; and one with one-quarter the spatial resolution of the input image. As illustrated in Figure 4, each scale branch contains a selective feature fusion module and an attention module. Finally, the output feature maps of the attention modules are fused again on the first scale branch using a selective feature fusion module to obtain a feature map with the same scale as the input image. Figure 4 also illustrates that, after the intermediate feature maps corresponding to the branches at different scales are fused to obtain a fused feature map, the fused feature map is convolved and then added point-by-point to the input image of the multi-scale feature fusion network to obtain the output feature map of the multi-scale feature fusion network. Figure 4 is for illustrative purposes only and should not be considered a limitation.

[0090] For each multi-scale feature fusion network in the image enhancement model, the corresponding output feature map can be obtained through the above method. When there are multiple multi-scale feature fusion networks connected in series, multi-stage multi-scale feature fusion can be performed from beginning to end to gradually obtain the output feature map of the last multi-scale feature fusion network. Based on this, the step of obtaining the quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image includes: fusing the output feature map of the last multi-scale feature fusion network with the original image to obtain the quality-enhanced image. In a specific implementation, the output feature map of the last multi-scale feature fusion network can first be convolved so that its dimensions match those of the original image, and then fused with the original image point by point to obtain the quality-enhanced image.

[0091] Based on the foregoing, refer to Figure 5 of this disclosure, a schematic diagram of the image enhancement model, which illustrates that the original image is processed through N multi-scale feature fusion networks. Features are fused step by step within each multi-scale feature fusion network, and the multiple multi-scale feature fusion networks are applied in sequence, which gradually improves the image quality and finally yields a quality-enhanced image with better image quality.
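The three-scale input described above can be illustrated with a small NumPy sketch. It assumes bilinear interpolation with half-pixel sample centers (one common convention; the disclosure does not state which convention is used) and feature maps laid out as (C, H, W). The function names bilinear_resize and build_pyramid are hypothetical.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    # Bilinear resize of a (C, H, W) array, sampling at half-pixel centers.
    _, h, w = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]   # vertical interpolation weights
    wx = (xs - x0)[None, None, :]   # horizontal interpolation weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def build_pyramid(image, factors=(1, 2, 4)):
    # Initial feature maps at full, half, and quarter spatial resolution.
    _, h, w = image.shape
    return [bilinear_resize(image, h // f, w // f) for f in factors]
```

The same bilinear_resize helper can also bring the maps of the other branches back to the resolution of a target scale branch before they are fused.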
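The series connection of N networks followed by fusion with the original image can be sketched as below. This is only a structural illustration under assumptions: each multi-scale feature fusion network is represented by a hypothetical callable, the input of the first network is taken to be the original image itself, and the final point-by-point fusion is assumed to be an addition after a hypothetical to_image callable that stands in for the convolution matching the dimensions of the original image.

```python
import numpy as np

def enhance(original, networks, to_image):
    # Pass the features through the networks in series.
    x = original
    for network in networks:
        x = network(x)
    # Fuse the last output with the original image (assumed: addition).
    return original + to_image(x)

# Toy usage with stand-in callables instead of trained networks.
image = np.ones((3, 4, 4))
toy_networks = [lambda x: x + 1.0] * 3
result = enhance(image, toy_networks, to_image=lambda x: x)
```

With this structure, the networks only need to learn a correction to the original image rather than the whole enhanced image.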

[0092] To accelerate network operation and reduce the number of network parameters, the convolutions in the image enhancement model provided in this embodiment are 3x3 depthwise separable convolutions and / or 1x1 convolutions. For example, all convolutions may use 3x3 depthwise separable convolutions, or all convolutions may use 1x1 convolutions, or some convolutions may use 3x3 depthwise separable convolutions and some may use 1x1 convolutions. Furthermore, both downsampling and upsampling involved in the image enhancement model employ bilinear interpolation. Through these methods, the image enhancement model can be made lightweight, significantly reducing the number of network parameters, effectively reducing computational load, and significantly improving network operation speed.
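The parameter saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored) and assumes the usual factorization into one k x k depthwise filter per input channel followed by a 1x1 pointwise convolution; the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    # One k x k filter per (input channel, output channel) pair.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    # k x k depthwise filter per input channel, then a 1x1 pointwise convolution.
    return k * k * c_in + c_in * c_out

standard = standard_conv_params(64, 64)          # 36864 weights
separable = depthwise_separable_params(64, 64)   # 4672 weights
```

For 64 input and 64 output channels, the separable form uses roughly one-eighth of the weights of a standard 3x3 convolution.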

[0093] Furthermore, this disclosure provides a method for training an image enhancement model. Specifically, the image enhancement model is trained according to the following steps (1) to (2):

[0094] Step (1) Obtain training sample pairs; wherein each training sample pair includes image quality enhancement samples and image quality degradation samples with consistent image content; and the number of training sample pairs is multiple.

[0095] In some implementations, image samples may be acquired first; the image samples may then be degraded along specified dimensions to obtain image quality degraded samples, where the specified dimensions may include several of sharpness, color, contrast, and noise; and the image samples may be used as image quality enhancement samples, or the image samples may be enhanced along the specified dimensions to obtain image quality enhancement samples.

[0096] This disclosure does not limit the method of acquiring image samples. For example, images can be directly captured via a camera, acquired via a network, or obtained from an existing image library or sample library. Subsequently, the image samples can be degraded in various dimensions, such as reducing image sharpness, color, or contrast, or adding noise to obtain image quality degraded samples. In practical applications, when the image sample quality is good, it can be directly used as an image quality enhancement sample; when the image sample quality is average, existing image optimization algorithms or image processing tools such as Photoshop can be used to enhance the image sample to obtain an image quality enhanced sample.
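One way to construct a training sample pair by degrading a good-quality image along several dimensions at once is sketched below. The specific degradations (3x3 box blur for sharpness, blending toward gray for color, scaling around the mean for contrast, additive Gaussian noise) and all parameter values are illustrative assumptions; the disclosure does not specify them. The function name degrade is hypothetical.

```python
import numpy as np

def degrade(image, rng, contrast=0.6, saturation=0.7, noise_sigma=0.05):
    # image: float array of shape (H, W, 3) with values in [0, 1].
    h, w, _ = image.shape
    # Sharpness: 3x3 box blur built from nine shifted copies of the padded image.
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    # Color: blend each pixel toward its gray value.
    gray = blurred.mean(axis=2, keepdims=True)
    desaturated = gray + saturation * (blurred - gray)
    # Contrast: shrink deviations from the global mean.
    mean = desaturated.mean()
    low_contrast = mean + contrast * (desaturated - mean)
    # Noise: additive Gaussian noise, then clip back to the valid range.
    noisy = low_contrast + rng.normal(0.0, noise_sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(16, 16, 3))   # stand-in for an image sample
pair = (degrade(clean, rng), clean)               # (degraded sample, enhancement sample)
```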

[0097] Step (2): Train a pre-built neural network model based on training sample pairs and a preset loss function, and use the trained neural network model as an image enhancement model.

[0098] For example, the loss function can be an L1 loss function. Training of the neural network model is considered complete when the loss function value converges to a preset threshold. By processing image quality degraded samples, the trained neural network model can produce quality-enhanced images that meet expectations (with minimal difference from the image quality enhancement samples). A model trained in this manner can effectively enhance the image to be processed in multiple image quality dimensions, resulting in better image enhancement effects.
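The L1 loss and the threshold-based stopping rule can be illustrated with a deliberately tiny example. The one-parameter "model" below (a single gain applied to the degraded sample) and the subgradient update are purely hypothetical stand-ins for the neural network and its optimizer; only the loss definition and the stopping criterion correspond to the text above.

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute difference between prediction and target.
    return float(np.mean(np.abs(pred - target)))

def train_gain(degraded, clean, lr=0.1, threshold=0.01, max_steps=1000):
    # Toy model: pred = g * degraded, trained until the L1 loss reaches the threshold.
    g = 1.0
    loss = l1_loss(g * degraded, clean)
    for _ in range(max_steps):
        pred = g * degraded
        loss = l1_loss(pred, clean)
        if loss <= threshold:
            break
        # Subgradient of the L1 loss with respect to g.
        grad = float(np.mean(np.sign(pred - clean) * degraded))
        g -= lr * grad
    return g, loss
```

In this toy setting, if the degraded sample is the clean sample at half brightness, training drives g toward 2 and stops once the loss is at or below the threshold.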

[0099] In summary, the image enhancement method provided in this disclosure can utilize an end-to-end image enhancement model to downsample the original image to an appropriate degree to extract multi-scale features. Through progressive fusion processing within and between multiple multi-scale feature fusion networks, a good image enhancement effect is achieved. Furthermore, by optimizing the network structure and parameters, the network can be made lightweight, effectively reducing computational load and improving image processing speed, achieving high real-time performance (30 FPS). In addition, the multi-dimensional simultaneous training method allows the model to enhance multiple image quality dimensions simultaneously, making it more convenient and faster.

[0100] Corresponding to the aforementioned image enhancement method, this disclosure provides an image enhancement apparatus. Figure 6 is a schematic diagram of the structure of an image enhancement apparatus provided in an embodiment of the present disclosure. The apparatus can be implemented by software and / or hardware and is generally integrated into an electronic device. As shown in Figure 6, it includes:

[0101] Image acquisition module 602 is used to acquire the original image to be processed;

[0102] The model input module 604 is used to input the original image into a pre-trained image enhancement model; wherein, the image enhancement model includes a multi-scale feature fusion network;

[0103] The multi-scale fusion module 606 is used to extract multi-scale features from the input image through a multi-scale feature fusion network to obtain initial feature maps of multiple scales, fuse the initial feature maps of multiple scales to obtain multiple intermediate feature maps, and fuse the multiple intermediate feature maps to obtain the output feature map of the multi-scale feature fusion network; wherein, the input image is obtained based on the original image;

[0104] The enhanced image acquisition module 608 is used to obtain a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image.

[0105] The multi-scale feature-based stepwise fusion method provided in this embodiment can fully extract and utilize image features, effectively improving image quality.

[0106] In some implementations, the multi-scale fusion module 606 is specifically used to: downsample the input image according to multiple preset multiples to obtain initial feature maps of multiple scales; wherein the multiples are lower than a preset threshold.

[0107] In some implementations, the multi-scale fusion module 606 is specifically used to: fuse the initial feature maps of the multiple scales under different scale branches to obtain the intermediate feature map corresponding to each scale branch; the spatial resolution of the different intermediate feature maps is different.

[0108] In some implementations, the multi-scale fusion module 606 is specifically used to: take each scale branch in different scale branches as a target scale branch, perform fusion processing on the initial feature maps of the multiple scales based on a self-attention mechanism to obtain a multi-scale fusion map; and obtain an intermediate state feature map corresponding to the target scale branch based on the multi-scale fusion map.

[0109] In some implementations, the multi-scale fusion module 606 is specifically used to: unify the scales of the initial feature maps of the multiple scales to the scale corresponding to the target scale branch, and perform point-by-point addition and fusion of the scale-unified initial feature maps to obtain an initial fusion map; perform information compression based on the initial fusion map to obtain an information compression vector; obtain multiple feature vectors carrying attention information based on the information compression vector; wherein the number of feature vectors carrying attention information is the same as the number of scales of the multiple scales; and perform fusion processing based on the multiple feature vectors carrying attention information to obtain a multi-scale fusion map.

[0110] In some implementations, the multi-scale fusion module 606 is specifically used to: use bilinear interpolation to unify the scales of the initial feature maps of the multiple scales to the scale corresponding to the target scale branch.

[0111] In some implementations, the multi-scale fusion module 606 is specifically used to: perform global average pooling, convolution, and ReLU activation processing on the initial fusion map sequentially to obtain an information compression vector.

[0112] In some implementations, the multi-scale fusion module 606 is specifically used to: perform multiple convolution processes on the information compression vector to expand the channels and obtain multiple expanded feature vectors; and perform Softmax activation processing on the multiple expanded feature vectors to obtain multiple feature vectors carrying attention information.

[0113] In some implementations, the multi-scale fusion module 606 is specifically used to: perform dot product processing on each feature vector carrying attention information and the initial feature map of its corresponding scale to obtain the dot product result corresponding to each scale; and add the dot product results corresponding to the multiple scales to obtain a multi-scale fusion map.
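The selective feature fusion described in the preceding paragraphs (point-by-point addition, information compression, channel expansion, Softmax, weighting, and summation) can be sketched in NumPy as follows. The sketch assumes the initial feature maps have already been resized to the target scale, that the convolutions act on pooled vectors and therefore reduce to matrix products, and that the Softmax is taken across scales for each channel so that the attention weights of the scales compete with one another; the disclosure does not state the Softmax axis. The function and weight names are hypothetical.

```python
import numpy as np

def selective_feature_fusion(maps, w_down, w_up):
    # maps: list of K arrays of shape (C, H, W), already at the target scale.
    # w_down: (C // r, C) compression weights; w_up: list of K (C, C // r) weights.
    initial_fusion = np.sum(maps, axis=0)                # point-by-point addition
    pooled = initial_fusion.mean(axis=(1, 2))            # global average pooling -> (C,)
    compressed = np.maximum(w_down @ pooled, 0.0)        # conv + ReLU: compression vector
    expanded = np.stack([w @ compressed for w in w_up])  # K expanded vectors, (K, C)
    # Softmax across the K scales (assumed axis), computed in a stable form.
    expanded = expanded - expanded.max(axis=0, keepdims=True)
    attention = np.exp(expanded) / np.exp(expanded).sum(axis=0, keepdims=True)
    # Weight each scale's map by its attention vector and sum the results.
    fused = sum(a[:, None, None] * m for a, m in zip(attention, maps))
    return fused, attention
```

Under this assumption the attention weights sum to one for every channel, so the multi-scale fusion map is a per-channel convex combination of the input maps.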

[0114] In some implementations, the multi-scale fusion module 606 is specifically used to: process the multi-scale fusion map corresponding to the target scale branch based on an attention mechanism to obtain the intermediate state feature map corresponding to the target scale branch.

[0115] In some implementations, the multi-scale fusion module 606 is specifically used to: extract deep features from the multi-scale fusion map corresponding to the target scale branch to obtain a deep feature map; process the deep feature map based on a spatial attention mechanism to obtain a spatial attention feature map; process the deep feature map based on a channel attention mechanism to obtain a channel attention vector; and perform fusion processing based on the deep feature map, the spatial attention feature map, and the channel attention vector to obtain an intermediate state feature map corresponding to the target scale branch.

[0116] In some implementations, the multi-scale fusion module 606 is specifically used to: perform a first convolution process, a ReLU activation process, and a second convolution process on the multi-scale fusion map corresponding to the target scale branch in sequence to obtain a deep feature map.

[0117] In some implementations, the multi-scale fusion module 606 is specifically used to: perform global average pooling on the deep feature map in the channel dimension to obtain a first feature map; and perform global max pooling on the deep feature map in the channel dimension to obtain a second feature map; concatenate the first feature map and the second feature map to obtain a concatenated feature map; and perform dimensionality compression and activation processing on the concatenated feature map to obtain a spatial attention feature map.

[0118] In some implementations, the multi-scale fusion module 606 is specifically used to: perform global average pooling on the deep feature map in the spatial dimension to obtain a first vector; perform convolution and ReLU activation on the first vector to obtain a second vector; wherein the dimension of the second vector is smaller than the dimension of the first vector; perform convolution and Sigmoid activation on the second vector to obtain a channel attention vector; wherein the dimension of the channel attention vector is equal to the dimension of the first vector.

[0119] In some implementations, the multi-scale fusion module 606 is specifically used to: perform a dot product between the deep feature map and the spatial attention feature map to obtain a first dot product result; perform a dot product between the deep feature map and the channel attention vector to obtain a second dot product result; and perform fusion processing based on the first dot product result and the second dot product result to obtain the intermediate state feature map corresponding to the target scale branch.

[0120] In some implementations, the multi-scale fusion module 606 is specifically used to: concatenate the first dot product result with the second dot product result to obtain a two-channel feature map; perform convolution processing on the two-channel feature map to obtain a one-channel feature map; and add the one-channel feature map to the multi-scale fusion map corresponding to the target scale branch to obtain the intermediate state feature map corresponding to the target scale branch.

[0121] In some implementations, the multi-scale fusion module 606 is specifically used to: fuse the intermediate feature maps corresponding to the different scale branches to obtain a fused feature map; the scale of the fused feature map is the same as the scale of the input image of the multi-scale feature fusion network; and perform point-by-point addition and fusion based on the fused feature map and the input image of the multi-scale feature fusion network to obtain the output feature map of the multi-scale feature fusion network.

[0122] In some implementations, the multi-scale fusion module 606 is specifically used to fuse the multiple intermediate feature maps in the same way as the initial feature maps of the multiple scales are fused.

[0123] In some implementations, the initial feature maps of the multiple scales include: an initial feature map with the same spatial resolution as the input image, an initial feature map with a spatial resolution that is half the spatial resolution of the input image, and an initial feature map with a spatial resolution that is one-quarter the spatial resolution of the input image.

[0124] In some implementations, the convolutions in the image enhancement model are 3*3 depthwise separable convolutions and / or 1*1 convolutions.

[0125] In some embodiments, there are multiple multi-scale feature fusion networks, and the multiple multi-scale feature fusion networks are connected in series; wherein, the input image of the first multi-scale feature fusion network is obtained based on the original image, and the input image of the subsequent multi-scale feature fusion networks is obtained based on the output feature map of the previous multi-scale feature fusion network.

[0126] In some implementations, the enhanced image acquisition module 608 is specifically used to: fuse the output feature map of the last multi-scale feature fusion network with the original image to obtain an enhanced image.

[0127] In some embodiments, the apparatus further includes a training module, specifically used for: training the image enhancement model in the following manner: obtaining training sample pairs; wherein each training sample pair includes image quality enhancement samples and image quality degradation samples with consistent image content; and the number of training sample pairs is multiple; training a pre-constructed neural network model based on the training sample pairs and a preset loss function, and using the trained neural network model as the image enhancement model.

[0128] In some implementations, the training module is specifically used to: acquire image samples; degrade the image samples according to a specified dimension to obtain image quality degraded samples; the specified dimension includes multiple aspects such as sharpness, color, contrast, and noise; use the image samples as image quality enhancement samples; or, enhance the image samples according to the specified dimension to obtain image quality enhancement samples.

[0129] The image enhancement apparatus provided in this disclosure can execute the image enhancement method provided in any embodiment of this disclosure, and has the corresponding functional modules and beneficial effects for executing the method.

[0130] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the above-described device embodiments can be referred to the corresponding process in the method embodiments, and will not be repeated here.

[0131] This disclosure provides an electronic device, which includes: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to read the executable instructions from the memory and execute them to implement any of the above-described image enhancement methods.

[0132] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure. As shown in Figure 7, the electronic device 700 includes one or more processors 701 and memory 702.

[0133] The processor 701 may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and / or instruction execution capabilities, and may control other components in the electronic device 700 to perform desired functions.

[0134] The memory 702 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and / or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 701 may execute the program instructions to implement the image enhancement method of the embodiments of this disclosure described above and / or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.

[0135] In one example, the electronic device 700 may also include an input device 703 and an output device 704, which are interconnected via a bus system and / or other forms of connection mechanism (not shown).

[0136] In addition, the input device 703 may also include, for example, a keyboard, a mouse, etc.

[0137] The output device 704 can output various information to the outside, including determined distance information, direction information, etc. The output device 704 may include, for example, a display, a speaker, a printer, and a communication network and its connected remote output devices, etc.

[0138] Of course, for the sake of simplicity, Figure 7 shows only some of the components of the electronic device 700 relevant to this disclosure, omitting components such as buses, input / output interfaces, etc. In addition, the electronic device 700 may include any other suitable components depending on the specific application.

[0139] In addition to the methods and devices described above, embodiments of this disclosure may also be computer program products, including computer program instructions that, when executed by a processor, cause the processor to perform the image enhancement method provided in the embodiments of this disclosure.

[0140] The computer program product can be written in any combination of one or more programming languages to perform the operations of the embodiments of this disclosure. The programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on a user's computing device, partially on a user's computing device, as a standalone software package, partially on a user's computing device and partially on a remote computing device, or entirely on a remote computing device or server.

[0141] Furthermore, embodiments of this disclosure may also be computer-readable storage media storing computer program instructions that, when executed by a processor, cause the processor to perform the image enhancement method provided in embodiments of this disclosure.

[0142] The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may, for example, include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0143] This disclosure also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the image enhancement method of this disclosure.

[0144] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0145] The above description covers merely specific embodiments of this disclosure, provided so that those skilled in the art can understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this disclosure. Therefore, this disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An image enhancement method, characterized in that it comprises: obtaining an original image to be processed; inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model includes a multi-scale feature fusion network; performing, through the multi-scale feature fusion network, multi-scale feature extraction on an input image to obtain initial feature maps of multiple scales, fusing the initial feature maps of the multiple scales to obtain multiple intermediate state feature maps, and fusing the multiple intermediate state feature maps to obtain an output feature map of the multi-scale feature fusion network, wherein the input image is obtained based on the original image; and obtaining a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image; wherein the step of fusing the initial feature maps of the multiple scales to obtain the multiple intermediate state feature maps includes: under each scale branch, unifying the spatial resolution of the initial feature maps of the multiple scales to the spatial resolution of that scale branch; and fusing the initial feature maps of the multiple scales under the different scale branches to obtain an intermediate state feature map corresponding to each scale branch, wherein different intermediate state feature maps have different spatial resolutions.

2. The method according to claim 1, characterized in that the step of performing multi-scale feature extraction on the input image to obtain the initial feature maps of multiple scales includes: downsampling the input image by multiple preset multiples to obtain the initial feature maps of the multiple scales, wherein each multiple is lower than a preset threshold.

3. The method according to claim 1, characterized in that the step of fusing the initial feature maps of the multiple scales under the different scale branches to obtain the intermediate state feature map corresponding to each scale branch includes: taking each of the different scale branches as a target scale branch, and fusing the initial feature maps of the multiple scales based on a self-attention mechanism to obtain a multi-scale fusion map; and obtaining the intermediate state feature map corresponding to the target scale branch based on the multi-scale fusion map.

4. The method according to claim 3, characterized in that the step of fusing the initial feature maps of the multiple scales based on the self-attention mechanism to obtain the multi-scale fusion map includes: unifying the scales of the initial feature maps of the multiple scales to the scale corresponding to the target scale branch, and adding the scale-unified initial feature maps point by point to obtain an initial fusion map; performing information compression on the initial fusion map to obtain an information compression vector; obtaining, based on the information compression vector, multiple feature vectors carrying attention information, wherein the number of feature vectors carrying attention information is the same as the number of scales; and fusing the multiple feature vectors carrying attention information to obtain the multi-scale fusion map.

5. The method according to claim 4, characterized in that the step of unifying the scales of the initial feature maps of the multiple scales to the scale corresponding to the target scale branch includes: using bilinear interpolation to unify the scales of the initial feature maps of the multiple scales to the scale corresponding to the target scale branch.
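
For illustration only, the bilinear scale unification recited in claim 5 can be sketched in NumPy as follows. The function name `bilinear_resize`, the (C, H, W) array layout, and the corner-aligned sampling grid are assumptions of this sketch; the claim does not fix a sampling convention.

```python
import numpy as np

def bilinear_resize(fmap, out_h, out_w):
    """Resize a (C, H, W) feature map to (C, out_h, out_w) by bilinear interpolation.

    Assumption: corner-aligned sampling (first and last samples map to the
    first and last source pixels). Other conventions are equally possible.
    """
    c, h, w = fmap.shape
    ys = np.linspace(0.0, h - 1, out_h)   # fractional source rows
    xs = np.linspace(0.0, w - 1, out_w)   # fractional source columns
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)        # clamp neighbours at the border
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]         # vertical interpolation weights
    wx = (xs - x0)[None, None, :]         # horizontal interpolation weights
    top = fmap[:, y0][:, :, x0] * (1 - wx) + fmap[:, y0][:, :, x1] * wx
    bottom = fmap[:, y1][:, :, x0] * (1 - wx) + fmap[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def unify_scales(feats, target_index):
    """Resize every feature map in `feats` to the resolution of feats[target_index]."""
    _, th, tw = feats[target_index].shape
    return [bilinear_resize(f, th, tw) for f in feats]
```

Calling `unify_scales(feats, k)` once per scale branch `k` gives each branch a set of equally sized maps that can then be added point by point.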

6. The method according to claim 4, characterized in that the step of performing information compression on the initial fusion map to obtain the information compression vector includes: subjecting the initial fusion map to global average pooling, convolution, and ReLU activation in sequence to obtain the information compression vector.

7. The method according to claim 4, characterized in that the step of obtaining the multiple feature vectors carrying attention information based on the information compression vector includes: performing multiple convolution processes on the information compression vector to expand its channels, obtaining multiple expanded feature vectors; and applying softmax activation to each of the expanded feature vectors to obtain the multiple feature vectors carrying attention information.

8. The method according to claim 4, characterized in that the step of fusing the multiple feature vectors carrying attention information to obtain the multi-scale fusion map includes: multiplying each feature vector carrying attention information by the initial feature map of its corresponding scale to obtain a dot product result for each scale; and adding the dot product results of the multiple scales to obtain the multi-scale fusion map.
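
The fusion steps of claims 4 and 6 to 8 can be sketched together as below. This is a minimal NumPy illustration, not the claimed implementation: the 1×1 convolutions on pooled vectors are written as matrix products, the weight arguments `w_compress` and `w_expand` are hypothetical, the inputs are assumed to have been resized to the target scale already, and the softmax is taken across the scale axis. The claims do not state the softmax axis, so that choice is an assumption.

```python
import numpy as np

def selective_fusion(feats, w_compress, w_expand):
    """Fuse S same-sized feature maps with attention weights.

    feats:      list of S arrays shaped (C, H, W), already at the target scale.
    w_compress: (D, C) weights of a 1x1 convolution squeezing C channels to D.
    w_expand:   list of S arrays shaped (C, D), one 1x1 convolution per scale.
    Returns the multi-scale fusion map (C, H, W) and the attention weights (S, C).
    """
    stacked = np.stack(feats)                       # (S, C, H, W)
    initial = stacked.sum(axis=0)                   # point-by-point addition
    pooled = initial.mean(axis=(1, 2))              # global average pooling -> (C,)
    z = np.maximum(w_compress @ pooled, 0.0)        # convolution + ReLU -> (D,)
    logits = np.stack([w @ z for w in w_expand])    # channel expansion -> (S, C)
    logits -= logits.max(axis=0, keepdims=True)     # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=0, keepdims=True)         # softmax over scales (assumption)
    # Weight each scale's map by its attention vector, then sum over scales.
    fused = (attn[:, :, None, None] * stacked).sum(axis=0)
    return fused, attn
```

With all expansion weights set to zero, every scale receives the same weight and the result reduces to the plain average of the inputs, which is a convenient sanity check.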

9. The method according to claim 3, characterized in that the step of obtaining the intermediate state feature map corresponding to the target scale branch based on the multi-scale fusion map includes: processing the multi-scale fusion map corresponding to the target scale branch based on an attention mechanism to obtain the intermediate state feature map corresponding to the target scale branch.

10. The method according to claim 9, characterized in that the step of processing the multi-scale fusion map corresponding to the target scale branch based on the attention mechanism to obtain the intermediate state feature map corresponding to the target scale branch includes: performing deep feature extraction on the multi-scale fusion map corresponding to the target scale branch to obtain a deep feature map; processing the deep feature map based on a spatial attention mechanism to obtain a spatial attention feature map; processing the deep feature map based on a channel attention mechanism to obtain a channel attention vector; and fusing the deep feature map, the spatial attention feature map, and the channel attention vector to obtain the intermediate state feature map corresponding to the target scale branch.

11. The method according to claim 10, characterized in that the step of performing deep feature extraction on the multi-scale fusion map corresponding to the target scale branch to obtain the deep feature map includes: subjecting the multi-scale fusion map corresponding to the target scale branch to first convolution processing, ReLU activation processing, and second convolution processing in sequence to obtain the deep feature map.

12. The method according to claim 10, characterized in that the step of processing the deep feature map based on the spatial attention mechanism to obtain the spatial attention feature map includes: performing global average pooling on the deep feature map in the channel dimension to obtain a first feature map, and performing global max pooling on the deep feature map in the channel dimension to obtain a second feature map; concatenating the first feature map and the second feature map to obtain a concatenated feature map; and performing dimensionality compression and activation processing on the concatenated feature map to obtain the spatial attention feature map.
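
As a hedged illustration of claim 12, the sketch below pools over the channel dimension, concatenates the two pooled maps, and compresses them to a single map. The use of a 1×1 convolution (weights `w`, bias `b`, both hypothetical) for the dimensionality compression and of a sigmoid for the activation are assumptions; the claim names neither.

```python
import numpy as np

def spatial_attention(deep, w, b=0.0):
    """Compute a spatial attention map from a (C, H, W) deep feature map.

    w: length-2 weights of an assumed 1x1 convolution mixing the average-pooled
       and max-pooled maps; b: its bias.
    Returns an array shaped (1, H, W) with values in (0, 1).
    """
    avg_map = deep.mean(axis=0, keepdims=True)          # first feature map (1, H, W)
    max_map = deep.max(axis=0, keepdims=True)           # second feature map (1, H, W)
    cat = np.concatenate([avg_map, max_map], axis=0)    # concatenated map (2, H, W)
    squeezed = np.tensordot(w, cat, axes=([0], [0])) + b  # 2 channels -> 1, shape (H, W)
    return (1.0 / (1.0 + np.exp(-squeezed)))[None]      # sigmoid activation (assumption)
```

The result broadcasts against the deep feature map, so `deep * spatial_attention(deep, w)` reweights every channel at each pixel.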

13. The method according to claim 10, characterized in that the step of processing the deep feature map based on the channel attention mechanism to obtain the channel attention vector includes: performing global average pooling on the deep feature map in the spatial dimension to obtain a first vector; subjecting the first vector to convolution and ReLU activation to obtain a second vector, wherein the dimension of the second vector is smaller than the dimension of the first vector; and subjecting the second vector to convolution and sigmoid activation to obtain the channel attention vector, wherein the dimension of the channel attention vector is equal to the dimension of the first vector.
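
The squeeze-and-expand structure of claim 13 can be sketched as follows. The weight matrices `w_down` and `w_up` are hypothetical stand-ins for the two convolutions, written as matrix products because a 1×1 convolution on a pooled vector reduces to one; biases are omitted for brevity.

```python
import numpy as np

def channel_attention(deep, w_down, w_up):
    """Compute a channel attention vector from a (C, H, W) deep feature map.

    w_down: (R, C) weights with R < C, reducing the pooled vector's dimension.
    w_up:   (C, R) weights restoring the original dimension.
    Returns a length-C vector with values in (0, 1).
    """
    first = deep.mean(axis=(1, 2))                   # spatial global average pooling -> (C,)
    second = np.maximum(w_down @ first, 0.0)         # convolution + ReLU -> (R,)
    return 1.0 / (1.0 + np.exp(-(w_up @ second)))    # convolution + sigmoid -> (C,)
```

Multiplying with `deep * vec[:, None, None]` scales each channel of the deep feature map by its attention value.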

14. The method according to claim 10, characterized in that the step of fusing the deep feature map, the spatial attention feature map, and the channel attention vector to obtain the intermediate state feature map corresponding to the target scale branch includes: multiplying the deep feature map by the spatial attention feature map to obtain a first dot product result; multiplying the deep feature map by the channel attention vector to obtain a second dot product result; and fusing the first dot product result and the second dot product result to obtain the intermediate state feature map corresponding to the target scale branch.

15. The method according to claim 14, characterized in that the step of fusing the first dot product result and the second dot product result to obtain the intermediate state feature map corresponding to the target scale branch includes: concatenating the first dot product result with the second dot product result to obtain a two-channel feature map; convolving the two-channel feature map to obtain a one-channel feature map; and adding the one-channel feature map to the multi-scale fusion map corresponding to the target scale branch to obtain the intermediate state feature map corresponding to the target scale branch.
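
A minimal sketch of the fusion in claims 14 and 15 follows. It reads "two-channel" and "one-channel" as 2C and C channels respectively, that is, the concatenation doubles the channel count and the convolution restores it; this reading, the function name, and the 1×1 merge weights `w_merge` are assumptions of the sketch.

```python
import numpy as np

def fuse_attention_branches(fusion_map, deep, spatial_map, channel_vec, w_merge):
    """Combine spatial and channel attention with a residual connection.

    fusion_map:  (C, H, W) multi-scale fusion map of the target scale branch.
    deep:        (C, H, W) deep feature map extracted from fusion_map.
    spatial_map: (1, H, W) spatial attention feature map.
    channel_vec: (C,) channel attention vector.
    w_merge:     (C, 2C) weights of an assumed 1x1 convolution.
    """
    first = deep * spatial_map                        # first dot product result
    second = deep * channel_vec[:, None, None]        # second dot product result
    cat = np.concatenate([first, second], axis=0)     # (2C, H, W)
    merged = np.einsum("oc,chw->ohw", w_merge, cat)   # 1x1 convolution 2C -> C
    return fusion_map + merged                        # residual addition
```

Because the merged features are added back to the multi-scale fusion map, zero merge weights leave the fusion map unchanged.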

16. The method according to claim 1, characterized in that the step of fusing the multiple intermediate state feature maps to obtain the output feature map of the multi-scale feature fusion network includes: fusing the multiple intermediate state feature maps to obtain a fused feature map, wherein the scale of the fused feature map is the same as the scale of the input image of the multi-scale feature fusion network; and fusing the fused feature map and the input image of the multi-scale feature fusion network point by point to obtain the output feature map of the multi-scale feature fusion network.

17. The method according to claim 16, characterized in that the fusion method used to fuse the multiple intermediate state feature maps is the same as the fusion method used to fuse the initial feature maps of the multiple scales.

18. The method according to claim 1, characterized in that the initial feature maps of the multiple scales include: an initial feature map with the same spatial resolution as the input image, an initial feature map with half the spatial resolution of the input image, and an initial feature map with one-quarter the spatial resolution of the input image.
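
The three resolutions named in claim 18 can be illustrated with the sketch below. Average pooling is used as the downsampling operator purely as an assumption, and the function names are hypothetical; a strided convolution or another operator would serve equally well.

```python
import numpy as np

def downsample(fmap, factor):
    """Downsample a (C, H, W) map by an integer factor using average pooling.

    Rows and columns that do not fill a complete pooling window are dropped.
    """
    c, h, w = fmap.shape
    h2, w2 = h // factor, w // factor
    cropped = fmap[:, :h2 * factor, :w2 * factor]
    return cropped.reshape(c, h2, factor, w2, factor).mean(axis=(2, 4))

def build_pyramid(fmap, factors=(1, 2, 4)):
    """Return maps at full, half, and quarter resolution (factors 1, 2, 4)."""
    return [downsample(fmap, f) for f in factors]
```

For a (3, 8, 8) input this yields maps shaped (3, 8, 8), (3, 4, 4), and (3, 2, 2), one per scale branch.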

19. The method according to claim 1, characterized in that the convolutions in the image enhancement model are 3×3 depthwise separable convolutions and/or 1×1 convolutions.
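
To make the convolution type of claim 19 concrete, the following sketch applies a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, and compares weight counts with a standard 3×3 convolution. The function names, zero padding, stride of 1, and the omission of biases are assumptions. As in most deep learning frameworks, the "convolution" is computed as a cross-correlation.

```python
import numpy as np

def depthwise_separable_conv(x, dw, pw):
    """Apply a 3x3 depthwise convolution, then a 1x1 pointwise convolution.

    x:  (C, H, W) input feature map.
    dw: (C, 3, 3) one 3x3 kernel per input channel (depthwise step).
    pw: (C_out, C) pointwise weights mixing channels (1x1 step).
    Zero padding of 1 keeps the spatial size unchanged.
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            # Each channel is filtered only by its own kernel tap.
            out += dw[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return np.einsum("oc,chw->ohw", pw, out)

def conv_param_counts(c_in, c_out):
    """Weight counts (no bias): standard 3x3 conv vs. depthwise separable."""
    standard = 9 * c_in * c_out
    separable = 9 * c_in + c_in * c_out
    return standard, separable
```

For 64 input and 64 output channels, `conv_param_counts(64, 64)` gives 36864 weights for the standard convolution against 4672 for the separable form, roughly an eightfold reduction.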

20. The method according to any one of claims 1 to 19, characterized in that there are multiple multi-scale feature fusion networks connected in series, wherein the input image of the first multi-scale feature fusion network is obtained based on the original image, and the input image of each subsequent multi-scale feature fusion network is obtained based on the output feature map of the preceding multi-scale feature fusion network.

21. The method according to claim 20, characterized in that the step of obtaining the quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image includes: fusing the original image with the output feature map of the last multi-scale feature fusion network to obtain the quality-enhanced image.
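
The series connection of claim 20 and the final fusion of claim 21 can be sketched as a simple loop. The callables `head`, `blocks`, and `tail` are hypothetical: `head` stands for whatever maps the original image to the first network's input, each entry of `blocks` stands for one multi-scale feature fusion network, and `tail` maps the last output back to image shape. Point-by-point addition as the final fusion is also an assumption, since the claim only says "fusing".

```python
import numpy as np

def enhance(original, head, blocks, tail):
    """Run serially connected fusion networks and fuse with the original image.

    original: image array, e.g. shaped (C, H, W).
    head:     callable producing the first network's input from the image.
    blocks:   sequence of callables, one per multi-scale feature fusion network.
    tail:     callable mapping the last output feature map to image shape.
    """
    feat = head(original)
    for block in blocks:
        # Each network consumes the previous network's output feature map.
        feat = block(feat)
    return original + tail(feat)   # global residual fusion (assumption)
```

A toy call such as `enhance(img, lambda x: 2 * x, [lambda f: f + 1] * 3, lambda f: 0.1 * f)` shows the data flow without any trained weights.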

22. The method according to any one of claims 1 to 19, characterized in that the image enhancement model is trained in the following manner: obtaining multiple training sample pairs, wherein each training sample pair includes an image quality enhancement sample and an image quality degradation sample with the same image content; and training a pre-built neural network model based on the training sample pairs and a preset loss function, and using the trained neural network model as the image enhancement model.

23. The method according to claim 22, characterized in that the step of obtaining the training sample pairs includes: obtaining an image sample; degrading the image sample along specified dimensions to obtain an image quality degradation sample, wherein the specified dimensions include multiple aspects among sharpness, color, contrast, and noise; and using the image sample as the image quality enhancement sample, or enhancing the image sample along the specified dimensions to obtain the image quality enhancement sample.
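
One possible way to build a training pair as described in claim 23 is sketched below, using the unmodified image sample as the enhancement sample. The specific degradations (3×3 box blur for sharpness, desaturation for color, contrast scaling, additive Gaussian noise) and all parameter values are assumptions chosen for illustration; the claim only names the dimensions.

```python
import numpy as np

def make_training_pair(image, rng, contrast=0.6, saturation=0.7, noise_std=0.03):
    """Return (degraded, reference) for an (H, W, 3) image with values in [0, 1].

    rng: a numpy.random.Generator used to draw the noise.
    """
    h, w, _ = image.shape
    # Sharpness: 3x3 box blur with edge replication.
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    # Color: pull each pixel toward its gray value.
    gray = blurred.mean(axis=2, keepdims=True)
    desaturated = gray + saturation * (blurred - gray)
    # Contrast: shrink deviations from the global mean.
    mean = desaturated.mean()
    low_contrast = mean + contrast * (desaturated - mean)
    # Noise: additive Gaussian noise, then clip back to the valid range.
    noisy = low_contrast + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0), image
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the generated pairs reproducible.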

24. An image enhancement device, characterized in that it comprises: an image acquisition module for acquiring an original image to be processed; a model input module for inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model includes a multi-scale feature fusion network; a multi-scale fusion module for performing, through the multi-scale feature fusion network, multi-scale feature extraction on an input image to obtain initial feature maps of multiple scales, fusing the initial feature maps of the multiple scales to obtain multiple intermediate state feature maps, and fusing the multiple intermediate state feature maps to obtain an output feature map of the multi-scale feature fusion network, wherein the input image is obtained based on the original image; and an enhanced image acquisition module for obtaining a quality-enhanced image based on the output feature map of the multi-scale feature fusion network and the original image; wherein the step of fusing the initial feature maps of the multiple scales to obtain the multiple intermediate state feature maps includes: under each scale branch, unifying the spatial resolution of the initial feature maps of the multiple scales to the spatial resolution of that scale branch; and fusing the initial feature maps of the multiple scales under the different scale branches to obtain an intermediate state feature map corresponding to each scale branch, wherein different intermediate state feature maps have different spatial resolutions.

25. An electronic device, characterized in that the electronic device includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the image enhancement method according to any one of claims 1-23.

26. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the image enhancement method according to any one of claims 1-23.