Image compression method and image decompression method adaptive to target distance

By using a target distance-adaptive image compression method, which utilizes depth estimation and an image compression network to adaptively allocate the bitstream, the problem of low image compression rate in UAV inspection scenarios is solved, achieving more efficient image compression and decompression effects.

CN117880524B · Active · Publication Date: 2026-06-12 · HANGZHOU ARCVIDEO TECHNOLOGY CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HANGZHOU ARCVIDEO TECHNOLOGY CO LTD
Filing Date: 2023-12-29
Publication Date: 2026-06-12



Abstract

The application discloses a target distance adaptive image compression method and an image decompression method, wherein the image compression method comprises a training process and an inference process; the training process further comprises depth information extraction, data preprocessing and model training; the depth information extraction takes an image containing a close-range target as input, uses a depth estimation network to extract a depth value corresponding to each pixel from the image, and outputs a single-channel depth map containing the depth value, the resolution of the depth map being the same as that of the input image; the data preprocessing randomly divides images used for training into smaller image blocks, and divides a depth map obtained through the depth information extraction process into small blocks corresponding to the images; the model training performs training on an image compression network, takes the cut image blocks as input, processes the image blocks through a neural network, and outputs corresponding reconstructed image blocks; and the inference process performs image compression.


Description

Technical Field

[0001] This invention belongs to the field of image compression technology, and specifically relates to an image compression method and an image decompression method that are adaptive to target distance.

Background Technology

[0002] In recent years, the development of image compression methods based on deep learning has provided new solutions for the field of image compression. In some scenarios, such as drone power line inspection, it is necessary to transmit a large number of drone-captured images via wireless signals for subsequent data storage and analysis of equipment such as power towers and power lines.

[0003] Traditional compression algorithms have low compression ratios, which result in long transmission times and high transmission costs when transmitting compressed images, and require a large amount of storage space when storing compressed images.

[0004] While existing deep learning-based image compression methods offer improved compression ratios compared to traditional methods, they are designed for general image compression scenarios and are therefore not optimized for many practical applications. For example, in scenarios involving drone inspections or photographing specific objects, existing deep learning image compression methods consume a significant amount of storage space on distant background environments that are irrelevant to the nearby target.

Summary of the Invention

[0005] In view of the above-mentioned problems, the present invention provides an image compression method and an image decompression method that are adaptive to target distance.

[0006] To solve the above-mentioned technical problems, the present invention adopts the following technical solution:

[0007] This invention provides a target distance adaptive image compression method, including a training process and an inference process. The training process further includes depth information extraction, data preprocessing, and model training.

[0008] S101, the depth information extraction takes an image containing near targets as input, uses a depth estimation network to extract the depth value corresponding to each pixel from the image, and outputs a single-channel depth map with the same resolution as the input image containing depth values.

[0009] S102, the data preprocessing randomly divides the image used for training into smaller image blocks, and also divides the depth map obtained in the depth information extraction process into small blocks corresponding to the image;

[0010] S103, the model training involves training an image compression network, using pre-cut image blocks as input, which are then processed by a neural network to output corresponding reconstructed image blocks; the similarity between the input image block and the reconstructed image block is calculated, and the corresponding depth map block is used as the weight value to obtain the corresponding rate-distortion loss; the training process optimizes this loss as the target to obtain the trained network weights.

[0011] S104, the inference process performs image compression, using an image to be compressed as the input of the image encoder and the super-prior encoder-decoder, outputting the corresponding feature tensor and the probability estimation result for entropy coding, and then inputting the feature tensor and the probability estimation result into the corresponding entropy encoder for encoding, to obtain the bitstream for storage and transmission.

[0012] In one possible implementation, S101 specifically includes: a training set image $x$ with dimensions $[h_x, w_x, 3]$, corresponding to the height, width, and number of channels of $x$ respectively, is input into the depth estimation network, and the output is the depth tensor $d$ corresponding to that image. $d$ is a three-dimensional tensor with dimensions $[h_x, w_x, 1]$ whose values are floating-point numbers, each representing the depth of one pixel of the training image $x$ as estimated by the depth estimation network;

[0013] After performing the above depth information extraction on each training set image, the depth information extraction stage ends.

[0014] In one possible implementation, S102 specifically includes: cutting images x and d in the training set into several blocks of the same size, the center point of the block being specified by random sampling, ensuring that the boundary of the block is always within the range of image x during sampling, and the number N of blocks selected for each image being consistent with the preset number of training epochs max_epoch.

[0015] For a block $x_i$ of image $x$, the coordinates of its center point on $x$ are $(a_i, b_i)$. Likewise, $x_i$ corresponds to a block $d_i$ on the depth map, whose center point on $d$ also has coordinates $(a_i, b_i)$. That is, for a training set image $x$, data preprocessing yields $N$ image-depth data pairs $\langle x_i, d_i \rangle$;

[0016] After performing the above preprocessing on each training set image, the data preprocessing stage is complete.

[0017] In one possible implementation, in S103, during the image compression network training phase, the image-depth data pairs obtained in the data preprocessing phase are input into an image compression network pre-trained on a common dataset. During the forward propagation phase, the image patch $x_i$ is input into the image encoder to obtain its corresponding feature $y_i$, and $y_i$ is then input into the super-prior encoder to obtain the super-prior feature $z_i$. The super-prior feature $z_i$ is first input into the quantization module, which rounds $z_i$ to the nearest integers to give $\hat{z}_i$; $\hat{z}_i$ is then input into the super-prior decoder to obtain two features of the same size as $y_i$, namely $\mathrm{means}_i$ and $\mathrm{scale}_i$. $y_i$ and $\mathrm{means}_i$ are input into the quantization module to obtain the quantized feature $\hat{y}_i$:

[0018] $\hat{y}_i = \mathrm{Round}(y_i - \mathrm{means}_i) + \mathrm{means}_i$

[0019] where Round denotes rounding. Subsequently, $\hat{y}_i$ is input into the image decoder to obtain the reconstructed image patch $\hat{x}_i$.

[0020] The rate-distortion loss, i.e. the loss function $L$, consists of the distortion loss $L_d$ and the rate loss $L_r$. The distortion loss $L_d$ is calculated using the pixel-wise mean squared error (MSE) or the multi-scale structural similarity (MS-SSIM) loss, specifically:

[0021]

[0022] or

[0023] The rate loss $L_r$ constrains the code length; it uses $\mathrm{scale}_i$ and $\mathrm{means}_i$ to estimate the bit rate of the quantized features. The final loss function $L$ is calculated as follows:

[0024] $L = \lambda L_d + L_r$

[0025] where $\lambda$ is a parameter that adjusts the relative weight of the two losses; it is determined by the depth block $d_i$ and has the same shape as $x_i$ and $d_i$.

[0026] In one possible implementation, $\lambda$ is determined as follows: a threshold $t$ is selected based on the value range of the depth tensor $d$, and two different values $\lambda_1$ and $\lambda_2$ are set, where $\lambda_1 > \lambda_2$. The $j$-th value of $\lambda$, denoted $\lambda_j$, is then determined as follows:

[0027] $\lambda_j = \begin{cases} \lambda_1, & d_{ij} < t \\ \lambda_2, & \text{otherwise} \end{cases}$

[0028] That is, the value $\lambda_j$ at the $j$-th position of $\lambda$ is determined by whether the $j$-th value of $d_i$, denoted $d_{ij}$, is less than the threshold $t$: when $d_{ij}$ is less than $t$, $\lambda_j$ takes the value $\lambda_1$; otherwise it takes the value $\lambda_2$.

[0029] In one possible implementation, λ is determined as follows: two distinct λ3 and λ4 are set, where λ3 > λ4, and d is normalized to [0,1]. The normalized tensor is then calculated as follows:

[0030]

[0031] Therefore, λ is determined as follows:

[0032]

[0033] That is, the closer the distance, the closer the value of λ is to λ3, and vice versa. Since a larger λ makes the loss function L weight the distortion term more heavily, the trained network tends to allocate more of the bitstream to regions of smaller depth, thereby realizing an image compression network that adapts to the target distance.

[0034] In one possible implementation, S104 specifically includes:

[0035] In the image compression stage, the trained image compression network takes an image $X$ to be compressed as input. The image $X$ is encoded into the feature $Y$ by the image encoder, and $Y$ is encoded into $Z$ by the super-prior encoder. $Z$ is input into the quantization module, which outputs an all-integer feature matrix $\hat{Z}$. $\hat{Z}$ is input into the entropy encoder to obtain the first segment of the bitstream for storage and transmission, $Z_s$;

[0036] At the same time, $\hat{Z}$ is also input into the super-prior decoder, which outputs the mean (means) and the standard deviation (scale), each of the same shape as $Y$, and $Y_{int} = \mathrm{Round}(Y - \mathrm{means})$ is calculated. $Y_{int}$ and the cumulative distribution corresponding to the standard deviation are input into the entropy encoder to obtain the second segment of the bitstream for storage and transmission, $Y_s$.

[0037] In another aspect, the present invention provides an image decompression method for decompressing the compressed bitstream obtained as described above, including S201, image decompression, wherein the image decompression process takes the series of bitstreams output by the image compression process as input, restores the corresponding image features using an entropy decoder and a super-prior decoder, and then converts the image features into a decompressed image using an image decoder.

[0038] In one possible implementation, the method further includes S202, training of the detail generation network, and S203, detail generation. In S202, the training process of the detail generation network takes the output of the image compression network as input and, after processing by a neural network, outputs a corresponding image patch. The similarity and the adversarial loss between the image patch $x_i$ input to the image compression network and the output of the detail generation network are calculated, with the corresponding depth block $d_i$ used as the weight value, to obtain the corresponding reconstruction-adversarial loss. The training process optimizes the reconstruction-adversarial loss to obtain the trained network weights.

[0039] S203, in the detail generation process, uses the decompressed image as input and utilizes the detail generation network to generate the final image.

[0040] In one possible implementation, step S201 specifically includes: first, the bitstream $Z_s$ is input into the entropy decoder to reconstruct the feature $\hat{Z}$. $\hat{Z}$ is then input into the super-prior decoder, which outputs the mean (means) and the standard deviation (scale), each of the same shape as $Y$. Using the cumulative distribution corresponding to the standard deviation, the bitstream $Y_s$ is decoded back into $Y_{int}$, and the feature $\hat{Y}$ is obtained through $\hat{Y} = Y_{int} + \mathrm{means}$. $\hat{Y}$ is input into the image decoder to obtain the decompressed image.

[0041] In one possible implementation, S202 specifically includes: during the training phase of the detail generation network, the output of the trained image compression network, together with the corresponding input $x_i$ and depth $d_i$, is taken as input. The output of the image compression network is fed into the detail generation network, which outputs a generated image. The input $x_i$ and the generated image are then each fed into the detail discrimination network, which outputs two corresponding values between 0 and 1, $D_{real}$ and $D_{fake}$;

[0042] The adversarial loss function $L_{disc}$ of the detail discrimination network is calculated as follows:

[0043] $L_{disc} = -\log(1 - D_{fake}) - \log(D_{real})$

[0044] The loss of the detail generation network consists of two parts. The first part is the reconstruction loss $L_{rec}$, which uses MSE to measure the similarity between the generated image and $x_i$, analogous to the distortion loss:

[0045]

[0046] The first part ensures that the generated image keeps the same content as the real image. The second part is the adversarial loss $L_{gen}$ of the detail generation network, which corresponds to the adversarial loss $L_{disc}$ of the detail discrimination network and forms an adversarial relationship with it. It is calculated as follows:

[0047] $L_{gen} = -\log(D_{fake})$

[0048] The overall loss function $L_{detail}$ of the detail generation network is calculated as follows:

[0049] $L_{detail} = L_{rec} + \beta L_{gen}$

[0050] The value of β is a positive integer.

[0051] The present invention has the following beneficial effects:

[0052] (1) The bitstream obtained after the image encoding stage is shorter, occupies less space, and takes less time to transmit compared with existing methods.

[0053] (2) In the decompressed image obtained after the image decoding stage, the details of the target object are well preserved, while the background details are less.

[0054] (3) In the generated image obtained by the detail generation network, the details of the target object are well preserved, the background details are rich, and the subjective visual quality is high.

Attached Figure Description

[0055] Figure 1 is a flowchart of the steps of the target distance adaptive image compression method according to an embodiment of the present invention.

[0056] Figure 2 is a flowchart of the forward propagation of the image compression network during the training phase in the target distance adaptive image compression method of this invention.

[0057] Figure 3 is a flowchart of the image compression process during the inference stage in the target distance adaptive image compression method according to an embodiment of the present invention.

[0058] Figure 4 is a flowchart of the steps of the image decompression method according to an embodiment of the present invention.

Detailed Implementation

[0059] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0060] Referring to Figure 1, a flowchart of a target distance adaptive image compression method according to an embodiment of the present invention is shown. The method includes a training process and an inference process. The training process further includes depth information extraction, data preprocessing, and model training, specifically:

[0061] S101, Depth information extraction takes an image containing near targets as input, uses a depth estimation network to extract the depth values corresponding to each pixel from the image, and outputs a single-channel depth map with the same resolution as the input image containing depth values.

[0062] S102, data preprocessing randomly divides the image used for training into smaller image blocks, and also divides the depth map obtained in the depth information extraction process into small blocks corresponding to the image;

[0063] S103, Model Training: The image compression network is trained using pre-cut image patches as input. After processing by the neural network, the corresponding reconstructed image patches are output. The similarity between the input image patch and the reconstructed image patch is calculated, and the corresponding depth patch is used as the weight value to obtain the corresponding rate-distortion loss. The training process optimizes the network weights by targeting this loss.

[0064] S104, the inference process performs image compression, using an image to be compressed as the input of the image encoder and the super-prior encoder-decoder, outputting the corresponding feature tensor and the probability estimation result for entropy coding, and then inputting the feature tensor and the probability estimation result into the corresponding entropy encoder for encoding, to obtain the bitstream for storage and transmission.

[0065] In a target distance adaptive image compression method provided by another embodiment of the present invention, S101 specifically includes: a training set image $x$ with dimensions $[h_x, w_x, 3]$, corresponding to the height, width, and number of channels of $x$ respectively, is input into the depth estimation network, and the output is the depth tensor $d$ corresponding to that image. $d$ is a three-dimensional tensor with dimensions $[h_x, w_x, 1]$ whose values are floating-point numbers, each representing the depth of one pixel of the training image $x$ as estimated by the depth estimation network;

[0066] After performing the above depth information extraction on each training set image, the depth information extraction stage ends.

[0067] In practical applications, the depth estimation network can use existing structures, as long as it can take an image as input and output its depth map. The resulting depth map is then used in the training of the image compression network. The depth estimation network is only used during the training process.

[0068] Another embodiment of the present invention provides an image compression method with adaptive target distance. S102 specifically includes: dividing the images x and d in the training set into several blocks of the same size. The block size is generally chosen to be 256×256, and can be 512×512. The height and width of x are not less than the height and width of the block. The center point of the block is specified by random sampling. During sampling, it is ensured that the boundary of the block is always within the range of image x. The number N of blocks selected for each image is consistent with the preset number of training epochs max_epoch.

[0069] For a block $x_i$ of image $x$, the coordinates of its center point on $x$ are $(a_i, b_i)$. Likewise, $x_i$ corresponds to a block $d_i$ on the depth map, whose center point on $d$ also has coordinates $(a_i, b_i)$. That is, for a training set image $x$, data preprocessing yields $N$ image-depth data pairs $\langle x_i, d_i \rangle$;

[0070] After performing the above preprocessing on each training set image, the data preprocessing stage is complete.
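The paired cropping described for S102 can be illustrated with a minimal sketch, assuming the image and its depth map are NumPy arrays of shape [h, w, 3] and [h, w, 1]. The function name `sample_pairs`, the `seed` parameter, and the convention that the first center coordinate indexes rows and the second indexes columns are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def sample_pairs(x, d, block=256, n=4, seed=0):
    """Cut n aligned (image block, depth block) pairs from x [h, w, 3] and d [h, w, 1]."""
    h, w = x.shape[:2]
    assert d.shape[:2] == (h, w), "depth map must match the image resolution"
    assert h >= block and w >= block, "image must be at least one block in size"
    rng = np.random.default_rng(seed)
    half = block // 2
    pairs = []
    for _ in range(n):
        # Sample the centre (a, b) so that the block never crosses the image boundary.
        a = int(rng.integers(half, h - (block - half) + 1))
        b = int(rng.integers(half, w - (block - half) + 1))
        top, left = a - half, b - half
        # The same window is applied to image and depth map, so the pair stays aligned.
        pairs.append((x[top:top + block, left:left + block],
                      d[top:top + block, left:left + block]))
    return pairs
```

Because both crops use the same window, every depth value in a depth block refers to the pixel at the same position in the matching image block, which is what the later depth-based weighting relies on.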

[0071] In another embodiment of the present invention, a target distance adaptive image compression method, in step S103, during the image compression network training phase, the image-depth data obtained in the data preprocessing phase is input into an image compression network pre-trained using a general dataset. The image compression network includes an image encoder, a quantization module, a super-prior encoder, a super-prior decoder, an image decoder, and an entropy encoder / decoder. The image encoder includes multiple serially connected convolutional blocks and one convolutional layer, wherein each convolutional block sequentially includes a convolutional layer and an activation function. The super-prior encoder includes multiple serially connected convolutional blocks and one convolutional layer. The super-prior decoder includes multiple serially connected deconvolutional blocks and one deconvolutional layer, wherein each convolutional block sequentially includes a convolutional layer and an activation function, and each deconvolutional block sequentially includes a deconvolutional layer and an activation function. The image decoder includes multiple serially connected deconvolutional blocks and one deconvolutional layer. The quantization module includes a rounding operation. The entropy encoder and entropy decoder are modules that do not participate in training; they are only used during the inference phase to encode the features of the input image into the corresponding bitstream and to decode the corresponding features from the bitstream.

[0072] Based on the above image compression network, as shown in Figure 2, during the forward propagation phase, the image patch $x_i$ is input into the image encoder to obtain its corresponding feature $y_i$, and $y_i$ is then input into the super-prior encoder to obtain the super-prior feature $z_i$. The super-prior feature $z_i$ is first input into the quantization module, which rounds $z_i$ to the nearest integers to give $\hat{z}_i$; $\hat{z}_i$ is then input into the super-prior decoder to obtain two features of the same size as $y_i$, namely $\mathrm{means}_i$ and $\mathrm{scale}_i$. $y_i$ and $\mathrm{means}_i$ are input into the quantization module to obtain the quantized feature $\hat{y}_i$:

[0073] $\hat{y}_i = \mathrm{Round}(y_i - \mathrm{means}_i) + \mathrm{means}_i$

[0074] where Round denotes rounding. Subsequently, $\hat{y}_i$ is input into the image decoder to obtain the reconstructed image patch $\hat{x}_i$.
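As a small numerical illustration of the quantization step, the sketch below rounds the residual between a feature and its predicted mean and then adds the mean back, which matches the Round(Y - means) operation used at inference. The function name `quantize` and the example values are hypothetical. Note that `np.round` rounds exact halves to the nearest even integer, which may differ from the rounding rule of a particular implementation.

```python
import numpy as np

def quantize(y, means):
    """Round the residual y - means, then add the means back."""
    y_int = np.round(y - means)      # integer symbols, the part an entropy coder would store
    y_hat = y_int + means            # quantized feature passed on to the image decoder
    return y_int, y_hat

y = np.array([0.2, 1.7, -2.4])
means = np.array([0.5, 1.0, -2.0])
y_int, y_hat = quantize(y, means)
# The quantization error is at most 0.5 per element, whatever the means are.
max_error = np.max(np.abs(y_hat - y))
```

Centering on the predicted mean keeps the integer symbols close to zero when the prediction is good, which is what makes them cheap to entropy-code.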

[0075] The rate-distortion loss (i.e., the loss function $L$) consists of the distortion loss $L_d$ and the rate loss $L_r$. The distortion loss $L_d$ is calculated using the pixel-wise mean squared error (MSE) or the multi-scale structural similarity (MS-SSIM) loss. Specifically:

[0076]

[0077] or

[0078] The rate loss $L_r$ constrains the code length; it uses $\mathrm{scale}_i$ and $\mathrm{means}_i$ to estimate the bit rate of the quantized features. The final loss function $L$ is calculated as follows:

[0079] $L = \lambda L_d + L_r$

[0080] where $\lambda$ is a parameter that adjusts the relative weight of the two losses; it is determined by the depth block $d_i$ and has the same shape as $x_i$ and $d_i$.
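Because λ has the same spatial shape as the image block, one plausible reading is that it weights the squared error pixel by pixel before averaging. The sketch below follows that reading with an MSE distortion; the function name `rd_loss`, the per-pixel weighting, and the normalization of the rate to bits per pixel are assumptions, since the exact distortion formulas are not reproduced in this text.

```python
import numpy as np

def rd_loss(x, x_hat, lam, bits):
    """Depth-weighted rate-distortion loss: mean(lam * squared error) + bits per pixel."""
    # lam has shape [h, w, 1] and broadcasts over the three colour channels (assumed).
    weighted_distortion = np.mean(lam * (x - x_hat) ** 2)
    rate = bits / (x.shape[0] * x.shape[1])   # assumed normalisation of L_r
    return weighted_distortion + rate

# Toy 2x2 block: the top row is "near" (weight 1), the bottom row is "far" (weight 0).
x = np.ones((2, 2, 3))
x_hat = np.zeros((2, 2, 3))
lam = np.array([[[1.0], [1.0]], [[0.0], [0.0]]])
loss = rd_loss(x, x_hat, lam, bits=8.0)
```

With this weighting, an identical reconstruction error contributes more to the loss where λ is large, so the optimizer has less incentive to spend bits on low-λ regions.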

[0081] In a specific application example, $\lambda$ is determined as follows: a threshold $t$ is selected based on the value range of the depth tensor $d$, and two different values $\lambda_1$ and $\lambda_2$ are set, where $\lambda_1 > \lambda_2$. The $j$-th value of $\lambda$, denoted $\lambda_j$, is then determined as follows:

[0082] $\lambda_j = \begin{cases} \lambda_1, & d_{ij} < t \\ \lambda_2, & \text{otherwise} \end{cases}$

[0083] That is, the value $\lambda_j$ at the $j$-th position of $\lambda$ is determined by whether the $j$-th value of $d_i$, denoted $d_{ij}$, is less than the threshold $t$: when $d_{ij}$ is less than $t$, $\lambda_j$ takes the value $\lambda_1$; otherwise it takes the value $\lambda_2$.
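The threshold rule translates directly into an element-wise selection. In the sketch below the function name `lambda_threshold` and the numeric values chosen for the threshold and the two weights are hypothetical.

```python
import numpy as np

def lambda_threshold(d, t, lam1, lam2):
    """Return lam1 where depth is below t (near) and lam2 elsewhere (far)."""
    assert lam1 > lam2, "the near weight must exceed the far weight"
    return np.where(d < t, lam1, lam2)

d_i = np.array([[0.8, 2.5],
                [6.0, 1.2]])
lam = lambda_threshold(d_i, t=2.0, lam1=0.05, lam2=0.005)
```

The result has the same shape as the depth block, with the larger weight at the two positions whose depth is below the threshold.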

[0084] In another specific application example, λ is determined as follows: two distinct λ3 and λ4 are set, where λ3 > λ4, and d is normalized to [0,1]. The normalized tensor is then calculated as follows:

[0085]

[0086] Therefore, λ is determined as follows:

[0087]

[0088] That is, the closer the distance, the closer the value of λ is to λ3, and vice versa.

[0089] Since a larger λ makes the loss function L weight the distortion term more heavily, the trained network tends to allocate more of the bitstream to regions of smaller depth, thus achieving an image compression network that adapts to the target distance.
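The normalization and interpolation formulas for the second example are not reproduced in this text. One reading that is consistent with the description (depth normalized to [0,1], λ approaching λ3 for near pixels and λ4 for far pixels) is min-max normalization followed by a linear blend. The sketch below implements that reading only as an assumption; the function name `lambda_linear`, the `eps` guard, and the numeric values are hypothetical.

```python
import numpy as np

def lambda_linear(d, lam3, lam4, eps=1e-8):
    """Continuous depth-to-weight map: near pixels get about lam3, far pixels about lam4."""
    assert lam3 > lam4
    # Assumed min-max normalisation of the depth values to [0, 1].
    d_norm = (d - d.min()) / (d.max() - d.min() + eps)
    # Assumed linear blend between the two weights.
    return lam3 - (lam3 - lam4) * d_norm

d = np.array([0.0, 5.0, 10.0])
lam = lambda_linear(d, lam3=0.05, lam4=0.005)
```

Compared with the threshold rule, a continuous map avoids a hard boundary between near and far regions, at the cost of depending on the depth range of each tensor.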

[0090] Another embodiment of the present invention provides a target distance adaptive image compression method, S104 of which specifically includes:

[0091] In the image compression stage, the trained image compression network takes an image $X$ to be compressed as input. The image $X$ is encoded into the feature $Y$ by the image encoder, and $Y$ is encoded into $Z$ by the super-prior encoder. $Z$ is input into the quantization module, which outputs an all-integer feature matrix $\hat{Z}$. $\hat{Z}$ is input into the entropy encoder to obtain the first segment of the bitstream for storage and transmission, $Z_s$;

[0092] At the same time, $\hat{Z}$ is also input into the super-prior decoder, which outputs the mean (means) and the standard deviation (scale), each of the same shape as $Y$, and $Y_{int} = \mathrm{Round}(Y - \mathrm{means})$ is calculated. $Y_{int}$ and the cumulative distribution corresponding to the standard deviation are input into the entropy encoder to obtain the second segment of the bitstream for storage and transmission, $Y_s$.
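To show how a standard deviation can be turned into a code length for integer symbols, the sketch below integrates a zero-mean Gaussian over a unit-width bin around each symbol and sums the resulting negative log-probabilities. The text only refers to "the cumulative distribution corresponding to the standard deviation", so the Gaussian family, the function names, and the probability floor of 1e-9 are assumptions; an actual entropy coder would produce a length close to, but not exactly, this estimate.

```python
import math
import numpy as np

def gaussian_cdf(v, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(v / (scale * math.sqrt(2.0))))

def estimated_bits(y_int, scale):
    """Sum of -log2 P(symbol), with P taken over the bin [k - 0.5, k + 0.5]."""
    total = 0.0
    for k, s in zip(np.ravel(y_int), np.ravel(scale)):
        p = gaussian_cdf(k + 0.5, s) - gaussian_cdf(k - 0.5, s)
        total += -math.log2(max(p, 1e-9))   # floor avoids log(0) for very unlikely symbols
    return total

bits_zero = estimated_bits(np.array([0.0]), np.array([1.0]))   # likely symbol
bits_far = estimated_bits(np.array([3.0]), np.array([1.0]))    # unlikely symbol
```

Symbols near zero with a small predicted scale cost few bits, while symbols far from zero relative to the scale cost many, which is how a rate term penalizes detail that the prior does not predict.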

[0093] By employing the target distance-adaptive image compression method described above, the compression rate of the background environment is improved while the image quality of the close-range target is maintained. This results in a higher compression rate across the entire image. The bitstream obtained after the image encoding stage is shorter, occupies less space, and has a shorter transmission time compared to existing methods. The method can compress the original image at a higher compression rate while preserving the subjective visual quality of the image, the detailed information of the main target, and the usability of the image for subsequent applications (such as target detection and anomaly detection).

[0094] Corresponding to the target distance adaptive image compression method set above, another embodiment of the present invention provides an image decompression method for decompressing the compressed bitstream obtained by any of the target distance adaptive image compression methods above, including: S201, image decompression, in the image decompression process, taking a series of bitstreams output by the image compression process as input, using an entropy decoder and a super-prior decoder to restore the corresponding image features, and then using an image decoder to convert the image features into a decompressed image.

[0095] An image decompression method provided by another embodiment of the present invention, as shown in Figure 4, further includes S202, training of the detail generation network, and S203, detail generation, wherein:

[0096] In S202, the training process of the detail generation network takes the output of the image compression network as input and, after processing by a neural network, outputs a corresponding image patch. The similarity and the adversarial loss between the image patch $x_i$ input to the image compression network and the output of the detail generation network are calculated, with the corresponding depth block $d_i$ used as the weight value, to obtain the corresponding reconstruction-adversarial loss. The training process optimizes the reconstruction-adversarial loss to obtain the trained network weights. The detail generation network takes a decompressed image as input and outputs an image that contains information similar to the original image but with richer background details. It includes multiple convolutional blocks and a convolutional layer.

[0097] S203, in the detail generation process, uses the decompressed image as input and utilizes the detail generation network to generate the final image.

[0098] In an image decompression method provided by another embodiment of the present invention, S201 specifically includes: first, the bitstream $Z_s$ is input into the entropy decoder to reconstruct the feature $\hat{Z}$. $\hat{Z}$ is then input into the super-prior decoder, which outputs the mean (means) and the standard deviation (scale), each of the same shape as $Y$. Using the cumulative distribution corresponding to the standard deviation, the bitstream $Y_s$ is decoded back into $Y_{int}$, and the feature $\hat{Y}$ is obtained through $\hat{Y} = Y_{int} + \mathrm{means}$. $\hat{Y}$ is input into the image decoder to obtain the decompressed image.
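The decoding order matters: the first bitstream must be decoded before the second, because the means and scale needed for the second are recomputed from the decoded super-prior feature. The sketch below illustrates this dependency with the entropy coding itself left out. `toy_hyper_decoder` is a hypothetical stand-in for the trained super-prior decoder, and the other names and values are likewise illustrative.

```python
import numpy as np

def toy_hyper_decoder(z_hat):
    """Stand-in for the super-prior decoder: a deterministic map to (means, scale)."""
    means = 0.5 * z_hat
    scale = 1.0 + np.abs(z_hat)
    return means, scale

def encode(y, z):
    z_hat = np.round(z)                     # would be entropy-coded into Z_s
    means, _ = toy_hyper_decoder(z_hat)
    y_int = np.round(y - means)             # would be entropy-coded into Y_s
    return z_hat, y_int

def decode(z_hat, y_int):
    means, _ = toy_hyper_decoder(z_hat)     # same means as the encoder, rebuilt from z_hat
    return y_int + means                    # reconstructed feature for the image decoder

y = np.array([0.9, -1.2])
z = np.array([1.4, -2.6])
y_hat = decode(*encode(y, z))
```

Since encoder and decoder both derive the means from the same quantized super-prior feature, the decoder recovers exactly the quantized feature the encoder worked with, without the means ever being transmitted.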

[0099] In an image decompression method provided by another embodiment of the present invention, S202 specifically includes: during the training phase of the detail generation network, the output of the trained image compression network, together with the corresponding input $x_i$ and depth $d_i$, is taken as input. The output of the image compression network is fed into the detail generation network, which outputs a generated image. The input $x_i$ and the generated image are then each fed into the detail discrimination network, which outputs two corresponding values between 0 and 1, $D_{real}$ and $D_{fake}$. In other words, the detail discrimination network takes either a real image or the output of the detail generation network as input and outputs a value between 0 and 1 indicating whether it considers the input to be a real image (tending toward 1) or a generated image (tending toward 0).

[0100] The adversarial loss function $L_{disc}$ of the detail discrimination network is calculated as follows:

[0101] $L_{disc} = -\log(1 - D_{fake}) - \log(D_{real})$

[0102] The loss of the detail generation network consists of two parts. The first part is the reconstruction loss $L_{rec}$, which uses MSE to measure the similarity between the generated image and $x_i$, analogous to the distortion loss:

[0103]

[0104] The first part ensures that the generated image keeps the same content as the real image. The second part is the adversarial loss $L_{gen}$ of the detail generation network, which corresponds to the adversarial loss $L_{disc}$ of the detail discrimination network and forms an adversarial relationship with it. It is calculated as follows:

[0105] $L_{gen} = -\log(D_{fake})$

[0106] The overall loss function $L_{detail}$ of the detail generation network is calculated as follows:

[0107] $L_{detail} = L_{rec} + \beta L_{gen}$

[0108] The value of β is a positive integer. The larger the value of β, the more detail the generated image has. The smaller the value of β, the less detail the generated image has, and the more similar the generated image is to the input image.
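The three loss formulas can be written out as scalar functions for a single pair of discriminator outputs. The function names and the example values below are hypothetical, and the reconstruction loss is passed in as a precomputed number because its exact formula is not reproduced in this text.

```python
import math

def disc_loss(d_real, d_fake):
    """Discriminator loss: low when real images score near 1 and generated ones near 0."""
    return -math.log(1.0 - d_fake) - math.log(d_real)

def gen_loss(d_fake):
    """Generator adversarial loss: low when the discriminator scores generated images near 1."""
    return -math.log(d_fake)

def detail_loss(l_rec, d_fake, beta):
    """Overall generator loss: reconstruction term plus beta times the adversarial term."""
    return l_rec + beta * gen_loss(d_fake)

undecided = disc_loss(d_real=0.5, d_fake=0.5)      # discriminator cannot tell the two apart
total = detail_loss(l_rec=0.1, d_fake=0.5, beta=2)
```

The two adversarial terms pull in opposite directions on the same score: the discriminator lowers its loss by pushing it toward 0 for generated images, while the generator lowers its loss by pushing it toward 1, and β sets how strongly that pressure competes with staying close to the original image.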

[0109] Using the image decompression method described above, the decompressed image obtained after the image decoding stage retains good details of the target object but has less background detail. Furthermore, the generated image obtained after passing through the detail generation network retains good details of the target object and has rich background detail, resulting in a high subjective quality for the human eye.

[0110] It should be understood that the exemplary embodiments described herein are illustrative and not restrictive. Although one or more embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A target distance adaptive image compression method, characterized in that it includes a training process and an inference process, wherein the training process further includes depth information extraction, data preprocessing, and model training:

S101, the depth information extraction takes an image containing near targets as input, uses a depth estimation network to extract the depth value corresponding to each pixel from the image, and outputs a single-channel depth map containing the depth values, with the same resolution as the input image;

S102, the data preprocessing randomly divides the image used for training into smaller image blocks, and also divides the depth map obtained in the depth information extraction process into small blocks corresponding to the image;

S103, the model training involves training an image compression network, using pre-cut image blocks as input, and after processing by a neural network, outputting corresponding reconstructed image blocks; calculating the similarity between the input image block and the reconstructed image block, and using the corresponding depth map block as the weight value to obtain the corresponding rate-distortion loss; the training process optimizes this loss to obtain the trained network weights;

S104, the inference process performs image compression, using an image to be compressed as the input of the image encoder and the super-prior encoder-decoder, outputting the corresponding feature tensor and the probability estimation result for entropy coding, and then inputting the feature tensor and the probability estimation result into the corresponding entropy encoder for encoding, to obtain the bitstream for storage and transmission.

2. The target distance adaptive image compression method as described in claim 1, characterized in that S101 specifically includes: a training set image x with dimensions $[h_x, w_x, 3]$, corresponding to the height, width, and number of channels of x respectively, is input into the depth estimation network, and the output is the depth tensor d corresponding to the training set image; d is a three-dimensional tensor with dimensions $[h_x, w_x, 1]$ whose values are floating-point numbers, each representing the depth of the corresponding pixel in the training image x as estimated by the depth estimation network; after the above depth information extraction has been performed on every training set image, the depth information extraction stage ends.

3. The target distance adaptive image compression method as described in claim 2, characterized in that S102 specifically includes: dividing the training set image x and its depth tensor d into several blocks of the same size, with the center point of each block specified by random sampling, ensuring during sampling that the boundary of the block always lies within the range of image x, and with the number N of blocks selected for each image being consistent with the preset number of training epochs max_epoch. For a block $x_i$ of image x, the coordinates of its center point on x are $(a_i, b_i)$; the corresponding block $d_i$ on the depth map d has its center point on d at the same coordinates $(a_i, b_i)$. That is, for a training set image x, data preprocessing yields N image-depth data pairs $\langle x_i, d_i \rangle$; after the above preprocessing has been performed on every training set image, the data preprocessing stage is complete.
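The paired sampling in claim 3 can be illustrated with a minimal NumPy sketch. The function name `sample_patch_pairs`, the square patch shape, and the uniform sampling of centers are assumptions made for illustration; the claim only requires random centers, equal block sizes, and blocks that stay inside the image.

```python
import numpy as np

def sample_patch_pairs(x, d, patch, n, rng):
    """Return n (x_i, d_i) pairs cropped around shared random centers.

    x: image of shape [h_x, w_x, 3]; d: depth map of shape [h_x, w_x, 1].
    Square patches and uniform center sampling are illustrative assumptions.
    """
    h, w = x.shape[:2]
    assert d.shape[:2] == (h, w), "depth map must match the image resolution"
    assert patch <= min(h, w), "patch must fit inside the image"
    half = patch // 2
    pairs = []
    for _ in range(n):
        # Draw the center (a_i, b_i) so the whole patch stays inside x.
        a = int(rng.integers(half, h - (patch - half) + 1))
        b = int(rng.integers(half, w - (patch - half) + 1))
        top, left = a - half, b - half
        # The same window is cut from the image and from the depth map.
        x_i = x[top:top + patch, left:left + patch]
        d_i = d[top:top + patch, left:left + patch]
        pairs.append((x_i, d_i))
    return pairs

rng = np.random.default_rng(0)
x = rng.random((64, 96, 3))   # stand-in for a training image
d = rng.random((64, 96, 1))   # stand-in for its estimated depth map
pairs = sample_patch_pairs(x, d, patch=32, n=4, rng=rng)
```

Because both crops use one window, every depth value in $d_i$ stays aligned with the pixel it describes in $x_i$, which is what later allows $d_i$ to act as a per-pixel weight.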

4. The target distance adaptive image compression method as described in claim 3, characterized in that, in step S103, during the image compression network training phase, the image-depth data pairs obtained in the data preprocessing phase are input into an image compression network pre-trained on a common dataset. During forward propagation, the image block $x_i$ is input into the image encoder to obtain its corresponding feature $y_i$, and $y_i$ is then input into the hyperprior encoder to obtain the hyperprior feature $z_i$; the hyperprior feature $z_i$ is first input into the quantization module, which rounds $z_i$ to the nearest integer to give $\hat{z}_i = \mathrm{Round}(z_i)$, and $\hat{z}_i$ is then input into the hyperprior decoder to obtain two features of the same size as $y_i$, $\mathit{means}_i$ and $\mathit{scale}_i$; $y_i$ and $\mathit{means}_i$ are input into the quantization module to obtain the quantized feature $\hat{y}_i = \mathrm{Round}(y_i - \mathit{means}_i) + \mathit{means}_i$, where Round denotes rounding; subsequently, $\hat{y}_i$ is input into the image decoder to obtain the reconstructed image block $\hat{x}_i$. The rate-distortion loss, i.e. the loss function $L$, includes the distortion loss $L_d$ and the rate loss $L_r$. The distortion loss $L_d$ is calculated between $x_i$ and $\hat{x}_i$ using either the pixel-wise mean squared error (MSE) or the multi-scale structural similarity (MS-SSIM). The rate loss $L_r$ constrains the encoding length; it uses $\mathit{scale}_i$ and $\mathit{means}_i$ to estimate the bit rate of $\hat{y}_i$ and $\hat{z}_i$. The final loss function is calculated as $L = \lambda L_d + L_r$, where $\lambda$ is a parameter that adjusts the weight between the two losses; it is determined by the depth block $d_i$ and has the same shape as $x_i$ and $d_i$.
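As a rough illustration of how a per-pixel $\lambda$ could enter the loss of claim 4, the sketch below computes a depth-weighted MSE plus an estimated rate for already-computed features. Several points are assumptions and not taken from the claim: the rate is estimated with a unit-bin discretized Gaussian parameterized by means and scale, the rate of $\hat{z}_i$ is omitted, the losses are reduced by averaging over pixels, and all function names are hypothetical.

```python
import math
import numpy as np

def gaussian_cdf(v):
    """Standard normal CDF, applied element-wise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def quantize(y, means):
    """Quantized feature: Round(y - means) + means."""
    return np.round(y - means) + means

def rate_bits(y_hat, means, scale):
    """Estimated bits for y_hat under a unit-bin discretized Gaussian
    (an assumed entropy model, not specified by the claim)."""
    upper = gaussian_cdf((y_hat - means + 0.5) / scale)
    lower = gaussian_cdf((y_hat - means - 0.5) / scale)
    p = np.clip(upper - lower, 1e-9, 1.0)
    return float(-np.log2(p).sum())

def rd_loss(x, x_hat, lam, y_hat, means, scale):
    """L = lam * L_d + L_r with lam applied per pixel.

    Assumed reductions: weighted squared error averaged over all elements,
    rate expressed in bits per pixel. The rate of z_hat is left out.
    """
    l_d = float(np.mean(lam * (x - x_hat) ** 2))
    l_r = rate_bits(y_hat, means, scale) / (x.shape[0] * x.shape[1])
    return l_d + l_r

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))               # stand-in image block x_i
x_hat = x + 0.01                        # stand-in reconstruction
lam = np.full((8, 8, 1), 100.0)         # per-pixel weights derived from d_i
y = rng.normal(scale=3.0, size=(2, 2, 4))
means = np.full(y.shape, 0.3)
scale = np.ones_like(y)
y_hat = quantize(y, means)
loss = rd_loss(x, x_hat, lam, y_hat, means, scale)
```

Raising `lam` in a region makes reconstruction errors there more expensive relative to the rate term, which is the mechanism the claims rely on to spend more bits on nearby targets.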

5. The target distance adaptive image compression method as described in claim 4, characterized in that $\lambda$ is determined as follows: a threshold $t$ is selected based on the range of the depth tensor d, and two different values $\lambda_1$ and $\lambda_2$ are set, where $\lambda_1 > \lambda_2$. The j-th value $\lambda_j$ of $\lambda$ is then determined as $\lambda_j = \lambda_1$ if $d_{ij} < t$, and $\lambda_j = \lambda_2$ otherwise. That is, the value $\lambda_j$ at the j-th position of $\lambda$ is determined by whether the j-th value $d_{ij}$ of $d_i$ is less than the threshold $t$: when $d_{ij}$ is less than $t$, $\lambda_j$ takes the value $\lambda_1$; otherwise it takes the value $\lambda_2$.
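The thresholding rule of claim 5 maps directly onto an element-wise selection. In the following sketch the function name and the numeric values of $t$, $\lambda_1$, and $\lambda_2$ are hypothetical; the claim does not give concrete values.

```python
import numpy as np

def lambda_map_threshold(d_i, t, lam1, lam2):
    """Per-pixel lambda: lam1 where depth < t (near), lam2 elsewhere (far)."""
    assert lam1 > lam2, "near regions must receive the larger weight"
    return np.where(np.asarray(d_i) < t, lam1, lam2)

# Toy 2x2 depth block; all values are illustrative only.
d_i = np.array([[0.5, 2.0],
                [3.5, 1.0]])
lam = lambda_map_threshold(d_i, t=1.5, lam1=0.05, lam2=0.01)
```

The result has the same shape as `d_i`, so it can be broadcast against the image block when weighting the distortion term.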

6. The target distance adaptive image compression method as described in claim 4, characterized in that $\lambda$ is determined as follows: two distinct values $\lambda_3$ and $\lambda_4$ are defined, where $\lambda_3 > \lambda_4$, and d is normalized to [0,1]; $\lambda$ is then determined from the normalized tensor such that the closer the distance, the closer the value of $\lambda$ is to $\lambda_3$, and the farther the distance, the closer the value of $\lambda$ is to $\lambda_4$. Since a larger $\lambda$ makes the loss function $L$ place more emphasis on distortion, the trained network tends to allocate more of the bitstream to the regions of smaller depth, thereby realizing an image compression network that adapts to the target distance.
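One way to realize the continuous weighting of claim 6 is sketched below. The exact normalization and mapping formulas are not recoverable from the text, so this sketch assumes min-max normalization of d and a linear interpolation between $\lambda_3$ and $\lambda_4$; the function name and numeric values are hypothetical.

```python
import numpy as np

def lambda_map_linear(d, lam3, lam4, eps=1e-12):
    """Per-pixel lambda that moves from lam3 (nearest) to lam4 (farthest).

    Assumptions: min-max normalization of d to [0, 1] and linear
    interpolation between the two weights.
    """
    assert lam3 > lam4
    d = np.asarray(d, dtype=float)
    d_norm = (d - d.min()) / max(d.max() - d.min(), eps)
    return lam3 - (lam3 - lam4) * d_norm

# Toy depth map; values are illustrative only.
d = np.array([[0.0, 5.0],
              [10.0, 2.5]])
lam = lambda_map_linear(d, lam3=0.05, lam4=0.01)
```

Compared with the two-level rule of claim 5, this variant avoids a hard boundary in the weight map at the threshold depth.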

7. The target distance adaptive image compression method as described in claim 1, characterized in that S104 specifically includes: in the image compression stage, the trained image compression network takes an image X to be compressed as input. The image X is encoded into the feature Y by the image encoder, and the feature Y is encoded into Z by the hyperprior encoder. Z is input into the quantization module, which outputs a feature matrix $\hat{Z}$ consisting entirely of integers. $\hat{Z}$ is input into the entropy encoder to obtain the first segment of the bitstream, $Z_s$, for storage and transmission; at the same time, $\hat{Z}$ is also input into the hyperprior decoder, which outputs the means and the standard deviation (scale), both of the same shape as Y; $Y_{int} = \mathrm{Round}(Y - \mathit{means})$ is calculated, and $Y_{int}$ together with the cumulative distribution corresponding to the standard deviation is input into the entropy encoder to obtain the second segment of the bitstream, $Y_s$, for storage and transmission.
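The encoder-side quantities of claim 7 can be sketched as follows. The claim does not specify how the cumulative distribution is built from the standard deviation; this sketch assumes a zero-mean Gaussian over the residual, integrated over unit-width bins and truncated to a finite symbol range. `symbol_cdf`, `max_abs`, and the toy values are hypothetical, and no real entropy coder is invoked; only the ideal code length is reported.

```python
import math
import numpy as np

def symbol_cdf(scale, max_abs=8):
    """Cumulative distribution over the integer symbols -max_abs..max_abs
    for a zero-mean Gaussian with the given scale (assumed model)."""
    edges = np.arange(-max_abs, max_abs + 2) - 0.5   # bin edges k - 0.5
    cdf = np.array([0.5 * (1.0 + math.erf(e / (scale * math.sqrt(2.0))))
                    for e in edges])
    # Renormalize so the truncated range carries total probability 1.
    return (cdf - cdf[0]) / (cdf[-1] - cdf[0])

def ideal_bits(symbol, scale, max_abs=8):
    """Ideal code length, in bits, of one integer symbol under symbol_cdf."""
    cdf = symbol_cdf(scale, max_abs)
    k = symbol + max_abs
    return -math.log2(cdf[k + 1] - cdf[k])

# Toy feature values and hyperprior decoder outputs (illustrative only).
Y = np.array([2.7, -1.2, 0.4])
means = np.array([2.0, -1.0, 0.0])
scale = np.array([1.0, 0.5, 2.0])

Y_int = np.round(Y - means).astype(int)   # integer residual to be coded
total_bits = sum(ideal_bits(int(s), float(sc)) for s, sc in zip(Y_int, scale))
```

A small scale concentrates probability on the symbol 0, so well-predicted feature values cost few bits, while a large scale spreads the probability and raises the cost of every symbol.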

8. An image decompression method, characterized in that it decompresses the compressed bitstream obtained by the method described in any one of claims 1 to 7 and includes S201, image decompression, wherein, during the image decompression process, the bitstreams output by the image compression process are used as input, the corresponding image features are restored using an entropy decoder and a hyperprior decoder, and the image features are then converted into a decompressed image by an image decoder.

9. The image decompression method as described in claim 8, characterized in that it further includes S202, training of the detail generation network, and S203, detail generation; in S202, the training process of the detail generation network takes the output $\hat{x}_i$ of the image compression network as input and, after processing by a neural network, outputs a corresponding image block; the similarity and the adversarial loss between the image block $x_i$ input to the image compression network and the output of the detail generation network are calculated, with the corresponding depth block $d_i$ used as the weight value, to obtain the corresponding reconstruction-adversarial loss; the training process optimizes the reconstruction-adversarial loss to obtain the trained network weights; S203, the detail generation process takes the decompressed image as input and uses the detail generation network to generate the final image.

10. The image decompression method as described in claim 8 or 9, characterized in that S201 specifically includes: first, the bitstream $Z_s$ is input into the entropy decoder to reconstruct the feature $\hat{Z}$. Then $\hat{Z}$ is input into the hyperprior decoder, which outputs the means and the standard deviation (scale), both of the same shape as Y. Using the cumulative distribution corresponding to the standard deviation, the bitstream $Y_s$ is decoded back into $Y_{int}$, and the feature $\hat{Y}$ is obtained through $\hat{Y} = Y_{int} + \mathit{means}$. $\hat{Y}$ is input into the image decoder to obtain the decompressed image.
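The relationship between the encoder-side residual $Y_{int} = \mathrm{Round}(Y - \mathit{means})$ and the decoder-side feature can be checked numerically. The sketch below skips the entropy coder entirely, which is lossless, and assumes that the decoder adds the means back to $Y_{int}$; the function names and the random stand-in tensors are hypothetical.

```python
import numpy as np

def encode_residual(Y, means):
    """Encoder side: integer residual Y_int = Round(Y - means)."""
    return np.round(Y - means).astype(np.int64)

def decode_feature(Y_int, means):
    """Decoder side (assumed): add the means back to the integer residual."""
    return Y_int + means

rng = np.random.default_rng(0)
Y = rng.normal(scale=4.0, size=(4, 4, 8))   # stand-in image feature
means = rng.normal(size=Y.shape)            # stand-in hyperprior decoder output

Y_int = encode_residual(Y, means)
Y_hat = decode_feature(Y_int, means)
max_err = float(np.abs(Y_hat - Y).max())    # quantization error per element
```

Since the decoder recomputes the same means from $\hat{Z}$, only the integer residual has to be transmitted, and the per-element error of the restored feature is bounded by half a quantization step.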

11. The image decompression method as described in claim 10, characterized in that S202 specifically includes: during the training phase of the detail generation network, the output $\hat{x}_i$ of the trained image compression network, together with the corresponding input $x_i$ and depth $d_i$, is used as input; $\hat{x}_i$ is fed into the detail generation network, and the output is a generated image. Then the input $x_i$ and the generated image are fed into the detail discrimination network, whose outputs are two corresponding values between 0 and 1, $D_{real}$ and $D_{fake}$. The adversarial loss function $L_{disc}$ of the detail discrimination network is calculated as $L_{disc} = -\log(1 - D_{fake}) - \log(D_{real})$. The loss of the detail generation network consists of two parts: the first part is the reconstruction loss $L_{rec}$, calculated using MSE as the similarity between the generated image and $x_i$, analogous to the distortion loss; this part ensures that the generated image keeps the same content as the real image. The second part is the adversarial loss $L_{gen}$ of the detail generation network, which corresponds to the adversarial loss $L_{disc}$ of the detail discrimination network and forms an adversarial relationship with it; it is calculated as $L_{gen} = -\log(D_{fake})$. The overall loss function $L_{detail}$ of the detail generation network is calculated as $L_{detail} = L_{rec} + \beta L_{gen}$, where $\beta$ is a positive integer.
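The three loss formulas of claim 11 can be evaluated for given discriminator outputs as in the sketch below. The networks themselves are not modeled; `d_real` and `d_fake` are plain numbers standing in for the discriminator outputs, the small `eps` guarding the logarithm is an added numerical safeguard, and the depth weighting mentioned in claim 9 is left out because its exact form is not stated.

```python
import numpy as np

def disc_loss(d_real, d_fake, eps=1e-12):
    """L_disc = -log(1 - D_fake) - log(D_real)."""
    return float(-np.log(1.0 - d_fake + eps) - np.log(d_real + eps))

def gen_loss(d_fake, eps=1e-12):
    """L_gen = -log(D_fake)."""
    return float(-np.log(d_fake + eps))

def detail_loss(x_i, generated, d_fake, beta):
    """L_detail = L_rec + beta * L_gen, with L_rec as the pixel-wise MSE."""
    l_rec = float(np.mean((generated - x_i) ** 2))
    return l_rec + beta * gen_loss(d_fake)

rng = np.random.default_rng(0)
x_i = rng.random((8, 8, 3))          # stand-in original image block
generated = x_i + 0.1                # stand-in detail network output
l_disc = disc_loss(d_real=0.9, d_fake=0.1)
l_detail = detail_loss(x_i, generated, d_fake=0.1, beta=1)
```

The two adversarial terms pull in opposite directions: `disc_loss` falls as `d_fake` approaches 0, while `gen_loss` falls as `d_fake` approaches 1.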

Citation Information

Patent Citations

CN113313777A
CN115439565A
