Image compression method based on discrete Gaussian mixture hyper-prior and Mask and medium

A technology involving Gaussian mixtures and image compression, applied in image communication, digital video signal modification, electrical components, etc. It addresses problems such as the limited fitting capability of entropy models and the limited performance of compression models, and achieves the effects of reducing the space size, improving compression efficiency, and improving feature extraction ability.

Pending Publication Date: 2022-05-13
TONGJI UNIV

AI-Extracted Technical Summary

Problems solved by technology

However, the entropy model of current deep-learning-based image compression algorithms has a limited ability to accurately fit the compressed representation, which limits...

Abstract

The invention relates to an image compression method based on discrete Gaussian mixture hyper-prior and Mask and a medium, and the method comprises the following steps: carrying out the preprocessing of a to-be-compressed image, and obtaining a preprocessed image; extracting a feature map of the preprocessed image, generating a Mask value based on spatial feature information of the preprocessed image, and performing point product processing on the feature map and the Mask value to obtain hidden variable representation; adopting a plurality of Gaussian distributions to extract distribution conditions represented by hidden variables, and generating discrete Gaussian mixture hyper-priori values; quantizing the hidden variable representation, and performing entropy coding compression on the quantized hidden variable representation based on the hyper-priori value to obtain coding information of the compressed image; and decoding based on the coding information of the compressed image to obtain a reconstructed image. Compared with the prior art, the method has the advantages of good compression quality, high image compression efficiency and the like.

Application Domain

Digital video signal modification

Technology Topic

Computer vision; Image compression (+2)

Image

  • Image compression method based on discrete Gaussian mixture hyper-prior and Mask and medium

Examples

  • Experimental program(1)

Example Embodiment

[0040] The present invention is described in detail below in conjunction with the accompanying drawings and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, giving a detailed embodiment and a specific operating process, but the scope of protection of the present invention is not limited to the following embodiments.
[0041] The present embodiment provides an image compression method based on discrete Gaussian mixture hyper-prior and Mask, comprising the following steps: preprocessing the image to be compressed to obtain a preprocessed image; extracting a feature map of the preprocessed image while generating a Mask value based on the spatial feature information of the preprocessed image, and point-multiplying the feature map with the Mask value to obtain a latent (hidden variable) representation; using a plurality of Gaussian distributions to model the distribution of the latent representation and generate discrete Gaussian mixture hyper-prior values; quantizing the latent representation and entropy-coding the quantized representation based on the hyper-prior values to obtain the coding information of the compressed image; and decoding the coding information of the compressed image to obtain a reconstructed image. Generating the discrete Gaussian mixture hyper-prior values specifically comprises the following steps: capturing redundant information in the spatial and channel domains via a plurality of hyper-prior branches and quantizing that information; fusing the quantized information with context-assisted information; calculating the Gaussian distribution parameters of the fused information; and assigning a corresponding weight to each single Gaussian distribution and summing the weighted distributions to obtain the discrete Gaussian mixture hyper-prior values.
[0042] As shown in Figure 1, the specific implementation steps of the above method in the present embodiment are as follows:
[0043] 1. Pre-process the compressed image and obtain the pre-processed image.
[0044] In order for the computation to proceed smoothly, the width W and height H of the image must be integer multiples of 64; if the input image does not meet this condition, it needs to be cropped. In the present embodiment, the preprocessing specifically crops the image to 256×256.
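As a concrete illustration (not the patent's own code), the crop-to-a-multiple-of-64 preprocessing can be sketched with NumPy; the function name `preprocess` and the center-crop policy are assumptions:

```python
import numpy as np

def preprocess(img: np.ndarray, multiple: int = 64) -> np.ndarray:
    """Center-crop an H x W x C image so that H and W become
    integer multiples of `multiple` (64 here, as the method requires)."""
    h, w = img.shape[:2]
    new_h = (h // multiple) * multiple
    new_w = (w // multiple) * multiple
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return img[top:top + new_h, left:left + new_w]

# A 300x500 input is cropped to 256x448, both multiples of 64.
out = preprocess(np.zeros((300, 500, 3), dtype=np.uint8))
print(out.shape)  # prints (256, 448, 3)
```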
[0045] 2. Compress the preprocessed image with the constructed image compression model.
[0046] In the present embodiment, as shown in Figure 1, the image compression model includes an encoder (Main Encoder), a Mask module, a Context module, a hyper-prior module, an entropy parameter calculation module (Entropy Parameter), and a decoder (Main Decoder). The encoder extracts the feature map of the preprocessed image, and the Mask module calculates the Mask value of the preprocessed image; the feature map and the Mask value are point-multiplied to obtain the latent representation. The Context module generates context-assisted information. The hyper-prior module includes a hyper-prior encoder (HyperPrior Encoder) and a hyper-prior decoder (HyperPrior Decoder) for capturing redundant information in the spatial and channel domains. The entropy parameter calculation module computes the parameters of the Gaussian distributions, and the distribution of the latent representation is finally expressed by a mixture of multiple Gaussian distributions. The decoder has a network structure that mirrors the encoder; from the quantized and compressed latent representation and its distribution, the decoder can effectively restore the compressed feature values step by step and realize the reconstruction of the image.
[0047] The features extracted by the encoder and by the hyper-prior module's encoder need to be quantized and compressed into a bytecode and transmitted to the decoder end for decompression. This is implemented through the modules Q, AE and AD: Q denotes the quantizer, whose role is to quantize the extracted features; AE denotes arithmetic encoding, whose role is to compress the quantized features into a bytecode; AD denotes arithmetic decoding, whose role is to restore the bytecode to features.
[0048] The specific structure of each part of the image compression model is shown in Table 1, where "Conv" and "SubpelConv" denote convolutional layers and sub-pixel convolutional layers with the given output channel number and kernel size; "ResBlock" and "Attention Module" denote residual blocks and attention modules with the given output channel number; "2↑/↓" indicates up/down-sampling with stride 2 (the default stride is 1); and "LeakyReLU" and "Sigmoid" denote activation functions.
[0049] Table 1
[0050]
[0051] The training process for an image compression model is described as follows:
[0052] In order for the training step to proceed smoothly, the original training-set images are cropped to 256×256 before training and fed into the model as training data. The present embodiment selects part of the images in the MS COCO 2014 training set as the training dataset of the model; the MS COCO 2014 validation set as the validation dataset; and the Kodak24 and CLIC Mobile validation sets as the test datasets of the model.
[0053] In order to control the bitrate of the model-compressed images, the hyperparameter in the loss function used during training is adjusted accordingly. The loss function is as follows:
[0054] L = R(ŷ) + R(ẑ) + λ·D(x, x̂)
[0055] where R(ŷ) denotes the bitrate after quantization and compression of the features extracted by the encoder; R(ẑ) denotes the bitrate after quantization and compression of the features extracted by the hyper-prior module's encoder; λ is a hyperparameter controlling the trade-off between bitrate and distortion; and D(x, x̂) denotes the distortion between the reconstructed image and the original image. The correspondence between λ and the number of channels in the network structure is shown in Table 2: the row labeled MSE gives the λ values used when MSE is the distortion term of the loss function (λ_MSE), and the row labeled MS-SSIM gives the λ values used when MS-SSIM is the distortion term (λ_MS-SSIM).
[0056] Table 2
[0057]
[0058] The larger λ is, the higher the bitrate of the compressed image and the lower its distortion. The model uses the Adam optimizer for training, with 16 images per input batch. During the first 60 training steps, the learning rate is set to 10⁻⁴; when the loss value stabilizes, the learning rate decays to 10⁻⁵.
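The rate-distortion trade-off above can be illustrated numerically. The sketch below is an assumption for illustration, not the patent's implementation: it estimates bitrate as the sum of −log₂ likelihoods of the quantized symbols per pixel and combines it with MSE distortion weighted by λ:

```python
import numpy as np

def bpp(likelihoods: np.ndarray, num_pixels: int) -> float:
    """Estimated bits per pixel: total information content -sum(log2 p)
    of the quantized symbols, divided by the number of image pixels."""
    return float(-np.log2(likelihoods).sum() / num_pixels)

def rd_loss(lik_y: np.ndarray, lik_z: np.ndarray,
            x: np.ndarray, x_hat: np.ndarray, lam: float) -> float:
    """L = R_y + R_z + lambda * D, with MSE as the distortion term."""
    num_pixels = x.shape[0] * x.shape[1]
    rate = bpp(lik_y, num_pixels) + bpp(lik_z, num_pixels)
    distortion = float(np.mean((x - x_hat) ** 2))
    return rate + lam * distortion

# Four symbols, each with probability 0.5, over a 2x2 image -> 1 bpp per
# stream; with perfect reconstruction the distortion term is zero.
lik = np.full(4, 0.5)
x = np.zeros((2, 2))
print(rd_loss(lik, lik, x, x, lam=0.01))  # prints 2.0
```

A larger `lam` makes the optimizer favor low distortion over low rate, matching the λ behavior described above.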
[0059] Specific training procedures include:
[0060] 1) Input the image into several residual blocks and attention modules for down-sampling and deep-feature extraction, and quantize the resulting feature map, which satisfies the following formulas:
[0061] y = g_a(x; φ_g)
[0062] ŷ = Q(y)
[0063] where x is the original input image; g_a denotes the computational transform of the encoder; φ_g denotes the parameters contained in the encoder; Q denotes the quantization operation of the quantizer; y denotes the features output by the encoder; and ŷ denotes y after quantization.
[0064] 2) At the same time, the image is input to the Mask module, which adaptively assigns feature weights based on the spatial-domain content to generate the Mask value. The Mask value is then point-multiplied with the encoder-generated feature map to produce the latent representation. This step can be represented by the following formula:
y = y · Sigmoid(M(x))
[0066] where y is the feature extracted by the encoder, M is the Mask module, and Sigmoid is the activation function.
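The gating step y = y · Sigmoid(M(x)) can be sketched as follows; here `mask_logits` stands in for the Mask module's output M(x) (an assumed name):

```python
import numpy as np

def sigmoid(t: np.ndarray) -> np.ndarray:
    """Numerically plain logistic function, squashing logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def apply_mask(features: np.ndarray, mask_logits: np.ndarray) -> np.ndarray:
    """Element-wise (point) product of the encoder feature map with the
    sigmoid-squashed Mask values, yielding the latent representation."""
    return features * sigmoid(mask_logits)

# A logit of 0 gives weight 0.5, so a feature of 2.0 is gated to 1.0.
print(apply_mask(np.array([2.0, 4.0]), np.array([0.0, 0.0])))  # prints [1. 2.]
```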
[0067] 3) The latent representation is fed into multiple hyper-prior models to generate auxiliary information capturing the redundant information in the spatial and channel domains, and the auxiliary information is quantized, which can be expressed as:
[0068] z = h_a(y; φ_h)
[0069] ẑ = Q(z)
[0070] where y is the feature extracted by the encoder; h_a denotes the computational transform of the hyper-prior module's encoder; φ_h denotes the parameters contained in that encoder; Q denotes the quantization operation of the quantizer; z denotes the features output by the hyper-prior encoder; and ẑ denotes z after quantization.
[0071] 4) Fuse the quantized auxiliary information with the contextual auxiliary information generated by the Context module:
[0072] φ_i = g_cm(ŷ_<i; θ_cm)
[0073] where g_cm denotes the computational transform of the Context module; θ_cm denotes the parameters contained in the Context module; ŷ_<i denotes the image elements that have already been decoded; and φ_i denotes the output of the Context module.
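The Context module conditions on elements decoded so far (ŷ with index < i). A much-simplified 1-D stand-in for the usual masked-convolution context (the function name and window size are assumptions, not the patent's design) is:

```python
import numpy as np

def context_features(decoded: np.ndarray, i: int, window: int = 3) -> np.ndarray:
    """phi_i from the already-decoded elements decoded[:i]: take the last
    `window` of them, left-padding with zeros when fewer are available.
    Element i itself and anything after it are never visible (causality)."""
    ctx = decoded[max(0, i - window):i]
    return np.pad(ctx, (window - len(ctx), 0))

y_hat = np.array([1.0, 2.0, 3.0, 4.0])
print(context_features(y_hat, 1))  # prints [0. 0. 1.]
print(context_features(y_hat, 3))  # prints [1. 2. 3.]
```

The causality constraint matters because the decoder must be able to reproduce exactly the same context from the symbols it has already recovered.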
[0074] 5) Input the fused information into the entropy parameter calculation module to compute the parameters of the Gaussian distributions:
[0075] (μ_i^(k), σ_i^(k)) = g_ep^(k)(φ_i, ξ^(k); θ_ep^(k))
[0076] where μ_i^(k) denotes the mean of the ith element of the kth hyper-prior branch; σ_i^(k) denotes the variance of the ith element of the kth hyper-prior branch; g_ep^(k) denotes the entropy-parameter transform of the kth hyper-prior branch; φ_i denotes the fused information; ξ^(k) denotes the output of the hyper-prior codec of the kth hyper-prior branch; and θ_ep^(k) denotes the entropy-parameter calculation parameters of the kth hyper-prior branch.
[0077] 6) Assign a corresponding weight to each single Gaussian distribution and sum the weighted distributions to obtain the Gaussian mixture distribution of the latent representation:
[0078] p(ŷ_i | ẑ) = Σ_k w_i^(k) · N(μ_i^(k), σ_i^(k)²)
[0079] where p(ŷ_i | ẑ) denotes the distribution of the latent representation; w_i^(k) denotes the weight of the Gaussian distribution for the ith element of the kth hyper-prior branch; μ_i^(k) denotes the mean of the ith element of the kth hyper-prior branch; σ_i^(k) denotes the variance of the ith element of the kth hyper-prior branch; N(μ, σ²) denotes a Gaussian distribution with mean μ and variance σ²; and ẑ denotes the quantized information.
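For entropy coding, the probability mass of each quantized symbol under the mixture is typically evaluated over its unit-width quantization bin. The sketch below illustrates that standard discretization (the helper names, and the specific weights/means/variances in the example, are assumptions):

```python
import math

def gauss_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a Gaussian N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gmm_likelihood(y_hat: float, weights, means, sigmas) -> float:
    """P(y_hat) = sum_k w_k * [Phi_k(y_hat + 0.5) - Phi_k(y_hat - 0.5)]:
    each Gaussian contributes its mass over the unit quantization bin."""
    return sum(
        w * (gauss_cdf(y_hat + 0.5, mu, s) - gauss_cdf(y_hat - 0.5, mu, s))
        for w, mu, s in zip(weights, means, sigmas)
    )

# With mixture weights summing to 1, the masses over all integer bins
# sum to (numerically) 1, so they form a valid coding distribution.
total = sum(gmm_likelihood(v, [0.6, 0.4], [0.0, 2.0], [1.0, 0.5])
            for v in range(-10, 13))
print(round(total, 6))  # prints 1.0
```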
[0080] 7) The latent representation is input to the quantizer for quantization. Model back-propagation is optimized by gradient descent, which requires the computational process to be differentiable, but quantization is not differentiable. Adding uniform noise to the quantization process during training makes the back-propagation computation through quantization possible.
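Step 7's noise trick can be sketched as follows: additive U(−0.5, 0.5) noise stands in for rounding during training (so gradients can flow), while true rounding is used at inference. The function name is an assumption:

```python
import numpy as np

def quantize(y: np.ndarray, training: bool, rng=None) -> np.ndarray:
    """Rounding at inference; additive uniform noise in [-0.5, 0.5) during
    training, a differentiable proxy for the non-differentiable rounding."""
    if training:
        rng = rng if rng is not None else np.random.default_rng(0)
        return y + rng.uniform(-0.5, 0.5, size=y.shape)
    return np.round(y)

y = np.array([1.4, -0.7, 2.1])
print(quantize(y, training=False))                   # prints [ 1. -1.  2.]
noisy = quantize(y, training=True)
print(np.all(np.abs(noisy - y) <= 0.5))              # prints True
```

The noisy value always lies within ±0.5 of the true latent, so the training-time statistics approximate the rounded symbols seen at inference.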
[0081] 8) The fused, quantized feature values are entropy-coded to compress them further. The commonly used entropy coding method is arithmetic coding, which is a lossless compression code.
[0083] 9) The decoder has a network structure that mirrors the encoder. From the quantized, compressed features, the decoder can effectively restore the compressed feature values step by step and achieve image reconstruction, as given by the following formula:
[0084] x̂ = g_s(ŷ; θ_g)
[0085] where x̂ is the reconstructed image; g_s denotes the computational transform of the decoder; θ_g denotes the parameters contained in the decoder; and ŷ denotes the quantized features extracted by the encoder.
[0086] 10) Comparing the reconstructed image with the original image via perceptual-quality computation yields the compression efficiency of the model and its ability to reconstruct the image.
[0087] To verify the performance of the above method, the following experiments are designed.
[0088] 1) Test on the Kodak24 dataset. According to the method of the present invention, the original images are input into the model, and the reconstructed images output by the decoder are compared with the original images to compute the image quality metrics. The experimental bpp vs. PSNR results are shown in Figure 2.
[0089] 2) Test on the CLIC Mobile validation dataset. According to the method of the present invention, the original images are input into the model, and the reconstructed images output by the decoder are compared with the original images to compute the image quality metrics. The experimental bpp vs. PSNR results are shown in Figure 3.
[0090] If the above method is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and comprises a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present invention. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
[0091] The above describes in detail a preferred specific embodiment of the present invention. It should be understood that those of ordinary skill in the art may make numerous modifications and changes according to the concept of the present invention without creative labor. Accordingly, any technical solution that those skilled in the art can obtain, in accordance with the concept of the present invention and on the basis of the prior art, through logical analysis, reasoning, or limited experiments should fall within the scope of protection determined by the claims.
