A deep image hiding method for arbitrary resolutions

By decoupling the secret image into a global visual basis and detail latent variables, and combining implicit resolution coding and noise-resistant decoders, the problem of secret image hiding and recovery under resolution mismatch is solved, achieving high-quality image reconstruction and resolution recovery.

CN122199337A · Pending · Publication Date: 2026-06-12 · NANJING UNIV OF INFORMATION SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV OF INFORMATION SCI & TECH
Filing Date
2026-05-13
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing deep image hiding methods suffer from irreversible loss of high-frequency detail in the secret image when the resolutions of the secret image and the carrier image do not match. The recovered image is therefore blurred and distorted, and the receiver cannot correctly obtain the original resolution of the secret image.

Method used

The secret image is decoupled into a global visual basis and detail latent variables. The resolution information is encoded into a resolution feature map through an implicit resolution encoding strategy. A reversible hiding network is used to generate a stego image. The stego image is then passed through the reversible hiding network in reverse, and a voting-based noise-resistant decoder and a detail-guided implicit reconstructor are combined to achieve blind recovery of the secret image.

Benefits of technology

It improves the hiding adaptability and recovery quality of secret images at arbitrary resolutions, reduces the loss of detail caused by resampling, and achieves high-fidelity reconstruction of secret images and reliable recovery of resolution information.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a method for hiding an image with an arbitrary resolution, which comprises the following steps: decoupling a secret image into a global visual basis and a detail latent variable which are aligned with the resolution of a carrier image; encoding the resolution information of the secret image into a resolution feature map through an implicit resolution coding strategy; inputting the carrier image, the global visual basis, the detail latent variable and the resolution feature map into a reversible hiding network to generate a stego image; inputting the stego image into the reversible hiding network for reverse recovery to obtain a recovered global visual basis, a recovered detail latent variable and a recovered resolution feature map; decoding the recovered resolution feature map through a voting-based noise-resistant decoder to obtain the resolution information of the secret image; and inputting the recovered global visual basis and the recovered detail latent variable into a detail-guided implicit reconstructor to reconstruct the secret image at a target resolution corresponding to the resolution information. The application improves the reconstruction quality and recovery fidelity of the image.

Description

Technical Field

[0001] This invention relates to the field of image processing and information hiding technology, specifically to a deep image hiding method for arbitrary resolutions.

Background Technology

[0002] Deep image hiding is a technique that embeds a secret image into a carrier image using a deep neural network, so that the receiving end can recover the secret image from the resulting stego image. Compared with traditional cryptographic techniques, deep image hiding not only protects the security of the secret information itself, but also improves the concealment of the secret information during transmission. Therefore, it has important application value in fields such as privacy protection, secure communication, and digital rights protection.

[0003] Existing deep image hiding methods typically require the secret image and the carrier image to have the same resolution, and model the hiding and restoration process as a discrete pixel mapping problem at a fixed resolution. In practical applications, when the secret image and the carrier image have different resolutions, the sending end usually needs to resample the secret image before embedding it into the carrier image. This resampling causes irreversible loss of high-frequency detail information in the secret image, resulting in blurriness, distortion, and missing details in the restoration result.

[0004] Meanwhile, the receiving end is often unable to correctly obtain the original resolution of the secret image without relying on additional metadata transmission, which easily leads to problems such as incorrect restoration size, geometric scale distortion, and the inability to achieve blind restoration. In addition, existing methods mostly rely on fixed-resolution decoding networks or simple interpolation operations in the restoration stage, making it difficult to achieve high-fidelity secret image reconstruction at arbitrary target resolutions.

Summary of the Invention

[0005] Purpose of the invention: The purpose of this invention is to provide a deep image hiding method for arbitrary resolutions that achieves high-quality hiding of secret images, implicit transmission of resolution information, and blind recovery of secret images under resolution mismatch conditions.

[0006] Technical solution: The present invention provides a deep image hiding method for arbitrary resolutions, comprising:

[0007] Step 1: Input the secret image into the frequency decoupling module to decouple the secret image into a global visual basis aligned with the resolution of the carrier image, and a detail latent variable independent of resolution;

[0008] Step 2: Encode the resolution information of the secret image into a resolution feature map using an implicit resolution coding strategy;

[0009] Step 3: Input the carrier image, global visual basis, detail latent variables, and resolution feature map into the reversible hiding network to generate a stego image;

[0010] Step 4: Input the stego image into the reversible hiding network for inverse recovery to obtain the recovered global visual basis, the recovered detail latent variables, and the recovered resolution feature map;

[0011] Step 5: Decode the recovered resolution feature map using a voting-based noise-resistant decoder to obtain the resolution information of the secret image;

[0012] Step 6: Input the recovered global visual basis and the recovered detail latent variables into the detail-guided implicit reconstructor to reconstruct the secret image at the target resolution corresponding to the resolution information.

[0013] Furthermore, in step 1, frequency decoupling specifically involves: resampling the secret image to the same resolution as the carrier image to obtain a global visual basis; resampling the global visual basis back to the original size of the secret image and subtracting it pixel by pixel from the original secret image to obtain a detail residual; and encoding the detail residual using a residual encoder to obtain the detail latent variable.

[0014] Furthermore, when the size of the secret image is smaller than the preset reference resolution of the carrier image, the detail latent variables are represented by all-zero feature maps, adaptively matching secret images of different resolutions.

[0015] Furthermore, the residual encoder employs a hyperbolic tangent activation function at its output to limit the values of detail latent variables within a preset range, thereby improving the numerical stability of subsequent network processing.

[0016] Furthermore, in step 2, the implicit resolution coding strategy is as follows: the original height and original width of the secret image are independently encoded into binary bit sequences of fixed length; bit value 0 is mapped to a negative value and bit value 1 is mapped to a positive value; the mapped height bit sequence and width bit sequence are written into mutually separated strip regions in the resolution feature map to form a strip-type resolution feature map carrying the original size information.

[0017] Furthermore, in the strip resolution feature map, each binary bit is represented by a strip region consisting of multiple consecutive rows of pixels.

[0018] Furthermore, in step 4, the reversible hiding network includes a discrete wavelet transform module, an inverse discrete wavelet transform module, and multiple cascaded reversible affine coupling blocks; the discrete wavelet transform module is used to transform the carrier image, global visual basis, and detail latent variables to the frequency domain to form the secret payload frequency domain features, which are input into the subsequent network together with the carrier image frequency domain features.

[0019] Furthermore, in step 5, the voting-based noise-resistant decoder includes: an amplitude truncation unit, a strip statistics unit, and a sign decision unit; wherein the amplitude truncation unit is used to limit the amplitude of the recovered resolution feature map; the strip statistics unit is used to compute the overall mean over the strip regions representing the same bit; and the sign decision unit is used to determine the corresponding bit value based on the sign of the statistical result, and thereby recover the height and width values of the secret image.

[0020] Furthermore, in step 6, the detail-guided implicit reconstructor includes: a joint feature extraction unit, a continuous coordinate construction unit, and an implicit prediction unit; wherein the joint feature extraction unit is used to perform feature fusion extraction on the recovered global visual basis and the recovered detail latent variables to form joint features; the continuous coordinate construction unit is used to generate normalized continuous query coordinates according to the target resolution; the implicit prediction unit is used to predict the high-frequency residuals at the corresponding positions based on the local features sampled from the joint features and the continuous query coordinates, and add the high-frequency residuals to the interpolation results of the recovered global visual basis at the target resolution pixel by pixel to obtain the reconstructed secret image.

[0021] Beneficial Effects: Compared with existing technologies, this invention has the following significant advantages: This invention decomposes the secret image into a global visual basis and detail latent variables through a frequency decoupling module, enabling the separate processing of global structural information and high-frequency detail information in the secret image. This effectively reduces detail loss caused by resampling when the secret image and carrier image resolutions do not match, improving the hiding adaptability across resolution scenarios. This invention encodes the original size information of the secret image into a resolution feature map through an implicit resolution coding strategy, and combines it with a voting-based noise-resistant decoder to reliably recover the resolution information. Without additional transmission of resolution metadata, the original resolution of the secret image can be obtained at the receiving end, thereby improving the system's blind recovery capability and ease of use. This invention uses a detail-guided implicit reconstructor to perform continuous coordinate modeling and point-by-point recovery of high-frequency residuals at any target resolution. This allows the recovery result to maintain the stability of the global visual basis while further compensating for high-frequency detail information, thereby improving the reconstruction quality and recovery fidelity of the secret image.

Attached Figure Description

[0022] Figure 1 is a flowchart of the present invention.

Detailed Implementation

[0023] The technical solution of the present invention will be further described below with reference to the accompanying drawings.

[0024] As shown in Figure 1, this embodiment of the invention provides a deep image hiding method for arbitrary resolutions, including:

[0025] S10, input the secret image into the frequency decoupling module, decouple the secret image, and obtain a global visual basis aligned with the resolution of the carrier image and a detail latent variable independent of the resolution;

[0026] In the specific implementation process, a secret image and a carrier image are first acquired, wherein the carrier image has a preset reference resolution. In one specific embodiment of the present invention, the preset reference resolution is 256×256. The secret image is resampled to the same resolution as the carrier image to obtain a global visual basis, which is represented as follows:

[0027] $$B = \mathcal{R}(I_s;\, H_c, W_c)$$

[0028] where $B$ denotes the global visual basis, $I_s$ denotes the secret image, $\mathcal{R}(\cdot)$ denotes the resampling operator, and $H_c$ and $W_c$ denote the height and width of the carrier image, respectively.

[0029] Subsequently, the global visual basis is resampled back to the original size of the secret image and subtracted pixel by pixel from the secret image to obtain the detail residual, which is represented as:

[0030] $$D = I_s - \mathcal{R}(B;\, H_s, W_s)$$

[0031] where $D$ denotes the detail residual, and $H_s$ and $W_s$ denote the height and width of the secret image, respectively. Then, the detail residual is input into a residual encoder for feature encoding to obtain the detail latent variable, which is represented as follows:

[0032] $$Z = E_r(D)$$

[0033] where $Z$ denotes the detail latent variable and $E_r(\cdot)$ denotes the residual encoder.

[0034] In one specific embodiment of the present invention, when the secret image size is larger than the preset reference resolution, a residual encoder is used to encode the detail residuals; when the secret image size is not larger than the preset reference resolution, the detail latent variables are represented using zero feature maps. The residual encoder includes multiple convolutional units connected in sequence, used to encode the input detail residuals into detail latent variables. The output of the residual encoder uses a hyperbolic tangent activation function to constrain the output range, thereby improving the numerical stability in the subsequent reversible hidden network processing.
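The decoupling in S10 can be sketched in a few lines. This is only an illustration under stated assumptions: nearest-neighbour resampling stands in for the resampling operator (the text does not say which interpolation is used), images are single-channel 2-D lists, "larger than the reference resolution" is read as "taller or wider than it", and the names `resample` and `decouple` are hypothetical. The residual encoder is not modelled; the returned flag only indicates whether it would be applied or whether a zero feature map would be used instead.

```python
# Illustrative sketch only: nearest-neighbour resampling is an assumption,
# and images are plain 2-D lists of numbers (one channel).

def resample(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[min(in_h - 1, r * in_h // out_h)][min(in_w - 1, c * in_w // out_w)]
         for c in range(out_w)]
        for r in range(out_h)
    ]


def decouple(secret, ref_h, ref_w):
    """Split a secret image into a global visual basis and a detail residual."""
    h, w = len(secret), len(secret[0])
    basis = resample(secret, ref_h, ref_w)   # aligned with the carrier resolution
    back = resample(basis, h, w)             # basis brought back to the secret's size
    residual = [[secret[r][c] - back[r][c] for c in range(w)] for r in range(h)]
    # Assumed reading of the embodiment: encode the residual only when the
    # secret is larger than the reference; otherwise use a zero latent map.
    use_encoder = h > ref_h or w > ref_w
    return basis, residual, use_encoder
```

By construction, the basis resampled back to the original size plus the residual equals the secret image exactly, which is why the residual carries precisely the detail that resampling discards.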

[0035] The residual encoder includes a first convolutional unit, a second convolutional unit, a third convolutional unit, and a fourth convolutional unit. The first convolutional unit performs a first downsampling and shallow feature extraction on the detail residuals; the second convolutional unit performs a second downsampling and mid-level feature extraction on the features; the third convolutional unit performs compression mapping on the mid-level features; and the fourth convolutional unit further maps the features into single-channel detail latent variables. This is represented as follows:

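[0036] $$F_1 = \sigma(\mathrm{GN}(\mathrm{Conv}_1(D)))$$

[0037] $$F_2 = \sigma(\mathrm{GN}(\mathrm{Conv}_2(F_1)))$$

[0038] $$F_3 = \sigma(\mathrm{GN}(\mathrm{Conv}_3(F_2)))$$

[0039] $$Z = \tanh(\mathrm{Conv}_4(F_3))$$

[0040] where $D$ denotes the detail residual and $Z$ the detail latent variable; $F_1$, $F_2$, and $F_3$ denote the output features of the first, second, and third convolutional units, respectively; $\sigma(\cdot)$ denotes the LeakyReLU activation function; $\mathrm{Conv}_1$, $\mathrm{Conv}_2$, $\mathrm{Conv}_3$, and $\mathrm{Conv}_4$ denote the first, second, third, and fourth convolutional units; $\mathrm{GN}$ denotes the group normalization operation; and $\tanh(\cdot)$ denotes the hyperbolic tangent activation function.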

[0041] S20, the resolution information of the secret image is encoded into a resolution feature map through an implicit resolution encoding strategy;

[0042] In the specific implementation process, the original height and original width values of the secret image are obtained separately and encoded into fixed-length binary bit sequences. In one specific embodiment of the present invention, both the original height and original width values are encoded using 12-bit binary encoding. Then, bit value zero is mapped to a negative value, bit value one is mapped to a positive value, and the binary sequences representing the height and width values are written into mutually separated strip regions in the resolution feature map to form a strip-type resolution feature map.

[0043] In one specific embodiment of the present invention, let the original height value and the original width value of the secret image be $H_s$ and $W_s$, respectively. Each of these is encoded into a 12-bit binary sequence:

[0044] $$\mathbf{b}^{h} = (b^{h}_{1}, b^{h}_{2}, \dots, b^{h}_{12}) = \mathrm{Bin}_{12}(H_s)$$

[0045] $$\mathbf{b}^{w} = (b^{w}_{1}, b^{w}_{2}, \dots, b^{w}_{12}) = \mathrm{Bin}_{12}(W_s)$$

[0046] where $\mathbf{b}^{h}$ denotes the binary bit sequence corresponding to the height value, $\mathbf{b}^{w}$ denotes the binary bit sequence corresponding to the width value, $b^{h}_{k}$ denotes the $k$-th height encoding bit, and $b^{w}_{k}$ denotes the $k$-th width encoding bit.

[0047] Mapping the binary bit values to the sign domain yields:

[0048] $$s^{h}_{k} = \begin{cases} +1, & b^{h}_{k} = 1 \\ -1, & b^{h}_{k} = 0 \end{cases}$$

[0049] $$s^{w}_{k} = \begin{cases} +1, & b^{w}_{k} = 1 \\ -1, & b^{w}_{k} = 0 \end{cases}$$

[0050] where $s^{h}_{k}$ denotes the sign value obtained by mapping the $k$-th height encoding bit, and $s^{w}_{k}$ denotes the sign value obtained by mapping the $k$-th width encoding bit.

[0051] Let the resolution feature map be $M$. The construction of the resolution feature map is expressed as follows:

[0052] $$M(i, j) = \begin{cases} s^{h}_{k}, & (i, j) \in \Omega^{h}_{k} \\ s^{w}_{k}, & (i, j) \in \Omega^{w}_{k} \end{cases}$$

[0053] where $M(i, j)$ denotes the pixel value of the resolution feature map at location $(i, j)$, $i$ denotes the row coordinate, $j$ denotes the column coordinate, $\Omega^{h}_{k}$ denotes the $k$-th strip region of the height encoding, and $\Omega^{w}_{k}$ denotes the $k$-th strip region of the width encoding.

[0054] In one specific embodiment, the first 60 rows of the resolution feature map are used to write the height value encoding, and the last 60 rows are used to write the width value encoding.
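The strip-type encoding of S20 might look like the following sketch. Several details are assumptions rather than statements from the text: a 128×128 feature map, 5 rows per bit (60 rows divided by 12 bits), amplitudes of exactly +1 and -1, most-significant-bit-first ordering, and zeros in the rows not covered by any strip. The names `to_bits` and `encode_resolution` are hypothetical.

```python
BITS = 12          # bits per dimension, as in the embodiment
ROWS_PER_BIT = 5   # assumption: 60 rows / 12 bits
MAP_SIZE = 128     # assumption: side length of the resolution feature map


def to_bits(value, n_bits=BITS):
    """Fixed-length binary code, most significant bit first (assumed order)."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in the fixed-length code")
    return [(value >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]


def encode_resolution(height, width, size=MAP_SIZE):
    """Write height bits into the first 60 rows and width bits into the last 60."""
    fmap = [[0.0] * size for _ in range(size)]   # uncovered rows stay 0 (assumption)
    h_signs = [1.0 if b else -1.0 for b in to_bits(height)]
    w_signs = [1.0 if b else -1.0 for b in to_bits(width)]
    w_start = size - BITS * ROWS_PER_BIT         # first row of the width strips
    for k in range(BITS):
        for r in range(ROWS_PER_BIT):
            fmap[k * ROWS_PER_BIT + r] = [h_signs[k]] * size
            fmap[w_start + k * ROWS_PER_BIT + r] = [w_signs[k]] * size
    return fmap
```

With 12 bits per dimension, this layout can represent heights and widths up to 4095 pixels.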

[0055] S30, the carrier image, the global visual basis, the detail latent variables, and the resolution feature map are input into the reversible hiding network to generate a stego image.

[0056] In one specific embodiment of the present invention, the reversible hiding network includes a discrete wavelet transform module, an inverse discrete wavelet transform module, and multiple reversible affine coupling blocks. First, the discrete wavelet transform module is used to transform the carrier image, global visual basis, and detail latent variables to the frequency domain, obtaining corresponding frequency domain features. Then, the frequency domain features of the global visual basis, the frequency domain features of the detail latent variables, and the resolution feature map are concatenated to form the secret payload frequency domain features, which are then input together with the carrier image frequency domain features into the reversible hiding network.
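The text does not name the wavelet used by the discrete wavelet transform module. As an illustration only, the sketch below assumes a single-level 2-D Haar transform on a single-channel image with even dimensions; the function names are hypothetical. Under that assumption a 256×256 input yields four 128×128 sub-bands, and the inverse transform restores the input exactly.

```python
def haar_dwt2(img):
    """Single-level 2-D Haar transform (assumed wavelet); returns four sub-bands."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            i, j = r // 2, c // 2
            ll[i][j] = (a + b + d + e) / 2   # local average
            lh[i][j] = (a - b + d - e) / 2   # left minus right
            hl[i][j] = (a + b - d - e) / 2   # top minus bottom
            hh[i][j] = (a - b - d + e) / 2   # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            p, q, u, v = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (p + q + u + v) / 2
            img[2 * i][2 * j + 1] = (p - q + u - v) / 2
            img[2 * i + 1][2 * j] = (p + q - u - v) / 2
            img[2 * i + 1][2 * j + 1] = (p - q - u + v) / 2
    return img
```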

[0057] In one specific embodiment of the present invention, the reversible hiding network consists of 16 cascaded affine coupling blocks, and the sub-network in each affine coupling block adopts a DenseNet structure. After processing by the reversible hiding network, the frequency domain features and auxiliary variables corresponding to the stego image are obtained. Then, the frequency domain features corresponding to the stego image are transformed back to the spatial domain using an inverse discrete wavelet transform module to obtain the stego image.
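To show why such a block can be run backwards exactly, here is a minimal sketch of one common form of affine coupling. The exact coupling equations are not given in the text, so this form is an assumption; `f`, `s`, and `t` are hypothetical stand-ins for the DenseNet sub-networks and may be arbitrary functions, since the inverse never needs to invert them.

```python
import math


def coupling_forward(x1, x2, f, s, t):
    """Forward pass: x1 is the carrier branch, x2 the secret-payload branch."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [b * math.exp(sc) + sh for b, sc, sh in zip(x2, s(y1), t(y1))]
    return y1, y2


def coupling_inverse(y1, y2, f, s, t):
    """Inverse pass: undo the affine step first, then the additive step."""
    x2 = [(b - sh) * math.exp(-sc) for b, sc, sh in zip(y2, s(y1), t(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Cascading several such blocks keeps the whole mapping invertible, which is what lets the same network both hide the payload and recover it.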

[0058] S40, the stego image is input into the reversible hiding network for inverse recovery to obtain the recovered global visual basis, the recovered detail latent variables, and the recovered resolution feature map.

[0059] In the specific implementation process, a discrete wavelet transform is first performed on the stego image to obtain its frequency domain features. Then, noise features with dimensions consistent with the auxiliary variables in the hiding stage are constructed, and these noise features, along with the frequency domain features of the stego image, are input into the inverse process of the reversible hiding network. After inverse recovery, the recovered global visual basis frequency domain features, the recovered detail latent variable frequency domain features, and the recovered resolution feature map are obtained. Finally, inverse discrete wavelet transforms are performed on the recovered global visual basis frequency domain features and the recovered detail latent variable frequency domain features, respectively, to obtain the recovered global visual basis and the recovered detail latent variables.

[0060] S50, the recovered resolution feature map is input to a voting-based noise-resistant decoder for decoding to obtain the resolution information of the secret image.

[0061] In one specific embodiment of the present invention, the voting-based noise-resistant decoder includes an amplitude truncation unit, a strip statistics unit, and a sign decision unit; the amplitude truncation unit is used to limit the amplitude of the recovered resolution feature map to reduce the impact of outliers on subsequent decoding results; the strip statistics unit is used to perform overall statistics on strip regions representing the same binary bit; the sign decision unit is used to determine the corresponding bit value based on the statistical results and recover the height and width values of the secret image.

[0062] Specifically, let the resolution feature map obtained by the inverse recovery of the reversible hiding network be $\hat{M}$. First, the amplitude of the recovered resolution feature map is limited using the amplitude truncation unit to obtain a truncated resolution feature map $\tilde{M}$, which is represented as:

[0063] $$\tilde{M}(i, j) = \mathrm{Clip}\big(\hat{M}(i, j)\big)$$

[0064] where $\mathrm{Clip}(\cdot)$ denotes the amplitude truncation operation and $\tilde{M}(i, j)$ denotes the value of the resolution feature map at position $(i, j)$ after truncation. Then, the strip statistics unit calculates the overall mean over the strip regions corresponding to the height encoding and the width encoding. For the $k$-th height encoding bit, the statistic of the corresponding strip region is represented as follows:

[0065] $$v^{h}_{k} = \frac{1}{N^{h}_{k}} \sum_{(i, j) \in \Omega^{h}_{k}} \tilde{M}(i, j)$$

[0066] For the $k$-th width encoding bit, the statistic of the corresponding strip region is represented as follows:

[0067] $$v^{w}_{k} = \frac{1}{N^{w}_{k}} \sum_{(i, j) \in \Omega^{w}_{k}} \tilde{M}(i, j)$$

[0068] where $\Omega^{h}_{k}$ denotes the $k$-th strip region of the height encoding, $\Omega^{w}_{k}$ denotes the $k$-th strip region of the width encoding, $N^{h}_{k}$ and $N^{w}_{k}$ denote the total number of pixels within the corresponding strip regions, and $v^{h}_{k}$ and $v^{w}_{k}$ denote the statistical results corresponding to the $k$-th height encoding bit and the $k$-th width encoding bit.

[0069] In this embodiment of the invention, since the same binary bit is represented by multiple consecutive rows of pixels in the resolution feature map, local noise disturbances can be suppressed by performing overall statistics on the pixel values within the entire strip region. That is, when most pixels within the strip region maintain the same sign as the original bit, the statistical result can still reflect the true value of that bit, thereby achieving voting-based noise-resistant decoding.

[0070] Next, the sign decision unit recovers the corresponding binary bit value based on the sign of the statistical result. For the $k$-th height encoding bit and the $k$-th width encoding bit, with strip statistics $v^{h}_{k}$ and $v^{w}_{k}$, the decision results are expressed as follows:

[0071] $$\hat{b}^{h}_{k} = \begin{cases} 1, & v^{h}_{k} > 0 \\ 0, & \text{otherwise} \end{cases}$$

[0072] $$\hat{b}^{w}_{k} = \begin{cases} 1, & v^{w}_{k} > 0 \\ 0, & \text{otherwise} \end{cases}$$

[0073] where $\hat{b}^{h}_{k}$ denotes the recovered $k$-th height encoding bit and $\hat{b}^{w}_{k}$ denotes the recovered $k$-th width encoding bit. After obtaining all binary bits, the recovered height bit sequence and width bit sequence are converted back to the height and width values of the secret image, respectively, as follows:

[0074] $$\hat{H}_s = \sum_{k=1}^{12} \hat{b}^{h}_{k} \, 2^{12-k}$$

[0075] $$\hat{W}_s = \sum_{k=1}^{12} \hat{b}^{w}_{k} \, 2^{12-k}$$

[0076] where $\hat{H}_s$ denotes the recovered height value of the secret image and $\hat{W}_s$ denotes the recovered width value of the secret image. Through the above decoding process, the original resolution information of the secret image can be stably recovered from the recovered resolution feature map. Since each binary bit is written repeatedly in a striped manner in the resolution feature map, the voting-based noise-resistant decoder can effectively reduce the impact of stego image transmission disturbances, inverse recovery errors, and local noise on the size decoding results, thereby improving the accuracy and robustness of secret image resolution recovery.
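The three decoder units described above (amplitude truncation, strip statistics, sign decision) can be sketched as one short function. The layout parameters mirror the embodiment only loosely and are assumptions: 5 rows per bit, height strips starting at row 0, width strips occupying the last 60 rows, a truncation threshold of 1.0, and most-significant-bit-first ordering. The name `decode_resolution` is hypothetical.

```python
def decode_resolution(fmap, bits=12, rows_per_bit=5, clip=1.0):
    """Recover (height, width) from a possibly noisy strip-type feature map."""
    size = len(fmap)
    w_start = size - bits * rows_per_bit   # first row of the width strips

    def strip_mean(first_row):
        rows = fmap[first_row:first_row + rows_per_bit]
        # Amplitude truncation: clamp every value so outliers cannot dominate.
        vals = [max(-clip, min(clip, v)) for row in rows for v in row]
        # Strip statistics: overall mean over all pixels of the strip.
        return sum(vals) / len(vals)

    def read(start):
        value = 0
        for k in range(bits):
            # Sign decision: a positive mean decodes as bit 1, otherwise bit 0.
            bit = 1 if strip_mean(start + k * rows_per_bit) > 0 else 0
            value = (value << 1) | bit     # MSB first (assumed order)
        return value

    return read(0), read(w_start)
```

Because each bit is averaged over hundreds of pixels, a single extreme value changes the clamped mean by a negligible amount, whereas without truncation one large outlier could flip the sign of the whole strip.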

[0077] S60, the recovered global visual basis and the recovered detail latent variables are input into the detail-guided implicit reconstructor, and the secret image is reconstructed at the target resolution corresponding to the resolution information obtained in step S50.

[0078] In one specific embodiment of the present invention, the detail-guided implicit reconstructor includes a joint feature extraction unit, a continuous coordinate construction unit, and an implicit prediction unit. First, the recovered global visual basis $\hat{B}$ and the recovered detail latent variable $\hat{Z}$ are concatenated along the channel dimension:

[0079] $$F_0 = \mathrm{Concat}(\hat{B}, \hat{Z})$$

[0080] where $\mathrm{Concat}(\cdot)$ denotes the channel concatenation operation and $F_0$ denotes the fused features. Then, the joint feature extraction unit extracts features from the four-channel input features to obtain the joint features $F_J$:

[0081] $$F_1 = \mathrm{Conv}_1(F_0)$$

[0082] $$F_2 = \mathrm{RB}_1(F_1)$$

[0083] $$F_3 = \mathrm{RB}_2(F_2)$$

[0084] $$F_4 = \mathrm{RB}_3(F_3)$$

[0085] $$F_J = \mathrm{Conv}_2(F_4)$$

[0086] where $F_1$, $F_2$, $F_3$, and $F_4$ denote the intermediate-layer features; $\mathrm{RB}_1$, $\mathrm{RB}_2$, and $\mathrm{RB}_3$ denote the three residual blocks; $\mathrm{Conv}_1$ and $\mathrm{Conv}_2$ denote the first and second convolutional operation units; and $F_J$ denotes the joint features.

[0087] Then, based on the secret image resolution recovered in step S50, the continuous coordinate construction unit constructs the continuous query coordinates corresponding to each pixel position at the reconstructed resolution. For the pixel in row $i$ and column $j$ of the reconstructed image, the continuous query coordinates are represented as follows:

[0088]

[0089] where $\mathbf{x}_{i,j}$ denotes the normalized continuous query coordinates corresponding to position $(i, j)$, $i$ denotes the row index, and $j$ denotes the column index.
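One plausible way to build normalized continuous query coordinates is to place each pixel centre in the open interval (-1, 1). This convention is an assumption made for illustration and is not stated in the text; the function name `query_coords` is hypothetical.

```python
def query_coords(height, width):
    """Pixel-centre coordinates normalised to (-1, 1) (assumed convention)."""
    ys = [(2 * i + 1) / height - 1 for i in range(height)]
    xs = [(2 * j + 1) / width - 1 for j in range(width)]
    # One (row, column) coordinate pair per target pixel.
    return [[(y, x) for x in xs] for y in ys]
```

Because the coordinates depend only on the target height and width, the same feature map can be queried at any resolution recovered in step S50.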

[0090] Furthermore, a sampling operation is used to read, from the joint features $F_J$, the local representation corresponding to the continuous query coordinates $\mathbf{x}_{i,j}$:

[0091] $$f_{i,j} = \mathrm{Sample}(F_J, \mathbf{x}_{i,j})$$

[0092] where $f_{i,j}$ denotes the local joint features corresponding to position $(i, j)$ and $\mathrm{Sample}(\cdot)$ denotes the sampling operation.

[0093] Next, the local joint features $f_{i,j}$ and the continuous query coordinates $\mathbf{x}_{i,j}$ are input together into the implicit prediction unit to obtain the high-frequency residual prediction value at the corresponding position, expressed as:

[0094] $$r_{i,j} = \mathrm{MLP}(f_{i,j}, \mathbf{x}_{i,j})$$

[0095] where $r_{i,j}$ denotes the high-frequency residual prediction value at position $(i, j)$ and $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron corresponding to the implicit prediction unit. Finally, the recovered global visual basis $\hat{B}$ is resampled to the reconstructed resolution $\hat{H}_s \times \hat{W}_s$ to obtain the base image $\hat{B}^{\uparrow} = \mathcal{R}(\hat{B};\, \hat{H}_s, \hat{W}_s)$ at the corresponding resolution, where $\mathcal{R}(\cdot)$ denotes the resampling operator, and the secret image $\hat{I}_s$ is reconstructed as:

[0096] $$\hat{I}_s(i, j) = \hat{B}^{\uparrow}(i, j) + r_{i,j}$$

[0097] In the above manner, the detail-guided implicit reconstructor uses the restored global visual basis to provide global structural information, uses the restored detail latent variables to provide high-frequency detail constraints, and combines the secret image resolution information restored in step S50 to achieve high-fidelity reconstruction of the secret image at the corresponding resolution.
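The query-and-add structure of the reconstructor can be sketched as follows. This is a toy illustration under assumptions: nearest-cell lookup stands in for the sampling operation, pixel-centre coordinates in (-1, 1) are assumed, `predict` is a hypothetical callable standing in for the trained multilayer perceptron, and `base` is assumed to be the recovered global visual basis already interpolated to the target resolution.

```python
def reconstruct(base, features, predict, out_h, out_w):
    """Add a predicted high-frequency residual to the interpolated basis.

    base:     out_h x out_w grid (basis already at the target resolution)
    features: fh x fw grid of feature vectors (the joint features)
    predict:  callable (feature_vector, (y, x)) -> residual value
    """
    fh, fw = len(features), len(features[0])
    out = []
    for i in range(out_h):
        y = (2 * i + 1) / out_h - 1            # assumed pixel-centre convention
        row = []
        for j in range(out_w):
            x = (2 * j + 1) / out_w - 1
            # Nearest feature cell for this continuous coordinate (assumption).
            fi = min(fh - 1, int((y + 1) / 2 * fh))
            fj = min(fw - 1, int((x + 1) / 2 * fw))
            residual = predict(features[fi][fj], (y, x))
            row.append(base[i][j] + residual)   # pixel-by-pixel addition
        out.append(row)
    return out
```

Keeping the interpolated basis as an additive term means the predictor only has to supply the missing detail, so a weak or noisy residual prediction still leaves the global structure intact.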

[0098] In practical applications, the secret image is input into a frequency decoupling module to obtain the global visual basis and detail latent variables. The resolution information of the secret image is encoded into a resolution feature map. The carrier image, global visual basis, detail latent variables, and resolution feature map are then input into a trained reversible hiding network to obtain a stego image. This stego image is then input into the trained reversible hiding network for inverse recovery. Combined with a voting-based noise-resistant decoder and a detail-guided implicit reconstructor, the secret image can be recovered at any resolution.

[0099] In summary, the arbitrary-resolution deep image hiding method in this embodiment of the invention obtains the global visual basis and detail latent variables by frequency decoupling of the secret image, and achieves blind recovery of the resolution information of the secret image by combining implicit encoding and noise-resistant decoding of the resolution feature map; at the same time, it improves the recovery quality of the secret image under resolution mismatch conditions by continuously reconstructing the high-frequency residuals through a detail-guided implicit reconstructor. It offers high concealment, recovery fidelity, and cross-resolution adaptability.

Claims

1. An arbitrary-resolution deep image hiding method, characterized in that it comprises: Step 1: Input the secret image into the frequency decoupling module to decouple the secret image into a global visual basis aligned with the resolution of the carrier image, and a detail latent variable independent of resolution; Step 2: Encode the resolution information of the secret image into a resolution feature map using an implicit resolution coding strategy; Step 3: Input the carrier image, global visual basis, detail latent variables, and resolution feature map into the reversible hiding network to generate a stego image; Step 4: Input the stego image into the reversible hiding network for inverse recovery to obtain the recovered global visual basis, the recovered detail latent variables, and the recovered resolution feature map; Step 5: Decode the recovered resolution feature map using a voting-based noise-resistant decoder to obtain the resolution information of the secret image; Step 6: Input the recovered global visual basis and the recovered detail latent variables into the detail-guided implicit reconstructor to reconstruct the secret image at the target resolution corresponding to the resolution information.

2. The arbitrary-resolution deep image hiding method according to claim 1, characterized in that, in step 1, frequency decoupling specifically involves: resampling the secret image to the same resolution as the carrier image to obtain a global visual basis; resampling the global visual basis back to the original size of the secret image and subtracting it pixel by pixel from the original secret image to obtain a detail residual; and encoding the detail residual using a residual encoder to obtain the detail latent variable.

3. The arbitrary-resolution deep image hiding method according to claim 2, characterized in that, when the size of the secret image is smaller than the preset reference resolution of the carrier image, the detail latent variables are represented by all-zero feature maps, adaptively matching secret images of different resolutions.

4. The arbitrary-resolution deep image hiding method according to claim 2, characterized in that the residual encoder uses a hyperbolic tangent activation function at its output to limit the values of detail latent variables within a preset range.

5. The arbitrary-resolution deep image hiding method according to claim 1, characterized in that, in step 2, the implicit resolution coding strategy is as follows: the original height and original width of the secret image are independently encoded into binary bit sequences of fixed length; bit value 0 is mapped to a negative value and bit value 1 is mapped to a positive value; the mapped height bit sequence and width bit sequence are written into mutually separated strip regions in the resolution feature map to form a strip-type resolution feature map carrying the original size information.

6. The arbitrary-resolution deep image hiding method according to claim 5, characterized in that, in the strip-type resolution feature map, each binary bit is represented by a strip region consisting of multiple consecutive rows of pixels.

7. The arbitrary-resolution deep image hiding method according to claim 1, characterized in that, in step 4, the reversible hiding network includes a discrete wavelet transform module, an inverse discrete wavelet transform module, and multiple cascaded reversible affine coupling blocks. The discrete wavelet transform module is used to transform the carrier image, global visual basis, and detail latent variables to the frequency domain to form the secret payload frequency domain features, which are input into the subsequent network along with the carrier image frequency domain features.

8. The arbitrary-resolution deep image hiding method according to claim 1, characterized in that, in step 5, the voting-based noise-resistant decoder includes: an amplitude truncation unit, a strip statistics unit, and a sign decision unit; wherein the amplitude truncation unit is used to limit the amplitude of the recovered resolution feature map; the strip statistics unit is used to compute the overall mean over the strip regions representing the same bit; and the sign decision unit is used to determine the corresponding bit value based on the sign of the statistical result, and thereby recover the height and width values of the secret image.

9. The arbitrary-resolution deep image hiding method according to claim 1, characterized in that, in step 6, the detail-guided implicit reconstructor includes: a joint feature extraction unit, a continuous coordinate construction unit, and an implicit prediction unit. The joint feature extraction unit is used to perform feature fusion extraction on the recovered global visual basis and the recovered detail latent variables to form joint features. The continuous coordinate construction unit is used to generate normalized continuous query coordinates based on the target resolution. The implicit prediction unit is used to predict the high-frequency residuals at the corresponding positions based on the local features sampled from the joint features and the continuous query coordinates, and to add the high-frequency residuals to the interpolation results of the recovered global visual basis at the target resolution pixel by pixel to obtain the reconstructed secret image.