A geometrically robust reversible neural network image digital watermarking method
By employing a reversible neural network that combines geometric security mask constraints with a content-adaptive auxiliary alignment watermark, the method solves the problem of synchronization loss that digital watermarks suffer under geometric attacks. It achieves robust extraction and high-capacity embedding under complex conditions, thereby improving the robustness and usability of watermarks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- TONGJI UNIV
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-12

Figure CN122199247A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of information hiding and digital watermarking technology, and specifically relates to a geometrically robust reversible neural network image digital watermarking method.
Background Technology
[0002] Digital watermarking is used for multimedia copyright protection and content traceability; its core objective is to balance imperceptibility, robustness, capacity, and security. Traditional pixel-domain or transform-domain methods (such as DCT and DWT) are reasonably stable under value-range (pixel-value) attacks such as compression, noise, and blurring, but are prone to synchronization loss under geometric attacks such as rotation, scaling, translation, and perspective transformation: the decoder no longer reads from the positions used during embedding, so extraction fails. To alleviate synchronization loss, existing research has introduced template synchronization or feature-point / local-descriptor alignment, but these methods are unstable under strong compression and blurring, in texture-poor scenes, or under large deformations. Furthermore, fixed templates pose visibility risks, and feature-point schemes are constrained by scene content and by the magnitude of the geometric transformation.
[0003] With the development of deep learning, learning-based watermarking encoding / decoding and reversible networks have made progress in terms of visual fidelity and capacity. However, geometric transformations can change the relationship between pixel coordinates and sampling, disrupting the encoder-decoder correspondence. Simply enhancing the decoder or restoring the pixel domain often fails to fundamentally solve the synchronization loss problem, and excessive restoration may introduce artifacts that interfere with the watermark signal. On the other hand, message mapping is often limited by resolution and layout coupling, lacking flexible support for payload length; when cross-scale robust fusion is insufficient, the stability of the watermark is also affected under complex distortion combinations.
[0004] To balance reversibility and robustness under strong geometric attacks, it is crucial to use the same geometrically safe region at both the embedding and extraction ends, so that boundary interpolation and occlusion systematically cannot affect decoding. At the same time, content-adaptive alignment cues should be introduced that remain reliably detectable under superimposed value-range perturbations such as compression, blurring, and noise, so that geometric parameters can be estimated and synchronization restored. Furthermore, the geometric alignment side should focus on robust regression of geometric parameters such as a homography, rather than on large-scale pixel reconstruction, to reduce secondary distortion that interferes with the payload channel. The training protocol must also cover combined geometric and value-range distortions, so that the encoding / decoding path tolerates real-world attacks.
[0005] The existing technology still has the following main shortcomings: (1) lack of reusable geometrically secure embedding / extraction constraints for full-angle rotation and perspective scenes; (2) insufficient coordination between synchronization / alignment mechanisms and the payload channel, making stable operation under composite distortion difficult; (3) limited capacity scalability of message mapping and cross-scale fusion; (4) existing geometric alignment strategies have yet to balance reliability against simplicity. It is therefore necessary to propose a high-capacity watermarking method that uses a reversible neural network to fuse the payload and the host image effectively under geometric security constraints and performs robust extraction through geometric alignment, so as to improve overall reliability and usability under complex distortion conditions.
Summary of the Invention
[0006] To address the problems existing in the prior art, this invention proposes a geometrically robust reversible neural network image digital watermarking method, designs a digital watermark that can still be stably extracted under complex geometric attacks and multiple pixel domain distortion conditions, and provides matching embedding and extraction algorithms.
[0007] In existing application scenarios, traditional watermarking methods suffer synchronization loss and reduced extraction accuracy when the host image undergoes complex perturbations such as full-angle rotation, perspective transformation, translation, scaling, compression, blurring, and noise. This invention introduces a geometric security mask that restricts watermark embedding and extraction to a central region that remains stable even under worst-case deformation, ensuring the reversibility of the embedding structure and the robustness of the extraction process. It uses a reversible neural network to achieve bidirectional mapping and fusion of the payload message and the host image within a unified framework, and a multi-scale fusion mechanism to increase payload capacity and resistance to interference. A content-adaptive auxiliary alignment watermark is introduced as a geometric synchronization signal, and a distortion-restoration network focuses on estimating homography parameters, accurately aligning the attacked image before decoding and extraction and thus significantly improving overall robustness. To achieve high capacity, this invention expands the message to the target resolution through a two-dimensional layout mapping and cross-scale progressive upsampling, combined with multi-channel parallel coding and a multi-scale residual dense fusion structure, increasing the capacity per pixel. The method supports configurable payload length and embedding strength, enabling an adjustable balance among capacity, robustness, and visual quality for different task requirements.
[0008] The present invention adopts the following technical solution: a geometrically robust reversible neural network image digital watermarking method, comprising the following steps. Step 1: Under the gating of a geometric security mask, the watermark message to be embedded is mapped and embedded into the host image by a reversible neural network to obtain a watermarked image, and an auxiliary alignment watermark is superimposed on the watermarked image in a spatially adaptive manner.
[0009] Step 2: For the attacked watermarked image, use the distortion-restoration network to estimate the geometric transformation based on the auxiliary alignment watermark and align the image. Under the mask constraint consistent with the embedding stage, execute the reverse path of the reversible neural network to recover the watermark message.
[0010] Furthermore, step 1 handles the security mask, the differing scales of the watermark message and the image, and the addition of the auxiliary alignment watermark. Specifically, it includes: Step 1.1: Scale the input image to 256×256 resolution, construct a central square geometric security mask under the constraints of full-angle rotation (-180° to 180°), perspective distortion, and vertical and horizontal translation of the image, and broadcast the mask to all image channels.
[0011] Step 1.2: Under the region consistency constraint defined by the geometric security mask, the scaled image and the low-resolution message are input into the reversible neural network. The message is injected into the network in a cross-scale manner and fused with the image branch. After the forward propagation of the network ends, only the image branch is retained to output the watermarked image.
[0012] Step 1.3: Based on the watermarked image, the auxiliary alignment watermark is weighted and superimposed using a content-adaptive intensity map to maintain detectability and improve alignment stability under complex distortion.
[0013] Furthermore, in step 1.2, the watermark message and the image are fused step by step across scales and propagated forward through reversible computation. The reversible neural network comprises several reversible neural network blocks connected in series. Each reversible neural network block includes a cross-scale progressive upsampling path and a cross-scale progressive downsampling path; the upsampling path fuses the watermark information into the image, and the downsampling path fuses the image into the watermark information. Specifically, this includes: Step 1.2.1: The length of the watermark message is preferably a perfect square, so that the message can be arranged in a square two-dimensional layout in the initial feature space; lengths that are not perfect squares can be brought to a square length by padding, error-correction codes, or other encodings and then mapped to the same layout. The message is then mapped by cross-scale upsampling to a representation of the same size as the image and fused with the image.
[0014] Step 1.2.2: Map the image to a representation of the same size as the watermark message by using a masked, multi-scale downsampling method, and then fuse it with the watermark message.
[0015] Step 1.2.3: Feed the fused image and watermark message representation into the next reversible neural network block. Repeat steps 1.2.1 to 1.2.3. After passing through multiple layers of reversible neural network blocks, scale the image output by the last block to the original size and use it as the image with embedded watermark information.
[0016] Furthermore, in step 1.3, the auxiliary alignment watermark is added to the image after embedding the watermark information in step 1.2.3 using an adaptive intensity map. The adaptive intensity map is learned by a neural network through a mean squared error loss function and can be modeled using image statistical features to facilitate embedding the watermark in a position that is not easily noticeable to the human eye.
[0017] Furthermore, the adaptive intensity map in step 1.3 uses a fixed 4×4 color block guide template. The color blocks have different color combinations at different positions to provide a stable detection response under geometric and pixel distortion conditions.
[0018] Furthermore, during training, the image obtained in step 1.3 after embedding the auxiliary alignment watermark is randomly subjected to single or combined attacks such as rotation, scaling, affine transformation, projection, cropping, and translation, or is left unattacked, as data augmentation.
[0019] Furthermore, step 1.3 extracts the auxiliary alignment watermark using a learnable neural network.
[0020] Furthermore, in step 1.3, the extraction network, for example a learnable network such as ConvInRelu or UNet, is optimized with a mean squared error loss computed on the extracted auxiliary alignment watermark.
[0021] Furthermore, step 2 trains a distortion-restoration network to restore the attacked image and extracts the watermark message using a reversible neural network. Specifically, this includes: Step 2.1: Scale the input image to 256×256 resolution, use the distortion-restoration network to detect the watermark for auxiliary alignment, and output the estimated watermark for auxiliary alignment.
[0022] Step 2.2: In the geometric estimation stage, one of two methods is adopted: single regression or iterative refinement. Single regression directly obtains the final homography matrix; iterative refinement obtains the final result by predicting small homography updates in sequence and combining them. The iterative refinement can be optimized in the training stage or executed independently in the inference stage.
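Iterative refinement combines a sequence of small homography updates into one matrix. The following is a minimal sketch, assuming that each update acts on the coordinates already produced by the previous updates, so the combined homography is the product with the latest update on the left; the helper names are hypothetical, and matrices are plain 3×3 nested lists normalized so that the bottom-right entry is 1.

```python
from typing import List

Mat3 = List[List[float]]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    """Matrix product a @ b of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def normalize(h: Mat3) -> Mat3:
    """Scale a homography so that its bottom-right entry equals 1."""
    s = h[2][2]
    return [[v / s for v in row] for row in h]


def compose_updates(updates: List[Mat3]) -> Mat3:
    """Combine updates dH_1, ..., dH_K into dH_K @ ... @ dH_1 (assumed order)."""
    total: Mat3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for dh in updates:
        # Each new update acts on points already moved by the previous ones
        total = matmul3(dh, total)
    return normalize(total)


def translation(tx: float, ty: float) -> Mat3:
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


# Two small shifts compose into a single shift of (3, 3)
refined = compose_updates([translation(2.0, 0.0), translation(1.0, 3.0)])
```

Because homographies do not commute in general, the multiplication order matters; a single-regression estimator would instead output the final matrix directly.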
[0023] Step 2.3: Apply a geometric security mask consistent with the embedding stage to the geometrically aligned image, input the image within the masked area into the reverse path of the reversible neural network, and recover the original watermark message for verification or decoding.
[0024] Furthermore, in step 2.2, a learnable distortion-restoration network can compare the auxiliary alignment watermark extracted in step 2.1 with the auxiliary alignment watermark embedded in step 1.3 and obtain the homography matrix through a single regression; alternatively, it can make multiple predictions of small homography updates and combine them into the final result. The network can be based on earlier CNN architectures or on later Vision Transformer architectures.
[0025] Furthermore, the distortion-restoration network described in step 2.2 can utilize a soft-gated expert routing mechanism. Different experts specialize for different geometric deformations such as rotation, scaling, affine transformation, projection, cropping, and translation. Gating is performed based on soft selection according to the response and image statistical features related to the guiding template, specifically including: Step 2.2.1: Using the auxiliary alignment watermark extracted in Step 2.1 and the actual auxiliary alignment watermark embedded in Step 1.3 as input, construct multiple expert networks with the same network structure.
[0026] Step 2.2.2: In parallel with the multiple expert networks of step 2.2.1, construct a soft-gated expert routing network that outputs a gating weight for each expert.
[0027] Step 2.2.3: Weight the output of each expert by its gating weight and sum the weighted outputs to obtain the final output of the soft-gated expert network.
[0028] Furthermore, the watermark message recovery process in step 2.3 is a reverse inference path of a reversible neural network, with a structure corresponding to steps 1.2.1 to 1.2.3, but with the data flow direction reversed. Since the reversible neural network satisfies mathematical reversibility, the message expansion and image fusion processes in the embedding stage can be executed completely in reverse during the extraction stage, thereby recovering the original watermark message. Specifically, this includes: Step 2.3.1: In the geometrically aligned watermarked image, first apply a geometric security mask and use the pixel values within the mask area as valid input.
[0029] Step 2.3.2: This step is the reverse process of the reversible neural network of step 1. The image representation and the watermark message representation are mapped to each other's size through cross-scale progressive sampling (to the watermark message size when fusing from the image into the watermark message, and to the image size when fusing from the watermark message into the image), and are then fused through the inverse operations of the reversible neural network of step 1.
[0030] Step 2.3.3: Repeat step 2.3.2. After the reverse process of the multi-layer reversible neural network block, the output watermark message features are inversely mapped into a one-dimensional watermark message vector according to the two-dimensional spatial layout. The length is strictly consistent with the original payload message in the embedding stage, thereby achieving lossless extraction.
[0031] Furthermore, the network parameters of the reversible neural network in step 2.3 are the same as those of the reversible neural network in step 1.2.
[0032] The beneficial effects of this invention are: 1. This invention proposes a reversible neural network embedding method with geometric security mask constraints, which effectively improves the stability of watermarks under complex geometric attacks such as rotation, scaling, affine, projection, cropping and translation while ensuring visual quality and message capacity.
[0033] 2. This invention combines a multi-scale residual dense fusion structure to fully utilize features of different resolutions, significantly improving the watermark's resistance to pixel-domain distortions such as compression, noise, and blurring.
[0034] 3. This invention introduces a content-adaptive auxiliary alignment watermark that works with a distortion-restoration network to accurately recover geometric parameters and perform alignment, enabling stable extraction of the watermark even under various attack combinations.
Attached Figure Description
[0035] Figure 1: Overall architecture diagram of the model of this invention; Figure 2: Detailed diagram of the reversible neural network model of this invention; Figure 3: Detailed diagram of the distortion-restoration network of this invention; Figure 4: Logical architecture diagram of this invention; Figure 5: Performance comparison chart of the method of this invention and existing methods (where ① is the method of this invention, ② is the StegaStamp method, ③ is the RivaGAN method, ④ is the RoSteALS method, ⑤ is the MBRS method, ⑥ is the CIN method, ⑦ is the PIMoG method, and ⑧ is the DwtDctSvd method).
Detailed Implementation
[0036] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples, so as to provide a more complete understanding of how the present invention uses technical means to solve technical problems and achieve technical effects, and to enable implementation accordingly.
[0037] Because training a digital watermarking model only requires a certain diversity of training images and does not depend on image content, this embodiment merges commonly used datasets such as DIV2K and MS COCO into a new dataset of approximately 100,000 images.
[0038] Consider an image arbitrarily scaled to a resolution of 256×256 and a multi-bit (128-bit) watermark message to be embedded. The method of this invention constructs a watermark embedding module and a watermark extraction module. The embedding module produces the image carrying the watermark information. After being scaled back to its original size and transmitted over a network, this image may be subjected to attacks such as compression or cropping, yielding an attacked image, which is then scaled back to a resolution of 256×256.
[0039] The watermark extraction module is then applied to the attacked image to obtain the extracted watermark information. The method of the present invention is implemented as follows. A geometrically robust, high-capacity, reversible neural network image watermarking method has the overall structure shown in Figure 1 (Figure 4 is a logical architecture diagram of its steps), and its specific steps include: Step 1: Under the gating of a geometric security mask, the message to be embedded is mapped and embedded into the host image using an invertible neural network (INN) to obtain a watermarked image. An auxiliary alignment watermark is then superimposed on the watermarked image in a spatially adaptive manner.
[0040] Step 1.1: Scale the input image to a resolution of 256×256 so that the present invention can embed watermarks into images of any size. At the same time, so that the model embeds the watermark information in the central region of the image and the information is not lost under geometric deformation, a central square geometric security mask is constructed under the constraints of full-angle rotation (-180° to 180°), perspective distortion, and vertical and horizontal translation of the image. This mask is then broadcast to all image channels.
[0041] For a pixel position (x, y), the geometric security mask M(x, y) is defined to equal 1 when the pixel lies inside a central square whose side length is a fraction ρ of the image side, and 0 otherwise, where ρ is the mask ratio parameter, whose value is fixed in this embodiment. The geometric security mask is therefore a square area at the center of the image, illustrated by the square color block at the center of the image in Figure 1. That block is for illustration only; after the watermark is actually added, it is indistinguishable to the naked eye.
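The mask ratio has a simple geometric bound for the rotation part of the constraint. A corner of a centred square with side s lies s/√2 from the centre, and a full-angle rotation can point it along either image axis, so the square stays inside a W×W frame only if s/√2 plus the translation budget is at most W/2. The sketch below assumes that rule; perspective distortion is not modelled, and the function names and the `max_shift` budget are hypothetical.

```python
import math
from typing import List


def max_safe_ratio(size: int, max_shift: float = 0.0) -> float:
    """Largest side/size ratio of a centred square that stays inside a
    size x size frame under any rotation about the centre combined with a
    shift of at most max_shift pixels per axis (perspective not modelled)."""
    # A rotated corner's x (or y) offset can reach side / sqrt(2); with a
    # shift t the frame requires side / sqrt(2) + t <= size / 2.
    room = size / 2 - max_shift
    if room <= 0:
        return 0.0
    return min(1.0, math.sqrt(2) * room / size)


def safety_mask(size: int, ratio: float) -> List[List[int]]:
    """Binary mask: 1 inside the centred square of side ratio * size, else 0."""
    side = int(size * ratio)
    lo = (size - side) // 2
    hi = lo + side
    return [[1 if lo <= y < hi and lo <= x < hi else 0 for x in range(size)]
            for y in range(size)]


def apply_mask(image: List[List[List[float]]],
               mask: List[List[int]]) -> List[List[List[float]]]:
    """Broadcast a single-channel mask over every channel of a C x H x W image."""
    return [[[p * m for p, m in zip(row, mrow)] for row, mrow in zip(channel, mask)]
            for channel in image]
```

For a 256×256 image with no translation budget, `max_safe_ratio(256)` is √2/2 ≈ 0.707; any allowance for translation or perspective shrinks the safe square further.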
[0042] Step 1.2: Under the region constraints defined by the geometric security mask, the scaled image and the low-resolution watermark message are input into several cascaded reversible neural network blocks. The watermark message is injected step by step within the network in a cross-scale manner and fused with the image branch. After forward propagation ends, only the image branch is retained and output as the watermarked image. The specific process is shown in Figure 2 (step 1.2).
[0043] The reversible neural network block includes a cross-scale progressive upsampling path and a cross-scale progressive downsampling path. The cross-scale progressive upsampling path is used for fusing watermark information into the image, and the cross-scale progressive downsampling path is used for fusing the image into watermark information.
[0044] Specifically: Step 1.2.1: Fusion of watermark information into the image (upsampling). The watermark payload is 128 bits; because 128 is not a perfect square, this embodiment uses the smallest square number not less than 128, namely 144 = 12², as the actual embedded length, with the additional 16 bits serving as ECC check bits. During training, a random 128-bit binary string is generated and extended with a 16-bit ECC checksum to form the watermark information. Before the binary string is fed into the reversible neural network, it is mapped to an array of floating-point numbers in which one fixed value represents a 1 bit and another represents a 0 bit (floating-point values are more convenient for neural network computation). The array is then arranged as a 12×12 square matrix (144 bits in total), which forms the single-channel initial message feature map.
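The square layout, and its inverse used in step 2.3.3, can be sketched as follows. This assumes bits are mapped to +1.0 / -1.0 (the exact values are not legible in this text) and arranged row by row. The 16 check bits are a placeholder: no ECC scheme is specified here, so the sketch only performs the layout.

```python
import math
import random
from typing import List


def bits_to_square(bits: List[int], one: float = 1.0, zero: float = -1.0) -> List[List[float]]:
    """Map bits to floats and arrange them row by row in a square grid."""
    n = len(bits)
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError(f"{n} bits do not form a square; pad or add check bits first")
    values = [one if b else zero for b in bits]
    return [values[r * side:(r + 1) * side] for r in range(side)]


def square_to_bits(grid: List[List[float]], one: float = 1.0, zero: float = -1.0) -> List[int]:
    """Inverse layout for extraction: flatten row by row, threshold at the midpoint.
    Assumes one > zero."""
    mid = (one + zero) / 2
    return [1 if v > mid else 0 for row in grid for v in row]


rng = random.Random(0)
payload = [rng.randint(0, 1) for _ in range(128)]
check_bits = [0] * 16  # hypothetical placeholder; a real ECC encoder would compute these
grid = bits_to_square(payload + check_bits)  # 12 x 12 initial message feature map
```

Thresholding at the midpoint lets the inverse tolerate small perturbations of the recovered feature map, which is why a symmetric ±1 mapping is a convenient choice.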
[0045] The image is likewise converted from the original value range of each channel to floating-point numbers and rescaled to a range similar to the value range of the message, which facilitates message transfer. After scaling, it serves as the input to the image branch of the first reversible neural network block. In the notation of this embodiment, the superscript denotes the index of the reversible neural network block and the subscript denotes the number of sampling steps applied, with 0 denoting the original representation before any sampling. In this embodiment, the image height and width are both 256.
[0046] To map the message to the same size as the image, a multi-scale message representation is constructed along a cross-scale upsampling path and then fused at the top level. The upsampling path comprises four scale-expansion levels, each a transposed-convolution (deconvolution) layer with stride 2, kernel size 4, and padding 1, which doubles the resolution; starting from the original single-channel 12×12 initial message feature map, multi-scale message features of size 24×24, 48×48, 96×96, and 192×192 are obtained step by step. Here the superscript denotes the block index and the subscript denotes the number of scale expansions. After each upsampling stage, a 3×3 convolution with stride 1 and padding 1 followed by a nonlinear activation performs channel mapping and a nonlinear transformation, producing an intermediate representation better suited to the subsequent fusion. To aggregate cross-scale information and preserve both high-frequency and low-frequency detail, a multi-scale residual dense fusion structure is adopted: the features at each scale are first aligned to the target size of 256×256 by bilinear interpolation, a 3×3 convolution is applied to each to obtain a uniform number of channels, the results are concatenated along the channel dimension, and a final 3×3 convolution maps them to the same number of channels as the host image (3 channels in this embodiment), giving a message representation of the same size as the image. This embodiment denotes the whole mapping from the initial message feature map to this image-sized representation as the upsampling process. The message representation is then added to the image-branch input.
The sum is the image output of the first reversible neural network block, completing the transfer from the watermark message to the image. This output serves both as the image input of the next (second) reversible neural network block and as the image representation from which information is transferred to the watermark message in step 1.2.2.
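The feature sizes along both sampling paths follow from the standard output-size formulas for transposed and strided convolutions (the forms documented for PyTorch's `ConvTranspose2d` and `Conv2d` with dilation 1). The small sketch below, with hypothetical helper names, checks the sizes quoted for the upsampling path here and for the downsampling path of step 1.2.2. Neither path ends at its fusion target (256 and 12), which is why bilinear alignment precedes the fusion.

```python
from typing import Callable, List


def deconv_out(n: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel


def conv_out(n: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output size of a strided convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def path_sizes(start: int, levels: int, step: Callable[[int], int]) -> List[int]:
    """Spatial size after each level of a progressive sampling path."""
    sizes, n = [], start
    for _ in range(levels):
        n = step(n)
        sizes.append(n)
    return sizes


up = path_sizes(12, 4, deconv_out)    # message path of step 1.2.1
down = path_sizes(256, 4, conv_out)   # image path of step 1.2.2
print(up, down)  # [24, 48, 96, 192] [128, 64, 32, 16]
```

With kernel 4, stride 2, and padding 1, each transposed convolution exactly doubles an input size and each convolution exactly halves an even one.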
[0047] Step 1.2.2: Fusion of the image into the watermark information (downsampling). For the image input (the image output of step 1.2.1), a multi-scale image representation is constructed along a cross-scale progressive downsampling path and then fused into the message representation. The cross-scale progressive downsampling path consists of two parallel progressive downsampling paths with the same network structure but different network parameters. Both paths take the same image representation as input, and the two downsampling processes respectively produce two image representations of the same size as the watermark message.
[0048] Each progressive downsampling path comprises four scale-reduction levels, each a convolution layer with stride 2, kernel size 4, and padding 1, which halves the resolution; multi-scale image features of size 128×128, 64×64, 32×32, and 16×16 are obtained step by step. Here the superscript denotes the block index and the subscript denotes the number of downsampling operations; the input to the first scale reduction is the 256×256 image representation.
[0049] After each downsampling stage, a 3×3 convolution with stride 1 and padding 1 followed by a nonlinear activation performs channel mapping and a nonlinear transformation, producing an intermediate representation better suited to the subsequent fusion. To aggregate cross-scale information and preserve both high-frequency and low-frequency detail, a multi-scale residual dense fusion structure is again adopted: the features at each scale are first aligned to the target message size of 12×12 by bilinear interpolation, a 3×3 convolution is applied to each to obtain a uniform number of channels, the results are concatenated along the channel dimension, and a final 3×3 convolution maps them to the same number of channels as the watermark message (1 channel in this embodiment), giving an image representation of the same size as the watermark message. The second image representation is obtained in the same way by the parallel path.
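The alignment step that precedes fusion can be sketched for a single-channel map in plain Python. This bilinear resize is intended to follow the half-pixel-centre convention used by `align_corners=False` in PyTorch's `interpolate`; it is an illustration, not the embodiment's code. The learned 3×3 convolutions and the concatenation that follow are not shown, and `align_scales` is a hypothetical helper.

```python
import math
from typing import List

Map2D = List[List[float]]


def bilinear_resize(img: Map2D, out_h: int, out_w: int) -> Map2D:
    """Bilinear resize with half-pixel centres (align_corners=False style)."""
    in_h, in_w = len(img), len(img[0])

    def src_index(dst: int, in_size: int, out_size: int) -> float:
        # Map the centre of output pixel dst back into input coordinates
        return max((dst + 0.5) * in_size / out_size - 0.5, 0.0)

    out = []
    for y in range(out_h):
        sy = src_index(y, in_h, out_h)
        y0 = min(int(math.floor(sy)), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = src_index(x, in_w, out_w)
            x0 = min(int(math.floor(sx)), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def align_scales(features: List[Map2D], size: int) -> List[Map2D]:
    """Bring every scale to a common size before channel concatenation."""
    return [bilinear_resize(f, size, size) for f in features]


# The four downsampled scales (128, 64, 32, 16) aligned to the 12x12 message size
aligned = align_scales([[[0.0] * n for _ in range(n)] for n in (128, 64, 32, 16)], 12)
```

The same function with a target of 256 covers the upsampling path of step 1.2.1, where the 192×192 level is also not at the target size.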
[0050] One of the two image representations is used as a weight and the other as a bias to update the message representation; the updated message representation serves as the watermark message input of the next reversible neural network block. Each reversible neural network block therefore involves one upsampling process, from the watermark message to the image, and two downsampling processes, from the image to the watermark message.
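The exact weight-and-bias formula is not legible in this text. The sketch below assumes the common affine-coupling form m' = m·exp(w) + b, as used in Real NVP-style coupling layers; exponentiating the weight keeps it positive, so the block is exactly invertible. The image update is additive, as described in step 1.2.1. The toy functions are hypothetical stand-ins for the learned upsampling and downsampling paths, using flat lists: an image of 4 values and a message of 2. The sketch keeps both outputs of the block to show that the inverse, which step 2.3 executes, recovers the inputs exactly.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Fn = Callable[[Vec], Vec]


def block_forward(x: Vec, m: Vec, up: Fn, down_w: Fn, down_b: Fn) -> Tuple[Vec, Vec]:
    # Step 1.2.1: add the upsampled message to the image branch
    x_new = [xi + ui for xi, ui in zip(x, up(m))]
    # Step 1.2.2: the two downsampled image representations act as weight and bias
    w, b = down_w(x_new), down_b(x_new)
    m_new = [mi * math.exp(wi) + bi for mi, wi, bi in zip(m, w, b)]
    return x_new, m_new


def block_inverse(x_new: Vec, m_new: Vec, up: Fn, down_w: Fn, down_b: Fn) -> Tuple[Vec, Vec]:
    # Undo the message update first: weight and bias depend only on x_new
    w, b = down_w(x_new), down_b(x_new)
    m = [(mi - bi) * math.exp(-wi) for mi, wi, bi in zip(m_new, w, b)]
    # Then subtract the same message contribution from the image branch
    x = [xi - ui for xi, ui in zip(x_new, up(m))]
    return x, m


def toy_up(m: Vec) -> Vec:
    return [m[0], m[1], m[0] * m[1], -m[0]]


def toy_down_w(x: Vec) -> Vec:
    return [0.1 * math.tanh(x[0] + x[1]), 0.1 * math.tanh(x[2] - x[3])]


def toy_down_b(x: Vec) -> Vec:
    return [0.5 * x[0], x[3] ** 2]


x0, m0 = [0.2, -0.4, 0.9, 0.1], [1.0, -1.0]
x1, m1 = block_forward(x0, m0, toy_up, toy_down_w, toy_down_b)
x_rec, m_rec = block_inverse(x1, m1, toy_up, toy_down_w, toy_down_b)
```

Invertibility does not require the sub-networks themselves to be invertible. Each update depends only on the branch that is not currently being modified, so it can be recomputed during the inverse.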
[0051] This completes the message transfer process of one reversible neural network block. Step 1.2.3: The watermark message output and the image output of the first reversible neural network block from steps 1.2.1 and 1.2.2 are used as the inputs of the second reversible neural network block, and the cross-scale progressive sampling and fusion of steps 1.2.1 and 1.2.2 is repeated until the last reversible neural network block has been processed, which yields the final watermark message output and image output.
[0052] The final image output is the watermarked image. The final watermark message output carries no useful information and is discarded.
[0053] In this embodiment, a mean squared error (MSE) loss and a perceptual loss are combined in a weighted sum that maximizes the similarity between the host image and the watermarked image during reversible neural network watermark embedding, thereby ensuring the visual invisibility of the watermark. The weights of the two loss terms take fixed values in this embodiment.
[0054] Step 1.3: To further enhance the spatial alignment and robustness of the subsequent decoding stage, this embodiment introduces an auxiliary alignment watermark in addition to the main watermark to achieve synchronous positioning.
[0055] To keep the auxiliary watermark imperceptible to the human eye while preserving its recognizability, this embodiment uses an adaptive addition method that combines a fixed guide template with a just-noticeable-difference model, as shown in Figure 3 (step 1.3). The input is the image with the embedded watermark message output by the reversible neural network, and the auxiliary alignment watermark is added on top of it to produce the image carrying both watermarks. When the watermarked image is actually output, its values must be converted from the network's value interval back to the pixel interval of an image.
[0056] The auxiliary alignment watermark is a non-learnable guide template composed of a 4×4 grid of equal-size color blocks of different colors separated by fixed margins (see the guide template in Figure 3). The template is used for positioning and restoration in the spatial domain, and it is Gaussian-blurred so that sharp edges do not compromise the watermark's invisibility.
[0057] A learnable adaptive intensity map network determines the weight distribution of the auxiliary watermark across regions, strengthening the alignment signal in regions that contain key structures; this embodiment uses a ConvInRelu network as its architecture.
[0058] A just noticeable difference (JND) map (prior art) is used to limit the amplitude of the auxiliary watermark.
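The template and its adaptive superposition can be sketched in plain Python. Several details are assumptions, not given in this text: the palette (any 16 distinct colours work), the margin, the blur sigma, the [-1, 1] value range, and the multiplicative combination image + strength × JND × template. The constant `strength` and `jnd` maps are placeholders for the ConvInRelu output and a JND model.

```python
import math
from typing import List, Sequence, Tuple

Channel = List[List[float]]


def guide_template(size: int = 64, grid: int = 4, margin: int = 2,
                   palette: Sequence[Tuple[float, float, float]] = ()) -> List[Channel]:
    """3 x size x size template: grid x grid flat colour blocks separated by zero margins."""
    if not palette:
        # Hypothetical palette giving every block a different RGB combination in [-1, 1]
        palette = [((3 * i % 17) / 8 - 1, (5 * i % 17) / 8 - 1, (7 * i % 17) / 8 - 1)
                   for i in range(grid * grid)]
    cell = size // grid
    tmpl = [[[0.0] * size for _ in range(size)] for _ in range(3)]
    for gy in range(grid):
        for gx in range(grid):
            color = palette[gy * grid + gx]
            for y in range(gy * cell + margin, (gy + 1) * cell - margin):
                for x in range(gx * cell + margin, (gx + 1) * cell - margin):
                    for c in range(3):
                        tmpl[c][y][x] = color[c]
    return tmpl


def gaussian_blur(ch: Channel, sigma: float = 1.0) -> Channel:
    """Separable Gaussian blur with edge clamping, softening the block edges."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    k = [v / total for v in k]
    h, w = len(ch), len(ch[0])
    rows = [[sum(k[j] * ch[y][min(max(x + j - radius, 0), w - 1)] for j in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j] * rows[min(max(y + j - radius, 0), h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def add_alignment_watermark(img: List[Channel], tmpl: List[Channel],
                            strength: Channel, jnd: Channel) -> List[Channel]:
    """img + strength * jnd * tmpl per pixel, clipped to [-1, 1] (assumed form)."""
    return [[[max(-1.0, min(1.0, img[c][y][x] + strength[y][x] * jnd[y][x] * tmpl[c][y][x]))
              for x in range(len(img[c][y]))] for y in range(len(img[c]))]
            for c in range(len(img))]


size = 32
tmpl = [gaussian_blur(ch) for ch in guide_template(size)]
img = [[[0.0] * size for _ in range(size)] for _ in range(3)]
strength = [[1.0] * size for _ in range(size)]  # would come from the ConvInRelu network
jnd = [[0.02] * size for _ in range(size)]       # would come from a JND model
marked = add_alignment_watermark(img, tmpl, strength, jnd)
```

Multiplying by the JND map caps the template amplitude where the eye is most sensitive. The learned strength map can still raise it near structures that help alignment.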
[0059] Suppose an attacker applies a combined attack of rotation, perspective transformation, compression, blurring, and noise to an image after the auxiliary alignment watermark has been added, producing a distorted image. Geometric deformations (rotation, perspective) are modeled in this embodiment by the homography matrix $H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{pmatrix}$, which is computed from the positions of the four image corners after deformation. The parameters of the compression, blurring, and noise attacks are represented collectively by a single parameter set, and the distorted image is produced by a distortion operator implemented in PyTorch; such operators can be built with existing Python image processing libraries such as torchvision or OpenCV.
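A geometric attack for training can be described by where it moves the four image corners. The original and deformed corners are the four correspondences from which the homography is computed. The sketch below draws a full-angle rotation about the centre, a translation, and an independent per-corner jitter that introduces perspective. The budgets `max_shift` and `max_jitter` are hypothetical, and the compression, blur, and noise operators are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def deform_corners(size: float, angle_deg: float, shift: Point,
                   jitter: Sequence[Point]) -> List[Point]:
    """Rotate the four image corners about the centre, translate them, then
    perturb each one independently (the perturbation creates perspective)."""
    c = size / 2
    t = math.radians(angle_deg)
    corners = [(0.0, 0.0), (size, 0.0), (size, size), (0.0, size)]
    out = []
    for (x, y), (jx, jy) in zip(corners, jitter):
        dx, dy = x - c, y - c
        rx = c + dx * math.cos(t) - dy * math.sin(t)
        ry = c + dx * math.sin(t) + dy * math.cos(t)
        out.append((rx + shift[0] + jx, ry + shift[1] + jy))
    return out


def sample_geometric_attack(size: float, rng: random.Random, max_shift: float = 16.0,
                            max_jitter: float = 24.0) -> List[Point]:
    """Draw a random full-angle rotation, translation and perspective jitter."""
    angle = rng.uniform(-180.0, 180.0)
    shift = (rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift))
    jitter = [(rng.uniform(-max_jitter, max_jitter), rng.uniform(-max_jitter, max_jitter))
              for _ in range(4)]
    return deform_corners(size, angle, shift, jitter)


deformed = sample_geometric_attack(256, random.Random(0))
```

Sampling corners rather than matrix entries keeps every training deformation within an interpretable pixel budget.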
[0060] Step 2: For the attacked watermarked image, use the distortion-restoration network to estimate the geometric transformation based on the auxiliary alignment watermark and align the image. Under the mask constraint consistent with the embedding stage, execute the reverse path of the reversible neural network to recover the watermark message.
[0061] The distortion-restoration network includes a watermark extractor (detection branch) capable of extracting the guide template and a geometric deformation estimation branch composed of a multi-expert network and a gating network, used to estimate the parameters of geometric deformation based on the changes in the guide template and restore the deformed image.
[0062] Step 2.1: This embodiment uses a trainable watermark extractor to extract the auxiliary alignment watermark from the distorted image; the extractor uses UNet as its network architecture. For the subsequent alignment to succeed, the extracted auxiliary alignment watermark must be accurate: it should match the correspondingly distorted guide template as closely as possible, and this difference is minimized with a loss function. At the same time, the image with the auxiliary alignment watermark should be as close as possible to the image without it, so that adding the auxiliary alignment watermark remains visually invisible. The total loss is a weighted sum of these two terms, whose weights are hyperparameters with fixed values in this embodiment.
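The two-term objective can be written out as a short sketch. It assumes both terms are mean squared errors, since paragraph [0020] names MSE for this training step. The default weights are hypothetical, and the inputs are flattened lists. "Distorted template" means the guide template passed through the same geometric distortion as the image.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long flat sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def extractor_loss(extracted_template: Sequence[float], distorted_template: Sequence[float],
                   marked_image: Sequence[float], unmarked_image: Sequence[float],
                   lam_align: float = 1.0, lam_invisible: float = 1.0) -> float:
    """Weighted sum of the alignment term and the invisibility term."""
    return (lam_align * mse(extracted_template, distorted_template)
            + lam_invisible * mse(marked_image, unmarked_image))
```

Raising `lam_invisible` trades alignment accuracy for a fainter auxiliary watermark.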
[0063] The distortion operator is a training-time data augmentation module and is not part of the neural network itself.
[0064] In practical implementation, if the image has been compressed or distorted during network transmission, this embodiment scales the received image to a resolution of 256×256 and adjusts its pixel values to the … range, so that the watermark extractor receives input in the expected format and produces a valid output.
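A minimal preprocessing sketch follows. The target pixel range is not specified in this paragraph, so it is left as the caller-supplied parameters `lo` and `hi`. Nearest-neighbour resizing is an assumption made to keep the example dependency-free; a real pipeline would more likely use bilinear resizing from torchvision or OpenCV. The input is a single-channel image given as a list of rows with values in [0, 255].

```python
def resize_nearest(img, out_h=256, out_w=256):
    """Nearest-neighbour resize of a 2-D list-of-lists grayscale image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def rescale_range(img, lo, hi, max_val=255.0):
    """Linearly map pixel values from [0, max_val] to [lo, hi]."""
    scale = (hi - lo) / max_val
    return [[lo + p * scale for p in row] for row in img]


def preprocess(img, lo, hi):
    """Resize to 256x256, then rescale to the (assumed) extractor input range."""
    return rescale_range(resize_nearest(img), lo, hi)
```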
[0065] Step 2.2: In this embodiment, the geometric deformation estimation branch of the distortion-restoration network takes the extracted auxiliary alignment watermark and the preset guide template as input and outputs the estimated homography matrix of the deformation, as shown in Figure 3.
[0066] Step 2.2.1: This embodiment uses HomoNet as the base network, which takes the extracted auxiliary alignment watermark and the guide template as input and outputs a homography matrix. Multiple HomoNet base networks are constructed, each outputting its own homography estimation matrix. These HomoNet instances have identical structures but independent parameters. As restoration-network experts, they specialize during training on different types and scales of geometric deformation, such as affine shifts, perspective distortion, or nonlinear compression, and together form an ensemble capable of multi-view deformation estimation.
[0067] For the homography matrix, this embodiment uses corner regression. Specifically, given the four corner points of the undistorted image, the network regresses the positions of those four corners after deformation, and the homography matrix is then obtained by solving a linear system.
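The linear solve can be illustrated with the standard four-point direct linear transform (DLT): fixing $h_{33} = 1$ leaves eight unknowns, and each corner correspondence contributes two equations. This is a sketch of that textbook technique in pure Python, not the patent's code; in a PyTorch pipeline the same system would normally be solved with a batched linear solver. The eight solved values are laid out exactly as in the direct parametric regression alternative described in the next paragraph.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_corners(src, dst):
    """3x3 homography (h33 = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h7 x + h8 y + 1) = h1 x + h2 y + h3, and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(h, x, y):
    """Project a point through H, including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```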
[0068] Alternatively, direct parametric regression can be used: the network directly regresses the eight independent parameters $h_1, \dots, h_8$, which form the homography matrix
$$H = \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \end{pmatrix}.$$
Step 2.2.2: In parallel with the expert networks described above, this embodiment establishes a soft-gated expert routing network that controls the gating weight of each expert.
[0069] The soft-gated expert routing network is a convolutional network. It takes the auxiliary alignment watermark extracted by the network, the guide template, and the image as input, concatenates them along the channel dimension, and passes the result through several convolutional layers followed by a linear layer that outputs the gating weight of each expert. The gating weight of each expert reflects how well the input sample matches that expert's capabilities, which enables dynamic soft selection over images whose geometric distortion types are randomly sampled from the data.
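The gate's output stage and the two regularizers mentioned in the next paragraph can be sketched as below. Three assumptions are made here because this section does not give the formulas. First, the linear-layer logits are normalized with a softmax. Second, load balancing uses the squared coefficient of variation of per-expert importance over a batch, a form common in mixture-of-experts work. Third, the entropy term is the mean Shannon entropy of each sample's gate distribution. How the patent signs and weights these terms is not stated.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's expert logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def load_balance_cv2(batch_weights):
    """Squared coefficient of variation of per-expert importance (0 = perfectly balanced)."""
    k = len(batch_weights[0])
    importance = [sum(w[j] for w in batch_weights) for j in range(k)]
    mean = sum(importance) / k
    var = sum((x - mean) ** 2 for x in importance) / k
    return var / (mean ** 2)


def mean_gate_entropy(batch_weights, eps=1e-12):
    """Average Shannon entropy (in nats) of each sample's gate distribution."""
    ents = [-sum(p * math.log(p + eps) for p in w) for w in batch_weights]
    return sum(ents) / len(ents)
```

A low `load_balance_cv2` indicates that no single expert absorbs the whole batch, which is the collapse the regularization is meant to prevent.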
[0070] To stabilize expert utilization and prevent collapse, this embodiment adds load-balancing and entropy-regularization terms to the loss function of the gating branch.
Step 2.2.3: After the outputs $H_k$ of all experts and their corresponding gating weights $w_k$ are obtained, the final homography estimation matrix is obtained by weighted fusion: $\hat{H} = \sum_{k} w_k H_k$.
To make geometric alignment trainable end to end, this embodiment introduces a differentiable geometric transformation module based on grid sampling in the subsequent correction stage. This module generates a sampling grid from $\hat{H}$ and resamples the distorted image to obtain a geometrically aligned image. Because the aligned image differs only minimally from the original at the pixel level, the original watermark message can be extracted more reliably. During inference, the aligned image can be obtained directly with a non-differentiable pixel-level geometric transformation. In this embodiment, the training loss of the geometric deformation estimation branch of the distortion-restoration network is a weighted combination of loss terms, with loss weights fixed in this embodiment.
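The fusion of Step 2.2.3 and the subsequent resampling can be sketched as follows. The sketch assumes that $\hat{H}$ maps original-image coordinates to distorted-image coordinates, so each aligned pixel $(x, y)$ is read from the distorted image at $\hat{H}(x, y)$ (backward warping). Nearest-neighbour lookup stands in for the differentiable bilinear grid sampler used in training, for example `torch.nn.functional.grid_sample`. Whether the patent renormalizes $h_{33}$ after fusion is not stated, so no renormalization is applied here.

```python
def fuse_homographies(mats, weights):
    """Weighted fusion H = sum_k w_k * H_k of 3x3 expert outputs."""
    return [[sum(w * m[i][j] for m, w in zip(mats, weights)) for j in range(3)]
            for i in range(3)]


def warp_back(distorted, h, fill=0.0):
    """Backward-warp a 2-D image: output (x, y) samples `distorted` at H(x, y).

    Nearest-neighbour stand-in for bilinear grid sampling; pixels that map
    outside the distorted image are set to `fill`.
    """
    rows, cols = len(distorted), len(distorted[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            w = h[2][0] * x + h[2][1] * y + h[2][2]
            if abs(w) < 1e-12:
                row.append(fill)
                continue
            sx = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
            sy = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
            ix, iy = round(sx), round(sy)
            inside = 0 <= ix < cols and 0 <= iy < rows
            row.append(distorted[iy][ix] if inside else fill)
        out.append(row)
    return out
```

For example, if the attack shifted the image one pixel to the right, `H = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]` and `warp_back` shifts it back, filling the uncovered right column with `fill`.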
[0071] Step 2.3: Apply a geometric security mask, consistent with the one used in the embedding stage of Step 1, to the geometrically aligned image from Step 2.2. Input the image within the masked area into the reverse process of the reversible neural network of Step 1 to recover the original watermark message for verification or decoding. The specific process is shown in Figure 2 (step 2.3).
[0072] The reverse process of the reversible neural network uses exactly the same network parameters as the forward process in step 1.2. Because the reversible neural network is mathematically invertible, the message expansion and image fusion performed in the embedding stage can be completely reversed in the extraction stage, thereby recovering the original watermark message.
[0073] In the inverse process of the reversible neural network, this embodiment uses an all-zero matrix as the input to the watermark message branch; this all-zero matrix is the message-branch input of the last reversible neural network block.
[0074] Step 2.3.1: Masked image preprocessing. As in step 1.2.2, this embodiment applies the geometric security mask to the restored (geometrically aligned) image, which yields the image-branch input of the last reversible neural network block.
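One way to size and apply a central square mask is sketched below. Its geometric premise is that, for any rotation about the image centre, the content inside the inscribed circle of radius $N/2$ stays in frame, and the largest centred axis-aligned square inside that circle has side $N/\sqrt{2}$, about 181 pixels for $N = 256$. The side length actually used in the embodiment is not given in this section. The `margin` parameter is a hypothetical allowance for translation and perspective, and broadcasting the mask over channels follows claim 2.

```python
import math


def rotation_safe_side(size, margin=0):
    """Side of the largest centred square that survives any rotation about the centre."""
    return max(0, int(math.floor(size / math.sqrt(2))) - 2 * margin)


def central_square_mask(size, side):
    """Binary H x W mask with a centred square of ones."""
    start = (size - side) // 2
    return [[1.0 if start <= i < start + side and start <= j < start + side else 0.0
             for j in range(size)] for i in range(size)]


def apply_mask(channels, mask):
    """Broadcast one H x W mask over every channel by element-wise product."""
    return [[[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(ch, mask)]
            for ch in channels]
```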
[0075] Step 2.3.2: Fusion of image and watermark information. The operations described in step 1.2.2 are applied to the two inputs; the sub-networks used here have exactly the same parameters as those in step 1.2.2, which keeps the entire process reversible. The watermark-message input of the previous reversible neural network block is then obtained by inverse computation using element-wise subtraction and element-wise division. This result is the output of the last reversible neural network block in the reverse process and, at the same time, the input to the previous (second-to-last) reversible neural network block. By the rules of reversible computation, it also serves as the representation of the watermark information within the image information.
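The role of element-wise subtraction and division can be illustrated with an affine coupling block of the kind used in reversible image-hiding networks. This sketch assumes an ordering consistent with the text: in the forward pass the message updates the image additively, and the updated image then scales and shifts the message. The inverse therefore recovers the message first (subtract, then divide) and the image second (subtract). The sub-networks `phi`, `rho`, and `eta` are hypothetical fixed functions standing in for learned networks, and both branches are kept at the same scale for brevity, whereas the patent resamples across scales.

```python
import math


def phi(v):
    """Stand-in for the message-to-image update network."""
    return [0.5 * x for x in v]


def rho(v):
    """Stand-in for the scale network (tanh keeps exp() bounded)."""
    return [math.tanh(x) for x in v]


def eta(v):
    """Stand-in for the shift network."""
    return [x * x - 1.0 for x in v]


def coupling_forward(img, msg):
    img_out = [a + b for a, b in zip(img, phi(msg))]
    s, t = rho(img_out), eta(img_out)
    msg_out = [m * math.exp(si) + ti for m, si, ti in zip(msg, s, t)]
    return img_out, msg_out


def coupling_inverse(img_out, msg_out):
    s, t = rho(img_out), eta(img_out)  # same parameters as in the forward pass
    msg = [(m - ti) / math.exp(si) for m, si, ti in zip(msg_out, s, t)]  # subtract, then divide
    img = [a - b for a, b in zip(img_out, phi(msg))]
    return img, msg
```

In the extraction stage described above, the message-branch input of the last block is an all-zero matrix, so the call would take the form `coupling_inverse(aligned_masked_image, [0.0] * n)`.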
[0076] As in step 1.2.1, this embodiment applies the cross-scale upsampling operation to obtain a representation of the watermark message at the scale of the image message, and the image-message output of the last reversible neural network block is then obtained through reversible computation.
Step 2.3.3: The watermark-message output and the image representation of the last reversible neural network block obtained in step 2.3.2 are used as the input of the second-to-last reversible neural network block, and the cross-scale stepwise sampling and fusion process of step 2.3.2 is repeated until the reverse process is complete. After all reversible neural network blocks have been traversed, a watermark-message output and an image output are obtained. The watermark-message output is the watermark message restored by the reverse process of the reversible neural network, while the image-message output is discarded as meaningless.
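The cross-scale sampling in steps 2.3.2 and 2.3.3 moves representations between the message grid and the image grid. A minimal sketch follows, assuming nearest-neighbour repetition for upsampling and non-overlapping average pooling for downsampling, with dimensions divisible by the factor. The patent's blocks use learned multi-scale fusion, so these operators only illustrate the resolution bookkeeping. "Stepwise" sampling can be read as repeated factor-2 steps, for example 16 → 32 → 64 → 128 → 256.

```python
def upsample_nearest(grid, factor):
    """Repeat each cell factor x factor times (message scale -> image scale)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def downsample_avg(img, factor):
    """Average non-overlapping factor x factor blocks (image scale -> message scale)."""
    h, w = len(img), len(img[0])
    return [[sum(img[i + di][j + dj] for di in range(factor) for dj in range(factor))
             / (factor * factor)
             for j in range(0, w, factor)]
            for i in range(0, h, factor)]
```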
[0077] From the restored watermark message, the actual watermark payload can be extracted directly or recovered through ECC verification. The value of each watermark bit is then determined from the value of the corresponding element, which gives the final restored watermark message bit by bit.
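The layout and read-out around this step can be sketched as a round trip. On the embedding side, a one-dimensional payload is padded to the next perfect-square length and laid out row-major on a square grid. On the extraction side, the restored grid is flattened, the padding is dropped, and each value is binarized. The zero padding, row-major order, and 0.5 threshold are assumptions, and error-correction coding is omitted.

```python
import math


def payload_to_grid(bits, fill=0):
    """Pad a 1-D bit list to the next perfect-square length and lay it out row-major."""
    side = math.isqrt(len(bits))
    if side * side < len(bits):
        side += 1
    padded = list(bits) + [fill] * (side * side - len(bits))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def grid_to_bits(grid, payload_len, threshold=0.5):
    """Flatten row-major, drop the padding, and binarise each restored value."""
    flat = [v for row in grid for v in row][:payload_len]
    return [1 if v >= threshold else 0 for v in flat]
```

Keeping `payload_len` alongside the grid is what makes the extracted vector exactly as long as the embedded payload.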
[0078] During training in this embodiment, the network parameters of the distortion-restoration network are frozen, and the parameters of the reversible neural network are trained by running its forward and inverse processes. Because the inverse process only needs to ensure that the watermark message is extracted accurately, its loss function is a message-recovery loss. The loss function for training the overall reversible neural network is a weighted combination of its loss terms, with weights fixed in this embodiment.
[0079] The overall loss function in this embodiment comprises a reversible-neural-network part and a distortion-restoration-network part, combined through a loss weight fixed in this embodiment. This embodiment requires the extracted watermark information to be essentially consistent with the original watermark information, with a bit accuracy of over 99%. The bit accuracy is calculated as
$$\mathrm{ACC} = \frac{1}{L} \sum_{i=1}^{L} \mathbb{1}\left(m_i = \hat{m}_i\right),$$
where $\mathbb{1}(\cdot)$ is the indicator function, returning 1 if the two values are equal and 0 otherwise, $L$ is the number of watermark bits, and $m_i$ and $\hat{m}_i$ are the values of the $i$-th bit of the original watermark information and of the predicted watermark information, respectively.
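The bit-accuracy formula translates directly into code; this is a plain implementation of the indicator-function average above, with a check against the 99% requirement.

```python
def bit_accuracy(true_bits, pred_bits):
    """Fraction of positions where the predicted bit equals the original bit."""
    if len(true_bits) != len(pred_bits) or not true_bits:
        raise ValueError("bit sequences must be non-empty and of equal length")
    return sum(1 for a, b in zip(true_bits, pred_bits) if a == b) / len(true_bits)


def meets_requirement(true_bits, pred_bits, target=0.99):
    """True if the bit accuracy exceeds the target stated for this embodiment."""
    return bit_accuracy(true_bits, pred_bits) > target
```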
[0080] This embodiment also requires the watermarked image to have good visual imperceptibility, measured specifically by the PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) metrics. 1) Image fidelity is evaluated by
$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}},$$
where MSE is the mean squared error between the original and watermarked images and MAX is the maximum pixel value; this embodiment can guarantee a PSNR of … under attack conditions.
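[0081] 2) The structural integrity of the image is evaluated by
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
where $\mu_x$ and $\mu_y$ are the mean values of the original image $x$ and the watermarked image $y$, $\sigma_x$ and $\sigma_y$ are their standard deviations, $\sigma_{xy}$ is their covariance, and $C_1$ and $C_2$ are constants that maintain numerical stability.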
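Both metrics can be computed from the formulas above as follows. Two assumptions apply. First, SSIM is evaluated over the whole image as a single window, whereas the usual implementation averages SSIM over local Gaussian-weighted windows. Second, the constants follow the common choice $C_1 = (0.01\,\mathrm{MAX})^2$ and $C_2 = (0.03\,\mathrm{MAX})^2$, since their values are not given here. Images are flattened lists of pixel values.

```python
import math


def psnr(x, y, max_val=255.0):
    """PSNR in dB between two flat pixel lists; infinite for identical images."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM using the means, variances and covariance of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```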
[0082] This embodiment also requires the embedded watermark capacity to remain high, measured by the BPP metric (bits per pixel, the average number of embedded bits per pixel). The method of this invention is compared with representative approaches from four categories: traditional methods (DwtDctSvd), classic deep learning methods (RivaGAN, MBRS, RoSteALS), methods with deformation recovery networks (StegaStamp, PIMoG), and reversible neural network methods (CIN). The test results are shown in Figure 5, where the vertical axis is the overall bit accuracy, the horizontal axis is structural similarity (SSIM), the circle size is the bit capacity (BPP), and each label box gives the method name and BPP value. ① denotes the method of this invention, ② the StegaStamp method, ③ the RivaGAN method, ④ the RoSteALS method, ⑤ the MBRS method, ⑥ the CIN method, ⑦ the PIMoG method, and ⑧ the DwtDctSvd method. The results show that the method of this invention achieves the best performance across all metrics.
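BPP is the payload length divided by the number of pixel positions. A short sketch follows, assuming pixels are counted per spatial position rather than per colour channel (conventions differ between papers). The 256-bit payload in the usage line is hypothetical and is not the capacity reported in Figure 5.

```python
def bits_per_pixel(payload_bits, height, width):
    """Average number of embedded bits per pixel position."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    return payload_bits / (height * width)


# Hypothetical payload on a 256x256 cover image.
example_bpp = bits_per_pixel(256, 256, 256)
```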
[0083] The above description is merely a description of preferred embodiments of this application and is not intended to limit the scope of this application in any way. Any changes or modifications made by those skilled in the art based on the above-disclosed technical content should be considered as equivalent and valid embodiments and fall within the scope of protection of the technical solution of this application.
[0084] Innovations: This invention relates to digital watermarking technology, in particular a robust, high-capacity watermark embedding and extraction method that remains effective under complex geometric attacks and composite value-range distortions. By constructing an overall "embedding-synchronization-alignment-reversible extraction" framework, the invention improves watermark recovery and information-carrying capacity under various attacks while preserving visual imperceptibility.
[0085] Innovation 1: a collaborative alignment mechanism combining a geometric security mask with a content-adaptive auxiliary alignment watermark. The invention proposes a central square geometric security mask that remains stable under full-angle rotation and perspective deformation and is used consistently in both the embedding and extraction stages. In addition, a fixed 4×4 guide template is introduced and weighted by an intensity map predicted by a neural network; combined with perceptual budgeting and JND suppression, this achieves a "detectable yet imperceptible" embedding. The detection branch of the distortion-restoration network estimates the template, while the geometric estimation branch regresses the homography (optionally with iterative refinement). Soft-gated experts adapt to different geometric distortions, and differentiable grid sampling achieves precise alignment, which improves synchronization and alignment stability under complex distortion combinations at its source.
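Iterative refinement of the homography, as mentioned above and in claim 7, composes a sequence of small predicted updates with the initial estimate. A minimal sketch follows, assuming each update is left-multiplied onto the current estimate and the result is renormalized so that $h_{33} = 1$. The composition order and normalization are assumptions; the patent does not specify them in this section.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def normalise(h):
    """Scale H so that its bottom-right entry is 1."""
    s = h[2][2]
    if abs(s) < 1e-12:
        raise ValueError("h33 is zero; cannot normalise")
    return [[v / s for v in row] for row in h]


def refine(h0, updates):
    """Compose small homography updates onto the initial estimate, one at a time."""
    h = normalise(h0)
    for dh in updates:
        h = normalise(matmul3(dh, h))
    return h
```

With single regression, `updates` is simply empty and the initial estimate is returned unchanged.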
[0086] Innovation 2: high-capacity cross-scale embedding and consistent reverse extraction with a reversible neural network. The invention uses a reversible neural network to realize a bidirectional mapping between images and messages. It designs a coupling of "cross-scale message upsampling injection + cross-scale image downsampling backflow" together with multi-scale residual dense fusion, maps a one-dimensional payload onto a square two-dimensional layout, and supports padding and error-correction extensions. The forward pass retains only the image-branch output. In the extraction stage, the reverse path is executed under geometric alignment and mask gating to recover the original message. This balances high capacity with visual imperceptibility and significantly improves the extraction success rate under combined attacks such as rotation, perspective, compression, blur, and noise.
Claims
1. A geometrically robust reversible neural network image digital watermarking method, characterized in that it comprises the following steps: Step 1: Under the gating of a geometric security mask, the watermark message to be embedded is mapped and embedded into the host image by a reversible neural network to obtain a watermarked image, and an auxiliary alignment watermark is superimposed on the watermarked image in a spatially adaptive manner; Step 2: For the attacked watermarked image, a distortion-restoration network estimates the geometric transformation from the auxiliary alignment watermark and aligns the image; then, under a mask constraint consistent with the embedding stage, the reverse path of the reversible neural network is executed to recover the watermark message.
2. The method according to claim 1, characterized in that step 1 specifically comprises: Step 1.1: Scale the input image to 256×256 resolution, construct a central square geometric security mask under the constraints of full-angle rotation, perspective distortion, and vertical and horizontal translation of the image, and broadcast the mask to all image channels; Step 1.2: Under the region consistency constraint defined by the geometric security mask, input the scaled image and the low-resolution message into the reversible neural network, inject the message into the network in a cross-scale manner, and fuse it with the image branch; after forward propagation is complete, retain only the image-branch output as the watermarked image; Step 1.3: On the watermarked image, superimpose the auxiliary alignment watermark weighted by a content-adaptive intensity map, to maintain detectability and improve alignment stability under complex distortion.
3. The method according to claim 2, characterized in that, in step 1.2, the watermark message and the image are fused step by step across scales and then propagated forward by reversible computation; the reversible neural network comprises several reversible neural network blocks connected in series; each reversible neural network block includes a cross-scale progressive upsampling path, used to fuse the watermark information into the image, and a cross-scale progressive downsampling path, used to fuse the image into the watermark information; specifically: Step 1.2.1: Set the watermark message length to a perfect square; a message whose length is not a perfect square is first extended to a perfect-square length by padding and error-correction coding and then given the same layout mapping; map the message through cross-scale upsampling to a representation of the same size as the image and fuse it with the image; Step 1.2.2: Map the image through masked multi-scale downsampling to a representation of the same size as the watermark message and fuse it with the watermark message; Step 1.2.3: Feed the fused image and watermark message representations into the next reversible neural network block and repeat steps 1.2.1 to 1.2.3. After the multiple layers of reversible neural network blocks, scale the image output by the last block back to the original size and use it as the image with embedded watermark information.
4. The method according to claim 3, characterized in that, in step 1.3, the auxiliary alignment watermark is added, using an adaptive intensity map, to the image in which the watermark information was embedded in step 1.2.3; the adaptive intensity map is learned by a neural network with a mean squared error loss function and is modeled on image statistical features.
5. The method according to claim 4, characterized in that the adaptive intensity map described in step 1.3 uses a fixed 4×4 color-block guide template; the color blocks at different positions have different color combinations, providing a stable detection response under geometric and pixel distortion conditions.
6. The method according to claim 4, characterized in that, In step 1.3, the learnable neural network is optimized using the mean squared error loss function combined with the extracted auxiliary alignment watermark.
7. The method according to claim 1, characterized in that step 2 trains the distortion-restoration network to restore the attacked image and extracts the watermark message using the reversible neural network; specifically: Step 2.1: Scale the input image to 256×256 resolution, use the distortion-restoration network to detect the auxiliary alignment watermark, and output the estimated auxiliary alignment watermark; Step 2.2: In the geometric estimation stage, adopt either single regression or iterative refinement: single regression directly produces the final homography matrix, whereas iterative refinement predicts a sequence of small homography updates and composes them into the final result; the iterative refinement can be optimized during training or executed independently during inference; Step 2.3: Apply a geometric security mask consistent with the embedding stage to the geometrically aligned image, input the image within the masked area into the reverse path of the reversible neural network, and recover the original watermark message for verification or decoding.
8. The method according to claim 7, characterized in that the distortion-restoration network in step 2.2 uses a soft-gated expert routing mechanism in which different experts specialize in different geometric deformations, including rotation, scaling, affine transformation, projection, cropping, and translation, and the gating performs soft selection based on responses related to the guide template and on image statistical features; specifically: Step 2.2.1: Using the auxiliary alignment watermark extracted in step 2.1 and the actual auxiliary alignment watermark embedded in step 1.3 as input, construct multiple expert networks with the same network structure; Step 2.2.2: Construct a soft-gated expert routing network for the multiple expert networks of step 2.2.1; this network outputs the gating weight of each expert; Step 2.2.3: Weight the output of each expert by its gating weight and combine them to obtain the final output of the soft-gated expert network.
9. The method according to claim 8, characterized in that the watermark message recovery process in step 2.3 is the reverse inference path of the reversible neural network; its structure corresponds to steps 1.2.1 to 1.2.3, but with the data flow reversed. Because the reversible neural network is mathematically invertible, the message expansion and image fusion process of the embedding stage can be completely reversed in the extraction stage, thereby recovering the original watermark message; specifically: Step 2.3.1: In the geometrically aligned watermarked image, first apply the geometric security mask and use the pixel values within the mask area as valid input; Step 2.3.2: As the reverse process of the reversible neural network of step 1, map the image message and the watermark message to each other through cross-scale stepwise sampling, and then fuse them through the inverse operations of the reversible neural network of step 1; Step 2.3.3: Repeat step 2.3.2. After the reverse process through the multiple layers of reversible neural network blocks, inversely map the output watermark message features, according to the two-dimensional spatial layout, into a one-dimensional watermark message vector whose length strictly matches the original payload message of the embedding stage, thereby achieving lossless extraction.
10. The method according to claim 9, characterized in that, The network parameters of the reversible neural network in step 2.3 are the same as those of the reversible neural network in step 1.2.