Watermark embedding method and watermark embedding apparatus
By acquiring a frequency domain feature coefficient matrix of the image and applying frequency band weighting and hybrid domain fusion, the method addresses the vulnerability of watermark elements to attack and achieves highly robust, adaptive watermark embedding.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 武汉启云方科技有限公司
- Filing Date
- 2026-04-29
- Publication Date
- 2026-06-12
AI Technical Summary
In existing technologies, watermark elements are vulnerable to attacks and have weak robustness. Furthermore, the generation and rendering logic are entirely on the front end, making it difficult to defend against forgery and man-in-the-middle attacks.
By obtaining the frequency domain image feature coefficient matrix of the original image, performing frequency band weighted processing, generating dynamic watermark information, and performing hybrid domain fusion in the spatial and frequency domains, the dynamic adaptability and anti-attack capability of watermark embedding are improved.
It enhances the robustness and applicability of watermarks, effectively resisting attacks and ensuring the stability and security of watermark information.
Figure CN122199245A_ABST
Description
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to a watermark embedding method and a watermark embedding device.
Background Technology
[0002] With the widespread adoption of internet technology, enterprises have increasingly prominent needs for data leakage prevention and copyright protection. Especially in scenarios such as internal data security and user interaction (UI) / user experience (UX) design protection, front-end watermarking technology has become a common means of protecting original content and preventing the theft, alteration, or leakage of sensitive information and digital assets. Current technologies typically use a method of concatenating time information and user information into a dynamic watermark, generating the watermark element based on scalable vector graphics (SVG), and then integrating it as a component into the Vue framework, using data binding mechanisms to achieve dynamic updates and rendering of the watermark.
[0003] However, existing technical solutions have significant limitations: on the one hand, watermark elements are vulnerable to attacks from developer tools. Attackers can hide watermarks by directly deleting or modifying Document Object Model (DOM) nodes (such as removing SVG elements) or by using cascading style sheets (CSS) styles. On the other hand, the robustness of watermarks in existing technologies is weak. Their generation mechanism relies solely on the state of front-end components and is not deeply bound to critical business data (such as user identity tokens or database records), making watermarks easy to forge or impersonate. Furthermore, since the generation and rendering logic of watermarks is entirely placed in the front-end environment, attackers can reverse engineer the watermark generation rules using debugging tools, and even simulate legitimate watermark content, leaving the technology lacking effective protection against man-in-the-middle attacks or malicious forgery.
Summary of the Invention
[0004] This application provides a watermark embedding method and a watermark embedding device, which can dynamically select frequency bands for watermark embedding based on frequency domain image features. The watermark information embedded in each frequency band is dynamically generated based on the frequency band image features, which improves the dynamic adaptability and anti-attack capability of watermark embedding, and has high robustness and applicability.
[0005] In a first aspect, this application provides a watermark embedding method, comprising: obtaining a frequency domain image feature coefficient matrix of an original image; performing frequency band weighting processing on the frequency domain image feature coefficient matrix based on image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix; generating watermark information corresponding to each frequency band among the multiple frequency bands based on the weighted frequency domain image feature coefficient matrix; generating a watermarked frequency domain image feature coefficient matrix based on watermark embedding indication information corresponding to the multiple frequency bands, the watermark information corresponding to each frequency band, and the weighted frequency band image feature matrix, wherein the watermark embedding indication information corresponding to the multiple frequency bands is used to indicate the embedding position and / or the embedding strength of the watermark information; performing hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains based on the original image and the watermarked frequency domain image feature coefficient matrix to generate a fused image feature matrix after watermark embedding; and generating a watermarked target image based on the fused image feature matrix after watermark embedding.
[0006] In this application, the frequency domain image feature coefficient matrix of the original image is first obtained. This matrix is then weighted by frequency band, based on the image feature weighting coefficients corresponding to multiple frequency bands, to obtain a weighted frequency domain image feature coefficient matrix, from which watermark information corresponding to each frequency band is generated. Next, the watermark information corresponding to each frequency band is embedded into the weighted frequency domain image feature coefficient matrix according to the watermark embedding indication information corresponding to each frequency band, generating a watermarked frequency domain image feature coefficient matrix. Finally, based on the original image, the watermarked frequency domain image feature coefficient matrix is subjected to hybrid domain fusion processing in the spatial and frequency domains to generate a fused image feature matrix after watermark embedding, from which a watermarked target image is generated. Because the watermark information of each frequency band is generated dynamically from the frequency domain image features, and the frequency bands used for embedding are selected dynamically, the dynamic adaptability and anti-attack capability of watermark embedding are improved, with high robustness and applicability.
[0007] In one possible implementation of the first aspect, the acquisition of the frequency domain image feature coefficient matrix of the original image includes: acquiring the spatial domain image block feature matrix of the original image; acquiring the basis matrix for frequency domain transformation of the image features; and performing a frequency domain transformation on the spatial domain image block feature matrix using the basis matrix to obtain the frequency domain image feature coefficient matrix.
[0008] In this application, the spatial image patch feature matrix of the original image and the basis matrix of the image feature matrix can be obtained. Based on the basis matrix, a frequency domain transformation is performed on the spatial image patch feature matrix to obtain a frequency domain image feature coefficient matrix. This achieves the transformation of features originally expressed in the spatial domain to the frequency domain. Using the spatial image patch feature matrix for frequency domain transformation instead of using the pixels of the original image for frequency domain transformation can obtain a more efficient and stable feature representation while preserving key information. Since the basis matrix has good decorrelation and energy compression properties, it can concentrate the dispersed energy in the spatial features onto a few low-frequency coefficients, thereby obtaining a highly sparse frequency domain coefficient matrix, achieving feature dimensionality reduction, and significantly reducing storage and subsequent computational overhead. The frequency domain transformation separates redundant information in the spatial domain. The low-frequency part is insensitive to noise and other interference, enhancing the robustness of the features; the high-frequency part retains details and edge information, facilitating flexible selection of coefficients according to task requirements. Simultaneously, the transformation based on the basis matrix is linear and invertible, allowing for lossless transformation of features between the frequency and spatial domains, facilitating feature analysis and reconstruction.
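The forward and inverse transforms described above can be illustrated with a minimal NumPy sketch. It assumes a fixed orthonormal DCT-II matrix stands in for the basis matrix and an 8x8 block size; the names `dct_basis`, `B`, `X`, and `F` are illustrative and do not come from the application.

```python
import numpy as np

def dct_basis(n=8):
    """Orthonormal DCT-II basis matrix of size n x n (assumed stand-in basis)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index, one per row
    i = np.arange(n).reshape(1, -1)   # sample index, one per column
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)    # DC row uses a different normalization
    return basis

B = dct_basis(8)
rng = np.random.default_rng(0)
X = rng.random((8, 8))      # stand-in for one spatial-domain block
F = B @ X @ B.T             # forward transform: frequency-domain coefficients
X_rec = B.T @ F @ B         # inverse transform: back to the spatial domain
```

Because the basis is orthonormal, the inverse is just the transpose, which is what makes the round trip between the two domains lossless up to floating-point error.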
[0009] In one possible implementation of the first aspect, the acquisition of the spatial image block feature matrix of the original image includes: acquiring the original image, performing image normalization processing and segmenting the original image into blocks; and generating the spatial image block feature matrix of the original image according to the image data of each image block obtained after segmentation.
[0010] In this application, after acquiring the original image, image standardization and image segmentation can be performed on the original image. By performing image standardization processing such as normalization on the original image, the original images acquired by different devices can be unified into the same numerical range, eliminating the differences in brightness, contrast and color response of the original image caused by device differences. Image segmentation of the original image divides the original image into image blocks of uniform size, providing regular input units (i.e., the aforementioned image blocks) for subsequent processing. Then, the spatial domain image block feature matrix of the original image can be generated based on each segmented image block, preserving the image information of the original image and laying a good input foundation for subsequent watermark information embedding.
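The normalization and segmentation step can be sketched as follows. This is an illustration under stated assumptions: an 8-bit grayscale input, division by 255 as the normalization, a block size of 8, and cropping of edge pixels that do not fill a whole block. The function name `to_block_matrix` is hypothetical.

```python
import numpy as np

def to_block_matrix(image, block=8):
    """Normalize an 8-bit grayscale image to [0, 1] and split it into blocks.

    Returns an array of shape (rows, cols, block, block). Edge pixels that do
    not fill a whole block are cropped (an assumption; padding would also work).
    """
    img = np.asarray(image, dtype=np.float64) / 255.0
    h, w = img.shape
    rows, cols = h // block, w // block
    img = img[: rows * block, : cols * block]
    # Reshape to (rows, block, cols, block), then move the two block axes last.
    return img.reshape(rows, block, cols, block).swapaxes(1, 2)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(34, 50), dtype=np.uint8)
blocks = to_block_matrix(image)   # 4 x 6 grid of 8 x 8 blocks
```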
[0011] In one possible implementation of the first aspect, the acquisition of the basis matrix for image feature frequency domain transformation includes: constructing an initial basis matrix with the same matrix size as the spatial domain image patch feature matrix, wherein the initial values of the matrix elements of the initial basis matrix are standard discrete cosine transform basis functions; using the initial basis matrix as trainable parameters of a neural network, and acquiring multiple sets of sample data as training input data for the neural network; performing multiple backpropagation optimization training on the matrix elements of the initial basis matrix using a backpropagation algorithm to obtain a trained initial basis matrix; and using the trained initial basis matrix as the basis matrix for image feature frequency domain transformation, wherein the backpropagation algorithm uses a loss function constructed according to the orthogonal regularization term of the basis matrix to optimize and update the matrix elements of the basis matrix.
[0012] In this application, a basis matrix of the same size as the feature matrix of the spatial domain image patch can be constructed. This basis matrix can serve as a trainable parameter for the neural network, and its initial value can be a standard discrete cosine transform basis function. Multiple sets of sample data are used as the training input data for the neural network, and the matrix elements of the basis matrix are optimized through multiple backpropagation training based on the backpropagation algorithm to obtain the trained basis matrix. The loss function constructed by the regularization term improves the stability and controllability of the learning process. By replacing the traditional fixed standard discrete cosine transform basis function with a trainable basis matrix, the transform space can be learned and optimized from multiple sets of sample data. This allows the transformed frequency domain representation (i.e., the frequency domain image feature coefficient matrix) to more compactly and specifically represent the key features of the spatial domain image patch feature matrix, filtering or reducing the impact of noise on image features, thus exhibiting high applicability.
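The role of the orthogonal regularization term can be illustrated in isolation. The sketch below assumes the penalty takes the common form ||B Bᵀ - I||² (squared Frobenius norm) and minimizes only that term by plain gradient descent; the application's full loss function, network, and training data are not specified here, so everything except the DCT initialization is an assumption.

```python
import numpy as np

def dct_basis(n=8):
    """Orthonormal DCT-II matrix, used as the initial value of the basis."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis

def orth_penalty(B):
    """Assumed orthogonal regularization term: ||B B^T - I||_F^2."""
    R = B @ B.T - np.eye(B.shape[0])
    return float(np.sum(R * R))

def orth_penalty_grad(B):
    """Gradient of the penalty with respect to B: 4 (B B^T - I) B."""
    return 4.0 * (B @ B.T - np.eye(B.shape[0])) @ B

rng = np.random.default_rng(0)
# Simulate a basis that training has pushed away from orthogonality.
B0 = dct_basis(8) + 0.05 * rng.standard_normal((8, 8))
B = B0.copy()
for _ in range(200):
    B -= 0.05 * orth_penalty_grad(B)   # gradient step on the penalty only
```

In a real training loop this penalty would be added to a task loss, so the basis can drift from the DCT while staying close to invertible.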
[0013] In one possible implementation of the first aspect, the plurality of frequency bands include a first frequency band, a second frequency band, and a third frequency band obtained by dividing the spatial frequency based on the matrix coordinate system corresponding to the frequency domain image feature coefficient matrix; wherein the first frequency band, the second frequency band, and the third frequency band are arranged in ascending order of frequency, the first frequency band corresponds to the main structure of the original image, the second frequency band corresponds to the edge transition region of the original image, and the third frequency band corresponds to the noise and texture details in the original image.
[0014] In this application, the image can be divided into a first frequency band, a second frequency band, and a third frequency band according to the spatial frequency corresponding to the matrix coordinate system of the frequency domain image feature coefficient matrix. This can effectively distinguish the main structure, edge transition region, and noise of the original image, and realize the layered, decoupled, and interpretable processing of the original image information. This lays the foundation for subsequent generation of watermark information by frequency band division and embedding of the corresponding watermark information, improves the interpretability, robustness, and flexibility of watermark embedding, and has strong applicability.
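One simple way to realize such a three-band division is to threshold the sum of the row and column indices of each coefficient. The sketch below assumes an 8x8 coefficient matrix and the thresholds `low_max=2` and `mid_max=7`; these values and the function name `band_masks` are illustrative choices, not values stated in the application.

```python
import numpy as np

def band_masks(n=8, low_max=2, mid_max=7):
    """Split an n x n coefficient grid into low, mid, and high bands.

    The band of coefficient (u, v) is decided by u + v, which grows with
    spatial frequency for DCT-like transforms. Thresholds are assumptions.
    """
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    s = u + v
    low = s <= low_max
    mid = (s > low_max) & (s <= mid_max)
    high = s > mid_max
    return low, mid, high

low, mid, high = band_masks()
```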
[0015] In one possible implementation of the first aspect, performing frequency band weighting processing on the frequency domain image feature coefficient matrix based on image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix comprises: inputting the amplitude and phase of the frequency domain image feature coefficient matrix into a weight generation network respectively, and obtaining a first image feature weighting coefficient matrix through multi-layer convolution processing in the weight generation network, wherein the first image feature weighting coefficient matrix includes multiple first image feature weighting coefficients corresponding to image feature coefficients at different spatial positions in the frequency domain image feature coefficient matrix; applying different weight scaling coefficients to the first image feature weighting coefficients corresponding to the first frequency band, the second frequency band, and the third frequency band in the first image feature weighting coefficient matrix to generate a second image feature weighting coefficient matrix, wherein the weight scaling coefficients include a gain coefficient for enhancing the first image feature weighting coefficients and an attenuation coefficient for suppressing the first image feature weighting coefficients; and obtaining a learnable scaling parameter and generating the weighted frequency domain image feature coefficient matrix according to the second image feature weighting coefficient matrix, the learnable scaling parameter, and the frequency domain image feature coefficient matrix, wherein the learnable scaling parameter is used to scale the second image feature weighting coefficient matrix.
[0016] In this application, the amplitude and phase of the frequency domain image feature coefficient matrix can be input into the weight generation network to obtain the first image feature weighting coefficient matrix. Then, different weight scaling coefficients can be applied to the first image feature weighting coefficients corresponding to the first frequency band, the second frequency band, and the third frequency band in the first image feature weighting coefficient matrix to generate the second image feature weighting coefficient matrix. Then, a weighted frequency domain image feature coefficient matrix can be generated by combining the learnable scaling parameter and the frequency domain image feature coefficient matrix. Attenuation or enhancement coefficients can be applied to the frequency domain image feature coefficient matrix of each frequency band to obtain a weighted frequency domain image feature coefficient matrix that strengthens the image subject information and weakens noise. This lays the foundation for subsequent generation of watermark information by frequency band and embedding of corresponding watermark information, improves the robustness and flexibility of watermark embedding, and has high applicability.
[0017] In one possible implementation of the first aspect, generating watermark information corresponding to each frequency band among the plurality of frequency bands based on the weighted frequency domain image feature coefficient matrix includes: obtaining a pseudo-random sequence generated using a first key and an image hash; generating first watermark information corresponding to the first frequency band according to the pseudo-random sequence and the sign function value of the weighted frequency domain image feature coefficient matrix; generating second watermark information corresponding to the second frequency band according to the quantization step size and the weighted frequency domain image feature coefficient matrix; and generating third watermark information corresponding to the third frequency band according to the intensity coefficient, the weighted frequency domain image feature coefficient matrix, the local mean of the weighted frequency domain image feature coefficient matrix, and a random matrix controlled by the second key, thereby obtaining watermark information corresponding to each frequency band among the plurality of frequency bands.
[0018] In this application, the first watermark information corresponding to the first frequency band can be generated based on the pseudo-random sequence generated from the first key and the image hash, and the sign function value of the weighted frequency domain image feature coefficient matrix. The second watermark information corresponding to the second frequency band can be generated based on the quantization step size and the weighted frequency domain image feature coefficient matrix. The third watermark information corresponding to the third frequency band can be generated based on the intensity coefficient, the weighted frequency domain image feature coefficient matrix, the local mean of the weighted frequency domain image feature coefficient matrix, and a random matrix controlled by the second key. Differentiated watermark information is thus generated according to the characteristics of each frequency band. The first watermark information, corresponding to the first frequency band (i.e., the low frequency band), is bound to the sign of the coefficients to resist geometric attacks. The second watermark information, corresponding to the second frequency band (i.e., the mid frequency band), can be dynamically adjusted to balance the robustness and concealment of the watermark embedding. The third watermark information, corresponding to the third frequency band (i.e., the high frequency band), produces weak, content-adaptive perturbations based on local statistical characteristics to support anti-counterfeiting and traceability. The differentiated watermark information corresponding to each frequency band can effectively address the failure of traditional watermarks under image compression, rotation, and cropping. The watermark information has anti-counterfeiting properties, high security, high flexibility, and high applicability.
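The three band-specific generators can be sketched as below. The application names the inputs of each generator but this text does not give their exact formulas, so every formula here is an assumption chosen to match the listed inputs: a keyed ±1 sequence times the coefficient sign for the low band, the quantization residual for the mid band, and a keyed random perturbation scaled by the deviation from a 3x3 local mean for the high band. The helper names, the SHA-256 seeding scheme, and the default `step` and `beta` values are all hypothetical.

```python
import hashlib
import numpy as np

def keyed_rng(key: bytes, data: bytes = b""):
    """Derive a reproducible generator from a key and optional extra data."""
    seed = int.from_bytes(hashlib.sha256(key + data).digest()[:8], "big")
    return np.random.default_rng(seed)

def local_mean3(a):
    """Mean over each 3 x 3 neighbourhood, with edge replication."""
    p = np.pad(a, 1, mode="edge")
    n, m = a.shape
    return sum(p[i:i + n, j:j + m] for i in range(3) for j in range(3)) / 9.0

def band_watermarks(Fw, image_bytes, key1, key2, step=0.1, beta=0.05):
    """Return (low, mid, high) watermark matrices; all formulas are assumed."""
    image_hash = hashlib.sha256(image_bytes).digest()
    prn = keyed_rng(key1, image_hash).choice([-1.0, 1.0], size=Fw.shape)
    w_low = prn * np.sign(Fw)                     # bound to coefficient signs
    w_mid = step * np.round(Fw / step) - Fw       # offset to the nearest quantization level
    noise = keyed_rng(key2).standard_normal(Fw.shape)
    w_high = beta * (Fw - local_mean3(Fw)) * noise  # weak, content-adaptive
    return w_low, w_mid, w_high

Fw = np.random.default_rng(1).standard_normal((8, 8))  # stand-in weighted coefficients
image_bytes = b"example image bytes"
w_low, w_mid, w_high = band_watermarks(Fw, image_bytes, b"k1", b"k2")
```

Seeding from the key and the image hash makes the low-band sequence reproducible for the key holder while differing per image.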
[0019] In one possible implementation of the first aspect, the watermark embedding indication information corresponding to the plurality of frequency bands includes a frequency band mask corresponding to the plurality of frequency bands; after generating the watermark information corresponding to each of the plurality of frequency bands and before generating the watermarked frequency domain image feature coefficient matrix, the method further includes: generating a frequency band mask corresponding to the plurality of frequency bands, wherein the frequency band mask is a binary matrix and the frequency band mask has the same matrix size as the weighted frequency domain image feature coefficient matrix, and the frequency band mask is used to set multiple target matrix coordinates for embedding the first watermark information, the second watermark information or the third watermark information in the matrix coordinate system of the weighted frequency band image feature matrix.
[0020] In this application, after generating the watermark information corresponding to each frequency band and before generating the watermarked frequency domain image feature coefficient matrix, a frequency band mask corresponding to each frequency band can be generated. The frequency band mask indicates the target matrix coordinates for watermark embedding. Through this binary matrix, the watermark information can be added to specific frequency bands (usually the mid-frequency and low-frequency bands) that have low visual sensitivity and strong robustness to common attacks. This reduces watermark embedding operations in high-frequency noise areas (such as the high-frequency band) and in core low-frequency coefficients that are crucial to image quality (such as the DC component), thereby reducing unnecessary distortion and achieving a balance between concealment, robustness, and embedding capacity. This enables accurate, controllable, and efficient watermark information embedding, improving the overall stability and applicability of watermark embedding.
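A binary frequency band mask of this kind can be sketched as follows. The sketch assumes an 8x8 coefficient matrix, selects the low and mid bands by thresholding the index sum at an illustrative `mid_max=7`, and excludes the DC coefficient; the function name `embedding_mask` and the threshold are assumptions.

```python
import numpy as np

def embedding_mask(n=8, mid_max=7):
    """Binary mask marking coefficients that may carry watermark information.

    Selects low and mid bands (index sum <= mid_max) and skips the DC term.
    """
    s = np.add.outer(np.arange(n), np.arange(n))   # index sum u + v
    mask = (s <= mid_max).astype(np.uint8)
    mask[0, 0] = 0                                  # protect the DC component
    return mask

mask = embedding_mask()
targets = np.argwhere(mask == 1)   # target matrix coordinates for embedding
```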
[0021] In one possible implementation of the first aspect, the watermark embedding indication information corresponding to the plurality of frequency bands includes the frequency band mask and watermark embedding strength coefficient corresponding to the plurality of frequency bands; the generation of a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the plurality of frequency bands, the watermark information corresponding to each frequency band, and the weighted frequency band image feature matrix includes: quantizing and modulating the weighted frequency domain image feature coefficients corresponding to each of the target matrix coordinates in the weighted frequency domain image feature coefficient matrix according to the frequency band mask corresponding to the plurality of frequency bands, the watermark embedding strength coefficient, the first watermark information, the second watermark information, and the third watermark information, so as to embed the first watermark information, the second watermark information, and the third watermark information in the weighted frequency domain image feature coefficient matrix to obtain a watermarked frequency domain image feature coefficient matrix.
[0022] In this application, the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the weighted frequency domain image feature coefficient matrix can be quantized and modulated based on the frequency band mask, watermark embedding strength coefficient, and watermark information corresponding to each frequency band, so as to realize the embedding of the first watermark information, the second watermark information, and the third watermark information, and obtain the watermarked frequency domain image feature coefficient matrix. The watermark information is encoded into the quantization amplitude of the weighted frequency domain image feature coefficients through a quantization network, so that the watermark information is deeply bound to the weighted frequency domain image feature coefficients. At the same time, the quantization network has anti-attack properties, which improves robustness and enhances the security of the system, and has high applicability.
[0023] In one possible implementation of the first aspect, the aforementioned weighted frequency domain image feature coefficient matrix satisfies:
[0024] where, in the formula (whose symbols are not reproduced in this text), the terms are, in order: the weighted frequency domain image feature coefficient matrix; the frequency domain image feature coefficient matrix; the learnable scaling parameter; and the second image feature weighting coefficient matrix.
[0025] In this application, a weighted frequency domain image feature coefficient matrix can be generated based on the frequency domain image feature coefficient matrix, a learnable scaling parameter, and a second image feature weighting coefficient matrix. The learnable scaling parameter acts as an intensity gate that controls the scaling amplitude of the dynamic weighting, so as to highlight the frequency points that are friendly to watermark embedding (for example, insensitive to distortion and highly robust to attacks), thereby obtaining a weighted frequency domain image feature coefficient matrix that enhances the image subject information and weakens noise. This lays the foundation for the subsequent generation of watermark information by frequency band and the embedding of the corresponding watermark information, balances the intensity and fidelity of subsequent watermark embedding, improves the robustness and flexibility of watermark embedding, and has high applicability.
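Since the formula itself is not available in this text, the following is only one plausible reading of how the four listed quantities could combine: a residual gate of the form F_w = F ⊙ (1 + γ · W₂), where W₂ is obtained by applying per-band gain or attenuation factors to the first weighting coefficients. The gate form, the gain values, the band thresholds, and the function name `weight_by_band` are all assumptions, and the all-ones `w1` merely stands in for the output of the weight generation network.

```python
import numpy as np

def weight_by_band(F, w1, gamma=0.5, gains=(1.2, 1.0, 0.5), low_max=2, mid_max=7):
    """Assumed weighting: F_w = F * (1 + gamma * w2), with w2 = w1 * band gain."""
    s = np.add.outer(np.arange(F.shape[0]), np.arange(F.shape[1]))
    # Gain for the low band, neutral for the mid band, attenuation for the high band.
    band_scale = np.where(s <= low_max, gains[0],
                          np.where(s <= mid_max, gains[1], gains[2]))
    w2 = w1 * band_scale
    return F * (1.0 + gamma * w2), w2

F = np.ones((8, 8))
w1 = np.ones((8, 8))      # placeholder for the weight generation network output
Fw, w2 = weight_by_band(F, w1)
```

With this form, setting `gamma` to zero switches the weighting off entirely, which is one way to read the "intensity gate" role described for the learnable scaling parameter.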
[0026] In one possible implementation of the first aspect, the watermarked frequency domain image feature coefficient matrix satisfies:
[0027] where, in the formula (whose symbols are not reproduced in this text), the terms are, in order: the watermarked frequency domain image feature coefficient matrix; the weighted frequency domain image feature coefficient matrix; the watermark embedding strength coefficient; the frequency band mask; and the watermark information corresponding to each frequency band.
[0028] In this application, the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the weighted frequency domain image feature coefficient matrix can be quantized and modulated based on the frequency band mask corresponding to each frequency band, the watermark embedding strength coefficient, and the watermark information, so as to embed the first watermark information, the second watermark information, and the third watermark information. The watermark embedding strength coefficient provides precise control over the amplitude of the watermark information, balancing the robustness and concealment of watermark embedding. The frequency band mask restricts the embedding position of the watermark information, so that the watermark information is added only at the pre-selected target matrix coordinates, reducing unnecessary distortion. The watermark information corresponding to each frequency band is secure and specific to its band, which improves the security and attack resistance of the watermarked image. Computing the watermarked frequency domain image feature coefficient matrix in this way is simple to implement, computationally efficient, robust, and highly applicable.
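One form consistent with the listed terms is a masked additive update, F_wm = F_w + α · Σₖ Mₖ ⊙ Wₖ, summed over the frequency bands. This is an assumption made for illustration: the formula is not reproduced in this text, and the application also describes the embedding as quantization and modulation, which this sketch does not model. The function name `embed` and the toy inputs are hypothetical.

```python
import numpy as np

def embed(Fw, band_masks, band_marks, alpha=0.1):
    """Assumed additive embedding: F_wm = F_w + alpha * sum_k M_k * W_k."""
    out = np.asarray(Fw, dtype=np.float64).copy()
    for M, W in zip(band_masks, band_marks):
        out += alpha * M * W     # only masked coordinates are modified
    return out

Fw = np.zeros((4, 4))                       # toy weighted coefficient matrix
mask = np.zeros((4, 4))
mask[0, 1] = 1.0                            # two target matrix coordinates
mask[1, 0] = 1.0
mark = np.full((4, 4), 2.0)                 # toy watermark information
F_wm = embed(Fw, [mask], [mark], alpha=0.1)
```

The strength coefficient scales every embedded value uniformly, while the mask decides where any change happens at all.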
[0029] In one possible implementation of the first aspect, the above-mentioned method of performing a hybrid domain fusion process on the watermarked frequency domain image feature coefficient matrix in both the spatial and frequency domains based on the original image and the watermarked frequency domain image feature coefficient matrix to generate a fused image feature matrix after watermark embedding includes: performing an inverse transformation on the watermarked frequency domain image feature matrix using the basis matrix to obtain a spatial domain image block feature matrix after watermark embedding; extracting a spatial domain semantic feature matrix from the original image; and concatenating the watermarked spatial domain image block feature matrix and the spatial domain semantic feature matrix along the channel dimension to obtain a fused image feature matrix; and performing a hybrid domain fusion process on the fused image feature matrix. The matrix is adaptively weighted according to the channel dimension to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix. The channel-weighted image feature matrix is used to emphasize or suppress different image feature channels. Using the frequency domain energy distribution matrix corresponding to the frequency domain image feature coefficient matrix containing the watermark, a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix is generated. The spatially weighted image feature matrix is used to emphasize or suppress different spatial positions in the feature space. The channel-weighted image feature matrix and the spatially weighted image feature matrix are multiplied element-wise to generate the fused image feature matrix after watermark embedding.
[0030] In this application, a spatial image patch feature matrix can be reconstructed through inverse transformation, allowing the watermark information to be transformed into the spatial domain along with the watermarked frequency domain image feature matrix, thus obtaining a watermark-embedded spatial image patch feature matrix. The watermark-embedded spatial image patch feature matrix and the spatial semantic feature matrix are then concatenated along the channel dimension to obtain a fused image feature matrix. Adaptive weighting along the channel dimension can then be applied to the fused image feature matrix to generate a channel-weighted image feature matrix with the same number of image feature channels as the aforementioned fused image feature matrix. This channel-weighted image feature matrix enhances important channels and suppresses less important ones. Simultaneously, a spatially weighted image feature matrix with the same feature space size as the aforementioned fused image feature matrix can be generated based on the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix. This spatially weighted image feature matrix can be used to emphasize information-rich spatial regions. Furthermore, by combining the aforementioned channel-weighted image feature matrix and spatial-weighted image feature matrix, a fused image feature matrix after watermark embedding can be generated. This ensures that the generated fused image feature matrix retains key information related to the watermark information and the image semantics to the greatest extent, providing a strong feature foundation for subsequent image processing.
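The fusion steps above can be sketched with plain NumPy. This is a simplified illustration: the channel weighting is assumed to be a sigmoid gate over each channel's global average, and the spatial weighting a sigmoid of the standardized energy map; the application's actual attention modules are not specified in this text. Feature maps are assumed to have shape (channels, height, width), and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_fusion(x_wm, x_sem, energy):
    """Fuse watermarked and semantic features of shape (C, H, W).

    energy is an (H, W) frequency-domain energy map resampled to feature size.
    """
    fused = np.concatenate([x_wm, x_sem], axis=0)          # concat on channel axis
    # Channel weighting: one gate per channel from its global average (assumed).
    channel_gate = sigmoid(fused.mean(axis=(1, 2), keepdims=True))
    channel_weighted = fused * channel_gate
    # Spatial weighting: one weight per position from the energy map (assumed).
    z = (energy - energy.mean()) / (energy.std() + 1e-8)
    spatial_weight = sigmoid(z)[np.newaxis, :, :]
    return channel_weighted * spatial_weight                # element-wise product

rng = np.random.default_rng(0)
x_wm = rng.standard_normal((2, 4, 4))    # stand-in watermarked block features
x_sem = rng.standard_normal((2, 4, 4))   # stand-in semantic features
energy = rng.random((4, 4))
out = hybrid_fusion(x_wm, x_sem, energy)
```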
[0031] In one possible implementation of the first aspect, after generating the watermarked frequency domain image feature coefficient matrix and before performing hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains, the method further includes: performing high-frequency coefficient filtering on the watermarked frequency domain image feature matrix to suppress high-frequency noise in the watermarked frequency domain image feature matrix; after generating the watermark-embedded fused image feature matrix and before generating the watermarked target image based on the watermark-embedded fused image feature matrix, the method further includes: performing threshold filtering on the watermark-embedded fused image feature matrix to suppress residual high-frequency noise in the watermark-embedded fused image feature matrix.
[0032] In this application, after generating the watermarked frequency domain image feature coefficient matrix and before performing an inverse transform on the watermarked frequency domain image feature matrix, the high-frequency coefficients in the watermarked frequency domain image feature matrix can be filtered and attenuated to suppress residual high-frequency noise and improve the frequency domain robustness of the watermark embedding. After generating the fused image feature matrix after watermark embedding and before generating the watermarked target image based on the fused image feature matrix after watermark embedding, threshold filtering can be performed on the fused image feature matrix after watermark embedding to suppress residual high-frequency noise in the fused image feature matrix after watermark embedding, reduce excessive distortion (such as edge blurring or noise amplification), and enhance spatial visual quality.
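The two filtering stages described above can be illustrated with a minimal NumPy sketch. The application does not specify the filters, so everything concrete here is an assumption made for illustration: the rule that a coefficient (u, v) counts as high-frequency when u + v reaches `cutoff`, the attenuation factor `gain`, the use of soft thresholding for the second stage, and the names `attenuate_high_frequencies` and `soft_threshold`.

```python
import numpy as np

def attenuate_high_frequencies(coeffs, cutoff=8, gain=0.5):
    """Scale coefficients with index sum u + v >= cutoff by `gain` (assumed rule)."""
    u, v = np.indices(coeffs.shape[-2:])
    mask = np.where(u + v >= cutoff, gain, 1.0)
    # Broadcasting applies the same mask to every block in a (N, 8, 8) stack.
    return coeffs * mask

def soft_threshold(features, tau=0.02):
    """Shrink every value toward zero by `tau`; values below `tau` become 0."""
    return np.sign(features) * np.maximum(np.abs(features) - tau, 0.0)
```

The first function would be applied to the watermarked frequency domain coefficients before the inverse transform, and the second to the fused feature matrix before the target image is generated. A hard cutoff (gain of 0) would remove more noise but could also remove watermark energy placed in those bands, which is why an attenuation factor is used in this sketch.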
[0033] In one possible implementation of the first aspect, the fused image feature matrix after watermark embedding satisfies:
[0034] where the terms of the formula denote, respectively: the fused image feature matrix after watermark embedding; the channel-weighted image feature matrix; the spatially weighted image feature matrix; the fused image feature matrix; the spatial image patch feature matrix after watermark embedding; the spatial semantic feature matrix; and the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix.
[0035] In this application, the watermark-embedded spatial image patch feature matrix and the spatial semantic feature matrix can be concatenated along the channel dimension to obtain a fused image feature matrix. Adaptive weighting along the channel dimension is then applied to the fused image feature matrix to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix; the channel-weighted image feature matrix enhances important channels and suppresses secondary ones. At the same time, a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix can be generated based on the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix; the spatially weighted image feature matrix emphasizes information-rich spatial regions. The channel-weighted and spatially weighted image feature matrices are then combined to generate the fused image feature matrix after watermark embedding, which preserves key information related to the watermark information and the image semantics to the greatest extent possible and provides a strong feature foundation for subsequent image processing. Computing the fused image feature matrix in this way is simple to implement, computationally efficient, robust, and highly applicable.
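The fusion steps described above can be sketched as follows. This is an illustrative reading, not the formula of the application: the global-average-pooling plus sigmoid channel gate, the max-normalized energy map used as the spatial gate, and the final element-wise sum are all assumptions, as are the function names and the (channels, height, width) array layout.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_fusion(watermarked_spatial, semantic, energy):
    """Fuse two feature stacks of shape (C1, H, W) and (C2, H, W).

    `energy` is an (H, W) non-negative frequency domain energy map.
    """
    # Concatenate along the channel dimension: (C1 + C2, H, W).
    fused = np.concatenate([watermarked_spatial, semantic], axis=0)

    # Channel weighting (assumed form): one gate per channel from its mean.
    channel_gate = sigmoid(fused.mean(axis=(1, 2)))
    channel_weighted = fused * channel_gate[:, None, None]

    # Spatial weighting (assumed form): energy map scaled to [0, 1].
    spatial_gate = energy / (energy.max() + 1e-12)
    spatial_weighted = fused * spatial_gate[None, :, :]

    # Combination (assumed form): element-wise sum of the two weighted stacks.
    return channel_weighted + spatial_weighted
```

Both weighted matrices keep the shape of the fused matrix, which matches the requirement that the channel-weighted matrix have the same number of channels and the spatially weighted matrix the same spatial size as the fused matrix.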
[0036] In one possible implementation of the first aspect, generating a watermarked target image based on the fused image feature matrix after watermark embedding includes: introducing multiple adversarial samples using a differentiable operator, combining peak signal-to-noise ratio loss and anti-attack robustness loss, performing adversarial enhancement processing on the fused image feature matrix after watermark embedding to obtain an optimized fused image feature matrix; and generating a watermarked target image according to the optimized fused image feature matrix.
[0037] In this application, multiple adversarial samples can be introduced based on differentiable operators, and the peak signal-to-noise ratio loss and anti-attack robustness loss are combined to perform adversarial enhancement processing on the feature matrix of the fused image after watermark embedding. This enables the subsequently generated watermarked target image to maintain the stability of image semantics and watermark when facing real interference such as compression and noise, and improves the generalization ability of watermark and watermarked target image against unknown attacks. It has high security and strong applicability.
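As a rough illustration of how a fidelity term and a robustness term might be combined, the sketch below evaluates such an objective with NumPy. It is only an assumed form: the application does not give the loss expressions, so the use of mean squared error as the fidelity term (minimizing it maximizes PSNR), the squared extraction error as the robustness term, the weights `alpha` and `beta`, and the caller-supplied `extract` and `attacks` callables are all hypothetical. An actual optimization would need an automatic differentiation framework, since NumPy only evaluates the value.

```python
import numpy as np

def psnr(reference, candidate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - candidate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def combined_loss(original, watermarked, message, extract, attacks,
                  alpha=1.0, beta=1.0):
    """Fidelity term plus mean extraction error over simulated attacks."""
    # Fidelity: mean squared error between original and watermarked image.
    fidelity = np.mean((original - watermarked) ** 2)
    # Robustness: how far the extracted message drifts after each attack.
    errors = [np.mean((extract(attack(watermarked)) - message) ** 2)
              for attack in attacks]
    return alpha * fidelity + beta * float(np.mean(errors))
```

The `attacks` list would typically hold differentiable approximations of real distortions, for example additive noise or a smooth stand-in for compression, so that gradients can flow through them during training.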
[0038] In one possible implementation of the first aspect, generating a watermarked target image based on the optimized fused image feature matrix includes: performing a frequency domain transformation on the optimized fused image feature matrix using the basis matrix to obtain a frequency domain image feature coefficient matrix after watermark embedding; performing an inverse transformation on the frequency domain image feature coefficient matrix after watermark embedding using the basis matrix to obtain a target spatial domain image block feature matrix; and obtaining a watermarked target image after the target image block matrix has undergone image block recombination and boundary smoothing processing.
[0039] In this application, the optimized fused image feature matrix is transformed in the frequency domain using the basis matrix to complete the mapping from the feature representation space to the frequency domain. Then, the mapping from the frequency domain to the spatial domain is achieved through the inverse transformation. This allows the multi-channel feature semantics to be mapped to the color space or channel representation to obtain the target spatial image block feature matrix. The target image block matrix can then be reconstructed and the boundary smoothing process is applied to obtain the watermarked target image, which improves the robustness and semantic fidelity of the watermark and has strong applicability.
[0040] In a second aspect, this application provides a watermark embedding device, which includes modules or units for performing the watermark embedding method provided in the first aspect or any possible implementation of the first aspect.
[0041] For example, the watermark embedding device described above includes: an image preprocessing module, used to obtain the frequency domain image feature coefficient matrix of the original image; a frequency domain weighting module, used to perform frequency-band weighting on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands, so as to obtain a weighted frequency domain image feature coefficient matrix; a watermark generation module, used to generate watermark information corresponding to each of the multiple frequency bands based on the weighted frequency domain image feature coefficient matrix; a watermark embedding module, used to generate a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the multiple frequency bands, the watermark information corresponding to each frequency band, and the weighted frequency domain image feature coefficient matrix, where the watermark embedding indication information indicates the embedding position of the watermark information and / or the embedding strength of the watermark information; and an image generation module, used to perform hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains, based on the original image and the watermarked frequency domain image feature coefficient matrix, to generate a fused image feature matrix after watermark embedding. The image generation module is also used to generate a watermarked target image based on the fused image feature matrix after watermark embedding.
[0042] In a third aspect, this application provides a terminal device, which includes a processor and a memory; the processor is connected to the memory, the memory is used to store program code, and the processor is used to call the program code to execute the watermark embedding method provided in the first aspect or any possible implementation of the first aspect.
[0043] In a fourth aspect, this application provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the watermark embedding method provided in the first aspect or any possible implementation of the first aspect.
[0044] In a fifth aspect, this application provides a computer program product, which includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the watermark embedding method provided in the first aspect or any possible implementation of the first aspect.
[0045] For the technical effects of the second, third, fourth, and fifth aspects, refer to the technical effects of the first aspect and any possible implementation thereof; they are not repeated here.
Brief Description of the Drawings
[0046] Figure 1a is a schematic diagram of the system architecture of the watermark embedding method provided in the embodiments of this application; Figure 1b is a schematic diagram of the data flow processing of the watermark embedding method provided in the embodiments of this application; Figure 2 is a schematic diagram of an application scenario of the watermark embedding method provided in the embodiments of this application; Figure 3 is a flowchart of the watermark embedding method provided in an embodiment of this application; Figure 4 is a schematic diagram of image normalization processing and image segmentation provided in the embodiments of this application; Figure 5 is a schematic diagram comparing the original image and the watermarked target image provided in the embodiments of this application; Figure 6 is a schematic diagram of the watermark embedding device provided in the embodiments of this application; Figure 7 is a schematic diagram of the structure of the terminal device provided in the embodiments of this application.
[0047] Explanation of reference numerals in the attached figures: 10 - Business server; 20 - Terminal device; 60 - Watermark embedding device; 610 - Image preprocessing module; 620 - Frequency domain weighting module; 630 - Watermark generation module; 640 - Watermark embedding module; 650 - Image generation module; 70 - Terminal device; 701 - Processor; 702 - Communication bus; 703 - User interface; 704 - Network interface; 705 - Memory.
Detailed Description
[0048] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0049] To facilitate understanding, some of the terms are briefly explained below: 1. The spatial domain is, in digital image processing, the two-dimensional planar space composed of discrete pixels. Spatial domain processing methods directly manipulate the grayscale values of pixels, with spatial location as the independent variable. A Fourier transform can convert the image to the frequency domain for analysis, and the inverse transform reconstructs the results of frequency domain processing in the spatial domain. Spatial domain processing techniques include, but are not limited to, grayscale mapping, histogram transformation, and linear / nonlinear filtering, and are widely used in image enhancement, noise reduction, edge sharpening, and other fields.
[0050] 2. The frequency domain is a mathematical space that describes a signal with frequency as the horizontal axis and amplitude as the vertical axis. A signal is converted from the time domain to the frequency domain by the Fourier transform. The frequency domain is mainly used in signal processing, image compression, and communication.
[0051] With the widespread application of internet technology, enterprises have increasingly prominent needs in areas such as data leakage prevention and copyright protection. Especially in scenarios involving internal sensitive data security and the protection of user interface and user experience design, front-end watermarking technology has become an important technical means to maintain the security of original content and prevent the theft, tampering, or leakage of digital assets. Current mainstream solutions typically concatenate timestamps and user identity information into dynamic watermark text, generate watermark elements based on scalable vector graphics technology, and then encapsulate them as components integrated into front-end frameworks such as Vue. Data binding mechanisms are used to achieve dynamic updates and real-time rendering of the watermark content.
[0052] However, current technical solutions still have several significant limitations: First, watermark elements are vulnerable to attacks from developer tools on the client side. Attackers can remove watermark SVG elements by directly deleting or modifying Document Object Model nodes, or achieve visual hiding through Cascading Style Sheets (CSS). Second, existing watermarks generally have weak robustness. Their generation mechanisms mostly rely solely on the internal state of front-end components and are not deeply bound to critical business data, making it difficult to effectively defend against forgery or impersonation. Furthermore, since the generation and rendering logic of watermarks are entirely within the front-end environment, attackers can reverse engineer the watermark generation algorithm using debugging tools to simulate legitimate watermark output, resulting in a lack of substantial protection against man-in-the-middle attacks or malicious forgery.
[0053] The watermark embedding method provided in this application embodiment can dynamically select frequency bands for watermark embedding based on the frequency domain image features corresponding to the original image. Furthermore, the watermark information embedded in each frequency band is dynamically generated based on the frequency band image features, which improves the dynamic adaptability and anti-attack capability of watermark embedding. The watermark has high robustness and high applicability.
[0054] For ease of description, a terminal device is taken as an example below, and the system architecture of the watermark embedding method provided in the embodiments of this application is described with reference to Figure 1a.
[0055] See Figure 1a, which is a schematic diagram of the system architecture of the watermark embedding method provided in the embodiments of this application. As shown in Figure 1a, the system architecture may include a business server 10 and a terminal device 20. The business server 10 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud databases, cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms. The terminal device 20 may be a PDA, smartphone, laptop, desktop computer, tablet, mobile internet device (MID), wearable device (e.g., smartwatch, smart bracelet), smart computer, or other smart terminal, but is not limited to these. The terminal device 20 may include an operating system, hardware, and application software. The hardware may include, but is not limited to, a central processing unit (CPU), memory (e.g., RAM, hard disk), and peripherals, which provide the core capabilities for computing, storage, and input / output. The operating system serves as the hardware management and abstraction layer. Through its kernel and drivers, it schedules CPU resources, manages memory space, and controls peripheral operations. It also encapsulates complex hardware details into unified system calls and service interfaces, thereby providing a stable, efficient, and secure operating environment for upper-layer application software. Application software, built on the operating system, uses hardware resources through the interfaces provided by the operating system; for example, it can read code stored in memory to provide services to users. In this embodiment, the application software may include, but is not limited to, watermark embedding software. 
The terminal device 20 can run this watermark embedding software to execute the watermark embedding method provided in this embodiment.
[0056] The business server 10 and the terminal device 20 can establish a communication connection. The communication connection method is not limited; it can be established directly or indirectly via wired communication or wireless communication, etc., depending on the actual application scenario. This embodiment does not impose any restrictions here.
[0057] It should be understood that the terminal device 20 shown in Figure 1a can have installed an application client (not shown in the figure) of the watermark embedding software. The application client can be used to prompt the user to upload the original image, display the watermarked target image, and so on. When running on a terminal device, the application client can interact with the business server 10 shown in Figure 1a, so that the business server 10 can receive business data from the terminal device 20. The business data may be image data uploaded by the user to the application client, image data generated during use of the application client, and so on. The business data can be shared with the business server 10, or shared through the business server 10 with other business devices (not shown in the figure) connected to the business server 10. The specific method can be determined according to the actual application scenario and is not limited here.
[0058] The application client can be an application program, a webpage, or a website, depending on the specific application scenario, and is not limited here. The application client can be a standalone client or an embedded sub-client integrated into another client (such as an instant messaging client or a social networking client), depending on the specific application scenario, and is not limited here. Users can send business data to the business server 10 through the application client. This business data can be used to request the business server 10 to start relevant business processes and return relevant process data to the application client. The business server 10, as the server for the application client, can be a collection of multiple servers, including the backend server corresponding to the application client, a data processing server, and so on.
[0059] The method provided in the embodiments of this application can be executed by the business server 10 shown in Figure 1a, by the terminal device 20, or jointly by the terminal device 20 and the business server 10. The specific execution method can be determined according to the actual application scenario and is not limited here.
[0060] For ease of subsequent understanding and explanation, in the embodiments of this application the terminal device 20 shown in Figure 1a is taken as the execution subject of the watermark embedding method provided in the embodiments of this application (referred to simply as "the method" for ease of description), and the data flow processing of the method is described with reference to Figure 1b.
[0061] See Figure 1b, which is a schematic diagram of the data flow processing of the watermark embedding method provided in the embodiments of this application. As shown in Figure 1b, the application client of the watermark embedding software (not shown in the figure) can run on the terminal device 20 shown in Figure 1a, and the terminal device 20 can establish a communication connection with the business server 10. The watermark embedding software may include, but is not limited to, logical functional modules such as an image preprocessing module, a frequency domain weighting module, a watermark generation module, a watermark embedding module, and an image generation module. Assume that a user wants to call this software to embed a watermark into an original image stored on the business server 10. The image preprocessing module can read or retrieve the original image from the business server 10 (for example, by downloading it), perform image processing on the original image to obtain the frequency domain image feature coefficient matrix of the original image, and output this matrix to the frequency domain weighting module. Here, image processing includes, but is not limited to, image standardization operations such as image format conversion, size adjustment, normalization, filtering and denoising, and edge symmetry filling, as well as image block segmentation. For details, refer to the subsequent embodiments; they are not elaborated here. It can be understood that each element in the frequency domain image feature coefficient matrix corresponds to the signal feature value of a certain frequency component within an image block. 
The frequency domain weighting module receives the frequency domain image feature coefficient matrix output by the image preprocessing module and performs frequency-band weighting processing on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix. Here, the frequency domain weighting module can perform dynamic frequency band selection processing on the frequency domain image feature coefficient matrix, applying different weight scaling coefficients to different frequency bands using an attention mechanism to generate a weighted frequency domain image feature coefficient matrix and output it to the watermark generation module and watermark embedding module. The watermark generation module can generate watermark information corresponding to each of the multiple frequency bands based on the weighted frequency domain image feature coefficient matrix and output it to the watermark embedding module. It can be understood that the watermark information corresponding to each frequency band can be calculated based on the frequency band characteristics and the weighted frequency domain image feature coefficient matrix corresponding to each frequency band. Specific generation methods can be found in subsequent embodiments and will not be elaborated here. The watermark embedding module can acquire watermark embedding indication information corresponding to multiple frequency bands, watermark information corresponding to each frequency band output by the watermark generation module, and a weighted frequency band image feature matrix output by the frequency domain weighting module, generate a watermarked frequency domain image feature coefficient matrix, and output it to the image generation module. 
The watermark embedding indication information corresponding to the multiple frequency bands indicates the embedding position and / or embedding strength of the watermark information. The embedding position can be determined based on the frequency band division in the weighted frequency domain image feature coefficient matrix, or by methods such as Sobel edge detection; for details, refer to the subsequent embodiments, which are not limited here. The watermark embedding strength can be obtained from local storage, downloaded from a connected server, or obtained through user input; the specific value can be determined based on the actual application scenario and is not limited here. The image generation module can perform hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in both the spatial and frequency domains, based on the original image and the watermarked frequency domain image feature coefficient matrix, to generate a fused image feature matrix after watermark embedding, and then generate the watermarked target image based on this fused image feature matrix. Here, the hybrid domain fusion processing can include adaptive weighting along the channel dimension, adaptive weighting along the spatial dimension, and so on, which can be determined according to the actual application scenario and is not limited here. It can be understood that the image generation module can output the target image, which can be displayed on the display screen of the terminal device 20 or stored as a file, and so on, which is not limited here.
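The weighting and embedding stages of this data flow can be sketched for a single 8×8 coefficient block as follows. This is a minimal illustration under stated assumptions, not the method of the application: bands are assigned by the index sum u + v, the attention mechanism is replaced by a softmax over per-band mean energy, and the embedding is additive with one strength value per band. The function names and the three-band split are hypothetical, and the per-band watermark values are passed in rather than generated.

```python
import numpy as np

def band_index(size=8, n_bands=3):
    """Assign each coefficient (u, v) to a band by its index sum u + v."""
    u, v = np.indices((size, size))
    edges = np.linspace(0, 2 * (size - 1) + 1, n_bands + 1)
    return np.digitize(u + v, edges[1:-1])  # band ids 0 .. n_bands - 1

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weight_bands(coeffs, bands, n_bands):
    """Scale each band by an attention-like weight derived from its energy."""
    energy = np.array([np.mean(coeffs[bands == b] ** 2) for b in range(n_bands)])
    weights = softmax(energy)
    return coeffs * weights[bands], weights

def embed(weighted, bands, watermark_per_band, strength_per_band):
    """Additively embed one watermark value per band at the given strength."""
    out = weighted.copy()
    for b, (w, s) in enumerate(zip(watermark_per_band, strength_per_band)):
        out[bands == b] += s * w
    return out
```

In this sketch the embedding position is expressed by the band mask and the embedding strength by `strength_per_band`, mirroring the two roles of the watermark embedding indication information.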
[0062] To better understand the application scenarios of the watermark embedding method provided in the embodiments of this application, refer to Figure 2, which is a schematic diagram of an application scenario of the method. Assume that a target user (referred to as "the user" for convenience) wants to add a watermark to an original image using a terminal device. Here, the original image can be in various image formats, including but not limited to JPEG / JPG, PNG, GIF, BMP, WebP, and TIFF, which can be determined according to the actual application scenario and is not limited here. The terminal device can obtain the user's watermark embedding request through a user interface (e.g., user interface 1 in Figure 2). As shown in Figure 2, assume that the user drags the original image to be watermarked into the designated area of user interface 1 and clicks the "Upload" control. The terminal device then responds to the watermark embedding request and, in the background, performs the watermark embedding process on the original image, including but not limited to image processing, image feature extraction, watermark information generation, and watermark information embedding, to generate a watermarked target image. After the watermarked target image is generated, the terminal device can display it in a pop-up window or by redirecting to another user interface (e.g., user interface 2 in Figure 2). As shown in Figure 2, the terminal device can generate user interface 2 to display the original image and the watermarked target image, and provide a "Refresh" control for the user to re-embed the watermark and a "Save" control for the user to download the watermarked target image. 
Optionally, the terminal device can provide target images of various formats for the user to select and download, which can be determined according to the actual application scenario and is not limited here. In the specific implementation, how the terminal device implements the watermark embedding of the image and the generation of watermark information can be found in the implementation methods described in the following embodiments, which will not be elaborated here.
[0063] It is understood that in the specific implementation of this application, data related to object information is involved. When the embodiments of this application are applied to specific products or technologies, permission or consent from the object is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0064] The watermark embedding method and apparatus provided in the embodiments of this application are described in detail below with reference to Figures 3 to 6.
[0065] See Figure 3, which is a flowchart of the watermark embedding method provided in the embodiments of this application. For ease of understanding, a terminal device is taken as an example, and any terminal device 20 in Figure 1a is used for the description. The embodiments of this application can be applied to various scenarios, including but not limited to cloud technology and artificial intelligence. Each step of the watermark embedding method shown in Figure 3 can be performed by the terminal device. As shown in Figure 3, the watermark embedding method may include at least the following steps S301-S306:
S301: Obtain the frequency domain image feature coefficient matrix of the original image.
[0066] In some feasible implementations, the terminal device can obtain the spatial image block feature matrix of the original image and the basis matrix used for the frequency domain transformation of image features, and then use the basis matrix to perform a frequency domain transformation on the spatial image block feature matrix to obtain a frequency domain image feature coefficient matrix. Here, the original image can be an image resource stored in the terminal device's local storage space, or image data downloaded by the terminal device from a connected server; the specific method can be determined according to the actual application scenario and is not limited here. The terminal device can perform image standardization operations on the original image, including but not limited to image format conversion, size adjustment, normalization, filtering and noise reduction, and edge symmetry filling, to improve subsequent processing. The terminal device can then use an image segmentation strategy such as sliding windows or grid partitioning to divide the standardized image into non-overlapping, fixed-size image blocks. The specific image segmentation strategy is determined according to the actual application scenario and is not limited here. Next, the terminal device can convert each image block into a two-dimensional matrix whose elements can be normalized pixel feature intensity values. The block matrices corresponding to the image blocks can be arranged in raster scanning order (e.g., traversing row by row, from left to right and from top to bottom) to obtain the spatial image block feature matrix of the original image. It can be understood that each matrix unit in the spatial image block feature matrix corresponds to a local image region in the original image. Further, the terminal device can select a frequency domain transformation method automatically or determine it based on user selection. 
Here, the frequency domain transformation method can include, but is not limited to, the discrete cosine transform, the Fourier transform, or the wavelet transform; the specific method can be determined according to the actual application scenario and is not limited here. It can be understood that the frequency domain transformation method defines a complete set of basis functions. The terminal device can sample and arrange the basis functions at the corresponding spatial positions according to the size of the fixed-size image block (e.g., 8×8 pixels) to construct the basis matrix of the transformation. The specific method of obtaining the basis matrix can be determined according to the actual application scenario and is not limited here. The terminal device can perform a frequency domain transformation on the spatial image patch feature matrix based on the basis matrix to obtain the frequency domain image feature coefficient matrix. It can be understood that the terminal device can use mathematical algorithms, including but not limited to matrix multiplication or fast transform algorithms, to perform a forward transformation between the spatial image patch feature matrix and the basis matrix, converting the pixel intensity distribution contained in the spatial image patch feature matrix into a frequency component representation and thereby obtaining the frequency domain image feature coefficient matrix corresponding to the spatial image patch feature matrix. The specific method can be determined according to the actual application scenario and is not limited here. Here, each element in the frequency domain image feature coefficient matrix corresponds to the signal feature value of a certain frequency component within the image patch. The elements in the frequency domain image feature coefficient matrix can be complex numbers, each containing amplitude and phase. 
Amplitude characterizes the signal strength of the frequency component and its visual weight; a larger amplitude indicates a greater impact of the frequency component on the overall visual effect of the image. Phase represents the position and temporal information of the corresponding frequency component and determines the integrity of the image structure. Taking an 8×8 frequency domain image feature coefficient matrix as an example, each element (u, v) corresponds to the signal feature value of a specific frequency component within the original 8×8 image patch. A smaller u+v value corresponds to a lower frequency, which can be used to characterize core structural features; a larger u+v value corresponds to a higher frequency, which can be used to characterize fragmented features such as edges and noise. In essence, the frequency domain image feature coefficient matrix can be used to carry the frequency domain distribution information of the image patch. In this embodiment, the spatial image patch feature matrix of the original image and the basis matrix of the image feature matrix can be obtained. Based on the basis matrix, a frequency domain transformation is performed on the spatial image patch feature matrix to obtain a frequency domain image feature coefficient matrix. This achieves the transformation of features originally expressed in the spatial domain to the frequency domain. Using the spatial image patch feature matrix for frequency domain transformation instead of using the pixels of the original image for frequency domain transformation can obtain a more efficient and stable feature representation while preserving key information. 
Since the basis matrix has good decorrelation and energy compression properties, it can concentrate the dispersed energy in the spatial features onto a few low-frequency coefficients, thereby obtaining a highly sparse frequency domain coefficient matrix, achieving feature dimensionality reduction, and significantly reducing storage and subsequent computational overhead. The frequency domain transformation separates redundant information in the spatial domain. The low-frequency part is insensitive to noise and other interference, enhancing the robustness of the features; the high-frequency part retains details and edge information, facilitating flexible selection of coefficients according to task requirements. Simultaneously, the transformation based on the basis matrix is linear and reversible, allowing for lossless transformation of features between the frequency and spatial domains, facilitating feature analysis and reconstruction.
[0067] In some feasible implementations, after acquiring the original image, the terminal device can perform image standardization and image segmentation on the original image, and generate a spatial image block feature matrix of the original image based on the image data of each image block obtained after segmentation. It is understood that image standardization includes, but is not limited to, image format conversion, size adjustment, normalization, filtering and denoising, and edge symmetry filling. For example, the terminal device can convert the acquired original image to RGB image format and normalize the RGB channel values of the image to the range [-1, 1]. Adaptive histogram equalization can be used to enhance the local contrast of the image, and symmetrical filling (or mirror filling) can be performed on the image edges to improve edge integrity after subsequent image segmentation. The specific image standardization operation can be determined according to the actual application scenario and is not limited here. The terminal device can segment the standardized image using image segmentation strategies including, but not limited to, sliding windows and grid partitioning, to divide the standardized image into non-overlapping fixed-size image blocks. For example, the terminal device can divide the image into 8×8 non-overlapping blocks (compatible with the JPEG standard). The specific image segmentation strategy is determined according to the actual application scenario and is not limited here. For ease of understanding, refer to Figure 4, which is a schematic diagram of image standardization processing and image segmentation provided in an embodiment of this application. As shown in Figure 4, the terminal device can acquire the original image, whose pixel value range is [0, 255]. 
The terminal device can normalize the original image to adjust its dynamic range, linearly mapping the pixel values from [0, 255] to [-1, 1] to obtain a normalized image. It can be understood that the pixel value range of the normalized image is [-1, 1]. Furthermore, the terminal device can mirror-fill the edges of the normalized image with a fill width of 4 pixels, to reduce information loss at the image edges after segmentation and to facilitate segmentation and reassembly. The specific fill width can be determined according to the actual application scenario and is not limited here. Further, the terminal device can segment and reassemble the image, dividing it into 8×8 non-overlapping blocks. For boundary regions that cannot be divided evenly, bilinear interpolation is used to adjust them to the standard size, so that multiple 8×8 non-overlapping image blocks are obtained. The terminal device can arrange the image blocks into a three-dimensional tensor X∈R^{P×N×N} according to the raster scan order. Here, P is the batch size (i.e., the number of image blocks processed in one operation), and N is 8. It is understood that the terminal device can generate the block matrix X based on the image data of each image block obtained after segmentation, and X serves as the spatial domain image block feature matrix of the original image. In this embodiment, after the original image is acquired, image standardization and image segmentation can be performed on it. By performing image standardization processing such as normalization, original images acquired by different devices can be unified to the same numerical range, eliminating the differences in brightness, contrast, and color response caused by device differences. By segmenting the original image into image blocks of uniform size, a regular input unit (i.e., the image block mentioned above) is provided for subsequent processing. 
Then, the spatial domain image block feature matrix of the original image can be generated based on each segmented image block, preserving the image information of the original image and laying a good input foundation for subsequent watermark information embedding.
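The normalization, mirror filling, and block segmentation described above can be illustrated with a minimal NumPy sketch. It assumes a single-channel 8-bit image, and it crops any remainder instead of applying the bilinear adjustment of boundary regions; the function name `image_to_blocks` and its parameters are hypothetical and are not part of this application.

```python
import numpy as np

def image_to_blocks(img_u8, n=8, pad=4):
    """Normalize to [-1, 1], mirror-pad, and split into n x n blocks."""
    # Linear map of pixel values from [0, 255] to [-1, 1].
    x = img_u8.astype(np.float64) / 127.5 - 1.0
    # Mirror (symmetric) filling of `pad` pixels on every side.
    x = np.pad(x, pad, mode="symmetric")
    # Assumption: crop any remainder rather than resizing boundary regions.
    h, w = (x.shape[0] // n) * n, (x.shape[1] // n) * n
    x = x[:h, :w]
    # Raster-scan order: blocks row by row, left to right within a row.
    blocks = x.reshape(h // n, n, w // n, n).swapaxes(1, 2).reshape(-1, n, n)
    return blocks  # tensor X with shape (P, n, n)
```

For a 16×16 input with a fill width of 4, the padded image is 24×24, so the sketch returns P = 9 blocks of size 8×8.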
[0068] In some feasible implementations, the terminal device can construct a basis matrix with the same size as the spatial image patch feature matrix, where the initial values of the matrix elements are standard discrete cosine transform basis functions. The terminal device can use this basis matrix as trainable parameters of a neural network, acquire multiple sets of sample data as training input data for the neural network, and perform multiple backpropagation optimization training on the matrix elements of the basis matrix using a backpropagation algorithm to obtain a trained basis matrix. This trained basis matrix is then used as the basis matrix for the frequency domain transformation of image features. The backpropagation algorithm uses a loss function constructed according to the orthogonal regularization term of the basis matrix to optimize and update the matrix elements of the basis matrix. It can be understood that having the same matrix size as the spatial image patch feature matrix satisfies the requirement of equal inner dimensions in matrix multiplication during frequency domain transformation, and also makes the frequency domain transformation invertible; that is, an inverse transformation of the frequency domain image feature coefficient matrix can be achieved based on the basis matrix. The terminal device can select the basis function matrix of the standard discrete cosine transform (DCT) as the initial value to construct learnable basis matrix parameters. This basis matrix is then input into the neural network architecture as a trainable layer, and multiple sets of sample data are collected as network input. These multiple sets of sample data can be historical data stored by the terminal device during its previous operation, or multiple sets of sample data obtained by the terminal device through a connected server; the specific choice depends on the actual application scenario and is not limited here. 
During training, forward propagation is used to calculate the frequency domain transformation of the aforementioned spatial image patch feature matrix based on the current basis matrix to obtain the frequency domain image feature coefficient matrix, and the output error can be evaluated using a loss function. Furthermore, the terminal device can perform multiple rounds of gradient optimization and iterative updates on all matrix elements of the basis matrix based on the backpropagation algorithm. This ensures that the basis matrix maintains the core mathematical properties of the frequency domain transformation (such as orthogonality constraints) and adaptively learns and approximates the optimal feature transformation basis under the current task and data distribution. The basis matrix obtained after training convergence is then used as the basis matrix for the image feature frequency domain transformation. It is understood that, compared to a fixed DCT basis, the use of a learnable, data-driven optimized basis matrix in this embodiment can more effectively extract discriminative frequency domain features. The aforementioned neural network can be a single-layer linear fully connected network, or a multi-layer fully connected network, convolutional neural network, etc. The specific network structure can be determined according to the actual application scenario and is not limited here. Taking a single-layer linear fully connected network as an example, the input of this network is a flattened spatial image patch feature matrix (e.g., an 8×8 image patch corresponding to a 64-dimensional vector) as the aforementioned sample data. The intermediate transformation layer is an unbiased fully connected layer, which can use a linear activation function. The trainable parameters of this intermediate transformation layer are the aforementioned basis matrix. 
The output of this network is reshaped to the same size as the original spatial feature matrix (e.g., 8×8), thereby obtaining the frequency domain image feature coefficient matrix. During training, this network updates the basis matrix through backpropagation and uses a loss function to evaluate the output error, thereby optimizing the learned basis matrix. For stronger fitting capability, the aforementioned neural network can stack two fully connected layers. For example, the first layer can be fixed as a DCT basis matrix for the initial transformation, and the second layer can serve as a learnable fine-tuning layer. The specific layer configuration can be determined based on the actual application scenario and is not limited here. Here, the aforementioned loss function can include a regularization term to constrain the orthogonality of the basis matrix. The regularization term can satisfy:
[0069] where the quantities in the formula are, respectively, the above regularization term, the above learnable basis matrix parameters, and the identity matrix. The terminal device can introduce a dynamic constraint strength decay strategy, in which the weight of the regularization term is reduced by a fixed proportion at the end of each training cycle, thereby adjusting the constraint strength of the regularization term. For example, the terminal device can define the initial weight λ based on historical performance or user input, and in each training cycle the weight λ decays by 2%. The specific initial weight value and decay value can be determined according to the actual application scenario and are not limited here. In this embodiment, a basis matrix with the same matrix size as the spatial domain image patch feature matrix can be constructed. This basis matrix can be used as a trainable parameter of the neural network, and its initial value can be the standard discrete cosine transform basis functions. Multiple sets of sample data are used as the training input data of the neural network, and the matrix elements of the basis matrix are optimized over multiple training iterations based on the backpropagation algorithm to obtain the trained basis matrix. The loss function constructed from the regularization term improves the stability and controllability of the learning process. By replacing the traditional fixed standard discrete cosine transform basis functions with a trainable basis matrix, the transform space can be learned and optimized from multiple sets of sample data, so that the transformed frequency domain representation (i.e., the frequency domain image feature coefficient matrix) can represent the key features of the spatial domain image patch feature matrix more compactly and specifically and can filter or reduce the influence of noise on image features, which gives high applicability.
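A minimal sketch of an orthogonality regularizer with a decaying weight is given below. It assumes the common squared Frobenius-norm form $\lVert W^{\top}W - I\rVert_F^2$ for the regularization term (the exact form used in this embodiment may differ), shows only the regularizer without any task loss, and uses hypothetical values for the initial weight `lam` and the step size `lr`.

```python
import numpy as np

def dct_basis(n=8):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def orth_penalty(w):
    """Assumed regularizer: squared Frobenius norm of W^T W - I."""
    r = w.T @ w - np.eye(w.shape[0])
    return float(np.sum(r * r))

def orth_penalty_grad(w):
    """Gradient of orth_penalty with respect to W."""
    return 4.0 * w @ (w.T @ w - np.eye(w.shape[0]))

rng = np.random.default_rng(0)
# Learnable basis: DCT initial values plus a small perturbation that
# stands in for drift caused by task-driven updates.
w = dct_basis(8) + 0.01 * rng.standard_normal((8, 8))
lam, lr = 1.0, 0.05  # hypothetical initial weight and step size
before = orth_penalty(w)
for cycle in range(50):
    w -= lr * lam * orth_penalty_grad(w)  # regularizer-only update
    lam *= 0.98                           # weight decays by 2% per cycle
after = orth_penalty(w)
```

In a full training loop, the gradient of the task loss would be added to the update; the decaying weight then gradually relaxes the orthogonality constraint as training proceeds.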
[0070] In some feasible implementations, the terminal device performs double matrix multiplication on each image block based on the aforementioned basis matrix to perform a frequency domain transformation on the spatial image block feature matrix to obtain a frequency domain image feature coefficient matrix. Here, the aforementioned frequency domain image feature coefficient matrix satisfies:
[0071] where the quantities in the formula are, respectively, the frequency domain image feature coefficient matrix, the spatial domain image patch feature matrix, and the basis matrix. It can be understood that the output frequency domain image feature coefficient matrix has the same matrix size as the spatial domain image patch feature matrix, and that the frequency domain image feature coefficient matrix can retain its complex form, so that both the amplitude and the phase of each coefficient are preserved.
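The two-sided ("double") matrix multiplication can be sketched as follows, assuming the form $F = W X W^{\top}$ with a real orthonormal DCT-II basis $W$ as a stand-in for the trained basis matrix. The function names `forward` and `inverse` are illustrative only.

```python
import numpy as np

def dct_basis(n=8):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def forward(blocks, w):
    """F_p = W X_p W^T for every block p in a (P, N, N) tensor."""
    return w @ blocks @ w.T

def inverse(coeffs, w):
    """X_p = W^T F_p W; valid when W is orthogonal (W^-1 = W^T)."""
    return w.T @ coeffs @ w
```

Because the basis matrix has the same size as each block, the inner dimensions match on both sides, and the output has the same N×N size as the input block.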
[0072] S302, the frequency domain image feature coefficient matrix is subjected to frequency band weighting processing based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain the weighted frequency domain image feature coefficient matrix.
[0073] In some feasible implementations, the terminal device can perform frequency-band weighted processing on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix. Here, the frequency band division can be based on the spatial distribution and directional characteristics of the frequency, such as concentric circle division based on frequency radius, sector division, multi-scale division based on frequency band decomposition, or division based on DCT blocks, etc. The specific division can be determined according to the actual application scenario and is not limited here. Furthermore, the terminal device can generate watermark information corresponding to each frequency band in multiple frequency bands based on the weighted frequency domain image feature coefficient matrix. It can be understood that the watermark information can be associated with the frequency band features and the frequency domain image feature coefficient matrix of each frequency band. For example, watermark information for a certain frequency band can be generated based on the quantization step size and the frequency domain image feature coefficient matrix. For details, please refer to subsequent embodiments, which will not be elaborated here.
[0074] In some feasible implementations, the aforementioned multiple frequency bands include a first frequency band, a second frequency band, and a third frequency band obtained by dividing the spatial frequencies corresponding to the matrix coordinate system of the aforementioned frequency domain image feature coefficient matrix; wherein the first frequency band, the second frequency band, and the third frequency band are arranged in ascending frequency order, the first frequency band corresponds to the main structure of the aforementioned original image, the second frequency band corresponds to the edge transition region of the aforementioned original image, and the third frequency band corresponds to the noise and texture details in the aforementioned original image. For ease of description, it is assumed that u represents the frequency component index of the image block in the horizontal direction (column direction), and v represents the frequency component index of the image block in the vertical direction (row direction). The terminal device can use the region u+v≤3 as the aforementioned first frequency band (also called the low frequency band), which can correspond to the main structure of the original image and can carry approximately 15% of the energy in the original image. The terminal device can use the region 4≤u+v≤5 as the aforementioned second frequency band (also called the mid frequency band), which can correspond to the edge transition region of the original image and can carry approximately 30% of the energy in the original image. The terminal device can use the region u+v≥6 as the aforementioned third frequency band (also known as the high-frequency band). The third frequency band corresponds to noise and texture details in the original image and can carry approximately 55% of the energy in the original image. 
It is understood that this division method facilitates the subsequent generation of watermark information adapted to different frequency characteristics, thereby optimizing the image quality after watermark embedding. In this embodiment, the frequency band can be divided into a first frequency band, a second frequency band, and a third frequency band according to the spatial frequency corresponding to the matrix coordinate system of the frequency domain image feature coefficient matrix. This effectively distinguishes the main structure, edge transition regions, and noise of the original image, achieving layered, decoupled, and interpretable processing of the original image information. This lays the foundation for subsequent frequency band-based watermark information generation and embedding of corresponding watermark information, improving the interpretability, robustness, and flexibility of watermark embedding, and enhancing its applicability.
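The three-band division by u+v can be written as boolean masks over the coefficient matrix. This is a minimal sketch for an N×N block; the function name `band_masks` is hypothetical.

```python
import numpy as np

def band_masks(n=8):
    """Return (low, mid, high) boolean masks for an n x n coefficient matrix."""
    v, u = np.indices((n, n))  # v: row (vertical) index, u: column (horizontal) index
    s = u + v
    low = s <= 3                 # first frequency band
    mid = (s >= 4) & (s <= 5)    # second frequency band
    high = s >= 6                # third frequency band
    return low, mid, high
```

For N = 8 the masks are disjoint and cover all 64 positions, with 10, 11, and 43 coefficients in the first, second, and third bands respectively.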
[0075] In some feasible implementations, the terminal device can input the amplitude and the phase of the aforementioned frequency domain image feature coefficient matrix into a weight generation network. After multi-layer convolution processing in the weight generation network, a first image feature weighting coefficient matrix is obtained. This first image feature weighting coefficient matrix includes multiple first image feature weighting coefficients corresponding to the image feature coefficients at different spatial locations in the frequency domain image feature coefficient matrix. It can be understood that the amplitude and the phase of the frequency domain image feature coefficient matrix can serve as the two input channels of the weight generation network. Here, the weight generation network can include multi-layer convolution (such as 3 layers of convolution with 3×3 kernels) and a GELU activation function to process the amplitude and phase input to the weight generation network, thereby outputting a single-channel weight map, i.e., the aforementioned first image feature weighting coefficient matrix. For example, the weight generation network can contain three convolutional layers. Each layer can use a 3×3 convolutional kernel with a stride of 1 and "same" padding (i.e., during convolution, the input matrix is padded with an appropriate number of zeros around its edges so that the height and width of the output matrix are the same as those of the input matrix). The first convolutional layer can have 2 input channels (corresponding to amplitude and phase) and 16 output channels, followed by GELU activation. The second convolutional layer can have 16 input channels and 8 output channels, followed by GELU activation. 
The third convolutional layer can have 8 input channels and 1 output channel to obtain a single-channel weight map, i.e., the first image feature weighting coefficient matrix described above. The entire network may not contain fully connected layers, but only generate the first image feature weighting coefficient matrix through multiple convolutions and the GELU activation function. The weight generation network described above can also be, but is not limited to, lightweight networks with residual connections, multi-scale convolutional networks, etc. The specific network structure of the weight generation network can be determined according to the actual application scenario and is not limited here. The terminal device can then apply different weight scaling coefficients to the first image feature weighting coefficients corresponding to the first frequency band, the second frequency band, and the third frequency band in the first image feature weighting coefficient matrix to generate a second image feature weighting coefficient matrix. The weight scaling coefficients may include gain coefficients for enhancing the first image feature weighting coefficients and attenuation coefficients for suppressing them. It is understood that the weight scaling coefficients corresponding to different frequency bands can be determined based on frequency characteristics. The terminal device can obtain the weight scaling coefficients corresponding to each frequency band from historical data or from user-defined weight scaling coefficients; the specific determination depends on the actual application scenario and is not limited here. 
For example, the terminal device can apply an attenuation coefficient of 0.3-0.5 to the first image feature weighting coefficients corresponding to the third frequency band, and apply gain coefficients of 1.2-1.5 to the first image feature weighting coefficients corresponding to the first and second frequency bands respectively, to generate the second image feature weighting coefficient matrix. Furthermore, the terminal device can obtain a learnable scaling parameter and generate a weighted frequency domain image feature coefficient matrix based on the second image feature weighting coefficient matrix, the learnable scaling parameter, and the frequency domain image feature coefficient matrix. The learnable scaling parameter is used to scale the second image feature weighting coefficient matrix. It is understood that the terminal device can obtain the value of the scaling parameter from historical data or obtain a user-defined scaling parameter value; the specific value can be determined according to the actual application scenario and is not limited here. In this embodiment, the amplitude and phase of the frequency domain image feature coefficient matrix can be input into the weight generation network to obtain a first image feature weighting coefficient matrix. Then, different weight scaling coefficients can be applied to the first image feature weighting coefficients corresponding to the first frequency band, the second frequency band, and the third frequency band in the first image feature weighting coefficient matrix to generate a second image feature weighting coefficient matrix. Then, a weighted frequency domain image feature coefficient matrix can be generated by combining the second image feature weighting coefficient matrix with the learnable scaling parameter and the frequency domain image feature coefficient matrix. 
Attenuation or enhancement coefficients can be applied to the frequency domain image feature coefficient matrix of each frequency band to obtain a weighted frequency domain image feature coefficient matrix that strengthens the image subject information and weakens noise. This lays the foundation for subsequent generation of watermark information by frequency band and embedding of corresponding watermark information, improves the robustness and flexibility of watermark embedding, and has high applicability.
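The three-layer weight generation network (2→16→8→1 channels, 3×3 kernels, stride 1, zero "same" padding, GELU after the first two layers) can be sketched in plain NumPy. The kernels below are random and untrained, biases are omitted, and GELU uses its tanh approximation; all of these are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def conv3x3_same(x, k):
    """x: (C_in, H, W); k: (C_out, C_in, 3, 3); stride 1, zero 'same' padding."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((k.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            # Accumulate the contribution of kernel tap (dy, dx).
            out += np.einsum("oc,chw->ohw", k[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + w])
    return out

def weight_map(coeffs, kernels):
    """Amplitude and phase in, single-channel weight map out."""
    x = np.stack([np.abs(coeffs), np.angle(coeffs)])  # 2 input channels
    x = gelu(conv3x3_same(x, kernels[0]))             # 2 -> 16
    x = gelu(conv3x3_same(x, kernels[1]))             # 16 -> 8
    return conv3x3_same(x, kernels[2])[0]             # 8 -> 1

rng = np.random.default_rng(0)
shapes = [(16, 2, 3, 3), (8, 16, 3, 3), (1, 8, 3, 3)]
kernels = [0.1 * rng.standard_normal(s) for s in shapes]  # untrained weights
coeffs = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
a1 = weight_map(coeffs, kernels)  # first weighting coefficient matrix, 8 x 8
```

Because every layer keeps the height and width unchanged, each entry of the output map lines up with one coefficient position in the input matrix.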
[0076] In some feasible implementations, the above-mentioned weighted frequency domain image feature coefficient matrix can satisfy:
[0077] where the quantities in the formula are, respectively, the above weighted frequency domain image feature coefficient matrix, the above frequency domain image feature coefficient matrix, the above learnable scaling parameter, and the above second image feature weighting coefficient matrix. It can be understood that the value of the learnable scaling parameter can be determined according to the actual application scenario and is not limited here. In the embodiments of this application, a weighted frequency domain image feature coefficient matrix can be generated based on the frequency domain image feature coefficient matrix, the learnable scaling parameter, and the second image feature weighting coefficient matrix, wherein the learnable scaling parameter can be used as an intensity gate that controls the scale of the dynamic weighting, so as to highlight frequencies that are friendly to watermark embedding (such as those that are insensitive to distortion and highly robust to attacks). This results in a weighted frequency domain image feature coefficient matrix that enhances the main image information and weakens noise, laying the foundation for subsequently generating watermark information by frequency band and embedding the corresponding watermark information. It balances the intensity and fidelity of subsequent watermark embedding, improves the robustness and flexibility of watermark embedding, and has high applicability.
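The band-wise scaling and the final weighting can be illustrated together. The sketch assumes an element-wise form in which the coefficient matrix is multiplied by the scaled second weighting matrix, $F \odot (\alpha A_2)$; this form, the symbol names, and the default gains (chosen from the 1.2-1.5 and 0.3-0.5 ranges given in the text) are assumptions and may differ from the formula of this embodiment.

```python
import numpy as np

def weighted_coeffs(coeffs, a1, alpha=1.0, gains=(1.3, 1.3, 0.4)):
    """Apply per-band gains to a1, then weight the coefficients.

    coeffs: (N, N) frequency domain coefficient matrix
    a1:     (N, N) first weighting coefficient matrix
    alpha:  scaling parameter (learnable in the embodiment, fixed here)
    gains:  (low, mid, high) weight scaling coefficients
    """
    n = coeffs.shape[-1]
    v, u = np.indices((n, n))
    s = u + v
    # Gain for u+v <= 3 and 4 <= u+v <= 5, attenuation for u+v >= 6.
    scale = np.where(s <= 3, gains[0], np.where(s <= 5, gains[1], gains[2]))
    a2 = a1 * scale                 # second weighting coefficient matrix
    return coeffs * (alpha * a2)    # assumed element-wise weighting
```

With this form, setting `alpha` to zero switches the weighting off entirely, which is one way to read the "intensity gate" role of the scaling parameter.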
[0078] S303, Based on the above-mentioned weighted frequency domain image feature coefficient matrix, generate watermark information corresponding to each frequency band in the above-mentioned multiple frequency bands.
[0079] In some feasible implementations, the terminal device can obtain a pseudo-random sequence generated using a first key and an image hash, and generate first watermark information corresponding to the first frequency band according to the pseudo-random sequence and the sign function value of the weighted frequency domain image feature coefficient matrix; generate second watermark information corresponding to the second frequency band according to the quantization step size and the weighted frequency domain image feature coefficient matrix; and generate third watermark information corresponding to the third frequency band according to the intensity coefficient, the weighted frequency domain image feature coefficient matrix, the local mean of the weighted frequency domain image feature coefficient matrix, and a random matrix controlled by the second key, thereby obtaining watermark information corresponding to each of the multiple frequency bands. It can be understood that the watermark information corresponding to each frequency band (i.e., the first watermark information, the second watermark information, and the third watermark information) can be calculated based on the frequency band characteristics and the weighted frequency domain image feature coefficient matrix corresponding to each frequency band. In this embodiment, the first watermark information corresponding to the first frequency band can be generated based on the pseudo-random sequence generated by the first key and image hash, and the sign function value of the weighted frequency domain image feature coefficient matrix. The second watermark information corresponding to the second frequency band can be generated based on the quantization step size and the weighted frequency domain image feature coefficient matrix. 
The third watermark information corresponding to the third frequency band can be generated based on the intensity coefficient, the weighted frequency domain image feature coefficient matrix, the local mean of the weighted frequency domain image feature coefficient matrix, and the random matrix controlled by the second key. Differentiated watermark information is thus generated according to the characteristics of each frequency band. The first watermark information, corresponding to the first frequency band (i.e., the low-frequency band), can resist geometric attacks because it is bound to the coefficient signs. The second watermark information, corresponding to the second frequency band (i.e., the mid-frequency band), can balance the robustness and concealment of the watermark embedding by dynamically adjusting the quantization. The third watermark information, corresponding to the third frequency band (i.e., the high-frequency band), can generate weak, content-adaptive perturbations based on local statistical characteristics to support anti-counterfeiting and traceability. The differentiated watermark information corresponding to each frequency band can effectively address the failure of traditional watermarks under image compression, rotation, and cropping. The watermark information has anti-counterfeiting properties, high security, high flexibility, and high applicability.
[0080] Optionally, in some feasible implementations, the first watermark information corresponding to the first frequency band can be generated based on the pseudo-random sequence generated by the first key and the image hash, as well as the sign function value of the weighted frequency domain image feature coefficient matrix. The first watermark information can satisfy:
[0081] where the quantities in the formula are, respectively, the above first watermark information, the above pseudo-random sequence generated from the first key and the image hash, the above weighted frequency domain image feature coefficient matrix, and the sign function value of the above weighted frequency domain image feature coefficient matrix. It can be understood that the first watermark information preserves the sign information of the weighted frequency domain image feature coefficient matrix, so that embedding the first watermark information does not disrupt the main energy structure of the image. At the same time, this way of calculating and generating the first watermark information gives it strong correspondence, unpredictability, and a strict correlation with the image content: any modification to the image may change the image hash value and thereby break watermark synchronization, which supports integrity verification. The terminal device can embed the first watermark information into the weighted frequency domain image feature coefficient matrix corresponding to the first frequency band. Because the first frequency band carries the main structure of the image, the main watermark information embedded there is robust against attacks. It can be understood that the calculation method and specific value of the first watermark information can be determined according to the actual application scenario and are not limited here.
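A minimal sketch of the first watermark information follows, assuming it is the element-wise product of a bipolar (±1) pseudo-random sequence and the sign of the weighted coefficients. The use of SHA-256 for the image hash, the seeding scheme, and the use of the real part for the sign are illustrative assumptions; the function name is hypothetical.

```python
import hashlib
import numpy as np

def first_watermark(fw, key: bytes, image_bytes: bytes):
    """Key- and content-bound bipolar sequence times sign(fw)."""
    # Assumption: SHA-256 of the raw image bytes stands in for the image hash.
    image_hash = hashlib.sha256(image_bytes).digest()
    # Derive a 64-bit seed from the first key and the image hash.
    seed = int.from_bytes(hashlib.sha256(key + image_hash).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    prn = rng.choice([-1.0, 1.0], size=fw.shape)  # pseudo-random sequence
    return prn * np.sign(np.real(fw))             # bind to coefficient signs
```

The same key and image reproduce the same sequence, whereas changing the image bytes changes the hash and therefore the sequence, which is what makes synchronization sensitive to tampering.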
[0082] Optionally, in some feasible implementations, the second watermark information corresponding to the second frequency band can be generated based on the quantization step size and the weighted frequency domain image feature coefficient matrix, and the second watermark information can satisfy:
[0083] where the quantities in the formula are, respectively, the above second watermark information, the above quantization step size, and the above weighted frequency domain image feature coefficient matrix. Here, Q(·) is a quantizer for the given step size, which can typically be a uniform quantizer or a dithered quantizer; it is used to normalize the coefficients in the weighted frequency domain image feature coefficient matrix and quantize them to an integer grid. It can be understood that the larger the quantization step size, the sparser the quantization grid and the wider the quantization interval for each watermark information bit (e.g., bit 0 or 1), which can absorb more interference and gives higher robustness; the smaller the quantization step size, the finer the quantization grid and the smaller the modification (i.e., the quantization error) applied to the original coefficients when the second watermark information is embedded. A smaller modification reduces the additional distortion introduced by watermark embedding and makes pixel value changes in the spatial domain less perceptible to the human visual system, which enhances the concealment of the watermark and preserves the visual quality of the image as far as possible. It can be understood that the specific value of the quantization step size can be dynamically or adaptively adjusted according to the expected attack strength, image characteristics, or concealment requirements to achieve the best balance in a specific application scenario. The specific value can be determined according to the actual application scenario and is not limited here.
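The trade-off controlled by the quantization step size can be seen in a small uniform-quantizer sketch. It assumes the form Δ·Q(F/Δ) with Q(·) implemented as rounding to the nearest integer; the dithered variant and any bit-dependent grid offsets are omitted, and the function name is hypothetical.

```python
import numpy as np

def second_watermark(fw, delta):
    """Snap coefficients to a uniform grid of spacing `delta`."""
    q = np.round(fw / delta)  # Q(.): normalize and map to the integer grid
    return delta * q          # quantized coefficient values

fw = np.array([0.26, -0.74, 1.0])
coarse = second_watermark(fw, 0.5)   # larger step: sparser grid, larger error
fine = second_watermark(fw, 0.05)    # smaller step: finer grid, smaller error
```

For a uniform rounding quantizer the modification to each coefficient is bounded by half the step size, which is why a small step favors concealment and a large step favors robustness.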
[0084] Optionally, in some feasible implementations, the third watermark information corresponding to the third frequency band can be generated based on the intensity coefficient, the aforementioned weighted frequency domain image feature coefficient matrix, the local mean of the aforementioned weighted frequency domain image feature coefficient matrix, and a random matrix controlled by the second key. The third watermark information can satisfy:
[0085] Here, the quantities in the above relation are: the third watermark information; the intensity coefficient β; the weighted frequency domain image feature coefficient matrix; the local mean of the weighted frequency domain image feature coefficient matrix; and the random matrix controlled by the second key. The difference between the weighted frequency domain image feature coefficient matrix and its local mean is the centered high-frequency residual, which can be used to represent the details and noise components in the original image. The second key and the first key used to calculate the first watermark information can have the same or different values, depending on the specific application scenario; no restrictions are placed here. The intensity coefficient β can take a small value (e.g., 0.1-0.3) to strictly control the modification range and reduce artifacts (such as ringing effects) in the reconstructed image caused by excessive modification. The specific value of the intensity coefficient can be determined based on the actual application scenario and is not limited here. The third watermark information generated in this way is highly concealed: the disturbance is strictly confined to the high-frequency band region with a small intensity coefficient β, so that the visual impact stays below the perception threshold of the human eye. Furthermore, the random matrix controlled by the second key prevents unauthorized parties from detecting or extracting the embedded third watermark information, which gives the watermarked image high security.
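A minimal sketch of the third watermark information follows, assuming the relation is the element-wise product of β, the centered residual (coefficient minus local mean), and a keyed random matrix. The 3x3 neighbourhood used for the local mean, the ±1 random matrix, and the integer key are assumptions of this sketch.

```python
import random


def local_mean(m, u, v):
    """Mean over the 3x3 neighbourhood of (u, v), clipped at the borders."""
    rows, cols = len(m), len(m[0])
    vals = [m[i][j]
            for i in range(max(0, u - 1), min(rows, u + 2))
            for j in range(max(0, v - 1), min(cols, v + 2))]
    return sum(vals) / len(vals)


def third_watermark(weighted, beta, second_key):
    """Hypothetical sketch: W3 = beta * (F_w - local_mean(F_w)) * R."""
    rng = random.Random(second_key)  # R is reproducible only with the key
    out = []
    for u, row in enumerate(weighted):
        out_row = []
        for v, c in enumerate(row):
            r = rng.choice((-1.0, 1.0))
            out_row.append(beta * (c - local_mean(weighted, u, v)) * r)
        out.append(out_row)
    return out


# Example: an isolated peak produces a residual scaled by beta.
peak = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
w3 = third_watermark(peak, 0.2, 7)
```

In a flat region the residual is zero, so nothing is embedded there; the perturbation magnitude is always bounded by β times the local residual.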
[0086] S304, based on the watermark embedding indication information corresponding to the above multiple frequency bands, the watermark information corresponding to each of the above frequency bands, and the weighted frequency band image feature matrix, a watermarked frequency domain image feature coefficient matrix is generated, wherein the watermark embedding indication information corresponding to the above multiple frequency bands is used to indicate the embedding position of the above watermark information and / or the embedding strength of the above watermark information.
[0087] In some feasible implementations, the terminal device can generate a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the aforementioned multiple frequency bands, the watermark information corresponding to each frequency band, and the aforementioned weighted frequency band image feature matrix. The watermark embedding indication information corresponding to the aforementioned multiple frequency bands is used to indicate the embedding position and / or embedding strength of the watermark information. Here, the terminal device can obtain the watermark embedding indication information corresponding to the aforementioned multiple frequency bands from local storage, download it from a connected server, or obtain it through user input. The specific method can be determined according to the actual application scenario and is not limited here. The terminal device can obtain the embedding position and / or embedding strength of the watermark information corresponding to each frequency band from the watermark embedding indication information corresponding to the above multiple frequency bands. Then, based on the embedding position and / or embedding strength of the watermark information, it can embed the corresponding watermark information into the weighted frequency band image feature matrix of each frequency band, that is, embed the first watermark information into the weighted frequency band image feature matrix of the first frequency band, embed the second watermark information into the weighted frequency band image feature matrix of the second frequency band, and embed the third watermark information into the weighted frequency band image feature matrix of the third frequency band.
[0088] In some feasible implementations, the watermark embedding indication information corresponding to the multiple frequency bands includes a frequency band mask corresponding to the multiple frequency bands. After generating the watermark information corresponding to each of the multiple frequency bands and before generating the watermarked frequency domain image feature coefficient matrix, the method further includes: generating a frequency band mask corresponding to the multiple frequency bands, wherein the frequency band mask is a binary matrix with the same matrix size as the weighted frequency domain image feature coefficient matrix. The frequency band mask is used to set, in the matrix coordinate system of the weighted frequency band image feature matrix, multiple target matrix coordinates for embedding the first watermark information, the second watermark information, or the third watermark information. For example, suppose that, after generating the watermark information corresponding to each of the multiple frequency bands, the terminal device generates a frequency band mask matrix M ∈ {0,1} whose size is the same as that of the weighted frequency domain image feature coefficient matrix. M is a binary matrix whose elements are 0 or 1, and its structure can be determined directly by the frequency band division. A non-zero element (i.e., a position where M[u,v]=1, with u and v being matrix coordinates) corresponds to pre-selected frequency band coordinates suitable for embedding the watermark (e.g., the regions corresponding to the first and second frequency bands). A zero-valued element (i.e., a position where M[u,v]=0) indicates that no watermark information is embedded at those frequency band coordinates and the original coefficient is kept unchanged. 
Optionally, the terminal device can use the Sobel edge detection operator to calculate the edge intensity values of multiple image blocks of the original image (such as the image blocks obtained by segmenting the original image as described above). Then, based on the edge intensity value of each image block and a preset edge intensity threshold, each image block can be classified as either a smooth region image block or a textured edge image block: if the edge intensity value of an image block is less than the edge intensity threshold, the terminal device can classify it as a smooth region image block; if the edge intensity value is greater than or equal to the edge intensity threshold, the terminal device can classify it as a textured edge image block. The terminal device can then determine the candidate embedding location set based on the region type of the image block. For smooth region image blocks, the terminal device can select the embedding locations (i.e., the target matrix coordinates mentioned above) from the frequency domain coordinate set corresponding to the first frequency band (the region where u+v≤2); for textured edge image blocks, the terminal device can select the embedding locations from the frequency domain coordinate set corresponding to the second frequency band (the region where 4≤u+v≤5). Here, the frequency domain coordinate set corresponding to the third frequency band is generally not used for embedding watermarks. Furthermore, the terminal device can construct a binary frequency band mask matrix for each image block, in which the matrix elements at the target embedding locations are set to 1 and the remaining matrix elements are set to 0. 
Integrating the binary frequency band mask matrices of all image patches can obtain multiple frequency band masks (i.e., M mentioned above). Optionally, the terminal device can also introduce sparsity control. For example, based on the image blocks of the original image, 6-12 coefficients are activated for the weighted frequency band image feature matrix corresponding to each image block, so that the embedding density of the watermark information is maintained at about 15%-25%, further improving the concealment and security of the watermark. In the embodiments of this application, after generating the watermark information corresponding to each frequency band, a frequency band mask corresponding to each frequency band can be generated. The frequency band mask can be used to indicate the target matrix coordinates of the watermark embedding. The watermark information can be added to specific frequency bands (usually mid-frequency bands and low-frequency bands) with low visual sensitivity and strong robustness to common attacks through binary matrices. This reduces watermark embedding operations in high-frequency noise areas (such as high-frequency bands) or core low-frequency bands (such as DC components) that are crucial to image quality, thereby reducing the introduction of unnecessary distortion and achieving a balance between concealment, robustness and embedding capacity. This enables accurate, controllable and efficient watermark information embedding, improving the overall stability and applicability of watermark embedding.
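The block classification and mask construction described above can be sketched as follows. The sketch assumes 0-based coordinates, uses the mean Sobel gradient magnitude over interior pixels as the edge intensity value, and assumes the spatial block and its coefficient matrix have the same size; the function names and the threshold value are hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def edge_strength(block):
    """Mean Sobel gradient magnitude over the interior pixels of a block."""
    h, w = len(block), len(block[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * block[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * block[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            total += math.hypot(gx, gy)
            count += 1
    return total / count


def band_mask_for_block(block, threshold):
    """Binary mask: band u+v<=2 for smooth blocks, 4<=u+v<=5 for edge blocks."""
    smooth = edge_strength(block) < threshold
    n = len(block)
    mask = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = u + v
            if (smooth and s <= 2) or (not smooth and 4 <= s <= 5):
                mask[u][v] = 1
    return mask


flat_block = [[10] * 8 for _ in range(8)]             # no edges
step_block = [[0] * 4 + [255] * 4 for _ in range(8)]  # one vertical edge
smooth_mask = band_mask_for_block(flat_block, 50.0)
edge_mask = band_mask_for_block(step_block, 50.0)
```

For an 8x8 block this activates 6 coordinates for a smooth block and 11 for a textured edge block; a sparsity control step could further thin these sets.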
[0089] In some feasible implementations, the watermark embedding indication information corresponding to the aforementioned multiple frequency bands includes the frequency band mask and watermark embedding strength coefficient corresponding to the aforementioned multiple frequency bands. Here, the frequency band mask corresponding to the aforementioned multiple frequency bands can be found in the foregoing embodiments, and is not limited further. The terminal device can obtain the value of the aforementioned watermark embedding strength coefficient from local storage space, or download the value of the aforementioned watermark embedding strength coefficient from a connected server, or obtain the value of the aforementioned watermark embedding strength coefficient through user input. The specific value can be determined according to the actual application scenario, and is not limited here. It is understood that the values of the watermark embedding strength coefficients corresponding to different frequency bands can be the same or different, and the specific value can be determined according to the actual application scenario, and is not limited here. For ease of description, the following explanation will take the example of the same value for the watermark embedding strength coefficients corresponding to multiple frequency bands. 
The terminal device can quantize and modulate the weighted frequency domain image feature coefficients corresponding to each of the target matrix coordinates in the weighted frequency domain image feature coefficient matrix according to the frequency band mask corresponding to the multiple frequency bands, the watermark embedding strength coefficient, the first watermark information, the second watermark information, and the third watermark information, so as to embed the first watermark information, the second watermark information, and the third watermark information in the weighted frequency domain image feature coefficient matrix to obtain a watermarked frequency domain image feature coefficient matrix. It is understood that the terminal device can embed the first watermark information into the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the frequency band mask corresponding to the first frequency band based on the first watermark information, the frequency band mask corresponding to the first frequency band, and the watermark embedding strength coefficient; embed the second watermark information into the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the frequency band mask corresponding to the second frequency band based on the second watermark information, the frequency band mask corresponding to the second frequency band, and the watermark embedding strength coefficient; and embed the third watermark information into the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the frequency band mask corresponding to the third frequency band, based on the third watermark information, the frequency band mask corresponding to the third frequency band, and the watermark embedding strength coefficient, to generate a watermarked frequency domain image feature coefficient matrix. 
In this embodiment, the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the weighted frequency domain image feature coefficient matrix can be quantized and modulated based on the frequency band mask, watermark embedding strength coefficient, and watermark information corresponding to each frequency band. This enables the embedding of the first watermark information, the second watermark information, and the third watermark information, resulting in a watermarked frequency domain image feature coefficient matrix. The watermark information is then encoded into the quantization amplitude of the weighted frequency domain image feature coefficients through a quantization network, making the watermark information deeply bound to the weighted frequency domain image feature coefficients. At the same time, the quantization network has anti-attack properties, improving robustness and enhancing system security, and has high applicability.
[0090] In some feasible implementations, the frequency domain image feature coefficient matrix with watermark described above can satisfy:
[0091] Here, the quantities in the above relation are: the watermarked frequency domain image feature coefficient matrix; the weighted frequency domain image feature coefficient matrix; the watermark embedding strength coefficient; the frequency band mask M; and the watermark information corresponding to each of the frequency bands, which can be the first watermark information, the second watermark information, and the third watermark information of the foregoing embodiments. With the frequency band mask matrix M ∈ {0,1}, the terminal device performs the watermark embedding at positions where M is 1; at positions where M is 0, the original coefficient remains unchanged, meaning no watermark information is embedded. In this embodiment, based on the frequency band mask corresponding to each frequency band, the watermark embedding strength coefficient, and the watermark information, the weighted frequency domain image feature coefficients corresponding to each target matrix coordinate in the weighted frequency domain image feature coefficient matrix can be quantized and modulated to embed the first, second, and third watermark information. The watermark embedding strength coefficient gives precise control over the amplitude of the watermark information, balancing the robustness and concealment of watermark embedding. The frequency band mask restricts the embedding position of the watermark information, so that the watermark information is added only at the pre-selected target matrix coordinates, reducing unnecessary distortion. The watermark information corresponding to each of the frequency bands is secure and strongly tied to the image, which improves the security and attack resistance of the watermarked image. 
This method calculates and generates a frequency domain image feature coefficient matrix containing the watermark; it is simple to implement, computationally efficient, robust, and highly applicable.
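As a hedged illustration, the masked embedding can be sketched in its simplest additive form, in which the watermarked coefficient equals the weighted coefficient plus the strength coefficient times the mask times the watermark value. The additive form, the symbol alpha, and the function name are assumptions of this sketch; the application itself describes the operation as quantization and modulation.

```python
def embed_additive(weighted, mask, watermark, alpha):
    """Hypothetical sketch: F_wm = F_w + alpha * M * W, element-wise.

    Where the mask is 0 the original coefficient is returned unchanged.
    """
    return [[c + alpha * m * w for c, m, w in zip(c_row, m_row, w_row)]
            for c_row, m_row, w_row in zip(weighted, mask, watermark)]


weighted = [[10.0, -4.0], [2.0, 1.0]]
mask = [[1, 0], [0, 1]]
watermark = [[1, -1], [1, -1]]
marked = embed_additive(weighted, mask, watermark, 0.5)
```

Only the two masked positions change, and the size of each change is set by the strength coefficient alone.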
[0092] Optionally, in some feasible implementations, the terminal device can perform dithering modulation on the weighted frequency domain image feature coefficients corresponding to the target matrix coordinates to generate a watermarked frequency domain image feature coefficient matrix. Here, the terminal device can introduce a quantization step size, the value of which can be determined according to image features, expected attack strength, and the like. For example, the adaptive adjustment range of the value can be 0.05-0.15; the specific value can be determined according to the actual application scenario and is not limited here. For example, assuming the frequency band mask matrix is M ∈ {0,1} and u and v are matrix coordinates, a position where M[u,v]=1 can be one of the target matrix coordinates mentioned above, and a position where M[u,v]=0 indicates that no watermark information is embedded at those frequency band coordinates and the original coefficient remains unchanged. The watermarked frequency domain image feature coefficient matrix can satisfy:
[0093] Here, the quantities in the above relation are: the weighted frequency domain image feature coefficient corresponding to the matrix coordinates [u,v]; the watermark embedding strength coefficient; the quantization step size; and the watermark information corresponding to each of the frequency bands. When M[u,v]=1, the watermark information is embedded into the corresponding weighted frequency domain image feature coefficient; when M[u,v]=0, the original coefficient remains unchanged. In the embodiments of this application, by introducing an adaptively adjusted quantization step size, each coefficient can obtain a suitable quantization step size within the given adjustment range, thereby maximizing robustness while maintaining high concealment.
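Dithering modulation at masked coordinates can be illustrated with standard quantization index modulation (QIM), used here as a stand-in since the exact relation is not recoverable from the text. The sketch embeds one bit per coefficient by snapping it to one of two interleaved grids and leaves the embedding strength coefficient out; all function names are hypothetical.

```python
def qim_embed(coeff, bit, step):
    """Snap coeff to the grid for `bit`: multiples of step, shifted by step/2 for bit 1."""
    dither = bit * step / 2.0
    return step * round((coeff - dither) / step) + dither


def qim_extract(coeff, step):
    """Recover the bit whose grid lies closest to the received coefficient."""
    d0 = abs(coeff - qim_embed(coeff, 0, step))
    d1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if d0 <= d1 else 1


def embed_masked(weighted, mask, bits, step):
    """Apply QIM only where the mask is 1; other coefficients are unchanged."""
    return [[qim_embed(c, b, step) if m == 1 else c
             for c, m, b in zip(c_row, m_row, b_row)]
            for c_row, m_row, b_row in zip(weighted, mask, bits)]


# Example with a step size inside the 0.05-0.15 range mentioned above.
marked = embed_masked([[0.537, 0.2]], [[1, 0]], [[1, 1]], 0.1)
recovered = qim_extract(marked[0][0] + 0.02, 0.1)  # survives a small perturbation
```

In this scheme a bit survives any perturbation smaller than a quarter of the step size, which is why a larger step gives more robustness at the cost of a larger embedding distortion.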
[0094] S305, based on the original image and the watermarked frequency domain image feature coefficient matrix, the watermarked frequency domain image feature coefficient matrix is subjected to hybrid domain fusion processing in the spatial and frequency domains to generate a fused image feature matrix after watermark embedding.
[0095] In some feasible implementations, the terminal device can perform hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in both the spatial and frequency domains, based on the original image and the watermarked frequency domain image feature coefficient matrix, to generate a fused image feature matrix after watermark embedding. Here, the hybrid domain processing can employ methods including, but not limited to, weighted fusion, Poisson fusion, or gradient domain fusion; the method can be determined according to the actual application scenario and is not limited here. For example, see Figure 5, which is a comparative diagram of the original image and the watermarked target image provided in an embodiment of this application. As shown in Figure 5, the terminal device can use the watermark embedding method provided in this application embodiment to process the original image and output it as a target image with a watermark. After adaptive intensity adjustment, the watermark information can be highly integrated with the content of the original image, maintaining the visual quality of the target image. At the same time, the target image retains the key semantic regions of the original image, improving the semantic consistency between the target image after watermark embedding and the original image.
[0096] In some feasible implementations, the terminal device can use the aforementioned basis matrix to perform an inverse transformation on the watermarked frequency domain image feature matrix to obtain a watermark-embedded spatial domain image block feature matrix. It can then extract a spatial semantic feature matrix from the original image and concatenate the watermark-embedded spatial domain image block feature matrix with the spatial semantic feature matrix along the channel dimension to obtain a fused image feature matrix. Here, the spatial semantic feature matrix can carry the spatial structure information of the original image. This spatial semantic feature matrix can be the spatial semantic feature matrix corresponding to the original image in the conv4 layer of a convolutional neural network. Specifically, the spatial semantic feature matrix corresponding to the original image in the conv4 layer of the convolutional neural network can be a high-dimensional representation of the original image in the mid-layer feature space of the network after multiple convolutions, pooling, and nonlinear activations. This spatial semantic feature matrix retains the spatial structure information, and its two-dimensional arrangement can maintain a correspondence with the spatial position of the input original image. The feature vector at each position can be used to reflect the semantic content of the corresponding local region of the original image. Compared to shallow features (such as edges and textures), conv4 layer features can include mid-level semantic concepts such as object parts and structural relationships. Its channel dimension carries the diversity of image features; each channel can be considered a specific semantic detector, responding to different attribute patterns in the image (such as specific textures, shapes, or parts). 
The terminal device can then adaptively weight the aforementioned fused image feature matrix according to its channel dimension, generating a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix. This channel-weighted image feature matrix can be used to emphasize or suppress different image feature channels. Here, adaptive weighting of the channel dimension can be implemented through compression and excitation (SE) modules, as detailed in subsequent embodiments. The terminal device can also utilize the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix to generate a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix. This spatially weighted image feature matrix is used to emphasize or suppress different spatial locations in the feature space. It is understood that the feature space size of the spatially weighted image feature matrix is the same as that of the aforementioned fused image feature matrix, which enhances important spatial locations and suppresses less important ones. For details, please refer to subsequent embodiments; further, we will not elaborate here. Furthermore, the terminal device can multiply the aforementioned channel-weighted image feature matrix and the aforementioned spatially weighted image feature matrix element-wise to generate the fused image feature matrix after watermark embedding. 
It is understood that each matrix element in the fused image feature matrix after watermark embedding has undergone dual calibration using both channel attention and spatial attention mechanisms, so that key information related to the watermark and the semantics is retained to the greatest extent possible. In this embodiment, the spatial domain image block feature matrix can be reconstructed through inverse transformation, allowing the watermark information to be transformed into the spatial domain along with the watermarked frequency domain image feature matrix, thus obtaining the watermark-embedded spatial domain image block feature matrix. The watermark-embedded spatial domain image block feature matrix and the spatial semantic feature matrix are then concatenated along the channel dimension to obtain a fused image feature matrix. Adaptive weighting along the channel dimension can then be applied to the fused image feature matrix to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix. This channel-weighted image feature matrix enhances important channels and suppresses less important ones. Simultaneously, a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix can be generated based on the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix. This spatially weighted image feature matrix can be used to emphasize information-rich spatial regions. Furthermore, by combining the channel-weighted image feature matrix and the spatially weighted image feature matrix, a fused image feature matrix after watermark embedding can be generated. 
This ensures that the generated fused image feature matrix retains key information related to the watermark information and the image semantics to the greatest extent, providing a strong feature foundation for subsequent image processing.
[0097] Alternatively, in some feasible implementations, the terminal device can perform an inverse transformation on the watermarked frequency domain image feature matrix based on the aforementioned basis matrix to obtain a watermark-embedded spatial domain image block feature matrix, and obtain the spatial domain semantic feature matrix corresponding to the original image in the conv4 layer of the convolutional neural network. The terminal device can then concatenate the watermark-embedded spatial domain image block feature matrix and the spatial domain semantic feature matrix along the channel dimension to obtain a fused image feature matrix. This fused image feature matrix is then input into a compression and excitation module to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix. This channel-weighted image feature matrix is used to emphasize or suppress different image feature channels. It is understandable that the terminal device can concatenate the spatial image block feature matrix after watermark embedding with the spatial semantic feature matrix along the channel dimension. Here, the number of channels in the spatial semantic feature matrix can be a fixed 512 channels. Assuming the number of channels in the spatial image block feature matrix after watermark embedding is 64 channels, the number of channels in the concatenated fused image feature matrix is 576 channels. This fused image feature matrix integrates image feature information from the frequency domain and the spatial domain, improving the expressive power of the features. The terminal device can input the fused image feature matrix into the SE module to generate a channel-weighted image feature matrix with 576 channels, thereby enhancing important channels and suppressing secondary channels among the 576 feature channels. 
Furthermore, the terminal device can generate a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix based on the frequency domain energy distribution matrix corresponding to the frequency domain image feature coefficient matrix containing the watermark. The spatially weighted image feature matrix is used to emphasize or suppress different spatial locations in the feature space. For example, the terminal device can normalize the frequency domain energy distribution matrix |F''| corresponding to the watermarked frequency domain image feature coefficient matrix F'', and then use the output of a convolutional layer or directly as weights to generate a spatially weighted image feature matrix. The feature space size of this spatially weighted image feature matrix is the same as the feature space size of the fused image feature matrix, which enhances important spatial locations and suppresses secondary spatial locations. Furthermore, the terminal device can multiply the channel-weighted image feature matrix and the spatially weighted image feature matrix element-wise to generate the watermark-embedded fused image feature matrix. It can be understood that each matrix element in the watermark-embedded fused image feature matrix undergoes dual calibration using both channel attention and spatial attention mechanisms to maximize the retention of key information related to the watermark and semantics. 
In this embodiment, the spatial domain image block feature matrix can be reconstructed through inverse transformation, allowing the watermark information to be transformed into the spatial domain along with the watermarked frequency domain image feature matrix, thereby obtaining the watermark-embedded spatial domain image block feature matrix. The watermark-embedded spatial domain image block feature matrix and the spatial semantic feature matrix are concatenated along the channel dimension to obtain a fused image feature matrix. This fused image feature matrix can then be input into the compression and excitation module to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix. This channel-weighted image feature matrix enhances important channels and suppresses less important channels. Simultaneously, based on the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix, a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix can be generated. This spatially weighted image feature matrix can be used to emphasize information-rich spatial regions. Furthermore, the channel-weighted image feature matrix and the spatially weighted image feature matrix can be combined to generate the watermark-embedded fused image feature matrix. This ensures that the generated fused image feature matrix retains key information related to the watermark and image semantics to the greatest extent, providing a strong feature foundation for subsequent image processing.
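The fusion pipeline of this embodiment can be sketched on small nested lists. The sketch treats the compression and excitation module as a standard squeeze-and-excitation (SE) block (global average pooling, two fully connected layers with ReLU and sigmoid) and takes the spatial weights as the energy magnitudes normalized by their maximum. The weight matrices w1 and w2 would normally be learned; here they are plain arguments, and all names are hypothetical.

```python
import math


def concat_channels(a, b):
    """Concatenate two feature stacks (lists of HxW channels) along channels."""
    return a + b


def squeeze(features):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in features]


def excite(z, w1, w2):
    """Two fully connected layers: ReLU, then sigmoid gate per channel."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]


def spatial_weights(energy):
    """Normalize |energy| by its maximum to obtain weights in [0, 1]."""
    peak = max(max(abs(x) for x in row) for row in energy) or 1.0
    return [[abs(x) / peak for x in row] for row in energy]


def fuse(features, w1, w2, energy):
    """Channel-weight the features, then multiply by the spatial weights."""
    gates = excite(squeeze(features), w1, w2)
    sw = spatial_weights(energy)
    return [[[g * x * sw[i][j] for j, x in enumerate(row)]
             for i, row in enumerate(ch)]
            for g, ch in zip(gates, features)]


# Example: 2 watermark channels + 1 semantic channel give 3 fused channels
# (the embodiment above concatenates 64 + 512 = 576 channels).
wm = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
sem = [[[1.0, 1.0], [1.0, 1.0]]]
fused = concat_channels(wm, sem)
out = fuse(fused, [[0.0, 0.0, 0.0]], [[0.0], [0.0], [0.0]],
           [[2.0, -4.0], [0.0, 1.0]])
```

The sketch assumes the energy matrix already has the same height and width as the fused features; the text notes that a convolutional layer may be used to produce it instead.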
[0098] In some feasible implementations, after generating the watermarked frequency domain image feature coefficient matrix and before performing hybrid domain fusion processing on it in the spatial and frequency domains, the terminal device can perform high-frequency coefficient filtering on the watermarked frequency domain image feature coefficient matrix to suppress its high-frequency noise. Here, the terminal device can set the frequency domain image feature coefficients at the positions corresponding to the third frequency band to 0, or multiply them by a small attenuation factor (such as 0.1), while leaving the coefficients at the positions corresponding to the first and second frequency bands unchanged. The specific method can be determined according to the actual application scenario and is not limited here. After generating the fused image feature matrix after watermark embedding and before generating the watermarked target image from it, the terminal device can perform threshold filtering on the fused image feature matrix after watermark embedding to suppress its residual high-frequency noise. Here, the terminal device can obtain a preset threshold, which can be input by the user or can be a historical threshold. This threshold can be related to the bit depth of the original image; how it is determined can depend on the actual application scenario and is not limited here. The terminal device can iterate through each feature value in the fused image feature matrix after watermark embedding: if the absolute value of a feature value is less than the preset threshold, the feature value is set to 0; otherwise, it remains unchanged.
Optionally, the terminal device can instead iterate through each feature value in the fused image feature matrix after watermark embedding as follows: if the absolute value of a feature value is less than the preset threshold, the feature value is set to 0; if its absolute value is greater than the preset threshold, the preset threshold is subtracted from the absolute value of the feature value while the sign of the original value is kept unchanged. The specific threshold filtering method can be determined according to the actual application scenario and is not limited here. It can be understood that processing the high-frequency components twice improves both the robustness of the watermark embedding in the frequency domain and the visual quality of the image in the spatial domain. In this embodiment, after generating the watermarked frequency domain image feature coefficient matrix and before performing the inverse transformation on it, the high-frequency coefficients in the watermarked frequency domain image feature coefficient matrix can be filtered and attenuated to suppress high-frequency noise and improve the frequency domain robustness of the watermark embedding. After generating the fused image feature matrix after watermark embedding and before generating the watermarked target image from it, threshold filtering can be performed on the fused image feature matrix after watermark embedding to suppress residual high-frequency noise, reduce excessive distortion (such as edge blurring or noise amplification), and enhance spatial visual quality.
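As a non-limiting illustration, the two filtering stages described above could be sketched as follows. The function names, the rule `u + v >= cutoff` used to mark third-band positions, and the use of plain Python lists are assumptions made for this example only; this application does not prescribe them.

```python
import math
from typing import List

Matrix = List[List[float]]


def third_band_mask(size: int, cutoff: int) -> List[List[bool]]:
    """Assumed band rule: coefficient (u, v) is high-frequency when u + v >= cutoff."""
    return [[(u + v) >= cutoff for v in range(size)] for u in range(size)]


def attenuate_band(coeffs: Matrix, mask: List[List[bool]], factor: float = 0.1) -> Matrix:
    """Scale masked coefficients by `factor` (0.0 zeroes them); keep the rest."""
    return [
        [c * factor if m else c for c, m in zip(row, mask_row)]
        for row, mask_row in zip(coeffs, mask)
    ]


def hard_threshold(features: Matrix, threshold: float) -> Matrix:
    """Zero values whose magnitude is below the threshold; keep the others."""
    return [[0.0 if abs(v) < threshold else v for v in row] for row in features]


def soft_threshold(features: Matrix, threshold: float) -> Matrix:
    """Zero small values; shrink the rest toward zero by `threshold`, keeping the sign."""
    def shrink(v: float) -> float:
        magnitude = abs(v) - threshold
        return math.copysign(magnitude, v) if magnitude > 0 else 0.0

    return [[shrink(v) for v in row] for row in features]
```

The hard variant leaves surviving values untouched, while the soft variant also shrinks them, which avoids a jump in value at the threshold at the cost of slightly reducing every retained feature value.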
[0099] In some feasible implementations, the fused image feature matrix after watermark embedding can satisfy:
[0100] where the quantities involved are: the fused image feature matrix after watermark embedding; the channel-weighted image feature matrix; the spatially weighted image feature matrix; the fused image feature matrix; the spatial domain image block feature matrix after watermark embedding; the spatial semantic feature matrix; and the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix. In this embodiment, the spatial domain image block feature matrix after watermark embedding and the spatial semantic feature matrix can be concatenated along the channel dimension to obtain the fused image feature matrix. Adaptive weighting can then be performed on the fused image feature matrix along the channel dimension to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix. The channel-weighted image feature matrix allows important channels to be enhanced and secondary channels to be suppressed. Simultaneously, based on the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix, a spatially weighted image feature matrix with the same spatial size as the fused image feature matrix can be generated. The spatially weighted image feature matrix can be used to emphasize information-rich spatial regions. Furthermore, the channel-weighted image feature matrix and the spatially weighted image feature matrix can be combined to generate the fused image feature matrix after watermark embedding.
The fused image feature matrix generated in this way retains, to the greatest extent, the key information related to the watermark and the image semantics, providing a strong feature foundation for subsequent image processing. Computing the fused image feature matrix in this manner is simple to implement, computationally efficient, robust, and widely applicable.
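As a non-limiting illustration, the concatenation, channel weighting, and spatial weighting described above could be sketched as follows. Several choices here are assumptions made for the example and are not taken from this application: the two-layer gate with weights `w1` and `w2` (a squeeze-and-excitation style gate), the normalization of the energy distribution matrix by its maximum, and the element-wise product used to combine the two weightings. All function names are hypothetical.

```python
import math
from typing import List

Plane = List[List[float]]   # H x W
Tensor = List[Plane]        # C x H x W


def concat_channels(a: Tensor, b: Tensor) -> Tensor:
    """Concatenate two feature tensors along the channel dimension."""
    return a + b


def _matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def channel_gates(fused: Tensor, w1, w2) -> List[float]:
    """Global average pool per channel, then ReLU layer, then sigmoid layer."""
    squeezed = [sum(map(sum, p)) / (len(p) * len(p[0])) for p in fused]
    hidden = [max(0.0, h) for h in _matvec(w1, squeezed)]
    return [1.0 / (1.0 + math.exp(-z)) for z in _matvec(w2, hidden)]


def spatial_gates(energy: Plane) -> Plane:
    """Assumed mapping: scale the energy map so its largest entry becomes 1."""
    peak = max(max(row) for row in energy)
    if peak <= 0:
        return [[1.0 for _ in row] for row in energy]
    return [[e / peak for e in row] for row in energy]


def fuse_hybrid_domain(x_w: Tensor, s: Tensor, energy: Plane, w1, w2) -> Tensor:
    """Weight each fused feature by its channel gate and its spatial gate."""
    fused = concat_channels(x_w, s)
    gates_c = channel_gates(fused, w1, w2)
    gates_s = spatial_gates(energy)
    return [
        [[v * gc * gs for v, gs in zip(row, gs_row)]
         for row, gs_row in zip(plane, gates_s)]
        for plane, gc in zip(fused, gates_c)
    ]
```

In this sketch the output keeps the channel count and spatial size of the concatenated tensor, matching the size constraints stated for the two weighting matrices.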
[0101] S306, Generate a target image with watermark based on the feature matrix of the fused image after watermark embedding.
[0102] In some feasible implementations, the terminal device can introduce multiple adversarial examples based on differentiable operators, and jointly optimize the peak signal-to-noise ratio loss and the robustness loss to perform adversarial enhancement processing on the fused image feature matrix after watermark embedding, thereby obtaining an optimized fused image feature matrix. A watermarked target image can then be generated based on the optimized fused image feature matrix. Here, the multiple adversarial examples can include, but are not limited to, JPEG compression simulation and noise addition. For example, a soft approximation of quantization by the quantization matrix (a relaxed discrete cosine transform) can be applied in the frequency domain, or Gaussian noise and salt-and-pepper noise can be added. The specific method can be determined according to the actual application scenario and is not limited here. Here, the robustness loss can be defined in terms of an anti-attack loss, and the specific loss function can be determined according to the actual application scenario and is not limited here. It can be understood that after obtaining the optimized fused image feature matrix, the terminal device can generate the watermarked target image through image reconstruction and image optimization. Specific image reconstruction and optimization methods can be found in subsequent embodiments and are not limited here.
In this embodiment, multiple adversarial examples can be introduced based on differentiable operators, and the peak signal-to-noise ratio loss and the anti-attack robustness loss can be jointly optimized to perform adversarial enhancement processing on the fused image feature matrix after watermark embedding. This allows the subsequently generated watermarked target image to keep its image semantics and watermark stable under real interference such as compression and noise, improving the generalization ability of the watermark and of the watermarked target image against unknown attacks, with high security and strong applicability.
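As a non-limiting illustration, the quantities involved in the joint optimization described above could be sketched as follows. The cubic soft-rounding function stands in for the soft approximation of quantization, and the weighted sum in `joint_loss` (with weights `alpha` and `beta`, and a robustness term measured as the mean squared difference between embedded and recovered watermark bits) is an assumption made for this example; this application does not fix the loss function. Plain Python has no automatic differentiation, so the sketch only computes the values; an actual optimization would typically be written with a framework that provides gradients.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def mse(a: Matrix, b: Matrix) -> float:
    count = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count


def psnr(original: Matrix, watermarked: Matrix, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(original, watermarked)
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


def soft_round(x: float) -> float:
    """Cubic approximation of rounding with a non-zero slope almost everywhere."""
    nearest = round(x)
    return nearest + (x - nearest) ** 3


def soft_quantize(coeffs: Matrix, step: float) -> Matrix:
    """Simulated compression: quantize coefficients with the soft rounding above."""
    return [[step * soft_round(c / step) for c in row] for row in coeffs]


def add_gaussian_noise(image: Matrix, sigma: float, seed: int = 0) -> Matrix:
    rng = random.Random(seed)  # seeded so the simulated attack is reproducible
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]


def joint_loss(original: Matrix, watermarked: Matrix,
               embedded_bits: Sequence[float], recovered_bits: Sequence[float],
               alpha: float = 1.0, beta: float = 1.0, peak: float = 255.0) -> float:
    """Assumed form: alpha * fidelity term + beta * robustness term."""
    fidelity = mse(original, watermarked) / (peak * peak)  # lower means higher PSNR
    robustness = sum((e - r) ** 2 for e, r in zip(embedded_bits, recovered_bits))
    return alpha * fidelity + beta * robustness / len(embedded_bits)
```

Here `recovered_bits` would come from a watermark extractor applied to the attacked image; that extractor is outside the scope of this sketch.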
[0103] In some feasible implementations, the terminal device can use the aforementioned basis matrix to perform a frequency domain transformation on the optimized fused image feature matrix to obtain a watermark-embedded frequency domain image feature coefficient matrix, and then use the aforementioned basis matrix to perform an inverse transformation on the watermark-embedded frequency domain image feature coefficient matrix to obtain a target spatial domain image block feature matrix. Here, the optimized fused image feature matrix can be a high-dimensional tensor that contains multiple channels, has an unnormalized numerical distribution, and carries nonlinear abstract features. Through one frequency domain transformation and one inverse transformation, this high-dimensional tensor can be converted into the target spatial domain image block feature matrix in the spatial domain. The target spatial domain image block feature matrix can use the specific color space or channel representation required by the target, such as three RGB channels, four CMYK channels for printing, YCbCr channels for video, or grayscale and multispectral channels for professional fields, which can be determined according to the actual application scenario and is not limited here. Then, the terminal device can obtain the watermarked target image by performing image block recombination and boundary smoothing on the target spatial domain image block feature matrix. It can be understood that the terminal device can recombine the target spatial domain image block feature matrix based on the aforementioned image block segmentation. Overlapping areas can be smoothed using bicubic interpolation, and a feathered transition of a specific pixel width can be applied to the boundaries of each block. For example, a feathered transition 5 pixels wide can be applied to the filled areas.
The specific pixel width can be determined according to the actual application scenario and is not limited here. Optionally, the terminal device can perform color consistency verification, frequency domain constraint filtering, and other optimization processing after image block recombination and boundary smoothing. For example, histogram matching can be performed on different channels in the Lab color space, or abnormal high-frequency peaks introduced by embedding can be filtered out in the DCT domain. The specifics can be determined according to the actual application scenario and are not limited here. In this embodiment, the optimized fused image feature matrix is transformed into the frequency domain using the basis matrix, completing the mapping from the feature representation space to the frequency domain. The mapping from the frequency domain to the spatial domain is then achieved through the inverse transformation. This allows the multi-channel feature semantics to be mapped to a color space or channel representation to obtain the target spatial domain image block feature matrix. The target spatial domain image block feature matrix can then undergo image block recombination and boundary smoothing to obtain the watermarked target image, which improves the robustness and semantic fidelity of the watermark and has strong applicability.
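As a non-limiting illustration, image block recombination with a smoothed boundary transition could be sketched as follows. For brevity the sketch blends overlapping blocks by weighted averaging with a linear ramp near each block edge, rather than the bicubic interpolation mentioned above; the function names, the ramp shape, and the `(top, left)` origin convention are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def edge_weight(index: int, size: int, feather: int) -> float:
    """Linear ramp over `feather` pixels from each block edge; 1.0 in the interior."""
    if feather <= 0:
        return 1.0
    distance = min(index, size - 1 - index)  # distance to the nearest block edge
    return min(1.0, (distance + 1) / (feather + 1))


def recombine_blocks(blocks: Sequence[Matrix], origins: Sequence[Tuple[int, int]],
                     height: int, width: int, feather: int = 5) -> Matrix:
    """Place each block at its (top, left) origin and blend overlaps by weight."""
    total = [[0.0] * width for _ in range(height)]
    weight = [[0.0] * width for _ in range(height)]
    for block, (top, left) in zip(blocks, origins):
        rows, cols = len(block), len(block[0])
        for i in range(rows):
            w_row = edge_weight(i, rows, feather)
            for j in range(cols):
                w = w_row * edge_weight(j, cols, feather)
                total[top + i][left + j] += w * block[i][j]
                weight[top + i][left + j] += w
    # Pixels not covered by any block keep the value 0.0.
    return [
        [t / w if w > 0 else 0.0 for t, w in zip(t_row, w_row)]
        for t_row, w_row in zip(total, weight)
    ]
```

Because every pixel is normalized by its accumulated weight, regions covered by a single block are reproduced unchanged, and only the overlapping strips are blended.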
[0104] In summary, the watermark embedding method provided in this application can obtain the frequency domain image feature coefficient matrix corresponding to the spatial domain image block feature matrix of the original image based on the basis matrix. Then, it can perform frequency-band weighting on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix, generating watermark information corresponding to each frequency band. This watermark information can then be embedded into the weighted frequency band image feature matrix according to the watermark embedding instruction information corresponding to each frequency band, generating a watermarked frequency domain image feature coefficient matrix. Finally, based on the original image, the watermarked frequency domain image feature coefficient matrix is subjected to hybrid domain fusion processing in the spatial and frequency domains to generate a fused image feature matrix after watermark embedding, thus generating the watermarked target image. It can dynamically generate watermark information for each frequency band based on the frequency domain image features and can dynamically select frequency bands for watermark embedding, improving the dynamic adaptability and anti-attack resistance of watermark embedding, exhibiting high robustness and applicability. Therefore, the watermark embedding method provided in this application embodiment can improve user experience and enhance the market competitiveness of products, providing a robust and secure watermark embedding solution for various enterprises.
[0105] Based on the description of the watermark embedding method above, this application also discloses a watermark embedding device. This watermark embedding device can be applied to the watermark embedding method of the embodiments shown in Figures 1a to 4 and is used to perform the steps of the watermark embedding method; that is, the watermark embedding device can be the execution entity of the watermark embedding method of the embodiments shown in Figures 3 to 5. See Figure 6, which is a schematic diagram of the watermark embedding device provided in an embodiment of this application. As shown in Figure 6, in this embodiment, the watermark embedding device 60 can operate the following modules: The image preprocessing module 610 is used to obtain the frequency domain image feature coefficient matrix of the original image. The frequency domain weighting module 620 is used to perform frequency band weighting processing on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands, so as to obtain a weighted frequency domain image feature coefficient matrix. The watermark generation module 630 is used to generate watermark information corresponding to each frequency band in the multiple frequency bands based on the weighted frequency domain image feature coefficient matrix.
The watermark embedding module 640 is used to generate a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the above multiple frequency bands, the watermark information corresponding to each of the above frequency bands and the weighted frequency band image feature matrix, wherein the watermark embedding indication information corresponding to the above multiple frequency bands is used to indicate the embedding position of the above watermark information and / or the embedding strength of the above watermark information. The image generation module 650 is used to perform hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains based on the original image and the watermarked frequency domain image feature coefficient matrix to generate a fused image feature matrix after watermark embedding. The image generation module 650 is also used to generate a target image containing a watermark based on the fused image feature matrix after the watermark is embedded.
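As a non-limiting illustration, the division of the device into modules 610 to 650 could be sketched as a simple pipeline in which each module is a callable. The class name, field names, and `run` method are hypothetical and chosen only for this example; the processing inside each module is left abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class WatermarkEmbeddingDevice:
    preprocess: Callable[[Any], Any]           # module 610: frequency domain coefficients
    weight_bands: Callable[[Any], Any]         # module 620: frequency band weighting
    generate_watermarks: Callable[[Any], Any]  # module 630: per-band watermark information
    embed: Callable[[Any, Any, Any], Any]      # module 640: embed per indication information
    fuse: Callable[[Any, Any], Any]            # module 650: hybrid domain fusion
    render: Callable[[Any], Any]               # module 650: generate the target image

    def run(self, original_image: Any, indication: Any) -> Any:
        """Call the modules in the order in which they are listed above."""
        coeffs = self.preprocess(original_image)
        weighted = self.weight_bands(coeffs)
        watermarks = self.generate_watermarks(weighted)
        marked = self.embed(indication, watermarks, weighted)
        fused = self.fuse(original_image, marked)
        return self.render(fused)
```

Keeping each module behind a plain callable reflects the logical function division described for the device: a module can be swapped, split, or merged without changing the order of operations.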
[0106] In the embodiments of this application, the modules in the device shown in the figures above can be individually or entirely combined into one or more other modules, or some of the modules can be further divided into multiple functionally smaller modules. This achieves the same operation without affecting the technical effect of the embodiments of this application. The above modules are based on logical function division. In practical applications, the function of one module can be implemented by multiple modules, or the function of multiple modules can be implemented by one module. In other feasible implementations of this application, the device may also include other modules. In practical applications, these functions can also be implemented with the assistance of other modules, and can be implemented collaboratively by multiple modules, without limitation.
[0107] In some feasible implementations, the implementation methods provided for each step of the watermark embedding method shown in Figures 3 to 5 can be executed by the modules of the watermark embedding device 60 shown in Figure 6. For example, in the watermark embedding method shown in Figure 3, step S301 can be performed by the image preprocessing module 610 in the device shown in Figure 6; step S302 can be performed by the frequency domain weighting module 620 in the device shown in Figure 6; step S303 can be performed by the watermark generation module 630 in the device shown in Figure 6; step S304 can be performed by the watermark embedding module 640 in the device shown in Figure 6; and steps S305 and S306 can be performed by the image generation module 650 in the device shown in Figure 6. The implementation of each module can be found in the implementation methods provided for the corresponding steps in the above embodiments and will not be repeated here.
[0108] In summary, the watermark embedding device provided in this application embodiment comprises at least an image preprocessing module, a frequency domain weighting module, a watermark generation module, a watermark embedding module, and an image generation module, and may also include other functional sub-modules, which are not limited herein. This watermark embedding device can obtain the frequency domain image feature coefficient matrix of the original image, and then perform frequency-band weighting on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix, and generate watermark information corresponding to each frequency band. Then, the watermark information corresponding to each frequency band can be embedded into the weighted frequency band image feature matrix according to the watermark embedding instruction information corresponding to each frequency band, generating a watermarked frequency domain image feature coefficient matrix. Finally, based on the original image, the watermarked frequency domain image feature coefficient matrix is subjected to hybrid domain fusion processing in the spatial and frequency domains to generate a fused image feature matrix after watermark embedding, thereby generating a watermarked target image. Watermark information for each frequency band can be dynamically generated based on frequency domain image features, and frequency bands can be dynamically selected for watermark embedding. This improves the dynamic adaptability and anti-attack capability of watermark embedding, resulting in high robustness and applicability. Therefore, the watermark embedding device provided in this application embodiment can enhance user experience and strengthen product market competitiveness, providing a robust and secure watermark embedding solution for various enterprises.
[0109] See Figure 7, which is a schematic diagram of the structure of the terminal device provided in the embodiments of this application. As shown in Figure 7, the terminal device 70 can be the terminal device in the embodiments corresponding to Figures 1a to 5, such as the terminal device 20 in Figure 1a described above. The terminal device 70 may include a processor 701, a network interface 704, and a memory 705. Furthermore, the terminal device 70 may also include a user interface 703 and at least one communication bus 702. The communication bus 702 is used to enable communication between these components. The user interface 703 may include a display screen and a keyboard; optionally, the user interface 703 may also include a standard wired interface or a wireless interface. The network interface 704 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface). The memory 705 may be high-speed RAM or non-volatile memory, such as at least one disk storage device. Optionally, the memory 705 may also be at least one storage device located remotely from the aforementioned processor 701. As shown in Figure 7, the memory 705, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
[0110] The network interface 704 in the terminal device 70 can also establish a network connection with the service server 10 in the embodiment corresponding to Figure 1a, and the optional user interface 703 may further include a display screen and a keyboard. In the terminal device 70 shown in Figure 7, the network interface 704 provides network communication functionality; the user interface 703 is mainly used to provide an input interface for the user; and the processor 701 can be used to call the device control application stored in the memory 705 to implement the watermark embedding method in the embodiment corresponding to Figure 3.
[0111] It should be understood that the terminal device 70 described in the embodiments of this application can execute the watermark embedding method described above in the embodiments corresponding to Figures 3 to 5, which will not be repeated here. Furthermore, the beneficial effects of using the same method will also not be repeated.
[0112] This application also provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by a processor, implement the watermark embedding method provided by the steps in Figures 3 to 5. For details, see the implementation methods provided for each step in Figures 3 to 5, which will not be elaborated here.
[0113] The aforementioned computer-readable storage medium can be the watermark embedding device provided in any of the foregoing embodiments or the internal storage unit of the aforementioned terminal device, such as the hard drive or memory of the terminal device. The computer-readable storage medium can also be an external storage device of the terminal device, such as a plug-in hard drive, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the terminal device. Furthermore, the computer-readable storage medium can include both internal storage units and external storage devices of the terminal device. The computer-readable storage medium is used to store the computer program and other programs and data required by the terminal device. The computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
[0114] This application also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device can load and execute the computer instructions, enabling the computer device to perform the watermark embedding method provided by the steps in Figures 3 to 5. For details, see the implementation methods provided for each step in Figures 3 to 5, which will not be elaborated here.
[0115] The term "comprising," and any variations thereof, in the specification, claims, and drawings of this application are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or modules is not limited to the listed steps or modules, but may optionally include steps or modules not listed, or may optionally include other steps or modules inherent to such processes, methods, apparatus, products, or devices.
[0116] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0117] The methods and related apparatus provided in this application are described with reference to the method flowcharts and / or structural diagrams provided in this application. Specifically, each block of the method flowcharts and / or structural diagrams, as well as combinations of blocks in the flowcharts and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing device, create means for implementing the functions specified in one or more blocks of the flowcharts and / or one or more blocks of the structural diagrams. These computer program instructions can also be stored in a computer-readable storage medium capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more blocks of the flowcharts and / or one or more blocks of the structural diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more blocks in the structural diagram.
[0118] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application shall still fall within the scope of this application.
Claims
1. A watermark embedding method, characterized in that the method comprises: obtaining the frequency domain image feature coefficient matrix of the original image; performing frequency band weighting processing on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix; generating, based on the weighted frequency domain image feature coefficient matrix, watermark information corresponding to each frequency band in the multiple frequency bands; generating a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the multiple frequency bands, the watermark information corresponding to each frequency band, and the weighted frequency band image feature matrix, wherein the watermark embedding indication information corresponding to the multiple frequency bands is used to indicate the embedding position of the watermark information and / or the embedding strength of the watermark information; performing, based on the original image and the watermarked frequency domain image feature coefficient matrix, hybrid domain fusion processing in the spatial and frequency domains on the watermarked frequency domain image feature coefficient matrix to generate a fused image feature matrix after watermark embedding; and generating a target image containing the watermark based on the fused image feature matrix after watermark embedding.
2. The method according to claim 1, characterized in that obtaining the frequency domain image feature coefficient matrix of the original image comprises: obtaining the spatial domain image block feature matrix of the original image; obtaining the basis matrix for the frequency domain transformation of image features; and performing a frequency domain transformation on the spatial domain image block feature matrix using the basis matrix to obtain the frequency domain image feature coefficient matrix.
3. The method according to claim 2, characterized in that obtaining the spatial domain image block feature matrix of the original image comprises: obtaining the original image, and performing image normalization and block segmentation on the original image; and generating the spatial domain image block feature matrix of the original image based on the image data of each image block obtained after the block segmentation.
4. The method according to claim 2 or 3, characterized in that obtaining the basis matrix for the frequency domain transformation of image features comprises: constructing an initial basis matrix with the same matrix size as the spatial domain image block feature matrix, wherein the initial values of the matrix elements of the initial basis matrix are standard discrete cosine transform basis functions; using the initial basis matrix as trainable parameters of a neural network, obtaining multiple sets of sample data as training input data of the neural network, and optimizing and training the matrix elements of the initial basis matrix multiple times using the backpropagation algorithm to obtain a trained initial basis matrix; and using the trained initial basis matrix as the basis matrix for the frequency domain transformation of image features, wherein the backpropagation algorithm uses a loss function constructed according to the orthogonal regularization term of the basis matrix to optimize and update the matrix elements of the basis matrix.
5. The method according to any one of claims 1-4, characterized in that the multiple frequency bands include a first frequency band, a second frequency band, and a third frequency band obtained by dividing the spatial frequencies based on the matrix coordinate system corresponding to the frequency domain image feature coefficient matrix, wherein the first frequency band, the second frequency band, and the third frequency band are arranged in ascending order of frequency, the first frequency band corresponds to the main structure of the original image, the second frequency band corresponds to the edge transition regions of the original image, and the third frequency band corresponds to the noise and texture details in the original image.
6. The method according to claim 5, characterized in that performing frequency band weighting processing on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands to obtain a weighted frequency domain image feature coefficient matrix comprises: inputting the amplitude and the phase of the frequency domain image feature coefficient matrix separately into a weight generation network, and obtaining a first image feature weighting coefficient matrix after multi-layer convolution processing in the weight generation network, wherein the first image feature weighting coefficient matrix includes multiple first image feature weighting coefficients corresponding to the image feature coefficients at different spatial positions in the frequency domain image feature coefficient matrix; applying different weight scaling coefficients to the first image feature weighting coefficients corresponding to the first frequency band, the second frequency band, and the third frequency band in the first image feature weighting coefficient matrix to generate a second image feature weighting coefficient matrix, wherein the weight scaling coefficients include a gain coefficient for enhancing the first image feature weighting coefficients and an attenuation coefficient for suppressing the first image feature weighting coefficients; and obtaining learnable scaling parameters, and generating the weighted frequency domain image feature coefficient matrix according to the second image feature weighting coefficient matrix, the learnable scaling parameters, and the frequency domain image feature coefficient matrix, wherein the learnable scaling parameters are used to scale the second image feature weighting coefficient matrix.
7. The method according to claim 5 or 6, characterized in that the step of generating watermark information corresponding to each frequency band in the multiple frequency bands based on the weighted frequency domain image feature coefficient matrix includes: obtaining a pseudo-random sequence generated using a first key and an image hash, and generating first watermark information corresponding to the first frequency band according to the pseudo-random sequence and the sign function value of the weighted frequency domain image feature coefficient matrix; generating second watermark information corresponding to the second frequency band based on a quantization step size and the weighted frequency domain image feature coefficient matrix; and generating third watermark information corresponding to the third frequency band according to an intensity coefficient, the weighted frequency domain image feature coefficient matrix, the local mean of the weighted frequency domain image feature coefficient matrix, and a random matrix controlled by a second key, so as to obtain the watermark information corresponding to each frequency band in the multiple frequency bands.
8. The method according to claim 2, characterized in that the watermark embedding indication information corresponding to the multiple frequency bands includes frequency band masks corresponding to the multiple frequency bands; after generating the watermark information corresponding to each frequency band in the multiple frequency bands and before generating the watermarked frequency domain image feature coefficient matrix, the method further includes: generating the frequency band masks corresponding to the multiple frequency bands, wherein each frequency band mask is a binary matrix with the same matrix size as the weighted frequency domain image feature coefficient matrix, and the frequency band mask is used to set, in the matrix coordinate system of the weighted frequency domain image feature coefficient matrix, multiple target matrix coordinates at which the first watermark information, the second watermark information, or the third watermark information is embedded.
9. The method according to claim 8, characterized in that the watermark embedding indication information corresponding to the multiple frequency bands includes the frequency band masks and a watermark embedding strength coefficient corresponding to the multiple frequency bands; the step of generating a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the multiple frequency bands, the watermark information corresponding to each frequency band, and the weighted frequency domain image feature coefficient matrix includes: quantizing and modulating, according to the frequency band masks corresponding to the multiple frequency bands, the watermark embedding strength coefficient, the first watermark information, the second watermark information, and the third watermark information, the weighted frequency domain image feature coefficient at each target matrix coordinate in the weighted frequency domain image feature coefficient matrix, so as to embed the first watermark information, the second watermark information, and the third watermark information into the weighted frequency domain image feature coefficient matrix, thereby obtaining the watermarked frequency domain image feature coefficient matrix.
10. The method according to any one of claims 6-9, characterized in that the weighted frequency domain image feature coefficient matrix satisfies a formula [formula not reproduced] whose terms are: the weighted frequency domain image feature coefficient matrix, the frequency domain image feature coefficient matrix, the learnable scaling parameter, and the second image feature weighting coefficient matrix.
11. The method according to claim 9, characterized in that the watermarked frequency domain image feature coefficient matrix satisfies a formula [formula not reproduced] whose terms are: the watermarked frequency domain image feature coefficient matrix, the weighted frequency domain image feature coefficient matrix, the watermark embedding strength coefficient, the frequency band mask, and the watermark information corresponding to each frequency band.
12. The method according to any one of claims 2-11, characterized in that the step of performing hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains, based on the original image and the watermarked frequency domain image feature coefficient matrix, to generate a fused image feature matrix after watermark embedding includes: inversely transforming the watermarked frequency domain image feature coefficient matrix using the basis matrix to obtain a spatial domain image block feature matrix after watermark embedding; extracting a spatial semantic feature matrix from the original image, and concatenating the spatial domain image block feature matrix after watermark embedding with the spatial semantic feature matrix along the channel dimension to obtain a fused image feature matrix; applying adaptive weighting along the channel dimension to the fused image feature matrix to generate a channel-weighted image feature matrix with the same number of image feature channels as the fused image feature matrix, wherein the channel-weighted image feature matrix is used to emphasize or suppress different image feature channels; generating, using the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix, a spatially weighted image feature matrix with the same feature space size as the fused image feature matrix, wherein the spatially weighted image feature matrix is used to emphasize or suppress different spatial locations in the feature space; and multiplying the channel-weighted image feature matrix and the spatially weighted image feature matrix element by element to generate the fused image feature matrix after watermark embedding.
13. The method according to claim 12, characterized in that, after generating the watermarked frequency domain image feature coefficient matrix and before performing the hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains, the method further includes: filtering high-frequency coefficients in the watermarked frequency domain image feature coefficient matrix to suppress high-frequency noise in the watermarked frequency domain image feature coefficient matrix; and, after generating the fused image feature matrix after watermark embedding and before generating the watermarked target image based on the fused image feature matrix after watermark embedding, the method further includes: applying threshold filtering to the fused image feature matrix after watermark embedding to suppress residual high-frequency noise in the fused image feature matrix after watermark embedding.
14. The method according to claim 12 or 13, characterized in that the fused image feature matrix after watermark embedding satisfies a formula [formula not reproduced] whose terms are: the fused image feature matrix after watermark embedding, the channel-weighted image feature matrix, the spatially weighted image feature matrix, the fused image feature matrix, the spatial domain image block feature matrix after watermark embedding, the spatial semantic feature matrix, and the frequency domain energy distribution matrix corresponding to the watermarked frequency domain image feature coefficient matrix.
15. The method according to any one of claims 12-14, characterized in that the step of generating a watermarked target image based on the fused image feature matrix after watermark embedding includes: introducing multiple adversarial samples using a differentiable operator, and combining a peak signal-to-noise ratio loss and an anti-attack robustness loss to perform adversarial enhancement processing on the fused image feature matrix after watermark embedding, so as to obtain an optimized fused image feature matrix; and generating the watermarked target image based on the optimized fused image feature matrix.
16. The method according to claim 15, characterized in that the step of generating a watermarked target image based on the optimized fused image feature matrix includes: transforming the optimized fused image feature matrix into the frequency domain using the basis matrix to obtain a watermark-embedded frequency domain image feature coefficient matrix; inversely transforming the watermark-embedded frequency domain image feature coefficient matrix using the basis matrix to obtain a target spatial domain image block feature matrix; and recombining the image blocks of the target spatial domain image block feature matrix and smoothing the block boundaries to obtain the watermarked target image.
17. A watermark embedding device, characterized by comprising: an image preprocessing module, used to obtain the frequency domain image feature coefficient matrix of the original image; a frequency domain weighting module, used to perform frequency-band weighting processing on the frequency domain image feature coefficient matrix based on the image feature weighting coefficients corresponding to multiple frequency bands, so as to obtain a weighted frequency domain image feature coefficient matrix; a watermark generation module, used to generate watermark information corresponding to each frequency band in the multiple frequency bands based on the weighted frequency domain image feature coefficient matrix; a watermark embedding module, used to generate a watermarked frequency domain image feature coefficient matrix based on the watermark embedding indication information corresponding to the multiple frequency bands, the watermark information corresponding to each frequency band, and the weighted frequency domain image feature coefficient matrix, wherein the watermark embedding indication information corresponding to the multiple frequency bands is used to indicate the embedding position of the watermark information and / or the embedding strength of the watermark information; the watermark embedding module being further used to perform hybrid domain fusion processing on the watermarked frequency domain image feature coefficient matrix in the spatial and frequency domains based on the original image and the watermarked frequency domain image feature coefficient matrix, so as to generate a fused image feature matrix after watermark embedding; and an image generation module, used to generate a watermarked target image based on the fused image feature matrix after watermark embedding.
18. A terminal device, characterized in that the terminal device includes a processor and a memory; the processor is connected to the memory, wherein the memory is used to store program code, and the processor is used to call the program code to execute the method as described in any one of claims 1 to 16.
19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to execute the method as described in any one of claims 1 to 16.
20. A computer program product, characterized in that the computer program product includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the method as described in any one of claims 1 to 16.
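By way of non-limiting illustration, the three-band division of claim 5 and the mask-based embedding of claims 8 and 9 may be sketched as follows. The sketch rests on assumptions that the claims do not state: the bands are split by the index sum `u + v` with hypothetical cut-offs `low_cut` and `mid_cut`, and the embedding is a plain additive modulation (coefficient plus strength times mask times watermark) rather than the quantization modulation the claims recite. The function names `band_masks` and `embed` are likewise illustrative.

```python
import numpy as np

def band_masks(n, low_cut, mid_cut):
    """Return three binary masks that partition an n x n coefficient grid.

    Assumption: the frequency index is u + v, so small sums are treated as
    low frequency and large sums as high frequency.
    """
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    index = u + v
    low = (index < low_cut).astype(np.uint8)
    mid = ((index >= low_cut) & (index < mid_cut)).astype(np.uint8)
    high = (index >= mid_cut).astype(np.uint8)
    return low, mid, high

def embed(weighted, masks, watermarks, strength):
    """Add each band's watermark only at the coordinates its mask selects."""
    out = weighted.astype(np.float64).copy()
    for mask, mark in zip(masks, watermarks):
        out += strength * mask * mark
    return out

rng = np.random.default_rng(0)
coeffs = rng.normal(size=(8, 8))             # stand-in weighted coefficient matrix
masks = band_masks(8, low_cut=3, mid_cut=8)  # hypothetical cut-offs
marks = [rng.choice([-1.0, 1.0], size=(8, 8)) for _ in masks]
marked = embed(coeffs, masks, marks, strength=0.1)
```

Because the three masks are disjoint and together cover the whole grid, each coefficient is modified by exactly one band's watermark, which is the role the claims assign to the target matrix coordinates.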
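By way of non-limiting illustration, the channel and spatial weighting of claim 12 may be sketched as follows. The claims do not specify how the two weighting matrices are computed, so the choices here are assumptions: the channel weights come from a sigmoid of each channel's mean (in place of a trained adaptive weighting), the spatial weights are the energy map divided by its maximum, and the final product also multiplies in the fused image feature matrix itself. The names `fuse` and `sigmoid` are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(block_features, semantic_features, energy):
    """Concatenate along channels, then apply channel and spatial weights.

    block_features, semantic_features: arrays of shape (C1, H, W), (C2, H, W).
    energy: non-negative array of shape (H, W), a stand-in for the frequency
    domain energy distribution matrix.
    """
    fused = np.concatenate([block_features, semantic_features], axis=0)
    # Channel weights: one value per channel, broadcast over H and W.
    channel_w = sigmoid(fused.mean(axis=(1, 2)))[:, None, None] * np.ones_like(fused)
    # Spatial weights: one value per location, broadcast over channels.
    spatial_w = (energy / (energy.max() + 1e-12))[None, :, :] * np.ones_like(fused)
    return fused, channel_w, spatial_w, channel_w * spatial_w * fused

rng = np.random.default_rng(1)
blocks = rng.normal(size=(1, 4, 4))    # stand-in spatial domain block features
semantic = rng.normal(size=(2, 4, 4))  # stand-in spatial semantic features
energy = rng.random(size=(4, 4))       # stand-in energy map
fused, channel_w, spatial_w, result = fuse(blocks, semantic, energy)
```

Both weight arrays have the same shape as the fused matrix, matching the claim's requirement of equal channel count and equal feature space size, and since every weight lies between 0 and 1 in this sketch the weights can only attenuate a feature, never amplify it.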
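By way of non-limiting illustration, the forward and inverse block transforms of claim 16 may be sketched as follows. The claims only refer to "the basis matrix"; this sketch assumes an orthonormal DCT-II basis and non-overlapping 8 x 8 blocks, and it omits the boundary smoothing step. The function names are illustrative.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis matrix C, so that C @ C.T is the identity."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis

def split_blocks(image, n):
    """Split an (H, W) image into an (H/n, W/n, n, n) array of blocks."""
    h, w = image.shape
    return image.reshape(h // n, n, w // n, n).swapaxes(1, 2)

def merge_blocks(blocks):
    """Recombine an (R, C, n, n) array of blocks into an (R*n, C*n) image."""
    rows, cols, n, _ = blocks.shape
    return blocks.swapaxes(1, 2).reshape(rows * n, cols * n)

def forward(blocks, basis):
    """Transform every block to the frequency domain: C @ X @ C.T."""
    return basis @ blocks @ basis.T

def inverse(coeffs, basis):
    """Transform every block back to the spatial domain: C.T @ Y @ C."""
    return basis.T @ coeffs @ basis

image = np.arange(256, dtype=np.float64).reshape(16, 16)
basis = dct_basis(8)
coeffs = forward(split_blocks(image, 8), basis)
restored = merge_blocks(inverse(coeffs, basis))
```

With an orthonormal basis the inverse transform is just the transpose, so an unmodified coefficient matrix reconstructs the input exactly up to floating-point error; any difference in the output image therefore comes from the embedded watermark and later filtering.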