A digital watermark-based electronic archive copyright traceability method and system
By optimizing the watermark embedding region and strength through frequency domain decomposition and digital-watermark embedding, the invention addresses the poor adaptability of watermark embedding in existing electronic archive copyright protection systems, enabling efficient copyright traceability and distribution tracking and improving the reliability and efficiency of copyright protection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- HUAJUN TECHNOLOGY (CHONGQING) CO LTD
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing electronic archive copyright protection systems suffer from poor adaptability in watermark embedding area selection, embedding strength, and position control, resulting in low distribution tracking efficiency and insufficient accuracy in verifying copyright identifiers and distribution indexes. This leads to inaccurate or delayed infringement determinations, reducing the reliability and efficiency of copyright protection and distribution tracking.
A digital watermark-based method for tracing the copyright of electronic archives is adopted. First and second embedding regions are obtained through frequency domain decomposition. A frequency domain coefficient matrix is constructed for each region, and a copyright identifier code and a distribution index code are embedded into them respectively. The amplitude strength and position distribution of the embedded watermarks are optimized, distribution information is recorded, and, when suspected infringement is detected, consistency verification and matching analysis are performed to generate a traceability confidence index.
It enables precise protection and rapid traceability of copyright information in electronic archives, improves the security, reliability and automation of copyright management, reduces the risk of infringement, and improves the efficiency of traceability analysis.
Abstract drawing: Figure CN121959529B_ABST
Description
Technical Field
[0001] This invention relates to the field of copyright tracing technology, and specifically to a method and system for tracing the copyright of electronic archives based on digital watermarking.
Background Technology
[0002] With the widespread application of electronic archives in government, enterprises, and scientific research, traditional copyright protection methods, relying mainly on manual registration or single watermarking technology, struggle to simultaneously meet the requirements of copyright information integrity, distribution tracking, and efficient infringement determination. Dual watermark embedding and systematic management can reduce human error, improve the reliability of copyright protection, and enable rapid identification of suspected infringing electronic archives through source tracing analysis, providing technical support for the secure management of electronic archives.
[0003] However, existing electronic archive copyright protection systems adapt poorly in watermark embedding region selection, embedding strength, and position control, resulting in low distribution tracking efficiency and insufficient accuracy when verifying copyright identifiers and distribution indexes. In traditional copyright tracing methods, coarse watermark embedding or delayed tracing analysis can lead to inaccurate or delayed infringement determinations, reducing the reliability and efficiency of copyright protection and distribution tracking. A digital watermark-based electronic archive copyright tracing method and system is therefore needed to address these issues.
Summary of the Invention
[0004] To address the aforementioned technical issues, the present invention provides a method and system for tracing the copyright of electronic archives based on digital watermarking. The technical solution solves the problems described in the background section: existing electronic archive copyright protection systems adapt poorly in watermark embedding region selection, embedding strength, and position control, which leads to low distribution tracking efficiency and insufficient accuracy when verifying copyright identifiers and distribution indexes; and, in traditional copyright tracing methods, coarse watermark embedding or delayed tracing analysis may lead to inaccurate or delayed infringement determination, reducing the reliability and efficiency of copyright protection and distribution tracking.
[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0006] A method for tracing the copyright of electronic archives based on digital watermarking includes:
[0007] The electronic archive data to be protected and the distribution identification information are obtained, and the electronic archive data to be protected is subjected to frequency domain decomposition processing to obtain the first embedding region and the second embedding region.
[0008] A first frequency domain coefficient matrix is constructed based on the first embedding region, a copyright identifier code is generated and a first embedding control parameter is determined, and the copyright identifier code is embedded into the first frequency domain coefficient matrix according to the first embedding control parameter to obtain the first embedding result;
[0009] The second embedding control parameters are determined based on the first embedding control parameters, the second frequency domain coefficient matrix corresponding to the second embedding region is constructed, and the distribution index code is generated based on the distribution identification information. The distribution index code is then embedded into the second frequency domain coefficient matrix according to the second embedding control parameters to form the second embedding result.
[0010] An electronic archive carrying dual watermarks is generated from the first and second embedding results; distribution records are registered, and copyright identifier code records and distribution index code records are established.
[0011] When a suspected infringing electronic archive is detected, it is decomposed in the frequency domain to obtain a first embedding region to be detected and a second embedding region to be detected, from which the copyright identifier code to be detected and the distribution index code to be detected are extracted.
[0012] A consistency check between the copyright identifier code to be detected and the copyright identifier code record yields the copyright identifier matching rate. If the check passes, the distribution index code to be detected is matched against the distribution index code record to generate a source tracing confidence index, and the source tracing result is output.
[0013] In an optional embodiment, the step of performing frequency domain decomposition processing on the electronic archive data to be protected to obtain a first embedding region and a second embedding region specifically includes:
[0014] The original data of the electronic archive to be protected is obtained, and its archive type information is extracted. The original data is subjected to amplitude normalization to obtain a standardized electronic archive signal. According to the archive type, the original data comprises text character encoding values, image pixel grayscale values, audio sampling amplitudes, or, for video, frame pixel grayscale values together with audio sampling amplitudes.
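The normalization step in [0014] can be sketched as follows. This is a minimal illustration assuming min-max scaling to [0, 1] on a flat sequence of values; the patent does not give the formula, and `normalize_amplitude` and `denormalize_amplitude` are hypothetical names. Keeping the original minimum and maximum allows the watermarked signal to be restored to its native range afterwards.

```python
from typing import List, Sequence, Tuple


def normalize_amplitude(samples: Sequence[float]) -> Tuple[List[float], float, float]:
    """Min-max scale raw archive values (grayscale, audio samples, character codes) to [0, 1].

    Min-max scaling is an assumed choice; the source text only says
    "amplitude normalization". Returns the scaled values plus the original
    min and max so the signal can be restored after embedding.
    """
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # constant signal: avoid division by zero
    return [(s - lo) / span for s in samples], lo, hi


def denormalize_amplitude(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Invert normalize_amplitude back to the original amplitude range."""
    span = (hi - lo) or 1.0
    return [v * span + lo for v in values]


# Example: one row of an 8-bit grayscale image
scaled, lo, hi = normalize_amplitude([0, 64, 128, 255])
restored = denormalize_amplitude(scaled, lo, hi)
```

A 2-D image would be flattened row by row before the call and reshaped afterwards.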
[0015] A discrete wavelet transform is applied to the standardized electronic archive signal, decomposing it into a low-frequency approximation signal and multiple high-frequency detail signals that together form a preliminary frequency domain signal set;
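The patent does not name a wavelet. As an assumption, the sketch below uses a one-level 2-D Haar transform in plain Python (averaging normalisation) to show how a standardized image-like signal splits into an approximation sub-band `LL` and three detail sub-bands `H`, `V`, `D`; the function names and the mapping of `H`/`V` to the patent's horizontal/vertical details are illustrative. Applying `haar_dwt2` again to `LL` yields further levels, i.e. the "multiple" detail signals.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level 2-D Haar DWT with averaging normalisation.

    Returns (LL, H, V, D): the low-frequency approximation and three
    high-frequency detail sub-bands. Height and width must be even.
    """
    rows, cols = len(x), len(x[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    ll, h, v, d = [], [], [], []
    for i in range(0, rows, 2):
        ll_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, cols, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, e = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            ll_row.append((a + b + c + e) / 4)   # local average
            h_row.append((a + b - c - e) / 4)    # top minus bottom: horizontal edges
            v_row.append((a - b + c - e) / 4)    # left minus right: vertical edges
            d_row.append((a - b - c + e) / 4)    # diagonal detail
        ll.append(ll_row)
        h.append(h_row)
        v.append(v_row)
        d.append(d_row)
    return ll, h, v, d


def haar_idwt2(ll: Matrix, h: Matrix, v: Matrix, d: Matrix) -> Matrix:
    """Exact inverse of haar_dwt2, used after coefficients have been modulated."""
    rows, cols = len(ll), len(ll[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, hh, vv, dd = ll[i][j], h[i][j], v[i][j], d[i][j]
            out[2 * i][2 * j] = s + hh + vv + dd
            out[2 * i][2 * j + 1] = s + hh - vv - dd
            out[2 * i + 1][2 * j] = s - hh + vv - dd
            out[2 * i + 1][2 * j + 1] = s - hh - vv + dd
    return out
```

Because the inverse is exact, modified detail coefficients can be transformed back into a full-size signal when the watermarked archive is assembled.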
[0016] For the high-frequency detail signals in the preliminary frequency domain signal set, the sum of squares of the frequency domain coefficients within a preset local window is calculated in the horizontal and vertical directions to obtain local energy values. The horizontal local energy values serve as horizontal detail features and the vertical local energy values serve as vertical detail features, together forming a detail feature set.
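Local energy as described in [0016], sketched under the assumption of a square window of odd size `win` clipped at the sub-band borders; `local_energy` and `detail_features` are hypothetical helpers operating on the horizontal and vertical detail sub-bands.

```python
from typing import Dict, List

Matrix = List[List[float]]


def local_energy(band: Matrix, win: int = 3) -> Matrix:
    """Sum of squared coefficients in a win x win window centred on each coefficient.

    The window is clipped at the sub-band borders.
    """
    rows, cols = len(band), len(band[0])
    r = win // 2
    energy = []
    for i in range(rows):
        row = []
        for j in range(cols):
            total = 0.0
            for p in range(max(0, i - r), min(rows, i + r + 1)):
                for q in range(max(0, j - r), min(cols, j + r + 1)):
                    total += band[p][q] ** 2
            row.append(total)
        energy.append(row)
    return energy


def detail_features(h_band: Matrix, v_band: Matrix, win: int = 3) -> Dict[str, Matrix]:
    """Horizontal and vertical detail feature sets (local energy maps)."""
    return {"horizontal": local_energy(h_band, win), "vertical": local_energy(v_band, win)}
```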
[0017] The local energy value of each detail feature is compared with the preset energy threshold for its direction. Horizontal features whose energy exceeds the threshold are assigned to the horizontal target feature set, and vertical features whose energy exceeds the threshold are assigned to the vertical target feature set. The spatial and temporal position of each feature is recorded as its position information.
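Threshold screening with position recording ([0017]) might look like this. The separate per-direction thresholds follow the text; the `time_index` field, standing in for a video frame or audio block number as the temporal position, is an assumption, as is the function name.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Feature = Tuple[int, int, int, float]  # (row, col, time index, local energy)


def select_targets(features: Dict[str, Matrix], thresholds: Dict[str, float],
                   time_index: int = 0) -> Dict[str, List[Feature]]:
    """Keep, per direction, the coefficients whose local energy exceeds that
    direction's threshold, together with their position information."""
    targets: Dict[str, List[Feature]] = {}
    for direction, energy in features.items():
        limit = thresholds[direction]
        targets[direction] = [
            (i, j, time_index, e)
            for i, row in enumerate(energy)
            for j, e in enumerate(row)
            if e > limit
        ]
    return targets
```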
[0018] From the horizontal target feature set, coefficients whose energy varies little across adjacent local regions and which are uniformly distributed in the horizontal and vertical directions are selected to generate the first embedding region candidate signal.
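One possible reading of the "small energy variation" criterion in [0018] is sketched below: a target feature is kept when its local energy lies within a relative tolerance `tol` of the mean energy of its four neighbours. The tolerance rule and the name `stable_candidates` are assumptions, and the uniform-distribution requirement is left to the later position control step.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Feature = Tuple[int, int, int, float]  # (row, col, time index, local energy)


def stable_candidates(energy: Matrix, targets: List[Feature], tol: float = 0.2) -> List[Feature]:
    """Keep target features whose energy is close to the mean of their 4-neighbours.

    'Close' means |e - mean| <= tol * mean, an assumed reading of
    "small energy variation amplitude ... in adjacent local regions".
    """
    rows, cols = len(energy), len(energy[0])
    kept = []
    for i, j, t, e in targets:
        neighbours = [
            energy[p][q]
            for p, q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= p < rows and 0 <= q < cols
        ]
        if not neighbours:
            continue  # isolated coefficient: no local comparison possible
        mean = sum(neighbours) / len(neighbours)
        if abs(e - mean) <= tol * mean:
            kept.append((i, j, t, e))
    return kept
```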
[0019] From the vertical target feature set, coefficients for which pixel grayscale changes remain below the perceptibility threshold and audio amplitude changes remain below the masking threshold are selected to generate the second embedding region candidate signal.
[0020] The frequency domain coefficients in the first and second embedding region candidate signals are arranged according to their position information, yielding first and second embedding region candidate signals with an optimized spatial distribution.
[0021] The frequency domain coefficients and their position information in the optimized first embedding region candidate signal are determined as the first embedding region, and the frequency domain coefficients and their position information in the optimized second embedding region candidate signal are determined as the second embedding region.
[0022] In an optional embodiment, the step of constructing a first frequency domain coefficient matrix based on a first embedding region, generating a copyright identifier code and determining first embedding control parameters, and embedding the copyright identifier code into the first frequency domain coefficient matrix according to the first embedding control parameters to obtain a first embedding result specifically includes:
[0023] The frequency domain coefficients determined for the first embedding region are arranged according to their position information to form a first frequency domain coefficient matrix;
[0024] The corresponding copyright identification information is obtained from the electronic archive to be protected, the copyright subject identification data is extracted, and this data is encoded to generate a copyright identifier code sequence;
[0025] The copyright identifier code sequence is adjusted in length and unified in format to obtain the adapted copyright identifier code.
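Encoding and length adaptation ([0024]-[0025]) are not specified in detail. The sketch assumes the copyright subject identifier is a string encoded as UTF-8 bits and that adaptation means zero-padding or truncating to a fixed code length; the identifier `OWNER-0001`, the 128-bit length, and both function names are hypothetical.

```python
from typing import List


def to_bits(text: str) -> List[int]:
    """Encode an identifier string as UTF-8 bits, most significant bit first."""
    bits: List[int] = []
    for byte in text.encode("utf-8"):
        bits.extend((byte >> k) & 1 for k in range(7, -1, -1))
    return bits


def adapt_length(bits: List[int], length: int) -> List[int]:
    """Truncate or zero-pad a bit sequence to a fixed length (assumed adaptation rule)."""
    return (bits + [0] * length)[:length]


# Hypothetical copyright subject identifier, padded to a 128-bit code
code = adapt_length(to_bits("OWNER-0001"), 128)
```

Truncation loses information, so in practice the fixed length would be chosen to cover the longest identifier.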
[0026] Amplitude statistics are computed for the frequency domain coefficients in the first frequency domain coefficient matrix: the mean and variance of the coefficient amplitudes are calculated to obtain the amplitude distribution characteristics.
[0027] The first frequency domain coefficient matrix is divided, by row and column coordinates, into fixed-size sub-matrix regions, each containing several adjacent frequency domain coefficients. The number and distribution density of the coefficients in each sub-matrix region are counted to obtain the position distribution characteristics.
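The amplitude and position distribution characteristics of [0026]-[0027] reduce to simple statistics. In this sketch the coefficient matrix marks empty cells with `None` (an assumed layout for a sparse embedding region), amplitude means absolute value, and the variance is the population variance from Python's `statistics` module; the helper names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Tuple

# Coefficient matrix built from the embedding region; None marks cells
# that hold no selected coefficient (an assumed sparse layout).
Matrix = List[List[Optional[float]]]


def amplitude_stats(matrix: Matrix) -> Dict[str, float]:
    """Amplitude distribution characteristics: mean, population variance, range."""
    amps = [abs(c) for row in matrix for c in row if c is not None]
    return {
        "mean": statistics.fmean(amps),
        "variance": statistics.pvariance(amps),
        "min": min(amps),
        "max": max(amps),
    }


def block_distribution(matrix: Matrix, size: int) -> Dict[Tuple[int, int], Tuple[int, float]]:
    """Position distribution characteristics: (count, density) per size x size sub-matrix."""
    rows, cols = len(matrix), len(matrix[0])
    result = {}
    for bi in range(0, rows, size):
        for bj in range(0, cols, size):
            cells = [matrix[i][j]
                     for i in range(bi, min(bi + size, rows))
                     for j in range(bj, min(bj + size, cols))]
            count = sum(c is not None for c in cells)
            result[(bi // size, bj // size)] = (count, count / len(cells))
    return result
```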
[0028] Based on the amplitude distribution characteristics, the amplitudes of the coefficients in the first frequency domain coefficient matrix are analyzed to determine the amplitude range, which is divided into amplitude intervals. An embedding modulation amplitude is then determined for each interval from the amplitudes of the coefficients it contains, forming the embedding strength parameter.
[0029] Based on the position distribution characteristics, the number and positions of the coefficients selected for embedding in each region are determined. The embedding interval is determined from the row and column distribution of the embedding coefficients so that they are evenly distributed across the matrix, forming the embedding position control parameters.
[0030] For each embedding position given by the embedding position control parameters, the embedding modulation amplitude of the corresponding amplitude interval in the embedding strength parameter is associated with that position, forming the first embedding control parameters.
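For the embedding strength parameter of [0028], the sketch assumes equal-width amplitude intervals and a modulation amplitude proportional to each interval's upper bound (`alpha` is a hypothetical gain, and both function names are illustrative); the patent states only that the modulation amplitude is derived from the coefficients in each interval.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float, float]  # (lower bound, upper bound, modulation amplitude)


def strength_table(amplitudes: Sequence[float], n_intervals: int = 4,
                   alpha: float = 0.1) -> List[Interval]:
    """Split the amplitude range into equal-width intervals and give each a modulation amplitude.

    Assumed rule: modulation amplitude = alpha * interval upper bound, so
    stronger coefficients carry a larger but proportionally similar change.
    """
    lo, hi = min(amplitudes), max(amplitudes)
    width = (hi - lo) / n_intervals or 1.0  # degenerate range: fall back to unit width
    return [(lo + k * width, lo + (k + 1) * width, alpha * (lo + (k + 1) * width))
            for k in range(n_intervals)]


def lookup_strength(table: List[Interval], amplitude: float) -> float:
    """Modulation amplitude of the first interval whose upper bound covers the amplitude."""
    for _, upper, strength in table:
        if amplitude <= upper:
            return strength
    return table[-1][2]  # above the analysed range: reuse the top interval
```

The same lookup could serve [0034], where second-region amplitudes are mapped onto the first region's intervals.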
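Position control ([0029]-[0030]) can be illustrated by spreading the required number of positions evenly through the row-major ordering of the embeddable coefficients and pairing each with its modulation amplitude. Even spacing over a sorted list is an assumed realisation of the "embedding interval"; `embedding_positions`, `control_parameters`, and the `strength_of` callable (any amplitude-to-strength lookup) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Position = Tuple[int, int]


def embedding_positions(available: Sequence[Position], n_bits: int) -> List[Position]:
    """Pick n_bits positions evenly spaced through the row-major ordering of the available ones."""
    ordered = sorted(available)
    if n_bits > len(ordered):
        raise ValueError("not enough embeddable coefficients for the code length")
    step = len(ordered) / n_bits  # embedding interval in the ordered list
    return [ordered[int(k * step)] for k in range(n_bits)]


def control_parameters(positions: Sequence[Position], matrix: List[List[float]],
                       strength_of: Callable[[float], float]) -> List[Tuple[Position, float]]:
    """Attach to each position the modulation amplitude for its coefficient's amplitude."""
    return [((i, j), strength_of(abs(matrix[i][j]))) for i, j in positions]
```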
[0031] The embedding position is obtained according to the first embedding control parameter, and the corresponding frequency domain coefficient is selected as the embedding carrier in the first frequency domain coefficient matrix.
[0032] The adapted copyright identifier code is embedded into the embedding carrier according to the embedding position and the corresponding embedding modulation amplitude, and the corresponding frequency domain coefficients are amplitude modulated according to the embedding modulation amplitude to form the first embedding result.
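The patent says only that carrier coefficients are "amplitude modulated" by the embedding modulation amplitude. To make the step concrete, the sketch swaps in quantization index modulation (QIM), a standard blind scheme: each carrier is moved to the nearest multiple of its modulation amplitude whose index parity equals the bit. This choice and the function names are assumptions; the scheme also gives a matching extractor for [0052] that tolerates perturbations smaller than half the step.

```python
from typing import List, Sequence, Tuple

Param = Tuple[Tuple[int, int], float]  # ((row, col), modulation amplitude used as step)


def qim_embed(coeff: float, bit: int, step: float) -> float:
    """Move coeff to the nearest multiple of step whose index parity equals bit."""
    q = round(coeff / step)
    if q % 2 != bit:
        # step to the neighbouring quantisation level on the side nearer to coeff
        q += 1 if coeff / step > q else -1
    return q * step


def qim_extract(coeff: float, step: float) -> int:
    """Blind extraction: parity of the nearest quantisation index."""
    return round(coeff / step) % 2


def embed_bits(matrix: List[List[float]], params: Sequence[Param],
               bits: Sequence[int]) -> List[List[float]]:
    """Return a modulated copy of matrix; params and bits are paired in order
    (zip stops at the shorter of the two)."""
    out = [row[:] for row in matrix]
    for ((i, j), step), bit in zip(params, bits):
        out[i][j] = qim_embed(out[i][j], bit, step)
    return out


def extract_bits(matrix: List[List[float]], params: Sequence[Param]) -> List[int]:
    """Read one bit from each carrier listed in params."""
    return [qim_extract(matrix[i][j], step) for (i, j), step in params]
```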
[0033] In an optional embodiment, the step of determining the second embedding control parameters according to the first embedding control parameters, constructing the second frequency domain coefficient matrix corresponding to the second embedding region, generating a distribution index code according to the distribution identifier information, and embedding the distribution index code into the second frequency domain coefficient matrix according to the second embedding control parameters to form the second embedding result specifically includes:
[0034] Based on the embedding strength parameter in the first embedding control parameter, the amplitude of the frequency domain coefficients in the second embedding region is statistically analyzed to obtain the amplitude range, which is then mapped to the amplitude interval in the first embedding control parameter to determine the embedding modulation amplitude of the second embedding region.
[0035] Based on the embedding position control parameters in the first embedding control parameters, the frequency domain coefficients of the second embedding region are divided into several sub-regions according to their position information. The number and distribution density of frequency domain coefficients in each sub-region are counted. Combined with the position distribution characteristics in the first embedding control parameters, the number of embedding positions, the embedding position order and the embedding interval of the second embedding region are determined to form the second embedding position control parameters.
[0036] The embedding modulation amplitude of the second embedding region is combined with the second embedding position control parameters to form the second embedding control parameters;
[0037] Based on the second embedding control parameters, the frequency domain coefficients and their position information determined in the second embedding region are selected and arranged to construct the second frequency domain coefficient matrix;
[0038] Based on the distribution identification information corresponding to the electronic archives to be protected, the distribution index data is extracted, and the distribution index data is encoded and converted to generate a distribution index encoding sequence.
[0039] The length of the distribution index encoding sequence is adjusted and the format is standardized to obtain the adapted distribution index encoding.
[0040] According to the second embedding control parameters, the frequency domain coefficients in the second embedding region are grouped and sorted to form several embedding candidate units, each candidate unit containing a set of embeddable frequency domain coefficients.
[0041] Based on the distribution index encoding sequence, the length and encoding structure of the sequence are obtained, the encoding is split into several sub-segments, and each sub-segment is assigned to the corresponding embedding candidate unit to obtain the allocation result;
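The grouping and allocation of [0040]-[0041] could be sketched as consecutive fixed-size candidate units that each receive the next sub-segment of the distribution index code, so the relative order required in [0042] is preserved. The unit size, sequential allocation, and the names `make_units` and `allocate` are assumptions.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def make_units(positions: Sequence[Position], unit_size: int) -> List[List[Position]]:
    """Group the sorted embeddable positions into candidate units of unit_size coefficients."""
    ordered = sorted(positions)
    return [ordered[k:k + unit_size] for k in range(0, len(ordered), unit_size)]


def allocate(bits: Sequence[int],
             units: List[List[Position]]) -> List[Tuple[List[Position], List[int]]]:
    """Split the code into consecutive sub-segments, one per unit, preserving order."""
    allocation = []
    idx = 0
    for unit in units:
        segment = list(bits[idx:idx + len(unit)])
        if not segment:
            break  # code fully allocated; remaining units stay unused
        allocation.append((unit[:len(segment)], segment))
        idx += len(segment)
    if idx < len(bits):
        raise ValueError("code is longer than the embedding capacity")
    return allocation
```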
[0042] Based on the allocation results, the allocated coded segments are embedded into the selected frequency domain coefficients according to the embedding modulation amplitude determined by the control parameters, while maintaining the relative order of the coefficients within the candidate units. After the embedding operation of all candidate units, the complete second embedding result is obtained.
[0043] In an optional embodiment, the step of generating an electronic file with dual watermarks based on the first embedding result and the second embedding result, and registering distribution records, establishing copyright identification encoding records and distribution index encoding records, specifically includes:
[0044] Based on the first and second embedding results, the frequency domain coefficients corresponding to the first and second embedding regions are replaced or superimposed in the electronic archive data to be protected according to their respective position information, so as to maintain the relative structure and order of the original data and form a complete electronic archive with dual watermarks.
[0045] Unique archive identification information is generated for the dual-watermarked electronic archive, and its creation time, generator information, and watermark-type metadata are recorded;
[0046] The dual-watermarked electronic archive is distributed on the basis of its archive identification information; the distribution target, distribution time, and distribution method are recorded during distribution to generate an electronic archive distribution record table;
[0047] Based on the archive identification information in the distribution record table, a unique copyright identification code record is generated for the corresponding electronic archive;
[0048] Based on the distribution target information in the distribution record table, a corresponding distribution index code record is generated for each distribution action, and a relationship is established with the copyright identifier code.
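A minimal in-memory model of the records built in [0045]-[0048], assuming each distribution action receives a fixed-width binary index code derived from a running sequence number and linked to the archive's copyright identifier code. `CopyrightRegistry`, `DistributionRecord`, their fields, and the encoding are hypothetical; a real system would persist these records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


def int_to_bits(value: int, width: int) -> List[int]:
    """Big-endian fixed-width binary code (assumed distribution index encoding)."""
    if value >= 1 << width:
        raise ValueError("value does not fit in the index width")
    return [(value >> k) & 1 for k in range(width - 1, -1, -1)]


@dataclass
class DistributionRecord:
    archive_id: str
    target: str
    time: str
    method: str
    index_code: List[int]


@dataclass
class CopyrightRegistry:
    """Copyright identifier code records plus one distribution record per distribution action."""
    index_width: int = 16
    copyright_codes: Dict[str, List[int]] = field(default_factory=dict)
    distributions: List[DistributionRecord] = field(default_factory=list)

    def register_archive(self, archive_id: str, copyright_code: Sequence[int]) -> None:
        if archive_id in self.copyright_codes:
            raise ValueError(f"archive {archive_id!r} already registered")
        self.copyright_codes[archive_id] = list(copyright_code)

    def register_distribution(self, archive_id: str, target: str,
                              time: str, method: str) -> DistributionRecord:
        if archive_id not in self.copyright_codes:
            raise KeyError(f"archive {archive_id!r} has no copyright record")
        # One index code per distribution action: here a global sequence number.
        code = int_to_bits(len(self.distributions), self.index_width)
        record = DistributionRecord(archive_id, target, time, method, code)
        self.distributions.append(record)
        return record

    def lookup(self, index_code: Sequence[int]) -> Optional[DistributionRecord]:
        """Find the distribution record whose index code matches exactly."""
        for record in self.distributions:
            if record.index_code == list(index_code):
                return record
        return None
```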
[0049] In an optional embodiment, the step of performing a consistency check between the copyright identifier code to be detected and the copyright identifier code record, and after the check passes, performing a matching analysis between the distribution index code to be detected and the distribution index code record to generate a source tracing confidence index and outputting the source tracing result, specifically includes:
[0050] When a suspected infringing electronic file is detected, the original data of the electronic file is obtained and amplitude standardization processing is performed to obtain a standardized signal of the electronic file to be detected.
[0051] Frequency domain decomposition is applied to the standardized signal of the electronic archive to be detected, and the frequency domain coefficients of its first and second embedding regions are extracted.
[0052] The copyright identifier code to be detected is extracted from the frequency domain coefficients of the first embedding region, and the distribution index code to be detected is extracted from those of the second embedding region;
[0053] The consistency of the copyright identifier code to be detected with the copyright identifier code record is verified by XOR-ing the two codes bit by bit, counting the bits whose XOR result is zero, and dividing this count by the total code length to obtain the copyright identifier matching rate.
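The XOR-based matching rate of [0053] translates directly into code; the function name is illustrative, and the 0.9 value in the usage line is a hypothetical setting for the preset copyright matching threshold.

```python
from typing import Sequence


def match_rate(detected: Sequence[int], record: Sequence[int]) -> float:
    """Fraction of bit positions where detected XOR record == 0."""
    if len(detected) != len(record) or not record:
        raise ValueError("codes must be non-empty and of equal length")
    agreeing = sum((a ^ b) == 0 for a, b in zip(detected, record))
    return agreeing / len(record)


rate = match_rate([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1, 0, 0])
passed = rate >= 0.9  # hypothetical preset copyright matching threshold
```

Here one of eight bits differs, so the rate is 0.875 and the check does not pass at that threshold.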
[0054] When the copyright identifier matching rate is not less than the preset copyright matching threshold, the distribution index code to be detected is split into several sub-segments according to the preset length, and the corresponding code in the distribution index code record is split into several record sub-segments according to the same rule;
[0055] Each code segment to be detected is compared bit by bit with the corresponding record segment, and the number of exactly matching code bits is counted to obtain the matching status of each segment.
[0056] A source tracing confidence index is then computed from the matching results together with the spatial and temporal position values of the code bits;
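Segment-wise comparison ([0054]-[0055]) can be sketched with a hypothetical segment length: for every segment the function reports how many bits agree and whether the whole segment matches, and the sum of agreeing bits gives the count of exactly matching code bits used afterwards. The names `split` and `segment_matches` are illustrative.

```python
from typing import List, Sequence, Tuple


def split(bits: Sequence[int], seg_len: int) -> List[List[int]]:
    """Cut a code into consecutive sub-segments of seg_len bits (the last may be shorter)."""
    return [list(bits[k:k + seg_len]) for k in range(0, len(bits), seg_len)]


def segment_matches(detected: Sequence[int], record: Sequence[int],
                    seg_len: int) -> List[Tuple[int, bool]]:
    """For each segment: (number of matching bits, whether the whole segment matches)."""
    result = []
    for seg, ref in zip(split(detected, seg_len), split(record, seg_len)):
        same = sum(a == b for a, b in zip(seg, ref))
        result.append((same, same == len(ref)))
    return result
```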
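The confidence formula itself ([0061]) is not preserved in this text, so the sketch below is purely illustrative: it assumes a weighted sum of the exact-match ratio (matched bits over total code length) and a position-agreement term built from the per-bit position values of the detected and recorded codes, with positions already normalised to [0, 1]. The functional form, the name `confidence_index`, and the default weights `w1 = 0.7`, `w2 = 0.3` are assumptions, not the patent's formula or values.

```python
from typing import Sequence


def confidence_index(matched_bits: int, total_bits: int,
                     detected_pos: Sequence[float], record_pos: Sequence[float],
                     w1: float = 0.7, w2: float = 0.3) -> float:
    """ILLUSTRATIVE ONLY: w1 * (M / N) + w2 * (1 - mean |x_i - y_i|).

    M = matched_bits, N = total_bits, x_i / y_i = normalised position values
    of bit i in the detected and recorded codes. The form and weights are
    assumptions, not the formula of [0061].
    """
    if total_bits <= 0 or not detected_pos or len(detected_pos) != len(record_pos):
        raise ValueError("need a positive code length and equally long position lists")
    exact_ratio = matched_bits / total_bits
    drift = sum(abs(x - y) for x, y in zip(detected_pos, record_pos)) / len(detected_pos)
    return w1 * exact_ratio + w2 * (1.0 - min(drift, 1.0))
```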
[0057] The source tracing confidence index is compared with the preset source tracing threshold range to obtain the judgment result;
[0058] When the source traceability confidence index is higher than the preset source traceability threshold range, it is judged as serious infringement; when the source traceability confidence index is within the preset source traceability threshold range, it is judged as suspected infringement; when the source traceability confidence index is lower than the preset source traceability threshold range, it is judged as no infringement.
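The three-level judgment of [0057]-[0058] is sketched below, assuming the preset source tracing threshold range is a closed interval [low, high]; the patent does not say whether the endpoints are inclusive, and `judge` is a hypothetical name.

```python
def judge(index: float, low: float, high: float) -> str:
    """Three-level decision against the preset source tracing threshold range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if index > high:
        return "serious infringement"
    if index >= low:
        return "suspected infringement"
    return "no infringement"
```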
[0059] Based on the judgment result, the electronic archive identifier, the copyright identifier matching rate, the distribution index matching status, the source tracing confidence index, and the infringement level are recorded in a source tracing report, which is output as the final source tracing result.
[0060] The formula for calculating the source tracing confidence index is as follows:
[0061]
[0062] In the formula, the quantities are: the source tracing confidence index; the number of code bits of the distribution index code to be detected that exactly match the distribution index code record; the total length of the distribution index code; the position value of the i-th bit of the distribution index code to be detected; the position value of the corresponding i-th bit in the distribution index code record; and two weighting coefficients.
[0063] Furthermore, a digital watermark-based electronic archive copyright tracing system is proposed to implement any of the copyright tracing methods mentioned above, including:
[0064] The data acquisition module is used to acquire the electronic archive data to be protected and the corresponding distribution identification information, and transmit the acquired data to the frequency domain processing module.
[0065] The frequency domain processing module is used to perform frequency domain decomposition processing on the electronic archive data to be protected, extract the first embedding region and the second embedding region, and generate candidate embedding signals.
[0066] A watermark embedding module is used to construct a first frequency domain coefficient matrix based on a first embedding region and embed copyright identification code, construct a second frequency domain coefficient matrix based on a first embedding control parameter and embed distribution index code, and generate an electronic archive with dual watermarks.
[0067] The distribution registration module is used to distribute the dual-watermarked electronic archives, record distribution information, and establish copyright identification code records and distribution index code records.
[0068] The source tracing analysis module is used to extract the watermark code to be detected when a suspected infringing electronic file is detected, perform consistency verification and matching analysis with the record, generate a source tracing confidence index, and output the source tracing results.
[0069] In an optional embodiment, the frequency domain processing module includes:
[0070] An amplitude normalization unit is used to perform amplitude normalization processing on the original data of the electronic archive to be protected, so as to obtain a standardized electronic archive signal.
[0071] The wavelet transform unit is used to perform discrete wavelet transform on the standardized electronic archive signal to decompose it into a low-frequency approximate signal and a high-frequency detail signal.
[0072] The detail feature extraction unit is used to calculate the local energy value of the high-frequency detail signal, generate horizontal and vertical detail feature sets, and record the feature location information.
[0073] An embedding region determination unit is used to select high-energy features from the detail feature set and optimize their spatial distribution to determine a first embedding region and a second embedding region.
[0074] In an optional embodiment, the watermark embedding module includes:
[0075] The first embedding unit is used to construct a first frequency domain coefficient matrix, generate a copyright identifier code, and embed the copyright identifier code into the first frequency domain coefficient matrix according to the amplitude distribution and position distribution characteristics to obtain a first embedding result.
[0076] The second embedding unit is used to construct a second frequency domain coefficient matrix according to the first embedding control parameters, generate a distribution index code, and embed it into the second frequency domain coefficient matrix to obtain a second embedding result.
[0077] An embedding result generation unit is used to merge the first embedding result and the second embedding result into the electronic file to form a complete electronic file with dual watermarks and transmit it to the distribution registration module.
[0078] In an optional embodiment, the source tracing analysis module includes:
[0079] A watermark extraction unit is used to extract the copyright identifier code and the distribution index code to be detected from the electronic file to be detected.
[0080] The copyright verification unit is used to perform a bit-by-bit consistency verification between the copyright identifier code to be detected and the copyright identifier code record, calculate the copyright identifier matching rate, and determine whether the preset copyright matching threshold has been reached.
[0081] The index matching analysis unit is used to compare the distribution index code to be detected with the distribution index code record bit by bit, count the matching situation, and generate a traceability confidence index by combining the code spatial location and temporal location.
[0082] The source tracing determination unit is used to determine the infringement level based on the source tracing confidence index and the preset source tracing threshold range, and output a source tracing report including electronic file identifier, copyright identifier matching rate, distribution index matching status, source tracing confidence index and infringement level.
[0083] Compared with the prior art, the beneficial effects of the present invention are:
[0084] This solution proposes a method and system for tracing the copyright of electronic archives based on digital watermarking. The electronic archive data to be protected and its distribution identification information are collected, the archive undergoes amplitude normalization and frequency domain decomposition, and candidate signals for a first and a second embedding region are extracted. A frequency domain coefficient matrix is constructed in the first embedding region and the copyright identifier code is embedded in it; a frequency domain coefficient matrix is constructed in the second embedding region and the distribution index code is embedded in it, achieving dual watermark embedding. The amplitude strength and position distribution of the embedded watermarks are optimized through the embedding control parameters so that the watermarks are minimally perceptible in the electronic archive and highly robust. During distribution, the copyright identifier code and distribution index code of the archive are recorded and a complete distribution record is established. When a suspected infringing electronic archive is detected, the watermark codes to be detected are extracted, copyright consistency verification and distribution index matching analysis are performed, a source tracing confidence index is generated, and the infringement level is determined, producing a source tracing report. The invention enables precise protection, distribution tracking, and rapid tracing of electronic archive copyright information, improving the security, reliability, and degree of automation of copyright management while reducing infringement risk and improving the efficiency of traceability analysis.
Attached Figure Description
[0085] Figure 1 is a flowchart of the method for tracing the copyright of electronic archives based on digital watermarking proposed in this invention;
[0086] Figure 2 is a flowchart of frequency domain processing and embedding region generation in this invention;
[0087] Figure 3 is a flowchart of the watermark embedding and source tracing analysis process in this invention;
[0088] Figure 4 is a system framework diagram of the electronic archive copyright tracing system based on digital watermarking proposed in this invention.
Detailed Implementation
[0089] The following description is intended to disclose the invention and enable those skilled in the art to implement it. The preferred embodiments described below are merely examples, and other obvious variations will occur to those skilled in the art.
[0090] Referring to Figures 1 to 4, an embodiment of the present invention provides a method for tracing the copyright of electronic archives based on digital watermarking, comprising:
[0091] The electronic archive data to be protected and the distribution identification information are obtained, and the electronic archive data to be protected is subjected to frequency domain decomposition to obtain a first embedding region and a second embedding region;
[0092] A first frequency domain coefficient matrix is constructed based on the first embedding region, a copyright identifier code is generated, a first embedding control parameter is determined, and the copyright identifier code is embedded into the first frequency domain coefficient matrix according to the first embedding control parameter to obtain a first embedding result;
[0093] Second embedding control parameters are determined based on the first embedding control parameters, a second frequency domain coefficient matrix corresponding to the second embedding region is constructed, a distribution index code is generated from the distribution identification information, and the distribution index code is embedded into the second frequency domain coefficient matrix according to the second embedding control parameters to form a second embedding result;
[0094] An electronic file carrying a dual watermark is generated from the first and second embedding results, a distribution record is registered, and a copyright identifier code record and a distribution index code record are established;
[0095] When a suspected infringing electronic file is detected, it is decomposed in the frequency domain to obtain a first embedding region to be detected and a second embedding region to be detected, from which the copyright identifier code to be detected and the distribution index code to be detected are extracted;
[0096] A consistency check between the copyright identifier code to be detected and the copyright identifier code record yields a copyright identifier matching rate; the distribution index code to be detected is then matched against the distribution index code record to generate a traceability confidence index, and the traceability result is output.
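The consistency check in the last step can be illustrated with a minimal sketch. It assumes that both codes are equal-length bit strings and takes the matching rate to be the fraction of agreeing bit positions; the function names and the record layout are hypothetical, and the traceability confidence index itself is not computed because its formula is not given here.

```python
def bit_match_rate(extracted: str, recorded: str) -> float:
    """Fraction of bit positions where an extracted code agrees with a recorded code."""
    if len(extracted) != len(recorded) or not recorded:
        raise ValueError("codes must be non-empty and of equal length")
    agreeing = sum(a == b for a, b in zip(extracted, recorded))
    return agreeing / len(recorded)


def best_distribution_match(extracted: str, records: dict) -> tuple:
    """Return (record_id, rate) for the distribution index record that matches best."""
    rates = {rid: bit_match_rate(extracted, code) for rid, code in records.items()}
    best_id = max(rates, key=rates.get)
    return best_id, rates[best_id]


# Hypothetical example: one flipped bit out of eight.
print(bit_match_rate("10110010", "10110011"))  # 0.875
print(best_distribution_match("1100", {"D1": "1100", "D2": "0011"}))  # ('D1', 1.0)
```

A plain bit-agreement rate is used only because it is the simplest consistency measure; a real system might weight bits or use error-correcting codes before matching.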
[0097] Furthermore, the electronic archival data to be protected is subjected to frequency domain decomposition to obtain a first embedding region and a second embedding region, specifically including:
[0098] The original data of the electronic archive to be protected is obtained, and archive type information is extracted from the archive. The original data is amplitude-normalized to obtain a standardized electronic archive signal. Depending on the archive type, the original data comprises text character encoding values, image pixel gray values, audio sampling amplitudes, or, for video, frame pixel gray values together with audio sampling amplitudes.
[0099] A discrete wavelet transform is performed on the standardized electronic archive signal to decompose it into a low-frequency approximation signal and multiple high-frequency detail signals, forming an initial frequency domain signal set;
[0100] For the high-frequency detail signals in the initial frequency domain signal set, the sum of squares of the frequency domain coefficients within each preset local window is calculated in the horizontal and vertical directions to obtain local energy values. The horizontal local energy value is taken as a horizontal detail feature and the vertical local energy value as a vertical detail feature, together forming a detail feature set.
[0101] The local energy value of each detail feature is compared with the preset energy threshold for its direction. Horizontal features whose energy exceeds the horizontal threshold are assigned to the horizontal target feature set, and vertical features whose energy exceeds the vertical threshold are assigned to the vertical target feature set. The spatial or temporal position of each feature is recorded as its position information.
[0102] Specifically, the original data of the electronic archive to be protected is first acquired, the archive type information is extracted, and the original data is amplitude-normalized. Text character encoding values, image pixel grayscale values, audio sampling amplitudes, and video frame pixel grayscale values and audio sampling amplitudes are all mapped to a standardized range: 0 to 1 for non-negative quantities and -1 to 1 for signed audio amplitudes. For example, the pixel grayscale values of a 512×512 pixel image originally lie between 0 and 255 and are mapped to 0 to 1 after normalization, while the sampling amplitudes of an audio sequence originally lie between -32768 and 32767 and are mapped to -1 to 1. For text archives, the characters are first converted into their character encoding values in reading order, forming a one-dimensional numerical sequence that is then divided by the largest character encoding value. Next, a three-level discrete wavelet transform is applied to the standardized electronic archive signal, decomposing it into a low-frequency approximation signal and high-frequency detail signals. For images, the high-frequency detail signals comprise horizontal, vertical, and diagonal coefficients; for a 512×512 pixel image, each third-level detail subband is a 64×64 matrix. For audio, the corresponding unit is a coefficient block of 256 sampling points. The low-frequency approximation signal and the high-frequency detail signals together form the initial frequency domain signal set, which serves as the basis for subsequent detail feature extraction and embedding region determination.
It is worth noting that the wavelet basis can be chosen according to the archive type; for example, Daubechies db4 can be used for images and Symlets sym4 for audio, to balance signal fidelity against frequency domain resolution. The number of decomposition levels depends on the signal length and its frequency distribution: three levels are typical for images and two for audio.
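The amplitude normalization step can be sketched as follows. This is a minimal illustration assuming 8-bit grayscale images, 16-bit signed PCM audio, and Unicode code points as the "character encoding values" for text; the function names are hypothetical.

```python
import numpy as np


def normalize_image(gray: np.ndarray) -> np.ndarray:
    """Map 8-bit grayscale values in 0..255 to 0..1."""
    return gray.astype(np.float64) / 255.0


def normalize_audio(samples: np.ndarray) -> np.ndarray:
    """Map 16-bit PCM samples in -32768..32767 to -1..1 (32767 lands just below 1)."""
    return samples.astype(np.float64) / 32768.0


def normalize_text(text: str) -> np.ndarray:
    """Code points in reading order, divided by the largest code point (assumed encoding)."""
    if not text:
        raise ValueError("text archive is empty")
    codes = np.array([ord(ch) for ch in text], dtype=np.float64)
    return codes / codes.max()


image = normalize_image(np.array([[0, 128, 255]], dtype=np.uint8))
audio = normalize_audio(np.array([-32768, 0, 16384], dtype=np.int16))
text = normalize_text("ARCH2026")
print(image.min(), image.max(), audio[0], audio[2], text.max())  # 0.0 1.0 -1.0 0.5 1.0
```

If PyWavelets is used for the subsequent decomposition, `pywt.wavedec2(x, 'db4', mode='periodization', level=3)` keeps the third-level subbands of a 512×512 input at exactly 64×64; the default symmetric extension mode pads the borders and produces slightly larger subbands.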
[0103] In the initial frequency domain signal set, local energy values are calculated for the high-frequency detail signals in the horizontal and vertical directions over preset local windows. The window size is chosen from the spatial resolution of the electronic archive and the size of the frequency domain matrix after wavelet decomposition, so that it matches the scale of the coefficient distribution. For example, for a 512×512 pixel image, the high-frequency coefficient matrix after three-level wavelet decomposition is 64×64, so each 8×8 coefficient block forms one local window, dividing the matrix evenly into 64 non-overlapping windows. For audio signals, each window spans 256 sampling points of the high-frequency coefficient sequence, balancing temporal and frequency resolution. The sum of squares of the frequency domain coefficients within a window is its local energy value. Horizontal local energy values yield horizontal detail features and vertical local energy values yield vertical detail features; the spatial or temporal position of each feature is recorded at the same time, forming a complete detail feature set. The preset energy threshold is determined adaptively from the energy statistics of all local windows: the mean and standard deviation of the horizontal and vertical energy values are calculated separately, and the mean plus twice the standard deviation is used as the threshold for the corresponding direction.
For example, when the horizontal energy mean is 0.032 and the standard deviation is 0.009, the horizontal threshold is 0.032 + 2 × 0.009 = 0.05; when the vertical energy mean is 0.028 and the standard deviation is 0.006, the vertical threshold is 0.028 + 2 × 0.006 = 0.04. Windows with horizontal energy greater than 0.05 are placed in the horizontal target feature set and windows with vertical energy greater than 0.04 in the vertical target feature set, with their position information retained. In practice, if a horizontal local window of an image has energy 0.072 and starts at pixel position x=128, y=256, it is labeled a high-energy horizontal feature; if a vertical window has energy 0.045 and is located at x=300, y=512, it is labeled a high-energy vertical feature. The resulting high-energy feature sets identify regions where local frequency domain energy is concentrated and carry precise spatial or temporal position information, providing the candidate data for the subsequent division into first and second embedding regions and giving the selection of embedding regions an explicit frequency domain energy basis.
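The window energy computation and the adaptive mean-plus-two-standard-deviations threshold can be sketched for one direction of a 2-D subband. This is a minimal illustration; the function names are hypothetical, and the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator is used.

```python
import numpy as np


def block_energies(coeffs: np.ndarray, win: int = 8) -> np.ndarray:
    """Sum of squared coefficients in each non-overlapping win x win window."""
    h, w = coeffs.shape
    if h % win or w % win:
        raise ValueError("matrix size must be a multiple of the window size")
    blocks = coeffs.reshape(h // win, win, w // win, win)
    return (blocks ** 2).sum(axis=(1, 3))


def high_energy_windows(energies: np.ndarray, k: float = 2.0):
    """Adaptive threshold mean + k * std, and the (row, col) indices of windows above it."""
    threshold = energies.mean() + k * energies.std()
    rows, cols = np.nonzero(energies > threshold)
    return float(threshold), list(zip(rows.tolist(), cols.tolist()))


# Hypothetical 64x64 detail subband with one strong 8x8 window at window index (1, 2).
subband = np.zeros((64, 64))
subband[8:16, 16:24] = 0.1
energies = block_energies(subband)
threshold, selected = high_energy_windows(energies)
print(energies.shape, selected)  # (8, 8) [(1, 2)]
```

Window indices map back to coefficient positions by multiplying by the window size; the same two functions would be applied separately to the horizontal and vertical detail subbands.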
[0104] From the horizontal target feature set, coefficients in adjacent local regions whose energy varies little and which are uniformly distributed in the horizontal and vertical directions are selected to generate the first embedding region candidate signal.
[0105] From the vertical target feature set, coefficients whose corresponding pixel gray value changes are below the visual perceptibility threshold (for images) or whose audio amplitude changes are below the auditory masking threshold (for audio) are selected to generate the second embedding region candidate signal.
[0106] Understandably, the perceptibility threshold derives from the perceptual characteristics of the human visual system: it is the smallest change in the spatial brightness or structure of an image or video signal that the human eye can perceive. When a change in the image signal is smaller than this threshold, the eye typically cannot perceive it, so information can be embedded in that area without being visually detected. The masking threshold derives from the auditory masking effect described by psychoacoustic models: in an audio signal, a strong frequency component or time segment masks weaker signals at nearby frequencies or times, making those weaker changes difficult for the human auditory system to perceive. In audio processing, the masking threshold is therefore used to decide whether a signal change falls within the range imperceptible to the human ear. Setting change thresholds from visual perceptibility and auditory masking is a common perceptual-model technique in image processing, audio processing, and digital watermarking, and the thresholds can be obtained from existing visual perception models or psychoacoustic models.
[0107] The frequency domain coefficients in the first and second embedding region candidate signals are arranged according to their position information, yielding first and second embedding region candidate signals with an optimized spatial distribution.
[0108] The frequency domain coefficients and their position information in the optimized first embedding region candidate signal are determined as the first embedding region, and the frequency domain coefficients and their position information in the optimized second embedding region candidate signal are determined as the second embedding region.
[0109] Specifically, the local windows corresponding to the horizontal target feature set are further screened, and candidate embedding positions are determined from the energy variation between adjacent regions and the uniformity of the coefficient distribution in the horizontal and vertical directions. The differences between the local energy values of adjacent windows are calculated; when the difference between the maximum and minimum energy values across several adjacent windows is below a preset stability threshold, the region is considered to have smoothly varying energy. At the same time, the proportions of effective frequency domain coefficients within the window in the horizontal and vertical directions are counted, and when both proportions reach a preset ratio, the corresponding coefficients are selected as the first embedding region candidate signal. For example, in a 512×512 pixel image, adjacent 8×8 windows within the horizontal high-energy set are compared: if three consecutive windows have energies of 0.061, 0.064, and 0.063, the spread between the maximum and minimum is 0.003, which is below the stability threshold of 0.01, and if the effective coefficients in both directions exceed 80% within the windows, the frequency domain coefficients of this region are selected as the first embedding region candidate signal. In audio processing, the high-frequency energies of adjacent 256-sampling-point windows are compared, and windows whose energy changes by less than 0.005 and whose coefficients are evenly distributed in time are retained as the first embedding region candidate signal.
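The two screening conditions for the first embedding region can be sketched as a single predicate. The effective-coefficient ratios are taken as inputs because the text does not define at this point how a coefficient counts as "effective"; the function name is hypothetical, and "reaches the preset ratio" is read as greater than or equal.

```python
def is_first_region_candidate(window_energies, ratio_h, ratio_v,
                              stability_threshold=0.01, min_ratio=0.8):
    """True when the energy spread (max - min) across adjacent windows is below the
    stability threshold and both directional effective-coefficient ratios reach min_ratio."""
    spread = max(window_energies) - min(window_energies)
    return spread < stability_threshold and ratio_h >= min_ratio and ratio_v >= min_ratio


# Worked example from the text: spread 0.064 - 0.061 = 0.003 < 0.01.
print(is_first_region_candidate([0.061, 0.064, 0.063], ratio_h=0.85, ratio_v=0.82))  # True
# A spread of 0.014 fails the stability test.
print(is_first_region_candidate([0.061, 0.075, 0.063], 0.9, 0.9))  # False
```

For audio windows the same spread test applies with `stability_threshold=0.005`; the even-distribution-in-time condition would replace the two directional ratios.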
[0110] At the same time, the vertical target feature set, after energy stationarity screening, is further screened against the amplitude variation of the original spatial or temporal signal to keep the embedding region concealed. For images, the difference between the maximum and minimum grayscale values of each local pixel block is calculated and compared with the visual perceptibility threshold; when the difference is below the threshold, the human eye typically cannot perceive the corresponding pixel change, and the block's frequency domain coefficients are retained. For audio, the root mean square amplitude change of each window is calculated and compared with the auditory masking threshold obtained from an auditory masking model; when the change is below that threshold, the human ear typically cannot perceive it, and the window is selected into the second embedding region candidate signal. For example, when the maximum-to-minimum grayscale difference of an 8×8 pixel block is 0.018, below the visual perceptibility threshold of 0.02, its frequency domain coefficients are retained; when the root mean square amplitude change of an audio window is 0.008, below the auditory masking threshold of 0.01, its coefficients are selected into the candidate set. The candidate frequency domain coefficients are then arranged by their spatial or temporal position. In images, they are sorted left to right and top to bottom by original pixel position, with a gap maintained between adjacent embedding positions.
In audio, they are arranged in sampling order, with a number of unembedded sampling points reserved between adjacent embedding points, so that the embedding positions are spread evenly across the spatial or temporal extent of the archive. This determines the first embedding region and the second embedding region.
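The perceptibility screening and the spaced raster ordering can be sketched as below. The thresholds 0.02 and 0.01 come from the worked example; measuring the audio "root mean square amplitude change" as the RMS of successive sample differences is an assumption, as are the function names and the greedy spacing rule.

```python
import numpy as np


def image_block_imperceptible(block: np.ndarray, jnd: float = 0.02) -> bool:
    """Max-minus-min grayscale of a normalized pixel block below the visual threshold."""
    return float(block.max() - block.min()) < jnd


def audio_window_masked(window: np.ndarray, mask_threshold: float = 0.01) -> bool:
    """RMS of successive sample differences below the masking threshold (assumed measure)."""
    diffs = np.diff(window)
    return float(np.sqrt(np.mean(diffs ** 2))) < mask_threshold


def order_with_spacing(positions, width: int, gap: int = 2):
    """Sort (row, col) positions top-to-bottom, left-to-right, keeping only positions that
    are at least `gap` raster steps after the previously kept one."""
    kept, last = [], None
    for r, c in sorted(positions):
        idx = r * width + c
        if last is None or idx - last >= gap:
            kept.append((r, c))
            last = idx
    return kept


block = np.full((8, 8), 0.5)
block[0, 0] = 0.518                      # grayscale spread of 0.018
print(image_block_imperceptible(block))  # True
print(audio_window_masked(np.array([0.0, 0.008, 0.0])))  # True
print(order_with_spacing([(0, 3), (0, 0), (0, 1), (1, 0)], width=4))  # [(0, 0), (0, 3)]
```

In a real system the fixed thresholds would be replaced by values from a visual perception model or a psychoacoustic model, as the text notes.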
[0111] Furthermore, a first frequency domain coefficient matrix is constructed based on the first embedding region, a copyright identifier code is generated, and a first embedding control parameter is determined. The copyright identifier code is then embedded into the first frequency domain coefficient matrix according to the first embedding control parameter to obtain the first embedding result, specifically including:
[0112] The frequency domain coefficients determined in the first embedding region are arranged according to their position information to form a first frequency domain coefficient matrix;
[0113] The copyright identification information is obtained from the electronic archive to be protected, the copyright subject identification data is extracted, and this data is encoded to generate a copyright identifier code sequence;
[0114] The copyright identifier code sequence is length-adjusted and format-unified to obtain the adapted copyright identifier code.
[0115] Amplitude statistics are computed over the frequency domain coefficients of the first frequency domain coefficient matrix, namely the mean and variance of their amplitudes, yielding the amplitude distribution characteristics.
[0116] The first frequency domain coefficient matrix is divided by row and column coordinates into sub-matrix regions of fixed size, each containing several adjacent frequency domain coefficients; the number and distribution density of the coefficients in each sub-matrix region are counted to obtain the position distribution characteristics.
[0117] Specifically, based on the determined first embedding region, the selected frequency domain coefficients are first arranged according to their spatial position or temporal order in the original electronic archive, turning the discrete frequency domain data into a regular matrix that serves as a unified carrier for subsequent watermark embedding. In image archives, the coefficients are sorted left to right and top to bottom by original pixel coordinates; when the first embedding region contains 64×64 frequency domain coefficients, they form a 64-row × 64-column frequency domain coefficient matrix in which each element corresponds to a unique spatial location. In audio archives, the coefficients are arranged in sampling-time order, for example with 256 sampling points per row, so that rows correspond to time windows and columns to frequency components, keeping the frequency domain data structurally consistent with the original signal. Next, copyright identification information, such as the copyright subject identifier, a unique archive number, or creator information, is extracted from the electronic archive to be protected and converted into an embeddable code sequence by mapping characters or digits into binary form. For example, the copyright subject identifier "ARCH2026" can be encoded as a fixed-length binary sequence. To match the embedding capacity of the matrix, the code sequence is length-adapted by zero padding, segmentation and repetition, or rearrangement, so that the number of code bits equals the number of embeddable frequency domain coefficients, yielding a copyright identifier code sequence that can be used directly for modulation embedding.
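The encoding and length adaptation of the copyright identifier can be sketched as below. It assumes 8-bit ASCII per character (so "ARCH2026" becomes 64 bits) and uses whole repetitions followed by zero padding, which is only one of the adaptation options the text lists; segmentation across regions is not shown, and the function names are hypothetical.

```python
def encode_identifier(identifier: str) -> str:
    """Encode each character as 8 ASCII bits (assumed encoding)."""
    return "".join(format(byte, "08b") for byte in identifier.encode("ascii"))


def adapt_length(bits: str, capacity: int) -> str:
    """Repeat the code as many whole times as fit, then zero-pad to exactly `capacity` bits."""
    if not bits or capacity < len(bits):
        raise ValueError("capacity is smaller than one copy of the code; segment it instead")
    repeated = bits * (capacity // len(bits))
    return repeated + "0" * (capacity - len(repeated))


code = encode_identifier("ARCH2026")
adapted = adapt_length(code, 150)
print(len(code), code[:8], len(adapted))  # 64 01000001 150
```

Repetition gives the extractor several copies of each bit to vote on, which is a common reason to prefer it over pure zero padding when capacity allows.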
[0118] At the same time, the first frequency domain coefficient matrix is analyzed statistically to obtain its amplitude distribution characteristics and spatial distribution characteristics, which provide a quantitative basis for controlling the embedding modulation. First, the mean amplitude and dispersion of all frequency domain coefficients in the matrix are computed. For example, the first frequency domain coefficient matrix of a 512×512 pixel image has a mean amplitude of 0.063 and a small overall fluctuation range, so the embedding modulation amplitude can be kept within a small interval near this mean to achieve covert embedding. Next, the matrix is divided into fixed-size sub-regions for distribution density analysis; for example, a 64×64 matrix is divided into a 16×16 grid of sub-blocks, each containing 4×4 frequency domain coefficients. Within each sub-block, the proportion of coefficients that both originate from the first embedding region and have an amplitude within a preset reasonable range is calculated as the effective embedding coefficient ratio. The reasonable range is centered on the coefficient's original amplitude, with the corresponding embedding modulation amplitude as its half-width, which keeps the watermark both robust and imperceptible. For example, if a coefficient's original amplitude is 0.052 and its embedding modulation amplitude is 0.003, its reasonable range is 0.049 to 0.055, and any coefficient within that range meets the embedding conditions. Suppose a sub-block contains 16 frequency domain coefficients, of which 15 originate from the first embedding region and have amplitudes within their respective reasonable ranges.
Then 15 coefficients meet the embedding conditions, about 0.94 of the 16 coefficients in the sub-block (15/16 = 0.9375). If only 8 coefficients in an adjacent sub-block both originate from the first embedding region and have amplitudes within their reasonable ranges, that sub-block's effective embedding coefficient ratio is 0.5. If fewer than 8 coefficients in a sub-block meet the conditions, for example only 6, the ratio is 0.375. In that case, the subsequent embedding control appropriately reduces the number of embeddings in other sub-regions and increases the number of embedding coefficients selected in this sub-region, keeping the embedding positions evenly distributed overall and avoiding excessive concentration or sparseness of embedding coefficients in local areas. By combining amplitude and spatial position characteristics, watermark embedding is governed both by frequency domain intensity modulation control and by a uniform spatial distribution constraint, preserving the overall quality and concealment of the electronic archive while ensuring sufficient embedding capacity.
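The effective embedding coefficient ratio per 4×4 sub-block can be computed from a boolean eligibility mask, as in this minimal sketch. The eligibility test mirrors the "original amplitude plus or minus the modulation amplitude" range from the example; the function names are hypothetical.

```python
import numpy as np


def eligible_mask(original, current, in_region, delta):
    """A coefficient is eligible if it comes from the first embedding region and lies
    within [original - delta, original + delta]."""
    return in_region & (np.abs(current - original) <= delta)


def subblock_ratios(mask: np.ndarray, size: int = 4) -> np.ndarray:
    """Effective embedding coefficient ratio of each non-overlapping size x size sub-block."""
    h, w = mask.shape
    if h % size or w % size:
        raise ValueError("matrix size must be a multiple of the sub-block size")
    blocks = mask.reshape(h // size, size, w // size, size)
    return blocks.mean(axis=(1, 3))


mask = np.zeros((64, 64), dtype=bool)
mask[0:4, 0:4] = True
mask[3, 3] = False          # 15 of 16 eligible
mask[0:2, 4:8] = True       # 8 of 16 eligible in the adjacent sub-block
ratios = subblock_ratios(mask)
print(ratios.shape, ratios[0, 0], ratios[0, 1])  # (16, 16) 0.9375 0.5
print(eligible_mask(np.array([0.052]), np.array([0.054]), np.array([True]), 0.003))  # [ True]
```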
[0119] Based on the amplitude distribution characteristics, the amplitudes of the frequency domain coefficients in the first frequency domain coefficient matrix are analyzed to obtain the amplitude range, which is divided into amplitude intervals. From the coefficient amplitudes in each interval, an embedding modulation amplitude is determined for that interval, forming the embedding strength parameter.
[0120] Based on the position distribution characteristics, the number and locations of the coefficients to be embedded in each region are determined, and the embedding interval is set from the row and column distribution of the embedding coefficients so that they are evenly distributed in the matrix, forming the embedding position control parameters.
[0121] Each embedding position determined by the embedding position control parameters is associated with the embedding modulation amplitude of the corresponding amplitude interval in the embedding strength parameter, forming the first embedding control parameters.
[0122] The embedding position is obtained according to the first embedding control parameter, and the corresponding frequency domain coefficient is selected as the embedding carrier in the first frequency domain coefficient matrix.
[0123] The adapted copyright identifier code is embedded into the embedding carrier according to the embedding position and the corresponding embedding modulation amplitude, and the corresponding frequency domain coefficients are amplitude modulated according to the embedding modulation amplitude to form the first embedding result.
[0124] Specifically, a systematic amplitude analysis is first performed on all frequency domain coefficients in the first frequency domain coefficient matrix to characterize the distribution of frequency domain energy in the current embedding region. Besides the maximum, minimum, and mean amplitudes of all coefficients, the concentration and dispersion of the amplitudes are analyzed to determine the overall fluctuation range, giving the dynamic range of frequency domain intensity within the embedding region. Once the upper and lower amplitude limits are known, the continuous amplitude range is divided into several contiguous sub-intervals of equal or approximately equal width, so that coefficients at different amplitude levels can be managed hierarchically. For example, when the amplitudes are mainly concentrated between 0.02 and 0.08, this range can be divided into four intervals: 0.02 to 0.035, 0.035 to 0.05, 0.05 to 0.065, and 0.065 to 0.08. For each interval, the mean amplitude of the coefficients within it is taken as the interval's representative amplitude, and this is divided by the maximum amplitude among all coefficients to obtain a normalized amplitude ratio. The normalized amplitude ratio is then multiplied by a preset maximum embedding modulation amplitude to obtain the embedding modulation amplitude for that interval, so that the embedding modulation amplitude scales in proportion to the amplitude level of each interval.
The preset maximum embedding modulation amplitude is determined from the mean amplitude of the coefficients in the lowest amplitude interval, ensuring that the embedding modulation stays within the imperceptible range for low-amplitude frequency domain coefficients. For example, when the maximum amplitude of all frequency domain coefficients is 0.08 and the mean amplitude of an interval is 0.05, the normalized amplitude ratio is 0.05 / 0.08 = 0.625; with a preset maximum embedding modulation amplitude of 0.004, the embedding modulation amplitude for that interval is 0.004 × 0.625 = 0.0025. When the mean amplitude of another interval is 0.025, its embedding modulation amplitude is 0.004 × 0.025 / 0.08 = 0.00125. Through this amplitude-level modulation mapping, the embedding strength adapts to the frequency domain energy level, so the copyright code is written reliably while the perturbation of the original amplitude structure is kept under control and the overall spectral distribution changes smoothly before and after embedding.
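The interval-based modulation mapping, where each interval k gets an embedding modulation amplitude of (preset maximum) × (interval mean) / (largest amplitude), can be sketched as below. Working on coefficient magnitudes, splitting the observed minimum-to-maximum range into equal-width intervals, and giving empty intervals an amplitude of 0 are assumptions; the function name is hypothetical.

```python
import numpy as np


def interval_modulation(amps, n_intervals=4, max_delta=0.004):
    """Split [min, max] of the amplitudes into equal-width intervals and give each one
    delta_k = max_delta * (mean amplitude in interval k) / (largest amplitude)."""
    amps = np.abs(np.asarray(amps, dtype=np.float64))
    edges = np.linspace(amps.min(), amps.max(), n_intervals + 1)
    # Digitizing against interior edges only gives every amplitude an index 0..n_intervals-1.
    idx = np.digitize(amps, edges[1:-1])
    peak = amps.max()
    deltas = np.zeros(n_intervals)
    for k in range(n_intervals):
        members = amps[idx == k]
        if members.size:  # empty intervals keep a modulation amplitude of 0
            deltas[k] = max_delta * members.mean() / peak
    return edges, deltas


edges, deltas = interval_modulation([0.02, 0.03, 0.045, 0.055, 0.08])
print(edges)   # boundaries 0.02, 0.035, 0.05, 0.065, 0.08
print(deltas)  # interval 0 has mean 0.025, so 0.004 * 0.025 / 0.08 = 0.00125
```

With these inputs the intervals match the four in the text, and interval 0 reproduces the 0.00125 figure from the worked example.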
[0125] After amplitude grading and modulation intensity setting, the spatial distribution of the frequency domain coefficients in the matrix is used for position selection and homogenization, to avoid concentrating embedding carriers in local areas. For example, the 64×64 frequency domain matrix is divided into a 16×16 grid of sub-matrix regions, each containing 4×4 adjacent coefficients, and the number and positions of the effective embedding coefficients within each sub-region are counted. When a sub-matrix contains more than three effective embedding coefficients, two of them are selected as actual embedding positions in row-priority order, with a spacing of at least two elements between positions in the same row, to reduce the risk of abrupt local energy changes caused by modulating adjacent coefficients together. When some sub-matrices contain few effective coefficients, the selection is balanced at the global level so that embedding points are dispersed in both the horizontal and vertical directions. Sub-matrix density statistics and spacing rules thus give the embedding a regular, discrete spatial structure and prevent abnormal local concentration of frequency domain energy. The amplitude modulation parameters and position selection rules are then combined into the complete first embedding control parameters. To form them, the frequency domain coefficient at each embedding position given by the embedding position control parameters is first located in the first frequency domain coefficient matrix, so that each embedding position corresponds to exactly one coefficient.
Then, according to the amplitude interval into which each coefficient's amplitude falls, the embedding modulation amplitude for that interval is looked up in the embedding strength parameter and assigned to the embedding position. The embedding positions and their embedding modulation amplitudes are thus paired one to one into a data set of "embedding position and corresponding embedding modulation amplitude" entries, ordered by the row and column of each position in the frequency domain coefficient matrix, which constitutes the structured first embedding control parameters. For example, if an embedding position lies in row 2, column 3 of the matrix, its coefficient amplitude is 0.048, and the embedding modulation amplitude of its amplitude interval is 0.002, then this position is paired with 0.002 and recorded as one embedding control element. All embedding control elements, combined in order, form the complete first embedding control parameters.
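Position selection and the pairing of positions with modulation amplitudes can be sketched as below. "A spacing of at least two elements" is read here as a column distance of at least 2 within the same row, the global rebalancing of sparse sub-blocks is not modelled, and the per-interval amplitudes in the example are hypothetical values chosen so that 0.048 maps to 0.002 as in the text.

```python
import numpy as np


def select_positions(mask: np.ndarray, size: int = 4, per_block: int = 2, min_gap: int = 2):
    """In each size x size sub-block with more than three eligible coefficients, pick up to
    `per_block` positions in row-priority order, keeping same-row picks `min_gap` apart."""
    chosen = []
    h, w = mask.shape
    for br in range(0, h, size):
        for bc in range(0, w, size):
            block = mask[br:br + size, bc:bc + size]
            if block.sum() <= 3:
                continue  # sparse sub-blocks are left to global balancing (not modelled)
            picked = []
            for r, c in zip(*np.nonzero(block)):  # np.nonzero yields row-major order
                if any(pr == r and abs(pc - c) < min_gap for pr, pc in picked):
                    continue
                picked.append((int(r), int(c)))
                if len(picked) == per_block:
                    break
            chosen.extend((br + r, bc + c) for r, c in picked)
    return chosen


def build_control_params(coeffs: np.ndarray, positions, edges, deltas):
    """Pair each position with the modulation amplitude of its amplitude interval,
    ordered by row then column."""
    interior = np.asarray(edges)[1:-1]
    params = []
    for r, c in sorted(positions):
        k = int(np.digitize(abs(coeffs[r, c]), interior))
        params.append(((r, c), float(deltas[k])))
    return params


mask = np.zeros((4, 8), dtype=bool)
mask[0, 0:4] = True          # 4 eligible: picks (0, 0) and (0, 2)
mask[1, 4:7] = True          # only 3 eligible: skipped
coeffs = np.zeros((4, 8))
coeffs[0, 0], coeffs[0, 2] = 0.048, 0.025
edges = [0.02, 0.035, 0.05, 0.065, 0.08]
deltas = [0.001, 0.002, 0.003, 0.004]  # hypothetical per-interval amplitudes
positions = select_positions(mask)
print(positions)  # [(0, 0), (0, 2)]
print(build_control_params(coeffs, positions, edges, deltas))  # [((0, 0), 0.002), ((0, 2), 0.001)]
```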
[0126] To embed the adapted copyright identifier code into the embedding carriers, the embedding positions are first taken one by one from the first embedding control parameters and located in the first frequency domain coefficient matrix, and the frequency domain coefficient at each position is taken as an embedding carrier. The embedding order is then fixed by the row-priority or column-priority order of the positions in the matrix, and the bits of the copyright identifier code are assigned in that order, giving a one-to-one correspondence between the code sequence and the sequence of embedding carriers. During embedding, each coefficient's amplitude is modulated by the embedding modulation amplitude of its position: when the code bit is "1", the embedding modulation amplitude is superimposed on the coefficient's original amplitude as a quantized offset; when the code bit is "0", the original amplitude is left unchanged or given a small balancing correction. Embedding all bits in this way completes the copyright identifier code and generates the first embedding result.
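The bit-by-bit embedding can be sketched as additive amplitude modulation in row-priority order. Applying the offset to the coefficient's magnitude while keeping its sign is an interpretation of "superimposed on the original amplitude", and the optional balancing correction for "0" bits is not modelled; the function name is hypothetical.

```python
import numpy as np


def embed_bits(matrix: np.ndarray, control_params, bits: str) -> np.ndarray:
    """Embed bits in row-priority order: a '1' adds the position's modulation amplitude
    to the coefficient's magnitude; a '0' leaves the coefficient unchanged."""
    if len(bits) > len(control_params):
        raise ValueError("more code bits than embedding positions")
    out = matrix.astype(np.float64).copy()
    ordered = sorted(control_params, key=lambda item: item[0])  # row-major by position
    for ((r, c), delta), bit in zip(ordered, bits):
        if bit == "1":
            # Quantized offset on the magnitude, keeping the coefficient's sign.
            out[r, c] = np.copysign(abs(out[r, c]) + delta, out[r, c])
    return out


m = np.array([[0.05, -0.04], [0.03, 0.02]])
params = [((0, 0), 0.002), ((0, 1), 0.001), ((1, 0), 0.001)]
marked = embed_bits(m, params, "101")
print(marked)
```

The input matrix is copied, so the original coefficients stay available, for example for a non-blind comparison during verification.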
[0127] Furthermore, the second embedding control parameters are determined based on the first embedding control parameters, a second frequency domain coefficient matrix corresponding to the second embedding region is constructed, a distribution index code is generated from the distribution identification information, and the distribution index code is embedded into the second frequency domain coefficient matrix according to the second embedding control parameters to form the second embedding result, specifically including:
[0128] Based on the embedding strength parameter in the first embedding control parameters, the amplitudes of the frequency domain coefficients in the second embedding region are analyzed to obtain their amplitude range, which is mapped onto the amplitude intervals of the first embedding control parameters to determine the embedding modulation amplitudes of the second embedding region;
[0129] Based on the embedding position control parameters in the first embedding control parameters, the frequency domain coefficients of the second embedding region are divided into several sub-regions according to their position information. The number and distribution density of frequency domain coefficients in each sub-region are counted. Combined with the position distribution characteristics in the first embedding control parameters, the number of embedding positions, the embedding position order and the embedding interval of the second embedding region are determined to form the second embedding position control parameters.
[0130] The embedding modulation amplitude of the second embedding region is combined with the second embedding position control parameters to form the second embedding control parameters;
[0131] Based on the second embedding control parameters, the frequency domain coefficients and their position information determined in the second embedding region are selected and arranged to construct the second frequency domain coefficient matrix;
[0132] Understandably, a complete amplitude statistical analysis is performed on all frequency domain coefficients in the second embedding region, following the amplitude modulation rules already established in the first embedding control parameters, to determine the region's current energy distribution. The second embedding region is typically selected from a frequency band with moderate energy and stable structure, where modification does not noticeably affect visual or auditory perception of the content. Each frequency domain coefficient in the region corresponds to a spatial frequency component of the original image or a frequency component of the audio signal, so its amplitude directly reflects the energy carried by that component. Computing the maximum, minimum, and mean amplitude of all coefficients in the second embedding region gives the region's actual amplitude range.

For example, consider an image file whose second embedding region contains 32×32 frequency domain coefficients. Statistical analysis shows that the amplitudes lie mainly between 0.015 and 0.07. The region's overall energy is therefore lower than that of the first embedding region, but it still leaves some room for modulation. To keep modulation strength consistent and controllable across embedding regions, this actual amplitude range is mapped proportionally onto the amplitude grading established in the first embedding control parameters. The modulation amplitude is then selected according to the interval into which an amplitude falls:

- Amplitudes from 0.015 to 0.03 are mapped to the grade for the original amplitude range 0.02 to 0.035, whose embedding amplitude is 0.001.
- Amplitudes from 0.03 to 0.05 are mapped to the grade for the original range 0.035 to 0.05, whose embedding amplitude is 0.002.
- Amplitudes above 0.05 are mapped to higher grades.

This interval mapping lets the second embedding region keep its own amplitude structure while using the same modulation scale as the first embedding region. It keeps the embedding strength of the distribution index code within the natural fluctuation range of the original frequency domain coefficients, avoiding abrupt energy jumps and perceptible differences.
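The statistics and interval lookup described above might look like the sketch below. It assumes that "amplitude" means the absolute value of a coefficient and that grade intervals are half-open [low, high). The two grades come from the worked example. Higher grades are not specified in the text, so the sketch returns `None` for them. `EXAMPLE_GRADES`, `amplitude_stats` and `modulation_amplitude` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

# Grades from the worked example: (low, high, embedding amplitude), half-open
# [low, high). Amplitudes above 0.05 map to "higher intervals" that the text
# does not specify, so they are left out here.
EXAMPLE_GRADES: List[Tuple[float, float, float]] = [
    (0.015, 0.03, 0.001),  # corresponds to original range 0.02-0.035
    (0.03, 0.05, 0.002),   # corresponds to original range 0.035-0.05
]


def amplitude_stats(coeffs: Sequence[float]) -> Tuple[float, float, float]:
    """Minimum, maximum and mean amplitude (absolute value) of a region."""
    mags = [abs(c) for c in coeffs]
    return min(mags), max(mags), sum(mags) / len(mags)


def modulation_amplitude(
    coeff: float, grades: Sequence[Tuple[float, float, float]] = EXAMPLE_GRADES
) -> Optional[float]:
    """Look up the embedding amplitude for one coefficient, or None if ungraded."""
    mag = abs(coeff)
    for low, high, amp in grades:
        if low <= mag < high:
            return amp
    return None


region = [0.015, -0.07, 0.04, 0.022]
stats = amplitude_stats(region)
amps = [modulation_amplitude(c) for c in region]
```

Passing the grade table in as a parameter keeps the mapping configurable, which matches the text's point that the scale comes from the first embedding control parameters rather than being fixed.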
[0133] After amplitude mapping, the frequency domain coefficients of the second embedding region are spatially structured, and embedding positions are selected according to the positional distribution rules formed in the first embedding control parameters. These positional distribution characteristics are:

- the dispersion of embedding points,
- the row and column spacing rules, and
- the sub-region density control formed during the first embedding.

Their purpose is to prevent embedded coefficients from clustering locally, which keeps the spectral structure balanced.

In the second embedding region, the 32×32 matrix is likewise partitioned by row and column coordinates, for example into an 8×8 grid of sub-regions, each containing 4×4 adjacent frequency domain coefficients. The coefficients in each sub-region are spatially close and belong to adjacent frequency bands. For each sub-region, the number of coefficients whose amplitudes satisfy the embedding conditions is counted, and their distribution density is calculated. This density is the basis for selecting embedding carriers. For example, if a sub-region contains 8 embeddable coefficients, 4 of them may be chosen as actual embedding positions to avoid over-concentration. A gap of 1 to 2 coefficients is kept in the row or column direction so that adjacent embedding points have buffer space between them. If a sub-region contains few embeddable coefficients, the number of embeddings in other sub-regions can be adjusted at the global level to keep the distribution balanced across the whole matrix. Combining these amplitude modulation parameters with the spatial position selection rules yields the complete second embedding control parameters.

The corresponding frequency domain coefficients are then extracted from the second embedding region according to these parameters, and the second frequency domain coefficient matrix is built in a predetermined order. This turns the discretely selected embedding carriers into a structured arrangement suitable for bit-by-bit encoding. For example, suppose the coefficient in row 3, column 5 of the matrix has an original amplitude of 0.038 and a mapped modulation amplitude of 0.002. Embedding the bit "1" updates it to 0.040. Embedding the bit "0" leaves it at its original value or applies only a very small balancing adjustment. Because embedding is constrained at both the amplitude and spatial levels, the second embedding matrix can fully carry the distribution index information while preserving the continuity and stability of the energy distribution across the frequency domain structure. As a result, the electronic archive remains well concealed and resistant to interference before and after embedding.
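The sub-region partition and spaced position selection described above can be sketched as a greedy pass over each 4×4 sub-region. The sketch makes several assumptions:

- The spacing rule is read as a minimum Chebyshev distance between chosen points.
- Candidates are visited in row-major order.
- The global rebalancing across sub-regions is not implemented.

The function names and the `is_embeddable` predicate are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]


def partition(size: int, block: int) -> Dict[Position, List[Position]]:
    """Split a size x size matrix into block x block sub-regions.

    Keys are (sub-region row, sub-region column); each value lists the
    coefficient positions of that sub-region in row-major order.
    """
    regions: Dict[Position, List[Position]] = {}
    for r in range(size):
        for c in range(size):
            regions.setdefault((r // block, c // block), []).append((r, c))
    return regions


def select_positions(
    matrix: List[List[float]],
    block: int,
    is_embeddable: Callable[[float], bool],
    per_region: int,
    min_gap: int = 1,
) -> Tuple[List[Position], Dict[Position, float]]:
    """Greedy, spaced selection of embedding positions in each sub-region."""
    selected: List[Position] = []
    density: Dict[Position, float] = {}
    for key, cells in sorted(partition(len(matrix), block).items()):
        candidates = [(r, c) for r, c in cells if is_embeddable(matrix[r][c])]
        density[key] = len(candidates) / len(cells)  # share of embeddable cells
        chosen: List[Position] = []
        for r, c in candidates:
            if len(chosen) == per_region:
                break
            # Keep at least `min_gap` unused coefficients between chosen points
            # (Chebyshev distance), leaving buffer space around each carrier.
            if all(max(abs(r - r2), abs(c - c2)) > min_gap for r2, c2 in chosen):
                chosen.append((r, c))
        selected.extend(chosen)
    return selected, density


grid = [[0.04] * 32 for _ in range(32)]
chosen, density = select_positions(
    grid, block=4, is_embeddable=lambda a: 0.015 <= abs(a) < 0.05, per_region=4
)
```

On a uniform 32×32 grid, each 4×4 sub-region yields the four points (0, 0), (0, 2), (2, 0) and (2, 2), relative to the sub-region's corner. Each chosen point is separated from the others by one unused coefficient.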
[0134] Based on the distribution identification information corresponding to the electronic archives to be protected, the distribution index data is extracted, and the distribution index data is encoded and converted to generate a distribution index encoding sequence.
[0135] The length of the distribution index encoding sequence is adjusted and the format is standardized to obtain the adapted distribution index encoding.
[0136] According to the second embedding control parameters, the frequency domain coefficients in the second embedding region are grouped and sorted to form several embedding candidate units, each candidate unit containing a set of embeddable frequency domain coefficients.
[0137] The length and encoding structure of the distribution index encoding sequence are obtained. The code is then split into several sub-segments, and each sub-segment is assigned to a corresponding embedding candidate unit to obtain the allocation result;
[0138] Based on the allocation result, each allocated code segment is embedded into the selected frequency domain coefficients of its candidate unit. Embedding uses the embedding modulation amplitude determined by the control parameters and preserves the relative order of the coefficients within the unit. Once all candidate units have been processed, the complete second embedding result is obtained.
[0139] Understandably, the process begins by extracting distribution index data from the distribution identifier information of the electronic archive to be protected. Encoding conversion is then applied to map the distribution identifier into a digital encoding sequence suitable for embedding. During encoding conversion, characters, digits, and special symbols are all processed according to unified encoding rules. The encoding sequence also undergoes length adjustment and format standardization to match the number of available frequency domain coefficients and the grouping structure of the second embedding region. For example, a 256-bit distribution index encoding sequence is divided into 16 sub-segments of 16 bits each. Every code segment then has a corresponding frequency domain carrier position, and locally excessive or insufficient embedding amplitudes are avoided. This processing distributes the encoding sequence evenly across the second embedding region while maintaining frequency domain amplitude uniformity and signal stability.
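The text does not name its "unified encoding rules". The sketch below therefore assumes:

- UTF-8 bytes, with 8 bits per byte, as the encoding;
- right-padding with zeros as the length adjustment.

It shows encoding conversion, length adjustment to 256 bits, and the split into 16 sub-segments of 16 bits. The identifier string `"DIST-2026-0001"` and all function names are hypothetical.

```python
from typing import List


def to_bits(identifier: str) -> str:
    """Encode a distribution identifier as a bit string (UTF-8, 8 bits/byte)."""
    return "".join(f"{byte:08b}" for byte in identifier.encode("utf-8"))


def adapt_length(bits: str, total: int = 256) -> str:
    """Right-pad with '0' to a fixed length; reject codes that do not fit."""
    if len(bits) > total:
        raise ValueError(f"code has {len(bits)} bits, limit is {total}")
    return bits.ljust(total, "0")


def split_segments(bits: str, seg_len: int = 16) -> List[str]:
    """Split a fixed-length code into equal sub-segments."""
    if len(bits) % seg_len:
        raise ValueError("code length must be a multiple of the segment length")
    return [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]


code = adapt_length(to_bits("DIST-2026-0001"))  # hypothetical identifier
segments = split_segments(code)
```

A 14-character ASCII identifier occupies 112 bits, so the remaining 144 bits of the 256-bit code are padding.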
[0140] Subsequently, the frequency domain coefficients in the second embedding region are grouped and sorted according to the second embedding control parameters to form embedding candidate units. Each unit contains several embeddable frequency domain coefficients. For example, a second embedding region with 512 frequency domain coefficients can be divided into 16 candidate units of 32 coefficients each. The division takes into account the spatial location, amplitude uniformity, and variation trend of adjacent coefficients, so that amplitude changes after embedding remain smooth and concealed. The encoded sequence is split into 16 segments before embedding, and each segment is embedded into one candidate unit. Embedding strictly follows the embedding amplitude (e.g., a maximum amplitude adjustment of 0.02) and the embedding interval set in the second embedding control parameters. This ensures that adjusting the amplitude of each frequency domain coefficient introduces no perceptible distortion and that the relative order of coefficients within each candidate unit is preserved. After all candidate units have been embedded, the complete second embedding result is formed. It carries the distribution index information while preserving the stability of the frequency domain signal structure and the concealment of the embedding, providing a reliable foundation for generating the electronic archive with dual watermarks.
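The candidate-unit embedding above can be sketched as follows. The text's grouping criteria (spatial location, amplitude uniformity, trend) are not modeled. This simplified sketch instead assumes that units are consecutive runs of coefficients and that the embedding interval means "use every `step`-th coefficient of a unit". The per-bit amplitude is capped at the example maximum of 0.02. The function names and the default `amplitude=0.002` are hypothetical.

```python
from typing import List, Sequence


def make_units(n_coeffs: int, unit_size: int) -> List[List[int]]:
    """Group coefficient indices into consecutive candidate units (order kept)."""
    if n_coeffs % unit_size:
        raise ValueError("coefficient count must be a multiple of unit_size")
    return [list(range(i, i + unit_size)) for i in range(0, n_coeffs, unit_size)]


def embed_segments(
    coeffs: Sequence[float],
    segments: Sequence[str],
    unit_size: int = 32,
    step: int = 2,
    amplitude: float = 0.002,
    max_amplitude: float = 0.02,
) -> List[float]:
    """Embed one code segment per candidate unit and return the new coefficients."""
    units = make_units(len(coeffs), unit_size)
    if len(segments) > len(units):
        raise ValueError("more code segments than candidate units")
    amp = min(amplitude, max_amplitude)  # never exceed the configured maximum
    out = list(coeffs)
    for segment, unit in zip(segments, units):
        carriers = unit[::step]  # embedding interval: skip step-1 coefficients
        if len(segment) > len(carriers):
            raise ValueError("segment longer than the unit's carrier count")
        for bit, idx in zip(segment, carriers):
            if bit == "1":
                out[idx] += amp  # '0' bits leave the coefficient unchanged
    return out


coeffs = [0.03] * 512
segments = ["1" * 16] + ["0" * 16] * 15
marked = embed_segments(coeffs, segments)
```

With 32 coefficients per unit and a step of 2, each unit offers exactly 16 carriers, matching the 16-bit segment length in the example. The coefficients keep their relative order because only values change, never positions.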
[0141] Furthermore, based on the first and second embedding results, an electronic archive with dual watermarks is generated, and distribution records are registered, establishing copyright identification coding records and distribution index coding records, specifically including:
[0142] Based on the first and second embedding results, the frequency domain coefficients corresponding to the first and second embedding regions are replaced or superimposed in the electronic archive data to be protected according to their respective position information, so as to maintain the relative structure and order of the original data and form a complete electronic archive with dual watermarks.
[0143] Unique file identification information is generated for the electronic archive containing dual watermarks, and its creation time, generator information, and watermark-type metadata are recorded;
[0144] Based on the archival identification information, the electronic archives to be protected are distributed, and information related to the distribution target, distribution time and distribution method is recorded during the distribution process to generate an electronic archive distribution record table;
[0145] Based on the archive identification information in the distribution record table, a unique copyright identification code record is generated for the corresponding electronic archive;
[0146] Based on the distribution target information in the distribution record table, a corresponding distribution index code record is generated for each distribution action, and a relationship is established with the copyright identifier code.
[0147] Specifically, the frequency domain coefficients in the first and second embedding results are mapped back into the original data of the electronic archive to be protected according to their respective position information, and the corresponding replacement or superposition is performed. During mapping, the spatial structure and temporal order of the original data are preserved, so that embedding does not disrupt the overall logic of the text, image, audio, or video. For example, in a 512×512 pixel image, the frequency domain coefficients of the first embedding region correspond to the upper-left area of the image, and those of the second embedding region to the lower-right area. After embedding, the coefficient values of each region are written back to their corresponding positions in the original image, preserving the visual coherence of the whole image. For an audio signal, the frequency domain coefficients of the first and second embedding regions correspond to samples from different time periods. After embedding, the audio waveform remains smooth, without obvious noise or abrupt changes. The result is a complete electronic archive with dual watermarks. The first watermark carries copyright information and the second carries distribution index information, while the perceptual quality and data integrity of the original electronic archive are preserved.
[0148] Subsequently, a unique file identifier is created for the generated electronic archive containing dual watermarks. Metadata on its creation time, generator information, and watermark type is recorded for subsequent management and tracking. Based on the file identifier, the electronic archive is distributed, and the target, time, and method of each distribution are recorded to generate an electronic archive distribution record table. For example, for an internal share, the distribution target is recorded as user A, the distribution time as 10:00 AM on March 3, 2026, and the distribution method as encrypted transmission. For an external distribution, the target is recorded as user B, together with the corresponding distribution time and transmission method. Then, based on the file identifier information in the distribution record table, a unique copyright identifier code record corresponding to the first embedded watermark is generated for each electronic archive. At the same time, based on the distribution target information, a distribution index code record is generated for each distribution action and associated with the copyright identifier code. This gives a complete mapping between the copyright information and the distribution information of each electronic archive and provides a reliable data foundation for subsequent copyright tracing.
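The record-keeping described above can be modeled with plain data classes, as in the sketch below. It assumes one copyright identifier code per archive and one distribution record per distribution action, linked through the archive identifier. The class and field names, the archive identifier `"ARCH-0001"`, the codes, and user B's time and method are all hypothetical. Only user A's entry mirrors the example in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class DistributionRecord:
    archive_id: str
    target: str
    time: datetime
    method: str
    index_code: str  # distribution index code embedded for this distribution


@dataclass
class Registry:
    # archive identifier -> copyright identifier code (first watermark)
    copyright_records: Dict[str, str] = field(default_factory=dict)
    distributions: List[DistributionRecord] = field(default_factory=list)

    def register_archive(self, archive_id: str, copyright_code: str) -> None:
        if archive_id in self.copyright_records:
            raise ValueError(f"archive {archive_id} already registered")
        self.copyright_records[archive_id] = copyright_code

    def register_distribution(
        self, archive_id: str, target: str, time: datetime, method: str, index_code: str
    ) -> DistributionRecord:
        if archive_id not in self.copyright_records:
            raise KeyError(f"unknown archive {archive_id}")
        record = DistributionRecord(archive_id, target, time, method, index_code)
        self.distributions.append(record)
        return record

    def distributions_for(self, copyright_code: str) -> List[DistributionRecord]:
        """All distribution records linked to a copyright identifier code."""
        ids = {a for a, c in self.copyright_records.items() if c == copyright_code}
        return [r for r in self.distributions if r.archive_id in ids]


reg = Registry()
reg.register_archive("ARCH-0001", "1011" * 64)  # hypothetical id and code
reg.register_distribution(
    "ARCH-0001", "user A", datetime(2026, 3, 3, 10, 0), "encrypted transmission", "0" * 256
)
# Hypothetical external distribution; time and method are illustrative only.
reg.register_distribution(
    "ARCH-0001", "user B", datetime(2026, 3, 4, 9, 0), "encrypted transmission", "1" * 256
)
```

Keeping distributions keyed by archive identifier lets a verified copyright code be resolved to every distribution action that shares it.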
[0149] Furthermore, the consistency between the copyright identifier code to be detected and the copyright identifier code record is verified. After the verification is passed, the distribution index code to be detected is matched and analyzed with the distribution index code record to generate a traceability confidence index and output the traceability results, specifically including:
[0150] When a suspected infringing electronic file is detected, the original data of the electronic file is obtained and amplitude standardization processing is performed to obtain a standardized signal of the electronic file to be detected.
[0151] The frequency domain coefficients of the first and second embedding regions are extracted from the standardized signal of the electronic archive to be detected.
[0152] The copyright identifier code to be detected is extracted from the frequency domain coefficients of the first embedding region, and the distribution index code to be detected is extracted from the frequency domain coefficients of the second embedding region;
[0153] The consistency of the copyright identifier code to be detected with the copyright identifier code record is verified by performing an XOR operation on each bit of the two, counting the number of bits with a zero XOR result, and dividing the number of bits by the total length of the code to obtain the copyright identifier matching rate.
[0154] When the copyright identifier matching rate is not less than the preset copyright matching threshold, the distribution index code to be detected is split into several sub-segments according to the preset length, and the corresponding code in the distribution index code record is split into several record sub-segments according to the same rule;
[0155] Each code segment to be detected is compared bit by bit with its corresponding record segment, and the number of fully matching bits is counted to obtain the matching status of each segment.
[0156] Based on the matching results, combined with the spatial and temporal location values of the code bits, a source tracing confidence index is obtained;
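The XOR-based copyright matching rate and the per-segment comparison in the steps above can be sketched as follows. It assumes codes are represented as strings of "0"/"1" characters. The threshold of 0.95 is the example value given later in the text. The function and variable names are hypothetical.

```python
from typing import List, Tuple


def match_rate(detected: str, record: str) -> float:
    """Fraction of bit positions whose XOR is 0 (i.e. the bits agree)."""
    if len(detected) != len(record):
        raise ValueError("codes must have the same length")
    zeros = sum(1 for a, b in zip(detected, record) if int(a) ^ int(b) == 0)
    return zeros / len(record)


def segment_matches(detected: str, record: str, seg_len: int = 16) -> List[Tuple[int, float]]:
    """Per-segment (matching bit count, matching rate), same split for both codes."""
    if len(detected) != len(record) or len(record) % seg_len:
        raise ValueError("codes must be equal length and a multiple of seg_len")
    result = []
    for i in range(0, len(record), seg_len):
        d, r = detected[i:i + seg_len], record[i:i + seg_len]
        same = sum(1 for a, b in zip(d, r) if a == b)
        result.append((same, same / seg_len))
    return result


COPYRIGHT_THRESHOLD = 0.95  # example threshold from the text

record = "1" * 256
detected = "0" * 6 + "1" * 250  # 6 flipped bits, all in the first segment
rate = match_rate(detected, record)
verified = rate >= COPYRIGHT_THRESHOLD
per_segment = segment_matches(detected, record) if verified else []
```

Here 250 of 256 bits agree (about 97.7%), so verification passes, and the first 16-bit segment reports 10 matching bits (62.5%).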
[0157] Specifically, when a suspected infringing electronic archive is detected, the system first acquires its original data and applies amplitude normalization. This maps the following values into a standardized range: text character encoding values, image pixel grayscale values, audio sampling amplitudes, and the pixel grayscale values and audio sampling amplitudes of video. For example, image pixel values are normalized to the range 0 to 1 and audio amplitudes to the range -1 to 1, keeping the subsequent frequency domain analysis consistent with the original embedding. The system then extracts the frequency domain coefficients of the first and second embedding regions from the normalized signal. These coefficients correspond to the spatial and temporal positions selected during the original embedding. For a 512×512 image, the first embedding region corresponds to the upper-left 128×128 pixel sub-block and the second to the lower-right 128×128 sub-block. For an audio clip, the first embedding region corresponds to the first 5 seconds of samples and the second to the last 5 seconds. From these coefficients, the copyright identifier code and the distribution index code to be detected are obtained, providing the basic data for the subsequent consistency verification and matching analysis.
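The normalization and region cropping in the paragraph above can be sketched for the image and audio cases. The sketch assumes:

- unsigned 8-bit pixels, which are divided by 255;
- signed 16-bit PCM samples, which are divided by 32768, so values lie in [-1, 1).

Text and video normalization are not shown. The helper names are hypothetical.

```python
from typing import List, Sequence


def normalize_pixels(pixels: Sequence[int], bit_depth: int = 8) -> List[float]:
    """Map unsigned pixel values 0..2**bit_depth-1 onto 0..1."""
    peak = 2 ** bit_depth - 1
    return [p / peak for p in pixels]


def normalize_audio(samples: Sequence[int], bit_depth: int = 16) -> List[float]:
    """Map signed PCM samples onto -1..1 (the most negative value maps to -1)."""
    scale = 2 ** (bit_depth - 1)
    return [s / scale for s in samples]


def crop(image: List[List[float]], top: int, left: int, size: int) -> List[List[float]]:
    """Square sub-block, e.g. the upper-left or lower-right 128 x 128 region."""
    return [row[left:left + size] for row in image[top:top + size]]


# A synthetic 512 x 512 "image" whose value encodes its own (row, col) position.
image = [[float(r * 512 + c) for c in range(512)] for r in range(512)]
upper_left = crop(image, 0, 0, 128)
lower_right = crop(image, 512 - 128, 512 - 128, 128)
```

The crop offsets follow directly from the example: the lower-right 128×128 block starts at row and column 384 of a 512×512 image.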
[0158] Subsequently, the copyright identifier code to be detected is compared with the copyright identifier code record stored in the system for consistency verification. The verification applies a bitwise XOR, counts the bits whose XOR result is zero, and divides that count by the total code length to obtain the copyright identifier matching rate. For example, if the code to be detected is 256 bits long and 250 bits have a zero XOR result, the copyright identifier matching rate is 250 / 256 ≈ 97.7%. When the matching rate reaches or exceeds a preset copyright matching threshold (e.g., 95%), the copyright identifier verification is considered passed, and the distribution index code to be detected is analyzed further:

1. The code sequence is split into sub-segments of a preset length (e.g., 256 bits into 16 sub-segments of 16 bits each), and the corresponding code in the distribution record is split the same way.
2. Each sub-segment to be detected is compared bit by bit with its record sub-segment, and the number of matching bits is counted to obtain the matching status of each sub-segment.

For example, if 14 of the 16 bits in a sub-segment match, that sub-segment's matching rate is 87.5%.

For different types of electronic archives, spatial and temporal locations are converted into uniform scalar location values when the source tracing confidence is calculated:

- Image archives: the spatial location of each pixel block is given by its row and column numbers, numbered in row-major order. For example, a 512×512 pixel image is divided into 8×8 pixel block windows, numbered row by row from top to bottom and from left to right within each row. The number is used directly as the location value. The top-left pixel block has location value 1, numbering continues along the first row, and the first block of the second row has location value 65, since each row contains 64 blocks.
- Audio archives: the temporal location of each sample point is its sequence number in the audio sequence, used directly as the location value. The first sample point has location value 1, the second 2, and so on.
- Video archives: frames are ordered by frame number, and the pixel blocks within each frame are numbered in row-major order. The frame number is multiplied by the number of pixel blocks per frame and the intra-frame block number is added. This gives a uniform location value, so that the location values of pixel blocks in successive frames increase continuously.
- Text archives: the sequence number of each character within the document is used directly as its location value, with the first character being 1, the second 2, and so on.
- Mixed-type archives: spatial and temporal indices can be combined according to the arrangement order of the archive elements. For example, the frame index can be multiplied by the number of elements per frame and added to the intra-frame element index to form a single scalar location value.

In this way, every element of any type of electronic archive obtains a unique, continuous scalar location value for the source tracing confidence calculation. This preserves the relative spatial and temporal order of elements while simplifying code matching and statistics. Combining the spatial and temporal location of each code segment in the electronic archive, the system then calculates a source tracing confidence index. The index reflects the consistency between the electronic archive and the original distribution record, and thus the degree of potential infringement.
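The location-value rules above reduce to a few lines of arithmetic, as in the sketch below. The text does not say whether frame numbers start at 0 or 1. This sketch assumes a 0-based frame index combined with 1-based block numbers, so values start at 1 and run on without gaps from frame to frame. The function names and constants are hypothetical.

```python
def image_block_position(block_row: int, block_col: int, blocks_per_row: int) -> int:
    """Row-major number of a pixel block, starting at 1 (rows/cols are 0-based)."""
    return block_row * blocks_per_row + block_col + 1


def sequence_position(index: int) -> int:
    """Audio sample or text character: 0-based index -> 1-based sequence number."""
    return index + 1


def video_position(frame_index: int, block_number: int, blocks_per_frame: int) -> int:
    """Frame index (assumed 0-based) times blocks per frame plus 1-based block number."""
    return frame_index * blocks_per_frame + block_number


BLOCKS_PER_ROW = 512 // 8               # 64 windows of 8x8 pixels per row
BLOCKS_PER_FRAME = BLOCKS_PER_ROW ** 2  # 4096 windows in a 512x512 frame
```

Under these conventions, the last block of frame 0 has value 4096 and the first block of frame 1 has value 4097, so location values increase continuously across frames, as the text requires.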
[0159] The source tracing confidence index is compared with the preset source tracing threshold range to obtain the judgment result;
[0160] When the source traceability confidence index is higher than the preset source traceability threshold range, it is judged as serious infringement; when the source traceability confidence index is within the preset source traceability threshold range, it is judged as suspected infringement; when the source traceability confidence index is lower than the preset source traceability threshold range, it is judged as no infringement.
[0161] Based on the judgment result, the following are recorded in a source traceability report: the electronic archive identifier, the copyright identifier matching rate, the distribution index matching status, the source traceability confidence index, and the infringement level. The report is output as the final source traceability result.
[0162] Specifically, the source tracing confidence index calculated in the previous step is compared with a preset source tracing threshold range to determine the infringement level of the suspected infringing electronic archive. The preset threshold range is typically determined from analysis of historical infringement cases, copyright distribution strategies, and file type characteristics. For example, the system statistically analyzes the distribution of source tracing confidence indices of all known infringing files over a period of time. It then classifies files as follows:

- an index above 0.9 (90%): serious infringement;
- an index between 0.7 and 0.9: suspected infringement;
- an index below 0.7: non-infringing.

Through statistical analysis and empirical adjustment, the thresholds can be set to identify high-risk infringement accurately while avoiding misjudging low-risk or normal files. During the determination, the system jointly considers the copyright identifier matching rate, the distribution index matching, and the spatial and temporal consistency of the code. For example, suppose the electronic archive under test has a copyright identifier matching rate of 97%, an average distribution index sub-segment matching rate of 85%, and a combined traceability confidence index of 0.92. The system then determines that the file presents a serious infringement risk. A confidence index of 0.78 is determined to be suspected infringement, and an index of only 0.62 is determined to be non-infringing. This tiered judgment distinguishes effectively between degrees of infringement and provides a basis for subsequent management or legal action.
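The three-tier judgment above can be sketched as a threshold comparison, using the example range 0.7 to 0.9 from the text. The sketch treats both endpoints as belonging to the "suspected" tier, which is an assumption because the text does not state boundary handling. The function name is hypothetical.

```python
def infringement_level(confidence: float, lower: float = 0.7, upper: float = 0.9) -> str:
    """Three-tier judgment against a preset threshold range [lower, upper]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence index must lie in [0, 1]")
    if confidence > upper:
        return "serious infringement"
    if confidence >= lower:
        return "suspected infringement"
    return "no infringement"
```

Applied to the worked example, 0.92 maps to serious infringement, 0.78 to suspected infringement, and 0.62 to no infringement.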
[0163] Subsequently, the system generates a complete traceability report from the judgment result. The report records the unique identifier of the electronic archive under test, the copyright identifier matching rate, the distribution index matching status, the traceability confidence index, and the final infringement level. For example, for an image archive, the report lists:

- the archive number;
- a copyright identifier matching rate of 97%;
- an average distribution index sub-segment matching rate of 85%;
- a calculated traceability confidence index of 0.92;
- a final infringement level of "serious infringement".

The traceability report can be output as a structured file, such as JSON or PDF, for the copyright management department to archive, track, and act on, keeping the tracing process auditable and verifiable. Thresholds are set from statistical analysis of historical data and compared with real-time detection results. In this way, the electronic archive copyright tracing process forms a closed loop, from watermark extraction and code comparison to confidence determination and report output, that comprehensively reflects the copyright status and distribution behavior of electronic archives.
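A JSON version of the traceability report might be produced as in the sketch below. It uses only the standard `json` module. The field names, the archive number `"ARCH-0001"`, and the per-segment rates `[0.8, 0.9]` are hypothetical. They were chosen only so that their mean equals the 85% from the example.

```python
import json
from typing import Sequence


def build_report(
    archive_id: str,
    copyright_rate: float,
    segment_rates: Sequence[float],
    confidence: float,
    level: str,
) -> str:
    """Serialize one traceability result as JSON (field names are illustrative)."""
    report = {
        "archive_id": archive_id,
        "copyright_match_rate": copyright_rate,
        "distribution_index_segment_rates": list(segment_rates),
        "distribution_index_mean_rate": round(sum(segment_rates) / len(segment_rates), 4),
        "traceability_confidence": confidence,
        "infringement_level": level,
    }
    return json.dumps(report, ensure_ascii=False, indent=2)


text = build_report("ARCH-0001", 0.97, [0.8, 0.9], 0.92, "serious infringement")
```

Emitting JSON keeps the report machine-readable for later auditing, while a PDF rendering could be produced from the same dictionary.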
[0164] The formula for calculating the source tracing confidence index is as follows:
[0165]
[0166] In the formula, the quantities are:

- the source tracing confidence index;
- the number of bits in the distribution index code to be detected that completely match the distribution index code record;
- the total length of the distribution index code;
- the position value of each bit of the distribution index code to be detected;
- the position value of the corresponding bit in the distribution index code record;
- two weighting coefficients.
[0167] Understandably, the first step in calculating the traceability confidence index is to count the bits of the distribution index code of the electronic archive under test that are identical to those in the record. For example, if the distribution index code is 100 bits long and 90 bits match the record, the number of matching bits is 90. The ratio of this number to the total length (here 0.9) reflects the consistency of the coded content. A ratio close to 1 indicates that the archive is highly consistent with the original record at the level of coded content, with a low likelihood of anomalies or tampering. A low ratio indicates significant differences between the archive and the record, and insufficient content consistency. In addition, the consistency between the spatial or temporal position of each bit in the code under test and its position in the original record is considered. For example, in a 100-bit code, an average positional deviation of 0 to 5 positions per bit indicates high positional consistency, while a deviation of more than 10 positions indicates low consistency. By jointly analyzing content consistency and positional consistency, the integrity of the electronic archive and its potential degree of infringement can be judged quantitatively.
[0168] Subsequently, encoded-content consistency and positional consistency are combined according to preset weights to form the final traceability confidence index. One weighting coefficient expresses the contribution of encoded-content consistency and the other the contribution of positional consistency; the two sum to 1. Their values follow from the priority set by the copyright protection strategy, with the priority ratio mapped directly to the weight values. This allocation was determined from manual experience and extensive source tracing experiments in practical electronic archive copyright protection scenarios:

- Copyright content protection takes priority: the weight of encoded-content consistency is primary and that of positional consistency secondary. Content consistency is given a priority of 70% and positional consistency 30%, so the content weight is 0.7 and the position weight is 0.3.
- Source tracing and localization take priority: the position weight is increased to no less than 0.5, and the content weight is reduced accordingly.
- Normal protection mode: a content weight of 0.7 and a position weight of 0.3 are used as the default empirical values.

The resulting traceability confidence index lies between 0 and 1. A value close to 1 indicates that the electronic archive is highly consistent with the original record and that serious infringement is likely. A value around 0.5 indicates suspected infringement, which requires further manual or system verification. A value close to 0 indicates that the archive differs significantly from the original record and that infringement is unlikely.
In this way, the degree of match between the electronic archive and the original record is quantified. This gives a basis for infringement determination and enables the system to output the corresponding infringement level automatically for different confidence levels.
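The original formula is not recoverable from this text, so the sketch below is only one plausible reading, not the patented equation. The content term (matching bits over total length) and the default weights 0.7/0.3 come from the description. The positional term is an assumption: a consistency score that falls linearly from 1 at zero mean deviation to 0 at a hypothetical `max_deviation` of 10 positions. That value is loosely motivated by the "more than ten positions indicates low consistency" remark. The function and parameter names are hypothetical.

```python
from typing import Sequence


def traceability_confidence(
    detected_bits: str,
    record_bits: str,
    detected_pos: Sequence[int],
    record_pos: Sequence[int],
    w_content: float = 0.7,
    w_position: float = 0.3,
    max_deviation: float = 10.0,
) -> float:
    """Weighted sum of content consistency and (assumed) positional consistency."""
    n = len(record_bits)
    if not (len(detected_bits) == n == len(detected_pos) == len(record_pos)) or n == 0:
        raise ValueError("codes and position lists must have the same non-zero length")
    if abs(w_content + w_position - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    matched = sum(1 for a, b in zip(detected_bits, record_bits) if a == b)
    content = matched / n  # fraction of fully matching bits
    mean_dev = sum(abs(p - q) for p, q in zip(detected_pos, record_pos)) / n
    # ASSUMPTION: positional consistency falls linearly from 1 (no deviation)
    # to 0 at `max_deviation` positions; the original formula is not recoverable.
    position = max(0.0, 1.0 - mean_dev / max_deviation)
    return w_content * content + w_position * position


record = "1" * 100
detected = "0" * 10 + "1" * 90  # 90 of 100 bits match
positions = list(range(1, 101))
```

With 90 of 100 bits matching and identical positions, the sketch gives 0.7 × 0.9 + 0.3 × 1.0 = 0.93. Shifting every position by 10 drives the assumed positional term to 0 and the index to 0.63.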
[0169] Furthermore, a digital watermark-based electronic archive copyright tracing system is proposed to implement any of the copyright tracing methods mentioned above, including:
[0170] The data acquisition module is used to acquire the electronic archive data to be protected and the corresponding distribution identification information, and transmit the acquired data to the frequency domain processing module.
[0171] The frequency domain processing module is used to perform frequency domain decomposition processing on the electronic archive data to be protected, extract the first embedding region and the second embedding region, and generate candidate embedding signals.
[0172] The watermark embedding module is used to construct a first frequency domain coefficient matrix based on the first embedding region and embed the copyright identifier code, construct a second frequency domain coefficient matrix based on the first embedding control parameters and embed the distribution index code, and generate an electronic archive with dual watermarks.
[0173] The distribution registration module is used to distribute electronic archives containing dual watermarks, record distribution information, and establish copyright identifier code records and distribution index code records.
[0174] The source tracing analysis module is used to extract the watermark code to be detected when suspected infringing electronic files are detected, perform consistency verification and matching analysis with the records, generate source tracing confidence index, and output source tracing results.
[0175] Furthermore, the frequency domain processing module includes:
[0176] The amplitude normalization unit is used to perform amplitude normalization processing on the original data of the electronic archives to be protected, so as to obtain standardized electronic archive signals.
[0177] The wavelet transform unit is used to perform a discrete wavelet transform on the standardized electronic archive signal, decomposing it into a low-frequency approximation signal and high-frequency detail signals.
[0178] The detail feature extraction unit is used to calculate the local energy value of the high-frequency detail signal, generate horizontal and vertical detail feature sets, and record the feature location information.
[0179] The embedding region determination unit is used to select high-energy features from the detail feature set and optimize their spatial distribution, so as to determine the first embedding region and the second embedding region.
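The normalization and wavelet steps handled by these units can be illustrated with a minimal sketch. It assumes a single 2-D grayscale array, min-max scaling to [0, 1] as the amplitude normalization, and a one-level orthonormal Haar wavelet; the text fixes neither the normalization formula nor the wavelet family, and the function names `normalize_amplitude` and `haar_dwt2` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def normalize_amplitude(data: Matrix) -> Matrix:
    """Min-max scale raw values (e.g. pixel grayscale) into [0, 1]."""
    flat = [v for row in data for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Constant input carries no amplitude information; map it to zeros.
        return [[0.0 for _ in row] for row in data]
    return [[(v - lo) / span for v in row] for row in data]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level orthonormal 2-D Haar DWT of a matrix with even dimensions.

    Returns (approximation, horizontal detail, vertical detail, diagonal detail).
    """
    rows, cols = len(x), len(x[0])
    if rows % 2 or cols % 2:
        raise ValueError("matrix dimensions must be even")
    ll: Matrix = []
    lh: Matrix = []
    hl: Matrix = []
    hh: Matrix = []
    for i in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, cols, 2):
            a, b = x[i][j], x[i][j + 1]          # top pair of the 2x2 block
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom pair of the 2x2 block
            ll_row.append((a + b + c + d) / 2)   # local average (low frequency)
            lh_row.append((a + b - c - d) / 2)   # top minus bottom: horizontal detail
            hl_row.append((a - b + c - d) / 2)   # left minus right: vertical detail
            hh_row.append((a - b - c + d) / 2)   # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh
```

The Haar filter is chosen only because it fits in a few lines; any orthogonal wavelet would play the same role. In this labelling the horizontal detail band responds to changes between rows and the vertical detail band to changes between columns, and because the transform is orthonormal the total energy of the four bands equals that of the input.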
[0180] Furthermore, the watermark embedding module includes:
[0181] The first embedding unit is used to construct a first frequency domain coefficient matrix, generate a copyright identifier code, and embed the copyright identifier code into the first frequency domain coefficient matrix according to the amplitude distribution and position distribution characteristics to obtain a first embedding result.
[0182] The second embedding unit is used to construct a second frequency domain coefficient matrix according to the first embedding control parameters, generate a distribution index code, and embed it into the second frequency domain coefficient matrix to obtain a second embedding result.
[0183] The embedding result generation unit is used to merge the first embedding result and the second embedding result into the electronic file to form a complete electronic file with dual watermarks and transmit it to the distribution registration module.
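The text states that the codes are embedded by amplitude-modulating selected frequency domain coefficients, but it does not give the modulation rule. The sketch below swaps in quantization index modulation (QIM), a standard blind scheme in which each bit moves a coefficient onto one of two interleaved lattices. Treating the step `delta` as the "embedding modulation amplitude" is an assumption, and `qim_embed` / `qim_extract` are hypothetical names.

```python
from typing import List, Sequence


def qim_embed(coeffs: Sequence[float], positions: Sequence[int],
              bits: Sequence[int], delta: float) -> List[float]:
    """Embed one bit per selected coefficient; other coefficients are untouched.

    Bit 0 snaps the coefficient to the lattice delta*k,
    bit 1 snaps it to the shifted lattice delta*k + delta/2.
    """
    if len(positions) != len(bits):
        raise ValueError("need exactly one position per bit")
    out = list(coeffs)
    for pos, bit in zip(positions, bits):
        offset = delta / 2 if bit else 0.0
        out[pos] = delta * round((out[pos] - offset) / delta) + offset
    return out


def qim_extract(coeffs: Sequence[float], positions: Sequence[int],
                delta: float) -> List[int]:
    """Decode each bit by checking which lattice lies closer to the coefficient."""
    bits = []
    for pos in positions:
        c = coeffs[pos]
        dist0 = abs(c - delta * round(c / delta))
        dist1 = abs(c - (delta * round((c - delta / 2) / delta) + delta / 2))
        bits.append(0 if dist0 <= dist1 else 1)
    return bits
```

QIM is used here because extraction needs no copy of the original archive: a coefficient perturbed by less than a quarter of `delta` still decodes to the same bit, so a larger `delta` trades imperceptibility for robustness.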
[0184] Furthermore, the source tracing analysis module includes:
[0185] The watermark extraction unit is used to extract the copyright identifier code and the distribution index code to be detected from the electronic file to be detected.
[0186] The copyright verification unit is used to perform a bit-by-bit consistency verification between the copyright identifier code to be detected and the copyright identifier code record, calculate the copyright identifier matching rate, and determine whether the preset copyright matching threshold has been reached.
[0187] The index matching analysis unit is used to compare the distribution index code to be detected with the distribution index code record bit by bit, count the matching situation, and generate the traceability confidence index by combining the code spatial location and temporal location.
[0188] The source tracing determination unit is used to determine the infringement level based on the source tracing confidence index and the preset source tracing threshold range, and to output a source tracing report that includes the electronic file identifier, copyright identifier matching rate, distribution index matching status, source tracing confidence index, and infringement level.
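The bit-by-bit verification performed by the copyright verification unit (XOR each pair of corresponding bits, count the zero results, divide by the code length, then compare with the preset threshold) can be sketched as follows. It assumes the codes are lists of 0/1 integers; the function names are hypothetical.

```python
from typing import Sequence


def copyright_match_rate(detected: Sequence[int], record: Sequence[int]) -> float:
    """Fraction of bit positions where detected XOR record == 0."""
    if len(detected) != len(record) or not record:
        raise ValueError("codes must be non-empty and of equal length")
    agree = sum(1 for a, b in zip(detected, record) if (a ^ b) == 0)
    return agree / len(record)


def passes_copyright_check(rate: float, threshold: float) -> bool:
    """The check succeeds when the rate is not less than the preset threshold."""
    return rate >= threshold
```

For example, a 64-bit code that differs from its record in 3 bits gives a matching rate of 61/64 = 0.953125, which would pass a hypothetical threshold of 0.9; the text does not state a threshold value.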
[0189] In summary, the advantages of this invention are as follows. Frequency domain decomposition of the electronic archival data to be protected yields candidate signals for the first and second embedding regions. Copyright identifier codes are embedded in the first embedding region and distribution index codes in the second, achieving dual watermark embedding; this preserves the integrity of the copyright information in the electronic archive and supports accurate tracking. Amplitude normalization, wavelet transform, and high-frequency detail feature extraction are used to select high-energy embedding regions and optimize their spatial distribution, which minimizes the impact of the watermark on the original archival data and preserves both the usability of the electronic archive and the imperceptibility of the watermark. The first and second embedding control parameters give precise control over the embedding strength and position of the watermarks, which enhances their robustness and resistance to attacks. Copyright identifier code records and distribution index code records are established during distribution; combined with the source tracing analysis module, suspected infringing electronic archives undergo copyright verification and distribution index comparison, so that a source tracing confidence index can be generated and the infringement level determined quickly, achieving efficient and accurate copyright source tracing. Finally, the modular system design, comprising data acquisition, frequency domain processing, watermark embedding, distribution registration, and source tracing analysis, makes the copyright protection and source tracing process for electronic archives automated, highly controllable, and highly scalable, improving the security, reliability, and efficiency of copyright management.
[0190] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and descriptions in the specification merely illustrate the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the claimed invention. The claimed scope of protection is defined by the appended claims and their equivalents.
Claims
1. A method for tracing the copyright of electronic archives based on digital watermarking, characterized in that it comprises: obtaining the electronic archive data to be protected and the distribution identification information, and performing frequency domain decomposition on the electronic archive data to be protected to obtain a first embedding region and a second embedding region. A first frequency domain coefficient matrix is constructed based on the first embedding region, a copyright identifier code is generated and first embedding control parameters are determined, and the copyright identifier code is embedded into the first frequency domain coefficient matrix according to the first embedding control parameters to obtain a first embedding result. Second embedding control parameters are determined based on the first embedding control parameters, a second frequency domain coefficient matrix corresponding to the second embedding region is constructed, and a distribution index code is generated based on the distribution identification information; the distribution index code is then embedded into the second frequency domain coefficient matrix according to the second embedding control parameters to form a second embedding result. Based on the first and second embedding results, an electronic archive with dual watermarks is generated, distribution records are registered, and copyright identifier code records and distribution index code records are established. When a suspected infringing electronic file is detected, it is decomposed in the frequency domain to obtain a first embedding region to be detected and a second embedding region to be detected, from which the copyright identifier code and the distribution index code to be detected are extracted. A copyright identifier matching rate is obtained by verifying the consistency of the copyright identifier code to be detected against the copyright identifier code record.
Then, the distribution index code to be detected is matched against the distribution index code record to generate a source tracing confidence index, and the source tracing result is output. Specifically, performing frequency domain decomposition on the electronic archive data to be protected to obtain the first embedding region and the second embedding region includes: obtaining the original data of the electronic archives to be protected and extracting archive type information from them. The original data are subjected to amplitude normalization to obtain a standardized electronic archive signal. The original data include text character encoding values, image pixel grayscale values, audio sample amplitudes, and, for video, the pixel grayscale values of frame images together with audio sample amplitudes. A discrete wavelet transform is performed on the standardized electronic archive signal to decompose it into a low-frequency approximation signal and multiple high-frequency detail signals, forming a preliminary frequency domain signal set. For the high-frequency detail signals in the preliminary frequency domain signal set, the sum of squares of the frequency domain coefficients within a preset local window is calculated in the horizontal and vertical directions to obtain local energy values; the horizontal local energy values serve as horizontal detail features and the vertical local energy values as vertical detail features, forming a detail feature set. The energy value of each detail feature in its local region is compared with the preset energy threshold for the horizontal or vertical direction, respectively.
Horizontal features whose energy values exceed the preset energy threshold are assigned to the horizontal target feature set, and vertical features whose energy values exceed the preset energy threshold are assigned to the vertical target feature set; the spatial and temporal position of each feature is recorded as its position information. From the horizontal target feature set, coefficients whose energy varies little across adjacent local regions and which are uniformly distributed in the horizontal and vertical directions are selected to generate the first embedding region candidate signal. From the vertical target feature set, coefficients for which the pixel grayscale change is below the perceptibility threshold and the audio amplitude change is below the masking threshold are selected to generate the second embedding region candidate signal. The frequency domain coefficients in the first and second embedding region candidate signals are arranged according to their position information to obtain spatially optimized first and second embedding region candidate signals. The frequency domain coefficients and position information in the optimized first embedding region candidate signal are taken as the first embedding region, and those in the optimized second embedding region candidate signal are taken as the second embedding region.
Determining the second embedding control parameters based on the first embedding control parameters specifically includes: based on the embedding strength parameter in the first embedding control parameters, statistically analyzing the amplitudes of the frequency domain coefficients in the second embedding region to obtain their amplitude range, and mapping this range onto the amplitude intervals in the first embedding control parameters to determine the embedding modulation amplitude of the second embedding region. Based on the embedding position control parameters in the first embedding control parameters, the frequency domain coefficients of the second embedding region are divided into several sub-regions according to their position information, and the number and distribution density of frequency domain coefficients in each sub-region are counted. Combined with the position distribution characteristics in the first embedding control parameters, the number of embedding positions, the embedding position order, and the embedding interval of the second embedding region are determined to form the second embedding position control parameters. The embedding modulation amplitude of the second embedding region is combined with the second embedding position control parameters to form the second embedding control parameters.
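The local-energy and threshold steps of claim 1 can be sketched as below. The sketch assumes non-overlapping square windows over a single detail sub-band and records only the spatial position (top-left corner) of each window; temporal positions for audio or video frames are not modelled. The window size, threshold, and function names are hypothetical, and the same two functions would be applied separately to the horizontal and vertical detail bands with their own thresholds.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Feature = Tuple[int, int, float]  # (row, col, local energy)


def local_energy(band: Matrix, win: int) -> List[Feature]:
    """Sum of squared coefficients in each non-overlapping win x win window.

    Each feature records the window's top-left position as its position information.
    """
    feats: List[Feature] = []
    for r in range(0, len(band) - win + 1, win):
        for c in range(0, len(band[0]) - win + 1, win):
            e = sum(band[r + i][c + j] ** 2 for i in range(win) for j in range(win))
            feats.append((r, c, e))
    return feats


def select_targets(feats: List[Feature], threshold: float) -> List[Feature]:
    """Keep features whose energy is strictly greater than the threshold."""
    return [f for f in feats if f[2] > threshold]
```

With a 4x4 band and a 2x2 window, for instance, four windows are scored and only those whose summed squared coefficients exceed the threshold enter the target feature set.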
2. The method for tracing the copyright of electronic archives based on digital watermarking according to claim 1, characterized in that the process of constructing a first frequency domain coefficient matrix based on the first embedding region, generating a copyright identifier code and determining first embedding control parameters, and embedding the copyright identifier code into the first frequency domain coefficient matrix according to the first embedding control parameters to obtain a first embedding result specifically includes: arranging the frequency domain coefficients determined in the first embedding region according to their position information to form the first frequency domain coefficient matrix; obtaining the corresponding copyright identification information from the electronic archives to be protected, extracting the copyright holder identification data, and encoding it to generate a copyright identifier code sequence; and adjusting the length and unifying the format of the copyright identifier code sequence to obtain an adapted copyright identifier code. Amplitude statistics are computed over the frequency domain coefficients in the first frequency domain coefficient matrix, namely the mean and variance of the coefficient amplitudes, to obtain the amplitude distribution characteristics. The first frequency domain coefficient matrix is divided by row and column coordinates into several fixed-size sub-matrix regions, each containing several adjacent frequency domain coefficients, and the number and distribution density of frequency domain coefficients in each sub-matrix region are counted to obtain the position distribution characteristics.
Based on the amplitude distribution characteristics, the amplitudes of the frequency domain coefficients in the first frequency domain coefficient matrix are analyzed to obtain the amplitude range and its division into amplitude intervals; the embedding modulation amplitude for each amplitude interval is determined from the amplitudes of the coefficients in that interval, forming the embedding strength parameter. Based on the position distribution characteristics, the number and locations of the coefficients to be selected for embedding in each region are determined, and the embedding interval is determined from the row and column distribution of the embedding coefficients in the matrix so that they are evenly distributed across the matrix, forming the embedding position control parameters. Each embedding position given by the embedding position control parameters is associated with the embedding modulation amplitude of the corresponding amplitude interval in the embedding strength parameter, forming the first embedding control parameters. According to the embedding positions in the first embedding control parameters, the corresponding frequency domain coefficients in the first frequency domain coefficient matrix are selected as embedding carriers; the adapted copyright identifier code is embedded into these carriers by amplitude-modulating each selected coefficient with its embedding modulation amplitude, forming the first embedding result.
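A minimal sketch of how the embedding strength parameter and the embedding position control parameters of claim 2 might be combined into first embedding control parameters. Assumptions not fixed by the text: the amplitude range is split into equal-width intervals of coefficient magnitude, the caller supplies one modulation amplitude per interval (`strengths`), and "evenly distributed" is read as a constant stride in row-major order. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def interval_index(value: float, lo: float, hi: float, k: int) -> int:
    """Index of the equal-width amplitude interval of [lo, hi] that holds value."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * k)
    return min(idx, k - 1)  # the maximum value belongs to the last interval


def first_control_params(m: Matrix, n_bits: int,
                         strengths: Sequence[float]) -> List[Tuple[int, int, float]]:
    """Pick n_bits evenly spaced positions and attach each one's interval amplitude.

    Returns (row, col, embedding modulation amplitude) triples.
    """
    rows, cols = len(m), len(m[0])
    total = rows * cols
    if not 0 < n_bits <= total:
        raise ValueError("n_bits must be between 1 and the number of coefficients")
    mags = [abs(v) for row in m for v in row]
    lo, hi = min(mags), max(mags)
    step = total // n_bits  # embedding interval along row-major order
    params = []
    for t in range(n_bits):
        r, c = divmod(t * step, cols)
        k = interval_index(abs(m[r][c]), lo, hi, len(strengths))
        params.append((r, c, strengths[k]))
    return params
```

Giving larger-magnitude coefficients a larger modulation amplitude (as in the usage below) is a common choice because strong coefficients mask changes better, but the text leaves the mapping open.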
3. The method for tracing the copyright of electronic archives based on digital watermarking according to claim 1, characterized in that the process of constructing the second frequency domain coefficient matrix corresponding to the second embedding region, generating a distribution index code based on the distribution identification information, and embedding the distribution index code into the second frequency domain coefficient matrix according to the second embedding control parameters to form the second embedding result specifically includes: selecting and arranging, based on the second embedding control parameters, the frequency domain coefficients and their position information determined in the second embedding region to construct the second frequency domain coefficient matrix; extracting distribution index data from the distribution identification information corresponding to the electronic archives to be protected and encoding it to generate a distribution index code sequence; and adjusting the length and standardizing the format of the distribution index code sequence to obtain an adapted distribution index code. According to the second embedding control parameters, the frequency domain coefficients in the second embedding region are grouped and sorted into several embedding candidate units, each containing a set of embeddable frequency domain coefficients.
Based on the length and code structure of the distribution index code sequence, the code is split into several sub-segments, and each sub-segment is assigned to a corresponding embedding candidate unit to obtain an allocation result. According to the allocation result, each allocated sub-segment is embedded into the selected frequency domain coefficients using the embedding modulation amplitude determined by the second embedding control parameters, while the relative order of the coefficients within each candidate unit is maintained. Once all candidate units have been processed, the complete second embedding result is obtained.
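The splitting and allocation steps of claim 3 can be sketched as follows. The sketch assumes fixed-length sub-segments assigned in order, one per candidate unit, with bit j of a segment going to the j-th coefficient of its unit so that the within-unit order is preserved; the resulting (position, bit) plan could then be handed to any per-coefficient embedding rule. The names and the one-to-one assignment are assumptions.

```python
from typing import List, Sequence, Tuple


def split_code(code: Sequence[int], seg_len: int) -> List[List[int]]:
    """Split a bit sequence into consecutive sub-segments (the last may be shorter)."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    return [list(code[i:i + seg_len]) for i in range(0, len(code), seg_len)]


def allocate(segments: Sequence[Sequence[int]],
             units: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Assign segment k to candidate unit k; return (coefficient position, bit) pairs."""
    if len(segments) > len(units):
        raise ValueError("not enough candidate units for all segments")
    plan: List[Tuple[int, int]] = []
    for seg, unit in zip(segments, units):
        if len(seg) > len(unit):
            raise ValueError("candidate unit too small for its segment")
        plan.extend(zip(unit, seg))  # unit order kept: bit j -> j-th coefficient
    return plan
```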
4. The method for tracing the copyright of electronic archives based on digital watermarking according to claim 1, characterized in that the process of generating an electronic archive with dual watermarks based on the first and second embedding results, registering distribution records, and establishing copyright identifier code records and distribution index code records specifically includes: replacing or superimposing, according to their respective position information, the frequency domain coefficients of the first and second embedding regions in the electronic archive data to be protected with those of the first and second embedding results, so that the relative structure and order of the original data are maintained and a complete electronic archive with dual watermarks is formed; generating unique archive identification information for the electronic archive containing dual watermarks, and recording the file creation time, generator information, and watermark-type metadata; distributing the electronic archives to be protected based on the archive identification information, recording the distribution target, distribution time, and distribution method during distribution, and generating an electronic archive distribution record table; generating, based on the archive identification information in the distribution record table, a unique copyright identifier code record for the corresponding electronic archive; and generating, based on the distribution target information in the distribution record table, a corresponding distribution index code record for each distribution action and associating it with the copyright identifier code.
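The registration records of claim 4 can be modelled with two small record types, as in the sketch below. It assumes a UUID as the unique archive identifier and, purely for illustration, derives each distribution index code from a per-archive sequence number written as fixed-width bits; the text does not say how the index code is built, and all class and field names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Sequence


@dataclass
class DistributionRecord:
    target: str
    method: str
    time: datetime
    index_code: List[int]      # distribution index code for this action
    copyright_code: List[int]  # link back to the archive's copyright identifier code


@dataclass
class ArchiveRegistry:
    file_id: str
    copyright_code: List[int]
    generator: str
    watermark_type: str = "dual"
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    distributions: List[DistributionRecord] = field(default_factory=list)

    def register(self, target: str, method: str, bits: int = 16) -> DistributionRecord:
        """Record one distribution action with its own (hypothetical) index code."""
        seq = len(self.distributions) + 1
        code = [(seq >> k) & 1 for k in reversed(range(bits))]  # big-endian bits
        rec = DistributionRecord(target, method, datetime.now(timezone.utc),
                                 code, self.copyright_code)
        self.distributions.append(rec)
        return rec


def new_archive(copyright_code: Sequence[int], generator: str) -> ArchiveRegistry:
    """Create a registry entry with a fresh unique archive identifier."""
    return ArchiveRegistry(file_id=str(uuid.uuid4()),
                           copyright_code=list(copyright_code),
                           generator=generator)
```

Keeping the copyright identifier code on every distribution record mirrors the association the claim requires, so a detected index code can be traced back to both the recipient and the archive.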
5. The method for tracing the copyright of electronic archives based on digital watermarking according to claim 1, characterized in that, after the consistency between the copyright identifier code to be detected and the copyright identifier code record has been successfully verified, the process further includes matching the distribution index code to be detected against the distribution index code record to generate a source tracing confidence index and outputting the source tracing result. Specifically: when a suspected infringing electronic file is detected, its original data are obtained and amplitude-normalized to obtain a standardized signal of the electronic file to be detected; the frequency domain coefficients of the first and second embedding regions are extracted from this standardized signal; the copyright identifier code to be detected is extracted from the coefficients of the first embedding region, and the distribution index code to be detected from the coefficients of the second embedding region. The consistency of the copyright identifier code to be detected with the copyright identifier code record is verified by XORing each pair of corresponding bits, counting the bits whose XOR result is zero, and dividing that count by the total code length to obtain the copyright identifier matching rate.
When the copyright identifier matching rate is not less than the preset copyright matching threshold, the distribution index code to be detected is split into several sub-segments of a preset length, and the corresponding code in the distribution index code record is split into record sub-segments by the same rule. Each sub-segment to be detected is compared bit by bit with its corresponding record sub-segment, and the number of completely matching codes is counted to obtain the matching status of each sub-segment. From these matching results, combined with the spatial and temporal position values of the codes, the source tracing confidence index is obtained. The source tracing confidence index is then compared with the preset source tracing threshold range: when it is higher than the range, the case is judged as serious infringement; when it is within the range, as suspected infringement; and when it is lower than the range, as no infringement. According to the judgment result, the electronic file identifier, the copyright identifier matching rate, the distribution index matching status, the source tracing confidence index, and the infringement level are recorded in a source tracing report, which is output as the final source tracing result. The formula for calculating the source tracing confidence index is as follows:
In the formula, C is the source tracing confidence index, N is the number of codes that completely match between the distribution index code to be detected and the distribution index code record, and M is the total length of the distribution index code; the remaining terms are the position value of the i-th bit of the distribution index code to be detected, the position value of the corresponding i-th bit of the distribution index code record, and two weighting coefficients.
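Because the confidence-index formula itself is not reproduced above, the sketch below covers only the steps that are fully specified: segment-wise bit comparison, which yields per-segment match flags and a count N of matching bits, and the three-way mapping from a given confidence index C to an infringement level. Reading N as a count of matching bits, and treating the threshold range as closed at both ends, are assumptions; the function names and threshold values are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_match(detected: Sequence[int], record: Sequence[int],
                  seg_len: int) -> Tuple[List[bool], int]:
    """Compare the codes segment by segment.

    Returns one full-match flag per segment and N, the total number of matching bits.
    """
    if len(detected) != len(record):
        raise ValueError("codes must have equal length")
    flags: List[bool] = []
    n_match = 0
    for i in range(0, len(record), seg_len):
        seg_d, seg_r = detected[i:i + seg_len], record[i:i + seg_len]
        hits = sum(1 for a, b in zip(seg_d, seg_r) if a == b)
        n_match += hits
        flags.append(hits == len(seg_r))
    return flags, n_match


def infringement_level(c: float, low: float, high: float) -> str:
    """Map the confidence index onto the three levels of the threshold range [low, high]."""
    if c > high:
        return "serious infringement"
    if c >= low:
        return "suspected infringement"
    return "no infringement"
```

The per-segment flags correspond to the "matching status of each sub-segment" that goes into the report, while N would feed whatever confidence formula is used.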
6. A copyright tracing system for electronic archives based on digital watermarking, used to implement the copyright tracing method according to any one of claims 1-5, characterized in that it comprises: a data acquisition module, used to acquire the electronic archive data to be protected and the corresponding distribution identification information and transmit the acquired data to the frequency domain processing module; a frequency domain processing module, used to perform frequency domain decomposition on the electronic archive data to be protected, extract the first embedding region and the second embedding region, and generate candidate embedding signals; a watermark embedding module, used to construct a first frequency domain coefficient matrix based on the first embedding region and embed the copyright identifier code, construct a second frequency domain coefficient matrix based on the first embedding control parameters and embed the distribution index code, and generate an electronic archive with dual watermarks; a distribution registration module, used to distribute electronic files containing dual watermarks, record distribution information, and establish copyright identifier code records and distribution index code records; and a source tracing analysis module, used to extract the watermark codes to be detected when a suspected infringing electronic file is detected, perform consistency verification and matching analysis against the records, generate a source tracing confidence index, and output the source tracing result.
7. The electronic archive copyright tracing system based on digital watermarking according to claim 6, characterized in that the frequency domain processing module includes: an amplitude normalization unit, used to perform amplitude normalization on the original data of the electronic archive to be protected to obtain a standardized electronic archive signal; a wavelet transform unit, used to perform a discrete wavelet transform on the standardized electronic archive signal to decompose it into a low-frequency approximation signal and high-frequency detail signals; a detail feature extraction unit, used to calculate the local energy values of the high-frequency detail signals, generate horizontal and vertical detail feature sets, and record the feature position information; and an embedding region determination unit, used to select high-energy features from the detail feature set and optimize their spatial distribution to determine the first embedding region and the second embedding region.
8. The copyright tracing system for electronic archives based on digital watermarking according to claim 6, characterized in that the watermark embedding module includes: a first embedding unit, used to construct a first frequency domain coefficient matrix, generate a copyright identifier code, and embed the copyright identifier code into the first frequency domain coefficient matrix according to the amplitude distribution and position distribution characteristics to obtain a first embedding result; a second embedding unit, used to construct a second frequency domain coefficient matrix according to the first embedding control parameters, generate a distribution index code, and embed it into the second frequency domain coefficient matrix to obtain a second embedding result; and an embedding result generation unit, used to merge the first embedding result and the second embedding result into the electronic file to form a complete electronic file with dual watermarks and transmit it to the distribution registration module.
9. The copyright tracing system for electronic archives based on digital watermarking according to claim 6, characterized in that the source tracing analysis module includes: a watermark extraction unit, used to extract the copyright identifier code and the distribution index code to be detected from the electronic file to be detected; a copyright verification unit, used to perform bit-by-bit consistency verification between the copyright identifier code to be detected and the copyright identifier code record, calculate the copyright identifier matching rate, and determine whether the preset copyright matching threshold has been reached; an index matching analysis unit, used to compare the distribution index code to be detected with the distribution index code record bit by bit, count the matches, and generate a source tracing confidence index by combining the spatial and temporal positions of the codes; and a source tracing determination unit, used to determine the infringement level based on the source tracing confidence index and the preset source tracing threshold range, and to output a source tracing report including the electronic file identifier, copyright identifier matching rate, distribution index matching status, source tracing confidence index, and infringement level.