A method and apparatus for embedding and extracting hash slot watermarks based on structured data

By embedding watermark bit fragments into the low-order part of structured data using hash slot watermarking technology and extracting the watermark using hash slot voting mechanism, the problem that existing watermarking schemes cannot resist data sorting, disordering, and deletion is solved, thus maintaining the original usability of the data.

CN122046320B · Active · Publication Date: 2026-06-30 · SHENZHEN OLYM INFORMATION SECURITY TECHOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN OLYM INFORMATION SECURITY TECHOLOGY CO LTD
Filing Date
2026-04-16
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing structured data watermarking schemes cannot resist sorting, shuffling, row deletion, or noise attacks, and schemes that modify the data structure impair data usability.

Method used

By using hash slot watermarking technology, watermark bit fragments are embedded into the variable low-order part of structured data, and the watermark is extracted through a hash slot voting mechanism. Stable slot index values are generated using the identifier and the fixed part of the embeddable data, so that watermark embedding and extraction do not depend on the physical order of the data.

Benefits of technology

It can effectively extract watermarks even under data disorder, deletion and noise attacks, and maintain the original usability with only minor data modifications.


Abstract

This application provides a method and apparatus for embedding and extracting hash slot watermarks based on structured data, relating to the field of electronic watermarking. The embedding method includes: acquiring watermark data to be embedded and structured data to be embedded, and determining the total number of slots in the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row; generating a slot index value for the target row based on the identifier of the target row of the structured data to be embedded, a first preset portion of the embeddable data of the target row, and the total number of slots; determining the watermark bit fragment to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits per row; and embedding the watermark bit fragment into the second preset portion of the embeddable data of the target row. By generating the slot index value through the identifier and the fixed portion of the embeddable data, the watermark embedding does not depend on the physical order of the data; relying only on the identifier and the embeddable data, the watermark can still be restored even if other columns are missing.

Description

Technical Field

[0001] This invention relates to the field of digital watermarking, specifically to a method and apparatus for embedding and extracting hash slot watermarks based on structured data.

Background Technology

[0002] With the advent of the big data era, structured data (such as database tables, CSV records, etc.) has become an important component of enterprises' core assets. In business scenarios such as data sharing, outsourcing, and auditing, data owners often need to provide structured data to external organizations or partners. This structured data is often sorted, disordered, has rows deleted, or has noise samples added.

[0003] Digital watermarking, as a proactive protection method, embeds invisible identification information into raw data to trace its source and protect copyright. Traditional watermarking schemes that rely on line order or fixed block division struggle to maintain stable payload mapping, resulting in existing schemes failing to provide verifiable diagnostic information. Therefore, a structured data watermarking mechanism that is independent of line order and fixed blocks, and recoverable from out-of-order, line deletion, and noise attacks, is needed.

[0004] Traditional watermarking techniques primarily employ methods such as conventional encryption and adding fake rows or columns. These methods not only modify the content of the detailed data records and introduce anomalous data, but also violate the requirement to preserve the original data format or content for data analysis consistency. Once external parties reorder the detailed data records, batch copy / merge them, partially delete them, or physically alter the embedded data, the watermark can no longer be detected, which gives rise to disputes.

Summary of the Invention

[0005] In view of the aforementioned problems, this application is proposed to provide a method and apparatus for embedding and extracting hash slot watermarks based on structured data to overcome or at least partially solve the aforementioned problems, comprising:

[0006] A hash slot watermark embedding method based on structured data includes the following steps:

[0007] Obtain the watermark data to be embedded and the structured data to be embedded, and determine the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row;

[0008] The slot index value of the target row is generated based on the identifier of the target row of the structured data to be embedded, the first preset part of the embeddable data of the target row, and the total number of slots;

[0009] Based on the slot index value and the preset number of embedding bits for the row, the watermark bit segment to be embedded in the target row is determined from the watermark data to be embedded.

[0010] The watermark bit fragment is embedded into a second preset portion of the embeddable data of the target row; wherein the first preset portion and the second preset portion constitute the embeddable data of the target row, the first preset portion is a fixed high-order portion of the embeddable data that remains unchanged during the process; the second preset portion is a variable low-order portion of the embeddable data that can be modified during the process.

[0011] Furthermore, the step of generating the slot index value of the target row based on the identifier of the target row of the structured data to be embedded, the first preset portion of the embeddable data of the target row, and the total number of slots includes:

[0012] A hash calculation is performed on the identifier of the target row and the first preset portion of the embeddable data of the target row to obtain the corresponding hash value;

[0013] The slot index value corresponding to the target row is obtained by modulo the hash value by the total number of slots.

[0014] Furthermore, the step of embedding the watermark bit fragment into the second preset portion of the embeddable data of the target row includes:

[0015] Replace the second preset portion of the embeddable data in the target row with the watermark bit fragment;

[0016] Alternatively, the value can be adjusted within a preset range based on the difference between the current low-order value of the embeddable data and the watermark bit segment.
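The replacement variant in [0015] can be illustrated with plain integer bit operations. This is only a minimal sketch: the function names `embed_fragment` and `read_fragment` are hypothetical, the embeddable value is assumed to be a non-negative integer, and the range-adjustment variant in [0016] is not covered.

```python
def embed_fragment(value: int, fragment: int, k: int) -> int:
    """Replace the k low-order bits of `value` with `fragment` (assumed helper)."""
    if not 0 <= fragment < (1 << k):
        raise ValueError("fragment does not fit in k bits")
    # Clear the k low-order bits, then write the fragment into them.
    return ((value >> k) << k) | fragment


def read_fragment(value: int, k: int) -> int:
    """Read back the k low-order bits of `value`."""
    return value & ((1 << k) - 1)


original = 20250101030112                    # example embeddable value
marked = embed_fragment(original, 0b01, 2)   # embed the 2-bit fragment "01"
```

Because only the low-order bits change, the high-order part (`value >> k`) is the same before and after embedding, which is what keeps the slot index stable.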

[0017] A hash slot watermark extraction method based on structured data is used to extract watermarks embedded by the above-mentioned embedding method, including the following steps:

[0018] Obtain the watermark data to be extracted and the structured data to be extracted, determine the total number of slots based on the length of the watermark data to be extracted and the preset number of embedded bits per row, and establish a voting box for each hash slot;

[0019] The slot index value of the target row is determined based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots;

[0020] The target hash slot corresponding to the target row is determined based on the slot index value, and the votes are counted in the voting box of the target hash slot based on the low-order value of the embeddable data of the target row.

[0021] Based on the vote count results of each hash slot, the candidate watermark value of each hash slot is determined through weight analysis, and the target original watermark value is obtained after combination and verification.
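The voting idea in [0018] to [0021] can be sketched as follows. Several things here are assumptions rather than the method as claimed: SHA-256 stands in for the unspecified hash, the leading 4 bytes of the digest are used as the integer, a plain majority vote replaces the weight analysis, and the MAC and error-correction checks are omitted. The rows are synthetic integers, and all names (`slot_index`, `embed`, `extract`) are hypothetical.

```python
import hashlib
import random
from collections import Counter

K = 2  # per_embed_length: bits carried by each row
MARK = "000100100011010001010110011110000101111100100011110011000111"


def slot_index(row_id: str, mutable: int, slot_counts: int) -> int:
    # Hash the identifier and the fixed high-order part only.
    payload = f"{row_id}|{mutable >> K}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:4], "big") % slot_counts


def embed(rows, mark):
    slot_counts = len(mark) // K
    marked = []
    for row_id, mutable in rows:
        s = slot_index(row_id, mutable, slot_counts)
        fragment = int(mark[s * K:(s + 1) * K], 2)
        marked.append((row_id, ((mutable >> K) << K) | fragment))
    return marked


def extract(rows, mark_length):
    slot_counts = mark_length // K
    boxes = [Counter() for _ in range(slot_counts)]  # one voting box per slot
    for row_id, mutable in rows:
        s = slot_index(row_id, mutable, slot_counts)
        boxes[s][mutable & ((1 << K) - 1)] += 1      # low-order bits are the vote
    # Majority value per slot; None marks a slot that received no votes.
    return [format(b.most_common(1)[0][0], f"0{K}b") if b else None for b in boxes]


rows = [(f"user-{i}", 20250101000000 + 7 * i) for i in range(2000)]
marked = embed(rows, MARK)
survivors = marked[:]
random.Random(0).shuffle(survivors)              # reorder the rows
survivors = survivors[: len(survivors) // 2]     # delete half of them
recovered = extract(survivors, len(MARK))
```

Since each row's slot is recomputed from its own content, shuffling or deleting rows only changes how many votes a slot receives, not which slot a surviving row votes in.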

[0022] Furthermore, the step of determining the candidate watermark value for each hash slot based on the vote counting results of each hash slot through weight analysis includes:

[0023] Determine the basic weights of the candidate values in the voting box;

[0024] If the total number of votes in the corresponding hash slot is less than a preset threshold, the votes are added to the basic weight to obtain the statistical weight.

[0025] If the total number of votes in the corresponding hash slot is greater than or equal to the preset threshold, then the average number of votes in the hash slot is determined, and the statistical weight is determined based on the deviation between the candidate value and the average value.

[0026] When the total number of votes in the hash slot reaches the preset minimum noise count threshold, candidate values with a vote percentage less than the preset ratio are filtered out.

[0027] The statistical weights of the candidate values ​​are normalized to obtain the final weights of the candidate values.

[0028] Furthermore, the watermark data to be extracted includes a message authentication code and an error correction code; the step of obtaining the target original watermark value after combination and verification includes:

[0029] Extract the candidate value with the highest final weight in the hash slot and concatenate them to obtain the initial watermark sequence;

[0030] If the initial watermark sequence contains empty slots and the number of empty slots is within the range that the error correction code can correct, then correction is performed according to the error correction code.

[0031] If the number of empty slots exceeds the erasure range, then exhaustively search the low-weight slots in order of increasing weight, and verify them in conjunction with the error correction code, until the message authentication code verification is passed.

[0032] An embedding device for hash slot watermarks based on structured data, comprising:

[0033] The hash slot calculation module is used to obtain the watermark data to be embedded and the structured data to be embedded, and to determine the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row.

[0034] The slot index first calculation module is used to generate the slot index value of the target row based on the identifier of the target row of the structured data to be embedded, the first preset part of the embeddable data of the target row, and the total number of slots.

[0035] The watermark fragment calculation module is used to determine the watermark bit fragment to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits for the row.

[0036] A watermark embedding module is used to embed the watermark bit fragment into a second preset portion of the embeddable data of the target row; wherein, the first preset portion and the second preset portion constitute the embeddable data of the target row; the first preset portion is a fixed high-order portion of the embeddable data, which remains unchanged during the process; the second preset portion is a variable low-order portion of the embeddable data, which can be modified during the process.

[0037] An extraction device for hash slot watermarks based on structured data, used to extract watermarks embedded by the aforementioned embedding device, comprising:

[0038] The voting box creation module is used to acquire the watermark data to be extracted and the structured data to be extracted, determine the total number of slots based on the length of the watermark data to be extracted and the preset number of embedded bits per row, and create a voting box for each hash slot.

[0039] The second slot index calculation module is used to determine the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots.

[0040] The slot voting module is used to determine the target hash slot corresponding to the target row based on the slot index value, and to count the votes in the voting box of the target hash slot based on the low-order value of the embeddable data of the target row as the voting value.

[0041] The watermark restoration module is used to determine the candidate watermark value of each hash slot based on the vote counting results of each hash slot through weight analysis, and obtain the target original watermark value after combination and verification.

[0042] A computer electronic device includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor. When executed by the processor, the computer program implements the steps of the hash slot watermark embedding method based on structured data and the steps of the hash slot watermark extraction method based on structured data as described above.

[0043] A computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the hash slot watermark embedding method based on structured data and the hash slot watermark extraction method based on structured data as described above.

[0044] This application has the following advantages:

[0045] In the embodiments of this application, addressing the technical problem that watermarks in the prior art cannot resist sorting / shuffling, deletion, small-sample, or noisy-data attacks, this application provides a solution that assigns rows to hash slots using the identifier and the fixed portion of the variable data, and embeds the watermark in segments into the low-order bits of the data in the corresponding slots. Specifically, the solution involves: obtaining watermark data to be embedded and structured data to be embedded, and determining the total number of slots in the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row; generating a slot index value for the target row based on the identifier of the target row of the structured data to be embedded, a first preset portion of the embeddable data of the target row, and the total number of slots; determining the watermark bit segments to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits per row; and embedding the watermark bit segments into the second preset portion of the embeddable data of the target row. The first preset portion and the second preset portion constitute the embeddable data of the target row. The first preset portion is a fixed high-order portion of the embeddable data that remains unchanged during the process; the second preset portion is a variable low-order portion of the embeddable data that can be modified during the process. By generating stable slot index values from the identifier and the fixed portion of the embeddable data, watermark embedding and extraction can be achieved without relying on the physical order of the data; watermark embedding can be completed with only two columns, the identifier and the embeddable data, and the watermark can still be restored even when other data columns are missing; by embedding the watermark bit fragment into the low bits of the embeddable data, only minor modifications are made to the data, which maintains its original usability.

Attached Figure Description

[0046] To more clearly illustrate the technical solution of this application, the drawings used in the description of this application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0047] Figure 1 is a flowchart illustrating the steps of a hash slot watermark embedding method based on structured data, provided in one embodiment of this application;

[0048] Figure 2 is a flowchart illustrating a hash slot watermark embedding method based on structured data according to an embodiment of this application;

[0049] Figure 3 is a flowchart illustrating the steps of a hash slot watermark extraction method based on structured data according to an embodiment of this application;

[0050] Figure 4 is a flowchart illustrating a hash slot watermark extraction method based on structured data according to an embodiment of this application;

[0051] Figure 5 is a structural block diagram of a hash slot watermark embedding device based on structured data provided in one embodiment of this application;

[0052] Figure 6 is a structural block diagram of a hash slot watermark extraction device based on structured data provided in one embodiment of this application;

[0053] Figure 7 is a schematic diagram of the structure of a computer electronic device provided in an embodiment of the present invention;

[0054] 1. Computer electronic device; 2. External device; 3. Processing unit; 4. Bus; 5. Network adapter; 6. I / O interface; 7. Display; 8. Memory; 9. Random access memory; 10. Cache memory; 11. Storage system; 12. Program / utility; 13. Program module.

Detailed Implementation

[0055] To make the objectives, features, and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0056] The inventors, through analysis of existing technologies, discovered that current structured data watermarking schemes generally suffer from a dependency on data order. Regardless of whether the scheme is based on row order or fixed block partitioning, the watermark embedding position is bound to the physical order of the data. When the data is reordered by a third party, the row order is disrupted, and the watermark cannot be extracted.

[0057] Furthermore, existing solutions lack robustness against deletion and noise. They employ a deterministic mapping method of embedding and extracting watermarks at one point, relying on only a few data rows and lacking redundancy mechanisms. Once these rows are deleted, the watermark information is permanently lost. Some solutions also carry watermarks by adding pseudo-columns or inserting pseudo-rows, altering the original data structure and impacting data usability.

[0058] Based on this, the inventors recognized the need for a watermarking scheme that is independent of line order, resistant to attacks, and requires minimal modification to the original data.

[0059] It should be noted that, in any embodiment of the present invention, the watermark embedding method and the watermark extraction method are applied to copyright protection, source tracking, or integrity verification of structured data. The watermark embedding method achieves minor modification of the original data by embedding watermark information into the low-order bits of the data, and the watermark extraction method restores the watermark information from the surviving data through a hash slot voting mechanism. Together, they constitute a complete data traceability solution.

[0060] Referring to Figure 1 and Figure 2, a hash slot watermark embedding method based on structured data according to an embodiment of this application is shown, including:

[0061] S110. Obtain the watermark data to be embedded and the structured data to be embedded, and determine the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row.

[0062] S120. Generate a slot index value for the target row based on the identifier of the target row of the structured data to be embedded, the first preset portion of the embeddable data of the target row, and the total number of slots.

[0063] S130. Based on the slot index value and the preset number of embedded bits for the row, determine the watermark bit segment that needs to be embedded in the target row from the watermark data to be embedded.

[0064] S140. Embed the watermark bit fragment into a second preset portion of the embeddable data of the target row; wherein, the first preset portion and the second preset portion constitute the embeddable data of the target row; the first preset portion is a fixed high-order portion of the embeddable data, which remains unchanged during the process; the second preset portion is a variable low-order portion of the embeddable data, which can be modified during the process.

[0065] In the embodiments of this application, addressing the technical problem that watermarks in the prior art cannot resist sorting / shuffling, deletion, small-sample, or noisy-data attacks, this application provides a solution that embeds the watermark in segments into the lower bits of the corresponding slot data by hashing the identifier and the fixed portion of the variable data. By generating a stable slot index value from the identifier and the fixed portion of the embeddable data, watermark embedding and extraction can be achieved without depending on the physical order of the data; watermark embedding can be completed with only two columns, the identifier and the embeddable data, and the watermark can still be restored even when other data columns are missing; by embedding the watermark bit segments into the lower bits of the embeddable data, only minor modifications are made to the data, which maintains its original usability.
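Steps S110 to S140 can be sketched over a small table to show that only the identifier column and the embeddable column are read. This is an illustrative sketch, not the patented implementation: SHA-256 and the use of the digest's leading 4 bytes are assumptions, and the column names (`phone`, `paid_at`, `amount`), the second row, and the amounts are made up for the example.

```python
import hashlib


def slot_of(row_id: str, high: int, slot_counts: int) -> int:
    digest = hashlib.sha256(f"{row_id}|{high}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % slot_counts


def embed_table(rows, mark: str, id_col: str, mutable_col: str, k: int):
    slot_counts = len(mark) // k                          # S110: total number of slots
    marked = []
    for row in rows:
        high = row[mutable_col] >> k                      # fixed high-order part
        slot = slot_of(row[id_col], high, slot_counts)    # S120: slot index
        fragment = int(mark[slot * k:(slot + 1) * k], 2)  # S130: fragment for this slot
        new_row = dict(row)                               # leave the input untouched
        new_row[mutable_col] = (high << k) | fragment     # S140: write the low bits
        marked.append(new_row)
    return marked


mark = "000100100011010001010110011110000101111100100011110011000111"
rows = [
    {"phone": "131****5678", "paid_at": 20250101030112, "amount": 42},
    {"phone": "139****0000", "paid_at": 20250102101500, "amount": 7},
]
marked = embed_table(rows, mark, "phone", "paid_at", 2)
```

Columns other than the two named ones are copied through unchanged, so dropping them later does not affect extraction.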

[0066] The following will further describe a hash slot watermark embedding method based on structured data in this exemplary embodiment.

[0067] In one embodiment of the present invention, the specific process of step S110, "acquiring watermark data to be embedded and structured data to be embedded, and determining the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row", can be further explained in conjunction with the following description.

[0068] In one embodiment of the present invention, the specific process of "obtaining watermark data to be embedded and structured data to be embedded" can be further described in conjunction with the following description.

[0069] It should be noted that the watermark data to be embedded refers to the original information that needs to be hidden within structured data for subsequent data traceability and copyright verification. The watermark data to be embedded can be a copyright identifier, company identifier, timestamp, random number, or any string used to identify the source of the data. The watermark data to be embedded can be copyright information "BANK-A-2025", company name "ABC Technology Co., Ltd.", user ID "10086", product serial number "SN20250001", randomly generated digital fingerprint "0x12345678" or timestamp "20250101120000".

[0070] As an example, the acquired watermark data is preprocessed: the watermark to be embedded is converted into binary bits to eliminate the diversity of the original watermark format, facilitating subsequent segmentation, embedding, and extraction operations. A message authentication code is generated for the watermark value using a key, and an error correction code is generated from the watermark and the message authentication code to form the preprocessed watermark value.

[0071] By adding a message authentication code to the watermark data, it is possible to verify whether the watermark has been tampered with during extraction. By adding a Reed-Solomon error correction code, the original watermark can be recovered even if some watermark data is lost or corrupted. The error correction code generates redundant verification information based on the original data; when data is lost due to deletion, tampering, or other attacks, this verification information can be used for recovery.

[0072] In one specific implementation, the preprocessing steps for the data to be embedded with the watermark are as follows:

[0073] 1. Let the watermark value to be preprocessed be (mark), and generate a binary bit sequence (mark_bits) for this value. If the watermark value is 0x12345678, its binary bit sequence is mark_bits = 0001 0010 0011 0100 0101 0110 0111 1000;

[0074] 2. Let the key be (KEY), calculate the message authentication code (hmac) = HASH(KEY|mark_bits). Here, HASH can be either SM3_HASH or SM4_CMAC. Take the first 12 bits of the message authentication code. Let the generated value here be hmac = 0101 1111 0010;

[0075] 3. Calculate the Reed-Solomon error correction code (rscode) over mark_bits and hmac together. Each 4 bits forms one symbol. GF(15,9) is used, with four check symbols for verification. The code can correct up to 2 erroneous symbols or recover up to 4 erased symbols. Let the value obtained here be 0011 1100 1100 0111.

[0076] 4. Finally, concatenate the entire content (mark_bits|hmac|rscode) to obtain the preprocessed watermark value (preprocessed_mark) and the bit length (mark_length) = LEN(preprocessed_mark).

[0077] preprocessed_mark = 0001 0010 0011 0100 0101 0110 0111 1000 0101 1111 0010 0011 1100 1100 0111

[0078] mark_length = 60.
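The preprocessing steps above can be sketched as follows. Two substitutions are assumptions of this sketch: Python's standard HMAC-SHA256 stands in for the SM3 / SM4-CMAC options named in the text, and the Reed-Solomon check symbols are not computed but copied from the example values given above. The helper names `to_bits` and `truncated_mac` are hypothetical.

```python
import hashlib
import hmac


def to_bits(value: int, width: int) -> str:
    """Render an integer as a zero-padded bit string."""
    return format(value, f"0{width}b")


def truncated_mac(key: bytes, mark_bits: str, mac_bits: int = 12) -> str:
    """Keyed MAC over the watermark bits, truncated to its first `mac_bits` bits."""
    message = int(mark_bits, 2).to_bytes(len(mark_bits) // 8, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()  # stand-in for SM3/SM4-CMAC
    total_bits = len(digest) * 8
    return to_bits(int.from_bytes(digest, "big") >> (total_bits - mac_bits), mac_bits)


mark_bits = to_bits(0x12345678, 32)   # step 1
example_hmac = "010111110010"         # step 2, value taken from the example text
example_rscode = "0011110011000111"   # step 3, value taken from the example text
preprocessed_mark = mark_bits + example_hmac + example_rscode  # step 4
mark_length = len(preprocessed_mark)
```

With a real key, `truncated_mac(key, mark_bits)` would replace `example_hmac`; its output will differ from the example value because a different MAC algorithm and key are used.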

[0079] It should be noted that the structured data to be embedded refers to the target data table for which watermarking is to be embedded, serving as the carrier of the watermark information. The structured data to be embedded is organized in rows and columns, with each row representing a record and each column representing a field. Structured data includes, but is not limited to, database tables, CSV files, Excel spreadsheets, JSON data, or XML data. The structured data to be embedded can be a bank transaction record table (including fields such as mobile phone number, transaction time, transaction amount, and merchant name), an employee information table (including fields such as employee ID, name, start date, and department), a device log table (including fields such as device ID, log time, and log content), or a user behavior data table (including fields such as user ID, behavior time, and behavior type).

[0080] In one embodiment of the present invention, the specific process of "determining the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row" can be further explained in conjunction with the following description.

[0081] It should be noted that the preset number of embedding bits per row is a manually predetermined number of watermark bits that can be embedded in each row of structured data. Its value determines the amount of watermark information carried by a single data entry and matches the bit length of the second preset part of the embeddable data. The data to be embedded with watermark is a continuous complete bit sequence, which cannot be directly embedded into massive data rows. Therefore, the total number of slots is obtained by dividing the length of the data to be embedded with watermark by the preset number of embedding bits per row. The complete watermark is then split into watermark segments with the same number of slots as the total number of slots and each segment having a length equal to the preset number of embedding bits per row. Subsequently, each row of data is calculated to match a unique slot, and then the watermark segment corresponding to that slot is embedded. This ultimately achieves the distributed hiding of the complete watermark in the structured data, and this splitting rule allows the extraction stage to reconstruct and splice the complete watermark according to the slots.

[0082] In one specific implementation, the length of the preprocessed watermark value is calculated and divided by the number of bits embedded per row to obtain the total number of slots. Let the number of bits embedded per row be (per_embed_length), which is taken as per_embed_length = 2 here. The total number of slots (slot_counts) = mark_length / per_embed_length

[0083] slot_counts = 60 / 2 = 30.
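The slot count and the per-slot segments from the running example can be reproduced with a short sketch. The function name `split_into_slots` is hypothetical, and the sketch assumes the watermark length is an exact multiple of the number of bits embedded per row, as it is in the example.

```python
def split_into_slots(preprocessed_mark: str, per_embed_length: int) -> list[str]:
    """Split the watermark bit string into equal-length segments, one per slot."""
    if len(preprocessed_mark) % per_embed_length:
        raise ValueError("watermark length must be a multiple of per_embed_length")
    slot_counts = len(preprocessed_mark) // per_embed_length
    return [
        preprocessed_mark[i * per_embed_length:(i + 1) * per_embed_length]
        for i in range(slot_counts)
    ]


preprocessed_mark = (
    "00010010001101000101011001111000"  # mark_bits
    "010111110010"                      # hmac
    "0011110011000111"                  # rscode
)
segments = split_into_slots(preprocessed_mark, 2)
slot_counts = len(segments)
```

Slot `i` always carries bits `i * per_embed_length` through `(i + 1) * per_embed_length - 1`, so concatenating the segments in slot order rebuilds the full watermark.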

[0084] The hash slot generation mechanism makes watermark embedding independent of maintaining the order of structured data.

[0085] In one embodiment of the present invention, the specific process of "generating a slot index value of the target row based on the identifier of the target row of the structured data to be embedded, the first preset portion of the embeddable data of the target row, and the total number of slots" in step S120 can be further described in conjunction with the following description.

[0086] A hash calculation is performed on the identifier of the target row and the first preset portion of the embeddable data of the target row to obtain the corresponding hash value;

[0087] The slot index value corresponding to the target row is obtained by modulo the hash value by the total number of slots.

[0088] It should be noted that fixed values in the structured data are taken as identifiers (IDs). These identifiers are mainly used to identify the subject of the watermark and occupy a primary position in the structured data. Changing or removing them will render the structured data meaningless. Examples of immutable information include primary keys, ID numbers, mobile phone numbers, order numbers, or user IDs. Embeddable data columns in the structured data that are partially variable but do not affect the overall structure are denoted as mutable. These are non-sensitive and fixed information, and are fields in the structured data that allow minor modifications, such as time, latitude and longitude coordinates, non-sensitive consumption amounts (i.e., changing the lowest digit will not affect the overall structure), file or image dimensions, etc.

[0089] The embeddable data consists of a first preset part and a second preset part. When time-based data is used as the embeddable data, the first preset part is the fixed high-order part of the time data, which is the key area representing the core semantics of time. Its value determines the main information of time and remains fixed throughout the entire process of watermark embedding and extraction. The second preset part is the variable low-order part of the time data, which represents the non-core supplementary information of time. It only makes fine-tuning numerical supplements to the time, and its minor modifications will not change the core semantics and practical use value of time. It is specifically used to embed watermark bit segments and can be modified or adjusted during the embedding process.

[0090] In one embodiment of the present invention, the specific process of “performing a hash calculation on the identifier of the target row and the first preset portion of the embeddable data of the target row to obtain the corresponding hash value” can be further described in conjunction with the following description.

[0091] It should be noted that the identifier of each row, together with the fixed portion of the data to be embedded with the watermark, is hashed to obtain a hash value. As an example, the hash calculation can be MD5, SHA-1, SHA-256, SM3, or a custom hash algorithm, mapping an input of arbitrary length to a fixed-length output. Because the hash is deterministic, the same input always yields the same hash value.

[0092] In one specific implementation, there is a data column in CSV format, the contents of which are shown in Table 1:

[0093]

[0094] Table 1

[0095] In this information, the mobile phone number is key information, and modifying it would make the record meaningless, so the mobile phone number is chosen as the ID. Changing the low-order part of the consumption time does not affect the overall content, so the consumption time is chosen as the mutable.

[0096] Calculate the first row. ID=131****5678, mutable=20250101030112.

[0097] HASH(ID, mutable >> per_embed_length)

[0098] HASH(131****5678, 20250101030112 >> 2)

[0099] In an embodiment of the present invention, the specific process of "obtaining the slot index value corresponding to the target row according to the hash value modulo the total number of slots" can be further described in combination with the following description.

[0100] As an example, take the high 4 bits of the hash value and convert them into an integer value, and then modulo this integer value by the total number of slots to obtain the slot index value (slot_index), slot_index = LEFT(HASH(ID, mutable >> per_embed_length), 4) mod slot_counts. The hash value is mapped to the range from 0 to the total number of slots - 1 through the modulo operation, ensuring that the generated slot index values are all valid hash slot numbers, and realizing the uniform distribution of hash slots for all target rows.

[0101] In a specific implementation, continuing with the information in Table 1 above, calculate the first row. ID = 131****5678, mutable = 20250101030112.

[0102] slot_index = LEFT(HASH(ID, mutable >> per_embed_length), 4) mod slot_counts

[0103] slot_index = LEFT(HASH(131****5678, 20250101030112 >> 2), 4) mod 30

[0104] Ignore the calculation process here, and assume the final value slot_index = 2.

[0105] The second row can also be calculated in the same way.
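The slot index calculation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256 is picked from the listed hash options, the identifier and the fixed part are joined with a "|" separator, the mutable value is treated as a plain integer rather than BCD, and LEFT(…, 4) is read as the first 4 bytes of the digest (the text says "high 4 bits"). The function name slot_index is hypothetical.

```python
import hashlib

def slot_index(row_id: str, mutable: int, per_embed_length: int, slot_count: int) -> int:
    # Drop the variable low-order bits so that only the fixed high-order part is hashed.
    fixed_part = mutable >> per_embed_length
    # Assumption: identifier and fixed part are serialized as "id|fixed" before hashing.
    digest = hashlib.sha256(f"{row_id}|{fixed_part}".encode("utf-8")).digest()
    # Assumption: LEFT(hash, 4) is taken as the first 4 bytes, read as a big-endian integer.
    prefix = int.from_bytes(digest[:4], "big")
    # The modulo maps the value into the range 0 .. slot_count - 1.
    return prefix % slot_count

idx = slot_index("131****5678", 20250101030112, 2, 30)
```

Because the low-order bits are shifted out before hashing, a row keeps the same slot index after its low-order bits have been overwritten by the watermark.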

[0106] In an embodiment of the present invention, the specific process of "determining the watermark bit segment to be embedded in the target row from the watermark data to be embedded according to the slot index value and the preset embedding bit number of the row" in step S130 can be further described in combination with the following description.

[0107] The watermark bit segment is a continuous bit segment in the watermark data to be embedded that corresponds to the slot index of the target row. Specifically, the position of the hash slot corresponding to the target row is determined according to the slot index value. The watermark bit segment matching the hash slot is split from the overall watermark data in combination with the preset number of embedded bits for the row and is used to embed into the embeddable data of the row.

[0108] In one specific implementation, for each row of structured data, the slot index value is first calculated, and then the watermark value (mark_embed) at the corresponding position is extracted from the preprocessed watermark value. The extraction rules are as follows:

[0109] mark_embed = preprocessed_mark[per_embed_length*(slot_index-1)..per_embed_length*(slot_index-1)+per_embed_length-1].

[0110] The data column is in CSV format, and its contents are shown in Table 2:

[0111]

[0112] Table 2

[0113] The generated preprocessed watermark value is: preprocessed_mark=0001 0010 0011 0100 0101 0110 0111 1000 0101 1111 0010 0011 1100 1100 0111

[0114] The watermark value to be embedded is mark_embed=preprocessed_mark[2*(2-1)..2*(2-1)+1]

[0115] mark_embed= preprocessed_mark[2..3]

[0116] mark_embed= 01
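The fragment selection in this example can be reproduced with a short Python sketch. The function name watermark_fragment is hypothetical, the watermark is held as a string of '0'/'1' characters for readability, and slot_index is used exactly as in the worked example (slot_index = 2 selects bits 2..3). The source does not state how a slot index of 0 is mapped, so the sketch rejects it rather than guessing.

```python
def watermark_fragment(preprocessed_mark: str, slot_index: int, per_embed_length: int) -> str:
    # The source does not say how slot index 0 maps to a bit range, so it is not handled here.
    if slot_index < 1:
        raise ValueError("slot_index below 1 is not covered by this sketch")
    # Start offset as in the extraction rule; the slice covers per_embed_length bits.
    start = per_embed_length * (slot_index - 1)
    return preprocessed_mark[start:start + per_embed_length]

groups = "0001 0010 0011 0100 0101 0110 0111 1000 0101 1111 0010 0011 1100 1100 0111"
preprocessed_mark = groups.replace(" ", "")
mark_embed = watermark_fragment(preprocessed_mark, 2, 2)
```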

[0117] In one embodiment of the present invention, the specific process of "embedding the watermark bit fragment into the second preset portion of the embeddable data of the target row" in step S140 can be further described in conjunction with the following description.

[0118] Replace the second preset portion of the embeddable data in the target row with the watermark bit fragment;

[0119] Alternatively, the value can be adjusted within a preset range based on the difference between the current low-order value of the embeddable data and the watermark bit segment.

[0120] Understandably, there are two ways to embed data. One is to directly overwrite the lower per_embed_length bits of the embeddable data column with the watermark value. The other is to calculate the difference between the existing lower bits and the data to be embedded, allowing positive and negative adjustments within the range of 2^per_embed_length. The purpose of the second method is to keep the change in value relatively small.

[0121] In one specific implementation, continuing with the specific example in Table 2 of step S130, this is the first embedding method:

[0122] Adjust the lower two bits of mutable=20250101030112 (BCD code) to mark_embed, thus completing the watermark embedding. Finally, after embedding, mutable is 20250101030111 (BCD code).

[0123] In another specific implementation, the second embedding method is used: for time or coordinate data, the magnitude of the change can be reduced by allowing signed (two's-complement style) adjustments. Taking binary as an example, if the lower two bits to be embedded are 11 and the original lower bits are 00, direct embedding makes the new value differ from the original data by +3 (decimal). However, the same lower bits can also be obtained by subtracting 1.

[0124] For example, the embedded value needs to be 3 (BCD code), which is 11 in binary.

[0125] The original data is 20, corresponding to the binary bits 10100. Either adding 3 or subtracting 1 yields a value whose last two bits are 11 (binary). Here, -1 is chosen, which means that the data change is smaller than the +3 that direct embedding would require.
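The two embedding methods can be compared with the following Python sketch. It assumes the embeddable value is a non-negative plain integer (not BCD) and that ties between a positive and a negative adjustment are resolved in favor of the positive one; the function names are hypothetical.

```python
def embed_replace(value: int, fragment: int, per_embed_length: int) -> int:
    # First method: overwrite the lower per_embed_length bits with the fragment.
    mask = (1 << per_embed_length) - 1
    return (value & ~mask) | (fragment & mask)

def embed_adjust(value: int, fragment: int, per_embed_length: int) -> int:
    # Second method: pick the smallest signed step that makes the low bits equal the fragment.
    span = 1 << per_embed_length
    delta = (fragment - value) % span  # non-negative step in 0 .. span - 1
    if delta > span // 2:
        delta -= span                  # a negative step is shorter
    return value + delta

direct = embed_replace(20, 0b11, 2)    # 20 -> 23, a change of +3
adjusted = embed_adjust(20, 0b11, 2)   # 20 -> 19, a change of -1
```

Both results end in the bits 11, so the extraction side reads the same fragment either way. Note that the second method can change bits above the low-order part, which matters if those bits feed the slot hash.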

[0126] Referring to Figure 3 and Figure 4, a hash slot watermark extraction method based on structured data according to an embodiment of this application is shown:

[0127] S210. Obtain the watermark data to be extracted and the structured data to be extracted, determine the total number of slots based on the length of the watermark data to be extracted and the preset number of embedded bits per row, and establish a voting box for each hash slot.

[0128] S220. Determine the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots.

[0129] S230. Determine the target hash slot corresponding to the target row based on the slot index value, and count the votes in the voting box of the target hash slot based on the low-order value of the embeddable data of the target row as the voting value.

[0130] S240. Based on the vote counting results of each hash slot, determine the candidate watermark value of each hash slot through weight analysis, and obtain the target original watermark value after combination and verification.

[0131] In the embodiments of this application, the slot corresponding to each row is obtained by hashing, rows falling in the same slot vote on the value carried by that slot, and the watermark value is restored by combining the weights and performing verification, which can resist various attacks.

[0132] The following will further explain a hash slot watermark extraction method based on structured data in this exemplary embodiment.

[0133] In one embodiment of the present invention, the specific process of step S210, which involves "obtaining watermark data to be extracted and structured data to be extracted, determining the total number of slots based on the length of the watermark data to be extracted and the preset number of embedded bits per row, and establishing a voting box for each hash slot", can be further explained in conjunction with the following description.

[0134] Calculate the total number of slots according to step S110 above and generate a slot voting box. Maintain consistency with the slot allocation rules in the embedding stage to ensure a one-to-one correspondence between the extracted and embedded hash slots; this will not be elaborated further here.

[0135] As an example, if the length of the watermark data to be extracted is 60 bits, and the preset number of embedding bits per row is 2, then the total number of slots is 30. Each slot's voting box contains 2^2 = 4 options, and the initial number of votes is 0 for each option.

[0136] In one specific implementation, a voting structure list (vote_boxes) is first generated. Each voting box has 2^per_embed_length entries, forming vote_boxes[2^per_embed_length], and each element corresponds to one enumerated value. For per_embed_length=1, the values are [0, 1]. For per_embed_length=2, the values are [00, 01, 10, 11]. For per_embed_length=3, the values are [000, 001, 010, 011, ..., 111].

[0137] Then generate slot vote boxes (slot_vote_boxes), which are constructed as a two-dimensional array, as follows: slot_vote_boxes[slot_count, vote_boxes].
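As a minimal sketch, the two-dimensional voting structure can be built in Python as a list of per-slot counters. The function name make_slot_vote_boxes is hypothetical; the entry at position v of a box counts the votes for the bit pattern whose integer value is v.

```python
def make_slot_vote_boxes(mark_length: int, per_embed_length: int) -> list[list[int]]:
    # Total number of slots: watermark length divided by the bits embedded per row.
    slot_count = mark_length // per_embed_length
    # Each box holds one counter per possible fragment value, all starting at 0.
    options = 2 ** per_embed_length
    return [[0] * options for _ in range(slot_count)]

slot_vote_boxes = make_slot_vote_boxes(60, 2)  # 30 slots with 4 options each
```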

[0138] In one embodiment of the present invention, the specific process of "determining the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots" in step S220 can be further explained in conjunction with the following description.

[0139] Understandably, during voting, the slot index value (slot_index) is first calculated to locate the position of the slot voting box. Using the same hash function and calculation method as in the embedding stage, a hash calculation is performed on the identifier and the fixed high-order bits of the embeddable data. The hash value is then taken modulo the total number of slots to obtain the slot index value. This slot index value is used to accurately locate the hash slot corresponding to the watermark embedding in the target row. The specific calculation process for the slot index value is described in step S120 above and will not be repeated here.

[0140] In one embodiment of the present invention, the specific process of step S230, which involves "determining the target hash slot corresponding to the target row based on the slot index value, and using the low-order value of the embeddable data of the target row as the voting value, and counting the votes in the voting box of the target hash slot", can be further explained in conjunction with the following description.

[0141] It should be noted that after finding the target hash slot through the slot index value, the low-order value of the data that can be embedded in the target row (i.e., the second preset part of the embedding stage, which stores the watermark bit fragment) is used as the voting value and counted in the corresponding option of the corresponding voting box, representing the row's vote for the candidate value of the slot.

[0142] As an example, if the slot index is 2 and the low-order value of the embeddable data is "01", then in the ballot box of slot 2, the vote count for option "01" is increased by 1.

[0143] In one specific implementation, during voting, the slot index value `slot_index` is first calculated to locate the position of the slot voting box, and then the low-order value of the currently embeddable data is obtained. Once a specific position is located in the voting list, its count is incremented by 1.

[0144] Table 3 below uses a 4-slot configuration with 2 bits embedded per row as an example, assuming the voting result is as follows:

[0145]

[0146] Table 3

[0147] Let the embeddable data value in the first row be 0xabcd00ee. Convert the least significant hexadecimal digit 'e' to binary 1110. Since the current rule is to embed two bits per row, we take the lower two bits, 10. Calculate slot_index, and assume the result is slot 0. Then, in the row for slot 0, increment the number of votes for the value 10 by 1, as shown in Table 4 below:

[0148]

[0149] Table 4
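The vote in this example can be sketched in Python as follows. The function name cast_vote is hypothetical, and the slot index is passed in directly because the example assumes it to be 0.

```python
def cast_vote(slot_vote_boxes: list[list[int]], slot_index: int,
              mutable: int, per_embed_length: int) -> int:
    # The voting value is the low-order part of the embeddable data.
    vote = mutable & ((1 << per_embed_length) - 1)
    # Increment the counter for that value in the box of the target slot.
    slot_vote_boxes[slot_index][vote] += 1
    return vote

boxes = [[0, 0, 0, 0] for _ in range(4)]  # 4 slots, 2 bits per row
vote = cast_vote(boxes, 0, 0xabcd00ee, 2)
```

After the call, only the counter for value 10 in slot 0 has changed.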

[0150] Processing the other rows in the same way eventually fills the ballot boxes for all rows. For undamaged and unattacked data, each slot's ballot box has only one value with a non-zero vote count. With a total of 200 rows of data, an example of the four slots is shown in Table 5 below:

[0151]

[0152] Table 5

[0153] In one embodiment of the present invention, the specific process of step S240, which involves "determining the candidate watermark value of each hash slot based on the vote counting results of each hash slot through weight analysis, and obtaining the target original watermark value after combination and verification," can be further explained in conjunction with the following description.

[0154] Understandably, this process includes two core steps: first, using weight analysis to select the candidate watermark value most likely to be the true watermark value from the voting results of each hash slot, eliminating invalid votes caused by noise attacks; second, concatenating the candidate watermark values of all hash slots into an initial watermark sequence, and restoring the original watermark value through error correction, verification, and other operations, thereby achieving attack-resistant watermark extraction.

[0155] In one embodiment of the present invention, the specific process of "determining the candidate watermark value of each hash slot based on the vote counting results of each hash slot through weight analysis" can be further explained in conjunction with the following description.

[0156] Determine the basic weights of the candidate values in the ballot box;

[0157] If the total number of votes in the corresponding hash slot is less than a preset threshold, the votes are added to the basic weight to obtain the statistical weight.

[0158] If the total number of votes in the corresponding hash slot is greater than or equal to the preset threshold, then the average number of votes in the hash slot is determined, and the statistical weight is determined based on the deviation between the candidate value and the average value.

[0159] When the total number of votes in the hash slot reaches the preset minimum noise count threshold, candidate values with a vote percentage less than the preset ratio are filtered out.

[0160] The statistical weights of the candidate values are normalized to obtain the final weights of the candidate values.

[0161] Specifically, in order to eliminate invalid noise votes from the vote-counting results after an attack, quantify the credibility of each candidate value, and finally select the candidate watermark value closest to the true watermark value for each hash slot, this example uses slot weight analysis so that the weight calculation fits different data volumes and different attack scenarios, ensuring the accuracy of candidate value selection. Slot weight analysis specifically includes the steps of basic weight calculation, sample size branch and mean correction, noise clipping and attack penalties, and normalization.

[0162] The following values are defined in the slot weight (slot_weight) scoring rule, where count is the candidate votes, total is the total votes in the slot, μ is the slot mean, and σ is the standard deviation.

[0163] Slot distribution: Suppose there are 30 slots, and each slot carries 2 bits. The number of rows is denoted as N, and the average number of samples per slot is λ = N / 30, with the per-slot count following an approximate Poisson distribution. When there are 40 rows, λ ≈ 1.33, and the expected number of empty slots is 7 to 8; when there are 100 rows, λ ≈ 3.33, and the expected number of empty slots falls to about 1.
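The expected number of empty slots under the Poisson approximation is slot_count × e^(-λ), since e^(-λ) is the probability that a given slot receives no row. A small Python sketch, with a hypothetical function name, evaluates this for the two row counts mentioned:

```python
import math

def expected_empty_slots(rows: int, slot_count: int = 30) -> float:
    # Average number of rows per slot.
    lam = rows / slot_count
    # Under a Poisson model, a slot stays empty with probability exp(-lam).
    return slot_count * math.exp(-lam)

empty_at_40 = expected_empty_slots(40)    # roughly 7.9 empty slots
empty_at_100 = expected_empty_slots(100)  # roughly 1.1 empty slots
```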

[0164] For basic weight calculation:

[0165] Treat each slot as a "vote box". See which candidate value (00 / 01 / 10 / 11) has more votes and appears concentrated in the same slot. The basic weight is "the number of votes × 1024, and then divided by (32 + the number of rows where this value persists)". The more dispersed the rows are, the larger the denominator, and the more the weight is suppressed.

[0166] For sample size branch and mean correction:

[0167] According to the size of the total votes in the slot total_votes = Σ count_v, the algorithm is divided into two paths:

[0168] 1. Low sample stage (total_votes < SORT3_HIGH_SAMPLE_THRESHOLD, where the threshold is 500): count_v is accumulated directly into w_base(v), that is, w_low(v) = w_base(v) + count_v. In other words, when the total number of votes is small (< 500), points are simply added to the candidates with more votes, so that the high-vote candidate retains an absolute advantage;

[0169] 2. High sample stage (total_votes ≥ SORT3_HIGH_SAMPLE_THRESHOLD, that is, ≥ 500): Calculate the slot mean μ = (Σ v * count_v) / total_votes, and then adjust the weight according to the deviation between the candidate and the mean. When the number of votes is sufficient, see how far each candidate is from the "average value". Those that are far away may be noise, and their weight is divided by (1 + deviation) so that it drops:

[0170] deviation_v = |v - μ|, which is the distance between the candidate and the mean.

[0171] w_high(v) = max(1, w_base(v) / (1 + deviation_v))

[0172] Since each slot only has per_embed_length = 2 (candidate values 0~3), deviation_v lies in the range [0, 3]. The larger the deviation, the larger the denominator, and the more significant the weight decay.

[0173] Ultimately, w_stats(v) is either w_low or w_high obtained from the above branches.

[0174] Main function: If the candidate is close to the mean, deviation_v≈0, the denominator (1+deviation_v)≈1, so w_high≈w_base, and its weight remains basically unchanged. If the candidate is far from the mean, for example, with a deviation of 2, then the denominator is 3, and the weight will be divided by 3, meaning it is more like noise. The max(1, …) layer ensures that even if the division is very small, the weight is at least 1, avoiding the candidate being completely lost in subsequent processes, so that the DFS / erasing strategy can still verify it.
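The sample size branch can be sketched in Python as follows. The basic weights w_base are taken as an input because the "number of rows where this value persists" term is not defined precisely enough here to reproduce, and the 500-vote threshold is the value given for SORT3_HIGH_SAMPLE_THRESHOLD. The function name statistical_weights is hypothetical.

```python
SORT3_HIGH_SAMPLE_THRESHOLD = 500

def statistical_weights(counts: dict[int, int], w_base: dict[int, float]) -> dict[int, float]:
    total_votes = sum(counts.values())
    if total_votes == 0:
        return {}
    if total_votes < SORT3_HIGH_SAMPLE_THRESHOLD:
        # Low sample stage: w_low(v) = w_base(v) + count_v.
        return {v: w_base[v] + counts[v] for v in counts}
    # High sample stage: vote-weighted mean of the candidate values in this slot.
    mu = sum(v * c for v, c in counts.items()) / total_votes
    # w_high(v) = max(1, w_base(v) / (1 + |v - mu|)).
    return {v: max(1.0, w_base[v] / (1 + abs(v - mu))) for v in counts}

low = statistical_weights({1: 10, 2: 1}, {1: 100.0, 2: 10.0})
high = statistical_weights({1: 540, 3: 60}, {1: 120.0, 3: 2.0})
```

In the second call the mean is 1.2, so candidate 1 is divided by 1.2 while candidate 3 is divided by 2.8 and then held at the floor of 1.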

[0175] Regarding noise clipping and attack penalties:

[0176] Two rules are applied before w_stats are calculated:

[0177] Minimum vote threshold: When the total number of votes in a slot (boxes[i].total) ≥ SORT3_MIN_NOISE_COUNT (= 8), a candidate must satisfy count_v ≥ ceil(total_votes * SORT3_NOISE_RATIO_NUM / SORT3_NOISE_RATIO_DEN) to be retained. With SORT3_NOISE_RATIO_NUM = 1 and SORT3_NOISE_RATIO_DEN = 10, this means candidates with a vote percentage below 10% are removed. If all candidates are filtered out, the candidate with the highest weight is retained as a minimum. Main function: the 10% threshold filters out low-vote noise, and applying it only once a slot has 8 or more votes avoids accidentally deleting all candidates when the sample size is too small.
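A minimal Python sketch of the minimum vote threshold is given below. The constants carry the values stated in the text; the function name clip_noise is hypothetical, and the fallback uses the highest vote count as a stand-in for "highest weight", which is an assumption made only to keep the example self-contained.

```python
SORT3_MIN_NOISE_COUNT = 8
SORT3_NOISE_RATIO_NUM = 1
SORT3_NOISE_RATIO_DEN = 10

def clip_noise(counts: dict[int, int]) -> dict[int, int]:
    total_votes = sum(counts.values())
    # Below the minimum noise count, the slot is left untouched.
    if total_votes < SORT3_MIN_NOISE_COUNT:
        return dict(counts)
    # Integer ceiling of total_votes * NUM / DEN.
    minimum = -((-total_votes * SORT3_NOISE_RATIO_NUM) // SORT3_NOISE_RATIO_DEN)
    kept = {v: c for v, c in counts.items() if c >= minimum}
    if not kept:
        # Assumption: the highest vote count stands in for the highest weight.
        best = max(counts, key=counts.get)
        kept = {best: counts[best]}
    return kept

filtered = clip_noise({0: 1, 1: 90, 2: 9})  # threshold is 10 votes out of 100
small = clip_noise({0: 1, 1: 6})            # only 7 votes, so no filtering
```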

[0178] For normalization:

[0179] After all candidates have undergone the above weighting, the final slot weights will be uniformly obtained:

[0180] w_final(v) = w_stats(v) / Σ w_stats(v)

[0181] Σ w_stats(v) is the sum of the weights of all candidates in the current slot. w_final is used to construct a priority enumeration list for the DFS / empty slot strategy (lower weight candidates enter DFS first). Its main function is to: represent the absolute score of candidate v after multiple rounds of weighting based on votes, time span, mean deviation, etc.; summing the w_stats of all candidates in each slot gives Σ w_stats(v); dividing a candidate's w_stats(v) by the sum gives the candidate's "share" in that slot, which is the final relative weight w_final(v). This ensures that the sum of w_final for all candidates in the same slot is always equal to 1, facilitating direct comparison of "who is most reliable, who is next, and who should be included in DFS / erased".
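The normalization step is a plain division by the slot total, sketched here in Python with a hypothetical function name:

```python
def normalize(w_stats: dict[int, float]) -> dict[int, float]:
    # Sum of the statistical weights of all candidates in this slot.
    total = sum(w_stats.values())
    if total == 0:
        return {}
    # Each candidate's share of the slot total; the shares sum to 1.
    return {v: w / total for v, w in w_stats.items()}

w_final = normalize({0: 30.0, 1: 10.0})
# Candidates ordered from most to least reliable.
ranking = sorted(w_final, key=w_final.get, reverse=True)
```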

[0182] In one specific implementation, as shown in Table 6 below:

[0183]

[0184] Table 6

[0185] In one embodiment of the present invention, the specific process of "obtaining the target original watermark value after combination and verification" can be further explained in conjunction with the following description.

[0186] Extract the candidate value with the highest final weight in the hash slot and concatenate them to obtain the initial watermark sequence;

[0187] If the initial watermark sequence contains empty slots and the number of empty slots is within the range that the error correction code can correct, then correction is performed according to the error correction code.

[0188] If the number of empty slots exceeds the erasure range, then exhaustively search the low-weight slots in order of increasing weight, and verify them in conjunction with the error correction code, until the message authentication code verification is passed.

[0189] Specifically, the candidate watermark values selected from each hash slot are verified using erasure coding and message authentication codes to restore the original watermark value consistent with the embedding stage. The base sequence is concatenated from the highest-weighted candidate values, and a small number of empty slots are handled with error-correcting codes. Finally, a large number of empty slots are resolved through finite exhaustive search and message authentication code verification, effectively preventing watermark forgery and tampering. As an example, Reed-Solomon error-correcting codes are used. With small sample data, significant error correction can be performed, achieving the goal of restoring the watermark.

[0190] For verification with message authentication codes:

[0191] Any binary bit sequence generated after the watermark restoration process needs to be verified by the message authentication code.

[0192] The verification process first obtains the binary bit value, in which the first 32 bits are mark_bits and the last 12 bits are hmac`. With the key denoted as KEY, compute the message authentication code hmac = HASH(KEY|mark_bits), take the first 12 bits of hmac, and compare them with hmac`. If they are the same, the verification passes.
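The check can be sketched in Python as follows, under several assumptions that the text does not fix: SHA-256 is used as HASH, "KEY|mark_bits" is read as the key bytes followed by the mark bits encoded as ASCII '0'/'1' characters, and the 44 bits that remain after error correction are handled as a string. The names mac12 and verify_watermark are hypothetical.

```python
import hashlib

def mac12(key: bytes, mark_bits: str) -> str:
    # Assumption: HASH(KEY|mark_bits) is SHA-256 over key bytes followed by the bit string.
    digest = hashlib.sha256(key + mark_bits.encode("ascii")).digest()
    # Keep the first 12 bits of the digest as a 12-character bit string.
    return format(int.from_bytes(digest[:2], "big") >> 4, "012b")

def verify_watermark(key: bytes, bits: str) -> bool:
    if len(bits) != 44:
        return False
    # First 32 bits are the watermark, last 12 bits are the received tag.
    mark_bits, received = bits[:32], bits[32:]
    return mac12(key, mark_bits) == received

key = b"demo-key"                  # hypothetical key
mark = format(0x1000, "032b")      # 32-bit watermark value
good = mark + mac12(key, mark)
tampered = good[:-1] + ("1" if good[-1] == "0" else "0")
```

A 12-bit tag accepts a random candidate with probability 1/4096, which is why the tag is combined with the Reed-Solomon check during the exhaustive search.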

[0193] For the Reed-Solomon decoding:

[0194] When using Reed-Solomon erasure with GF(15,9), a 4-symbol checksum can be used to recover up to 4 erased symbols. Other GF types can also be selected.

[0195] When 2 bits are placed in the same slot, the two slots form one symbol. See Table 7:

[0196]

[0197] Table 7

[0198] Regarding the watermark restoration process:

[0199] The detection strategy is executed in the following order: "highest weight combination → empty slot filling → Reed-Solomon decoding with erasure → DFS (depth-first search)". When Reed-Solomon decoding is successfully performed and the message authentication code is verified, a binary bit sequence will be obtained.

[0200] Extracting the binary bit sequence, the resulting binary value is converted to form the embedded watermark.

[0201] Highest weighted combination:

[0202] If each slot in the slot voting box has at least one total vote, then the data with the highest weight in all slots is serialized. Table 8 uses a 4-slot system with 2 bits embedded per row as an example, assuming the voting result is:

[0203]

[0204] Table 8

[0205] Take the value containing the maximum weight 80 from slot 0, which is 00;

[0206] Take the value of the maximum weight 90 from slot 1, which is 01;

[0207] Take the value of the maximum weight 50 from slot 2, which is 01;

[0208] Take the value of the largest weight 100 from slot 3, which is 10;

[0209] The final watermark sequence is 00 01 01 10.
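The highest weighted combination can be sketched in Python as below. Only the maximum weights (80, 90, 50, 100) and their values come from the example; the runner-up entries are hypothetical placeholders, and the function name is an assumption.

```python
def highest_weight_sequence(slot_weights: list[dict[str, float]]) -> list[str]:
    # For every slot, keep the candidate value with the largest weight.
    return [max(weights, key=weights.get) for weights in slot_weights]

slot_weights = [
    {"00": 80, "11": 5},    # slot 0 (second entry is hypothetical)
    {"01": 90, "10": 3},    # slot 1 (second entry is hypothetical)
    {"01": 50, "00": 20},   # slot 2 (second entry is hypothetical)
    {"10": 100, "11": 1},   # slot 3 (second entry is hypothetical)
]
sequence = " ".join(highest_weight_sequence(slot_weights))
```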

[0210] Empty slots filled:

[0211] If at most one value in all slots of the ballot box has a vote count, then the empty slot filling scheme is used.

[0212] If a symbol in the slot ballot box contains an empty slot, and the total number of symbols containing empty slots is within the erasure-correction range of the Reed-Solomon code, then the erasure code is used directly for correction.

[0213] When the total number of symbols containing empty slots exceeds the erasure-correction range of the Reed-Solomon code, the DFS (depth-first search) approach is used.

[0214] DFS (Depth-First Search)

[0215] Based on the weights, the slots with the lowest weights are first marked as Reed-Solomon erasures, and the remaining low-weight slots are searched using depth-first search (DFS).

[0216] For example, see Table 9. There are 30 slots, with weight ratios ranging from 1% to 30%. Exhaustive search is performed based on the search threshold configured in the DFS. The number of exhaustive searches is determined by the threshold.

[0217]

[0218] Table 9
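The bounded exhaustive search can be illustrated with a small depth-first enumeration in Python. This is a sketch under assumptions: the verify callback stands in for the Reed-Solomon decoding plus message authentication code check, max_tries plays the role of the configured search threshold, and the function and parameter names are hypothetical.

```python
from typing import Callable, Optional

def dfs_fill(slots: list[Optional[str]], open_slots: list[int], candidates: list[str],
             verify: Callable[[str], bool], max_tries: int = 4096) -> Optional[str]:
    """Depth-first enumeration of values for the slot positions in open_slots."""
    tries = 0

    def recurse(pos: int) -> Optional[str]:
        nonlocal tries
        if pos == len(open_slots):
            # Every open slot has a value: spend one attempt on verification.
            if tries >= max_tries:
                return None
            tries += 1
            bits = "".join(slots)
            return bits if verify(bits) else None
        for value in candidates:
            slots[open_slots[pos]] = value
            found = recurse(pos + 1)
            if found is not None:
                return found
            if tries >= max_tries:
                break
        slots[open_slots[pos]] = None  # undo before backtracking
        return None

    return recurse(0)

options = ["00", "01", "10", "11"]
target = "00110110"  # stand-in for a sequence that passes verification
result = dfs_fill(["00", None, "01", None], [1, 3], options, lambda b: b == target)
```

With 2 bits per slot, searching 6 open slots costs at most 4^6 = 4096 verification attempts, which is why the threshold directly bounds the number of slots that can be searched.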

[0219] The following are specific examples:

[0220] Example 1: Baseline Embedding and Detection

[0221] Watermarking is performed on 1000 rows of consumption records. The watermark value is set to 0x1000, with 2 bits embedded per row. The list of consumption records is shown in Table 10.

[0222]

[0223] Table 10

[0224] It uses GF(15,9), with 4 bits per symbol, a 12-bit message authentication code, and a 4-symbol (16-bit) RS checksum. It has a total of 60 bits and 30 slots.

[0225] Owing to the uniformity brought about by the hash avalanche effect, the above rows are mapped pseudo-randomly across the 30 slots, with per-slot counts following an approximate Poisson distribution. Furthermore, each slot receives votes for only a single value. See Table 11:

[0226]

[0227] Table 11

[0228] You can directly restore the watermark.

[0229] Example 2: Randomly selecting a portion of the data

[0230] When using 60 bits, with 2 bits per slot, there are 30 slots in total. The expected number of rows needed for every slot to receive at least one vote is E(30) = 30 × H_30 = 30 × 3.994987 ≈ 119.85 rows, where H_30 is the 30th harmonic number. The Reed-Solomon error-correcting code can correct 4 symbols (8 slots). Adding a DFS exhaustive search of 4096 attempts, that is, 4^6 (6 slots), gives E(16) = 16 × 3.994987 ≈ 63.919792. When randomly selecting 63 rows, almost 99.99% can be restored.
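The figures in this example can be checked numerically. The Python sketch below only reproduces the stated arithmetic: 3.994987 is the 30th harmonic number, and the stated E(16) equals 16 times that same constant. It does not validate the 99.99% recovery claim.

```python
def harmonic(n: int) -> float:
    # n-th harmonic number: 1 + 1/2 + ... + 1/n.
    return sum(1.0 / k for k in range(1, n + 1))

h30 = harmonic(30)   # about 3.994987
e30 = 30 * h30       # about 119.85 rows
e16 = 16 * h30       # about 63.92 rows, matching the stated E(16)
dfs_tries = 4 ** 6   # 6 slots with 4 candidate values each
```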

[0231] Example 3: Sorting and Randomization

[0232] Because this invention hashes the identifier together with the fixed part of the embeddable data and then derives the slot by taking the hash modulo the total number of slots, the slot values are fully determined by the determinism of the hash, and the extraction results are not affected no matter how the row order changes.

[0233] Taking the specific data implemented in step S130 as an example, it is shown in Table 12 below:

[0234]

[0235] Table 12

[0236] Regardless of how many rows are before or after N rows, the result of HASH(ID, mutable fixed value) will not be affected. Therefore, its slot_index is stable and can resist sorting and out-of-order attacks.
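Order independence can be demonstrated with a short Python sketch that tallies votes before and after shuffling. The rows are synthetic, and the hashing details (SHA-256, a "|" separator, the first 4 bytes of the digest) are assumptions made for illustration only.

```python
import hashlib
import random

def slot_index(row_id: str, mutable: int, k: int, slot_count: int) -> int:
    # Hash the identifier with the fixed high-order part; no row position is involved.
    digest = hashlib.sha256(f"{row_id}|{mutable >> k}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % slot_count

def tally(rows: list[tuple[str, int]], k: int, slot_count: int) -> list[list[int]]:
    boxes = [[0] * (1 << k) for _ in range(slot_count)]
    for row_id, mutable in rows:
        # Each row votes with its own low-order bits in its own slot.
        boxes[slot_index(row_id, mutable, k, slot_count)][mutable & ((1 << k) - 1)] += 1
    return boxes

rows = [(f"user-{i}", 20250101030000 + 7 * i) for i in range(200)]  # synthetic rows
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
same = tally(rows, 2, 30) == tally(shuffled, 2, 30)
```

The same reasoning covers deletion: removing rows only lowers some counters and never moves a vote to another slot.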

[0237] Example 4 Noise Attack

[0238] Attack 1: Duplicate Data Attack

[0239] In watermarking, when duplicate data is added, based on the calculation of the slot index value, the duplicate data will all vote for the same value in the same slot. This type of data can be easily extracted based on the current slot weight analysis.

[0240] Attack 2: Random Data Attack

[0241] For random data attacks, due to the principle of hash uniformity, voting will be evenly distributed across the voting slots. When noise attacks occur, the noise can be easily analyzed based on the weights. Furthermore, based on the various processes in watermark restoration, it can effectively resist related attacks.

[0242] As can be seen from the above embodiments, the present invention can reliably extract watermarks in various scenarios, including no-attack, small-sample deletion, disordered sorting, duplicate data insertion, and random noise insertion, verifying the comprehensive anti-attack capability and practicality of the present invention. When embedding the present invention, only the low-order bits of the variable data area need to be changed; during extraction, only the primary key or identifier and the low-order bits of the variable data area are required for extraction, without depending on other columns. It also has the following characteristics:

[0243] Watermark preservation in sorted and out-of-order situations: The original watermark carrier can still be located even after the data is rearranged by timestamp, sequence number or third-party rules.

[0244] Robust detection under large-scale data noise.

[0245] Only a small portion of the samples is needed to reconstruct the watermark.

[0246] As the device embodiment is basically similar to the method embodiment, the description is relatively simple, and relevant parts can be found in the description of the method embodiment.

[0247] Referring to Figure 5, an embodiment of an embedding device for hash slot watermarking based on structured data is shown.

[0248] Specifically, it includes:

[0249] The hash slot calculation module 510 is used to obtain the watermark data to be embedded and the structured data to be embedded, and to determine the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row.

[0250] The first slot index calculation module 520 is used to generate the slot index value of the target row based on the identifier of the target row of the structured data to be embedded, the first preset part of the embeddable data of the target row, and the total number of slots.

[0251] The watermark fragment calculation module 530 is used to determine the watermark bit fragment to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits for the row.

[0252] The watermark embedding module 540 is used to embed the watermark bit fragment into a second preset portion of the embeddable data of the target row; wherein the first preset portion and the second preset portion constitute the embeddable data of the target row; the first preset portion is a fixed high-order portion of the embeddable data, which remains unchanged during the process; the second preset portion is a variable low-order portion of the embeddable data, which can be modified during the process.

[0253] In one embodiment of the present invention, the first slot index calculation module 520 includes:

[0254] The hash calculation submodule is used to perform hash calculation on the identifier of the target row and the first preset part of the embeddable data of the target row to obtain the corresponding hash value;

[0255] The modulo submodule is used to obtain the slot index value corresponding to the target row by taking the hash value modulo the total number of slots.

[0256] In one embodiment of the present invention, the watermark embedding module 540 includes:

[0257] The first embedding submodule is used to replace the second preset portion of the embeddable data of the target row with the watermark bit fragment;

[0258] or,

[0259] The second embedding submodule is used to adjust the value of the current low-order bit of the embeddable data within a preset range based on the difference between the watermark bit segment and the current low-order bit of the embeddable data.

[0260] Referring to Figure 6, an embodiment of a hash slot watermark extraction device based on structured data is shown.

[0261] Specifically, it includes:

[0262] The voting box establishment module 610 is used to acquire the watermark data to be extracted and the structured data to be extracted, determine the total number of slots based on the length of the watermark data to be extracted and the preset number of embedded bits per row, and establish a voting box for each hash slot.

[0263] The second slot index calculation module 620 is used to determine the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots.

[0264] The slot voting module 630 is used to determine the target hash slot corresponding to the target row based on the slot index value, and to count the votes in the voting box of the target hash slot based on the low-order value of the embeddable data of the target row as the voting value.

[0265] The watermark restoration module 640 is used to determine the candidate watermark value of each hash slot based on the vote counting results of each hash slot through weight analysis, and obtain the target original watermark value after combination and verification.
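The voting stage performed by modules 610 to 630 can be sketched in Python as below. The sketch assumes integer embeddable data, SHA-256 as the hash, and a slot count obtained by rounding the watermark length up to a whole number of fragments; none of these choices is fixed by the source, and the names `slot_index` and `collect_votes` are hypothetical.

```python
import hashlib
import math
from collections import Counter


def slot_index(row_id: str, fixed_high: int, total_slots: int) -> int:
    # Assumed hash: SHA-256 over "identifier|high-part", reduced modulo the slot count.
    material = row_id.encode("utf-8") + b"|" + str(fixed_high).encode("ascii")
    return int.from_bytes(hashlib.sha256(material).digest(), "big") % total_slots


def collect_votes(rows, watermark_len_bits: int, bits_per_row: int):
    """Build one voting box per hash slot and count the low-order value of every row.

    `rows` is an iterable of (identifier, embeddable integer value) pairs in any order.
    """
    total_slots = math.ceil(watermark_len_bits / bits_per_row)
    boxes = [Counter() for _ in range(total_slots)]
    mask = (1 << bits_per_row) - 1
    for row_id, value in rows:
        high, low = value >> bits_per_row, value & mask
        boxes[slot_index(row_id, high, total_slots)][low] += 1
    return boxes
```

Since each row votes independently, the result does not depend on row order, and deleted rows only reduce the number of votes in their slots.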

[0266] In one embodiment of the present invention, the watermark restoration module 640 includes:

[0267] The basic weight calculation submodule is used to determine the basic weight of the candidate values in the voting box;

[0268] The first statistical weight calculation submodule is used to add the number of votes to the basic weight to obtain the statistical weight if the total number of votes in the corresponding hash slot is less than a preset threshold.

[0269] The second statistical weight calculation submodule is used to determine the average number of votes in the hash slot if the total number of votes in the corresponding hash slot is greater than or equal to the preset threshold, and to determine the statistical weight based on the deviation between the candidate value and the average value.

[0270] The noise trimming submodule is used to filter out candidate values whose vote percentage is less than a preset ratio when the total number of votes in the hash slot reaches a preset minimum noise count threshold.

[0271] The normalization submodule is used to normalize the statistical weights of the candidate values to obtain the final weights of the candidate values.
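The weight analysis performed by these submodules can be illustrated with a short Python sketch. The source does not give formulas or parameter values, so the following are assumptions: a constant basic weight, a statistical weight of "basic weight plus positive deviation of the vote count from the slot average" for well-populated slots, and the default thresholds shown. The function name `final_weights` is hypothetical.

```python
from collections import Counter


def final_weights(box: Counter, base_weight: float = 1.0,
                  small_slot_threshold: int = 10,
                  noise_min_votes: int = 20,
                  noise_ratio: float = 0.05) -> dict:
    """Return normalized final weights for the candidate values of one voting box."""
    total = sum(box.values())
    if total == 0:
        return {}  # empty slot: no candidates
    if total < small_slot_threshold:
        # Few votes: statistical weight = basic weight + number of votes.
        stat = {v: base_weight + n for v, n in box.items()}
    else:
        # Enough votes: weight grows with the deviation above the slot average (assumed form).
        mean = total / len(box)
        stat = {v: base_weight + max(0.0, n - mean) for v, n in box.items()}
    if total >= noise_min_votes:
        # Noise trimming: drop candidates whose vote share is below the preset ratio.
        kept = {v: w for v, w in stat.items() if box[v] / total >= noise_ratio}
        stat = kept or stat  # never trim a slot down to nothing
    norm = sum(stat.values())
    return {v: w / norm for v, w in stat.items()}
```

For example, a box with votes `{0b10: 3, 0b01: 1}` falls below the small-slot threshold and yields weights of 2/3 and 1/3 under these assumptions.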

[0272] In one embodiment of the present invention, the watermark restoration module 640 further includes:

[0273] The splicing submodule is used to extract the candidate value with the highest final weight in the hash slot and splice them to obtain the initial watermark sequence.

[0274] The correction submodule is used to correct the initial watermark sequence according to the error correction code if there are empty slots and the number of empty slots is within the erasure-correction range of the error correction code.

[0275] The verification submodule is used to exhaustively search the low-weight slots in ascending order of weight if the number of empty slots exceeds the erasure-correction range, and to verify the result in conjunction with the error correction code until the message authentication code verification is passed.
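The splicing and exhaustive-search steps can be sketched as follows. This is a simplified illustration under stated assumptions: error correction decoding is omitted entirely (it would sit inside the `verify` callback), the watermark is assumed to be laid out as message bits followed by a truncated HMAC-SHA256 tag, and the names `splice`, `recover_watermark`, `make_verifier`, and `max_uncertain` are hypothetical.

```python
import hashlib
import hmac
from itertools import product


def splice(slots):
    """Pick the highest-weight candidate of each slot; None marks an empty slot."""
    return [max(s, key=s.get) if s else None for s in slots]


def recover_watermark(slots, bits_per_row, verify, max_uncertain=4):
    """Search the least trusted slots, in ascending order of weight, until `verify` passes.

    `slots` is a list of {candidate value: final weight} dicts, one per hash slot.
    """
    initial = splice(slots)
    empty = [i for i, frag in enumerate(initial) if frag is None]
    if not empty and verify(initial):
        return initial
    # Empty slots count as weight 0 and are therefore searched first.
    order = sorted(range(len(slots)), key=lambda i: max(slots[i].values(), default=0.0))
    for n in range(max(1, len(empty)), min(max_uncertain, len(slots)) + 1):
        uncertain = order[:n]
        for combo in product(range(1 << bits_per_row), repeat=n):
            trial = list(initial)
            for i, frag in zip(uncertain, combo):
                trial[i] = frag
            if None not in trial and verify(trial):
                return trial
    return None  # search budget exhausted


def make_verifier(key: bytes, bits_per_row: int, tag_bits: int):
    """Assumed layout: message bits followed by the first `tag_bits` bits of an HMAC tag."""
    def verify(fragments):
        bits = "".join(format(f, f"0{bits_per_row}b") for f in fragments)
        message, tag = bits[:-tag_bits], bits[-tag_bits:]
        digest = hmac.new(key, message.encode("ascii"), hashlib.sha256).digest()
        expected = "".join(format(b, "08b") for b in digest)[:tag_bits]
        return hmac.compare_digest(tag, expected)
    return verify
```

The search cost grows exponentially with the number of slots revisited, which is why the sketch bounds it with `max_uncertain`; a longer authentication tag lowers the chance that a wrong combination is accepted.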

[0276] Referring to Figure 7, a computer electronic device for implementing the hash slot watermark embedding method based on structured data or the hash slot watermark extraction method based on structured data according to the present invention is shown, which may specifically include the following:

[0277] The aforementioned computer electronic device 1 is manifested in the form of a general-purpose computing device. The components of the computer electronic device 1 may include, but are not limited to: one or more processors or processing units 3, memory 8, and a bus 4 connecting different system components (including memory 8 and processing unit 3).

[0278] Bus 4 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

[0279] Computer electronic device 1 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer electronic device 1, including volatile and non-volatile media, removable and non-removable media.

[0280] Memory 8 may include computer system readable media in the form of volatile memory, such as random access memory 9 and / or cache memory 10. Computer electronic device 1 may further include other removable / non-removable, volatile / non-volatile computer system storage media. By way of example only, storage system 11 may be used to read and write non-removable, non-volatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in Figure 7, a disk drive for reading and writing to a removable non-volatile disk (such as a "floppy disk") and an optical disk drive for reading and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 4 via one or more data media interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 13 configured to perform the functions of the embodiments of this application.

[0281] A program / utility 12 having a set (at least one) of program modules 13 may be stored, for example, in memory. Such program modules 13 include—but are not limited to—an operating system, one or more application programs, other program modules 13, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 13 typically perform the functions and / or methods described in the embodiments of this application.

[0282] The computer electronic device 1 can also communicate with one or more external devices 2 (e.g., keyboard, pointing device, display 7, camera, etc.), with one or more devices that enable an operator to interact with the computer electronic device 1, and / or with any device (e.g., network card, modem, etc.) that enables the computer electronic device 1 to communicate with one or more other computing devices. This communication can be performed through the I / O interface 6. Furthermore, the computer electronic device 1 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and / or a public network such as the Internet) through the network adapter 5. As shown in Figure 7, network adapter 5 communicates with other modules of computer electronic device 1 via bus 4. It should be understood that, although not shown in Figure 7, other hardware and / or software modules may be used in conjunction with computer electronic device 1, including but not limited to: microcode, device drivers, redundant processing units 3, external disk drive arrays, RAID systems, tape drives, and data backup storage systems 11.

[0283] The processing unit 3 executes various functional applications and data processing by running programs stored in memory 8, such as implementing a hash slot watermark embedding method based on structured data or a hash slot watermark extraction method based on structured data provided in the embodiments of this application.

[0284] That is, when the processing unit 3 executes the above program, it performs the following: acquiring the watermark data to be embedded and the structured data to be embedded, and determining the total number of slots in the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row; generating a slot index value for the target row based on the identifier of the target row of the structured data to be embedded, a first preset portion of the embeddable data of the target row, and the total number of slots; determining the watermark bit fragment to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits per row; embedding the watermark bit fragment into the second preset portion of the embeddable data of the target row; wherein, the first preset portion and the second preset portion constitute the embeddable data of the target row, the first preset portion is the fixed high-order portion of the embeddable data, which remains unchanged during the process; the second preset portion is the variable low-order portion of the embeddable data, which can be modified during the process. 
Alternatively, when the processing unit 3 executes the above program, it performs the following: acquiring the watermark data to be extracted and the structured data to be extracted, determining the total number of slots based on the length of the watermark data to be extracted and the preset number of embedding bits per row, and establishing a voting box for each hash slot; determining the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots; determining the target hash slot corresponding to the target row based on the slot index value, and counting the votes in the voting box of the target hash slot using the low-order part of the embeddable data of the target row as the voting value; and determining the candidate watermark value of each hash slot through weight analysis based on the voting results of each hash slot, and obtaining the target original watermark value after combination and verification.

[0285] In the embodiments of this application, this application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements a hash slot watermark embedding method based on structured data or a hash slot watermark extraction method based on structured data as provided in all embodiments of this application.

[0286] That is, when the program is executed by the processor, it performs the following: acquiring the watermark data to be embedded and the structured data to be embedded, and determining the total number of slots in the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row; generating a slot index value for the target row based on the identifier of the target row of the structured data to be embedded, a first preset portion of the embeddable data of the target row, and the total number of slots; determining the watermark bit fragment to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits per row; embedding the watermark bit fragment into the second preset portion of the embeddable data of the target row; wherein, the first preset portion and the second preset portion constitute the embeddable data of the target row, the first preset portion is the fixed high-order portion of the embeddable data, which remains unchanged during the process; the second preset portion is the variable low-order portion of the embeddable data, which can be modified during the process. 
Alternatively, when the program is executed by the processor, the following steps can be implemented: Acquire the watermark data to be extracted and the structured data to be extracted; determine the total number of slots based on the length of the watermark data to be extracted and the preset number of embedding bits per row; establish a voting box for each hash slot; determine the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order bits of the embeddable data in the target row, and the total number of slots; determine the target hash slot corresponding to the target row based on the slot index value, and count the votes in the voting box of the target hash slot based on the low-order bits of the embeddable data in the target row; based on the vote count results of each hash slot, determine the candidate watermark value for each hash slot through weight analysis, and obtain the target original watermark value after combination and verification.

[0287] Any combination of one or more computer-readable media may be used. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.

[0288] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that is capable of sending, propagating, or transporting a program for use by or in connection with an instruction execution system, apparatus, or device.

[0289] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof. These programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the operator's computer, partially on the operator's computer, as a standalone software package, partially on the operator's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the operator's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider). The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably.

[0290] Although preferred embodiments of the present application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the embodiments of the present application.

[0291] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes said element.

[0292] The above provides a detailed description of the method and apparatus for embedding and extracting hash slot watermarks based on structured data provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The description of the above embodiments is only intended to help in understanding the method and its core ideas. At the same time, those skilled in the art may, based on the ideas of this application, make changes to the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A hash slot watermark embedding method based on structured data, characterized in that it includes the following steps: obtaining the watermark data to be embedded and the structured data to be embedded, and determining the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row; generating a slot index value for the target row based on the identifier of the target row of the structured data to be embedded, the first preset portion of the embeddable data of the target row, and the total number of slots, specifically: performing a hash calculation on the identifier of the target row and the first preset portion of the embeddable data of the target row to obtain the corresponding hash value, and obtaining the slot index value corresponding to the target row by taking the hash value modulo the total number of slots; determining, based on the slot index value and the preset number of embedding bits per row, the watermark bit fragment to be embedded in the target row from the watermark data to be embedded; and replacing the second preset portion of the embeddable data of the target row with the watermark bit fragment, or adjusting the current low-order value of the embeddable data within a preset range based on the difference between the current low-order value of the embeddable data and the watermark bit fragment; wherein the first preset portion and the second preset portion constitute the embeddable data of the target row, the first preset portion is the fixed high-order portion of the embeddable data, which remains unchanged during the process, and the second preset portion is the variable low-order portion of the embeddable data, which can be modified during the process.

2. A hash slot watermark extraction method based on structured data, used to extract the watermark embedded by the embedding method of claim 1, characterized in that it includes the following steps: obtaining the watermark data to be extracted and the structured data to be extracted, determining the total number of slots based on the length of the watermark data to be extracted and the preset number of embedding bits per row, and establishing a voting box for each hash slot; determining the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots; determining the target hash slot corresponding to the target row based on the slot index value, and counting votes in the voting box of the target hash slot based on the low-order value of the embeddable data of the target row; and determining, based on the vote counting results of each hash slot, the candidate watermark value of each hash slot through weight analysis, and obtaining the target original watermark value after combination and verification.

3. The extraction method according to claim 2, characterized in that the step of determining the candidate watermark value of each hash slot through weight analysis based on the vote counting results of each hash slot includes: determining the basic weights of the candidate values in the voting box; if the total number of votes in the corresponding hash slot is less than a preset threshold, adding the number of votes to the basic weight to obtain the statistical weight; if the total number of votes in the corresponding hash slot is greater than or equal to the preset threshold, determining the average number of votes in the hash slot, and determining the statistical weight based on the deviation between the candidate value and the average value; when the total number of votes in the hash slot reaches a preset minimum noise count threshold, filtering out candidate values whose vote percentage is less than a preset ratio; and normalizing the statistical weights of the candidate values to obtain the final weights of the candidate values.

4. The extraction method according to claim 2, characterized in that the watermark data to be extracted includes a message authentication code and an error correction code, and the step of obtaining the target original watermark value after combination and verification includes: extracting the candidate value with the highest final weight in each hash slot and concatenating these values to obtain the initial watermark sequence; if the initial watermark sequence contains empty slots and the number of empty slots is within the range that the error correction code can correct, performing correction according to the error correction code; and if the number of empty slots exceeds the erasure-correction range, exhaustively searching the low-weight slots in ascending order of weight and verifying the result in conjunction with the error correction code until the message authentication code verification is passed.

5. An embedding device for hash slot watermarks based on structured data, characterized in that it includes: a hash slot calculation module, used to obtain the watermark data to be embedded and the structured data to be embedded, and to determine the total number of slots of the watermark data to be embedded based on the length of the watermark data to be embedded and the preset number of embedding bits per row; a slot index first calculation module, used to generate the slot index value of the target row based on the identifier of the target row of the structured data to be embedded, the first preset portion of the embeddable data of the target row, and the total number of slots, the slot index first calculation module including: a hash calculation submodule, used to perform hash calculation on the identifier of the target row and the first preset portion of the embeddable data of the target row to obtain the corresponding hash value, and a modulo submodule, used to obtain the slot index value corresponding to the target row by taking the hash value modulo the total number of slots; a watermark fragment calculation module, used to determine the watermark bit fragment to be embedded in the target row from the watermark data to be embedded based on the slot index value and the preset number of embedding bits per row; and a watermark embedding module, used to embed the watermark bit fragment into a second preset portion of the embeddable data of the target row, the watermark embedding module including: a first embedding submodule, used to replace the second preset portion of the embeddable data of the target row with the watermark bit fragment, or a second embedding submodule, used to adjust the current low-order value of the embeddable data within a preset range based on the difference between the current low-order value of the embeddable data and the watermark bit fragment; wherein the first preset portion and the second preset portion constitute the embeddable data of the target row, the first preset portion is a fixed high-order portion of the embeddable data, which remains unchanged during the process, and the second preset portion is a variable low-order portion of the embeddable data, which can be modified during the process.

6. A device for extracting hash slot watermarks based on structured data, used to extract the watermark embedded by the embedding device of claim 5, characterized in that it includes: a voting box establishment module, used to acquire the watermark data to be extracted and the structured data to be extracted, determine the total number of slots based on the length of the watermark data to be extracted and the preset number of embedding bits per row, and establish a voting box for each hash slot; a second slot index calculation module, used to determine the slot index value of the target row based on the identifier of the target row of the structured data to be extracted, the fixed high-order part of the embeddable data of the target row, and the total number of slots; a slot voting module, used to determine the target hash slot corresponding to the target row based on the slot index value, and to count the votes in the voting box of the target hash slot using the low-order value of the embeddable data of the target row as the voting value; and a watermark restoration module, used to determine the candidate watermark value of each hash slot through weight analysis based on the vote counting results of each hash slot, and to obtain the target original watermark value after combination and verification.

7. A computer electronic device, characterized in that it includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the steps of the hash slot watermark embedding method based on structured data as described in claim 1 or the hash slot watermark extraction method based on structured data as described in any one of claims 2 to 4.

8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the hash slot watermark embedding method based on structured data as described in claim 1 or the hash slot watermark extraction method based on structured data as described in any one of claims 2 to 4.