File digitization flow privacy protection method and system based on zero trust architecture

By employing a file digitization method based on a zero-trust architecture, and utilizing a bidirectional Transformer-Conditional Random Field model and homomorphic encryption technology, dynamic authorization and instant trust reconstruction are achieved during the file transfer process. This solves the problems of coarse privacy protection granularity and insufficient trustworthiness of the audit chain in existing technologies, thereby improving the security and controllability of the file collaboration process.

CN121413022B · Active · Publication Date: 2026-06-26 · CHINA NAT INST OF STANDARDIZATION

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA NAT INST OF STANDARDIZATION
Filing Date
2025-10-28
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies struggle to achieve real-time auditing and identity tracing in encrypted environments, cannot dynamically generate authorization policies, and suffer from issues such as coarse-grained privacy protection, delayed revocation responses, and insufficient credibility of audit chains during dynamic file transfer.

Method used

A file digitization method based on a zero-trust architecture is adopted. A semantic context tag set and dynamic authorization intent are generated through a bidirectional Transformer-Conditional Random Field semantic parsing model. Combined with the operation granularity within the homomorphic encryption domain, a mapping table between semantics and encrypted fragments is established, generating a homomorphic operation plan and a set of re-encryptable credentials. Homomorphic collaborative operations are then executed within the encrypted container set, the access commitment chain is verified in real time, and an instant trust reconstruction record is generated.

Benefits of technology

It enables dynamic authorization and instant trust reconstruction during file transfer, improving the security, controllability, and traceability of file collaboration, and ensuring privacy protection and real-time auditing capabilities in a secure environment.

✦ Generated by Eureka AI based on patent content.

Abstract drawing: Figure CN121413022B_ABST

Abstract

The application discloses a privacy protection method and system for digital file circulation based on a zero-trust architecture, and relates to the technical field of data security. The method comprises the following steps: based on a semantic context label set and a dynamic authorization intention, and combining the operation granularity in the homomorphic encryption domain, establishing a mapping table between semantics and encrypted fragments, and generating a homomorphic operation plan and a re-encryptable credential set; according to the homomorphic operation plan and the re-encryptable credential set, performing logical fragmentation and homomorphic encryption on the file to be circulated to form an encrypted container set, obtaining access commitment instances, and constructing the initial segment of an access commitment chain; performing homomorphic collaborative operations in the encrypted container set while checking the initial segment of the access commitment chain in real time, performing node revocation and authorization reversal according to the commitment chain state, and generating an updated access commitment chain and an instant trust reconstruction record. The application improves the security, controllability and traceability of the file collaboration process.

Description

Technical Field

[0001] This invention relates to the field of data security technology, and in particular to a method and system for privacy protection during the digital transfer of files based on a zero-trust architecture.

Background Technology

[0002] With the widespread adoption of digital office solutions, cloud storage, and remote collaboration, the frequency of electronic document circulation within and outside organizations has significantly increased. Documents are no longer limited to local transmission but are constantly migrating across distributed storage, heterogeneous terminals, and multi-role collaboration environments. Traditional access control systems are generally based on the concept of perimeter protection, restricting file access paths through authentication, access control lists (ACLs), and unified authentication mechanisms. However, in cloud collaboration, cross-domain sharing, and multi-tenant environments, network boundaries are gradually blurring, and trust assumptions are weakened. In recent years, Zero Trust security architecture has become an important development direction for data security systems. Its core concept is "never trust, always verify," meaning that every access request must be evaluated dynamically based on identity, behavior, and environment.

[0003] Existing technologies struggle to achieve real-time auditing and identity tracing in encrypted environments, and have limitations in addressing semantic differences, permission changes, and access revocation during dynamic file transfers. On one hand, existing file transfer security mechanisms largely remain at the static permission control level, lacking intelligent parsing of file content semantics and access behavior, making it difficult to dynamically generate authorization policies based on context. On the other hand, even with encrypted storage technology, the encrypted continuity of data is often disrupted during decryption and re-encryption, failing to support collaborative operations and trust reconstruction in a fully encrypted state. This results in problems such as coarse-grained privacy protection, delayed revocation responses, and insufficient credibility of the audit chain.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides a privacy protection method for digital file transfer based on a zero-trust architecture, to solve the technical problems that traditional file transfer processes depend on static trust boundaries and that operations on encrypted data cannot be verified.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] In a first aspect, the present invention provides a privacy protection method for digital file transfer based on a zero-trust architecture, comprising the following steps:

[0008] Collect the content category, sensitivity markers, operation type, collaboration role information, and access time and space markers of the files to be transferred, and parse them through a bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and dynamic authorization intent;

[0009] Based on the semantic context tag set and dynamic authorization intent, combined with the operation granularity within the homomorphic encryption domain, a mapping table between semantics and encryption fragments is established to generate homomorphic operation plans and a set of re-encryptable credentials;

[0010] Based on the homomorphic operation plan and the set of re-encryptable credentials, logical fragmentation and homomorphic encryption are performed on the file to be transferred to form a set of encrypted containers, and an access commitment instance is obtained to construct the initial fragment of the access commitment chain.

[0011] Perform homomorphic collaborative operations within the encrypted container set and verify the initial fragment of the access commitment chain in real time; based on the state of the commitment chain, perform node revocation and authorization reversal to generate an updated access commitment chain and an instant trust reconstruction record.

[0012] Based on the instant trust reconstruction record and the updated access commitment chain, a trusted audit index is built and privacy desensitization is performed to generate a trusted flow tracking record.

[0013] As a preferred embodiment of the file digitization flow privacy protection method based on zero-trust architecture described in this invention, the bidirectional Transformer-Conditional Random Field semantic parsing model consists of an input encoding layer, a bidirectional Transformer context representation layer, a label scoring layer, and a conditional random field decoding layer.

[0014] The input encoding layer segments the standardized text corpus into word segments and generates an input matrix that integrates word vectors, position vectors, and label vectors.

[0015] The bidirectional Transformer context representation layer encodes context features into the input matrix using a multi-head self-attention mechanism, generating a context feature sequence.

[0016] The label scoring layer performs a linear transformation on the context feature sequence to generate label scoring vectors, which are then combined into a label scoring matrix according to time steps.

[0017] The Conditional Random Field (CRF) decoding layer performs CRF Viterbi decoding on the label scoring matrix, generates the label sequence and entity start and end position indices, and obtains the semantic candidate unit table.

[0018] As a preferred embodiment of the file digitization privacy protection method based on zero-trust architecture described in this invention, the specific steps for generating the semantic context tag set and dynamic authorization intent are as follows:

[0019] Based on the semantic candidate unit table, sentence vector, position vector and field label vector are calculated and stacked into a semantic feature tensor. Semantic context labels are identified through threshold judgment and rule matching.

[0020] Based on the semantic context labels, the dynamic authorization intent is obtained according to a fixed Boolean structure, following the principle of least privilege (minimum necessity).

[0021] As a preferred embodiment of the file digitization privacy protection method based on zero-trust architecture described in this invention, the steps for establishing a semantic-encrypted fragment mapping table and generating a homomorphic operation plan and a set of re-encryptable credentials are as follows:

[0022] Align the semantic context tag set with the dynamic authorization intent in a fixed field order and compile them into a semantic authorization baseline table;

[0023] In the homomorphic encryption domain, three levels of operation granularity are read: paragraph level, table cell level, and field level. The corresponding operation granularity is assigned to each semantic context tag, and an operation granularity configuration table is generated.

[0024] Based on the semantic authorization baseline table and the operation granularity configuration table, generate fragment placeholder numbers for the file to be transferred, and obtain the fragment placeholder index table;

[0025] The semantic authorization baseline table, operation granularity configuration table, and fragment placeholder index table are joined by primary key to generate a mapping table between semantics and encrypted fragments;

[0026] In the semantic-encrypted fragment mapping table, the homomorphic operation configuration is bound to the fragment placeholder numbers and frozen, generating a frozen mapping table;
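As an illustration of [0025] and [0026], the join by primary key and the freeze can be sketched in Python. The sketch assumes the semantic context tag is the shared primary key and uses hypothetical column names (`intent`, `granularity`, `placeholder`); the patent does not specify table schemas. Freezing is modeled with a frozen dataclass per row and a read-only `MappingProxyType` view over the table.

```python
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)
class MappingRow:
    placeholder: str   # fragment placeholder number (zero-padded, e.g. "F0001")
    tag: str           # semantic context tag (assumed primary key)
    intent: str        # dynamic authorization intent from the baseline table
    granularity: str   # "paragraph", "cell" or "field"

def build_frozen_mapping(baseline, granularity, placeholders):
    """Inner-join the baseline table (tag -> intent), the granularity table
    (tag -> level) and the placeholder index table (tag -> [placeholders]),
    then freeze the result as a read-only view keyed by placeholder."""
    rows = {}
    for tag, intent in baseline.items():
        if tag not in granularity or tag not in placeholders:
            continue  # inner join: skip tags missing from either table
        for ph in placeholders[tag]:
            rows[ph] = MappingRow(ph, tag, intent, granularity[tag])
    # sort by placeholder so later steps can process fragments in order
    return MappingProxyType(dict(sorted(rows.items())))
```

Because both the view and its rows are immutable, any attempt to change the frozen mapping table raises an error instead of silently altering the plans derived from it.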

[0027] Based on the frozen mapping table, using the fragment placeholder number as the unique index, assign each semantic tag an allowed homomorphic function and execution parameters, and generate a homomorphic operation plan and a plan reference index table.

[0028] Based on the homomorphic operation plan, the plan reference index table, and the semantic-encrypted fragment mapping table, fragment placeholder number ranges and re-encryption key fragments are assigned to each collaborating role to generate a set of re-encryptable credentials.

[0029] As a preferred embodiment of the file digitization and privacy protection method based on a zero-trust architecture described in this invention, the specific steps for performing logical fragmentation and homomorphic encryption on the file to be transferred according to the homomorphic operation plan and the set of re-encryptable credentials are as follows:

[0030] The homomorphic operation plan, the set of re-encryptable credentials, and the mapping table between semantics and cryptographic fragments are compiled into a fragment pending list in ascending order of fragment placeholder number;

[0031] Based on the fragment pending list, the content of the file to be transferred is split at the boundaries given by the fragment placeholder numbers to generate a sequence of fragments to be encrypted, and a hash is computed over each fragment to obtain the fragment hash sequence.
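A minimal sketch of the split-and-hash step in [0031]. The paragraph does not name the hash function, so SHA-256 (used elsewhere in this description for audit keys) is assumed, and expressing fragment boundaries as byte offsets per placeholder is a hypothetical representation.

```python
import hashlib

def split_and_hash(content: bytes, boundaries):
    """boundaries: (placeholder, start, end) byte offsets, assumed non-overlapping.
    Returns the fragments to be encrypted and their SHA-256 digests, both in
    ascending placeholder order (placeholders assumed zero-padded strings)."""
    ordered = sorted(boundaries, key=lambda b: b[0])
    fragments = [(ph, content[start:end]) for ph, start, end in ordered]
    hashes = [(ph, hashlib.sha256(frag).hexdigest()) for ph, frag in fragments]
    return fragments, hashes
```

Zero-padding the placeholder numbers lets a plain string sort reproduce the ascending numeric order required by the fragment pending list.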

[0032] Based on the plan row identifier in the homomorphic operation plan and the re-encryption key fragment in the set of re-encryptable credentials, a hash-based key derivation function is executed over each fragment placeholder number to derive that fragment's homomorphic encryption key, reusing the re-encryption key fragment, and a fragment homomorphic encryption key table is generated.
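The text calls for a hash-based key derivation function without fixing the construction. The following minimal sketch assumes HKDF with SHA-256 as defined in RFC 5869, implemented with the standard `hmac` module. How the inputs map onto HKDF parameters (re-encryption key fragment as input keying material, plan row identifier as salt, placeholder number as context info) is an assumption for illustration only.

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    if not salt:
        salt = bytes(hashlib.sha256().digest_size)  # default salt: 32 zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                            # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def fragment_key_table(reenc_key_fragment: bytes, plan_row_id: str, placeholders):
    """Hypothetical mapping: one 32-byte key per fragment placeholder number."""
    return {ph: hkdf_sha256(reenc_key_fragment, plan_row_id.encode(), ph.encode())
            for ph in placeholders}
```

Binding the placeholder number into `info` gives every fragment a distinct key from the same key fragment, so revoking one fragment's key does not expose the others.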

[0033] Each fragment in the sequence to be encrypted is homomorphically encrypted with its key from the fragment homomorphic encryption key table to generate an encrypted fragment sequence.
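The patent does not name a homomorphic encryption scheme. To show what computing on data that stays encrypted means, the sketch below swaps in textbook Paillier, a well-known additively homomorphic scheme, with tiny hard-coded primes. It is insecure and illustrative only: it uses a key pair rather than the per-fragment keys of [0032], and a real deployment would rely on a vetted library with realistic key sizes.

```python
import math
import secrets

class ToyPaillier:
    """Paillier cryptosystem with tiny primes (illustration only, NOT secure)."""

    def __init__(self, p: int = 1009, q: int = 1013):
        self.n = p * q
        self.n2 = self.n * self.n
        self.g = self.n + 1                     # standard choice g = n + 1
        self.lam = math.lcm(p - 1, q - 1)
        # with g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m: int) -> int:
        while True:                             # random r coprime to n
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        return pow(self.g, m, self.n2) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c: int) -> int:
        l_value = (pow(c, self.lam, self.n2) - 1) // self.n   # L(x) = (x - 1) / n
        return l_value * self.mu % self.n

    def add(self, c1: int, c2: int) -> int:
        """Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 mod n."""
        return c1 * c2 % self.n2

    def scale(self, c: int, k: int) -> int:
        """Enc(m)^k mod n^2 decrypts to k * m mod n."""
        return pow(c, k, self.n2)
```

For example, `pk.decrypt(pk.add(pk.encrypt(5), pk.encrypt(7)))` returns 12 although neither operand was decrypted before the addition. Requires Python 3.9+ for `math.lcm`.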

[0034] As a preferred embodiment of the file digitization privacy protection method based on zero-trust architecture described in this invention, the specific steps for constructing the initial fragment of the access commitment chain are as follows:

[0035] Encapsulate the encrypted fragment sequence into container objects one by one in order of fragment placeholder number, write the access policy digest, generate an encrypted container set, and calculate the access commitment instances to obtain the access commitment instance sequence.

[0036] The access commitment instance sequence is linked into a hash chain structure in order of fragment placeholder number to generate the initial fragment of the access commitment chain.
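A hash chain of this kind can be sketched as follows. The node layout, the `|` separator and the all-zero genesis hash are assumptions, and the access commitment instances are treated as opaque strings. The same linking rule can also append event records and yield a new chain-tail hash, as described in [0044].

```python
import hashlib

GENESIS = "0" * 64  # assumed predecessor hash for the first node

def node_hash(prev_hash: str, placeholder: str, commitment: str) -> str:
    return hashlib.sha256(f"{prev_hash}|{placeholder}|{commitment}".encode()).hexdigest()

def build_chain(commitments):
    """commitments: placeholder -> access commitment instance.
    Nodes are linked in ascending placeholder order."""
    chain, prev = [], GENESIS
    for ph in sorted(commitments):
        h = node_hash(prev, ph, commitments[ph])
        chain.append({"placeholder": ph, "commitment": commitments[ph], "prev": prev, "hash": h})
        prev = h
    return chain

def append_node(chain, placeholder: str, record: str) -> str:
    """Append a record (e.g. a revocation event) and return the new tail hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    h = node_hash(prev, placeholder, record)
    chain.append({"placeholder": placeholder, "commitment": record, "prev": prev, "hash": h})
    return h

def verify_chain(chain) -> bool:
    """Recompute every node hash; any edited node breaks all later links."""
    prev = GENESIS
    for node in chain:
        if node["prev"] != prev or node["hash"] != node_hash(prev, node["placeholder"], node["commitment"]):
            return False
        prev = node["hash"]
    return True
```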

[0037] As a preferred embodiment of the file digitization and privacy protection method based on a zero-trust architecture described in this invention, the specific steps for generating the updated access commitment chain and the instant trust reconstruction record are as follows:

[0038] Based on the homomorphic operation plan, the plan reference index table, and the encrypted container set, a homomorphic execution batch list is generated by aligning fragment placeholder numbers with plan line identifiers.

[0039] Call the allowed homomorphic functions one by one in the homomorphic execution batch list, record the fragment placeholder number, plan line identifier, homomorphic function name, input digest and output digest, and generate a homomorphic operation digest sequence;

[0040] The homomorphic operation digest sequence is linked with the initial fragment of the access commitment chain according to the fragment placeholder number, and the authorization conditions, revocation conditions, identity digest and timestamp are extracted to generate the commitment verification request sequence;

[0041] Based on the commitment verification request sequence, the homomorphic operation digest, authorization conditions, revocation conditions and time window constraints are checked one by one to obtain the commitment verification result sequence.

[0042] Filter out non-passing entries from the commitment verification result sequence, obtain node revocation instructions and authorization reversal instructions, and attach plan line identifiers and fragment placeholder numbers to generate a revocation and authorization reversal instruction sequence;
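The checks in [0041] and the filtering in [0042] might look like the sketch below. The request fields and the concrete forms of the authorization condition (a set of allowed homomorphic functions), the revocation condition (a set of revoked identity digests) and the time window are hypothetical, as is the rule that maps a revoked identity to node revocation and any other failure to authorization reversal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerificationRequest:
    placeholder: str          # fragment placeholder number
    plan_row: str             # plan line identifier
    op_name: str              # homomorphic function from the digest sequence
    identity_digest: str
    timestamp: int            # Unix seconds
    allowed_ops: frozenset    # authorization condition (hypothetical form)
    revoked_ids: frozenset    # revocation condition (hypothetical form)
    window: tuple             # (not_before, not_after)

def check(req):
    """Return the failed checks; an empty list means the entry passes."""
    failures = []
    if req.op_name not in req.allowed_ops:
        failures.append("operation not authorized")
    if req.identity_digest in req.revoked_ids:
        failures.append("identity revoked")
    not_before, not_after = req.window
    if not not_before <= req.timestamp <= not_after:
        failures.append("outside time window")
    return failures

def revocation_instructions(requests):
    """Keep only non-passing entries and turn them into instructions."""
    instructions = []
    for req in requests:
        failures = check(req)
        if failures:
            action = "revoke_node" if "identity revoked" in failures else "reverse_authorization"
            instructions.append({"action": action, "plan_row": req.plan_row,
                                 "placeholder": req.placeholder, "failures": failures})
    return instructions
```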

[0043] For each instruction in the revocation and authorization reversal instruction sequence, execute key revocation, session interruption, access status rollback, and re-authorization persisted to storage, and write the processing timestamp, container status change and key version number to generate an event handling record and a container status update table;

[0044] Write the event handling record and container state update table into the initial fragment of the access commitment chain in the order of the commitment nodes, recalculate the node hashes and generate a new chain tail hash to obtain the updated access commitment chain and instant trust reconstruction record.

[0045] As a preferred embodiment of the file digitization privacy protection method based on zero-trust architecture described in this invention, the specific steps for constructing the trusted audit index are as follows:

[0046] Align the instant trust reconstruction record and the updated access commitment chain by fragment placeholder number and commitment node sequence number, extract the fragment hash, identity digest, authorization conditions, revocation conditions, processing timestamp and container status, and concatenate them into an audit entry sequence;

[0047] Based on the audit entry sequence, calculate the SHA-256 hash of the concatenated field values and combine it with the step sequence number and commitment node sequence number to generate a unique audit index key;

[0048] The Merkle tree is calculated from bottom to top using the unique audit index key as the leaf node to obtain the hierarchical hash set and the audit root hash. The leaf index and the parent-child relationship table are recorded to form a trusted audit index.
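[0047] and [0048] can be sketched as below. The `|` field separator, the zero-padded key layout, hashing each audit key once more to form a leaf, and pairing the last node with itself when a level has an odd count are assumptions; the last is a common Merkle-tree convention, not something the patent specifies.

```python
import hashlib

def audit_key(fields, step_no: int, node_no: int) -> str:
    """SHA-256 over the concatenated field values, prefixed with the step and
    commitment-node sequence numbers to make the key unique."""
    digest = hashlib.sha256("|".join(fields).encode()).hexdigest()
    return f"{step_no:06d}-{node_no:06d}-{digest}"

def merkle_tree(keys):
    """Build the tree bottom-up. Returns (levels, parent_of): levels[0] are the
    leaf hashes, levels[-1][0] is the audit root hash, and parent_of maps
    (level, index) -> parent index on the next level."""
    if not keys:
        raise ValueError("at least one audit key is required")
    level = [hashlib.sha256(k.encode()).digest() for k in keys]
    levels, parent_of = [level], {}
    while len(level) > 1:
        depth, nxt = len(levels) - 1, []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left  # odd: pair with itself
            nxt.append(hashlib.sha256(left + right).digest())
            parent_of[(depth, i)] = i // 2
            if i + 1 < len(level):
                parent_of[(depth, i + 1)] = i // 2
        level = nxt
        levels.append(level)
    return levels, parent_of
```

Any change to a single audit entry alters its leaf and every hash on the path to the root, so comparing audit root hashes is enough to detect tampering.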

[0049] As a preferred embodiment of the file digitization and privacy protection method based on a zero-trust architecture described in this invention, the specific steps for performing privacy desensitization and generating trusted transfer tracking records are as follows:

[0050] Based on the trusted audit index, the leaf index is stably sorted according to the processing timestamp to generate a leaf index and time sequence number mapping, and then merged with the parent-child relationship table to generate a trusted audit index time axis mapping table.

[0051] Enumerate fields containing personal identity, contact information, geographical location and operation parameters according to the pre-defined field table, and extract field name, location, length and value range to form a list of privacy fields;

[0052] Perform HMAC-SHA256 on the identity-related fields in the privacy field list to generate a sequence of de-identified audit entries;
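A minimal sketch of the keyed pseudonymization in [0052], using the standard `hmac` and `hashlib` modules. The set of identity-related field names is a hypothetical stand-in for the privacy field list. Because HMAC is keyed, one identity always maps to the same pseudonym under a given key (so audit entries remain linkable), while parties without the key cannot recompute the pseudonym from a guessed identity.

```python
import hashlib
import hmac

# Hypothetical identity-related entries from the privacy field list
IDENTITY_FIELDS = {"user_name", "id_number", "phone"}

def deidentify(entry: dict, secret_key: bytes) -> dict:
    """Return a copy of an audit entry with identity fields replaced by
    hex-encoded HMAC-SHA256 values; other fields are left unchanged."""
    out = dict(entry)
    for name in entry.keys() & IDENTITY_FIELDS:
        out[name] = hmac.new(secret_key, str(entry[name]).encode(), hashlib.sha256).hexdigest()
    return out
```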

[0053] The sequence of de-identified audit entries, the trusted audit index timeline mapping table, the leaf index, and the time sequence number are compiled into a trusted flow tracking record.

[0054] In a second aspect, this invention provides a privacy protection system for digital file transfer based on a zero-trust architecture, comprising:

[0055] The semantic parsing module is used to collect the content category, sensitivity markers, operation type, collaboration role information and access spatiotemporal markers of the files to be transferred, and to parse them through a bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and dynamic authorization intent;

[0056] The homomorphic mapping module is used to establish a mapping table between semantics and encrypted fragments based on the semantic context tag set and dynamic authorization intent, combined with the operation granularity within the homomorphic encryption domain, and to generate homomorphic operation plans and sets of re-encryptable credentials.

[0057] The encrypted encapsulation module is used to perform logical fragmentation and homomorphic encryption on the file to be transferred according to the homomorphic operation plan and the set of re-encryptable credentials, form a set of encrypted containers, obtain access commitment instances, and construct the initial fragment of the access commitment chain.

[0058] The trust reconstruction module is used to perform homomorphic collaborative operations within the encrypted container set and to verify the initial fragment of the access commitment chain in real time. Based on the state of the commitment chain, it performs node revocation and authorization reversal, and generates an updated access commitment chain and an instant trust reconstruction record.

[0059] The audit trail module is used to build a trusted audit index based on the instant trust reconstruction record and the updated access commitment chain, perform privacy desensitization, and generate a trusted flow tracking record.

[0060] The beneficial effects of this invention are as follows: Dynamic authorization and real-time trust adjustment are achieved during file transfer through semantic parsing and encrypted trust reconstruction. A bidirectional Transformer-Conditional Random Field semantic parsing model is used to generate a semantic context tag set and dynamic authorization intent, enabling adaptive matching of access control to the semantics of file content. Combined with homomorphic encryption and access commitment chains, authorization reversal and trust reconstruction are completed in an encrypted environment, improving the security, controllability, and traceability of the file collaboration process.

Attached Figure Description

[0061] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0062] Figure 1 is a flowchart of the privacy protection method for digital file transfer based on a zero-trust architecture.

[0063] Figure 2 is a schematic diagram of the privacy protection system for digital file transfer based on a zero-trust architecture.

[0064] Figure 3 is a flowchart of establishing the homomorphic mapping based on the semantic context tag set and the dynamic authorization intent.

[0065] Figure 4 is a flowchart of constructing the access commitment chain and reconstructing trust based on the homomorphic operation plan.

Detailed Implementation

[0066] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0067] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0068] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0069] Referring to Figures 1-4, one embodiment of the present invention provides a method for protecting privacy during the digital transfer of files based on a zero-trust architecture, comprising the following steps:

[0070] S1: Collect the content category, sensitivity identifier, operation type, collaboration role information and access time and space identifier of the file to be transferred, and parse it through the bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and dynamic authorization intent;

[0071] S1.1: Collect the content category, sensitivity identifier, operation type, collaboration role information and access time and space identifier of the file to be transferred, and unify the character set and time format to generate a standardized table of collected fields;

[0072] Furthermore, at the file input end, a file attribute parsing interface is invoked to read the file's type identifier and content summary. Based on the file extension, file header identifier, and content fragment frequency statistics, the content category to which the file belongs is extracted. Feature scanning is performed on the file content, invoking sensitive word matching algorithms and regularized feature extraction rules to identify fragments containing sensitive features such as personal identification information, financial information, geographic location data, and security keys, and generating a sensitive identifier field based on the matching results. For the operation type field, by reading the current file's task metadata, operation request records, and call instruction sequences, the category of the current operation behavior is generated, such as copy, transfer, modification, download, or approval. For collaboration role information, through the access control policy table and role directory interface, the organization code, job title, and permission level of the user performing the operation are extracted and recorded as the collaboration role information field. For the access spatiotemporal identifier, the terminal device clock and network node information are invoked to obtain the current timestamp and geographic location coordinates, which are then combined into the access spatiotemporal identifier field. The data is read column by column in a fixed field order, and the character set is standardized to the same encoding format. Time fields are standardized to a uniform year-month-day-hour-minute-second format, and the naming rules and separator formats for roles and locations are standardized. Missing fields are reserved with empty placeholders, and abnormal characters are reversibly replaced and written back to the record, generating field verification markers. 
After completing the above processing, the five types of fields are written into the table structure with column alignment, generating a standardized table of collected fields.
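The normalization part of S1.1 can be sketched with the standard library. The column names, the split of the access spatiotemporal identifier into `access_time` and `access_location`, and the accepted input time formats are all hypothetical; Unicode NFKC normalization is used here to fold full-width characters into their half-width forms.

```python
import unicodedata
from datetime import datetime

# Hypothetical column names in the fixed field order
FIELD_ORDER = ("content_category", "sensitivity", "operation_type",
               "collab_role", "access_time", "access_location")
# Assumed input time formats; output is always YYYY-MM-DD HH:MM:SS
INPUT_TIME_FORMATS = ("%Y/%m/%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y%m%d%H%M%S")

def normalize_time(value: str) -> str:
    for fmt in INPUT_TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    raise ValueError(f"unrecognized time format: {value!r}")

def standardize_row(raw: dict) -> dict:
    """Return one row of the standardized table, columns in FIELD_ORDER."""
    row = {}
    for name in FIELD_ORDER:
        value = raw.get(name)
        if value is None or str(value).strip() == "":
            row[name] = ""  # missing field: empty placeholder, no default value
            continue
        # NFKC folds full-width letters and digits to their half-width forms
        value = unicodedata.normalize("NFKC", str(value)).strip()
        row[name] = normalize_time(value) if name == "access_time" else value
    return row
```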

[0073] The sensitive word matching algorithm first normalizes the text (case normalization, full-width/half-width conversion, and Unicode normalization). Based on a categorized thesaurus (personal identity, financial, geographical location, security keys, etc.), it applies Aho-Corasick multi-pattern string matching combined with word boundary detection, supports fault-tolerant comparison that accounts for synonym mapping, word-form variants, and interference characters, and outputs each hit with its category and confidence. The regularized feature extraction rules are a set of executable expressions and verification procedures used to extract structured sensitive patterns, such as citizen identification numbers or passport numbers (number range + check digit), mobile phone numbers and email addresses, IPv4 / IPv6 addresses (boundary anchoring), bank card numbers (Luhn verification), latitude and longitude coordinates (numerical range), certificate and key fragments (PEM header and footer and Base64 span), and access tokens and session identifiers (prefix format and length threshold). Negative-example constraints and context-window verification are combined to reduce false positives, and the results are returned in the form "field name, start and end position, hit type, and verification result".
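The two detectors in [0073] can be sketched in Python: a small Aho-Corasick automaton for thesaurus keywords, and regular expressions with Luhn verification for bank card numbers. The keyword list, the two simplified patterns and the output tuple layout are illustrative assumptions; synonym mapping, variant tolerance, confidence scores and negative-example constraints are omitted.

```python
import re
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick: trie transitions, failure links and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    queue = deque(goto[0].values())  # depth-1 states keep failure link 0 (root)
    while queue:                     # breadth-first, so shallower links exist first
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] = out[s] + out[fail[s]]  # also report shorter suffix matches
    return goto, fail, out

def ac_search(text, automaton):
    """Return (start, end, pattern) for every keyword occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, i + 1, pat))
    return hits

def luhn_ok(number: str) -> bool:
    """Luhn check-digit verification over the ASCII digits in number."""
    digits = [int(d) for d in re.findall(r"[0-9]", number)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0

PATTERNS = {  # simplified examples, not production rules
    "bank_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def scan_field(field_name, text, automaton):
    """Return (field name, start, end, hit type, verification result) tuples.
    Keyword positions assume lower-casing preserves length (true for ASCII)."""
    results = [(field_name, s, e, "keyword:" + pat, True)
               for s, e, pat in ac_search(text.lower(), automaton)]
    for hit_type, rx in PATTERNS.items():
        for m in rx.finditer(text):
            verified = luhn_ok(m.group()) if hit_type == "bank_card" else True
            results.append((field_name, m.start(), m.end(), hit_type, verified))
    return sorted(results, key=lambda r: r[1])
```

The automaton scans each field once regardless of thesaurus size, which is why multi-pattern matching is preferred over running one search per keyword.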

[0074] S1.2: Concatenate the fields in the standardized field collection table into a text sequence to generate standardized text corpus;

[0075] Furthermore, using the standardized table of collected fields as input, the content category, sensitivity markers, operation type, collaboration role information, and access spatiotemporal markers of each row are concatenated into a single line of text in a fixed field order. Fixed separators are used between field names and values, between fields, and between lines. Blank fields, including consecutive ones, are kept as empty placeholders without default values so that subsequent position-dependent processing remains stable. The text obtained by concatenating all lines serves as the standardized text corpus.
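A sketch of S1.2 follows. The field names and the three fixed separators (`=`, tab and newline) are hypothetical choices; the point is that every field is always emitted, even when empty, so field positions stay stable from line to line.

```python
# Hypothetical column names, in the fixed field order of the standardized table
FIELD_ORDER = ("content_category", "sensitivity", "operation_type",
               "collab_role", "access_spacetime")
KV_SEP, FIELD_SEP, LINE_SEP = "=", "\t", "\n"  # assumed fixed separators

def build_corpus(rows):
    """Concatenate each row into one line of text and join the lines."""
    lines = []
    for row in rows:
        # missing values stay as empty placeholders; no default is filled in
        fields = (f"{name}{KV_SEP}{row.get(name, '')}" for name in FIELD_ORDER)
        lines.append(FIELD_SEP.join(fields))
    return LINE_SEP.join(lines)
```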

[0076] S1.3: Input the standardized text corpus into the bidirectional Transformer-Conditional Random Field semantic parsing model, perform word segmentation, context feature encoding and sequence labeling decoding operations, and generate a semantic candidate unit table;

[0077] Specifically, the bidirectional Transformer-Conditional Random Field (CRF) semantic parsing model consists of an input encoding layer, a bidirectional Transformer context representation layer, a label scoring layer, and a CRF decoding layer. The input encoding layer segments the standardized text corpus into word segments, generating an input matrix that integrates word vectors, position vectors, and label vectors. The bidirectional Transformer context representation layer encodes context features through a multi-head self-attention mechanism, generating a context feature sequence. The label scoring layer performs a linear transformation on the context feature sequence, generating label scoring vectors, and combines them into a label scoring matrix according to time steps. The CRF decoding layer performs CRF Viterbi decoding on the label scoring matrix, generating a label sequence and entity start and end position indices, and obtaining a semantic candidate unit table.

[0078] It should be noted that, in constructing and training the bidirectional Transformer-Conditional Random Field semantic parsing model, the standardized text corpus (after character set unification and time format standardization) is first segmented into the smallest identifiable semantic units, and each word segment is assigned a unique sequence number and position index. The input encoding layer maps word segments to word vectors, generates position vectors from the absolute position of each segment in the text, and generates label vectors from the field category labels in manually labeled samples. The three types of vectors are concatenated along the same dimension to form a three-dimensional input matrix. In the bidirectional Transformer context representation layer, the input matrix is fed into both forward and backward Transformer sublayers; the multi-head self-attention mechanism captures the contextual dependencies in the text, and after multiple stacked residual layers a context feature sequence is output that represents the global dependencies between fields. The label scoring layer then applies a linear transformation to the context feature sequence, mapping the features at each time step into the label space to produce a label scoring vector, and these vectors are stacked in time-step order to form the label scoring matrix. After receiving the label scoring matrix, the Conditional Random Field (CRF) decoding layer builds a label transition probability table and runs the Viterbi decoding algorithm: at each time step it keeps, for every label, the path with the highest cumulative score ending in that label, and after the last step it backtracks from the best final label to output the optimal label sequence together with the start and end position indices of the entity corresponding to each label.
During training, standardized corpora with historical annotations are used as supervision signals. The Transformer parameters and CRF transition weights are adjusted by maximizing the conditional log-likelihood function, and repeatedly optimized under cross-entropy loss and label smoothing constraints until the model converges on the validation set. Finally, the trained bidirectional Transformer-CRF semantic parsing model can automatically parse and label sequences from any input standardized text corpus, and output a semantic candidate unit table, providing accurate input for subsequent semantic context label recognition.

[0079] The Viterbi decoding algorithm is a dynamic programming method for finding the most probable sequence of hidden states (the maximum likelihood path) given an observation sequence. It unfolds the problem into a "trellis" over time steps and progressively accumulates the best path metric for each state. It retains only the best predecessor leading to each state and backtracks at the end to obtain the global optimum. It is commonly used in convolutional code decoding and in sequence labeling with HMMs and CRFs. It is exact. Its running time grows linearly with sequence length (and quadratically with the number of states), and its memory usage grows with sequence length times the number of states.
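The following is a minimal sketch of Viterbi decoding as used by the CRF decoding layer. It assumes scores are already in the log domain, so path scores are sums. `emissions` plays the role of the label scoring matrix (time steps × labels) and `transitions[i][j]` plays the role of the CRF transition weight from label i to label j. The function name and argument layout are illustrative and not taken from the source.

```python
def viterbi(emissions, transitions):
    """Return (best_label_path, best_score) for log-domain scores.

    emissions:   T x N list, emissions[t][j] = score of label j at time step t
    transitions: N x N list, transitions[i][j] = score of moving from label i to label j
    """
    T, N = len(emissions), len(emissions[0])
    score = list(emissions[0])          # best path score ending in each label at t = 0
    backptrs = []                       # backptrs[t - 1][j] = best predecessor of label j at step t
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(N):
            # keep only the best predecessor leading to label j
            i_best = max(range(N), key=lambda i: score[i] + transitions[i][j])
            ptr.append(i_best)
            new_score.append(score[i_best] + transitions[i_best][j] + emissions[t][j])
        score = new_score
        backptrs.append(ptr)
    # backtrack from the best final label to recover the global optimum
    last = max(range(N), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1], score[last]
```

The transition scores matter in this example. Taken per time step, the emissions `[[1, 0], [0, 0.5], [1, 0]]` would favor labels `0, 1, 0`. A transition penalty of -2 for switching labels makes `0, 0, 0` the optimal path instead.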

[0080] S1.4: Based on the semantic candidate unit table, calculate the sentence vector, position vector, and field label vector, and stack them into a semantic feature tensor. Identify semantic context labels through threshold judgment and rule matching.

[0081] Furthermore, using the semantic candidate unit table as input, a sentence vector is first calculated for each candidate unit based on a context window. Then, a position vector is generated by combining the absolute position of the word fragment in the text with the field source position. Simultaneously, the field labels corresponding to the candidate units are encoded as label vectors. The sentence vectors, position vectors, and label vectors are stacked to form a semantic feature tensor. Thresholding and rule matching are performed on the semantic feature tensor, outputting a set of labels corresponding to content category, sensitivity marker, operation type, collaborative role information, and access spatiotemporal marker. These labels are then deduplicated to identify semantic context labels, which are then aggregated to form a semantic context label set.

[0082] The specific operation process of threshold determination and rule matching is as follows. A normalized similarity score is calculated for the feature vector in each dimension of the semantic feature tensor. The label confidence distribution range automatically learned by the bidirectional Transformer-Conditional Random Field semantic parsing model during the supervised training phase serves as the threshold reference range. A unit is marked as a valid semantic unit when the normalized similarity score of its feature vector exceeds the threshold reference range in any of these dimensions: content category, sensitivity marker, operation type, collaborative role information, or access spatiotemporal marker. In this embodiment the threshold reference ranges are 0.75 to 0.90 for content category, 0.82 to 0.96 for sensitivity marker, 0.78 to 0.92 for operation type, 0.74 to 0.90 for collaborative role information, and 0.70 to 0.88 for access spatiotemporal marker. The system then reads semantic association rules from the rule base, covering field contextual dependencies, part-of-speech constraints, and positional constraints, and matches and verifies each tagged semantic unit against them. During rule matching, duplicate tags of the same field category undergo conflict resolution and priority merging, and only tags consistent with the rule matching results are retained. All tags that pass both threshold determination and rule matching are aggregated and deduplicated. The output is a set of semantic context tags corresponding to content category, sensitivity marker, operation type, collaborative role information, and access spatiotemporal marker.
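A minimal sketch of threshold determination followed by rule matching, using the threshold reference ranges of this embodiment. The text does not say which bound of a range a score must "exceed". This sketch assumes a score at or above the lower bound is valid. The `rule_check` callable is a hypothetical stand-in for the rule base. Duplicate tags are merged by keeping the highest score, which is one possible priority rule, not necessarily the one intended.

```python
# Threshold reference ranges (lower, upper) from this embodiment
THRESHOLDS = {
    "content_category": (0.75, 0.90),
    "sensitivity_marker": (0.82, 0.96),
    "operation_type": (0.78, 0.92),
    "collaborative_role": (0.74, 0.90),
    "access_spatiotemporal": (0.70, 0.88),
}

def semantic_context_tags(scored_units, rule_check):
    """scored_units: iterable of (dimension, label, normalized_score)
    rule_check: callable(dimension, label) -> bool, a HYPOTHETICAL stand-in for the rule base"""
    tags = {}
    for dim, label, score in scored_units:
        lower, _upper = THRESHOLDS[dim]
        if score < lower:
            continue  # ASSUMPTION: reaching the lower bound counts as passing the threshold
        if not rule_check(dim, label):
            continue  # rejected by rule matching
        key = (dim, label)  # duplicate tags of the same category are merged into one entry
        tags[key] = max(score, tags.get(key, 0.0))
    return tags
```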

[0083] S1.5: Based on semantic context labels, dynamic authorization intent is obtained according to a fixed Boolean structure using the principle of minimum necessity;

[0084] Furthermore, using a semantic context tag set as input, a fixed Boolean structure is constructed according to the principle of minimum necessity. This fixed Boolean structure consists of three types of conditions—role constraints, spatiotemporal constraints, and operational boundaries—connected by AND relationships. Based on the sensitivity identifiers, collaborative role information, and access spatiotemporal identifiers in the semantic context tag set, the value ranges of role constraints, spatiotemporal constraints, and operational boundaries are filled in respectively, generating an authorization expression that includes the scope of accessible content, the scope of executable operations, and the effective time window. The output is a dynamic authorization intent.
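The fixed Boolean structure can be sketched as a record holding the three constraint types, evaluated with a single AND and denied by default. This is a minimal sketch under several assumptions. The mapping from sensitivity marker to permissible operations is hypothetical. The spatiotemporal constraint is reduced to a time window. "Minimum necessity" is modeled as granting only the requested operations that the sensitivity level allows.

```python
from dataclasses import dataclass
from datetime import datetime

# HYPOTHETICAL mapping from sensitivity marker to the operations it can ever allow
OPS_BY_SENSITIVITY = {
    "public": {"read", "aggregate", "export"},
    "confidential": {"read", "aggregate"},
    "secret": {"aggregate"},
}

@dataclass(frozen=True)
class AuthorizationIntent:
    roles: frozenset        # role constraint
    operations: frozenset   # operational boundary: executable operations
    content: frozenset      # operational boundary: accessible content
    not_before: datetime    # spatiotemporal constraint (time window only in this sketch)
    not_after: datetime

    def permits(self, role, operation, content_tag, at):
        role_ok = role in self.roles
        time_ok = self.not_before <= at <= self.not_after
        op_ok = operation in self.operations and content_tag in self.content
        return role_ok and time_ok and op_ok  # fixed AND structure; anything else is denied

def build_intent(tags, requested_ops, window):
    # minimum necessity: grant only requested operations that the sensitivity level permits
    ops = set(requested_ops) & OPS_BY_SENSITIVITY[tags["sensitivity"]]
    return AuthorizationIntent(frozenset(tags["roles"]), frozenset(ops),
                               frozenset(tags["content"]), window[0], window[1])
```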

[0085] It should be noted that the principle of least necessity means the following for access control and data processing. Only the minimum permissions, minimum data, shortest time, and narrowest scope necessary to complete the current explicit purpose are granted. Access is denied by default and authorized temporarily as needed. Permissions and data fields are refined to operational granularity by role, attribute, and scenario, with time windows and geographical boundaries set. Verification continues throughout use, and authorization can be revoked immediately. Requests exceeding the scope are blocked, and all access and processing remain auditable, thereby reducing risk exposure and privacy leakage.

[0086] S2: Based on the semantic context tag set and dynamic authorization intent, combined with the operation granularity within the homomorphic encryption domain, establish a mapping table between semantics and encryption fragments, and generate a homomorphic operation plan and a set of re-encryptable credentials;

[0087] S2.1: Align the semantic context tag set with the dynamic authorization intent in a fixed field order and compile them into a semantic authorization baseline table;

[0088] Furthermore, the semantic context tag set and dynamic authorization intent are expanded column by column in a fixed field order. These fields include semantic tags, sensitivity levels, allowed operations, time windows, and collaboration roles. Primary key concatenation is performed row by row on the semantic context tag set and dynamic authorization intent, with the primary key being a combination of the semantic tag and collaboration role. Simultaneously, the start and end dates of the time windows are standardized and formatted uniformly. Missing items are kept aligned using empty placeholders, and conflicting items are marked with conflict flags and resolved using priority rules before being written back. After completing column alignment and conflict resolution, the results are written into a table structure to generate a semantic authorization baseline table.

[0089] S2.2: Read the three levels of operation granularity—paragraph level, table cell level, and field level—in the homomorphic encryption domain, assign the corresponding operation granularity to each semantic context tag, and generate an operation granularity configuration table;

[0090] Furthermore, within the homomorphic encryption domain, three levels of operation granularity—paragraph-level, table cell-level, and field-level—are read to assign a unique operation granularity to each semantic tag in the semantic authorization baseline table. During assignment, rules are matched based on the sensitivity level and allowed operations corresponding to the semantic tag, and the granularity code is output and written to an inline field. After all semantic tags are assigned, the correspondence between semantic tags and granularity codes is summarized to generate an operation granularity configuration table.
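The text names the inputs to granularity assignment (sensitivity level and allowed operations) but does not publish its rule table. The sketch below therefore uses hypothetical codes (`P`, `C`, `F`) and hypothetical rules purely to show the shape of the operation granularity configuration table. One tag maps to exactly one granularity code.

```python
# HYPOTHETICAL granularity codes and matching rules
PARAGRAPH, TABLE_CELL, FIELD = "P", "C", "F"

def assign_granularity(sensitivity, allowed_ops):
    if sensitivity in ("secret", "confidential"):
        return FIELD          # most sensitive content is handled field by field
    if set(allowed_ops) & {"add", "multiply", "count"}:
        return TABLE_CELL     # arithmetic operations act on individual table cells
    return PARAGRAPH          # everything else stays at paragraph level

def granularity_config(baseline_rows):
    """baseline_rows: dicts with 'tag', 'sensitivity', 'allowed_ops' from the baseline table.
    Returns semantic tag -> granularity code (one code per tag)."""
    return {row["tag"]: assign_granularity(row["sensitivity"], row["allowed_ops"])
            for row in baseline_rows}
```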

[0091] S2.3: Based on the semantic authorization baseline table and the operation granularity configuration table, generate fragment placeholder numbers for the file to be transferred and obtain the fragment placeholder index table;

[0092] Furthermore, using the semantic authorization baseline table and the operation granularity configuration table as input, a fragment numbering rule is established for the file to be transferred. The rule concatenates a file path hash value, a semantic tag, a granularity code, and a sequence number in a fixed order. The file content boundaries are traversed according to this rule, and a unique fragment placeholder number is generated for each candidate segmentation position. The correspondence between each fragment placeholder number and the primary key of the semantic authorization baseline table is recorded. After the traversal, the fragment placeholder numbers and their correspondences are written into an index structure, generating the fragment placeholder index table.
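A minimal sketch of the fragment numbering rule: file path hash, semantic tag, granularity code, and sequence number concatenated in a fixed order. It assumes a truncated SHA-256 path hash, a `-` separator, and a zero-padded sequence number. It also assumes the candidate segmentation positions and the tag-to-primary-key lookup are supplied by the caller. All of these choices are illustrative.

```python
import hashlib

def placeholder_number(file_path, semantic_tag, granularity_code, seq):
    # ASSUMPTION: first 16 hex chars of SHA-256 over the UTF-8 path; zero-padded sequence number
    path_hash = hashlib.sha256(file_path.encode("utf-8")).hexdigest()[:16]
    return f"{path_hash}-{semantic_tag}-{granularity_code}-{seq:06d}"

def placeholder_index(file_path, boundaries, baseline_key_of):
    """boundaries: list of (start, end, semantic_tag, granularity_code) in file order
    baseline_key_of: callable semantic_tag -> primary key in the semantic authorization baseline table"""
    index = {}
    for seq, (start, end, tag, gran) in enumerate(boundaries):
        pid = placeholder_number(file_path, tag, gran, seq)
        index[pid] = {"start": start, "end": end, "baseline_key": baseline_key_of(tag)}
    return index
```

Zero-padding keeps lexical and numeric order aligned for fragments that share a prefix. This matters because later steps sort by placeholder number.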

[0093] S2.4: Join the semantic authorization baseline table, operation granularity configuration table, and fragment placeholder index table by primary key to generate the mapping table between semantics and encrypted fragments;

[0094] Furthermore, using the semantic authorization baseline table, operation granularity configuration table, and shard placeholder index table as input, a three-table join is performed on the primary key relationship. The join key consists of the semantic tag, the collaboration role, and the shard placeholder number. After the join, each row is populated with the sensitivity level, allowed operations, time window, collaboration role, operation granularity, and shard placeholder number. Duplicate rows are removed and the remaining rows are reordered. The resulting table, which contains the complete field set, is the mapping table between semantics and encrypted shards.

[0095] S2.5: In the mapping table between semantics and encrypted fragments, configure a homomorphic operation for each fragment placeholder number and freeze the configuration to generate the frozen mapping table;

[0096] Furthermore, in the semantic and cryptographic fragment mapping table, a homomorphic operation field is filled for each fragment placeholder number according to the operation characteristics of the semantic tag, and the homomorphic operation field is checked for consistency with the sensitivity level, operation granularity, and allowed operations. After the check is completed, read-only locking is performed on the row content and field values, an immutable mark is written, and a frozen mapping table is generated.
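The text names three consistency checks for the homomorphic operation field: against sensitivity level, operation granularity, and allowed operations. It does not give their contents, so the rule tables below are hypothetical. The freeze is modeled with frozen dataclasses inside a tuple, which makes rows and field values read-only in this minimal sketch.

```python
from dataclasses import dataclass

# HYPOTHETICAL consistency rules; the text names the checks but not their contents
OPS_BY_GRANULARITY = {"P": {"count"}, "C": {"add", "multiply", "count"}, "F": {"add", "select"}}
OPS_BY_SENSITIVITY = {"public": {"add", "multiply", "count", "select"},
                      "confidential": {"add", "count", "select"},
                      "secret": {"count"}}

@dataclass(frozen=True)
class FrozenRow:
    placeholder: str
    semantic_tag: str
    sensitivity: str
    granularity: str
    allowed_ops: frozenset
    homomorphic_op: str

def freeze(rows):
    frozen = []
    for r in sorted(rows, key=lambda row: row["placeholder"]):
        op = r["homomorphic_op"]
        consistent = (op in r["allowed_ops"]
                      and op in OPS_BY_GRANULARITY[r["granularity"]]
                      and op in OPS_BY_SENSITIVITY[r["sensitivity"]])
        if not consistent:
            raise ValueError(f"inconsistent homomorphic op {op!r} at {r['placeholder']}")
        frozen.append(FrozenRow(r["placeholder"], r["semantic_tag"], r["sensitivity"],
                                r["granularity"], frozenset(r["allowed_ops"]), op))
    return tuple(frozen)  # immutable container of immutable rows
```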

[0097] S2.6: Based on the frozen mapping table, using the fragment placeholder number as the unique index, assign allowed homomorphic functions and execution parameters to each semantic tag, and generate a homomorphic operation plan and a plan reference index table;

[0098] Furthermore, using the frozen mapping table as input, the process iterates in ascending order by shard placeholder number, assigning a permissible homomorphic function and execution parameters to each semantic tag. The execution parameters include the maximum batch size, concurrency limit, and maximum number of calls within a time window. A plan row identifier is generated for each row, and a corresponding index is established between the shard placeholder number and the plan row identifier. The plan content and corresponding index are then written out, generating a homomorphic operation plan and a plan reference index table.

[0099] It should be noted that allowed homomorphic functions refer to a set of operations that are whitelisted in encrypted processing scenarios, can be directly executed on ciphertext, and match the authorized purpose; their application is limited to encrypted fragments, and the operation type and parameter domain are constrained by policies (such as homomorphic addition, homomorphic multiplication, conditional selection, counting summation, etc.), and boundaries are set for the number of calls, concurrency limit, time window, input / output type, and error tolerance; calls are only allowed when they meet the minimum necessity principle and commitment conditions, and the execution process and results are written into the audit log, thereby completing the required processing without decryption and controlling the privacy leakage surface.

[0100] S2.7: Based on the homomorphic operation plan, plan reference index table, and semantic and encrypted fragment mapping table, assign fragment placeholder number ranges and re-encryption key fragments to each cooperating role, and generate a set of re-encryptable credentials;

[0101] Furthermore, taking the homomorphic operation plan, plan reference index table, and semantic-encrypted shard mapping table as input, the corresponding shard placeholder number ranges are aggregated by collaborative role, and the set of allowed homomorphic functions is summarized from the homomorphic operation plan. For each collaborative role, a re-encryption key fragment digest is generated based on the collaborative role identifier and shard placeholder number range, and the validity period, revocation condition digest, and audit identifier are written in. The entries generated for all collaborative roles are written out one by one to generate a set of re-encryptable credentials.
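A minimal sketch of assembling the re-encryptable credential set by aggregating placeholder numbers per collaborative role. It assumes the "range" is the first and last placeholder number held by a role. Only digests are produced here. The actual re-encryption key fragments would come from a proxy re-encryption scheme that this sketch does not implement. The field names and the `|`-joined SHA-256 digest format are hypothetical.

```python
import hashlib
from collections import defaultdict

def _digest(*parts):
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def credential_set(mapping_rows, plan_functions, valid_until, revocation_condition):
    """mapping_rows: iterable of (collaborative_role, placeholder_number)
    plan_functions: dict placeholder_number -> set of allowed homomorphic functions"""
    by_role = defaultdict(list)
    for role, pid in mapping_rows:
        by_role[role].append(pid)
    credentials = []
    for role in sorted(by_role):
        pids = sorted(by_role[role])
        first, last = pids[0], pids[-1]  # placeholder number range held by this role
        functions = sorted(set().union(*(plan_functions[p] for p in pids)))
        credentials.append({
            "role": role,
            "placeholder_range": (first, last),
            "allowed_functions": functions,
            # digest only; the key fragment itself is out of scope for this sketch
            "reenc_key_fragment_digest": _digest(role, first, last),
            "valid_until": valid_until,
            "revocation_condition_digest": _digest(revocation_condition),
            "audit_id": _digest("audit", role, valid_until)[:16],
        })
    return credentials
```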

[0102] S3: Based on the homomorphic operation plan and the set of re-encryptable credentials, perform logical fragmentation and homomorphic encryption on the file to be transferred, form a set of encrypted containers, obtain access commitment instances, and construct the initial fragment of the access commitment chain;

[0103] S3.1: Compile the homomorphic operation plan, the set of re-encryptable credentials, and the mapping table between semantics and cryptographic fragments in ascending order of fragment placeholder number into a fragment pending list;

[0104] Furthermore, taking the homomorphic operation plan, the set of re-encryptable credentials, and the mapping table between semantics and encryption fragments as input, the three are aligned by primary key according to the ascending order of fragment placeholder numbers. The alignment fields are the fragment placeholder number and the plan row identifier. After writing a conflict flag at the conflict record, the conflict is resolved according to the homomorphic operation plan priority. After alignment, the fragment placeholder number, semantic label, homomorphic operation, allowed homomorphic function, execution parameters, collaboration role, re-encryption key fragment, and time window of each row are extracted and written into a list structure to generate a fragment pending processing list.

[0105] S3.2: Based on the list of fragments to be processed, the content in the file to be transferred is divided into boundaries according to the fragment placeholder number, generating a fragment sequence to be encrypted, and performing hash operation on each fragment to obtain a fragment hash sequence;

[0106] Furthermore, using the list of fragments to be processed as input, the content in the file to be processed is segmented according to the boundary information corresponding to the fragment placeholder number, resulting in a continuous sequence of fragments to be encrypted. A hash operation is performed on each fragment sequence to be encrypted, recording the fragment placeholder number of each fragment and the calculated fragment hash. The fragment hashes are then sorted in ascending order by the fragment placeholder number to generate a fragment hash sequence.
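A minimal sketch of boundary-based segmentation followed by per-fragment hashing, with results ordered by placeholder number. The text does not name the hash used in S3.2, so SHA-256 is an assumption. Byte offsets per placeholder number are assumed to come from the fragment pending list.

```python
import hashlib

def fragment_and_hash(content: bytes, boundaries):
    """boundaries: dict placeholder_number -> (start, end) byte offsets in the file.
    Returns (fragment sequence, fragment hash sequence), both in ascending placeholder order."""
    fragments, hashes = [], []
    for pid in sorted(boundaries):
        start, end = boundaries[pid]
        frag = content[start:end]
        fragments.append((pid, frag))
        hashes.append((pid, hashlib.sha256(frag).hexdigest()))  # ASSUMPTION: SHA-256
    return fragments, hashes
```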

[0107] S3.3: Using the plan row identifier in the homomorphic operation plan and the re-encryption key fragment in the re-encryptable credential set, apply a hash-based key derivation function to each fragment placeholder number to calculate the fragment homomorphic encryption key, bind it to the corresponding re-encryption key fragment, and generate a fragment homomorphic encryption key table;

[0108] Furthermore, taking the planned row identifier in the fragment pending processing list and the re-encryption key fragment in the re-encryptable credential set as input, a hash-based key derivation function is executed for each fragment placeholder number to derive the fragment homomorphic encryption key, and then binds it to the corresponding re-encryption key fragment. The fragment placeholder number, fragment homomorphic encryption key, and re-encryption key fragment are written into a table structure row by row to generate a fragment homomorphic encryption key table.

[0109] It should be noted that the hash-based key derivation function refers to using the re-encrypted key fragment in the set of re-encryptable credentials as the master key, combined with the plan row identifier and fragment placeholder number as binding factors, and deterministically generating a fragmented homomorphic encryption key with controlled length through hash primitives (such as HKDF / HMAC-SHA-256) under the constraints of fixed field separation label and version number. This derivation process ensures that the outputs of different plan rows or different fragments are different from each other and cannot be used to deduce the master key. When the re-encrypted key fragment is updated, the derivation result can be updated synchronously, and only the binding relationship of fragment placeholder number - fragmented homomorphic encryption key - re-encrypted key fragment identifier is recorded in the fragmented homomorphic encryption key table to support subsequent homomorphic operations and audit verification.
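The derivation described above matches HKDF (RFC 5869) with HMAC-SHA-256. The re-encryption key fragment is the input keying material, and the plan row identifier and fragment placeholder number are binding factors in the `info` parameter. In this minimal sketch the fixed field separation is realized as length-prefixed fields plus a label and version string. That encoding, the label `frag-he-key`, and the empty salt are assumptions.

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF per RFC 5869 with HMAC-SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("length too large for HKDF-SHA-256")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract step
    okm, block = b"", b""
    for counter in range(1, -(-length // 32) + 1):                      # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

def _encode(*fields: bytes) -> bytes:
    # length-prefixed fields so that different (plan row, placeholder) pairs never collide
    return b"".join(len(f).to_bytes(2, "big") + f for f in fields)

def fragment_key(reenc_key_fragment: bytes, plan_row_id: str, placeholder: str,
                 version: str = "v1") -> bytes:
    info = _encode(b"frag-he-key", version.encode(), plan_row_id.encode(), placeholder.encode())
    return hkdf_sha256(reenc_key_fragment, salt=b"", info=info, length=32)
```

Updating the re-encryption key fragment changes every derived key on the next call, which matches the synchronous-update property described above. The HKDF core can be checked against RFC 5869 test case 1.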

[0110] S3.4: Homomorphically encrypt the fragment sequence to be encrypted using the fragment homomorphic encryption key table to generate an encrypted fragment sequence;

[0111] Furthermore, taking the fragment sequence to be encrypted and the fragment homomorphic encryption key table as input, the homomorphic encryption process is completed by calling the allowed homomorphic function according to the one-to-one correspondence of fragment placeholder numbers. For each fragment, the encrypted fragment is output and the fragment placeholder number and homomorphic operation type are recorded. All encrypted fragments are arranged in ascending order of fragment placeholder numbers and assembled into a sequence structure to generate the encrypted fragment sequence.
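The text does not name a homomorphic encryption scheme. To illustrate what "homomorphic addition" as an allowed function means, this sketch swaps in textbook Paillier encryption, which is additively homomorphic. It uses toy primes that are far too small to be secure. Paillier keys are not derived from the HKDF output described earlier, so this shows only the homomorphic property, not the patent's key handling.

```python
import math

# Toy Paillier parameters: tiny primes for illustration only, NOT secure
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # Carmichael function of N (Python 3.9+)
MU = pow(LAM, -1, N)           # valid because the generator is g = N + 1

def encrypt(m: int, r: int) -> int:
    # r must be coprime to N; in practice it is fresh randomness for every ciphertext
    if not (0 <= m < N and math.gcd(r, N) == 1):
        raise ValueError("invalid plaintext or randomness")
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N

def he_add(c1: int, c2: int) -> int:
    # multiplying ciphertexts adds the underlying plaintexts modulo N
    return c1 * c2 % N2
```

For example, `decrypt(he_add(encrypt(20, 17), encrypt(22, 19)))` recovers 42 without decrypting either input ciphertext.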

[0112] S3.5: Encapsulate the encrypted fragment sequence into container objects one by one according to fragment placeholder number, write the access policy digest, generate an encrypted container set, and calculate an access commitment instance for each container to obtain the access commitment instance sequence;

[0113] Furthermore, taking the encrypted sharding sequence as input, each shard is encapsulated into a container object according to its placeholder number. The sharding hash, semantic tag, homomorphic operation, allowed homomorphic functions, execution parameters, time window, collaboration role, and audit identifier are written into the container metadata. Simultaneously, access commitment instances are calculated, and their hashes are written into the container metadata. After all encapsulation is complete, these are aggregated to form an encrypted container set, and the access commitment instance sequence is output synchronously.

[0114] S3.6: Link the access commitment instance sequence into a hash chain structure according to the shard placeholder number to generate the initial fragment of the access commitment chain;

[0115] Furthermore, taking the sequence of access commitment instances as input, the hashes of the access commitment instances are linked sequentially according to the fragment placeholder number order, and a continuous hash chain structure is generated by linking them one after the other. The chain head identifier and chain tail hash are recorded, and the output is the initial fragment of the access commitment chain.
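A minimal sketch of computing access commitment instances and linking them into a hash chain in placeholder-number order, with a verifier that detects tampering. Several choices here are assumptions, not the specified method: the commitment is SHA-256 over canonical JSON of the container metadata, each node hashes `previous hash | placeholder | commitment`, and the chain starts from an all-zero genesis value.

```python
import hashlib
import json

GENESIS = "0" * 64  # ASSUMPTION: starting value for the first link

def commitment_instance(meta: dict) -> str:
    canonical = json.dumps(meta, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def _link(prev: str, pid: str, commitment: str) -> str:
    return hashlib.sha256(f"{prev}|{pid}|{commitment}".encode("utf-8")).hexdigest()

def build_chain(commitments):
    """commitments: list of (placeholder_number, commitment_hash)"""
    prev, nodes = GENESIS, []
    for pid, c in sorted(commitments):  # placeholder number order
        node_hash = _link(prev, pid, c)
        nodes.append({"placeholder": pid, "commitment": c, "prev": prev, "hash": node_hash})
        prev = node_hash
    return {"head": nodes[0]["placeholder"] if nodes else None, "tail_hash": prev, "nodes": nodes}

def verify_chain(chain) -> bool:
    prev = GENESIS
    for n in chain["nodes"]:
        if n["prev"] != prev or n["hash"] != _link(prev, n["placeholder"], n["commitment"]):
            return False
        prev = n["hash"]
    return prev == chain["tail_hash"]
```

Any edit to an earlier node changes every later hash and the chain tail hash. The same recomputation is what S4.7 performs when event records are written back into the chain.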

[0116] S4: Perform homomorphic collaborative operations within the encrypted container set and verify the initial fragment of the access commitment chain in real time. Based on the state of the commitment chain, perform node revocation and authorization reversal to generate an updated access commitment chain and an instant trust reconstruction record.

[0117] S4.1: Based on the homomorphic operation plan, plan reference index table, and encrypted container set, generate a homomorphic execution batch list by aligning fragment placeholder numbers with plan line identifiers;

[0118] Furthermore, taking the homomorphic operation plan, the plan reference index table, and the encrypted container set as input, double-key alignment is performed on the fragment placeholder number and the plan line identifier. During alignment, empty placeholders are written for missing keys and conflict markers are recorded. Conflicting items are overwritten according to the line order of the homomorphic operation plan. The fragment placeholder number, plan line identifier, allowed homomorphic functions, execution parameters, and container positioning information are extracted and assembled into a list structure in ascending order of fragment placeholder number to generate a homomorphic execution batch list.

[0119] S4.2: Call the allowed homomorphic functions one by one in the homomorphic execution batch list, record the fragment placeholder number, plan line identifier, homomorphic function name, input digest and output digest, and generate a homomorphic operation digest sequence;

[0120] Furthermore, taking the homomorphic execution batch list as input, each allowed homomorphic function is called in sequence. For each call, the fragment placeholder number, plan line identifier, homomorphic function name, input digest, and output digest are recorded, together with a time window identifier and a call completion flag. The records are sorted by execution time and written into a sequence structure to generate the homomorphic operation digest sequence. The per-fragment homomorphic call ratio is then calculated and written back into the homomorphic operation digest sequence.

[0121] The per-fragment homomorphic call ratio is calculated as:

[0122] $$R_i(t)=\frac{\sum_{k\in\mathcal{K}_i(t)}\delta_{i,k}}{\sum_{f\in\mathcal{F}_i}N_{i,f}}$$

[0123] where $R_i(t)$ is the homomorphic call ratio of fragment $i$ within the current time window. $\mathcal{K}_i(t)$ is the set of indexes of homomorphic call records for fragment $i$ within the time window. $\delta_{i,k}$ indicates whether the $k$-th record for fragment $i$ actually completed (1 if completed, 0 otherwise). $\mathcal{F}_i$ is the set of allowed homomorphic functions that the homomorphic operation plan binds to fragment $i$. $N_{i,f}$ is the maximum number of times the homomorphic operation plan permits allowed homomorphic function $f$ to be called on fragment $i$ within the set time window.
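The per-fragment call ratio reads as the number of completed calls for fragment i in the current window divided by the fragment's total call budget, summed over its allowed functions. A minimal sketch follows. The record layout is hypothetical, and reporting 0.0 when a fragment has no call budget is an assumption the text does not address.

```python
def call_ratio(fragment, records, allowed_funcs, max_calls):
    """records: (fragment_id, function_name, completed) tuples logged in the current window
    allowed_funcs: dict fragment_id -> set of allowed homomorphic functions
    max_calls: dict (fragment_id, function_name) -> maximum calls in the window"""
    completed = sum(1 for frag, _func, done in records if frag == fragment and done)
    budget = sum(max_calls[(fragment, f)] for f in allowed_funcs[fragment])
    return completed / budget if budget else 0.0  # ASSUMPTION: zero budget reported as 0.0
```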

[0124] S4.3: Connect the homomorphic operation digest sequence with the initial fragment of the access commitment chain according to the fragment placeholder number, and extract the authorization conditions, revocation conditions, identity digest and timestamp to generate the commitment verification request sequence;

[0125] Furthermore, taking the homomorphic operation digest sequence and the initial fragment of the access commitment chain as input, the two are joined by fragment placeholder number. For each record, the authorization conditions, revocation conditions, identity digest, and timestamp are extracted and paired with the function name and input / output digests from the homomorphic operation digest to form an itemized request. Each request is written into the request sequence structure to generate the commitment verification request sequence.

[0126] S4.4: Based on the commitment verification request sequence, compare the homomorphic operation summary, authorization conditions, revocation conditions and time window constraints one by one to obtain the commitment verification result sequence;

[0127] Furthermore, using the commitment verification request sequence as input, each record is compared with the homomorphic operation summary, authorization conditions, revocation conditions, and time window constraints. For each record, a pass flag, violation type, and location information are generated and written into the result sequence, forming the commitment verification result sequence. To quantify the matching degree between requests and authorizations, an authorization consistency coefficient is calculated and written into the corresponding entry as a quantification field of the commitment verification result sequence.

[0128] The authorization consistency coefficient is calculated using the following expression:

[0129] $$C(t)=\frac{\sum_{o\in\mathcal{O}(t)} r_o(t)\,a_o(t)}{\sum_{o\in\mathcal{O}(t)}\bigl(r_o(t)+a_o(t)-r_o(t)\,a_o(t)\bigr)}$$

[0130] where $C(t)$ is the authorization consistency coefficient at time $t$. $\mathcal{O}(t)$ is the complete set of operations that can be invoked at time $t$. $r_o(t)$ is the request strength of operation $o$ at time $t$ (Boolean or normalized count). $a_o(t)$ is the authorization strength of operation $o$ at time $t$.

[0131] It should be noted that this expression models the matching degree between the commitment verification request and the authorization conditions at time $t$. It treats them as two sets over the full set of callable operations and takes their intersection-over-union ratio. For each operation $o$, $r_o(t)$ is the request strength and $a_o(t)$ is the authorization strength. Their intersection is expressed as the product $r_o(t)\,a_o(t)$, which counts only when both are present. Their union is expressed as $r_o(t)+a_o(t)-r_o(t)\,a_o(t)$, which counts when either is present, with the inclusion-exclusion principle preventing double counting. Both are summed across all operations. For strengths in $[0,1]$, $C(t)\in[0,1]$, and its value represents the consistency between request and authorization. Every callable operation can appear in the expression, even one that was not invoked, because each operation is only an enumeration index:
- If an operation has neither a request nor an authorization at that moment, then $r_o(t)=a_o(t)=0$. Its term contributes 0 to both numerator and denominator and does not affect the result.
- In the Boolean case, if there is only a request or only an authorization, the term contributes 0 to the numerator and 1 to the denominator, correctly reflecting the inconsistency.
- When both are 1, the term contributes 1 to both, reflecting full consistency.

This aligns with the field semantics of "homomorphic operation summary - authorization / revocation condition verification" in this invention. It also provides an auditable quantitative indicator without additional computational assumptions.
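The coefficient is the sum over operations of r·a divided by the sum over operations of r + a - r·a. A minimal sketch follows. Operations absent from a dictionary are treated as having strength 0, which mirrors the enumeration-index argument above. The text leaves the case where nothing is requested or authorized at all (0/0) undefined. This sketch returns 1.0 for it, which is an assumption.

```python
def authorization_consistency(request, grant):
    """request, grant: dict operation -> strength in [0, 1] at time t
    (Boolean as 0/1, or a normalized count)."""
    ops = set(request) | set(grant)  # operations missing from either side count as 0
    num = sum(request.get(o, 0.0) * grant.get(o, 0.0) for o in ops)
    den = sum(request.get(o, 0.0) + grant.get(o, 0.0) - request.get(o, 0.0) * grant.get(o, 0.0)
              for o in ops)
    return 1.0 if den == 0 else num / den  # ASSUMPTION: empty request and grant => consistent
```

For example, requesting `add` and `export` while only `add` and `count` are authorized gives 1/3. Only `add` is in the intersection, while three operations are in the union.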

[0132] S4.5: Filter out non-passing items from the commitment verification result sequence, obtain node revocation instructions and authorization reversal instructions, and attach the plan line identifier and fragment placeholder number to generate a revocation and authorization reversal instruction sequence;

[0133] Furthermore, taking the commitment verification result sequence as input, non-passing entries are filtered out and the corresponding access commitment chain nodes are located. For each located node, a node revocation instruction and an authorization reversal instruction are generated, with the plan row identifier and fragment placeholder number written into the instruction entry. The entries are output continuously as a sequence structure, yielding the revocation and authorization reversal instruction sequence.

[0134] S4.6: For each instruction in the revocation and authorization reversal instruction sequence, execute key revocation, session interruption, access state rollback, and reauthorization, persisting each action to disk; record the processing timestamp, container state change, and key version number; and generate the event handling record and container state update table;

[0135] Furthermore, taking the revocation and authorization reversal instruction sequence as input, key revocation, session interruption, access state rollback, and reauthorization are executed one by one and persisted to disk. For each action, the action timestamp, container state change, and key version number are recorded in an event log entry, and the latest container state is written to a state update entry. The entries are written in batches into the event handling record and the container state update table.

[0136] S4.7: Write the event handling record and container state update table into the initial fragment of the access commitment chain in the order of the commitment nodes, recalculate the node hash and generate a new chain tail hash to obtain the updated access commitment chain and instant trust reconstruction record;

[0137] Furthermore, taking the event handling record and container state update table as input, the corresponding positions of the initial fragment of the access commitment chain are written in the order of the commitment nodes. After each write, a new node hash is calculated and the chain tail hash is updated sequentially until all events are written and hashes are recalculated. The updated access commitment chain and instant trust reconstruction record are then output.

[0138] S5: Based on the instant trust reconstruction record and the updated access commitment chain, build a trusted audit index and perform privacy desensitization to generate a trusted flow tracking record;

[0139] S5.1: Align the instant trust reconstruction record and the updated access commitment chain with the shard placeholder number and commitment node sequence number, and extract the shard hash, identity digest, authorization conditions, revocation conditions, disposal timestamp and container status, and concatenate them into an audit item sequence;

[0140] Furthermore, taking the instant trust reconstruction record and the updated access commitment chain as input, double-key alignment is performed on the shard placeholder number and commitment node sequence number. During alignment, missing keys are filled with empty placeholders and flagged for verification. The shard hash, identity digest, authorization conditions, revocation conditions, disposal timestamp, and container status are then extracted from the alignment results one by one. The field names and field values are concatenated into a single line of text in a fixed field order and written continuously into the sequence structure to generate the audit entry sequence.

[0141] S5.2: Based on the audit entry sequence, calculate the SHA-256 of the concatenated field value, and combine it with the step sequence number and commitment node sequence number to generate a unique audit index key;

[0142] Furthermore, taking the audit entry sequence as input, a field concatenation value is generated for each audit entry, and a secure hash algorithm is called to calculate the digest value. The step sequence number and the commitment node sequence number are concatenated in a fixed order to the digest value, and the digest is calculated again to obtain a unique audit index key. The one-to-one correspondence between the unique audit index key and the corresponding audit entry is written into the index table.

[0143] It should be noted that a secure hash algorithm refers to a cryptographic hash primitive that produces an irreversible, fixed-length digest of its input, with SHA-256 as the default (SHA3-256 can be used as an equivalent strengthening scheme). When generating a unique audit index key, the hash is first calculated over the concatenated values of uniformly encoded fields with length prefixes / delimiters. The step sequence number and the commitment node sequence number are then encoded, concatenated with that digest in a fixed order, and hashed again to obtain the unique audit index key. The algorithm meets the requirements of collision resistance, second-preimage resistance, and uniform output distribution. If the key must also be bound to a secret key to resist specific threats, HMAC-SHA-256 can be substituted in the same process. In that case the algorithm identifier and version number should be recorded in the audit policy.
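A minimal sketch of the two-stage audit index key. The first SHA-256 runs over length-prefixed `name=value` fields in a fixed order. The step and commitment node sequence numbers are then appended to that digest and the result is hashed again. The 4-byte length prefix and the 8-byte big-endian integer encoding are assumptions.

```python
import hashlib

def _lp(b: bytes) -> bytes:
    return len(b).to_bytes(4, "big") + b  # length prefix removes concatenation ambiguity

def audit_index_key(fields, step_seq: int, node_seq: int) -> str:
    """fields: list of (name, value) pairs in the fixed field order"""
    encoded = b"".join(_lp(f"{name}={value}".encode("utf-8")) for name, value in fields)
    digest = hashlib.sha256(encoded).digest()                               # first hash
    seqs = step_seq.to_bytes(8, "big") + node_seq.to_bytes(8, "big")        # fixed order
    return hashlib.sha256(digest + seqs).hexdigest()                       # second hash
```

For the keyed variant mentioned above, the second `hashlib.sha256(...)` call could be replaced by `hmac.new(secret, digest + seqs, hashlib.sha256)`.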

[0144] S5.3: Calculate the Merkle tree from bottom to top using the unique audit index key as the leaf node to obtain the hierarchical hash set and the audit root hash, and record the leaf index and parent-child relationship table to form a trusted audit index;

[0145] Furthermore, a binary hash tree is computed bottom-up with the unique audit index keys as leaf nodes: adjacent nodes are concatenated pairwise and hashed to obtain the node at the level above, until the root level is reached. The hierarchical hash set and the audit root hash are output. At the same time, a leaf index is assigned to each leaf node and the parent-child relationship table is recorded; the leaf indexes and the parent-child relationship table, together with the audit root hash, constitute the trusted audit index.
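The bottom-up construction in [0145] corresponds to a standard binary Merkle tree. The sketch below hashes the concatenation of each adjacent pair with SHA-256 and records each node's parent as a `(level, position)` pair. How an unpaired node at the end of an odd-sized level is handled is not stated in the text; here it is promoted unchanged, which is an assumption (duplicating the last node is another common convention).

```python
import hashlib
from typing import Dict, List, Tuple

NodeId = Tuple[int, int]  # (level, position); level 0 holds the leaves

def build_merkle(leaf_keys: List[str]):
    """Bottom-up SHA-256 Merkle tree over hex audit index keys.

    Returns (levels, root_hex, leaf_index, parent_of). An unpaired last node
    is promoted unchanged to the next level (assumed convention).
    """
    if not leaf_keys:
        raise ValueError("at least one leaf is required")
    levels: List[List[bytes]] = [[bytes.fromhex(k) for k in leaf_keys]]
    parent_of: Dict[NodeId, NodeId] = {}
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        lvl = len(levels) - 1
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(hashlib.sha256(cur[i] + cur[i + 1]).digest())
                parent_of[(lvl, i + 1)] = (lvl + 1, i // 2)
            else:
                nxt.append(cur[i])  # promote the unpaired node
            parent_of[(lvl, i)] = (lvl + 1, i // 2)
        levels.append(nxt)
    leaf_index = {k: i for i, k in enumerate(leaf_keys)}
    return levels, levels[-1][0].hex(), leaf_index, parent_of
```

`levels` plays the role of the hierarchical hash set, and `leaf_index` plus `parent_of` together form the parent-child relationship table.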

[0146] S5.4: Based on the trusted audit index, stably sort the leaf indexes by processing timestamp to generate a mapping between leaf indexes and time sequence numbers, and merge it with the parent-child relationship table into the trusted audit index time axis mapping table;

[0147] Furthermore, taking the trusted audit index as input, the processing timestamps are read and the leaf indexes are stably sorted by them to generate a mapping table between leaf indexes and time sequence numbers. This mapping table is joined with the parent-child relationship table on the leaf index to form a joint structure containing leaf indexes, time sequence numbers and parent-child path information, which is output as the trusted audit index time axis mapping table.
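Step S5.4 relies on a stable sort so that entries with identical processing timestamps keep a deterministic order. A minimal sketch, assuming timestamps are ISO-8601 strings (so lexical order equals chronological order) and that the parent-child table maps `(level, position)` to the parent node as in a conventional Merkle layout; the function name and output row layout are hypothetical.

```python
from typing import Dict, List, Tuple

def build_time_axis(leaf_ts: Dict[int, str],
                    parent_of: Dict[Tuple[int, int], Tuple[int, int]]) -> List[dict]:
    """Join a leaf-index -> time-sequence mapping with each leaf's parent path.

    leaf_ts maps leaf index -> processing timestamp. Python's sort is stable,
    so leaves with equal timestamps keep ascending leaf-index order.
    """
    ordered = sorted(sorted(leaf_ts), key=lambda leaf: leaf_ts[leaf])
    rows = []
    for seq, leaf in enumerate(ordered, start=1):
        # Walk the parent-child table from the leaf up to the root.
        path, node = [], (0, leaf)
        while node in parent_of:
            node = parent_of[node]
            path.append(node)
        rows.append({"leaf_index": leaf, "time_seq": seq, "parent_path": path})
    return rows
```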

[0148] S5.5: Enumerate the fields containing personal identity, contact information, geographical location and operation parameters according to the pre-defined field table, and extract the name, position, length and value range of each to form a privacy field list;

[0149] Furthermore, using a pre-defined field table as input, the system enumerates fields containing personal identity, contact information, geographic location, and operational parameters. It then extracts the field name, its starting position in the audit entry, its length, and the allowed value range for each field. The results of this itemization are then summarized into a list structure to generate a privacy field list.

[0150] It should be noted that the pre-defined field table is a structured field configuration table established before the generation of the trusted audit index. It defines the scope and attributes of fields that may involve personal privacy or sensitive information during the digital transfer of documents. The pre-defined field table is typically pre-arranged based on document content categories and collaborative role permissions, and includes information such as field name, field type, relative position in the audit entry, field length, value range, and field category. Using the pre-defined field table, fields involving personal identity, contact information, geographical location, and operational parameters can be quickly located during privacy masking, ensuring that masking is performed under a fixed structure and rules. This maintains consistent data structure and traceability across different document types during the audit phase.
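A pre-defined field table of the kind described in [0150] can be modelled as a list of typed rows from which the privacy field list is filtered. The `FieldDef` class, the category labels and the example offsets used below are illustrative assumptions, not values from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical category labels for the four privacy-relevant field groups.
PRIVACY_CATEGORIES = {"identity", "contact", "location", "operation_param"}

@dataclass(frozen=True)
class FieldDef:
    name: str
    category: str          # category label from the field table (assumed values)
    offset: int            # relative position in the audit entry
    length: int
    value_range: Optional[Tuple[float, float]] = None

def privacy_field_list(field_table: List[FieldDef]) -> List[dict]:
    """Itemize the privacy-relevant rows of the pre-defined field table by position."""
    return [
        {"name": f.name, "category": f.category, "offset": f.offset,
         "length": f.length, "value_range": f.value_range}
        for f in sorted(field_table, key=lambda f: f.offset)
        if f.category in PRIVACY_CATEGORIES
    ]
```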

[0151] S5.6: Apply HMAC-SHA256 to the identity-related fields in the privacy field list to generate a de-identified audit entry sequence;

[0152] Furthermore, taking the privacy field list as input, a keyed-hash message authentication code (HMAC) value is computed for each identity field and replaces the original field value, with the field position and length kept unchanged. At the same time, timestamps are quantized to the minute, geographic locations are generalized to the administrative region, and numerical parameters are mapped to fixed intervals and written as interval identifiers. The processed audit entries are assembled into a sequence structure to generate the de-identified audit entry sequence.
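The transformations in [0152] can be sketched with Python's standard `hmac`, `hashlib` and `datetime` modules. Assumptions made here: the HMAC hex digest is truncated (or zero-padded) to the original field length to keep positions stable, contact fields are treated as identity-related, the administrative-region lookup is a caller-supplied dictionary, and numeric parameters use a fixed interval width of 50. None of these parameters comes from the text.

```python
import hashlib
import hmac
from datetime import datetime

IDENTITY_LIKE = {"identity", "contact"}  # assumption: contact fields count as identity-related

def mask_identity(value: str, key: bytes, length: int) -> str:
    """HMAC-SHA-256 pseudonym, cut or zero-padded to the original field length."""
    tag = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return tag[:length].ljust(length, "0")

def quantize_minute(ts: str) -> str:
    """Drop seconds and below from an ISO-8601 timestamp."""
    return datetime.fromisoformat(ts).isoformat(timespec="minutes")

def bucket(value: float, width: float) -> str:
    """Map a number to the half-open interval [lo, lo + width) that contains it."""
    lo = (value // width) * width
    return f"[{lo:g},{lo + width:g})"

def deidentify(entry: dict, privacy_fields: list, key: bytes,
               region_of: dict, width: float = 50) -> dict:
    """Return a de-identified copy of one audit entry (dict of field -> value)."""
    out = dict(entry)
    for f in privacy_fields:
        name, cat = f["name"], f["category"]
        if name not in out:
            continue
        if cat in IDENTITY_LIKE:
            out[name] = mask_identity(str(out[name]), key, f["length"])
        elif cat == "location":
            # Hypothetical lookup table from place name to administrative region.
            out[name] = region_of.get(out[name], "UNKNOWN_REGION")
        elif cat == "operation_param":
            out[name] = bucket(float(out[name]), width)
    if "disposal_ts" in out:
        out["disposal_ts"] = quantize_minute(out["disposal_ts"])
    return out
```

Truncating the HMAC output to a short field shortens the pseudonym and raises the chance of collisions, so the field length bounds how distinguishable the masked identities remain.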

[0153] S5.7: Compile the de-identified audit entry sequence, the trusted audit index time axis mapping table, the leaf indexes and the time sequence numbers into a trusted flow tracking record;

[0154] Furthermore, taking the de-identified audit entry sequence, the trusted audit index time axis mapping table, the leaf indexes and the time sequence numbers as input, each de-identified audit entry is bound to its time sequence number by leaf index, and the hash path of each entry is extracted from the parent-child relationship table as its verifiable path. The records are then assembled into a continuous record set in ascending order of time sequence number, and the output is the trusted flow tracking record.
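Assembling the trusted flow tracking record amounts to attaching a Merkle inclusion proof (the sibling hashes along the path to the root) to each de-identified entry and ordering the results by time sequence number. The sketch below assumes the tree levels are available as lists of SHA-256 digests and that an unpaired node was promoted unchanged when the tree was built; `merkle_path`, `verify_path` and `flow_tracking_record` are hypothetical names.

```python
import hashlib
from typing import List, Tuple

Step = Tuple[str, bytes]  # ("L" or "R" = side of the sibling, sibling hash)

def merkle_path(levels: List[List[bytes]], leaf: int) -> List[Step]:
    """Sibling hashes from leaf to root; a promoted (unpaired) node adds no step."""
    path, i = [], leaf
    for level in levels[:-1]:
        sib = i ^ 1
        if sib < len(level):
            path.append(("L" if sib < i else "R", level[sib]))
        i //= 2
    return path

def verify_path(leaf_hash: bytes, path: List[Step], root: bytes) -> bool:
    """Recompute the root from a leaf and its path."""
    h = leaf_hash
    for side, sib in path:
        h = hashlib.sha256(sib + h if side == "L" else h + sib).digest()
    return h == root

def flow_tracking_record(entries, time_axis, levels):
    """entries: leaf index -> de-identified entry; time_axis: rows with
    leaf_index and time_seq. Output is ordered by ascending time_seq."""
    records = []
    for row in sorted(time_axis, key=lambda r: r["time_seq"]):
        leaf = row["leaf_index"]
        records.append({"time_seq": row["time_seq"], "leaf_index": leaf,
                        "entry": entries[leaf], "path": merkle_path(levels, leaf)})
    return records
```

Any holder of the published audit root hash can run `verify_path` on a single record without access to the other entries.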

[0155] It should be noted that the number of leaf nodes when the trusted audit index is generated serves as the depth reference, and the number of verifiable path nodes attached to an entry of the trusted flow tracking record serves as the path length. The audit path coverage is obtained by dividing the number of verifiable path nodes by the base-2 logarithm of the number of leaf nodes, rounded up. The path node count is taken from the hash path attached to each entry in the trusted flow tracking record, and the number of leaf nodes is taken from the size of the leaf layer of the trusted audit index. The resulting audit path coverage quantifies the verifiable completeness of the trusted flow tracking record.

[0156] The audit path coverage is calculated using the following expression:

[0157] $$C_i = \frac{|P_i|}{\lceil \log_2 N \rceil} = \frac{\sum_{p \in P_i} 1}{\lceil \log_2 N \rceil};$$

[0158] where $C_i$ is the path coverage of audit entry $i$; $P_i$ is the set of Merkle path nodes attached to audit entry $i$; $|P_i|$ is the number of path nodes provided; $N$ is the number of leaf nodes when the trusted audit index is generated; and $p$ denotes any path node in the Merkle path node set $P_i$ of audit entry $i$;
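The coverage metric can be computed directly from the formula. This sketch is illustrative; rejecting trees with fewer than two leaves is a design choice made here because ceil(log2 1) = 0 would make the denominator zero.

```python
import math

def audit_path_coverage(path_node_count: int, leaf_count: int) -> float:
    """C_i = |P_i| / ceil(log2 N) for one audit entry."""
    if leaf_count < 2:
        raise ValueError("coverage is undefined for fewer than two leaves")
    return path_node_count / math.ceil(math.log2(leaf_count))
```

For example, with N = 5 leaves the denominator is ceil(log2 5) = 3, so an entry carrying three path nodes has coverage 1.0 and an entry carrying two has coverage 2/3.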

[0159] This embodiment also provides a file digitization privacy protection system based on a zero-trust architecture, including:

[0160] The semantic parsing module is used to collect the content category, sensitivity markers, operation type, collaboration role information and access spatiotemporal markers of the files to be transferred, and to parse them through a bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and dynamic authorization intent;

[0161] The homomorphic mapping module is used to establish a mapping table between semantics and encrypted fragments based on the semantic context tag set and dynamic authorization intent, combined with the operation granularity within the homomorphic encryption domain, and to generate homomorphic operation plans and sets of re-encryptable credentials.

[0162] The encrypted encapsulation module is used to perform logical fragmentation and homomorphic encryption on the file to be transferred according to the homomorphic operation plan and the set of re-encryptable credentials, form a set of encrypted containers, obtain access commitment instances, and construct the initial fragment of the access commitment chain.

[0163] The trust reconstruction module is used to perform homomorphic collaborative operations within the set of encrypted containers and to verify the initial fragment of the access commitment chain in real time. Based on the state of the commitment chain, it performs node revocation and authorization reversal, and generates an updated access commitment chain and an instant trust reconstruction record.

[0164] The audit trail module is used to build a trusted audit index based on the instant trust reconstruction record and the updated access commitment chain, perform privacy desensitization, and generate the trusted flow tracking record.

[0165] In summary, this invention achieves dynamic authorization and real-time trust adjustment during file transfer through semantic parsing and encrypted trust reconstruction. By utilizing a bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and dynamic authorization intent, it achieves adaptive matching of access control to the semantics of file content. Combined with homomorphic encryption and access commitment chains, authorization reversal and trust reconstruction are completed in an encrypted environment, improving the security, controllability, and traceability of the file collaboration process.

[0166] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A privacy protection method for file digitization based on a zero-trust architecture, characterized by comprising: collecting the content category, sensitivity markers, operation type, collaboration role information and access spatiotemporal markers of the file to be transferred, and parsing this information with a bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and a dynamic authorization intent, specifically as follows: the bidirectional Transformer-Conditional Random Field semantic parsing model consists of an input encoding layer, a bidirectional Transformer context representation layer, a label scoring layer and a conditional random field decoding layer; the input encoding layer segments the standardized text corpus into word segments and generates an input matrix that integrates word vectors, position vectors and label vectors; the bidirectional Transformer context representation layer encodes contextual features into the input matrix using a multi-head self-attention mechanism to generate a context feature sequence; the label scoring layer applies a linear transformation to the context feature sequence to generate label score vectors, which are combined by time step into a label score matrix; the Conditional Random Field (CRF) decoding layer performs Viterbi decoding on the label score matrix to generate the label sequence and the start and end position indices of entities, and obtains a semantic candidate unit table; based on the semantic candidate unit table, sentence vectors, position vectors and field label vectors are computed and stacked into a semantic feature tensor; semantic context tags are identified through threshold judgment and rule matching; and, based on the semantic context tags, the dynamic authorization intent is derived in a fixed Boolean structure according to the principle of minimum necessity;
based on the semantic context tag set and the dynamic authorization intent, combined with the operation granularity within the homomorphic encryption domain, establishing a mapping table between semantics and encrypted fragments to generate a homomorphic operation plan and a set of re-encryptable credentials; based on the homomorphic operation plan and the set of re-encryptable credentials, performing logical fragmentation and homomorphic encryption on the file to be transferred to form a set of encrypted containers, and obtaining access commitment instances to construct the initial fragment of the access commitment chain; performing homomorphic collaborative operations within the set of encrypted containers, verifying the initial fragment of the access commitment chain in real time, and, based on the state of the commitment chain, performing node revocation and authorization reversal to generate an updated access commitment chain and an instant trust reconstruction record; and, based on the instant trust reconstruction record and the updated access commitment chain, building a trusted audit index and performing privacy desensitization to generate a trusted flow tracking record.
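The CRF decoding layer in claim 1 performs Viterbi decoding over the label score matrix and derives entity start and end indices. A minimal pure-Python sketch follows, assuming a BIO tag scheme, no explicit start or stop transitions, and a learned transition matrix passed in as `trans`; the tag scheme and function names are hypothetical, as the claim does not specify them.

```python
from typing import List, Sequence, Tuple

def viterbi(scores: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]]) -> List[int]:
    """Best label index path for a linear-chain CRF.

    scores[t][k]: emission score of label k at time step t (label score matrix).
    trans[j][k]: transition score from label j to label k.
    """
    n = len(trans)
    best = list(scores[0])          # best path score ending in each label
    back: List[List[int]] = []      # back-pointers per time step
    for t in range(1, len(scores)):
        ptrs, nxt = [], []
        for k in range(n):
            j = max(range(n), key=lambda i: best[i] + trans[i][k])
            ptrs.append(j)
            nxt.append(best[j] + trans[j][k] + scores[t][k])
        back.append(ptrs)
        best = nxt
    path = [max(range(n), key=lambda k: best[k])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

def bio_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Entity (type, start, end) triples, end inclusive, from BIO tags."""
    spans: List[Tuple[str, int, int]] = []
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and tag[2:] == kind:
            continue  # the current span continues
        if kind is not None:
            spans.append((kind, start, i - 1))
        if tag == "O":
            start, kind = None, None
        else:  # "B-x", or a stray "I-x" treated leniently as a new span
            start, kind = i, tag[2:]
    return spans
```

The transition matrix is what lets the CRF rule out sequences such as O followed by I, which independent per-token argmax decisions cannot do.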

2. The file digitization privacy protection method based on a zero-trust architecture as described in claim 1, characterized in that the steps of establishing the mapping table between semantics and encrypted fragments and generating the homomorphic operation plan and the set of re-encryptable credentials are as follows: aligning the semantic context tag set with the dynamic authorization intent in a fixed field order and compiling them into a semantic authorization baseline table; reading three levels of operation granularity in the homomorphic encryption domain, namely paragraph level, table cell level and field level, assigning the corresponding operation granularity to each semantic context tag, and generating an operation granularity configuration table; based on the semantic authorization baseline table and the operation granularity configuration table, generating fragment placeholder numbers for the file to be transferred to obtain a fragment placeholder index table; joining the semantic authorization baseline table, the operation granularity configuration table and the fragment placeholder index table by primary key to generate the mapping table between semantics and encrypted fragments; in the mapping table between semantics and encrypted fragments, binding the homomorphic operations to the fragment placeholder numbers and freezing the configuration to generate a frozen mapping table; based on the frozen mapping table, with the fragment placeholder number as the unique index, assigning each semantic tag an allowed homomorphic function and execution parameters to generate the homomorphic operation plan and a plan reference index table; and, based on the homomorphic operation plan, the plan reference index table and the mapping table between semantics and encrypted fragments, assigning fragment placeholder number ranges and re-encryption key fragments to each collaborating role to generate the set of re-encryptable credentials.

3. The file digitization privacy protection method based on a zero-trust architecture as described in claim 2, characterized in that the steps of performing logical fragmentation and homomorphic encryption on the file to be transferred based on the homomorphic operation plan and the set of re-encryptable credentials are as follows: compiling the homomorphic operation plan, the set of re-encryptable credentials and the mapping table between semantics and encrypted fragments into a pending fragment list in ascending order of fragment placeholder number; based on the pending fragment list, dividing the content of the file to be transferred at the boundaries given by the fragment placeholder numbers to generate a sequence of fragments to be encrypted, and hashing each fragment to obtain a fragment hash sequence; based on the plan row identifiers in the homomorphic operation plan and the re-encryption key fragments in the set of re-encryptable credentials, executing a hash-based key derivation function on each fragment placeholder number to compute the fragment homomorphic encryption key, reusing the re-encryption key fragment, to generate a fragment homomorphic encryption key table; and performing homomorphic encryption on the sequence of fragments to be encrypted using the fragment homomorphic encryption key table to generate an encrypted fragment sequence.
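The per-fragment key derivation in claim 3 names only a hash-based key derivation function. One plausible instantiation is HKDF-SHA-256 (RFC 5869), sketched below with the standard library. Mapping the re-encryption key fragment to the input keying material, the plan row identifier to the salt and the fragment placeholder number to the `info` string is an assumption, as is the 32-byte key length. The homomorphic encryption step itself is not shown.

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF extract-then-expand (RFC 5869) with SHA-256."""
    if length > 255 * 32:
        raise ValueError("requested length too long")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def fragment_key_table(reenc_fragment: bytes, plan_row_id: str,
                       placeholders: list) -> dict:
    """Hypothetical key table: one 32-byte key per fragment placeholder number."""
    return {
        p: hkdf_sha256(reenc_fragment, plan_row_id.encode(), f"frag:{p}".encode())
        for p in placeholders
    }
```

Because the placeholder number enters through `info`, every fragment gets an independent key from the same key fragment, so revoking one fragment's key does not expose the others.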

4. The file digitization privacy protection method based on a zero-trust architecture as described in claim 3, characterized in that the steps of constructing the initial fragment of the access commitment chain are as follows: encapsulating the encrypted fragment sequence into container objects one by one according to the fragment placeholder numbers and writing the access policy digest to generate the set of encrypted containers, and computing an access commitment instance for each container to obtain an access commitment instance sequence; and linking the access commitment instance sequence into a hash chain structure according to the fragment placeholder numbers to generate the initial fragment of the access commitment chain.
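Claim 4 links the access commitment instances into a hash chain ordered by fragment placeholder number. A minimal sketch, assuming each commitment is a SHA-256 digest over the placeholder number, fragment hash and access policy digest, and that each node hash is computed as H(previous node hash || commitment) starting from an all-zero genesis value; the claim does not fix these constructions, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List

GENESIS = bytes(32)  # assumed all-zero predecessor for the first node

def commitment(placeholder: int, fragment_hash: bytes, policy_digest: bytes) -> bytes:
    """Hypothetical access commitment instance for one encrypted container."""
    return hashlib.sha256(placeholder.to_bytes(8, "big") + fragment_hash + policy_digest).digest()

def build_chain(commitments: Dict[int, bytes]) -> List[dict]:
    """Link commitments in ascending placeholder order: h_i = H(h_{i-1} || c_i)."""
    chain, prev = [], GENESIS
    for seq, p in enumerate(sorted(commitments), start=1):
        node_hash = hashlib.sha256(prev + commitments[p]).digest()
        chain.append({"node_seq": seq, "placeholder": p,
                      "commitment": commitments[p], "prev": prev, "hash": node_hash})
        prev = node_hash
    return chain

def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link; any edited commitment breaks the chain from that node on."""
    prev = GENESIS
    for node in chain:
        if node["prev"] != prev or hashlib.sha256(prev + node["commitment"]).digest() != node["hash"]:
            return False
        prev = node["hash"]
    return True
```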

5. The file digitization privacy protection method based on a zero-trust architecture as described in claim 4, characterized in that the steps of generating the updated access commitment chain and the instant trust reconstruction record are as follows: based on the homomorphic operation plan, the plan reference index table and the set of encrypted containers, aligning fragment placeholder numbers with plan row identifiers to generate a homomorphic execution batch list; calling the allowed homomorphic functions in the homomorphic execution batch list one by one, recording the fragment placeholder number, plan row identifier, homomorphic function name, input digest and output digest, and generating a homomorphic operation digest sequence; joining the homomorphic operation digest sequence with the initial fragment of the access commitment chain by fragment placeholder number, and extracting the authorization conditions, revocation conditions, identity digest and timestamp to generate a commitment verification request sequence; and, based on the commitment verification request sequence, checking each homomorphic operation digest against the authorization conditions, revocation conditions and time window constraints to obtain a commitment verification result sequence;
filtering out the entries that fail verification from the commitment verification result sequence, obtaining node revocation instructions and authorization reversal instructions, and attaching plan row identifiers and fragment placeholder numbers to generate a revocation and authorization reversal instruction sequence; for each instruction in the revocation and authorization reversal instruction sequence, executing key revocation, session interruption, access state rollback and persistence of the reauthorization to disk, and writing the processing timestamp, container state change and key version number to generate a processing event record and a container state update table; and writing the processing event record and the container state update table into the initial fragment of the access commitment chain in commitment node order, recalculating the node hashes and generating a new chain tail hash to obtain the updated access commitment chain and the instant trust reconstruction record.
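The final step of claim 5 writes the processing events into the chain and produces a new tail hash. The sketch below appends one node per event after the current tail, leaving earlier nodes untouched, and serializes each event as canonical JSON before hashing. The append-only interpretation, the JSON serialization, the event field names and the shape of the returned reconstruction record are all assumptions.

```python
import hashlib
import json
from typing import List, Tuple

def append_events(tail_hash: bytes, next_seq: int,
                  events: List[dict]) -> Tuple[List[dict], dict]:
    """Append processing events as new commitment nodes after the current tail.

    Node hash = SHA-256(previous hash || canonical JSON of the event).
    """
    nodes, prev = [], tail_hash
    for offset, event in enumerate(events):
        payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
        node_hash = hashlib.sha256(prev + payload).digest()
        nodes.append({"node_seq": next_seq + offset, "event": event,
                      "prev": prev.hex(), "hash": node_hash.hex()})
        prev = node_hash
    # Hypothetical shape of the instant trust reconstruction record.
    record = {"old_tail": tail_hash.hex(), "new_tail": prev.hex(),
              "node_seqs": [n["node_seq"] for n in nodes]}
    return nodes, record
```

Sorting keys during serialization keeps the hash independent of dictionary insertion order, so two parties that hold the same event data compute the same tail hash.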

6. The file digitization privacy protection method based on a zero-trust architecture as described in claim 5, characterized in that the steps of constructing the trusted audit index are as follows: aligning the instant trust reconstruction record and the updated access commitment chain on the fragment placeholder number and the commitment node sequence number, extracting the fragment hash, identity digest, authorization conditions, revocation conditions, disposal timestamp and container status, and concatenating them into an audit entry sequence; based on the audit entry sequence, computing the SHA-256 digest of the concatenated field values and combining it with the step sequence number and the commitment node sequence number to generate a unique audit index key; and computing a Merkle tree bottom-up with the unique audit index keys as leaf nodes to obtain the hierarchical hash set and the audit root hash, and recording the leaf indexes and the parent-child relationship table to form the trusted audit index.

7. The file digitization privacy protection method based on a zero-trust architecture as described in claim 6, characterized in that the steps of performing privacy desensitization and generating the trusted flow tracking record are as follows: based on the trusted audit index, stably sorting the leaf indexes by processing timestamp to generate a mapping between leaf indexes and time sequence numbers, and merging it with the parent-child relationship table to generate the trusted audit index time axis mapping table; enumerating the fields containing personal identity, contact information, geographical location and operation parameters according to the pre-defined field table, and extracting the field name, position, length and value range to form a privacy field list; applying HMAC-SHA256 to the identity-related fields in the privacy field list to generate a de-identified audit entry sequence; and compiling the de-identified audit entry sequence, the trusted audit index time axis mapping table, the leaf indexes and the time sequence numbers into the trusted flow tracking record.

8. A file digitization privacy protection system based on a zero-trust architecture, applying the file digitization privacy protection method based on a zero-trust architecture as described in any one of claims 1 to 7, characterized by comprising: a semantic parsing module, used to collect the content category, sensitivity markers, operation type, collaboration role information and access spatiotemporal markers of the file to be transferred, and to parse these data with a bidirectional Transformer-Conditional Random Field semantic parsing model to generate a semantic context tag set and a dynamic authorization intent, specifically as follows: the bidirectional Transformer-Conditional Random Field semantic parsing model consists of an input encoding layer, a bidirectional Transformer context representation layer, a label scoring layer and a conditional random field decoding layer; the input encoding layer segments the standardized text corpus into word segments and generates an input matrix that integrates word vectors, position vectors and label vectors; the bidirectional Transformer context representation layer encodes contextual features into the input matrix using a multi-head self-attention mechanism to generate a context feature sequence; the label scoring layer applies a linear transformation to the context feature sequence to generate label score vectors, which are combined by time step into a label score matrix; the Conditional Random Field (CRF) decoding layer performs Viterbi decoding on the label score matrix to generate the label sequence and the start and end position indices of entities, and obtains a semantic candidate unit table; based on the semantic candidate unit table, sentence vectors, position vectors and field label vectors are computed and stacked into a semantic feature tensor; semantic context tags are identified through threshold judgment and rule matching;
based on the semantic context tags, the dynamic authorization intent is derived in a fixed Boolean structure according to the principle of minimum necessity; a homomorphic mapping module, used to establish a mapping table between semantics and encrypted fragments based on the semantic context tag set and the dynamic authorization intent, combined with the operation granularity within the homomorphic encryption domain, and to generate a homomorphic operation plan and a set of re-encryptable credentials; an encrypted encapsulation module, used to perform logical fragmentation and homomorphic encryption on the file to be transferred according to the homomorphic operation plan and the set of re-encryptable credentials, form a set of encrypted containers, obtain access commitment instances, and construct the initial fragment of the access commitment chain; a trust reconstruction module, used to perform homomorphic collaborative operations within the set of encrypted containers, verify the initial fragment of the access commitment chain in real time, and, based on the state of the commitment chain, perform node revocation and authorization reversal to generate an updated access commitment chain and an instant trust reconstruction record; and an audit trail module, used to build a trusted audit index based on the instant trust reconstruction record and the updated access commitment chain, perform privacy desensitization, and generate the trusted flow tracking record.