A data concealment transmission method based on traceability risk assessment and adversarial evaluation

A data communication protection gateway provides unified access to and parsing of traffic, multi-dimensional features are fused to construct a trajectory map, and adversarial attack simulation tests are conducted. This addresses two problems in existing technologies: source tracing risk is not adequately quantified, and protection strategies do not match the actual risk. The result is more secure and reliable cross-domain data communication and better-optimized strategies.

CN122293384A · Pending · Publication Date: 2026-06-26 · HARBIN INST OF TECH AT WEIHAI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HARBIN INST OF TECH AT WEIHAI
Filing Date: 2026-03-25
Publication Date: 2026-06-26

Application Information

Patent Timeline: 25 Mar 2026, Application; 26 Jun 2026, Publication (CN122293384A)

IPC: H04L9/40; H04L69/22

AI Tagging

Technology Topics: Inbound communication; Data privacy protection

Similar Technology Patents

A data synthesis method fusing differential privacy and contrastive learning
CN121256851B · Tags: Digital data protection; Biological models; Discriminator; Data privacy protection
A wireless data privacy protection device
CN224436910U · Tags: Data privacy protection; Attack
Island micro-grid virtual power plant collaborative scheduling method, device, equipment and medium
CN122292547A · Tags: Improve security; Fix response lag; Data privacy protection; Microgrid
Electronic skin-based semantic interaction method and system, semantic interaction device and computer readable storage medium
CN122284834A · Tags: Eliminate the transmission link; Avoid the risk of privacy leakage; Data privacy protection; Interaction device
system
JP2026103364A · Tags: Data processing applications; Surgical needles; Learning based; Data privacy protection

AI Technical Summary

Technical Problem

Existing data concealment transmission technologies cannot accurately quantify source tracing and re-identification risks from the full set of communication features, and they have no adversarial mechanism for verifying strategies. As a result, protection strategies do not match the actual source tracing risk. It is hard to balance data concealment against business availability, and the strategies remain vulnerable to new source tracing attacks.

Method used

The method works as follows:
- A data communication protection gateway is deployed to provide unified traffic access and parsing.
- Multi-dimensional features are fused to construct a trajectory map and quantify a source tracing risk score.
- Adversarial attack samples are constructed for simulation testing, which produces policy calibration suggestions.
- Protection policy levels are matched dynamically, and a pipeline of data transformation operators is executed.
- Tamper-proof audit logs are generated to support model updates.

Benefits of technology

The method quantifies the traceability risk of cross-domain data communication precisely and optimizes protection strategies in a targeted way. It improves the security and reliability of concealed data transmission and balances data concealment against business continuity. It also forms a closed-loop protection system that meets compliance audit and post-event forensics requirements.

✦ Generated by Eureka AI based on patent content.

Figure CN122293384A_ABST

Abstract

This invention provides a data concealment transmission method based on source tracing risk assessment and adversarial evaluation, relating to the fields of network security and data privacy protection technology. The method includes: deploying a data communication protection gateway between internal services and external networks to uniformly access inbound and outbound communication traffic; performing session reassembly and protocol parsing on the original network packets to obtain communication units abstracted from each request or response; based on the communication units, parsing their payload content to identify sensitive data and generate sensitive tags; extracting receiver identity, transmission channel attributes, and historical alarm records as contextual features from the communication environment and session metadata, fusing them to form a feature vector; and combining the feature vector with historical data flow trajectories. This invention achieves accurate quantification of source tracing risks in cross-domain data communication, dynamic adaptation and continuous optimization of protection strategies, effectively improving the security and reliability of concealed data transmission.

Description

Technical Field

[0001] This invention relates to the field of network security and data privacy protection technology, and in particular to a data concealment transmission method based on source tracing risk assessment and adversarial evaluation.

Background Technology

[0002] As enterprises deepen their digital transformation, business systems are rapidly moving toward cloud deployment and microservices. Data increasingly flows between internal services and external third-party platforms and cloud services. Typical scenarios have become commonplace, such as:
- integrating third-party payment interfaces;
- interaction between intelligent customer service and large models;
- data sharing and report pushing with partners.

In these scenarios, external parties can trace, analyze, and re-identify data communication.

[0003] Existing data anonymization technologies generally have the following shortcomings.
- They cannot accurately quantify source tracing and re-identification risks from the full-dimensional characteristics of data communication.
- They lack an adversarial strategy verification mechanism that incorporates risk characteristics.
- They rely solely on static rules or single-message features for risk assessment and protection strategy configuration. They do not integrate data sensitivity attributes, communication context characteristics, and historical flow trajectories into a quantitative evaluation of source tracing risk.
- They struggle to run targeted adversarial attack simulation tests for communication scenarios at different risk levels.

The result is a mismatch between protection strategies and actual source tracing risk. High-risk cross-domain communications are under-protected, while low-risk communications are over-protected at the expense of business continuity. Data anonymity and business availability are therefore hard to balance.

Moreover, these strategies are vulnerable to new source tracing attacks. Their failure modes under attacks such as prompt injection, data exfiltration, and re-identification are hard to detect in advance. Strategies are often adjusted only after a data leakage or source tracing incident has already occurred. This causes irreversible security losses and fails to meet enterprises' privacy and security requirements for cross-domain data communication.

Summary of the Invention

[0004] This invention provides a data concealment transmission method based on source tracing risk assessment and adversarial evaluation. The method accurately quantifies source tracing risk in cross-domain data communication, and it adapts and continuously optimizes protection strategies. It improves the security and reliability of concealed data transmission while preserving business continuity.

[0005] To solve the above technical problems, the technical solution of the present invention is as follows. In a first aspect, a data concealment transmission method based on source tracing risk assessment and adversarial evaluation is provided, the method comprising:
- deploying a data communication protection gateway between internal services and external networks so that inbound and outbound communication traffic is uniformly accessed, and performing session reassembly and protocol parsing on the original network packets to obtain communication units, each abstracted from one request or response;
- parsing the payload content of each communication unit to identify sensitive data and generate sensitive tags;
- extracting the receiver's identity, transmission channel attributes, and historical alarm records from the communication environment and session metadata as contextual features, and fusing them with the sensitive tags to form a feature vector;
- combining the feature vector with historical data flow trajectories to construct a data flow trajectory map, calculating the local risk, contextual risk, and trajectory risk components of each communication unit, and fusing them into a source tracing risk score that characterizes the probability that the communication can be traced and re-identified;
- based on the source tracing risk score and the current strategy configuration, constructing adversarial attack samples for simulation testing, in order to evaluate the effectiveness of the strategy against prompt injection, data exfiltration, and re-identification attacks, identify strategy weaknesses, and generate calibration suggestions;
- based on the source tracing risk score and the calibration suggestions, selecting the corresponding strategy level and executing a pipeline of data transformation operators that anonymize, encrypt, generalize, or differentially perturb the communication payload, to obtain the data for concealed transmission;
- during processing, generating an immutable audit log for each strategy execution to form a chain of evidence, and constructing training samples from the adversarial evaluation results and external alarm events to incrementally update the source tracing risk assessment model and the strategy thresholds.

[0006] Further, the step of deploying the data communication protection gateway between internal services and the external network includes the following. This is the step in which inbound and outbound traffic is uniformly accessed, and session reassembly and protocol parsing are performed on the original network packets to obtain communication units abstracted from each request or response.
- The gateway is connected to the communication link between the internal services and the external network by serial (inline) deployment, bypass deployment, or sidecar injection, so that inbound and outbound traffic flows through the gateway for processing.
- The gateway performs session reassembly on the received original network packets to obtain complete reassembled packets.
- The payload of the reassembled packets is identified and parsed according to the identified protocol type, so that each complete request or response is abstracted into a communication unit.
- The communication units are screened against preset filtering rules. Units that involve sensitive data or carry a potential traceability risk are marked as objects to be protected, which yields a set of communication units to be protected.
- The units in that set are fragmented or resampled to form formatted communication units. Oversized payloads or long-lived connection traffic are split into multiple logical sub-units, and the sampling rate is adjusted according to business importance.
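The fragmentation and resampling step can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. The `CommunicationUnit` class, the 4096-byte fragment size, and the importance-to-sampling-rate table are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CommunicationUnit:
    """Hypothetical abstraction of one parsed request or response."""
    session_id: str
    direction: str  # "inbound" or "outbound"
    payload: bytes
    metadata: dict = field(default_factory=dict)


def fragment(unit: CommunicationUnit, max_bytes: int) -> list:
    """Split an oversized payload into ordered logical sub-units."""
    if len(unit.payload) <= max_bytes:
        return [unit]
    chunks = [unit.payload[i:i + max_bytes] for i in range(0, len(unit.payload), max_bytes)]
    return [
        CommunicationUnit(
            session_id=unit.session_id,
            direction=unit.direction,
            payload=chunk,
            metadata={**unit.metadata, "fragment_index": idx, "fragment_count": len(chunks)},
        )
        for idx, chunk in enumerate(chunks)
    ]


def sample_rate_for(importance: str) -> float:
    # Assumed mapping: critical (or unlabeled) traffic is always kept, other traffic is sampled.
    return {"critical": 1.0, "normal": 0.5, "low": 0.1}.get(importance, 1.0)


def format_units(units, max_bytes=4096, rng=None):
    """Resample units by business importance, then fragment the kept ones."""
    rng = rng or random.Random(0)
    formatted = []
    for unit in units:
        rate = sample_rate_for(unit.metadata.get("importance", "critical"))
        if rng.random() >= rate:
            continue  # dropped by resampling
        formatted.extend(fragment(unit, max_bytes))
    return formatted
```

The fragment metadata (`fragment_index`, `fragment_count`) lets later stages reassemble or correlate sub-units. Unknown importance labels default to full inspection, which errs on the side of protection.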

[0007] Further, the step of identifying sensitive data and building the feature vector includes the following. In this step, the payload content of the communication unit is parsed to identify sensitive data and generate sensitive tags. Receiver identity, transmission channel attributes, and historical alarm records are extracted from the communication environment and session metadata as contextual features, and these are fused to form a feature vector.
- The payload content of the formatted communication unit is structured, and its fields are extracted according to the payload's content type, together with field names, positions, and structural path information, to obtain a field set.
- The sensitive category of each field in the field set is identified by regular expression matching, dictionary matching, or a machine learning model. This determines the field's sensitivity type and allows its sensitivity level to be calculated, yielding a set of sensitive tags that each contain the field, sensitivity type, and sensitivity level.
- The receiver's identity, transmission channel attributes, and historical alarm records are extracted from the communication environment and session metadata of the formatted communication unit as contextual features.
- The set of sensitive tags is fused with the contextual features into a unified feature vector that represents both the sensitive attributes and the environmental attributes of the communication unit.
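A minimal Python sketch of regex-based field tagging and feature fusion is shown below, under stated assumptions:
- the payload is JSON;
- the three rules in `RULES`, their 1 to 4 sensitivity levels, and the five-element feature layout are hypothetical;
- only the regular-expression branch is shown, so dictionary matching and machine learning classifiers would plug in at the same point.

```python
import json
import re

# Hypothetical rule table: (sensitivity type, full-match pattern, sensitivity level 1..4).
RULES = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"), 2),
    ("cn_mobile", re.compile(r"^1[3-9]\d{9}$"), 3),
    ("cn_id_card", re.compile(r"^\d{17}[\dXx]$"), 4),
]


def extract_fields(obj, path="$"):
    """Flatten a parsed JSON payload into (structural path, field name, value) triples."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from extract_fields(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for idx, value in enumerate(obj):
            yield from extract_fields(value, f"{path}[{idx}]")
    else:
        yield path, path.rsplit(".", 1)[-1], obj


def tag_sensitive(payload_text):
    """Return one tag per field whose string value matches a rule (first match wins)."""
    tags = []
    for path, name, value in extract_fields(json.loads(payload_text)):
        for sens_type, pattern, level in RULES:
            if isinstance(value, str) and pattern.match(value):
                tags.append({"path": path, "field": name, "type": sens_type, "level": level})
                break
    return tags


def build_feature_vector(tags, receiver_trust, channel_encrypted, alarm_count):
    """Fuse sensitive tags with context features into one numeric vector (assumed layout)."""
    max_level = max((t["level"] for t in tags), default=0)
    return [
        float(len(tags)),                     # number of sensitive fields
        float(max_level),                     # highest sensitivity level present
        float(receiver_trust),                # receiver trust level in [0, 1]
        1.0 if channel_encrypted else 0.0,    # channel attribute
        float(alarm_count),                   # historical alarms for this receiver
    ]
```

Keeping the structural path (for example `$.user.phone`) in each tag is what later allows field-level operators to target exactly the fields that were flagged.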

[0008] Further, the step of computing the source tracing risk score includes the following. In this step, the feature vector is combined with historical data flow trajectories to construct a data flow trajectory map. The local risk, contextual risk, and trajectory risk components of the communication unit are calculated and then fused into a source tracing risk score, which characterizes the probability that the communication can be traced and re-identified.
- Step 3.1: Based on the feature vector, the historical flow records of the data objects involved in the communication unit are obtained. A data flow trajectory map is constructed with the source system, intermediate systems, and external services as nodes and transmission behaviors as edges.
- The local risk component of the communication unit is calculated from the sensitive tags in the feature vector, combined with field linkability.
- The contextual features in the feature vector are analyzed to obtain four quantities: the receiver trust level, channel security level, purpose conformity, and historical risk factor. The contextual risk component is calculated from these four quantities.
- All transmission paths from the current node to potential leakage nodes are analyzed on the data flow trajectory map, and the conditional leakage probability of each path is obtained. The trajectory risk component is calculated from these probabilities.
- The local, contextual, and trajectory risk components are weighted and fused. The result is normalized to obtain the source tracing risk score.
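The three risk components and their weighted fusion can be sketched in Python as follows. The patent does not give formulas, so every formula here is an assumption chosen for illustration:
- Local risk scales the highest sensitivity level by linkability.
- Context risk averages the four normalized factors.
- Each path's conditional leakage probability is the product of its edge transfer probabilities.
- Paths are treated as independent, so trajectory risk is 1 minus the product of (1 minus each path probability).
- The default weights are placeholders.

```python
from math import prod


def local_risk(tags, linkability):
    """Highest sensitivity level (normalized by an assumed max of 4), scaled by linkability in [0, 1]."""
    if not tags:
        return 0.0
    max_level = max(t["level"] for t in tags) / 4.0
    return min(1.0, max_level * (0.5 + 0.5 * linkability))


def context_risk(receiver_trust, channel_security, purpose_conformity, historical_factor):
    # All inputs assumed normalized to [0, 1]; low trust, security, or conformity raises risk.
    return ((1 - receiver_trust) + (1 - channel_security)
            + (1 - purpose_conformity) + historical_factor) / 4.0


def leak_paths(graph, node, leak_nodes, prob=1.0, seen=None):
    """Yield the conditional leakage probability of every simple path to a leakage node.

    graph maps node -> list of (next_node, transfer_probability).
    """
    seen = (seen or set()) | {node}
    if node in leak_nodes:
        yield prob
        return
    for nxt, p in graph.get(node, []):
        if nxt not in seen:  # avoid cycles
            yield from leak_paths(graph, nxt, leak_nodes, prob * p, seen)


def trajectory_risk(graph, start, leak_nodes):
    # Independence assumption: P(at least one path leaks) = 1 - prod(1 - p_path).
    return 1.0 - prod(1.0 - p for p in leak_paths(graph, start, leak_nodes))


def traceability_score(r_local, r_ctx, r_traj, weights=(0.4, 0.3, 0.3)):
    """Weighted fusion normalized by the weight sum, so inputs in [0, 1] give a score in [0, 1]."""
    return sum(w * r for w, r in zip(weights, (r_local, r_ctx, r_traj))) / sum(weights)
```

Worked example (illustrative only):
- Suppose `crm -> gateway -> partner -> public` has edge probabilities 1.0, 0.5, and 0.4, and `crm -> gateway -> cloud` has 1.0 and 0.2.
- The two paths then leak with probabilities 0.2 and 0.2.
- The trajectory risk is 1 - 0.8 × 0.8 = 0.36.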

[0009] Further, the adversarial evaluation step includes the following. In this step, adversarial attack samples are constructed for simulation testing based on the source tracing risk score and the current strategy configuration. The strategy's effectiveness against prompt injection, data exfiltration, and re-identification attacks is evaluated, strategy weaknesses are identified, and calibration suggestions are generated.
- Attack types that match the current communication risk level are selected from a pre-built attack scenario library, based on the source tracing risk score and the current strategy configuration. These are used to generate a set of adversarial attack samples.
- The sample set is fed into the data communication protection gateway under the current policy configuration for simulation testing. For each attack sample, sensitive information leakage, policy bypass behavior, and false interception events are recorded, and the test results are stored.
- The leakage probability, bypass rate, and false interception rate are computed for each attack scenario from the test results. The effectiveness and robustness of the current strategy against prompt injection, data exfiltration, and re-identification attacks are evaluated, giving a strategy effectiveness evaluation result.
- The weak links in the current strategy configuration are identified from that evaluation result. For each weak point, the associated attack types and causes of protection failure are analyzed, giving strategy weakness information.
- Calibration suggestions are generated from the strategy weakness information. They include risk threshold adjustments, enhancements to concealment operators, or optimization of the rule matching order.
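The per-scenario statistics and the calibration step can be sketched in Python as follows. This is a minimal sketch built on several hypothetical choices:
- `SCENARIO_LIBRARY` (scenarios unlocked by a minimum risk score), the `AttackResult` record, and the tolerance values are all illustrative.
- Scenario selection here depends only on the risk score, whereas the patent also conditions it on the current strategy configuration.
- False interceptions are measured on benign control samples, which is one reasonable way to define that rate, not necessarily the patent's.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical library: scenario name -> minimum risk score at which it is exercised.
SCENARIO_LIBRARY = {
    "re_identification": 0.0,
    "prompt_injection": 0.3,
    "data_exfiltration": 0.6,
}


def select_scenarios(risk_score):
    return [name for name, min_risk in SCENARIO_LIBRARY.items() if risk_score >= min_risk]


@dataclass
class AttackResult:
    scenario: str
    malicious: bool  # True for an attack sample, False for a benign control sample
    leaked: bool     # sensitive data reached the receiver
    bypassed: bool   # the sample evaded policy matching
    blocked: bool    # the gateway intercepted the sample


def evaluate(records):
    stats = defaultdict(lambda: {"attacks": 0, "leaks": 0, "bypasses": 0, "benign": 0, "false_blocks": 0})
    for r in records:
        s = stats[r.scenario]
        if r.malicious:
            s["attacks"] += 1
            s["leaks"] += r.leaked
            s["bypasses"] += r.bypassed
        else:
            s["benign"] += 1
            s["false_blocks"] += r.blocked  # blocking benign traffic is a false interception
    return {
        name: {
            "leak_probability": s["leaks"] / s["attacks"] if s["attacks"] else 0.0,
            "bypass_rate": s["bypasses"] / s["attacks"] if s["attacks"] else 0.0,
            "false_interception_rate": s["false_blocks"] / s["benign"] if s["benign"] else 0.0,
        }
        for name, s in stats.items()
    }


def calibration_suggestions(report, max_leak=0.05, max_false=0.02):
    suggestions = []
    for name, m in report.items():
        if m["leak_probability"] > max_leak or m["bypass_rate"] > max_leak:
            suggestions.append(f"{name}: lower risk threshold or strengthen concealment operators")
        if m["false_interception_rate"] > max_false:
            suggestions.append(f"{name}: relax rules or reorder rule matching")
    return suggestions
```

Each suggestion maps to one of the three kinds of calibration the patent lists: threshold adjustment, operator enhancement, or rule-order optimization.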

[0010] Further, the strategy execution step includes the following. In this step, the corresponding strategy level is selected based on the source tracing risk score and the calibration suggestions. A pipeline of data transformation operators is then executed to anonymize, encrypt, generalize, or differentially perturb the communication payload, giving the data for concealed transmission.
- A strategy level that matches the current risk level is selected from a preset set of strategy levels, based on the source tracing risk score combined with the calibration suggestions.
- The corresponding sequence of data transformation operators is matched from a preset operator library according to the selected strategy level. This sequence forms the execution pipeline for the current communication unit.
- The original payload of the current communication unit is fed into the execution pipeline. One or more of the following operations are performed in sequence to obtain a transformed intermediate payload: field masking, tokenization, generalization and anonymization, differential perturbation, or field-level encryption.
- The intermediate payload is adapted to the output format and given integrity encapsulation. This generates a final payload that meets the requirements of the target interface, which serves as the data for concealed transmission.
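The level selection and operator pipeline can be sketched in Python using only the standard library. This is a minimal sketch, and the following are assumptions rather than the patent's design:
- The level table, field configuration, and hard-coded keys are hypothetical. Real keys would come from the key and credential management module.
- Tokenization uses HMAC-SHA256.
- Differential perturbation adds Laplace noise with scale sensitivity/epsilon, sampled as the difference of two exponentials.
- Integrity encapsulation is an HMAC tag over canonical JSON.
- Field-level encryption is omitted because it needs a vetted cryptography library.
- `shift` is one possible way to apply a "lower threshold" calibration suggestion.

```python
import hashlib
import hmac
import json
import random

TOKEN_KEY = b"demo-token-key"          # hypothetical; supplied by key management in practice
INTEGRITY_KEY = b"demo-integrity-key"  # hypothetical
_rng = random.Random()

# Hypothetical strategy levels: (exclusive upper bound of risk score, operators in order).
STRATEGY_LEVELS = [
    (0.3, ["mask"]),
    (0.7, ["mask", "tokenize", "generalize"]),
    (float("inf"), ["mask", "tokenize", "generalize", "perturb"]),
]


def select_level(score, shift=0.0):
    """Pick the first level whose bound exceeds the score; shift > 0 tightens the thresholds."""
    for upper, ops in STRATEGY_LEVELS:
        if score < upper - shift:
            return ops
    return STRATEGY_LEVELS[-1][1]


def mask(payload, cfg):
    for f in cfg.get("mask", []):
        v = str(payload[f])
        payload[f] = v[:3] + "*" * (len(v) - 7) + v[-4:] if len(v) > 7 else "*" * len(v)
    return payload


def tokenize(payload, cfg):
    for f in cfg.get("tokenize", []):
        digest = hmac.new(TOKEN_KEY, str(payload[f]).encode(), hashlib.sha256).hexdigest()
        payload[f] = "tok_" + digest[:16]  # deterministic, keyed pseudonym
    return payload


def generalize(payload, cfg):
    for f in cfg.get("generalize", []):
        low = int(payload[f]) // 10 * 10
        payload[f] = f"{low}-{low + 9}"  # e.g. an exact age becomes a decade bucket
    return payload


def perturb(payload, cfg):
    scale = cfg.get("sensitivity", 1.0) / cfg.get("epsilon", 1.0)
    for f in cfg.get("perturb", []):
        # Laplace(0, scale) noise as the difference of two i.i.d. exponentials with mean `scale`.
        noise = _rng.expovariate(1 / scale) - _rng.expovariate(1 / scale)
        payload[f] = float(payload[f]) + noise
    return payload


OPERATORS = {"mask": mask, "tokenize": tokenize, "generalize": generalize, "perturb": perturb}


def run_pipeline(payload, score, cfg, shift=0.0):
    ops = select_level(score, shift)
    result = dict(payload)  # never mutate the caller's original payload
    for name in ops:
        result = OPERATORS[name](result, cfg)
    body = json.dumps(result, sort_keys=True, ensure_ascii=False)
    tag = hmac.new(INTEGRITY_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag, "operators": ops}
```

For example, a mobile number `13812345678` is masked to `138****5678`, and an age of 37 is generalized to `30-39`.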

[0011] Further, the audit and feedback step includes the following. In this step, an immutable audit log is generated for each policy execution during processing to form a chain of evidence. Training samples are constructed from the adversarial evaluation results and external alarm events and used to incrementally update the source tracing risk assessment model and the policy thresholds.
- The decision records of each strategy execution are collected. From them, the link identifier, strategy identifier, risk score, input summary, and output summary are obtained and used to generate audit log entries with timestamps and hash verification. Together these entries form a traceable audit evidence chain.
- The feature data of each strategy execution is extracted from the audit evidence chain. The adversarial evaluation results and the alarm events reported by third-party services are also obtained. These are used to construct a training sample set containing feature vectors and leakage labels.
- The training sample set is fed into the source tracing risk assessment model, and the model parameters are optimized by iterative gradient descent. This updates the fusion weights of the local, contextual, and trajectory risk components and yields an optimized risk assessment model.
- The actual leakage rate for each strategy level is computed from the training sample set and compared with a preset target leakage rate. The risk threshold set is adjusted adaptively according to the comparison, giving an updated strategy threshold configuration.
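The timestamped, hash-verified audit entries can be sketched as a hash chain in Python. In this minimal sketch, each entry stores the SHA-256 of its predecessor, so altering any earlier entry breaks verification. A hash chain makes tampering detectable rather than impossible; true immutability would additionally need write-once storage or external anchoring, which the patent does not detail here. The `AuditChain` class and its field names are hypothetical. Input and output summaries are stored as digests, which is one assumed way of keeping raw payloads out of the log.

```python
import hashlib
import json
import time


class AuditChain:
    """Append-only audit log in which each entry commits to the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, link_id, strategy_id, risk_score, input_summary, output_summary, ts=None):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "timestamp": ts if ts is not None else time.time(),
            "link_id": link_id,
            "strategy_id": strategy_id,
            "risk_score": risk_score,
            "input_digest": hashlib.sha256(input_summary.encode()).hexdigest(),
            "output_digest": hashlib.sha256(output_summary.encode()).hexdigest(),
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _digest(entry):
        # Hash every field except the entry's own hash, using canonical JSON ordering.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self):
        """Recompute every hash and check each back-link; False means the chain was altered."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True
```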

[0012] In a second aspect, a computing device is provided, including:
- one or more processors; and
- a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.

[0013] In a third aspect, a computer-readable storage medium is provided, storing a program which, when executed by a processor, implements the method described above.

[0014] The above solution of the present invention has at least the following beneficial effects.
- Unified traffic access and parsing through a data communication protection gateway, combined with multi-dimensional features fused into trajectory maps and source tracing risk scores, overcomes the inability of existing technologies to quantify source tracing and re-identification risks from full-dimensional features. This enables refined, quantitative assessment of source tracing risk in cross-domain data communication.
- Using the source tracing risk score to construct adversarial attack samples for simulation testing, and to generate strategy calibration suggestions, overcomes two gaps in existing technologies: the lack of an adversarial strategy verification mechanism that incorporates risk features, and the resulting vulnerability to new source tracing attacks. Protection strategies can therefore be optimized in a targeted way, improving their effectiveness and robustness against prompt injection, data exfiltration, and re-identification attacks.
- Matching strategy levels to source tracing risk scores and executing a data transformation operator pipeline overcomes the mismatch between existing protection strategies and actual source tracing risk. It achieves dynamic adaptation of data concealment protection strength and balances data concealment against business continuity.
- Generating tamper-proof audit logs, and combining adversarial evaluation results with external alarms to build training samples that incrementally update the model and policy thresholds, overcomes the lack of an effective audit evidence chain in existing technologies and their inability to optimize policies continuously and adaptively.

Together these form a closed-loop protection system spanning risk assessment, adversarial evaluation, policy execution, and audit feedback. The system meets compliance auditing and post-event forensics requirements, and it allows the protection capability of the data concealment transmission system based on source tracing risk assessment and adversarial evaluation to evolve continuously. The core carrier of this system is a data communication protection gateway deployed between business systems and external networks and third-party services. It is an integrated system covering protection, assessment, evaluation, and optimization across the full process of cross-domain concealed data transmission, and all technical solutions in this invention are implemented on it. Overall, the security, controllability, and reliability of cross-domain concealed data transmission are improved.

Attached Figure Description

[0015] Figure 1 is a flowchart of a data concealment transmission method based on source tracing risk assessment and adversarial evaluation according to an embodiment of the present invention.

Detailed Implementation

[0016] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0017] As shown in Figure 1, an embodiment of the present invention provides a data concealment transmission method based on source tracing risk assessment and adversarial evaluation, the method comprising the following steps:
Step 1: Deploy a data communication protection gateway between internal services and external networks so that inbound and outbound traffic is uniformly accessed. Perform session reassembly and protocol parsing on the original network packets to obtain communication units, each abstracted from one request or response.
Step 2: Parse the payload content of each communication unit to identify sensitive data and generate sensitive tags. Extract the receiver's identity, transmission channel attributes, and historical alarm records from the communication environment and session metadata as contextual features, and fuse them into a feature vector.
Step 3: Combine the feature vector with historical data flow trajectories to construct a data flow trajectory map. Calculate the local risk, contextual risk, and trajectory risk components of the communication unit, and fuse them into a source tracing risk score that represents the probability that the communication can be traced and re-identified.
Step 4: Based on the source tracing risk score and the current strategy configuration, construct adversarial attack samples for simulation testing. Use the tests to evaluate the effectiveness of the strategy against prompt injection, data exfiltration, and re-identification attacks, identify strategy weaknesses, and generate calibration suggestions.
Step 5: Based on the source tracing risk score and the calibration suggestions, select the corresponding strategy level. Execute a pipeline of data transformation operators that anonymize, encrypt, generalize, or differentially perturb the communication payload, to obtain the data for concealed transmission.
Step 6: During processing, generate an immutable audit log for each strategy execution to form a chain of evidence. Construct training samples from the adversarial evaluation results and external alarm events, and use them to incrementally update the source tracing risk assessment model and strategy thresholds.
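The incremental update in Step 6 can be sketched in Python as follows. This is a minimal sketch, and its design choices are assumptions:
- The fusion weights of the three risk components are treated as the coefficients of a logistic model over leakage labels and refit by batch gradient descent on log-loss. The patent names gradient descent but not the loss function.
- A level's risk threshold is lowered when observed leakage exceeds the target and raised when leakage is well below it. The step size, the "half the target" relaxation rule, and the clamping bounds are illustrative.

```python
import math


def update_fusion_weights(samples, weights=(0.4, 0.3, 0.3), bias=0.0, lr=0.5, epochs=200):
    """Refit fusion weights by batch gradient descent on logistic loss.

    samples: list of ((r_local, r_ctx, r_traj), leaked) with leaked in {0, 1}.
    """
    w = list(weights)
    b = bias
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted leakage probability
            err = p - y                     # gradient of log-loss w.r.t. z
            for i in range(3):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def adjust_threshold(threshold, observed_leak_rate, target_leak_rate,
                     step=0.05, floor=0.05, ceil=0.95):
    """Lower a level's threshold (stricter) when leakage exceeds target; relax it when well below."""
    if observed_leak_rate > target_leak_rate:
        threshold -= step
    elif observed_leak_rate < target_leak_rate / 2:
        threshold += step
    return min(ceil, max(floor, threshold))
```

Refitting in small batches from the audit-derived sample set keeps updates incremental. Clamping the thresholds prevents a noisy batch from pushing a level to an extreme.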

[0018] In this embodiment, the data communication protection gateway provides unified access to and parsing of cross-domain communication traffic. Quantifying the source tracing risk score from multi-dimensional features lets the gateway determine accurately how likely the communication is to be traced and re-identified. Adversarial attack simulation tests targeted by risk score identify weaknesses in the protection strategy and support precise calibration. This improves the strategy's resistance to source tracing attacks such as prompt injection, data exfiltration, and re-identification.

Protection strategy levels are matched dynamically according to the risk score and calibration suggestions, and multi-operator data transformation is performed. Data concealment protection strength therefore adapts to risk, securing concealed data transmission while preserving business continuity.

Tamper-proof audit logs form a complete chain of evidence. The risk assessment model and strategy thresholds are incrementally updated from adversarial evaluation results and external alarms, closing the loop of risk assessment, strategy execution, adversarial evaluation, and audit feedback. This allows the concealment protection capability of the system to evolve continuously as business and threats change. It meets enterprises' privacy protection, security protection, and compliance auditing needs in cross-domain data communication.

[0019] In a preferred embodiment of the present invention, Step 1 may include Step 1.1: connecting the data communication protection gateway to the communication link between internal services and the external network by serial deployment, bypass deployment, or sidecar injection. This ensures that inbound and outbound traffic flows uniformly through the gateway for processing. Specifically, the gateway is deployed, and the communication link connected, at one of the following locations:
- the enterprise network egress;
- a core switching node of the data center;
- a cloud platform load balancer;
- the north-south / east-west boundary of the service mesh.

The internal services are the business carriers deployed independently by the enterprise. They cover every entity that generates data communication requests, such as transaction systems, intelligent customer service systems, data sharing platforms, and microservice modules, and they are the initiators and data sources of cross-domain data flows. The external network is the network environment outside the enterprise's security boundary. It includes every receiving end that interacts with internal services, such as third-party partner service clusters, public cloud service platforms, and internet application systems. Cross-domain data communication between internal services and the external network takes place over the designated physical or logical communication link.

[0020] The data communication protection gateway is the core integrated hardware and software device for concealed cross-domain data transmission. It adopts a layered, modular architecture that integrates nine core functional modules, hardware support components, and a gateway management backend.
- The modules synchronize information and coordinate their functions in real time over an internal high-speed data interaction bus.
- The hardware support components provide computing, storage, network transmission, and other basic resources to all modules.
- The management backend handles the gateway's global configuration, status monitoring, parameter optimization, and operation and maintenance.

[0021] The nine core functional modules and their relationships are as follows.
- The traffic access and protocol parsing module is the gateway's traffic entry point. It receives network traffic on the communication link, performs session reassembly and protocol parsing, and underpins all subsequent modules.
- The data classification, grading, and sensitivity identification module receives the parsing results. It identifies and grades sensitive data in the communication payload and supplies the basic feature input for risk assessment.
- The source tracing risk assessment module combines sensitive features with historical data flow trajectories to calculate the source tracing risk score of each communication unit. This score provides a quantitative basis for strategy formulation.
- The adversarial evaluation module constructs attack scenarios and adversarial samples and runs simulation tests against the current protection strategy. It identifies policy weaknesses and generates calibration suggestions.
- The data communication protection strategy engine is the decision-making core of the gateway. It generates and distributes appropriate protection policies based on the comprehensive risk score and calibration suggestions.
- The execution and transformation module anonymizes, encrypts, and perturbs the communication payload according to those policies, enabling concealed data transmission.
- The key and credential management module supplies the execution and transformation module with security credentials, such as encryption / decryption keys and token mapping keys, so that data transformation operations are secure.
- The policy audit and traceable log module records the entire operation of the gateway. It generates an immutable audit evidence chain as the basis for compliance audits and issue tracing.
- The feedback learning module extracts training samples from audit logs, adversarial evaluation results, and external alarms. It uses them to optimize the source tracing risk assessment model and policy thresholds, so that the gateway's protection capability evolves adaptively.

[0022] Depending on the enterprise's network architecture, business deployment model, and required security protection level, one of three methods (serial deployment, bypass deployment, or Sidecar injection) is selected to connect the data communication protection gateway to the communication link between internal services and the external network. All deployment methods strictly follow the core principle of unified access for inbound and outbound traffic. Outbound traffic from internal services to the external network and inbound traffic from the external network to internal services must both be routed to the traffic access and protocol parsing module of the data communication protection gateway. This achieves unified collection and processing of all cross-domain communication traffic and ensures that no traffic bypasses the data communication protection gateway.

[0023] The specific deployments are as follows. In serial deployment, the data communication protection gateway is placed directly on the core communication link between internal services and the external network, so all inbound and outbound traffic must be processed by the gateway before transmission. The gateway is also built as a primary/backup dual-machine or clustered high-availability architecture with a millisecond-level fault bypass switching mechanism. When the gateway fails, traffic is automatically switched to the bypass link. This avoids single points of failure that could interrupt business communication, and it prevents link anomalies from exposing traceable metadata such as internal service IPs and communication patterns. In bypass deployment, mirroring ports on the enterprise's core switches and routers, or dedicated traffic replication equipment, replicate all inbound and outbound traffic between internal services and the external network. The replicated traffic is forwarded to the data communication protection gateway over an independent link, while production traffic continues along the original link. Monitoring and processing are therefore non-intrusive and do not affect production operations. This suits enterprise scenarios with extremely high business continuity requirements. In Sidecar injection, a lightweight, microservice-based data communication protection gateway component is deployed as a sidecar next to each enterprise microservice instance through the service mesh injection mechanism. The component shares a one-to-one lifecycle binding with its microservice, and all traffic between the microservice and the external network is forwarded and processed by the local Sidecar gateway component.
This suits cloud-native, microservice-based enterprise scenarios and enables fine-grained traffic control at the level of a single service.

[0024] All three deployment methods are implemented around the core principle of unified access for inbound and outbound traffic. Ultimately, all communication traffic between internal services and external networks using various protocols such as HTTP / HTTPS, gRPC, and WebSocket will flow uniformly through the traffic access and protocol parsing module of the data communication protection gateway. This provides a unified and controllable traffic entry point for the entire process of operations such as session reassembly, protocol parsing, sensitivity identification, and risk assessment, ensuring that every link in cross-domain data communication can be monitored, protected, and audited.

[0025] Step 1.2: The data communication protection gateway performs session reassembly on the received original network packets to obtain reassembled complete messages. It then identifies and parses the payload of these messages according to the protocol type, and abstracts each complete request or response into a communication unit. Specifically, after receiving the original network packet stream through the designated network port, the gateway first reassembles sessions based on each packet's five-tuple, sequence number, and acknowledgment number. This restores the discretely transmitted packets into complete session streams and yields the reassembled complete messages. The protocol type of each reassembled message is then identified by combining Deep Packet Inspection (DPI) with protocol fingerprinting. The DPI method parses the packet payload layer by layer and extracts the protocol header format, feature fields, key fields, and fixed identification information. It then matches this information precisely against the standard protocol feature database built into the gateway, which quickly identifies communication protocols that use standard ports and standardized formats. The protocol fingerprinting method extracts fingerprint information such as the message interaction sequence, message length distribution, byte statistics, and connection behavior characteristics, and matches it against a preset protocol fingerprint database. This database is built into the gateway and covers the features of non-standard-port protocols, encrypted tunnel protocols, and custom protocols.
The database pre-stores, for each protocol, a unique set of fingerprint features describing its interaction behavior, message form, and traffic characteristics. It supports on-demand updates and extensions, which enables effective identification of non-standard ports, encrypted tunnels, and custom protocols. Working together, the two methods identify protocols such as HTTP/HTTPS, HTTP/2, gRPC, WebSocket, MQ messages, file upload and download, and object storage callbacks comprehensively and accurately, determining the protocol type of each complete message.
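The two-stage identification described above can be sketched as follows, with signature matching first and fingerprint matching as a fallback. This is a minimal illustration under stated assumptions, not the gateway's implementation. The signature table, the two fingerprint features (mean message length and client-to-server message ratio), the fingerprint entries, and the `max_distance` threshold are all hypothetical. Real DPI also parses headers layer by layer rather than only matching payload prefixes.

```python
from dataclasses import dataclass

# Hypothetical DPI signature table: payload prefixes of standard protocols.
DPI_SIGNATURES = {
    b"PRI * HTTP/2.0": "HTTP/2",
    b"GET ": "HTTP",
    b"POST ": "HTTP",
    b"\x16\x03": "TLS",  # TLS record header: content type 0x16 (handshake), major version 0x03
}


@dataclass
class Fingerprint:
    mean_len: float       # average message length in bytes
    up_down_ratio: float  # client-to-server / server-to-client message count


# Hypothetical fingerprint database for non-standard or custom protocols.
FINGERPRINT_DB = {
    "custom-rpc": Fingerprint(mean_len=64.0, up_down_ratio=1.0),
    "mq-publish": Fingerprint(mean_len=512.0, up_down_ratio=4.0),
}


def fingerprint_of(messages):
    """messages: non-empty list of (direction, payload), direction 'c2s' or 's2c'."""
    mean_len = sum(len(p) for _, p in messages) / len(messages)
    up = sum(1 for d, _ in messages if d == "c2s")
    down = sum(1 for d, _ in messages if d == "s2c")
    return Fingerprint(mean_len, up / max(down, 1))


def identify_protocol(messages, max_distance=0.25):
    # Stage 1 (DPI-style): match the first client payload against known signatures.
    first = next((p for d, p in messages if d == "c2s"), b"")
    for prefix, proto in DPI_SIGNATURES.items():
        if first.startswith(prefix):
            return proto
    # Stage 2 (fingerprinting): nearest fingerprint by summed relative error.
    fp = fingerprint_of(messages)
    best, best_dist = "unknown", float("inf")
    for name, ref in FINGERPRINT_DB.items():
        dist = (abs(fp.mean_len - ref.mean_len) / ref.mean_len
                + abs(fp.up_down_ratio - ref.up_down_ratio) / ref.up_down_ratio)
        if dist < best_dist:
            best, best_dist = name, dist
    return best if best_dist <= max_distance else "unknown"
```

Traffic that matches neither stage is reported as `"unknown"` and is not forced onto the closest fingerprint.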

[0026] Based on the identified protocol type, and strictly following that protocol's syntax rules and semantic specifications, the payload of each reassembled complete message is parsed into a structured form. For each session $f$, the parser extracts the five-tuple (source IP, source port, destination IP, destination port, protocol type) together with core metadata such as the request method, URL, header fields, payload length, and message exchange sequence. Taking a single complete business request or response as the basic granularity, the parsed metadata and payload content are standardized and encapsulated into independent communication units $u$. Suppose that within a certain time window the gateway receives a set of original sessions $F = \{f_i\}$. After session reassembly and protocol identification, the corresponding set of communication units $U = \{u_j\}$ is obtained. Each communication unit $u_j \in U$ is mapped precisely to one or more underlying original sessions $f_i \in F$, which ensures that every communication unit can be traced back to the original network messages.
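Session reassembly keyed on the five-tuple and sequence number, followed by abstraction into communication units that keep a mapping back to their source sessions, can be sketched as below. This sketch makes several assumptions. Segments arrive as already-decoded (five-tuple, sequence number, payload) records. Retransmissions are exact duplicates. Overlapping segments and acknowledgment numbers are ignored. The caller supplies a protocol-specific `split_messages` function; the test uses a toy newline-delimited protocol. All class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


@dataclass
class Segment:
    flow: FiveTuple
    seq: int          # sequence number of the first payload byte
    payload: bytes


@dataclass
class CommunicationUnit:
    unit_id: int
    flow: FiveTuple
    payload: bytes
    source_sessions: List[FiveTuple]   # mapping back to the original session(s)


def reassemble(segments: List[Segment]) -> Dict[FiveTuple, bytes]:
    """Group segments by five-tuple and concatenate them in sequence-number order."""
    flows: Dict[FiveTuple, Dict[int, bytes]] = defaultdict(dict)
    for seg in segments:
        flows[seg.flow].setdefault(seg.seq, seg.payload)   # ignore exact retransmissions
    return {flow: b"".join(p for _, p in sorted(by_seq.items()))
            for flow, by_seq in flows.items()}


def to_units(streams: Dict[FiveTuple, bytes],
             split_messages: Callable[[bytes], List[bytes]]) -> List[CommunicationUnit]:
    """One communication unit per complete request/response found in each stream."""
    units = []
    for flow, stream in streams.items():
        for message in split_messages(stream):
            units.append(CommunicationUnit(len(units), flow, message, [flow]))
    return units
```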

[0027] Step 1.3: Based on preset filtering rules, the communication units are screened preliminarily. Units that involve sensitive data or carry potential traceability risks are marked as objects to be protected, yielding a set of communication units to be protected. Specifically, multi-level progressive filtering rules are pre-configured in the data communication protection gateway to reduce the computational load without overlooking high-risk communications. The first level is a coarse filtering rule that screens only on the port number, protocol type, and destination IP/domain name of the communication unit. It directly filters out low-risk traffic unrelated to sensitive data and key business. The second level is a fine-grained filtering rule that screens precisely on the URL prefix, business tag, API group, and payload keyword characteristics of the communication unit. It further identifies traffic that interacts with external third parties and involves core business data. A filtering function is defined from these two levels of rules. Its input is a single communication unit and its output is a binary value. An output of 1 indicates that the communication unit involves sensitive data, is a critical business interaction, or carries a potential traceability risk. An output of 0 indicates that the communication unit has no traceability risk and needs no further protection. Each communication unit in the set $U$ is passed through the filtering function in turn, and units whose output is 1 are marked as objects to be protected. All objects to be protected together form the set of communication units to be protected. This precisely defines the processing scope for sensitive data identification and risk assessment.
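The two-level filtering function described above can be sketched as follows. It is a minimal illustration, assuming a communication unit exposes its port, protocol, destination, URL, API group, and payload text. Every rule set below (ports, protocols, destinations, URL prefixes, API groups, keywords) is a hypothetical example, not a configuration from the source.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    port: int
    protocol: str
    dest: str
    url: str = ""
    api_group: str = ""
    payload: str = ""


# Hypothetical rule sets; a real gateway would load these from its management backend.
LOW_RISK_PORTS = {53, 123}                # e.g. DNS, NTP
LOW_RISK_PROTOCOLS = {"ICMP"}
LOW_RISK_DESTS = {"time.example.org"}
SENSITIVE_URL_PREFIXES = ("/api/v1/customer", "/api/v1/payment")
SENSITIVE_API_GROUPS = {"partner-export"}
SENSITIVE_KEYWORDS = ("id_card", "account_no", "phone")


def coarse_pass(u: Unit) -> bool:
    """Level 1: drop traffic that is clearly unrelated to sensitive data."""
    return (u.port not in LOW_RISK_PORTS
            and u.protocol not in LOW_RISK_PROTOCOLS
            and u.dest not in LOW_RISK_DESTS)


def fine_pass(u: Unit) -> bool:
    """Level 2: keep traffic whose URL, API group or payload marks it as sensitive."""
    return (u.url.startswith(SENSITIVE_URL_PREFIXES)
            or u.api_group in SENSITIVE_API_GROUPS
            or any(k in u.payload for k in SENSITIVE_KEYWORDS))


def filter_unit(u: Unit) -> int:
    """Binary filtering function: 1 = object to be protected, 0 = no further protection."""
    return 1 if coarse_pass(u) and fine_pass(u) else 0


def units_to_protect(units):
    return [u for u in units if filter_unit(u) == 1]
```

The fine-grained level runs only on units that pass the coarse level. This ordering is what keeps the cheaper port/protocol checks in front of payload keyword scanning.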

[0028] Step 1.4: The communication units in the set of communication units to be protected are fragmented or resampled. High-load or long-connection traffic is broken down into multiple logical sub-units, and the sampling rate is adjusted according to service importance, forming formatted communication units. Specifically, each communication unit in the set to be protected undergoes a traffic attribute analysis. This analysis extracts core attributes such as payload data volume, session duration, message interaction frequency, and business interface type. It focuses on identifying communication units with excessive load or long connections. Excessive load means that a unit's payload data volume exceeds the load threshold set by the gateway. A long connection means that a unit's session duration exceeds a set duration threshold while messages continue to be exchanged. Such units are logically fragmented according to preset fragmentation rules. The gateway defines these rules in advance from the business data transmission characteristics and its own processing capacity, and they specify the fixed fragment size, the sub-unit identification rules, and the metadata retention criteria. Under these rules, a single large communication unit is split into multiple fixed-size logical sub-units with consecutive unique identifiers, denoted $u^{(1)}, u^{(2)}, \ldots, u^{(r)}$, where $r$ is the number of fragments. A logical sub-unit is the smallest communication unit with independent processing attributes and can take part in subsequent sensitivity identification and risk assessment on its own.
Each logical sub-unit retains all metadata, payload association information, and sensitive feature association information of the original communication unit. Fragmentation therefore does not affect the integrity or accuracy of feature extraction and risk assessment in subsequent steps.
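The fragmentation rule above can be sketched as follows. This is a minimal sketch, assuming byte-size fragmentation with consecutive `parent#n` identifiers. The three thresholds and all names are hypothetical. In this sketch the last fragment holds the remainder, and a long-lived unit whose payload already fits in one fragment is returned unchanged. Time-window splitting of long connections is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical thresholds; the source leaves concrete values to the gateway configuration.
LOAD_THRESHOLD = 64 * 1024       # bytes
DURATION_THRESHOLD = 300.0       # seconds
FRAGMENT_SIZE = 16 * 1024        # bytes


@dataclass
class Unit:
    unit_id: str
    payload: bytes
    duration_s: float = 0.0
    metadata: dict = field(default_factory=dict)
    parent_id: Optional[str] = None


def needs_fragmentation(u: Unit) -> bool:
    return len(u.payload) > LOAD_THRESHOLD or u.duration_s > DURATION_THRESHOLD


def fragment(u: Unit, size: int = FRAGMENT_SIZE) -> List[Unit]:
    """Split an oversized or long-lived unit into consecutively numbered sub-units."""
    if not needs_fragmentation(u) or len(u.payload) <= size:
        return [u]
    return [
        Unit(
            unit_id=f"{u.unit_id}#{i + 1}",        # consecutive unique identifiers
            payload=u.payload[off:off + size],    # last fragment holds the remainder
            duration_s=u.duration_s,
            metadata=dict(u.metadata),            # every sub-unit keeps the parent's metadata
            parent_id=u.unit_id,                  # association back to the original unit
        )
        for i, off in enumerate(range(0, len(u.payload), size))
    ]
```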

[0029] Based on the enterprise's predefined business importance classification standards, differentiated sampling strategies are configured for communication units of different business types and sensitivity levels. Communication units marked as high-sensitivity business interfaces, core transaction links, or sensitive data transmissions use a 100% full sampling strategy. All such traffic, including the communication units and their sub-units, is received and processed without omission or filtering, so that it is fully incorporated into the subsequent protection process. Low-risk communication units that do not involve sensitive data, such as heartbeat detection, health checks, and status queries, use an on-demand partial sampling strategy. A suitable sampling ratio is preset for them according to the gateway's processing pressure and risk monitoring needs, and samples are selected either randomly or periodically. This reduces the data processing pressure on the gateway while still allowing the source tracing risk of such low-risk traffic to be monitored in real time. After fragmentation and differentiated sampling, all processed communication units and logical sub-units are encapsulated in a standardized format. This format unifies the naming rules of metadata fields, the format rules for payload parsing, and the storage structure of feature information. Redundant transmission identification information is removed at the same time, and the formatted communication units are finally formed.
This provides a standardized, well-structured, directly processable object for sensitive data identification and contextual feature extraction. It also avoids generating redundant source tracing metadata for irrelevant traffic, which reduces data exposure during transmission and processing at the source.
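The differentiated sampling strategy can be sketched as follows. This is a minimal sketch, assuming each unit carries a business `tag`. The low-risk tag names are hypothetical. Units that are not tagged low-risk get full sampling, which is a conservative default chosen here. Low-risk units are kept either randomly at `ratio` or periodically (every `period`-th unit), mirroring the two options in the text.

```python
import random

# Hypothetical tag set for low-risk traffic; everything else is sampled at 100%.
LOW_RISK_TAGS = {"heartbeat", "health-check", "status-query"}


def sample_units(units, ratio=0.1, period=None, rng=None):
    """Keep all other units; keep low-risk ones randomly (ratio) or every `period`-th one."""
    rng = rng or random.Random()
    kept, low_seen = [], 0
    for u in units:
        if u["tag"] not in LOW_RISK_TAGS:
            kept.append(u)          # full sampling: no omission or filtering
            continue
        low_seen += 1
        keep = (low_seen % period == 0) if period else (rng.random() < ratio)
        if keep:
            kept.append(u)
    return kept
```

Passing a seeded `random.Random` makes random sampling reproducible, which helps when auditing what the gateway skipped.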

[0030] In a preferred embodiment of the present invention, step 2 above may include the following. Step 2.1: The payload content of the formatted communication unit is parsed into a structured form, and its fields are extracted according to the payload's content type. Each field is annotated with its field name, location, and structural path, yielding a field set. Specifically, the payload content of the formatted communication unit is first preprocessed to unify its encoding format and to filter out invalid characters. The Content-Type, data encoding, and protocol semantic features of the payload are then identified. This distinguishes structured payload types such as JSON, XML, and forms from unstructured payload types such as plain text and semi-structured documents. For structured payloads such as JSON, XML, and forms, hierarchical node parsing and precise field extraction are performed with path expressions such as JSONPath and XPath and with form parsing rules, ensuring that the extracted fields are complete. For unstructured payloads such as text and semi-structured documents, candidate content fragments with business significance are identified and extracted through word segmentation, keyword positioning, and pattern matching. Each extracted field or candidate fragment is annotated with its full field name, its physical offset in the payload, its structural path, and its content relationships. After all fields have been extracted, they are deduplicated and consolidated into a field set, denoted $X(u) = \{x_1, x_2, \ldots, x_n\}$, where $u$ is the current formatted communication unit and $x_i$ is an individual extracted field.
Compressed payloads are decompressed first, provided the business allows it. Encrypted payloads are decrypted after decryption permission has been obtained and before field extraction. Where decryption is not possible but risk assessment is still required, approximate field features are derived from side-channel characteristics such as the message length distribution, session interaction pattern, and field lengths. This keeps the field analysis comprehensive.
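Structured field extraction for a JSON payload, including the decompress-first rule, can be sketched as below. This is a minimal sketch, assuming a UTF-8 JSON body and a gzip `content_encoding` label. The function name is hypothetical. XML, form, and unstructured payloads are not covered, and physical byte offsets are omitted. Paths follow JSONPath-style notation (`$.a.b[0]`).

```python
import gzip
import json


def extract_fields(body: bytes, content_encoding: str = "identity"):
    """Flatten a JSON payload into (path, field_name, value) triples with JSONPath-style paths."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)           # decompress before extraction
    fields = []

    def walk(node, path, name):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}", key)
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{path}[{i}]", name)   # list items inherit the parent field name
        else:
            fields.append((path, name, node))       # leaf value = one extracted field

    walk(json.loads(body.decode("utf-8")), "$", "$")
    return fields
```

Because every leaf receives a unique path, two fields with the same name in different parts of the document remain distinguishable in the field set.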

[0031] Step 2.2: Based on the field set, the sensitive category of each field is identified by regular expression matching, dictionary matching, or a machine learning model. This determines the field's sensitivity type and its calculated sensitivity level, yielding a sensitive label set that contains each field, its sensitivity type, and its sensitivity level. Specifically, a sensitive category set is predefined according to the enterprise's data security specifications and relevant regulatory requirements. The set covers the full range of sensitive data types, including personal identification information, contact information, financial account credentials, confidential business information, and system access authentication information. Dedicated identification rules and matching criteria are configured for each category. Based on this category set, each field $x_i$ in the field set is classified using a multi-method fusion strategy. Structured identifier fields with a fixed encoding format and dedicated check rules are identified with high precision, primarily by regular expression matching combined with check digits. An initial screen matches features such as fixed character length, character composition rules, and format separators. Each candidate is then verified by computing its check digit and comparing it with the check digit embedded in the field, which confirms both the field's validity and the match. Industry-specific terms, enterprise-specific identifiers, and business-specific codes are identified by dictionary matching against a preset enterprise-level sensitive dictionary.
This preset enterprise-level sensitive dictionary is a structured dictionary library built into the data communication protection gateway. It supports dynamic updates and contains enterprise-defined sensitive terms, proprietary codes, business keywords, and similar entries. During matching, the field is first preprocessed by word segmentation and stop-word removal. It is then compared with the dictionary entries using exact or fuzzy matching, which produces a matching-degree value.
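The regex-plus-check-digit path and the dictionary path can be sketched as below. This is a minimal sketch, not the gateway's rule set. It assumes bank card numbers as the example identifier (16 to 19 digits, verified with the Luhn check digit) and uses a two-entry hypothetical dictionary. Python's `difflib.SequenceMatcher.ratio()` stands in for the unspecified fuzzy matching algorithm, and lowercasing plus whitespace normalization stands in for word segmentation and stop-word removal. The 0.8 threshold is also an assumption.

```python
import re
from difflib import SequenceMatcher

CARD_RE = re.compile(r"^\d{16,19}$")   # assumed format screen for bank card numbers

# Hypothetical enterprise-level sensitive dictionary: entry -> sensitive category.
SENSITIVE_DICT = {
    "project phoenix": "business_confidential",
    "merchant code": "business_code",
}


def luhn_valid(digits: str) -> bool:
    """Luhn check-digit verification (used by payment card numbers)."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def dict_match(text: str):
    """Return (category, matching degree) of the closest dictionary entry."""
    norm = " ".join(text.lower().split())   # minimal normalization in place of segmentation
    best = max(SENSITIVE_DICT, key=lambda e: SequenceMatcher(None, norm, e).ratio())
    return SENSITIVE_DICT[best], SequenceMatcher(None, norm, best).ratio()


def classify_field(value: str, min_degree: float = 0.8):
    if CARD_RE.match(value) and luhn_valid(value):     # regex screen, then check digit
        return "bank_card", 1.0
    category, degree = dict_match(value)
    return (category, degree) if degree >= min_degree else None
```

The check-digit step is what rejects strings that merely look like card numbers, such as a 16-digit order ID with the wrong final digit.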

[0032] For unstructured fields such as natural language text and business scenario descriptions, the sensitivity type is determined by sequence labeling. A multi-strategy approach balances precision and recall. Feature matching is used to calculate, for each field $x_i$, the probability $P(c \mid x_i)$ that it belongs to each sensitive category $c$ in the category set $C$. The final sensitivity type of each field is determined by the maximum a posteriori principle:

$$t_i = \arg\max_{c \in C} P(c \mid x_i).$$

Sensitivity levels and grading labels are then determined. A basic sensitivity score is first configured for each sensitivity type. A context sensitivity correction term is then calculated from factors such as the business context in which the field appears, the data flow scenario, and compliance requirements. Finally, the sensitivity level $s_i$ of each field $x_i$ is calculated from the basic score and the correction term, and hierarchical grading labels are defined. Each field is bound to its sensitivity type and calculated sensitivity level, which completes the standardized labeling of all fields, and a unique identifier is attached to each label for traceability. The result is a sensitive label set $L(u) = \{(x_i, t_i, s_i)\}_{i=1}^{n}$, where $s_i$ is the calculated sensitivity level of field $x_i$ and $n$ is the number of fields. The label set fully preserves the correspondence between fields and sensitive labels and provides accurate field sensitivity characteristics for source tracing risk assessment.
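The labeling step can be sketched as below. The MAP decision uses the standard form $\arg\max_c P(c)\,P(x \mid c)$, which is equivalent to maximizing the posterior because the evidence term is shared by all categories. The paragraph above does not state how the basic score and the correction term are combined. This sketch therefore assumes an additive form clipped to a hypothetical 0 to 5 scale. The base scores, label-id format, and all names are also hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical basic sensitivity scores per sensitivity type.
BASE_SCORE = {"id_number": 4.0, "phone": 3.0, "business_code": 2.0}
MAX_LEVEL = 5.0


def map_type(likelihoods: Dict[str, float], priors: Dict[str, float]) -> str:
    """MAP decision: argmax_c P(c) * P(x | c); the evidence P(x) is the same for all c."""
    return max(likelihoods, key=lambda c: priors.get(c, 0.0) * likelihoods[c])


def sensitivity_level(t: str, correction: float) -> float:
    """ASSUMED form: basic score plus context correction, clipped to [0, MAX_LEVEL]."""
    return min(max(BASE_SCORE[t] + correction, 0.0), MAX_LEVEL)


def label_set(unit_id: str, fields: List[Tuple[str, Dict[str, float], float]], priors):
    """fields: (field_path, per-category likelihoods, context correction) triples."""
    labels = []
    for i, (path, lik, corr) in enumerate(fields):
        t = map_type(lik, priors)
        labels.append({"label_id": f"{unit_id}:{i}", "field": path,
                       "type": t, "level": sensitivity_level(t, corr)})
    return labels
```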

[0033] Step 2.3: The receiver identity, transmission channel attributes, and historical alarm records are extracted from the communication environment and session metadata of the formatted communication unit as contextual features. Three kinds of information are extracted. First, the receiver's core identity information comes from the communication environment metadata of the formatted communication unit: its name, domain name, access IP address, service interface identifier, and configured reputation rating. Second, the security attributes of the transmission channel are extracted: the encryption algorithm used, the communication port type, whether mutual (two-way) authentication is enabled, and whether a dedicated encrypted tunnel is used. Third, historical alarm records relevant to the current communication unit are retrieved from the historical alarm database built into the gateway, using key features such as the unit's source information, receiver information, and transmission channel identifier. The historical alarm database is a pre-built structured database that stores, in a standardized format, the security risk events and abnormal alarms raised during cross-domain data communication. Each record includes a unique alarm identifier, the features of the communication unit associated with the alarm, the alarm trigger time, and the alarm type. It also includes the source, receiver, and transmission channel involved, the alarm level, a risk description, and the subsequent handling result and rectification status. The database supports precise retrieval along multiple dimensions such as source, receiver, transmission channel, alarm type, and time. Its historical alarm data is continuously updated and retained long-term, providing complete data support for risk tracing.
The retrieved records include the receiver's past violations, historical risk warnings for similar communications, and security alarms raised on the transmission channel together with their handling results.

[0034] The extracted basic information, including receiver identity, transmission channel attributes, and historical alarm records, is standardized and structured. Redundant, invalid, and repetitive data is removed, missing key features are marked, and core and valuable features are retained. Finally, the three types of information are integrated to form a unified and standardized contextual feature of the current communication unit, providing core input for the communication environment dimension of risk assessment.

[0035] Step 2.4: The sensitive label set and the contextual features are fused into a unified feature vector that represents both the sensitive attributes and the environmental attributes of the communication unit. Specifically, feature quantization is first applied to every field in the sensitive label set. The sensitivity level of each field is encoded by equal-interval (binned) vectorization, and its sensitivity type is converted into a standardized numerical vector by one-hot encoding. Integrating the vectors of all fields yields a sensitive feature vector $\mathbf{v}_s \in \mathbb{R}^{d_s}$ that characterizes the data sensitivity attributes of the communication unit, where $d_s$ is the dimension of the sensitive feature vector. This vector preserves the correlations among the sensitive features of the individual fields. Feature engineering is then applied to the contextual features. Numerical features such as the receiver reputation rating and the channel security level are normalized. Discrete features such as the receiver identity, encryption method, and historical alarm type are encoded, so that all non-numerical information becomes computable, fusable standardized numerical features. This forms a context feature vector $\mathbf{v}_c \in \mathbb{R}^{d_c}$ that characterizes the communication environment attributes of the unit, where $d_c$ is the dimension of the context feature vector.

[0036] According to preset feature fusion rules, the sensitive feature vector $\mathbf{v}_s$ and the context feature vector $\mathbf{v}_c$ first undergo dimension alignment and feature completion. They are then concatenated in order:

$$V = \mathbf{v}_s \oplus \mathbf{v}_c,$$

where $\oplus$ is the vector concatenation operator that joins the dimensions of the two feature vectors. During fusion, the original attribute identifiers and weight associations of each feature are preserved so that no feature information is lost or confused. The resulting high-dimensional unified feature vector $V$ represents both the data sensitivity attributes and the communication environment attributes of the communication unit. It serves as the core input to the subsequent source tracing risk assessment module, providing comprehensive and accurate feature support for risk score calculation.
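The encoding and concatenation steps can be sketched in plain Python as follows. Several details are not fixed by the text and are assumptions here: the category list, the number of equal-width level bins, the reputation and channel-level scales, and the encryption vocabulary. The text also does not say how per-field vectors are integrated, so this sketch sums them. Summing gives a fixed-length vector regardless of how many fields a unit has.

```python
from typing import List, Sequence, Tuple

SENS_TYPES = ["id_number", "phone", "bank_card", "business_code"]  # hypothetical category set
LEVEL_BINS = 5            # hypothetical number of equal-width sensitivity-level intervals
MAX_LEVEL = 5.0
ENCRYPTIONS = ["none", "TLS1.2", "TLS1.3"]   # hypothetical encryption vocabulary


def one_hot(value, vocab: Sequence) -> List[float]:
    return [1.0 if v == value else 0.0 for v in vocab]


def level_bin(s: float) -> List[float]:
    """Equal-interval encoding of a level in [0, MAX_LEVEL] into LEVEL_BINS bins."""
    idx = min(int(s * LEVEL_BINS / MAX_LEVEL), LEVEL_BINS - 1)
    return one_hot(idx, range(LEVEL_BINS))


def sensitive_vector(labels: List[Tuple[str, float]]) -> List[float]:
    """Sum the per-field (type one-hot + level bin) encodings."""
    vec = [0.0] * (len(SENS_TYPES) + LEVEL_BINS)
    for t, s in labels:
        enc = one_hot(t, SENS_TYPES) + level_bin(s)
        vec = [a + b for a, b in zip(vec, enc)]
    return vec


def context_vector(ctx: dict) -> List[float]:
    return ([ctx["reputation"] / 100.0,        # assumed 0-100 reputation scale
             ctx["channel_level"] / 4.0]       # assumed 0-4 channel security scale
            + one_hot(ctx["encryption"], ENCRYPTIONS)
            + [1.0 if ctx["mutual_auth"] else 0.0])


def fuse(v_s: List[float], v_c: List[float]) -> List[float]:
    return v_s + v_c                            # list concatenation implements the concatenation operator
```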

[0037] In a preferred embodiment of the present invention, step 3 above may include the following. Step 3.1: Based on the feature vector, the historical flow records of the data objects involved in the communication unit are obtained. A data flow trajectory graph is then constructed with the source system, intermediate systems, and external services as nodes and transmission behaviors as edges. Specifically, core related features such as the source system identifier, the unique data object identifier, the service interface type, and the receiver identifier are extracted from the unified feature vector $V$ of the communication unit. Using these features as retrieval conditions, the complete, traceable historical flow records of the data objects involved in the current communication unit are retrieved from the gateway's historical data flow database. These records contain key information about each data object. This includes its generation node, the transmission nodes at each stage, the final receiving node, the timestamps of transmissions between nodes, the type of transmission channel used, and the identifier of the operating entity. The data flow trajectory graph takes the source system, the intermediate systems, and the external services along the entire data flow as nodes. The source system is the node that generates and initiates the data object. It is a data-generating entity among the enterprise's internal business systems, data middle platforms, microservice modules, and similar components, and it is responsible for the initial generation, storage, and first outbound transmission of the data. It is the starting point of the data flow and includes internal data production carriers such as transaction systems, data acquisition systems, and business middle platforms.

[0038] An intermediate system is a node that forwards, processes, or stores data during the data flow. It links the source system and the external services. Intermediate systems include internal enterprise forwarding servers, data exchange platforms, caching systems, cross-domain transmission proxy servers, and cloud gateways. These nodes handle data relay, format conversion, and temporary storage, forming the intermediate links of the data flow. External services are the final receiving nodes of the data flow. They are third-party service entities outside the enterprise network boundary, including partner business systems, public cloud service platforms, and external interface services, and they form the endpoints of cross-domain data flow. Each node is labeled with attributes such as system type, security level, domain, and service function. Each individual data transmission between nodes is represented as a directed edge labeled with attributes such as transmission time, transmission channel, transmission protocol, and data operation type. Following the actual time order of the data object's flow and the network topology, a complete directed data flow trajectory graph $G = (N, E)$ is constructed, where $N$ is the set of all nodes and $E$ is the set of all directed edges. The graph clearly and completely depicts the propagation paths, flow sequence, and potential traceability links of data objects across multiple nodes.
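Building the trajectory graph from flow records can be sketched as below. This is a minimal adjacency-list sketch, assuming each record carries `from`, `to`, `time`, `channel`, `protocol`, and `op` keys. These key names, the class names, and the attribute names are hypothetical. Records are sorted by timestamp so that edges follow the actual flow order. Nodes that appear only in records are added without attributes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    time: int          # transmission timestamp
    channel: str
    protocol: str
    op: str            # data operation type, e.g. "forward" or "store"


class TrajectoryGraph:
    """Directed data flow trajectory graph: nodes carry attributes, edges are transfers."""

    def __init__(self):
        self.nodes = {}                    # node id -> attribute dict
        self.edges = []                    # all directed edges, in time order
        self.out = defaultdict(list)       # node id -> outgoing edges

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_transfer(self, edge: Edge):
        for n in (edge.src, edge.dst):
            self.nodes.setdefault(n, {})   # unknown nodes are added without attributes
        self.edges.append(edge)
        self.out[edge.src].append(edge)


def build_graph(records, node_attrs):
    g = TrajectoryGraph()
    for node_id, attrs in node_attrs.items():
        g.add_node(node_id, **attrs)
    for r in sorted(records, key=lambda r: r["time"]):   # respect the actual flow order
        g.add_transfer(Edge(r["from"], r["to"], r["time"], r["channel"], r["protocol"], r["op"]))
    return g
```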

[0039] Step 3.2: The local risk component of the communication unit is calculated from the sensitive labels in the feature vector together with the linkability of the fields. Specifically, all feature components corresponding to the sensitive label set are extracted from the unified feature vector $V$ of the communication unit, and the sensitivity level $s_i$ of each field $x_i$ is recovered from them. At the same time, the linkability $l_i$ of each field is obtained through feature association analysis. This analysis combines the enterprise's data association rules with the historical traceability analysis data of the data communication protection gateway. Linkability takes a value between 0 and 1. A higher value means a greater probability that the field can be identified by external tracing once it is associated with other information. The field set of the communication unit defines the calculation range. A raw local risk score is first calculated from the sensitivity level $s_i$ and linkability $l_i$ of each field, weighted by preset configurable coefficients $\alpha$ and $\beta$. These coefficients set the relative weight of sensitivity level and field linkability in the local risk and can be adjusted dynamically according to the enterprise's data security needs. The raw score is then nonlinearly normalized with a preset local risk adjustment coefficient $\lambda$, giving the local risk component $R_{\text{loc}}$ of the communication unit. The coefficient $\lambda$ adapts the scale of the assessment to different business scenarios. This component takes values from 0 to 1 and represents the degree of traceability and re-identification risk arising solely from the sensitivity of the data itself and the linkability of its fields.
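The local risk step above names its inputs: per-field sensitivity level and linkability, two weights, and an adjustment coefficient for nonlinear normalization. It does not fix the functional form. The sketch below assumes a weighted sum over fields followed by the normalization $1 - e^{-\lambda r}$, which maps any non-negative raw score into $[0, 1)$. Both choices and the default parameter values are illustrative assumptions, not the claimed formula.

```python
import math
from typing import Iterable, Tuple


def local_risk(fields: Iterable[Tuple[float, float]], alpha: float = 0.6,
               beta: float = 0.4, lam: float = 0.5) -> float:
    """Local risk component from (sensitivity_level, linkability) pairs.

    ASSUMED forms (illustrative only):
      raw   = sum_i (alpha * s_i + beta * l_i)
      R_loc = 1 - exp(-lam * raw)
    """
    raw = sum(alpha * s + beta * l for s, l in fields)
    return 1.0 - math.exp(-lam * raw)
```

Under this form, a larger `lam` makes the component saturate toward 1 sooner. This is one way an adjustment coefficient can adapt the scale to different business scenarios.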

[0040] Step 3.3: The receiver trust level, channel security level, purpose conformity, and historical risk factor are derived from the contextual features in the feature vector, and the contextual risk component of the communication unit is calculated from them. Specifically, all feature components corresponding to the contextual features are extracted from the unified feature vector $V$ of the communication unit. Using a preset multi-dimensional evaluation model and historical security evaluation data, each dimension is quantified into a standardized receiver trust level $T$, channel security level $S$, purpose conformity $Q$, and historical risk factor $H$. Each indicator takes a value between 0 and 1. Higher receiver trust, channel security, and purpose conformity correspond to lower risk, whereas a higher historical risk factor corresponds to higher risk. The contextual risk component $R_{\text{ctx}}$ of the communication unit is then calculated from these four indicators using preset configurable weighting coefficients $w_1, w_2, w_3, w_4$. These coefficients give the weights of receiver trust, channel security, purpose conformity, and historical risk factor, respectively, in the contextual risk. They can be adjusted dynamically according to the enterprise's security priorities for cross-domain communication. The component takes values from 0 to 1 and represents the degree of tracing and re-identification risk arising from the communication environment attributes, receiver characteristics, transmission channel status, and historical risk events.
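One way to combine the four contextual indicators so that they respect the stated risk directions is sketched below. This is an assumed linear form, not the claimed formula. Trust, channel security, and purpose conformity enter as $(1 - x)$ because higher values mean lower risk. The historical risk factor enters directly. The weights are assumed to sum to 1 so that the result stays in $[0, 1]$, and the default weights are hypothetical.

```python
from typing import Sequence


def context_risk(trust: float, channel: float, purpose: float, history: float,
                 w: Sequence[float] = (0.3, 0.3, 0.2, 0.2)) -> float:
    """Contextual risk component (ASSUMED linear form).

    trust, channel security and purpose conformity lower the risk, so they enter
    as (1 - x); the historical risk factor raises it and enters directly.
    """
    values = (trust, channel, purpose, history)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("indicators must lie in [0, 1]")
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w1, w2, w3, w4 = w
    return w1 * (1 - trust) + w2 * (1 - channel) + w3 * (1 - purpose) + w4 * history
```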

[0041] Step 3.4: Based on the data flow trajectory graph, analyze all transmission paths from the current node to potential leakage nodes and the conditional leakage probability of each path, and calculate the trajectory risk component of the communication unit from these probabilities. Specifically, a depth-first path traversal is run on the constructed directed data flow trajectory graph. It starts from the current node of the data object and visits every reachable node. It extracts every transmission path from the current node to a potential leakage node, such as a node with no subsequent forwarding or an external untrusted node. This yields the complete path set $\Pi = \{\pi_1, \pi_2, \dots, \pi_K\}$, where K is the total number of transmission paths. Each path $\pi_k$ is decomposed into a sequence of connected directed edges $\{e_{k,1}, e_{k,2}, \dots, e_{k,M_k}\}$, where $M_k$ is the number of edges on $\pi_k$. Each edge $e_{k,j}$ corresponds to an independent data transmission segment between two nodes. The independent leakage probability $p(e_{k,j})$ of each edge is calculated by combining several factors: the security level of each node on the path, the risk level of each channel segment, historical leakage records, and access control policies. This probability is a value between 0 and 1. A higher value means a greater likelihood that the segment leaks data in a traceable way.

[0042] The conditional leakage probability of path $\pi_k$ is calculated as
$$P(\pi_k) = 1 - \prod_{j=1}^{M_k} \bigl(1 - p(e_{k,j})\bigr),$$
where $p(e_{k,j})$ is the independent leakage probability of the j-th edge of path $\pi_k$, and $M_k$ is the number of edges on that path. The formula multiplies the per-edge probabilities that no leakage occurs to obtain the joint probability that the whole path is leak-free. It then takes the complement to obtain the overall leakage probability of the path. The path-level conditional leakage probability is thus decomposed into a combination of the independent leakage probabilities of its edges. The trajectory risk component of the communication unit is then calculated over all K paths as
$$R_{traj} = 1 - \prod_{k=1}^{K} \bigl(1 - P(\pi_k)\bigr),$$
where $\prod$ is the product operator. This formula multiplies the per-path probabilities that no leakage occurs to obtain the probability that all paths are leak-free. It then takes the complement to obtain the overall leakage risk of the trajectory. The component ranges from 0 to 1. It characterizes the tracing and re-identification risk arising from the historical flow topology of the data object and the security status of its nodes and channels.
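The sketch below illustrates Step 3.4 under several stated assumptions:

- The trajectory graph is a plain adjacency dictionary.
- Potential leakage nodes are the nodes without successors plus an optional set of untrusted nodes.
- The depth-first search enumerates simple paths only, so cycles in the flow history do not cause endless traversal.
- Edge leakage probabilities are passed in directly rather than derived from node security levels, channel risk, and access policies.

The two product formulas match the ones above, including their treatment of edges and paths as independent. The function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]


def enumerate_paths(graph: Dict[str, List[str]], start: str,
                    untrusted: FrozenSet[str] = frozenset()) -> List[List[Edge]]:
    """Depth-first enumeration of simple paths from start to potential leakage nodes."""
    paths: List[List[Edge]] = []

    def dfs(node: str, edges: List[Edge], visited: set) -> None:
        successors = graph.get(node, [])
        # A non-empty path ending at a sink or an untrusted node is a leakage path.
        if edges and (not successors or node in untrusted):
            paths.append(list(edges))
        for nxt in successors:
            if nxt in visited:  # keep paths simple; skip cycles
                continue
            visited.add(nxt)
            edges.append((node, nxt))
            dfs(nxt, edges, visited)
            edges.pop()
            visited.remove(nxt)

    dfs(start, [], {start})
    return paths


def path_leak_probability(path: List[Edge], edge_p: Dict[Edge, float]) -> float:
    """P(pi_k) = 1 - prod_j (1 - p(e_kj))."""
    return 1.0 - math.prod(1.0 - edge_p[e] for e in path)


def trajectory_risk(paths: List[List[Edge]], edge_p: Dict[Edge, float]) -> float:
    """R_traj = 1 - prod_k (1 - P(pi_k)); equals 0 when there are no paths."""
    return 1.0 - math.prod(1.0 - path_leak_probability(p, edge_p) for p in paths)


graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
edge_p = {("A", "B"): 0.1, ("B", "D"): 0.2, ("A", "C"): 0.5}
risk = trajectory_risk(enumerate_paths(graph, "A"), edge_p)
```

Here the two paths have leakage probabilities 0.28 and 0.5, so R_traj = 1 − 0.72 × 0.5 = 0.64. The recursive search is fine for small graphs. A large trajectory graph would need an iterative traversal or a path-count limit, because simple-path enumeration grows combinatorially.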

[0043] Step 3.5: The local risk component, contextual risk component, and trajectory risk component are fused by weighting and then normalized. The result is a traceability risk score that characterizes the probability that the communication is traced and its subject re-identified. Specifically, weight coefficients $\omega_1$, $\omega_2$, and $\omega_3$ are configured for the local, contextual, and trajectory risk components respectively. They are set according to the enterprise's overall security strategy for cross-domain data communication and the risk assessment requirements of different business scenarios, and the data communication protection gateway can learn and optimize them autonomously. They represent the shares of the three components in the overall risk assessment and can be updated dynamically according to the accuracy of historical risk assessments. First, the three components are multiplied by their weights and summed, and a preset bias term b is added to obtain the overall risk composite value. This value is then passed through an activation function σ for nonlinear normalization:
$$S = \sigma\bigl(\omega_1 R_{loc} + \omega_2 R_{ctx} + \omega_3 R_{traj} + b\bigr),$$
which gives the traceability risk score S of the communication unit. The score S lies between 0 and 1. The higher the value, the more likely the communication unit is to be subjected to source tracing analysis and subject re-identification by external attackers or untrusted third parties during cross-domain transmission. The score provides a unified, quantitative basis for selecting and executing the subsequent data concealment transmission strategy.
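A minimal sketch of the fusion step follows. It assumes the activation function σ is the logistic sigmoid. That choice is consistent with the logistic regression loss used to train the model in Step 6.3, but the text here does not confirm it. The weights and bias are placeholders.

```python
import math


def traceability_score(r_loc: float, r_ctx: float, r_traj: float,
                       weights=(4.0, 3.0, 3.0), bias: float = -5.0) -> float:
    """S = sigma(w1*R_loc + w2*R_ctx + w3*R_traj + b), sigma assumed logistic."""
    z = sum(w * r for w, r in zip(weights, (r_loc, r_ctx, r_traj))) + bias
    return 1.0 / (1.0 + math.exp(-z))


s = traceability_score(0.75, 0.4, 0.64)
```

With components in [0, 1], the composite value lies in [b, ω1 + ω2 + ω3 + b]. The bias and the weight magnitudes therefore decide which part of (0, 1) the scores actually occupy. The placeholder values map the composite onto [−5, 5], so S spans most of the interval.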

[0044] In a preferred embodiment of the present invention, step 4 above may include the following. Step 4.1: Based on the source tracing risk score and the current strategy configuration, select attack types that match the current communication risk level from a pre-built attack scenario library, and generate an adversarial attack sample set. Specifically, the calculated source tracing risk score S of the communication unit is combined with the protection policy configuration currently loaded and running on the data communication protection gateway. Core parameters are extracted from that configuration, such as the combination of protection operators, rule matching logic, risk threshold ranges, and the execution pipeline. The current communication unit is then assigned a low, medium, high, or ultra-high source tracing risk level according to a preset mapping between value ranges of S and risk levels. This risk level and the core configuration parameters serve as joint search conditions. With them, the system selects attack types that match both the current risk level and the protection strategy from a pre-built, continuously updated attack scenario library. The library contains a standardized attack scenario set A covering typical source tracing attacks, such as prompt injection, data exfiltration, re-identification, membership inference, model inversion, traffic analysis, and side-channel attacks. Each attack type a ∈ A predefines its attack input pattern, target business interface, execution constraints, and expected tracing goal. The library includes both general attack templates and enterprise-customized attack samples.
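The text states only that the risk level and the configuration parameters act as dual search conditions. The sketch below shows one way such a query could look. The level bounds, the `AttackScenario` record, and its `probes` field (the operators a scenario is meant to test) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

RISK_LEVELS = ("low", "medium", "high", "ultra-high")


@dataclass(frozen=True)
class AttackScenario:
    name: str
    min_level: int            # lowest risk level (index into RISK_LEVELS) at which it runs
    probes: FrozenSet[str]    # protection operators this scenario is designed to test


def risk_level(score: float, bounds: Sequence[float] = (0.3, 0.6, 0.85)) -> int:
    """Map a score in [0, 1] to 0..3 using ascending, hypothetical range bounds."""
    return sum(score >= b for b in bounds)


def select_attacks(library: List[AttackScenario], score: float,
                   active_operators: FrozenSet[str]) -> List[AttackScenario]:
    level = risk_level(score)
    # Both search conditions must hold: the scenario applies at this risk level
    # and it targets at least one operator in the live policy configuration.
    return [a for a in library
            if a.min_level <= level and a.probes & active_operators]
```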

[0045] Adversarial inputs x' are built from the normal service request x of the current communication. Under strict constraints given by a perturbation budget ε and a norm type p, the base input is deliberately modified with iterative optimization algorithms or rule-based perturbation generation algorithms. The modification must satisfy $\lVert x' - x \rVert_p \le \epsilon$ while maximizing a loss function $\mathcal{L}(x')$. This loss is a scoring function that characterizes the probability of sensitive information leakage, of a successful strategy bypass, or of inappropriate information being output. The result is an adversarial attack sample set containing attack instances of multiple types, intensities, and scenarios, so that the test samples cover the current protection strategy.
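The text allows iterative optimization or rule-based perturbation generation. The sketch below substitutes a simple black-box greedy random search for the iterative optimizer, with projection onto the L∞ ball (p = ∞). A gradient-based method such as projected gradient descent would replace the random step when the scoring function is differentiable. `score_fn` is a hypothetical stand-in for the leakage or bypass scoring function $\mathcal{L}$.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def linf_random_search(x: Sequence[float],
                       score_fn: Callable[[List[float]], float],
                       eps: float, iters: int = 200,
                       step: Optional[float] = None,
                       seed: int = 0) -> Tuple[List[float], float]:
    """Maximize score_fn(x') subject to ||x' - x||_inf <= eps by greedy random search."""
    rng = random.Random(seed)
    step = eps / 4 if step is None else step
    best = list(x)
    best_score = score_fn(best)
    for _ in range(iters):
        # Random perturbation of the current best candidate.
        cand = [b + rng.uniform(-step, step) for b in best]
        # Project back onto the L-infinity ball of radius eps around the original x.
        cand = [min(max(c, xi - eps), xi + eps) for c, xi in zip(cand, x)]
        s = score_fn(cand)
        if s > best_score:  # keep only improvements
            best, best_score = cand, s
    return best, best_score
```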

[0046] Step 4.2: Input the adversarial attack sample set into the data communication protection gateway under the current policy configuration for simulation testing. Record the sensitive information leakage, policy bypass behavior, and false interception events of each attack sample during the test, and obtain the test result record. Specifically, the generated adversarial attack samples are encapsulated as simulated traffic in the format of real business traffic. This traffic is input into the data communication protection gateway running the currently loaded policy configuration. A full-process protection test is performed in a shadow simulation environment identical to the enterprise's real online business environment. Online business data and test data are completely isolated, so normal business operations are unaffected. During the test, the execution pipeline of the current policy configuration is followed strictly. The pipeline applies field identification, rule detection, data transformation, access control, and the other full-process protection operations to the simulated traffic in sequence.

[0047] At the same time, the behavior data, protection trigger data, and result data of each attack sample are collected continuously and recorded in full throughout the test. The records focus on three kinds of events:
(1) Sensitive information leakage that still occurs after an attack sample triggers protection: its location, type, and extent.
(2) Attacks that evade the gateway's rule detection: the triggering conditions, bypass paths, and failure points of the protection rules.
(3) Normal business traffic that the gateway misjudges as risky: the triggering scenarios, the business interfaces involved, and the causes of the misjudgment.
All collected multi-dimensional data and abnormal events are stored in a unified structured format. Each record is keyed by the attack sample's unique identifier, attack type, triggered protection rule ID, detection result, protection handling status, and abnormality details. Together these form a complete test result record that can be used directly in subsequent analysis.

[0048] Step 4.3: Based on the test result record, calculate the leakage probability, bypass rate, and false interception rate of each attack scenario. Then evaluate the effectiveness and robustness of the current strategy against prompt injection, data exfiltration, and re-identification attacks to obtain the strategy effectiveness evaluation result. Specifically, the structured test results are classified, filtered, and analyzed by attack type a in the attack scenario set A. For each attack scenario under the current strategy configuration, three indicators are calculated: the sensitive information leakage probability $P_{leak}(a)$, the attack bypass success rate $P_{bypass}(a)$, and the false blocking rate of normal business traffic $P_{fb}(a)$. Each indicator is a value between 0 and 1, and invalid test data are removed before calculation so that the results are accurate. A preset threat weight $\omega_a$ is configured for each attack type according to its threat level to enterprise data security. The overall leakage risk index and overall bypass risk index of the current strategy are then calculated as
$$I_{leak} = \sum_{a \in A} \omega_a P_{leak}(a), \qquad I_{bypass} = \sum_{a \in A} \omega_a P_{bypass}(a).$$
The false interception data of all attack scenarios are likewise aggregated to obtain the overall false interception risk index $I_{fb}$.
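The per-scenario statistics and threat-weighted indices can be sketched as follows. The record schema (`scenario`, `malicious`, `leaked`, `bypassed`, `blocked`, `valid`) is hypothetical. The weighted index is divided by the total weight of the scenarios present, which reduces to the plain weighted sum when those weights add up to 1. Reusing the same weighted form for the overall false interception index is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Rates = Dict[str, Tuple[float, float, float]]


def scenario_rates(records: Iterable[dict]) -> Rates:
    """Per scenario: (leak probability, bypass rate, false-blocking rate)."""
    # Counters: attacks, leaks, bypasses, benign requests, false blocks.
    c = defaultdict(lambda: [0, 0, 0, 0, 0])
    for r in records:
        if not r.get("valid", True):  # discard invalid test data
            continue
        s = c[r["scenario"]]
        if r["malicious"]:
            s[0] += 1
            s[1] += int(r["leaked"])
            s[2] += int(r["bypassed"])
        else:
            s[3] += 1
            s[4] += int(r["blocked"])
    return {name: (s[1] / s[0] if s[0] else 0.0,
                   s[2] / s[0] if s[0] else 0.0,
                   s[4] / s[3] if s[3] else 0.0)
            for name, s in c.items()}


def weighted_index(rates: Rates, weights: Dict[str, float], column: int) -> float:
    """Threat-weighted index over one column (0 = leak, 1 = bypass, 2 = false blocking)."""
    total = sum(weights[a] for a in rates)
    return sum(weights[a] * rates[a][column] for a in rates) / total
```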

[0049] With these three quantitative indicators as the core evaluation criteria, the evaluation focuses on three high-priority attack scenarios: prompt injection, data exfiltration, and re-identification. It assesses the protection capability of the current policy configuration in three dimensions: detection accuracy, blocking effectiveness, and business compatibility robustness. It also accounts for the system resource consumption and data transmission latency incurred during policy execution. A comprehensive evaluation formula then gives the overall evaluation value of the strategy, using preset indicator weights that sum to 1. Finally, four elements are integrated into a complete strategy effectiveness evaluation result: the quantitative results for each attack scenario, the comprehensive evaluation value, the conclusion on protection capability, and the analysis of strengths and weaknesses.
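The comprehensive evaluation formula is not reproduced in the text. The sketch below assumes a weighted sum of "higher is better" indicators that respects the stated constraint that the weights sum to 1. Each risk or cost in [0, 1] is turned into a benefit by taking its complement. The mapping of I_leak to blocking effectiveness and I_bypass to detection accuracy is also an assumption, as are the indicator names.

```python
import math
from typing import Dict


def strategy_evaluation(i_leak: float, i_bypass: float, i_fb: float,
                        resource_cost: float, latency_cost: float,
                        weights: Dict[str, float]) -> float:
    """Hypothetical overall evaluation value E in [0, 1]; higher is better."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("indicator weights must sum to 1")
    benefits = {
        "blocking": 1.0 - i_leak,          # blocking effectiveness
        "detection": 1.0 - i_bypass,       # detection accuracy
        "compatibility": 1.0 - i_fb,       # business compatibility robustness
        "resources": 1.0 - resource_cost,  # normalized resource consumption
        "latency": 1.0 - latency_cost,     # normalized latency overhead
    }
    return sum(weights[k] * benefits[k] for k in benefits)
```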

[0050] Step 4.4: Based on the strategy effectiveness evaluation result, identify weaknesses in the current strategy configuration, analyze the attack types and causes of protection failure behind each weakness, and obtain strategy weakness information. Specifically, the analysis targets attack scenarios with any of the following: a leakage probability $P_{leak}(a)$ or bypass rate $P_{bypass}(a)$ above a preset threshold, insufficient robustness, or a low overall evaluation score. For these scenarios, the specific evaluation items are located, and the protection vulnerabilities of the current strategy configuration are identified together with the business scenarios they affect. For each vulnerability, the typical attack types, implementation methods, and variants are analyzed. The analysis draws on the specific types of adversarial samples, the triggering conditions of protection rules during testing, the full-process protection execution logs of the data communication protection gateway, and the underlying logic of the policy configuration. This confirms how each vulnerability manifests in the actual protection process, for example as missed rules, failed operator defenses, or threshold misjudgments. The analysis further examines protection operator strength, sensitive identification rule coverage, rule matching execution order, source tracing risk threshold settings, and access control policy granularity.
The root causes of protection failure include the following:
- concealment operators that are too weak against specific attack patterns;
- sensitive identification rules that do not cover attack variants;
- rule priority conflicts caused by an unreasonable rule matching order;
- source tracing risk thresholds that do not match the actual attack risk level;
- access control policies too coarse-grained to resist sophisticated attacks.
For each protection vulnerability, its location, attack type, failure manifestation, impact scope, and root cause are summarized in a structured form. The result is complete strategy weakness information, including a detailed problem description, the risk impact level, and precise source tracing, which gives policy calibration an accurate target.
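The first, quantitative part of Step 4.4 (locating scenarios whose leakage probability or bypass rate exceeds a preset threshold) can be sketched as below. The threshold values are hypothetical. The root-cause analysis from execution logs and policy logic is not sketched.

```python
from typing import Dict, List, Tuple


def find_weaknesses(rates: Dict[str, Tuple[float, float, float]],
                    tau_leak: float = 0.05,
                    tau_bypass: float = 0.10) -> List[Tuple[str, List[str]]]:
    """Flag scenarios whose leak probability or bypass rate exceeds its threshold.

    rates maps scenario -> (leak probability, bypass rate, false-blocking rate).
    """
    weak = []
    for name, (leak, bypass, _fb) in sorted(rates.items()):
        reasons = [label for label, value, tau in (("leakage", leak, tau_leak),
                                                   ("bypass", bypass, tau_bypass))
                   if value > tau]
        if reasons:
            weak.append((name, reasons))
    return weak
```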

[0051] Step 4.5: Based on the strategy weakness information, generate calibration recommendations that include risk threshold adjustment, concealment operator enhancement, or rule matching order optimization. Specifically, targeted policy calibration recommendations are generated from the strategy weakness information. They take into account the distribution of the communication units' traceability risk scores, the security requirements and data availability tolerances of different business scenarios within the enterprise, and the processing capacity of the data communication protection gateway. The recommendations are fully compatible with the gateway's existing policy system and can be applied directly. For protection failures caused by improperly set traceability risk thresholds, dynamic adjustment recommendations are generated for the set of traceability risk judgment thresholds. Each recommendation specifies the direction and magnitude of the correction for a threshold, so that strategy tiers match the actual communication tracing risk. For attack bypasses caused by insufficient defense capability of protection operators, strength enhancement recommendations are generated for concealment operators. These operators include segmented masking, tokenization, generalization and anonymization, differential perturbation, and field-level encryption. Examples include increasing the noise intensity of differential perturbation, raising the generalization level of generalization and anonymization, and optimizing the algorithm complexity of field-level encryption.

[0052] At the same time, specific parameter values for the operator strength adjustments are determined. Further recommendations target the other root causes:
- Unreasonable rule matching order: optimization recommendations are generated for the execution order of sensitive identification, anomaly detection, access control, and the other rule matching stages. The priority of different types of detection rules is adjusted to avoid rule conflicts or missed detections, and sensitive identification rules are supplemented where they do not cover the features of attack variants.
- Coarse-grained access control policies: recommendations for refining those policies are generated.
- Other root causes: supplementary calibration recommendations are generated, such as refined corrections of anomaly detection conditions and optimized protection operator combinations.
Every calibration recommendation states the optimization direction, the specific parameters to adjust, the execution steps, and the expected effect. This gives the policy engine an accurate, implementable basis for updating and optimizing protection policies.

[0053] In a preferred embodiment of the present invention, step 5 above may include the following. Step 5.1: Based on the source tracing risk score and the calibration recommendations, select a strategy tier matching the current risk level from a preset strategy tier set, and obtain the selected strategy tier. Specifically, the calculated source tracing risk score S of the communication unit is combined with the complete set of calibration recommendations. The core execution requirements in the recommendations are extracted first, such as traceability risk threshold adjustment, protection strength adaptation, and operator combination optimization. The preset traceability risk threshold set $\{\theta_1, \theta_2, \theta_3\}$ is then corrected accordingly. The corrected thresholds must strictly preserve the ordering $\theta_1 < \theta_2 < \theta_3$ and must address the security vulnerabilities identified in the adversarial evaluation. Next, the risk range containing the current score S is determined from the corrected threshold set. The corresponding tier is matched from the predefined strategy tier set $\{L_1, L_2, L_3, L_4\}$ according to preset binding rules between risk ranges and tiers:
- $S < \theta_1$ matches the basic protection tier $L_1$;
- $\theta_1 \le S < \theta_2$ matches the protection tier $L_2$;
- $\theta_2 \le S < \theta_3$ matches the high-strength protection tier $L_3$;
- $S \ge \theta_3$ matches the blocking / manual review tier $L_4$.
For the matched tier, parameter pre-adaptation is completed according to the calibration recommendations. This includes adjusting the trigger conditions of the protection rules within the tier and optimizing the baseline execution strength of its operators. The protection capability of the tier then fits both the actual traceability risk of the current communication and the data availability tolerance of the enterprise's business. The result is a selected strategy tier whose parameters are pre-configured and that can be applied directly.
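The range-to-tier binding of Step 5.1 can be sketched with a binary search over the corrected thresholds. The tier labels follow the text. The example threshold values are hypothetical, and so is the half-open interval convention, which matches the ranges listed above.

```python
import bisect
from typing import Sequence

TIERS = ("L1 basic protection", "L2 protection",
         "L3 high-strength protection", "L4 blocking / manual review")


def select_tier(score: float, thresholds: Sequence[float]) -> str:
    """S < t1 -> L1; t1 <= S < t2 -> L2; t2 <= S < t3 -> L3; S >= t3 -> L4."""
    if len(thresholds) != len(TIERS) - 1:
        raise ValueError("expected exactly three thresholds")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_right counts the thresholds <= score, which is the tier index.
    return TIERS[bisect.bisect_right(thresholds, score)]


# A calibration that lowers theta_1 from 0.30 to 0.25 moves S = 0.27 up one tier.
before = select_tier(0.27, (0.30, 0.60, 0.85))
after = select_tier(0.27, (0.25, 0.60, 0.85))
```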

[0054] Step 5.2: Based on the selected strategy tier, match the corresponding data transformation operator sequence from the preset operator library, and construct the execution pipeline for the current communication unit. Specifically, the preset core protection capability requirements, operator combination baseline rules, and protection strength range of the selected tier are retrieved first. These are combined with the specific requirements in the calibration recommendations, such as concealment operator enhancement and operator combination optimization. On this basis, the corresponding data transformation operator sequence is matched from the standardized operator library built into the data communication protection gateway. The library covers the full range of data transformation operators, including field masking, tokenization, generalization and anonymization, differential perturbation, field-level encryption, and output content review. Each operator has adjustable execution parameters of different strengths, so it can be adapted flexibly to different protection needs. The matched operators are arranged according to three pipeline construction principles: sensitive fields are identified before data is transformed, lightweight protection comes before high-strength protection, and basic transformations come before composite ones. For each operator, the execution order, trigger preconditions, data processing scope, and linkage rules with other operators are then fixed. Next, optimized execution parameters are configured for each operator according to the calibration recommendations. Examples are an adjusted noise intensity for the differential perturbation operator, a raised generalization level for the generalization and anonymization operator, and an optimized mask coverage ratio for the field masking operator. These settings make the operator execution strength match the current source tracing risk level.
Finally, the ordered, parameterized operator sequence with explicit execution rules is integrated into a structured whole. Each operator is given a unique execution identifier and a result verification requirement. The result is a personalized, directly executable data transformation pipeline for the current communication unit.
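The three ordering principles of Step 5.2 can be expressed as a lexicographic sort key. The sketch below assumes that precedence: identification first, then lower strength, then basic before composite. The `OperatorSpec` fields, the operator names, and the `op-NN` identifier format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OperatorSpec:
    name: str
    is_identification: bool   # sensitive-field identification runs before any transformation
    strength: int             # 0 = lightweight; larger values = higher-strength protection
    composite: bool = False   # basic transformations run before composite ones
    params: Dict[str, float] = field(default_factory=dict)


def build_pipeline(ops: List[OperatorSpec]) -> List[Tuple[str, OperatorSpec]]:
    """Order operators by the construction principles and attach execution IDs."""
    ordered = sorted(ops, key=lambda o: (not o.is_identification, o.strength, o.composite))
    return [(f"op-{i:02d}", o) for i, o in enumerate(ordered, start=1)]


pipeline = build_pipeline([
    OperatorSpec("field_encryption", False, 3),
    OperatorSpec("differential_perturbation", False, 2, params={"epsilon": 0.5}),
    OperatorSpec("field_masking", False, 1, params={"coverage": 0.75}),
    OperatorSpec("sensitive_field_identification", True, 0),
    OperatorSpec("generalize_then_mask", False, 1, composite=True),
])
```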

[0055] Step 5.3: Obtain the original payload of the current communication unit and input it into the execution pipeline. The pipeline performs one or more of field masking, tokenization, generalization and anonymization, differential perturbation, and field-level encryption in sequence, producing the transformed intermediate payload. Specifically, the original payload of the current communication unit is extracted from the formatted communication units. The extraction fully preserves the payload's field hierarchy, the data type of each field, the native encoding, and the relationships between key business fields. It also removes redundant transmission identifiers and invalid padding unrelated to the business. The preprocessed original payload is then standardized to the input format preset by the execution pipeline, so that it matches the processing format of the first operator. The execution pipeline is a finite set of data transformation operators arranged in a preset execution order, $\mathcal{O} = \{o_1, o_2, \dots, o_K\}$. Here K is the number of data transformation operators in the set, $o_1$ is the first operator executed, and $o_K$ is the last. The output of each operator is the input of the next, and the operators are executed in the order $o_1 \to o_2 \to \dots \to o_K$.
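The sequential composition $o_K(\cdots o_1(P_{raw}) \cdots)$ can be sketched with `functools.reduce`. Three illustrative operators are shown: tail masking, range generalization, and differential perturbation. The text does not reproduce its differential perturbation formula, so the standard Laplace mechanism (noise scale = sensitivity / ε) stands in for it. All operator names, parameters, and payload fields are hypothetical. The per-step validation and re-execution logic is omitted.

```python
import random
from functools import reduce
from typing import Callable, Dict, List

Payload = Dict[str, object]
Operator = Callable[[Payload], Payload]


def mask_tail(name: str, keep: int = 4) -> Operator:
    """Field masking: replace all but the last `keep` characters with '*'."""
    def op(p: Payload) -> Payload:
        v = str(p[name])
        visible = v[-keep:] if keep > 0 else ""
        return {**p, name: "*" * (len(v) - len(visible)) + visible}
    return op


def generalize_range(name: str, width: int = 10) -> Operator:
    """Generalization: replace an integer with the width-sized bucket containing it."""
    def op(p: Payload) -> Payload:
        lo = (int(p[name]) // width) * width
        return {**p, name: f"{lo}-{lo + width - 1}"}
    return op


def laplace_perturb(name: str, sensitivity: float, epsilon: float,
                    rng: random.Random) -> Operator:
    """Differential perturbation via the Laplace mechanism, scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    def op(p: Payload) -> Payload:
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        return {**p, name: float(p[name]) + noise}
    return op


def run_pipeline(payload: Payload, operators: List[Operator]) -> Payload:
    """P_mid = o_K(...o_1(P_raw)): each operator's output feeds the next."""
    return reduce(lambda acc, op: op(acc), operators, payload)
```

Each operator returns a new dictionary rather than mutating its input. This keeps the original payload available if a validation step fails and an operator has to be re-executed.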

[0056] The standardized original payload is input into the execution pipeline formed by this operator set, which transforms it sequentially in the order the set defines. For fields labeled with different sensitivity levels in step 2.2, one operation, or a combination of field masking, tokenization, generalization and anonymization, differential perturbation, and field-level encryption, is selected and applied according to the configuration of the operator set. The differential perturbation operation strictly follows the preset differential perturbation formula and its parameter constraints, and the generalization and anonymization operations satisfy the equivalence class constraint. After each operator runs, the result of that step is validated on three points: the field structure must be complete, necessary business information must be retained, and the transformation effect must match the preset protection strength. If validation fails, the operator is re-executed. Once all operators have run and every validation has passed, the transformed intermediate payload is obtained. It has a complete field structure, adequately concealed sensitive data, and the required protection properties. The intermediate payload is generated as
$$P_{mid} = o_K\bigl(o_{K-1}(\cdots o_1(P_{raw}) \cdots)\bigr),$$
where $P_{raw}$ is the standardized original payload.
Step 5.4: Adapt the output format of the transformed intermediate payload and encapsulate it with integrity protection to generate a final payload that meets the requirements of the target interface. This final payload is the data output by the concealed transmission.
Specifically, the core adaptation information of the target receiver's service interface is extracted from the metadata set of the current communication unit. This information covers the communication protocols supported by the interface, data transmission format requirements, field naming conventions, data encoding standards, and message structure constraints, and it defines the output format and encapsulation requirements of the final payload. Based on this information, the transformed intermediate payload undergoes fine-grained output format adaptation. Following the target interface's requirements, the field hierarchy is adjusted, the data type of each field is converted, the data encoding is unified, and business field names are standardized, in that order. For interfaces with mandatory fields, compliant default values are filled in. For interfaces with field length limits, values are trimmed or adapted in a compliant way. The adapted payload then fully complies with the target interface's parsing rules, which avoids transmission failures caused by format incompatibility.

[0057] After format adaptation, the payload undergoes protocol-level integrity encapsulation. Following the communication protocol requirements of the target interface, necessary encapsulation information is added to the payload, including standard protocol headers, protocol trailers, data check bits, business identifier fields, and payload length identifiers. The encapsulation process strictly adheres to the protocol's field ordering and length constraints. After encapsulation, a full-dimensional integrity check is performed on the entire payload, including check bit verification, field length verification, and structural integrity checks, to prevent data loss, field misalignment, or information corruption during encapsulation. Upon successful verification, a final payload fully compliant with all technical requirements and communication specifications of the target interface is generated. This payload, consisting of secure data after all covert transmission processing, can be directly forwarded to the target receiver through the data communication protection gateway, achieving secure, compliant, and covert transmission of sensitive data in cross-domain communication.
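The sketch below illustrates protocol-level encapsulation and the post-encapsulation integrity check. The frame layout is hypothetical: a 4-byte protocol marker, a 4-byte payload length, and a 2-byte business identifier, followed by a JSON body and a CRC-32 trailer over everything before it. A real target interface would dictate its own header, trailer, and check-bit format.

```python
import json
import struct
import zlib

MAGIC = b"DCPG"                                 # hypothetical 4-byte protocol marker
HEADER_LEN = len(MAGIC) + struct.calcsize(">IH")  # marker + length + business id = 10 bytes


def encapsulate(payload: dict, business_id: int) -> bytes:
    """Header (marker, body length, business id) + JSON body + CRC-32 trailer."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    header = MAGIC + struct.pack(">IH", len(body), business_id)
    trailer = struct.pack(">I", zlib.crc32(header + body))
    return header + body + trailer


def verify(frame: bytes) -> dict:
    """Check marker, length field, and checksum; return the decoded payload."""
    if len(frame) < HEADER_LEN + 4 or frame[:4] != MAGIC:
        raise ValueError("bad header")
    length, _business_id = struct.unpack(">IH", frame[4:HEADER_LEN])
    if len(frame) != HEADER_LEN + length + 4:
        raise ValueError("length field does not match frame size")
    (crc,) = struct.unpack(">I", frame[HEADER_LEN + length:])
    if crc != zlib.crc32(frame[:HEADER_LEN + length]):
        raise ValueError("checksum mismatch")
    return json.loads(frame[HEADER_LEN:HEADER_LEN + length].decode("utf-8"))
```

CRC-32 catches accidental corruption such as truncation or field misalignment, but it does not resist deliberate tampering. A keyed MAC (for example HMAC) would be needed for that.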

[0058] In a preferred embodiment of the present invention, step 6 above may include the following. Step 6.1: Record the decisions made in each strategy execution, and extract the link identifier, strategy identifier, risk score, input summary, and output summary from these decision records. Then generate audit log entries with timestamps and hash verification to obtain a traceable audit evidence chain. Specifically, the entire execution of each data communication protection strategy is recorded, including the decisions at every stage: risk assessment, strategy tier selection, operator sequence matching, and data transformation execution. Several items are extracted from these decision records: the unique link identifier, the strategy identifier, the source tracing risk score of the communication unit, a summary of the original input payload, and a summary of the transformed final output payload. A precise timestamp is generated for each record to ensure that the audit information is timely and unique. The extracted information is assembled in a preset format, and an audit digest is calculated with a hash algorithm. Each assembled record is then given a hash check value and an anti-tampering identifier to form a structured audit log entry. All audit log entries are stored as a chain, ordered by time and link relationship. This forms an immutable, fully traceable audit evidence chain that supports complete tracing and evidence collection for every policy execution.
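A minimal hash-chained audit log in the spirit of Step 6.1 follows. It assumes SHA-256 as the hash algorithm. Each entry stores the hash of the previous one, so altering any recorded field breaks verification from that entry onward. The field names mirror the items listed above, but the record layout is hypothetical.

```python
import hashlib
import json
import time
from typing import List, Optional

GENESIS = "0" * 64  # prev_hash of the first entry


def digest(data: bytes) -> str:
    """Input/output summary: SHA-256 hex digest of the payload bytes."""
    return hashlib.sha256(data).hexdigest()


def _entry_hash(entry: dict) -> str:
    body = {k: v for k, v in entry.items() if k != "hash"}
    return digest(json.dumps(body, sort_keys=True).encode("utf-8"))


def append_entry(chain: List[dict], link_id: str, strategy_id: str, risk_score: float,
                 input_summary: str, output_summary: str,
                 timestamp: Optional[float] = None) -> dict:
    entry = {
        "link_id": link_id, "strategy_id": strategy_id, "risk_score": risk_score,
        "input_summary": input_summary, "output_summary": output_summary,
        "timestamp": time.time() if timestamp is None else timestamp,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,  # links entries in order
    }
    entry["hash"] = _entry_hash(entry)
    chain.append(entry)
    return entry


def verify_chain(chain: List[dict]) -> bool:
    """True if every entry's hash and back-link are intact."""
    prev = GENESIS
    for e in chain:
        if e["prev_hash"] != prev or _entry_hash(e) != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

A chain like this only proves integrity relative to its latest hash. Someone able to rewrite every entry could rebuild a consistent chain, so in practice the head hash would be anchored in a separately protected store.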

[0059] Step 6.2: Extract the feature data of each strategy execution from the audit evidence chain, and obtain the adversarial evaluation results and the alarm events reported by third-party services. Then construct a training sample set containing feature vectors and leakage labels. Specifically, the communication unit feature vector and source tracing risk score of each strategy execution are extracted in batches from the audit evidence chain. Other feature data are extracted with them, such as the selected strategy tier, operator execution parameters, and data transformation results. The feature vector includes all sensitive label, context, and trajectory feature dimensions. All results of the adversarial evaluation are also obtained, including the leakage probability, bypass rate, false interception rate, and strategy weakness information for each attack scenario. So is external feedback reported by third-party services / external systems, such as security alerts and data leakage anomalies. Third-party services / external systems are partner services, cloud services, or other systems outside the enterprise boundary. They are the receivers or processors of the communicated data, receiving the request or response data handled by the data communication protection gateway. They see only de-identified and anonymized results, so it is difficult for them to trace back directly to the internal subject. The feature data, adversarial evaluation results, and external alerts are then correlated and fused. Each correlated group is assigned a leakage label $y \in \{0, 1\}$. Here $y = 1$ indicates that a data leak or violation occurred during that strategy execution, and $y = 0$ indicates that no leak occurred and protection was effective. The labeled feature data are then organized into a training sample set $D = \{(V_m, y_m)\}_{m=1}^{M}$ of communication unit feature vectors and leakage labels. This set provides the data foundation for model parameter optimization and threshold adjustment.
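The labeling in Step 6.2 amounts to a join between per-execution features and leakage evidence. The sketch below assumes that both the adversarial evaluation and the external alerts can be reduced to sets of link identifiers for which leakage was observed. It also assumes that any such evidence sets y = 1. The function and argument names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def build_training_set(features: Dict[str, Sequence[float]],
                       adversarial_leaks: Iterable[str],
                       external_alerts: Iterable[str]) -> List[Tuple[Sequence[float], int]]:
    """Join per-execution feature vectors with leakage labels y in {0, 1}.

    features maps a link identifier from the audit chain to its feature vector;
    the two iterables hold link identifiers for which leakage was reported.
    """
    leaked = set(adversarial_leaks) | set(external_alerts)
    return [(vec, 1 if link in leaked else 0)
            for link, vec in sorted(features.items())]
```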

[0060] Step 6.3 involves inputting the training sample set into the source tracing risk assessment model and optimizing the model parameters through iterative gradient descent to update the fusion weights of local risk, contextual risk, and trajectory risk, thereby obtaining the optimized risk assessment model. Specifically, the constructed training sample set ... is input into the source tracing risk assessment model built into the data communication protection gateway. This model adopts an attention-based multi-dimensional feature fusion neural network consisting of an input layer, a feature encoding layer, an attention fusion layer, a risk calculation layer, and an output layer; it is an improved architecture that combines the classic logistic regression model with the feature fusion capability of deep learning. The input layer receives the multi-dimensional feature vector of the communication unit. The feature encoding layer uses a multilayer perceptron (MLP) to non-linearly encode the sensitive label features, context features, and trajectory features separately. The attention fusion layer assigns dynamic weights to the features of different dimensions to increase the contribution of high-risk features. The risk calculation layer integrates the encoded features and outputs a normalized source tracing risk score, and the output layer maps this score to the interval [0, 1]. The optimization objective is the deviation between the source tracing risk score predicted by the model and the actual leakage label, and the logistic regression loss function is defined as

$$L = -\frac{1}{M}\sum_{m=1}^{M}\left[y_m \log \hat{y}_m + (1 - y_m)\log\left(1 - \hat{y}_m\right)\right],$$

where $M$ is the total number of training samples, $\hat{y}_m$ is the source tracing risk score predicted by the model for the $m$-th sample communication unit, and $y_m$ is its actual leakage label. The loss function is minimized iteratively by gradient descent, and the model's parameter vector $w$ is updated according to

$$w_i' = w_i - \eta \frac{\partial L}{\partial w_i},$$

where $\eta$ is the model learning rate, $w_i'$ is the new value of the $i$-th model parameter after the update, and $w_i$ is its old value before the update. The parameter vector $w$ includes the feature weights in the attention fusion layer, the fusion weights of local risk, context risk, and trajectory risk in the risk calculation layer, and all learnable parameters in the local risk calculation and the context risk calculation. After each parameter update, the model's prediction accuracy, recall, and F1 score are checked on an independent validation set to ensure its prediction accuracy and generalization ability. When the loss function converges below a preset threshold (e.g., $L < 0.01$) or the number of iterations reaches a set value (e.g., 1000 rounds), parameter optimization stops, and the optimized source tracing risk assessment model with updated weights and tuned parameters is obtained.
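The loss and update rule above can be illustrated with a deliberately simplified model. This sketch replaces the attention-based network with plain logistic regression over the three component scores (local, context, trajectory risk). The three learned weights therefore play the role of the fusion weights. The learning rate `lr=0.5` is an assumption. The stopping values `tol=0.01` and `max_iter=1000` mirror the example values given in the text.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Fused risk score in [0, 1] from the three component risks in x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, X, y, eps=1e-12):
    """Logistic regression (binary cross-entropy) loss averaged over M samples."""
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict(w, b, x), eps), 1.0 - eps)  # avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(X)


def train(X, y, lr=0.5, tol=0.01, max_iter=1000):
    """Batch gradient descent w_i <- w_i - lr * dL/dw_i; stops at L < tol or max_iter."""
    M, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = predict(w, b, x) - t  # dL/dz for a single sample
            for i in range(d):
                grad_w[i] += err * x[i] / M
            grad_b += err / M
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
        if log_loss(w, b, X, y) < tol:
            break
    return w, b
```

Each row of `X` is `[local_risk, context_risk, trajectory_risk]`, and `y` holds the 0/1 leakage labels. The per-sample gradient `(p - t) * x_i` is the standard derivative of the cross-entropy loss through the sigmoid.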

[0061] Step 6.4: Based on the training sample set, the actual leakage rate corresponding to each strategy level is statistically analyzed and compared with the preset target leakage rate. The risk threshold set is then adaptively adjusted based on the comparison results to obtain the updated strategy threshold configuration. Specifically, the samples in the constructed training sample set ... are categorized according to the preset set of strategy levels. For each strategy level, the ratio of the number of samples that actually experienced data leakage to the total number of samples executed at that level is calculated, giving the actual leakage rate RealLeak of that level. The preset target leakage rate TargetLeak of each level is then retrieved. It is pre-configured according to the enterprise's data security requirements and business availability tolerance, and different levels have different target leakage rates. The actual and target leakage rates are compared level by level. If the actual leakage rate is higher than the target, the protection strength of that level is insufficient. If it is significantly lower than the target, the protection strength of that level is too high and may affect business availability. Based on these comparison results, the set of traceability risk thresholds ... is adaptively adjusted using an update formula ... in which the threshold-adjustment learning rate sets the step size, and each single threshold θ_i in the set is moved from its old value before the update to its new value after the update. The predetermined numerical (ordering) relationship among the thresholds is strictly maintained after adjustment. In addition, protection boundaries are configured for threshold adjustment so that short-term sample fluctuations do not tighten or loosen the thresholds excessively. An updated strategy threshold configuration adapted to the actual protection needs is finally obtained.
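Because the exact update formula did not survive extraction, the following is only a hypothetical sketch of one rule consistent with the description. It assumes ascending thresholds, with `thresholds[k]` the upper boundary of level k. A threshold is lowered (escalating more units to stronger levels) when RealLeak exceeds TargetLeak, and raised when leakage is below target. The per-round step clamp `max_step` stands in for the "protection boundary". `min_gap` keeps the ordering strict. The names and default values are illustrative assumptions.

```python
def real_leak_rates(records, num_levels):
    """records: iterable of (strategy_level, leaked) pairs with leaked in {0, 1}.

    Returns RealLeak per level = leaked samples / executed samples at that level.
    """
    totals = [0] * num_levels
    leaks = [0] * num_levels
    for level, leaked in records:
        totals[level] += 1
        leaks[level] += leaked
    return [n_leak / n_total if n_total else 0.0 for n_leak, n_total in zip(leaks, totals)]


def adjust_thresholds(thresholds, real_leak, target_leak, lr=0.5, max_step=0.05, min_gap=0.01):
    """Hypothetical rule: theta_k' = theta_k - lr * (RealLeak_k - TargetLeak_k)."""
    new = []
    for k, theta in enumerate(thresholds):
        delta = lr * (real_leak[k] - target_leak[k])
        delta = max(-max_step, min(max_step, delta))   # protection boundary per round
        new.append(min(1.0, max(0.0, theta - delta)))  # keep inside [0, 1]
    for k in range(1, len(new)):                       # restore strict ascending order
        new[k] = max(new[k], new[k - 1] + min_gap)
    if new and new[-1] > 1.0:
        return list(thresholds)                        # ordering cannot be kept: skip round
    return new
```

With this convention the top level has no upper threshold, so its leakage rate can only be addressed by strengthening its operators rather than by moving a boundary.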

[0062] Embodiments of the present invention also provide a computing device, including: a processor and a memory storing a computer program, wherein the computer program, when executed by the processor, performs the method described above. All implementations in the above method embodiments are applicable to this embodiment and can achieve the same technical effects.

[0063] Embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method described above. All implementations in the above method embodiments are applicable to this embodiment and can achieve the same technical effects.

[0064] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A data obfuscation transmission method based on traceability risk assessment and adversarial evaluation, characterized in that the method comprises: uniformly accessing inbound and outbound communication traffic through a data communication protection gateway deployed between internal services and the external network, and performing session reconstruction and protocol parsing on the original network packets to obtain a communication unit abstracted from each request or response; parsing the payload content of the communication unit to identify sensitive data and generate sensitive tags, extracting the receiver identity, transmission channel attributes, and historical alarm records from the communication environment and session metadata as context features, and fusing them into a feature vector; combining the feature vector with historical data flow trajectories to construct a data flow trajectory graph, calculating the local risk, context risk, and trajectory risk components of the communication unit, and fusing them into a traceability risk score that represents the likelihood that the communication can be traced and re-identified; constructing adversarial attack samples according to the traceability risk score and the current policy configuration and testing them in simulation to evaluate the effectiveness of the policy in prompt injection, data exfiltration, and re-identification attack scenarios, so as to identify weak points of the policy and generate calibration suggestions; selecting the corresponding policy level according to the traceability risk score and the calibration suggestions, and executing a pipeline composed of multiple data transformation operators to anonymize, encrypt, generalize, or differentially perturb the communication payload, obtaining the concealed data for transmission; and, during processing, generating a tamper-proof audit log for each policy execution to form an evidence chain, constructing training samples by combining the adversarial evaluation results with external alarm events, and incrementally updating the traceability risk assessment model and the policy thresholds.

2. The data obfuscation transmission method based on traceability risk assessment and adversarial evaluation according to claim 1, characterized in that uniformly accessing the inbound and outbound communication traffic through the data communication protection gateway deployed between the internal services and the external network, and performing session reconstruction and protocol parsing on the original network packets to obtain a communication unit abstracted from each request or response, comprises: inserting the data communication protection gateway into the communication link between the internal services and the external network by inline (serial) deployment, bypass deployment, or sidecar injection, so that all inbound and outbound communication traffic flows through the gateway for processing; performing, by the gateway, session reconstruction on the received original network packets to obtain reconstructed complete packets, identifying their protocol type, and parsing the payload of each reconstructed packet according to that protocol type, so that each complete request or response is abstracted into a communication unit; preliminarily screening the communication units according to preset filtering rules and marking those that involve sensitive data or carry potential traceability risk as objects to be protected, obtaining a set of communication units to be protected; and fragmenting or resampling the communication units in that set, splitting oversized payloads or long-lived connection traffic into multiple logical sub-units and adjusting the sampling rate according to business importance, to form formatted communication units.
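The fragmentation and importance-based resampling in claim 2 can be sketched as follows, purely as an illustration. The `CommUnit` type, its `importance` field, the `#i` sub-unit naming, and the linear mapping from importance to sampling rate are all hypothetical choices, not details from the application.

```python
import random
from dataclasses import dataclass


@dataclass
class CommUnit:
    unit_id: str
    payload: bytes
    importance: float  # hypothetical business-importance score in [0, 1]


def fragment(unit: CommUnit, max_size: int) -> list:
    """Split an oversized payload into logical sub-units of at most max_size bytes."""
    if len(unit.payload) <= max_size:
        return [unit]
    return [
        CommUnit(f"{unit.unit_id}#{i}", unit.payload[start:start + max_size], unit.importance)
        for i, start in enumerate(range(0, len(unit.payload), max_size))
    ]


def sample_for_inspection(unit: CommUnit, base_rate: float, rng: random.Random) -> bool:
    """Importance-weighted sampling: the rate rises linearly from base_rate to 1.0."""
    rate = base_rate + (1.0 - base_rate) * unit.importance
    return rng.random() < rate
```

Passing an explicit `random.Random` instance keeps the sampling reproducible in tests while allowing a differently seeded generator in production.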

3. The data obfuscation transmission method based on traceability risk assessment and adversarial evaluation according to claim 2, characterized in that parsing the payload content of the communication unit to identify sensitive data and generate sensitive tags, extracting the receiver identity, transmission channel attributes, and historical alarm records from the communication environment and session metadata as context features, and fusing them into a feature vector, comprises: structuring the payload content of the formatted communication unit and extracting its fields according to the content type of the payload, together with each field's name, location, and structural path, to obtain a field set; identifying the sensitive category of each field in the field set by regular expression matching, dictionary matching, or a machine learning model to determine the field's sensitivity type and calculate its sensitivity level, obtaining a set of sensitive tags, each containing the field, its sensitivity type, and its sensitivity level; extracting the receiver identity, transmission channel attributes, and historical alarm records from the communication environment and session metadata of the formatted communication unit as context features; and fusing the set of sensitive tags with the context features to form a unified feature vector that represents both the sensitive attributes and the environmental attributes of the communication unit.
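The regular-expression branch of claim 3 can be illustrated for JSON payloads with the following sketch. The rule set, the sensitivity levels 1 to 3, the `$.a.b[i]` path notation, and the five-element feature layout are all hypothetical. Dictionary matching and machine-learning classifiers, which the claim also allows, are not shown.

```python
import json
import re

# Hypothetical rule set: (pattern, sensitivity type, sensitivity level 1..3).
RULES = [
    (re.compile(r"\d{17}[\dXx]"), "cn_id_number", 3),
    (re.compile(r"1\d{10}"), "mobile_phone", 2),
    (re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"), "email", 2),
]


def flatten(obj, path="$"):
    """Yield (structural_path, value) for every scalar leaf of a parsed JSON payload."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from flatten(value, f"{path}[{i}]")
    else:
        yield path, obj


def sensitive_tags(payload: str):
    """Return (field_path, sensitivity_type, sensitivity_level) for each matching field."""
    tags = []
    for path, value in flatten(json.loads(payload)):
        for pattern, s_type, level in RULES:
            if pattern.fullmatch(str(value)):  # the whole value must match the rule
                tags.append((path, s_type, level))
                break
    return tags


def feature_vector(tags, receiver_trust, channel_encrypted, alarm_count):
    """Fuse tag statistics with context features into one flat numeric vector."""
    max_level = max((level for _, _, level in tags), default=0)
    return [float(len(tags)), float(max_level), receiver_trust,
            1.0 if channel_encrypted else 0.0, float(alarm_count)]
```

Using `fullmatch` rather than `search` avoids tagging a field merely because a phone-like digit run appears inside a longer string. That is a precision-over-recall choice a real deployment might tune.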

4. The data obfuscation transmission method based on traceability risk assessment and adversarial evaluation according to claim 3, characterized in that combining the feature vector with historical data flow trajectories to construct a data flow trajectory graph, calculating the local risk, context risk, and trajectory risk components of the communication unit, and fusing these components into a traceability risk score that characterizes the likelihood that the communication can be traced and re-identified, comprises: obtaining, based on the feature vector, the historical flow records of the data objects involved in the communication unit, and constructing a data flow trajectory graph whose nodes are source systems, intermediate systems, and external services and whose edges are transmission behaviors; calculating the local risk component of the communication unit from the sensitive labels in the feature vector combined with field linkability; analyzing the context features in the feature vector to obtain the receiver trust level, channel security level, purpose conformity, and historical risk factor, and calculating the context risk component of the communication unit from them; analyzing, based on the data flow trajectory graph, all transmission paths from the current node to potential leakage nodes to obtain the conditional leakage probability of each path, and calculating the trajectory risk component of the communication unit from these probabilities; and weighting and fusing the local, context, and trajectory risk components and normalizing the result to obtain the traceability risk score.
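Claim 4 names the inputs of each risk component but not the formulas. The following sketch therefore uses hypothetical ones, under the following assumptions. Local risk is the highest normalized sensitivity level scaled by linkability. Context risk is the mean of four risk-oriented terms. A path's leakage probability is the product of its edge probabilities. Paths are treated as independent, giving trajectory risk `1 - prod(1 - p_path)`. The fusion weights `(0.4, 0.3, 0.3)` are illustrative.

```python
def local_risk(levels, linkability, max_level=3):
    """Highest normalized sensitivity level, scaled by field linkability in [0, 1]."""
    if not levels:
        return 0.0
    return min(1.0, max(levels) / max_level * (0.5 + 0.5 * linkability))


def context_risk(receiver_trust, channel_security, purpose_conformity, history_factor):
    """Mean of four risk-oriented context terms, each in [0, 1]."""
    return ((1 - receiver_trust) + (1 - channel_security)
            + (1 - purpose_conformity) + history_factor) / 4


def trajectory_risk(edges, start, leak_nodes):
    """edges: node -> list of (next_node, conditional leakage probability).

    Enumerates simple paths from start to any leak node; assumes paths are independent.
    """
    path_probs = []

    def dfs(node, prob, visited):
        if node in leak_nodes:
            path_probs.append(prob)
            return
        for nxt, p in edges.get(node, []):
            if nxt not in visited:
                dfs(nxt, prob * p, visited | {nxt})

    dfs(start, 1.0, {start})
    no_leak = 1.0
    for p in path_probs:
        no_leak *= 1.0 - p
    return 1.0 - no_leak


def fuse(local, context, trajectory, weights=(0.4, 0.3, 0.3)):
    """Weighted fusion of the three components, normalized into [0, 1]."""
    s = weights[0] * local + weights[1] * context + weights[2] * trajectory
    return min(1.0, max(0.0, s / sum(weights)))
```

Exhaustive path enumeration is exponential in the worst case, so a real trajectory graph would likely need depth limits or a dynamic-programming formulation.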

5. The data obfuscation transmission method based on traceability risk assessment and adversarial evaluation according to claim 4, characterized in that constructing adversarial attack samples according to the traceability risk score and the current policy configuration, testing them in simulation, and evaluating the effectiveness of the policy in prompt injection, data exfiltration, and re-identification attack scenarios to identify weak points of the policy and generate calibration suggestions, comprises: selecting, according to the traceability risk score and the current policy configuration, attack types matching the current communication risk level from a pre-built attack scenario library to generate a set of adversarial attack samples; inputting the set of adversarial attack samples into the data communication protection gateway under the current policy configuration for simulation testing, and recording, for each attack sample, sensitive information leakage, policy bypass behavior, and false interception events during the test to obtain test results; statistically analyzing, based on the test results, the leakage probability, bypass rate, and false interception rate under each attack scenario, and evaluating the effectiveness and robustness of the current policy in the prompt injection, data exfiltration, and re-identification attack scenarios to obtain policy effectiveness evaluation results; identifying, based on the policy effectiveness evaluation results, the weak links in the current policy configuration and analyzing the attack types and causes of protection failure associated with each weak point to obtain policy weakness information; and generating, based on the policy weakness information, calibration suggestions that include adjusting risk thresholds, strengthening concealment operators, or optimizing the rule matching order.
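The statistics and calibration step of claim 5 can be sketched as follows. The outcome record, the scenario names, the acceptance limits (`max_leak`, `max_bypass`, `max_false_block`), and the mapping from each weakness to a suggestion are hypothetical. The three suggestion kinds correspond to the three calibration options named in the claim.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttackOutcome:
    scenario: str      # e.g. "prompt_injection", "exfiltration", "reidentification"
    leaked: bool       # sensitive information reached the attacker
    bypassed: bool     # the policy was circumvented
    false_block: bool  # legitimate content was wrongly intercepted


def scenario_metrics(outcomes):
    """Leakage probability, bypass rate and false interception rate per scenario."""
    grouped = defaultdict(list)
    for o in outcomes:
        grouped[o.scenario].append(o)
    metrics = {}
    for scenario, group in grouped.items():
        n = len(group)
        metrics[scenario] = {
            "leak_prob": sum(o.leaked for o in group) / n,
            "bypass_rate": sum(o.bypassed for o in group) / n,
            "false_block_rate": sum(o.false_block for o in group) / n,
        }
    return metrics


def calibration_suggestions(metrics, max_leak=0.05, max_bypass=0.05, max_false_block=0.10):
    """Flag weak scenarios and map each weakness to one kind of calibration."""
    suggestions = []
    for scenario, m in sorted(metrics.items()):
        if m["leak_prob"] > max_leak:
            suggestions.append(f"{scenario}: lower risk threshold or strengthen concealment operators")
        if m["bypass_rate"] > max_bypass:
            suggestions.append(f"{scenario}: optimize rule matching order")
        if m["false_block_rate"] > max_false_block:
            suggestions.append(f"{scenario}: relax risk threshold to reduce false interception")
    return suggestions
```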

6. The data obfuscation transmission method based on traceability risk assessment and adversarial evaluation according to claim 5, characterized in that selecting the corresponding policy level according to the traceability risk score and the calibration suggestions, and executing a pipeline composed of multiple data transformation operators to anonymize, encrypt, generalize, or differentially perturb the communication payload to obtain the concealed data for transmission, comprises: selecting, according to the traceability risk score combined with the calibration suggestions, a policy level matching the current risk level from a preset set of policy levels to obtain the selected policy level; matching, according to the selected policy level, the corresponding sequence of data transformation operators from a preset operator library to construct the execution pipeline for the current communication unit; obtaining the original payload of the current communication unit, inputting it into the execution pipeline, and performing one or more of field masking, tokenization, generalization and anonymization, differential perturbation, or field-level encryption in sequence to obtain a transformed intermediate payload; and adapting the transformed intermediate payload to the output format and encapsulating it with integrity protection, generating a final payload that meets the requirements of the target interface as the concealed data for transmission.
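A minimal sketch of level selection and the operator pipeline in claim 6 follows, under stated assumptions. The `PIPELINES` operator library, the field names, the masking pattern, the salted-hash tokenizer, the 10-wide age buckets, and the Laplace parameters are all hypothetical. Laplace noise is drawn as the difference of two exponential variates. Field-level encryption is omitted because it would need a third-party cryptography library. Integrity encapsulation uses HMAC-SHA256 over canonical JSON.

```python
import bisect
import hashlib
import hmac
import json
import random

# Hypothetical operator library: policy level -> ordered (field, operator) steps.
PIPELINES = {
    0: [],
    1: [("phone", "mask")],
    2: [("phone", "mask"), ("age", "generalize")],
    3: [("phone", "tokenize"), ("age", "generalize"), ("salary", "laplace")],
}


def select_level(score, thresholds):
    """Level k when thresholds[k-1] <= score < thresholds[k]; thresholds ascending."""
    return bisect.bisect_right(thresholds, score)


def mask(value: str) -> str:
    """Keep the first 3 and last 2 characters; mask short values entirely."""
    if len(value) <= 5:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 5) + value[-2:]


def run_pipeline(payload, level, rng, salt=b"demo-salt", epsilon=1.0, sensitivity=1000.0):
    out = dict(payload)
    for field, op in PIPELINES[level]:
        if field not in out:
            continue
        value = out[field]
        if op == "mask":
            out[field] = mask(str(value))
        elif op == "tokenize":    # salted-hash pseudonym, stable for equal inputs
            out[field] = "tok_" + hashlib.sha256(salt + str(value).encode()).hexdigest()[:12]
        elif op == "generalize":  # numeric value -> 10-wide bucket
            low = int(value) // 10 * 10
            out[field] = f"{low}-{low + 9}"
        elif op == "laplace":     # Laplace(0, sensitivity/epsilon) noise
            scale = sensitivity / epsilon
            out[field] = float(value) + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return out


def encapsulate(payload, key: bytes):
    """Serialize canonically and attach an HMAC-SHA256 tag for integrity."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {"body": body, "hmac_sha256": hmac.new(key, body.encode(), hashlib.sha256).hexdigest()}
```

For example, a unit with score 0.8 under thresholds `[0.25, 0.5, 0.75]` falls into level 3, so its phone number is tokenized, its age generalized, and its salary perturbed. The salt and HMAC key shown are placeholders and would need proper key management.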

7. The data obfuscation transmission method based on traceability risk assessment and adversarial evaluation according to claim 6, characterized in that generating, during processing, a tamper-proof audit log for each policy execution to form an evidence chain, constructing training samples by combining the adversarial evaluation results with external alarm events, and incrementally updating the traceability risk assessment model and the policy thresholds, comprises: collecting the decision records of each policy execution and obtaining from them the link identifier, policy identifier, risk score, input summary, and output summary to generate audit log entries containing timestamps and hash checks, thereby obtaining a traceable audit evidence chain; extracting the feature data of each policy execution from the audit evidence chain and obtaining the adversarial evaluation results and the alarm events reported by third-party services to construct a training sample set containing feature vectors and leakage labels; inputting the training sample set into the traceability risk assessment model and optimizing the model parameters by iterative gradient descent to update the fusion weights of local risk, context risk, and trajectory risk, obtaining the optimized risk assessment model; and statistically analyzing, based on the training sample set, the actual leakage rate of each policy level, comparing it with the preset target leakage rate, and adaptively adjusting the risk threshold set based on the comparison results to obtain the updated policy threshold configuration.
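The timestamped, hash-checked audit entries of claim 7 can be sketched as a simple SHA-256 hash chain. Each entry stores the previous entry's hash, so editing any earlier entry breaks verification. The entry field names and the all-zero genesis value are assumptions. A plain hash chain makes tampering detectable but not impossible if an attacker rewrites the whole chain, so a production system would likely also sign entries or anchor the latest hash externally.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # hypothetical prev_hash for the first entry


def _digest(entry_without_hash):
    return hashlib.sha256(json.dumps(entry_without_hash, sort_keys=True).encode()).hexdigest()


def append_entry(chain, link_id, policy_id, risk_score, input_summary, output_summary, ts=None):
    """Append one policy-execution record, chained to the previous entry's hash."""
    entry = {
        "timestamp": time.time() if ts is None else ts,
        "link_id": link_id,
        "policy_id": policy_id,
        "risk_score": risk_score,
        "input_summary": input_summary,
        "output_summary": output_summary,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    entry["hash"] = _digest(entry)
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every hash and check each back-link; False on any mismatch."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```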

8. A computing device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 7.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that, when executed by a processor, implements the method according to any one of claims 1 to 7.