A neural machine translation internet of things remote attestation method for data flow attacks
By using a neural machine translation model to learn the mapping between program input and control flow path, and combining it with control flow graph structure constraints, this approach addresses the weakness of existing dynamic remote attestation schemes in detecting data flow attacks, and achieves unified detection and interpretable analysis of control flow and data flow attacks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING UNIV OF SCI & TECH
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-23
Figure CN122268631A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of software security and program analysis, and in particular to a neural machine translation-based Internet of Things (IoT) remote attestation method targeting data flow attacks.
Background Technology
[0002] With the increasing complexity and networking of software systems, dynamic attacks based on memory corruption vulnerabilities have become one of the core threats in the field of cybersecurity. These attacks can generally be categorized into two main types: first, control flow hijacking attacks, where attackers tamper with the program execution flow, implementing code reuse techniques such as Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP) to bypass traditional defense mechanisms; and second, non-control data attacks, where attackers tamper with critical data in the program (such as configuration tags, credentials, function pointers, etc.) without altering the control flow, which can also lead to serious consequences such as privilege escalation, information leakage, or system crashes. Both types of attacks are sophisticated and directly harmful, posing a severe challenge to various software systems. Especially in resource-constrained scenarios such as the Internet of Things (IoT) and edge computing, the software often lacks robust real-time protection mechanisms, making it more vulnerable to dynamic attacks exploiting memory vulnerabilities.
[0003] To ensure the integrity of software during remote execution, dynamic remote proof technology, capable of effectively detecting and responding to dynamic attacks, has emerged. Unlike traditional static proofs, which only verify the integrity of the initial code, dynamic remote proofs enable the verifier to continuously evaluate the behavior of the software on the remote device during runtime. As a key branch of this field, control flow proofs effectively identify various abnormal execution behaviors, including control flow hijacking, by capturing and verifying the actual execution path of the program in real time and determining whether its control flow conforms to the expected security policies and legal patterns. This technology not only provides direct evidence for runtime attack detection but also lays the foundation for building a trusted proof system covering the entire software lifecycle, becoming a core technical approach for dealing with complex dynamic attacks.
[0004] However, existing dynamic remote authentication schemes have significant limitations in detecting and defending against dataflow attacks. Mainstream solutions primarily rely on verifying the legitimacy of program control flow, effectively identifying abnormal jumps and hijacking behaviors in the execution path. However, they generally lack detection capabilities for dataflow attacks that do not alter the control flow but only modify critical memory data. Such attacks often silently modify critical data affecting program logic, such as account balances, security flags, or sensor values, while maintaining control flow compliance, thereby causing serious business errors or system outages. Because existing mechanisms only focus on the legitimacy of the execution path, they create a structural blind spot for detecting data semantic tampering, thus proving inadequate in dealing with attacks that only manipulate data without deviating from the control flow.
[0005] In summary, existing solutions have explored diverse approaches to enhance control flow proofs to address data flow threats, encompassing various technical paths from hardware customization, compiler instrumentation, binary rewriting to machine learning. However, they generally face difficult trade-offs between monitoring integrity, deployment versatility, performance overhead, and detection determinism. Solutions heavily reliant on specific hardware or TEEs limit their portability and large-scale deployment; pure software solutions often struggle to balance efficiency and security strength; and machine learning-based methods introduce uncertainty. Designing a lightweight, platform-compatible proof mechanism capable of accurate and deterministic tracing and verification of program control flow and critical data flow remains a core challenge that urgently needs to be addressed in this field.
Summary of the Invention
[0006] The purpose of this invention is to provide a neural machine translation IoT remote verification method for data flow attacks, so as to achieve a unified detection capability for both control flow attacks and non-control flow attacks, and overcome the limitation of traditional dynamic remote verification schemes in their insufficient detection capability for data flow attacks achieved by tampering with key memory data.
[0007] The technical solution that achieves the purpose of this invention is a neural machine translation IoT remote verification method for data flow attacks, in which the verifier verifies and deeply analyzes the program execution behavior on the remote prover's device. Two entities are involved: the verifier and the prover. The solution mainly consists of two stages: an offline stage and a runtime verification stage; wherein,
[0008] The offline phase involves initializing secure encrypted communication for both the verifier and the prover, training a neural machine translation model using a fused graph neural network, and deploying the device program instrumentation to the prover's end. The specific steps in this phase are as follows:
[0009] Step 1-1, Security credential presetting and communication initialization:
[0010] This step establishes a secure foundation for remote proof communication between the prover and the verifier. The verifier and prover first negotiate a symmetric key k, which is used for integrity protection and replay prevention in subsequent challenge-response interactions. This step guarantees the security of communication during the online phase.
[0011] Step 1-2, Program instrumentation and control flow dataset construction:
[0012] To enable the device to output accurate control flow paths during runtime, this solution performs lightweight instrumentation on the target IoT program, allowing it to capture the entry order of each basic block during execution. The verifier disassembles and analyzes the target program to identify the boundaries of all basic blocks. Then, it iterates through the starting address of each basic block, inserting a function call instruction before each block's starting address. This function calls the control flow path recording function, which takes the current basic block identifier as a parameter and records it in a specified memory space during execution. Finally, a new executable file embedding control flow tracing logic is generated, providing a data acquisition foundation for subsequent runtime verification.
[0013] Subsequently, based on a large number of benign inputs, the program is executed and sample pairs of <input sequence, actual execution path> are systematically collected. During the path collection process, a coverage-oriented input generation mechanism is introduced to maximize the exploration depth and breadth of different execution branches of the program, so that the model can fully learn the semantic correspondence between input and path. In addition, factors such as random seeds and environmental variables that may affect the path are regarded as part of special inputs to maintain the determinism of the path generation process.
[0014] Step 1-3, Training the neural machine translation model:
[0015] This step trains a neural machine translation model with an encoder-decoder architecture to learn the mapping relationship from program input sequences to control flow execution path sequences. Specifically, the model first treats the program input and execution path as source and target language sequences, respectively, and performs basic training in a sequence-to-sequence translation manner to enable the model to grasp the semantic relationships between them. Subsequently, in the fine-tuning stage, program structure constraints are introduced, that is, the last layer of the decoder is replaced with a graph neural network module. This module integrates the control flow graph embedding as prior knowledge into the attention mechanism, thereby guiding the model to jointly model behavioral differences and path legality when generating paths, based on both the input semantics and the structural rules defined by the control flow graph (such as path reachability and legality). After training on a large-scale <input, path> dataset, the model can infer the corresponding "expected execution path" from a given input, providing the core capability foundation for detecting abnormal behavior in the online stage by comparing the deviation between the actual path and the predicted path.
[0016] The runtime verification phase allows the verifier to assess the device's security status. During this phase, the device undergoes remote authentication triggered by the verifier based on a challenge-response model. This allows the verifier to determine the device's security. The specific steps are as follows:
[0017] Step 2-1, the verifier issues a proof challenge:
[0018] The verifier initiates a remote proof challenge to the prover based on a preset verification strategy or risk event detection. The verification strategy includes a periodic verification mechanism, and the risk event includes the identification of suspicious external inputs. The verifier generates and sends a challenge request containing a random number, requiring the prover to return its recorded control flow path data.
[0019] The verifier generates a random number and sends it to the prover in the challenge request. This random number serves as the unique credential for this session; its core function is to prevent replay attacks and ensure the freshness of the proof response, forcing the prover to generate an immediate and unique response for this challenge. The request is sent to the prover through a dedicated ATTEST authentication proxy channel.
[0020] Step 2-2, the prover extracts control flow data and generates a proof response:
[0021] During normal operation, when no remote proof challenge has been initiated by the verifier, the prover device continues to execute its daily task program. During this period, the statically instrumented code embedded in the program runs synchronously, recording the program's control flow path. When the verifier later initiates a proof request, the prover can submit the program input executed by the target device and its corresponding control flow path for the verifier to perform integrity verification and behavioral analysis.
[0022] Once the prover receives a challenge from the verifier via the ATTEST agent, the device immediately enters prover mode. In this mode, the prover performs the following operations: accesses the memory space that records control flow data, extracts the control flow data from it, and constructs a response message according to a predefined format. The structure of this response message is as follows:
[0023] $Resp = (N, I, P, \mathrm{MAC}_k(N, I, P))$
[0024] where $N$ is the random value sent by the verifier; $I$ and $P$ are the input sequence and the corresponding control flow path sequence most recently executed by the target program for this challenge; and $\mathrm{MAC}_k(\cdot)$ denotes the authentication code generated with the key $k$, which ensures the integrity and authenticity of the response data. This response message is ultimately sent to the verifier for control flow validation.
[0025] Step 2-3, the verifier verifies the device status:
[0026] The verifier's verification of the prover's device status is a process based on control flow integrity. Upon receiving a response, the verifier uses a trained neural machine translation model to calculate the predicted execution path for the received input and measures the similarity between the predicted path and the actual path. If the difference exceeds a preset threshold, it indicates that the current input has not triggered the execution behavior learned by the model under normal semantics. This means that the input has caused an unauthorized control flow transfer or an abnormal input-driven data flow, thus being identified as a potential attack and triggering a security alert.
[0027] Step 2-4, the verifier records the valid status:
[0028] After confirming that the control flow path data returned by the prover has passed verification, the verifier immediately updates the device status record table it maintains, marks the operating status of the target device as "legitimate", and records the timestamp, authentication identifier and critical path characteristics of this successful verification. This status record will serve as the benchmark for subsequent periodic verification and provide data support for device safety status assessment.
[0029] The runtime verification phase will periodically execute the above four steps 2-1 to 2-4.
[0030] Compared with the prior art, the significant advantages of this invention are:
[0031] (1) Context-aware behavior authentication model: By using program input as the context information of the model, a dynamic relationship is established between the input and the execution path, realizing individualized behavior authentication under different input scenarios. This model effectively distinguishes the polymorphic behavior of the program caused by input differences, fundamentally reducing the false alarm rate caused by behavioral diversity, and improving the applicability and accuracy of the detection mechanism in real complex environments.
[0032] (2) Explainable security detection mechanism: By comparing the “expected path” predicted by the model with the “actual path” reported by the device, the solution can intuitively present the differences and deviations between the two, making the detection results inherently explainable. This feature not only supports the accurate location of anomalies in control flow and data flow, but also provides a clear basis for subsequent root cause analysis, attack chain reconstruction and vulnerability repair, greatly enhancing the efficiency of security incident response and investigation.
[0033] (3) Reduced dependence on the completeness of training data: This scheme does not require the training data to cover all possible execution paths. Instead, it learns the mapping rules and generation logic from input features to path structure through neural networks, thereby achieving reasonable inference of unknown inputs and unseen paths. This design significantly reduces the dependence on the "complete coverage" of the training set and improves the feasibility and generalization ability of the model in data-constrained scenarios.
Description of the Drawings
[0034] Figure 1 is the system architecture diagram of the present invention.
[0035] Figure 2 is a basic flowchart of the neural machine translation IoT remote verification method of the present invention.
[0036] Figure 3 is a schematic diagram of the control flow dataset construction based on coverage-oriented fuzz testing according to the present invention.
[0037] Figure 4 is a schematic diagram of the architecture of the neural translation model of the present invention.
Detailed Implementation
[0038] The present invention will now be described in further detail with reference to the accompanying drawings and examples. The following embodiments are implemented based on the technical solution of the present invention, providing detailed implementation methods and processes; however, the scope of protection of the present invention is not limited to the following embodiments.
[0039] This invention aims to address the lack of context awareness in existing dynamic remote proof schemes, which leads to false positives when dealing with polymorphic program behavior. It provides a control flow remote proof and anomaly diagnosis method based on context-aware neural translation. This invention proposes a non-control data attack detection method based on the mapping relationship between program input and its execution path; it also proposes a transformer-based neural machine translation model and integrates graph neural networks to improve the model's accuracy and generalization ability. Furthermore, the difference between the expected path and the actual path gives this invention inherent interpretability, greatly assisting in the root cause analysis of attacks. The system model of this method is shown in Figure 1. The system comprises two types of entities: Verifiers and Provers. Verifiers are trusted, secure verification entities with powerful computing capabilities and a complete program analysis environment. Besides undertaking the core task of verifying the trustworthiness of the prover's software behavior, they are also responsible for training and deploying neural machine translation models, context-aware path prediction, difference comparison, and anomaly diagnosis report generation. Provers are remote IoT devices that need to be verified. These devices run target programs with lightweight static instrumentation, are able to collect runtime control flow paths, and securely interact with verifiers through an authentication proxy (ATTEST). In this invention, it is assumed that the prover device possesses a basic secure execution environment capable of protecting the measurement logic and communication keys. Verifiers trigger the remote proof protocol periodically or in response to events.
[0040] This invention employs a collaborative intelligent verification mechanism of "offline training and online verification." It learns the semantic mapping between program input and execution path through a neural machine translation model, and combines this with structural constraints of the program control flow graph to achieve accurate detection and anomaly diagnosis of control flow hijacking and non-control data attacks. The main process is divided into two core stages, an offline stage and a runtime verification stage, as shown in Figure 2. The offline phase refers to the model building and data preparation completed at the verifier end before system deployment, shown as steps 101 to 103 in Figure 2; the runtime verification phase is the real-time, context-dependent behavioral verification process initiated by the verifier against the prover during the execution of the target program, shown as steps 104 to 107 in Figure 2. Each step of the two stages is explained in detail below with reference to the flowchart in Figure 2.
[0041] The offline phase involves initializing secure encrypted communication for both the verifier and the prover, training a neural machine translation model using a fused graph neural network, and deploying the device program instrumentation to the prover's end. The specific steps in this phase are as follows:
[0042] Step 101: Security credential presetting and communication initialization.
[0043] The security of remote proof relies on the communication channel's resistance to replay and forgery attacks. This step establishes a secure foundation for remote proof communication between the prover and the verifier. To ensure the freshness and integrity of interactions between the verifier and the prover, this invention designs a secure communication protocol based on symmetric cryptography. The protocol follows the standard challenge-response paradigm and is extended to support the two-phase interaction process unique to this scheme. On the prover side, the system deploys a software-defined ATTEST authentication agent as a lightweight alternative to an ideal hardware trust anchor; the agent processes authentication requests from the verifier and protects keys and critical state. Because hardware trust anchors are not readily available on general-purpose platforms, this agent makes it possible to verify the feasibility of the solution in the existing environment. During the deployment phase, the verifier and the prover first obtain a shared long-term master key through a key negotiation process. This key is then stored in the secure database on the verifier side and in the trust anchor on the prover side, where it is strictly protected against theft by malware. In each authentication session, both parties use the master key to derive session keys; among these, the key dedicated to the Message Authentication Code (MAC) ensures the integrity of all interactive messages (including challenges, responses, and evidence data) and resists forgery and tampering, thus forming the secure foundation for a complete authenticated encryption operation.
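The per-session key derivation described in this step could look roughly like the following Python sketch. The description does not name a key derivation function, so the use of HMAC-SHA256 over a session nonce is an assumption made purely for illustration; the function name `derive_session_mac_key` and the label string are likewise hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_session_mac_key(master_key: bytes, session_nonce: bytes) -> bytes:
    """Derive a per-session MAC key from the long-term master key.

    Assumption: HMAC-SHA256 keyed with the master key, applied to a fixed
    label plus the session nonce. The source does not specify the KDF.
    """
    label = b"attest-session-mac"  # hypothetical domain-separation label
    return hmac.new(master_key, label + session_nonce, hashlib.sha256).digest()


# Stand-in for the master key obtained through key negotiation.
master_key = secrets.token_bytes(32)
nonce_a = secrets.token_bytes(16)
nonce_b = secrets.token_bytes(16)

key_a = derive_session_mac_key(master_key, nonce_a)
key_b = derive_session_mac_key(master_key, nonce_b)
```

Both sides can run the same derivation independently once they agree on the session nonce, so the master key itself never has to leave the protected store.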
[0044] Step 102: Program instrumentation and control flow dataset construction.
[0045] This scheme employs binary instrumentation on the target program to enable it to record control flow paths in real time, thereby supporting verifiers in performing integrity analysis of program execution behavior. To reduce path storage overhead, we design a compressed representation mechanism based on a "sequential flow queue + deduplication index array". Specifically, while ensuring that the basic block identifier is within a predefined numbering field, the main sequence of the path uses a sequential queue to record the basic block access trajectory in the actual execution order. For duplicate basic blocks, they are not repeatedly written to the main sequence, but their count information is recorded through a parallel-maintained auxiliary array. By decoupling the execution order from the duplicate information, this mechanism significantly reduces redundant storage overhead while maintaining path reconstruction capabilities. Based on this representation model, we insert lightweight recording instructions at key control flow transfer points to update the queue and array states in real time, thereby efficiently capturing control flow path information during program execution. The specific instrumentation logic and pseudocode are shown in Algorithm 1 in Table 1.
[0046] Table 1. Pseudocode for Algorithm 1
[0047] This algorithm aims to record control flow path information during program execution. First, it obtains two shared memory pointers through environment variables: an array (afl_area_ptr) storing the number of basic block hits and a queue (afl_order_ptr) recording the order of first accesses (line 1). Then, for the input basic block identifier (target), it checks its current value in the hit array: if it's less than the preset maximum number of hits (MAX_HIT_COUNT), its count is incremented by 1, indicating an increase in the basic block's access count (lines 2-4). Next, it determines if the basic block is being accessed for the first time: if so, it updates the index of the sequence queue (incrementing the value of the first element as the new offset) and stores the basic block ID in the corresponding position in the sequence queue, thus recording the first access order of the basic block (lines 6-10). Finally, the algorithm records two dimensions of control flow data through shared memory space: the access frequency of each basic block and the order of the basic blocks' first appearances, which will serve as the program's control flow information.
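As a reading aid, the recording logic described above can be sketched in Python. This is a simulation under stated assumptions, not the instrumentation code itself: plain lists stand in for the two shared memory regions, the values of `MAX_HIT_COUNT` and `MAP_SIZE` are invented, and the helper names are hypothetical.

```python
MAX_HIT_COUNT = 255  # assumed cap on the per-block hit counter
MAP_SIZE = 64        # assumed size of the basic block identifier space


def make_trace_buffers(map_size: int = MAP_SIZE):
    """Allocate the hit-count array and the first-access order queue.

    order[0] stores how many entries have been recorded so far;
    the recorded block IDs start at index 1.
    """
    hits = [0] * map_size
    order = [0] * (map_size + 1)
    return hits, order


def record_block(hits, order, target: int) -> None:
    """Record one entry into the basic block identified by `target`."""
    first_access = hits[target] == 0
    if hits[target] < MAX_HIT_COUNT:
        hits[target] += 1          # saturating hit counter
    if first_access:
        order[0] += 1              # first element doubles as the write offset
        order[order[0]] = target   # remember the order of first appearance


def first_access_order(order):
    """Return the recorded block IDs in order of first appearance."""
    return order[1:order[0] + 1]


hits, order = make_trace_buffers()
for block_id in [3, 7, 3, 9, 7, 3]:
    record_block(hits, order, block_id)
```

After this run, the queue holds each block once in the order it was first entered, while the array carries the repeat counts, which is the decoupling of order from frequency that the compressed representation relies on.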
[0048] After statically instrumenting the target program, this invention constructs a high-quality model training dataset by using coverage-oriented fuzz testing to automatically generate a large number of benign program inputs and collect the corresponding control flow paths, thereby forming <input, execution path> sample pairs. Specifically, with reference to Figure 3, this invention utilizes a customized fuzzing tool, with code coverage as feedback, to drive input mutation and explore different execution branches of a program. During this process, only legitimate execution paths that do not trigger exceptions are collected, ensuring the purity and representativeness of the dataset. This method efficiently enumerates the program's behavior space under normal semantics, providing a reliable data foundation for subsequent training of neural machine translation models that can accurately map the relationship between input and path.
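The collection loop can be illustrated with a small Python sketch. Everything here is a toy stand-in: `toy_target` is a hypothetical instrumented program that returns the block IDs it entered, the mutation strategy is deliberately minimal, and the customized fuzzing tool mentioned above is not modeled.

```python
import random


def toy_target(data: bytes) -> list[int]:
    """Hypothetical instrumented program: returns the basic block IDs entered."""
    path = [0]
    if len(data) > 0 and data[0] > 127:
        path.append(1)
    else:
        path.append(2)
    if len(data) > 1 and data[1] % 2 == 0:
        path.append(3)
    if len(data) > 3:
        raise ValueError("simulated crash")  # abnormal run, must not be collected
    path.append(4)
    return path


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Either overwrite one byte or append one byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def collect_pairs(seed: bytes, rounds: int, rng: random.Random) -> dict[bytes, list[int]]:
    """Coverage-guided loop that gathers <input, execution path> pairs."""
    corpus = [seed]
    seen_blocks: set[int] = set()
    dataset: dict[bytes, list[int]] = {}
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            path = toy_target(candidate)
        except ValueError:
            continue  # only exception-free executions are kept
        dataset[candidate] = path
        if not set(path) <= seen_blocks:
            seen_blocks |= set(path)
            corpus.append(candidate)  # input reached new blocks, keep mutating it
    return dataset


dataset = collect_pairs(b"\x00", rounds=200, rng=random.Random(0))
```

Inputs that reach new blocks are fed back into the corpus, which is the coverage feedback described above; inputs that crash the target never enter the dataset.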
[0049] Step 103: Training the neural machine translation model.
[0050] The verifier uses the high-quality <input, execution path> dataset constructed in step 102 and applies a phased training strategy to build a neural machine translation model capable of accurately predicting program behavior. Its architecture is shown in Figure 4. Because the neural machine translation model is well known in the field, its structure is not described in detail; only the improved parts are described.
[0051] In the first stage, the basic semantic mapping training stage, this invention trains a standard Transformer encoder-decoder model. The core objective of this stage is to establish a stable translation relationship from the program input sequence to the execution path sequence. The encoder is responsible for contextual encoding of the input sequence, while the decoder generates the path sequence in an autoregressive manner. Through its built-in self-attention mechanism, the model learns the dependencies between basic block identifiers within the path sequence to maintain the local coherence of the generated path; simultaneously, through the cross-attention mechanism, the model achieves dynamic semantic alignment between input features and path nodes. The training in this stage aims to minimize the cross-entropy loss, enabling the model to master the basic ability to predict path sequences based on the input without external structural constraints, thus completing effective semantic initialization of the model parameters.
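For reference, the first-stage objective can be written in the usual sequence-to-sequence form. The notation is introduced here and is not taken from the source: $x$ is the program input sequence, $y_1, \dots, y_T$ is the execution path as a sequence of basic block identifiers, and $\theta$ denotes the model parameters.

```latex
\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, x\right)
```

Each term penalizes the decoder for assigning low probability to the block that actually follows the already generated prefix $y_{<t}$, given the encoded input.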
[0052] In the second stage, the structural constraint integration and fine-tuning stage, this invention, based on the model trained in the first stage, introduces structured knowledge of the program control flow graph to enhance the legality of path generation. Specifically, this invention first pre-trains a graph neural network, which uses the jump relationships between basic blocks in the control flow graph as supervision signals to learn the structured embedding vector of each basic block. Subsequently, the Transformer decoder is specifically improved: keeping the encoder unchanged, the last layer of the decoder is replaced with a structure-aware attention layer integrating the aforementioned graph neural network. In this layer, the model computes self-attention, cross-attention, and the newly added structure-aware attention in parallel. Based on the embedding vectors provided by the GNN, the structure-aware attention explicitly determines whether the successor of the currently generated basic block is legal in the control flow graph, thereby directly injecting the structural rules of the program into the generation process. This stage uses a relatively small learning rate to fine-tune the overall model and introduces a structural compliance reward term into the loss function, guiding the model to output a predicted path that strictly conforms to the control flow graph constraints while maintaining the semantic relevance of the input, ultimately achieving a unity of semantic accuracy and structural legality.
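The idea of steering generation toward legal successors can be shown with a much simpler stand-in. The sketch below replaces the learned GNN embeddings and structure-aware attention with a hard adjacency lookup, which is a simplification and not the mechanism described above; the `CFG` dictionary and the scores are invented for illustration.

```python
import math

# Hypothetical control flow graph: block ID -> set of legal successor IDs.
CFG = {0: {1, 2}, 1: {3}, 2: {3}, 3: {0, 4}, 4: set()}


def mask_illegal_successors(current: int, scores: dict[int, float]) -> dict[int, float]:
    """Set the score of every block that is not a CFG successor of `current` to -inf."""
    legal = CFG.get(current, set())
    return {b: (s if b in legal else -math.inf) for b, s in scores.items()}


def illegal_transitions(path: list[int]) -> int:
    """Count consecutive pairs in `path` that are not edges of the CFG."""
    return sum(1 for a, b in zip(path, path[1:]) if b not in CFG.get(a, set()))


# Raw decoder scores for the next block (made-up numbers).
scores = {0: 0.1, 1: 2.0, 2: 1.5, 3: 3.0, 4: 0.2}
masked = mask_illegal_successors(0, scores)
best = max(masked, key=masked.get)
```

Block 3 has the highest raw score but is not a successor of block 0, so the mask removes it and block 1 is chosen. A count such as `illegal_transitions` could also serve as the basis of a structural compliance term, although the source does not specify how that term is computed.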
[0053] The runtime verification phase allows the verifier to assess the device's security status. During this phase, the device undergoes remote authentication triggered by the verifier based on a challenge-response model. This allows the verifier to determine the device's security. The specific steps are as follows:
[0054] Step 104: The verifier sends a proof challenge.
[0055] The verifier maintains a proof cycle timer and a device status log table, continuously monitoring the interval between the current system time and the timestamp of the device's most recent successful proof. When the system detects that this time interval has reached the preset proof cycle threshold for the device, it immediately triggers the remote proof process and performs a routine control flow integrity audit.
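A minimal sketch of the periodic trigger follows, assuming a simple in-memory status table. The `DeviceRecord` fields and the function name `devices_due` are hypothetical; the source only states that a timer and a status log table are maintained.

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    device_id: str
    last_success_ts: float  # timestamp of the most recent successful proof
    period_s: float         # preset proof cycle threshold for this device


def devices_due(records: list[DeviceRecord], now: float) -> list[str]:
    """Return IDs of devices whose proof interval has reached its threshold."""
    return [r.device_id for r in records if now - r.last_success_ts >= r.period_s]


records = [
    DeviceRecord("sensor-a", last_success_ts=100.0, period_s=60.0),
    DeviceRecord("sensor-b", last_success_ts=150.0, period_s=60.0),
]
due = devices_due(records, now=170.0)
```

At time 170, only `sensor-a` has gone 60 seconds or more without a successful proof, so only that device would receive a challenge.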
[0056] During the control flow proof phase, the verifier first generates a high-entropy random number $N$. This is a one-time challenge used to ensure session freshness and defend against replay attacks. To prevent the challenge from being tampered with or forged in transit, the verifier uses the pre-shared authentication key $k$ to calculate a message authentication code $\mathrm{HMAC}_k(N)$ over the random number. The random number is then combined with the HMAC to form the challenge message:
[0057] $Chal = (N, \mathrm{HMAC}_k(N))$ (1)
[0058] Here, the random number $N$ is transmitted in plaintext, and the accompanying HMAC allows the prover to verify the source of the challenge and the integrity of the message.
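The challenge construction can be sketched with the Python standard library. The hash function (SHA-256) and the 16-byte nonce length are assumptions, since the source only specifies a high-entropy random number and an HMAC; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge(key: bytes, nonce_len: int = 16) -> tuple[bytes, bytes]:
    """Verifier side: create a fresh nonce and its HMAC tag."""
    nonce = secrets.token_bytes(nonce_len)
    tag = hmac.new(key, nonce, hashlib.sha256).digest()
    return nonce, tag


def check_challenge(key: bytes, nonce: bytes, tag: bytes) -> bool:
    """Prover side: confirm the challenge comes from the key holder."""
    expected = hmac.new(key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison


shared_key = secrets.token_bytes(32)  # stand-in for the pre-shared key
nonce, tag = make_challenge(shared_key)
```

The nonce travels in the clear; only a party holding the shared key can produce a tag that passes `check_challenge`.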
[0059] Step 105: The prover extracts control flow data and generates a proof response.
[0060] Upon receiving the challenge, the prover's internal monitoring component accesses the shared memory space deployed within the secure execution domain, extracts the sequence of executed basic block identifiers and their corresponding execution frequencies, and generates the actual execution path $P_{act}$. The monitor also captures the program input $I$ of this proof run and generates a plaintext response message:
[0061] $R = (N, I, P_{act})$ (2)
[0062] To protect the integrity of the execution path and the other response fields, the prover uses the authentication key $k$ to generate an authentication code $\mathrm{HMAC}_k(R)$ for the response message, and the two are assembled into a complete response message. Finally, the message is sent to the verifier through the ATTEST proxy deployed on the prover's end, thus ensuring the confidentiality, integrity, and verifiability of the response during transmission.
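One possible way to assemble and check such a response is sketched below. The wire format (length-prefixed fields, 32-bit big-endian block IDs) and the choice of HMAC-SHA256 are assumptions for illustration only; the source does not define the encoding, and all function names are hypothetical.

```python
import hashlib
import hmac
import struct


def encode_response(nonce: bytes, program_input: bytes, path: list[int]) -> bytes:
    """Serialize nonce, input and path as length-prefixed fields."""
    path_bytes = struct.pack(f">{len(path)}I", *path)
    parts = [nonce, program_input, path_bytes]
    return b"".join(struct.pack(">I", len(p)) + p for p in parts)


def build_response(key: bytes, nonce: bytes, program_input: bytes, path: list[int]):
    """Prover side: produce the plaintext body and its authentication code."""
    body = encode_response(nonce, program_input, path)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body, tag


def verify_response(key: bytes, expected_nonce: bytes, body: bytes, tag: bytes) -> bool:
    """Verifier side: check the tag, then check that the nonce is the one issued."""
    expected_tag = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_tag, tag):
        return False
    (nonce_len,) = struct.unpack(">I", body[:4])
    return hmac.compare_digest(body[4:4 + nonce_len], expected_nonce)


key = b"k" * 32
nonce = b"n" * 16
body, tag = build_response(key, nonce, b"GET /status", [0, 2, 3, 4])
```

Length prefixes keep the field boundaries unambiguous, so bytes cannot be shifted between the input and the path without changing the authenticated body.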
[0063] Step 106: Verifier verifies device status
[0064] After receiving the response message from the prover, the verifier submits the program input $I$ contained therein to the trained NMT model to generate the predicted execution path $P_{pred}$ corresponding to that input, which is used for the subsequent path validity check. To quantify the difference between the predicted execution path $P_{pred}$ and the actual execution path $P_{act}$, this invention designs a metric based on edit distance: the two path sequences are treated as symbol strings, and the edit distance is the minimum number of basic operations (insertion, deletion, and replacement) required to convert $P_{pred}$ into $P_{act}$. A smaller edit distance indicates that the predicted and actual paths are closer; a larger edit distance indicates a greater deviation. This metric quantitatively reflects the accuracy of path prediction and supports subsequent validation and model optimization.
[0065] This invention does not require the predicted path $P_{pred}$ generated by the model to be completely consistent with the actual path $P_{act}$. Since path generation relies on the NMT model, whose output inherently contains randomness and minor structural biases (e.g., slight insertions, deletions, or replacements), a rigid requirement for complete consistency is impractical and could cause high-quality predictions to be misjudged. Therefore, a relative threshold based on edit distance is used for the determination:
[0066] Accept when $d(\hat{P}, P) \le \alpha \cdot |\hat{P}|$ (3)
[0067] where $d(\hat{P}, P)$ denotes the edit distance between the predicted path $\hat{P}$ and the actual path $P$, $|\hat{P}|$ is the predicted path length, and the coefficient $\alpha$ is the relative tolerance rate, whose specific value is determined through a data-driven approach. Linking the threshold to the path length adapts it to the fault tolerance requirements of paths of different sizes: short paths get a smaller absolute tolerance, while long paths are allowed deviations that stay approximately constant in proportion, which avoids penalizing long sequences excessively or being too lenient with short sequences.
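The relative-threshold decision can be illustrated with a short sketch. The function name `accept_path`, the non-strict comparison, and the tolerance rate of 0.1 used in the example are assumptions; the text states only that the value is determined from data.

```python
def accept_path(distance: int, predicted_len: int, alpha: float) -> bool:
    """Accept when the edit distance stays within a tolerance that
    scales with the predicted path length."""
    if predicted_len < 0 or alpha < 0:
        raise ValueError("path length and tolerance rate must be non-negative")
    return distance <= alpha * predicted_len


# Assumed tolerance rate of 0.1: a 10-block path tolerates roughly one edit,
# while a 200-block path tolerates roughly twenty.
for distance, length in [(0, 10), (2, 10), (15, 200), (25, 200)]:
    print(distance, length, accept_path(distance, length, alpha=0.1))
```

With a fixed absolute threshold instead, the same 15 edits would be judged identically for a 10-block path and a 200-block path, which is the imbalance the relative form avoids.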
[0068] Step 107: The verifier records the valid status.
[0069] After confirming that the control flow path data returned by the prover has passed verification, the verifier immediately updates the device status record table it maintains, marks the operating status of the target device as "legitimate", and records the timestamp, authentication identifier and critical path characteristics of this successful verification. This status record will serve as the benchmark for subsequent periodic verification and provide data support for the security posture assessment of the device group.
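The status update can be sketched as a small record table keyed by device. Everything named here is hypothetical: the `StatusRecord` fields, the `record_valid` function, and in particular the choice of path length, number of distinct blocks, and entry and exit blocks as "critical path characteristics", which the text does not enumerate.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class StatusRecord:
    status: str
    verified_at: datetime
    auth_id: str
    path_features: dict


def record_valid(table: dict, device_id: str, auth_id: str, path: list) -> StatusRecord:
    """Mark a device as legitimate after a successful verification."""
    # Assumption: these summary features stand in for the unspecified
    # critical path characteristics.
    features = {
        "length": len(path),
        "distinct_blocks": len(set(path)),
        "entry_block": path[0] if path else None,
        "exit_block": path[-1] if path else None,
    }
    record = StatusRecord(
        status="legitimate",
        verified_at=datetime.now(timezone.utc),  # timezone-aware timestamp
        auth_id=auth_id,
        path_features=features,
    )
    table[device_id] = record  # overwrite the previous record for this device
    return record
```

Storing the latest record per device gives the next periodic verification a baseline to compare against.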
[0070] The runtime verification phase will periodically execute the above four steps.
[0071] Step 108: Generation of the anomaly diagnostic report
[0072] When the legality determination result in step 107 is invalid, the verifier automatically generates a structured anomaly diagnostic report based on the difference comparison analysis results. This report not only records the conclusion of "verification failure" but also provides in-depth diagnostic information, specifically including: the location of the specific basic block where the deviation occurred, and a step-by-step comparison between the complete expected path sequence and the actual reported path sequence. This report integrates structured predictions from the neural translation model and static constraint information from the program control flow graph, thus forming a highly interpretable security analysis result that can be directly used to guide vulnerability remediation, attack chain tracing, or security policy adjustments, achieving a closed loop from anomaly detection to root cause diagnosis.
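A structured report of this kind can be sketched as follows. The report layout, the function name `build_diagnostic_report`, and the representation of the control flow graph as a set of `(source, destination)` block pairs are assumptions for illustration. The comparison is done position by position, which is a simplification; an alignment derived from the edit distance computation could be substituted.

```python
from itertools import zip_longest


def build_diagnostic_report(expected: list, actual: list, cfg_edges: set) -> dict:
    """Compare the expected and reported paths step by step and flag
    transitions that the control flow graph does not allow."""
    comparison = []
    first_deviation = None
    # zip_longest pads the shorter path with None so extra blocks are reported too.
    for step, (exp_block, act_block) in enumerate(zip_longest(expected, actual)):
        match = exp_block == act_block
        if not match and first_deviation is None:
            first_deviation = {"step": step, "expected": exp_block, "actual": act_block}
        comparison.append(
            {"step": step, "expected": exp_block, "actual": act_block, "match": match}
        )
    # Consecutive block pairs in the reported path that are not edges of the graph.
    illegal_edges = [edge for edge in zip(actual, actual[1:]) if edge not in cfg_edges]
    return {
        "conclusion": "verification failure",
        "first_deviation": first_deviation,
        "illegal_edges": illegal_edges,
        "comparison": comparison,
    }
```

In this sketch, a reported path that deviates from the prediction while containing no illegal edges may point to a non-control data attack, since every transition is structurally valid but the path is unexpected for the given input. Illegal edges, by contrast, suggest that the control flow itself was altered.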
[0073] In summary, this invention constructs an integrated framework for remote control flow proof and anomaly diagnosis based on context-aware neural machine translation, achieving intelligent prediction, high-precision verification, and semantic-level anomaly diagnosis of program runtime behavior. By integrating coverage-oriented fuzz testing, neural sequence-to-sequence modeling, and control flow graph structure constraints, this invention can not only detect traditional control flow hijacking attacks but also identify non-control data attacks that do not alter the control flow structure, expanding the security coverage of dynamic proof. The system automatically locates discrepancies and generates highly interpretable anomaly diagnosis reports by comparing the expected path predicted by the neural model with the actual path reported by the device, forming a complete security closed loop from anomaly perception to root cause analysis. This scheme performs complex model inference at the verifier's end and requires only lightweight instrumentation at the device end, thus maintaining acceptable runtime overhead for IoT devices while preserving detection capability. It provides a learnable, interpretable, and scalable intelligent verification scheme for trusted software execution in resource-constrained environments. This invention achieves accurate modeling of program execution paths and demonstrates effective detection of anomalous paths triggered by malicious inputs that exploit real vulnerabilities, achieving a good balance between detection coverage and runtime overhead.
[0074] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A neural machine translation-based remote verification method for the Internet of Things (IoT) oriented towards data stream attacks, characterized in that a collaborative intelligent verification mechanism of "offline training and online verification" is adopted. This mechanism learns the semantic mapping between program input and execution path through a neural machine translation model, and combines this with the structural constraints of the program control flow graph to achieve accurate detection and anomaly diagnosis of control flow hijacking and non-control data attacks. Specifically, it is divided into an offline phase and a runtime verification phase. The offline phase is the phase in which the verifier and the prover initialize secure encrypted communication, train the neural machine translation model fused with a graph neural network, and complete the instrumentation of the device program and its deployment to the prover's end. The runtime verification phase is the phase in which the verifier checks the security status of the device: the device is remotely authenticated by the verifier based on a challenge-response model, and on this basis the verifier determines the security of the device.
2. The neural machine translation IoT remote verification method for data stream attacks as described in claim 1, characterized in that the offline phase specifically includes: Step 1-1: security credential pre-setting and communication initialization, which establishes a secure foundation for remote proof communication between the prover and the verifier; Step 1-2: construct the program instrumentation and control flow dataset; Step 1-3: based on the dataset, train the neural machine translation model to learn the mapping relationship from the program input sequence to the control flow execution path sequence.
3. The neural machine translation IoT remote verification method for data stream attacks according to claim 2, characterized in that Step 1-1 specifically involves the verifier and the prover negotiating a symmetric key k, which is used for integrity protection and anti-replay mechanisms in subsequent challenge-response interactions.
4. The neural machine translation IoT remote verification method for data stream attacks according to claim 2, characterized in that Step 1-2 specifically includes: First, lightweight instrumentation is performed on the target IoT program to enable it to capture the entry order of each basic block during execution. This includes: the verifier disassembles and parses the target program to identify the boundaries of all basic blocks, then iterates through the starting address of each basic block and inserts, before each starting address, a function call instruction that calls the control flow path recording function. This function takes the current basic block identifier as a parameter and, when executed, records that identifier in the specified memory space. The result is a new executable file that embeds control flow tracing logic. Subsequently, program execution is triggered with a large number of benign inputs, and <input sequence, actual execution path> sample pairs are systematically collected. During the path collection process, a coverage-oriented input generation mechanism is introduced to maximize the exploration depth and breadth of different execution branches of the program, enabling the neural machine translation model to learn the semantic correspondence between inputs and paths. In addition, random seeds and environmental variables that may affect the path are considered as part of the input.
5. A neural machine translation IoT remote verification method for data stream attacks according to claim 2, characterized in that Step 1-3 specifically includes: The neural machine translation model first treats the program input and execution path as source and target language sequences, respectively, and performs basic training in the form of sequence-to-sequence translation to enable the model to learn the semantic relationships between them. Subsequently, program structure constraints are introduced to adjust and optimize the model, that is, the last layer of the decoder is replaced with a graph neural network module. This module integrates the control flow graph embedding as prior knowledge into the attention mechanism, guiding the model to jointly model behavioral differences and path legality based on the input semantics and the structural rules defined by the control flow graph when generating paths.
6. The neural machine translation IoT remote verification method for data stream attacks according to claim 1, characterized in that the runtime verification phase includes: Step 2-1: The verifier initiates a remote proof challenge to the prover based on a preset verification strategy or risk event detection; Step 2-2: The prover extracts control flow data and generates a proof response; Step 2-3: The verifier verifies the status of the prover's device; Step 2-4: After confirming that the control flow path data returned by the prover has passed verification, the verifier immediately updates the device status record table it maintains, marks the operating status of the target device as legitimate, and records the timestamp, authentication identifier, and critical path characteristics of this successful verification.
7. A neural machine translation IoT remote verification method for data stream attacks as described in claim 6, characterized in that, The verification strategy described in step 2-1 includes a periodic verification mechanism. The risk events include the identification of suspicious external inputs. The verifier generates and issues a challenge request containing a random number, which serves as the unique credential for this session, requiring the prover to return the control flow path data it records.
8. A neural machine translation IoT remote verification method for data stream attacks according to claim 6, characterized in that Step 2-2 specifically includes: In the normal operating state where the verifier does not initiate a remote proof challenge, the prover device continues to execute its daily task program. During this period, the static instrumentation code embedded in the program runs synchronously, recording the program's control flow path. When the verifier subsequently initiates a proof request, the prover submits the program input executed by the target device and its corresponding control flow path for the verifier to perform integrity verification and behavior analysis. Once the prover receives a challenge from the verifier through the ATTEST agent, the device immediately enters the prover mode. In this mode, it accesses the memory space that records control flow data, extracts the control flow data, and constructs a response message according to a predefined format. This response message is then sent to the verifier for control flow validity verification.
9. A neural machine translation IoT remote verification method for data stream attacks according to claim 8, characterized in that the predefined format consists of: the random value sent by the verifier; the input sequence and its corresponding control flow path sequence most recently executed by the target program in this challenge; and the authentication code generated using the key.
10. A neural machine translation-based remote verification method for IoT applications resistant to data stream attacks, as described in claim 1, characterized in that, in step 2-3, after receiving the response, the verifier uses the trained neural machine translation model to calculate the predicted execution path of the received input and measures the difference between the predicted path and the actual path. If the difference between the two exceeds a preset threshold, it indicates that the current input has not triggered execution behavior consistent with the normal semantics learned by the model; the input is then judged to be a potential attack and a security alert is triggered.