An Internet of Things malicious code detection system based on a graph neural network
By constructing a multi-relational code attribute graph and a hierarchical graph attention network and combining them with edge-cloud collaborative deployment, the invention addresses two problems in malicious code detection on IoT devices: the lack of a unified cross-architecture code representation and the limited computing resources of the devices. It achieves efficient identification and accurate classification of malicious code.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- NANJING FORESTRY UNIV
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-12

Figure CN122197015A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of IoT security and deep learning, and specifically to an IoT malware detection system based on graph neural networks.
Background Technology
[0002] With the rapid development of IoT technology, a large number of IoT devices are being deployed in smart homes, industrial control, smart cities, and other scenarios. IoT devices are diverse, including routers, network cameras, smart home devices, and industrial sensors, and their operating systems are typically embedded Linux systems, with firmware stored in binary form. However, IoT devices often have limited computing resources and weak security capabilities, making them a primary target for malicious code attacks. Malicious code in the IoT field can infect a large number of devices in a short period, building botnets to launch large-scale distributed denial-of-service attacks, posing a serious threat to network security.
[0003] Current IoT malware detection technologies can be broadly categorized as follows. Signature-based detection methods extract signatures from known malware and match them against the target code; they are fast but have limited ability to detect unknown malware and variants. Static analysis-based methods extract program behavior features through disassembly, control flow analysis, and data flow analysis of binary code, but their effectiveness drops when the code is obfuscated or packed. Machine learning-based methods perform classification on opcode frequency statistics and call sequence analysis, but their features are manually designed and struggle to capture the deep semantics of the code. Deep learning-based methods automatically learn code feature representations with convolutional and recurrent neural networks, but typically treat code as a linear sequence and ignore its graph structure. Graph neural network-based security detection methods are mainly used for anomaly detection and intrusion detection, modeling network communication relationships as graphs to identify malicious behavior.
[0004] Among existing technologies, static analysis methods for embedded firmware represent code with abstract syntax trees and control flow graphs and then detect vulnerabilities through data flow analysis and taint analysis. However, these methods only analyze source code in specific languages, whereas IoT malware typically spreads in compiled binary form, so the source code is not available. Moreover, these methods perform control flow analysis and data flow analysis as independent steps instead of integrating control flow relationships, data dependencies, and function call relationships into one unified graph for holistic modeling, and therefore cannot capture the multi-dimensional relationships between code elements. Existing graph neural network-based malicious traffic detection methods model network devices and traffic packets as graphs; they operate at the network traffic level and cannot examine the internal structure of code. Existing knowledge graph-based network attack early warning methods rely heavily on manually designed rule templates and expert experience and cannot automatically identify and learn new malware and unknown attack patterns. Finally, none of these solutions provides a unified cross-architecture code representation that addresses the heterogeneity of IoT devices, none considers the limited computing resources of IoT devices, and none offers lightweight deployment or an edge-cloud collaborative detection scheme.
Summary of the Invention
[0005] The purpose of this invention is to provide an Internet of Things (IoT) malware detection system based on graph neural networks.
[0006] To achieve the above objectives, the present invention provides the following technical solution: an IoT malware detection system based on graph neural networks, comprising:
[0007] The firmware acquisition and preprocessing module is used to acquire firmware samples of IoT devices, unpack the firmware and extract executable files, select the corresponding disassembler engine according to the target architecture type of the executable file for disassembly processing, and convert the assembly instructions of different architectures into architecture-independent intermediate representations.
[0008] The code attribute graph construction module is used to construct a control flow graph, a data dependency graph, and a function call graph based on the intermediate representation, and merge the node sets and edge sets of the three graphs to obtain a multi-relation code attribute graph that integrates control flow relationships, data dependency relationships, and function call relationships, and initialize the features of the nodes in the multi-relation code attribute graph.
[0009] The hierarchical graph attention feature learning module is used to perform multi-granularity feature learning on the multi-relation code attribute graph at the instruction, basic block, and function levels using a hierarchical graph attention network, and to generate the global embedding vector of each code sample through the global graph readout layer;
[0010] The malicious code detection and classification module is used to perform binary classification detection of malicious code based on the global embedding vector, and to perform multi-class family identification on samples determined to be malicious code.
[0011] The edge-cloud collaborative deployment module is used to deploy a lightweight detection model at the edge for initial screening, deploy a complete detection model in the cloud for accurate detection and family identification, and periodically send the updated lightweight model parameters from the cloud to the edge.
[0012] As a further aspect of the present invention: in the firmware acquisition and preprocessing module, the intermediate representation adopts a three-address code format, and each intermediate representation instruction consists of three elements: operation type, operand, and target register; the instruction categories of the intermediate representation are uniformly divided into seven categories: arithmetic operation instructions, logical operation instructions, data transfer instructions, conditional jump instructions, unconditional jump instructions, function call instructions, and system call instructions.
[0013] As a further aspect of the present invention: the process by which the code attribute graph construction module constructs the multi-relationship code attribute graph includes:
[0014] Function entry points are identified in the intermediate representation code. Using the target address of each jump instruction and the address of the instruction immediately following each jump instruction as dividing points, the consecutively executed instruction sequences are divided into basic blocks. Each basic block is a node in the code attribute graph;
[0015] Traverse all basic blocks, determine the successor basic block based on the type of the instruction at the end of each basic block, and add control flow type edges between basic block nodes that have control transfer relationships;
[0016] Perform definition-use analysis on the intermediate representation instructions within each function: when a first instruction defines a variable, a second instruction uses that variable, and the variable is not redefined between them, add an edge of data dependency type between the basic block node containing the first instruction and the basic block node containing the second instruction;
[0017] Identify the function call instructions among the intermediate representation instructions, and add an edge of call type between the basic block node that contains the call instruction in the caller function and the entry basic block node of the called function;
[0018] The node sets and edge sets corresponding to the control flow type edges, data dependency type edges, and call type edges are merged to form the multi-relation code attribute graph.
[0019] As a further aspect of the present invention: the process of initializing the features of graph nodes in the code attribute graph construction module includes:
[0020] For each intermediate representation instruction within a basic block, one-hot encoding is performed according to the instruction operation type to obtain an operation type vector. Registers and immediate values in the operands are normalized and encoded to obtain an operand feature vector. The operation type vector and the operand feature vector are concatenated to form the feature vector of a single instruction.
[0021] For a basic block containing multiple instructions, the feature vectors of all instructions in the basic block are aggregated into an initial feature vector at the basic block level through an average pooling operation.
[0022] As a further aspect of the present invention: the hierarchical graph attention feature learning module includes an instruction-level graph attention layer, a basic block-level multi-relation graph attention layer, a function-level graph attention layer, and a global graph readout layer connected in sequence;
[0023] The instruction-level graph attention layer is used to apply linear transformations to the features of each intermediate representation instruction within the basic block to obtain query vectors, key vectors, and value vectors. Attention scores between instructions are calculated by scaled dot products and normalized to obtain attention weights. The value vectors are aggregated with the attention weights and passed through a nonlinear activation to obtain the enhanced feature representation of the basic block;
[0024] The basic block-level multi-relation graph attention layer is used to calculate relation-specific attention weights for the neighbor nodes corresponding to control flow edges, data dependency edges, and call edges, respectively, and to perform weighted aggregation of the features of various neighbor nodes to obtain aggregated features of each relation type. The aggregated features of each relation type are weighted and fused by learnable relation fusion weights and superimposed with residual connections to obtain updated basic block node features.
[0025] The function-level graph attention layer is used to calculate the contribution weight of each basic block node within the same function to the overall semantics of the function, and the basic block features are weighted and summed using the contribution weights to obtain the function-level feature representation;
[0026] The global graph readout layer is used to calculate the contribution weight of each function to the overall code semantics, and the contribution weight is used to perform a weighted summation of the function-level features to obtain the global embedding vector of the code sample.
[0027] As a further aspect of the present invention: in the basic block-level multi-relation graph attention layer, for each relation type, the node features are linearly transformed by a relation-specific learnable transformation matrix, and the transformed center node features are concatenated with the transformed neighbor node features. The attention score is obtained by taking the dot product of the concatenated vector with a learnable attention vector and applying a leaky rectified linear (LeakyReLU) activation. The learnable relation fusion weights are obtained by softmax normalization, so the fusion weights of the three relation types sum to 1.
[0028] As a further aspect of the present invention: in the malicious code detection and classification module:
[0029] The binary classification detection is achieved through two fully connected layers and a sigmoid output layer, which maps the global embedding vector to a malicious probability value. When the malicious probability value is greater than 0.5, it is determined to be malicious code.
[0030] The multi-class family identification is achieved through two fully connected layers and a softmax output layer. The global embedding vector of the sample determined to be malicious code is mapped to the probability distribution of each malicious code family, and the family category to which the malicious code belongs is determined based on the probability distribution.
[0031] The overall loss function during the training phase is a weighted combination of binary classification loss and multi-class classification loss, with a weight coefficient of 0.4 for binary classification loss and a weight coefficient of 0.6 for multi-class classification loss.
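The two output heads and the weighted training loss can be sketched in NumPy as below. The 0.5 decision threshold and the 0.4/0.6 loss weights come from the description; the ReLU between the two fully connected layers, the function names, and computing the family term only for malicious samples are assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def two_layer_head(h, W1, b1, W2, b2):
    """Two fully connected layers (the ReLU between them is an assumption)."""
    return W2 @ relu(W1 @ h + b1) + b2

def classify(h_G, bin_params, fam_params, threshold=0.5):
    """Return (malicious probability, family index or None)."""
    p_mal = 1.0 / (1.0 + np.exp(-two_layer_head(h_G, *bin_params)[0]))  # sigmoid output
    if p_mal <= threshold:
        return p_mal, None                      # benign: no family identification
    logits = two_layer_head(h_G, *fam_params)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over malware families
    return p_mal, int(np.argmax(probs))

def joint_loss(p_mal, y_mal, family_probs, y_family, w_bin=0.4, w_fam=0.6):
    """0.4 * binary cross-entropy + 0.6 * family cross-entropy.
    Assumption: the family term is only computed for malicious samples (y_mal == 1)."""
    eps = 1e-12
    l_bin = -(y_mal * np.log(p_mal + eps) + (1 - y_mal) * np.log(1 - p_mal + eps))
    l_fam = -np.log(family_probs[y_family] + eps) if y_mal == 1 else 0.0
    return w_bin * l_bin + w_fam * l_fam
```

Gating the family head on the binary decision mirrors the description, where family identification is applied only to samples already judged malicious.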
[0032] As a further aspect of the present invention: in the edge-cloud collaborative deployment module:
[0033] The lightweight detection model deployed at the edge only constructs a control flow graph and uses a single-layer graph attention network and global average pooling for feature extraction;
[0034] At the edge, a first confidence threshold of 0.2 and a second confidence threshold of 0.8 are set. When the malicious probability output by the edge model is less than the first confidence threshold, it is judged as normal code and allowed to pass. When the malicious probability is greater than the second confidence threshold, it is initially judged as malicious code and uploaded to the cloud for confirmation. When the malicious probability is not less than the first confidence threshold and not greater than the second confidence threshold, it is marked as an uncertain sample and uploaded to the cloud for precise detection.
[0035] The cloud periodically uses newly collected malicious code samples to incrementally train and update the complete model. Knowledge distillation is used to transfer the knowledge of the complete model to the lightweight model, and the updated lightweight model parameters are then sent to the edge.
[0036] Compared with the prior art, the beneficial effects of the present invention by adopting the above technical solution are as follows:
[0037] 1. This invention constructs a multi-relation code attribute graph that integrates control flow relationships, data dependency relationships, and function call relationships, providing rich multi-relation graph inputs for graph neural networks. This enables comprehensive capture of the structural and semantic features of malicious code and overcomes the limitation of existing static analysis methods that rely on a single code representation.
[0038] 2. This invention designs a hierarchical graph attention network spanning the instruction, basic block, and function levels. It automatically learns structural patterns at different levels of the code, effectively identifies the functional modules and attack logic of malicious code, and improves the detection of new malware variants.
[0039] 3. This invention eliminates feature shifts caused by architecture differences by uniformly converting binary codes of different architectures into architecture-independent intermediate representations, thereby improving the cross-architecture generalization ability of the detection model in the multi-architecture environment of the Internet of Things.
[0040] 4. This invention designs a lightweight edge-cloud collaborative detection framework. It deploys a lightweight feature extraction model at the edge and a complete detection model in the cloud, achieving a good balance between detection accuracy and computational efficiency and meeting the deployment requirements of resource-constrained IoT scenarios.
Attached Figure Description
[0041] Figure 1 is a schematic diagram of the overall system in an embodiment of the present invention;
[0042] Figure 2 is a schematic diagram of multi-architecture disassembly and intermediate representation conversion in an embodiment of the present invention;
[0043] Figure 3 is a schematic diagram of the edge-cloud collaborative deployment architecture in an embodiment of the present invention.
Detailed Implementation
[0044] The specific embodiments of the present invention will be further described below with reference to the accompanying drawings. It should be noted that the description of these embodiments is for the purpose of helping to understand the present invention, but does not constitute a limitation of the present invention.
[0045] Furthermore, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
[0046] The overall architecture comprises five core modules: firmware acquisition and preprocessing module, code attribute graph construction module, hierarchical graph attention feature learning module, malware detection and classification module, and edge-cloud collaborative deployment module. These modules are interconnected, and the edge-cloud collaborative deployment module interacts with other modules through a collaborative mechanism.
[0047] The complete workflow of the system is described below, module by module.
[0048] I. Firmware Acquisition and Preprocessing Module
[0049] The firmware acquisition and preprocessing module is responsible for collecting firmware samples from IoT devices, unpacking and disassembling them, and converting the disassembled code into an intermediate representation to obtain a unified code representation.
[0050] The firmware unpacking and executable file extraction process is as follows: First, the firmware image file to be tested is collected from the firmware update server and internal storage of the IoT device. Then, the firmware file is automatically unpacked, the file system type is identified, and the complete file system directory structure is extracted. Next, ELF format executable files and shared library files are identified and extracted from the unpacked file system. By parsing the magic number field and ELF header information in the file header, the target processor architecture, byte order, and bit width of the file are determined.
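As a minimal sketch of the header check described above, the function below reads the ELF identification bytes and the `e_machine` field to recover the target architecture, byte order, and bit width. The field offsets and machine codes (3, 8, 40, 62, 183) come from the ELF specification; the function name `parse_elf_ident`, the returned dictionary layout, and the small set of mapped architectures are illustrative assumptions.

```python
import struct

# e_machine values from the ELF specification (only a few architectures shown).
EM_NAMES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def parse_elf_ident(data: bytes) -> dict:
    """Return architecture, byte order and bit width from an ELF file header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":       # magic number check
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]               # EI_CLASS, EI_DATA
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # e_machine is a 16-bit field at offset 18, stored in the file's own byte order.
    (e_machine,) = struct.unpack_from(endian + "H", data, 18)
    return {
        "arch": EM_NAMES.get(e_machine, f"other({e_machine})"),
        "endianness": "little" if ei_data == 1 else "big",
        "bits": 32 if ei_class == 1 else 64,
    }

# Usage: a synthetic 32-bit big-endian MIPS header (e_type = 2, e_machine = 8).
header = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9) + struct.pack(">HH", 2, 8)
info = parse_elf_ident(header)
```

Reading `e_machine` with the byte order taken from `EI_DATA` matters for IoT firmware, where big-endian MIPS and little-endian ARM binaries coexist.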
[0051] The multi-architecture disassembly and intermediate representation conversion process selects the appropriate disassembly engine to disassemble the executable file based on the target architecture type identified in the ELF file header. For ARM architecture, the ARM disassembly engine is used; for MIPS architecture, the MIPS disassembly engine is used; for x86 architecture, the x86 disassembly engine is used; and for other architectures, a general-purpose disassembly engine is used. The disassembly process employs a recursive descent disassembly strategy, recursively traversing all reachable code paths starting from the function entry point to generate a complete sequence of assembly instructions.
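The recursive descent strategy can be illustrated independently of any particular disassembly engine. The sketch below assumes a hypothetical `decode(addr)` callback that returns an already-decoded instruction (size, kind, static target) or `None`; in a real system that role would be filled by the architecture-specific engine. It follows jump and call targets through a worklist and stops a path after unconditional jumps and returns, so bytes that are never reached (for example, embedded data) are not disassembled. The `RET` kind is an added assumption, since the seven IR categories do not list returns explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass(frozen=True)
class Insn:
    addr: int
    size: int
    kind: str                      # e.g. "MOV", "JCC", "JMP", "CALL", "RET"
    target: Optional[int] = None   # branch or call target, if statically known

def recursive_descent(decode: Callable[[int], Optional[Insn]], entry: int) -> Dict[int, Insn]:
    """Follow control flow from `entry`, decoding only reachable instructions."""
    seen: Dict[int, Insn] = {}
    worklist: List[int] = [entry]
    while worklist:
        addr = worklist.pop()
        while addr not in seen:
            insn = decode(addr)
            if insn is None:                 # undecodable or out of range: end this path
                break
            seen[addr] = insn
            if insn.kind in ("JCC", "JMP", "CALL") and insn.target is not None:
                worklist.append(insn.target)
            if insn.kind in ("JMP", "RET"):
                break                        # no fall-through after these
            addr += insn.size                # JCC and CALL also fall through
    return dict(sorted(seen.items()))

# Usage with a toy, pre-decoded program; 0x30 is never reached.
program = {a: Insn(a, 4, k, t) for a, k, t in [
    (0x00, "MOV", None), (0x04, "JCC", 0x10), (0x08, "CALL", 0x20), (0x0C, "JMP", 0x14),
    (0x10, "ADD", None), (0x14, "RET", None), (0x20, "SUB", None), (0x24, "RET", None),
    (0x30, "XOR", None)]}
code = recursive_descent(program.get, 0x00)
```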
[0052] This invention uniformly converts assembly instructions from different architectures into an architecture-independent intermediate representation. The intermediate representation designed in this invention adopts a three-address code format, with each instruction consisting of three elements: operation type, operands, and destination register. The intermediate representation instructions are uniformly classified into the following seven categories: arithmetic operation instructions (e.g., ADD, SUB, MUL, DIV), logical operation instructions (e.g., AND, OR, XOR, NOT), data transfer instructions (e.g., MOV, LOAD, STORE), conditional jump instructions (e.g., JCC), unconditional jump instructions (e.g., JMP), function call instructions (e.g., CALL), and system call instructions (e.g., SYSCALL). This unified conversion eliminates the feature shift caused by differences between the instruction sets of different processor architectures.
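One possible shape for this intermediate representation is sketched below: a three-address record (operation type, operands, destination register) plus a table that maps IR operations to the seven categories. The per-architecture lifting tables are hypothetical and deliberately tiny; a real lifter would also normalize register names, handle memory operands (an x86 `mov` from memory behaves like a load, for example), and cover far more mnemonics.

```python
from enum import Enum
from typing import NamedTuple, Optional, Tuple

class IRCategory(Enum):
    ARITHMETIC = "arithmetic"
    LOGICAL = "logical"
    DATA_TRANSFER = "data transfer"
    COND_JUMP = "conditional jump"
    UNCOND_JUMP = "unconditional jump"
    CALL = "function call"
    SYSCALL = "system call"

# IR operations listed in the description, grouped into the seven categories.
IR_CATEGORY = {
    **dict.fromkeys(("ADD", "SUB", "MUL", "DIV"), IRCategory.ARITHMETIC),
    **dict.fromkeys(("AND", "OR", "XOR", "NOT"), IRCategory.LOGICAL),
    **dict.fromkeys(("MOV", "LOAD", "STORE"), IRCategory.DATA_TRANSFER),
    "JCC": IRCategory.COND_JUMP,
    "JMP": IRCategory.UNCOND_JUMP,
    "CALL": IRCategory.CALL,
    "SYSCALL": IRCategory.SYSCALL,
}

# Hypothetical lifting tables: native mnemonic -> IR operation.
LIFT_TABLE = {
    "arm":  {"add": "ADD", "sub": "SUB", "eor": "XOR", "mov": "MOV", "ldr": "LOAD",
             "str": "STORE", "b": "JMP", "beq": "JCC", "bl": "CALL", "svc": "SYSCALL"},
    "mips": {"addu": "ADD", "subu": "SUB", "xor": "XOR", "move": "MOV", "lw": "LOAD",
             "sw": "STORE", "j": "JMP", "beq": "JCC", "jal": "CALL", "syscall": "SYSCALL"},
    "x86":  {"add": "ADD", "sub": "SUB", "xor": "XOR", "mov": "MOV", "jmp": "JMP",
             "je": "JCC", "jne": "JCC", "call": "CALL", "syscall": "SYSCALL"},
}

class IRInsn(NamedTuple):
    op: str                       # operation type
    operands: Tuple[str, ...]     # source operands (registers / immediates)
    dest: Optional[str]           # destination register, if any

def lift(arch: str, mnemonic: str, operands: Tuple[str, ...], dest: Optional[str] = None) -> IRInsn:
    """Translate one native instruction into the architecture-independent IR."""
    return IRInsn(LIFT_TABLE[arch][mnemonic.lower()], operands, dest)

def category(insn: IRInsn) -> IRCategory:
    return IR_CATEGORY[insn.op]
```

With such a table, an ARM `EOR` and a MIPS `xor` both become the IR operation `XOR`, which is the property that lets one model see the same features across architectures.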
[0053] II. Code Attribute Graph Construction Module
[0054] The code attribute graph construction module transforms the preprocessed intermediate code representation into a code attribute graph that integrates multi-dimensional relationships. The code attribute graph proposed in this invention is a multi-relational directed graph that integrates control flow relationships, data dependency relationships, and function call relationships. It is formally defined as $G = (V, E, R, X)$, where $V$ is the set of nodes, $E$ is the set of edges, $R$ is the set of edge relation types, and $X$ is the node feature matrix. The set of edge relation types is $R = \{r_{cf}, r_{dd}, r_{call}\}$, where $r_{cf}$ denotes control flow relationships, $r_{dd}$ denotes data dependency relationships, and $r_{call}$ denotes call relationships.
[0055] The detailed construction of a code property graph includes the following steps:
[0056] Function identification and basic block partitioning: Function entry points are identified in the intermediate representation code. Starting from each function entry address, a recursive descent method is used to traverse all instructions within the function body. During the traversal, the target address of a jump instruction and the address of the instruction following the jump instruction are used as dividing points to partition the continuously executed instruction sequence into basic blocks. Each basic block serves as a node in the code attribute graph, recording its contained instruction sequence, start address, and end address.
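The partitioning rule above (a new block starts at the function entry, at every jump target, and at the instruction right after a jump) can be sketched as follows. Instructions are assumed to be `(address, IR op, target)` tuples for one function, sorted by address; this tuple layout is an illustrative assumption.

```python
from typing import List, Optional, Tuple

# (address, IR operation, jump target or None) for one function, sorted by address
Insn = Tuple[int, str, Optional[int]]

JUMP_OPS = {"JCC", "JMP"}

def split_basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Cut a function's instruction stream into basic blocks at leader addresses."""
    leaders = {insns[0][0]}                        # the function entry starts a block
    for i, (_, op, target) in enumerate(insns):
        if op in JUMP_OPS:
            if target is not None:
                leaders.add(target)                # a jump target starts a block
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])       # so does the instruction after a jump
    blocks: List[List[Insn]] = []
    for insn in insns:
        if insn[0] in leaders or not blocks:
            blocks.append([])                      # open a new block at each leader
        blocks[-1].append(insn)
    return blocks

func = [(0x0, "MOV", None), (0x4, "JCC", 0x10), (0x8, "ADD", None),
        (0xC, "JMP", 0x14), (0x10, "SUB", None), (0x14, "RET", None)]
blocks = split_basic_blocks(func)
```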
[0057] Control flow graph construction: Traverse all basic blocks and determine the successor basic blocks from the type of the last instruction of each basic block. For an unconditional jump instruction, add a control flow edge between the current basic block and the target basic block; for a conditional jump instruction, add control flow edges from the current basic block to both the target basic block and the basic block containing the next instruction; for sequentially executed basic blocks, add a control flow edge between the current basic block and the next basic block. The relation type of all control flow edges is labeled $r_{cf}$ (control flow).
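A minimal sketch of these successor rules follows, assuming each basic block has already been summarized as its start address, the IR operation of its last instruction, and its jump target (a hypothetical summary format). Treating `RET` as having no intra-procedural successor is an added assumption, since the description only covers jumps and sequential execution.

```python
from typing import List, Optional, Tuple

# One basic block summarized as (start address, IR op of its last instruction, jump target)
BlockInfo = Tuple[int, str, Optional[int]]

def control_flow_edges(blocks: List[BlockInfo]) -> List[Tuple[int, int, str]]:
    starts = [start for start, _, _ in blocks]
    edges = []
    for i, (start, last_op, target) in enumerate(blocks):
        fall = starts[i + 1] if i + 1 < len(blocks) else None
        if last_op == "JMP":
            succs = [target]              # unconditional: jump target only
        elif last_op == "JCC":
            succs = [target, fall]        # conditional: taken and not-taken paths
        elif last_op == "RET":
            succs = []                    # assumption: a return ends intra-procedural flow
        else:
            succs = [fall]                # sequential execution into the next block
        for succ in succs:
            if succ in starts:            # ignore targets outside this function
                edges.append((start, succ, "cf"))
    return edges

blocks = [(0x0, "JCC", 0x10), (0x8, "JMP", 0x14), (0x10, "SUB", None), (0x14, "RET", None)]
edges = control_flow_edges(blocks)
```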
[0058] Data dependency graph construction: Perform definition-use analysis on the intermediate representation instructions within each function. For each instruction, extract the variables it defines and the variables it uses. If instruction $i_a$ defines variable $t$, instruction $i_b$ uses $t$, and $t$ is not redefined on the path from $i_a$ to $i_b$, a data dependency edge is added between the basic block containing $i_a$ and the basic block containing $i_b$, with its relation type labeled $r_{dd}$ (data dependency). Here $i_a$ and $i_b$ denote the instruction that defines the variable and the instruction that uses it, respectively. The variable being analyzed can be a register or a memory location.
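The "no redefinition in between" condition is what a classic reaching-definitions analysis computes. The sketch below runs the standard iterative reaching-definitions dataflow over one function's basic blocks and then emits a block-level data dependency edge for every definition that reaches a use. Variables are plain strings standing for registers or memory locations; skipping dependencies whose definition lies in the same block as the use (which would otherwise become self-loops) is an assumption not stated in the description.

```python
from typing import Dict, List, Set, Tuple

Insn = Tuple[Set[str], Set[str]]      # (variables defined, variables used)
Definition = Tuple[str, int, str]     # (block, instruction index, variable)

def data_dependency_edges(blocks: Dict[str, List[Insn]],
                          succ: Dict[str, List[str]]) -> Set[Tuple[str, str, str]]:
    # GEN: last definition of each variable in the block; KILL: variables it defines.
    gen: Dict[str, Set[Definition]] = {}
    kill: Dict[str, Set[str]] = {}
    for b, insns in blocks.items():
        last: Dict[str, Definition] = {}
        for i, (defs, _) in enumerate(insns):
            for var in defs:
                last[var] = (b, i, var)
        gen[b], kill[b] = set(last.values()), set(last)
    pred: Dict[str, List[str]] = {b: [] for b in blocks}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)

    # Iterate OUT[b] = GEN[b] | (IN[b] - KILL[b]) until a fixed point is reached.
    out = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_b = set().union(*(out[p] for p in pred[b]))
            new_out = gen[b] | {d for d in in_b if d[2] not in kill[b]}
            if new_out != out[b]:
                out[b], changed = new_out, True

    # Walk each block: a use depends on every reaching definition of that variable.
    edges = set()
    for b, insns in blocks.items():
        reaching = set().union(*(out[p] for p in pred[b]))
        for i, (defs, uses) in enumerate(insns):
            for src_block, _, var in reaching:
                if var in uses and src_block != b:   # assumption: no self-loops
                    edges.add((src_block, b, "dd"))
            reaching = {d for d in reaching if d[2] not in defs} | {(b, i, v) for v in defs}
    return edges

# B0: r1 = ...; r2 = ...   B1: r1 = r2 + 1   B2: uses r1   (B0 -> B1 -> B2)
blocks = {"B0": [({"r1"}, set()), ({"r2"}, set())],
          "B1": [({"r1"}, {"r2"})],
          "B2": [(set(), {"r1"})]}
succ = {"B0": ["B1"], "B1": ["B2"], "B2": []}
edges = data_dependency_edges(blocks, succ)
```

In the example, B1 redefines `r1`, so B0's definition of `r1` no longer reaches B2 and no B0 to B2 edge is produced; only B0 to B1 (through `r2`) and B1 to B2 (through `r1`) remain.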
[0059] Function call graph construction: Traverse all intermediate representation instructions and identify function call instructions. For each function call instruction, determine the basic block node in the caller function that contains the call and the entry basic block node of the called function, and add a call edge between them, with its relation type labeled $r_{call}$ (call).
[0060] Code attribute graph fusion: The node sets and edge sets of the above three graphs are merged to obtain a code attribute graph $G = (V, E, R, X)$ that integrates multi-dimensional relationships, where $V$ is the union of all basic block nodes and $E = E_{cf} \cup E_{dd} \cup E_{call}$ is the union of all control flow edges, data dependency edges, and call edges.
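The call-edge and fusion steps can be sketched together. Here `call_sites` pairs a call-site block with a callee name and `entry_block` maps function names to their entry blocks; both structures, and skipping calls to functions with no entry block in the sample (such as imported library functions), are assumptions. The fused graph keeps one list of node-index pairs per relation type, a layout commonly used as input to relational graph neural networks.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]   # (source block, destination block, relation type)

def call_edges(call_sites: Iterable[Tuple[str, str]], entry_block: Dict[str, str]) -> List[Edge]:
    """Link each call-site block to the entry block of the called function."""
    return [(blk, entry_block[callee], "call")
            for blk, callee in call_sites
            if callee in entry_block]        # assumption: external callees are skipped

def fuse(blocks: Iterable[str], *edge_sets: Iterable[Edge]):
    """Merge node and edge sets into one multi-relational graph (pairs per relation)."""
    index = {b: i for i, b in enumerate(sorted(set(blocks)))}
    by_rel: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for edges in edge_sets:
        for src, dst, rel in edges:
            by_rel[rel].append((index[src], index[dst]))
    return index, {rel: sorted(set(pairs)) for rel, pairs in by_rel.items()}

blocks = ["f.B0", "f.B1", "g.B0"]
cf = [("f.B0", "f.B1", "cf")]
dd = [("f.B0", "f.B1", "dd")]
calls = call_edges([("f.B1", "g"), ("f.B1", "printf")], {"g": "g.B0"})
index, graph = fuse(blocks, cf, dd, calls)
```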
[0061] Graph node feature initialization: Feature encoding is performed on each basic block node in the code attribute graph. For each intermediate instruction within a basic block, one-hot encoding is first performed on the instruction operation type to obtain an operation type vector. Then, the registers and immediate values in the instruction operands are normalized and encoded to obtain an operand feature vector. The operation type vector and the operand feature vector are concatenated to form the feature vector of a single instruction. For a basic block containing multiple instructions, the feature vectors of all its instructions are aggregated into an initial basic block-level feature vector through average pooling: $h_v^{(0)} = \frac{1}{n_v}\sum_{k=1}^{n_v} x_k \in \mathbb{R}^{d}$, where $d$ is the feature dimension, $n_v$ is the number of instructions in basic block $v$, $x_k$ is the feature vector of the $k$-th instruction, $h_v^{(0)}$ is the initial feature vector of basic block node $v$, and $\mathbb{R}^{d}$ denotes the $d$-dimensional real space.
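The one-hot plus operand encoding and the average pooling can be sketched with NumPy. The description does not specify how registers and immediates are normalized, so the three operand features below (mean register index scaled to [0, 1], register count divided by an assumed maximum of three operands, and log-scaled immediate magnitude) and the constants `NUM_REGS` and `IMM_SCALE` are hypothetical choices.

```python
import numpy as np

# The seven IR categories in a fixed order for one-hot encoding.
CATEGORIES = ["arithmetic", "logical", "transfer", "cond_jump", "jump", "call", "syscall"]
NUM_REGS = 32                      # hypothetical size of the IR register file
IMM_SCALE = np.log1p(2.0 ** 32)    # hypothetical cap for 32-bit immediates

def instruction_features(category, reg_ids, immediates):
    """One-hot operation type concatenated with a small normalized operand vector."""
    op_vec = np.zeros(len(CATEGORIES))
    op_vec[CATEGORIES.index(category)] = 1.0
    # Hypothetical operand encoding: scaled mean register index, register count,
    # and log-scaled immediate magnitude.
    reg = np.mean(reg_ids) / (NUM_REGS - 1) if reg_ids else 0.0
    imm = np.mean([np.log1p(abs(v)) for v in immediates]) / IMM_SCALE if immediates else 0.0
    operand_vec = np.array([reg, len(reg_ids) / 3.0, imm])
    return np.concatenate([op_vec, operand_vec])

def block_initial_features(insns):
    """Average-pool instruction vectors into the block's initial feature h_v^(0)."""
    return np.mean([instruction_features(*insn) for insn in insns], axis=0)

h0 = block_initial_features([("arithmetic", [1, 2], []), ("call", [], [0x400])])
```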
[0062] III. Hierarchical Graph Attention Feature Learning Module
[0063] The hierarchical graph attention feature learning module uses the hierarchical graph attention network (HGAT) designed in this invention to learn features from the code attribute graph. HGAT consists of three graph attention layers, which perform feature learning and aggregation at the instruction level, basic block level, and function level, respectively, followed by a global graph readout layer.
[0064] Instruction-level graph attention layer
[0065] The instruction-level graph attention layer performs attention-weighted aggregation on the intermediate representation instructions within the basic block, learning the enhanced feature representation of the basic block. The specific implementation process is as follows:
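[0066] For a basic block $b$ with instruction sequence $\{i_1, i_2, \dots, i_n\}$, the initial feature vector of each instruction $i_k$ is $x_k$. First, a linear transformation is applied to the features of each instruction: for each instruction $i_k$, compute its query vector $q_k$, key vector $k_k$, and value vector $v_k$:
[0067] $q_k = W_Q x_k,\quad k_k = W_K x_k,\quad v_k = W_V x_k$;
[0068] where $W_Q \in \mathbb{R}^{d' \times d}$ is the learnable query transformation matrix, $W_K \in \mathbb{R}^{d' \times d}$ is the learnable key transformation matrix, $W_V \in \mathbb{R}^{d' \times d}$ is the learnable value transformation matrix, $d'$ is the transformed feature dimension, and $x_k$ is the initial feature vector of instruction $i_k$.
[0069] Compute the attention score $e_{jk}$ between instruction $i_j$ and instruction $i_k$:
[0070] $e_{jk} = \dfrac{q_j^{\top} k_k}{\sqrt{d'}}$;
[0071] where $q_j^{\top}$ is the transpose of the query vector of instruction $i_j$, $k_k$ is the key vector of instruction $i_k$, and $\sqrt{d'}$ is a scaling factor that keeps large dot products from saturating the softmax and causing vanishing gradients.
[0072] Normalize the attention scores to obtain the attention weights $\alpha_{jk}$:
[0073] $\alpha_{jk} = \dfrac{\exp(e_{jk})}{\sum_{m=1}^{n} \exp(e_{jm})}$;
[0074] where $\exp(\cdot)$ is the exponential function, the denominator sums the exponentiated attention scores over all instructions within the basic block to achieve normalization, and $n$ is the number of instructions within the basic block.
[0075] The value vectors are aggregated with the attention weights to obtain the enhanced feature representation $h_b$ of basic block $b$:
[0076] $h_b = \mathrm{LeakyReLU}\left(\dfrac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \alpha_{jk}\, v_k\right)$;
[0077] where LeakyReLU is the nonlinear activation function, $v_k$ is the value vector of instruction $i_k$, and $h_b$ is the enhanced feature vector of basic block $b$ output by the instruction-level graph attention layer; the average over $j$ pools the attended representations of the $n$ instructions into a single block-level vector.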
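The instruction-level layer is scaled dot-product self-attention within one basic block. The NumPy sketch below follows that computation; the LeakyReLU slope of 0.2 and the final mean over instructions that turns the attended rows into a single block vector are assumptions made for illustration, and the random matrices stand in for learned parameters.

```python
import numpy as np

def leaky_relu(x, slope=0.2):                         # slope 0.2 is an assumption
    return np.where(x > 0, x, slope * x)

def instruction_level_attention(X, W_Q, W_K, W_V):
    """X: (n, d) instruction features of one basic block; W_*: (d', d) matrices."""
    Q, K, V = X @ W_Q.T, X @ W_K.T, X @ W_V.T         # rows are q_k, k_k, v_k
    E = Q @ K.T / np.sqrt(W_Q.shape[0])               # E[j, k] = e_jk
    E = E - E.max(axis=1, keepdims=True)              # stabilize exp; softmax unchanged
    A = np.exp(E)
    A = A / A.sum(axis=1, keepdims=True)              # A[j, k] = alpha_jk
    Z = A @ V                                         # Z[j] = sum_k alpha_jk v_k
    return leaky_relu(Z.mean(axis=0)), A              # mean over j gives one block vector

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                           # 5 instructions, d = 8
W_Q, W_K, W_V = (rng.normal(size=(4, 8)) for _ in range(3))   # d' = 4
h_b, A = instruction_level_attention(X, W_Q, W_K, W_V)
```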
[0078] Basic block-level multi-relationship graph attention layer
[0079] The basic block-level multi-relation graph attention layer performs message passing and feature aggregation on the code attribute graph. Its key lies in calculating attention weights for different types of edge relationships and then fusing them. The specific implementation process is as follows:
[0080] For a basic block node $v$ in the code property graph, obtain its feature vector $h_v$ as updated by the instruction-level graph attention layer, and obtain the set $\mathcal{N}_r(v)$ of neighboring nodes of $v$ under relation type $r \in R$, where $\mathcal{N}_{cf}(v)$, $\mathcal{N}_{dd}(v)$, and $\mathcal{N}_{call}(v)$ correspond to control flow neighbors, data dependency neighbors, and call relationship neighbors, respectively.
[0081] For each relation type $r$, a relation-specific linear transformation is first applied to the node features:
[0082] $\tilde{h}_v^{r} = W_r h_v$;
[0083] $\tilde{h}_u^{r} = W_r h_u$;
[0084] where $W_r$ is the learnable transformation matrix for relation $r$, $d_r$ is the transformed feature dimension (so that $\tilde{h}^{r} \in \mathbb{R}^{d_r}$), and $\tilde{h}_v^{r}$ and $\tilde{h}_u^{r}$ are the relation-specific transformed feature vectors of the central node $v$ and the neighboring node $u$, respectively.
[0085] Attention scores are calculated by concatenation followed by a nonlinear transformation:
[0086] $e_{vu}^{r} = \mathrm{LeakyReLU}\left(a_r^{\top}\left[\tilde{h}_v^{r} \,\|\, \tilde{h}_u^{r}\right]\right)$;
[0087] where $a_r$ is the learnable attention vector for relation $r$, $a_r^{\top}$ is its transpose, $\|$ denotes feature vector concatenation, and LeakyReLU is the leaky rectified linear activation function.
[0088] For the same relation type $r$, the attention scores of all neighboring nodes are normalized:
[0089] $\alpha_{vu}^{r} = \dfrac{\exp(e_{vu}^{r})}{\sum_{w \in \mathcal{N}_r(v)} \exp(e_{vw}^{r})}$;
[0090] where $\mathcal{N}_r(v)$ is the set of all neighboring nodes of node $v$ under relation type $r$.
[0091] The features of neighboring nodes are aggregated with the attention weights to obtain relation-specific aggregated features:
[0092] $z_v^{r} = \sigma\left(\sum_{u \in \mathcal{N}_r(v)} \alpha_{vu}^{r}\, \tilde{h}_u^{r}\right)$;
[0093] where $z_v^{r}$ is the aggregated feature vector of node $v$ under relation type $r$, and $\sigma$ is a nonlinear activation function.
[0094] The aggregated features of the three relation types are fused to obtain the updated feature of basic block node $v$:
[0095] $h_v' = \sum_{r \in R} \beta_r\, z_v^{r} + h_v$;
[0096] where $\beta_r$ is the learnable fusion weight for relation $r$, obtained through softmax normalization so that $\sum_{r \in R} \beta_r = 1$; $R$ is the set of three relation types; $h_v$ is the residual connection term, which preserves the instruction-level features and mitigates gradient vanishing (this requires $d_r$ to equal the dimension of $h_v$); and $h_v'$ is the feature vector of basic block node $v$ after the basic block-level graph attention layer.
[0097] The above calculation process can be stacked in multiple layers. In this embodiment of the invention, it is set to two layers, so that each node can perceive neighbor information over a larger range.
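One basic block-level layer can be sketched as below, with per-relation transformation matrices `W[r]`, attention vectors `a[r]`, and fusion logits turned into weights by softmax. The matrices are kept square so the residual connection is well-defined. The choice of `tanh` for the unspecified nonlinearity, the LeakyReLU slope of 0.2, treating a node with no neighbors of some relation as contributing zero for that relation, and the neighbor direction encoded in `neighbors` are all assumptions. Two layers are stacked at the end to match the embodiment; parameters are reused across the two layers only for brevity.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multi_relation_layer(H, neighbors, W, a, beta_logits, act=np.tanh):
    """H: (N, d) node features; neighbors[r][v]: neighbor indices of v under relation r;
    W[r]: (d, d); a[r]: (2d,); beta_logits[r]: scalar fusion logit."""
    rels = list(W)
    beta = softmax(np.array([beta_logits[r] for r in rels]))   # fusion weights sum to 1
    H_new = H.copy()                                           # residual term h_v
    for r, beta_r in zip(rels, beta):
        Ht = H @ W[r].T                                        # W_r h for every node
        for v, nbrs in enumerate(neighbors[r]):
            if not nbrs:
                continue                                       # assumption: z_v^r = 0
            e = np.array([leaky_relu(a[r] @ np.concatenate([Ht[v], Ht[u]])) for u in nbrs])
            alpha = softmax(e)                                 # alpha_vu^r over N_r(v)
            H_new[v] += beta_r * act(alpha @ Ht[nbrs])         # beta_r * z_v^r
    return H_new

rng = np.random.default_rng(1)
N, d = 4, 3
H = rng.normal(size=(N, d))
neighbors = {"cf": [[1], [2], [], []], "dd": [[], [0], [0, 1], []], "call": [[3], [], [], []]}
W = {r: rng.normal(size=(d, d)) for r in neighbors}
a = {r: rng.normal(size=2 * d) for r in neighbors}
logits = {"cf": 0.0, "dd": 0.0, "call": 0.0}
H1 = multi_relation_layer(H, neighbors, W, a, logits)
H2 = multi_relation_layer(H1, neighbors, W, a, logits)     # second stacked layer
```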
[0098] Function-level graph attention layer
[0099] The function-level graph attention layer aggregates all basic block features within the same function into a function-level feature representation.
[0100] For a function $f$, let $B_f = \{v_1, v_2, \dots, v_m\}$ be the set of basic blocks it contains, and obtain their features $h_1', h_2', \dots, h_m'$ as updated by the basic block-level graph attention layer, where $m$ is the number of basic blocks within function $f$.
[0101] Calculate the attention score $s_j$ for each basic block node $v_j$:
[0102] $s_j = u_f^{\top} \tanh\left(W_f h_j' + b_f\right)$;
[0103] where $W_f$ is a learnable transformation matrix, $b_f$ is a bias vector, $u_f$ is a learnable attention parameter vector, $u_f^{\top}$ is its transpose, $\tanh$ is the hyperbolic tangent activation function, and $h_j'$ is the feature vector of basic block node $v_j$ after the basic block-level graph attention layer.
[0104] The attention scores are then normalized using softmax to obtain the attention weights $\gamma_j$:
[0105] $\gamma_j = \dfrac{\exp(s_j)}{\sum_{k=1}^{m} \exp(s_k)}$;
[0106] The basic block features are summed with the attention weights to obtain the feature representation $h_f$ of function $f$:
[0107] $h_f = \sum_{j=1}^{m} \gamma_j\, h_j'$;
[0108] where $h_f$ is the aggregated feature vector of function $f$.
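The function-level layer is additive attention pooling over the blocks of one function. A minimal NumPy sketch follows; the random parameters stand in for learned ones, and the attention hidden size of 8 is arbitrary.

```python
import numpy as np

def attention_pool(H, W, b, u):
    """s_j = u^T tanh(W h_j + b); gamma = softmax(s); returns (sum_j gamma_j h_j, gamma)."""
    s = np.tanh(H @ W.T + b) @ u          # one score per row of H
    g = np.exp(s - s.max())
    g = g / g.sum()                       # softmax-normalized contribution weights
    return g @ H, g

rng = np.random.default_rng(2)
H_f = rng.normal(size=(6, 5))             # m = 6 basic blocks in one function, d = 5
W_f, b_f, u_f = rng.normal(size=(8, 5)), rng.normal(size=8), rng.normal(size=8)
h_f, gamma = attention_pool(H_f, W_f, b_f, u_f)
```

Because the weights are a softmax, `h_f` is a convex combination of the block features: if all blocks carried the same feature, the function vector would equal it.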
[0109] Global graph readout layer
[0110] The global graph readout layer aggregates all function-level features into a global embedding vector for the entire code sample.
[0111] For all functions $\{f_1, f_2, \dots, f_M\}$ contained in the code sample, obtain their function-level features $h_{f_1}, h_{f_2}, \dots, h_{f_M}$, where $M$ is the total number of functions.
[0112] An attention pooling mechanism is used to calculate the contribution weight $\lambda_k$ of each function to the overall code semantics:
[0113] $\lambda_k = \dfrac{\exp\left(u_g^{\top} \tanh(W_g h_{f_k} + b_g)\right)}{\sum_{l=1}^{M} \exp\left(u_g^{\top} \tanh(W_g h_{f_l} + b_g)\right)}$;
[0114] where $W_g$ is the learnable transformation matrix of the global readout layer, $b_g$ is a bias vector, $u_g$ is a learnable attention parameter vector, and $u_g^{\top}$ is its transpose.
[0115] The global embedding vector of the code sample is obtained by summing the function features weighted by the attention weights:
[0116] $h_G = \sum_{k=1}^{M} \lambda_k\, h_{f_k}$;
[0117] where $h_G$ is the global embedding vector of the code sample, used for subsequent malicious code detection and classification.
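Putting the function-level and global stages together, the sketch below groups block features by owning function, pools each group into a function vector, and pools those into the sample embedding with a separate parameter set. The same additive-attention form is assumed at both levels; the function names in `owner`, the helper names, and all dimensions are hypothetical.

```python
import numpy as np
from collections import defaultdict

def attention_pool(H, W, b, u):
    s = np.tanh(H @ W.T + b) @ u          # contribution scores
    w = np.exp(s - s.max())
    return (w / w.sum()) @ H              # softmax-weighted sum of rows

def global_readout(block_feats, func_of_block, func_params, global_params):
    """Pool blocks into function vectors h_f, then functions into the embedding h_G."""
    members = defaultdict(list)
    for i, f in enumerate(func_of_block):
        members[f].append(i)
    H_funcs = np.stack([attention_pool(block_feats[idx], *func_params)
                        for idx in members.values()])        # (M, d)
    return attention_pool(H_funcs, *global_params)

rng = np.random.default_rng(3)
d, k = 5, 8
block_feats = rng.normal(size=(7, d))
owner = ["main", "main", "main", "scan", "scan", "attack", "attack"]   # hypothetical names
func_params = (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k))
global_params = (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k))
h_G = global_readout(block_feats, owner, func_params, global_params)
```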
[0118] IV. Malicious Code Detection and Classification Module
[0119] The malware detection and classification module performs binary classification detection and multi-class family identification of malware based on global embedding vectors.
[0120] V. Edge-Cloud Collaborative Deployment Module
[0121] To reconcile the limited computing resources of IoT devices with their real-time detection requirements, this invention adopts an edge-cloud collaborative deployment architecture.
[0122] Lightweight detection at the edge: a simplified version of the detection model is deployed at the edge. The lightweight design builds only the control flow graph, which reduces graph construction overhead; uses a single-layer graph attention network instead of the hierarchical multi-layer network, which reduces computational complexity; and uses global average pooling instead of attention pooling, which reduces the number of parameters. With this design, the parameter count and computational cost of the edge model are reduced to approximately 15% to 20% of those of the full model.
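One of the three savings, replacing attention pooling with global average pooling, can be quantified with a short sketch. It assumes the attention readout uses a square d x d matrix plus a bias and an attention vector of length d, and the hidden size of 64 is hypothetical; the sketch does not reproduce the 15% to 20% figure, which depends on the full architecture.

```python
from typing import List


def global_average_pool(H: List[List[float]]) -> List[float]:
    """Parameter-free readout for the lightweight edge model: mean over node features."""
    n = len(H)
    return [sum(h[j] for h in H) / n for j in range(len(H[0]))]


def attention_pool_params(d: int) -> int:
    """Parameters of one tanh attention readout: W (d x d, assumed square), b (d), q (d)."""
    return d * d + d + d


# For a hypothetical hidden size of 64, each attention readout replaced by global
# average pooling removes 64*64 + 64 + 64 = 4224 parameters.
saved = attention_pool_params(64)
```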
[0123] Confidence threshold filtering: a first confidence threshold $\tau_1$ and a second confidence threshold $\tau_2$ are set at the edge, where $0 < \tau_1 < \tau_2 < 1$. $\tau_1$ is the confidence bound for normal code, $\tau_2$ is the confidence bound for malicious code, and $p$ is the malicious-code probability output by the edge model. When $p < \tau_1$, the sample is judged to be normal code and allowed to pass directly; when $p > \tau_2$, it is initially judged to be malicious code, an alert is triggered immediately, and the sample information is uploaded to the cloud for confirmation; when $\tau_1 \le p \le \tau_2$, the sample is marked as uncertain and uploaded to the cloud for precise detection.
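The three-way decision rule can be sketched as a small routing function. The defaults 0.2 and 0.8 are the threshold values given in claim 8; the `EdgeDecision` member names and `route_sample` are hypothetical.

```python
from enum import Enum


class EdgeDecision(Enum):
    PASS = "normal code, allowed through"                    # p < tau1
    ALERT_AND_UPLOAD = "suspected malware, alert + confirm"  # p > tau2
    UPLOAD_UNCERTAIN = "uncertain, cloud precise detection"  # tau1 <= p <= tau2


def route_sample(p: float, tau1: float = 0.2, tau2: float = 0.8) -> EdgeDecision:
    """Route one sample given the edge model's malicious probability p."""
    if not 0.0 < tau1 < tau2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < tau1 < tau2 < 1")
    if p < tau1:
        return EdgeDecision.PASS
    if p > tau2:
        return EdgeDecision.ALERT_AND_UPLOAD
    return EdgeDecision.UPLOAD_UNCERTAIN  # both boundary values fall in the uncertain band
```

Only the confident-benign branch keeps traffic entirely at the edge; both other branches reach the cloud, which is where the bandwidth and latency savings of the architecture come from.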
[0124] Cloud-based precise detection: After receiving the samples uploaded from the edge device, the cloud uses a complete detection system to perform precise detection and family identification, and returns the detection results to the edge device and stores them in the detection result database.
[0125] Model incremental update and distribution: The cloud periodically uses newly collected malicious code samples to incrementally train and update the complete model. Then, knowledge distillation technology is used to transfer the knowledge of the complete model to the lightweight edge model, and the updated lightweight model parameters are distributed to the edge devices.
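The document does not specify the distillation objective. A common choice, assumed here, is Hinton-style temperature-scaled soft-target distillation adapted to the single malicious-probability output: the lightweight student matches both the complete model's softened probability and the true label. The temperature `T`, mixing weight `lam` and all function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy; target may be a soft probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_logit: float, teacher_logit: float, label: int,
                      T: float = 2.0, lam: float = 0.5) -> float:
    """Assumed training objective for the lightweight edge model (student).

    soft term: match the complete model's (teacher's) temperature-softened probability;
    hard term: match the ground-truth malicious/benign label.
    """
    soft_teacher = sigmoid(teacher_logit / T)
    soft_student = sigmoid(student_logit / T)
    soft = bce(soft_student, soft_teacher) * T * T  # T^2 keeps the soft term's gradient scale comparable
    hard = bce(sigmoid(student_logit), label)
    return lam * soft + (1.0 - lam) * hard


# A student that agrees with the teacher incurs a lower loss than one that disagrees.
agree = distillation_loss(2.0, 2.0, 1)
disagree = distillation_loss(-2.0, 2.0, 1)
```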
[0126] While the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the invention. Any variations and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention. Therefore, any modifications, equivalent changes, and alterations made to the above embodiments based on the technical essence of the present invention, without departing from the scope of the invention, fall within the protection scope defined by the claims of the present invention.
Claims
1. A malicious code detection system for the Internet of Things based on graph neural networks, characterized by comprising: a firmware acquisition and preprocessing module, used to acquire firmware samples of IoT devices, unpack the firmware and extract executable files, select the disassembly engine corresponding to the target architecture of each executable file for disassembly, and convert the assembly instructions of the different architectures into an architecture-independent intermediate representation; a code attribute graph construction module, used to construct a control flow graph, a data dependency graph, and a function call graph from the intermediate representation, merge the node sets and edge sets of the three graphs into a multi-relation code attribute graph that integrates control flow, data dependency, and function call relationships, and initialize the features of the nodes in the multi-relation code attribute graph; a hierarchical graph attention feature learning module, used to perform multi-granularity feature learning on the multi-relation code attribute graph at the instruction, basic block, and function levels using a hierarchical graph attention network, and to generate the global embedding vector of each code sample through a global graph readout layer; a malicious code detection and classification module, used to perform binary classification detection of malicious code based on the global embedding vector, and to perform multi-class family identification on samples determined to be malicious code; and an edge-cloud collaborative deployment module, used to deploy a lightweight detection model at the edge for initial screening, deploy the complete detection model in the cloud for accurate detection and family identification, and periodically send updated lightweight model parameters from the cloud to the edge.
2. The IoT malware detection system based on graph neural networks according to claim 1, characterized in that: In the firmware acquisition and preprocessing module, the intermediate representation adopts a three-address code format. Each intermediate representation instruction consists of three elements: operation type, operand, and target register. The instruction categories of the intermediate representation are uniformly divided into seven categories: arithmetic operation instructions, logical operation instructions, data transfer instructions, conditional jump instructions, unconditional jump instructions, function call instructions, and system call instructions.
3. The IoT malware detection system based on graph neural networks according to claim 1, characterized in that the process by which the code attribute graph construction module constructs the multi-relation code attribute graph includes: identifying function entry points in the intermediate representation code; dividing consecutively executed instruction sequences into basic blocks, using the target address of each jump instruction and the address of the instruction following each jump instruction as dividing points, each basic block being a node of the code attribute graph; traversing all basic blocks, determining the successor basic blocks according to the type of the last instruction in each basic block, and adding control flow type edges between basic block nodes that have a control transfer relationship; performing definition-use analysis on the intermediate representation instructions within each function and, when a first instruction defines a variable, a second instruction uses that variable, and the variable is not redefined between them, adding a data dependency type edge between the basic block node containing the first instruction and the basic block node containing the second instruction; identifying the function call instructions among the intermediate representation instructions, and adding a call type edge between the basic block node containing the call in the caller function and the entry basic block node of the called function; and merging the node sets and edge sets corresponding to the control flow type edges, data dependency type edges, and call type edges to form the multi-relation code attribute graph.
4. The IoT malware detection system based on graph neural networks according to claim 1, characterized in that: In the code attribute graph construction module, the process of initializing the features of graph nodes includes: For each intermediate representation instruction within a basic block, one-hot encoding is performed according to the instruction operation type to obtain an operation type vector. Registers and immediate values in the operands are normalized and encoded to obtain an operand feature vector. The operation type vector and the operand feature vector are concatenated to form the feature vector of a single instruction. For a basic block containing multiple instructions, the feature vectors of all instructions in the basic block are aggregated into an initial feature vector at the basic block level through an average pooling operation.
5. The IoT malware detection system based on graph neural networks according to claim 1, characterized in that: the hierarchical graph attention feature learning module includes an instruction-level graph attention layer, a basic block-level multi-relation graph attention layer, a function-level graph attention layer, and a global graph readout layer connected in sequence. The instruction-level graph attention layer is used to apply linear transformations to the features of each intermediate representation instruction within a basic block to obtain query, key, and value vectors; attention scores between instructions are calculated by scaled dot products and normalized to obtain attention weights; and the value vectors are aggregated with these attention weights and passed through a nonlinear activation to obtain the enhanced feature representation of the basic block. The basic block-level multi-relation graph attention layer is used to calculate relation-specific attention weights for the neighbor nodes connected by control flow edges, data dependency edges, and call edges, respectively, and to aggregate the features of each type of neighbor node with these weights to obtain an aggregated feature for each relation type; the aggregated features of the relation types are fused with learnable relation fusion weights and added to the node's own features through a residual connection to obtain the updated basic block node features.
The function-level graph attention layer is used to calculate the contribution weight of each basic block node within the same function to the overall semantics of the function, and the basic block features are weighted and summed using the contribution weights to obtain the function-level feature representation; The global graph readout layer is used to calculate the contribution weight of each function to the overall code semantics, and the contribution weight is used to perform a weighted summation of the function-level features to obtain the global embedding vector of the code sample.
6. The IoT malware detection system based on graph neural networks according to claim 5, characterized in that: in the basic block-level multi-relation graph attention layer, for each relation type, the node features are linearly transformed by a relation-specific learnable transformation matrix; the transformed center node features are concatenated with the transformed neighbor node features, and the attention score is obtained through a dot product with a learnable attention vector and a leaky rectified linear (LeakyReLU) nonlinearity; the learnable relation fusion weights are obtained by softmax normalization, so that the fusion weights of the three relation types sum to 1.
7. The IoT malware detection system based on graph neural networks according to claim 1, characterized in that: In the malicious code detection and classification module: The binary classification detection is achieved through two fully connected layers and a sigmoid output layer, which maps the global embedding vector to a malicious probability value. When the malicious probability value is greater than 0.5, it is determined to be malicious code. The multi-class family identification is achieved through two fully connected layers and a softmax output layer. The global embedding vector of the sample determined to be malicious code is mapped to the probability distribution of each malicious code family, and the family category to which the malicious code belongs is determined based on the probability distribution. The overall loss function during the training phase is a weighted combination of binary classification loss and multi-class classification loss, with a weight coefficient of 0.4 for binary classification loss and a weight coefficient of 0.6 for multi-class classification loss.
8. The IoT malware detection system based on graph neural networks according to claim 1, characterized in that: In the edge-cloud collaborative deployment module: The lightweight detection model deployed at the edge only constructs a control flow graph and uses a single-layer graph attention network and global average pooling for feature extraction; Set the first confidence threshold to 0.2 and the second confidence threshold to 0.8 at the edge. When the malicious probability output by the edge model is less than the first confidence threshold, it is judged as normal code and allowed to pass. When the malicious probability is greater than the second confidence threshold, it is initially judged as malicious code and uploaded to the cloud for confirmation. When the malicious probability is not less than the first confidence threshold and not greater than the second confidence threshold, it is marked as an uncertain sample and uploaded to the cloud for precise detection. The cloud periodically uses newly collected malicious code samples to incrementally train and update the complete model. Knowledge distillation is used to transfer the knowledge of the complete model to the lightweight model, and the updated lightweight model parameters are then sent to the edge.
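The architecture-independent intermediate representation of claim 2 can be sketched as a small data type. The opcode tables below are hypothetical and deliberately tiny (a few real ARM and MIPS mnemonics); they only show how instructions from different architectures land in the same seven categories, and `lift` is an illustrative name, not a real lifter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class IRCategory(Enum):
    """The seven instruction categories listed in claim 2."""
    ARITHMETIC = 0
    LOGICAL = 1
    DATA_TRANSFER = 2
    CONDITIONAL_JUMP = 3
    UNCONDITIONAL_JUMP = 4
    FUNCTION_CALL = 5
    SYSTEM_CALL = 6


@dataclass(frozen=True)
class IRInstr:
    """Three-address style IR instruction: operation type, operands, target register."""
    op: IRCategory
    operands: Tuple[str, ...]
    dest: Optional[str] = None


# Hypothetical, deliberately tiny mnemonic tables; a real lifter covers full ISAs.
ARM_TABLE: Dict[str, IRCategory] = {
    "add": IRCategory.ARITHMETIC, "and": IRCategory.LOGICAL, "ldr": IRCategory.DATA_TRANSFER,
    "beq": IRCategory.CONDITIONAL_JUMP, "b": IRCategory.UNCONDITIONAL_JUMP,
    "bl": IRCategory.FUNCTION_CALL, "svc": IRCategory.SYSTEM_CALL,
}
MIPS_TABLE: Dict[str, IRCategory] = {
    "addu": IRCategory.ARITHMETIC, "and": IRCategory.LOGICAL, "lw": IRCategory.DATA_TRANSFER,
    "beq": IRCategory.CONDITIONAL_JUMP, "j": IRCategory.UNCONDITIONAL_JUMP,
    "jal": IRCategory.FUNCTION_CALL, "syscall": IRCategory.SYSTEM_CALL,
}


def lift(mnemonic: str, dest: Optional[str], srcs: Tuple[str, ...],
         table: Dict[str, IRCategory]) -> IRInstr:
    """Map one disassembled instruction to the shared IR (register names kept as-is)."""
    return IRInstr(op=table[mnemonic], operands=srcs, dest=dest)


arm = lift("add", "r0", ("r1", "r2"), ARM_TABLE)         # ARM:  add r0, r1, r2
mips = lift("addu", "$v0", ("$a0", "$a1"), MIPS_TABLE)   # MIPS: addu $v0, $a0, $a1
```

Both instructions end up with the same `op`, which is what lets a single graph model consume firmware from different architectures.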
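The basic block partitioning and control-flow edge construction of claim 3 can be sketched on a toy instruction list. The four-kind encoding, treating `ret` as a block terminator, and the function names are assumptions; data dependency and call edges would be added to the same node set by separate passes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy IR for illustration: (kind, jump_target_index); kind is one of
# "op" (falls through), "cjmp" (conditional jump), "jmp" (unconditional jump), "ret".
Instr = Tuple[str, Optional[int]]


def split_basic_blocks(code: List[Instr]) -> List[Tuple[int, int]]:
    """Split at jump targets and after each jump (ret treated like a jump); [start, end) ranges."""
    leaders: Set[int] = {0}
    for i, (kind, target) in enumerate(code):
        if kind in ("cjmp", "jmp", "ret"):
            if target is not None:
                leaders.add(target)
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [len(code)]))


def control_flow_edges(code: List[Instr], blocks: List[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Choose successors from the type of each block's last instruction."""
    block_of: Dict[int, int] = {i: b for b, (s, e) in enumerate(blocks) for i in range(s, e)}
    edges: Set[Tuple[int, int]] = set()
    for b, (s, e) in enumerate(blocks):
        kind, target = code[e - 1]
        if kind in ("cjmp", "jmp") and target in block_of:
            edges.add((b, block_of[target]))  # taken branch
        if kind in ("op", "cjmp") and e < len(code):
            edges.add((b, block_of[e]))       # fall-through
    return edges


code = [("op", None), ("cjmp", 4), ("op", None), ("jmp", 5), ("op", None), ("ret", None)]
blocks = split_basic_blocks(code)     # [(0, 2), (2, 4), (4, 5), (5, 6)]
edges = control_flow_edges(code, blocks)
```

The example is an if/else diamond: block 0 branches to blocks 1 and 2, and both rejoin at the returning block 3.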
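Claim 4 leaves the operand encoding open, so the sketch below assumes one: each of up to three operands becomes `[is_register, normalized value]`, with register numbers divided by an assumed register-file size of 32 and immediates squashed with `tanh` using an assumed scale of 1024. The one-hot part uses the seven categories of claim 2 (indexed 0 to 6), and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple, Union

NUM_CATEGORIES = 7   # the seven IR categories of claim 2
MAX_OPERANDS = 3     # three-address form
NUM_REGS = 32        # assumed register-file size used for normalization
IMM_SCALE = 1024.0   # assumed scale for squashing immediates

Operand = Union[str, int]


def encode_operand(opnd: Operand) -> List[float]:
    """Hypothetical operand encoding: [is_register, normalized value]."""
    if isinstance(opnd, str):                  # register such as "r5"
        return [1.0, int(opnd[1:]) / (NUM_REGS - 1)]
    return [0.0, math.tanh(opnd / IMM_SCALE)]  # immediate squashed into (-1, 1)


def instruction_vector(category: int, operands: Sequence[Operand]) -> List[float]:
    onehot = [0.0] * NUM_CATEGORIES
    onehot[category] = 1.0                     # one-hot operation type
    feats: List[float] = []
    for k in range(MAX_OPERANDS):              # pad missing operands with zeros
        feats += encode_operand(operands[k]) if k < len(operands) else [0.0, 0.0]
    return onehot + feats                      # concatenation: 7 + 6 = 13 values


def block_feature(instrs: List[Tuple[int, Sequence[Operand]]]) -> List[float]:
    """Average-pool the instruction vectors of one basic block."""
    vecs = [instruction_vector(c, ops) for c, ops in instrs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# An arithmetic instruction (category 0) and a data-transfer instruction (category 2).
v = block_feature([(0, ["r1", "r2"]), (2, ["r3", 0])])
```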
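The instruction-level layer of claim 5 is scaled dot-product self-attention over the instructions of one basic block. In the sketch below, the ReLU activation and the final mean over instructions (to obtain one vector per block) are assumptions, since the claim names neither; the projection matrices are supplied by the caller.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_row(row: List[float]) -> List[float]:
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


def instruction_attention(X: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Tuple[List[float], Matrix]:
    """X holds one row per IR instruction of a basic block; W* are d_in x d_k projections."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(Wk[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in K] for qi in Q]
    A = [softmax_row(r) for r in scores]            # each row sums to 1
    Z = matmul(A, V)                                # attention-weighted value vectors
    Z = [[max(0.0, z) for z in row] for row in Z]   # ReLU (assumed activation)
    block = [sum(col) / len(Z) for col in zip(*Z)]  # mean over instructions (assumed)
    return block, A


I2 = [[1.0, 0.0], [0.0, 1.0]]
h_block, A = instruction_attention([[1.0, 0.0], [0.0, 1.0]], I2, I2, I2)
```

Dividing by the square root of the key dimension keeps the dot products from growing with feature size, so the softmax does not saturate on long feature vectors.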
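A single basic block-level multi-relation attention layer following claims 5 and 6 can be sketched as below. It assumes the usual graph-attention ordering `LeakyReLU(a_r . [W_r h_i || W_r h_j])`, square `W_r` matrices so that the residual addition is dimensionally valid, a LeakyReLU slope of 0.2, and that a node with no neighbors under a relation receives no message from it; the relation keys and function names are hypothetical.

```python
import math
from typing import Dict, List

RELATIONS = ("control_flow", "data_dependency", "call")
Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def multi_relation_layer(H: List[Vector], neighbors: Dict[str, Dict[int, List[int]]],
                         W: Dict[str, Matrix], a: Dict[str, Vector], theta: List[float]) -> List[Vector]:
    """Per-relation attention, softmax-fused relation weights, residual connection.

    neighbors[r][i] lists the neighbors of node i under relation r; W[r] is d x d,
    a[r] has length 2d, theta holds one fusion logit per relation.
    """
    fuse = softmax(theta)                              # relation fusion weights sum to 1
    WH = {r: [matvec(W[r], h) for h in H] for r in RELATIONS}
    out = []
    for i, h in enumerate(H):
        new = list(h)                                  # residual: start from h_i
        for w_r, r in zip(fuse, RELATIONS):
            nbrs = neighbors[r].get(i, [])
            if not nbrs:
                continue
            # score_ij = LeakyReLU(a_r . [W_r h_i || W_r h_j]); "+" concatenates the two lists
            scores = [leaky_relu(sum(ak * x for ak, x in zip(a[r], WH[r][i] + WH[r][j])))
                      for j in nbrs]
            alpha = softmax(scores)
            for k in range(len(h)):
                new[k] += w_r * sum(al * WH[r][j][k] for al, j in zip(alpha, nbrs))
        out.append(new)
    return out
```

For example, with identity `W_r`, zero attention vectors, equal fusion logits and a single control-flow edge from node 0 to node 1, node 0 becomes its own features plus one third of node 1's, and an isolated node keeps its input features unchanged.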
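The decision rule and training objective of claim 7 reduce to a few lines. The sketch assumes binary cross-entropy for the detection head, categorical cross-entropy for the family head, and that the family loss is applied only to malicious training samples, since benign samples carry no family label; the claim itself fixes only the 0.5 threshold and the 0.4/0.6 weights. Function names are hypothetical.

```python
import math
from typing import Optional, Sequence

W_BINARY, W_FAMILY = 0.4, 0.6   # weight coefficients from claim 7


def is_malicious(p_mal: float) -> bool:
    return p_mal > 0.5          # strictly greater than 0.5


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def cross_entropy(probs: Sequence[float], target: int, eps: float = 1e-12) -> float:
    return -math.log(max(probs[target], eps))


def total_loss(p_mal: float, y_mal: int,
               family_probs: Optional[Sequence[float]] = None,
               y_family: Optional[int] = None) -> float:
    """L = 0.4 * L_binary + 0.6 * L_family (family term only for malicious samples, assumed)."""
    l_bin = bce(p_mal, y_mal)
    l_fam = cross_entropy(family_probs, y_family) if y_mal == 1 else 0.0
    return W_BINARY * l_bin + W_FAMILY * l_fam
```

Weighting the family loss more heavily pushes the shared embedding toward features that separate malware families, while the binary head still receives a gradient from every sample.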