Multi-platform firmware analysis and vulnerability scanning system
A multi-platform firmware parsing and vulnerability scanning system addresses the poor detection capability and low efficiency of existing technologies, enabling efficient and accurate vulnerability identification and remediation suggestions for various domestic hardware architectures and operating system platforms.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING NANZI DIGITAL SECURITY TECH CO LTD
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to adapt to the various domestic processor platforms and operating systems, resulting in low detection efficiency, high false positive and false negative rates, an inability to cope with complex attack methods, and an inability to meet the need for rapid evaluation of large batches of firmware samples.
A multi-platform firmware parsing and vulnerability scanning system is constructed. The processing flow is decoupled through architecture identification and a module plug-in mechanism, and parsing modules are dynamically loaded. Fingerprint matching, behavior modeling, and multimodal program semantic modeling are integrated to achieve systematic assessment of firmware security status and risk identification.
The system improves the adaptability and efficiency of detection, accurately locates unknown vulnerabilities, provides structured vulnerability reports and remediation suggestions, and enhances security protection capabilities and response levels.
Figure CN121808796B_ABST
Description
Technical Field
[0001] This invention relates to the field of software security technology, specifically to a multi-platform firmware parsing and vulnerability scanning system.
Background Technology
[0002] Firmware, as a crucial component connecting the underlying hardware and the upper-level operating system, undertakes fundamental functions such as booting, hardware initialization, and access control. Its security directly impacts the entire system's trusted boot chain, the integrity of the operating environment, and the security boundaries of subsequent operating systems and application software. If firmware is maliciously tampered with or contains exploitable vulnerabilities, attackers can bypass traditional operating system protection mechanisms to achieve persistent control over the system, leading to serious consequences such as device failure, data breaches, and even the collapse of critical infrastructure. Therefore, in today's rapidly evolving environment, firmware security has become a key focus and a significant challenge in cyberspace security.
[0003] However, traditional firmware analysis and vulnerability detection methods are mostly developed for mainstream processor architectures and common embedded operating systems. They suffer from strong toolchain dependencies and poor adaptability, making it difficult to cover domestic processor platforms such as Phytium, Kunpeng, Loongson, and Shenwei, and they are also incompatible with the diverse deployment environments of domestic operating systems like Kylin and NeoKylin. Furthermore, existing methods typically use static signature matching or rule engines to identify vulnerabilities, which limits their ability to handle firmware samples with code obfuscation, module cross-dependencies, or missing symbol information, and leaves them unable to cope with increasingly complex attack methods. In terms of detection efficiency, traditional tools often suffer from fragmented processing flows, coarse analysis granularity, and missing contextual information, resulting in slow scanning and high false positive and false negative rates, which makes them unsuitable for the rapid evaluation of large batches of firmware samples in real-world engineering projects. Especially with the trend of rapid device deployment, security detection urgently needs stronger architectural adaptability, intelligent analysis capabilities, and actionable results.
[0004] Therefore, it is necessary to build a firmware parsing and vulnerability scanning method with good adaptability, detection accuracy, and analysis efficiency, which can uniformly support multiple domestic hardware architectures and operating system platforms, integrate static and dynamic feature analysis methods, and realize systematic assessment of firmware security status and risk identification, thereby enhancing the security protection capability and response level of products in actual deployment.
Summary of the Invention
[0005] This application provides a multi-platform firmware parsing and vulnerability scanning system to solve the problems of poor detection capabilities and low efficiency in existing technologies.
[0006] The first aspect of this application provides a multi-platform firmware parsing and vulnerability scanning system, comprising: a data acquisition module, a firmware preprocessing module, a feature recognition module, a firmware extraction module, a vulnerability scanning module, and a vulnerability remediation suggestion module; wherein, the data acquisition module is used to acquire multi-source firmware samples of the product, wherein the multi-source firmware samples include original factory, customized, test version firmware, and standard and cross-platform adapted firmware for compliance testing; the firmware preprocessing module is used to receive and identify the format and processor architecture of the firmware samples, obtain the identification result, and dynamically load the corresponding parsing module according to the identification result; the feature recognition module is used to extract firmware header information, kernel code, and file system features, determine the firmware type, and quickly locate the location of key components based on the firmware type; the firmware extraction module is used to parse the firmware structure and extract the bootloader, kernel code, and root file system information based on the firmware type and the location of the key components; the vulnerability scanning module is used to identify known and unknown vulnerabilities based on fingerprint matching, behavioral modeling, and multimodal program semantic modeling methods, and dynamically optimize the scanning strategy during the scanning process; the vulnerability remediation suggestion module is used to generate a vulnerability list sorted by risk level and provide specific remediation suggestions in combination with the environment.
[0007] Preferably, the data acquisition module includes a sample access unit, a legality verification unit, and a metadata management unit. The sample access unit can access multi-source firmware samples of the product using three access methods: local file upload, remote device firmware retrieval, and batch import via API interface. The legality verification unit verifies sample integrity, filters invalid non-firmware files, and matches trusted source whitelists based on MD5 / SHA256 hash values. The metadata management unit extracts device model, manufacturer information, and firmware version metadata and enters them into the system.
[0008] Preferably, the firmware preprocessing module includes a format identification unit, an architecture identification unit, and a parsing module loading unit. The format identification unit is used to identify the firmware format type by analyzing the firmware header information, magic number field, and structural features. The architecture identification unit is used to identify the processor instruction set architecture adapted to the firmware. The parsing module loading unit is used to dynamically load the parsing module that matches the identification result, and is encapsulated using a dynamic link library, supporting a plugin registration mechanism.
[0009] Preferably, the feature recognition module includes a header information extraction unit, a kernel feature analysis unit, a file system recognition unit, and a component location and type determination unit. The header information extraction unit is used to extract key fields such as version number, checksum, and load base address. The kernel feature analysis unit is used to extract kernel code features such as function entry addresses and call relationships based on static analysis. The file system recognition unit is used to identify the file system type and structure based on storage structure features and metadata. The component location and type determination unit is used to locate key components such as the bootloader and driver modules, and determine the firmware type.
[0010] Preferably, the firmware extraction module includes a structure parsing unit, a bootloader extraction unit, a kernel module extraction unit, and a file system extraction unit. The structure parsing unit is used to determine the dependencies and layout of each module; the bootloader extraction unit is used to extract the startup code and core functions of the initialization process; the kernel module extraction unit is used to generate code through disassembly and construct a control flow graph; and the file system extraction unit is used to construct a complete root file system structure and output it in a structured manner.
[0011] Preferably, the vulnerability scanning module includes a known vulnerability identification unit, a behavior modeling and detection unit, a multimodal semantic modeling unit, and a scanning strategy optimization unit. The known vulnerability identification unit is used for fingerprint identification using signature rules from a vulnerability feature database; the behavior modeling and detection unit is used to construct a finite state machine model to identify abnormal behavior patterns; the multimodal semantic modeling unit is used for cross-architecture semantic modeling through instruction sequence semantic encoding and graph structure semantic encoding; and the scanning strategy optimization unit is used to dynamically adjust the scanning strategy based on the firmware architecture and module complexity.
[0012] Preferably, the vulnerability remediation suggestion module includes a risk level classification unit, a remediation solution generation unit, and a report output unit. The risk level classification unit is used to classify vulnerabilities into high, medium, and low risks and generate a structured list. The remediation solution generation unit is used to provide patch update and configuration hardening suggestions based on the deployment environment. The report output unit is used to output a standardized vulnerability assessment report.
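As a minimal sketch of the risk level classification unit's structured list, the snippet below sorts findings from high to low risk. The `Finding` record, its field names, and the `RISK_ORDER` mapping are assumptions made for illustration; the source only states that vulnerabilities are classified as high, medium, or low and listed by risk level.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordering key: lower number sorts first.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    component: str    # firmware component the finding belongs to
    risk: str         # "high", "medium" or "low"
    suggestion: str   # remediation hint, e.g. a patch or hardening step


def sort_by_risk(findings: List[Finding]) -> List[Finding]:
    """Order findings from high to low risk; ties keep their original order."""
    return sorted(findings, key=lambda f: RISK_ORDER[f.risk])
```

Because Python's `sorted` is stable, findings of equal risk stay in discovery order, which keeps the report reproducible across runs.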
[0013] The second aspect of this application provides a method for multi-platform firmware parsing and vulnerability scanning, comprising: acquiring multi-source firmware sample data of a product; dynamically loading a corresponding parsing module by receiving and identifying the format and processor architecture of the multi-source firmware sample data; extracting firmware header information, kernel code, and file system features based on the parsing module, determining the firmware type, quickly locating key component positions based on the firmware type, parsing the firmware structure and extracting bootloader, kernel code, and root file system information, and outputting a firmware scanning strategy; identifying known and unknown vulnerabilities of the firmware sample using fingerprint matching, behavioral modeling, and multimodal program semantic modeling methods based on the bootloader, kernel code, and root file system information, and dynamically optimizing the firmware scanning strategy during the scanning process; generating a vulnerability list sorted by risk level based on the known and unknown vulnerabilities of the firmware sample, and providing specific remediation suggestions in conjunction with the optimized firmware scanning strategy and the environment.
[0014] A third aspect of this application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor executes the program to implement the multi-platform firmware parsing and vulnerability scanning method as described in the above embodiments.
[0015] A fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon, which is executed by a processor to implement the multi-platform firmware parsing and vulnerability scanning method as described in the above embodiments.
[0016] Therefore, this application has the following beneficial effects:
[0017] The embodiments of this application decouple the processing flow through architecture identification and a module plug-in mechanism. They analyze the firmware magic number, header information, and structural features to dynamically load parsing modules, adapting to various domestic firmware formats and processor architectures, improving parsing adaptability and efficiency, and meeting the processing needs of firmware with heterogeneous architectures and diverse formats. They integrate detection methods such as fingerprint matching, behavioral modeling, and multimodal program semantic modeling. Through unified modeling with dual semantic encoding branches and dual constraints, semantic invariance is maintained, and unknown vulnerabilities are identified and accurately located by detecting deviations from the semantic distribution of normal programs, enhancing the ability to discover complex logical defects and unknown risk paths. A vulnerability report and remediation suggestion generation mechanism combines deployment architecture and module features to output structured and executable remediation information, facilitating rapid response and closed-loop remediation and improving the engineering value of the detection system and its practical usability for security operations and maintenance. Thus, the problems of poor detection capability and low efficiency in existing technologies are solved.
[0018] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Attached Figure Description
[0019] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0020] Figure 1 is a schematic diagram of the structure of a multi-platform firmware parsing and vulnerability scanning system provided according to an embodiment of this application;
[0021] Figure 2 is a schematic diagram illustrating a security assessment of domestically produced office terminal firmware according to an embodiment of this application;
[0022] Figure 3 is a schematic diagram of a firmware security-specific detection method provided according to an embodiment of this application;
[0023] Figure 4 is a flowchart of a multi-platform firmware parsing and vulnerability scanning system provided according to an embodiment of this application;
[0024] Figure 5 is a flowchart of a multi-platform firmware parsing and vulnerability scanning method according to an embodiment of this application;
[0025] Figure 6 is a schematic diagram of a multi-platform firmware parsing and vulnerability scanning method according to an embodiment of this application;
[0026] Figure 7 is a schematic diagram of the structure of an electronic device provided according to an embodiment of this application.
Detailed Implementation
[0027] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0028] The following describes an embodiment of the multi-platform firmware parsing and vulnerability scanning system according to the accompanying drawings. Addressing the issue of poor detection capabilities mentioned in the background section, this application provides a multi-platform firmware parsing and vulnerability scanning system. In this system, the processing flow is decoupled through architecture identification and module plug-in mechanisms. The system analyzes the firmware magic number, header information, and structural features to dynamically load parsing modules, adapting to various domestic firmware formats and processor architectures, improving parsing adaptability and efficiency, and meeting the processing needs of firmware with heterogeneous architectures and diverse formats. It integrates detection methods such as fingerprint matching, behavioral modeling, and multimodal program semantic modeling. Through unified modeling with dual semantic coding branches and dual constraints, semantic invariance is maintained. Based on deviation detection of normal program semantic distribution, unknown vulnerabilities are identified and accurately located, enhancing the ability to discover complex logical defects and unknown risk paths. A vulnerability report and remediation suggestion generation mechanism is designed, combining deployment architecture and module features to output structured and executable remediation information, facilitating rapid response and closed-loop remediation, and improving the engineering value and practical security operation and maintenance usability of the detection system. Thus, the problems of poor detection capabilities and low efficiency in the prior art are solved.
[0029] Figure 1 is a schematic diagram of the structure of the multi-platform firmware parsing and vulnerability scanning system provided in the embodiments of this application.
[0030] This application provides a multi-platform firmware parsing and vulnerability scanning system, the system 10 including:
[0031] The system includes a data acquisition module 100, a firmware preprocessing module 200, a feature recognition module 300, a firmware extraction module 400, a vulnerability scanning module 500, and a vulnerability remediation suggestion module 600.
[0032] Specifically: the data acquisition module 100 acquires multi-source firmware samples of the product, including original manufacturer, customized, and test firmware, as well as standard and cross-platform adapted firmware for compliance testing; the firmware preprocessing module 200 receives the firmware samples, identifies their format and processor architecture to obtain an identification result, and dynamically loads the corresponding parsing module based on that result; the feature recognition module 300 extracts firmware header information, kernel code, and file system features to determine the firmware type, and quickly locates key components based on the firmware type; the firmware extraction module 400 parses the firmware structure and extracts the bootloader, kernel code, and root file system information based on the firmware type and the locations of the key components; the vulnerability scanning module 500 identifies known and unknown vulnerabilities based on fingerprint matching, behavioral modeling, and multimodal program semantic modeling, and dynamically optimizes the scanning strategy during scanning; and the vulnerability remediation suggestion module 600 generates a vulnerability list sorted by risk level and provides specific remediation suggestions based on the environment.
[0033] It is understood that this application embodiment decouples the processing flow through architecture identification and module plug-in mechanism, analyzes firmware magic number, header information and structural features to dynamically load parsing modules, adapts to various domestic firmware formats and processor architectures, improves parsing adaptability and efficiency, and meets the processing needs of firmware with heterogeneous architecture and diverse formats; it integrates detection methods such as fingerprint matching, behavior modeling and multimodal program semantic modeling, maintains semantic invariance through unified modeling of dual semantic coding branches and dual constraints, and achieves unknown vulnerability identification and accurate location based on deviation detection of normal program semantic distribution, enhancing the ability to discover complex logical defects and unknown risk paths; it designs a vulnerability report and remediation suggestion generation mechanism, combines deployment architecture and module features to output structured and executable remediation information, facilitates rapid response and closed-loop remediation, and improves the engineering value and practical security operation and maintenance utility of the detection system. Thus, it solves the problems of poor detection capability and low efficiency in existing technologies.
[0034] In this embodiment, the data acquisition module 100 includes: a sample access unit, a legality verification unit, and a metadata management unit.
[0035] The sample access unit can use three access methods: local file upload, remote device firmware retrieval, and batch import via API interface to access multi-source firmware samples of the product; the legality verification unit verifies the integrity of the sample based on MD5 / SHA256 hash values, filters invalid non-firmware files, and matches trusted source whitelists; the metadata management unit extracts device model, manufacturer information, and firmware version metadata and enters them into the system.
[0036] It is understood that the embodiments of this application use three access methods—local file upload, remote device firmware retrieval, and batch import via API interface—to adapt to firmware acquisition needs in different scenarios and ensure the comprehensiveness and flexibility of multi-source sample acquisition. The legality verification unit verifies sample integrity based on MD5 / SHA256 hash values, filters invalid non-firmware files, and matches a trusted source whitelist to ensure the validity and reliability of samples and avoid invalid data or malicious samples interfering with the analysis process. The metadata management unit extracts metadata such as device model, manufacturer information, and firmware version and enters it into the system to achieve standardized archiving of firmware samples, providing basic support for subsequent classification analysis and version tracing.
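The whitelist check performed by the legality verification unit can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `TRUSTED_SHA256` set and the function names are hypothetical, and only SHA-256 is shown because MD5 is known to be vulnerable to practical collision attacks.

```python
import hashlib
from pathlib import Path

# Hypothetical whitelist of SHA-256 digests published by a trusted source.
TRUSTED_SHA256 = {
    hashlib.sha256(b"example firmware image").hexdigest(),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large images are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted(path: Path, whitelist=TRUSTED_SHA256) -> bool:
    """Return True only if the sample's digest appears in the whitelist."""
    return sha256_of(path) in whitelist
```

A sample whose digest is absent from the whitelist is not necessarily malicious; in this sketch it is simply not accepted as coming from a trusted source.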
[0037] In this embodiment, the firmware preprocessing module 200 includes: a format identification unit, an architecture identification unit, and a parsing module loading unit.
[0038] The format identification unit is used to identify the firmware format type by analyzing the firmware header information, magic number field and structural features; the architecture identification unit is used to identify the processor instruction set architecture that the firmware is adapted to; the parsing module loading unit is used to dynamically load the parsing module that matches the identification result, which is encapsulated by dynamic link library and supports plugin registration mechanism.
[0039] It is understood that, in the embodiments of this application, the format identification unit determines the firmware format type by analyzing the firmware header information, magic number field, and structural features, and the architecture identification unit identifies the processor instruction set architecture the firmware targets, providing the basis for matching a parsing module. The parsing module loading unit dynamically loads the matching parsing module based on the identification result. Because parsing modules are encapsulated as dynamic link libraries and registered through a plugin mechanism, support for new formats and architectures can be added without modifying the core code, which improves the system's multi-platform adaptability and extensibility and provides efficient and stable support for subsequent feature extraction and firmware parsing.
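The identify-then-dispatch flow can be illustrated with a small sketch. The magic numbers and ELF `e_machine` values below are well-known constants, but the registry (`PARSERS`, `register_parser`, `preprocess`) is a hypothetical in-process stand-in for the dynamic-link-library plugins described above.

```python
import struct
from typing import Callable, Dict, Optional

# Well-known magic numbers; a real system would cover many more formats.
MAGIC_TABLE = {
    b"\x7fELF": "elf",
    b"\x27\x05\x19\x56": "uimage",   # legacy U-Boot image header
    b"hsqs": "squashfs",             # little-endian SquashFS superblock
}

# A few e_machine values defined for the ELF header.
ELF_MACHINES = {8: "mips", 40: "arm", 62: "x86_64", 183: "aarch64", 258: "loongarch"}

# Hypothetical plugin registry: format name -> parsing function.
PARSERS: Dict[str, Callable[[bytes], dict]] = {}


def register_parser(fmt: str):
    """Decorator that registers a parsing plugin for one firmware format."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap


def identify_format(blob: bytes) -> Optional[str]:
    """Match the leading bytes of the sample against the magic table."""
    for magic, name in MAGIC_TABLE.items():
        if blob.startswith(magic):
            return name
    return None


@register_parser("elf")
def parse_elf(blob: bytes) -> dict:
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if blob[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 of the ELF header.
    (machine,) = struct.unpack_from(endian + "H", blob, 18)
    return {"format": "elf", "arch": ELF_MACHINES.get(machine, f"unknown({machine})")}


def preprocess(blob: bytes) -> dict:
    """Identify the format, then dispatch to the registered plugin."""
    fmt = identify_format(blob)
    if fmt is None or fmt not in PARSERS:
        raise ValueError("no parsing plugin registered for this sample")
    return PARSERS[fmt](blob)
```

Adding support for another format in this sketch means registering one more function with `@register_parser`; `preprocess` itself does not change, which mirrors the stated goal of extending the system without modifying core code.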
[0040] In this embodiment, the feature recognition module 300 includes: a header information extraction unit, a kernel feature analysis unit, a file system recognition unit, and a component location and type judgment unit.
[0041] The header information extraction unit is used to extract key fields such as version number, checksum, and load base address; the kernel feature analysis unit is used to extract kernel code features such as function entry addresses and call relationships based on static analysis; the file system recognition unit is used to identify the file system type and structure based on storage structure features and metadata; and the component location and type judgment unit is used to locate key components such as the bootloader and driver modules and determine the firmware type.
[0042] It is understood that, in this embodiment of the application, the header information extraction unit extracts key fields such as version number, checksum, and load base address to provide the basic configuration for firmware parsing; the kernel feature analysis unit extracts kernel code features such as function entry addresses and call relationships based on static analysis, exposing the core execution logic; the file system recognition unit identifies the file system type and structure based on storage structure features and metadata, clarifying how the firmware's storage is organized; and the component location and type judgment unit quickly locates key components such as the bootloader and driver modules and determines the firmware type, focusing analysis on the core objects. Together, these units extract the firmware's core features and key information, provide accurate data for subsequent firmware structure parsing, core component extraction, and vulnerability scanning, and make the subsequent steps more targeted and efficient.
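As one concrete instance of header field extraction, the sketch below parses a legacy U-Boot uImage header, which carries a name string, checksums, and a load address. The choice of uImage is an assumption for illustration (the source does not name a specific header format), and `UImageHeader` and `parse_uimage_header` are hypothetical names.

```python
import struct
import zlib
from typing import NamedTuple

UIMAGE_MAGIC = 0x27051956
HEADER_FMT = ">7I4B32s"                    # big-endian fields, 64 bytes in total
HEADER_SIZE = struct.calcsize(HEADER_FMT)


class UImageHeader(NamedTuple):
    header_crc: int
    timestamp: int
    data_size: int
    load_addr: int
    entry_point: int
    data_crc: int
    name: str
    header_crc_ok: bool


def parse_uimage_header(blob: bytes) -> UImageHeader:
    if len(blob) < HEADER_SIZE:
        raise ValueError("sample is shorter than a uImage header")
    (magic, hcrc, time_, size, load, entry, dcrc,
     _os, _arch, _type, _comp, raw_name) = struct.unpack_from(HEADER_FMT, blob)
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage")
    # The header CRC is computed over the header with the CRC field zeroed.
    zeroed = blob[:4] + b"\x00\x00\x00\x00" + blob[8:HEADER_SIZE]
    crc_ok = (zlib.crc32(zeroed) & 0xFFFFFFFF) == hcrc
    name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return UImageHeader(hcrc, time_, size, load, entry, dcrc, name, crc_ok)
```

A failed header checksum is reported through `header_crc_ok` rather than raised, so a caller can still inspect the remaining fields of a damaged sample.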
[0043] In this embodiment, the firmware extraction module 400 includes a structure parsing unit, a bootloader extraction unit, a kernel module extraction unit, and a file system extraction unit.
[0044] The structure parsing unit is used to determine the dependencies and layout of each module; the bootloader extraction unit is used to extract the startup code and core functions of the initialization process; the kernel module extraction unit is used to generate code through disassembly and build a control flow graph; and the file system extraction unit is used to build a complete root file system structure and output it in a structured manner.
[0045] It is understood that the embodiments of this application clarify module dependencies and layout through the structure parsing unit, laying a logical foundation for extracting core components; the bootloader extraction unit extracts core functions such as startup code and initialization process, locking in the key logic of firmware startup; the kernel module extraction unit generates code through disassembly and constructs a control flow graph, presenting the kernel execution path and logical association; the file system extraction unit constructs a complete root file system structure and outputs it in a structured manner, restoring the firmware storage organization and file resources, transforming complex firmware into standardized, analyzable structured data, providing high-quality analysis objects for the vulnerability scanning module to accurately locate vulnerability locations and analyze vulnerability causes, thereby improving the accuracy and efficiency of vulnerability detection.
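The control flow graph construction performed by the kernel module extraction unit can be sketched with the classic leader-based algorithm. The instruction tuples and the mnemonics `jmp`, `br_cond`, and `ret` are a simplified, hypothetical intermediate form; a real system would obtain them from a disassembler for the identified architecture.

```python
from typing import Dict, List, Optional, Set, Tuple

Instr = Tuple[int, str, Optional[int]]   # (address, mnemonic, branch target or None)


def build_cfg(instrs: List[Instr]):
    """Split an address-ordered instruction list into basic blocks and connect them."""
    addrs = [addr for addr, _, _ in instrs]
    leaders: Set[int] = {addrs[0]}
    for i, (_, mnem, target) in enumerate(instrs):
        if mnem in ("jmp", "br_cond", "ret"):
            if target is not None:
                leaders.add(target)           # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(addrs[i + 1])     # so does the instruction after a branch

    blocks: Dict[int, List[Instr]] = {}
    current = addrs[0]
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)

    starts = sorted(blocks)
    edges: Set[Tuple[int, int]] = set()
    for idx, start in enumerate(starts):
        _, mnem, target = blocks[start][-1]
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        if mnem in ("jmp", "br_cond") and target in blocks:
            edges.add((start, target))        # taken-branch edge
        if mnem not in ("jmp", "ret") and nxt is not None:
            edges.add((start, nxt))           # fall-through edge
    return blocks, edges
```

Blocks are keyed by the address of their first instruction, so the returned edge set can be used directly as the edge list of a graph whose nodes are basic blocks.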
[0046] For example, as shown in Figure 2, when a company conducted a security assessment of the firmware of domestically produced office terminals, it collected firmware samples from over 500 customized domestic terminals. The firmware extraction module proceeded as follows. First, the structure parsing unit parsed the firmware structure to clarify the dependencies and layout of each module. After successful parsing, the bootloader extraction unit extracted the bootloader, including the firmware's startup code and the core functions of the initialization process. Subsequently, the kernel module extraction unit extracted the kernel module, generated code through disassembly, and constructed a control flow graph presenting the kernel execution paths and logical connections. Next, the file system extraction unit extracted the file system, restoring the complete root file system structure. Finally, structured output was completed, and all extracted content was organized and archived in a standardized format. Throughout the process, only 3 complex customized firmware samples failed to be parsed, giving an effective extraction rate of 99.4%. This provided structured, readily usable data for the subsequent vulnerability scanning module to locate vulnerabilities in core components, improving the efficiency and accuracy of the firmware security assessment.
[0047] In this embodiment, the vulnerability scanning module 500 includes a known vulnerability identification unit, a behavior modeling and detection unit, a multimodal semantic modeling unit, and a scanning strategy optimization unit.
[0048] Among them, the known vulnerability identification unit is used to perform fingerprint identification using the signature rules of the vulnerability feature database; the behavior modeling and detection unit is used to construct a finite state machine model to identify abnormal behavior patterns; the multimodal semantic modeling unit is used to perform cross-architecture semantic modeling through instruction sequence semantic encoding and graph structure semantic encoding; and the scanning strategy optimization unit is used to dynamically adjust the scanning strategy according to the firmware architecture and module complexity.
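Fingerprint identification against signature rules can be sketched as a pattern search over the extracted bytes. The rule identifiers and byte patterns below are invented placeholders, not entries from any real vulnerability database.

```python
import re
from typing import Dict, List

# Hypothetical signature rules: rule identifier -> byte pattern marking an affected component.
SIGNATURES: Dict[str, "re.Pattern[bytes]"] = {
    "DEMO-0001": re.compile(rb"examplehttpd/1\.[0-3]\."),   # made-up vulnerable version range
    "DEMO-0002": re.compile(rb"admin:admin"),               # made-up hard-coded credential
}


def match_fingerprints(blob: bytes) -> List[str]:
    """Return the identifiers of all signature rules that match the firmware bytes."""
    return sorted(rule_id for rule_id, pattern in SIGNATURES.items() if pattern.search(blob))
```

Such matching is fast but only finds what the rule set already describes, which is why the system pairs it with behavior modeling and semantic modeling.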
[0049] It is understood that, in this embodiment of the application, the known vulnerability identification unit uses the signature rules of the vulnerability feature database to perform fingerprint identification, quickly matching known vulnerabilities; the behavior modeling and detection unit constructs a finite state machine model to efficiently capture abnormal behavior patterns such as memory access and system calls, making up for the shortcomings of static feature matching; the multimodal semantic modeling unit conducts cross-architecture semantic modeling through instruction sequence semantic encoding and graph structure semantic encoding, breaking through the limitations of different processor architectures and compilation conditions, and accurately identifying potential unknown vulnerabilities; the scanning strategy optimization unit dynamically adjusts the allocation of scanning resources and detection priority according to the firmware architecture and module complexity, balancing detection efficiency and accuracy, comprehensively covering the vulnerability types of firmware on multiple platforms, significantly reducing the false positive and false negative rates, and improving the efficiency and adaptability of batch scanning.
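The finite state machine approach of the behavior modeling and detection unit can be illustrated with a toy model. The states, events, and transition table below are hypothetical; the source only states that a finite state machine is used to recognize abnormal patterns in behavior such as memory access and system calls.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model of allowed behavior: (state, event) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "write"): "opened",
    ("opened", "close"): "idle",
}


def find_anomalies(events: Iterable[str], start: str = "idle") -> List[Tuple[int, str, str]]:
    """Return (index, state, event) for every event the model does not allow."""
    state = start
    anomalies = []
    for i, event in enumerate(events):
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            anomalies.append((i, state, event))   # disallowed: record it, stay in place
        else:
            state = nxt
    return anomalies
```

For instance, a `write` that arrives after the handle has been closed is reported with its position in the trace, which gives an analyst a starting point for locating the offending code path.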
[0050] It should be noted that instruction sequence semantic encoding represents the disassembled instruction sequence of a firmware kernel module or target function as an instruction feature sequence $X = \{x_1, x_2, \ldots, x_n\}$, where $X$ is the input sequence containing $n$ elements, $x_1$ is the first element of $X$, $x_2$ is the second element, and $x_n$ is the $n$-th element. Each instruction feature includes at least the opcode type, operand field category, and control attribute information.
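A minimal sketch of turning disassembled text into such instruction features follows. The mnemonic table, the operand syntax (`[r1]` for memory, decimal or `0x` literals for immediates), and the names `InstrFeature` and `encode` are assumptions chosen to be architecture-neutral; they do not correspond to any particular instruction set.

```python
from typing import List, NamedTuple, Tuple


class InstrFeature(NamedTuple):
    opcode_type: str                 # e.g. "arith", "load_store", "branch"
    operand_kinds: Tuple[str, ...]   # "reg", "imm" or "mem" for each operand
    is_control: bool                 # True if the instruction can redirect control flow


# Hypothetical, deliberately tiny opcode table.
OPCODE_TYPES = {
    "add": "arith", "sub": "arith",
    "ld": "load_store", "st": "load_store",
    "beq": "branch", "jmp": "branch", "ret": "branch",
}


def operand_kind(op: str) -> str:
    if op.startswith("[") and op.endswith("]"):
        return "mem"
    if op.lstrip("-").isdigit() or op.lower().startswith("0x"):
        return "imm"
    return "reg"


def encode(line: str) -> InstrFeature:
    """Map one line of disassembly to its (opcode type, operand kinds, control) feature."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",") if o.strip()]
    op_type = OPCODE_TYPES.get(mnemonic, "other")
    kinds = tuple(operand_kind(o) for o in operands)
    return InstrFeature(op_type, kinds, op_type == "branch")


def encode_sequence(lines: List[str]) -> List[InstrFeature]:
    return [encode(line) for line in lines]
```

Abstracting concrete registers and constants into categories is what lets the same feature vocabulary be shared across processor architectures.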
[0051] The instruction sequence semantic encoding preferably employs a sequence encoding network based on the Transformer architecture to model the contextual semantics of the instruction feature sequence. An embedding layer maps the instruction features into vector representations, and a self-attention mechanism characterizes the semantic relationships between different instructions, generating the semantic representation of the instruction sequence as shown in the formula: $h_{seq} = f_{seq}(X)$, where $h_{seq}$ is the feature vector output by the sequence encoder, $f_{seq}$ is the sequence encoding function, and $X$ is the input sequence. The self-attention calculation is shown in the formula: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$, where $\mathrm{Attention}$ is the attention function; $Q$ is the query matrix; $K$ is the key matrix; $V$ is the value matrix; $\mathrm{softmax}$ is the normalized exponential function used to calculate the attention weights; $QK^{T}$ is the product of the query matrix and the transpose of the key matrix; $d_k$ is the dimension of the key vectors; and $\sqrt{d_k}$, the square root of the key vector dimension, scales the attention scores.
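The scaled dot-product self-attention computation named in this paragraph can be sketched in a few lines. The example below is a plain-Python illustration using only the standard library, not the encoder of the embodiment; the function names `softmax` and `attention` and the toy matrices are assumptions chosen for the example.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Normalized exponential; subtracting the max keeps exp() numerically stable."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V row by row."""
    d_k = len(K[0])  # dimension of each key vector
    output: Matrix = []
    for q in Q:
        # Scaled similarity of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


# A zero query scores all keys equally, so the result is the mean of the values.
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```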
[0052] Graph structure semantic encoding branch
[0053] With basic blocks as graph nodes and control jump relationships and data dependencies as graph edges, a graph structure representation $G = (V, E)$ is formed, where $G$ is the graph data structure, $V$ is the set of vertices (nodes) in the graph, and $E$ is the set of edges in the graph. The graph structure semantic encoding branch aggregates the features of basic block nodes together with their control dependencies and data dependencies to generate the graph structure semantic representation shown in the formula: $h_{graph} = f_{graph}(G)$, where $h_{graph}$ is the feature vector output by the graph encoder, $f_{graph}$ is the graph encoding function, and $G$ is the input graph data structure.
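The description does not specify the graph encoder, so the following is only a sketch under stated assumptions: it substitutes simple mean-aggregation message passing followed by mean pooling for whatever encoder the embodiment uses. Basic blocks are nodes with hypothetical numeric feature vectors, and control or data dependencies are edges treated as undirected.

```python
from typing import Dict, List, Tuple


def encode_graph(features: Dict[str, List[float]],
                 edges: List[Tuple[str, str]],
                 rounds: int = 1) -> List[float]:
    """Mean-aggregate neighbor features for `rounds` steps, then mean-pool all nodes."""
    # Build undirected neighbor lists from the dependency edges.
    neighbors: Dict[str, List[str]] = {node: [] for node in features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    hidden = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        updated = {}
        for node, vec in hidden.items():
            # Each node averages its own vector with those of its neighbors.
            group = [vec] + [hidden[m] for m in neighbors[node]]
            updated[node] = [sum(col) / len(group) for col in zip(*group)]
        hidden = updated

    # Readout: average over all node vectors to get one graph-level vector.
    return [sum(col) / len(hidden) for col in zip(*hidden.values())]


blocks = {"entry": [1.0, 0.0], "exit": [0.0, 1.0]}
print(encode_graph(blocks, [("entry", "exit")]))
```

A learned graph neural network would replace the fixed averaging with trainable weights; the data flow from node features and edges to a single vector is the part this sketch is meant to show.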
[0054] Semantic alignment constraints and cross-architecture consistency constraints
[0055] To map the instruction sequence semantic representation $h_{seq}$ and the graph structure semantic representation $h_{graph}$ into a unified semantic space, a semantic alignment constraint is introduced to keep the two consistent, in the form shown in the formula: $L_{align} = \| h_{seq} - h_{graph} \|_2$, where $L_{align}$ is the alignment loss measuring the difference between sequence features and graph features; $h_{seq}$ is the feature vector output by the sequence encoder; $h_{graph}$ is the feature vector output by the graph encoder; and $\|\cdot\|_2$ is the L2 norm (Euclidean distance) used to calculate the distance between two vectors. Furthermore, to ensure that the same functional code yields consistent semantic vectors under different processor architectures or compilation conditions, a cross-architecture consistency constraint is introduced on the semantic representations $z_a$ and $z_b$ of the same functional code under two different conditions, as shown in the formula: $L_{cons} = \| z_a - z_b \|_2$, where $L_{cons}$ is the consistency loss measuring the feature difference between different augmented versions of the same input; $z_a$ is the feature vector corresponding to input $a$; $z_b$ is the feature vector corresponding to input $b$; and $\|\cdot\|_2$ is the L2 norm. The same functional code can be determined by matching function names, call relationships, control flow structures, or key constants. Through the above semantic alignment constraints and cross-architecture consistency constraints, the program's semantic vector maintains semantic invariance in cross-platform scenarios.
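The two constraints in this paragraph are both L2 distances between feature vectors, which the following sketch computes directly. The variable names (`h_seq`, `h_graph`, `z_a`, `z_b`) are placeholder names for the vectors described above, and the weighted sum in `total_loss` with weights `w_align` and `w_cons` is an assumption: the description does not state how the two losses are combined.

```python
import math
from typing import Sequence


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def alignment_loss(h_seq: Sequence[float], h_graph: Sequence[float]) -> float:
    """Distance between the sequence-branch and graph-branch vectors of one function."""
    return l2_distance(h_seq, h_graph)


def consistency_loss(z_a: Sequence[float], z_b: Sequence[float]) -> float:
    """Distance between vectors of the same function built for two architectures."""
    return l2_distance(z_a, z_b)


def total_loss(h_seq, h_graph, z_a, z_b, w_align: float = 1.0, w_cons: float = 1.0) -> float:
    """Hypothetical weighted combination of the two constraints."""
    return w_align * alignment_loss(h_seq, h_graph) + w_cons * consistency_loss(z_a, z_b)


print(alignment_loss([0.0, 0.0], [3.0, 4.0]))
```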
[0056] Anomaly detection and localization based on normal program semantic distribution
[0057] Based on multiple known normal firmware samples, historical stable firmware versions, or firmware modules that have been manually confirmed to be free of vulnerabilities, their program semantic vector sets are extracted to construct a normal program semantic distribution. For the semantic vector $z$ of the program to be detected, its deviation from the normal semantic distribution is calculated as shown in the formula: $D(z) = d(z, \mathcal{N})$, where $D(z)$ is the discriminant (distance) function; $z$ is the input feature vector; $\mathcal{N}$ is the normal feature set, that is, the set of semantic vectors extracted from normal samples; and $d(\cdot, \cdot)$ is the distance function used to calculate the distance between $z$ and $\mathcal{N}$. When the deviation exceeds a preset threshold, the functions or basic blocks associated with the semantic vector $z$ are identified as suspicious locations, and further filtering is performed based on abnormal behavior patterns. The alert results for potential unknown vulnerabilities and their corresponding function or basic block location information are then output.
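A minimal sketch of the deviation check follows. It assumes the distance from a vector to the normal set is the Euclidean distance to the nearest normal vector; the description leaves the distance function open, so this choice, the function names, and the threshold value are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def deviation(z: Sequence[float], normal_set: List[Sequence[float]]) -> float:
    """Assumed distance to the normal set: distance to the closest normal vector."""
    return min(math.dist(z, n) for n in normal_set)


def flag_suspicious(candidates: Dict[str, Sequence[float]],
                    normal_set: List[Sequence[float]],
                    threshold: float) -> List[Tuple[str, float]]:
    """Return (function name, deviation) pairs above the threshold, largest first."""
    flagged = []
    for name, vector in candidates.items():
        score = deviation(vector, normal_set)
        if score > threshold:
            flagged.append((name, score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


normal = [[0.0, 0.0], [1.0, 0.0]]
functions = {"parse_header": [0.5, 0.0], "decrypt_blob": [4.0, 4.0]}
print(flag_suspicious(functions, normal, threshold=1.0))
```

Keeping the function name next to each score mirrors the requirement that alerts carry function or basic block location information.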
[0058] For example, as shown in Figure 3, a financial company conducted a firmware security test, collecting firmware samples from over 800 servers, network devices, and terminals, including a large number of customized firmware images with obfuscated code. The test proceeded as follows: first, module information was extracted. After successful extraction, known vulnerability characteristics were matched using fingerprint recognition, and the samples were then passed to the behavior modeling and detection unit. This unit constructed a finite state machine model fitted to the actual working logic of each type of firmware, simulating the entire process of firmware startup, network communication, data storage, and so on, and capturing memory access patterns, system call sequences, and abnormal interrupt response paths in real time. After behavior modeling detection was complete, deep learning detection was carried out for further risk verification. Meanwhile, the scanning strategy optimization unit dynamically adjusted the scanning strategy based on the firmware architecture and the current module complexity. During detection, the behavior modeling and detection unit identified 32 abnormal behavior patterns that deviated from the normal flow: 15 abnormal operations attempting to modify firmware configuration files after unauthorized access, 10 high-risk system call chains exceeding the normal range, and 7 abnormal memory read/write behaviors. Ultimately, it correlated and located 18 potential vulnerabilities that were not detected during fingerprint recognition, achieving a detection accuracy of over 95%. This unit effectively compensates for the shortcomings of static fingerprint matching in handling code obfuscation and unknown attack paths. 
Combined with deep learning detection and strategy optimization within the process, it builds a dynamic protection barrier for firmware security, significantly reducing the risk of malicious exploitation of abnormal behavior to intrude into the system.
[0059] In this embodiment, the vulnerability remediation suggestion module 600 includes a risk level classification unit, a remediation scheme generation unit, and a report output unit.
[0060] The risk level classification unit is used to classify vulnerabilities into high, medium, and low risks and generate a structured list; the remediation solution generation unit is used to provide patch updates and configuration hardening suggestions based on the deployment environment; and the report output unit is used to output a standardized vulnerability assessment report.
[0061] It is understood that the embodiments of this application accurately classify vulnerabilities into high, medium, and low risks and generate a structured list through a risk level classification unit, clarifying the priority of vulnerability harm and helping security personnel focus on prioritizing the handling of core risks; the remediation solution generation unit deeply integrates with the product's hardware architecture, operating system, and other deployment environments, providing targeted suggestions such as patch updates and configuration hardening, ensuring that the remediation solution is highly adaptable and executable; the report output unit outputs a standardized vulnerability assessment report containing vulnerability details, risk analysis, and remediation guidelines, facilitating the security team to track remediation progress and archive management, significantly improving the efficiency and targeting of firmware vulnerability remediation, reducing remediation costs, and providing strong support for security audits and compliance checks, ensuring the implementation of a closed-loop firmware security protection system.
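As a rough sketch of the risk level classification unit and the structured list it produces, the example below scores each finding and sorts the list so that high-risk items come first. The scoring formula (exploitability multiplied by impact), the 0 to 1 scales, and the thresholds are hypothetical; the description only states that vulnerabilities are classified as high, medium, or low.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Finding:
    vuln_id: str
    component: str
    exploitability: float  # assumed scale: 0 (hard to exploit) to 1 (trivial)
    impact: float          # assumed scale: 0 (negligible) to 1 (severe)


RANK: Dict[str, int] = {"high": 0, "medium": 1, "low": 2}


def risk_level(finding: Finding) -> str:
    """Map a hypothetical score to a level; the thresholds are illustrative."""
    score = finding.exploitability * finding.impact
    if score >= 0.5:
        return "high"
    if score >= 0.2:
        return "medium"
    return "low"


def build_vulnerability_list(findings: List[Finding]) -> List[dict]:
    """Produce a structured list ordered by risk level, then by descending score."""
    rows = [{"id": f.vuln_id, "component": f.component, "risk": risk_level(f),
             "score": f.exploitability * f.impact} for f in findings]
    rows.sort(key=lambda row: (RANK[row["risk"]], -row["score"]))
    return rows


sample = [Finding("V-003", "logger", 0.1, 0.5),
          Finding("V-001", "bootloader", 0.9, 0.9),
          Finding("V-002", "net-driver", 0.5, 0.5)]
for row in build_vulnerability_list(sample):
    print(row["id"], row["risk"])
```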
[0062] The multi-platform firmware parsing and vulnerability scanning system proposed in this application decouples the processing flow through architecture identification and module plug-in mechanisms. It analyzes the firmware magic number, header information, and structural features to dynamically load parsing modules, adapting to various domestic firmware formats and processor architectures, thus improving parsing adaptability and efficiency and meeting the processing needs of firmware with heterogeneous architectures and diverse formats. It integrates detection methods such as fingerprint matching, behavioral modeling, and multimodal program semantic modeling. Through unified modeling with dual semantic coding branches and dual constraints, it maintains semantic invariance and achieves unknown vulnerability identification and accurate location based on deviation detection of normal program semantic distribution, enhancing the ability to discover complex logical defects and unknown risk paths. A vulnerability report and remediation suggestion generation mechanism is designed, combining deployment architecture and module features to output structured and executable remediation information, facilitating rapid response and closed-loop remediation, and improving the engineering value and practical security operation and maintenance usability of the detection system. Therefore, it solves the problems of poor detection capabilities and low efficiency in existing technologies.
[0063] The multi-platform firmware parsing and vulnerability scanning system is illustrated below through a specific embodiment, as shown in Figure 4:
[0064] A security vendor's analysis project required comprehensive security testing of the firmware of a total of 180 devices of various types, including 100 servers, 50 network devices, and 30 embedded terminals. To achieve efficient and accurate security assessment, the vendor introduced the multi-platform firmware analysis and vulnerability scanning system described in this application. This system aims to achieve fully automated detection and closed-loop remediation of various firmware samples, from collection, analysis, feature extraction, vulnerability scanning to the generation of remediation suggestions, thereby comprehensively improving the security baseline of the infrastructure.
[0065] The system deployment employs a hybrid architecture of local servers and distributed nodes to balance centralized management with parallel processing efficiency. The local server, serving as the core control and data hub, is equipped with two Phytium FT-2000 / 4 processors, 64GB of memory, and 2TB of SSD storage, used for deploying the system's core management, scheduling, and database modules. Simultaneously, three functionally defined distributed nodes are set up, each dedicated to data preprocessing, deep vulnerability scanning, and report generation tasks, respectively. All nodes are interconnected via an internal high-speed LAN, ensuring communication latency is controlled within 10 milliseconds, guaranteeing real-time data flow and task collaboration across nodes. The system collects raw firmware samples using three methods adapted to different scenarios: for 100 servers, batch import of firmware versions is performed by calling the device management API interface; for 50 network devices, firmware images are retrieved remotely via a secure protocol; and for 30 embedded terminals, firmware files are received from local storage via the file upload function of the management interface. After initial collection, the legitimacy verification unit immediately activated, performing rigorous SHA256 hash value integrity checks on all 180 initial samples, successfully eliminating two incomplete samples due to transmission interruptions. Subsequently, using techniques such as file header feature analysis, five incorrectly submitted non-firmware files were filtered out. Finally, the remaining samples were compared with a pre-defined whitelist of trusted vendor sources to ensure the compliance of all firmware sources to be analyzed. After this triple verification, 173 valid firmware samples were ultimately retained for subsequent processes. 
The metadata management unit worked synchronously, automatically extracting key attributes for each device from the firmware header or auxiliary information, such as model, manufacturer, and firmware version number, generating a standardized sample ledger and persistently recording it into the system database, providing a foundation for end-to-end tracking and traceability.
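The triple verification described above (SHA256 integrity check, non-firmware filtering by file header, trusted-source whitelist) can be sketched as a short pipeline. The function names, the list of header signatures, and the vendor names are assumptions for illustration; only `hashlib.sha256` is a real library call, and the signature list is far from complete.

```python
import hashlib
from typing import Set

# Illustrative header signatures only: SquashFS (little-endian), ELF, U-Boot uImage.
KNOWN_HEADERS = (b"hsqs", b"\x7fELF", b"\x27\x05\x19\x56")


def looks_like_firmware(data: bytes) -> bool:
    """Accept a sample only if it starts with one of the known signatures."""
    return data.startswith(KNOWN_HEADERS)


def validate_sample(data: bytes, expected_sha256: str, vendor: str,
                    trusted_vendors: Set[str]) -> str:
    """Run the three checks in order and report the first failure."""
    if hashlib.sha256(data).hexdigest() != expected_sha256.lower():
        return "rejected: hash mismatch"
    if not looks_like_firmware(data):
        return "rejected: not firmware"
    if vendor not in trusted_vendors:
        return "rejected: untrusted source"
    return "accepted"


image = b"hsqs" + bytes(16)
digest = hashlib.sha256(image).hexdigest()
print(validate_sample(image, digest, "VendorA", {"VendorA"}))
```

Checking the hash first means truncated transfers are discarded before any content inspection, matching the order in which the samples are eliminated in the embodiment.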
[0066] The firmware preprocessing module first performs batch format identification on 173 valid samples. The format identification unit uses various techniques such as file signature analysis and magic number comparison to quickly and accurately identify the sample's packaging format. Statistical results show that 120 firmware samples are packaged using the SquashFS compressed file system format, 30 use the common Ext4 disk image format, and 23 are proprietary packaging formats developed by the device manufacturers themselves. This identification result forms the basis for subsequent selection of the correct parser. The architecture identification unit analyzes the instruction set characteristics, entry point code patterns, and architecture identification information in the firmware binary code to determine the target processor architecture compiled for each firmware. In this batch of samples, 80 firmware samples are identified as based on the ARMv8 instruction set, 50 as based on the LoongArch instruction set, and 43 as based on the Shenwei SW64 instruction set. Accurate architecture identification is crucial for subsequent disassembly, code analysis, and vulnerability feature matching. Based on the format and architecture identification results, the parsing module loading unit begins its work. The system maintains a repository containing 20 registered parsing modules, which are optimized for different combinations of formats and architectures. Based on the identification results of the first two steps, the loading unit automatically matches and dynamically loads the most suitable parsing module for each sample. In this task, all 173 samples were successfully matched with the corresponding modules, achieving a 100% loading success rate. This process is highly automated, with the average preprocessing time per sample controlled within 1 second, providing performance assurance for large-scale batch processing. 
The successfully loaded parsing module then becomes the core engine for interacting with the firmware sample, responsible for the initial unpacking and structural parsing of the firmware binary content, and initiating in-depth feature recognition. Feature recognition begins with extracting key header information, such as version number, checksum, and load base address. The kernel feature analysis unit then extracts a large number of function-level features through static analysis. Simultaneously, the file system identification unit confirms the file system type distribution, and the component location and type judgment unit further locates key components and comprehensively determines different types such as BIOS, UEFI, and embedded device firmware, completing the initial profile of the firmware sample.
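Magic number comparison, one of the format identification techniques mentioned above, can be illustrated as follows. The sketch checks two well-known signatures: the SquashFS superblock magic `hsqs` at offset 0, and the ext2/3/4 superblock magic `0xEF53` stored little-endian at byte offset 1080. Because that magic is shared by the whole ext family, the sketch reports `ext-family` rather than claiming Ext4; telling the versions apart would require reading superblock feature flags. The function name and return labels are assumptions.

```python
def identify_format(image: bytes) -> str:
    """Classify a firmware image by well-known magic numbers."""
    # SquashFS (little-endian variant) begins with the ASCII bytes "hsqs".
    if image[:4] == b"hsqs":
        return "squashfs"
    # ext2/3/4: the superblock starts at byte 1024 and s_magic sits 0x38 bytes in.
    if len(image) >= 1082 and image[1080:1082] == b"\x53\xef":
        return "ext-family"
    # Anything else would go to vendor-specific detection, not shown here.
    return "proprietary-or-unknown"


squash = b"hsqs" + bytes(100)
ext = bytearray(2048)
ext[1080:1082] = b"\x53\xef"
print(identify_format(squash), identify_format(bytes(ext)), identify_format(bytes(10)))
```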
[0067] After initial feature extraction, the system enters a deeper firmware analysis and vulnerability scanning phase. The firmware extraction module is activated, with its structure analysis unit dedicated to understanding the dependencies and control flow between modules and components within the firmware, generating a module dependency graph. The bootloader extraction unit specifically extracts and analyzes the system's startup code and initialization process. The kernel module extraction unit utilizes disassembly techniques to construct a detailed control flow graph. The file system extraction unit completely reconstructs and outputs a JSON-formatted list of the root file system, thus achieving a comprehensive deconstruction of the firmware's internal structure. Based on the results of this deep analysis, the vulnerability scanning module employs a multi-level, composite detection strategy to comprehensively uncover security flaws. The known vulnerability identification unit, as the first line of defense, relies on a vulnerability feature database containing over 5000 signature rules for fingerprint matching, efficiently identifying 32 known vulnerabilities, including 15 CVE-2023 series vulnerabilities and 17 vulnerabilities in domestic firmware. The second layer is the behavior modeling and detection unit, which simulates and identifies abnormal behavior patterns by constructing a finite state machine model, successfully discovering 8 suspicious patterns such as unauthorized access and abnormal file modification. The third layer is the multimodal semantic modeling unit, which uses deep learning technology to perform semantic analysis on the code, aiming to discover unknown vulnerabilities. This time, it successfully identified 5 unknown vulnerabilities, 2 of which are related to the encryption module. 
To improve scanning efficiency, the scanning strategy optimization unit dynamically adjusts parameters based on sample complexity, increasing the scanning priority of core modules by 50%, thus reducing the overall detection time from the estimated 12 hours to 8 hours.
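One way to picture the priority adjustment is a priority queue in which core modules receive a 50% boost, as the text states. Everything else in this sketch is an assumption: the description does not say how base priority is derived, so module complexity is used as a stand-in, and the module names are invented.

```python
import heapq
from typing import Dict, List


def schedule(modules: List[Dict], core_boost: float = 1.5) -> List[str]:
    """Order modules for scanning; core modules get their priority raised by 50%."""
    heap = []
    for order, module in enumerate(modules):
        priority = module["complexity"] * (core_boost if module["core"] else 1.0)
        # heapq is a min-heap, so negate the priority; `order` breaks ties stably.
        heapq.heappush(heap, (-priority, order, module["name"]))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


queue = schedule([
    {"name": "ui", "complexity": 10, "core": False},
    {"name": "bootloader", "complexity": 8, "core": True},
    {"name": "logger", "complexity": 3, "core": False},
])
print(queue)
```

With the boost, the bootloader (8 × 1.5 = 12) is scanned before the more complex but non-core UI module (10).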
[0068] After scanning, the process enters the vulnerability remediation and knowledge output phase. The vulnerability remediation suggestion module first activates the risk level classification unit. This unit weighs factors such as the difficulty of exploitation and the severity of impact to quantify and classify the discovered vulnerabilities: the 32 known vulnerabilities were divided into 8 high-risk, 15 medium-risk, and 9 low-risk, and the 5 unknown vulnerabilities were all assessed as medium-high risk. On this basis, a structured vulnerability list is generated, with high-risk vulnerabilities displayed at the top so that the handling priority is explicit. The remediation solution generation unit then provides actionable, customized remediation recommendations for each vulnerability according to the deployment environment. For the 8 high-risk vulnerabilities, installation of verified and compatible domestic patches is explicitly recommended, with detailed installation steps provided. For the 15 medium-risk vulnerabilities, specific configuration hardening solutions are mainly provided. For the 9 low-risk vulnerabilities, it is recommended to include them in subsequent firmware upgrade plans. Finally, the report output unit integrates the entire analysis process and its results, generating a summary report covering the 173 samples and a detailed single-device report for each device. Each report includes a vulnerability summary, detailed information, customized remediation recommendations, and supporting data, and can be exported in PDF format, providing security teams with a complete basis for decision-making and action.
[0069] Upon receiving the detailed report generated by the system, the security team immediately initiated remediation actions based on the risk prioritization and customized remediation recommendations. With clear prioritization guidelines and actionable steps, the security team efficiently completed the remediation of all high-risk vulnerabilities within one week, quickly eliminating the most pressing security threats. Subsequently, over the next two weeks, the team systematically advanced and completed the remediation and system hardening measures for all medium- and low-risk vulnerabilities.
[0070] In summary, this application's embodiments utilize a data acquisition and legality verification module to acquire firmware samples across multiple scenarios, providing compliant and valid input. The firmware preprocessing and feature recognition module, through multi-dimensional analysis, accurately identifies the format architecture and core components, significantly reducing detection failures caused by parsing and adaptation errors and lowering manual intervention costs. The deep analysis and multi-level scanning module improves the accuracy and efficiency of vulnerability discovery through multi-strategy detection and dynamic optimization, and, in conjunction with the scanning strategy optimization unit, adjusts the priority of core modules to reduce overall detection time. The vulnerability remediation and closed-loop management module generates customized solutions based on environment adaptation, balancing risk management and business continuity, reducing vulnerability exploitation probability and remediation costs, while avoiding the blindness of manual investigation and compatibility issues with remediation solutions. This enhances the infrastructure security baseline, stabilizes business operations, and reduces overall security risks.
[0071] Next, referring to the accompanying drawings, a multi-platform firmware parsing and vulnerability scanning method based on embodiments of this application is described.
[0072] As shown in Figure 5, this multi-platform firmware parsing and vulnerability scanning method includes the following steps:
[0073] In step S101, multi-source firmware sample data of the product is obtained.
[0074] It is understood that the embodiments of this application comprehensively acquire multi-source firmware sample data, covering firmware of different types and versions of products, to ensure the comprehensiveness of the detection, provide sufficient and diverse data sources for subsequent analysis, and avoid detection blind spots caused by single samples.
[0075] In step S102, the corresponding parsing module is dynamically loaded by receiving and identifying the format and processor architecture of the multi-source firmware sample data.
[0076] It is understood that the embodiments of this application, by receiving and identifying the format and processor architecture of multi-source firmware samples and dynamically loading the corresponding parsing modules, can not only accurately adapt to the diverse firmware formats and domestic processor architectures in the product, breaking the adaptation limitations of a single parsing tool, but also achieve flexible expansion of parsing capabilities through the dynamic loading mechanism. It can be compatible with new formats and new architecture firmware without modifying the core system code, and at the same time provide adaptability tool support for subsequent firmware feature extraction, structure parsing and other stages, greatly improving the compatibility and efficiency of multi-platform firmware processing and adapting to diverse deployment needs.
[0077] In step S103, based on the parsing module, firmware header information, kernel code and file system features are extracted, firmware type is determined, key component locations are quickly located based on firmware type, firmware structure is parsed and bootloader, kernel code and root file system information are extracted, and firmware scanning strategy is output.
[0078] Among them, the root file system information is the structure and attribute information of the file system in the firmware, starting from the root directory, including its type, directory layout, file list, and various file permission settings.
[0079] It is understood that, by extracting root file system information, this application embodiment clearly restores the storage organization logic and file resource distribution of the firmware, providing clear path support for accurately locating key components such as core configuration files and driver modules in the root directory, efficiently parsing the overall firmware structure, and completely extracting the bootloader and kernel code associated files. At the same time, it can assist in verifying the integrity and compliance of the firmware through features such as file permissions and directory levels, providing data basis for outputting targeted firmware scanning strategies, and greatly improving the accuracy and targeting of subsequent vulnerability scanning.
[0080] In step S104, based on the bootloader, kernel code, and root file system information, fingerprint matching, behavioral modeling, and multimodal program semantic modeling methods are used to identify known and unknown vulnerabilities in the firmware sample, and the firmware scanning strategy is dynamically optimized during the scanning process.
[0081] Among them, the multimodal program semantic modeling method is a program analysis method that integrates program features from different modalities, such as instruction sequences and control flow / data flow graphs, to perform semantic encoding and feature aggregation on firmware program code, and constructs a cross-processor architecture program semantic representation model to accurately identify unknown vulnerabilities in firmware.
[0082] It is understood that the embodiments of this application, by employing a multimodal program semantic modeling method, can perform semantic encoding and aggregation based on the firmware's bootloader, kernel code, and root file system information, integrating multimodal program features such as instruction sequences and control flow / data flow graphs. This constructs a cross-processor architecture program semantic representation model, overcoming the detection limitations caused by different processor architectures, compilation conditions, and code obfuscation. It accurately uncovers potential risks at the deep semantic level of firmware, effectively identifying unknown vulnerabilities that are difficult to discover through fingerprint matching and behavioral modeling, thus compensating for the shortcomings of traditional detection methods. At the same time, it complements fingerprint matching and behavioral modeling, constructing a multi-level vulnerability detection system that fully covers known and unknown vulnerabilities. This significantly improves the comprehensiveness, accuracy, and cross-architecture adaptability of firmware vulnerability detection, and also provides accurate risk data support for dynamically optimizing firmware scanning strategies during the scanning process.
[0083] In step S105, a vulnerability list sorted by risk level is generated based on the known and unknown vulnerabilities in the firmware sample, and specific remediation suggestions are provided in combination with the optimized firmware scanning strategy and environment.
[0084] The vulnerability list is a structured list of vulnerability information formed after firmware vulnerability detection is completed, which integrates key information such as vulnerability identifier, risk level, affected components, and attack path, and sorts them according to established rules.
[0085] It is understood that the embodiments of this application utilize a vulnerability list that integrates key information such as vulnerability identifiers, risk levels, affected components, and attack paths, and is sorted by risk level. This clearly outlines the core information of all known and unknown vulnerabilities in the firmware, intuitively prioritizes vulnerability handling, and helps security personnel quickly focus on high-risk vulnerabilities and prioritize resource allocation for remediation, avoiding the omission of core security risks due to disordered handling. At the same time, relying on this structured list, the optimized firmware scanning strategy and the actual deployment environment can be accurately combined to output targeted and implementable specific remediation suggestions, making vulnerability remediation work more targeted. In addition, the standardized vulnerability list also provides a clear structured basis for tracking, retesting, archiving, and security auditing of vulnerability remediation, realizing full traceability of the vulnerability detection and remediation process, greatly improving the efficiency, standardization, and closed-loop nature of firmware vulnerability remediation, and ensuring the orderly implementation of firmware security protection work.
[0086] The multi-platform firmware parsing and vulnerability scanning method proposed in this application decouples the processing flow through architecture identification and module plug-in mechanisms. It analyzes the firmware magic number, header information, and structural features to dynamically load parsing modules, adapting to various domestic firmware formats and processor architectures, improving parsing adaptability and efficiency, and meeting the processing needs of firmware with heterogeneous architectures and diverse formats. It integrates detection methods such as fingerprint matching, behavioral modeling, and multimodal program semantic modeling. Through unified modeling with dual semantic coding branches and dual constraints, it maintains semantic invariance and achieves unknown vulnerability identification and accurate location based on deviation detection of normal program semantic distribution, enhancing the ability to discover complex logical defects and unknown risk paths. A vulnerability report and remediation suggestion generation mechanism is designed, combining deployment architecture and module features to output structured and executable remediation information, facilitating rapid response and closed-loop remediation, and improving the engineering value and practical security operation and maintenance usability of the detection system. Thus, it solves the problems of poor detection capabilities and low efficiency in existing technologies.
[0087] The multi-platform firmware parsing and vulnerability scanning method is illustrated below through a specific embodiment, as shown in Figure 6:
[0088] One hundred Phytium FT-2000 / 4 servers, 50 Loongson 3A5000 network switches, and 30 Kunpeng 920 embedded terminals from a financial company's cloud project were selected as the testing targets. A multi-platform firmware parsing and vulnerability scanning system was introduced to achieve fully automated firmware security testing and remediation throughout the entire process. At the firmware sample acquisition level, full acquisition of multi-source firmware was achieved through multiple channels: server firmware was imported in batches via the system's API interface, including stable original firmware for 100 devices. The interface used HTTPS encrypted transmission, supporting a maximum batch upload of 500 firmware packages at a time, with a stable transmission rate of 10MB / s; network switch firmware was obtained through remote device firmware retrieval, using SSH to connect to the management ports of 50 customized devices, automatically reading the proprietary firmware from the storage partitions, verifying the integrity of the transmission packets in real time during the retrieval process, and controlling the retrieval time for each device to within 2 minutes; embedded terminal firmware was imported via local file upload, including 30 test firmware versions, supporting batch upload of ZIP compressed packages, with the system automatically decompressing and recognizing the firmware files within the packages. 
All 180 collected firmware samples underwent a validity verification process: The validity verification unit first performed integrity verification on each sample based on the SHA256 hash value, calculating the sample's hash value and comparing it with the benchmark value provided by the manufacturer, eliminating two incomplete samples due to transmission interruptions caused by hash value mismatches; then, it filtered out five invalid non-firmware files, such as Excel configuration tables and text logs, by matching the file header magic number segments; finally, it matched the manufacturer's trusted source whitelist, blocking seven third-party firmwares from unknown sources, ultimately retaining 173 valid samples. The metadata management unit automatically extracted information such as the model, manufacturer, and firmware version of each device, generating a standardized sample ledger containing the sample ID, collection time, and device department, which was stored in a MySQL database for subsequent classification, querying, and version tracing.
[0089] After firmware sample collection, the system enters the format and architecture identification and parsing module loading stage. First, the verified firmware is input into the system, and the format identification unit determines the format of each sample. By analyzing the first 1024 bytes of the firmware header, the magic number field, and the overall structural features, it identifies 120 SquashFS-format firmware images, 30 Ext4-format firmware images, and 23 firmware images in special package formats, with an accuracy of 100%. Next, the architecture identification unit identifies the processor instruction set architecture that each firmware image targets. By analyzing the instruction sequence characteristics and register usage rules in the firmware, it determines that 80 samples target the ARMv8 architecture, 50 target the LoongArch architecture, and 43 target the SW64 architecture. The parsing module loading unit then automatically selects the matching module from the 20 parsing modules registered in the system based on the identification results. These parsing modules are packaged as dynamic link libraries and support a plugin registration mechanism, so the system can be extended simply by entering the configuration file of a new module. In this test, the parsing modules for all samples were loaded successfully, and the loading response time for a single sample was ≤1 second, which keeps batch processing efficient.
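The identify-then-load flow can be illustrated with a small registry keyed by (format, architecture). This sketch substitutes an in-process Python registry for the dynamic link libraries and configuration files that the embodiment describes; `register_parser`, `load_parser`, `identify_format`, and the example parser are hypothetical names, and only the SquashFS magic is checked.

```python
from typing import Callable, Dict, Tuple

ParserFn = Callable[[bytes], dict]

# Registry of parsing modules, keyed by (format, architecture).
_REGISTRY: Dict[Tuple[str, str], ParserFn] = {}


def register_parser(fmt: str, arch: str):
    """Decorator that registers a parsing module for one format/architecture pair."""
    def decorator(fn: ParserFn) -> ParserFn:
        _REGISTRY[(fmt, arch)] = fn
        return fn
    return decorator


def identify_format(data: bytes) -> str:
    """Inspect the first 1024 bytes for a known magic number (SquashFS only here)."""
    header = data[:1024]
    if header[:4] == b"hsqs":
        return "squashfs"
    return "unknown"


def load_parser(fmt: str, arch: str) -> ParserFn:
    """Look up the parsing module that matches the identification result."""
    try:
        return _REGISTRY[(fmt, arch)]
    except KeyError:
        raise LookupError(f"no parsing module registered for {fmt}/{arch}") from None


@register_parser("squashfs", "armv8")
def parse_squashfs_armv8(data: bytes) -> dict:
    # Placeholder parser: a real module would walk the file system image.
    return {"format": "squashfs", "arch": "armv8", "size": len(data)}
```

Adding support for a new platform then amounts to registering one more function, which is the property the plugin mechanism is meant to provide.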
[0090] Based on the loaded parsing module, the system sequentially performs firmware feature extraction, type determination, component location, and structural parsing, and outputs a scanning strategy. The header information extraction unit first obtains the key header fields of each sample, for example the version number V2.3, checksum 0x87654321, and load base address 0x80000000 of one Phytium server firmware image; these fields provide basic configuration parameters for subsequent parsing. The kernel feature analysis unit analyzes the firmware kernel code with a static disassembler and extracts core features such as function entry addresses and call relationships. On average, more than 300 function entry addresses are extracted per sample, the startup sequence is summarized, and the core call chain from the start function to the initialization function to the driver loading function is identified. The file system identification unit, based on storage structure characteristics and metadata, identifies 120 SquashFS file systems, 30 Ext4 file systems, and 23 special file systems, and extracts the root directory path and permission settings. The component location and type determination unit combines the above features to locate the address ranges of key components such as the boot program, network driver module, and encryption component, and determines that 30 samples are BIOS-type firmware, 80 are UEFI-type firmware, and 63 are embedded device firmware, with a classification accuracy of 100%. After feature extraction and type determination, the structure parsing unit analyzes the dependencies and layout of each module and generates a module dependency graph; for example, the bootloader of one network switch firmware image depends on three driver modules. The bootloader extraction unit extracts the startup code and initialization process of all samples.
For example, the hardware initialization process of the embedded terminal firmware proceeds from serial port initialization to network module loading to storage partition mounting. The kernel module extraction unit generates assembly code through disassembly and constructs a control flow graph from the branch and jump instructions in the code. On average, each kernel module yields 200 basic block nodes, which clearly present the kernel execution path. The file system extraction unit constructs a complete root file system structure and outputs a file list in JSON format; for one Phytium server firmware image, a total of 1500 files were extracted, including system configuration files and driver module files. Finally, based on the above analysis results, the system outputs a targeted firmware scanning strategy that raises the scanning priority of high-risk modules.
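Constructing a control flow graph from branch and jump instructions is commonly done with the classic "leader" algorithm, sketched below. The source does not state which algorithm the kernel module extraction unit uses, so this is an assumed technique. The instruction tuples and the toy mnemonics `jmp` (unconditional), `br` (conditional), and `ret` are hypothetical and stand in for real disassembler output; the input is assumed to be non-empty and sorted by address.

```python
from typing import Dict, List, Optional, Set, Tuple

# (address, mnemonic, branch target or None); a hypothetical toy instruction format.
Insn = Tuple[int, str, Optional[int]]


def build_cfg(insns: List[Insn]):
    """Split instructions into basic blocks and return (blocks, edges).

    blocks maps each block's start address to its instructions;
    edges is a set of (source block start, destination block start).
    """
    addrs = [addr for addr, _, _ in insns]
    # A leader is the first instruction, any branch target,
    # and any instruction that follows a control transfer.
    leaders: Set[int] = {addrs[0]}
    for i, (_, op, target) in enumerate(insns):
        if op in ("jmp", "br", "ret"):
            if target is not None:
                leaders.add(target)
            if i + 1 < len(insns):
                leaders.add(addrs[i + 1])

    # Each leader starts a new basic block.
    blocks: Dict[int, List[Insn]] = {}
    current = addrs[0]
    for insn in insns:
        if insn[0] in leaders:
            current = insn[0]
            blocks[current] = []
        blocks[current].append(insn)

    # Edges come from explicit branch targets and from fall-through.
    edges: Set[Tuple[int, int]] = set()
    starts = sorted(blocks)
    for idx, start in enumerate(starts):
        _, op, target = blocks[start][-1]
        if op in ("jmp", "br") and target in blocks:
            edges.add((start, target))
        if op not in ("jmp", "ret") and idx + 1 < len(starts):
            edges.add((start, starts[idx + 1]))
    return blocks, edges
```

Each key of `blocks` corresponds to one basic block node of the kind counted in the embodiment, and `edges` captures the execution paths between them.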
[0091] Based on the extracted bootloader, kernel code, and root file system information, the system conducts vulnerability scanning with a multi-level strategy and dynamically optimizes the scanning strategy. The known vulnerability identification unit first performs fingerprint matching against a vulnerability feature database containing over 5000 signature rules. These rules cover instruction sequence features, function call pattern features, and key string features, and they identified 32 known vulnerabilities, including 15 CVE-2023 series vulnerabilities and 17 vulnerabilities in domestic firmware. The behavior modeling and detection unit constructs, for the operating scenario of each firmware type, a finite state machine model that fits the actual working logic. It simulates the entire process of firmware startup, network communication, data storage, and so on, and captures memory access patterns, system call sequences, and abnormal interrupt response paths in real time. It identified 8 abnormal behavior patterns that deviate from the normal process, including 5 unauthorized attempts to modify configuration files and 3 high-risk system call chains that exceeded the normal range. The multimodal semantic modeling unit performs cross-architecture semantic modeling through dual-branch encoding. One branch feeds the disassembled instruction sequence into a Transformer-based sequence encoding network to generate an instruction sequence semantic representation. The other branch aggregates features over the control flow graph and data flow graph with a graph convolutional network (GCN) to generate a graph structure semantic representation. The representations are then optimized through semantic alignment constraints and cross-architecture consistency constraints, and the unit finally identified five unknown vulnerabilities, two of which were related to encryption modules.
The scanning strategy optimization unit dynamically adjusted the scanning parameters according to the firmware architecture and module complexity. For embedded firmware with medium complexity, the number of scanning threads was reduced from 8 threads to 4 threads, while the scanning priority of the core module was increased, which reduced the scanning time of the core module by 40%, and the overall detection time was reduced from the expected 12 hours to 8 hours.
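The finite state machine approach to behavior detection can be illustrated with a transition table of allowed state changes and a checker that reports every observed step that leaves the table. The state names and the transition table below are hypothetical, loosely following the startup, network communication, and data storage phases named above; a real model would be derived per firmware type.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model of normal behavior: state -> set of allowed next states.
ALLOWED: Dict[str, Set[str]] = {
    "boot": {"hw_init"},
    "hw_init": {"driver_load"},
    "driver_load": {"network", "storage"},
    "network": {"network", "storage"},
    "storage": {"network", "storage"},
}


def find_deviations(trace: List[str], start: str = "boot") -> List[Tuple[int, str, str]]:
    """Return (step index, current state, offending event) for each disallowed step.

    After a deviation the machine stays in its current state, so one anomalous
    event does not cause every later event to be flagged as well.
    """
    state = start
    deviations = []
    for step, event in enumerate(trace):
        if event in ALLOWED.get(state, set()):
            state = event
        else:
            deviations.append((step, state, event))
    return deviations
```

For instance, a trace in which a configuration write appears between network activity and storage access would be reported as a single deviation at that step, while the surrounding events remain accepted.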
[0092] Based on the scanned known and unknown vulnerabilities, the system generates a risk-level vulnerability list and provides corresponding remediation suggestions: The risk level classification unit divides 32 known vulnerabilities into 8 high-risk vulnerabilities, 15 medium-risk vulnerabilities, and 9 low-risk vulnerabilities based on the attack difficulty, impact scope, and consequences of the vulnerabilities. The 5 unknown vulnerabilities are all classified as medium-to-high risk. A structured list containing vulnerability ID, risk level, affected components, and attack paths is generated, with high-risk vulnerabilities displayed at the top. The remediation solution generation unit provides customized remediation suggestions for each vulnerability based on the deployment environment: For the 8 high-risk vulnerabilities, it recommends installing the Kylin OS-compatible V2.3.2 patch and provides a step-by-step installation guide from downloading the patch package to verifying patch integrity, disabling unnecessary services, performing patch installation, and restarting the device for verification; for the 15 medium-risk vulnerabilities, it recommends configuration hardening, such as modifying firewall configurations to disable external network access to UDP port 161, adjusting file permissions to allow the owner to read and write, group users to read-only, and other users to have no permissions; for the 9 low-risk vulnerabilities, it recommends deleting redundant code in the next firmware upgrade. The report output unit generates a summary report of 173 samples and a detailed report for each device. The report includes four parts: vulnerability summary, detailed information, remediation recommendations, and supporting data, and supports export in both PDF and HTML formats. 
Based on the reports, the security team remediated all 8 high-risk vulnerabilities within one week and the 15 medium-risk and 9 low-risk vulnerabilities within two weeks. After remediation, the system firmware was retested and all vulnerabilities were confirmed fixed, raising the firmware security level to the highest level.
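The structured, risk-ordered vulnerability list and the file permission hardening recommended above can be sketched as follows. The `Finding` fields follow the list columns named in the embodiment (vulnerability ID, risk level, affected component, attack path), but the class, the identifiers, and the sort tie-break are assumptions made for illustration. The permission value is computed from the standard `stat` constants: owner read and write, group read-only, no access for others.

```python
import stat
from dataclasses import dataclass
from typing import List

# Lower rank sorts first, so high-risk findings appear at the top of the list.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    vuln_id: str       # hypothetical identifier format
    risk: str          # "high", "medium", or "low"
    component: str     # affected component
    attack_path: str   # short description of the attack path


def sort_by_risk(findings: List[Finding]) -> List[Finding]:
    """Order findings by risk level, then by ID for a stable, readable list."""
    return sorted(findings, key=lambda f: (RISK_ORDER[f.risk], f.vuln_id))


# Owner read/write, group read-only, others no permissions (octal 640).
HARDENED_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP
```

The resulting mode could be applied with `os.chmod(path, HARDENED_MODE)` on the configuration files being hardened.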
[0093] In summary, this application's embodiments, through multi-source firmware collection and legality verification across all channels, combined with dynamic adaptation to multiple formats and architectures and precise feature parsing, can comprehensively cover firmware scenarios of servers, network devices, and embedded terminals, providing a clean and reliable data source for subsequent testing. A multi-level vulnerability scanning system and dynamic strategy optimization enable accurate identification of known vulnerabilities and effective detection of unknown vulnerabilities, significantly improving the efficiency and coverage of batch firmware testing. A risk-graded structured list and customized environment remediation suggestions ensure vulnerability remediation, significantly improving firmware security levels, greatly reducing the security risk of firmware being maliciously exploited, and comprehensively enhancing firmware security protection capabilities and compliance assurance levels.
[0094] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device may include:
[0095] The memory 701, the processor 702, and the computer program stored on the memory 701 and executable on the processor 702.
[0096] When the processor 702 executes the program, it implements the multi-platform firmware parsing and vulnerability scanning method provided in the above embodiments.
[0097] Furthermore, the electronic device also includes:
[0098] A communication interface 703, used for communication between the memory 701 and the processor 702.
[0099] The memory 701 is used to store computer programs that can run on the processor 702.
[0100] The memory 701 may include high-speed RAM (Random Access Memory) and may also include non-volatile memory, such as at least one disk storage.
[0101] If the memory 701, processor 702, and communication interface 703 are implemented independently, then the communication interface 703, memory 701, and processor 702 can be interconnected via a bus to communicate with one another. The bus can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. The bus can be divided into address bus, data bus, control bus, etc. For ease of representation, the bus is represented in Figure 7 by a single thick line, but this does not mean that there is only one bus or one type of bus.
[0102] Optionally, in a specific implementation, if the memory 701, processor 702, and communication interface 703 are integrated on a single chip, then the memory 701, processor 702, and communication interface 703 can communicate with each other through an internal interface.
[0103] The processor 702 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of this application.
[0104] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described multi-platform firmware parsing and vulnerability scanning method.
[0105] In the description of this specification, the references to "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0106] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0107] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or processes, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which embodiments of this application pertain.
[0108] It should be understood that various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware as in another embodiment, it can be implemented using any of the following techniques known in the art, or a combination thereof: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0109] Those skilled in the art will understand that all or part of the steps of the methods described in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.
[0110] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.
Claims
1. A multi-platform firmware parsing and vulnerability scanning system, comprising: a data acquisition module, a firmware preprocessing module, a feature recognition module, a firmware extraction module, a vulnerability scanning module, and a vulnerability remediation suggestion module; wherein the data acquisition module is used to acquire multi-source firmware samples of the product, including original factory firmware, customized firmware, test firmware, and standard and cross-platform adapted firmware for compliance testing. The firmware preprocessing module receives the firmware sample, identifies its format and processor architecture to obtain an identification result, and dynamically loads the corresponding parsing module based on the identification result. The firmware preprocessing module includes a format identification unit, an architecture identification unit, and a parsing module loading unit. The format identification unit identifies the firmware format type by analyzing the firmware header information, magic number field, and structural features. The architecture identification unit identifies the processor instruction set architecture adapted to the firmware. The parsing module loading unit dynamically loads the parsing module that matches the identification result; the parsing module is encapsulated as a dynamic link library and supports a plugin registration mechanism. The feature recognition module is used to extract firmware header information, kernel code, and file system features, determine the firmware type, and quickly locate key components based on the firmware type. The firmware extraction module is used to parse the firmware structure and extract the bootloader, kernel code, and root file system information based on the firmware type and the location of the key components.
The vulnerability scanning module is used to identify known and unknown vulnerabilities based on fingerprint matching, behavioral modeling, and multimodal program semantic modeling methods. During the scanning process, the scanning strategy is dynamically optimized. The vulnerability scanning module includes a known vulnerability identification unit, a behavioral modeling detection unit, a multimodal semantic modeling unit, and a scanning strategy optimization unit. Specifically, the known vulnerability identification unit performs fingerprint identification using signature rules from a vulnerability feature database; the behavioral modeling detection unit constructs a finite state machine model to identify abnormal behavior patterns; the multimodal semantic modeling unit performs cross-architecture semantic modeling through instruction sequence semantic encoding and graph structure semantic encoding; and the scanning strategy optimization unit dynamically adjusts the scanning strategy according to the firmware architecture and module complexity. The vulnerability remediation suggestion module is used to generate a vulnerability list sorted by risk level and provide specific remediation suggestions based on the environment.
2. The multi-platform firmware parsing and vulnerability scanning system of claim 1, wherein the data acquisition module includes a sample access unit, a legality verification unit, and a metadata management unit. The sample access unit uses three access methods, namely local file upload, remote device firmware retrieval, and batch import via API, to access multi-source firmware samples of the product. The legality verification unit verifies sample integrity based on MD5/SHA256 hash values, filters invalid non-firmware files, and matches trusted source whitelists. The metadata management unit extracts device model, manufacturer information, and firmware version metadata and enters them into the system.
3. The multi-platform firmware parsing and vulnerability scanning system of claim 1, wherein the feature recognition module includes a header information extraction unit, a kernel feature analysis unit, a file system recognition unit, and a component location and type determination unit. The header information extraction unit extracts key fields such as version number, checksum, and load base address. The kernel feature analysis unit extracts kernel code features, namely function entry addresses and call relationships, based on static analysis. The file system recognition unit identifies the file system type and structure based on storage structure features and metadata. The component location and type determination unit locates key components such as the bootloader and driver module, and determines the firmware type.
4. The multi-platform firmware parsing and vulnerability scanning system of claim 1, wherein the firmware extraction module includes a structure parsing unit, a bootloader extraction unit, a kernel module extraction unit, and a file system extraction unit. The structure parsing unit is used to determine the dependencies and layout of each module. The bootloader extraction unit is used to extract the startup code and core functions of the initialization process. The kernel module extraction unit is used to generate assembly code through disassembly and construct a control flow graph. The file system extraction unit is used to construct a complete root file system structure and output it in a structured manner.
5. The multi-platform firmware parsing and vulnerability scanning system of claim 1, wherein the vulnerability remediation suggestion module includes a risk level classification unit, a remediation solution generation unit, and a report output unit. The risk level classification unit is used to classify vulnerabilities into high, medium, and low risks and generate a structured list. The remediation solution generation unit is used to provide patch update and configuration hardening suggestions based on the deployment environment. The report output unit is used to output a standardized vulnerability assessment report.
6. A method for use in a multi-platform firmware parsing and vulnerability scanning system according to any one of claims 1-5, characterized in that the method includes: obtaining multi-source firmware sample data for the product; receiving the multi-source firmware sample data, identifying its format and processor architecture, and dynamically loading the corresponding parsing module; based on the parsing module, extracting firmware header information, kernel code, and file system features to determine the firmware type; based on the firmware type, quickly locating key components, parsing the firmware structure, extracting the bootloader, kernel code, and root file system information, and outputting the firmware scanning strategy; based on the bootloader, kernel code, and root file system information, using fingerprint matching, behavioral modeling, and multimodal program semantic modeling methods to identify known and unknown vulnerabilities in the firmware samples, and dynamically optimizing the firmware scanning strategy during the scanning process; and, based on the known and unknown vulnerabilities in the firmware samples, generating a vulnerability list sorted by risk level and providing specific remediation suggestions in conjunction with the optimized firmware scanning strategy and the environment.
7. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the multi-platform firmware parsing and vulnerability scanning method of claim 6.
8. A computer-readable storage medium having stored thereon a computer program or instructions, characterized in that, when the computer program or instructions are executed, the multi-platform firmware parsing and vulnerability scanning method of claim 6 is implemented.