Intelligent booting apparatus and method of a computer system

By using out-of-band monitoring and multi-level redundant boot paths in the baseboard management controller, the problem of manual fault recovery caused by the single boot path of the computer system is solved, realizing automated fault switching and self-healing recovery, and improving the reliability and automation of system startup.

CN122240194A · Pending · Publication Date: 2026-06-19 · 联想长风科技(北京)有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
联想长风科技(北京)有限公司
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing computer system's startup fault tolerance is insufficient, which means that recovery from startup failures relies on manual intervention and cannot meet the requirements of high availability and automated operation and maintenance.

Method used

The out-of-band monitoring module of the baseboard management controller generates a startup status identifier, and combined with multi-level redundant startup paths and independent repair partitions, it realizes intelligent decision-making and automatic fault switching, and builds a three-level fault tolerance mechanism.

Benefits of technology

Improves the reliability and automation of system startup, shortens fault recovery time, and reduces business offline time.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122240194A_ABST

Abstract

This application provides an intelligent boot device and method for a computer system, relating to the field of computer technology. The device includes: a baseboard management controller with an out-of-band monitoring module that monitors the boot process and generates a boot status identifier; a boot control unit that reads the identifier; and a multi-level redundant storage partition including a primary boot path partition, a backup boot path partition, and an independent repair partition. Based on the identifier, the boot control unit selects the primary, backup, or repair partition to boot from. This application addresses the technical problem of the traditional single-path boot mode of servers, in which damage to a critical boot component causes boot failure and recovery relies on manual intervention. By constructing primary and backup boot paths plus an independent repair partition, it forms a three-level fault-tolerance mechanism, and it uses out-of-band monitoring for automatic fault detection and intelligent boot path switching, thereby improving the reliability and automation of server booting and reducing business offline time.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and more specifically to an intelligent startup device and method for a computer system.

Background Technology

[0002] In modern data centers, cloud computing platforms, and enterprise-level server applications, the continuous and stable operation of computer systems is a crucial foundation for ensuring business continuity. As a prerequisite for operating system operation and business loading, the reliability of the system startup process directly affects whether the server can provide services normally. Therefore, ensuring system recoverability even when startup failures occur has become a key issue in computer system reliability design.

[0003] Existing computer systems typically employ a single-path boot architecture, where the BIOS or UEFI firmware loads the bootloader, operating system kernel, and initialization files from a fixed EFI partition and BOOT partition in a preset order, ultimately completing the system boot process. This approach is relatively simple and can meet basic boot requirements under normal circumstances, thus it is widely used in existing servers, workstations, and other computing devices.

[0004] However, existing technologies generally suffer from insufficient fault tolerance in the boot process. When the bootloader in the EFI partition is corrupted, the kernel file in the BOOT partition is missing, the boot configuration is incorrect, the file system is abnormal, or bad sectors appear on the storage medium, the system often cannot continue the boot process and thus falls into a boot failure state. Especially in unattended or remotely deployed scenarios, once the above failures occur, maintenance personnel usually need to manually troubleshoot and repair using USB drives, CDs, or network recovery environments. This not only makes the recovery process complex but also significantly prolongs the business interruption time, making it difficult to meet the application requirements of high availability and automated operation and maintenance. On the other hand, although existing servers are usually equipped with out-of-band management modules such as the Baseboard Management Controller (BMC) for device status monitoring, remote power on / off, and hardware management, such out-of-band management capabilities are mostly limited to hardware health monitoring and have not yet been deeply integrated with the system boot process. They cannot intelligently select an alternative boot path or enter a repair environment based on monitoring results when boot anomalies occur.

[0005] In summary, because existing technologies rely on a single, vulnerable startup path and depend heavily on manual intervention for fault recovery, they have become a bottleneck in improving system reliability. There is an urgent need for a new startup architecture that provides automated fault detection, intelligent path switching, and rapid self-healing, so as to solve the single point of failure problem and achieve high availability and automated operation and maintenance of the server startup process.

Summary of the Invention

[0006] This application provides an intelligent boot device and method for a computer system. It addresses the obstacles to system booting in data center servers and cloud computing environments, namely easily damaged critical components, strong single-point dependence in the boot chain, and reliance on manual fault recovery, which make automatic recovery from boot failures difficult and service interruptions long. To that end, the application uses a mechanism for generating and parsing a boot status identifier, based on out-of-band monitoring by the baseboard management controller linked with the BIOS / UEFI firmware. Combined with multi-level redundant boot paths and a layered switching process that includes an independent repair partition, this enables intelligent decision-making, automatic fault switching, and self-healing recovery during the boot process, thereby improving boot reliability and shortening recovery time.

[0007] This application provides an intelligent startup device for a computer system, the device comprising: a baseboard management controller with an out-of-band monitoring module that monitors the computer system startup process and generates a startup status identifier from the monitoring results; a startup control unit, comprising BIOS or UEFI firmware, which is communicatively connected to the baseboard management controller and configured to read the startup status identifier; and a multi-level redundant storage partition, comprising a primary boot path partition, a backup boot path partition, and a repair partition independent of the boot paths. The startup control unit is configured to select, based on the startup status identifier, the corresponding partition from the primary boot path partition, the backup boot path partition, or the repair partition to perform a boot operation.

[0008] This application also provides an intelligent startup method for a computer system, the method comprising: the out-of-band monitoring module of the baseboard management controller monitors the computer system's boot process, and generates and stores a boot status identifier based on the monitoring results; when the system boots, the BIOS or UEFI firmware reads the boot status identifier stored in the baseboard management controller; and, based on the identifier read, the BIOS or UEFI firmware selects one of the primary boot path partition, the backup boot path partition, and the repair partition from the preset multi-level redundant storage partition to perform the boot operation.

[0009] One or more technical solutions provided in this application have at least the following technical effects or advantages: This application provides an intelligent startup device and method for a computer system, relating to the field of server hardware technology. It solves the technical problem that, in the traditional single-path startup mode of servers, the failure of a critical startup component causes system startup failure and recovery relies on manual intervention. By constructing a three-level fault tolerance mechanism with primary and backup paths and an independent repair partition, and by using out-of-band monitoring for automatic fault detection and intelligent switching of startup paths, it improves the reliability and automation of server system startup and reduces business offline time.

Attached Figure Description

[0010] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is a schematic diagram of the structure of an intelligent startup device for a computer system according to this application.

[0012] Figure 2 is a flowchart of an intelligent startup method for a computer system according to this application.

[0013] Figure 3 is a schematic diagram of multi-path redundant startup in an intelligent startup method for a computer system according to this application.

[0014] Explanation of reference numerals in the attached figures: Baseboard management controller 11, Startup control unit 12, Multi-level redundant storage partition 13.

Detailed Implementation

[0015] To further illustrate the technical means and effects of the present invention in achieving its intended purpose, the following detailed description of the specific implementation methods, structures, features and effects of the present invention, in conjunction with the accompanying drawings and preferred embodiments, is provided below.

[0016] Example 1: As shown in Figure 1, this application provides an intelligent startup device for a computer system, the device comprising: a baseboard management controller 11 having an out-of-band monitoring module, which monitors the computer system startup process and generates a startup status identifier based on the monitoring results.

[0017] Specifically, the baseboard management controller 11, as an out-of-band management unit operating independently of the main processor and operating system, has an internal out-of-band monitoring module. This module starts working immediately after the computer system is powered on and monitors the entire boot process by collecting and analyzing key status signals. These key status signals include, but are not limited to, BIOS or UEFI execution phase information, boot loader execution status, kernel loading progress, system heartbeat signals, and phase completion markers within a preset time window. During monitoring, if the boot process progresses normally to the operating system running state within a preset time, the boot is considered successful. If an error message, abnormal interruption, or failure to enter the next boot phase within a preset time threshold is detected, the boot is considered abnormal or timed out. Based on these determinations, the baseboard management controller 11 processes the monitoring data and generates a corresponding boot status identifier. This boot status identifier characterizes the result or fault type of the boot and is written to the baseboard management controller 11's non-volatile memory or a preset storage location connected to it for retrieval by the boot control unit 12 during the next system boot, thus providing a basis for subsequent boot path selection.
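
The determination described in [0017] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the stage names, the deadline values in `STAGE_DEADLINES`, and the `StageEvent` record are all hypothetical, since the source only says that the thresholds are preset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Verdict(Enum):
    SUCCESS = "success"
    ABNORMAL = "abnormal"
    TIMEOUT = "timeout"


# Hypothetical stage order and per-stage deadlines in seconds.
STAGE_DEADLINES = [
    ("firmware", 60.0),
    ("bootloader", 30.0),
    ("kernel", 60.0),
    ("os_heartbeat", 120.0),
]


@dataclass
class StageEvent:
    stage: str                   # stage the signal belongs to
    elapsed: float               # seconds since the previous stage completed
    error: Optional[str] = None  # error message reported for this stage, if any


def classify_boot(events: List[StageEvent]) -> Tuple[Verdict, Optional[str]]:
    """Return (verdict, failing_stage) for the signals observed during one boot."""
    observed = {event.stage: event for event in events}
    for stage, deadline in STAGE_DEADLINES:
        event = observed.get(stage)
        if event is None:
            # No completion marker for this stage was ever seen.
            return Verdict.TIMEOUT, stage
        if event.error is not None:
            return Verdict.ABNORMAL, stage
        if event.elapsed > deadline:
            return Verdict.TIMEOUT, stage
    return Verdict.SUCCESS, None
```

A boot whose four stages all report in time yields `(Verdict.SUCCESS, None)`; a boot that stops reporting after the bootloader yields a timeout at the `kernel` stage.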

[0018] Furthermore, the startup status identifier is stored in the non-volatile memory of the baseboard management controller 11, or in a storage location communicatively connected to the baseboard management controller 11. The storage location communicatively connected to the baseboard management controller 11 includes: a preset location on the system disk, a dedicated configuration chip, and a remote management platform.

[0019] Specifically, after it is generated, the boot status identifier is written to a persistent storage medium so that it can be reliably retrieved after a power failure, restart, or abnormal reset. Preferably, the identifier is stored in non-volatile memory integrated within the baseboard management controller 11, such as EEPROM, Flash, or FRAM. This memory is accessed directly over the internal bus of the baseboard management controller 11 and is read and written at a preset address or in a preset data structure, which keeps data access stable and timely. Alternatively, the identifier can be stored in an external storage location communicatively connected to the baseboard management controller 11, including, but not limited to, a preset location on the system disk, a dedicated configuration chip, or a remote management platform. The preset location on the system disk is a reserved storage area within the system disk, such as a reserved partition, a hidden partition, or a specific logical block address (LBA) range. The baseboard management controller 11 accesses this area through a low-level bus interface, such as I2C, SPI, SMBus, or a PCIe bridge interface, and reads and writes the identifier in an agreed-upon data format, such as a fixed offset address or a key-value pair structure. The dedicated configuration chip is a storage chip pre-configured on the motherboard, such as a separate EEPROM or a security configuration chip, that stores critical system status information; the baseboard management controller 11 interacts with it via a dedicated communication interface to ensure reliable storage and tamper-proof protection of the identifier. The remote management platform is a centralized server management platform: the baseboard management controller 11 encapsulates the identifier into a data packet, uploads it through a network interface to the remote management server for persistent storage, and retrieves it via a network protocol during system startup, enabling data sharing and synchronization across devices or in centralized management scenarios. Combining or selecting among these storage methods improves the reliability and redundancy of the boot status identifier and the system's adaptability to different hardware architectures and application scenarios, ensuring that the boot control unit 12 can stably and accurately obtain the identifier and make boot path decisions accordingly.

[0020] The startup control unit 12 includes BIOS or UEFI firmware, is communicatively connected to the baseboard management controller 11, and is configured to read the startup status identifier.

[0021] Specifically, the boot control unit 12, as the core control module in the system boot process, is typically implemented by the BIOS or UEFI firmware embedded in the motherboard storage medium. It is executed first after the computer system is powered on or reset, and is used to complete hardware initialization, self-test, and control of subsequent boot processes. This boot control unit 12 establishes a communication connection with the baseboard management controller 11, for example, through an IPMI interface, SMBus, I2C bus, or other management communication channels, to achieve data exchange between the two. During the system boot process, at preset boot stages, such as after completing basic hardware initialization or before selecting a boot device, the boot control unit 12 actively accesses the baseboard management controller 11 or its associated storage location, reads the boot status identifier written by the baseboard management controller 11, and parses it to identify the execution result or exception type of the previous boot process. After reading and parsing the boot status identifier, the boot control unit 12 uses it as an important basis for boot decisions, subsequently selecting the corresponding boot path or executing the corresponding boot strategy. Through the above method, information linkage between the out-of-band management unit and the boot firmware is achieved, enabling the system to perform intelligent boot control based on historical boot states.

[0022] Furthermore, the multi-level redundant storage partition 13 is constructed at the physical layer of the system disk. The primary boot path partition includes a primary EFI system partition and a primary BOOT file partition. The backup boot path partition includes a backup EFI system partition and a backup BOOT file partition. Each EFI system partition stores the bootloader, and each BOOT file partition stores the operating system kernel and the initial RAM disk. The repair partition is an independent file system partition that contains a pre-installed bootloader, operating system kernel files, and a repair toolset.

[0023] Specifically, the multi-level redundant storage partition 13 is formed during the system disk manufacturing or system deployment phase by pre-planning the physical storage space using partitioning tools. It is typically divided sequentially according to a preset capacity ratio into a primary EFI system partition, a primary BOOT file partition, a backup EFI system partition, a backup BOOT file partition, and a repair partition. Each partition is physically independent and is identified and managed through a partition table. The primary EFI system partition preferably uses the FAT32 file system format and contains a UEFI-compliant bootloader (such as a GRUB EFI file or Boot Manager) and its configuration files. The primary BOOT file partition stores the operating system kernel file (such as vmlinuz), the initialization memory disk file (such as initramfs), and the configuration parameters required for booting. The primary EFI system partition and the primary BOOT file partition are associated through the boot configuration file, thus forming a complete primary boot path. The backup boot path partition is constructed in the same way as the primary boot path, including a backup EFI system partition and a backup BOOT file partition. Their file system type, directory structure, and file content are consistent with or compatible with the primary boot path. In practical implementation, during system deployment, the contents of the primary boot path can be copied to the backup boot path via mirroring, or during system operation, a scheduled task can be used to synchronously update critical files in the primary and backup paths to ensure the availability of the backup path when the primary path fails. The repair partition is an independently configured bootable file system partition, with its capacity configured according to actual needs, preferably using the FAT32 file system format. 
An independent boot loader and its configuration are pre-written in this partition, enabling it to be directly recognized and booted by the BIOS or UEFI without relying on the primary or backup boot paths. Simultaneously, the repair partition contains a pre-installed minimal operating system environment and integrates a repair toolset and automated repair scripts. During actual operation, booting is performed from the primary boot path partition by default. When the boot process corresponding to the primary boot path fails or is determined to be abnormal, booting is automatically switched to the backup boot path partition. When the backup boot path also fails to boot, further booting is performed to the repair partition, entering the repair environment and executing a preset repair process. Through the multi-level redundant partition structure and its supporting mechanisms built at the physical layer, layered fault tolerance and automatic recovery capabilities of the boot chain are achieved, significantly improving the reliability and maintainability of system startup.
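
[0023] mentions that a scheduled task may keep the critical files of the primary and backup boot paths in sync. The sketch below shows one way such a task could work, assuming both boot paths are mounted as ordinary directories; the function names and the hash-based comparison are illustrative choices, not details from the source.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file's contents so primary and backup copies can be compared."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_boot_files(primary: Path, backup: Path) -> List[str]:
    """Copy files that are missing or different in backup; return their relative paths."""
    copied = []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.exists() or sha256_of(src) != sha256_of(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies contents and file metadata
            copied.append(rel.as_posix())
    return copied
```

One design consideration, not addressed in the source: running such a sync only after a boot from the primary path has been confirmed successful would avoid copying a damaged file over a good backup.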

[0024] The boot control unit 12 is configured to select a corresponding partition from the primary boot path partition, the backup boot path partition, or the repair partition to perform a boot operation based on the boot status identifier.

[0025] Specifically, after the system is powered on or reset, the boot control unit 12 executes an initialization process and, during a preset boot phase, obtains the boot status identifier provided by the baseboard management controller 11. The boot control unit 12 parses this identifier to determine the result of the previous boot process and, according to preset boot decision logic, selects the corresponding partition from the multi-level redundant storage partition 13 to perform the boot operation. When the identifier indicates that the previous boot completed normally, the primary boot path partition is selected first: the bootloader in the primary EFI system partition is invoked to load the operating system kernel and the initial RAM disk from the primary BOOT file partition, completing the system boot. When the identifier indicates that the primary boot path is abnormal or failed to boot, the primary boot path is skipped and the backup boot path partition is selected, realizing automatic switching of the boot path. When the identifier further indicates that the backup boot path is also abnormal or cannot complete the boot, the repair partition is selected and the repair environment is entered through its built-in bootloader. Through this mechanism, the system makes decisions based on historical boot states and completes boot path selection automatically, without human intervention, thereby improving the reliability and fault tolerance of system startup.
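
The decision logic of [0025] reduces to a small lookup on the firmware side. The following is a sketch only: the numeric values 0, 1, and 2 are an assumed encoding for illustration, and treating a missing or unrecognized identifier as "no fault recorded" is likewise an assumption.

```python
from enum import Enum
from typing import Optional


class BootTarget(Enum):
    PRIMARY = "primary boot path partition"
    BACKUP = "backup boot path partition"
    REPAIR = "repair partition"


# Assumed encoding: 0 = previous boot succeeded, 1 = primary path failed,
# 2 = backup path also failed.
_DECISION_TABLE = {
    0: BootTarget.PRIMARY,
    1: BootTarget.BACKUP,
    2: BootTarget.REPAIR,
}


def select_boot_target(identifier: Optional[int]) -> BootTarget:
    """Map the boot status identifier read from the BMC to a boot target."""
    # Assumption: an empty or unknown identifier falls back to the primary path.
    return _DECISION_TABLE.get(identifier, BootTarget.PRIMARY)
```

For example, `select_boot_target(1)` returns `BootTarget.BACKUP`, so the primary path is skipped on the next boot.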

[0026] Example 2: The intelligent startup device for a computer system according to an embodiment of the present invention has been described in detail above with reference to Figure 1. Next, an intelligent startup method for a computer system according to an embodiment of the present invention is described with reference to Figure 2 and Figure 3, the method comprising: the out-of-band monitoring module of the baseboard management controller 11 monitors the startup process of the computer system, and generates and stores a startup status identifier based on the monitoring results.

[0027] Specifically, after the computer system is powered on or reset, the baseboard management controller 11, as an independently operating out-of-band management unit, starts before or simultaneously with the main system. Its internal out-of-band monitoring module is activated to continuously monitor the entire startup process. This out-of-band monitoring module monitors various startup-related signals, including but not limited to the BIOS or UEFI execution phase status, bootloader loading status, operating system kernel loading progress, system heartbeat signals, and completion markers for each phase. It then uses preset time thresholds to determine the execution status of each phase. When the system successfully starts from the main boot path within a preset time window, the startup is considered successful; when startup fails from the main or backup boot path, it is considered an abnormal startup. Based on these determinations, the baseboard management controller 11 generates a corresponding startup status identifier, which characterizes the startup result type. The generated startup status identifier is then persistently saved in a preset storage location. Through this process, out-of-band perception, result determination, and status recording of the computer system's startup status are achieved, providing a basis for the subsequent startup control unit 12 to select the startup path based on this status identifier.

[0028] Furthermore, a startup status identifier is generated and stored based on the monitoring results, including: When a successful boot from the primary boot path is detected, a boot status identifier with a first value is generated and stored; when a boot failure from the primary boot path is detected, a boot status identifier with a second value is generated and stored; when a boot failure from the backup boot path is detected, a boot status identifier with a third value is generated and stored.

[0029] Specifically, during the continuous monitoring of the system startup process by the out-of-band monitoring module of the baseboard management controller 11, different startup status flags are generated based on the execution results of different startup paths. Specifically, when the system powers on or initializes, the startup status flag is initially set to empty or initialized to an initial value of 0, indicating that no startup exception has occurred or the previous startup has been successfully completed. When the system starts according to the main startup path, the out-of-band monitoring module monitors the startup process in real time. If it detects that the system successfully completes the entire process from boot loading to normal operating system operation within a preset time window, the startup status flag is maintained at the first value of 0 or no exception flag is written, indicating that the main startup path has started successfully. When an exception is detected during the startup process of the main startup path, such as bootloader execution failure, kernel loading error, or failure to complete the startup process within a preset time threshold, it is determined that the main startup path has failed. At this time, the baseboard management controller 11 updates the startup status flag and writes it to the second value of 1, indicating that the main startup path is unavailable. After the system switches to the backup boot path based on the identifier, the out-of-band monitoring module continues to monitor the boot process of the backup boot path. If the backup boot path also experiences a boot anomaly or fails to complete the boot process within a timeout period, it is determined that the backup boot path has failed to boot. The baseboard management controller 11 further updates the boot status identifier to a third value 2 to indicate that both the primary and backup boot paths have failed. 
The aforementioned boot status identifier is stored in a preset storage location and is read by the boot control unit 12 during subsequent system boot processes to serve as the basis for selecting the primary boot path, the backup boot path, or entering the repair partition, thereby realizing hierarchical switching and fault-tolerant control of the boot path.
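
The identifier updates in [0029] can be pictured as a transition function on the baseboard management controller side. This is a hedged sketch: the values 0, 1, and 2 come from the text above, while the function name, the path labels, and the behavior after a successful backup boot or a repair boot (leaving the identifier unchanged) are assumptions that the source does not specify.

```python
PRIMARY_OK = 0       # first value: the primary path booted successfully
PRIMARY_FAILED = 1   # second value: the primary path failed
BACKUP_FAILED = 2    # third value: the backup path failed as well


def next_identifier(current: int, path: str, succeeded: bool) -> int:
    """Return the identifier to store after one monitored boot attempt."""
    if path not in ("primary", "backup", "repair"):
        raise ValueError(f"unknown boot path: {path!r}")
    if path == "primary":
        return PRIMARY_OK if succeeded else PRIMARY_FAILED
    if path == "backup" and not succeeded:
        return BACKUP_FAILED
    # Assumption (not stated in the source): a successful backup boot, or any
    # boot of the repair partition, leaves the stored identifier unchanged.
    return current
```

Starting from 0, a failed primary boot stores 1, and a subsequent failed backup boot stores 2, which is the sequence that leads the boot control unit 12 to the repair partition.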

[0030] Furthermore, a startup status identifier is generated and stored based on the monitoring results, including: The startup process of the computer system is monitored to identify startup abnormal states and startup timeout states. The startup abnormal state refers to an event in which a fatal error is reported during the startup process, and the startup timeout state refers to an event in which the startup process fails to advance to the next stage within a preset time threshold. Based on the identified startup abnormal states and startup timeout states, corresponding startup status identifiers are generated and stored.

[0031] Specifically, after the computer system is powered on or reset, the out-of-band monitoring module in the baseboard management controller 11 begins to monitor the boot process in stages. The boot process is typically divided into the firmware initialization stage, the bootloader execution stage, the operating system kernel loading stage, and the operating system takeover stage. The out-of-band monitoring module collects and judges the status signals, stage completion flags, and stage progression time corresponding to each stage, thereby refining the monitoring granularity to the point where it can distinguish between boot anomalies and boot timeouts. Specifically, in the process of identifying boot anomalies, the out-of-band monitoring module focuses on detecting whether fatal error reporting events occur during the boot process. These fatal error reporting events include, but are not limited to, BIOS or UEFI self-test failures, bootloader failures, corrupted boot configuration files, kernel image failures to load, missing initialization memory disks, file system mount failures, and explicit error codes or error log information output by the system. Once any of the above error events that prevent the boot process from continuing are detected, the baseboard management controller 11 determines the current state as a boot anomaly. In the process of identifying boot timeouts, the out-of-band monitoring module also sets a corresponding preset time threshold for each boot stage and continuously judges whether the system progresses to the next stage within the time threshold. If no preset stage completion flag, stage transition signal, or system heartbeat change is detected within the corresponding time threshold in the current stage, the system is determined to have experienced a startup timeout in that stage. 
For example, if no kernel loading start signal is detected within a preset time after the bootloader executes, the boot stage is determined to have timed out; if no heartbeat signal indicating that the operating system has entered the running state is detected within a preset time after the kernel is loaded, the kernel startup stage is determined to have timed out. After identifying a startup anomaly or startup timeout, the baseboard management controller 11 generates a corresponding startup status identifier based on different event types and writes the startup status identifier to a preset storage location. This startup status identifier includes a status category field, a stage identifier field, a timestamp field, and a verification field. The status category field is used to distinguish between anomaly and timeout types; the stage identifier field is used to indicate whether the fault occurred in the firmware initialization stage, boot loading stage, kernel loading stage, or system takeover stage; the timestamp field and verification field are used for subsequent status tracing and data integrity verification. Through the aforementioned phased and fine-grained monitoring and identification methods, the baseboard management controller 11 can not only determine whether the system has failed to start, but also further distinguish whether the failure is due to an abnormal startup event or a startup timeout event, thereby providing a basis for the subsequent startup control unit 12 to execute more accurate path switching and repair decisions.
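
[0031] names four fields in the identifier (status category, stage identifier, timestamp, verification) but gives no binary layout. The sketch below assumes one possible layout purely for illustration: one byte each for category and stage, a 64-bit timestamp, and a CRC-32 as the verification field. The numeric codes are hypothetical.

```python
import struct
import zlib
from typing import Tuple

_BODY = struct.Struct("<BBQ")  # category, stage, timestamp (Unix seconds)
_CHECK = struct.Struct("<I")   # CRC-32 over the body

# Hypothetical numeric codes for the two enumerated fields.
CATEGORY = {"abnormal": 1, "timeout": 2}
STAGE = {"firmware": 0, "bootloader": 1, "kernel": 2, "os_takeover": 3}
_CATEGORY_NAMES = {code: name for name, code in CATEGORY.items()}
_STAGE_NAMES = {code: name for name, code in STAGE.items()}


def encode_identifier(category: str, stage: str, timestamp: int) -> bytes:
    """Pack the fields and append the verification field."""
    body = _BODY.pack(CATEGORY[category], STAGE[stage], timestamp)
    return body + _CHECK.pack(zlib.crc32(body))


def decode_identifier(blob: bytes) -> Tuple[str, str, int]:
    """Verify integrity, then return (category, stage, timestamp)."""
    if len(blob) != _BODY.size + _CHECK.size:
        raise ValueError("unexpected record length")
    body = blob[:_BODY.size]
    (stored_crc,) = _CHECK.unpack(blob[_BODY.size:])
    if zlib.crc32(body) != stored_crc:
        raise ValueError("verification field mismatch")
    category, stage, timestamp = _BODY.unpack(body)
    if category not in _CATEGORY_NAMES or stage not in _STAGE_NAMES:
        raise ValueError("unknown category or stage code")
    return _CATEGORY_NAMES[category], _STAGE_NAMES[stage], timestamp
```

A reader such as the boot control unit 12 would reject a record whose verification field does not match rather than act on possibly corrupted data; what it does next in that case is left open by the source.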

[0032] When the system starts, the BIOS or UEFI firmware reads the startup status identifier stored in the baseboard management controller 11.

[0033] Specifically, after the computer system powers on or resets, the BIOS or UEFI firmware, acting as the boot control unit 12, is executed first. After completing basic hardware initialization and self-test, it accesses the baseboard management controller 11 or its associated storage location during a preset boot phase and obtains the boot status identifier, either by issuing preset communication protocol instructions or by reading data from a predefined address. The BIOS or UEFI firmware then parses the boot status identifier to determine whether the previous boot succeeded and, if not, what type of fault occurred. The parsing result is the basis for subsequent boot control: it determines whether to continue along the default boot path, switch to the backup path, or enter the repair process. In this way, the boot control unit 12 reads and uses the boot status recorded by the out-of-band monitoring module, so that boot decisions are made on the basis of the historical state.

[0034] Based on the boot status identifier that was read, the BIOS or UEFI firmware selects one of the primary boot path partition, the backup boot path partition, and the repair partition in the preset multi-level redundant storage partition 13 to perform the boot operation.

[0035] Specifically, after the BIOS or UEFI firmware has read and parsed the boot status identifier, it evaluates the identifier against preset boot decision rules and selects the corresponding boot path from the preset multi-level redundant storage partition 13 to perform the boot operation. The multi-level redundant storage partition 13 includes at least a primary boot path partition, a backup boot path partition, and a repair partition, each corresponding to a different boot target in the boot device list. When the boot status identifier indicates that the previous boot succeeded or that no anomaly was detected, the BIOS or UEFI firmware selects the primary boot path partition according to the default priority, loads the bootloader in the primary EFI system partition, and then reads the operating system kernel and initial RAM disk files from the primary BOOT file partition to complete the system boot. When the boot status identifier indicates that the primary boot path failed to boot, the BIOS or UEFI firmware adjusts the boot order or directly specifies the boot target, skips the primary boot path partition, and boots from the backup boot path partition, thereby switching the boot path automatically. When the boot status identifier further indicates that the backup boot path also failed to boot, the BIOS or UEFI firmware selects the repair partition as the boot target and enters an independent repair environment through the bootloader built into that partition. In practice, the path selection can be implemented by modifying the UEFI boot options, dynamically updating the BootOrder variable, or directly invoking the boot file path in the specified partition (such as \EFI\BOOT\BOOTX64.EFI). 
This path selection mechanism based on boot status identifiers allows the system to complete a step-by-step switch from the primary path to the backup path and then to the repair partition without manual intervention, thereby improving the reliability and fault tolerance of the boot process.
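
As a non-limiting illustration of the BootOrder approach mentioned above: in UEFI, the BootOrder variable holds an ordered array of UINT16 option numbers, each referring to a Boot#### load option. The sketch below only computes the reordered list and its little-endian byte encoding; it does not write any firmware variable, and the three option numbers are hypothetical.

```python
import struct

# Hypothetical Boot#### option numbers for the three boot targets.
PRIMARY, BACKUP, REPAIR = 0x0000, 0x0001, 0x0002


def promote_boot_option(boot_order, target):
    """Return a new order with `target` first and the rest unchanged."""
    return [target] + [n for n in boot_order if n != target]


def pack_boot_order(boot_order):
    """Serialize the option numbers as little-endian UINT16 values."""
    return struct.pack(f"<{len(boot_order)}H", *boot_order)


# Switching to the backup path: BACKUP moves to the front of the order.
new_order = promote_boot_option([PRIMARY, BACKUP, REPAIR], BACKUP)
payload = pack_boot_order(new_order)
```

Moving the chosen entry to the front, rather than deleting the failed entry, keeps the primary path in the list so that it can be restored after a successful repair.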

[0036] Furthermore, the step in which the BIOS or UEFI firmware, based on the boot status identifier that was read, selects one of the primary boot path partition, the backup boot path partition, and the repair partition of the preset multi-level redundant storage partition 13 to perform a boot operation includes: when the boot status identifier is a first value, booting the operating system from the primary boot path partition according to the primary boot path; when the boot status identifier is a second value, booting the operating system from the backup boot path partition according to the backup boot path; and when the boot status identifier is a third value, booting into the repair partition to repair the system boot.

[0037] Specifically, after the BIOS or UEFI firmware reads and parses the boot status identifier, it selects the boot path and executes the boot process according to the identifier's value. When the boot status identifier is the first value (0) or is not set, the previous system boot succeeded or there is no current error record. The BIOS or UEFI firmware follows the default boot policy and boots from the primary boot path partition: it loads the bootloader in the primary EFI system partition, locates the operating system kernel and initial RAM disk file in the primary BOOT file partition according to the bootloader's configuration file, and then completes kernel loading and system boot. When the boot status identifier is the second value (1), the primary boot path failed during the previous boot. The BIOS or UEFI firmware skips the primary boot path partition according to a preset policy and boots from the backup boot path partition: it invokes the bootloader in the backup EFI system partition and loads the kernel and initial RAM disk from the backup BOOT file partition, completing the operating system boot along the backup boot path and thereby providing automatic switchover and fault tolerance for primary path failures. When the boot status identifier is the third value (2), neither the primary boot path nor the backup boot path can complete a normal boot. The BIOS or UEFI firmware selects the repair partition as the boot target and loads the bootloader pre-installed in the repair partition, entering an independent repair system environment. In this environment, a preset repair process is executed or repair tools are invoked to detect and repair the bootloader, kernel files, or configuration data in the primary boot path partition and the backup boot path partition, in order to restore the system's normal boot capability. 
Through this hierarchical boot mechanism based on the value of the boot status identifier, an automated process running from normal boot through fault switchover to fault repair is achieved, thereby improving the reliability and self-healing capability of system boot.
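
As a non-limiting illustration, the mapping from identifier value to boot target described above might be sketched as follows. The function and type names are hypothetical, and raising an error for an unrecognized value is an assumption; the text does not state how such a value is handled.

```python
from enum import Enum
from typing import Optional


class BootTarget(Enum):
    PRIMARY = "primary boot path partition"
    BACKUP = "backup boot path partition"
    REPAIR = "repair partition"


def select_boot_target(status: Optional[int]) -> BootTarget:
    """Map a boot status identifier value to the partition to boot from."""
    if status is None or status == 0:   # first value, or identifier not set
        return BootTarget.PRIMARY
    if status == 1:                     # second value: primary path failed
        return BootTarget.BACKUP
    if status == 2:                     # third value: backup path also failed
        return BootTarget.REPAIR
    # Assumption: unknown values are rejected rather than silently mapped.
    raise ValueError(f"unrecognized boot status identifier: {status!r}")
```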

[0038] Furthermore, the repair partition contains a pre-installed automatic repair script, and the repair partition is configured to record the timestamp of each entry. Based on the timestamp, when the time interval between the current entry and the previous entry is less than a preset threshold, the execution of the automatic repair script is paused and manual intervention is prompted. The automatic repair script includes one or more of the following: reinstalling the bootloader to the primary EFI system partition and the backup EFI system partition, repairing the boot configuration file, synchronizing boot files from the backup boot path partition, checking and repairing the root file system, and updating the system's file system mount table fstab.

[0039] Specifically, after the BIOS or UEFI firmware boots the system into the repair partition, the minimal operating system environment pre-installed in the repair partition is loaded and run, and the execution flow of the automatic repair script is triggered automatically. Each time the repair partition is entered, the repair environment first obtains the current time from the system clock and records it in a preset storage location as the timestamp of the current entry. It also reads the timestamp recorded at the previous entry into the repair partition. The time interval between the current and previous timestamps is then calculated and compared with a preset time threshold (e.g., 20 minutes). If the time interval is less than the preset threshold, the system is determined to have entered repair mode repeatedly within a short period, which indicates that automatic repair may not have resolved the problem or that a complex fault exists. In this case, execution of the automatic repair script is paused, and prompts such as console messages and remote management alarms are output to guide maintenance personnel to intervene manually. If the time interval is greater than or equal to the preset threshold, automatic repair is permitted and the automatic repair script is started. The script performs multiple repair operations in sequence according to preset steps, including but not limited to: reinstalling the bootloader to the primary and backup EFI system partitions; repairing or rebuilding the boot configuration file; synchronizing boot files from the backup boot path partition to the primary boot path partition; performing consistency checks and repairs on the root file system; and updating the system's file system mount table (fstab) to ensure that each partition can be mounted correctly. 
During the execution of the repair script, the result of each operation is recorded, and a repair status flag is generated after all repair operations are completed for subsequent determination of repair success. This timestamp-based automatic repair control mechanism not only enables automatic recovery from common boot failures but also avoids the system repeatedly performing ineffective repairs within a short period, thereby improving system stability and operational efficiency.
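
As a non-limiting illustration, the timestamp check that gates the automatic repair script might be sketched as follows. The 20-minute value follows the example above; treating a missing previous timestamp as "repair permitted" is an assumption, since the first entry into the repair partition is not discussed.

```python
from typing import Optional

# Example threshold from the description: 20 minutes, in seconds.
REPAIR_REENTRY_THRESHOLD_S = 20 * 60


def may_run_auto_repair(
    now_s: float,
    previous_entry_s: Optional[float],
    threshold_s: float = REPAIR_REENTRY_THRESHOLD_S,
) -> bool:
    """Return True if the automatic repair script may run on this entry."""
    if previous_entry_s is None:
        # Assumption: with no recorded history there is nothing to compare.
        return True
    # An interval below the threshold means repair mode was re-entered
    # too soon, so the script is paused and manual intervention prompted.
    return (now_s - previous_entry_s) >= threshold_s
```

For example, with a previous entry at t = 1000 s, an entry at t = 2199 s is refused (1199 s elapsed), while an entry at t = 2200 s is allowed (1200 s elapsed).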

[0040] Furthermore, after booting into the repair partition, the method also includes: identifying the repair execution status of the automatic repair script; and, when the repair execution status is "repair successful", clearing the boot status identifier stored in the baseboard management controller 11 after the repair operation is completed, so that the normal process of booting from the primary boot path is restored on the next boot.

[0041] Specifically, after the system boots into the repair partition and executes the automatic repair script, the repair environment monitors the script's execution and determines the result. As the automatic repair script performs its repair operations, it sets an execution status return value or status flag for each critical step. When all critical operations return a success status and no error information is detected, the repair execution status is determined to be successful. If any critical step fails or an error is detected, the repair is determined to have failed or to be incomplete. After determining that the repair succeeded, the repair environment sends a status update command to the baseboard management controller 11 through a preset communication interface, or directly accesses its storage area, to clear the previously recorded boot status identifier or reset it to its initial value. This removes the recorded boot failure state and returns the system to its normal initial boot state. After the clearing operation is completed, a restart is triggered. During the next system boot, the BIOS or UEFI firmware reads the cleared or reset boot status identifier and, according to the default policy, boots from the primary boot path partition first, thereby restoring the normal boot process. This mechanism closes the loop on state management after a successful repair and prevents the system from repeatedly entering the backup path or repair mode because of a residual failure record, ensuring that the system returns to normal operation once the repair is complete.
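
As a non-limiting illustration, the per-step result recording and the clear-then-restart sequence described above might be sketched as follows. The `clear_flag` and `reboot` callables stand in for the interface to the baseboard management controller and the restart mechanism; both are hypothetical, as is the choice to run every step even after one has failed and to leave the identifier untouched on failure.

```python
from typing import Callable, Dict, List, Tuple

RepairStep = Tuple[str, Callable[[], bool]]


def run_repair(steps: List[RepairStep]) -> Dict[str, bool]:
    """Run each critical step in order and record whether it succeeded."""
    results: Dict[str, bool] = {}
    for name, action in steps:
        try:
            results[name] = bool(action())
        except Exception:
            # A step that raises is recorded as failed.
            results[name] = False
    return results


def repair_succeeded(results: Dict[str, bool]) -> bool:
    """Successful only if at least one step ran and every step succeeded."""
    return bool(results) and all(results.values())


def finish_repair(
    results: Dict[str, bool],
    clear_flag: Callable[[], None],
    reboot: Callable[[], None],
) -> bool:
    """Clear the boot status identifier and restart only after success."""
    if not repair_succeeded(results):
        return False  # identifier is left as recorded
    clear_flag()
    reboot()
    return True
```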

[0042] Furthermore, the multi-level redundant storage partition 13 is not limited to two sets of boot path partitions and may include three or more boot path partitions, forming an N+1 or N+M redundancy mode; the repair partition is not limited to a physical storage partition and may be implemented as one or more of the following: a hidden partition, a digitally signed secure area, and a pre-boot execution environment (PXE) booted via the network.

[0043] Specifically, the multi-level redundant storage partition 13 is not limited to a two-set structure consisting of a primary boot path and a backup boot path. In practical applications, it can be expanded to three or more independent boot path partitions according to system reliability requirements, forming an N+1 or N+M redundancy mode, where N is the number of boot paths in normal use and +1 or +M is the number of additional redundant backup paths. The boot paths are physically isolated from one another, and each set contains a complete EFI system partition and BOOT file partition. The bootloader, kernel files, and related configuration stored in them can be kept consistent or compatible through mirroring or synchronization mechanisms. Therefore, when any path fails, the system can switch to the other available paths, either sequentially or according to a preset strategy, and continue booting, which further improves boot fault tolerance. The repair partition is likewise not limited to a single physical partition. It can be a hidden or protected partition on the system disk that is invisible to the operating system or accessible only under specific conditions, which prevents accidental operation or damage. Alternatively, the repair partition can be a secure storage area protected by a digital signature, where signature verification ensures the integrity and trustworthiness of the repair environment and prevents malicious tampering. The repair environment can also be provided by network booting: the repair system image is loaded from a remote server through a pre-boot execution environment (PXE), so that the system can still enter repair mode when local storage is unavailable. 
Through these extensions, the multi-level redundant storage partition structure is flexible and scalable: it adapts to computer systems of different sizes and reliability requirements, and it provides stronger boot assurance and recovery capability across a range of failure scenarios.
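
As a non-limiting illustration, sequential fallback across an arbitrary number of boot paths and repair sources might be sketched as follows. The path and source names, and the idea of tracking failures as simple sets, are hypothetical; a preset strategy other than sequential order could equally be used.

```python
from typing import Iterable, Optional, Sequence


def next_target(
    boot_paths: Sequence[str],
    failed: Iterable[str],
    repair_sources: Sequence[str],
    unavailable: Iterable[str] = (),
) -> Optional[str]:
    """Pick the first boot path not recorded as failed; if every boot
    path has failed, pick the first available repair source."""
    failed_set = set(failed)
    for path in boot_paths:
        if path not in failed_set:
            return path
    unavailable_set = set(unavailable)
    for source in repair_sources:
        if source not in unavailable_set:
            return source
    return None  # nothing left to try without manual intervention


# Hypothetical N+M layout: two paths in normal use plus two spares, with a
# hidden local repair partition and a PXE-loaded repair image as fallback.
PATHS = ["path-1", "path-2", "spare-1", "spare-2"]
REPAIR = ["hidden-partition", "pxe"]
```

With `failed={"path-1", "path-2"}` the function returns `"spare-1"`; with all four paths failed and the hidden partition unavailable, it returns `"pxe"`.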

[0044] In summary, the intelligent startup device for a computer system provided in the embodiments of this application has at least the following technical effects: by constructing a three-level fault tolerance mechanism from primary and backup dual paths and an independent repair partition, and by using out-of-band monitoring to achieve automatic fault detection and intelligent switching of boot paths, the reliability and automation of server system boot are improved and business offline time is reduced.

[0045] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of this application and its equivalents, this application also intends to include such modifications and variations.

Claims

1. An intelligent startup device for a computer system, characterized in that it comprises: a baseboard management controller having an out-of-band monitoring module, the baseboard management controller monitoring the boot process of the computer system through the out-of-band monitoring module and generating a boot status identifier from the monitoring results; a boot control unit comprising BIOS or UEFI firmware, communicatively connected to the baseboard management controller and configured to read the boot status identifier; and multi-level redundant storage partitions, including a primary boot path partition, a backup boot path partition, and a repair partition independent of the boot paths; wherein the boot control unit is configured to select, based on the boot status identifier, a corresponding partition from the primary boot path partition, the backup boot path partition, and the repair partition to perform a boot operation.

2. The intelligent startup device for a computer system according to claim 1, characterized in that the multi-level redundant storage partitions are constructed at the physical layer of the system disk; the primary boot path partition includes a primary EFI system partition and a primary BOOT file partition; the backup boot path partition includes a backup EFI system partition and a backup BOOT file partition; each EFI system partition is used to store the bootloader, and each BOOT file partition is used to store the operating system kernel and the initial RAM disk; and the repair partition is an independent file system partition that contains a bootloader, operating system kernel files, and a repair toolset.

3. The intelligent startup device for a computer system according to claim 1, characterized in that, The startup status identifier is stored in the non-volatile memory of the baseboard management controller, or in a storage location that is communicatively connected to the baseboard management controller. The storage location that is communicatively connected to the baseboard management controller includes: a preset location on the system disk, a dedicated configuration chip, and a remote management platform.

4. An intelligent startup method for a computer system, characterized in that the method is applied to the intelligent startup device for a computer system according to any one of claims 1-3 and comprises: monitoring, by the out-of-band monitoring module of the baseboard management controller, the boot process of the computer system, and generating and storing a boot status identifier based on the monitoring results; reading, by the BIOS or UEFI firmware when the system starts, the boot status identifier stored in the baseboard management controller; and selecting, by the BIOS or UEFI firmware based on the boot status identifier that was read, one of the primary boot path partition, the backup boot path partition, and the repair partition of the preset multi-level redundant storage partitions to perform the boot operation.

5. The intelligent startup method for a computer system according to claim 4, characterized in that generating and storing a boot status identifier based on the monitoring results includes: when a successful boot from the primary boot path is detected, generating and storing a boot status identifier with a first value; when a failure to boot from the primary boot path is detected, generating and storing a boot status identifier with a second value; and when a failure to boot from the backup boot path is detected, generating and storing a boot status identifier with a third value.

6. The intelligent startup method for a computer system according to claim 5, characterized in that selecting, by the BIOS or UEFI firmware based on the boot status identifier that was read, one of the primary boot path partition, the backup boot path partition, and the repair partition of the preset multi-level redundant storage partitions to perform a boot operation includes: when the boot status identifier is the first value, booting the operating system from the primary boot path partition according to the primary boot path; when the boot status identifier is the second value, booting the operating system from the backup boot path partition according to the backup boot path; and when the boot status identifier is the third value, booting into the repair partition to repair the system boot.

7. The intelligent startup method for a computer system according to claim 6, characterized in that, The repair partition contains a pre-installed automatic repair script, and the repair partition is configured to record the timestamp of each entry. Based on timestamps, when the time interval between the current entry and the previous entry is less than a preset threshold, the execution of the automatic repair script is paused and manual intervention is prompted; The automatic repair script includes one or more of the following: reinstalling the bootloader to the primary EFI system partition and the backup EFI system partition, repairing the boot configuration file, synchronizing boot files from the backup boot path partition, checking and repairing the root file system, and updating the system's file system mount table fstab.

8. The intelligent startup method for a computer system according to claim 4, characterized in that generating and storing a boot status identifier based on the monitoring results includes: monitoring the boot process of the computer system to identify a boot anomaly state and a boot timeout state, wherein a boot anomaly state refers to an event in which a fatal error is reported during the boot process, and a boot timeout state refers to an event in which the boot process fails to advance to the next stage within a preset time threshold; and generating and storing the corresponding boot status identifier based on the identified boot anomaly state or boot timeout state.

9. The intelligent startup method for a computer system according to claim 7, characterized in that, after booting into the repair partition, the method further includes: identifying the repair execution status of the automatic repair script; and, when the repair execution status is "repair successful", clearing the boot status identifier stored in the baseboard management controller after the repair operation is completed, so that the normal process of booting from the primary boot path is restored on the next boot.

10. The intelligent startup method for a computer system according to claim 4, characterized in that the multi-level redundant storage partitions are not limited to two sets of boot path partitions and may include three or more boot path partitions, forming an N+1 or N+M redundancy mode; and the repair partition is not limited to a single physical storage partition and may be implemented as one or more of the following: a hidden partition, a digitally signed secure area, and a pre-boot execution environment (PXE) booted via the network.