Method for firmware context preservation in event of unexpected power loss, and method for firmware context acquisition

Multiple processor cores collaboratively generate and sort context sub-blocks, which addresses the problem of preserving firmware context data when a storage device loses power abnormally. Critical information is retained and faults can be located efficiently, improving the debugging efficiency of storage devices.

WO2026124207A1 · PCT designated stage · Publication Date: 2026-06-18 · MEMBLAZE TECH BEIJING

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: MEMBLAZE TECH BEIJING
Filing Date: 2025-11-25
Publication Date: 2026-06-18

AI Technical Summary

Technical Problem

When a storage device experiences an abnormal power outage, existing technologies struggle to effectively preserve the firmware state in a very short time, resulting in low efficiency in fault analysis and debugging. Furthermore, the limited backup power supply time restricts the application of online debugging methods.

Method used

After an abnormal power loss is detected, multiple processor cores generate context sub-blocks. The sub-blocks are sorted according to the probability of an anomaly and stored in the persistent storage medium space, so that critical information is preserved and can be used for debugging and analysis the next time power is restored.

Benefits of technology

In the event of an abnormal power outage, the method retains critical information, improves the efficiency of fault location and debugging, and allows a complete fault analysis to be performed when the storage device is powered on again.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure relates to a method for firmware context preservation in the event of an unexpected power loss, and a method for firmware context acquisition. The method for firmware context preservation in the event of an unexpected power loss is applied to a storage device, and comprises: detecting an unexpected power loss; after a predetermined time following detection of the unexpected power loss, a control core among a plurality of processor cores detecting a power loss completion flag, and if the power loss completion flag is not present, the control core sending an interrupt to the plurality of processor cores; each of the plurality of processor cores acquiring a corresponding data sub-block on the basis of the interrupt to generate a context sub-block, so as to obtain a plurality of context sub-blocks; and the control core storing some or all of the plurality of context sub-blocks in a target area of a persistent storage medium space on the basis of a sorting result of sorting the plurality of context sub-blocks by the likelihood of anomaly occurrence.

Description

Methods for saving and retrieving firmware state during abnormal power loss

[0001] This disclosure claims priority to two Chinese patent applications: application No. 2024118191948, filed on December 11, 2024, entitled "Method and Apparatus for Saving Firmware Debugging Information During Abnormal Power Loss of Storage Device"; and application No. 2025100430582, filed on January 10, 2025, entitled "Generating CoreDump File Based on Firmware Saved During Abnormal Power Loss". The entire contents of both applications are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to the field of computer technology, and in particular to a method for saving firmware context and a method for obtaining firmware context during abnormal power loss.

Background Technology

[0003] Figure 1 shows a block diagram of a solid-state storage device. The solid-state storage device 102 is coupled to a host computer to provide storage capabilities. The host computer and the solid-state storage device 102 can be coupled in various ways, including but not limited to connections via SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIe (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, and wireless communication networks. The host computer can be an information processing device capable of communicating with the storage device through the above methods, such as a personal computer, tablet computer, server, laptop computer, network switch, router, cellular phone, or personal digital assistant. Storage device 102 includes interface 103, control unit 104, one or more NVM chips 105, and DRAM (Dynamic Random Access Memory) 110.

[0004] NAND flash memory, phase change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetic Random Access Memory), and RRAM (Resistive Random Access Memory) are common types of NVM.

[0005] Interface 103 is adapted to exchange data with the host via, for example, SATA, IDE, USB, PCIe, NVMe, SAS, Ethernet, or Fibre Channel.

[0006] The control unit 104 controls data transfer between the interface 103, the NVM chip 105, and the DRAM 110. It is also used for memory management, mapping of host logical addresses to flash physical addresses, wear leveling, bad block management, etc. The control unit 104 can be implemented in various ways, including software, hardware, firmware, or a combination thereof. For example, the control unit 104 can be in the form of an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or a combination thereof. The control unit 104 may also include a processor or controller that executes software to operate the hardware of the control unit 104 and process I/O (Input/Output) commands. The control unit 104 can also be coupled to the DRAM 110 and can access the data in the DRAM 110. FTL (Flash Translation Layer) tables and/or cached I/O command data can be stored in the DRAM.

[0007] The control unit 104 includes a flash interface controller (or media interface controller, flash channel controller), which is coupled to the NVM chip 105 and issues commands to the NVM chip 105 in accordance with the interface protocol of the NVM chip 105 to operate the NVM chip 105, and receives the command execution results output from the NVM chip 105. Known NVM chip interface protocols include "Toggle", "ONFI", etc.

[0008] Typically, when a storage device loses external power, it needs to save the currently cached I/O data and the various metadata used by the firmware (such as FTL tables) to the NVM chip. For this purpose, components such as capacitors are usually used as a backup power supply that keeps the storage device powered internally for a brief period. However, this period is often very short, and the amount of data that can be saved is limited. Therefore, the firmware needs a relatively complex abnormal power loss handling process in order to save all the necessary data within that short period.

[0009] However, the complex abnormal power-down handling process leads to a higher firmware failure rate. To detect and resolve these firmware failures, it is necessary to analyze the firmware's operational status at the time of the failure. However, during an abnormal power-down, the backup power supply can only provide power for a very short time, which limits the methods for investigating firmware failures, such as online debugging. Therefore, the firmware needs to save its state within a very short time, or retain as much critical content as possible, so that it can be accessed externally for debugging upon the next power-up.

[0010] It can be understood that if the handling of the last power loss did not end normally, the firmware may not have had time to save the firmware context for debugging. Therefore, the firmware context needs to be restored for debugging.

Summary of the Invention

[0011] To solve the above-mentioned technical problems, or at least partially solve them, this disclosure provides a method for saving firmware state and a method for obtaining firmware state during abnormal power loss.

[0012] This disclosure provides a method for saving firmware context during abnormal power loss, applied to a storage device, comprising: detecting an abnormal power loss; after a predetermined time following the detection of the abnormal power loss, a control core among multiple processor cores checks for a power loss completion flag; if the power loss completion flag does not exist, the control core sends an interrupt to the multiple processor cores; each of the multiple processor cores obtains a corresponding data sub-block based on the interrupt to generate a context sub-block, resulting in multiple context sub-blocks; the control core stores part or all of the multiple context sub-blocks in a target area of a persistent storage medium space based on a sorting result of the multiple context sub-blocks according to the probability of an anomaly occurring.

[0013] This disclosure provides a method for obtaining firmware context during abnormal power loss, applied to a storage device. The method includes: when the storage device is powered on, if it is identified that the abnormal power loss processing flow did not end normally during the previous power loss of the storage device, based on multiple fixed-size regions of a fixed-size continuous space in the persistent storage medium space, sequentially reading the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions; if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, then reading the next fixed-size region; and if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, then continuing to read the current fixed-size region to obtain all data blocks constituting the firmware context; determining the firmware context length based on the total number of all data blocks; and in response to the host reading the Telemetry log, determining the size of the Telemetry Log based on the firmware context length, and using one, more, or all of the data blocks as the content of the Telemetry Log.
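The region scan described in [0013] can be illustrated with a minimal Python sketch. The names `recover_context`, `read_block`, and `telemetry_log_size` are hypothetical. The sketch also assumes that an invalid block is reported as `None`, that the scan stops at the first region that yields valid data, and that all blocks have the same size; the disclosure does not fix these details in this passage.

```python
from typing import Callable, List, Optional


def recover_context(
    read_block: Callable[[int, int], Optional[bytes]],
    num_regions: int,
    subregions_per_region: int,
) -> List[bytes]:
    """Return the data blocks of the first region whose first sub-region is valid."""
    for region in range(num_regions):
        first = read_block(region, 0)
        if first is None:
            # No valid block at the start of this region: try the next region.
            continue
        blocks = [first]
        for sub in range(1, subregions_per_region):
            block = read_block(region, sub)
            if block is None:
                # Assumption: the first invalid sub-region ends the saved context.
                break
            blocks.append(block)
        return blocks
    return []


def telemetry_log_size(blocks: List[bytes], block_size: int) -> int:
    """Context length derived from the block count (assumes fixed-size blocks)."""
    return len(blocks) * block_size
```

In this sketch the medium is abstracted behind `read_block(region, sub_region)`, so the same scan could be exercised against a dictionary in a test or against a real flash read routine.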

[0014] This disclosure also provides a method for generating a core dump file from the firmware context of an abnormal power loss. The method is applied to a host connected to a storage device and includes: reading a Telemetry Log from the storage device, the Telemetry Log including firmware context data; parsing and unpacking the firmware context data to generate a packaging configuration file and a data sub-block file; and obtaining a core dump file based on the packaging configuration file and the data sub-block file.

[0015] This disclosure also provides a method for obtaining the firmware context of an abnormal power loss and generating a core dump file, comprising: when the storage device is powered on, if it is identified that the abnormal power loss handling process did not end normally during the previous power loss of the storage device, based on multiple fixed-size regions of a fixed-size continuous space in the persistent storage medium space, sequentially reading the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions; if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, then reading the next fixed-size region; and if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, then continuing to read the current fixed-size region to obtain all data blocks constituting the firmware context; determining the firmware context length based on the total number of all data blocks; the host reading a Telemetry Log from the storage device, the Telemetry Log including firmware context data; the storage device, in response to the host reading the Telemetry Log, determining the size of the Telemetry Log based on the firmware context length, and using one, multiple, or all of the data blocks as the content of the Telemetry Log; the host parsing and unpacking the firmware context data to generate a packaging configuration file and a data sub-block file; and the host obtaining the core dump file based on the packaging configuration file and the data sub-block file.

[0016] This disclosure also provides a system for obtaining a firmware field-generated core dump file after an abnormal power failure, including a host and a storage device. When the storage device is powered on, if it is identified that the abnormal power failure processing flow did not end normally during the previous power failure of the storage device, based on multiple fixed-size regions of a fixed-size continuous space in the persistent storage medium space, the system sequentially reads the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions, including one, some, or all of the fixed-size regions. If no valid data block is read from the first fixed-size sub-region of the current fixed-size region, the system reads the next fixed-size region. If a valid data block is read from the first fixed-size sub-region of the current fixed-size region, the system continues to read the current fixed-size region to obtain all data blocks constituting the firmware field. The firmware field length is determined based on the total number of all data blocks. The host is used to read a Telemetry Log from the storage device, the Telemetry Log including firmware field data. In response to the host reading the Telemetry Log, the storage device determines the size of the Telemetry Log based on the firmware field length, and uses one, some, or all of the data blocks as the Telemetry data. The host is also used to parse and unpack the firmware field data, generate a packaging configuration file and a data sub-block file, and obtain a core dump file based on the packaging configuration file and the data sub-block file.

[0017] This disclosure also provides an electronic device comprising: a processor; and a memory for storing instructions executable by the processor. The processor is configured to read the executable instructions from the memory and execute them to implement any of the methods provided in this disclosure: the method for saving firmware context during abnormal power loss, the method for obtaining firmware context after abnormal power loss, the method for generating a core dump file from the firmware context of an abnormal power loss, or the method for obtaining the firmware context of an abnormal power loss and generating a core dump file.

[0018] Compared with the prior art, the technical solution provided in this disclosure has the following advantages. The firmware context saving solution for abnormal power loss provided in this disclosure detects an abnormal power loss. After a predetermined time has elapsed since the detection of the abnormal power loss, the control core among multiple processor cores checks for a power loss completion flag. If the power loss completion flag is absent, the control core sends an interrupt to the multiple processor cores. Each processor core obtains its corresponding data sub-block based on the interrupt and generates a context sub-block. The control core then stores part or all of the context sub-blocks in a target area of the persistent storage medium space based on a sorting result according to the probability of an anomaly. With this technical solution, when an anomaly occurs during power loss handling, key information in the firmware context is preserved, allowing problems to be located quickly from the saved context data and greatly improving debugging efficiency. In the solution for generating a core dump file from the firmware context of an abnormal power loss provided in this disclosure, the host reads a Telemetry Log from a storage device, the Telemetry Log including firmware context data; parses and unpacks the firmware context data to generate a packaging configuration file and a data sub-block file; and obtains a core dump file based on the packaging configuration file and the data sub-block file. With this technical solution, when the storage device experiences a power failure and is powered on again, debugging and analysis can still be performed even if the firmware context was not fully saved, which greatly improves the efficiency of debugging the storage device.

Attached Figure Description

[0019] Figure 1 is a schematic diagram of a solid-state storage device;

[0020] Figure 2 is a structural block diagram of a storage device provided in an embodiment of this disclosure;

[0021] Figure 3 is a flowchart illustrating a method for saving firmware context during abnormal power loss, provided in an embodiment of this disclosure;

[0022] Figure 4 is an example diagram of dividing the firmware context into context sub-blocks, provided in an embodiment of this disclosure;

[0023] Figure 5 is an example diagram of the context sub-block structure provided in an embodiment of this disclosure;

[0024] Figure 6 is a flowchart illustrating the firmware context saving process provided in an embodiment of this disclosure;

[0025] Figure 7 is an example diagram of a host and storage device connection provided in an embodiment of this disclosure;

[0026] Figure 8 is a flowchart illustrating a firmware context restoration method for abnormal power loss provided in an embodiment of this disclosure;

[0027] Figure 9A is a schematic diagram of the structure of the persistent storage medium space provided in an embodiment of this disclosure;

[0028] Figure 9B is a schematic diagram of the structure of a fixed-size region provided in an embodiment of this disclosure;

[0029] Figure 9C is a schematic diagram of the structure of the context sub-block provided in an embodiment of this disclosure;

[0030] Figure 10 is a flowchart illustrating the firmware context restoration process provided in an embodiment of this disclosure;

[0031] Figure 11 is a flowchart illustrating the firmware context restoration process provided in an embodiment of this disclosure;

[0032] Figure 12 is a flowchart illustrating a method for generating a core dump file from the firmware context of an abnormal power loss, according to an embodiment of this disclosure;

[0033] Figure 13 is an example diagram of obtaining firmware context provided in an embodiment of this disclosure;

[0034] Figure 14 is an example diagram of firmware context restoration provided in an embodiment of this disclosure;

[0035] Figure 15 is a flowchart illustrating the firmware context restoration process provided in an embodiment of this disclosure;

[0036] Figure 16 is a flowchart illustrating the firmware context restoration process provided in an embodiment of this disclosure;

[0037] Figure 17 is a schematic diagram of a device for saving firmware context during abnormal power loss provided in an embodiment of this disclosure;

[0038] Figure 18 is a schematic diagram of a firmware context restoration device for abnormal power loss provided in an embodiment of this disclosure;

[0039] Figure 19 is a schematic diagram of an apparatus for generating a core dump file from the firmware context of an abnormal power loss, according to an embodiment of this disclosure.

Detailed Implementation

[0040] Based on the foregoing background description, FIG. 2 shows a block diagram of a typical storage device to which an embodiment of the present disclosure is applied. FIG. 2 is also a detailed block diagram of the control unit 104 of FIG. 1.

[0041] The control unit includes multiple processor cores (e.g., processor cores 0-3). Each processor core has its own memory (e.g., memory 0-3) for storing the firmware that runs on that processor core. The firmware running on each processor core is different. Each processor core's own memory also stores data generated or used during firmware execution. Software running in an embedded device is commonly referred to as firmware.

[0042] The control unit also includes cache/shared memory. The shared memory can be used by the processor cores.

[0043] The control unit also includes SRAM, which is mainly used as a cache to store I/O data. In this disclosure, it is also used to store the context sub-blocks generated by each processor core.

[0044] The processor cores can communicate with each other. This can be done through inter-core queues (not shown) or shared memory.

[0045] Tasks are divided among the processor cores. One core is selected as the control core.

[0046] Typically, when there are multiple processor cores, the firmware running on one processor core is called a logical module. For example, the control core is logical module 0. However, some processor cores run multiple logical modules. These logical modules can be viewed as different pieces of firmware that can run concurrently. For example, a logical module is a process, thread, or coroutine.

[0047] For subsequent debugging, the firmware context that a processor core/logical module needs to save includes various state information at the time of the abnormal power loss, mainly: register states, data in each core's private memory area, data in the inter-core shared fast memory area, data in the inter-core shared slow memory area, stack information, SQ/CQ queues, Admin queues, memory management information, memory information, other processor status information, hardware accelerator status information, cache contents, inter-core communication queues, inter-core shared memory, etc.

[0048] Memory information mainly includes how the application uses memory, including the stack, code segment, data segment, and heap.

[0049] Register states contain the values of the processor core's architectural registers, such as the program counter and stack pointer, at the time the program crashes.

[0050] Stack information includes the stack pointer and function call stack information, which helps to locate the function call chain when the program crashes.

[0051] Memory management information relates to a program's memory allocation and usage, which helps in detecting problems such as memory leaks.

[0052] In addition to memory information, other processor and operating system status information includes some key program running statuses.

[0053] The firmware context primarily resides in the registers within the processor core and in the memory corresponding to the processor core, and also includes the contents of inter-core queues and of the SQ/CQ and/or AdminQ defined by the NVMe protocol. The state of the flash channel controller can also serve as firmware context. The flash channel controller is used to operate NAND flash memory.

[0054] The control unit has external connections to a backup power supply, DRAM, and multiple NAND flash memory modules.

[0055] NAND flash memory is a "persistent storage medium." Its storage space is divided into two parts: one part stores user data (related to I/O commands), and the other part stores system data. A portion of the system data storage space is reserved as the "persistent storage medium space" for storing the context sub-blocks of this disclosure. This space is further divided into multiple areas. A context sub-block is written to one or more of these areas.

[0056] The backup power supply provides short-term power during an abnormal power loss so that the control unit can execute the abnormal power loss handling process. It supplies power for 10-20 ms. Its main function is to allow user data cached in SRAM or DRAM to be written to NAND flash memory during an abnormal power loss; storing context sub-blocks is an additional demand placed on it. Therefore: (1) the backup power supply cannot be relied upon for storing context sub-blocks, so the context sub-block storage process needs to be optimized, including having multiple cores work collaboratively, avoiding interference with the regular abnormal power loss handling process, and prioritizing the storage of the more important context sub-blocks; (2) the backup power supply is used to store not only context sub-blocks but also other data.

[0057] Each processor core generates several context sub-blocks. These are further divided into context sub-blocks that can only be generated by the processor core itself and context sub-blocks that can be generated by any processor core. The contents of a processor core's internal registers and its own dedicated memory are known only to that processor core, so the corresponding context sub-blocks can only be generated by the core itself. However, for content such as SQ/CQ, any processor core can access the data and generate the corresponding context sub-blocks.

[0058] The method for saving firmware context during abnormal power loss proposed in this disclosure involves detecting an abnormal power loss. After a predetermined time following the detection of the abnormal power loss, the control core among multiple processor cores checks for a power loss completion flag. If the flag is absent, the control core sends an interrupt to the multiple processor cores. Each processor core retrieves its corresponding data sub-block based on the interrupt and generates a context sub-block. The control core then stores part or all of the context sub-blocks in a target area of the persistent storage medium space based on a sorting result according to the probability of an anomaly. When the abnormal power loss handling process fails, this technical solution saves the critical context within a very short time and allows the context to be accessed externally and restored as fully as possible for debugging after the next power-on, thereby significantly improving debugging efficiency.

[0059] Figure 3 is a flowchart illustrating a method for saving firmware context during abnormal power loss according to an embodiment of this disclosure. As shown in Figure 3, this method is applied to a storage device and includes the following steps.

[0060] Step 201: Detect an abnormal power loss.

[0061] Step 202: After a predetermined time has elapsed since the abnormal power failure was detected, the control core among the multiple processor cores checks the power failure completion flag. If the power failure completion flag is not present, the control core sends an interrupt to the multiple processor cores.

[0062] Specifically, the storage device is connected to the host. Under normal circumstances, the user shuts down the host via buttons or the operating system. During the host shutdown process, the host notifies the storage device to power down. The host provides sufficient power to the storage device during the power-down period, allowing the storage device enough time to execute its power-down process. During abnormal power-down handling, each processor core has its own processing flow, and they operate independently and in parallel. An "interrupt" indicates that the control core has detected an anomaly and needs to save the firmware state. This causes each processor core to stop and generate a state sub-block. After generation, each processor core continues its own power-down handling process, while the control core is responsible for writing the state sub-blocks sequentially to the persistent storage medium.

[0063] In this embodiment of the disclosure, abnormal power failure refers to an event in which the storage device loses power because the user does not perform a normal power-off operation using the power button or other means.

[0064] In some embodiments, a preset time longer than the average duration is set, wherein the average duration is determined based on multiple historical durations during which the storage device completes a power outage process.

[0065] Specifically, the preset time is set slightly longer than the normal duration (for example, the average time taken until the power-down completion flag is generated), so that under normal conditions each processor core can complete its abnormal power-down process within it. Thus, if the power-down completion flag is still absent when it is checked after the preset time, one or more logical modules have a problem. Only then is it meaningful to save the firmware context.
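The choice of preset time in [0064] and [0065] can be sketched as follows. This is only an illustration: the function name `preset_timeout_ms` and the multiplicative `margin` are assumptions, since the disclosure only states that the preset time is longer than the average of historical power-down durations.

```python
from statistics import mean
from typing import Sequence


def preset_timeout_ms(history_ms: Sequence[float], margin: float = 1.2) -> float:
    """Preset time = average historical power-down duration times a margin > 1."""
    if not history_ms:
        raise ValueError("at least one historical duration is required")
    if margin <= 1.0:
        # The preset time must exceed the average, so the margin must exceed 1.
        raise ValueError("margin must be greater than 1.0")
    return mean(history_ms) * margin
```

A larger margin reduces false triggers on slow but healthy power-down runs, at the cost of leaving less backup-power time for saving context sub-blocks.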

[0066] It should be noted that the preset time can also be set according to the actual application needs. For example, a timer with a fixed time can be set to check the power-off completion flag when the timer expires. The power-off completion flag is used to indicate that the power-off is complete.

[0067] In this embodiment, the firmware includes multiple logical modules running on multiple processor cores. The abnormal power-down handling process of each logical module includes multiple power-down steps. In response to detecting an abnormal power-down, each logical module executes its own abnormal power-down handling process and, while doing so, records the number of each power-down step it has completed. After the logical modules have all completed their own abnormal power-down handling processes, a power-down completion flag is generated. Thus, each logical module divides its power-down handling process into multiple sub-steps and records the numbers of completed sub-steps in real time. The probability of each logical module experiencing an anomaly is analyzed based on its completed sub-step numbers and the logical dependencies between sub-steps, and the logical modules are sorted by that probability. The context of logical modules with a higher probability of an anomaly can then be saved first, further ensuring the efficiency and effectiveness of subsequent debugging.
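The step recording and ranking in [0067] might look like the sketch below. The class `ModuleProgress` and the function `rank_by_anomaly_likelihood` are hypothetical names. The sketch uses the fraction of unfinished steps as a stand-in for the probability of an anomaly, which is an assumption; the disclosure does not give a formula, and the logical dependencies between sub-steps that it mentions are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ModuleProgress:
    name: str
    total_steps: int
    last_completed_step: int = 0  # 0 means no power-down step has finished yet

    def complete_step(self, step: int) -> None:
        """Record the number of the power-down step that just completed."""
        self.last_completed_step = step

    @property
    def remaining_fraction(self) -> float:
        """Share of steps not yet completed (assumed proxy for anomaly likelihood)."""
        return 1.0 - self.last_completed_step / self.total_steps


def rank_by_anomaly_likelihood(modules: Iterable[ModuleProgress]) -> List[ModuleProgress]:
    """Modules that are furthest from finishing come first."""
    return sorted(modules, key=lambda m: m.remaining_fraction, reverse=True)
```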

[0068] If the power-down completion flag is not detected (i.e., the flag does not exist), the firmware context needs to be saved. In other words, when an abnormal power-down begins, the firmware starts a timer with a fixed duration. When the timer expires, the firmware checks the power-down completion flag; if the flag is not present, the firmware saves its context. It should be noted that the abnormal power-down handling processes of the logical modules continue to run.

[0069] The firmware state saving function can be initiated by the control core sending an interrupt to multiple processor cores (including the control core) to notify them to save the firmware state. Depending on the actual application requirements, one of the processor cores can be designated as the control core, and the other processor cores can perform corresponding operations based on the interrupts sent by the control core.
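The check described in [0068] and [0069] can be sketched as a timer-expiry handler run by the control core. The names `Core`, `raise_interrupt`, and `on_timer_expired` are hypothetical, and the interrupt is modeled as a simple boolean for illustration rather than a real inter-processor interrupt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Core:
    core_id: int
    interrupted: bool = False

    def raise_interrupt(self) -> None:
        """Stand-in for delivering the context-save interrupt to this core."""
        self.interrupted = True


def on_timer_expired(completion_flag_set: bool, cores: List[Core]) -> bool:
    """Return True if context saving was triggered, False if power-down finished."""
    if completion_flag_set:
        # Power-down handling completed in time: nothing needs to be saved.
        return False
    for core in cores:
        # The interrupt goes to every core, including the control core itself.
        core.raise_interrupt()
    return True
```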

[0070] Step 203: Each of the multiple processor cores obtains the corresponding data sub-block based on the interrupt and generates a context sub-block, resulting in multiple context sub-blocks.

[0071] Step 204: Based on the result of sorting the multiple context sub-blocks according to the probability of an anomaly occurring, the control core stores part or all of the multiple context sub-blocks in the target area of the persistent storage medium space.
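Storing "part or all" of the sorted sub-blocks in Step 204 can be illustrated with the sketch below. The function `store_sorted_sub_blocks`, the `write_block` callback, and the `capacity_blocks` limit are assumptions introduced for illustration; in particular, a `False` return from `write_block` stands in for any reason a write cannot proceed, such as the backup power running out.

```python
from typing import Callable, Iterable


def store_sorted_sub_blocks(
    sorted_blocks: Iterable[bytes],
    write_block: Callable[[bytes], bool],
    capacity_blocks: int,
) -> int:
    """Write sub-blocks in priority order; return how many were stored."""
    written = 0
    for block in sorted_blocks:
        if written >= capacity_blocks:
            break  # target area is full: lower-priority blocks are dropped
        if not write_block(block):
            break  # write could not proceed: stop with what was saved so far
        written += 1
    return written
```

Because the input is already sorted by likelihood of anomaly, whatever prefix fits in the target area or the remaining power budget is the most valuable subset for later debugging.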

[0072] In this embodiment, after receiving an interrupt, each processor core packages its corresponding data sub-block into a corresponding context sub-block, and writes a completion marker when the context sub-block packaging is complete. The control core generates its own context sub-block and then exits the interrupt. Cores not acting as the control core, after generating their own context sub-blocks, also assist in handling any incomplete packaging from data sub-blocks to context sub-blocks. After exiting the interrupt, the control core sorts the context sub-blocks according to the probability of an anomaly occurring, and stores some or all of the context sub-blocks in the target area of the persistent storage medium space.
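The division of work in [0072] can be sketched as follows. All names (`SubBlock`, `pack`, `handle_interrupt`) are hypothetical, the cores are simulated sequentially rather than concurrently, and the shared work list is a plain dictionary, so this only illustrates the ordering of responsibilities, not a real multi-core implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubBlock:
    source: str         # e.g. "core1.registers" or "sq_cq"
    payload: bytes
    done: bool = False  # completion marker, set only after packaging finishes


def pack(source: str, payload: bytes) -> SubBlock:
    """Package one data sub-block and write its completion marker last."""
    block = SubBlock(source, payload)
    block.done = True
    return block


def handle_interrupt(
    core_id: int,
    private_data: Dict[str, bytes],
    shared_pending: Dict[str, bytes],
    is_control_core: bool,
) -> List[SubBlock]:
    """Package this core's own data; non-control cores then help with shared data."""
    blocks = [pack(f"core{core_id}.{name}", data) for name, data in private_data.items()]
    if not is_control_core:
        # Assumption: shared items (e.g. SQ/CQ contents) may be taken by any helper core.
        while shared_pending:
            name, data = shared_pending.popitem()
            blocks.append(pack(name, data))
    return blocks
```

In this sketch the control core returns as soon as its own sub-blocks are packaged, which mirrors the text: it exits the interrupt early so that it can sort and store the sub-blocks while the other cores finish the remaining packaging.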

[0073] The firmware includes multiple logical modules running on multiple processor cores. The power-down completion flags include one or more power-down completion flags generated by these logical modules. The likelihood of an anomaly occurring in one or more of these logical modules is determined based on the generated power-down completion flags, or based on historical anomaly occurrences. In other words, a logical module whose abnormal power-down handling progresses more slowly than expected is more likely to have a problem, so its firmware context is saved with priority; alternatively, experience or historical data indicates which logical modules are more likely to fail during an abnormal power-down.

[0074] In some embodiments, the logical modules in the firmware are sorted by the current progress of their abnormal power-down handling processes, and the anomaly probability of each logical module is ranked according to that sorting result; and/or the anomaly probability is ranked according to the firmware context content processed by each logical module. That is, the ranking depends on the logical module: for example, a module whose abnormal power-down handling is progressing slowly is more likely to have a problem, so the context sub-block corresponding to that module is placed first. The ranking also depends on the firmware context content: for example, register state and stack information are placed first, followed by memory information.
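
The ranking described in this paragraph can be illustrated with a minimal sketch. It assumes that progress is measured as the fraction of completed power-down steps and that both criteria (module progress, then content type) are combined into a single sort key; the names `SubBlock`, `CONTENT_PRIORITY`, and `rank_sub_blocks`, as well as the module names used in the usage note, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass

# Hypothetical content-type priority: a lower value is saved earlier.
CONTENT_PRIORITY = {"registers": 0, "stack": 1, "memory": 2}


@dataclass
class SubBlock:
    module: str        # logical module that owns the context sub-block
    content: str       # kind of firmware context it carries
    steps_done: int    # power-down steps the module has completed
    steps_total: int   # power-down steps the module has to run


def rank_sub_blocks(sub_blocks):
    """Order sub-blocks so the ones most likely tied to an anomaly come first."""
    return sorted(
        sub_blocks,
        # Assumption: slower progress means a higher anomaly probability;
        # ties are broken by content type (registers and stack before memory).
        key=lambda b: (b.steps_done / b.steps_total, CONTENT_PRIORITY[b.content]),
    )
```

Under these assumptions, a module that has completed 1 of 4 steps is ranked ahead of a module that has completed 3 of 4, and within the same module the register sub-block precedes the memory sub-block.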

[0075] In some embodiments, in response to the detection of an abnormal power-down, each processor core runs its own abnormal power-down handling process in parallel. In response to the interrupt, the multiple processor cores pause their abnormal power-down handling processes, generate the multiple context sub-blocks, and then resume their abnormal power-down handling processes.

[0076] Specifically, the firmware includes multiple logical modules running on multiple processor cores. Each logical module's own abnormal power-down handling process includes multiple power-down steps. In response to the detection of an abnormal power-down, each logical module executes its own abnormal power-down handling process and records the number of the completed power-down step during the execution of its own abnormal power-down handling process. After multiple logical modules have completed their own abnormal power-down handling processes, a power-down completion flag is generated.
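
As an illustration of the step bookkeeping described here, the following sketch models each logical module as an ordered list of power-down steps and derives the completion flag from all modules. The class `LogicModule` and the function `power_down_complete` are hypothetical names, and representing steps as Python callables is an assumption made only for the example.

```python
class LogicModule:
    """One firmware logical module with an ordered list of power-down steps."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps              # callables, run in order
        self.last_completed_step = 0    # 0 means no step has completed yet

    def run_next_step(self):
        if self.last_completed_step < len(self.steps):
            self.steps[self.last_completed_step]()
            self.last_completed_step += 1   # record the completed step number

    @property
    def done(self):
        return self.last_completed_step == len(self.steps)


def power_down_complete(modules):
    """The completion flag is raised only after every module has finished."""
    return all(m.done for m in modules)
```

The recorded `last_completed_step` is the value a control core could later read to judge how far each module has progressed.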

[0077] In some embodiments, the firmware context includes multiple logical parts, each of which is divided into one or more data sub-blocks, and context sub-blocks are generated based on the data sub-blocks.

[0078] In some embodiments, the logic portion includes one or more of the following: in-core register state, core-specific memory area data, inter-core shared fast memory, inter-core shared slow memory, stack information, SQ or CQ queue, Admin queue, memory management information, memory information, other processor state information, hardware accelerator state information, cache contents, inter-core communication queue, and inter-core shared memory.

[0079] In some embodiments, as shown in Figures 4 and 5, each context sub-block includes: a magic number identifying the start of a new context sub-block; the number of the firmware context packaged into the context sub-block; the size of the context sub-block; and a data sub-block descriptor describing the data sub-blocks contained in the context sub-block. The data sub-block descriptor includes multiple data sub-block descriptor entries, each corresponding to one of the data sub-blocks.

[0080] Each data sub-block descriptor entry contains the data sub-block type, the log area, and the data sub-block area. The data sub-block type indicates which logical part the data sub-block comes from. The logical part includes one or more of the following: in-core register state, core-specific memory area data, inter-core shared fast memory, inter-core shared slow memory, stack information, SQ or CQ queue, Admin queue, memory management information, memory information, other processor state information, hardware accelerator state information, cache contents, inter-core communication queue, and inter-core shared memory.

[0081] The log area includes: the number of valid log entries; the total number of log entries, counting both valid and blank entries; the log entry size, which is the same for every log entry; and the log entry area, which stores all log entries as equal-sized records.

[0082] The data sub-block area is used to store multiple data sub-blocks, and each data sub-block corresponds to one of the data sub-block descriptor entries.

[0083] In optional implementations, the context sub-block may include more or fewer items. For example, a context sub-block may include multiple data sub-block descriptors and the corresponding data sub-blocks, with each data sub-block descriptor corresponding one-to-one to a data sub-block. The data sub-block descriptor records metadata of the corresponding data sub-block, such as the type of firmware context recorded in the data sub-block. Optionally, the data sub-block descriptors have a specified size and are stored contiguously; the data sub-blocks also have a specified size and are stored contiguously. This makes it easy to retrieve a specific data sub-block within the context sub-block.
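
To make the layout more concrete, here is a simplified sketch of packing and unpacking a context sub-block with a header, contiguous fixed-size descriptor entries, and contiguous data sub-blocks. The field widths, the `MAGIC` value, the `(type, offset, length)` entry format, and the function names are all assumptions chosen for illustration; the log area is omitted, and the disclosure does not specify a byte-level format.

```python
import struct

MAGIC = 0x43545831               # hypothetical magic number
HEADER = struct.Struct("<IIII")  # magic, number, total size, entry count
ENTRY = struct.Struct("<III")    # data sub-block type, offset, length


def pack_sub_block(number, data_sub_blocks):
    """data_sub_blocks is a list of (type_id, payload_bytes) tuples."""
    offset = HEADER.size + ENTRY.size * len(data_sub_blocks)
    entries = b""
    payload = b""
    for type_id, data in data_sub_blocks:
        entries += ENTRY.pack(type_id, offset, len(data))
        payload += data
        offset += len(data)
    header = HEADER.pack(MAGIC, number, offset, len(data_sub_blocks))
    return header + entries + payload


def unpack_sub_block(blob):
    """Return (number, [(type_id, payload_bytes), ...])."""
    magic, number, _size, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a context sub-block")
    result = []
    for i in range(count):
        type_id, off, length = ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        result.append((type_id, bytes(blob[off:off + length])))
    return number, result
```

Because the descriptor entries are fixed-size and contiguous, entry `i` can be located by arithmetic alone, which is the retrieval benefit the paragraph above describes.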

[0084] In some embodiments, when the control core determines that the power-down is not complete, it determines the probability of an exception occurring in each of the multiple logic modules based on the power-down step numbers that have been completed by each of the multiple logic modules; and sorts them according to the probability of the exception occurring to obtain a sorting result.

[0085] In some embodiments, once the predetermined time has elapsed after the abnormal power-down was detected, if the control core finds that the power-down completion flag exists, it does not send interrupts to the multiple processor cores, and there is no need to initiate firmware context saving.

[0086] In some embodiments, multiple processor cores each obtain corresponding data sub-blocks based on interrupts and generate multiple context sub-blocks, including: after each processor core receives an interrupt, it packages its own corresponding data sub-block into the corresponding context sub-block and adds marking information; after the control core generates its own corresponding context sub-block, it exits the interrupt handling process; after other processor cores that are not the control core generate their own corresponding context sub-blocks, they generate context sub-blocks including marking information from other data sub-blocks according to the sorting result, and then exit the interrupt handling process.

[0087] Specifically, upon receiving an interrupt, each processor core packages its own registers, internal independent memory areas, and other data sub-blocks into a corresponding context sub-block, and writes a completion flag when the context sub-block is completed. Except for the control core, after completing its own context sub-block, each processor core, based on the sorting result, saves the most important remaining context sub-blocks according to their order, and writes a completion flag when these sub-blocks are completed. This process continues until all context sub-blocks are completed, at which point the interrupt is exited. The control core, after completing its own context sub-block (including writing the completion flag), exits the interrupt directly.
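
The division of work inside the interrupt can be simulated with threads standing in for processor cores. This is only a sketch under stated assumptions: `run_interrupt` is a hypothetical function, each core's own sub-block is named `core<N>` for the example, and the ranked list of remaining sub-blocks is assumed to be available to all cores before the interrupt fires.

```python
import threading


def run_interrupt(core_ids, control_core, ranked_remaining):
    """Return {sub_block_name: id of the core that packaged it}."""
    packed = {}
    lock = threading.Lock()
    pending = list(ranked_remaining)   # most important sub-block first

    def handler(core):
        with lock:
            packed[f"core{core}"] = core   # own sub-block plus completion flag
        if core == control_core:
            return                         # control core exits the interrupt directly
        while True:
            with lock:
                if not pending:
                    return                 # nothing left: exit the interrupt
                name = pending.pop(0)      # claim the top-ranked remaining sub-block
            with lock:
                packed[name] = core        # packaging finished, write completion flag

    threads = [threading.Thread(target=handler, args=(c,)) for c in core_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return packed
```

In this model the control core never packages a shared sub-block, which leaves it free to start writing completed sub-blocks to persistent storage while the other cores keep packaging.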

[0088] In this embodiment of the disclosure, a target area can be determined in advance in the persistent storage medium space, and the context sub-blocks can be stored in the target area in sequence according to their anomaly probability ranking (e.g., in descending order of anomaly probability).

[0089] In this embodiment, a fixed-size contiguous space in the persistent storage medium is obtained. This fixed-size contiguous space is divided into multiple fixed-size regions, and a target region is determined from these regions. The target region is used to record the context sub-blocks written to the persistent storage medium. A target sequence number corresponding to the target region is obtained, and this target sequence number is stored in a target location within the persistent storage medium when the abnormal power-down process is completed. This allows the storage device to retrieve the target sequence number from the target location upon the next power-on, thereby finding the saved firmware context.

[0090] Specifically, the firmware selects a continuous space of a fixed size from the persistent storage medium, divides this space into multiple regions of fixed size, selects an empty region, and records the sequence number of the selected region into the target sequence number as the "dirty region sequence number". When the abnormal power failure process is completed normally, the "dirty region sequence number" will be saved to the target location of the persistent storage medium.

[0091] It's important to understand that during abnormal power loss handling, the backup power supply cannot always guarantee that the target sequence number will be written to the target location. Therefore, when the storage device powers on, it searches for the previously saved firmware context in various ways. One such method is reading the "dirty region sequence number" at the target location to obtain the saved firmware context. If the "dirty region sequence number" is not successfully saved, the firmware context can be identified by traversing each region of a fixed-size contiguous space in the persistent storage medium, recording the location of the target data block in each region, and searching for data matching the characteristics of the target data block.

[0092] In this embodiment of the disclosure, the control core stores some or all of the multiple context sub-blocks in a target area of the persistent storage medium space, based on the result of sorting the context sub-blocks by the probability of anomaly occurrence. This includes: dividing each context sub-block into small data blocks of a fixed size, adding descriptive information to each small data block to obtain target data blocks, and storing all target data blocks sequentially in the target area according to the sorting result. After each target data block is stored, the power-down completion flag is checked. If the flag indicates that the power-down is complete, the storing of context sub-blocks in the target area is terminated; if the flag indicates that the power-down is not complete, the storing continues.

[0093] Specifically, after exiting the interrupt, the control core processes the multiple context sub-blocks serially according to the sorting result. It divides the previously selected target area into multiple fixed-size sub-regions, and each sub-region into multiple fixed-size blocks. It waits for a context sub-block to be saved to the buffer, then divides the context sub-block into fixed-size small data blocks and adds a data description to each small data block, including the data block's identifier and sequence number; the size of a "small data block + data description" is equal to the size of the previously defined fixed-size block. It stores all "small data block + data description" units sequentially in the previously selected area. After each small data block is stored, it checks the power-down completion flag; if the power-down has been completed, it terminates the process and neither saves the remaining small data blocks nor processes the remaining context sub-blocks.
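
The split-and-store loop can be sketched as follows. The block size, the 8-byte description format, the identifier value `BLOCK_ID`, and the callable `power_down_done` are assumptions for illustration only; a `bytearray` stands in for the target area of the persistent storage medium, and the sequence number is counted across all sub-blocks written so far (one of the two numbering options mentioned in the disclosure).

```python
import struct

BLOCK_SIZE = 64                    # hypothetical fixed block size in bytes
DESC = struct.Struct("<II")        # data block identifier, sequence number
BLOCK_ID = 0x46574354              # hypothetical "backed-up firmware context" tag
PAYLOAD = BLOCK_SIZE - DESC.size   # room left for the small data block


def to_target_blocks(sub_block, first_seq=0):
    """Split one context sub-block into described, fixed-size target blocks."""
    blocks = []
    for i in range(0, len(sub_block), PAYLOAD):
        chunk = sub_block[i:i + PAYLOAD].ljust(PAYLOAD, b"\0")
        blocks.append(DESC.pack(BLOCK_ID, first_seq + len(blocks)) + chunk)
    return blocks


def store_ranked(ranked_sub_blocks, region, power_down_done):
    """Write blocks in rank order; stop as soon as power-down has completed."""
    written = 0
    for sub_block in ranked_sub_blocks:
        for block in to_target_blocks(sub_block, first_seq=written):
            if (written + 1) * BLOCK_SIZE > len(region):
                return written             # target area is full
            region[written * BLOCK_SIZE:(written + 1) * BLOCK_SIZE] = block
            written += 1
            if power_down_done():          # flag is checked after every block
                return written
    return written
```

Checking the flag after every block keeps the amount of wasted work to at most one block once the normal power-down path finishes.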

[0094] In this disclosure, in order to preserve key content in the shortest possible time, the firmware context is first divided into logical parts according to logical relationships. The logical parts are then further segmented into data sub-blocks of limited size. The data sub-blocks are supplemented with the necessary information, packaged into numbered context sub-blocks, and placed in a buffer. After that, the context sub-blocks are stored in the persistent storage medium in descending order of importance. In this way, even if the internal power supply of the storage device is cut off prematurely, the most important parts are preserved for subsequent investigation and analysis.

[0095] It should be noted that if a context subblock contains multiple data subblocks, they can be stored sequentially according to their importance, such as placing the most important registers first. This way, even if the context subblock is not completely saved, important debugging information can still be recovered from one or more successfully saved data subblocks as much as possible.

[0096] Figure 6 is a flowchart illustrating the firmware context saving process provided in this embodiment of the present disclosure.

[0097] Step 4.1: After detecting an abnormal power-down, start the timer. The timer duration can be set according to the average time each processor core takes to complete the abnormal power-down handling process.

[0098] Step 4.2: Determine if the timer has reached its set time. If not, continue with step 4.2. If yes, proceed to step 4.3.

[0099] Step 4.3: Determine whether the power-down process has been completed. Specifically, the control core among multiple processor cores detects the power-down completion flag. If the power-down completion flag is not present, it means that the power-down process has not been completed, and proceed to step 4.4.

[0100] Step 4.4: The control core selects an empty region from the persistent storage medium storage space and stores the region number in the "dirty region number". Specifically, the control core obtains a fixed-size contiguous space in the persistent storage medium space, divides the fixed-size contiguous space into multiple fixed-size regions, determines the target region from the multiple fixed-size regions, obtains the target number corresponding to the target region as the "dirty region number", and stores the target number to the target location in the persistent storage medium space when the abnormal power failure process is completed.

[0101] Step 4.5: The control core among the multiple processor cores sorts the context sub-blocks according to their importance. Specifically, when the control core determines that the power-down is not complete, it determines the probability of an anomaly occurring in each of the multiple logic modules based on the power-down step numbers that each logic module has completed, sorts the logic modules by that probability to obtain a sorting result, and ranks the multiple context sub-blocks by importance based on the sorting result.

[0102] Step 4.6: The control core issues interrupts to multiple processor cores (including itself).

[0103] In response to the interrupt, each processor core executes a portion of steps 4.7 through 4.12.

[0104] Step 4.7: Each processor core packages its own registers, independent internal storage, and other data sub-blocks into the corresponding context sub-blocks. Specifically, after receiving an interrupt, each processor core packages its own data sub-blocks into the corresponding context sub-blocks and adds marking information.

[0105] Step 4.8: Determine whether the processor core executing the current processing flow is itself the control core; if not, proceed to step 4.9.

[0106] Step 4.9: Determine whether there are any unfinished context sub-blocks; if not, proceed to step 4.10 to exit the interrupt; if there are, proceed to step 4.11.

[0107] Step 4.11: Process the top-ranked unfinished context sub-block. That is, after generating its own context sub-block, each processor core other than the control core generates, in ranking order, context sub-blocks with marking information from the other data sub-blocks. After generating a context sub-block, the processor core returns to step 4.9 to determine whether any more context sub-blocks remain to be generated.

[0108] In step 4.8, if the processor core is the control core, it executes step 4.12 to exit the interrupt process. Specifically, after the control core generates its corresponding context subblock, it exits the interrupt handling process and continues executing steps 4.13-4.17. Steps 4.13-4.17 are executed by the control core; other processor cores do not execute these steps.

[0109] Step 4.13: The control core determines whether any context sub-blocks have not yet been saved to the persistent storage medium. If so, proceed to step 4.14. Otherwise, the abnormal power-down handling process ends. Optionally, if no such context sub-blocks remain, the target sequence number is written to the target location, either directly or via step 4.20.

[0110] Step 4.14: The control core determines whether the unsaved, top-ranked context sub-blocks have been completed. If not, it continues with step 4.14 to wait for the top-ranked context sub-blocks to appear; if so, it proceeds to step 4.15. It's important to understand that in step 4.14, the control core repeatedly executes step 4.14 to wait for the top-ranked context sub-blocks to appear. During this time, other processor cores may be executing step 4.11, generating context sub-blocks. After waiting for the top-ranked context sub-blocks to appear in step 4.14, the control core proceeds to step 4.15 and subsequent steps to write the top-ranked context sub-blocks to the persistent storage medium. It's also important to understand that there can be multiple top-ranked context sub-blocks. The control core does not need to wait for all top-ranked context sub-blocks to complete in step 4.14; instead, it writes these context sub-blocks to the persistent storage medium as soon as they appear, through steps 4.15 and subsequent steps.

[0111] Step 4.15: The control core divides the context subblock into smaller data blocks; Step 4.16: Add descriptive information to the smaller data blocks and save them to the area corresponding to the "dirty region number"; Step 4.17: Determine if there are any unsaved smaller data blocks; if yes, execute step 4.16; if no, execute step 4.13. It's important to understand that returning to step 4.13 is necessary because other processor cores outside the control core may still be executing step 4.11 and generating further context subblocks. Returning to step 4.13 allows for the processing of these newly generated context subblocks.

[0112] In addition, after step 4.1, there is step 4.18 where each logic module starts the abnormal power failure handling process; step 4.19 where the control core determines whether the power failure handling process is completed normally; if so, step 4.20 is executed to save the "dirty region sequence number" to the persistent storage medium.

[0113] It can be understood that failure to complete the power-down process within the specified time is the condition that triggers firmware context saving. While the firmware context is being saved, the firmware continues the power-down process normally, and the saving action does not affect the normal operation of the firmware logic. Once the power-down process completes normally, the firmware context saving action stops, and the persistent storage medium space it used can be reclaimed and reused, which further improves the flexibility of saving firmware context on abnormal power-down.

[0114] In the scheme for preserving firmware context on abnormal power-down provided in this embodiment, an abnormal power-down is detected. After a predetermined time following the detection, the control core among multiple processor cores checks for a power-down completion flag. If the flag is absent, the control core sends an interrupt to the multiple processor cores. Each processor core then obtains its corresponding data sub-block based on the interrupt and generates a context sub-block. The control core then stores some or all of the context sub-blocks in a target area of the persistent storage medium space, based on the result of sorting them by the probability of anomaly occurrence. With this technical solution, key information in the firmware context is preserved when a device experiences an abnormal power-down, so that problems can be located quickly from the saved context data, improving debugging efficiency.

[0115] It can also be understood that, when the device is powered on after an abnormal power-down, the firmware can identify whether the previous power-down process ended normally. If it ended normally, the firmware reads the "dirty region number" from the target location on the persistent storage medium. If the number is a valid region number, some or all of the context sub-block data may have been written to that region. The firmware erases that region, after which the region can continue to be used during operation. If the previous power-down did not end normally, the host will detect that the storage device is missing or is in an abnormal mode. In this case, to debug the storage device, the context sub-block data can be read out to reconstruct the abnormal situation.

[0116] In some embodiments, after an abnormal power failure, the method further includes: when the storage device is detected to be powered on, if it is identified that the abnormal power failure processing flow did not end normally when the storage device was previously powered off, querying from a fixed-size continuous space to obtain the firmware context saved when the storage device was previously powered off, and setting a target index corresponding to the firmware context so that the host can read the firmware context based on the target index.

[0117] Therefore, this disclosure enables the preservation of critical firmware information for debugging within a short period of time after an abnormal power outage of the storage device. Furthermore, even if the firmware information is not fully preserved, debugging and analysis can still be performed. This ensures that key firmware information is retained during power outages, allowing for rapid problem localization based on the saved data, significantly improving the efficiency of storage device debugging.

[0118] As shown in Figure 7, the connection between the host and the storage device includes: when the storage device is powered on, if it is detected that the abnormal power failure handling process did not end normally during the previous power failure of the storage device, based on multiple fixed-size regions of a fixed-size continuous space in the persistent storage medium space, the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions is read sequentially; if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, the next fixed-size region is read; and if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, the current fixed-size region is continued to be read to obtain all the data blocks constituting the firmware context; the length of the firmware context is determined based on the total number of all data blocks.

[0119] The host reads the Telemetry Log from the storage device, which includes firmware context data. In response to the host reading the Telemetry Log, the storage device determines the size of the Telemetry Log based on the firmware context length and uses one, more, or all of the data blocks as the content of the Telemetry Log. The firmware context data is parsed and unpacked to generate a package configuration file and data sub-block files. The core dump file is obtained based on the package configuration file and data sub-block files.

[0120] Figure 8 illustrates the process flow for obtaining the firmware state during an abnormal power outage, including the following steps.

[0121] Step 301: When the storage device is powered on, if it is detected that the abnormal power-down handling process did not end normally during the previous power-down of the storage device, then for one, some, or all of the multiple fixed-size regions of the fixed-size contiguous space in the persistent storage medium space, the data blocks in the first fixed-size sub-region of each region are read in sequence.

[0122] Step 302: If no valid data block is read from the first fixed-size sub-region of the current fixed-size region, then read the next fixed-size region. If a valid data block is read from the first fixed-size sub-region of the current fixed-size region, then continue reading the current fixed-size region to obtain all the data blocks that constitute the firmware context.

[0123] Step 303: Determine the firmware context length based on the total number of all data blocks.

[0124] Step 304: In response to the host reading the Telemetry Log, determine the size of the Telemetry Log based on the firmware context length, and use one, some, or all of the data blocks as the content of the Telemetry Log.

[0125] In some embodiments, when reading each data block, the validity of the data block is determined based on the data description of each data block.

[0126] It should be noted that when reading multiple fixed-size regions sequentially, if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, then there is no need to read data from other fixed-size regions; and if a valid data block has already been read from a fixed-size sub-region of the current fixed-size region, only data blocks are read from the current fixed-size region, and other fixed-size regions are no longer read, further improving processing efficiency.

[0127] In some embodiments, if it is identified that the abnormal power failure handling process of the storage device was completed normally during the previous power failure, the target sequence number is read from the target address in the persistent storage medium space. If the target sequence number is a valid region sequence number, the target sequence number is deleted and the fixed-size region corresponding to the target sequence number is erased.
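
A minimal sketch of this clean power-on path is given below. The marker `INVALID_REGION`, the erased byte value `0xFF`, and the function name are assumptions; a list of `bytearray` objects stands in for the fixed-size regions, and the returned value represents what would be written back to the target address.

```python
INVALID_REGION = 0xFFFFFFFF   # hypothetical "no dirty region" marker
ERASED_BYTE = 0xFF            # assumed erased state of the storage medium


def reclaim_after_clean_power_down(target_seq, regions):
    """Erase the dirty region, if any, and return the new target sequence number."""
    if 0 <= target_seq < len(regions):
        size = len(regions[target_seq])
        regions[target_seq][:] = bytes([ERASED_BYTE]) * size   # erase the region
        return INVALID_REGION                                  # delete the number
    return target_seq   # not a valid region number: nothing to reclaim
```

After this step the region is free again, so a later abnormal power-down can select it as a new target region.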

[0128] In this embodiment of the disclosure, each fixed-size region corresponds to multiple fixed-size sub-regions, and each fixed-size sub-region includes multiple fixed-size data blocks. When the firmware context was written to the selected fixed-size region during the previous power-down of the storage device, each context sub-block was split into multiple small data blocks, and a data description was added to each small data block. The size of each data block in the fixed-size region is equal to the combined size of a small data block and its data description.

[0129] In this embodiment of the disclosure, the data description includes a data block identifier and a data block sequence number. The data block identifier indicates that the content of the small data block is backed-up firmware context, and the data block sequence number indicates the position of the small data block within the current context sub-block or within all context sub-blocks.

[0130] It can be understood from the description of the firmware context saving embodiments in Figures 1 to 6 above that which "fixed-size region" is selected is decided by the control core and is not known in advance. The control core uses the selection result as the target sequence number, i.e., the "dirty region sequence number", and writes it to a specific target location. When the storage device is powered on again, a genuine "dirty region sequence number" may or may not be present at that target location. The previous power-down may not have been an abnormal power-down, in which case the control core did not write the "dirty region sequence number"; or the previous power-down may have been abnormal and the control core did not have time to write the "dirty region sequence number" successfully during handling. Therefore, the target sequence number must be identified during the power-on process.

[0131] For example, as shown in Figure 9A, the fixed-size contiguous space in the persistent storage medium space corresponds to multiple fixed-size regions, including fixed-size region (number 0), fixed-size region (number 1), fixed-size region (number 2), and fixed-size region (number 3), and a target address in the persistent storage medium space stores the target sequence number, i.e., the "dirty region sequence number". As an example, "dirty region sequence number (=1)" means that the fixed-size region (number 1) is the target region. The target region is used to record the context sub-blocks written to the persistent storage medium space. The target sequence number corresponding to the target region is obtained, and when the abnormal power-down process is completed, the target sequence number is stored in the target location of the persistent storage medium space, so that when the storage device is powered on again, it can obtain the target sequence number from the target location and thus find the saved firmware context.

[0132] Specifically, the fixed-size region is further divided into multiple fixed-size sub-regions, and each fixed-size sub-region includes multiple fixed-size data blocks. For example, as shown in Figure 9B, the fixed-size region includes four fixed-size sub-regions, and each fixed-size sub-region includes multiple fixed-size data blocks.

[0133] Understandably, when the control core writes the firmware context to a selected fixed-size area, it needs to split the context sub-block into multiple small data blocks and add data descriptions to each small data block to form a data block. This allows the storage device to identify whether the stored content is the backed-up firmware context when it is powered on again and to reconstruct the context sub-block.

[0134] Based on the description of the foregoing embodiments, the context sub-block includes a "number" field and a "data block descriptor" that records the type of the data sub-block. The "number" field represents, for example, the number of the processor core corresponding to the context sub-block.

[0135] For example, as shown in Figure 9C, context sub-block 0 is divided into multiple small data blocks, i.e., the "small data blocks" shown in Figure 9C. A "data description", i.e., the data block identifier and data block sequence number shown in Figure 9C, is added to each "small data block". The identifier indicates that the content of the data block is backed-up firmware context, and the sequence number indicates the block's position within the current context sub-block, or within all context sub-blocks written by the control core. Each fixed-size sub-region includes multiple fixed-size data blocks, and the data blocks holding firmware context can be contiguous or non-contiguous. For example, one or more context sub-blocks can be written into a fixed-size sub-region. In this example, a "data block" with background fill is a data block obtained by adding a "data description" to a "small data block" taken from a context sub-block. A "data block" without background fill can be a blank data block or data other than a context sub-block. As shown in Figure 9C, the "data blocks" with background fill in the fixed-size sub-region on the left are contiguous, while those in the fixed-size sub-region on the right are non-contiguous (because, while the control core writes the context sub-block to the fixed-size sub-region, other cores may still be performing the power-down process and may generate data that is written to the same fixed-size sub-region).

[0136] To read the saved context sub-blocks, if the previous power-down handling did not end normally, the firmware scans all fixed-size regions of the fixed-size persistent storage medium space that was selected during saving, searching for the first context sub-block data. That is, the firmware reads the fixed-size regions of this space one by one, at first reading only the first fixed-size sub-region of each fixed-size region. Within that first fixed-size sub-region, it reads the "data blocks" one by one and, after reading each "data block," checks the data description within it. If the data description matches the characteristics of context sub-block data, the required data block (also called a valid data block) has been found. The firmware then continues reading subsequent "data blocks" starting from the address of the first valid data block. After reading each "data block," it checks the data description; if the block is a valid data block, the count of valid data blocks is incremented by 1. This continues until all fixed-size sub-regions of that fixed-size region have been read. The firmware context length is calculated from the total number of valid data blocks. All valid data blocks together constitute the firmware context saved on the storage device during the last abnormal power failure. Optionally, the saved firmware context occupies only one or more fixed-size sub-regions rather than all of them. For example, when a data block in a fixed-size sub-region is found to have never been written, it can be inferred that subsequent data blocks do not store firmware context. Optionally, the size of the firmware context is also recorded along with the "dirty region number," thereby identifying the number of data blocks that record firmware context.

[0137] If the first fixed-size sub-region of a fixed-size region has been read but no context sub-block data is found, the next fixed-size region is read. If all fixed-size regions have been read but no context sub-block data is found, the scan has failed and no firmware context exists.

[0138] If a firmware context exists, a telemetry log index is added. Telemetry log area 1 points to the address of the first data block of this firmware context, and telemetry log areas 2 and 3 are set to empty.

[0139] The telemetry log is defined in the NVMe 1.3 specification, which can be obtained from https://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf. For detailed explanations and usage, please refer to https://zhuanlan.zhihu.com/p/399501400.

[0140] The host can obtain the previously saved firmware context by reading the telemetry log. Specifically, when the host reads the telemetry log header, the storage device responds by setting the Last Block of areas 1, 2, and 3 of the telemetry log to the length of the previously obtained firmware context plus the length of the telemetry log header. When the host reads telemetry log area 1, the firmware starts reading data from the first data block address where the firmware context is located and transmits the read data to the host. Since the Last Blocks of areas 1, 2, and 3 are equal, the host will not read telemetry log areas 2 and 3. The host constructs the entire telemetry log into a continuous file, called the context file.
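The effect of equal Last Block values can be illustrated from the host side with a small sketch. It assumes the NVMe convention in which block 0 of the telemetry log is the header, each block is 512 bytes, and each data area ends at its Last Block; the function name is hypothetical.

```python
TELEMETRY_BLOCK_SIZE = 512  # bytes per telemetry log block


def area_block_counts(last_block_1, last_block_2, last_block_3):
    """Return the number of blocks the host reads from areas 1, 2 and 3.

    Assumes block 0 is the header, so area 1 spans blocks 1..last_block_1,
    area 2 spans the blocks after that up to last_block_2, and so on.
    """
    if not (0 <= last_block_1 <= last_block_2 <= last_block_3):
        raise ValueError("Last Block values must be non-decreasing")
    return (last_block_1,
            last_block_2 - last_block_1,
            last_block_3 - last_block_2)


# Equal Last Blocks: everything is in area 1, areas 2 and 3 are empty.
counts = area_block_counts(9, 9, 9)
bytes_in_area_1 = counts[0] * TELEMETRY_BLOCK_SIZE
```

With all three values equal, the counts for areas 2 and 3 are zero, which is why the host has nothing to read there.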

[0141] It's important to understand that the host's read of the telemetry log may or may not occur; the timing is determined by the host. To prepare for potential host reads of the telemetry log, the storage device acquires the locations of multiple data blocks representing the firmware context and the length of the firmware context within one or more fixed-size regions upon power-up. This information is used to provide a response when the host reads the telemetry log. The storage device does not need to read the data blocks representing the firmware context before the host reads the telemetry log.

[0142] Figure 10 illustrates the processing flow after the storage device is powered on, including the following steps.

[0143] Step 10.1 Power on the storage device after an abnormal power outage.

[0144] Step 10.2 determines whether the previous abnormal power failure handling process on the storage device was completed normally; if it was completed normally, proceed to step 10.3. At this point, it is known that the previous abnormal power failure handling on the storage device was successful, thus eliminating the need to obtain firmware context for fault analysis. However, some firmware context may have been generated during the previous abnormal power failure handling based on the processing method shown in Figure 6. These firmware contexts need to be deleted to avoid confusion with other firmware contexts generated in subsequent abnormal power failure handling.

[0145] Step 10.3 Determine whether the "dirty region number" is valid. The validity of the target number can be determined from the number of fixed-size regions. For example, if there are 4 fixed-size regions, a "dirty region number" of 1 is valid, while a "dirty region number" of 5 or larger is invalid. If the "dirty region number" is valid, proceed to step 10.4.

[0146] Step 10.4 Erase the dirty region corresponding to the "dirty region number", which is the fixed-size region corresponding to the target number. The fixed-size region after erasure can continue to be used, thereby ensuring that the firmware state stored in the dirty region corresponding to the "dirty region number" is the firmware state stored during the last abnormal power failure.

[0147] Step 10.5 The storage device is in normal use.
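Steps 10.2 to 10.5 might be sketched as follows. The sketch assumes that fixed-size regions are numbered from 1 (consistent with the example of 4 regions, where 1 is valid and 5 is not) and that an unrecorded "dirty region number" is represented as `None`; `erase_region` is a hypothetical callback standing in for the device's erase operation.

```python
def handle_power_up_after_clean_power_down(completed_normally, dirty_region_number,
                                           region_count, erase_region):
    """Return True if a dirty region was erased (Figure 10 flow, sketched)."""
    # Step 10.2: only the "completed normally" branch is handled here;
    # the other branch is the context-acquisition flow of Figure 11.
    if not completed_normally:
        return False
    # Step 10.3: a valid number must refer to an existing region (1-based, assumed).
    if dirty_region_number is None or not (1 <= dirty_region_number <= region_count):
        return False
    # Step 10.4: erase the stale context so it cannot be confused with a later one.
    erase_region(dirty_region_number)
    return True  # Step 10.5: the device then proceeds to normal use
```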

[0148] Figure 11 illustrates another processing flow after the storage device is powered on, including the following steps.

[0149] Step 11.1 Power on the storage device after an abnormal power outage.

[0150] Step 11.2 Determine whether the previous abnormal power failure handling process on the storage device was completed normally. If it was not completed normally, the firmware context can be obtained for debugging, and step 11.3 is executed. Optionally, step 11.2 and step 10.2 in Figure 10 can be implemented in the same process flow. For example, if the determination in step 10.2 is negative, execution continues at step 11.3.

[0151] Step 11.3 Set the fixed-size region to be read as the first fixed-size region and the data block to be read as the first data block.

[0152] Step 11.4 Read the data block to be read from the fixed-size region.

[0153] Step 11.5 Determine whether the read data block is context sub-block data; if so, proceed to step 11.6; if not, proceed to step 11.10.

[0154] Step 11.6 Increment the number of read data blocks by 1.

[0155] Step 11.7 Determine whether the fixed-size region to be read has been completely read or whether the number of data blocks read has reached the expected number; if not, proceed to step 11.8; if yes, proceed to step 11.9.

[0156] Step 11.8 Set the data block to be read as the next data block, then proceed to step 11.4.

[0157] Step 11.9 Add a telemetry log entry. Point its area 1 to the first data block found, and set the Last Block of area 1 to the previously obtained firmware context length plus the telemetry log header length.

[0158] Step 11.10 Determine whether the first sub-region has been completely read; if yes, proceed to step 11.11; if no, proceed to step 11.12. Step 11.12 Set the data block to be read as the next data block, then proceed to step 11.4.

[0159] Step 11.11 Determine whether all regions have been read; if yes, proceed to step 11.9; if no, proceed to step 11.13. Step 11.13 Set the fixed-size region to be read as the next fixed-size region, set the data block to be read as the first data block of that region, and then proceed to step 11.4.
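The scan of Figure 11 might be sketched as follows. The sketch models each fixed-size region as a flat list of data blocks with a fixed number of blocks per sub-region, and takes a caller-supplied predicate `is_context_block` that checks the data description; these modelling choices are assumptions. It follows the prose description of the scan, in which non-matching blocks after the first hit are skipped and only valid blocks are counted.

```python
def scan_for_context(regions, blocks_per_sub_region, is_context_block):
    """Return (region index, index of first valid block, valid block count) or None."""
    for region_index, region in enumerate(regions):
        first = None
        # Search only the first fixed-size sub-region for the first valid block.
        for block_index, block in enumerate(region[:blocks_per_sub_region]):
            if is_context_block(block):
                first = block_index
                break
        if first is None:
            continue  # nothing in the first sub-region: move to the next region
        # Count every valid block from the first hit to the end of this region.
        count = sum(1 for block in region[first:] if is_context_block(block))
        return region_index, first, count
    return None  # all regions scanned, no firmware context found
```

Reading only the first sub-region of each region keeps the power-up scan short when most regions hold no context.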

[0160] Specifically, after the storage device powers on, if it is determined that the previous abnormal power-down handling process was not completed properly, the storage device needs to obtain the firmware context. For example, it determines the fixed-size region storing the firmware context based on the "dirty region number" and identifies the data blocks within it that record the firmware context. However, the "dirty region number" may be invalid; for example, it may not have been recorded during the previous abnormal power-down handling process. In this case, the fixed-size regions must be traversed to determine which one stores the firmware context. More specifically, the first fixed-size sub-region of each fixed-size region is read, and each "data block" within that first fixed-size sub-region is read. If a read "data block" records the firmware context, the fixed-size region containing it is taken to store the firmware context; if no "data block" in the first fixed-size sub-region stores the firmware context, this fixed-size region is taken not to store the firmware context, its other fixed-size sub-regions are not read, and the next fixed-size region is traversed instead. Each "data block" in a fixed-size sub-region is checked to identify whether it stores the firmware context: the "data block" is read, and the location of its data description is checked for an identifier. The presence of the identifier representing the firmware context is the basis for the judgment. A block without this identifier may hold another type of data (such as other data backed up during power failure) or may be invalid data (meaning that no valid data follows, in which case reading of subsequent blocks can stop).

[0161] It should be noted that what is retrieved from the fixed-size region is all context sub-blocks backed up to the persistent storage medium during the last abnormal power outage. Here, "all context sub-blocks" refers to the context sub-blocks that were backed up, not all context sub-blocks that were generated, because some context sub-blocks may not have been backed up in time; furthermore, some context sub-blocks may have been only partially backed up and may contain only a portion of their small data blocks.

[0162] Specifically, the firmware context length is obtained by finding all the small data blocks of context sub-blocks that were backed up to the persistent storage medium space during the last abnormal power outage. There are several ways to determine the firmware context length. For example, for the fixed-size sub-region on the left in Figure 9C, the firmware context length can be determined by recording the address of the first "data block" with background fill and the address of the last "data block" with background fill. For the fixed-size sub-region on the right in Figure 9C, the address of each "data block" with background fill is recorded to determine the firmware context length, which makes the determination more flexible.
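The two ways of determining the length can be written out as simple arithmetic. This sketch assumes byte addresses, a fixed data block size, and a length expressed in bytes of small-data-block payload; the function names and units are hypothetical.

```python
def context_length_contiguous(first_addr, last_addr, block_size, payload_size):
    """Left case of Figure 9C: only the first and last block addresses are recorded."""
    span = last_addr - first_addr
    if span < 0 or span % block_size:
        raise ValueError("addresses must be ordered and block-aligned")
    block_count = span // block_size + 1
    return block_count * payload_size


def context_length_scattered(block_addrs, payload_size):
    """Right case of Figure 9C: the address of every valid block is recorded."""
    return len(set(block_addrs)) * payload_size
```

The contiguous form needs only two addresses, while the scattered form costs one address per block but works when other data is interleaved.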

[0163] Understandably, the storage device's firmware records the firmware context length and responds to the host with this recorded length when the host reads the telemetry log. The response to the host also includes the starting address of the telemetry log representing the firmware context. For example, the address of the first "data block" with background fill on the right side of Figure 9C can be used as the starting address of the telemetry log. Alternatively, a virtual address mapped to that first "data block" can be returned to the host, which prevents the host from learning physical addresses inside the storage device and thus improves the security of the storage device.

[0164] Figure 12 is a flowchart illustrating a method for generating a core dump file from the firmware context of an abnormal power outage, according to an embodiment of this disclosure. As shown in Figure 12, this method is applied to a host and includes the following steps.

[0165] Step 401: Obtain the Telemetry Log, which records the firmware state, from the storage device.

[0166] Specifically, the host can read the telemetry log and assemble the data received from the storage device into a context file.

[0167] When the host reads the firmware context, it obtains the firmware context, as shown in Figure 13. All "data blocks" with background fill are the firmware context.

[0168] For example, as shown in Figure 14, the host obtains the starting address (e.g., Area 1) and length (e.g., obtained from the Last Block of Area 1) of the firmware context in the Telemetry Log from the Telemetry Log Header provided by the storage device; and obtains the firmware context from the Telemetry Log based on the starting address and length. From the storage device's perspective, it obtains multiple small data blocks (also see Figure 13, where "data blocks" are filled with background) that store the firmware context in the process shown in Figure 11. The data (small data blocks) representing the firmware context in these data blocks are mapped to Telemetry Log Area 1. In response to the host reading Area 1 of the Telemetry Log, the storage device reads the multiple small data blocks that store the firmware context and sends them to the host as a response to the host reading Area 1 of the Telemetry Log.

[0169] Step 402: Parse and unpack the firmware context data obtained from the Telemetry Log to generate an encapsulation configuration file and data sub-block files.

[0170] In some embodiments, parsing and unpacking the firmware context data to obtain the encapsulation configuration file and the data sub-block files includes: checking the magic number of each context sub-block in the firmware context data and, once the magic number matches, reading the number; parsing the data sub-block descriptor and the data sub-block area; for each data sub-block descriptor entry, reading the storage area type of the data sub-block and combining it with the number to determine the data sub-block identifier, reading the offset of the data sub-block in the context sub-block and the data sub-block size, and reading the offset of the data sub-block in the storage area; generating a data sub-block file based on the data sub-block identifier, the offset of the data sub-block in the context sub-block, and the data sub-block size; and generating an encapsulation entry based on the data sub-block identifier, the data sub-block size, and the offset of the data sub-block in the storage area, and adding it to the encapsulation configuration file.

[0171] The host generates multiple context sub-blocks from the firmware context data. Each context sub-block consists of a series of consecutive small data blocks. The data description portion of a data block does not belong to the firmware context, but the small data block itself does. A small data block of the firmware context data is identified as the first small data block of a context sub-block when a specific location within it contains the "magic number" of the context sub-block. The number of small data blocks belonging to this context sub-block is determined from its "context sub-block size," and the context sub-block is constructed from that number of subsequently extracted small data blocks. This process is repeated to obtain all context sub-blocks of the firmware context.

[0172] In this embodiment of the disclosure, the storage area type of the data sub-block is used to describe which logical part the data sub-block comes from; wherein, the logical part includes one or more of the following: in-core register state, in-core independent storage area data, inter-core shared fast storage, inter-core shared slow storage, stack information, SQ or CQ queue, Admin queue, memory management information, memory information, other processor state information, hardware accelerator state information, cache content, inter-core communication queue, and inter-core shared memory.

[0173] In some embodiments, parsing and unpacking the firmware context data to obtain the encapsulation configuration file and the data sub-block files includes: obtaining the data sub-block identifier, the offset of the data sub-block in the context sub-block, and the data sub-block size for all data sub-blocks based on the log area of each context sub-block in the firmware context data; and generating encapsulation entries based on the data sub-block identifier, the data sub-block size, and the offset of the data sub-block in the storage area, and adding them to the encapsulation configuration file.

[0174] Understandably, after the host reads the context file, it parses, unpacks, and repackages it into a Coredump (core dump) file.

[0175] As an example, as shown in Figure 15, the steps include the following.

[0176] Step 15.1 For the context file, start parsing from the first context sub-block, beginning at telemetry log area 1. If a check fails or there is insufficient remaining data in the file, parsing and unpacking stop immediately and the flow proceeds directly to the next stage of encapsulating the Coredump file. The data sub-block files, encapsulation entries, and log files that were already parsed successfully can still be encapsulated into the Coredump file.

[0177] Specifically, for each context sub-block, using the data structure shown in Figures 2 and 3: check the magic number; only if the check passes is it a valid context sub-block, otherwise an error has occurred. Then read the size of the context sub-block: if it is smaller than the remaining data size of the current file, there is another context sub-block after it; otherwise, the current context sub-block is the last one. If the remaining data size of the current file is less than the size of the context sub-block, this does not mean an error has occurred (the context sub-block was only partially backed up), and parsing can continue.
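The walk over context sub-blocks might be sketched as follows. The actual header layout is defined in Figures 2 and 3; the 12-byte header used here (magic number, total sub-block size, number) and the magic value are placeholders assumed for illustration only.

```python
import struct

# Placeholder header: magic number, total size of the sub-block, number (all assumed).
HEADER = struct.Struct("<III")
SUB_BLOCK_MAGIC = 0x43545842  # hypothetical magic number


def pack_sub_block(number, body):
    """Helper for building sample data: header followed by the sub-block body."""
    return HEADER.pack(SUB_BLOCK_MAGIC, HEADER.size + len(body), number) + body


def iter_sub_blocks(context_file):
    """Yield (number, raw bytes, truncated flag) for each context sub-block."""
    offset = 0
    while len(context_file) - offset >= HEADER.size:
        magic, size, number = HEADER.unpack_from(context_file, offset)
        if magic != SUB_BLOCK_MAGIC or size < HEADER.size:
            break  # check failed: stop, keeping what was parsed so far
        remaining = len(context_file) - offset
        truncated = size > remaining  # partially backed up, not an error
        yield number, context_file[offset:offset + min(size, remaining)], truncated
        if truncated:
            break  # a truncated sub-block is necessarily the last one
        offset += size
```

Treating a short final sub-block as usable rather than as an error matches the goal of still producing a core dump from incomplete data.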

[0178] Step 15.2 Determine if there are any remaining sub-blocks; if so, proceed to step 15.3.

[0179] Step 15.3 Determine if the magic number matches; if it does, proceed to step 15.4.

[0180] Step 15.4 Read the size of the context sub-block to determine whether there is a next context sub-block and where it is located, then proceed to step 15.5.

[0181] Step 15.5 Read the number.

[0182] Step 15.6 Read the number of valid data subblock descriptor entries, the total number of data subblock descriptor entries, and the size of the data subblock descriptor entries.

[0183] Step 15.7 Determine if there are any more data sub-block descriptor entries; if so, proceed to step 15.8.

[0184] Step 15.8 The storage area type and number of the data sub-block together determine the identifier of the data sub-block.

[0185] Step 15.9 Read the offset and size of the data sub-block in the field sub-block, extract a segment of data from the field file, save it as a new data sub-block file, and name the file after the data sub-block identifier.

[0186] Step 15.10 reads the offset of the data sub-block in the storage area and the identifier of the data sub-block together to form an encapsulation entry, adds it to the end of the encapsulation configuration file, and obtains the encapsulation configuration file. Then return to step 15.7 to continue execution.

[0187] Specifically, in step 15.5, the number is read for each context sub-block. The data sub-block descriptor and data sub-block area are then parsed: the number of valid data sub-block descriptor entries, the total number of data sub-block descriptor entries, and the size of a data sub-block descriptor entry are read. The product of the total number of data sub-block descriptor entries and the entry size determines the starting position of the log area (in the context sub-block, the log area follows the data sub-block area), while the number of valid data sub-block descriptor entries and the entry size determine how the subsequent data sub-block descriptor entries are traversed.

[0188] For each data sub-block descriptor entry, the storage area type of the data sub-block is read and combined with the number read earlier from the context sub-block to determine the data sub-block identifier. For example, if the storage area type of the data sub-block is in-core register and the number is 0, the resulting data sub-block identifier is: in-core registers of core 0. As another example, if the storage area type is inter-core shared fast storage and the number is 10, the resulting data sub-block identifier is: block A of inter-core shared fast storage (assuming the implementation agrees that the data sub-block of inter-core shared fast storage block A is stored in the context sub-block with number 10). The offset of the data sub-block within the context sub-block and the size of the data sub-block are then read, this data segment is extracted from the context sub-block, and it is saved as a new data sub-block file named after the data sub-block identifier.

[0189] Data sub-block files are used in the subsequent coredump file packaging. The offset of the data sub-block in memory, along with its identifier and size, is used to form a packaging entry, which is added to the end of the packaging configuration file. The packaging configuration file is used in the subsequent coredump file packaging.
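The per-entry processing above might be sketched as follows. The 16-byte descriptor entry layout, the storage type codes, and the identifier naming scheme are all assumptions for this sketch; data sub-block "files" are kept in a dictionary rather than written to disk.

```python
import struct
from dataclasses import dataclass

# Assumed entry layout: storage type, offset in sub-block, size, offset in storage area.
ENTRY = struct.Struct("<IIII")
STORAGE_NAMES = {0: "core_registers", 1: "core_private_memory",
                 2: "shared_fast", 3: "shared_slow"}  # hypothetical type codes


@dataclass
class EncapsulationEntry:
    identifier: str
    size: int
    storage_offset: int


def unpack_data_sub_blocks(sub_block, number, table_offset, valid_entries):
    """Return ({identifier: data}, [EncapsulationEntry, ...]) for one context sub-block."""
    files, entries = {}, []
    for index in range(valid_entries):
        storage_type, offset, size, storage_offset = ENTRY.unpack_from(
            sub_block, table_offset + index * ENTRY.size)
        # The storage type and the sub-block number together form the identifier.
        type_name = STORAGE_NAMES.get(storage_type, "type%d" % storage_type)
        identifier = "%s_%d" % (type_name, number)
        files[identifier] = sub_block[offset:offset + size]
        entries.append(EncapsulationEntry(identifier, size, storage_offset))
    return files, entries
```

Keeping the identifier in both the file name and the encapsulation (packaging) entry is what lets the later encapsulation stage pair each entry with its data.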

[0190] It should be noted that in step 15.7, if there are no data sub-block descriptor entries to process, step 15.11 is executed.

[0191] Step 15.11 Read the number of valid log entries, the total number of log entries, and the size of the log entries.

[0192] Step 15.12 Determine if there are any more log entries; if so, proceed to step 15.13.

[0193] Step 15.13: Add each log entry to the end of the log file to obtain the log file.

[0194] Specifically, the log section is parsed, including: reading the number of valid log entries, the total number of log entries, and the size of each log entry. The number of valid log entries and the size of each log entry determine how each log entry is traversed. For each log entry, it is appended to the end of the log file. The log file is not involved in the subsequent coredump file packaging.

[0195] The host needs to generate a Coredump file based on the telemetry log it reads (the context file, or all valid small data blocks). The Coredump file is a formatted file used to record a program's state. In this embodiment, after the Coredump file is generated, existing software tools can be used to analyze the firmware context, making it possible to understand what happened when an anomaly occurred during abnormal power-down handling and to identify its cause.

[0196] In this embodiment, data sub-block files and an encapsulation configuration file are generated from the telemetry log, and a Coredump file is subsequently generated from them. A file is one specific form of data carrier. Besides data sub-block files and an encapsulation configuration file, other carriers can be used to record the useful data obtained from the firmware context. For example, the data can be stored in the host's memory in a specified format.

[0197] In this embodiment, based on the telemetry log, the host obtains multiple "data blocks," each consisting of a "small data block" plus its "data description." The host first reconstructs one or more context sub-blocks from these "data blocks." Reconstructing a context sub-block can be understood as reading the "context sub-block size" from the beginning of the context sub-block to determine how many subsequent "data blocks" constitute it. A context sub-block in the telemetry log may be incomplete; for example, a context sub-block may contain 100 data sub-blocks and have a length of 500, but because of the power failure not all of it was recorded, so that only a length of 400 belonging to this context sub-block is present in the telemetry log. In that case, the 400 that were recorded are used to construct the context sub-block. Multiple data sub-block files are then generated; for example, a corresponding data sub-block file is generated for each data sub-block in the context sub-block.

[0198] In this embodiment of the disclosure, the encapsulation configuration file contains one corresponding encapsulation entry for each data sub-block file. The key information of each encapsulation entry in the encapsulation configuration file is the "identifier of the data sub-block". Through the "identifier of the data sub-block", the corresponding data sub-block file can be obtained, and the meaning of the data sub-block can also be obtained.

[0199] It should be noted that the encapsulation configuration file can be replaced by traversing all the data sub-block files. Each context sub-block contains multiple data sub-block descriptors and multiple data sub-blocks, with a one-to-one correspondence between the data sub-block descriptors and the data sub-blocks.

[0200] Step 403: Generate a core dump file (Coredump) based on the encapsulation configuration file and data sub-block files.

[0201] In some embodiments, encapsulating the core dump file based on the encapsulation configuration file and the data sub-block files includes: obtaining, from the encapsulation configuration file, the encapsulation entries of the core registers of the current core and the corresponding data sub-block files, and generating the corresponding program header and sector according to the core dump file format (e.g., the ELF file format available from http://www.skyfree.org/linux/references/ELF_Format.pdf); and obtaining, from the encapsulation configuration file, the encapsulation entries of all in-core independent storage of the current core, all inter-core shared fast storage, and all inter-core shared slow storage, together with the corresponding data sub-block files, and generating the corresponding program headers and sectors according to the same core dump file format.

[0202] Next, based on the multiple sets of program headers and sectors generated earlier, the ELF header of the ELF file is generated.

[0203] Finally, the generated ELF header is combined with multiple sets of program headers and sectors to generate a core dump file.

[0204] The aforementioned embodiments generate a coredump file after acquiring multiple data sub-block files and a packaging configuration file. It is understood that the control unit has multiple processor cores, and a coredump file is generated for each processor core. Each processor core's coredump file includes a program header and sector representing the core's internal registers (program header and sector are fields defined in the existing ELF file format for coredump files); a program header and sector representing the core's independent internal memory area; a program header and sector representing the core's inter-core shared fast memory; a program header and sector representing the core's inter-core shared slow memory; and an ELF header, generated based on the above content. The "data sub-block identifier" in the data sub-block file or packaging configuration file provides specific information corresponding to the aforementioned coredump file. For example, the "data sub-block identifier" represents "core registers of processor core 0" and "inter-core shared fast memory of processor core 1." The main content of the program header comes from the "data sub-block identifier" and the data sub-block descriptor, while the content of the sector comes from the data sub-block.

[0205] Furthermore, a coredump file is generated and provided to existing debugging software (such as GDB), which can use the coredump file to analyze the firmware fault scene.

[0206] As an example, as shown in Figure 16, the steps include the following.

[0207] Step 16.1 Determine if there are any cores for which no coredump file has been generated; if not, it means that the required coredump file has been generated, and proceed directly to step 16.2 to end the process and wait for subsequent debugging and analysis by GDB (GNU Debugger); if yes, proceed to step 16.3 to process the next core.

[0208] Step 16.4 Locate the encapsulation entries for the core registers of the current core in the encapsulation configuration file, find the corresponding data sub-block files, and package them into the corresponding Program Header and Sector.

[0209] Specifically, for each core, the encapsulation entries of the core registers of the current core are found from the encapsulation configuration file, and the corresponding data sub-block file is found using the identifier of the data sub-block obtained from the encapsulation entry; according to the ELF file format of the Coredump type, the contents of the data sub-block file are packaged into the Program Header and Sector for the core registers of the current core.

[0210] Specifically, the p_type field in the Program Header representing the core's internal registers is set to PT_NOTE, and the other fields of the Program Header are filled in according to the actual situation; the position of each register in the Sector representing the core's internal registers is filled with the corresponding register value read from the data sub-block file.

[0211] Step 16.5 Locate, in the encapsulation configuration file, the encapsulation entries for all independent storage within the current core and for all inter-core shared fast and slow storage, then find the corresponding data sub-block files and package them into the corresponding Program Headers and Sectors.

[0212] Specifically, the `p_type` field in the Program Header is set to `PT_LOAD`. The base address is determined by the storage area type of the data sub-block recorded in its identifier. This base address is added to the offset of the data sub-block within the storage area to obtain the starting address of the data sub-block, and the `p_vaddr` and `p_paddr` fields in the Program Header are set to this starting address. The `p_filesz` and `p_memsz` fields in the Program Header are set to the size of the data sub-block file. The Sector holds the complete contents of the data sub-block file.
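For a 32-bit little-endian ELF file, filling in such a Program Header might look like the following sketch. The field order is that of the standard `Elf32_Phdr` structure; the base addresses per storage area type, the read/write flags, and the alignment of 1 are assumptions chosen for illustration.

```python
import struct

PT_LOAD = 1
PF_W, PF_R = 0x2, 0x4
# Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
ELF32_PHDR = struct.Struct("<IIIIIIII")

# Hypothetical base address of each storage area type.
BASE_ADDRESSES = {"core_private_memory": 0x1000_0000,
                  "shared_fast": 0x2000_0000,
                  "shared_slow": 0x8000_0000}


def make_load_phdr(storage_type, storage_offset, size, file_offset):
    """Build the PT_LOAD Program Header for one data sub-block file."""
    # Starting address = base address of the storage area + offset within it.
    start = BASE_ADDRESSES[storage_type] + storage_offset
    return ELF32_PHDR.pack(PT_LOAD, file_offset, start, start,
                           size, size, PF_R | PF_W, 1)
```

Setting `p_vaddr` to the address the firmware itself used lets a debugger resolve pointers and symbols against the dumped memory directly.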

[0213] Step 16.6 generates the header for the ELF (Executable and Linkable Format) file format. Specifically, the e_type field in the ELF Header is set to ET_CORE, and other fields in the ELF Header are generated according to the actual situation.

[0214] Step 16.7 Combine the ELF Header, all Program Headers, and all Sectors to generate an ELF file, resulting in a Coredump file.

[0215] The ELF Header generated in step 16.6, along with all Program Headers and Sectors generated in steps 16.4 and 16.5, are combined to generate an ELF file, which is the Coredump file. The ELF file generated for each core can be debugged and analyzed using GDB.
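The combination of the ELF Header, Program Headers, and Sectors might be sketched as follows for a 32-bit little-endian ELF file. The layout follows the standard `Elf32_Ehdr` and `Elf32_Phdr` structures; `e_machine` is left at 0 as a placeholder, every segment is given read/write flags for simplicity, and a real core file would additionally need an architecture-specific register note that is not shown here.

```python
import struct

ET_CORE = 4
PT_LOAD = 1
# Elf32_Ehdr: e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
#             e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
PHDR = struct.Struct("<IIIIIIII")


def build_core(segments, e_machine=0):
    """segments: list of (p_type, start address, data bytes). Returns the file bytes."""
    data_offset = EHDR.size + PHDR.size * len(segments)
    # e_ident: magic, ELFCLASS32, ELFDATA2LSB, EV_CURRENT, then zero padding.
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = EHDR.pack(ident, ET_CORE, e_machine, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, len(segments), 0, 0, 0)
    phdrs, payloads = [], []
    for p_type, vaddr, data in segments:
        phdrs.append(PHDR.pack(p_type, data_offset, vaddr, vaddr,
                               len(data), len(data), 0x6, 1))
        payloads.append(data)
        data_offset += len(data)  # segment contents are laid out back to back
    return ehdr + b"".join(phdrs) + b"".join(payloads)
```

Because each Program Header is generated independently, a segment whose data sub-block was never backed up can simply be omitted from `segments`, and the remaining file is still well-formed.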

[0216] It should be noted that each core does not necessarily need all data sub-blocks to encapsulate a Coredump file. If the saved state is incomplete after a power failure, a Coredump file can still be encapsulated from the existing data sub-blocks for GDB debugging and analysis.

[0217] Therefore, the most critical debugging information can be saved in a short time, and debugging and analysis can still be performed even if the data is not fully saved. This ensures that when a power failure occurs, the key information of the firmware is preserved and the problem can be located quickly from the saved context data, which greatly improves debugging efficiency.

[0218] Figure 17 is a schematic diagram of an apparatus for preserving the firmware context upon abnormal power failure, provided in an embodiment of this disclosure. The apparatus can be implemented in software and/or hardware and is generally integrated into an electronic device. As shown in Figure 17, the apparatus includes: a detection module 501 for detecting an abnormal power failure; a sending module 502 by which a control core among multiple processor cores checks for a power failure completion flag after a predetermined time following detection of the abnormal power failure and, if the power failure completion flag does not exist, sends an interrupt to the multiple processor cores; a generation module 503 by which each of the multiple processor cores obtains its corresponding data sub-blocks based on the interrupt and generates a context sub-block, yielding multiple context sub-blocks; and a storage module 504 by which the control core stores some or all of the multiple context sub-blocks in a target area of a persistent storage medium space, based on the result of sorting the multiple context sub-blocks by the probability of an anomaly occurring.
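
The behaviour of the storage module 504 can be illustrated with a short Python sketch. It only models the ordering and early-termination logic: sub-blocks are ranked by anomaly probability, and storing stops when the normal power-down flow reports completion or the target area is full. The `ContextSubBlock` tuple, the `capacity` parameter, and the `power_down_done` callback are hypothetical names introduced for this example.

```python
from collections import namedtuple

# Hypothetical in-memory form of one context sub-block.
ContextSubBlock = namedtuple("ContextSubBlock", "module anomaly_probability payload")


def store_by_priority(sub_blocks, capacity, power_down_done=lambda: False):
    """Return the sub-blocks that would be stored, most suspicious first.

    capacity        -- assumed size in bytes of the target area
    power_down_done -- assumed callback that reports the completion flag
    """
    stored, used = [], 0
    ranked = sorted(sub_blocks, key=lambda b: b.anomaly_probability, reverse=True)
    for blk in ranked:
        # Stop once the power-down flow has completed or the area is full.
        if power_down_done() or used + len(blk.payload) > capacity:
            break
        stored.append(blk)
        used += len(blk.payload)
    return stored
```

Storing in descending probability order means that, if backup power runs out partway through, the sub-blocks most likely to explain the fault have already been written.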

[0219] Figure 18 is a schematic diagram of an apparatus for obtaining the firmware context of an abnormal power loss according to an embodiment of this disclosure. The apparatus can be implemented in software and/or hardware and is generally integrated into an electronic device. As shown in Figure 18, the apparatus, applied to a storage device, includes: a read judgment module 601, used to, when the storage device is powered on and it is detected that the abnormal power-down handling process of the storage device did not end normally during the previous power-down, sequentially read, across the multiple fixed-size regions of a fixed-size continuous space in the persistent storage medium space, the data block corresponding to the first fixed-size sub-region of each fixed-size region; a read acquisition module 602, used to move on to the next fixed-size region if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, and to continue reading the current fixed-size region if a valid data block is read from that sub-region, thereby acquiring all data blocks constituting the firmware context; a determination module 603, used to determine the firmware context length based on the total number of data blocks acquired; and a response module 604, used to, in response to the host reading the Telemetry Log, determine the size of the Telemetry Log based on the firmware context length and use one, more, or all of the data blocks as the content of the Telemetry Log.
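
The scan performed by modules 601 to 603 can be sketched in Python as follows. This is a simplified model under stated assumptions: each region is a list of raw data blocks, validity is checked through a per-block description holding an identifier and a sequence number, and the context length is taken as the block count times the block size. The `MAGIC` value, the 16-byte block size, and all function names are hypothetical.

```python
import struct

MAGIC = 0x46574358            # hypothetical data block identifier
DESC = struct.Struct("<II")   # data description: identifier, sequence number
BLOCK_SIZE = 16               # hypothetical: 8-byte description + 8-byte payload


def make_block(seq: int, payload: bytes) -> bytes:
    """Build one data block: description followed by zero-padded payload."""
    return DESC.pack(MAGIC, seq) + payload.ljust(BLOCK_SIZE - DESC.size, b"\0")


def is_valid(block: bytes, expected_seq: int) -> bool:
    if len(block) != BLOCK_SIZE:
        return False
    ident, seq = DESC.unpack_from(block)
    return ident == MAGIC and seq == expected_seq


def recover_context(regions):
    """regions: list of regions, each a list of raw data blocks.
    Returns (data_blocks, context_length_in_bytes)."""
    for region in regions:
        # Probe only the first block; skip the region if it is not valid.
        if not region or not is_valid(region[0], 0):
            continue
        blocks = []
        for seq, block in enumerate(region):
            if not is_valid(block, seq):
                break                      # end of the saved context
            blocks.append(block)
        return blocks, len(blocks) * BLOCK_SIZE
    return [], 0
```

Probing only the first block of each region keeps the power-on scan short, since regions that were never written are rejected after a single read.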

[0220] Figure 19 is a schematic diagram of an apparatus for generating a core dump file from the firmware context of an abnormal power failure, according to an embodiment of this disclosure. The apparatus can be implemented in software and/or hardware and is generally integrated into an electronic device. As shown in Figure 19, the apparatus, applied to a host computer, includes: a reading module 701 for reading the Telemetry Log from the storage device, the Telemetry Log including firmware context data; a parsing module 702 for parsing and unpacking the firmware context data to generate an encapsulation configuration file and data sub-block files; and a packaging module 703 for obtaining a core dump file based on the encapsulation configuration file and the data sub-block files.
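
The work of the parsing module 702 can be sketched as below. The binary layout used here (a header with magic number, core number, and entry count, followed by descriptor entries of storage area type, offset, and size) is an assumption made purely for illustration; the disclosure does not fix these field widths or their order. The identifier format and file naming are likewise hypothetical.

```python
import struct

HEADER = struct.Struct("<III")   # assumed: magic number, core number, entry count
ENTRY = struct.Struct("<III")    # assumed: storage area type, offset, size
MAGIC = 0x43545842               # hypothetical context sub-block magic number


def unpack_sub_block(raw: bytes):
    """Split one context sub-block into data sub-block files and
    encapsulation entries. Returns (files, config)."""
    magic, core, count = HEADER.unpack_from(raw, 0)
    if magic != MAGIC:
        raise ValueError("not a context sub-block")
    files, config = {}, []
    for i in range(count):
        area, offset, size = ENTRY.unpack_from(raw, HEADER.size + i * ENTRY.size)
        # Identifier derived from the core number and the storage area type.
        ident = f"core{core}_area{area}_{i}"
        files[ident + ".bin"] = raw[offset:offset + size]
        config.append({"id": ident, "area_type": area, "offset": offset, "size": size})
    return files, config
```

Here `files` stands in for the data sub-block files and `config` for the encapsulation configuration file; a real tool would write both to disk for the packaging module 703 to consume.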

[0221] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0222] The units described in the embodiments of this disclosure can be implemented in software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

[0223] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0224] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

Claims

1. A method for saving the firmware context during abnormal power loss, applied to a storage device, the method comprising: detecting an abnormal power loss; after a predetermined time following detection of the abnormal power loss, checking, by a control core among multiple processor cores, for a power-down completion flag and, if the power-down completion flag is not present, sending, by the control core, an interrupt to the multiple processor cores; obtaining, by each of the multiple processor cores, a corresponding data sub-block based on the interrupt and generating a context sub-block, so as to obtain multiple context sub-blocks; and storing, by the control core, some or all of the multiple context sub-blocks in a target area of a persistent storage medium space, based on a sorting result of sorting the multiple context sub-blocks by the probability of an anomaly occurring.

2. The method according to claim 1, wherein the method further includes: setting the predetermined time to be longer than an average duration, wherein the average duration is determined from multiple historical durations taken by the storage device to complete its power-down processing.

3. The method according to claim 1, wherein the firmware includes multiple logic modules running on the multiple processor cores; the power-down completion flag includes one or more power-down completion flags generated by the multiple logic modules; and the probability of an anomaly occurring in one or more of the multiple logic modules is determined based on the one or more power-down completion flags generated by the multiple logic modules, or based on the historical anomaly occurrences of one or more of the multiple logic modules.

4. The method according to claim 3, wherein the logic modules in the firmware are sorted by the current progress of their respective abnormal power-down handling processes, and the anomaly probabilities of the logic modules are ranked based on that sorting result; and/or the anomaly probability of each logic module in the firmware is set based on the firmware context content processed by that logic module.

5. The method according to claim 1, wherein, in response to detection of the abnormal power loss, each of the multiple processor cores executes its own abnormal power-down handling process in parallel; and in response to the interrupt, the multiple processor cores suspend their own abnormal power-down handling processes, generate the multiple context sub-blocks, and then resume their own abnormal power-down handling processes.

6. The method according to claim 1, wherein, The firmware includes multiple logical modules running on the multiple processor cores, and each logical module's own abnormal power-down handling process includes multiple power-down steps; The method further includes: in response to detecting the abnormal power failure, each of the logic modules executes its own abnormal power failure handling process, and records the power failure step number that has been completed during the execution of its own abnormal power failure handling process; After all the multiple logic modules have completed their own abnormal power failure handling process, the power failure completion flag is generated.

7. The method according to claim 1, wherein, The firmware context includes multiple logical parts; each logical part is divided into one or more data sub-blocks; the context sub-blocks are generated based on the data sub-blocks.

8. The method according to claim 7, wherein, The logic section includes one or more of the following: in-core register status, independent memory area data of the core, inter-core shared fast memory, inter-core shared slow memory, stack information, SQ or CQ queue, Admin queue, memory management information, memory information, other processor status information, hardware accelerator status information, cache contents, inter-core communication queue, and inter-core shared memory.

9. The method according to claim 7, wherein each of the context sub-blocks includes: a data sub-block descriptor, used to describe the data sub-blocks contained in the context sub-block, the data sub-block descriptor including multiple data sub-block descriptor entries, each corresponding to one of the data sub-blocks, wherein each data sub-block descriptor entry includes the type of the data sub-block, which indicates the logical part from which the data sub-block comes, the logical part including one or more of the following: in-core register status, in-core independent memory area data, inter-core shared fast memory, inter-core shared slow memory, stack information, SQ or CQ queue, Admin queue, memory management information, memory information, other processor status information, hardware accelerator status information, cache contents, inter-core communication queue, and inter-core shared memory; and a data sub-block area, used to store multiple data sub-blocks, each of which corresponds to one of the data sub-block descriptor entries.

10. The method according to claim 6, wherein, The method further includes: When the control core determines that the power outage is not complete, it determines the probability of an anomaly occurring in each of the multiple logic modules based on the power outage step numbers that have been completed by each of the multiple logic modules. The anomalies are sorted according to their probability of occurrence to obtain the sorting result.

11. The method according to claim 10, wherein, After a predetermined time has elapsed since the abnormal power failure was detected, the control core neither sends an interrupt to the plurality of processor cores nor saves the firmware state, based on the presence of the power failure completion flag.

12. The method according to claim 1, wherein each of the multiple processor cores obtaining a corresponding data sub-block based on the interrupt and generating a context sub-block, so as to obtain multiple context sub-blocks, includes: after receiving the interrupt, each processor core packages its corresponding data sub-block into the corresponding context sub-block and adds marking information; the control core exits the interrupt handling process after generating its corresponding context sub-block; and each processor core other than the control core, after generating its own corresponding context sub-block, generates context sub-blocks including marking information from other data sub-blocks according to the sorting result, and then exits the interrupt handling process.

13. The method according to claim 12, wherein, After each processor core exits its interrupt handling process, it continues to execute its own abnormal power failure handling process.

14. The method according to claim 1, wherein, The method further includes: Obtain a fixed-size contiguous space in the persistent storage medium space; The fixed-size continuous space is divided into multiple fixed-size regions. The target region is determined from the multiple fixed-size regions, and the target sequence number corresponding to the target region is obtained. When the abnormal power failure handling process is completed, the target sequence number is stored in the target location of the persistent storage medium space.

15. The method according to claim 1, wherein the control core storing some or all of the multiple context sub-blocks in a target area of the persistent storage medium space, based on the sorting result of sorting the multiple context sub-blocks by the probability of an anomaly occurring, includes: dividing each of the context sub-blocks into small data blocks of fixed size, and adding descriptive information to each small data block to obtain target data blocks; and storing all the target data blocks sequentially in the target area according to the sorting result; wherein, after each target data block is stored, the power-down completion flag is checked; if the power-down completion flag indicates that the power-down has completed, the operation of storing some or all of the multiple context sub-blocks in the target area of the persistent storage medium space is terminated; and if the power-down completion flag indicates that the power-down has not completed, the operation of storing some or all of the multiple context sub-blocks in the target area of the persistent storage medium space continues.

16. The method according to any one of claims 1-15, wherein, The method further includes: When the storage device is powered on, if it is detected that the abnormal power failure handling process did not end normally when the storage device was last powered off, the firmware state saved when the storage device was last powered off is obtained from a fixed-size continuous space. Set the target index corresponding to the firmware context so that the host can read the firmware context based on the target index.

17. A method for obtaining the firmware context of an abnormal power loss, applied to a storage device, the method comprising: when the storage device is powered on, if it is detected that the abnormal power-down handling process did not end normally when the storage device was last powered down, sequentially reading, across multiple fixed-size regions of a fixed-size continuous space in a persistent storage medium space, the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions; if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, reading the next fixed-size region; if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, continuing to read the current fixed-size region to obtain all the data blocks that constitute the firmware context; determining a firmware context length based on the total number of all the data blocks; and in response to the host reading the Telemetry Log, determining the size of the Telemetry Log based on the firmware context length and using one, more, or all of the data blocks as the content of the Telemetry Log.

18. The method according to claim 17, wherein the method further includes: when reading each data block, determining the validity of the data block based on the data description of the data block; the data description includes a data block identifier and a data block sequence number, wherein the data block identifier indicates that the content of the small data block is a backup of the firmware context, and the data block sequence number indicates the sequence number of the small data block within the current context sub-block or within all context sub-blocks.

19. The method according to claim 17 or 18, wherein, The method further includes: If it is identified that the abnormal power failure process ended normally when the storage device was last powered down, the target sequence number is read from the target address in the persistent storage medium space; If the target sequence number is a valid region sequence number, the target sequence number is deleted, and the fixed-size region corresponding to the target sequence number is erased.

20. The method according to any one of claims 17-19, wherein, Each of the fixed-size regions includes multiple fixed-size sub-regions; Each of the fixed-size sub-regions includes multiple fixed-size data blocks; When the storage device was powered down for the last time, causing the firmware context to be written to a selected fixed-size region, the context sub-block was split into multiple small data blocks, and the data description was added to each small data block; wherein, the size of each data block in the fixed-size region is equal to the combined size of the small data block and the data description.

21. The method according to any one of claims 17-20, wherein, The target sequence number is read based on the target location in the persistent storage medium space; wherein, when the storage device was last powered off, a fixed-size region was selected from the fixed-size continuous space of the persistent storage medium space, and the sequence number of the selected fixed-size region was recorded as the target sequence number; Read the fixed-size region corresponding to the target sequence number from the plurality of fixed-size regions.

22. The method according to any one of claims 17-21, wherein multiple small data blocks form a context sub-block; each of the context sub-blocks includes a data sub-block descriptor and multiple data sub-blocks; the data sub-block descriptor is used to describe the data sub-blocks contained in the context sub-block; and the data sub-block descriptor consists of multiple data sub-block descriptor entries, each corresponding to one of the data sub-blocks.

23. The method according to any one of claims 17-22, wherein the Telemetry Log includes a first Telemetry log area 1, the Telemetry log area 1 pointing to the address of the first data block where the firmware context is located; when the host reads the Telemetry log header, the storage device, in response to the host, sets the Last Block of the Telemetry log area 1 to the firmware context length plus the Telemetry log header length; and when the host reads the Telemetry log area 1, the storage device provides all the data blocks that constitute the firmware context to the host.

24. The method according to any one of claims 17-23, wherein the Telemetry Log also includes a second Telemetry log area 2 and a third Telemetry log area 3, the Telemetry log area 2 and the Telemetry log area 3 being set to empty; and when the host reads the Telemetry log header, the storage device, in response to the host, sets the Last Block of the Telemetry log area 2 and of the Telemetry log area 3 to the firmware context length plus the Telemetry log header length.

25. A method for generating a core dump file from the firmware context of an abnormal power failure, applied to a host computer connected to a storage device, the method comprising: reading the Telemetry Log from the storage device, the Telemetry Log including firmware context data; parsing and unpacking the firmware context data to generate an encapsulation configuration file and data sub-block files; and obtaining the core dump file based on the encapsulation configuration file and the data sub-block files.

26. The method according to claim 25, wherein the step of parsing and unpacking the firmware context data to generate an encapsulation configuration file and data sub-block files includes: reading the number from each context sub-block of the firmware context data; obtaining the data sub-block descriptor from the context sub-block; for each data sub-block descriptor entry, obtaining the storage area type of the data sub-block from the data sub-block descriptor entry, determining the data sub-block identifier based on the storage area type and the number, and obtaining the data sub-block size and the position of the data sub-block within the context sub-block from the data sub-block descriptor entry; generating the data sub-block file based on the data sub-block identifier, the position of the data sub-block within the context sub-block, and the data sub-block size; and generating an encapsulation entry based on the data sub-block identifier, the data sub-block size, and the position of the data sub-block within the context sub-block, and adding the encapsulation entry to the encapsulation configuration file; or the step of parsing and unpacking the firmware context data to generate an encapsulation configuration file and data sub-block files includes: obtaining, based on the log area of each context sub-block in the firmware context data, the data sub-block identifier, data sub-block position, and data sub-block size corresponding to all data sub-blocks; and generating an encapsulation entry based on the data sub-block identifier, the data sub-block size, and the position of the data sub-block within the context sub-block, and adding the encapsulation entry to the encapsulation configuration file.

27. The method according to claim 25 or 26, wherein, when the magic number of a context sub-block is found in the firmware context data, the first data block of the context sub-block corresponding to the magic number is obtained, and the total number of data blocks of the context sub-block to which the first data block belongs is obtained based on the size of the context sub-block; that total number of data blocks is obtained to construct the context sub-block; and the magic number of the context sub-block is obtained from the firmware context data, and the context sub-block is read after the validity of the magic number is verified.

28. The method according to any one of claims 25-27, wherein, The storage area type of the data sub-block is used to describe which logical part the data sub-block comes from; The logic portion includes one or more of the following: in-core register status, in-core independent memory area data, inter-core shared fast memory, inter-core shared slow memory, stack information, SQ or CQ queue, Admin queue, memory management information, memory information, other processor status information, hardware accelerator status information, cache contents, inter-core communication queue, and inter-core shared memory.

29. The method according to any one of claims 25-28, wherein, The process of obtaining the core dump file based on the encapsulation configuration file and the data sub-block file includes: The package entry representing the CPU core's internal registers and the corresponding data sub-block file are obtained from the package configuration file and the program header and section representing the CPU core's internal registers are generated according to the target file format of the core dump type. From the encapsulation configuration file, obtain all encapsulation entries for independent storage within the current core, all encapsulation entries for shared fast storage between cores, and all encapsulation entries for shared slow storage between cores, as well as the corresponding data sub-block files, and generate the corresponding program header and section according to the target file format of the core dump type; Generate the target file header according to the target file format of the core dump type; The target file header, all program headers, and all sections are combined to generate the core dump file.

30. The method according to any one of claims 25-29, wherein, The filename of the data sub-block file is determined based on the identifier of the data sub-block; The encapsulation entry includes the identifier of the data sub-block; When reading the encapsulation entry to generate the program header or section, the corresponding data sub-block file is obtained based on the identifier of the data sub-block in the encapsulation entry; Find the kernel register encapsulation entry for the current core from the encapsulation configuration file, find the corresponding data sub-block file, and package it into the corresponding program header and section; or Find all the encapsulation entries for independent storage within the current core and the encapsulation entries for shared fast and / or slow storage between all cores from the encapsulation configuration file, and find the corresponding data sub-block files, and package them into the corresponding program header and section.

31. The method according to claim 29, wherein, Set the p_type field in the program header representing the CPU core's internal registers to PT_NOTE; The position of each register in the section representing the CPU core's internal registers is filled with the corresponding register value read from the data sub-block file.

32. The method according to claim 31, wherein, The encapsulation configuration file is used to obtain encapsulation entries for all independent storage within the current core, encapsulation entries for all shared fast storage between cores, and encapsulation entries for all shared slow storage between cores, as well as the corresponding data sub-block files. The corresponding program header and sections are then generated according to the target file format of the core dump type, including: Set the p_type field in the program header to PT_LOAD; Set the p_vaddr and p_paddr fields in the program header to the starting address, where the base address of the storage area type is determined by the storage area type of the data sub-block in the identifier of the data sub-block; add the offset of the data sub-block in the storage area to the base address to determine the starting address of the data sub-block; Set the p_filesz and p_memsz fields in the program header to the size of the data sub-block files; A section contains the contents read from a complete data sub-block file; Generate an ELF file format header, where the e_type field in the ELF Header is set to ET_CORE.

33. A method for obtaining a core dump file generated from the firmware context of an abnormal power failure, comprising: when the storage device is powered on, if it is detected that the abnormal power-down handling process of the storage device did not end normally during the previous power-down, sequentially reading, across multiple fixed-size regions of a fixed-size continuous space in a persistent storage medium space, the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions; if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, reading the next fixed-size region; if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, continuing to read the current fixed-size region to obtain all the data blocks that constitute the firmware context; determining a firmware context length based on the total number of all the data blocks; reading, by the host, the Telemetry Log from the storage device, the Telemetry Log including firmware context data, wherein, in response to the host reading the Telemetry Log, the storage device determines the size of the Telemetry Log based on the firmware context length and uses one, more, or all of the data blocks as the content of the Telemetry Log; parsing and unpacking, by the host, the firmware context data to generate an encapsulation configuration file and data sub-block files; and obtaining, by the host, the core dump file based on the encapsulation configuration file and the data sub-block files.

34. An electronic device, wherein the electronic device includes: a processor; and a memory used to store instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute them to implement the method for saving the firmware context during abnormal power loss as described in any one of claims 1-16, or the method for obtaining the firmware context of an abnormal power loss as described in any one of claims 17-24, or the method for generating a core dump file from the firmware context of an abnormal power failure as described in any one of claims 25-32, or the method for obtaining a core dump file generated from the firmware context of an abnormal power failure as described in claim 33.

35. A system for obtaining a core dump file generated from the firmware context of an abnormal power failure, comprising a host and a storage device; wherein the storage device is configured to: when the storage device is powered on, if it is detected that the abnormal power-down handling process of the storage device did not end normally during the previous power-down, sequentially read, across multiple fixed-size regions of a fixed-size continuous space in a persistent storage medium space, the data block corresponding to the first fixed-size sub-region of each of the multiple fixed-size regions; if no valid data block is read from the first fixed-size sub-region of the current fixed-size region, read the next fixed-size region; if a valid data block is read from the first fixed-size sub-region of the current fixed-size region, continue reading the current fixed-size region to obtain all the data blocks that constitute the firmware context; and determine a firmware context length based on the total number of all the data blocks; the host is configured to read the Telemetry Log from the storage device, the Telemetry Log including firmware context data; the storage device, in response to the host reading the Telemetry Log, determines the size of the Telemetry Log based on the firmware context length and uses one, more, or all of the data blocks as the content of the Telemetry Log; and the host is further configured to parse and unpack the firmware context data, generate an encapsulation configuration file and data sub-block files, and obtain a core dump file based on the encapsulation configuration file and the data sub-block files.