A method and apparatus for failure traceability of an aging test
By embedding a snapshot scheduler and a structured data acquisition mechanism into aging tests, the method addresses the inability of existing technologies to accurately locate faulty modules. It enables precise location of faulty modules and reliable data preservation, reduces labor costs, and supports performance analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 70MAI CO LTD
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-30
AI Technical Summary
Existing aging tests cannot accurately locate faulty modules, and the test results are only binary signals at the whole machine level. Manual troubleshooting is time-consuming and costly. Log systems are unstructured and difficult to parse automatically, making it impossible to construct performance degradation curves.
By embedding a snapshot scheduler in the main control MCU, the status parameters of the aging task module are periodically collected, structured snapshot frames are generated and stored in non-volatile memory, and fault analysis and report generation are performed using a host computer.
It enables precise location of faulty modules without disassembling the device or modifying the firmware, ensuring the integrity and reliability of fault data, supporting cross-batch trend analysis, and reducing labor costs.
Figure CN122309261A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a method for tracing the source of failures in aging tests.
Background Technology
[0002] In existing technologies, during the manufacturing of smart hardware products, aging testing is a key step in verifying the long-term operational reliability of the equipment. It is typically performed by the main control chip, which controls multiple functional modules to synchronously execute preset operations. Typical aging tasks include: continuous camera recording, LED light color cycling, periodic on / off of the infrared fill light, reciprocating switching of the IR-cut filter motor, repeated connection and disconnection of the 4G communication module, continuous playback of test tones by the audio speaker, periodic on / off of the microphone, multiple rounds of battery charging and discharging, and timed system restarts. These operations are uniformly scheduled by the factory test mode in the device firmware, with durations ranging from several hours to tens of hours. After the test, the device reports the result to the production line via a single status indicator light (e.g., a flashing white light indicates PASS, and a flashing orange light indicates FAIL). This mechanism does not record the operating status of each module during the test and does not save any intermediate data.
Furthermore, existing technologies have the following technical shortcomings in the aging testing process:
The test results, presented only as system-wide binary signals (PASS / FAIL), cannot pinpoint the specific failed functional module or the time of failure.
When a test fails, manual intervention is required to reproduce the fault scenario, disassemble the device to read internal logs, or temporarily modify the firmware to increase debugging output, which results in a long troubleshooting cycle and high labor costs.
The device's built-in logging system mostly outputs unstructured ASCII text streams. These lack unified field definitions and are difficult for automated tools to parse efficiently. At the same time, these logs are easily lost in abnormal situations such as power failure or watchdog reset.
Because the state parameters of each module are not recorded in a persistent, fixed form, it is impossible to construct a performance degradation curve, and it is difficult to identify early deterioration trends such as a slow increase in motor rotation resistance or a gradual increase in battery internal resistance. This limits the quality feedforward control capability.
Summary of the Invention
[0003] One objective of this application is to provide a fault tracing method and device for aging tests, which enables the tracing of faults that occur during the aging test of the device under test.
[0004] According to one aspect of this application, a fault tracing method for aging tests is provided, wherein the method includes: The host computer sends a test preparation command to the device under test, so that the main control MCU configures each aging task module to enter the test state according to the preparation command, and at the same time creates an independent task thread for the snapshot scheduler and sets the snapshot acquisition cycle. After the aging test is started, the main control MCU triggers snapshot events periodically according to the snapshot acquisition cycle. After each snapshot acquisition cycle ends, the snapshot scheduler broadcasts a status acquisition request signal to each of the aging task modules, so that each of the aging task modules suspends its current non-critical loop operation within a preset time window and reads the predefined status parameters from the status register or global variables. The main control MCU packages all the read status parameters into a snapshot frame in binary form and writes it into a circular buffer in non-volatile memory; After the aging test is completed, the device under test enters a low-power standby state; the host computer sends a query command to the device under test, so that the main control MCU responds and returns a response packet. The response packet includes a final result indicating whether the aging test passed or failed, the total number of valid snapshots, and a 32-bit checksum of the contents of the entire circular buffer.
In response to the received response packet, the host computer sends paging read instructions to the device under test in a loop, so that the main control MCU locates the corresponding snapshot frame in the circular buffer according to the index value carried by the paging read instruction and returns the frames one by one; After receiving all the snapshot frames, the host computer verifies the CRC checksum of each snapshot frame, reconstructs the sequence according to the timestamps in the frame headers of the snapshot frames, extracts the status parameters of each aging task module, and plots the time status curve. The host computer loads the built-in rule engine to match the time status curve against the corresponding target fault events, and associates each successfully matched target fault event with the corresponding snapshot index to generate a structured fault report. The built-in rule engine includes at least two fault discrimination rules, and the structured fault report includes the fault module name, the first abnormal timestamp, the abnormal feature description, the associated snapshot index, and the original values of the status parameters.
[0005] Furthermore, the above method further includes: The host computer establishes a communication connection with the main control MCU in the device under test through a standard serial interface; The main control MCU is connected to each of the aging task modules via an internal bus or GPIO pins. The aging task module includes one or more of the following: camera module, LED driver circuit, IR-cut motor driver unit, mobile network communication module, audio power amplifier chip, microphone acquisition circuit, and battery management unit. The main control MCU is connected to the non-volatile memory via an SPI or QSPI interface. The non-volatile memory includes the built-in Flash memory of the device under test, or an externally deployed SPI Flash chip connected to the main control MCU via a QSPI interface.
[0006] Furthermore, the above method further includes: The serial communication baud rate between the host computer and the device under test is set to 115200bps. The communication between the main control MCU and each of the aging task modules adopts a synchronous interrupt mechanism. Write operations to the non-volatile memory are accelerated by a DMA controller.
[0007] Furthermore, in the above method, the main control MCU configures each aging task module to enter the test state according to the preparation instruction, and simultaneously creates an independent task thread for the snapshot scheduler and sets the acquisition cycle, including: The main control MCU configures each aging task module to enter the corresponding test state according to the single aging duration and number of cycles carried by the preparation instruction, and assigns an independent task control handle to each aging task module. Meanwhile, the main control MCU creates an independent task thread for the snapshot scheduler in its internal RAM and enters a waiting state, and sets the snapshot acquisition cycle of the snapshot scheduler. The task priority of the snapshot scheduler is higher than the priority of the aging test task and lower than the priority of the interrupt service routine.
[0008] Furthermore, the above method further includes: The total length of the snapshot frame is fixed at 64 bytes, including a frame header, multiple status fields, and a frame trailer; wherein, The frame header consists of a 4-byte timestamp and a 2-byte frame sequence number. The timestamp is used to indicate the cumulative number of seconds since the aging process started. The motor status field occupies 4 bytes and is used to store the position code value and direction indicator. The LED status field occupies 6 bytes and records the duty cycle of the R, G, and B channels respectively, with each channel occupying 2 bytes. The battery status field occupies 8 bytes and stores 2 bytes of voltage, 2 bytes of remaining capacity, 2 bytes of temperature, and 2 bytes of charge / discharge status indicator. The communication module status field occupies 6 bytes and contains 2 bytes of mobile network signal strength, 2 bytes of registration failure count, and 2 bytes of current connection status. The space for other module status fields is dynamically allocated according to the actual number of modules participating in the aging process, with a maximum of 32 bytes. The last 2 bytes of the frame are a 16-bit CRC checksum, used to verify the integrity of the snapshot frame.
[0009] Furthermore, the above method further includes: The circular buffer starts at the starting address and ends at the ending address, with a total size of 8KB, and can hold 128 snapshot frames of 64 bytes each. The snapshot frames written to the circular buffer include normal snapshot frames and emergency snapshot frames. The circular buffer contains two pointers: a write pointer and a read pointer. Initially, both the write pointer and the read pointer point to the starting address. Each time a new normal snapshot frame is written, the main control MCU writes the new normal snapshot frame to the position indicated by the write pointer and moves the write pointer forward by 64 bytes. When the write pointer reaches the end address of the circular buffer, it automatically wraps back to the starting address to continue writing, overwriting the earliest written normal snapshot frame. If any target aging task module detects an anomaly during the aging test, the target aging task module immediately sends a high-level signal to the main control MCU through a preset hardware interrupt line. Upon receiving the high-level signal, the main control MCU immediately sets a global fault flag in its interrupt service routine, stops all aging task threads, forces a status acquisition, generates an emergency snapshot frame, and writes it to the next available position of the write pointer or retains it by expanding the tail reserved area of the circular buffer. The emergency snapshot frame does not participate in the first-in-first-out (FIFO) overwrite logic of the circular buffer. The 0th byte of the frame header of the emergency snapshot frame is set to 0xFF to indicate that it is an emergency type.
[0010] Furthermore, in the above method, the snapshot acquisition period ranges from 1 minute to 1 hour to adapt to different product forms and is dynamically adjusted via AT commands.
[0011] Furthermore, the above method further includes: The snapshot frame is a snapshot frame encoded using TLV.
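The TLV (tag-length-value) variant can be pictured with a minimal sketch. The application does not define a tag table or field widths, so the one-byte tag, one-byte length, the tag numbers, and the little-endian field splits below are all assumptions made for illustration only.

```python
import struct

# Hypothetical tag numbers; the application does not define a tag table.
TAG_MOTOR = 0x01
TAG_BATTERY = 0x03


def tlv_encode(fields):
    """Encode (tag, value_bytes) pairs as 1-byte tag, 1-byte length, value."""
    out = bytearray()
    for tag, value in fields:
        if not 0 <= tag <= 0xFF or len(value) > 0xFF:
            raise ValueError("tag or length does not fit in one byte")
        out += bytes((tag, len(value))) + value
    return bytes(out)


def tlv_decode(data):
    """Decode a TLV byte string back into a list of (tag, value_bytes)."""
    fields, pos = [], 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        fields.append((tag, bytes(value)))
        pos += 2 + length
    return fields


# Assumed field splits: motor = position code + direction flag (2 bytes each),
# battery = voltage, remaining capacity, temperature, charge flag (2 bytes each).
motor = struct.pack("<HH", 1234, 1)
battery = struct.pack("<HHHH", 3850, 76, 315, 1)
payload = tlv_encode([(TAG_MOTOR, motor), (TAG_BATTERY, battery)])
decoded = tlv_decode(payload)
```

Compared with the fixed 64-byte layout, a TLV frame only carries the modules that actually took part in the test, at the cost of two header bytes per field in this sketch.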
[0012] According to one aspect of this application, a non-volatile storage medium is provided that stores computer-readable instructions thereon, which, when executed by a processor, cause the processor to implement the fault tracing method of the aging test described above.
[0013] According to one aspect of this application, a fault tracing device for aging tests is provided, wherein the device includes: One or more processors; Computer-readable medium for storing one or more computer-readable instructions. When the one or more computer-readable instructions are executed by the one or more processors, the one or more processors implement the fault tracing method of aging test as described above.
[0014] Compared with existing technologies, this application first sends a test preparation command to the device under test (DUT) via a host computer, so that the main control MCU configures each aging task module to enter the test state according to the preparation command. Simultaneously, an independent task thread for a snapshot scheduler is created and a snapshot acquisition cycle is set. After the aging test starts, the main control MCU periodically triggers snapshot events according to the snapshot acquisition cycle. After each snapshot acquisition cycle ends, the snapshot scheduler broadcasts a status acquisition request signal to each aging task module, so that each aging task module suspends its current non-critical loop operation within a preset time window and reads predefined status parameters from the status register or global variables. The main control MCU packages all read status parameters into a binary snapshot frame and writes it to a circular buffer in non-volatile memory. After the aging test ends, the DUT enters a low-power standby state. The host computer sends a query command to the DUT, so that the main control MCU responds and returns a response packet. The response packet includes a final result indicating whether the aging test passed or failed, the total number of valid snapshots, and a 32-bit checksum of the contents of the entire circular buffer. In response to the received response packet, the host computer sends paging read instructions to the DUT in a loop, enabling the main control MCU to locate the corresponding snapshot frame in the circular buffer based on the index value carried by the paging read instruction and return the frames one by one.
After receiving all snapshot frames, the host computer verifies the CRC checksum of each snapshot frame, reconstructs and extracts the status parameters of each aging task module according to the timestamps in the frame headers of the snapshot frames, and plots the time status curve. The host computer loads a built-in rule engine to match the time status curve with corresponding target fault events, and associates the successfully matched target fault events with the corresponding snapshot index to generate a structured fault report. The built-in rule engine includes at least two fault discrimination rules, and the structured fault report includes the fault module name, the first abnormal timestamp, the abnormal feature description, the associated snapshot index, and the original values of the status parameters, thus enabling the tracing of faults occurring during the aging test of the DUT.
[0015] In this application, by embedding a snapshot scheduler and a structured data acquisition mechanism into the main control MCU of the device under test, the granularity of fault location is refined from the whole machine level to specific functional modules and precise time windows. This allows engineers to directly reproduce the fault context based on the snapshot sequence without disassembling the device or modifying the firmware. The snapshot frames adopt a fixed-format binary structure and are written to non-volatile Flash memory. Even if a sudden power failure occurs during testing, the most recently written snapshot data can still be completely retained, ensuring the reliability of fault data. The circular buffer design maximizes the retention of the most recent state history within limited storage space, balancing data integrity and resource constraints. The independent storage strategy for emergency snapshots ensures that the instantaneous state of critical faults is not overwritten. The host computer reads the snapshot sequence in pages using standardized AT commands, which is compatible with the existing production line communication architecture and does not require additional hardware probes or external monitoring equipment, making it suitable for large-scale mass production environments. The structured state data supports cross-batch trend analysis and can identify slow drifts in performance parameters, providing data support for product design improvements.
Attached Figure Description
[0016] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Figure 1 is a schematic diagram of the overall process of a fault tracing method for aging tests according to one aspect of this application.
Figure 2 illustrates the timing relationship of a snapshot scheduler triggering periodic state acquisition during an aging test in a fault tracing method for aging tests according to one aspect of this application.
Figure 3 illustrates the data format of a structured snapshot frame in a fault tracing method for aging tests according to one aspect of this application.
Figure 4 is a schematic diagram of the management mechanism of the circular buffer in the non-volatile memory in a fault tracing method for aging tests according to one aspect of this application.
Figure 5 illustrates the interaction process in which a host computer reads a snapshot sequence and performs fault analysis in a fault tracing method for aging tests according to one aspect of this application.
Detailed Implementation
[0017] The present application will now be described in further detail with reference to the accompanying drawings.
[0018] In a typical configuration of this application, the terminal, the device of the service network, and the trusted party all include one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
[0019] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0020] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information by any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0021] Figure 1 shows a schematic diagram of the overall process of a fault tracing method for aging tests, as proposed in one aspect of this application. The core of this method is that, during the execution of the aging test, the main control MCU of the device under test (DUT) actively and periodically acquires key status parameters from all functional modules participating in the aging task (i.e., the aging task modules). These status parameters are encapsulated into structured binary snapshot frames and written to a circular buffer in non-volatile memory. After aging, the host computer reads the complete snapshot sequence page by page using standard serial instructions and performs time-series analysis on the periodically acquired status parameters, in timestamp order, to achieve precise fault location. As shown in Figure 1, the overall system architecture of this method consists of a host computer 1, a DUT 2, an aging task module 3, a main control MCU 4, and non-volatile memory 5.
The host computer 1 establishes a communication connection with the main control MCU 4 in the device under test 2 through a standard serial interface (such as UART). The main control MCU 4 is connected to each aging task module 3 through an internal bus or GPIO pins, and the aging task module includes, but is not limited to, one or more of the following: camera module, LED driver circuit, IR-cut motor driver unit, mobile network communication module (here, the mobile network includes, but is not limited to, 2G / 3G / 4G / 5G mobile networks), audio power amplifier chip, microphone acquisition circuit, battery management unit, etc. The main control MCU 4 is connected to the non-volatile memory 5 through an SPI or QSPI interface, wherein the non-volatile memory can be the built-in Flash memory of the device under test 2, or an external SPI Flash chip, and is used to persistently save state snapshot data. The circular buffer in the non-volatile memory can be deployed on an external SPI Flash chip connected to the main control MCU through a QSPI interface, which is suitable for devices under test with limited internal resources.
[0022] During the initialization phase of the aging test on the device under test (DUT), the host computer 1 sends a preparation command that triggers the test to the DUT 2 (e.g., the command AT+FACT=BI,W,<duration>,<times>, applicable to factory mode). The preparation command is transmitted to the main control MCU 4 via the serial port, where <duration> is the duration of a single aging cycle (in seconds) and <times> is the number of cycles, so that the main control MCU 4 configures each aging task module to enter the test state according to the preparation command, and at the same time creates an independent task thread for the snapshot scheduler and sets the snapshot acquisition period T_snap.
[0023] It should be noted that the range of the acquisition period T_snap includes, but is not limited to, any value between 1 minute and 1 hour, to adapt to the different aging time requirements of different product forms, and is dynamically adjusted by AT commands. In a preferred embodiment of this application, the acquisition period T_snap is preferably 300 seconds (i.e., 5 minutes).
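As a rough illustration of the initialization parameters, the sketch below parses a preparation command and estimates how many periodic snapshots a run would produce. It assumes the command has the form AT+FACT=BI,W,<duration>,<times>, that the total run time is duration multiplied by times, and that T_snap is restricted to the 1 minute to 1 hour range; the function names are hypothetical and not part of the application.

```python
PREFIX = "AT+FACT=BI,W,"       # assumed command form
T_SNAP_MIN_S = 60              # 1 minute
T_SNAP_MAX_S = 3600            # 1 hour
T_SNAP_DEFAULT_S = 300         # preferred value in the description


def parse_prepare_command(line):
    """Return (duration_s, times) parsed from a preparation command."""
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("not an aging-test preparation command")
    parts = line[len(PREFIX):].split(",")
    if len(parts) != 2:
        raise ValueError("expected <duration>,<times>")
    duration_s, times = int(parts[0]), int(parts[1])
    if duration_s <= 0 or times <= 0:
        raise ValueError("duration and times must be positive")
    return duration_s, times


def expected_snapshots(duration_s, times, t_snap_s=T_SNAP_DEFAULT_S):
    """Estimate the number of periodic snapshots over the whole run.

    Assumes total run time = duration_s * times, which the text does not
    state explicitly.
    """
    if not T_SNAP_MIN_S <= t_snap_s <= T_SNAP_MAX_S:
        raise ValueError("T_snap outside the 1 minute to 1 hour range")
    return (duration_s * times) // t_snap_s


duration_s, times = parse_prepare_command("AT+FACT=BI,W,3600,12")
count = expected_snapshots(duration_s, times)
```

With these example values the run would produce more snapshots than the 128 slots of an 8KB buffer of 64-byte frames can hold, so the oldest frames would be overwritten; a longer T_snap trades time resolution for history depth.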
[0024] During the initialization phase of the aging test, the main control MCU configures each aging task module to enter the test state according to the preparation command, and simultaneously creates an independent task thread for the snapshot scheduler and sets the acquisition cycle. Specifically: After receiving and parsing the preparation command, the main control MCU 4 configures each aging task module 3 to enter the corresponding test state based on the single aging duration <duration> and the number of cycles <times> carried by the preparation command, and assigns an independent task control handle to each aging task module 3; Meanwhile, the main control MCU 4 creates an independent task thread for the snapshot scheduler in its internal RAM and sets the snapshot acquisition period T_snap of the snapshot scheduler, preferably to 300 seconds. The task priority of the snapshot scheduler is set higher than the priority of the aging test tasks and lower than the priority of the interrupt service routines, so that status acquisition completes without affecting critical real-time operations. After the snapshot scheduler task is started, it enters a waiting state and is ready to respond to subsequent periodic trigger events.
[0025] After the aging test is started, the main control MCU periodically triggers snapshot events according to the snapshot acquisition cycle (i.e., the snapshot scheduler triggers a snapshot event once every snapshot acquisition cycle T_snap). After each snapshot acquisition cycle ends, the snapshot scheduler broadcasts the status acquisition request signal to each of the aging task modules, so that each of the aging task modules suspends its current non-critical loop operation within a preset time window and reads predefined status parameters from the status register or global variables. For example, when a snapshot event is triggered, the main control MCU broadcasts a status acquisition request signal to all aging task modules. Each aging task module suspends its current non-critical loop operation (such as LED color gradient, audio playback buffer filling, etc.) within a preset time window (such as a time window of 10 milliseconds or any other value) after receiving the status acquisition request signal, and reads predefined critical status registers or variables, including but not limited to: the current position encoding value of the motor, the current RGB channel duty cycle of the LED, the real-time voltage and remaining capacity of the battery, the audio playback frame index, the 4G module signal strength, and the IR-cut switching count counter.
[0026] It should be noted that the collection of state parameters during the aging test can be extended to modules that are not involved in the aging process but affect the system stability, such as Wi-Fi RF power and ambient light sensor readings, to ensure that the performance of the device under test is comprehensively considered.
[0027] The main control MCU packages all the read status parameters into a snapshot frame in binary form and writes it into a circular buffer in non-volatile memory.
[0028] Furthermore, as shown in Figure 2, during the aging test, the main control MCU 4 starts timing after receiving the aging test start signal 10, and periodically triggers snapshot events according to the preset snapshot acquisition cycle T_snap 11. Each time a snapshot acquisition cycle T_snap ends, the snapshot scheduler immediately broadcasts a status acquisition request signal 12 to all aging task modules 3 via the internal bus. Within a preset time window of 10 milliseconds after receiving the signal, each aging task module 3 suspends its current non-critical loop operation (such as the PWM duty cycle update during an LED color gradient, data filling of the audio playback buffer, writing of camera recording frames, etc.) and reads predefined critical status parameters from its status registers or global variables. The status parameters read include, but are not limited to: the current position encoding value read by the IR-cut motor drive unit, the current PWM duty cycle values of the three RGB channels obtained by the LED drive circuit, the real-time voltage, remaining capacity, and temperature acquired by the battery management unit, the current signal strength (RSRP) and registration status reported by the mobile network communication module, the current playback frame index returned by the audio power amplifier chip, and the current gain setting and sampling rate recorded by the microphone acquisition circuit. These status parameters are packaged together and transmitted back to the main control MCU 4 via the internal data bus.
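The periodic trigger and broadcast can be modeled on a desktop as follows. This is only a simulation sketch: the per-module reader callables, their names, and the returned values are hypothetical stand-ins for the status registers and global variables read on the real device, and no RTOS priorities or 10 ms window are modeled.

```python
def snapshot_trigger_times(total_s, t_snap_s=300):
    """Seconds since aging start at which each periodic snapshot fires."""
    return list(range(t_snap_s, total_s + 1, t_snap_s))


def collect_snapshot(timestamp_s, readers):
    """Model one broadcast: ask every module reader for its current state."""
    return {
        "timestamp_s": timestamp_s,
        "modules": {name: read() for name, read in readers.items()},
    }


# Hypothetical module readers returning fixed example values.
readers = {
    "ir_cut_motor": lambda: {"position": 1234},
    "battery": lambda: {"voltage_mv": 3850, "capacity_pct": 76},
}

snapshots = [collect_snapshot(t, readers) for t in snapshot_trigger_times(900)]
```

On the device the same loop would be a scheduler task that blocks on a timer and then signals the module tasks, rather than calling them directly.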
[0029] The main control MCU 4 organizes the received status parameters of all aging task modules into a fixed-length binary frame structure according to a preset field order. For example, as shown in Figure 3, the total length of the snapshot frame is fixed at 64 bytes, including a frame header 20, multiple status fields (21 to 25), and a CRC checksum 26 appended at the frame tail. The frame header 20 consists of a 4-byte timestamp and a 2-byte frame sequence number. The timestamp indicates the cumulative number of seconds since the start of aging. The status fields are as follows: the motor status field 21 occupies 4 bytes and stores the position code value and direction flag; the LED status field 22 occupies 6 bytes and records the duty cycle of the R, G, and B channels (2 bytes for each channel); the battery status field 23 occupies 8 bytes and stores the voltage (2 bytes), remaining capacity (2 bytes), temperature (2 bytes), and charge / discharge status flag (2 bytes); the communication module status field 24 occupies 6 bytes and includes the mobile network signal strength (2 bytes), registration failure count (2 bytes), and current connection status (2 bytes); the other module status fields 25 are dynamically allocated according to the number of aging task modules actually participating in the aging process, with a maximum of 32 bytes; the last 2 bytes at the frame tail are a 16-bit CRC checksum 26, used to verify the integrity of the snapshot frame. These fields add up to 6 + 4 + 6 + 8 + 6 + 32 + 2 = 64 bytes.
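The 64-byte layout can be sketched with Python's struct module. Several details here are assumptions, because the text does not specify them: little-endian byte order, an even 2 + 2 byte split of the motor field, signed 16-bit values for temperature and signal strength, zero padding of the unused part of the 32-byte area, and CRC-16/CCITT (via binascii.crc_hqx with initial value 0xFFFF) as the 16-bit CRC.

```python
import binascii
import struct

# header(4+2) | motor(2+2) | LED(3x2) | battery(4x2) | comm(3x2) | other(32)
BODY_FMT = "<IH" "HH" "HHH" "HHhH" "hHH" "32s"
FRAME_SIZE = 64
assert struct.calcsize(BODY_FMT) == FRAME_SIZE - 2


def crc16(data):
    """CRC-16/CCITT; the CRC variant is an assumption, not from the text."""
    return binascii.crc_hqx(data, 0xFFFF)


def pack_frame(timestamp_s, seq, motor, led, battery, comm, other=b""):
    """Build one 64-byte snapshot frame with the CRC in the last 2 bytes."""
    if len(other) > 32:
        raise ValueError("other-module area is limited to 32 bytes")
    body = struct.pack(BODY_FMT, timestamp_s, seq,
                       *motor, *led, *battery, *comm, other)
    return body + struct.pack("<H", crc16(body))


def unpack_frame(frame):
    """Verify the CRC and split a frame into its fields."""
    if len(frame) != FRAME_SIZE:
        raise ValueError("frame must be 64 bytes")
    body = frame[:-2]
    (stored_crc,) = struct.unpack("<H", frame[-2:])
    if crc16(body) != stored_crc:
        raise ValueError("CRC mismatch")
    v = struct.unpack(BODY_FMT, body)
    return {
        "timestamp_s": v[0], "seq": v[1],
        "motor": v[2:4], "led": v[4:7],
        "battery": v[7:11], "comm": v[11:14],
        "other": v[14],
    }


frame = pack_frame(timestamp_s=300, seq=1, motor=(1234, 1),
                   led=(100, 200, 300), battery=(3850, 76, 315, 1),
                   comm=(-95, 0, 1))
fields = unpack_frame(frame)
```

A fixed layout like this lets the host decode any frame without a schema exchange, which is the property the description relies on for automated parsing.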
[0030] After generating a complete snapshot frame, the main control MCU4 writes the snapshot frame into a pre-divided circular buffer in the non-volatile memory 5. For example, the snapshot frame is written into a pre-divided circular buffer in the built-in Flash memory of the device under test. The size of the buffer is fixed at 8KB and is managed by a first-in-first-out (FIFO) strategy. When the write pointer reaches the end address of the circular buffer, it automatically wraps back to the start address to overwrite the earliest written snapshot frame.
[0031] As shown in Figure 4, the circular buffer starts at the starting address 30 and ends at the ending address 35, with a total size of 8KB. It can hold 128 snapshot frames of 64 bytes each. The snapshot frames written to the circular buffer include normal snapshot frames and emergency snapshot frames. There are two pointers in the circular buffer, namely the write pointer 31 and the read pointer 32. In the initial state, both the write pointer 31 and the read pointer 32 point to the starting address 30. Each time a new normal snapshot frame 33 is written, the main control MCU 4 writes the new normal snapshot frame to the position indicated by the write pointer 31 and moves the write pointer 31 forward by 64 bytes. When the write pointer 31 reaches the ending address 35 of the circular buffer, it automatically wraps back to the starting address 30 to continue writing, overwriting the earliest written normal snapshot frame 33, thereby realizing the FIFO management strategy.
[0032] Furthermore, if any target aging task module detects an anomaly during the aging test (such as a motor drive current continuously exceeding a threshold for 500ms, a battery temperature exceeding 65°C, or a mobile network module failing to register 10 times consecutively, etc.), the target aging task module immediately sends a high-level signal to the main control MCU 4 through a preset hardware interrupt line. Upon receiving the high-level signal, the main control MCU 4 immediately sets a global fault flag in its interrupt service routine, stops all aging task threads, forces a status parameter acquisition, generates an emergency snapshot frame 34, and writes it to the next available position of the write pointer 31. This emergency snapshot frame does not participate in the first-in-first-out (FIFO) overwrite logic of the circular buffer: even if the circular buffer is full, the emergency snapshot frame 34 is still appended. If necessary, the emergency snapshot frame 34 can be retained by expanding the reserved area at the end of the circular buffer. This ensures that the emergency snapshot frame, formed from the instantaneous state parameters of the critical fault, is not overwritten by subsequent normal snapshot frames and is persistently retained. The 0th byte of the frame header of the emergency snapshot frame is set to 0xFF to identify the emergency type. Here, normal snapshot frames are written into the FIFO circular buffer in the non-volatile memory, while the emergency snapshot triggered by a fault is independently stored and protected in the non-volatile memory.
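The buffer management described above, with wrapping writes for normal frames and protected storage for emergency frames, can be modeled in memory as follows. The class name, the size of the reserved emergency area, and the choice to keep emergency frames in a separate list are assumptions; the sketch also assumes the caller has already set byte 0 to 0xFF before computing the frame CRC, since the text does not state the order of those two steps.

```python
FRAME_SIZE = 64
RING_SIZE = 8 * 1024
RING_SLOTS = RING_SIZE // FRAME_SIZE   # 128 frames
EMERGENCY_SLOTS = 4                    # hypothetical reserved-area size


class SnapshotStore:
    """In-memory model of the 8KB ring plus a reserved emergency area."""

    def __init__(self):
        self.ring = bytearray(RING_SIZE)
        self.write_off = 0   # write pointer, as an offset from the start
        self.read_off = 0    # read pointer: oldest valid normal frame
        self.count = 0
        self.emergency = []  # never overwritten by normal frames

    def write_normal(self, frame):
        if len(frame) != FRAME_SIZE:
            raise ValueError("frame must be 64 bytes")
        if self.count == RING_SLOTS:
            # Full: the slot under the write pointer holds the oldest frame,
            # so the read pointer moves on as that frame is overwritten.
            self.read_off = (self.read_off + FRAME_SIZE) % RING_SIZE
        else:
            self.count += 1
        self.ring[self.write_off:self.write_off + FRAME_SIZE] = frame
        self.write_off = (self.write_off + FRAME_SIZE) % RING_SIZE

    def write_emergency(self, frame):
        if len(frame) != FRAME_SIZE or frame[0] != 0xFF:
            raise ValueError("emergency frame must be 64 bytes, byte 0 = 0xFF")
        if len(self.emergency) >= EMERGENCY_SLOTS:
            raise OverflowError("reserved emergency area is full")
        self.emergency.append(bytes(frame))

    def frames_oldest_first(self):
        out, off = [], self.read_off
        for _ in range(self.count):
            out.append(bytes(self.ring[off:off + FRAME_SIZE]))
            off = (off + FRAME_SIZE) % RING_SIZE
        return out


store = SnapshotStore()
for i in range(130):                       # two more than the ring can hold
    store.write_normal(i.to_bytes(2, "little") + bytes(62))
store.write_emergency(b"\xff" + bytes(63))
ordered = store.frames_oldest_first()
```

After 130 writes the two oldest frames have been overwritten, while the emergency frame stays untouched regardless of how many normal frames follow.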
[0033] After the aging test ends, whether it completed normally or was interrupted by a fault, the device under test enters a low-power standby state. The host computer 1 then sends a query command, such as AT+FACT=BI,R, to the device under test, so that the main control MCU 4 responds and returns a response packet 41. The response packet 41 includes the final result indicating whether the aging test passed or failed (e.g., 0x00 indicates PASS and 0x01 indicates FAIL), the total number of valid snapshots N (including normal snapshot frames 33 and emergency snapshot frames 34, where N is a positive integer greater than or equal to 1), and a 32-bit checksum of the entire circular buffer contents, as shown in Figure 5. Subsequently, in response to the received response packet, the host computer cyclically sends page read instructions 42 to the device under test, such as AT+FACT=BI_SNAP,R,<index>, where <index> ranges from 0 to N-1. Each page read instruction 42 carries an index value, so that the main control MCU can locate the corresponding snapshot frame in the circular buffer of the non-volatile memory 5 according to that index value and return the snapshot frames one by one through the serial port (snapshot frame return 43). In this way, page reading and verification of the snapshot sequence are implemented with standard serial instructions to ensure complete data transmission.
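The host-side readback sequence can be sketched as below. The description does not give the byte layout of the response packet 41 or name the 32-bit checksum algorithm, so the 7-byte layout (1-byte result, 2-byte count, 4-byte checksum, big-endian) and the use of CRC-32 are assumptions for illustration only; the function names are likewise hypothetical.

```python
import struct
import zlib

PASS_CODE = 0x00
FAIL_CODE = 0x01


def parse_response_packet(packet: bytes) -> dict:
    """Decode an assumed 7-byte layout: result (1), snapshot count (2), checksum (4)."""
    result, total, checksum = struct.unpack(">BHI", packet)
    return {
        "passed": result == PASS_CODE,
        "total_snapshots": total,
        "buffer_checksum": checksum,
    }


def page_read_commands(total_snapshots: int) -> list:
    """Build one page read instruction per snapshot index, from 0 to N-1."""
    return [f"AT+FACT=BI_SNAP,R,{i}" for i in range(total_snapshots)]


# Simulated device reply: FAIL, 19 valid snapshots, CRC-32 of an all-zero 8KB buffer.
simulated = struct.pack(">BHI", FAIL_CODE, 19, zlib.crc32(bytes(8192)))
info = parse_response_packet(simulated)
commands = page_read_commands(info["total_snapshots"])
print(info["passed"], commands[0], commands[-1])
```

In a real station the host would send each command over the serial link and collect one 64-byte frame per reply; that transport layer is omitted here.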
[0034] After receiving all the snapshot frames, the host computer 1 first verifies the CRC checksum of each snapshot frame and discards corrupted frames. It then reconstructs the complete time series according to the timestamps in the frame header 20 of the snapshot frames, extracts the original values of the status fields (21-25) of each aging task module, and plots the time-state curve. The host computer loads the built-in rule engine 44, which includes at least two fault discrimination rules, to match the time-state curve against the corresponding target fault events. For example, for the IR-cut motor status field 21, if the change in the position code value across three consecutive snapshots is less than 1 code unit, the event is determined to be "motor stall". For the battery status field 23, if the voltage drop between two adjacent snapshots exceeds 0.5V and the temperature simultaneously rises by more than 5°C, it is marked as "battery abnormal discharge". For the mobile network communication module status field 24, if the number of registration failures increases by more than 5 within a single snapshot period, a "network registration abnormality" alarm is triggered. The rule engine 44 associates each successfully matched target fault event with the corresponding snapshot index and finally generates a structured fault report 45.
The structured fault report includes, but is not limited to, the fault module name, the first abnormal timestamp, the abnormal feature description, the associated snapshot index, and the original values of the status parameters involved in the corresponding target fault events. In this way, the host computer uses the rule engine to perform automated fault mode matching and report generation based on the snapshot time series. The structured status data and fault reports also support cross-batch trend analysis, which can identify slow drift of performance parameters and provide data support for product design improvement.
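The three example rules can be expressed as small functions over the reconstructed snapshot sequence. The sketch below is one possible reading, not the rule engine 44 itself: the dictionary keys, the sample values, and the choice to report the timestamp of the snapshot at which a rule first matches are all assumptions made for illustration. Voltage is kept in millivolts to avoid floating-point comparisons.

```python
def motor_stall(snaps):
    """Position code changes by less than 1 unit across three consecutive snapshots."""
    for i in range(2, len(snaps)):
        window = [s["motor_pos"] for s in snaps[i - 2:i + 1]]
        if max(window) - min(window) < 1:
            return i, "motor stall"
    return None


def battery_discharge(snaps):
    """Voltage drops by more than 0.5V while temperature rises by more than 5 degrees C."""
    for i in range(1, len(snaps)):
        drop_mv = snaps[i - 1]["batt_mv"] - snaps[i]["batt_mv"]
        rise_c = snaps[i]["batt_temp_c"] - snaps[i - 1]["batt_temp_c"]
        if drop_mv > 500 and rise_c > 5:
            return i, "battery abnormal discharge"
    return None


def network_registration(snaps):
    """Registration failures grow by more than 5 within one snapshot period."""
    for i in range(1, len(snaps)):
        if snaps[i]["reg_failures"] - snaps[i - 1]["reg_failures"] > 5:
            return i, "network registration abnormality"
    return None


RULES = [
    ("IR-cut motor", motor_stall),
    ("battery", battery_discharge),
    ("mobile network communication module", network_registration),
]


def build_report(snaps):
    """Run every rule and collect one structured entry per matched fault event."""
    report = []
    for module, rule in RULES:
        hit = rule(snaps)
        if hit is not None:
            idx, description = hit
            report.append({
                "module": module,
                "first_abnormal_timestamp_s": snaps[idx]["t"],
                "description": description,
                "snapshot_index": snaps[idx]["index"],
                "raw_values": dict(snaps[idx]),
            })
    return report


# Illustrative values only, not measured data.
snaps = [
    {"index": 0, "t": 0, "motor_pos": 0x1A00, "batt_mv": 3900, "batt_temp_c": 30, "reg_failures": 0},
    {"index": 1, "t": 300, "motor_pos": 0x1A3F, "batt_mv": 3880, "batt_temp_c": 31, "reg_failures": 0},
    {"index": 2, "t": 600, "motor_pos": 0x1A3F, "batt_mv": 3850, "batt_temp_c": 31, "reg_failures": 1},
    {"index": 3, "t": 900, "motor_pos": 0x1A3F, "batt_mv": 3280, "batt_temp_c": 38, "reg_failures": 1},
]
report = build_report(snaps)
for entry in report:
    print(entry["module"], entry["description"], entry["snapshot_index"])
```

Keeping each rule as an independent function makes it straightforward to add or remove rules per product, which matches the requirement that the engine carry at least two discrimination rules.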
[0035] Following the above embodiments of this application, during the entire implementation of the aging test, the communication between the main control MCU4 and each aging task module adopts a synchronous interrupt mechanism to ensure that all aging task modules can complete parameter reporting 13 within a specified time after the status acquisition request 12 is issued; the write operation of the non-volatile memory 5 is accelerated by the DMA controller, which can reduce CPU usage; the packaging of snapshot frames and the calculation of CRC are completed by the hardware acceleration unit to improve processing efficiency; the serial communication baud rate between the host computer 1 and the device under test 2 can be set to 115200bps to ensure the transmission stability of the snapshot frame return 43. In addition, to adapt to different product forms, the snapshot acquisition period T_snap can be dynamically adjusted in the range of 1 minute to 1 hour through AT commands; the status parameter fields can be added or deleted according to the specific hardware configuration, for example, the motor status field 21 can be omitted in devices without IR-cut motors; the circular buffer can also be deployed on an external SPI Flash chip and connected to the main control MCU4 through the QSPI interface, which is suitable for embedded platforms with limited internal storage resources. The snapshot frame format can also support snapshot frames using TLV (Type-Length-Value) encoding to enhance the flexibility of field expansion; the host computer reading method can also be replaced by USB HID bulk transmission or Bluetooth GATT service to meet the aging test requirements of wearable devices without serial port output.
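The TLV variant mentioned above can be illustrated with a short encoder and decoder. The description does not define type codes or field widths for the TLV form, so the 1-byte type, 1-byte length layout and the type values 0x01 (motor) and 0x03 (battery) below are hypothetical.

```python
TYPE_MOTOR = 0x01    # hypothetical type code
TYPE_BATTERY = 0x03  # hypothetical type code


def tlv_encode(fields):
    """Encode (type, value) pairs as 1-byte type, 1-byte length, then the value bytes."""
    out = bytearray()
    for ftype, value in fields:
        if not 0 <= ftype <= 0xFF or len(value) > 0xFF:
            raise ValueError("type and length must each fit in one byte")
        out += bytes([ftype, len(value)]) + value
    return bytes(out)


def tlv_decode(data):
    """Decode a TLV byte string back into a list of (type, value) pairs."""
    fields = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        ftype, length = data[pos], data[pos + 1]
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        fields.append((ftype, bytes(value)))
        pos += 2 + length
    return fields


with_motor = [
    (TYPE_MOTOR, bytes.fromhex("1a3f0001")),
    (TYPE_BATTERY, bytes.fromhex("0ef2005500200001")),
]
without_motor = [field for field in with_motor if field[0] != TYPE_MOTOR]

encoded_full = tlv_encode(with_motor)
encoded_small = tlv_encode(without_motor)
print(len(encoded_full), len(encoded_small))
```

A device without an IR-cut motor simply omits that field, and the decoder needs no change, which is the flexibility the paragraph attributes to TLV encoding.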
[0036] To better illustrate the fault tracing method for aging tests provided in one aspect of this application, in a practical application scenario, such as an aging test station on a smart home camera production line, the device under test is preferably a dual-mode camera integrating IR-cut filter switching, 4G networking, local storage recording, and battery power. The host computer serves as the production line test host, communicating with the main control MCU in the device under test via a UART interface. The aging task module includes an IR-cut motor drive unit, LED status indicators, a 4G communication module, an image sensor, an audio amplifier, a microphone acquisition circuit, and a battery management unit. The non-volatile memory is a built-in 8MB SPI Flash chip, where the address range from 0x000000 to 0x001FFF is divided into an 8KB circular buffer for snapshot storage. The fault tracing method for aging tests of the dual-mode camera in this practical application scenario includes the following specific steps: Step 1: During the aging test initiation and snapshot scheduler initialization, the host computer 1 sends a test trigger preparation command AT+FACT=BI,W,7200,1 to the device under test 2, indicating the execution of an aging test lasting 7200 seconds (2 hours). Upon receiving this test trigger preparation command, the main control MCU 4 parses the parameters and sequentially activates each aging task module 3: the image sensor continuously writes to the TF card at 1080P@30fps, the IR-cut motor completes a "day / night" mode switch every 30 seconds, the LED blinks in a red-green-blue cycle (5-second interval), the 4G communication module disconnects and reconnects every 10 minutes, the speaker loops a 1kHz sine wave, the microphone samples for 10 seconds per minute, and the battery management unit performs a complete charge-discharge cycle. 
Simultaneously, the main control MCU 4 creates an independent task thread for the snapshot scheduler in its internal RAM, setting its snapshot acquisition period T_snap to 300 seconds. This thread has a higher priority than the aging tasks of each aging task module but lower than the hardware interrupt service routine, ensuring that status acquisition does not interfere with critical real-time operations.
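Step 1 can be made concrete with a small parser for the preparation command. The first field is stated to be the duration in seconds; reading the second field as the number of cycles is an inference from claim 4, which says the instruction carries a single aging duration and a number of cycles. The function names are hypothetical.

```python
T_SNAP_MIN_S = 60     # lower bound of the adjustable snapshot period (1 minute)
T_SNAP_MAX_S = 3600   # upper bound of the adjustable snapshot period (1 hour)


def parse_prepare_command(cmd: str):
    """Split 'AT+FACT=BI,W,<duration>,<cycles>' into two integers."""
    prefix = "AT+FACT=BI,W,"
    if not cmd.startswith(prefix):
        raise ValueError("not a preparation command")
    fields = cmd[len(prefix):].split(",")
    if len(fields) != 2:
        raise ValueError("expected <duration>,<cycles>")
    return int(fields[0]), int(fields[1])


def expected_periodic_snapshots(duration_s: int, t_snap_s: int) -> int:
    """Number of complete snapshot periods that fit into one aging run."""
    if not T_SNAP_MIN_S <= t_snap_s <= T_SNAP_MAX_S:
        raise ValueError("T_snap must be between 1 minute and 1 hour")
    return duration_s // t_snap_s


duration_s, cycles = parse_prepare_command("AT+FACT=BI,W,7200,1")
periods = expected_periodic_snapshots(duration_s, 300)
print(duration_s, cycles, periods)  # 7200 1 24
```

With T_snap set to 300 seconds, a 7200-second run spans 24 snapshot periods, well under the 128 slots of the 8KB buffer, so no wrap-around occurs in this scenario.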
[0037] Step 2, periodic snapshot collection: As shown in Figure 2, the main control MCU starts timing after receiving the aging test start signal 10. When the first T_snap cycle (300 seconds) ends, the snapshot scheduler broadcasts the status acquisition request 12. The IR-cut motor drive unit pauses the current switching action within 10 milliseconds and reads the current position value 0x1A3F and the movement direction flag "forward" from the position encoder register; the LED drive circuit latches the current RGB three-channel PWM duty cycles as 0x03E8 (R), 0x0000 (G), and 0x0000 (B); the battery management unit obtains the voltage 3.82V (0x0EF2), remaining capacity 85% (0x0055), temperature 32℃ (0x0020), and charging status "constant current charging" through ADC sampling; the 4G module reports RSRP as -95dBm (0xFFA1), registration failure count as 0, and connection status as "LTE registered". These status parameters are gathered by the main control MCU via the internal bus and packed into a 64-byte snapshot frame in the format shown in Figure 3: the frame header 20 contains a timestamp of 0x0000012C (300 seconds) and a frame sequence number of 0x0001; the motor status field 21 is filled with 0x1A3F0001; the LED status field 22 is filled with 0x03E800000000; the battery status field 23 is filled with 0x0EF2005500200001; the communication module status field 24 is filled with 0xFFA100000002; the other module status field 25 records the audio playback frame index 0x001E, microphone gain 0x0A, etc.; finally, a 16-bit checksum 26 is calculated and appended by the hardware CRC unit.
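The frame layout in Step 2 can be reproduced with fixed-width packing. The sketch below assumes big-endian byte order (suggested by the way the field values are written out) and uses CRC-16/CCITT with initial value 0xFFFF via `binascii.crc_hqx`; the description does not name the CRC polynomial, so that choice, the function name `pack_snapshot`, and the layout of the "other" field are assumptions.

```python
import binascii
import struct

# header (4+2) + motor (2+2) + LED (3x2) + battery (4x2) + comm (3x2) + other (32) = 62 bytes
BODY_FORMAT = ">IH" + "HH" + "HHH" + "HHHH" + "hHH" + "32s"


def pack_snapshot(timestamp_s, seq, motor, led, battery, comm, other=b""):
    """Pack one 64-byte frame: 62-byte body followed by a 16-bit CRC."""
    body = struct.pack(BODY_FORMAT, timestamp_s, seq, *motor, *led, *battery, *comm, other)
    crc = binascii.crc_hqx(body, 0xFFFF)  # assumed CRC variant
    return body + struct.pack(">H", crc)


frame = pack_snapshot(
    timestamp_s=300,
    seq=0x0001,
    motor=(0x1A3F, 0x0001),                   # position code, direction flag
    led=(0x03E8, 0x0000, 0x0000),             # R, G, B duty cycles
    battery=(0x0EF2, 0x0055, 0x0020, 0x0001), # voltage, capacity, temperature, charge state
    comm=(-95, 0, 0x0002),                    # RSRP in dBm, failure count, connection state
    other=struct.pack(">HB", 0x001E, 0x0A),   # audio frame index, microphone gain
)
print(len(frame), frame[24:30].hex())  # 64 ffa100000002
```

Packing -95 as a signed 16-bit integer yields 0xFFA1, which matches the RSRP encoding quoted in the step.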
[0038] Step 3, snapshot frame writing and emergency event handling: The main control MCU writes the snapshot frame to the circular buffer of non-volatile memory 5 via the DMA controller. As shown in Figure 4, the write pointer 31 is initially located at the starting address 30 (0x000000). After the write is completed, the pointer moves forward 64 bytes to 0x000040. At the 5400th second (i.e., the 18th snapshot cycle), the IR-cut motor experiences mechanical jamming, causing the drive current to continuously exceed 2.5A for 520ms. Upon detecting this anomaly, its drive unit immediately pulls up the dedicated interrupt line and sends a hardware interrupt to the main control MCU. The main control MCU 4 enters the interrupt service routine, sets the global fault flag, stops all aging task threads, and forcibly triggers a status acquisition. At this time, the 0th byte of the header of the generated emergency snapshot frame is set to 0xFF, while the remaining fields are still filled according to the standard format, forming an emergency snapshot frame 34. This emergency snapshot frame is written to the address currently pointed to by the write pointer 31 (0x000480) and does not participate in the subsequent FIFO overwrite logic. Even if subsequent normal snapshot frames 33 continue to be written until the buffer is full, this emergency snapshot frame 34 remains at its original address, ensuring that the state at the moment of the fault is completely preserved.
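The anomaly condition that raises the interrupt (drive current above a threshold continuously for at least 500ms) amounts to a simple hold-time check. The sketch below assumes the drive unit samples the current every 10ms and treats 2.5A as the threshold; both values, and the function name, are illustrative assumptions rather than details given in the description.

```python
def first_sustained_exceedance(samples_a, threshold_a=2.5, hold_ms=500, period_ms=10):
    """Return the sample index at which the current has stayed above the
    threshold for hold_ms without interruption, or None if it never does."""
    run_ms = 0
    for i, current in enumerate(samples_a):
        if current > threshold_a:
            run_ms += period_ms
            if run_ms >= hold_ms:
                return i
        else:
            run_ms = 0  # any sample at or below the threshold resets the timer
    return None


# 200ms normal, a 300ms spike (too short), one normal sample, then 520ms of jamming.
trace = [1.0] * 20 + [3.0] * 30 + [1.0] + [3.0] * 52
trigger_index = first_sustained_exceedance(trace)
print(trigger_index)  # 100
```

The 300ms spike does not trigger the interrupt, while the 520ms overcurrent does once its 50th consecutive sample is reached.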
[0039] Step 4, data readback and analysis after the test: The device under test 2 terminates the aging test prematurely because of the detected motor abnormality and enters a low-power standby state. The host computer 1 sends an AT+FACT=BI,R query command, and the main control MCU 4 returns a response packet 41 containing FAIL (0x01), the total number of valid snapshots N=19 (including 18 normal snapshot frames 33 and 1 emergency snapshot frame 34), and a 32-bit checksum of the 8KB circular buffer. Subsequently, the host computer 1 sequentially sends 19 page read commands, one for each snapshot frame, from AT+FACT=BI_SNAP,R,0 to AT+FACT=BI_SNAP,R,18. The main control MCU 4 locates the corresponding address in the non-volatile memory 5 based on the index value (for example, index 18 corresponds to the emergency snapshot frame 34 at address 0x000480) and returns the snapshot frames one by one (snapshot frame return 43) via UART at a baud rate of 115200bps. After receiving all the snapshot frames, the host computer software first checks the CRC checksum 26 in each snapshot frame to confirm that there are no transmission errors; then it reconstructs the complete state sequence from 0 seconds to 5400 seconds based on the timestamp in the frame header 20.
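The host-side check and reordering in Step 4 can be sketched as follows. Several points are assumptions: the CRC variant (CRC-16/CCITT, initial value 0xFFFF), big-endian fields, and in particular how the 0xFF emergency marker in byte 0 coexists with the 4-byte timestamp, which the description does not specify. Here the marker is assumed to replace the most significant timestamp byte, so the lower three bytes still carry the time. The helper `make_frame` exists only to generate test data.

```python
import binascii
import struct


def make_frame(timestamp_s, seq, emergency=False):
    """Build a 64-byte test frame with an empty payload (illustration only)."""
    body = bytearray(struct.pack(">IH", timestamp_s, seq) + bytes(56))
    if emergency:
        body[0] = 0xFF  # assumed to overwrite the top timestamp byte
    crc = binascii.crc_hqx(bytes(body), 0xFFFF)
    return bytes(body) + struct.pack(">H", crc)


def crc_ok(frame):
    return (len(frame) == 64 and
            binascii.crc_hqx(frame[:-2], 0xFFFF) == int.from_bytes(frame[-2:], "big"))


def decode_header(frame):
    raw_ts, seq = struct.unpack(">IH", frame[:6])
    emergency = frame[0] == 0xFF
    timestamp_s = raw_ts & 0x00FFFFFF if emergency else raw_ts
    return timestamp_s, seq, emergency


def rebuild_timeline(frames):
    """Drop frames that fail the CRC check, then order the rest by time."""
    headers = [decode_header(f) for f in frames if crc_ok(f)]
    return sorted(headers, key=lambda h: (h[0], h[1]))


frames = [make_frame(300 * i, i) for i in range(18)]   # t = 0 .. 5100 s
frames.append(make_frame(5400, 18, emergency=True))    # emergency frame
damaged = bytearray(frames[5])
damaged[10] ^= 0x01                                    # simulate a transmission error
frames[5] = bytes(damaged)

timeline = rebuild_timeline(list(reversed(frames)))
print(len(timeline), timeline[0], timeline[-1])
```

The damaged frame is discarded, and the remaining 18 frames come back in time order with the emergency frame last.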
[0040] Step 5, Fault Rule Matching and Report Generation Stage: The host computer 1 loads the rule engine 44 and performs sliding window analysis on the IR-cut motor status field 21. It finds that the position code value in the three consecutive snapshots of index 16 (t=4800s), 17 (t=5100s), and 18 (t=5400s) is 0x1A3F, with a change of 0, which meets the criterion of "position change of less than 1 unit in three consecutive snapshots". Therefore, it is determined to be "motor stall". At the same time, the battery status field 23 shows that the voltage dropped from 3.85V to 3.28V (ΔV=0.57V>0.5V) and the temperature rose from 31℃ to 38℃ (ΔT=7℃>5℃) during the period from t=5100s to t=5400s, triggering the second-level alarm of "abnormal battery discharge". The rule engine 44 associates the aforementioned fault events with the snapshot index 18 and finally outputs a structured fault report 45, which clearly states that "the IR-cut motor stalled at 5400 seconds, accompanied by abnormal battery discharge," and attaches the original values of the status parameters so that engineers can directly locate the problem as poor mechanical assembly or failure of the motor drive circuit without disassembling the machine to reproduce it.
[0041] In the embodiments of this application, by embedding a snapshot scheduler and a structured data acquisition mechanism in the main control MCU of the device under test, the granularity of fault location is refined from the whole machine level to specific functional modules and precise time windows. This allows engineers to directly reproduce the fault context based on the snapshot sequence without disassembling the device or modifying the firmware. The snapshot frames adopt a fixed-format binary structure and are written to non-volatile Flash memory. Even if a sudden power failure occurs during the test, the most recently written snapshot data can still be completely retained, ensuring the reliability of fault data. The circular buffer design maximizes the retention of the most recent state history within the limited storage space, taking into account both data integrity and resource constraints. The independent storage strategy for emergency snapshots ensures that the instantaneous state of critical faults is not overwritten. The host computer reads the snapshot sequence in pages through standardized AT commands, which is compatible with the existing production line communication architecture and does not require additional hardware probes or external monitoring equipment, making it suitable for large-scale mass production environments. The structured state data supports cross-batch trend analysis and can identify slow drifts in performance parameters, providing data support for product design improvements.
[0042] According to another aspect of this application, a non-volatile storage medium is also provided, on which computer-readable instructions are stored, which, when executed by a processor, cause the processor to implement the fault tracing method for aging tests as described above.
[0043] According to another aspect of this application, a fault tracing device for aging tests is also provided, wherein the device includes: One or more processors; Computer-readable medium for storing one or more computer-readable instructions. When the one or more computer-readable instructions are executed by the one or more processors, the one or more processors implement the fault tracing method of aging test as described above.
[0044] For details of the various embodiments of the fault tracing device for aging tests, please refer to the corresponding parts of the above embodiments of the fault tracing method for aging tests, which will not be repeated here.
[0045] In summary, in this application the host computer sends a test preparation command to the device under test (DUT), so that the main control MCU configures each aging task module to enter the test state according to the preparation command, creates an independent task thread for a snapshot scheduler, and sets the snapshot acquisition cycle. After the aging test starts, the main control MCU periodically triggers snapshot events according to the snapshot acquisition cycle. After each snapshot acquisition cycle ends, the snapshot scheduler broadcasts a status acquisition request signal to each aging task module, causing each aging task module to pause its current non-critical loop operation within a preset time window and read predefined status parameters from the status register or global variables. The main control MCU packages all read status parameters into a binary snapshot frame and writes it to a circular buffer in non-volatile memory. After the aging test ends, the DUT enters a low-power standby state. The host computer sends a query command to the DUT, causing the main control MCU to respond and return a response packet. The response packet includes a final result indicating whether the aging test passed or failed, the total number of valid snapshots, and a 32-bit checksum of the entire circular buffer. The host computer, in response to the received response packet, cyclically sends paging read instructions to the DUT, enabling the main control MCU to locate the corresponding snapshot frame in the circular buffer based on the index value carried by the paging read instruction and return the snapshot frames one by one.
After receiving all snapshot frames, the host computer verifies the CRC checksum of each snapshot frame, reconstructs and extracts the status parameters of each aging task module according to the timestamps in the frame headers of the snapshot frames, and plots the time status curve. The host computer loads a built-in rule engine to match the time status curve with the corresponding target fault event, and associates the successfully matched target fault event with the corresponding snapshot index to generate a structured fault report. The built-in rule engine includes at least two fault discrimination rules, and the structured fault report includes the fault module name, the first abnormal timestamp, the abnormal feature description, the associated snapshot index, and the original values of the status parameters, thus enabling the tracing of faults occurring during the aging test of the DUT.
[0046] It should be noted that this application can be implemented in software and / or a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of this application can be executed by a processor to implement the steps or functions described above. Similarly, the software program of this application (including related data structures) can be stored in a computer-readable recording medium, such as RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. Furthermore, some steps or functions of this application can be implemented in hardware, for example, as circuitry that cooperates with a processor to perform the various steps or functions.
[0047] Furthermore, a portion of this application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to this application through the operation of the computer. The program instructions invoking the methods of this application may be stored in a fixed or removable recording medium, and / or transmitted via data streams in broadcast or other signal carrying media, and / or stored in the working memory of a computer device operating according to the program instructions. Here, one embodiment of this application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to operate the methods and / or technical solutions based on the foregoing embodiments of this application.
[0048] It will be apparent to those skilled in the art that this application is not limited to the details of the exemplary embodiments described above, and that this application can be implemented in other specific forms without departing from the spirit or essential characteristics of this application. Therefore, the embodiments should be considered exemplary and non-limiting in all respects, and the scope of this application is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be embraced within this application. No reference numerals in the claims should be construed as limiting the scope of the claims. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the apparatus claims may also be implemented by a single unit or device in software or hardware. The terms "first," "second," etc., are used to indicate names and do not indicate any particular order.
Claims
1. A method of failure traceability for aging tests, wherein, The method comprises: The host computer sends a preparation instruction for triggering the test to the device under test, so that the main control MCU configures each aging task module into a test state according to the preparation instruction, and creates an independent task thread of a snapshot scheduler and sets a snapshot collection period; After the aging test is started, the main control MCU triggers a snapshot event periodically according to the snapshot collection period, and after each snapshot collection period ends, the snapshot scheduler broadcasts a state collection request signal to each aging task module, so that each aging task module suspends a current non-critical loop operation within a preset time window, and reads pre-defined state parameters from a state register or a global variable; The main control MCU packs all the read state parameters into a snapshot frame in a binary form and writes the snapshot frame into a circular buffer in a non-volatile memory; After the aging test ends, the device under test enters a low-power standby state; the host computer sends a query instruction to the device under test, so that the main control MCU responds and returns a response packet, the response packet including a final result indicating whether the aging test passes or not, a total number of valid snapshots, and a 32-bit checksum of the entire circular buffer; The host computer cyclically sends a paging read instruction to the device under test in response to the received response packet, so that the main control MCU locates a corresponding snapshot frame from the circular buffer according to an index value carried by the paging read instruction and returns the snapshot frame by frame; After receiving all the snapshot frames, the host computer verifies a CRC check code of each snapshot frame, reconstructs and extracts state parameters of each aging task module in sequence according to time 
stamps in frame headers of the snapshot frames, and draws a time-state curve; The host computer loads a built-in rule engine, matches a corresponding target fault event for the time-state curve, associates the target fault event that is successfully matched with a corresponding snapshot index, and generates a structured fault report, wherein the built-in rule engine includes at least two fault discrimination rules, and the structured fault report includes a fault module name, a first abnormal time stamp, an abnormal feature description, an associated snapshot index, and original values of state parameters.
2. The method of claim 1, wherein, The method further comprises: The host computer establishes a communication connection with the main control MCU in the device under test through a standard serial interface; The main control MCU is connected to each aging task module through an internal bus or a GPIO pin, and the aging task module includes one or more of a camera module, an LED driving circuit, an IR-cut motor driving unit, a mobile network communication module, an audio power amplifier chip, a microphone acquisition circuit, and a battery management unit; The main control MCU is connected to the non-volatile memory through an SPI or QSPI interface, wherein the non-volatile memory includes a built-in Flash memory of the device under test, or an SPI Flash chip deployed externally and connected to the main control MCU through a QSPI interface.
3. The method of claim 2, wherein, The method further comprises: The baud rate of serial communication between the host computer and the device under test is set to 115200 bps. The communication between the main control MCU and each of the aging task modules adopts a synchronous interrupt mechanism. Write operations to the non-volatile memory are accelerated by a DMA controller.
4. The method of claim 1, wherein, The main control MCU configures each aging task module to enter the test state according to the preparation instruction, and at the same time creates an independent task thread for the snapshot scheduler and sets the acquisition cycle, including: The main control MCU configures each aging task module to enter the corresponding test state according to the single aging duration and number of cycles carried by the preparation instruction, and assigns an independent task control handle to each aging task module. Meanwhile, the main control MCU creates an independent task thread for the snapshot scheduler in its internal RAM and enters a waiting state, and sets the snapshot acquisition cycle of the snapshot scheduler. The task priority of the snapshot scheduler is higher than the priority of the aging test task and lower than the priority of the interrupt service routine.
5. The method of claim 1, wherein, The method further includes: The total length of the snapshot frame is fixed at 64 bytes, including a frame header, multiple status fields, and a frame trailer; wherein, The frame header consists of a 4-byte timestamp and a 2-byte frame sequence number. The timestamp is used to indicate the cumulative number of seconds since the aging process started. The motor status field occupies 4 bytes and is used to store the position code value and direction indicator. The LED status field occupies 6 bytes and records the duty cycle of the R, G, and B channels respectively, with each channel occupying 2 bytes. The battery status field occupies 8 bytes and stores 2 bytes of voltage, 2 bytes of remaining capacity, 2 bytes of temperature, and 2 bytes of charge / discharge status indicator. The communication module status field occupies 6 bytes and contains 2 bytes of mobile network signal strength, 2 bytes of registration failure count, and 2 bytes of current connection status. The space for other module status fields is dynamically allocated according to the actual number of modules participating in the aging process, with a maximum of 32 bytes. The last 2 bytes of the frame are a 16-bit CRC checksum, used to verify the integrity of the snapshot frame.
6. The method of claim 1, wherein, The method further includes: The circular buffer starts at the starting address and ends at the ending address, with a total size of 8KB, and can hold 128 snapshot frames of 64 bytes each. The snapshot frames written to the circular buffer include normal snapshot frames and emergency snapshot frames. The circular buffer contains two pointers: a write pointer and a read pointer. Initially, both the write pointer and the read pointer point to the starting address. Each time a new normal snapshot frame is written, the main control MCU writes the new normal snapshot frame to the position indicated by the write pointer and moves the write pointer forward by 64 bytes. When the write pointer reaches the end address of the circular buffer, it automatically wraps back to the starting address to continue writing, overwriting the earliest written normal snapshot frame. If any target aging task module detects an anomaly during the aging test, the target aging task module immediately sends a high-level signal to the main control MCU through a preset hardware interrupt line. Upon receiving the high-level signal, the main control MCU immediately sets a global fault flag in its interrupt service routine, stops all aging task threads, forces a status acquisition, generates an emergency snapshot frame, and writes it to the next available position of the write pointer or retains it by expanding the tail reserved area of the circular buffer. The emergency snapshot frame does not participate in the first-in-first-out (FIFO) overwrite logic of the circular buffer. The 0th byte of the frame header of the emergency snapshot frame is set to 0xFF to indicate that it is an emergency type.
7. The method of any one of claims 1 to 6, wherein the snapshot acquisition period ranges from 1 minute to 1 hour to adapt to different product forms and is dynamically adjusted via AT commands.
8. The method of any one of claims 1 to 6, wherein the snapshot frame is encoded using TLV (Type-Length-Value) encoding.
9. A non-volatile storage medium having stored computer-readable instructions thereon, which, when executed by a processor, cause the processor to perform the method as described in any one of claims 1 to 8.
10. A failure tracing device for an aging test, wherein the device includes: one or more processors; and a computer-readable medium for storing one or more computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the method as described in any one of claims 1 to 8.