Processor program execution information monitoring method and chip device

By compressing the processor program counter and return address values, and combining them with hardware timestamp-based conditional reporting, the problems of high hardware cost and large data traffic are solved, achieving low-cost, high-precision monitoring of processor program execution information, which is suitable for IoT chips and microcontrollers.

CN122285441A · Pending · Publication Date: 2026-06-26 · SHANGHAI QIMINGXIN SEMICONDUCTOR TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI QIMINGXIN SEMICONDUCTOR TECHNOLOGY CO LTD
Filing Date
2026-05-28
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for monitoring processor program execution information have high hardware costs and large data traffic, making them difficult to deploy in IoT chips and microcontrollers that are sensitive to hardware overhead, and they cannot effectively restore function call chains.

Method used

By acquiring the processor's raw program counter value and return address value in real time, performing compression processing that discards low-order alignment bits and high-order address bits, and combining this with a hardware timestamp, a fixed-width monitoring data packet is generated, and data is reported only under preset conditions.

Benefits of technology

It enables high-precision monitoring of processor program execution information with low hardware overhead, reduces data volume and bandwidth, and can effectively restore function call chains, making it suitable for deployment in cost-sensitive IoT chips and microcontrollers.



Abstract

This application proposes a method and chip device for monitoring processor program execution information, relating to the field of processor technology. It includes: real-time acquisition of the processor's raw program counter value, raw return address value, and hardware timestamp; compression processing of the raw program counter value and raw return address value to obtain compressed program counter value and compressed return address value, wherein the compression processing discards low-order alignment bits and high-order address bits; address range judgment of the raw program counter value and raw return address value to generate a program counter position flag and a return address position flag; concatenating the hardware timestamp, compressed program counter value, compressed return address value, program counter position flag, and return address position flag in a preset order into a fixed-width monitoring data packet; and reporting the monitoring data packet in response to the fulfillment of a preset trigger condition. This scheme can achieve effective monitoring of processor program execution information with low hardware overhead.

Description

Technical Field

[0001] This application relates to the field of processor technology, and more specifically, to a method and chip device for monitoring processor program execution information.

Background Technology

[0002] Monitoring processor program execution information to troubleshoot issues such as processor crashes or stack overflows is an important means of chip debugging and fault location.

[0003] In existing monitoring solutions, purely software-based solutions that rely on GPIO toggling or serial port printing for debugging are highly intrusive, altering program execution timing and exhibiting low sampling accuracy, making it difficult to capture microsecond-level abnormal jumps. Therefore, to improve sampling accuracy, the industry generally adopts hardware-based monitoring solutions.

[0004] However, while hardware solutions such as ARM CoreSight and RISC-V Trace offer high sampling accuracy and can record program execution flow information completely, they require dedicated hardware interfaces (such as an advanced trace bus) and large-capacity on-chip buffers (such as an embedded trace buffer). Capture generates a large amount of data, placing extremely high demands on chip bandwidth and storage resources, which makes these solutions difficult to deploy in IoT chips and microcontrollers that are sensitive to hardware overhead. Furthermore, these hardware solutions either record only program counter values and cannot reconstruct function call chains, making them ineffective for troubleshooting processor crashes or stack overflows, or they require complex hardware stack backtracking to reconstruct the call chain, further increasing implementation costs.

[0005] Therefore, how to effectively monitor processor program execution information with low hardware overhead in order to troubleshoot problems such as processor crashes or stack overflows has become a technical problem that urgently needs to be solved in this field.

Summary of the Invention

[0006] The purpose of this application is to provide a method and chip device for monitoring processor program execution information, which can effectively monitor processor program execution information with low hardware overhead.

[0007] This application is implemented as follows: In a first aspect, this application provides a method for monitoring processor program execution information, comprising the following steps: acquiring the processor's raw program counter value, raw return address value, and hardware timestamp in real time; compressing the raw program counter value and raw return address value respectively to obtain a compressed program counter value and a compressed return address value, wherein the compression process involves discarding low-order alignment bits and high-order address bits; determining the address range of the raw program counter value and raw return address value respectively to generate a program counter position flag and a return address position flag; concatenating the hardware timestamp, compressed program counter value, compressed return address value, program counter position flag, and return address position flag in a preset order to generate a fixed-width monitoring data packet; and reporting the monitoring data packet in response to the fulfillment of a preset trigger condition.

[0008] In some implementations, discarding the low-order alignment bits includes: shifting the original program counter value or the original return address value to the right by M bits, where M is an integer greater than or equal to 1. Discarding the high-order address bits includes: after the low-order alignment bits have been discarded, discarding the high-order bits of the original program counter value or the original return address value that exceed a preset address width.

[0009] In some implementations, the address width is determined based on the size of the address space occupied by the program code segment.

[0010] In some implementations, M is set to 1, and the preset address width is 20 bits.

[0011] In some implementations, the address range determination includes: when the original program counter value falls within a first preset address range, setting the program counter position flag to a first value to indicate that the program is executing on storage media inside the processor; when the original program counter value falls within a second preset address range, setting the program counter position flag to a second value to indicate that the program is executing on storage media outside the processor. Similarly, when the original return address value falls within the first preset address range, setting the return address position flag to a first value to indicate that the original return address points to storage media inside the processor; when the original return address value falls within the second preset address range, setting the return address position flag to a second value to indicate that the original return address points to storage media outside the processor.

[0012] In some implementations, the preset triggering conditions include at least one of the following: first triggering condition: the original program counter value changes; second triggering condition: the original return address value changes; third triggering condition: either the original program counter value or the original return address value changes.

[0013] In some implementations, the monitoring method also includes: providing a configuration register to store user-configured mode values; and selecting the corresponding trigger condition as a preset trigger condition based on the mode value.

[0014] In some implementations, the configuration register is 2 bits wide, with the binary value 00 corresponding to the first trigger condition, 01 corresponding to the second trigger condition, and 10 corresponding to the third trigger condition.

[0015] In some implementations, the fixed-width monitoring data packet is 64 bits, the hardware timestamp is 22 bits, the program counter position flag and return address position flag are each 1 bit, and the compressed program counter value and compressed return address value are each 20 bits.

[0016] In a second aspect, this application provides a chip device including a memory for storing one or more programs; a processor; and when the one or more programs are executed by the processor, implementing the monitoring method as described in any one of the first aspects above.

[0017] Compared with the prior art, this application has at least the following advantages or beneficial effects: This application proposes a method for monitoring processor program execution information. Through compression processing and conditionally triggered reporting, it significantly reduces the amount of data per report and the total data throughput while still capturing critical program execution information. It achieves effective monitoring of processor program execution information without requiring large-capacity on-chip buffers or dedicated high-bandwidth trace interfaces, resulting in extremely low hardware overhead and making it suitable for deployment in cost-sensitive IoT chips and microcontrollers. Furthermore, by simultaneously capturing the original program counter value and the original return address value, along with the location flags, this application can effectively reconstruct function call chains and storage media access information, providing crucial data for troubleshooting anomalies such as processor crashes, stack overflows, and function pointer errors.

Description of the Drawings

[0018] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of an embodiment of a method for monitoring processor program execution information according to this application; Figure 2 is a flowchart of yet another embodiment of a method for monitoring processor program execution information according to this application; Figure 3 is an example diagram illustrating how the PC and RA values change when a processor executes a program in one embodiment of this application; Figure 4 is a structural block diagram of a chip device provided in an embodiment of this application.

[0020] Reference numerals: 201, processor; 202, memory; 203, communication interface.

Detailed Description

[0021] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. It should be understood that this application is not limited to the exemplary embodiments described herein.

[0022] In this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any such actual relationship or order between these entities or operations.

[0023] To troubleshoot and resolve issues such as processor crashes or stack overflows, it is necessary to monitor processor program execution information. This execution information refers to data used to describe and record information such as the program's execution status, execution path, and execution environment. Pure software monitoring solutions, such as GPIO toggling or serial port printing, essentially output status information by inserting additional instructions into the program. This intrusive operation alters the original execution timing of the program, and due to limitations in instruction insertion density, the sampling accuracy is typically only at the millisecond level, making it difficult to capture microsecond-level abnormal jumps.

[0024] Therefore, to achieve non-intrusive, high-precision sampling, the industry generally adopts hardware solutions for monitoring, such as the pure hardware tracing solutions ARM CoreSight and RISC-V Trace. However, these solutions are designed to record the entire program execution flow completely, which requires dedicated hardware resources such as the Advanced Trace Bus (ATB) and Embedded Trace Buffer (ETB) and generates massive amounts of data during capture. Specifically, this type of method typically involves an on-chip hardware module continuously capturing the processor's program counter and reporting the complete 32-bit program counter address. The host computer then reconstructs the program's execution trajectory from the reported address sequence. Because each report carries a complete 32-bit address, each report consumes 32 bits of transmission bandwidth. Therefore, when the program execution frequency is high, a large amount of data traffic is generated, putting significant pressure on system bandwidth and storage resources.

[0025] Furthermore, in the process of implementing this application, the inventors discovered that the most crucial aspect of accurately troubleshooting processor crashes or stack overflows is analyzing the correspondence between the PC value and the RA value. However, existing solutions only report the PC value, resulting in the loss of function call chain information. Restoring the call chain requires an additional hardware stack backtracking module, further increasing resource overhead. In other words, it is essential to find a better balance between resources and information, effectively preserving key call relationship information in the program execution flow while reducing data transmission overhead, in order to better troubleshoot processor crashes or stack overflows. The program execution flow refers to the sequential path of the processor executing instructions one by one during execution. To address the aforementioned issues, this application proposes a method and chip device for monitoring processor program execution information. This method uses a hardware module to capture the processor's original program counter value and original return address value in real time. It employs bidirectional compression processing that discards low-order alignment bits and high-order address bits, along with a condition-triggered reporting mechanism. This enables non-intrusive, high-precision monitoring of program execution flow and function call chain with minimal hardware overhead, thus better diagnosing and troubleshooting issues such as processor crashes or stack overflows.

[0026] After introducing the basic principles of this application, various non-limiting embodiments of this application will be described in detail below with reference to the accompanying drawings. Unless otherwise specified, the various embodiments and features described below can be combined with each other.

[0027] Referring to Figure 1, the method for monitoring processor program execution information includes the following steps: Step S101: Obtain the processor's raw program counter value, raw return address value, and hardware timestamp in real time.

[0028] It should be noted that by simultaneously obtaining the original program counter value and the original return address value, a complete data foundation can be provided for subsequent function call chain reconstruction. Furthermore, the introduction of hardware timestamps ensures that each address capture has a precise time stamp, creating conditions for subsequent analysis of timing-related issues such as program execution time and interrupt response latency.

[0029] The hardware monitoring module can acquire three types of data in parallel: the raw program counter value, the raw return address value, and the hardware timestamp. This acquisition is real-time and continuous; that is, the latest values in the processor registers and the current value of the hardware timer are reread every monitoring cycle. The raw program counter value can be read directly from the processor's program counter (PC) register and represents the address of the instruction the processor is currently executing or about to execute. The raw return address value can be read directly from the processor's return address (RA) register and represents the address the processor should return to after a function call. The hardware timestamp can be generated by a separate hardware timer (such as a system clock counter or a dedicated timer) and marks the time at which the program counter and return address values were acquired.

[0030] Step S102: Compress the original program counter value and the original return address value respectively to obtain the compressed program counter value and the compressed return address value. The compression process involves discarding the low-order alignment bits and discarding the high-order address bits.

[0031] Step S102 performs the same compression operation on the original program counter value and the original return address value obtained in step S101. This compression operation includes two steps: discarding the low-order alignment bits and discarding the high-order address bits. This compression operation utilizes the inherent characteristics of instruction address alignment and the limited space of program code segments, which can significantly reduce the data bit width without losing any valid address information.

[0032] It's important to note that regardless of whether it's the ARM Cortex-M series, RISC-V, or other mainstream embedded processors, instructions are typically aligned to 2 bytes (16 bits) or 4 bytes (32 bits). This means that processor instruction addresses have alignment characteristics: for 2-byte aligned instructions, the least significant bit (bit 0) of the address is always 0; for 4-byte aligned instructions, the lowest two bits (bits 1 and 0) of the address are always 0. Therefore, by discarding these consistently 0 low-order bits, the address width can be reduced without losing any valid information. For example, the original PC value = 0x1000_1004 (32-bit address), shifted right by one bit, becomes 0x0800_0802. Although the value changes, during subsequent restoration, simply adding a 0 at the end (shifting left by one bit) accurately restores the original address. This process doesn't lose any valid information; instead, it eliminates redundant bits.

[0033] Furthermore, in embedded system-on-a-chip (SoC), the code executed by the processor typically resides only in specific, limited storage media. Specifically, the processor's internal storage is located in a lower address range, such as from 0x0000_0000 to 0x000F_FFFF, with a capacity of 1 megabyte; while external static random access memory (SRAM) is located in a higher address range, such as from 0x2000_0000 to 0x200F_FFFF, or from 0x6000_0000 to 0x600F_FFFF. That is, the embedded processor's program code is usually linked into a specific, limited address space (e.g., the address range of internal local memory or external SRAM), and the total size of this address space determines the maximum bit width required to represent the program address. The higher-order bits exceeding this bit width are redundant information for distinguishing different instruction addresses within the same storage medium, and therefore can be safely discarded by discarding the higher-order address bits. In other words, although discarding higher-order address bits constitutes lossy compression, it does not result in the loss of critical information.

[0034] In other words, after discarding the low-order alignment bits and the high-order address bits, the original 32-bit program counter value and the original 32-bit return address value are compressed into a compressed program counter value and a compressed return address value with significantly reduced bit width. Taking a 32-bit original address compressed to 20 bits as an example, the amount of data in a single address is reduced by 37.5% (12 of 32 bits), which makes the subsequent generation of fixed-width monitoring data packets feasible in terms of data volume and controls the bandwidth usage of reported data at the source.
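The two compression steps can be sketched in a few lines of Python. This is only an illustrative model, assuming M = 1 and a 20-bit address width as in the example above; the function name `compress_address` and its parameters are hypothetical and do not appear in the application, which describes a hardware module.

```python
def compress_address(raw: int, shift_m: int = 1, width: int = 20) -> int:
    """Model of the compression step: right shift, then truncate.

    shift_m and width are assumed defaults (M = 1, 20-bit address width).
    """
    shifted = (raw & 0xFFFF_FFFF) >> shift_m   # discard low-order alignment bits
    return shifted & ((1 << width) - 1)        # discard high-order address bits


raw_pc = 0x1000_1004
compressed_pc = compress_address(raw_pc)       # 0x0800_0802 truncated to 20 bits
saving = (32 - 20) / 32                        # fraction of bits removed per address
print(hex(compressed_pc), saving)
```

With these assumed parameters, each 32-bit address is carried in 20 bits, which matches the 37.5% reduction stated in the text.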

[0035] Step S103: Perform address range judgment on the original program counter value and the original return address value respectively, and generate program counter position flag and return address position flag.

[0036] It should be noted that step S103 extracts the storage medium type information, which exists in 32-bit numerical form, into a single-bit location flag by determining the address range. This conversion compresses the storage medium information, which originally required multiple high-order bits, into a single 1-bit flag, further reducing the data volume of subsequent data packets. Simultaneously, the introduction of the program counter location flag and the return address location flag allows engineers to quickly determine the storage medium type (internal fast memory or external slow memory) of the program execution location and return address during troubleshooting, thus facilitating the analysis of issues related to storage access latency and cache consistency.

[0037] Step S103 performs the same address range determination operation on the original program counter value and the original return address value obtained in step S101. This determination operation is based on a preset address range configuration: the first preset address range corresponds to the first storage medium inside the processor (such as internal local memory), and the second preset address range corresponds to the second storage medium outside the processor (such as external static random access memory). The two preset address ranges do not overlap and together cover the entire address range where the program code may be stored.
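The address range judgment can be modeled as a simple range comparison. The sketch below is an assumption-laden illustration: the two ranges are borrowed from the example addresses given earlier (internal memory at 0x0000_0000 to 0x000F_FFFF, external SRAM at 0x2000_0000 to 0x200F_FFFF), the flag values 0 and 1 stand in for the "first value" and "second value", and the name `position_flag` is hypothetical. Raising an error for an address outside both ranges is a choice made for this sketch; the application does not say how such an address is handled.

```python
# Assumed preset ranges, taken from the example addresses in the description.
INTERNAL_RANGE = (0x0000_0000, 0x000F_FFFF)   # first preset address range
EXTERNAL_RANGE = (0x2000_0000, 0x200F_FFFF)   # second preset address range


def position_flag(raw: int) -> int:
    """Return 0 for the internal medium, 1 for the external medium (assumed encoding)."""
    if INTERNAL_RANGE[0] <= raw <= INTERNAL_RANGE[1]:
        return 0
    if EXTERNAL_RANGE[0] <= raw <= EXTERNAL_RANGE[1]:
        return 1
    raise ValueError(f"address {raw:#010x} is outside both preset ranges")


pc_flag = position_flag(0x0000_1004)   # PC executing from internal memory
ra_flag = position_flag(0x2000_0100)   # RA pointing into external SRAM
print(pc_flag, ra_flag)
```

The same function is applied to both the program counter value and the return address value, mirroring the statement that step S103 performs the same judgment on each.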

[0038] Step S104: Concatenate the hardware timestamp, compressed program counter value, compressed return address value, program counter position flag and return address position flag in a preset order to generate a fixed-width monitoring data packet.

[0039] Step S104 concatenates these data sequentially according to a preset bit order to form a binary sequence with a fixed total bit width. This sequence is the monitoring data packet. The specific bit allocation of the preset order can be determined according to system design requirements (e.g., from high to low bits, the bits are hardware timestamp, return address position flag, program counter position flag, compressed return address value, and compressed program counter value), as long as the sending and receiving ends maintain the same protocol agreement. After concatenation, the total bit width of the monitoring data packet is fixed and does not change with changes in the processor's execution state.

[0040] Step S105: In response to the fulfillment of the preset triggering conditions, a monitoring data packet is reported.

[0041] It should be noted that the preset trigger conditions are pre-configured event types used to determine under what circumstances the monitoring data packet is output to the outside (e.g., written to the on-chip buffer, output through the debug interface, or sent to the host computer). Setting the trigger conditions means that this application no longer uses the traditional continuous reporting method, but only triggers reporting when specific conditions are met, thereby filtering out a large amount of redundant data and effectively avoiding the problem of excessive data volume in traditional hardware tracing solutions.

[0042] Compared with continuous sampling, which reports data every instruction cycle, conditional triggering generates reporting data only when the monitored program execution state changes (such as a change in the program counter value or the return address value). No reporting traffic is generated while the monitored values remain unchanged, reducing the data volume by several orders of magnitude. This eliminates the need for a large-capacity on-chip buffer and a high-bandwidth dedicated trace bus and requires only a small number of registers and memory units, making the scheme suitable for deployment in IoT chips and microcontrollers that are sensitive to hardware overhead. Furthermore, conditional triggering retains information on key changes during program execution, providing sufficient data support for subsequent anomaly localization and call chain analysis.
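The effect of conditional triggering can be illustrated with a small software model. This sketch assumes the three trigger conditions listed in the summary and encodes them as mode values 0b00, 0b01, and 0b10 of a 2-bit configuration register; the function names and the sample trace are hypothetical and chosen only for illustration.

```python
def should_report(mode: int, prev_pc: int, prev_ra: int, pc: int, ra: int) -> bool:
    """Decide whether a packet is reported, given the assumed mode encoding."""
    pc_changed = pc != prev_pc
    ra_changed = ra != prev_ra
    if mode == 0b00:                 # first condition: PC value changes
        return pc_changed
    if mode == 0b01:                 # second condition: RA value changes
        return ra_changed
    if mode == 0b10:                 # third condition: either value changes
        return pc_changed or ra_changed
    raise ValueError("unsupported mode value")


def count_reports(samples, mode: int) -> int:
    """Count how many of the sampled (pc, ra) pairs would trigger a report."""
    count = 0
    prev_pc, prev_ra = samples[0]
    for pc, ra in samples[1:]:
        if should_report(mode, prev_pc, prev_ra, pc, ra):
            count += 1
        prev_pc, prev_ra = pc, ra
    return count


# Hypothetical trace of (PC, RA) samples: a stall, one step, one call, a stall.
trace = [(0x100, 0x50), (0x100, 0x50), (0x104, 0x50), (0x200, 0x108), (0x200, 0x108)]
reports_per_mode = {mode: count_reports(trace, mode) for mode in (0b00, 0b01, 0b10)}
print(reports_per_mode)
```

In this toy trace, the return-address-only mode reports just the single function call, which suggests why choosing the trigger condition matters for the resulting data volume.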

[0043] It should be noted that, as shown in Figure 2, steps S101-S105 can be implemented by the processor program monitoring module, forming a complete data processing chain: from obtaining the original register values, to parallel processing of compression and judgment, to packet generation, and finally conditional reporting. Parallel processing of compression and judgment means that steps S102 and S103 can be processed in parallel to improve processing efficiency. Of course, the two steps can also be processed one after the other, in which case their execution order is not limited; that is, step S102 can be executed first and then step S103, or step S103 can be executed first and then step S102.

[0044] In summary, this application does not monitor all program execution details, but selectively captures two key registers in the processor that reflect the program execution flow and function call relationships—the raw program counter value and the raw return address value—and introduces hardware timestamps as the event marker time dimension. Then, by compressing addresses and conditional reporting, it achieves the capture of key information about the program execution flow with extremely low hardware overhead, solving the problem of deploying complete hardware tracing on low-cost chips. That is, this application significantly reduces the amount of data per iteration through compression, avoids data flooding caused by continuous sampling through conditional reporting, and eliminates the need for large-capacity buffers. Simultaneously capturing the program counter value and return address value, along with location markers, can effectively reconstruct the function call chain and storage media information, meeting the debugging needs of embedded systems with extremely low hardware overhead and facilitating the troubleshooting of processor crashes or stack overflows.

[0045] Based on the aforementioned scheme, in some implementations of this application, discarding the low-order alignment bits includes: shifting the original program counter value or the original return address value to the right by M bits, where M is an integer greater than or equal to 1. Discarding the high-order address bits includes: after the low-order alignment bits have been discarded, discarding the high-order bits of the original program counter value or the original return address value that exceed a preset address width.

[0046] Understandably, this implementation uses right shift and truncation operations to compress the original program counter value and the original return address value into shorter effective addresses. This significantly reduces the data bit width without losing effective address information, reducing the amount of data per report and alleviating the pressure on chip bandwidth and storage resources.

[0047] Specifically, discarding the low-order alignment bits involves shifting the original program counter or original return address value to the right by M bits, where M is a positive integer greater than or equal to 1. This step utilizes the instruction address alignment feature to remove redundant bits that are always 0 at the lowest bit level. Following this, discarding the high-order address bits involves discarding bits exceeding a preset address width. The preset address width is typically determined based on the actual address space occupied by the program code segment, ensuring that the reserved bits are sufficient to uniquely identify each address within the code segment.

[0048] For example, in some implementations of this application, the value of M is 1, and the preset address width is 20.

[0049] Understandably, this implementation sets the right-shift amount M used to discard the low-order alignment bits to 1, that is, it shifts right by one bit, which is compatible with instruction addresses aligned to either 2 bytes or 4 bytes; at the same time, the preset address width is 20 bits, so that the original 32-bit addresses (the original program counter value and the original return address value) are compressed into 20-bit addresses (the compressed program counter value and the compressed return address value).

[0050] Understandably, with M set to 1, a 20-bit compressed address spans 2^20 two-byte units, that is, a contiguous 2 MB address space, and the 1-bit location flag identifies the storage medium in which that space lies. For most IoT chips, MCUs, and embedded controllers, the firmware code size is typically between tens of KB and 1 MB, so 20 bits are sufficient to cover the entire code segment. In other words, setting M to 1 accommodates different instruction alignment methods, while setting the address width to 20 matches the code size of most IoT chips and microcontrollers, achieving a high compression ratio while preserving the information needed to identify each address.
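On the receiving side, a compressed address and its location flag can be expanded back into a full address. The following sketch is an assumption rather than part of the application: it supposes that each storage medium starts at a base address aligned to the 2 MB span (0x0000_0000 for internal memory and 0x2000_0000 for external SRAM, as in the earlier example ranges), so that the discarded high-order bits are constant within a medium. The names `REGION_BASE` and `restore_address` are hypothetical.

```python
# Assumed base address of each storage medium, indexed by location flag.
REGION_BASE = {0: 0x0000_0000, 1: 0x2000_0000}


def restore_address(compressed: int, flag: int, shift_m: int = 1) -> int:
    """Rebuild a full address from a compressed value and its location flag.

    Shifting left re-inserts the always-zero alignment bits; OR-ing with the
    region base re-inserts the discarded high-order bits (assumed constant).
    """
    return REGION_BASE[flag] | (compressed << shift_m)


restored_internal = restore_address(0x802, flag=0)
restored_external = restore_address(0x802, flag=1)
print(hex(restored_internal), hex(restored_external))
```

The same 20-bit value maps to different full addresses depending on the flag, which is why the flag travels alongside each compressed address.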

[0051] Therefore, when the hardware timestamp has a bit width of 22 bits, and the right shift M (when discarding low-order alignment bits) is 1, and the preset address bit width is 20 bits, the above information is concatenated in order from high-order bits to low-order bits to form a complete 64-bit monitoring data packet. The format of this monitoring data packet is: {22-bit hardware timestamp, 1-bit return address position flag, 1-bit program counter position flag, 20-bit compressed return address value, 20-bit compressed program counter value}. That is, in some implementations of this application, the fixed-bit-width monitoring data packet is 64 bits, the hardware timestamp is 22 bits, the program counter position flag and return address position flag are each 1 bit, and the compressed program counter value and compressed return address value are each 20 bits.
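The 64-bit layout described above can be modeled with shifts and masks. The sketch below follows the stated field order from high-order to low-order bits (22-bit timestamp, return address flag, program counter flag, 20-bit compressed return address, 20-bit compressed program counter); the function names `pack_packet` and `unpack_packet` are hypothetical, and a software model is only an approximation of what the hardware would do with wiring.

```python
TS_BITS, ADDR_BITS = 22, 20
ADDR_MASK = (1 << ADDR_BITS) - 1
TS_MASK = (1 << TS_BITS) - 1


def pack_packet(ts: int, ra_flag: int, pc_flag: int, comp_ra: int, comp_pc: int) -> int:
    """Concatenate the five fields into one 64-bit integer."""
    return (
        ((ts & TS_MASK) << 42)          # bits 63..42: hardware timestamp
        | ((ra_flag & 1) << 41)         # bit 41: return address position flag
        | ((pc_flag & 1) << 40)         # bit 40: program counter position flag
        | ((comp_ra & ADDR_MASK) << 20) # bits 39..20: compressed return address
        | (comp_pc & ADDR_MASK)         # bits 19..0: compressed program counter
    )


def unpack_packet(packet: int) -> dict:
    """Split a 64-bit packet back into its fields (host-side decoding)."""
    return {
        "ts": (packet >> 42) & TS_MASK,
        "ra_flag": (packet >> 41) & 1,
        "pc_flag": (packet >> 40) & 1,
        "comp_ra": (packet >> 20) & ADDR_MASK,
        "comp_pc": packet & ADDR_MASK,
    }


packet = pack_packet(ts=0x3FFFFF, ra_flag=1, pc_flag=0, comp_ra=0xABCDE, comp_pc=0x00802)
fields = unpack_packet(packet)
print(f"{packet:#018x}", fields)
```

Because every field has a fixed position, the receiver needs no framing or length information beyond the agreed layout.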

[0052] It should be noted that the above parameter configuration is not fixed. If the program code requires a large amount of storage space, the bit width of the compressed program counter value and the compressed return address value can be increased (i.e., fewer high-order address bits are discarded) while the bit width of the hardware timestamp is reduced correspondingly, so that the total bit width of the monitoring data packet remains unchanged.

[0053] The hardware timestamp is a free-running hardware counter that starts counting when the chip is powered on and wraps around on overflow, thereby recording time information. When the hardware timestamp is 22 bits, it can theoretically record approximately 4.2 million (2^22) system clock cycles before wrapping, which is sufficient to monitor changes in the original program counter value and the original return address value over a period of time.
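When two reported timestamps are compared on the receiving side, the wrap-around of the 22-bit counter has to be taken into account. The sketch below shows one common way to do this with modular subtraction; it assumes the two events are less than one full counter period (2^22 cycles) apart, and the name elapsed_cycles is hypothetical.

```python
TS_WIDTH = 22
TS_MASK = (1 << TS_WIDTH) - 1


def elapsed_cycles(earlier: int, later: int) -> int:
    """Cycles between two timestamps, valid if they are less than 2**22 cycles apart."""
    return (later - earlier) & TS_MASK


print(elapsed_cycles(5, 9))          # no wrap between the two samples
print(elapsed_cycles(0x3FFFFE, 3))   # counter wrapped once between the samples
```

If events can be further apart than one counter period, the elapsed time cannot be recovered from the timestamps alone.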

[0054] It should be noted that when the amount of program code is large (e.g., more than 2 megabytes), the bit width of the hardware timestamp can be reduced appropriately, while the bit width of the compressed program counter value and the compressed return address value is expanded (i.e., fewer high-order address bits are discarded) to support a larger amount of code. The disadvantage is that the length of time the hardware timestamp can record is shortened correspondingly.

[0055] As an example, if one fewer high-order bit is discarded from the program counter value and the return address value, both compressed values become 21 bits wide, allowing 4 megabytes of code to be covered. At the same time, the hardware timestamp is reduced to 20 bits, which can theoretically record approximately 1.05 million system clock cycles. If two fewer high-order bits are discarded, both compressed values become 22 bits wide, allowing 8 megabytes of code to be covered. At the same time, the hardware timestamp is reduced to 18 bits, which can theoretically record approximately 260,000 system clock cycles.
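The trade-off between address width and timestamp width can be checked with a short calculation. The sketch below assumes a 64-bit packet, two 1-bit position flags, and M = 1, as in the preceding paragraphs; the function name layout is hypothetical.

```python
PACKET_BITS = 64
FLAG_BITS = 2      # one position flag each for PC and RA
ALIGN_SHIFT = 1    # M: each compressed address step corresponds to 2 bytes


def layout(addr_width: int) -> tuple[int, int, int]:
    """Return (timestamp width, covered code bytes, recordable clock cycles)."""
    ts_width = PACKET_BITS - FLAG_BITS - 2 * addr_width
    if ts_width <= 0:
        raise ValueError("address width leaves no room for the timestamp")
    code_bytes = (1 << addr_width) << ALIGN_SHIFT
    return ts_width, code_bytes, 1 << ts_width


for width in (20, 21, 22):
    ts_width, code_bytes, cycles = layout(width)
    print(width, ts_width, code_bytes // (1024 * 1024), "MB", cycles)
```

Each additional address bit doubles the covered code size but costs two timestamp bits (one per address field), so the recordable time span shrinks by a factor of four.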

[0056] Based on the aforementioned scheme, in some implementations of this application, the address range determination includes: when the original program counter value falls within a first preset address range, setting the program counter position flag to a first value to indicate that the program is executing on storage media inside the processor; when the original program counter value falls within a second preset address range, setting the program counter position flag to a second value to indicate that the program is executing on storage media outside the processor. When the original return address value falls within the first preset address range, setting the return address position flag to a first value to indicate that the original return address points to storage media inside the processor; when the original return address value falls within the second preset address range, setting the return address position flag to a second value to indicate that the original return address points to storage media outside the processor.

[0057] In practical applications, the processor system is pre-configured with a first preset address range corresponding to the processor's internal local memory (e.g., the lower address range 0x0000_0000 to 0x000F_FFFF) and a second preset address range corresponding to the external static random access memory (e.g., the higher address range 0x2000_0000 to 0x200F_FFFF or 0x6000_0000 to 0x600F_FFFF).

[0058] When the original program counter value falls within the first preset address range, the program counter position flag is cleared to zero, indicating that the program is executing in internal local memory; if it falls within the second preset address range, the program counter position flag is set to 1, indicating that the program is executing in external static random access memory. Similarly, the original return address value uses the exact same judgment logic: if it falls within the first preset address range, the return address position flag is cleared to 0; if it falls within the second preset address range, the return address position flag is set to 1.

[0059] Therefore, even though the hardware module discards the high-order address bits, these two flag bits still make it possible to determine whether the original address pointed to internal local memory or external static random access memory.
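The following sketch shows how the position flag could be derived and how a receiver might use it to recover the original address. It uses the example ranges from paragraph [0057] (0x0000_0000 to 0x000F_FFFF and 0x2000_0000 to 0x200F_FFFF). Several points are assumptions rather than statements from the application: the function names, the choice to raise an error for an address outside both ranges, and the restoration step, which relies on each range starting at a 2 MB-aligned base and on the discarded alignment bit being 0.

```python
INTERNAL = (0x0000_0000, 0x000F_FFFF)   # first preset range (example from [0057])
EXTERNAL = (0x2000_0000, 0x200F_FFFF)   # second preset range (example from [0057])
ALIGN_SHIFT, ADDR_WIDTH = 1, 20


def position_flag(raw: int) -> int:
    """0 for the internal range, 1 for the external range."""
    if INTERNAL[0] <= raw <= INTERNAL[1]:
        return 0
    if EXTERNAL[0] <= raw <= EXTERNAL[1]:
        return 1
    raise ValueError("address outside both preset ranges (behaviour not specified)")


def compress_address(raw: int) -> int:
    return (raw >> ALIGN_SHIFT) & ((1 << ADDR_WIDTH) - 1)


def restore_address(flag: int, compressed: int) -> int:
    """Host-side reconstruction: range base plus the re-expanded offset."""
    base = INTERNAL[0] if flag == 0 else EXTERNAL[0]
    return base | (compressed << ALIGN_SHIFT)


raw = 0x2000_0064
print(hex(restore_address(position_flag(raw), compress_address(raw))))
```

For 2-byte-aligned addresses inside either range, restoring after compressing returns the original value.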

[0060] Based on the aforementioned scheme, in some implementations of this application, the preset triggering conditions include at least one of the following triggering conditions: first triggering condition: the original program counter value changes; second triggering condition: the original return address value changes; third triggering condition: either the original program counter value or the original return address value changes.

[0061] Understandably, this implementation allows users to choose the timing of reporting based on actual debugging needs by setting multiple optional trigger conditions, thereby achieving a balance between information acquisition and data volume.

[0062] The first trigger condition reports monitoring data packets only when the raw program counter value changes. This is mainly used when the program behaves unexpectedly, such as entering an infinite loop, the raw program counter value running out of bounds, or when it's necessary to observe whether a critical section of code has been executed. In these scenarios, changes in the raw return address value are relatively minor. The second trigger condition reports only when the raw return address value changes. This is mainly used in scenarios such as stack overflows, function pointer errors, and real-time operating system task switching anomalies. In these cases, changes in the raw program counter value may be very frequent and trivial, making tracing function call relationships more critical. The third trigger condition triggers reporting whenever either the raw program counter value or the raw return address value changes. This is mainly used when encountering complex interrupt contention, rare soft errors, or when precise measurement of interrupt response latency is required. This mode provides the highest information redundancy and can reconstruct the most complete context.

[0063] As shown in Figure 3, assuming the hardware timestamp values are 1 to 20, the corresponding raw return address values are: 0x0, 0x0, 0x0, 0x0, 0x10, 0x10, 0x10, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0. The corresponding raw program counter values are: 0x0, 0x4, 0x8, 0xc, 0x60, 0x64, 0x68, 0x6c, 0x10, 0x14, 0x14, 0x14, 0x80, 0x84, 0x88, 0x8c, 0x18, 0x18, 0x1c, 0x20.

[0064] In the above example, when the hardware timestamp is 5, the original return address value changes from 0x0 to 0x10, indicating entry into a function; when the hardware timestamp is 9, the original return address value changes back from 0x10 to 0x0, indicating exit from the function. When the hardware timestamp is 13, the original program counter value jumps to 0x80, indicating entry into interrupt handling; when the hardware timestamp is 17, the original program counter value changes to 0x18, indicating exit from interrupt handling.

[0065] Based on the above data, if the user selects the first trigger condition (i.e., reporting only when the original program counter value changes), reporting will be triggered, for example, at hardware timestamps 13 and 17 (interrupt entry and exit), along with the other timestamps at which the original program counter value changes. If the user selects the second trigger condition (i.e., reporting only when the original return address value changes), reporting will be triggered at hardware timestamps 5 and 9. If the user selects the third trigger condition (i.e., reporting when either the original program counter value or the original return address value changes), reporting will be triggered at hardware timestamps 5, 9, 13, 17, and at other similar changes.
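The trigger behaviour can be explored with a small simulation of the Figure 3 example. This is a sketch only: it compares each sample with the previous one, uses a return address sequence that changes at timestamps 5 and 9 as described in paragraph [0064], and assumes that the first sample produces no report because there is no earlier value to compare against. The names PC, RA, and report_timestamps are hypothetical.

```python
# Program counter and return address samples for hardware timestamps 1..20.
PC = [0x0, 0x4, 0x8, 0xC, 0x60, 0x64, 0x68, 0x6C, 0x10, 0x14,
      0x14, 0x14, 0x80, 0x84, 0x88, 0x8C, 0x18, 0x18, 0x1C, 0x20]
RA = [0x0] * 4 + [0x10] * 4 + [0x0] * 12


def report_timestamps(condition: str) -> list[int]:
    """Timestamps at which a packet is reported for 'pc', 'ra', or 'either'."""
    reports = []
    for i in range(1, len(PC)):
        pc_changed = PC[i] != PC[i - 1]
        ra_changed = RA[i] != RA[i - 1]
        fire = {"pc": pc_changed, "ra": ra_changed,
                "either": pc_changed or ra_changed}[condition]
        if fire:
            reports.append(i + 1)    # timestamps are 1-based
    return reports


print("ra:", report_timestamps("ra"))
print("pc:", report_timestamps("pc"))
```

Under this literal sample-to-sample comparison, the return-address condition fires only at timestamps 5 and 9, while the program-counter condition also fires on ordinary sequential increments, so timestamps 13 and 17 appear among a larger set. This illustrates why the return-address condition yields far less data when only the call chain is of interest.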

[0066] Based on the aforementioned scheme, in some implementations of this application, the monitoring method further includes: providing a configuration register for storing user-configured mode values; and selecting the corresponding triggering condition as a preset triggering condition based on the mode value.

[0067] Understandably, this implementation allows users to choose the trigger mode through a configuration register, thus flexibly adapting to different debugging scenarios within the same hardware solution. During actual operation, the monitoring logic selects one of several trigger conditions as the currently active preset trigger condition based on the user-configured mode value stored in the register, thereby controlling the timing of monitoring data packet reporting. For example, in some implementations of this application, the configuration register is 2 bits, with a value of 2'b00 corresponding to the first trigger condition, 2'b01 to the second trigger condition, and 2'b10 to the third trigger condition.
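A minimal model of the mode selection is sketched below. It reads the three mode values as the 2-bit binary encodings 00, 01, and 10, which is an interpretation, since a 2-bit register cannot hold a value of sixteen. The treatment of the unused encoding 11 as "never report" is purely an assumption, as are the constant and function names.

```python
MODE_PC_CHANGE = 0b00       # first trigger condition
MODE_RA_CHANGE = 0b01       # second trigger condition
MODE_EITHER_CHANGE = 0b10   # third trigger condition


def should_report(mode: int, pc_changed: bool, ra_changed: bool) -> bool:
    """Select the active trigger condition from the 2-bit mode value."""
    if mode == MODE_PC_CHANGE:
        return pc_changed
    if mode == MODE_RA_CHANGE:
        return ra_changed
    if mode == MODE_EITHER_CHANGE:
        return pc_changed or ra_changed
    return False   # 0b11 is not defined in the text; assumed reserved here


# A function call that changes both PC and RA is reported in every defined mode.
print([should_report(m, True, True) for m in (0b00, 0b01, 0b10)])
```

Keeping the selection in one place mirrors the idea that the same hardware serves all three debugging scenarios and only the register value differs.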

[0068] To enable those skilled in the art to more intuitively understand this application, a specific example will be provided below. This example is an exemplary demonstration combining the overall technical paradigm of this application with some optional implementation details. It should be noted that the following demonstration is intended to aid understanding and does not constitute an exhaustive list of all embodiments of this application, nor does it imply that this application must include all the details described below in its specific implementation.

[0069] Taking the PC and RA values corresponding to timestamps 1-4 in Figure 3 as an example, the "compression (corresponding to step S102) → judgment (corresponding to step S103) → packaging (corresponding to step S104) → reporting (corresponding to step S105)" process of this application proceeds as follows: (1) In the compression process, the least significant bit (a right shift by one bit) and the high 11 bits of the original 32-bit program counter value and original return address value are discarded to obtain the 20-bit compressed program counter value and compressed return address value. For example, the four original program counter values are compressed to 20'b0000_0000_0000_0000_0000, 20'b0000_0000_0000_0000_0010, 20'b0000_0000_0000_0000_0100, and 20'b0000_0000_0000_0000_0110, respectively; the four original return address values are all compressed to 20'b0000_0000_0000_0000_0000.

[0070] (2) In the judgment process, the 32-bit original program counter value and the original return address value at each time are judged. Since the high bits are all 0, it indicates that the program is running in the local memory inside the processor. Therefore, the program counter position flag and the return address position flag at each time are set to 0.

[0071] (3) In the packaging process, the data is packaged according to a preset order. For example, when the hardware timestamp is 1, the monitoring data packet is: a 22-bit hardware timestamp 22'b00_0000_0000_0000_0000_0001, followed by a 1-bit return address position flag 0, a 1-bit program counter position flag 0, a 20-bit compressed return address value 20'b0000_0000_0000_0000_0000, and a 20-bit compressed program counter value 20'b0000_0000_0000_0000_0000. The monitoring data packets corresponding to subsequent timestamps follow the same pattern.
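On the receiving side, a packet in this layout can be split back into its fields with shifts and masks. The sketch below is a hypothetical host-side decoder, not something described in the application; the field names and the function unpack_packet are illustrative.

```python
# (name, width) from the most significant field to the least significant one.
FIELDS = (
    ("timestamp", 22),
    ("ra_flag", 1),
    ("pc_flag", 1),
    ("ra_compressed", 20),
    ("pc_compressed", 20),
)


def unpack_packet(packet: int) -> dict[str, int]:
    """Split a 64-bit monitoring data packet into its five fields."""
    if not 0 <= packet < (1 << 64):
        raise ValueError("expected a 64-bit packet")
    result = {}
    shift = 64
    for name, width in FIELDS:
        shift -= width
        result[name] = (packet >> shift) & ((1 << width) - 1)
    return result


# Packet for hardware timestamp 1 in the example: only the timestamp is non-zero.
print(unpack_packet(1 << 42))
```

For timestamp 2 in the same example, the compressed program counter field would hold 2 (original value 0x4 shifted right by one bit) while the other address and flag fields stay 0.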

[0072] (4) In the reporting process, when the first trigger condition (reporting only when the original program counter value changes) is selected, a monitoring data packet is reported every time the original program counter value changes. When the second trigger condition (reporting only when the original return address value changes) is selected, a monitoring data packet is reported every time the original return address value changes. When the third trigger condition (reporting when either the original program counter value or the original return address value changes) is selected, reporting is triggered by a change in either value.

[0073] Therefore, the processor program execution information monitoring method of this application can obtain a large amount of useful information at a very low cost. For example, when the processor freezes, the specific original program counter value and original return address value can be obtained; the sequence of functions executed by the processor and the order of jumps between them can be accurately reconstructed; the interrupt execution order and nesting can be obtained; and the precise time required for function or interrupt execution can be obtained. The solution of this application can complete online debugging without additional debugging hardware tools, at extremely low cost.

[0074] Referring to Figure 4, this application provides a chip device including at least one processor 201 and at least one memory 202. The processor 201 and memory 202 are directly connected to each other, or communicate with each other through a communication interface 203, or are electrically connected through one or more communication buses or signal lines to achieve data transmission or interaction. The memory 202 stores program instructions executable by the processor 201, and the processor 201 can call and execute the program instructions to implement any of the processor program execution information monitoring methods provided by the various implementations described above. For example, the method includes: acquiring the processor's raw program counter value, raw return address value, and hardware timestamp in real time; compressing the raw program counter value and raw return address value to obtain a compressed program counter value and a compressed return address value, where the compression process discards low-order alignment bits and high-order address bits; performing address range determination on the raw program counter value and raw return address value to generate a program counter position flag and a return address position flag; concatenating the hardware timestamp, compressed program counter value, compressed return address value, program counter position flag, and return address position flag in a preset order to generate a fixed-width monitoring data packet; and reporting the monitoring data packet when a preset trigger condition is met.

[0075] The memory 202 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.

[0076] The processor 201 can be an integrated circuit chip with signal processing capabilities. The processor 201 can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

[0077] Understandably, the structure shown in Figure 4 is for illustrative purposes only; the chip device may also include more or fewer components than those shown in Figure 4, or have a configuration different from that shown in Figure 4. The components shown in Figure 4 can be implemented using hardware, software, or a combination thereof.

[0078] It will be apparent to those skilled in the art that this application is not limited to the details of the exemplary embodiments described above, and that this application can be implemented in other specific forms without departing from the spirit or essential characteristics of this application. Therefore, the embodiments should be considered illustrative and non-limiting in all respects, and the scope of this application is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within this application. No reference numerals in the claims should be construed as limiting the scope of the claims.

Claims

1. A method for monitoring processor program execution information, characterized in that the method includes: obtaining the processor's original program counter value, original return address value, and hardware timestamp in real time; compressing the original program counter value and the original return address value respectively to obtain a compressed program counter value and a compressed return address value, wherein the compression process includes discarding low-order alignment bits and discarding high-order address bits; performing address range judgment on the original program counter value and the original return address value respectively to generate a program counter position flag and a return address position flag; concatenating the hardware timestamp, the compressed program counter value, the compressed return address value, the program counter position flag, and the return address position flag in a preset order to generate a monitoring data packet with a fixed bit width; and reporting the monitoring data packet in response to a preset trigger condition being met.

2. The monitoring method according to claim 1, characterized in that the discarding of low-order alignment bits includes: shifting the original program counter value or the original return address value to the right by M bits, where M is an integer greater than or equal to 1; and the discarding of high-order address bits includes: after the low-order alignment bits are discarded, discarding, according to a preset address width, the high-order bits of the original program counter value or the original return address value that exceed the address width.

3. The monitoring method according to claim 2, characterized in that the address width is determined based on the size of the address space occupied by the program code segment.

4. The monitoring method according to claim 2, characterized in that the value of M is 1, and the preset address width is 20.

5. The monitoring method according to claim 1, characterized in that the address range judgment includes: when the original program counter value falls within a first preset address range, setting the program counter position flag to a first value to indicate that the program is executing on a storage medium inside the processor; when the original program counter value falls within a second preset address range, setting the program counter position flag to a second value to indicate that the program is executing on a storage medium outside the processor; when the original return address value falls within the first preset address range, setting the return address position flag to the first value to indicate that the original return address points to the storage medium inside the processor; and when the original return address value falls within the second preset address range, setting the return address position flag to the second value to indicate that the original return address points to the storage medium outside the processor.

6. The monitoring method according to claim 1, characterized in that the preset trigger condition includes at least one of the following trigger conditions: a first trigger condition: the original program counter value changes; a second trigger condition: the original return address value changes; a third trigger condition: either the original program counter value or the original return address value changes.

7. The monitoring method according to claim 6, characterized in that the monitoring method further includes: providing a configuration register for storing a user-configured mode value; and selecting, based on the mode value, the corresponding trigger condition as the preset trigger condition.

8. The monitoring method according to claim 7, characterized in that the configuration register is 2 bits, with a value of 2'b00 corresponding to the first trigger condition, 2'b01 corresponding to the second trigger condition, and 2'b10 corresponding to the third trigger condition.

9. The monitoring method according to claim 1, characterized in that the fixed-bit-width monitoring data packet is 64 bits, the hardware timestamp is 22 bits, the program counter position flag and the return address position flag are each 1 bit, and the compressed program counter value and the compressed return address value are each 20 bits.

10. A chip device, characterized in that it includes: a memory for storing one or more programs; and a processor; wherein, when the one or more programs are executed by the processor, the monitoring method according to any one of claims 1-9 is implemented.