Performance monitoring system, method, chip, electronic device
By controlling the performance monitoring system through configuration information, the inflexible configuration of existing PMUs is addressed: the system can adapt to the performance monitoring needs of various modules, reduce resource contention, and improve hardware resource utilization and the accuracy of global performance analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MOORE THREADS TECH CO LTD
- Filing Date
- 2025-10-20
- Publication Date
- 2026-06-19
AI Technical Summary
The existing configuration of performance monitoring units (PMUs) is not flexible enough, making it difficult to adapt to the performance monitoring needs of different modules. Furthermore, resource contention can easily occur when reporting performance data, affecting the original functions of the modules.
The performance monitoring system is controlled by configuration information and includes multiple event statistics modules and an output module. The output module arbitrates among the event statistics modules that report performance data and adjusts the bandwidth ratio of the communication path according to weight information. A configurable bit-width counter array and time-division multiplexing are supported to achieve flexible performance monitoring.
It improves the configuration flexibility of the performance monitoring system, reduces resource contention, increases hardware resource utilization and energy efficiency, supports large-scale parallel event monitoring, and enhances the accuracy of global performance analysis.
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of chip technology, and in particular to a performance monitoring system, method, chip, and electronic device.
Background Technology
[0002] A Performance Monitor Unit (PMU) is a hardware component integrated into the processor, primarily used to track and count low-level hardware events in the system. As the infrastructure for performance analysis, the PMU monitors and measures key performance indicators of the processor and system, making it crucial for performance analysis, debugging, power management, and software optimization.
[0003] However, the configuration methods of PMUs proposed in existing technologies are not flexible enough and are difficult to adapt to the performance monitoring needs of different modules. Furthermore, when reporting monitored performance data, resource contention can easily occur, affecting the original functionality of the modules.
Summary of the Invention
[0004] In view of this, this disclosure proposes a performance monitoring system, method, chip, and electronic device. The performance monitoring system proposed in this disclosure has a more flexible configuration, can adapt to the performance monitoring needs of various modules, can reduce the degree of resource competition when reporting performance data, and reduce the impact on the original functions of the modules.
[0005] According to one aspect of this disclosure, a performance monitoring system is provided. The system includes multiple event statistics modules and an output module. The event statistics modules are configured to: control a counter array to count the number of target events in an event stream according to target events indicated by configuration information; and output the performance data output by the counter array to the output module when the configuration information indicates that performance data should be reported via a first reporting method. The output module is configured to: arbitrate a first event statistics module to be reported from among the multiple event statistics modules; adjust the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and memory according to the weight information of the first event statistics module; and then output the performance data from the first event statistics module to memory. The counter array is located within or connected to the event statistics modules.
[0006] In one possible implementation, the event statistics module includes a configuration register, a first control unit, and a second control unit. The configuration register stores the configuration information, which also indicates a reporting frequency. Specifically, the configuration register is used to transmit a received first start signal and a first stop signal to the first control unit. The first control unit is specifically used to, in response to receiving the first start signal, control the counter array to count the number of target events in the event stream, and in response to receiving the first stop signal, control the counter array to stop counting. The configuration register is also used to output a first reporting signal to the first control unit according to the reporting frequency. The first control unit is also used to, in response to receiving the first reporting signal, when the configuration information indicates that performance data is reported through a first reporting method, notify the second control unit to output the performance data output by the counter array to the output module.
[0007] In one possible implementation, the event statistics module further includes a synchronization unit, specifically configured to: receive a first control signal; in response to the first control signal changing from a first state to a second state, output a second start signal to the first control unit; and in response to the first control signal changing from a second state to a first state, output a second stop signal to the first control unit. The first control unit is specifically configured to: in response to receiving the second start signal, control the counter array to count the number of target events in the event stream; and in response to receiving the second stop signal, control the counter array to stop counting. The synchronization unit is also configured to: receive a second control signal; and in response to the second control signal changing from a first state to a second state or from a second state to a first state, output a second reporting signal to the first control unit. The first control unit is also configured to: in response to receiving the second reporting signal, when the configuration information indicates that performance data is reported via a first reporting method, notify the second control unit to output the performance data output by the counter array to the output module.
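The edge-triggered behavior of the synchronization unit can be sketched in Python as follows. This is an illustrative model only: the class name `SyncUnit`, the encoding of the first state as 0 and the second state as 1, and the string names of the emitted signals are assumptions made for this example and are not specified in the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyncUnit:
    """Hypothetical model: turns level changes on two control signals into pulses."""

    prev_ctrl1: int = 0  # assumed: first state = 0, second state = 1
    prev_ctrl2: int = 0

    def step(self, ctrl1: int, ctrl2: int) -> List[str]:
        """Sample both control signals once and return the signals to emit."""
        signals: List[str] = []
        # First control signal: a 0 -> 1 transition starts counting, 1 -> 0 stops it.
        if self.prev_ctrl1 == 0 and ctrl1 == 1:
            signals.append("second_start")
        elif self.prev_ctrl1 == 1 and ctrl1 == 0:
            signals.append("second_stop")
        # Second control signal: any transition requests a report.
        if ctrl2 != self.prev_ctrl2:
            signals.append("second_report")
        self.prev_ctrl1, self.prev_ctrl2 = ctrl1, ctrl2
        return signals
```

In this model a report is requested on both edges of the second control signal, so a simple toggling signal would trigger one report per toggle.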
[0008] In one possible implementation, when the configuration information indicates that performance data is reported via a second reporting method, the first control unit is further configured to store the performance data output by the counter array in the configuration register; the configuration register is further configured to output the stored performance data when a request to obtain performance data is received.
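The choice between the two reporting methods can be illustrated with a short Python sketch. The class and attribute names (`EventStatisticsModule`, `to_output_module`, `register_snapshot`) and the string constants are hypothetical, and the lists and dictionaries stand in for hardware paths and registers.

```python
from typing import Dict, List, Optional

FIRST_METHOD = "first"    # performance data is pushed to the output module
SECOND_METHOD = "second"  # performance data is held until it is requested


class EventStatisticsModule:
    """Hypothetical model of how a report signal is handled per reporting method."""

    def __init__(self, report_method: str) -> None:
        if report_method not in (FIRST_METHOD, SECOND_METHOD):
            raise ValueError("unknown reporting method")
        self.report_method = report_method
        self.to_output_module: List[Dict[str, int]] = []
        self.register_snapshot: Optional[Dict[str, int]] = None

    def on_report_signal(self, counter_values: Dict[str, int]) -> None:
        """Called when a reporting signal arrives from the register or sync unit."""
        snapshot = dict(counter_values)  # copy so later counting does not alter it
        if self.report_method == FIRST_METHOD:
            self.to_output_module.append(snapshot)
        else:
            self.register_snapshot = snapshot

    def read_request(self) -> Optional[Dict[str, int]]:
        """Models an external request to obtain the stored performance data."""
        return self.register_snapshot
```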
[0009] In one possible implementation, the target event includes multiple first events. When the counter array is located in the event statistics module, the configuration information also indicates the counter bit width. The event statistics module is specifically used to divide the counter array into multiple counter groups according to the counter bit width, wherein the total bit width of the counters in each counter group is equal to the counter bit width; and to control the multiple counter groups to count the number of each first event in the event stream.
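As an illustration of dividing a counter array according to the configured counter bit width, the following Python sketch assumes that the array consists of identical base counters and that the configured width is a whole multiple of the base width. The function name and these assumptions are introduced for the example only.

```python
from typing import List


def split_into_groups(num_counters: int, base_width: int, group_width: int) -> List[List[int]]:
    """Return counter indices per group so each group totals `group_width` bits.

    Assumes every physical counter is `base_width` bits wide (hypothetical).
    Counters left over after the last full group are simply unused.
    """
    if base_width <= 0 or group_width <= 0:
        raise ValueError("widths must be positive")
    if group_width % base_width != 0:
        raise ValueError("group width must be a multiple of the base width")
    per_group = group_width // base_width
    return [
        list(range(start, start + per_group))
        for start in range(0, num_counters - per_group + 1, per_group)
    ]
```

With eight 16-bit counters, a configured width of 32 bits yields four groups of two counters, while a width of 64 bits yields two groups of four counters. A narrower width therefore allows more first events to be counted in parallel, at the cost of a smaller maximum count per event.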
[0010] In one possible implementation, the target event includes multiple first events, and the event statistics module is specifically used to divide a statistical period into multiple sub-periods, determine the first events counted by the counter array in each sub-period based on the target event, and control the counter array to count the number of corresponding first events in the event stream in each sub-period.
[0011] In one possible implementation, the configuration information also indicates the priority of different types of first events included in the target event, and the event statistics module is specifically used to determine the first event counted by the counter array in each sub-cycle according to the priority.
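One way to picture the time-division multiplexing of the counter array is the Python sketch below. The disclosure states only that the first events counted in each sub-period are determined according to priority; the concrete policy used here (sort by descending priority, fill the available counter slots, rotate through the resulting batches) is an assumption chosen for illustration, as are the function and parameter names.

```python
from typing import Dict, List


def build_schedule(priorities: Dict[str, int], slots: int, num_sub_periods: int) -> List[List[str]]:
    """Assign first events to sub-periods of one statistical period.

    `slots` is the number of events the counter array can count at once.
    Higher priority values are scheduled first (assumed policy).
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    # Sort by descending priority; break ties by name for determinism.
    ordered = sorted(priorities, key=lambda name: (-priorities[name], name))
    batches = [ordered[i:i + slots] for i in range(0, len(ordered), slots)]
    if not batches:
        return [[] for _ in range(num_sub_periods)]
    # Rotate through the batches, one batch per sub-period.
    return [batches[i % len(batches)] for i in range(num_sub_periods)]
```

For example, with three events, two counter slots, and three sub-periods, the two highest-priority events are counted in the first and third sub-periods and the remaining event in the second.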
[0012] In one possible implementation, the plurality of event statistics modules synchronously receive the first control signal and the second control signal.
[0013] In one possible implementation, the event statistics module is activated upon receiving a first enable signal or a second enable signal; the first enable signal is sent synchronously to all event statistics modules, and the second enable signal is sent individually to some of the event statistics modules.
[0014] According to another aspect of this disclosure, a performance monitoring method is provided, the method being applied to a performance monitoring system, the system including multiple event statistics modules and an output module, the method comprising: causing the event statistics module to: control a counter array to count the number of target events in an event stream according to a target event indicated by configuration information; when the configuration information indicates that performance data is to be reported via a first reporting method, outputting the performance data output by the counter array to the output module; causing the output module to: arbitrate from the multiple event statistics modules to select a first event statistics module to be reported, adjust the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and memory according to the weight information of the first event statistics module, and then output the performance data from the first event statistics module to memory; wherein, the counter array is located in the event statistics module or connected to the event statistics module.
[0015] In one possible implementation, the event statistics module includes a configuration register, a first control unit, and a second control unit. The configuration register stores the configuration information, which also indicates a reporting frequency. Controlling the counter array to count the number of target events in the event stream according to the target events indicated by the configuration information includes: using the configuration register to transmit a received first start signal and a first stop signal to the first control unit; using the first control unit, in response to receiving the first start signal, controlling the counter array to count the number of target events in the event stream, and in response to receiving the first stop signal, controlling the counter array to stop counting. When the configuration information indicates that performance data should be reported via a first reporting method, outputting the performance data output by the counter array to the output module includes: using the configuration register to output a first reporting signal to the first control unit according to the reporting frequency; using the first control unit, in response to receiving the first reporting signal, when the configuration information indicates that performance data should be reported via the first reporting method, notifying the second control unit to output the performance data output by the counter array to the output module.
[0016] In one possible implementation, the event statistics module further includes a synchronization unit. The step of controlling the counter array to count the number of target events in the event stream according to the target event indicated by the configuration information includes: using the synchronization unit to receive a first control signal; in response to the first control signal changing from a first state to a second state, outputting a second start signal to the first control unit; and in response to the first control signal changing from a second state to a first state, outputting a second stop signal to the first control unit; using the first control unit, in response to receiving the second start signal, controlling the counter array to count the number of target events in the event stream; and in response to receiving the second stop signal, controlling the counter array to stop counting. The step of outputting the performance data output by the counter array to the output module when the configuration information indicates that performance data should be reported via a first reporting method includes: using the synchronization unit to receive a second control signal; in response to the second control signal changing from a first state to a second state or from a second state to a first state, outputting a second reporting signal to the first control unit; and using the first control unit, in response to receiving the second reporting signal, notifying the second control unit to output the performance data output by the counter array to the output module when the configuration information indicates that performance data should be reported via the first reporting method.
[0017] In one possible implementation, when the configuration information indicates that performance data is reported via a second reporting method, the method further includes: using the first control unit to store the performance data output by the counter array in the configuration register; and using the configuration register to output the stored performance data when a request to obtain performance data is received.
[0018] In one possible implementation, the target event includes multiple first events. When the counter array is located in the event statistics module, the configuration information also indicates the counter bit width. Controlling the counter array to count the number of the target events in the event stream includes: dividing the counter array into multiple counter groups according to the counter bit width, wherein the total bit width of the counters in each counter group is equal to the counter bit width; and controlling the multiple counter groups to count the number of each type of first event in the event stream respectively.
[0019] In one possible implementation, the target event includes multiple first events, and controlling the multiple counter groups to count the number of the target events in the event stream includes: dividing a statistical period into multiple sub-periods, determining the first events counted by the counter array in each sub-period based on the target event, and controlling the counter array to count the number of the corresponding first events in the event stream in each sub-period.
[0020] In one possible implementation, the configuration information further indicates the priority of different types of first events included in the target event, and determining the events counted by each counter group in each sub-cycle according to the target event includes: determining the first events counted by the counter array in each sub-cycle according to the priority.
[0021] In one possible implementation, the plurality of event statistics modules synchronously receive the first control signal and the second control signal.
[0022] In one possible implementation, the event statistics module is activated upon receiving a first enable signal or a second enable signal; the first enable signal is sent synchronously to all event statistics modules, and the second enable signal is sent individually to some of the event statistics modules.
[0023] According to another aspect of this disclosure, a chip is provided that includes the system described above.
[0024] According to another aspect of this disclosure, an electronic device is provided, including the chip described above.
[0025] According to the performance monitoring system of this disclosure, the event statistics module controls the counter array to count the number of target events in the event stream based on the target events indicated by the configuration information. When the configuration information indicates that performance data should be reported through a first reporting method, the performance data output by the counter array is output to the output module. The output module arbitrates from multiple event statistics modules to select the first event statistics module to be reported, adjusts the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information of the first event statistics module, and then outputs the performance data from the first event statistics module to the memory. The counter array is located in or connected to the event statistics module, which improves the flexibility of the counter array's configuration. Adjusting the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information reduces the resource contention during performance data reporting and minimizes the impact on the original functions of the module. The target events, performance data reporting methods, etc., can all be set through configuration information, improving the configuration flexibility of the event statistics module and adapting to the performance monitoring needs of various modules.
[0026] Because the performance monitoring process is controlled by configuration information, the system is highly parameterized, which allows it to adapt more quickly to different types of modules.
[0027] Furthermore, the performance monitoring system proposed in this embodiment provides selectable data reporting methods, supports adjusting the priority of events to be statistically analyzed, reduces system overhead, and improves hardware resource utilization and energy efficiency.
[0028] Furthermore, the performance monitoring system proposed in this embodiment uses a configurable bit-width counter array, combined with time-division multiplexing technology, to support large-scale parallel monitoring of events; it also supports dynamically adjusting the statistical events and has scalability.
[0029] Furthermore, the performance monitoring system proposed in this embodiment can synchronously receive an enable signal and start multiple event statistics modules, thereby reducing the sampling time deviation of each event statistics module and improving the accuracy of global performance analysis; it can also start individual event statistics modules separately, thereby improving the control flexibility of the event statistics modules.
[0030] Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Attached Figure Description
[0031] The accompanying drawings, which are included in and form part of this specification, illustrate exemplary embodiments, features, and aspects of this disclosure together with the specification and serve to explain the principles of this disclosure.
[0032] Figure 1 illustrates an exemplary application scenario of a performance monitoring system according to an embodiment of the present disclosure.
[0033] Figure 2 is a schematic diagram showing the structure of a performance monitoring system according to an embodiment of the present disclosure.
[0034] Figure 3 is a schematic diagram showing the structure of an output module according to an embodiment of the present disclosure.
[0035] Figure 4 is a schematic diagram showing the structure of an event statistics module according to an embodiment of the present disclosure.
[0036] Figure 5 is a schematic diagram showing the structure of an event statistics module according to an embodiment of the present disclosure.
[0037] Figure 6 is a schematic diagram of a second reporting method according to an embodiment of the present disclosure.
[0038] Figure 7 is a schematic diagram illustrating the flow of a performance monitoring method according to an embodiment of the present disclosure.
Detailed Implementation
[0039] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.
[0040] As used herein, the terms “comprising,” “including,” “having,” or variations thereof are open-ended and include one or more of the stated features, integers, elements, steps, components, or functions, but do not exclude the presence or addition of one or more other features, integers, elements, steps, components, functions, or groups thereof.
[0041] When an element is referred to as “connected,” “coupled,” “responding,” or a variation thereof relative to another element, it may be directly connected, coupled, or responding to another element, or there may be an intermediate element present.
[0042] Although the terms first, second, third, etc., may be used herein to describe various elements / operations, these elements / operations should not be limited by these terms. These terms are only used to distinguish one element / operation from another. Therefore, without departing from the teachings of the inventive concept, a first element / operation in some embodiments may be referred to as a second element / operation in other embodiments.
[0043] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.
[0044] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.
[0045] The principle of PMU is to acquire the event stream generated when the module to be monitored is working, and to count specific types of events in the event stream through event counters. The statistical results are the monitored performance data, and the performance of the module to be monitored can be analyzed based on the performance data.
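The counting principle described above can be sketched in Python. The event names and the representation of the event stream as a sequence of strings are assumptions made for illustration; in hardware the events would be signals observed by counters.

```python
from typing import Dict, Iterable, Set


def count_target_events(event_stream: Iterable[str], target_events: Set[str]) -> Dict[str, int]:
    """Count only the configured target events; all other events are ignored."""
    counts = {name: 0 for name in target_events}
    for event in event_stream:
        if event in counts:
            counts[event] += 1
    return counts


# Example: three event types are monitored, one of which never occurs.
stream = ["cache_miss", "cache_hit", "cache_miss", "branch"]
performance_data = count_target_events(stream, {"cache_miss", "branch", "tlb_miss"})
```

Here `performance_data` holds the statistical result for each monitored event type, including a zero count for events that did not appear in the stream.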
[0046] The main drawbacks of existing PMU technologies are as follows:
[0047] 1. Events are counted with fixed-width counters; when the counter width is small, large-scale parallel monitoring of events cannot be supported.
[0048] 2. It only supports a single data reporting mechanism, which can easily lead to resource contention and may affect the original functions of the module.
[0049] 3. In some scenarios, it may be necessary to monitor multiple modules. Existing technology sets up a separate PMU for each module, and the PMUs corresponding to different modules run independently. The time deviation of the performance data collection of each module is large, which affects the accuracy of the overall performance analysis.
[0050] 4. Static design makes the events monitored by the PMU fixed, which makes it difficult to meet the diverse monitoring needs of different hardware modules.
[0051] In view of this, this disclosure proposes a performance monitoring system, method, chip, and electronic device. The performance monitoring system proposed in this disclosure has a more flexible configuration, can adapt to the performance monitoring needs of various modules, can reduce the degree of resource competition when reporting performance data, and reduce the impact on the original functions of the modules.
[0052] Because the performance monitoring process is controlled by configuration information, the system is highly parameterized, which allows it to adapt more quickly to different types of modules.
[0053] Furthermore, the performance monitoring system proposed in this embodiment provides selectable data reporting methods, supports adjusting the priority of events to be statistically analyzed, reduces system overhead, and improves hardware resource utilization and energy efficiency.
[0054] Furthermore, the performance monitoring system proposed in this embodiment uses a configurable bit-width counter array, combined with time-division multiplexing technology, to support large-scale parallel monitoring of events; it also supports dynamically adjusting the statistical events and has scalability.
[0055] Furthermore, the performance monitoring system proposed in this embodiment can synchronously receive an enable signal and start multiple event statistics modules, thereby reducing the sampling time deviation of each event statistics module and improving the accuracy of global performance analysis; it can also start individual event statistics modules separately, thereby improving the control flexibility of the event statistics modules.
[0056] Figure 1 illustrates an exemplary application scenario of a performance monitoring system according to an embodiment of the present disclosure.
[0057] As shown in Figure 1, the performance monitoring system is located in a first device, which may also include a system-on-a-chip (SOC), a graphics processing unit (GPU), memory, etc. The GPU may be the module to be monitored.
[0058] The performance monitoring system uses a counter array (not shown) to count events, and the statistical results are the performance data. Configuration information can be written to the performance monitoring system through a driver (not shown). The configuration information determines the working mode of the performance monitoring system, such as what events to monitor, which counters to use to monitor events, and how the performance data is reported.
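To make the role of the configuration information concrete, the following Python sketch gathers the items mentioned in this disclosure into a single record. The field names, default values, and the `validate` helper are hypothetical; the disclosure does not define a register layout or a driver interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ReportMethod(Enum):
    PUSH_TO_MEMORY = 1   # stands for the first reporting method
    READ_ON_REQUEST = 2  # stands for the second reporting method


@dataclass
class PmuConfig:
    """Hypothetical configuration written by a driver to one event statistics module."""

    target_events: List[str]
    report_method: ReportMethod = ReportMethod.PUSH_TO_MEMORY
    counter_bit_width: int = 32          # assumed default
    report_interval_cycles: int = 1000   # assumed encoding of the reporting frequency
    weight: int = 1                      # used by the output module for bandwidth share
    event_priority: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject configurations that are internally inconsistent."""
        if not self.target_events:
            raise ValueError("at least one target event is required")
        if self.counter_bit_width <= 0 or self.report_interval_cycles <= 0:
            raise ValueError("bit width and report interval must be positive")
        unknown = set(self.event_priority) - set(self.target_events)
        if unknown:
            raise ValueError(f"priority given for unmonitored events: {sorted(unknown)}")
```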
[0059] The performance monitoring system can be started via the SOC. After startup, the system begins to collect statistics on events and report performance data according to the configuration information. Performance data can be reported to memory or stored within the monitoring system, awaiting a request from another device before being reported to that device. This performance data provides support for performance analysis of the modules under monitoring.
[0060] Those skilled in the art will understand that the performance monitoring system can also be set on the GPU, and the specific location of the performance monitoring system is not limited in the embodiments disclosed herein.
[0061] The performance monitoring system can also be used to monitor the performance of other modules, such as memory performance. This disclosure does not limit the specific types of modules that the performance monitoring system can monitor.
[0062] This performance monitoring system is applicable to various scenarios, including graphics rendering, deep learning, and scientific computing. In graphics rendering, it can monitor texture unit utilization and rasterization latency to optimize the rendering pipeline. In deep learning, it can monitor tensor core events and analyze model training efficiency in real time. In scientific computing, it can globally and synchronously monitor the load of multi-core computing units and dynamically allocate computing tasks. The performance monitoring system is also suitable for many other scenarios; it can be applied to any scenario where event statistics are required. Details regarding the application scenarios of this performance monitoring system will not be elaborated further here.
[0063] Figure 2 is a schematic diagram showing the structure of a performance monitoring system according to an embodiment of the present disclosure.
[0064] As shown in Figure 2, in one possible implementation, the system includes multiple event statistics modules and an output module.
[0065] The event statistics module is used to: control the counter array to count the number of target events in the event stream according to the target events indicated by the configuration information; and output the performance data output by the counter array to the output module when the configuration information indicates that performance data should be reported through the first reporting method.
[0066] The output module is used to: arbitrate the first event statistics module to be reported from multiple event statistics modules, adjust the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information of the first event statistics module, and then output the performance data from the first event statistics module to the memory.
[0067] The counter array is located in the event statistics module or connected to the event statistics module.
[0068] For example, a performance monitoring system may include multiple event statistics modules. Each event statistics module is a PMU (Performance Monitor Unit). Each event statistics module is connected to an output module, which can be connected to memory via an Advanced eXtensible Interface (AXI) bus.
[0069] Configuration information can be written to multiple event statistics modules through a driver. The configuration information written to each event statistics module can indicate the target events that the event statistics module should monitor. For example, when the module to be monitored is a GPU, the target events may include GPU core utilization, cache hit rate, etc. The specific type of target event is not limited in the embodiments of this disclosure.
[0070] The event statistics module is used to control the counter array to count the number of target events in the event stream according to the target event indicated by the configuration information. The counter array is located in the event statistics module or connected to the event statistics module. This disclosure embodiment does not limit the specific location of the counter array.
[0071] When the counter array is located within the event statistics module, the event statistics module may include an input interface for the event stream. The event statistics module can receive the event stream and control the counter array to count the number of target events in the event stream, with the statistical results serving as performance data.
[0072] When a counter array is connected to an event statistics module, the counter array's placement within the electronic device becomes more flexible. Counter arrays can be placed near key modules that may require monitoring. In this case, the event stream input interface can be located at the counter array. The event statistics module can control the counter array to count the number of target events in the event stream and obtain the statistical results from the counter array as performance data.
[0073] The performance monitoring system of this disclosure supports a first reporting method for performance data. The first reporting method involves reporting performance data to memory via the AXI bus. Configuration information can indicate the performance data reporting method for the event statistics module.
[0074] In addition to transmitting performance data, the AXI bus is also used to transmit other types of data, such as GPU rendering results. Configuration information can also indicate the weighting information of the event statistics module, and based on this weighting information, the bandwidth percentage of performance data allowed to be transmitted along the communication path between the output module and memory (i.e., the AXI bus) can be determined.
[0075] When the configuration information indicates that performance data should be reported through the first reporting method, the event statistics module can output the performance data output by the counter array to the output module.
[0076] The output module can arbitrate the first event statistics module to be reported from multiple event statistics modules, adjust the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information of the first event statistics module, and then output the performance data from the first event statistics module to the memory.
[0077] The event statistics module can proactively output its own weight information to the output module, or the output module can proactively access the event statistics module to obtain its weight information. This embodiment of the disclosure does not limit the method or timing of the output module obtaining the weight information from the event statistics module.
[0078] Figure 3 is a schematic diagram showing the structure of an output module according to an embodiment of the present disclosure.
[0079] As shown in Figure 3, the output module may include a first arbiter and a second arbiter. Multiple inputs of the first arbiter are connected to multiple event statistics modules, and the output of the first arbiter is connected to one input of the second arbiter. The other inputs of the second arbiter are connected to other modules in the electronic device that transmit data to memory via the AXI bus (e.g., cache within the GPU).
[0080] Each event statistics module can count only a limited number of events. Therefore, if the number of events in the event stream exceeds the number that a single event statistics module can count, multiple event statistics modules need to be activated, all of which output performance data. Upon receiving performance data from multiple event statistics modules, the first arbiter arbitrates to select one of them as the first event statistics module, and then transmits the performance data from the first event statistics module to the second arbiter. The first arbiter can implement arbitration based on existing technology; its specific arbitration mechanism is not elaborated upon here.
[0081] The second arbiter adjusts the bandwidth ratio of performance data based on the weight information of the first event statistics module, and then transmits the performance data to memory via the AXI bus. The second arbiter can use weighted round-robin (WRR) arbitration. For example, assume the AXI bus transmits data A, data B, and data C, where data A is performance data from the first event statistics module. The weight information of the first event statistics module may include a weight of 3 for data A, a weight of 2 for data B, and a weight of 1 for data C. Based on this weight information, the bandwidth ratio of the performance data allowed to be transmitted via the communication path between the output module and memory (i.e., the AXI bus) can be 3 / (3 + 2 + 1) = 0.5. Each arbitration by the second arbiter can transmit a data volume of D. Therefore, the second arbiter can transmit data A in the 1st to 3rd arbitrations, data B in the 4th and 5th arbitrations, and data C in the 6th arbitration, with each transmission carrying a data volume of D.
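The weighted round-robin example above can be illustrated with a minimal sketch. The function names `wrr_schedule` and `bandwidth_share` are hypothetical and are not part of this disclosure; the sketch assumes the simplest WRR variant, in which each source receives a number of consecutive arbitration slots equal to its weight in every round.

```python
from itertools import islice

def wrr_schedule(weights):
    """Yield source names in weighted round-robin order, forever.

    Assumption: each round grants `weight` consecutive arbitration
    slots to each source, in the order the sources are listed.
    """
    while True:
        for name, weight in weights.items():
            for _ in range(weight):
                yield name

def bandwidth_share(weights, name):
    """Fraction of arbitration slots granted to `name` in one round."""
    return weights[name] / sum(weights.values())

# Data A is the performance data; weights follow the example in the text.
weights = {"A": 3, "B": 2, "C": 1}
one_round = list(islice(wrr_schedule(weights), 6))
print(one_round)                      # ['A', 'A', 'A', 'B', 'B', 'C']
print(bandwidth_share(weights, "A"))  # 0.5
```

With each slot carrying a data volume of D, data A occupies three of every six slots, which corresponds to the bandwidth ratio of 0.5 stated above. Other WRR variants that interleave the sources more finely would give the same ratio with a different slot order.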
[0082] This disclosure does not limit the specific arbitration method of the second arbiter, as long as the different types of data transmitted on the AXI bus can be transmitted to the memory in a balanced manner.
[0083] The embodiments disclosed herein do not limit the specific structure of the output module, as long as the output module can achieve the above functions.
[0084] It should be understood that the output module and the memory can also communicate in other ways, and the embodiments of this disclosure do not limit the specific communication method between the output module and the memory.
[0085] According to the performance monitoring system of this disclosure, the event statistics module controls the counter array to count the number of target events in the event stream based on the target events indicated by the configuration information. When the configuration information indicates that performance data should be reported through a first reporting method, the performance data output by the counter array is output to the output module. The output module arbitrates from multiple event statistics modules to select the first event statistics module to be reported, adjusts the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information of the first event statistics module, and then outputs the performance data from the first event statistics module to the memory. The counter array is located in or connected to the event statistics module, which improves the flexibility of the counter array's configuration. Adjusting the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information reduces the resource contention during performance data reporting and minimizes the impact on the original functions of the module. The target events, performance data reporting methods, etc., can all be set through configuration information, improving the configuration flexibility of the event statistics module and adapting to the performance monitoring needs of various modules.
[0086] In one possible implementation, the event statistics module is activated upon receiving a first enable signal or a second enable signal;
[0087] The first enable signal is a signal that is sent synchronously to all event statistics modules, and the second enable signal is a signal that is sent individually to some of the event statistics modules.
[0088] For example, the event statistics module can be started by outputting an enable signal to it. Once started, the event statistics module will perform the functions described above.
[0089] The enable signal can be the first enable signal. The first enable signal is a signal sent synchronously to all event statistics modules. In scenarios where the event stream includes many events, the first enable signal can be used to synchronously start multiple event statistics modules to jointly complete event statistics. This method ensures that multiple event statistics modules can start synchronously, reduces the time deviation of data collection between modules, and improves the accuracy of global performance analysis.
[0090] The enable signal can be a second enable signal. This second enable signal is sent separately to certain event statistics modules. In scenarios where the event stream includes relatively few events, the second enable signal can be used to activate only a portion of the event statistics modules. This approach reduces the power consumption of the performance monitoring system.
[0091] The enable signal can be output to the event statistics module by a driver (not shown). This disclosure does not limit the specific output method of the enable signal.
[0092] Figure 4 is a schematic diagram showing the structure of an event statistics module according to an embodiment of the present disclosure.
[0093] In one possible implementation, the event statistics module includes a configuration register, a first control unit, and a second control unit. The configuration register stores configuration information, which also indicates the reporting frequency.
[0094] The configuration register is specifically used to transmit the received first start signal and first stop signal to the first control unit;
[0095] The first control unit is specifically configured to, in response to receiving a first start signal, control the counter array to count the number of target events in the event stream, and in response to receiving a first stop signal, control the counter array to stop counting;
[0096] The configuration register is also used to output a first reporting signal to the first control unit according to the reporting frequency;
[0097] The first control unit is also configured to, in response to receiving the first reporting signal, when the configuration information indicates that performance data is reported through the first reporting method, notify the second control unit to output the performance data output by the counter array to the output module.
[0098] For example, as shown in Figure 4, the event statistics module includes a configuration register, a first control unit, and a second control unit. In the example of Figure 4, the counter array is also located in the event statistics module.
[0099] The configuration register can be used to store configuration information, and the configuration information can be adjusted by accessing the configuration register. This disclosure does not limit the specific method of accessing the configuration register; for example, the configuration register can be accessed through a driver. In this case, the first enable signal and the second enable signal received by the event statistics module can be written to the configuration register by the driver.
[0100] The first control unit can be used to control the counter array to start and stop event statistics, output performance data, adjust the counter bit width of the counter array, and adjust the counter statistics events. The second control unit can be a Direct Memory Access (DMA) controller, used to output the performance data output by the counter array to the output module.
[0101] The driver can write a first start signal to the configuration register. When the configuration register receives the first start signal, it can output the first start signal to the first control unit. In response to receiving the first start signal, the first control unit controls the counter array to count the number of target events in the event stream.
[0102] The driver can write a first stop signal to the configuration register. When the configuration register receives the first stop signal, it can output the first stop signal to the first control unit. In response to receiving the first stop signal, the first control unit controls the counter array to stop counting.
[0103] The first start signal and the first stop signal can also be received in other ways, and this disclosure does not limit this.
[0104] The configuration information can also indicate the reporting frequency. The reporting frequency refers to the frequency at which performance data is reported. The configuration register can also output a first reporting signal to the first control unit according to the reporting frequency. When the first control unit receives the first reporting signal, it determines that performance data needs to be reported once. When the configuration information indicates that performance data should be reported via the first reporting method, it determines that the performance data needs to be reported to memory. The first control unit can notify the second control unit to output the performance data output by the counter array to the output module.
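The start, stop, and periodic reporting behavior described above can be modeled with a short behavioral sketch. All class, method, and parameter names here are hypothetical. The sketch assumes that the reporting frequency is expressed as a number of cycles between first reporting signals and that a report is a snapshot of the counts that does not clear the counters; the disclosure does not specify either point.

```python
class CounterArray:
    """Counts occurrences of the configured target events."""
    def __init__(self, target_events):
        self.counts = {event: 0 for event in target_events}
        self.running = False

    def observe(self, event):
        # Only target events are counted, and only while counting is active.
        if self.running and event in self.counts:
            self.counts[event] += 1

class EventStatisticsModule:
    def __init__(self, target_events, report_interval, reporting_method="first"):
        self.counters = CounterArray(target_events)
        self.report_interval = report_interval    # assumed: cycles per report
        self.reporting_method = reporting_method
        self.sent_to_output_module = []            # stands in for the DMA path
        self._cycles = 0

    def write_start(self):
        """Driver writes the first start signal to the configuration register."""
        self.counters.running = True
        self._cycles = 0

    def write_stop(self):
        """Driver writes the first stop signal to the configuration register."""
        self.counters.running = False

    def tick(self, event=None):
        """Advance one cycle, optionally observing one event from the stream."""
        if event is not None:
            self.counters.observe(event)
        if not self.counters.running:
            return
        self._cycles += 1
        if self._cycles % self.report_interval == 0:
            self._on_first_reporting_signal()

    def _on_first_reporting_signal(self):
        # First reporting method: hand a snapshot to the output module.
        if self.reporting_method == "first":
            self.sent_to_output_module.append(dict(self.counters.counts))

module = EventStatisticsModule({"cache_hit"}, report_interval=4)
module.tick("cache_hit")              # ignored: counting has not started
module.write_start()
for ev in ["cache_hit", "cache_miss", "cache_hit", None, "cache_hit"]:
    module.tick(ev)
module.write_stop()
print(module.sent_to_output_module)   # [{'cache_hit': 2}]
```

In this run, one report is produced after the fourth cycle, when two target events have been counted; the third target event arrives after that report and is only visible in the counter itself.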
[0105] It should be understood that higher reporting frequency results in higher data accuracy, but also higher resource consumption. The reporting frequency can be dynamically adjusted according to the application scenario requirements to achieve higher data accuracy.
[0106] The configuration register can transmit more information to the first control unit, as long as it is information that the first control unit may use. This embodiment of the disclosure does not limit the specific content transmitted by the configuration register to the first control unit.
[0107] When the output module receives performance data from any event statistics module, it can actively access the configuration register to obtain the weight information of that event statistics module. The configuration register can also output the weight information of the event statistics module to the output module simultaneously with the performance data. This embodiment does not limit the specific method of outputting the weight information.
[0108] Figure 5 is a schematic diagram showing the structure of an event statistics module according to an embodiment of the present disclosure.
[0109] In one possible implementation, the event statistics module also includes a synchronization unit.
[0110] The synchronization unit is specifically used to receive a first control signal, and in response to the first control signal changing from a first state to a second state, output a second start signal to the first control unit, and in response to the first control signal changing from a second state to a first state, output a second stop signal to the first control unit.
[0111] The first control unit is specifically configured to, in response to receiving a second start signal, control the counter array to count the number of target events in the event stream, and in response to receiving a second stop signal, control the counter array to stop counting;
[0112] The synchronization unit is also configured to receive a second control signal and, in response to the second control signal changing from a first state to a second state or from a second state to a first state, output a second reporting signal to the first control unit.
[0113] The first control unit is also configured to, in response to receiving a second reporting signal, when the configuration information indicates that performance data is reported through the first reporting method, notify the second control unit to output the performance data output by the counter array to the output module.
[0114] For example, as shown in Figure 5, the event statistics module includes a configuration register, a first control unit, a second control unit, and a synchronization unit. In the example of Figure 5, the counter array is also located in the event statistics module.
[0115] The functions of the first control unit and the second control unit have been described above and will not be repeated here.
[0116] The synchronization unit can simultaneously receive a first control signal and a second control signal. The first control signal is used to control the event statistics module to start and stop event statistics. The second control signal is used to control the event statistics module to report performance data.
[0117] For example, in response to the first control signal changing from a first state to a second state, the synchronization unit outputs a second start signal to the first control unit, and in response to the first control signal changing from the second state to the first state, it outputs a second stop signal to the first control unit. The first state can be a low level, and the second state can be a high level.
[0118] The second start signal has the same function as the first start signal. The second stop signal has the same function as the first stop signal. That is, in response to receiving the second start signal, the first control unit controls the counter array to count the number of target events in the event stream, and in response to receiving the second stop signal, controls the counter array to stop counting.
[0119] The synchronization unit responds to the second control signal changing from the first state to the second state or from the second state to the first state by outputting a second reporting signal to the first control unit.
[0120] The second reporting signal has the same function as the first reporting signal. That is, in response to receiving the second reporting signal, when the configuration information indicates that performance data should be reported through the first reporting method, the first control unit notifies the second control unit to output the performance data output by the counter array to the output module.
[0121] This approach makes the counter array control mechanism of the event statistics module more flexible.
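The edge-driven behavior of the synchronization unit can be sketched as follows. The class name `SynchronizationUnit`, the method `sample`, and the string labels are hypothetical. The sketch assumes that the first state is a low level (0), that the second state is a high level (1), and that both control signals have already been synchronized to the local clock domain.

```python
class SynchronizationUnit:
    """Turns level changes on two control signals into control pulses."""
    def __init__(self):
        self.prev_first = 0    # assumed reset value: first state (low)
        self.prev_second = 0

    def sample(self, first_ctrl, second_ctrl):
        """Return the signals to send to the first control unit this cycle."""
        out = []
        if self.prev_first == 0 and first_ctrl == 1:
            out.append("second_start")     # first -> second state: start
        elif self.prev_first == 1 and first_ctrl == 0:
            out.append("second_stop")      # second -> first state: stop
        if second_ctrl != self.prev_second:
            out.append("second_report")    # any change: report once
        self.prev_first, self.prev_second = first_ctrl, second_ctrl
        return out

sync = SynchronizationUnit()
inputs = [(0, 0), (1, 0), (1, 1), (1, 0), (0, 0)]
trace = [sync.sample(first, second) for first, second in inputs]
print(trace)
# [[], ['second_start'], ['second_report'], ['second_report'], ['second_stop']]
```

Because the second control signal triggers a report on both edges, a single toggling line can request reports at twice its own frequency.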
[0122] In one possible implementation, multiple event statistics modules synchronously receive the first control signal and the second control signal.
[0123] For example, the first and second control signals can be broadcast from the SOC to each event statistics module. That is, multiple event statistics modules synchronously receive the first and second control signals. The SOC can distribute the first and second control signals through a clock tree network, and each event statistics module can synchronize the first and second control signals to its own clock domain via clock synchronization, and then use the first and second control signals to control the synchronization unit. In this way, the acquisition time deviation between event statistics modules can be further reduced, improving the accuracy of global performance analysis.
[0124] The performance monitoring system of this disclosure also supports a second reporting method for performance data. The second reporting method involves storing performance data in a configuration register and waiting for it to be retrieved by other modules. Figure 6 shows a schematic diagram of the second reporting method according to an embodiment of this disclosure.
[0125] In one possible implementation, when the configuration information indicates that performance data should be reported via a second reporting method,
[0126] The first control unit is also used to store the performance data output by the counter array in a configuration register;
[0127] The configuration register is also used to output the stored performance data when a request for performance data is received.
[0128] For example, as shown in Figure 6, if the configuration information indicates that performance data should be reported via the second reporting method, the first control unit can store the performance data output by the counter array into the configuration register. Upon receiving a request to retrieve performance data, the configuration register outputs the stored performance data to the requesting party.
[0129] This approach makes performance data reporting more flexible. When configuration register resources are insufficient, or in high-throughput scenarios, the first reporting method can be preferred. The second reporting method has lower latency and can be preferred when performance data must be reported in real time, which reduces resource contention.
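The choice between the two reporting methods can be illustrated with a minimal sketch. The names `ConfigRegister`, `read_performance_data`, and `report` are hypothetical, and a plain list stands in for the path to the output module.

```python
class ConfigRegister:
    """Holds the reporting method and, for the second method, the data."""
    def __init__(self, reporting_method):
        self.reporting_method = reporting_method   # "first" or "second"
        self.stored_performance_data = None

    def read_performance_data(self):
        """Serve a request for performance data (second reporting method)."""
        return self.stored_performance_data

def report(config, performance_data, output_queue):
    """Route performance data according to the configured reporting method."""
    if config.reporting_method == "first":
        # First method: push toward memory through the output module.
        output_queue.append(performance_data)
    elif config.reporting_method == "second":
        # Second method: keep the data in the register until it is requested.
        config.stored_performance_data = performance_data
    else:
        raise ValueError(f"unknown reporting method: {config.reporting_method}")

queue = []
pull_cfg = ConfigRegister("second")
report(pull_cfg, {"cache_hit": 7}, queue)
print(queue)                             # []
print(pull_cfg.read_performance_data())  # {'cache_hit': 7}
```

With the second method nothing is placed on the shared path to memory, which is consistent with the lower latency and reduced contention noted above.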
[0130] It should be understood that the methods for reporting performance data are not limited to the examples described above; performance data can also be reported when the counter overflows. This disclosure does not limit the methods for reporting performance data.
[0131] It should be understood that although the event statistics module supports receiving the first start signal, the first stop signal, the first control signal, and the second control signal, in actual application, if the event statistics module has already received the first start signal, then the event statistics module will not receive the first control signal and the second control signal during this start-up process.
[0132] The following describes how to adjust the counter bit width of a counter array.
[0133] In one possible implementation, the target event includes multiple first events, and when the counter array is located in the event statistics module, the configuration information also indicates the counter bit width.
[0134] The event statistics module is specifically used to divide the counter array into multiple counter groups according to the counter bit width, and the total bit width of the counters in each counter group is equal to the counter bit width;
[0135] Control multiple counter groups to count the number of each type of first event in the event stream.
[0136] For example, when the counter array is located in the event statistics module, it can include multiple fixed-width counters, such as 32-bit counters.
[0137] The target event can include multiple first events. Different first events may require different statistical times. The statistical time required for a first event determines the counter bit width for that first event. Configuration information can indicate the counter bit width for each type of first event.
[0138] Based on the counter bit width indicated in the configuration information, the event statistics module can divide the counter array into multiple counter groups. The total bit width of the counters in each counter group is equal to the counter bit width of the first event to be counted in that counter group. The event statistics module can control multiple counter groups to count the number of each type of first event in the event stream.
[0139] For example, some events require high-precision, long-term statistical analysis to determine the performance of the module being monitored when parsing performance data, such as memory bandwidth statistics. In this case, the counter width for the event can be configured to 128 bits. The event statistics module can group four adjacent 32-bit counters into a group based on the counter width, with the total bit width of the counters in this group equal to 128 bits, equivalent to a single 128-bit counter. This counter group can be used to monitor memory bandwidth statistics.
[0140] Other events require shorter statistical time, such as cache hit rate and branch prediction error. In this case, the counter width for this event can be configured to 64 bits. The event statistics module can group two adjacent 32-bit counters into a group based on the counter width. The total bit width of the counters in this group is equal to 64 bits, which is equivalent to a 64-bit counter. This counter group can be used to monitor cache hit events or branch prediction error events.
[0141] Some events require even shorter statistical time, such as GPU core utilization. In this case, the counter bit width for this event can be configured to 32 bits. Since the counter bit width indicated by the configuration information is the same as the fixed bit width of the counters in the counter array, it is sufficient to select one counter as a separate counter group for monitoring GPU core utilization.
[0142] It should be understood that the methods for grouping counters are not limited to the examples above. Multiple counter groups can be selected in the counter array to count the number of the same first event to ensure statistical accuracy. The target event may also include only one event. As long as the counter group can meet the statistical time requirements of the event, the specific grouping method of the counters is not limited in the embodiments of this disclosure.
[0143] In this way, the counter bit width can be parameterized, making the statistical method of the counter array more flexible.
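The grouping of fixed-width counters into wider logical counters can be sketched as follows. The function names, the event names, and the carry-chaining scheme are assumptions made for illustration; the disclosure states only that adjacent 32-bit counters are grouped so that their total bit width equals the configured counter bit width.

```python
COUNTER_BITS = 32
MASK = (1 << COUNTER_BITS) - 1

def group_counters(num_counters, widths):
    """Assign adjacent 32-bit counters to groups: {event: [counter indices]}."""
    groups, next_free = {}, 0
    for event, width in widths.items():
        if width % COUNTER_BITS != 0:
            raise ValueError(f"{width} is not a multiple of {COUNTER_BITS}")
        needed = width // COUNTER_BITS
        if next_free + needed > num_counters:
            raise ValueError("not enough counters in the array")
        groups[event] = list(range(next_free, next_free + needed))
        next_free += needed
    return groups

def increment(counters, group, amount=1):
    """Add to a group as one wide counter, lowest 32 bits first."""
    carry = amount
    for idx in group:
        total = counters[idx] + carry
        counters[idx] = total & MASK
        carry = total >> COUNTER_BITS
        if carry == 0:
            break
    # A carry out of the last counter is dropped, so the group wraps around.

def read(counters, group):
    """Combine the counters of a group into a single integer value."""
    value = 0
    for position, idx in enumerate(group):
        value |= counters[idx] << (position * COUNTER_BITS)
    return value

widths = {"memory_bandwidth": 128, "cache_hit": 64, "gpu_core_utilization": 32}
counters = [0] * 8
groups = group_counters(len(counters), widths)
print(groups["memory_bandwidth"])   # [0, 1, 2, 3]

# Force the low half of the 64-bit group to its maximum, then count once more.
counters[groups["cache_hit"][0]] = MASK
increment(counters, groups["cache_hit"])
print(read(counters, groups["cache_hit"]))   # 4294967296
```

The final read shows the carry propagating from the first 32-bit counter into the second, so the two counters together behave as one 64-bit counter.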
[0144] The configuration register can also output configuration information to the first control unit. Based on the reporting frequency, reporting method, counter bit width, and the number of counters in its controllable counter array indicated by the configuration information, the first control unit can determine which counters in its controllable counter array participate in event statistics. Additionally, a continuously active counter can be set in the counter array to provide clock information.
[0145] In one possible implementation, the target event includes multiple first events.
[0146] The event statistics module is specifically used to divide a statistical period into multiple sub-periods, determine the first event to be counted by the counter array in each sub-period based on the target event, and control the counter array to count the number of the corresponding first events in the event stream in each sub-period.
[0147] For example, by using time-division multiplexing, the event statistics module can count more types of events in a shorter time. The principle of time-division multiplexing is to use the same resources to complete different tasks in different time periods. If the target event includes multiple first events, a statistical period can be divided into multiple sub-periods, with different sub-periods corresponding to different time periods. Based on the target event, the first event to be counted by the counter array in each sub-period is determined, and in each sub-period, the counter array is controlled to count the number of the corresponding first events in the event stream.
[0148] As mentioned above, a counter group with a total width of 64 bits can be obtained through grouping and used to count cache hit rate or branch prediction error. If the target event includes both cache hit rate and branch prediction error, the statistical period can be divided into two sub-periods. The counter group is configured to count cache hit rate in the first sub-period and branch prediction error in the second sub-period. The performance data is then reported after cache hit rate has been counted in the first sub-period, and again after branch prediction error has been counted in the second sub-period.
[0149] As mentioned above, a 32-bit counter can be used to count GPU core utilization. If the utilization of multiple GPU cores needs to be monitored, the utilization of each GPU core can be treated as a separate first event. If the target events include GPU core 1 utilization, GPU core 2 utilization, GPU core 3 utilization, and GPU core 4 utilization, the statistical period can be divided into four sub-periods. The counter group is configured to count GPU core 1 utilization in the first sub-period, GPU core 2 utilization in the second sub-period, GPU core 3 utilization in the third sub-period, and GPU core 4 utilization in the fourth sub-period. The performance data is then reported at the end of each sub-period, once the utilization of the corresponding GPU core has been counted.
[0150] This approach enables a single event statistics module to support statistics for multiple events within one statistical period, improving its efficiency. For the same underlying counters, 64-bit counter groups can count twice as many events as 128-bit counter groups, and 32-bit counters can count four times as many.
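The division of a statistical period into sub-periods can be illustrated with a small simulation. The function `tdm_count`, the event names, and the timestamped event stream are hypothetical. The sketch assumes sub-periods of equal length and one report per sub-period; the disclosure does not require either.

```python
def tdm_count(event_stream, first_events, period_len):
    """Count one first event per sub-period with a single counter group.

    event_stream: list of (timestamp, event_name), 0 <= timestamp < period_len.
    first_events: the first event assigned to sub-period 0, 1, 2, ...
    Returns one (event_name, count) report per sub-period.
    """
    sub_len = period_len // len(first_events)   # assumed equal sub-periods
    reports = []
    for k, target in enumerate(first_events):
        start, end = k * sub_len, (k + 1) * sub_len
        count = sum(
            1 for t, event in event_stream
            if start <= t < end and event == target
        )
        reports.append((target, count))          # report after each sub-period
    return reports

stream = [
    (0, "cache_hit"), (1, "branch_mispredict"), (2, "cache_hit"),
    (5, "cache_hit"), (6, "branch_mispredict"), (7, "branch_mispredict"),
]
reports = tdm_count(stream, ["cache_hit", "branch_mispredict"], period_len=8)
print(reports)   # [('cache_hit', 2), ('branch_mispredict', 2)]
```

The cache hit at timestamp 5 is not counted, because the counter group is assigned to branch prediction errors during the second sub-period. Time-division multiplexing therefore samples each first event over part of the period rather than counting it exhaustively.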
[0151] In one possible implementation, the configuration information also indicates the priority of the different types of first events included in the target event.
[0152] The event statistics module is specifically used to determine, based on the priority, the first events that the counter array counts in each sub-period.
[0153] For example, the priority of different types of first events within the target event can be determined based on the real-time requirements of the event. Priorities can be categorized into high, medium, and low levels. Higher-priority first events can be counted earlier.
[0154] For example, target events may include two first events: memory access latency and power consumption statistics. Memory access latency has higher real-time requirements, so it can be set as a high priority, while power consumption statistics can be set as a low priority.
[0155] The event statistics module can determine the first event to be counted by the counter array in each sub-period based on priority. This allows the statistical results of events with high real-time requirements to be reported earlier.
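The priority-based assignment of first events to sub-periods can be sketched as a simple ordering step. The names `PRIORITY_RANK` and `schedule_sub_periods` are hypothetical, and the sketch assumes that events of equal priority keep their configured order.

```python
# Assumed encoding of the three priority levels mentioned in the text.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

def schedule_sub_periods(priorities):
    """Return first events in sub-period order, highest priority first.

    priorities: {event_name: "high" | "medium" | "low"}.
    sorted() is stable, so events with equal priority keep their order.
    """
    return sorted(priorities, key=lambda event: PRIORITY_RANK[priorities[event]])

priorities = {
    "power_consumption": "low",
    "memory_access_latency": "high",
}
order = schedule_sub_periods(priorities)
print(order)   # ['memory_access_latency', 'power_consumption']
```

Here memory access latency is counted in the first sub-period, so its result is reported before the power consumption statistics.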
[0156] The priority of events can be dynamically adjusted by modifying configuration information (this can be achieved through FPGA programmable logic). The target events may differ for different modules being monitored, and the priorities of the different first events included within the target events may also differ. This approach ensures that the performance monitoring system is compatible with various modules.
[0157] Those skilled in the art will understand that the configuration information can also indicate more, such as whether to disable event statistics. If a performance monitoring system is installed in the module to be monitored, the operation of the performance monitoring system will increase the power consumption of the module. If it is determined that the power consumption of the module to be monitored is too high, the configuration information can be adjusted so that the configuration information indicates that event statistics are disabled, thereby saving power consumption of the module to be monitored.
[0158] This disclosure also proposes a performance monitoring method. Figure 7 shows a schematic diagram illustrating the flow of a performance monitoring method according to an embodiment of the present disclosure.
[0159] As shown in Figure 7, in one possible implementation, the method is applied to a performance monitoring system, the system including multiple event statistics modules and an output module, and the method includes:
[0160] The event statistics module controls the counter array to count the number of target events in the event stream according to the target events indicated by the configuration information (step S71). When the configuration information indicates that performance data should be reported through the first reporting method, the performance data output by the counter array is output to the output module (step S72). The output module arbitrates the first event statistics module to be reported from multiple event statistics modules, adjusts the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and the memory according to the weight information of the first event statistics module (step S73), and then outputs the performance data from the first event statistics module to the memory (step S74). The counter array is located in the event statistics module or connected to the event statistics module.
[0161] In one possible implementation, the event statistics module includes a configuration register, a first control unit, and a second control unit. The configuration register stores the configuration information, which also indicates a reporting frequency. Controlling the counter array to count the number of target events in the event stream according to the target events indicated by the configuration information includes: using the configuration register to transmit a received first start signal and a first stop signal to the first control unit; using the first control unit, in response to receiving the first start signal, controlling the counter array to count the number of target events in the event stream, and in response to receiving the first stop signal, controlling the counter array to stop counting. When the configuration information indicates that performance data should be reported via a first reporting method, outputting the performance data output by the counter array to the output module includes: using the configuration register to output a first reporting signal to the first control unit according to the reporting frequency; using the first control unit, in response to receiving the first reporting signal, when the configuration information indicates that performance data should be reported via the first reporting method, notifying the second control unit to output the performance data output by the counter array to the output module.
[0162] In one possible implementation, the event statistics module further includes a synchronization unit. The step of controlling the counter array to count the number of target events in the event stream according to the target event indicated by the configuration information includes: using the synchronization unit to receive a first control signal; in response to the first control signal changing from a first state to a second state, outputting a second start signal to the first control unit; and in response to the first control signal changing from a second state to a first state, outputting a second stop signal to the first control unit; using the first control unit, in response to receiving the second start signal, controlling the counter array to count the number of target events in the event stream; and in response to receiving the second stop signal, controlling the counter array to stop counting. The step of outputting the performance data output by the counter array to the output module when the configuration information indicates that performance data should be reported via a first reporting method includes: using the synchronization unit to receive a second control signal; in response to the second control signal changing from a first state to a second state or from a second state to a first state, outputting a second reporting signal to the first control unit; and using the first control unit, in response to receiving the second reporting signal, notifying the second control unit to output the performance data output by the counter array to the output module when the configuration information indicates that performance data should be reported via the first reporting method.
[0163] In one possible implementation, when the configuration information indicates that performance data is reported via a second reporting method, the method further includes: using the first control unit to store the performance data output by the counter array in the configuration register; and using the configuration register to output the stored performance data when a request to obtain performance data is received.
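The difference between the two reporting methods can be illustrated with a short sketch. The names `ConfigRegister`, `handle_report`, and the `"first"` / `"second"` method labels are hypothetical, and a Python list again stands in for the output module.

```python
class ConfigRegister:
    """Holds the most recent counter snapshot for readout on request."""

    def __init__(self):
        self._snapshot = {}

    def store(self, counts):
        """Called by the first control unit under the second reporting method."""
        self._snapshot = dict(counts)

    def read(self):
        """Serves a request to obtain performance data."""
        return dict(self._snapshot)


def handle_report(method, counts, register, output_module):
    """Route performance data according to the configured reporting method."""
    if method == "first":
        output_module.append(dict(counts))   # pushed toward the output module
    elif method == "second":
        register.store(counts)               # kept until it is requested
    else:
        raise ValueError(f"unknown reporting method: {method}")


reg = ConfigRegister()
out = []
handle_report("second", {"cache_miss": 5}, reg, out)
polled = reg.read()
```

Under the second method nothing travels on the path to the output module, so the data reaches the requester only when it is read from the register.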
[0164] In one possible implementation, the target event includes multiple first events. When the counter array is located in the event statistics module, the configuration information also indicates a counter bit width. Controlling the counter array to count the number of the target events in the event stream includes: dividing the counter array into multiple counter groups according to the counter bit width, wherein the total bit width of the counters in each counter group is equal to the counter bit width; and controlling the multiple counter groups so that each counter group counts the number of a respective type of first event in the event stream.
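One way to picture the grouping by counter bit width is the sketch below. It assumes the counter array consists of equally sized base counters that are chained with a ripple carry, with the first counter of a group holding the least significant bits; the disclosure does not prescribe this arrangement, and the function names are hypothetical.

```python
def make_groups(num_counters, base_width, group_width):
    """Split counter indices into groups whose total width equals group_width."""
    if group_width % base_width != 0:
        raise ValueError("group_width must be a multiple of base_width")
    per_group = group_width // base_width
    return [list(range(i, i + per_group))
            for i in range(0, num_counters - per_group + 1, per_group)]


def increment(array, group, base_width):
    """Add one to a group, carrying from each base counter into the next."""
    mask = (1 << base_width) - 1
    for idx in group:
        array[idx] = (array[idx] + 1) & mask
        if array[idx] != 0:
            break  # this counter did not wrap, so there is no carry


def read(array, group, base_width):
    """Combine the base counters of a group into a single value."""
    return sum(array[idx] << (k * base_width) for k, idx in enumerate(group))


array = [0] * 8                      # eight 4-bit base counters (assumed sizes)
groups = make_groups(8, 4, 8)        # four groups, each 8 bits wide
for _ in range(17):
    increment(array, groups[0], 4)   # group 0 counts one type of first event
value = read(array, groups[0], 4)
```

With these assumed sizes, configuring a 16-bit width instead would yield two wider groups from the same eight base counters, trading the number of concurrently counted events for counting range.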
[0165] In one possible implementation, the target event includes multiple first events, and controlling the counter array to count the number of the target events in the event stream includes: dividing a statistical period into multiple sub-periods, determining the first events counted by the counter array in each sub-period based on the target event, and controlling the counter array to count the number of the corresponding first events in the event stream in each sub-period.
[0166] In one possible implementation, the configuration information further indicates the priority of different types of first events included in the target event, and determining the first events counted by the counter array in each sub-period based on the target event includes: determining the first events counted by the counter array in each sub-period according to the priority.
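The time-division multiplexing described in the two preceding paragraphs can be sketched as follows. The scheduling policy shown here (sort the first events by descending priority and assign them to sub-periods in chunks equal to the number of available counters) is only one possible policy and is an assumption, as are the function names and the event names.

```python
def schedule(priorities, slots):
    """Assign first events to sub-periods, highest priority first.

    priorities: mapping of event name -> priority (larger means more important)
    slots:      number of events the counter array can count at once
    """
    ordered = sorted(priorities, key=lambda ev: -priorities[ev])
    return [ordered[i:i + slots] for i in range(0, len(ordered), slots)]


def count_period(stream_by_subperiod, plan):
    """Count, in each sub-period, only the events scheduled for that sub-period."""
    counts = {}
    for events, active in zip(stream_by_subperiod, plan):
        for ev in events:
            if ev in active:
                counts[ev] = counts.get(ev, 0) + 1
    return counts


plan = schedule({"l2_miss": 3, "stall": 2, "tlb_miss": 1}, slots=2)
stream = [
    ["l2_miss", "tlb_miss", "stall", "l2_miss"],  # sub-period 0
    ["tlb_miss", "l2_miss"],                      # sub-period 1
]
counts = count_period(stream, plan)
```

In this sketch an event that occurs outside its scheduled sub-period is not counted, which is the usual cost of sharing a small counter array among more events than it can hold at once.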
[0167] In one possible implementation, the plurality of event statistics modules synchronously receive the first control signal and the second control signal.
[0168] In one possible implementation, the event statistics module is activated upon receiving a first enable signal or a second enable signal; the first enable signal is sent synchronously to all event statistics modules, and the second enable signal is sent individually to each of a subset of the event statistics modules.
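The two activation paths can be modeled with a short sketch. Representing the second enable signal as a per-module bit mask is an assumption made for illustration, and the function name `active_modules` is hypothetical.

```python
def active_modules(num_modules, global_enable, local_enable_mask):
    """Return the indices of the event statistics modules that are activated.

    global_enable:     first enable signal, broadcast to every module
    local_enable_mask: second enable signal, one bit per module (bit i -> module i)
    """
    return [i for i in range(num_modules)
            if global_enable or (local_enable_mask >> i) & 1]


subset = active_modules(4, False, 0b0101)   # only modules 0 and 2 are enabled
everyone = active_modules(4, True, 0)       # broadcast enables all modules
```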
[0169] This disclosure also proposes a chip including the performance monitoring system described above. The chip may be, for example, a GPU; this disclosure does not limit the specific type of chip.
[0170] This disclosure also proposes an electronic device including the chip described above. The electronic device may be a terminal device or a server; this disclosure does not limit the specific type of electronic device.
[0171] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a segment, or a portion of instructions that contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
[0172] Various embodiments of this disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A performance monitoring system, characterized in that the system includes multiple event statistics modules and an output module; the event statistics module is used to: control a counter array to count the number of target events in an event stream according to the target events indicated by configuration information; and, when the configuration information indicates that performance data is reported through a first reporting method, output the performance data output by the counter array to the output module; the output module is used to: select by arbitration, from among the multiple event statistics modules, a first event statistics module that is to report, adjust, according to weight information of the first event statistics module, the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and a memory, and then output the performance data from the first event statistics module to the memory; wherein the counter array is located in the event statistics module or connected to the event statistics module.
2. The system according to claim 1, characterized in that the event statistics module includes a configuration register, a first control unit, and a second control unit, wherein the configuration register stores the configuration information, and the configuration information also indicates a reporting frequency; the configuration register is specifically used to transmit a received first start signal and a received first stop signal to the first control unit; the first control unit is specifically configured to, in response to receiving the first start signal, control the counter array to count the number of target events in the event stream, and, in response to receiving the first stop signal, control the counter array to stop counting; the configuration register is also used to output a first reporting signal to the first control unit according to the reporting frequency; and the first control unit is further configured to, in response to receiving the first reporting signal and when the configuration information indicates that performance data is reported through the first reporting method, notify the second control unit to output the performance data output by the counter array to the output module.
3. The system according to claim 2, characterized in that the event statistics module also includes a synchronization unit; the synchronization unit is specifically used to receive a first control signal, and, in response to the first control signal changing from a first state to a second state, output a second start signal to the first control unit, and, in response to the first control signal changing from the second state to the first state, output a second stop signal to the first control unit; the first control unit is specifically configured to, in response to receiving the second start signal, control the counter array to count the number of target events in the event stream, and, in response to receiving the second stop signal, control the counter array to stop counting; the synchronization unit is further configured to receive a second control signal, and, in response to the second control signal changing from a first state to a second state or from the second state to the first state, output a second reporting signal to the first control unit; and the first control unit is further configured to, in response to receiving the second reporting signal and when the configuration information indicates that performance data is reported through the first reporting method, notify the second control unit to output the performance data output by the counter array to the output module.
4. The system according to claim 2 or 3, characterized in that, when the configuration information indicates that performance data is reported via a second reporting method, the first control unit is further configured to store the performance data output by the counter array in the configuration register; and the configuration register is also used to output the stored performance data when a request to obtain performance data is received.
5. The system according to any one of claims 1-3, characterized in that the target event includes multiple first events; when the counter array is located in the event statistics module, the configuration information also indicates a counter bit width; and the event statistics module is specifically used to divide the counter array into multiple counter groups according to the counter bit width, wherein the total bit width of the counters in each counter group is equal to the counter bit width, and to control the multiple counter groups to count the number of each type of first event in the event stream.
6. The system according to any one of claims 1-3, characterized in that the target event includes multiple first events; and the event statistics module is specifically used to divide a statistical period into multiple sub-periods, determine the first events to be counted by the counter array in each sub-period based on the target event, and control the counter array to count the number of the corresponding first events in the event stream in each sub-period.
7. The system according to claim 6, characterized in that the configuration information also indicates the priority of the different types of first events included in the target event; and the event statistics module is specifically used to determine, based on the priority, the first events that the counter array counts in each sub-period.
8. The system according to claim 3, characterized in that the multiple event statistics modules synchronously receive the first control signal and the second control signal.
9. The system according to claim 1, characterized in that the event statistics module is activated upon receiving a first enable signal or a second enable signal; the first enable signal is a signal that is sent synchronously to all event statistics modules, and the second enable signal is a signal that is sent individually to each of a subset of the event statistics modules.
10. A chip, characterized in that the chip includes the system according to any one of claims 1-9.
11. An electronic device, characterized in that the electronic device includes the chip according to claim 10.
12. A performance monitoring method, characterized in that the method is applied to a performance monitoring system, the system including multiple event statistics modules and an output module, and the method includes: the event statistics module controlling a counter array to count the number of target events in an event stream according to the target events indicated by configuration information, and, when the configuration information indicates that performance data is reported through a first reporting method, outputting the performance data output by the counter array to the output module; and the output module selecting by arbitration, from among the multiple event statistics modules, a first event statistics module that is to report, adjusting, according to weight information of the first event statistics module, the bandwidth ratio of the performance data allowed to be transmitted on the communication path between the output module and a memory, and then outputting the performance data from the first event statistics module to the memory; wherein the counter array is located in the event statistics module or connected to the event statistics module.