Waveform data acquisition method, electronic device, and storage medium

By capturing and filtering user-specified target node data in a hardware simulation system, the problem of multi-node capture failure in existing technologies is solved, enabling flexible waveform data acquisition and efficient hardware simulation.

CN119249985B · Active · Publication Date: 2026-06-26 · SHANGHAI UNIVISTA IND SOFTWARE GRP CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANGHAI UNIVISTA IND SOFTWARE GRP CO LTD
Filing Date: 2024-10-08
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In hardware simulation systems based on Boolean processing units, existing technology cannot capture the outputs of more than 8 processing units at the same time, so some information capture fails. To work around this, users must recompile the design and impose additional constraints on the compiler, which reduces the execution efficiency of processing unit instructions. Instructions must also be recompiled whenever different node signals are selected.

Method used

The method acquires the target nodes specified by the user, captures the target data of each processing unit, and compresses and stores it in the cache. A mask module and filters select the target data, and a change detection module ensures that only changed data is stored, which reduces cache and memory usage and improves simulation speed.

Benefits of technology

It enables users to capture any number of target nodes, avoids recompilation, improves debugging efficiency, reduces cache and memory space usage, and increases hardware simulation speed.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the technical field of chip design, and in particular to a waveform data acquisition method, an electronic device, and a storage medium. According to the target node specified by the user, the target data of each processing unit is captured and compressed and stored in a cache, and in each device cycle the target data in the cache is moved to memory. The step of capturing the target data of the f-th processing unit and compressing and storing it in the cache includes: filtering, according to the target node, the data stream continuously output by the f-th processing unit to obtain data blocks including the target node data; and, while caching the data blocks that have changed, setting the change flag bit of each changed data block to valid and the change flag bit of each unchanged data block to invalid. This allows the user to specify a large number of target nodes, or to change the target nodes, without recompilation, which improves debugging efficiency; it also reduces the cache and memory space occupied and improves hardware simulation speed.

Description

Technical Field

[0001] This invention relates to the field of chip design technology, and in particular to a method for acquiring waveform data, an electronic device, and a storage medium.

Background Technology

[0002] Currently, in hardware simulation systems based on Boolean Processing Units (BPs), a common method for achieving full-visibility debugging is to capture the computation results of a subset of nodes (e.g., all flip-flops in the netlist) and upload them to the host. The host then performs software simulation on the gate-level netlist of the device under test (DUT) based on the states of this subset of nodes, thus obtaining the waveforms of all nodes in the gate-level netlist. Capturing the computation results of a subset of nodes is achieved by dynamically configuring a set of MUXs to select a portion of the BP outputs. For example, in a cluster with 64 processing units, the outputs of 8 processing units are selected for capture to obtain the states of a subset of nodes. The hardware design then requires deploying eight 64-to-1 MUXs, whose input selection is controlled in real time via instructions. This design introduces several problems, including:

[0003] First, due to the number and selection mechanism of MUXs, only eight processing unit outputs can be selected and captured at any given time, regardless of changes in control signals. When more than eight processing unit outputs in a cluster need to be captured, some information capture will fail. To avoid this, the user design needs to be recompiled, and additional constraints need to be imposed on the processing unit compiler, thereby reducing the execution efficiency of processing unit instructions. For example, more wait instructions need to be inserted to prevent the required capture operations from being executed in the same device cycle.

[0004] Second, when users need to select different node signals for capture, it is necessary to recompile and generate new instructions.

Summary of the Invention

[0005] To address the aforementioned technical problems, the present invention adopts the following technical solution: a method for acquiring waveform data, the method comprising the following steps:

[0006] S100, retrieve the target node specified by the user.

[0007] S200, according to the target node, capture the target data of each processing unit and compress and store it in the cache, and in each device cycle, move the target data in the cache to memory; wherein, the step of capturing the target data of the f-th processing unit and compressing and storing it in the cache includes:

[0008] S210, based on the target node, filter the data stream continuously output by the f-th processing unit to obtain a data block including the target node data.

[0009] S220, if the target data of the i-th data block in the current device cycle is different from the target data of the i-th data block in the previous device cycle, then the target data of the i-th data block in the current device cycle is stored in the data cache, and at the same time the status information of the change flag bit of the i-th data block in the current device cycle is set to valid and saved to the flag bit cache; otherwise, the target data of the i-th data block in the current device cycle is not stored, and only the status information of the change flag bit of the i-th data block in the current device cycle is set to invalid and saved to the flag bit cache.

[0010] The present invention has at least the following beneficial effects:

[0011] This invention provides a method, electronic device, and storage medium for acquiring waveform data. It captures data from all nodes and filters it according to user-specified target nodes to obtain target data. While caching changed data blocks, it marks the changed data blocks as valid and the unchanged data blocks as invalid, facilitating waveform data recovery. The number of user-specified target nodes is not limited by hardware, allowing users to specify a large number of target nodes or change target nodes without recompilation, thus improving debugging efficiency. Furthermore, it reduces the cache and memory space required, improving hardware simulation speed.

Description of the Drawings

[0012] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a schematic diagram of a debugging system based on hardware simulation provided in Embodiment 1 of the present invention;

[0014] Figure 2 is a schematic diagram of another debugging system based on hardware simulation provided in Embodiment 1 of the present invention;

[0015] Figure 3 is a flowchart of a waveform data acquisition method provided in Embodiment 2 of the present invention.

Detailed Implementation

[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0017] Unless otherwise defined, all technical and scientific terms used in the embodiments of this invention have the same meaning as commonly understood by those skilled in the art.

[0018] Referring to Figure 1, a debugging system based on hardware simulation is shown, comprising M target modules, wherein the j-th target module comprises K processing units BPC_j and a capture module ProB_j. Each processing unit includes at least one node. Here BPC_j = {BPC_{j,1}, BPC_{j,2}, …, BPC_{j,r}, …, BPC_{j,K}}, where BPC_{j,r} is the r-th processing unit in BPC_j and r ranges from 1 to K.

[0019] In one implementation, the processing unit is a Boolean Processor (BP) or an FPGA (Field-Programmable Gate Array). Other processing unit types similar to those used in simulation also fall within the scope of this invention.

[0020] A node is any logical element in a circuit that generates observable signals.

[0021] In one implementation, a node is a flip-flop, register, combinational logic, signal line, module, or subsystem interface. In another implementation, the combinational logic is an adder or decoder. Other types of logic elements selected so that their state changes can be observed also fall within the scope of this invention.

[0022] Referring further to Figure 1, ProB_j includes: a mask module Mask_j, K filters Filter_j, a packaging module Pack_j, and a data caching module Pbuffer_j. Here Filter_j = {Filter_{j,1}, Filter_{j,2}, …, Filter_{j,r}, …, Filter_{j,K}}, where Filter_{j,r} is the r-th filter in Filter_j.

[0023] Furthermore, the mask module Mask_j is used to obtain the target node specified by the user.

[0024] It should be noted that in existing technologies, if a user wants to view the entire waveform, since only information from a subset of nodes is captured, the software needs to perform gate-level netlist simulation on the host side to reconstruct the waveforms of all nodes in order to achieve full waveform visualization and debugging. This results in users having to wait a long time to obtain the waveform file during the debugging process. The system provided in this embodiment allows users to arbitrarily specify target nodes, or specify all nodes as target nodes. When the user specifies some or all target nodes, the system stores the data of all user-specified target nodes in memory to achieve waveform visualization, without the need for recompilation or software simulation to recover the data. When the user changes the specified target nodes, the system also does not need to recompile; it only needs to re-acquire the data of the changed subset of target nodes and re-store it in memory.

[0025] Furthermore, the K filters Filter_j are each connected to Mask_j, and each filter is connected to one processing unit. The r-th filter Filter_{j,r} is connected to the r-th processing unit BPC_{j,r} of BPC_j, and is used to obtain the data stream continuously output by BPC_{j,r} and to filter the target data of the target node out of that data stream.

[0026] Within ProB_j of the j-th target module, every filter is connected to Mask_j, and Mask_j sends the user-specified target node it has acquired to all filters in ProB_j simultaneously.

[0027] Within ProB_j of the j-th target module, each filter is connected to its corresponding processing unit. For example, Filter_{j,r} is connected to BPC_{j,r}; Filter_{j,r} acquires the data stream continuously output by that processing unit and then filters out the target data of the target node specified by the user.

[0028] Furthermore, the packaging module Pack_j is connected to all filters in Filter_j and is used to receive the target data and package it.

[0029] Furthermore, the data caching module Pbuffer_j is connected to Pack_j and is used to cache the target data. In each device cycle, the target data in Pbuffer_j is moved to memory, and the target data in memory is used to present a fully visible waveform.

[0030] That is, when the processing units finish their operations for a device cycle, the contents of Pbuffer_j are packaged into a data frame and transmitted to memory. The device cycle is also called the DUT cycle. The frame header of the data frame contains a timestamp and identification information, which are used to present a fully visible waveform.
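The data frame described above can be illustrated with a minimal sketch. The patent states only that the frame header carries a timestamp and identification information; the field widths, the byte order, and the names `HEADER`, `pack_frame`, and `unpack_frame` are assumptions made for illustration.

```python
import struct
from typing import Tuple

# Hypothetical header layout (an assumption, not taken from the patent):
# 8-byte device-cycle timestamp, 2-byte module id, 4-byte payload length,
# big-endian with no padding.
HEADER = struct.Struct(">QHI")


def pack_frame(timestamp: int, module_id: int, payload: bytes) -> bytes:
    """Prefix one device cycle of cached target data with a frame header."""
    return HEADER.pack(timestamp, module_id, len(payload)) + payload


def unpack_frame(frame: bytes) -> Tuple[int, int, bytes]:
    """Split a frame back into (timestamp, module_id, payload)."""
    timestamp, module_id, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    return timestamp, module_id, payload
```

Carrying the payload length in the header is a design choice of this sketch: it lets a reader walk a sequence of variable-length frames in memory, which matters because the amount of cached data can differ from one device cycle to the next.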

[0031] In Embodiment 1 of this invention, a capture module is configured for each group of K processing units. The capture module includes a mask module and one filter per processing unit. The mask module acquires the target node specified by the user and sends it to each filter. Each filter filters the target data of the target node out of the data stream continuously output by its corresponding processing unit. Finally, the capture module packages the data and sends it to memory. The system places no hardware limit on the number of target nodes specified by the user. When the user needs to capture the outputs of more than 8 processing units, or to change the target node, recompilation is not required. When the user needs to view the data of all nodes, waveform reconstruction through software simulation is not required; waveform data is obtained directly from the target data, improving debugging efficiency.

[0032] In one implementation, to address the storage space consumption and the reduced hardware simulation speed caused by data transmission bandwidth limitations, a change detection module is also connected between the packaging module and each filter in Filter_j.

[0033] It should be noted that the number of filters and change detection modules is the same, with each filter configured with one change detection module.

[0034] Furthermore, the change detection module change_{j,r}, connected between Pack_j and Filter_{j,r}, is used to detect whether the target data in the current data block has changed compared to the target data in the corresponding data block in the previous device cycle. If the target data has changed, the current target data is sent to the packaging module; otherwise, it is not sent. For easier understanding, refer to Figure 2: ProB_j further includes K change detection modules, where the first change detection module change_{j,1} is connected between Pack_j and Filter_{j,1}, the second change detection module change_{j,2} is connected between Pack_j and Filter_{j,2}, and so on, with the K-th change detection module change_{j,K} connected between Pack_j and Filter_{j,K}.

[0035] Specifically, the change detection module only sends changed data to the packaging module; that is, the packaging module only stores data that has changed between adjacent device cycles. This significantly reduces cache and memory usage. Furthermore, the reduced amount of data that needs to be uploaded to memory alleviates the pressure on data transmission bandwidth and improves hardware simulation speed.

[0036] In one implementation, change_{j,r} is also used to generate the status information of the change flag bits. When the target data changes, the status information of the change flag bit of the data block to which the target data belongs is set to valid; otherwise, it is set to invalid.

[0037] The change flag status information identifies changes in the data within the data blocks continuously output by the current processing unit, and is used together with the changed target data stored in memory to present a waveform. That is, when the change flag status information is valid, the target data in memory is sequentially retrieved as the data of the current data block; when the change flag status information is invalid, the data of the corresponding data block from the previous device cycle is used as the data of the current data block. Specifically, when sequentially retrieving the target data in memory, target data is retrieved with the same width as the data width of the user-specified target node held in Mask_j. For example, if the user-specified target node contains 16 bits of data, then 16 bits of target data are sequentially retrieved from memory.

[0038] In this context, a data block is a fixed-size data segment divided from the continuously output data stream of the processing unit. For example, if the size of the continuously output data stream of the processing unit is 512 bits, and the fixed size of the data block is 32 bits, then 512 bits can be divided into 16 data blocks. If any bit in the current data block changes compared to the corresponding data block in the previous device cycle, then the current data block is considered different from the previous data block, and the status information of the change flag bit is set to valid.
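The 512-bit example above can be worked through in a short sketch. It models the stream as a Python integer and numbers blocks from the least significant bits upward; that ordering and the function names `split_blocks` and `change_flags` are assumptions for illustration, not details given in the patent.

```python
from typing import List


def split_blocks(stream: int, stream_bits: int = 512, block_bits: int = 32) -> List[int]:
    """Split a stream_bits-wide value into fixed-size blocks (block 0 = lowest bits)."""
    if stream_bits % block_bits != 0:
        raise ValueError("stream width must be a multiple of the block width")
    low_bits = (1 << block_bits) - 1
    count = stream_bits // block_bits
    return [(stream >> (i * block_bits)) & low_bits for i in range(count)]


def change_flags(prev: List[int], curr: List[int]) -> List[int]:
    """1 (valid) where a block differs in any bit from the previous cycle, else 0."""
    return [int(p != c) for p, c in zip(prev, curr)]


prev_cycle = split_blocks(0)
curr_cycle = split_blocks(1 << 70)  # flip one bit; bit 70 lies in block 70 // 32 = 2
flags = change_flags(prev_cycle, curr_cycle)
```

Here `len(prev_cycle)` is 16, matching the 512 / 32 division in the text, and the single flipped bit marks exactly one block (index 2) as changed.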

[0039] In one implementation, referring again to Figure 2, the j-th target module further includes a flag bit buffer module Fbuffer_j. Fbuffer_j is connected to each change detection module in the j-th target module and is used to sequentially save the status information of the change flag bits of all data blocks in all processing units. The status information of the change flag bits is used to recover the visual waveform.

[0040] In one implementation, change_{j,r} further includes: a current data cache, a temporary storage module, and a comparison module. The current data cache is directly connected to Filter_{j,r} and is used to cache the target data of the current data block. The temporary storage module is connected to the current data cache and is used to cache the target data of each data block in the previous device cycle. The comparison module is connected to both the current data cache and the temporary storage module, and compares whether the target data of the current data block is the same as the target data of the corresponding data block in the previous device cycle. If they are the same, there is no change; otherwise, the target data has changed.

[0041] In one embodiment, the system further includes:

[0042] The upload control module is connected to the flag cache module and the data cache module in each capture module, respectively. It is used to initiate an upload request when the flag cache module is full of data for one device cycle, and to generate a clock stop request when the flag cache module and / or the data cache module reach the overflow threshold.

[0043] The clock stop request is used to request the system to pause the clock, thereby halting the further flow of data into the corresponding cache module until the cache module has sufficient space to store new data. By pausing the clock in a timely manner, data loss or system errors can be avoided, thus ensuring the correctness and integrity of data processing.
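The two conditions handled by the upload control module can be sketched as follows. The patent does not give capacities or a threshold value, so `flag_capacity`, `data_capacity`, and `overflow_ratio` are hypothetical parameters, and the class name `UploadControl` is likewise an assumption.

```python
class UploadControl:
    """Toy model of the upload and clock-stop decisions for one capture module."""

    def __init__(self, flags_per_cycle: int, flag_capacity: int,
                 data_capacity: int, overflow_ratio: float = 0.75) -> None:
        self.flags_per_cycle = flags_per_cycle  # flag bits produced per device cycle
        self.flag_capacity = flag_capacity      # hypothetical flag-cache size, in bits
        self.data_capacity = data_capacity      # hypothetical data-cache size, in blocks
        self.overflow_ratio = overflow_ratio    # assumed threshold; not from the patent

    def upload_request(self, flag_fill: int) -> bool:
        """Request an upload once a full device cycle of flags is buffered."""
        return flag_fill >= self.flags_per_cycle

    def clock_stop_request(self, flag_fill: int, data_fill: int) -> bool:
        """Request a clock stop when either cache reaches its overflow threshold."""
        return (
            flag_fill >= self.overflow_ratio * self.flag_capacity
            or data_fill >= self.overflow_ratio * self.data_capacity
        )
```

The two checks are independent, so a module can be requesting an upload while the clock keeps running, and the clock-stop request fires only when space is about to run out.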

[0044] Embodiment 2

[0045] Please refer to Figure 3, which illustrates a method for acquiring waveform data, the method comprising the following steps:

[0046] S100, retrieve the target node specified by the user.

[0047] S200: The target data of each processing unit is captured according to the target node and compressed and stored in the cache. In each device cycle, the target data in the cache is moved to memory.

[0048] It should be noted that the target node, processing unit, target data, device cycle, cache and memory in Embodiment 1 are all applicable to Embodiment 2 of the present invention, and will not be described again.

[0049] The steps of capturing the target data of the f-th processing unit and compressing and storing it in the cache include:

[0050] S210, based on the target node, filter the data stream continuously output by the f-th processing unit to obtain a data block including the target node data.

[0051] S210 also includes:

[0052] S211, Obtain the data stream continuously output by the f-th processing unit, wherein the data stream includes data from all nodes.

[0053] S212, the data stream is divided into multiple data blocks according to a preset size. It should be noted that each data block is bound to its own unique identifier, which is used to identify the order of data in the data stream and prevent data corruption.

[0054] S213, based on the target node, filter out the data in each data block that does not belong to the target node, and obtain a data block that includes the target data of the target node.

[0055] It should be noted that the unique identifier bound to the data block containing the target data remains unchanged after filtering.
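Steps S211 to S213 can be sketched as below. The sketch assumes that the user's target nodes are represented as a bit mask over the stream and that non-target bits are cleared rather than removed; the patent does not specify either detail, and the names `Block`, `split_stream`, and `filter_blocks` are illustrative.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    block_id: int  # unique identifier bound to the block (S212)
    data: int      # block payload, block_bits wide


def split_stream(stream: int, stream_bits: int, block_bits: int) -> List[Block]:
    """S212: cut the stream into fixed-size blocks and bind an id to each."""
    low_bits = (1 << block_bits) - 1
    count = stream_bits // block_bits
    return [Block(i, (stream >> (i * block_bits)) & low_bits) for i in range(count)]


def filter_blocks(blocks: List[Block], node_mask: int, block_bits: int) -> List[Block]:
    """S213: keep only target-node bits; the block id is left untouched."""
    low_bits = (1 << block_bits) - 1
    return [
        Block(b.block_id, b.data & (node_mask >> (b.block_id * block_bits)) & low_bits)
        for b in blocks
    ]


# 16-bit stream, 4-bit blocks; the user selects bits 0-3 and bit 9 as target nodes.
stream = 0b1010_0110_1111_0101
node_mask = 0b0000_0010_0000_1111
blocks = split_stream(stream, stream_bits=16, block_bits=4)
filtered = filter_blocks(blocks, node_mask, block_bits=4)
```

Changing the target nodes in this model only means supplying a different `node_mask`; nothing about how the stream is produced has to change, which mirrors the no-recompilation property described in the text.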

[0056] S220, if the target data of the i-th data block in the current device cycle is different from the target data of the i-th data block in the previous device cycle, then the target data of the i-th data block in the current device cycle is stored in the data cache, and at the same time, the status information of the change flag bit of the i-th data block in the current device cycle is set to valid and saved to the flag bit cache; otherwise, the target data of the i-th data block in the current device cycle is not stored, and only the status information of the change flag bit of the i-th data block in the current device cycle is set to invalid and saved to the flag bit cache. The status information of the change flag bit is used to recover waveform data.

[0057] This method compresses data by storing data blocks that have changed between previous and subsequent cycles. This not only reduces the cache and memory space required but also alleviates the pressure on data transmission bandwidth and improves hardware simulation speed due to the significant reduction in the amount of data that needs to be uploaded to memory. While caching data blocks that have changed between previous and subsequent cycles, the change flags for changed data blocks are set to valid, while those for unchanged data blocks are set to invalid. When presenting the waveform, the status information of the change flags is used to determine whether the data in the current data block is the same as that in the previous cycle. If they are the same, the data from the corresponding data block in the previous cycle is directly used as the data in the current data block; otherwise, the corresponding data is sequentially retrieved from memory.

[0058] In one implementation, when the data stream continuously output by the processing unit in a device cycle is divided into T data blocks, the T data blocks of a device cycle are compared with the corresponding data blocks of the previous cycle T times in sequence, and T change flag bits of status information are generated in sequence.

[0059] In one implementation, the change flag is valid when its status information is set to 1, and invalid when its status information is set to 0. Other ways of representing the status information of the change flag as valid or invalid fall within the protection scope of this invention.
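Step S220, with the 1 = valid / 0 = invalid convention above, can be sketched as a per-cycle function. Treating every block of the first cycle as changed is an assumption of this sketch (the patent does not describe the initial cycle), and the name `compress_cycle` is illustrative.

```python
from typing import List, Optional, Tuple


def compress_cycle(
    prev_blocks: Optional[List[int]], curr_blocks: List[int]
) -> Tuple[List[int], List[int]]:
    """Return (data_cache, flag_cache) entries for one device cycle."""
    data_cache: List[int] = []
    flag_cache: List[int] = []
    for i, block in enumerate(curr_blocks):
        # Assumption: with no previous cycle, every block counts as changed.
        changed = prev_blocks is None or block != prev_blocks[i]
        flag_cache.append(1 if changed else 0)
        if changed:
            data_cache.append(block)  # only changed blocks are stored
    return data_cache, flag_cache


cycles = [[0xA, 0xB, 0xC], [0xA, 0xD, 0xC], [0xA, 0xD, 0xC]]
stored = []
prev = None
for blocks in cycles:
    stored.append(compress_cycle(prev, blocks))
    prev = blocks
```

In this run the three cycles store 3, 1, and 0 data blocks respectively, while each cycle always stores T = 3 flag bits, so an idle design costs only the flag bits.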

[0060] In one embodiment, S220 further includes:

[0061] S221, When the flag cache is full of data for one device cycle, an upload request is initiated.

[0062] In one implementation, S200, the step of transferring the target data in the cache to memory in each device cycle further includes:

[0063] S230, receive upload requests from the target modules in the system.

[0064] S240, arbitrate all received upload requests, select a target upload request, and authorize its corresponding target module to store the data in its flag cache into memory.
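The patent does not state which arbitration policy is used to select the target upload request. As one possible illustration, the sketch below uses round-robin arbitration so that no target module is starved; the policy and the class name `RoundRobinArbiter` are assumptions.

```python
from typing import Iterable, Optional


class RoundRobinArbiter:
    """Grant one upload request per call, rotating priority between modules."""

    def __init__(self, module_count: int) -> None:
        self.module_count = module_count
        self._next = 0  # module index with highest priority on the next call

    def grant(self, requests: Iterable[int]) -> Optional[int]:
        """Take the ids of requesting modules and return the one authorized."""
        pending = set(requests)
        for offset in range(self.module_count):
            candidate = (self._next + offset) % self.module_count
            if candidate in pending:
                # Priority moves just past the winner for the next round.
                self._next = (candidate + 1) % self.module_count
                return candidate
        return None  # no module is requesting an upload
```

A fixed-priority arbiter would be simpler, but under sustained load it could leave a low-priority module's cache filling until it triggers a clock stop, which is the motivation for rotating priority here.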

[0065] In one embodiment, S220 further includes:

[0066] S222, when at least one of the data cache and the flag cache reaches the overflow threshold, a clock stop request is generated.

[0067] It should be noted that there is no order between S221 and S222; they are two different handling measures under different conditions.

[0068] Furthermore, the method also includes:

[0069] S300, presenting waveform data, wherein the step of recovering the target data of the i-th data block in the waveform includes:

[0070] S310, when the status information of the change flag bit of the i-th data block is valid, the data width of the target node included in the i-th data block is obtained, and data of that width is sequentially retrieved from memory. It should be noted that when the user specifies the target node, the data width of the target node is also specified at the same time.

[0071] S320, when the status information of the change flag bit is invalid, the data of the corresponding data block in the previous cycle is used as the data of the current data block.
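Steps S310 and S320 can be sketched as a decoder that walks the change flags cycle by cycle. For simplicity each stored block is modeled as one integer, so the width-based retrieval of S310 is abstracted away; the sketch also assumes that every flag in the first cycle is valid. The name `recover_cycles` is illustrative.

```python
from typing import Iterator, List


def recover_cycles(
    flag_cycles: List[List[int]], stored_blocks: List[int]
) -> List[List[int]]:
    """Rebuild every cycle's blocks from change flags plus stored changed data."""
    stream: Iterator[int] = iter(stored_blocks)  # data is consumed in storage order
    recovered: List[List[int]] = []
    for flags in flag_cycles:
        current: List[int] = []
        for i, flag in enumerate(flags):
            if flag == 1:
                current.append(next(stream))      # S310: fetch next stored block
            else:
                current.append(recovered[-1][i])  # S320: reuse previous cycle
        recovered.append(current)
    return recovered


flags = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]
stored = [0xA, 0xB, 0xC, 0xD]
waveform = recover_cycles(flags, stored)
```

Because the stored data carries no per-block position, the flags are the only record of where each stored block belongs; this is why the flag bits of unchanged blocks must still be saved.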

[0072] This method enables rapid and lossless recovery of waveform data. The method provided in this invention does not use a multiplexer to select only some nodes for output; instead, it directly acquires node data from all processing units and leaves the configuration to the user, filtering data based on the user-specified target nodes. That is, when the user wants to capture more target nodes or change the target nodes, they can configure this directly, without recompiling or performing gate-level netlist simulation to recover the waveforms of all nodes. This not only reduces memory usage but also allows for more flexible acquisition of target node data, improving debugging efficiency.

[0073] Embodiment 2 of the present invention also provides a non-transitory computer-readable storage medium, which can be disposed in an electronic device to store at least one instruction or at least one program related to implementing a method in the method embodiment, wherein the at least one instruction or the at least one program is loaded and executed by the processing unit to implement the method provided in the above embodiment.

[0074] Embodiment 2 of the present invention also provides an electronic device, including a processing unit and the aforementioned non-transitory computer-readable storage medium.

[0075] Embodiment 2 of the present invention also provides a computer program product, which includes program code. When the program product is run on an electronic device, the program code is used to cause the electronic device to perform the steps of the methods described above in various exemplary embodiments of the present invention.

[0076] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

[0077] While specific embodiments of the invention have been described in detail by way of example, those skilled in the art should understand that the examples are for illustrative purposes only and not intended to limit the scope of the invention. Those skilled in the art should also understand that various modifications can be made to the embodiments without departing from the scope and spirit of the invention. The scope of this invention is defined by the appended claims.

Claims

1. A method for acquiring waveform data, characterized in that the method includes the following steps: S100, retrieve the target node specified by the user; S200, according to the target node, capture the target data of each processing unit and compress and store it in the cache, and in each device cycle, move the target data in the cache to memory; wherein the step of capturing the target data of the f-th processing unit and compressing and storing it in the cache includes: S210, based on the target node, filter the data stream continuously output by the f-th processing unit to obtain a data block including the target node data; S220, if the target data of the i-th data block in the current device cycle is different from the target data of the i-th data block in the previous device cycle, then the target data of the i-th data block in the current device cycle is stored in the data cache, and at the same time the status information of the change flag bit of the i-th data block in the current device cycle is set to valid and saved to the flag bit cache; otherwise, the target data of the i-th data block in the current device cycle is not stored, and only the status information of the change flag bit of the i-th data block in the current device cycle is set to invalid and saved to the flag bit cache.

2. The method according to claim 1, characterized in that S210 also includes: S211, obtain the data stream continuously output by the f-th processing unit, wherein the data stream includes data from all nodes; S212, divide the data stream into multiple data blocks according to a preset size; S213, based on the target node, filter out the data in each data block that does not belong to the target node, and obtain a data block that includes the target data of the target node.

3. The method according to claim 1, characterized in that S220 also includes: S221, when the flag cache is full of data for one device cycle, an upload request is initiated.

4. The method according to claim 3, characterized in that the step of moving the target data from the cache to memory in each device cycle further includes: S230, receive upload requests from the target modules in the system; S240, arbitrate all received upload requests, select a target upload request, and authorize its corresponding target module to store the data in its flag cache into memory.

5. The method according to claim 1, characterized in that S220 also includes: S222, when at least one of the data cache and the flag cache reaches the overflow threshold, a clock stop request is generated.

6. The method according to claim 1, characterized in that, when the data stream continuously output by the processing unit in a device cycle is divided into T data blocks, the T data blocks of a device cycle are compared with the corresponding data blocks of the previous cycle T times in sequence, and the status information of T change flag bits is generated in sequence.

7. The method according to claim 1, characterized in that, when the status information of the change flag is set to 1, it is valid; when the status information of the change flag is set to 0, it is invalid.

8. A non-transitory computer-readable storage medium storing at least one instruction or at least one program segment, characterized in that the at least one instruction or the at least one program segment is loaded and executed by the processor to implement the method as described in any one of claims 1-7.

9. An electronic device, characterized in that it includes a processor and the non-transitory computer-readable storage medium as described in claim 8.