Function performance parameter acquisition method and device, equipment, storage medium and product
By using GCC/G++ compilation options to add callback function calls and by building a lightweight hash table structure, the high overhead of the gprof tool in lightweight embedded systems is avoided, and low-overhead function call frequency statistics and fast debugging are achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: VERISILICON MICROELECTRONICS (CHENGDU) CO LTD
- Filing Date: 2026-03-20
- Publication Date: 2026-06-19
AI Technical Summary
Existing gprof tools in lightweight embedded systems require the insertion of performance analysis code at each function entry point, resulting in significant runtime and storage overhead, which affects the normal functionality and real-time performance of the system.
The GCC/G++ -finstrument-functions compilation option is used to add a callback function call instruction at the entry point of each function. A lightweight static memory allocation hash table structure is built, and hash collisions are handled by using a linked list method. The frequency of function calls is recorded in real time to reduce memory and time overhead.
It implements low-overhead function call frequency statistics in lightweight embedded systems, supports multi-threaded real-time operating systems, and can output statistical results at any breakpoint, helping developers quickly locate and debug problems.
Figure CN122240439A_ABST
Description
Technical Field
[0001] This application belongs to the field of computer science, and specifically relates to a method, apparatus, electronic device, computer-readable storage medium, and computer program product for obtaining function performance parameters.
Background Technology
[0002] In system development, performance analysis tools are crucial for optimizing system real-time performance and efficiency. The gprof tool in the GNU (GNU's Not Unix) toolset is a classic performance profiling tool that employs a hybrid analysis approach and is widely used in Unix-like systems such as Linux. However, in resource-constrained micro-embedded systems (lightweight embedded systems), especially platforms running Real-Time Operating Systems (RTOS), the applicability of gprof is limited. For example, the tool requires instrumentation during compilation using specific options (such as the -pg option), which inserts performance analysis code at each function entry point to collect function call counts, call relationships, and function execution times. This introduces significant runtime and storage overhead, thus affecting the normal functionality and real-time performance of resource-constrained embedded systems.
Summary of the Invention
[0003] Therefore, the purpose of this application is to provide a method, apparatus, electronic device, computer-readable storage medium, and computer program product for obtaining function performance parameters, so as to minimize the memory and time overhead of performance analysis code during runtime.
[0004] The embodiments of this application are implemented as follows: In a first aspect, embodiments of this application provide a method for obtaining function performance parameters, comprising: obtaining a target file, wherein the target source code corresponding to the target file contains a call instruction for a callback function used to count the frequency of function calls, so as to monitor the call frequency of each function in the target source code; during the execution of the target file, counting the call frequency of each function in the target file to obtain call frequency statistics, and recording the call frequency statistics in a hash table structure.
[0005] In the above implementation scheme, this application no longer inserts gprof-style performance analysis code at each function entry point, but instead adds only a call instruction for a callback function at the entry point of each function. This minimizes the memory and time overhead of the performance analysis code during runtime. Furthermore, by constructing a lightweight hash table structure with static memory allocation and using linked lists to handle hash collisions, the execution count of each function in the target file can be recorded and counted in real time. The statistical results are keyed solely by function pointer and held in statically allocated storage, avoiding the memory fragmentation and uncertainty caused by dynamic memory allocation. In addition, this application is compatible with multi-threaded real-time operating systems (RTOS) and supports outputting statistical results at any breakpoint, thereby helping developers quickly locate and debug problems.
[0006] In one possible implementation of the first aspect embodiment, obtaining the target file includes: during the compilation of the target source code using a compiler that supports instrumentation, adding a call instruction for the callback function at the entry point of each function in the target source code to obtain the target file.
[0007] In the above implementation scheme, instead of using the gprof tool, this scheme utilizes the -finstrument-functions compilation option provided by GCC (GNU Compiler Collection) / G++ (GNU C++ Compiler) to automatically insert callback function call instructions at the entry or exit point of the target function during the compilation stage, thereby enabling the tracking and recording of function call frequency. At the same time, this application no longer inserts performance analysis code at each function entry point, but adds callback function call instructions at the entry point of each function, which can minimize the memory and time overhead of the performance analysis code during runtime.
[0008] In one possible implementation of the first aspect embodiment, the hash table structure includes: a hash bucket array, a node pool array, and a node counter; the number of hash buckets in the hash bucket array is greater than or equal to the number of nodes in the node pool array, and the index in each hash bucket is used to point to the head node of the corresponding linked list in the node pool array; the number of nodes in the node pool array is consistent with the number of functions to be monitored in the target file, and each node in the node pool array includes a first field, a second field, and a third field, the first field being used to record the address of the monitored function, the second field being used to record the function call frequency, and the third field being used to represent the index of the next node pointed to by the current node; the node counter is used to record the number of nodes used in the node pool array.
[0009] In the above implementation scheme, a lightweight, statically allocated hash table structure with linked-list collision handling is constructed to record and count the execution count of each function in the target file in real time with very low memory and time overhead. Specifically, the hash bucket array is a pre-allocated static array whose entries act as the indexes, or "buckets", of the linked lists. The number of hash buckets in the hash bucket array (table) is greater than or equal to the number of nodes in the node pool array. For example, the number of hash buckets is the prime number closest to 1.25 times the number of nodes in the node pool array (i.e., the number of functions to be monitored), which helps reduce hash collisions. Each node in the node pool array includes a first field (e.g., func) recording the function address, a second field (e.g., count) recording the function call frequency, and a third field (e.g., next) holding the index of the next node. When a hash collision occurs (i.e., the hash values of two function pointers map to the same index in the hash bucket array), the next field holds the index of the next node so that the colliding nodes can be managed as a linked list. A node counter records the number of nodes currently used in the node pool array to keep the statistical operation deterministic. The whole process keeps data updates accurate. In terms of space, the structure relies only on a hash bucket array and a node pool of fixed size, so its memory use is constant at run time (O(1)). In terms of time, in the best case a search and update completes in O(1) time.
[0010] In one possible implementation of the first aspect embodiment, the call frequency statistics are recorded in a hash table structure, including: determining the hash value of the function address of the called target function, and finding the corresponding hash bucket based on the hash value; if the index in the hash bucket is not the initial value, finding the corresponding target linked list based on the index in the hash bucket; traversing the nodes in the target linked list; if a node corresponding to the target function is found during traversal, updating the call frequency recorded in the second field of the traversed node.
[0011] In the above implementation scheme, the corresponding hash bucket is found by calculating the hash value of the function address of the target function, and the corresponding target linked list is found according to the index in the hash bucket. When traversing the nodes in the target linked list, if a node with the same function pointer is found, the call frequency of that node is directly updated. In this way, efficiency can be improved and the hash collision problem can be solved while ensuring the accuracy of data statistics.
[0012] In one possible implementation of the first aspect embodiment, the method further includes: if no node corresponding to the target function is found, adding a new node to the target linked list, updating the values recorded in each field of the new node, and updating the count value of the node counter.
[0013] In the above implementation scheme, if the node corresponding to the target function is not found, a new node needs to be added to the target linked list, and the various fields of the new node (including function pointer, call frequency and next field) need to be updated, as well as the count value of the node counter, to ensure the accuracy of the data, thereby laying a solid data foundation for subsequent performance optimization and resolving hash collision issues.
[0014] In one possible implementation of the first aspect embodiment, adding a node to the target linked list includes: adding a node to the target linked list using head insertion; the method further includes: updating the index in the hash bucket to the index of the newly added node.
[0015] In the above implementation scheme, the "head insertion method" is preferred when handling hash collisions. Frequently used functions usually first appear later in the run, whereas functions that appear early mostly perform initialization or basic tasks and are called relatively infrequently. Head insertion places the later-added nodes at the front of the linked list, which can reduce the number of nodes traversed and thereby improve efficiency.
[0016] In one possible implementation of the first aspect embodiment, the method further includes: if the index in the hash bucket is an initial value, allocating a new node for the target function, updating the index in the hash bucket to the index of the new node, updating each field of the new node, and updating the count value of the node counter.
[0017] In the above implementation scheme, if the index in the hash bucket is the initial value, it indicates that the hash bucket is empty, that is, it does not point to any node. The index in the hash bucket can be directly updated to the index of the new node, and the various fields of the new node (including function pointer, call frequency and next field) and the count value of the node counter are updated to ensure the accuracy of the data, thereby laying a solid data foundation for subsequent performance optimization.
[0018] In one possible implementation of the first aspect embodiment, the call frequency statistics include: function address and call frequency. The method further includes: obtaining the link mapping file corresponding to the target file, wherein the link mapping file includes: function address, function name, and number of bytes occupied by the function; and generating a statistical report containing function name, function address, number of bytes occupied by the function, and call frequency based on the call frequency statistics recorded in the hash table structure and the link mapping file.
[0019] In the above implementation scheme, the output format of the call frequency statistics is usually F:<function address>,<call frequency>, which is not convenient for developers to use directly. By obtaining the link mapping file corresponding to the target file and using a comparison script to match the output results (i.e., the call frequency statistics) against the link mapping file generated at build time, a more detailed statistical report is generated. This allows developers to see the detailed data of each function at a glance, which helps to identify code hotspots and to optimize repetitive calculations and redundant code, thereby improving system execution efficiency. In lightweight embedded systems where multiple types of memory coexist, analyzing the function call frequency statistics provides a key basis for memory layout optimization.
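The matching step described above can be sketched in C as follows. This is only an illustration under stated assumptions: the names `map_symbol`, `parse_stat_line`, `find_symbol`, and `format_row` are hypothetical, the address in the `F:` line is assumed to be printed in hexadecimal, the report row layout is invented for the example, and parsing of the link mapping file itself (whose layout depends on the linker) is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One symbol taken from the link mapping file (map parsing is not shown). */
struct map_symbol {
    uint32_t    addr;  /* function address */
    uint32_t    size;  /* number of bytes occupied by the function */
    const char *name;  /* function name */
};

/* Parse one "F:<function address>,<call frequency>" line.
 * Assumption: the address is hexadecimal, the frequency decimal.
 * Returns 1 on success, 0 if the line does not match. */
int parse_stat_line(const char *line, uint32_t *addr, uint32_t *count)
{
    unsigned long a, c;
    if (sscanf(line, "F:%lx,%lu", &a, &c) != 2)
        return 0;
    *addr = (uint32_t)a;
    *count = (uint32_t)c;
    return 1;
}

/* Linear search of the symbols for an exact address match. */
const struct map_symbol *find_symbol(const struct map_symbol *syms,
                                     size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; ++i)
        if (syms[i].addr == addr)
            return &syms[i];
    return NULL;
}

/* Format one report row: name, address, size in bytes, call frequency. */
int format_row(char *out, size_t cap,
               const struct map_symbol *s, uint32_t count)
{
    return snprintf(out, cap, "%s,0x%08lx,%lu,%lu", s->name,
                    (unsigned long)s->addr, (unsigned long)s->size,
                    (unsigned long)count);
}
```

On a host machine, each captured `F:` line would be parsed, matched against the symbols, and written out as one report row; sorting the rows by call frequency would then expose the hotspots.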
[0020] In a second aspect, embodiments of this application also provide a function performance parameter acquisition device, including: an acquisition module and a statistics module; the acquisition module is used to acquire a target file, wherein the target source code corresponding to the target file contains call instructions for callback functions used to count function call frequencies, so as to monitor the call frequency of each function in the target source code; the statistics module is used to count the call frequency of each function in the target file during the execution of the target file, obtain call frequency statistics data, and record the call frequency statistics data in a hash table structure.
[0021] In a third aspect, embodiments of this application also provide an electronic device, including: a memory and a processor, the processor being connected to the memory; the memory being used to store a program; the processor being used to invoke the program stored in the memory to perform the method provided by the first aspect embodiments and / or any possible implementation of the first aspect embodiments.
[0022] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the method provided by the first aspect embodiments and / or any possible implementation of the first aspect embodiments.
[0023] In a fifth aspect, embodiments of this application also provide a computer program product, the computer program product including a computer program, which, when executed by a processor, implements the method provided by the first aspect embodiments and / or any possible implementation of the first aspect embodiments.
[0024] The technical effects of any of the implementation methods in the second to fifth aspects can be referred to the technical effects of the same or similar implementation methods in the first aspect, and will not be repeated here.
[0025] Other features and advantages of this application will be set forth in the following description. The objectives and other advantages of this application can be realized and obtained through the structures specifically pointed out in the written description and the accompanying drawings.
Description of the Drawings
[0026] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the accompanying drawings used in the embodiments will be briefly described below. Obviously, the drawings described below are only some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings. The above and other objects, features, and advantages of this application will become clearer through the accompanying drawings.
[0027] Figure 1 is a flowchart of a method for obtaining function performance parameters provided in an embodiment of this application.
[0028] Figure 2 is a schematic diagram illustrating the principle of a hash table structure provided in an embodiment of this application.
[0029] Figure 3 is a schematic diagram of another hash table structure provided in an embodiment of this application.
[0030] Figure 4 is a schematic diagram illustrating the principle of resolving hash collisions according to an embodiment of this application.
[0031] Figure 5 is a schematic diagram illustrating the principle of a method for obtaining function performance parameters provided in an embodiment of this application.
[0032] Figure 6 is a schematic diagram illustrating the principle of generating a statistical report according to an embodiment of this application.
[0033] Figure 7 is a schematic diagram of a function performance parameter acquisition device provided in an embodiment of this application.
[0034] Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0035] The technical solutions of the embodiments of this application will now be described with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. The following embodiments are provided as examples to more clearly illustrate the technical solutions of this application, and should not be used to limit the scope of protection of this application. Those skilled in the art will understand that, without conflict, the following embodiments and features can be combined with each other.
[0036] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, relational terms such as "first," "second," etc., in the description of this application are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus.
[0037] Furthermore, the term "and / or" in this application is merely a description of the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can represent three situations: A exists alone, A and B exist simultaneously, and B exists alone.
[0038] In the description of the embodiments of this application, unless otherwise expressly specified and limited, the technical term "connection" can be a direct connection or an indirect connection through an intermediate medium.
[0039] This application provides a method, apparatus, electronic device, computer-readable storage medium, and computer program product for obtaining function performance parameters, aiming to solve the problem that the current gprof tool is difficult to adapt to the performance analysis needs of lightweight embedded systems. For example, the tool needs to insert performance analysis code at each function entry point during compilation to count the number of function calls, call relationships, function running time, etc., which introduces significant runtime and storage overhead, thereby affecting the normal function and real-time performance of resource-constrained embedded systems.
[0040] This application no longer uses the gprof tool. Instead, it leverages the `-finstrument-functions` compilation option provided by GCC / G++, together with a cross-compilation toolchain compatible with lightweight embedded systems, to collect performance parameters, and it improves the acquisition process. For example, using the `-finstrument-functions` compilation option provided by GCC / G++, callback function call instructions are added to the entry point of each function in the target source code during compilation, resulting in a target file. Then, when this target file is run in real time on the lightweight embedded system, the callback function used to count function call frequencies is invoked based on the previously added call instructions, thereby achieving the tracking and recording of function call frequencies.
[0041] Instead of inserting gprof-style performance analysis code at each function entry point, this application adds only a call instruction for a callback function at the entry point of each function. This minimizes the memory and time overhead of the performance analysis code. At the same time, by constructing a lightweight hash table structure with static memory allocation and using linked lists to handle hash collisions, it can record and count the execution count of each function in the target file in real time. The statistical results are keyed only by function pointer. The scheme is compatible with multi-threaded real-time operating systems and supports outputting statistical results at any breakpoint, thereby helping developers quickly locate and debug problems.
[0042] The lightweight embedded system (also known as a micro-embedded system) described in this application is a computer system designed to perform specific tasks under conditions of limited resources such as computing, storage, power consumption, and cost, with a microcontroller at its core. Compared with general-purpose computing systems (such as computing systems in PCs (Personal Computers) and servers) and resource-rich embedded systems (such as computing systems in smartphones and smart TVs), this type of system is characterized by limited hardware resources, the use of real-time operating systems, and a core objective of reliably and efficiently achieving specific control or processing tasks. Typical applications include IoT sensor nodes, wearable devices (such as smart bracelets), smart home modules (such as curtain motors), industrial controllers (such as PLCs (Programmable Logic Controllers)), and automotive electronic control units.
[0043] The method for obtaining function performance parameters provided in the embodiments of this application, which can be used in lightweight embedded systems, is described below with reference to Figure 1.
[0044] S1: Obtain the target file.
[0045] In one possible implementation, the target file can be obtained from a database or storage device. The target source code corresponding to the target file contains call instructions for callback functions used to count function call frequencies, thereby monitoring the call frequency of each function in the target source code. In this implementation, a target file that has already been compiled and linked is obtained directly; the call instructions for the callback function used to count function call frequencies were added to the target source code when it was compiled.
[0046] The target file can be a compiled executable program or object code file that can run on a lightweight embedded system. By pre-inserting callback function call instructions into the target file, the frequency of function calls is counted in real time when the target file is actually run on the lightweight embedded system.
[0047] In one possible implementation, the process of obtaining the target file may include adding callback function call instructions at the entry point of each function in the target source code during the compilation process using a compiler that supports instrumentation, thereby obtaining the target file. For example, by utilizing the `-finstrument-functions` compilation option provided by GCC / G++, callback function call instructions can be automatically inserted at the entry or exit point of the target function during the compilation stage, thereby enabling the tracking and recording of function call frequency; furthermore, the scope of this compilation option can be flexibly configured according to the actual application scenario, possessing good customizability.
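As a rough illustration of this compilation option: GCC documents that code built with `-finstrument-functions` calls `__cyg_profile_func_enter` on function entry and `__cyg_profile_func_exit` on function exit, and that the hooks themselves should carry the `no_instrument_function` attribute so that they are not instrumented recursively. The sketch below shows one way the callback could be attached. `record_call` and `total_calls` are hypothetical placeholders for the hash table update, not names taken from this application.

```c
#include <stdint.h>

/* Hypothetical counter standing in for the hash table update. */
static volatile uint32_t total_calls;

/* Hypothetical callback: a real one would look fn up in the hash table. */
__attribute__((no_instrument_function))
static void record_call(void *fn)
{
    (void)fn;
    total_calls++;
}

/* Called by GCC-instrumented code on entry to each instrumented function. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    record_call(this_fn);
}

/* Exit hook: nothing is counted here, but the symbol must exist,
 * because instrumented code also calls it on function exit. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}
```

The monitored sources would be compiled with `-finstrument-functions`. GCC also provides `-finstrument-functions-exclude-file-list=` and `-finstrument-functions-exclude-function-list=`, which is one way to narrow the scope of the instrumentation.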
[0048] S2: During the execution of the target file, the call frequency of each function in the target file is counted to obtain the call frequency statistics, and the call frequency statistics are recorded in a hash table structure.
[0049] After obtaining the target file, it can be run on a platform or device running a real-time operating system. During the execution of the target file, the callback function will count the call frequency of each function in the target file, obtain call frequency statistics, and record the call frequency statistics in a hash table structure. The call frequency statistics include: function address (i.e., function pointer) and call frequency.
[0050] The hash table structure in this application can be stored in a fixed location in memory, such as a fixed location in Static Random Access Memory (SRAM), so that the method can be applied to the system debugging phase. Even if the system experiences an abnormal reset or crash during the debugging phase, the call frequency statistics can be obtained from the fixed location without losing the call frequency statistics.
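One possible way to pin the statistics storage to a fixed location with GCC is the `section` variable attribute combined with a linker script entry. The sketch below is an assumption rather than the layout used by this application: the section name `.profile_stats`, the buffer name `g_profile_storage`, and the size `PROFILE_BYTES` are all hypothetical, and the linker script that maps the section to a fixed SRAM address (and keeps startup code from clearing it) is not shown.

```c
#include <stdint.h>

#define PROFILE_BYTES 256u  /* assumed size of the statistics storage */

#if defined(__ELF__)
/* ".profile_stats" is a hypothetical section name. The linker script
 * would have to place it at a fixed SRAM address that the startup code
 * does not zero, so the data survives a reset. */
#define PROFILE_SECTION __attribute__((section(".profile_stats")))
#else
#define PROFILE_SECTION /* no named section on non-ELF hosts */
#endif

/* Raw storage for the hash table structure at a known location. */
PROFILE_SECTION uint8_t g_profile_storage[PROFILE_BYTES];
```

With such a placement, a debugger could read the same address range at any breakpoint or after a crash to recover the recorded statistics.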
[0051] To minimize the overhead of callback function runtime, this application constructs a lightweight, statically memory-allocated hash table structure based on linked list collision handling, which records and counts the execution count (i.e., call frequency) of each function in the target file in real time with extremely low memory and time overhead.
[0052] In one possible implementation, the hash table structure includes: a hash bucket array and a node pool array. In some possible implementations, the hash table structure also includes a node counter.
[0053] The number of hash buckets (tables) in the hash bucket array is greater than or equal to the number of nodes in the node pool array. The index in each hash bucket is used to point to the head node of the corresponding linked list in the node pool array. The hash bucket array is a pre-allocated static array that acts as the index or "bucket" of the linked list. Each hash bucket in the hash bucket array corresponds to a hash value, which can be mapped to a specific hash bucket by calculating the hash value of the function pointer. The index in each hash bucket is not the node data itself, but the index of the head node of the linked list. Initially, the index in each hash bucket is the initial value (such as -1). During the initialization phase, the index of all "buckets" needs to be initialized to -1, indicating that all "buckets" are empty, that is, they do not point to any node. If different function pointers are mapped to the same hash bucket after being calculated by the hash function (such as functions Func A and Func C being mapped to table[1]), it indicates that there is a hash collision.
[0054] In some implementations, the number of hash buckets in the hash bucket array is the prime number closest to 1.25 times the number of nodes in the node pool array (i.e., the number of functions to be monitored). For example, if there are 10 functions to be monitored (1.25 × 10 = 12.5), the number of hash buckets is 13; if there are 12 functions to be monitored (1.25 × 12 = 15), the number of hash buckets is 17; and if there are 20 functions to be monitored (1.25 × 20 = 25), the number of hash buckets is 23. This helps reduce hash collisions.
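The sizing rule and its three examples can be reproduced with a short helper. This is only a sketch: `bucket_count` and `is_prime` are hypothetical names, and breaking ties toward the larger prime is an assumption inferred from the 12-function example (15 is equally far from 13 and 17, and 17 is the stated result).

```c
#include <stdbool.h>
#include <stddef.h>

/* Trial-division primality test; adequate for small table sizes. */
static bool is_prime(size_t v)
{
    if (v < 2)
        return false;
    for (size_t d = 2; d * d <= v; ++d)
        if (v % d == 0)
            return false;
    return true;
}

/* Prime closest to 1.25 * n that is not smaller than n.
 * Distances are compared in units of 1/4 so that 1.25 * n stays exact.
 * Assumption: a tie between two primes is resolved toward the larger. */
size_t bucket_count(size_t n)
{
    size_t scaled = 5 * n;          /* equals 4 * (1.25 * n) */

    size_t hi = (scaled + 3) / 4;   /* first candidate at or above 1.25 * n */
    while (!is_prime(hi))
        ++hi;

    size_t lo = scaled / 4;         /* first candidate at or below 1.25 * n */
    while (lo >= n && lo >= 2 && !is_prime(lo))
        --lo;

    bool lo_ok = (lo >= n) && is_prime(lo);
    if (lo_ok && (scaled - 4 * lo) < (4 * hi - scaled))
        return lo;
    return hi;
}
```

On a real target the value would normally be computed once on the host and written into the build as a constant, since the arrays are statically sized.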
[0055] The number of nodes in the node pool array corresponds to the number of functions to be monitored in the target file. For example, if 10 functions need to be monitored, there will be 10 nodes; similarly, if 12 functions need to be monitored, there will be 12 nodes. Each node in the node pool array includes a first field (e.g., `func`), a second field (e.g., `count`), and a third field (e.g., `next`). The first field records the address of the monitored function, the second field records the function's call frequency, and the third field indicates the index of the next node pointed to by the current node. The node pool array is a pre-allocated static array used to store all linked lists, where each linked list contains at least one node. Linked lists pointed to by indices in hash buckets with hash collisions (referred to as hash-collision linked lists) contain multiple nodes; linked lists without hash collisions may contain only one node.
[0056] Each node contains three fields, for example: func: Used to record the address (i.e., function pointer) of the monitored function, as a unique key; count: Used to record the number of times the function is called, that is, to record the frequency of function calls; next: When a hash collision occurs (i.e., when the hash values of two function pointers point to the same index of the hash bucket array, a collision node is formed), a linked list is used to manage the collision node. It acts as the next pointer in the linked list to point to the index of the next node. If its value is -1, it means that the linked list ends and there are no subsequent collision nodes.
[0057] This design approach, which separates the index (table) from the storage (nodes) and uses array indexes instead of address pointers, not only simplifies memory management but also effectively avoids memory fragmentation.
[0058] Node counter (node_count): An integer used to record the number of nodes currently in use in the node pool array.
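The three parts of the structure described above could be declared in C roughly as follows. This is a sketch under assumptions: the field widths (`uintptr_t`, `uint32_t`, `int16_t`), the macro names, and the example sizes of 10 nodes and 13 buckets are chosen for illustration. Only the names `func`, `count`, `next`, `table`, `nodes`, and `node_count` come from the description.

```c
#include <stdint.h>

#define NODE_POOL_SIZE 10    /* number of functions to monitor (assumed) */
#define BUCKET_COUNT   13    /* prime closest to 1.25 * NODE_POOL_SIZE */
#define EMPTY          (-1)  /* initial value: points to no node */

struct node {
    uintptr_t func;   /* first field: address of the monitored function */
    uint32_t  count;  /* second field: function call frequency */
    int16_t   next;   /* third field: index of the next node, or EMPTY */
};

int16_t     table[BUCKET_COUNT];    /* hash bucket array: head-node indexes */
struct node nodes[NODE_POOL_SIZE];  /* node pool array */
uint16_t    node_count;             /* node counter: nodes currently in use */

/* Total RAM taken by the structure; known at compile time. */
enum { PROFILE_RAM_BYTES = sizeof table + sizeof nodes + sizeof node_count };
```

Because links are array indexes rather than pointers, a 16-bit `next` field is enough for any pool of up to 32767 nodes, which keeps each node small on a 32-bit microcontroller.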
[0059] In one possible implementation, the process of recording call frequency statistics in a hash table structure includes: determining the hash value of the function address of the called target function; finding the corresponding hash bucket based on the hash value; if the index in the hash bucket is not the initial value, finding the corresponding target linked list based on the index in the hash bucket; then traversing the nodes in the target linked list; if a node corresponding to the target function is found, updating the call frequency recorded in the second field of the traversed node, for example, incrementing the call frequency recorded in the second field of the traversed node by 1.
[0060] In one possible implementation, the process of recording call frequency statistics in a hash table structure further includes: if no node corresponding to the target function is found, adding a new node to the target linked list, updating the values recorded in each field of the new node, and updating the count value of the node counter. A node can be added to the target linked list at the head or at the tail. If the node is added at the tail (i.e., "tail insertion"), the `next` field of the original tail node of the target linked list is updated to the index of the newly added node so that it points to the new node. The fields of the new node are updated at the same time: for example, the `func` field is set to the function address of the target function, the `count` field is set to the call frequency of the target function (e.g., call frequency +1), and the `next` field is set to -1. The count value of the node counter is also updated (e.g., node counter count value +1).
[0061] If the node is added at the head (i.e., "head insertion"), the index in the corresponding hash bucket is updated to the index of the newly added node. The fields of the new node are updated at the same time: for example, the `func` field is set to the function address of the target function, the `count` field is set to the call frequency of the target function (e.g., call frequency +1), and the `next` field is set to the index of the original head node so that it points to that node. The count value of the node counter is also updated (e.g., node counter count value +1).
[0062] In one possible implementation, the process of recording call frequency statistics in a hash table structure further includes: if the index in the hash bucket is the initial value, allocating a new node for the target function, updating the index in the hash bucket to the index of the new node, and updating each field of the new node. For example, updating the content of the func field of the new node to the function address of the target function, updating the content of the count field to the call frequency of the target function (e.g., call frequency + 1), updating the index of the next field to -1; and updating the count value of the node counter (e.g., node counter count value + 1).
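The tail-insertion variant described in paragraphs [0059] to [0062] can be sketched in C as follows. This is a minimal illustration, not the implementation of this application: the names `prof_node_t`, `prof_init`, `prof_record_tail`, `MAX_NODES` and `BUCKET_COUNT`, the field widths, and the pool and bucket sizes are assumptions made for the example, and the sketch omits any synchronization.

```c
#include <stdint.h>

#define MAX_NODES    64   /* assumed node pool size */
#define BUCKET_COUNT 79   /* assumed number of hash buckets */

typedef struct {
    uintptr_t func;   /* first field: address of the monitored function */
    uint32_t  count;  /* second field: call frequency */
    int16_t   next;   /* third field: index of the next node, -1 = end of list */
} prof_node_t;

static int16_t     table[BUCKET_COUNT];  /* hash bucket array */
static prof_node_t nodes[MAX_NODES];     /* node pool array */
static uint16_t    node_count;           /* node counter */

void prof_init(void)
{
    for (int i = 0; i < BUCKET_COUNT; i++)
        table[i] = -1;                   /* initial value: bucket points to no node */
    node_count = 0;
}

/* Records one call of func. Returns 1 on success, 0 if the pool is full. */
int prof_record_tail(uintptr_t func)
{
    uint32_t bucket = (uint32_t)(func % BUCKET_COUNT);
    int16_t  last   = -1;                /* index of the current tail node, if any */

    for (int16_t i = table[bucket]; i != -1; i = nodes[i].next) {
        if (nodes[i].func == func) {     /* node found: update the second field */
            nodes[i].count++;
            return 1;
        }
        last = i;
    }
    if (node_count >= MAX_NODES)
        return 0;

    nodes[node_count] = (prof_node_t){ func, 1, -1 };
    if (last == -1)
        table[bucket] = (int16_t)node_count;      /* bucket held the initial value */
    else
        nodes[last].next = (int16_t)node_count;   /* old tail now points to new node */
    node_count++;
    return 1;
}
```

With tail insertion the bucket index changes only when the bucket was empty; otherwise only the `next` field of the old tail node is rewritten.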
[0063] To better understand, an example will be provided below. The workflow of this hash table structure mainly includes two processes: initialization and updating the function call count.
[0064] Initialization: When the system starts, all records in the node pool array are cleared to zero, the index of each hash bucket in the hash bucket array is initialized to -1, and the node counter is set to zero, so that the hash table structure is in a ready state.
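The initialization step can be sketched as follows. The identifiers `table`, `nodes`, `node_count`, `func`, `count` and `next` follow the names used in this description; the type name `prof_node_t`, the function name `prof_init`, the field widths, and the sizes `MAX_NODES` and `BUCKET_COUNT` are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NODES    64   /* assumed node pool size */
#define BUCKET_COUNT 79   /* assumed: a prime close to 1.25 * MAX_NODES */

typedef struct {
    uintptr_t func;   /* address of the monitored function */
    uint32_t  count;  /* call frequency */
    int16_t   next;   /* index of the next node in the list, -1 = none */
} prof_node_t;

/* All storage is statically allocated; nothing is taken from the heap. */
static int16_t     table[BUCKET_COUNT];  /* hash bucket array */
static prof_node_t nodes[MAX_NODES];     /* node pool array */
static uint16_t    node_count;           /* number of nodes already used */

void prof_init(void)
{
    memset(nodes, 0, sizeof nodes);      /* clear all records in the node pool */
    for (int i = 0; i < BUCKET_COUNT; i++)
        table[i] = -1;                   /* each bucket initially points to no node */
    node_count = 0;                      /* hash table is now in the ready state */
}
```

Using array indices instead of pointers for `next` and for the bucket entries keeps each entry small, which is one plausible way to hold down the memory footprint on a small target.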
[0065] Updating the function call count involves the following steps: calculating the hash value, handling the case with no index collision, and handling the case with an index collision.
[0066] 1) Calculate the hash value: The hash value is calculated by performing a modulo operation on the function pointer (e.g., a 32-bit address) of the input target function using a hash value generation function. Empirically, the modulo value can be set to a prime number closest to 1.25 times the number of nodes in the node pool, thus quickly locating the corresponding hash bucket and minimizing hash collisions. When tracing a function call, the function pointer is first passed as an argument, and the hash value is obtained by calling the hash value generation function.
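As a rough sketch of this step, the modulus can be chosen as the prime nearest to 1.25 times the node pool size, and the hash is then a single modulo operation on the function address. The helper names `is_prime`, `pick_bucket_count` and `prof_hash` are hypothetical; in practice the modulus would more likely be fixed as a compile-time constant.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_prime(uint32_t n)
{
    if (n < 2)
        return false;
    for (uint32_t d = 2; d * d <= n; d++)
        if (n % d == 0)
            return false;
    return true;
}

/* Returns the prime nearest to 1.25 * pool_size.
 * If two primes are equally near, the larger one is returned. */
uint32_t pick_bucket_count(uint32_t pool_size)
{
    uint32_t target = pool_size + pool_size / 4;

    for (uint32_t delta = 0; ; delta++) {
        if (is_prime(target + delta))
            return target + delta;
        if (delta < target && is_prime(target - delta))
            return target - delta;
    }
}

/* Hash value generation: function address modulo the number of buckets. */
uint32_t prof_hash(uintptr_t func_addr, uint32_t bucket_count)
{
    return (uint32_t)(func_addr % bucket_count);
}
```

For a pool of 64 nodes the target is 80, and the nearest prime is 79, so an address such as 0x4000 would map to bucket 0x4000 mod 79 = 31 under these assumptions.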
[0067] 2) No index collision (no hash collision): Based on the calculated hash value, search in the hash bucket array to find the corresponding hash bucket index. If the hash bucket index is -1, it means that the index does not point to any node, so insert the new node directly. a. Fill the new node with the new function pointer and the initial count value of 1; b. Update the next field of the new node to the original index of the current bucket, which is -1; c. Update the index of the current bucket in the hash bucket array to node_count, which is the index of the new node; d. Finally, increment the node counter (node_count) by 1.
[0068] To facilitate understanding, this is explained below with reference to Figure 2. Assuming the hash value of function 1 is 3, the index in table[3] is looked up. Initially, the index in table[3] is -1, so a new node nodes[0] is allocated for it, and the index in table[3] is updated to node_count, which is 0 at this time. At the same time, the function pointer of function 1 (e.g., 0x4000 in Figure 2) and the initial count value 1 are filled into the corresponding fields of nodes[0], and the next field of nodes[0] is updated to -1. Finally, the node counter node_count is incremented by 1. Afterwards, each call to function 1 updates the count in nodes[0]. In Figure 2, "count: 5" indicates that function 1 was called 5 times.
[0069] As another example, assuming the hash value of function 2 is 4, the index in table[4] is looked up. Initially, the index in table[4] is -1, so a new node nodes[1] is allocated for it, and the index in table[4] is updated to node_count, which is 1 at this time. At the same time, the function pointer of function 2 (e.g., 0x5000 in Figure 2) and the initial count value 1 are filled into the corresponding fields of nodes[1], and the next field of nodes[1] is updated to -1. Finally, the node counter node_count is incremented by 1. Afterwards, each call to function 2 updates the count in nodes[1]. In Figure 2, "count: 3" indicates that function 2 was called 3 times.
[0070] As another example, assuming the hash value of function 3 is 1, the index in table[1] is looked up. Initially, the index in table[1] is -1, so a new node nodes[2] is allocated for it, and the index in table[1] is updated to node_count, which is 2 at this time. At the same time, the function pointer of function 3 (e.g., 0x3000 in Figure 2) and the initial count value 1 are filled into the corresponding fields of nodes[2], and the next field of nodes[2] is updated to -1. Finally, the node counter node_count is incremented by 1. Afterwards, each call to function 3 updates the count in nodes[2]. In Figure 2, "count: 7" indicates that function 3 was called 7 times.
[0071] 3) Index collision (hash collision): Based on the calculated hash value, a search is performed in the hash bucket array. If the hash bucket index found is not -1, the index already points to a valid node. If different function pointers map to the same hash bucket after the hash function is applied, a hash collision has occurred. For example, as shown in Figure 3, if the hash values of functions Func A, Func B, and Func C are all mapped to table[1], a hash collision has occurred, and the functions need to be stored one by one in the same linked list. Taking the "head insertion method" as an example, the hash value of the first function, Func A, is mapped to table[1], and the call frequency of Func A is recorded in the node pool array. The hash value of Func C, which is called next, is also mapped to table[1]. To handle the collision, Func C is inserted at the head of the linked list using the head insertion method, and its next field points to Func A. The hash value of Func B, which is called after that, is also mapped to table[1]. To handle the collision, Func B is inserted at the head of the new linked list using the head insertion method, and its next field points to Func C.
[0072] Functions Func D and Func E in Figure 3 are examples of the aforementioned case with no hash collision. The hash value of function Func D is mapped to table[2], and the call frequency of Func D is recorded in the node pool array; the hash value of function Func E is mapped to table[n-1], and the call frequency of Func E is recorded in the node pool array.
[0073] For a better understanding, the process of handling a hash collision for function Func A is explained below with reference to Figure 4. The hash value of the function pointer of Func A is calculated. Based on the hash value, the corresponding hash bucket is located, and the head index of the linked list in the hash bucket is obtained. Then, the linked list is traversed along the next index, comparing the func field of each node with the current function pointer. If a node with the same function pointer is found in the linked list, the count field of that node is incremented by 1, and the operation is complete. If no matching node is found by the end of the linked list (i.e., next is -1), an insertion operation is performed. First, it is checked whether the node pool array is full, i.e., whether node_count has reached its maximum value. If it is full, the insertion can be abandoned to ensure system determinism. If it is not full, the new node is inserted using the "head insertion method": a. Fill the new node with the function pointer of function Func A and the initial count of 1; b. Update the next field of the new node to the current index of the current hash bucket; c. Update the index of the current hash bucket to node_count, which is the index of the new node; d. Finally, increment node_count by 1 to update the number of nodes that have been used.
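The lookup, pool-capacity check and head insertion described above can be sketched in C as follows, together with the entry hook that GCC calls when code is built with -finstrument-functions. The hook names `__cyg_profile_func_enter` and `__cyg_profile_func_exit` and the `no_instrument_function` attribute are GCC's; everything else (`prof_node_t`, `prof_init`, `prof_record`, `NO_INSTR`, the sizes and field widths) is an assumption made for illustration. The sketch omits any synchronization, so on a multi-threaded RTOS the update would need to be protected in some way.

```c
#include <stdint.h>

#define MAX_NODES    64   /* assumed node pool size */
#define BUCKET_COUNT 79   /* assumed number of hash buckets */

/* Keep the profiling code itself out of the instrumentation. */
#define NO_INSTR __attribute__((no_instrument_function))

typedef struct {
    uintptr_t func;
    uint32_t  count;
    int16_t   next;
} prof_node_t;

static int16_t     table[BUCKET_COUNT];
static prof_node_t nodes[MAX_NODES];
static uint16_t    node_count;

NO_INSTR void prof_init(void)
{
    for (int i = 0; i < BUCKET_COUNT; i++)
        table[i] = -1;
    node_count = 0;
}

/* Returns 1 if the call was recorded, 0 if the pool was full and it was dropped. */
NO_INSTR int prof_record(uintptr_t func)
{
    uint32_t bucket = (uint32_t)(func % BUCKET_COUNT);

    /* Traverse the list along the next index, comparing the func field. */
    for (int16_t i = table[bucket]; i != -1; i = nodes[i].next) {
        if (nodes[i].func == func) {
            nodes[i].count++;
            return 1;
        }
    }
    if (node_count >= MAX_NODES)
        return 0;                                 /* abandon the insertion */

    nodes[node_count].func  = func;               /* a. fill pointer and count */
    nodes[node_count].count = 1;
    nodes[node_count].next  = table[bucket];      /* b. old head, or -1 if empty */
    table[bucket] = (int16_t)node_count;          /* c. bucket points to new node */
    node_count++;                                 /* d. one more node in use */
    return 1;
}

/* Called by GCC-instrumented code at every function entry. */
NO_INSTR void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    prof_record((uintptr_t)this_fn);
}

/* Exit hook is required by the instrumentation but does nothing here. */
NO_INSTR void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}
```

Because an empty bucket holds -1, steps a to d cover both the no-collision case and the collision case with the same code path: the new node's next field simply inherits whatever the bucket held before.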
[0074] In some possible implementations, the "head insertion method" is preferred when handling hash collisions. Frequently used functions usually appear later in the overall run, whereas functions used earlier mostly perform initialization or basic functions and are used relatively infrequently. With the "head insertion method", the functions inserted later sit closer to the head of the linked list, which reduces the number of nodes traversed per lookup and thereby improves efficiency.
[0075] The above process describes the complete decision-making process from receiving function pointer input to outputting statistical results. The specific steps are as follows: First, calculate the hash value of the function pointer. If no hash collision occurs, directly locate the target position and update the frequency count; if a hash collision occurs, proceed to the linked list traversal stage. When traversing the linked list, first perform node matching search: if a node with the same function pointer is found, directly update the call frequency of that node; if not found, insert a new node after checking the node pool capacity and update each field. The entire process ensures the accuracy of data updates. In terms of space, it only relies on a fixed-size hash bucket and node pool, with a space complexity of O(1); in terms of time, in the best case, the search and update can be completed in O(1) time.
[0076] In lightweight embedded systems, function call frequency is crucial for performance optimization, code quality analysis, and debugging. Tracking function call counts helps identify code hotspots and optimize repetitive calculations and redundant code, thereby improving system execution efficiency. In particular, in embedded architectures with multiple memory types of different read/write speeds (e.g., ordered by access speed in descending order: TCM > SRAM > PSRAM > NorFlash), statistical analysis of function call frequency provides key information for memory layout optimization: high-frequency functions can be placed in high-speed memory (TCM, SRAM) and low-frequency functions in low-speed memory (PSRAM, NorFlash). This maximizes overall system performance within limited memory resources, achieving a balance between access latency and storage cost.
[0077] In some possible implementations, the object file described in this application may be a binary file compiled for at least two different speed memory regions, whose linking process has taken into account the access characteristics of different functions, thereby achieving an optimized deployment across memory partitions.
[0078] TCM (Tightly-Coupled Memory) is a high-speed memory that is directly connected to the processor core via a dedicated bus. It has extremely low and deterministic access latency and usually does not allow direct access by peripherals such as DMA (Direct Memory Access). It is often used to store critical code or data.
[0079] SRAM can retain data while powered on without needing to be refreshed, and has fast read and write speeds.
[0080] PSRAM (Pseudo Static Random Access Memory) internally uses DRAM (Dynamic Random Access Memory) structure, but its external interface and timing are similar to SRAM, and it integrates a self-refresh circuit, so no external controller is needed for refresh operations.
[0081] NorFlash (NOR Flash) is a type of non-volatile memory that supports random access, allowing the CPU (Central Processing Unit) to directly read and execute code from it (i.e., XIP, eXecute In Place), but its write and erase speeds are typically slow.
[0082] In some possible implementations, the above method further includes: obtaining the linker map file (MAP file) corresponding to the target file; and generating a more detailed statistical report containing function names, function addresses, number of bytes occupied by the function, and call frequency based on the call frequency statistics recorded in the hash table structure and the linker map file. The MAP file is generated during the compilation and linking of the target file; it describes in detail the static layout of the entire executable program in memory and is a "map" connecting the source code and the final machine code. The linker map file includes information such as function addresses, function names, and number of bytes occupied by the function.
[0083] In some implementations, a schematic diagram of the above method is shown in Figure 5. When it is necessary to optimize memory usage and analyze system performance, the call frequency statistics recorded in the hash table structure can be read and output to the debug interface in the format F:<function address>,<call frequency>, such as F:0x12009974,10658, where 0x12009974 is the function address and 10658 is the call frequency. Because output in the F:<function address>,<call frequency> format is not easy for developers to use directly, this application also proposes a method for locating specific information about the functions whose call frequency was counted. A comparison script compares the output results (i.e., the call frequency statistics) with the MAP file generated at compile time, producing a more detailed statistical report in .csv (Comma Separated Values) or .xml (eXtensible Markup Language) format. The report includes <function address>, <function name>, <number of bytes occupied by the function>, and <call frequency>, such as 0x12009974, device_open, 878, 10658, where device_open is the function name and 878 is the number of bytes occupied by the function. By analyzing the function call frequency statistics, in micro-embedded systems, high-frequency functions can be placed in high-speed memory (TCM, SRAM) and low-frequency functions in low-speed memory (PSRAM, NorFlash), significantly improving memory utilization efficiency and program performance.
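A minimal sketch of producing the F:<function address>,<call frequency> lines is given below. It assumes 32-bit function addresses, as in the example above; the names `prof_record_t`, `prof_format_record`, `prof_dump` and the `emit` callback (standing in for whatever writes to the debug interface) are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One (address, frequency) pair as read out of the node pool. */
typedef struct {
    uint32_t func;
    uint32_t count;
} prof_record_t;

/* Formats one line such as "F:0x12009974,10658" into buf.
 * Returns the number of characters written (excluding the terminator). */
int prof_format_record(char *buf, size_t len, uint32_t func_addr, uint32_t count)
{
    return snprintf(buf, len, "F:0x%08" PRIx32 ",%" PRIu32, func_addr, count);
}

/* Emits one line per used record through the caller-supplied emit function. */
void prof_dump(const prof_record_t *records, size_t used,
               void (*emit)(const char *line))
{
    char line[32];   /* longest possible line is 23 characters plus terminator */

    for (size_t i = 0; i < used; i++) {
        prof_format_record(line, sizeof line, records[i].func, records[i].count);
        emit(line);
    }
}
```

Since the dump only reads the node pool, it could be invoked from a debugger at a breakpoint or from a debug command, in line with the idea of outputting results at any point during execution.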
[0084] In some implementations, a specific implementation of the method for locating specific information about the functions whose call frequency was counted is shown in Figure 6. For example, a Python comparison script compares the original debug interface output (i.e., the call frequency statistics) with the linker map file to generate a more detailed statistical report (in .csv or .xml format) containing function names, function addresses, number of bytes occupied by the function, and call frequency. It is understood that in some implementations it may not be necessary to convert the CSV report into an XML report as shown in Figure 6; the CSV report can be output directly.
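The description above uses a Python comparison script; the same join is sketched here in C purely for illustration. The sketch assumes the function address, name and size have already been extracted from the linker map file into an array, because the map file layout depends on the toolchain. The names `map_symbol_t` and `report_row` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One function entry as it might be extracted from the linker map file. */
typedef struct {
    uint32_t    addr;   /* function address */
    const char *name;   /* function name */
    uint32_t    size;   /* number of bytes occupied by the function */
} map_symbol_t;

/* Turns one debug-interface line ("F:<address>,<frequency>") into one CSV row
 * "<address>,<name>,<size>,<frequency>".
 * Returns the row length, or -1 if the line is malformed or the address
 * does not appear in the symbol array. */
int report_row(const char *line, const map_symbol_t *syms, size_t nsyms,
               char *out, size_t out_len)
{
    unsigned int addr, count;

    /* %x accepts an optional 0x prefix, so "F:0x12009974,10658" parses. */
    if (sscanf(line, "F:%x,%u", &addr, &count) != 2)
        return -1;

    for (size_t i = 0; i < nsyms; i++) {
        if (syms[i].addr == addr)
            return snprintf(out, out_len, "0x%08x,%s,%u,%u",
                            addr, syms[i].name, (unsigned int)syms[i].size, count);
    }
    return -1;
}
```

Given the line F:0x12009974,10658 and a symbol entry for device_open of 878 bytes at that address, the row produced is 0x12009974,device_open,878,10658, matching the report fields listed above.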
[0085] As shown in Figure 7, an embodiment of this application also provides a function performance parameter acquisition device 100, including: an acquisition module 110, a statistics module 120, and a processing module 130. The processing module 130, indicated by the dashed box, is an optional module.
[0086] The acquisition module 110 is used to acquire a target file, wherein the target source code corresponding to the target file contains a callback function call instruction for counting function call frequency, so as to monitor the call frequency of each function in the target source code.
[0087] The statistics module 120 is used to count the call frequency of each function in the target file during the execution of the target file, obtain the call frequency statistics, and record the call frequency statistics in a hash table structure.
[0088] Optionally, the acquisition module 110 is specifically used to add the call instruction of the callback function at the entry point of each function in the target source code during the compilation of the target source code using a compiler that supports instrumentation, so as to obtain the target file.
[0089] Optionally, the statistics module 120 is specifically used to determine the hash value of the function address of the called target function, and find the corresponding hash bucket based on the hash value; if the index in the hash bucket is not the initial value, then find the corresponding target linked list based on the index in the hash bucket; traverse the nodes in the target linked list; if a node corresponding to the target function is found, update the call frequency recorded in the second field of the traversed node. If no node corresponding to the target function is found, add a new node to the target linked list, and update the values recorded in each field of the new node, as well as update the count value of the node counter. If the index in the hash bucket is the initial value, allocate a new node to the target function, update the index in the hash bucket to the index of the new node, update each field of the new node, and update the count value of the node counter.
[0090] Optionally, the acquisition module 110 is further configured to acquire the link mapping file corresponding to the target file, wherein the link mapping file includes: function address, function name, and number of bytes occupied by the function.
[0091] The processing module 130 is used to generate a statistical report containing function name, function address, number of bytes occupied by function and call frequency based on the call frequency statistics recorded in the hash table structure and the link mapping file.
[0092] The function performance parameter acquisition device 100 provided in this application embodiment has the same implementation principle and technical effect as the aforementioned method embodiment. For the sake of brevity, for any parts not mentioned in the device embodiment, reference may be made to the corresponding content in the aforementioned method embodiment. Figure 8 shows a structural block diagram of an electronic device 200 provided in an embodiment of this application. The electronic device 200 includes: a transceiver 210, a memory 220, a communication bus 230, and a processor 240. The transceiver 210, memory 220, and processor 240 are electrically connected directly or indirectly to achieve data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses 230 or signal lines. The transceiver 210 is used to send and receive data. The memory 220 is used to store computer programs, such as the software functional module shown in Figure 7, i.e., the function performance parameter acquisition device 100. The function performance parameter acquisition device 100 includes at least one software functional module that can be stored as software or firmware in the memory 220 or embedded in the operating system (OS) of the electronic device 200. The processor 240 is used to execute the executable module stored in the memory 220; for example, the processor 240 is used to execute the above-described function performance parameter acquisition method.
[0093] The memory 220 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
[0094] Processor 240 may be an integrated circuit chip with signal processing capabilities. The aforementioned processor can be a general-purpose processor, including a central processing unit (CPU), network processor (NP), graphics processing unit (GPU), accelerated processing unit (AP), multimedia application processor (MAP), microprocessor, etc.; it can also be a digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. Alternatively, processor 240 can also be any conventional processor.
[0095] Among them, the aforementioned electronic devices 200 include, but are not limited to: mobile phones, tablets, personal computers (PCs), netbooks, personal digital assistants (PDAs), servers, and other devices.
[0096] This application embodiment also provides a non-volatile computer-readable storage medium (hereinafter referred to as the storage medium) storing a computer program, which is executed by a computer such as the electronic device 200 described above to perform the function performance parameter acquisition method described above.
[0097] This application also provides a computer program product, which includes a computer program. When the computer program is executed by a computer, it performs the function performance parameter acquisition method described above.
[0098] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0099] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0100] In addition, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0101] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a computer-readable storage medium and includes several instructions to cause a computer device (which may be a personal computer, laptop, server, or electronic device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned computer-readable storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0102] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for obtaining function performance parameters, characterized in that it comprises: obtaining a target file, wherein the target source code corresponding to the target file contains call instructions for callback functions used to count function call frequencies, so as to monitor the call frequency of each function in the target source code; and, during the execution of the target file, counting the call frequency of each function in the target file to obtain call frequency statistics, and recording the call frequency statistics in a hash table structure.
2. The method according to claim 1, characterized in that, Obtaining the target file includes: During the compilation of the target source code using a compiler that supports instrumentation, a callback function call instruction is added at the entry point of each function in the target source code to obtain the target file.
3. The method according to claim 1, characterized in that, The hash table structure includes: a hash bucket array, a node pool array, and a node counter; The number of hash buckets in the hash bucket array is greater than or equal to the number of nodes in the node pool array, and the index in each hash bucket is used to point to the head node of the corresponding linked list in the node pool array; The number of nodes in the node pool array is consistent with the number of functions that need to be monitored in the target file. Each node in the node pool array includes a first field, a second field, and a third field. The first field is used to record the address of the monitored function, the second field is used to record the function call frequency, and the third field is used to represent the index of the next node pointed to by the current node. The node counter is used to record the number of nodes that have been used in the node pool array.
4. The method according to claim 3, characterized in that recording the call frequency statistics in a hash table structure includes: determining the hash value of the function address of the target function being called, and finding the corresponding hash bucket based on the hash value; if the index in the hash bucket is not the initial value, finding the corresponding target linked list based on the index in the hash bucket; traversing the nodes in the target linked list; and, if a node corresponding to the target function is found during traversal, updating the call frequency recorded in the second field of the traversed node.
5. The method according to claim 4, characterized in that, The method further includes: If no node corresponding to the target function is found, a new node is added to the target linked list, and the values recorded in each field of the new node are updated, as well as the count value of the node counter is updated.
6. The method according to claim 5, characterized in that, Adding a node to the target linked list includes: adding a node to the target linked list using head insertion; the method further includes: Update the index in the hash bucket to the index of the newly added node.
7. The method according to claim 4, characterized in that the method further includes: if the index in the hash bucket is the initial value, allocating a new node for the target function, updating the index in the hash bucket to the index of the new node, updating each field of the new node, and updating the count value of the node counter.
8. The method according to any one of claims 1-7, characterized in that, The call frequency statistics include: function address and call frequency; the method further includes: Obtain the link mapping file corresponding to the target file, wherein the link mapping file includes: function address, function name, and number of bytes occupied by the function; Based on the call frequency statistics recorded in the hash table structure and the link mapping file, a statistical report is generated that includes function name, function address, number of bytes occupied by the function, and call frequency.
9. A device for obtaining function performance parameters, characterized in that it comprises: an acquisition module, used to acquire a target file, wherein the target source code corresponding to the target file contains a callback function call instruction for counting function call frequency, so as to monitor the call frequency of each function in the target source code; and a statistics module, used to count the call frequency of each function in the target file during the execution of the target file, obtain the call frequency statistics, and record the call frequency statistics in a hash table structure.
10. An electronic device, characterized in that it comprises: a memory and a processor, wherein the processor is connected to the memory; the memory is used to store programs; and the processor is configured to invoke a program stored in the memory to execute the method as described in any one of claims 1-8.
11. A computer-readable storage medium, characterized in that, It stores a computer program, which, when executed by a processor, performs the method as described in any one of claims 1-8.
12. A computer program product comprising a computer program that, when executed by a processor, implements the method of any one of claims 1-8.