Kernel function calling method, device, equipment, storage medium and program product

By calling the target kernel function through the host-side management interface function, the problem of low testing efficiency of heterogeneous parallel library kernel functions is solved, enabling efficient kernel function development and debugging, and supporting rapid iteration and error avoidance.

CN117236423B · Active · Publication Date: 2026-06-16 · DAWNING INT INFORMATION IND CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: DAWNING INT INFORMATION IND CO LTD
Filing Date: 2023-09-27
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing heterogeneous parallel libraries are inefficient during kernel function testing, requiring extensive modifications and recompilation of the entire library, resulting in time-consuming and laborious development and debugging.

Method used

By obtaining the problem description information, the target kernel function is called using the host-side management interface function, enabling the testing and calling of the kernel function without needing to be embedded in a heterogeneous parallel library and recompiled. This isolates the CPU-side and GPU-side code and uses a unified interface for management and debugging.

Benefits of technology

It improves kernel function testing efficiency, simplifies the development process, reduces debugging complexity, matches or even exceeds the performance of heavyweight libraries, supports rapid development and iteration, and provides error avoidance capabilities.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to a kernel function calling method, device, equipment, storage medium and program product. The method comprises the following steps: acquiring problem description information, wherein the problem description information comprises test problem description information; acquiring a target kernel function corresponding to the problem description information according to the problem description information, wherein the target kernel function comprises a host management interface function, and the target kernel function is determined according to the host management interface function; and calling the target kernel function according to the problem description information and the host management interface function, so as to respond to the problem description information. The method can improve the kernel function test efficiency and improve the long tail problem of a heterogeneous parallel library.

Description

Technical Field

[0001] This application relates to the field of high-performance computer technology, and in particular to a kernel function calling method, apparatus, device, storage medium, and program product.

Background Technology

[0002] GPUs (graphics processing units) were the earliest heterogeneous parallel coprocessors. Nvidia built a complete hardware and software ecosystem for GPUs, including the CUDA programming platform and various infrastructure libraries based on CUDA, such as cuBLAS for linear algebra operations and cuDNN for deep learning acceleration. AMD replicated Nvidia's ecosystem almost one for one: the HIP programming platform corresponds to the CUDA programming platform, rocBLAS to cuBLAS, and MIOpen to cuDNN.

[0003] However, all of the aforementioned heterogeneous parallel libraries suffer from the long-tail problem in development. Taking the MIOpen library as an example, to test the performance and correctness of a kernel function that is about to be added to the library, traditional techniques require extensive and intrusive modifications to the entire MIOpen library: the kernel function is embedded into the library, the library is recompiled, and the MIOpen performance and correctness testing tools are then used to test the kernel function.

[0004] These methods suffer from low kernel function testing efficiency. Therefore, improving kernel function testing efficiency and mitigating the long-tail problem in heterogeneous parallel libraries is of great significance.

Summary of the Invention

[0005] Therefore, it is necessary to provide a kernel function calling method, apparatus, device, storage medium, and program product that can improve the efficiency of kernel function testing and alleviate the long-tail problem of heterogeneous parallel libraries, in order to address the above-mentioned technical problems.

[0006] Firstly, this application provides a kernel function calling method. The method includes:

[0007] Obtain problem description information, including test problem description information;

[0008] Based on the problem description information, obtain the target kernel function corresponding to the problem description information, wherein the target kernel function includes a host-side management interface function, and the target kernel function is determined based on the host-side management interface function;

[0009] Based on the problem description information and the host-side management interface function, the target kernel function is invoked in response to the problem description information.

[0010] In this embodiment, the target kernel function includes a host-side management interface function. By combining this host-side management interface function, the target kernel function can be called, thereby responding to the test problem description information to realize the test process of the target kernel function. It is not necessary to embed the target kernel function into a heterogeneous parallel library (such as the miopen library) and then recompile the entire miopen library for kernel function testing. In this embodiment, the target kernel function is similar to a plug-in, except that its actual business logic code runs on the GPU. During the test of calling the target kernel function, no code of any heterogeneous parallel library needs to be modified, saving a lot of code modification time, improving the testing efficiency of the target kernel function, and helping to improve the long-tail problem of heterogeneous parallel libraries.

[0011] In addition, because the CPU-side code (host-side management interface functions) and GPU-side code (GPU business logic function bodies) are isolated and template-based programming is not used, the debugging complexity is relatively low, whether debugging the CPU-side code or the GPU-side code.

[0012] The kernel functions (such as the target kernel function) written in this application embodiment do not need to be embedded in a heterogeneous parallel library (such as the miopen library) for kernel function testing. The ability to optimize each kernel function individually can be retained. Because the management and calling process of kernel functions in this application is more advanced, it can achieve the same excellent performance as heavy-duty libraries such as miopen or cuDNN, or even surpass these libraries.

[0013] In one embodiment, the host-side management interface function includes a function category interface function, which indicates the target function category corresponding to the target kernel function. The step of obtaining the target kernel function corresponding to the problem description information includes:

[0014] Based on the problem description information, determine the required functional category corresponding to the problem description information;

[0015] The target kernel function is obtained according to the required function category, wherein the required function category is the same as the target function category.

[0016] In this embodiment, the computer device implements multi-category management of multiple kernel functions based on the function category interface function of each kernel function. This enables rapid retrieval of kernel functions and helps developers define the category to which their kernel functions belong during the kernel function development phase.

[0017] In one embodiment, obtaining the target kernel function according to the required function category includes:

[0018] Based on the required function category, at least one candidate kernel function corresponding to the required function category is found in a preset mapping table. The function category of the candidate kernel function is the same as the required function category. The mapping table includes the mapping relationship between each function category and each kernel function corresponding to each function category.

[0019] The target kernel function is obtained based on the at least one candidate kernel function.

[0020] In this embodiment, the computer device categorizes the loaded kernel functions according to their functional categories and stores the categorization results in the form of a mapping table. Thus, for a given functional category, the computer device can quickly query all available kernel functions in the corresponding problem domain with O(1) complexity. In contrast, traditional techniques require sequentially querying which kernel function is available in the current problem domain from heterogeneous parallel libraries, which is inefficient. This embodiment significantly improves the query efficiency of kernel functions, thereby enhancing the testing efficiency and response timeliness of kernel functions.
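
The mapping table described above can be sketched as a dictionary keyed by function category. This is a minimal illustration only: the kernel names, the category strings, and the helper names `build_mapping_table` and `find_candidates` are hypothetical and do not come from the patent.

```python
from collections import defaultdict

# Hypothetical loaded kernels as (kernel name, function category) pairs.
LOADED_KERNELS = [
    ("conv_fwd_direct", "convolution/forward"),
    ("conv_fwd_winograd", "convolution/forward"),
    ("conv_bwd_data", "convolution/backward_data"),
    ("pooling_max", "pooling/forward"),
]


def build_mapping_table(kernels):
    """Group kernel names by function category (done once, at load time)."""
    table = defaultdict(list)
    for name, category in kernels:
        table[category].append(name)
    return dict(table)


def find_candidates(table, required_category):
    """Average-case O(1) dictionary lookup; empty list if nothing matches."""
    return table.get(required_category, [])


MAPPING_TABLE = build_mapping_table(LOADED_KERNELS)
```

Looking up `"convolution/forward"` returns both forward-convolution kernels in a single hash lookup, instead of asking every kernel in turn whether it applies.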

[0021] In one embodiment, the host-side management interface function further includes a GPU feature interface function, which indicates the target GPU features supported by the target kernel function. The step of calling the target kernel function based on the problem description information and the host-side management interface function includes:

[0022] Based on the problem description information and the host-side management interface function, the target kernel function is called through the GPU. The target GPU characteristics supported by the target kernel function match the GPU characteristics of the GPU, and the GPU characteristics of each kernel function in the mapping table match the GPU characteristics of the GPU.

[0023] Before classifying, caching, and mapping each kernel function, this embodiment first filters and judges each kernel function based on its GPU feature interface function. Only when the GPU features supported by the kernel function match the GPU features currently configured in the computer device is the kernel function retained, thereby improving the accuracy of classification, caching, and mapping, and increasing the success rate of kernel function calls.
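
A minimal sketch of this filter-then-classify order follows. It assumes the GPU feature interface function reports a set of supported GPU model names; `KernelInfo`, `filter_by_gpu`, `build_table`, and the model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelInfo:
    name: str
    category: str
    supported_gpus: frozenset  # assumed result of the GPU feature interface function


def filter_by_gpu(kernels, device_gpu):
    """Keep only the kernels whose supported GPUs include the installed GPU."""
    return [k for k in kernels if device_gpu in k.supported_gpus]


def build_table(kernels, device_gpu):
    """Filter first, then classify, so the table holds only callable kernels."""
    table = {}
    for k in filter_by_gpu(kernels, device_gpu):
        table.setdefault(k.category, []).append(k.name)
    return table
```

Because unsupported kernels never enter the table, every candidate returned by a later lookup can be launched on the installed GPU.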

[0024] In one embodiment, the host-side management interface function includes an actual parameter calculation interface function, and the step of calling the target kernel function based on the problem description information and the host-side management interface function includes:

[0025] The kernel function arguments corresponding to the problem description information are calculated using the actual parameter calculation interface function.

[0026] The executable file corresponding to the target kernel function is obtained based on the kernel function arguments.

[0027] The target kernel function is invoked based on the kernel function arguments, the executable file, and the host management interface function.

[0028] In this embodiment, during the process of the computer device obtaining the executable file corresponding to the target kernel function based on the kernel function arguments, for some kernel functions that do not depend on the specific problem domain for compilation, they can be pre-compiled during the initialization process. The computer device can then directly query the executable file corresponding to the target kernel function based on the relevant information. However, for cases where the compilation process must depend on the problem description information, the computer device will call the corresponding compilation instructions to compile the source code of the obtained target kernel function to obtain its executable binary code. This makes the process of obtaining the executable file corresponding to the target kernel function more flexible. Based on the kernel function arguments and the executable file, the target kernel function can be quickly called. In other words, in this embodiment, the kernel function developed by the developer can be quickly deployed for online verification. The developer does not need to understand other code details of the heterogeneous parallel library. Once verified, the kernel function can be quickly deployed for application.
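
The two paths for obtaining an executable (pre-compiled at initialization versus compiled on demand from the problem description) can be sketched as a small cache. The class `ExecutableStore` and the stand-in compiler `fake_compile` are hypothetical; a real system would invoke an actual GPU compiler here.

```python
class ExecutableStore:
    """Cache of compiled kernels keyed by (kernel name, compile-time defines)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._cache = {}

    def precompile(self, name, source):
        """Initialization-time build for kernels independent of the problem."""
        self._cache[(name, ())] = self._compile(source, ())

    def get(self, name, source, defines=()):
        """Return a cached executable, compiling only on a cache miss."""
        key = (name, tuple(defines))
        if key not in self._cache:
            self._cache[key] = self._compile(source, key[1])
        return self._cache[key]


compile_log = []  # records each compiler invocation, for illustration only


def fake_compile(source, defines):
    """Stand-in for a real compiler call; returns a fake 'binary' string."""
    compile_log.append(defines)
    return f"binary({source};{','.join(defines)})"
```

A kernel registered with `precompile` is served from the cache on every later `get`, while a problem-dependent kernel is compiled the first time a given set of defines is requested and reused afterwards.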

[0029] In one embodiment, the host-side management interface function further includes a kernel function call interface function, which is used to indicate the data memory address corresponding to the target kernel function. The step of calling the target kernel function based on the kernel function arguments, the executable file, and the host-side management interface function includes:

[0030] The target kernel function is called via the GPU based on the kernel function arguments, the executable file, and the data memory address.

[0031] This embodiment isolates the CPU-side code (host-side management interface functions) and the GPU-side code (GPU business logic function body) through the aforementioned unified interface, thereby decoupling the CPU-side code and GPU-side code and enabling rapid development and iteration of kernel functions.

[0032] In addition, this application also facilitates error avoidance and recovery. For example, if testing is insufficient and vulnerabilities in certain kernel functions are not discovered during the testing phase, and vulnerabilities occur during operation after the kernel function is deployed, since the kernel function is implemented in a plug-in-like form through the aforementioned unified interface, the computer device can uninstall the kernel function without shutting down, or disable the kernel function in a specific problem domain and report the vulnerability. Users can also selectively remove the kernel function.
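
One way to picture this plug-in style error avoidance is a registry that can drop a kernel, or block it for a single problem domain, while the process keeps running. The sketch below is an assumption about how such bookkeeping might look; `KernelRegistry`, its methods, and the problem keys are hypothetical.

```python
class KernelRegistry:
    def __init__(self):
        self._kernels = {}      # kernel name -> function category
        self._disabled = set()  # (kernel name, problem key) pairs
        self.reports = []       # recorded vulnerability reports

    def register(self, name, category):
        self._kernels[name] = category

    def unload(self, name):
        """Remove a kernel without restarting the host process."""
        self._kernels.pop(name, None)
        self._disabled = {d for d in self._disabled if d[0] != name}

    def disable_for(self, name, problem_key, reason):
        """Block one kernel for one problem domain and record a report."""
        self._disabled.add((name, problem_key))
        self.reports.append((name, problem_key, reason))

    def candidates(self, category, problem_key):
        """Kernels of this category that are still allowed for this problem."""
        return [n for n, c in self._kernels.items()
                if c == category and (n, problem_key) not in self._disabled]
```

A kernel disabled for one problem key remains available for other problems of the same category, which matches the idea of disabling a kernel only in a specific problem domain.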

[0033] Secondly, this application also provides a kernel function calling apparatus. The apparatus includes:

[0034] The first acquisition module is used to acquire problem description information, which includes test problem description information;

[0035] The second acquisition module is used to acquire the target kernel function corresponding to the problem description information based on the problem description information, wherein the target kernel function includes a host-side management interface function, and the target kernel function is determined based on the host-side management interface function;

[0036] The calling module is used to call the target kernel function in response to the problem description information, based on the problem description information and the host-side management interface function.

[0037] Thirdly, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method described in the first aspect above.

[0038] Fourthly, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, implements the steps of the method described in the first aspect above.

[0039] Fifthly, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, implements the steps of the method described in the first aspect above.

[0040] The aforementioned kernel function calling method, apparatus, device, storage medium, and program product obtain problem description information, including test problem description information. Then, based on the problem description information, they obtain the target kernel function corresponding to the problem description information. The target kernel function includes a host-side management interface function, and the target kernel function is determined based on that host-side management interface function. Then, based on the problem description information and the host-side management interface function, the target kernel function can be called to respond to the problem description information. Thus, in this embodiment, the target kernel function includes the host-side management interface function. Combining this host-side management interface function enables the calling of the target kernel function, thereby responding to the test problem description information to achieve the testing process of the target kernel function. This eliminates the need to embed the target kernel function into a heterogeneous parallel library (e.g., the MIOpen library) and recompile the entire library for kernel function testing. In this embodiment, the target kernel function is similar to a plug-in. During the testing process, no code modification of the heterogeneous parallel library is required, saving significant time spent on code modification, improving the testing efficiency of the target kernel function, and helping to address the long-tail problem of heterogeneous parallel libraries.

Brief Description of the Drawings

[0041] To more clearly illustrate the technical solutions in the embodiments or related technologies of this application, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0042] Figure 1 is a diagram illustrating the application environment of a kernel function calling method in one embodiment;

[0043] Figure 2 is a flowchart illustrating the kernel function calling method in one embodiment;

[0044] Figure 3 is a flowchart illustrating step 202 in another embodiment;

[0045] Figure 4 is an exemplary diagram illustrating the classification of kernel function categories in another embodiment;

[0046] Figure 5 is a flowchart illustrating step 302 in another embodiment;

[0047] Figure 6 is a flowchart illustrating step 203 in another embodiment;

[0048] Figure 7 is a block diagram of a kernel function calling apparatus in one embodiment;

[0049] Figure 8 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

[0050] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0051] GPU (graphics processing unit) was the earliest heterogeneous parallel coprocessor, which brought tremendous changes to the image display and processing industry and laid the foundation for the later wave of deep learning.

[0052] Nvidia has built a complete hardware and software ecosystem for GPUs, including the CUDA programming platform and various infrastructure libraries based on CUDA, such as cuBLAS for linear algebra operations and cuDNN for deep learning acceleration. AMD has replicated Nvidia's ecosystem in a similar way: the HIP programming platform corresponds to the CUDA programming platform, rocBLAS to cuBLAS, and MIOpen to cuDNN.

[0053] Unlike Nvidia's, AMD's entire ecosystem is completely open source. During the research and development process, the inventors of this application discovered that these heterogeneous parallel libraries all have the same problem: the long-tail problem in development.

[0054] Taking the miopen library as an example, if you want to test the performance and correctness of a kernel function that is about to be added to the miopen library, traditional techniques usually have two options: 1) You need to make extensive intrusive modifications to the entire miopen library, embed the kernel function into the miopen library, then recompile the miopen library (single-threaded compilation usually takes about an hour and a half), and then use the miopen performance and correctness testing tool to test the kernel function; 2) Manually write a simple test case for the kernel function. After the test case passes, use the method described in 1) to embed the test case into the miopen library, and use MIOpenDriver (i.e., the miopen performance and correctness testing tool) to perform a second retest of the kernel function.

[0055] However, whichever option is used, it is very time-consuming and labor-intensive for kernel function developers. Sometimes the code of the MIOpen library has to be modified repeatedly before a kernel function can be tested successfully. At present, testing a single kernel function can take several weeks or even a month of debugging, so kernel function testing efficiency is seriously low.

[0056] In view of this, the kernel function calling method provided in this application obtains problem description information, which includes test problem description information, and then obtains the target kernel function corresponding to the problem description information. The target kernel function includes a host-side management interface function, and the target kernel function is determined based on that host-side management interface function. Then, based on the problem description information and the host-side management interface function, the target kernel function can be called to respond to the problem description information. In this way, because the target kernel function includes the host-side management interface function, the target kernel function can be called through that interface function, thereby responding to the test problem description information and completing the test of the target kernel function. It is not necessary to embed the target kernel function into a heterogeneous parallel library (such as the MIOpen library) and then recompile the entire library for kernel function testing. The target kernel function in this application is similar to a plug-in. While the target kernel function is being called for testing, no code of the heterogeneous parallel library needs to be modified, which saves a lot of code modification time, improves the testing efficiency of the target kernel function, and helps alleviate the long-tail problem of heterogeneous parallel libraries.

[0057] The kernel function calling method provided in this application embodiment can be applied, for example, in the implementation environment shown in Figure 1. The implementation environment includes a computer device that can obtain problem description information, which includes test problem description information. Based on the problem description information, the computer device obtains the target kernel function corresponding to the problem description information. The target kernel function includes a host-side management interface function, and the target kernel function is determined based on that host-side management interface function. Based on the problem description information and the host-side management interface function, the computer device calls the target kernel function to respond to the problem description information.

[0058] In this embodiment, the computer device can be a server or another high-performance device equipped with a host and a GPU. For example, the host can be the CPU (central processing unit) of the computer device, and the computer device can contain one or more GPUs (Figure 1 shows only one GPU as an example). The following description of the kernel function calling method takes the CPU as the host. In this application embodiment, the CPU in the computer device can obtain problem description information, which includes test problem description information. The CPU obtains the target kernel function corresponding to the problem description information based on the problem description information. The target kernel function includes the host-side management interface function, and the target kernel function is determined based on that host-side management interface function. Based on the problem description information and the host-side management interface function, the CPU calls the target kernel function through the GPU to respond to the problem description information.

[0059] In one exemplary embodiment, as shown in Figure 2, a kernel function calling method is provided. Taking its application to the computer device in Figure 1 as an example, the method includes the following steps 201 to 203:

[0060] Step 201: The computer device obtains the problem description information.

[0061] The problem description information includes test problem description information.

[0062] After the kernel function is developed, it needs to be tested before going live. Taking the miopen library as an example, the kernel function written for the miopen library is intended to accelerate deep learning. In order to test the performance and correctness of the kernel function, the testing process needs to simulate the real usage process. First, the user needs to input a description of a specific problem domain, that is, the problem description information.

[0063] In the context of kernel function testing, the problem description information may include test problem description information. Of course, this application embodiment can also be used in the usage scenario after the kernel function is launched. In that case, the problem description information may also include the description information of the user's input on a specific problem domain in the usage scenario.

[0064] In this embodiment of the application, the problem description information is a description of a specific problem domain input by the user. The problem description information can be understood as the data basis for the user to perform data processing by calling kernel functions.

[0065] For example, the problem description information can indicate that a convolution of a matrix needs to be calculated, or, more specifically, that a forward-propagation convolution of the matrix needs to be calculated, and so on.
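
As an illustration only, problem description information of this kind might be held in a small record from which the required function category is derived. The patent does not define a concrete format, so every field name, the tensor layouts, and the `required_category` method below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemDescription:
    operation: str         # e.g. "convolution"
    direction: str         # e.g. "forward"
    input_shape: tuple     # assumed (N, C, H, W) layout
    filter_shape: tuple    # assumed (K, C, R, S) layout
    is_test: bool = False  # True for test problem description information

    def required_category(self) -> str:
        """Function category a matching kernel must declare."""
        return f"{self.operation}/{self.direction}"


problem = ProblemDescription(
    operation="convolution",
    direction="forward",
    input_shape=(1, 3, 32, 32),
    filter_shape=(8, 3, 3, 3),
    is_test=True,
)
```

The string returned by `required_category` is what a kernel's function category would be compared against when the target kernel function is selected.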

[0066] Step 202: The computer device obtains the target kernel function corresponding to the problem description information based on the problem description information.

[0067] The target kernel function includes the host-side management interface function, and the target kernel function is determined based on that host-side management interface function.

[0068] In this embodiment of the application, during the development phase of a kernel function, the developer abstracts the interface of the kernel function and divides the kernel function into a host-side management interface function and a GPU business logic function body.

[0069] Among them, the host-side management interface function is used by the computer device to manage and call the kernel function through the CPU. For example, the host-side management interface function is used to manage one or more of the following: the GPU features supported by the kernel function (such as the model of the supported GPU), the function category corresponding to the kernel function, the actual parameter calculation of the kernel function, and the data memory address when the kernel function is executed.

[0070] For example, based on the host-side management interface functions, it is also possible to manage the compilation, searching, optimization, and calling logic of kernel functions, and so on.

[0071] The GPU business logic function body is the function body in the kernel function used to implement business logic. The execution of the kernel function is usually implemented by the GPU.

[0072] Thus, when developers write kernel functions, in addition to writing the GPU business logic function body, they also need to write the host-side management interface function.
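
The split between the host-side management interface function and the GPU business logic function body can be sketched as an abstract contract plus one kernel that implements it. The four methods mirror the four responsibilities listed above (supported GPU features, function category, actual parameter calculation, data memory address); the class names, method names, and the vector-add kernel are hypothetical, and the GPU body is kept as source text rather than compiled.

```python
from abc import ABC, abstractmethod


class HostManagementInterface(ABC):
    """Hypothetical CPU-side contract that every plug-in kernel implements."""

    @abstractmethod
    def function_category(self) -> str:
        """Function category of the kernel."""

    @abstractmethod
    def supported_gpus(self) -> frozenset:
        """GPU models the kernel can run on."""

    @abstractmethod
    def compute_args(self, problem: dict) -> dict:
        """Derive kernel arguments (sizes, launch shape) from the problem."""

    @abstractmethod
    def data_addresses(self, buffers: dict) -> tuple:
        """Return the device addresses passed to the kernel at launch."""


class VectorAddKernel(HostManagementInterface):
    # GPU business logic body, stored as source text and compiled separately.
    GPU_SOURCE = r'''
    extern "C" __global__ void vector_add(const float* a, const float* b,
                                          float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }
    '''
    BLOCK = 256  # threads per block, an arbitrary illustrative choice

    def function_category(self):
        return "elementwise/add"

    def supported_gpus(self):
        return frozenset({"gpu_model_a", "gpu_model_b"})

    def compute_args(self, problem):
        n = problem["n"]
        grid = (n + self.BLOCK - 1) // self.BLOCK  # ceiling division
        return {"n": n, "block": self.BLOCK, "grid": grid}

    def data_addresses(self, buffers):
        return (buffers["a"], buffers["b"], buffers["c"])
```

Everything outside `GPU_SOURCE` is ordinary CPU code, so it can be stepped through with a normal debugger without touching the GPU body, and the contract involves no template programming.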

[0073] By using the above method, the business logic part of the kernel function can be isolated from other complex management and call logic. In this way, developers do not need to understand any code details of the heterogeneous parallel library during the kernel function development process. They only need to focus on the development of the kernel function logic code and can integrate their written kernel function into the heterogeneous parallel library.

[0074] In this embodiment of the application, during the initialization of the heterogeneous parallel library (such as the miopen library), the computer device can load all kernel functions that implement the aforementioned host management interface functions. In this way, after the computer device obtains the problem description information through step 201, in step 202, it can determine the target kernel function corresponding to the problem description information from all the kernel functions that have been loaded and implement the host management interface functions, using the host management interface functions of each kernel function.
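
A minimal sketch of this load step follows, assuming a kernel counts as loadable only when it provides every host-side management interface function. The interface method names, `implements_interface`, `load_kernels`, and the two example classes are hypothetical.

```python
# Assumed names of the host-side management interface functions.
REQUIRED_INTERFACE = (
    "function_category",
    "supported_gpus",
    "compute_args",
    "data_addresses",
)


def implements_interface(obj):
    """True if obj provides every required interface function as a callable."""
    return all(callable(getattr(obj, name, None)) for name in REQUIRED_INTERFACE)


def load_kernels(candidates):
    """Split candidates into kernels that can be loaded and those skipped."""
    loaded, skipped = [], []
    for obj in candidates:
        (loaded if implements_interface(obj) else skipped).append(obj)
    return loaded, skipped


class GoodKernel:
    def function_category(self):
        return "convolution/forward"

    def supported_gpus(self):
        return frozenset({"gpu_model_a"})

    def compute_args(self, problem):
        return {}

    def data_addresses(self, buffers):
        return ()


class IncompleteKernel:
    def function_category(self):
        return "pooling/forward"
```

Only the loaded kernels take part in the later selection of a target kernel function; an incomplete kernel is left out without any change to the library code.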

[0075] It is understandable that the target kernel function also includes host-side management interface functions, and the determination of the target kernel function is combined with the implementation of its host-side management interface functions.

[0076] For example, after obtaining the problem description information, the computer device can match the problem description information with a kernel function that has the required functional category as the target kernel function based on the host management interface function of each kernel function; for example, it can also match the problem description information with a kernel function that supports the GPU characteristics of the GPU in the current computer device as the target kernel function, and so on.

[0077] Step 203: The computer device calls the target kernel function in response to the problem description information based on the problem description information and the host management interface function.

[0078] Once the computer device selects a suitable target kernel function based on the problem description information, it can call that target kernel function.

[0079] For example, as described above, the computer can obtain information such as the kernel function arguments and executable file of the target kernel function based on the host management interface function and problem description information of the target kernel function, and use the obtained information to execute the target kernel function on the GPU to realize the call of the target kernel function.
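
Steps 202 and 203 can be walked through end to end with a toy kernel. In this sketch the "executable" is a plain Python callable standing in for a compiled GPU binary, so nothing runs on a real GPU; `ScaleKernel`, `call_kernel`, and the dictionary-based problem format are hypothetical.

```python
class ScaleKernel:
    """Hypothetical plug-in kernel: multiplies every element by a factor."""

    def function_category(self):
        return "elementwise/scale"

    def compute_args(self, problem):
        return {"n": len(problem["data"]), "factor": problem["factor"]}

    def executable(self, args):
        # Stand-in for a compiled GPU binary: an ordinary Python callable.
        def run(buffer):
            return [x * args["factor"] for x in buffer[:args["n"]]]
        return run


def call_kernel(problem, loaded_kernels):
    # Step 202: select the kernel whose category matches the problem.
    target = next((k for k in loaded_kernels
                   if k.function_category() == problem["category"]), None)
    if target is None:
        raise LookupError(f"no kernel for category {problem['category']!r}")
    # Step 203: compute the arguments, obtain the executable, then launch.
    args = target.compute_args(problem)
    exe = target.executable(args)
    return exe(problem["data"])


result = call_kernel(
    {"category": "elementwise/scale", "data": [1, 2, 3], "factor": 2},
    [ScaleKernel()],
)
```

The caller supplies only a problem description and the list of loaded kernels; selection, argument calculation, and launch all go through the kernel's own interface functions.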

[0080] The above embodiment obtains problem description information, including test problem description information, and then obtains the target kernel function corresponding to the problem description information. The target kernel function includes a host-side management interface function, and the target kernel function is determined based on that host-side management interface function. Then, based on the problem description information and the host-side management interface function, the target kernel function can be called to respond to the problem description information. Thus, in this embodiment, the target kernel function includes the host-side management interface function, and combining this function enables the calling of the target kernel function, thereby responding to the test problem description information to achieve the testing process of the target kernel function. This eliminates the need to embed the target kernel function into a heterogeneous parallel library (such as the MIOpen library) and recompile the entire library for kernel function testing. In this embodiment, the target kernel function is similar to a plug-in, except that its actual business logic code runs on the GPU. During the testing process, no code modification of the heterogeneous parallel library is required, saving significant time spent on code modification, improving the testing efficiency of the target kernel function, and helping to address the long-tail problem of heterogeneous parallel libraries.

[0081] The following describes other beneficial effects that the kernel function calling method of the embodiments of this application can achieve.

[0082] In traditional technologies, Nvidia and AMD have also attempted to address the long-tail problem in the development of the heterogeneous parallel libraries cuDNN and MIOpen. Nvidia developed a new library called Cutlass, which fully utilizes the features of C++ generic programming and is written in a fully templated manner. It breaks down a Gemm (general matrix multiplication) operation into a series of smaller operations, writes templated code for these operations, and then completes the Gemm operation by combining these template functions through hierarchical calls. AMD developed a library called Composable_kernel using the same principle to achieve the same functionality.

[0083] The cutlass and composable_kernel libraries mentioned above have solved the long-tail problem in development to some extent, but they still have at least the following drawbacks:

[0084] 1) Debugging difficulties. The use of fully templated code leads to debugging difficulties, which is an inherent problem of C++ generic code. Furthermore, kernel functions run on GPU hardware, making cutlass and composable_kernel even more difficult to debug.

[0085] 2) Performance issues. These two highly abstract template libraries mainly abstract and implement Gemm operations. Other deep learning operations, such as convolution and deconvolution, must first be converted into Gemm operations before these libraries can compute them, and this conversion adds further performance loss.

[0086] Due to the aforementioned issues, the Cutlass and Composable_kernel libraries are currently mostly used in areas such as operator fusion, serving as supplements to the shortcomings of the cuDNN and MIOpen libraries.

[0087] In this embodiment, since the CPU-side code (host-side management interface function) and GPU-side code (GPU business logic function body) are isolated and template-based programming is not used, debugging either the CPU-side code or the GPU-side code is much simpler than debugging libraries such as cutlass and composable_kernel. In other words, the debugging complexity of this embodiment is low.

[0088] In addition, the kernel functions (such as the target kernel function) written in the embodiments of this application do not need to be embedded in heterogeneous parallel libraries (such as the miopen library) for kernel function testing. The ability to optimize each kernel function can be retained. Because the management and calling process of kernel functions in this application is more advanced, it can achieve the same excellent performance as heavy libraries such as miopen or cuDNN, or even surpass these libraries.

[0089] In one embodiment, based on the embodiment shown in Figure 2 and referring to Figure 3, this embodiment relates to the process by which a computer device obtains the target kernel function corresponding to the problem description information based on the problem description information. As shown in Figure 3, in this embodiment, step 202 includes steps 301 and 302:

[0090] Step 301: The computer device determines the required functional category corresponding to the problem description information based on the problem description information.

[0091] Computer devices convert user-input problem description information into categories with specific labels, namely, required function categories. These required function categories refer to the function categories that the user needs kernel functions to implement. For example, the problem description information may represent the need to calculate the convolution of a matrix, specifically the calculation of a forward propagation convolution with preset hyperparameters. The required function category may be the forward propagation convolution under the convolution operation.

[0092] As one implementation method, computer devices can perform character recognition, keyword extraction, and other processing on the problem description information to obtain the required functional category corresponding to the problem description information.
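A minimal sketch of step 301 using keyword extraction, assuming the problem description is free text. The keyword tables, the `FunctionCategory` type, and the subcategory naming scheme are hypothetical illustrations and are not taken from this application.

```python
import re
from typing import NamedTuple


class FunctionCategory(NamedTuple):
    """Hypothetical category label: a major category plus a subcategory."""
    major: str
    sub: str


# Hypothetical keyword vocabulary; a real system would define its own.
MAJOR_KEYWORDS = {"convolution": "conv", "conv": "conv", "gemm": "gemm"}
DIRECTION_KEYWORDS = {"forward": "fwd", "backward": "bwd"}


def required_category(description: str) -> FunctionCategory:
    """Map a free-text problem description to a required function category."""
    text = description.lower()
    words = re.findall(r"[a-z]+", text)
    major = next((MAJOR_KEYWORDS[w] for w in words if w in MAJOR_KEYWORDS), None)
    if major is None:
        raise ValueError("no known operation keyword in description")
    direction = next(
        (DIRECTION_KEYWORDS[w] for w in words if w in DIRECTION_KEYWORDS), "any"
    )
    # Hyperparameters such as "padding=0x0" or "stride=1x1" refine the subcategory.
    params = dict(re.findall(r"(padding|stride)\s*=\s*(\d+x\d+)", text))
    parts = [direction] + [
        f"{key}{params[key]}" for key in ("padding", "stride") if key in params
    ]
    return FunctionCategory(major, "_".join(parts))


category = required_category("Forward convolution with padding=0x0 and stride=1x1")
```

Here `category` combines a major category (`conv`) with a subcategory that encodes the direction and the preset hyperparameters, mirroring the forward propagation convolution example in the text.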

[0093] Step 302: The computer device obtains the target kernel function according to the required function category.

[0094] The required function category is the same as the target function category.

[0095] In this embodiment, the host-side management interface function includes a function category interface function, which is used to indicate the target function category corresponding to the target kernel function.

[0096] As mentioned above, during the initialization of a heterogeneous parallel library (such as the miopen library), a computer device can load all kernel functions that implement the aforementioned host-side management interface functions. Each kernel function's host-side management interface function includes a function category interface function, which is used to indicate the function category corresponding to the kernel function.

[0097] Function category interface functions are primarily used to help computer devices classify, cache, and schedule kernel functions according to their function categories. For example, one kernel function might be classified as calculating matrix convolutions and can only perform forward propagation convolutions with hyperparameter padding of 0x0 and stride of 1x1, while another kernel function might be classified as performing Gemm operations. Therefore, during kernel function development, developers specify the function category of the kernel function in their written kernel functions using the function category interface functions.

[0098] In this embodiment of the application, the function category interface function of the kernel function indicates that the function category of the kernel function may include a major category and a subcategory under the major category. For example, if the function category of a certain kernel function is to calculate the convolution of a matrix and can only calculate the forward propagation convolution with hyperparameter padding = 0x0 and stride = 1x1, then its major category is to calculate the convolution of a matrix and its subcategory is to calculate the forward propagation convolution with hyperparameter padding = 0x0 and stride = 1x1.

[0099] In this way, for different kernel functions, the computer device can categorize kernel functions with the same major category and the same subcategory into the same functional category. The functional category interface function in this application essentially defines a multi-label category system. For different kernel functions, they will only be classified into the same category if all functional category labels are the same.

[0100] For example, see Figure 4, which is an exemplary diagram illustrating the classification of kernel function functional categories.

[0101] As shown in Figure 4, the functional categories of the kernel functions loaded by the computer device include major categories A, B, C, D, E, and F. For major category A, the subcategories include A1-A8; for major category B, the subcategories include B1 and B2. If a kernel function is classified as subcategory A1 under major category A, then the functional category of that kernel function is A1.

[0102] It should be noted that a kernel function may belong to multiple functional categories. For example, kernel function A belongs to functional categories (A2, B1, D1, E5), kernel function B belongs to functional categories (A2, B2, C4, D1, E5), and so on.
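The multi-label rule described above (two kernel functions fall into the same category only if all of their labels are identical) can be sketched with immutable label sets used as dictionary keys. The kernel names are hypothetical, and `kernel_c` is added purely for illustration.

```python
from collections import defaultdict

# Hypothetical kernels, each tagged with the label set reported by its
# function category interface function.
kernel_labels = {
    "kernel_a": frozenset({"A2", "B1", "D1", "E5"}),
    "kernel_b": frozenset({"A2", "B2", "C4", "D1", "E5"}),
    "kernel_c": frozenset({"A2", "B1", "D1", "E5"}),  # same labels as kernel_a
}

# Kernels share a bucket only when every label matches.
groups = defaultdict(list)
for name, labels in kernel_labels.items():
    groups[labels].append(name)
```

With these inputs, `kernel_a` and `kernel_c` land in one bucket, while `kernel_b` gets its own bucket even though it shares the labels A2, D1, and E5 with the others.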

[0103] The computer equipment categorizes the loaded kernel functions according to their functional categories in the manner described above.

[0104] In this way, in subsequent kernel function testing scenarios and usage scenarios after the kernel function is deployed, the computer device can obtain the target kernel function from the classification results based on the required function category, where the required function category and the target function category are the same. For example, with reference to Figure 4, if the required functional category is A1, then the target kernel function is a kernel function of subcategory A1 under major category A.

[0105] In this embodiment, the computer device implements multi-category management of multiple kernel functions based on the functional category interface functions of each kernel function. This enables rapid retrieval of kernel functions and assists developers in defining the category to which their developed kernel functions belong during the kernel function development phase.

[0106] In one possible implementation of step 302, referring to Figure 5, step 302 includes steps 501 and 502:

[0107] Step 501: The computer device searches for at least one candidate kernel function corresponding to the required function category in a preset mapping table according to the required function category.

[0108] The functional categories of the candidate kernel functions are the same as the required functional categories. The mapping table includes the mapping relationship between each functional category and the corresponding kernel functions.

[0109] In this embodiment, after the computer device classifies the loaded kernel functions according to their functional categories in the manner described above, the classification results can be stored in the form of a mapping table. The mapping table can classify the kernel functions according to their functional categories. As mentioned above, the computer device can group kernel functions with the same major category and the same minor category into the same functional category.

[0110] Understandably, for a given kernel function, the function category interface function is called only once, during the initialization process that establishes the mapping table, and does not need to be called again.

[0111] In this way, the computer device can directly look up at least one candidate kernel function corresponding to the required function category in the mapping table. For example, with reference to Figure 4, if the required functional category is subcategory A1 under major category A, all kernel functions in the mapping table that are classified under A1 are candidate kernel functions.
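The mapping table of step 501 can be sketched as a hash map built once at initialization, assuming categories are hashable values. The `Kernel` class, its `function_category` method, and the kernel names are hypothetical; the call counter exists only to show that the category interface is invoked once per kernel.

```python
from collections import defaultdict


class Kernel:
    """Hypothetical kernel plug-in exposing a function category interface."""

    def __init__(self, name, category):
        self.name = name
        self._category = category
        self.category_calls = 0  # counts calls to the category interface

    def function_category(self):
        self.category_calls += 1
        return self._category


def build_mapping_table(kernels):
    """Initialization: call each kernel's category interface exactly once."""
    table = defaultdict(list)
    for kernel in kernels:
        table[kernel.function_category()].append(kernel)
    return dict(table)


def find_candidates(table, required_category):
    """Step 501: a single hash lookup, average-case constant time."""
    return table.get(required_category, [])


kernels = [
    Kernel("conv_a1_v1", ("A", "A1")),
    Kernel("conv_a1_v2", ("A", "A1")),
    Kernel("gemm_b1", ("B", "B1")),
]
table = build_mapping_table(kernels)
candidates = find_candidates(table, ("A", "A1"))
```

Later lookups read only the table, so no kernel's category interface is called a second time.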

[0112] Step 502: The computer device obtains the target kernel function based on at least one candidate kernel function.

[0113] The number of candidate kernel functions can be one or more. If the number of candidate kernel functions is one, the candidate kernel function is directly used as the target kernel function.

[0114] When there are multiple candidate kernel functions, the computer device can quickly find the candidate kernel function with the best performance among these multiple candidate kernel functions according to the preset optimization logic, and take the candidate kernel function with the best performance as the target kernel function.
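The application does not specify the preset optimization logic. The sketch below assumes the simplest possible rule: measure a cost for each candidate and pick the lowest. The function names are hypothetical, and `wall_clock_cost` measures host-side time only; timing real GPU work would also require device synchronization.

```python
import time
from typing import Callable, Sequence


def wall_clock_cost(run: Callable[[], None], repeats: int = 3) -> float:
    """Best-of-N wall-clock time for one candidate (host-side timing only)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def select_target(candidates: Sequence, cost: Callable[[object], float]):
    """Step 502: one candidate is used directly; otherwise pick the cheapest."""
    if not candidates:
        raise LookupError("no candidate kernel function for this category")
    if len(candidates) == 1:
        return candidates[0]
    return min(candidates, key=cost)


# Hypothetical pre-measured costs, used here in place of real benchmark runs.
measured = {"conv_v1": 3.0, "conv_v2": 1.0}
target = select_target(["conv_v1", "conv_v2"], measured.get)
```

Passing the cost function in as a parameter keeps the selection rule separate from how performance is measured, so a cached tuning database could replace live timing.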

[0115] In this embodiment, the computer device categorizes the loaded kernel functions according to their functional categories as described above, and stores the categorization results in the form of a mapping table. Thus, for a given functional category, the computer device can quickly query all available kernel functions in the corresponding problem domain with O(1) complexity. In contrast, traditional techniques require sequentially querying which kernel function is available in the current problem domain from heterogeneous parallel libraries, which is inefficient. This embodiment significantly improves the query efficiency of kernel functions, thereby enhancing the testing efficiency and response timeliness of kernel functions.

[0116] In one embodiment, based on the embodiment shown in Figure 5, the host-side management interface function further includes a GPU feature interface function. The GPU feature interface function is used to indicate the target GPU features supported by the target kernel function. In this embodiment, step 203 may include the following step A1:

[0117] Step A1: The computer device calls the target kernel function through the GPU based on the problem description information and the host management interface function. The target GPU characteristics supported by the target kernel function match the GPU characteristics of the GPU, and the GPU characteristics of each kernel function in the mapping table match the GPU characteristics of the GPU.

[0118] In this embodiment, during the development of kernel functions, developers also need to write GPU feature interface functions. These GPU feature interface functions represent the GPU features supported by the kernel function. For example, a kernel function may only support MI100 (a model of AMD GPU), which needs to be specified in the GPU feature interface function of that kernel function.

[0119] In this way, during the initialization process of the computer device, after loading all the kernel functions that implement the above-mentioned host management interface functions, the GPU feature interface functions of each kernel function can be used to filter and judge each kernel function, check whether each kernel function supports the GPU currently configured in the computer device, and retain the kernel functions whose supported GPU features match the GPU features currently configured in the computer device.

[0120] Next, for the kernel functions that have been verified, the computer device will further call its functional category interface functions to establish the mapping table.

[0121] Before classifying, caching, and mapping each kernel function, this embodiment first filters and judges each kernel function based on its GPU feature interface function. Only when the GPU features supported by the kernel function match the GPU features currently configured in the computer device is the kernel function retained, thereby improving the accuracy of classification, caching, and mapping, and increasing the success rate of kernel function calls.
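The filter-then-map order described above can be sketched as follows, assuming each kernel reports its supported GPUs as a set of model names. `KernelInfo`, the kernel names, and `OTHER_GPU` are hypothetical; `MI100` is the example model mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelInfo:
    name: str
    category: str
    supported_gpus: frozenset  # what the GPU feature interface function reports


def retain_supported(kernels, current_gpu):
    """Keep only kernels whose supported GPU features match the installed GPU."""
    return [k for k in kernels if current_gpu in k.supported_gpus]


def build_table(kernels, current_gpu):
    """Filter first, then build the category mapping from the retained kernels."""
    table = {}
    for kernel in retain_supported(kernels, current_gpu):
        table.setdefault(kernel.category, []).append(kernel.name)
    return table


kernels = [
    KernelInfo("conv_mi100", "conv_fwd", frozenset({"MI100"})),
    KernelInfo("conv_other", "conv_fwd", frozenset({"OTHER_GPU"})),
    KernelInfo("gemm_any", "gemm", frozenset({"MI100", "OTHER_GPU"})),
]
table = build_table(kernels, current_gpu="MI100")
```

Because unsupported kernels never enter the table, every candidate returned by a later lookup can run on the installed GPU.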

[0122] In one embodiment, based on the embodiment shown in Figure 2 and referring to Figure 6, the host-side management interface function includes an actual parameter calculation interface function. This embodiment describes the process by which the computer device calls the target kernel function based on the problem description information and the host-side management interface function. As shown in Figure 6, step 203 may include steps 601 to 603:

[0123] Step 601: The computer device uses the actual parameter calculation interface function to calculate the kernel function actual parameters corresponding to the problem description information.

[0124] In this embodiment, during the development of the kernel function, the developer also needs to write an actual parameter calculation interface function, which is used to calculate the corresponding kernel function actual parameters based on the problem description information.

[0125] Understandably, the problem description information input by the user needs to be converted into parameters that the kernel function can recognize. These parameters are called kernel function arguments. The computer device can call the argument calculation interface function of the target kernel function to calculate the kernel function arguments corresponding to the problem description information.
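As an illustration of converting problem description information into kernel function arguments, the sketch below assumes a 2D forward convolution without dilation and uses the standard output-size formula. The `ConvProblem` fields and the returned argument names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvProblem:
    """Hypothetical problem description for a 2D forward convolution."""
    in_h: int
    in_w: int
    kernel_h: int
    kernel_w: int
    pad_h: int = 0
    pad_w: int = 0
    stride_h: int = 1
    stride_w: int = 1


def compute_kernel_args(p: ConvProblem) -> dict:
    """Derive the arguments a kernel needs from the user-level description."""
    # Standard convolution output size (no dilation):
    # out = (in + 2 * pad - kernel) // stride + 1
    out_h = (p.in_h + 2 * p.pad_h - p.kernel_h) // p.stride_h + 1
    out_w = (p.in_w + 2 * p.pad_w - p.kernel_w) // p.stride_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("filter does not fit inside the padded input")
    return {"out_h": out_h, "out_w": out_w, "work_items": out_h * out_w}


args = compute_kernel_args(ConvProblem(in_h=32, in_w=32, kernel_h=3, kernel_w=3))
```

With padding 0x0 and stride 1x1, a 32x32 input and a 3x3 filter give a 30x30 output.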

[0126] Step 602: The computer device obtains the executable file corresponding to the target kernel function based on the kernel function arguments.

[0127] After obtaining the kernel function arguments, the computer device can look up the corresponding executable file for the target kernel function based on the kernel function arguments, the function name of the target kernel function, and other information. That is, for some kernel functions that do not depend on a specific problem domain for compilation, they can be pre-compiled during the initialization process described above. Here, the computer device directly looks up the corresponding executable file for the target kernel function based on the relevant information.

[0128] In other possible implementations, if the compilation process of the target kernel function must rely on the problem description information, the computer device obtains the source code of the target kernel function and calls the corresponding compilation instructions to compile it into executable binary code, that is, an executable file, which is cached to avoid subsequent recompilation.
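The two ways of obtaining the executable file (lookup of a pre-compiled binary, or compile on demand followed by caching) can be sketched as below. No real compiler is invoked: `compile_fn` is a hypothetical stand-in for whatever compilation instruction the system uses, and `ExecutableCache` is an illustrative name.

```python
class ExecutableCache:
    """Sketch of executable lookup with on-demand compilation."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # hypothetical: (source, args_key) -> bytes
        self._binaries = {}
        self.compile_count = 0

    def precompile(self, kernel_name, source):
        """For kernels whose compilation does not depend on the problem."""
        self._binaries[(kernel_name, None)] = self._build(source, None)

    def get(self, kernel_name, source, args_key):
        # Try the problem-independent binary first, then the problem-specific one.
        for key in ((kernel_name, None), (kernel_name, args_key)):
            if key in self._binaries:
                return self._binaries[key]
        binary = self._build(source, args_key)
        self._binaries[(kernel_name, args_key)] = binary  # avoid recompilation
        return binary

    def _build(self, source, args_key):
        self.compile_count += 1
        return self._compile(source, args_key)


# Stand-in compiler that just tags the source with the argument key.
cache = ExecutableCache(lambda source, key: f"{source}|{key}".encode())
cache.precompile("gemm", "gemm_src")
gemm_bin = cache.get("gemm", "gemm_src", (64, 64))
conv_bin_first = cache.get("conv", "conv_src", (3, 3))
conv_bin_again = cache.get("conv", "conv_src", (3, 3))
```

The second request for the same convolution arguments is served from the cache, so the stand-in compiler runs only twice in total.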

[0129] It should be noted that, for the same problem description information, the actual parameter calculation interface function is called only once.

[0130] Step 603: The computer device calls the target kernel function based on the kernel function arguments, the executable file, and the host management interface function.

[0131] Computer devices can load kernel function arguments and executable files onto the GPU to prepare for subsequent calls to the target kernel function.

[0132] In this embodiment, during the process of the computer device obtaining the executable file corresponding to the target kernel function based on the kernel function arguments, for some kernel functions that do not depend on the specific problem domain for compilation, they can be pre-compiled during the initialization process. The computer device can then directly query the executable file corresponding to the target kernel function based on the relevant information. However, for cases where the compilation process must depend on the problem description information, the computer device will call the corresponding compilation instructions to compile the source code of the obtained target kernel function to obtain its executable binary code. This makes the process of obtaining the executable file corresponding to the target kernel function more flexible. Based on the kernel function arguments and the executable file, the target kernel function can be quickly called. In other words, in this embodiment, the kernel function developed by the developer can be quickly deployed for online verification. The developer does not need to understand other code details of the heterogeneous parallel library. Once verified, the kernel function can be quickly deployed for application.

[0133] In one embodiment, based on the embodiment shown in Figure 6, the host-side management interface function further includes a kernel function call interface function, which is used to indicate the data memory address corresponding to the target kernel function.

[0134] The computer device can call the target kernel function through the GPU based on the kernel function arguments, the executable file, and the data memory address to achieve the process in step 603.

[0135] In this embodiment, during the development of the kernel function, the developer also needs to write a kernel function call interface function. The kernel function call interface function is used to indicate the data memory address corresponding to the kernel function. The data memory address is the actual data memory address involved in the operation. In this way, after the computer device loads the kernel function arguments, executable file and data memory address into the GPU, the GPU executes the target kernel function according to the kernel function arguments, executable file and data memory address, thus realizing the call of the target kernel function.
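Taken together, the four interface functions a developer writes can be pictured as one host-side contract. The sketch below expresses it as a Python abstract base class; the class and method names are hypothetical, the application does not prescribe this form, and the addresses are plain integers rather than real device pointers.

```python
from abc import ABC, abstractmethod


class HostManagementInterface(ABC):
    """Hypothetical host-side contract; method names are illustrative."""

    @abstractmethod
    def function_category(self) -> tuple: ...

    @abstractmethod
    def supported_gpus(self) -> frozenset: ...

    @abstractmethod
    def compute_args(self, problem: dict) -> dict: ...

    @abstractmethod
    def data_addresses(self, buffers: dict) -> list: ...


class ConvFwdKernel(HostManagementInterface):
    """Example plug-in: forward convolution, padding 0x0, stride 1x1."""

    def function_category(self):
        return ("convolution", "forward_padding0x0_stride1x1")

    def supported_gpus(self):
        return frozenset({"MI100"})

    def compute_args(self, problem):
        return {
            "out_h": problem["in_h"] - problem["kernel_h"] + 1,
            "out_w": problem["in_w"] - problem["kernel_w"] + 1,
        }

    def data_addresses(self, buffers):
        # Order in which the GPU-side function body expects its data pointers.
        return [buffers["input"], buffers["filter"], buffers["output"]]


kernel = ConvFwdKernel()
kernel_args = kernel.compute_args({"in_h": 8, "in_w": 8, "kernel_h": 3, "kernel_w": 3})
addresses = kernel.data_addresses({"input": 0x1000, "filter": 0x2000, "output": 0x3000})
```

A subclass that omits any of the four methods cannot be instantiated, which gives the loader a simple way to reject incomplete plug-ins.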

[0136] This embodiment isolates the CPU-side code (host-side management interface functions) and the GPU-side code (GPU business logic function body) through the aforementioned unified interface, thereby decoupling the CPU-side code and GPU-side code and enabling rapid development and iteration of kernel functions.

[0137] In addition, this application also facilitates error avoidance and recovery. For example, if testing is insufficient and vulnerabilities in certain kernel functions are not discovered during the testing phase, and vulnerabilities occur during operation after the kernel function is deployed, since the kernel function is implemented in a plug-in-like form through the aforementioned unified interface, the computer device can uninstall the kernel function without shutting down, or disable the kernel function in a specific problem domain and report the vulnerability. Users can also selectively remove the kernel function.
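The error-avoidance behavior can be sketched as a registry that supports disabling a kernel function for one problem domain, or unloading it entirely, while the process keeps running. `KernelRegistry` and its method names are hypothetical, and the report list stands in for whatever vulnerability reporting channel a real system would use.

```python
class KernelRegistry:
    """Sketch of plug-in style management with runtime disable and unload."""

    def __init__(self):
        self._table = {}        # category -> list of kernel names
        self._disabled = set()  # (kernel name, category) pairs
        self.reports = []       # recorded vulnerability reports

    def register(self, category, name):
        self._table.setdefault(category, []).append(name)

    def candidates(self, category):
        return [
            name
            for name in self._table.get(category, [])
            if (name, category) not in self._disabled
        ]

    def disable(self, name, category, reason):
        """Stop using a kernel in one problem domain and report the issue."""
        self._disabled.add((name, category))
        self.reports.append((name, category, reason))

    def unload(self, name):
        """Remove a kernel from every category without restarting."""
        for names in self._table.values():
            while name in names:
                names.remove(name)


registry = KernelRegistry()
registry.register("conv_fwd", "k1")
registry.register("conv_fwd", "k2")
registry.register("gemm", "k2")

registry.disable("k2", "conv_fwd", reason="wrong result for this domain")
after_disable = registry.candidates("conv_fwd")
gemm_candidates = registry.candidates("gemm")

registry.unload("k1")
after_unload = registry.candidates("conv_fwd")
```

Disabling `k2` for `conv_fwd` leaves it available for `gemm`, which corresponds to disabling a kernel function only in a specific problem domain.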

[0138] In one embodiment, a kernel function call method is provided for a computer device, the method comprising:

[0139] Step a, obtain the problem description information, which includes the test problem description information.

[0140] Step b: Based on the problem description information, determine the required functional category corresponding to the problem description information.

[0141] Step c: Based on the required function category, find at least one candidate kernel function corresponding to the required function category in the preset mapping table.

[0142] The mapping table includes the mapping relationship between each functional category and the corresponding kernel function.

[0143] During the mapping table creation phase, i.e. the initialization process of the heterogeneous parallel library, the computer device obtains all kernel functions that have written host-side management interface functions. The host-side management interface functions include function category interface functions, GPU feature interface functions, actual parameter calculation interface functions, and kernel function call interface functions. The function category interface functions are used to indicate the function category corresponding to the kernel function, the GPU feature interface functions are used to indicate the GPU features supported by the kernel function, and the kernel function call interface functions are used to indicate the data memory address corresponding to the kernel function.

[0144] The computer device filters each kernel function based on its GPU feature interface function, retains the kernel functions whose supported GPU features match the GPU features of the computer device, and establishes the mapping table from the retained kernel functions.

[0145] In the mapping table, the functional categories of the candidate kernel functions are the same as the required functional categories.

[0146] Step d: Obtain the target kernel function based on at least one candidate kernel function.

[0147] Step e: Calculate the kernel function arguments corresponding to the problem description information using the actual parameter calculation interface function of the target kernel function.

[0148] Step f: Obtain the executable file corresponding to the target kernel function based on the kernel function arguments.

[0149] Step g: Based on the kernel function arguments, the executable file, and the data memory address, the target kernel function is called via the GPU.
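Steps a to g can be read as a single pipeline. The sketch below wires them together, with every collaborator passed in as a hypothetical callable or dictionary; the GPU launch is simulated by a stub that simply returns what it was given, and candidate selection assumes a pre-measured `cost` field.

```python
def call_kernel(description, table, categorize, executables, launch):
    """Run steps b to g for one problem description (obtained in step a)."""
    category = categorize(description)                   # step b
    candidates = table.get(category, [])                 # step c
    if not candidates:
        raise LookupError(f"no kernel for category {category!r}")
    target = min(candidates, key=lambda k: k["cost"])    # step d
    args = target["compute_args"](description)           # step e
    binary = executables[target["name"]]                 # step f
    addresses = target["data_addresses"](description)    # step g
    return launch(binary, args, addresses)


def make_kernel(name, cost):
    """Hypothetical kernel record bundling its host-side interface functions."""
    return {
        "name": name,
        "cost": cost,
        "compute_args": lambda d: {"n": d["n"]},
        "data_addresses": lambda d: [d["buffers"]["x"], d["buffers"]["y"]],
    }


description = {"op": "conv_fwd", "n": 4, "buffers": {"x": 0x1000, "y": 0x2000}}
table = {"conv_fwd": [make_kernel("conv_slow", 2.0), make_kernel("conv_fast", 1.0)]}
executables = {"conv_slow": b"slow-binary", "conv_fast": b"fast-binary"}

result = call_kernel(
    description,
    table,
    categorize=lambda d: d["op"],
    executables=executables,
    launch=lambda binary, args, addresses: (binary, args, addresses),  # stub
)
```

The stub makes the data flow visible: the launch receives the executable, the computed arguments, and the data memory addresses of the selected kernel.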

[0150] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0151] Based on the same inventive concept, this application also provides a kernel function calling apparatus for implementing the kernel function calling method described above. The solution provided by this apparatus is similar to the implementation scheme described in the above method; therefore, the specific limitations in one or more kernel function calling apparatus embodiments provided below can be found in the limitations of the kernel function calling method described above, and will not be repeated here.

[0152] In one exemplary embodiment, as shown in Figure 7, a kernel function calling device is provided, comprising:

[0153] The first acquisition module 701 is used to acquire problem description information, the problem description information including test problem description information;

[0154] The second acquisition module 702 is used to acquire the target kernel function corresponding to the problem description information according to the problem description information, wherein the target kernel function includes a host management interface function, and the target kernel function is determined based on the host management interface function;

[0155] The calling module 703 is used to call the target kernel function in response to the problem description information, based on the problem description information and the host-side management interface function.

[0156] In one embodiment, the host-side management interface function includes a function category interface function, which is used to indicate the target function category corresponding to the target kernel function. The second acquisition module 702 includes:

[0157] The determining unit is used to determine the required functional category corresponding to the problem description information based on the problem description information;

[0158] The acquisition unit is used to acquire the target kernel function according to the required function category, wherein the required function category is the same as the target function category.

[0159] In one embodiment, the acquisition unit is specifically used to find at least one candidate kernel function corresponding to the required function category in a preset mapping table according to the required function category, wherein the function category of the candidate kernel function is the same as the required function category, and the mapping table includes the mapping relationship between each function category and each kernel function corresponding to each function category; and to acquire the target kernel function according to the at least one candidate kernel function.

[0160] In one embodiment, the host management interface function further includes a GPU feature interface function, which is used to indicate the target GPU features supported by the target kernel function. The calling module 703 is specifically used to call the target kernel function through the GPU according to the problem description information and the host management interface function, wherein the target GPU features supported by the target kernel function match the GPU features of the GPU, and the GPU features of each kernel function in the mapping table match the GPU features of the GPU.

[0161] In one embodiment, the host management interface function includes an actual parameter calculation interface function, and the calling module 703 is specifically used to calculate the kernel function actual parameters corresponding to the problem description information using the actual parameter calculation interface function; obtain the executable file corresponding to the target kernel function according to the kernel function actual parameters; and call the target kernel function according to the kernel function actual parameters, the executable file, and the host management interface function.

[0162] In one embodiment, the host management interface function further includes a kernel function call interface function, which is used to indicate the data memory address corresponding to the target kernel function. The calling module 703 is specifically used to call the target kernel function through the GPU according to the kernel function parameters, the executable file and the data memory address.

[0163] Each module in the aforementioned kernel function calling device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.

[0164] In one exemplary embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 8. This computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for running the operating system and computer programs stored in the non-volatile storage media. The database stores kernel function call data. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When the computer program is executed by the processor, it implements a kernel function call method.

[0165] Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0166] In one exemplary embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:

[0167] Obtain problem description information, including test problem description information;

[0168] Based on the problem description information, obtain the target kernel function corresponding to the problem description information, wherein the target kernel function includes a host-side management interface function, and the target kernel function is determined based on the host-side management interface function;

[0169] Based on the problem description information and the host-side management interface function, the target kernel function is invoked in response to the problem description information.

[0170] In one embodiment, the host-side management interface function includes a function category interface function, which is used to indicate the target function category corresponding to the target kernel function. When the processor executes the computer program, it specifically implements the following steps:

[0171] Based on the problem description information, determine the required functional category corresponding to the problem description information;

[0172] The target kernel function is obtained according to the required function category, wherein the required function category is the same as the target function category.

[0173] In one embodiment, the processor specifically implements the following steps when executing a computer program:

[0174] Based on the required function category, at least one candidate kernel function corresponding to the required function category is found in a preset mapping table. The function category of the candidate kernel function is the same as the required function category. The mapping table includes the mapping relationship between each function category and each kernel function corresponding to each function category.

[0175] The target kernel function is obtained based on the at least one candidate kernel function.
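A lookup in a preset mapping table might look like the sketch below. The table contents, the kernel names, and the "take the first candidate" policy are all assumptions made for illustration; this passage does not state how the target is chosen among several candidates.

```python
from typing import Dict, List

# Hypothetical preset mapping table: function category -> kernel function names.
MAPPING_TABLE: Dict[str, List[str]] = {
    "conv": ["conv_direct", "conv_winograd"],
    "gemm": ["gemm_tiled"],
}


def find_candidates(required_category: str) -> List[str]:
    """Return the kernels registered under the required function category."""
    return list(MAPPING_TABLE.get(required_category, []))


def pick_target(candidates: List[str]) -> str:
    """Placeholder policy (an assumption): take the first candidate."""
    if not candidates:
        raise LookupError("no candidate kernel function for this category")
    return candidates[0]


target = pick_target(find_candidates("conv"))
print(target)
```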

[0176] In one embodiment, the host-side management interface function further includes a GPU feature interface function, which is used to indicate the target GPU features supported by the target kernel function. When the processor executes the computer program, it specifically implements the following steps:

[0177] Based on the problem description information and the host-side management interface function, the target kernel function is called via the GPU. The target GPU features supported by the target kernel function match the GPU features of the GPU, and the GPU features of each kernel function in the mapping table match the GPU features of the GPU.
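One way to keep only matching kernels in the mapping table is sketched below. It assumes that "match" means every feature a kernel declares is present on the device (a subset test); the feature strings and kernel names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class KernelInfo(NamedTuple):
    name: str
    category: str
    gpu_features: FrozenSet[str]  # as reported by a GPU feature interface function


def build_mapping_table(
    kernels: Iterable[KernelInfo], device_features: FrozenSet[str]
) -> Dict[str, List[str]]:
    """Keep only kernels whose declared features are all available on the device."""
    table: Dict[str, List[str]] = {}
    for k in kernels:
        if k.gpu_features <= device_features:  # subset test (assumed meaning of "match")
            table.setdefault(k.category, []).append(k.name)
    return table


KERNELS = [
    KernelInfo("conv_fp32", "conv", frozenset({"fp32"})),
    KernelInfo("conv_fp16_mma", "conv", frozenset({"fp16", "matrix_core"})),
    KernelInfo("gemm_fp32", "gemm", frozenset({"fp32"})),
]

# A device without the hypothetical "matrix_core" feature.
table = build_mapping_table(KERNELS, frozenset({"fp32", "fp16"}))
print(table)
```

Filtering at table-construction time means a later category lookup can only return kernels the current GPU is able to run.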

[0178] In one embodiment, the host-side management interface function includes an actual parameter calculation interface function, and the processor implements the following steps when executing the computer program:

[0179] The kernel function arguments corresponding to the problem description information are calculated using the actual parameter calculation interface function.

[0180] The executable file corresponding to the target kernel function is obtained based on the kernel function arguments.

[0181] The target kernel function is invoked based on the kernel function arguments, the executable file, and the host-side management interface function.
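The sequence of computing arguments, obtaining an executable, and invoking the kernel can be sketched for a 2D convolution problem. Everything here is an assumption for illustration: the `ConvProblem` fields, the 16x16 block size, the standard output-size formula, and the cache that stands in for compiling or loading an executable file.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple


@dataclass(frozen=True)
class ConvProblem:
    """Hypothetical problem description for a 2D convolution."""
    height: int
    width: int
    kernel: int
    stride: int = 1
    padding: int = 0


@dataclass(frozen=True)
class KernelArgs:
    out_h: int
    out_w: int
    grid: Tuple[int, int]
    block: Tuple[int, int]


def compute_args(p: ConvProblem, block: Tuple[int, int] = (16, 16)) -> KernelArgs:
    """Stand-in for an actual parameter calculation interface function."""
    out_h = (p.height + 2 * p.padding - p.kernel) // p.stride + 1
    out_w = (p.width + 2 * p.padding - p.kernel) // p.stride + 1
    # Ceiling division: enough blocks to cover every output element.
    grid = (-(-out_w // block[0]), -(-out_h // block[1]))
    return KernelArgs(out_h, out_w, grid, block)


@lru_cache(maxsize=None)
def get_executable(kernel_name: str, block: Tuple[int, int]) -> str:
    """Stand-in for building or loading an executable; returns a label only."""
    return f"{kernel_name}_b{block[0]}x{block[1]}.bin"


def invoke(kernel_name: str, problem: ConvProblem) -> dict:
    """Compute arguments, fetch the executable, and describe the launch."""
    args = compute_args(problem)
    exe = get_executable(kernel_name, args.block)
    return {"executable": exe, "grid": args.grid, "block": args.block}


launch = invoke("conv_direct", ConvProblem(height=34, width=66, kernel=3))
print(launch)
```

Keying the executable on the computed arguments lets repeated test problems of the same configuration reuse one build.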

[0182] In one embodiment, the host-side management interface function further includes a kernel function call interface function, which is used to indicate the data memory address corresponding to the target kernel function. When the processor executes the computer program, it specifically implements the following steps:

[0183] The target kernel function is called via the GPU based on the kernel function arguments, the executable file, and the data memory address.
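How a call interface might pass data memory addresses to a kernel is sketched below. Host buffers allocated with `ctypes` stand in for GPU memory, a Python callable stands in for the executable file, and the names `scale_kernel` and `call_interface` are hypothetical.

```python
import ctypes


def scale_kernel(n: int, factor: float, src_addr: int, dst_addr: int) -> None:
    """CPU stand-in for a GPU kernel: reads and writes through raw addresses."""
    src = (ctypes.c_float * n).from_address(src_addr)
    dst = (ctypes.c_float * n).from_address(dst_addr)
    for i in range(n):
        dst[i] = src[i] * factor


def call_interface(kernel, args: tuple, addresses: dict) -> None:
    """Stand-in for a kernel function call interface function."""
    if any(addr == 0 for addr in addresses.values()):
        raise ValueError("null data memory address")
    kernel(*args, addresses["src"], addresses["dst"])


# Host buffers standing in for device allocations.
src = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
dst = (ctypes.c_float * 4)()

call_interface(
    scale_kernel,
    (4, 2.0),
    {"src": ctypes.addressof(src), "dst": ctypes.addressof(dst)},
)
print(list(dst))
```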

[0184] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:

[0185] Obtain problem description information, including test problem description information;

[0186] Based on the problem description information, obtain the target kernel function corresponding to the problem description information, wherein the target kernel function includes a host-side management interface function, and the target kernel function is determined based on the host-side management interface function;

[0187] Based on the problem description information and the host-side management interface function, the target kernel function is invoked in response to the problem description information.

[0188] In one embodiment, the host-side management interface function includes a function category interface function, which is used to indicate the target function category corresponding to the target kernel function. When the computer program is executed by the processor, it specifically implements the following steps:

[0189] Based on the problem description information, determine the required function category corresponding to the problem description information;

[0190] The target kernel function is obtained according to the required function category, wherein the required function category is the same as the target function category.

[0191] In one embodiment, when the computer program is executed by the processor, it specifically implements the following steps:

[0192] Based on the required function category, at least one candidate kernel function corresponding to the required function category is found in a preset mapping table. The function category of the candidate kernel function is the same as the required function category. The mapping table includes the mapping relationship between each function category and each kernel function corresponding to each function category.

[0193] The target kernel function is obtained based on the at least one candidate kernel function.

[0194] In one embodiment, the host-side management interface function further includes a GPU feature interface function, which is used to indicate the target GPU features supported by the target kernel function. When the computer program is executed by the processor, it specifically implements the following steps:

[0195] Based on the problem description information and the host-side management interface function, the target kernel function is called via the GPU. The target GPU features supported by the target kernel function match the GPU features of the GPU, and the GPU features of each kernel function in the mapping table match the GPU features of the GPU.

[0196] In one embodiment, the host-side management interface function includes an actual parameter calculation interface function. When the computer program is executed by the processor, it specifically implements the following steps:

[0197] The kernel function arguments corresponding to the problem description information are calculated using the actual parameter calculation interface function.

[0198] The executable file corresponding to the target kernel function is obtained based on the kernel function arguments.

[0199] The target kernel function is invoked based on the kernel function arguments, the executable file, and the host-side management interface function.

[0200] In one embodiment, the host-side management interface function further includes a kernel function call interface function, which is used to indicate the data memory address corresponding to the target kernel function. When the computer program is executed by the processor, it specifically implements the following steps:

[0201] The target kernel function is called via the GPU based on the kernel function arguments, the executable file, and the data memory address.

[0202] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, performs the following steps:

[0203] Obtain problem description information, including test problem description information;

[0204] Based on the problem description information, obtain the target kernel function corresponding to the problem description information, wherein the target kernel function includes a host-side management interface function, and the target kernel function is determined based on the host-side management interface function;

[0205] Based on the problem description information and the host-side management interface function, the target kernel function is invoked in response to the problem description information.

[0206] In one embodiment, the host-side management interface function includes a function category interface function, which is used to indicate the target function category corresponding to the target kernel function. When the computer program is executed by the processor, it specifically implements the following steps:

[0207] Based on the problem description information, determine the required function category corresponding to the problem description information;

[0208] The target kernel function is obtained according to the required function category, wherein the required function category is the same as the target function category.

[0209] In one embodiment, when the computer program is executed by the processor, it specifically implements the following steps:

[0210] Based on the required function category, at least one candidate kernel function corresponding to the required function category is found in a preset mapping table. The function category of the candidate kernel function is the same as the required function category. The mapping table includes the mapping relationship between each function category and each kernel function corresponding to each function category.

[0211] The target kernel function is obtained based on the at least one candidate kernel function.

[0212] In one embodiment, the host-side management interface function further includes a GPU feature interface function, which is used to indicate the target GPU features supported by the target kernel function. When the computer program is executed by the processor, it specifically implements the following steps:

[0213] Based on the problem description information and the host-side management interface function, the target kernel function is called via the GPU. The target GPU features supported by the target kernel function match the GPU features of the GPU, and the GPU features of each kernel function in the mapping table match the GPU features of the GPU.

[0214] In one embodiment, the host-side management interface function includes an actual parameter calculation interface function. When the computer program is executed by the processor, it specifically implements the following steps:

[0215] The kernel function arguments corresponding to the problem description information are calculated using the actual parameter calculation interface function.

[0216] The executable file corresponding to the target kernel function is obtained based on the kernel function arguments.

[0217] The target kernel function is invoked based on the kernel function arguments, the executable file, and the host-side management interface function.

[0218] In one embodiment, the host-side management interface function further includes a kernel function call interface function, which is used to indicate the data memory address corresponding to the target kernel function. When the computer program is executed by the processor, it specifically implements the following steps:

[0219] The target kernel function is called via the GPU based on the kernel function arguments, the executable file, and the data memory address.

[0220] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.

[0221] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., but are not limited to these.

[0222] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0223] The embodiments described above merely illustrate several implementations of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A kernel function calling method, characterized in that the method comprises: obtaining problem description information, the problem description information including test problem description information; obtaining, based on the problem description information, a target kernel function corresponding to the problem description information, wherein the target kernel function includes a host-side management interface function, the host-side management interface function includes an actual parameter calculation interface function and a kernel function call interface function, the kernel function call interface function is used to indicate the data memory address corresponding to the target kernel function, and the target kernel function is determined based on the host-side management interface function; and invoking the target kernel function based on the problem description information and the host-side management interface function, so as to respond to the problem description information; wherein invoking the target kernel function based on the problem description information and the host-side management interface function comprises: calculating, using the actual parameter calculation interface function, the kernel function arguments corresponding to the problem description information; obtaining, based on the kernel function arguments, the executable file corresponding to the target kernel function; and calling the target kernel function via the GPU based on the kernel function arguments, the executable file, and the data memory address.

2. The method according to claim 1, characterized in that the host-side management interface function includes a function category interface function, which is used to indicate the target function category corresponding to the target kernel function; and the step of obtaining the target kernel function corresponding to the problem description information comprises: determining, based on the problem description information, the required function category corresponding to the problem description information; and obtaining the target kernel function according to the required function category, wherein the required function category is the same as the target function category.

3. The method according to claim 2, characterized in that the step of obtaining the target kernel function according to the required function category comprises: finding, based on the required function category, at least one candidate kernel function corresponding to the required function category in a preset mapping table, wherein the function category of the candidate kernel function is the same as the required function category, and the mapping table includes the mapping relationship between each function category and each kernel function corresponding to that function category; and obtaining the target kernel function based on the at least one candidate kernel function.

4. The method according to claim 3, characterized in that the host-side management interface function further includes a GPU feature interface function, which is used to indicate the target GPU features supported by the target kernel function; and the step of invoking the target kernel function based on the problem description information and the host-side management interface function comprises: calling the target kernel function via the GPU based on the problem description information and the host-side management interface function, wherein the target GPU features supported by the target kernel function match the GPU features of the GPU, and the GPU features of each kernel function in the mapping table match the GPU features of the GPU.

5. The method according to claim 1, characterized in that the problem description information is used to characterize the matrix convolution to be computed.

6. The method according to claim 1, characterized in that the host-side management interface function is used to manage and call kernel functions through the CPU.

7. A kernel function calling device, characterized in that the device comprises: a first acquisition module, used to acquire problem description information, the problem description information including test problem description information; a second acquisition module, used to acquire, according to the problem description information, a target kernel function corresponding to the problem description information, wherein the target kernel function includes a host-side management interface function, the host-side management interface function includes an actual parameter calculation interface function and a kernel function call interface function, the kernel function call interface function is used to indicate the data memory address corresponding to the target kernel function, and the target kernel function is determined according to the host-side management interface function; and a calling module, used to invoke the target kernel function based on the problem description information and the host-side management interface function, so as to respond to the problem description information; wherein the calling module is specifically used for: calculating, using the actual parameter calculation interface function, the kernel function arguments corresponding to the problem description information; obtaining, based on the kernel function arguments, the executable file corresponding to the target kernel function; and calling the target kernel function via the GPU based on the kernel function arguments, the executable file, and the data memory address.

8. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.