A method and a device for testing a GPU-BOX system
A system-testing and test-indicator technology, applied to faulty-hardware testing methods, detection of faulty computer hardware, error detection/correction, and related fields. It addresses problems such as existing tests being single-purpose and unable to evaluate GPU system performance comprehensively and effectively.
Examples
Embodiment 1
[0043] As shown in Figure 1, the present invention provides a method for testing a GPU-BOX system, comprising:
[0044] S1, configure the GPU-BOX system, check the GPU bandwidth, and obtain the bandwidth information;
[0045] S2, check the number of GPUs and the VBIOS version, and obtain the test index parameters used during the GPU-BOX system performance test;
[0046] S3, connect the BOX to the server, apply a stress load to the system from the server side, and test the overall power consumption of the GPU-BOX system.
Embodiment 2
[0048] As shown in Figure 2, step S1 (configuring the GPU-BOX system, checking the GPU bandwidth, and obtaining the bandwidth information) specifically includes:
[0049] S11, modify the /boot/grub.conf configuration file on the Red Hat system and disable the graphical interface;
[0050] S12, run the lspci command in a loop to enumerate all the GPUs in the BOX and obtain their bandwidth information;
[0051] S13, check whether the bandwidth information is x16; if yes, execute step S14; if no, execute step S15;
[0052] S14, proceed to the next step of performance testing;
[0053] S15, check the connection between the PCIe slot and the GPU, and re-execute step S12 after confirming that its bandwidth is x16.
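The check in steps S12 to S15 can be sketched in Python as follows. This is a minimal illustration, not the patent's own tooling: it assumes the bandwidth information is the negotiated PCIe link width reported on the "LnkSta" line of `lspci -vv` output, and that GPUs can be recognised by the keyword "NVIDIA" in the device header. The function names and the sample text are hypothetical and made up for illustration.

```python
import re

# Negotiated link status as printed by `lspci -vv`,
# e.g. "LnkSta: Speed 8GT/s, Width x16".
LNKSTA_WIDTH = re.compile(r"LnkSta:.*?Width\s+x(\d+)")


def gpu_link_widths(lspci_text, vendor_keyword="NVIDIA"):
    """Map PCI address -> negotiated link width for matching devices."""
    widths = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new device record.
            address, _, description = line.partition(" ")
            current = address if vendor_keyword in description else None
        elif current is not None:
            match = LNKSTA_WIDTH.search(line)
            if match:
                widths[current] = int(match.group(1))
    return widths


def links_below(widths, required=16):
    """S13: list the devices whose link is narrower than required."""
    return sorted(addr for addr, w in widths.items() if w < required)


# Illustrative sample only, not captured from a real GPU-BOX.
SAMPLE = """\
04:00.0 3D controller: NVIDIA Corporation Device 1db4 (rev a1)
\tLnkCap:\tPort #0, Speed 8GT/s, Width x16, ASPM not supported
\tLnkSta:\tSpeed 8GT/s, Width x16
05:00.0 3D controller: NVIDIA Corporation Device 1db4 (rev a1)
\tLnkCap:\tPort #0, Speed 8GT/s, Width x16, ASPM not supported
\tLnkSta:\tSpeed 8GT/s, Width x8 (downgraded)
06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
\tLnkSta:\tSpeed 5GT/s, Width x4
"""

if __name__ == "__main__":
    bad = links_below(gpu_link_widths(SAMPLE))
    # An empty list corresponds to S14; otherwise S15 (reseat and re-check).
    print("S14: continue" if not bad else f"S15: check slots {bad}")
```

In a real run the text would come from the `lspci -vv` command rather than a constant; reading "LnkSta" rather than "LnkCap" is a deliberate choice here, since the capability line reports what the slot supports and not what was actually negotiated.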
[0054] In step S11, disabling the graphical interface specifically means changing the kernel parameter "intel_iommu" from on to off, and the specific command is: "intel_iommu=on amd_iommu=on" is ch...
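As a hedged illustration of the parameter change described for step S11, the helper below rewrites one kernel command line so that a named parameter takes a given value. It only covers the stated switch of "intel_iommu" from on to off; the helper name and the example kernel line are hypothetical, and the sketch works on a string rather than editing the configuration file itself.

```python
def set_kernel_param(kernel_line, name, value):
    """Return kernel_line with name=value, replacing any existing name=... token."""
    prefix = name + "="
    # Preserve the leading indentation that grub kernel lines usually carry.
    indent = kernel_line[: len(kernel_line) - len(kernel_line.lstrip())]
    tokens = []
    replaced = False
    for token in kernel_line.split():
        if token.startswith(prefix):
            tokens.append(prefix + value)
            replaced = True
        else:
            tokens.append(token)
    if not replaced:
        # Parameter was absent, so append it at the end of the line.
        tokens.append(prefix + value)
    return indent + " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical kernel line, for illustration only.
    line = "\tkernel /vmlinuz ro root=/dev/sda1 intel_iommu=on quiet"
    print(set_kernel_param(line, "intel_iommu", "off"))
```

Any such change takes effect only after a reboot, so the bandwidth check in step S12 would be run after the system has restarted.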
Embodiment 3
[0056] As shown in Figure 3, step S2 (checking the number of GPUs and the VBIOS version, and obtaining the test index parameters used during the GPU-BOX system performance test) specifically includes:
[0057] S21, check the number of GPUs and the VBIOS version by running the lspci command in a loop;
[0058] S22, obtain the test index parameters from the end-to-end test in the GPU-BOX system performance test, and check whether they meet the stress test standard of the performance test; if yes, execute step S23; if no, execute step S24;
[0059] S23, continue with the follow-up stress test;
[0060] S24, this indicates that there is a problem in the hardware system itself, and the test is ended.
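The decision flow of steps S21 to S24 can be sketched as below. This is an illustrative sketch under stated assumptions: GPUs are counted as lspci lines of class "VGA compatible controller" or "3D controller" that mention "NVIDIA", and the test index parameters are modelled as a dictionary of measured values compared against minimums. The metric names, thresholds and sample text are hypothetical; the source does not specify them, and the VBIOS version check is left out.

```python
GPU_CLASSES = ("VGA compatible controller", "3D controller")


def count_gpus(lspci_text, vendor_keyword="NVIDIA"):
    """S21: count lspci lines that describe a GPU from the given vendor."""
    return sum(
        1
        for line in lspci_text.splitlines()
        if vendor_keyword in line and any(c in line for c in GPU_CLASSES)
    )


def next_step(metrics, minimums):
    """S22: return 'S23' if every metric meets its minimum, else 'S24'."""
    meets_standard = all(
        metrics.get(name, float("-inf")) >= floor
        for name, floor in minimums.items()
    )
    return "S23" if meets_standard else "S24"


# Illustrative sample only, not captured from a real GPU-BOX.
SAMPLE_LSPCI = """\
04:00.0 3D controller: NVIDIA Corporation Device 1db4 (rev a1)
05:00.0 3D controller: NVIDIA Corporation Device 1db4 (rev a1)
05:00.1 Audio device: NVIDIA Corporation Device 10f2 (rev a1)
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset
"""

if __name__ == "__main__":
    print("GPUs found:", count_gpus(SAMPLE_LSPCI))
    # Hypothetical metric name and threshold.
    print(next_step({"h2d_gbps": 11.5}, {"h2d_gbps": 10.0}))
```

A missing metric is treated as a failure here, so an incomplete end-to-end run leads to S24 rather than silently passing.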
[0061] Before step S21 is carried out, the test tool NVQual must also be installed, and the GPU-BOX system is tested end-to-end using the CUDA test tool included in NVQual (referring to the GPUs in...


