A GPU memory optimization method and system

By using a convolutional neural network to analyze task load patterns, compressing infrequently accessed data with Huffman coding, and dynamically adjusting the memory pool configuration, the method addresses insufficient memory allocation and the lack of intelligent scheduling in graphics processor memory management, achieving efficient memory resource management and improved computing efficiency.

CN121996403B · Active · Publication Date: 2026-06-30 · HUNAN INST OF INFORMATION TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
HUNAN INST OF INFORMATION TECH
Filing Date
2025-11-06
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing graphics processor memory management methods suffer from insufficient memory allocation and lack of intelligent scheduling for data compression in dynamic task environments, leading to memory fragmentation and resource waste, and failing to effectively balance memory usage and computational efficiency.

Method used

By analyzing task load patterns through convolutional neural networks, dynamically adjusting memory pool configuration, using Huffman coding to compress intermediate data with low access frequency, and reallocating memory resources according to task priority, adaptive memory management is achieved.

Benefits of technology

It significantly improves memory utilization, reduces access latency, is suitable for high-dynamic and high-concurrency task load scenarios, and improves computing efficiency.

✦ Generated by Eureka AI based on patent content.

Figure: CN121996403B_ABST (abstract drawing)

Abstract

This invention discloses a GPU memory optimization method and system. By acquiring dynamic data flow information and intermediate data generation patterns of the task load, the real-time variation characteristics of the task load are obtained. Based on these characteristics, a gradient descent-based optimization algorithm is used to determine the memory pool allocation ratio and reserved space, resulting in an adjusted memory pool configuration. Available memory block information is extracted from the adjusted configuration, and a Huffman coding algorithm is used to compress low-access-frequency intermediate data, resulting in compressed intermediate data storage units. If the access frequency of the compressed intermediate data storage units is lower than a preset threshold, the compressed intermediate data storage units are temporarily stored in a low-speed storage area, and memory resources are reallocated according to task priority, resulting in an optimized memory allocation scheme. This invention significantly improves memory utilization, reduces access latency, and achieves efficient memory resource management.

Description

Technical Field

[0001] This invention relates to the field of graphics processor technology, and in particular discloses a GPU memory optimization method and system.

Background Technology

[0002] With the rapid development of artificial intelligence and deep learning, graphics processing units (GPUs) are playing an increasingly crucial role in high-performance computing. The memory management efficiency of GPUs directly affects the performance and resource utilization of computing tasks. Efficient memory management is not only fundamental to improving computing speed but also an important guarantee for promoting large-scale model training and inference. However, existing memory management methods have significant shortcomings in terms of dynamism and adaptability.

[0003] Many solutions rely on static allocation or simple memory reclamation mechanisms, which struggle to cope with the rapid changes in memory requirements in complex tasks, leading to memory fragmentation or over-allocation and thus wasting resources. Furthermore, existing methods often neglect differences in data characteristics and task priorities when processing intermediate data, failing to effectively balance memory usage and computational efficiency. The core challenge lies in how to achieve efficient memory allocation and data compression in dynamic task environments while ensuring the continuity of the computation process.

[0004] First, the lack of dynamism in memory allocation makes it difficult for the system to adjust the memory pool size in real time according to the task load. For example, when training a large neural network, intermediate data generated by certain layers may consume a large amount of memory in a short period of time, but existing methods cannot flexibly adjust memory allocation according to the task stage, resulting in low memory usage efficiency.

[0005] Secondly, the compression and management of intermediate data lack intelligent scheduling. Much of the intermediate data produced during computation is not needed for some time, but existing methods fail to identify such data and move it out of memory, so it keeps consuming memory and limits the number of tasks that can run in parallel. For example, in image processing tasks, some intermediate feature maps are not needed immediately by subsequent calculations, yet without intelligent scheduling they continue to occupy valuable memory.

[0006] Therefore, designing a mechanism that can achieve efficient memory utilization and computational continuity through adaptive memory allocation, data compression, and intelligent scheduling in dynamic task environments has become a key issue in improving graphics processor performance.

Summary of the Invention

[0007] This invention provides a GPU memory optimization method and system, aiming to solve at least one defect in the prior art.

[0008] One aspect of the present invention relates to a GPU memory optimization method, comprising the following steps:

[0009] The system acquires dynamic data stream information and intermediate data generation patterns of task load, extracts memory demand peaks and data access frequency from task execution sequences, and classifies task load patterns using a pre-defined convolutional neural network model to obtain real-time change characteristics of task load.

[0010] Based on the real-time changes in task load, the frequency of memory allocation requests and releases is analyzed. An optimization algorithm based on gradient descent is used to determine the allocation ratio and reserved space of the memory pool, resulting in the adjusted memory pool configuration.

[0011] The available memory block information is extracted from the adjusted memory pool configuration. Data compression priorities are generated for intermediate data. The Huffman coding algorithm is used to compress the intermediate data with low access frequency to obtain the compressed intermediate data storage unit.

[0012] If the access frequency of the compressed intermediate data storage unit is lower than the preset threshold, the compressed intermediate data storage unit will be temporarily stored in the low-speed storage area, and memory resources will be reallocated according to task priority through the intelligent scheduling algorithm to obtain an optimized memory allocation scheme.

[0013] Furthermore, the steps of obtaining dynamic data flow information and intermediate data generation patterns of task load, extracting peak memory demand and data access frequency from the task execution sequence, and classifying task load patterns using a pre-defined convolutional neural network model to obtain real-time changing characteristics of task load include:

[0014] Dynamic data stream information is obtained from the task sequence, and time series analysis tools are used to process the dynamic data stream information to extract intermediate data of task load and obtain the time distribution of task execution and data generation pattern.

[0015] Based on the time distribution of task execution and the data generation pattern, statistical analysis tools are used to calculate the peak memory demand and data access frequency. If the peak memory demand exceeds the preset threshold, it is marked as high resource consumption, thus obtaining the resource consumption characteristics of the task load.

[0016] A convolutional neural network extracts features from the data access frequency and peak memory demand contained in the resource consumption characteristics, and the extracted features are then classified by pattern to determine the task load category.

[0017] If the task load category is high-frequency changing, then real-time monitoring tools are used to continuously track the dynamic data stream, obtain real-time change characteristics, and obtain a dynamic behavior description of the task load.

[0018] Furthermore, in the steps of obtaining dynamic data stream information from the task sequence, processing the dynamic data stream information using time series analysis tools, extracting intermediate data of the task load, and obtaining the time distribution and data generation patterns of task execution, the dynamic data stream information is derived using the following formula:

[0019]

[0020] where the symbols denote: the dynamic data stream information at a given time; the total number of tasks in the task sequence; the weighting coefficient of each task; the execution status value of each task at that time; the time decay parameter; and the start time of each task;
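As an illustration only, the dynamic data stream computation can be sketched in Python. Only the variable list of the formula is given, so the sketch assumes one form consistent with it: a weighted sum over tasks of each task's execution status value, decayed exponentially by the time elapsed since that task started. The names `Task` and `dynamic_data_stream` are hypothetical, and the execution status is held constant per task for simplicity.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    weight: float      # weighting coefficient of the task
    status: float      # execution status value (assumed constant over time here)
    start_time: float  # start time of the task


def dynamic_data_stream(tasks: List[Task], t: float, decay: float) -> float:
    """Assumed form: D(t) = sum of w * s * exp(-decay * (t - start)) over started tasks."""
    total = 0.0
    for task in tasks:
        if t < task.start_time:
            continue  # assumption: a task contributes nothing before it starts
        elapsed = t - task.start_time
        total += task.weight * task.status * math.exp(-decay * elapsed)
    return total


tasks = [Task(weight=1.0, status=1.0, start_time=0.0),
         Task(weight=2.0, status=0.5, start_time=1.0)]
print(dynamic_data_stream(tasks, t=1.0, decay=math.log(2)))
```

With a decay of ln 2, a task that started one time unit earlier contributes half of its weighted status, which makes the recency weighting easy to check by hand.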

[0021] Intermediate task load data is derived using the following formula:

[0022]

[0023] where the symbols denote: the intermediate task load data within a given time window; the number of data sampling points within the time window; and, for each sampling point, its resource utilization rate, computational complexity, memory usage, and bandwidth consumption;
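The text lists four per-sampling-point quantities but does not say how they are combined. The sketch below assumes, purely for illustration, that the window's intermediate task load is the mean over sampling points of the product of those four quantities. `window_task_load` is a hypothetical name, and a weighted sum would be an equally plausible reading.

```python
from typing import Sequence


def window_task_load(utilization: Sequence[float],
                     complexity: Sequence[float],
                     memory: Sequence[float],
                     bandwidth: Sequence[float]) -> float:
    """Assumed form: mean over sampling points of the product of the four per-point quantities."""
    n = len(utilization)
    if n == 0 or not (len(complexity) == len(memory) == len(bandwidth) == n):
        raise ValueError("all inputs must have the same, non-zero number of sampling points")
    total = sum(u * c * m * b
                for u, c, m, b in zip(utilization, complexity, memory, bandwidth))
    return total / n
```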

[0024] The data generation pattern is derived from the following formula:

[0025]

[0026] where the symbols denote: the prediction function for the data generation pattern; the time variable of the time series; the parameters representing the stages of task execution; the amplitude coefficient of the periodic fluctuation; the frequency parameter of data generation; the phase offset; the coefficient of the exponentially decaying term; the decay rate; and the baseline offset.
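The listed parameters (amplitude, frequency, phase offset, decay coefficient, decay rate, baseline offset) suggest a periodic term plus an exponentially decaying term plus a constant. The sketch below assumes exactly that form. The task-execution-stage parameters are not modeled, and `generation_pattern` is a hypothetical name.

```python
import math


def generation_pattern(t: float, amplitude: float, omega: float, phase: float,
                       decay_coeff: float, decay_rate: float, offset: float) -> float:
    """Assumed form: A*sin(omega*t + phase) + B*exp(-decay_rate*t) + C."""
    periodic = amplitude * math.sin(omega * t + phase)      # periodic fluctuation
    transient = decay_coeff * math.exp(-decay_rate * t)     # exponentially decaying term
    return periodic + transient + offset                    # plus baseline offset
```

At t = 0 with zero phase, the periodic term vanishes and the prediction reduces to the decay coefficient plus the baseline offset.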

[0027] Furthermore, in the step of calculating the peak memory demand and data access frequency with statistical analysis tools based on the time distribution of task execution and the data generation pattern, marking the load as high resource consumption if the peak memory demand exceeds a preset threshold, and thereby obtaining the resource consumption characteristics of the task load, the peak memory demand is derived using the following formula:

[0028]

[0029] where the symbols denote: the peak memory requirement; the total task execution time; the total number of tasks; the memory usage of each task at a given time; and the activity status indicator of each task at that time, which is 1 when the task is active and 0 otherwise;
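Given the activity indicator, a natural reading is that the peak is the maximum, over time, of the memory held by all active tasks. The sketch below assumes discrete time steps and that reading. `peak_memory` is a hypothetical name.

```python
from typing import Sequence


def peak_memory(usage: Sequence[Sequence[float]],
                active: Sequence[Sequence[int]]) -> float:
    """Peak over time steps of the total memory held by active tasks.

    usage[i][t] is task i's memory at step t; active[i][t] is 1 if task i is active, else 0.
    """
    if not usage:
        return 0.0
    steps = len(usage[0])
    return max(
        sum(usage[i][t] * active[i][t] for i in range(len(usage)))
        for t in range(steps)
    )
```

Multiplying by the 0/1 indicator means memory reported by an inactive task does not count toward the peak.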

[0030] Data access frequency is calculated using the following formula:

[0031]

[0032] where the symbols denote: the data access frequency; the statistical time window; the total number of data access events; the amount of data in each access event; and the duration of each access event;
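One form consistent with these variables is an access rate in which each event is weighted by its data amount divided by its duration, averaged over the statistical window. The sketch assumes that form for illustration only. `access_frequency` is a hypothetical name.

```python
from typing import Iterable, Tuple


def access_frequency(window: float, events: Iterable[Tuple[float, float]]) -> float:
    """Assumed form: (1/T) * sum(amount_k / duration_k) over access events in the window."""
    if window <= 0:
        raise ValueError("statistical time window must be positive")
    return sum(amount / duration for amount, duration in events) / window
```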

[0033] High resource usage is marked using the following formula:

[0034]

[0035] where the symbols denote: the resource usage characteristic marker; the current memory usage; the preset memory threshold; and the resource usage determination coefficient. If the ratio of the current memory usage to the preset memory threshold is greater than or equal to the resource usage determination coefficient, the marker is set to 1 to indicate high resource usage; otherwise, it is set to 0 to indicate normal resource usage.
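The marking rule is stated explicitly above, so it translates directly into code. Only the function name `resource_usage_marker` is hypothetical.

```python
def resource_usage_marker(current: float, threshold: float, coeff: float) -> int:
    """1 (high resource usage) if current / threshold >= coeff, else 0 (normal)."""
    if threshold <= 0:
        raise ValueError("preset memory threshold must be positive")
    return 1 if current / threshold >= coeff else 0
```

For instance, with an 8 GB peak, a 6 GB threshold, and a determination coefficient of 1.0 (the coefficient is an assumed value), the ratio is about 1.33 and the load is marked 1.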

[0036] Furthermore, in the step of extracting features from data access frequency and memory demand peaks in resource usage characteristics using a convolutional neural network, and then performing pattern classification on the extracted features to determine the task load category, the task load category is derived using the following formula:

[0037]

[0038] where the symbols denote: the finally determined task load category; a candidate category; the extracted comprehensive feature vector; the probability that the load belongs to a given category given the features; the weight vector corresponding to that category; and the total number of categories. The denominator sums the numerator terms over all categories.
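The variable list (per-category weight vectors, a probability per category, and a denominator summing the numerators over all categories) describes a softmax classifier followed by selection of the most probable category. The sketch below assumes that reading and takes the CNN's output feature vector as given. The CNN itself is out of scope, and `classify_load` and the category labels are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def classify_load(features: Sequence[float],
                  weights: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Softmax over linear scores w_c . x, then pick the most probable category."""
    scores = {c: sum(w * x for w, x in zip(wv, features)) for c, wv in weights.items()}
    top = max(scores.values())                       # subtract max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    denom = sum(exps.values())                       # sums the numerators over all categories
    probs = {c: e / denom for c, e in exps.items()}
    return max(probs, key=probs.get), probs


category, probs = classify_load([1.0, 2.0],
                                {"high_frequency_change": [1.0, 1.0],
                                 "low_frequency_change": [0.5, 0.0]})
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged and avoids overflow for large scores.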

[0039] Furthermore, if the task load type is characterized by high-frequency changes, then in the step of continuously tracking the dynamic data stream using real-time monitoring tools to obtain real-time change characteristics and derive a dynamic behavior description of the task load, the dynamic behavior description of the task load is derived using the following formula:

[0040]

[0041] where the symbols denote: the dynamic behavior description value of each task; the total length of the observation period; the load variation coefficient at a given time; the variability weight parameter; the response characteristic value at a given time; and the mean and standard deviation of the response characteristics.
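Because the mean and standard deviation of the response characteristics appear alongside a variability weight, one plausible form averages, over the observation period, the load variation coefficient plus a weighted standardized response deviation. The sketch below assumes that form on discrete time steps. `dynamic_behavior` and `alpha` (the variability weight) are hypothetical names.

```python
import statistics
from typing import Sequence


def dynamic_behavior(load_variation: Sequence[float],
                     responses: Sequence[float],
                     alpha: float) -> float:
    """Assumed form: mean over the observation period of v(t) + alpha * |r(t) - mean| / std."""
    if len(load_variation) != len(responses) or not responses:
        raise ValueError("need equal-length, non-empty series")
    mu = statistics.fmean(responses)
    sigma = statistics.pstdev(responses)
    total = 0.0
    for v, r in zip(load_variation, responses):
        z = abs(r - mu) / sigma if sigma > 0 else 0.0  # standardized response deviation
        total += v + alpha * z
    return total / len(responses)
```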

[0042] Furthermore, based on the real-time changes in task load, the frequency of memory allocation requests and releases is analyzed. A gradient descent-based optimization algorithm is used to determine the memory pool allocation ratio and reserved space, resulting in the adjusted memory pool configuration. The steps include:

[0043] The time series data of memory allocation requests and the periodic data of release frequency are obtained from the real-time change characteristics of task load. The frequency distribution of memory allocation requests is calculated using time series analysis tools. The periodic characteristics of release frequency are extracted by combining Fourier transform, and the dynamic behavior pattern of memory allocation requests and release frequency is obtained.
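The Fourier-transform step can be illustrated with NumPy's real FFT. The sketch finds the dominant period of a release-frequency series. It assumes that the magnitude of the strongest non-zero-frequency component could serve as the "periodic intensity" used later. `dominant_release_period` and the synthetic series are hypothetical.

```python
import numpy as np


def dominant_release_period(release_counts, sample_interval=1.0):
    """Return (period, magnitude) of the strongest non-DC component of a release-frequency series."""
    x = np.asarray(release_counts, dtype=float)
    x = x - x.mean()                                   # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=sample_interval)
    k = int(np.argmax(spectrum[1:])) + 1               # skip bin 0 (zero frequency)
    return 1.0 / freqs[k], float(spectrum[k])


t = np.arange(64)
series = 10 + 3 * np.sin(2 * np.pi * t / 8)           # hypothetical releases per interval, period 8
period, strength = dominant_release_period(series)
```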

[0044] Based on the dynamic behavior pattern of the release frequency, a gradient descent-based optimization algorithm is used to iteratively optimize the objective function to determine the initial configuration parameters of the memory pool. The objective function is derived using the following formula:

[0045]

[0046] where the symbols denote: the allocation ratio; the reserved space ratio; the average frequency of memory allocation requests; the periodic intensity of the release frequency; and two weighting coefficients;
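The objective function is not reproduced here, so the following is only a sketch of how gradient descent could set the two ratios, assuming a hypothetical quadratic objective. It pulls the allocation ratio toward the normalized request frequency and the reserved ratio toward the normalized periodic intensity, weighted by the two coefficients, plus a penalty keeping the ratios summing to 1. `optimize_pool`, the normalization, and the penalty term are assumptions, not the patented objective.

```python
def optimize_pool(mean_request_freq, periodic_intensity, w1=1.0, w2=1.0,
                  lr=0.1, steps=500):
    """Gradient descent on an assumed objective.

    J(a, r) = w1*(a - f)^2 + w2*(r - p)^2 + (a + r - 1)^2,
    where f and p are the request frequency and periodic intensity normalized
    so that f + p = 1, a is the allocation ratio and r the reserved-space ratio.
    """
    total = mean_request_freq + periodic_intensity
    if total <= 0:
        raise ValueError("inputs must have a positive sum")
    f, p = mean_request_freq / total, periodic_intensity / total
    a, r = 0.5, 0.5                                   # initial guess
    for _ in range(steps):
        coupling = 2.0 * (a + r - 1.0)                # gradient of the sum-to-one penalty
        grad_a = 2.0 * w1 * (a - f) + coupling
        grad_r = 2.0 * w2 * (r - p) + coupling
        a = min(max(a - lr * grad_a, 0.0), 1.0)       # keep ratios within [0, 1]
        r = min(max(r - lr * grad_r, 0.0), 1.0)
    return a, r
```

With equal weights the objective has Hessian eigenvalues 2 and 6, so a learning rate of 0.1 stays well below the stability limit of 1/3.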

[0047] By continuously tracking the frequency changes of memory allocation requests and the periodic fluctuations of release frequency through real-time monitoring tools, the real-time usage status data of the memory pool is obtained. If the memory usage rate in the real-time usage status data exceeds the preset threshold, it is marked as a high-load state, and the dynamic adjustment needs of the memory pool configuration are determined.

[0048] To address the need for dynamic adjustments, a preset threshold comparison method is used to adjust the allocation ratio and reserved space of the memory pool. If the duration of high load exceeds the preset threshold, the reserved space ratio is increased and the allocation ratio is decreased to obtain the adjusted memory pool configuration.
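The threshold comparison rule above can be sketched as a small function: when high load persists longer than the duration threshold, a fixed step of the pool shifts from the allocation ratio to the reserved ratio. The step size and the cap on the reserved ratio are hypothetical parameters, as is the name `adjust_for_high_load`.

```python
def adjust_for_high_load(alloc_ratio, reserve_ratio, high_load_duration,
                         duration_threshold, step=0.05, max_reserve=0.5):
    """Shift `step` of the pool from allocation to reserve when high load persists."""
    if high_load_duration <= duration_threshold:
        return alloc_ratio, reserve_ratio               # no adjustment needed
    new_reserve = min(reserve_ratio + step, max_reserve)
    shift = new_reserve - reserve_ratio                # amount actually moved
    return alloc_ratio - shift, new_reserve
```

Computing the actual shift after capping keeps the two ratios summing to the same total even when the reserve hits its cap.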

[0049] Furthermore, the steps of extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for intermediate data, and compressing low-access-frequency intermediate data using the Huffman coding algorithm to obtain compressed intermediate data storage units include:

[0050] Extract memory block information from the adjusted memory pool configuration, scan the available memory blocks in the memory pool using a memory management tool, obtain the size and allocation status of each memory block, and generate a list of memory block information containing the size and status of memory blocks.

[0051] Based on the memory block information list and data access patterns, the access frequency of intermediate data is analyzed using statistical tools. For intermediate data with access frequencies below a preset threshold, a priority sorting algorithm is used to generate a data compression priority list.
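The priority sorting step can be sketched as a filter followed by a sort. The patent does not name its sorting key, so the sketch assumes the coldest data comes first and larger items come first on ties, since compressing them frees more memory. `IntermediateData` and `compression_priority` are hypothetical names.

```python
from typing import List, NamedTuple


class IntermediateData(NamedTuple):
    name: str
    access_freq: float   # accesses per unit time
    size_bytes: int


def compression_priority(items: List[IntermediateData],
                         freq_threshold: float) -> List[IntermediateData]:
    """Keep items accessed less often than the threshold; coldest first, larger first on ties."""
    candidates = [it for it in items if it.access_freq < freq_threshold]
    return sorted(candidates, key=lambda it: (it.access_freq, -it.size_bytes))
```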

[0052] The intermediate data in the data compression priority list is compressed using the Huffman coding algorithm: a Huffman tree is constructed to generate a coding table, and the intermediate data is converted into compressed data according to that table, yielding a compressed data set.
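Huffman tree construction and table-driven encoding can be sketched with Python's `heapq`. Codes are kept as bit strings for readability. A production version would pack bits and store the table alongside the compressed unit so it can be decompressed later. All function names here are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def build_code_table(data: bytes) -> Dict[int, str]:
    """Build a Huffman code table mapping each byte value to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                       # a single symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tie = count()                            # tie-breaker so dicts are never compared
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                     # merge the two least frequent subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(data: bytes, table: Dict[int, str]) -> str:
    return "".join(table[b] for b in data)


def decode(bits: str, table: Dict[int, str]) -> bytes:
    reverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:               # prefix-free, so the first match is correct
            out.append(reverse[current])
            current = ""
    return bytes(out)
```

For `b"aaaabbc"`, the most frequent byte receives a 1-bit code and the encoded form is 10 bits instead of 56.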

[0053] Based on the compressed data set and storage space allocation requirements, the compressed data set is allocated to compressed storage units using storage management tools. If the occupancy rate of the compressed storage unit exceeds a preset threshold, the storage space allocation ratio is adjusted to obtain the compressed storage unit configuration.

[0054] Furthermore, if the access frequency of the compressed intermediate data storage unit is lower than a preset threshold, the compressed intermediate data storage unit is temporarily stored in a low-speed storage area. The steps of obtaining an optimized memory allocation scheme by reallocating memory resources according to task priority through an intelligent scheduling algorithm include:

[0055] Access frequency data is obtained from the compressed intermediate data storage unit. The access frequency of each storage unit is analyzed using statistical tools. If the access frequency is lower than a preset threshold, it is marked as low-frequency data, and a low-frequency data set is obtained.

[0056] Based on the low-frequency data set, a data migration tool transfers the storage units marked as low-frequency data to the low-speed storage area and records each transfer, yielding the low-speed storage allocation record.
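Paragraphs [0055] and [0056] can be sketched together: mark units whose access frequency is below the threshold, move them to the slow tier, and log each move. Plain Python dictionaries stand in for GPU memory and the low-speed storage area, and `StorageTiers` and `migrate_low_frequency` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StorageTiers:
    fast: Dict[str, bytes] = field(default_factory=dict)  # compressed units in GPU memory
    slow: Dict[str, bytes] = field(default_factory=dict)  # units moved to low-speed storage
    record: List[Tuple[str, int]] = field(default_factory=list)  # low-speed allocation record


def migrate_low_frequency(tiers: StorageTiers, access_freq: Dict[str, float],
                          threshold: float) -> Set[str]:
    """Mark units below the access-frequency threshold and move them to the slow tier."""
    low = {uid for uid, f in access_freq.items() if f < threshold and uid in tiers.fast}
    for uid in sorted(low):
        data = tiers.fast.pop(uid)
        tiers.slow[uid] = data
        tiers.record.append((uid, len(data)))           # (unit id, bytes moved)
    return low
```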

[0057] Based on the low-speed storage allocation record and task priority list, memory resources are reallocated through an intelligent scheduling algorithm, prioritizing the allocation of memory required by high-priority tasks, and a preliminary memory allocation scheme is obtained.
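The "intelligent scheduling algorithm" is not specified. The simplest policy matching "prioritizing the allocation of memory required by high-priority tasks" is a greedy pass in descending priority order, sketched below as an assumption. `allocate_by_priority` and the task tuples are hypothetical. The capacity would be the memory available after migration.

```python
from typing import Dict, List, Tuple


def allocate_by_priority(tasks: List[Tuple[str, int, float]],
                         capacity: float) -> Tuple[Dict[str, float], float]:
    """Greedy allocation: serve tasks in descending priority until memory runs out.

    tasks holds (name, priority, demand); returns the per-task grant and leftover memory.
    """
    plan: Dict[str, float] = {}
    remaining = capacity
    for name, _priority, demand in sorted(tasks, key=lambda t: t[1], reverse=True):
        granted = min(demand, remaining)
        plan[name] = granted
        remaining -= granted
    return plan, remaining
```

A low-priority task may receive only part of its demand. This partial grant is the case in which the memory check of the next step could trigger a further adjustment.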

[0058] For the preliminary memory allocation scheme, memory usage is monitored using memory management tools. If memory usage exceeds a preset threshold, the resource ratio of the low-speed storage area is adjusted to obtain the optimized memory allocation scheme.

[0059] Another aspect of the present invention relates to a GPU memory optimization system for performing the above-described GPU memory optimization method, the GPU memory optimization system comprising:

[0060] The real-time change feature acquisition module is used to acquire dynamic data flow information and intermediate data generation patterns of task load, extract memory demand peaks and data access frequency from task execution sequence, and classify task load patterns using a preset convolutional neural network model to obtain real-time change features of task load.

[0061] The memory pool configuration acquisition module is used to analyze memory allocation requests and release frequencies based on the real-time changes in task load, and use a gradient descent-based optimization algorithm to determine the allocation ratio and reserved space of the memory pool, thereby obtaining the adjusted memory pool configuration.

[0062] The data storage unit acquisition module is used to extract available memory block information from the adjusted memory pool configuration, generate data compression priorities for intermediate data, and use the Huffman coding algorithm to compress intermediate data with low access frequency to obtain compressed intermediate data storage units.

[0063] The memory allocation scheme acquisition module is used to temporarily store the compressed intermediate data storage unit in a low-speed storage area if the access frequency of the compressed intermediate data storage unit is lower than a preset threshold, and then reallocate memory resources according to task priority through an intelligent scheduling algorithm to obtain an optimized memory allocation scheme.

[0064] The beneficial effects achieved by this invention are as follows:

[0065] This invention provides a GPU memory optimization method and system that address the low memory allocation efficiency and data access latency caused by dynamic changes in task load. It proposes a solution integrating data flow analysis, memory allocation optimization, and data compression. The invention classifies task load patterns using convolutional neural networks, extracting peak memory demand and data access frequency to accurately capture real-time changing characteristics. It optimizes the memory pool configuration with a gradient descent-based algorithm, dynamically adjusting allocation ratios and reserved space. Infrequently accessed intermediate data is compressed with Huffman coding and temporarily stored in a low-speed storage area, after which an intelligent scheduling algorithm reallocates resources based on task priority. This invention significantly improves memory utilization, reduces access latency, and achieves efficient memory resource management, making it particularly suitable for highly dynamic, high-concurrency task load scenarios.

Attached Figure Description

[0066] Figure 1 is a flowchart illustrating an embodiment of the GPU memory optimization method according to the present invention;

[0067] Figure 2 is a functional block diagram of an embodiment of the GPU memory optimization system according to the present invention.

[0068] Explanation of reference numerals:

[0069] 10. Real-time change feature acquisition module; 20. Memory pool configuration acquisition module; 30. Data storage unit acquisition module; 40. Memory allocation scheme acquisition module.

Detailed Implementation

[0070] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0071] As shown in Figure 1, the first embodiment of the present invention proposes a GPU memory optimization method, including the following steps:

[0072] Step S100: Obtain dynamic data stream information and intermediate data generation patterns of task load, extract memory demand peaks and data access frequencies from task execution sequences, and classify task load patterns using a preset convolutional neural network model to obtain real-time change characteristics of task load.

[0073] Real-time load variation characteristics are a set of multi-dimensional quantitative features reflecting how the task load evolves over time. They are derived from dynamic data stream information (such as data input rate and type distribution) and intermediate data generation patterns (such as generation frequency and lifecycle) collected during task execution. The analysis extracts the temporal fluctuations of peak memory demand and the hotspot migration patterns of data access frequency. Once the load patterns have been classified (such as compute-intensive / data-intensive, periodic / burst) by a pre-defined Convolutional Neural Network (CNN) model, these features provide a basis for real-time decisions on dynamic resource scheduling and memory optimization. Real-time load variation characteristics describe the state-change trends and inherent patterns of the load during execution through real-time indicators at three levels: data characteristics, resource requirements, and pattern categories.

[0074] Step S200: Based on the real-time changes in task load, analyze memory allocation requests and release frequencies, and use a gradient descent-based optimization algorithm to determine the allocation ratio and reserved space of the memory pool, thereby obtaining the adjusted memory pool configuration.

[0075] The adjusted memory pool configuration refers to a memory resource management scheme that uses real-time changes in task load characteristics (such as memory allocation request frequency, release frequency, peak memory demand timing, data access hotspots, etc.) as core input. It dynamically optimizes the allocation ratio of memory blocks of different sizes (such as the proportion of small / medium / large blocks) and the size of emergency reserve space within the memory pool using a gradient descent-based optimization algorithm. The final result is a memory resource management scheme that combines high utilization and low allocation latency. The adjusted memory pool configuration adapts to dynamic load changes in real time through the gradient descent-based optimization algorithm, minimizing memory waste and allocation overhead while meeting task memory requirements, thus achieving efficient dynamic scheduling of memory resources.

[0076] Step S300: Extract available memory block information from the adjusted memory pool configuration, generate data compression priorities for intermediate data, and use the Huffman coding algorithm to compress intermediate data with low access frequency to obtain compressed intermediate data storage units.

[0077] The compressed intermediate data storage unit refers to a structured storage area within the memory pool, defined based on available memory block information extracted from the adjusted memory pool configuration, combined with the generation patterns and access frequencies of intermediate data (such as low-frequency, long-lifetime but not immediately used intermediate data). This data is compressed using the Huffman coding algorithm. The compressed intermediate data storage unit employs a process of "priority sorting - targeted compression - adapted memory block storage" to store low-value intermediate data in compressed form. This reduces memory usage while preserving data recoverability, freeing up memory resources for high-priority data and achieving efficient utilization of the memory pool.

[0078] Step S400: If the access frequency of the compressed intermediate data storage unit is lower than the preset threshold, the compressed intermediate data storage unit is temporarily stored in the low-speed storage area, and memory resources are reallocated according to task priority through the intelligent scheduling algorithm to obtain an optimized memory allocation scheme.

[0079] An optimized memory allocation scheme refers to a memory management strategy that, when the access frequency of compressed intermediate data storage units is lower than a preset threshold, uses an intelligent scheduling algorithm to temporarily store the compressed intermediate data storage units in low-speed storage areas (such as hard drives or external storage) to release memory resources. Then, based on task priorities (such as real-time requirements and importance), the memory pool resource allocation ratio is dynamically adjusted, ultimately forming a memory management strategy that balances the memory needs of high-priority tasks with overall resource utilization. Its core is to achieve precise allocation of memory resources to high-value tasks through a closed-loop process of "low-value data migration - memory resource release - priority-oriented reallocation," reducing unnecessary usage and improving system operating efficiency.

[0080] Furthermore, in the GPU memory optimization method provided in this embodiment, step S100 includes:

[0081] Step S110: Obtain dynamic data stream information from the task sequence, process the dynamic data stream information using time series analysis tools, extract intermediate data of task load, and obtain the time distribution and data generation pattern of task execution.

[0082] Dynamic data stream information is derived using the following formula:

[0083] (1)

[0084] In formula (1), the symbols denote: the dynamic data stream information at a given time; the total number of tasks in the task sequence; the weighting coefficient of each task; the execution status value of each task at that time; the time decay parameter; and the start time of each task.

[0085] Intermediate task load data is derived using the following formula:

[0086] (2)

[0087] In formula (2), the symbols denote: the intermediate task load data within a given time window; the number of data sampling points within the time window; and, for each sampling point, its resource utilization rate, computational complexity, memory usage, and bandwidth consumption.

[0088] The data generation pattern is derived from the following formula:

[0089] (3)

[0090] In formula (3), the symbols denote: the prediction function for the data generation pattern; the time variable of the time series; the parameters representing the stages of task execution; the amplitude coefficient of the periodic fluctuation; the frequency parameter of data generation; the phase offset; the coefficient of the exponentially decaying term; the decay rate; and the baseline offset.

[0091] In a distributed task scheduling system, the dynamic data flow information of task sequences typically originates from real-time task logs across multiple nodes. Assuming a data center processes 100,000 tasks daily, the real-time task logs record task start times, end times, and data generation volumes. Time series analysis tools such as the ARIMA model are used to process the data in the real-time task logs and extract intermediate data on the task load. For example, analysis of the real-time task logs reveals that daily task execution time is concentrated between 9:00 AM and 11:00 AM, with data generation peaking at approximately 500 MB per second during this period.

[0092] Step S120: Based on the time distribution of task execution and the data generation pattern, use statistical analysis tools to calculate the peak memory demand and data access frequency. If the peak memory demand exceeds the preset threshold, it is marked as high resource consumption, thus obtaining the resource consumption characteristics of the task load.

[0093] Peak memory requirements are calculated using the following formula:

[0094] (4)

[0095] In formula (4), the symbols denote: the peak memory requirement; the total task execution time; the total number of tasks; the memory usage of each task at a given time; and the activity status indicator of each task at that time, which is 1 when the task is active and 0 otherwise.

[0096] Data access frequency is calculated using the following formula:

[0097] (5)

[0098] In formula (5), the symbols denote: the data access frequency; the statistical time window; the total number of data access events; the amount of data in each access event; and the duration of each access event.

[0099] High resource usage is marked using the following formula:

[0100] (6)

[0101] In formula (6), the symbols denote: the resource usage characteristic marker; the current memory usage; the preset memory threshold; and the resource usage determination coefficient. If the ratio of the current memory usage to the preset memory threshold is greater than or equal to the resource usage determination coefficient, the marker is set to 1 to indicate high resource usage; otherwise, it is set to 0 to indicate normal resource usage.

[0102] Statistical analysis tools are used to calculate the peak memory requirement and data access frequency. Assuming a peak memory requirement of 8GB and a preset threshold of 6GB, the peak exceeds the threshold and is marked as high resource consumption, indicating that the task load places significant pressure on system resources during peak periods. Data access frequency analysis shows that the task accesses the database approximately 200 times per second during peak periods, indicating high-frequency data interaction.
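The peak-memory and high-resource calculations in Step S120 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes formula (4) takes the maximum over discrete time steps of the summed memory of active tasks, and that the determination coefficient of formula (6) defaults to 1.0. The function names and the sample data are hypothetical.

```python
def peak_memory(usage, active):
    """usage[i][t]: memory (GB) of task i at step t; active[i][t]: 1 if active, else 0.
    Returns the maximum over t of the summed memory of the active tasks."""
    steps = len(usage[0])
    return max(
        sum(u[t] * a[t] for u, a in zip(usage, active))
        for t in range(steps)
    )

def high_resource_flag(current_gb, threshold_gb, coeff=1.0):
    """Formula (6)-style marker: 1 if current / threshold >= coeff, else 0."""
    return 1 if current_gb / threshold_gb >= coeff else 0

# Three hypothetical tasks sampled at four time steps.
usage = [
    [2.0, 3.0, 3.0, 1.0],
    [1.0, 2.5, 3.0, 0.5],
    [0.5, 1.0, 2.0, 2.0],
]
active = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
peak = peak_memory(usage, active)
print(peak, high_resource_flag(peak, 6.0))  # → 8.0 1
```

With a 6GB threshold, the 8GB peak is flagged as high resource consumption, matching the example above.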

[0103] Step S130: Use a convolutional neural network to extract features from the data access frequency and peak memory demand in the resource usage characteristics, classify the extracted features into patterns, and determine the task load category.

[0104] Task load categories are derived using the following formula:

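[0105] $$C^{*} = \arg\max_{c \in \{1, \dots, K\}} P(c \mid \mathbf{x}), \qquad P(c \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_c^{\top} \mathbf{x})}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^{\top} \mathbf{x})} \qquad (7)$$

[0106] In formula (7), $C^{*}$ is the finally determined task load category, $c$ is a candidate category, $\mathbf{x}$ is the extracted comprehensive feature vector, $P(c \mid \mathbf{x})$ is the probability that the load belongs to category $c$ given the features, $\mathbf{w}_c$ is the weight vector of category $c$, and $K$ is the total number of categories; the denominator sums the numerator over all $K$ categories.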

[0107] Resource consumption characteristics are extracted with a convolutional neural network (CNN): the peak memory requirement and data access frequency are taken as inputs to generate feature vectors. For example, the convolutional layers of a CNN model can extract high-frequency access patterns, revealing periodic fluctuations in data access within a specific timeframe. Based on the extracted features, a pattern classifier assigns the task load to either the high-frequency change category or the low-frequency change category.
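The final classification step, as described by formula (7), can be sketched as a softmax layer over the feature vector. This sketch covers only that output layer, not the convolutional feature extractor. It assumes the softmax form of formula (7), and the two-element normalized feature vector and weight values are hypothetical.

```python
import math

def classify(features, weights):
    """Softmax classification: return (best_category, probabilities).
    weights maps category name -> weight vector of the same length as features."""
    scores = {c: sum(w * x for w, x in zip(wv, features)) for c, wv in weights.items()}
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    total = sum(exps.values())
    probs = {c: e / total for c, e in exps.items()}
    return max(probs, key=probs.get), probs

# Hypothetical normalized features: [peak memory / 8 GB, access rate / 200 per s]
features = [1.0, 1.0]
weights = {
    "high-frequency change": [1.5, 2.0],
    "low-frequency change": [0.5, -0.5],
}
category, probs = classify(features, weights)
print(category, round(probs[category], 3))
```

In a trained model, the weight vectors would come from training. Here they are fixed by hand so that the high-memory, high-access example lands in the high-frequency change category.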

[0108] Step S140: If the task load category is high-frequency change, then use real-time monitoring tools to continuously track the dynamic data stream, obtain real-time change characteristics, and obtain a dynamic behavior description of the task load.

[0109] The dynamic behavior of the task load is described by the following formula:

[0110] (8)

[0111] In formula (8), the quantities are the dynamic behavior description value of each task, the total length of the observation period, the load variation coefficient at each time, the variability weight parameter, the response characteristic value at each time, and the mean and standard deviation of the response characteristics.

[0112] Assuming the classification results show high-frequency changes, real-time monitoring tools such as Prometheus are used to continuously track the data stream and obtain real-time change characteristics. For example, monitoring may reveal that the amount of task data generated fluctuates by ±20% during peak periods, indicating that the task load is dynamic and resource allocation needs to be adjusted in real time.
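The "±20%" fluctuation and the standardization by mean and standard deviation that formula (8) relies on can be sketched as below. This is a minimal illustration under stated assumptions: the per-minute sample rates and the function name are hypothetical, and it does not reproduce formula (8) itself, whose exact form is not recoverable from the text.

```python
import statistics

def fluctuation_profile(samples):
    """Return (maximum relative deviation from the mean, z-scores of the samples)."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # population standard deviation
    max_rel_dev = max(abs(x - mean) for x in samples) / mean
    z = [(x - mean) / std if std else 0.0 for x in samples]
    return max_rel_dev, z

# Hypothetical per-minute data-generation rates (MB/s) during a peak period.
rates = [400, 450, 500, 550, 600]
dev, z = fluctuation_profile(rates)
print(f"max deviation: {dev:.0%}")  # → max deviation: 20%
```

The z-scores correspond to the "response characteristic minus mean, divided by standard deviation" terms named in formula (8). Deciding whether a given deviation counts as high-frequency change is left to a threshold that the text does not specify.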

[0113] Dynamic behavior descriptions are generated by analyzing real-time monitoring data. For example, monitoring data might show that a task gradually stabilizes after a peak period, with data generation decreasing to 100MB per second and memory requirements dropping to 4GB. Dynamic behavior descriptions help optimize resource scheduling, prioritizing the allocation of more computing resources to frequently changing tasks, thereby improving system stability.

[0114] Preferably, in the GPU memory optimization method provided in this embodiment, step S200 includes:

[0115] Step S210: Obtain time-series data of memory allocation requests and periodic data of the release frequency from the real-time change characteristics of the task load. Use time-series analysis tools to calculate the frequency distribution of memory allocation requests, and apply a Fourier transform to extract the periodic characteristics of the release frequency, obtaining the dynamic behavior pattern of memory allocation requests and release frequency.

[0116] The frequency distribution of memory allocation requests is derived using the following formula:

[0117] (9)

[0118] In formula (9), the quantities are the frequency distribution function of memory allocation requests, the length of the observation time window, the Dirac delta function centred at the time of each request, and the size of each memory allocation request. Formula (9) calculates the frequency distribution of memory allocation requests over the time series.

[0119] The periodicity of the release frequency is derived from the following formula:

[0120] (10)

[0121] In formula (10), the quantities are the power spectral density of the release frequency, the angular frequency, the time-series signal of the memory release frequency, the signal duration, the imaginary unit, and the time variable; the complex exponential is the kernel of the Fourier transform. Formula (10) extracts the periodicity of the release frequency through the Fourier transform.

[0122] The dynamic behavior pattern of memory allocation requests and deallocation frequencies is derived from the following formula:

[0123] (11)

[0124] In formula (11), the quantities are the memory dynamic behavior pattern at each time, the weighting coefficient of allocation requests, the intensity of allocation requests at each time, the amplitude coefficient of the periodic release, the period length, the phase shift, and a random noise term. Formula (11) describes the combined dynamic behavior pattern of memory allocation requests and release frequencies.

[0125] In distributed task scheduling systems, time-series data of memory allocation requests and periodic data of memory release frequency can be obtained from task logs, which record the timestamps of memory requests and releases for each task. Assume a data center processes 50,000 tasks daily and the logs show that memory allocation requests are concentrated between 2 PM and 4 PM, at approximately 100 requests per second, while the release frequency fluctuates periodically, with one cycle per hour. Time-series analysis tools such as moving averages are used to calculate the frequency distribution of memory allocation requests, giving an average of 80 requests per second with a peak of 120 requests per second. A Fourier transform extracts the periodic characteristics of the release frequency: the main period is 1 hour, and the intensity (the ratio of the peak value to the mean of the periodic fluctuation) is approximately 1.5. This analysis reveals the dynamic behavior patterns of memory allocation and release, providing a basis for subsequent optimization.
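Extracting the main release period with a Fourier transform, and computing the peak-to-mean intensity, can be sketched with a plain discrete Fourier transform. This is a minimal sketch, not the patent's exact formula (10). The 5-minute sampling interval and the synthetic cosine-shaped release counts are assumptions chosen to mirror the 1-hour period and 1.5 intensity quoted above.

```python
import cmath
import math

def dominant_period(signal, sample_interval):
    """Return the period (in units of sample_interval) of the strongest
    non-zero frequency component of the signal, found with a plain DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # drop the zero-frequency (mean) term
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2 / n  # periodogram estimate of spectral power
        if power > best_power:
            best_k, best_power = k, power
    return n * sample_interval / best_k

def periodic_intensity(signal):
    """Ratio of the peak value to the mean value of the signal."""
    return max(signal) / (sum(signal) / len(signal))

# Hypothetical release counts sampled every 5 minutes over 8 hours, with a
# 60-minute cycle around a baseline of 100 releases per interval.
releases = [100 + 50 * math.cos(2 * math.pi * (5 * i) / 60) for i in range(96)]
print(dominant_period(releases, sample_interval=5))  # → 60.0 (minutes)
print(round(periodic_intensity(releases), 2))        # → 1.5
```

The O(n²) DFT keeps the example dependency-free. For long logs, an FFT routine would be the usual choice.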

[0126] Step S220: Based on the dynamic behavior pattern of the release frequency, the objective function is iteratively optimized using a gradient descent-based optimization algorithm to determine the initial configuration parameters of the memory pool. The objective function is derived using the following formula:

[0127] (12)

[0128] In formula (12), the quantities are the objective function, the allocation ratio, the reserved space ratio, the average frequency of memory allocation requests, the periodic intensity of the release frequency, and two weighting coefficients.

[0129] The initial configuration parameters of the memory pool are derived using the following formula:

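[0130] $$\boldsymbol{\theta}_{k+1} = \boldsymbol{\theta}_k - \eta \, \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}_k) \qquad (13)$$

[0131] In formula (13), $\boldsymbol{\theta}_{k+1}$ is the parameter vector after the $(k+1)$-th iteration, $\boldsymbol{\theta}_k$ is the current parameter vector at the $k$-th iteration, $\eta$ is the learning rate controlling the update step size, $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}_k)$ is the gradient of the objective function $J$ with respect to the parameters at $\boldsymbol{\theta}_k$, and $k$ is the iteration index.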

[0132] Based on the periodicity of the release frequency, a gradient descent algorithm is used to optimize the memory pool configuration. The allocation ratio and reserved space ratio in the objective function are iteratively adjusted to balance memory usage efficiency and stability. For example, the initial allocation ratio is set to 0.7, the reserved space ratio to 0.2, and the two weighting coefficients to 0.6 and 0.4, respectively. After optimization, the allocation ratio is adjusted to 0.65 and the reserved space ratio is increased to 0.25, making the memory pool better able to absorb peak demand. In this embodiment, configuration optimization ensures that the system can still operate efficiently under high load.
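The update rule of formula (13) can be sketched with numerical gradients. The exact objective of formula (12) is not recoverable from the text. The sketch therefore uses a hypothetical quadratic objective, weighted by the example's coefficients 0.6 and 0.4 and deliberately centred on the reported post-optimization values (0.65, 0.25). It illustrates only the iteration, not the patent's actual cost model.

```python
def gradient_descent(objective, params, lr=0.5, iters=200, h=1e-6):
    """Minimize objective(params) with theta <- theta - lr * grad,
    using central-difference numerical gradients."""
    theta = list(params)
    for _ in range(iters):
        grad = []
        for i in range(len(theta)):
            up = theta.copy()
            up[i] += h
            down = theta.copy()
            down[i] -= h
            grad.append((objective(up) - objective(down)) / (2 * h))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

W1, W2 = 0.6, 0.4  # weighting coefficients from the example

def objective(theta):
    alloc, reserve = theta
    # Hypothetical quadratic objective with its minimum at (0.65, 0.25).
    return W1 * (alloc - 0.65) ** 2 + W2 * (reserve - 0.25) ** 2

alloc, reserve = gradient_descent(objective, [0.7, 0.2])
print(round(alloc, 3), round(reserve, 3))  # → 0.65 0.25
```

Starting from (0.7, 0.2), each iteration moves the parameters toward the minimizer. The learning rate of 0.5 is an assumption that happens to converge quickly for this objective.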

[0133] Step S230: Continuously track the frequency changes of memory allocation requests and the periodic fluctuations of release frequency with real-time monitoring tools to obtain real-time usage status data of the memory pool. If the memory usage rate in the real-time usage status data exceeds a preset threshold, mark the memory pool as being in a high-load state and determine that the memory pool configuration needs dynamic adjustment.

[0134] The real-time usage status data of the memory pool is obtained using the following formula:

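[0135] $$U(t) = \frac{M_{\text{used}}(t)}{M_{\text{total}}(t)} \qquad (14)$$

[0136] In formula (14), $U(t)$ is the memory usage rate at time $t$, $M_{\text{used}}(t)$ is the amount of memory in use at time $t$, and $M_{\text{total}}(t)$ is the total capacity of the memory pool at time $t$.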

[0137] The following formula can be used to determine whether dynamic adjustments to the memory pool configuration are needed:

[0138] $$L(t) = \begin{cases} 1, & U(t) > U_{\text{th}} \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$

[0139] In formula (15), $L(t)$ is the load status flag, $U(t)$ is the current memory usage rate, and $U_{\text{th}}$ is the preset threshold. $L(t) = 1$ marks a high-load state, and $L(t) = 0$ a normal load state.

[0140] Real-time monitoring tools such as Prometheus are used to track changes in the frequency of memory allocation requests and periodic fluctuations in memory release frequency. Assuming monitoring detects that memory utilization reaches 85% during peak periods, exceeding a preset threshold of 80%, the system is marked as being in a high-load state.
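The usage-rate and load-flag checks of Step S230 reduce to a ratio and a threshold comparison. This minimal sketch uses the 80% threshold from the example. It treats "exceeds" as a strict comparison, and the sample readings and function names are hypothetical.

```python
def usage_rate(used_gb, total_gb):
    """Fraction of the memory pool currently in use."""
    return used_gb / total_gb

def load_flag(rate, threshold=0.80):
    """1 marks a high-load state, 0 a normal one (strictly above the threshold)."""
    return 1 if rate > threshold else 0

# Hypothetical samples (used GB out of a 100 GB pool) over a peak period.
samples = [70, 78, 85, 88, 82, 75]
flags = [load_flag(usage_rate(u, 100)) for u in samples]
print(flags)  # → [0, 0, 1, 1, 1, 0]
```

In practice, the samples would come from a monitoring system such as Prometheus. The list here stands in for those readings.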

[0141] Step S240: To meet the dynamic adjustment requirements, the allocation ratio and reserved space of the memory pool are adjusted using a preset threshold comparison method. If the duration of the high load state exceeds the preset threshold, the reserved space ratio is increased and the allocation ratio is decreased to obtain the adjusted memory pool configuration.

[0142] The duration of high load conditions is calculated using the following formula:

[0143] (16)

[0144] In formula (16), the quantities are the duration of the high-load state, the preset duration threshold, the adjusted and current reserved space ratios, and the reserved space increment factor. When the high-load duration exceeds the threshold, the reserved space ratio is increased proportionally.

[0145] For high-load conditions, a threshold comparison method is used to adjust the memory pool configuration. If the high load lasts for more than 5 minutes, the reserved space ratio is increased to 0.3, and the allocation ratio is decreased to 0.6. This dynamic adjustment effectively alleviates memory pressure and improves system response speed. For example, the dynamic adjustment requirements are further refined by analyzing real-time monitoring data. Monitoring shows that after the peak period, memory utilization drops to 60%, and the release frequency fluctuation weakens. Based on this, the system restores its initial configuration, releasing excess reserved space. This flexible adjustment optimizes resource utilization and reduces system operating costs.
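The duration check of Step S240 can be sketched as follows: measure the longest run of high-load flags, then shift capacity from allocation to reserve. The fixed 0.05 shift is a hypothetical stand-in for the increment factor of formula (16). It was chosen so that the example's move from (0.65, 0.25) to (0.6, 0.3) is reproduced. One flag per minute is also an assumption.

```python
def longest_high_load_minutes(flags, sample_minutes):
    """Length (minutes) of the longest consecutive run of high-load flags."""
    longest = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest * sample_minutes

def adjust_pool(alloc, reserve, high_minutes, limit_minutes=5, step=0.05):
    """If high load lasted longer than the limit, move `step` of the pool
    from the allocation share to the reserved share (hypothetical rule)."""
    if high_minutes > limit_minutes:
        return round(alloc - step, 4), round(reserve + step, 4)
    return alloc, reserve

# One flag per minute; seven consecutive high-load minutes.
flags = [0, 1, 1, 1, 1, 1, 1, 1, 0]
minutes = longest_high_load_minutes(flags, sample_minutes=1)
print(minutes, adjust_pool(0.65, 0.25, minutes))  # → 7 (0.6, 0.3)
```

Restoring the initial configuration after the peak, as described above, would simply return the original pair once utilization falls back below the threshold.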

[0146] Furthermore, in the GPU memory optimization method provided in this embodiment, step S300 includes:

[0147] Step S310: Extract memory block information from the adjusted memory pool configuration, scan the available memory blocks in the memory pool using a memory management tool, obtain the size and allocation status of each memory block, and generate a memory block information list containing the size and status of the memory blocks.

[0148] The list of memory block information is derived using the following formula:

[0149] $$\mathcal{L} = \{(s_i, \sigma_i) \mid i = 1, 2, \dots, N\} \qquad (17)$$

[0150] In formula (17), $\mathcal{L}$ is the generated memory block information list, $N$ is the total number of identifiable memory blocks in the memory pool, $s_i$ is the size of the $i$-th memory block, and $\sigma_i$ is its current allocation status. Formula (17) combines the size and status of every memory block into one complete information list.

[0151] In a distributed task scheduling system, the adjusted memory pool configuration requires further extraction of memory block information to optimize resource management. Memory management tools scan the memory pool, generating a list containing memory block sizes and allocation statuses. Assuming a data center memory pool contains 1000 memory blocks, the tool scan shows that 600 blocks are allocated with an average size of 2MB, and 400 blocks are unallocated with an average size of 1.5MB. Allocation statuses are categorized as "used" and "idle," and the list clearly records the current status of each memory block.
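The memory block information list of formula (17) can be represented as (size, status) pairs. In this minimal sketch, the `MemoryBlock` record and `scan_pool` function are hypothetical stand-ins for whatever the memory management tool returns. The pool contents follow the 600 used / 400 idle example.

```python
from dataclasses import dataclass

@dataclass
class MemoryBlock:
    block_id: int
    size_mb: float
    status: str  # "used" or "idle"

def scan_pool(blocks):
    """Collect (size, status) pairs for every block in the pool."""
    return [(b.size_mb, b.status) for b in blocks]

# Hypothetical pool matching the example: 600 used 2 MB blocks, 400 idle 1.5 MB blocks.
pool = [MemoryBlock(i, 2.0, "used") for i in range(600)]
pool += [MemoryBlock(600 + i, 1.5, "idle") for i in range(400)]
info = scan_pool(pool)
idle_mb = sum(size for size, status in info if status == "idle")
print(len(info), idle_mb)  # → 1000 600.0
```

Summing over the list shows that the idle blocks in this example offer 600MB of allocatable space.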

[0152] Step S320: Based on the memory block information list and the data access pattern, analyze the access frequency of intermediate data using statistical tools. For intermediate data whose access frequency is lower than a preset threshold, use a priority sorting algorithm to generate a data compression priority list.

[0153] The access frequency of intermediate data is calculated using the following formula:

[0154] $$f_i = \frac{n_i}{T \cdot S_i} \qquad (18)$$

[0155] In formula (18), $f_i$ is the data access frequency of the $i$-th memory block, $n_i$ is the number of times the $i$-th memory block is accessed within the statistical time window, $T$ is the total length of the statistical time window, and $S_i$ is the size of the $i$-th memory block. Formula (18) quantifies the data access frequency as the number of accesses per unit of storage space per unit time.

[0156] The data compression priority list is derived using the following formula:

[0157] (19)

[0158] In formula (19), the quantities are the compression priority score of each data block; its access frequency and the maximum access frequency among all data blocks; its compression benefit and the maximum compression benefit; the storage space it occupies and the maximum storage space occupied; and three weighting coefficients for access frequency, compression benefit, and storage usage, respectively.

[0159] Based on the memory block information list and data access patterns, statistical tools analyze the access frequency of intermediate data. Intermediate data refers to temporary data generated during task execution, such as cached calculation results. Assume system records show that a certain type of intermediate data is accessed 10 times per minute, below a preset threshold of 15 times per minute. For such low-frequency data, a priority sorting algorithm generates a data compression priority list.
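Filtering low-frequency data and ranking it for compression can be sketched as follows. Formula (19) names normalized access frequency, compression benefit, and storage usage terms, but its exact form is lost. This sketch assumes a weighted sum in which a lower access frequency raises the score, and the weights (0.5, 0.3, 0.2) are hypothetical. For simplicity, it uses accesses per minute, matching the example's threshold, rather than the per-unit-size frequency of formula (18).

```python
def compression_priority(items, window_min, threshold_per_min, weights=(0.5, 0.3, 0.2)):
    """Return names of low-frequency items, highest compression priority first.

    items: dicts with 'name', 'accesses' (count in the window),
    'benefit_mb' (space saved by compressing) and 'size_mb'.
    """
    w_freq, w_benefit, w_size = weights
    low = [dict(item, rate=item["accesses"] / window_min) for item in items
           if item["accesses"] / window_min < threshold_per_min]
    if not low:
        return []
    rate_max = max(i["rate"] for i in low) or 1.0
    benefit_max = max(i["benefit_mb"] for i in low) or 1.0
    size_max = max(i["size_mb"] for i in low) or 1.0

    def score(item):
        # Rarely accessed, high-benefit, large blocks score highest.
        return (w_freq * (1 - item["rate"] / rate_max)
                + w_benefit * item["benefit_mb"] / benefit_max
                + w_size * item["size_mb"] / size_max)

    return [i["name"] for i in sorted(low, key=score, reverse=True)]

items = [
    {"name": "cache_a", "accesses": 100, "benefit_mb": 6, "size_mb": 10},
    {"name": "cache_b", "accesses": 50, "benefit_mb": 3, "size_mb": 4},
    {"name": "hot_c", "accesses": 300, "benefit_mb": 8, "size_mb": 12},
    {"name": "cache_d", "accesses": 140, "benefit_mb": 1, "size_mb": 2},
]
print(compression_priority(items, window_min=10, threshold_per_min=15))
# → ['cache_a', 'cache_b', 'cache_d']
```

`hot_c` (30 accesses per minute) is excluded by the threshold before scoring. The remaining items are ordered by the weighted score.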

[0160] Step S330: For the intermediate data in the data compression priority list, the Huffman coding algorithm is used for compression processing. A coding table is generated by constructing a Huffman tree. The intermediate data is converted into compressed data according to the coding table to obtain a compressed data set.

[0161] The priority sorting orders data by access frequency from low to high, so that the least frequently accessed data is compressed first by the Huffman coding algorithm. For example, data accessed fewer than 10 times per minute, accounting for 30% of the list, receives the highest priority. This ordering concentrates compression on the data whose compression has the least impact on performance.

[0162] For intermediate data in the data compression priority list, the Huffman coding algorithm is used for compression. By analyzing the frequency of the data content, a Huffman tree is constructed to generate an encoding table. For example, if some intermediate data contains a high-frequency character A and a low-frequency character B, the encoding table assigns A a shorter code and B a longer code. Assuming the original data size is 10MB, it is reduced to 4MB after compression, generating a compressed data set.
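Building a Huffman code table from symbol frequencies, as described in Step S330, can be sketched with a priority queue. This is the textbook Huffman construction applied to characters for readability. Real intermediate data would be coded as bytes, and the sample string is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_table(data):
    """Build a Huffman code table (symbol -> bit string) from symbol frequencies."""
    freq = Counter(data)
    if len(freq) == 1:  # a single distinct symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def encode(data, table):
    return "".join(table[s] for s in data)

def decode(bits, table):
    reverse = {c: s for s, c in table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:  # prefix-free codes make this match unambiguous
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

data = "AAAAAAAABBC"  # 'A' is frequent, 'B' and 'C' are rare
table = huffman_table(data)
bits = encode(data, table)
print(table, len(bits), "bits vs", 8 * len(data))
```

The frequent symbol `A` receives a 1-bit code, while `B` and `C` receive 2-bit codes. The 11 characters therefore encode to 14 bits instead of 88 bits at 8 bits per character.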

[0163] Step S340: Based on the compressed data set and storage space allocation requirements, allocate the compressed data set to the compressed storage unit using the storage management tool. If the occupancy rate of the compressed storage unit exceeds the preset threshold, adjust the storage space allocation ratio to obtain the compressed storage unit configuration.

[0164] The utilization rate of compressed storage units is calculated using the following formula:

[0165] (20)

[0166] In formula (20), the quantities are the utilization rate of the compressed storage unit, the used and total storage space, the size of each compressed data set, the maximum capacity of the compressed storage unit, and the number of compressed data sets.

[0167] The storage space allocation ratio is calculated using the following formula:

[0168] (21)

[0169] In formula (21), the quantities are the storage space allocation ratio, the storage requirement of the compressed data set, the total available storage space, and a data priority weighting factor.

[0170] Compressed data sets are allocated to compressed storage units: storage management tools place the compressed data into dedicated units according to storage space requirements. Assume a compressed storage unit has a capacity of 100GB and a current utilization rate of 70%. If monitoring later shows that utilization exceeds the preset threshold of 80%, the system adjusts the storage space allocation ratio, for example from an initial 0.6 to 0.5, increasing the reserved space to 20GB. This dynamic adjustment keeps the storage unit operating efficiently under high load while preventing data overflow.
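The utilization check and ratio adjustment of Step S340 can be sketched as follows. The sketch assumes utilization is the summed size of stored data sets divided by capacity. It lowers the allocation ratio by a hypothetical fixed step of 0.1, which reproduces the example's 0.6 to 0.5 change, with a hypothetical floor of 0.1. The exact form of formula (21) is not reproduced.

```python
def storage_utilization(dataset_sizes_gb, capacity_gb):
    """Fraction of the compressed storage unit occupied by the stored data sets."""
    return sum(dataset_sizes_gb) / capacity_gb

def adjust_allocation_ratio(ratio, utilization, threshold=0.80, step=0.1, floor=0.1):
    """Lower the allocation ratio by `step` when utilization exceeds the threshold."""
    if utilization > threshold:
        return max(floor, round(ratio - step, 4))
    return ratio

capacity = 100.0
print(storage_utilization([30.0, 25.0, 15.0], capacity))  # → 0.7, below the threshold
u = storage_utilization([30.0, 25.0, 15.0, 12.0], capacity)  # 82 GB stored
print(u, adjust_allocation_ratio(0.6, u))  # → 0.82 0.5
```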

[0171] Preferably, in the GPU memory optimization method provided in this embodiment, step S400 includes:

[0172] Step S410: Obtain access frequency data from the compressed intermediate data storage units, analyze the access frequency of each storage unit using statistical tools, and mark a storage unit as low-frequency data if its access frequency is lower than a preset threshold, thereby obtaining a low-frequency data set.

[0173] The access frequency of each storage unit is calculated using the following formula:

[0174] $$p_i = \frac{n_i}{N_{\text{acc}}} \times 100\% \qquad (22)$$

[0175] In formula (22), $p_i$ is the access frequency of the $i$-th storage unit, $n_i$ is the number of times the $i$-th storage unit is accessed within the statistical period, and $N_{\text{acc}}$ is the total number of accesses within the statistical period. Formula (22) calculates each compressed intermediate data storage unit's percentage share of all accesses.

[0176] The low-frequency data set is derived using the following formula:

[0177] $$D_{\text{low}} = \{u_i \mid p_i < p_{\text{th}},\ i = 1, 2, \dots, M\} \qquad (23)$$

[0178] In formula (23), $D_{\text{low}}$ is the low-frequency data set, $u_i$ is the $i$-th storage unit, $p_i$ is the access frequency of the $i$-th storage unit, $p_{\text{th}}$ is the preset access frequency threshold, and $M$ is the total number of storage units. Formula (23) defines the criterion for the low-frequency data set: storage units whose access frequency is below the preset threshold form the low-frequency data set.

[0179] In distributed task scheduling systems, obtaining access frequency data from compressed intermediate data storage units is a crucial step in optimizing resource management. These storage units typically store compressed intermediate data generated during task execution, such as temporary results in distributed computing. Statistical tools scan these storage units and record the access frequency of each unit. For example, a data center's compressed storage unit contains 1000 units. Statistics show that 700 units have an access frequency of 5 times per minute, below a preset threshold of 10 times per minute. These units are marked as low-frequency data, forming a low-frequency data set. Obtaining access frequency data relies on real-time monitoring tools. These tools generate access frequency tables through log analysis, providing a basis for subsequent data migration.
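Building the low-frequency data set can be sketched as a threshold filter over per-unit access counts, with the access shares of formula (22) computed alongside. The 10-minute window, the unit identifiers, and the 300 "hot" units at 20 accesses per minute are hypothetical. The 700 units at 5 accesses per minute and the threshold of 10 follow the example.

```python
def low_frequency_units(access_counts, window_minutes, threshold_per_minute):
    """Ids of storage units accessed less often than the threshold."""
    return {uid for uid, n in access_counts.items()
            if n / window_minutes < threshold_per_minute}

def access_shares(access_counts):
    """Share of all accesses attributed to each storage unit."""
    total = sum(access_counts.values())
    return {uid: n / total for uid, n in access_counts.items()} if total else {}

# Hypothetical 10-minute window: 700 units at 5 accesses/min, 300 units at 20/min.
counts = {f"unit{i}": (50 if i < 700 else 200) for i in range(1000)}
low = low_frequency_units(counts, window_minutes=10, threshold_per_minute=10)
print(len(low))  # → 700
```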

[0180] Step S420: Based on the low-frequency data set, use a data migration tool to transfer the storage units marked as low-frequency data to the low-speed storage area, and generate the corresponding low-speed storage allocation records.

[0181] Processing low-frequency datasets involves the use of data migration tools. These tools, based on tags, transfer low-frequency data to low-speed storage areas, such as tape storage or low-cost cloud storage. For example, a low-frequency dataset containing 500GB of data might be transferred in batches to a low-speed storage area at a rate of 100GB per hour, generating storage allocation records. These records detail the transfer time, target storage area, and data volume for each storage unit, such as "Unit A, transferred to low-speed storage area 1, data volume 2GB". This recording ensures the traceability of the data migration process and supports subsequent resource allocation.
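Batched migration with traceable records can be sketched as packing units into hourly batches limited by the transfer rate. The `MigrationRecord` fields and the record text format are hypothetical, loosely modelled on the "Unit A, transferred to low-speed storage area 1, data volume 2GB" example. The 500GB set is split into 250 hypothetical 2GB units.

```python
from dataclasses import dataclass

@dataclass
class MigrationRecord:
    unit: str
    target: str
    size_gb: float
    batch_hour: int  # hour (batch index) in which the unit is transferred

    def describe(self):
        return (f"{self.unit}, transferred to {self.target} in hour "
                f"{self.batch_hour}, data volume {self.size_gb:g}GB")

def plan_migration(units, rate_gb_per_hour, target="low-speed storage area 1"):
    """Pack (name, size_gb) units into hourly batches limited by the transfer rate.
    A unit larger than one hour's capacity still gets a batch of its own."""
    records, hour, used = [], 0, 0.0
    for name, size in units:
        if used > 0 and used + size > rate_gb_per_hour:
            hour, used = hour + 1, 0.0  # current hour is full: open the next batch
        records.append(MigrationRecord(name, target, size, hour))
        used += size
    return records

units = [(f"Unit {i}", 2.0) for i in range(250)]  # 250 x 2 GB = 500 GB
records = plan_migration(units, rate_gb_per_hour=100.0)
print(records[0].describe())
print("hours needed:", records[-1].batch_hour + 1)  # → 5
```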

[0182] Step S430: Based on the low-speed storage allocation record and task priority list, memory resources are reallocated using an intelligent scheduling algorithm, prioritizing the allocation of memory required by high-priority tasks to obtain a preliminary memory allocation scheme.

[0183] Based on low-speed storage allocation records and a task priority list, an intelligent scheduling algorithm reallocates memory resources. The task priority list is sorted according to the importance and timeliness of the tasks; for example, real-time analysis tasks have a higher priority than archiving tasks.

[0184] The intelligent scheduling algorithm prioritizes allocating memory resources to high-priority tasks. For example, if a high-priority task requires 10GB of memory, the intelligent scheduling algorithm allocates 8GB from the free memory pool and retrieves 2GB of data from the low-speed storage area to generate a preliminary memory allocation plan. The preliminary memory allocation plan records the number and source of the allocated memory blocks, such as "Task X, allocate 5 memory blocks, source: 4 from the free pool, 1 from low-speed storage".
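A greedy priority-first allocation reproduces the Task X example: the free pool is used first, and the remaining demand is taken from the low-speed storage area. This is a simplified stand-in for the "intelligent scheduling algorithm," whose details the text does not give. The task tuples are hypothetical, and demand that neither source can satisfy is simply left unserved.

```python
def allocate(tasks, free_pool_gb, low_speed_gb):
    """Serve tasks in descending priority: free pool first, then low-speed storage.
    tasks: list of (name, priority, demand_gb). Returns {name: (from_free, from_low)}."""
    plan = {}
    for name, _, demand in sorted(tasks, key=lambda t: t[1], reverse=True):
        from_free = min(demand, free_pool_gb)
        from_low = min(demand - from_free, low_speed_gb)
        free_pool_gb -= from_free
        low_speed_gb -= from_low
        plan[name] = (from_free, from_low)
    return plan

tasks = [("archive", 1, 4.0), ("task_x", 9, 10.0)]
plan = allocate(tasks, free_pool_gb=8.0, low_speed_gb=50.0)
print(plan)  # → {'task_x': (8.0, 2.0), 'archive': (0.0, 4.0)}
```

The high-priority task receives all 8GB of the free pool, and the lower-priority archive task is served entirely from low-speed storage.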

[0185] Step S440: For the initial memory allocation scheme, the memory usage rate is detected by the memory management tool. If the memory usage rate exceeds the preset threshold, the resource ratio of the low-speed storage area is adjusted to obtain an optimized memory allocation scheme.

[0186] Memory utilization is calculated using the following formula:

[0187] $$U = \frac{M_{\text{used}}}{M_{\text{total}}} \times 100\% \qquad (24)$$

[0188] In formula (24), $U$ is the memory usage rate, $M_{\text{used}}$ is the amount of memory in use, and $M_{\text{total}}$ is the total system memory capacity. Formula (24) calculates the current memory usage percentage, the core indicator for determining whether memory optimization is needed.

[0189] The adjusted proportion of low-speed storage area resources is obtained using the following formula:

[0190] (25)

[0191] In formula (25), the quantities are the adjusted and initial proportions of low-speed storage area resources, an adjustment factor, the current memory usage rate, and the preset memory usage threshold. Formula (25) automatically reduces the resource allocation ratio of the low-speed storage area when memory usage exceeds the threshold.

[0192] (26)

[0193] In formula (26), the quantities are the optimized memory allocation scheme, the initial memory allocation scheme, the memory allocation adjustment amount, and the performance indices of the high-speed and low-speed storage areas. Formula (26) determines the final optimized memory allocation strategy from the performance difference between the two storage areas.

[0194] Based on the initial memory allocation plan, the memory management tool monitors memory usage to optimize resource allocation. Assuming a total memory pool capacity of 1000GB and a current utilization rate of 85%, exceeding the preset threshold of 80%, the memory management tool analysis shows that the resource ratio of the low-speed storage area is 0.3, indicating excessive usage. The system adjusts the ratio to 0.2, reallocating some low-speed storage data to high-speed storage areas, such as moving 100GB of data to the cache. This adjustment generates an optimized memory allocation plan, which records the adjusted memory usage and resource ratio, such as "utilization reduced to 75%, low-speed storage ratio 0.2". This dynamic adjustment ensures efficient utilization of memory resources.
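The threshold-driven reduction described for formula (25) can be sketched with a hypothetical linear rule, r' = r0 - k(U - U_th), applied only when U exceeds U_th and floored at zero. The factor k = 2 is an assumption chosen so that the example's figures (85% usage, 80% threshold) take the ratio from 0.3 to 0.2. The patent's actual formula may differ.

```python
def adjust_low_speed_ratio(initial_ratio, usage, threshold=0.80, k=2.0):
    """Hypothetical linear rule: r' = r0 - k * (U - U_th) when U > U_th, floored at 0."""
    if usage <= threshold:
        return initial_ratio
    return max(0.0, initial_ratio - k * (usage - threshold))

total_gb, used_gb = 1000.0, 850.0
usage = used_gb / total_gb                  # usage rate as in formula (24)
ratio = adjust_low_speed_ratio(0.3, usage)
print(usage, round(ratio, 3))               # → 0.85 0.2
```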

[0195] Referring to Figure 2, this embodiment also provides a GPU memory optimization system for executing the GPU memory optimization method described above. The system includes a real-time change feature acquisition module 10, a memory pool configuration acquisition module 20, a data storage unit acquisition module 30, and a memory allocation scheme acquisition module 40. The real-time change feature acquisition module 10 acquires the dynamic data flow information and intermediate data generation patterns of the task load, extracts the peak memory demand and data access frequency from the task execution sequence, and classifies task load patterns with a preset convolutional neural network model to obtain the real-time change features of the task load. The memory pool configuration acquisition module 20 analyzes the frequencies of memory allocation requests and releases based on the real-time change features of the task load, and determines the allocation ratio and reserved space of the memory pool with a gradient descent-based optimization algorithm to obtain an adjusted memory pool configuration. The data storage unit acquisition module 30 extracts available memory block information from the adjusted memory pool configuration, generates data compression priorities for intermediate data, and compresses low-access-frequency intermediate data with a Huffman coding algorithm to obtain compressed intermediate data storage units. The memory allocation scheme acquisition module 40 temporarily stores the compressed intermediate data storage units in a low-speed storage area if their access frequency is lower than a preset threshold, and then reallocates memory resources according to task priority with an intelligent scheduling algorithm to obtain an optimized memory allocation scheme.

[0196] Compared with existing technologies, the GPU memory optimization method and system provided in this embodiment integrate data flow analysis, memory allocation optimization, and data compression to address the low memory allocation efficiency and data access latency caused by dynamic changes in task load. This embodiment classifies task load patterns with a convolutional neural network, extracting peak memory demand and data access frequency to capture real-time changing characteristics accurately. It optimizes the memory pool configuration with a gradient descent algorithm, dynamically adjusting the allocation ratio and reserved space. Intermediate data with low access frequency is compressed with Huffman coding and temporarily stored in a low-speed storage area, and resources are reallocated according to task priority by an intelligent scheduling algorithm. This embodiment significantly improves memory utilization, reduces access latency, and achieves efficient memory resource management, making it particularly suitable for highly dynamic, high-concurrency task load scenarios.

[0197] Although preferred embodiments of the invention have been described, those skilled in the art, once aware of the basic inventive concept, can make further changes and modifications to these embodiments. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the invention. Clearly, those skilled in the art can make various alterations and variations to the invention without departing from its spirit and scope. Thus, if such alterations and variations fall within the scope of the claims and their equivalents, the invention is intended to include them as well.

Claims

1. A GPU memory optimization method, characterized in that it comprises the following steps:
acquiring the dynamic data stream information and intermediate data generation patterns of the task load, extracting the peak memory requirement and data access frequency from the task execution sequence, and classifying task load patterns with a predefined convolutional neural network model to obtain the real-time variation characteristics of the task load, specifically including:
obtaining dynamic data stream information from the task sequence, and processing the dynamic data stream information with time series analysis tools to extract the intermediate data of the task load and obtain the time distribution of task execution and the data generation pattern; wherein the dynamic data stream information is derived using a formula whose quantities are the dynamic data stream information at a given time, the total number of tasks in the task sequence, the weighting coefficient of each task, the execution status value of each task at that time, a time decay parameter, and the start time of each task; the intermediate task load data is derived using a formula whose quantities are the intermediate task load data within a given time window, the number of data sampling points within the time window, and the resource utilization rate, computational complexity, memory usage, and bandwidth consumption at each sampling point; and the data generation pattern is derived using a formula whose quantities are the prediction function of the data generation pattern, the time variable of the time series, parameters representing the stages of task execution, the amplitude coefficient of periodic fluctuations, the frequency parameter of data generation, the phase offset, the coefficient of the exponentially decaying term, the decay rate, and a reference offset;
calculating, based on the time distribution of task execution and the data generation pattern, the peak memory demand and data access frequency with statistical analysis tools, and marking the load as high resource consumption if the peak memory demand exceeds a preset threshold, thereby obtaining the resource consumption characteristics of the task load;
extracting features from the data access frequency and peak memory demand in the resource usage characteristics with a convolutional neural network, and classifying the extracted features into patterns to determine the task load category;
if the task load category is high-frequency change, continuously tracking the dynamic data stream with a real-time monitoring tool to obtain real-time change characteristics and a dynamic behavior description of the task load;
analyzing, based on the real-time variation characteristics of the task load, the frequencies of memory allocation requests and releases, and determining the allocation ratio and reserved space of the memory pool with a gradient descent-based optimization algorithm to obtain an adjusted memory pool configuration;
extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for intermediate data, and compressing intermediate data with low access frequency using a Huffman coding algorithm to obtain compressed intermediate data storage units;
if the access frequency of a compressed intermediate data storage unit is lower than a preset threshold, temporarily storing the compressed intermediate data storage unit in a low-speed storage area, and reallocating memory resources according to task priority with an intelligent scheduling algorithm to obtain an optimized memory allocation scheme.

2. The GPU memory optimization method as described in claim 1, characterized in that, in the step of calculating the peak memory demand and data access frequency with statistical analysis tools based on the time distribution of task execution and the data generation pattern, and marking the task load as high resource usage if the peak memory demand exceeds a preset threshold: the peak memory demand is derived by a formula whose terms are the peak memory demand, the total task execution time, the total number of tasks, the memory usage of each task at a given time, and the activity status indicator of each task at that time, which is 1 when the task is active and 0 otherwise; the data access frequency is derived by a formula whose terms are the data access frequency, the statistical time window, the total number of data access events, the amount of data in each access event, and the duration of each access event; and high resource usage is marked by a formula whose terms are the resource usage characteristic marker, the current memory usage, the preset memory threshold, and the resource usage determination coefficient. If the ratio of the current memory usage to the preset memory threshold is greater than or equal to the resource usage determination coefficient, the marker is set to 1 to indicate high resource usage; otherwise, it is set to 0 to indicate normal resource usage.
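Two of the claim 2 quantities can be sketched in Python under stated assumptions: the peak memory demand is taken as the maximum, over discrete time steps, of the summed memory usage of the tasks whose activity indicator is 1, and the marker follows the ratio test the claim spells out. The data access frequency is left out because its formula is not given. The function names are hypothetical.

```python
from typing import Sequence


def peak_memory_demand(usage: Sequence[Sequence[float]],
                       active: Sequence[Sequence[int]]) -> float:
    """Maximum over discrete time steps of the summed memory usage of active tasks.

    usage[i][t]  -- memory used by task i at time step t
    active[i][t] -- 1 if task i is active at time step t, else 0
    """
    num_steps = len(usage[0])
    return max(
        sum(mem[t] * flag[t] for mem, flag in zip(usage, active))
        for t in range(num_steps)
    )


def resource_usage_marker(current_usage: float, memory_threshold: float,
                          determination_coef: float) -> int:
    """1 (high resource usage) if current / threshold >= coefficient, else 0."""
    return 1 if current_usage / memory_threshold >= determination_coef else 0
```

Discretizing time replaces a maximum over the continuous execution interval with a maximum over sampled steps, which is how such a peak would typically be measured from profiler samples.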

3. The GPU memory optimization method as described in claim 2, characterized in that, in the step of extracting features from the data access frequency and peak memory demand in the resource usage characteristics with a convolutional neural network and classifying the extracted features to determine the task load category, the task load category is derived by a formula whose terms are: the finally determined task load category; a candidate category; the extracted comprehensive feature vector; the probability that the given features belong to the candidate category; the weight vector corresponding to each category; and the total number of categories. The probability is normalized by a denominator that sums the numerator terms over all categories.
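The variable list in claim 3 is consistent with a softmax classifier over per-category weight vectors, followed by picking the most probable candidate category. Assuming that form (the exponential numerator is an assumption; the claim only states that the denominator sums the numerators over all categories), a stdlib-only Python sketch with a hypothetical function name:

```python
import math
from typing import Sequence


def classify_task_load(features: Sequence[float],
                       weights: Sequence[Sequence[float]]) -> tuple[int, list[float]]:
    """Return (index of the most probable category, softmax probabilities).

    features -- comprehensive feature vector x
    weights  -- weights[c] is the weight vector w_c of category c
    """
    logits = [sum(w * x for w, x in zip(w_c, features)) for w_c in weights]
    shift = max(logits)  # subtract the largest logit for numerical stability
    numerators = [math.exp(z - shift) for z in logits]
    denominator = sum(numerators)  # sums the numerator terms over all categories
    probs = [n / denominator for n in numerators]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

In the claimed method the feature vector would come from the CNN's feature extractor; the returned index would then be mapped to a label such as "high-frequency changing".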

4. The GPU memory optimization method as described in claim 3, characterized in that, if the task load category is high-frequency changing, then in the step of continuously tracking the dynamic data stream with real-time monitoring tools to obtain real-time change characteristics and a dynamic behavior description of the task load, the dynamic behavior description is derived by a formula whose terms are: the dynamic behavior description value of each task; the total length of the observation period; the load variation coefficient at each time; the variability weight parameter; the response characteristic value at each time; and the mean and standard deviation of the response characteristics.
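Claim 4 lists the inputs of the dynamic behavior description but not how they combine. The Python sketch below assumes one simple combination: at each time step, the load variation coefficient plus the variability weight times the absolute z-score of the response characteristic, averaged over the observation period. The absolute value keeps the z-score term from cancelling to zero over the period. The form and the name `dynamic_behavior` are hypothetical.

```python
import statistics
from typing import Sequence


def dynamic_behavior(load_variation: Sequence[float], response: Sequence[float],
                     variability_weight: float) -> float:
    """ASSUMED form: mean over the observation period of
    load_variation[t] + variability_weight * |response[t] - mean| / std.
    """
    period = len(load_variation)
    mean = statistics.fmean(response)
    std = statistics.pstdev(response)
    # With a constant response there is no spread, so the z-score term is 0.
    z_scores = [abs(r - mean) / std if std > 0 else 0.0 for r in response]
    total = sum(v + variability_weight * z for v, z in zip(load_variation, z_scores))
    return total / period
```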

5. The GPU memory optimization method as described in claim 1, characterized in that the step of analyzing memory allocation request and release frequencies based on the real-time variation characteristics of the task load, and determining the memory pool allocation ratio and reserved space with a gradient descent-based optimization algorithm to obtain the adjusted memory pool configuration, comprises: Time series data of memory allocation requests and periodic data of the release frequency are obtained from the real-time variation characteristics of the task load. The frequency distribution of memory allocation requests is calculated with time series analysis tools, and the periodic characteristics of the release frequency are extracted with a Fourier transform, yielding the dynamic behavior pattern of memory allocation requests and release frequency. Based on this dynamic behavior pattern, a gradient descent-based optimization algorithm iteratively optimizes an objective function to determine the initial configuration parameters of the memory pool; the objective function is defined over the allocation ratio, the reserved-space proportion, the average frequency of memory allocation requests, the periodic intensity of the release frequency, and two weighting coefficients. Real-time monitoring tools continuously track changes in the memory allocation request frequency and periodic fluctuations in the release frequency to obtain real-time usage status data of the memory pool. If the memory usage rate in this data exceeds a preset threshold, the pool is marked as being in a high-load state, which determines the need to dynamically adjust the memory pool configuration. To meet this need, the memory pool allocation ratio and reserved space are adjusted by comparison against preset thresholds: if the high-load state persists longer than a preset threshold, the reserved-space proportion is increased and the allocation ratio is decreased, yielding the adjusted memory pool configuration.
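The claim 5 pipeline can be sketched in Python under several assumptions, none of which come from the claim: the periodic intensity is taken as the share of non-DC spectral energy held by the strongest bin of a discrete Fourier transform of release counts (computed naively here to stay dependency-free); the objective is assumed to be J(a, r) = w1(a - f)^2 + w2(r - p)^2, where a is the allocation ratio, r the reserved-space proportion, f the normalized mean request frequency and p the periodic intensity; and the high-load rule shifts a fixed step from allocation to reserve. All function names are hypothetical.

```python
import cmath
import math
from typing import Sequence


def periodic_intensity(release_counts: Sequence[float]) -> float:
    """ASSUMED metric: share of non-DC DFT energy held by the strongest frequency bin."""
    n = len(release_counts)
    if n < 2:
        return 0.0
    mean = sum(release_counts) / n
    centered = [x - mean for x in release_counts]
    energies = []
    for k in range(1, n // 2 + 1):  # naive O(n^2) DFT over non-negative frequencies
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        energies.append(abs(coef) ** 2)
    total = sum(energies)
    return max(energies) / total if total > 0 else 0.0


def optimize_pool(mean_request_freq: float, intensity: float, w1: float = 1.0,
                  w2: float = 1.0, lr: float = 0.1, steps: int = 200) -> tuple[float, float]:
    """Gradient descent on ASSUMED J(a, r) = w1*(a - f)^2 + w2*(r - p)^2, clamped to [0, 1]."""
    a, r = 0.5, 0.0  # initial allocation ratio and reserved-space proportion
    for _ in range(steps):
        grad_a = 2 * w1 * (a - mean_request_freq)  # dJ/da
        grad_r = 2 * w2 * (r - intensity)          # dJ/dr
        a = min(max(a - lr * grad_a, 0.0), 1.0)
        r = min(max(r - lr * grad_r, 0.0), 1.0)
    return a, r


def adjust_for_high_load(a: float, r: float, high_load_duration: float,
                         duration_threshold: float, step: float = 0.05) -> tuple[float, float]:
    """If high load lasted longer than the threshold, move `step` from allocation to reserve."""
    if high_load_duration > duration_threshold:
        shift = min(step, a)
        return a - shift, r + shift
    return a, r
```

The quadratic objective is deliberately simple so the descent loop has a known optimum (a = f, r = p) that is easy to check; a practical objective would likely couple the two ratios, for example through the total pool capacity.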

6. The GPU memory optimization method as described in claim 1, characterized in that the steps of extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for intermediate data, and compressing low-access-frequency intermediate data with the Huffman coding algorithm to obtain compressed intermediate data storage units comprise: Memory block information is extracted from the adjusted memory pool configuration: a memory management tool scans the available memory blocks in the memory pool, obtains the size and allocation status of each block, and generates a memory block information list containing the size and status of each block. Based on the memory block information list and the data access pattern, the access frequency of the intermediate data is analyzed with statistical tools, and for intermediate data whose access frequency is lower than a preset threshold, a priority sorting algorithm generates a data compression priority list. The intermediate data in the data compression priority list is compressed with the Huffman coding algorithm: a Huffman tree is constructed to generate a coding table, and the intermediate data is converted into compressed data according to the coding table, yielding a compressed data set. Based on the compressed data set and the storage space allocation requirements, a storage management tool allocates the compressed data set to compressed storage units; if the occupancy rate of the compressed storage units exceeds a preset threshold, the storage space allocation ratio is adjusted to obtain the compressed storage unit configuration.
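Huffman coding is a standard algorithm, so this step can be shown concretely. The Python sketch below builds the Huffman tree with `heapq` by repeatedly merging the two least frequent subtrees, derives the prefix-free coding table, and encodes and decodes a byte buffer. The priority rule (buffers below the threshold, least accessed first) and all function names are assumptions, since the claim does not specify the priority sorting algorithm.

```python
import heapq
from collections import Counter
from itertools import count


def build_code_table(data: bytes) -> dict[int, str]:
    """Build a Huffman coding table (byte value -> bit string) from byte frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs a 1-bit code
    tiebreak = count()  # keeps heap entries comparable without comparing dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 / 1 to their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(data: bytes, table: dict[int, str]) -> str:
    return "".join(table[b] for b in data)


def huffman_decode(bits: str, table: dict[int, str]) -> bytes:
    inverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes: the first match is the symbol
            out.append(inverse[current])
            current = ""
    return bytes(out)


def compression_priority(access_freq: dict[str, float], threshold: float) -> list[str]:
    """HYPOTHETICAL rule: buffers below the threshold, least accessed first."""
    cold = [name for name, f in access_freq.items() if f < threshold]
    return sorted(cold, key=access_freq.__getitem__)
```

For `b"aaaabbc"` the most frequent byte receives a 1-bit code and the encoded stream is 10 bits instead of 56. Bits are kept as a string for readability; a real storage unit would pack them into bytes and store the coding table alongside the payload.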

7. The GPU memory optimization method as described in claim 1, characterized in that the steps of temporarily storing a compressed intermediate data storage unit in a low-speed storage area if its access frequency is lower than a preset threshold, and reallocating memory resources according to task priority with an intelligent scheduling algorithm to obtain an optimized memory allocation scheme, comprise: Access frequency data is obtained from the compressed intermediate data storage units, and the access frequency of each storage unit is analyzed with statistical tools; units whose access frequency is lower than a preset threshold are marked as low-frequency data, yielding a low-frequency data set. Based on the low-frequency data set, a data migration tool transfers the storage units marked as low-frequency data to the low-speed storage area and generates a low-speed storage allocation record. Based on the low-speed storage allocation record and the task priority list, an intelligent scheduling algorithm reallocates memory resources, allocating the memory required by high-priority tasks first, to obtain a preliminary memory allocation scheme. For the preliminary memory allocation scheme, memory usage is monitored with memory management tools; if memory usage exceeds a preset threshold, the resource ratio of the low-speed storage area is adjusted to obtain the optimized memory allocation scheme.
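A minimal Python sketch of the claim 7 tiering and reallocation, assuming the low-speed area is modeled as a tier label on each storage unit and that the intelligent scheduling algorithm can be approximated by a greedy pass that serves tasks in descending priority until capacity runs out. The greedy policy, the `StorageUnit` type, the tier labels and the function names are hypothetical stand-ins, not the patented scheduler.

```python
from dataclasses import dataclass


@dataclass
class StorageUnit:
    name: str
    access_freq: float
    size: int
    tier: str = "gpu"  # hypothetical labels: "gpu" (fast) or "slow" (low-speed area)


def migrate_low_frequency(units: list[StorageUnit], threshold: float) -> list[str]:
    """Move units below the access-frequency threshold to the slow tier; return the record."""
    record = []
    for unit in units:
        if unit.tier == "gpu" and unit.access_freq < threshold:
            unit.tier = "slow"
            record.append(unit.name)
    return record


def free_gpu_capacity(units: list[StorageUnit], total: int) -> int:
    """GPU memory left after counting the units that stayed on the fast tier."""
    return total - sum(u.size for u in units if u.tier == "gpu")


def allocate_by_priority(requests: list[tuple[str, int, int]], capacity: int) -> dict[str, int]:
    """Greedy stand-in: serve (task, priority, bytes_needed) highest priority first."""
    plan, remaining = {}, capacity
    for task, _priority, need in sorted(requests, key=lambda req: req[1], reverse=True):
        granted = min(need, remaining)
        plan[task] = granted
        remaining -= granted
    return plan
```

The final step of the claim, adjusting the low-speed area's resource ratio when usage exceeds a threshold, could be approximated by re-running `migrate_low_frequency` with a higher threshold; that choice is also an assumption.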

8. A GPU memory optimization system for performing the GPU memory optimization method as described in any one of claims 1 to 7, characterized in that the GPU memory optimization system comprises: a real-time variation characteristic acquisition module (10), configured to acquire dynamic data stream information and intermediate data generation patterns of the task load, extract the peak memory demand and data access frequency from the task execution sequence, and classify the task load pattern with a preset convolutional neural network model to obtain the real-time variation characteristics of the task load; a memory pool configuration acquisition module (20), configured to analyze memory allocation request and release frequencies based on the real-time variation characteristics of the task load and to determine the allocation ratio and reserved space of the memory pool with a gradient descent-based optimization algorithm, obtaining the adjusted memory pool configuration; a data storage unit acquisition module (30), configured to extract available memory block information from the adjusted memory pool configuration, generate data compression priorities for intermediate data, and compress low-access-frequency intermediate data with the Huffman coding algorithm to obtain compressed intermediate data storage units; and a memory allocation scheme acquisition module (40), configured to temporarily store a compressed intermediate data storage unit in the low-speed storage area if its access frequency is lower than the preset threshold, and then reallocate memory resources according to task priority with the intelligent scheduling algorithm to obtain an optimized memory allocation scheme.