Load balancing adaptive scheduling method for accelerating edge real-time video analytics
By adopting an edge-coordinated adaptive scheduling framework that combines a lightweight regression model with model predictive control, the method addresses the adaptability and load balancing problems of edge real-time video analytics systems under diverse inference modes. It balances analysis accuracy against real-time performance and improves the system's adaptability and scalability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- FUZHOU UNIV
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-12
AI Technical Summary
Existing edge real-time video analytics systems struggle to adapt to the dynamic changes in the number of targets in video scenes when faced with diverse inference modes. They cannot balance accuracy and real-time performance. RoI task scheduling decisions are often delayed, which can lead to device overload. Edge cluster load balancing scheduling has low prediction accuracy and high computational overhead. They also have poor scene adaptability and weak system scalability.
We construct an end-to-end adaptive scheduling framework that coordinates the endpoints (cameras) and the edge. Candidate RoI features are extracted through background subtraction and fed to a lightweight regression model that predicts the better inference mode; a model predictive control framework schedules RoI tasks; and edge cluster load is predicted with a probabilistic sparse self-attention mechanism, achieving efficient collaborative scheduling.
It significantly improves detection accuracy and service level target satisfaction rate, has low decision-making overhead, offers good compatibility with heterogeneous devices and good system scalability, and suits the needs of various real-time video analysis scenarios such as smart cities and security monitoring.
Figure CN122200296A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of edge computing, intelligent video analytics and computer task scheduling technology, and specifically relates to a load balancing adaptive scheduling method for accelerating real-time edge video analytics.
Background Technology
[0002] With the rapid development of artificial intelligence and communication technologies, real-time video analytics has gradually become an important foundation for many scenarios such as smart cities, security monitoring, and traffic management. Leveraging various mature deep neural network (DNN) models, video analytics systems can accurately acquire and process diverse visual information, providing strong support for automated decision-making.
[0003] However, the inherently computationally intensive nature of DNN inference makes it difficult for the lightweight processors built into commercial cameras to meet the high real-time and accuracy demands of video analytics systems at the same time. Existing solutions to this problem mainly fall into two categories: one uses compressed, lightweight models for inference, which significantly reduces detection accuracy; the other uploads the video stream to a remote cloud and relies on its powerful computing capabilities for analysis. The latter approach suffers from high bandwidth consumption and privacy risks due to long-distance transmission, and these shortcomings will be further exacerbated as the number of deployed cameras and the video capture resolution continue to increase.
[0004] To address these issues, the industry has begun deploying video analytics systems at the network edge. This deployment approach relieves the computational burden on end devices and reduces cloud bandwidth consumption, while significantly improving the response speed of video analytics. However, because edge nodes have limited resources, finer-grained model inference optimization and resource scheduling strategies are needed to achieve high-precision, low-latency real-time video analytics. Current research in this area has focused mainly on two inference optimization modes. One is temporal filtering, which uses pre-analysis modules or lightweight models to skip low-value video frames and reuse the inference results of the previous frame, reducing unnecessary computational overhead. The other is spatial filtering, which uses lightweight background modelers or fast detectors to identify Regions of Interest (RoIs) containing potential targets or events in video frames and runs model inference on the RoIs instead of the entire frame, thus reducing the input data size. These two modes filter out video content irrelevant to analytical accuracy in the temporal and spatial dimensions, respectively, effectively improving resource utilization and reducing system response latency.
[0005] To clarify the performance boundaries of the two inference modes in real-world video analysis scenarios, the latency and accuracy of the two typical inference modes were experimentally verified based on typical video sequences from the UA-DETRAC dataset. The two modes are as follows:
[0006] (1) Frame-based Inference (FI): Directly input video frames into the YOLOv8S model to perform object detection;
[0007] (2) RoI-based Inference (RI): RoIs are extracted from video frames by background removal algorithm based on Gaussian Mixture Model (GMM), and the RoIs are input into the YOLOv8S model to perform target detection.
[0008] The test results are shown in Figure 1, and the core findings are as follows:
[0009] (1) The number of targets in the video sequence is dynamic. The inference delay of RI mode fluctuates significantly with the number of targets in different video frames, while the inference delay of FI mode remains stable.
[0010] (2) When the number of targets in the video frame is small, the inference delay of RI mode is lower than that of FI mode, and the inference accuracy is close to that of FI mode; when the number of targets in the video frame is large, the inference delay of RI mode is higher than that of FI mode, and the inference accuracy is significantly lower than that of FI mode.
[0011] Based on the above test results, it is clear that both existing inference modes have significant limitations in scene adaptability. The RI mode is only suitable for video scenes with sparse targets, and can reduce inference overhead while maintaining detection accuracy. In scenes with dense targets, iterative inference of multiple RoIs leads to a surge in latency and a decrease in accuracy. The FI mode can complete the analysis of the entire frame at once, making it more suitable for high target density scenes, but it will generate unnecessary inference overhead and hardware costs in scenes with sparse targets.
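The crossover behaviour described above can be made concrete with a deliberately simplified latency model. Assume, purely for illustration (the symbols below are hypothetical and are not fitted to the Figure 1 measurements), that RI latency grows linearly with the number of RoIs n, with a fixed per-frame overhead t_0 for background subtraction and a per-RoI inference time t_r, while FI latency T_FI is roughly constant for a given model and input size:

```latex
T_{\mathrm{RI}}(n) \approx t_0 + n\,t_r, \qquad
T_{\mathrm{RI}}(n) < T_{\mathrm{FI}}
\iff n < n^{*} = \frac{T_{\mathrm{FI}} - t_0}{t_r}
```

Under this model, RI is faster only below the break-even count n*, and n* itself shifts with the DNN model, the hardware, and RoI sizes, which is one reason a single fixed threshold transfers poorly across deployments.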
[0012] Combining full-frame inference and RoI inference modes, and leveraging the local computing power of cameras and edge servers, it is expected to adapt to dynamic changes in video content, reducing system latency and deployment costs while ensuring detection accuracy. However, in practical implementation, existing edge real-time video analytics systems still face three core technical challenges:
[0013] 1. It is difficult to accurately characterize the performance boundaries of different inference modes, and lightweight real-time switching mechanisms are lacking. The latency of the whole-frame inference mode is positively correlated with the complexity of the DNN model and the size of the input frame, so its performance is highly predictable. By contrast, the latency of the RoI inference mode depends on the dynamically changing RoI queue length and RoI sizes, and its accuracy depends on background complexity and target size, making accurate prediction difficult. Because edge nodes have limited resources, it is infeasible to actually run both inference modes on every video frame and compare their performance. A lightweight prediction mechanism that accounts for the performance of both inference modes is therefore needed to support real-time mode switching. Existing solutions mostly use threshold-based or feedback control methods for mode switching. These methods depend on specific deployment environments, cannot perceive the content features of video frames, and struggle to capture the dynamic differences in accuracy and latency between the two inference modes, resulting in poor scene adaptability.
[0014] 2. Difficulty in efficiently scheduling diverse task sequences from heterogeneous cameras. RoI and video frame task sequences generated by heterogeneous cameras must be scheduled, according to their inference mode, to suitable devices for inference execution to ensure the real-time performance of video analysis. This requires accurately assessing the resource requirements of different RoIs and video frames, as well as the computing power and network bandwidth status of heterogeneous devices. At the same time, the impact of scheduling decisions on system performance in both the current and future time slots must be considered to avoid overloading some devices. Existing solutions mostly employ scheduling strategies based on Deep Reinforcement Learning (DRL). While these strategies have some decision-making capability in dynamic environments, the high-dimensional state space and strong temporal dependencies of RoI scheduling tasks easily lead to convergence difficulties and deviations from the optimal solution. Furthermore, these solutions rely on offline training and require retraining when the number of devices in the system changes, resulting in poor scalability and high overhead.
[0015] 3. Difficulty in achieving fine-grained load balancing optimization for edge clusters. Video frames and RoI inference tasks offloaded to edge clusters must be further scheduled, at a fine granularity, to suitable server nodes to achieve load balancing. Edge clusters typically run inference services for multiple video streams simultaneously, and the real-time performance of nodes fluctuates significantly with load changes. It is therefore necessary to accurately perceive the load change patterns of each node within the cluster, efficiently schedule task sequences as they arrive in real time, and control the time overhead of the scheduling strategy itself so that it does not affect the real-time performance of video analytics. Existing solutions mostly use machine learning-based time-series prediction models to estimate system load and task traffic to assist load balancing scheduling. However, these models struggle to capture the long-term dependencies of load sequences, cannot accurately quantify the performance gap between actual and optimal scheduling decisions, and their operating overhead increases significantly with system scale, severely impacting the real-time performance of video analytics.
Summary of the Invention
[0016] To address the shortcomings and deficiencies of existing technologies, this invention provides a load adaptive scheduling method and corresponding system for accelerating real-time edge video analysis with diverse inference modes. It aims to solve the core problems in existing edge real-time video analysis solutions, such as the difficulty of adapting a single inference mode to the dynamic changes in the number of video scene targets, the inability to balance accuracy and real-time performance, the lag in RoI task scheduling decisions which can easily lead to device overload, the low prediction accuracy of edge cluster load balancing scheduling, the large computational overhead, the difficulty in approximating the optimal solution, and the poor overall scenario adaptability and weak system scalability.
[0017] The core innovation of this invention lies in constructing an end-to-end adaptive scheduling framework. First, it extracts a set of candidate Regions of Interest (RoIs) from video frames through background subtraction and, at the same time, extracts multi-dimensional RoI features that characterize the scene complexity of each video frame. Based on a pre-trained lightweight regression model, it predicts the combined performance, in terms of inference accuracy and service level target satisfaction rate, of both the whole-frame and RoI inference modes, and adaptively selects the better inference mode for the current frame. Feature extraction and mode decision-making are completed entirely on the camera, keeping decision-making lightweight and low-overhead while ensuring detection accuracy. For task sequences under the RoI inference mode, this invention introduces a model predictive control framework that predicts the task load of future time slots from historical RoI sequences. It constructs a scheduling model with task completion rate and waiting time as the core optimization objectives and, taking into account the real-time resource status of heterogeneous devices, solves for the optimal allocation decision through rolling optimization and feedback correction. This achieves efficient collaborative scheduling of RoI tasks among local cameras, collaborative cameras, and edge server clusters, effectively avoiding scheduling decision lag and device overload. For whole-frame and RoI inference tasks offloaded to the edge cluster, this invention constructs a time-series prediction model based on a probabilistic sparse self-attention mechanism to accurately capture the load fluctuation pattern of the edge servers, and builds an optimization model whose goal is to minimize the cluster load imbalance. Through variable relaxation and randomized rounding, a feasible scheduling strategy approximating the relaxed optimum is obtained, which significantly reduces scheduling computation overhead while achieving balanced and efficient utilization of edge cluster computing resources.
[0018] Real-world testing has verified that this solution significantly improves detection accuracy and service level target satisfaction compared to existing mainstream edge video analysis and scheduling methods. It also features extremely low decision-making overhead, good adaptability to heterogeneous devices, and system scalability, enabling it to stably adapt to the needs of various real-time video analysis scenarios such as smart cities, security monitoring, and traffic management.
[0019] The specific technical solution adopted by this invention to solve its technical problem is as follows:
[0020] A load-balancing adaptive scheduling method for accelerating real-time video analytics at the edge includes:
[0021] For the current video frame captured by the camera, foreground motion region extraction is performed by background subtraction to generate a candidate RoI set, and the corresponding feature vectors used to characterize the scene complexity of the video frame are extracted.
[0022] Based on the feature vectors, the inference accuracy and service level target satisfaction rate of the whole frame inference mode and the RoI inference mode are predicted and evaluated by a pre-trained performance prediction model, and the inference mode with the best overall performance is adaptively selected. If the whole frame inference mode is selected, the current video frame is offloaded to the edge cluster as an inference task. If the RoI inference mode is selected, the current set of candidate RoIs is used as the RoI task sequence to be scheduled.
[0023] For the RoI task sequence to be scheduled, based on the model predictive control framework, the RoI tasks for a preset number of future time slots are predicted from the historical RoI task sequence. Combined with the real-time resource status of each device, a rolling optimization mechanism is used to construct and solve the scheduling optimization problem, generating the allocation decision of the RoI tasks to be scheduled among the local camera, the collaborative camera, and the edge cluster, and completing task distribution.
[0024] For all inference tasks offloaded to the edge cluster, based on the time-series prediction results of the load of each edge server in the cluster, with the optimization objective of minimizing the cluster load imbalance, a load balancing allocation decision is generated among the servers in the cluster to distribute the inference tasks to the corresponding servers for inference execution.
[0025] Furthermore, the feature vector used to characterize the scene complexity of the video frame includes 5-dimensional RoI features, namely the number of candidate RoIs, the ratio of the total area covered by all foreground motion regions to the area of the entire frame, the inverse value of the average circularity of all dynamic contours, the average convexity of all foreground contours, and the ratio of the area of the largest single foreground contour to the total area of all foreground contours.
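A minimal, dependency-free sketch of the five features follows. It assumes each foreground contour is a polygon given as (x, y) vertices (for example, the point lists returned by OpenCV's `findContours`) and that each contour counts as one candidate RoI. The exact formulas are not specified here, so the code uses common definitions as stated assumptions: circularity 4πA/P², and convexity as convex-hull perimeter divided by contour perimeter (1.0 for convex shapes, below 1.0 for concave ones). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(pts: Sequence[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(pts: Sequence[Point]) -> float:
    """Length of the closed polyline through pts."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def convex_hull(pts: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    p = sorted(set(pts))
    if len(p) <= 2:
        return p

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for q in p:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(p):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]


def roi_features(contours: List[List[Point]], frame_w: int, frame_h: int) -> List[float]:
    """Hypothetical 5-D scene-complexity vector (definitions assumed, see text)."""
    if not contours:
        return [0.0] * 5
    areas = [polygon_area(c) for c in contours]
    perims = [polygon_perimeter(c) for c in contours]
    total = sum(areas)
    # Circularity 4*pi*A/P^2 equals 1.0 for a circle and is smaller otherwise.
    circ = [4 * math.pi * a / (p * p) if p > 0 else 0.0 for a, p in zip(areas, perims)]
    # Convexity as hull perimeter / contour perimeter (assumed definition).
    conv = [polygon_perimeter(convex_hull(c)) / p if p > 0 else 0.0
            for c, p in zip(contours, perims)]
    mean_circ = sum(circ) / len(circ)
    return [
        float(len(contours)),                       # number of candidate RoIs
        total / (frame_w * frame_h),                # foreground coverage ratio
        1.0 / mean_circ if mean_circ > 0 else 0.0,  # inverse mean circularity
        sum(conv) / len(conv),                      # mean convexity
        max(areas) / total if total > 0 else 0.0,   # largest contour's area share
    ]
```

For two axis-aligned squares of side 10 and 20 in a 100 x 100 frame, this gives a coverage ratio of 0.05, an inverse circularity of 4/π, a convexity of 1.0, and a largest-contour share of 0.8.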
[0026] Furthermore, when predicting and evaluating the overall performance of the two inference modes, the weighted sum of inference accuracy and service level target satisfaction rate is used as the overall performance evaluation index, and the inference mode with the higher weighted sum is selected as the inference mode with the best overall performance; the performance prediction model is a pre-trained lightweight stochastic gradient descent regressor, and the feature vector extraction and inference mode selection processes are both completed locally on the camera that acquires the video frames.
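The sketch below illustrates the decision rule under stated assumptions: one hypothetical linear regressor per (mode, metric) pair, each trained with plain stochastic gradient descent on squared error, and a weighting coefficient `alpha` (an assumed name for the trade-off weight) that combines predicted accuracy and predicted service level target satisfaction rate. It is a pure-Python illustration, not the model or training setup actually used by the invention.

```python
import random
from typing import Dict, List, Sequence


class SGDLinearRegressor:
    """Linear model y = w.x + b trained with plain SGD on squared error."""

    def __init__(self, n_features: int, lr: float = 0.01, epochs: int = 200, seed: int = 0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs
        self.rng = random.Random(seed)  # fixed seed gives reproducible shuffling

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "SGDLinearRegressor":
        order = list(range(len(X)))
        for _ in range(self.epochs):
            self.rng.shuffle(order)
            for i in order:
                err = self.predict(X[i]) - y[i]  # derivative of 0.5 * err^2
                for j, xj in enumerate(X[i]):
                    self.w[j] -= self.lr * err * xj
                self.b -= self.lr * err
        return self


def select_mode(x: Sequence[float],
                models: Dict[str, Dict[str, SGDLinearRegressor]],
                alpha: float = 0.5) -> str:
    """Return the mode with the largest alpha*accuracy + (1-alpha)*SLO-rate score."""
    scores = {
        mode: alpha * m["acc"].predict(x) + (1 - alpha) * m["slo"].predict(x)
        for mode, m in models.items()
    }
    return max(scores, key=scores.get)
```

In deployment, `x` would be the five-feature vector of the current frame. Because each prediction is only a handful of multiply-adds, the rule is cheap enough to run on the camera, which is the property the text relies on.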
[0027] Furthermore, the scheduling optimization problem based on the model predictive control framework negatively weights the task scheduling decision with the task waiting time as the weighting factor, and takes minimizing the total scheduling cost after weighting as the optimization objective, so that RoI tasks with longer waiting times get higher scheduling priority.
[0028] Furthermore, the scheduling optimization problem includes two types of constraints: the first type is the uniqueness constraint: each RoI task is assigned to at most one device for execution; the second type is the device computing power constraint: the total latency of a single device processing all assigned tasks in a single time slot does not exceed the time slot length.
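For a single time slot, the objective and the two constraint types can be checked directly by exhaustive search on tiny instances. This is only a sketch of the formulation: the method itself solves a multi-slot mixed-integer program with a solver, whereas this code enumerates every assignment, which grows exponentially and is useful only for illustration. It assumes a per-task weight of 1 + waiting time (so that completing a task is rewarded even at zero wait) and a hypothetical per-device latency profile `lat[i][d]`; `solve_slot` is a hypothetical name.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def solve_slot(wait: Sequence[float],
               lat: Sequence[Sequence[float]],
               slot_len: float) -> Tuple[List[Optional[int]], float]:
    """Exact single-slot RoI assignment by exhaustive search (tiny inputs only).

    wait[i]   -- waiting time of RoI task i, in slots
    lat[i][d] -- hypothetical latency of task i on device d
    Returns (assignment, cost); assignment[i] is a device index or None (stay queued).
    """
    n = len(wait)
    n_dev = len(lat[0]) if lat else 0
    best: List[Optional[int]] = [None] * n
    best_cost = 0.0
    # Each task picks exactly one option, so "at most one device" holds by construction.
    for choice in product([None, *range(n_dev)], repeat=n):
        load = [0.0] * n_dev
        for i, d in enumerate(choice):
            if d is not None:
                load[d] += lat[i][d]
        if any(used > slot_len for used in load):  # device computing-power constraint
            continue
        # Negatively weighted objective; the weight 1 + wait is an assumption.
        cost = -sum(1.0 + wait[i] for i, d in enumerate(choice) if d is not None)
        if cost < best_cost:
            best, best_cost = list(choice), cost
    return best, best_cost
```

With waiting times [0, 3, 1] and two devices that can each fit only one task per slot, the search schedules the two longest-waiting tasks and leaves task 0 (`None`) queued for the next slot.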
[0029] Furthermore, the solution and execution of the scheduling optimization problem adopts a rolling optimization and feedback correction mechanism. Specifically, the scheduling optimization problem is reconstructed based on the latest task queue and device load status for each time slot, and only the allocation decision of the current time slot is executed. The load status of the target device is updated for the scheduled tasks, and the waiting time of the unscheduled tasks is accumulated and added to the RoI task sequence to be scheduled in the next time slot.
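The rolling optimization and feedback correction loop can be sketched as follows. To keep the example short and self-contained, the per-slot optimization is replaced by a greedy heuristic (longest-waiting task first, placed on the device that would finish it earliest without exceeding the slot length); this greedy rule is an illustrative stand-in, not the solver used by the method. `RoITask`, `schedule_slot`, and `rolling_schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class RoITask:
    tid: int
    latency: Sequence[float]  # hypothetical per-device latency profile
    wait: int = 0             # slots spent waiting so far


def schedule_slot(queue: List[RoITask], n_dev: int, slot_len: float) -> Dict[int, int]:
    """Greedy stand-in for the per-slot optimization problem."""
    load = [0.0] * n_dev
    placed: Dict[int, int] = {}
    # Longest-waiting tasks first (the stable sort keeps arrival order on ties).
    for t in sorted(queue, key=lambda task: -task.wait):
        fits = [d for d in range(n_dev) if load[d] + t.latency[d] <= slot_len]
        if fits:
            d = min(fits, key=lambda dev: load[dev] + t.latency[dev])
            load[d] += t.latency[d]
            placed[t.tid] = d
    return placed


def rolling_schedule(arrivals: List[List[RoITask]], n_dev: int,
                     slot_len: float) -> Tuple[List[Dict[int, int]], List[RoITask]]:
    """Re-plan every slot, execute only that slot, carry the rest forward."""
    queue: List[RoITask] = []
    history: List[Dict[int, int]] = []
    for new_tasks in arrivals:
        queue = queue + new_tasks                 # rebuild from the latest task queue
        placed = schedule_slot(queue, n_dev, slot_len)
        history.append(placed)                    # only this slot's decision is executed
        queue = [t for t in queue if t.tid not in placed]
        for t in queue:                           # feedback: age the unscheduled tasks
            t.wait += 1
    return history, queue
```

Leftover tasks re-enter the next slot's problem with a larger waiting time, which raises their priority in the following round.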
[0030] Furthermore, the time-series prediction of the edge server load is achieved through a Transformer time-series model based on a probabilistic sparse self-attention mechanism and a self-attention distilling structure. The model performs feature extraction through one-dimensional convolution, an activation function, and max pooling, reducing the computational complexity of load prediction from quadratic in the input sequence length L, O(L^2), to O(L log L).
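The probabilistic sparse self-attention mentioned here appears to correspond to the ProbSparse attention popularized by the Informer model. In that scheme, each query is scored by a max-minus-mean sparsity measurement over its scaled attention logits, only the top u ≈ c·ln L queries receive full attention, and the remaining "lazy" queries output the mean of the values; distilling layers then roughly halve the sequence with stride-2 max pooling. The pure-Python sketch below shows those two pieces under stated assumptions: it computes the sparsity score exactly for clarity (the O(L log L) variant samples about ln L keys per query instead), omits the convolution and activation of the distilling block, and uses hypothetical function names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparsity(q: Sequence[float], K: Matrix) -> float:
    """Max-minus-mean of the scaled logits q.k / sqrt(d) over all keys."""
    d = len(q)
    s = [_dot(q, k) / math.sqrt(d) for k in K]
    return max(s) - sum(s) / len(s)


def probsparse_attention(Q: Matrix, K: Matrix, V: Matrix,
                         c: float = 1.0) -> Tuple[Matrix, List[int]]:
    """Full attention for the top-u queries; lazy queries get mean(V)."""
    L = len(Q)
    u = min(L, max(1, math.ceil(c * math.log(L))))
    scores = [sparsity(q, K) for q in Q]
    active = sorted(range(L), key=lambda i: scores[i], reverse=True)[:u]
    mean_v = [sum(col) / len(V) for col in zip(*V)]
    out = [list(mean_v) for _ in range(L)]
    for i in active:
        logits = [_dot(Q[i], k) / math.sqrt(len(Q[i])) for k in K]
        m = max(logits)                       # subtract the max for numerical stability
        w = [math.exp(x - m) for x in logits]
        z = sum(w)
        out[i] = [sum(wk * v[j] for wk, v in zip(w, V)) / z for j in range(len(V[0]))]
    return out, active


def distill_maxpool(X: Matrix) -> Matrix:
    """MaxPool1d over time with kernel 3, stride 2, padding 1: L -> ceil(L/2)."""
    return [[max(col) for col in zip(*X[max(0, t - 1):t + 2])]
            for t in range(0, len(X), 2)]
```

In a load-prediction setting, each row of `Q`, `K`, and `V` would be an embedded time step of a server's load history.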
[0031] Furthermore, when generating load balancing allocation decisions, the binary scheduling decision variables are first relaxed into continuous variables within a continuous interval, a convex optimization problem is constructed and solved to obtain the theoretically optimal allocation result, and then the continuous variables are converted into binary scheduling decision variables through a random rounding algorithm to generate an executable allocation decision; the load imbalance degree of the edge cluster is defined as: max(the ratio of the maximum server load to the average load minus 1, 0).
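The imbalance metric and the rounding step can be sketched as below. The convex solve itself is omitted: `x_frac` stands for the relaxed assignment a convex solver would return (row i holds the fractional share of task i on each server). Rounding samples one server per task with those probabilities, a common form of randomized rounding that keeps each task assigned to exactly one server. Function names are hypothetical.

```python
import random
from typing import List, Sequence


def imbalance(loads: Sequence[float]) -> float:
    """max(max_load / mean_load - 1, 0), following the definition in the text."""
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 0.0
    return max(max(loads) / mean - 1.0, 0.0)


def randomized_rounding(x_frac: List[List[float]], seed: int = 0) -> List[int]:
    """Sample one server per task with probability proportional to x_frac[i][s]."""
    rng = random.Random(seed)
    # Zero-weight servers are never drawn, so the relaxed support is respected.
    return [rng.choices(range(len(row)), weights=row, k=1)[0] for row in x_frac]


def server_loads(picks: Sequence[int], cost: Sequence[float], n_servers: int) -> List[float]:
    """Total cost assigned to each server after rounding."""
    loads = [0.0] * n_servers
    for task, s in enumerate(picks):
        loads[s] += cost[task]
    return loads
```

For example, loads of [4, 1, 1] give an imbalance of 4 / 2 - 1 = 1.0, while perfectly even loads give 0.0.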
[0032] Furthermore, a load-balancing adaptive scheduling system for accelerating real-time video analytics at the edge is provided, comprising multiple cameras and an edge server cluster, the system being configured to perform the method described above.
[0033] Furthermore, the camera is configured to perform foreground motion region extraction, feature vector extraction, and inference mode selection, and the edge server cluster is configured to perform load time-series prediction and load-balanced scheduling of inference tasks; through the collaboration of the local camera, the collaborative camera, and the edge server cluster, the system enables inference tasks to meet preset inference latency constraints and cluster load balancing requirements.
[0034] A computer device is further provided, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the method described above.
[0035] A non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described above.
[0036] Compared with existing technologies, this invention and its preferred solution abandon mode selection mechanisms that either run both inference modes in parallel to evaluate performance or rely on fixed thresholds. Through a lightweight regression model deployed on the camera, it efficiently predicts and selects the inference mode with better overall performance based solely on region-of-interest features, significantly reducing the computational overhead and response latency of mode decision-making. At the task scheduling level, a model predictive control framework is introduced to achieve rolling optimization and feedback correction. Combined with historical task sequences, it dynamically predicts future load trends, giving the task allocation strategy forward-looking and adaptive capabilities and effectively improving the system's responsiveness to sudden changes in video content and load fluctuations. Edge cluster resource scheduling employs a computationally efficient time-series prediction model and a convex relaxation solution strategy. While strictly satisfying inference latency constraints, it significantly improves load balance among servers and avoids local resource overload or idleness. Crucially, this invention integrates mode selection, device-level collaborative scheduling, and edge-level load balancing scheduling into a three-level linked optimization system. The three levels work together to improve inference accuracy, service level target satisfaction rate, and resource utilization efficiency in complex dynamic scenarios, providing a technical solution for edge real-time video analytics that combines high adaptability, high reliability, and high resource efficiency.
Attached Figure Description
[0037] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments:
[0038] Figure 1 compares the latency and accuracy of different inference modes on different video frames: (a) compares the inference latency of the inference modes on different video frames, and (b) compares their inference accuracy.
[0039] Figure 2 is an overview diagram of the architecture of the real-time edge video analytics system according to an embodiment of the present invention;
[0040] Figure 3 is an overview diagram of the architecture of the AdaSch load-adaptive scheduling framework according to an embodiment of the present invention;
[0041] Figure 4 is an architecture diagram of the real-world AdaSch testbed according to an embodiment of the present invention;
[0042] Figure 5 compares the performance of different methods with inference models of different sizes according to embodiments of the present invention: (a) compares the inference latency of different methods under different YOLOv8 series models, and (b) compares their detection accuracy under the same models.
[0043] Figure 6 compares the performance of different methods with the FastRCNN inference model in embodiments of the present invention: (a) compares the service level target satisfaction rate (SAR) of different methods under the FastRCNN model, and (b) compares their F1-score detection accuracy under the FastRCNN model.
[0044] Figure 7 compares the performance of different methods on the MOT dataset in embodiments of the present invention: (a) compares the service level target satisfaction rate (SAR) of different methods on the MOT15 dataset, and (b) compares their F1-score detection accuracy on the MOT15 dataset.
[0045] Figure 8 shows the ablation results for the core components of AdaSch in an embodiment of the present invention: (a) compares the inference latency of the full AdaSch with variants that each remove a different core component, and (b) compares their detection accuracy.
[0046] Figure 9 shows the sensitivity of AdaSch to different parameters in an embodiment of the present invention: (a) shows the effect of the accuracy-latency coefficient on the detection accuracy and inference latency of AdaSch, and (b) shows the effect of the maximum tolerable latency on its detection accuracy and inference latency.
[0047] Figure 10 compares the performance of AdaSch at different system scales according to embodiments of the present invention: (a) compares system inference latency under different combinations of camera and GPU counts, and (b) compares system detection accuracy under the same combinations.
Detailed Implementation
[0048] To make the features and advantages of the present invention more apparent and understandable, specific embodiments are described below in detail:
[0049] It should be noted that the following detailed descriptions are exemplary and intended to provide further explanation of this application. Unless otherwise specified, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains.
[0050] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this application. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.
[0051] To address the three core technical challenges faced by real-time edge video analytics systems, this invention proposes AdaSch, a load-adaptive scheduling framework for accelerating real-time edge video analytics with diverse inference modes. This framework dynamically selects appropriate inference modes to perform analysis within the real-time video stream. By combining Model Predictive Control (MPC) with a load balancing awareness mechanism, it achieves efficient scheduling of RoI sequences and video frames, ultimately balancing the accuracy and real-time performance of video analytics across different video scenarios, while simultaneously improving system resource utilization efficiency and load balancing levels.
[0052] The core design of this scheme mainly consists of three parts:
[0053] First, an adaptive inference mode selection mechanism based on RoI features. This mechanism first extracts the motion regions of video frames with a background subtractor and generates a candidate RoI set. It then captures five core RoI features of each video frame through a lightweight feature extractor and, combined with a pre-trained lightweight stochastic gradient descent regression model, evaluates the accuracy and latency performance of both whole-frame and RoI inference modes in real time. It can thus select the optimal inference mode for the current video frame without actually running both modes. The decision-making process is completed locally on the camera without additional transmission overhead, which suits the limited computing resources at the edge.
[0054] Second, an MPC-based RoI collaborative scheduling method. This method analyzes trends in historical RoI sequences to predict RoI generation in future time slots, producing multi-slot RoI task sequences. With the number of completed tasks and task waiting time as optimization objectives, a mixed-integer programming model with resource and latency constraints is constructed, and a linear programming solver is used to obtain the optimal allocation of RoI tasks among local cameras, collaborative cameras, and edge servers. Meanwhile, rolling optimization and feedback correction update device load and task queue status in real time, adapting to the dynamically changing system environment and avoiding task queuing and timeouts caused by decision lag.
[0055] Third, a load-balance-aware edge inference task scheduling strategy. For inference tasks offloaded to the edge cluster, this strategy first uses a Transformer model based on a probabilistic sparse self-attention mechanism to accurately predict load fluctuations on the edge servers, while reducing the computational complexity from O(L^2) to O(L log L), where L is the input sequence length, thereby balancing prediction accuracy and operational efficiency. Then, with the goal of minimizing the cluster load imbalance index, a convex optimization model is constructed. The theoretically optimal allocation is obtained through variable relaxation and a convex optimization solver. Finally, an executable scheduling decision is generated through a random rounding algorithm to achieve efficient and balanced utilization of edge cluster resources.
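The method does not specify how future RoI counts are extrapolated from the historical sequence. As one possible stand-in, Holt's linear (double exponential) smoothing tracks both the current level and the trend of per-slot RoI counts. The sketch below forecasts the next `horizon` slots, which would correspond to the MPC prediction window; the smoothing factors `alpha` and `beta` and the function name are assumptions.

```python
from typing import List, Sequence


def holt_forecast(history: Sequence[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Forecast per-slot RoI counts for the next `horizon` slots (Holt's method)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if not history:
        return [0.0] * horizon
    if len(history) == 1:
        return [max(0.0, float(history[0]))] * horizon
    level = float(history[0])
    trend = float(history[1] - history[0])
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)         # smoothed level
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smoothed slope
    # RoI counts cannot be negative, so clamp the extrapolation at zero.
    return [max(0.0, level + h * trend) for h in range(1, horizon + 1)]
```

A steadily growing history such as [2, 4, 6, 8] extrapolates to roughly [10, 12, 14], while a falling history is clamped at zero rather than predicting negative RoI counts.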
[0056] This invention defines a video analytics pipeline as an overall process encompassing multiple stages, including data acquisition, frame and RoI extraction, inference processing, and result feedback. For example... Figure 2 As shown, the proposed real-time edge video analytics system comprises multiple surveillance cameras and an edge cluster consisting of multiple servers. Specifically, the set of cameras is denoted as... The set of edge servers is denoted as ,in and These represent the number of cameras and edge servers, respectively. The system operates in discrete time slots, denoted as... In each time slot, the camera processes either the video frames it captures or inference requests from other cameras. Depending on the scene type of the video frames and available cluster resources, two inference modes can be selected.
[0057] a) Frame-based video analysis. In this mode, video frames are directly input into the DNN model for inference to obtain analysis results. This mode offers high inference accuracy and stable inference latency when dealing with complex multi-object scenes. However, this inference mode is highly complex, consuming significant bandwidth and computational resources, and therefore typically requires reliance on edge clusters for inference. Furthermore, in simple scenarios with sparse targets, this analysis mode based on complete video frames and edge clusters increases unnecessary inference overhead and hardware costs.
[0058] b) RoI-based video analytics. Unlike frame-based video analytics, RoI-based video analytics first runs a background subtraction tool locally to extract the dynamically changing regions of the scene by analyzing changes between consecutive frames. Examples are OpenCV's subtractors based on the Gaussian Mixture Model (GMM) and K-Nearest Neighbors (KNN). These regions are the RoIs needed for subsequent inference. DNN inference is then performed on the extracted RoIs to complete the video analysis. Specifically, when there are few RoIs or their area is small, inference can be performed directly on the camera or forwarded to other cameras for collaborative inference. Collaborating cameras can quickly process some RoIs using idle resources and high-speed LAN bandwidth, which improves load balancing and reduces system cost. When local processing power is insufficient for complex inference requirements, RoIs are offloaded to edge servers to reduce inference latency. Edge servers have richer computing power and can process data from multiple cameras simultaneously, which raises the performance ceiling of the video analytics system.
[0059] The proposed system flexibly selects the appropriate inference mode and execution device based on the accuracy and real-time requirements of the video scene and the available cluster resources. This design enables efficient collaboration between cameras and edge servers. It improves load balancing while preserving inference accuracy, thereby reducing system cost.
[0060] As shown in Figure 3, the proposed AdaSch consists of three components: RoI feature-based inference mode selection, MPC-based RoI scheduling, and load-balance-aware edge inference task scheduling.
[0061] RoI feature-based inference mode selection. First, the camera feeds the real-time video stream to a background subtractor, which extracts motion regions and generates a candidate RoI set. Then, the camera uses a lightweight feature extractor to quickly capture key features of the video frames. At the same time, an accuracy prediction model dynamically evaluates accuracy and latency under both the full-frame inference mode and the RoI mode. Based on this evaluation, AdaSch decides in real time, for the current time slot, between two options. The video frame can be transmitted directly to an edge server for full-frame inference. Alternatively, RoI inference can be performed locally or on a collaborating camera. This selects the optimal inference mode.
[0062] RoI scheduling based on MPC. First, by analyzing the changing trends of RoIs in historical video frames, the generation of RoIs in future time slots is predicted. Next, rolling simulation is used to generate multiple RoI task sequences for future time slots. Then, the long-term RoI scheduling problem is formally defined with the number of tasks completed and their waiting time as optimization objectives. Furthermore, a linear programming solver is employed to obtain the long-term optimal scheduling decision, achieving efficient allocation of RoI tasks among cameras, collaborative cameras, and edge servers.
[0063] Load balancing-aware edge inference task scheduling. For inference tasks offloaded to edge servers, the edge cluster load is monitored in real time, and a probabilistic sparse self-attention mechanism is used to accurately predict load fluctuations. Subsequently, based on task resource requirements and the real-time status of the edge servers, the integer constraint of inference scheduling is relaxed, and a convex optimization problem is constructed. A convex optimization solver is then used to obtain the theoretically optimal load balancing scheme. Finally, a random rounding algorithm is introduced to generate a practically executable scheduling strategy to achieve efficient and balanced utilization of edge cluster resources.
[0064] In real-time video analysis, the proposed AdaSch first chooses between whole-frame inference and RoI-based inference according to the number of targets in the video frame and their recognition complexity. By using the appropriate inference mode for each video scenario, AdaSch can minimize system latency while maintaining video analysis accuracy. In scenes with sparse or small targets, whole-frame inference struggles to detect targets effectively; in this case, RoI-based inference ensures the accuracy and real-time performance of video analysis. When there are too many RoIs for the cameras to process in time, whole-frame inference should be performed on an edge server to avoid excessive RoI latency. However, a practical system cannot run both whole-frame and RoI inference on every frame to compare their latency and accuracy. Therefore, this invention first formalizes latency and accuracy models for the whole-frame and RoI modes. On this basis, it implements adaptive inference mode switching.
[0065] In full-frame inference mode, video frames captured by the camera are transmitted to the edge cluster for analysis, so the latency consists of frame transmission and DNN inference. The latency and SLO Achievement Rate (SAR) in this mode are therefore defined as follows:
[0066] $T^{frm}_i = \dfrac{D_i}{B_{i,j}} + \dfrac{W_i}{F_j}$,
[0067] $\mathrm{SAR}^{frm}_i = \mathbb{1}\left(T^{frm}_i \le T^{\max}\right)$,
[0068] where:
- $D_i$ is the size of the video frame captured by camera $c_i$;
- $B_{i,j}$ is the bandwidth between camera $c_i$ and edge server $e_j$;
- $W_i$ is the inference computation requirement of the video frame;
- $F_j$ is the computing power of the $j$-th edge server;
- $T^{\max}$ is the maximum tolerable delay of the inference task;
- $\mathbb{1}(\cdot)$ is an indicator function that equals 1 when $T^{frm}_i \le T^{\max}$ and 0 otherwise.
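The following is a minimal sketch of the full-frame latency and SAR model above. It assumes consistent units: bytes, bytes per second, FLOPs, FLOPs per second, and seconds. The function and parameter names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class FullFrameResult:
    latency: float  # seconds: frame upload + edge inference
    sar: int        # 1 if the deadline t_max is met, else 0


def full_frame_latency_and_sar(frame_bytes: float, bandwidth: float,
                               demand: float, server_power: float,
                               t_max: float) -> FullFrameResult:
    """Latency = transmission time + inference time; SAR = 1(latency <= t_max)."""
    transmission = frame_bytes / bandwidth   # time to send the frame to the edge server
    inference = demand / server_power        # DNN inference time on the edge server
    latency = transmission + inference
    return FullFrameResult(latency=latency, sar=int(latency <= t_max))


# Hypothetical numbers: 200 KB frame at 12.5 MB/s, 8 GFLOP on a 100 GFLOP/s server.
print(full_frame_latency_and_sar(200e3, 12.5e6, 8e9, 100e9, t_max=0.1))
```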
[0069] The RoI mode comprises two steps: RoI extraction and RoI processing. First, background subtraction is applied to the video frames, and the resulting dynamic contours are used to extract a valid set of RoIs. Each RoI can then be processed by the local camera, a collaborating camera, or an edge server. Let $W^{ext}_i$ denote the computational requirement for camera $c_i$ to extract RoIs. The RoI extraction delay is then defined as:
[0070] $T^{ext}_i = \dfrac{W^{ext}_i}{f_i}$,
[0071] where $f_i$ is the computing power of camera $c_i$.
[0072] The extracted RoI set is denoted as $\mathcal{R}_i$. Depending on the device that performs inference, $\mathcal{R}_i$ is further divided into $\mathcal{R}^{loc}_i$, $\mathcal{R}^{col}_i$, and $\mathcal{R}^{edge}_i$, the RoI sets processed locally, processed by collaborating cameras, and offloaded to edge servers, respectively. The RoI set with lower computational density, $\mathcal{R}^{loc}_i$, can be inferred directly on the local camera. The latency of processing the $k$-th RoI $r_k \in \mathcal{R}^{loc}_i$ is therefore defined as:
[0073] $T^{loc}_k = \dfrac{w_k}{f_i}$,
[0074] where $w_k$ is the inference computation requirement of $r_k$ and $f_i$ is the computing power of camera $c_i$.
[0075] When RoI computational density or camera load is high, some RoIs are forwarded to collaborating cameras for inference. The latency for a collaborating camera $c_l$ to process the $k$-th RoI $r_k \in \mathcal{R}^{col}_i$ is defined as:
[0076] $T^{col}_k = \dfrac{d_k}{B_{i,l}} + \dfrac{w_k}{f_l}$,
[0077] where:
- $f_l$ is the computing power of collaborating camera $c_l$;
- $w_k$ and $d_k$ are the inference computation requirement and data size of $r_k$;
- $B_{i,l}$ is the bandwidth between cameras $c_i$ and $c_l$.
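[0078] The RoI set with high computational density, $\mathcal{R}^{edge}_i$, is offloaded from the camera to an edge server for inference. The latency for edge server $e_j$ to process the $k$-th RoI $r_k \in \mathcal{R}^{edge}_i$ is defined as:
[0079] $T^{edge}_k = \dfrac{d_k}{B_{i,j}} + \dfrac{w_k}{F_j}$,
where $B_{i,j}$ is the bandwidth between camera $c_i$ and edge server $e_j$, and $F_j$ is the computing power of $e_j$.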
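[0080] Let $T^{ext}_i$ be the RoI extraction delay. Let $T^{loc}_k$, $T^{col}_k$, and $T^{edge}_k$ be the per-RoI latencies over the locally processed, collaboratively processed, and offloaded RoI subsets $\mathcal{R}^{loc}_i$, $\mathcal{R}^{col}_i$, and $\mathcal{R}^{edge}_i$. The total delay and SAR of the RoI mode are then defined as follows:
[0081] $T^{roi}_i = T^{ext}_i + \max\left\{\sum_{r_k \in \mathcal{R}^{loc}_i} T^{loc}_k,\ \sum_{r_k \in \mathcal{R}^{col}_i} T^{col}_k,\ \sum_{r_k \in \mathcal{R}^{edge}_i} T^{edge}_k\right\}$,
[0082] $\mathrm{SAR}^{roi}_i = \mathbb{1}\left(T^{roi}_i \le T^{\max}\right)$,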
[0083] The $\max(\cdot)$ function selects the part with the largest delay. The local, collaborative, and edge parts are processed in parallel, so the slowest part determines the total inference delay.
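The RoI-mode latency model can be sketched as below. The sketch rests on three assumptions:
- RoIs in the same part run back-to-back on one device, so their latencies add up.
- The local, collaborative, and edge parts run in parallel, so the slowest part sets the total, as the max(.) term expresses.
- All collaborative RoIs go to a single collaborating camera, and all offloaded RoIs go to a single edge server.

Function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def roi_mode_latency(extract_demand: float, cam_power: float,
                     local: Sequence[float],
                     collab: Sequence[Tuple[float, float]], collab_power: float, cam_bw: float,
                     edge: Sequence[Tuple[float, float]], edge_power: float, edge_bw: float,
                     t_max: float) -> Tuple[float, int]:
    """Return (total RoI-mode latency, SAR) for one frame.

    local:  inference demands w_k of RoIs kept on the local camera.
    collab: (w_k, d_k) pairs for RoIs sent to the collaborating camera.
    edge:   (w_k, d_k) pairs for RoIs offloaded to the edge server.
    """
    t_ext = extract_demand / cam_power                  # RoI extraction on the camera
    # Assumption: RoIs within one part are processed sequentially, so latencies add up.
    t_local = sum(w / cam_power for w in local)
    t_collab = sum(d / cam_bw + w / collab_power for w, d in collab)
    t_edge = sum(d / edge_bw + w / edge_power for w, d in edge)
    # The three parts run in parallel; the slowest one determines the total.
    total = t_ext + max(t_local, t_collab, t_edge)
    return total, int(total <= t_max)
```

With this structure, offloading RoIs helps only while it shortens the slowest part. That is why the scheduling stages below try to balance the parts rather than minimize any single one.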
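[0084] Let $m_i(t) \in \{0, 1\}$ denote the inference mode of camera $c_i$. If RoI-based (local) inference is chosen, $m_i(t) = 1$; otherwise, $m_i(t) = 0$. The total inference delay in time slot $t$ is therefore defined as:
[0085] $T_i(t) = m_i(t)\, T^{roi}_i(t) + \left(1 - m_i(t)\right) T^{frm}_i(t)$,
[0086] where $T^{roi}_i(t)$ and $T^{frm}_i(t)$ are the latencies in RoI and full-frame inference modes, respectively.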
[0087] To compare the accuracy of the current frame's detection results under the two inference modes, the inference accuracy is defined as:
[0088] $A_i(t) = m_i(t)\, A^{roi}_i(t) + \left(1 - m_i(t)\right) A^{frm}_i(t)$,
[0089] where $A^{roi}_i(t)$ and $A^{frm}_i(t)$ are the accuracies in RoI and full-frame inference modes, respectively.
[0090] To achieve adaptive inference mode switching and reduce system overhead, AdaSch estimates the complexity of video frames and selects an appropriate inference mode based on RoI features. First, this invention considers five features: num, area, circularity, convexity, and max_area_ratio, the specific meanings of which are shown in Table 1.
[0091] Table 1 lists the five RoI features used.
[0092]
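Table 1's definitions are not reproduced in this text, so the following pure-Python sketch assumes common shape-descriptor definitions:
- `area`: the total RoI area as a fraction of the frame area.
- `circularity`: the mean of $4\pi A / P^2$ over RoIs.
- `convexity`: the mean ratio of convex-hull perimeter to contour perimeter.
- `max_area_ratio`: the largest RoI area divided by the frame area.

Contours are taken as polygon vertex lists, such as a contour finder would return. `roi_features` and its helpers are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _edges(pts: Sequence[Point]):
    """Consecutive vertex pairs of a closed polygon."""
    return zip(pts, list(pts[1:]) + [pts[0]])


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula; abs() makes the result independent of vertex order."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _edges(pts))) / 2.0


def polygon_perimeter(pts: Sequence[Point]) -> float:
    return sum(math.dist(a, b) for a, b in _edges(pts))


def convex_hull(pts: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    p = sorted(set(pts))
    if len(p) <= 2:
        return p

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for q in p:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(p):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]


def roi_features(contours: Sequence[Sequence[Point]], frame_area: float) -> List[float]:
    """[num, area, circularity, convexity, max_area_ratio] for one frame (assumed definitions)."""
    if not contours:
        return [0.0] * 5
    areas = [polygon_area(c) for c in contours]
    perims = [polygon_perimeter(c) for c in contours]
    circ = [4 * math.pi * a / p ** 2 if p > 0 else 0.0 for a, p in zip(areas, perims)]
    conv = [polygon_perimeter(convex_hull(c)) / p if p > 0 else 0.0
            for c, p in zip(contours, perims)]
    return [float(len(contours)), sum(areas) / frame_area,
            sum(circ) / len(circ), sum(conv) / len(conv), max(areas) / frame_area]
```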
[0093] Next, during training, RoI extraction is performed on each frame, and the five features above serve as the model input for selecting the optimal inference mode. At the same time, both full-frame and RoI-based video analysis are run on the video frames and the extracted RoI sequences. The mode with the higher weighted sum of accuracy and SAR is used as the label. On this supervised task, this invention employs a lightweight stochastic gradient descent (SGD) regressor to learn the mapping from RoI feature vectors to their combined performance under the two inference modes. Specifically, a precision-latency coefficient weights system accuracy against SAR in this sum, which determines the optimal inference mode. During the inference phase, the optimal mode is then selected for the current frame. Notably, all RoI features are extracted and processed on the local camera, which avoids additional transmission overhead and reduces system latency.
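The following is a minimal sketch of the mode selector. It assumes one linear regressor per inference mode, trained with plain per-sample SGD in NumPy rather than any specific library's regressor class. Each regressor predicts the weighted score `alpha * accuracy + (1 - alpha) * SAR`, where `alpha` stands in for the precision-latency coefficient. The mode with the higher predicted score is chosen. All names are hypothetical.

```python
import numpy as np


class LinearSGDRegressor:
    """Linear model y = w.x + b trained by per-sample SGD on squared error."""

    def __init__(self, n_features: int, lr: float = 0.05, epochs: int = 50, seed: int = 0):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr, self.epochs = lr, epochs
        self.rng = np.random.default_rng(seed)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LinearSGDRegressor":
        for _ in range(self.epochs):
            for i in self.rng.permutation(len(X)):       # shuffle samples every epoch
                err = X[i] @ self.w + self.b - y[i]       # gradient factor of 0.5 * err^2
                self.w -= self.lr * err * X[i]
                self.b -= self.lr * err
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return X @ self.w + self.b


def combined_score(acc, sar, alpha: float):
    """Weighted sum of accuracy and SAR (alpha ~ precision-latency coefficient)."""
    return alpha * np.asarray(acc) + (1 - alpha) * np.asarray(sar)


def select_mode(features, frame_model, roi_model) -> str:
    """Return the mode with the higher predicted combined score."""
    x = np.asarray(features, dtype=float).reshape(1, -1)
    return "roi" if roi_model.predict(x)[0] > frame_model.predict(x)[0] else "frame"
```

Both regressors are linear, so selecting a mode on the camera costs only two dot products per frame. This is consistent with keeping feature processing on the local camera.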
[0094] RoI sequences must be scheduled by coordinating local cameras, collaborating cameras, and edge servers. This balances the load among devices and meets the accuracy and real-time requirements of video analysis as far as resource constraints allow. RoIs affect device load with a lag. AdaSch therefore considers not only the current state but also the predicted loads of cameras and edge servers in subsequent time slots when making decisions. This lets AdaSch adjust scheduling quickly to load fluctuations and avoid the queuing and timeout overhead caused by decision lag.
[0095] Deep reinforcement learning (DRL) and deep imitation learning (DIL) can handle similar task scheduling problems. However, these deep learning methods are often regarded as black boxes that lack explicit optimization objectives and decision logic. Their decision process is therefore uninterpretable, and inference latency is hard to control strictly through optimization constraints. Furthermore, these methods typically require many iterations to converge and must be retrained when the environment changes. Traditional scheduling strategies adapt poorly to dynamic environments. To address this, AdaSch introduces MPC into the RoI scheduling problem and constructs a rolling optimization model that generates the optimal inference decision for the current time slot. The model combines real-time observations, such as camera computing power, RoI features, and network bandwidth, to construct optimization constraints covering resource limits and latency tolerance. A linear programming solver then solves the model efficiently. This design provides explicit decision logic and objective functions, which improves interpretability. It also avoids the suboptimal actions caused by the exploration-exploitation trade-off in DRL. Explicit constraint handling ensures that all decisions stay within the system's safety boundary, preventing task failures caused by server overload.
[0096] The key steps of the proposed MPC-based RoI scheduling method are shown in Algorithm 1. AdaSch considers $M$ cameras and $N$ edge servers, where each device $j$ has a fixed computing power $F_j$ and network bandwidth $B_j$. The decision time slot length is $T_s$, and the RoIs captured by the cameras are scheduled within each time slot. A scheduling task is denoted as $\tau_i = (w_i, d_i)$, where $w_i$ is the inference computation requirement and $d_i$ is the data size. In addition, AdaSch introduces a task waiting time $\omega_i$ to reflect the impact of task delays on scheduling decisions. If a task is not scheduled in a given time slot, its waiting time $\omega_i$ carries over and accumulates in the next time slot. In time slot $t$, the set of RoIs to be scheduled is denoted as $\mathcal{Q}_t = \{\tau_1, \dots, \tau_{n_t}\}$, where $n_t$ is the number of RoIs requiring inference. For each task $\tau_i$ and device $j$, the decision variable is $x_{ij} \in \{0, 1\}$: $x_{ij} = 1$ when $\tau_i$ is dispatched to device $j$, and $x_{ij} = 0$ otherwise.
[0097] Predictive model. In MPC, this invention predicts the RoI sequence of future time slots from historical RoI sequences in order to make optimal scheduling decisions. Specifically, the historical RoI sequence captured by the camera is denoted as..., and the task state within the prediction window is defined as follows:
[0098]
[0099] where one term of the state gives the number of predicted RoIs.
[0100] According to the observations of this invention, RoIs change only slightly between consecutive frames and follow a clear pattern. Therefore, this invention predicts the RoI sequence of future time slots using the average change amplitude over historical time slots. Specifically, the predicted state is updated as:
[0101]
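The text states only that future RoIs are predicted from the average change amplitude over historical slots. The sketch below assumes that the predicted quantity is the RoI count per slot. It extrapolates that count linearly using the mean slot-to-slot change over the history. Clamping at zero, rounding, and the name `predict_roi_counts` are assumptions. The default horizon of 5 matches the prediction window reported for the embodiment.

```python
from typing import List, Sequence


def predict_roi_counts(history: Sequence[int], horizon: int = 5) -> List[int]:
    """Extrapolate future RoI counts with the mean slot-to-slot change of the history.

    history: RoI counts of past time slots, oldest first (at least one entry).
    horizon: number of future slots in the prediction window.
    """
    if not history:
        raise ValueError("history must not be empty")
    diffs = [b - a for a, b in zip(history, history[1:])]
    avg_change = sum(diffs) / len(diffs) if diffs else 0.0
    last = history[-1]
    # Counts cannot be negative, so each extrapolated value is rounded and clamped at 0.
    return [max(0, round(last + k * avg_change)) for k in range(1, horizon + 1)]
```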
[0102] Rolling optimization. A device may already carry some load within a time slot; let $L_j(t)$ denote the load of device $j$ in time slot $t$. To ensure that tasks can be completed on time, the effective computing power of device $j$ is defined as:
[0103]
[0104] where $\epsilon$ is a very small positive number.
[0105] The processing latency of task $\tau_i$ on device $j$ is then defined as follows:
[0106]
[0107] Based on the above definitions, this invention formulates a mixed-integer programming problem whose objective is to minimize inference latency. To improve the task completion rate and prioritize tasks with longer waiting times, each task is weighted in the objective function according to its waiting time. The optimization problem is therefore defined as:
[0108]
[0109] One constraint is a uniqueness constraint, which assigns each inference task to at most one device. The other constraint ensures that the total task processing latency on each device does not exceed the time slot length, given the dynamically adjusted effective computing power.
[0110] Next, AdaSch uses a linear programming solver to obtain the optimal solution of the above optimization problem. The solution is then converted into a 0-1 integer solution to obtain the scheduling decision, and the tasks are dispatched to their assigned devices.
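The MPC step can be sketched with `scipy.optimize.linprog`, the solver named in the embodiment. The exact objective and the effective-computing-power formula are not legible in this text, so the sketch makes three assumptions:
- (i) Effective power is $F_j \cdot \max(1 - L_j / T_s, \epsilon)$, where $L_j$ is the busy time already committed in the slot.
- (ii) The objective rewards each scheduled task by $1 + \omega_i$ and penalizes its normalized latency.
- (iii) The 0-1 conversion keeps the LP's preferred device when its value is at least 0.5 and capacity remains.

`mpc_schedule` and `beta` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def mpc_schedule(w, d, wait, F, B, load, T_s, beta=1.0, eps=1e-6):
    """One MPC step: LP relaxation of the RoI assignment, then 0-1 conversion.

    w, d, wait: per-task inference demand, data size and accumulated waiting time.
    F, B, load: per-device computing power, bandwidth and busy time already
                committed in this slot (same time unit as T_s).
    Returns the chosen device index per task, or -1 if the task waits.
    """
    w, d, wait = (np.asarray(a, dtype=float) for a in (w, d, wait))
    F, B, load = (np.asarray(a, dtype=float) for a in (F, B, load))
    n, m = len(w), len(F)
    # Assumed effective power: F_j scaled by the free share of the slot, floored at eps.
    F_eff = F * np.maximum(1.0 - load / T_s, eps)
    T = d[:, None] / B[None, :] + w[:, None] / F_eff[None, :]   # n x m processing latency
    # Assumed objective: latency penalty minus waiting-time-weighted reward per scheduled task.
    c = (beta * T / T_s - (1.0 + wait[:, None])).ravel()
    A_task = np.kron(np.eye(n), np.ones((1, m)))                 # sum_j x_ij <= 1
    A_dev = np.zeros((m, n * m))
    for j in range(m):
        A_dev[j, j::m] = T[:, j]                                 # sum_i T_ij x_ij <= T_s
    res = linprog(c, A_ub=np.vstack([A_task, A_dev]),
                  b_ub=np.concatenate([np.ones(n), np.full(m, T_s)]),
                  bounds=(0, 1), method="highs")
    x = res.x.reshape(n, m)
    # 0-1 conversion: longest-waiting tasks first; keep the LP's preferred device if it fits.
    assign, used = [-1] * n, np.zeros(m)
    for i in np.argsort(-wait, kind="stable"):
        j = int(np.argmax(x[i]))
        if x[i, j] >= 0.5 and used[j] + T[i, j] <= T_s:
            assign[i], used[j] = j, used[j] + T[i, j]
    return assign
```

Visiting the longest-waiting tasks first during rounding mirrors the waiting-time priority described above. Any task left at -1 would be carried into the next slot.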
[0111] Feedback and correction. This invention dynamically updates the device loads and the task queue based on the actual processing status of the tasks. For a task $\tau_i$, if there is a device $j$ with $x_{ij} = 1$, then $\tau_i$ is assigned to device $j$ and the device load is updated as follows:
[0112]
[0113] For unscheduled tasks, the waiting time is updated as follows:
[0114]
[0115] Meanwhile, unscheduled tasks are added to the task queue of the next time slot. As waiting time accumulates, tasks that have waited longer are prioritized in future time slots. With this design, AdaSch re-optimizes over the updated task queue and device loads in every time slot. However, it executes only the decisions for the current slot. Together, these steps form a loop of rolling optimization and feedback correction.
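The feedback step can be sketched as follows. The sketch assumes that a device's load grows by the processing latency of each task it receives, and that an unscheduled task's waiting time grows by one slot length $T_s$. The queue representation (dicts with a `wait` field) and the name `feedback_update` are hypothetical.

```python
from typing import List


def feedback_update(queue: List[dict], assign: List[int], latency: List[List[float]],
                    load: List[float], T_s: float) -> List[dict]:
    """Apply one slot's decisions; updates `load` in place and returns the carried-over queue.

    assign[i] is the device index of queue[i], or -1 if the task was not scheduled;
    latency[i][j] is the processing latency of task i on device j.
    """
    carried = []
    for task, j, row in zip(queue, assign, latency):
        if j >= 0:
            load[j] += row[j]        # scheduled: device j takes on the task's latency
        else:
            task["wait"] += T_s      # unscheduled: waiting time grows by one slot
            carried.append(task)     # the task re-enters the next slot's queue
    return carried
```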
[0116]
[0117] Through the RoI scheduling described above, RoI sequences are scheduled to local cameras, collaborating cameras, or the edge cluster. RoIs scheduled to cameras are inferred directly to complete video analysis. Two kinds of work still need an edge server assigned for inference: RoIs scheduled to the edge cluster, and the video frames output in full-frame inference mode. To minimize average inference latency, AdaSch aims to distribute tasks evenly across suitable servers so as to fully utilize the available system resources. To this end, AdaSch estimates the actual inference requirements of tasks and schedules them according to each device's inference performance. However, a device's inference performance depends on various factors, such as its own computing power and observable states like the real-time queue length. Moreover, multiple video streams may share a server, so the real-time performance of edge servers may fluctuate significantly as the loads of different cameras vary. AdaSch's performance in real-world scenarios can therefore be improved by analyzing the load status of the edge cluster and predicting the available computing power of devices in advance.
[0118] Specifically, AdaSch estimates, from the sizes of the input RoIs and video frames, how the inference demands of newly arriving tasks will affect the load of different edge servers. To make more accurate scheduling decisions, AdaSch uses a Transformer-based load predictor to anticipate performance changes of edge servers during inference. Compared with classic RNNs and CNNs, Transformers excel at capturing long-term dependencies and interactions between variables, as well as both global patterns and local trends. They can therefore provide more accurate multi-device load prediction. Based on task demand analysis and device load prediction, this invention designs a load-balance-aware edge inference task scheduling method, whose key steps are shown in Algorithm 2.
[0119] As a further preferred embodiment, the load prediction model uses historical load sequences of the edge servers as training data. The sampling interval of the training data is kept consistent with the system's discrete time slot length $T_s$, so that the model's predictions match the system's actual runtime slots.
[0120] First, the encoder and decoder inputs are constructed from three sequences: the historical load, the currently collected load sequence, and a temporal sample of the target load sequence (i.e., the load sequence to be predicted). In the encoder, classic self-attention computes attention weights between each position and all other positions, which leads to high computational complexity. To alleviate this problem, inspired by [previous invention name], this invention introduces probabilistic sparse self-attention into the encoder. It also applies self-attention distillation between layers to reduce computational overhead. Specifically, the feature extraction from the $j$-th layer to the $(j+1)$-th layer is defined as follows:
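[0121] $X^{t}_{j+1} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left([X^{t}_{j}]_{AB}\right)\right)\right)$,
[0122] where:
- $X^{t}_{j}$ is the output of the $j$-th encoder layer;
- $[\cdot]_{AB}$ represents the sparse self-attention block;
- Conv1d represents one-dimensional convolution over the time series;
- ELU represents the activation function;
- MaxPool represents the max pooling operation.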
[0123] Adding a max-pooling layer after Conv1d halves the length of the input passed to the next layer. This lowers the computational complexity of training from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the encoder input. Next, the encoder output and the decoder input are fed into the decoder. The decoder contains two attention mechanisms: a multi-head sparse attention mechanism and a standard multi-head attention mechanism. Finally, the decoder output is fed into an MLP to obtain the predicted load sequence.
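To illustrate the encoder's two ingredients, the NumPy sketch below implements two pieces. The first is a single-head version of ProbSparse query selection. The second is one distilling step: Conv1d, then ELU, then max pooling with stride 2. The sketch is simplified. It computes the full score matrix instead of sampling keys, so it shows which queries become active but does not achieve the $O(L \log L)$ cost. Two further choices are assumptions: the number of active queries $u = c \cdot \lceil \ln L \rceil$, and the kernel-3 convolution with circular padding. Function names are hypothetical.

```python
import math
import numpy as np


def probsparse_self_attention(X, Wq, Wk, Wv, factor=5):
    """Single-head ProbSparse-style self-attention (simplified: no key sampling).

    Queries are ranked by max(qK^T/sqrt(d)) - mean(qK^T/sqrt(d)); only the top-u
    "active" queries attend over all keys, and the remaining queries output mean(V).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    L, d = Q.shape
    u = min(L, factor * math.ceil(math.log(L)))        # number of active queries
    scores = Q @ K.T / math.sqrt(d)                      # full L x L matrix, for clarity only
    sparsity = scores.max(axis=1) - scores.mean(axis=1)
    top = np.argsort(-sparsity)[:u]
    out = np.repeat(V.mean(axis=0, keepdims=True), L, axis=0)
    s = scores[top] - scores[top].max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
    out[top] = weights @ V
    return out, top


def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))


def distil(X, kernel):
    """One distilling step: Conv1d(k=3, circular pad) -> ELU -> MaxPool(k=3, stride=2, pad=1).

    X: (L, d) sequence; kernel: (3, d, d) convolution weights.
    """
    L, d = X.shape
    Xp = np.concatenate([X[-1:], X, X[:1]])              # circular padding by one step
    Y = elu(sum(Xp[k:k + L] @ kernel[k] for k in range(3)))
    Yp = np.concatenate([np.full((1, d), -np.inf), Y, np.full((1, d), -np.inf)])
    out_len = (L - 1) // 2 + 1                           # roughly halves the length
    return np.stack([Yp[2 * t:2 * t + 3].max(axis=0) for t in range(out_len)])
```

With the embodiment's encoder length of 96 and sparse factor 5, this sketch activates 25 of the 96 queries. One distilling step then shortens the sequence to 48.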
[0124] Based on the designed inference latency model and the differences in computing power among edge servers, this invention estimates how scheduling a RoI or video frame to an edge server affects that server's load in future time slots. The goal is to schedule RoIs and video frames to suitable servers so as to minimize the load balancing index across servers, defined as:
[0125] $\lambda = \dfrac{L_{\max}}{L_{\mathrm{avg}}}$,
[0126] where $L_{\max}$ and $L_{\mathrm{avg}}$ are the maximum and average server load, respectively.
[0127] Furthermore, this invention introduces a linear programming solver to obtain the optimal solution that minimizes the load balancing index. The inference scheduling decision $x_{ij} \in \{0, 1\}$ indicates whether task $i$ is scheduled to edge server $j$, and it is discrete. This invention therefore first relaxes it to $x_{ij} \in [0, 1]$. A convex optimization solver (e.g., CVXPY) then obtains the optimal relaxed solution. Random rounding is then applied to obtain a feasible solution. With this design, the inference scheduling decision approximates the optimal solution with high probability. As a further preferred implementation, the random rounding algorithm works as follows. A uniformly distributed random number between 0 and 1 is generated. When the relaxed optimal value $\tilde{x}_{ij}$ is greater than this random number, the corresponding entry of the feasible solution is set to 1; otherwise, it is set to 0.
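The relaxation-and-rounding step can be sketched with `scipy.optimize.linprog` instead of CVXPY. In epigraph form, the relaxed problem is a linear program: minimize $z$ subject to every server's load being at most $z$. Two assumptions are worth flagging:
- Minimizing the maximum load serves as a linear surrogate for the load balancing index, which is taken here as maximum over average server load. The two coincide only when the average load does not depend on the assignment.
- The rounding draws one uniform number per task and picks the server whose cumulative relaxed share first exceeds it. This is a common variant that guarantees each task exactly one server; it is not the per-entry comparison described above.

Loads are assumed to be in seconds of work. `balance_and_round` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def balance_and_round(demand, power, base_load, rng):
    """Relaxed min-max-load assignment followed by randomized rounding.

    demand: (n,) task inference demands; power: (m,) server computing power;
    base_load: (m,) predicted server load (seconds of work) for the coming slot.
    Returns (server index per task, resulting max/avg load ratio).
    """
    demand, power, base_load = (np.asarray(a, dtype=float) for a in (demand, power, base_load))
    n, m = len(demand), len(power)
    cost = demand[:, None] / power[None, :]            # seconds of work task i adds to server j
    # Variables: x (n*m entries relaxed to [0, 1]) followed by z, the maximum server load.
    c = np.zeros(n * m + 1)
    c[-1] = 1.0                                        # minimise z
    A_ub = np.zeros((m, n * m + 1))
    for j in range(m):
        A_ub[j, j:n * m:m] = cost[:, j]                # base_load_j + sum_i cost_ij x_ij <= z
        A_ub[j, -1] = -1.0
    A_eq = np.hstack([np.kron(np.eye(n), np.ones((1, m))), np.zeros((n, 1))])  # sum_j x_ij = 1
    bounds = [(0, 1)] * (n * m) + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=-base_load, A_eq=A_eq, b_eq=np.ones(n),
                  bounds=bounds, method="highs")
    x = np.clip(res.x[:-1].reshape(n, m), 0, None)
    # Randomized rounding: one u ~ U(0, 1) per task; pick the first server whose
    # cumulative relaxed share exceeds u, so each task gets exactly one server.
    choice = [int(min(np.searchsorted(np.cumsum(row / row.sum()), rng.uniform(), side="right"),
                      m - 1)) for row in x]
    load = base_load.copy()
    for i, j in enumerate(choice):
        load[j] += cost[i, j]
    return choice, load.max() / load.mean()
```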
[0128]
[0129] As shown in Figure 4, this embodiment builds a real-world testbed for real-time edge video analysis. It uses Python and PyTorch to implement the video analysis pipeline and the proposed AdaSch. Specifically, OpenCV reads the video datasets on a Jetson TX2 equipped with a 256-core NVIDIA Pascal GPU, which simulates real-time video streams from cameras. The edge cluster consists of a server equipped with an RTX 4090D and a server equipped with an RTX 4060. All devices are connected to a switch via 1000 Mbps links to form a local area network (LAN). During inference scheduling, RoIs and video frames are sent to collaborating cameras and the edge cluster over the LAN. A Flask instance runs on each camera and edge server to listen for inference requests from other cameras. When a camera performs collaborative inference or frame scheduling, it encapsulates the RoIs or frames to be scheduled and sends an HTTP request to the target device.
[0130] This embodiment evaluates AdaSch using two real-world datasets, UA-DETRAC and MOT15, as camera video streams. UA-DETRAC is a large-scale multi-object vehicle detection dataset. It covers multiple vehicle types (e.g., cars, buses, and trucks) and various weather and lighting conditions (e.g., daytime, nighttime, sunny, cloudy, and rainy). MOT15 is a large-scale multi-object pedestrian tracking dataset with over 20 different indoor and outdoor scenes, as well as various camera settings and imaging conditions. Video analysis accuracy is measured with the F1-score, which combines precision and recall. A detection counts as a true positive when its Intersection over Union (IoU) with a ground-truth object reaches the threshold of 0.5.
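The following is a minimal sketch of the F1 computation under four assumptions:
- Boxes are axis-aligned and given as `(x1, y1, x2, y2)`.
- Predictions are already sorted by descending confidence.
- Matching is greedy and one-to-one.
- An IoU of exactly 0.5 counts as a match.

The function names are illustrative; the evaluation code actually used in the embodiment is not given.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_score(preds: Sequence[Box], gts: Sequence[Box], thr: float = 0.5) -> float:
    """Greedy matching: a prediction is a TP if its best unmatched GT has IoU >= thr."""
    matched, tp = set(), 0
    for p in preds:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j not in matched and iou(p, g) > best:
                best, best_j = iou(p, g), j
        if best_j >= 0 and best >= thr:
            matched.add(best_j)
            tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(preds), tp / len(gts)
    return 2 * precision * recall / (precision + recall)
```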
[0131] At the camera end, this embodiment uses OpenCV to read video frames from the dataset, and `cv2.createBackgroundSubtractorMOG2()` to create a GMM-based background subtractor. The subtractor maintains an estimate of the background image and extracts the foreground region by subtraction. It then performs blob detection to extract RoIs. The maximum number of frames considered for background modeling is 200, and the sensitivity threshold for background-foreground discrimination is 16. For DNN inference, pre-trained YOLOv8S and FastRCNN models perform object detection. psutil and NVIDIA's pynvml provide resource monitoring on the cameras and the edge cluster, tracking GPU load changes in real time. In the MPC-based scheduler, the prediction window is set to 5, and the linear programming problem is solved with `linprog` from `scipy.optimize`. In the load prediction model, the encoder input sequence length is 96, the decoder output sequence length is 48, and the sparse attention factor is 5. CVXPY solves the relaxed convex program, and random rounding is applied to its solution to obtain the scheduling decision.
[0132] This embodiment compares AdaSch with the following four methods to evaluate the superiority of the present invention.
[0133] (1) FFI (Full-Frame Inference): Every video frame is input in full into the inference model for video analysis.
[0134] (2) Distream: Load balancing is achieved between cameras and edge servers through dynamic task partitioning, where cameras perform RoI extraction and load monitoring. If the camera load exceeds a predetermined threshold, the RoI with the largest area is scheduled to the server until the load falls below the threshold.
[0135] (3) ParaLoupe: Video analysis is performed on RoIs, with different models used for RoIs of different confidence levels. A RoI that appears in a video frame for the first time is regarded as low-confidence and is detected with YOLOv8S. A RoI captured from the target displacement in the previous frame is regarded as high-confidence and is detected with YOLOv3-Tiny.
[0136] (4) Gecko: Pixel variation between consecutive frames is detected, and a feature extractor selects the best inference model for each frame. When the variation between two frames is small, a frame-skip controller adjusts the detection interval of the video stream, and the lightweight YOLOv8S model is used for inference. When the variation is large, the highest-accuracy YOLOv8L model is selected for inference.
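The Distream baseline's offloading rule, as summarized above, can be sketched as follows. The per-RoI additive load contribution and the tuple layout are assumptions for illustration; Distream's own implementation may model load differently. `distream_offload` is a hypothetical name.

```python
from typing import List, Tuple


def distream_offload(rois: List[Tuple[int, float, float]], cam_load: float,
                     threshold: float) -> Tuple[List[int], List[int], float]:
    """Offload the largest-area RoIs to the server until camera load <= threshold.

    rois: (roi_id, area, load_contribution); cam_load already includes every RoI.
    Returns (ids kept locally, ids offloaded, remaining camera load).
    """
    local = sorted(rois, key=lambda r: r[1], reverse=True)   # largest area first
    offloaded: List[int] = []
    while cam_load > threshold and local:
        rid, _, contrib = local.pop(0)
        offloaded.append(rid)
        cam_load -= contrib          # assumed: removing a RoI frees its contribution
    return [r[0] for r in local], offloaded, cam_load
```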
[0137] Table 2 Comparison of overall performance of different methods
[0138]
[0139] This embodiment compares the overall performance of different methods on the UA-DETRAC dataset in terms of F1-score and SAR. As shown in Table 2, FFI has the lowest average SAR. It performs whole-frame inference without filtering frame content, so it has more content to detect than any other method. ParaLoupe improves SAR by using a lighter model on high-confidence RoIs through its adaptive model selection mechanism. Gecko dynamically adjusts the analysis granularity based on target motion trends and the current load. It achieves an average SAR of 60.99%, indicating that it meets latency targets to some extent. Distream significantly improves SAR by dynamically sensing load and distributing all RoI sequences between cameras and edge servers for separate processing. In contrast, the proposed AdaSch adaptively selects the optimal inference mode based on camera load and the load demands of inference tasks, and uses MPC to schedule RoIs to edge servers. Furthermore, its load prediction mechanism lets AdaSch better perceive the task complexity of future RoI sequences and make more reasonable scheduling decisions. As a result, AdaSch achieves the highest SAR.
[0140] For the inference accuracy metrics, FFI performs video analysis on the entire frame, and this embodiment uses it as the baseline for analyzing the other methods. ParaLoupe has the lowest average accuracy because it uses only the RoI mode. In dense target scenes, it may lose some RoI analysis results because it cannot process all RoIs in time. ParaLoupe does use different modes to extract RoIs from video frames. However, its strategy relies entirely on RoI extraction and may miss key targets, which lowers accuracy when the background subtractor performs poorly. In contrast, Gecko selects different models based on frame content. In more complex scenes, it preserves detection performance by choosing a higher-precision model, so its average accuracy is higher than that of ParaLoupe and FFI. Distream combines local RoI extraction with edge inference and can complete more RoI inference within the maximum tolerable latency, thus achieving higher average accuracy. However, Distream may still lose accuracy in dense scenes because it cannot process all RoIs. In contrast, the proposed AdaSch considers both RoI and whole-frame inference and adaptively switches between them based on RoI features, thus achieving the highest average accuracy. AdaSch also outperforms the other methods in median accuracy, indicating more stable and higher performance in most scenarios. Furthermore, to evaluate the robustness of each method on more difficult frames, this embodiment analyzes the 75th and 99th percentiles of the accuracy distribution, with frames ranked from easiest to hardest. The 75th-percentile value is the accuracy reached by 75% of frames, so only the hardest 25% of frames fall below it. It reflects performance in moderately difficult scenarios. The 99th-percentile value is the accuracy reached by 99% of frames, so only the most difficult 1% of frames fall below it. It measures each method's limit in extremely complex scenarios.
Except for AdaSch, all other methods have an accuracy of 0 at the 99th percentile, indicating their difficulty in handling extremely complex scenarios. In contrast, AdaSch still exhibits some detection capability in extreme scenarios, demonstrating its robustness to complex environments.
[0141] First, this embodiment evaluates the impact of model size on inference latency and accuracy. As shown in Figure 5, it considers YOLOv8 models of increasing complexity: YOLOv8N, YOLOv8S, YOLOv8M, and YOLOv8L. As the model grows from YOLOv8S to YOLOv8L, the inference latency and accuracy of all methods increase. This indicates that more complex models improve video analysis accuracy but inevitably incur greater time overhead. Specifically, model changes have the greatest impact on ParaLoupe's latency but the least impact on its accuracy. ParaLoupe performs inference on every extracted RoI, so its latency increases dramatically when a more complex model is used for the same frame. ParaLoupe's accuracy loss is mainly due to the background subtractor failing to accurately identify and extract some key RoI regions, so its accuracy changes relatively little. Gecko's latency grows steadily with model size, increasing on average by approximately 45.21% from YOLOv8N to YOLOv8L. Meanwhile, its accuracy improves steadily and significantly, rising from 35.68% with YOLOv8N to 48.69% with YOLOv8L. This indicates that Gecko maintains a stable balance between accuracy and latency. However, it still cannot avoid a significant increase in latency when using more complex models. When the inference model changes from YOLOv8N to YOLOv8S, the accuracy and latency of FFI change little. However, when the model changes from YOLOv8M to YOLOv8L, its latency increases by 22.86% while its accuracy improves by only 2.89%. The feature-fitting ability of the model has reached a bottleneck: larger, more complex models bring only a small performance improvement at the cost of excessive inference overhead.
In contrast, Distream and the proposed AdaSch are less affected by changes in model size in terms of inference latency and accuracy, with average improvements of approximately 16.99% and 6.48%, respectively. This is because Distream can schedule RoIs according to load changes, avoiding increased latency and decreased accuracy caused by system overload. Compared to Distream, AdaSch adaptively selects between full-frame inference and RoI mode. When model complexity and inference overhead are low, more tasks are processed quickly locally. As model complexity and inference overhead increase, AdaSch adaptively schedules some frames to edge servers to ensure the accuracy and latency requirements of keyframe processing. Simultaneously, AdaSch dynamically senses inference task demands and server resource status, avoiding queuing backlogs and performance degradation caused by increased model complexity. Notably, when the inference model is YOLOv8L, Distream and AdaSch achieve comparable accuracy. In this case, AdaSch's inference latency is 25.48% lower than Distream, demonstrating the superiority of AdaSch's load balancing mechanism in reducing latency.
[0142] Next, this embodiment replaces YOLOv8S with FastRCNN to test the performance of different methods. As shown in Figure 6, ParaLoupe uses only RoI-based inference and fails to complete many RoIs within the maximum tolerable latency in target-dense scenarios. Its SAR and F1-score are 65.20% and 19.02%, respectively. Full-frame inference on the edge cluster can improve accuracy, but FFI incurs excessive system overhead and prevents many tasks from completing in time. Its SAR and F1-score are only 41.44% and 11.35%, respectively. Gecko shows good task-processing timeliness in this scenario, achieving a SAR of 76.80%, only slightly lower than AdaSch. However, its F1-score is only 19.02%. Although Gecko completes tasks quickly, its inference accuracy in target-dense scenarios is not outstanding and is significantly lower than that of Distream and AdaSch. Distream significantly improves its response speed by introducing a resource-aware RoI scheduling strategy, achieving SAR and F1-score of 74.89% and 32.56%, respectively. In comparison, AdaSch achieves SAR and F1-score of 78.00% and 37.80%, respectively, surpassing all other methods. Furthermore, this embodiment replaces the UA-DETRAC dataset with the MOT15 dataset. As shown in Figure 7, the target density in MOT15 is lower than in UA-DETRAC, so the F1-scores of all methods generally improve. In this case, the performance gap between FFI, Distream, and the proposed AdaSch narrows significantly. AdaSch still achieves the best SAR and F1-score, reaching 80.01% and 71.62%, respectively.
[0143] To quantify the contribution of core components in AdaSch, this embodiment removes the following components to evaluate their impact on AdaSch performance.
[0144] (1) Without FI (Frame Inference): the full-frame inference mode is removed, and only RoI inference is executed.
[0145] (2) Without RI (RoI Inference): the RoI inference mode is removed, and only full-frame inference is performed.
[0146] (3) Without RS (RoI Scheduling): MPC-based RoI scheduling is removed, and all video frames generated by each camera are inferred locally.
[0147] (4) Without LB (Load Balancing): load prediction is removed; scheduling decisions are based only on the current state, and the device with the lowest load is always selected.
[0148] As shown in Figure 8, when FI is removed, AdaSch's latency decreases slightly because it retains only lightweight RoI inference and does not employ full-frame inference. However, its F1-score drops significantly, indicating that RoI inference alone is insufficient to maintain high accuracy across diverse video scenarios. When RI is removed, AdaSch employs only full-frame inference and its F1-score also decreases, suggesting that RoI-based inference achieves higher analysis accuracy in certain scenarios. At the same time, AdaSch's latency rises to 206 ms, confirming that RoI-based inference substantially improves the system's real-time performance. When RS is removed, AdaSch exhibits both increased latency and decreased accuracy. This is because the proposed MPC-based RoI scheduling method predicts RoI sequences and optimizes through rolling decisions, scheduling RoIs to suitable devices and thereby reducing RoI latency. Without this prediction and optimization mechanism, the inference latency of some RoIs may exceed the maximum tolerable latency, so these RoIs yield no inference results, which reduces accuracy. Likewise, removing load balancing (LB) increases latency and decreases accuracy. Without the load-balance-aware edge inference task scheduling mechanism, AdaSch can schedule only on the basis of the current edge cluster load and cannot balance the system load according to inference complexity. Consequently, video frames and RoIs cannot be scheduled to the most suitable devices, resulting in increased latency and decreased accuracy.
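The load-balance-aware scheduling whose removal is evaluated above ends, per claim 8, with a rounding step that turns a relaxed (fractional) allocation into one server per task, and it is judged by the imbalance degree max(max load / mean load − 1, 0). The following is a minimal Python sketch of those two pieces under stated assumptions: the convex solver is omitted and its output is taken as given (each task's row of fractional server shares is non-negative and sums to 1), rounding is done by standard randomized rounding (sample one server with probability equal to its share), and per-task costs are hypothetical scalars. The function names are illustrative, not taken from AdaSch.

```python
import random
from typing import List, Sequence


def randomized_round(fractional: Sequence[Sequence[float]], rng: random.Random) -> List[int]:
    """Map each task's fractional server shares to a single server index.

    fractional[i][j] is the relaxed (continuous) share of task i on server j;
    each row is assumed to be non-negative and to sum to 1.
    """
    assignment = []
    for shares in fractional:
        # Sample server j with probability proportional to shares[j]
        j = rng.choices(range(len(shares)), weights=shares, k=1)[0]
        assignment.append(j)
    return assignment


def server_loads(assignment: Sequence[int], costs: Sequence[float], n_servers: int) -> List[float]:
    """Sum the (hypothetical) per-task costs placed on each server."""
    loads = [0.0] * n_servers
    for task, server in enumerate(assignment):
        loads[server] += costs[task]
    return loads


def imbalance(loads: Sequence[float]) -> float:
    """Imbalance degree as worded in claim 8: max(max_load / mean_load - 1, 0)."""
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 0.0
    return max(max(loads) / mean - 1.0, 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    frac = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical relaxed solution
    costs = [3.0, 3.0, 2.0]
    a = randomized_round(frac, rng)
    print(a, imbalance(server_loads(a, costs, 2)))
```

In this toy input the third task lands on either server, and in both cases the loads are {5, 3}, giving an imbalance of 0.25. A common refinement is to draw several roundings and keep the least-imbalanced one; whether AdaSch does this is not stated.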
[0149] First, this embodiment evaluates the impact of the accuracy-latency coefficient used in inference mode selection on the performance of AdaSch. As shown in Figure 9(a), as the coefficient increases, the F1-score gradually increases and peaks near 0.6 (approximately 59.1%), while latency keeps increasing. This is because, with a larger coefficient, AdaSch selects a more accurate model to perform inference, which increases both the accuracy and the latency of video analysis. When the coefficient increases from 0.6 to 0.8, accuracy plateaus because of model performance limitations, while latency continues to grow. Therefore, setting the coefficient to 0.6 achieves a better balance between accuracy and latency. Next, this embodiment evaluates the impact of the maximum tolerable latency λ on AdaSch. As shown in Figure 9(b), both the average accuracy and the average latency of the system gradually increase with λ. This is because the MPC-based RoI scheduler keeps scheduling frames that have not yet been analyzed until inference completes or the maximum tolerable latency is reached. As λ increases, more frames can be analyzed, which improves overall accuracy. However, since more time is allowed to complete analysis, more video frames may accumulate, which increases system latency. λ thus balances the system's real-time performance against its accuracy, and its value can be chosen according to the actual performance of the devices to make full use of AdaSch.
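Claim 3 describes choosing the inference mode whose weighted sum of predicted accuracy and SAR is higher, and the coefficient evaluated in Figure 9(a) plausibly plays the role of that weight. The sketch below illustrates the selection rule only, under these assumptions: the coefficient `alpha` weights predicted accuracy and `1 - alpha` weights predicted SAR, the predictions are supplied directly (the pre-trained SGD regressor of claim 3 is not reproduced), and ties default to RoI mode. `ModePrediction`, `mode_score`, and `select_mode` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ModePrediction:
    """Predicted performance of one inference mode (both values in [0, 1])."""
    accuracy: float  # e.g. predicted F1-score
    sar: float       # predicted service-level-target satisfaction rate


def mode_score(pred: ModePrediction, alpha: float) -> float:
    """Assumed weighted sum: alpha weights accuracy, (1 - alpha) weights SAR."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * pred.accuracy + (1.0 - alpha) * pred.sar


def select_mode(full_frame: ModePrediction, roi: ModePrediction, alpha: float = 0.6) -> str:
    """Pick the mode with the higher weighted score; ties go to RoI mode (assumption)."""
    if mode_score(full_frame, alpha) > mode_score(roi, alpha):
        return "full_frame"
    return "roi"


if __name__ == "__main__":
    ff = ModePrediction(accuracy=0.70, sar=0.55)  # hypothetical predictions
    ri = ModePrediction(accuracy=0.50, sar=0.90)
    for a in (0.2, 0.6, 0.8):
        print(a, select_mode(ff, ri, alpha=a))
```

With these hypothetical predictions, α = 0.2 and α = 0.6 select RoI mode, while α = 0.8 switches to full-frame inference, which is consistent with the observation that a larger coefficient trades latency for accuracy.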
[0150] This embodiment evaluates the scalability of AdaSch at different system scales, increasing the number of cameras from 2 to 6 and configuring different numbers of GPU servers. As shown in Figure 10, when the number of servers is fixed and the number of cameras increases, system latency increases significantly: more cameras generate more inference tasks, while fewer resources are available to each task. When the number of cameras is fixed and the number of servers increases, system latency decreases significantly, because AdaSch can distribute tasks across more edge servers. Note that the average latency in the 4+2 configuration is higher than in the 2+1 configuration because the added servers (RTX4060) are less powerful than the original servers (RTX3090). Overall, AdaSch maintains high performance under different device configurations, and device combinations affect accuracy less than latency. As the number of cameras increases, AdaSch's accuracy decreases slightly: more video frames must be processed, server load rises, and frames whose inference cannot be completed within the maximum tolerable latency are skipped, lowering accuracy. Even so, AdaSch maintains overall accuracy through reasonable scheduling. In scenarios with many targets in particular, AdaSch dynamically allocates full-frame and RoI tasks to achieve good load balancing among multiple servers in the edge cluster.
[0151] Table 3 Time Costs of Different Components in AdaSch
[0152]
[0153] This embodiment measures the execution time of each AdaSch module, as shown in Table 3; the overall system overhead is the sum of these components. Specifically, in a scenario with 4 devices and 2 edge servers, RoI extraction and inference take 38 ms and 143 ms, respectively, indicating that RoI extraction adds little time to video analysis while efficiently extracting key regions for subsequent analysis. Inference mode selection based on RoI features takes 0.62 ms, confirming that AdaSch selects the inference mode quickly. MPC-based RoI scheduling takes 0.04 ms, indicating that AdaSch can quickly dispatch the RoI sequences extracted by the cameras to appropriate devices for analysis. This is because AdaSch predicts the task arrival sequence directly from historical RoI sequence changes and solves for the optimal scheduling decision, rather than relying on complex DNN models. Load prediction based on the self-attention mechanism and task scheduling based on randomized rounding take 9.65 ms and 3.68 ms, respectively. These components run on the edge servers, whose more abundant computing resources allow fast execution. Therefore, AdaSch's components incur very low online time overhead and impose no significant additional burden on the edge video analytics system.
[0154] In summary, this invention proposes AdaSch, a novel load-adaptive scheduling framework for accelerating real-time edge video analytics with diverse inference modes. AdaSch comprises an inference mode selection mechanism based on RoI features, an MPC-based RoI scheduling method, and a load-balance-aware edge inference task scheduling strategy, which together improve the accuracy and real-time performance of edge video analytics systems. Extensive experiments on real-world testbeds and multiple video datasets validate the effectiveness and superiority of AdaSch. Compared with state-of-the-art methods, AdaSch achieves average improvements of 19.20% in accuracy and 42.94% in SAR. Furthermore, AdaSch adapts well and stably under different maximum tolerable latency and accuracy-latency tradeoff settings, achieving effective accuracy-latency tradeoffs and dynamically adjusting its inference strategy across scenarios. In addition, AdaSch scales well across system sizes, and its components add only minimal time overhead to the video analytics system. Future work will consider extending AdaSch to other visual analysis tasks, such as instance segmentation, and exploring more efficient load balancing in larger-scale video analysis systems.
[0155] Finally, a specific implementation of the present invention is provided:
[0156] (1) The system first performs foreground modeling on the real-time video stream from the camera, extracts motion regions through the background subtraction module, and generates a candidate RoI set. Subsequently, a lightweight feature extractor quickly captures structural and semantic features from the current video frame, while simultaneously calling a pre-trained accuracy prediction model to evaluate the accuracy and latency performance of both whole-frame inference and RoI inference modes in real time. The system automatically selects the optimal inference mode based on the prediction results and determines whether the video frame should be inferred on the local camera, collaborative camera, or edge server.
[0157] (2) After the inference mode is determined, the system enters the MPC-based RoI scheduling phase. The scheduler uses the shape and temporal changes of historical RoIs to make short-term predictions of the number and distribution of future RoIs, and constructs a sequence of RoI tasks for multiple future time slots through rolling optimization. The system then establishes a long-term scheduling optimization model whose core indicators are the number of completed tasks and the waiting time, and calls a linear programming solver to generate the optimal RoI allocation across cameras and edge servers.
[0158] (3) For inference tasks to be offloaded to edge servers, the system monitors the load status of the edge cluster in real time and uses a probabilistic sparse self-attention mechanism to predict short-term load fluctuations. Based on the resource requirements of the tasks and the server status, the system then builds a relaxed convex optimization scheduling model, for which a convex optimization solver computes the theoretically optimal load-balancing allocation. This continuous result is then rounded into a directly executable scheduling strategy, achieving efficient use of edge cluster resources.
[0159] (4) After the inference task is completed, the system continuously records key statistical information during runtime, including mode selection results, scheduling decisions, task completion status, and edge load changes. This information will be used to subsequently optimize the inference mode selection model, MPC scheduling model, and load predictor, enabling the system to maintain stable performance gains during long-term operation.
[0160] (5) Throughout the process, AdaSch aims at real-time video analysis and continuously adjusts the inference execution method among cameras, collaborative nodes, and edge servers to balance inference accuracy, latency, and resource utilization. The system continuously provides efficient, intelligent, and stable inference services for multi-camera video streams without human intervention.
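Step (1) above turns the foreground contours of a frame into a scene-complexity feature vector, and claim 2 enumerates its five components. The following minimal Python sketch computes those five features under explicit assumptions: contours are given as polygons (lists of (x, y) vertices), since extracting them from a background-subtraction mask (for example with OpenCV's `cv2.findContours`) is omitted; foreground coverage is approximated by the sum of contour areas, i.e. contours are assumed not to overlap; circularity is taken as 4πA/P²; and convexity is taken as contour area divided by convex-hull area (a quantity some libraries call solidity). `roi_features` and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(poly: Sequence[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, list(poly[1:]) + [poly[0]]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def perimeter(poly: Sequence[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(poly, list(poly[1:]) + [poly[0]]))


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone-chain convex hull (counter-clockwise)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def roi_features(contours: Sequence[Sequence[Point]], frame_w: int, frame_h: int) -> List[float]:
    """Five scene-complexity features (assumed formulas, see lead-in)."""
    # Drop degenerate contours so that areas, perimeters and hull areas are positive
    contours = [c for c in contours if len(c) >= 3 and polygon_area(c) > 0]
    if not contours:
        return [0.0] * 5
    areas = [polygon_area(c) for c in contours]
    total = sum(areas)
    circ = [4 * math.pi * a / perimeter(c) ** 2 for a, c in zip(areas, contours)]
    conv = [a / polygon_area(convex_hull(c)) for a, c in zip(areas, contours)]
    return [
        float(len(contours)),           # number of candidate RoIs
        total / (frame_w * frame_h),    # foreground coverage ratio
        len(circ) / sum(circ),          # inverse of the mean circularity
        sum(conv) / len(conv),          # mean convexity (area / hull area)
        max(areas) / total,             # share of the largest contour
    ]
```

For a single 10×10 square in a 100×100 frame, this yields one RoI, 1% coverage, an inverse circularity of 4/π, and convexity and largest-share values of 1.0.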
[0161] Based on the same inventive concept, this invention also provides a computer device, comprising: one or more processors, and a memory for storing one or more computer programs; the programs include program instructions, and the processor executes the program instructions stored in the memory. The processor may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. It is the computing and control core of the terminal, used to implement one or more instructions, specifically for loading and executing one or more instructions stored in a computer storage medium to implement the above-described method.
[0162] It should be further explained that, based on the same inventive concept, the present invention also provides a computer storage medium storing a computer program, which, when executed by a processor, performs the above-described method. This storage medium can be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In the present invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
[0163] It should be noted that, unless otherwise defined, the technical or scientific terms used in this invention should have the ordinary meaning understood by one of ordinary skill in the art to which this invention pertains. The terms "first," "second," and similar terms used in this invention do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.
[0164] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.
[0165] This invention is not limited to the preferred embodiment described above. Anyone inspired by this invention can derive other forms of load balancing adaptive scheduling methods for accelerating real-time edge video analysis. All equivalent changes and modifications made within the scope of the claims of this invention shall fall within the scope of this invention.
Claims
1. A load-balancing adaptive scheduling method for accelerating real-time video analytics at the edge, characterized in that it comprises: for the current video frame captured by a camera, performing foreground motion region extraction by background subtraction to generate a candidate RoI set, and extracting a corresponding feature vector that characterizes the scene complexity of the video frame; based on the feature vector, predicting and evaluating, by a pre-trained performance prediction model, the inference accuracy and service level target satisfaction rate of a full-frame inference mode and an RoI inference mode, and adaptively selecting the inference mode with the best overall performance; if the full-frame inference mode is selected, offloading the current video frame to the edge cluster as an inference task; if the RoI inference mode is selected, taking the current candidate RoI set as a sequence of RoI tasks to be scheduled; for the RoI task sequence to be scheduled, based on a model predictive control framework, predicting the RoI tasks of a preset number of future time slots from the historical RoI task sequence and, combined with the real-time resource status of each device, constructing and solving a scheduling optimization problem with a rolling optimization mechanism to generate allocation decisions for the RoI tasks among the local camera, collaborative cameras, and the edge cluster, and completing task distribution; and, for all inference tasks offloaded to the edge cluster, based on time-series predictions of the load of each edge server in the cluster and with the optimization objective of minimizing cluster load imbalance, generating load-balancing allocation decisions among the servers in the cluster and distributing the inference tasks to the corresponding servers for execution.
2. The load balancing adaptive scheduling method for accelerating real-time edge video analysis according to claim 1, characterized in that: The feature vector used to characterize the scene complexity of a video frame includes 5-dimensional RoI features, namely the number of candidate RoIs, the ratio of the total area covered by all foreground motion regions to the total area of the frame, the inverse of the average circularity of all dynamic contours, the average convexity of all foreground contours, and the ratio of the area of the largest single foreground contour to the total area of all foreground contours.
3. The load balancing adaptive scheduling method for accelerating real-time edge video analysis according to claim 1, characterized in that: When predicting and evaluating the overall performance of the two inference modes, the weighted sum of inference accuracy and service level target satisfaction rate is used as the overall performance evaluation index, and the inference mode with the higher weighted sum is selected as the inference mode with the best overall performance; the performance prediction model is a pre-trained lightweight stochastic gradient descent regressor, and the feature vector extraction and inference mode selection processes are both completed locally on the camera that acquires the video frames.
4. The load balancing adaptive scheduling method for accelerating real-time edge video analysis according to claim 1, characterized in that: The scheduling optimization problem based on the model predictive control framework negatively weights the task scheduling decision with the task waiting time as the weighting factor, and takes minimizing the total scheduling cost after weighting as the optimization objective, so that RoI tasks with longer waiting times get higher scheduling priority.
5. The load balancing adaptive scheduling method for accelerating real-time edge video analytics according to claim 4, characterized in that: The scheduling optimization problem includes two types of constraints: the first type is the uniqueness constraint: each RoI task can be assigned to at most one device for execution; the second type is the device computing power constraint: the total latency of a single device processing all assigned tasks in a single time slot does not exceed the time slot length.
6. The load balancing adaptive scheduling method for accelerating real-time edge video analytics according to claim 4, characterized in that: The solution and execution of the scheduling optimization problem adopts a rolling optimization and feedback correction mechanism. Specifically, the scheduling optimization problem is reconstructed based on the latest task queue and device load status in each time slot, and only the allocation decision of the current time slot is executed. The load status of the target device is updated for the scheduled tasks, and the waiting time of the unscheduled tasks is accumulated and added to the RoI task sequence to be scheduled in the next time slot.
7. The load balancing adaptive scheduling method for accelerating real-time edge video analysis according to claim 1, characterized in that: the time-series prediction of edge server load is achieved by a Transformer time-series model based on a probabilistic sparse self-attention mechanism and a self-attention distilling structure. The model performs feature extraction through one-dimensional convolution, an activation function, and max pooling, reducing the computational complexity of load prediction from quadratic in the input sequence length L to O(L log L).
8. The load balancing adaptive scheduling method for accelerating real-time edge video analytics according to claim 1, characterized in that: When generating load balancing allocation decisions, the binary scheduling decision variables are first relaxed into continuous variables within a continuous interval. A convex optimization problem is constructed and solved to obtain the theoretically optimal allocation result. Then, the continuous variables are converted into binary scheduling decision variables through a random rounding algorithm to generate an executable allocation decision. The load imbalance degree of the edge cluster is defined as: max(the ratio of the maximum server load to the average load minus 1, 0).
9. A load-balancing adaptive scheduling system for accelerating real-time edge video analytics, characterized in that the system comprises multiple cameras and an edge server cluster and is configured to perform the method of any one of claims 1 to 8.
10. The load-balancing adaptive scheduling system for accelerating real-time edge video analysis according to claim 9, characterized in that the cameras are configured to perform foreground motion region extraction, feature vector extraction, and inference mode selection; the edge server cluster is configured to perform time-series load prediction and load-balancing scheduling of inference tasks; and, through the collaboration of the local camera, the cooperative cameras, and the edge server cluster, the system enables inference tasks to meet preset inference latency constraints and cluster load balancing requirements.
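Claims 4 to 6 describe the per-slot RoI scheduling problem: tasks that have waited longer receive higher priority, each task is assigned to at most one device, a device's total latency in a slot may not exceed the slot length, and in each slot only the current decisions are executed while unscheduled tasks carry over with accumulated waiting time. The following minimal Python sketch illustrates that receding-horizon loop under stated assumptions. It replaces the linear programming solver with a plainly greedy heuristic (longest-waiting task first, placed on the feasible device with the smallest resulting slot usage), so it is not guaranteed to be optimal. `forecast_counts` is a naive moving-average stand-in for the RoI prediction step and is not wired into the greedy step; a full MPC formulation would plan over the forecast horizon. All names, device labels, and latencies are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class RoITask:
    task_id: int
    latency: Dict[str, float]  # estimated processing latency per device (ms)
    wait: int = 0              # slots this task has already waited


def forecast_counts(history: Sequence[int], horizon: int, window: int = 3) -> List[float]:
    """Naive moving-average forecast of RoI arrivals for the next `horizon` slots."""
    recent = list(history[-window:]) or [0]
    avg = sum(recent) / len(recent)
    return [avg] * horizon


def schedule_slot(queue: List[RoITask], devices: Sequence[str],
                  slot_ms: float) -> Tuple[Dict[int, str], List[RoITask]]:
    """Greedy stand-in for the per-slot optimization (not an LP solver).

    Longer-waiting tasks are considered first, mimicking a waiting-time-weighted
    objective; each task goes to at most one device (uniqueness), and a device's
    summed latency in the slot may not exceed slot_ms (computing-power constraint).
    """
    used = {d: 0.0 for d in devices}
    assigned: Dict[int, str] = {}
    leftover: List[RoITask] = []
    for task in sorted(queue, key=lambda t: -t.wait):  # stable sort keeps arrival order on ties
        fits = [d for d in devices if used[d] + task.latency[d] <= slot_ms]
        if fits:
            best = min(fits, key=lambda d: used[d] + task.latency[d])
            used[best] += task.latency[best]
            assigned[task.task_id] = best
        else:
            task.wait += 1  # feedback: carry over with accumulated waiting time
            leftover.append(task)
    return assigned, leftover


def run(arrivals: Sequence[List[RoITask]], devices: Sequence[str],
        slot_ms: float) -> Tuple[List[Dict[int, str]], List[RoITask]]:
    """Receding horizon: re-plan every slot and execute only that slot's decisions."""
    queue: List[RoITask] = []
    log: List[Dict[int, str]] = []
    for new_tasks in arrivals:
        queue = queue + list(new_tasks)  # carried-over tasks first, then new arrivals
        assigned, queue = schedule_slot(queue, devices, slot_ms)
        log.append(assigned)
    return log, queue
```

For example, with two devices whose per-task latencies are 20 ms and 10 ms and a 30 ms slot, five simultaneous tasks cannot all fit; the fifth is deferred, and in the next slot it is scheduled ahead of the new arrivals because its waiting time is higher.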