Methods for hot model updates and intelligent switching in a cloud-edge collaborative architecture
The cloud platform detects changes in the model version list and generates update instructions, edge nodes cache and load model files locally, and the running status is monitored in real time. This solves the problem of dynamic hot updates and intelligent switching of model versions under a cloud-edge collaborative architecture, achieves efficient updates and switching without service interruption, and improves the efficiency and reliability of model deployment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHONGNAN TRANSPORT
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-30
AI Technical Summary
Under the cloud-edge collaborative architecture, existing technologies struggle to achieve dynamic hot updates and intelligent switching of model versions, resulting in service interruptions during model updates and an inability to adapt to real-time changing operating environments, thus reducing the efficiency of model deployment.
The cloud platform detects changes to the model version list, generates update instructions, and transmits them to the target edge nodes. The edge nodes cache the model update files locally and load them. The running status of the new model version is monitored in real time and compared with baseline data to detect abnormalities, so that models are switched without interrupting service.
It achieves efficient dynamic updates and intelligent switching of model versions, overcomes the service interruption problem of existing technologies, ensures the continuity of model deployment, and improves model deployment efficiency and reliability in cloud-edge collaborative scenarios.
Figure CN121807347B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of model update and switching technology, and in particular to a method for hot model update and intelligent switching under a cloud-edge collaborative architecture.
Background Technology
[0002] In actual deployments of a cloud-edge collaborative architecture, the model is the core execution unit and must be continuously optimized and upgraded as business needs and data distributions change, so updating the model version is critical to system performance. However, existing technologies struggle to achieve dynamic hot updates and intelligent switching of model versions in cloud-edge collaborative scenarios, and have several significant shortcomings. Traditional model update methods tightly couple model loading to the service process. When a model version must be updated, the service process on the edge node is stopped, the old model is unloaded and the new model is loaded after in-flight requests have been processed, and the service process is then restarted to resume receiving requests. This typically leaves the service unavailable for several minutes, causing many request failures in high-concurrency, real-time scenarios and severely affecting business continuity. Furthermore, existing technologies lack an effective intelligent model switching mechanism: they cannot adapt the timing and method of switching to the real-time operating status, resource load, and network conditions of the edge nodes. As a result, model deployment cannot adapt to heterogeneous edge nodes or to a changing operating environment, which significantly reduces deployment efficiency and fails to meet the need for efficient model iteration and stable operation in cloud-edge collaborative architectures.
[0003] Chinese Patent Publication No. CN117111981A discloses a cloud-edge collaborative model update method and system. The method includes: receiving multiple initial client model parameters and multiple local sample features sent by multiple clients; grouping the multiple clients based on the similarity between the multiple local sample features; aggregating the initial client model parameters corresponding to clients in the same group to obtain multiple target client model parameters corresponding to the multiple clients; and sending the multiple target client model parameters to their corresponding clients, wherein the target client model parameters are configured to update their corresponding initial client model parameters. This solution only groups and aggregates client model parameters before sending them to complete parameter-level updates, making it difficult to achieve dynamic hot updates and intelligent switching of model versions in cloud-edge collaborative scenarios. This results in service interruptions during the model update process and an inability to adapt to real-time changing operating environments, reducing the efficiency of model deployment.
Summary of the Invention
[0004] To address this issue, the present invention provides a method for hot updating and intelligent switching of models under a cloud-edge collaborative architecture, which overcomes the problem in the prior art that it is difficult to achieve dynamic hot updating and intelligent switching of model versions in cloud-edge collaborative scenarios, resulting in service interruption during the model update process and inability to adapt to real-time changing operating environments, thus reducing the efficiency of model deployment.
[0005] To achieve the above objectives, this invention provides a method for hot model updates and intelligent switching under a cloud-edge collaborative architecture, comprising the following steps:
[0006] S1. Perform change detection on the stored model version list through the cloud platform, obtain the change detection result, generate an update instruction containing the target updated model version identifier based on the change detection result, and transmit the update instruction to the target edge node;
[0007] S2. Based on the update instruction, retrieve the model update file corresponding to the target updated model version identifier from the model repository, and store the model update file in the local cache area of the target edge node;
[0008] S3. Load the model update file in the local cache area to obtain the model file in the state to be switched;
[0009] S4. While the current model version is running on the target edge node, switch the current model version to the new model version corresponding to the model file in the state to be switched;
[0010] S5. Monitor the real-time running status of the new model version through the target edge node to obtain actual model running status data;
[0011] S6. Compare the actual model running status data with the benchmark model running status data, and determine whether the running status of the new model version is abnormal based on the comparison result. When the running status of the new model version is abnormal, switch the new model version back to the current model version.
[0012] Compared with the prior art, the beneficial effects of this application are as follows:
[0013] The cloud platform detects changes to the model version list and generates an update instruction containing the identifier of the target updated model version, which is then transmitted to the target edge node. Based on this update instruction, the target edge node retrieves the corresponding model update file from the model repository and stores it in its local cache area. The model update file in the local cache area is then loaded to obtain the model file in the state to be switched. Throughout this process the current model version on the target edge node keeps running without service interruption, which overcomes the service interruption caused by the lack of dynamic hot updates in existing technologies. Pre-storing and loading the file in the local cache area reduces data transmission latency and resource consumption during the model update and improves model deployment efficiency. This overcomes the reduced deployment efficiency caused by unreasonable update processes in existing technologies and achieves efficient dynamic updates of model versions in cloud-edge collaborative scenarios.
[0014] By switching to the new model version while the current model version is running on the target edge node, and simultaneously monitoring the real-time running status of the new model version to obtain actual model running status data, the actual model running status data is compared with the baseline model running status data to determine if there is an anomaly. If an anomaly is found, the system switches back to the current model version, thus achieving intelligent model version switching without interrupting business operations. At the same time, through running status monitoring and anomaly switchback mechanisms, the system ensures that the new model version is compatible with the real-time running environment of the edge node. When the new model version cannot adapt to the environment and causes anomalies, it can quickly recover to the stable current model version. This overcomes the problem that existing technologies cannot adapt to real-time changing running environments and reduce deployment reliability, significantly improving the efficiency and reliability of model deployment in cloud-edge collaborative scenarios.
[0015] Furthermore, in step S1, the stored model version list is subjected to change detection through the cloud platform to obtain the change detection results, specifically including the following steps:
[0016] Step A01: Obtain the current storage status of the model version list, wherein the current storage status includes the version identifier and storage timestamp of each model version in the model version list;
[0017] Step A02: Calculate the difference metric D between the current storage state and the baseline version list in the cloud platform. The mathematical expression for D is defined in terms of the following quantities: n, the total number of model versions; the version identifier of the i-th model version in the baseline version list; the version identifier of the i-th model version in the current storage state; a matching function for version identifiers, which outputs zero when the two identifiers are consistent and a non-zero value otherwise; the weight coefficient of the i-th model version; the difference between the current storage timestamp and the baseline timestamp in the baseline version list; and a weighting coefficient used for adjustment;
[0018] Step A03: Compare the difference measurement value D with the preset change threshold D0. Based on the comparison result, judge the change detection result. When D > D0, the change detection result is judged as a changed state. When D ≤ D0, the change detection result is judged as an unchanged state.
[0019] This solution combines the matching verification of each version identifier in the model version list with the storage timestamp difference, introduces a differentiated weight coefficient to construct a quantitative difference metric, accurately calculates the actual degree of change in the list, and determines the change status of the list based on a preset change threshold. This avoids the one-sidedness of single-dimensional judgment, effectively avoids the problem of false detection and missed detection of version list changes, improves the accuracy and objectivity of model version list change detection, and can efficiently and reliably identify the true change status of the list.
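As an illustration of steps A01 to A03, the following Python sketch computes a difference metric and applies the threshold test. The exact expression for D is not reproduced in the text, so the form used here (a per-version weighted sum of an identifier mismatch term and a scaled timestamp difference) is only one reading consistent with the listed quantities. The names `VersionEntry`, `difference_metric`, `detect_change`, and `time_weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VersionEntry:
    version_id: str   # version identifier of one model version
    stored_at: float  # storage timestamp, in seconds


def difference_metric(
    baseline: Sequence[VersionEntry],
    current: Sequence[VersionEntry],
    weights: Sequence[float],
    time_weight: float,
) -> float:
    """Assumed form: D = sum_i w_i * (mismatch_i + time_weight * |dt_i|)."""
    total = 0.0
    for base, cur, w in zip(baseline, current, weights):
        # Matching function: zero when identifiers agree, non-zero otherwise.
        mismatch = 0.0 if base.version_id == cur.version_id else 1.0
        # Difference between current and baseline storage timestamps.
        dt = abs(cur.stored_at - base.stored_at)
        total += w * (mismatch + time_weight * dt)
    return total


def detect_change(d: float, threshold: float) -> str:
    """Step A03: 'changed' when D > D0, otherwise 'unchanged'."""
    return "changed" if d > threshold else "unchanged"
```

With a baseline of `v1.0` and `v2.0` and a current state in which the second entry has become `v2.1` four seconds later, weights `[1.0, 4.0]` and `time_weight=0.5` give D = 12, which exceeds a threshold of 10 and is reported as changed.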
[0020] Furthermore, in step S1, an update instruction containing the target update model version identifier is generated based on the change detection results, specifically including the following steps:
[0021] Step B01: Parse the change detection result that is determined to be in a changed state to obtain the set of identifiers of the model versions that have changed. Based on this set and the model version dependency relationships recorded on the cloud platform, calculate a change priority score for each changed model version. The mathematical expression for the change priority score of the j-th changed model version is defined in terms of the following quantities: the historical call frequency of the j-th changed model version; its functional criticality coefficient; its version change magnitude; and the normalized weights of the historical call frequency, the functional criticality coefficient, and the version change magnitude, respectively;
[0022] Step B02: Sort the changed model versions by change priority score, select the changed model version with the highest score as the target update model version, and generate an update instruction containing the target update model version identifier.
[0023] In this solution, by parsing the model version identifier corresponding to the changed status, and combining the model version dependency relationship, the solution comprehensively considers the historical call frequency, functional criticality and version change magnitude of the model version to calculate the change priority score. The solution sorts the target update versions according to the score and generates update instructions, avoiding indiscriminate batch updates, accurately locking the core version to be updated, improving the rationality and targeting of update instruction generation, and ensuring the orderly and efficient update of model versions.
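Steps B01 and B02 can be sketched as follows. The weighted-sum form of the score is an assumption inferred from the listed quantities, the default weights are placeholders, the inputs are assumed to be already scaled to the range 0 to 1, and the dependency relationships mentioned in step B01 are left out for brevity. The function names and the instruction layout are hypothetical.

```python
from typing import Dict, Tuple


def priority_score(
    call_freq: float,
    criticality: float,
    magnitude: float,
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),
) -> float:
    """Assumed form: weighted sum of the three (pre-normalized) indicators."""
    w_freq, w_crit, w_mag = weights
    return w_freq * call_freq + w_crit * criticality + w_mag * magnitude


def build_update_instruction(
    changed: Dict[str, Tuple[float, float, float]],
) -> Dict[str, object]:
    """Step B02: pick the changed version with the highest score."""
    if not changed:
        raise ValueError("no changed model versions to rank")
    scores = {vid: priority_score(*vals) for vid, vals in changed.items()}
    target = max(scores, key=scores.get)
    # Hypothetical instruction layout: only the target identifier is required.
    return {"target_version_id": target, "score": scores[target]}
```

For example, `build_update_instruction({"A": (1.0, 0.0, 0.0), "B": (0.0, 1.0, 1.0)})` selects version `B`, because its criticality and change magnitude outweigh the call frequency of `A` under these weights.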
[0024] Furthermore, in step S2, the model update file corresponding to the target updated model version identifier is obtained from the model repository based on the update instruction, specifically including the following steps:
[0025] Step C01: Parse the update instruction to obtain the target update model version identifier and the network location information of the model repository, and initiate an index query request to the model repository according to the target update model version identifier, and obtain the model file fragment list and global hash value corresponding to the target update model version identifier according to the index query request;
[0026] Step C02: Request the model repository to download all data fragments according to the model file fragment list, and reassemble the downloaded data fragments in the local cache area according to the fragment sequence number to obtain the reassembled model file package;
[0027] Step C03: Calculate the actual hash value of the recombined model file package and compare the actual hash value with the global hash value. When the actual hash value matches the global hash value, output the model update file.
[0028] In this solution, the target update model version identifier and the network location information are obtained by parsing the update instruction, the model file fragment list and global hash value are obtained by index query, and the fragments are downloaded and reassembled in order. The integrity of the reassembled file is then verified against the global hash value, which ensures that the obtained model update file is accurate, avoids data loss or errors during transmission and reassembly, and improves the reliability of obtaining the model update file.
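The reassembly and verification in steps C02 and C03 can be sketched as follows. The text does not name a hash algorithm, so SHA-256 from Python's standard `hashlib` is an assumption, as are the function names and the choice to return `None` on a mismatch (the text does not say what happens in that case). Downloading is omitted; fragments are passed in as `(sequence_number, bytes)` pairs.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def reassemble(fragments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Step C02: join downloaded fragments in order of sequence number."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return b"".join(data for _, data in ordered)


def verify_package(package: bytes, global_hash: str) -> bool:
    """Step C03: compare the actual hash with the expected global hash."""
    actual = hashlib.sha256(package).hexdigest()  # assumed algorithm
    return actual == global_hash


def build_update_file(
    fragments: Iterable[Tuple[int, bytes]], global_hash: str
) -> Optional[bytes]:
    """Return the model update file only when the hashes match."""
    package = reassemble(fragments)
    return package if verify_package(package, global_hash) else None
```

Fragments may arrive out of order; sorting by sequence number before joining means a missing or corrupted fragment shows up as a hash mismatch rather than as a silently damaged file.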
[0029] Furthermore, S3 includes the following steps:
[0030] S31. Based on the storage partition information of the local cache area in the target edge node and the volume parameters of the model update file, perform partition occupancy detection on the local cache area to obtain the address range and space parameters of the cache free partition. Based on the address range and space parameters of the cache free partition, allocate storage for the model update file and write the model update file to the corresponding cache free partition to obtain the cache address of the model update file.
[0031] S32. Based on the cache address of the model update file and the preset model loading dependency list, retrieve and load the dependent components of the model update file to obtain the dependent component loading completion signal, and perform a loading readiness assessment on the loading process to obtain the model loading readiness assessment result. The mathematical expression for the model loading readiness G is defined in terms of the following quantities: the number of dependent components that have been loaded; the total number of dependent components required by the model; the amount of main model data that has been loaded; the total amount of main model data; the loading weight of the dependent components; and the loading weight of the main model data.
[0032] S33. Based on the completion signal of dependent component loading and the model loading readiness evaluation result, initialize the model update file to obtain the initialized model file. The initialization configuration includes model input and output interface parameters, model running resource quota and model log output path.
[0033] S34. Perform lightweight pre-run detection on the initialized model file based on the preset model pre-run detection rules, obtain the pre-run detection results, and mark the model file with the pre-run detection result of passing the pre-run detection as the state to be switched, thereby obtaining the model file in the state to be switched. The lightweight pre-run test includes detecting the interface connectivity status and basic running status of the model.
[0034] This solution combines cache partition information and file volume parameters to detect cache partition occupancy and allocate storage, comprehensively evaluates model loading readiness, and conducts lightweight pre-run tests after completing multi-dimensional model initialization configuration. This accurately verifies the model's running status, ensures the adaptability and loading integrity of model file cache storage, avoids loading anomalies, effectively improves the reliability of model update file loading and initialization, and ensures that the online model can run stably.
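The readiness assessment in S32 can be illustrated with a short Python sketch. The form used here, a weighted sum of the loaded fraction of dependent components and the loaded fraction of main model data, is an assumption based on the quantities listed for G; the equal default weights and the name `loading_readiness` are hypothetical.

```python
def loading_readiness(
    deps_loaded: int,
    deps_total: int,
    data_loaded: float,
    data_total: float,
    w_deps: float = 0.5,
    w_data: float = 0.5,
) -> float:
    """Assumed form: G = w_deps * (loaded/total deps) + w_data * (loaded/total data)."""
    if deps_total <= 0 or data_total <= 0:
        raise ValueError("totals must be positive")
    deps_fraction = deps_loaded / deps_total
    data_fraction = data_loaded / data_total
    return w_deps * deps_fraction + w_data * data_fraction
```

Under these assumptions, three of four components and half of the model data loaded give G = 0.625, and G reaches 1.0 only when both the components and the main model data are fully loaded.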
[0035] Furthermore, S4 includes the following steps:
[0036] S41. The health status of the running instance corresponding to the current model version is evaluated by the status monitoring component in the target edge node to obtain the health status of the running instance. When the health status of the running instance meets the preset health status threshold, a switching start signal is generated.
[0037] S42. Load and initialize the model file of the state to be switched according to the switching start signal to obtain the initialized new model version instance, and start the new model version instance in the memory resources of the target edge node based on the parallel deployment method;
[0038] S43. The real-time service request traffic of the current model version is mirrored and copied through the switching execution component in the target edge node to generate mirror traffic data, and the mirror traffic data is input into the new model version instance through the switching execution component.
[0039] S44. Obtain the first output result generated by the current model version when processing the mirrored traffic data, and obtain the second output result generated by the new model version instance when processing the mirrored traffic data. Compare the first output result with the second output result to obtain a consistency comparison result. When the consistency comparison result meets the preset consistency standard, transfer the scheduling weight of the external service request from the current model version to the new model version instance through the switching execution component, and stop the operation of the current model version and release the computing resources occupied by the current model version through the switching execution component.
[0040] In this solution, a switching process is triggered after a health assessment of the current model version instance. The new model version instance is deployed in parallel, and the first output result is compared with the second output result for consistency. When the consistency comparison result meets the preset consistency standard, the traffic scheduling weight transfer is completed and the old version resources are released, realizing a seamless model version switching, avoiding the risk of service interruption during switching, ensuring service continuity, and verifying the validity of the new version to improve the security and reliability of version switching.
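The mirrored-traffic comparison in S43 and S44 can be sketched as follows. The sketch assumes numeric model outputs, measures consistency as the fraction of mirrored requests on which the two versions agree within a tolerance, and treats the weight transfer as all-or-nothing. The text does not fix any of these choices, and all names and thresholds are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[float], float]


def consistency_ratio(
    current_model: Model,
    new_model: Model,
    mirrored_requests: Sequence[float],
    tolerance: float = 1e-6,
) -> float:
    """Fraction of mirrored requests where both versions give the same output."""
    if not mirrored_requests:
        raise ValueError("no mirrored traffic to compare")
    matches = sum(
        1
        for request in mirrored_requests
        if abs(current_model(request) - new_model(request)) <= tolerance
    )
    return matches / len(mirrored_requests)


def routing_weights(ratio: float, required: float = 0.99) -> Tuple[float, float]:
    """Return (current_weight, new_weight) for external service requests."""
    # Transfer the scheduling weight only when the consistency standard is met.
    return (0.0, 1.0) if ratio >= required else (1.0, 0.0)
```

Because the new instance only sees copies of live requests, a failed comparison leaves the current version serving all traffic and no caller is affected.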
[0041] Furthermore, S5 includes the following steps:
[0042] S51. Periodically sample the underlying system resource consumption data of the new model version during runtime using the performance data collector configured on the target edge node to obtain system resource consumption time-series data, wherein the system resource consumption time-series data includes CPU utilization data, memory occupancy data, and disk input/output operation frequency data.
[0043] S52. Analyze the process of the new model version in processing business requests through the performance data collector to obtain business indicator data corresponding to the model inference performance. The business indicator data includes request processing throughput data, average processing latency data for a single request, and confidence distribution data of the model inference results.
[0044] S53. The system resource consumption time series data and business indicator data are time-aligned and correlated by the data fusion processor of the target edge node to obtain fused multi-dimensional time series monitoring data. The fused multi-dimensional time series monitoring data is serialized and encapsulated by the data fusion processor to generate the actual model running status data.
[0045] This solution involves periodically sampling the underlying system resource consumption data during the runtime of the new model version, and simultaneously analyzing the model's business request processing process to obtain corresponding business indicator data. Then, the system resource consumption time-series data and business indicator data are time-aligned, correlated, fused, and serialized for encapsulation. This allows for a comprehensive, accurate, and time-series integration of the model's resource status and inference performance-related data, fully reflecting the model's actual operating status. This effectively avoids the one-sidedness of single-dimensional data monitoring and ensures that the acquired model operating status data is complete and relevant.
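The time alignment and serialization in S53 can be sketched as follows. Alignment by fixed-width time buckets and JSON serialization are assumptions chosen for illustration, as the text does not specify either, and the metric names in the usage note are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, Dict[str, float]]  # (timestamp in seconds, metrics)


def fuse(
    resource_samples: Iterable[Sample],
    business_samples: Iterable[Sample],
    period: float = 1.0,
) -> List[Dict[str, float]]:
    """Align both series on fixed-width time buckets and merge their metrics."""
    buckets: Dict[int, Dict[str, float]] = {}
    for timestamp, metrics in list(resource_samples) + list(business_samples):
        key = int(timestamp // period)  # bucket index for this sample
        buckets.setdefault(key, {}).update(metrics)
    return [{"t": key * period, **buckets[key]} for key in sorted(buckets)]


def serialize(fused: List[Dict[str, float]]) -> str:
    """Encapsulate the fused records as a JSON string."""
    return json.dumps(fused, sort_keys=True)
```

For instance, a CPU sample at 0.2 s and a latency sample at 0.9 s fall into the same one-second bucket and are emitted as a single record, so later comparison against baseline data works on rows that carry both resource and business indicators.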
[0046] Furthermore, S6 includes the following steps:
[0047] S61. Based on the actual model running status data and the benchmark model running status data, calculate the difference value of each corresponding monitoring indicator to obtain the indicator difference dataset. Then calculate the degree of abnormality in the model operation from the indicator difference dataset to obtain the abnormality quantification value E. The mathematical expression for E is defined in terms of the following quantities: m, the total number of monitoring indicators; the actual running value of the k-th monitoring indicator; the baseline running value of the k-th monitoring indicator; and the weight of the k-th monitoring indicator;
[0048] S62. Compare the abnormality quantification value E with the preset abnormality quantification threshold E0, determine the running status of the new model version based on the comparison result, and switch the new model version based on the determination result, wherein:
[0049] When E < E0, the running status of the new model version is determined to be normal, and the new model version is not switched.
[0050] When E≥E0, the running state of the new model version is determined to be abnormal. The new model version is switched, and a version rollback instruction is generated through the decision controller built into the target edge node. The version rollback instruction controls the target edge node to stop routing business requests to the new model version and switches the new model version back to the current model version.
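The decision in S61 and S62 can be sketched as follows. The expression for E is assumed here to be a weighted sum of relative deviations from the baseline, which is only one form consistent with the listed quantities; the threshold rule (normal when E < E0, rollback when E ≥ E0) follows the text. Function and metric names are hypothetical.

```python
from typing import Dict


def abnormality(
    actual: Dict[str, float],
    baseline: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Assumed form: E = sum_k w_k * |actual_k - baseline_k| / |baseline_k|."""
    total = 0.0
    for name, weight in weights.items():
        base = baseline[name]
        if base == 0:
            raise ValueError(f"baseline for {name!r} must be non-zero")
        total += weight * abs(actual[name] - base) / abs(base)
    return total


def decide(e: float, threshold: float) -> str:
    """S62: keep the new version when E < E0, roll back when E >= E0."""
    return "rollback" if e >= threshold else "keep"
```

With equal weights of 0.5, unchanged CPU utilization and a latency that doubles from 10 ms to 20 ms give E = 0.5, which triggers a rollback at a threshold of 0.5.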
[0051] In this solution, by combining the actual running status data of the model with the baseline running status data, the corresponding difference values of each monitoring indicator are calculated and combined with the indicator weights to quantify the degree of abnormality in the model operation. Based on the determination of the degree of abnormality, the model version is switched or rolled back in a targeted manner. This accurately determines the running status of the model, avoids the running risks of abnormal model versions in a timely manner, ensures the stability of business request routing, and effectively ensures the reliable operation of the model.
Attached Figure Description
[0052] Figure 1 is a flowchart illustrating the model hot update and intelligent switching method under the cloud-edge collaborative architecture in an embodiment of the present invention.
Detailed Implementation
[0053] The following detailed description illustrates the specific implementation method:
[0054] As shown in Figure 1, which is a flowchart of the model hot update and intelligent switching method under the cloud-edge collaborative architecture of this invention, the method includes:
[0055] S1. Perform change detection on the stored model version list through the cloud platform, obtain the change detection result, generate an update instruction containing the target updated model version identifier based on the change detection result, and transmit the update instruction to the target edge node;
[0056] S2. Based on the update instruction, retrieve the model update file corresponding to the target updated model version identifier from the model repository, and store the model update file in the local cache area of the target edge node;
[0057] S3. Load the model update file in the local cache area to obtain the model file in the state to be switched;
[0058] S4. While the current model version is running on the target edge node, switch the current model version to the new model version corresponding to the model file in the state to be switched;
[0059] S5. Monitor the real-time running status of the new model version through the target edge node to obtain actual model running status data;
[0060] S6. Compare the actual model running status data with the benchmark model running status data, and determine whether the running status of the new model version is abnormal based on the comparison result. When the running status of the new model version is abnormal, switch the new model version back to the current model version.
[0061] Specifically, in step S1, the stored model version list is checked for changes through a cloud platform to obtain the change detection results. This includes the following steps:
[0062] Step A01: Obtain the current storage status of the model version list, wherein the current storage status includes the version identifier and storage timestamp of each model version in the model version list;
[0063] Step A02: Calculate the difference metric D between the current storage state and the baseline version list in the cloud platform. The mathematical expression for D is defined in terms of the following quantities: n, the total number of model versions; the version identifier of the i-th model version in the baseline version list; the version identifier of the i-th model version in the current storage state; a matching function for version identifiers, which outputs zero when the two identifiers are consistent and a non-zero value otherwise; the weight coefficient of the i-th model version; the difference between the current storage timestamp and the baseline timestamp in the baseline version list; and a weighting coefficient used for adjustment;
[0064] Step A03: Compare the difference measurement value D with the preset change threshold D0. Based on the comparison result, judge the change detection result. When D > D0, the change detection result is judged as a changed state. When D ≤ D0, the change detection result is judged as an unchanged state.
[0065] In this embodiment, the model version list refers to the structured record of model version information stored in the cloud platform. The model version list is a snapshot of the current storage state, used to describe the actual existence of each model version in the storage system. The baseline version list refers to a reference benchmark for the model version list, either pre-set or historically saved in the cloud platform. The baseline version list serves as a comparison point to detect whether the current storage state has deviated significantly. The preset change threshold is a pre-set critical value used to determine whether the difference metric has reached a significant change level. The value of the preset change threshold is usually determined based on business tolerance, version stability requirements, or historical data statistical analysis. For example, the preset change threshold D0 can be set to 10. When the calculated difference metric D exceeds the preset change threshold D0, the model version list is determined to be in a changed state; otherwise, it is determined to be in an unchanged state.
[0066] The specific implementation process for obtaining the current storage status of the model version list includes: First, through the storage management interface or database query function of the cloud platform, scan the files or metadata records of all model versions in the storage system, extract the version identifier of each scanned model version, which is contained in the naming convention, attribute tags or metadata fields of the model file. At the same time, obtain the last modification time or upload time of the model version from the logs or file attributes of the storage system, and record the last modification time or upload time of the model version as a storage timestamp. The cloud platform summarizes the version identifiers of all extracted model versions with the corresponding storage timestamps to form a model version list.
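The scan described above can be sketched for the simple case in which model versions are stored as files in one directory. The sketch assumes that the version identifier is encoded in the file name and that the file's last modification time serves as the storage timestamp; a real cloud platform would more likely query its storage management interface or a database. The function name and record layout are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Union


def build_version_list(model_dir: Union[str, Path]) -> List[Dict[str, object]]:
    """Collect (version identifier, storage timestamp) for each model file."""
    entries: List[Dict[str, object]] = []
    for path in sorted(Path(model_dir).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories in this simplified layout
        entries.append(
            {
                # Assumed naming convention: "<version_id>.<extension>".
                "version_id": path.stem,
                # Last modification time, used as the storage timestamp.
                "stored_at": path.stat().st_mtime,
            }
        )
    return entries
```

The resulting list has the same shape as the model version list described above, one identifier and one timestamp per model version, and can be compared against a saved baseline list.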
[0067] Specifically, in step S1, an update instruction containing a target update model version identifier is generated based on the change detection results, which includes the following steps:
[0068] Step B01: Parse the change detection result that is determined to be in a changed state to obtain the set of identifiers of the model versions that have changed. Based on this set and the model version dependency relationships recorded on the cloud platform, calculate a change priority score for each changed model version. The mathematical expression for the change priority score of the j-th changed model version is defined in terms of the following quantities: the historical call frequency of the j-th changed model version; its functional criticality coefficient; its version change magnitude; and the normalized weights of the historical call frequency, the functional criticality coefficient, and the version change magnitude, respectively;
[0069] Step B02: Sort the changed model versions by change priority score, select the changed model version with the highest score as the target update model version, and generate an update instruction containing the target update model version identifier.
[0070] In this embodiment, model version dependency refers to the calling or functional association rules between different model versions recorded by the cloud platform. For example, in an image processing pipeline, the output of model version "V3.1-Face Detection" is a required input for model version "V2.5-Emotion Recognition." The cloud platform's records will show a dependency indicating that "V2.5-Emotion Recognition" depends on "V3.1-Face Detection." When "V3.1-Face Detection" changes, this model version dependency is used to assess how far the impact of the change propagates. The normalized weight of the historical call frequency, $w_F$, is the coefficient applied to the standardized historical call frequency; it balances the dimensions and importance of the different indicators. For example, if the historical call frequency ranges from 0 to 1000 calls, it is first divided by the maximum frequency value to obtain a value between 0 and 1, and then multiplied by $w_F$ (e.g., 0.4) before entering the score. The normalized weight of the functional criticality coefficient, $w_K$, is the coefficient applied to the standardized functional criticality coefficient. The functional criticality coefficient is usually preset on a scale of 1 to 5 according to the model's importance to the business; for example, a payment verification model has a functional criticality coefficient of 5. $w_K$ (e.g., 0.3) adjusts the contribution of this coefficient to the score. The normalized weight of the version change magnitude, $w_D$, is the coefficient applied to the standardized version change magnitude. The version change magnitude can be quantified by comparing the number of changed code lines or parameters between the old and new versions; for example, it can take a value from 0 to 10. $w_D$ (e.g., 0.3) controls the influence of this indicator on the score.
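The scoring described in this embodiment can be illustrated with a minimal sketch. It assumes that each indicator is scaled to 0-1 by dividing by its maximum and that the three weights sum to 1; the ranges (0-1000, 1-5, 0-10) and weights (0.4, 0.3, 0.3) are the example values given above, while the function and parameter names are hypothetical.

```python
def change_priority_score(call_freq, criticality, magnitude,
                          w_freq=0.4, w_crit=0.3, w_mag=0.3,
                          max_freq=1000.0, max_crit=5.0, max_mag=10.0):
    """Weighted sum of three indicators, each scaled to 0-1 by its maximum."""
    # Assumption: the weights are normalized so that they sum to 1.
    if abs(w_freq + w_crit + w_mag - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (w_freq * call_freq / max_freq
            + w_crit * criticality / max_crit
            + w_mag * magnitude / max_mag)


# Example: 800 historical calls, criticality level 5, change magnitude 4.
score = change_priority_score(call_freq=800, criticality=5, magnitude=4)
```

With these example inputs the three contributions are 0.32, 0.30 and 0.12.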
[0071] After the change priority scores have been calculated for all changed model versions in the set of changed model version identifiers, the cloud platform's decision scheduling module executes the sorting and selection logic. This logic takes all changed model versions and their change priority scores as the input dataset and applies a sorting algorithm (such as bubble sort or quicksort) keyed on the change priority score. The input dataset is reordered in descending order of score, yielding an ordered sequence of changed model versions from highest to lowest score. After sorting, the decision scheduling module reads the complete information of the changed model version at the first position of the ordered sequence; this changed model version is the target update model version. In the extreme case where two or more changed model versions are tied for the highest change priority score, the decision scheduling module further compares the raw historical call frequencies of the tied versions and selects the one with the higher historical call frequency as the final target update model version.
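The selection and tie-breaking rule can be sketched as follows. This is an illustration only: it uses Python's built-in sort rather than the bubble sort or quicksort named above, and the class name, field names and the third version identifier are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangedVersion:
    version_id: str
    score: float          # change priority score
    call_frequency: int   # raw historical call frequency, used as tie-breaker


def select_target(versions):
    """Return the highest-scoring version; ties go to the higher call frequency."""
    if not versions:
        raise ValueError("no changed model versions")
    ranked = sorted(versions,
                    key=lambda v: (v.score, v.call_frequency),
                    reverse=True)
    return ranked[0]


candidates = [
    ChangedVersion("V3.1-FaceDetection", 0.74, 800),
    ChangedVersion("V2.5-EmotionRecognition", 0.74, 950),
    ChangedVersion("V1.0-Ocr", 0.52, 990),
]
target = select_target(candidates)
```

Here the two top candidates are tied on score, so the one with 950 historical calls is selected.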
[0072] After determining the target update model version, the cloud platform's instruction encapsulation and release module first creates a structured instruction message object, which includes an instruction header and an instruction body. In the instruction header, the module populates metadata such as instruction type, instruction generation timestamp, and globally unique instruction sequence number. In the instruction body, the module uses the determined version identifier of the target update model version as the core instruction content, filling it into a predefined field dedicated to carrying the target identifier. The module also appends other auxiliary information to this instruction body, such as the original change detection result ID that triggered this instruction generation. Finally, the module serializes the instruction message object, containing the complete header and content, according to the standard data exchange format (such as JSON or XML) agreed upon between the cloud platform and downstream update execution nodes, to obtain the update instruction.
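As an illustration of the header and body structure described above, the following sketch serializes an update instruction to JSON. All field names, the instruction type string, the change detection result ID and the use of a UUID as the globally unique sequence number are assumptions made for the example; the source does not fix them.

```python
import json
import time
import uuid


def build_update_instruction(target_version_id, detection_result_id):
    """Build a header/body instruction message and serialize it as JSON."""
    message = {
        "header": {
            "type": "MODEL_UPDATE",           # hypothetical instruction type
            "generated_at": int(time.time()),  # instruction generation timestamp
            "sequence_no": str(uuid.uuid4()),  # assumed form of the unique number
        },
        "body": {
            "target_version_id": target_version_id,
            "detection_result_id": detection_result_id,  # auxiliary information
        },
    }
    return json.dumps(message, sort_keys=True)


instruction = build_update_instruction("Model-v3.2.0", "det-0001")
parsed = json.loads(instruction)
```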
[0073] After the target edge node receives an update command from the cloud platform, its internal storage management subroutine initiates a local storage process for the model update files. The target edge node first parses the update command to extract the target updated model version identifier, and then constructs a download request with an authentication token pointing to the corresponding resource in the model repository based on this identifier. The target edge node sends the download request to the model repository via a network interface and begins receiving the model update file data stream returned by the repository. Simultaneously, the target edge node's file system management module creates a new storage directory and target file within a predefined local cache area, following a predefined directory structure and naming rules (e.g., a path format of "cache root directory / target updated model version identifier / model file"). The received model update file data stream is continuously written to this target file. To ensure file integrity and availability, the storage management subroutine calculates a digital digest (e.g., SHA-256 checksum) of the stored file after download completion and compares it with the expected digest obtained from the model repository metadata. Only after verification passes, confirming that the model update file has not been damaged or tampered with during transmission and storage, is the model update file considered successfully stored in the local cache area of the target edge node. Subsequently, the local cache index is updated to record the storage location and status of this model update file and its target updated model version identifier. Specifically, the criterion for successful verification is that the calculated file digital digest (such as the SHA-256 value) is completely consistent with the expected digest obtained from the model repository metadata.
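The digest check and cache index update can be sketched as below, assuming SHA-256 as in the example above. The function names and the layout of the cache index entry are hypothetical; the download itself is replaced by writing a small local file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_if_valid(path, expected_digest, version_id, cache_index):
    """Record the file in the cache index only when the digests match exactly."""
    if sha256_of_file(path) != expected_digest.lower():
        return False
    cache_index[version_id] = {"path": path, "status": "stored"}
    return True


with tempfile.TemporaryDirectory() as cache_root:
    # Path layout: cache root directory / version identifier / model file.
    model_path = os.path.join(cache_root, "Model-v3.2.0", "model.bin")
    os.makedirs(os.path.dirname(model_path))
    with open(model_path, "wb") as fh:
        fh.write(b"model bytes")
    expected = hashlib.sha256(b"model bytes").hexdigest()
    index = {}
    ok = store_if_valid(model_path, expected, "Model-v3.2.0", index)
    rejected = store_if_valid(model_path, "0" * 64, "Model-bad", index)
```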
[0074] Specifically, in step S2, retrieving the model update file corresponding to the target updated model version identifier from the model repository based on the update instruction includes the following steps:
[0075] Step C01: Parse the update instruction to obtain the target update model version identifier and the network location information of the model repository, initiate an index query request to the model repository according to the target update model version identifier, and obtain from the response the model file shard list and global hash value corresponding to the target update model version identifier;
[0076] Step C02: Request all data shards from the model repository according to the model file shard list, and reassemble the downloaded data shards in the local cache area in shard sequence number order to obtain the reassembled model file package;
[0077] Step C03: Calculate the actual hash value of the reassembled model file package and compare it with the global hash value. When the actual hash value matches the global hash value, output the reassembled model file package as the model update file.
[0078] In this embodiment, network location information refers to the access address and path information of the model repository in the network, usually existing in the form of a Uniform Resource Locator (URL), used to indicate the specific location and access protocol of the model repository. An index query request is a structured network request sent by the target edge node to the model repository to query and obtain metadata for a specific model version, based on the parsed target update model version identifier. The model file shard list is a metadata file returned by the model repository after responding to the index query request, containing a unique identifier for each data shard (such as a shard sequence number) and its corresponding storage location or download link. A global hash value is a unique, fixed-length cryptographic hash value (such as a SHA-256 value) calculated by the model repository for the complete model file content of a model version when it is released. The reassembled model file package is a temporary file formed by the target edge node sequentially splicing and combining the data shards in the order of the data shards (such as shard sequence numbers) within its local cache area after successfully downloading all data shards according to the model file shard list.
[0079] In step C01, the target edge node first receives an update instruction from the cloud platform. The instruction processing module of the target edge node parses this update instruction. The parsing process involves identifying the structure of the update instruction, obtaining the target updated model version identifier (e.g., "Model-v3.2.0") and a field storing the model repository address pointed to by the update instruction, i.e., network location information. Subsequently, the network communication module of the target edge node, based on the obtained network location information, initiates a structured network request to the model repository service pointed to by the network location information. This structured network request is an index query request. The index query request is typically constructed by appending the target updated model version identifier as a query parameter to the base address formed by the network location information, creating a complete query URL, which is then sent using methods such as HTTP GET. Upon receiving this index query request, the model repository searches its internal version index for the metadata corresponding to the target updated model version identifier and returns the search results—a list of model file shards containing detailed data shard information and the pre-calculated global hash value of the complete model file—as a response to the target edge node.
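Constructing the query URL for the index query request might look like the following sketch. The repository address and the query parameter name `version` are assumptions for illustration; only the URL is built here and no request is sent.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_index_query_url(repository_url, version_id):
    """Append the target version identifier as a query parameter."""
    parts = urlsplit(repository_url)
    query = urlencode({"version": version_id})  # hypothetical parameter name
    if parts.query:
        # Preserve any query parameters already present in the base address.
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


url = build_index_query_url("https://repo.example.com/models/index",
                            "Model-v3.2.0")
```

The resulting URL can then be sent with an HTTP GET by whichever client library the edge node uses.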
[0080] In step C02, after successfully receiving and parsing the response returned by the model repository, the target edge node obtains a list of model file shards. This list contains detailed information about all data shards that make up the target model version, including the download link or storage identifier for each data shard. The target edge node's download scheduler creates an independent download task for each data shard based on the entries in the model file shard list. These download tasks send HTTP requests concurrently or sequentially to the server address indicated by the network location information of the model repository, requesting the download of the corresponding data shard. After each data shard is downloaded, it is transmitted to the target edge node in the form of a data stream or blocks. An empty file is created in the local cache area of the target edge node's file system to store intermediate files, serving as a container for the reassembled model file package. Then, according to the shard sequence number specified in the model file shard list, the data content of each data shard downloaded from the model repository is written sequentially into this empty file. For example, the entire content of data shard with shard sequence number 1 is written first, followed immediately by the entire content of data shard with shard sequence number 2, and so on, until all data shards have been written. Once all data fragments are assembled in sequence, the complete file formed in the local cache area is the reassembled model file package.
[0081] In step C03, after the reconstructed model file package is fully formed in the local cache area, the integrity verification module of the target edge node uses the same cryptographic hash algorithm (e.g., SHA-256) used when generating the global hash value in the model repository to perform a full-text scan and calculation on the reconstructed model file package in the local cache area, thereby obtaining an actual hash value representing the actual content of the file. The integrity verification module performs a strict string comparison between the obtained actual hash value and the global hash value parsed from the model repository response. If the calculated actual hash value is completely consistent with the global hash value provided by the model repository, it proves that no data corruption or tampering has occurred in the reconstructed model file package during transmission and reconstruction. At this time, the verified reconstructed model file package is officially recognized as a usable model update file. The target edge node can then mark the usable model update file as verified and, as needed, rename or move the usable model update file to its final storage location in the local cache area, completing the process of storing the model update file in the target edge node's local cache area. If the actual hash value is inconsistent with the global hash value, it indicates that the file is erroneous, and the reconstructed model file package will be considered invalid and deleted.
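Steps C02 and C03 can be sketched together as below. For brevity the shards are held in memory as a mapping from shard sequence number to bytes rather than written to a file in the local cache area, and sequence numbers are assumed to start at 1 as in the example above; the function name is hypothetical.

```python
import hashlib


def reassemble_and_verify(shards, global_hash):
    """Join shards in sequence-number order and check the SHA-256 of the result.

    Returns the reassembled bytes on a match and None on a mismatch.
    """
    expected_numbers = list(range(1, len(shards) + 1))
    if sorted(shards) != expected_numbers:
        raise ValueError("shard sequence numbers are missing or out of range")
    package = b"".join(shards[n] for n in expected_numbers)
    actual_hash = hashlib.sha256(package).hexdigest()
    return package if actual_hash == global_hash else None


content = b"weights-and-graph"
published_hash = hashlib.sha256(content).hexdigest()
# Shards may arrive in any order; the sequence number fixes their position.
shards = {2: content[6:12], 1: content[:6], 3: content[12:]}
model_update_file = reassemble_and_verify(shards, published_hash)
corrupted = reassemble_and_verify({1: b"weights", 2: b"-tampered"}, published_hash)
```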
[0082] Specifically, S3 includes the following steps:
[0083] S31. Based on the storage partition information of the local cache area in the target edge node and the volume parameters of the model update file, perform partition occupancy detection on the local cache area to obtain the address range and space parameters of the cache free partition. Based on the address range and space parameters of the cache free partition, allocate storage for the model update file and write the model update file to the corresponding cache free partition to obtain the cache address of the model update file.
[0084] S32. Based on the cache address of the model update file and the preset model loading dependency list, retrieve and load the dependent components of the model update file to obtain the dependent component loading completion signal, and perform a loading readiness assessment on the loading process to obtain the model loading readiness assessment result. The mathematical expression for the loading readiness assessment is: $G = w_d \frac{N_l}{N_t} + w_m \frac{D_l}{D_t}$. In the formula, $G$ represents the model loading readiness; $N_l$ represents the number of dependent components that have been loaded; $N_t$ represents the total number of dependent components required by the model; $D_l$ represents the amount of main model data that has been loaded; $D_t$ represents the total amount of main model data; $w_d$ represents the loading weight of the dependent components; $w_m$ represents the loading weight of the model body; and $w_d + w_m = 1$;
[0085] S33. Based on the dependent component loading completion signal and the model loading readiness assessment result, apply the initialization configuration to the model update file to obtain the initialized model file. The initialization configuration includes the model input and output interface parameters, the model running resource quota, and the model log output path.
[0086] S34. Perform lightweight pre-run detection on the initialized model file according to the preset model pre-run detection rules to obtain the pre-run detection result, and mark a model file whose pre-run detection result is a pass as being in the state to be switched, thereby obtaining the model file in the state to be switched. The lightweight pre-run detection covers the interface connectivity status and the basic running status of the model.
[0087] In this embodiment, the storage partition information of the local cache area refers to the description of the pre-planned logical or physical storage unit division within the cache area on the target edge node used to store model update files. For example, a local cache area is divided into three storage partitions: partition A (address range 0x0000-0x3FFF, size 16KB, free status), partition B (address range 0x4000-0x7FFF, size 16KB, used status), and partition C (address range 0x8000-0xFFFF, size 32KB, free status). The volume parameter of the model update file refers to the quantitative indicator of the physical storage space occupied by the model update file, usually measured in bytes. For example, the volume parameter of a convolutional neural network model update file used for image recognition is 256 megabytes. The address range and space parameters of the free cache partition refer to the location and capacity information of the currently unoccupied storage partitions of the local cache area identified after partition occupancy detection. The cache address of the model update file refers to the specific physical or logical location of the model update file's data content within the storage medium after it has been successfully written to the local cache area of the target edge node. The preset model loading dependency list is a structured list defined before model loading processing. The preset model pre-run detection rules are a set of automated test criteria and judgment logic pre-set to verify the basic health of the initialized model files. The dependent component loading weight $w_d$ and the model body loading weight $w_m$ are system-preset configuration parameters. Both are real numbers between 0 and 1 and strictly satisfy $w_d + w_m = 1$. The specific values are determined by the system designer based on the model characteristics and business requirements: if the loading of dependent components is a critical bottleneck or is crucial to stability, $w_d$ takes a relatively high value (e.g., 0.7) and $w_m$ a correspondingly lower value (0.3); if, however, the loading time of the main model data is the primary factor affecting startup speed, $w_m$ takes a relatively high value (e.g., 0.8) and $w_d$ a low value (0.2). A common balanced setting is $w_d = 0.5$ and $w_m = 0.5$.
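A minimal sketch of the loading readiness calculation follows, assuming readiness is the weighted sum of the loaded fraction of dependent components and the loaded fraction of main model data, with the two weights summing to 1. The function and parameter names are hypothetical.

```python
def loading_readiness(loaded_deps, total_deps, loaded_data, total_data,
                      w_dep=0.5, w_body=0.5):
    """Weighted sum of dependency progress and model-body progress (0 to 1)."""
    if abs(w_dep + w_body - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if total_deps <= 0 or total_data <= 0:
        raise ValueError("totals must be positive")
    return (w_dep * loaded_deps / total_deps
            + w_body * loaded_data / total_data)


# Dependency-heavy weighting (0.7 / 0.3): 3 of 4 components loaded,
# 128 of 256 data units of the model body loaded.
g = loading_readiness(3, 4, 128, 256, w_dep=0.7, w_body=0.3)
fully_loaded = loading_readiness(4, 4, 256, 256)
```

With a preset readiness threshold of 100%, only the fully loaded case would allow initialization to proceed.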
[0088] In step S31, the storage management module of the target edge node first reads the storage partition information of the local cache region recorded in the local configuration. This storage partition information includes the start address, end address, total capacity, and occupancy status of each storage partition. Simultaneously, the storage management module obtains the volume parameters of the model update file to be stored. Then, the storage management module iterates through all partitions marked "free" in the storage partition information of the local cache region, calculating the available capacity of each free partition. The storage management module compares the available capacity of each free partition with the volume parameters of the model update file, filtering out all free partitions whose available capacity is greater than or equal to the volume parameters of the model update file. The storage management module records the start and end addresses of each qualified free partition as an address range, and records the total capacity and available capacity of each qualified free partition as space parameters. The final output of the address range and space parameters of the cache free partitions is the set of address ranges and space parameters of all qualified free partitions. The allocator of the target edge node selects a specific cache free partition from the set of address ranges and space parameters of the cache free partitions. The allocator allocates a contiguous block of model storage space within the address range of the cache free partition for the model update file. The size of the model storage space is equal to the volume parameter of the model update file. The allocator updates the partition management table to mark the allocated model storage space as occupied. Subsequently, the file system driver creates a file descriptor at the starting address of the allocated model storage space. 
The write controller writes the complete data content of the model update file sequentially from the beginning to the end of the file to the storage location corresponding to the file descriptor through block write operations. After the write is complete, the absolute path or physical starting address corresponding to the file descriptor is recorded as the cache address of the model update file.
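The partition occupancy detection and allocation in step S31 can be sketched as follows. The source does not state how the allocator chooses among qualified partitions, so first-fit is assumed here; marking the whole partition as occupied is also a simplification. Class and function names are hypothetical, and the partition table reuses the A/B/C example from this embodiment.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    start: int   # start address
    end: int     # end address, inclusive
    free: bool

    @property
    def capacity(self):
        return self.end - self.start + 1


def qualified_partitions(partitions, file_size):
    """Free partitions whose capacity is at least the file's volume parameter."""
    return [p for p in partitions if p.free and p.capacity >= file_size]


def allocate(partitions, file_size):
    """First-fit allocation; returns (name, start, end) of the block or None."""
    candidates = qualified_partitions(partitions, file_size)
    if not candidates:
        return None
    chosen = candidates[0]
    chosen.free = False  # simplification: the whole partition becomes occupied
    return chosen.name, chosen.start, chosen.start + file_size - 1


table = [
    Partition("A", 0x0000, 0x3FFF, True),
    Partition("B", 0x4000, 0x7FFF, False),
    Partition("C", 0x8000, 0xFFFF, True),
]
block = allocate(table, 20 * 1024)   # a 20 KB file fits only in partition C
```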
[0089] In step S32, the dependency loader of the target edge node loads and parses the preset model loading dependency list, which lists the names and lookup path patterns of the required dependent components. The dependency loader, combined with the directory path where the model update file's cache address is located, instantiates the lookup path template in the preset model loading dependency list. For each dependent component in the preset model loading dependency list, the dependency loader sequentially searches for a matching file in the generated path sequence. When a dependent component file is located, the dependency loader calls the system dynamic linker to load the file into the process address space. The dependency loader repeats this process until all dependent components in the preset model loading dependency list have been successfully loaded. When the last dependent component is successfully loaded, the dependency loader generates and broadcasts a dependency component loading completion signal. Specifically, the system dynamic linker is a function provided by the operating system used to load and link dynamic library files into memory at runtime, such as the dlopen function in Linux.
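The lookup part of step S32 can be illustrated with the sketch below, which instantiates path templates against the directory of the cached model file and returns the first match for each dependent component. The template placeholders, component name and directory layout are hypothetical, and the actual loading step (for example via `dlopen`) is deliberately left out.

```python
import os
import tempfile


def resolve_dependencies(model_cache_path, dependency_list):
    """dependency_list: [(component name, [path templates])] -> {name: path}."""
    model_dir = os.path.dirname(model_cache_path)
    resolved = {}
    for name, templates in dependency_list:
        for template in templates:
            candidate = template.format(model_dir=model_dir, name=name)
            if os.path.isfile(candidate):
                resolved[name] = candidate
                break
        else:
            # No template produced an existing file for this component.
            raise FileNotFoundError(f"dependent component not found: {name}")
    return resolved


with tempfile.TemporaryDirectory() as cache_root:
    model_dir = os.path.join(cache_root, "Model-v3.2.0")
    os.makedirs(os.path.join(model_dir, "deps"))
    open(os.path.join(model_dir, "deps", "libtokenizer.so"), "wb").close()
    dependency_list = [
        ("libtokenizer", ["{model_dir}/{name}.so", "{model_dir}/deps/{name}.so"]),
    ]
    resolved = resolve_dependencies(os.path.join(model_dir, "model.bin"),
                                    dependency_list)
```

A completion signal would be raised only once every entry in the list has been resolved and loaded.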
[0090] In step S33, the configurator begins operation when the signal indicating completion of dependent component loading is received and the model loading readiness assessment result shows that the model loading readiness has reached the preset model loading readiness threshold. The configurator reads the initialization configuration from the metadata associated with the cache address of the model update file, including specific model input / output interface parameters, explicit model runtime resource quotas, and specified model log output paths. The configurator calls the initialization interface of the model inference framework, passing the cache address of the model update file as the model source path and the content of the initialization configuration as configuration parameters. The model inference framework allocates resources, sets logs, and loads model parameters according to the configuration. Upon successful completion, it returns a model instance handle, which is recognized as the initialized model file. Specifically, the preset model loading readiness threshold is determined based on model complexity, hardware resources, performance requirements, and business scenarios, for example, 100%.
[0091] In step S34, the preset model pre-run detection rules are first retrieved to clarify the specific standards and procedures for lightweight pre-run detection, and the initialized model file is obtained. Then, based on the preset model pre-run detection rules, lightweight pre-run detection is performed on the initialized model file, focusing on detecting the model's interface connectivity and basic operating status. When detecting interface connectivity, it is necessary to verify whether the communication between the model's input / output interfaces and the external system is smooth. When detecting the basic operating status, it is necessary to monitor whether there are any abnormal errors after the model starts, and whether resource consumption exceeds a reasonable range. After the detection is completed, a pre-run detection result is generated. The pre-run detection result is judged. If the pre-run detection result is a pass, the corresponding initialized model file is marked as a pending switch state, and the model file in the pending switch state is finally obtained. If any item fails, it is judged as a failure state, the model switching process is stopped, an error log is recorded, and an alarm is issued.
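The pass/fail logic of the lightweight pre-run detection can be sketched as below. The individual checks are stand-ins supplied by the caller; the state labels and function name are hypothetical. Error logging and alarming are omitted.

```python
def pre_run_detection(checks):
    """checks: mapping of check name -> zero-argument callable returning bool.

    Returns (state, failed check names). Any failing or raising check
    puts the model file in the failed state.
    """
    failures = []
    for name, check in checks.items():
        try:
            passed = bool(check())
        except Exception:
            passed = False  # an abnormal error counts as a failed check
        if not passed:
            failures.append(name)
    state = "pending_switch" if not failures else "failed"
    return state, failures


healthy = pre_run_detection({
    "interface_connectivity": lambda: True,
    "basic_running_status": lambda: True,
})
unhealthy = pre_run_detection({
    "interface_connectivity": lambda: True,
    "basic_running_status": lambda: 1 / 0,  # simulates a startup error
})
```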
[0092] Specifically, S4 includes the following steps:
[0093] S41. The health status of the running instance corresponding to the current model version is evaluated by the status monitoring component in the target edge node to obtain the health status of the running instance. When the health status of the running instance meets the preset health status threshold, a switching start signal is generated.
[0094] S42. Load and initialize the model file of the state to be switched according to the switching start signal to obtain the initialized new model version instance, and start the new model version instance in the memory resources of the target edge node based on the parallel deployment method;
[0095] S43. The real-time service request traffic of the current model version is mirrored and copied through the switching execution component in the target edge node to generate mirror traffic data, and the mirror traffic data is input into the new model version instance through the switching execution component.
[0096] S44. Obtain the first output result generated by the current model version when processing the mirrored traffic data, and obtain the second output result generated by the new model version instance when processing the mirrored traffic data. Compare the first output result with the second output result to obtain a consistency comparison result. When the consistency comparison result meets the preset consistency standard, transfer the scheduling weight of the external service request from the current model version to the new model version instance through the switching execution component, and stop the operation of the current model version and release the computing resources occupied by the current model version through the switching execution component.
[0097] In this embodiment, the preset health threshold value needs to be determined based on the hardware resource carrying capacity of the target edge node, the model operation requirements, and the service stability requirements. It is typically quantified on a percentage basis, ranging from 80 to 95 points. The value is set according to the core operating indicators of the running instance, including CPU utilization ≤70%, memory usage ≤65%, response latency ≤500ms, and service availability ≥99.9%. After weighting and calculating the total health score for each indicator, the preset health threshold is set to 85 points. That is, when the health score of the running instance is ≥85 points, it is determined that the preset health threshold is met. The initialized new model version instance refers to the model running entity with complete service capabilities formed in the target edge node after loading and initializing the model file in the state to be switched according to the switching start signal. Parallel deployment refers to a deployment method in which the current model version instance and the new model version instance are running simultaneously in the memory resources of the target edge node. The current model version instance and the new model version instance independently occupy a portion of memory resources, and their service processes do not interfere with each other. For example, the target edge node uses process isolation technology to allocate an independent process 1 to the current model version instance, occupying 2GB of memory resources; and an independent process 2 to the new model version instance, occupying 2.5GB of memory resources. The two processes share the edge node's CPU and network resources but are allocated resource priorities according to preset rules. The current model version instance prioritizes processing external service requests, while the new model version instance only processes mirrored traffic data. 
Real-time service request traffic refers to the continuous stream of service call request data from external clients received by the current model version instance running on the target edge node, including key information such as request timestamps, request parameters, data format identifiers, and client identifiers. Preset consistency standards refer to the quantitative criteria used to determine whether the first and second output results meet the switching requirements when the current and new model version instances process the same traffic data. For example, format consistency requires that the number of fields, field names, and data types of the output data be completely identical; the difference between the numerical results output by the new model version instance and the numerical results output by the current model version instance should not exceed 3%. The scheduling weight of external service requests refers to a quantitative parameter used in the target edge node to allocate the proportion of external service request traffic processed between the current model version instance and the new model version instance. The value ranges from 0-100%, with a total weight of 100%. Initially, the scheduling weight of the current model version instance is 100%, and the new model version instance is 0%, meaning all external service requests are processed by the current model version instance. During the switchover, the scheduling weight is transferred from the current model version instance to the new model version instance. After the transfer, the weight of the new model version instance is 100%, and the weight of the current model version instance is 0%.
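The example consistency standard (identical fields and types, numerical difference within 3%) can be sketched as follows. Two points are assumptions: the 3% is taken relative to the current model version's output, and non-numeric fields are required to match exactly. The function name and sample records are hypothetical.

```python
def outputs_consistent(first, second, tolerance=0.03):
    """Compare two output records (dicts) for format and numeric consistency."""
    # Format consistency: same field names (and therefore same field count).
    if first.keys() != second.keys():
        return False
    for key, old_value in first.items():
        new_value = second[key]
        if type(old_value) is not type(new_value):
            return False
        is_number = (isinstance(old_value, (int, float))
                     and not isinstance(old_value, bool))
        if not is_number:
            if old_value != new_value:
                return False
        elif abs(new_value - old_value) > tolerance * abs(old_value):
            return False
    return True


current_out = {"label": "car", "confidence": 0.90}
within = outputs_consistent(current_out, {"label": "car", "confidence": 0.92})
beyond = outputs_consistent(current_out, {"label": "car", "confidence": 0.95})
wrong_type = outputs_consistent(current_out, {"label": "car", "confidence": "0.90"})
```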
[0098] In step S41, the status monitoring component in the target edge node collects various performance metrics data of the running instance corresponding to the current model version in real time according to a preset collection cycle (e.g., 100ms / time), including key indicators such as CPU utilization, memory usage, request processing success rate, and average response latency. The status monitoring component performs comprehensive analysis on these key indicators according to preset evaluation rules and calculates a quantified running instance health score. The status monitoring component compares the calculated running instance health score with the preset health threshold set in the system configuration in real time. When the running instance health score reaches or exceeds the preset health threshold for 5 consecutive evaluation cycles, the status monitoring component determines that the current running environment meets the switching conditions, generates a switching initiation signal, and transmits the switching initiation signal to the subsequent switching management module. If the preset health threshold is not met, the running status of the current model version is continuously monitored until the preset health threshold is met. Specifically, the preset evaluation rule is a weighted comprehensive scoring method: first, standardized and quantitative values are assigned to indicators such as CPU utilization, memory usage, request success rate, and average response latency; then, weight coefficients for each indicator are configured according to business importance; and finally, a normalized health score for the running instance is obtained through weighted calculation.
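The weighted health score and the rule of 5 consecutive evaluation cycles can be sketched as below. The per-indicator scores are assumed to be already standardized to 0-100, and the weights shown are illustrative values, not taken from the source; class and function names are hypothetical.

```python
from collections import deque


def health_score(indicator_scores, weights):
    """Weighted average of standardized indicator scores (each 0-100)."""
    total_weight = sum(weights.values())
    return sum(indicator_scores[k] * w for k, w in weights.items()) / total_weight


class SwitchGate:
    """Signals a switch once the threshold is met in N consecutive cycles."""

    def __init__(self, threshold=85.0, required_cycles=5):
        self.threshold = threshold
        self.recent = deque(maxlen=required_cycles)

    def observe(self, score):
        self.recent.append(score >= self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


weights = {"cpu": 0.3, "memory": 0.2, "success_rate": 0.3, "latency": 0.2}
sample = health_score(
    {"cpu": 90, "memory": 80, "success_rate": 100, "latency": 70}, weights)

gate = SwitchGate()
signals = [gate.observe(s) for s in [87, 90, 84, 86, 88, 91, 85, 89]]
```

In this run the third score (84) breaks the streak, so the switching start signal is raised only on the eighth cycle.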
[0099] In step S42, after receiving the switchover start signal from the status monitoring component, the switchover management module immediately sends an instruction to the model management service. The model management service locates the model file in the state to be switched, stored in the local cache area, according to the instruction. The model loader executes the complete loading process for the model file in the state to be switched, including reading the model file content, loading necessary dependency libraries, initializing model parameters, and configuring the runtime environment. After loading is complete, the model loader creates an instance of the newly initialized model version in memory. Based on the parallel deployment strategy, the resource manager allocates an independent runtime space for the newly initialized model version instance in the memory resources of the target edge node. This includes starting a new service container, binding a dedicated port, and ensuring that this instance is completely isolated from the current model version in terms of resource usage, thereby achieving secure parallel startup of the newly initialized model version instance.
[0100] In step S43, after confirming that the newly initialized model version instance has started successfully, the switching execution component in the target edge node begins the traffic mirroring operation. The switching execution component deploys traffic mirroring at the request entry point of the current model version and copies, in real time, every service request flowing to the current model version, generating identical mirrored traffic data. It then establishes a dedicated data channel to synchronously transmit the mirrored traffic data to the input interface of the newly initialized model version instance, ensuring that the input data received by that instance is completely consistent, in both content and timing, with the original request data processed by the current model version.
[0101] In step S44, the switching execution component simultaneously monitors the processing of both the current model version and the newly initialized model version instance. It collects the first output result, generated by the current model version processing the original request, and the second output result, generated by the newly initialized model version instance processing the mirrored traffic data. The comparison engine built into the switching execution component compares the first and second output results item by item according to the rules defined in the preset consistency standard, generating a consistency comparison result. When a sufficient number of comparison samples has been collected and the consistency comparison result meets the preset consistency standard, the switching execution component instructs the traffic scheduler to gradually adjust the scheduling weight of external service requests, progressively transferring responsibility for handling external service requests from the current model version to the newly initialized model version instance. After the scheduling weight has been completely transferred and the newly initialized model version instance is running stably, the switching execution component sends a termination instruction to the current model version, stopping its operation, and notifies the resource manager to reclaim the computing resources it occupied.
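The comparison and gradual weight transfer of step S44 can be sketched as follows. The absolute tolerance, the minimum sample count, the required match ratio, and the weight steps are all hypothetical values, and the outputs are assumed to be scalar scores; the embodiment leaves the concrete consistency standard to configuration.

```python
from typing import Iterator, Sequence, Tuple


def consistency_ratio(first: Sequence[float], second: Sequence[float],
                      tolerance: float = 1e-3) -> float:
    """Fraction of paired outputs that agree within an assumed absolute tolerance."""
    if len(first) != len(second):
        raise ValueError("outputs must be paired one-to-one")
    if not first:
        return 0.0
    matches = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return matches / len(first)


def may_start_cutover(first: Sequence[float], second: Sequence[float],
                      min_samples: int = 100, min_ratio: float = 0.99) -> bool:
    """Require both enough comparison samples and a sufficient match ratio."""
    return len(first) >= min_samples and consistency_ratio(first, second) >= min_ratio


def weight_schedule(steps: Sequence[float] = (0.1, 0.25, 0.5, 1.0)) -> Iterator[Tuple[float, float]]:
    """Yield (current_version_weight, new_version_weight) for each ramp step."""
    for new_weight in steps:
        yield (round(1.0 - new_weight, 10), new_weight)
```

Only after the final step, where the current version's weight is 0.0, would the termination instruction be sent and its resources reclaimed.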
[0102] Specifically, S5 includes the following steps:
[0103] S51. Periodically sample the underlying system resource consumption data of the new model version during runtime using the performance data collector configured on the target edge node to obtain system resource consumption time-series data, wherein the system resource consumption time-series data includes CPU utilization data, memory occupancy data, and disk input/output operation frequency data.
[0104] S52. Analyze the process of the new model version in processing business requests through the performance data collector to obtain business indicator data corresponding to the model inference performance. The business indicator data includes request processing throughput data, average processing latency data for a single request, and confidence distribution data of the model inference results.
[0105] S53. The system resource consumption time series data and business indicator data are time-aligned and correlated by the data fusion processor of the target edge node to obtain fused multi-dimensional time series monitoring data. The fused multi-dimensional time series monitoring data is serialized and encapsulated by the data fusion processor to generate the actual model running status data.
[0106] In this embodiment, the performance data collector refers to a dedicated software component or service agent deployed on the target edge node. Underlying system resource consumption data refers to quantitative information reflecting the usage of the target edge node's underlying hardware resources during the new model version's runtime, including core system metrics such as CPU utilization data, memory usage data, and disk I / O operation frequency data. The data fusion processor refers to the module in the target edge node responsible for integrating and processing heterogeneous data.
[0107] In step S51, the performance data collector configured on the target edge node starts a data collection task according to a preset sampling period (e.g., once per second). The performance data collector reads the CPU utilization data of the process or container corresponding to the new model version by calling the operating system's performance monitoring interface. This CPU utilization data is typically expressed as a percentage of CPU core utilization. Simultaneously, the performance data collector obtains the resident memory size of the new model version process through the system memory management interface and converts it into memory occupancy data. The performance data collector also collects disk I / O operation frequency data related to the read / write operations of the new model version through the disk I / O statistics interface, recording the number of read / write operations per unit time. The performance data collector adds precise timestamps to the CPU utilization data, memory occupancy data, and disk I / O operation frequency data obtained from each sample, forming a time-series data sequence of system resource consumption, and caches this system resource consumption time-series data in a local buffer.
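A minimal sketch of the timestamped sampling and local buffering in step S51 follows. The class name `ResourceSampler`, the injected `read_metrics` callable, and the buffer capacity are hypothetical; the actual reading of CPU, memory, and disk I/O figures depends on the operating system interfaces available on the edge node and is deliberately left outside the sketch.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class ResourceSampler:
    """Stamps each resource reading with a timestamp and keeps it in a bounded buffer."""

    def __init__(self, read_metrics: Callable[[], Dict[str, float]],
                 capacity: int = 3600,
                 clock: Callable[[], float] = time.time):
        self._read_metrics = read_metrics  # assumed to wrap the OS monitoring interface
        self._clock = clock
        self.buffer: Deque[Dict[str, float]] = deque(maxlen=capacity)

    def sample_once(self) -> Dict[str, float]:
        sample = dict(self._read_metrics())
        sample["ts"] = self._clock()  # timestamp used later for time alignment
        self.buffer.append(sample)    # the oldest sample is dropped when the buffer is full
        return sample
```

A scheduler would call `sample_once` at the preset sampling period, for example once per second.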
[0108] In step S52, while the new model version processes actual business requests, the same performance data collector collects business indicator data through business probes deployed at the model service interface layer. The performance data collector counts the number of requests successfully processed by the new model version per unit time and calculates the request processing throughput data. It records the time elapsed from receiving each request to completing its response and calculates the average processing latency per request. The performance data collector also extracts confidence scores from the inference output of the new model version, generating confidence distribution data for the model inference results. This confidence distribution data includes information such as the mean, the standard deviation, or the proportion of samples in different confidence intervals. The performance data collector adds a timestamp to the business indicator data, synchronized with the system resource consumption time-series data, forming a business indicator data sequence.
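The business indicators of step S52 can be computed from per-request records as in the sketch below. The record layout, the interval boundaries (0.5 and 0.8), and the function name `business_indicators` are assumptions for illustration; the embodiment does not fix them.

```python
import statistics
from typing import Dict, List, Tuple

# Assumed record layout: (received_ts, completed_ts, confidence, succeeded)
RequestRecord = Tuple[float, float, float, bool]


def business_indicators(records: List[RequestRecord], window_seconds: float) -> Dict[str, float]:
    succeeded = [r for r in records if r[3]]
    latencies = [done - received for received, done, _, _ in records]
    confidences = [conf for _, _, conf, _ in succeeded]

    def share(low: float, high: float) -> float:
        """Proportion of confidence scores falling in [low, high)."""
        if not confidences:
            return 0.0
        return sum(low <= c < high for c in confidences) / len(confidences)

    return {
        "throughput_rps": len(succeeded) / window_seconds,
        "avg_latency_s": statistics.fmean(latencies) if latencies else 0.0,
        "confidence_mean": statistics.fmean(confidences) if confidences else 0.0,
        "confidence_std": statistics.pstdev(confidences) if confidences else 0.0,
        "share_low": share(0.0, 0.5),
        "share_mid": share(0.5, 0.8),
        "share_high": share(0.8, 1.0 + 1e-9),  # upper bound widened so 1.0 is included
    }
```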
[0109] In step S53, the data fusion processor at the target edge node periodically acquires system resource consumption time-series data and business indicator data from the performance data collector's buffer. The data fusion processor first performs time alignment on the system resource consumption time-series data and business indicator data based on timestamps, matching data from different sources within the same time window. Next, the data fusion processor performs data association on the aligned data, establishing correspondences between CPU utilization data, memory usage data, disk I / O operation frequency data, request processing throughput data, average processing latency per request data, and confidence distribution data of model inference results. The data fusion processor reassembles all the associated indicator data according to a unified structural format, forming fused multi-dimensional time-series monitoring data. Each time point in this fused multi-dimensional time-series monitoring data contains complete system and business dimension indicators. Finally, the data fusion processor serializes and encapsulates the fused multi-dimensional time-series monitoring data, typically using standard data exchange formats such as JSON Lines or Protocol Buffers, adding necessary header information to generate a complete actual model runtime status data stream.
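The time alignment, association, and serialization of step S53 can be sketched with the standard library. The fixed one-second window, the field names, and the header layout are hypothetical; JSON Lines is used here because the embodiment names it as one possible exchange format.

```python
import json
from typing import Dict, List


def fuse(resource_samples: List[Dict[str, float]],
         business_samples: List[Dict[str, float]],
         window_seconds: float = 1.0) -> List[Dict[str, float]]:
    """Join samples from both sources that fall into the same time window."""
    business_by_window = {}
    for sample in business_samples:
        # If several samples share a window, the last one wins in this sketch.
        business_by_window[int(sample["ts"] // window_seconds)] = sample

    fused = []
    for sample in resource_samples:
        window = int(sample["ts"] // window_seconds)
        match = business_by_window.get(window)
        if match is None:
            continue  # no business data for this window, so nothing to associate
        record = {"window_start": window * window_seconds}
        record.update({k: v for k, v in sample.items() if k != "ts"})
        record.update({k: v for k, v in match.items() if k != "ts"})
        fused.append(record)
    return fused


def to_json_lines(records: List[Dict[str, float]], header: Dict[str, str]) -> str:
    """Serialize a header line followed by one JSON object per fused time point."""
    lines = [json.dumps({"header": header}, sort_keys=True)]
    lines += [json.dumps(record, sort_keys=True) for record in records]
    return "\n".join(lines) + "\n"
```

Each output line after the header then carries both system and business indicators for one time point.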
[0110] Specifically, S6 includes the following steps:
[0111] S61. Based on the actual model operation status data and the benchmark model operation status data, calculate the difference values of each corresponding monitoring indicator to obtain the indicator difference dataset. Then, calculate the degree of abnormality in the model operation based on the indicator difference dataset to obtain the abnormality quantification value. The mathematical expression of the abnormality quantification value E is as follows: $E = \sum_{k=1}^{m} w_k \left(a_k - b_k\right)^2$. In the formula, $m$ represents the total number of monitoring indicators; $a_k$ represents the actual operating value of the k-th monitoring indicator; $b_k$ represents the baseline operating value of the k-th monitoring indicator; and $w_k$ represents the weight of the k-th monitoring indicator;
[0112] S62. Compare the abnormality quantification value E with the preset abnormality quantification threshold E0, determine the running status of the new model version based on the comparison result, and switch the new model version based on the determination result, wherein:
[0113] When E < E0, the running status of the new model version is determined to be normal, and the new model version is not switched.
[0114] When E≥E0, the running state of the new model version is determined to be abnormal. The new model version is switched, and a version rollback instruction is generated through the decision controller built into the target edge node. The version rollback instruction controls the target edge node to stop routing business requests to the new model version and switches the new model version back to the current model version.
[0115] In this embodiment, the baseline model operating status data refers to a standard reference dataset collected through testing or monitoring before the model version is officially deployed or during historical stable operation. This dataset represents the various performance indicators that the model should achieve under expected normal operating conditions. The baseline model operating status data includes the typical numerical range or average value of all monitored indicators (such as CPU utilization, memory usage, request processing throughput, etc.) under normal operating conditions. The preset anomaly quantification threshold is usually determined based on a comprehensive consideration of historical data analysis, business tolerance, and system stability requirements. For example, the preset anomaly quantification threshold can be set to "5.0", indicating that when the calculated anomaly quantification value E reaches or exceeds 5.0, the operating status of the new model version is determined to be an abnormal state. The decision controller is defined as the core logic module in the target edge node responsible for automatically making operation and maintenance decisions and triggering corresponding control commands based on the calculation results of rules or algorithms.
[0116] In step S61, the actual model operating status data and the baseline model operating status data are first loaded. Then, each monitoring indicator is traversed: the specific value of the k-th monitoring indicator in the current time period is read from the actual model operating status data and denoted as $a_k$, and the standard reference value corresponding to this monitoring indicator is read from the baseline model operating status data and denoted as $b_k$. The difference between $a_k$ and $b_k$ is calculated, followed by the square of that difference. Next, the weight coefficient $w_k$ pre-configured for the k-th monitoring indicator is obtained, and the squared difference is multiplied by $w_k$ to obtain the weighted difference contribution value of the k-th monitoring indicator. This process is repeated for all monitoring indicators, and all the calculated weighted difference contribution values are summed; the final sum is the anomaly quantification value E. The differences, squared differences, and weighted results of all monitoring indicators generated during the calculation are collected to form the indicator difference dataset, and the calculated anomaly quantification value E is output.
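The calculation described in step S61, a weighted sum of squared differences between actual and baseline indicator values, can be sketched as below. The function name `anomaly_value` and the dictionary-based data layout are hypothetical, as are the indicator names and numbers in any usage.

```python
from typing import Dict, List, Tuple


def anomaly_value(actual: Dict[str, float],
                  baseline: Dict[str, float],
                  weights: Dict[str, float]) -> Tuple[float, List[Dict[str, float]]]:
    """Return the anomaly quantification value and the indicator difference dataset."""
    dataset = []
    for name, weight in weights.items():
        difference = actual[name] - baseline[name]
        squared = difference ** 2
        dataset.append({
            "indicator": name,
            "difference": difference,
            "squared": squared,
            "weighted": weight * squared,  # contribution of this indicator
        })
    total = sum(row["weighted"] for row in dataset)
    return total, dataset
```

For example, with assumed values of actual CPU 0.9 against baseline 0.5 (weight 2.0) and actual latency 3.0 against baseline 1.0 (weight 1.0), the contributions are 2.0 × 0.16 and 1.0 × 4.0, giving a value of about 4.32.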
[0117] In step S62, the anomaly quantification value E from the state analysis module is obtained, along with the preset anomaly quantification threshold (denoted as E0) in the system configuration. The anomaly quantification value E is compared with the preset anomaly quantification threshold E0. If E is less than E0, the new model version is determined to be in a normal operating state. In this case, no switching instruction is generated, the system maintains its current operating state, and the new model version continues to process business requests. If E is greater than or equal to E0, the new model version is determined to be in an abnormal operating state. At this time, the anomaly handling process is triggered, and the decision controller built into the target edge node is activated. The decision controller generates an explicit version rollback instruction according to the preset anomaly handling strategy. This version rollback instruction is issued through the control plane: it first notifies the traffic scheduling component to stop routing new business requests to the new model version, then instructs the model lifecycle management component to stop the running process of the new model version and release its resources, and finally reloads and starts the previously backed-up current model version, completing the entire process of switching back from the new model version to the current model version. Furthermore, the preset anomaly handling strategy refers to a set of standardized handling rules and business assurance logic that the decision controller built into the edge node applies once it determines that the new model version is running abnormally.
The preset anomaly handling strategy includes: an immediate full rollback strategy, which executes a complete rollback process immediately after an anomaly occurs; a tiered handling strategy, which classifies anomalies into levels according to the magnitude of the anomaly quantification value and matches the corresponding handling intensity; a rate limiting and degradation rollback strategy, which first reduces the traffic of the new model and then gradually shuts it down and rolls back; a backup-first recovery strategy, which prioritizes loading the stable backup of the old version to quickly restore the business; and a fault-tolerant retry strategy, which first attempts to restart the new model and starts the rollback only after the retry fails, reducing unnecessary version switching.
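The threshold decision of step S62, combined with one possible reading of the tiered handling strategy, can be sketched as follows. The `severe_factor` boundary and the step names are hypothetical; the embodiment specifies only that E ≥ E0 triggers a rollback and that handling intensity may be matched to the magnitude of E.

```python
from typing import List

ROLLBACK_STEPS = [
    "stop_routing_to_new_version",
    "stop_new_version_and_release_resources",
    "reload_and_start_backed_up_current_version",
]


def rollback_plan(e: float, e0: float, severe_factor: float = 2.0) -> List[str]:
    """Return the ordered control steps; an empty list means the new version stays."""
    if e < e0:
        return []  # normal state: no switching instruction is generated
    steps: List[str] = []
    if e < severe_factor * e0:
        # Assumed milder tier: reduce traffic first, then roll back.
        steps.append("reduce_traffic_to_new_version")
    # Assumed severe tier skips the traffic reduction and rolls back immediately.
    steps.extend(ROLLBACK_STEPS)
    return steps
```

With the example threshold of 5.0 from this embodiment, a value of 4.9 yields no action, while 5.0 already yields a rollback plan because the comparison is greater than or equal to.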
[0118] The above are merely embodiments of the present invention. Commonly known structures and characteristics are not described in detail here. Those skilled in the art are aware of all common technical knowledge in the field prior to the application date or priority date, are aware of all existing technologies in that field, and have the ability to apply conventional experimental methods prior to that date. Those skilled in the art can, under the guidance of this application, improve and implement this solution in combination with their own capabilities. Some typical known structures or methods should not be obstacles for those skilled in the art to implement this application. It should be noted that those skilled in the art can make several modifications and improvements without departing from the structure of the present invention. These should also be considered within the scope of protection of the present invention, and will not affect the effectiveness of the implementation of the present invention or the practicality of the patent. The scope of protection claimed in this application should be determined by the content of its claims, and the specific embodiments described in the specification can be used to interpret the content of the claims.
Claims
1. A method for hot model updates and intelligent switching under a cloud-edge collaborative architecture, characterized in that it includes the following steps:
S1. Perform change detection on the stored model version list through the cloud platform, obtain the change detection result, generate an update instruction containing the target updated model version identifier based on the change detection result, and transmit the update instruction to the target edge node;
S2. Based on the update instruction, retrieve the model update file corresponding to the target updated model version identifier from the model repository, and store the model update file in the local cache area of the target edge node;
S3. Load the model update file in the local cache area to obtain the model file in the state to be switched; S3 includes the following steps:
S31. Based on the storage partition information of the local cache area in the target edge node and the volume parameters of the model update file, perform partition occupancy detection on the local cache area to obtain the address range and space parameters of the cache free partition. Based on the address range and space parameters of the cache free partition, allocate storage for the model update file and write the model update file to the corresponding cache free partition to obtain the cache address of the model update file.
S32. Based on the cache address of the model update file and the preset model loading dependency list, retrieve and load the dependent components of the model update file, obtain the dependent component loading completion signal, and perform a loading readiness assessment on the loading process of the dependent components of the model update file to obtain the model loading readiness assessment result. The mathematical expression for the loading readiness assessment is: $G = \alpha \cdot \frac{n}{N} + \beta \cdot \frac{d}{D}$. In the formula, $G$ represents the model loading readiness; $n$ represents the number of dependent components that have been loaded; $N$ represents the total number of dependent components required by the model; $d$ represents the amount of main model data that has been loaded; $D$ represents the total amount of data in the main body of the model; $\alpha$ represents the loading weight of the dependent components; and $\beta$ represents the loading weight of the main body of the model;
S33. Based on the dependent component loading completion signal and the model loading readiness assessment result, initialize the model update file to obtain the initialized model file. The initialization configuration includes model input and output interface parameters, the model running resource quota, and the model log output path.
S34. Perform lightweight pre-run detection on the initialized model file based on the preset model pre-run detection rules, obtain the pre-run detection result, and mark the model file whose pre-run detection result is a pass as being in the state to be switched, thereby obtaining the model file in the state to be switched. The lightweight pre-run detection includes detecting the interface connectivity status and basic running status of the model.
S4. During the process of running the current model version at the target edge node, switch the current model version to the new model version corresponding to the model file in the state to be switched.
S5. Monitor the real-time running status of the new model version through the target edge node to obtain actual model running status data; S5 includes the following steps:
S51. Periodically sample the underlying system resource consumption data of the new model version during runtime using the performance data collector configured on the target edge node to obtain system resource consumption time-series data, wherein the system resource consumption time-series data includes CPU utilization data, memory occupancy data, and disk input/output operation frequency data.
S52. Analyze the process of the new model version in processing business requests through the performance data collector to obtain business indicator data corresponding to the model inference performance. The business indicator data includes request processing throughput data, average processing latency data for a single request, and confidence distribution data of the model inference results.
S53. The system resource consumption time-series data and business indicator data are time-aligned and data-correlated by the data fusion processor of the target edge node to obtain fused multi-dimensional time-series monitoring data. The fused multi-dimensional time-series monitoring data is serialized and encapsulated by the data fusion processor to generate the actual model running status data.
S6. Compare the actual model running status data with the benchmark model running status data, and determine whether the running status of the new model version is abnormal based on the comparison result. When the running status of the new model version is abnormal, switch the new model version back to the current model version.
2. The method for model hot updating and intelligent switching under the cloud-edge collaborative architecture according to claim 1, characterized in that: In step S1, the stored model version list is checked for changes through a cloud platform to obtain the change detection results. This process includes the following steps: Step A01: Obtain the current storage status of the model version list, wherein the current storage status includes the version identifier and storage timestamp of each model version in the model version list; Step A02: Calculate the difference metric D between the current storage state and the baseline version list in the cloud platform; Step A03: Compare the difference measurement value D with the preset change threshold D0. Based on the comparison result, judge the change detection result. When D > D0, the change detection result is judged as a changed state. When D ≤ D0, the change detection result is judged as an unchanged state.
3. The method for model hot updating and intelligent switching under the cloud-edge collaborative architecture according to claim 2, characterized in that: In step S1, an update instruction containing the target update model version identifier is generated based on the change detection results, specifically including the following steps: Step B01: Analyze the change detection results that are determined to be in a changed state to obtain the set of model version identifiers that have changed, and calculate the change priority score for each changed model version based on the model version identifier set and the model version dependency relationship recorded on the cloud platform. Step B02: Sort each change model version according to the change priority score, select the change model version with the highest change priority score as the target update model version, and generate an update instruction containing the target update model version identifier.
4. The method for model hot updating and intelligent switching under the cloud-edge collaborative architecture according to claim 1, characterized in that: In step S2, the model update file corresponding to the target updated model version identifier is obtained from the model repository based on the update instruction, specifically including the following steps: Step C01: Parse the update instruction to obtain the target update model version identifier and the network location information of the model repository, and initiate an index query request to the model repository according to the target update model version identifier, and obtain the model file fragment list and global hash value corresponding to the target update model version identifier according to the index query request; Step C02: Request the model repository to download all data fragments according to the model file fragment list, and reassemble the downloaded data fragments in the local cache area according to the fragment sequence number to obtain the reassembled model file package; Step C03: Calculate the actual hash value of the recombined model file package and compare the actual hash value with the global hash value. When the actual hash value matches the global hash value, output the model update file.
5. The method for hot model updating and intelligent switching under the cloud-edge collaborative architecture according to claim 1, characterized in that: S4 includes the following steps: S41. The health status of the running instance corresponding to the current model version is evaluated by the status monitoring component in the target edge node to obtain the health status of the running instance. When the health status of the running instance meets the preset health status threshold, a switching start signal is generated. S42. Load and initialize the model file of the state to be switched according to the switching start signal to obtain the initialized new model version instance, and start the new model version instance in the memory resources of the target edge node based on the parallel deployment method; S43. The real-time service request traffic of the current model version is mirrored and copied through the switching execution component in the target edge node to generate mirror traffic data, and the mirror traffic data is input into the new model version instance through the switching execution component. S44. Obtain the first output result generated by the current model version when processing the mirrored traffic data, and obtain the second output result generated by the new model version instance when processing the mirrored traffic data. Compare the first output result with the second output result to obtain a consistency comparison result. When the consistency comparison result meets the preset consistency standard, transfer the scheduling weight of the external service request from the current model version to the new model version instance through the switching execution component, and stop the operation of the current model version and release the computing resources occupied by the current model version through the switching execution component.
6. The method for model hot updating and intelligent switching under the cloud-edge collaborative architecture according to claim 1, characterized in that: S6 includes the following steps: S61. Based on the actual model operation status data and the benchmark model operation status data, calculate the difference values of each corresponding monitoring indicator to obtain the indicator difference dataset, and calculate the degree of abnormality of the model operation based on the indicator difference dataset to obtain the abnormality quantification value E. S62. Compare the abnormal quantization value E with the preset abnormal quantization threshold E0, determine the running status of the new model version based on the comparison result, and switch the new model version based on the determination result, wherein: When E < E0, the running status of the new model version is determined to be normal, and the new model version is not switched. When E≥E0, the running state of the new model version is determined to be abnormal. The new model version is switched, and a version rollback instruction is generated through the decision controller built into the target edge node. The version rollback instruction controls the target edge node to stop routing business requests to the new model version and switches the new model version back to the current model version.