A model full life cycle management method and system based on an AI operation platform

By performing neuron-level plasticity analysis and swarm intelligence optimization on an AI operation platform, plasticity genes and optimization strategies are generated, and a strategy knowledge base is constructed. This solves the problem of poor adaptability caused by isolated model operation and enables the autonomous and continuous evolution and dynamic adaptive improvement of the model.

CN122242620A · Pending · Publication Date: 2026-06-19 · SHANGHAI XIEZHI INFORMATION TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHANGHAI XIEZHI INFORMATION TECH CO LTD
Filing Date: 2026-03-20
Publication Date: 2026-06-19

Application Information

Patent Timeline: 20 Mar 2026 (Application); 19 Jun 2026 (Publication, CN122242620A)

IPC: G06N3/086; G06N3/098; G06N3/082; G06N3/09; G06N3/042; G06N3/10; G06N3/006; G06N5/022; G06F8/656; G06F11/07; G06F11/30; G06N3/0464


Abstract figure: CN122242620A_ABST


Abstract

This invention provides a model lifecycle management method and system based on an AI operation platform. The method includes: acquiring a target model and initializing a model population in the cloud; establishing or accessing a global experience knowledge base; acquiring plasticity genes based on the target model and using them as constraints; generating an optimization strategy set and constructing a strategy knowledge base using a swarm intelligence optimization algorithm; acquiring a new generation model and updating the global experience knowledge base through a co-evolution mechanism based on the model population and the global experience knowledge base; using the new generation model as a new target model and updating the strategy knowledge base; loading the target model, its corresponding plasticity genes, and the strategy knowledge base on the device; acquiring environmental feature vectors in real time and combining them with the strategy knowledge base to obtain dynamic adjustment strategies; then performing hot switching on the target model in conjunction with the plasticity genes and obtaining diagnostic root causes. This invention achieves model experience accumulation, improves the model's long-term adaptability, and ultimately enables the model's autonomous and continuous evolution.


Description

Technical Field

[0001] This invention relates to the field of artificial intelligence operation and maintenance, and specifically to a method and system for full lifecycle management of models based on an AI operation platform.

Background Technology

[0002] Current mainstream AI operation and maintenance platforms treat trained models as static artifacts to be deployed and monitored. This paradigm has a fundamental flaw: it does not allow the model population to accumulate and pass on experience, and therefore cannot achieve autonomous, collaborative evolution.

[0003] Specifically, current AI operation and maintenance platforms usually manage models as isolated individuals whose capabilities are fixed from the moment of deployment. For example, when monitoring triggers retraining or a version rollback, the adaptive knowledge and optimization strategies that the old model acquired during operation are lost and cannot be transferred to the new model, so group experience cannot accumulate.

[0004] Consequently, models managed on existing AI operation and maintenance platforms cannot evolve: their overall intelligence does not improve step by step through continued operation, and when faced with new challenges the platform can only respond passively. The entire life cycle is linear and open-loop, long-term adaptability depends on human intervention, and the models cannot evolve autonomously and continuously.

Summary of the Invention

[0005] Therefore, the technical problem to be solved by the present invention is to provide a model lifecycle management method and system based on an AI operation platform, so as to solve the technical problem in the prior art that the model operates as an isolated individual and its group experience cannot be accumulated, which leads to poor long-term adaptability of the model and the inability to achieve autonomous and continuous evolution.

[0006] This invention provides a model lifecycle management method based on an AI operation platform, applied in the cloud, comprising: A1: obtaining the target model, initializing the model population, and establishing or connecting to the global experience knowledge base; A2: performing neuron-level plasticity analysis on the target model to generate corresponding plasticity genes; A3: using the plasticity genes as constraints, generating, by a swarm intelligence optimization algorithm, a set of optimization strategies for the target model under various environmental constraints; A4: constructing a strategy knowledge base from the set of optimization strategies; A5: based on the model population and the global experience knowledge base, driving the model population to optimize iteratively through a co-evolution mechanism, so as to obtain a new generation of models and update the global experience knowledge base; A6: using the new-generation model as the new target model and repeating steps A2 to A4 to update the strategy knowledge base.

[0007] This invention also provides a model lifecycle management method based on an AI operation platform, applied on the device side, comprising: B1: loading the target model together with its corresponding plasticity genes and strategy knowledge base; B2: acquiring environmental feature vectors in real time and combining them with the strategy knowledge base to obtain dynamic adjustment strategies; B3: performing hot switching on the running target model based on the dynamic adjustment strategies and the plasticity genes; B4: performing predictive health monitoring on the hot-switched target model to obtain diagnostic root causes.

[0008] In another aspect, this invention provides a model lifecycle management system based on an AI operation platform, comprising: a cloud processing module, configured to acquire the target model and initialize the model population, establish or access the global experience knowledge base, perform neuron-level plasticity analysis on the target model to generate plasticity genes, use the plasticity genes as constraints to generate an optimization strategy set with a swarm intelligence optimization algorithm, and build a strategy knowledge base from that set; the cloud processing module further drives the model population, based on the model population and the global experience knowledge base, to optimize iteratively through a co-evolution mechanism, so as to obtain a new generation of models and update the global experience knowledge base, and uses the new-generation model as the new target model to restart plasticity analysis, strategy generation, and knowledge base construction; a device-side processing module, connected to the cloud processing module and configured to load the target model together with its plasticity genes and strategy knowledge base, acquire environmental feature vectors in real time, obtain dynamic adjustment strategies through consensus fusion with the strategy knowledge base, hot-switch the online target model based on the dynamic adjustment strategies and the plasticity genes, and perform predictive health monitoring on the hot-switched model to obtain diagnostic root causes; and a data feedback update module, configured to feed the diagnostic root cause data obtained by the device-side processing module back to the cloud processing module, which updates the global experience knowledge base with this feedback to drive the next round of co-evolution.

[0009] In summary, this invention performs neuron-level plasticity analysis on the target model to generate plasticity genes, which provide a foundation for structured adaptation of the model. Using these plasticity genes as constraints, a swarm intelligence optimization algorithm generates a set of optimization strategies from which a strategy knowledge base is constructed. In addition, the model population is initialized and iteratively optimized by a co-evolution mechanism based on the model population and the global experience knowledge base to obtain a new generation of models. The co-evolution mechanism specifically comprises evaluating fitness, selecting elite individuals, performing crossover recombination to obtain offspring models, and fine-tuning the offspring models based on the global experience knowledge base; the global experience knowledge base is updated accordingly, forming a closed evolutionary loop. Meanwhile, on the device side, environmental feature vectors are acquired in real time and combined with the strategy knowledge base to obtain dynamic adjustment strategies; hot switching is performed on the target model based on these strategies and the plasticity genes, and predictive health monitoring yields diagnostic root causes that are fed back to the cloud. The method thus overcomes fundamental defects of existing technologies, namely isolated model operation, the inability to accumulate and inherit collective intelligence, and the lack of autonomous evolution capability in the system as a whole. It achieves continuous autonomous optimization from population initialization through co-evolution to new-generation models, and improves the long-term adaptability of the model in dynamic environments.

Attached Figure Description

[0010] Figure 1 is a flowchart of the cloud-side execution of the model lifecycle management method based on an AI operation platform according to the present invention; Figure 2 is a flowchart of the device-side execution of the model lifecycle management method based on an AI operation platform according to the present invention; Figure 3 is a schematic diagram of the model lifecycle management system based on an AI operation platform according to the present invention.

Detailed Implementation

[0011] To make the technical solution of the present invention clearer and its technical advantages more apparent, the technical solution of the present invention will be clearly and completely described below in conjunction with specific embodiments. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of the present invention.

[0012] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, components, features, and elements with the same names in different embodiments of this application may have the same meaning or different meanings, the specific meaning of which must be determined by its interpretation in that specific embodiment or further in conjunction with the context of that specific embodiment.

[0013] It is understood that although the steps in the flowcharts of this application's embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least a portion of the sub-steps or stages of other steps.

[0014] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0015] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.

[0016] As shown in Figure 1, this embodiment provides a model lifecycle management method based on an AI operation platform, applied in the cloud, comprising: A1: obtaining the target model, initializing the model population, and establishing or connecting to the global experience knowledge base.

[0017] A2: performing neuron-level plasticity analysis on the target model to generate corresponding plasticity genes; A3: using the plasticity genes as constraints, generating, by a swarm intelligence optimization algorithm, a set of optimization strategies for the target model under various environmental constraints; A4: constructing a strategy knowledge base from the set of optimization strategies; A5: based on the model population and the global experience knowledge base, driving the model population to optimize iteratively through a co-evolution mechanism, so as to obtain a new generation of models and update the global experience knowledge base; A6: using the new-generation model as the new target model and repeating steps A2 to A4 to update the strategy knowledge base.

[0018] Step A1 proceeds as follows. The target model is obtained by loading a pre-trained artificial intelligence model from a pre-built model repository or a user-specified path. The target model can be a neural network model or another machine learning model, in formats including but not limited to TensorFlow, PyTorch, or ONNX. In some possible embodiments, for an image classification task, the target model could be a ResNet-50 convolutional neural network pre-trained on the ImageNet dataset and stored as a PyTorch .pt file in the model repository. The model is retrieved by calling the platform's model loading interface, which reads the file and instantiates it as an executable target model object.

[0019] In step A1, the model population is initialized. Specifically, multiple model variants are generated from the target model through operations such as random parameter perturbation, structural fine-tuning, or genetic mutation, and together form the initial model population. The population size can be preset according to computing resources and task complexity, for example 10 to 100 individuals, to ensure population diversity. In some possible embodiments, continuing the previous example, when the population is initialized from the ResNet-50 target model, small Gaussian random perturbations can be applied to the convolutional layer weights of model replicas (random parameter perturbation); random masking can be applied, such as dropping 20% of the neurons in a fully connected layer (structural fine-tuning); or the structure of a specific residual block can be swapped between any two model replicas (genetic mutation). Repeating such operations generates, for example, 50 subtly different ResNet-50 variants, which form the initial model population.
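The three initialization operations can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: a model is represented as a hypothetical list of flat weight blocks (`Block`, `Model`), the last block stands in for the fully connected layer, and the noise scale `sigma=0.01` is an illustrative value not given in the text.

```python
import copy
import random
from typing import List

Block = List[float]  # hypothetical: flattened weights of one layer or residual block
Model = List[Block]  # hypothetical: a model as an ordered list of blocks


def perturb(model: Model, sigma: float, rng: random.Random) -> Model:
    """Random parameter perturbation: add small Gaussian noise to every weight."""
    return [[w + rng.gauss(0.0, sigma) for w in block] for block in model]


def mask(model: Model, block_idx: int, drop_ratio: float, rng: random.Random) -> Model:
    """Structural fine-tuning: zero out a fraction of the units in one block."""
    child = copy.deepcopy(model)
    units = child[block_idx]
    for i in rng.sample(range(len(units)), int(len(units) * drop_ratio)):
        units[i] = 0.0
    return child


def swap(a: Model, b: Model, block_idx: int) -> Model:
    """The text's 'genetic mutation': copy replica a, but take one block from replica b."""
    child = copy.deepcopy(a)
    child[block_idx] = list(b[block_idx])
    return child


def init_population(target: Model, size: int, seed: int = 0) -> List[Model]:
    rng = random.Random(seed)
    population: List[Model] = []
    while len(population) < size:
        op = rng.choice(("perturb", "mask", "swap"))
        # Swapping needs two existing variants, so fall back to perturbation early on.
        if op == "perturb" or (op == "swap" and len(population) < 2):
            population.append(perturb(target, sigma=0.01, rng=rng))
        elif op == "mask":
            # The last block plays the role of the fully connected layer.
            population.append(mask(target, len(target) - 1, drop_ratio=0.2, rng=rng))
        else:
            a, b = rng.sample(population, 2)
            population.append(swap(a, b, rng.randrange(len(target))))
    return population


pop = init_population([[0.5] * 10, [1.0] * 10], size=50, seed=1)
print(len(pop))  # 50
```

Every operation works on a deep copy, so the target model itself is never modified while the population is being built.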

[0020] In step A1, a global experience knowledge base is established or accessed. Specifically, if a global experience knowledge base already exists in the system, it is directly accessed; otherwise, a new structured database is created as the global experience knowledge base to store historical strategies, performance indicators, environmental characteristics, and diagnostic root cause data during the model evolution process. The global experience knowledge base needs to support graph structure or vectorized storage to facilitate data retrieval and updates for the subsequent co-evolution mechanism.
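A toy in-memory version of the vectorized-storage option makes the access-or-create behaviour and similarity retrieval concrete. Everything here is hypothetical: `ExperienceRecord`, `GlobalExperienceKB`, `get_or_create_kb`, and the use of cosine similarity for retrieval are illustrative choices rather than the platform's schema; a real deployment would presumably use a graph or vector database.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExperienceRecord:
    env_features: List[float]  # environmental characteristics as a vector
    strategy: str              # identifier of a historical strategy
    metrics: Dict[str, float]  # performance indicators, e.g. {"acc": 0.91}
    root_cause: str = ""       # diagnostic root cause, if one was reported


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class GlobalExperienceKB:
    records: List[ExperienceRecord] = field(default_factory=list)

    def add(self, record: ExperienceRecord) -> None:
        self.records.append(record)

    def query(self, features: List[float], k: int = 1) -> List[Tuple[float, ExperienceRecord]]:
        """Return the k records whose environment vectors are most similar."""
        scored = [(_cosine(features, r.env_features), r) for r in self.records]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


def get_or_create_kb(registry: Dict[str, GlobalExperienceKB], name: str) -> GlobalExperienceKB:
    """Access the knowledge base if it already exists; otherwise create it."""
    if name not in registry:
        registry[name] = GlobalExperienceKB()
    return registry[name]


registry: Dict[str, GlobalExperienceKB] = {}
kb = get_or_create_kb(registry, "global")
kb.add(ExperienceRecord([1.0, 0.0], "prune-A", {"acc": 0.9}))
kb.add(ExperienceRecord([0.0, 1.0], "prune-B", {"acc": 0.8}))
```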

[0021] In summary, step A1 turns a single static model into a dynamically evolving population and establishes the global experience knowledge base as the carrier in which collective experience is accumulated and passed on. This provides the data and structural foundation for the subsequent population- and knowledge-base-driven co-evolution and for dynamic strategy matching on the device side, so that the lifecycle management method forms a closed-loop system that can be bootstrapped and that has memory and learning capabilities.

[0022] Step A2 comprises: performing parameter sensitivity analysis and activation stability analysis on the neurons of the target model to generate a plasticity score matrix; and analyzing the pairwise correlations of neuron activations in the target model to construct a neuron dependency graph; wherein the plasticity gene consists of the plasticity score matrix and the neuron dependency graph.

[0023] In step A2, parameter sensitivity analysis is performed on the neurons of the target model. Specifically, for each neuron or parameter group, the parameter values are perturbed and the resulting change in the model's overall output, or in a task-specific performance index, is observed to compute a sensitivity score. For example, the norm of the gradient of the loss function with respect to the neuron's parameters can be computed by backpropagation: the larger the gradient norm, the more strongly the neuron influences the model output, and the greater its plasticity potential. Alternatively, Monte Carlo sampling can be used to perturb the parameters randomly and estimate the expected performance degradation. In some possible embodiments, continuing the example above, when parameter sensitivity analysis is performed on the ResNet-50 model, the L2 norm of the gradient of the classification loss on the ImageNet validation set with respect to the weights of each neuron in the last fully connected layer (the classification layer) can be computed and used as that neuron's parameter sensitivity score.
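For a classification layer trained with softmax cross-entropy, the per-sample gradient with respect to row $j$ of the weight matrix is $(p_j - y_j)\,x$, so the sensitivity score can be illustrated without an autodiff framework. The sketch assumes a single fully connected layer with logits $z = Wx + b$ and averages the gradient over the evaluation samples before taking the L2 norm; `fc_sensitivity` is a hypothetical helper, and in practice the gradients would come from the framework's own backpropagation.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fc_sensitivity(W: List[List[float]], b: List[float],
                   features: List[List[float]], labels: List[int]) -> List[float]:
    """L2 norm of d(mean cross-entropy)/dW_j for each output neuron j.

    W has one row per output neuron; features are the layer inputs
    (e.g. pooled features); labels are integer class indices.
    """
    n_out, d, n = len(W), len(W[0]), len(labels)
    grad = [[0.0] * d for _ in range(n_out)]
    for x, y in zip(features, labels):
        z = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
        p = softmax(z)
        for j in range(n_out):
            delta = p[j] - (1.0 if j == y else 0.0)  # dLoss/dz_j for this sample
            for k in range(d):
                grad[j][k] += delta * x[k] / n
    return [math.sqrt(sum(g * g for g in row)) for row in grad]


# Three classes, zero weights: every probability is 1/3 and the label is class 1,
# so the true class receives twice the sensitivity of the other two neurons.
scores = fc_sensitivity([[0.0], [0.0], [0.0]], [0.0] * 3, [[2.0]], [1])
```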

[0024] In step A2, activation stability analysis is performed on the neurons of the target model. Specifically, the target model is run on a representative subset of the dataset, and the activation values of each neuron are collected across input samples. Statistics of the activation distribution, such as the mean, variance, or coefficient of variation, are computed and used as an activation stability score. Neurons whose activations fluctuate strongly (high variance) may be units that respond to specific features and have greater adjustment potential, whereas neurons that are overly stable or rarely activated may have lower plasticity. The representative subset is obtained from the target model's original training or validation set by random or stratified sampling, so that a small but representative dataset, such as a 10% random subset of the ImageNet validation set, is used for activation collection and analysis. In some possible embodiments, continuing the example above, this 10% random subset is fed into the ResNet-50 model and forward propagation is performed; for each neuron in the last fully connected layer, the activation values it produces on all input samples are recorded as an activation value sequence; the standard deviation σ and mean μ of this sequence are computed, and the coefficient of variation CV = σ/μ is used as the neuron's activation stability score.
For example, if a neuron's activation fluctuates greatly across image categories such as cats, dogs, and cars (σ large, μ moderate, and hence a large CV), the neuron is sensitive to specific category features, and its plasticity score should be correspondingly large.
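The activation stability score reduces to a per-neuron coefficient of variation. In this sketch the activations are given as a hypothetical samples × neurons list, the population standard deviation is used (the text does not say which variant), and dividing by |μ| and rejecting near-zero means are added assumptions, since σ/μ is undefined for μ = 0 and negative for negative means.

```python
import statistics
from typing import List, Sequence


def activation_cv(activations: Sequence[float], eps: float = 1e-12) -> float:
    """Coefficient of variation sigma/|mu| of one neuron's activation sequence."""
    mu = statistics.fmean(activations)
    if abs(mu) < eps:
        raise ValueError("mean activation is ~0; CV is undefined")
    return statistics.pstdev(activations) / abs(mu)


def layer_stability_scores(acts: List[List[float]]) -> List[float]:
    """acts[i][j] is the activation of neuron j on sample i."""
    return [activation_cv(column) for column in zip(*acts)]


print(layer_stability_scores([[1.0, 2.0], [3.0, 2.0]]))  # [0.5, 0.0]
```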

[0025] Furthermore, the results of the parameter sensitivity and activation stability analyses are combined to assign each neuron a comprehensive plasticity score, and the scores of all neurons constitute the plasticity score matrix. For example, the parameter sensitivity score and the activation stability score can each be normalized and then summed with weights chosen according to task requirements. In some possible embodiments, continuing the example above, for the ResNet-50 image classification task, the parameter sensitivity score (gradient norm) and the activation stability score (coefficient of variation of activations) can be weighted in a 6:4 ratio to obtain each neuron's comprehensive plasticity score, forming the plasticity score matrix of that layer. The plasticity score matrix quantifies the structural adjustment potential of different parts of the target model.
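A sketch of the normalize-then-weight step with the 6:4 ratio from the example. Min-max normalization is an assumption, as the text only says each score is normalized; `plasticity_scores` is a hypothetical helper that produces one row of the plasticity score matrix (one layer).

```python
from typing import List, Sequence


def min_max(xs: Sequence[float]) -> List[float]:
    lo, hi = min(xs), max(xs)
    if hi == lo:  # all values equal: no spread to normalize
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def plasticity_scores(sensitivity: Sequence[float], stability: Sequence[float],
                      w_sens: float = 0.6, w_stab: float = 0.4) -> List[float]:
    """Weighted sum of normalized gradient-norm and activation-CV scores."""
    s, a = min_max(sensitivity), min_max(stability)
    return [w_sens * x + w_stab * y for x, y in zip(s, a)]


row = plasticity_scores([0.2, 1.0, 0.6], [0.1, 0.3, 0.5])
```

Here the neuron with the largest gradient norm ends up with the highest combined score (about 0.8), even though another neuron has the largest activation CV, which reflects the heavier weight on sensitivity.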

[0026] In step A2, the pairwise correlations of neuron activations in the target model are analyzed to construct a neuron dependency graph. Specifically, on the same representative data subset, the activation values of all neurons on each sample are recorded, and the pairwise correlation between any two neuron activation sequences is computed, for example as the Pearson correlation coefficient or the mutual information. A high pairwise correlation indicates that the two neurons are functionally co-activated or co-inhibited, that is, there is an activation dependency between them. In some possible embodiments, let the activation value sequences of neurons A and B on N samples be $(a_1, a_2, \ldots, a_N)$ and $(b_1, b_2, \ldots, b_N)$. Their Pearson correlation coefficient is
$$r = \frac{\sum_{i=1}^{N} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{N} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{N} (b_i - \bar{b})^2}}$$
where $a_i$ and $b_i$ are the activation values of neuron A and neuron B on the i-th sample, $\bar{a}$ and $\bar{b}$ are the means of the respective sequences, N is the total number of samples, and r ∈ [−1, 1]. This is the standard definition of the Pearson correlation coefficient, which measures the degree of linear correlation between two variables; here it is applied to every pair of neuron activation sequences. Traversing all neuron pairs yields a symmetric correlation matrix with one row and one column per neuron. A correlation threshold is then set, and neuron pairs whose correlation exceeds the threshold in absolute value are considered to have a dependency relationship. This forms a graph structure, the neuron dependency graph, in which nodes represent neurons and edges represent activation dependencies between them. The neuron dependency graph can also be directed; for example, the direction of influence can be determined with Granger causality tests.
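The formula and thresholding translate directly into code. This sketch builds an undirected graph as a dictionary of edges keyed by neuron index pairs; returning 0 for a constant activation sequence (where r is undefined) is an added assumption, and the Granger-causality variant for directed edges is not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson r of two equal-length activation sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def dependency_graph(acts_by_neuron: List[List[float]],
                     threshold: float) -> Dict[Tuple[int, int], float]:
    """Undirected edges (i, j), i < j, for neuron pairs with |r| > threshold."""
    edges = {}
    for i, j in combinations(range(len(acts_by_neuron)), 2):
        r = pearson(acts_by_neuron[i], acts_by_neuron[j])
        if abs(r) > threshold:
            edges[(i, j)] = r
    return edges


acts = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, -1.0, -1.0, 1.0]]
print(sorted(dependency_graph(acts, threshold=0.3)))  # [(0, 1)]
```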

[0027] In step A2, finally, the generated plasticity score matrix and neuron dependency graph are encapsulated together to form the plasticity gene of the target model. The plasticity gene is structured metadata that serves as the blueprint, or genetic information, describing how the target model itself can be adjusted safely and effectively, along two dimensions: the adjustment potential expressed by the plasticity score matrix and the structural adjustment constraints expressed by the neuron dependency graph. In some possible embodiments, continuing the example above, neuron-level plasticity analysis is performed on the ResNet-50 target model described in step A1 as follows. First, the plasticity score matrix is generated: for each neuron in the last fully connected layer (the classification layer), the L2 norm of the gradient of the classification loss on the validation set with respect to that neuron's weights is computed as its parameter sensitivity score; the coefficient of variation of its activation values on the ImageNet validation subset is computed as its activation stability score; and the two scores are normalized and weighted to obtain a comprehensive plasticity score for each neuron in the layer, forming the plasticity score matrix. Second, a neuron dependency graph is constructed by computing the Pearson correlation coefficient between the activation values of the fully connected layer's neurons and those of the output neurons of the preceding global average pooling layer, retaining only strong correlations whose absolute value exceeds 0.3. The result is a bipartite neuron dependency graph that identifies the key connections between classification neurons and feature neurons.
Finally, this score matrix and dependency graph together constitute the plasticity gene of the ResNet-50 model for the current task.
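The ResNet-50 example can be sketched as a small container plus a bipartite edge builder. `PlasticityGene` and `bipartite_edges` are hypothetical names; the activations of the fully connected layer and of the global average pooling layer are assumed to be recorded per neuron over the same samples, and the 0.3 threshold follows the example above.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def bipartite_edges(fc_acts: List[List[float]], gap_acts: List[List[float]],
                    threshold: float = 0.3) -> Dict[Tuple[int, int], float]:
    """Keep (classification neuron, feature neuron) pairs with |r| > threshold."""
    edges = {}
    for i, fc in enumerate(fc_acts):
        for j, feat in enumerate(gap_acts):
            r = _pearson(fc, feat)
            if abs(r) > threshold:
                edges[(i, j)] = r
    return edges


@dataclass
class PlasticityGene:
    scores: Dict[str, List[float]]              # layer name -> per-neuron plasticity scores
    dependencies: Dict[Tuple[int, int], float]  # bipartite dependency edges with their r


fc = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]  # two classification neurons, three samples
gap = [[2.0, 4.0, 6.0]]                  # one pooled feature neuron
gene = PlasticityGene(scores={"fc": [0.8, 0.3]}, dependencies=bipartite_edges(fc, gap))
```

Both edges survive here: the first classification neuron is perfectly correlated with the feature neuron, and the second is negatively correlated with |r| = 0.5, which still exceeds the threshold.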

[0028] In summary, step A2 uses systematic neuron-level analysis to turn the target model from a black box into an object whose internal structure and characteristics can be quantified. The generated plasticity genes provide clear mathematical constraints (the neuron dependency graph) and optimization guidance (the plasticity score matrix) for the optimization in step A3, and also provide a key basis for safe hot switching on the device side in step B3; they are the technical foundation for dynamic adaptive evolution of the model.

[0029] Step A3 comprises: encoding neuron pruning strategies for the target model as particle positions; evaluating, in multiple virtual simulation environments, the accuracy, latency, and energy consumption parameters of the neuron pruning strategy corresponding to each particle position; and, using the neuron dependency graph as a constraint, driving the particle swarm to explore iteratively and collaboratively to obtain an optimization strategy set.

[0030] In step A3, each neuron pruning strategy for the target model is encoded as a particle position. Specifically, each particle in the particle swarm optimization algorithm has a position that represents a specific neuron pruning strategy. The particle position is a vector whose dimension equals the total number N of neurons analyzed in step A2, and each element corresponds to the pruning decision for one specific neuron. The decision can be encoded in binary, where 1 means the neuron is retained and 0 means it is pruned, i.e., removed or masked. Alternatively, continuous encoding can be used, where each element takes a value in [0, 1] representing the importance weight for retaining the neuron, and a threshold such as 0.5 converts it into a pruning action: an element value ≥ 0.5 means the neuron is retained; otherwise it is pruned (removed or masked). In the swarm initialization stage, all particle positions are generated randomly in the solution space, and this initialization can be guided by the plasticity score matrix from step A2. For example, with binary encoding, the probability that the decision element of a neuron with a low plasticity score is initialized to 0 can be raised, for example from 0.5 to 0.7. The quantified plasticity scores thus serve as prior knowledge that improves search efficiency.
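Both encodings and the score-guided initialization fit in a few lines. The cut-off `low_score` that decides which neurons count as low plasticity is a hypothetical parameter not given in the text; the probabilities 0.5 and 0.7 are taken from the example.

```python
import random
from typing import List, Sequence


def decode(position: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Continuous encoding -> binary mask (1 = retain, 0 = prune)."""
    return [1 if v >= threshold else 0 for v in position]


def init_binary_particle(scores: Sequence[float], low_score: float, rng: random.Random,
                         p_zero_low: float = 0.7, p_zero_default: float = 0.5) -> List[int]:
    """Random binary position; low-plasticity neurons start pruned more often."""
    position = []
    for s in scores:
        p_zero = p_zero_low if s < low_score else p_zero_default
        position.append(0 if rng.random() < p_zero else 1)
    return position


rng = random.Random(42)
swarm = [init_binary_particle([0.1, 0.9, 0.4, 0.2], low_score=0.3, rng=rng) for _ in range(20)]
print(decode([0.2, 0.5, 0.9]))  # [0, 1, 1]
```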

[0031] In step A3, the accuracy, latency, and energy consumption parameters of the neuron pruning strategy at each particle position are evaluated in multiple virtual simulation environments. Specifically, the virtual simulation environments simulate different hardware conditions on which the target model may run, such as different computing chips and memory bandwidths. The neuron pruning strategy encoded at a particle position is decoded into concrete pruning operations on the target model to obtain a pruned sub-model. The sub-model is deployed to each virtual simulation environment in turn, and forward inference is performed on a calibration dataset, which can be obtained in the same way as the representative data subset in step A2. In each environment, three core performance indicators of the sub-model are measured and recorded together: an accuracy parameter, such as classification accuracy; a latency parameter, such as single-inference time; and an energy consumption parameter, such as energy estimated from a power consumption model. Each particle position, i.e., each neuron pruning strategy, thus obtains an evaluation result set containing its accuracy, latency, and energy consumption parameters in every virtual environment.
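To show the shape of the per-particle evaluation result set, here is a sketch with a deliberately crude, hypothetical cost model: latency is taken as proportional to the number of retained neurons divided by the simulated chip's throughput, energy as average power times latency, and accuracy comes from a caller-supplied `accuracy_fn` standing in for inference on the calibration dataset. `SimEnv`, `Metrics`, `ops_per_neuron`, and all numbers are illustrative assumptions, not values from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class SimEnv:
    name: str
    ops_per_ms: float  # hypothetical throughput of the simulated chip
    watts: float       # hypothetical average power draw


@dataclass
class Metrics:
    accuracy: float
    latency_ms: float
    energy_mj: float


def evaluate(mask: Sequence[int], envs: List[SimEnv],
             accuracy_fn: Callable[[Sequence[int]], float],
             ops_per_neuron: float = 1000.0) -> Dict[str, Metrics]:
    """Evaluation result set of one pruning strategy across all environments."""
    acc = accuracy_fn(mask)  # toy assumption: accuracy is hardware-independent
    work = sum(mask) * ops_per_neuron  # only retained neurons cost compute
    results = {}
    for env in envs:
        latency = work / env.ops_per_ms
        results[env.name] = Metrics(acc, latency, env.watts * latency)  # W x ms = mJ
    return results


envs = [SimEnv("edge-npu", 1000.0, 2.0), SimEnv("server-gpu", 10000.0, 50.0)]
result_set = evaluate([1, 0, 1, 1], envs, accuracy_fn=lambda m: 0.9)
```

A faithful simulator would also model memory bandwidth and per-chip accuracy effects; the point here is only that each strategy maps to one metrics record per environment.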

[0032] In step A3, the neuron dependency graph is used as a constraint that drives the particle swarm to explore iteratively and collaboratively and to obtain an optimization strategy set. Specifically, the neuron dependency graph generated in step A2 serves as a hard constraint in the particle swarm optimization process. In each iteration, when new candidate particle positions (i.e., candidate neuron pruning strategies) are generated, the set of neurons to be pruned is checked. If a neuron in this set has a strong connection in the neuron dependency graph with a neuron in the set to be retained (e.g., the absolute value of their correlation coefficient exceeds a preset threshold), the neuron pruning strategy may damage a key functional pathway within the target model. It is therefore marked as violating the neuron connection constraints. A maximum penalty term can be added to the fitness function of such strategies so that they are eliminated during optimization. On this basis, the particle swarm optimization algorithm iteratively updates the positions of all particles according to each particle's historical (personal) best and the swarm's global best, performing a cooperative search within the part of the solution space that satisfies the neuron dependency graph constraints. After multiple rounds of iteration and convergence, it outputs an optimization strategy set. This set can take the form of a Pareto front, i.e., multiple mutually non-dominated solutions, none of which is better than another on every objective. Each solution is a neuron pruning scheme that achieves an optimal trade-off among the accuracy, latency, and energy consumption parameters while respecting the neuron connection constraints.
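The constraint handling and the final Pareto filtering can be sketched as below. This is a hedged illustration only. It assumes the dependency graph is stored as an edge list of correlation coefficients, and it uses the standard binary PSO update (a sigmoid of the velocity gives the probability that an element becomes 1) as one possible way to move binary particles. The coefficients `w`, `c1`, `c2` and `vmax` are hypothetical defaults, not values from the source.

```python
import math
import random
from typing import Callable, Iterable, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (neuron i, neuron j, correlation) from the dependency graph


def violates_dependency(mask: Sequence[int], edges: Iterable[Edge], threshold: float) -> bool:
    """True if a strong edge links a pruned neuron to a retained one."""
    return any(abs(c) > threshold and mask[i] != mask[j] for i, j, c in edges)


def penalized_cost(mask: Sequence[int], cost_fn: Callable[[Sequence[int]], float],
                   edges: Sequence[Edge], threshold: float) -> float:
    """Cost to minimise; constraint violators receive the maximum penalty."""
    return math.inf if violates_dependency(mask, edges, threshold) else cost_fn(mask)


def bpso_step(x: Sequence[int], v: Sequence[float], pbest: Sequence[int],
              gbest: Sequence[int], rng: random.Random, w: float = 0.7,
              c1: float = 1.5, c2: float = 1.5, vmax: float = 4.0):
    """One binary-PSO update: velocity -> sigmoid -> probability of keeping."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        vi = w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        vi = max(-vmax, min(vmax, vi))  # clamp velocity
        new_v.append(vi)
        new_x.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-vi)) else 0)
    return new_x, new_v


Objectives = Tuple[float, float, float]  # (accuracy, latency, energy)


def dominates(a: Objectives, b: Objectives) -> bool:
    """Higher accuracy is better; lower latency and energy are better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(cands: List[Tuple[str, Objectives]]) -> List[Tuple[str, Objectives]]:
    """Keep the candidates that no other candidate dominates."""
    return [(k, o) for k, o in cands if not any(dominates(o2, o) for _, o2 in cands)]
```

Returning `math.inf` for violators is one concrete reading of the "maximum penalty term": such a particle can never become a personal or global best.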

[0033] In summary, step A3 applies a swarm intelligence optimization algorithm to the plasticity genes obtained in step A2. It automatically generates multiple optimization strategies for different device hardware conditions and environments while preserving the functional integrity of the target model. Specifically, the plasticity score matrix in the plasticity genes guides the initialization and search tendency of the particle swarm, and the neuron dependency graph acts as a structural constraint. The resulting optimization strategy set is the input for building the strategy knowledge base in step A4, completing the transition from structural analysis of the target model to the generation of optimization strategies.

[0034] The method described in step A4 above includes two parts. First, historical strategies, their corresponding virtual environment features, and performance indicators are extracted from the optimization strategy set as nodes and relationships to construct a heterogeneous graph, and graph representation learning is used to obtain embedding representations that form a strategy-environment-performance relationship knowledge graph. Second, a strategy meta-model is trained that takes virtual environment features as input and uses the corresponding optimal strategy identifiers as supervision labels.

[0035] In step A4, historical strategies, corresponding virtual environment features, and performance metrics are extracted from the optimization strategy set as nodes and relationships to construct a heterogeneous graph. Specifically, the optimization strategy set output in step A3 serves as the data source. Each optimization strategy in the set, the corresponding virtual environment feature, and the performance metrics evaluated for that strategy under that virtual environment feature are abstracted into a strategy node, an environment feature node, and a performance metric node, respectively. The performance metrics may include accuracy, latency, and energy consumption parameters. A strategy node contains the encoded vector of the optimization strategy, i.e., the vector representation of the particle position in step A3. An environment feature node is a multi-dimensional vector describing the hardware configuration of a virtual simulation environment, for example the number of CPU cores, GPU computing power (FLOPS), memory capacity, and power consumption limit. A performance metric node contains the multi-dimensional evaluation results. Edges are then created according to the relationships among these data. For example, an application edge is established between a strategy node and the environment feature node under which it was evaluated. A generation edge is established between a strategy node and the performance metric node it produced under that environment. Similarity edges can also be established between different strategy nodes whose corresponding virtual environment features are similar or whose performance metrics are close. These operations yield a heterogeneous graph containing multiple types of nodes and relational edges.
In some possible embodiments, following the example above, assuming that step A3 generates an optimization strategy set containing 100 optimization strategies for the ResNet-50 model, and each optimization strategy is evaluated in 3 virtual environments, a heterogeneous graph containing 100 strategy nodes, 3 environment feature nodes, and 300 performance indicator nodes can be constructed, and connections can be established based on the above application edges, generation edges, and similarity edges.
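Continuing the example, the node and edge construction can be sketched as follows. The sketch is illustrative only, and it assumes a concrete rule for similarity edges that the source leaves open: two strategies are linked when, in the same environment, every performance metric differs by at most a hypothetical tolerance `perf_tol`. Node identifiers (`S:`, `E:`, `P:` prefixes) and relation names are likewise invented for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple

Perf = Tuple[float, float, float]  # (accuracy, latency, energy)


def build_hetero_graph(evaluations: Sequence[Tuple[str, str, Perf]], perf_tol: float = 0.01):
    """Build typed nodes and (src, relation, dst) edges from (strategy, env, perf) records."""
    nodes: Dict[str, str] = {}
    edges: List[Tuple[str, str, str]] = []
    by_env: Dict[str, List[Tuple[str, Perf]]] = {}
    for sid, eid, perf in evaluations:
        # The performance node id encodes its (strategy, environment) pair.
        s, e, p = f"S:{sid}", f"E:{eid}", f"P:{sid}@{eid}"
        nodes[s], nodes[e], nodes[p] = "strategy", "environment", "performance"
        edges.append((s, "applied_in", e))  # application edge
        edges.append((s, "generates", p))   # generation edge
        by_env.setdefault(eid, []).append((s, perf))
    similar: Set[Tuple[str, str]] = set()
    for items in by_env.values():
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (sa, pa), (sb, pb) = items[a], items[b]
                if sa != sb and all(abs(x - y) <= perf_tol for x, y in zip(pa, pb)):
                    similar.add(tuple(sorted((sa, sb))))
    edges.extend((sa, "similar_to", sb) for sa, sb in sorted(similar))
    return nodes, edges


# Usage mirroring the example: 100 strategies, each evaluated in 3 environments.
evals = [(f"s{i}", f"env{j}", (0.70 + 0.001 * i, 10.0 + i, 1.0))
         for i in range(100) for j in range(3)]
nodes, edges = build_hetero_graph(evals)
```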

[0036] In step A4, graph representation learning is used to obtain embedding representations that form a strategy-environment-performance relationship knowledge graph. Specifically, graph representation learning algorithms such as RGCN and HGT are used to learn a low-dimensional vector representation, i.e., an embedding, for each node in the heterogeneous graph. The learning process encodes two kinds of information into a dense vector space: the semantic information of the nodes (the structure of a strategy, the hardware features of an environment, and the numerical values of the performance metrics), and their relationships in the graph (for example, that optimization strategy A yields performance metrics P under virtual environment feature H). Training brings closely related nodes closer together in the embedding space; for example, an optimization strategy moves closer to the virtual environment feature under which it achieves its best performance. Finally, all nodes and their relationships, together with their embeddings, are integrated into a structured strategy-environment-performance relationship knowledge graph. This knowledge graph efficiently supports similarity retrieval, relational reasoning, and strategy recommendation.

[0037] In step A4, the strategy meta-model is trained with virtual environment features as input and the corresponding optimal strategy identifiers as supervision labels. Specifically, supervised learning samples are derived from the optimization strategy set. For each virtual environment feature, an optimal strategy is selected from all optimization strategies evaluated under it according to a preset trade-off rule. Examples of such a rule are selecting the strategy with the highest accuracy among those whose latency is below a certain threshold, or selecting the strategy that maximizes the accuracy parameter minus the latency parameter under an energy consumption constraint. The virtual environment feature is used as the model input feature. The identifier of the optimal strategy selected under it, such as the index or hash value of the strategy's encoded vector, is used as the supervision label. A classification model, such as a multi-class classifier, is then trained as the strategy meta-model, which learns the mapping from virtual environment features to optimal strategies. In some possible embodiments, a multilayer perceptron or a gradient boosting decision tree such as XGBoost can be used as the structure of the strategy meta-model. Continuing the example above, the optimal strategy under each of the three virtual environment features is determined. Then, with the features of the three environment feature nodes as input and the indices of the three optimal strategies as labels, a three-class strategy meta-model is trained.
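The label construction and training step can be sketched as below, under stated assumptions. The trade-off rule is the first example above (highest accuracy under a latency cap). A 1-nearest-neighbour classifier is swapped in for the multilayer perceptron or XGBoost model purely to keep the sketch dependency-free. Strategy identifiers, feature values, and the 30 ms cap are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Eval = Tuple[str, float, float]  # (strategy_id, accuracy, latency_ms)


def select_optimal(evals: Sequence[Eval], latency_max_ms: float) -> Optional[str]:
    """Trade-off rule: highest accuracy among strategies under the latency cap."""
    feasible = [e for e in evals if e[2] < latency_max_ms]
    return max(feasible, key=lambda e: e[1])[0] if feasible else None


def build_training_set(env_features: Dict[str, List[float]],
                       evals_by_env: Dict[str, List[Eval]], latency_max_ms: float):
    """One (environment features, optimal strategy id) sample per environment."""
    X, y = [], []
    for env_id, feat in env_features.items():
        label = select_optimal(evals_by_env[env_id], latency_max_ms)
        if label is not None:  # environments with no feasible strategy are skipped
            X.append(feat)
            y.append(label)
    return X, y


class NearestNeighbourMetaModel:
    """Stand-in for the MLP / XGBoost meta-model: 1-nearest neighbour on features."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestNeighbourMetaModel":
        self.X, self.y = X, y
        return self

    def predict(self, x: Sequence[float]) -> str:
        dists = [math.dist(x, xi) for xi in self.X]
        return self.y[dists.index(min(dists))]


env_features = {"high": [8, 10.0, 12], "mid": [4, 3.0, 6], "iot": [1, 0.5, 1]}
evals_by_env = {
    "high": [("s1", 0.76, 12.0), ("s2", 0.74, 8.0)],
    "mid": [("s1", 0.76, 40.0), ("s2", 0.74, 25.0)],
    "iot": [("s3", 0.60, 28.0), ("s2", 0.74, 90.0)],
}
X, y = build_training_set(env_features, evals_by_env, latency_max_ms=30.0)
model = NearestNeighbourMetaModel().fit(X, y)
```

In practice the features would be normalized before any distance-based or gradient-based model is trained, since raw hardware figures differ by orders of magnitude.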

[0038] In summary, step A4 constructs the discrete set of optimization strategies generated in step A3 into a strategy-environment-performance relationship knowledge graph that supports complex queries and relational reasoning, and a strategy meta-model that can quickly recommend the optimal strategy based on the input virtual environment features. The strategy-environment-performance relationship knowledge graph and the strategy meta-model together constitute the core of the strategy knowledge base, providing a knowledge foundation and decision rules for dynamically selecting and adjusting strategies based on real-time environment features in step B2 on the device.

[0039] The method described in step A5 above includes: evaluating the fitness of the model population on a preset device environment set, and selecting elite individuals based on the fitness; performing cross-recombination on the elite individuals to obtain offspring models; fine-tuning the offspring models based on a global experience knowledge base, i.e., guiding the offspring models to learn high-performance strategies from the global experience knowledge base; combining the elite individuals with the fine-tuned offspring models to form a new generation model population, and obtaining a new generation model from the new generation model population; and updating the high-performance strategies generated in this evolutionary iteration to the global experience knowledge base.

[0040] In step A5, the fitness of the model population is evaluated on a preset device environment set, and elite individuals are selected based on fitness. Specifically, a device environment set covering various target device hardware configurations is first defined. It can correspond to, or extend, the virtual simulation environments constructed in step A3. Each model individual in the population initialized in step A1, or in the population produced by the previous round of evolution, is then deployed and evaluated on the device environment set. The evaluation uses a unified test dataset, which can be obtained in the same way as the representative subset in step A2 or the calibration dataset in step A3. The performance indicators of each model individual in every device environment are collected. Next, a multi-objective evaluation method, such as non-dominated sorting based on Pareto dominance, is used to obtain the fitness of each individual. Finally, a selection method such as tournament selection is used to pick the K best-performing individuals from the current population as elite individuals.
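A dependency-free sketch of the fitness assignment and elite selection is given below. It assumes the objectives have been oriented so that larger is better (e.g. accuracy, negative latency, negative energy). Non-dominated sorting is done by simple front peeling, and tournaments run without replacement so that the K elites are distinct. The tournament size is a hypothetical parameter.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every objective, strictly better on one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_rank(objs: Sequence[Sequence[float]]) -> List[int]:
    """Front index per individual (0 = first Pareto front), by repeated peeling."""
    remaining = set(range(len(objs)))
    rank = [0] * len(objs)
    level = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in front:
            rank[i] = level
        remaining -= front
        level += 1
    return rank


def tournament_select(ranks: Sequence[int], k: int, tournament_size: int = 3,
                      seed: int = 0) -> List[int]:
    """Pick k distinct elites; each tournament's lowest-rank contender wins."""
    rng = random.Random(seed)
    pool = list(range(len(ranks)))
    elites: List[int] = []
    while len(elites) < k and pool:
        contenders = rng.sample(pool, min(tournament_size, len(pool)))
        winner = min(contenders, key=lambda i: ranks[i])
        elites.append(winner)
        pool.remove(winner)
    return elites
```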

[0041] In step A5, the elite individuals are cross-recombined to obtain offspring models. Specifically, the selected elite individuals are paired as parent models, and crossover is performed on each pair at the network structure level. One possible implementation is structure exchange: certain layers of the network are selected at random, such as several residual blocks in ResNet-50, and all parameters and structures of those layers are exchanged between the two parents. Another possible implementation is parameter fusion: for each corresponding network layer, the parameters of the two parents are linearly interpolated at a random ratio to generate new parameters. The crossover operation is repeated until the number of offspring equals the original population size or a specified proportion of it.
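Both crossover operators can be sketched on a toy representation in which a model is a dictionary from layer name to a flat list of parameters. This is an assumption made for brevity: real residual blocks would be tensors, and structure exchange is only valid between layers whose input and output shapes match. Layer names such as `block3` are hypothetical.

```python
import copy
import random
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, List[float]]  # layer name -> flat parameter list (toy representation)


def structure_exchange(a: Model, b: Model, layers: Sequence[str]) -> Tuple[Model, Model]:
    """Swap the listed layers wholesale between two parents; parents are not modified."""
    child_a, child_b = copy.deepcopy(a), copy.deepcopy(b)
    for name in layers:
        child_a[name], child_b[name] = copy.deepcopy(b[name]), copy.deepcopy(a[name])
    return child_a, child_b


def parameter_fusion(a: Model, b: Model, rng: random.Random) -> Model:
    """Per layer, child = alpha * a + (1 - alpha) * b with a random alpha in [0, 1)."""
    child: Model = {}
    for name in a:
        alpha = rng.random()
        child[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return child


# Usage: swap "residual blocks" 3 and 4, as in the ResNet-50 example.
A = {"block3": [1.0, 1.0], "block4": [2.0], "fc": [0.0]}
B = {"block3": [5.0, 5.0], "block4": [6.0], "fc": [9.0]}
child_a, child_b = structure_exchange(A, B, ["block3", "block4"])
fused = parameter_fusion(A, B, random.Random(1))
```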

[0042] In step A5, the offspring models are fine-tuned based on the global experience knowledge base, i.e., they are guided to learn high-performance strategies from it. Specifically, the global experience knowledge base established or accessed in step A1 is obtained. It stores historically collected high-performance strategies together with their associated virtual environment features and performance indicators. These strategies include the validated optimization strategies, i.e., neuron pruning strategies, generated in steps A3 and A4 and in previous evolutionary rounds. The goal of fine-tuning is to enable the offspring models to imitate the behavioral patterns implied by the high-performance strategies. One possible implementation is knowledge distillation. A model obtained by applying a high-performance strategy recorded in the global experience knowledge base serves as the teacher, and the offspring model serves as the student; a distillation loss such as the KL divergence pulls the student's output toward the teacher's. Another possible implementation is meta-learning, which treats the global experience knowledge base as a distribution of training tasks. Through rapid adaptive training on this task distribution, for example with the MAML algorithm, the offspring model obtains initial parameters that adapt quickly to new environments. Either fine-tuning process guides the offspring models to learn high-performance strategies from the global experience knowledge base.
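The distillation variant can be illustrated by its loss alone. The sketch below is the standard temperature-softened KL divergence from the teacher's to the student's output distribution, scaled by T squared as in Hinton et al.'s formulation. The temperature value is a hypothetical choice, and the training loop that would minimise this loss over the offspring model's parameters is omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher: model pruned with a stored strategy
    q = softmax(student_logits, temperature)  # student: offspring model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```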

[0043] In step A5, elite individuals are combined with fine-tuned offspring models to form a new generation model population, and new generation models are obtained from this new generation model population. Specifically, the elite individuals obtained in step A5 are merged with the fine-tuned offspring models to form a new generation model population. The combination can be carried out in a preset ratio to achieve a stable population size. For example, elite individuals and offspring models are combined in a ratio of 8:2 to form a new generation model population. Then, the fitness of the new generation model population is evaluated again. Finally, the best-performing model individual is selected from this new generation model population based on its fitness as the new generation model produced in this round of co-evolution.
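The merge step can be sketched as follows. The sketch assumes that elites and offspring are given as model identifiers ordered best-first, and that `fitness_fn` (hypothetical) re-evaluates a model on the device environment set. With `elite_share=0.8` it reproduces the 8:2 example.

```python
from typing import Callable, List, Sequence, Tuple


def next_generation(elites: Sequence[str], offspring: Sequence[str], pop_size: int,
                    fitness_fn: Callable[[str], float],
                    elite_share: float = 0.8) -> Tuple[List[str], str]:
    """Merge elites and fine-tuned offspring at a preset ratio; return (population, best)."""
    n_elite = min(len(elites), round(pop_size * elite_share))
    n_off = min(len(offspring), pop_size - n_elite)
    population = list(elites[:n_elite]) + list(offspring[:n_off])
    scores = {m: fitness_fn(m) for m in population}  # re-evaluate the merged population
    best = max(population, key=lambda m: scores[m])
    return population, best
```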

[0044] In step A5, the high-performance strategies generated in this evolutionary iteration are written to the global experience knowledge base so that experience accumulates. These strategies are of two kinds. The first kind comprises existing strategies from the global experience knowledge base that the offspring models successfully learned and validated during fine-tuning in step A5. The second kind comprises novel strategies produced by crossover or fine-tuning, discovered when evaluating the new generation population, whose performance surpasses the existing records in the global experience knowledge base. The complete information for each such strategy, including its neuron pruning strategy encoding, the context of the model that produced it, the virtual environment features, and the performance indicators, is added to the global experience knowledge base as a new record.
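A minimal sketch of the record format and of the "surpasses the existing records" rule is shown below. The field names, the scalar scoring function, and the choice to compare only against records from the same environment are assumptions. The path that re-records existing strategies after they have been successfully learned is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

ScoreFn = Callable[[Dict[str, float]], float]


@dataclass
class ExperienceRecord:
    strategy_code: Tuple[int, ...]   # binary neuron pruning encoding
    model_context: str               # e.g. generation and parents that produced it
    env_features: Tuple[float, ...]  # virtual environment features
    metrics: Dict[str, float]        # accuracy, latency, energy ...


@dataclass
class ExperienceKB:
    records: List[ExperienceRecord] = field(default_factory=list)

    def best_score(self, env: Tuple[float, ...], score_fn: ScoreFn) -> float:
        scores = [score_fn(r.metrics) for r in self.records if r.env_features == env]
        return max(scores, default=float("-inf"))

    def update(self, candidates: Iterable[ExperienceRecord], score_fn: ScoreFn) -> int:
        """Append candidates that beat the best stored record for their environment."""
        added = 0
        for rec in candidates:
            if score_fn(rec.metrics) > self.best_score(rec.env_features, score_fn):
                self.records.append(rec)
                added += 1
        return added
```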

[0045] In some possible embodiments, continuing the example above, consider the evolution of the ResNet-50 model. The 50 ResNet-50 model variants (i.e., the model population) generated in step A1 are evaluated in three virtual simulation environments simulating high-end mobile phones, mid-range mobile phones, and IoT terminals. A subset of the ImageNet validation set, similar to that described in step A2, serves as the unified test dataset for measuring the classification accuracy, inference latency, and estimated energy consumption of each variant in the three environments. A multi-objective method is then used to compute a fitness for each variant, for example by assigning a larger weight to the accuracy parameter, setting upper limits on latency and energy consumption, and taking a weighted sum. Based on the fitness ranking, the top 5 variants are selected as elite individuals. These 5 elite individuals are paired at random. For one parent pair, residual blocks 3 and 4 are randomly selected and swapped wholesale, producing a new model variant as an offspring model. Repeating this process generates 20 offspring models. From the global experience knowledge base, among the historical records that satisfy a latency below 30 milliseconds in the mid-range mobile phone environment and..., the optimization strategy with the highest classification accuracy, namely a specific neuron pruning scheme, is retrieved. The ResNet-50 model pruned with this strategy serves as the teacher model and the 20 offspring models serve as student models. Fine-tuning by knowledge distillation lets the students learn the teacher's output features and lightweight characteristics. The 5 elite individuals are merged with the 20 fine-tuned offspring models to form a new generation population of 25 individuals.
This new generation population is evaluated again in the same three virtual environments, and the variant with the highest fitness across the three virtual simulation environments is selected as the new generation model produced by this round of co-evolution. The new knowledge generated in this round is then written to the global experience knowledge base. For example, suppose that during fine-tuning an offspring model, by combining residual blocks from elite individuals with the teacher model's pruning strategy, achieves a better energy efficiency ratio than the historical records in the mid-range mobile phone environment. This new structural combination can then be regarded as an implicit new optimization strategy. It is added, together with its performance indicators in that environment, to the global experience knowledge base as a structured record for reference and learning in subsequent evolutionary rounds.

[0046] In summary, step A5, through a co-evolutionary mechanism, combines fitness-based elite selection, cross-recombination promoting diversity, and guided learning based on a global experience knowledge base to drive iterative optimization of the model population. This co-evolutionary mechanism not only generates a new generation of models with better performance, but also achieves the continuous accumulation and evolution of the collective wisdom of the entire system by updating the global experience knowledge base, providing a basis for the autonomous and continuous optimization of the target model's life cycle.

[0047] The method described in step A6 above uses the new generation model as the new target model and re-executes steps A2-A4 to update the strategy knowledge base. Specifically, the new generation model produced by the co-evolution mechanism in step A5 is set as the starting point of a new round of lifecycle management process, that is, it is used as the new target model. Then, with the new target model as input, steps A2 to A4 are re-executed to start a new round of optimization strategy mining and knowledge base construction process.

[0048] As shown in Figure 2, this embodiment provides a model lifecycle management method based on an AI operation platform, applied on the device side, which includes: B1: loading the target model together with its plasticity genes and strategy knowledge base; B2: acquiring environmental feature vectors in real time and combining them with the strategy knowledge base to obtain a dynamic adjustment strategy; B3: performing a hot switch on the running target model based on the dynamic adjustment strategy and the plasticity genes; B4: performing predictive health monitoring on the hot-switched target model to obtain diagnostic root causes.

[0049] The method described in step B2 above includes: inputting the environmental feature vector in parallel into multiple heterogeneous decision units constructed from the strategy knowledge base, the decision units including at least a first recommender built on the strategy meta-model in the strategy knowledge base and a second inference engine built on the strategy-environment-performance relationship knowledge graph in the strategy knowledge base; and obtaining the dynamic adjustment strategy through a consensus fusion algorithm from the recommended strategies and confidence levels output by the decision units.

[0050] In step B2, the environmental feature vector is input in parallel into multiple heterogeneous decision units constructed from the strategy knowledge base. These include at least a first recommender built on the strategy meta-model and a second inference engine built on the strategy-environment-performance relationship knowledge graph. Specifically, the device monitors and collects its hardware status and context information in real time. This information includes at least the currently available computing resources (such as CPU / GPU / NPU utilization), available memory, battery level, thermal status, and network latency. It is fused and standardized into a multi-dimensional environmental feature vector. The vector is then fed in parallel to the two independently running decision units, the first recommender and the second inference engine, to obtain strategy recommendations based on different reasoning logic.
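The fusion, standardization, and parallel dispatch can be sketched as follows. The sensor names and normalization ranges in `FEATURE_RANGES` are hypothetical, and min-max scaling is only one possible standardization. A thread pool stands in for whatever concurrency mechanism the device runtime actually provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

FEATURE_RANGES: Dict[str, Tuple[float, float]] = {  # hypothetical (min, max) per reading
    "cpu_util": (0.0, 100.0),
    "gpu_util": (0.0, 100.0),
    "npu_util": (0.0, 100.0),
    "mem_free_mb": (0.0, 16384.0),
    "battery_pct": (0.0, 100.0),
    "temp_c": (20.0, 90.0),
    "net_latency_ms": (0.0, 500.0),
}

DecisionUnit = Callable[[Sequence[float]], Tuple[str, float]]


def to_feature_vector(readings: Dict[str, float],
                      ranges: Dict[str, Tuple[float, float]] = FEATURE_RANGES) -> List[float]:
    """Clip each reading to its range and min-max scale it to [0, 1]."""
    vec = []
    for name, (lo, hi) in ranges.items():
        x = min(max(readings[name], lo), hi)
        vec.append((x - lo) / (hi - lo))
    return vec


def query_decision_units(vec: Sequence[float],
                         units: Sequence[DecisionUnit]) -> List[Tuple[str, float]]:
    """Send the same vector to every decision unit concurrently; keep input order."""
    with ThreadPoolExecutor(max_workers=len(units)) as pool:
        futures = [pool.submit(unit, vec) for unit in units]
        return [f.result() for f in futures]
```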

[0051] In step B2, the first recommender is implemented by the strategy meta-model trained in step A4. Its input is the environmental feature vector, and its outputs are a first recommended strategy and a first recommendation confidence. The strategy meta-model has already been trained in the cloud in step A4. During on-device inference, the first recommender receives the environmental feature vector, runs the model's forward computation, and outputs the two results. The first recommended strategy is the optimization strategy corresponding to the optimal strategy identifier output by the strategy meta-model. The first recommendation confidence is the meta-model's confidence in that identifier, such as the highest class probability produced by the softmax layer.
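A sketch of how the first recommender could turn the meta-model's raw class scores into a strategy and a confidence is shown below; the highest softmax probability is taken as the confidence, as described above. The logits are assumed to come from the meta-model's forward pass, and `strategy_table`, which maps class indices back to optimization strategies, is a hypothetical lookup.

```python
import math
from typing import Sequence, Tuple


def first_recommender(logits: Sequence[float],
                      strategy_table: Sequence[str]) -> Tuple[str, float]:
    """Return (first recommended strategy, first recommendation confidence)."""
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return strategy_table[best], probs[best]
```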

[0052] In step B2, the second inference engine is based on graph queries and similarity computation. Its input is the environmental feature vector, which it uses for retrieval and reasoning within the strategy-environment-performance relationship knowledge graph constructed in step A4. Its outputs are a second recommended strategy and a second recommendation confidence. The engine first maps the input environmental feature vector into the embedding space, either through a projection network or by using it directly as the query vector. It then computes the similarity (e.g., cosine similarity) between this query and the embeddings of all environment feature nodes in the knowledge graph. The projection network can be trained during the graph representation learning in step A4 or can be a simple linear transformation layer. The engine then identifies the one or more environment feature nodes most similar to the input. Next, by traversing the application edges and generation edges of the knowledge graph, it finds, among the strategy nodes directly connected to these most similar environment nodes, the one with the best performance indicators. This may be, for example, the strategy with the best balance across accuracy, latency, energy consumption, and other metrics, or a Pareto-optimal solution. The optimization strategy represented by this strategy node is output as the second recommended strategy. The second recommendation confidence can be determined from the similarity between the input environmental feature vector and the matched environment feature nodes together with the performance indicators associated with the strategy node. For example, denote by $s$ the similarity between the input environmental feature vector and the matched environment feature node. Denote by $\Delta p$ the normalized improvement of the performance metrics associated with the strategy node relative to the average of all strategies under the same environment in the knowledge graph. The weighted average of $s$ and $\Delta p$ is used as the second recommendation confidence, e.g., $c_2 = w\,s + (1 - w)\,\Delta p$ with weight $w \in [0, 1]$.
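The retrieval and confidence computation can be sketched as below. The confidence is computed as a weighted average of the cosine similarity s and a normalized improvement Δp. The sketch assumes the environment embeddings are precomputed and that each strategy already carries one scalar trade-off score per environment. The normalization of Δp as (best − mean) / (best − worst), the clamping of negative similarity to 0, and the weight `w = 0.5` are hypothetical choices.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def second_inference_engine(query: Sequence[float],
                            env_embeddings: Dict[str, Sequence[float]],
                            strategy_scores: Dict[str, Dict[str, float]],
                            w: float = 0.5) -> Tuple[str, float]:
    """Return (second recommended strategy, second recommendation confidence)."""
    # 1. Most similar environment node; negative cosine is clamped to 0.
    env_id, s = max(((e, max(0.0, cosine(query, emb))) for e, emb in env_embeddings.items()),
                    key=lambda t: t[1])
    # 2. Best strategy connected to that environment (scalar trade-off score).
    scores = strategy_scores[env_id]
    best = max(scores, key=scores.get)
    hi, lo = scores[best], min(scores.values())
    mean = sum(scores.values()) / len(scores)
    # 3. Normalized improvement over the environment average, in [0, 1].
    delta_p = (hi - mean) / (hi - lo) if hi > lo else 0.0
    return best, w * s + (1 - w) * delta_p
```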

[0053] In step B2, the dynamic adjustment strategy is obtained from the recommended strategies and confidence levels output by the decision units through a consensus fusion algorithm. Specifically, the first recommended strategy and first recommendation confidence from the first recommender are collected, together with the second recommended strategy and second recommendation confidence from the second inference engine. The two recommended strategies are then arbitrated and fused by the consensus fusion algorithm according to the two confidence levels. One possible implementation is weighted voting, in which the first and second recommendation confidences serve as the weights of the first and second recommended strategies. If the first recommender and the second inference engine recommend the same strategy identifier, that strategy is directly determined as the dynamic adjustment strategy. If they differ, each confidence is treated as the voting weight of its strategy, the two weights are compared directly, and the recommended strategy with the higher weight is determined as the final dynamic adjustment strategy. The two confidences can be normalized beforehand to ensure that they are comparable. Another possible implementation is the confidence threshold method, in which a high-confidence threshold is preset.
If the first or second recommendation confidence exceeds the high-confidence threshold, the corresponding recommended strategy is used directly as the dynamic adjustment strategy. If neither confidence exceeds the threshold, a secondary inference process is triggered or a default conservative strategy is adopted. The secondary inference process can query a wider range of neighbor relationships in the strategy-environment-performance relationship knowledge graph. The default conservative strategy can be to run the target model without any neuron pruning, or to use the optimization strategy recorded in the knowledge graph whose average performance is the most stable across the widest range of device environments.
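Both fusion variants reduce to a few lines. In the sketch below, the threshold value, the default identifier `no_pruning`, and the tie-breaking rules are hypothetical choices that the source does not specify.

```python
from typing import Tuple

Rec = Tuple[str, float]  # (strategy_id, confidence)


def weighted_vote(rec1: Rec, rec2: Rec) -> str:
    """Same strategy -> adopt it; otherwise the normalized higher weight wins."""
    (s1, c1), (s2, c2) = rec1, rec2
    if s1 == s2:
        return s1
    total = c1 + c2
    w1, w2 = (c1 / total, c2 / total) if total > 0 else (0.5, 0.5)
    return s1 if w1 >= w2 else s2  # ties go to the first recommender (assumption)


def threshold_fusion(rec1: Rec, rec2: Rec, high: float = 0.8,
                     default: str = "no_pruning") -> str:
    """Adopt a recommendation above the high threshold, else a conservative default."""
    above = [r for r in (rec1, rec2) if r[1] > high]
    if above:
        return max(above, key=lambda r: r[1])[0]  # both above: take the higher (assumption)
    return default  # a secondary knowledge-graph query could be tried here instead
```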

[0054] In the method described in step B3 above, a hot switch is performed on the running target model based on the dynamic adjustment strategy and the plasticity genes. Specifically, the device receives the dynamic adjustment strategy obtained in step B2 and, combined with the plasticity genes loaded in step B1, dynamically adjusts the structure of the target model currently serving on the device. A hot switch means completing the online update of the target model without stopping the model service or interrupting the processing of inference requests. First, the dynamic adjustment strategy is parsed. It is a neuron pruning strategy as defined in step A3, encoded in the same form as the particle positions in step A3, namely as a binary vector. The dynamic adjustment strategy is then decoded to obtain the neuron pruning strategy, which indicates which neurons of the target model are retained and which are pruned. Before pruning, the neuron pruning strategy is quickly validated against the plasticity genes to ensure a safe hot switch and a smooth transition in model performance. In one possible implementation, the validation uses the plasticity score matrix: the neurons that the strategy plans to prune are checked. If neurons with high plasticity scores are among them, a warning is issued or the plan is automatically adjusted to prune neurons with lower plasticity scores instead, so that units with lower potential are adjusted first and the core capabilities of the model are protected. In another possible implementation, the validation uses the neuron dependency graph in the plasticity genes as a constraint check, ensuring that the neuron pruning strategy does not break any strong dependency defined in the graph.
Specifically, if two neurons share a strong edge in the graph, one must not be pruned while the other is retained. If such a conflict is found, the pruning plan is modified, for example by retaining or pruning both neurons, to ensure the functional integrity of the model. After validation, the pruning plan is applied dynamically on the device. One possible implementation is dynamic computation graph rewriting: for computation-graph-based frameworks such as TensorFlow and PyTorch, the computation graph of the running target model is modified in memory. The output of each computation node corresponding to a neuron planned for pruning is set to zero or its connection is removed directly, while its parameters are preserved for possible rollback. Another possible implementation is conditional forward propagation: a gate controlled by the pruning plan is added to each neuron during forward propagation. For pruned neurons the gate is closed and their outputs are ignored, so subsequent layers compute only from the outputs of retained neurons. After the hot switch is completed, the first batch of real requests received afterwards is collected. Lightweight performance metrics, such as the distribution of prediction confidence and the change in latency, are computed from this batch and compared with the pre-switch baseline to confirm that the change in performance is within the expected range. If an abnormal deviation occurs, a fast rollback mechanism can be triggered to restore the target model as it was before the hot switch.
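The conditional-forward-propagation variant, the dependency-graph repair, and the rollback can be sketched together on a toy layer. This is a hedged illustration, not the source's runtime. The layer keeps all weights and only swaps a keep-mask under a lock, so the previous mask acts as the rollback snapshot. The repair keeps both ends of a conflicting edge, which is one of the two corrections mentioned. The 1% accuracy-drop limit is taken from the example that follows, and all class and function names are hypothetical.

```python
import threading
from typing import List, Optional, Sequence, Tuple


class HotSwitchableLayer:
    """Toy layer whose output neurons are gated by a keep-mask (1 keep, 0 prune).

    Weights are never deleted, so the previous mask can be restored on rollback.
    """

    def __init__(self, weights: List[List[float]]):
        self.weights = weights
        self.mask: List[int] = [1] * len(weights)
        self._snapshot: Optional[List[int]] = None
        self._lock = threading.Lock()

    def forward(self, x: Sequence[float]) -> List[float]:
        with self._lock:
            mask = self.mask  # read one consistent mask for the whole request
        return [m * sum(w * xi for w, xi in zip(row, x))
                for row, m in zip(self.weights, mask)]

    def hot_switch(self, new_mask: Sequence[int]) -> None:
        with self._lock:
            self._snapshot, self.mask = self.mask, list(new_mask)

    def rollback(self) -> None:
        with self._lock:
            if self._snapshot is not None:
                self.mask, self._snapshot = self._snapshot, None


def repair_mask(mask: Sequence[int], edges: Sequence[Tuple[int, int, float]],
                threshold: float) -> List[int]:
    """Retain both ends of any strong dependency edge the plan would split."""
    fixed, changed = list(mask), True
    while changed:  # terminates: entries only ever change from 0 to 1
        changed = False
        for i, j, corr in edges:
            if abs(corr) > threshold and fixed[i] != fixed[j]:
                fixed[i] = fixed[j] = 1
                changed = True
    return fixed


def validate_after_switch(layer: HotSwitchableLayer, acc_before: float,
                          acc_after: float, max_drop: float = 0.01) -> bool:
    """Roll back if accuracy on the first post-switch batch drops too far."""
    if acc_before - acc_after > max_drop:
        layer.rollback()
        return False
    return True
```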

[0055] In some possible implementations, continuing the example above, assume that in the ResNet-50 scenario the device has determined a dynamic adjustment strategy. Decoding it yields a pruning plan for the last fully connected layer, i.e., a 1000-dimensional binary vector. First, the scores of the neurons to be pruned are checked in the plasticity score matrix to confirm that they are all low-scoring units. Second, the neuron dependency graph is queried to check whether strong dependency edges exist between neurons to be pruned and neurons to be retained. Suppose a neuron to be pruned, corresponding to a dog category, is found to have a strong edge in the neuron dependency graph with a retained feature neuron corresponding to a hair feature. The plan is then corrected automatically: that neuron is removed from the pruning list, and another low-scoring neuron with no strong dependency on retained neurons is pruned instead. After verification, the device rewrites the computation graph dynamically, masking the outputs of the specified neurons within milliseconds. After the hot switch, the requests for the next few images are processed to observe whether classification accuracy decreases only slightly, within an acceptable range such as less than 1%, while inference latency decreases significantly, for example from 22 ms to 15 ms. If the results meet expectations, the hot switch is successful; if accuracy decreases significantly, a rollback is triggered immediately. Fast rollback can be achieved by saving a complete snapshot of the model's computation graph state or parameters before the hot switch and restoring that snapshot directly when rollback is triggered.
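The snapshot-and-rollback check in the ResNet-50 example could look roughly like the sketch below. The names `ServingModel` and `HotSwitchController` are hypothetical, the 1% limit is assumed to mean an absolute accuracy drop of 0.01, and requiring latency not to exceed the pre-switch baseline is an added assumption. A real deployment would snapshot framework-specific graph state rather than a plain Python object.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServingModel:
    """Hypothetical stand-in for a served model: parameters plus a per-neuron gate mask."""
    weights: Dict[str, List[float]]
    gate_mask: List[int]


class HotSwitchController:
    def __init__(self, model: ServingModel, max_acc_drop: float = 0.01) -> None:
        self.model = model
        self.max_acc_drop = max_acc_drop  # assumed: absolute accuracy drop, 0.01 = 1 point
        self._snapshot: Optional[ServingModel] = None

    def switch(self, new_mask: List[int]) -> None:
        # Take a full snapshot first so rollback does not depend on the new state.
        self._snapshot = copy.deepcopy(self.model)
        # Only the gate mask changes; weights are kept, so pruned neurons can be restored.
        self.model.gate_mask = list(new_mask)

    def verify(self, base_acc: float, base_latency_ms: float,
               new_acc: float, new_latency_ms: float) -> bool:
        """Accept the switch or roll back, using metrics from the first post-switch requests."""
        accepted = (base_acc - new_acc) < self.max_acc_drop and new_latency_ms <= base_latency_ms
        if not accepted:
            self.rollback()
        return accepted

    def rollback(self) -> None:
        if self._snapshot is None:
            raise RuntimeError("no snapshot: switch() was never called")
        # Restore in place so callers holding a reference to the model see the old state.
        self.model.weights = copy.deepcopy(self._snapshot.weights)
        self.model.gate_mask = list(self._snapshot.gate_mask)
```

For instance, calling `verify` with the 22 ms and 15 ms latencies from the text and an accuracy drop below 0.01 keeps the new mask, while a larger drop restores the snapshot taken in `switch`.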

[0056] The method described in step B4 above includes: collecting the real-time running data sequence of the target model after hot switching, and calculating the health score based on the real-time running data sequence; obtaining the predicted health score using a pre-trained time series prediction model based on the health score of historical periods; triggering an anomaly warning when the predicted health score is less than the preset health benchmark, or the prediction error exceeds the preset error threshold; and in response to the anomaly warning, obtaining diagnostic root cause data based on a causal inference method, and feeding the diagnostic root cause data back to the global experience knowledge base.

[0057] In step B4, the real-time running data sequence of the target model after hot switching is collected, and a health score is calculated from it. Specifically, the device continuously collects the real-time running data sequence of the hot-switched target model. This sequence is a multi-dimensional time series whose data items include at least model performance indicators (such as single-inference time and task accuracy or confidence), system resource indicators (such as CPU and memory usage of the inference process), and business indicators (such as request success rate and abnormal output ratio). From this sequence, a health score is calculated for each time point through a predefined health score function. In one possible implementation, a healthy range is set for each data dimension, for example latency below threshold L and CPU usage below threshold C; points are deducted for each dimension outside its range, and the final score is normalized to the interval [0, 1], where 1 represents complete health.
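The range-based deduction described above can be sketched as a small scoring function. The dimension names, bounds, and equal default weights in `HEALTH_RANGES` and `health_score` are hypothetical placeholders standing in for thresholds such as L and C; the only behavior taken from the text is "deduct for each out-of-range dimension, then normalize to [0, 1]".

```python
from typing import Dict, Optional, Tuple

# Hypothetical healthy ranges per dimension: (lower bound, upper bound); None = unbounded.
HEALTH_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "latency_ms": (None, 20.0),           # plays the role of threshold L
    "cpu_util": (None, 0.80),             # plays the role of threshold C
    "mem_util": (None, 0.85),
    "success_rate": (0.99, None),
    "abnormal_output_ratio": (None, 0.02),
}


def health_score(sample: Dict[str, float],
                 ranges: Dict[str, Tuple[Optional[float], Optional[float]]] = HEALTH_RANGES,
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Deduct each out-of-range dimension's weight and normalize to [0, 1] (1 = fully healthy)."""
    if weights is None:
        weights = {name: 1.0 for name in ranges}  # equal weights by default
    total = sum(weights[name] for name in ranges)
    penalty = 0.0
    for name, (low, high) in ranges.items():
        value = sample[name]
        if (low is not None and value < low) or (high is not None and value > high):
            penalty += weights[name]
    return max(0.0, 1.0 - penalty / total)
```

With equal weights and five dimensions, a sample that breaches the latency and CPU limits scores 1 - 2/5 = 0.6; unequal weights let a deployment make, say, latency violations cost more than memory violations.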

[0058] In step B4, a pre-trained time-series prediction model is used to obtain the predicted health score from the historical health scores. Specifically, the device maintains a time window containing the historical sequence of health scores, for example those from the most recent hour. This sequence is input into a time-series prediction model pre-trained in the cloud, such as a lightweight model based on an LSTM, GRU, or Transformer architecture. The model's task is to predict the health score for one or more future time steps from the scores over a past period, for example the health score for the next 5 minutes. It is trained in the cloud on a large amount of historical running data containing both normal and abnormal patterns, so that it learns how health scores evolve. On the device, the model runs offline or in a lightweight form and is invoked periodically, for example every minute, receiving the latest historical health score sequence and outputting the predicted future health score.
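The sliding-window input described above can be prepared as shown below. Because a trained LSTM, GRU, or Transformer does not fit in a short example, the forecaster here is plainly a substitute: an ordinary least-squares linear trend fitted over the window and clipped to [0, 1]. `make_windows` and `predict_linear` are hypothetical helper names; the (history, target) pairs that `make_windows` produces are the kind of training samples a sequence model like those named in the text would be fitted on in the cloud.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int,
                 horizon: int) -> List[Tuple[List[float], List[float]]]:
    """Slice a health-score series into (history, future) pairs for supervised training."""
    pairs = []
    for t in range(window, len(series) - horizon + 1):
        pairs.append((list(series[t - window:t]), list(series[t:t + horizon])))
    return pairs


def predict_linear(history: Sequence[float], steps: int) -> List[float]:
    """Stand-in forecaster: least-squares linear trend over the window, clipped to [0, 1]."""
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return [history[0]] * steps
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Extrapolate k steps past the last observed index (n - 1).
    return [min(1.0, max(0.0, intercept + slope * (n - 1 + k))) for k in range(1, steps + 1)]
```

A linear trend reacts to gradual drift but not to the nonlinear patterns a learned model can capture, which is why the text trains a sequence model on data containing both normal and abnormal behavior.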

[0059] In step B4, an anomaly warning is triggered when the predicted health score is lower than the preset health benchmark or the prediction error exceeds the preset error threshold. Specifically, the health benchmark is a score representing the lowest acceptable health state of the model, for example 0.7. When the predicted health score stays below the benchmark, for example below 0.7 for three consecutive prediction cycles, it is determined that the model may enter an unhealthy range. The error threshold bounds the acceptable prediction error and thus reflects the reliability of the prediction. The prediction error is the difference between the predicted health score and the health score subsequently observed. If the prediction error continuously exceeds the threshold, for example an absolute error greater than 0.2, the actual operating state deviates significantly from the forecast of the time-series prediction model, which suggests that a new abnormal pattern not covered by the training data may have appeared. When either condition is met, an anomaly warning is triggered immediately. The warning includes at least a timestamp, the current environmental features, the triggering condition, and a snapshot of the relevant operating data, which together provide the data basis for obtaining diagnostic root cause data in the subsequent step.
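The two trigger conditions can be sketched as a small stateful watchdog. The class and field names (`HealthWatchdog`, `AnomalyWarning`, `benchmark_patience`, `error_patience`) are hypothetical. The benchmark 0.7, the three consecutive cycles, and the 0.2 error threshold come from the text; `error_patience=1` is an assumption, and a larger value implements "continuously exceeds" more strictly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AnomalyWarning:
    timestamp: float
    env_features: List[float]
    condition: str
    snapshot: Dict[str, float]  # operating-data snapshot for later root-cause analysis


class HealthWatchdog:
    def __init__(self, benchmark: float = 0.7, error_threshold: float = 0.2,
                 benchmark_patience: int = 3, error_patience: int = 1) -> None:
        self.benchmark = benchmark
        self.error_threshold = error_threshold
        self.benchmark_patience = benchmark_patience  # consecutive low forecasts required
        self.error_patience = error_patience          # consecutive large errors required
        self._low_run = 0
        self._err_run = 0
        self._pending: Optional[float] = None  # forecast made for the current cycle

    def step(self, timestamp: float, observed: float, predicted: float,
             env_features: List[float], snapshot: Dict[str, float]) -> Optional[AnomalyWarning]:
        """observed: score measured this cycle; predicted: forecast for the next cycle."""
        # Prediction error compares last cycle's forecast with this cycle's observation.
        if self._pending is not None and abs(self._pending - observed) > self.error_threshold:
            self._err_run += 1
        else:
            self._err_run = 0
        self._pending = predicted
        self._low_run = self._low_run + 1 if predicted < self.benchmark else 0

        if self._low_run >= self.benchmark_patience:
            condition = "predicted health below benchmark"
        elif self._err_run >= self.error_patience:
            condition = "prediction error above threshold"
        else:
            return None
        return AnomalyWarning(timestamp, list(env_features), condition, dict(snapshot))
```

The watchdog only compares a forecast with the observation that arrives one cycle later, so the first call can never raise an error-based warning.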

[0060] In step B4, in response to the anomaly warning, diagnostic root cause data is obtained based on a causal inference method and fed back to the global experience knowledge base. Specifically, in one possible implementation, a simple causal graph is constructed whose nodes include environmental features (such as power consumption and load), the executed dynamic adjustment strategy, internal model indicators (such as the distribution of activation values at each layer), and the health score. By analyzing the changes in these variables and their statistical relationships before and after the timestamp of the anomaly warning, for example with conditional independence tests and intervention analysis, the variables or combinations of variables most likely to have caused the decline in health are inferred as the diagnostic result. For example, the diagnostic result might be that a dynamic adjustment strategy executed under the environmental feature vector [power consumption < 20%, load > 90%] caused an abnormal increase in the activation entropy of the attention layer, resulting in increased latency and decreased health. The complete diagnostic result is encapsulated as diagnostic root cause data, whose structure includes at least the associated anomaly warning ID, the diagnostic result (as in the example above), the associated environmental feature vector, the dynamic adjustment strategy in effect at the time, and key supporting data evidence. This evidence may include the real-time running data collected when the warning was triggered (such as the specific inference times and CPU usage values during the abnormal period), key statistics computed during causal inference (such as correlation coefficients between variables and effect sizes after intervention), and relevant internal states extracted from the model (such as the specific activation value distribution of the abnormal layer).
In one possible embodiment, continuing the example above, consider monitoring after a hot switch of the ResNet-50 model. The time-series prediction model predicts the next minute's score from a historical health score that has been stable at 0.9. When the actually observed score plummets to 0.65 and the prediction error exceeds 0.2, an anomaly warning is triggered. The subsequent causal inference analysis yields the diagnostic result that, under the constraint of extremely low available memory, the currently effective dynamic adjustment strategy caused an unexpected caching bottleneck, leading to abnormal feature extraction and a performance collapse. The key supporting data include a time series showing memory usage rising sharply from 60% to 95% during the abnormal period, and quantitative records showing that the activation entropy of the output of the model's fourth convolutional layer fell below its historical baseline during the same period. These key data, together with the diagnostic result, the environmental feature vector, and the strategy identifier, are encapsulated as diagnostic root cause data. The device then transmits the diagnostic root cause data over a secure channel to the global experience knowledge base in the cloud, where it is stored and analyzed as new experience to optimize subsequent strategy recommendations and model evolution.
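The diagnostic root cause record and a first screening pass over candidate variables can be sketched as follows. `DiagnosticRootCause` mirrors the minimum fields listed above, but the field names are hypothetical. `rank_shifted_variables` is explicitly *not* the conditional-independence testing or intervention analysis the text describes: it is a simpler before/after standardized mean-shift ranking that only flags which variables moved most around the warning timestamp, as an assumed preprocessing step before a real causal analysis.

```python
import json
import statistics
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DiagnosticRootCause:
    warning_id: str
    diagnosis: str
    env_features: Dict[str, float]
    strategy_id: str
    evidence: Dict[str, object] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize for upload to the cloud-side global experience knowledge base."""
        return json.dumps(asdict(self), sort_keys=True)


def rank_shifted_variables(before: Dict[str, List[float]],
                           after: Dict[str, List[float]]) -> List[str]:
    """Rank variables by |mean(after) - mean(before)| / pstdev(before).

    This is a screening heuristic for candidate causes, not a causal test.
    """
    scores: Dict[str, float] = {}
    for name, base in before.items():
        spread = statistics.pstdev(base) or 1e-9  # avoid division by zero on flat baselines
        scores[name] = abs(statistics.mean(after[name]) - statistics.mean(base)) / spread
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

In the ResNet-50 scenario, a memory-usage series jumping from about 60% to 95% would rank above a small CPU change, and that ranking could be stored in `evidence` alongside the diagnosis.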

[0061] As shown in Figure 3, this embodiment provides a model lifecycle management system based on an AI operation platform, including the following modules. A cloud processing module is used to acquire the target model and initialize the model population, establish or access the global experience knowledge base, perform neuron-level plasticity analysis on the target model to generate plasticity genes, use the plasticity genes as constraints to generate an optimization strategy set with a swarm intelligence optimization algorithm, and build a strategy knowledge base from that set. Based on the model population and the global experience knowledge base, it drives the model population to optimize iteratively through a co-evolution mechanism, obtains a new generation of models, updates the global experience knowledge base, and uses the new generation of models as new target models to restart the plasticity analysis, strategy generation, and knowledge base construction process. A device-side processing module, connected to the cloud processing module, is used to load the target model together with the corresponding plasticity gene and strategy knowledge base, acquire environmental feature vectors in real time, and obtain dynamic adjustment strategies through consensus fusion in combination with the strategy knowledge base. Based on the dynamic adjustment strategies and plasticity genes, it hot-switches the target model running online, performs predictive health monitoring on the hot-switched model, and obtains diagnostic root causes. A data feedback update module is used to feed back the diagnostic root cause data obtained by the device-side processing module to the cloud processing module, which updates the global experience knowledge base based on the feedback data to drive the next round of the co-evolution mechanism.

[0062] The above description discloses only preferred embodiments of the present invention and should not be construed as limiting the scope of the present invention. Therefore, equivalent variations made in accordance with the claims of the present invention are still within the scope of the present invention.

Claims

1. A model lifecycle management method based on an AI operation platform, characterized in that it is applied to the cloud and includes: A1: obtaining the target model, initializing the model population, and establishing or connecting to the global experience knowledge base; A2: performing neuron-level plasticity analysis on the target model to generate corresponding plasticity genes; A3: using the plasticity genes as constraints and employing a swarm intelligence optimization algorithm, generating a set of optimization strategies for the target model under various environmental constraints; A4: constructing a strategy knowledge base based on the set of optimization strategies; A5: based on the model population and the global experience knowledge base, driving the model population to optimize iteratively through a co-evolutionary mechanism to obtain a new generation of models and update the global experience knowledge base; A6: using the new-generation model as the new target model and repeating steps A2 to A4 to update the strategy knowledge base.

2. The model lifecycle management method based on an AI operation platform according to claim 1, characterized in that performing neuron-level plasticity analysis on the target model to generate corresponding plasticity genes includes: performing parameter sensitivity analysis and activation stability analysis on neurons of the target model to generate a plasticity score matrix; analyzing the pairwise correlations of neuron activations in the target model to construct a neuron dependency graph; and composing the plasticity gene from the plasticity score matrix and the neuron dependency graph.

3. The model lifecycle management method based on an AI operation platform according to claim 1 or 2, characterized in that the method further includes: encoding the neuron pruning strategy for the target model as particle positions; evaluating, in multiple virtual simulation environments, the accuracy, latency, and energy consumption of the neuron pruning strategy corresponding to each particle position; and, using the neuron dependency graph as a constraint, driving the particle swarm to conduct iterative collaborative exploration to obtain the set of optimization strategies.

4. A model lifecycle management method based on an AI operation platform as described in claim 1, comprising constructing a strategy knowledge base based on an optimization strategy set, including: Based on the optimization strategy set, historical strategies, corresponding virtual environment features and performance indicators are extracted as nodes and relationships to construct a heterogeneous graph. Graph representation learning is then used to obtain embedded representations to form a knowledge graph of strategy-environment-performance relationships. Simultaneously, virtual environment features are used as input, and the corresponding optimal policy identifier is used as a supervision label to train the policy meta-model.

5. The model lifecycle management method based on an AI operation platform according to claim 1, characterized in that driving iterative optimization of the model population through a co-evolutionary mechanism based on the model population and the global experience knowledge base, to obtain a new generation of models and update the global experience knowledge base, includes: evaluating the fitness of the model population on a predefined set of device environments and selecting elite individuals based on the fitness; performing cross-recombination of the elite individuals to obtain offspring models; fine-tuning the offspring models based on the global experience knowledge base, i.e., guiding the offspring models to learn high-performance strategies from the global experience knowledge base; combining the elite individuals with the fine-tuned offspring models to form a new-generation model population and obtaining new-generation models from it; and updating the global experience knowledge base with the high-performance strategies generated in this evolutionary iteration.

6. A model lifecycle management method based on an AI operation platform, characterized in that it is applied to the device side and includes: B1: loading the target model and the corresponding plasticity gene and strategy knowledge base; B2: acquiring environmental feature vectors in real time and obtaining a dynamic adjustment strategy in combination with the strategy knowledge base; B3: hot-switching the running target model based on the dynamic adjustment strategy and the plasticity gene; B4: performing predictive health monitoring on the hot-switched target model to obtain diagnostic root causes.

7. The model lifecycle management method based on an AI operation platform according to claim 6, characterized in that acquiring environmental feature vectors in real time and obtaining a dynamic adjustment strategy in combination with the strategy knowledge base includes: inputting the environmental feature vectors in parallel into multiple heterogeneous decision units constructed based on the strategy knowledge base, the multiple heterogeneous decision units including at least a first recommender constructed based on the strategy meta-model in the strategy knowledge base and a second inference engine constructed based on the strategy-environment-performance relationship knowledge graph in the strategy knowledge base; and obtaining the dynamic adjustment strategy through a consensus fusion algorithm based on the recommended strategies and confidence levels output by each decision unit.

8. The model lifecycle management method based on an AI operation platform according to claim 6, characterized in that performing predictive health monitoring on the hot-switched target model to obtain diagnostic root causes includes: collecting real-time running data sequences of the hot-switched target model and calculating the health score based on them; obtaining predicted health scores from historical health scores using a pre-trained time-series prediction model; triggering an anomaly warning when the predicted health score is lower than the preset health benchmark or the prediction error exceeds the preset error threshold; and, in response to the anomaly warning, obtaining diagnostic root cause data based on a causal inference method and feeding the diagnostic root cause data back to the global experience knowledge base.

9. A model lifecycle management system based on an AI operation platform, applying the method according to any one of claims 1 to 8, characterized in that it includes: a cloud processing module, used to acquire the target model and initialize the model population, establish or access the global experience knowledge base, perform neuron-level plasticity analysis on the target model to generate plasticity genes, use the plasticity genes as constraints to generate an optimization strategy set with a swarm intelligence optimization algorithm, and build a strategy knowledge base from that set; based on the model population and the global experience knowledge base, to drive the model population to optimize iteratively through a co-evolution mechanism to obtain a new generation of models and update the global experience knowledge base; and to use the new generation of models as new target models to restart the plasticity analysis, strategy generation, and knowledge base construction process; a device-side processing module, connected to the cloud processing module and used to load the target model together with the corresponding plasticity gene and strategy knowledge base, acquire environmental feature vectors in real time, obtain dynamic adjustment strategies through consensus fusion in combination with the strategy knowledge base, hot-switch the target model running online based on the dynamic adjustment strategies and plasticity genes, and perform predictive health monitoring on the hot-switched model to obtain diagnostic root causes; and a data feedback update module, used to feed back the diagnostic root cause data obtained by the device-side processing module to the cloud processing module, the cloud processing module updating the global experience knowledge base based on the feedback data to drive the next round of the co-evolution mechanism.