Model management method and apparatus

By unloading old version model data based on version description information during the model version deployment or update phase, the problem of resource waste in model management is solved, and efficient resource utilization of computing nodes is achieved.

WO2026138629A1 · PCT designated stage · Publication Date: 2026-07-02 · HUAWEI TECH CO LTD


Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-12-18
Publication Date: 2026-07-02

Application Information

Patent Timeline

18 Dec 2025: Application
02 Jul 2026: Publication (WO2026138629A1)

IPC: G06F8/71

AI Tagging

Technology Topics

Model management · Theoretical computer science


Similar Technology Patents

Model evaluation method and device, electronic equipment and storage medium
CN122152649A · Biological models Hardware monitoring Model management Simulation
Method, model processing method and apparatus for server deployment model
CN115708061B · Model management Management efficiency
A BIM-based deep foundation pit support stress monitoring method and system
CN122113210A · Force measurement by measuring frequency variations Geometric CAD Model management Processing
Substation lightning protection and grounding intelligent optimization design method and system based on geological artificial intelligence large model deduction
CN122366150A · Model management Scale model
A model management method and apparatus, a communication device, a storage medium, and a program product
CN122372113A · Model management Data pack


AI Technical Summary

Technical Problem

Existing model management solutions struggle to efficiently allocate computing node resources, leading to unnecessary resource waste. This is especially true when model versions are frequently updated, as older model data can occupy computing node memory resources for extended periods, reducing storage resource utilization.

Method used

The target version is determined from the version description information of the target model, and an unloading instruction is sent to the control module that controls the computing nodes, instructing them to unload the model data corresponding to the target version. Model versions are thereby managed reasonably, and old versions no longer occupy memory resources for long periods.

Benefits of technology

This improves the resource utilization of computing nodes, avoids resource waste, and ensures the efficient operation of computing nodes.

✦ Generated by Eureka AI based on patent content.


Patent Text Reader

Abstract

A model management method and apparatus, which are capable of implementing reasonable model management and improving resource utilization. In the method, a first node can, on the basis of version description information of a first version of a target model, determine a target version among deployed versions of the target model; and send a first unloading instruction to a first control module used to control compute nodes on which the target version is deployed, so as to instruct unloading of model data corresponding to the target version. Because version description information of the target version is consistent with the version description information of the first version, a deployed new version can directly replace the target version. Thus, an unloading operation for model data of a related old version is triggered to prevent the model data of the old version from occupying memory resources of the compute nodes for a long time, thereby improving resource utilization.


Description

Model Management Method and Device

[0001] This application claims priority to Chinese Patent Application No. 202411956311.5, filed with the State Intellectual Property Office of China on December 25, 2024, entitled “Model Management Method and Apparatus”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of communications, and more particularly to a model management method and apparatus.

Background Technology

[0003] With the development of artificial intelligence (AI) and machine learning (ML) technologies, more and more applications implement richer service functions based on AI / ML models. Terminals are limited by factors such as computing resources, storage space, energy efficiency, and heat dissipation, and therefore cannot support high-performance application inference with large models (such as large AI / ML models). In communication systems, the device resources (or processing resources) of the computing nodes on the access network side can therefore be scheduled: either all computation of the large model is allocated to the computing nodes on the access network side, or a portion is allocated to those computing nodes while the remainder stays on the terminal, thus meeting the computing power requirements of the large model.

[0004] However, current model management solutions struggle to efficiently allocate computing node resources, leading to unnecessary resource waste.

Summary of the Invention

[0005] To address the aforementioned technical problems, this application provides a model management method and apparatus that can rationally manage models and improve resource utilization.

[0006] In a first aspect, a model management method is provided. This method can be executed by a first node, or by a component of the first node, such as its processor, chip, or chip system. It can also be implemented by a logic module or software capable of performing all or part of the first node's functions. The following explanation uses execution of the method by the first node as an example. The model management method includes: determining a target version from the deployed versions of the target model based on version description information of a first version of the target model, where the version description information of the target version matches the version description information of the first version; and sending a first unloading instruction to a first control module, where the first control module is used to control the computing nodes on which the target version is deployed, and the first unloading instruction instructs the unloading of the model data corresponding to the target version.

[0007] Based on the above technical solution, the first node can determine, from the deployed versions of the target model, a target version whose version description information is consistent with that of the first version of the target model, and then send an unloading instruction to the first control module so that the computing nodes on which the target version is deployed unload the model data corresponding to the target version. In other words, during the model version deployment or update phase, the model management method provided in this application can trigger the unloading of the model data of the related old version based on the version description information of the version to be deployed. This prevents the model data of the old version from occupying the memory resources of the computing nodes for a long time, thereby improving resource utilization.
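The flow in paragraphs [0006] and [0007] can be sketched as follows. This is only an illustrative sketch: the names `DeployedVersion`, `plan_unloading`, and the control module labels are hypothetical, and the version description information is treated as an opaque value that can be compared for equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployedVersion:
    version_id: str       # hypothetical identifier of a deployed model version
    description: tuple    # version description information, kept opaque here
    control_module: str   # hypothetical label of the module controlling its compute nodes


def plan_unloading(first_version_description, deployed_versions):
    """Return (control_module, version_id) pairs, one per deployed version whose
    description matches that of the version about to be deployed."""
    return [
        (v.control_module, v.version_id)
        for v in deployed_versions
        if v.description == first_version_description
    ]


deployed = [
    DeployedVersion("v1", ("in:image", "out:label"), "ctrl-A"),
    DeployedVersion("v2", ("in:image", "out:label+score"), "ctrl-B"),
]

# The first version to be deployed has the same description as v1,
# so an unloading instruction for v1 would go to ctrl-A.
instructions = plan_unloading(("in:image", "out:label"), deployed)
print(instructions)  # [('ctrl-A', 'v1')]
```

Each returned pair stands for one unloading instruction addressed to the control module in charge of the matching version; how the instruction is actually transported is outside the scope of this sketch.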

[0008] In conjunction with the first aspect mentioned above, in one possible design, the model version description information includes the model parameter information of the model version; the model parameter information includes at least one of the following: the data structure of the model parameters, the number of model parameters, the size of the model parameters, and the semantic type of the model parameters; the model parameters include the model's input parameters and / or output parameters. The aforementioned model parameter information is used to define the relevant configuration requirements for the application node's call interface to the target model. If the aforementioned model parameter information is the same for two model versions, it indicates that the calling method for the two versions remains unchanged, and the application node can directly call the model without changing the configuration.
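One possible way to represent the model parameter information of paragraph [0008] is shown below. The class and field names are assumptions made for illustration, and "size" is interpreted here as a tensor shape, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParamInfo:
    data_structure: str     # e.g. "tensor" (assumed example value)
    count: int              # number of parameters
    size: Tuple[int, ...]   # size of the parameters, interpreted here as a shape
    semantic_type: str      # e.g. "image" or "class_label" (assumed example values)


@dataclass(frozen=True)
class VersionDescription:
    inputs: Tuple[ParamInfo, ...]    # input parameters of the model version
    outputs: Tuple[ParamInfo, ...]   # output parameters of the model version


def same_call_interface(a: VersionDescription, b: VersionDescription) -> bool:
    """Two versions are callable in the same way if their parameter info is identical."""
    return a == b


image_in = ParamInfo("tensor", 1, (3, 224, 224), "image")
label_out = ParamInfo("tensor", 1, (1000,), "class_label")
wide_out = ParamInfo("tensor", 1, (2000,), "class_label")

v1 = VersionDescription((image_in,), (label_out,))
v2 = VersionDescription((image_in,), (label_out,))   # retrained, same interface
v3 = VersionDescription((image_in,), (wide_out,))    # output size changed

print(same_call_interface(v1, v2))  # True
print(same_call_interface(v1, v3))  # False
```

Frozen dataclasses compare field by field, so equality of two descriptions corresponds directly to "the model parameter information is the same for two model versions".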

[0009] In conjunction with the first aspect mentioned above, in one possible design, the method includes: obtaining version description information of deployed versions of the target model from the target model's information repository; and identifying the model version among the deployed versions of the target model that matches the version description information of the first version as the target version. The first node can maintain the version description information of deployed versions of the target model through an information repository, so that it can obtain the version description information of the current deployed version of the target model in real time when information comparison is required later.
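A minimal in-memory sketch of such an information repository is given below. The class `InfoRepository` and its method names are hypothetical; the application does not prescribe a storage format.

```python
class InfoRepository:
    """Keeps the version description information of deployed versions per model."""

    def __init__(self):
        self._deployed = {}  # model_id -> {version_id: description}

    def record_deployment(self, model_id, version_id, description):
        self._deployed.setdefault(model_id, {})[version_id] = description

    def record_unloading(self, model_id, version_id):
        self._deployed.get(model_id, {}).pop(version_id, None)

    def matching_versions(self, model_id, description):
        """Deployed versions of the model whose description equals the given one."""
        return sorted(
            vid
            for vid, desc in self._deployed.get(model_id, {}).items()
            if desc == description
        )


repo = InfoRepository()
repo.record_deployment("model-x", "v1", ("in:image", "out:label"))
repo.record_deployment("model-x", "v2", ("in:image", "out:label+score"))

# A new version v3 arrives with the same description as v1.
targets = repo.matching_versions("model-x", ("in:image", "out:label"))
print(targets)  # ['v1']

# After v1 is unloaded and v3 deployed, the repository reflects the new state.
repo.record_unloading("model-x", "v1")
repo.record_deployment("model-x", "v3", ("in:image", "out:label"))
print(repo.matching_versions("model-x", ("in:image", "out:label")))  # ['v3']
```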

[0010] In conjunction with the first aspect above, in one possible design, the method further includes: receiving a first request message, where the first request message is used to request deployment of the first version of the target model and includes the version description information of the first version. In this design, the first node, in response to the first request message, determines the target version based on the version description information of the first version and sends the first unloading instruction to the first control module.

[0011] In conjunction with the first aspect mentioned above, in one possible design, the first request message further includes at least one of the following: a model identifier of the target model, a value factor corresponding to the target model, and application version support information for the first version. The value factor is used to evaluate the value of the model version of the target model; the application version support information is used to characterize the application versions supported by the first version. In this embodiment, the model and the value factor can have a corresponding relationship, that is, a corresponding value factor can be configured for different models. Since the structure, function, and complexity of each model are different, the adapted evaluation dimensions may also differ. Therefore, in this embodiment, an adapted value factor can be configured for the target model, thereby performing value evaluation on different versions of the target model based on the corresponding value factor, improving the model evaluation effect. In addition, when the version of the target model iterates, the application versions it can be compatible with may also change. Therefore, the first node can obtain the application version support information of the first version from the first request message to determine the application compatibility of the current target model.
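The contents of the first request message described in paragraphs [0010] and [0011] could be modeled as below. The field names and example values are assumptions; only the version description information is treated as mandatory, since the other items are listed as "at least one of".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FirstRequestMessage:
    version_description: Tuple[str, ...]           # description of the first version
    model_id: Optional[str] = None                 # model identifier of the target model
    value_factors: Tuple[str, ...] = ()            # value factors configured for the model
    supported_app_versions: Tuple[str, ...] = ()   # application version support information


def optional_fields_present(msg: FirstRequestMessage):
    """List which of the optional items the message actually carries."""
    present = []
    if msg.model_id is not None:
        present.append("model_id")
    if msg.value_factors:
        present.append("value_factors")
    if msg.supported_app_versions:
        present.append("supported_app_versions")
    return present


msg = FirstRequestMessage(
    version_description=("in:image", "out:label"),
    model_id="model-x",
    value_factors=("inference_accuracy", "usage_frequency"),
)
print(optional_fields_present(msg))  # ['model_id', 'value_factors']
```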

[0012] In conjunction with the first aspect described above, in one possible design, the method further includes: sending a deployment instruction to a second control module; the deployment instruction instructs the computing nodes controlled by the second control module to deploy the model data of the first version.

[0013] In conjunction with the first aspect mentioned above, in one possible design, the deployment instruction is also used to instruct the computing nodes controlled by the second control module to collect, for the first version, the statistical information corresponding to the value factors. In this case, the computing nodes that deploy the model version also act as data collection nodes, collecting the statistical information corresponding to the value factors to facilitate subsequent value assessment.

[0014] In conjunction with the first aspect mentioned above, in one possible design, the value factors include at least one of the following: inference accuracy, usage frequency, last call time, and creation time. By expanding the value factors across multiple dimensions, the value corresponding to different model versions can be evaluated more accurately and reasonably.

[0015] In conjunction with the first aspect mentioned above, in one possible design, the method further includes: sending a second unloading instruction to a third control module; the third control module is used to control the computing nodes deploying low-value versions from the deployed versions; the second unloading instruction is used to instruct the unloading of model data corresponding to the low-value versions, where the low-value versions are model versions among the deployed versions that meet the low-value condition. Related technologies typically delete versions according to version iteration order after the current number of deployed versions reaches a limit, or delete versions after the storage time reaches a limit. However, this method may lead to the deletion of high-value versions, thus affecting normal user operation, resulting in unreasonable model management. In this embodiment, when version unloading is required, the first node can select low-value versions for unloading based on model value, thereby achieving reasonable management of model versions and improving resource utilization.

[0016] In conjunction with the first aspect mentioned above, in one possible design, the model versions that meet the low-value condition include: the N model versions with the lowest value among the deployed versions, or the model versions among the deployed versions whose value is less than a preset value threshold, where N is a positive integer.
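Both forms of the low-value condition in paragraph [0016] can be sketched as follows, assuming the value of each deployed version is already available as a number. The function names are hypothetical, and ties are broken by version identifier purely to make the result deterministic.

```python
def lowest_n(values, n):
    """The N deployed versions with the lowest value (N must be a positive integer)."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    return sorted(values, key=lambda vid: (values[vid], vid))[:n]


def below_threshold(values, threshold):
    """The deployed versions whose value is less than the preset value threshold."""
    return sorted(vid for vid, val in values.items() if val < threshold)


values = {"v1": 0.2, "v2": 0.9, "v3": 0.5}

print(lowest_n(values, 1))           # ['v1']
print(lowest_n(values, 2))           # ['v1', 'v3']
print(below_threshold(values, 0.6))  # ['v1', 'v3']
```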

[0017] In conjunction with the first aspect mentioned above, in one possible design, the value of a model version is determined by the statistical information corresponding to the value factors for that model version.
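The application does not state how the statistical information is combined into a value. One simple possibility, shown purely as an assumption, is a weighted sum over statistics that have already been normalized to the range 0 to 1; the weights, the factor name `recency` (derived from the last call time), and the numbers are all illustrative.

```python
def version_value(stats, weights):
    """Weighted sum of normalized statistics (an assumed scoring rule)."""
    return sum(weight * stats[factor] for factor, weight in weights.items())


# Hypothetical weights for three value factors.
weights = {"inference_accuracy": 0.5, "usage_frequency": 0.3, "recency": 0.2}

# Hypothetical normalized statistics collected by the computing nodes.
stats_by_version = {
    "v1": {"inference_accuracy": 0.8, "usage_frequency": 0.1, "recency": 0.0},
    "v2": {"inference_accuracy": 0.9, "usage_frequency": 0.7, "recency": 1.0},
}

values = {vid: version_value(s, weights) for vid, s in stats_by_version.items()}
lowest = min(values, key=values.get)
print(lowest)  # v1
```

With such a rule, a rarely used and long-idle version scores low even if its accuracy is reasonable, which matches the motivation for evaluating value along several dimensions.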

[0018] In conjunction with the first aspect above, in one possible design, the method includes: sending the second unloading instruction to the third control module when the number of deployed versions of the target model exceeds a preset version number threshold.

[0019] In conjunction with the first aspect described above, in one possible design, the method further includes: sending an application update instruction to the application node; the application update instruction includes unavailable version information and / or available version information. The unavailable version information characterizes the application versions, of the application to which the target model belongs, that the target model currently does not support; the available version information characterizes the application versions of that application that the target model currently supports. Since each model version may support different application versions, unloading a model version may leave some application versions of the application to which the target model belongs without a compatible model version. In this case, the first node can determine the currently available and unavailable application versions and report this information to the application node. The application node can thus promptly obtain the application version support status of the current target model and perform the corresponding application update operation.
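The available and unavailable version information of paragraph [0019] could be derived as sketched below, assuming the first node knows which application versions each deployed model version supports. The function name, the dictionary layout, and the version strings are hypothetical.

```python
def build_app_update(support, unloaded):
    """Derive available / unavailable application versions after unloading.

    support:  model version id -> set of application versions it supports
    unloaded: set of model version ids that have been unloaded
    """
    remaining = {vid: apps for vid, apps in support.items() if vid not in unloaded}
    available = set().union(*remaining.values())
    all_apps = set().union(*support.values())
    return {
        "available": sorted(available),
        "unavailable": sorted(all_apps - available),
    }


support = {
    "v1": {"app-1.0"},
    "v2": {"app-1.0", "app-1.1"},
    "v3": {"app-2.0"},
}

update = build_app_update(support, unloaded={"v3"})
print(update)  # {'available': ['app-1.0', 'app-1.1'], 'unavailable': ['app-2.0']}
```

Unloading v1 instead would leave every application version available, because v2 still supports app-1.0.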

[0020] In conjunction with the first aspect mentioned above, in one possible design, the application update instruction also includes a version identifier for low-value versions.

[0021] In a second aspect, a model management device is provided for implementing the methods described above. The model management device includes modules, units, or means corresponding to the implementation of the methods, wherein the modules, units, or means can be implemented in hardware, software, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions.

[0022] In some possible designs, the model management device may include a processing module and a transceiver module. The processing module can be used to implement the processing functions in any of the above aspects and any possible implementations thereof. The transceiver module may include a receiving module and a sending module, respectively used to implement the receiving function and the sending function in any of the above aspects and any possible implementations thereof.

[0023] In some possible designs, the transceiver module can consist of transceiver circuits, transceivers, or communication interfaces.

[0024] In a third aspect, a model management apparatus is provided, comprising: a processor and a memory; the memory being used to store computer instructions that, when executed by the processor, cause the model management apparatus to perform the method described in any of the above aspects and any possible design thereof.

[0025] In a fourth aspect, a model management device is provided, comprising: a processor and a communication interface; the communication interface being used to communicate with a module outside the model management device; the processor being used to execute computer programs or instructions to cause the model management device to perform the methods described in any of the above aspects and any possible designs thereof.

[0026] In a fifth aspect, a model management apparatus is provided, comprising: at least one processor; said processor being configured to execute a computer program or instructions stored in a memory to cause the model management apparatus to perform the methods described in any of the foregoing aspects and any possible designs thereof. The memory may be coupled to the processor, or may be independent of the processor.

[0027] In a sixth aspect, a model management device (e.g., the model management device may be a chip or a chip system) is provided, the model management device including a processor for implementing the functions involved in any of the above aspects and any possible designs thereof.

[0028] In some possible designs, the model management device includes a memory for storing necessary program instructions and data.

[0029] In some possible designs, when the device is a chip system, it can be composed of chips or contain chips and other discrete components.

[0030] The model management device described in the third to seventh aspects may be the first node in the first aspect, or a device contained in the first node, such as a chip or chip system.

[0031] In a seventh aspect, a model management device is provided, which may be a first node, or a module or unit (e.g., a chip, a chip system, or a circuit) corresponding to the execution of the methods / operations / steps / actions described in the first aspect in the first node, or a module or unit that can be matched and used with the first node.

[0032] It is understandable that when the model management device provided in any of the second to seventh aspects is a chip, the sending action / function of the model management device can be understood as output information, and the receiving action / function of the model management device can be understood as input information.

[0033] In an eighth aspect, a computer-readable storage medium is provided that stores a computer program or instructions that, when executed on a model management device, enable the model management device to perform the methods described in any of the preceding aspects and any possible designs thereof.

[0034] In a ninth aspect, a computer program product containing instructions is provided, which, when run on a model management device, enables the model management device to perform the methods described in any of the foregoing aspects and any possible design thereof.

[0035] For the technical effects of any design in the second to ninth aspects, refer to the technical effects of the corresponding designs in the first aspect; they are not repeated here.

Description of the Drawings

[0036] Figure 1 is a schematic diagram of the structure of a communication system provided in this application;

[0037] Figure 2 is a schematic diagram of another communication system provided in this application;

[0038] Figure 3 is a schematic diagram of an O-RAN system provided in this application;

[0039] Figure 4 is a schematic diagram of another O-RAN system provided in this application;

[0040] Figure 5 is a schematic diagram of another O-RAN system provided in this application;

[0041] Figure 6 is a flowchart illustrating a model management method provided in this application;

[0042] Figure 7 is a flowchart illustrating another model management method provided in this application;

[0043] Figure 8 is a flowchart illustrating another model management method provided in this application;

[0044] Figure 9 is a flowchart illustrating another model management method provided in this application;

[0045] Figure 10 is a flowchart illustrating another model management method provided in this application;

[0046] Figure 11 is a flowchart illustrating another model management method provided in this application;

[0047] Figures 12-14 are schematic diagrams of the model management device provided in this application.

Detailed Implementation

[0048] In the description of this application, unless otherwise stated, " / " indicates that the objects before and after are in an "or" relationship. For example, A / B can mean A or B. "And / or" in this application is merely a description of the relationship between the related objects, indicating that there can be three relationships. For example, A and / or B can mean: A exists alone, A and B exist simultaneously, and B exists alone. A and B can be singular or plural.

[0049] In the description of this application, unless otherwise stated, "multiple" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.
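The enumeration given for "at least one of a, b, or c" is simply the set of non-empty combinations of the listed items. A small sketch, with a hypothetical helper name, that reproduces it:

```python
from itertools import combinations


def at_least_one_of(items):
    """All non-empty combinations of the items, shortest first."""
    return [
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


print(at_least_one_of(["a", "b", "c"]))
# ['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```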

[0050] Furthermore, to facilitate a clear description of the technical solutions in the embodiments of this application, the terms "first" and "second" are used in the embodiments of this application to distinguish identical or similar items with substantially the same function and effect. Those skilled in the art will understand that the terms "first" and "second" do not limit the quantity or execution order, and that items labeled "first" and "second" are not necessarily different.

[0051] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of terms such as "exemplary" or "for example" is intended to present the relevant concepts in a specific manner to facilitate understanding.

[0052] It is understood that the term "embodiment" used throughout the specification means that a specific feature, structure, or characteristic related to an embodiment is included in at least one embodiment of this application. Therefore, various embodiments throughout the specification do not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. It is understood that in the various embodiments of this application, the sequence number of each process does not imply the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0053] It is understood that in this application, "...when" and "if" both refer to the corresponding processing that will be carried out under certain objective circumstances, and are not limited to a specific time, nor do they require a judgment action to be performed during implementation, nor do they imply any other limitations.

[0054] It is understood that some optional features in the embodiments of this application can be implemented independently in certain scenarios without relying on other features, such as the current solution on which they are based, to solve the corresponding technical problems and achieve the corresponding effects. Alternatively, they can be combined with other features as needed in certain scenarios. Correspondingly, the apparatus given in the embodiments of this application can also implement these features or functions, which will not be elaborated here.

[0055] In this application, unless otherwise specified, the same or similar parts between the various embodiments can be referred to each other. In the various embodiments of this application, unless otherwise specified or there is a logical conflict, the terminology and / or descriptions between different embodiments are consistent and can be mutually referenced. Technical features in different embodiments can be combined to form new embodiments based on their inherent logical relationships. The following descriptions of the embodiments of this application do not constitute a limitation on the scope of protection of this application.

[0056] To facilitate understanding of the technical solutions of the embodiments of this application, a brief introduction to the relevant technologies of this application is given below.

[0057] With the development of AI / ML technology, AI / ML models are gradually evolving from traditional small-scale neural network models (such as multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs)) to large-scale neural network models based on Transformers (an AI model based on a self-attention mechanism). At the same time, a large number of intelligent terminal applications based on large models are emerging, and more and more applications are using AI / ML models to achieve richer service functions.

[0058] However, terminals are limited by their own computing resources, storage space, energy efficiency, and heat dissipation, making them insufficient to support the computing power requirements for executing large models, such as high-performance application inference using large models. To meet the computing power requirements of large models deployed on terminals, servers carrying large models can be deployed in cloud data centers. This allows all or part of the computing tasks (or computing power tasks) of models with high computing power requirements to be allocated to / transmitted to the cloud for processing by cloud servers. While this cloud data center-based processing solution can provide additional computing power for model inference, the long transmission distance between the terminal and the cloud results in significant latency for computing tasks to reach the cloud, making it unsuitable for handling latency-sensitive applications. Furthermore, the strong dependence of AI / ML-based applications on communication networks is reflected in the guarantee of key performance indicators (KPIs) such as latency and bandwidth. Currently, AI / ML models are decoupled from communication networks, failing to meet the requirements for computing power and latency.

[0059] Furthermore, with the rapid development of IoT technology, the demand for local computing at the terminal is growing exponentially. Cloud-based data center processing methods are insufficient to meet the needs of all or part of the computing tasks and model data exchange. Therefore, the computing tasks for model inference can be deployed on edge devices on the access network side to meet the requirements for computing power and low latency.

[0060] Because network traffic exhibits significant fluctuations (such as temporal / spatial tidal effects), some edge devices, such as base stations, are not at full capacity most of the time. The computing power of these edge devices can be made available to other devices. For example, network for AI (NET4AI) is a current technological development direction for opening up network computing power, and edge computing services are one of its application scenarios. In this scenario, the communication network can open up the computing power of its devices to third-party applications, allowing these applications to deploy all or part of a model's computing tasks (or computing power tasks) on computing nodes within the network. For instance, a third-party application can send a model deployment request to the communication network. The communication network can respond to this request by downloading model data to the computing node using the Uniform Resource Locator (URL) address provided by the third-party application, thus deploying all or part of the model's computing tasks on the computing node.

[0061] However, models from third-party applications may undergo version updates due to algorithm updates, training data updates, feature expansions, bug fixes, etc., leading to repeated triggering of model deployment requests to the communication network. If model version updates are frequent, the network side may store multiple versions of the same model from third-party applications as the model iterates, which reduces the utilization of storage resources on the network side devices.

[0062] In related technologies, after each deployment of a new version of a model, the communication network retains the model data of the old version until the number of versions of that model in the model library stored in the communication network reaches a limit or the storage time of that version reaches a limit, at which point the deletion operation of the old version is triggered. This results in model versions that may no longer be usable still occupying a large amount of storage resources on the computing nodes before the number of model versions reaches the limit, causing resource waste and impacting the inference performance of other models on the same node.

[0063] In summary, current model management solutions struggle to efficiently allocate computing node resources, leading to unnecessary resource waste.

[0064] Based on this, in this application, the first node can, for the first version of the target model, determine from the deployed versions of the target model the target version whose version description information matches that of the first version, and then send an unloading instruction to the first control module so that the computing nodes on which the target version is deployed unload the model data corresponding to the target version. In other words, during the model version deployment or update phase, the model management method provided in this application can trigger the unloading of the model data of the related old version based on the version description information of the version to be deployed. This prevents the model data of the old version from occupying the memory resources of the computing nodes for a long time, thereby improving resource utilization.

[0065] The technical solutions of this application embodiment can be used in various communication systems, including third-generation partnership project (3GPP) communication systems, such as fourth-generation (4G) systems like Long Term Evolution (LTE), fifth-generation (5G) systems like New Radio (NR), LTE and 5G hybrid networking systems, integrated communication and sensing systems, non-terrestrial networks (NTN), device-to-device (D2D) communication systems, vehicle-to-everything (V2X) communication systems, machine-type communication (MTC) systems, Internet of Things (IoT) systems, or other future communication systems. The communication system can also be a non-3GPP communication system; there is no limitation on this.

[0066] The communication systems described above are merely illustrative examples; the applicable communication systems are not limited to them, and they do not limit the solutions provided in this application. This is stated once here and will not be repeated below.

[0067] Figure 1 illustrates a possible, non-limiting system diagram. As shown in Figure 1, the communication system 10 includes a radio access network (RAN) 100 and a core network (CN) 200. RAN 100 includes at least one RAN node (110a and 110b in Figure 1, collectively referred to as 110) and at least one terminal (120a-120j in Figure 1, collectively referred to as 120). RAN 100 may also include other RAN nodes, such as wireless relay devices and / or wireless backhaul devices (not shown in Figure 1). Terminal 120 is wirelessly connected to RAN node 110. RAN node 110 is connected to core network 200 wirelessly or by wire. The core network node in core network 200 and RAN node 110 in RAN 100 can be different physical devices, or they can be the same physical device integrating core network logical functions and radio access network logical functions.

[0068] In one possible implementation, a core network node can refer to equipment in the core network 200 that provides service support to terminal 120. The core network node in core network 200 may include at least one of the following: access and mobility management function (AMF) network elements, session management function (SMF) network elements, user plane function (UPF) network elements, policy control function (PCF) network elements, unified data management (UDM) network elements, application function (AF) network elements, network exposure function (NEF) network elements, network slice selection function (NSSF) network elements, or location management function (LMF) network elements, etc. Of course, core network 200 may also include other core network nodes, without limitation.

[0069] The AMF network element is deployed in the core network 200 to provide mobility management and connectivity management for the network, such as user location updates, user registration with the network, and user handover. The AMF network element can act as an intermediate route between the LMF, SMF, and RAN 100. The SMF network element is mainly responsible for session management in the mobile network, such as session establishment, modification, and release. The UPF network element is mainly responsible for connecting to external networks and processing user packets, such as forwarding and charging. The PCF network element is mainly responsible for providing policies to the AMF and SMF, such as Quality of Service (QoS) policies and slice selection policies. The UDM network element is used to store user data, such as subscription information and authentication / authorization information. The AF network element is responsible for providing services to the 3GPP network. The NEF network element is mainly used to expose the capabilities of various network functions and is responsible for converting between internal and external information. The LMF network element is a device or component deployed in the core network 200 to provide positioning functions for the terminal 120; for example, the LMF network element can initiate a positioning process to locate a specific terminal.

[0070] In this application, network elements may also be referred to as entities or functional entities. For example, an AMF network element may also be referred to as an AMF entity or an AMF functional entity. In addition, the aforementioned SMF network elements, UPF network elements, PCF network elements, UDM network elements, AF network elements, NEF network elements, and LMF network elements may have other names in future communication systems, and this application does not impose specific limitations on them.

[0071] In one possible implementation, RAN 100 can be a 3rd Generation Partnership Project (3GPP) related cellular system, such as a 4G or 5G mobile communication system, or a future-oriented evolution system. RAN 100 can also be an open RAN (O-RAN or ORAN), a cloud radio access network (CRAN), an NTN (such as an NTN supporting transparent mode and / or regenerative mode, or an NTN supporting earth-fixed cell mode and / or earth-moving cell mode), or a wireless fidelity (WiFi) system. RAN 100 can also be a communication system that integrates two or more of the above systems.

[0072] RAN node 110, sometimes also referred to as access network equipment, RAN entity, or access node, constitutes part of the communication system and is used to help terminals achieve wireless access. Multiple RAN nodes 110 in RAN 100 can be of the same type or different types. In some scenarios, the roles of RAN node 110 and terminal 120 are relative. For example, network element 120i in Figure 1 can be a helicopter or drone, which can be configured as a mobile base station. For terminals 120j accessing RAN 100 through network element 120i, network element 120i is a base station; but for base station 110a, network element 120i is a terminal. RAN node 110 and terminal 120 are sometimes both referred to as communication devices. For example, network elements 110a and 110b in Figure 1 can be understood as communication devices with base station functions, and network elements 120a-120j can be understood as communication devices with terminal functions.

[0073] For RAN node 110, in one possible scenario, RAN node 110 can be a base station, an evolved NodeB (eNodeB, also known as eNB), an access point (AP), a transmission reception point (TRP), a next-generation NodeB (gNB), a next-generation base station in a future mobile communication system, or an access node in a WiFi system, etc. RAN node 110 can be a macro base station (as shown in Figure 1, 110a), a micro base station or indoor station (as shown in Figure 1, 110b), a relay node or donor node, or a radio controller in a CRAN scenario. Other examples include satellite base stations, radio network controllers (RNCs), base station controllers (BSCs), base transceiver stations (BTSs), home base stations (e.g., home evolved NodeBs, or home NodeBs, HNBs), relay stations, balloon stations, drone stations, radio backhaul nodes, or grant nodes (G nodes) in SparkLink, etc. It is understood that network equipment can be either ground-based or non-ground-based (such as satellites, drones, high-altitude communication equipment, etc.). Furthermore, the names of network equipment with base station functions may differ in communication systems employing different wireless access technologies; this application does not limit this. Optionally, RAN node 110 can also be a server, wearable device, vehicle, or in-vehicle equipment. For example, the access network equipment in vehicle-to-everything (V2X) technology can be a roadside unit (RSU). RAN node 110 is also referred to as a next-generation RAN (NG-RAN) node.

[0074] In another possible scenario, multiple RAN nodes 110 collaborate to assist the terminal in achieving wireless access, with each RAN node 110 implementing a portion of the base station's functions. For example, a RAN node 110 can be a central unit (CU), a distributed unit (DU), a CU-control plane (CP), a CU-user plane (UP), or a radio unit (RU), etc. CUs and DUs can be set up separately or included in the same network element, such as a baseband unit (BBU). RUs can be included in radio frequency equipment or radio frequency units, such as remote radio units (RRUs), active antenna units (AAUs), or remote radio heads (RRHs).

[0075] In different systems, CU (or CU-CP and CU-UP), DU, or RU may have different names, but those skilled in the art will understand their meaning. For example, in an ORAN system, CU can also be called O-CU (open CU), DU can also be called O-DU, CU-CP can also be called O-CU-CP, CU-UP can also be called O-CU-UP, and RU can also be called O-RU. For ease of description, this application uses CU, CU-CP, CU-UP, DU, and RU as examples. Any of the units among CU (or CU-CP, CU-UP), DU, and RU in this application can be implemented through software modules, hardware modules, or a combination of software and hardware modules.

[0076] In one possible scenario, terminal 120 can be a device used to implement wireless communication functions, such as a terminal, a chip or circuit that can be used in the terminal, or an entity associated with the terminal. Specifically, terminal 120 can be user equipment (UE), access terminal, terminal unit, terminal station, mobile station (MS), remote station, remote terminal, mobile device, wireless communication equipment, terminal agent or terminal device, subscriber unit, smartphone, wireless data card, tablet computer, wireless modem, laptop computer, machine-type communication (MTC) terminal, tag, etc., in a 5G network or a future evolved public land mobile network (PLMN). The access terminal can be a cellular phone, cordless phone, session initiation protocol (SIP) phone, wireless local loop (WLL) station, personal digital assistant (PDA), handset with wireless communication capabilities, computing device or other processing device connected to a wireless modem, in-vehicle device or wearable device, virtual reality (VR) terminal, augmented reality (AR) terminal, wireless terminal in industrial control, wireless terminal in self-driving, wireless terminal in remote medical care, wireless terminal in smart grid, wireless terminal in transportation safety, wireless terminal in smart city, wireless terminal in smart home, or terminal node (T-node) in SparkLink, etc. In one possible implementation, terminal 120 can be mobile or fixed. It is understood that the terminal and the mobile user can be completely independent. All information related to a user can be stored in a subscriber identity module (SIM) card, which can be used on a terminal device. The terminal can send and / or receive signals via the air interface to complete the interaction with network-side devices.

[0077] The chip or circuit in the terminal includes components inside the terminal, such as at least one of a chip, a central processing unit (CPU), a network processing unit (NPU), and a terminal radio frequency module.

[0078] Entities associated with the terminal include terminal-side servers, computing / processing nodes, computing / processing entities, computing / processing units, and servers such as over-the-top (OTT) servers. OTT refers to various services provided to users by a third party other than the network operator via the operator's network. Examples of OTT services include OTT voice communication services, OTT multimedia services, and OTT data processing services. The terminal exchanges relevant information (e.g., data) with this associated entity by communicating with it. For example, this associated entity and the terminal may belong to the same vendor. Since model training, model selection, etc., may be executed not on the terminal but on the terminal-side OTT server, the term "terminal" in this embodiment also includes the terminal-side OTT server.

[0079] It should be understood that the terminal in this embodiment may also be referred to as the "UE side" or the "UE part".

[0080] As exemplified, Figure 2 shows an exemplary implementation of the system shown in Figure 1. This communication system may include a third-party server, a network management system (NMS), an element management system (EMS), and computing nodes. The NMS and EMS can be deployed in the core network shown in Figure 1.

[0081] A third-party server can be a single server or a server cluster consisting of multiple servers. In some implementations, the server cluster can also be a distributed cluster. The server can provide services to the chip, and therefore can also be called a chip server. Alternatively, the third-party server can be the first network element in the core network. For example, the third-party server can be an enterprise / application (APP) server, capable of providing model data from third-party applications and being responsible for APP development and updating the models used by the APP. For models with high computational resource consumption, deployment within the communication network can be requested through the third-party server.

[0082] NMS can provide services such as service support, fault management, configuration management, performance management, security management, and billing management for communication networks. For example, NMS can expose model deployment services to third-party servers through application programming interfaces (APIs), allowing models from third parties to be deployed in the communication network. NMS can be a model management service consumer, i.e., an entity requesting model management services (such as a carrier's network management system). When it receives a model deployment request from a third-party server, it can pass the request to EMS.

[0083] The EMS can manage and control network elements in communication networks, such as performing data analysis, fault response, and function control. The EMS can connect to the NMS through standardized interfaces (e.g., interfaces defined by the 3GPP management service (MnS) specifications) to enable the deployment of third-party models. The EMS can also be a model management service producer, i.e., an entity that provides model management service functionality (e.g., a network management system from an equipment vendor). The EMS can respond to model deployment requests to deploy models and can also manage and maintain deployed models.

[0084] Compute nodes have the capability to execute computational tasks, providing computing power for third-party models. EMS can deploy third-party models on the corresponding compute nodes. Compute nodes can be deployed at RAN 100 and / or CN 200 in Figure 1.

[0085] In some embodiments, compute nodes can be deployed individually or in a distributed manner. For example, in an individually deployed scenario, each compute node has a control module (also called a control function or control entity). In this case, the EMS can send instructions to the control module of the corresponding compute node to instruct the compute node to perform operations such as model deployment, update, and unloading. For a distributed deployment (e.g., Kubernetes, or K8s), multiple compute nodes can form a computing power cluster, including master nodes and slave nodes. The master node has a control module used to control each compute node in the computing power cluster. In this case, the EMS can send instructions to the control module of the master node to instruct the compute nodes in the computing power cluster to perform operations such as model deployment, update, and unloading.
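
As a non-limiting illustration, the choice of which control module receives an instruction in the two deployment scenarios above might be sketched as follows. ComputeNode, control_endpoint, build_instruction, and the dictionary fields are hypothetical names introduced for this sketch only; no actual EMS or Kubernetes interface is implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeNode:
    name: str
    master: Optional[str] = None  # name of the cluster master, if clustered


def control_endpoint(node):
    """Return the node whose control module should receive the instruction."""
    # Individually deployed: each node has its own control module.
    # Cluster deployment: the master node's control module controls the node.
    return node.master if node.master is not None else node.name


def build_instruction(node, operation, model, version):
    """Assemble an instruction for one compute node (format assumed)."""
    if operation not in {"deploy", "update", "unload"}:
        raise ValueError(f"unsupported operation: {operation}")
    return {"send_to": control_endpoint(node), "node": node.name,
            "operation": operation, "model": model, "version": version}
```

In this sketch, an instruction for an individually deployed node is addressed to that node, while an instruction for a slave node in a cluster is addressed to the master node, which then controls the slave node.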

[0086] It is understood that this application does not limit the number of computing nodes. It is also understood that computing nodes can be independent devices, or they can be integrated into the same device to implement different functions. Alternatively, they can be network elements in hardware devices, software functions running on dedicated hardware, or virtualization functions instantiated on a platform (e.g., a cloud platform). For example, a computing node can be a server dedicated to computing tasks, or it can be a computing board inside the BBU. Logically, a computing node can be regarded as an independent network element and managed uniformly by the EMS.

[0087] This application does not limit the specific form of the aforementioned computing node. A computing node can also be referred to as a computing network element or a computing module.

[0088] It is understood that Figure 2 above is merely a schematic diagram and does not constitute a limitation on the applicable scenarios of the technical solutions provided in this application. Those skilled in the art should understand that in specific implementation processes, the communication system shown in Figure 2 may include fewer devices than those shown in Figure 2, or the communication system shown in Figure 2 may also include other devices. At the same time, the number of devices in the communication system shown in Figure 2 can be determined according to specific needs and is not limited.

[0089] Optionally, the devices in Figure 2, such as third-party servers, NMS, EMS, and computing nodes, can also be referred to as communication devices. They can be general-purpose devices or special-purpose devices. This application embodiment does not specifically limit them.

[0090] In one possible implementation, the NMS and EMS in this embodiment can be operation administration and maintenance (OAM) devices. It should be understood that the network device in this embodiment can also be referred to as a "network side" or a "network part." This embodiment does not specifically limit this.

[0091] In one possible implementation, the relevant functions of the terminal 120 or network device in this application embodiment can be implemented by one device, multiple devices working together, or one or more functional modules within a single device. This application embodiment does not specifically limit this. It is understood that the above functions can be network elements in hardware devices, software functions running on dedicated hardware, a combination of hardware and software, or virtualization functions instantiated on a platform (e.g., a cloud platform).

[0092] RAN nodes can be devices or components within devices in the aforementioned NG-RAN, such as ng-eNB nodes, gNB nodes, or transmission points (TPs) and transmission and reception points (TRPs) within ng-eNB and gNB nodes, or central units (CUs) integrated into the NG-RAN. RAN nodes can also be network elements with transmission capabilities, such as transmission measurement function (TMF) network elements. In some embodiments, RAN nodes can also be access nodes in an O-RAN system. A RAN typically consists of a series of modules, such as antennas, RRUs, and BBUs. Traditional RAN architectures define the overall reception and output of a RAN node but do not restrict the transmission and communication between internal modules. O-RAN architectures define the architectural connections and standardized interfaces between various modules within the RAN, allowing the RAN to be decoupled into multiple standard modules, thereby enabling the combination and replacement of modules.

[0093] As exemplified, Figure 3 illustrates a possible, non-limiting structural diagram of an O-RAN system. The Service Management and Orchestration Framework (SMO), as the network management device in the O-RAN, is used for the operation and management of devices within the O-RAN. The Non-Real-Time RAN Intelligent Controller (Non-RT RIC), located within the SMO module, implements non-real-time intelligent management of RAN functions, such as enabling AI / ML workflows including model training and updates, and guiding applications / functions within the Near-RT RIC based on policies. The Near-Real-Time RAN Intelligent Controller (Near-RT RIC) implements near-real-time intelligent management of the RAN. Through data collection and related operations on the E2 interface, it achieves near-real-time control and optimization of O-RAN modules and resources.

[0094] The O-RAN central unit (O-CU) comprises the O-RAN central unit control plane (O-CU-CP) and the O-RAN central unit user plane (O-CU-UP). The O-CU implements the radio resource control (RRC) layer, the packet data convergence protocol (PDCP) layer, the service data adaptation protocol (SDAP) layer, and other control functions. Specifically, the O-CU-CP implements the RRC layer functions and the PDCP control plane functions. The O-CU-UP implements the SDAP layer functions and the PDCP user plane functions.

[0095] The O-RAN distributed unit (O-DU) is used to implement the radio link control (RLC) layer, media access control (MAC) layer, and higher physical layer (Higher PHY). The higher physical layer functions include one or more of the following: forward error correction (FEC) encoding / decoding, scrambling / descrambling, or modulation / demodulation.

[0096] The O-RAN radio unit (O-RU) is used to implement lower physical layer (PHY) functions and radio frequency (RF) functions. These PHY functions include one or more of the following: fast Fourier transform (FFT) / inverse fast Fourier transform (iFFT), digital beamforming, or extraction and filtering of the physical random access channel (PRACH). In other words, the O-RU possesses functions similar to TRP and RRH RF devices, as well as PHY processing capabilities. Furthermore, the O-RU, O-CU, and O-DU can also be used as a single unit, i.e., the O-eNB / gNB, to implement the aforementioned functions.

[0097] O-RAN cloud (O-Cloud) is a cloud computing platform that includes physical infrastructure nodes for hosting O-RAN functions such as RIC and O-DU. O-Cloud supports software components (such as operating systems, virtual machine monitors, and container runtimes), management, and orchestration functions.

[0098] For example, an O-RAN system includes communication interfaces between newly added internal components and other communication interfaces. For instance, the A1 interface serves as the interface between Non-RT RICs and Near-RT RICs, used for intelligent and dynamic control of radio resources within the O-RAN. Non-RT RICs can provide policies, enriched information, and ML model updates to Near-RT RICs via the A1 interface, while Near-RT RICs can provide policy feedback to Non-RT RICs via the A1 interface.

[0099] The E2 interface is an open interface between two endpoints used to connect the Near-RT RIC and the RAN node. The RAN node includes the CU and DU in 5G, the O-RAN compatible eNB in 4G, and the O-CU (O-CU-CP and / or O-CU-UP) and / or O-DU in O-RAN. The Near-RT RIC can obtain collected data and feedback from the RAN node through the E2 interface, and the RAN node can obtain control feedback from the Near-RT RIC through the E2 interface.

[0100] The O1 interface is the interface between the management entity in the SMO and the O-RAN module, used for operation management. This interface enables network management (such as fault management, configuration management, billing management, performance management, and security management, also known as FCAPS management), software management, and file management. The O2 interface is the interface between the SMO and the infrastructure management framework that supports O-RAN virtual network functions.

[0101] The Open Fronthaul (FH) CUS-Plane interface includes a control plane (C-Plane), a user plane (U-Plane), and a synchronization plane (S-Plane). The control plane is used for real-time control between the O-DU and O-RU, such as transmitting beamforming weights from the O-DU to the O-RU or performing power control from the O-DU to the O-RU. The user plane is used to transmit communication data between the DU and RU for access network devices and terminals. The synchronization plane is used by the O-DU to provide clock synchronization to the O-RU. The Open FH M-Plane interface is the management plane interface, used for connection between the O-RU and O-DU, as well as the SMO, enabling management, monitoring, and configuration functions.

[0102] In addition, the NG interface is the interface between RAN nodes (e.g., base stations, CUs, CU-CPs, CU-UPs) and the core network; NG-u is the user plane NG interface; and NG-c is the control plane NG interface. The Xn interface is the interface between NR RAN nodes; Xn-u is the user plane Xn interface; and Xn-c is the control plane Xn interface. The X2 interface is the interface between LTE RAN nodes; X2-u is the user plane X2 interface; and X2-c is the control plane X2 interface. In NR systems, the X2 interface is mainly used in E-UTRA-NR dual connectivity (EN-DC) scenarios, where the primary base station is an LTE RAN node connected to the LTE core network via the X2 interface. The E1 interface is the interface between CU-CPs and CU-UPs; the F1-c interface is the interface between CU-CPs and DUs; and the F1-u interface is the interface between CU-UPs and DUs.

[0103] For example, Figure 4 shows an exemplary implementation of the system shown in Figure 3. The SMO / Non-RT RIC has ML training and ML model library functions. Data from the O-CU / O-DU can be transferred to the SMO / Non-RT RIC for offline training via the O1 interface. The Near-RT RIC has ML training and ML inference functions. The Near-RT RIC can download / update models from the ML model library via the O1 / A1 interface, and the Near-RT RIC can also obtain data from the O-CU / O-DU via the E2 interface for online training and inference. The Near-RT RIC can deploy models based on the trained ML models to achieve ML inference. Similarly, ML inference can also be used to provide performance feedback on ML training to achieve online learning of ML training information. Based on the results of ML inference, the Near-RT RIC issues corresponding control behaviors or guidance instructions to the O-CU / O-DU via the E2 interface.

[0104] As exemplarily shown in Figure 5, another exemplary implementation of the system shown in Figure 3 is presented. This O-RAN system may include a third-party server, an SMO, and a RAN. The Non-RT RIC is located in the SMO module, and the Near-RT RIC is located in the RAN.

[0105] Among them, SMO can expose model deployment services to third parties through the R1 interface, and Non-RT RIC and Near-RT RIC in SMO can establish a connection through the A1 / O1 interface.

[0106] In the O-RAN system, applications based on non-real-time RIC platforms (rApps) or Non-RT RIC modules in the SMO can act as consumers of model management services, i.e., entities requesting model management services (such as operator network management systems). Non-RT RIC modules in the SMO or Near-RT RIC modules in the RAN can act as producers of model management services, i.e., entities providing model management service functionality (such as equipment vendor network management systems). Compute nodes can be deployed in the RAN to provide computing power support for model inference.

[0107] It should be understood that the models involved in the embodiments of this application can be described as functions (such as AI functions or ML functions), characteristics, or algorithms, etc., and "model operation" can also be called "functional operation". For example, model training, model transfer, model update, model inference, model monitoring, and model management can be replaced by functional training, functional transfer, functional update, functional inference, functional monitoring (or performance monitoring), and functional management, respectively. Here, "function" can be understood as a function corresponding to artificial intelligence. A model can implement one or more functions, and one or more models can also work together to implement a function.

[0108] The node executing model inference can be a terminal, a network-side computing node, or both. Therefore, model deployment methods can be categorized into one-sided model deployment and two-sided model deployment.

[0109] In this context, one-sided model deployment refers to completing the entire inference process for an air interface feature, use case, or function by deploying an AI / ML model solely on the network side (referred to as the network-side AI / ML model) or solely on the terminal side (referred to as the terminal-side AI / ML model). One-sided models therefore include network-side models and terminal-side (UE-side) models. For the network-side model, the terminal can report relevant information to the network side as input data for model inference / training / monitoring / management. For the terminal-side model, the terminal can perform model inference / training / monitoring based on the acquired relevant information and send the output results to the network side.

[0110] Two-sided model deployment refers to deploying AI / ML models on both the terminal and network sides for a given air interface feature / use case / function. In this case, the network-side model and the terminal-side model need to be paired to complete the entire inference process for that air interface feature.

[0111] It is understood that the system described in the embodiments of this application is for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and does not constitute a limitation on the technical solutions provided in the embodiments of this application. As those skilled in the art will know, with the evolution of network architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.

[0112] The following description uses the first node as an example to illustrate the model management method provided in this application. This model management method is applicable to the aforementioned communication systems, and also to other communication systems not mentioned. In the following embodiments of this application, the message names, parameter names, or information names between the first node and other nodes are merely examples; other names may exist in other embodiments, and the method provided in this application does not specifically limit these.

[0113] It is understood that in the embodiments of this application, each communication device (including the first node or other nodes) may execute some or all of the steps in the embodiments of this application. These steps or operations are merely examples, and the embodiments of this application may also execute other operations or variations thereof. Furthermore, the steps may be executed in an order different from that presented in the embodiments of this application, and it is not necessary to execute all the operations in the embodiments of this application.

[0114] It is understood that this application uses the first node and the control module as examples to illustrate the execution of the interaction, but this application does not limit the execution entities of the interaction. For example, the method executed by the first node in this application can also be executed by a module applied to the first node (e.g., a chip, chip system, or processor), or by a logic node, logic module, or software capable of implementing all or part of the functions of the first node. Or, for example, the method executed by the node where the control module is located in this application can also be executed by a module applied to the control module (e.g., a chip, chip system, or processor), or by a logic node, logic module, or software capable of implementing all or part of the control module.

[0115] The model management method provided in the embodiments of this application will be described below. As shown in Figure 6, the model management method may include:

[0116] Step 601: The first node determines the target version from the deployed versions of the target model based on the version description information of the first version of the target model.

[0117] In this application, the first version refers to the model version of the target model to be deployed on the computing node, or the model version already deployed on the computing node. The deployed version can also be called the old version, and the first version can be a new version or an update of the old version.

[0118] In some embodiments, the first version is a model version that can be deployed on computing nodes and is higher than the target version. Optionally, the processing power (or computing power) of the target model in the first version is greater than that of the target model in the target version. In other words, where a higher version of the target model corresponds to greater processing power (or computing power), the first version is higher than the target version. That is, the first version is a newer version than the target version, or the first version is an updated / upgraded version of the target version, etc. For example, a third party obtains the first version corresponding to the target model through algorithm updates, and this first version has higher inference accuracy than the target version.

[0119] For example, the first node can be a communication device in the aforementioned communication system, or a chip or circuit that can be used in the communication device, or an entity associated with the communication device. For instance, the first node can be a network device in RAN100 and / or CN200 in Figure 1, or an EMS in Figure 2, or an SMO / Non-RT RIC in Figures 3-5, and so on.

[0120] For example, the target model can be any AI / ML model from a third party. Due to algorithm updates, training data updates, feature expansions, bug fixes, etc., the target model can adopt a versioned deployment scheme to facilitate iterative updates. Each updated target model is considered a new version, and the target model can have one or more model versions (or simply versions). Optionally, each model version can be configured with a version identifier, which is used for model versioning, and different versions of the target model have different version identifiers.

[0121] In this application, the version description information of the model version is used to describe the update information of the model version. The update information is a detailed record of the updated content of the model version, including but not limited to algorithm improvements, performance enhancements, and data changes. Data changes include, but are not limited to, changes in the model parameters (or changes in the model's interface information). This version description information can characterize / reflect the calling of the model's input and / or output ends by the application to which the model belongs, that is, how the model receives input data and produces output results. The version description information of different model versions may be the same or different. In this application, the application to which the model belongs can refer to an application that uses the model / calls the model to implement a certain function, and the application to which the model belongs can be called a third-party application.

[0122] In this application, the version description information of the target version is consistent with that of the first version, which can be understood as the target version having the same version description information as the first version. For two model versions with consistent version description information, the application can call the model using the same interface parameter configuration. This means that during model version iteration, if the version description information of the model version remains unchanged, the application does not need to modify its original interface configuration parameters to continue using the new version of the model; it only needs to adjust the call address to the call address of the new version of the model. For example, if a third party obtains the first version corresponding to the target model through algorithm updates, and this first version has higher inference accuracy than the target version, since the version description information of the first version is consistent with that of the target version, the first version can replace the target version, and the application node can call the first version without changing the interface parameter configuration.

[0123] Specifically, the model version description information may include the model parameter information for that model version. This model parameter information is used to characterize the attributes of the model parameters, i.e., the model's interface information. Model parameters include the model's input parameters and / or output parameters. Specifically, the model parameter information includes at least one of the following: the data structure of the model parameters, the number of model parameters, the size (dimension) of the model parameters, and the semantic type of the model parameters. Data structures may include lists, arrays, fields, custom objects, etc. Semantic types may include text, images, audio, video, etc. The above model parameter information is used to define the relevant configuration requirements for the application node's call interface to the target model. If the above model parameter information is the same for two model versions, it means that the application's calling method for the two model versions remains unchanged, and the application node can directly call the model without changing the interface parameter configuration.
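
The model parameter information described above can be pictured as a small record per input or output parameter, with two versions counting as consistent when all records match. The following is a minimal sketch under that reading. The class and field names (`ParamInfo`, `VersionDescription`, `is_consistent`) are illustrative assumptions and do not come from this application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParamInfo:
    """Attributes of one model input or output parameter (hypothetical field names)."""
    data_structure: str      # e.g. "list", "array", "field", "custom_object"
    count: int               # number of parameters of this kind
    size: Tuple[int, ...]    # size (dimension) of the parameter
    semantic_type: str       # e.g. "text", "image", "audio", "video"


@dataclass(frozen=True)
class VersionDescription:
    """Interface information of one model version."""
    inputs: Tuple[ParamInfo, ...]
    outputs: Tuple[ParamInfo, ...]


def is_consistent(a: VersionDescription, b: VersionDescription) -> bool:
    # Dataclasses compare field by field, so equality here means every
    # listed parameter attribute is identical in both versions.
    return a == b


v1 = VersionDescription(
    inputs=(ParamInfo("array", 1, (1, 128), "text"),),
    outputs=(ParamInfo("list", 1, (10,), "text"),),
)
v2 = VersionDescription(
    inputs=(ParamInfo("array", 1, (1, 128), "text"),),
    outputs=(ParamInfo("list", 1, (10,), "text"),),
)
v3 = VersionDescription(
    inputs=(ParamInfo("array", 1, (1, 256), "text"),),  # input size differs
    outputs=(ParamInfo("list", 1, (10,), "text"),),
)
```

In this sketch `v1` and `v2` are consistent, so an application could call either with the same interface parameter configuration, whereas `v3` changes the input size and is therefore not consistent with `v1`.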

[0124] In this application, the target model can be either a one-sided model or a two-sided model. As mentioned earlier, a one-sided model refers to a target model deployed only on the network side or the terminal side (this embodiment uses network side deployment as an example). A two-sided model refers to a target model where part is deployed on the network side and the other part is deployed on the terminal side. For a one-sided model, the model parameters of the target model can include the input parameters and output parameters of the target model. For a two-sided model, the model parameters of the target model can include the output parameters of the model deployed on the terminal side and the input parameters of the model deployed on the network side.

[0125] In some embodiments, the first node may trigger step 601 above during the model's version deployment or update phase. It should be understood that the update phase described in this application can be replaced by the model's version upgrade phase, i.e., the phase of upgrading the currently deployed version. Furthermore, if no deployed version has version description information consistent with that of the first version, the first node may directly perform the deployment operation without uninstallation; for the deployment method, refer to step 802 below.

[0126] Specifically, for the process by which the first node determines the target version from the deployed versions of the target model, refer to the method shown in Figure 7 below.

[0127] Step 602: The first node sends a first unload command to the first control module. Correspondingly, the first control module receives the first unload command from the first node.

[0128] The first control module controls the compute node on which the target version is deployed, and the first unload command instructs the unloading of the model data corresponding to the target version. Model data refers to the data, deployed on a compute node, that corresponds to a model version.

[0129] For example, the first control module can be a functional module of a computing node. The operation of the first node sending a first unload instruction to the first control module can be understood as the first node sending a first unload instruction to the computing node where the first control module resides. The first unload instruction may include the model identifier of the target model and the version identifier of the target version, so that the first control module can determine the target model based on the model identifier, and then determine the target version to be unloaded based on the version identifier. The model identifier of the target model and the version identifier of the target version can be stored in the database of the first node, and the first node retrieves the model identifier of the target model and the version identifier of the target version from the database.

[0130] In scenarios where compute nodes are deployed independently, the first control module can be a functional module within the compute node that is deploying the target version. In this case, the first node sends a first unload command to the compute node that is deploying the target version, and the compute node that is deploying the target version receives the first unload command from the first node, thereby triggering the unloading operation of the model data of the target version by the first control module.

[0131] In scenarios where compute nodes are deployed in a distributed manner, the first control module can be a functional module of the master node among multiple distributed compute nodes. In this case, the first node sends a first unload instruction to the master node, and the master node receives the first unload instruction from the first node. Subsequently, since the master node can obtain the deployment status of each compute node in the distributed deployment, it can forward the first unload instruction to the compute node on which the target version is deployed, thereby triggering that compute node to unload the model data of the target version.
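
The two delivery paths for the unload instruction (directly to a standalone compute node, or through a master node that forwards it) can be sketched as follows. This is only an illustration under simplifying assumptions: the classes `UnloadCommand`, `ComputeNode`, and `MasterNode` are hypothetical, and the deployment status known to the master node is modelled as direct access to each worker's set of deployed versions.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class UnloadCommand:
    model_id: str      # model identifier of the target model
    version_id: str    # version identifier of the target version


class ComputeNode:
    """A compute node holding model data keyed by (model_id, version_id)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.deployed: Set[Tuple[str, str]] = set()

    def handle_unload(self, cmd: UnloadCommand) -> bool:
        key = (cmd.model_id, cmd.version_id)
        if key in self.deployed:
            self.deployed.remove(key)  # release the model data of that version
            return True
        return False


class MasterNode:
    """Master of a distributed deployment; forwards commands to its workers."""

    def __init__(self, workers: List[ComputeNode]) -> None:
        self.workers = workers

    def handle_unload(self, cmd: UnloadCommand) -> List[str]:
        # The master knows where each version is deployed, so it forwards the
        # command only to workers that actually hold the target version.
        key = (cmd.model_id, cmd.version_id)
        targets = [w for w in self.workers if key in w.deployed]
        return [w.name for w in targets if w.handle_unload(cmd)]


node_a, node_b = ComputeNode("node-a"), ComputeNode("node-b")
node_a.deployed.add(("model-x", "v1"))
node_b.deployed.add(("model-x", "v2"))
master = MasterNode([node_a, node_b])
unloaded_on = master.handle_unload(UnloadCommand("model-x", "v1"))
```

In the standalone case the first node would call `handle_unload` on the `ComputeNode` itself; in the distributed case it addresses the `MasterNode`, and only `node-a` ends up unloading version `v1`.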

[0132] Based on the above technical solution, in this application, the first node can determine a target version consistent with the version description information of the first version of the target model from the deployed versions of the target model, and then send an uninstallation command to the first control module to control the computing node deploying the target version to uninstall the model data corresponding to the target version. In other words, the model management method provided in this application can trigger the uninstallation operation of related old version model data based on the version description information of the version to be deployed during the model version deployment or update phase, avoiding the long-term occupation of computing node memory resources by old version model data, thereby improving resource utilization.

[0133] As one possible embodiment, referring to Figure 6 and as shown in Figure 7, step 601 above can be implemented through steps 701-702:

[0134] Step 701: The first node retrieves the version description information of the deployed version of the target model from the target model's information database.

[0135] The information related to each model deployed on the network side can be stored in an information repository. This repository can be a local database of the first node or a third-party database. The first node can retrieve version description information for each version from the target model's information repository. For ease of management, one information repository is set up for each model, which can be called the model's information repository. This repository stores / records information about the model, such as the model identifier, the version identifiers of deployed model versions, the version description information of each deployed model version, and model data generated during application invocation / use. It should be understood that the model's information repository is empty during initialization. Subsequently, the model's information repository is dynamically updated. For example, once a new model version is deployed on the network side, the version identifier and version description information of the new model version can be saved in the model's information repository.

[0136] For example, in some embodiments, the target model's information database can store the target model's model identifier, the version identifiers of the deployed versions of the target model, and the version description information of each deployed version. When a new version of the target model is deployed or an old version is uninstalled, the first node can update the target model's information database so that its contents stay current.

[0137] The target model can have one or more deployed versions. When multiple deployed versions exist, the first node can retrieve the version description information for each deployed version from the target model's information repository. The version description information is described above and will not be repeated here.

[0138] Step 702: The first node determines the model version that matches the version description information of the first version among the deployed versions of the target model as the target version.

[0139] For example, the first node can compare the various parameter information in the version description information of the deployed version with that of the first version to determine whether the version description information of the deployed version is consistent with that of the first version. When there are multiple deployed versions, the first node can compare each deployed version separately.
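
Steps 701 and 702 amount to a lookup followed by a per-version comparison. A minimal sketch follows, assuming the information repository can be read as a mapping from version identifier to version description. The function name `find_target_versions` and the dictionary layout are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical repository layout: version identifier -> version description.
# A description is modelled as a plain dict of interface attributes.
Description = Dict[str, object]


def find_target_versions(
    repository: Dict[str, Description],
    first_version_description: Description,
) -> List[str]:
    """Return identifiers of deployed versions whose description matches."""
    matches = []
    for version_id, description in repository.items():  # step 701: read each deployed version
        if description == first_version_description:    # step 702: compare parameter information
            matches.append(version_id)
    return matches


repository = {
    "v1": {"input": ("array", 1, (1, 128), "text"), "output": ("list", 1, (10,), "text")},
    "v2": {"input": ("array", 1, (1, 256), "text"), "output": ("list", 1, (10,), "text")},
}
new_description = {"input": ("array", 1, (1, 128), "text"), "output": ("list", 1, (10,), "text")}
targets = find_target_versions(repository, new_description)
```

Here only `v1` matches the description of the version to be deployed. An empty result would correspond to the case where no deployed version is consistent with the first version, so nothing is uninstalled.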

[0140] Based on the above technical solution, the first node can maintain the version description information of the deployed version of the target model through an information database, so that when information comparison is required in the future, the version description information of the deployed version of the current target model can be obtained in real time.

[0141] It should be understood that the model management method provided in this application embodiment can be applied to different stages of the model management scenario according to the actual situation. For example, in the model deployment request stage of a third-party application, the application node to which the third-party application belongs can request the network side to deploy a new version of the model data of the target model. At this time, the first node can respond to the request and execute the above-mentioned model management method. Specifically, the process is shown in Figure 8. Before executing step 601, the method further includes the following steps 801-802.

[0142] Step 801: The application node sends a first request message. Correspondingly, the first node receives the first request message.

[0143] The first request message is used to request the deployment of a first version of the target model. The first request message may include version description information for the first version. In this application, the first version can be a new version or an updated version of an older version. Updating an older version may include, but is not limited to, updating some model parameters in the model parameters of the target model in the older version. When the first version is a new version, the first request message may be named a model deployment request message; when the first version is an updated version of an older version, the first request message may be named a model update request message, etc., without limitation.

[0144] In some embodiments, step 602 above may be triggered by the first request message. For example, in response to the first request message, the first node may determine the target version based on the version description information of the first version carried in the message, and then send a first unload command to the first control module.

[0145] In some embodiments, the application node may send the first request message directly to the first node, or indirectly to the first node. For example, the application node may forward the first request message to the first node through an intermediate node.

[0146] For example, the first request message may also include information such as the application identifier of the application to which the target model belongs, and the version identifier of the first version (e.g., version number).

[0147] In some embodiments, the first request message further includes at least one of the following: a model identifier of the target model, a value factor corresponding to the target model, and application version support information of the first version. The value factor is used to evaluate the value of the model version of the target model, and the application version support information is used to characterize the application versions supported by the first version.

[0148] In this application, the model identifier serves as a unique identifier for each model, distinguishing different models. A correspondence can exist between models and value factors; that is, different value factors can be configured for different models. Since each model has a different structure, function, and complexity, the appropriate evaluation dimensions may also differ. Therefore, in this application, appropriate value factors can be configured for the target model, thereby enabling value evaluation of different versions of the target model based on the corresponding value factors, thus improving the model evaluation effect.

[0149] For example, value factors include at least one of the following: inference accuracy, usage frequency, last call time, and creation time. Inference accuracy characterizes the accuracy of the target model in the corresponding version. Inference accuracy can be determined through statistical analysis of the data output by the target model in the corresponding version and the actual feedback data. For example, inference accuracy can be the difference between the data output by the target model in the corresponding version and the actual feedback data. Usage frequency characterizes how frequently the corresponding version of the target model is called. Last call time and creation time characterize the timeliness of the corresponding version of the target model. Last call time can refer to the call time immediately preceding the current call time, and creation time can refer to the deployment completion time of the corresponding version. By expanding the value factors to include these multiple dimensions, the value of different model versions can be evaluated more accurately and reasonably.

[0150] Furthermore, the model version of the target model and the application version to which the target model belongs usually have an adaptation relationship. When the version of the target model iterates, the application versions it can be compatible with may also change. Therefore, the first node can obtain the application version support information of the first version from the first request message in order to determine the application compatibility of the current target model.

[0151] Step 802: The first node sends a deployment command to the second control module. Correspondingly, the second control module receives the deployment command from the first node.

[0152] The deployment command is used to instruct the deployment of the model data of the first version.

[0153] The description of the second control module is similar to that of the first control module. Specifically, for a standalone computing node, this second control module can be a functional module for the computing node where the first version of the model data is to be deployed. In this case, the first node can directly instruct the computing node to deploy the first version of the model data. For a distributed deployment, this second control module can be a functional module for the master node among multiple computing nodes in a distributed deployment. In this case, the first node can send the deployment instruction to the master node, and the master node forwards the deployment instruction to the slave nodes where the first version is to be deployed.

[0154] In some embodiments, the deployment instructions are further used to instruct the computing nodes controlled by the second control module to collect statistical information corresponding to the value factors for the first version, such as the inference accuracy of the target model in the first version, the usage frequency of the first version, the last call time, and the creation time. In this case, the computing node deploying the corresponding model version also acts as a data collection node, collecting the statistical information corresponding to the value factors to facilitate subsequent value assessment.

[0155] In some embodiments, the deployment instruction includes model data for a first version. For example, the first node can obtain a model image URL from a first request message, download the corresponding version of the model data based on the model image URL, and then issue the deployment instruction to the relevant compute nodes.

[0156] In some embodiments, the first node may determine the second control module in response to a first request message. For example, the first request message may include the service area of the application to which the target model belongs, and the first node can determine the second control module based on the service area. For example, in a standalone deployment, the first node may use the control module of a computing node within that service area as the second control module. In a distributed deployment, the first node may use the control module of the master node among multiple distributed computing nodes within that service area as the second control module.

[0157] In some embodiments, after deployment is complete, the first node can report the deployment result to the application node. This deployment result may include an application identifier, deployment result indication information (indicating successful or failed deployment), and the calling address of the target model for the first version. For example, the second control module can report the deployment result to the first node, which then reports the deployment result to the application node.

[0158] As one embodiment of this disclosure, the model management method provided in this application can also perform a model version uninstallation operation based on model value, thereby further improving resource utilization. Referring to the embodiment shown in Figure 6, and as shown in Figure 9, the method further includes the following step 901.

[0159] Step 901: The first node sends a second unloading command to the third control module. Correspondingly, the third control module receives the second unloading command from the first node.

[0160] The third control module controls the computing node on which a low-value version among the deployed versions is deployed, and the second unload instruction is used to instruct the unloading of the model data corresponding to the low-value version. A low-value version is a model version among the deployed versions that meets the low-value condition.

[0161] For details regarding the third control module, please refer to the first control module described above; it will not be repeated here. Furthermore, when there are multiple low-value versions, there can also be multiple third control modules, meaning the first node triggers an uninstallation operation for each low-value version. The uninstallation method can be found in the first uninstallation command described above; it will not be repeated here.

[0162] For example, model versions that meet the low-value condition include: the N model versions with the lowest value among the deployed versions, or model versions among the deployed versions whose value is less than a preset value threshold, where N is a positive integer.
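
The two forms of the low-value condition can be sketched as a single selection function. This is an illustration only: the function name `select_low_value_versions`, the tie-breaking by version identifier, and the strict "less than" comparison against the threshold are assumptions made for the example.

```python
from typing import Dict, List, Optional


def select_low_value_versions(
    values: Dict[str, float],
    n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Pick low-value versions by one rule; exactly one of n / threshold must be given."""
    if (n is None) == (threshold is None):
        raise ValueError("provide exactly one of n or threshold")
    if n is not None:
        # Rule 1: the N versions with the lowest value
        # (ties broken by version identifier, an arbitrary choice).
        ranked = sorted(values, key=lambda v: (values[v], v))
        return ranked[:n]
    # Rule 2: versions whose value is below the preset value threshold.
    return sorted(v for v, score in values.items() if score < threshold)


values = {"v1": 0.35, "v2": 0.80, "v3": 0.10}
lowest_two = select_low_value_versions(values, n=2)
below_half = select_low_value_versions(values, threshold=0.5)
```

With these example values, the N-lowest rule with N = 2 selects `v3` and `v1`, and the threshold rule with 0.5 selects the same two versions, leaving `v2` deployed.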

[0163] The value of a model version is determined by statistical information corresponding to a value factor for that model version. For example, this value factor may include at least one of inference accuracy, usage frequency, last call time, and creation time. Inference accuracy and usage frequency can be positively correlated with value; that is, the higher the inference accuracy, the higher the value of the model version. The higher the usage frequency, the higher the value of the model version. Last call time and creation time can be negatively correlated with value. The longer the last call time is from the current time, the lower the value of the model version. The longer the creation time is from the current time, the lower the value of the model version. The value of a model version can be obtained through a weighted calculation of the value factors; this application does not limit the specific calculation method.
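
Because this application does not limit the specific calculation method, the following is only one possible weighted calculation, given as a hedged sketch. The weights, the normalization constants (`max_calls_per_day`, `horizon_days`), and all names are hypothetical. Inference accuracy and usage frequency raise the score, while the time elapsed since the last call and since creation lower it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionStats:
    inference_accuracy: float      # in [0, 1]
    calls_per_day: float           # usage frequency
    days_since_last_call: float
    days_since_creation: float


# Hypothetical weights; the source does not prescribe any values.
WEIGHTS = {"accuracy": 0.4, "frequency": 0.3, "recency": 0.2, "age": 0.1}


def version_value(
    stats: VersionStats,
    max_calls_per_day: float = 1000.0,
    horizon_days: float = 365.0,
) -> float:
    """Weighted sum in [0, 1]; a higher score means a more valuable version."""
    frequency = min(stats.calls_per_day / max_calls_per_day, 1.0)
    # A longer time since the last call or since creation lowers the score.
    recency = 1.0 - min(stats.days_since_last_call / horizon_days, 1.0)
    age = 1.0 - min(stats.days_since_creation / horizon_days, 1.0)
    return (
        WEIGHTS["accuracy"] * stats.inference_accuracy
        + WEIGHTS["frequency"] * frequency
        + WEIGHTS["recency"] * recency
        + WEIGHTS["age"] * age
    )


fresh = VersionStats(0.95, 800.0, 1.0, 30.0)
stale = VersionStats(0.90, 5.0, 200.0, 300.0)
```

Under these assumed weights, `version_value(fresh)` is larger than `version_value(stale)`, so the stale version would be the first candidate for uninstallation.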

[0164] For example, the first node can obtain statistical information corresponding to the value factors of the deployed model version from the computing node. The value of the model version can then be determined based on this statistical information.

[0165] It should be understood that step 901 above can be triggered by an instruction or based on preset conditions. In one possible implementation, if the number of deployed versions of the target model exceeds a preset version number threshold, the first node sends a second uninstallation instruction to the third control module.

[0166] The above-mentioned preset version number threshold can be set according to the actual situation, and this application does not limit it.

[0167] Related technologies typically delete versions in the order of version iteration after the current deployment limit is reached, or delete versions after the storage time limit is reached. However, this approach may result in the deletion of high-value versions, affecting normal user operation, thus leading to unreasonable model management. In this embodiment, when version uninstallation is required, the first node can select low-value versions for uninstallation based on model value, thereby achieving reasonable management of model versions and improving resource utilization.

[0168] In addition, uninstalling a model version may make some application versions of the application to which the target model belongs incompatible. Therefore, the first node can send an application update instruction to the application node to instruct the application node to perform an application update.

[0169] As one embodiment of this disclosure, in conjunction with the embodiment shown in Figure 6, and as shown in Figure 9, the method further includes the following step 902.

[0170] Step 902: The first node sends an application update command to the application node. Correspondingly, the application node receives the application update command from the first node.

[0171] The application update instruction includes unavailable version information and / or available version information. The unavailable version information characterizes the application versions, of the application to which the target model belongs, that the target model currently does not support; the available version information characterizes the application versions of that application that the target model currently supports.

[0172] In some embodiments, the application update instruction may also include a version identifier for a low-value version.

[0173] Since each model version may support different application versions, uninstalling a model version may cause incompatibility issues with some application versions of the target model. In this case, the first node can determine the available and unavailable versions of the current application and report this information to the application node. This allows the application node to instantly obtain the application version support status of the current target model and perform the corresponding application update operation.
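
One way to derive the available and unavailable application versions after an uninstallation is to take the union of the application versions supported by the remaining model versions and compare it with the union before the uninstallation. The sketch below assumes the application version support information is kept as a mapping from model version identifier to supported application versions. The function name and the data layout are illustrative assumptions.

```python
from typing import Dict, Set, Tuple


def app_version_support(
    support: Dict[str, Set[str]],  # model version id -> supported application versions
    uninstalled: Set[str],         # model version ids being uninstalled
) -> Tuple[Set[str], Set[str]]:
    """Return (available, unavailable) application versions after the uninstall."""
    remaining = [apps for version, apps in support.items() if version not in uninstalled]
    # Application versions still supported by at least one remaining model version.
    available = set().union(*remaining)
    # Application versions that were supported before but no longer are.
    unavailable = set().union(*support.values()) - available
    return available, unavailable


support = {
    "v1": {"app-1.0", "app-1.1"},
    "v2": {"app-1.1", "app-2.0"},
}
available, unavailable = app_version_support(support, uninstalled={"v1"})
```

In this example, uninstalling model version `v1` leaves `app-1.1` and `app-2.0` available and makes `app-1.0` unavailable, which is the information the application update instruction would carry.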

[0174] In some embodiments, step 902 may be performed before or after step 901, and this application does not limit this. Furthermore, steps 901 and 902 may be performed separately depending on the actual situation; the two steps are not coupled.

[0175] The overall process of the model management method provided in this application has been described above. The following describes the scenario of one-sided model deployment (i.e., the entire model is deployed on a network-side computing node), using an application node, a model management service consumer, a model management service producer, and a computing node as examples, and taking the first request message to be a model deployment request message.

[0176] As one embodiment of this disclosure, as shown in Figure 10, the model management method includes the following steps:

[0177] Step 1001: The application node updates the model.

[0178] For example, due to reasons such as algorithm updates, training data updates, feature expansions, and bug fixes, third-party application developers need to update the file content of the integrated model within the application, that is, to iterate the model version.

[0179] Step 1002: The application node sends a model deployment request message to the model management service consumer. Correspondingly, the model management service consumer receives the model deployment request message from the application node.

[0180] The model management service consumer acts as the operator's network management system and can be an NMS, rApp, or Non-RT RIC. The model management service consumer establishes communication connections with application nodes through relevant interfaces. When a model version needs to be deployed, the application node can send a model deployment request message to the model management service consumer.

[0181] For example, the model deployment request message may include the application identifier of the third-party application, the service area desired by the third-party application, model information, model image URL, and new version information. The model information may include the model identifier and associated value factors for evaluating the model's value (e.g., average inference accuracy, usage frequency, last call time, creation time, etc.). The new version information includes the version number of the new model version, version description information (i.e., a list of input / output parameter information for the new model version, such as: data structure, number of parameters, size, semantic type, etc., involving the application's calls to both ends of the model), and the application version numbers supported by this model version.
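
The contents of the model deployment request message listed above can be pictured as a nested record. The sketch below is illustrative only: the class names, field names, identifiers, and the placeholder URL are assumptions, and the actual message encoding is not specified in this application.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class NewVersionInfo:
    version_number: str
    version_description: Dict[str, object]  # input / output parameter information
    supported_app_versions: List[str]


@dataclass
class ModelDeploymentRequest:
    application_id: str
    service_area: str
    model_id: str
    value_factors: List[str]   # factors used later to evaluate the model's value
    model_image_url: str
    new_version: NewVersionInfo


request = ModelDeploymentRequest(
    application_id="app-001",
    service_area="area-7",
    model_id="model-x",
    value_factors=["inference_accuracy", "usage_frequency", "last_call_time", "creation_time"],
    model_image_url="https://example.com/models/model-x/v2",  # placeholder URL
    new_version=NewVersionInfo(
        version_number="v2",
        version_description={
            "input": {"data_structure": "array", "count": 1, "size": [1, 128], "semantic_type": "text"},
            "output": {"data_structure": "list", "count": 1, "size": [10], "semantic_type": "text"},
        },
        supported_app_versions=["app-1.1", "app-2.0"],
    ),
)

payload = asdict(request)  # plain dict, e.g. for serialization on the interface
```

The `version_description` field is the part that the model management service producer later compares against the descriptions of already deployed versions.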

[0182] Step 1003: The model management service consumer sends a model deployment request message to the model management service producer. Correspondingly, the model management service producer receives the model deployment request message from the model management service consumer.

[0183] In this context, the model management service producer, acting as the network management provider for the equipment vendor, can be an EMS, a Non-RT RIC, or a Near-RT RIC. The model management service producer establishes a communication connection with the model management service consumer through relevant interfaces. Upon receiving a model deployment request message, the model management service consumer can forward the request to the model management service producer.

[0184] Step 1004: The model management service producer downloads the model data based on the model image URL and determines the computing node to be deployed.

[0185] For example, a model management service producer can download the required model data based on the model image URL in the model deployment request message, and select a suitable computing node as the deployment node based on the collected resource information and service area. For instance, the model management service producer can collect information on the availability of computing nodes (computing resources, storage resources, bandwidth resources, etc.) within the service area, thereby selecting a computing node that meets the deployment requirements. It should be understood that "meeting deployment requirements" as described in this application means that, given the target model is deployed on the computing node, the resources of the computing node are sufficient to support running the target model and meet its computing power requirements.
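
Selecting a computing node that meets the deployment requirements can be sketched as a filter over the nodes in the requested service area. The resource fields, their units, and the policy of preferring the node with the most free computing resources are assumptions made for illustration. This application only requires that the chosen node's resources be sufficient to run the target model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeResources:
    name: str
    service_area: str
    free_compute: float         # available computing resources (hypothetical unit)
    free_storage_gb: float
    free_bandwidth_mbps: float


@dataclass(frozen=True)
class ModelRequirements:
    compute: float
    storage_gb: float
    bandwidth_mbps: float


def select_node(
    nodes: List[NodeResources],
    service_area: str,
    req: ModelRequirements,
) -> Optional[NodeResources]:
    """Return a node in the service area whose free resources cover the model's needs."""
    candidates = [
        n for n in nodes
        if n.service_area == service_area
        and n.free_compute >= req.compute
        and n.free_storage_gb >= req.storage_gb
        and n.free_bandwidth_mbps >= req.bandwidth_mbps
    ]
    # Among qualifying nodes, prefer the one with the most free compute
    # (an arbitrary policy chosen for this sketch).
    return max(candidates, key=lambda n: n.free_compute, default=None)


nodes = [
    NodeResources("node-a", "area-7", 4.0, 100.0, 500.0),
    NodeResources("node-b", "area-7", 16.0, 200.0, 1000.0),
    NodeResources("node-c", "area-9", 32.0, 400.0, 1000.0),
]
chosen = select_node(nodes, "area-7", ModelRequirements(8.0, 50.0, 200.0))
```

Here `node-c` is excluded because it lies outside the service area and `node-a` lacks sufficient computing resources, so `node-b` is chosen. The function returns `None` when no node qualifies.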

[0186] Step 1005: The model management service producer sends deployment instructions to the compute nodes. Correspondingly, the compute nodes receive the deployment instructions from the model management service producer.

[0187] For a standalone compute node, the model management service producer can directly send deployment instructions to that compute node to instruct it to deploy a new version of the model data. For a distributed compute node, the model management service producer can send deployment instructions (e.g., Kubernetes deployment instructions) to the master node among multiple compute nodes, and the master node will forward the deployment instructions to the slave nodes used for deployment.

[0188] For example, the deployment instructions include the new version of the model data and the configured value factors. After deployment, the compute nodes can collect statistical data corresponding to the value factors.

[0189] Step 1006: The model management service producer matches the version description information to determine the old version with the same version description information.

[0190] For example, the model management service producer can store the version description information obtained from the model deployment request message into the database of the target model, and query the database for the version description information of the deployed version of the model requested by the application, thereby determining whether the application can directly call the new version requested for deployment.

[0191] If the old version description information in the information database is the same as the new version description information, it means that the application's calling method for the model has not changed. In this case, only the calling interface address of the model needs to be updated.

[0192] Step 1007: The model management service producer sends an unload command to the compute node. Correspondingly, the compute node receives the unload command from the model management service producer.

[0193] Since applications can directly call the new version, the model management service producer can unload the old version without affecting normal application calls. The unload command includes the model identifier and the version number of the old version. The compute node is either the compute node on which the old version is deployed or the master node that controls that compute node. The old version is the version in the information database whose version description information (also called interface information) is the same as that of the new version.
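
Steps 1006 and 1007 can be illustrated with the following minimal sketch. It assumes the information database is a plain dictionary mapping version numbers to version description information, and that a matched old version is removed from the database once its unload command is issued. Both are assumptions for illustration; the function and type names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UnloadCommand:
    model_id: str      # model identifier
    old_version: str   # version number of the old version to unload


def register_and_match(model_id: str, database: Dict[str, dict],
                       new_version: str, new_description: dict) -> List[UnloadCommand]:
    """Store the new version's description and return unload commands for
    deployed versions whose description is identical to it."""
    matched = [version for version, description in database.items()
               if version != new_version and description == new_description]
    database[new_version] = new_description
    for version in matched:
        # Assumed bookkeeping: the new version replaces the matched old one.
        del database[version]
    return [UnloadCommand(model_id, version) for version in matched]
```

An empty result means that no deployed version shares the new version's interface, so every old version is retained.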

[0194] Step 1008: The model management service producer sends the deployment results to the model management service consumer. Correspondingly, the model management service consumer receives the deployment results from the model management service producer.

[0195] Step 1009: The model management service consumer sends the deployment results to the application node. Correspondingly, the application node receives the deployment results from the model management service consumer.

[0196] For example, after the model management service producer completes the deployment operation, it can send the deployment result back to the application node through the model management service consumer. This deployment result may include the application identifier, a deployment result indication (success / failure), and the calling address corresponding to the new version.

[0197] When the number of model versions reaches the limit, or when a model expires, this model management method also includes the following steps:

[0198] Step 1010: The model management service producer sends a reporting instruction to the compute node. Correspondingly, the compute node receives the reporting instruction from the model management service producer.

[0199] The reporting instruction may include a model identifier.

[0200] Step 1011: The compute node sends value factor feedback information to the model management service producer. Correspondingly, the model management service producer receives value factor feedback information from the compute node.

[0201] The value factor feedback information includes the model identifier and the statistical information corresponding to the collected value factors.

[0202] In some embodiments, step 1010 above is optional. For example, computing nodes may periodically report the statistical information corresponding to the collected value factors, or each computing node may report in response to a reporting instruction from the model management service producer. When the model management service producer detects that the number of model versions of a model has reached the limit, or that a model has expired, it can obtain the statistical information corresponding to the value factors collected by the computing nodes on which the model versions of that model are deployed.

[0203] Step 1012: The model management service producer calculates the value of each model version and determines the model version to be deleted.

[0204] For example, the model management service producer can calculate the value of each model version based on the statistical information and select low-value versions for deletion. The model management service producer can also select expired versions for deletion. After determining the model version to be deleted, the model management service producer can query the application versions that this model version supports. If the queried application versions are supported only by the model version to be deleted, steps 1013-1014 can be executed to trigger an application update operation. Otherwise, step 1015 can be executed directly.
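
The branch between steps 1013-1014 and step 1015 can be sketched as follows. The sketch assumes a hypothetical `support` mapping from each deployed model version to the set of application versions it supports, and it triggers the application update as soon as at least one application version would be left without a supporting model version. That reading of the condition is an assumption.

```python
from typing import Dict, List, Set, Tuple


def orphaned_app_versions(support: Dict[str, Set[str]], to_delete: str) -> Set[str]:
    """Application versions that only the model version being deleted supports."""
    still_supported: Set[str] = set()
    for version, apps in support.items():
        if version != to_delete:
            still_supported |= apps
    return support[to_delete] - still_supported


def plan_deletion(support: Dict[str, Set[str]], to_delete: str) -> Tuple[List[str], Set[str]]:
    """Return the step numbers to execute and the application versions to notify about."""
    orphaned = orphaned_app_versions(support, to_delete)
    steps: List[str] = []
    if orphaned:
        # Application update notification: producer -> consumer -> application node.
        steps += ["1013", "1014"]
    steps.append("1015")  # the unload command is sent in either case
    return steps, orphaned
```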

[0205] Step 1013: The model management service producer sends an application update notification to the model management service consumer. Correspondingly, the model management service consumer receives the application update notification from the model management service producer.

[0206] The application update notification may include the version number of the application that is no longer available, or the version number of the application that can be upgraded to.

[0207] Step 1014: The model management service consumer sends an application update notification to the application node. Correspondingly, the application node receives the application update notification from the model management service consumer.

[0208] Since deleting a model version causes incompatibility with the application versions it supports, the corresponding application versions also need to be updated. The model management service producer can send an application update notification message to the model management service consumer, which then transparently forwards it to the application node.

[0209] Step 1015: The model management service producer sends an unload command to the compute node. Correspondingly, the compute node receives the unload command from the model management service producer.

[0210] For example, the model management service producer issues an uninstallation command to the compute node that has deployed the model version to be deleted, instructing the compute node to uninstall the corresponding model version.

[0211] In this embodiment, the application node of a third-party APP developer can include the version description information of the updated new-version model file in the model deployment request message to inform the network side. The network side can store this information and then determine whether the stored historical information contains an identical entry. If so, the old version of the model file can be deleted and the new version can directly replace the old version with the same information, thereby releasing the storage and computing resources occupied by the old version on the computing node and improving resource utilization. Related technologies lack both information comparison across versions and deletion decisions for old models, and can only passively wait for the trigger conditions of the model storage policy before releasing memory resources. By contrast, this embodiment can actively perform information matching when a new version of the model is deployed and trigger the unloading of the matching version. This ensures that the deployed version is the unique and latest model under a given version description information in the information database, and also ensures that applications in use still have models with different interfaces available.

[0212] The above describes the scenario of unilateral model deployment. The following describes the scenario of bilateral model deployment (i.e., after the complete model is split, the network-side computing nodes deploy only the network-side model). For related content, refer to the description of unilateral model deployment; it is not repeated here.

[0213] As one embodiment of this disclosure, as shown in FIG11, the model management method includes the following steps:

[0214] Step 1101: The application node updates the model.

[0215] For the specific execution steps, refer to step 1001; they are not repeated here.

[0216] Step 1102: The application node sends a model deployment request message to the model management service consumer. Correspondingly, the model management service consumer receives the model deployment request message from the application node.

[0217] For bilateral model deployment, the third party also sends an update request to the network side, and all parameters are the same except for the version description information. In this embodiment of the application, the version description information may include relevant attribute information of the output parameters of the UE-side model and the input parameters of the network-side model.

[0218] Step 1103: The model management service consumer sends a model deployment request message to the model management service producer. Correspondingly, the model management service producer receives the model deployment request message from the model management service consumer.

[0219] Step 1104: The model management service producer downloads the model data based on the model image URL and determines the computing node to be deployed.

[0220] Step 1105: The model management service producer sends deployment instructions to the compute nodes. Correspondingly, the compute nodes receive the deployment instructions from the model management service producer.

[0221] Step 1106: The model management service producer matches the version description information to determine the old version with the same version description information.

[0222] For bilateral model deployments, the version description information stored by the model management service producer differs from that of unilateral model deployments, and the version description information used for information matching also differs. In this embodiment of the application, information storage and query matching are performed based on the relevant attribute information of the output parameters of the UE-side model and the input parameters of the network-side model to determine whether the updated UE-side model and the application can interface with the updated network-side model. If the same content as the new version's version description information can be found in the information database, it means that the way the UE-side model calls the network-side model and the way the application calls the output results of the network-side model have not changed, so the calling interface can be directly changed to the address of the new version. If the same content cannot be found, it means that the UE-side model and / or the application will not be able to call this model, and the old version still needs to be retained.
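
For the bilateral case, the matching key pairs the UE-side output parameters with the network-side input parameters. The sketch below is illustrative only: the `ParamInfo` fields follow the kinds of model parameter information listed in this application (data structure, number, size, semantic type), while the concrete field names, types, and the returned action labels are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ParamInfo:
    structure: str       # data structure of the parameters
    count: int           # number of parameters
    size_bytes: int      # size of the parameters
    semantic_type: str   # semantic type of the parameters


@dataclass(frozen=True)
class BilateralDescription:
    ue_output: ParamInfo       # output parameters of the UE-side model
    network_input: ParamInfo   # input parameters of the network-side model


def decide(database: Dict[str, BilateralDescription],
           new_description: BilateralDescription) -> Tuple[str, Optional[str]]:
    """Return the action to take and, on a match, the old version it applies to."""
    for version, description in database.items():
        if description == new_description:
            # Interfaces unchanged: point callers at the new version, unload the old one.
            return ("switch_address_and_unload", version)
    # No identical interface found: existing callers still need the old versions.
    return ("retain_old_versions", None)
```

Frozen dataclasses compare field by field, so two descriptions match only when every attribute of both parameter sets is identical.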

[0223] Step 1107: The model management service producer sends an unload command to the compute node. Correspondingly, the compute node receives the unload command from the model management service producer.

[0224] Step 1108: The model management service producer sends the deployment results to the model management service consumer. Correspondingly, the model management service consumer receives the deployment results from the model management service producer.

[0225] Step 1109: The model management service consumer sends the deployment results to the application node. Correspondingly, the application node receives the deployment results from the model management service consumer.

[0226] When the number of model versions reaches the limit, or when a model expires, this model management method also includes the following steps:

[0227] Step 1110: The model management service producer sends a reporting instruction to the compute node. Correspondingly, the compute node receives the reporting instruction from the model management service producer.

[0228] Step 1111: The compute node sends value factor feedback information to the model management service producer. Correspondingly, the model management service producer receives value factor feedback information from the compute node.

[0229] Step 1112: The model management service producer calculates the value of each model version and determines the model versions to be deleted.

[0230] For example, the model management service producer can calculate the value of each model version based on the statistical information and select low-value versions for deletion. The model management service producer can also select expired versions for deletion. After determining the model version to be deleted, the model management service producer can query the application versions that this model version supports. If the queried application versions are supported only by the model version to be deleted, steps 1113-1114 can be executed to trigger an application update operation. Otherwise, step 1115 can be executed directly.

[0231] Step 1113: The model management service producer sends an application update notification to the model management service consumer. Correspondingly, the model management service consumer receives the application update notification from the model management service producer.

[0232] For bilateral model deployment, the application update notification is also used to instruct the application node to unload the UE-side model corresponding to the old version to be unloaded.

[0233] Step 1114: The model management service consumer sends an application update notification to the application node. Correspondingly, the application node receives the application update notification from the model management service consumer.

[0234] Step 1115: The model management service producer sends an unload command to the compute node. Correspondingly, the compute node receives the unload command from the model management service producer.

[0235] The method provided in this application has been described above. In addition, this application also provides a model management device for implementing the functions described in the above method embodiments.

[0236] It is understood that the first node in the above embodiments, in order to achieve the above functions, includes hardware structures and / or software modules corresponding to the execution of each function, such as a model management device. Those skilled in the art should readily recognize that, based on the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0237] This application embodiment can divide the model management device into functional modules according to the above method embodiment. For example, each function can be divided into a separate functional module, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware or as a software functional module. The module division in this application embodiment is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.

[0238] Figure 12 shows a schematic diagram of a model management device 120. The model management device 120 includes a processing module 1201 and a transceiver module 1202. The model management device 120 can be used to implement the functions of the first node or the second node described above.

[0239] In some embodiments, the model management device 120 may further include a storage module (not shown in FIG12) for storing program instructions and data.

[0240] In some embodiments, the transceiver module 1202, also referred to as a transceiver unit, is used to implement sending and / or receiving functions. The transceiver module 1202 may consist of a transceiver circuit, a transceiver, a transceiver unit, or a communication interface.

[0241] In some embodiments, the transceiver module 1202 may include a receiving module and a sending module, respectively configured to perform the receiving and sending steps in the above method embodiments, and / or other processes to support the technology described herein; the processing module 1201 may be configured to perform the processing steps in the above method embodiments, and / or other processes to support the technology described herein.

[0242] The processing module 1201 is used to determine the target version from the deployed versions of the target model based on the version description information of the first version of the target model; the version description information of the target version is consistent with the version description information of the first version; the transceiver module 1202 is used to send a first unload command to the first control module; the first control module is used to control the computing node that deploys the target version; the first unload command is used to instruct the unloading of the model data corresponding to the target version.

[0243] In one possible design, the version description information of the model version includes the model parameter information of the model version; the model parameter information includes at least one of the following: the data structure of the model parameters, the number of model parameters, the size of the model parameters, and the semantic type of the model parameters; the model parameters include the model's input parameters and / or output parameters.

[0244] In one possible design, the processing module 1201 is used to obtain the version description information of the deployed version of the target model from the information database of the target model; and to determine the model version in the deployed version of the target model that is consistent with the version description information of the first version as the target version.

[0245] In one possible design, the transceiver module 1202 is used to receive a first request message; the first request message is used to request the deployment of a first version of the target model; the first request message includes version description information of the first version. In response to the first request message, the processing module 1201 is used to determine the target version based on the version description information of the first version, and the transceiver module 1202 is used to send a first unload command to the first control module.

[0246] In one possible design, the first request message may also include at least one of the following: the model identifier of the target model, the value factor corresponding to the target model, and the application version support information of the first version; the value factor is used to evaluate the value of the model version of the target model; and the application version support information is used to characterize the application versions supported by the first version.

[0247] In one possible design, the transceiver module 1202 is used to send deployment instructions to the second control module; the deployment instructions are used to instruct the computing nodes controlled by the second control module to deploy the first version of the model data.

[0248] In one possible design, the deployment instructions are also used to instruct the computing nodes controlled by the second control module to collect statistical information corresponding to the value factors for the first version.

[0249] In one possible design, the value factors include at least one of the following: inference accuracy, usage frequency, last call time, and creation time.

[0250] In one possible design, the transceiver module 1202 is used to send a second unload command to the third control module; the third control module is used to control the computing node that deploys a low-value version among the deployed versions; the second unload command is used to instruct the unloading of model data corresponding to the low-value version, where the low-value version is a model version among the deployed versions that meets the low-value condition.

[0251] In one possible design, model versions that meet the low-value condition include: the N model versions with the lowest value among the deployed versions, or model versions with a value less than a preset value threshold among the deployed versions, where N is a positive integer.

[0252] In one possible design, the value of a model version is determined by statistical information corresponding to the value factors for that model version.
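
One way such a value could be computed is a weighted sum over normalized value factors. The sketch below is purely illustrative: the weights, the normalization of each factor, and all names (`ValueStats`, `WEIGHTS`, `version_value`) are assumptions and are not taken from this application, which does not fix a formula here.

```python
from dataclasses import dataclass

DAY = 86400.0  # seconds per day

# Hypothetical weights for the four value factors named in this application.
WEIGHTS = {"accuracy": 0.4, "frequency": 0.3, "recency": 0.2, "newness": 0.1}


@dataclass(frozen=True)
class ValueStats:
    inference_accuracy: float  # in [0, 1]
    calls_per_day: float       # usage frequency
    last_call_ts: float        # last call time, seconds since epoch
    created_ts: float          # creation time, seconds since epoch


def version_value(stats: ValueStats, now: float, max_calls_per_day: float) -> float:
    """Combine the statistics of one model version into a score in [0, 1]."""
    if max_calls_per_day > 0:
        frequency = min(stats.calls_per_day / max_calls_per_day, 1.0)
    else:
        frequency = 0.0
    # Scores decay from 1 toward 0 as the last call or the creation time recedes.
    recency = 1.0 / (1.0 + max(now - stats.last_call_ts, 0.0) / DAY)
    newness = 1.0 / (1.0 + max(now - stats.created_ts, 0.0) / DAY)
    return (WEIGHTS["accuracy"] * stats.inference_accuracy
            + WEIGHTS["frequency"] * frequency
            + WEIGHTS["recency"] * recency
            + WEIGHTS["newness"] * newness)
```

Under these assumptions, a version that is accurate, frequently called, recently used, and newly created scores close to 1, while a stale and rarely used version scores lower.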

[0253] In one possible design, the transceiver module 1202 is used to send a second unload command to the third control module when the number of deployed versions of the target model exceeds a preset version number threshold.
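
The low-value condition and the version-count trigger can be sketched together as follows. The function name, its parameters, and the tie-break by version number are assumptions for illustration; the per-version values are taken as given inputs.

```python
from typing import Dict, List, Optional


def low_value_versions(values: Dict[str, float], max_versions: int,
                       n_lowest: Optional[int] = None,
                       value_threshold: Optional[float] = None) -> List[str]:
    """Return the versions to unload, or an empty list if no unload is triggered.

    values maps each deployed version to its value; max_versions is the preset
    version number threshold. Exactly one of n_lowest / value_threshold
    selects the low-value condition.
    """
    if len(values) <= max_versions:
        return []  # threshold not exceeded: no second unload command is sent
    # Sort by value, breaking ties by version number for a deterministic order.
    ranked = sorted(values, key=lambda version: (values[version], version))
    if n_lowest is not None:
        return ranked[:n_lowest]
    if value_threshold is not None:
        return [version for version in ranked if values[version] < value_threshold]
    raise ValueError("specify n_lowest or value_threshold")
```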

[0254] In one possible design, the transceiver module 1202 is used to send an application update instruction to the application node; the application update instruction includes unavailable version information and / or available version information; the unavailable version information is used to characterize the application versions, of the application to which the target model belongs, that are currently not supported; the available version information is used to characterize the application versions of that application that are currently supported.

[0255] In one possible design, the application update instruction also includes a version identifier for low-value versions.

[0256] All relevant content of each step involved in the above method embodiments can be referenced from the functional description of the corresponding functional module, and will not be repeated here.

[0257] In this application, the model management device 120 can be presented in an integrated manner by dividing it into various functional modules. Here, "module" can refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that executes one or more software or firmware programs, integrated logic circuits, and / or other devices that can provide the above functions.

[0258] In some embodiments, when the model management device 120 in FIG12 is a chip or chip system, the function / implementation process of the transceiver module 1202 can be implemented through the input / output interface (or communication interface) of the chip or chip system, and the function / implementation process of the processing module 1201 can be implemented through the processor (or processing circuit) of the chip or chip system.

[0259] Since the model management device 120 provided in this embodiment can execute the above method, for the technical effects it can achieve, refer to the above method embodiments; they are not repeated here.

[0260] As one possible product form, the first node described in the embodiments of this application can be implemented using the following: one or more field programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gate logic, discrete hardware components, any other suitable circuits, or any combination of circuits capable of performing the various functions described throughout this application.

[0261] As another possible product form, the first node described in this application embodiment can be implemented using a general bus architecture. For clarity, refer to FIG13, which is a schematic diagram of the model management device 1300 provided in this application embodiment. The model management device 1300 includes a processor 1301 and a transceiver 1302. The model management device 1300 can be a first node, or a chip or chip system thereof. FIG13 only shows the main components of the model management device 1300. In addition to the processor 1301 and transceiver 1302, the model management device may further include a memory 1303 and input / output devices (not shown in FIG13).

[0262] Optionally, the processor 1301 is mainly used to process communication protocols and communication data, control the entire model management device, execute software programs, and process the data of the software programs, thereby implementing the methods provided in the above-described method embodiments. The memory 1303 is mainly used to store software programs and data. The transceiver 1302 may include a radio frequency (RF) circuit and an antenna. The RF circuit is mainly used for converting baseband signals to RF signals and processing RF signals. The antenna is mainly used for transmitting and receiving RF signals in the form of electromagnetic waves. Input / output devices, such as touch screens, displays, and keyboards, are mainly used to receive user input data and output data to the user.

[0263] Optionally, the processor 1301, transceiver 1302, and memory 1303 can be connected via a communication bus.

[0264] When the model management device is powered on, the processor 1301 can read the software program in the memory 1303, execute the instructions of the software program, and process the data of the software program. When data needs to be transmitted wirelessly, the processor 1301 performs baseband processing on the data to be transmitted and outputs the baseband signal to the radio frequency (RF) circuit. The RF circuit processes the baseband signal and transmits the RF signal outward in the form of electromagnetic waves through the antenna. When data is sent to the model management device, the RF circuit receives the RF signal through the antenna, converts the RF signal into a baseband signal, and outputs the baseband signal to the processor 1301. The processor 1301 converts the baseband signal into data and processes the data.

[0265] In another implementation, the radio frequency circuitry and antenna can be set up independently of the processor performing baseband processing. For example, in a distributed scenario, the radio frequency circuitry and antenna can be arranged in a remote manner, independent of the model management device.

[0266] In some embodiments, those skilled in the art will recognize that the above-described model management device 120 can be implemented in the form of the model management device 1300 shown in FIG13.

[0267] As an example, the function / implementation process of the processing module 1201 in Figure 12 can be implemented by the processor 1301 in the model management device 1300 shown in Figure 13 calling computer execution instructions stored in the memory 1303. The function / implementation process of the transceiver module 1202 in Figure 12 can be implemented by the transceiver 1302 in the model management device 1300 shown in Figure 13.

[0268] As another possible product form, the first node in this application can adopt the composition structure shown in FIG14, or include the components shown in FIG14. FIG14 is a schematic diagram of the composition of a model management device 1400 provided in this application. The model management device 1400 can be the first node or a chip or system-on-a-chip in the first node.

[0269] As shown in Figure 14, the model management device 1400 includes at least one processor 1401 and at least one communication interface (Figure 14 is merely an example illustrating the inclusion of a communication interface 1404 and a processor 1401). Optionally, the model management device 1400 may also include a communication bus 1402 and a memory 1403.

[0270] Processor 1401 can be a general-purpose central processing unit (CPU), a general-purpose processor, a network processor (NP), a digital signal processor (DSP), a microprocessor, a microcontroller, a PLD, or any combination thereof. Processor 1401 can also be other devices with processing functions, such as circuits, devices, or software modules, without limitation.

[0271] The communication bus 1402 is used to connect different components in the model management device 1400, enabling communication between them. The communication bus 1402 can be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. This bus can be divided into address bus, data bus, control bus, etc. For ease of illustration, only one thick line is used in Figure 14, but this does not indicate that there is only one bus or one type of bus.

[0272] Communication interface 1404 is used for communicating with other devices or communication networks. Exemplarily, communication interface 1404 can be a module, circuit, transceiver, or any device capable of communication. Optionally, the communication interface 1404 can also be an input / output interface located within processor 1401, used to implement signal input and signal output for the processor.

[0273] The memory 1403 may be a device with storage function, used to store instructions and / or data. The instructions may be computer programs.

[0274] For example, the memory 1403 may be a read-only memory (ROM) or other type of static storage device capable of storing static information and / or instructions; it may also be a random access memory (RAM) or other type of dynamic storage device capable of storing information and / or instructions; it may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, etc., without limitation.

[0275] It should be noted that the memory 1403 can exist independently of the processor 1401, or it can be integrated with the processor 1401. The memory 1403 can be located within the model management device 1400 or outside the model management device 1400, without limitation. The processor 1401 can be used to execute the instructions stored in the memory 1403 to implement the methods provided in the following embodiments of this application.

[0276] Optionally, the processor 1401 and / or memory 1403 may include an artificial intelligence (AI) module, which is used to implement AI-related functions. The AI module can be implemented through software, hardware, or a combination of both. For example, the AI module may include a radio access network (RAN) intelligent controller (RIC) module. For example, the AI module can be a near-real-time RIC or a non-real-time RIC.

[0277] As an optional implementation, the model management device 1400 may also include an output device 1405 and an input device 1406. The output device 1405 communicates with the processor 1401 and can display information in various ways. For example, the output device 1405 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc. The input device 1406 communicates with the processor 1401 and can receive user input in various ways. For example, the input device 1406 may be a mouse, keyboard, touchscreen device, or sensor device, etc.

[0278] In some embodiments, those skilled in the art will recognize that the model management device 120 shown in FIG12 can take the form of the model management device 1400 shown in FIG14 in terms of hardware implementation.

[0279] As an example, the function / implementation process of the processing module 1201 in Figure 12 can be implemented by the processor 1401 in the model management device 1400 shown in Figure 14 calling computer execution instructions stored in the memory 1403. The function / implementation process of the transceiver module 1202 in Figure 12 can be implemented by the communication interface 1404 in the model management device 1400 shown in Figure 14.

[0280] The structure shown in Figure 14 does not constitute a specific limitation on the first node. For example, in other embodiments of this application, the first node may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The components illustrated may be implemented in hardware, software, or a combination of software and hardware.

[0281] In some embodiments, this application also provides a model management device, which includes a processor for implementing the methods in any of the above method embodiments.

[0282] As one possible implementation, the model management device also includes a memory. This memory stores necessary computer programs and data. The computer program may include instructions, which the processor can invoke to instruct the model management device to execute the methods described in any of the above method embodiments. Alternatively, the memory may not be present in the model management device.

[0283] As another possible implementation, the model management device also includes an interface circuit. The interface circuit is a code and data read/write interface circuit that receives computer-executable instructions (which are stored in a memory and may be read directly from the memory or transmitted through other devices) and transmits them to the processor.

[0284] As another possible implementation, the model management device also includes a communication interface for communicating with modules outside the model management device.

[0285] It is understood that the model management device can be a chip or a chip system. When the model management device is a chip system, it can be composed of chips or may include chips and other discrete devices. This application does not specifically limit this.

[0286] This application also provides a computer-readable storage medium having a computer program or instructions stored thereon, which, when executed by a computer, implements the functions of any of the above-described method embodiments.

[0287] This application also provides a computer program product that, when executed by a computer, implements the functions of any of the above method embodiments.

[0288] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0289] It is understood that the systems, apparatuses, and methods described in this application can also be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the couplings or direct couplings or communication connections shown or discussed may be through some interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.

[0290] The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. The components shown as units may or may not be physical units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0291] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0292] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented using a software program, the implementation can take, in whole or in part, the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive (SSD)). In embodiments of this application, the computer may include the aforementioned apparatus.

[0293] Although this application has been described herein in conjunction with various embodiments, those skilled in the art, by reviewing the accompanying drawings, disclosure, and appended claims, will understand and implement other variations of the disclosed embodiments in carrying out the claimed application. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit can implement several functions listed in the claims. While different dependent claims may recite certain measures, this does not mean that these measures cannot be combined to produce good results.

[0294] Although this application has been described in conjunction with specific features and embodiments, it is obvious that various modifications and combinations can be made thereto without departing from the scope of this application. Accordingly, this specification and drawings are merely illustrative descriptions of the application as defined by the appended claims, and are considered to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Clearly, those skilled in the art can make various alterations and variations to this application without departing from its scope. Thus, if such modifications and variations fall within the scope of the claims and their equivalents, this application is also intended to include them.

Claims

1. A model management method, characterized in that the method includes: determining, based on version description information of a first version of a target model, a target version from among deployed versions of the target model, wherein the version description information of the target version is consistent with the version description information of the first version; and sending a first uninstallation command to a first control module, wherein the first control module is used to control the computing nodes on which the target version is deployed, and the first uninstallation command is used to instruct the uninstallation of the model data corresponding to the target version.

2. The method according to claim 1, characterized in that the version description information of a model version includes model parameter information of the model in that model version; the model parameter information includes at least one of the following: the data structure of the model parameters, the number of model parameters, the size of the model parameters, and the semantic type of the model parameters; and the model parameters include input parameters and/or output parameters of the model.
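
As a non-limiting illustration, the version description information of claim 2 could be represented as a small immutable record. The sketch below is hypothetical: the names `ParamInfo`, `VersionDescription`, and `is_consistent`, the field names, and the example values are assumptions introduced here, and "consistent" is interpreted as field-by-field equality, which is only one possible reading.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParamInfo:
    """Hypothetical record for one input or output parameter of a model."""
    data_structure: str   # e.g. "tensor[1,3,224,224]"
    count: int            # number of parameters of this kind
    size_bytes: int       # size of the parameter
    semantic_type: str    # e.g. "image", "class_label"


@dataclass(frozen=True)
class VersionDescription:
    """Hypothetical version description: input and output parameter info."""
    inputs: Tuple[ParamInfo, ...]
    outputs: Tuple[ParamInfo, ...]


def is_consistent(a: VersionDescription, b: VersionDescription) -> bool:
    # Assumption: two descriptions are "consistent" when every field matches.
    return a == b


image = ParamInfo("tensor[1,3,224,224]", 1, 602112, "image")
label = ParamInfo("tensor[1,1000]", 1, 4000, "class_label")
bbox = ParamInfo("tensor[1,4]", 1, 16, "bounding_box")

deployed = VersionDescription(inputs=(image,), outputs=(label,))
incoming = VersionDescription(inputs=(image,), outputs=(label,))
different = VersionDescription(inputs=(image,), outputs=(bbox,))
```

Under this reading, `incoming` is consistent with `deployed` because both expose the same input and output parameters, while `different` is not because its output parameter has another structure and semantic type.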

3. The method according to claim 1 or 2, characterized in that determining the target version from among the deployed versions of the target model based on the version description information of the first version of the target model includes: retrieving version description information of the deployed versions of the target model from an information database of the target model; and determining, as the target version, the deployed version whose version description information is consistent with that of the first version.
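
The lookup and uninstallation steps of claims 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the information database is modeled as an in-memory dictionary, a version description is any hashable value compared by equality, and `send_uninstall` is a hypothetical callback standing in for the message sent to the first control module. The claims refer to a single target version; the sketch returns every matching version in case more than one is deployed.

```python
from typing import Callable, Dict, Hashable, List


def determine_target_versions(
    first_desc: Hashable,
    deployed: Dict[str, Hashable],
) -> List[str]:
    """Return ids of deployed versions whose description matches first_desc.

    `deployed` stands in for the target model's information database and
    maps a version id to its version description (assumed layout).
    """
    return [vid for vid, desc in deployed.items() if desc == first_desc]


def unload_superseded(
    first_desc: Hashable,
    deployed: Dict[str, Hashable],
    send_uninstall: Callable[[str], None],
) -> List[str]:
    """Send one uninstallation command per matching deployed version."""
    targets = determine_target_versions(first_desc, deployed)
    for vid in targets:
        # Hypothetical stand-in for the first uninstallation command.
        send_uninstall(vid)
    return targets


info_db = {
    "v1": ("image", "class_label"),
    "v2": ("image", "bounding_box"),
    "v3": ("image", "class_label"),
}
sent: List[str] = []
unloaded = unload_superseded(("image", "class_label"), info_db, sent.append)
```

In this example the first version has the same description as `v1` and `v3`, so both receive an uninstallation command while `v2` stays deployed.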

4. The method according to any one of claims 1-3, characterized in that the method further includes: receiving a first request message, wherein the first request message is used to request deployment of the first version of the target model and includes the version description information of the first version; and sending the first uninstallation command to the first control module includes: determining the target version based on the version description information of the first version in the first request message, and sending the first uninstallation command to the first control module.

5. The method according to claim 4, characterized in that the first request message further includes at least one of the following: the model identifier of the target model, the value factor corresponding to the target model, and the application version support information of the first version; the value factor is used to evaluate the value of a model version of the target model; and the application version support information is used to characterize the application versions supported by the first version.

6. The method according to claim 5, characterized in that the method further includes: sending a deployment instruction to a second control module, wherein the deployment instruction is used to instruct the deployment of the model data corresponding to the first version.

7. The method according to claim 6, characterized in that the deployment instruction is also used to instruct the computing nodes controlled by the second control module to collect statistical information corresponding to the value factors of the first version.

8. The method according to any one of claims 5-7, characterized in that the value factor includes at least one of the following: inference accuracy, usage frequency, last call time, and creation time.

9. The method according to any one of claims 1-8, characterized in that the method further includes: sending a second unload instruction to a third control module, wherein the third control module is used to control the computing nodes on which a low-value version among the deployed versions is deployed; the second unload instruction is used to instruct the unloading of the model data corresponding to the low-value version; and the low-value version is a model version among the deployed versions that meets a low-value condition.

10. The method according to claim 9, characterized in that the model versions that meet the low-value condition include: the N model versions with the lowest value among the deployed versions, or the model versions among the deployed versions whose value is less than a preset value threshold, where N is a positive integer.
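
The two alternative low-value conditions of claim 10 can be illustrated with the following sketch. It assumes that a numeric value has already been computed for each deployed version; the function name `select_low_value` and the dictionary layout are hypothetical.

```python
import heapq
from typing import Dict, List, Optional


def select_low_value(
    values: Dict[str, float],
    n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Pick low-value versions by exactly one of the two conditions.

    values:    version id -> value (how the value is computed is out of scope)
    n:         select the N versions with the lowest value
    threshold: select every version whose value is below this threshold
    """
    if (n is None) == (threshold is None):
        raise ValueError("specify exactly one of n or threshold")
    if n is not None:
        if n <= 0:
            raise ValueError("n must be a positive integer")
        # nsmallest returns the ids ordered from lowest to highest value.
        return heapq.nsmallest(n, values, key=values.get)
    return [vid for vid, value in values.items() if value < threshold]


version_values = {"v1": 0.9, "v2": 0.2, "v3": 0.5}
lowest_two = select_low_value(version_values, n=2)
below_threshold = select_low_value(version_values, threshold=0.6)
```

With these example values, both conditions select `v2` and `v3`, since they are the two lowest-valued versions and are also the only ones below 0.6.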

11. The method according to claim 10, characterized in that the value of a model version is determined by the statistical information corresponding to the value factors for that model version.

12. The method according to any one of claims 9-11, characterized in that sending the second unload instruction to the third control module includes: if the number of deployed versions of the target model exceeds a preset version number threshold, sending the second unload instruction to the third control module.
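
The trigger condition of claim 12 can be sketched as a guard in front of the unload step. The sketch is hypothetical: `send_unload` stands in for the second unload instruction, and the choice to unload just enough of the lowest-value versions to return to the threshold is an assumption made here, not something the claims specify.

```python
from typing import Callable, Dict, List


def maybe_unload_low_value(
    values: Dict[str, float],
    max_versions: int,
    send_unload: Callable[[str], None],
) -> List[str]:
    """Unload low-value versions only when too many versions are deployed.

    values:       version id -> value of each deployed version
    max_versions: preset version number threshold
    """
    if len(values) <= max_versions:
        return []  # threshold not exceeded, nothing is unloaded
    # Assumption: N is the number of versions above the threshold.
    excess = len(values) - max_versions
    victims = sorted(values, key=values.get)[:excess]
    for vid in victims:
        send_unload(vid)
    return victims


deployed_values = {"v1": 0.9, "v2": 0.2, "v3": 0.5}
unload_log: List[str] = []
over_limit = maybe_unload_low_value(deployed_values, 2, unload_log.append)
within_limit = maybe_unload_low_value(deployed_values, 3, unload_log.append)
```

Here three versions are deployed. With a threshold of two, the lowest-value version `v2` is unloaded; with a threshold of three, no instruction is sent.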

13. The method according to any one of claims 9-12, characterized in that, before sending the second unload instruction to the third control module, the method further includes: sending an application update instruction to an application node, wherein the application update instruction includes unavailable version information and/or available version information; the unavailable version information is used to characterize the application versions, of the application to which the target model belongs, that are not currently supported; and the available version information is used to characterize the application versions, of the application to which the target model belongs, that are currently supported.

14. The method according to claim 13, characterized in that the application update instruction also includes the version identifier of the low-value version.
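
The ordering described in claims 13 and 14, in which the application node is notified before the low-value version is unloaded, can be sketched as follows. All names are hypothetical. The rule used to derive the unavailable application versions (those supported only by the version being retired) is an assumption made for illustration; the claims do not state how this information is computed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class ApplicationUpdateInstruction:
    """Hypothetical message sent to the application node."""
    unavailable_app_versions: List[str] = field(default_factory=list)
    available_app_versions: List[str] = field(default_factory=list)
    low_value_version_id: Optional[str] = None


def retire_version(
    version_id: str,
    app_support: Dict[str, Set[str]],
    notify_app: Callable[[ApplicationUpdateInstruction], None],
    send_unload: Callable[[str], None],
) -> ApplicationUpdateInstruction:
    """Notify the application node first, then send the unload instruction.

    app_support maps each deployed model version id to the set of
    application versions it supports (assumed layout).
    """
    remaining: Set[str] = set()
    for vid, supported in app_support.items():
        if vid != version_id:
            remaining |= supported
    # Assumption: an application version becomes unavailable when only the
    # retired model version supported it.
    unavailable = sorted(app_support[version_id] - remaining)
    instruction = ApplicationUpdateInstruction(
        unavailable_app_versions=unavailable,
        available_app_versions=sorted(remaining),
        low_value_version_id=version_id,
    )
    notify_app(instruction)   # application update instruction goes first
    send_unload(version_id)   # second unload instruction follows
    return instruction


events: List[str] = []
support = {"m1": {"app-1.0", "app-1.1"}, "m2": {"app-1.1", "app-2.0"}}
result = retire_version(
    "m1",
    support,
    lambda instr: events.append("notify"),
    lambda vid: events.append("unload:" + vid),
)
```

In this example, retiring `m1` marks `app-1.0` as unavailable, because no remaining model version supports it, and the notification is recorded before the unload.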

15. A model management device, characterized in that the model management device includes: a functional unit for performing the method as described in any one of claims 1-14, wherein the actions performed by the functional unit are implemented by hardware or by hardware executing corresponding software.

16. A model management device, characterized in that the model management device includes a processor; the processor is configured to run a computer program or instructions to cause the model management device to perform the method as described in any one of claims 1-14.

17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions or programs that, when executed on a computer, cause the method described in any one of claims 1-14 to be performed.

18. A computer program product, characterized in that the computer program product includes computer instructions; when some or all of the computer instructions are run on a computer, the method described in any one of claims 1-14 is performed.