Load balancing control method, system, device, and storage medium for edge computing

By constructing a temporal knowledge graph and a graph neural network model, the method predicts network connection requirements in the edge computing environment. This addresses the uneven distribution of network resources in that environment, achieves uniform and efficient allocation of network resources, and improves data reading response speed.

CN117176723B · Active · Publication Date: 2026-06-30 · PENG CHENG LAB

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PENG CHENG LAB
Filing Date
2023-08-10
Publication Date
2026-06-30


Abstract

This application provides a load balancing control method, system, device, and storage medium for edge computing, belonging to the field of data interaction technology. The method includes: acquiring network request information sent by a user device and determining the request time; acquiring the status information of multiple communication devices at the current request time, and constructing a temporal knowledge graph using the user device and multiple communication devices as nodes, and the request time and status information as attributes corresponding to each node; inputting the temporal knowledge graph into a pre-trained graph neural network model for processing, predicting the evaluation values of the connections between the user device and each communication device in multiple future time periods; based on the evaluation values, determining the target device that meets the network connection requirements in each future time period from the multiple communication devices, and sending a connection command to the user device, so that the user device can communicate and connect with the corresponding target device in each future time period.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a load balancing control method, system, device and storage medium for edge computing.

Background Technology

[0002] Edge computing is an open platform that leverages the core capabilities of networking, computing, storage, and applications to provide the nearest endpoint service on the side closest to the object or data source. By launching applications at the edge, edge computing enables faster network service response times, meeting fundamental industry needs in areas such as real-time business, application intelligence, security, and privacy.

[0003] When handling load balancing, it is typically assumed that the environment is static or remains unchanged over a fixed period of time. In real-world applications, however, edge computing environments are highly dynamic: device states, network conditions, and user requests can change at any moment. Because of this complexity and dynamism, uneven distribution of network resources is highly likely to occur, which affects the normal operation of servers.

Summary of the Invention

[0004] The main objective of this application is to propose a load balancing control method, system, device, and storage medium for edge computing, which can predict nodes in complex and changing environments, thereby achieving uniform and efficient allocation of network resources.

[0005] To achieve the above objectives, a first aspect of this application proposes a load balancing control method for edge computing, applied in a server. The server is communicatively connected to multiple communication devices, each communication device including at least one of multiple edge nodes and network nodes. The method includes: acquiring network request information sent by a user device and determining the request time of the network request information; acquiring the status information of multiple communication devices at the current request time, and constructing a temporal knowledge graph with the user device and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; acquiring a pre-trained graph neural network model, inputting the temporal knowledge graph into the graph neural network model for processing, and predicting the evaluation values of the connection between the user device and each of the communication devices in multiple future time periods; based on each evaluation value, determining the target device that meets the network connection requirements in each future time period from the multiple communication devices, and sending a connection instruction to the user device so that the user device can communicate with the corresponding target device in each future time period.

[0006] According to some embodiments of this application, the graph neural network model is trained according to the following steps: acquiring historical node data of the communication devices and user devices; acquiring the relationships between the node data, and constructing a sample temporal knowledge graph based on the node data and the relationships; collecting negative and positive samples from the sample temporal knowledge graph, inputting them into the graph neural network model, and obtaining sample evaluation values; adjusting the parameters of the graph neural network model based on the sample evaluation values to obtain an optimized graph neural network model.

[0007] According to some embodiments of this application, the step of obtaining the relationships between the node data and constructing a sample temporal knowledge graph based on the node data and the relationships includes: performing data cleaning and preprocessing on the node data to obtain sample nodes; forming sample triplet data based on the sample nodes as entities and the actions between the sample nodes as relationships; wherein the sample triplet data includes a head entity, a relationship, and a tail entity; obtaining the start time and end time of each relationship, and forming sample quintuple data based on the sample triplet data, the start time, and the end time; and constructing a sample temporal knowledge graph based on multiple sample quintuple data.

[0008] According to some embodiments of this application, during the training phase of the graph neural network model, the method further includes: performing feature transformation on sample nodes in the sample temporal knowledge graph to obtain vector sample nodes; inputting the vector sample nodes into the graph neural network model to obtain neighboring sample nodes for each vector sample node; aggregating the vector sample nodes and the neighboring sample nodes to obtain semantic aggregation information; and adjusting the parameters of the graph neural network model based on the semantic aggregation information.
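The aggregation of a vector sample node with its neighboring sample nodes can be sketched as follows. This is a minimal illustration only: the application does not specify the aggregator, so mean aggregation is assumed here, and the function name `aggregate`, the `features` dictionary, and the `neighbors` adjacency map are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def aggregate(node: str,
              features: Dict[str, Vector],
              neighbors: Dict[str, List[str]]) -> Vector:
    """Average a node's vector with its neighbors' vectors.

    Mean aggregation is an assumption; the source only says the node
    and its neighbors are aggregated into semantic information.
    """
    group = [features[node]] + [features[n] for n in neighbors.get(node, [])]
    dim = len(features[node])
    return [sum(vec[k] for vec in group) / len(group) for k in range(dim)]


# Hypothetical two-dimensional node vectors.
features = {"ue": [1.0, 0.0], "edge_a": [0.0, 1.0], "edge_b": [2.0, 2.0]}
neighbors = {"ue": ["edge_a", "edge_b"]}
aggregated = aggregate("ue", features, neighbors)
```

In a real graph neural network the aggregated vector would typically pass through a learned transformation before being used to adjust parameters; that step is omitted here.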

[0009] According to some embodiments of this application, the evaluation value is obtained according to the following steps: obtaining the node corresponding to the user equipment; inputting the node into the graph neural network model to obtain a first vector node and the communication nodes of each communication device; calculating the product of the first vector node and each of the communication nodes to obtain the evaluation value of each communication device.

[0010] According to some embodiments of this application, the graph neural network model is optimized through the following steps: fitting a knowledge graph environment using a deep reinforcement learning network; the environment includes the temporal knowledge graph and the initial states and initial actions of each initial node data of the temporal knowledge graph; performing message passing on the initial node data to obtain multiple candidate states and corresponding multiple candidate actions; obtaining the state transition probabilities of the candidate states based on the candidate states and the candidate actions, and selecting a target state based on the state transition probabilities; inputting the target state and the corresponding candidate actions into a first feedback function to calculate the expected state feedback of the target state; inputting the expected state feedback into an advantage function to select a target action from the multiple candidate actions; executing the target action based on the target state to obtain an updated state; inputting the updated state into a second feedback function to calculate immediate feedback; inputting the target action and the updated state into the first feedback function to calculate expected feedback; and adjusting the parameters of the graph neural network model with the expected feedback as the execution target to update the strategy for generating evaluation values.

[0011] According to some embodiments of this application, before obtaining the state transition probability of the candidate state based on the candidate state and the candidate action, and selecting the target state based on the state transition probability, the method further includes: selecting an initial state and a corresponding initial action from the environment; executing the initial action based on the initial state to obtain a transition state; comparing the initial state and the corresponding transition state, and if the comparison results are inconsistent, marking the transition state to obtain a marked transition state; and obtaining the state transition probability of the initial state and the corresponding transition state by dividing the number of marked transition states by the number of initial states.
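The ratio described above (marked transition states divided by initial states) can be sketched as follows, assuming each trial is recorded as an (initial state, transition state) pair. The function name and the string state labels are hypothetical.

```python
from typing import Hashable, List, Tuple


def transition_probability(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Fraction of (initial, transition) pairs in which the state changed.

    A transition state is "marked" when it differs from its initial state.
    """
    if not pairs:
        raise ValueError("at least one initial state is required")
    marked = sum(1 for initial, transition in pairs if initial != transition)
    return marked / len(pairs)


# Four hypothetical trials; two of them end in a different state.
pairs = [("idle", "busy"), ("idle", "idle"),
         ("busy", "overloaded"), ("busy", "busy")]
prob = transition_probability(pairs)
```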

[0012] According to some embodiments of this application, the step of obtaining the state transition probability of the candidate state based on the candidate state and the candidate action, and selecting the target state based on the state transition probability, includes: searching from pre-calculated state transition probabilities based on the candidate state and the candidate action to obtain the state transition probability corresponding to each candidate state; and selecting the candidate state corresponding to the largest state transition probability as the target state.
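A minimal sketch of this lookup-and-select step, assuming the pre-calculated probabilities are stored in a dictionary keyed by (state, action) pairs. The table layout and all names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

StateAction = Tuple[str, str]


def select_target_state(candidates: List[StateAction],
                        table: Dict[StateAction, float]) -> str:
    """Return the candidate state whose (state, action) pair has the
    largest pre-calculated state transition probability."""
    best_state, _ = max(candidates, key=lambda pair: table[pair])
    return best_state


# Hypothetical pre-calculated probabilities.
table = {("s1", "a1"): 0.2, ("s2", "a2"): 0.7, ("s3", "a3"): 0.1}
target_state = select_target_state(list(table), table)
```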

[0013] According to some embodiments of this application, the step of inputting the expected state feedback into an advantage function to select a target action from the multiple candidate actions includes: calculating, according to the first feedback function, the expected feedback of the target state for each of the multiple candidate actions; adding the expected feedback values of all candidate actions to obtain the total expected feedback; dividing the total expected feedback by the number of candidate actions to obtain the average expected feedback; subtracting the average expected feedback from the expected feedback of each candidate action to obtain a reference value for each candidate action; and ranking the reference values of the multiple candidate actions and selecting the candidate action with the largest reference value as the target action.
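The advantage computation above amounts to subtracting the mean expected feedback from each action's expected feedback and taking the maximum. A minimal sketch, assuming the expected feedback values from the first feedback function are already available in a dictionary (the action names are hypothetical):

```python
from typing import Dict, Tuple


def select_target_action(expected: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick the action whose expected feedback exceeds the average the most."""
    if not expected:
        raise ValueError("at least one candidate action is required")
    average = sum(expected.values()) / len(expected)
    # Reference value = expected feedback minus the average expected feedback.
    reference = {action: value - average for action, value in expected.items()}
    target = max(reference, key=lambda action: reference[action])
    return target, reference


# Hypothetical expected feedback for three candidate actions.
expected = {"connect_edge_a": 1.0, "connect_edge_b": 4.0, "connect_router": 1.0}
target, reference = select_target_action(expected)
```

Because the same average is subtracted from every action, the selected action is also the one with the largest raw expected feedback; the reference values mainly show how far each action sits above or below the average.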

[0014] According to some embodiments of this application, the immediate feedback includes positive immediate feedback and negative immediate feedback; the step of inputting the updated state into the second feedback function to calculate the immediate feedback includes: performing message passing based on the target state and the target action to obtain the updated state; in the updated state, obtaining data indicators of the environment, wherein the data indicators include the device bandwidth overrun ratio, the request processing time, and the device processing capacity; if the data indicator is positive, multiplying the device bandwidth overrun ratio by a first coefficient and taking the negative value to obtain a first positive indicator, multiplying the request processing time by a second coefficient and taking the negative value to obtain a second positive indicator, multiplying the device processing capacity by a third coefficient and taking the positive value to obtain a third positive indicator, and adding the first, second, and third positive indicators to obtain the immediate feedback; if the data indicator is negative, multiplying the device bandwidth overrun ratio by the first coefficient and taking the positive value to obtain a first negative indicator, multiplying the request processing time by the second coefficient and taking the positive value to obtain a second negative indicator, multiplying the device processing capacity by the third coefficient and taking the negative value to obtain a third negative indicator, and adding the first, second, and third negative indicators to obtain the immediate feedback.
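The two cases above differ only in sign, which the following sketch makes explicit. How the data indicator is judged positive or negative is not specified in this passage, so it is passed in as a boolean flag; the coefficient defaults and all names are assumptions for illustration.

```python
def immediate_feedback(overrun_ratio: float,
                       processing_time: float,
                       capacity: float,
                       positive: bool,
                       c1: float = 1.0,
                       c2: float = 1.0,
                       c3: float = 1.0) -> float:
    """Sum of the three weighted indicators described in the text.

    Positive case: -c1*overrun - c2*time + c3*capacity.
    Negative case: +c1*overrun + c2*time - c3*capacity.
    """
    positive_sum = -c1 * overrun_ratio - c2 * processing_time + c3 * capacity
    return positive_sum if positive else -positive_sum


# Hypothetical indicator readings with unit coefficients.
reward = immediate_feedback(0.25, 0.5, 2.0, positive=True)
penalty = immediate_feedback(0.25, 0.5, 2.0, positive=False)
```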

[0015] According to some embodiments of this application, the first feedback function is obtained by training N times through the following steps: obtaining the current state and the current action corresponding to the current state from the environment; inputting the current state and the current action into the second feedback function to obtain the current feedback value; inputting the current feedback into the initial first feedback function to obtain the expected feedback value; selecting a target action based on the current state and the current action, and executing the target action to obtain an updated state value; inputting the data index corresponding to the updated state value into the second feedback function to obtain a reference feedback value; adjusting the parameters of the first feedback function based on the reference feedback value and the expected feedback value to obtain the first feedback function; wherein, N is a positive integer greater than or equal to 1, and N is the number of training iterations reached when the first feedback function converges.

[0016] A second aspect of this application proposes a load balancing control method for edge computing, applied in a communication device, wherein the communication device is an edge node or a network node, and the communication device is communicatively connected to a server. The method includes: sending status information of the communication device to the server; wherein the server is configured to acquire network request information sent by a user device and determine the request time of the network request information; further configured to, after acquiring the status information of multiple communication devices at the current request time, construct a temporal knowledge graph with the user device and multiple communication devices as nodes, and the request time and status information as attributes corresponding to each node; further configured to acquire a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation value of the connection between the user device and each of the communication devices in multiple future time periods; further configured to, based on each evaluation value, determine the target device that meets the network connection requirements in each future time period from the multiple communication devices; if it is determined to be the target device in any future time period, receive network connection information sent by the user device, and establish a communication connection with the user device based on the network connection information.

[0017] A third aspect of this application proposes a load balancing control method for edge computing, applied in a user equipment (UE), wherein the UE communicates with a server. The method includes: sending network request information to the server, so that the server determines the request time of the network request information after receiving it; wherein the server is further configured to acquire the status information of multiple communication devices at the current request time, and construct a temporal knowledge graph with the UE and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; further configured to acquire a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation value of the connection between the UE and each of the communication devices in multiple future time periods; further configured to determine the target device that meets the network connection requirements in each future time period from the multiple communication devices according to the evaluation values, and send a connection instruction to the UE; receive the connection instruction sent by the server, send network connection information to the corresponding target device in each future time period, and establish a communication connection with the corresponding target device based on the network connection information.

[0018] This application's fourth aspect proposes a load balancing control system for edge computing, applied in a server. The server is communicatively connected to multiple communication devices, each including at least one of multiple edge nodes and network nodes. The system comprises: a network request information acquisition module, used to acquire network request information sent by a user device and determine the request time of the network request information; a temporal knowledge graph construction module, used to acquire the status information of multiple communication devices at the current request time, and construct a temporal knowledge graph using the user device and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; an evaluation value prediction module, used to acquire a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation value of the connection between the user device and each of the communication devices in multiple future time periods; and a target device acquisition module, used to determine the target device that meets the network connection requirements in each future time period from the multiple communication devices according to the evaluation values, and send a connection command to the user device so that the user device can communicate with the corresponding target device in each future time period.

[0019] The fifth aspect of this application provides an apparatus comprising a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the edge computing load balancing control method described in any one of the embodiments of the first aspect of this application.

[0020] The sixth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the edge computing load balancing control method described in any one of the embodiments of the first aspect of this application.

[0021] The edge computing load balancing control method, system, device, and storage medium proposed in this application obtain the status information of multiple communication devices at the request time of the network request information, and construct a temporal knowledge graph using the user device and the multiple communication devices as nodes, and the request time and status information as attributes of each node. The temporal knowledge graph is then processed by a pre-trained graph neural network model to predict the evaluation values of the user device's connections to the multiple communication devices in multiple future time periods. Based on the evaluation values, target devices are selected and a connection command is sent to the user device, enabling it to connect to the corresponding target device in each predicted future time period. This allows node prediction through evaluation value calculation in complex and dynamic edge computing environments, achieving uniform and efficient allocation of network resources.

Brief Description of the Drawings

[0022] Figure 1 is a schematic diagram of the edge computing load balancing control system provided in the embodiments of this application;

[0023] Figure 2 is a flowchart of the load balancing control method for edge computing provided in the embodiments of this application;

[0024] Figure 3 is a schematic diagram of the temporal knowledge graph provided in the embodiments of this application;

[0025] Figure 4 is a flowchart of the graph neural network training process provided in an embodiment of this application;

[0026] Figure 5 is another flowchart of step S202 in Figure 4;

[0027] Figure 6 is yet another flowchart of graph neural network training provided in the embodiments of this application;

[0028] Figure 7 is a schematic diagram illustrating node updates via message passing provided in an embodiment of this application;

[0029] Figure 8 is a flowchart illustrating the acquisition of evaluation values provided in an embodiment of this application;

[0030] Figure 9 is a flowchart of the inference optimization process of the graph neural network model provided in the embodiments of this application;

[0031] Figure 10 is a flowchart of the steps preceding step S603 provided in the embodiments of this application;

[0032] Figure 11 is a flowchart of step S603 provided in the embodiments of this application;

[0033] Figure 12 is a flowchart of step S605 provided in the embodiments of this application;

[0034] Figure 13 is a flowchart of step S607 provided in the embodiments of this application;

[0035] Figure 14 is a flowchart of the training process for the first feedback function provided in an embodiment of this application;

[0036] Figure 15 is a flowchart of a method applied to a communication device provided in an embodiment of this application;

[0037] Figure 16 is a flowchart of a method applied to a user equipment according to an embodiment of this application;

[0038] Figure 17 is a diagram illustrating the overall optimization process provided in the embodiments of this application;

[0039] Figure 18 is a functional module diagram of the edge computing load balancing control system provided in the embodiments of this application;

[0040] Figure 19 is a schematic diagram of the hardware structure of the device provided in the embodiments of this application.

Detailed Description of the Embodiments

[0041] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0042] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0043] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0044] Edge computing aims to improve the efficiency and performance of cloud computing architectures. It moves computing resources and data processing functions from cloud data centers to edge nodes closer to the data source, such as sensors, terminal devices, and network edge devices, thereby reducing data transmission latency, improving response speed, and reducing network bandwidth requirements.

[0045] However, in practical applications, the edge computing environment is highly dynamic, which means that the status of devices, network conditions, and user requests may change at any time. Due to the complexity and dynamism of the edge computing environment, uneven distribution of network resources may occur, which will affect the normal operation of the server and the efficiency of edge computing.

[0046] Based on this, embodiments of this application provide a load balancing control method, system, device, and storage medium for edge computing, which can improve the response speed of data reading.

[0047] The edge computing load balancing control method, system, device and storage medium provided in this application are specifically described through the following embodiments. First, the edge computing load balancing control system in this application embodiment is described.

[0048] Please refer to Figure 1. In some embodiments, the edge computing load balancing control system includes: a control module 101, a model pre-training module 102, an algorithm training module 103, and a load balancing module 104.

[0049] In some embodiments, the control module 101 can be the nerve center and command center of the system. The control module 101 can generate operation control signals according to the instruction opcode and timing signals to complete the control of instruction fetching and execution. For example, the control module 101 can control the model pre-training module 102, the algorithm training module 103 and the load balancing module 104 according to the instructions to achieve load balancing of edge computing.

[0050] In some embodiments, the model pre-training module 102 can perform preliminary training on the graph neural network model, enabling the model to have the ability to pass messages in order to find the hidden state of the current node. In some embodiments, the model pre-training module 102 can also train the model by using two nodes with edges as positive samples and two nodes without edges as negative samples, and continuously adjust the model parameters during the training process, so that the model has the ability to predict potential connections between nodes.
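The positive and negative sampling described above can be sketched as follows: node pairs joined by an edge are positives, and pairs without an edge are candidates for negatives. This is a minimal illustration that treats edges as undirected; the function name, the fixed random seed, and the node labels are assumptions.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def sample_pairs(nodes: Sequence[str],
                 edges: Sequence[Pair],
                 num_negative: int,
                 seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Return (positives, negatives) for link-prediction training."""
    edge_set = {frozenset(edge) for edge in edges}
    positives = [tuple(edge) for edge in edges]
    # Every unordered node pair that is not connected by an edge.
    non_edges = [pair for pair in itertools.combinations(nodes, 2)
                 if frozenset(pair) not in edge_set]
    rng = random.Random(seed)
    negatives = rng.sample(non_edges, min(num_negative, len(non_edges)))
    return positives, negatives


# Hypothetical graph: one user device and three communication devices.
nodes = ["ue", "edge_a", "edge_b", "router"]
edges = [("ue", "edge_a"), ("edge_a", "edge_b")]
positives, negatives = sample_pairs(nodes, edges, num_negative=2)
```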

[0051] In some embodiments, the algorithm training module 103 can train the algorithm of the model, specifically by training with a deep reinforcement learning network (deep Q-network, DQN), so that the model continuously adjusts its strategy and model parameters according to the current reward and the expected reward, thereby optimizing the algorithm.

[0052] In some embodiments, the load balancing module 104 can input the temporal knowledge graph into the graph neural network model for processing, predict the evaluation values of the connection between the user equipment and each communication device in multiple future time periods, determine from the multiple communication devices, based on the evaluation values, the target device that meets the network connection requirements in each future time period, and send a connection instruction to the user equipment so that the user equipment can communicate and connect with the corresponding target device in each future time period.

[0053] The load balancing control method for edge computing in this application can be illustrated through the following embodiments.

[0054] It should be noted that in all specific embodiments of this application, when processing data related to user identity or characteristics, such as user information, user behavior data, user historical data, and user location information, user permission or consent is obtained first. For example, when acquiring data from the user's device and network request information sent by the user's device, user permission or consent is obtained first. Furthermore, the collection, use, and processing of this data comply with relevant laws, regulations, and standards. In addition, when embodiments of this application require the acquisition of sensitive personal information from the user's device, separate permission or consent from the user is obtained through pop-ups or redirection to a confirmation page. Only after obtaining the user's separate permission or consent is the necessary user-related data for the normal operation of embodiments of this application obtained.

[0055] Figure 2 is an optional flowchart of the edge computing load balancing control method provided in the embodiments of this application. The method in Figure 2 is applied to a server that is communicatively connected to multiple communication devices, each communication device including at least one of multiple edge nodes and network nodes. The method may include, but is not limited to, steps S101 to S104.

[0056] Step S101: Obtain network request information sent by the user equipment and determine the request time when the network request information was sent;

[0057] Step S102: Obtain the status information of multiple communication devices at the current request time, and construct a time-series knowledge graph with the user equipment and multiple communication devices as nodes, and the request time and status information as the attributes of each node.

[0058] Step S103: Obtain the pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation value of the connection between the user equipment and various communication devices in multiple future time periods.

[0059] Step S104: Based on the various evaluation values, determine the target device that meets the network connection requirements in each future time period from multiple communication devices, and send a connection instruction to the user equipment so that the user equipment can communicate and connect with the corresponding target device in each future time period.
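Step S104 can be sketched as a per-period selection over the predicted evaluation values. The source does not define what "meets the network connection requirements" means numerically, so this sketch assumes the device with the highest evaluation value in each period is chosen; the function name and the score layout are hypothetical.

```python
from typing import Dict, List


def pick_targets(scores: Dict[str, List[float]]) -> List[str]:
    """scores[device][t] is the predicted evaluation value of connecting
    the user equipment to `device` in future time period t.

    Returns one target device per period (highest value wins, an assumption).
    """
    if not scores:
        raise ValueError("at least one communication device is required")
    periods = len(next(iter(scores.values())))
    return [max(scores, key=lambda device: scores[device][t])
            for t in range(periods)]


# Hypothetical predictions for two devices over three future periods.
scores = {"edge_a": [0.9, 0.2, 0.4], "edge_b": [0.3, 0.8, 0.5]}
targets = pick_targets(scores)
```

The resulting list is what a connection instruction would carry: the device the user equipment should connect to in each future time period.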

[0060] Please refer to Figure 3. In some embodiments, the communication device may include at least one of multiple edge nodes and network nodes. Specifically, an edge node refers to a device or server connected to the network edge, primarily used to provide edge computing, storage, and services. Edge nodes may include edge servers, edge routers, edge switches, and edge devices (such as sensors, cameras, and smart home devices). It is understood that the attributes of an edge node may include the device ID, device type, device operating status, device CPU utilization, device memory utilization, and device network bandwidth utilization.

[0061] In some embodiments, a network node refers to a device or server used for data transmission and communication in a computer network. Network nodes may include switches, routers, servers, firewalls, etc. The node attributes of a network node include the source IP address, destination IP address, source port, destination port, protocol type, number of data packets, and number of bytes of traffic.

[0062] In some embodiments, user equipment refers to the various terminal devices that a user uses on a network. User equipment may include mobile phones, tablets, IoT devices (such as smartphones, smartwatches, and smart home devices), and other terminal devices (such as printers, cameras, and sensors). The node attributes of the user equipment include the requesting user ID, the time of the request, the type of the request (such as video streaming or file download), the destination of the request, and the duration of the request.

[0063] It is understood that the request time refers to the specific time at which the user device sends the request, including year, month, day, hour, minute, and second. With the request time, performance indicators of the network request information, such as response time and network latency, can be analyzed.

[0064] In some embodiments, status information of multiple communication devices can be obtained using methods such as sensors, API calls, or web crawlers. A temporal knowledge graph is constructed using user devices and multiple communication devices as nodes (i.e., entities), request time and status information as attributes of each node, and relationships between nodes as edges. Specifically, the existence time of a relationship (start and end times) can be embedded into the relationship to construct quintuples, and then the temporal knowledge graph is constructed based on multiple quintuples. It is understood that constructing a temporal knowledge graph can both systematically represent the complex network environment and facilitate the establishment of temporal correlations, thereby enabling the analysis and prediction of the constantly changing state of the temporal knowledge graph based on time.

[0065] In some embodiments, the evaluation value in a pre-trained graph neural network can be calculated using the following formula:

[0066] $p = h_j \cdot h_i$

[0067] Understandably, $h_j$ is the vector representation of node j, and $h_i$ is the vector representation of node i adjacent to j. Calculating the similarity between node j and node i is equivalent to calculating their evaluation value, which in turn corresponds to the probability value p that there is an edge between node j and node i. A large probability value means that an edge between node j and node i is highly likely. Therefore, p can be used as the evaluation value.

[0068] In some embodiments, since the edge nodes are highly dynamic, the evaluation values between the nodes can be calculated in real time, and the target device, that is, the communication device to be connected to the user equipment, can be selected from multiple communication devices based on the evaluation values.

[0069] It is understandable that during the training process of a graph neural network model, rewards or penalties are applied based on the training results, thereby enabling the graph neural network model to continuously optimize and formulate the optimal strategy.

[0070] In some embodiments, the instructions sent by the target device to the user equipment may include the target device's address, connection method, authentication information, etc.

[0071] Understandably, given the complexity and high variability of edge computing environments, assessing and making decisions based on information for future time periods can better adapt to environmental changes. By continuously updating and adjusting the selection of target devices, network connectivity needs can be met at different times, thus achieving load balancing.

[0072] In some embodiments, if the learning progress of the agent is found to be too slow during the training of the graph neural network, the learning rate can be appropriately increased; if the agent's choice of actions is found to be too random, the expected reward or policy can be adjusted to make the agent more inclined to choose advantageous actions.

[0073] Understandably, once the graph neural network model is trained, it can be deployed to a real-world environment. At each time point, the agent can choose an action based on its current state and expected reward, execute that action in the environment, and then the environment returns a new state and an immediate reward. This allows for continuous optimization of the graph neural network model during actual use, ensuring load balancing.

[0074] The edge computing load balancing control method, system, device, and storage medium proposed in this application can obtain the status information of multiple communication devices at the request time based on the request time of network request information. A temporal knowledge graph is constructed using user devices and multiple communication devices as nodes, and request time and status information as attributes of each node. The temporal knowledge graph is then processed by a pre-trained graph neural network model to predict the evaluation values of user devices' connections to multiple communication devices in multiple future time periods. Based on the evaluation values, target devices are selected, and connection commands are sent to the user devices, enabling them to connect to the corresponding target devices in the predicted future time periods. This allows for node prediction through evaluation value calculation in complex and dynamic edge computing environments, achieving uniform and efficient allocation of network resources.

[0075] Please refer to Figure 4. In some embodiments, the graph neural network model can be trained according to steps including, but not limited to, steps S201 to S204:

[0076] Step S201: Obtain node data of various communication devices and user equipment in history;

[0077] Step S202: Obtain the relationships between node data and construct a sample time-series knowledge graph based on the node data and relationships;

[0078] Step S203: Collect negative and positive samples from the sample time-series knowledge graph, input them into the graph neural network model, and obtain the sample evaluation value;

[0079] Step S204: Adjust the parameters of the graph neural network model based on the sample evaluation values to obtain the optimized graph neural network model.

[0080] In some embodiments, historical node data from the various communication devices and user devices can be obtained as training data. Node data is treated as entities, and quintuples consisting of a head entity, a relation, a tail entity, the relation's start time, and its end time are embedded into the time-series knowledge graph. It is understood that node data that has a relation to neighboring node data can be used as positive sample data, and node data that does not have a relation to neighboring node data can be used as negative sample data.

[0081] For example, assume a positive sample is A1 and the sample adjacent to A1 is A2, with an edge relationship between A1 and A2; also assume a negative sample is B1 and the sample adjacent to B1 is B2, with no edge relationship between B1 and B2. Then, the similarity (i.e., sample evaluation value) between A1 and A2 and the similarity (i.e., sample evaluation value) between B1 and B2 can be calculated. Further, the similarity between A1 and A2 can be compared with a preset threshold, and the parameters of the graph neural network model adjusted based on the comparison result. In some embodiments, the similarity between B1 and B2 is likewise compared with the preset threshold, and the parameters of the graph neural network model are adjusted based on the comparison result. It is understood that, generally, the similarity between A1 and A2 should be greater than the preset threshold, and the similarity between B1 and B2 should be less than the preset threshold. It is understood that the preset threshold can be set empirically, for example to 0.8, and this application embodiment does not impose specific limitations in this regard.
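The threshold comparison for a pair with an edge (A1, A2) and a pair without an edge (B1, B2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the function names `similarity` and `is_consistent`, the example vectors, and the default threshold of 0.8 are all hypothetical.

```python
from typing import Sequence


def similarity(h_j: Sequence[float], h_i: Sequence[float]) -> float:
    """Sample evaluation value: inner product of two node vectors."""
    if len(h_j) != len(h_i):
        raise ValueError("node vectors must have the same length")
    return sum(x * y for x, y in zip(h_j, h_i))


def is_consistent(score: float, has_edge: bool, threshold: float = 0.8) -> bool:
    """True when the score lies on the expected side of the threshold.

    Pairs with an edge are expected to score above the threshold,
    pairs without an edge below it.
    """
    return score > threshold if has_edge else score < threshold


# Hypothetical vectors for the pairs (A1, A2) and (B1, B2).
a1, a2 = [0.9, 0.4], [0.8, 0.5]
b1, b2 = [0.9, 0.1], [0.1, 0.2]

score_a = similarity(a1, a2)  # pair with an edge
score_b = similarity(b1, b2)  # pair without an edge

# A pair on the wrong side of the threshold would trigger a parameter adjustment.
needs_update = [not is_consistent(score_a, True), not is_consistent(score_b, False)]
```

With these example vectors both pairs already fall on the expected side of the threshold, so neither would call for an adjustment.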

[0082] Understandably, the sample evaluation value can be calculated using the following formula:

[0083] $p = h_j \cdot h_i$

[0084] In some embodiments, $h_j$ is the vector representation of node j, $h_i$ is the vector representation of node i adjacent to j, and p represents the probability that a relationship exists between $h_j$ and $h_i$, which is the sample evaluation value.

[0085] Please refer to Figure 5. In some embodiments, step S202 includes, but is not limited to, steps S301 to S304:

[0086] Step S301: Perform data cleaning and preprocessing on the node data to obtain sample nodes;

[0087] Step S302: Based on the sample nodes as entities and the actions between sample nodes as relations, sample triplet data is formed; wherein, the sample triplet data includes a head entity, a relation, and a tail entity;

[0088] Step S303: Obtain the start time and end time of each relation, and form sample quintuple data based on the sample triplet data, start time and end time;

[0089] Step S304: Construct a sample time-series knowledge graph based on multiple sample quintuple data.

[0090] In some embodiments, for node data, sample nodes can be obtained by removing outliers, filling in missing values, etc. For example, sample nodes can be used as entities, and the actions between sample nodes can be used as relations to form sample triplet data, and the final triplet is (head entity, relation, tail entity).

[0091] In some embodiments, start and end time information can be obtained for each relationship. For example, the start and end times of communication between sensors can be recorded. In some embodiments, five-tuple data can be constructed based on the head entity, relationship, tail entity, start and end times. In some embodiments, timestamps can be attached to both entities and relationships to better reflect changes in user device status, network conditions, request information, etc., over time. This dynamic representation can more accurately reflect the real-time environment, thereby enabling more precise load balancing decisions.

[0092] In some embodiments, entities can be converted into vectors, and time information, such as hours, minutes, and seconds, can be embedded into the vectors to obtain entity vectors with embedded time information.

[0093] In some embodiments, a temporal knowledge graph can be constructed based on multiple sample quintuple data. It is understood that the temporal knowledge graph should be as complete as possible, encompassing all communication devices and user equipment.
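Steps S302 to S304 can be sketched as follows: each (head entity, relation, tail entity) triple receives a start and end time, and the resulting quintuples form the graph. The class name `Quintuple`, the helpers `build_quintuples` and `active_at`, and the device names and timestamps are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Quintuple:
    head: str
    relation: str
    tail: str
    start: datetime
    end: datetime


Triple = Tuple[str, str, str]
Interval = Tuple[datetime, datetime]


def build_quintuples(triples: List[Triple], intervals: List[Interval]) -> List[Quintuple]:
    """Attach a (start, end) interval to each (head, relation, tail) triple."""
    if len(triples) != len(intervals):
        raise ValueError("each triple needs exactly one time interval")
    quintuples = []
    for (head, relation, tail), (start, end) in zip(triples, intervals):
        if end < start:
            raise ValueError("end time precedes start time")
        quintuples.append(Quintuple(head, relation, tail, start, end))
    return quintuples


def active_at(graph: List[Quintuple], moment: datetime) -> List[Quintuple]:
    """Relations whose validity interval contains the given moment."""
    return [q for q in graph if q.start <= moment <= q.end]


# Hypothetical sample data.
triples = [("user_1", "connects_to", "edge_A"), ("edge_A", "forwards_to", "router_1")]
intervals = [
    (datetime(2023, 8, 10, 9, 0, 0), datetime(2023, 8, 10, 9, 5, 0)),
    (datetime(2023, 8, 10, 9, 3, 0), datetime(2023, 8, 10, 9, 10, 0)),
]
graph = build_quintuples(triples, intervals)
current = active_at(graph, datetime(2023, 8, 10, 9, 4, 0))
```

Storing the interval on the relation makes it possible to query which connections exist at a given moment, which is the temporal correlation the graph is meant to expose.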

[0094] Please refer to Figure 6. In some embodiments, during the training phase of the graph neural network model, the method further includes, but is not limited to, steps S401 to S404:

[0095] Step S401: Perform feature transformation on the sample nodes in the sample time-series knowledge graph to obtain vector sample nodes;

[0096] Step S402: Input the vector sample nodes into the graph neural network model to obtain the neighboring sample nodes of each vector sample node;

[0097] Step S403: Aggregate the vector sample nodes and adjacent sample nodes to obtain semantic aggregation information;

[0098] Step S404: Adjust the parameters of the graph neural network model based on the semantic aggregation information.

[0099] In some embodiments, a traditional graph neural network architecture or a conditional graph network can be used to vectorize the sample nodes in the temporal knowledge graph, and message passing can be performed on the vector sample nodes to obtain the node i adjacent to the sample node. All nodes adjacent to the adjacent node i are aggregated to update the information of node i, so that the graph neural network model can learn the implicit semantic relationships in the temporal knowledge graph and prepare for subsequent node prediction.

[0100] In some embodiments, assuming a knowledge graph containing information about user devices and edge nodes, a conditional graph network can vectorize this information and update it via message passing. For example, node i may be connected to nodes a, b, and c. By aggregating the information from nodes a, b, and c, node i can update its own vector representation, that is, it can update its hidden state. This allows the graph neural network model to learn the semantic relationships between node i and the nodes connected to nodes a, b, and c. Thus, the graph neural network model can learn the connections between nodes in the temporal knowledge graph, thereby improving its representation capabilities.
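The update of node i from its neighbors a, b, and c can be illustrated with a small sketch. The aggregation rule used here, a mean over neighbor vectors followed by a linear map with parameters W and b and a tanh nonlinearity, is an assumption chosen for illustration and is not stated by this application; the names `mean_vectors` and `update_node` and all numeric values are likewise hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_node(neighbours: List[Vector], W: List[Vector], b: Vector) -> Vector:
    """ASSUMED update rule: h_i = tanh(W @ mean(neighbour vectors) + b)."""
    if not neighbours:
        raise ValueError("node has no neighbours to aggregate")
    aggregated = mean_vectors(neighbours)
    return [
        math.tanh(sum(w * x for w, x in zip(row, aggregated)) + bias)
        for row, bias in zip(W, b)
    ]


# Hypothetical hidden states of the neighbours a, b and c of node i.
hidden: Dict[str, Vector] = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency: Dict[str, List[str]] = {"i": ["a", "b", "c"]}

W = [[1.0, 0.0], [0.0, 1.0]]  # model parameters (identity chosen for readability)
b = [0.0, 0.0]

h_i = update_node([hidden[name] for name in adjacency["i"]], W, b)
```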

[0101] For example, the message passing function is as follows:

[0102]

[0103] Where $h_j$ is the vector representation (hidden state) of node j, W and b are model parameters, and $h_i$ is the vector representation of node i connected to j.

[0104] Please refer to Figure 7, which is a schematic diagram illustrating node updates via message passing. In some embodiments, message passing can aggregate information about neighboring nodes based on the connection relationships between vector sample nodes to represent the vector sample nodes.

[0105] Please refer to Figure 8. In some embodiments, the evaluation value is obtained according to steps including, but not limited to, steps S501 to S503:

[0106] Step S501: Obtain the node corresponding to the user equipment;

[0107] Step S502: Input the nodes into the graph neural network model to obtain the first vector node and the communication nodes of each communication device;

[0108] Step S503: Calculate the product of the first vector node and each communication node to obtain the evaluation value of each communication device.

[0109] Understandably, the evaluation value can be calculated using the following formula:

[0110] $p = h_j \cdot h_i$

[0111] In some embodiments, $h_j$ is the vector representation of node j, $h_i$ is the vector representation of node i adjacent to j, and p represents the probability that a relationship exists between $h_j$ and $h_i$, which is the evaluation value.

[0112] In some embodiments, the node corresponding to the user device can be obtained, and the node can be input into a graph neural network model to obtain a first vector node. Message passing is then performed on the vector node to obtain multiple communication nodes, such as communication node 1, communication node 2, and communication node 3. It can be understood that the communication node is obtained through message passing via the vector node; therefore, the communication node is also a vector.

[0113] In some embodiments, the first vector node can be multiplied by each communication node to calculate the similarity between the first vector node and each communication node. Specifically, the first vector node can be multiplied by communication node 1, communication node 2, and communication node 3 to obtain three evaluation values, one for each of communication node 1, communication node 2, and communication node 3.
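Step S503 and the subsequent choice of a target device can be sketched as follows. The sketch assumes that the product is an inner product of equally sized vectors and that the device with the largest evaluation value is chosen; the function names and the vectors are hypothetical.

```python
from typing import Dict, List


def evaluation_values(first_vector: List[float],
                      communication_nodes: Dict[str, List[float]]) -> Dict[str, float]:
    """Inner product of the user-device vector with every communication node."""
    return {
        name: sum(x * y for x, y in zip(first_vector, vector))
        for name, vector in communication_nodes.items()
    }


def select_target(values: Dict[str, float]) -> str:
    """Pick the communication node with the largest evaluation value."""
    return max(values, key=values.get)


# Hypothetical vectors produced by the graph neural network model.
first_vector = [0.5, 1.0]
communication_nodes = {
    "communication_node_1": [1.0, 0.0],
    "communication_node_2": [0.0, 1.0],
    "communication_node_3": [1.0, 1.0],
}

values = evaluation_values(first_vector, communication_nodes)
target = select_target(values)
```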

[0114] Please refer to Figure 9. In some embodiments, the graph neural network model is obtained through inference and optimization based on steps including, but not limited to, steps S601 to S609:

[0115] Step S601: Use a deep reinforcement learning network to fit the knowledge graph environment; the environment includes the temporal knowledge graph and the initial state and initial action of each initial node data of the temporal knowledge graph.

[0116] Step S602: Message passing is performed on the initial node data to obtain multiple candidate states and corresponding multiple candidate actions;

[0117] Step S603: Based on the candidate states and candidate actions, obtain the state transition probabilities of the candidate states, and select the target state based on the state transition probabilities;

[0118] Step S604: Input the target state and the corresponding candidate actions into the first feedback function to calculate the expected state feedback of the target state;

[0119] Step S605: Select the target action from multiple candidate actions by inputting the expected state feedback into the advantage function;

[0120] Step S606: Based on the target state, execute the target action to obtain the updated state;

[0121] Step S607: Input the updated status into the second feedback function to calculate the real-time feedback;

[0122] Step S608: Input the target action and update state into the first feedback function to calculate the expected feedback;

[0123] Step S609: With the expected feedback as the execution target, adjust the parameters of the graph neural network model to update the strategy for generating evaluation values.

[0124] In some embodiments, a deep reinforcement learning network can be used to fit a knowledge graph environment. Specifically, a deep reinforcement learning network can be constructed, comprising an input layer, hidden layers, and an output layer. The input layer receives a representation of the environment state, i.e., the current entities, relations, and attributes in the knowledge graph. The hidden layers process the input and extract useful features. The output layer predicts the Q-value (an evaluation of the current state and action) for each action. During training, the deep reinforcement learning network fits an agent that selects an action based on the current environment state and receives an immediate reward (immediate feedback) from the environment. The deep reinforcement learning network then updates the parameters of the graph neural network model based on this immediate reward, so that the selected action better optimizes the expected reward (i.e., the expected feedback).

[0125] Understandably, message passing can be performed on the initial node data to obtain multiple candidate states, and corresponding candidate actions can be derived based on these candidate states. Specifically, the message passing function is as follows:

[0126]

[0127] Where $h_j$ is the vector representation (hidden state) of node j, W and b are model parameters, and $h_i$ is the vector representation of node i connected to j.

[0128] In some embodiments, the next possible state, i.e., the target state, after taking different actions can be predicted based on the state transition probabilities. It is understood that the formula for the state transition probabilities is as follows:

[0129] $P = P(s' \mid s, a)$

[0130] Where P represents the state transition probability, s' represents the new state, s represents the target state, and a represents the target action.

[0131] In some embodiments, the Monte Carlo method can be used, where an agent interacts extensively with the environment. In each interaction, the agent selects an action based on the current state and the expected feedback from the policy, executes the action in the environment, and then the environment returns a new state and an immediate reward. Understandably, the state, action, and new state in each interaction can be recorded, and the number of transitions to each new state given the current state and action can be counted. Finally, the state transition probability is estimated by calculating the proportion of transitions to each new state given the current state and action out of the total number of transitions.
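The counting procedure described above can be sketched as follows: each recorded interaction is a (state, action, new state) record, and the probability of each new state is its share of all transitions observed for the same state and action. The function name and the interaction log are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Interaction = Tuple[str, str, str]  # (state, action, new_state)


def estimate_transition_probabilities(
    interactions: Iterable[Interaction],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate P(new_state | state, action) as a relative frequency."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for state, action, new_state in interactions:
        counts[(state, action)][new_state] += 1
    return {
        key: {new_state: n / sum(counter.values()) for new_state, n in counter.items()}
        for key, counter in counts.items()
    }


# Hypothetical interaction log.
log = [
    ("s0", "a0", "s1"),
    ("s0", "a0", "s1"),
    ("s0", "a0", "s2"),
    ("s0", "a0", "s1"),
    ("s1", "a1", "s0"),
]
probabilities = estimate_transition_probabilities(log)
```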

[0132] Understandably, one can calculate the state transition probability for multiple candidate states and candidate actions, and select the candidate state with the highest state transition probability as the target state; or, one can select the candidate state with the highest transition probability and the corresponding candidate action as the target state and target action.

[0133] In some embodiments, the expected feedback of the agent in the current state can also be calculated using a state value function. Specifically, the state value function is as follows:

[0134] $V(s) = E[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s]$

[0135] Where γ represents the discount factor, R represents immediate feedback, E represents expectation, and V(s) represents anticipated feedback. In some embodiments, the agent can be rewarded and the parameters of the graph neural network model can be adjusted by continuously calculating the state value function of the current state.
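The state value function is the expected discounted sum of immediate feedback received after state s. One way to approximate it, assumed here for illustration, is to average the discounted returns of several finite sampled episodes that start in s. The function names and the reward sequences are hypothetical.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... for one finite episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("discount factor must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_state_value(episodes: List[Sequence[float]], gamma: float) -> float:
    """Sample mean of discounted returns over episodes starting in the same state."""
    returns = [discounted_return(episode, gamma) for episode in episodes]
    return sum(returns) / len(returns)


# Hypothetical immediate feedback observed in two episodes starting from state s.
episodes = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
value = estimate_state_value(episodes, gamma=0.5)
```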

[0136] In some embodiments, the first feedback function is an action-value function, which can be used to calculate the expected feedback. Specifically, the action-value function is an estimate of the agent's expected future reward after taking a certain action in a given state. The agent can make decisions based on the action-value function, selecting the action that maximizes the expected reward. In some embodiments, the action-value function is represented as follows:

[0137] $Q(s,a) = E[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s, A_t = a]$

[0138] Where E represents expectation, γ represents the discount factor, R represents immediate feedback, $S_t$ represents the state at a certain moment, $A_t$ represents the action at a certain moment, s represents the specific state, a represents the specific action, and Q(s,a) represents the expected feedback. In some embodiments, the expected feedback corresponding to the target state and multiple candidate actions can be calculated based on the action value function, and the discount factor of the first feedback function can be continuously adjusted based on the accumulated real-time feedback.

[0139] Specifically, the first feedback function can be obtained through the iterative process of a reinforcement learning algorithm. In some embodiments, the agent selects an action based on the current state and expected feedback, and executes this action in the environment. After the environment completes the execution, it returns a new state and an immediate feedback. The agent updates the action value function based on this immediate feedback and the new state. Understandably, this process will be repeated until the action value function converges, that is, the value of the action value function no longer changes significantly.

[0140] In some embodiments, an action value function can be calculated based on the target state to obtain the expected state feedback. The total expected state feedback value can be calculated based on the expected state feedback. The average expected state feedback value can be calculated based on the total expected state feedback value. Then, a reference value can be obtained by subtracting the average expected state feedback value from the expected state feedback value corresponding to each candidate action. The candidate action corresponding to the largest reference value can be selected as the target action.

[0141] In some embodiments, the agent can also select actions based on the current state and immediate feedback. The specific policy update rules are as follows:

[0142] $\pi(a \mid s) = \varepsilon / m + (1 - \varepsilon) \cdot I(a = \arg\max_{a'} Q(s, a'))$

[0143] Where ε represents the ε-greedy probability used to balance policy updates, m represents the number of possible actions a in state s, and $I(a = \arg\max_{a'} Q(s, a'))$ represents the agent choosing the action with the highest Q value with a probability of 1-ε. This means the agent can determine the appropriate action based on its state and feedback from the environment. For example, given network traffic, it might choose to assign a task to a less loaded device.
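The ε-greedy rule can be sketched as follows: every action receives probability ε/m, and the action with the highest Q value receives an additional 1-ε. The function names, the device names, and the Q values are hypothetical.

```python
import random
from typing import Dict


def epsilon_greedy_probabilities(q_values: Dict[str, float], epsilon: float) -> Dict[str, float]:
    """pi(a|s) = epsilon / m + (1 - epsilon) * I(a is the greedy action)."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    m = len(q_values)
    greedy = max(q_values, key=q_values.get)
    return {
        action: epsilon / m + (1.0 - epsilon) * (1.0 if action == greedy else 0.0)
        for action in q_values
    }


def choose_action(q_values: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Sample one action according to the epsilon-greedy probabilities."""
    probabilities = epsilon_greedy_probabilities(q_values, epsilon)
    actions = list(probabilities)
    weights = [probabilities[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical Q values for assigning a request to one of three devices.
q_values = {"edge_A": 0.2, "edge_B": 0.9, "edge_C": 0.5}
policy = epsilon_greedy_probabilities(q_values, epsilon=0.3)
action = choose_action(q_values, epsilon=0.3, rng=random.Random(0))
```

With ε = 0 the rule always returns the greedy action; a larger ε spreads more probability over the remaining actions, which keeps the agent exploring.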

[0144] In some embodiments, in deep reinforcement learning, the first feedback function, i.e., the action-value function, can be continuously updated. The specific Q-learning (deep reinforcement learning) update rules are as follows:

[0145] $Q(s,a) = Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s', a') - Q(s,a)]$

[0146] Where γ represents the discount factor, r represents the immediate feedback, $\max_{a'} Q(s', a')$ represents the largest Q value over the actions a' available in the new state s', Q(s,a) represents the agent's expected feedback given the action and state, and α represents the learning rate. In some embodiments, if the agent's learning progress is found to be too slow, the learning rate can be increased. In some embodiments, while performing reinforcement learning, the parameters of the update rule can be adjusted, and the first feedback function can be updated.
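A single application of the Q-learning update can be sketched with a tabular action value function. The tabular form is an assumption made to keep the example small (the application describes a deep reinforcement learning network); the function name, state and action names, and numeric values are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def q_learning_update(q: QTable, state: str, action: str, reward: float,
                      new_state: str, actions: List[str],
                      alpha: float, gamma: float) -> float:
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); returns the new value."""
    best_next = max(q[(new_state, candidate)] for candidate in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical table: unseen (state, action) pairs default to 0.0.
q: QTable = defaultdict(float)
actions = ["edge_A", "edge_B"]
q[("s1", "edge_B")] = 2.0

new_value = q_learning_update(
    q, state="s0", action="edge_A", reward=1.0, new_state="s1",
    actions=actions, alpha=0.5, gamma=0.5,
)
```

Here the target is 1.0 + 0.5 × 2.0 = 2.0, and with a learning rate of 0.5 the stored value moves halfway from 0.0 toward it. A larger learning rate moves the value further in one step, which is why it can be raised when learning is too slow.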

[0147] In some embodiments, a target action can be performed in the target state to obtain an updated state, and the updated state is then input into a second feedback function to calculate immediate feedback. In some embodiments, the second feedback function is a reward function, and the immediate feedback is the reward or penalty received by the agent after performing the target action.

[0148] In some embodiments, the expected feedback, i.e., the expected reward, can be calculated by inputting the target action and the updated state into the first feedback function, i.e., the action value function. The agent can then adjust the parameters of the graph neural network model based on the expected reward to update the strategy for generating the evaluation value.

[0149] Please refer to Figure 10. In some embodiments, before step S603, the method further includes, but is not limited to, steps S701 to S704:

[0150] Step S701: Select an initial state and its corresponding initial action from the environment;

[0151] Step S702: Perform the initial action based on the initial state to obtain the transition state;

[0152] Step S703: Compare the initial state with the corresponding transition state. If the comparison results are inconsistent, mark the transition state to obtain the marked transition state;

[0153] Step S704: Divide the number of marked transition states by the number of initial states to obtain the state transition probabilities of the initial states and their corresponding transition states.

[0154] In some embodiments, an initial state and a corresponding initial action can be selected from the environment, and the initial action can be performed on the initial state to obtain a transition state. In some embodiments, the initial state and the transition state can be compared. If the comparison results are consistent, it indicates that no state transition has occurred. If the comparison results are inconsistent, it indicates that a state transition has occurred. In this case, the transition state is marked to obtain a marked transition state.

[0155] In some embodiments, the initial action is performed repeatedly on the same initial state. For example, the initial action a is performed 10 times based on the initial state a. During this process, each transition state that has undergone a state transition is marked, and 6 marks are recorded.

[0156] Therefore, dividing 6 by 10 gives a state transition probability of 60% for performing the initial action a in the initial state a.
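The 10-trial example can be reproduced with a short sketch of steps S701 to S704. The function name `transition_probability` and the scripted environment, which simply replays a fixed list of outcomes, are assumptions made so that the result is deterministic.

```python
from typing import Callable


def transition_probability(initial_state: str, action: str,
                           step: Callable[[str, str], str], trials: int) -> float:
    """Share of trials in which performing the action changed the state."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    marked = 0
    for _ in range(trials):
        transition_state = step(initial_state, action)
        # A state that differs from the initial state is marked as a transition.
        if transition_state != initial_state:
            marked += 1
    return marked / trials


# Hypothetical environment: replays 10 outcomes, 6 of which leave state "a".
outcomes = iter(["b", "a", "b", "b", "a", "b", "a", "b", "a", "b"])


def scripted_step(state: str, action: str) -> str:
    return next(outcomes)


probability = transition_probability("a", "a", scripted_step, trials=10)
```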

[0157] Understandably, calculating the state transition probabilities in advance allows them to be directly obtained when the graph neural network model is put into use or continues to be trained, eliminating the need for repeated calculations and improving the computational efficiency of the graph neural network model.

[0158] Please refer to Figure 11. In some embodiments, step S603 includes, but is not limited to, steps S801 to S802:

[0159] Step S801: Based on the candidate state and candidate action, search from the pre-calculated state transition probabilities to obtain the state transition probability corresponding to each candidate state;

[0160] Step S802: Select the candidate state corresponding to the highest state transition probability as the target state.

[0161] In some embodiments, if the graph neural network model has pre-calculated and stored the state transition probability of a candidate state during training, then the corresponding state transition probability can be directly obtained. For example, if the state transition probability of candidate state a and candidate action b is calculated in advance and obtained as c, then the state transition probability of candidate state a is c. If the state transition probability of candidate state a has not been calculated, then if candidate state a and candidate action b are selected and trained for a preset number of times (10 times), and the number of state transitions is marked (e.g., marked as 5 times), then the state transition probability of state a is 50%.

[0162] It is understandable that if multiple candidate states all need to have their state transition probabilities calculated, then the number of training iterations for each candidate state should be consistent. For example, if candidate state a is trained 10 times to obtain its state transition probability, then candidate state b should also be trained 10 times to obtain its state transition probability.

[0163] It is understandable that the higher the state transition probability, the more likely the corresponding candidate state is to undergo a state transition. In this case, the candidate state with the highest state transition probability is selected as the target state.

[0164] Please refer to Figure 12. In some embodiments, selecting the target action from multiple candidate actions by inputting the expected state feedback into the advantage function includes, but is not limited to, steps S901 to S905:

[0165] Step S901: Calculate the expected feedback of the target state corresponding to multiple candidate actions according to the first feedback function;

[0166] Step S902: Add the expected feedbacks corresponding to each candidate action to obtain the total expected feedback;

[0167] Step S903: Obtain the average expected feedback by dividing the total expected feedback by the number of candidate actions;

[0168] Step S904: Subtract the average expected feedback from the expected feedback corresponding to each candidate action to obtain the reference value for each candidate action;

[0169] Step S905: Rank the reference values corresponding to the multiple candidate actions, and select the candidate action with the largest reference value as the target action.

[0170] In some embodiments, the expected feedback, i.e. the expected reward, corresponding to the target state for each of the multiple candidate actions can be calculated based on the first feedback function (i.e., the action value function).

[0171] In some embodiments, the expected feedbacks corresponding to each candidate action can be summed to obtain the total expected feedback. The total expected feedback is then divided by the number of candidate actions, such as dividing by 10 if there are 10 candidate actions, to obtain the average expected feedback.

[0172] In some embodiments, the average expected feedback can be subtracted from the expected feedback of each candidate action to obtain a reference value for each candidate action. The reference value of a candidate action describes the advantage of that candidate action relative to the average action. Therefore, the higher the reference value, the greater the advantage of the corresponding candidate action. The candidate action with the highest reference value can be selected as the target action, or several candidate actions with the highest reference values can be selected as target actions.

[0173] In some embodiments, the advantage function is represented as follows:

[0174] A(s,a)=Q(s,a)-V(s)

[0175] Where A(s,a) represents the reference value for each candidate action, Q(s,a) represents the expected feedback corresponding to a candidate action, V(s) represents the average expected feedback, a represents the candidate action, and s represents the target state.
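Steps S901 to S905 can be sketched as follows: the expected feedback of the candidate actions is summed and averaged, the average is subtracted from each value to give the reference values, and the action with the largest reference value is chosen. The function name and the expected feedback values are hypothetical.

```python
from typing import Dict, Tuple


def select_target_action(expected_feedback: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the action with the largest reference value A(s,a) = Q(s,a) - V(s)."""
    if not expected_feedback:
        raise ValueError("at least one candidate action is required")
    total = sum(expected_feedback.values())             # S902: total expected feedback
    average = total / len(expected_feedback)            # S903: average expected feedback
    reference = {action: q - average                    # S904: reference values
                 for action, q in expected_feedback.items()}
    target = max(reference, key=reference.get)          # S905: largest reference value
    return target, reference


# Hypothetical expected feedback Q(s, a) of three candidate actions in the target state.
expected_feedback = {"edge_A": 1.0, "edge_B": 4.0, "edge_C": 1.0}
target_action, reference_values = select_target_action(expected_feedback)
```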

[0176] Please refer to Figure 13. In some embodiments, the immediate feedback includes positive immediate feedback and negative immediate feedback; step S607 includes, but is not limited to, steps S1001 to S1006:

[0177] Step S1001: Message transmission is performed based on the target state and target action to obtain the updated state;

[0178] Step S1002: In the updated state, obtain environmental data indicators, wherein the data indicators include the device bandwidth overrun ratio, the request processing time, and the device processing capacity.

[0179] Step S1003: If the data indicator is positive, the first positive indicator is obtained by multiplying the device bandwidth overrun ratio by the first coefficient and taking the negative value; the second positive indicator is obtained by multiplying the request processing time by the second coefficient and taking the negative value; and the third positive indicator is obtained by multiplying the device processing capacity by the third coefficient and taking the positive value.

[0180] Step S1004: Add the first positive indicator, the second positive indicator, and the third positive indicator together to obtain the immediate feedback;

[0181] Step S1005: If the data indicator is negative, the first negative indicator is obtained by multiplying the device bandwidth overrun ratio by the first coefficient and taking the positive value; the second negative indicator is obtained by multiplying the request processing time by the second coefficient and taking the positive value; and the third negative indicator is obtained by multiplying the device processing capacity by the third coefficient and taking the negative value.

[0182] Step S1006: Add the first negative indicator, the second negative indicator, and the third negative indicator together to obtain the immediate feedback.

[0183] In some embodiments, the first positive metric is derived from the device bandwidth overrun ratio, the second positive metric from the request processing time, and the third positive metric from the device processing capacity. Likewise, in some embodiments, the first negative metric is derived from the device bandwidth overrun ratio, the second negative metric from the request processing time, and the third negative metric from the device processing capacity.

[0184] In some embodiments, the second feedback function is represented as follows:

[0185] R(s,a,s')=-D(s,a,s')=-α*B(s,a,s')-β*T(s,a,s')+γ*PC(s,a,s')

[0186] Where s represents the state, a represents the action, s' represents the updated state, γ represents the discount factor, α represents the learning rate, β represents the time adjustment factor, B represents the device bandwidth overrun ratio, T represents the request processing time, and PC represents the device processing capacity. In this formula, α, β, and γ act as the first, second, and third coefficients, respectively.

[0187] In some embodiments, evaluation metrics can be used to determine whether the agent is reasonably allocating device load and network traffic, thereby determining immediate feedback. If the agent's actions result in balanced device load and reasonable distribution of network traffic, the agent receives a positive reward; conversely, if the agent's actions lead to unbalanced device load or network congestion, the agent receives a negative penalty. It is understandable that determining the feedback mechanism based on the agent's actions helps the agent continuously adjust its strategies to achieve better load balancing.

[0188] In some embodiments, data metrics can be used to determine whether the agent has achieved balanced device load and reasonable distribution of network traffic. Specifically, load balancing can be evaluated by calculating the standard deviation of device loads or other relevant metrics. Ideally, the loads of all devices should be as close as possible, i.e., the standard deviation of loads should be as small as possible. In some embodiments, the reasonableness of traffic distribution can be evaluated by observing the degree of congestion in the network. For example, metrics such as average latency and packet loss rate in the network can be calculated. Ideally, the latency in the network should be as low as possible, and the packet loss rate should be as small as possible.
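
As a minimal sketch of the load-balance check described above, the population standard deviation of per-device load can be computed as follows. The load values (fractions of capacity) are hypothetical, and the use of the population rather than the sample standard deviation is an assumption.

```python
import statistics


def load_imbalance(loads):
    """Population standard deviation of per-device load (0 means perfectly even)."""
    return statistics.pstdev(loads)


# Hypothetical utilisation of four devices, as fractions of capacity.
balanced = load_imbalance([0.50, 0.50, 0.50, 0.50])
skewed = load_imbalance([0.10, 0.90, 0.10, 0.90])
```

Both load sets have the same mean utilisation, so the mean alone cannot distinguish them; the standard deviation is what separates the even allocation from the skewed one.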

[0189] In some embodiments, when the data metric is positive, a first positive metric is obtained by multiplying the device bandwidth overrun ratio by a first coefficient and taking the negative value; a second positive metric is obtained by multiplying the request processing time by a second coefficient and taking the negative value; and a third positive metric is obtained by multiplying the device processing capacity by a third coefficient and taking the positive value. The first, second, and third positive metrics are then added together to obtain the immediate feedback. That is:

[0190] R(s,a,s')=-D(s,a,s')=-α*B(s,a,s')-β*T(s,a,s')+γ*PC(s,a,s')

[0191] In some embodiments, if the data metric is negative, the first negative metric is obtained by multiplying the device bandwidth overrun ratio by a first coefficient and taking the positive value; the second negative metric is obtained by multiplying the request processing time by a second coefficient and taking the positive value; and the third negative metric is obtained by multiplying the device processing capacity by a third coefficient and taking the negative value. The first, second, and third negative metrics are then added together to obtain the immediate feedback. That is:

[0192] R(s,a,s')=-D(s,a,s')=α*B(s,a,s')+β*T(s,a,s')-γ*PC(s,a,s')
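
The two formulas in [0190] and [0192] can be sketched as one function. This is an illustration only: it mirrors the formulas as written, where the negative case is the sign-flipped sum of the positive case. The function name, the `positive` flag, the default coefficients, and the sample measurements are all hypothetical.

```python
def immediate_feedback(bandwidth_overrun, processing_time, processing_capacity,
                       alpha=1.0, beta=1.0, gamma=1.0, positive=True):
    """Weighted sum of the three data indicators.

    positive=True  follows [0190]: -alpha*B - beta*T + gamma*PC
    positive=False follows [0192]:  alpha*B + beta*T - gamma*PC
    """
    value = (-alpha * bandwidth_overrun
             - beta * processing_time
             + gamma * processing_capacity)
    return value if positive else -value


# Hypothetical measurements taken in the updated state.
r_pos = immediate_feedback(0.2, 0.5, 0.9, alpha=1.0, beta=1.0, gamma=2.0)
r_neg = immediate_feedback(0.2, 0.5, 0.9, alpha=1.0, beta=1.0, gamma=2.0,
                           positive=False)
```

In practice the three indicators would need to be brought to comparable scales before weighting, since a ratio, a time, and a capacity are otherwise not commensurable; the text does not specify how this is done.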

[0193] Please refer to Figure 14. In some embodiments, the first feedback function is obtained by training N times through steps including, but not limited to, steps S1101 to S1106:

[0194] Step S1101: Obtain the current state and the current action corresponding to the current state from the environment;

[0195] Step S1102: Input the current state and the current action into the second feedback function to obtain the current feedback value;

[0196] Step S1103: Input the current feedback value into the initial first feedback function to obtain the expected feedback value;

[0197] Step S1104: Select the target action based on the current state and the current action, and execute the target action to obtain the updated state value;

[0198] Step S1105: Input the data indicators corresponding to the updated state value into the second feedback function to obtain the reference feedback value;

[0199] Step S1106: Adjust the parameters of the first feedback function according to the reference feedback value and the expected feedback value to obtain the first feedback function; where N is a positive integer greater than or equal to 1, and N is the number of training iterations reached when the first feedback function converges.

[0200] In some embodiments, the first feedback function is an action-value function, which can be used to calculate the expected feedback. Specifically, the action-value function is an estimate of the agent's expected future reward after taking a certain action in a given state. The agent can make decisions based on the action-value function, selecting the action that maximizes the expected reward. In some embodiments, the action-value function is represented as follows:

[0201] Q(s,a) = E[R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + ... | S_t = s, A_t = a]

[0202] Where E represents the expectation, γ represents the discount factor, R represents the immediate feedback, S_t represents the state at time t, A_t represents the action at time t, s represents a specific state, a represents a specific action, and Q(s,a) represents the expected feedback. In some embodiments, the expected feedback corresponding to the target state and multiple candidate actions can be calculated based on the action-value function, and the discount factor of the first feedback function can be continuously adjusted based on the accumulated immediate feedback.
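
The quantity inside the expectation is the discounted sum of immediate feedback along one trajectory. A minimal sketch of that sum is given below; the reward sequence and the discount factor are hypothetical, and a single trajectory gives only one sample of the quantity whose expectation defines Q(s,a).

```python
def discounted_return(rewards, gamma):
    """Compute R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... for one trajectory.

    `rewards` lists the immediate feedback values in the order they were received.
    """
    total = 0.0
    # Accumulate from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical immediate feedback observed after taking action a in state s.
g = discounted_return([1.0, 2.0, 4.0], gamma=0.5)  # 1 + 0.5*2 + 0.25*4
```

Averaging this value over many trajectories that start from the same state and action would give an estimate of Q(s,a).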

[0203] Understandably, the current state and its corresponding current action can be obtained from the environment, and the current state and current action can be input into the second feedback function to obtain the current feedback value. It is understood that the second feedback function is a reward function. In some embodiments, the current feedback value can be input into the initial first feedback function, i.e., an untrained action-value function, to obtain the expected feedback value.

[0204] In some embodiments, a target action can be selected based on the current state and the current action, and the target action can be executed to obtain an updated state value; the data indicators corresponding to the updated state value can be input into the second feedback function to obtain a reference feedback value; and the parameters of the first feedback function can be adjusted based on the reference feedback value and the expected feedback value to obtain the first feedback function.
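
One possible reading of steps S1101 to S1106 is a tabular Q-learning loop, sketched below. This is an assumption rather than the method as claimed: the text does not specify the update rule, so the sketch takes the reference feedback value to be the reward plus the discounted best estimate in the updated state, and stops when no table entry changes by more than a tolerance. The two-device toy environment, the constants, and all function names are hypothetical.

```python
import random

ACTIONS = (0, 1)    # hypothetical: route the next request to device 0 or device 1
DISCOUNT = 0.9      # assumed discount factor
STEP_SIZE = 0.5     # assumed step size for the parameter adjustment


def second_feedback(state, action):
    """Toy reward: +1 for avoiding the currently busy device, -1 for picking it."""
    return 1.0 if action != state else -1.0


def transition(state, action):
    """Toy environment: the device that was just chosen becomes the busy one."""
    return action


def train(max_iterations=100_000, tolerance=1e-6, seed=0):
    rng = random.Random(seed)
    # Initial (untrained) first feedback function: a table of zeros.
    # The state is the index of the busy device, so states share the ACTIONS range.
    q = {s: {a: 0.0 for a in ACTIONS} for s in ACTIONS}
    last_change = {(s, a): float("inf") for s in ACTIONS for a in ACTIONS}
    state = 0
    for n in range(1, max_iterations + 1):
        action = rng.choice(ACTIONS)              # S1101/S1104: pick an action to try
        expected = q[state][action]               # S1103: expected feedback value
        next_state = transition(state, action)    # S1104: execute, observe updated state
        # S1102/S1105: reward plus discounted best estimate in the updated state.
        reference = (second_feedback(state, action)
                     + DISCOUNT * max(q[next_state].values()))
        change = STEP_SIZE * (reference - expected)
        q[state][action] += change                # S1106: adjust the parameters
        last_change[(state, action)] = abs(change)
        if max(last_change.values()) < tolerance:  # every entry has stopped moving
            return q, n                            # n plays the role of N
        state = next_state
    return q, max_iterations


q_table, n_iterations = train()
```

In this toy setting the learned table prefers the device that is not currently busy in both states, which is the behaviour the reward was designed to encourage.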

[0205] Please refer to Figure 15. In another embodiment, this application proposes a load balancing control method for edge computing, applied in a communication device, where the communication device is an edge node or network node, and the communication device is communicatively connected to a server. The method includes, but is not limited to, steps S1201 to S1202:

[0206] Step S1201: Send the status information of the communication devices to the server; wherein, the server is used to obtain network request information sent by the user device and determine the request time of the network request information; it is also used to construct a temporal knowledge graph with the user device and multiple communication devices as nodes and the request time and status information as the attributes corresponding to each node after obtaining the status information of multiple communication devices at the current request time; it is also used to obtain a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation value of the connection between the user device and each communication device in multiple future time periods; it is also used to determine the target device that meets the network connection requirements in each future time period from multiple communication devices based on the evaluation values.

[0207] Step S1202: If the device is identified as the target device in any future time period, receive network connection information sent by the user equipment and establish a communication connection with the user equipment based on the network connection information.

[0208] The specific implementation method of this edge computing load balancing control method is basically the same as the specific implementation method of the edge computing load balancing control method (applied to servers) described above, and will not be repeated here.

[0209] Please refer to Figure 16. In some embodiments, this application proposes a load balancing control method for edge computing, applied in a user equipment (UE) that communicates with a server. The method includes, but is not limited to, steps S1301 to S1302:

[0210] Step S1301: Send network request information to the server so that the server can determine the request time of the network request information after receiving it. The server is also used to obtain the status information of multiple communication devices at the current request time, and construct a temporal knowledge graph using the user device and multiple communication devices as nodes, and the request time and status information as attributes of each node. It is also used to obtain a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation values of the connection between the user device and each communication device in multiple future time periods. Furthermore, based on the evaluation values, it determines the target device that meets the network connection requirements in each future time period from among the multiple communication devices and sends a connection command to the user device.

[0211] Step S1302: Receive connection instructions sent by the server, send network connection information to the corresponding target device in each future time period, and establish a communication connection with the corresponding target device based on the network connection information.

[0212] The specific implementation method of this edge computing load balancing control method is basically the same as the specific implementation method of the edge computing load balancing control method (applied to servers) described above, and will not be repeated here.

[0213] Please refer to Figure 17. The overall implementation process of this application is described below using one embodiment.

[0214] First, each node is defined, including user devices and communication devices, along with their corresponding attributes. A communication device can include at least one of multiple edge nodes and network nodes. User device attributes can include user ID, request time, request type (e.g., video stream, file download), request destination, and request duration. Edge node attributes can include device ID, device type (e.g., server, router, mobile device, in-vehicle device), device operating status (e.g., running, stopped), device CPU utilization, device memory utilization, and device network bandwidth utilization. Network node attributes can include source IP address, destination IP address, source port, destination port, protocol type, number of packets, and number of bytes.

[0215] Secondly, the relationships between the nodes are defined. For example, relationships such as "device A sends traffic to device B", "device A receives traffic from device B", and "user X initiates a request to device A" can be defined. Understandably, these relationships help the graph neural network model understand the interaction patterns in the network under complex and changing scenarios, thereby enabling more effective load balancing.

[0216] In some embodiments, communication devices and user equipment are used as nodes in a temporal knowledge graph, and the relationships defined above are used as edges in the temporal knowledge graph. It is understood that each node can have multiple attributes, such as the state information of edge nodes, detailed data of network nodes, and specific information of user equipment. It is also understood that each edge can have a start time and an end time, representing the duration of the corresponding relationship.

[0217] In some embodiments, data collection and preprocessing can be performed, i.e., data from edge nodes, network nodes, and user devices can be collected and cleaned. In some embodiments, triples of the temporal knowledge graph can be extracted from unstructured data, wherein each triple includes a head entity, a relation, and a tail entity.

[0218] Furthermore, the triples are transformed into quintuples by embedding the start and end times of the relation. The resulting quintuple includes the head entity, relation, tail entity, start time, and end time.

[0219] In some embodiments, a temporal knowledge graph can be constructed based on multiple quintuples. The entities and relationships in the temporal knowledge graph all contain time tags, which can reflect the changes of edge nodes, network nodes, and user devices over time. These dynamic changes enable the graph neural network model to more accurately reflect the real-time environment, thereby making more accurate load balancing decisions.
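
The conversion from triples to quintuples and the resulting time-tagged graph can be sketched as follows. The class name, the helper functions, the relation labels, and the timestamps are hypothetical; the graph is represented here simply as a list of quintuples, which is an assumption about storage rather than something the text prescribes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Quintuple:
    head: str          # head entity, e.g. a user device
    relation: str      # relation label
    tail: str          # tail entity, e.g. a communication device
    start: datetime    # time the relation begins to hold
    end: datetime      # time the relation stops holding


def to_quintuple(triple, start, end):
    """Attach a validity interval to a (head, relation, tail) triple."""
    head, relation, tail = triple
    if end < start:
        raise ValueError("end time precedes start time")
    return Quintuple(head, relation, tail, start, end)


def active_at(graph, moment):
    """Return the edges whose validity interval contains `moment`."""
    return [q for q in graph if q.start <= moment <= q.end]


graph = [
    to_quintuple(("user_x", "initiates_request_to", "device_a"),
                 datetime(2023, 8, 10, 9, 0), datetime(2023, 8, 10, 9, 5)),
    to_quintuple(("device_a", "sends_traffic_to", "device_b"),
                 datetime(2023, 8, 10, 9, 3), datetime(2023, 8, 10, 9, 10)),
]
snapshot = active_at(graph, datetime(2023, 8, 10, 9, 4))
```

Querying the same graph at different moments yields different edge sets, which is how the time tags expose changes in the environment to a downstream model.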

[0220] In some embodiments, a graph neural network model can be trained. First, a graph neural network model is prepared, and a training strategy is formulated, namely, message passing between nodes, so that the graph neural network has the ability to pass messages. Then, positive and negative samples are selected from the temporal knowledge graph, and the probability of the existence of edges between positive and negative samples is calculated, thereby adjusting the parameters of the graph neural network model and optimizing the graph neural network model.
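
The training signal described above can be sketched as a link-prediction loss. The text does not name the scoring or loss function, so this sketch assumes a sigmoid over the dot product of two node embeddings and a mean binary cross-entropy over positive (observed) and negative (sampled, absent) edges. The embeddings are hypothetical fixed vectors standing in for the output of the message-passing layers.

```python
import math


def edge_probability(h_i, h_j):
    """Sigmoid of the dot product of two node embeddings (assumed scoring rule)."""
    score = sum(a * b for a, b in zip(h_i, h_j))
    return 1.0 / (1.0 + math.exp(-score))


def link_loss(embeddings, positive_edges, negative_edges):
    """Mean binary cross-entropy: label 1 for observed edges, 0 for sampled absent ones."""
    eps = 1e-12  # guards against log(0)
    terms = []
    for i, j in positive_edges:
        p = edge_probability(embeddings[i], embeddings[j])
        terms.append(-math.log(p + eps))
    for i, j in negative_edges:
        p = edge_probability(embeddings[i], embeddings[j])
        terms.append(-math.log(1.0 - p + eps))
    return sum(terms) / len(terms)


# Hypothetical embeddings produced by the graph neural network.
embeddings = {
    "user_x": [1.0, 0.5],
    "device_a": [0.9, 0.4],
    "device_b": [-1.0, -0.3],
}
loss = link_loss(embeddings,
                 positive_edges=[("user_x", "device_a")],
                 negative_edges=[("user_x", "device_b")])
```

Minimising such a loss with respect to the model parameters pushes embeddings of connected nodes together and those of unconnected nodes apart; the gradient step itself is omitted here.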

[0221] In some embodiments, reinforcement learning algorithm strategies are formulated by first defining each algorithm, formulating algorithm strategies, and then training and optimizing the algorithms.

[0222] Specifically, the state-value function represents the expected cumulative reward an agent can obtain from a series of future actions in the current state. Understandably, by calculating the state-value function, the agent can evaluate the merits of different states and choose the optimal action accordingly.

[0223] The state transition probability represents the likelihood that an agent will transition from its current state to the next state after performing an action; the action value function represents the expected cumulative reward that can be obtained after performing an action in the current state, and the action value function is calculated based on the current policy and the state transition probability; the update rule of a graph neural network can describe how to update the hidden state of a node in a graph neural network.

[0224] In some embodiments, the policy update rule describes how the agent selects actions based on the current state and environmental feedback. The policy is the rule by which the agent selects actions. Based on the current state and environmental feedback, the agent will adjust the policy to select the optimal action. The Q-learning update rule describes how the action value function is updated in Q-learning. Specifically, Q-learning is a value function-based reinforcement learning algorithm that learns the optimal policy by iteratively updating the action value function to approximate the optimal action value function.

[0225] In some embodiments, the advantage function describes the advantage of an action relative to the average action. The advantage function is used to evaluate the degree of advantage of a certain action relative to other actions. By calculating the advantage function, the agent can select the action with the greatest advantage.

[0226] In some embodiments, a reward function describes the immediate reward an agent receives after selecting an action. The reward function provides corresponding rewards or penalties based on the agent's actions and environmental feedback to guide the agent in learning better strategies. In the agent's task, a positive reward indicates that the agent's actions have a positive impact on the environment, while a negative penalty indicates that the agent's actions have an adverse impact on the environment.

[0227] During continued training of the graph neural network model, the agent repeatedly selects and executes actions, and the parameters of the above functions are adjusted accordingly, thus obtaining an optimized graph neural network model. In some embodiments, after the training and parameter tuning of the graph neural network model are completed, the graph neural network model is deployed and put into use.

[0228] Please see Figure 18. This application also provides a load balancing control system for edge computing, which can implement the above-mentioned load balancing control method for edge computing. The load balancing control system for edge computing includes:

[0229] An edge computing load balancing control system is applied in a server, the server being communicatively connected to multiple communication devices, the communication devices including at least one of multiple edge nodes and network nodes, the system comprising:

[0230] The network request information acquisition module 1801 is used to acquire network request information sent by the user equipment and determine the request time when the network request information was sent.

[0231] The temporal knowledge graph construction module 1802 is used to obtain the status information of multiple communication devices under the current request time, and construct a temporal knowledge graph with user equipment and multiple communication devices as nodes, and request time and status information as the attributes of each node.

[0232] The evaluation value prediction module 1803 is used to obtain the pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation value of the connection between the user equipment and various communication devices in multiple future time periods.

[0233] The target device acquisition module 1804 is used to determine the target device that meets the network connection requirements in each future time period from multiple communication devices based on various evaluation values, and send a connection instruction to the user equipment so that the user equipment can communicate and connect with the corresponding target device in each future time period.

[0234] Please refer to Figure 3. In some embodiments, the communication device may include at least one of multiple edge nodes and network nodes. Specifically, an edge node refers to a device or server connected to the network edge, primarily used to provide edge computing, storage, and services. Edge nodes may include edge servers, edge routers, edge switches, edge devices (such as sensors, cameras, smart home devices), etc. It is understood that the attributes of an edge node may include the device ID, device type, device operating status, device CPU utilization, device memory utilization, and device network bandwidth utilization, etc.

[0235] In some embodiments, a network node refers to a device or server used for data transmission and communication in a computer network. Network nodes may include switches, routers, servers, firewalls, etc. The node attributes of a network node include the source IP address, destination IP address, source port, destination port, protocol type, number of data packets, and number of bytes of traffic.

[0236] In some embodiments, user equipment refers to various terminal devices used by a user on a network. User equipment may include mobile phones, tablets, IoT devices (such as smartphones, smartwatches, smart home devices, etc.) and terminal devices (such as printers, cameras, sensors, etc.), etc. The node attributes of the user equipment include the requesting user ID, the time of the request, the type of the request (such as video streaming, file download, etc.), the destination of the request, and the duration of the request, etc.

[0237] As is understandable, request time refers to the specific time when the user device sends the request, including year, month, day, hour, minute, and second. By obtaining the request time, relevant performance indicators such as response time and network latency of network request information can be analyzed.

[0238] In some embodiments, status information of multiple communication devices can be obtained using methods such as sensors, API calls, or web crawlers. A temporal knowledge graph is constructed using user devices and multiple communication devices as nodes (i.e., entities), request time and status information as attributes of each node, and relationships between nodes as edges. Specifically, the existence time of a relationship (start and end times) can be embedded into the relationship to construct quintuples, and then the temporal knowledge graph is constructed based on multiple quintuples. It is understood that constructing a temporal knowledge graph can both systematically represent the complex network environment and facilitate the establishment of temporal correlations, thereby enabling the analysis and prediction of the constantly changing state of the temporal knowledge graph based on time.

[0239] In some embodiments, the evaluation value in a pre-trained graph neural network can be calculated using the following formula:

[0240] p = h_j * h_i

[0241] Understandably, h_j is the vector representation of node j, and h_i is the vector representation of node i adjacent to j. Calculating the similarity between node j and node i is equivalent to calculating the evaluation value of node j and node i, which in turn is equivalent to the probability value p that there is an edge between node j and node i. If the probability value is large, there is a high probability that an edge exists between node j and node i. Therefore, p can be used as the evaluation value.

[0242] In some embodiments, since the edge nodes are highly dynamic, the evaluation values between the data of each node can be calculated in real time, and the target device can be selected from multiple communication devices based on the evaluation values, that is, the communication device connected to the user equipment can be selected.
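
The evaluation value and the selection of a target device per future time period can be sketched as follows. Several points are assumptions: the product of the two node vectors is read as an inner product, "meeting the network connection requirements" is simplified to "having the highest evaluation value", and the model is assumed to provide one vector per device for each future time period. The vectors, device names, and period labels are hypothetical.

```python
def evaluation_value(h_j, h_i):
    """Evaluation value p, read here as the inner product of two node vectors."""
    return sum(a * b for a, b in zip(h_j, h_i))


def select_targets(user_vec, device_vecs_by_period):
    """For each future time period, pick the device with the highest evaluation value."""
    targets = {}
    for period, device_vecs in device_vecs_by_period.items():
        scores = {dev: evaluation_value(user_vec, vec)
                  for dev, vec in device_vecs.items()}
        targets[period] = max(scores, key=scores.get)
    return targets


# Hypothetical vectors: one for the user device, one per device per future period.
user = [0.8, 0.2]
predicted = {
    "t+1": {"edge_a": [0.9, 0.1], "edge_b": [0.2, 0.7]},
    "t+2": {"edge_a": [0.1, 0.3], "edge_b": [0.6, 0.9]},
}
targets = select_targets(user, predicted)
```

Here the selected device changes between the two periods, which is the situation in which the user equipment would be instructed to connect to a different target device in each period.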

[0243] It is understandable that during the training process of a graph neural network model, rewards or penalties are applied based on the training results, thereby enabling the graph neural network model to continuously optimize and formulate the optimal strategy.

[0244] In some embodiments, the connection instruction sent to the user equipment may include the target device's address, connection method, authentication information, etc.

[0245] Understandably, given the complexity and high variability of edge computing environments, assessing and making decisions based on information for future time periods can better adapt to environmental changes. By continuously updating and adjusting the selection of target devices, network connectivity needs can be met at different times, thus achieving load balancing.

[0246] In some embodiments, if the learning progress of the agent is found to be too slow during the training of the graph neural network, the learning rate can be appropriately increased; if the agent's choice of actions is found to be too random, the expected reward or policy can be adjusted to make the agent more inclined to choose advantageous actions.

[0247] Understandably, once the graph neural network model is trained, it can be deployed to a real-world environment. At each time point, the agent can choose an action based on its current state and expected reward, execute that action in the environment, and then the environment returns a new state and an immediate reward. This allows for continuous optimization of the graph neural network model during actual use, ensuring load balancing.

[0248] The edge computing load balancing control method, system, device, and storage medium proposed in this application can obtain the status information of multiple communication devices at the request time based on the request time of network request information. A temporal knowledge graph is constructed using user devices and multiple communication devices as nodes, and request time and status information as attributes of each node. The temporal knowledge graph is then processed by a pre-trained graph neural network model to predict the evaluation values of user devices' connections to multiple communication devices in multiple future time periods. Based on the evaluation values, target devices are selected, and connection commands are sent to the user devices, enabling them to connect to the corresponding target devices in the predicted future time periods. This allows for node prediction through evaluation value calculation in complex and dynamic edge computing environments, achieving uniform and efficient allocation of network resources.

[0249] The specific implementation of this edge computing load balancing control system is basically the same as the specific embodiment of the edge computing load balancing control method described above, and will not be repeated here. Subject to meeting the requirements of the embodiments of this application, the edge computing load balancing control system may also be equipped with other functional modules to implement the edge computing load balancing control method described above.

[0250] This application also provides a device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described load balancing control method for edge computing. This device can be any smart terminal, including tablet computers, in-vehicle computers, etc.

[0251] Please see Figure 19, which illustrates the hardware structure of a device according to another embodiment. The device includes:

[0252] The processor 1901 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.

[0253] The memory 1902 can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM). The memory 1902 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1902 and is called and executed by the processor 1901 to execute the edge computing load balancing control method of the embodiments of this application.

[0254] The input / output interface 1903 is used to implement information input and output;

[0255] The communication interface 1904 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).

[0256] Bus 1905 transmits information between various components of the device (e.g., processor 1901, memory 1902, input / output interface 1903, and communication interface 1904);

[0257] The processor 1901, memory 1902, input / output interface 1903, and communication interface 1904 are connected to each other within the device via bus 1905.

[0258] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described load balancing control method for edge computing.

[0259] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0260] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0261] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, or combine certain steps, or different steps.

[0262] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0263] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0264] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0265] It should be understood that in this application, "at least one" and "several" refer to one or more, and "multiple" refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0266] In the embodiments provided in this application, it should be understood that the disclosed systems and methods can be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division of the units described above is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

[0267] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0268] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0269] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0270] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. A load balancing control method for edge computing, characterized in that the method is applied in a server, the server is communicatively connected to multiple communication devices, the communication devices include at least one of multiple edge nodes and network nodes, and the method comprises: acquiring network request information sent by a user device and determining the request time at which the network request information was sent; acquiring the status information of the multiple communication devices at the current request time, and constructing a temporal knowledge graph with the user device and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; acquiring a pre-trained graph neural network model, and inputting the temporal knowledge graph into the graph neural network model for processing to predict the evaluation values of the connections between the user device and each of the communication devices in multiple future time periods, wherein the evaluation values are obtained according to the following steps: acquiring the node corresponding to the user device; inputting the node into the graph neural network model to obtain a first vector node and the communication nodes of each communication device; and calculating the product of the first vector node and each of the communication nodes to obtain the evaluation value of each communication device; and determining, based on the evaluation values, a target device that meets the network connection requirements in each future time period from among the multiple communication devices, and sending a connection instruction to the user device so that the user device establishes a communication connection with the corresponding target device in each future time period.
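As a non-limiting illustration, the evaluation-value step of claim 1 can be sketched as follows. The sketch assumes that the "product" of the first vector node and a communication node is an inner product of equal-length vectors, and that the target device is simply the one with the highest evaluation value; neither assumption is stated in the claim. The function names `evaluation_values` and `pick_target` and the device identifiers are hypothetical.

```python
from typing import Dict, Sequence


def evaluation_values(user_vec: Sequence[float],
                      device_vecs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each communication device against the user device.

    Assumption: the score is the inner product of the user vector and
    the device vector, both produced upstream by the graph model.
    """
    scores: Dict[str, float] = {}
    for name, vec in device_vecs.items():
        if len(vec) != len(user_vec):
            raise ValueError(f"dimension mismatch for device {name!r}")
        scores[name] = sum(u * d for u, d in zip(user_vec, vec))
    return scores


def pick_target(scores: Dict[str, float]) -> str:
    """Assumption: the target device is the one with the highest score."""
    if not scores:
        raise ValueError("no candidate devices")
    return max(scores, key=scores.get)


# Hypothetical vectors standing in for the model output.
user = [1.0, 2.0]
devices = {"edge-a": [0.5, 0.5], "edge-b": [1.0, 1.0]}
scores = evaluation_values(user, devices)
target = pick_target(scores)
```

In a multi-period setting, the same scoring would be repeated once per future time period with the vectors predicted for that period.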

2. The load balancing control method for edge computing according to claim 1, characterized in that the graph neural network model is trained according to the following steps: acquiring historical node data of each communication device and user device; acquiring the relationships between the node data, and constructing a sample temporal knowledge graph based on the node data and the relationships; collecting negative samples and positive samples from the sample temporal knowledge graph and inputting them into the graph neural network model to obtain sample evaluation values; and adjusting the parameters of the graph neural network model based on the sample evaluation values to obtain an optimized graph neural network model.

3. The load balancing control method for edge computing according to claim 2, characterized in that the step of acquiring the relationships between the node data and constructing a sample temporal knowledge graph based on the node data and the relationships includes: cleaning and preprocessing the node data to obtain sample nodes; forming sample triple data with the sample nodes as entities and the actions between the sample nodes as relations, wherein the sample triple data includes a head entity, a relation, and a tail entity; acquiring the start time and end time of each relation, and forming sample quintuple data based on the sample triple data, the start time, and the end time; and constructing the sample temporal knowledge graph based on multiple pieces of sample quintuple data.
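As a non-limiting illustration, the triple-to-quintuple construction of claim 3 can be sketched as follows. The sketch assumes that each quintuple is (head entity, relation, tail entity, start time, end time) and that the graph is stored as an adjacency mapping keyed by head entity; the storage layout, the `Quintuple` type, the function names, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Quintuple(NamedTuple):
    head: str
    relation: str
    tail: str
    start: float
    end: float


def build_quintuples(triples: Sequence[Tuple[str, str, str]],
                     intervals: Sequence[Tuple[float, float]]) -> List[Quintuple]:
    """Attach a (start, end) interval to each (head, relation, tail) triple."""
    if len(triples) != len(intervals):
        raise ValueError("each triple needs exactly one interval")
    result = []
    for (head, relation, tail), (start, end) in zip(triples, intervals):
        if end < start:
            raise ValueError("end time precedes start time")
        result.append(Quintuple(head, relation, tail, start, end))
    return result


def build_graph(quintuples: Sequence[Quintuple]) -> Dict[str, List[Quintuple]]:
    """Group quintuples by head entity (assumed adjacency-list layout)."""
    graph = defaultdict(list)
    for q in quintuples:
        graph[q.head].append(q)
    return dict(graph)


# Hypothetical sample data: one user device connecting to two edge nodes.
triples = [("ue-1", "connects_to", "edge-a"), ("ue-1", "connects_to", "edge-b")]
intervals = [(0.0, 10.0), (10.0, 25.0)]
graph = build_graph(build_quintuples(triples, intervals))
```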

4. The load balancing control method for edge computing according to claim 2, characterized in that, during the training phase of the graph neural network model, the method further includes: performing feature transformation on the sample nodes in the sample temporal knowledge graph to obtain vector sample nodes; inputting the vector sample nodes into the graph neural network model to obtain the neighboring sample nodes of each vector sample node; aggregating the vector sample nodes and their neighboring sample nodes to obtain semantic aggregation information; and adjusting the parameters of the graph neural network model based on the semantic aggregation information.

5. The load balancing control method for edge computing according to claim 1, characterized in that the graph neural network model is optimized based on the following steps: using a deep reinforcement learning network to fit a knowledge graph environment, wherein the environment includes the temporal knowledge graph and the initial state and initial action of each piece of initial node data of the temporal knowledge graph; performing message passing on the initial node data to obtain multiple candidate states and corresponding multiple candidate actions; obtaining the state transition probability of the candidate states based on the candidate states and the candidate actions, and selecting a target state based on the state transition probability; inputting the target state and the corresponding candidate actions into a first feedback function to calculate the expected state feedback of the target state; inputting the expected state feedback into an advantage function to select a target action from the multiple candidate actions; executing the target action based on the target state to obtain an updated state; inputting the updated state into a second feedback function to calculate immediate feedback; inputting the immediate feedback and the updated state into the first feedback function to calculate expected feedback; and adjusting the parameters of the graph neural network model with the expected feedback as the execution target, so as to update the strategy for generating evaluation values.

6. The load balancing control method for edge computing according to claim 5, characterized in that, before obtaining the state transition probability of the candidate states based on the candidate states and the candidate actions and selecting the target state based on the state transition probability, the method further includes: selecting an initial state and the corresponding initial action from the environment; executing the initial action based on the initial state to obtain a transmission state; comparing the initial state with the corresponding transmission state, and, if the two are inconsistent, marking the transmission state to obtain a marked transmission state; and dividing the number of marked transmission states by the number of initial states to obtain the state transition probability of the initial state and its corresponding transmission state.
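As a non-limiting illustration, the ratio described in claim 6 can be sketched as follows. The sketch assumes that states are comparable with `!=` and that one transmission state is recorded per initial state, so the number of initial states equals the number of recorded pairs; the function name `transition_probability` and the state labels are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def transition_probability(
        pairs: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Return (number of marked transmission states) / (number of initial states).

    Each pair is (initial_state, transmission_state). A transmission state
    is "marked" when it differs from the initial state it came from.
    """
    if not pairs:
        raise ValueError("at least one (initial, transmission) pair is required")
    marked = sum(1 for initial, transmitted in pairs if initial != transmitted)
    return marked / len(pairs)


# Hypothetical rollout: two of the four actions changed the state.
observed = [("s0", "s1"), ("s0", "s0"), ("s1", "s2"), ("s2", "s2")]
probability = transition_probability(observed)
```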

7. The load balancing control method for edge computing according to claim 5, characterized in that the step of obtaining the state transition probability of the candidate states based on the candidate states and the candidate actions, and selecting the target state based on the state transition probability, includes: looking up, based on the candidate states and the candidate actions, the state transition probability corresponding to each candidate state from the pre-calculated state transition probabilities; and selecting the candidate state corresponding to the highest state transition probability as the target state.

8. The load balancing control method for edge computing according to claim 5, characterized in that the step of inputting the expected state feedback into the advantage function to select a target action from the multiple candidate actions includes: calculating, based on the first feedback function, the expected feedback of the target state for each of the candidate actions; adding together the expected feedback corresponding to each of the candidate actions to obtain the total expected feedback; dividing the total expected feedback by the number of candidate actions to obtain the average expected feedback; subtracting the average expected feedback from the expected feedback corresponding to each candidate action to obtain the reference value of that candidate action; and sorting the reference values corresponding to the multiple candidate actions and selecting the candidate action with the largest reference value as the target action.
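As a non-limiting illustration, the advantage computation of claim 8 can be sketched as follows. The expected feedback for each candidate action is assumed to have already been produced by the first feedback function and is passed in as a plain mapping; the function names and the action labels are hypothetical.

```python
from typing import Dict


def reference_values(expected_feedback: Dict[str, float]) -> Dict[str, float]:
    """Subtract the average expected feedback from each action's expected feedback."""
    if not expected_feedback:
        raise ValueError("at least one candidate action is required")
    total = sum(expected_feedback.values())
    average = total / len(expected_feedback)
    return {action: value - average
            for action, value in expected_feedback.items()}


def select_target_action(expected_feedback: Dict[str, float]) -> str:
    """Pick the candidate action with the largest reference value."""
    refs = reference_values(expected_feedback)
    return max(refs, key=refs.get)


# Hypothetical expected feedback for three candidate actions.
expected = {"to-edge-a": 1.0, "to-edge-b": 4.0, "to-core": 1.0}
refs = reference_values(expected)
chosen = select_target_action(expected)
```

Because the same average is subtracted from every candidate, the selected action is also the one with the largest expected feedback; the reference value only re-centers the scores around zero.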

9. The load balancing control method for edge computing according to claim 5, characterized in that the immediate feedback includes positive immediate feedback and negative immediate feedback, and the step of inputting the updated state into the second feedback function to calculate the immediate feedback includes: performing message passing based on the target state and the target action to obtain the updated state; acquiring data indicators of the environment in the updated state, wherein the data indicators include a device bandwidth over-limit ratio, a request processing time, and a device processing capacity; if the data indicator is positive, multiplying the device bandwidth over-limit ratio by a first coefficient and taking the negative value to obtain a first positive indicator, multiplying the request processing time by a second coefficient and taking the negative value to obtain a second positive indicator, multiplying the device processing capacity by a third coefficient and taking the positive value to obtain a third positive indicator, and adding the first positive indicator, the second positive indicator, and the third positive indicator together to obtain the immediate feedback; and if the data indicator is negative, multiplying the device bandwidth over-limit ratio by the first coefficient and taking the positive value to obtain a first negative indicator, multiplying the request processing time by the second coefficient and taking the positive value to obtain a second negative indicator, multiplying the device processing capacity by the third coefficient and taking the negative value to obtain a third negative indicator, and adding the first negative indicator, the second negative indicator, and the third negative indicator together to obtain the immediate feedback.
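As a non-limiting illustration, the two branches of the immediate feedback in claim 9 can be sketched as follows. The claim does not state how a data indicator is judged positive or negative, so the sketch takes that decision as a boolean argument; the coefficient values, the `Indicators` type, and the function name `immediate_feedback` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Indicators(NamedTuple):
    bandwidth_over_limit_ratio: float
    request_processing_time: float
    device_processing_capacity: float


def immediate_feedback(ind: Indicators,
                       coefficients: Tuple[float, float, float],
                       positive: bool) -> float:
    """Weighted sum of the three indicators with the signs given in claim 9.

    positive=True : -c1*ratio - c2*time + c3*capacity
    positive=False: +c1*ratio + c2*time - c3*capacity
    """
    c1, c2, c3 = coefficients
    base = (-c1 * ind.bandwidth_over_limit_ratio
            - c2 * ind.request_processing_time
            + c3 * ind.device_processing_capacity)
    # The negative branch flips the sign of every term.
    return base if positive else -base


# Hypothetical indicator readings and coefficients.
reading = Indicators(0.25, 2.0, 8.0)
weights = (1.0, 0.5, 0.25)
reward_pos = immediate_feedback(reading, weights, positive=True)
reward_neg = immediate_feedback(reading, weights, positive=False)
```

With these sample values the positive branch gives -0.25 - 1.0 + 2.0 and the negative branch gives the same magnitude with the opposite sign.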

10. The load balancing control method for edge computing according to claim 5, characterized in that the first feedback function is obtained by training N times through the following steps: acquiring, from the environment, the current state and the current action corresponding to the current state; inputting the current state and the current action into the second feedback function to obtain a current feedback value; inputting the current feedback value into the initial first feedback function to obtain an expected feedback value; selecting a target action based on the current state and the current action, and executing the target action to obtain an updated state value; inputting the data indicators corresponding to the updated state value into the second feedback function to obtain a reference feedback value; and adjusting the parameters of the first feedback function based on the reference feedback value and the expected feedback value to obtain the first feedback function; wherein N is a positive integer greater than or equal to 1, and N is the number of training iterations reached when the first feedback function converges.

11. A load balancing control method for edge computing, characterized in that the method is applied in a communication device, the communication device is an edge node or a network node and is communicatively connected to a server, and the method comprises: sending the status information of the communication device to the server, wherein the server is configured to acquire network request information sent by a user device and determine the request time of the network request information; the server is further configured to, after acquiring the status information of multiple communication devices at the current request time, construct a temporal knowledge graph with the user device and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; the server is further configured to acquire a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation values of the connections between the user device and each of the communication devices in multiple future time periods; the server is further configured to determine, based on the evaluation values, the target device that meets the network connection requirements in each future time period from among the multiple communication devices; wherein the evaluation values are obtained according to the following steps: acquiring the node corresponding to the user device; inputting the node into the graph neural network model to obtain a first vector node and the communication nodes of each communication device; and calculating the product of the first vector node and each of the communication nodes to obtain the evaluation value of each communication device; and, if the communication device is determined to be the target device in any future time period, receiving network connection information sent by the user device and establishing a communication connection with the user device based on the network connection information.

12. A load balancing control method for edge computing, characterized in that the method is applied in a user device, the user device is communicatively connected to a server, and the method comprises: sending network request information to the server so that the server determines the request time after receiving the network request information, wherein the server is further configured to acquire the status information of multiple communication devices at the current request time, and construct a temporal knowledge graph with the user device and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; the server is further configured to acquire a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation values of the connections between the user device and each of the communication devices in multiple future time periods; the server is further configured to determine, based on the evaluation values, the target device that meets the network connection requirements in each future time period from among the multiple communication devices, and send a connection instruction to the user device; wherein the evaluation values are obtained according to the following steps: acquiring the node corresponding to the user device; inputting the node into the graph neural network model to obtain a first vector node and the communication nodes of each communication device; and calculating the product of the first vector node and each of the communication nodes to obtain the evaluation value of each communication device; and receiving the connection instruction sent by the server, sending network connection information to the corresponding target device in each future time period, and establishing a communication connection with the corresponding target device based on the network connection information.

13. A load balancing control system for edge computing, characterized in that the system is applied in a server, the server is communicatively connected to multiple communication devices, the communication devices include at least one of multiple edge nodes and network nodes, and the system comprises: a network request information acquisition module, configured to acquire network request information sent by a user device and determine the request time at which the network request information was sent; a temporal knowledge graph construction module, configured to acquire the status information of the multiple communication devices at the current request time, and construct a temporal knowledge graph with the user device and the multiple communication devices as nodes, and the request time and the status information as attributes corresponding to each node; an evaluation value prediction module, configured to acquire a pre-trained graph neural network model, input the temporal knowledge graph into the graph neural network model for processing, and predict the evaluation values of the connections between the user device and each of the communication devices in multiple future time periods, wherein the evaluation values are obtained according to the following steps: acquiring the node corresponding to the user device; inputting the node into the graph neural network model to obtain a first vector node and the communication nodes of each communication device; and calculating the product of the first vector node and each of the communication nodes to obtain the evaluation value of each communication device; and a target device acquisition module, configured to determine, based on the evaluation values, the target device that meets the network connection requirements in each future time period from among the multiple communication devices, and send a connection instruction to the user device so that the user device establishes a communication connection with the corresponding target device in each future time period.

14. A device, characterized in that the device includes a memory and a processor, the memory stores a computer program, and the processor executes the computer program to implement the load balancing control method for edge computing according to any one of claims 1 to 10.

15. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the load balancing control method for edge computing according to any one of claims 1 to 10.