Automated design method and apparatus for a cluster speed controller based on a gene growth tree

By representing the target control strategy as a gene growth tree and utilizing a combination of graph neural networks and Monte Carlo tree search, a high-performance and interpretable distributed speed controller is generated, solving the problem of balancing performance and interpretability in existing technologies and enabling the application of autonomous systems in safety-critical fields.

CN122194664A · Pending · Publication Date: 2026-06-12 · NAT UNIV OF DEFENSE TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: NAT UNIV OF DEFENSE TECH
Filing Date: 2026-03-19
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing automation design methods cannot guarantee both high controller performance and inherent interpretability, thus limiting the application of autonomous systems in safety-critical areas.

Method used

An automated design method for cluster velocity controllers based on gene growth trees is adopted. By representing the target control strategy as a gene growth tree, graph neural networks are used to evaluate and guide its evolution. Monte Carlo tree search is combined to generate a distributed velocity controller, ensuring the high performance and interpretability of the controller.

Benefits of technology

It enables efficient and intelligent searching within a vast policy space, generating distributed controllers that combine high performance and inherent interpretability, thus bridging the gap between performance and interpretability in traditional methods.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to the technical field of robot swarm control and artificial intelligence, in particular to a swarm speed controller automatic design method and device based on a gene growth tree. The method comprises the following steps: representing a control strategy as a gene growth tree which can be solved into a mathematical expression; modeling an evolution process of the gene growth tree as a Markov decision process, wherein a state is a current gene growth tree and an action is an evolution operation; evaluating a graph structure of the gene growth tree by using a graph neural network, wherein the graph neural network comprises a value network and a Q network; prospectively planning an optimal evolution path by using Monte Carlo tree search based on the evaluation of the graph neural network; and iteratively generating a high-performance and interpretable distributed speed controller. The method solves the core contradiction that performance and interpretability are difficult to be compatible in the automatic design of a robot swarm controller.

Description

Technical Field

[0001] This application relates to the fields of automation control and artificial intelligence technology, and in particular to an automated design method and apparatus for a cluster speed controller based on a gene growth tree.

Background Technology

[0002] With the development of robotics and artificial intelligence, multi-robot systems are increasingly being used in logistics, inspection, disaster relief, and other fields. The key to unlocking the potential of robot swarms lies in designing efficient collaborative control strategies. Currently, there are two main technical approaches to controller design: one is the high-performance "black box" model represented by multi-agent reinforcement learning, whose decision-making logic is opaque and difficult to verify and trust; the other is the interpretable symbolic model represented by genetic programming, but its search efficiency is low, making it difficult to find high-performance solutions.

[0003] However, current automated design methods cannot guarantee both high controller performance and inherent interpretability, which severely restricts the application of autonomous systems in safety-critical fields. Therefore, a new automated design paradigm is urgently needed to bridge the gap between performance and interpretability.

Summary of the Invention

[0004] Therefore, to address the aforementioned technical problems, it is necessary to provide an automated design method and apparatus for a cluster speed controller based on a gene growth tree that can automatically produce controllers that are both high-performance and highly interpretable.

[0005] An automated design method for a cluster speed controller based on a gene growth tree, the method comprising:

[0006] The target control strategy is represented as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression;

The evolutionary process of the gene growth tree is modeled as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree;

The graph structure of the gene growth tree is evaluated using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations;

Based on the evaluation results of the graph neural network, Monte Carlo tree search is used to search in the state space of the Markov decision process to select evolutionary operations and guide the gene growth tree to evolve towards high performance;

The Monte Carlo tree search, policy evaluation, and training of the graph neural network are performed iteratively to generate a distributed speed controller for unmanned swarms.

[0007] In one embodiment, the function set of the gene growth tree includes vector operation nodes, scalar operation nodes, and vector synthesis nodes, and the terminal set includes scalar terminals and vector terminals from robot local perception information.

[0008] In one embodiment, the vector operation node includes vector addition, vector subtraction, and vector symmetry operations, and the vector synthesis node is used to combine two scalar inputs into a single vector output and force its two subtrees to output scalar information of magnitude and angle types, respectively.

[0009] In one embodiment, the space of evolutionary operations includes a variety of fine-grained operations, which are classified into one or more of the following: a simplification operation that replaces an internal node with a leaf node, a conversion operation that changes the node type, and a crossover operation that exchanges subtrees between different gene growth trees.

[0010] In one embodiment, the gene growth tree is converted into a graph representation of a node feature matrix and an adjacency matrix; the graph representation is encoded through a multi-layer graph convolutional network to aggregate information about nodes and their neighbors; all updated node representations are aggregated into a single graph-level representation vector through a global pooling layer; and the graph-level representation vector is mapped to a scalar evaluation value through a multi-layer perceptron head.

[0011] In one embodiment, each simulation of the Monte Carlo tree search includes four steps: selection, expansion, evaluation, and backpropagation. In the selection step, the PUCT formula is used to balance the exploitation of existing high-value actions with the exploration of underexplored actions. The prior probabilities of the PUCT formula are provided by the Q network.
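The selection step described above can be sketched in Python. This is a minimal illustration only: the description does not give its exact PUCT expression, so the commonly used form Q + c · P · sqrt(N_parent) / (1 + N) is assumed here, and the names `puct_select`, `priors_from_q`, `c_puct`, and the softmax normalisation of Q-network outputs into priors are all hypothetical.

```python
import math


def priors_from_q(q_values):
    """Turn Q-network outputs into priors with a softmax (an assumed normalisation)."""
    peak = max(q_values.values())
    exp = {a: math.exp(q - peak) for a, q in q_values.items()}
    total = sum(exp.values())
    return {a: e / total for a, e in exp.items()}


def puct_select(stats, c_puct=1.5):
    """Pick the action that maximises Q + U.

    stats maps action -> {"N": visit count, "W": summed value, "P": prior}.
    """
    parent_visits = sum(s["N"] for s in stats.values())

    def score(s):
        q = s["W"] / s["N"] if s["N"] else 0.0           # exploitation term
        u = c_puct * s["P"] * math.sqrt(parent_visits) / (1 + s["N"])  # exploration term
        return q + u

    return max(stats, key=lambda a: score(stats[a]))


# Hypothetical statistics for three evolutionary operations (keys are evotype numbers).
stats = {
    1: {"N": 10, "W": 5.0, "P": 0.2},
    5: {"N": 0, "W": 0.0, "P": 0.5},
    17: {"N": 2, "W": 1.5, "P": 0.3},
}
chosen = puct_select(stats)
```

With these numbers the unvisited action with the largest prior is chosen. Setting `c_puct` to 0 removes the exploration term, so the action with the best average value is chosen instead.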

[0012] In one embodiment, a batch of initial gene growth trees is randomly generated and evaluated; the graph neural network is then pre-trained under supervision using these initial gene growth trees and their fitness data.

[0013] An automated design device for a cluster speed controller based on a gene growth tree, the device comprising:

The strategy representation module is used to represent the target control strategy as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression;

The process modeling module is used to model the evolution process of the gene growth tree as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree;

The graph evaluation module is used to evaluate the graph structure of the gene growth tree using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations;

The search guidance module is used to guide the evolution of the gene growth tree using Monte Carlo tree search, based on the evaluation of the graph neural network;

The self-game learning module is used to iteratively perform the Monte Carlo tree search, policy evaluation, and training of the graph neural network to generate a distributed speed controller for unmanned swarms.

[0014] A computer device includes a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, performing the following steps:

The target control strategy is represented as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression;

The evolutionary process of the gene growth tree is modeled as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree;

The graph structure of the gene growth tree is evaluated using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations;

Based on the evaluation results of the graph neural network, Monte Carlo tree search is used to search in the state space of the Markov decision process to select evolutionary operations and guide the gene growth tree to evolve towards high performance;

The Monte Carlo tree search, policy evaluation, and training of the graph neural network are performed iteratively to generate a distributed speed controller for unmanned swarms.

[0015] A computer-readable storage medium having a computer program stored thereon, the computer program performing the following steps when executed by a processor:

The target control strategy is represented as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression;

The evolutionary process of the gene growth tree is modeled as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree;

The graph structure of the gene growth tree is evaluated using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations;

Based on the evaluation results of the graph neural network, Monte Carlo tree search is used to search in the state space of the Markov decision process to select evolutionary operations and guide the gene growth tree to evolve towards high performance;

The Monte Carlo tree search, policy evaluation, and training of the graph neural network are performed iteratively to generate a distributed speed controller for unmanned swarms.

[0016] The aforementioned automated design method and device for cluster speed controllers based on gene growth trees organically combine an interpretable gene growth tree representation, Monte Carlo tree search that guides the evolution, and a graph neural network that evaluates the controller structure. This enables efficient and intelligent search in a vast policy space, ultimately allowing the autonomous evolution of a distributed controller that possesses both high performance and inherent interpretability, effectively solving the technical problems mentioned in the background.

Attached Figure Description

[0017] Figure 1 is an application scenario diagram of an automated design method for a cluster speed controller based on a gene growth tree in one embodiment;

Figure 2 is a flowchart of an automated design method for a cluster speed controller based on a gene growth tree in one embodiment;

Figure 3 is a diagram of the GNN model architecture in one embodiment;

Figure 4 is a flowchart of the overall framework of an automated design method for a cluster speed controller based on a gene growth tree in one embodiment;

Figure 5 is a structural block diagram of an automated design device for a cluster speed controller based on a gene growth tree in one embodiment;

Figure 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0018] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0019] The automated design method for cluster speed controllers based on gene growth trees provided in this application can be applied in the application environment shown in Figure 1, in which development terminal 102 communicates with design server 104 via a network. Design server 104 is responsible for running the core algorithms of the GNES (Neuro-guided Evolutionary Search) framework, including the evolution of gene growth trees, training of graph neural networks, and Monte Carlo tree search. Development terminal 102 can be used to monitor the training process, configure task parameters, and view and verify the final evolved interpretable controller.

[0020] In one embodiment, as shown in Figure 2, an automated design method for cluster speed controllers based on gene growth trees is provided. Taking its application to the design server in Figure 1 as an example, the method includes the following steps:

Step 202: Represent the target control strategy as a gene growth tree.

[0021] A gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals that can be directly parsed into mathematical expressions.

[0022] Specifically, based on a predefined domain-specific language, a tree structure is constructed with a specific symbol as the root node, internal nodes as functions (such as vector addition or cosine functions), and leaf nodes as perceptual information or constants (such as target distance or neighbor direction). This tree can be directly parsed into a mathematical expression through recursive execution and output a two-dimensional acceleration control command.

[0023] Step 204: Model the evolution of the gene growth tree as a Markov decision process.

[0024] Each gene growth tree is considered a state, and each evolutionary operation performed on it, such as replacing an addition node with a multiplication node or deleting a subtree, is considered an action. The evolutionary goal is to find an optimal sequence of actions that gradually evolves the initial state into a high-performance final state.

[0025] Step 206: Use a graph neural network to evaluate the graph structure of the gene growth tree.

[0026] The gene growth tree is converted into a graph data representation and input into a trained graph neural network. This graph neural network includes a value network and a Q network, which can learn the features of the gene growth tree graph structure and output an evaluation value of the tree's potential or a prediction of the gains after performing a certain evolutionary operation.

[0027] Step 208: Based on the evaluation results of the graph neural network, Monte Carlo tree search is used to search in the state space of the Markov decision process to select evolutionary operations and guide the gene growth tree to evolve towards high performance.

[0028] Monte Carlo Tree Search utilizes heuristic information provided by graph neural networks to perform prospective simulations within an action space comprised of evolutionary operations. Through multiple simulations, it statistically identifies the most promising next evolutionary operation from the current gene growth tree state, thereby intelligently planning the evolutionary path.

[0029] Step 210: Iteratively perform Monte Carlo tree search, policy evaluation, and training of the graph neural network to generate a distributed speed controller for unmanned swarms.

[0030] The new gene growth tree generated after executing the actions selected by Monte Carlo tree search is evaluated in a simulation environment to obtain real performance data. This new data is used to continuously train the graph neural network, improving its evaluation accuracy. This cycle repeats, forming a self-improving closed-loop system.

[0031] In the aforementioned automated design method for cluster speed controllers based on gene growth trees, using the gene growth tree as the core representation ensures that the final controller is transparent and interpretable. By reshaping the evolutionary process into a Markov decision process and guiding it with Monte Carlo tree search, the method addresses the low search efficiency of traditional genetic programming. Furthermore, by introducing graph neural networks to directly evaluate the program structure, it provides data-driven heuristics for Monte Carlo tree search. The integration of these elements enables the method to efficiently and automatically discover cluster control strategies that combine high performance and high interpretability.

[0032] In one embodiment, the function set of the gene growth tree includes vector operation nodes, scalar operation nodes, and vector synthesis nodes, and the terminal set includes scalar terminals and vector terminals from the robot's local perception information.

[0033] Specifically, the function set is carefully designed as a hierarchical structure to match the physical nature of robot motion (velocity, acceleration, etc., are vectors). Vector operation nodes (e.g., $, %, ^) directly process vector data; scalar operation nodes (e.g., +, -, >, <) handle numerical calculations; and vector composition nodes (&) act as a bridge connecting the scalar and vector domains. The terminal set serves as the interface between the controller and the environment, including amplitude-type terminals representing distance, velocity magnitude, etc., and angle-type terminals representing direction, provided in real time by the robot's local perception module.

[0034] In one embodiment, the vector operation node includes vector addition, vector subtraction, and vector symmetry operations, and the vector synthesis node is used to combine two scalar inputs into a single vector output and force its two subtrees to output scalar information of magnitude and angle types, respectively.

[0035] This design enforces the correctness of physical semantics through grammatical rules. For example, when the vector synthesis node '&' is executed, it recursively calculates the output of its first subtree as the magnitude of the vector, calculates the output of its second subtree as the angle of the vector, and then combines the two into a two-dimensional vector. The validity verification module ensures during the evolution process that the '&' node always connects two scalar subtrees and semantically encourages one subtree to output magnitude information and the other to output angle information, which fundamentally prevents the generation of physically invalid controllers.

[0036] More specifically, a gene growth tree can be defined as a rooted, ordered tree. Each node n∈T has an associated symbol s(n).

[0037] If n is an internal node, then s(n) belongs to the function set F. The number (arity) of the node's children is uniquely determined by s(n).

[0038] If n is a leaf node, then s(n) belongs to the terminal set T.

[0039] In the MATLAB implementation, a gene growth tree exists in two forms: 1. String Representation: To facilitate storage and transmission, the gene growth tree is serialized into a string. This encoding method is implemented through breadth-first traversal. For example, a tree representing cos(a+b) might be encoded as '~>+ab'. This compact representation is the primary form of processing within the system.

[0040] 2. Structured Node Array: For computation and analysis, string encoding is parsed into a structured array. Each element in the array represents a node in the tree, containing detailed information such as id, symbol, level, child, and name. This structured representation is fundamental for executing controller logic and performing GNN graph transformations.
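The two forms above can be illustrated with a short Python sketch that decodes a breadth-first string into a node array. The original implementation is in MATLAB and is not reproduced here, so this is only an approximation under stated assumptions: the arity table is inferred from the operator lists in this description, `$` is fixed at two children, the arity of the multi-way root `~` is passed in as the hypothetical parameter `root_arity`, and the `name` field is omitted.

```python
from collections import deque

# Assumed arities, inferred from the operator lists in this description.
# '$' and the root '~' are multi-way in the text; fixed values are used
# here only to keep the sketch small.
ARITY = {"$": 2, "%": 2, "^": 2, "&": 2,
         "+": 2, "-": 2, "/": 2, "X": 2, "Y": 2,
         ">": 1, "<": 1, "[": 1, "]": 1, "{": 1, "}": 1, "|": 1}


def decode(code, root_arity=1):
    """Parse a breadth-first string into a list of node records."""
    nodes = [{"id": i + 1, "symbol": s, "level": 0, "child": []}
             for i, s in enumerate(code)]
    queue = deque([0])      # indices of nodes whose children are still unread
    next_free = 1           # index of the next unread character
    while queue:
        i = queue.popleft()
        symbol = nodes[i]["symbol"]
        arity = root_arity if symbol == "~" else ARITY.get(symbol, 0)
        for _ in range(arity):
            if next_free >= len(nodes):
                raise ValueError("string ends before all children are read")
            nodes[next_free]["level"] = nodes[i]["level"] + 1
            nodes[i]["child"].append(nodes[next_free]["id"])
            queue.append(next_free)
            next_free += 1
    if next_free != len(nodes):
        raise ValueError("trailing symbols are not attached to any node")
    return nodes


def to_expression(nodes, node_id=1):
    """Render the subtree rooted at node_id as a prefix expression."""
    node = nodes[node_id - 1]
    if not node["child"]:
        return node["symbol"]
    args = ", ".join(to_expression(nodes, c) for c in node["child"])
    return f"{node['symbol']}({args})"


nodes = decode("~>+ab")
expression = to_expression(nodes)
```

For the example string '~>+ab', the root has one child `>` (cosd), which has one child `+`, whose children are the terminals `a` and `b`, matching cos(a+b).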

[0041] Detailed design of function set F: The design of the function set is crucial to the expressive power of the gene growth tree. A hierarchical function set, incorporating vector and scalar operations, was carefully designed to suit the physical nature of robot motion control.

[0042] Root Node: The root node of all gene growth trees is fixed at ~, and it is interpreted as a multi-way vector adder. This ensures that the final output is always a two-dimensional vector and allows the outputs of multiple sub-policies (subtrees) to be fused in a vector superposition manner.

[0043] Vector Operators: These nodes operate on two-dimensional vectors (represented as [magnitude; angle] in the implementation).

[0044] $ (vecaddition): Performs Cartesian addition on the vectors output by all its child nodes.

[0045] % (vecsubtract): Subtracts the vectors output by its two child nodes.

[0046] ^ (vecsymmetric): Reflects the first subvector about the second subvector (as the axis of symmetry). This is a powerful non-linear operation that can produce complex obstacle avoidance or following behaviors.

[0047] & (vecconstruct): This is the bridge connecting the scalar world and the vector world. It takes two scalar inputs, the first as the magnitude of the vector and the second as the direction (angle) of the vector, and outputs a two-dimensional vector.
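The four vector operators can be sketched in Python on the [magnitude; angle] representation, with angles in degrees. This is an illustrative approximation, not the original MATLAB code: the function bodies are assumptions, and in particular `vecsymmetric` is assumed to use only the direction of the second vector as the mirror axis, so that an angle θ reflected about an axis at angle φ becomes 2φ − θ.

```python
import math


def to_xy(v):
    """Convert (magnitude, angle in degrees) to Cartesian (x, y)."""
    magnitude, angle = v
    return (magnitude * math.cos(math.radians(angle)),
            magnitude * math.sin(math.radians(angle)))


def from_xy(x, y):
    """Convert Cartesian (x, y) back to (magnitude, angle in degrees)."""
    return (math.hypot(x, y), math.degrees(math.atan2(y, x)))


def vecconstruct(magnitude, angle):          # '&'
    """Combine a magnitude scalar and an angle scalar into one vector."""
    return (magnitude, angle)


def vecaddition(*vectors):                   # '$'
    """Add any number of vectors in Cartesian coordinates."""
    x = sum(to_xy(v)[0] for v in vectors)
    y = sum(to_xy(v)[1] for v in vectors)
    return from_xy(x, y)


def vecsubtract(u, v):                       # '%'
    """Subtract v from u in Cartesian coordinates."""
    ux, uy = to_xy(u)
    vx, vy = to_xy(v)
    return from_xy(ux - vx, uy - vy)


def vecsymmetric(u, axis):                   # '^'
    """Reflect u about the direction of axis; the magnitude is unchanged."""
    magnitude, angle = u
    return (magnitude, 2 * axis[1] - angle)


summed = vecaddition(vecconstruct(1.0, 0.0), vecconstruct(1.0, 90.0))
mirrored = vecsymmetric((2.0, 30.0), (1.0, 90.0))
```

Adding unit vectors at 0 and 90 degrees gives a vector of magnitude √2 at 45 degrees, and reflecting a vector at 30 degrees about the 90-degree axis gives one at 150 degrees.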

[0048] Scalar Operators: These nodes operate on scalar values.

[0049] Two inputs: +, -, , / , X (min), Y (max).

[0050] Single input: > (cosd), < (sind), [ (abs), ] (relu), { (sqrt), } (square), | (negate), etc., totaling 12 types. This rich set of functions provides ample building blocks for evolution.

[0051] Detailed design of terminal set T: The terminal set is the interface between the controller and the environment, and it injects the robot's perception information into the computation tree.

[0052] Amplitude Terminals: a, b, c, d, e, O. These symbols represent scalar values such as distance and velocity.

[0053] a: distance_t - Euclidean distance to the target.

[0054] b: distance_n - The magnitude of the combined force vector with neighbors. This is a non-trivial calculation. The srep_neighbor function calculates a resultant force vector, similar to an artificial potential field, by considering all neighbors within the communication range desirenei. Its magnitude represents the combined push / pull strength from the neighbors.

[0055] c: thetanow(1) - The current speed of the robot.

[0056] d: f_dist_err - Radial formation error, which is the difference between the current distance to the target and the desired formation radius desirenei.

[0057] e: A fixed constant value (e.g., 6.0) that provides a usable bias term for evolution.

[0058] O: Scalar zero.

[0059] Angle Terminals: A, B, C, D, E. These symbols represent direction angles (in degrees).

[0060] A: direction_t - The direction angle pointing to the target.

[0061] B: direction_n - The direction angle of the combined force vector from the neighboring region.

[0062] C: thetanow(2) - The robot's current velocity direction.

[0063] D: f_angle_err - Tangential formation error, representing the deviation of the robot from its ideal angular position in a circular formation.

[0064] E: A fixed angular constant (e.g., 90.0 degrees).

[0065] Zero Vector Terminal: 0. Represents a two-dimensional vector [0; 0].

[0066] The execution flow of the gene growth tree: When a GGT controller is executed, the interpreter makes recursive calls starting from the root node (ID=1).

[0067] 1. The execute function selects the corresponding processing function (such as vecaddition, add, funcos, etc.) based on the symbol of the current node.

[0068] 2. If the current node is an internal function node, it will recursively call execute to calculate the values of all its child nodes.

[0069] 3. If the current node is a leaf node (terminal), it will directly extract the corresponding value from robot.f (the environment state vector calculated by cal_env) or a preset value.

[0070] 4. After the child node returns its value, the parent node performs the corresponding mathematical operation and passes the result upwards.

[0071] 5. Finally, the root node collects the two-dimensional vectors returned by all its subtrees and adds them together to obtain the final acceleration command [ax; ay].

[0072] This process dynamically transforms a symbolic tree structure into a specific control signal that acts on the robot's physical model at each time step.
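The recursive execution flow can be sketched as a small Python interpreter. It is a simplified illustration under several assumptions: trees are written as nested tuples `(symbol, child, ...)` rather than the MATLAB node array, only a subset of the operators is included, and the dictionary `f` stands in for the environment state that the description calls robot.f.

```python
import math


def cosd(x):
    return math.cos(math.radians(x))


def sind(x):
    return math.sin(math.radians(x))


# A subset of the scalar operators named in the description.
SCALAR_OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "X": min,
    "Y": max,
    ">": cosd,
    "<": sind,
    "[": abs,
    "|": lambda x: -x,
}


def execute(node, f):
    """Evaluate a node given as (symbol, child, child, ...).

    f maps terminal symbols to sensed values. Vectors are (magnitude,
    angle in degrees); the root returns the Cartesian pair (ax, ay).
    """
    symbol, children = node[0], node[1:]
    if not children:                          # terminal: read a value
        if symbol == "0":
            return (0.0, 0.0)                 # zero vector
        if symbol == "O":
            return 0.0                        # scalar zero
        return f[symbol]
    args = [execute(child, f) for child in children]   # children first
    if symbol == "&":                         # two scalars become one vector
        return (args[0], args[1])
    if symbol in ("~", "$"):                  # vector superposition
        x = sum(m * cosd(a) for m, a in args)
        y = sum(m * sind(a) for m, a in args)
        if symbol == "~":
            return (x, y)                     # final command [ax; ay]
        return (math.hypot(x, y), math.degrees(math.atan2(y, x)))
    return SCALAR_OPS[symbol](*args)


# Hypothetical controller: the sum of &(a, A) and &(e, E).
tree = ("~", ("&", ("a",), ("A",)), ("&", ("e",), ("E",)))
f = {"a": 2.0, "A": 0.0, "e": 6.0, "E": 90.0}
ax, ay = execute(tree, f)
```

With a target 2 units away at 0 degrees and the constants e = 6.0 and E = 90.0, the two subtrees contribute (2, 0) and (0, 6), so the command is approximately [2; 6].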

[0073] In summary, the list of node types for the gene growth tree is shown in Table 1.

Table 1. Node Definitions of the Gene Growth Tree

[0074] In one embodiment, the space of evolutionary operations includes a variety of fine-grained operations, which are divided into one or more of the following: a simplification operation that replaces an internal node with a leaf node, a conversion operation that changes the node type, and a crossover operation that exchanges subtrees between different gene growth trees.

[0075] Specifically, the core of Monte Carlo tree search is planning within a predefined action space. In this application, an action is an evolutionary operator acting on the gene growth tree. A set of 24 fine-grained operations was designed. This action space is designed to balance structural exploration and semantic preservation, and can be divided into six categories.

[0076] The first category: Transformation to Leaf Node. These operations simplify the structure of a tree by "degrading" a complex internal node (and its entire subtree) into a simple terminal node.

[0077] Action 1 (evotype=1): Leaf node → Leaf node. This is a pure content mutation that randomly changes the symbol of a leaf node (e.g., from a to b) without altering the structure of the tree.

[0078] Action 2 (evotype=2): Single-input scalar node → Leaf node. Replace a single-input function node (such as cos) with a random leaf node and delete its original subtree. This is a pruning operation.

[0079] Action 3 (evotype=3): Two-input scalar node → Leaf node. Similar to Action 2, but applied to two-input nodes (such as +), deleting both of their subtrees.

[0080] Action 4 (evotype=4): Vector node → Leaf node. This is a more complex simplification operation that changes the type of the parent node and converts the target node and its sibling nodes into leaf nodes.

[0081] The second category: Transformation to Single-Input Scalar Node. These operations play an important role in increasing or changing the functional complexity of a tree.

[0082] Action 5 (evotype=5): Leaf node → Single-input scalar node. This is a "growth" operation that replaces a leaf node with a random single-input function and generates a new random leaf node as its child node, thereby increasing the depth of the tree.

[0083] Action 6 (evotype=6): Single-input scalar node → Single-input scalar node. Change the type of a single-input function node (e.g., from cos to sin) while keeping the structure unchanged.

[0084] Action 7 (evotype=7): Two-input scalar node → Single-input scalar node. This is a simplification operation that replaces a two-input node with a single-input node and randomly deletes one of its subtrees.

[0085] Action 8 (evotype=8): Vector node → Single-input scalar node. This is an advanced cross-type conversion that refactors a subtree that processes vectors into a subtree that processes scalars.

[0086] The third category: Transformation to Two-Input Scalar Node. These operations are primarily used to increase the computational power and complexity of trees.

[0087] Action 9 (evotype=9): Leaf node → Two-input scalar node. "Grows" a leaf node into a two-input function and generates two new leaf nodes as its child nodes.

[0088] Action 10 (evotype=10): Single-input scalar node → Two-input scalar node. Upgrades a single-input function node to a two-input function and adds a new leaf node subtree to it.

[0089] Action 11 (evotype=11): Two-input scalar node → Two-input scalar node. Changes the type of a two-input function node (e.g., from + to ...).

[0090] Action 12 (evotype=12): Vector node → Two-input scalar node. Similar to Action 8, but the goal is to convert it to a two-input scalar node.

[0091] The fourth category: Transformation to Vector Node. This is one of the most innovative operations in the framework, enabling jumps between different computational domains (scalar and vector domains), allowing the evolutionary process to explore completely different control logic structures.

[0092] Action 13 (evotype=13): Leaf node → Vector node.

[0093] Action 14 (evotype=14): Single-input scalar node → Vector node.

[0094] Action 15 (evotype=15): Two-input scalar node → Vector node.

[0095] Action 16 (evotype=16): Vector node → Vector node.

[0096] The above operation can transform the semantics of the entire subtree by backtracking upwards from a deep scalar computation node and upgrading one of its ancestors, a vector composition node &, to a vector operation node $ or %.

[0097] The fifth category: Simplification to Zero. This type of operation is a powerful pruning and simplification technique that can quickly remove invalid or redundant computational branches by replacing a complete subtree with a zero value (scalar O or vector 0).

[0098] Action 17 (evotype=17): Leaf node → Scalar zero (O).

[0099] Action 18 (evotype=18): Single-input scalar node → scalar zero (O).

[0100] Action 19 (evotype=19): Two-input scalar node → scalar zero (O).

[0101] Action 20 (evotype=20): Vector node → Vector zero (0).
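A few of the scalar-side operations above (Actions 1, 2, 5, and 18) can be sketched in Python as functions that map one tree to a new tree, which is how they act as actions in the Markov decision process. The sketch makes several simplifying assumptions: trees are nested tuples `(symbol, child, ...)`, a node is addressed by a hypothetical `path` of child indices, only amplitude terminals are used as replacement leaves, and no validity checking is performed.

```python
import random

AMPLITUDE = ["a", "b", "c", "d", "e", "O"]
UNARY = [">", "<", "[", "]", "{", "}", "|"]


def get_at(tree, path):
    """Return the subtree reached by following child indices in path."""
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path swapped for new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def action_1(tree, path, rng):
    """Leaf -> leaf: change the terminal symbol, keep the structure."""
    old = get_at(tree, path)[0]
    return replace_at(tree, path, (rng.choice([s for s in AMPLITUDE if s != old]),))


def action_2(tree, path, rng):
    """Single-input scalar node -> leaf: prune the node and its subtree."""
    return replace_at(tree, path, (rng.choice(AMPLITUDE),))


def action_5(tree, path, rng):
    """Leaf -> single-input scalar node with a new random leaf child."""
    return replace_at(tree, path, (rng.choice(UNARY), (rng.choice(AMPLITUDE),)))


def action_18(tree, path):
    """Single-input scalar node -> scalar zero O."""
    return replace_at(tree, path, ("O",))


rng = random.Random(0)
tree = ("~", ("&", (">", ("a",)), ("A",)))     # hypothetical starting state
grown = action_5(tree, (1, 1, 1), rng)         # the leaf a grows into a function node
zeroed = action_18(tree, (1, 1))               # the cosd subtree collapses to O
```

Because tuples are immutable, each action returns a new state and leaves the previous one intact, which is convenient when a tree search needs to keep many candidate states alive at once.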

[0102] The sixth category: Crossover. Crossover (implemented by the chromosomecross function) is used to exchange "gene fragments" (i.e., subtrees) between two GGTs, and is key to generating new combinations and facilitating information exchange.

[0103] Action 21 (evotype=21): Swap between leaf nodes of two trees.

[0104] Action 22 (evotype=22): Randomly select two single-input scalar nodes in the two trees and swap their entire subtrees.

[0105] Action 23 (evotype=23): Swap the subtrees of two two-input scalar nodes.

[0106] Action 24 (evotype=24): Swap the subtrees of two vector nodes.

[0107] A key implementation detail is that the crossover operation is preferentially performed between nodes of the same type. This is a homologous recombination strategy that can effectively improve the survival rate and semantic coherence of crossover offspring.
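Same-type crossover can be sketched in Python as follows. This is not the chromosomecross function itself, whose details are not given here; it is a minimal illustration that assumes trees are nested tuples `(symbol, child, ...)` and that "same type" means both chosen nodes carry a symbol from the same set (single-input scalar operators in this example).

```python
import random

UNARY = {">", "<", "[", "]", "{", "}", "|"}


def paths_of(tree, wanted, path=()):
    """List the paths of all nodes whose symbol belongs to wanted."""
    found = [path] if tree[0] in wanted else []
    for i, child in enumerate(tree[1:], start=1):
        found += paths_of(child, wanted, path + (i,))
    return found


def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new_subtree):
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def crossover(tree_a, tree_b, wanted, rng):
    """Swap one subtree rooted at a node of the wanted type between two trees."""
    paths_a, paths_b = paths_of(tree_a, wanted), paths_of(tree_b, wanted)
    if not paths_a or not paths_b:
        return tree_a, tree_b                 # no compatible pair, nothing to swap
    p, q = rng.choice(paths_a), rng.choice(paths_b)
    sub_a, sub_b = get_at(tree_a, p), get_at(tree_b, q)
    return replace_at(tree_a, p, sub_b), replace_at(tree_b, q, sub_a)


parent_a = ("~", ("&", (">", ("a",)), ("A",)))
parent_b = ("~", ("&", ("<", ("b",)), ("B",)))
child_a, child_b = crossover(parent_a, parent_b, UNARY, random.Random(0))
```

Since each parent here holds exactly one single-input scalar node, the swap is deterministic: the cosd subtree and the sind subtree trade places, and both offspring remain well-typed because a scalar subtree replaces a scalar subtree.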

[0108] This meticulously designed, hierarchical action space provides a powerful toolset for Monte Carlo Tree Search, enabling a comprehensive and purposeful exploration of the GGT, from fine-tuning (such as changing leaf nodes) to macroscopic reconstruction (such as scalar-vector transformation).

[0109] Specifically, the descriptions of all actions are shown in Table 2. Table 2. Categories and detailed descriptions of actions

[0110] In one embodiment, evaluating the graph structure of a gene growth tree using a graph neural network includes: converting the gene growth tree into a graph representation consisting of a node feature matrix and an adjacency matrix; encoding the graph representation through a multi-layer graph convolutional network to aggregate information about nodes and their neighbors; aggregating all updated node representations into a single graph-level representation vector through a global pooling layer; and mapping the graph-level representation vector to a scalar evaluation value through a multi-layer perceptron head.

[0111] In its implementation, the node feature matrix employs one-hot encoding of symbols, while the adjacency matrix encodes the parent-child relationships of the tree. The GNN model uses a three-layer graph convolutional network as the encoder, with each layer aggregating neighborhood information through a message-passing mechanism to update the node representation. To ensure training stability and accelerate convergence, residual connections and layer normalization are introduced between the graph convolutional network layers. Then, global average pooling is used to aggregate the final representations of all nodes into a fixed-dimensional graph embedding vector representing the entire gene growth graph. Finally, this graph embedding vector is fed into the head of a multilayer perceptron, outputting the final scalar evaluation value (value or Q-value).

[0112] Specifically, as shown in Figure 3, the graph representation of a gene growth tree is obtained as follows: before being input into the GNN, a gene growth tree is first converted into a standard graph representation (X, A).

[0113] Node feature matrix X: For a gene growth tree with N nodes, each node is represented as a D-dimensional feature vector. This vector is a one-hot encoding of the node symbol.

[0114] Adjacency matrix A: This matrix encodes the parent-child relationships in the gene growth tree, i.e., the topology of the tree.
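The conversion to (X, A) can be sketched as follows. The nested-tuple tree format, the `vocab` list, and the choice of a symmetric (undirected) adjacency matrix are assumptions for illustration; the text only states that X is a one-hot encoding of node symbols and that A encodes parent-child relationships.

```python
def tree_to_graph(tree, vocab):
    """Convert a tree into (X, A).

    tree:  nested tuples of the form (symbol, [children]) (hypothetical format)
    vocab: list of all symbols; its length is the feature dimension D
    """
    symbols, edges = [], []

    def visit(node, parent_index):
        symbol, children = node
        index = len(symbols)          # nodes are numbered in pre-order
        symbols.append(symbol)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in children:
            visit(child, index)

    visit(tree, None)
    n = len(symbols)

    # X: N x D one-hot matrix of node symbols.
    X = [[1.0 if vocab[d] == s else 0.0 for d in range(len(vocab))] for s in symbols]

    # A: N x N adjacency matrix; stored symmetrically here (an assumption)
    # so that information can flow both from parent to child and back.
    A = [[0.0] * n for _ in range(n)]
    for parent, child in edges:
        A[parent][child] = 1.0
        A[child][parent] = 1.0
    return X, A
```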

[0115] GNN Architecture: The Value Network and Q Network share a common core architecture. As shown in Figure 3, a gene growth tree is transformed into a node feature matrix X and an adjacency matrix A, which are then fed into a network containing a multi-layer graph convolutional network, a global pooling layer, and an MLP head, ultimately outputting a scalar prediction value.

[0116] The GNN architecture consists of three main parts: 1. Graph Convolutional Layers: The model contains 3 GCN layers. The core operation of each GCN layer takes the standard GCN form $H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$. Here, $\hat{A}$ is the normalized adjacency matrix with self-loops, $H^{(l)}$ is the node embedding representation of the l-th layer, $W^{(l)}$ is the learnable weight matrix, and σ is the activation function (ReLU). Intuitively, this operation means that the new representation of each node is a weighted average of its own representation and the representations of its neighboring nodes, thus achieving local aggregation of information.

[0117] 2. Residual Connections: To train deeper networks and avoid vanishing gradients, residual connections are added between GCN layers.

[0118] 3. Layer Normalization: After each GCN layer, we apply layer normalization to stabilize the training process and accelerate convergence.

[0119] 4. Readout Layer: After multiple layers of GCN processing, each node has an information-rich embedding vector. Using global average pooling, the embedding vectors of all nodes are averaged to obtain a fixed-size vector representing the entire GGT graph, called graph embedding.

[0120] MLP Head: Finally, this graph embedding is fed into a multilayer perceptron (MLP), which consists of several fully connected layers and outputs a single scalar: the value or Q-value prediction for the GGT.
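The pipeline described above (three GCN layers with residual connections and layer normalization, global average pooling, MLP head) can be sketched as a forward pass in plain Python. This is a minimal illustration, not the patented implementation: the symmetric normalization with self-loops, the two-layer MLP head, the rule of applying the residual only when layer widths match, and all function and parameter names are assumptions.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(A):
    """Add self-loops and apply symmetric degree normalization (assumed form)."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def layer_norm(row, eps=1e-5):
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def gcn_layer(A_hat, H, W, residual):
    Z = matmul(matmul(A_hat, H), W)                 # aggregate neighbors, then project
    Z = [[max(0.0, v) for v in row] for row in Z]   # ReLU
    if residual:                                    # skip connection
        Z = [[z + h for z, h in zip(z_row, h_row)] for z_row, h_row in zip(Z, H)]
    return [layer_norm(row) for row in Z]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def evaluate_graph(X, A, params):
    """Map a graph (X, A) to one scalar, e.g. a value or Q-value estimate."""
    A_hat = normalize_adjacency(A)
    H = X
    for W in params["gcn"]:
        # The residual is applied only when input and output widths match.
        H = gcn_layer(A_hat, H, W, residual=(len(W) == len(W[0])))
    n = len(H)
    graph_embedding = [sum(col) / n for col in zip(*H)]   # global average pooling
    hidden = [max(0.0, v) for v in matmul([graph_embedding], params["mlp1"])[0]]
    return matmul([hidden], params["mlp2"])[0][0]
```

Because the readout is a mean over nodes, trees of any size map to an embedding of the same fixed dimension, which is what lets one network score every GGT.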

[0121] Dual-head network definition: Value Network (V-Network): Its training objective is to minimize the mean squared error (MSE) between the predicted fitness and the true fitness. The training data format is (GGT_graph, True_Fitness).

[0122] Q-Network: Its training objective is to minimize the mean squared error between the predicted fitness gain and the true fitness gain. The training data format is ((GGT_graph, Action), True_Fitness_Gain).
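The two training objectives can be sketched as follows. Treating the "true fitness gain" as the fitness after the action minus the fitness before it is an assumption, as are the function names; the sample layouts follow the data formats stated above.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def make_training_samples(ggt_graph, action, fitness_before, fitness_after):
    """Build one V-Network sample and one Q-Network sample from a single transition.

    Assumption: the fitness gain is fitness_after - fitness_before.
    """
    v_sample = (ggt_graph, fitness_before)                        # (GGT_graph, True_Fitness)
    q_sample = ((ggt_graph, action), fitness_after - fitness_before)  # ((GGT_graph, Action), True_Fitness_Gain)
    return v_sample, q_sample
```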

[0123] Training uses the Adam optimizer (adamupdate) and includes L2 regularization to prevent overfitting.

[0124] In one embodiment, each simulation of the Monte Carlo tree search includes four steps: selection, expansion, evaluation, and backtracking (backpropagation). In the selection step, the PUCT formula is used to balance the exploitation of existing high-value actions against the exploration of underexplored actions. The prior probabilities in the PUCT formula are provided by the Q network.
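As a rough illustration of how Q-network outputs might be turned into PUCT priors, the sketch below applies a softmax and then mixes in Dirichlet noise, the two steps named in the procedure that follows. The mixing rule `(1 - epsilon) * prior + epsilon * noise` and the parameter names `alpha` and `epsilon` are assumptions borrowed from common MCTS practice; the text does not specify them.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of predicted Q-values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def add_dirichlet_noise(priors, alpha, epsilon, rng):
    """Blend priors with a Dirichlet(alpha) sample to encourage exploration."""
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]
```

Since both the priors and the noise sum to one, the blended vector is again a probability distribution.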

[0125] In this embodiment, as shown in Figure 4, the MCTS algorithm is implemented as follows.

[0126] 1. Initialization: Create a root node MCTSNode, which represents the gene growth tree to be optimized.

[0127] 2. Root node expansion: Call getPossibleActionsAndStates to obtain all possible evolutionary actions and the next states they produce.

[0128] For each action, use the Q-Network to predict its value.

[0129] All predicted values are transformed into a prior probability distribution using the softmax function.

[0130] To encourage exploration, Dirichlet noise is applied to this probability distribution.

[0131] Use these noisy prior probabilities to expand the root node.

[0132] 3. Simulation Loop: Executes numSimulations simulations.

[0133] Selection: Starting from the root node, select child nodes using the PUCT formula. The PUCT formula is: $a^{*} = \arg\max_{a} \left[ Q(s,a) + U(s,a) \right]$, where $U(s,a) = c_{puct} \, P(s,a) \, \frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)}$. U(s,a) is an exploration term that rewards actions with few visits and high prior probabilities. Q(s,a) is the average value estimate of action a performed at the current node s, and N(s,a) is the number of times the action has been visited. P(s,a) is the prior probability provided by the Q network (normalized by softmax), and $c_{puct}$ is a hyperparameter that controls the degree of exploration.

[0134] Evaluation and Expansion: When a leaf node sL that has not been fully expanded is selected, its value V(sL) is evaluated using the V-Network. Then, as with expanding the root node, the prior probabilities of the action are obtained using the Q-Network, and the leaf node is expanded.

[0135] Backtracking: Propagate the value of V(sL) backward along the selected path, and update the number of visits N(s,a) and total value W(s,a) of all nodes on the path (Q(s,a)=W(s,a) / N(s,a)).

[0136] 4. Final Decision: After the simulations finish, the final evolutionary action is selected based on the visit counts of the root node's child nodes. In the early stages of training, an action is sampled with probability proportional to a power of its visit count (controlled by the temperature parameter mcts_temperature) to ensure diverse exploration; in the later stages, the action with the most visits is selected deterministically (greedy selection).
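The selection, backtracking, and final-decision steps above can be sketched as below. The `Edge` class and function names are hypothetical, the PUCT score uses the common form with an exploration term proportional to the prior and to sqrt(total visits) / (1 + N(s,a)), and sampling with weights N^(1/temperature) is an assumed reading of "power of the number of visits".

```python
import math
import random


class Edge:
    """Search statistics for one action a at a node s."""

    def __init__(self, prior):
        self.prior = prior        # P(s, a), from the Q network after softmax
        self.visits = 0           # N(s, a)
        self.total_value = 0.0    # W(s, a)

    @property
    def mean_value(self):         # Q(s, a) = W(s, a) / N(s, a)
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges, c_puct):
    """Pick the action maximizing Q(s, a) + U(s, a)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(edge):
        u = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.mean_value + u

    # With zero total visits every score is 0 and the first action wins the tie.
    return max(edges, key=lambda a: score(edges[a]))


def backup(path, value):
    """Propagate a leaf value V(s_L) back along the selected edges."""
    for edge in path:
        edge.visits += 1
        edge.total_value += value


def final_action(edges, temperature, rng):
    """Greedy when temperature <= 0, otherwise sample proportionally to N^(1/temperature)."""
    actions = list(edges)
    if temperature <= 0:
        return max(actions, key=lambda a: edges[a].visits)
    weights = [edges[a].visits ** (1.0 / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

After one visit to a low-prior action, the exploration term shifts selection toward the unvisited high-prior action, which is the balancing behavior the PUCT rule is meant to provide.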

[0137] Through the close collaboration of these three modules, the symbolic evolution process of genetic programming (GP) is transformed from a blind, random search into a data-driven, deliberate, and goal-oriented planning process.

[0138] Phase switching: The script controls the execution of different phases through variables such as RUN_EVOLUTION_PHASE and cycle_iter.

[0139] The Replay Buffer consists of two cell arrays, `historical_q_data_pool` and `historical_v_data_pool`, used to store training samples generated by MCTS. The script includes logic to limit the pool size (`REPLAY_BUFFER_MAX_SIZE`), discarding the oldest data when the pool is full. This is standard practice in reinforcement learning, helping the algorithm track non-stationary optimal policies.
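A size-limited pool that discards its oldest entries can be sketched with a bounded deque. The original script uses cell arrays; `make_pool` is a hypothetical helper name, and the tiny capacity in the usage note is for illustration only.

```python
from collections import deque


def make_pool(max_size):
    """Create a replay pool that silently drops the oldest sample once it is full."""
    return deque(maxlen=max_size)
```

For example, with `REPLAY_BUFFER_MAX_SIZE = 3`, appending five samples to `historical_v_data_pool = make_pool(REPLAY_BUFFER_MAX_SIZE)` leaves only the three most recent ones in the pool.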

[0140] Dynamic Hyperparameters: The script uses different MCTS hyperparameters in different phases of the main loop. In the first half (cycle_iter <= 0.5 * NUM_OVERALL_CYCLES), it uses a higher exploration coefficient and a higher temperature (mcts_temperature), and enables Dirichlet noise to encourage exploration. In the second half, it lowers these parameters and eventually switches to greedyLocalSearch to strengthen exploitation, thereby converging toward the optimal solution.
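A phase-dependent schedule of this kind might be written as follows, assuming the first-half test is `cycle_iter <= 0.5 * NUM_OVERALL_CYCLES`. The function name, the dictionary keys other than `mcts_temperature`, and all numeric values are hypothetical; the text gives no concrete settings.

```python
def mcts_hyperparameters(cycle_iter, num_overall_cycles):
    """Return exploratory settings in the first half of training, exploitative ones after."""
    if cycle_iter <= 0.5 * num_overall_cycles:
        # Exploration phase: larger PUCT coefficient, sampling temperature, noisy priors.
        return {"c_puct": 2.0, "mcts_temperature": 1.0, "use_dirichlet_noise": True}
    # Exploitation phase: smaller coefficient, greedy final choice, no noise.
    return {"c_puct": 1.0, "mcts_temperature": 0.0, "use_dirichlet_noise": False}
```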

[0141] Parallelization: The MCTS search process is parallelized using a parfor loop. In each iteration, the system launches independent MCTS searches in parallel for multiple elite GGT controllers. This makes full use of the computing power of multi-core CPUs and is key to making this computationally intensive algorithm feasible.

[0142] In one embodiment, prior to the self-game cycle, the method further includes an initial data generation phase: randomly generating and evaluating a batch of gene growth trees by running a standard genetic algorithm; and using the batch of gene growth trees and their fitness data to perform supervised pre-training on the graph neural network.

[0143] It should be understood that, although the steps in the flowchart of Figure 2 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless otherwise specified herein, there is no strict order in which these steps are executed, and they can be performed in other orders. At least some of the steps in Figure 2 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential; they can be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

[0144] In one embodiment, as shown in Figure 5, an automated design device for a cluster speed controller based on a gene growth tree is provided, including a strategy representation module 502, a process modeling module 504, a graph evaluation module 506, a search guidance module 508, and a self-game learning module 510, wherein: the strategy representation module 502 is used to represent the target control strategy as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression; the process modeling module 504 is used to model the evolution process of the gene growth tree as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree; the graph evaluation module 506 is used to evaluate the graph structure of the gene growth tree using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations; the search guidance module 508 is used to guide the evolution of the gene growth tree using Monte Carlo tree search based on the evaluation of the graph neural network; and the self-game learning module 510 is used to iteratively perform the Monte Carlo tree search, policy evaluation, and training of the graph neural network to generate a distributed speed controller for unmanned swarms.

[0145] Specific limitations regarding the automated design device for cluster speed controllers based on gene growth trees can be found in the limitations of the automated design method for cluster speed controllers based on gene growth trees mentioned above, and will not be repeated here. Each module in the aforementioned automated design device for cluster speed controllers based on gene growth trees can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the corresponding operations of each module.

[0146] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 6. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides the environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores data related to environment configuration. The network interface communicates with external terminals via a network connection. When the computer program is executed by the processor, it implements an intelligent environmental analysis method based on an intelligent model.

[0147] Those skilled in the art will understand that the structure shown in Figure 6 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0148] In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the steps of the method described above.

[0149] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described above.

[0150] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0151] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0152] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. An automated design method for a cluster speed controller based on a gene growth tree, characterized in that the method includes: representing the target control strategy as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression; modeling the evolutionary process of the gene growth tree as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree; evaluating the graph structure of the gene growth tree using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations; based on the evaluation results of the graph neural network, using Monte Carlo tree search to search in the state space of the Markov decision process to select evolutionary operations and guide the gene growth tree to evolve towards high performance; and iteratively performing the Monte Carlo tree search, policy evaluation, and training of the graph neural network to generate a distributed speed controller for unmanned swarms.

2. The method according to claim 1, characterized in that the function set of the gene growth tree includes vector operation nodes, scalar operation nodes, and vector synthesis nodes, and the terminal set includes scalar terminals and vector terminals from the robot's local perception information.

3. The method according to claim 2, characterized in that the vector operation nodes include vector addition, vector subtraction, and vector symmetry operations, and the vector synthesis node is used to combine two scalar inputs into a single vector output and forces its two subtrees to output scalar information of magnitude and angle types, respectively.

4. The method according to claim 1, characterized in that the space of evolutionary operations includes a variety of fine-grained operations, which are classified into one or more of the following: simplification operations that replace internal nodes with leaf nodes, conversion operations that change node types, and crossover operations that exchange subtrees between different gene growth trees.

5. The method according to claim 1, characterized in that evaluating the graph structure of the gene growth tree using a graph neural network includes: converting the gene growth tree into a graph representation consisting of a node feature matrix and an adjacency matrix; encoding the graph representation using a multi-layer graph convolutional network to aggregate information about nodes and their neighbors; aggregating the updated node representations into a single graph-level representation vector through a global pooling layer; and mapping the graph-level representation vector to a scalar evaluation value through a multilayer perceptron head.

6. The method according to claim 1, characterized in that each simulation of the Monte Carlo tree search includes four steps: selection, expansion, evaluation, and backtracking (backpropagation); in the selection step, the PUCT formula is used to balance the exploitation of existing high-value actions against the exploration of underexplored actions; and the prior probabilities in the PUCT formula are provided by the Q network.

7. The method according to any one of claims 1 to 6, characterized in that the method also includes an initial data generation phase: randomly generating and evaluating a batch of initial gene growth trees; and performing supervised pre-training of the graph neural network using the initial gene growth trees and their fitness data.

8. An automated design device for a cluster speed controller based on a gene growth tree, characterized in that the device includes: a strategy representation module, used to represent the target control strategy as a gene growth tree, wherein the gene growth tree is a tree structure composed of a predefined set of functions and a set of terminals, which can be directly parsed into a mathematical expression; a process modeling module, used to model the evolution process of the gene growth tree as a Markov decision process, wherein the state is the current gene growth tree and the action is the evolutionary operation applied to the gene growth tree; a graph evaluation module, used to evaluate the graph structure of the gene growth tree using a graph neural network, wherein the graph neural network includes a value network for evaluating the potential of the gene growth tree state and a Q network for evaluating the utility of evolutionary operations; a search guidance module, used to guide the evolution of the gene growth tree based on the evaluation of the graph neural network using Monte Carlo tree search; and a self-game learning module, used to iteratively perform the Monte Carlo tree search, policy evaluation, and training of the graph neural network to generate a distributed speed controller for unmanned swarms.