Single-cell fate planning method and system based on Monte Carlo tree search

By employing a single-cell fate planning method based on Monte Carlo tree search, and utilizing a single-cell autoencoder and policy network to autonomously explore in a virtual environment and optimally plan cell state transitions, this method solves the problems of low efficiency and high computational cost in traditional methods, and achieves precise induction of cell states.

CN122245433A · Pending · Publication Date: 2026-06-19 · TONGJI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TONGJI UNIV
Filing Date
2026-02-13
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing cell fate regulation technologies suffer from inefficiency, lack of decision-making capabilities in prediction models, and inability to plan nonlinear paths. Traditional methods experience an explosive increase in computational load in scenarios involving multi-step sequential drug administration and multi-drug combination therapy, making it difficult to achieve precise induction of cell state.

Method used

A single-cell fate planning method based on Monte Carlo tree search is adopted. A digital twin representation is constructed through a single-cell autoencoder. Combined with a policy network and a value network, the Monte Carlo tree search algorithm is used to autonomously explore and design the optimal biological intervention scheme in a virtual environment, realizing causal reasoning and path planning.

Benefits of technology

It achieves a paradigm shift from prediction to reverse design, can accurately focus on high-value perturbation paths, solves the problem of computational explosion in multi-step/multi-drug combination scenarios, supports multi-step path planning, conforms to the nonlinear evolution law of cells, and improves efficiency and accuracy.



Abstract

This invention relates to a single-cell fate planning method and system based on Monte Carlo tree search. The method includes: S1, receiving the initial cell state and the target cell state of a single cell; S2, constructing a digital twin representation of the cell as the state space for the planning process; S3, initializing the search tree; S4, selecting, from the child nodes of the current node, the node with the highest exploration value as the target node of the current simulation; S5, predicting the time-evolved state vector of the single cell in the latent space using a single-cell perturbation prediction model; S6, dually evaluating the return of that time-evolved latent state vector; S7, updating the core parameters of the nodes based on the evaluation results and, if the preset number of iterations has not been reached, returning to S4, otherwise outputting the optimal perturbation action instruction based on the visit distribution of the root node of the search tree. Compared with the prior art, this invention realizes a paradigm shift from predicting cell states to reverse-engineering cell fate.

Description

Technical Field

[0001] This invention relates to the fields of computational biology, generative artificial intelligence, and intelligent planning technology, and in particular to a single-cell fate planning method and system based on Monte Carlo tree search.

Background Technology

[0002] The core objective of cell fate regulation is to precisely induce cells from their initial state (such as stem cells, diseased cancer cells, and undifferentiated cells) to a target state (such as functional somatic cells, normal phenotype cells, and specific killer immune cells) through external artificial intervention (such as chemical drug stimulation, CRISPR gene editing, gene overexpression / knockdown, etc.). Technological breakthroughs in this area can drive the development of major fields such as tissue and organ regeneration, reversible cancer treatment, and the research and development of universal cell therapy products. It is the core direction for the biopharmaceutical industry to upgrade from "passive treatment" to "active design".

[0003] In regenerative medicine, cancer treatment, and drug development, a core challenge facing scientists is: "How to precisely induce cells from one state to another through external interventions (such as drugs and gene editing)."

[0004] Existing solutions have significant limitations: 1. Trial and error is inefficient: traditional wet-lab screening relies on high-throughput screening (HTS), but when faced with thousands of drugs and their exponentially growing combinations (cocktails), exhaustive testing is unacceptable in terms of time and cost.

[0005] 2. Predictive models lack decision-making ability: most existing AI models (such as CPA and scGen) are "predictors," meaning they take "cells + drugs" as input and output "results." They cannot answer the reverse question, namely, "what drug should be used to achieve the result?" Although a single-step answer can be found by brute-force search over all drugs, the computational load explodes for multi-step sequential treatment or multi-drug combinations.

[0006] 3. Nonlinearity in biological processes: Cell fate determination often involves traversing complex energy barriers (Waddington Landscape). Single linear perturbations are often insufficient to achieve state transitions; a specific sequence of perturbations is required. Existing linear models cannot plan such nonlinear paths.

[0007] Therefore, there is an urgent need for an intelligent system with causal reasoning and path planning capabilities, which can autonomously explore, deduce, and design the optimal biological intervention plan in a virtual environment.

Summary of the Invention

[0008] The purpose of this invention is to overcome the shortcomings of the existing technology and provide a single-cell fate planning method and system based on Monte Carlo tree search.

[0009] The objective of this invention can be achieved through the following technical solution: a single-cell fate planning method based on Monte Carlo tree search, the method comprising:
Step S1: Receive the initial cell state and the target cell state of a single cell;
Step S2: Use a single-cell autoencoder to compress the whole-genome expression data of the single cell into a low-dimensional, dense latent state vector, and construct a digital twin representation of the cell as the state space of the planning process;
Step S3: Based on the state space, use the initial latent state vector of the single cell as the root node of the Monte Carlo tree search to initialize the search tree;
Step S4: Based on the current latent state vector of the single cell, and combining the parameters of existing nodes in the search tree with the posterior probability output by the policy network, select from the child nodes of the current node the node with the highest exploration value as the target node of the current simulation;
Step S5: Based on the current latent state vector of the single cell and the perturbation action instruction obtained automatically from the action space, use the single-cell perturbation prediction model to predict the time-evolved state vector of the single cell in the latent space;
Step S6: Based on the value probe and the value network, dually evaluate the return of the time-evolved state vector of the single cell in the latent space;
Step S7: Based on the results of the dual evaluation, update the core parameters of all nodes on the entire path to the currently simulated node in the search tree; if the preset number of iterations has not been reached, return to step S4; otherwise, output the optimal perturbation action instruction based on the visit distribution of the root node of the search tree.

[0010] Furthermore, in step S5, the process of automatically obtaining the perturbation action instruction from the action space includes: extracting action embedding vectors from the action space; and having each action embedding vector interact with the single-cell latent state through an attention mechanism to obtain the perturbation action instruction.

[0011] Furthermore, the action space is a unified embedding vocabulary, and the action embedding vectors in the action space include: chemical drugs, corresponding to preset small-molecule compound IDs; gene editing, corresponding to the target gene IDs of CRISPRa/i; and gene overexpression, corresponding to the target factor IDs of transgene expression or activation.

[0012] Furthermore, the policy network is a multi-layer Transformer Encoder structure. After receiving the current latent state vector of a single cell, the policy network outputs a multi-dimensional posterior probability distribution vector over all perturbation actions in the action space. Each value of this vector corresponds to a child node in the search tree.

[0013] Furthermore, in step S4, the process of selecting the node with the highest exploration value from the child nodes of the current node includes: inputting the current latent state vector of the single cell into the policy network to obtain the multi-dimensional posterior probability distribution vector over all perturbation actions, each value of which corresponds to a child node in the search tree; counting the visits of each node in the search tree, including the total visit count of each parent node and the historical visit count of each child node; and determining the exploration value from the posterior probability value and the node visit count. The higher the posterior probability value, the higher the exploration value; the lower the node visit count, the higher the exploration value.

[0014] Furthermore, the value probe consists of two MLP layers, and the value network is a multilayer perceptron (MLP). After the time-evolved state vector of the single cell in the latent space is obtained, the value probe predicts the expression levels of key cell state markers, calculates a quantified immediate reward value from those predicted levels according to rules set by the biological objective, and outputs an immediate reward signal.

[0015] Furthermore, in step S6, the process of dually evaluating the return of the time-evolved state vector of the single cell in the latent space based on both the value probe and the value network includes: inputting that state vector into the value probe to obtain an immediate reward signal; inputting the same state vector into the value network to obtain the long-term expected return for reaching the target cell state; and standardizing and aligning the immediate reward signal and the long-term expected return, then computing and outputting a comprehensive return value through weighted fusion.

[0016] Furthermore, if the target cell state requires short-term intervention, the output optimal perturbation action command is the optimal single-step perturbation action; if the target cell state requires nonlinear cell state transition, the output optimal perturbation action command is the optimal multi-step perturbation sequence.

[0017] A single-cell fate planning system based on Monte Carlo tree search, the system comprising: an information receiving module, used to receive the initial cell state and the target cell state of a single cell; a state-aware module, which uses a single-cell autoencoder to compress the whole-genome expression data of the single cell into a low-dimensional, dense latent state vector and constructs the digital twin representation of the cell as the state space of the planning process; a dynamics deduction module, which serves as the environment simulator of the system and, based on the current latent state vector of the single cell and the perturbation action instruction obtained automatically from the action space, uses a single-cell perturbation prediction model to predict the time-evolved state vector of the single cell in the latent space; and an intelligent decision-making module, which is equipped with a policy network and a value network and combines them with the Monte Carlo tree search algorithm to select, from the child nodes of the current node, the node with the highest exploration value as the target node of the current simulation, to dually evaluate the return of the time-evolved state vector of the single cell in the latent space, and to output the optimal perturbation action instruction.

[0018] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the single-cell fate planning method based on Monte Carlo tree search as described above.

[0019] Compared with the prior art, the beneficial effects of the present invention include: 1. This invention addresses the core pain points of traditional biological experiments and existing AI models in cell fate intervention design, achieving a paradigm shift from predicting cell states to reverse-engineering cell fate. Existing AI models are merely predictors: they take cells + perturbations as input and output results, and cannot answer the reverse question of which perturbation should be applied to achieve the goal. This invention achieves forward-looking path planning through an intelligent decision-making module that actively searches for optimal solutions in the space of perturbation combinations, completing the shift from passive prediction to proactive reverse design. Because cell fate determination requires overcoming energy barriers and a single linear perturbation often cannot achieve a state transition, this invention uses the Monte Carlo tree search algorithm to support multi-step path planning with a search depth greater than 1. It can discover the synergistic effect of perturbation A followed by perturbation B and design combined intervention schemes that conform to the nonlinear evolution of cells, overcoming the planning limitations of traditional linear models. This invention also combines the posterior probability of the policy network with the exploration-exploitation balance of the Monte Carlo tree search algorithm, which removes the need to exhaustively search thousands of perturbation combinations, focuses the search on high-value perturbation paths, and solves the problem of explosive growth in computational load in multi-step / multi-drug combination scenarios.

[0020] 2. This invention compresses high-dimensional, sparse single-cell whole-genome expression data into low-dimensional, dense latent state vectors, constructing a digital twin representation of cells. This preserves the core biological characteristics of cell states while transforming complex gene data into a low-dimensional space suitable for algorithmic deduction, solving the problem that path planning is difficult to perform directly on high-dimensional data.

[0021] 3. This invention defines the cell state as the state space, the external perturbation as the action space, and state evolution as the state transition, providing a standardized algorithmic framework for intelligent decision-making. Unlike the pure prediction logic of existing AI models, it abstracts cell fate planning into a reinforcement learning problem, achieving a core breakthrough in decision-making capability.

[0022] 4. To address the efficiency requirements of multi-round deduction in the Monte Carlo Tree Search algorithm, this invention designs a simplified value probe with a 2-layer MLP, which can directly predict the expression level of key biomarkers from the latent state in milliseconds and generate instant reward values, skipping the large model decoding step and solving the problem of computational time consumption in traditional evaluation methods.

[0023] 5. The value network of this invention focuses on the long-term evolutionary potential of new cell states, complementing the immediate assessment of the value probe. This prevents the system from overlooking nonlinear perturbation paths with long-term value by focusing only on the immediate effects of a single perturbation, making the assessment results more consistent with the biological laws of cell fate evolution.

Attached Figure Description

[0024] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0026] Example 1: This embodiment discloses a single-cell fate planning method based on Monte Carlo Tree Search (MCTS). As shown in Figure 1, the method includes steps S1-S7, each of which is described in detail below. Step S1: Receive the initial cell state and the target cell state of the single cell.

[0027] In practical use, before receiving the initial cell state and target cell state of a single cell, the pre-trained single-cell autoencoder and perturbation prediction model weights are loaded to construct a virtual cell environment and complete the unified multimodal action space configuration. The action space is used to generate subsequent perturbation action instructions.

[0028] The action space is specifically a unified embedding vocabulary, and the action embedding vectors in the action space include: chemical drugs, corresponding to preset small-molecule compound IDs; gene editing, corresponding to the target gene IDs of CRISPRa/i; and gene overexpression, corresponding to the target factor IDs of transgene expression or activation.

[0029] In this embodiment, the constructed action space includes small-molecule fingerprint codes for 2,000 FDA-approved drugs, overexpression instruction codes for 1,000 transcription factors, and CRISPRi and CRISPRa instruction codes for 2,000 key genes.
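As a rough illustration of such a unified vocabulary, the sketch below assigns one flat index per action across the three modalities. The ID strings, the `Action` type, and `build_action_vocab` are hypothetical; the text does not say how the 2,000 gene entries are divided between CRISPRi and CRISPRa, so the sketch keeps one entry per gene and leaves the direction unspecified. The three counts sum to 5,000, which matches the policy output dimension given in paragraph [0037].

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    index: int      # position in the unified vocabulary
    modality: str   # "drug", "overexpression", or "crispr"
    target_id: str  # placeholder identifier (hypothetical naming scheme)


def build_action_vocab(n_drugs: int = 2000, n_tfs: int = 1000,
                       n_genes: int = 2000) -> List[Action]:
    """Concatenate the three modalities into one flat, indexable vocabulary."""
    vocab: List[Action] = []
    for modality, prefix, count in (
        ("drug", "DRUG", n_drugs),
        ("overexpression", "TF", n_tfs),
        ("crispr", "GENE", n_genes),
    ):
        for i in range(count):
            vocab.append(Action(len(vocab), modality, f"{prefix}_{i:04d}"))
    return vocab


vocab = build_action_vocab()
```

A flat index lets one policy head score drugs, overexpression, and gene editing side by side, which is what allows cross-modal sequences to be searched in a single tree.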

[0030] Step S1 is used to clearly specify the starting cell state and the target cell state that need to be planned for fate, and to determine the core direction and ultimate goal of the plan.

[0031] Step S2: The high-dimensional, sparse whole-genome expression data of a single cell is compressed into a low-dimensional, dense latent state vector using a single-cell autoencoder, and a digital twin representation of the cell is constructed as the state space (LatentState) of the planning process.

[0032] The single-cell autoencoder may be an existing model or one trained in-house on the basis of existing techniques; it is not elaborated here.

[0033] Step S3: Based on the state space, the initial latent state vector of the single cell is used as the root node of the Monte Carlo tree search to initialize the search tree.

[0034] The root node serves as the starting point for exploring all subsequent perturbation paths. Its relevant parameters (visit count, Q value / action value) are initialized to default values, providing a structural basis for subsequent multiple rounds of iterative deduction.
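A minimal sketch of what such a node might hold is given below. The field names and the zero defaults are assumptions for illustration; the text only states that the visit count and Q value are initialized to default values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    latent_state: List[float]            # latent state vector of the cell at this node
    prior: float = 1.0                   # policy-network probability of the action leading here
    parent: Optional["Node"] = None
    visit_count: int = 0                 # assumed default
    value_sum: float = 0.0               # running sum of backed-up returns
    children: Dict[int, "Node"] = field(default_factory=dict)  # action index -> child

    @property
    def q_value(self) -> float:
        """Mean backed-up return; 0.0 for an unvisited node (assumed default)."""
        return self.value_sum / self.visit_count if self.visit_count else 0.0


# The root wraps the initial latent state; the 8-dimensional vector is a placeholder.
root = Node(latent_state=[0.0] * 8)
```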

[0035] Step S4: Based on the current latent state vector of the single cell, and combining the parameters of existing nodes in the search tree with the posterior probability output by the policy network, select from the child nodes of the current node the node with the highest exploration value as the target node of the current simulation.

[0036] The policy network is a multi-layer Transformer Encoder structure. After receiving the current latent state vector of a single cell, the policy network outputs a multi-dimensional posterior probability distribution vector over all perturbation actions in the action space. Each value of this vector corresponds to a child node in the search tree.

[0037] In this embodiment, the policy network employs a 3-layer Transformer Encoder structure. Input: the current latent state. Output: a vector (logits) with a dimension of 5000, representing the posterior probability of taking each action in the current state.

[0038] In step S4, the process of selecting the node with the highest exploration value from the child nodes of the current node includes: inputting the current latent state vector of the single cell into the policy network to obtain the multi-dimensional posterior probability distribution vector over all perturbation actions, each value of which corresponds to a child node in the search tree; counting the visits of each node in the search tree, including the total visit count of each parent node and the historical visit count of each child node; and determining the exploration value from the posterior probability value and the node visit count. The higher the posterior probability value, the higher the exploration value; the lower the node visit count, the higher the exploration value.

[0039] The posterior probability represents the model's prior judgment, based on the biological knowledge learned during training, of how likely the perturbation action corresponding to a child node is to drive the cell from its current state toward the target state. The higher the posterior probability, the more likely, in the model's understanding, that the child node will drive the cell to achieve its fate planning goal, and the higher its exploration and exploitation value.

[0040] Obtaining the posterior probability involves using existing model learning results to prioritize child nodes that appear more likely to succeed.

[0041] The node visit count is a dynamic statistic of the search tree exploration process. The fewer historical visits a child node has, the fewer times its perturbation path has been virtually simulated, and the more likely it is that its potential value has not been fully explored. Conversely, child nodes with many visits have already had their value well verified and do not need to be explored first.

[0042] Counting node visits encourages the exploration of unknown perturbation paths and prevents the system from focusing only on a few child nodes with high posterior probabilities, thus avoiding local optima. For example, some cross-modal perturbation sequences have low posterior probabilities but can actually achieve the cell state transition; they would be missed if never explored.

[0043] In some implementations, when the exploration value is determined by combining the posterior probability value and the node visit count, it is not judged qualitatively alone. Instead, the posterior probability of the policy network is integrated into the classic Upper Confidence Bound (UCB) algorithm of Monte Carlo Tree Search (MCTS), and the exploration value is quantified as the UCB value of each child node. The higher the UCB value, the higher the exploration value of the child node.

[0044] The exploration value of a child node is not a fixed value, but is dynamically updated with each round of Monte Carlo tree search deduction and backtracking steps. After each round of deduction, the system will backtrack and update the nodes on the search tree path, synchronously modifying the Q value (action value) and n value (visit count) of the child nodes, and the N value of the parent node will also increase synchronously. In the next round of selection, the UCB value of all child nodes will be recalculated based on the updated parameters, and the child node with the highest value will be selected again to realize the dynamic iteration of exploration value determination.

[0045] In another embodiment, when determining the exploration value by combining the posterior probability distribution vector value and the node visit count, the exploration value of the child nodes is calculated by using the Polynomial Upper Confidence Trees (PUCT).
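The text does not give the scoring formula. One widely used PUCT form, familiar from AlphaZero-style search, adds to the child's Q value an exploration bonus proportional to the policy prior and to the square root of the parent's visit count, divided by one plus the child's visit count. The sketch below assumes that form; the constant `c_puct = 1.5` and the function names are illustrative choices, not values from the source.

```python
import math
from typing import Sequence


def puct_score(q: float, prior: float, parent_visits: int,
               child_visits: int, c_puct: float = 1.5) -> float:
    """Q value plus a prior-weighted exploration bonus (assumed PUCT form)."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration


def select_child(qs: Sequence[float], priors: Sequence[float],
                 child_visits: Sequence[int], parent_visits: int,
                 c_puct: float = 1.5) -> int:
    """Return the index of the child with the highest PUCT score."""
    scores = [puct_score(q, p, parent_visits, n, c_puct)
              for q, p, n in zip(qs, priors, child_visits)]
    return max(range(len(scores)), key=scores.__getitem__)


# Three children with equal Q: the first has the highest prior but is already
# well visited, so an unvisited child with a moderate prior is selected instead.
chosen = select_child(qs=[0.2, 0.2, 0.2], priors=[0.6, 0.3, 0.1],
                      child_visits=[10, 0, 0], parent_visits=10)
```

With this form, a higher prior raises the score and a higher visit count lowers it, which is the qualitative behaviour described for step S4.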

[0046] Step S5: Based on the current latent state vector of the single cell and the perturbation action instruction obtained automatically from the action space, the single-cell perturbation prediction model is used to predict the time-evolved state vector of the single cell in the latent space.

[0047] In step S5, the process of automatically obtaining the perturbation action instruction from the action space includes: extracting action embedding vectors from the action space; and having each action embedding vector interact with the single-cell latent state through an attention mechanism to obtain the perturbation action instruction.

[0048] The specific interaction process is as follows: each action embedding vector is mapped to a vector with a dimension of 768, which is then concatenated as an additional token to the front of the cell's latent state sequence during simulation.
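The token-prepending step can be sketched in plain Python as follows. Only the target dimension of 768 comes from the text; the 64-dimensional raw action embedding, the random projection weights, the four cell-state tokens, and the assumption that cell-state tokens are also 768-dimensional are placeholders chosen for illustration.

```python
import random
from typing import List

TOKEN_DIM = 768  # target dimension stated in the text


def project(embedding: List[float], weight: List[List[float]]) -> List[float]:
    """Linear map: weight has TOKEN_DIM rows, each as long as the embedding."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def prepend_action_token(action_embedding: List[float],
                         weight: List[List[float]],
                         cell_tokens: List[List[float]]) -> List[List[float]]:
    """Map the action embedding to TOKEN_DIM and place it before the cell-state tokens."""
    return [project(action_embedding, weight)] + cell_tokens


rng = random.Random(0)
action_dim = 64                                                   # assumed raw embedding size
weight = [[rng.gauss(0.0, 0.02) for _ in range(action_dim)] for _ in range(TOKEN_DIM)]
action_embedding = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
cell_tokens = [[0.0] * TOKEN_DIM for _ in range(4)]               # assumed 4 state tokens
sequence = prepend_action_token(action_embedding, weight, cell_tokens)
```

Placing the action as a leading token lets the attention layers of the prediction model condition every cell-state token on the chosen perturbation.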

[0049] Step S6: Based on the value probe and the value network, the return of the time-evolved state vector of the single cell in the latent space is dually evaluated.

[0050] After the time-evolved state vector of the single cell in the latent space is obtained, the value probe predicts the expression levels of key cell state markers, calculates a quantified immediate reward value from those predicted levels according to rules set by the biological objective, and outputs the immediate reward signal.

[0051] The lightweight value probe consists of a minimal two-layer MLP with only 50,000 parameters. It is specifically designed to predict the expression levels of a specific set (e.g., 50) of key cell state markers from latent states in milliseconds, serving as an immediate reward signal.
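To make the scale concrete, the sketch below counts the parameters of a two-layer MLP and shows one possible rule for turning predicted marker levels into an immediate reward. The input width of 768, the hidden width of 60, the marker names, and the "mean of up-markers minus mean of down-markers" rule are all assumptions; the text specifies only the two layers, roughly 50,000 parameters, and about 50 markers.

```python
from typing import Dict, Iterable


def mlp2_param_count(d_in: int, d_hidden: int, d_out: int) -> int:
    """Weights plus biases of a two-layer MLP: d_in -> d_hidden -> d_out."""
    return (d_in * d_hidden + d_hidden) + (d_hidden * d_out + d_out)


# Assumed sizes chosen only so the total lands near 50,000.
count = mlp2_param_count(d_in=768, d_hidden=60, d_out=50)


def immediate_reward(expr: Dict[str, float], up: Iterable[str],
                     down: Iterable[str]) -> float:
    """Hypothetical rule: reward markers that should rise, penalize those that should fall."""
    up, down = list(up), list(down)
    up_score = sum(expr[g] for g in up) / len(up) if up else 0.0
    down_score = sum(expr[g] for g in down) / len(down) if down else 0.0
    return up_score - down_score


predicted = {"MARKER_A": 0.9, "MARKER_B": 0.7, "MARKER_C": 0.2}  # placeholder probe output
reward = immediate_reward(predicted, up=["MARKER_A", "MARKER_B"], down=["MARKER_C"])
```

Because the probe reads marker levels straight from the latent state, no decoding back to the full expression profile is needed at each simulated step.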

[0052] Value network (V): Employs a multilayer perceptron (MLP) structure. The input is the current latent state; the output is a scalar in the range [-1, 1], used to estimate the long-term expected return from the current state to the target state.

[0053] In step S6, the process of dually evaluating the return of the time-evolved state vector of the single cell in the latent space based on the value probe and the value network includes: inputting that state vector into the value probe to obtain an immediate reward signal; inputting the same state vector into the value network to obtain the long-term expected return for reaching the target cell state; and standardizing and aligning the immediate reward signal and the long-term expected return, then computing and outputting a comprehensive return value through weighted fusion.
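The weighted fusion might look like the following sketch. The text does not say how the two signals are standardized; here the immediate reward is squashed with `tanh` so that it shares the value network's [-1, 1] range, and the equal default weighting is likewise an assumption.

```python
import math


def fuse_returns(immediate_reward: float, long_term_value: float,
                 weight_immediate: float = 0.5) -> float:
    """Align both signals to [-1, 1], then take a convex combination."""
    if not 0.0 <= weight_immediate <= 1.0:
        raise ValueError("weight_immediate must lie in [0, 1]")
    r = math.tanh(immediate_reward)            # assumed standardization of the probe reward
    v = max(-1.0, min(1.0, long_term_value))   # value network output is already in [-1, 1]
    return weight_immediate * r + (1.0 - weight_immediate) * v


combined = fuse_returns(immediate_reward=0.0, long_term_value=1.0)
```

Keeping the combined value in a fixed range means Q values from different branches of the tree stay comparable during selection.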

[0054] Step S7: Based on the results of the dual evaluation, update the core parameters of all nodes on the entire path to the currently simulated node in the search tree. If the preset number of iterations has not been reached, return to step S4; otherwise, output the optimal perturbation action instruction based on the visit distribution of the root node of the search tree.

[0055] The core parameters updated for all nodes along the entire path to the currently simulated node include each node's Q value (action value) and historical visit count. The Q value is derived from the combined evaluation of the value network and the value probe: the higher the comprehensive return value, the higher the Q value, and the better the perturbation action.
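The backup along the path can be sketched as below. Treating the Q value as the running mean of all returns backed up through a node is an assumption borrowed from common MCTS practice; the text states only that Q rises with the comprehensive return value.

```python
from typing import Optional


class Node:
    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent
        self.visit_count = 0
        self.value_sum = 0.0

    @property
    def q_value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def backpropagate(leaf: Node, combined_return: float) -> None:
    """Walk from the evaluated node up to the root, updating visits and value sums."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visit_count += 1
        node.value_sum += combined_return
        node = node.parent


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, 0.8)   # a two-step path evaluated at 0.8
backpropagate(child, 0.2)  # a one-step path evaluated at 0.2
```

After these two backups the root and the child have each been visited twice, while the leaf has been visited once and keeps the return of its own path.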

[0056] If the target cell state requires short-term intervention, the optimal perturbation action command output is the optimal single-step perturbation action. If the target cell state requires nonlinear cell state transition, the optimal perturbation action command output is the optimal multi-step perturbation sequence.
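One way to read off both kinds of output from a finished tree is sketched below. The single-step case uses the root's visit distribution as the text describes; extending this to a multi-step sequence by repeatedly following the most-visited child is an assumption, and the action indices in the example tree are arbitrary.

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self, visit_count: int = 0,
                 children: Optional[Dict[int, "Node"]] = None) -> None:
        self.visit_count = visit_count
        self.children: Dict[int, "Node"] = children or {}


def visit_distribution(root: Node) -> Dict[int, float]:
    """Normalized visit counts of the root's children, keyed by action index."""
    total = sum(c.visit_count for c in root.children.values())
    if total == 0:
        return {}
    return {a: c.visit_count / total for a, c in root.children.items()}


def best_sequence(root: Node, max_steps: int) -> List[int]:
    """Follow the most-visited child at each level, up to max_steps actions."""
    sequence: List[int] = []
    node = root
    while node.children and len(sequence) < max_steps:
        action, child = max(node.children.items(), key=lambda kv: kv[1].visit_count)
        if child.visit_count == 0:
            break  # nothing below this point was ever simulated
        sequence.append(action)
        node = child
    return sequence


tree = Node(40, {7: Node(30, {12: Node(20), 3: Node(9)}), 42: Node(10)})
single_step = best_sequence(tree, max_steps=1)
multi_step = best_sequence(tree, max_steps=3)
```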

[0057] Example 2: This embodiment, based on Example 1 above, discloses a single-cell fate planning system based on Monte Carlo tree search. The system includes: an information receiving module, used to receive the initial cell state and the target cell state of a single cell; a state-aware module, which uses a single-cell autoencoder to compress the whole-genome expression data of the single cell into a low-dimensional, dense latent state vector and constructs the digital twin representation of the cell as the state space of the planning process; a dynamics deduction module, which serves as the environment simulator of the system and, based on the current latent state vector of the single cell and the perturbation action instruction obtained automatically from the action space, uses a single-cell perturbation prediction model to predict the time-evolved state vector of the single cell in the latent space; and an intelligent decision-making module, which is equipped with a policy network and a value network and combines them with the Monte Carlo tree search algorithm to select, from the child nodes of the current node, the node with the highest exploration value as the target node of the current simulation, to dually evaluate the return of the time-evolved state vector of the single cell in the latent space, and to output the optimal perturbation action instruction.

[0058] The system also includes an additional reward module for scoring the deduced cell states.

[0059] For details regarding the above modules, please refer to the relevant descriptions and effects in Example 1 for further understanding.

[0060] The core idea of this invention is to abstract cell biology problems into reinforcement learning (RL) problems. The system constructs a single-cell virtual environment consisting of an autoencoder and a perturbation prediction model, and uses the Monte Carlo Tree Search (MCTS) algorithm as an agent to perform prospective search and planning in the latent space.
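The agent-environment loop can be illustrated end to end with toy stand-ins. In the sketch below the "latent state" is a single number, three actions shift it by -1, 0, or +1, the policy is uniform, and the evaluation is a clipped linear score; none of these come from the source, and the PUCT-style selection rule and mean-return backup are assumed forms. The callables `policy`, `simulate`, and `evaluate` mark where the policy network, the perturbation prediction model, and the fused probe/value-network return would plug in.

```python
import math
from typing import Callable, Dict, Optional, Sequence


class Node:
    def __init__(self, state: float, prior: float = 1.0,
                 parent: Optional["Node"] = None) -> None:
        self.state, self.prior, self.parent = state, prior, parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.value_sum = 0.0

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def run_search(root_state: float,
               policy: Callable[[float], Sequence[float]],
               simulate: Callable[[float, int], float],
               evaluate: Callable[[float], float],
               n_iter: int = 200, c_puct: float = 1.5) -> Node:
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # S4: descend by a PUCT-style score until an unexpanded node is reached.
        while node.children:
            sqrt_n = math.sqrt(node.visits)
            node = max(node.children.values(),
                       key=lambda ch: ch.q() + c_puct * ch.prior * sqrt_n / (1 + ch.visits))
        # S5: expand the leaf by simulating every action (only sensible for a toy action space).
        for action, prior in enumerate(policy(node.state)):
            node.children[action] = Node(simulate(node.state, action), prior, node)
        # S6: score the reached state (stand-in for the dual evaluation).
        ret = evaluate(node.state)
        # S7: back the return up along the path to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.value_sum += ret
            current = current.parent
    return root


root = run_search(
    root_state=0.0,
    policy=lambda s: [1 / 3, 1 / 3, 1 / 3],
    simulate=lambda s, a: s + (a - 1),
    evaluate=lambda s: max(-1.0, min(1.0, s / 5.0)),
    n_iter=100,
)
best_action = max(root.children, key=lambda a: root.children[a].visits)
```

The first iteration only expands and evaluates the root, so the visit counts of the root's children sum to one less than the number of iterations; `best_action` is simply the most-visited root action.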

[0061] Example 3 Based on Embodiment 1, this embodiment provides an electronic device, including: one or more processors and a memory, wherein the memory stores one or more programs, and the one or more programs include instructions for executing the single-cell fate planning method based on Monte Carlo tree search as described above.

[0062] At the hardware level, the electronic device includes a processor, internal bus, network interface, memory, and non-volatile memory, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile memory into memory and then runs it to implement the single-cell fate planning method based on Monte Carlo tree search described above. Of course, in addition to software implementation, this invention does not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. That is to say, the execution subject of the following processing flow is not limited to individual logic units, but can also be hardware or logic devices.

[0063] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0064] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0065] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A single-cell fate planning method based on Monte Carlo tree search, characterized in that the method comprises:
Step S1: receiving an initial cell state and a target cell state of a single cell;
Step S2: using a single-cell autoencoder to compress the whole-genome expression data of the single cell into a low-dimensional dense latent state vector, thereby constructing a digital twin representation of the cell as the state space of the planning process;
Step S3: based on the state space, initializing a search tree with the initial latent state vector of the single cell as the root node of the Monte Carlo tree search;
Step S4: based on the current latent state vector of the single cell, and combining the parameters of the existing nodes in the search tree with the posterior probability output by the policy network, selecting, from the child nodes of the current node, the node with the highest exploration value as the target node of the current inference;
Step S5: based on the current latent state vector of the single cell and a perturbation action instruction obtained from the action space, using a single-cell perturbation prediction model to predict the state vector of the single cell as it evolves over time in the latent space;
Step S6: based on a value probe and a value network, performing a dual evaluation of the return of the state vector of the single cell as it evolves over time in the latent space;
Step S7: based on the result of the dual evaluation, updating the core parameters of all nodes on the entire path of the current inference in the search tree; if a preset number of loops has not been reached, returning to step S4; otherwise, outputting the optimal perturbation action instruction based on the visit distribution of the root node of the search tree.
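As a non-limiting illustration, the loop of steps S3 to S7 can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the selection score (mean value plus a PUCT-style bonus), the expansion of all actions at once, and the names `plan`, `toy_policy`, `toy_dynamics`, `toy_evaluate` and the action IDs are all hypothetical stand-ins for the policy network, the perturbation prediction model, and the dual evaluation.

```python
import math


class Node:
    """One search-tree node: a latent state plus its search statistics."""

    def __init__(self, state, prior=1.0, parent=None):
        self.state = state        # latent state vector (list of floats)
        self.prior = prior        # policy probability of the action leading here
        self.parent = parent
        self.children = {}        # action id -> Node
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def plan(root_state, target, actions, policy, dynamics, evaluate,
         n_loops=200, c_puct=1.5):
    """Run a toy version of S3-S7; return (best action, root visit counts)."""
    root = Node(root_state)                                    # S3
    for _ in range(n_loops):
        node = root
        while node.children:                                   # S4: selection
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.mean_value()
                + c_puct * ch.prior * math.sqrt(parent.visits) / (1 + ch.visits),
            )
        # S5 (simplified): predict the next state for every candidate action.
        for action, prob in policy(node.state).items():
            node.children[action] = Node(
                dynamics(node.state, actions[action]), prob, node)
        value = evaluate(node.state, target)                   # S6 stand-in
        while node is not None:                                # S7: back-up
            node.visits += 1
            node.value_sum += value
            node = node.parent
    counts = {a: ch.visits for a, ch in root.children.items()}
    return max(counts, key=counts.get), counts


# Toy 2-D latent space; every name and number below is hypothetical.
ACTIONS = {"drug_A": [1.0, 0.0], "drug_B": [0.0, 1.0], "gene_X": [-1.0, 0.0]}


def toy_policy(state):
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}            # uniform prior


def toy_dynamics(state, shift):
    return [s + d for s, d in zip(state, shift)]               # additive shift


def toy_evaluate(state, target):
    return 1.0 / (1.0 + math.dist(state, target))              # closer is better


best, counts = plan([0.0, 0.0], [2.0, 0.0], ACTIONS,
                    toy_policy, toy_dynamics, toy_evaluate)
```

In this toy setting the target lies along the direction of `drug_A`, so the root visit distribution concentrates on that action.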

2. The single-cell fate planning method based on Monte Carlo tree search according to claim 1, characterized in that, in step S5, the process of obtaining the perturbation action instruction from the action space comprises: extracting action embedding vectors from the action space; and causing the action embedding vectors to interact with the latent state of the single cell through an attention mechanism to obtain the perturbation action instruction.
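One possible reading of this interaction is scaled dot-product attention with the latent state as the query and the action embeddings as keys and values. The sketch below assumes exactly that and omits the learned projection matrices a trained model would normally use; the function `attend`, the embeddings, and the action IDs are hypothetical.

```python
import math


def attend(latent, action_embeddings):
    """Scaled dot-product attention: latent state = query, actions = keys/values."""
    d = len(latent)
    scores = {
        a: sum(x * y for x, y in zip(latent, emb)) / math.sqrt(d)
        for a, emb in action_embeddings.items()
    }
    top = max(scores.values())                       # subtract max for stability
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    weights = {a: e / total for a, e in exps.items()}
    # Attention-weighted mix of the action embeddings.
    context = [
        sum(weights[a] * action_embeddings[a][i] for a in action_embeddings)
        for i in range(d)
    ]
    return weights, context


latent = [1.0, 0.0]                                  # hypothetical latent state
embeddings = {                                       # hypothetical embeddings
    "drug_A": [1.0, 0.0],
    "crispri_gene_G": [0.0, 1.0],
    "overexpress_factor_F": [-1.0, 0.0],
}
weights, context = attend(latent, embeddings)
chosen = max(weights, key=weights.get)
```

Here the action whose embedding is most aligned with the latent state receives the largest attention weight.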

3. The single-cell fate planning method based on Monte Carlo tree search according to claim 2, characterized in that the action space is a unified embedding vocabulary, and the action embedding vectors in the action space include:
chemical drugs, corresponding to preset small-molecule compound IDs;
gene editing, corresponding to target gene IDs for CRISPRa/i;
gene overexpression, corresponding to target factor IDs for transgene expression or activation.
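A unified vocabulary of this kind can be sketched as a single index shared by all three perturbation types, so that one embedding table can serve every action. The class names and the IDs `CPD-0001`, `GENE-A`, and `FACTOR-B` below are hypothetical placeholders, not identifiers from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    CHEMICAL_DRUG = "chemical_drug"              # small-molecule compound ID
    GENE_EDITING = "gene_editing"                # CRISPRa/i target gene ID
    GENE_OVEREXPRESSION = "gene_overexpression"  # target factor ID


@dataclass(frozen=True)
class Action:
    kind: Kind
    target_id: str


class ActionVocabulary:
    """Maps every action, whatever its kind, to one row of a shared embedding table."""

    def __init__(self):
        self._index = {}
        self._actions = []

    def add(self, action):
        # Re-adding a known action returns its existing index.
        if action not in self._index:
            self._index[action] = len(self._actions)
            self._actions.append(action)
        return self._index[action]

    def lookup(self, index):
        return self._actions[index]

    def __len__(self):
        return len(self._actions)


vocab = ActionVocabulary()
i_drug = vocab.add(Action(Kind.CHEMICAL_DRUG, "CPD-0001"))
i_edit = vocab.add(Action(Kind.GENE_EDITING, "GENE-A"))
i_over = vocab.add(Action(Kind.GENE_OVEREXPRESSION, "FACTOR-B"))
i_again = vocab.add(Action(Kind.CHEMICAL_DRUG, "CPD-0001"))
```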

4. The single-cell fate planning method based on Monte Carlo tree search according to claim 1, characterized in that the policy network is a multi-layer Transformer encoder structure; after receiving the current latent state vector of the single cell, the policy network outputs a multi-dimensional posterior probability distribution vector over all perturbation actions in the action space, and each value of the posterior probability distribution vector corresponds to a child node in the search tree.

5. The single-cell fate planning method based on Monte Carlo tree search according to claim 1, characterized in that, in step S4, the process of selecting the node with the highest exploration value from the child nodes of the current node comprises:
inputting the current latent state vector of the single cell into the policy network to obtain a multi-dimensional posterior probability distribution vector over all perturbation actions, each value of the posterior probability distribution vector corresponding to a child node in the search tree;
counting the node visit count of each node in the search tree, the node visit count including the total visit count of each parent node and the historical visit count of each child node in the search tree;
determining the exploration value based on the posterior probability distribution vector value and the node visit count, wherein the higher the posterior probability distribution vector value, the higher the exploration value, and the lower the node visit count, the higher the exploration value.
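The claim fixes only the direction of these dependencies. One common formula with the same behaviour is the PUCT-style bonus c · P · sqrt(N_parent) / (1 + N_child); the sketch below assumes that form, and the constant `c`, the logits, and the action IDs are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw policy scores into a probability distribution over actions."""
    top = max(logits.values())
    exps = {a: math.exp(v - top) for a, v in logits.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def exploration_value(prob, parent_visits, child_visits, c=1.5):
    """Grows with the policy probability, shrinks as the child is visited more."""
    return c * prob * math.sqrt(parent_visits) / (1 + child_visits)


# Hypothetical policy logits and visit statistics for three child nodes.
probs = softmax({"drug_A": 2.0, "gene_X": 1.0, "factor_Y": 0.0})
child_visits = {"drug_A": 10, "gene_X": 0, "factor_Y": 0}
parent_visits = 10

values = {
    a: exploration_value(probs[a], parent_visits, child_visits[a])
    for a in probs
}
next_child = max(values, key=values.get)
```

In this example `drug_A` has the highest policy probability but has already been visited ten times, so an unvisited child with a lower probability is ranked first.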

6. The single-cell fate planning method based on Monte Carlo tree search according to claim 1, characterized in that the value probe consists of two MLP layers, and the value network is a multilayer perceptron (MLP) structure; after obtaining the state vector of the single cell as it evolves over time in the latent space, the value probe predicts the expression levels of key cell state markers, calculates a quantified immediate reward value from the predicted expression levels according to rules set by the biological objective, and outputs an immediate reward signal.
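The value probe can be illustrated with a two-layer perceptron followed by a rule-based reward. The following is a sketch only: the weights are hand-set rather than trained, the marker names are hypothetical, and the signed-sum rule is an assumed example of a rule set by the biological objective.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def value_probe(latent, params):
    """Two MLP layers mapping a latent state to predicted marker expression."""
    hidden = relu(linear(latent, params["W1"], params["b1"]))
    return linear(hidden, params["W2"], params["b2"])


def immediate_reward(expression, rule):
    """Assumed rule: +1 for markers that should rise, -1 for markers that should fall."""
    return sum(rule[name] * level for name, level in expression.items())


# Hand-set (untrained) parameters and hypothetical marker names.
PARAMS = {
    "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 1.0]], "b2": [0.0, 0.0],
}
MARKERS = ["target_marker", "off_target_marker"]
RULE = {"target_marker": 1.0, "off_target_marker": -1.0}

predicted = dict(zip(MARKERS, value_probe([0.8, 0.1], PARAMS)))
reward = immediate_reward(predicted, RULE)
```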

7. The single-cell fate planning method based on Monte Carlo tree search according to claim 1, characterized in that, in step S6, the process of performing the dual evaluation of the return of the state vector of the single cell as it evolves over time in the latent space, based on the value probe and the value network, comprises:
inputting the state vector of the single cell as it evolves over time in the latent space into the value probe to obtain an immediate reward signal;
inputting the state vector of the single cell as it evolves over time in the latent space into the value network to obtain the long-term expected return for reaching the target cell state;
normalizing and aligning the immediate reward signal and the long-term expected return, and obtaining and outputting a comprehensive return value through weighted fusion.
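The fusion step can be sketched as follows, assuming min-max normalization to [0, 1] as the alignment scheme and a single mixing weight `w`. The claim does not specify the normalization, the value ranges, or the weight, so all three are assumptions made for illustration.

```python
def normalize(value, low, high):
    """Min-max scale a signal to [0, 1], clamping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def fuse(immediate, long_term, reward_range, value_range, w=0.5):
    """Weighted fusion of the normalized immediate reward and long-term return."""
    r = normalize(immediate, *reward_range)
    v = normalize(long_term, *value_range)
    return w * r + (1.0 - w) * v


# Hypothetical ranges: probe reward in [0, 1], value-network output in [-1, 1].
combined = fuse(0.5, 0.0, reward_range=(0.0, 1.0), value_range=(-1.0, 1.0))
probe_only = fuse(1.0, -1.0, (0.0, 1.0), (-1.0, 1.0), w=1.0)
```

Setting `w` closer to 1 favours the immediate marker-based reward, while a value closer to 0 favours the long-term estimate of reaching the target state.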

8. The single-cell fate planning method based on Monte Carlo tree search according to claim 1, characterized in that, if the target cell state requires short-term intervention, the output optimal perturbation action instruction is an optimal single-step perturbation action; if the target cell state requires a nonlinear cell state transition, the output optimal perturbation action instruction is an optimal multi-step perturbation sequence.
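One way to read out either form of output from a finished search tree is to follow the most-visited child at each level, stopping after one step for a single-step action or after several for a sequence. This read-out rule is an assumption; the claims state only that the output is based on the visit distribution of the root node. The tree layout and action IDs below are hypothetical.

```python
def most_visited_path(node, max_steps):
    """Follow the most-visited child at each level, for at most max_steps actions."""
    path = []
    while node["children"] and len(path) < max_steps:
        action, node = max(node["children"].items(),
                           key=lambda item: item[1]["visits"])
        path.append(action)
    return path


# Hypothetical search tree after planning, stored as nested dictionaries.
tree = {
    "visits": 10,
    "children": {
        "drug_A": {
            "visits": 7,
            "children": {
                "factor_Y": {"visits": 5, "children": {}},
                "drug_B": {"visits": 1, "children": {}},
            },
        },
        "gene_X": {"visits": 2, "children": {}},
    },
}

single_step = most_visited_path(tree, max_steps=1)
multi_step = most_visited_path(tree, max_steps=3)
```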

9. A single-cell fate planning system based on Monte Carlo tree search, characterized in that the system comprises:
an information receiving module, configured to receive an initial cell state and a target cell state of a single cell;
a state-aware module, configured to use a single-cell autoencoder to compress the whole-genome expression data of the single cell into a low-dimensional dense latent state vector, thereby constructing a digital twin representation of the cell as the state space of the planning process;
a dynamics deduction module, serving as the environment simulator of the system and configured to use a single-cell perturbation prediction model to predict, based on the current latent state vector of the single cell and a perturbation action instruction obtained from the action space, the state vector of the single cell as it evolves over time in the latent space;
an intelligent decision-making module, equipped with a policy network and a value network and configured to combine them with the Monte Carlo tree search algorithm to select, from the child nodes of the current node, the node with the highest exploration value as the target node of the current inference, to perform a dual evaluation of the return of the state vector of the single cell as it evolves over time in the latent space, and to output the optimal perturbation action instruction.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the single-cell fate planning method based on Monte Carlo tree search according to any one of claims 1 to 8.