A method and system for digital human program hosting using AI technology

By introducing semantic body temperature and a Brownian motion diffusion mechanism, combined with genetic mutation and the Jacobian null-space projection operator, the rigidity of motion generation in AI digital human hosting systems is addressed, achieving adaptive balance and enhanced realism in motion performance.

CN122067597B · Active · Publication Date: 2026-06-30 · JIANGSU BROADCASTING CORPORATION

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: JIANGSU BROADCASTING CORPORATION
Filing Date: 2026-04-21
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing AI digital human hosting systems suffer from rigid mapping and monotonous expression during action generation. They lack semantic-level emotional tension and dynamic boundary adjustment capabilities, and cannot make physical-level dynamic scale adjustments between "absolute precision" and "vividness".

Method used

By introducing "semantic body temperature" as a continuous control variable throughout the entire process, the traditional discrete semantics-driven approach is transformed into an adjustable dynamic control mechanism. The semantic body temperature negatively regulates the Brownian motion diffusion intensity and the genetic mutation probability; combined with the Jacobian null-space projection operator and the expansion of the probability solution space, this achieves an adaptive balance in digital human actions.

Benefits of technology

It achieves an adaptive balance between stability and agility in the digital human's host movements, improves the continuity and realism of the movements, avoids movement distortion and clipping issues, and enhances the smoothness and visual quality of the generated movements.



Abstract

This invention discloses a method and system for a digital human to host programs using AI technology, relating to the field of artificial intelligence technology. The method includes: first, parsing the program schedule to extract key semantics, and dynamically calculating the semantic body temperature by combining the attention weight gradient change rate with the action information entropy density; then, in the action expression stage, using the semantic body temperature to inversely adjust the Brownian diffusion coefficient, and strictly confining the stochastic differential increment to redundant degrees of freedom using the Jacobian null-space projection operator, which preserves the accuracy of the core action intent while giving the posture agility; establishing a latent-space similarity drift evolution mechanism controlled by the semantic body temperature, and jointly superimposing and normalizing the high-frequency action probability spaces obtained after elimination and alignment to expand the solution space; finally, using a jerk-integral cost function to filter out mechanical twitches and performing interpolation stitching. This overcomes the monotony of digital human expression and generates humanoid actions that balance physical accuracy with high diversity.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method and system for a digital human to host a program using AI technology.

Background Technology

[0002] With the rapid development of artificial intelligence and computer graphics technologies, AI digital humans have been widely used in diverse scenarios such as news broadcasting, gala hosting, and live-streaming e-commerce. Currently, mainstream motion generation technology for digital humans (Text-to-Animation) generally adopts rule-driven logic with rigid mappings, based on "deterministic state machines" or "keyword-action" rules.

[0003] However, this traditional approach has revealed significant limitations in practical applications:

[0004] First, digital human expression exhibits a severe sense of uniformity and mechanicality. Most existing technologies rely on fixed action dictionaries; when encountering the same or similar trigger words, digital humans repeatedly execute completely identical preset action trajectories. This rigid expression, lacking randomness and subtle variations, easily triggers the "uncanny valley effect" in viewers and causes visual fatigue, severely violating the objective law that real human communication is full of physical diversity.

[0005] Secondly, it lacks the ability to adjust emotional tension and dynamic boundaries at the semantic level. A truly excellent presenter exhibits a stark difference in the freedom and tension of their body language when reporting important, serious news and when engaging in lighthearted interactive segments. Existing digital human systems cannot deeply perceive the "information weight" of context, nor can they dynamically adjust the physical scale between "absolute precision (rigorous)" and "vivid transition (lively)."

Summary of the Invention

[0006] In view of the aforementioned existing problems, the present invention is proposed.

[0007] Therefore, this invention provides a method for digital human hosting programs using AI technology to solve the technical problems of rigid mapping and monotonous expression in existing AI digital human hosting systems during action generation.

[0008] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0009] In a first aspect, the present invention provides a method for a digital human to host a program using AI technology, comprising identifying key semantics in a program schedule containing process information and program content;

[0010] The key semantics are defined as digital genes that drive the actions and behaviors of digital humans, and each digital gene is assigned a semantic body temperature value based on the importance of the key semantics.

[0011] In the action library, corresponding action expressions are matched according to the digital genes; when performing gene action expression, a Brownian motion mechanism whose intensity is negatively correlated with the semantic body temperature is introduced to define the solution space and wandering boundary of the digital genes in the virtual environment.

[0012] By performing fixed-round genetic and mutation evolution on the digital genes, a negative feedback control strategy for the semantic body temperature on the evolution mechanism is established, and the solution space of the alignment position is expanded by aligning the parent genes and the offspring genes.

[0013] In the expanded solution space, a set of candidate action expression sequences with a fluency higher than the preset value is randomly generated, and the actions in the sequence are connected and supplemented to control the hosting behavior of the digital human.

[0014] As a preferred embodiment of the method for digital human hosting programs using AI technology as described in this invention, the program schedule is input into a pre-trained semantic coding model and mapped as a high-dimensional temporal tensor sequence.

[0015] For each candidate semantic node in the tensor sequence, calculate the conditional probability that it will trigger a specific digital human action in the action intent space, which is used as the action information entropy density of the node.

[0016] Simultaneously, in the global context self-attention matrix, a mask perturbation is applied to the candidate semantic nodes, and the rate of change of attention weight gradient caused by the mask perturbation is calculated.

[0017] Candidate semantic nodes whose action information entropy density is less than a first preset threshold and whose gradient change rate is greater than a second preset threshold are extracted and clustered into the key semantics.

[0018] On the timeline, the time interval between adjacent key semantics is calculated, and the interval constraint is set: between key semantics where the time interval is greater than a preset interval, candidate semantic nodes are extracted so that the time interval satisfies the constraint;

[0019] In the process of extracting candidate semantic nodes, the second preset threshold is decreased by a second preset step size and the first preset threshold is increased by a first preset step size, with the two adjustments applied alternately, to achieve the extraction and completion of candidate semantic nodes.

[0020] The action intent space is a multi-dimensional metric space obtained by taking each basic action intent of the digital human as a coordinate node and arranging the physical positions of the nodes so that the distance between nodes reflects the similarity of their basic action intents.

[0021] As a preferred embodiment of the method for digital human hosting programs using AI technology described in this invention, determining the importance level includes, for each independent program segment, extracting the attention weight gradient change rate and the action information entropy density corresponding to candidate semantic node i;

[0022] Setting a preset weight for the gradient change rate according to the order of magnitude of the gradient, and a preset weight for the reciprocal of the entropy density according to the order of magnitude of that reciprocal;

[0023] A segmented transformation model is introduced to adjust the weights in different program segments; using the adjusted weights, the attention weight gradient change rate and the reciprocal of the action information entropy density are linearly weighted and summed to calculate the comprehensive importance coefficient of each candidate semantic node within an independent program segment;

[0024] The calculated comprehensive importance coefficient is positively mapped to the semantic body temperature at which the digital gene is activated;

[0025] The segmented transformation model includes: analyzing the distribution characteristics of the action information entropy density within the current program segment using a spatial clustering algorithm, and extracting the number n of density centers in the action intent space; based on the number n of density centers of the action information entropy density, a transformation by a factor of n is performed.
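The weighted sum and the positive mapping described in this embodiment can be illustrated with a minimal Python sketch. The function names, the small `eps` guard, and the specific mapping `c / (1 + c)` are assumptions made for illustration; the text only requires a linear weighting of the gradient change rate and the reciprocal of the entropy density, followed by some positive (monotonically increasing) mapping. The segment-level weight adjustment is left to the caller, who passes in already adjusted weights.

```python
def importance_coefficient(gradient_rate, entropy_density,
                           w_gradient, w_entropy, eps=1e-9):
    """Linear weighting of the gradient change rate and the reciprocal entropy density.

    w_gradient and w_entropy are the (already segment-adjusted) preset weights.
    eps is a hypothetical guard against division by zero.
    """
    return w_gradient * gradient_rate + w_entropy / (entropy_density + eps)


def semantic_body_temperature(coefficient):
    """Hypothetical positive mapping of a non-negative coefficient onto [0, 1)."""
    if coefficient < 0:
        raise ValueError("coefficient is expected to be non-negative")
    return coefficient / (1.0 + coefficient)


# A decisive node (low entropy, strong gradient response) versus an ordinary one.
decisive = importance_coefficient(0.8, 0.2, w_gradient=1.0, w_entropy=0.1)
ordinary = importance_coefficient(0.1, 1.5, w_gradient=1.0, w_entropy=0.1)
print(semantic_body_temperature(decisive), semantic_body_temperature(ordinary))
```

Any other increasing function could replace the mapping; the only property used downstream is that a more important node receives a higher temperature.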

[0026] As a preferred embodiment of the method for digital human hosting programs using AI technology as described in this invention, the action expression includes generating a probability distribution of the action corresponding to each key semantic based on a pre-trained Bayesian algorithm, which is defined as the action probability solution space of the corresponding digital gene, where each solution represents an action.

[0027] A Brownian motion mechanism is introduced into the action probability solution space. For each solution in the action probability solution space, the probability corresponding to the solution is multiplied by a fixed order of magnitude, which is used as the base for solution jittering. The solution space is then jittered in a random direction to obtain the jittered solution space.

[0028] The Brownian motion mechanism is embodied as the stochastic differential increment of a standard Wiener process;

[0029] The Brownian diffusion coefficient is constructed as a negatively correlated function of the semantic body temperature;

[0030] in which one parameter is an adjustment constant, another is a constant, and the variable is the semantic body temperature at which the digital gene is activated;

[0031] For each independent solution in the action probability solution space, calculate the kinematic Jacobian matrix $J$ for the current solution and its pseudo-inverse matrix $J^{+}$; using the Jacobian null-space projection operator $(I - J^{+}J)$, the Brownian motion increment is restricted to redundant degrees of freedom that do not affect the core intent, generating the action differential step size that defines the wander boundary:

[0032]

[0033] Where I is the identity matrix, representing all degrees of freedom.
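A small Python sketch may help make the null-space restriction concrete. It is written under stated assumptions: the exponential law in `diffusion_coefficient`, its constants `d_max` and `k`, the `sqrt(2 * D * dt)` scaling of the Wiener increment, and all function names are hypothetical, since the text only requires that the diffusion coefficient decrease with the semantic body temperature. The projection itself uses the standard identity J⁺ = Jᵀ(JJᵀ)⁻¹, which holds for a Jacobian with full row rank.

```python
import math
import random


def diffusion_coefficient(temperature, d_max=0.05, k=3.0):
    """Hypothetical negative-correlation law: hotter gene, smaller diffusion."""
    return d_max * math.exp(-k * temperature)


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Jacobian does not have full row rank")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def null_space_project(jac, v):
    """Return (I - J^+ J) v, with J given as a list of rows."""
    jv = [sum(j * x for j, x in zip(row, v)) for row in jac]                  # J v
    jjt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in jac] for r1 in jac]  # J J^T
    lam = solve(jjt, jv)                                                      # (J J^T)^-1 J v
    correction = [sum(jac[i][c] * lam[i] for i in range(len(jac)))
                  for c in range(len(v))]                                     # J^T lam
    return [x - y for x, y in zip(v, correction)]


def wander_step(jac, temperature, dt=1.0 / 30.0, rng=random):
    """One Brownian step confined to the redundant degrees of freedom."""
    scale = math.sqrt(2.0 * diffusion_coefficient(temperature) * dt)
    dw = [rng.gauss(0.0, 1.0) * scale for _ in range(len(jac[0]))]
    return null_space_project(jac, dw)


# Two task constraints on three joints leave one redundant degree of freedom.
jacobian = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
step = wander_step(jacobian, temperature=0.5, rng=random.Random(0))
task_motion = [sum(j * s for j, s in zip(row, step)) for row in jacobian]
print(step, task_motion)
```

Because the step lies in the null space of the Jacobian, multiplying it by the Jacobian gives a vector that is zero up to rounding, which is the sense in which the perturbation does not disturb the core action intent.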

[0034] As a preferred embodiment of the method for digital human hosting programs using AI technology as described in this invention, the genetic and mutation evolution includes mapping the key semantics in a pre-trained semantic coding model to generate a high-dimensional continuous vector, which is defined as the underlying semantic feature string of the corresponding digital gene.

[0035] Offspring gene sequences are generated using the same parent gene sequence; wherein, each gene in the sequence is mutated by performing similarity shifting on the underlying semantic feature string in the semantic vector space.

[0036] The negative feedback control strategy includes constructing a negative correlation control equation between the drift probability and the semantic body temperature, and a negative correlation control equation between the drift distance and the semantic body temperature; in the gene mutation determination stage, based on the drift probability, similarity drift is triggered on the underlying semantic feature string of the current digital gene.

[0037] When similarity drift is triggered, a completely random high-dimensional direction vector is generated in the semantic vector space, and combined with the calculated drift distance, the spatial position of the initial underlying semantic feature string is updated and reconstructed to generate the mutated offspring semantic feature string.

[0038] After decoding the semantic feature string of the mutated offspring, the key semantics corresponding to the mutated gene are obtained.
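The temperature-controlled similarity drift can be sketched as follows. This is an illustration under assumptions, not the patented control equations: the exponential forms of `drift_probability` and `drift_distance`, the constants `p_max`, `d_max`, and `k`, and the function names are all hypothetical. The text only states that both the drift probability and the drift distance are negatively correlated with the semantic body temperature and that the drift direction is completely random.

```python
import math
import random


def drift_probability(temperature, p_max=0.5, k=3.0):
    """Hypothetical law: higher temperature, lower chance of mutation."""
    return p_max * math.exp(-k * temperature)


def drift_distance(temperature, d_max=0.3, k=3.0):
    """Hypothetical law: higher temperature, shorter drift."""
    return d_max * math.exp(-k * temperature)


def random_unit_vector(dim, rng):
    """Uniformly random direction, drawn as a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mutate(feature, temperature, rng=random, p_max=0.5, d_max=0.3, k=3.0):
    """Return the offspring feature string of one digital gene."""
    if rng.random() >= drift_probability(temperature, p_max, k):
        return list(feature)  # no drift triggered: offspring copies the parent
    direction = random_unit_vector(len(feature), rng)
    distance = drift_distance(temperature, d_max, k)
    return [x + distance * d for x, d in zip(feature, direction)]


rng = random.Random(1)
parent = [0.0] * 8
child = mutate(parent, temperature=0.0, rng=rng, p_max=1.0)
moved = math.sqrt(sum((c - p) ** 2 for c, p in zip(child, parent)))
print(moved)
```

Decoding the offspring feature string back into key semantics depends on the semantic coding model and is outside the scope of this sketch.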

[0039] As a preferred embodiment of the method for digital human hosting programs using AI technology as described in this invention, the step of expanding the solution space of the alignment position includes summarizing all gene sequences before and after the mutation, and integrating all genes in the sequence according to the gene position relationship in the gene sequence, matching and integrating genes located at the same gene position one by one to obtain multiple gene expressions at each gene position.

[0040] The solution space of all action probabilities corresponding to each gene position is released, and the probability distributions of the solution space are superimposed and normalized to obtain the final solution space for each gene position.
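The superposition and normalization at one gene position can be illustrated with a short Python sketch. The dictionary representation (action name mapped to probability), the action names, and the helper `sample_action` are assumptions introduced for the example.

```python
import random
from collections import defaultdict


def merge_solution_spaces(spaces):
    """Superimpose the action probability distributions and renormalize."""
    merged = defaultdict(float)
    for space in spaces:
        for action, probability in space.items():
            merged[action] += probability
    total = sum(merged.values())
    return {action: p / total for action, p in merged.items()}


def sample_action(space, rng=random):
    """Draw one action according to the merged probabilities."""
    actions = list(space)
    weights = [space[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Parent and mutated offspring expressions aligned at the same gene position.
parent = {"wave": 0.6, "nod": 0.4}
offspring = {"nod": 0.5, "point": 0.5}
final_space = merge_solution_spaces([parent, offspring])
print(final_space, sample_action(final_space, random.Random(0)))
```

Actions that appear only in the offspring, such as `point` here, enter the final solution space with non-zero probability, which is how the alignment step widens the set of reachable expressions.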

[0041] As a preferred embodiment of the method for digital human hosting programs using AI technology as described in this invention, wherein: in the final solution space of each gene position, the solution of each gene position is randomly selected according to the probability corresponding to the action;

[0042] The discrete action expressions extracted from each gene location are reproduced according to their temporal position on the program schedule timeline to construct an initial timeline action sequence;

[0043] Identify the action gaps between adjacent discrete action expressions in the timeline action sequence, extract the last frame spatial pose of the preceding action and the first frame spatial pose of the subsequent action, and use a spatial interpolation algorithm to complete the action between the last frame and the first frame to generate a continuous and complete action sequence on the timeline.

[0044] The fluency is calculated using the jerk-integral cost function, which serves as the measure of smoothness.
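The gap completion and the jerk-integral measure can be sketched in Python as follows. Linear interpolation stands in for the unspecified spatial interpolation algorithm, and the third finite difference approximates jerk on a uniformly sampled trajectory; both choices, and the function names, are assumptions for illustration. A lower cost corresponds to higher fluency.

```python
def interpolate_gap(last_pose, first_pose, n_frames):
    """Linearly blend from last_pose to first_pose, returning the in-between poses."""
    frames = []
    for k in range(1, n_frames + 1):
        a = k / (n_frames + 1)
        frames.append([(1 - a) * p + a * q for p, q in zip(last_pose, first_pose)])
    return frames


def jerk_cost(trajectory, dt):
    """Approximate the integral of squared jerk over a sampled trajectory.

    trajectory is a list of poses, each pose a list of joint values.
    """
    cost = 0.0
    for t in range(len(trajectory) - 3):
        for j in range(len(trajectory[0])):
            q0, q1, q2, q3 = (trajectory[t + i][j] for i in range(4))
            jerk = (q3 - 3 * q2 + 3 * q1 - q0) / dt ** 3
            cost += jerk * jerk * dt
    return cost


bridge = interpolate_gap([0.0], [4.0], n_frames=3)
smooth = [[0.0]] + bridge + [[4.0]]
twitchy = [[0.0], [1.0], [5.0], [3.0], [4.0]]
print(jerk_cost(smooth, dt=1.0), jerk_cost(twitchy, dt=1.0))
```

A candidate sequence would then be kept only if its cost falls below a preset limit, which is one way to read the requirement that fluency exceed a preset value.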

[0045] In a second aspect, the present invention provides a system for a digital human to host a program using AI technology, comprising an identification unit that identifies key semantics in a program schedule containing process information and program content.

[0046] The definition unit defines the key semantics as digital genes that drive the digital human's actions and behaviors, and assigns a semantic body temperature value to each digital gene based on the importance of the key semantics.

[0047] The computing unit matches the corresponding action expression to the digital gene in the action library; when performing gene action expression, a Brownian motion mechanism whose intensity is negatively correlated with the semantic body temperature is introduced to define the expression space and wandering boundary of the digital gene in the virtual environment.

[0048] The evolutionary unit establishes a negative feedback control strategy for the semantic body temperature on the evolutionary mechanism by performing fixed rounds of genetic and mutation evolution on the digital genes, and expands the expression space of the aligned positions by aligning the parent genes and offspring genes.

[0049] The output unit, within the expanded expression space, randomly generates a set of candidate action expression sequences with fluency higher than a preset value, and then connects and supplements the actions in the sequence to control the hosting behavior of the digital human.

[0050] In a third aspect, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, it implements any step of the method for a digital human to host a program using AI technology as described in the first aspect of the present invention.

[0051] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, it implements any step of the method for a digital human to host a program using AI technology as described in the first aspect of the present invention.

[0052] The beneficial effects of this invention are as follows: By introducing "semantic body temperature" as a continuous control variable throughout the entire process, this invention transforms the traditional discrete semantics-driven approach into an adjustable dynamic control mechanism, achieving an adaptive balance between stability and flexibility in the digital human's hosting actions. Compared with existing rule-based or template-based methods, the semantic body temperature negatively regulates the Brownian motion diffusion intensity and the genetic mutation probability, so that key semantics keep their action expression precise and restrained in high-importance stages while the natural variability of actions is released in low-importance stages, thereby improving the continuity and realism of the action performance. At the same time, by introducing the Jacobian null-space projection operator, random perturbations are strictly confined to redundant degrees of freedom that do not affect the core action constraints, effectively avoiding action distortion and clipping. Furthermore, this invention combines probabilistic solution space expansion with a jerk-based smoothness screening mechanism to automatically select the optimal expression path from multiple candidate action sequences, further improving the smoothness and visual quality of the generated actions. Overall, this invention constructs a digital human action generation method that combines semantic consistency, physical plausibility, and dynamic flexibility, and has promising application prospects.

Attached Figure Description

[0053] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0054] Figure 1 is a flowchart of the method for a digital human to host programs using AI technology.

[0055] Figure 2 is a flowchart of the gene mutation mechanism in the method for a digital human to host programs using AI technology.

[0056] Figure 3 is a diagram of a computer device for the method for a digital human to host programs using AI technology.

Detailed Implementation

[0057] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0058] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0059] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0060] Referring to Figures 1-3, as one embodiment of the present invention, this embodiment provides a method for a digital human to host a program using AI technology, comprising the following steps:

[0061] S1: Identify the key semantics in the program schedule, based on a program schedule containing process information and program content.

[0062] The program schedule is input into a pre-trained semantic coding model and mapped to a high-dimensional temporal tensor sequence. The pre-trained semantic coding model can be any of the following: a semantic coding network based on a Transformer structure, a bidirectional context coding network, or a temporal semantic coding network with positional encoding. Local nodes, segments, or subsequences in the high-dimensional temporal tensor sequence correspond to different semantic units. The candidate semantic nodes, key semantics, and digital genes described below are all identified and extracted based on their corresponding local representations in the high-dimensional temporal tensor sequence; each is represented as a high-dimensional continuous vector.

[0063] The training samples for the pre-trained semantic encoding model can consist of historical program schedules, host scripts, program flow charts, director prompts, scene control instructions, and the corresponding manually annotated semantic tags. Each training sample includes at least: the program segment text, its position in the flow, the surrounding context, time sequence information, and the corresponding semantic category or semantic importance annotation. To enable the model to jointly understand the flow semantics and contextual relationships in the program schedule, the original program data can be segmented: continuous program text is divided into multiple semantic units according to host segments, time nodes, or semantic pauses, and a temporal association is then established between each semantic unit and its upstream and downstream segments to form serialized training samples. The training set can be generated through the following steps: cleaning the original program data, semantic unit segmentation, flow alignment, label construction, and sample augmentation. Label construction involves manual or semi-automatic annotation of program segments by segment type, semantic role, action triggering tendency, emotional tendency, and criticality. Sample augmentation includes operations such as synonym rewriting, flow rearrangement, partial masking, and noise insertion to improve the model's robustness to different wordings and program structures. On this basis, the samples can be divided into training, validation, and test sets according to a preset ratio, for example 70% training, 15% validation, and 15% test. The training set is used to learn the model parameters, the validation set to tune hyperparameters and prevent overfitting, and the test set to evaluate the model's generalization ability on the program schedule semantic encoding task.

[0064] For each candidate semantic node in the tensor sequence, the conditional probability of the node triggering a specific digital human action in the action intent mapping network is calculated, and the action information entropy density of that node is calculated from it, where y represents the action category and P represents the conditional probability.
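As an illustration of how an entropy-style measure behaves on such a conditional distribution, the Python sketch below computes the Shannon entropy of P over the action categories y. Treating the action information entropy density as plain Shannon entropy is an assumption; the exact formula is not legible in the text, and the function name is hypothetical.

```python
import math


def action_entropy(probs):
    """Shannon entropy (in nats) of a distribution over action categories.

    probs holds non-negative scores for each category y; they are normalized
    here so that unnormalized scores are also accepted.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("at least one category must have positive probability")
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


concentrated = action_entropy([0.94, 0.02, 0.02, 0.02])  # points at one action
diffuse = action_entropy([0.25, 0.25, 0.25, 0.25])       # no clear action
print(concentrated, diffuse)
```

A node whose probability mass sits on one action yields a small value, and a node that spreads its mass evenly yields the maximum, `log(number of categories)`.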

[0065] It's important to understand that in this embodiment, actions are not limited to the digital human's limb movements but also include facial expressions, making the expressions more closely resemble the natural performance of a real host. Limb movements include rotation and displacement of the arms, fingers, torso, and head, used to convey gestures, postures, and spatial orientation information; facial expressions include subtle changes in the eyes, eyebrows, lips, and facial muscles, used to convey emotions, tone, and reactions. Incorporating both types of actions into the action library and generation process ensures that the digital human not only moves richly and smoothly during program hosting but also possesses natural and coherent emotional expression, thereby enhancing overall hosting performance and audience immersion. In specific implementation, both limb movements and facial expressions are driven by digital genes (described later) to achieve synchronous generation and coordinated control within redundant degrees of freedom, ensuring consistency and natural transition between actions and expressions in time and semantics.

[0066] The training samples for the conditional probability are derived from semantic-action pairing data. Semantic units segmented from the program schedule or hosting script are used as input samples, and the digital human action that corresponds to each semantic unit during real or simulated execution is labeled manually or aligned automatically to obtain an action category label, forming training pairs that establish the semantic-action correspondence. The action categories can be obtained by classifying and labeling historical digital human execution records, manually demonstrated action data, or standard actions in the action library, or by extracting them from videos or skeletal sequences using an action recognition model. Furthermore, to improve the stability of the conditional probability estimate, multiple action instances corresponding to the same semantic unit in different contexts, scenarios, or hosting styles can be statistically aggregated into a frequency distribution or weight distribution, giving the empirical probability distribution of the semantic unit triggering each action, which serves as the training target. During training, by minimizing the cross-entropy loss between the predicted and true distributions, the model learns to output the probability distribution over action categories given a semantic representation, which ultimately yields the estimate of the conditional probability.
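The frequency aggregation and the cross-entropy objective mentioned above can be sketched in a few lines of Python. The action names, the function names, and the `eps` guard are hypothetical; the network that produces the predicted distribution is not modeled here.

```python
import math
from collections import Counter


def empirical_distribution(observed_actions, categories):
    """Relative frequency of each action category for one semantic unit."""
    counts = Counter(observed_actions)
    total = len(observed_actions)
    return [counts[c] / total for c in categories]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between the target and predicted distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


categories = ["wave", "nod", "point"]
# Actions observed for the same semantic unit in four different contexts.
target = empirical_distribution(["wave", "wave", "nod", "point"], categories)
uniform_guess = [1 / 3, 1 / 3, 1 / 3]
print(target, cross_entropy(target, uniform_guess), cross_entropy(target, target))
```

The loss is smallest when the prediction equals the empirical distribution, which is why minimizing it drives the model toward the aggregated frequencies.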

[0067] Simultaneously, in the global context self-attention matrix, a mask perturbation is applied to the candidate semantic nodes, and the rate of change of the attention weight gradient caused by the mask perturbation is calculated.

[0068] Furthermore, the global contextual self-attention matrix (GFALM) refers to the weight allocation matrix calculated globally by the pre-trained semantic coding model when encoding the program schedule, which characterizes the strength of the association between any semantic unit and all other semantic units. Essentially, it reflects how much attention a given position in the program schedule should pay to other positions during semantic understanding. For example, when the program schedule contains semantic segments such as "award announcement," "guest introduction," and "commercial break," these segments are not isolated. The model dynamically determines which segments are more crucial for understanding the current segment by combining the sequence of events before and after them, the contextual semantics, and the global flow relationship. This degree of importance is reflected in the element values of the GFALM. Each element in the matrix typically represents the attention weight of the i-th semantic node to the j-th semantic node, so the matrix can express the global coupling relationship between all semantic units within the program schedule. If a candidate semantic node has a strong influence on multiple subsequent nodes in the GFALM, or if the entire attention distribution changes significantly after the node is masked, this indicates that the node has strong global control significance. On this basis, indicators such as the action information entropy can be further combined to identify it as a key semantic element rather than an ordinary semantic component. The matrix therefore plays the role of a "global semantic traction probe" in this method, used to measure the degree of influence of a candidate semantic node on the overall semantic structure of the program.

[0069] Specifically, the text content, flow fields, and time sequence information of the program schedule are usually first input into a pre-trained semantic encoding model, and each semantic unit is encoded into a corresponding high-dimensional vector representation. Then, the model generates a query vector, key vector, and value vector for each semantic unit, and calculates the similarity score between the query vector and the key vector of other semantic units one by one. After scaling and normalization, these scores form the attention weight distribution of the current semantic unit to all other semantic units. The weight distribution of all semantic units is summarized to form the global context self-attention matrix.
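The construction of the attention matrix and the effect of masking one node can be illustrated with the Python sketch below. It makes several assumptions: the query and key vectors are passed in directly rather than produced by learned projections, and `mask_sensitivity` is a finite-difference proxy (mean absolute change in attention weights after masking) rather than the gradient change rate computed by the method. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; masked entries (-inf) receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys, masked=None):
    """Scaled dot-product attention weights; row i is node i's view of all nodes."""
    d = len(keys[0])
    rows = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if masked is not None:
            scores[masked] = -math.inf  # hide the perturbed node from every query
        rows.append(softmax(scores))
    return rows


def mask_sensitivity(queries, keys, node):
    """Mean absolute change in the other nodes' attention when `node` is masked."""
    base = attention_matrix(queries, keys)
    perturbed = attention_matrix(queries, keys, masked=node)
    n = len(queries)
    diffs = [abs(base[i][j] - perturbed[i][j])
             for i in range(n) if i != node for j in range(n)]
    return sum(diffs) / len(diffs)


# Node 0 carries a key that every query aligns with; nodes 1 and 2 do not.
queries = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
keys = [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(mask_sensitivity(queries, keys, 0), mask_sensitivity(queries, keys, 1))
```

Masking the node that the rest of the sequence depends on redistributes far more attention than masking a peripheral node, which matches the intuition behind using the perturbation response as a criterion.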

[0070] Candidate semantic nodes whose action information entropy density is less than a first preset threshold and whose gradient change rate is greater than a second preset threshold are extracted and clustered into the key semantics. Here, a low action information entropy density means that the appearance of this word (node) points very definitively to one or more specific actions (highly concentrated energy, not divergent). A high gradient change rate means that if this word is masked (masking perturbation), the contextual attention weights of the entire sentence will oscillate dramatically, proving that it is a connecting "structural skeleton".
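The two-threshold screening can be written as a short Python sketch. The `Node` fields, the threshold values, and the function name are hypothetical, and the subsequent clustering of the selected nodes into key semantics is omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    time: float           # position on the program timeline, in seconds
    entropy: float        # action information entropy density
    gradient_rate: float  # attention weight gradient change rate under masking


def select_key_nodes(nodes, entropy_max, gradient_min):
    """Keep nodes that are both decisive (low entropy) and structural (high gradient)."""
    return [n for n in nodes
            if n.entropy < entropy_max and n.gradient_rate > gradient_min]


candidates = [
    Node(time=0.0, entropy=0.2, gradient_rate=0.9),   # decisive and structural
    Node(time=4.0, entropy=1.4, gradient_rate=0.8),   # structural but ambiguous
    Node(time=9.0, entropy=0.3, gradient_rate=0.1),   # decisive but peripheral
]
key_nodes = select_key_nodes(candidates, entropy_max=1.0, gradient_min=0.5)
print([n.time for n in key_nodes])
```

Both conditions must hold at once, so a node that satisfies only one of them is left as an ordinary candidate.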

[0071] Because key semantics are unevenly distributed across time segments, the action-driving signals can become sparse and discontinuous along the timeline; this method addresses that problem. Since key semantics are obtained through information entropy and attention gradient filtering, they inherently tend to capture nodes with strong directional influence on the global semantic structure. Therefore, in some program segments, they may be sparsely distributed with excessively large intervals, leaving the corresponding time periods without an effective basis for driving actions. This leads to pauses, abrupt transitions, or rhythmic imbalances in the digital human's actions. On the timeline, the time interval between adjacent key semantics is calculated, and a constraint is set on this interval: between key semantics whose time interval is greater than a preset interval, candidate semantic nodes are extracted so that the time interval meets the constraint. This mechanism introduces the necessary intermediate semantic support nodes without disrupting the original key semantic structure, providing a stable input sequence for subsequent action generation. As a result, digital human actions evolve more smoothly over time, and discontinuous or unnatural jumps in action expression caused by semantic sparsity are avoided, improving the overall coherence and viewing quality of the hosting behavior.

[0072] In the process of extracting candidate semantic nodes, the second preset threshold is decreased by a second preset step size and the first preset threshold is increased by a first preset step size, and the two adjustments are applied alternately until the extraction of candidate semantic nodes is complete. "Alternately" here means that the two screening conditions are neither relaxed at the same time, nor is only one of them relaxed; instead, the two thresholds are adjusted in turn, so that the completion of candidate semantic nodes expands gradually and controllably.

[0073] If two thresholds are significantly relaxed simultaneously, it can easily introduce a large number of ordinary semantic nodes that are not critical enough, leading to over-completeness and causing the key semantic set to lose its sparse and high-value characteristics. By adjusting them alternately, one type of core condition can be retained as much as possible first, and then the other type of condition can be gradually relaxed, thus achieving a hierarchical and gradual completion mechanism. This can both fill the gaps in key semantics on the timeline and avoid the disorderly expansion of candidate semantic nodes, ensuring that the completed key semantic sequence still has strong semantic representativeness and action-driven capabilities.
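A minimal sketch of the alternating relaxation is given below. The names `Node`, `over_long_gaps`, and `fill_gaps` are hypothetical, as are the choice to relax the entropy threshold first and the `max_rounds` safety limit; the text does not fix either.

```python
from typing import NamedTuple

class Node(NamedTuple):
    time: float             # position on the program timeline, in seconds
    entropy_density: float
    gradient_rate: float

def over_long_gaps(nodes, max_interval):
    """Return (start, end) pairs of adjacent nodes that are too far apart."""
    times = sorted(n.time for n in nodes)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_interval]

def fill_gaps(keys, candidates, entropy_max, gradient_min,
              entropy_step, gradient_step, max_interval, max_rounds=20):
    selected = list(keys)
    relax_entropy = True  # assumption: the entropy threshold is relaxed first
    for _ in range(max_rounds):
        gaps = over_long_gaps(selected, max_interval)
        if not gaps:
            break
        # Relax exactly one threshold per round, taking turns.
        if relax_entropy:
            entropy_max += entropy_step
        else:
            gradient_min -= gradient_step
        relax_entropy = not relax_entropy
        for c in candidates:
            passes = c.entropy_density < entropy_max and c.gradient_rate > gradient_min
            in_gap = any(a < c.time < b for a, b in gaps)
            if passes and in_gap and c not in selected:
                selected.append(c)
    return sorted(selected, key=lambda n: n.time)
```

Candidates are admitted only inside intervals that violate the constraint, so segments that are already dense keep their original sparse key-semantic set.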

[0074] The action intent space includes a multi-dimensional space where each basic action intent of the digital human is used as a coordinate node, and the physical location is arranged according to the similarity of the basic action intents between nodes. This results in a space where the spatial distance between intent nodes with similar action performances is shorter, and the spatial distance between intent nodes with greater differences in action performances is greater. When a candidate semantic node is mapped to this action intent space, if its conditional probability distribution is highly concentrated in a local neighborhood in terms of spatial distance, then the certainty of triggering the action is high, and the output action information entropy density is smaller. If the conditional probability distribution is diffuse across multiple distant nodes in terms of spatial distance, then the output action information entropy density is larger.
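The relationship between the concentration of the conditional distribution and the resulting entropy can be illustrated with a short sketch. Plain Shannon entropy is used here as a stand-in, since the text does not give the exact definition of the action information entropy density; in particular, the spatial-distance weighting described above is not modeled, and the function name `action_entropy` is hypothetical.

```python
import math

def action_entropy(probs):
    """Shannon entropy (in nats) of a distribution over action intent nodes.
    The input is renormalized, so unnormalized scores are accepted."""
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)

concentrated = [0.97, 0.01, 0.01, 0.01]  # mass on one intent node
diffuse = [0.25, 0.25, 0.25, 0.25]       # mass spread over distant nodes
print(action_entropy(concentrated) < action_entropy(diffuse))
```

A node with a concentrated distribution yields a smaller value and would therefore pass the first preset threshold more easily.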

[0075] S2: Define the key semantics as digital genes that drive the actions and behaviors of the digital human, and assign semantic temperature values to each digital gene based on the importance of the key semantics.

[0076] Furthermore, determining the importance level includes, for each independent program segment, extracting the gradient change rate of the attention weight and the action information entropy density corresponding to candidate semantic node i.

[0077] A gradient magnitude adjustment weight and an inverse-entropy magnitude adjustment weight are preset. These weights balance the order-of-magnitude difference between the gradient change rate and the reciprocal of the action information entropy density in the raw calculation.

[0078] A segmented transformation model is introduced to adjust the weights in different program segments; using the adjusted weights, the gradient change rate of the attention weight and the reciprocal of the action information entropy density are linearly weighted and summed to obtain the comprehensive importance coefficient of each candidate semantic node within an independent program segment.

[0079] The calculated comprehensive importance coefficient is positively mapped to the semantic body temperature T of the activated digital gene.

[0080] It should be noted that a program segment may not revolve around a single core content in the action intention space, but may contain multiple parallel or alternating core contents. If a fixed inverse-entropy weight is still used for a unified calculation in that case, the weight is easily shared among multiple core semantics, so that the importance of each core node is weakened on average. In the subsequent semantic temperature assignment this shows up as an overall low semantic temperature, a phenomenon referred to as "hypothermia." Once hypothermia occurs, key semantics that should maintain high stability and control strength are misjudged during the action generation stage as freely diffusible, highly variable ordinary semantics, weakening the digital human's ability to express multiple core contents accurately. To address this, this invention uses a spatial clustering algorithm to analyze the distribution structure of action information entropy density within the action intent space of the current program segment, extracts the number of density centers n to represent the actual number of core contents in the current segment, and on this basis applies an n-fold transformation to the inverse-entropy weight, so that the reciprocal-of-entropy term is adaptively enhanced in multi-core scenarios. This avoids the collective deweighting of core semantics when multiple content cores coexist and ensures that each core semantic remains sufficiently important and semantically relevant. As a result, the digital human can maintain the accuracy, hierarchy, and stability of its actions in program segments with multiple themes, transitions, or key points.
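The importance calculation and the n-fold adjustment can be sketched as below. The function names, the weight values, the small `eps` guard, and the linear temperature mapping are all assumptions made for illustration; the text only requires a linear weighted sum and a positive (monotonically increasing) mapping.

```python
def importance(gradient_rate, entropy_density, w_grad, w_entropy, n_cores=1, eps=1e-9):
    """Linear weighted sum of the gradient change rate and the reciprocal
    of the entropy density. The inverse-entropy weight is multiplied by
    the number of density centers n found by clustering."""
    return w_grad * gradient_rate + (n_cores * w_entropy) / (entropy_density + eps)

def semantic_temperature(coefficient, scale=1.0):
    """Assumed positive mapping: any monotonically increasing function would do."""
    return scale * coefficient

single = importance(0.5, 0.25, w_grad=1.0, w_entropy=0.1, n_cores=1)
multi = importance(0.5, 0.25, w_grad=1.0, w_entropy=0.1, n_cores=3)
print(semantic_temperature(single), semantic_temperature(multi))
```

With three content cores the inverse-entropy term is tripled, which counteracts the averaging effect that would otherwise push every core node toward a low temperature.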

[0081] S3: In the action library, match the corresponding action expression to each digital gene; when performing the action expression of a gene, introduce a Brownian motion mechanism that is triggered by, and negatively correlated with, the semantic body temperature, so as to define the solution space and wander boundary of the digital gene in the virtual environment.

[0082] Specifically, the action representation elevates the discrete mapping between key semantics and specific actions into a probability distribution model with uncertain expressive capabilities, thus providing a foundation for subsequent action mutation, diffusion control, and multi-solution selection. This includes generating a probability distribution for each action corresponding to a key semantic based on a pre-trained Bayesian algorithm, defined as the action probability solution space for the corresponding digital gene, where each solution represents an action. This effectively solves the action rigidity problem caused by the "one-to-one correspondence between semantics and actions" in traditional methods, enabling the system to possess diverse expressive capabilities while maintaining semantic consistency. Simultaneously, this probability solution space provides an operational mathematical basis for subsequently introducing Brownian motion perturbations, semantic thermoregulation, and genetic mutation evolution, allowing different candidate actions to undergo random walks and competitive selection under controlled conditions. Furthermore, through a probability normalization mechanism, a unified evaluation scale can be established among multiple candidate actions, providing a basis for the generation and optimization of the final action sequence, thereby achieving a balance between the flexibility, rationality, and controllability of action representation.

[0083] It should be noted that the training samples for "generating the probability distribution of each key semantic corresponding to an action based on a pre-trained Bayesian algorithm" can consist of paired data of "key semantics and action expressions". Specifically, raw data can be extracted from historical program schedules, hosting scripts, director's instructions, digital human execution logs, action library call records, and manually recorded hosting demonstration videos. First, the program text is segmented according to hosting segments, semantic pauses, or time nodes to obtain multiple semantic units. Then, combined with manual annotation or automatic alignment results, each semantic unit is matched with its corresponding action category, action pose, or action segment in actual execution, thereby forming sample pairs, in which the first element is a semantic representation of the key or candidate semantics and the second element is the corresponding action label, action category, or action parameter cluster. To improve the reliability of probability distribution learning, multiple action instances corresponding to the same key semantic in different program scenarios, contexts, and hosting styles can be summarized and statistically analyzed to construct an empirical distribution of the actions triggered by that key semantic, which can then be used as the prior or supervision target of the Bayesian model. The training set can be generated according to the process of "cleaning the original program data - semantic unit segmentation - action alignment - action clustering or classification labeling - sample denoising - sample enhancement". Action alignment can be achieved through timestamp synchronization, subtitle alignment, skeletal frame alignment, or manual verification. Action clustering can group similar poses or similar action segments into the same action category to reduce sample sparsity. Sample enhancement can include semantic rewriting, synonym replacement, slight action perturbation, context splicing, and the like, to improve the robustness of the model to different forms of expression. After the samples are constructed, they can be divided into a training set, a validation set, and a test set according to a preset ratio. For example, 70% is used as the training set to learn the mapping from key semantics to action distribution, 15% is used as the validation set to adjust the prior parameters, smoothing coefficient, and model structure, and 15% is used as the test set to evaluate the consistency between the generated action probability distribution and the real action distribution, yielding a pre-trained Bayesian model that outputs the action probability distribution corresponding to each key semantic.
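One simple way to turn such sample pairs into an action probability solution space is a smoothed empirical distribution, sketched below. This is an assumed stand-in, not the model specified by the invention: it uses additive (Laplace) smoothing, which corresponds to the posterior mean of a categorical distribution under a symmetric Dirichlet prior, and the names `fit_action_distribution` and `alpha` are hypothetical.

```python
from collections import Counter, defaultdict

def fit_action_distribution(pairs, actions, alpha=1.0):
    """pairs: iterable of (key_semantic, action_label) samples.
    actions: the full list of action categories in the action library.
    alpha: smoothing coefficient (the kind of parameter tuned on a validation set)."""
    counts = defaultdict(Counter)
    for semantic, action in pairs:
        counts[semantic][action] += 1
    model = {}
    for semantic, c in counts.items():
        total = sum(c.values()) + alpha * len(actions)
        model[semantic] = {a: (c[a] + alpha) / total for a in actions}
    return model

pairs = [("welcome", "wave"), ("welcome", "wave"), ("welcome", "bow")]
model = fit_action_distribution(pairs, actions=["wave", "bow", "point"])
print(model["welcome"])
```

Smoothing keeps unseen actions at a small non-zero probability, so they remain reachable when the solution space is later perturbed and sampled.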

[0084] A Brownian motion mechanism is introduced into the action probability solution space. For each solution in the action probability solution space, the probability corresponding to the solution is multiplied by a fixed magnitude, and the product serves as the base amplitude for jittering that solution. The jitter is then applied in random directions to obtain the jittered solution space. The fixed magnitude is a preset value tied to the precision of the probability: it is 10 to the power of u, with u chosen so that the last digit of the probability's precision is scaled to the units place. Each solution in the action probability solution space is not a discrete label but corresponds to a specific location point in the action space, which can be represented by an action latent space vector. Jittering a solution therefore amounts to continuously updating the coordinates of that point in the solution space with tiny displacements, producing fine-grained spatial changes in the action.

[0085] It should be noted that each action solution in this invention can be represented as a definite position point in the solution space using digital human joint angle vectors, skeletal pose parameters, or action latent space coordinates. The so-called jitter is therefore a small-step numerical update of this position point, rather than an abstract conceptual jump. The random perturbation term can be approximated by generating Wiener process increments with standard Gaussian random sampling, and the diffusion coefficient can be calculated in real time from the semantic body temperature using a simple function. Both are mature and stable calculation methods in existing numerical simulation. At the same time, because the perturbation direction is constrained by the Jacobian matrix and its null space projection operator, the jitter acts only within redundant degrees of freedom that do not affect the core action intent. It can therefore form continuous small displacements in spatial position without destroying the main action structure. Finally, the jitter process can be implemented directly in existing animation engines or character-driven systems by performing position updates frame by frame.

[0086] The Brownian motion mechanism is embodied as the stochastic differential increment of a standard Wiener process.

[0087] The Brownian diffusion coefficient is constructed as a negatively correlated function of the semantic body temperature T:

[0088] where the function contains an adjustment constant and a further constant, and T is the semantic body temperature of the activated digital gene.

[0089] For each independent solution in the action probability solution space, calculate the kinematic Jacobian matrix $J$ of the current solution and its pseudo-inverse matrix $J^{+}$. Using the Jacobian null space projection operator $(I - J^{+}J)$, the Brownian motion increment is restricted to redundant degrees of freedom that do not affect the core intent, generating the action differential step size that defines the wander boundary:

[0090]

[0091] Where $I$ is the identity matrix, representing all degrees of freedom, and the product of the pseudo-inverse and the Jacobian matrix, $J^{+}J$, represents the portion of the degrees of freedom that affects the main task (the current solution).

[0092] The semantic body temperature T is kept constant during gene expression. This ensures that the wander boundary of a high-temperature gene always contracts, guaranteeing the precision of the main action, while the wander boundary of a low-temperature gene always expands, giving the posture flexibility. When the semantic body temperature is in a high-temperature locked state, the Brownian diffusion coefficient approaches zero, which suppresses probability drift and forces the action probability solution space to converge to the peak of the probability distribution, so that a unique and precise main action is output. When the semantic body temperature decreases, the suppression is released and Brownian diffusion is stimulated, causing the peak of the probability distribution to undergo a random walk and topological expansion in the solution space. This expands the sampling boundary of the action probability solution space and gives the digital human free yet statistically consistent, vivid body expression.
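A minimal sketch of a temperature-controlled, null-space-projected jitter step follows. Several points are assumptions rather than the invention's formulas: the diffusion coefficient is taken as `alpha / (T + beta)`, the step is scaled by the usual `sqrt(2 * D * dt)` factor of a discretized diffusion, and the main task is a single scalar constraint so that the Jacobian is one row and its pseudo-inverse has the closed form `J^T / (J J^T)`. All function and parameter names are hypothetical.

```python
import math
import random

def diffusion_coefficient(temperature, alpha=1.0, beta=1.0):
    """Assumed negative-correlation form: higher temperature, smaller diffusion."""
    return alpha / (temperature + beta)

def null_space_project(jacobian_row, vec):
    """Apply (I - J^+ J) to vec for a single-row Jacobian J."""
    jj = sum(j * j for j in jacobian_row)
    if jj == 0.0:
        return list(vec)
    jv = sum(j * v for j, v in zip(jacobian_row, vec))
    return [v - j * jv / jj for j, v in zip(jacobian_row, vec)]

def jitter_step(jacobian_row, temperature, dt=1 / 30, rng=random):
    """One frame of Brownian jitter confined to the redundant degrees of freedom."""
    scale = math.sqrt(2.0 * diffusion_coefficient(temperature) * dt)
    # Wiener increment approximated by scaled standard Gaussian samples.
    noise = [rng.gauss(0.0, 1.0) * scale for _ in jacobian_row]
    return null_space_project(jacobian_row, noise)

J = [1.0, 2.0, -1.0]  # sensitivity of the main task to three joint angles
step = jitter_step(J, temperature=2.0, rng=random.Random(0))
print(sum(j * s for j, s in zip(J, step)))  # task-space change, zero up to rounding
```

Because the projected step is orthogonal to the Jacobian row, the main task value is unchanged to first order, while the joints still move within the redundant directions. For a task with several rows, a general pseudo-inverse routine would replace the closed form used here.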

[0093] S4: By subjecting the digital genes to a fixed number of rounds of genetic and mutational evolution, a negative feedback control strategy of the semantic body temperature over the evolutionary mechanism is established. The solution space of the aligned positions is expanded by aligning parent and offspring genes, as shown in Figure 2.

[0094] The genetic and mutational evolution includes mapping the key semantics through a pre-trained semantic encoding model to generate a high-dimensional continuous vector, which is defined as the underlying semantic feature string of the corresponding digital gene.

[0095] Offspring gene sequences are generated from the same parent gene sequence (genes arranged in the order of the program schedule). Each gene in the sequence undergoes similarity drift within the semantic vector space, resulting in digital gene mutation. The process is called "similarity drift" because it does not re-perform complex discrete semantic matching or full-text semantic comparison. Instead, it directly applies small, continuous spatial displacements to the original semantic vector within the already encoded high-dimensional semantic vector space. Since the pre-trained semantic encoding model maps semantically similar content to regions that are close to each other, controlling the drift direction and magnitude within this vector space naturally keeps the newly generated vector highly similar to the original semantic vector. Traditional semantic similarity calculations typically require comparing, scoring, and ranking multiple candidate texts one by one. By directly manipulating the position of the vector itself, this invention turns the generation of similar semantics into the construction of similar spatial positions, bypassing the repeated explicit similarity calculation. Similarity therefore no longer needs to be recalculated with each mutation; it is implicitly reflected in the geometric distance and drift range within the vector space, hence the term "similarity drift". The name emphasizes both the preservation of the original semantic similarity during mutation and the technical feature of achieving semantic proximity transfer directly through vector displacement. This simplifies the calculation, improves generation efficiency, and keeps the semantics of the mutated offspring within a reasonable neighborhood of the original semantics.

[0096] The negative feedback control strategy includes constructing, respectively, a negative correlation control equation between the drift probability and the semantic body temperature T, and a negative correlation control equation between the drift distance and the semantic body temperature T:

[0097]

[0098]

[0099] where the equations contain a non-negative evolution constant and a very small value that prevents the denominator from being zero. During the gene mutation determination stage, the calculated drift probability is used to determine whether to trigger similarity drift on the underlying semantic feature string of the current digital gene. When similarity drift is triggered, a completely random high-dimensional direction vector is generated in the semantic vector space and, combined with the calculated drift distance, is used to update the spatial position of the initial underlying semantic feature string, generating the mutated offspring semantic feature string:

[0100]

[0101] By utilizing the negative feedback control strategy, digital genes with higher semantic body temperature T are forced to have a lower probability of triggering similarity drift and a shorter drift distance, thereby strictly locking the core host intent at the semantic source; digital genes with lower semantic body temperature T have a higher probability of triggering similarity drift and a longer drift distance with random direction, thereby exploring and deriving rich and semantically related secondary action intents in the semantic vector space.
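The negative feedback on mutation can be sketched as follows. The functional forms `k / (T + eps)` for both drift probability and drift distance are assumptions consistent with the description (a non-negative evolution constant and a small term guarding the denominator); clamping the probability to 1 and normalizing the random direction to unit length are further assumptions, and all names are hypothetical.

```python
import math
import random

def drift_probability(T, k=1.0, eps=1e-6):
    """Assumed form, clamped so that it remains a valid probability."""
    return min(1.0, k / (T + eps))

def drift_distance(T, k=1.0, eps=1e-6):
    return k / (T + eps)

def random_direction(dim, rng):
    """Random direction of unit length, drawn from an isotropic Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def mutate(feature, T, rng=random, k_p=1.0, k_d=1.0):
    """Return the offspring feature string; unchanged if drift is not triggered."""
    if rng.random() >= drift_probability(T, k_p):
        return list(feature)
    direction = random_direction(len(feature), rng)
    d = drift_distance(T, k_d)
    return [x + d * u for x, u in zip(feature, direction)]

rng = random.Random(1)
hot = mutate([0.0] * 8, T=50.0, rng=rng)   # rarely drifts, and only slightly
cold = mutate([0.0] * 8, T=0.5, rng=rng)   # always drifts under these constants
print(hot, cold)
```

A gene with a high semantic body temperature is therefore both less likely to move and moves a shorter distance when it does, which is the "core locking" behavior described above.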

[0102] After decoding the semantic feature string of the mutated offspring, the key semantics corresponding to the mutated gene are obtained. Then, the solution space is constructed and perturbed using the mutated gene to obtain the solution space corresponding to each gene in the mutated gene sequence.

[0103] In essence, by controlling the offset of underlying semantic features in the semantic vector space, the original semantics can generate multiple variant expressions that maintain semantic relevance within their neighborhood, thus providing a multi-source input basis for the construction of the subsequent action probability space. Simultaneously, by constructing a negative feedback mechanism related to semantic temperature, the trigger probability and magnitude of semantic drift are constrained, ensuring that high-importance semantics maintain stable expression and low-importance semantics possess greater exploratory capabilities, thereby achieving a balance between "core locking" and "edge expansion" of semantic expression overall. Furthermore, this mechanism aims to replace the traditional matching process based on explicit similarity calculation by directly manipulating the positional changes of semantic vectors, thereby reducing computational complexity, improving generation efficiency, and ensuring that the expanded semantics remain within a reasonable neighborhood of the original semantics. Finally, by decoding the mutated semantics and participating in the construction of the solution space, the action generation process can obtain richer candidate expression sources under semantic consistency constraints, thus providing a more sufficient foundation for subsequent action selection and optimization.

[0104] Furthermore, expanding the solution space of the alignment positions includes summarizing all gene sequences before and after the mutation, and then, according to the gene position relationships in the gene sequence, matching and integrating genes located at the same gene position one by one to obtain multiple gene expressions for each gene position. The solution space of all action probabilities corresponding to each gene position is released, and the probability distributions of the solution spaces are superimposed and normalized to obtain the final solution space for each gene position.

[0105] Specifically, after the preceding genetic and mutational evolution is completed, there is no longer only one original parent gene sequence, but also multiple offspring gene sequences. Since the gene positions of these sequences are aligned in the program schedule (for example, the first position corresponds to opening semantics, the second position to transition semantics, and the third position to emphasis semantics), there are actually multiple "gene expression results" from different sources for the same gene position. Each gene expression result corresponds to an action probability solution space, that is, a set of action candidates and their probability distribution.

[0106] The term "release" can be understood as follows: the action probability solution spaces are no longer kept separate and closed within their respective parent or offspring paths; instead, the multiple action probability solution spaces at the same gene position are opened up and aggregated into a single common selection layer. When the system subsequently samples actions, it is therefore not limited to the local solution space of a single sequence, but can consider the action possibilities contributed by all genetic and mutation paths at that position. The next step, superimposing and normalizing the probability distributions of the solution spaces, unifies and merges these action candidate probabilities from different sources. For example, suppose that at a given gene position the parent gene provides an action probability solution space in which action A has a probability of 0.6 and action B has a probability of 0.4; one offspring gene provides a solution space in which action A has a probability of 0.2 and action C has a probability of 0.8; and another offspring gene provides a solution space in which action B has a probability of 0.5 and action C has a probability of 0.5. The probabilities at the same position are then summed by action category: action A becomes 0.6 + 0.2 = 0.8, action B becomes 0.4 + 0.5 = 0.9, and action C becomes 0.8 + 0.5 = 1.3. The summed result, which totals 3.0, is then normalized so that the probabilities of all candidate actions again sum to 1, giving approximately 0.267 for A, 0.300 for B, and 0.433 for C, and forming a unified action probability solution space for that gene position. This final solution space no longer depends on a single gene sequence but represents the "overall action expression probability" after integrating all genetic and variation results at that position.
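The superposition and normalization in this example can be reproduced with a few lines of code. The function name `merge_solution_spaces` and the dictionary representation of a solution space are illustrative assumptions.

```python
from collections import Counter

def merge_solution_spaces(spaces):
    """Sum probabilities per action across all sources at one gene position,
    then rescale so the merged probabilities sum to 1."""
    total = Counter()
    for space in spaces:
        for action, p in space.items():
            total[action] += p
    z = sum(total.values())
    return {action: p / z for action, p in total.items()}

parent = {"A": 0.6, "B": 0.4}
offspring_1 = {"A": 0.2, "C": 0.8}
offspring_2 = {"B": 0.5, "C": 0.5}
merged = merge_solution_spaces([parent, offspring_1, offspring_2])
print({a: round(p, 3) for a, p in merged.items()})
# {'A': 0.267, 'B': 0.3, 'C': 0.433}
```

Action C, which the parent gene never proposed, ends up as the most likely candidate, showing how the offspring paths contribute new action possibilities to the merged space.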

[0107] Without this step, each parent or offspring sequence can only independently maintain its own range of action choices, and subsequent action construction will still "follow a single path to the end," failing to truly leverage the diversity advantages brought by genetic evolution. However, through solution space release and probability superposition normalization, the system can unify and merge action expressions generated by multiple semantic variants into a richer, more stable, and statistically more representative candidate space at the same time point. This preserves the core action tendencies of the parent semantics while absorbing new action possibilities brought about by the semantic variations of the offspring, thus providing a more abundant source of actions for subsequent random solution selection, action completion, and smooth screening.

[0108] S5: In the expanded solution space, a set of candidate action expression sequences with a fluency higher than the preset value is randomly generated, and the actions in the sequence are connected and supplemented to control the hosting behavior of the digital human.

[0109] In the final solution space for each gene position, a solution is randomly selected based on the probability corresponding to the action. The discrete action expressions extracted from each gene position are then reproduced according to their temporal position on the program schedule timeline to construct an initial timeline action sequence. It's worth noting that by randomly selecting solutions at each gene position based on probability, a moderate degree of uncertainty can be introduced while maintaining semantic consistency. This ensures that action selection conforms to statistical regularities while avoiding repetition and rigidity caused by fixed paths. Simultaneously, reproducing the discrete actions at each gene position according to the time order of the program schedule establishes a one-to-one correspondence between actions and semantic flow, ensuring that action generation strictly adheres to the time constraints of the program structure, thereby guaranteeing the rhythm and coherence of action expressions within the overall flow. Essentially, this process establishes a mapping between a "multiple solution space" and a "single execution path," compressing multiple candidate sources into an initial action sequence that conforms to temporal logic, providing the basic input for subsequent action completion and smoothing optimization.
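Probability-weighted selection at each gene position, followed by ordering on the timeline, might look like the sketch below. The data layout (a list of start times paired with merged solution spaces) and the function name `sample_timeline` are assumptions for illustration.

```python
import random

def sample_timeline(positions, rng=random):
    """positions: list of (start_time, {action: probability}) per gene position.
    Returns one sampled action per position, ordered by start time."""
    sequence = []
    for start, space in sorted(positions, key=lambda item: item[0]):
        actions = list(space)
        weights = [space[a] for a in actions]
        chosen = rng.choices(actions, weights=weights, k=1)[0]
        sequence.append((start, chosen))
    return sequence

positions = [
    (12.0, {"point": 0.3, "nod": 0.7}),
    (0.0, {"wave": 0.6, "bow": 0.4}),
]
print(sample_timeline(positions, rng=random.Random(0)))
```

Repeated calls with different random states give different but statistically consistent sequences, which is the source of the candidate set that is later screened for fluency.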

[0110] To address the unavoidable temporal discontinuities and pose anomalies inherent in discrete action splicing, this method aims to transform a timeline sequence composed of multiple independent action fragments into a continuous, executable overall action representation. It's important to note that because the actions at each gene position in the preceding steps are selected independently based on probability, they only satisfy semantic correspondence on the time axis, but are not naturally continuous in spatial pose and motion trajectory. Direct splicing can easily lead to abrupt changes, stuttering, or unnatural jumps. By identifying the blank intervals between adjacent actions and using the last frame and the first frame as boundary conditions for interpolation completion, a continuous transition path can be constructed while maintaining the original action semantics, thereby eliminating the visual discomfort caused by action breaks.

[0111] Identify the action gaps between adjacent discrete action expressions in the timeline action sequence, extract the spatial pose of the last frame of the action preceding the action gap and the spatial pose of the first frame of the action following the action gap, and use a spatial interpolation algorithm to complete the action between the last frame and the first frame to generate a continuous and complete action sequence on the timeline.

[0112] Firstly, each discrete action in a timeline motion sequence can be represented as a timestamped skeletal pose sequence, and the "blank spaces" between adjacent actions can be directly detected via the timeline. Simultaneously, each action segment typically contains complete spatial pose information for both the first and last frames, including joint angles, positions, and attitude parameters. This allows for accurate extraction of the last frame pose of a preceding action and the first frame pose of a subsequent action as interpolation boundary conditions. Secondly, given the poses at both ends, existing spatial interpolation methods (such as linear interpolation, spline interpolation, or spherical linear interpolation) can generate intermediate transition frames in joint or pose space. These methods are widely used in 3D animation, character-driven animation, and robot motion planning, ensuring the continuity and stability of the interpolation process. Furthermore, the interpolation process involves only numerical operations on vectors or matrices, allowing for efficient execution under a frame-by-frame update mechanism and direct embedding into existing rendering or control pipelines. By extracting endpoints from the action blank spaces and performing spatial interpolation, smooth transitions between action segments can be achieved without altering the original action semantics, thus enabling the stable generation of continuous and complete timeline motion sequences in engineering practice.
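As a minimal illustration of gap completion, the sketch below applies linear interpolation to joint angle vectors. Linear interpolation is only one of the options named above; for rotations expressed as quaternions, spherical linear interpolation would normally be preferred. The function name `interpolate_gap` and the pose layout are hypothetical.

```python
def interpolate_gap(last_pose, first_pose, n_frames):
    """Generate n_frames intermediate poses strictly between the last frame of
    the preceding action and the first frame of the following action."""
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)  # 0 < t < 1, boundary poses themselves excluded
        frames.append([a + (b - a) * t for a, b in zip(last_pose, first_pose)])
    return frames

# Two joint angles in degrees, three transition frames.
print(interpolate_gap([0.0, 10.0], [4.0, 2.0], n_frames=3))
# [[1.0, 8.0], [2.0, 6.0], [3.0, 4.0]]
```

The number of frames would be chosen from the gap duration and the frame rate, so that the transition occupies exactly the blank interval on the timeline.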

[0113] Because the preceding action generation process involves random perturbations, solution space sampling, and discrete action splicing, it is easy to introduce high-frequency discontinuous changes or implicit jitters in the time dimension. Constraints at the position or velocity level alone are insufficient to fully reflect whether the action conforms to the characteristics of real biological movement. By calculating the third derivative (jerk) of each skeletal joint angle position with respect to time and performing square integration over the entire time axis, the severity of changes in the action trajectory can be quantified, thus transforming "smoothness" into a calculable continuous index. Therefore, the smoothness is calculated using the jerk integral cost function method (calculating the third derivative of each skeletal joint angle position with respect to time and performing square integration over the entire time axis) as a measure of fluency. Priority is given to retaining action sequences with gentle changes and natural transitions, suppressing abrupt behaviors caused by randomness or splicing, making the final output action more coherent, stable, and consistent with the dynamic characteristics of human movement, while providing a unified selection criterion among multiple candidate action sequences.
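The jerk integral cost can be approximated on sampled frames with finite differences, as in the sketch below. The third-order forward difference and the rectangle-rule integration are standard numerical choices assumed here for illustration; the function name `jerk_cost` and the frame layout are hypothetical.

```python
def jerk_cost(trajectory, dt):
    """trajectory: list of frames, each a list of joint angles sampled every dt.
    Returns the sum over joints of the time integral of squared jerk."""
    cost = 0.0
    n_joints = len(trajectory[0])
    for t in range(len(trajectory) - 3):
        for j in range(n_joints):
            q0, q1, q2, q3 = (trajectory[t + k][j] for k in range(4))
            jerk = (q3 - 3 * q2 + 3 * q1 - q0) / dt ** 3  # third finite difference
            cost += jerk * jerk * dt
    return cost

smooth = [[float(i * i)] for i in range(6)]          # constant acceleration
twitchy = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]  # frame-to-frame oscillation
candidates = {"smooth": smooth, "twitchy": twitchy}
best = min(candidates, key=lambda name: jerk_cost(candidates[name], dt=1.0))
print(best)  # smooth
```

A lower cost corresponds to higher fluency, so a sequence would be retained when its cost falls below the bound implied by the preset fluency value.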

[0114] This embodiment also provides a system for digital human-hosted programs using AI technology, including:

[0115] The identification unit identifies key semantics in the program list, which contains process information and program content.

[0116] The definition unit defines the key semantics as digital genes that drive the actions and behaviors of the digital human, and assigns semantic temperature values to each digital gene based on the importance of the key semantics.

[0117] The computing unit matches the corresponding action expression to the digital gene in the action library; when performing the action expression of the gene, a Brownian motion mechanism that is triggered by, and negatively correlated with, the semantic body temperature is introduced to define the expression space and wandering boundary of the digital gene in the virtual environment.

[0118] The evolutionary unit establishes a negative feedback control strategy for the semantic body temperature on the evolutionary mechanism by performing fixed rounds of genetic and mutation evolution on the digital genes, and expands the expression space of the aligned positions by aligning the parent genes and offspring genes.

[0119] The output unit, within the expanded expression space, randomly generates a set of candidate action expression sequences with fluency higher than a preset value, and then connects and supplements the actions in the sequence to control the hosting behavior of the digital human.

[0120] This embodiment also provides a computer device applicable to the method of a digital human hosting a program using AI technology, comprising: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to implement the method of a digital human hosting a program using AI technology as proposed in the above embodiment.

[0121] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.

[0122] This embodiment also provides a storage medium storing a computer program that, when executed by a processor, implements the method for a digital human hosting a program using AI technology as proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0123] In summary, this invention achieves the following: First, it constructs a comprehensive control mechanism from program schedule semantic parsing to action generation and optimization. Key semantics are mapped to digital genes, and semantic temperature is introduced as a unified, continuous control variable to collaboratively regulate random diffusion, semantic variation, and action expression during action generation. Second, it implements similarity drift in the semantic vector space to achieve controlled semantic expansion while maintaining core meaning. Third, it combines the construction and fusion of probabilistic solution spaces to provide a multi-source candidate basis for action generation. Fourth, it utilizes Brownian motion and Jacobian null space projection to achieve continuous perturbation and safety constraints of actions within redundant degrees of freedom, ensuring the flexibility and stability of action expression. Fifth, it forms an initial action sequence on the timeline through probabilistic solution selection and temporal reconstruction, and achieves continuous and smooth optimization of actions through interpolation completion and jerk integral evaluation. Overall, this improves the naturalness, coherence, and expressive diversity of digital human hosting actions.
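The similarity drift summarized above can be illustrated with a short sketch. It assumes exponential decay for both negative correlation equations; the constants `p0`, `d0`, `a`, `b` and the function name `similarity_drift` are hypothetical, as the actual control equations are not given in this section.

```python
import numpy as np

def similarity_drift(feature, T, rng, p0=0.5, d0=0.2, a=1.0, b=1.0):
    """Possibly mutate a semantic feature vector.

    feature: underlying semantic feature string as a 1-D vector.
    T: semantic body temperature; higher T means less drift.
    """
    # Assumed negative correlation equations (exponential decay in T).
    drift_probability = p0 * np.exp(-a * T)
    drift_distance = d0 * np.exp(-b * T)
    # Mutation decision: no drift leaves the parent vector unchanged.
    if rng.random() >= drift_probability:
        return feature.copy()
    # Random direction in the semantic vector space, normalized to unit length.
    direction = rng.normal(size=feature.shape)
    direction /= np.linalg.norm(direction)
    # Move the feature vector by the drift distance along that direction.
    return feature + drift_distance * direction
```

Under these assumptions, a high-temperature gene rarely mutates and moves only a short distance when it does, which keeps its core meaning close to the parent; decoding the drifted vector back to key semantics is outside the scope of this sketch.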

[0124] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for a digital human to host a program using AI technology, characterized in that it comprises: identifying key semantics in the program schedule, which contains process information and program content; defining the key semantics as digital genes that drive the actions and behaviors of the digital human, and assigning each digital gene a semantic temperature value based on the importance of the key semantics; matching corresponding action expressions in the action library according to the digital genes, and, when performing the action expression of a gene, introducing a Brownian motion mechanism that is triggered by and negatively correlated with the semantic body temperature, so as to define the solution space and wandering boundary of the digital genes in the virtual environment; performing fixed rounds of genetic and mutation evolution on the digital genes, establishing a negative feedback control strategy by which the semantic body temperature regulates the evolution mechanism, and expanding the solution space of the aligned positions by aligning the parent genes and the offspring genes; and, in the expanded solution space, randomly generating a set of candidate action expression sequences with a fluency higher than a preset value, and connecting and supplementing the actions in the sequence to control the hosting behavior of the digital human. The action expression includes: generating, based on a pre-trained Bayesian algorithm, a probability distribution of the actions corresponding to each key semantic, which is defined as the action probability solution space of the corresponding digital gene, where each solution represents an action; introducing the Brownian motion mechanism into the action probability solution space, where, for each solution, the probability corresponding to the solution is multiplied by a fixed order of magnitude and used as the base for jittering the solution, and the solution space is then jittered in a random direction to obtain the jittered solution space, the Brownian motion mechanism being embodied as the stochastic differential increment $dW_t$ of a standard Wiener process; constructing the Brownian diffusion coefficient as a negatively correlated function of the semantic body temperature T, whose parameters are an adjustment constant and a further constant, T being the body temperature at which the digital gene is activated; for each independent solution in the action probability solution space, calculating the kinematic Jacobian matrix $J$ of the current solution and its pseudo-inverse matrix $J^{+}$; and using the Jacobian null space projection operator $I - J^{+}J$ to restrict the Brownian motion increment to redundant degrees of freedom that do not affect the core intent, thereby generating the action differential step size that defines the wandering boundary, where $I$ is the identity matrix, representing all degrees of freedom. The negative feedback control strategy includes: constructing a negative correlation control equation between the drift probability and the semantic body temperature, and a negative correlation control equation between the drift distance and the semantic body temperature; in the gene mutation determination stage, triggering, based on the drift probability, similarity drift on the underlying semantic feature string of the current digital gene; when similarity drift is triggered, generating a completely random high-dimensional direction vector in the semantic vector space and, combined with the calculated drift distance, updating and reconstructing the spatial position of the initial underlying semantic feature string to generate the mutated offspring semantic feature string; and decoding the mutated offspring semantic feature string to obtain the key semantics corresponding to the mutated gene.

2. The method for digital human hosting a program using AI technology as described in claim 1, characterized in that: The program schedule is input into a pre-trained semantic coding model and mapped to a high-dimensional temporal tensor sequence. For each candidate semantic node in the tensor sequence, calculate the conditional probability that it will trigger a specific digital human action in the action intent space, which is used as the action information entropy density of the node. Simultaneously, in the global context self-attention matrix, a mask perturbation is applied to the candidate semantic nodes, and the rate of change of attention weight gradient caused by the mask perturbation is calculated. Candidate semantic nodes whose action information entropy density is less than a first preset threshold and whose gradient change rate is greater than a second preset threshold are extracted and clustered into the key semantics. On the timeline, the time interval between adjacent key semantics is calculated, and the interval constraint is set: between key semantics where the time interval is greater than a preset interval, candidate semantic nodes are extracted so that the time interval satisfies the constraint; In the process of extracting candidate semantic nodes, the second preset threshold is decreased by a preset step size of 2 and the first preset threshold is increased by a preset step size of 1, and the step size is changed alternately to achieve the extraction and completion of candidate semantic nodes. The action intention space includes a multi-dimensional quantity space obtained by arranging the physical positions of each basic action intention of the digital human as coordinate nodes and using the similarity of the basic action intentions between nodes as the distance.

3. The method for digital human hosting a program using AI technology as described in claim 2, characterized in that: the importance level includes: for each independent program segment, extracting the gradient change rate of the attention weight and the action information entropy density corresponding to candidate semantic node i; adjusting one preset weight to the order of magnitude of the gradient and another preset weight to the order of magnitude of the reciprocal of the entropy; introducing a segmented transformation model to adjust the weights in different program segments; using the adjusted weights to compute a linearly weighted sum of the gradient change rate of the attention weight and the reciprocal of the action information entropy density, which gives the comprehensive importance coefficient of each candidate semantic node within an independent program segment; and positively mapping the calculated comprehensive importance coefficient to the body temperature at which the digital gene is activated. The segmented transformation model includes: analyzing the distribution characteristics of the action information entropy density within the current program segment using a spatial clustering algorithm, and extracting the number n of density centers in the action intent space; and, based on the number n of density centers of the action information entropy density, performing a transformation by a factor of n.

4. The method for digital human hosting a program using AI technology as described in claim 3, characterized in that: The genetic and mutation evolution includes mapping the key semantics in a pre-trained semantic coding model to generate a high-dimensional continuous vector, which is defined as the underlying semantic feature string of the corresponding digital gene. Offspring gene sequences are generated using the same parent gene sequence; wherein each gene in the sequence is mutated by performing similarity shifting on the underlying semantic feature string in the semantic vector space.

5. The method for hosting a program using a digital human through AI technology as described in claim 4, characterized in that: The process of expanding the solution space of the alignment position includes summarizing all gene sequences before and after the mutation, and integrating all genes in the sequence according to the gene position relationship in the gene sequence, matching and integrating genes located at the same gene position one by one to obtain multiple gene expressions at each gene position. The solution space of all action probabilities corresponding to each gene position is released, and the probability distributions of the solution space are superimposed and normalized to obtain the final solution space for each gene position.
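For illustration, the superposition and normalization recited in claim 5 can be sketched for a single gene position. The dictionary representation, the action names, and the function name `merge_solution_spaces` are assumptions made for this example only.

```python
def merge_solution_spaces(distributions):
    """Superimpose and normalize action distributions at one gene position.

    distributions: list of dicts mapping action name -> probability, one per
    gene (parent or offspring) aligned at this position.
    """
    merged = {}
    for dist in distributions:
        for action, p in dist.items():
            # Superposition: probabilities of the same action add up.
            merged[action] = merged.get(action, 0.0) + p
    total = sum(merged.values())
    # Normalization: rescale so the merged probabilities sum to 1.
    return {action: p / total for action, p in merged.items()}

parent = {"wave": 0.6, "nod": 0.4}
offspring = {"wave": 0.2, "bow": 0.8}
final_space = merge_solution_spaces([parent, offspring])
```

In this example the offspring contributes an action ("bow") that the parent never produced, so the final solution space is wider than either input while remaining a valid probability distribution.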

6. The method for digital human hosting a program using AI technology as described in claim 5, characterized in that: in the final solution space of each gene position, the solution for each gene position is randomly selected according to the probability corresponding to the action; the discrete action expressions extracted from each gene position are arranged according to their temporal position on the program schedule timeline to construct an initial timeline action sequence; the action gaps between adjacent discrete action expressions in the timeline action sequence are identified, the last-frame spatial pose of the preceding action and the first-frame spatial pose of the subsequent action are extracted, and a spatial interpolation algorithm is used to complete the action between the last frame and the first frame, generating a continuous and complete action sequence on the timeline; and the smoothness is calculated using the jerk integral cost function method and used as the measure of the fluency.

7. A system for a digital human hosting a program using AI technology, based on the method for a digital human hosting a program using AI technology as described in any one of claims 1 to 6, characterized in that it includes: an identification unit that identifies key semantics in a program schedule containing process information and program content; a definition unit that defines the key semantics as digital genes that drive the digital human's actions and behaviors, and assigns semantic temperature values to each digital gene based on the importance of the key semantics; a computing unit that matches the corresponding action expression to the digital gene in the action library and, when performing the action expression of the gene, introduces a Brownian motion mechanism that is triggered by and negatively correlated with the semantic body temperature, so as to define the expression space and wandering boundary of the digital gene in the virtual environment; an evolutionary unit that performs fixed rounds of genetic and mutation evolution on the digital genes, establishes a negative feedback control strategy by which the semantic body temperature regulates the evolutionary mechanism, and expands the expression space of the aligned positions by aligning the parent genes and offspring genes; and an output unit that, within the expanded expression space, randomly generates a set of candidate action expression sequences with fluency higher than a preset value, and then connects and supplements the actions in the sequence to control the hosting behavior of the digital human.

8. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that: When the processor executes the computer program, it implements the steps of the method for hosting a digital human program using AI technology as described in any one of claims 1 to 6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that: When the computer program is executed by the processor, it implements the steps of the method for hosting a digital human program using AI technology as described in any one of claims 1 to 6.