A humanoid robot-assisted intelligent teaching method and system

By constructing target knowledge graphs and learning knowledge graphs, and combining teaching data and student status data, the problem of insufficient characterization of knowledge point structural dependencies and cognitive structure consistency in existing intelligent teaching is solved. This achieves adaptive closed-loop optimization of robot-assisted teaching and improves teaching effectiveness.

CN122221902A · Pending · Publication Date: 2026-06-16 · HENAN FORESTRY VOCATIONAL COLLEGE +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HENAN FORESTRY VOCATIONAL COLLEGE
Filing Date: 2026-03-12
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing intelligent teaching technologies lack a systematic characterization of the structural dependencies between knowledge points and of the consistency of the overall cognitive structure, which makes deep-seated learning problems difficult to identify. Teaching resource recommendations lack structural integrity constraints, and the robot execution layer is not linked to high-level teaching strategies, so a structure-driven adaptive teaching closed loop is difficult to achieve.

Method used

By constructing target knowledge graphs and learning knowledge graphs, and combining teaching data and student status data, a comprehensive assessment of students' cognitive status is achieved, generating a logically coherent set of teaching content, and combining teaching strategies with robot execution path planning to form a closed-loop optimization mechanism.

Benefits of technology

It enables precise identification and dynamic adjustment of students' cognitive states, improves the accuracy and adaptability of teaching decisions, and enhances teaching effectiveness.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a humanoid robot-assisted intelligent teaching method and system, and relates to the technical field of data processing. The method comprises the following steps: obtaining teaching data and student state data through a humanoid robot; constructing a target knowledge graph according to the teaching data; constructing a learning knowledge graph according to the student state data; judging whether the learning state of a student meets the teaching target condition based on the target knowledge graph and the learning knowledge graph; if yes, re-obtaining teaching data and student state data through the humanoid robot; otherwise, matching a teaching demand and a teaching resource library according to the target knowledge graph and the learning knowledge graph to generate a teaching content set; generating a teaching strategy according to the teaching content set; generating a robot teaching execution path according to the teaching strategy; and executing the robot teaching execution path.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to an intelligent teaching method and system assisted by a humanoid robot.

Background Technology

[0002] With the continuous development of artificial intelligence, service robot, and intelligent sensing technologies, humanoid robots are gradually moving from laboratory research to practical applications in education, healthcare, and public services. In the education field, robot-assisted teaching, as an important component of intelligent education, can participate in classroom activities through voice interaction, action demonstration, multimodal perception, and real-time feedback, thereby alleviating teacher workload and enhancing classroom interaction to some extent.

[0003] However, in actual teaching, teaching objectives usually have clear knowledge structure levels and dependencies, while students' cognitive states exhibit dynamic changes and structural differences. How to accurately depict the correspondence between the structure of teaching objectives and the cognitive structure of students in a teaching environment involving robots has become a key technical problem that urgently needs to be solved in the field of intelligent teaching.

[0004] Existing intelligent teaching and robot-assisted teaching technologies generally focus on evaluation based on the accuracy of answers or the probability of knowledge mastery, mainly analyzing at the level of single knowledge points. They lack a systematic characterization of the structural dependencies between knowledge points and the consistency of the overall cognitive structure, making it difficult to identify deep-seated learning problems caused by broken prerequisite relationships or incomplete structural loops. At the same time, existing teaching resource recommendations are mostly based on semantic similarity or simple rule matching, without constraining structural integrity, which easily leads to fragmented teaching content. In addition, the robot execution layer usually adopts preset actions or simple interaction logic, lacking an optimized control mechanism that links with higher-level teaching strategies, resulting in a disconnect between teaching decisions and physical execution, making it difficult to achieve a truly structure-driven adaptive teaching loop.

Summary of the Invention

[0005] In view of the shortcomings of the prior art, the purpose of this invention is to provide a humanoid robot-assisted intelligent teaching method that addresses the following problems. Existing intelligent teaching and robot-assisted teaching technologies generally evaluate students by answer accuracy or knowledge mastery probability and analyze mainly at the level of single knowledge points. They lack a systematic characterization of the structural dependencies between knowledge points and of the consistency of the overall cognitive structure, which makes it difficult to identify deep-seated learning problems caused by broken prerequisite relationships or incomplete structural loops. Existing teaching resource recommendations are mostly based on semantic similarity or simple rule matching and do not constrain structural integrity, which easily fragments the teaching content. In addition, the robot execution layer usually adopts preset actions or simple interaction logic and lacks an optimized control mechanism linked to high-level teaching strategies, so teaching decisions are disconnected from physical execution and a truly structure-driven adaptive teaching loop is difficult to achieve.

[0006] A first aspect of this invention provides an intelligent teaching method assisted by a humanoid robot, comprising:

[0007] S1: Acquire teaching data and student status data through a humanoid robot;
S2: Construct a target knowledge graph based on the teaching data;
S3: Construct a learning knowledge graph based on the student status data;
S4: Based on the target knowledge graph and the learning knowledge graph, determine whether the student's learning status meets the teaching objective conditions; if yes, return to step S1; otherwise, proceed to step S5;
S5: Based on the target knowledge graph and the learning knowledge graph, match the teaching needs and the teaching resource library to generate a set of teaching content;
S6: Generate a teaching strategy based on the set of teaching content;
S7: Generate a robot teaching execution path based on the teaching strategy;
S8: Execute the robot teaching execution path.

[0008] A second aspect of this invention provides an intelligent teaching system assisted by a humanoid robot, comprising a processor and a memory. The memory stores programs or instructions that can run on the processor and that, when executed by the processor, implement the steps of the humanoid robot-assisted intelligent teaching method described in the first aspect.

[0009] The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following: In this embodiment of the invention, by constructing a dual-graph structure model of a target knowledge graph and a learning knowledge graph and determining their consistency, a comprehensive assessment of students' cognitive state at the node level and structural dependency level is achieved. This enables accurate identification of knowledge gaps and dependency breaks. Based on the differences between the two graphs, teaching needs and teaching resource libraries are structurally matched to generate a logically coherent and targeted set of teaching content, improving the accuracy and efficiency of resource allocation. By combining teaching strategies with robot execution path planning, a closed-loop optimization mechanism is formed from cognitive diagnosis, strategy generation to physical execution, enabling the robot to dynamically adjust its teaching behavior according to the student's state, thereby improving the accuracy of teaching decisions, its adaptability, and the overall teaching effect.

Description of the Drawings

[0010] The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Throughout the drawings, the same reference numerals denote the same parts. Obviously, the drawings described below are merely some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without any creative effort.

[0011] Figure 1 is a flowchart of an intelligent teaching method assisted by a humanoid robot, provided in an embodiment of the present invention.

[0012] Figure 2 is a schematic diagram of the structure of an intelligent teaching system assisted by a humanoid robot, provided in an embodiment of the present invention.

Detailed Implementation

[0013] To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0014] The intelligent teaching method assisted by humanoid robots provided in this invention will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios.

[0015] Referring to Figure 1 of the accompanying drawings, a flowchart of an intelligent teaching method assisted by a humanoid robot, provided by an embodiment of the present invention, is shown.

[0016] This invention provides an intelligent teaching method assisted by a humanoid robot, which may include the following steps:

S1: Acquire teaching data and student status data through a humanoid robot.

[0017] In one possible implementation, the teaching data specifically includes: teaching content, knowledge point structure, teaching progress, teaching methods, audio explanations, and blackboard writing.

[0018] Student status data specifically includes: answer results, response time, eye movement trajectory, posture changes, facial features, voice feedback, and operational behavior.

[0019] Specifically, the humanoid robot acts as a data acquisition terminal during the teaching process. First, it acquires teaching content text, course knowledge structure information, current teaching progress indicators, and the type of teaching method used through its built-in voice acquisition module, classroom content access interface, and electronic whiteboard interactive system. Then, it uses a voice recognition module to transcribe the lecture speech in real time, and a whiteboard recognition algorithm to perform structured analysis of the blackboard image or electronic whiteboard data, thus forming a teaching data set. Simultaneously, the robot continuously observes students through a visual perception module (including a camera and depth sensor), obtains student answer results and timestamps through the question-and-answer system interface, calculates response time, extracts eye-tracking trajectory data through an eye-tracking algorithm, extracts posture change features through a human posture recognition model, extracts facial expression features through an expression recognition model, acquires voice feedback information through a voice acquisition and emotion analysis module, and collects student operational behavior data on touch devices or the teaching system through an interactive terminal or operation recording module, thus forming a student state data set containing cognitive performance and behavioral characteristics.

[0020] S2: Construct a target knowledge graph based on teaching data.

[0021] In one possible implementation, S2 specifically includes:

S201: Divide the teaching data to obtain a set of teaching segments.

[0022] Specifically, the teaching content text, the transcript of the explanation, and the blackboard images or electronic blackboard data are aligned in chronological order, and the continuous teaching process is segmented according to a preset time window to form multiple teaching segments, thus obtaining a set of teaching segments. Each teaching segment contains text semantic information, speech feature information, and blackboard structure information within a corresponding time range. Those skilled in the art can set the size of the preset time window according to actual needs, and this invention does not limit this.
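As a non-limiting sketch of the time-window segmentation in S201, the following Python groups time-stamped teaching events into fixed-length segments. The function name `segment_by_window`, the event tuple layout, and the modality labels are assumptions made for illustration and do not appear in the application.

```python
from collections import defaultdict


def segment_by_window(events, window_seconds):
    """Group time-stamped teaching events into fixed-length teaching segments.

    events: iterable of (timestamp_seconds, modality, payload) tuples, where
    modality is a label such as "text", "speech" or "board" (hypothetical names).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets = defaultdict(lambda: defaultdict(list))
    # Sort by timestamp so payloads inside a segment stay in chronological order.
    for timestamp, modality, payload in sorted(events, key=lambda e: e[0]):
        buckets[int(timestamp // window_seconds)][modality].append(payload)
    # Windows that contain no events are simply skipped.
    return [
        {
            "start": index * window_seconds,
            "end": (index + 1) * window_seconds,
            "content": {m: list(p) for m, p in by_modality.items()},
        }
        for index, by_modality in sorted(buckets.items())
    ]


events = [
    (3.0, "text", "definition of a limit"),
    (12.5, "speech", "recall the definition of a limit"),
    (31.0, "board", "lim f(x)"),
    (44.0, "speech", "now the derivative"),
]
segments = segment_by_window(events, window_seconds=30)
```

With a 30-second window, the first two events fall into one segment and the last two into the next; each segment keeps its text, speech, and board content separately for the later embedding step.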

[0023] S202: Perform semantic embedding processing on the set of teaching segments to obtain a candidate set of concepts.

[0024] Specifically, semantic features are extracted from each teaching segment in the set of teaching segments, and the text content, speech-to-text, and blackboard recognition results are converted into vectorized representations. Semantic similarities or synonyms in different segments are normalized, and a set of candidate concepts is obtained.

[0025] S203: Perform concept saliency screening on the candidate concept set to obtain a node set, and generate node attribute vectors for all nodes in the node set.

[0026] The node attribute vector includes semantic embedding representation, frequency of occurrence, degree of emphasis, and importance score.

[0027] Specifically, for each candidate concept in the candidate concept set, its frequency of occurrence in the teaching segment set, cross-segment stability, and degree of matching with the course syllabus or teaching objectives are statistically analyzed. A concept saliency score is calculated based on these indicators (by normalizing and weighting the frequency of occurrence, cross-segment stability, target matching degree, and emphasis level). Candidate concepts with saliency scores lower than a preset score are eliminated, and candidate concepts with semantic repetition or overlapping expressions are merged. Finally, concepts that meet the saliency score requirements are retained, forming a node set. Each node in the node set serves as a knowledge point entity in the target knowledge graph.

[0028] It should be noted that those skilled in the art can set the preset score according to actual needs, and this invention does not limit this.
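The saliency screening of S203 can be sketched as follows. The application states only that the indicators are normalized and weighted, so the min-max normalization, the equal weights, the indicator keys, and the function names here are illustrative assumptions.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def saliency_scores(candidates, weights):
    """Weighted sum of min-max normalized indicators for each candidate concept."""
    names = list(candidates)
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        normalized = min_max([candidates[name][indicator] for name in names])
        for name, value in zip(names, normalized):
            scores[name] += weight * value
    return scores


def select_nodes(candidates, weights, preset_score):
    """Keep candidates whose saliency score is not lower than the preset score."""
    scores = saliency_scores(candidates, weights)
    return {name: s for name, s in scores.items() if s >= preset_score}


candidates = {
    "limit": {"frequency": 8, "stability": 0.9, "target_match": 1.0, "emphasis": 0.8},
    "derivative": {"frequency": 5, "stability": 0.7, "target_match": 0.9, "emphasis": 0.6},
    "homework": {"frequency": 1, "stability": 0.1, "target_match": 0.0, "emphasis": 0.1},
}
weights = {"frequency": 0.25, "stability": 0.25, "target_match": 0.25, "emphasis": 0.25}
nodes = select_nodes(candidates, weights, preset_score=0.5)
```

In this toy input, "homework" scores lowest on every indicator and is eliminated, while "limit" and "derivative" are retained as nodes.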

[0029] S204: Generate a set of candidate edges based on the set of nodes.

[0030] Specifically, any two different nodes in the node set are paired to construct directed node pairs as candidate relation pairs. Based on their temporal order, co-occurrence, and semantic association in the teaching segment, a preliminary set of candidate edges that may have dependency or semantic association relationships is generated.

[0031] S205: Calculate the conditional information gain for all node pairs in the candidate edge set.

[0032] For example, for any candidate directed node pair $(v_i, v_j)$, where $i$ and $j$ are node indices, the information gain brought by the occurrence of node $v_j$ under the condition that node $v_i$ has already appeared is calculated from their joint occurrence probability and conditional occurrence probability in the teaching segments. This gain measures the dependence contribution of node $v_i$ to node $v_j$. By calculating the conditional information gain of all candidate node pairs, the directional association strength index of each candidate edge is obtained.

[0033] S206: Based on the conditional information gain, filter the candidate edge set to obtain the target edge set.

[0034] Specifically, the conditional information gain value of each node pair is compared with a preset gain. When the conditional information gain is greater than the preset gain and has significant directionality, the corresponding directed edge relationship is retained. If the conditional information gain is low or the directionality is not obvious, the candidate relationship is deleted, thus obtaining a target edge set with directional dependence.

[0035] It should be noted that those skilled in the art can set the preset gain according to actual needs, and this invention does not limit this.
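Steps S205 and S206 can be illustrated with the sketch below. The application does not give a formula for the conditional information gain, so this example assumes a pointwise form, log2 of P(dst in segment t | src in segment t-1) divided by P(dst in segment t), and treats "significant directionality" as a margin between the forward and reverse gains. Both choices, and all function and parameter names, are assumptions.

```python
import math
from itertools import permutations


def conditional_gain(segments, src, dst):
    """Assumed gain: log2 of P(dst in segment t | src in segment t-1) / P(dst in segment t)."""
    pairs = list(zip(segments[:-1], segments[1:]))
    if not pairs:
        return 0.0
    n_src = sum(1 for prev, _ in pairs if src in prev)
    n_dst = sum(1 for _, cur in pairs if dst in cur)
    n_both = sum(1 for prev, cur in pairs if src in prev and dst in cur)
    if n_src == 0 or n_dst == 0:
        return 0.0  # no evidence either way
    if n_both == 0:
        return float("-inf")  # dst never follows src
    return math.log2((n_both / n_src) / (n_dst / len(pairs)))


def select_edges(segments, nodes, preset_gain, margin):
    """Keep directed pairs whose gain exceeds the preset gain and clearly beats the reverse."""
    kept = {}
    for src, dst in permutations(nodes, 2):
        forward = conditional_gain(segments, src, dst)
        backward = conditional_gain(segments, dst, src)
        if forward > preset_gain and forward - backward > margin:
            kept[(src, dst)] = forward
    return kept


# Each teaching segment is represented by the set of concepts it mentions.
segments = [
    {"limit"}, {"limit", "derivative"}, {"derivative"},
    {"limit"}, {"derivative", "integral"}, {"integral"},
]
edges = select_edges(segments, ["limit", "derivative", "integral"],
                     preset_gain=0.5, margin=0.5)
```

Here "derivative" follows "limit" in every window where "limit" appeared, so only the directed edge from "limit" to "derivative" survives the filter.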

[0036] S207: Calculate the relation strength weights of all edges in the target edge set to obtain the weighted edge set.

[0037] Specifically, for the retained directed edge relationships, the strength of each edge is calculated by further combining multi-dimensional teaching characteristics such as the co-occurrence frequency of nodes in the teaching segment, order stability, degree of speech emphasis, and degree of connection of the blackboard structure. By normalizing and integrating the above indicators, the relationship strength value of each edge is obtained, thus forming a weighted edge set.

[0038] S208: Generate edge attribute vectors for each edge in the target edge set.

[0039] The edge attribute vector includes relation type identifiers (such as precedence relation, containment relation, analogy relation, etc.), time order features (average difference in the order of node appearance), stability features (consistency index of cross-segment appearance), and directional saliency identifiers.

[0040] S209: Combine the node set, node attribute vector, target edge set, edge attribute vector, and weighted edge set to generate the target knowledge graph.

[0041] Specifically, by unifying and integrating the node set, node attribute vector, edge set, edge attribute vector, and weighted edge set, a target knowledge graph model that simultaneously contains structural relationship information and multi-dimensional feature information can be formed. This enables the unified expression of the dependency structure, relationship strength, and semantic attributes between knowledge points, thereby improving the completeness and computability of knowledge representation.
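One possible in-memory representation of the combined graph from S209 is sketched below. The class name `KnowledgeGraph`, its fields, and the attribute values are hypothetical; the application does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Container for the node set, edge set, attribute vectors and edge weights."""
    node_attrs: dict = field(default_factory=dict)    # node -> attribute vector
    edge_attrs: dict = field(default_factory=dict)    # (src, dst) -> attribute vector
    edge_weights: dict = field(default_factory=dict)  # (src, dst) -> relation strength

    def add_node(self, name, attrs):
        self.node_attrs[name] = list(attrs)

    def add_edge(self, src, dst, weight, attrs):
        # An edge is only valid between nodes already present in the node set.
        if src not in self.node_attrs or dst not in self.node_attrs:
            raise KeyError("both endpoints must be added as nodes first")
        self.edge_weights[(src, dst)] = float(weight)
        self.edge_attrs[(src, dst)] = list(attrs)

    def prerequisites(self, node):
        """Nodes with a directed edge into `node`, strongest relation first."""
        incoming = [(s, w) for (s, d), w in self.edge_weights.items() if d == node]
        return [s for s, _ in sorted(incoming, key=lambda item: -item[1])]


graph = KnowledgeGraph()
graph.add_node("limit", [0.9, 8, 0.8, 1.0])        # illustrative attribute values
graph.add_node("derivative", [0.7, 5, 0.6, 0.73])
graph.add_edge("limit", "derivative", weight=0.85,
               attrs=["precedence", 1.0, 0.9, 1])
```

Keeping weights and attribute vectors keyed by the same `(src, dst)` tuple means the learning knowledge graph of S3 can reuse the same container with mastery-based values.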

[0042] In this embodiment of the invention, by systematically analyzing teaching data and constructing a target knowledge graph, scattered teaching content can be transformed into a structured, hierarchical knowledge network, clarifying the dependencies between knowledge points and the teaching objectives.

[0043] S3: Construct a learning knowledge graph based on student status data.

[0044] Specifically, the student status data is first processed into time segments, and information such as answer results, response time, eye movement trajectory, posture changes, facial features, voice feedback, and operational behaviors are mapped to the node set in the target knowledge graph. Based on the mapping results, the occurrence or mastery status of nodes in each behavioral segment is represented, thereby generating candidate node pairs. By calculating the conditional information gain or association strength between node pairs, edge relationships with directional dependence are selected, and edge weights are calculated in conjunction with the node mastery index. At the same time, a node attribute vector reflecting its actual mastery level is generated for each node, and an edge attribute vector containing relation strength and relation type embeddings is generated for each edge. Finally, the node set, edge set, weighted edge set, node attribute matrix, and edge attribute matrix are integrated to generate a learning knowledge graph reflecting the student's current cognitive structure and mastery level.

[0045] It should be noted that the construction process of the learning knowledge graph is consistent with the construction process of the target knowledge graph in step S2 in terms of overall methodological framework. Both include steps such as data fragmentation processing, semantic or state embedding modeling, node generation, candidate edge construction, direction filtering, relation strength calculation, and node and edge attribute generation, thereby ensuring the consistency between the target knowledge graph and the learning knowledge graph in terms of structural form, attribute dimensions, and representation space. The difference lies in that the input data of step S2 is teaching data, used to characterize the ideal knowledge structure and target mastery requirements in the instructional design, thereby generating a target knowledge graph with target mastery attribute. The input data of step S3 is student state data, used to characterize the actual mastery status and cognitive relationships of students at each knowledge node, thereby generating a learning knowledge graph with actual mastery attribute.

[0046] In this embodiment of the invention, by performing structured modeling on student status data and constructing a learning knowledge graph, the scattered information of students in answering questions, behavioral feedback and cognitive characteristics can be transformed into a computable cognitive structure representation, accurately depicting the degree of mastery of students at each knowledge node and the state of knowledge association.

[0047] S4: Based on the target knowledge graph and the learning knowledge graph, determine whether the student's learning status meets the teaching objective conditions. If yes, return to step S1. Otherwise, proceed to step S5.

[0048] It should be noted that the target knowledge graph is used to depict the ideal knowledge structure and dependencies in the instructional design, while the learning knowledge graph reflects the student's cognitive structure and mastery status at the current moment. Since the two types of graphs maintain consistency in node semantic space and structural representation, a computable correspondence exists between the target knowledge structure and the student's cognitive structure. Therefore, this invention can dynamically evaluate teaching effectiveness based on the structural differences between the two graphs. By determining the consistency between the target knowledge graph and the learning knowledge graph, it can be judged whether the student's current cognitive state simultaneously meets the mastery requirements and dependency constraints set by the instructional objectives at both the node level and the structural dependency level. If the determination result indicates that the learning state meets the conditions of the instructional objectives, the system returns to the data collection step to continue the subsequent stages of instructional content development or to continuously monitor the student's cognitive state. If the determination result indicates the existence of knowledge deviations or structural breaks, the system enters the instructional resource matching and strategy generation step to provide targeted intervention for the unmet knowledge nodes or dependencies.

[0049] In one possible implementation, S4 specifically includes:

S401: Encode the target knowledge graph and the learning knowledge graph respectively to obtain the initial node embedding set of the target graph, the initial edge embedding set of the target graph, the initial node embedding set of the learning graph, and the initial edge embedding set of the learning graph.

[0050] Specifically, for each node in both the target knowledge graph and the learning knowledge graph, a node embedding representation is obtained by mapping its node attribute vector through a node encoding network. Simultaneously, for each edge in the graph, a corresponding edge embedding representation is obtained by mapping its edge attribute vector through an edge encoding network. Through these node and edge encoding operations, the discrete attribute information in the original graph structure is converted into continuous vector representations, thereby obtaining the initial node embedding set and initial edge embedding set of the target graph, and the initial node embedding set and initial edge embedding set of the learning graph, respectively.

[0051] S402: Based on the initial node embedding set of the target graph, the initial edge embedding set of the target graph, the initial node embedding set of the learning graph, and the initial edge embedding set of the learning graph, perform in-graph message propagation within the target knowledge graph and the learning knowledge graph respectively to obtain the enhanced node embedding set of the target graph and the enhanced node embedding set of the learning graph.

[0052] Specifically, in any graph, for each node, its set of neighboring nodes is first determined. Then, for each neighboring node, an in-graph message is constructed based on the current node embedding, neighboring node embeddings, and edge embeddings between the two nodes. The in-graph messages from all neighboring nodes are then aggregated to obtain the current node's structural context vector. Finally, the current node embedding and the structural context vector are input into an update function for fusion and updating, generating the next round of node embeddings. After a predetermined number of iterative propagations, the embedding representation of each node includes not only its own attribute information but also its dependencies, adjacency relationships, and structural roles within the graph structure, thus obtaining the enhanced node embedding set for the target graph and the enhanced node embedding set for the learning graph, respectively.
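The propagation loop described above can be sketched in plain Python. The application leaves the message and update functions unspecified (they would typically be learned networks), so this example substitutes a fixed `tanh` message and an averaging update, and treats nodes with an edge pointing into the current node as its neighbors. All of these are assumptions for illustration.

```python
import math


def propagate(node_emb, edge_emb, rounds=2):
    """Run in-graph message propagation for a fixed number of rounds.

    node_emb: {node: [float, ...]}, edge_emb: {(src, dst): [float, ...]},
    with all vectors sharing the same dimension.
    """
    current = {node: list(vec) for node, vec in node_emb.items()}
    for _ in range(rounds):
        updated = {}
        for node, own in current.items():
            # One message per incoming edge, built from the current node
            # embedding, the neighbor embedding and the edge embedding.
            messages = [
                [math.tanh(h + n + e) for h, n, e in zip(own, current[src], emb)]
                for (src, dst), emb in edge_emb.items()
                if dst == node
            ]
            if not messages:
                updated[node] = list(own)  # no neighbors: keep the embedding
                continue
            # Aggregate messages into a structural context vector (mean).
            context = [sum(col) / len(messages) for col in zip(*messages)]
            # Stand-in update function: average of own embedding and context.
            updated[node] = [0.5 * h + 0.5 * c for h, c in zip(own, context)]
        current = updated
    return current


node_emb = {"limit": [1.0, 0.0], "derivative": [0.0, 1.0]}
edge_emb = {("limit", "derivative"): [0.0, 0.0]}
enhanced = propagate(node_emb, edge_emb, rounds=1)
```

After one round, "derivative" carries information from its prerequisite "limit", while "limit", which has no incoming edges, is unchanged.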

[0053] S403: Based on the enhanced node embedding set of the target graph and the enhanced node embedding set of the learning graph, perform cross-graph attention matching on the target knowledge graph and the learning knowledge graph to obtain the cross-graph attention matching matrix.

[0054] Specifically, for each node in the target graph, the similarity between it and the embedded representations of each node in the learning graph is calculated, and the similarity is normalized to generate corresponding matching weights. All matching weights are aggregated to form a cross-graph attention matching matrix, which is used to characterize the node-level soft alignment relationship between the target knowledge graph and the learning knowledge graph.

[0055] The cross-graph attention matching matrix is as follows:

$$A_{pi} = \frac{\exp\left(s_h\left(\mathbf{h}_i^{\mathrm{tar}}, \mathbf{h}_p^{\mathrm{lrn}}\right)\right)}{\sum_{q} \exp\left(s_h\left(\mathbf{h}_i^{\mathrm{tar}}, \mathbf{h}_q^{\mathrm{lrn}}\right)\right)}$$

[0056] where $A_{pi}$ represents the matching weight of the $p$-th learning node in the learning knowledge graph for the $i$-th target node in the target knowledge graph, i.e., an element of the cross-graph attention matching matrix $A$; $\exp$ represents the exponential function; $s_h(\cdot)$ represents the similarity function; $\mathbf{h}_i^{\mathrm{tar}}$ represents the enhanced node embedding vector of the $i$-th target node in the target knowledge graph; $\mathbf{h}_p^{\mathrm{lrn}}$ represents the enhanced node embedding vector of the $p$-th learning node in the learning knowledge graph; and $q$ is the index variable used to traverse all nodes in the learning knowledge graph.
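The normalization of similarities into matching weights can be sketched as a softmax over the learning-graph nodes for each target node. The dot-product similarity and the function names are assumptions; the application only requires some similarity function.

```python
import math


def dot(u, v):
    """Dot-product similarity, used here as a stand-in similarity function."""
    return sum(a * b for a, b in zip(u, v))


def cross_graph_attention(target_emb, learning_emb, similarity=dot):
    """Return matrix A where A[p][i] is the weight of learning node p for target node i."""
    n_learning, n_target = len(learning_emb), len(target_emb)
    A = [[0.0] * n_target for _ in range(n_learning)]
    for i, h_target in enumerate(target_emb):
        scores = [similarity(h_target, h_learning) for h_learning in learning_emb]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for p, value in enumerate(exps):
            A[p][i] = value / total
    return A


target_emb = [[1.0, 0.0], [0.0, 1.0]]
learning_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
A = cross_graph_attention(target_emb, learning_emb)
```

Each column of `A` sums to one, so it can be read as a soft alignment distribution of one target node over the learning-graph nodes.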

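[0057] S404: Calculate the cross-graph matching residual based on the cross-graph attention matching matrix:

[0058] $$\hat{\mathbf{h}}_i = \sum_{p} A_{pi}\,\mathbf{h}_p^{\mathrm{lrn}}, \qquad \mathbf{r}_i = \mathbf{h}_i^{\mathrm{tar}} - \hat{\mathbf{h}}_i$$

[0059] where $\hat{\mathbf{h}}_i$ represents the reconstructed embedding representation of the $i$-th target node in the learning knowledge graph under the action of the cross-graph attention matching matrix; $A_{pi}$ is the matching weight of the $p$-th learning node for the $i$-th target node; $\mathbf{h}_i^{\mathrm{tar}}$ and $\mathbf{h}_p^{\mathrm{lrn}}$ are the enhanced node embedding vectors of the target node and the learning node, respectively; $p$ is the index variable over the nodes in the learning knowledge graph; and $\mathbf{r}_i$ represents the cross-graph matching residual vector of the $i$-th target node.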

[0060] S405: Calculate the node matching energy based on the cross-graph matching residual.

[0061] Specifically, for each node in the target knowledge graph, the magnitude or norm of its cross-graph matching residual vector is calculated to characterize the alignment deviation of that node between the target knowledge structure and the learned knowledge structure. Subsequently, the residual magnitudes of all nodes are accumulated or weighted to obtain the overall node matching energy, which reflects the overall matching degree of the learned knowledge graph relative to the target knowledge graph at the node level. In this embodiment of the invention, the smaller the node matching energy, the closer the learned structure is to the target structure at the node representation level. The larger the node matching energy, the more significant the knowledge point deviation or missing information.
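A minimal sketch of S404 and S405 follows, assuming the residual is the target embedding minus its attention-weighted reconstruction from the learning graph, and that the energy is a weighted sum of Euclidean residual norms. The function name and the optional per-node weights are hypothetical.

```python
import math


def node_matching_energy(A, target_emb, learning_emb, node_weights=None):
    """Return (energy, residual_norms); A[p][i] weights learning node p for target node i."""
    n_learning = len(learning_emb)
    dim = len(target_emb[0])
    weights = node_weights or [1.0] * len(target_emb)
    residual_norms = []
    for i, h_target in enumerate(target_emb):
        # Reconstruct target node i from the learning-graph embeddings.
        reconstructed = [
            sum(A[p][i] * learning_emb[p][d] for p in range(n_learning))
            for d in range(dim)
        ]
        residual = [t - r for t, r in zip(h_target, reconstructed)]
        residual_norms.append(math.sqrt(sum(x * x for x in residual)))
    energy = sum(w * n for w, n in zip(weights, residual_norms))
    return energy, residual_norms


A = [[1.0, 0.0], [0.0, 1.0]]              # hard one-to-one alignment for clarity
target_emb = [[1.0, 0.0], [0.0, 1.0]]
learning_emb = [[1.0, 0.0], [0.0, 0.0]]   # second knowledge point not yet learned
energy, norms = node_matching_energy(A, target_emb, learning_emb)
```

The per-node norms localize the deviation: the first node matches exactly, and the whole energy comes from the second, unlearned node.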

[0062] S406: Calculate the structural closure energy based on the cross-graph attention matching matrix and the edge relationships of the target knowledge graph.

[0063] Specifically, for any edge $(v_i, v_j)$ with a dependency relationship in the target knowledge graph, the soft matching distributions of target nodes $v_i$ and $v_j$ in the learning knowledge graph are determined from the cross-graph attention matching matrix. The weighted reachability strength between the matching node sets is then calculated using the weighted adjacency relationship of the learning knowledge graph. The weighted reachability strength is compared with the expected strength of the corresponding edge in the target knowledge graph to characterize the structural closure consistency of the target dependency in the learning knowledge graph. The deviations of all target dependency edges are accumulated to obtain the structural closure energy, which measures whether the learning knowledge graph satisfies the dependency constraints of the target knowledge graph at the structural level.

[0064] The structural closure energy may be expressed, for example, as the accumulated squared deviation:

[0065] $$E_{\mathrm{struct}} = \sum_{(i,j) \in \mathcal{E}^{\mathrm{tar}}} \left(\mathbf{a}_i^{T}\, W^{\mathrm{lrn}}\, \mathbf{a}_j - w_{ij}^{\mathrm{tar}}\right)^{2}$$

[0066] where $E_{\mathrm{struct}}$ represents the structural closure energy; $\mathcal{E}^{\mathrm{tar}}$ represents the set of edges in the target knowledge graph; $A$ represents the cross-graph attention matching matrix; $\mathbf{a}_i$ represents the $i$-th column vector of $A$, i.e., the matching weight vector of the $i$-th target node in the learning graph; $\mathbf{a}_j$ represents the matching weight vector of the $j$-th target node in the learning graph; $T$ indicates transpose; $W^{\mathrm{lrn}}$ represents the weighted adjacency matrix of the learning knowledge graph; and $w_{ij}^{\mathrm{tar}}$ represents the expected dependence strength of edge $(i, j)$ in the target knowledge graph.
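The comparison between weighted reachability and expected edge strength can be sketched as below. The squared deviation is an assumed choice of deviation measure, and the function name is hypothetical; the application states only that deviations over all target dependency edges are accumulated.

```python
def structural_closure_energy(A, learning_adj, target_edges):
    """Accumulate squared deviations between reachability and expected strength.

    A[p][i]: weight of learning node p for target node i.
    learning_adj[p][q]: weighted adjacency of the learning knowledge graph.
    target_edges: {(i, j): expected dependence strength} in the target graph.
    """
    n_learning = len(A)
    energy = 0.0
    for (i, j), expected in target_edges.items():
        a_i = [A[p][i] for p in range(n_learning)]  # column i of A
        a_j = [A[p][j] for p in range(n_learning)]  # column j of A
        # Weighted reachability strength between the two soft-matched node sets.
        reach = sum(
            a_i[p] * learning_adj[p][q] * a_j[q]
            for p in range(n_learning)
            for q in range(n_learning)
        )
        energy += (reach - expected) ** 2
    return energy


A = [[1.0, 0.0], [0.0, 1.0]]
target_edges = {(0, 1): 0.8}
intact = structural_closure_energy(A, [[0.0, 0.8], [0.0, 0.0]], target_edges)
broken = structural_closure_energy(A, [[0.0, 0.0], [0.0, 0.0]], target_edges)
```

When the learning graph contains the dependency at the expected strength the energy is zero; when the dependency is missing entirely, the energy equals the squared expected strength.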

[0067] S407: Calculate the constraint energy between the target knowledge graph and the learned knowledge graph based on the node matching energy and structural closure energy.

[0068] It should be noted that node matching energy reflects the alignment between the target knowledge structure and the learned knowledge structure at the node representation level, while structural closure energy reflects the degree to which the target dependency relationship is structurally satisfied in the learned knowledge graph. By weighting or uniformly measuring these two types of energy, the constraint energy value between the target knowledge graph and the learned knowledge graph is obtained. This constraint energy is used to comprehensively evaluate whether the student's current cognitive structure simultaneously meets the teaching objective requirements at both the node and structural levels.

[0069] S408: Determine whether the constraint energy is less than or equal to the preset constraint energy threshold. If yes, the learning state meets the teaching objective conditions, and return to step S1. Otherwise, the learning state does not meet the teaching objective conditions, and proceed to step S5.

[0070] It should be noted that those skilled in the art can set the magnitude of the preset constraint energy threshold according to actual needs, and this invention does not limit it.
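The decision in S407 and S408 can be sketched as follows. The weighted-sum form and the coefficient names `alpha` and `beta` are illustrative assumptions; the description only requires that the two energies be combined and compared against a preset value.

```python
def constraint_energy(node_energy, closure_energy, alpha=0.5, beta=0.5):
    # Assumption: a simple weighted sum of the two energies.
    return alpha * node_energy + beta * closure_energy


def next_step(node_energy, closure_energy, threshold):
    """Return the step to jump to after the check in S408."""
    energy = constraint_energy(node_energy, closure_energy)
    return "S1" if energy <= threshold else "S5"


on_track = next_step(0.2, 0.09, threshold=0.2)   # low energy: keep observing
needs_help = next_step(1.0, 1.0, threshold=0.2)  # high energy: intervene
```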

[0071] In this embodiment of the invention, the consistency assessment between the target knowledge graph and the learning knowledge graph enables students' cognitive status to be evaluated dynamically at the structural level rather than solely on the basis of surface grades. Knowledge mastery deviations and dependency gaps can therefore be identified accurately and targeted intervention triggered in time, which improves the accuracy and timeliness of teaching decisions, strengthens the adaptability of the teaching process, and improves overall regulation efficiency.

[0072] S5: Based on the target knowledge graph and the learning knowledge graph, match the teaching needs with the teaching resource library to generate a set of teaching content.

[0073] In one possible implementation, S5 specifically includes: S501: Construct a teaching demand query graph based on the target knowledge graph and the learning knowledge graph.

[0074] Specifically, a difference analysis is performed on the target knowledge graph and the learning knowledge graph to extract the knowledge nodes that are currently below standard or have broken dependencies, together with their associated edges, and a teaching demand query graph is constructed from them. Its node set contains the knowledge points that need to be strengthened or repaired, and its edge set represents the dependency or prerequisite relationships that these knowledge points should satisfy in the target knowledge structure. This graph makes explicit "what knowledge needs to be supplemented and what structure needs to be repaired."
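A minimal sketch of the difference analysis in S501 is given below. The dictionary-based graph representation, the two thresholds, and all names are assumptions for illustration only.

```python
def build_demand_query_graph(target_edges, mastery, learned_edges,
                             node_threshold=0.6, edge_threshold=0.5):
    """target_edges: {(pre, post): expected strength} in the target graph.
    mastery: {node: mastery probability} from the learning graph.
    learned_edges: {(pre, post): strength observed in the learning graph}.
    """
    # Knowledge points that are below standard.
    weak_nodes = {n for n, p in mastery.items() if p < node_threshold}
    # Target dependencies that are missing or too weak in the learning graph.
    broken = {e for e in target_edges if learned_edges.get(e, 0.0) < edge_threshold}
    nodes = weak_nodes | {n for e in broken for n in e}
    # Keep the target dependencies among the selected nodes.
    edges = {e: w for e, w in target_edges.items() if e[0] in nodes and e[1] in nodes}
    return nodes, edges


target_edges = {("fractions", "ratios"): 0.9, ("ratios", "percent"): 0.8}
mastery = {"fractions": 0.9, "ratios": 0.4, "percent": 0.8}
learned_edges = {("fractions", "ratios"): 0.7, ("ratios", "percent"): 0.2}
nodes, edges = build_demand_query_graph(target_edges, mastery, learned_edges)
```

Here "ratios" is selected because its mastery is low, and "percent" is pulled in because its dependency on "ratios" is broken even though its own mastery looks adequate.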

[0075] S502: Construct a teaching resource data graph based on the teaching resource database.

[0076] Specifically, each independently accessible smallest teaching unit in the teaching resource repository (such as knowledge explanation segments, example explanations, exercises, experimental demonstrations, or interactive tasks) is abstracted as a resource node. Node attributes include information such as the corresponding knowledge point, difficulty level, teaching type, and resource duration. Simultaneously, resource edges are established based on the logical connections between different resources in terms of knowledge dependence, sequence, inclusion relationships, or ability progression relationships. These edges represent the structural relationships and connection strength between resources, thus forming a teaching resource data graph composed of resource nodes and their structural connections.

[0077] S503: Calculate the matching matrix between the teaching demand query graph and the teaching resource data graph.

[0078] Specifically, a node matching matrix is constructed by calculating metrics such as semantic similarity and structural relevance between query graph nodes and resource graph nodes. Simultaneously, an edge matching matrix is constructed by calculating the degree of correspondence between query graph edges and resource graph edges.

[0079] S504: Divide the matching matrix into a node matching matrix and an edge matching matrix.

[0080] Specifically, the overall matching result obtained in step S503 is split according to the graph structure element type. The part representing the strength of the correspondence between each node in the teaching demand query graph and each resource node in the teaching resource data graph constitutes the node matching matrix, which is used to characterize the semantic matching degree. The part representing the strength of the correspondence between each edge in the query graph and each edge in the resource graph constitutes the edge matching matrix, which is used to characterize the structural relationship matching degree, thereby realizing separate modeling and constraint of node semantic matching and edge structural matching.

[0081] S505: Construct the node feature consistency loss function and the edge feature consistency loss function based on the node matching matrix and the edge matching matrix, respectively:

[0082]

[0083] where $L_{\mathrm{node}}$ denotes the node feature consistency loss function; $L_{\mathrm{edge}}$ denotes the edge feature consistency loss function; $V_Q$ denotes the set of query nodes in the teaching demand query graph; $x_a$ denotes the feature vector of query graph node $a$; $P_{ab}$ denotes the matching weight between query node $a$ and resource node $b$; $y_b$ denotes the feature vector of resource graph node $b$; $E_Q$ denotes the set of query edges in the teaching demand query graph; $S_{ab}$ denotes the matching weight relating query edge endpoint $a$ to resource edge endpoint $b$; $w^{R}_{bl}$ denotes the weight of edge $(b, l)$ in the teaching resource data graph; $S_{kl}$ denotes the matching weight relating query edge endpoint $k$ to resource edge endpoint $l$; and $w^{Q}_{ak}$ denotes the weight of edge $(a, k)$ in the teaching demand query graph.

[0084] It should be noted that those skilled in the art can set the magnitudes of the matching weights and of the edge weights according to actual needs, and this invention does not limit them.

[0085] S506: Constructing the structural consistency constraint loss function:

[0086]

[0087] where $L_{\mathrm{struct}}$ denotes the structural consistency constraint loss; $p_a$ denotes the matching weight vector of query node $a$ in the resource graph, that is, the corresponding row vector of the node matching matrix $P$; $A_R$ denotes the weighted adjacency matrix of the teaching resource data graph; $p_k$ denotes the matching weight vector of query graph node $k$ in the resource graph; and $\top$ denotes transpose.

[0088] S507: Construct the total loss function based on the structural consistency constraint loss function, the edge feature consistency loss function, and the node feature consistency loss function.

[0089] Specifically, the node feature consistency loss, edge feature consistency loss, and structural consistency constraint loss are weighted and fused to construct a unified total loss function. By setting the weight coefficients of each loss term, the semantic matching degree and the structural matching degree are balanced, thereby forming an overall optimization objective that comprehensively evaluates the quality of the current matching results.
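The weighted fusion in S507 can be illustrated with the sketch below. Because the loss formulas themselves are not reproduced in this text, the squared-difference forms, the reuse of the node matching matrix `P` in the edge term, and the weight tuple are all assumptions chosen to agree with the symbol definitions in S505 and S506, not the definitive losses.

```python
def node_loss(P, X, Y):
    # Matching-weighted squared distance between query and resource features.
    return sum(P[a][b] * sum((xa - yb) ** 2 for xa, yb in zip(X[a], Y[b]))
               for a in range(len(X)) for b in range(len(Y)))


def edge_loss(P, WQ, WR):
    # Compare query edge (a, k) with resource edge (b, l), weighted by the
    # match strength of both endpoint pairs.
    return sum(P[a][b] * P[k][l] * (wq - wr) ** 2
               for (a, k), wq in WQ.items() for (b, l), wr in WR.items())


def structural_loss(P, WQ, AR):
    # Row a of P is the matching weight vector of query node a.
    n = len(AR)
    total = 0.0
    for (a, k), wq in WQ.items():
        reach = sum(P[a][r] * AR[r][s] * P[k][s] for r in range(n) for s in range(n))
        total += (reach - wq) ** 2
    return total


def total_loss(P, X, Y, WQ, WR, AR, weights=(1.0, 1.0, 1.0)):
    l_node, l_edge, l_struct = weights
    return (l_node * node_loss(P, X, Y) + l_edge * edge_loss(P, WQ, WR)
            + l_struct * structural_loss(P, WQ, AR))


X = [[1.0], [0.0]]            # query node features
Y = [[1.0], [0.0]]            # resource node features
WQ = {(0, 1): 1.0}            # query edge weights
WR = {(0, 1): 1.0}            # resource edge weights
AR = [[0.0, 1.0], [0.0, 0.0]]
good = total_loss([[1.0, 0.0], [0.0, 1.0]], X, Y, WQ, WR, AR)
bad = total_loss([[0.0, 1.0], [1.0, 0.0]], X, Y, WQ, WR, AR)
```

The swapped matching is penalized twice: its features disagree, and the dependency it implies runs against the resource graph.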

[0090] S508: The optimal matching subgraph is obtained by minimizing the total loss function.

[0091] Specifically, by iteratively updating the node matching matrix and edge matching matrix through optimization algorithms, the total loss function is minimized. Under the premise of satisfying matching constraints, the set of resource nodes in the teaching resource data graph that best matches the teaching demand query graph and their corresponding structural relationships are determined, thereby obtaining the optimal matching subgraph.

[0092] S509: Perform structural integrity verification on the optimal matching subgraph. If the verification passes, proceed to step S510. Otherwise, adjust the constraint parameters of the structural consistency constraint loss function and return to step S507.

[0093] Specifically, the optimal matching subgraph is checked against structural constraints such as connectivity, dependency chain integrity, and key node coverage to determine whether it meets the structural integrity requirements of the teaching demand query graph. If dependency breaks or missing key nodes exist, the weight parameters of the structural consistency constraint term are increased and the optimization process is re-executed until the matching result meets the structural integrity requirements.
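The verification loop of S509 might look like the following sketch. The three checks mirror those listed above; the function names, the doubling factor, and the round limit are illustrative assumptions, and `solve` stands in for whatever optimizer performs S507 and S508.

```python
def verify_structure(nodes, edges, key_nodes, required_chains):
    # Key node coverage.
    covered = key_nodes <= nodes
    # Dependency chain integrity: every consecutive pair must be an edge.
    chains_ok = all((u, v) in edges
                    for chain in required_chains
                    for u, v in zip(chain, chain[1:]))
    # Connectivity, ignoring edge direction.
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, stack = set(), list(nodes)[:1]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return covered and chains_ok and seen == set(nodes)


def match_until_valid(solve, check, weight=1.0, factor=2.0, max_rounds=5):
    """Re-run the matcher with a larger structural weight until it verifies."""
    for _ in range(max_rounds):
        subgraph = solve(weight)
        if check(subgraph):
            return subgraph, weight
        weight *= factor
    return None, weight
```

A caller would pass its own optimizer as `solve`; returning `None` after `max_rounds` keeps the loop from running indefinitely when no structurally complete match exists.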

[0094] S510: Generate a set of teaching content based on the optimal matching subgraph.

[0095] Specifically, the resource nodes corresponding to the optimal matching subgraph are extracted as a set of candidate teaching resources, and the organization order and logical arrangement of the resources are determined according to the structural relationship in the subgraph, so as to finally generate a set of teaching content that meets the current teaching needs.
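One way to derive the organization order in S510 is a topological sort over the prerequisite edges of the matched subgraph, sketched here with Python's standard `graphlib` module (Python 3.9 or later). Treating every subgraph edge as a strict "comes before" relation is an assumption, and the resource names are hypothetical.

```python
from graphlib import TopologicalSorter


def order_resources(resource_nodes, prerequisite_edges):
    """Return the resources in an order that respects every (pre, post) edge."""
    sorter = TopologicalSorter({n: set() for n in resource_nodes})
    for pre, post in prerequisite_edges:
        sorter.add(post, pre)  # post depends on pre
    return list(sorter.static_order())


content = order_resources(
    ["exercise", "explanation", "worked_example"],
    [("explanation", "worked_example"), ("worked_example", "exercise")],
)
```

`static_order()` raises `graphlib.CycleError` if the subgraph contains a dependency cycle, which is a useful signal that the matched subgraph cannot be sequenced as it stands.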

[0096] In this embodiment of the invention, by performing differential analysis on the target knowledge structure and the student's actual cognitive structure, and combining it with the teaching resource database for structured matching and optimized screening, it is possible to accurately locate the unmet knowledge points and their dependencies and allocate resources accordingly. This avoids blindly pushing teaching content, improves resource utilization efficiency and the pertinence of teaching intervention, and enhances the adaptability and overall optimization effect of the teaching process.

[0097] S6: Generate teaching strategies based on the set of teaching content.

[0098] In one possible implementation, S6 specifically includes: S601: Construct a teaching decision model based on the set of teaching content. The teaching decision model specifically includes: state space, action space, state transition function, observation function, and reward function.

[0099] It should be noted that the state space is used to characterize the students' mastery level and cognitive structure state at each knowledge node, the action space is used to define the types of executable macro-teaching actions, the state transition function is used to describe the change pattern of the students' mastery state after performing a certain teaching action, the observation function is used to characterize the student behavioral feedback and answer result distribution that may occur after the teaching action is performed, and the reward function is used to measure the contribution of a certain teaching action to the degree of achievement of the teaching objectives.

[0100] In one possible implementation, the action space includes: teaching actions, questioning actions, and testing actions.

[0101] S602: Initialize teacher beliefs and policy functions.

[0102] The policy function is a parameterized probabilistic policy function, whose output is the probability distribution of each macro action. The policy function can be modeled using a multilayer perceptron structure.

[0103] S603: Extract policy input features based on teacher beliefs.

[0104] Specifically, the probability of mastering knowledge nodes, the strength of structural dependencies, and historical feedback statistics contained in the current teacher's beliefs are encoded or vectorized to construct the input feature representation required by the policy function, so as to map the student's current cognitive state into a numerical feature vector that can be used for action decision-making.

[0105] S604: Based on the policy input characteristics, select macro actions in the action space through the policy function.

[0106] Specifically, the policy input features are input into the policy function, and based on the action probability distribution or value estimation result output by the policy function, the current optimal or sampled macro-teaching action is selected from the action space, including one of the teaching action, questioning action or testing action.
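Steps S603 and S604 can be sketched as a softmax policy over the three macro actions. The description allows a multilayer perceptron; a single linear layer is used here only to keep the example short, and the feature layout and weight values are hypothetical.

```python
import math
import random

ACTIONS = ("teach", "question", "test")


def policy_probs(features, weights):
    """Linear layer followed by a numerically stable softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(features, weights, rng=None):
    probs = policy_probs(features, weights)
    if rng is None:
        # Greedy choice: the most probable macro action.
        return ACTIONS[probs.index(max(probs))]
    # Sampled choice, as used during policy learning.
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


features = [0.1, 0.5]                                # hypothetical belief features
weights = [[-2.0, 0.0], [0.0, 1.0], [2.0, 0.0]]      # one row per macro action
greedy = select_action(features, weights)
sampled = select_action(features, weights, rng=random.Random(0))
```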

[0107] S605: Execute macro actions to determine observation results and rewards.

[0108] Specifically, the teaching interaction process is organized according to the selected macro-instructional action, and behavioral feedback data generated by students under this action is obtained, including information such as answer results, response time, facial expression changes, or voice feedback. This feedback data is used as the observation result. At the same time, the immediate reward value of the macro-instructional action for achieving the teaching objective under the current belief state is calculated according to the reward function, which is used to evaluate the teaching effectiveness of the action.

[0109] S606: Update teacher beliefs based on the state transition function, observation function, and observation results.

[0110] Specifically, the state transition function is first used to predict the prior change in the student's knowledge state after the macro action is executed. The observation function and the actual observation results are then combined to update the mastery probability of each knowledge node by Bayesian inference, which yields the updated teacher belief state.
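For a single knowledge node, the predict-then-correct update of S606 can be sketched in the style of Bayesian knowledge tracing. The binary mastered/not-mastered state and the `learn_rate`, `slip`, and `guess` parameters are assumptions standing in for the state transition function and observation function; the description does not prescribe them.

```python
def update_belief(p_mastery, correct, learn_rate=0.2, slip=0.1, guess=0.25):
    """Return the posterior mastery probability after one teaching action."""
    # Prediction: the transition model moves some probability mass to "mastered".
    prior = p_mastery + (1.0 - p_mastery) * learn_rate
    # Observation model: likelihood of the answer under each hidden state.
    if correct:
        like_mastered, like_unmastered = 1.0 - slip, guess
    else:
        like_mastered, like_unmastered = slip, 1.0 - guess
    # Correction: Bayes' rule.
    numerator = like_mastered * prior
    return numerator / (numerator + like_unmastered * (1.0 - prior))


after_correct = update_belief(0.5, correct=True)
after_wrong = update_belief(0.5, correct=False)
```

A correct answer raises the belief above the predicted prior of 0.6, while a wrong answer pulls it well below.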

[0111] S607: Add macro actions, observations, rewards, teacher beliefs, and updated teacher beliefs to the experience pool.

[0112] S608: Sample the experience pool and update the policy function based on the sampling results.

[0113] In one possible implementation, S608 specifically includes: S6081: Sample the experience pool to obtain an experience sample quadruple.

[0114] Specifically, several historical interaction data are selected from the experience pool according to a preset sampling strategy (including random sampling or priority-based weighted sampling). Each data includes the current teacher belief state, the macro action performed, the immediate reward obtained, and the updated teacher belief state, thus forming an experience sample quadruple.
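A minimal experience pool supporting both sampling strategies mentioned in S6081 could look like this. The class name, the fixed capacity, and the scalar priority are illustrative assumptions.

```python
import random
from collections import deque


class ExperiencePool:
    def __init__(self, capacity=1000, seed=0):
        self.items = deque(maxlen=capacity)  # oldest entries are dropped first
        self.rng = random.Random(seed)

    def add(self, belief, action, reward, next_belief, priority=1.0):
        self.items.append((priority, (belief, action, reward, next_belief)))

    def sample(self, k, prioritized=False):
        entries = list(self.items)
        if prioritized:
            # Priority-weighted sampling with replacement.
            weights = [p for p, _ in entries]
            picked = self.rng.choices(entries, weights=weights, k=k)
        else:
            # Uniform random sampling without replacement.
            picked = self.rng.sample(entries, k)
        return [quad for _, quad in picked]


pool = ExperiencePool()
pool.add((0.5,), "teach", 0.1, (0.6,))
pool.add((0.6,), "question", 0.3, (0.8,), priority=2.0)
pool.add((0.8,), "test", 0.5, (0.9,))
batch = pool.sample(2)
```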

[0115] S6082: Construct a mirror operator based on the experience sample quadruples.

[0116]

[0117] where $\mathcal{T}(\pi'; B)$ denotes the mirror operator, that is, the mirror improvement value, under the described evaluation system, of the candidate new strategy $\pi'$ relative to the old strategy $\pi$ in teacher belief state $B$; $V^{\pi}(B)$ denotes the expected cumulative reward obtained by continuing to follow the old strategy $\pi$ in belief state $B$; $B$ denotes the teacher's belief state; $Q^{\pi}(B, u)$ denotes the action value function of executing action $u$ in belief state $B$ and following strategy $\pi$ thereafter; $\mathbb{E}_{u \sim \pi'}[\cdot]$ denotes the expectation over the action distribution under the new strategy $\pi'$; $\pi'$ denotes the candidate new strategy; $D(\pi' \,\|\, \pi)$ denotes the divergence between $\pi'$ and $\pi$; and $\lambda(B)$ denotes the divergence penalty weight coefficient in belief state $B$.

[0118] It should be noted that those skilled in the art can set the magnitude of the divergence penalty weight coefficient according to actual needs, and this invention does not limit it.

[0119] S6083: Update the policy function based on the mirror operator.

[0120] Specifically, a constrained optimization problem is constructed in the parameter space of the policy function, with the optimization direction being to maximize the comprehensive objective defined by the mirror operator. By iteratively updating the policy parameters, the new policy can improve the long-term reward expectation while controlling the magnitude of policy distribution changes, thereby achieving stable policy improvement.
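To illustrate S6082 and S6083 for a single belief state with a discrete action set, the sketch below evaluates a mirror improvement value of the form "expected action value under the new strategy, minus the old strategy's value, minus a divergence penalty" and applies the update that maximizes it. Choosing the Kullback-Leibler divergence is an assumption; under that choice the maximizer is the old strategy reweighted by the exponentiated action values. All names are hypothetical.

```python
import math


def mirror_value(new_pi, old_pi, q, lam):
    """Improvement of new_pi over old_pi, penalized by KL(new_pi || old_pi)."""
    v_old = sum(p * qu for p, qu in zip(old_pi, q))
    gain = sum(p * qu for p, qu in zip(new_pi, q)) - v_old
    kl = sum(p * math.log(p / o) for p, o in zip(new_pi, old_pi) if p > 0)
    return gain - lam * kl


def mirror_step(old_pi, q, lam):
    """Maximizer of mirror_value for the KL divergence: pi' ∝ pi * exp(q / lam)."""
    unnorm = [o * math.exp(qu / lam) for o, qu in zip(old_pi, q)]
    total = sum(unnorm)
    return [x / total for x in unnorm]


old_pi = [1 / 3, 1 / 3, 1 / 3]   # teach, question, test
q = [1.0, 0.0, 0.5]              # estimated action values in this belief state
new_pi = mirror_step(old_pi, q, lam=1.0)
improvement = mirror_value(new_pi, old_pi, q, lam=1.0)
```

A larger penalty coefficient keeps the new strategy closer to the old one, which is the stability property described in S6083.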

[0121] S609: Determine if the maximum number of iterations has been reached. If yes, generate a teaching strategy based on the updated strategy function. Otherwise, return to step S603.

[0122] S7: Generate robot teaching execution path based on teaching strategy.

[0123] In one possible implementation, S7 specifically includes: S701: Map the teaching strategy to an execution planning task to obtain the terminal value.

[0124] Specifically, based on the current teaching strategy and the robot's current state, the teaching action types, target objects, and execution rhythms contained in the strategy are semantically parsed and mapped into planning task parameters that can be computed at the execution layer. These parameters include reference state sequences such as target position, target orientation, gesture or joint reference trajectories, and the timing of speech and resource presentation. At the same time, within a finite-horizon prediction framework, a terminal value function is constructed from the reference states to measure how well the predicted terminal state matches the target state. The terminal cost can be defined as the negative of the outer-loop value function, which turns the abstract teaching strategy into a terminal constraint in execution-layer path optimization.

[0125] S702: Construct a finite-horizon MPC cost function based on the terminal value.

[0126] Specifically, based on the terminal value, and combined with the robot's current state and the prediction horizon length, a finite-horizon model predictive control (MPC) cost function is constructed, which includes a step-by-step execution cost and a terminal cost:

[0127] where $J_H(x_t, U_t)$ denotes the finite-horizon MPC cost function; $x_t$ denotes the robot's state at time $t$; $U_t$ denotes the control sequence at time $t$; $H$ denotes the prediction horizon length; $\gamma$ denotes the discount factor; $c(x_{t+h|t}, u_{t+h|t})$ denotes the instantaneous execution cost of the $h$-th prediction step; $x_{t+h|t}$ denotes the robot's state at the $h$-th step predicted forward from time $t$; $u_{t+h|t}$ denotes the control input at the $h$-th step predicted forward from time $t$; $\phi(\cdot)$ denotes the terminal cost; and $x_{t+H|t}$ denotes the robot's state at the $H$-th step predicted forward from time $t$.

[0128] It should be noted that those skilled in the art can set the size of the discount factor according to actual needs, and this invention does not limit it.
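The cost described in S702 can be sketched as a discounted sum of per-step execution costs plus a terminal cost. Discounting the terminal term by the discount factor raised to the horizon length is an assumption, as are the callables passed in; the toy system is a one-dimensional integrator.

```python
def mpc_cost(x_t, controls, step_fn, stage_cost, terminal_cost, gamma=0.95):
    """Discounted stage costs over the horizon plus a terminal cost."""
    x, total = x_t, 0.0
    for h, u in enumerate(controls):
        total += (gamma ** h) * stage_cost(x, u)
        x = step_fn(x, u)  # forward prediction of the next state
    # Assumption: the terminal cost is discounted by gamma ** H.
    return total + (gamma ** len(controls)) * terminal_cost(x)


cost = mpc_cost(
    0.0,
    [0.5, 0.5],
    step_fn=lambda x, u: x + u,               # toy dynamics
    stage_cost=lambda x, u: u * u,            # control effort
    terminal_cost=lambda x: (x - 1.0) ** 2,   # distance to the target state 1.0
    gamma=1.0,
)
```

With these controls the state reaches the target exactly, so only the control effort contributes to the cost.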

[0129] S703: Construct a robot motion prediction model based on a dynamics model.

[0130] Specifically, a unified model of the robot's state variables is first established based on the robot's structural parameters and actuator types. The state vector is defined to include chassis pose, velocity information, joint angles and angular velocities, end-effector pose, and interaction-related state variables. Then, based on the robot's kinematic relationships and dynamic constraints, a state transition function between control inputs and state changes is established, which describes the robot's state at the next moment given the current state and control command. The state transition function can be built from analytical kinematic models, dynamic equations, or data-driven trained models. On this basis, a motion prediction model capable of forward recursion over the prediction horizon is formed, which simulates future multi-step trajectories and provides state predictions for finite-horizon optimization.
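As one example of an analytical kinematic state transition function, the sketch below uses a planar unicycle model for the chassis pose only and rolls it forward over a control sequence. The reduced state (x, y, heading), the time step, and the function names are assumptions; a real humanoid model would also carry joint and interaction states.

```python
import math


def step(state, control, dt=0.1):
    """One forward step of a planar unicycle model."""
    x, y, theta = state
    v, omega = control  # linear and angular velocity commands
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)


def rollout(state, controls, dt=0.1):
    """Forward recursion over the prediction horizon."""
    trajectory = [state]
    for u in controls:
        state = step(state, u, dt)
        trajectory.append(state)
    return trajectory


trajectory = rollout((0.0, 0.0, 0.0), [(1.0, 0.0)] * 10)
```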

[0131] S704: Generate initial control distribution parameters based on the current robot state and teaching strategy.

[0132] Specifically, based on the current robot state and the execution planning objective, and combined with the behavior type and target state information corresponding to the teaching strategy, an initial control sequence estimate is constructed within the prediction horizon, and control distribution parameters are constructed using the control sequence as the mean. The control distribution describes the probability distribution of control inputs for each future prediction step, and its parameters include the control mean sequence and covariance matrix.

[0133] S705: Construct the optimization objective function based on the initial control distribution parameters and the finite-horizon MPC cost function.

[0134] Specifically, based on the finite-horizon MPC cost function and the current initial control distribution parameters, the expected path cost of the future control sequence is estimated. From this expected cost, an optimization objective function containing a gradient guidance term and a distribution stability constraint is constructed, so that the path cost is reduced while the deviation of the control distribution parameters from the initial distribution is penalized.

[0135] The specific optimization objective function is as follows:

[0136]

[0137] where $\mathcal{L}_t(\theta)$ denotes the objective function to be optimized; $\eta_t$ denotes the learning rate at time $t$; $\theta_t$ denotes the control distribution parameters at time $t$; $\nabla$ denotes the gradient operator; $x_t$ denotes the robot's state at time $t$; $\theta$ denotes the candidate control distribution parameters; $J(\cdot)$ denotes the expected path cost; $D_\psi(\cdot \,\|\, \cdot)$ denotes the Bregman divergence; $\mathbb{E}[\cdot]$ denotes the expectation taken when the control sequence is sampled from the distribution $\pi_\theta$ and the state is propagated recursively by the dynamics model; $\pi_\theta$ denotes the control distribution with parameters $\theta$; and $x_{t+h+1|t}$ denotes the robot's state at step $h+1$ predicted forward from time $t$.

[0138] S706: With the goal of minimizing the objective function, an optimization algorithm is used to obtain the updated control distribution parameters.

[0139] Specifically, the objective function is minimized with an optimization method such as gradient descent or dynamic mirror descent, which yields the update direction and magnitude of the control distribution parameters and reduces the expected path cost while satisfying the distribution stability constraint. The updated control distribution parameters characterize a better control input distribution over the prediction horizon, enabling the robot to approach the planning objective more closely while keeping its control smooth.
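A minimal sketch of the update in S706 follows. It makes two simplifying assumptions that the description does not require: the Bregman divergence is taken to be half the squared Euclidean distance, in which case the mirror descent step reduces to an ordinary gradient step, and the expected path cost is replaced by the cost of the mean control sequence with a finite-difference gradient.

```python
def finite_diff_grad(cost, theta, eps=1e-5):
    """Central-difference gradient of cost at theta."""
    grad = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        down = theta[:i] + [theta[i] - eps] + theta[i + 1:]
        grad.append((cost(up) - cost(down)) / (2 * eps))
    return grad


def mirror_descent_step(cost, theta, lr):
    # argmin over th of lr * <g, th> + 0.5 * ||th - theta||^2  equals  theta - lr * g
    g = finite_diff_grad(cost, theta)
    return [t - lr * gi for t, gi in zip(theta, g)]


def path_cost(controls):
    x = sum(controls)  # 1-D integrator starting from x = 0
    return sum(u * u for u in controls) + 10.0 * (x - 1.0) ** 2


theta = [0.0, 0.0]  # initial control mean sequence
for _ in range(200):
    theta = mirror_descent_step(path_cost, theta, lr=0.02)
```

The learning rate bounds how far each update can move the control mean, which plays the role of the distribution stability constraint in this simplified setting.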

[0140] S707: Input the updated control distribution parameters into the robot motion prediction model to obtain a set of candidate paths.

[0141] Specifically, the updated control distribution parameters are used as the basis for generating control inputs. Control sequences are sampled multiple times over the prediction horizon, and each sampled control sequence is input into the robot motion prediction model for forward recursion to obtain the corresponding future state trajectory. The state trajectories produced by the different samples are collected into a candidate path set, in which each candidate path includes a control input sequence and its predicted state sequence.

[0142] S708: Optimize the candidate path set to generate the robot teaching execution path.

[0143] Specifically, each path in the candidate path set is evaluated with the finite-horizon MPC cost function to obtain its overall path cost, and the path with the minimum cost, or the path that satisfies the optimality criterion, is selected as the current optimal execution path, that is, the robot teaching execution path.

[0144] In this embodiment of the invention, the high-level teaching strategy is mapped to a finite-horizon model predictive control optimization problem and combined with the robot dynamics model and the control distribution parameter update mechanism, which effectively connects the semantic teaching goal with the physical execution path. The robot can thus approach the teaching goal dynamically while keeping its actions smooth and stable. The method balances real-time performance against consistency with the global goal, and offers receding-horizon optimization, adaptive adjustment, and quantifiable evaluation of path quality, thereby improving the accuracy, robustness, and overall coordination of the teaching execution process.
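Steps S707 and S708 can be sketched together: sample control sequences around the control mean, roll each one through the prediction model, and keep the cheapest path. The independent Gaussian noise with a single standard deviation, the sample count, and the toy one-dimensional system are assumptions for illustration.

```python
import random


def best_path(x0, mean, std, step_fn, path_cost, n_samples=64, seed=0):
    """Return (cost, controls, states) of the cheapest sampled candidate path."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        # Sample one control sequence from the control distribution.
        controls = [rng.gauss(m, std) for m in mean]
        # Forward recursion through the motion prediction model.
        states = [x0]
        for u in controls:
            states.append(step_fn(states[-1], u))
        cost = path_cost(states, controls)
        if best is None or cost < best[0]:
            best = (cost, controls, states)
    return best


def step_fn(x, u):
    return x + u


def path_cost(states, controls):
    return sum(u * u for u in controls) + 10.0 * (states[-1] - 1.0) ** 2


cost, controls, states = best_path(0.0, [0.5, 0.5], 0.1, step_fn, path_cost)
```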

[0145] S8: Execute the robot teaching execution path.

[0146] Referring to Figure 2 of the accompanying drawings, a structural schematic of a humanoid robot-assisted intelligent teaching system provided by an embodiment of the present invention is shown.

[0147] This invention provides an intelligent teaching system 20 assisted by a humanoid robot, comprising: a processor 201 and a memory 202; The memory 202 stores programs or instructions that can run on the processor 201. When the program or instructions are executed by the processor 201, they implement the steps of the humanoid robot-assisted intelligent teaching method described above and achieve the same technical effect. To avoid repetition, the present invention will not elaborate further.

[0148] It should be understood that the processor 201 in this embodiment of the invention may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0149] It should also be understood that the memory 202 in the embodiments of the present invention can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of random access memory are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus RAM (DR RAM).

[0150] The above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, radio, or microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.

[0151] It should be understood that, in various embodiments of the present invention, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0152] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0153] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the devices, apparatuses, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0154] In the several embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0155] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0156] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0157] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, essentially, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0158] This invention provides a readable storage medium on which a program or instructions are stored. When the program or instructions are executed by a processor, they implement the steps of the above-described humanoid robot-assisted intelligent teaching method and achieve the same technical effect. To avoid repetition, the details are not repeated here.

[0159] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention, and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the protection scope of the present invention.

Claims

1. A humanoid robot-assisted intelligent teaching method, characterized in that it comprises: S1: Acquire teaching data and student status data through a humanoid robot; S2: Construct a target knowledge graph based on the teaching data; S3: Construct a learning knowledge graph based on the student status data; S4: Based on the target knowledge graph and the learning knowledge graph, determine whether the student's learning status meets the teaching objective conditions; if so, return to step S1; otherwise, proceed to step S5; S5: Based on the target knowledge graph and the learning knowledge graph, match the teaching needs against the teaching resource library to generate a set of teaching content; S6: Generate a teaching strategy based on the set of teaching content; S7: Generate a robot teaching execution path based on the teaching strategy; S8: Execute the robot teaching execution path.
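The control flow of steps S1 to S8 can be sketched as a simple loop. This is only an illustrative sketch: the function `run_teaching_cycle`, the `steps` dictionary and its keys, and the toy set-based "graphs" are all hypothetical stand-ins, and the return to S1 after S8 is an assumption that the claim does not state explicitly.

```python
from typing import Callable, Dict, List


def run_teaching_cycle(steps: Dict[str, Callable], max_cycles: int = 3) -> List[str]:
    """Run the S1-S8 loop and return the labels of the steps that were executed."""
    trace: List[str] = []
    for _ in range(max_cycles):
        teaching_data, student_data = steps["acquire"]()                # S1
        target_graph = steps["build_target"](teaching_data)             # S2
        learning_graph = steps["build_learning"](student_data)          # S3
        trace += ["S1", "S2", "S3", "S4"]
        if steps["meets_objective"](target_graph, learning_graph):      # S4
            continue  # objective met: go back to S1
        content = steps["match_content"](target_graph, learning_graph)  # S5
        strategy = steps["make_strategy"](content)                      # S6
        path = steps["plan_path"](strategy)                             # S7
        steps["execute"](path)                                          # S8
        trace += ["S5", "S6", "S7", "S8"]  # assumed: loop back to S1 afterwards
    return trace


# Toy stand-ins (hypothetical): graphs are plain sets of concept names,
# and "executing" a path simply marks its concepts as mastered.
target = {"fractions", "decimals"}
mastered = set()

steps = {
    "acquire": lambda: ("lesson", "quiz"),
    "build_target": lambda teaching_data: set(target),
    "build_learning": lambda student_data: set(mastered),
    "meets_objective": lambda t, l: t <= l,
    "match_content": lambda t, l: sorted(t - l),
    "make_strategy": lambda content: content[:1],  # teach one concept per cycle
    "plan_path": lambda strategy: list(strategy),
    "execute": lambda path: mastered.update(path),
}

trace = run_teaching_cycle(steps, max_cycles=3)
```

In this toy run the first two cycles go through S5 to S8 (one concept taught per cycle), and the third cycle stops at S4 because the objective is met.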

2. The humanoid robot-assisted intelligent teaching method according to claim 1, characterized in that the teaching data specifically includes: teaching content, knowledge point structure, teaching progress, teaching methods, audio explanations, and blackboard writing; and the student status data specifically includes: answer results, response time, eye movement trajectory, posture changes, facial features, voice feedback, and operational behavior.

3. The humanoid robot-assisted intelligent teaching method according to claim 1, characterized in that S2 specifically includes: S201: Divide the teaching data to obtain a set of teaching segments; S202: Perform semantic embedding processing on the set of teaching segments to obtain a candidate concept set; S203: Perform concept saliency screening on the candidate concept set to obtain a node set, and generate node attribute vectors for all nodes in the node set; S204: Generate a candidate edge set based on the node set; S205: Calculate the conditional information gain of all node pairs in the candidate edge set; S206: Based on the conditional information gain, filter the candidate edge set to obtain a target edge set; S207: Calculate the relation strength weights of all edges in the target edge set to obtain a weighted edge set; S208: Generate an edge attribute vector for each edge in the target edge set; S209: Combine the node set, the node attribute vectors, the target edge set, the edge attribute vectors, and the weighted edge set to construct the target knowledge graph.
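The edge filtering in S205 and S206 can be sketched as follows. The claim does not define the conditional information gain, so this sketch assumes one simple reading: each concept is treated as a binary variable ("appears in a teaching segment or not"), and the gain of a pair (a, b) is H(b) - H(b | a). The function names, the threshold, and the toy segments are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary variable with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(segments: List[Set[str]], a: str, b: str) -> float:
    """Assumed gain: H(b) - H(b | a) over 'concept appears in segment' indicators."""
    n = len(segments)
    gain = binary_entropy(sum(b in s for s in segments) / n)
    for a_present in (True, False):
        subset = [s for s in segments if (a in s) == a_present]
        if subset:
            p_b = sum(b in s for s in subset) / len(subset)
            gain -= len(subset) / n * binary_entropy(p_b)
    return gain


def filter_edges(segments: List[Set[str]], nodes: Set[str],
                 threshold: float) -> Dict[Tuple[str, str], float]:
    """Keep candidate node pairs whose gain reaches the (hypothetical) threshold."""
    edges = {}
    for a, b in combinations(sorted(nodes), 2):  # candidate edge set (S204)
        gain = information_gain(segments, a, b)  # S205
        if gain >= threshold:                    # S206
            edges[(a, b)] = gain
    return edges


# Toy teaching segments: "fraction" and "ratio" always co-occur,
# while "graph" appears independently of both.
segments = [{"fraction", "ratio", "graph"}, {"fraction", "ratio"}, {"graph"}, set()]
edges = filter_edges(segments, {"fraction", "ratio", "graph"}, threshold=0.5)
```

Because this gain is symmetric in a and b, the sketch yields undirected edges; a directed prerequisite relation would need an additional criterion that this claim does not specify.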

4. The humanoid robot-assisted intelligent teaching method according to claim 1, characterized in that S4 specifically includes: S401: Encode the target knowledge graph and the learning knowledge graph respectively to obtain the target graph initial node embedding set, the target graph initial edge embedding set, the learning graph initial node embedding set, and the learning graph initial edge embedding set; S402: Based on the target graph initial node embedding set, the target graph initial edge embedding set, the learning graph initial node embedding set, and the learning graph initial edge embedding set, perform graph message propagation within the target knowledge graph and the learning knowledge graph respectively to obtain the enhanced target graph initial node embedding set and the enhanced learning graph initial node embedding set; S403: Based on the enhanced target graph initial node embedding set and the enhanced learning graph initial node embedding set, perform cross-graph attention matching on the target knowledge graph and the learning knowledge graph to obtain a cross-graph attention matching matrix; S404: Calculate the cross-graph matching residual based on the cross-graph attention matching matrix; S405: Calculate the node matching energy based on the cross-graph matching residual; S406: Calculate the structural closure energy based on the cross-graph attention matching matrix and the edge relationships of the target knowledge graph; S407: Calculate the constraint energy between the target knowledge graph and the learning knowledge graph based on the node matching energy and the structural closure energy; S408: Determine whether the constraint energy is less than or equal to the preset constraint energy; if so, the learning state meets the teaching objective condition, and return to step S1; otherwise, the learning state does not meet the teaching objective condition, and proceed to step S5.
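Steps S403 to S408 can be sketched numerically as follows. The claim gives no formulas, so everything here is an illustrative assumption that may differ from the detailed description: attention is a row-wise softmax over dot products, the residual is the target embedding minus its attention-weighted reconstruction from learning embeddings, and the structural closure term penalizes target edges whose two endpoints attend to the same learning node. The names `constraint_energy`, `lam`, and `threshold` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(target: List[Vector], learning: List[Vector]) -> List[List[float]]:
    """S403 (assumed form): row i is how target node i attends to learning nodes."""
    return [softmax([dot(t, l) for l in learning]) for t in target]


def constraint_energy(target: List[Vector], learning: List[Vector],
                      target_edges: List[Tuple[int, int]], lam: float = 1.0) -> float:
    A = attention_matrix(target, learning)
    node_energy = 0.0
    for t, row in zip(target, A):
        # S404: residual between a target node and its reconstruction
        recon = [sum(w * l[d] for w, l in zip(row, learning)) for d in range(len(t))]
        # S405: node matching energy as the squared residual norm
        node_energy += sum((a - b) ** 2 for a, b in zip(t, recon))
    # S406 (assumed form): penalize target edges whose endpoints collapse
    # onto the same learning node
    closure_energy = sum(dot(A[i], A[k]) for i, k in target_edges)
    return node_energy + lam * closure_energy  # S407


# Toy embeddings: two target concepts joined by one edge.
target_nodes = [[4.0, 0.0], [0.0, 4.0]]
target_edges = [(0, 1)]
e_full = constraint_energy(target_nodes, [[4.0, 0.0], [0.0, 4.0]], target_edges)
e_partial = constraint_energy(target_nodes, [[4.0, 0.0]], target_edges)

threshold = 0.1  # hypothetical preset constraint energy
meets_objective = e_full <= threshold  # S408
```

When the learning graph covers both concepts the energy is close to zero; when one concept is missing, both the residual term and the closure term rise and the threshold test in S408 fails.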

5. The humanoid robot-assisted intelligent teaching method according to claim 1, characterized in that S5 specifically includes: S501: Construct a teaching demand query graph based on the target knowledge graph and the learning knowledge graph; S502: Construct a teaching resource data graph based on the teaching resource library; S503: Calculate the matching matrix between the teaching demand query graph and the teaching resource data graph; S504: Divide the matching matrix into a node matching matrix and an edge matching matrix; S505: Construct the node feature consistency loss function and the edge feature consistency loss function based on the node matching matrix and the edge matching matrix, respectively; S506: Construct the structural consistency constraint loss function; S507: Construct the total loss function based on the structural consistency constraint loss function, the edge feature consistency loss function, and the node feature consistency loss function; S508: Determine the optimal matching subgraph with the objective of minimizing the total loss function; S509: Perform structural integrity verification on the optimal matching subgraph; if the verification passes, proceed to step S510; otherwise, adjust the constraint parameters of the structural consistency constraint loss function and return to step S507; S510: Generate the set of teaching content based on the optimal matching subgraph.
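The role of the three loss terms in S505 to S508 can be illustrated with a small sketch. It replaces the matching-matrix optimization with an exhaustive search over one-to-one node assignments, which is only feasible for tiny graphs, and it assumes simple squared-error forms for the losses. All names, features, and weights below are hypothetical.

```python
from itertools import permutations
from typing import Dict, Tuple

Features = Dict[str, Tuple[float, ...]]
Edges = Dict[Tuple[str, str], float]


def sq_dist(u: Tuple[float, ...], v: Tuple[float, ...]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def total_loss(assign: Dict[str, str], q_feats: Features, r_feats: Features,
               q_edges: Edges, r_edges: Edges, lam: float) -> float:
    """S507 (assumed form): node loss + edge loss + lam * structural penalty."""
    node_loss = sum(sq_dist(q_feats[q], r_feats[r]) for q, r in assign.items())
    edge_loss, struct_loss = 0.0, 0.0
    for (a, b), weight in q_edges.items():
        mapped = (assign[a], assign[b])
        if mapped in r_edges:
            edge_loss += (weight - r_edges[mapped]) ** 2  # edge feature consistency
        else:
            struct_loss += 1.0  # a query edge has no counterpart in the resource graph
    return node_loss + edge_loss + lam * struct_loss


def best_match(q_feats: Features, r_feats: Features, q_edges: Edges,
               r_edges: Edges, lam: float = 1.0):
    """S508 by brute force: try every injective query-to-resource assignment."""
    q_nodes = list(q_feats)
    best, best_loss = None, float("inf")
    for candidate in permutations(r_feats, len(q_nodes)):
        assign = dict(zip(q_nodes, candidate))
        loss = total_loss(assign, q_feats, r_feats, q_edges, r_edges, lam)
        if loss < best_loss:
            best, best_loss = assign, loss
    return best, best_loss


# Toy query graph: the student needs "fractions" followed by "ratios".
q_feats = {"fractions": (1.0, 0.0), "ratios": (0.0, 1.0)}
q_edges = {("fractions", "ratios"): 1.0}
# Toy resource graph: two resources cover ratios, but only one follows the fractions lesson.
r_feats = {"lesson_fractions": (1.0, 0.0), "quiz_ratios": (0.0, 1.0),
           "lesson_ratios": (0.0, 1.0)}
r_edges = {("lesson_fractions", "lesson_ratios"): 1.0}

match, loss = best_match(q_feats, r_feats, q_edges, r_edges, lam=1.0)
```

With `lam=0.0` the two ratio resources tie on node features alone and the search keeps the first one it sees (`quiz_ratios`); the structural term is what steers the match toward the resource that preserves the prerequisite edge.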

6. The humanoid robot-assisted intelligent teaching method according to claim 1, characterized in that S6 specifically includes: S601: Construct a teaching decision model based on the set of teaching content, wherein the teaching decision model specifically includes: a state space, an action space, a state transition function, an observation function, and a reward function; S602: Initialize a teacher belief and a policy function; S603: Extract policy input features based on the teacher belief; S604: Based on the policy input features, select a macro action in the action space through the policy function; S605: Execute the macro action and determine the observation result and the reward; S606: Update the teacher belief based on the state transition function, the observation function, and the observation result; S607: Place the macro action, the observation result, the reward, the teacher belief, and the updated teacher belief into the experience pool; S608: Sample the experience pool and update the policy function based on the sampling results; S609: Determine whether the maximum number of iterations has been reached; if so, generate the teaching strategy based on the updated policy function; otherwise, return to step S603.
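The belief update in S606 matches the shape of a standard Bayes filter for a partially observable decision process: the belief is first propagated through the state transition function and then reweighted by the likelihood of the observation. The sketch below assumes a discrete state space; the two student states, the probabilities, and the function name `update_belief` are hypothetical.

```python
from typing import Dict

Belief = Dict[str, float]


def update_belief(belief: Belief, action: str, observation: str,
                  transition: Dict[str, Dict[str, Dict[str, float]]],
                  observation_model: Dict[str, Dict[str, float]]) -> Belief:
    """b'(s') is proportional to O(o | s') * sum_s T(s' | s, a) * b(s)."""
    # Prediction: push the current belief through the state transition function.
    predicted = {
        s2: sum(transition[action][s][s2] * p for s, p in belief.items())
        for s2 in belief
    }
    # Correction: weight each state by how well it explains the observation.
    unnormalized = {s2: observation_model[s2][observation] * predicted[s2]
                    for s2 in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical two-state student model.
transition = {
    "teach": {"not_mastered": {"not_mastered": 0.5, "mastered": 0.5},
              "mastered": {"not_mastered": 0.0, "mastered": 1.0}},
    "question": {"not_mastered": {"not_mastered": 1.0, "mastered": 0.0},
                 "mastered": {"not_mastered": 0.0, "mastered": 1.0}},
}
observation_model = {
    "not_mastered": {"correct": 0.2, "incorrect": 0.8},
    "mastered": {"correct": 0.9, "incorrect": 0.1},
}

prior = {"not_mastered": 0.5, "mastered": 0.5}
posterior = update_belief(prior, "question", "correct", transition, observation_model)
```

Starting from an even prior, a correct answer to a question raises the belief in mastery to 0.45 / 0.55, or about 0.82.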

7. The humanoid robot-assisted intelligent teaching method according to claim 6, characterized in that the action space includes: teaching actions, questioning actions, and testing actions.

8. The humanoid robot-assisted intelligent teaching method according to claim 6, characterized in that S608 specifically includes: S6081: Sample the experience pool to obtain an experience sample quadruple; S6082: Construct a mirror operator based on the experience sample quadruple; S6083: Update the policy function according to the mirror operator.
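The claim does not define the mirror operator. One possible reading, assumed here purely for illustration, is a mirror-descent policy update with a negative-entropy mirror map, which reduces to a multiplicative (exponentiated) reweighting of action probabilities. The function name, the step size, and the way the advantage is derived from a sampled reward are hypothetical.

```python
import math
from typing import Dict


def mirror_descent_step(policy: Dict[str, float], advantages: Dict[str, float],
                        step_size: float) -> Dict[str, float]:
    """Entropic mirror step: new_p(a) is proportional to p(a) * exp(step_size * adv(a)).

    Assumes every action in `policy` has a strictly positive probability.
    """
    logits = {a: math.log(p) + step_size * advantages.get(a, 0.0)
              for a, p in policy.items()}
    peak = max(logits.values())
    exps = {a: math.exp(v - peak) for a, v in logits.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Uniform policy over the three macro action types.
policy = {"teach": 1 / 3, "question": 1 / 3, "test": 1 / 3}

# Hypothetical sampled experience: the "question" action earned reward 1.0
# against a baseline of 0.0, so its advantage estimate is 1.0.
sampled_action, reward, baseline = "question", 1.0, 0.0
advantages = {sampled_action: reward - baseline}

updated = mirror_descent_step(policy, advantages, step_size=1.0)
```

The update keeps the policy on the probability simplex by construction, which is the usual reason for choosing an entropic mirror map over a plain gradient step.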

9. The humanoid robot-assisted intelligent teaching method according to claim 1, characterized in that S7 specifically includes: S701: Map the teaching strategy to an execution planning task to obtain a terminal value; S702: Construct a finite-horizon MPC cost function based on the terminal value; S703: Construct a robot motion prediction model based on a dynamics model; S704: Generate initial control distribution parameters based on the current robot state and the teaching strategy; S705: Construct an optimization objective function based on the initial control distribution parameters and the finite-horizon MPC cost function; S706: With minimization of the optimization objective function as the goal, use an optimization algorithm to obtain updated control distribution parameters; S707: Input the updated control distribution parameters into the robot motion prediction model to obtain a candidate path set; S708: Perform optimal screening on the candidate path set to generate the robot teaching execution path.
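Steps S704 to S708 resemble sampling-based model predictive control. The sketch below uses the cross-entropy method as the optimization algorithm, which is an assumption because the claim does not name one. The robot is reduced to a one-dimensional single-integrator model, and the cost weights, the terminal weight standing in for the terminal value, and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def rollout(x0: float, controls: List[float], dt: float = 0.1) -> List[float]:
    """Stand-in motion prediction model (S703): x[t+1] = x[t] + u[t] * dt."""
    xs = [x0]
    for u in controls:
        xs.append(xs[-1] + u * dt)
    return xs


def mpc_cost(xs: List[float], controls: List[float], goal: float,
             terminal_weight: float) -> float:
    """Finite-horizon cost (S702): tracking error + control effort + terminal term."""
    running = sum((x - goal) ** 2 for x in xs[:-1]) + 0.01 * sum(u * u for u in controls)
    return running + terminal_weight * (xs[-1] - goal) ** 2


def plan_path(x0: float, goal: float, terminal_weight: float = 10.0, horizon: int = 10,
              samples: int = 64, elites: int = 8, iterations: int = 5,
              seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    mean, std = [0.0] * horizon, [1.0] * horizon  # S704: initial Gaussian parameters

    def score(controls: List[float]) -> float:    # S705: objective for one sample
        return mpc_cost(rollout(x0, controls), controls, goal, terminal_weight)

    def sample() -> List[float]:
        return [rng.gauss(m, s) for m, s in zip(mean, std)]

    for _ in range(iterations):                   # S706: refit to the elite samples
        elite = sorted((sample() for _ in range(samples)), key=score)[:elites]
        mean = [sum(c[t] for c in elite) / elites for t in range(horizon)]
        std = [max(1e-3, (sum((c[t] - mean[t]) ** 2 for c in elite) / elites) ** 0.5)
               for t in range(horizon)]

    candidates = [list(mean)] + [sample() for _ in range(samples)]  # S707
    best = min(candidates, key=score)                               # S708
    return rollout(x0, best), score(best)


path, cost = plan_path(x0=0.0, goal=1.0)
zero_controls = [0.0] * 10
zero_cost = mpc_cost(rollout(0.0, zero_controls), zero_controls, 1.0, 10.0)
```

A real humanoid would need a multi-dimensional dynamics model and constraint handling; the sketch only shows how the control distribution, the cost function, and the final candidate screening fit together.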

10. A humanoid robot-assisted intelligent teaching system, characterized in that it comprises: a processor and a memory; the memory stores programs or instructions executable on the processor, which, when executed by the processor, implement the steps of the humanoid robot-assisted intelligent teaching method as described in any one of claims 1 to 9.