An education intelligent agent construction method and device based on intention recognition and a knowledge graph
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2026-01-16
- Publication Date
- 2026-06-12
AI Technical Summary
Existing large language models applied to the education vertical suffer from difficulty in constructing structured knowledge, unstable path retrieval, incomplete reasoning chain generation, and policy optimization that degrades easily. As a result, the reasoning chains they generate for educational tasks lack structural integrity and generation stability.
An educational intelligent agent is constructed based on intent recognition and knowledge graphs: samples are generated and mapped to a fused knowledge graph, reasoning is enhanced through intent recognition, and knowledge distillation is combined with an inter-group relative policy optimization algorithm. This forms a high-quality teaching logic chain and improves both the coverage of the knowledge graph and the structural stability of the reasoning chain.
The method significantly improves the coverage and structured relationships of knowledge points, prerequisites, and reasoning steps, reduces the hallucination rate of generated content, improves the completeness of multi-hop reasoning chains and the consistency of logical steps, and ensures that the generated content conforms to the teaching dependency structure.
Abstract
Description
Technical Field
[0001] This invention relates to the fields of large language model applications and education technology, and more specifically, to a method and apparatus for constructing an educational intelligent agent based on intent recognition and knowledge graph.
Background Technology
[0002] With the rapid development of large language models, they have demonstrated strong capabilities in general-domain question answering and text generation tasks. However, when applying general-purpose models directly to the education vertical, three key technical bottlenecks still exist.
[0003] First, data in the education field is naturally dominated by narrative text, lacking structured knowledge representations that can be directly retrieved, reasoned about, and computationally processed by models. While textbooks, examples, and teaching cases contain rich knowledge points, their semantic relationships, prerequisite dependencies, and reasoning steps are difficult to automatically parse, resulting in sparse knowledge graph nodes and missing relationships. In particular, cross-concept and cross-step teaching reasoning chains are difficult to extract from natural language corpora, leaving models without computable structured evidence to support multi-hop reasoning.
[0004] Secondly, general-purpose models lack sufficient mathematical and logical reasoning capabilities in educational scenarios, and traditional retrieval augmentation methods struggle to mitigate this reliably. When faced with subject-specific questions, models often fail to present the full reasoning chain, producing missing steps, skipped steps, or broken logical chains. Even with a retrieval augmentation framework that incorporates vector retrieval, models may still generate incomplete reasoning paths when relevant document hits are insufficient or semantic recall is unstable, leading to structurally incorrect answers such as missing steps or path jumps. Such answers appear coherent on the surface, but their reasoning process contradicts the facts, which constitutes a typical knowledge hallucination. Because rule-dependent retrieval augmentation cannot cover prerequisite dependencies, knowledge hierarchies, and other pedagogical relationships, its retrieval stability and path consistency remain technical bottlenecks.
[0005] Finally, existing post-training methods suffer from narrowed capability boundaries and a rebound in hallucinations in educational tasks. Reinforcement learning or reward-driven optimization methods face two prominent challenges in educational scenarios: (1) Binary scoring mechanisms often fail to distinguish between "correct thinking process but wrong result" and "direct guessing", resulting in noisy reward signals and an unstable policy optimization direction, which reinforces hallucinated answers; (2) During training, the exploration space of the model gradually shrinks, which can easily lead to policy collapse and to model degradation such as shortened reasoning chains and a rebound in error rate. These problems make it difficult for existing methods to maintain the structural integrity and generation stability of long-chain reasoning tasks.
[0006] In summary, existing technologies generally suffer from problems such as difficulty in constructing structured knowledge, unstable path retrieval, incomplete reasoning chain generation, and easy degradation of policy optimization in educational information processing tasks. There is an urgent need for a systematic technical solution that can simultaneously improve the quality of knowledge representation, retrieval accuracy, and reasoning chain stability.
Summary of the Invention
[0007] To address the shortcomings of existing technologies, this invention provides a method and apparatus for constructing educational intelligent agents based on intent recognition and knowledge graphs.
[0008] According to one aspect of the present invention, a method for constructing an educational intelligent agent based on intent recognition and knowledge graph is provided, comprising: based on a pre-built fused data generation system oriented towards the education vertical, generating samples containing teaching logic chains from limited seed materials, and mapping the generated samples into a knowledge graph to form a fused knowledge graph; based on the fused knowledge graph, performing intent-recognition-based reasoning enhancement on a pre-built educational agent to obtain an educational agent with domain knowledge, which is used to generate reasoning content constrained by structured paths according to the input target semantics; and combining knowledge distillation with an inter-group relative policy optimization algorithm to align the teaching reasoning chain and update the policy of the educational agent with domain knowledge, obtaining an educational agent with enhanced reasoning ability, which is used to generate corresponding reasoning results based on user input.
[0009] According to another aspect of the present invention, an educational intelligent agent construction apparatus based on intent recognition and knowledge graph is provided, comprising: a fusion module, used to generate samples containing teaching logic chains from limited seed materials based on a pre-built fused data generation system oriented towards the education vertical, and to map the generated samples into a knowledge graph to form a fused knowledge graph; a reasoning enhancement module, used to perform intent-recognition-based reasoning enhancement on a pre-built educational agent based on the fused knowledge graph, thereby obtaining an educational agent with domain knowledge, which is used to generate reasoning content constrained by structured paths according to the input target semantics; and an update module, used to combine knowledge distillation with an inter-group relative policy optimization algorithm to align the teaching reasoning chain and update the policy of the educational agent with domain knowledge, obtaining an educational agent with enhanced reasoning ability, which is used to generate corresponding reasoning results based on user input.
[0010] According to another aspect of the present invention, a computer-readable storage medium is provided, the storage medium storing a computer program for performing the methods described in any of the above aspects of the present invention.
[0011] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: a processor; a memory for storing executable instructions of the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method described in any of the preceding aspects of the present invention.
[0012] Therefore, this invention automatically generates high-quality synthetic corpora containing teaching chains of thought through a collaborative mechanism between a generating agent and an evaluating agent. Combined with semantic extraction technology, it achieves structured fusion of real and synthetic data, significantly improving the knowledge graph coverage and the structured relationship completion rate for knowledge points, prerequisites, and reasoning steps. By combining intent recognition with knowledge graph path retrieval, the generated content is constrained by a clear teaching dependency structure, which prevents large models from expanding erroneously on the basis of surface similarity. This reduces the generation of unfounded content and lowers both the graph retrieval false-hit rate and the model hallucination rate. Through knowledge distillation and inter-group relative policy optimization, this invention significantly improves the structural stability of multi-hop reasoning, so that the generated reasoning chain keeps more complete and coherent logical steps. In actual teaching reasoning tasks, it improves the completeness of multi-hop reasoning chains and reduces the reasoning step deviation rate.
Attached Figure Description
[0013] Exemplary embodiments of the present invention can be more fully understood by referring to the following figures:
Figure 1 is a flowchart of a method for constructing an educational intelligent agent based on intent recognition and knowledge graph, provided by an exemplary embodiment of the present invention;
Figure 2 is a schematic diagram of a data synthesis and graph fusion system based on dual-agent collaboration, provided by an exemplary embodiment of the present invention;
Figure 3 is a schematic diagram illustrating the principle of an intent recognition-driven reasoning enhancement mechanism, provided by an exemplary embodiment of the present invention;
Figure 4 is a flowchart of the training process for a teaching reasoning optimization model based on inter-group relative policy optimization, provided by an exemplary embodiment of the present invention;
Figure 5 is a schematic diagram of the structure of an educational intelligent agent construction device based on intent recognition and knowledge graph, provided by an exemplary embodiment of the present invention;
Figure 6 shows the structure of an electronic device provided by an exemplary embodiment of the present invention.
Detailed Implementation
[0014] Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of the present invention, and not all embodiments of the present invention. It should be understood that the present invention is not limited to the exemplary embodiments described herein.
[0015] It should be noted that, unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of the components and steps described in these embodiments do not limit the scope of the invention.
[0016] Those skilled in the art will understand that the terms "first," "second," etc., in the embodiments of the present invention are only used to distinguish different steps, devices, or modules, and do not represent any specific technical meaning, nor do they indicate a necessary logical order between them.
[0017] It should also be understood that in the embodiments of the present invention, "multiple" can refer to two or more, and "at least one" can refer to one, two or more.
[0018] It should also be understood that any component, data or structure mentioned in the embodiments of the present invention can generally be understood as one or more unless explicitly defined or given contrary instructions in the context.
[0019] Furthermore, the term "and / or" in this invention is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this invention generally indicates that the preceding and following related objects have an "or" relationship.
[0020] It should also be understood that the description of the various embodiments in this invention emphasizes the differences between them; for their identical or similar parts, the embodiments can be referred to one another. For the sake of brevity, these parts will not be described again in detail.
[0021] At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the accompanying drawings are not drawn according to actual scale.
[0022] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention or its application or use.
[0023] Techniques, methods, and equipment known to those skilled in the art may not be discussed in detail, but where appropriate, they should be considered part of the specification.
[0024] It should be noted that similar labels and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be discussed further in subsequent figures.
[0025] The embodiments of this invention can be applied to electronic devices such as terminal devices, computer systems, and servers, and can operate together with a wide range of other general-purpose or special-purpose computing system environments or configurations. Well-known examples of terminal devices, computing systems, environments, and / or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, etc.
[0026] Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Typically, program modules can include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types. Computer systems / servers can be implemented in distributed cloud computing environments, where tasks are executed by remote processing devices linked through communication networks. In distributed cloud computing environments, program modules can reside on local or remote computing system storage media, including storage devices.
Exemplary methods
[0027] Figure 1 is a flowchart of a method for constructing an educational intelligent agent based on intent recognition and knowledge graphs, provided by an exemplary embodiment of the present invention. This embodiment can be applied to electronic devices. As shown in Figure 1, the method 100 for constructing an educational intelligent agent based on intent recognition and knowledge graph includes the following steps:
Step 101: Based on the pre-built fused data generation system oriented towards the education vertical, generate samples containing teaching logic chains from limited seed materials, and map the generated samples into a knowledge graph to form the fused knowledge graph;
Step 102: Based on the fused knowledge graph, perform intent-recognition-based reasoning enhancement on the pre-built educational agent to obtain an educational agent with domain knowledge. This educational agent is used to generate reasoning content constrained by structured paths according to the input target semantics;
Step 103: Combining knowledge distillation and an inter-group relative policy optimization algorithm, align the teaching reasoning chain and update the policy of the educational agent with domain knowledge to obtain an educational agent with enhanced reasoning ability. This educational agent is used to generate corresponding reasoning results based on user input.
[0028] Specifically, this invention provides an educational knowledge enhancement and reasoning method driven by agent intent, mainly comprising three parts: the construction of a fused data generation system, the construction of an intent-based reasoning enhancement mechanism, and the construction of a teaching reasoning optimization model. This method can run on servers, cloud platforms, or local computing devices, and outputs teaching content by processing, retrieving, and reasoning over input text. Specifically, it includes the following steps:
S1. Constructing a fused data generation system oriented towards the education vertical: See Figure 2. In this embodiment, the fused data generation system addresses the shortage of high-quality reasoning data in the education field. It constructs a fused data system usable for instructional reasoning through two technologies: dual-agent collaborative generation and semantic extraction and alignment. This system can generate samples containing teaching logic chains from limited seed materials, then structure these samples and map them to a knowledge graph to form foundational teaching data that can be used for subsequent reasoning enhancement.
[0029] S2. Constructing an intent-based reasoning enhancement mechanism: See Figure 3. Based on the fused knowledge graph, this mechanism provides logical constraints for model generation through intent recognition and structured path retrieval, ensuring that the reasoning process aligns with the instructional dependency structure. The mechanism consists of three parts: intent recognition, path retrieval, and controlled generation. By recognizing the learner's target semantics and locating the corresponding path in the knowledge graph, it strengthens the logical soundness of the generation process.
[0030] S3. Constructing an instructional reasoning optimization model based on the post-training paradigm: See Figure 4. By combining knowledge distillation and an inter-group relative policy optimization algorithm, the teaching reasoning chain of the model is aligned and the policy is updated, so that the reasoning process of the model in educational tasks remains stable and consistent in terms of step structure and teaching logic.
[0031] Furthermore, the specific steps for constructing the integrated data generation system oriented towards the education vertical field, as described in S1, are as follows: S11. Data Synthesis through Dual-Agent Collaboration: Automated generation of teaching reasoning chain samples is achieved through a collaborative process between a generating agent and an evaluating agent. The generating agent performs retrieval, planning, and controlled generation based on the task objective, outputting initial samples containing the step chain; the evaluating agent provides semantic consistency and logical structure feedback on the samples; the generating agent performs multiple rounds of correction based on the feedback until the sample quality meets the preset requirements.
[0032] S12. Data Fusion Based on Agent Semantic Extraction: S12 is used to perform semantic extraction and alignment between the generated samples and real teaching data, enabling them to be mapped into the structural framework of the teaching knowledge graph. This step includes entity extraction, relation extraction, and cross-source alignment to transform the samples into structured content.
[0033] Furthermore, the specific process of data synthesis based on dual-agent cooperation in S11 is as follows: The system first receives the seed materials and constructs an initial generation scenario $c_0$ consisting of teaching objectives, knowledge point descriptions, and prerequisite constraints. The generating agent then generates the sample for round $t$:
$$x_t = F_{\text{gen}}(c_0, f_{t-1})$$
where $x_t$ denotes the teaching sample generated in round $t$; $F_{\text{gen}}$ is the generation function of the generating agent; and $f_{t-1}$ is the evaluation result of the previous round.
[0034] The evaluating agent generates feedback through the Reflexion mechanism, assessing semantic consistency, step rationality, and adherence to preconditions:
$$f_t = F_{\text{eval}}(x_t) = \left(f_t^{\text{logic}},\ f_t^{\text{sem}},\ f_t^{\text{pre}}\right)$$
where $F_{\text{eval}}$ is the feedback function of the evaluating agent, and $f_t^{\text{logic}}$, $f_t^{\text{sem}}$, and $f_t^{\text{pre}}$ are the review information for logic, semantics, and preconditions, respectively.
[0035] The system determines the quality score:
$$q_t = Q(x_t, f_t), \qquad x_t \text{ is accepted if } q_t \ge \delta_q$$
where $Q$ is the quality evaluation function and $\delta_q$ is a preset quality threshold.
[0036] If the score is insufficient, the generating agent is driven to perform the next round of correction based on feedback; if the threshold is met, the sample is included in the teaching dataset. This iterative mechanism automatically obtains high-quality reasoning samples with logical chains.
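The generate, evaluate, and correct loop of S11 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `Feedback`, `synthesize_sample`, `make_toy_generator`, and `toy_evaluate` are hypothetical, the two agents are replaced by toy stand-ins instead of language model calls, and the threshold of 0.8 and the round limit are assumed values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    """Review information returned by the evaluating agent (hypothetical structure)."""
    logic: str
    semantics: str
    prerequisites: str
    score: float  # quality score in [0, 1]


def synthesize_sample(
    seed: str,
    generate: Callable[[str, Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    threshold: float = 0.8,   # assumed quality threshold
    max_rounds: int = 5,      # assumed cap on correction rounds
) -> Optional[str]:
    """Regenerate a sample until its quality score reaches the threshold."""
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        sample = generate(seed, feedback)   # round t uses feedback from round t-1
        feedback = evaluate(sample)
        if feedback.score >= threshold:
            return sample                   # accepted into the teaching dataset
    return None                             # never reached the threshold


def make_toy_generator() -> Callable[[str, Optional[Feedback]], str]:
    """Toy generating agent: emits one more reasoning step in every round."""
    state = {"round": 0}

    def generate(seed: str, feedback: Optional[Feedback]) -> str:
        state["round"] += 1
        return "\n".join(f"Step {i}: reason about {seed}" for i in range(1, state["round"] + 1))

    return generate


def toy_evaluate(sample: str) -> Feedback:
    """Toy evaluating agent: a chain with three or more steps gets the full score."""
    steps = sample.count("Step ")
    return Feedback("ok", "ok", "ok", score=min(1.0, steps / 3))


result = synthesize_sample("adding fractions", make_toy_generator(), toy_evaluate)
```

With these toy agents the loop accepts the sample in the third round. In a real system the two callables would wrap model calls, and the feedback text, not only the score, would be passed back into the generation prompt.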
[0037] Furthermore, the specific process of data fusion based on agent semantic extraction in S12 is as follows: The system performs semantic extraction on the generated data $D_{\text{gen}}$ and the real data $D_{\text{real}}$ respectively, obtaining concept nodes, teaching activity nodes, skill nodes, and their corresponding relationships; alignment is then performed based on semantic similarity calculation:
$$A = \mathrm{Align}(V_{\text{real}}, V_{\text{gen}}) = \left\{ (v_r, v_g) \mid \mathrm{sim}(v_r, v_g) \ge \delta_a \right\}$$
where $A$ is the set of results after alignment; $v_r \in V_{\text{real}}$ and $v_g \in V_{\text{gen}}$ are nodes derived from real data and generated data, respectively; $\mathrm{Align}$ is the semantic alignment function; $\mathrm{sim}(v_r, v_g)$ is the semantic similarity between nodes; and $\delta_a$ is the similarity threshold.
[0038] After aligning nodes and relationships, the system maps the aligned structure to the knowledge graph and performs consistency processing to ensure that newly added nodes and relationships do not violate constraints such as prerequisite relationships and instructional dependencies. This processing satisfies:
$$\mathcal{G}_{t+1} = \Phi\left(\mathcal{G}_t \cup A\right)$$
where $\mathcal{G}_t$ and $\mathcal{G}_{t+1}$ are the graph states at time $t$ and time $t+1$, respectively; the consistency function $\Phi$ is used to eliminate conflicts and redundancies, ensuring that the final graph maintains the correctness of the teaching logic.
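The alignment and consistency processing of S12 can be illustrated with a small sketch. Several choices here are assumptions, since the text does not fix them: similarity is taken to be cosine similarity over node embeddings, the consistency check is reduced to rejecting duplicate edges and edges that would make the prerequisite relation cyclic, and all function names and example values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (prerequisite, dependent)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_nodes(
    real_nodes: Dict[str, Sequence[float]],
    generated_nodes: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed similarity threshold
) -> List[Tuple[str, str]]:
    """Pair real and generated nodes whose similarity reaches the threshold."""
    return [
        (real_id, gen_id)
        for real_id, real_vec in real_nodes.items()
        for gen_id, gen_vec in generated_nodes.items()
        if cosine(real_vec, gen_vec) >= threshold
    ]


def creates_cycle(edges: List[Edge], new_edge: Edge) -> bool:
    """True if adding new_edge would make the prerequisite graph cyclic."""
    src, dst = new_edge
    stack, seen = [dst], set()
    while stack:                      # depth-first search starting at dst
        node = stack.pop()
        if node == src:
            return True               # src is reachable from dst, so the edge closes a loop
        if node not in seen:
            seen.add(node)
            stack.extend(b for a, b in edges if a == node)
    return False


def merge_edges(edges: List[Edge], candidates: List[Edge]) -> List[Edge]:
    """Add candidate edges, skipping duplicates and cycle-forming edges."""
    merged = list(edges)
    for edge in candidates:
        if edge not in merged and not creates_cycle(merged, edge):
            merged.append(edge)
    return merged


real = {"fractions": [1.0, 0.0], "ratios": [0.0, 1.0]}
generated = {"fraction_basics": [0.98, 0.05], "geometry": [-1.0, 0.2]}
pairs = align_nodes(real, generated)

graph = merge_edges(
    [("fractions", "ratios")],
    [("ratios", "proportions"), ("proportions", "fractions")],
)
```

In this example only `fraction_basics` aligns with `fractions`, and the candidate edge from `proportions` back to `fractions` is rejected because it would make a concept a prerequisite of its own prerequisite.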
[0039] Furthermore, the specific steps for constructing the intent recognition-based reasoning enhancement mechanism in S2 are as follows:
S21. Intent Recognition and Semantic Mapping: The learner's input is mapped into an intent vector representation that can be located in the knowledge graph. The system receives the user's question and its contextual information and processes them through the intent encoding function. The intent vector is calculated as follows:
$$z = f_{\text{intent}}(q, c)$$
where $q$ is the learning objective and $c$ is the contextual information. The generated intent vector $z$ contains the explicit goal of the question and the implicit cognitive level, and is used to determine the degree of correlation between the user's needs and each node of the knowledge graph.
[0040] S22. Knowledge Graph-Based Logical Path Retrieval: Instructional dependency paths are retrieved from the fused knowledge graph based on the intent vector to provide the necessary structured constraints for generation. The system calculates node correlation from the teaching dependency distance $d(z, v)$ between the intent vector $z$ and each graph node $v$, and selects the set of nodes that satisfy the constraint according to a threshold $\delta_d$, so that the retrieval result meets the following requirement:
$$P = \left\{ v \in \mathcal{G} \mid d(z, v) \le \delta_d \right\}$$
where $\mathcal{G}$ is the fused knowledge graph. The retrieved paths reflect the prerequisite concept relationships, reasoning order, and instructional dependency structure involved in the question.
[0041] S23. Enhanced Generation: Controlled reasoning generation is performed based on the path retrieval results. The system uses relevant nodes in the logical path and their associated teaching content as contextual evidence input to the model. By constraining the path order and associated knowledge, the generation process can follow the graph structure for reasoning, ensuring that the answers are consistent with the teaching structure in terms of content organization and logical relationships. This guarantees the controllability of the reasoning process and reduces the risk of generating unfounded content.
[0042] Furthermore, the specific process for calculating the correlation between intent and knowledge graph nodes in S21 is as follows: After the intent vector $z$ is calculated, the system calculates the correlation between the intent and each knowledge graph node:
$$r(z, v_i) = \sigma\left(W\,[z \,;\, e_i]\right)$$
where $r(z, v_i)$ is the correlation between the intent vector and graph node $v_i$; $\sigma$ is a non-linear activation function; $W$ is the correlation parameter matrix; and $[z \,;\, e_i]$ is the concatenated representation of the intent vector $z$ and the node embedding $e_i$.
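As an illustration of this correlation score, the sketch below concatenates an intent vector with a node embedding, applies a linear map, and passes the result through a sigmoid. The choice of sigmoid as the activation, the use of a single weight row so that the score is a scalar, and the names `relevance` and `filter_by_relevance` are all assumptions; the weights would be learned in practice, whereas here they are fixed toy values.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relevance(intent: Sequence[float], node_emb: Sequence[float], weights: Sequence[float]) -> float:
    """Score = sigmoid(w . [intent ; node_emb]), with one weight per concatenated entry."""
    concat = list(intent) + list(node_emb)
    if len(weights) != len(concat):
        raise ValueError("weights must match the length of the concatenated vector")
    return sigmoid(sum(w * c for w, c in zip(weights, concat)))


def filter_by_relevance(
    intent: Sequence[float],
    nodes: Dict[str, Sequence[float]],
    weights: Sequence[float],
    threshold: float,
) -> List[str]:
    """Keep the ids of nodes whose relevance reaches the semantic threshold."""
    return [nid for nid, emb in nodes.items() if relevance(intent, emb, weights) >= threshold]


intent_vec = [1.0, 0.0]
node_embs = {"fractions": [1.0, 0.0], "geometry": [-1.0, 0.0]}
toy_weights = [1.0, 0.0, 1.0, 0.0]  # toy values, not learned parameters
kept = filter_by_relevance(intent_vec, node_embs, toy_weights, threshold=0.6)
```

Here `fractions` scores sigmoid(2), roughly 0.88, and passes the threshold, while `geometry` scores sigmoid(0) = 0.5 and is filtered out.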
[0043] Furthermore, the specific process of logical path retrieval based on knowledge graphs in S22 is as follows: The system performs semantic relevance filtering on nodes based on the correlation $r(z, v_i)$ described in S21, and obtains the candidate nodes using a semantic threshold $\delta_{\text{sem}}$:
$$V_{\text{sem}} = \left\{ v_i \mid r(z, v_i) \ge \delta_{\text{sem}} \right\}$$
In the knowledge graph $\mathcal{G}$, a node set $V = \{v_1, v_2, \dots, v_n\}$ is defined, and a graph structure is constructed with teaching dependencies as weights. For any node $v_i$, the system calculates the teaching path distance between the node and the intent vector:
$$d(z, v_i) = \mathrm{Dist}(z, e_i)$$
where $d(z, v_i)$ is the teaching dependency distance between node $v_i$ and the intent vector $z$; $e_i$ is the semantic embedding of the node; and $\mathrm{Dist}$ is a distance metric function defined in a high-dimensional space.
[0044] The system constructs the logical node set based on a distance threshold $\delta_d$:
$$V_{\text{logic}} = \left\{ v_i \in V_{\text{sem}} \mid d(z, v_i) \le \delta_d \right\}$$
Furthermore, the specific process of enhanced generation in S23 is as follows: The system first constructs a sequence of path evidence from the node set $V_{\text{logic}}$:
$$E = \left[ k(v_1), k(v_2), \dots, k(v_m) \right], \quad v_j \in V_{\text{logic}}$$
where $E$ is the final sequence of contextual evidence used to constrain the generation, and $k(v_j)$ is the authoritative teaching content or structured knowledge corresponding to node $v_j$.
[0045] The input representation of the augmented generative model is as follows:
$$x_{\text{aug}} = [q \,;\, E]$$
where $x_{\text{aug}}$ is the augmented input, a concatenated structure of the user's original input $q$ and the path evidence $E$.
[0046] The generative model then outputs path-constrained reasoning content based on this input:
$$y = \pi_\theta(x_{\text{aug}})$$
where $y$ is the final reasoning result and $\pi_\theta$ is the policy function of the generative model with parameters $\theta$.
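The distance filtering, evidence assembly, and input concatenation described for S22 and S23 might look like the following sketch. It assumes Euclidean distance as the distance metric, assumes the candidate nodes arrive already sorted in prerequisite order, and uses a plain-text prompt layout with `[Evidence n]` and `[Question]` tags; these conventions and the function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def select_path_nodes(
    intent: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    max_distance: float,
) -> List[str]:
    """Keep candidates within max_distance of the intent vector, preserving their order."""
    return [nid for nid, emb in candidates if math.dist(intent, emb) <= max_distance]


def build_augmented_input(question: str, path: List[str], content: Dict[str, str]) -> str:
    """Concatenate the ordered path evidence with the user's original question."""
    evidence = [f"[Evidence {i}] {content[nid]}" for i, nid in enumerate(path, start=1)]
    return "\n".join(evidence + [f"[Question] {question}"])


intent_vector = [1.0, 0.0]
candidate_nodes = [                      # assumed to be listed in prerequisite order
    ("fractions", [0.9, 0.1]),
    ("ratios", [0.8, 0.0]),
    ("geometry", [-1.0, 0.0]),
]
teaching_content = {
    "fractions": "A fraction a/b represents a parts out of b equal parts.",
    "ratios": "A ratio a:b compares two quantities and can be written as a/b.",
    "geometry": "A triangle has three sides.",
}

path = select_path_nodes(intent_vector, candidate_nodes, max_distance=0.5)
prompt = build_augmented_input("How do I solve a proportion?", path, teaching_content)
```

The resulting `prompt` would then be passed to the generative model. Keeping the evidence in prerequisite order is what lets the model follow the graph structure rather than an arbitrary retrieval ranking.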
[0047] Furthermore, the specific steps for constructing the instructional reasoning optimization model based on the post-training paradigm, as described in S3, are as follows:
S31. Chain-of-Thought Transfer Based on Knowledge Distillation: High-quality teaching trajectories generated by the teacher model are used as soft labels to guide the student model in learning the distribution of reasoning steps, achieving chain-of-thought transfer from the teacher model to the student model. The distillation objective function is:
$$\mathcal{L}_{\text{KD}} = \mathbb{E}_{x \sim \mathcal{D}} \left[ \sum_{t=1}^{T} w_t \, \mathrm{KL}\big( p_T(\cdot \mid x, s_t) \,\|\, p_S(\cdot \mid x, s_t) \big) \right]$$
S32. Policy Update Based on Inter-group Relative Policy Optimization: In the absence of an explicit value network, multiple reasoning trajectories are generated by sampling for the same instruction. The intra-group composite reward is calculated, and the policy parameters are updated based on the reward differences to increase the probability of generating high-quality reasoning chains. The optimization objective is:
$$J(\theta) = \mathbb{E}_{x,\ \{o_i\}_{i=1}^{K} \sim \pi_\theta(\cdot \mid x)} \left[ \frac{1}{K} \sum_{i=1}^{K} A_i \log \pi_\theta(o_i \mid x) \right]$$
where $K$ is the number of trajectories sampled for one instruction and $A_i$ is the advantage of trajectory $o_i$ relative to its group.
Furthermore, the specific process of chain-of-thought transfer based on knowledge distillation in S31 is as follows: For the same teaching sample $x$, the teacher model $\pi_T$ first reasons over the input $x$ to obtain a teaching thought trajectory of length $T$, $\xi = (s_1, s_2, \dots, s_T)$, where $s_t$ denotes the intermediate state corresponding to reasoning step $t$. At each time step $t$, the teacher model outputs a conditional probability distribution $p_T(\cdot \mid x, s_t)$ over the vocabulary or action space, where the temperature coefficient $\tau$ is used to smooth the probability distribution. The student model $\pi_S$, under the same input $x$ and state $s_t$, outputs the corresponding distribution $p_S(\cdot \mid x, s_t)$. For each time step, the Kullback-Leibler divergence between the teacher distribution and the student distribution is calculated, and the importance of the different time steps is weighted by $w_t$ to obtain the total distillation loss for a single sample; $w_t$ can be used to emphasize the impact of key reasoning steps on the overall teaching logic. Minimizing $\mathcal{L}_{\text{KD}}$ over batches of samples from the dataset $\mathcal{D}$ enables the student model to gradually align with the teacher model's policy distribution throughout the entire reasoning chain, allowing the student model to reproduce the teacher model's thought process in educational scenarios.
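The step-weighted distillation loss can be sketched in plain Python as below. This is an illustration under assumptions: both teacher and student logits are softened with the same temperature (a common convention that the text does not spell out), the per-step weights are supplied by the caller, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a smoother distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)                                  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    step_weights: Sequence[float],
    temperature: float = 2.0,  # assumed value
) -> float:
    """Sum over reasoning steps of weight_t * KL(teacher_t || student_t)."""
    loss = 0.0
    for t_logits, s_logits, weight in zip(teacher_logits, student_logits, step_weights):
        teacher_dist = softmax(t_logits, temperature)
        student_dist = softmax(s_logits, temperature)
        loss += weight * kl_divergence(teacher_dist, student_dist)
    return loss


teacher = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two reasoning steps, three actions each
student = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]
weights = [1.0, 2.0]                           # the second step is treated as a key step

loss_matched = distillation_loss(teacher, teacher, weights)
loss_mismatched = distillation_loss(teacher, student, weights)
```

When the student reproduces the teacher's logits the loss is zero, and any mismatch yields a positive loss that grows faster on the step with the larger weight.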
[0048] Specifically, at each time step $t$, the system calculates the difference between the teacher model's policy distribution and the student model's policy distribution. This difference is measured using the Kullback-Leibler (KL) divergence, in the following form:
$$D_t = \mathrm{KL}\big( p_T \,\|\, p_S \big) = \sum_{a} p_T(a \mid x, s_t) \log \frac{p_T(a \mid x, s_t)}{p_S(a \mid x, s_t)}$$
Subsequently, the KL divergence term of each time step is weighted by $w_t$, where $w_t$ can be determined based on the importance, position, or prior settings of the reasoning steps; this invention does not limit the specific calculation method. In this way, the difference measurement and the weighting coefficient jointly determine the contribution of each time step to the total distillation loss, making the distillation process more focused on policy alignment for key reasoning steps. The contribution of time step $t$ takes the form:
$$\ell_t = w_t \, D_t$$
Next, the weighted KL divergences of all time steps in the reasoning chain are summed to obtain the distillation loss for a single sample, which measures how closely the student model approximates the teacher model's policy distribution throughout the reasoning process:
$$\mathcal{L}_{\text{KD}}(x) = \sum_{t=1}^{T} w_t \, D_t$$
Furthermore, the specific process of policy update based on inter-group relative policy optimization in S32 is as follows: The teaching reasoning process is modeled as a Markov decision process $(\mathcal{S}, \mathcal{A}, P, R)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $P$ is the state transition probability, and $R$ is the reward function. In this invention, the state $s_t$ may include information such as the current dialogue history, the partially generated chain-of-thought representation, and the intent vector $z$; the action $a_t$ denotes the next generation unit or reasoning operation selected at step $t$.
[0049] For the same teaching instruction $x$, $K$ reasoning trajectories are sampled from the current policy $\pi_\theta$:
$$\{o_1, o_2, \dots, o_K\} \sim \pi_\theta(\cdot \mid x)$$
For each trajectory $o_i$, a composite reward is constructed based on the correctness of the answer, the logical consistency of the steps, and the degree of adherence to the prerequisite constraints, which can be represented as:
$$R(o_i) = \lambda_1 R_{\text{ans}}(o_i) + \lambda_2 R_{\text{logic}}(o_i) + \lambda_3 R_{\text{pre}}(o_i)$$
where $R_{\text{ans}}$ measures whether the final answer is correct; $R_{\text{logic}}$ measures whether the steps in the thought process are consistent with the reference solution or the teacher's policy; $R_{\text{pre}}$ measures whether the reasoning conforms to the prerequisite dependency constraints in the knowledge graph; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the corresponding weighting coefficients.
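A small sketch shows how such a composite reward separates a trajectory with sound steps but a wrong final answer from a lucky guess, two cases that a binary answer-only score cannot grade by reasoning quality. The weighting coefficients (0.4, 0.4, 0.2) and the component scores are hypothetical values chosen for illustration; the text leaves them unspecified.

```python
from typing import Tuple


def composite_reward(
    answer_correct: bool,
    logic_score: float,         # step consistency with a reference solution, in [0, 1]
    prerequisite_score: float,  # adherence to prerequisite constraints, in [0, 1]
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # assumed coefficients
) -> float:
    """Weighted sum of answer, step-logic, and prerequisite rewards."""
    w_answer, w_logic, w_prereq = weights
    return w_answer * float(answer_correct) + w_logic * logic_score + w_prereq * prerequisite_score


# Sound reasoning with an arithmetic slip at the end.
sound_but_wrong = composite_reward(False, logic_score=0.9, prerequisite_score=1.0)
# Correct answer with no supporting steps.
lucky_guess = composite_reward(True, logic_score=0.0, prerequisite_score=0.0)
# Correct answer reached through a complete, ordered chain.
fully_correct = composite_reward(True, logic_score=1.0, prerequisite_score=1.0)
```

With these weights the sound-but-wrong trajectory scores about 0.56 and the lucky guess 0.4, whereas an answer-only binary reward would give them 0 and 1. Whether that ordering is desirable depends on the chosen coefficients.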
[0050] For the $K$ trajectories $o_1, \dots, o_K$ sampled for the same instruction $x$, with composite rewards $R(o_i)$, the average return within the group is calculated:
$$\bar{R} = \frac{1}{K} \sum_{i=1}^{K} R(o_i)$$
and the advantage value of each trajectory is defined as:
$$A_i = R(o_i) - \bar{R}$$
The advantage value $A_i$ reflects the relative merit of the trajectory compared with the other trajectories in the group. When $A_i > 0$, the trajectory is superior to the group average in terms of answer correctness and teaching logic, and the probability of generating its corresponding action sequence should be increased; when $A_i < 0$, the weight of the policy corresponding to that trajectory is reduced. Based on the advantage values, a policy gradient update is performed on the policy parameters $\theta$:
$$\theta \leftarrow \theta + \eta \, \frac{1}{K} \sum_{i=1}^{K} A_i \, \nabla_\theta \log \pi_\theta(o_i \mid x)$$
where $\eta$ is the learning rate. In practical implementation, normalization can be applied when calculating the advantage value, or gradients can be clipped, to improve training stability and prevent gradient explosion. Through multiple rounds of iterative updates, the policy network gradually increases the probability of generating high-reward trajectories and suppresses low-reward trajectories, enabling the model to obtain a stable teaching reasoning policy without relying on an explicit value network.
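The group-relative advantage computation can be sketched as follows. This is a simplified illustration: it covers only the advantage values and a REINFORCE-style surrogate loss whose gradient matches the update described above, and it omits refinements such as probability-ratio clipping or a KL penalty that group-relative methods often add in practice. The function names and the example rewards are assumptions.

```python
import statistics
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], normalize: bool = True, eps: float = 1e-8) -> List[float]:
    """Advantage of each trajectory relative to the mean reward of its group."""
    mean_reward = statistics.fmean(rewards)
    advantages = [reward - mean_reward for reward in rewards]
    if normalize:
        spread = statistics.pstdev(rewards)   # population standard deviation of the group
        advantages = [adv / (spread + eps) for adv in advantages]
    return advantages


def surrogate_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """Loss whose negative gradient raises the log-probability of above-average trajectories."""
    weighted = [adv * log_prob for adv, log_prob in zip(advantages, log_probs)]
    return -sum(weighted) / len(weighted)


group_rewards = [1.0, 0.0, 0.5]               # composite rewards of three sampled trajectories
raw_adv = group_advantages(group_rewards, normalize=False)
norm_adv = group_advantages(group_rewards)

trajectory_log_probs = [-1.0, -2.0, -3.0]     # toy sequence log-probabilities under the policy
loss = surrogate_loss(trajectory_log_probs, raw_adv)
```

In a training loop the log-probabilities would come from the policy network and the loss would be minimized with an automatic differentiation framework. The group mean plays the role of the baseline, so no separate value network is needed.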
[0051] Compared with existing technologies, this invention has the following beneficial effects: 1) It alleviates data scarcity in the education vertical. Through a collaborative mechanism between the generating agent and the evaluating agent, the invention automatically generates high-quality synthetic corpora containing teaching chains of thought. Combined with semantic extraction, it fuses real and synthetic data in structured form, significantly improving the knowledge graph coverage and the structured-relationship completion rate for knowledge points, prerequisites, and reasoning steps. 2) It reduces the risk of hallucination during reasoning. By combining intent recognition with knowledge graph path retrieval, the invention constrains the generated content to an explicit teaching dependency structure and prevents the large model from expanding erroneously on the basis of surface similarity. This reduces unfounded content and lowers both the false-hit rate of graph retrieval and the hallucination rate of model generation. 3) It improves the stability and consistency of the reasoning chain. Through knowledge distillation and inter-group relative policy optimization, the invention significantly improves the structural stability of the model's multi-hop reasoning, so that generated reasoning chains keep more complete and coherent logical steps. In actual teaching reasoning tasks, it improves the completeness of multi-hop reasoning links and reduces the reasoning step deviation rate.
Exemplary device
[0052] Figure 5 is a schematic structural diagram of a device for constructing an educational intelligent agent based on intent recognition and a knowledge graph, provided in an exemplary embodiment of the present invention. As shown in Figure 5, the device 500 includes: a fusion module 510, configured to generate samples containing teaching logic chains from limited seed materials based on a pre-built fusion data generation system oriented towards the education vertical, and to map the generated samples to a knowledge graph to form the fused knowledge graph; a reasoning enhancement module 520, configured to perform intent-recognition reasoning enhancement on a pre-built educational agent based on the fused knowledge graph, obtaining an educational agent with domain knowledge, which generates reasoning content constrained by structured paths based on the input target semantics; and an update module 530, configured to combine knowledge distillation with the inter-group relative policy optimization algorithm to align the teaching reasoning chain and update the policy of the educational agent with domain knowledge, obtaining an educational agent with enhanced reasoning ability, which generates corresponding reasoning results based on user input.
Exemplary electronic device
[0053] Figure 6 shows the structure of an electronic device provided in an exemplary embodiment of the present invention. As shown in Figure 6, the electronic device 60 includes one or more processors 61 and a memory 62.
[0054] The processor 61 may be a central processing unit (CPU) or other form of processing unit with data processing and / or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
[0055] The memory 62 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and / or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 61 may execute the program instructions to implement the methods of the software programs of the various embodiments of the present invention described above, and / or other desired functions. In one example, the electronic device may also include an input device 63 and an output device 64, these components being interconnected via a bus system and / or other forms of connection mechanisms (not shown).
[0056] The input device 63 may include, for example, a keyboard, a mouse, etc.
[0057] The output device 64 can output various information to the outside. The output device 64 may include, for example, a display, a speaker, a printer, and a communication network and its connected remote output devices, etc.
[0058] Of course, for the sake of simplicity, Figure 6 shows only some of the components of the electronic device that are relevant to the present invention, omitting components such as buses and input/output interfaces. In addition, the electronic device may include any other suitable components depending on the specific application.
Exemplary computer program products and computer-readable storage media
[0059] In addition to the methods and apparatus described above, embodiments of the present invention may also be computer program products, which include computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of the present invention described in the "Exemplary Methods" section above.
[0060] The computer program product can be written in any combination of one or more programming languages to perform the operations of the embodiments of the present invention. The programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server.
[0061] Furthermore, embodiments of the present invention may also be computer-readable storage media storing computer program instructions thereon, which, when executed by a processor, cause the processor to perform the steps of the methods according to various embodiments of the present invention described in the "Exemplary Methods" section above.
[0062] The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0063] The basic principles of the present invention have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in the present invention are merely examples and not limitations, and should not be considered essential to every embodiment of the present invention. Furthermore, the specific details disclosed above are provided only for illustration and ease of understanding, not as limitations; the present invention does not have to be implemented using these specific details.
[0064] The embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the others. For parts that are the same or similar across embodiments, the embodiments may be cross-referenced. Since the system embodiments largely correspond to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
[0065] The block diagrams of components, apparatuses, devices, and systems involved in this invention are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as “comprising,” “including,” and “having” are open-ended terms meaning “including but not limited to,” and may be used interchangeably with that phrase. The terms “or” and “and” as used herein mean “and/or” and may be used interchangeably with it, unless the context clearly indicates otherwise. The term “such as” as used herein means “such as but not limited to” and may be used interchangeably with it.
[0066] The methods and systems of the present invention may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order of steps for the methods is for illustrative purposes only, and the steps of the methods of the present invention are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, the present invention may also be implemented as a program recorded on a recording medium, the program comprising machine-readable instructions for implementing the methods according to the present invention. Thus, the present invention also covers recording media storing programs for performing the methods according to the present invention.
[0067] It should also be noted that in the systems, apparatus, and methods of the present invention, the components or steps can be decomposed and/or recombined. Such decompositions and/or recombinations should be considered equivalents of the present invention. The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of the invention. Therefore, the invention is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
[0068] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the invention to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations thereof.
Claims
1. A method for constructing an educational intelligent agent based on intent recognition and a knowledge graph, characterized in that it comprises: generating, based on a pre-built fusion data generation system oriented towards the education vertical, samples containing teaching logic chains from limited seed materials, and mapping the generated samples to a knowledge graph to form a fused knowledge graph; performing, based on the fused knowledge graph, intent-recognition reasoning enhancement on a pre-built educational agent to obtain an educational agent with domain knowledge, the educational agent being used to generate reasoning content constrained by structured paths based on the input target semantics; and combining knowledge distillation with an inter-group relative policy optimization algorithm to align the teaching reasoning chain and update the policy of the educational agent with domain knowledge, obtaining an educational agent with enhanced reasoning ability, the educational agent being used to generate corresponding reasoning results based on user input.
2. The method according to claim 1, characterized in that the fusion data generation system includes a generating agent and an evaluating agent, and that generating samples containing teaching logic chains from limited seed materials based on the pre-built fusion data generation system oriented towards the education vertical includes: receiving the seed materials with the fusion data generation system and constructing an initial generation scenario; generating, by the generating agent, the k-th round sample based on the initial generation scenario; generating, by the evaluating agent and based on the Reflexion mechanism, feedback on the semantic consistency, step rationality, and prerequisite-dependency compliance of the k-th round sample; calculating the quality score of the sample based on the feedback; if the quality score is below the preset quality threshold, driving the generating agent to revise the sample based on the feedback; and if the quality score is greater than or equal to the preset quality threshold, using the sample as the final generated sample.
3. The method according to claim 2, characterized in that the feedback $F_k$ is expressed as: $F_k = E(y_k) = (f_k^{\mathrm{logic}}, f_k^{\mathrm{sem}}, f_k^{\mathrm{pre}})$, where $E(\cdot)$ is the feedback function of the evaluating agent, $y_k$ is the k-th round sample, and $f_k^{\mathrm{logic}}$, $f_k^{\mathrm{sem}}$, $f_k^{\mathrm{pre}}$ are the review information for the logical, semantic, and prerequisite constraints, respectively; and the quality score is expressed as: $Q_k = \Phi(F_k)$, where $\Phi(\cdot)$ is the quality evaluation function.
4. The method according to claim 1, characterized in that mapping the generated samples to a knowledge graph to form the fused knowledge graph includes: performing semantic extraction on the generated samples and real samples to obtain concept nodes, teaching activity nodes, skill nodes, and the relationships among them; aligning the nodes and their relationships based on semantic similarity to obtain an alignment structure; and mapping the alignment structure to the knowledge graph and applying consistency processing to form the fused knowledge graph.
5. The method according to claim 4, characterized in that the alignment structure $A$ is expressed in terms of the semantic similarity as: $A = \{\, \mathrm{Align}(v_r, v_g) \mid \mathrm{sim}(v_r, v_g) \ge \delta \,\}$, where $v_r$ and $v_g$ are nodes derived from real data and generated data, respectively; $\mathrm{Align}(\cdot)$ is the semantic alignment function; $\mathrm{sim}(v_r, v_g)$ is the semantic similarity between the nodes; and $\delta$ is the similarity threshold; and the consistency processing is expressed as: $G_{t+1} = \Psi(G_t, A)$, where $G_t$ and $G_{t+1}$ are the states of the graph at times $t$ and $t+1$, respectively, and $\Psi(\cdot)$ is the consistency function.
6. The method according to claim 1, characterized in that, Based on the fused knowledge graph, the pre-built educational agent is subjected to reasoning enhancement for intent recognition, resulting in an educational agent with domain knowledge, including: The learner input is mapped to an intent vector representation z that can be located in the fused knowledge graph; Based on the intent vector representation z, the teaching dependency path is retrieved in the fused knowledge graph G to obtain the relevant node set R(z, G); Based on the relevant node set R(z, G), a contextual evidence sequence C is constructed, and this sequence is used to constrain the generation process, outputting the path-constrained reasoning content y, thus obtaining an educational intelligent agent with domain knowledge.
7. The method according to claim 6, characterized in that the intent vector $z$ is expressed as: $z = f_{\mathrm{int}}(g, c)$, where $f_{\mathrm{int}}(\cdot)$ is the intent encoding function, $g$ is the learning objective, and $c$ is the contextual information; and the relevant node set $R(z, G)$ is expressed as: $R(z, G) = \{\, v \in G \mid d(e_v, z) \le \epsilon \,\}$, where $G$ is the fused knowledge graph; $v$ is a graph node; $d(e_v, z)$ is the teaching-dependency distance between node $v$ and the intent vector $z$; $\epsilon$ is the preset threshold; $e_v$ is the semantic embedding of node $v$; and $d(\cdot, \cdot)$ is a distance metric function defined in a high-dimensional space.
8. The method according to claim 6, characterized in that constructing the contextual evidence sequence C based on the relevant node set R(z, G), using this sequence to constrain the generation process, and outputting the path-constrained reasoning content y includes: constructing a path evidence sequence based on the node set $R(z, G)$: $C = (c_1, c_2, \dots, c_m)$, where $C$ is the final contextual evidence sequence used to constrain generation and $c_j$ is the authoritative teaching content or structured knowledge corresponding to node $v_j \in R(z, G)$; enhancing the input representation of the generative model with the path evidence sequence to obtain the enhanced input: $x' = [x; C]$, where $x'$ is the enhanced input and $[x; C]$ denotes the concatenation of the user's original input $x$ with the path evidence; and outputting the path-constrained reasoning content based on the enhanced input: $y \sim \pi_\theta(\cdot \mid x')$, where $y$ is the final generated reasoning content and $\pi_\theta$ is the policy function of the generative model with parameters $\theta$.
9. The method according to claim 1, characterized in that, By combining knowledge distillation and inter-group relative policy optimization algorithms, the teaching reasoning chain of an educational agent with domain knowledge is aligned and the policy is updated, resulting in an educational agent with enhanced reasoning ability, including: Using high-quality teaching trajectories generated by the teacher model as soft labels, the student model learns the distribution of reasoning steps, thereby achieving the gradual alignment of the student model with the distribution of the teacher model's strategies throughout the entire reasoning chain; After the reasoning chains of the educational agent are aligned, multiple reasoning trajectories are generated by sampling the same teaching instruction in the absence of an explicit value network. The composite reward within the group is calculated and the policy parameters are updated based on the reward difference, resulting in the educational agent with enhanced reasoning ability.
10. The method according to claim 9, characterized in that using the high-quality teaching trajectories generated by the teacher model as soft labels to guide the student model in learning the distribution of reasoning steps, thereby gradually aligning the student model with the teacher model's policy distribution over the entire reasoning chain, includes: for the same teaching sample $x$, using the teacher model $\pi_{\mathrm{tea}}$ to reason over the input $x$ and obtain a teaching chain-of-thought trajectory $(s_1, \dots, s_T)$ of length $T$, where $s_t$ denotes the intermediate state corresponding to the $t$-th reasoning step; for each time step $t$, calculating the Kullback-Leibler (KL) divergence between the teacher distribution and the student distribution; measuring, based on the KL divergence, the difference between the teacher model's policy distribution and the student model's policy distribution at each time step $t$, giving the difference measurement result $D_t = \mathrm{KL}\big(\pi_{\mathrm{tea}}(\cdot \mid s_t) \,\|\, \pi_\theta(\cdot \mid s_t)\big)$; determining the weighting coefficient $w_t$ based on the importance, position, or prior settings of the reasoning step; determining, from the difference measurement result and the weighting coefficient, the contribution of each time step to the total distillation loss: $\ell_t = w_t \, D_t$; and, based on this contribution, computing the weighted sum of the KL divergences over all time steps of the reasoning chain to obtain the distillation loss for a single sample, thereby aligning the student model with the teacher model's policy distribution throughout the reasoning process, where the distillation loss $L_{\mathrm{KD}}$ is expressed as: $L_{\mathrm{KD}} = \sum_{t=1}^{T} w_t \, D_t$, where $\pi_\theta$ is the policy function of the generative model with parameters $\theta$, $\pi(\cdot \mid \cdot)$ denotes a conditional probability distribution, and the distributions are computed with a temperature coefficient.
11. The method according to claim 9, characterized in that generating multiple reasoning trajectories by sampling the same instruction in the absence of an explicit value network, calculating the intra-group composite reward, and updating the policy parameters based on the reward differences includes: for the same instruction, sampling $n$ reasoning trajectories $\tau_i$ from the current policy; for each reasoning trajectory $\tau_i$, calculating the composite reward $R_i$ based on the correctness of the answer, the logical consistency of the steps, and the degree of adherence to the prerequisite constraints; calculating, from the composite rewards $R_i$, the average return within the group $\bar{R}$ and the advantage value $A_i$ of each trajectory; and updating the policy parameters $\theta$ by policy gradient based on the advantage values $A_i$, where the gradient update formula is: $\nabla_\theta J(\theta) = \frac{1}{n} \sum_{i=1}^{n} A_i \, \nabla_\theta \log \pi_\theta(a_i \mid s_i)$, where $a_i$ is the $i$-th action and $s_i$ is the $i$-th state.
12. A device for constructing an educational intelligent agent based on intent recognition and a knowledge graph, characterized in that it comprises: a fusion module, configured to generate samples containing teaching logic chains from limited seed materials based on a pre-built fusion data generation system oriented towards the education vertical, and to map the generated samples to a knowledge graph to form a fused knowledge graph; a reasoning enhancement module, configured to perform intent-recognition reasoning enhancement on a pre-built educational agent based on the fused knowledge graph to obtain an educational agent with domain knowledge, the educational agent being used to generate reasoning content constrained by structured paths based on the input target semantics; and an update module, configured to combine knowledge distillation with an inter-group relative policy optimization algorithm to align the teaching reasoning chain and update the policy of the educational agent with domain knowledge, obtaining an educational agent with enhanced reasoning ability, the educational agent being used to generate corresponding reasoning results based on user input.
13. A computer-readable storage medium, characterized in that, The storage medium stores a computer program for performing the method described in any one of claims 1-11.
14. An electronic device, characterized in that the electronic device includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the method described in any one of claims 1-11.