An intelligent mathematics tutoring method and system based on step-by-step guidance planning and state alignment memory

By constructing an intelligent math tutoring model and utilizing step-by-step guided planning and state-aligned memory, the problems of discontinuous teaching and insufficient adaptability of large language models in math teaching are solved, and stable multi-round teaching dialogues and adaptive feedback are achieved.

CN122199232A · Pending · Publication Date: 2026-06-12 · BEIJING NORMAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING NORMAL UNIVERSITY
Filing Date
2026-04-27
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies for teaching mathematics using large language models suffer from problems such as fragmented guidance processes, strategy drift, lack of explicit step-by-step instructional planning, and difficulty in adapting to changes in learners' cognitive states, resulting in inconsistent teaching and insufficient adaptability.

Method used

An intelligent math tutoring model is constructed, which includes a step-by-step guidance planning component and an assessment-driven tutoring component. Through step-by-step teaching planning and state-aligned memory, a stable teaching framework is generated, and the model is made capable of adaptive feedback through training data synthesis and fine-tuning.

Benefits of technology

It achieves continuity, controllability, and standardization in multi-round teaching dialogues, ensuring that the guidance process is gradual, adapts to the learner's real-time status changes, and avoids answer leakage or path shortening.



Abstract

The application discloses an intelligent mathematics tutoring method and system based on step-by-step guidance planning and state alignment memory. The application solves the problems of fragmentation of the guidance process and strategy drift in the prior art by constructing an intelligent mathematics tutoring model including a step-by-step guidance planning component and an evaluation-driven tutoring component, explicitly decomposing complex mathematical problems into ordered intermediate teaching goals, and forming a stable teaching framework. Through the evaluation-driven teaching memory module, the cognitive state and teaching progress of the learner are continuously tracked, adaptive feedback based on state alignment is realized, the model can dynamically adjust the guidance strategy according to the real-time performance of the learner, and the premature disclosure of the answer or the shortening of the teaching path is avoided. At the same time, through the synthesis of structured training data and the fine-tuning of instructions, the model internalizes the step-by-step planning and state evaluation capabilities in a unified parameter space, ensuring the coherence, controllability and teaching normativity of the multi-round long-range teaching dialogue.

Description

Technical Field

[0001] This invention relates to the fields of natural language processing and intelligent education technology, specifically to an intelligent mathematics tutoring method and system based on step-by-step guided planning and state-aligned memory.

Background Technology

[0002] Large language models have achieved significant success in natural language understanding and complex logical reasoning, particularly demonstrating strong potential in mathematical problem-solving. As their core capabilities mature, research is increasingly focused on transforming these models from mere problem-solving tools into personalized instructional mentors. Effective mentors should not simply provide answers but should guide learners to independently construct solutions, adhering to pedagogical principles. The core challenge in developing current instructional dialogue systems lies in ensuring that large language models maintain mathematical rigor while possessing pedagogical sensitivity, guaranteeing that the guidance process is both logically sound and effectively adaptable to learners' evolving cognitive states. Prior approaches have primarily included guidance algorithms based on prompt engineering, instructional imitation algorithms based on supervised fine-tuning, and instructional alignment algorithms based on reinforcement learning.

[0003] In existing technologies, instructional guidance algorithms based on prompt engineering typically pre-define teacher roles, teaching rules, and dialogue constraints within the system prompt or in-context prompts of a large language model. This allows the model to output more appropriate guiding language during interactions without updating model parameters. These methods rely on the design and assembly of prompt templates and continuously inject teaching objectives and behavioral guidelines into the model across multiple rounds of dialogue to achieve multi-round instructional guidance. However, the effectiveness of these methods is highly dependent on the quality of the prompts and the dialogue context, making them prone to issues such as strategy drift and unstable guidance granularity. When the number of dialogue rounds increases or the prompts change slightly, the model may fail to maintain a consistent teaching logic, skipping steps, repeating itself, or even prematurely revealing key problem-solving information. Furthermore, these methods often lack explicit step-by-step instructional planning representations and verifiable intermediate goal constraints, as well as structured recording and tracking mechanisms for learner states, making it difficult to maintain stability and control over multi-round instructional processes.

[0004] Supervised fine-tuning (SFT)-based instructional imitation algorithms collect or synthesize instructional dialogue data and use this data to fine-tune large language models, enabling the model to generate instructional responses similar to the training samples given a dialogue history and learner input. The key to this type of method lies in the construction and annotation of the instructional dialogue corpus: the corpus can originate from real dialogues with human participation or be automatically synthesized by multi-agent systems or large models, after which the model is trained through behavioral imitation. However, this type of method often focuses on dialogue imitation and lacks strong constraints on explicit, verifiable, step-by-step instructional planning. This leads to problems such as fragmented guidance processes, unstable connections between steps, or inconsistent problem-solving logic in complex problems. In addition, learners' cognitive states (such as confusion, errors, and questions) are often difficult to strictly control and verify in the data, so the model has insufficient ability to identify and adaptively respond to learner states during inference, which easily leads to generic responses or erroneous corrections.

[0005] Instructional alignment algorithms based on reinforcement learning (RL) or preference optimization treat multi-turn instructional dialogues as a sequential decision-making process. By designing instruction-related reward signals, they optimize the model's generation strategy, making the model more aligned with the expected teaching objectives in multi-turn interactions. These methods typically require the construction of online or offline interactive evaluation mechanisms to score the model's long-term performance in multi-turn dialogues and back-optimize the model's strategy. In instructional dialogue scenarios, reinforcement learning-based instructional alignment methods usually rely on the overall interaction results or final state performance to construct reward signals. However, such reward signals often suffer from sparsity and mixed objectives in practical applications. When reward design is insufficient or constraints are inadequate, the model may tend to improve surface performance metrics by shortening the instructional path or providing key information in advance. This can lead to a deviation between the model's optimization objective and the gradual teaching objectives, weakening the standardization and interpretability of the instructional guidance process.

[0006] Furthermore, existing reinforcement learning alignment methods typically do not incorporate explicit step-by-step instructional planning representations and lack structured modeling and continuous tracking mechanisms for learners' cognitive states. This makes it difficult for the model to stably identify the current teaching stage and learner state changes in multi-round, long-term instructional dialogues, thus making it difficult to continuously output consistent, coherent, and targeted adaptive instructional feedback.

Summary of the Invention

[0007] To address the technical problems mentioned above, this invention provides an intelligent math tutoring method based on step-by-step guided planning and state-aligned memory, comprising the following steps: S1: Obtain the math problem to be tutored; S2: Construct an intelligent math tutoring model, which includes a step-by-step guided planning component, an evaluation-driven tutoring component, and a training data synthesis and fine-tuning module; S3: Input the math problem into the intelligent math tutoring model to complete the intelligent math tutoring.

[0008] Preferably, the intelligent math tutoring model uses the step-by-step guided planning component to generate a step-by-step teaching plan for the math problem, uses the assessment-driven tutoring component to generate teaching feedback based on the learner's real-time input and the assessment-driven teaching memory over multiple rounds of interaction, and uses the training data synthesis and fine-tuning module to perform instruction fine-tuning on the base model, thereby completing intelligent math tutoring.

[0009] Preferably, generating the step-by-step teaching plan for the math problem using the step-by-step guided planning component includes: Based on the math problem, obtain the step-by-step derivation process of its solution, which consists of multiple logical steps; For each step of the derivation process in turn, obtain the guiding sub-question and the corresponding standard reference answer; Based on the guiding sub-questions and standard reference answers of all steps, obtain a step-by-step teaching plan consisting of multiple ordered intermediate learning objectives.

[0010] Preferably, before fine-tuning the base model using the training data synthesis and fine-tuning module, the method further includes: Based on the original set of mathematical problems, obtain a planning dataset consisting of multiple ordered guiding sub-questions and their corresponding standard reference answers; Based on the planning dataset, obtain a dialogue dataset containing the derivation logic, the identified state, and the teaching response through interaction simulation between a learner agent and a tutor agent; Based on the planning dataset and the dialogue dataset, obtain a hybrid training corpus for instruction fine-tuning.

[0011] Preferably, after obtaining the hybrid training corpus for instruction fine-tuning based on the planning dataset and the dialogue dataset, the method further includes: Based on the teaching planning data in the hybrid training corpus, perform plan-generation fine-tuning on the base model to obtain an intermediate model that can decompose mathematical problems into teachable steps; Based on the dialogue guidance data in the hybrid training corpus, perform interactive-guidance fine-tuning on the intermediate model to obtain an intelligent math tutoring model capable of state alignment and adaptive guidance based on cognitive state.

[0012] Preferably, after generating the step-by-step teaching plan for the math problem using the step-by-step guided planning component, the method further includes: Based on the step-by-step teaching plan, initialize the progress vector in the assessment-driven memory; During multiple rounds of interaction, obtain the completion status of the current step based on the teaching objective of the current step and the learner's input; Update the progress vector based on the completion status of the current step; End the tutoring session when the progress vector indicates that all steps have been completed.

[0013] This invention also provides an intelligent math tutoring system based on step-by-step guided planning and state-aligned memory. The system is used to implement the above method and includes: S1: Obtain the math problem to be tutored; S2: Construct an intelligent math tutoring model, which includes a step-by-step guidance planning component, an assessment-driven tutoring component, and a training data synthesis and fine-tuning module; S3: Input the math problem into the intelligent math tutoring model to complete the intelligent math tutoring.

[0014] Compared with the prior art, the beneficial effects of the present invention are as follows: This invention constructs an intelligent math tutoring model that includes a step-by-step guided planning component and an assessment-driven tutoring component. It explicitly decomposes complex math problems into ordered intermediate learning objectives, forming a stable teaching framework and solving the problems of fragmented guidance processes and strategy drift in existing technologies. Through an assessment-driven teaching memory module, it continuously tracks the learner's cognitive state and learning progress, achieving state-aligned adaptive feedback. This allows the model to dynamically adjust its guidance strategy based on the learner's real-time performance, avoiding premature disclosure of answers or shortening of the teaching path. Simultaneously, through structured training data synthesis and instruction fine-tuning, the model internalizes step-by-step planning and state assessment capabilities within a unified parameter space, ensuring the coherence, controllability, and standardization of multi-round, long-term teaching dialogues.

Attached Figure Description

[0015] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0016] Figure 1 is a schematic diagram of the model structure according to an embodiment of the present invention.

Detailed Implementation

[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0019] Example 1: This embodiment provides an intelligent math tutoring method based on step-by-step guided planning and state-aligned memory, the steps of which include: S1. Obtain the math problem to be tutored.

[0020] S2. Construct an intelligent math tutoring model, which includes a step-by-step guidance planning component, an assessment-driven tutoring component, and a training data synthesis and fine-tuning module.

[0021] This embodiment constructs a planning-guided teaching framework for multi-round mathematics teaching dialogues, enabling explicit step-by-step guidance and adaptive teaching based on cognitive states within a unified process. The overall framework architecture is shown in Figure 1. The framework is built around three core elements: instructional planning, status assessment, and instructional memory (corresponding to the step-by-step guidance planning component, assessment-driven tutoring component, and training data synthesis and fine-tuning module), to support coherent and controllable multi-round instructional interactions.

[0022] The framework first generates a step-by-step instructional plan for a given problem. This plan consists of a series of ordered intermediate learning objectives and corresponding guiding questions, providing a stable instructional framework for subsequent dialogues. During the instructional process, the framework continuously analyzes learner input through an assessment-driven control loop, determines their cognitive state, and evaluates whether the current instructional step has been achieved. This determines whether to provide further guidance for the current step or move on to the next instructional stage. To ensure consistency and traceability across multiple rounds of instruction, the framework maintains an assessment-driven instructional memory module to uniformly store the instructional plan, step-by-step progress, inferred learner states, and historical dialogue information. This information collectively serves as the contextual condition for instructional decisions, ensuring that subsequent instructional feedback remains consistent with the established learning objectives and the learner's current state. Based on this design, progressive instructional guidance can be implemented within a unified framework, and instructional strategies can be dynamically adjusted based on learner feedback in real time, thereby supporting high-quality, state-aware multi-round instructional dialogues.

[0023] S3. Input the math problem into the intelligent math tutoring model to complete the intelligent math tutoring.

[0024] (1) Problem description. This embodiment models the math tutoring task as a multi-round interactive process between the learner and the large language model tutor. Given a math problem, the entire tutoring session is defined as a sequence of interaction rounds, in which the first-round learner input $L_1$ is the original problem. In a given interaction round $i$, the model generates the next teaching response $T_i$ based on the currently maintained working memory and the learner's real-time utterance $L_i$.

[0025] (2) Step-by-step guidance planning component. This embodiment's step-by-step guided planning component breaks down complex mathematical problems into independently executable reasoning steps through a structured, step-by-step guidance mechanism. The core logic of this component is to decompose complex tasks into multiple intermediate goals, enabling the generative model to guide the user through reasoning via manageable checkpoints and providing a basis for accurate diagnosis of learning progress. Specifically, given the input mathematical problem, the component first decomposes it into a step-by-step solution derivation consisting of $N$ logical steps. For each step $t$, the model generates guiding content for the current stage (the sub-question $q_t$) and reference information characterizing the goal of that stage (the reference answer $a_t$). Together, these elements constitute a complete teaching plan sequence for solving the problem.

[0026] The plan terminates at step $N$, where $a_N$ is the final answer to the problem. Each pair $(q_t, a_t)$ defines a specific local task objective. Within the execution logic of this component, the system stores the step answers $a_t$ for subsequent evaluation and state alignment, and imposes constraints to prevent information about subsequent steps from leaking.
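One way to picture the plan described above is as an ordered list of (sub-question, reference answer) pairs. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: the names `PlanStep`, `TeachingPlan`, and `visible_steps` are hypothetical, and slicing the plan up to the current step is only one possible way to realize the constraint against leaking later steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlanStep:
    question: str  # guiding sub-question q_t
    answer: str    # reference answer a_t


@dataclass
class TeachingPlan:
    problem: str
    steps: List[PlanStep]  # ordered intermediate objectives, step 1..N

    @property
    def final_answer(self) -> str:
        # The plan terminates at step N, and a_N is the final answer.
        return self.steps[-1].answer

    def visible_steps(self, current: int) -> List[PlanStep]:
        # Assumed leak guard: expose only steps up to the current index
        # (0-based) to whatever generates the tutor's response.
        return self.steps[: current + 1]


# Toy example problem (illustrative data, not from the patent).
plan = TeachingPlan(
    problem="Solve 2x + 3 = 11",
    steps=[
        PlanStep("What do you get after subtracting 3 from both sides?", "2x = 8"),
        PlanStep("What is x after dividing both sides by 2?", "x = 4"),
    ],
)
```

Keeping the reference answers inside the plan lets a later evaluation step compare learner input against $a_t$ without regenerating the solution.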

[0027] (3) Assessment-driven tutoring component. The assessment-driven tutoring component aims to dynamically adjust tutoring strategies based on the user's real-time cognitive state. This component consists of two core modules: a state alignment generator and an assessment-driven memory. The two modules interact through a four-step loop: the generator interprets the user's state and generates responses through the "assess" and "act" operations, while the memory maintains planning progress and interaction history through the "track" and "record" operations.

[0028] During the $k$-th round of dialogue, the assessment-driven memory maintains three key elements: the teaching plan; the progress vector $p_k$, which indicates the completion status of each step; and the dialogue history sequence, which stores the user input, the inferred user state, and the system response for each round. In this embodiment, the memory state is defined as the triple of these three elements. The interaction loop of this component specifically includes the following four operations:

Assessment operation (Assess): Based on the current user response $L_k$ and the current memory state, infer the user's cognitive state $s_k$. The system retrieves the current target from the memory and determines the user's state by checking the consistency between the user's input and the standard answer.

[0029] Execution operation (Act): Based on the inferred state $s_k$, select the corresponding guidance strategy and generate the system response $T_k$. For example, if $s_k$ is the "correct" state, the system executes the strategy of affirming the answer and transitioning to the next objective.

[0030] Track operation: Update the progress vector $p_k$ based on the evaluation result. If the evaluation indicates that the current step $t$ has been completed, set the entry at the corresponding position to 1; otherwise, the progress vector remains unchanged.

[0031] Record operation: Archive the interaction data of the current round, namely the tuple $(L_k, s_k, T_k)$, into the dialogue history to complete memory synchronization.

When the user enters a problem, the system first constructs the plan sequence and initializes the progress vector with every step marked as not completed. In each round of interaction, the system identifies the current step $t$ and executes the above four-step loop until all elements of the progress vector $p$ are 1, at which point the session ends.
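The four-step loop can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the class and function names are hypothetical, the state labels are reduced to "correct" and "incorrect", and exact string matching against the reference answer stands in for the judgment that the fine-tuned model would make. The sketch starts after the first sub-question has already been posed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Memory:
    plan: List[Tuple[str, str]]  # ordered (sub-question, reference answer)
    progress: List[int] = field(default_factory=list)  # 1 = step completed
    history: List[Tuple[str, str, str]] = field(default_factory=list)  # (L_k, s_k, T_k)

    def __post_init__(self) -> None:
        if not self.progress:
            self.progress = [0] * len(self.plan)  # nothing completed yet

    def current_step(self) -> int:
        return self.progress.index(0)  # first unfinished step

    def done(self) -> bool:
        return all(p == 1 for p in self.progress)


def assess(memory: Memory, learner_input: str) -> str:
    # Assess: compare the input with the current step's reference answer.
    # Exact matching is a placeholder assumption for the model's judgment.
    _, reference = memory.plan[memory.current_step()]
    return "correct" if learner_input.strip() == reference else "incorrect"


def act(memory: Memory, state: str) -> str:
    # Act: choose a guidance strategy from the inferred state.
    t = memory.current_step()
    if state == "correct":
        if t + 1 < len(memory.plan):
            return "Correct. " + memory.plan[t + 1][0]
        return "Correct. You have solved the problem."
    return "Not quite. " + memory.plan[t][0]


def tutor_round(memory: Memory, learner_input: str) -> str:
    state = assess(memory, learner_input)
    response = act(memory, state)
    if state == "correct":  # Track: mark the current step as completed
        memory.progress[memory.current_step()] = 1
    memory.history.append((learner_input, state, response))  # Record
    return response


memory = Memory(plan=[
    ("Subtract 3 from both sides of 2x + 3 = 11. What do you get?", "2x = 8"),
    ("Now divide both sides by 2. What is x?", "x = 4"),
])
replies = [tutor_round(memory, text) for text in ["2x = 14", "2x = 8", "x = 4"]]
```

In this toy run the first reply repeats the current sub-question, and the session ends after the third round because every entry of the progress vector is 1.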

[0032] (4) Training data synthesis and fine-tuning module. To enable the model to learn and internalize the step-by-step instruction planning mechanism and evaluation-driven instruction guidance strategy proposed in this embodiment, the present invention sets up a training data synthesis process before model training. This process transforms the original mathematical problem into structured supervised samples that can be used for instruction fine-tuning. This training data synthesis process serves as a pre-step for model training and fine-tuning, and its output data is directly used as the input corpus for subsequent training stages.

[0033] Specifically, the training data synthesis process includes the following steps: ① Planning and generation stage: For the original set of mathematical problems, each paired with a reference answer, a planning model generates a detailed solution derivation and extracts a step-by-step guiding plan from it. To ensure data quality, this embodiment implements a strict filtering mechanism: the number of steps $N$ must meet a set requirement; the derived final answer $a_N$ must be consistent with the reference answer; and the logical validity of the transitions between steps is verified by a validation module. The retained high-quality samples constitute the planning dataset.
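The three filtering conditions can be sketched as a single predicate. This is a minimal Python sketch under stated assumptions: the source does not state the step-count bounds, so `min_steps` and `max_steps` are hypothetical placeholders, and `is_valid_transition` is a hypothetical callable standing in for the validation module, whose internals are not described.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (guiding sub-question, reference answer)


def keep_plan(
    plan: List[Step],
    reference_answer: str,
    is_valid_transition: Callable[[Step, Step], bool],
    min_steps: int = 2,   # assumed bound, not given in the source
    max_steps: int = 8,   # assumed bound, not given in the source
) -> bool:
    # Condition 1: the number of steps N must fall in the allowed range.
    if not plan or not (min_steps <= len(plan) <= max_steps):
        return False
    # Condition 2: the derived final answer a_N must match the reference answer.
    if plan[-1][1].strip() != reference_answer.strip():
        return False
    # Condition 3: every transition between consecutive steps must be judged valid.
    return all(is_valid_transition(a, b) for a, b in zip(plan, plan[1:]))


good = [("Subtract 3 from both sides.", "2x = 8"), ("Divide by 2.", "x = 4")]
always_valid = lambda a, b: True  # trivial stand-in validator for the example
kept = keep_plan(good, "x = 4", always_valid)
```

Only plans for which `keep_plan` returns `True` would enter the planning dataset in this sketch.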

[0034] ② Dual-agent simulation stage: This embodiment synthesizes dialogue trajectories by simulating the dynamic interaction between a "learner agent" and a "tutor agent." The interaction begins with the learner posing the problem. The tutor agent then initializes its memory state and poses the initial sub-question $q_1$. In each subsequent round $k$, the learner agent, given the reference answer $a_t$ of the current step, samples a target cognitive state $s_k$ and generates a response $L_k$. A consistency check is then performed, in which the tutor agent verifies whether $L_k$ conforms to the preset state $s_k$. If the check fails, corrective suggestions are provided and the learner agent is required to regenerate the response; if the check passes, the final training target is generated, comprising the derivation logic (containing evaluation evidence and guiding actions) and the teaching response. This process iterates until the solution is complete, at which point the tutor agent generates a summary statement.
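The consistency check with regeneration can be sketched as a bounded retry loop. This is a minimal Python sketch under stated assumptions: in the described method both agents are language models, whereas here they are toy callables; the function names are hypothetical; and the retry limit `max_retries` is an assumption, since the source does not say how many regenerations are allowed.

```python
from typing import Callable, Optional, Tuple


def simulate_learner_turn(
    reference: str,
    target_state: str,
    learner: Callable[[str, str, str], str],
    checker: Callable[[str, str, str], Tuple[bool, str]],
    max_retries: int = 3,  # assumed limit, not given in the source
) -> Optional[str]:
    hint = ""  # corrective suggestion from the tutor agent, empty at first
    for _ in range(max_retries):
        reply = learner(reference, target_state, hint)
        ok, hint = checker(reply, reference, target_state)
        if ok:
            return reply  # reply conforms to the sampled state
    return None  # give up; the round would be discarded


def toy_learner(reference: str, target_state: str, hint: str) -> str:
    # Toy behavior: ignores the target state until it receives a hint.
    if target_state == "correct" or not hint:
        return reference
    return "I am not sure how to start."


def toy_checker(reply: str, reference: str, target_state: str) -> Tuple[bool, str]:
    # A reply is consistent if its correctness agrees with the target state.
    consistent = (reply == reference) == (target_state == "correct")
    suggestion = "" if consistent else f"The reply should reflect the '{target_state}' state."
    return consistent, suggestion


reply = simulate_learner_turn("x = 4", "confused", toy_learner, toy_checker)
```

In this toy run the first attempt is rejected because it answers correctly despite the "confused" target state, and the second attempt passes after the corrective suggestion.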

[0035] ③ Serialization and sample construction stage: After synthesizing the dialogue trajectories, this embodiment removes trajectories that are logically incoherent or of abnormal length. Valid trajectories are flattened into single-round training samples. The input for each round includes the memory information of the current round (including the plan, progress, and dialogue history) and the learner's input; the target output adopts an "analysis first, response later" structure, providing the derivation logic, the identified state, and the final response. These samples constitute the dialogue dataset, in which the derivation logic is the reasoning generated by the model for state recognition.
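Flattening a trajectory into single-round samples can be sketched as follows. This is a minimal Python sketch under stated assumptions: the dictionary keys and the function name are hypothetical, and the rule that a "correct" state completes the current step is a simplification for illustration.

```python
from typing import Dict, List, Tuple


def flatten_trajectory(
    plan: List[Tuple[str, str]],
    rounds: List[Dict[str, str]],
) -> List[dict]:
    progress = [0] * len(plan)
    history: List[Tuple[str, str, str]] = []
    samples = []
    for r in rounds:
        samples.append({
            # Input: memory snapshot *before* this round, plus the learner's input.
            "input": {
                "plan": plan,
                "progress": list(progress),
                "history": list(history),
                "learner_input": r["learner_input"],
            },
            # Target: analysis first (rationale, state), response last.
            "target": {
                "rationale": r["rationale"],
                "state": r["state"],
                "response": r["response"],
            },
        })
        # Advance the memory so the next sample sees the updated snapshot.
        if r["state"] == "correct" and 0 in progress:
            progress[progress.index(0)] = 1
        history.append((r["learner_input"], r["state"], r["response"]))
    return samples


plan = [("Subtract 3 from both sides.", "2x = 8"), ("Divide by 2.", "x = 4")]
rounds = [
    {"learner_input": "2x = 8", "rationale": "Matches a_1.", "state": "correct",
     "response": "Correct. Divide by 2."},
    {"learner_input": "x = 4", "rationale": "Matches a_2.", "state": "correct",
     "response": "Correct. You have solved the problem."},
]
samples = flatten_trajectory(plan, rounds)
```

Copying the progress vector and history into each sample keeps every sample self-contained, so samples can be shuffled freely during training.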

[0036] Through the above training data synthesis process, this invention explicitly transforms the planning decisions, state evaluations, and guidance behaviors implicit in the teaching process into structured supervision signals, providing direct data support for the subsequent model training and fine-tuning stages. This enables the model to learn progressive teaching planning capabilities and adaptive teaching guidance capabilities based on evaluation results within a unified parameter space.

[0037] This invention employs instruction fine-tuning technology to train the base model Qwen2.5-7B-Instruct, enabling it to possess progressive instruction planning capabilities and adaptive instruction guidance capabilities based on cognitive states. During the training phase, the system constructs a hybrid training corpus composed of instruction planning data and dialogue guidance data generated during the training data synthesis process, and performs unified instruction fine-tuning on the base model based on this corpus. During fine-tuning, different types of training samples are distinguished by corresponding instruction prefixes, thereby guiding the model to learn different but related instruction behaviors within the same parameter space.

[0038] Specifically, the fine-tuning process includes the following two types of training tasks: ① Plan-generation fine-tuning task: For instructional planning data, guided by instruction prefixes, the model learns to generate a complete teaching plan sequence consisting of multiple ordered steps, using the original mathematical problem as input. This sequence includes progressively guided sub-questions and corresponding standard reference answers, which characterize the overall problem-solving path. Through fine-tuning on this task, the model gains the ability to explicitly decompose complex mathematical problems into teachable steps.

[0039] ② Interactive Guidance Fine-Tuning Task: For dialogue guidance data, under the constraint of instruction prefixes, the model learns to generate structured instructional outputs based on the current round's teaching memory and learner input. The outputs include the analysis and identification results of the learner's cognitive state, and the instructional feedback generated based on that state. Through fine-tuning in this task, the model learns to align its state with the instructional plan and evaluation results during multiple rounds of interaction, and dynamically adjusts its instructional strategies.
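Distinguishing the two task types by instruction prefix within one hybrid corpus can be sketched as follows. This is a minimal Python sketch under stated assumptions: the prefix strings, the prompt/completion record layout, and the shuffling step are all hypothetical, since the source does not give the actual prefixes or data format.

```python
import random
from typing import Dict, List, Tuple

PLAN_PREFIX = "[PLAN]"    # hypothetical prefix for plan-generation samples
TUTOR_PREFIX = "[TUTOR]"  # hypothetical prefix for interactive-guidance samples


def build_hybrid_corpus(
    planning: List[Tuple[str, str]],   # (problem, serialized plan)
    dialogue: List[Tuple[str, str]],   # (memory + learner input, structured output)
    seed: int = 0,
) -> List[Dict[str, str]]:
    corpus = [
        {"prompt": f"{PLAN_PREFIX} {source}", "completion": target}
        for source, target in planning
    ]
    corpus += [
        {"prompt": f"{TUTOR_PREFIX} {source}", "completion": target}
        for source, target in dialogue
    ]
    # Shuffling both task types together is an assumption of this sketch.
    random.Random(seed).shuffle(corpus)
    return corpus


corpus = build_hybrid_corpus(
    planning=[("Solve 2x + 3 = 11", "1. Subtract 3 -> 2x = 8; 2. Divide by 2 -> x = 4")],
    dialogue=[("plan=... progress=[0, 0] learner: 2x = 8",
               "analysis: matches step 1; state: correct; response: Now divide by 2.")],
)
```

The prefix tells a single set of parameters which behavior is requested, which is one way to read the description of learning different but related behaviors in the same parameter space.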

[0040] Through the aforementioned fine-tuning process, the model can simultaneously master global problem-solving planning and local adaptive guidance capabilities within a unified parameter space. The fine-tuned model constitutes the mathematical tutoring model of this invention, capable of continuously generating step-by-step guidance feedback that conforms to heuristic teaching principles during actual teaching dialogues, based on pre-generated teaching plans, real-time maintained teaching memories, and the learner's current cognitive state.

[0041] Example 2: This embodiment also provides an intelligent math tutoring system based on step-by-step guided planning and state-aligned memory, including: a data acquisition module for acquiring the math problem to be tutored; a construction module for constructing an intelligent math tutoring model, the intelligent math tutoring model including a step-by-step guided planning component, an evaluation-driven tutoring component, and a training data synthesis and fine-tuning module; and a tutoring module for inputting the math problem into the intelligent math tutoring model to complete the intelligent math tutoring.

[0042] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. An intelligent math tutoring method based on step-by-step guided planning and state-aligned memory, characterized in that it comprises the following steps: S1: Obtain the math problem to be tutored; S2: Construct an intelligent math tutoring model, which includes a step-by-step guided planning component, an evaluation-driven tutoring component, and a training data synthesis and fine-tuning module; S3: Input the math problem into the intelligent math tutoring model to complete the intelligent math tutoring.

2. The intelligent math tutoring method based on step-by-step guided planning and state-aligned memory according to claim 1, characterized in that the intelligent math tutoring model uses the step-by-step guided planning component to generate a step-by-step teaching plan for the math problem, uses the assessment-driven tutoring component to generate teaching feedback based on the learner's real-time input and the assessment-driven teaching memory over multiple rounds of interaction, and uses the training data synthesis and fine-tuning module to perform instruction fine-tuning on the base model, thereby completing intelligent math tutoring.

3. The intelligent math tutoring method based on step-by-step guided planning and state-aligned memory according to claim 2, characterized in that generating the step-by-step teaching plan for the math problem using the step-by-step guided planning component includes: Based on the math problem, obtain the step-by-step derivation process of its solution, which consists of multiple logical steps; For each step of the derivation process in turn, obtain the guiding sub-question and the corresponding standard reference answer; Based on the guiding sub-questions and standard reference answers of all steps, obtain a step-by-step teaching plan consisting of multiple ordered intermediate learning objectives.

4. The intelligent math tutoring method based on step-by-step guided planning and state-aligned memory according to claim 2, characterized in that, before fine-tuning the base model using the training data synthesis and fine-tuning module, the method further includes: Based on the original set of mathematical problems, obtain a planning dataset consisting of multiple ordered guiding sub-questions and their corresponding standard reference answers; Based on the planning dataset, obtain a dialogue dataset containing the derivation logic, the identified state, and the teaching response through interaction simulation between a learner agent and a tutor agent; Based on the planning dataset and the dialogue dataset, obtain a hybrid training corpus for instruction fine-tuning.

5. The intelligent math tutoring method based on step-by-step guided planning and state-aligned memory according to claim 4, characterized in that, after obtaining the hybrid training corpus for instruction fine-tuning based on the planning dataset and the dialogue dataset, the method further includes: Based on the teaching planning data in the hybrid training corpus, perform plan-generation fine-tuning on the base model to obtain an intermediate model that can decompose mathematical problems into teachable steps; Based on the dialogue guidance data in the hybrid training corpus, perform interactive-guidance fine-tuning on the intermediate model to obtain an intelligent math tutoring model capable of state alignment and adaptive guidance based on cognitive state.

6. The intelligent math tutoring method based on step-by-step guided planning and state-aligned memory according to claim 2, characterized in that, after generating the step-by-step teaching plan for the math problem using the step-by-step guided planning component, the method further includes: Based on the step-by-step teaching plan, initialize the progress vector in the assessment-driven memory; During multiple rounds of interaction, obtain the completion status of the current step based on the teaching objective of the current step and the learner's input; Update the progress vector based on the completion status of the current step; End the tutoring session when the progress vector indicates that all steps have been completed.

7. An intelligent math tutoring system based on step-by-step guided planning and state-aligned memory, the system being used to implement the method described in any one of claims 1-6, characterized in that it includes: a data acquisition module, used to acquire the math problem to be tutored; a construction module, used to construct an intelligent math tutoring model, which includes a step-by-step guidance planning component, an assessment-driven tutoring component, and a training data synthesis and fine-tuning module; and a tutoring module, used to input the math problem into the intelligent math tutoring model to complete intelligent math tutoring.