Method and system for dynamic scheduling of conversational personalized learning resources

By constructing dynamic learner profiles and a multi-agent reinforcement learning framework, the shortcomings of personalized learning resource scheduling systems in adapting to dynamic needs are addressed, enabling precise and real-time scheduling of learning resources and continuous optimization of learning outcomes.

CN122198484A · Pending · Publication Date: 2026-06-12 · BEIJING YIJIAO LANTIAN TECH DEV CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING YIJIAO LANTIAN TECH DEV CO LTD
Filing Date
2026-03-12
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing personalized learning resource scheduling systems struggle to adapt to dynamically changing learning needs, lack semantic understanding capabilities based on natural dialogue, and are unable to accurately identify users' implicit knowledge gaps, resulting in a disconnect between resource recommendations and actual needs.

Method used

Multimodal data is collected to construct a dynamic learner profile that integrates cognitive and emotional states. A multi-agent reinforcement learning framework then makes decisions and generates personalized learning resource scheduling sequences, and the scheduling effect is optimized through closed-loop feedback. The techniques involved include micro-expression recognition on facial video data, a dual-channel graph neural network model, and multi-agent reinforcement learning.

Benefits of technology

It achieves precise capture and real-time response to learning needs, improves learning efficiency and short-term knowledge consolidation rate, maintains positive learning emotions, broadens cognitive breadth, and forms a self-iterable personalized learning resource scheduling system.

Generated by Eureka AI based on patent content.


Abstract

The application relates to the technical field of personalized learning resource scheduling, and in particular to a conversational personalized learning resource dynamic scheduling method and system. The method comprises the following steps: collecting multimodal data such as the learner's dialogue text, interaction behavior, and physiological state; constructing, on that basis, a dynamic learner profile that integrates cognitive and emotional states and updating it in real time; then, using a multi-agent reinforcement learning framework combined with the current dialogue intent, generating a personalized learning resource scheduling sequence through the collaborative decisions of multiple agents respectively responsible for knowledge remediation, interest maintenance, and ability expansion, with arbitration by a scheduler; pushing the resources in sequence while monitoring feedback in real time; and finally optimizing the profile model and the scheduling strategy in a closed loop according to the feedback data. The application achieves dynamic, comprehensive perception of the learner's cognition and emotion and, through multi-agent collaboration and closed-loop optimization, adaptively schedules the most suitable learning resources, effectively improving the degree of personalization and the overall efficiency of learning.

Description

Technical Field

[0001] This invention relates to the field of personalized learning technology, and more specifically, to a method and system for dynamic scheduling of conversational personalized learning resources.

Background Technology

[0002] With the deepening of digital transformation in education, personalized learning has become a core direction for improving learning efficiency and adapting to differentiated learning needs, and the precise allocation of learning resources is a key support for realizing personalized learning. In existing technologies, personalized learning resource allocation systems mostly construct static user profiles based on users' historical learning behaviors (such as browsing history and answer results), and use traditional recommendation algorithms such as collaborative filtering and content matching to push resources, which can initially meet users' basic needs for resources on specific knowledge points.

[0003] However, existing personalized learning resource scheduling technologies struggle to adapt to dynamically changing learning needs and complex learning scenarios; specifically, they do not capture learning needs accurately enough. Existing systems rely heavily on explicit behavioral data triggered by users (such as clicks and favorites), lack semantic understanding capabilities based on natural dialogue, and cannot effectively identify knowledge gaps hidden in user dialogues (such as knowledge weaknesses that users do not explicitly express), resulting in a disconnect between resource delivery and actual learning needs.

[0004] Therefore, how to accurately capture dynamic learning needs through dialogue and interaction, construct real-time updated learner profiles, achieve flexible real-time scheduling of learning resources, and continuously optimize the scheduling effect based on closed-loop feedback has become a pressing technical problem in the field of personalized learning resource scheduling. Based on this, it is necessary to propose a dialogue-based dynamic scheduling method and system for personalized learning resources to overcome the aforementioned shortcomings of existing technologies.

Summary of the Invention

[0005] The technical problem this invention aims to solve is the insufficient accuracy in capturing learning needs. To address the aforementioned deficiencies in existing technologies, this invention provides a method and system for dynamic scheduling of conversational personalized learning resources.

[0006] The technical solution adopted by this invention to solve its technical problem is as follows. In one aspect, a method for dynamically scheduling conversational personalized learning resources includes the following steps:
Step S1: Collect the learner's multimodal data and preprocess it; the multimodal data includes dialogue text data, interaction behavior data, and physiological state data;
Step S2: Based on the preprocessed multimodal data, construct and update a dynamic learner profile in real time; the dynamic learner profile is a comprehensive representation that integrates cognitive and emotional states;
Step S3: Based on the dynamic learner profile and the dialogue intent in the current learning dialogue, use a multi-agent reinforcement learning framework to make decisions and generate a personalized learning resource scheduling sequence. The multi-agent reinforcement learning framework includes at least a first agent for knowledge remediation, a second agent for interest maintenance, a third agent for capability expansion, and a scheduler. The first agent, the second agent, and the third agent each output a candidate resource sequence to the scheduler. The scheduler arbitrates and merges the candidate resource sequences according to a preset global reward function and outputs the final learning resource scheduling sequence;
Step S4: Push the learning resources in the learning resource scheduling sequence to the learner's terminal in sequence, and collect and monitor the learner's feedback data in real time during the learning process;
Step S5: Based on the feedback data, optimize and iterate the model parameters of the dynamic learner profile and the strategy of the multi-agent reinforcement learning framework to form a closed loop.

[0007] Preferably, in step S1, the physiological state data includes facial video data acquired through an image acquisition device; in step S2, the step of constructing and updating a dynamic learner profile that integrates cognitive and emotional states includes: Step S21: Perform micro-expression recognition on the facial video data to determine the learner's real-time emotion label; Step S22: Integrate the real-time emotion label, the emotional tendencies analyzed from the dialogue text data, and the behavioral patterns analyzed from the interaction behavior data to generate an emotional state sub-profile.

[0008] Preferably, in step S2, a dual-channel graph neural network model is used to construct and update the dynamic learner profile, the model comprising:
a cognitive channel, which takes knowledge points as nodes, the degree of mastery of each knowledge point as the node attribute, and the prerequisites and dependencies between knowledge points as edges, and is used to construct and update a cognitive knowledge graph;
an emotional channel, which takes the real-time emotion labels as nodes and the transition probabilities between emotional states as edges, and is used to construct an emotional association network;
an attention fusion layer, which performs attention-weighted fusion of the cognitive feature vector output by the cognitive knowledge graph and the emotional feature vector output by the emotional association network to generate a unified feature vector representing the dynamic learner profile.

[0009] Preferably, in step S3, the goal of the first agent is to maximize short-term knowledge mastery, and the state space of the first agent includes the mastery of knowledge points associated with the current dialogue intent and the real-time cognitive load; the goal of the second agent is to maximize the duration of positive emotions in the learning process, and the state space of the second agent includes the emotional state sub-profile and the frequency of resource interaction; the goal of the third agent is to maximize the breadth and relevance of the long-term knowledge system, and the state space of the third agent includes the topological structure features of the knowledge graph and the exploration degree of historical learning paths.

[0010] Preferably, the scheduler's global reward function R_global is a weighted sum of the following reward items: R_knowledge, based on the growth rate of knowledge mastery in subsequent evaluations; R_engagement, based on the effective learning duration and the proportion of positive emotions in the learner profile; and the negative penalty term R_overload, based on the cognitive load exceeding the threshold.

[0011] Preferably, in step S3, generating a personalized learning resource scheduling sequence further includes: matching an emotion-adaptive prompt to each learning resource in the learning resource scheduling sequence. The style of the prompt is dynamically selected based on the emotional state sub-profile, and the style is motivational, reassuring, or challenging.

[0012] Preferably, the preprocessing of the multimodal data in step S1 includes: using a lightweight convolutional neural network model locally on the user terminal to extract features from the facial video data, obtaining an anonymized emotional feature vector, and then uploading the emotional feature vector to the server.

[0013] Preferably, the optimization and iteration of the strategy of the multi-agent reinforcement learning framework in step S5 specifically includes: A distributed reinforcement learning algorithm is used to train the first agent, the second agent, and the third agent independently. Furthermore, a Bayesian optimization method is employed to dynamically adjust the weight coefficients of each reward item, R_knowledge, R_engagement, and R_overload, in the global reward function based on long-term learning performance feedback.

[0014] In another aspect, a conversational personalized learning resource dynamic scheduling system, used to implement any of the methods described above, includes:
a multimodal data acquisition module, used to collect the learner's dialogue text data, interaction behavior data, and physiological state data;
a learner profiling engine module, used to construct and update a dynamic learner profile that integrates cognitive and emotional states based on the multimodal data;
a scheduling decision module, used to make decisions based on the dynamic learner profile and the current dialogue intent, using a multi-agent reinforcement learning framework, and to generate a personalized learning resource scheduling sequence; the scheduling decision module includes a first agent, a second agent, a third agent, and a scheduler unit;
a resource push and monitoring module, used to push the learning resources in the learning resource scheduling sequence in sequence and to monitor feedback data in real time during the learning process;
a closed-loop feedback optimization module, used to optimize the model parameters and strategies of the learner profiling engine module and the scheduling decision module based on the feedback data.

[0015] Preferably, the scheduling decision module, learner profiling engine module, and closed-loop feedback optimization module are deployed on a cloud server; the multimodal data acquisition module includes a local processing unit deployed on the user terminal, used to perform localized feature extraction and desensitization processing on the original physiological state data containing sensitive information before data upload.

[0016] The beneficial effects of this invention are as follows: 1. By integrating knowledge mastery with real-time cognitive load to create a precise cognitive profile, and combining this with a first agent focused on knowledge remediation, the system can diagnose learning weaknesses in real time and dynamically allocate the most targeted learning resources within the learner's cognitive load capacity. This avoids the blindness of resource recommendations, thereby effectively improving learning efficiency and short-term knowledge consolidation rate.

[0017] 2. By constructing dynamic emotional state sub-profiles through micro-expression recognition and emotional association networks based on facial videos, and combining this with a second agent aimed at maintaining interest, the system can perceive learners' emotional changes in real time, such as frustration, boredom, or excitement. By scheduling emotionally adaptive resources and guidance, the system can provide comfort and encouragement when learning encounters difficulties, and inject challenges and motivation when the learner is in a good state, thereby effectively maintaining positive learning emotions, reducing mid-course abandonment, and enhancing learning immersion and long-term engagement motivation.

[0018] 3. By introducing a third agent aimed at expanding capabilities, the system not only focuses on mastering current knowledge points but also plans long-term learning paths based on the global topology of the knowledge graph and historical exploration paths. This encourages learners to explore connections between knowledge points and broaden their cognitive scope, and keeps them from falling into repetitive training or an overly narrow range of knowledge, ultimately promoting the construction of a systematic and structured knowledge system.

[0019] 4. A multi-agent reinforcement learning framework is adopted, with the scheduler arbitrating through a global reward function. This mechanism quantifies and coordinates three often conflicting objectives: the first agent improving knowledge mastery, the second agent maintaining positive emotions, and the third agent expanding capability boundaries. The system dynamically adjusts the weights of each objective through closed-loop feedback and Bayesian optimization, thereby adaptively finding the optimal scheduling strategy for different learners and at different learning stages, maximizing global learning gains.

[0020] 5. Employing a cloud-based collaborative and locally anonymized architecture, the system continuously updates and iterates while ensuring the privacy and security of sensitive biometric information such as learners' facial data. Through distributed reinforcement learning and closed-loop feedback optimization, the entire system's profiling model and decision-making strategies can continuously evolve with the input of more data feedback, self-improving scheduling accuracy and forming an intelligent and personalized learning resource scheduling system.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the present invention will be further described below in conjunction with the accompanying drawings and embodiments. The drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort: Figure 1 is a schematic diagram of the steps of the conversational personalized learning resource scheduling method in Embodiment 1 of this application.

[0022] Figure 2 is a schematic diagram of the structure of the conversational personalized learning resource scheduling system in Embodiment 2 of this application.

Detailed Implementation

[0023] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, a clear and complete description will be provided below in conjunction with the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the protection scope of the present invention.

[0024] Embodiment 1: A preferred embodiment of the present invention is shown in Figure 1. A method for dynamically scheduling conversational personalized learning resources includes the following steps: Step S1: Collect the learner's multimodal data and preprocess it. The multimodal data includes dialogue text data, interaction behavior data, and physiological state data. As an optional embodiment, dialogue text data refers to the text sequence generated during the learner's dialogue with the system. It is collected through the system's dialogue interface, such as a chat window or speech recognition, and recorded in a structured data format. Each data entry includes at least a timestamp, a learner identifier, the utterance content, and a preliminarily identified dialogue intent. Interaction behavior data refers to the operation logs generated while learners interact with learning resources. Specifically, this may include resource click frequency, learning duration, quiz accuracy rate, and the number of knowledge point jumps. Resource click frequency is the number of clicks on the various learning resources (such as videos, exercises, and documents) within a unit of time. Learning duration is the time elapsed from when a learner first accesses a learning resource until the learner actively closes it or the system records a shift in the learner's attention. Quiz accuracy rate is the percentage of questions answered correctly in the most recent set of assessment questions for a specific knowledge point or module. The number of knowledge point jumps is the number of times a learner actively switches between resources related to different knowledge points during a continuous learning session.
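The record fields and behavior metrics defined above can be sketched as follows. This is only an illustration: the class name, field names, and helper functions are hypothetical and do not come from the application, which specifies only what each entry and each metric must contain.

```python
from dataclasses import dataclass


@dataclass
class DialogueRecord:
    """One structured dialogue entry (field names are illustrative assumptions)."""
    timestamp: float   # when the utterance was produced, e.g. seconds since epoch
    learner_id: str    # learner identifier
    content: str       # utterance content
    intent: str        # preliminarily identified dialogue intent


def quiz_accuracy(answers: list[bool]) -> float:
    """Fraction of correct answers in the most recent assessment set."""
    return sum(answers) / len(answers) if answers else 0.0


def knowledge_point_jumps(visited: list[str]) -> int:
    """Number of switches between different knowledge points in one session."""
    return sum(1 for prev, cur in zip(visited, visited[1:]) if prev != cur)
```

For example, a session that visits the knowledge points `["fractions", "fractions", "ratios", "fractions"]` contains two jumps.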

[0025] Physiological state data includes facial video data acquired through an image acquisition device. Preprocessing of the multimodal data includes: extracting features from the facial video data locally on the user terminal to obtain an anonymized emotional feature vector, and then uploading the emotional feature vector to the server. As an optional embodiment, preprocessing may include general preprocessing plus localized anonymization of sensitive data: the dialogue text data undergoes word segmentation, stop-word removal, and vectorization, and the interaction behavior data is cleaned, normalized, and serialized. For facial video data, which contains sensitive personal biometric information, local anonymized feature extraction on the user terminal specifically includes deploying a lightweight convolutional neural network model (e.g., MobileNetV2 or ShuffleNet) on the terminal. This model receives aligned and normalized facial image sequences (e.g., three consecutive 96x96 pixel RGB images) and outputs a fixed-dimensional (e.g., 128-dimensional) abstract feature vector. This vector represents the temporal pattern of emotion-related facial action units (e.g., eyelid and mouth movements) and contains no information from which the original facial image could be reconstructed or an individual directly identified. The desensitized emotional feature vectors are encrypted and uploaded to the cloud server, while the original video data is securely deleted after local processing.

[0026] Step S2: Based on the multimodal data, construct and update in real time a dynamic learner profile that integrates cognitive and emotional states. This includes the following steps: Step S21: Perform micro-expression recognition on the facial video data to determine the learner's real-time emotion label. The desensitized emotional feature vectors are fed to a micro-expression recognition algorithm (e.g., a classifier based on temporal difference features and a support vector machine, SVM) to identify the learner's real-time emotional state. The emotional state is labeled using a two-dimensional continuous model, outputting a combined label of pleasure level (high / low) and activation level (high / low), such as (high pleasure, high activation) or (low pleasure, low activation).
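The mapping from the two-dimensional model to a combined label can be illustrated as below. The sketch assumes the upstream classifier yields pleasure and activation scores in [0, 1] and that a single cut-off of 0.5 separates high from low; neither the score range nor the threshold is stated in the application.

```python
def emotion_label(pleasure: float, activation: float,
                  threshold: float = 0.5) -> tuple[str, str]:
    """Map two continuous scores to the combined (pleasure, activation) label.

    Assumption: both scores lie in [0, 1]; `threshold` is a hypothetical cut-off.
    """
    pleasure_part = "high pleasure" if pleasure >= threshold else "low pleasure"
    activation_part = "high activation" if activation >= threshold else "low activation"
    return (pleasure_part, activation_part)
```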

[0027] Step S22: Integrate the real-time emotion labels, the emotional tendencies analyzed from the dialogue text data, and the behavioral patterns analyzed from the interaction behavior data to generate an emotional state sub-profile.

[0028] Specifically, a dual-channel graph neural network model is used, comprising a cognitive channel, an emotional channel, and an attention fusion layer. The cognitive channel is used to construct and update the cognitive knowledge graph. It uses the knowledge points of a pre-defined subject knowledge system as graph nodes, and the attribute of each node is the mastery level of that knowledge point. As an optional implementation, the mastery level is a scalar between 0 and 1, calculated using a function f: Mastery Level = f(Historical Answer Accuracy Rate, Cumulative Learning Time for the Knowledge Point, Recent Related Assessment Score), where f can be a weighted average or a small regression network. Edges between nodes represent prerequisite relationships or strong dependencies between knowledge points (derived from the pre-defined knowledge graph). The edge weights are initialized to 1 and can be dynamically adjusted based on the success rate of the learner's actual learning paths. This channel uses a graph convolutional network (GCN) for information propagation and aggregation, outputting a feature vector c representing the current cognitive state.
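A minimal sketch of the weighted-average form of f is given below. The three weights, the cap used to normalize cumulative learning time, and the assumption that accuracy and assessment score are already scaled to [0, 1] are illustrative choices, not values from the application; the GCN propagation step is not shown.

```python
def mastery_level(accuracy: float, study_minutes: float, recent_score: float,
                  weights: tuple[float, float, float] = (0.5, 0.2, 0.3),
                  minutes_cap: float = 120.0) -> float:
    """Weighted-average variant of f, returning a scalar in [0, 1].

    Assumptions: `accuracy` and `recent_score` are in [0, 1]; cumulative
    learning time is normalized by a hypothetical cap of `minutes_cap`.
    """
    time_term = min(max(study_minutes, 0.0) / minutes_cap, 1.0)
    w_acc, w_time, w_score = weights
    value = w_acc * accuracy + w_time * time_term + w_score * recent_score
    # Clamp so the node attribute always stays within the stated 0..1 range.
    return max(0.0, min(1.0, value))
```

With the default weights, a learner with 60% historical accuracy, 60 minutes of study, and a recent score of 0.8 gets a mastery level of about 0.64.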

[0029] The emotional channel uses the real-time emotion labels as graph nodes, with edges representing the transitions between emotional states. The edge weights (i.e., transition probabilities) are calculated by counting the frequency of emotional state transitions in the learner's historical profiles and then applying Laplace smoothing, thus constructing a dynamic emotional state transition network. The emotional channel processes the current (and historical) emotional state sequence through a neural network, outputting a feature vector e representing the current emotional state.
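The edge-weight calculation can be sketched as follows: transition counts are taken from a historical sequence of emotion labels and smoothed with add-alpha (Laplace) smoothing. The function name, the state names in the example, and the default alpha = 1 are assumptions made for illustration.

```python
from collections import Counter


def transition_probabilities(sequence: list[str], states: list[str],
                             alpha: float = 1.0) -> dict[str, dict[str, float]]:
    """Estimate P(next state | current state) with Laplace (add-alpha) smoothing."""
    # Count each observed (current, next) pair in the historical sequence.
    counts = Counter(zip(sequence, sequence[1:]))
    probs: dict[str, dict[str, float]] = {}
    for src in states:
        total = sum(counts[(src, dst)] for dst in states) + alpha * len(states)
        probs[src] = {dst: (counts[(src, dst)] + alpha) / total for dst in states}
    return probs
```

Smoothing keeps every edge weight strictly positive, so a transition that has not yet been observed for a learner is treated as unlikely rather than impossible.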

[0030] The attention fusion layer performs attention-weighted fusion of the cognitive feature vector output from the cognitive knowledge graph and the emotional feature vector output from the emotional association network to generate a unified feature vector representing the dynamic learner profile. The cognitive feature vector c and the emotional feature vector e are concatenated and input into an attention mechanism network. This network learns to generate an adaptive weight scalar α (0 ≤ α ≤ 1). Finally, the unified feature vector u of the dynamic learner profile is obtained by fusing c and e with the weight α. Vector u is a digital representation of a dynamic learner profile that integrates real-time cognitive and emotional states.
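The fusion formula itself does not appear in this text. One reading consistent with a single scalar α in [0, 1] is the convex combination u = α·c + (1 − α)·e, and the sketch below uses that form as an assumption. The gate is reduced to a single linear layer followed by a sigmoid; its parameters `w` and `b` are hypothetical stand-ins for the learned attention network, and c and e are assumed to have the same dimension.

```python
import math


def attention_fuse(c: list[float], e: list[float],
                   w: list[float], b: float = 0.0) -> tuple[float, list[float]]:
    """Fuse cognitive vector c and emotional vector e with a scalar gate alpha.

    Assumed form: u = alpha * c + (1 - alpha) * e, where
    alpha = sigmoid(w · [c; e] + b) lies in [0, 1].
    """
    if len(c) != len(e):
        raise ValueError("c and e must have the same dimension")
    concat = list(c) + list(e)
    if len(w) != len(concat):
        raise ValueError("w must match the length of the concatenated vector")
    score = sum(wi * xi for wi, xi in zip(w, concat)) + b
    alpha = 1.0 / (1.0 + math.exp(-score))
    u = [alpha * ci + (1.0 - alpha) * ei for ci, ei in zip(c, e)]
    return alpha, u
```

With all gate parameters set to zero the gate outputs α = 0.5, so u is the plain average of c and e; training would move α toward whichever channel is more informative for the current state.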

[0031] Step S3: Based on the dynamic learner profile and the dialogue intent in the current learning dialogue, a multi-agent reinforcement learning framework is used to make decisions and generate a personalized learning resource scheduling sequence. The multi-agent reinforcement learning framework includes at least a first agent for knowledge remediation, a second agent for interest maintenance, a third agent for capability expansion, and a scheduler. The first, second, and third agents each output a candidate resource sequence to the scheduler. The scheduler arbitrates and merges the candidate resource sequences according to a global reward function. As an optional embodiment, the Borda count method is used: the resource ranked n-th in an agent's candidate sequence receives (N - n + 1) points, where N is the sequence length. Each agent is assigned a dynamic reputation weight based on its historical recommendation success rate. The scheduler calculates the weighted total score of each resource across all sequences, sorts the resources in descending order of total score, and outputs the final learning resource scheduling sequence. The first agent's goal is to maximize short-term knowledge mastery; its state space includes the mastery of knowledge points related to the current dialogue intent and the real-time cognitive load. The second agent's goal is to maximize the duration of positive emotions during the learning process; its state space includes the emotional state sub-profile and the resource interaction frequency. The third agent's goal is to maximize the breadth and relevance of the long-term knowledge system; its state space includes the topological structure features of the knowledge graph and the exploration degree of historical learning paths.
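The weighted Borda arbitration described in this step can be sketched as follows. The agent names, resource identifiers, and reputation values in the usage note are made up, and breaking ties alphabetically by resource identifier is an added assumption, since the text does not say how ties are resolved.

```python
def weighted_borda(candidates: dict[str, list[str]],
                   reputation: dict[str, float]) -> list[str]:
    """Merge per-agent candidate sequences with a reputation-weighted Borda count.

    The resource ranked n-th in a sequence of length N earns (N - n + 1) points,
    multiplied by the reputation weight of the agent that proposed it.
    """
    scores: dict[str, float] = {}
    for agent, sequence in candidates.items():
        n_total = len(sequence)
        for rank, resource in enumerate(sequence, start=1):
            points = n_total - rank + 1
            scores[resource] = scores.get(resource, 0.0) + reputation[agent] * points
    # Descending by total score; ties broken by identifier (an assumption).
    return sorted(scores, key=lambda r: (-scores[r], r))
```

For candidates `{"remediation": ["r1", "r2", "r3"], "interest": ["r2", "r4", "r1"], "expansion": ["r5", "r2", "r1"]}` with reputation weights 1.0, 0.5, and 0.5, resource r2 scores 4.5 and r1 scores 4.0, so r2 is scheduled first even though no agent with full weight ranked it at the top.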

[0032] The scheduler evaluates and optimizes the overall decision-making effect based on a global reward function R_global, which is a weighted sum of three parts: R_knowledge, based on the growth rate of knowledge mastery in subsequent assessments; R_engagement, based on the effective learning duration and the proportion of positive emotions in the learner profile; and the negative penalty term R_overload, based on the cognitive load exceeding the threshold.
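A minimal sketch of R_global as a weighted sum is shown below. The weight values, the load threshold of 0.8, and the linear shape of the overload penalty are assumptions; the application fixes only that there are three terms and that R_overload is negative when the cognitive load exceeds a threshold.

```python
def global_reward(r_knowledge: float, r_engagement: float, cognitive_load: float,
                  load_threshold: float = 0.8,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of R_knowledge, R_engagement, and the penalty R_overload.

    Assumption: the penalty grows linearly with the amount by which the
    cognitive load exceeds `load_threshold`, and is zero below it.
    """
    r_overload = -max(0.0, cognitive_load - load_threshold)
    w_k, w_e, w_o = weights
    return w_k * r_knowledge + w_e * r_engagement + w_o * r_overload
```

The three weights are the coefficients that the Bayesian optimization in step S5 would adjust over time.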

[0033] Generating the learning resource scheduling sequence also includes matching an emotion-adaptive prompt to each learning resource in the sequence. The style of the prompt is dynamically selected based on the emotional state sub-profile, and the style is encouraging, reassuring, or challenging. As an optional embodiment, based on the currently dominant emotional dimension (such as "low pleasure") in the dynamic learner profile, a template with the corresponding style (such as "reassuring") is selected and personalized with the learner's name and the current knowledge point to generate a prompt such as "(Learner's Name), don't worry, let's understand the core of this concept through a simple example first."
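Template selection can be illustrated with the sketch below. Only the pairing of "low pleasure" with the reassuring style and the reassuring wording come from the text; the other two templates and the rule that chooses between encouraging and challenging are assumptions added to make the example complete.

```python
TEMPLATES = {
    # Wording adapted from the example prompt in the text.
    "reassuring": "{name}, don't worry, let's understand the core of {topic} "
                  "through a simple example first.",
    # The two templates below are hypothetical.
    "encouraging": "{name}, nice progress on {topic}. Let's keep going.",
    "challenging": "{name}, ready for a harder problem on {topic}?",
}


def choose_style(pleasure: str, activation: str) -> str:
    """Pick a prompt style from the dominant emotional dimensions ('high'/'low')."""
    if pleasure == "low":
        return "reassuring"   # pairing given in the text
    if activation == "high":
        return "challenging"  # assumed pairing
    return "encouraging"      # assumed pairing


def build_prompt(name: str, topic: str, pleasure: str, activation: str) -> tuple[str, str]:
    """Return the chosen style and the personalized prompt text."""
    style = choose_style(pleasure, activation)
    return style, TEMPLATES[style].format(name=name, topic=topic)
```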

[0034] Step S4 involves sequentially pushing resources from the learning resource scheduling sequence to the learner's end and monitoring the learner's feedback data in real time during the learning process. As an optional embodiment, the feedback data may include explicit feedback, implicit feedback, and new multimodal data; explicit feedback includes resource ratings, likes or dislikes, and text comments; implicit feedback includes the duration of time spent on newly pushed resources, interaction completion rate, whether skipped, and answers to subsequent related questions; new multimodal data includes newly generated interactive behaviors, dialogue texts, and physiological data during the learning process.

[0035] Step S5: Based on feedback data, optimize and iterate the model parameters of the dynamic learner profile and the policies of the multi-agent reinforcement learning framework to form a closed loop. As an optional implementation, model parameter optimization involves continuously updating the parameters of the dynamic learner profile model using new state, action, and reward sequences from the feedback data, making the dynamic learner profile model more accurate. Policy optimization employs a distributed reinforcement learning algorithm to independently train the first, second, and third agents, with multiple environment instances (simulated or real user sessions) running in parallel, generating experience data stored in a shared replay buffer. A proximal policy optimization algorithm is used to sample experience from the buffer and update the respective policy network parameters. Furthermore, a Bayesian optimization method is used to dynamically adjust the weight coefficients of each reward item (R_knowledge, R_engagement, and R_overload) in the scheduler's global reward function based on long-term learning performance.

[0036] Embodiment 2: As shown in Figure 2, a conversational personalized learning resource dynamic scheduling system is used to implement any one of the methods in Embodiment 1 and includes a multimodal data acquisition module, a learner profiling engine module, a scheduling decision module, a resource push and monitoring module, and a closed-loop feedback optimization module.

[0037] The multimodal data acquisition module is deployed on learner terminal devices (such as personal computers and tablets) to collect learners' raw multimodal data. This module includes a local processing unit for localized feature extraction and desensitization of raw physiological state data (such as facial videos) containing sensitive information, and then uploads the desensitized feature vectors to the cloud via an encrypted channel. The multimodal data acquisition module is also responsible for collecting and initially formatting dialogue text data and interaction behavior data.

[0038] The learner profiling engine module receives preprocessed data from the multimodal data acquisition module, constructs and updates a dynamic learner profile that integrates cognitive and emotional states in real time, and outputs a unified feature vector.

[0039] The scheduling decision module is used to make decisions based on the dynamic learner profile and the current dialogue intent, utilizing a multi-agent reinforcement learning framework to generate a personalized learning resource scheduling sequence. The scheduling decision module includes a first agent, a second agent, a third agent, a scheduler, and a resource metadata database. The parallel policy networks of the first, second, and third agents focus on knowledge remediation, interest maintenance, and capability expansion, respectively. The scheduler receives the outputs of the first, second, and third agents, generates the final learning resource scheduling sequence through a fusion arbitration mechanism such as weighted Borda counting, and matches each resource with an emotion-adaptive prompt. The resource metadata database stores metadata tags for all learning resources, for use by the agents when making decisions and by the scheduler.

[0040] The resource push and monitoring module is used to sequentially push learning resources in the learning resource scheduling sequence and monitor feedback data in real time during the learning process. The closed-loop feedback optimization module receives feedback data streams from the resource push and monitoring module and optimizes the model parameters and policies of the learner profiling engine module and the scheduling decision module. Specifically, it uses feedback data to perform supervised or self-supervised learning updates on the neural network parameters in the learner profiling engine module. The distributed PPO algorithm is used to independently train and update the policy networks of the three agents in the scheduling decision module, while periodically running the Bayesian optimizer to dynamically adjust the weight coefficients of the global reward function in the scheduler.

[0041] The scheduling decision module, learner profiling engine module, and closed-loop feedback optimization module are deployed on a cloud server. The cloud server adopts a microservice architecture, and each module can scale independently. The database uses sharding and partitioning for learner profiling data, log data, and resource data to ensure high availability and scalability of the system. The multimodal data acquisition module includes a local processing unit deployed on the user terminal, which performs feature extraction and de-identification on the raw physiological state data containing sensitive information before the data is uploaded. The system's data flow is as follows: terminal acquisition and de-identification, encrypted upload, cloud profile update, cloud multi-agent decision-making, cloud scheduling, delivery of resources and prompts to the terminal, terminal interaction and feedback, encrypted feedback upload, and cloud closed-loop optimization.

[0042] The implementation principle of the conversational personalized learning resource dynamic scheduling system in this application is as follows. First, the system securely collects and preprocesses multimodal data on the terminal side and extracts desensitized emotional features through a local lightweight model. In the cloud, the system uses a dual-channel graph neural network to fuse the learner's real-time cognitive state (a dynamic mastery graph with knowledge points as nodes) and emotional state (a transition network based on emotion labels), constructing a unified, quantifiable dynamic digital profile of the learner. On this basis, the system introduces a multi-agent reinforcement learning framework consisting of three agents with different optimization objectives. These agents generate candidate resource scheduling strategies in parallel based on the current profile and dialogue intent, focusing on three dimensions: knowledge recovery, interest maintenance, and capability expansion. The scheduler performs weighted fusion and arbitration on the agents' strategies according to a global objective function that integrates multi-dimensional rewards, and outputs the final emotion-adapted personalized resource sequence. After the sequence is pushed to the learner, the system collects explicit and implicit feedback data in real time and uses this data to drive both the update of the learner profile model and the optimization of the multi-agent policy networks. Through distributed reinforcement learning and Bayesian hyperparameter tuning, a complete, self-iterating closed loop of perception, decision-making, feedback, and optimization is formed, so that the system's personalized scheduling capability improves continuously and adaptively as the interaction deepens.
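The fusion of the cognitive and emotional channels into one profile vector can be sketched as a softmax-weighted combination of the two channel outputs. This is only one possible reading of attention-weighted fusion: the dot-product scoring, the `query` vector (standing in for a learned parameter), the function name `attention_fuse`, and the example vectors are all assumptions, and the graph neural network channels that would produce the input vectors are not modeled here.

```python
import math


def attention_fuse(cognitive, emotional, query):
    """Fuse two equal-length feature vectors with softmax attention weights.

    cognitive, emotional: feature vectors output by the two channels.
    query: hypothetical learned vector used to score each channel.
    Returns (fused_vector, [alpha_cognitive, alpha_emotional]).
    """
    if not (len(cognitive) == len(emotional) == len(query)):
        raise ValueError("all vectors must have the same length")
    channels = [cognitive, emotional]
    # Relevance score of each channel: dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, v)) for v in channels]
    # Numerically stable softmax over the two scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Unified feature vector: attention-weighted sum of the channel vectors.
    fused = [
        sum(a * v[i] for a, v in zip(alphas, channels))
        for i in range(len(query))
    ]
    return fused, alphas


# Hypothetical channel outputs and query vector.
fused, alphas = attention_fuse(
    cognitive=[1.0, 0.0],
    emotional=[0.0, 1.0],
    query=[1.0, 0.0],
)
print(alphas[0] > alphas[1])  # True: the query favors the cognitive channel
```

Because the attention weights sum to one, the fused vector stays on the same scale as the channel outputs, which keeps the unified feature vector comparable across updates.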

[0043] It should be understood that those skilled in the art can make improvements or modifications based on the above description, and all such improvements and modifications should fall within the protection scope of the appended claims.

Claims

1. A method for dynamic scheduling of conversational personalized learning resources, characterized in that it includes the following steps: Step S1: collect learners' multimodal data and preprocess the multimodal data, the multimodal data including dialogue text data, interaction behavior data, and physiological state data; Step S2: based on the preprocessed multimodal data, construct and update a dynamic learner profile in real time, the dynamic learner profile being a comprehensive representation that integrates cognitive and emotional states; Step S3: based on the dynamic learner profile and the dialogue intent in the current learning dialogue, use a multi-agent reinforcement learning framework to make decisions and generate a personalized learning resource scheduling sequence, wherein the multi-agent reinforcement learning framework includes at least a first agent for knowledge recovery, a second agent for interest maintenance, a third agent for capability expansion, and a scheduler; the first agent, the second agent, and the third agent all output candidate resource sequences to the scheduler, and the scheduler arbitrates and merges the candidate resource sequences according to a preset global reward function and outputs the final learning resource scheduling sequence; Step S4: push the learning resources in the learning resource scheduling sequence to the learner's terminal in sequence, and collect and monitor the learner's feedback data in real time during the learning process; Step S5: based on the feedback data, iteratively optimize the model parameters of the dynamic learner profile and the policies of the multi-agent reinforcement learning framework to form a closed loop.

2. The method for dynamic scheduling of conversational personalized learning resources according to claim 1, characterized in that, in step S1, the physiological state data includes facial video data acquired through an image acquisition device; and in step S2, constructing and updating in real time a dynamic learner profile that integrates cognitive and emotional states includes: Step S21: perform micro-expression recognition on the facial video data to determine the learner's real-time emotion label; Step S22: integrate the real-time emotion label, the sentiment tendencies analyzed from the dialogue text data, and the behavioral patterns analyzed from the interaction behavior data to generate an emotional state sub-profile.

3. The method for dynamic scheduling of conversational personalized learning resources according to claim 2, characterized in that, in step S2, a dual-channel graph neural network model is used to construct and update the dynamic learner profile, the model including: a cognitive channel, which takes knowledge points as nodes, the degree of mastery of the knowledge points as node attributes, and the prerequisite and dependency relationships between knowledge points as edges, and is used to construct and update a cognitive knowledge graph; an emotional channel, which takes the real-time emotion labels as nodes and the transition probabilities between emotional states as edges, and is used to construct an emotional association network; and an attention fusion layer, which performs attention-weighted fusion of the cognitive feature vector output by the cognitive knowledge graph and the emotional feature vector output by the emotional association network to generate a unified feature vector representing the dynamic learner profile.

4. The method for dynamic scheduling of conversational personalized learning resources according to claim 3, characterized in that, in step S3, the goal of the first agent is to maximize short-term knowledge mastery, and the state space of the first agent includes the mastery of the knowledge points associated with the current dialogue intent and the real-time cognitive load; the goal of the second agent is to maximize the duration of positive emotions during the learning process, and the state space of the second agent includes the emotional state sub-profile and the frequency of resource interaction; the goal of the third agent is to maximize the breadth and relevance of the long-term knowledge system, and the state space of the third agent includes the topological structure features of the knowledge graph and the degree of exploration of historical learning paths.

5. The method for dynamic scheduling of conversational personalized learning resources according to claim 4, characterized in that the scheduler's global reward function R_global is a weighted sum of the following reward items: R_knowledge, based on the growth rate of knowledge mastery in subsequent evaluations; R_engagement, based on the effective learning time and the proportion of positive emotions; and R_overload, a negative penalty term based on the cognitive load exceeding a threshold.

6. The method for dynamic scheduling of conversational personalized learning resources according to claim 2 or 5, characterized in that, in step S3, generating a personalized learning resource scheduling sequence further includes: matching an emotion-adaptive prompt to each learning resource in the learning resource scheduling sequence, wherein the style of the prompt is dynamically selected based on the emotional state sub-profile, and the style includes motivational, reassuring, or challenging.

7. The method for dynamic scheduling of conversational personalized learning resources according to claim 2, characterized in that the preprocessing of the multimodal data in step S1 includes: using a lightweight convolutional neural network model on the user's local device to extract features from the facial video data, obtaining an anonymized emotional feature vector, and then uploading the emotional feature vector to the server.

8. The method for dynamic scheduling of conversational personalized learning resources according to claim 1, characterized in that, in step S5, iteratively optimizing the policies of the multi-agent reinforcement learning framework specifically includes: using a distributed reinforcement learning algorithm to train the first agent, the second agent, and the third agent independently; and using a Bayesian optimization method to dynamically adjust the weight coefficients of the reward items R_knowledge, R_engagement, and R_overload in the global reward function based on long-term learning performance feedback.

9. A conversational personalized learning resource dynamic scheduling system, used to implement the method of any one of claims 1-8, characterized in that it includes: a multimodal data acquisition module, used to collect learners' dialogue text data, interaction behavior data, and physiological state data; a learner profiling engine module, used to construct and update a dynamic learner profile that integrates cognitive and emotional states based on the multimodal data; a scheduling decision module, used to make decisions based on the dynamic learner profile and the current dialogue intent, using a multi-agent reinforcement learning framework, and to generate a personalized learning resource scheduling sequence, the scheduling decision module including a first agent, a second agent, a third agent, and a scheduler; a resource push and monitoring module, used to push the learning resources in the learning resource scheduling sequence in sequence and to monitor feedback data in real time during the learning process; and a closed-loop feedback optimization module, used to optimize the model parameters and policies of the learner profiling engine module and the scheduling decision module based on the feedback data.

10. The conversational personalized learning resource dynamic scheduling system according to claim 9, characterized in that the scheduling decision module, learner profiling engine module, and closed-loop feedback optimization module are deployed on a cloud server; and the multimodal data acquisition module includes a local processing unit deployed on the user terminal, used to perform localized feature extraction and desensitization processing on the original physiological state data containing sensitive information before data upload.