Multi-agent cooperative personalized learning path planning method, device and equipment
By acquiring task information from multiple sources through a multi-agent collaborative system and generating personalized learning paths by combining confidence and user preferences, the problem of low personalization in planning learning paths in existing technologies is solved, and personalized learning path planning in dynamic learning scenarios is realized.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN BOYUE DOMESTIC GOODS
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-12
AI Technical Summary
Existing smart learning companion devices have low personalization when planning learning paths and cannot match the dynamic changes in the user's actual situation, resulting in a lack of robustness and true personalization in the planning results.
A personalized learning path planning method based on multi-agent collaboration is adopted. The system obtains task information from multiple sources through multi-agent collaboration, combines confidence level and user learning preferences to generate personalized learning paths, and monitors and intervenes in the learning process in real time to ensure the adaptability and accuracy of the paths.
It achieves highly robust aggregation of learning tasks in complex information environments, generates personalized learning paths that conform to real-world scenarios, and combines user preferences and learning patterns to achieve a leap from passive response to active learning support. It fully understands the user's internal state and deeply fits real-world application scenarios.
[Figure: abstract drawing, CN121920682B_ABST]
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence education, and in particular to a method, apparatus and device for planning personalized learning paths through multi-agent collaboration.
Background Technology
[0002] This application pertains to the field of artificial intelligence education. How to guide children efficiently and scientifically in completing homework and cultivating good study habits has become a major concern for many parents. Intelligent learning companion devices designed specifically for homework assistance, with strong learning companion functions, have been developed in response. To a certain extent, these devices function as electronic workbooks, distributing assignments to users. They can be broadly divided into three categories by function. The first category is the all-in-one AI learning machine, whose comprehensive functions include course tutoring, homework correction, and parental control. The second category is the lightweight AI learning assistant, which integrates AI capabilities into a mobile app and focuses on quick Q&A and practice; its main advantages are low cost and easy access (some are free), turning ordinary devices into learning tools. The third category is the AI early-education companion device aimed at young children's early learning. These devices typically come in cute, portable hardware forms and stimulate children's interest in exploring the world through dialogue, photography, and other methods.
[0003] Existing intelligent learning companion devices primarily rely on optical character recognition (OCR) to acquire homework information. After OCR, text is processed using underlying algorithms. However, while current technology performs well with well-formatted, clean printed text, it struggles with non-standard formats such as complex text with mixed graphics. Lacking the ability to understand the overall structure and navigate autonomously, it falls into a perceptual blind spot, limiting its accuracy and recall to a superficial level of information acquisition. Furthermore, existing learning planning algorithms assume relatively fixed learning tasks and user states, aiming to find an initial optimal or feasible path. However, real-world learning scenarios are dynamic and open; tasks may be added or removed temporarily, and users' energy, emotions, and levels of difficulty with different subjects change in real time. Static algorithms cannot model and adaptively adjust to these dynamic variables, resulting in planning results that lack robustness and true personalization in actual execution.
[0004] In summary, existing technologies have a low degree of personalization in planning learning paths and lack differentiated learning paths that match the actual situation of users.
[0005] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.
Summary of the Invention
[0006] The main purpose of this application is to provide a personalized learning path planning method for multi-agent collaboration, which aims to solve the problem that in the existing technology, the degree of personalization in planning learning paths is low and there is a lack of differentiated learning paths that match the actual situation of users.
[0007] In a first aspect, a method for planning personalized learning paths through multi-agent collaboration is provided. The method is applied to a target information source in a system for planning personalized learning paths through multi-agent collaboration. The system further includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published assignments. The second information source and the target information source are both intelligent learning companion devices and are on the same consortium blockchain. The method for planning personalized learning paths through multi-agent collaboration includes:
[0008] In response to the planning instruction for the planned path, the first job task published by the first information source is obtained, and the second job task, as seen from the perspective of the second information source, is obtained from the second information source.
[0009] A task list is determined based on the confidence level preset for each information source, together with the target task stored locally from the target information source's own perspective, the first task, and the second task.
[0010] Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain the specific learning path corresponding to the task list.
[0011] The specific learning path is pushed to the user and the user's learning progress is monitored based on the internally preset monitoring module.
[0012] If the learning situation is determined to be abnormal based on the monitored learning situation, then the intervention measures corresponding to the abnormality level are selected to intervene in the user until it is determined that the user has completed the specific learning path.
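The escalation in the last step can be illustrated with a minimal sketch. The abnormality levels and the intervention measures named here are hypothetical placeholders; the application only states that a measure corresponding to the abnormality level is selected.

```python
from enum import IntEnum


class AbnormalityLevel(IntEnum):
    # Hypothetical levels; the application does not enumerate them here.
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Hypothetical mapping from abnormality level to intervention measure.
INTERVENTIONS = {
    AbnormalityLevel.MILD: "gentle voice reminder",
    AbnormalityLevel.MODERATE: "suggest a short break",
    AbnormalityLevel.SEVERE: "notify guardian",
}


def select_intervention(level):
    """Return the intervention measure preset for the given abnormality level."""
    return INTERVENTIONS[level]


def run_session(observed_levels):
    """Process monitoring results until the learning path is completed.

    `observed_levels` is an iterable whose items are an AbnormalityLevel,
    or None when the learning situation is normal at that check.
    Returns the interventions that were triggered, in order.
    """
    triggered = []
    for level in observed_levels:
        if level is not None:
            triggered.append(select_intervention(level))
    return triggered
```

A lookup table keeps the level-to-measure policy separate from the monitoring loop, so the measures can be changed without touching the loop.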
[0013] In one possible implementation of this application, the step of obtaining a first job task from the first information source and a second job task from the second information source's perspective in response to a planning instruction for a planned path includes:
[0014] In response to the planning instruction for the planned path, the system autonomously navigates the graphical user interface of the first information source based on a preset visual language model and extracts the first task during navigation;
[0015] Generate a zero-knowledge proof of the user-authorized query permission, wherein the permitted query content is the homework tasks held by other smart learning companion devices on the same consortium blockchain;
[0016] The zero-knowledge proof is sent to the second information source for verification by the second information source;
[0017] In response to a message that the proof has passed verification by the second information source, the job tasks are queried from the second information source to obtain the second job task from the perspective of the second information source. During the query, the user's information is not disclosed to the second information source being queried.
[0018] In one possible implementation of this application, after the task list is determined based on the confidence level preset for each information source together with the locally pre-stored target task from its own perspective, the first task, and the second task, the method further includes:
[0019] Detect whether there is a conflict in the content of the target task, the first task, and the second task;
[0020] If a conflict is determined to exist, then based on the conflicting task content, calculate the posterior probability that one of the conflicting versions is true, wherein each conflict has one side and an opposing side;
[0021] If it is determined that the posterior probability exceeds a preset probability threshold, the task list is updated based on that conflicting version.
[0022] In response to the user's confirmation instruction on the updated task list, the confidence level of each information source is updated, wherein the confidence level of the information source corresponding to the adopted version is increased to enhance its influence, and the confidence level of the information source corresponding to the opposing side is decreased to reduce its influence.
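One way to read the conflict-resolution steps above is as a Bayesian update over two conflicting versions of a task. The sketch below is an illustration under stated assumptions, not the formula of this application: each source's confidence is treated as the probability that the source is correct, sources are assumed independent, and the confidence update is a hypothetical fixed additive step clamped to [0, 1].

```python
from math import prod


def posterior_for_version(supporting, opposing, prior=0.5):
    """Posterior probability that the version backed by `supporting` is true.

    `supporting` and `opposing` are lists of source confidences in [0, 1].
    Assumes independent sources (an assumption of this sketch).
    """
    like_true = prod(supporting) * prod(1.0 - c for c in opposing)
    like_false = prod(1.0 - c for c in supporting) * prod(opposing)
    evidence = prior * like_true + (1.0 - prior) * like_false
    if evidence == 0.0:
        raise ValueError("confidences make both versions impossible")
    return prior * like_true / evidence


def update_confidences(confidence, winners, losers, step=0.05):
    """Raise confidence of sources on the adopted side, lower the opposing side."""
    updated = dict(confidence)
    for source in winners:
        updated[source] = min(1.0, updated[source] + step)
    for source in losers:
        updated[source] = max(0.0, updated[source] - step)
    return updated
```

For example, with two supporting sources at 0.9 and 0.8 and one opposing source at 0.6, the likelihoods are 0.288 and 0.012, so the posterior is 0.288 / 0.300 = 0.96; with a probability threshold of, say, 0.9, the supported version would be adopted.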
[0023] In one possible implementation of this application, the step of planning the task list based on locally recorded user learning preferences and user learning efficiency distribution patterns to obtain the specific learning path corresponding to the task list includes:
[0024] Based on the user's learning preferences recorded locally, the tasks in the task list are prioritized to obtain a preliminary sorting table of tasks to be completed.
[0025] Based on the distribution pattern of user learning efficiency, the current learning efficiency of the user is estimated to obtain the estimated learning efficiency.
[0026] Based on the estimated learning efficiency, the preliminary sorting table is sorted a second time to obtain a secondary sorting table;
[0027] Based on the secondary sorting table, the specific learning path corresponding to the task list is determined.
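The two-pass ordering above can be sketched as two stable sorts. The `Task` fields, the 1 to 5 difficulty scale, the hour-indexed efficiency table, and the rule "hard tasks first when efficiency is high, easy tasks first otherwise" are all assumptions made for illustration; the application does not fix a concrete rule here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    subject: str
    difficulty: int  # hypothetical scale: 1 (easy) to 5 (hard)


def preliminary_sort(tasks, subject_rank):
    """First pass: order tasks by the user's subject preference (lower rank first)."""
    unknown = len(subject_rank)  # subjects without a recorded preference go last
    return sorted(tasks, key=lambda t: subject_rank.get(t.subject, unknown))


def estimate_efficiency(efficiency_by_hour, hour, default=0.5):
    """Estimate current efficiency from an hour-of-day distribution."""
    return efficiency_by_hour.get(hour, default)


def secondary_sort(preliminary, efficiency, high=0.7):
    """Second pass: hard tasks first when efficiency is high, else easy first.

    Python's sort is stable, so ties keep the preference order of the first pass.
    """
    if efficiency >= high:
        return sorted(preliminary, key=lambda t: -t.difficulty)
    return sorted(preliminary, key=lambda t: t.difficulty)


def plan_path(tasks, subject_rank, efficiency_by_hour, hour):
    """Return task names in the order of the resulting learning path."""
    first = preliminary_sort(tasks, subject_rank)
    eff = estimate_efficiency(efficiency_by_hour, hour)
    return [t.name for t in secondary_sort(first, eff)]
```

Because the second sort is stable, the user's preference still decides the order among tasks of equal difficulty.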
[0028] In one possible implementation of this application, after planning the task list based on locally recorded user learning preferences and user learning efficiency distribution patterns to obtain the specific learning path corresponding to the task list, the method further includes:
[0029] In response to the user's negative instruction regarding the specific learning path, obtain the user's self-planned path.
[0030] If a conflict is detected between the self-planned path and the specific learning path, the user is advised to pause both the self-planned path and the specific learning path based on the user's learning efficiency distribution pattern.
[0031] Based on the distribution pattern of user learning efficiency, learning suggestions that are conducive to learning the self-planned path are pushed to the user, wherein the learning suggestions are completed before the user begins to complete the self-planned path;
[0032] In response to the user's instruction to accept the learning suggestion, the specific learning path is updated based on the learning suggestion to obtain the updated specific learning path;
[0033] Save the record of this negotiation with the user for future adjustments and learning.
[0034] In one possible implementation of this application, the monitoring module includes a visual monitoring module and an acoustic monitoring module. The step of pushing the specific learning path to the user and starting to monitor the user's learning progress based on the internally preset monitoring module includes:
[0035] The specific learning path will be pushed to the user;
[0036] The system uses an internal visual monitoring module to capture the user's facial expressions and postures while learning, and performs image recognition based on the captured images to obtain image features.
[0037] The system acquires the user's voice during learning based on the internal acoustic monitoring module, and performs voice recognition based on the acquired voice to obtain voice features.
[0038] The image features and the sound features are fused in a multimodal manner to obtain a state evaluation result of the user's current learning state;
[0039] If an abnormal state exists in the state evaluation result and continues for a preset time, then the current learning situation is judged to be abnormal.
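The fusion and persistence check above can be sketched as follows. This is a simplified illustration: it assumes that image recognition and voice recognition have each already been reduced to a single abnormality score in [0, 1], and it uses a weighted average (late fusion) with hypothetical weights, threshold, and duration.

```python
def fuse(image_score, audio_score, image_weight=0.6):
    """Late fusion of per-modality abnormality scores by weighted average."""
    return image_weight * image_score + (1.0 - image_weight) * audio_score


def is_learning_abnormal(samples, threshold=0.5, min_duration=30.0):
    """Return True if the fused state stays abnormal for at least `min_duration` seconds.

    `samples` is a time-ordered iterable of (timestamp_seconds, image_score, audio_score).
    A single normal sample resets the abnormal streak.
    """
    streak_start = None
    for timestamp, image_score, audio_score in samples:
        if fuse(image_score, audio_score) >= threshold:
            if streak_start is None:
                streak_start = timestamp
            if timestamp - streak_start >= min_duration:
                return True
        else:
            streak_start = None
    return False
```

Requiring the abnormal state to persist avoids reacting to a single yawn or a brief glance away from the desk.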
[0040] In one possible implementation of this application, before obtaining the first job task published from the first information source and the second job task from the second information source's perspective in response to the planning instruction for the planned path, the process includes:
[0041] Upon first launch, request the user to perform initial settings.
[0042] Obtain the user's initialization settings, wherein the settings include at least one of the following: the comparison granularity of the comparison task, the size of the comparison range of the second information source to be compared, and whether to monitor the learning companion.
[0043] In response to the user's signature confirmation instruction for the setting information, the user's signature and the setting data corresponding to the setting information are broadcast to the consortium blockchain.
[0044] In one possible implementation of this application, the step of determining the task list based on the pre-stored target task, the first task, and the second task from the perspective of each information source, by combining the corresponding preset confidence level of each source, includes:
[0045] The preset confidence level for each information source is compared with the preset confidence level threshold.
[0046] For information sources whose confidence exceeds a preset confidence threshold, the corresponding tasks are combined to obtain a task list. If the confidence of each information source exceeds the threshold, the local pre-stored target task, the first task, and the second task from the user's perspective are combined to obtain the task list.
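A minimal sketch of this threshold filter follows. Representing tasks as plain strings, the source names, and the 0.5 threshold are assumptions for illustration; "exceeds" is read as strictly greater than.

```python
def build_task_list(tasks_by_source, confidence, threshold=0.5):
    """Merge tasks from every source whose confidence exceeds the threshold.

    `tasks_by_source` maps a source name to its list of tasks;
    `confidence` maps a source name to its preset confidence level.
    Duplicate tasks are kept once, in first-seen order.
    """
    merged = []
    seen = set()
    for source, tasks in tasks_by_source.items():
        if confidence.get(source, 0.0) > threshold:
            for task in tasks:
                if task not in seen:
                    seen.add(task)
                    merged.append(task)
    return merged
```

When every source clears the threshold, the result is simply the de-duplicated union of the target task, the first task, and the second task.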
[0047] In a second aspect, a multi-agent collaborative personalized learning path planning device is provided. The device is deployed at a target information source in a multi-agent collaborative personalized learning path planning system, to which the multi-agent collaborative personalized learning path planning method is applied. The system further includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published assignments. The second information source and the target information source are both intelligent learning companion devices and are on the same consortium blockchain. The multi-agent collaborative personalized learning path planning device includes:
[0048] The acquisition module is used to respond to the planning instructions for the planned path, acquire the first job task published from the first information source, and acquire the second job task from the second information source from the perspective of the second information source.
[0049] The task list determination module is used to determine the task list based on the corresponding confidence level preset for each information source, the locally pre-stored target task, the first task, and the second task from its own perspective.
[0050] The path planning module is used to plan the task list based on locally recorded user learning preferences and user learning efficiency distribution patterns, and obtain the specific learning path corresponding to the task list.
[0051] The learning monitoring module is used to push the specific learning path to the user and start monitoring the user's learning status based on the internally preset monitoring module;
[0052] The intervention module is used to select intervention measures corresponding to the abnormality level to intervene in the user if the learning situation is determined to be abnormal based on the monitored learning situation, until it is determined that the user has completed the specific learning path.
[0053] In a third aspect, a personalized learning path planning device for multi-agent collaboration is provided. The personalized learning path planning device for multi-agent collaboration is a physical node device and includes: a memory, a processor, and a personalized learning path planning program for multi-agent collaboration stored in the memory and executable on the processor. The processor executes the personalized learning path planning program for multi-agent collaboration to implement the steps of the personalized learning path planning method for multi-agent collaboration.
[0054] This application provides a method, apparatus, and device for planning personalized learning paths through multi-agent collaboration. Existing technologies offer a low degree of personalization when planning learning paths and lack differentiated learning paths that match the user's actual situation. In this application, the method is applied to a target information source within a system that also includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published assignments. The second information source and the target information source are both intelligent learning companion devices on the same consortium blockchain. The method includes: in response to a planning instruction for the planned path, obtaining the first job task published by the first information source and obtaining, from the second information source, the second job task from the perspective of the second information source. Combining the preset confidence level of each source with the locally stored target task, the first task, and the second task, a task list is determined. Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain a specific learning path. This specific learning path is then pushed to the user, and the user's learning progress is monitored using an internally preset monitoring module. If the monitored learning progress indicates an abnormality, intervention measures corresponding to the abnormality level are selected to intervene until the user completes the specific learning path. In this application, job tasks are obtained from multiple sources, each from its own perspective, eliminating reliance on traditional OCR technology. Learning tasks are robustly aggregated from complex information. The task list generated by combining the confidence levels of the sources more closely reflects the real learning situation. Personalized planning based on user preferences and learning patterns moves from passive response to proactive learning support. The method also monitors and intervenes in the subsequent learning process to track the user's state, so that the resulting personalized learning path fits real application scenarios.
Attached Figure Description
[0055] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0056] Figure 1 is a flowchart illustrating a personalized learning path planning method for multi-agent collaboration in one embodiment of this application;
[0057] Figure 2 is a schematic diagram of a personalized learning path planning device for multi-agent collaboration in one embodiment of this application;
[0058] Figure 3 is a schematic diagram of the structure of a computer device according to one embodiment of this application;
[0059] Figure 4 is another structural schematic diagram of a computer device in one embodiment of this application.
Detailed Implementation
[0060] To make the above-mentioned objectives, features, and advantages of this application more apparent and understandable, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0061] Example 1
[0062] This application provides a method for planning personalized learning paths through multi-agent collaboration.
[0063] This application pertains to the field of artificial intelligence education. How to guide children efficiently and scientifically in completing homework and cultivating good study habits has become a major concern for many parents. Intelligent learning companion devices designed specifically for homework assistance, with strong learning companion functions, have been developed in response. To a certain extent, these devices function as electronic workbooks, distributing assignments to users. They can be broadly divided into three categories by function. First, all-in-one AI learning machines, which offer comprehensive functions, integrating precise learning, course tutoring, homework correction, and parental control. Second, lightweight AI learning assistants, which integrate AI capabilities into mobile apps and focus on quick Q&A and practice; their main advantages are low cost and easy access (some are free), turning ordinary devices into learning tools. Third, AI early-education companion devices designed for young children, which typically come in cute, portable hardware forms and stimulate children's interest in exploring the world through dialogue, photography, and other methods.
[0064] Existing intelligent learning companion devices primarily rely on optical character recognition (OCR) to acquire homework information. After OCR, text is processed using underlying algorithms. However, while current technology performs well with well-formatted, clean printed text, it struggles with non-standard formats such as complex text with mixed graphics. Lacking the ability to understand the overall structure and navigate autonomously, it falls into a perceptual blind spot, limiting its accuracy and recall to a superficial level of information acquisition. Furthermore, existing learning planning algorithms assume relatively fixed learning tasks and user states, aiming to find an initial optimal or feasible path. However, real-world learning scenarios are dynamic and open; tasks may be added or removed temporarily, and users' energy, emotions, and levels of difficulty with different subjects change in real time. Static algorithms cannot model and adaptively adjust to these dynamic variables, resulting in planning results that lack robustness and true personalization in actual execution.
[0065] In summary, existing technologies have a low degree of personalization in planning learning paths and lack differentiated learning paths that match the actual situation of users.
[0066] This application relates to the fields of artificial intelligence education, multimodal human-computer interaction, distributed intelligent agent networks, and reinforcement learning. In particular, it relates to an adaptive system applied to intelligent learning companion devices for planning, managing, and guiding children's learning tasks, and includes a corresponding privacy protection and authorization mechanism based on cryptography and distributed consensus.
[0067] As shown in Figure 1, in the first embodiment of the multi-agent collaborative personalized learning path planning method of this application, the method is applied to the target information source in the multi-agent collaborative personalized learning path planning system. The system further includes at least a first information source and a second information source. The first information source interacts externally through a graphical user interface and stores the published job tasks. The second information source and the target information source are both intelligent learning companion devices and are on the same consortium blockchain. The multi-agent collaborative personalized learning path planning method includes steps S110-S150:
[0068] S110, In response to the planning instruction for the planned path, obtain the first job task issued from the first information source and the second job task from the second information source from the perspective of the second information source;
[0069] A multi-agent collaborative personalized learning path planning system includes a target information source, a first information source, and a second information source. The first information source primarily interacts externally through a graphical user interface, storing and publishing assignments; for example, the first information source could be a mini program used by the teacher specifically to publish assignments. The second information source, like the target information source, is an intelligent learning companion device on the same consortium blockchain. For instance, the second and target information sources are two intelligent learning companion devices on a consortium blockchain composed of the devices of one class. As an example, the user of the intelligent learning companion device corresponding to the target information source is Xiaoming, and the user of the intelligent learning companion device corresponding to the second information source is Xiaohong.
[0070] In some embodiments, although Xiaoming and Xiaohong are in the same class, and their class has formed a consortium blockchain for smart learning companion devices, it is possible that Xiaoming's target information source only receives the Chinese homework to be completed, while Xiaohong's second information source only receives the math homework. In reality, the class teacher assigned Chinese and math homework at the same time, and perhaps English homework as well. Therefore, this application designs a system in which the target information source can obtain homework from other information sources on the same consortium blockchain for comparison, to avoid missing any assignments.
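The comparison in the Xiaoming and Xiaohong example amounts to finding assignments that a peer device holds but the local device does not. A minimal sketch, assuming each assignment is a hypothetical `(name, details)` pair and that comparison can be done either on the name alone or on the name plus details:

```python
def missing_locally(local_tasks, remote_tasks, by_name_only=True):
    """Return remote assignments that have no counterpart on the local device.

    Each task is a (name, details) tuple. With `by_name_only=True` only the
    task item name is compared; otherwise name and details must both match.
    """
    def key(task):
        return task[0] if by_name_only else task

    local_keys = {key(task) for task in local_tasks}
    return [task for task in remote_tasks if key(task) not in local_keys]
```

Comparing on the name alone reveals less about each device's content but can miss a changed assignment within the same subject; comparing name plus details catches that case.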
[0071] When Xiaoming needs to plan his learning path, he clicks the corresponding button to enable the device to plan. Xiaoming's smart learning companion device, which is the target information source, responds to the planning instructions by obtaining the first assignment from the first information source and the second assignment from Xiaohong's smart learning companion device, which is the second information source, from the perspective of the second information source.
[0072] Before step S110, that is, before obtaining, in response to the planning instruction for the planned path, the first job task published from the first information source and the second job task from the second information source's perspective, the method includes steps A1-A4:
[0073] In this embodiment, before the system is first activated and responds to any learning path planning instructions, an initialization authorization process is executed to ensure that all subsequent data collaboration activities are carried out within a framework that is explicitly authorized by the guardian and protected by cryptography.
[0074] Ms. Wang, Xiaoming's mother, installed and opened the learning companion app on the family tablet for the first time. The system detected this was the first launch and automatically entered the initial setup wizard, guiding Ms. Wang through the privacy and authorization configuration.
[0075] A1. Upon first launch, request the user to perform initial settings.
[0076] The system detected that the device was being used for the first time, the local database was empty, and no authorization record bound to the device was found on the consortium blockchain. Therefore, the system automatically redirected to the "Privacy and Authorization Center" initialization page and displayed a welcome message and initialization guide to Ms. Wang.
[0077] A2. Obtain the user's initialization settings, wherein the settings include at least one of the following: the comparison granularity of the comparison task, the size of the comparison range of the second information source to be compared, and whether to monitor the learning companion.
[0078] The system generates a public-private key pair locally and scans the node information of the consortium blockchain network, such as a consortium blockchain composed of parents in a class, in preparation for subsequent broadcasts.
[0079] The comparison granularity can be either ① comparing only the task item name, or ② comparing the task item name plus detailed information.
[0080] The comparison scope can be ① limited to classmates, or ② limited to the friend list, or ③ all members of the consortium blockchain.
[0081] Whether to monitor the learning companion can be either ① enable monitoring, or ② plan without monitoring.
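The three options above map onto a small settings record, and a digest of that record is the kind of value a guardian could sign in step A3. The sketch below is illustrative only: the class and field names, the canonical JSON serialization, and the use of SHA-256 are assumptions, not details given by this application.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    NAME_ONLY = "name_only"
    NAME_AND_DETAILS = "name_and_details"


class Scope(Enum):
    CLASSMATES = "classmates"
    FRIEND_LIST = "friend_list"
    ALL_MEMBERS = "all_members"


@dataclass(frozen=True)
class AuthorizationSettings:
    granularity: Granularity
    scope: Scope
    monitoring_enabled: bool


def settings_digest(settings):
    """Hex SHA-256 digest over a canonical JSON form of the settings."""
    payload = {
        "granularity": settings.granularity.value,
        "scope": settings.scope.value,
        "monitoring_enabled": settings.monitoring_enabled,
    }
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A deterministic serialization matters because the signer and the verifying node must compute the same digest from the same settings.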
[0082] After Ms. Wang made her selections, a "Preview Settings Summary" was displayed at the bottom of the page. After confirming that everything was correct, she clicked "Next".
[0083] A3. In response to the user's signature confirmation instruction for the setting information, broadcast the user's signature and the setting data corresponding to the setting information to the consortium blockchain.
[0084] Ms. Wang opened the parent-side app on her phone and scanned the QR code on the screen. The app displayed a digest to be signed. After verifying that it was correct, she signed the digest using her private key. The signed result, along with the original settings data, the device's public key, and the timestamp, was sent to the consortium blockchain node via the parent-side app.
[0085] The consortium blockchain node verifies the signature's validity by decrypting it using Ms. Wang's public key and comparing it with the digest. Once the signature matches and the authorization record is confirmed as legitimate, it is packaged into a new block. This block contains the following key information: Xiaoming's device's anonymous identifier, Ms. Wang's public key, the authorization policy, Ms. Wang's signature on the policy, and the effective timestamp.
[0086] Once a block is uploaded to the blockchain, it becomes an immutable authorization record. Other nodes on the consortium blockchain can verify the existence of the device's authorization, but cannot decrypt the specific policy content.
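The tamper-evidence property described above can be illustrated with a hash-linked list of records. This is a single-process sketch that leaves out consensus, signatures, and networking; the record field names mirror the block contents listed earlier but are otherwise hypothetical, and the policy is stored only as a commitment string so that its content stays hidden.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(body):
    """SHA-256 over a deterministic JSON form of the block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(chain, record):
    """Append an authorization record, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {"index": len(chain), "prev_hash": prev_hash, "record": record}
    chain.append({**body, "hash": block_hash(body)})


def chain_is_valid(chain):
    """Recompute every hash and link; any edit to an earlier record is detected."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        body = {"index": block["index"], "prev_hash": block["prev_hash"], "record": block["record"]}
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(body):
            return False
        prev_hash = block["hash"]
    return True
```

Revoking or modifying an authorization would append a new record rather than edit an old one, which keeps the earlier records available as an audit trail.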
[0087] The basis for generating zero-knowledge proofs: After the authorization record is uploaded to the blockchain, when Xiaoming's device needs to initiate a query later, it can generate a zero-knowledge proof based on this record to prove that "he has legal authorization and the scope of the authorization includes this query," without revealing the specific authorization content or the guardian's identity. The verifier only needs to verify whether the proof matches a record on the blockchain.
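To make the idea of proving authorization without revealing it more concrete, the sketch below shows a Schnorr-style non-interactive proof of knowledge (Fiat-Shamir transform). It is a simplified stand-in, not the proof system of this application: it only proves that the device knows the private key matching a public key, bound to one query string, and it does not prove anything about the authorization scope. The modulus and generator are toy parameters chosen for readability and are not secure.

```python
import hashlib
import secrets

# Toy parameters (assumptions of this sketch, NOT secure): a Mersenne prime
# modulus and a small base. Exponents are reduced modulo P - 1.
P = 2**127 - 1
G = 3


def _challenge(public_key, commitment, query):
    """Fiat-Shamir challenge: hash of the public values and the query."""
    data = f"{G}|{public_key}|{commitment}|{query}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1)


def keygen():
    """Generate a (secret, public) key pair locally on the device."""
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)


def prove(secret, query):
    """Prove knowledge of `secret` for this query without revealing it."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(P - 2) + 1
    commitment = pow(G, nonce, P)
    challenge = _challenge(public_key, commitment, query)
    response = (nonce + challenge * secret) % (P - 1)
    return commitment, response


def verify(public_key, query, proof):
    """Check G^response == commitment * public_key^challenge (mod P)."""
    commitment, response = proof
    challenge = _challenge(public_key, commitment, query)
    return pow(G, response, P) == (commitment * pow(public_key, challenge, P)) % P
```

The verifier learns only that the prover holds the secret behind a given public key; a production system would use a vetted zero-knowledge library and a circuit that also encodes the authorization scope.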
[0088] Ms. Wang can view, modify, or revoke authorization at any time through the parent app. Any modification will generate a new authorization record and append it to the blockchain; the old record is marked as invalid but is still retained as an audit trail. At this time, Xiaoming's device enters standby mode. When Xiaoming starts studying in the evening, the system responds to the planning instruction and can seamlessly execute the zero-knowledge proof query in step S110 (corresponding to S1102-S1104) based on the existing on-chain authorization, without needing to request authorization again.
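The append-only behaviour described above (a modification adds a new record, and the old record is flagged invalid but kept for audit) can be sketched as follows. This is a minimal illustration only: the class name `AuthorizationLedger`, the field names, and the use of SHA-256 hash chaining are assumptions, the signature is treated as an opaque string, and no consortium consensus or zero-knowledge proof is modelled.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class AuthorizationLedger:
    """Hypothetical append-only list of authorization records linked by hash."""
    blocks: list = field(default_factory=list)

    def append(self, device_anon_id, guardian_pubkey, policy, signature, timestamp):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {
            "device_anon_id": device_anon_id,
            "guardian_pubkey": guardian_pubkey,
            "policy": policy,
            "signature": signature,  # opaque here; signature checking is out of scope
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        # Older records for the same device are only flagged, never edited or removed.
        # The flag lives outside the hashed body, so existing hashes stay unchanged.
        for block in self.blocks:
            if block["device_anon_id"] == device_anon_id:
                block["valid"] = False
        self.blocks.append({**body, "hash": digest, "valid": True})
        return digest

    def current(self, device_anon_id):
        """Return the latest valid record for a device, or None."""
        for block in reversed(self.blocks):
            if block["device_anon_id"] == device_anon_id and block["valid"]:
                return block
        return None
```

In this sketch, a second call to `append` for the same device leaves two blocks in the ledger: the first flagged invalid and retained, the second returned by `current`.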
[0089] By mandating the completion of privacy settings before enabling the feature, all subsequent data operations are guaranteed to have clear permission guidelines, comply with privacy design principles, and avoid the risk of subsequent accountability or unauthorized data collection.
[0090] It offers fine-grained configuration options in multiple dimensions such as comparison granularity and comparison range, allowing guardians to flexibly adjust according to their own privacy preferences, thus achieving a balance between security and functionality.
[0091] In step S110, in response to the planning instruction for the planned path, the first homework task is obtained from the first information source, and the second homework task, as seen from the second information source's perspective, is obtained from the second information source. This step includes steps S1101-S1104:
[0092] S1101. In response to the planning instruction for the planned path, the system autonomously navigates the graphical user interface of the first information source based on a preset visual language model and extracts the first task during navigation.
[0093] This application demonstrates how to use VLM to overcome the perception blind spots of traditional OCR and autonomously obtain tasks from a deep GUI. More importantly, it demonstrates how to break down "data silos" using zero-knowledge proofs under strict privacy protection, and complete the collaborative acquisition of multi-source data without disclosing user privacy. This solves the privacy paradox problem mentioned in the background technology and lays the foundation for generating a high-confidence complete task list.
[0094] For example, when Xiaoming prepares to start his evening study, his smart learning companion device, i.e., the Agent on the target information source, receives the instruction to "start planning tonight's homework." In order to obtain a complete task list, the system initiates a multi-source task aggregation process.
[0095] Autonomous GUI Exploration and First Assignment Task Extraction Based on a Vision Language Model (VLM). In response to the planning instruction, the system first activates the autonomous GUI exploration module. This module calls the pre-trained VLM, automatically logs in, and navigates to the graphical user interface of the official application where teachers post assignments, such as the Smart Campus mini-program. By understanding the semantic information of the screenshots, the VLM simulates human click operations, delving deeper until it finds the assignment posting section. During this process, the VLM identifies and extracts the text information posted by the teacher, and after structured processing, obtains the first assignment task: "Complete page 20 of 'Basic Training'". This task is temporarily marked as the task item T_lang to be merged.
[0096] S1102. Generate a zero-knowledge proof regarding the user-authorized query, wherein the query permission content is the homework task in other smart learning companion devices on the same consortium blockchain;
[0097] Xiaoming's device is the master agent, and Xiaohong's device is the slave agent. A zero-knowledge proof is generated for social queries. To verify whether there are more missing tasks, such as English homework, the system triggers a multi-agent collaborative verification function. Before the query, the system calls the locally stored user authorization information to generate a zero-knowledge proof (ZKP) for "user-authorized query." This proof demonstrates to the verifier (the second source, Xiaohong's device) that Xiaoming's guardian, Ms. Wang, has authorized Xiaoming's device to query the names of his classmates' homework assignments on the consortium blockchain through the parental control panel, without revealing Ms. Wang's identity, Xiaoming's specific class, or his friend list.
[0098] S1103. Send the zero-knowledge proof to the second information source for verification by the second information source;
[0099] The device broadcasts a zero-knowledge proof to initiate a verification request. Xiaoming's smart learning companion encapsulates this zero-knowledge proof into a query request and broadcasts it to online devices in its social graph via a distributed network based on a consortium blockchain. This request does not contain any real user identity information such as phone numbers or device IDs; it only contains the proof data to be verified and an anonymous session identifier.
[0100] S1104. In response to the message that the second information source has passed verification, query the homework task from the second information source to obtain the second homework task from the perspective of the second information source. During the query, the user's information is not disclosed to the second information source being queried.
[0101] After successful verification, an anonymous query and second task retrieval are performed. The second information source located in the network receives the query request. Its built-in verification module uses the public consensus mechanism on the consortium blockchain to verify the validity of the zero-knowledge proof. After successful verification, Xiaohong's agent recognizes that this is a legitimate authorization request and responds to the query. Xiaohong's device retrieves the currently existing task that is allowed to be shared according to the authorization granularity from its local task list: "Read the text aloud for 10 minutes". This task information (T_eng) is returned to Xiaoming's device through an encrypted channel. Throughout the query process, Xiaohong's device only knows that a legitimate authorized device is querying the task, but cannot know who the user of the device is, thus protecting the privacy of user information. At this point, the system has successfully aggregated the first task from the first information source and the second task from the second information source, providing a data foundation for the subsequent confidence fusion steps.
[0102] S120. Determine the task list by combining the confidence level preset for each information source with the target task pre-stored locally from the device's own perspective, the first task, and the second task.
[0103] As an example, Xiaoming's smart learning companion device has completed multi-source task aggregation and currently holds three types of task data: ① Homework tasks pre-stored locally on the device, manually entered by parent Ms. Wang through the App or recognized by OCR; ② The first homework task extracted from the teacher's mini-program through VLM autonomous navigation; ③ The second homework task obtained from Xiaohong's device through anonymous querying using zero-knowledge proof. At this point, the system enters the confidence fusion stage to determine which tasks should be included in the final execution list.
[0104] The task list is generated based on dynamic confidence level comparison.
[0105] In this embodiment, step S120 indicates that after obtaining multi-source heterogeneous task data, the final task list is determined by combining the corresponding confidence level preset for each information source, the locally stored task from the perspective of the target information source itself, the first task obtained from the first information source, and the second task obtained from the second information source, through confidence level threshold comparison and union operation.
[0106] Threshold comparison of confidence levels for multiple information sources. The system calls the confidence fusion engine to read the conditional probability tables of each information source stored in the distributed ledger and obtain the current confidence weight of each information source for the authenticity of the task. In this embodiment, the preset confidence levels of each information source are as follows:
[0107] Target information source: the locally stored parent app. Confidence level WA = 0.92. This source originates from screenshots of Chinese homework scanned by the parent using the App's OCR function. Because the parent directly confirms the data and historical data shows extremely high accuracy, a high confidence level is assigned.
[0108] First information source: the teacher's mini-program. Confidence level WB = 0.85. This source is the math assignment extracted via VLM autonomous navigation. Although the VLM has a high level of interface understanding, the mini-program interface may change layout with version updates, which introduces a certain probability of error, so its confidence level is slightly lower than that of the parent app.
[0109] Second information source: Xiaohong's device. Confidence level WC = 0.78. This source is the English assignment returned by the multi-agent collaborative network. Because of factors such as network transmission and delays in updating the task status on the other device, its confidence level is the lowest of the three.
[0110] Step S120, which determines the task list by combining the confidence level preset for each information source with the target task pre-stored locally from the device's own perspective, the first task, and the second task, includes steps B1-B2:
[0111] B1. Compare the preset confidence level of each information source with the preset confidence level threshold respectively;
[0112] The system has a preset confidence threshold of 0.75, and the confidence fusion engine compares the confidence level of each information source against it:
[0113] WA=0.92>0.75, passed;
[0114] WB=0.85>0.75, passed;
[0115] WC=0.78>0.75, passed.
[0116] B2. For information sources whose confidence exceeds the preset confidence threshold, the corresponding homework tasks are merged to obtain the task list. If the confidence of every information source exceeds the threshold, the target task pre-stored locally from the device's own perspective, the first task, and the second task are merged to obtain the task list.
[0117] Since the confidence levels of all three information sources exceed the preset threshold, the system determines that the task information provided by these three sources is high-confidence and valid, and should be included in the final execution list. The confidence fusion engine performs a union operation to merge the task items from the three information sources.
[0118] Tasks provided by the target information source:
[0119] TA={Subject: 'Chinese Language', Content: 'Complete page 20 of the Basic Training Workbook'}
[0120] Tasks provided by the first information source:
[0121] TB={Subject: 'Mathematics', Content: 'One mental arithmetic flashcard'}
[0122] The tasks provided by the second information source:
[0123] TC={Subject: 'English', Content: 'Read aloud the text for 10 minutes'}
[0124] Since the three tasks belong to different subjects and there are no duplicates, a union operation is performed. The union result is a simple combination of the three.
[0125] The union result Lfull = {Chinese homework: 'Complete page 20 of "Basic Training"', Math homework: 'One mental arithmetic flashcard', English homework: 'Read the text aloud for 10 minutes'}
[0126] This result is the final task list, which is output to the path planning engine as input for subsequent reinforcement learning decisions.
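Steps B1 and B2 can be sketched as a filter followed by a merge. This is a minimal illustration using the confidence levels, threshold, and task contents of this embodiment; the function name `build_task_list` and the dictionary layout are assumptions, and conflicts between sources reporting the same subject are deliberately not handled here.

```python
def build_task_list(sources, threshold=0.75):
    """Merge tasks from every source whose confidence exceeds the threshold."""
    merged = {}
    for source in sources:
        if source["confidence"] > threshold:  # step B1: threshold comparison
            for subject, content in source["tasks"].items():
                # Step B2: union. If a subject is already present, the first
                # entry is kept; detecting such clashes is a separate step.
                merged.setdefault(subject, content)
    return merged


sources = [
    {"name": "target", "confidence": 0.92,
     "tasks": {"Chinese": "Complete page 20 of the Basic Training Workbook"}},
    {"name": "first", "confidence": 0.85,
     "tasks": {"Mathematics": "One mental arithmetic flashcard"}},
    {"name": "second", "confidence": 0.78,
     "tasks": {"English": "Read aloud the text for 10 minutes"}},
]

task_list = build_task_list(sources)
```

With the preset threshold of 0.75 all three subjects are kept. Raising the threshold to a hypothetical 0.8 would drop the second source's English task, since 0.78 would no longer pass.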
[0127] This application introduces a preset confidence threshold comparison mechanism to avoid subjective judgment and ensure that only tasks from sources that meet the confidence standard are included in the execution list, preventing low-quality or erroneous information from interfering with learning planning. When all information sources reach the threshold, a union strategy is adopted to aggregate information from parents, teachers, and peers to the maximum extent, forming a complete and comprehensive task view, thus solving the technical problem, described in the background technology, of task omissions caused by limited information acquisition.
[0128] After step S120 determines the task list based on the confidence level preset for each information source, the target task pre-stored locally from the device's own perspective, the first task, and the second task, the method further includes steps C1-C4:
[0129] C1. Detect whether there is a conflict in the content of the target task, the first task, and the second task;
[0130] In this embodiment, following step S120 which determines the task list, the system further executes a conflict detection and handling process to ensure the accuracy of the list and the dynamic optimization of source weights. After generating the task list, Xiaoming's smart learning companion device performs consistency checks on each task in the list. The system finds that for the "Mathematics" subject, there are two conflicting task items:
[0131] Target information source: The content is "one mental arithmetic card";
[0132] First source: The content is "Complete page 5 of the workbook".
[0133] These two tasks cover the same subject but have significantly different descriptions. "Mental arithmetic flashcards" and "workbook" refer to different types of assignments and cannot be merged into the same task through semantic matching. At this point, the second information source provides no information about the math assignments and therefore does not participate in the conflict determination. The system determines that there is a conflict in the assignment content and triggers the conflict resolution process.
[0134] The system iterates through each item in the comprehensive list, categorizes them by subject, and compares their descriptions. For the mathematics subject, if two different descriptions are detected and cannot be determined to be the same task using the preset semantic similarity algorithm, they are marked as "conflict state," the conflict pair (target source and first source) is recorded, and the conflict resolution subprocess is initiated.
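The traversal described above can be sketched as grouping by subject and comparing descriptions pairwise. This is a minimal illustration: the source does not name its semantic similarity algorithm, so character-level matching from Python's `difflib` is swapped in as a stand-in, and the 0.8 similarity threshold and the function name `find_conflicts` are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def find_conflicts(items, similarity_threshold=0.8):
    """Return (subject, source_a, source_b) for descriptions that do not match."""
    by_subject = defaultdict(list)
    for item in items:
        by_subject[item["subject"]].append(item)

    conflicts = []
    for subject, group in by_subject.items():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                a, b = group[i], group[j]
                # Stand-in for semantic similarity: ratio of matching characters.
                ratio = SequenceMatcher(
                    None, a["content"].lower(), b["content"].lower()
                ).ratio()
                if ratio < similarity_threshold:
                    conflicts.append((subject, a["source"], b["source"]))
    return conflicts


items = [
    {"subject": "Mathematics", "source": "target",
     "content": "one mental arithmetic card"},
    {"subject": "Mathematics", "source": "first",
     "content": "Complete page 5 of the workbook"},
    {"subject": "English", "source": "second",
     "content": "Read the text aloud for 10 minutes"},
]

conflicts = find_conflicts(items)
```

Only the two Mathematics entries are flagged. The English entry has no counterpart, so the second information source does not take part in the conflict pair.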
[0135] C2. If a conflict is determined to exist, calculate, based on the conflicting homework content, the posterior probability that one side of the conflict is true, wherein the conflict comprises that side and the opposing side.
[0136] If a conflict is determined, the posterior probability of one of the conflicting scenarios being true is calculated based on the conflicting nature of the task content.
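One way to read the posterior calculation is a two-hypothesis Bayes update in which exactly one of the two conflicting reports is true. The sketch below is an assumption throughout: the source does not state its formula or the conditional probability table entries that lead to the posterior quoted in this embodiment, so the function `posterior_first_claim`, its parameters, and the example numbers are hypothetical and are not intended to reproduce that value.

```python
def posterior_first_claim(p_a_correct, p_b_correct, prior_a=0.5):
    """Posterior that source A's claim is true, given that A and B disagree.

    Assumes exactly one of the two claims is true and that the sources
    report independently.
    """
    # Hypothesis "A is right": A reported correctly and B reported wrongly.
    a_right = prior_a * p_a_correct * (1.0 - p_b_correct)
    # Hypothesis "B is right": A reported wrongly and B reported correctly.
    b_right = (1.0 - prior_a) * (1.0 - p_a_correct) * p_b_correct
    return a_right / (a_right + b_right)


# Hypothetical reliabilities, chosen only to illustrate the update.
posterior = posterior_first_claim(p_a_correct=0.9, p_b_correct=0.6)
```

Under these assumptions, the posterior rises as source A becomes more reliable relative to source B, and equals 0.5 when both sources are equally reliable and the prior is uniform.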
[0137] C3. If it is determined that the posterior probability exceeds a preset probability threshold, the task list is updated based on one of the conflict situations.
[0138] The system's preset conflict resolution probability threshold is 0.75. The calculated posterior probability is 0.770 > 0.75, so the target source's content, "one mental arithmetic card", is judged credible enough to exceed the threshold, and the system adopts the target source's report as the more reliable basis. The task list is updated accordingly: the conflicting item, the workbook task, is removed from the comprehensive list. The updated task list is then temporarily stored, awaiting user confirmation.
[0139] The system presents the updated list to users through multimodal interactive interfaces such as push notifications to the parent's app and voice broadcasts to the child's app. In this scenario, after reviewing the list, Ms. Wang, Xiaoming's mother, confirmed that the math homework was indeed "one mental arithmetic flashcard" (real feedback) by comparing it with messages in the class group or by contacting the teacher directly.
[0140] C4. In response to the user's confirmation instruction on the updated task list, update the confidence level corresponding to each information source, wherein the confidence level of the information source corresponding to the confirmed side of the conflict is increased to raise its influence, and the confidence level of the information source corresponding to the opposing side of the conflict is decreased to reduce its influence.
[0141] Ms. Wang, Xiaoming's mother, clicked "Confirm" on the parent-side app. After receiving this confirmation, the system triggered the confidence update process.
[0142] Since the target information source's report is correct, the system fine-tunes its conditional probability table using the expectation-maximization algorithm or gradient descent method, thereby increasing its confidence level and enhancing the source's influence in future decisions.
[0143] The first information source's report is wrong. The system adjusts its conditional probability table (CPT) and lowers its confidence level to weaken the source's influence on future decisions.
[0144] In this application, because the process of updating the confidence level for each information source is non-linear and adaptive, it ensures that the weights of each information source can be dynamically adjusted based on historical performance, thereby continuously improving the accuracy of the overall fusion decision. The updated confidence levels are re-encrypted and stored in the distributed ledger for use in subsequent fusion tasks.
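A non-linear confidence update of the kind described above can be sketched as a fixed step in log-odds space. This is only an illustration: the source names expectation-maximization or gradient descent on the conditional probability table, and the simpler rule below is swapped in for brevity. The function name `update_confidence` and the step size of 0.2 are assumptions.

```python
import math


def update_confidence(confidence, was_correct, step=0.2):
    """Move a confidence value up or down by a fixed step in log-odds space."""
    logit = math.log(confidence / (1.0 - confidence))
    logit += step if was_correct else -step
    # The sigmoid maps the result back into (0, 1), so the change is
    # non-linear: values near 0 or 1 move less than values near 0.5.
    return 1.0 / (1.0 + math.exp(-logit))


new_target = update_confidence(0.92, was_correct=True)   # confirmed report
new_first = update_confidence(0.85, was_correct=False)   # contradicted report
```

The updated values stay strictly between 0 and 1, so a source that is confirmed repeatedly approaches full confidence without ever reaching it.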
[0145] S130. Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain the specific learning path corresponding to the task list.
[0146] The system first accesses the user's historical behavior database stored locally to extract Xiaoming's user learning preference characteristics recorded locally. These characteristics are generated by combining long-term recorded user selection behaviors, such as the order of subjects actively chosen each time learning begins in the past week, the user's tendency adjustment patterns in the negotiation history, and the preferences set by the parents on the parent's end, such as "the child is more interested in math".
[0147] A user's learning efficiency is not uniform but fluctuates. For example, Xiaoming's efficiency is lower when performing calculations immediately after a meal, and other fluctuation patterns of this kind exist. Based on locally recorded past learning history or user settings, a predicted pattern of the user's learning efficiency distribution can be generated. By combining the user's subjective learning preferences with the objective pattern of learning efficiency distribution, a specific learning path can be planned based on the list of tasks to be completed.
[0148] Step S130, which plans the task list based on locally recorded user learning preferences and the user's learning efficiency distribution pattern to obtain the specific learning path corresponding to the task list, includes steps S1301-S1304:
[0149] S1301. Based on the user's learning preferences recorded locally, prioritize each task in the task list to obtain a preliminary sorting table of tasks to be completed.
[0150] Based on the historical database, the preference model outputs the following preference weight vector: Chinese preference weight Plang=0.6, Math preference weight Pmath=0.8, English preference weight Peng=0.3. The weights reflect that Xiaoming's preference is highest for Math and lowest for English, indicating a slight aversion to "English repetition". The system initially sorts the assignments by preference weight from highest to lowest, resulting in a preliminary ranking table. This preliminary ranking table only considers the user's static preferences.
[0151] Math (one mental arithmetic flashcard) — Preference weight 0.8
[0152] Chinese Language (Complete page 20 of the "Basic Training" workbook) — Preference weight 0.6
[0153] English (read the text aloud for 10 minutes) — Preference weight 0.3.
[0154] S1302. Based on the distribution pattern of user learning efficiency, estimate the user's current learning efficiency to obtain the estimated learning efficiency.
[0155] After dinner, Xiaoming suggested, "I want to do English first." The negotiation module detected that this suggestion conflicted with the model's "optimal" recommendation. It activated the conflict resolution mechanism, responding, "Finishing English first is great! However, we just finished eating, and our brains haven't rested enough. How about we spend 15 minutes on Chinese first to warm up, and then go all out on English? That might be faster!" It provided a reason for "warming up" and an incentive for "faster" progress.
[0156] Since the user's learning efficiency in math is low after meals and the user's preference for English is the lowest, the user's proposal to complete English first after the meal conflicts with the model's optimal suggestion. Therefore, a negotiation process with the user is initiated. Based on the distribution pattern of user learning efficiency, the user's current learning efficiency is estimated to obtain the estimated learning efficiency.
[0157] The system invokes the user learning efficiency prediction module, which maintains a learning efficiency curve model trained based on historical data. This model takes time, subject, and user physiological state (such as current heart rate and fatigue level) as input, obtained through a multimodal perception module, and outputs the user's estimated learning efficiency for each subject at the current moment, such as the number of knowledge points completed per minute or task progress. In this embodiment, the current time is 7:10 PM, and Xiaoming has just finished eating. Historical data shows that within 30 minutes after eating, Xiaoming's efficiency in mathematical calculation tasks is low because digestion consumes cognitive resources; the estimated efficiency coefficient for mathematics is 0.7. Chinese reading tasks are less affected by post-meal state, with an estimated efficiency coefficient of 0.9. English reading tasks require oral output; post-meal state has little impact on pronunciation, but based on historical data, English efficiency is moderate around 7 PM, with an estimated efficiency coefficient of 0.85. The system also considers the user's current state as reflected by the multimodal perception module: Xiaoming's facial expressions and posture indicate that he is relaxed and shows no signs of fatigue; therefore, the above estimated efficiency coefficients do not need to be further reduced.
[0158] S1303. Based on the estimated learning efficiency, perform a second sort on the preliminary sorting table to obtain a secondary sorting table;
[0159] The system comprehensively considers the preference weights and estimated efficiency in the initial ranking table and designs a comprehensive scoring function to adjust the order. This embodiment uses a weighted product method to calculate the dynamic execution suitability of each task. The tasks are then re-ranked from highest to lowest to obtain a secondary ranking table.
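The two-stage ranking can be sketched with the preference weights and estimated efficiency coefficients of this embodiment. The source names a weighted product method but does not give the exponents, so `pref_exp` and `eff_exp` below are assumptions, as is the function name `rank_tasks`.

```python
def rank_tasks(preference, efficiency, pref_exp=1.0, eff_exp=1.0):
    """Return (preliminary order, secondary order, suitability scores)."""
    # Stage 1: static preference only, highest weight first.
    preliminary = sorted(preference, key=preference.get, reverse=True)
    # Stage 2: weighted product of preference and estimated efficiency.
    scores = {
        subject: (preference[subject] ** pref_exp) * (efficiency[subject] ** eff_exp)
        for subject in preliminary
    }
    secondary = sorted(preliminary, key=scores.get, reverse=True)
    return preliminary, secondary, scores


preference = {"Chinese": 0.6, "Math": 0.8, "English": 0.3}
efficiency = {"Chinese": 0.9, "Math": 0.7, "English": 0.85}

preliminary, equal_weight_order, _ = rank_tasks(preference, efficiency)
_, efficiency_heavy_order, _ = rank_tasks(preference, efficiency, eff_exp=6.0)
```

The resulting order depends on the chosen exponents. With both exponents equal to 1 the scores are about 0.56 (Math), 0.54 (Chinese), and 0.255 (English), which leaves Math first. With a hypothetical efficiency exponent of 6, the order becomes Chinese, English, Math.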
[0160] S1304. Based on the secondary sorting table, determine the specific learning path corresponding to the task list.
[0161] The system outputs a secondary sorting table as the initial recommended learning path. Specifically, it suggests that Xiaoming complete his homework in the following order: first, do basic Chinese language exercises; then, do English reading practice; and finally, do mental math flashcards.
[0162] This path is encapsulated as specific learning path data and passed to the next human-computer negotiation module. Simultaneously, the system presents the path information to Xiaoming and his guardian in visual or audio form.
[0163] By introducing a two-stage ranking system that combines static factors of user learning preferences with dynamic factors of learning efficiency prediction, the learning path respects user habits and adapts to the current physiological state, avoiding a rigid, fixed sequence and improving the executability of the path and the user experience.
[0164] Both preferences and efficiency are derived from modeling and analysis of historical data. This data-driven approach demonstrates the system's learning capabilities and provides a solid initial strategic foundation for subsequent, more advanced reinforcement learning planning. The transparent two-stage ranking process helps users, especially guardians, understand why the system makes these recommendations, thus enhancing human-machine trust.
[0165] The above is the target information source's recommendation for the user's learning. Whether to actually follow it is up to the user, who retains the decision-making power. After the specific learning path is recommended to the user, the user's feedback is obtained.
[0166] In some embodiments, the user accepts a specific learning path recommended by the target information source. Xiaoming accepts the agent's proposal. This successful negotiation interaction—conflict-solution-acceptance—is recorded as high-quality training data for future fine-tuning of the reinforcement learning model, making its suggestions and negotiation language more in line with Xiaoming's habits.
[0167] In some embodiments, the user does not want to accept the specific learning path recommended by the target information source, that is, there is a conflict in the negotiation. The conflict resolution mechanism is as follows.
[0168] Step S130, which plans the task list based on locally recorded user learning preferences and the user's learning efficiency distribution pattern to obtain the specific learning path corresponding to the task list, includes steps D1-D5:
[0169] D1. In response to the user's negative instruction on the specific learning path, obtain the self-planned path input by the user;
[0170] The user rejects the planned learning path and instead proposes a learning path of their own.
[0171] D2. If a conflict is detected between the self-planned path and the specific learning path, then based on the user's learning efficiency distribution pattern, it is recommended that the user pause both the self-planned path and the specific learning path.
[0172] The system detects whether there is a conflict between the self-planned path and the specific learning path. If so, it suggests that the user pause both the self-planned path and the specific learning path based on the user's learning efficiency distribution pattern.
[0173] D3. Based on the distribution pattern of user learning efficiency, push learning suggestions to the user that are conducive to learning the self-planned path, wherein the learning suggestions are completed before the user begins to complete the self-planned path;
[0174] For example, the target information source recommends that the user study mathematics first, while the user suggests studying English first. Studying Chinese after meals is less stressful and conducive to subsequent learning tasks. Therefore, based on the distribution pattern of the user's learning efficiency, learning suggestions that are conducive to the user's self-planned learning path are pushed to the user.
[0175] The path planning engine receives a state vector containing the three tasks and outputs an action probability distribution, such as Chinese 0.6, English 0.3, and Math 0.1. The system therefore recommends Chinese. When Xiaoming suggests doing English first, the calculation shows that the estimated value decreases by 0.2. Since 0.2 exceeds the negotiation threshold of 0.15, the agent triggers negotiation. Its built-in language model receives a prompt such as: "The user wants to do English first, but this will cause a decrease of 0.2 in value because historical data shows he needs a longer startup time before doing English. Generate a friendly message encouraging him to do Chinese first."
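The negotiation trigger can be sketched as two small checks: pick the most probable action, then compare the value lost by following the user's choice against the negotiation threshold. The action probabilities and the 0.15 threshold come from this embodiment. The value estimates 0.9 and 0.7 are hypothetical, since the source only states the resulting drop of 0.2, and the function names are assumptions.

```python
def recommend(action_probs):
    """Return the action with the highest probability."""
    return max(action_probs, key=action_probs.get)


def needs_negotiation(value_recommended, value_user_choice, threshold=0.15):
    """Return (trigger flag, value drop) for the user's proposed first task."""
    drop = value_recommended - value_user_choice
    return drop > threshold, drop


action_probs = {"Chinese": 0.6, "English": 0.3, "Math": 0.1}
recommended = recommend(action_probs)

# Hypothetical value estimates for "Chinese first" and "English first".
trigger, drop = needs_negotiation(value_recommended=0.9, value_user_choice=0.7)
```

In this sketch a drop that does not exceed the threshold would let the user's own order stand without negotiation, which keeps the agent from interrupting over small differences.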
[0176] D4. In response to the user's instruction to accept the learning suggestion, update the specific learning path based on the learning suggestion to obtain the updated specific learning path;
[0177] D5. Save the negotiation record with the user for future adjustments and learning.
[0178] In some embodiments, this negotiation interaction is also recorded as training data for future fine-tuning of the reinforcement learning model, so that its suggestions and negotiation language are more in line with Xiaoming's habits.
[0179] S140. Push the specific learning path to the user and start monitoring the user's learning status based on the internally preset monitoring module;
[0180] This application embeds a monitoring module to monitor user learning and provide personalized planning throughout the user's learning process.
[0181] Xiaoming's smart learning companion device has generated a recommended learning path: first, do math mental arithmetic flashcards; then, do basic Chinese language exercises; and finally, do English reading aloud. The system pushes this path to Xiaoming via voice and on-screen animation. Xiaoming clicks "Start Learning" to officially begin the math homework session. At the same time, the system activates the device's built-in multimodal perception module, beginning to monitor Xiaoming's behavior and emotions without any disturbance throughout the process.
[0182] The monitoring module includes a visual monitoring module and an acoustic monitoring module. In step S140, the specific learning path is pushed to the user and the user's learning progress is monitored based on the internally preset monitoring module, including steps S1401-S1405:
[0183] S1401. Push the specific learning path to the user;
[0184] The system's accompanying agent presents the learning path to Xiaoming in a friendly and approachable manner. The specific implementation is as follows: (1) Visual presentation: The device screen displays a cartoon task list, with math tasks listed first and marked with a "Start" button, followed by Chinese and English tasks, each with an estimated duration indicated by a small icon. (2) Voice broadcast: The agent says in an encouraging tone, "Xiaoming, let's first challenge the math mental arithmetic cards, then do basic Chinese training, and finally complete the English reading practice. Good luck!" Xiaoming clicks the "Start" button to confirm the interaction. The system records the completion of the path push and enters the learning monitoring phase.
[0185] S1402. The internal visual monitoring module captures the user's facial expressions and postures during learning, and performs image recognition based on the captured images to obtain image features;
[0186] During the math homework assignment, the device's front-facing camera continuously captures Xiaoming's facial images and upper body posture at a rate of 5 frames per second. A lightweight visual model running on the device (based on the MobileNet-V3 architecture) processes each frame in real time, extracting two types of keypoint features: facial keypoints and posture keypoints. The coordinates of 68 facial keypoints are used to represent facial expression changes, such as eyebrow angles and mouth corner curvature. The coordinates of 17 body keypoints are used to determine sitting posture, such as whether the person is leaning on the table or tilting their head. Each keypoint has x and y coordinates, giving (68+17)×2 = 170 dimensions per frame. Every 16 frames constitute one visual feature sequence, so the sequence has dimension 16×170. This sequence is temporarily stored in a memory buffer, awaiting alignment and fusion with acoustic features.
[0187] S1403. The user's voice during learning is acquired based on the internal acoustic monitoring module, and voice recognition is performed based on the acquired voice to obtain voice features;
[0188] The device's built-in microphone continuously collects ambient sound at a sampling rate of 16kHz. A lightweight audio model running on the device extracts acoustic features in 1-second windows. Every second, 40-dimensional Mel-frequency cepstral coefficients are extracted as the acoustic feature vector for that moment. This process is synchronized with the vision module. After feature extraction, the visual and acoustic feature sequences are fed into a multimodal fusion model.
[0189] S1404. Perform multimodal fusion of the image features and the sound features to obtain a state evaluation result of the user's current learning state;
[0190] A multimodal emotion recognition model based on the Transformer architecture is deployed on the system call side. The model consists of two main parts: a feature encoding part and a Transformer fusion layer.
[0191] In the feature encoding part, the visual feature sequence and the acoustic feature sequence are mapped to the same embedding dimension through a linear projection layer to obtain the visual embedding sequence and the acoustic embedding sequence.
[0192] The Transformer fusion layer concatenates the two embedding sequences into a single sequence of length 32 and adds positional encoding. This sequence is then fed into a four-layer Transformer encoder with multi-head self-attention. Within the encoder, attention between positions of the same modality captures temporal correlations, such as gradual changes in facial expression, while attention across modalities captures the interaction between visual and acoustic features, such as frowning and sighing occurring together.
[0193] Take the vector corresponding to the last visual frame in the Transformer output sequence, pass it through a two-layer MLP classification head, and output the probability distribution of the user's current state in multiple dimensions.
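The input preparation for the fusion model can be sketched as follows. This is a simplified illustration and not the model of this application: the embedding size of 8, the random weights, the sinusoidal form of the positional encoding, and the assumption that each modality contributes 16 steps are all hypothetical, and the Transformer encoder and MLP classification head are omitted.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

EMBED_DIM = 8      # assumed embedding size; the description does not state one
VISUAL_DIM = 170   # per-frame visual feature size (from the description)
ACOUSTIC_DIM = 40  # per-window MFCC size (from the description)
STEPS = 16         # assumed steps per modality, giving 16 + 16 = 32 tokens


def linear_projection(seq: Matrix, weights: Matrix) -> Matrix:
    """Map each row of seq (T x D_in) to D_out using weights (D_in x D_out)."""
    columns = list(zip(*weights))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns] for row in seq]


def positional_encoding(length: int, dim: int) -> Matrix:
    """Sinusoidal positional encoding: sine on even indices, cosine on odd."""
    table = []
    for pos in range(length):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def fuse_inputs(visual: Matrix, acoustic: Matrix, w_v: Matrix, w_a: Matrix) -> Matrix:
    """Project both modalities, concatenate along time, add positions."""
    tokens = linear_projection(visual, w_v) + linear_projection(acoustic, w_a)
    positions = positional_encoding(len(tokens), EMBED_DIM)
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, positions)]


rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


tokens = fuse_inputs(
    rand_matrix(STEPS, VISUAL_DIM),
    rand_matrix(STEPS, ACOUSTIC_DIM),
    rand_matrix(VISUAL_DIM, EMBED_DIM),
    rand_matrix(ACOUSTIC_DIM, EMBED_DIM),
)
print(len(tokens), len(tokens[0]))  # 32 8
```

Under these assumptions, the vector for the last visual frame sits at index 15 of the concatenated sequence.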
[0194] S1405. If there is an abnormal state in the state evaluation result and it lasts for a preset time, then the current learning situation is judged to be abnormal.
[0195] Abnormal states are defined as follows: when the probability of a certain state exceeds a threshold of 0.8, that state is considered activated. If the same abnormal state is activated continuously for more than a preset threshold, such as frustration lasting for 30 seconds, confusion lasting for 60 seconds, or distraction lasting for 90 seconds, then the learning situation is considered abnormal.
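The activation and duration rule above can be sketched as a small state tracker. The class and method names are hypothetical; the 0.8 threshold and the 30/60/90-second limits are taken from the example values in the text.

```python
from typing import Dict, Optional, Tuple

ACTIVATION_THRESHOLD = 0.8
# Continuous activation time, in seconds, that must be exceeded per state.
DURATION_LIMITS = {"frustration": 30.0, "confusion": 60.0, "distraction": 90.0}


class AbnormalStateDetector:
    """Tracks how long each abnormal state has been continuously activated."""

    def __init__(self) -> None:
        self._active_since: Dict[str, float] = {}

    def update(self, timestamp: float, probs: Dict[str, float]) -> Optional[Tuple[str, float]]:
        """Feed one model output; return (state, duration) once a limit is exceeded."""
        for state, limit in DURATION_LIMITS.items():
            if probs.get(state, 0.0) > ACTIVATION_THRESHOLD:
                start = self._active_since.setdefault(state, timestamp)
                duration = timestamp - start
                if duration > limit:
                    return state, duration
            else:
                # Activation was interrupted, so the timer starts over.
                self._active_since.pop(state, None)
        return None


detector = AbnormalStateDetector()
result = None
for second in range(40):
    result = detector.update(float(second), {"frustration": 0.88})
    if result is not None:
        break
print(result)  # ('frustration', 31.0)
```

With one model output per second, frustration held at 0.88 is reported at the 31-second mark, the first sample at which the 30-second limit is exceeded.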
[0196] In this scenario, Xiaoming gets stuck on a multiplication problem with carrying while doing mental math. He starts frowning, sighing, and slumping onto the table. The multimodal fusion model provides continuous output. Through precise perception via multimodal fusion, it accurately captures the user's true emotional state, avoiding misjudgments from single-modal approaches. The lightweight on-device model ensures low-latency processing, allowing abnormal states to be detected promptly. Continuous monitoring provides real-time data input for subsequent dynamic adjustments, enabling the system to flexibly optimize the learning plan based on the user's state.
[0197] S150. If the learning situation is determined to be abnormal based on the monitored learning situation, then the intervention measures corresponding to the abnormality level are selected to intervene in the user until it is determined that the user has completed the specific learning path.
[0198] The system determines that the current learning situation is abnormal and triggers an abnormal event. This event is encapsulated as a record containing a timestamp, anomaly type of frustration, intensity of 0.88, and duration of 31 seconds, and is passed to the accompanying adjustment module to initiate the corresponding intervention strategy.
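One possible shape for the event record is sketched below. The field names and the JSON serialization are assumptions made for illustration; the text only specifies that the record carries a timestamp, anomaly type, intensity, and duration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AbnormalEvent:
    timestamp: float         # seconds since the epoch at which the event fired
    anomaly_type: str        # e.g. "frustration", "confusion", "distraction"
    intensity: float         # model probability of the activated state
    duration_seconds: float  # how long the state stayed activated


# Values from the scenario; the timestamp is a placeholder.
event = AbnormalEvent(
    timestamp=0.0,
    anomaly_type="frustration",
    intensity=0.88,
    duration_seconds=31.0,
)
message = json.dumps(asdict(event), sort_keys=True)
print(message)
```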
[0199] This application provides a method, apparatus, and device for planning personalized learning paths through multi-agent collaboration. Existing technologies offer little personalization and do not produce differentiated learning paths tailored to each user's circumstances; this application addresses that problem through multi-agent collaboration. The method is applied to a target information source within a system that also includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published assignments. The second information source and the target information source are both intelligent learning companion devices on the same consortium blockchain. The method includes: in response to a planning instruction for the planned path, obtaining the published first task from the first information source and obtaining, from the second information source, the second task from the second information source's own perspective. Combining the preset confidence level of each information source with the locally pre-stored target task, the first task, and the second task, a task list is determined. Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain a specific learning path. This specific learning path is then pushed to the user, and the user's learning progress is monitored using an internally preset monitoring module. If the monitored learning progress indicates an abnormality, intervention measures corresponding to the abnormality level are selected to intervene until the user completes the specific learning path. In this application, task assignments are obtained from multiple sources, each from its own perspective, which eliminates reliance on traditional OCR technology and allows learning tasks to be robustly aggregated from complex information. The task list generated by combining the confidence levels of each source more closely reflects real-world learning situations. Personalized planning based on user preferences and learning patterns moves the device from passive response to proactive learning support. The subsequent learning process is also monitored and intervened in, giving the system a fuller understanding of the user's internal state, so that the resulting personalized learning path planning closely fits real application scenarios.
[0200] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0201] In one embodiment, as shown in Figure 2, a multi-agent collaborative personalized learning path planning device is provided. The multi-agent collaborative personalized learning path planning method is applied to a target information source in a multi-agent collaborative personalized learning path planning system, and the device is deployed at the target information source. The system further includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published job tasks. The second information source and the target information source are both intelligent learning companion devices on the same consortium blockchain. This multi-agent collaborative personalized learning path planning device corresponds one-to-one with the multi-agent collaborative personalized learning path planning method described in the above embodiments. As shown in Figure 2, the device includes an acquisition module 101, a task list determination module 102, a path planning module 103, a learning monitoring module 104, and an intervention module 105. Detailed descriptions of each functional module are as follows:
[0202] The acquisition module 101 is used to, in response to the planning instructions for the planned path, acquire the first job task published from the first information source and the second job task from the second information source from the perspective of the second information source;
[0203] The task list determination module 102 is used to determine the task list based on the corresponding confidence level preset for each information source, the locally pre-stored target task, the first task, and the second task from its own perspective.
[0204] The path planning module 103 is used to plan the task list based on locally recorded user learning preferences and user learning efficiency distribution patterns to obtain the specific learning path corresponding to the task list.
[0205] The learning monitoring module 104 is used to push the specific learning path to the user and start monitoring the user's learning status based on the internally preset monitoring module;
[0206] The intervention module 105 is used to select the intervention measures corresponding to the abnormality level to intervene in the user if the learning situation is determined to be abnormal based on the monitored learning situation, until it is determined that the user has completed the specific learning path.
[0207] In one possible implementation of this application, when acquiring, in response to the planning instruction for the planned path, the first job task published from the first information source and the second job task from the second information source's perspective, the acquisition module 101 is specifically used for:
[0208] In response to the planning instructions for the planned path, the system autonomously navigates the graphical user interface from the first information source based on a preset visual language model and extracts the first task during navigation;
[0209] Generate a zero-knowledge proof regarding the user-authorized query, wherein the query permission content is the homework task in other smart learning companion devices on the same consortium blockchain;
[0210] The zero-knowledge proof is sent to the second information source for verification by the second information source;
[0211] In response to the message that the second source has been verified, the job task is queried from the second source to obtain the second job task from the perspective of the second source. During the query, the user's user information is not disclosed to the second source being queried.
[0212] In one possible implementation of this application, after determining the task list based on the pre-stored target task, the first task, and the second task from its own perspective, by combining the corresponding confidence level preset for each information source, the multi-agent collaborative personalized learning path planning device further includes a confidence level update module. The confidence level update module is specifically used for:
[0213] Detect whether there is a conflict in the content of the target task, the first task, and the second task;
[0214] If a conflict is determined to exist, then based on the conflict situation of the task content, calculate the posterior probability when one of the conflict situations is true, wherein the conflict situation includes not only the one party, but also the opposing party;
[0215] If it is determined that the posterior probability exceeds a preset probability threshold, the task list is updated based on one of the conflict scenarios.
[0216] In response to the user's confirmation instruction on the updated task list, the confidence level of each information source is updated, wherein the confidence level of the information source corresponding to one of the conflicting situations is increased to enhance its influence, and the confidence level of the information source corresponding to the opposing side of the conflicting situation is decreased to reduce its influence.
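The conflict-resolution steps above can be illustrated with a simple Bayesian sketch. The probability model is an assumption and is not specified in this application: each source's confidence is read as the probability that it reports the true task, the two conflicting versions get equal prior weight, and the source names, the 0.9 threshold, and the 0.05 update step are hypothetical.

```python
from typing import Dict, List


def posterior_of_a(confidence: Dict[str, float],
                   support_a: List[str],
                   support_b: List[str]) -> float:
    """Posterior probability that version A is true, under a uniform prior."""
    like_a = 1.0  # likelihood of the observed reports if A is true
    like_b = 1.0  # likelihood of the observed reports if B is true
    for source in support_a:
        like_a *= confidence[source]
        like_b *= 1.0 - confidence[source]
    for source in support_b:
        like_a *= 1.0 - confidence[source]
        like_b *= confidence[source]
    total = like_a + like_b
    return 0.5 if total == 0.0 else like_a / total


def update_confidence(confidence: Dict[str, float],
                      winners: List[str],
                      losers: List[str],
                      step: float = 0.05) -> Dict[str, float]:
    """Raise confidence of sources that were right and lower the others."""
    updated = dict(confidence)
    for source in winners:
        updated[source] = min(1.0, updated[source] + step)
    for source in losers:
        updated[source] = max(0.0, updated[source] - step)
    return updated


confidence = {"teacher_platform": 0.9, "local": 0.7, "peer_device": 0.6}
side_a = ["teacher_platform", "local"]  # sources reporting version A
side_b = ["peer_device"]                # sources reporting version B

p_a = posterior_of_a(confidence, side_a, side_b)
PROBABILITY_THRESHOLD = 0.9  # hypothetical preset threshold
if p_a > PROBABILITY_THRESHOLD:
    # In the described flow this update follows the user's confirmation.
    confidence = update_confidence(confidence, side_a, side_b)
print(round(p_a, 3))
```

Here the likelihoods are 0.9 × 0.7 × 0.4 = 0.252 for version A and 0.1 × 0.3 × 0.6 = 0.018 for version B, so the posterior of A is 0.252 / 0.27, which exceeds the assumed threshold.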
[0217] In one possible implementation of this application, when planning the task list based on locally recorded user learning preferences and user learning efficiency distribution patterns to obtain the specific learning path corresponding to the task list, the path planning module 103 is specifically used for:
[0218] Based on the user's learning preferences recorded locally, the tasks in the task list are prioritized to obtain a preliminary sorting table of tasks to be completed.
[0219] Based on the distribution pattern of user learning efficiency, the current learning efficiency of the user is estimated to obtain the estimated learning efficiency.
[0220] Based on the estimated learning efficiency, the preliminary sorting table is sorted a second time to obtain a secondary sorting table;
[0221] Based on the secondary sorting table, the specific learning path corresponding to the task list is determined.
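The two-stage sorting can be sketched as below. The re-sorting rule is an assumption made for illustration (harder tasks first when the estimated efficiency is high, easier tasks first otherwise); the difficulty scale, preference weights, hourly efficiency table, and the 0.7 cut-off are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Task:
    name: str
    subject: str
    difficulty: int  # hypothetical scale: 1 (easy) to 5 (hard)


def preliminary_sort(tasks: List[Task], preference: Dict[str, float]) -> List[Task]:
    """First pass: order tasks by the user's recorded subject preference."""
    return sorted(tasks, key=lambda t: preference.get(t.subject, 0.0), reverse=True)


def estimate_efficiency(hourly_efficiency: Dict[int, float], hour: int) -> float:
    """Look up the expected efficiency for the current hour (default 0.5)."""
    return hourly_efficiency.get(hour, 0.5)


def secondary_sort(preliminary: List[Task], efficiency: float, high: float = 0.7) -> List[Task]:
    """Second pass: the sort is stable, so ties keep the preference order."""
    hard_first = efficiency >= high
    return sorted(preliminary, key=lambda t: t.difficulty, reverse=hard_first)


tasks = [
    Task("English reading practice", "english", 2),
    Task("Math mental arithmetic cards", "math", 4),
    Task("Basic Chinese training", "chinese", 3),
]
preference = {"math": 0.9, "chinese": 0.6, "english": 0.4}
hourly_efficiency = {19: 0.8, 21: 0.4}

preliminary = preliminary_sort(tasks, preference)
evening_path = secondary_sort(preliminary, estimate_efficiency(hourly_efficiency, 19))
late_path = secondary_sort(preliminary, estimate_efficiency(hourly_efficiency, 21))
print([t.subject for t in evening_path])  # ['math', 'chinese', 'english']
print([t.subject for t in late_path])     # ['english', 'chinese', 'math']
```

Because Python's `sorted` is stable, tasks with equal difficulty keep the order produced by the preference pass.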
[0222] In one possible implementation of this application, after planning the task list based on locally recorded user learning preferences and user learning efficiency distribution patterns to obtain the specific learning path corresponding to the task list, the multi-agent collaborative personalized learning path planning device further includes a negotiation module, which is specifically used for:
[0223] In response to the user's negative instruction regarding the specific learning path, obtain the user's self-planned path.
[0224] If a conflict is detected between the self-planned path and the specific learning path, the user is advised to pause both the self-planned path and the specific learning path based on the user's learning efficiency distribution pattern.
[0225] Based on the distribution pattern of user learning efficiency, learning suggestions that are conducive to learning the self-planned path are pushed to the user, wherein the learning suggestions are completed before the user begins to complete the self-planned path;
[0226] In response to the user's instruction to accept the learning suggestion, the specific learning path is updated based on the learning suggestion to obtain the updated specific learning path;
[0227] Save the record of this negotiation with the user for future adjustments and learning.
[0228] In one possible implementation of this application, the monitoring module includes a visual monitoring module and an acoustic monitoring module. When pushing the specific learning path to the user and starting to monitor the user's learning progress based on the internally preset monitoring module, the learning monitoring module 104 is specifically used for:
[0229] The specific learning path will be pushed to the user;
[0230] The system uses an internal visual monitoring module to capture the user's facial expressions and postures while learning, and performs image recognition based on the captured images to obtain image features.
[0231] The system acquires the user's voice during learning based on the internal acoustic monitoring module, and performs voice recognition based on the acquired voice to obtain voice features.
[0232] The image features and the sound features are fused in a multimodal manner to obtain a state evaluation result of the user's current learning state;
[0233] If an abnormal state exists in the state evaluation result and continues for a preset time, then the current learning situation is judged to be abnormal.
[0234] In one possible implementation of this application, before the multi-agent collaborative personalized learning path planning device obtains the first job task published from the first information source and the second job task from the second information source's perspective in response to the planning instruction for the planning path, the multi-agent collaborative personalized learning path planning device further includes an initialization module, which is specifically used for:
[0235] Upon first launch, request the user to perform initial settings.
[0236] Obtain the user's initialization settings, wherein the settings include at least one of the following: the comparison granularity of the comparison task, the size of the comparison range of the second information source to be compared, and whether to monitor the learning companion.
[0237] In response to the user's signature confirmation instruction for the setting information, the user's signature and the setting data corresponding to the setting information are broadcast to the consortium blockchain.
[0238] In one possible implementation of this application, when determining the task list based on the locally pre-stored target task, the first task, and the second task from its own perspective, in combination with the corresponding confidence level preset for each information source, the task list determination module 102 is specifically used for:
[0239] The preset confidence level for each information source is compared with the preset confidence level threshold.
[0240] For information sources whose confidence exceeds the preset confidence threshold, the corresponding tasks are combined to obtain the task list. If the confidence of every information source exceeds the threshold, the locally pre-stored target task, the first task, and the second task from its own perspective are combined to obtain the task list.
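The threshold filtering and merging described above can be sketched as follows. The source names, task strings, and the 0.5 threshold are hypothetical, and duplicate tasks are merged by exact string match only for simplicity.

```python
from typing import Dict, List


def build_task_list(tasks_by_source: Dict[str, List[str]],
                    confidence: Dict[str, float],
                    threshold: float) -> List[str]:
    """Merge the tasks of every source whose confidence exceeds the threshold."""
    merged: List[str] = []
    for source, tasks in tasks_by_source.items():
        if confidence.get(source, 0.0) > threshold:
            for task in tasks:
                if task not in merged:  # keep first occurrence, drop duplicates
                    merged.append(task)
    return merged


tasks_by_source = {
    "local": ["math mental arithmetic cards"],
    "teacher_platform": ["math mental arithmetic cards", "basic Chinese training"],
    "peer_device": ["English reading practice"],
}
confidence = {"local": 0.7, "teacher_platform": 0.9, "peer_device": 0.4}

task_list = build_task_list(tasks_by_source, confidence, threshold=0.5)
print(task_list)  # ['math mental arithmetic cards', 'basic Chinese training']
```

With every confidence above the threshold, the same function returns the union of the local target task, the first task, and the second task.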
[0241] This application provides a method, apparatus, and device for planning personalized learning paths through multi-agent collaboration. Existing technologies offer little personalization and do not produce differentiated learning paths tailored to each user's circumstances; this application addresses that problem through multi-agent collaboration. The method is applied to a target information source within a system that also includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published assignments. The second information source and the target information source are both intelligent learning companion devices on the same consortium blockchain. The method includes: in response to a planning instruction for the planned path, obtaining the published first task from the first information source and obtaining, from the second information source, the second task from the second information source's own perspective. Combining the preset confidence level of each information source with the locally pre-stored target task, the first task, and the second task, a task list is determined. Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain a specific learning path. This specific learning path is then pushed to the user, and the user's learning progress is monitored using an internally preset monitoring module. If the monitored learning progress indicates an abnormality, intervention measures corresponding to the abnormality level are selected to intervene until the user completes the specific learning path. In this application, task assignments are obtained from multiple sources, each from its own perspective, which eliminates reliance on traditional OCR technology and allows learning tasks to be robustly aggregated from complex information. The task list generated by combining the confidence levels of each source more closely reflects real-world learning situations. Personalized planning based on user preferences and learning patterns moves the device from passive response to proactive learning support. The subsequent learning process is also monitored and intervened in, giving the system a fuller understanding of the user's internal state, so that the resulting personalized learning path planning closely fits real application scenarios.
[0242] Specific limitations regarding the device for personalized learning path planning in multi-agent collaboration can be found in the limitations of the method for personalized learning path planning in multi-agent collaboration mentioned above, and will not be repeated here. Each module in the aforementioned personalized learning path planning device for multi-agent collaboration can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the corresponding operations of each module.
[0243] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 3. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage media stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with external clients via a network connection. When executed by the processor, the computer program implements the server-side functions or steps of a multi-agent collaborative personalized learning path planning method.
[0244] In one embodiment, a computer device is provided, which may be a client, and its internal structure may be as shown in Figure 4. The computer device includes a processor, memory, network interface, display screen, and input devices connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores an operating system and computer programs. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with an external server via a network connection. When executed by the processor, the computer program implements the client-side functions or steps of a multi-agent collaborative personalized learning path planning method.
[0245] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the following steps:
[0246] In response to the planning instructions for the planned path, the system obtains the first job task published from the first information source and the second job task from the second information source from the perspective of the second information source.
[0247] Based on the corresponding confidence level preset for each information source, and the target task, the first task, and the second task stored locally from its own perspective, a task list is determined.
[0248] Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain the specific learning path corresponding to the task list.
[0249] The specific learning path is pushed to the user and the user's learning progress is monitored based on the internally preset monitoring module.
[0250] If the learning situation is determined to be abnormal based on the monitored learning situation, then the intervention measures corresponding to the abnormality level are selected to intervene in the user until it is determined that the user has completed the specific learning path.
[0251] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:
[0252] In response to the planning instructions for the planned path, the system obtains the first job task published from the first information source and the second job task from the second information source from the perspective of the second information source.
[0253] Based on the corresponding confidence level preset for each information source, and the target task, the first task, and the second task stored locally from its own perspective, a task list is determined.
[0254] Based on locally recorded user learning preferences and user learning efficiency distribution patterns, the task list is planned to obtain the specific learning path corresponding to the task list.
[0255] The specific learning path is pushed to the user and the user's learning progress is monitored based on the internally preset monitoring module.
[0256] If the learning situation is determined to be abnormal based on the monitored learning situation, then the intervention measures corresponding to the abnormality level are selected to intervene in the user until it is determined that the user has completed the specific learning path.
[0257] It should be noted that the functions or steps that can be implemented by the computer-readable storage medium or computer device described above can be referred to the relevant descriptions on the server side and client side in the foregoing method embodiments. To avoid repetition, they will not be described one by one here.
[0258] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other storage media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0259] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described functional modules and their division are merely examples. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0260] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.
Claims
1. A personalized learning path planning method for multi-agent collaboration, characterized in that the multi-agent collaborative personalized learning path planning method is applied to the target information source in a multi-agent collaborative personalized learning path planning system. The system further includes at least a first information source and a second information source. The first information source interacts externally via a graphical user interface and stores published assignments. The second information source and the target information source are both intelligent learning companion devices and are on the same consortium blockchain. The multi-agent collaborative personalized learning path planning method includes: In response to the planning instructions for the planned path, the system obtains the first job task published from the first information source and the second job task from the second information source from the perspective of the second information source. Based on the corresponding confidence level preset for each information source, and the target task, the first task, and the second task stored locally from its own perspective, a task list is determined. Based on the user's learning preferences recorded locally, the tasks in the task list are prioritized to obtain a preliminary sorting table of tasks to be completed. Based on the distribution pattern of user learning efficiency, the current learning efficiency of the user is estimated to obtain the estimated learning efficiency. Based on the estimated learning efficiency, the preliminary sorting table is sorted a second time to obtain a secondary sorting table; Based on the secondary sorting table, the specific learning path corresponding to the task list is determined; The specific learning path is pushed to the user and the user's learning progress is monitored based on the internally preset monitoring module.
If the learning situation is determined to be abnormal based on the monitored learning situation, then the intervention measures corresponding to the abnormality level are selected to intervene in the user until it is determined that the user has completed the specific learning path.
2. The personalized learning path planning method for multi-agent collaboration according to claim 1, characterized in that, The method of responding to the planning instructions for the planned path, obtaining the first job task published from the first information source and obtaining the second job task from the second information source's perspective, includes: In response to the planning instructions for the planned path, the system autonomously navigates the graphical user interface from the first information source based on a preset visual language model and extracts the first task during navigation; Generate a zero-knowledge proof regarding the user-authorized query, wherein the query permission content is the homework task in other smart learning companion devices on the same consortium blockchain; The zero-knowledge proof is sent to the second information source for verification by the second information source; In response to the message that the second source has been verified, the job task is queried from the second source to obtain the second job task from the perspective of the second source. During the query, the user's user information is not disclosed to the second source being queried.
3. The personalized learning path planning method for multi-agent collaboration according to claim 1, characterized in that, After determining the task list by combining the corresponding confidence level preset for each information source and based on the locally pre-stored target task, the first task, and the second task from its own perspective, the process includes: Detect whether there is a conflict in the content of the target task, the first task, and the second task; If a conflict is determined to exist, then based on the conflict situation of the task content, calculate the posterior probability when one of the conflict situations is true, wherein the conflict situation includes not only the one party, but also the opposing party; If it is determined that the posterior probability exceeds a preset probability threshold, the task list is updated based on one of the conflict scenarios. In response to the user's confirmation instruction for the updated job task list, the confidence level of each information source is updated, wherein the confidence level of the information source corresponding to one of the conflicting situations is increased to enhance its influence, and the confidence level of the information source corresponding to the opposing side of the conflicting situation is decreased to reduce its influence.
4. The personalized learning path planning method for multi-agent collaboration according to claim 1, characterized in that, after the step of planning the task list based on the locally recorded user learning preferences and the user learning efficiency distribution pattern to obtain the specific learning path corresponding to the task list, the method further includes: in response to the user's rejection instruction for the specific learning path, obtaining the user's self-planned path; if a conflict is detected between the self-planned path and the specific learning path, advising the user, based on the user learning efficiency distribution pattern, to pause both the self-planned path and the specific learning path; pushing to the user, based on the user learning efficiency distribution pattern, learning suggestions that are conducive to completing the self-planned path, wherein the learning suggestions are to be completed before the user begins the self-planned path; in response to the user's instruction to accept the learning suggestions, updating the specific learning path based on the learning suggestions to obtain an updated specific learning path; and saving a record of this negotiation with the user for subsequent adjustment and learning.
5. The personalized learning path planning method for multi-agent collaboration according to claim 1, characterized in that the monitoring module includes a visual monitoring module and an acoustic monitoring module, and the step of pushing the specific learning path to the user and starting to monitor the user's learning situation based on the internally preset monitoring module includes: pushing the specific learning path to the user; capturing, based on the internal visual monitoring module, images of the user's facial expressions and postures during learning, and performing image recognition on the captured images to obtain image features; acquiring, based on the internal acoustic monitoring module, the user's voice during learning, and performing voice recognition on the acquired voice to obtain voice features; performing multimodal fusion of the image features and the voice features to obtain a state evaluation result for the user's current learning state; and, if the state evaluation result indicates an abnormal state that persists for a preset duration, judging the current learning situation to be abnormal.
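The fusion and persistence check in claim 5 can be sketched as follows. The claim does not specify the fusion method, so this sketch assumes a simple weighted late fusion of two per-modality abnormality scores in the range 0 to 1. The `Sample` type, the weight, the threshold, and the 30-second duration are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float            # timestamp in seconds
    image_score: float  # abnormality score from the visual module, 0..1 (assumed)
    voice_score: float  # abnormality score from the acoustic module, 0..1 (assumed)


def fuse(image_score, voice_score, w_image=0.6):
    """Weighted late fusion of the two modality scores (an assumed scheme)."""
    return w_image * image_score + (1 - w_image) * voice_score


def is_learning_abnormal(samples, threshold=0.5, min_duration=30.0):
    """Return True once the fused state stays abnormal for min_duration seconds."""
    start = None  # time at which the current abnormal run began
    for s in samples:
        if fuse(s.image_score, s.voice_score) >= threshold:
            if start is None:
                start = s.t
            if s.t - start >= min_duration:
                return True
        else:
            start = None  # a normal sample resets the run
    return False


# Example: four consecutive abnormal samples spanning 30 seconds.
readings = [Sample(t, 0.8, 0.8) for t in (0.0, 10.0, 20.0, 30.0)]
abnormal = is_learning_abnormal(readings)
```

Requiring the abnormal state to persist filters out momentary events, such as a brief glance away, before the situation is judged abnormal.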
6. The personalized learning path planning method for multi-agent collaboration according to claim 1, characterized in that, before the step of, in response to the planning instruction for the planned path, obtaining the first job task published by the first information source and obtaining, from the second information source, the second job task from the perspective of the second information source, the method further includes: upon first launch, requesting the user to perform initial settings; obtaining the setting information of the user's initial settings, wherein the setting information includes at least one of the following: the comparison granularity for comparing job tasks, the size of the comparison range of the second information sources to be compared, and whether learning-companion monitoring is enabled; and, in response to the user's signature confirmation instruction for the setting information, broadcasting the user's signature and the setting data corresponding to the setting information to the consortium blockchain.
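The signed settings payload in claim 6 can be illustrated with a minimal sketch. The claim names neither a signature scheme nor a blockchain interface, so the sketch uses HMAC-SHA256 from the Python standard library purely as a stand-in for a real asymmetric signature, and it only builds the message that would be broadcast. The setting keys and function names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(settings):
    """Serialize settings deterministically so the signature is reproducible."""
    return json.dumps(settings, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_signed_settings(settings, user_key):
    """Bundle the setting data with a signature over its canonical form.

    HMAC is a stand-in only: a consortium blockchain would normally use an
    asymmetric signature so that other nodes can verify without the secret.
    """
    signature = hmac.new(user_key, _canonical(settings), hashlib.sha256).hexdigest()
    return {"settings": settings, "signature": signature}


def verify_signed_settings(message, user_key):
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(user_key, _canonical(message["settings"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])


# Example with hypothetical setting keys mirroring the three options in the claim.
settings = {
    "comparison_granularity": "per_task",
    "comparison_range_size": 5,
    "monitoring_enabled": True,
}
message = build_signed_settings(settings, b"user-secret-key")
```

Sorting the keys before signing means that two devices holding the same settings produce the same bytes, and therefore the same signature, regardless of insertion order.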
7. The personalized learning path planning method for multi-agent collaboration according to claim 1, characterized in that the step of determining the task list by combining the confidence level preset for each information source with the locally pre-stored target task, the first job task, and the second job task from the user's perspective includes: comparing the confidence level preset for each information source with a preset confidence threshold; combining the job tasks corresponding to the information sources whose confidence level exceeds the preset confidence threshold to obtain the task list; and, if the confidence level of every information source exceeds the preset confidence threshold, combining the locally pre-stored target task, the first job task, and the second job task from the user's perspective to obtain the task list.
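The threshold filtering in claim 7 can be sketched as follows. The function and source names are hypothetical, tasks are represented as plain strings, and removing duplicate tasks while merging is an added assumption that the claim does not state.

```python
def build_task_list(tasks_by_source, confidence, threshold=0.5):
    """Merge tasks from every source whose confidence exceeds the threshold.

    tasks_by_source maps a source name to its list of tasks; confidence maps
    a source name to its preset confidence level. Duplicate tasks are kept
    once, in first-seen order (an assumption, not part of the claim).
    """
    merged = []
    seen = set()
    for source, tasks in tasks_by_source.items():
        if confidence.get(source, 0.0) <= threshold:
            continue  # the claim requires the confidence to exceed the threshold
        for task in tasks:
            if task not in seen:
                seen.add(task)
                merged.append(task)
    return merged


# Example: the second source falls below the threshold and is left out.
tasks_by_source = {
    "target": ["math p.12", "essay"],
    "first": ["math p.12", "reading"],
    "second": ["spelling"],
}
confidence = {"target": 0.8, "first": 0.9, "second": 0.4}
task_list = build_task_list(tasks_by_source, confidence)
```

When every source exceeds the threshold, the same function merges all three sources, which corresponds to the final branch of the claim.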
8. A personalized learning path planning device for multi-agent collaboration, characterized in that the device applies the multi-agent collaborative personalized learning path planning method according to any one of claims 1 to 7 and is deployed at a target information source in a multi-agent collaborative personalized learning path planning system; the system further includes at least a first information source and a second information source; the first information source interacts externally via a graphical user interface and stores published job tasks; the second information source and the target information source are both smart learning companion devices on the same consortium blockchain; and the device includes: an acquisition module, configured to, in response to the planning instruction for the planned path, obtain the first job task published by the first information source and obtain, from the second information source, the second job task from the perspective of the second information source; a task list determination module, configured to determine the task list by combining the confidence level preset for each information source with the locally pre-stored target task, the first job task, and the second job task from its own perspective; a path planning module, configured to prioritize the tasks in the task list based on locally recorded user learning preferences to obtain a preliminary ranking table of tasks to be completed, estimate the user's current learning efficiency based on the user learning efficiency distribution pattern to obtain an estimated learning efficiency, perform a secondary ranking on the preliminary ranking table based on the estimated learning efficiency to obtain a secondary ranking table, and determine the specific learning path corresponding to the task list based on the secondary ranking table; a learning monitoring module, configured to push the specific learning path to the user and start monitoring the user's learning situation based on an internally preset monitoring module; and an intervention module, configured to, if the learning situation is determined to be abnormal based on the monitoring, select an intervention measure corresponding to the abnormality level and intervene with the user until it is determined that the user has completed the specific learning path.
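The two-stage ranking performed by the path planning module can be sketched as follows. The claim does not say how the estimated learning efficiency reorders the preliminary table, so this sketch assumes that harder tasks move forward when efficiency is high and easier tasks move forward when it is low. The `Task` fields, the per-hour efficiency table, and the cutoff of 0.6 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    subject: str
    difficulty: int  # 1 (easy) to 5 (hard), an assumed scale


def preliminary_ranking(tasks, preference):
    """Order tasks by the user's recorded preference for each subject."""
    return sorted(tasks, key=lambda t: -preference.get(t.subject, 0.0))


def estimate_efficiency(efficiency_by_hour, hour):
    """Look up the efficiency for the current hour; 0.5 is a neutral default."""
    return efficiency_by_hour.get(hour, 0.5)


def secondary_ranking(ranked, efficiency, cutoff=0.6):
    """Re-rank by difficulty; the stable sort keeps preference order for ties."""
    if efficiency >= cutoff:
        return sorted(ranked, key=lambda t: -t.difficulty)  # hard tasks first
    return sorted(ranked, key=lambda t: t.difficulty)       # easy tasks first


def plan_path(tasks, preference, efficiency_by_hour, hour):
    """Run both ranking stages and return the ordered task names."""
    first = preliminary_ranking(tasks, preference)
    efficiency = estimate_efficiency(efficiency_by_hour, hour)
    return [t.name for t in secondary_ranking(first, efficiency)]


tasks = [
    Task("essay", "chinese", 3),
    Task("algebra", "math", 5),
    Task("spelling", "english", 1),
    Task("geometry", "math", 3),
]
preference = {"math": 0.9, "english": 0.5, "chinese": 0.2}
efficiency_by_hour = {19: 0.8, 22: 0.3}
evening_path = plan_path(tasks, preference, efficiency_by_hour, 19)
```

Because Python's `sorted` is stable, tasks of equal difficulty keep the order set by the preliminary ranking, so the user's preferences still act as the tie-breaker in the secondary ranking table.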
9. A personalized learning path planning device for multi-agent collaboration, characterized in that the device includes a memory, a processor, and a multi-agent collaborative personalized learning path planning program stored in the memory and executable on the processor, wherein the processor, when executing the multi-agent collaborative personalized learning path planning program, implements the steps of the multi-agent collaborative personalized learning path planning method according to any one of claims 1 to 7.