A method and system for constructing a knowledge graph and a personalized learning path

By using a large language model and an improved PageRank algorithm, personalized learning paths are automatically constructed, solving the problems of high cost of knowledge graph construction and learning content not matching learners' styles in existing technologies. This enables low-cost, personalized learning paths and content generation.

CN117952200B · Active · Publication Date: 2026-06-12 · BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING UNIV OF POSTS & TELECOMM
Filing Date: 2023-12-27
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing technologies struggle to efficiently build personalized learning paths and cannot automatically adjust learning content based on learners' learning styles and habits. Furthermore, existing knowledge graphs are costly to build and cannot generate specific graphs for learners' specific content.

Method used

It uses a large language model to parse hierarchical data in Markdown files, automatically constructs a knowledge graph, sorts learning paths using an improved PageRank algorithm, and generates learning content that matches the learner's style.

Benefits of technology

It enables low-cost automatic construction of knowledge graphs, solves the problems of multi-objective paths and loops, provides personalized learning paths and stylized learning content, and improves learning efficiency and interest.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a method and system for constructing a knowledge graph and a personalized learning path, suitable for knowledge recommendation systems. The system comprises a graph analysis platform, a graph content display platform and a large language model platform. The method comprises: constructing a tree-shaped knowledge graph for a text uploaded by a user; extracting the concepts under each title's content by using a large language model, and clustering the concepts; correcting the relationships between the title nodes and the clustered concept nodes; matching a learning goal input by the user against the sentences of the text to obtain the associated title and concept nodes, finding target nodes based on a capability graph, and sorting the target nodes to generate a learning path; and generating learning content based on the content style input by the user. The application automatically constructs a knowledge graph by using a large language model, effectively handles multi-target paths and paths containing cycles, and takes the learner's capability graph into account, thereby providing different learning paths and personalized learning content for different learners.

Description

Technical Field

[0001] This invention belongs to the field of knowledge graphs and recommendation systems, and relates to a method and system for constructing knowledge graphs and personalized learning paths.

Background Technology

[0002] With the rapid development of information technology, a wealth of learning resources has emerged online, providing learners with abundant knowledge. The teaching process is gradually shifting towards student-centered, personalized learning. Traditional teaching methods, such as textbooks or online courses, typically begin with basic knowledge and gradually progress in depth. However, different students have different interests and prefer different content styles, so traditional teaching methods tend to reduce learners' interest and efficiency. How to use technology to provide personalized learning plans and content based on learners' interests and goals has become a primary research topic in the field of education.

[0003] A knowledge graph is a data model that represents knowledge in a graphical structure (usually a graph or network), typically containing nodes representing entities and edges representing relationships between entities. Its corresponding visual representation is a knowledge map. Building such a database allows for the visualization of relationships between knowledge points and lets users know where to find the knowledge they need.

[0004] In addressing the issue of personalized learning recommendations for students, path customization algorithms are a core component of personalized learning solutions. They can generate personalized learning paths based on the user's potential goals, background, and a pre-constructed knowledge map.

[0005] However, existing personalized learning solutions have some drawbacks:

[0006] (1) First, there is still a lack of a convenient and efficient method to construct knowledge graphs. Most knowledge graphs on the market are constructed manually by experts, and some models with embedded natural language capabilities often require a lot of manual annotation training. Such construction methods are costly and often cannot quickly generate specific knowledge graphs for the specific content that learners are learning.

[0007] (2) Secondly, although learning paths can be found using techniques such as reverse search, the learning order among multiple targets remains unknown. Therefore, a series of path concatenation algorithms have emerged, or topological sorting algorithms are used to sort the targets themselves. However, the prerequisite for using topological sorting algorithms is that the network cannot contain loops, which are common in real-world knowledge graphs. Some algorithms make topological sorting usable by forcibly removing loops, but the basis for removing loops is not easily determined.

[0008] (3) In addition, when users search for their learning goals, although existing technologies can generate learning paths, the content contained in these paths may not necessarily match the learners' learning habits. For example, some students prefer descriptions in computer language, some prefer descriptions in mathematical language, and some prefer expressions that use examples.

Summary of the Invention

[0009] The purpose of this invention is to provide a method and system for constructing knowledge graphs and personalized learning paths. By parsing the hierarchical data inherent in Markdown (Lightweight Markup Language) files, a large language model is used to complete the edges outside the hierarchical relationships, thereby automatically constructing a knowledge graph and solving the problem of high construction costs and difficulty for ordinary people to construct knowledge graphs. This invention also proposes a method for ranking network nodes based on PageRank to solve the problem of paths with cycles. After obtaining the learning path, a large language model is introduced to generate personalized content based on the original document content and the user's preferred style, solving the problem that existing personalized learning path methods cannot automatically adjust according to the learner's learning style.

[0010] This invention provides a knowledge graph and personalized learning path construction system, including a graph analysis platform, a graph content display platform, and a large language model platform.

[0011] The aforementioned knowledge graph analysis platform includes a knowledge graph construction module, an input module, a target node search module, a learning path generation module, and a capability graph update module. The knowledge graph construction module constructs a tree-like knowledge graph from the Markdown text uploaded by the user. Each node in the graph represents a title, and the content of a title node is the text content between that title and the next title at the same level. The input module obtains the user's learning objectives and content style, and acquires the user's capability graph, which records the concept nodes and title nodes that the user has mastered. In the target node search module, sentences matching the user's learning objectives are first searched in the uploaded text. The top K most similar sentences are found, and associated title nodes and concept nodes are obtained. Then, target nodes are searched based on the user's capability graph and associated nodes. The learning path generation module sorts the target nodes using an improved PageRank algorithm to generate learning paths. The capability graph update module marks a target node as mastered and updates the capability graph after the user has completed learning the corresponding content for that target node; where K is a positive integer.

[0012] The large language model platform includes a concept node extraction module, a concept clustering module, a relationship correction module, and a learning content generation module. The concept node extraction module uses the large language model to traverse the content of each title node and extract concept nodes. The concept clustering module calculates the similarity between all concepts and clusters them based on similarity, merging concepts that belong to the same cluster into a single concept node. The relationship correction module obtains the context content of the title node to which each concept belongs for each concept node, uses the large language model to determine whether the relationship between the title node and the clustered concept nodes is a definition or a reference, and corrects the direction of the directed edges between the title node and the concept nodes. The learning content generation module constructs prompt words based on the obtained learning path and the content style input by the user, and uses the large language model to generate learning content of the required style from the uploaded text.

[0013] The graph content display platform displays the graph of the uploaded text, the subgraph of the target node, and the generated learning path.

[0014] Accordingly, the present invention provides a method for constructing a knowledge graph and a personalized learning path, comprising the following steps:

[0015] Step 1: Construct a tree-like knowledge graph for the Markdown text uploaded by the user. Each node in the graph represents a title, and the content of the title node is the text content between that title and the next title of the same level.

[0016] Step 2: Use the large language model to traverse each title node, perform knowledge point analysis and concept extraction on the content of each title node, and obtain the concept nodes before clustering. At this time, the relationship between each paragraph and the concept nodes before clustering extracted from that paragraph is assumed to be a reference relationship. Calculate the similarity between all concepts, cluster the concepts according to the similarity, and merge the concepts that are clustered into one concept node.

[0017] Step 3: Update the relationship between each concept node and the title node after concept clustering; for each concept appearing in each clustered concept node, obtain the context content of the title node to which these concepts belong, and use the large language model to determine the relationship between the title node and the concept node; among them, the relationship between the title node and the concept node has two types of relationships: definition and reference; correct the direction of the directed edge between the title node and the concept node according to the updated relationship;

[0018] Step 4: The user inputs learning goals and content style. The system searches the text uploaded by the user for sentences that match the user's learning goals. The system finds the top K sentences with high similarity and obtains the associated title nodes and concept nodes. Then, the system searches for target nodes based on the user's competency graph and associated nodes. The competency graph records the concept nodes and title nodes that the user has mastered. The system sorts the target nodes based on the improved PageRank algorithm and generates a learning path.

[0019] Step 5: Based on the acquired learning path and the content style of the user input, construct prompt words, and use a large language model to generate the required learning content from the uploaded text;

[0020] Step 6: After learning, the user marks the corresponding target node as mastered and the competency graph is updated; the method then returns to Step 4 to update the target nodes, regenerate the learning path, and obtain the learning content.

[0021] In step 2, (2.1) traverse each title node, construct prompt words, and use a large language model to extract all concepts appearing in the title content; (2.2) use a word embedding model to represent each concept as an embedding vector, and calculate the cosine similarity between pairs of word vectors; (2.3) based on the cosine similarity matrix of all concepts, cluster the same or sufficiently similar concepts, and merge the concepts into one concept node after clustering.

[0022] In step 3, the paragraph node containing each concept in the clustered concept nodes is obtained, and the relationship between the paragraph node and the concept node is determined. If the relationship between the paragraph node and the concept node is a definition relationship, it means that the concept is defined in the content of the corresponding title node, and there is a directed edge from the title node to the concept node. If the relationship between the paragraph node and the concept node is a reference relationship, it means that the concept is referenced in the content of the corresponding title node, and there is a directed edge from the concept node to the title node.

[0023] In step 4, the user's learning objective is embedded; all sentences in each paragraph of the uploaded text are segmented and embedded; the embedded representation of the learning objective is compared with the sentence vectors of the uploaded text to obtain the top K sentences with high similarity, and the associated concept nodes are obtained from the corresponding paragraph nodes.

[0024] In step 4, the method for generating the learning path includes: let the subgraph corresponding to all target nodes be 'graph'. For each target node k, if node k has no parent node in the subgraph, set the initial PageRank value $x_{k,0}$ of node k to 1; otherwise, set $x_{k,0}$ to 0. Then execute the iteration process:

[0025] In the t-th iteration, for each target node k, obtain the parent nodes of node k in the subgraph. If node k has no parent node, update the PageRank value of node k by incrementing it by 1: $x_{k,t} = x_{k,t-1} + 1$. If node k has parent nodes, obtain the number of child nodes and the PageRank value of each parent node. Let the number of parent nodes of the target node k be N, where the PageRank value of the i-th parent node is $p_{k,i,t}$. Then the PageRank value $x_{k,t}$ of the target node k is updated as follows:

[0026]

[0027] Where r1 represents the retention rate of mastery, r2 represents the conversion rate of mastery, and α represents the growth weight;

[0028] After the set number of iterations, the target nodes are sorted in descending order of their PageRank values to generate a learning path.
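The ranking procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the update formula for nodes that have parents is not reproduced in this text, so the sketch takes it as a caller-supplied function. The names `rank_targets` and `placeholder_update` are hypothetical, `placeholder_update` is a made-up stand-in used only to make the example runnable, and parent values are assumed to be read from the previous iteration. Every parent is assumed to be a key of `parents`.

```python
from typing import Callable, Dict, List

# parents[k] lists the parent nodes of k inside the target subgraph.
Parents = Dict[str, List[str]]
# update(x_prev, parent_values, parent_child_counts) -> new value (caller-supplied).
UpdateRule = Callable[[float, List[float], List[int]], float]


def rank_targets(parents: Parents, iterations: int, update: UpdateRule) -> List[str]:
    # Number of children of each node, which the parent-based update may need.
    child_count = {k: 0 for k in parents}
    for ps in parents.values():
        for p in ps:
            child_count[p] += 1

    # x_{k,0} = 1 for nodes without a parent, 0 otherwise.
    x = {k: (1.0 if not ps else 0.0) for k, ps in parents.items()}

    for _ in range(iterations):
        new_x = {}
        for k, ps in parents.items():
            if not ps:
                new_x[k] = x[k] + 1.0  # x_{k,t} = x_{k,t-1} + 1
            else:
                new_x[k] = update(x[k], [x[p] for p in ps],
                                  [child_count[p] for p in ps])
        x = new_x

    # Descending PageRank value gives the learning order.
    return sorted(parents, key=lambda k: x[k], reverse=True)


def placeholder_update(x_prev, parent_values, parent_child_counts,
                       r1=0.5, r2=0.5, alpha=1.0):
    # HYPOTHETICAL rule, not the patent's formula: keep a share of the old
    # value and add the average per-child share of the parents' values.
    inflow = sum(p / c for p, c in zip(parent_values, parent_child_counts))
    return r1 * x_prev + alpha * r2 * inflow / len(parent_values)
```

Because the procedure only iterates a fixed number of times and then sorts, it still returns an order when the subgraph contains a cycle, which is the case topological sorting cannot handle.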

[0029] The advantages and positive effects of this invention are as follows:

[0030] (1) The system and method constructed in this invention automatically construct knowledge graphs using large language models. Through natural language processing technology, key knowledge points are extracted and organized into knowledge graphs, reducing the difficulty and cost of manually constructing knowledge graphs.

[0031] (2) The system and method of this invention provide learning paths based on inverse search and the PageRank algorithm, which can effectively handle multi-objective paths and path problems with loops. At the same time, the system and method of this invention can consider the learner's ability graph and provide different learning paths for different learners.

[0032] (3) The system and method of the present invention can generate explanatory texts of different styles according to the learning path and the original text content, and provide learners with different styles of learning content according to their style preferences, such as mathematical formula style, professional terminology style or popular and easy-to-understand style.

Attached Figure Description

[0033] Figure 1 is an implementation framework diagram of the knowledge graph and personalized learning path construction system of the present invention;

[0034] Figure 2 is a flowchart of the method for constructing the knowledge graph and personalized learning path of the present invention;

[0035] Figure 3 is a tree diagram constructed from excerpted textbook text in an embodiment of the present invention;

[0036] Figure 4 is a schematic diagram of the concepts extracted from the excerpted textbook text using a large language model in an embodiment of the present invention;

[0037] Figure 5 is the graph obtained after clustering and merging the concepts in Figure 4 in an embodiment of the present invention;

[0038] Figure 6 is a diagram in which the pointing direction of the concept nodes has been corrected according to an embodiment of the present invention;

[0039] Figure 7 is a learning path diagram generated by an embodiment of the present invention based on the user's learning objectives and capability graph;

[0040] Figure 8 is an example diagram showing the user's learning path and required content style in an embodiment of the present invention;

[0041] Figure 9 is a learning path diagram regenerated after updating a node that the user has already mastered, according to an embodiment of the present invention.

Detailed Implementation

[0042] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments.

[0043] This invention provides a method and system for constructing knowledge graphs and personalized learning paths. Based on text materials uploaded by users, combined with users' personalized characteristics and query content, it provides users with personalized knowledge explanation paths and explanatory texts.

[0044] As shown in Figure 1, the knowledge graph and personalized learning path construction system of this invention includes a graph analysis platform, a graph content display platform, and a large language model platform. The graph content display platform displays the graph constructed from the uploaded text, the subgraphs of the target nodes, the generated learning paths, and so on. After a user has learned the learning content corresponding to a target node, the user can directly mark the node in the graph as mastered.

[0045] The graph analysis platform of this invention includes a graph construction module, an input module, a target node search module, a learning path generation module, and a capability graph update module. The graph construction module constructs a tree-like knowledge graph from the text uploaded by the user. Each node in the graph represents a title, and the content of a title node is the text content between that title and the next title at the same level. The input module obtains the user's learning objectives and content style, and acquires the user's capability graph. The capability graph records the concept nodes and title nodes that the user has mastered. In the target node search module, sentences matching the user's learning objectives are first searched in the uploaded text. The top K sentences with high similarity are found, and associated title nodes and concept nodes are obtained. Then, target nodes are searched based on the user's capability graph and associated nodes. The learning path generation module sorts the target nodes using an improved PageRank algorithm to generate learning paths. The capability graph update module updates the capability graph when the user marks a node as mastered. Here, K is a positive integer.

[0046] The large language model platform of this invention includes a concept node extraction module, a concept clustering module, a relationship correction module, and a learning content generation module. The concept node extraction module uses the large language model to traverse the content of each title node and extract concept nodes. The concept clustering module calculates the similarity between all concepts, clusters concepts based on similarity, merges concepts that belong to the same cluster into one concept node, and associates the concept node with the corresponding title node. The relationship correction module obtains the context content of the title node to which each concept belongs for each concept node, uses the large language model to determine whether the relationship between the title node and the clustered concept nodes is a definition or a reference relationship, and corrects the direction of the directed edges between the title node and the concept nodes. The learning content generation module constructs prompt words based on the obtained learning path and the content style input by the user, and uses the large language model to generate learning content of the required style from the uploaded text.

[0047] The implementation of the functional modules in the graph analysis platform and the large language model platform will be explained in the corresponding method descriptions below.

[0048] The present invention provides a method for constructing a knowledge graph and a personalized learning path. One implementation process is shown in Figure 2 and is broken down into the following 6 steps.

[0049] Step 1: Preprocess the text data uploaded by users and construct a tree-structured knowledge graph.

[0050] In this embodiment of the invention, when a user uploads a document in Markdown format, the system parses the Markdown text based on the Markdown heading format and constructs a basic tree diagram. Each node in the diagram represents a heading, referred to as a heading node, and the content within each node is the text content between that heading and the next sibling heading. If the uploaded text is in another format, it can be pre-processed into Markdown text before automatic parsing. A heading node contains the next-level heading nodes that fall under it. The tree diagram is constructed based on the principle that a major heading contains its minor headings; for example, the first major heading contains all minor headings up to the second major heading.

[0051] The Markdown parsing algorithm used in this embodiment of the invention is as follows:

[0052]

[0053]
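As an illustration only (the embodiment's own listing is not shown here), a heading-based parse of this kind might look like the following minimal Python sketch. The names `TitleNode` and `parse_markdown` are hypothetical. The sketch assumes ATX-style `#` headings and does not guard against `#` lines inside fenced code blocks. Each node stores only the text directly under its heading; the full content up to the next sibling heading can be recovered by walking the node's subtree.

```python
import re
from dataclasses import dataclass, field
from typing import List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class TitleNode:
    title: str
    level: int
    content: List[str] = field(default_factory=list)
    children: List["TitleNode"] = field(default_factory=list)


def parse_markdown(text: str) -> TitleNode:
    root = TitleNode(title="ROOT", level=0)
    stack = [root]  # path from the root to the heading currently being filled
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close every open heading at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            node = TitleNode(title=match.group(2).strip(), level=level)
            stack[-1].children.append(node)  # major heading contains minor heading
            stack.append(node)
        elif line.strip():
            stack[-1].content.append(line.strip())
    return root
```

The stack keeps the chain of currently open headings, so a new heading is always attached to the nearest open heading of a strictly higher level.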

[0054] This invention uses an excerpt from a textbook on nonlinear dynamics as an example. The Markdown text corresponding to this excerpt is shown below:

[0055]

[0056]

[0057]

[0058] This Markdown text document is analyzed using the Markdown parsing algorithm described above, and a tree diagram structure is then constructed. The result is shown in Figure 3.

[0059] Step 2: Utilize a large language model for knowledge point analysis and concept extraction, including the following steps:

[0060] Step 2.1) Traverse each title node, construct prompt words, and extract all concepts appearing under this title using a large language model. The concept extraction method in this embodiment of the invention is as follows:

[0061]

[0062]

[0063] Based on the above concept extraction method, parsing the nodes of the textbook excerpt selected in this embodiment yields the graph shown in Figure 4, in which each acquired concept is associated as a node with its corresponding title node. For example, in the introduction section, concepts such as general system, one-dimensional or first-order system, and phase space are acquired and associated with the introduction title node. At this point, the relationship between each paragraph and the pre-clustering concept nodes extracted from that paragraph is a reference relationship by default. However, this reference relationship will be corrected in step 3, and more connections will be identified and supplemented for the post-clustering concept nodes.

[0064] Step 2.2) Use a word embedding model to embed each concept obtained from each node in Step 2.1 into a vector, and calculate the cosine similarity of each pair of word vectors to estimate the word similarity, as follows:

[0065] $$\text{cosine similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

[0066] cosine similarity(A,B) calculates the cosine similarity between two word vectors A and B.

[0067] Word embedding is a method that converts words in a corpus into vectors. Currently, there are many mature models available; the model used in this embodiment is Word2Vec. The Markdown corpus used for parsing is segmented and then input into the pre-trained Word2Vec model to obtain word vectors from the specific corpus.

[0068] Cosine similarity is a commonly used method to measure the similarity between two vectors. In this embodiment of the invention, it is used to calculate the similarity between two words.

[0069] The concept nodes obtained in step 2.1 are represented by word vector embeddings, and the cosine similarity between concept nodes is calculated. Step 2.1 only extracts reference relationships, and some of the extracted concepts turn out to be repeated or similar. Therefore, this embodiment introduces a word embedding model to measure the relevance between words, and calculates word similarity as the cosine similarity of the embedded word vectors. The similarity of all pairs of concepts is calculated to construct a concept similarity matrix.
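The construction of the concept similarity matrix can be sketched in plain Python as follows. The function names are hypothetical, and the vectors are assumed to be already available as lists of floats (for example, looked up from a word embedding model); returning 0 for a zero vector is a convention chosen for this sketch.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    # Nested mapping: matrix[a][b] is the cosine similarity of concepts a and b.
    names = list(vectors)
    return {a: {b: cosine_similarity(vectors[a], vectors[b]) for b in names}
            for a in names}
```

The matrix is symmetric, so an implementation working on many concepts could compute only the upper triangle.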

[0070] Step 2.3) Using the concept similarity matrix obtained in Step 2.2, cluster the same or sufficiently similar concepts and merge them into one cluster node.

[0071] An implementation algorithm for concept clustering and merging in this embodiment of the invention is as follows:

[0072]

[0073]

[0074] For two nodes, one is designated as the source node (source_name), and the other is designated as the target node (target_name). The link_name indicates whether the two nodes are similar. Similar concepts are grouped into a single concept node, which contains all concept names.
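One possible way to merge similar concepts is sketched below in Python. This is an assumption-laden illustration rather than the embodiment's algorithm: it links every pair whose similarity reaches a hypothetical threshold of 0.8, records each link with the `source_name`, `target_name` and `link_name` fields described above, and merges linked concepts with a union-find structure. The function name `cluster_concepts` is also hypothetical.

```python
from typing import Dict, List


def cluster_concepts(sim: Dict[str, Dict[str, float]],
                     threshold: float = 0.8) -> List[List[str]]:
    names = list(sim)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # One link record per sufficiently similar pair of concepts.
    links = [
        {"source_name": a, "target_name": b, "link_name": "similar"}
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if sim[a][b] >= threshold
    ]
    for link in links:
        parent[find(link["source_name"])] = find(link["target_name"])

    # Each cluster becomes one concept node holding all of its concept names.
    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())
```

Merging through union-find makes similarity transitive: if A is similar to B and B to C, all three end up in one concept node, which may or may not be desirable for a given corpus.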

[0075] Clustering the concept nodes in Figure 4 yields, for example, the graph shown in Figure 5.

[0076] Step 2.4) Cluster nodes can be manually modified according to the actual situation.

[0077] Step 3: Prompt engineering: generate aliases, definitions, and referenced paragraphs for each node. For each concept appearing in each clustered concept node, obtain the contextual content of the concept, i.e., its paragraph nodes. Each paragraph in the input text corresponds to a paragraph node. Use a large language model to determine the relationship between paragraph nodes and concepts. The relationship between a paragraph node and a concept node is one of two types: "definition" and "reference".

[0078] For the merged nodes, the corresponding content in the original text is concatenated into prompt words, and a large language model is used to classify each relationship as "definition" or "reference". The prompt word template is as follows:

[0079]

[0080] To improve the recognition accuracy of existing large language models, some real-world examples can be added for reference. One example is provided below:

[0081]

[0082]

[0083] By obtaining the relationship between paragraph nodes and concept nodes, the direction of the lines connecting the nodes is corrected. If the relationship between a paragraph node and a concept node is a definition relationship, it means that the concept is defined in the content of the corresponding title node, and there exists a directed edge from the corresponding title node to the concept node. If the relationship between a paragraph node and a concept node is a reference relationship, it means that the concept is referenced in the content of the corresponding title node, and there exists a directed edge from the concept node to the corresponding title node. In this embodiment of the invention, correcting the pointers of the concept nodes in Figure 5 yields Figure 6. By observing the pointers between heading nodes and concept nodes in Figure 6, one can determine where a concept is defined and which parts of the text reference it.
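The direction rule above can be expressed as a small Python sketch. The function name `directed_edges` and the input shape (a mapping from a title and concept pair to the relation label returned by the language model) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (from_node, to_node)


def directed_edges(relations: Dict[Tuple[str, str], str]) -> List[Edge]:
    edges: List[Edge] = []
    for (title, concept), relation in relations.items():
        if relation == "definition":
            edges.append((title, concept))  # the title's content defines the concept
        elif relation == "reference":
            edges.append((concept, title))  # the title's content relies on the concept
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    return edges
```

With this convention every edge points from prerequisite material to the material that depends on it, which is the direction the later path ranking relies on.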

[0084] Step 4: Based on user input, match target nodes, sort the nodes, and generate a learning path. This step includes the following three sub-steps 4.1 to 4.3. This embodiment of the invention generates a knowledge parsing path from user-uploaded documents based on the user's learning objectives and personalized characteristics. The user then follows this path to learn and achieve their learning objectives.

[0085] Step 4.1) The user enters their learning goals and desired style.

[0086] For example, the user's learning objective is: "Explain the stability of fixed points", and the content style is: "Visualize using MATLAB".

[0087] Step 4.2) Embed the sentences in each node, and embed the learning target as a sentence in the same way. By calculating the cosine similarity between sentence vectors, the sentence most relevant to the learning target is obtained, thus obtaining the text paragraph most relevant to the learning target.

[0088] One process for performing embedding search in this embodiment of the invention is as follows:

[0089] Objective: To embed the user's learning objectives into a representation and search for concept nodes and title nodes related to the learning objectives;

[0090] Perform the following steps:

[0091] a) Text preprocessing, including: first, segmenting all sentences in each paragraph of the input document and performing word segmentation using existing Python libraries; second, removing single words, stop words, and all numbers and special characters.

[0092] b) Word embedding and sentence embedding representation, including: using the trained doc2vec model to perform sentence embedding, and obtaining sentence vectors, i.e. sentence embedding representation.

[0093] c) Perform the search, including: representing the learning target sentence input by the user with sentence embeddings as in step b; comparing the similarity between the sentence embedding vector of the learning target and all vectors in the document sentence vector; calculating the Euclidean distance (norm 2); and selecting and returning the 10 sentences with the smallest Euclidean distance; returning the paragraph nodes containing the 10 sentences. Further, based on the obtained paragraph nodes, obtain the corresponding title nodes and associated concept nodes, and store them in the list need_list.
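The search in step c can be sketched in Python as follows, assuming the sentence vectors have already been produced by an embedding model. The function name `nearest_paragraphs` and the `(paragraph_id, vector)` input layout are hypothetical; the default `k=10` follows the embodiment.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def nearest_paragraphs(goal: Sequence[float],
                       sentences: List[Tuple[str, Sequence[float]]],
                       k: int = 10) -> List[str]:
    # Each entry of `sentences` is (paragraph_id, sentence_vector).
    nearest = heapq.nsmallest(
        k, sentences, key=lambda item: math.dist(goal, item[1]))
    paragraphs: List[str] = []
    for paragraph_id, _ in nearest:
        if paragraph_id not in paragraphs:  # deduplicate, keep rank order
            paragraphs.append(paragraph_id)
    return paragraphs
```

The returned paragraph identifiers would then be mapped to their title nodes and associated concept nodes to fill `need_list`.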

[0094] Step 4.3) Calculate the learning path based on the improved PageRank algorithm, avoiding the problem of removing loops required by traditional topological sorting.

[0095] The concept nodes and title nodes that the user has mastered are stored in the capability graph list known_list.

[0096] First, use reverse search to obtain relevant nodes, then calculate priority values to determine the order of nodes in the path.

[0097] This invention uses a depth-first search algorithm to obtain all nodes related to the target node:

[0098]

[0099]

[0100] Here, G represents the graph object of the input document, which is generated by steps 1 to 3 above. G.nodes represents all nodes in the graph, and the dictionary a_dict records the nodes in the graph and their parent node relationships.

[0101] This invention uses a depth-first search to traverse each node a_n in need_list. If node a_n is already in the known list known_list, it is skipped, because the user already understands the concept or section and no related content needs to be retrieved or processed further. If node a_n is not in known_list, it is output. If node a_n has no parent node (that is, no dependencies), only node a_n itself is output. If node a_n exists in a_dict (i.e., it has parent nodes), each parent node is checked recursively against known_list and a_dict in the same way as node a_n.
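The traversal described above can be sketched as follows. This is an illustrative reading, not the listing of the embodiment. The names need_list, known_list, and a_dict follow the description; the `visited` set is an added assumption so that the recursion terminates when the graph contains a loop, and the example node names are hypothetical.

```python
def find_target_nodes(need_list, known_list, a_dict):
    """Depth-first collection of the nodes a learner still has to study.

    a_dict maps a node to the list of nodes it depends on (its parents).
    """
    known = set(known_list)
    targets = []      # output order is the order of first visit
    visited = set()   # assumption: guards against loops in the graph

    def visit(node):
        if node in known or node in visited:
            return    # already mastered, or already collected
        visited.add(node)
        targets.append(node)
        for parent in a_dict.get(node, []):
            visit(parent)

    for a_n in need_list:
        visit(a_n)
    return targets

# Hypothetical graph containing a loop: stability -> fixed point -> flow -> stability.
a_dict = {
    "stability": ["fixed point", "derivative"],
    "fixed point": ["flow"],
    "flow": ["stability"],
}
targets = find_target_nodes(["stability"], ["derivative"], a_dict)
```

A mastered node stops the search along its branch, so prerequisites that are reachable only through it are not collected.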

[0102] All the output nodes form a list of target nodes for the user to learn. After the node search is completed, the improved PageRank algorithm is used to sort the obtained target nodes.

[0103] First, the differences between the improved PageRank algorithm of this invention and the original PageRank algorithm are explained:

[0104] The first difference is that the PageRank (PR) algorithm was originally used to measure the importance of a node. Therefore, when node A references node B, A's PR value is assigned to B. Thus, the higher the PR value, the more frequently the node is referenced, and therefore the more important it is. However, this invention measures the order of nodes. When A references B, it means B is a more fundamental concept, so B should come first, and therefore B should have a higher priority in being learned.

[0105] The second difference is that traditional PR algorithms assign high PR values to nodes with more parent nodes (i.e., nodes pointed to by more nodes), indicating that these nodes are the most important. However, the method in this invention envisions a "degree of mastery" that gradually permeates from simple nodes to their child nodes (the nodes they point to). This degree of mastery diminishes with each layer, meaning that the more nodes a node relies on, the less likely it is to be mastered.

[0106] In this invention, the user's degree of mastery of node i at time t+1, denoted $x_{i,t+1}$, is calculated as follows:

[0107]

[0108] Where $x_{i,t}$ represents the user's level of mastery of node i at time t; $r_1$ represents the retention rate of mastery from one time step to the next, which is set to 0.5 in this embodiment; $r_2$ represents the conversion rate of mastery from mastered nodes to unmastered nodes, which is set to 0.5 in this embodiment; $N_i$ represents the number of neighbors of node i; $w_{i,j}$ represents the connection strength between nodes i and j: if node j points to node i in the network, then $w_{i,j}=1$, otherwise $w_{i,j}=0$; α represents the weight of the difficulty increase when a node depends on multiple knowledge points, because the larger $N_i$ is, the more prerequisites the node is considered to depend on, making the new node more difficult to learn. In this embodiment, α is set to 1.5. Therefore, the above expression can be understood as a weighted average.
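The formula of paragraph [0107] is not reproduced in this text. One form consistent with the symbol definitions in [0108] and with the verbal description of the sorting calculation in [0113] is given below; it is offered as a reconstruction under those assumptions, not as the authoritative formula of the patent.

```latex
x_{i,t+1} \;=\; r_1\, x_{i,t} \;+\; r_2\, \frac{\sum_{j} w_{i,j}\, x_{j,t}}{N_i^{\alpha}}
```

Under this reading, with $r_1 = r_2 = 0.5$ and $\alpha = 1.5$, a node with two parents that each have mastery 1 receives $0.5 \cdot 2 / 2^{1.5} \approx 0.354$ from its parents in one step, whereas a node with a single such parent receives $0.5 \cdot 1 / 1^{1.5} = 0.5$. This illustrates how a larger $N_i$ slows the growth of mastery.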

[0109] This invention calculates the user's level of understanding of the nodes based on the above formula, and the designed function is as follows:

[0110]

[0111]

[0112] Here, the subgraph corresponding to the list of target nodes is denoted graph. The improved PageRank algorithm is applied to this subgraph to obtain the learning path. max_iterations represents the maximum number of iterations, tolerance represents the convergence tolerance, and damping_factor represents the damping factor.

[0113] The specific sorting calculation is as follows: (1) Traverse the nodes in graph, that is, the nodes in the target node list; for each node, if the node has no parent node, set the PR (PageRank) value of the node to 1, otherwise set it to 0. (2) Execute the iteration process. First, initialize an empty dictionary new_pagerank, and initialize the weight rank_sum of each target node to 0. In each iteration, for each target node, get its parent nodes neighbor in graph. If the node has no parent node, increment the PR value of the node by 1. If the node has parent nodes, get the PR value neighbor_pagerank and the number of child nodes neighbor_neighbors of each parent node neighbor, and update the weight rank_sum of the node to the sum of the PR values of all parent nodes of the node divided by the number of parent nodes of the node raised to the power of α. Finally, use the updated weight to calculate the PR value of the node. In the formula for updating the PR value, $x_{k,t}$ is the PR value of the node in the t-th iteration, $x_{k,t-1}$ is the PR value of the node in the (t-1)-th iteration, N is the number of parent nodes of the node, and $p_{k,i,t}$ is the PR value of the i-th parent node of the node. (3) After the iteration is completed, sort the target nodes in descending order of PR value to generate the learning path. The larger the PR value, the higher the learning priority of the concept of the corresponding node.
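The sorting calculation in (1) to (3) can be sketched as follows. This is an illustrative reading of the description, not the function of the embodiment. The input `parents`, a dictionary mapping each target node to its parent nodes inside the subgraph, is a hypothetical representation. The sketch assumes that parent values are taken from the previous iteration and that the PR update has the form r1 times the old value plus r2 times the weight. The parameters tolerance and damping_factor are omitted because the text does not describe how they are used.

```python
def improved_pagerank(parents, r1=0.5, r2=0.5, alpha=1.5, max_iterations=20):
    """Rank target nodes so that prerequisites come first.

    parents maps each target node to the list of its parent (prerequisite)
    nodes inside the target subgraph. Returns (ordered nodes, PR values).
    """
    # (1) Initialization: nodes without parents start at 1, all others at 0.
    pagerank = {n: (1.0 if not ps else 0.0) for n, ps in parents.items()}

    # (2) Iteration.
    for _ in range(max_iterations):
        new_pagerank = {}
        for node, ps in parents.items():
            if not ps:
                # No prerequisites: the value grows by 1 each iteration.
                new_pagerank[node] = pagerank[node] + 1.0
            else:
                # Weight: sum of parent values over (number of parents)^alpha.
                rank_sum = sum(pagerank[p] for p in ps) / len(ps) ** alpha
                new_pagerank[node] = r1 * pagerank[node] + r2 * rank_sum
        pagerank = new_pagerank

    # (3) A larger PR value means the node is learned earlier.
    order = sorted(pagerank, key=pagerank.get, reverse=True)
    return order, pagerank

# Hypothetical subgraph: A is a prerequisite of B; A and B are prerequisites of C.
order, values = improved_pagerank({"A": [], "B": ["A"], "C": ["A", "B"]})
```

Because the procedure only propagates values along edges for a fixed number of iterations, it also terminates on a subgraph that contains a loop, for example `{"A": [], "B": ["A", "C"], "C": ["B"]}`, without any loop removal step.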

[0114] In this embodiment, the learning path obtained for the learning objective input in step 4.1 is shown in Figure 7.

[0115] Reverse search is a common algorithm, but relying solely on it can present some problems. For example, when there are multiple targets, multiple paths will be found, leading to path concatenation issues. The method described in this invention avoids both finding multiple paths and the path concatenation problem.

[0116] Step 5: Generate explanations and answers. Obtain the learning path generated in Step 4, and based on this path and the style input by the user in Step 4.1, construct prompt words. Reorganize the content using a large language model to generate the final content.
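The prompt construction in Step 5 can be sketched as follows. The prompt wording, the function name `build_prompt`, and the `node_contents` mapping are hypothetical; the call to the large language model itself is left out because the text does not specify which model or interface is used.

```python
def build_prompt(learning_path, node_contents, goal, style):
    """Assemble a prompt asking a large language model to reorganize the
    source material of the learning path in the requested style.
    The wording is illustrative only."""
    sections = [f"## {node}\n{node_contents[node]}" for node in learning_path]
    return (
        f"Learning objective: {goal}\n"
        f"Content style: {style}\n"
        "Using only the material below, explain the topics in the given "
        "order and in the requested style.\n\n" + "\n\n".join(sections)
    )

# Hypothetical path and node contents.
prompt = build_prompt(
    ["fixed point", "stability"],
    {"fixed point": "A point x* where f(x*) = 0.",
     "stability": "x* is stable if f'(x*) < 0."},
    goal="explain the stability of fixed points",
    style="use MATLAB for visualization",
)
```

Keeping the sections in learning-path order lets the generated content follow the same prerequisite-first sequence as the sorted target nodes.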

[0117] In this embodiment of the invention, for the user-input learning objective "explain the stability of fixed points" and the content style "use MATLAB for visualization", the following program is executed to generate content in the style required by the user.

[0118]

[0119] In this embodiment, the generated learning path and the content in the style required by the user are shown in Figure 8.

[0120] Step 6: Users learn and correct their own capability graphs and regenerate paths.

[0121] This invention provides a knowledge graph visualization platform that displays the knowledge graph constructed from the uploaded document according to the method of this invention. After learning a node, the user can right-click the node and mark it as mastered; the node is then marked in the user's own capability graph. Nodes in the capability graph are considered mastered knowledge nodes. The node information known_list in the capability graph is used as input during path generation in step 4.3, thereby forming a personalized learning path for the user.

[0122] In this embodiment of the invention, for example, when a user adds the concept node "fixed point" to their capability graph, the generated personalized learning path is shown in Figure 9. Compared with Figure 8, the learning path changes.
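The effect of Step 6 on the set of target nodes can be sketched as follows. This is a minimal illustration with an explicit stack instead of recursion. The helper names and the example nodes other than "fixed point" are hypothetical, and the dependency dictionary is assumed to map each node to its prerequisites.

```python
def mark_mastered(known_list, node):
    """Return a capability list that includes node, without duplicates."""
    return known_list if node in known_list else known_list + [node]

def remaining_targets(need_list, known_list, a_dict):
    """Iterative depth-first search for nodes that still need learning."""
    known, seen, targets = set(known_list), set(), []
    stack = list(reversed(need_list))
    while stack:
        node = stack.pop()
        if node in known or node in seen:
            continue  # mastered or already collected
        seen.add(node)
        targets.append(node)
        # Push prerequisites so that the first one is visited next.
        stack.extend(reversed(a_dict.get(node, [])))
    return targets

# Hypothetical dependencies for the embodiment's learning objective.
a_dict = {
    "stability of fixed points": ["fixed point", "linearization"],
    "fixed point": ["vector field"],
}
need_list = ["stability of fixed points"]
known_list = []
before = remaining_targets(need_list, known_list, a_dict)
known_list = mark_mastered(known_list, "fixed point")
after = remaining_targets(need_list, known_list, a_dict)
```

After "fixed point" is marked as mastered, both that node and the prerequisite reachable only through it drop out of the target list, so the regenerated path is shorter.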

[0123] Except for the technical features described in the specification, all other technologies are known to those skilled in the art. Descriptions of well-known components and technologies are omitted in this invention to avoid redundancy and unnecessary limitation. The embodiments described above do not represent all embodiments consistent with this application. Various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of this invention are still within the protection scope of this invention.

Claims

1. A method for constructing a knowledge graph and personalized learning paths, characterized in that it includes the following steps:
Step 1: Construct a tree-like knowledge graph for the Markdown text uploaded by the user. Each node in the graph represents a title, and the content of a title node is the text content between that title and the next sibling title;
Step 2: Use a large language model to traverse each title node, extract concept nodes from the node content, and uniformly set the relationship between paragraphs and the concept nodes extracted from those paragraphs as a reference relationship; calculate the similarity between all concepts, cluster the concepts according to the similarity, and merge concepts that are clustered together into one concept node;
Step 3: For each clustered concept node, obtain the context content of the title node to which each concept in the node belongs, use the large language model to determine whether the relationship between the title node and the clustered concept node is a definition or a reference, and correct the direction of the directed edge between the title node and the concept node;
Step 4: The user inputs learning objectives and a content style. The system searches the text uploaded by the user for sentences that match the user's learning objectives, finds the top K sentences with the highest similarity, and obtains the associated title nodes and concept nodes. Then, the system searches for target nodes based on the user's capability graph and the associated nodes. The capability graph records the concept nodes and title nodes that the user has mastered. The target nodes are sorted based on the improved PageRank algorithm to generate a learning path, where K is a positive integer;
The method for generating the learning path includes: denoting the subgraph corresponding to all target nodes as graph; for each target node k, if node k has no parent node in the subgraph, setting the PageRank value of node k to 1, otherwise setting the PageRank value of node k to 0; then executing the iteration process: initializing the weight of each target node to 0; in the t-th iteration, for each target node k, obtaining the parent nodes of node k in the subgraph; if node k has no parent node, updating the PageRank value of node k by incrementing it by 1, i.e., $x_{k,t} = x_{k,t-1} + 1$, where $x_{k,t}$ denotes the PageRank value of node k in the t-th iteration; if node k has parent nodes, obtaining the number of child nodes and the PageRank value of each parent node; letting the number of parent nodes of target node k be N and the PageRank value of the i-th parent node be $p_{k,i,t}$, the weight of target node k is updated to $\sum_{i=1}^{N} p_{k,i,t} / N^{\alpha}$, and the PageRank value of target node k is updated as $x_{k,t} = r_1 x_{k,t-1} + r_2 \sum_{i=1}^{N} p_{k,i,t} / N^{\alpha}$, where $r_1$ denotes the retention rate of mastery, $r_2$ denotes the conversion rate of mastery, and $\alpha$ denotes the weight of the difficulty increase; after the set number of iterations, sorting the target nodes in descending order of their PageRank values to generate the learning path;
Step 5: Based on the acquired learning path and the content style input by the user, construct prompt words, and use a large language model to generate the required learning content from the uploaded text;
Step 6: After learning, the user marks the corresponding target node as mastered, updates the capability graph, and then returns to step 4 to update the target nodes and regenerate the learning path.

2. The method according to claim 1, characterized in that, In step 2, each title node is traversed, prompt words are constructed, all concepts appearing in the title content are extracted using a large language model, each concept is represented as an embedding vector using a word embedding model, the cosine similarity between pairwise word vectors is calculated, and the concepts are clustered based on the similarity.

3. The method according to claim 1, characterized in that, In step 3, the paragraph node containing each concept in the clustered concept nodes is obtained, and the relationship between the paragraph node and the concept node is determined. If the relationship between the paragraph node and the concept node is a definition relationship, it means that the concept is defined in the content of the corresponding title node, and there is a directed edge from the title node to the concept node. If the relationship between the paragraph node and the concept node is a reference relationship, it means that the concept is referenced in the content of the corresponding title node, and there is a directed edge from the concept node to the title node.

4. The method according to claim 1, characterized in that, in step 4, a depth-first search method is used to find the target nodes, including: (4.1) let known_list be the user's capability graph; let need_list be the list storing the concept nodes and title nodes associated with the learning objective obtained from the uploaded text; let the dictionary a_dict record the dependency relationships of each node in the graph of the text; (4.2) traverse the list need_list, taking one node from it at a time; (4.3) denote the currently taken node as a_n and first determine whether node a_n is in the list known_list; if it is, the user has mastered the corresponding concept or chapter content, and the node is skipped; if it is not, output node a_n to the target node list and look up in the dictionary a_dict the nodes on which node a_n depends; (4.4) if node a_n depends on no other node, take the next node in need_list and return to (4.3); otherwise, perform (4.3) on each node on which a_n depends; (4.5) finally, output the target node list.

5. The method according to claim 1, characterized in that, In step 4, the user's learning objective is embedded; all sentences in each paragraph of the uploaded text are segmented and embedded; the embedded representation of the learning objective is compared with the sentence vectors of the uploaded text to obtain the top K sentences with high similarity, and the associated concept nodes are obtained from the corresponding paragraph nodes.

6. A system for constructing knowledge graphs and personalized learning paths, characterized in that it includes a graph analysis platform, a graph content display platform, and a large language model platform;
The graph analysis platform includes a knowledge graph construction module, an input module, a target node search module, a learning path generation module, and a capability graph update module. The knowledge graph construction module constructs a tree-like knowledge graph from the Markdown text uploaded by the user. Each node in the graph represents a title, and the content of each title node is the text content between that title and the next title. The input module is used to obtain the user's learning objectives and content style, and to obtain the user's capability graph, which records the concept nodes and title nodes that the user has mastered. In the target node search module, sentences matching the user's learning objectives are first searched for in the user-uploaded text, the top K sentences with the highest similarity are found, and the associated title nodes and concept nodes are obtained; then, target nodes are searched for based on the user's capability graph and the associated nodes. The learning path generation module sorts the target nodes based on the improved PageRank algorithm to generate a learning path. The capability graph update module marks a target node as mastered and updates the capability graph after the user has completed the learning content for that target node; where K is a positive integer;
The learning path generation module generates the learning path as follows: denote the subgraph corresponding to all target nodes as graph; for each target node k, if node k has no parent node in the subgraph, set the PageRank value of node k to 1, otherwise set the PageRank value of node k to 0; then execute the iteration process: initialize the weight of each target node to 0; in the t-th iteration, for each target node k, obtain the parent nodes of node k in the subgraph; if node k has no parent node, update the PageRank value of node k by incrementing it by 1, i.e., $x_{k,t} = x_{k,t-1} + 1$, where $x_{k,t}$ denotes the PageRank value of node k in the t-th iteration; if node k has parent nodes, obtain the number of child nodes and the PageRank value of each parent node; letting the number of parent nodes of target node k be N and the PageRank value of the i-th parent node be $p_{k,i,t}$, the weight of target node k is updated to $\sum_{i=1}^{N} p_{k,i,t} / N^{\alpha}$, and the PageRank value of target node k is updated as $x_{k,t} = r_1 x_{k,t-1} + r_2 \sum_{i=1}^{N} p_{k,i,t} / N^{\alpha}$, where $r_1$ denotes the retention rate of mastery, $r_2$ denotes the conversion rate of mastery, and $\alpha$ denotes the weight of the difficulty increase; after the set number of iterations, the target nodes are sorted in descending order of their PageRank values to generate the learning path;
The large language model platform includes a concept node extraction module, a concept clustering module, a relationship correction module, and a learning content generation module. The concept node extraction module uses the large language model to traverse the content of each title node and extract concept nodes. The concept clustering module calculates the similarity between all concepts, clusters concepts based on similarity, merges concepts that belong to the same cluster into a single concept node, and associates the concept node with the corresponding title node. The relationship correction module obtains, for each concept node, the context content of the title node to which each concept belongs, uses the large language model to determine whether the relationship between the title node and the clustered concept node is a definition or a reference relationship, and corrects the direction of the directed edge between the title node and the concept node.
The learning content generation module constructs prompt words based on the obtained learning path and the content style input by the user, and uses the large language model to generate learning content of the required style from the uploaded text. The graph content display platform displays the graph of the uploaded text, the subgraph of the target node, and the generated learning path.

7. The system according to claim 6, characterized in that the target node search module uses a depth-first search method to find the target nodes. The implementation process includes: (4.1) let known_list be the user's capability graph; let need_list be the list storing the concept nodes and title nodes associated with the learning objectives obtained from the uploaded text; let the dictionary a_dict record the dependency relationships of each node in the graph of the text; (4.2) traverse the list need_list, taking one node from it at a time; (4.3) denote the currently taken node as a_n and first determine whether node a_n is in the list known_list; if it is, the user has mastered the corresponding concept or chapter content, and the node is skipped; if it is not, output node a_n to the target node list and look up in the dictionary a_dict the nodes on which node a_n depends; (4.4) if node a_n depends on no other node, take the next node in need_list and return to (4.3); otherwise, perform (4.3) on each node on which a_n depends; (4.5) finally, output the target node list.