Method for hierarchical construction of Chinese education knowledge graph based on semantic association

By extracting resource semantic vectors from user learning interaction data and combining them with a forgetting curve model, the high-cost migration problem during version updates of Chinese educational knowledge graphs is solved, enabling dynamic reconstruction and accurate evaluation of user state views and supporting cross-version teaching capability comparison.

CN122242693A · Pending · Publication Date: 2026-06-19 · YUNNAN NORMAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
YUNNAN NORMAL UNIV
Filing Date
2026-05-14
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing Chinese educational knowledge graphs require a full migration of historical data during version updates, resulting in high computational resource consumption and data distortion, making it impossible to achieve cross-version teaching comparisons and fair evaluations.

Method used

By acquiring user learning interaction data, extracting resource semantic vectors and storing them in an event log library independent of the knowledge graph hierarchy, calculating user state views using the association rule set of the target graph version, and adjusting contribution weights using a forgetting curve model, dynamic reconstruction is achieved.

Benefits of technology

The graph update can be completed without migrating all historical data, which keeps the evaluation accurate and well-founded, supports cross-version comparison of teaching ability, and meets the business needs of multiple coexisting evaluation systems.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a hierarchical construction method for Chinese educational knowledge graphs based on semantic association, relating to the field of educational information technology. The method includes the following steps: acquiring user learning interaction data to obtain a first user behavior log, and storing the first user behavior log in an event log database; responding to a user status query request for a target graph version, obtaining the version identifier of the target graph version, and loading the corresponding first graph association rule set from the configuration center based on the version identifier; retrieving a second user behavior log corresponding to the user from the event log database based on the first graph association rule set; determining the contribution weight of each behavior log to each knowledge node; and calculating the user's status value for each knowledge node under the target graph version based on the contribution weight and the behavior results of the behavior log, generating a first user status view. This invention solves the problem of high-cost migration and business interruption caused by graph updates in the prior art.

Description

Technical Field

[0001] This invention relates to the field of educational information technology, specifically a hierarchical construction method for Chinese educational knowledge graphs based on semantic association.

Background Technology

[0002] In modern adaptive learning platforms for Chinese language education (such as Chinese language, classical Chinese, and writing), knowledge graphs are the core data structures used to depict the knowledge system in the Chinese language field (such as characters, words, sentences, paragraphs, rhetoric, and literary knowledge) and their internal logical connections. Student ability profiles built based on Chinese education knowledge graphs are an important foundation for realizing personalized reading recommendations, writing tutoring, and accurate learning diagnosis.

[0003] Currently, such systems generally adopt a "snapshot-migration" model. Under this model, the system calculates and aggregates students' interactive behavior data from Chinese learning in real time (such as choosing word definitions, answering questions about classical Chinese punctuation, and answering questions about the main idea of an article) based on the fixed structure of the current version of the Chinese education knowledge graph. The calculation results (such as a student's mastery of the usage of specific classical Chinese function words) are stored directly in the database as the final state snapshot. This state snapshot is tightly bound to the structure of the Chinese education knowledge graph that generated it, forming a single, static view of the student's Chinese ability assessment. This technical solution has long supported the operation of many Chinese learning analysis applications.

[0004] However, the Chinese education knowledge system evolves continuously. Textbook revisions and curriculum standard updates lead to structural version changes in the Chinese education knowledge graph. The existing "snapshot-migration" model binds the user's Chinese learning status snapshot to the old version of the graph. Adapting to the new graph requires migrating a massive amount of historical data in its entirety, which not only consumes substantial computing and storage resources but may also distort the status data under the new graph structure. At the same time, the overwriting update loses the original status view under the old graph, making cross-version teaching comparison and fair evaluation impossible.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention provides a hierarchical construction method for Chinese educational knowledge graphs based on semantic association.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution: a hierarchical construction method for Chinese educational knowledge graphs based on semantic association, comprising the following steps:

S1: Obtain user learning interaction data, extract resource features from the learning interaction data to obtain a first user behavior log containing resource semantic vectors, and store the first user behavior log in the event log library. The data structure of the first user behavior log is independent of the hierarchical topology of any version of the knowledge graph.

S2: In response to the user status query request of the target graph version, obtain the version identifier of the target graph version, and load the corresponding first graph association rule set from the configuration center according to the version identifier. The first graph association rule set includes the feature vector data of each knowledge node in the knowledge graph of this version and the hierarchical topology between nodes.

S3: Based on the first graph association rule set, retrieve the second user behavior log corresponding to the user from the event log library.

S4: Calculate the semantic similarity between the resource semantic vector of each behavior log in the second user behavior log and the feature vector data of each knowledge node in the target graph version, and determine the contribution weight of each behavior log to each knowledge node.

S5: Based on the contribution weight and the behavior results of the behavior log, calculate the status value of each knowledge node of the user under the target graph version, and generate the first user status view.

[0007] As a preferred technical solution of the present invention, obtaining the first user behavior log containing resource semantic vectors specifically includes the following steps: Collect users' answer records, video viewing records, or reading records in the teaching system as learning interaction data; The learning interaction data is input into a pre-trained semantic analysis model for encoding, multi-dimensional content features are extracted, and the resource semantic vector is generated. The resource semantic vector is combined with the user identifier, generation timestamp, and operation type identifier in the learning interaction data to construct the first user behavior log.

[0008] As a preferred technical solution of the present invention, step S4 specifically includes the following steps: Calculate the cosine similarity between the resource semantic vector and the feature vector data. Set a first preset threshold: when the cosine similarity value is less than the first preset threshold, the contribution weight is set to zero; when the cosine similarity value is greater than or equal to the first preset threshold, the cosine similarity value is normalized to obtain an initial weight value. Obtain the generation timestamp of the behavior log, calculate the time decay coefficient based on the difference between the generation timestamp and the current system time, and use the time decay coefficient to correct the initial weight value to obtain the contribution weight of the behavior log for the corresponding knowledge node.

[0009] As a preferred embodiment of the present invention, the calculation of the time decay coefficient is based on the forgetting curve model, which is constructed according to the learning and memory patterns of Chinese educational knowledge, and the time difference is negatively correlated with the decay coefficient.
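As a non-limiting illustration, the time decay correction can be sketched as follows. The text only requires that the decay coefficient decrease as the time difference grows; the exponential form and the `stability_days` parameter below are assumptions chosen for illustration, not a formula disclosed in this application.

```python
import math


def time_decay(elapsed_days: float, stability_days: float = 30.0) -> float:
    """Hypothetical forgetting-curve coefficient.

    Returns 1.0 when no time has elapsed and decreases monotonically
    afterwards (exponential form is an assumption).
    """
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    return math.exp(-elapsed_days / stability_days)


def contribution_weight(initial_weight: float, elapsed_days: float,
                        stability_days: float = 30.0) -> float:
    """Correct the initial weight value with the time decay coefficient."""
    return initial_weight * time_decay(elapsed_days, stability_days)
```

Under these assumptions, a behavior log with an initial weight of 0.8 keeps its full weight on the day it is generated and contributes progressively less as it ages.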

[0010] As a preferred embodiment of the present invention, calculating the state value of each knowledge node of the user under the target graph version specifically includes the following steps: In the target graph version, identify the set of leaf nodes and the set of parent nodes that have a hierarchical inclusion relationship with the set of leaf nodes. Based on the contribution weight, calculate the basic state value of each leaf node of the user in the leaf node set. Obtain the hierarchical aggregation weight table. Based on the hierarchical aggregation weight table, weight and aggregate the basic state values of the leaf nodes from bottom to top to obtain the comprehensive state value of the parent node.
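The bottom-up aggregation can be sketched as below. The layout of the hierarchical aggregation weight table (a mapping from each parent node to its children and their weights), the normalization by total weight, and the node names are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical weight table: parent node -> [(child node, aggregation weight)]
WeightTable = Dict[str, List[Tuple[str, float]]]


def aggregate_state(node: str, table: WeightTable,
                    leaf_state: Dict[str, float]) -> float:
    """Return the state value of `node`, aggregating children bottom-up."""
    if node not in table:
        # Leaf node: use its basic state value (0.0 if no evidence exists).
        return leaf_state.get(node, 0.0)
    total_weight = sum(weight for _, weight in table[node])
    if total_weight == 0:
        return 0.0
    weighted = sum(weight * aggregate_state(child, table, leaf_state)
                   for child, weight in table[node])
    return weighted / total_weight


table = {"classical_chinese": [("function_words", 0.6), ("punctuation", 0.4)]}
leaves = {"function_words": 0.5, "punctuation": 1.0}
parent_value = aggregate_state("classical_chinese", table, leaves)
```

Because the function recurses through the table, the same sketch handles graphs with more than two levels.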

[0011] As a preferred embodiment of the present invention, the method further includes a cache management mechanism, specifically comprising the following steps: The first user status view is stored in the cache database, and the graph version number, generation time and knowledge node set on which the first user status view depends are recorded. When a new user status query request is received, it is determined whether a second user status view matching the current user and the target graph version exists in the cache database; If the second user status view exists and is not marked as invalid, then the second user status view is output directly. If the second user status view does not exist or the second user status view has been marked as invalid, a recalculation process is triggered.

[0012] As a preferred embodiment of the present invention, the marking as invalid is triggered based on at least one of the following conditions: a knowledge graph version change has been detected, and the change involves any node in the knowledge node set on which the second user status view depends; the difference between the current system time and the generation time of the second user status view exceeds the preset validity period; or the real-time confidence level of the second user status view calculated based on the forgetting curve model is lower than the preset confidence threshold.
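A minimal sketch of the cache lookup and the three invalidation conditions follows. The class name `CachedView`, its fields, and the function signatures are hypothetical; the confidence value is passed in as a number because the text does not specify how it is derived from the forgetting curve model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple


@dataclass
class CachedView:
    user_id: str
    graph_version: str
    generated_at: float            # epoch seconds
    depends_on: FrozenSet[str]     # knowledge nodes the view depends on
    state: dict
    invalid: bool = False


def should_invalidate(view: CachedView, now: float, changed_nodes: Set[str],
                      validity_seconds: float, confidence: float,
                      min_confidence: float) -> bool:
    """True if at least one of the three invalidation conditions holds."""
    touched_by_change = bool(view.depends_on & changed_nodes)
    expired = (now - view.generated_at) > validity_seconds
    low_confidence = confidence < min_confidence
    return touched_by_change or expired or low_confidence


def lookup(cache: Dict[Tuple[str, str], CachedView], user_id: str,
           graph_version: str) -> Optional[CachedView]:
    """Return a usable cached view, or None to signal recalculation."""
    view = cache.get((user_id, graph_version))
    if view is None or view.invalid:
        return None
    return view
```

A caller that receives `None` from `lookup` would trigger the recalculation process described above.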

[0013] As a preferred embodiment of the present invention, when the invalid marking is triggered, the method further includes a cache recovery step, specifically comprising the following steps: Count the user's system access frequency within a preset period. If the access frequency is greater than the activity threshold, the first graph association rule set corresponding to the current graph version is called asynchronously in the background to pre-calculate and update the first user status view of the user. If the access frequency is less than or equal to the activity threshold, the invalid state is maintained until a real-time query request from the user is received, at which point the calculation is performed synchronously.
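The recovery decision can be illustrated with the short sketch below. The function names and the returned action labels are hypothetical, and access frequency is modeled simply as the number of accesses inside the preset period.

```python
from typing import Iterable


def access_frequency(access_times: Iterable[float], now: float,
                     period_seconds: float) -> int:
    """Count accesses that fall inside the preset period ending at `now`."""
    return sum(1 for t in access_times if 0 <= now - t <= period_seconds)


def recovery_action(frequency: int, activity_threshold: int) -> str:
    """Choose between background pre-calculation and lazy recalculation."""
    if frequency > activity_threshold:
        return "precompute_async"   # active user: refresh in the background
    return "wait_for_query"         # inactive user: recompute on next query
```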

[0014] As a preferred embodiment of the present invention, the method further includes a multi-version parallel computing step, specifically comprising the following steps: Receive a comparison query request containing the first version identifier and the second version identifier; Load the first graph association rule set corresponding to the first version identifier and the second graph association rule set corresponding to the second version identifier, respectively; For the same second user behavior log, a first calculation process based on the first graph association rule set and a second calculation process based on the second graph association rule set are executed in parallel. Output the state view based on the first version and the state view based on the second version respectively, and calculate the numerical difference between the two state views on the same knowledge dimension.
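As a sketch of the multi-version parallel computing step, the same behavior logs can be evaluated against two rule sets concurrently. The `compute_view` callable stands in for steps S4 and S5 and is a placeholder; treating node identifiers shared by both views as "the same knowledge dimension" is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple

View = Dict[str, float]  # knowledge node id -> state value


def compare_versions(logs: Any, rules_v1: Any, rules_v2: Any,
                     compute_view: Callable[[Any, Any], View]
                     ) -> Tuple[View, View, Dict[str, float]]:
    """Run both calculation processes in parallel and diff the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_v1 = pool.submit(compute_view, logs, rules_v1)
        future_v2 = pool.submit(compute_view, logs, rules_v2)
        view_v1, view_v2 = future_v1.result(), future_v2.result()
    # Only nodes present in both versions are compared (assumption).
    shared = view_v1.keys() & view_v2.keys()
    difference = {node: view_v2[node] - view_v1[node] for node in shared}
    return view_v1, view_v2, difference
```

Because the logs are read-only inputs, the two calculation processes do not need to coordinate with each other.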

[0015] As a preferred embodiment of the present invention, the method further includes a model optimization step: Receive feedback correction instructions for the first user status view, the instructions including correction parameters for the knowledge node status values. Generate training samples based on the difference between the correction parameters and the state values of each knowledge node. Use the training samples to train the semantic analysis model for extracting resource semantic vectors, and update the parameters of the semantic analysis model.

[0016] The beneficial effects of this invention are: 1. In this invention, the first user behavior log, whose data structure is independent of the hierarchical topology of any version of the knowledge graph, is stored in the event log library. During the query, the corresponding first graph association rule set is loaded according to the version identifier of the target graph version. This decouples the user's historical data from the graph structure. This design allows the historical first user behavior log to be dynamically reconstructed without migrating the data when the hierarchical topology of the Chinese education knowledge graph changes. This solves the problem of high-cost migration and business interruption caused by graph updates in the prior art.

[0017] 2. In this invention, the semantic similarity between resource semantic vectors and feature vector data is used to determine the contribution weight, and the initial weight value is corrected by combining the time decay coefficient based on the forgetting curve model. This avoids the evaluation distortion caused by simple mapping when splitting or merging Chinese knowledge points, and ensures that the generated first user status view can accurately reflect the user's true semantic mastery level under the current target graph version. This effectively improves the accuracy and scientific nature of Chinese education learning diagnosis in the graph evolution process.

[0018] 3. This invention, through multi-version parallel computing steps, can load the first graph association rule set corresponding to different versions in parallel based on the same second user behavior log, and output the state view based on the first version and the state view based on the second version respectively. This solves the problem of loss of historical evaluation perspective caused by traditional overlay updates, supports the generation of ability comparison charts, and enables educators to intuitively compare the ability differences of the same student under the old and new teaching standards of Chinese education, thus meeting the business needs of multiple evaluation systems coexisting.

Description of the Drawings

[0019] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings: Figure 1 is a schematic diagram of the overall workflow of the present invention; Figure 2 is a diagram illustrating the comparison of user capabilities based on different versions of the graph.

Detailed Implementation

[0020] The technical solutions of this application will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this application, and not all embodiments. The components of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0021] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, in the description of this application, terms such as "first," "second," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0022] As shown in Figure 1, a hierarchical construction method for Chinese educational knowledge graphs based on semantic association is proposed. It should be noted that this method is implemented in an educational information technology system, which mainly includes a data acquisition module, an event log library, a configuration center, a computing engine, and a query service module.

[0023] The data acquisition module is used to capture various learning interaction data of users on the teaching platform in real time. The event log library is a persistent storage component that supports append-only writes and is used to store structured first user behavior logs. The configuration center is a storage management system used to maintain and manage the first graph association rule set corresponding to different versions of the knowledge graph. The computing engine is the core processing unit of the system, used to perform calculation tasks such as semantic similarity calculation, weight allocation and state value aggregation. The query service module receives external user state query requests and coordinates various components to complete the generation and return of state views.

[0024] This method includes the following steps: S1: Obtain user learning interaction data, extract resource features from the learning interaction data to obtain a first user behavior log containing resource semantic vectors, and store the first user behavior log in the event log library. The data structure of the first user behavior log is independent of the hierarchical topology of any version of the knowledge graph. This step is the raw data collection and standardization stage, which aims to transform users' specific learning behaviors into standardized logs that the system can process and that are independent of the knowledge graph version.

[0025] First, the system acquires user learning interaction data through the data acquisition module. Then, the system extracts resource features from the learning interaction data. This process is completed by calling a pre-trained semantic analysis model in the education domain. The resource semantic vector represents the inherent and abstract knowledge content semantics of the learning resource itself. Then, the system combines and encapsulates the extracted resource semantic vector with the user identifier, generation timestamp, and operation type identifier in the original learning interaction data to construct a structured data record, namely the first user behavior log.

[0026] The data structure of this log is predefined, and its core fields include, but are not limited to: user ID, timestamp, resource ID, operation type, operation result, and resource semantic vector. The design of this data structure makes it completely independent of the node division, hierarchical relationship and other topological structures of any version of the knowledge graph, thereby achieving decoupling between the original behavioral facts and the graph interpretation rules.

[0027] Finally, the system writes the generated first user behavior log into the event log library. The event log library uses a data storage method that only supports append operations to ensure that all historical behavior records are permanently saved immutably, providing a unified and unique data source of facts for the state calculation of all subsequent versions.
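For illustration, the log record and the append-only storage described above might be modeled as follows. The class names and field types are assumptions; the field list follows the core fields named in the text, and an in-memory list stands in for persistent storage.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class BehaviorLog:
    """First user behavior log; contains no reference to any graph node."""
    user_id: str
    timestamp: float                      # epoch seconds
    resource_id: str
    operation_type: str                   # e.g. "answer", "video", "reading"
    operation_result: float               # e.g. 1.0 correct, 0.0 incorrect
    semantic_vector: Tuple[float, ...]    # resource semantic vector


class EventLogLibrary:
    """Append-only store: records can be added and read, never changed."""

    def __init__(self) -> None:
        self._records: List[BehaviorLog] = []

    def append(self, record: BehaviorLog) -> None:
        self._records.append(record)

    def __len__(self) -> int:
        return len(self._records)

    def __iter__(self) -> Iterator[BehaviorLog]:
        # Iterate over a snapshot so callers cannot mutate the store.
        return iter(tuple(self._records))
```

Freezing the record type and exposing no update or delete method mirrors the immutability requirement at the level of the sketch.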

[0028] S2: In response to the user status query request of the target graph version, obtain the version identifier of the target graph version, and load the corresponding first graph association rule set from the configuration center according to the version identifier. The first graph association rule set includes the feature vector data of each knowledge node in the knowledge graph of this version and the hierarchical topology between nodes. This step is the query initialization and rule preparation stage, the core of which is to dynamically select the corresponding knowledge graph interpretation rules based on the user's query intent.

[0029] When a teacher or student needs to view a learning report based on a specific version of a knowledge graph (such as the 2023 edition of junior high school mathematics published by People's Education Press), the query service module will receive a user status query request, which will explicitly include the version identifier of the target knowledge graph.

[0030] The system parses the request and obtains the version identifier. Subsequently, the query service module sends a request to the configuration center based on this version identifier. The configuration center pre-stores the first graph association rule set corresponding to each version of the knowledge graph. Each first graph association rule set is a data structure or configuration file that fully defines the two core elements of the knowledge graph for that version: first, the feature vector data (i.e., node semantic vector) of each knowledge node in the graph. This vector is in the same semantic space as the resource semantic vector generated in step S1, which facilitates the calculation of similarity; second, the hierarchical topology between all knowledge nodes, such as parent-child relationships, sibling relationships, etc., which defines the subordination and association between knowledge points.

[0031] Once the system has loaded the first graph association rule set matching the target graph version from the configuration center, the subsequent state calculations can proceed.
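A minimal sketch of a rule set and its lookup by version identifier is shown below. The names `GraphRuleSet` and `ConfigCenter`, the in-memory storage, and the version string used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GraphRuleSet:
    version_id: str
    # Feature vector of each knowledge node, in the same semantic space
    # as the resource semantic vectors.
    node_vectors: Dict[str, Tuple[float, ...]]
    # Hierarchical topology: parent node -> child nodes.
    children: Dict[str, List[str]]


class ConfigCenter:
    """Keeps one rule set per knowledge graph version."""

    def __init__(self) -> None:
        self._rule_sets: Dict[str, GraphRuleSet] = {}

    def register(self, rule_set: GraphRuleSet) -> None:
        self._rule_sets[rule_set.version_id] = rule_set

    def load(self, version_id: str) -> GraphRuleSet:
        if version_id not in self._rule_sets:
            raise LookupError(f"no rule set for graph version {version_id!r}")
        return self._rule_sets[version_id]


center = ConfigCenter()
center.register(GraphRuleSet(
    version_id="v2023",
    node_vectors={"function_words": (1.0, 0.0), "punctuation": (0.0, 1.0)},
    children={"classical_chinese": ["function_words", "punctuation"]},
))
rules = center.load("v2023")
```

Registering a new graph version adds a rule set without touching any stored behavior log.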

[0032] S3: Based on the first graph association rule set, retrieve the second user behavior log corresponding to the user from the event log library. This step is the data retrieval stage, the purpose of which is to locate the user's historical behavior data that needs to be included in the calculation based on the context of the current query.

[0033] After obtaining the first graph association rule set, the system knows the target graph structure for the computation. The computing engine generates data retrieval conditions based on the user identifier in the query request and possible query scope limitations (e.g., only calculating the seventh-grade algebra part), combined with the node range defined in the first graph association rule set.

[0034] Subsequently, the computing engine queries the event log library and retrieves all first user behavior logs related to the user within the specified time range based on conditions such as user ID and time range. These retrieved log subsets are the second user behavior logs that need to be processed in the current computing process. This step ensures that the computing is performed only on the relevant user and the relevant historical behavior data, thus improving processing efficiency.
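The retrieval by user identifier and time range can be sketched as below. The record type and function name are hypothetical, and restricting the result by the node range of the rule set is not modeled here.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    user_id: str
    timestamp: float      # epoch seconds
    resource_id: str


def retrieve_second_logs(records: Iterable[LogRecord], user_id: str,
                         start: Optional[float] = None,
                         end: Optional[float] = None) -> List[LogRecord]:
    """Select the user's logs inside the optional [start, end] time range."""
    return [
        r for r in records
        if r.user_id == user_id
        and (start is None or r.timestamp >= start)
        and (end is None or r.timestamp <= end)
    ]
```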

[0035] S4: Calculate the semantic similarity between the resource semantic vector of each behavior log in the second user behavior log and the feature vector data of each knowledge node in the target graph version, and determine the contribution weight of each behavior log to each knowledge node. This step is the semantic matching and weight allocation stage, which is the core link in dynamically interpreting the meaning of users' historical behavior under the new graph.

[0036] S5: Based on the contribution weight and the behavior results of the behavior log, calculate the status value of each knowledge node of the user under the target graph version, and generate the first user status view. This step is the state aggregation and view generation stage. Its goal is to comprehensively calculate the user's capability status under the current knowledge graph version based on the assigned weights and historical behavior results.

[0037] After completing the weight allocation of all relevant second user behavior logs, the system calculates the state value for each knowledge node in the target graph version, and finally obtains the user's mastery, proficiency or other defined state index values for each knowledge node under the current target graph version.
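The text does not fix a formula for combining contribution weights with behavior results. One plausible reading, shown here purely as an assumption, is a weighted mean of the behavior results per knowledge node; the function name and the 0-to-1 result scale are illustrative.

```python
from typing import List, Optional, Tuple


def node_state_value(contributions: List[Tuple[float, float]]) -> Optional[float]:
    """Weighted mean of behavior results for one knowledge node.

    `contributions` holds (contribution_weight, behavior_result) pairs.
    Returns None when no log contributes to the node.
    """
    total_weight = sum(weight for weight, _ in contributions)
    if total_weight == 0:
        return None
    return sum(weight * result for weight, result in contributions) / total_weight


# Two relevant logs: a strongly related correct answer and a weakly
# related incorrect one.
state = node_state_value([(0.8, 1.0), (0.2, 0.0)])
```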

[0038] The system organizes these discrete node state values according to the hierarchical topology defined in the first graph association rule set (for example, aggregating the leaf node states upwards to obtain the parent node states) to form a complete, structured data object, namely the first user state view. This view dynamically reflects the user's real-time capability profile under a specific version of the knowledge graph and is ultimately returned to the requester through the query service module.

[0039] Furthermore, obtaining the first user behavior log containing resource semantic vectors specifically includes the following three sequentially executed sub-steps, aimed at transforming the original user learning behavior into a standardized intermediate data format rich in semantic information. First, collect users' answer records, video viewing records, or reading records in the teaching system as learning interaction data.

Answer records refer to the data generated by users when completing Chinese exercises, tests, online quizzes, etc. A complete answer record usually includes the user identifier, the question resource identifier, the answer submitted by the user, the result judged by the system, and the timestamps of the start and end of the answer. For example, after a student solves an application problem about "solving a quadratic equation in one variable" in the math learning module, the system will record the above information of this interaction.

Video viewing records refer to the behavioral data generated when users watch Chinese teaching videos and micro-lessons. These records typically include user identifiers, video resource identifiers, start and end times of viewing, cumulative viewing time, and status information such as whether the viewing is completed. For example, if a student watches a video explaining "methods for appreciating classical Chinese poetry," the system will track their viewing progress.

Reading records refer to behavioral data generated when users read electronic textbooks, lessons, and supplementary materials. These records may include user identifiers, document resource identifiers, page numbers or chapter ranges read, dwell time, and interactive actions such as page turning or highlighting.

[0040] The learning interaction data is input into a pre-trained semantic analysis model for the education domain for encoding, and multi-dimensional content features are extracted to generate the resource semantic vector. The system is equipped with a pre-trained semantic analysis model for the education domain. This model was trained on a large scale on Chinese educational content (covering subjects such as mathematics, Chinese language, physics, and chemistry) and is able to understand the knowledge concepts, skill requirements, and thinking methods behind the teaching content.

[0041] For the collected learning interaction data, the system first extracts the associated teaching resource content. For example, for a question answer record, the system obtains the complete text and possible chart information such as the question stem, options, and solution process based on the question resource identifier. For video or reading records, the system obtains the corresponding video subtitles, lecture text, or document content. Subsequently, the system inputs the text of these resource contents (or combined with multimedia features) into the semantic analysis model in the education field. The model performs deep semantic understanding and encoding on the input content, and extracts multi-dimensional content features that can represent the core knowledge connotation of the resource through its internal neural network layer. The model ultimately outputs a fixed-length numerical array, namely the resource semantic vector. This vector is located in a high-dimensional semantic space, and the value of each dimension represents the strength or relevance of the resource in a certain abstract semantic concept. For example, a comprehensive mathematical problem may be encoded as a vector with high values in multiple implicit semantic dimensions such as "function graph analysis", "algebraic operation" and "practical application modeling".

[0042] Crucially, the generation of this resource semantic vector relies entirely on the resource content itself, without involving mapping or matching with any specific version of the knowledge graph node, thus achieving an independent representation of the content semantics.

[0043] The resource semantic vector is combined with the user identifier, generation timestamp, and operation type identifier in the learning interaction data to construct the first user behavior log. In this step, the system combines and encapsulates the following types of information according to a predefined, unified data structure:

User identifier: derived from the raw learning interaction data and used to uniquely identify the student who performed the behavior.

Generation timestamp: derived from the raw learning interaction data, accurately recording the moment when the behavior occurred.

Operation type identifier: used to distinguish the category of behavior, such as answering questions, watching videos, or reading, so that subsequent calculations can account for the different effects of each behavior type on the state.

Resource semantic vector: the vector generated in the preceding sub-step.

[0044] Through the above combination, the system constructs a structured data record, namely the first user behavior log, which fully describes who (user identifier), when (timestamp), how (operation type), on what content (resource semantic vector), and with what result (operation result) in a learning interaction.

[0045] This data structure is designed to be independent of the hierarchical topology of any version of the knowledge graph, meaning it does not contain references to any specific graph node. This decouples user behavior facts from graph interpretation rules at the data level. Finally, the system persistently stores this first user behavior log in the event log library for use in the status query process of any subsequent version.

[0046] Furthermore, step S4 specifically includes the following steps: Calculate the cosine similarity between the resource semantic vector and the feature vector data; When the computing engine executes step S4, for a behavior log retrieved from the second user behavior log, the system first extracts the resource semantic vector contained therein. At the same time, for each knowledge node in the target graph version (i.e. the knowledge graph version on which the current query is based), the system obtains its corresponding feature vector data from the loaded first graph association rule set. This feature vector data is in the same semantic vector space as the resource semantic vector.

[0047] Subsequently, the system calculates the cosine similarity between the semantic vector of the resource and the feature vector data of a target knowledge node. Cosine similarity is an indicator that measures the degree of similarity between the directions of two vectors. Its value ranges from -1 to 1. The closer the value is to 1, the more similar the semantic content represented by the two vectors is. The system obtains this value through the standard vector dot product and modulus calculation formula. This calculation process is performed one by one for all relevant knowledge nodes in the target graph version, thereby establishing an initial quantitative correlation index between the current behavior log and each node.
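The dot product and modulus calculation mentioned above can be sketched as follows. The function name and the handling of zero vectors are assumptions made for this illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must share the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

In the method, this function would be called once per (behavior log, knowledge node) pair, with the resource semantic vector as one argument and the node's feature vector data as the other.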

[0048] A first preset threshold is set. When the cosine similarity value is less than the first preset threshold, the contribution weight is reset to zero. This step introduces a filtering mechanism designed to filter out influences with low semantic relevance that may be noise or irrelevant. The system has a first preset threshold (for example, the threshold can be set to 0.5 or 0.6), which is a configurable parameter used to define the boundary of significant relevance.

[0049] For the cosine similarity value calculated in the previous step for a specific knowledge node, the system compares it with a first preset threshold. If the cosine similarity value is less than the first preset threshold, it is determined that the semantic relevance between the current behavior log and the knowledge node is insufficient, and the current learning behavior should not affect the state value of the node. Therefore, the system directly sets the contribution weight of the behavior log to the node to zero. This ensures that only knowledge nodes with sufficient semantic relevance to the learning content will be included in the subsequent state calculation, improving the accuracy and relevance of state evaluation.

[0050] When the cosine similarity value is greater than or equal to the first preset threshold, the cosine similarity value is normalized to obtain an initial weight value. All knowledge nodes whose cosine similarity values are greater than or equal to the first preset threshold constitute the set of related nodes.

[0051] The system normalizes the cosine similarity values of these nodes. The preferred normalization method is: sum the cosine similarity values of all related nodes to obtain a total $S$; then divide the cosine similarity value $S_i$ of each node by the total $S$ to obtain the initial weight value corresponding to that node, i.e., $W_i = S_i / S$, where $W_i$ is the initial weight value corresponding to node $i$.

[0052] Normalization ensures that, for a single behavior log, the sum of the initial weight values of all related nodes is 1. This indicates that the value contained in the learning behavior (such as a score for answering a question) is allocated to each node in proportion to its semantic relevance.

[0053] For example, if a history question has a similarity of 0.8 with node A and a similarity of 0.4 with node B (both greater than a first preset threshold that is assumed to be 0.3 in this example), then after normalization, the initial weight value of node A is 0.8 / (0.8 + 0.4) ≈ 0.67, and the initial weight value of node B is 0.4 / (0.8 + 0.4) ≈ 0.33.
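The threshold filter and normalization can be reproduced in a few lines. The function name is hypothetical, and node C with similarity 0.1 is added here only to show the filtering branch; nodes A and B use the values from the example above.

```python
from typing import Dict


def initial_weights(similarities: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Zero out nodes below the threshold, then normalize the rest to sum to 1."""
    related = {node: s for node, s in similarities.items() if s >= threshold}
    total = sum(related.values())
    weights = {node: 0.0 for node in similarities}
    if total > 0.0:
        for node, s in related.items():
            weights[node] = s / total
    return weights


# Nodes A and B follow the worked example; node C is a hypothetical addition.
weights = initial_weights({"A": 0.8, "B": 0.4, "C": 0.1}, threshold=0.3)
```

Here `weights["A"]` is about 0.67, `weights["B"]` is about 0.33, and `weights["C"]` is 0 because its similarity falls below the threshold.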

[0054] Obtain the generation timestamp of the behavior log, calculate the time decay coefficient based on the difference between the generation timestamp and the current system time, and use the time decay coefficient to correct the initial weight value to obtain the contribution weight of the behavior log for the corresponding knowledge node. This step incorporates a time factor to simulate the objective law that learning effectiveness naturally decays over time, making the state assessment more consistent with the actual memory curve.

[0055] The system reads the generation timestamp from the currently processed behavior log and calculates the difference between the timestamp and the current system time when the status query was performed, i.e., the time interval Δt.

[0056] Then, the system calculates a time decay coefficient based on the time interval Δt according to the preset time decay model. This coefficient is usually a value between 0 and 1 and is a monotonically decreasing function of the time interval Δt. That is, the longer the time interval (meaning that the historical learning behavior occurred a long time ago), the smaller the time decay coefficient value calculated by the forgetting curve model becomes, indicating that the impact of the historical behavior on the current state should be weakened.

[0057] This relationship ensures that the corrected contribution weights reflect the effect of time and give greater weight to recent Chinese learning behaviors. For example, a classical Chinese reading exercise completed yesterday has a large time decay coefficient because the time interval Δt is short, so its contribution weight to the current classical Chinese reading ability status value remains relatively high. By contrast, behaviors that occurred long ago, such as a Chinese character dictation done several months ago, have a small time decay coefficient, so their contribution weight to the current vocabulary mastery status value is significantly weakened.

[0058] Finally, the system multiplies the initial weight value for a knowledge node obtained in the previous step by the calculated time decay coefficient to obtain the corrected final value, which is the contribution weight of the behavior log for that knowledge node.
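A minimal sketch of the time correction follows. The text only requires a coefficient between 0 and 1 that decreases monotonically with Δt; the exponential form, the `stability_days` parameter, and its default of 30 days are assumptions chosen for illustration, not the model fixed by the method.

```python
import math

SECONDS_PER_DAY = 86400.0


def time_decay(delta_t_days: float, stability_days: float = 30.0) -> float:
    """Assumed exponential forgetting curve: 1 at delta_t = 0, decreasing toward 0."""
    if delta_t_days < 0:
        raise ValueError("delta_t_days must be non-negative")
    return math.exp(-delta_t_days / stability_days)


def contribution_weight(initial_weight: float, log_timestamp: float, now: float,
                        stability_days: float = 30.0) -> float:
    """Initial weight multiplied by the time decay coefficient."""
    delta_t_days = max(0.0, (now - log_timestamp) / SECONDS_PER_DAY)
    return initial_weight * time_decay(delta_t_days, stability_days)


now = 1_700_000_000.0  # hypothetical query time (Unix seconds)
recent = contribution_weight(0.67, now - 1 * SECONDS_PER_DAY, now)   # yesterday
old = contribution_weight(0.67, now - 90 * SECONDS_PER_DAY, now)     # ~3 months ago
```

With these assumed parameters, `recent` stays close to the initial weight of 0.67 while `old` is much smaller, matching the qualitative behavior described above.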

[0059] This correction enables the system to achieve the effect that recent learning behaviors contribute more to the state and long-term learning behaviors contribute less, so that the final generated state view can dynamically reflect the user's latest and most relevant ability level.

[0060] Furthermore, the calculation of the time decay coefficient is based on a preset forgetting curve model, which is constructed according to the learning and memory patterns of Chinese educational knowledge, and the time difference is negatively correlated with the decay coefficient.

[0061] The forgetting curve model has a clear subject-specific focus and is constructed based on the learning and memory patterns of Chinese education knowledge. During its construction, the characteristics of the various types of knowledge within the Chinese education field must be fully considered, for example: For rote-memorization knowledge, such as the shapes of Chinese characters, pinyin, and the recitation of ancient poems, forgetting typically follows a classic curve of rapid decline followed by a gradual flattening. For comprehension and application knowledge, such as the usage of function words in classical Chinese, reading comprehension skills in modern Chinese, and appreciation of writing techniques, mastery may decline at a different rate over time than rote memorization, and the forgetting curve may be more gradual. For skill-based knowledge, such as the ability to structure and organize ideas in writing, mastery may decline not only with time but also with reduced frequency of practice.

[0062] When constructing this forgetting curve model, one can conduct statistical analysis based on empirical research data in educational psychology for the aforementioned types of Chinese educational knowledge, historical behavior and test score data of large-scale learners, or adopt a cognitive model validated by subject teaching practice.

[0063] For example, by analyzing the changes in test accuracy of a large number of Chinese learners at different time intervals after learning a specific knowledge point, the forgetting curve parameters corresponding to that knowledge type can be fitted. Finally, these patterns are solidified into one or more functions or parameterized lookup tables that can be called by the system, i.e., the preset forgetting curve model.
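One way the fitting step might look is sketched below, under the assumption that retention follows the exponential form exp(-Δt/τ) and that test accuracy can stand in for retention. Both assumptions, the function name, the knowledge-type keys, and the synthetic observations are illustrative only. Taking logarithms turns the fit into a least-squares line through the origin.

```python
import math
from typing import Dict, Iterable, Tuple


def fit_stability(observations: Iterable[Tuple[float, float]]) -> float:
    """Fit tau in accuracy ~= exp(-delta_t / tau) from (delta_t, accuracy) pairs.

    Uses least squares through the origin on ln(accuracy) = -delta_t / tau.
    """
    num = 0.0
    den = 0.0
    for delta_t, accuracy in observations:
        if delta_t <= 0 or accuracy <= 0:
            continue  # skip points the log-linear form cannot use
        num += delta_t * math.log(accuracy)
        den += delta_t * delta_t
    if den == 0.0 or num >= 0.0:
        raise ValueError("observations do not show decay")
    return -den / num


# Synthetic observations (days since learning, test accuracy) per knowledge type.
synthetic = {
    "rote_memorization": [(t, math.exp(-t / 10.0)) for t in (1, 7, 30)],
    "comprehension": [(t, math.exp(-t / 40.0)) for t in (1, 7, 30)],
}

# Parameterized lookup table: knowledge type -> fitted stability in days.
forgetting_params: Dict[str, float] = {
    kind: fit_stability(points) for kind, points in synthetic.items()
}
```

The resulting `forgetting_params` table plays the role of the parameterized lookup table: at query time the system would look up the parameter for a node's knowledge type and pass it to the decay function.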

[0064] By introducing a forgetting curve model specifically constructed based on the learning and memory patterns of Chinese education knowledge to calculate the time decay coefficient, this invention not only achieves accurate matching of semantic associations in the state assessment of the specific field of Chinese education, but also scientifically quantifies and incorporates the influence mechanism of the time dimension on the sustainability of learning effects. This makes the entire dynamic calculation process closer to the actual cognitive process of Chinese learning, and realizes a more refined, personalized student ability state construction that conforms to the laws of education.

[0065] Furthermore, calculating the state value of each knowledge node for the user under the target graph version specifically includes the following steps: In the target graph version, identify the set of leaf nodes and the set of parent nodes that have a hierarchical inclusion relationship with the set of leaf nodes; The system traverses all knowledge nodes and finds those nodes that have no child nodes, i.e., leaf nodes. These nodes represent the most detailed and indivisible basic knowledge points in the knowledge system. In the Chinese education scenario, a leaf node may correspond to a specific skill or knowledge point; for example, recognition of metaphor as a rhetorical device, the usage of the Chinese character 'zhi' as a particle in classical Chinese, or summarizing the six elements of a narrative.

[0066] At the same time, the system identifies those nodes that have at least one of the above leaf nodes as a child node, i.e., parent nodes. Parent nodes represent the generalization, abstraction, or synthesis of the knowledge or skills of their child nodes. A parent node may correspond to a knowledge module or ability dimension; for example, mastery of rhetorical devices, understanding of classical Chinese function words, or narrative reading ability.

[0067] Based on the contribution weights, first calculate the basic state values of the user for each leaf node in the set of leaf nodes; For each leaf node in the set of leaf nodes, the system performs the following operations: Traverse all the second user behavior logs. For each log, extract its contribution weight to the leaf node and the behavior result of the log (such as the correctness of an answer, taking the value 0 or 1; or the viewing completion rate, taking values between 0 and 1). Multiply the behavior result of each log by its contribution weight to the node to obtain the weighted contribution of this behavior to the node; Then, accumulate the weighted contributions of all relevant historical behaviors and calculate the basic state value of the user on this leaf node according to a preset algorithm (such as averaging or normalization). This value directly reflects the user's mastery of or proficiency in this most detailed knowledge point.

[0068] This calculation process is executed in parallel or serially on the set of leaf nodes to obtain a set of state data with the finest granularity.
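The leaf-level accumulation can be sketched as a weighted average. The text leaves the "preset algorithm" open (averaging, normalization, etc.); dividing by the total contribution weight is one possible choice and is an assumption here, as is returning 0 when no log is relevant.

```python
from typing import Iterable, Tuple


def leaf_state_value(entries: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of behavior results for one leaf node.

    entries: (behavior_result, contribution_weight) pairs, one per behavior log.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for result, weight in entries:
        weighted_sum += result * weight  # weighted contribution of this behavior
        weight_total += weight
    if weight_total == 0.0:
        return 0.0  # assumption: no relevant evidence means a state value of 0
    return weighted_sum / weight_total


# Hypothetical logs: a correct answer, a wrong answer, and a half-watched video.
value = leaf_state_value([(1.0, 0.6), (0.0, 0.2), (0.5, 0.2)])
```

With behavior results in the range 0 to 1, this choice keeps the basic state value in the same range, which fits the normalized scale used later for the comparison chart.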

[0069] Obtain a preset hierarchical aggregation weight table, where the hierarchical aggregation weight table defines the support coefficient of leaf nodes to parent nodes; This hierarchical aggregation weight table is a configuration file or data structure, and its core role is to define the weight ratio that the state value of the lower-level nodes occupies when aggregated to the upper-level nodes.

[0070] The hierarchical aggregation weight table clearly specifies the support coefficients of each child node for each parent node. The support coefficient represents the importance of the contribution of the state of the child node to the comprehensive state of its parent node. For example, for the parent node "understanding of function words in classical Chinese", the support coefficients of its child nodes "usage of the word 'zhi'", "usage of the word 'er'", and "usage of the word 'qi'" may be assigned 0.4, 0.3, and 0.3 respectively, with a total of 1.

[0071] The setting of these support coefficients can be based on the requirements of the teaching syllabus, the experience of subject experts, or the statistical analysis of historical teaching data. It reflects the relative importance of different lower-level knowledge points in the knowledge system to the upper-level ability dimension.

[0072] According to the hierarchical aggregation weight table, a bottom-up weighted aggregation calculation is performed on the basic state values of the leaf nodes to obtain the comprehensive state value of the parent node, completing the construction of the hierarchical state view; For a parent node, the system looks up the support coefficients corresponding to all its direct child nodes in the hierarchical aggregation weight table. At this time, the state values of these child nodes have been calculated, and the state value of the bottommost child node is the basic state value; Multiply the state value of each direct child node of the parent node by its corresponding support coefficient, and then add up all the product results to obtain the comprehensive state value of the parent node; The above calculation process is carried out in an iterative manner. When the comprehensive state values of all parent nodes at a certain layer are calculated, these values are used as the child node state values of their parent nodes at a higher layer and continue to participate in the weighted aggregation calculation at a higher level; Through this bottom-up weighted aggregation calculation, the system can finally calculate the complete state value sequence from the topmost ability dimension (such as the comprehensive Chinese literacy in junior high school) to the bottommost detailed knowledge points (such as the meaning of specific idioms). The system organizes these state values into a data object with a clear hierarchical relationship according to the hierarchical topological structure defined by the first graph association rule set, thus completing the construction of the hierarchical state view.
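The bottom-up weighted aggregation can be sketched with a post-order traversal, so that every child is resolved before its parent. The function name, node names, and the upper-level coefficients and leaf values below are hypothetical; the 0.4 / 0.3 / 0.3 split follows the function-word example given earlier.

```python
from typing import Dict


def build_state_view(root: str,
                     support: Dict[str, Dict[str, float]],
                     leaf_values: Dict[str, float]) -> Dict[str, float]:
    """Return state values for every node reachable from root.

    support: parent -> {child: support coefficient} (the aggregation weight table).
    leaf_values: basic state values of the leaf nodes.
    """
    view: Dict[str, float] = {}

    def state_of(node: str) -> float:
        if node in view:
            return view[node]
        children = support.get(node)
        if not children:
            value = leaf_values.get(node, 0.0)  # leaf: use the basic state value
        else:
            # Parent: sum of child state value times support coefficient.
            value = sum(coef * state_of(child) for child, coef in children.items())
        view[node] = value
        return value

    state_of(root)
    return view


support = {
    "classical_reading": {"function_words": 0.5, "sentence_patterns": 0.5},
    "function_words": {"zhi": 0.4, "er": 0.3, "qi": 0.3},
}
leaf_values = {"zhi": 0.9, "er": 0.6, "qi": 0.5, "sentence_patterns": 0.8}
view = build_state_view("classical_reading", support, leaf_values)
```

In this example the "function_words" node aggregates to 0.4 × 0.9 + 0.3 × 0.6 + 0.3 × 0.5 = 0.69, which is then reused as a child value one layer up.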

[0073] This view not only shows the user's performance at each specific knowledge point, but more importantly, clearly reveals the user's comprehensive level in higher-level ability modules.

[0074] Furthermore, the method also includes a cache management mechanism, which aims to balance the real-time performance of state view generation and the system processing performance. Especially in the Chinese education application scenario facing a large number of users and historical data, the query response speed is significantly improved by caching the calculated state view, specifically including the following steps: Store the first user state view in the cache database, and record the graph version number, generation time, and knowledge node set on which the first user state view depends; After the system generates the first user state view for a specific target graph version for a user by executing steps S1 to S5, it does not immediately discard the calculation result. Instead, the system stores the first user state view as a complete data object in a dedicated cache database. This cache database usually adopts an in-memory database (such as Redis) or a distributed cache system with fast read and write capabilities.

[0075] During storage, the system not only saves the data of the state view itself, but also synchronously records a set of key metadata. This metadata is closely related to the lifecycle management and validity verification of the state view, and includes at least: Knowledge graph version number: identifies the specific version of the knowledge graph on which this state view was calculated; Generation time: records the point in time when this state view was calculated and cached; Knowledge node set: the set of IDs of all knowledge nodes that this state view covers and depends on. This set reflects the scope and specific content dependencies of the state calculation.

[0076] When a new user status query request is received, it is determined whether a second user status view matching the current user and the target graph version exists in the cache database; When the system receives a new user status query request, it first attempts to retrieve the result from the cache before starting the complete real-time calculation process of steps S2 to S5. The specific process is as follows: The system parses the new user status query request and extracts the key information: the target user identifier and the version identifier of the target graph version. Subsequently, the system queries the cache database to determine whether a second user status view exists that matches the current user and the target graph version. Here, "matching" means searching the cache database for a cache record whose user identifier is the same as the requesting user identifier and whose graph version number matches the target graph version identifier in the request. This second user state view is a view that was previously calculated and cached for the same user under the same graph version (i.e., a subsequent reuse of the first user state view); the lookup always refers to the cached view of the current requesting user, not to views cached for other users under the same version.

[0077] If a second user state view exists and is not marked as invalid, it indicates that the currently cached state view is still valid and trustworthy. The system will bypass all complex real-time calculation processes and directly output the second user state view as the response to this query. This greatly shortens the response time and improves the user experience.

[0078] If the second user state view does not exist (i.e., no cache entry is found in the cache that exactly matches the current query request (user + graph version)) or the second user state view has been marked as invalid, the recalculation process is triggered. Triggering the recalculation process means that the system will follow steps S2 to S5, reload the corresponding first graph association rule set according to the latest request parameters, retrieve the relevant second user behavior logs from the event log library, perform semantic similarity calculation, contribution weight determination and state value aggregation, and finally generate a new state view that reflects the latest data.
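The cache lookup and recalculation trigger can be sketched with an in-memory dictionary standing in for the cache database. The class and method names are hypothetical, and `compute` is a placeholder for the full S2 to S5 pipeline.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass
class CachedView:
    view: Dict[str, float]     # node ID -> state value
    graph_version: str         # metadata: graph version number
    generated_at: float        # metadata: generation time
    node_ids: FrozenSet[str]   # metadata: knowledge node set
    invalid: bool = False      # invalidation flag


class StateViewCache:
    """Dictionary-backed stand-in for the cache database."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], CachedView] = {}

    def get_or_compute(self, user_id: str, graph_version: str,
                       compute: Callable[[str, str], Dict[str, float]]) -> Dict[str, float]:
        key = (user_id, graph_version)  # match on user + graph version
        entry = self._entries.get(key)
        if entry is not None and not entry.invalid:
            return entry.view  # cache hit: skip the real-time calculation
        view = compute(user_id, graph_version)  # miss or invalid: recalculate
        self._entries[key] = CachedView(view, graph_version, time.time(),
                                        frozenset(view))
        return view

    def mark_invalid(self, user_id: str, graph_version: str) -> None:
        entry = self._entries.get((user_id, graph_version))
        if entry is not None:
            entry.invalid = True
```

A production deployment would replace the dictionary with the in-memory or distributed cache mentioned earlier; the control flow (hit, miss, invalid) stays the same.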

[0079] Furthermore, the system continuously or periodically monitors the relevant status, and the invalidation flag is triggered based on at least one of the following conditions: The first trigger dimension is based on the integrity verification of the knowledge graph structure. When the system backend detects a knowledge graph version change event, such as a revision of the teaching syllabus in Chinese education that leads to a reconstruction of the hierarchical topology of the graph, the system automatically obtains the list of affected nodes involved in the change and traverses the cache database, comparing this list with the set of knowledge nodes recorded in the metadata of each second user status view. Once it finds that the set of knowledge nodes contains any node in the list of affected nodes, it determines that the dependency basis of that second user status view has changed and immediately triggers an invalidation flag, thereby preventing users from querying outdated evaluation results generated based on the old structure.

[0080] The second trigger dimension is based on the static timeliness constraint of the data. Periodically, or when a query is triggered, the system reads the generation time of the second user status view and calculates the time difference between the current system time and the generation time. If the time difference exceeds the system's preset validity period, the view data is considered too old to have reference value, and the system forcibly marks it as invalid to drive a data refresh.

[0081] The third trigger dimension is dynamic confidence verification based on cognitive patterns. The system introduces a forgetting curve model to evaluate the validity of the cached view in real time. This model simulates the memory decay process of users' Chinese knowledge points. The system calculates the real-time confidence of the second user state view based on the interval between the generation time and the current time using the forgetting curve model. This real-time confidence quantifies the reliability of the user's memory after natural decay over time. When the calculated real-time confidence is lower than the preset confidence threshold, even if the graph structure remains unchanged and the hard validity period has not expired, the system determines that the view cannot truly reflect the user's current mastery status, and triggers an invalidation flag to start a recalculation process that includes the latest time decay coefficient.

[0082] These conditions together constitute a multi-dimensional and refined cache validity awareness mechanism, ensuring that the second user state view of the cache is only used when the data is accurate, the version is consistent, and the timeliness meets the requirements, thereby improving performance while strictly ensuring the correctness of the state evaluation results.
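The three trigger dimensions can be combined into a single check, sketched below. The function name, the returned reason strings, and the exponential confidence model with its `stability_seconds` parameter are assumptions made for illustration.

```python
import math
from typing import FrozenSet, Iterable, Optional


def invalidation_reason(node_ids: FrozenSet[str],
                        generated_at: float,
                        now: float,
                        affected_nodes: Iterable[str],
                        max_age_seconds: float,
                        confidence_threshold: float,
                        stability_seconds: float) -> Optional[str]:
    """Return why a cached view should be marked invalid, or None if still valid."""
    # 1. Structural check: the view depends on a node changed by a graph update.
    if node_ids & set(affected_nodes):
        return "graph_changed"
    # 2. Static timeliness: the view is older than the preset validity period.
    age = now - generated_at
    if age > max_age_seconds:
        return "expired"
    # 3. Dynamic confidence: assumed exponential forgetting curve over the view's age.
    confidence = math.exp(-age / stability_seconds)
    if confidence < confidence_threshold:
        return "low_confidence"
    return None
```

Returning the reason rather than a bare boolean is a design choice of this sketch; it makes it easier to log which of the three conditions fired.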

[0083] Furthermore, when the expiration flag is triggered, the method also includes a cache recovery step, specifically comprising the following steps: When the system detects that a user's second user status view has been triggered and marked as invalid, it does not immediately recalculate the view, but first starts the user activity statistics program. The program reads back the historical access logs or login records stored in the system to count the frequency of the user's access to the Chinese education system within a preset period before the current moment (e.g., the past 30 days or a semester). The system access frequency objectively quantifies the user's dependence on the system and the likelihood of initiating a status query again in the near future. The system then compares the statistically obtained system access frequency with the preset activity threshold configured within the system.

[0084] If the system access frequency is determined to be greater than the activity threshold, the system identifies the user as a high-potential user or an active user. Then, an asynchronous computing task is generated on the server backend. This asynchronous computing task runs independently of the user's main interaction thread. It automatically calls the first graph association rule set corresponding to the currently effective target graph version, uses the latest first user behavior log data to pre-calculate the user's latest status value and generate a new user status view. After the calculation is completed, it is silently updated to the cache database. This achieves data preparation before the user logs in again or initiates a query, ensuring that high-frequency users can obtain evaluation results based on the new graph structure without being aware of it.

[0085] Conversely, if the system access frequency is determined to be less than or equal to the activity threshold, the system adopts a resource-saving on-demand computing strategy, keeping the second user status view of the user in the cache database in an invalid state without immediately triggering a recalculation task, until the system actually receives the next real-time user status query request initiated by the user, at which point the calculation process is synchronously triggered to generate the latest view.

[0086] This mechanism prevents the system from performing ineffective pre-calculations for a large number of low-frequency users in the early stages of a graph version update, significantly reducing the system's concurrency pressure during peak data reconstruction periods.
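The activity-based decision can be sketched as a small function. The function name, the strategy labels, and the representation of access records as a list of timestamps are hypothetical.

```python
from typing import Iterable


def recovery_strategy(access_timestamps: Iterable[float],
                      now: float,
                      window_seconds: float,
                      activity_threshold: int) -> str:
    """Choose how to refresh an invalidated view based on recent access frequency."""
    window_start = now - window_seconds
    # Count accesses that fall inside the preset period before the current moment.
    frequency = sum(1 for t in access_timestamps if window_start <= t <= now)
    if frequency > activity_threshold:
        return "precompute_async"    # active user: refresh in the background
    return "recompute_on_demand"     # otherwise: wait for the next real query
```

A caller would enqueue a background task when the result is "precompute_async" and otherwise leave the cached entry flagged as invalid until the user's next query.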

[0087] Furthermore, the method also includes a multi-version parallel computing step, specifically including the following steps: Receive comparison query requests containing a first version identifier (e.g., the old version of the syllabus) and a second version identifier (e.g., the new version of the curriculum standard).

[0088] The system initiates calls to the configuration center based on these two identifiers, and loads the first graph association rule set corresponding to the first version identifier and the second graph association rule set corresponding to the second version identifier in parallel. These two rule sets define the structure and semantic mapping logic of Chinese education knowledge system at different periods, providing an independent interpretation benchmark for subsequent dual-track calculation.

[0089] Next, the parallel computing process is executed. For the same second user behavior log retrieved from the event log library, the system constructs two independent computing contexts in memory or starts two parallel processing threads. One thread executes the first computing process based on the first graph association rule set, mapping the historical behavior data to the old graph structure. The other thread synchronously executes the second computing process based on the second graph association rule set, mapping the exact same historical behavior data to the new graph structure.

[0090] Since the underlying second user behavior log only records the objective facts carrying resource semantic vectors and does not contain any prior bindings of specific graphs, the same data can be interpreted by different rule sets at the same time without interfering with each other and without data copying. This enables parallel interpretation of the same learning history from multiple perspectives without increasing storage redundancy.

[0091] Finally, the system outputs the results and quantifies the differences. It generates a status view based on the first version and a status view based on the second version. These two views show the user's knowledge mastery under two different evaluation systems. The system then automatically identifies the correspondence between the two status views on the same knowledge dimension (such as the same subject ability dimension or core knowledge points that have not changed), calculates the numerical difference between the two in terms of status value, and visualizes and renders this difference and the detailed distribution data of the two views to generate an ability comparison chart.
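The dual-track calculation and difference step can be sketched with two worker threads that read the same log list. The function name is hypothetical, and `compute_view` is a placeholder for the S4 to S5 calculation under one rule set.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Sequence, Tuple

View = Dict[str, float]


def compare_versions(logs: Sequence[Any],
                     rules_v1: Any,
                     rules_v2: Any,
                     compute_view: Callable[[Sequence[Any], Any], View]
                     ) -> Tuple[View, View, Dict[str, float]]:
    """Interpret the same logs under two rule sets and diff the shared dimensions."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both tasks read the same log sequence; nothing is copied or modified.
        future_v1 = pool.submit(compute_view, logs, rules_v1)
        future_v2 = pool.submit(compute_view, logs, rules_v2)
        view_v1 = future_v1.result()
        view_v2 = future_v2.result()
    # Only dimensions present in both views can be compared.
    shared = view_v1.keys() & view_v2.keys()
    differences = {dim: view_v2[dim] - view_v1[dim] for dim in shared}
    return view_v1, view_v2, differences
```

The `differences` mapping is what a rendering layer would plot alongside the two views; dimensions that exist in only one version are left out of the numerical comparison.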

[0092] Specifically, Figure 2 shows a diagram comparing user capabilities based on different versions of the graph.

[0093] The five vertices of this competency comparison chart represent the five core knowledge dimensions in Chinese education: classical Chinese reading, modern literature appreciation, written expression, basic knowledge, and oral communication.

[0094] The radial axis in the figure represents the normalized state value (i.e., mastery), ranging from 0.0 to 1.0. The scale lines indicate the numerical levels of 0.2, 0.4, 0.6, 0.8, and 1.0, respectively.

[0095] The dotted closed area in the figure represents the first user status view (based on the first version). For example, in the dimension of "Modern Literature Appreciation", the user's status value under the first version (such as the old curriculum standard) rules is about 0.60. The solid-lined closed area in the figure represents the second user status view (based on the second version). For example, in the "Modern Literature Appreciation" dimension, the status value of the same user under the second version (such as the new curriculum standard) rules rises to 0.80.

[0096] By comparing the differences in the intercepts of the two curves on the same axis, educators can intuitively see the impact of changes in evaluation criteria on assessment results. For example, the figure shows that in the "written expression" dimension, the second version of the evaluation criteria may be more stringent, causing the user's status value to drop slightly from 0.75 in the first version to 0.70; while in the "oral communication" dimension, the evaluation results of the two versions are basically the same (approximately 0.50 to 0.55).

[0097] Furthermore, the method also includes a model optimization step based on human feedback: First, a feedback receiving process is executed. When a teaching expert, teacher, or authorized user views the first user status view generated by the system, if they find that the mastery level of certain knowledge nodes displayed in the view is significantly inconsistent with the student's actual performance (for example, the system determines that the student has a low mastery level of "inverted sentences," but the teacher believes that the student has mastered it well based on classroom performance), the user can initiate a correction operation through the front-end interactive interface. The system then receives feedback correction instructions for the first user status view. These instructions explicitly include the ID of the target knowledge node to be corrected and the correction parameters of the status value given by the expert.

[0098] Next, the difference analysis and sample construction process is performed. The system compares the received correction parameters with the state values originally calculated by the system to identify the root cause of the evaluation error. Usually, this error stems from the semantic analysis model's failure to accurately capture the deep semantic relationship between certain specific teaching resources (such as obscure question types or metaphorical texts) and the knowledge node, resulting in the calculated semantic similarity being too low or too high. The system uses this discrepancy to trace back and identify the key behavior logs that caused the deviation in the calculation of the node's state value, along with their corresponding original resource texts. These original resource texts are used as input data, and the correct knowledge nodes implied by the expert's correction intention are associated as labels, thereby generating high-quality training samples containing positive incentives or negative penalties.

[0099] Finally, the model fine-tuning and parameter update process is executed. The system adds the generated training samples to the preset training set and starts the fine-tuning training task of the semantic analysis model used to extract resource features. This training process uses the backpropagation algorithm to adjust the neural network weights inside the model, so that when the model processes similar Chinese teaching texts in the future, the generated resource semantic vectors can more accurately approach the correct knowledge feature vector data. After training is completed, the system updates the parameters of the semantic analysis model and redeploys it, that is, it replaces the old model parameters currently in use in the online service with the new parameters. Through this continuous iterative cycle, the system can continuously adapt to the new question types and semantic changes that emerge in the field of Chinese education, and gradually improve the robustness and intelligence of the entire knowledge graph construction method.

[0100] Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for hierarchical construction of a Chinese education knowledge graph based on semantic association, characterized in that it includes the following steps:
S1: Obtain user learning interaction data, extract resource features from the learning interaction data to obtain a first user behavior log containing resource semantic vectors, and store the first user behavior log in the event log library, wherein the data structure of the first user behavior log is independent of the hierarchical topology of any version of the knowledge graph;
S2: In response to a user status query request for the target graph version, obtain the version identifier of the target graph version, and load the corresponding first graph association rule set from the configuration center according to the version identifier, wherein the first graph association rule set includes the feature vector data of each knowledge node in the knowledge graph of this version and the hierarchical topology between nodes;
S3: Based on the first graph association rule set, retrieve the second user behavior log corresponding to the user from the event log library;
S4: Calculate the semantic similarity between the resource semantic vector of each behavior log in the second user behavior log and the feature vector data of each knowledge node in the target graph version, and determine the contribution weight of each behavior log to each knowledge node;
S5: Based on the contribution weight and the behavior results of the behavior log, calculate the status value of each knowledge node of the user under the target graph version, and generate the first user status view.

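
Steps S4 and S5 can be illustrated with a minimal sketch. It assumes that each behavior log carries a numeric result (1.0 for a correct answer, 0.0 for an incorrect one) and that the status value is a similarity-weighted average of those results; the claim does not fix either choice, and the names `cosine` and `state_view` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def state_view(logs, node_vectors):
    """Compute a status value per knowledge node.

    logs: list of dicts with 'vector' (resource semantic vector) and
          'result' (assumed: 1.0 correct, 0.0 incorrect).
    node_vectors: node_id -> feature vector taken from the rule set of the
          target graph version. The logs themselves never mention nodes.
    """
    view = {}
    for node_id, node_vec in node_vectors.items():
        weighted, total = 0.0, 0.0
        for log in logs:
            # Assumed weighting: non-negative cosine similarity.
            w = max(cosine(log["vector"], node_vec), 0.0)
            weighted += w * log["result"]
            total += w
        view[node_id] = weighted / total if total else 0.0
    return view
```

Because the logs store only vectors and results, the same log list can be passed to `state_view` with the node vectors of any graph version, which is the property S1 relies on.
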
2. The method of claim 1, wherein obtaining the first user behavior log containing resource semantic vectors specifically includes the following steps: collect users' answer records, video viewing records, or reading records in the teaching system as learning interaction data; input the learning interaction data into a pre-trained semantic analysis model for encoding, extract multi-dimensional content features, and generate the resource semantic vector; combine the resource semantic vector with the user identifier, generation timestamp, and operation type identifier in the learning interaction data to construct the first user behavior log.

3. The method of claim 2, wherein step S4 specifically includes the following steps: calculate the cosine similarity between the resource semantic vector and the feature vector data; set a first preset threshold, such that when the cosine similarity value is less than the first preset threshold, the contribution weight is reset to zero, and when the cosine similarity value is greater than or equal to the first preset threshold, the cosine similarity value is normalized to obtain an initial weight value; obtain the generation timestamp of the behavior log, calculate the time decay coefficient based on the difference between the generation timestamp and the current system time, and use the time decay coefficient to correct the initial weight value to obtain the contribution weight of the behavior log for the corresponding knowledge node.

4. The method of claim 3, wherein the calculation of the time decay coefficient is based on the forgetting curve model, which is constructed according to the learning and memory patterns of Chinese educational knowledge, and the time difference is negatively correlated with the decay coefficient.
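
The weighting described in claims 3 and 4 can be sketched as follows. The claims do not specify the normalization or the exact forgetting curve, so this sketch assumes a min-max rescaling of the similarity above the threshold and an exponential decay of the form exp(-t / S); the function name and the parameters `threshold` and `stability_days` are hypothetical.

```python
import math


def contribution_weight(similarity, log_ts, now_ts,
                        threshold=0.5, stability_days=30.0):
    """Contribution weight of one behavior log for one knowledge node.

    similarity: cosine similarity between resource vector and node vector.
    log_ts, now_ts: epoch seconds (generation timestamp, current system time).
    threshold: first preset threshold (assumed value, must be below 1.0).
    stability_days: assumed decay constant of the forgetting curve.
    """
    if similarity < threshold:
        return 0.0  # below the threshold the log does not contribute
    # Assumed normalization: map [threshold, 1] onto [0, 1].
    initial = (similarity - threshold) / (1.0 - threshold)
    elapsed_days = max(now_ts - log_ts, 0.0) / 86400.0
    # Longer elapsed time gives a smaller coefficient (negative correlation).
    decay = math.exp(-elapsed_days / stability_days)
    return initial * decay
```

With these assumed defaults, a log at similarity 1.0 keeps full weight at the moment it is created and drops to about 37% of it after 30 days.
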

5. The hierarchical construction method for Chinese educational knowledge graph based on semantic association according to claim 1, characterized in that the calculation of the user's state value for each knowledge node under the target graph version specifically includes the following steps: in the target graph version, identify the set of leaf nodes and the set of parent nodes that have a hierarchical inclusion relationship with the set of leaf nodes; based on the contribution weight, calculate the basic state value of each leaf node of the user in the leaf node set; obtain the hierarchical aggregation weight table; based on the hierarchical aggregation weight table, aggregate the basic state values of the leaf nodes from bottom to top by weighting to obtain the comprehensive state value of each parent node.
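
The bottom-up aggregation of claim 5 can be sketched as a recursive weighted average. The layout of the hierarchical aggregation weight table is not given in the claim; this sketch assumes it maps (parent, child) pairs to weights, and the function name `aggregate` is hypothetical.

```python
def aggregate(children, leaf_values, agg_weights):
    """Propagate leaf state values up the hierarchy.

    children: parent_id -> list of child ids (hierarchical topology).
    leaf_values: leaf_id -> basic state value.
    agg_weights: (parent_id, child_id) -> weight (assumed table layout).
    Returns a dict holding the value of every leaf and every parent.
    """
    values = dict(leaf_values)

    def value_of(node):
        if node in values:
            return values[node]
        kids = children.get(node, [])
        total = sum(agg_weights[(node, k)] for k in kids)
        combined = sum(agg_weights[(node, k)] * value_of(k) for k in kids)
        # A node with no children and no logs defaults to 0.0.
        values[node] = combined / total if total else 0.0
        return values[node]

    for parent in children:
        value_of(parent)
    return values
```

Dividing by the sum of the weights keeps every parent value in the same range as the leaf values, so values at different levels remain comparable.
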

6. The hierarchical construction method for Chinese educational knowledge graph based on semantic association according to claim 1, characterized in that the method also includes a cache management mechanism, which specifically includes the following steps: store the first user status view in the cache database, and record the graph version number, generation time, and knowledge node set on which the first user status view depends; when a new user status query request is received, determine whether a second user status view matching the current user and the target graph version exists in the cache database; if the second user status view exists and is not marked as invalid, output the second user status view directly; if the second user status view does not exist or has been marked as invalid, trigger a recalculation process.

7. The hierarchical construction method for Chinese educational knowledge graph based on semantic association according to claim 6, characterized in that the second user status view is marked as invalid when at least one of the following conditions is met: a knowledge graph version change has been detected, and the change involves any node in the knowledge node set on which the second user status view depends; the difference between the current system time and the generation time of the second user status view exceeds the preset validity period; the real-time confidence level of the second user status view, calculated based on the forgetting curve model, is lower than the preset confidence threshold.
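
The cache lookup of claim 6 and the invalidation conditions of claim 7 can be sketched together. The claims do not define how the real-time confidence is computed, so this sketch assumes it decays exponentially with the age of the view; the class `CachedView`, the functions, and all default values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CachedView:
    user_id: str
    graph_version: str
    generated_at: float      # epoch seconds
    node_ids: frozenset      # knowledge nodes the view depends on
    invalid: bool = False


def should_invalidate(view, now, changed_nodes, ttl_seconds=7 * 86400,
                      stability_days=30.0, min_confidence=0.6):
    """Return True if any of the three conditions of claim 7 holds."""
    if view.node_ids & set(changed_nodes):   # graph change touches the view
        return True
    age = now - view.generated_at
    if age > ttl_seconds:                    # validity period exceeded
        return True
    # Assumed confidence model: exponential forgetting curve over view age.
    confidence = math.exp(-(age / 86400.0) / stability_days)
    return confidence < min_confidence


def lookup(cache, user_id, graph_version, now, changed_nodes=()):
    """Return a valid cached view, or None when recalculation is needed."""
    view = cache.get((user_id, graph_version))
    if view is None:
        return None
    if view.invalid or should_invalidate(view, now, changed_nodes):
        view.invalid = True
        return None
    return view
```

A `None` result corresponds to the recalculation branch of claim 6; the caller would then recompute the view from the event log library and store it again.
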

8. The hierarchical construction method for Chinese educational knowledge graph based on semantic association according to claim 7, characterized in that, once the second user status view has been marked as invalid, the method further includes a cache recovery step, specifically comprising the following steps: count the user's system access frequency within a preset period; if the access frequency is greater than the activity threshold, asynchronously call the first graph association rule set corresponding to the current graph version in the background to pre-calculate and update the first user status view of the user; if the access frequency is less than or equal to the activity threshold, maintain the invalid state until a real-time query request from the user is received, at which point the calculation is performed synchronously.

9. The hierarchical construction method for Chinese educational knowledge graph based on semantic association according to claim 1, characterized in that the method also includes a multi-version parallel computing step, specifically comprising the following steps: receive a comparison query request containing the first version identifier and the second version identifier; load the first graph association rule set corresponding to the first version identifier and the second graph association rule set corresponding to the second version identifier, respectively; for the same second user behavior log, execute in parallel a first calculation process based on the first graph association rule set and a second calculation process based on the second graph association rule set; output the state view based on the first version and the state view based on the second version respectively, and calculate the numerical difference between the two state views on the same knowledge dimension.
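
The multi-version comparison of claim 9 can be sketched with a thread pool. The sketch assumes that "the same knowledge dimension" means a node identifier present in both versions, and it takes the per-version calculation as a caller-supplied function; `compare_versions` and `compute_view` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_versions(logs, rule_sets, version_a, version_b, compute_view):
    """Evaluate the same behavior logs under two graph versions in parallel.

    rule_sets: version identifier -> graph association rule set.
    compute_view: callable (logs, rule_set) -> {node_id: state value}.
    Returns both views and the difference (b minus a) on shared nodes.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(compute_view, logs, rule_sets[version_a])
        fut_b = pool.submit(compute_view, logs, rule_sets[version_b])
        view_a, view_b = fut_a.result(), fut_b.result()
    # Assumed: only nodes present in both versions are comparable.
    shared = view_a.keys() & view_b.keys()
    diff = {node: view_b[node] - view_a[node] for node in shared}
    return view_a, view_b, diff
```

Both calculations read the same log list and write nothing back to it, so no locking is needed between the two tasks.
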

10. The hierarchical construction method for Chinese educational knowledge graph based on semantic association according to claim 2, characterized in that the method also includes a model optimization step: receive feedback correction instructions for the first user status view, the instructions including correction parameters for the knowledge node status values; generate training samples based on the difference between the correction parameters and the state values of each knowledge node; use the training samples to train the semantic analysis model for extracting resource semantic vectors, and update the parameters of the semantic analysis model.