A student performance analysis and feedback method and system based on natural language processing

A student performance analysis and feedback system based on natural language processing addresses the narrow analysis dimensions, delayed feedback, and lack of personalized guidance found in existing education evaluation systems. It enables comprehensive, detailed analysis of students' learning status and personalized learning-path planning, thereby improving educational effectiveness.

CN122243694A · Pending · Publication Date: 2026-06-19 · CHANGJIANG INST OF TECH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHANGJIANG INST OF TECH
Filing Date
2026-03-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing education evaluation system is unable to reveal students' cognitive obstacles and ability shortcomings in the learning process. Its analysis dimensions are too narrow, feedback is delayed and template-based, it lacks personalized guidance, teacher-student collaboration efficiency is low, and it lacks a data-driven self-evolution loop.

Method used

The student performance analysis and feedback system based on natural language processing achieves a comprehensive, detailed, and in-depth analysis of students' learning status through data collection, feature extraction and knowledge graph construction, multi-dimensional analysis engine, natural language generation and feedback module, personalized path recommendation and visual interactive dashboard, and forms a self-evolving closed loop through dynamic path planning and teacher collaborative intervention.

Benefits of technology

It enables a comprehensive, detailed, and quantifiable in-depth analysis of students' learning status, generates personalized learning paths, and achieves intelligent, dynamic, and highly customized feedback content and intervention plans, thereby improving educational effectiveness and continuously optimizing system effectiveness through a closed-loop feedback mechanism.

✦ Generated by Eureka AI based on patent content.

Figure CN122243694A_ABST (abstract drawing)

Abstract

This invention discloses a method and system for student performance analysis and feedback based on natural language processing, comprising a data acquisition module, a feature extraction and knowledge graph construction module, a multi-dimensional analysis engine module, a natural language generation and feedback module, a personalized path recommendation module, a visual interactive dashboard module, and a teacher collaborative intervention module. The system integrates multi-source heterogeneous data and uses knowledge graphs, deep learning models, source tracing algorithms, ability assessment algorithms, a Transformer-based large language model, and Monte Carlo Tree Search (MCTS) combined with the A* search algorithm to achieve a comprehensive, detailed, in-depth analysis of students' learning status and to deliver intelligent, dynamic, and customized learning feedback and intervention. Using large language models and artificial intelligence, the invention provides a complete solution spanning data acquisition, intelligent analysis, personalized feedback, path planning, and closed-loop optimization, realizing an intelligent educational approach from root-cause diagnosis to precise intervention.

Description

Technical Field

[0001] This invention relates to the field of educational technology, and in particular to a method and system for student performance analysis and feedback based on natural language processing.

Background Technology

[0002] In the field of education, especially at the primary, secondary, and vocational education levels, accurate assessment, in-depth analysis, and personalized feedback on student learning outcomes are core elements for improving educational quality and achieving individualized instruction. Current educational evaluation systems primarily rely on quantitative data such as structured exam scores and assignment grades (e.g., students' historical grades in various subjects, scores from online quizzes, and completion and quality ratings of online assignments). While these data reflect outcome-level performance, they fail to reveal students' learning processes, cognitive obstacles, skill gaps, and the underlying causes. Traditional manual analysis methods suffer from insufficient data depth, limited analytical dimensions, delayed and template-based feedback, lack of personalized guidance, and low efficiency in teacher-student collaboration when addressing students' academic issues. With the development of science and technology and the rise of artificial intelligence, AI-based student performance analysis methods have emerged in the education field. For example, Chinese patent CN120387908A discloses a user data analysis method based on big data, which uses big data analysis and machine learning algorithms to mine student learning data in depth and support teachers in making evidence-based educational decisions. However, analyzing students' learning status requires not only comprehensive, detailed, in-depth analysis but also intelligent, dynamic, and customized learning feedback and intervention, which together form a data-driven, self-evolving closed loop of "analysis-intervention-feedback-optimization" that continuously improves educational effectiveness.

[0003] Therefore, how to improve educational effectiveness through a data-driven self-evolution loop of "analysis-intervention-feedback-optimization" in the educational process is an urgent technical problem that needs to be solved.

Summary of the Invention

[0004] In view of this, the present invention proposes a student performance analysis and feedback method and system based on natural language processing to solve the problems of shallow data mining, narrow analysis dimensions, delayed and template-based feedback, lack of personalized guidance, inefficient teacher-student collaboration, and the absence of a data-driven, self-evolving closed loop.

[0005] To achieve the above objectives, the technical solution of the present invention is implemented as follows. In one aspect, the present invention provides a student performance analysis and feedback system based on natural language processing, including a data acquisition module, a feature extraction and knowledge graph construction module, a multi-dimensional analysis engine module, a natural language generation and feedback module, a personalized path recommendation module, a visual interactive dashboard module, and a teacher collaborative intervention module. The data acquisition module is connected to the academic affairs database, the learning management system, and the feature extraction and knowledge graph construction module, and is used to collect multi-source heterogeneous data to form structured data and unstructured data. The feature extraction and knowledge graph construction module is connected to the data acquisition module. It is used to receive data from the data acquisition module, extract entities and relationships from unstructured text data, construct a domain knowledge graph centered on students and associated with "subject knowledge points - ability performance - cognitive state", and extract statistical features of performance data to form a multi-dimensional feature vector. The multi-dimensional analysis engine module is connected to the feature extraction and knowledge graph construction module. 
It is used to load the knowledge graph and feature vectors constructed by the feature extraction and knowledge graph construction module, predict the future performance range and potential risk points of students, reason and locate the knowledge gaps and conceptual confusions of individual students and groups on the learning path, integrate performance data and text sentiment / cognitive complexity analysis results, and generate a quantitative ability set of students in different cognitive dimensions such as analysis, application, and evaluation. The natural language generation and feedback module is connected to the multi-dimensional analysis engine module and is used to receive the quantitative analysis results from the multi-dimensional analysis engine module and automatically convert the numerical analysis results and graph visualization information into a structured, context-relevant natural language analysis report. The personalized path recommendation module is connected to the multi-dimensional analysis engine module. Based on the source tracing results of weak knowledge points and the prediction results of the performance evolution trend, it calls the dynamic path optimization algorithm to search and generate a personalized learning intervention path from the student's current cognitive state to the target mastery state on the knowledge graph, forming path nodes of the learning resource sequence. The visual interactive dashboard module is connected to the multi-dimensional analysis engine module, the natural language generation and feedback module, and the personalized path recommendation module, and is used to present the data results in a graphical manner. The teacher collaborative intervention module is connected to the multi-dimensional analysis engine module, the natural language generation and feedback module, the personalized path recommendation module, and the visual interactive dashboard module. 
It provides teachers with a management interface to view the overall analysis overview and abnormal warning list of all students in the class, review and annotate the feedback reports automatically generated by the system, manually adjust and confirm the paths generated by the personalized path recommendation module, and distribute them to designated students.
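The following is a minimal Python sketch of the student-centered knowledge graph and the statistical feature vector described above. The relation names (`weak_on`, `requires`, `cognitive_state`), the choice of features (mean, spread, latest score, linear trend), and the rule that a student's sub-graph is every triple reachable from the student node are illustrative assumptions; the patent does not specify them.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class KnowledgeGraph:
    # Adjacency list: head entity -> list of (relation, tail entity) pairs.
    edges: dict = field(default_factory=dict)

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        self.edges.setdefault(head, []).append((relation, tail))

    def neighbors(self, node: str) -> list:
        return self.edges.get(node, [])


def score_features(scores: list) -> list:
    """Statistical features of a chronological score list (hypothetical choice):
    [mean, population std, latest score, least-squares slope per assessment]."""
    n = len(scores)
    x_bar, y_bar = (n - 1) / 2, mean(scores)
    denom = sum((x - x_bar) ** 2 for x in range(n))
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(scores))
    slope = num / denom if denom else 0.0
    return [y_bar, pstdev(scores), scores[-1], slope]


def student_subgraph(kg: KnowledgeGraph, student: str, scores: list):
    """Copy every triple reachable from the student node into a personal
    sub-graph and return it together with the student's feature vector."""
    sub, stack, seen = KnowledgeGraph(), [student], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for relation, tail in kg.neighbors(node):
            sub.add_triple(node, relation, tail)
            stack.append(tail)
    return sub, score_features(scores)


# Hypothetical triples linking a student, knowledge points, and a cognitive state.
kg = KnowledgeGraph()
kg.add_triple("S042", "weak_on", "fraction_addition")
kg.add_triple("fraction_addition", "requires", "common_denominator")
kg.add_triple("S042", "cognitive_state", "confuses_numerator_denominator")
kg.add_triple("S077", "weak_on", "linear_equations")

sub, features = student_subgraph(kg, "S042", [70, 75, 80])
```

The sub-graph for `S042` excludes the unrelated student `S077`, and the feature vector records an upward trend of 5 points per assessment.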

[0006] Furthermore, the structured data include students' historical scores for each subject, scores on in-class online quizzes, and completion rates and quality scores for online assignments; the unstructured data include teachers' online comments, students' online classroom questions, and online discussion forum content; the entities in the unstructured text data include subject concepts, ability dimensions, and common error types.

[0007] Furthermore, the multi-dimensional analysis engine module includes a performance evolution trend prediction submodule, a weak knowledge point tracing submodule, and a cognitive ability multi-dimensional assessment submodule. The performance evolution trend prediction submodule uses a deep learning model to predict students' future performance range and potential risk points; the weak knowledge point tracing submodule uses tracing algorithms on the knowledge graph to reason about and locate the knowledge gaps and conceptual confusions of individual students and groups along their learning paths; the cognitive ability multi-dimensional assessment submodule uses an ability assessment algorithm to integrate performance data with text sentiment / cognitive complexity analysis results and generate a quantitative assessment of students' abilities in cognitive dimensions such as analysis, application, and evaluation.

[0008] Furthermore, the natural language analysis report includes a module overview, specific data insights, targeted learning suggestions, and risk warnings and encouragement expressed in gentle, supportive language; the learning resources include, but are not limited to, micro-lessons, exercises, and reading materials; the graphical presentation methods include line graphs of students' personal growth trends, knowledge-graph-based heat maps of weaknesses, radar charts of multi-dimensional ability assessments, or Gantt charts of personalized learning paths.

[0009] In another aspect, the present invention provides a method for student performance analysis and feedback based on natural language processing, comprising the following steps:
S1. Access the academic affairs database and learning management system periodically or on trigger events, and clean, de-identify, and store the raw data in standardized form;
S2. Use a pre-trained natural language processing model to process the structured and unstructured data of all students, update the global knowledge graph, and generate a personalized sub-graph for each student with the student's performance feature vector attached;
S3. Based on the personalized sub-graph output in step S2, run a deep learning model and apply the source tracing algorithm and the capability assessment algorithm to generate a set of quantitative analysis results covering the past, present, and future;
S4. Receive the quantitative analysis result set from step S3, generate a structured analysis report using a natural language generation algorithm, and push it to students via email or an integrated application interface;
S5. Based on the quantitative analysis result set from step S3, call the dynamic path optimization algorithm to plan the optimal learning intervention sequence and generate specific path suggestions, including a resource list, a schedule, and expected goals;
S6. Present the outputs of steps S2, S3, S4, and S5 graphically;
S7. Teachers use the management interface to view the student analysis overview and abnormal warning list, review feedback reports, manually adjust and confirm personalized paths, and distribute them to designated students;
S8. The system continuously collects students' follow-up actions in response to the feedback and uses these data as a feedback loop to optimize how well the emotional expression of the natural language generation model fits the students, as well as the accuracy of path recommendations.
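Step S1 can be sketched as follows, assuming raw records arrive as Python dictionaries. Pseudonymizing student IDs with a keyed HMAC-SHA-256 hash and standardizing scores as z-scores are assumptions chosen for illustration; the field names and the `PSEUDONYM_KEY` value are hypothetical.

```python
import hashlib
import hmac
from statistics import mean, pstdev

# Hypothetical secret key; in practice load it from a secrets store, not source code.
PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize(student_id: str) -> str:
    """Replace a real student ID with a keyed HMAC-SHA-256 pseudonym (12 hex chars)."""
    digest = hmac.new(PSEUDONYM_KEY, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def clean_and_standardize(records: list) -> list:
    """Drop rows without a score, keep no direct identifiers (names are simply
    not copied), normalize subject labels, and z-score the raw scores."""
    valid = [r for r in records if r.get("score") is not None]
    if not valid:
        return []
    scores = [float(r["score"]) for r in valid]
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # avoid dividing by zero when all scores are equal
    return [
        {
            "pid": pseudonymize(r["student_id"]),
            "subject": r["subject"].strip().lower(),
            "z_score": (float(r["score"]) - mu) / sigma,
        }
        for r in valid
    ]


# Hypothetical raw rows as they might come from the academic affairs database.
records = [
    {"student_id": "2024001", "name": "Zhang San", "subject": " Math ", "score": 90},
    {"student_id": "2024002", "name": "Li Si", "subject": "math", "score": 70},
    {"student_id": "2024003", "name": "Wang Wu", "subject": "Math", "score": None},
]
rows = clean_and_standardize(records)
```

A keyed hash (rather than a plain hash) keeps pseudonyms stable across runs while making it harder to reverse them by hashing a list of known student IDs.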

[0010] Furthermore, in step S2, the pre-trained natural language processing model is a BERT model fine-tuned on an educational-domain corpus. It is used to extract, from teacher comments and student texts, entities with attributes such as "mastery of knowledge points", "learning methods", and "attitude performance", together with the "containment", "association", and "hindrance" relationships between them.

[0011] Furthermore, in step S3, the deep learning model is a Long Short-Term Memory (LSTM) network model used to predict the student's future performance range and potential risk points. The source tracing algorithm specifically includes the following steps: L1. Map the student's answer records to the corresponding nodes in the knowledge graph; L2. Calculate an anomaly propagation score for each node using a Graph Convolutional Network (GCN) to identify the prerequisite (upstream dependency) nodes in the knowledge graph that are strongly associated with the erroneous knowledge points; L3. Rank the identified prerequisite nodes by importance using the PageRank algorithm, and output the student's most critical and urgent weak knowledge chains.
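A minimal, dependency-free sketch of steps L1 to L3 follows. It makes two simplifying assumptions: the GCN in L2 is reduced to its parameter-free propagation rule (symmetrically normalized adjacency with self-loops, applied for a fixed number of hops) rather than a trained network, and L3 uses personalized PageRank seeded with those anomaly scores over edges that point from a concept to its prerequisites. The graph and error data are hypothetical.

```python
import math


def propagate(nodes, edges, errors, hops=2):
    """L2 (simplified, untrained): spread error indicators with the GCN rule
    h <- D^-1/2 (A + I) D^-1/2 h, treating prerequisite links as undirected."""
    nbrs = {n: {n} for n in nodes}  # self-loops supply the "+ I" term
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = {n: len(nbrs[n]) for n in nodes}
    h = {n: float(errors.get(n, 0)) for n in nodes}
    for _ in range(hops):
        h = {n: sum(h[m] / math.sqrt(deg[n] * deg[m]) for m in nbrs[n]) for n in nodes}
    return h


def pagerank(nodes, edges, personalization, d=0.85, iters=100):
    """Personalized PageRank by power iteration."""
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    total = sum(personalization.values()) or 1.0
    p = {n: personalization.get(n, 0.0) / total for n in nodes}
    r = dict(p)
    for _ in range(iters):
        nxt = {n: (1 - d) * p[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = d * r[n] / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:  # dangling node: redistribute by the personalization vector
                for m in nodes:
                    nxt[m] += d * r[n] * p[m]
        r = nxt
    return r


def trace_weak_chain(nodes, prereq_edges, errors, top_k=3):
    """prereq_edges: (prerequisite, dependent) pairs. errors: node -> 1 if the
    student's answers on that knowledge point were wrong (output of step L1)."""
    anomaly = propagate(nodes, prereq_edges, errors)
    parents = {n: [] for n in nodes}
    for u, v in prereq_edges:
        parents[v].append(u)
    # Candidate prerequisites: every upstream ancestor of an erroneous node.
    candidates, stack = set(), [n for n, e in errors.items() if e]
    while stack:
        for p in parents[stack.pop()]:
            if p not in candidates:
                candidates.add(p)
                stack.append(p)
    # L3: rank flows from dependent concepts toward their prerequisites.
    reversed_edges = [(v, u) for u, v in prereq_edges]
    rank = pagerank(nodes, reversed_edges, anomaly)
    return sorted(candidates, key=rank.get, reverse=True)[:top_k]


# Hypothetical prerequisite graph; the student erred only on "fractions".
nodes = ["counting", "addition", "multiplication", "division", "fractions"]
prereq = [("counting", "addition"), ("addition", "multiplication"),
          ("multiplication", "division"), ("multiplication", "fractions"),
          ("division", "fractions")]
chain = trace_weak_chain(nodes, prereq, {"fractions": 1})
```

Reversing the edges for PageRank is the key design choice here: importance accumulates on foundational concepts that many erroneous topics depend on.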

[0012] Furthermore, in step S3, the capability assessment algorithm specifically includes the following steps: M1. Use natural language processing techniques to analyze the text features of students' answers to open-ended questions, including semantic complexity and the frequency and diversity of logical connectors; M2. Guided by Bloom's taxonomy of cognitive objectives, fuse these text features with the feature values of the students' structured answer results through multimodal feature fusion; M3. Use a classifier to map student performance to a quantitative score vector across six cognitive dimensions: remembering, understanding, application, analysis, evaluation, and creation.
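The sketch below illustrates M1 to M3 under stated assumptions: the connector list, the lexical-diversity and sentence-length proxies for semantic complexity, and the hand-set logistic weights are hypothetical stand-ins for a trained classifier. Fusion (M2) is shown as simple feature concatenation, one common form of early fusion.

```python
import math
import re

# Hypothetical connector list; a real system would use a curated lexicon.
CONNECTORS = {"because", "therefore", "however", "thus", "although", "if", "so", "since"}
BLOOM = ["remembering", "understanding", "application", "analysis", "evaluation", "creation"]
BIAS = -2.0  # hypothetical


def text_features(answer: str) -> list:
    """M1: crude proxies for semantic complexity and logical-connector use."""
    tokens = re.findall(r"[a-z']+", answer.lower())
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    used = [t for t in tokens if t in CONNECTORS]
    n = max(len(tokens), 1)
    return [
        len(set(tokens)) / n,               # lexical diversity
        n / max(len(sentences), 1) / 20.0,  # mean sentence length, scaled
        len(used) / n,                      # connector frequency
        len(set(used)) / len(CONNECTORS),   # connector diversity
    ]


def weights_for(level: int) -> list:
    """Hand-set (hypothetical) weights standing in for a trained classifier:
    higher Bloom levels lean more on reasoning shown in text, and each level
    reads its own objective-item accuracy."""
    text_w = [0.5 * level, 0.2 * level, 4.0 * level, 2.0 * level]
    struct_w = [3.0 if j == level else 0.0 for j in range(len(BLOOM))]
    return text_w + struct_w


def assess(answer: str, item_accuracy: list) -> dict:
    """M2: early fusion by concatenation; M3: one logistic unit per Bloom level."""
    x = text_features(answer) + item_accuracy
    return {
        name: 1.0 / (1.0 + math.exp(-(BIAS + sum(w * v for w, v in zip(weights_for(k), x)))))
        for k, name in enumerate(BLOOM)
    }


acc = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3]  # hypothetical per-level accuracy on objective items
with_reasons = assess("The answer is twelve because each row has three seats "
                      "and there are four rows, therefore twelve.", acc)
bare = assess("The answer is twelve. Each row has three seats. There are four rows.", acc)
```

With identical objective-item accuracy, the answer that explains its reasoning scores higher on "analysis", while "remembering" (which ignores text features under these weights) is unchanged.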

[0013] Further, in step S4, the natural language generation algorithm specifically includes the following steps: N1. Input a structured data template containing placeholders, whose content comes from the analysis results of the multi-dimensional analysis engine module; N2. Using a Transformer-based large language model with the structured data template as the basis, select and fill in the most suitable explanatory sentence patterns and vocabulary according to the student's historical reports, current performance differences, and context, and automatically generate coherent, fluent, personalized paragraph text; N3. Apply the system's built-in adjustment parameters for different expression styles to stylize the paragraph text generated in N2, so that its language suits the psychological characteristics and cognitive levels of students in different age groups.
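The following sketch shows the template-filling and styling flow of N1 to N3 using Python's `string.Template`. The Transformer-based model in N2 is replaced by a deterministic phrase selector so that the example runs offline; in a real deployment the filled template and context would be passed to the language model instead. The template wording and the per-age-group style parameters are hypothetical.

```python
from string import Template

# N1: structured template with placeholders (hypothetical wording).
REPORT = Template(
    "Overview: your $subject score this month is $score ($trend_phrase). "
    "Focus next on: $weak_points. $closing"
)

# N3: hypothetical style parameters per age group.
STYLES = {
    "primary": {"max_weak": 1, "closing": "You are doing great, keep going!"},
    "secondary": {"max_weak": 2, "closing": "Steady practice will close these gaps."},
    "vocational": {"max_weak": 3, "closing": "Target these skills before the next assessment."},
}


def trend_phrase(current: int, previous: int) -> str:
    """Stand-in for the N2 language model: choose wording from the change
    versus the student's previous report."""
    delta = current - previous
    if delta >= 5:
        return f"up {delta} points, a clear improvement"
    if delta <= -5:
        return f"down {-delta} points, worth a closer look"
    return "about the same as last time"


def generate_report(analysis: dict, previous_score: int, age_group: str) -> str:
    style = STYLES[age_group]
    return REPORT.substitute(
        subject=analysis["subject"],
        score=analysis["score"],
        trend_phrase=trend_phrase(analysis["score"], previous_score),
        weak_points=", ".join(analysis["weak_chain"][: style["max_weak"]]),
        closing=style["closing"],
    )


# Hypothetical output of the multi-dimensional analysis engine.
analysis = {"subject": "math", "score": 82,
            "weak_chain": ["common denominators", "fraction addition", "ratios"]}
report = generate_report(analysis, previous_score=74, age_group="primary")
```

Younger students see a single focus point and a warmer closing line, while older groups receive more detail, which is one simple way to express the age-group styling of N3.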

[0014] Furthermore, in step S5, the dynamic path optimization algorithm specifically includes the following steps: P1. The set of knowledge nodes currently mastered by students and the set of target nodes expected to be achieved are respectively used as the starting point set and the ending point set of the optimization path; P2. In the knowledge graph, the historical learning speed and knowledge transformation efficiency of individual students are set as edge weight adjustment factors. P3. Using Monte Carlo Tree Search (MCTS) combined with the A* search algorithm, the learning resource sequence with the highest comprehensive evaluation score is found while satisfying the total learning time constraint. The evaluation score is calculated by weighting the smoothness of the overall path difficulty, the logical connection between knowledge points, and the matching degree of similar successful paths in the past.
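Steps P1 and P2, together with the A* half of P3, can be sketched as below. Assumptions: edges point from a concept to the concept it unlocks; the cost of an edge is the target node's baseline study time divided by the product of the student's learning speed and knowledge transformation efficiency; the search stops at the first target reached; and the heuristic (hops to the nearest target times the cheapest edge) is a simple admissible choice. The MCTS layer and the composite evaluation score are omitted here.

```python
import heapq
from collections import deque


def plan_path(graph, base_minutes, mastered, targets, speed, efficiency, budget):
    """A* from any mastered node (P1 start set) to the nearest target node.

    graph: node -> list of nodes it unlocks (prerequisite -> next concept).
    base_minutes: node -> typical study time of that node's learning resource.
    Returns (path, minutes) or None if no target is reachable within budget."""
    factor = 1.0 / (speed * efficiency)  # P2: personal edge-weight adjustment (assumed form)

    def cost(v):
        return base_minutes[v] * factor

    # Hop distance to the nearest target, via BFS over reversed edges.
    reverse = {}
    for u, vs in graph.items():
        for v in vs:
            reverse.setdefault(v, []).append(u)
    hops = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in hops:
                hops[u] = hops[v] + 1
                queue.append(u)

    # Admissible, consistent heuristic: remaining hops x cheapest possible edge.
    min_edge = min(cost(v) for v in base_minutes)

    def h(n):
        return hops[n] * min_edge

    frontier = [(h(s), 0.0, s, [s]) for s in mastered if s in hops]
    heapq.heapify(frontier)
    settled = {}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node in targets:
            return path, g
        if node in settled:
            continue
        settled[node] = g
        for nxt in graph.get(node, []):
            g2 = g + cost(nxt)
            # Skip nodes that cannot reach a target or that break the time budget.
            if nxt in hops and g2 <= budget:
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None


# Hypothetical knowledge graph and study times (minutes).
graph = {"counting": ["addition"], "addition": ["multiplication", "subtraction"],
         "subtraction": ["division"], "multiplication": ["division", "fractions"],
         "division": ["fractions"]}
minutes = {"counting": 10, "addition": 30, "subtraction": 30,
           "multiplication": 40, "division": 50, "fractions": 60}
result = plan_path(graph, minutes, {"counting", "addition"}, {"fractions"},
                   speed=1.0, efficiency=0.5, budget=240)
```

Because the heuristic is consistent, the first time a node leaves the queue its cost is final, so the planner never needs to reopen settled nodes.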

[0015] Compared with existing technologies, the student performance analysis and feedback method and system based on natural language processing proposed in this invention have the following advantages: (1) Through knowledge graphs, deep learning models, source tracing algorithms, and ability assessment algorithms, the invention achieves a comprehensive, refined, and quantifiable in-depth analysis of learning status, spanning from the "past" to the "future" and from surface "appearance" to "root cause" and "cognitive dimension"; (2) A natural language analysis report is generated automatically by a Transformer-based large language model, and Monte Carlo Tree Search (MCTS) combined with the A* search algorithm dynamically plans the optimal sequence of learning-resource interventions in the complex knowledge network, forming a personalized learning path. This makes the feedback content and intervention plan intelligent, dynamic, and highly customized; (3) Through a visual interactive dashboard and a dedicated teacher collaborative intervention module, the system efficiently combines the deep analysis capabilities of artificial intelligence with teachers' professional teaching experience, forming a "human-machine collaborative" decision-making model. In addition, the system continuously collects students' follow-up data on the intervention and uses it to optimize the emotional expression of the generation model and the accuracy of the recommendation algorithm, forming a data-driven, self-evolving closed loop of "analysis-intervention-feedback-optimization" that continuously improves system effectiveness.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0017] Figure 1 is a structural diagram of the student performance analysis and feedback system based on natural language processing according to the present invention; Figure 2 is a flowchart of the workflow of the student performance analysis and feedback method based on natural language processing according to the present invention.

Detailed Implementation

[0018] The technical solutions of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0019] Example 1: As shown in Figure 1, the student performance analysis and feedback system based on natural language processing provided in Embodiment 1 of the present invention includes a data acquisition module, a feature extraction and knowledge graph construction module, a multi-dimensional analysis engine module, a natural language generation and feedback module, a personalized path recommendation module, a visual interactive dashboard module, and a teacher collaborative intervention module. Among these modules, the data acquisition module is connected to the academic affairs database, the learning management system, and the feature extraction and knowledge graph construction module, and is used to collect multi-source heterogeneous data to form structured data and unstructured data. The feature extraction and knowledge graph construction module is connected to the data acquisition module. It is used to receive data from the data acquisition module, extract entities and relationships from unstructured text data, construct a domain knowledge graph centered on students and related to "subject knowledge points - ability performance - cognitive state", and extract statistical features of performance data to form a multi-dimensional feature vector. The multi-dimensional analysis engine module, connected to the feature extraction and knowledge graph construction module, is used to load the knowledge graph and feature vectors constructed by the feature extraction and knowledge graph construction module, predict students' future performance range and potential risk points, reason about and locate the knowledge gaps and conceptual confusions of individual students and groups on the learning path, integrate performance data and text sentiment / cognitive complexity analysis results, and generate a quantitative ability set of students in different cognitive dimensions such as analysis, application, and evaluation. 
The natural language generation and feedback module is connected to the multi-dimensional analysis engine module. It is used to receive the quantitative analysis results from the multi-dimensional analysis engine module and automatically convert the numerical analysis results and graph visualization information into a structured, context-relevant natural language analysis report. The personalized path recommendation module, connected to the multi-dimensional analysis engine module, is used to call the dynamic path optimization algorithm based on the source tracing results of weak knowledge points and the prediction results of the evolution trend of grades. It searches on the knowledge graph and generates a personalized learning intervention path from the student's current cognitive state to the target mastery state, forming path nodes of the learning resource sequence. The visual interactive dashboard module, connected to the multi-dimensional analysis engine module, natural language generation and feedback module, and personalized path recommendation module, is used to present data results in a graphical manner. The teacher collaborative intervention module connects with the multi-dimensional analysis engine module, natural language generation and feedback module, personalized path recommendation module, and visual interactive dashboard module. It provides teachers with a management interface to view the overall analysis overview and abnormal warning list of all students in the class, review and annotate the feedback reports automatically generated by the system, manually adjust and confirm the paths generated by the personalized path recommendation module, and distribute them to designated students.

[0020] Furthermore, in this embodiment 1, the structured data includes students' historical scores for each subject, scores for in-class online quizzes, and completion rates and quality scores for online assignments; the unstructured data includes teachers' online comments, students' online classroom questions, and content from online discussion forums; the entities in the unstructured text data include subject concepts, ability dimensions, and common error types.

[0021] Furthermore, in this embodiment 1, the multi-dimensional analysis engine module includes a performance evolution trend prediction sub-module, a weak knowledge point tracing sub-module, and a cognitive ability multi-dimensional assessment sub-module; Among them, the performance evolution trend prediction submodule uses a deep learning model to predict students' future performance range and potential risk points; the weak knowledge point tracing submodule is based on knowledge graphs and uses tracing algorithms to reason and locate the knowledge gaps and conceptual confusions of individual students and groups on the learning path; The multi-dimensional cognitive ability assessment submodule uses an ability assessment algorithm to integrate academic performance data and text sentiment / cognitive complexity analysis results to generate quantifications of students' performance in different cognitive dimensions of analysis, application, and evaluation.

[0022] Furthermore, in this embodiment 1, the natural language analysis report includes a module overview, specific data insights, targeted learning suggestions, and risk warnings and incentives expressed in gentle and encouraging language; Learning resources include, but are not limited to, micro-lessons, exercises, and reading materials; The graphical presentation methods include line graphs of students' personal growth trends, heat maps of weaknesses based on knowledge graphs, radar charts of multidimensional ability assessments, or Gantt charts of personalized learning paths.

[0023] Specifically, the student performance analysis and feedback system based on natural language processing provided in Embodiment 1 is built from a data acquisition module, a feature extraction and knowledge graph construction module, a multi-dimensional analysis engine module, a natural language generation and feedback module, a personalized path recommendation module, a visual interactive dashboard module, and a teacher collaborative intervention module. The system integrates multi-source heterogeneous data and uses knowledge graphs, deep learning models, source tracing algorithms, ability assessment algorithms, Transformer-based large language models, and Monte Carlo Tree Search (MCTS) combined with the A* search algorithm to conduct a comprehensive, detailed, in-depth analysis of students' learning status. This achieves intelligent, dynamic, and customized learning feedback and intervention, realizing an intelligent educational approach from root-cause diagnosis to precise intervention. Through design, simulation, and verification, the system is packaged as a modular product that can be rapidly ported between platforms, accelerating product development.

[0024] Example 2: As shown in Figure 2, the student performance analysis and feedback method based on natural language processing provided in Embodiment 2 of the present invention uses the student performance analysis and feedback system based on natural language processing described in Embodiment 1, and specifically includes the following steps:
S1. Access the academic affairs database and learning management system periodically or on trigger events, clean, de-identify, and standardize the raw data, and store it in the database; the raw data include students' structured and unstructured data;
S2. Use a pre-trained natural language processing model to process the structured and unstructured data of all students, update the global knowledge graph, and generate a personalized sub-graph for each student with the student's performance feature vector attached;
S3. Based on the personalized sub-graph output in step S2, run a deep learning model and apply the source tracing algorithm and the capability assessment algorithm to generate a set of quantitative analysis results covering the past, present, and future;
S4. Receive the quantitative analysis result set from step S3, generate a structured analysis report using a natural language generation algorithm, and push it to students via email or an integrated application interface;
S5. Based on the quantitative analysis result set from step S3, call the dynamic path optimization algorithm to plan the optimal learning intervention sequence and generate specific path suggestions, including a resource list, a schedule, and expected goals;
S6. Present the outputs of steps S2, S3, S4, and S5 graphically;
S7. Teachers use the management interface to view the student analysis overview and abnormal warning list, review feedback reports, manually adjust and confirm personalized paths, and distribute them to designated students;
S8. The system continuously collects students' follow-up actions in response to the feedback and uses these data as a feedback loop to optimize how well the emotional expression of the natural language generation model fits the students, as well as the accuracy of path recommendations.
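One simple way to close the S8 loop for path recommendation is sketched below: each learning resource keeps a Beta-Bernoulli estimate of how often students improved after using it, and candidate resources are reranked by that estimate. The Beta(1, 1) prior, the binary "improved" signal, and the resource names are assumptions; the patent does not say how follow-up actions are scored.

```python
from dataclasses import dataclass


@dataclass
class ResourceStats:
    successes: float = 1.0  # Beta(1, 1) prior (assumed): no evidence yet
    failures: float = 1.0

    @property
    def success_rate(self) -> float:
        """Posterior mean probability that a student improves after this resource."""
        return self.successes / (self.successes + self.failures)


def record_followup(stats: dict, resource: str, improved: bool) -> None:
    """Fold one follow-up observation (e.g. did the retest score rise?) into the estimate."""
    entry = stats.setdefault(resource, ResourceStats())
    if improved:
        entry.successes += 1
    else:
        entry.failures += 1


def rerank(stats: dict, candidates: list) -> list:
    """Order candidate resources by estimated success rate; unseen ones get the prior 0.5."""
    return sorted(candidates,
                  key=lambda r: stats.get(r, ResourceStats()).success_rate,
                  reverse=True)


# Hypothetical follow-up outcomes collected after earlier interventions.
stats = {}
for outcome in [True, True, True, False]:
    record_followup(stats, "micro_lesson_fractions", outcome)
for outcome in [True, False, False]:
    record_followup(stats, "exercise_set_b", outcome)
ranking = rerank(stats, ["exercise_set_b", "reading_ratios", "micro_lesson_fractions"])
```

The prior keeps a resource with one lucky success from jumping straight to the top, and unseen resources sit at a neutral 0.5 so they still get a chance to be recommended.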

[0025] Furthermore, in this embodiment 2, in step S2, the pre-trained natural language processing model is a BERT model fine-tuned on an educational-domain corpus, used to extract, from teacher comments and student texts, entities with attributes such as "knowledge point mastery", "learning methods", and "attitude performance", together with the "containment", "association", and "hindrance" relationships between them.

[0026] Furthermore, in this embodiment 2, in step S3, the deep learning model is a Long Short-Term Memory (LSTM) network model, which is used to predict the student's future performance range and potential risk points.
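To show how an LSTM consumes a student's score history, here is a forward pass of a single-unit LSTM cell in pure Python, followed by a linear head that turns the final hidden state into a score range. All weights, the ±6-point margin, and the pass mark of 60 are hypothetical and untrained; a real model would be trained on historical score sequences in a deep learning framework, and the margin might instead come from validation residuals.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, untrained parameters for a one-unit LSTM: gate -> (w_x, w_h, bias).
# The forget-gate bias of 1.0 follows the common practice of starting with memory "on".
PARAMS = {"i": (0.5, 0.1, 0.0), "f": (0.4, 0.2, 1.0), "o": (0.6, 0.1, 0.0), "g": (0.8, 0.3, 0.0)}


def lstm_encode(scores, params=PARAMS):
    """Standard LSTM recurrence over scores scaled to [0, 1]; returns the final hidden state."""
    h = c = 0.0
    for s in scores:
        x = s / 100.0
        z = {k: wx * x + wh * h + b for k, (wx, wh, b) in params.items()}
        i, f, o = sigmoid(z["i"]), sigmoid(z["f"]), sigmoid(z["o"])
        g = math.tanh(z["g"])
        c = f * c + i * g     # cell state: keep part of the old memory, add new input
        h = o * math.tanh(c)  # hidden state passed to the next step / output head
    return h


def predict_range(scores, head=(120.0, 10.0), margin=6.0, pass_mark=60.0):
    """Hypothetical linear head maps the hidden state to a point forecast; +/- margin
    gives the range. A risk point is flagged when the lower bound falls below the pass mark."""
    w, b = head
    point = w * lstm_encode(scores) + b
    low, high = point - margin, point + margin
    return low, high, low < pass_mark


low, high, at_risk = predict_range([72, 68, 65, 61])
```

Flagging on the lower bound rather than the point forecast makes the warning deliberately conservative: a student whose likely range dips below the pass mark appears on the teacher's abnormal warning list even if the central estimate passes.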

[0027] Furthermore, in this embodiment 2, in step S3, the source tracing algorithm specifically includes the following steps: L1. Map the student's answer records to the corresponding nodes in the knowledge graph; L2. Calculate an anomaly propagation score for each node using a graph convolutional network (GCN) to identify the prerequisite (upstream dependency) nodes in the graph that are strongly associated with the erroneous knowledge points; L3. Rank the identified prerequisite nodes by importance using the PageRank algorithm, and output the student's most critical and urgent weak knowledge chains.

[0028] Furthermore, in this embodiment 2, in step S3, the capability assessment algorithm specifically includes the following steps: M1. Use natural language processing techniques to analyze the text features of students' answers to open-ended questions, including semantic complexity and the frequency and diversity of logical connectors; M2. Guided by Bloom's taxonomy of cognitive objectives, fuse the text features (i.e., those analyzed in step M1) with the feature values of the students' structured answer results through multimodal feature fusion; M3. Use a classifier to map student performance to a quantitative score vector across six cognitive dimensions: remembering, understanding, application, analysis, evaluation, and creation.

[0029] Furthermore, in this embodiment 2, in step S4, the natural language generation algorithm specifically includes the following steps: N1. Input a structured data template containing placeholders, whose content comes from the analysis results of the multi-dimensional analysis engine module; N2. Using a Transformer-based large language model with the structured data template input in step N1 as the basis, select and fill in the most suitable explanatory sentence patterns and vocabulary according to the student's historical reports, current performance differences, and context, and automatically generate coherent, fluent, personalized paragraph text; N3. Apply the system's built-in adjustment parameters for different expression styles to stylize the paragraph text generated in N2, so that its language suits the psychological characteristics and cognitive levels of students in different age groups.

[0030] Furthermore, in this Embodiment 2, the dynamic path optimization algorithm in step S5 specifically includes the following steps: P1. The set of knowledge nodes currently mastered by the student and the set of target nodes to be achieved are used as the start point set and the end point set of the optimized path, respectively; P2. In the knowledge graph, the individual student's historical learning speed and knowledge transformation efficiency are set as edge weight adjustment factors; P3. Using Monte Carlo Tree Search (MCTS) combined with the A* search algorithm, the learning resource sequence with the highest comprehensive evaluation score is found while satisfying the total learning time constraint. The evaluation score is a weighted combination of the smoothness of the overall path difficulty, the logical connection between knowledge points, and the degree of match with similar successful paths from the past.
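
The A* part of P1 to P3 can be sketched in Python as follows. This is a simplification, not the combined MCTS + A* procedure: it searches from any start node to the first reachable target node, divides each edge's base time by a single hypothetical `learning_speed` factor (P2), and prunes any partial path that exceeds the time budget. The `heuristic` dict is assumed to be admissible (never overestimating the remaining time); an empty dict reduces the search to uniform-cost search. The graph layout and node names are hypothetical.

```python
import heapq


def adjusted_cost(base_time, learning_speed):
    # P2: a faster learner (speed > 1) needs proportionally less time per edge.
    return base_time / learning_speed


def a_star_path(graph, starts, goals, heuristic, learning_speed, time_budget):
    """graph: {node: [(next_node, base_time), ...]}. Returns (path, cost) or None."""
    frontier = [(heuristic.get(s, 0.0), 0.0, s, [s]) for s in starts]
    heapq.heapify(frontier)
    best = {}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node in goals:
            return path, cost
        if best.get(node, float("inf")) <= cost:
            continue  # already expanded this node more cheaply
        best[node] = cost
        for nxt, base in graph.get(node, []):
            new_cost = cost + adjusted_cost(base, learning_speed)
            if new_cost <= time_budget:  # P3: respect the total learning time limit
                f = new_cost + heuristic.get(nxt, 0.0)
                heapq.heappush(frontier, (f, new_cost, nxt, path + [nxt]))
    return None


graph = {
    "monotonicity": [("mvt_geometry", 1.0), ("reverse_construction", 1.8)],
    "mvt_statement": [("mvt_geometry", 0.5)],
    "mvt_geometry": [("mvt_inequality_proof", 1.8)],
    "reverse_construction": [("mvt_inequality_proof", 1.8)],
}
result = a_star_path(graph, {"monotonicity", "mvt_statement"},
                     {"mvt_inequality_proof"}, {}, learning_speed=1.0, time_budget=8.0)
```

Here `result` is the path mvt_statement → mvt_geometry → mvt_inequality_proof with a cost of about 2.3 time units. With `time_budget=2.0` every route is pruned and the function returns `None`.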

[0031] Specifically, in this Embodiment 2, the deep learning model, the source tracing algorithm, the capability assessment algorithm, the Transformer large language model, and the Monte Carlo Tree Search (MCTS) combined with the A* search algorithm are all implemented in standard Python under the Windows operating system.

[0032] Figure 2 shows the workflow of the student performance analysis and feedback method based on natural language processing provided in Embodiment 2 of the present invention. First, the system cleans, de-identifies, and standardizes the raw data and stores it in the database. It then generates a personalized sub-graph for each student, with the student's individual performance feature vector attached. Next, it runs a deep learning model and applies the source tracing and capability assessment algorithms to generate a quantitative analysis result set. It then uses a natural language generation algorithm to produce a structured analysis report and pushes it to the student. Next, it calls a dynamic path optimization algorithm to generate specific path suggestions and presents the resulting data graphically. The teacher then reviews the feedback report, adjusts and confirms it, and distributes it to the designated students. Finally, the system continuously collects student feedback data and uses it to keep optimizing the models.

[0033] Specifically, the student performance analysis and feedback method and system based on natural language processing (NLP) provided by Embodiment 1 and shown in Figure 2 is implemented in Python on the Windows operating system. It adopts a modular design comprising a data acquisition module, a feature extraction and knowledge graph construction module, a multi-dimensional analysis engine module, a natural language generation and feedback module, a personalized path recommendation module, a visual interactive dashboard module, and a teacher collaborative intervention module. The system integrates multi-source heterogeneous data and uses knowledge graphs, deep learning models, source tracing algorithms, capability assessment algorithms, Transformer large language models, and Monte Carlo Tree Search (MCTS) combined with the A* search algorithm to conduct a comprehensive, detailed, and in-depth analysis of students' learning status. This enables intelligent, dynamic, and customized learning feedback and intervention, realizing an intelligent educational approach that runs from root-cause diagnosis to precise intervention.

[0034] To further illustrate the student performance analysis and feedback method based on natural language processing provided in Embodiment 2 of the present invention, the following example uses the first-year, first-semester undergraduate course "Advanced Mathematics (Volume 1)" at a key university in China as the application scenario. It demonstrates in detail how the system provides students with accurate, personalized learning feedback and intervention by mining and analyzing learning data from core chapters such as limits, derivatives, and integrals.

[0035] The system is deployed in a SaaS model on the university's private cloud platform. Through the campus data platform, it has established secure, real-time data interfaces with the academic affairs management system, the small private online course (SPOC) platform, and the online assignment system MathXL. Accounts were created for the course instructors of "Advanced Mathematics (Volume 1)" (3 lead lecturers and 5 teaching assistants) and the 1,200 enrolled freshmen. A global knowledge graph for "Advanced Mathematics (Volume 1)" was initialized from the syllabus. The top-level nodes of the graph include: [Functions and Limits], [Derivatives and Differentials], [Mean Value Theorem and Applications of Derivatives], [Indefinite Integrals], [Definite Integrals and Their Applications], and [Differential Equations]. Each top-level node is further subdivided into three levels of sub-nodes, for example [Derivatives and Differentials] → [Derivative Calculation] → [Differentiation of Composite Functions (Chain Rule)].

[0036] Step S1: Multi-source data acquisition and cleaning for "Advanced Mathematics (Volume 1)". Data sources and collection: 1. Academic affairs system: collect structured data for the 1,200 students, including entrance mathematics placement scores, mid-term and final exam scores, and scores from four chapter tests (limits, derivatives, mean value theorem, integrals). 2. SPOC platform: periodically collect unstructured and behavioral data via API, including (1) video learning logs: viewing duration and the pause/replay sequence for each micro-lecture video (e.g., "Using equivalent infinitesimals to find limits"); (2) discussion forum text: questions posted by students, such as "Teacher, how do I correctly construct the auxiliary function f(x) when proving inequalities with the Lagrange mean value theorem?", and the corresponding replies; (3) text answers: answers to open-ended questions (e.g., "Briefly describe the geometric meaning of the Newton-Leibniz formula and its conditions of application"). 3. Online homework system: collect the result (correct/incorrect), the answer time, and the distribution of incorrect options for each question in each homework assignment. 4. Data preprocessing: all grade data are standardized to a percentage scale; Chinese word segmentation and stop-word removal are performed on text data (discussion posts, answers); video behavior data are converted into feature indicators such as "effective learning time" (excluding long periods of inactivity) and "frequency of relearning difficult points".
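
The preprocessing in item 4, together with the de-identification mentioned for step S1, might look like the following Python sketch. The 120-second inactivity threshold, the salted SHA-256 pseudonymization, and the use of segment labels for replays are assumptions for illustration; the source does not specify how "long periods of inactivity" or de-identification are defined.

```python
import hashlib


def to_percentage(score, full_marks):
    """Standardize a raw score to a percentage scale, one decimal place."""
    return round(100.0 * score / full_marks, 1)


def pseudonymize(student_id, salt):
    # De-identification (assumed scheme): salted SHA-256 digest replaces the real ID.
    return hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()[:16]


def effective_learning_time(event_times, idle_threshold=120):
    """Sum gaps (seconds) between consecutive player events, skipping gaps longer
    than idle_threshold, which are treated as inactivity."""
    times = sorted(event_times)
    return sum(b - a for a, b in zip(times, times[1:]) if b - a <= idle_threshold)


def relearn_frequency(replayed_segments, difficult_segments):
    """Count replays that fall on segments tagged as difficult points."""
    return sum(1 for seg in replayed_segments if seg in difficult_segments)
```

For example, player events at 0, 60, 100, 1000, and 1030 seconds give 130 seconds of effective time, because the 900-second gap is discarded as inactivity.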

[0037] Step S2: Construction and feature extraction of the domain knowledge graph for "Advanced Mathematics (Volume 1)". Entity and relation extraction: 1. Model invocation: a BERT model fine-tuned on corpora of higher-education mathematics textbooks, academic papers, and past postgraduate entrance examination questions is used to process text data from students and teachers.

[0038] For example, from the teacher's comment "Student B has weak geometric intuition for the Mean Value Theorem, which makes it difficult to construct auxiliary functions flexibly in proof problems," the system extracts the entities [Mean Value Theorem] (concept), [geometric intuition] (understanding dimension), [proof problem] (problem type), and [constructing auxiliary functions] (method/ability), and identifies the relationships [weak in], [leads to], and [applied to].

[0039] From a student's question, "When calculating ... this type of integral, why is integration by parts sometimes used, and sometimes substitution?", the system extracts the entities [Integration by parts], [Integration by substitution], and [Selection of integration techniques], and identifies the relationships [Comparison] and [Applicable to].

[0040] Building personalized student profiles: taking student B as an example, the system uses the extracted entities as nodes and the relationships as edges, and associates them with student B's structured data (for example, a score of 65% on the chapter quiz "Proof of the Differential Mean Value Theorem", compared with a class average of 75%) to form a personalized "Advanced Mathematics (Volume 1)" knowledge sub-graph for student B. In this graph, the node [Constructing Auxiliary Functions] is marked "Mastery: Low" and is linked to the nodes [Lagrange Mean Value Theorem], [Rolle's Theorem], and [Function Morphology Analysis]. At the same time, the system generates a multi-dimensional feature vector describing student B, including computational proficiency (0.8), depth of theorem comprehension (0.6), proof construction ability (0.5), and stability of learning engagement (0.7).
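
One possible in-memory representation of such a personalized sub-graph is sketched below. The class name, the relation label "associated_with", and the feature keys are hypothetical; the mastery mark and the feature values are taken from the student B example above.

```python
from dataclasses import dataclass, field


@dataclass
class StudentSubgraph:
    """Hypothetical container for one student's personalized knowledge sub-graph."""
    student_id: str
    mastery: dict = field(default_factory=dict)   # node -> "Low" / "Medium" / "High"
    edges: set = field(default_factory=set)       # (head, relation, tail) triples
    features: dict = field(default_factory=dict)  # named multi-dimensional feature vector

    def mark(self, node, level):
        self.mastery[node] = level

    def relate(self, head, relation, tail):
        self.edges.add((head, relation, tail))

    def weak_nodes(self):
        return sorted(n for n, lvl in self.mastery.items() if lvl == "Low")

    def neighbours(self, node):
        heads = {h for h, _, t in self.edges if t == node}
        tails = {t for h, _, t in self.edges if h == node}
        return sorted(heads | tails)


b = StudentSubgraph("student_B")
b.mark("Constructing Auxiliary Functions", "Low")
for other in ("Lagrange Mean Value Theorem", "Rolle's Theorem", "Function Morphology Analysis"):
    b.relate("Constructing Auxiliary Functions", "associated_with", other)
b.features.update({"computational_proficiency": 0.8, "theorem_comprehension_depth": 0.6,
                   "proof_construction_ability": 0.5, "engagement_stability": 0.7})
```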

[0041] Step S3: In-depth analysis using the multi-dimensional analysis engine (advanced mathematics scenario). The analysis engine runs three sub-modules in parallel to analyze student B: 1. Performance evolution trend prediction sub-module: uses a Long Short-Term Memory (LSTM) network model.

[0042] Input: Student B's score sequence in four tests: "Limits", "Derivative Calculation", "Derivative Applications", and "Mean Value Theorem" [82, 78, 70, 65].

[0043] Analysis: The model identified a monotonically decreasing trend in scores with an accelerating rate of decline. Based on time series prediction, the output prediction result is: In the test of the next chapter, "Indefinite Integrals," the predicted score range is [58, 68], with an 85% probability of falling into the high-risk zone of "Learning Difficulty (<70 points)." The system generates an orange alert.
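
The LSTM itself is not reproduced here. As a hedged illustration of how a predicted range, a risk probability, and an alert colour could be derived from the model's output, the sketch assumes the model can produce a set of sampled next-test scores (for example, by repeated stochastic forward passes). The 5th to 95th percentile interval and the alert thresholds are hypothetical; the source only states that an 85% risk probability produced an orange alert.

```python
import statistics


def risk_summary(predicted_samples, risk_line=70.0):
    """Summarize sampled next-test predictions into an interval and a risk probability."""
    cuts = statistics.quantiles(predicted_samples, n=20)  # 5th ... 95th percentiles
    interval = (cuts[0], cuts[-1])
    p_risk = sum(1 for s in predicted_samples if s < risk_line) / len(predicted_samples)
    return interval, p_risk


def alert_level(p_risk):
    # Hypothetical thresholds, chosen so that 0.85 maps to "orange" as in the example.
    if p_risk >= 0.9:
        return "red"
    if p_risk >= 0.6:
        return "orange"
    if p_risk >= 0.3:
        return "yellow"
    return "none"
```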

[0044] 2. Weak knowledge point source tracing sub-module. Input: all of the questions student B answered incorrectly in the "Mean Value Theorem" unit test.

[0045] Tracing process: L1 (Mapping): the incorrectly answered question "Prove: when x > 0, ..." is mapped to the knowledge graph nodes [Application of Lagrange's Mean Value Theorem] and [Construction of Auxiliary Functions].

[0046] L2 (Graph convolution propagation): anomaly scores are propagated within the graph neighborhood of these nodes using a Graph Convolutional Network (GCN). The results show that not only is the anomaly score of the target node high (8.9/10), but its upstream nodes also show significant anomalies: [Geometric interpretation of the Lagrange Mean Value Theorem] (7.5/10) and the more fundamental [Determining function monotonicity from the sign of the derivative] (6.8/10).

[0047] L3 (Ranking and output): the PageRank algorithm ranks the set of anomalous nodes by importance and outputs the most critical weak knowledge chain: [Weak foundation in function morphology analysis] → [Unstable understanding of the geometric interpretation of the Mean Value Theorem] → [Inability to effectively construct auxiliary functions for inequality forms]. The source tracing report states clearly that the root cause of the problem is not the Mean Value Theorem itself, but insufficient analytical ability in applying derivatives.
3. Cognitive ability multi-dimensional assessment sub-module. Evaluation process: M1 (Text analysis): student B's response to the open-ended question "Discuss your understanding of the 'limit process' in the definition of the derivative" was analyzed. Natural language processing (NLP) techniques revealed that the response was mainly descriptive, lacked explanatory phrases such as "essentially describes..." or "the key lies in...", and used a limited range of logical connectors.

[0048] M2 (Multimodal Fusion): This method fuses the text feature vectors mentioned above with features such as accuracy and speed in answering structured questions (e.g., calculation problems that use the definition of derivatives to find limits).

[0049] M3 (Cognitive dimension mapping): based on Bloom's Taxonomy, the classifier outputs student B's quantitative scores across the six dimensions: Memory 88, Comprehension 75, Application 72, Analysis 55, Evaluation 48, Creation 30. The results show that student B is at a critical bottleneck in the transition from lower-order cognition (memory, comprehension) to higher-order cognition (application, analysis), with markedly lower scores in "analysis" (breaking down complex problems) and "evaluation" (judging the merits of methods). This directly leads to the student's difficulties with proofs and comprehensive application problems.

[0050] Step S4: Generating a personalized analysis report for advanced mathematics. Template filling and generation: the system calls a rigorous, rational, yet constructive report template designed for university science and engineering students. The Transformer-based large language model receives all of the analysis results from step S3 and, with reference to student B's past reports, transforms the quantitative data into coherent and insightful text.

[0051] Example core paragraphs of the generated report: Hello, the analysis of your learning data for "Advanced Mathematics (Volume 1)" is complete. The data show that you encountered significant difficulties in the "Differential Mean Value Theorem" unit, which may be a key factor behind the recent downward trend in your grades.

[0052] A deeper diagnosis shows that, for the points lost on proof questions, the surface-level reason is unfamiliarity with the technique of "constructing auxiliary functions." However, root cause analysis indicates that the deeper reason may be insufficient sensitivity to a fundamental tool: "the correspondence between the sign of a function's derivative and its monotonicity." This makes it difficult to work out, from the form of the inequality to be proved, which function must be constructed in order to apply the Mean Value Theorem. This is a typical challenge in the shift of thinking required when moving from "computational application" to "logical analysis."

[0053] Your cognitive ability map shows that you have significant room for improvement in the "analysis" and "evaluation" dimensions. It is recommended that you temporarily refrain from pursuing a large number of challenging problems and instead focus on "detailed analysis training": for each incorrect problem, not only redo it, but also write down "why I thought of this solution," "are there other paths," and "what are the advantages and disadvantages of each path." This will systematically upgrade your mathematical thinking patterns.

[0054] Push notification: this detailed report of more than 1,200 words was pushed to student B via the email address linked to the school's unified identity authentication and via the SPOC internal messaging system.

[0055] Step S5: Planning a personalized improvement path for advanced mathematics. Path optimization algorithm execution: P1 (Defining start and end points): start point set = {Node A (Determining the monotonicity of a function, mastery level 68%), Node B (Statement of the Lagrange Mean Value Theorem, mastery level 90%)}; end point set = {Node C (Proving inequalities using the Mean Value Theorem, mastery level > 85%), Node D (Method selection in comprehensive problems, mastery level > 80%)}.

[0056] P2 (Setting weights): based on student B's historical learning data, the edge representing watching a "Concept Explanation" video and completing the basic exercises is assigned a weight of 1.0 time units, while the edge representing completing a set of "Reverse Construction Analysis" specialized training is assigned a weight of 1.8 time units.

[0057] P3 (Optimization with Monte Carlo Tree Search (MCTS) combined with the A* algorithm): under the constraint of "4 hours of advanced mathematics study time available per week for the next two weeks", the algorithm simulates and evaluates tens of thousands of virtual paths (e.g., watching a geometric animation → doing 3 basic monotonicity questions → carefully studying the decomposition of the construction idea in 1 classic example → completing 2 semi-construction questions that "give the conclusion and derive the function" → independently completing 2 complete proof questions...).
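
The tree search itself is not shown, but the comprehensive evaluation score it uses to compare candidate paths can be sketched. Under stated assumptions: resource difficulties are normalized to [0, 1], "logical connection" is the fraction of consecutive steps that are prerequisite edges, "match with past successful paths" is the best Jaccard overlap, and the 0.4/0.3/0.3 weights are hypothetical. A path whose total duration exceeds the budget scores negative infinity.

```python
def smoothness(difficulties):
    """1 minus the mean absolute difficulty jump between consecutive steps."""
    if len(difficulties) < 2:
        return 1.0
    jumps = [abs(b - a) for a, b in zip(difficulties, difficulties[1:])]
    return 1.0 - sum(jumps) / len(jumps)


def connection(path, prereq_edges):
    """Fraction of consecutive steps that follow a prerequisite edge."""
    pairs = list(zip(path, path[1:]))
    return sum(1 for p in pairs if p in prereq_edges) / len(pairs) if pairs else 1.0


def similarity(path, successful_paths):
    """Best Jaccard overlap with any previously successful path."""
    s = set(path)
    return max((len(s & set(q)) / len(s | set(q)) for q in successful_paths), default=0.0)


def path_score(path, difficulty, durations, prereq_edges, successful_paths,
               budget, w=(0.4, 0.3, 0.3)):
    if sum(durations[n] for n in path) > budget:
        return float("-inf")  # violates the total learning time constraint
    return (w[0] * smoothness([difficulty[n] for n in path])
            + w[1] * connection(path, prereq_edges)
            + w[2] * similarity(path, successful_paths))


path = ["animation", "basic_monotonicity", "example_study"]
difficulty = {"animation": 0.2, "basic_monotonicity": 0.3, "example_study": 0.5}
durations = {n: 1.0 for n in path}
edges = {("animation", "basic_monotonicity"), ("basic_monotonicity", "example_study")}
score = path_score(path, difficulty, durations, edges, [path], budget=8.0)
```

In an MCTS setting, a function like this would typically be evaluated at the end of each simulated rollout, and the resulting scores propagated back up the search tree.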

[0058] Path generation scheme: The algorithm ultimately generates a personalized 12-day route, with the core sequence as follows: Days 1-2 (Consolidating the foundation): Review the micro-lesson "The sign of the derivative and the changes in the function shape", and complete 5 special questions on "determining the monotonic intervals of f(x) solely through the graph of f'(x)".

[0059] Days 3-5 (Bridge Building): Learn the interactive case study "Geometric Demonstration of the Mean Value Theorem: Which chord is parallel to the tangent?", and complete 3 transitional exercises "Given f(x) and the conclusion, please write the expression of the Lagrange Mean Value Theorem and find the possible ξ".

[0060] Days 6-9 (Core Breakthrough): Practice the reverse derivation of steps in two classic inequality proof problems (the system will shuffle and hide parts of the standard solution steps, and ask student B to select or fill in "What is the purpose of the previous step?").

[0061] Days 10-12 (Integrated Application): Independently complete 3 inequality proof problems of progressive difficulty. After completing each problem, the system will push a "variation problem" (such as changing the form or conclusion of the inequality) and ask: "Is your method still valid? If not, how should you adjust it?" Output: The path is presented to student B and their instructor in the form of a Gantt chart and a list, with direct links to all resources.

[0062] Steps S6 and S7: Teacher-led visual intervention. Teacher dashboard: after the teacher logs in, the system dashboard displays the students' mastery of each knowledge point as a heatmap. The teacher immediately sees that the "Application of the Mean Value Theorem" node has changed from yellow to red. When the teacher clicks it, the system automatically clusters a group of 15 students, including student B, who have difficulty constructing auxiliary functions.
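
A minimal sketch of the dashboard logic, assuming per-student mastery values in [0, 1] for the selected node. The colour cut-offs (0.6 and 0.8) and the grouping threshold are hypothetical, and a simple threshold filter stands in for the clustering step described above.

```python
def heat_colour(mastery):
    # Hypothetical cut-offs for the heatmap colours.
    if mastery < 0.6:
        return "red"
    if mastery < 0.8:
        return "yellow"
    return "green"


def node_colour(class_mastery):
    """Colour of a knowledge node from the class-average mastery."""
    return heat_colour(sum(class_mastery.values()) / len(class_mastery))


def difficulty_group(class_mastery, threshold=0.6):
    """Students below the threshold on this node, as a stand-in for clustering."""
    return sorted(s for s, m in class_mastery.items() if m < threshold)
```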

[0063] Review and intervention: the teacher can review the system-generated analysis reports for these 15 students one by one. In the suggestions section of student B's report, the teacher adds the annotation: "Students can be recommended to refer to the geometric interpretation of the 'Fundamental Theorem of Differential Calculus' in Chen Jixiu's 'Mathematical Analysis' to strengthen their intuitive understanding." Based on the general improvement path generated by the system for this "difficulty group", the teacher makes a manual adjustment: a group collaboration task is inserted into the path ("in groups of three, discuss and summarize three common motivations for constructing auxiliary functions: simplification, shape matching, and zero points"), and this task is set as the Day 3 activity of the path.

[0064] The teacher confirms with one click and distributes the adjusted route plan to the 15-person group.

[0065] Step S8: Closed-loop feedback and model optimization. After student B follows this path, the new answer data (such as accuracy in the "reverse derivation training" and performance on "variant questions") and the time spent on the new resources are collected by the system. These data are used for: 1. Optimizing the large language model (LLM): verifying whether the expression "mindset transformation challenge" in the report actually helps the student identify with the diagnosis and motivates improvement, and optimizing the "cause attribution" phrasing strategy in future reports.

[0066] 2. Optimizing the path algorithm: student B's actual completion efficiency is used to revise the algorithm's time-weight estimation model for cognitive transition tasks such as "from understanding to analysis", so that the path durations predicted for similar students in the next iteration are more accurate.
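
One simple way to revise the time weights from observed completion times is an exponential moving average, sketched below. The update rule, the learning rate of 0.3, and the task-type keys are assumptions; the source does not specify the estimation model.

```python
def update_time_weight(estimate, observed, rate=0.3):
    """Exponential moving average: move the estimate toward the observed time."""
    return estimate + rate * (observed - estimate)


def update_all(weights, observations, rate=0.3):
    """weights: {task_type: time units}; observations: [(task_type, observed_time)].

    A task type seen for the first time starts from its observed time.
    """
    updated = dict(weights)
    for task, observed in observations:
        updated[task] = update_time_weight(updated.get(task, observed), observed, rate)
    return updated
```

For example, if a "reverse construction" edge was weighted 1.8 time units and student B actually needed 2.4, one update raises the estimate to 1.98.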

[0067] Through the detailed explanation of the above eight steps, this embodiment fully demonstrates how the invention, starting with data collection, undergoes deep intelligent analysis, and ultimately forms a precise educational governance closed loop of "analysis-feedback-intervention-re-optimization" for higher mathematics teaching. The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for student performance analysis and feedback based on natural language processing, characterized in that it comprises the following steps: S1. periodically, or upon a trigger, accessing the academic affairs database and the learning management system to clean, de-identify, and store the raw data in standardized form; S2. using a pre-trained natural language processing model to process the structured and unstructured data of all students, updating the global knowledge graph, and generating for each student a personalized sub-graph with that student's performance feature vector attached; S3. based on the personalized sub-graph output in step S2, running a deep learning model and applying a source tracing algorithm and a capability assessment algorithm to generate a set of quantitative analysis results covering the past, present, and future; S4. receiving the quantitative analysis result set from step S3, generating a structured analysis report using a natural language generation algorithm, and pushing it to students via email or an integrated application interface; S5. based on the quantitative analysis result set of step S3, calling a dynamic path optimization algorithm to dynamically plan the optimal learning intervention sequence and generating specific path suggestions including a resource list, a time schedule, and expected goals; S6. presenting the output results of steps S2, S3, S4, and S5 graphically; S7. teachers using the management interface to view the student analysis overview and the abnormal warning list, review feedback reports, manually adjust and confirm personalized paths, and distribute them to designated students; S8. the system continuously collecting students' follow-up actions on the feedback and using these data as a feedback loop to optimize the emotional expression fit of the natural language generation model and the accuracy of path recommendations.

2. The student performance analysis and feedback method based on natural language processing as described in claim 1, characterized in that, in step S2, the pre-trained natural language processing model is a BERT model fine-tuned on educational corpora, and is used to extract, from teacher comments and student texts, entities with attributes such as "mastery of knowledge points", "learning methods", and "attitude performance", together with their "containment", "association", and "hindrance" relationships.

3. The student performance analysis and feedback method based on natural language processing as described in claim 1, characterized in that, In step S3: The deep learning model is a Long Short-Term Memory (LSTM) network model, used to predict students' future performance range and potential risk points. The source tracing algorithm specifically includes the following steps: L1. Map students' answer records to corresponding nodes in the knowledge graph; L2. Calculate the anomaly propagation score of each node using the Graph Convolutional Network (GCN) to identify the preceding dependent nodes in the knowledge graph that are strongly associated with the erroneous knowledge points. L3. Based on the PageRank algorithm, the importance of the identified prerequisite nodes is ranked, and the most critical and urgent weak knowledge chains of students are output.

4. The student performance analysis and feedback method based on natural language processing as described in claim 1, characterized in that, in step S3, the capability assessment algorithm specifically includes the following steps: M1. Analyze the text features of students' answers to open-ended questions using natural language processing techniques, including semantic complexity and the frequency and diversity of logical connectors; M2. Guided by Bloom's taxonomy of cognitive objectives, fuse the above text features with the feature values of the students' structured answer results through multimodal feature fusion; M3. Use a classifier to map student performance to a quantitative score vector across six cognitive dimensions: memory, comprehension, application, analysis, evaluation, and creation.

5. The student performance analysis and feedback method based on natural language processing as described in claim 1, characterized in that, In step S4: The natural language generation algorithm specifically includes the following steps: N1. Input a structured data template, which contains placeholders and whose content comes from the analysis results of the multi-dimensional analysis engine module. N2. Based on the Transformer-based large language model, and using the structured data template as a foundation, the model selects and fills in the most suitable explanatory sentences and vocabulary according to the student's historical reports, current performance differences, and contextual background, and automatically generates coherent, fluent, and personalized paragraph text. N3. The system uses built-in adjustment parameters for various expression styles to stylize the paragraph text generated by N2, making its language expression more suitable for the psychological characteristics and cognitive levels of students of different age groups.

6. The student performance analysis and feedback method based on natural language processing as described in claim 1, characterized in that, In step S5: The dynamic path optimization algorithm specifically includes the following steps: P1. The set of knowledge nodes currently mastered by students and the set of target nodes expected to be achieved are respectively used as the starting point set and the ending point set of the optimization path; P2. In the knowledge graph, the historical learning speed and knowledge transformation efficiency of individual students are set as edge weight adjustment factors. P3. Using Monte Carlo Tree Search (MCTS) combined with the A* search algorithm, the learning resource sequence with the highest comprehensive evaluation score is found while satisfying the total learning time constraint. The evaluation score is calculated by weighting the smoothness of the overall path difficulty, the logical connection between knowledge points, and the matching degree of similar successful paths in the past.

7. A system using the student performance analysis and feedback method based on natural language processing as described in any one of claims 1-6, characterized in that, It includes a data acquisition module, a feature extraction and knowledge graph construction module, a multi-dimensional analysis engine module, a natural language generation and feedback module, a personalized path recommendation module, a visual interactive dashboard module, and a teacher collaborative intervention module; The data acquisition module is connected to the academic affairs database, the learning management system, and the feature extraction and knowledge graph construction module, and is used to collect multi-source heterogeneous data to form structured data and unstructured data. The feature extraction and knowledge graph construction module is connected to the data acquisition module. It is used to receive data from the data acquisition module, extract entities and relationships from unstructured text data, construct a domain knowledge graph centered on students and associated with "subject knowledge points - ability performance - cognitive state", and extract statistical features of performance data to form a multi-dimensional feature vector. The multi-dimensional analysis engine module is connected to the feature extraction and knowledge graph construction module. It is used to load the knowledge graph and feature vectors constructed by the feature extraction and knowledge graph construction module, predict the future performance range and potential risk points of students, reason and locate the knowledge gaps and conceptual confusions of individual students and groups on the learning path, integrate performance data and text sentiment / cognitive complexity analysis results, and generate a quantitative ability set of students in analysis, application, evaluation and different cognitive dimensions. 
The natural language generation and feedback module is connected to the multi-dimensional analysis engine module and is used to receive the quantitative analysis results from the multi-dimensional analysis engine module and automatically convert the numerical analysis results and graph visualization information into a structured, context-relevant natural language analysis report. The personalized path recommendation module is connected to the multi-dimensional analysis engine module. Based on the source tracing results of weak knowledge points and the prediction results of the performance evolution trend, it calls the dynamic path optimization algorithm to search and generate a personalized learning intervention path from the student's current cognitive state to the target mastery state on the knowledge graph, forming path nodes of the learning resource sequence. The visual interactive dashboard module is connected to the multi-dimensional analysis engine module, the natural language generation and feedback module, and the personalized path recommendation module, and is used to present the data results in a graphical manner. The teacher collaborative intervention module is connected to the multi-dimensional analysis engine module, the natural language generation and feedback module, the personalized path recommendation module, and the visual interactive dashboard module. It provides teachers with a management interface to view the overall analysis overview and abnormal warning list of all students in the class, review and annotate the feedback reports automatically generated by the system, manually adjust and confirm the paths generated by the personalized path recommendation module, and distribute them to designated students.

8. The system for student performance analysis and feedback based on natural language processing as described in claim 7, characterized in that, The structured data includes students' historical scores for each subject, scores for in-class online quizzes, and completion rates and quality scores for online assignments. The unstructured data includes teachers' online comments text, students' online classroom questions, and online discussion forum content; The entities in the unstructured text data include subject concepts, ability dimensions, and common error types.

9. The system for student performance analysis and feedback based on natural language processing as described in claim 7, characterized in that, The multi-dimensional analysis engine module includes a performance evolution trend prediction sub-module, a weak knowledge point tracing sub-module, and a cognitive ability multi-dimensional assessment sub-module; The performance evolution trend prediction submodule uses a deep learning model to predict students' future performance range and potential risk points; The weak knowledge point tracing submodule is based on knowledge graphs and uses tracing algorithms to reason and locate the knowledge gaps and conceptual confusions of individual students and groups in their learning paths; The multi-dimensional cognitive ability assessment submodule uses an ability assessment algorithm to integrate performance data and text sentiment / cognitive complexity analysis results to generate a quantitative assessment of students' cognitive dimensions in analysis, application, and evaluation.

10. The system for student performance analysis and feedback based on natural language processing as described in claim 7, characterized in that, The natural language analysis report includes a module overview, specific data insights, targeted learning suggestions, and risk warnings and incentives expressed in gentle and encouraging language; The learning resources include, but are not limited to, micro-lessons, exercises, and reading materials; The graphical presentation methods include line graphs of students' personal growth trends, heat maps of weaknesses based on knowledge graphs, radar charts of multidimensional ability assessments, or Gantt charts of personalized learning paths.