A multi-dimensional problem relationship enhanced programming knowledge tracing method

CN122242689A · Pending · Publication Date: 2026-06-19 · JIANGXI NORMAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
JIANGXI NORMAL UNIV
Filing Date
2026-04-28
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing programming knowledge tracing methods have limited dimensions in modeling relationships between programming problems, and the information sources for knowledge state modeling are singular, making it difficult to comprehensively depict the evolution of students' programming abilities and predict their performance on subsequent programming problems.

Method used

We employ a multi-dimensional problem-relationship enhancement approach to programming knowledge tracing. By generating embedding vectors in three dimensions—problem-skill, programming problem text, and code semantics—through a problem representation module, and combining a long short-term memory network and a dual attention mechanism, we construct a relationship matrix between programming problems and a sequence of students' historical interactions, thereby achieving a comprehensive characterization of students' knowledge status.

Benefits of technology

It improves the accuracy of predicting students' performance on subsequent programming questions, comprehensively captures the deep internal connections of programming questions, and provides more scientific and personalized learning guidance.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a multi-dimensional problem-relationship-enhanced programming knowledge tracing method, belonging to the fields of computer education and artificial intelligence technology. It addresses two limitations of existing programming knowledge tracing methods: the limited dimensions used to model relationships between problems, and the single information source used to model knowledge states. The method first acquires a dataset of student programming behaviors and formalizes it into programming interaction events. It then constructs a model comprising a problem representation module, a problem relationship modeling module, and a sequence modeling module. These three modules respectively generate multi-dimensional embedding vectors for programming problems, construct multi-dimensional problem relationship matrices, and combine an LSTM with a dual attention mechanism to model student knowledge states and predict answer performance. The method also includes model training and evaluation steps. By combining multi-dimensional problem relationship modeling with dual-attention knowledge state construction, the invention captures deep associations between programming problems, characterizes students' knowledge mastery more comprehensively, and improves answer prediction accuracy. It is suitable for knowledge tracing scenarios in programming education.

Description

Technical Field

[0001] This invention relates to the fields of computer education and artificial intelligence technology, specifically to a programming knowledge tracing method enhanced by multi-dimensional problem relationships.

Background Technology

[0002] Knowledge tracing dynamically models learners' knowledge mastery based on their sequences of interactions with practice exercises, and predicts their performance on future tasks. With the widespread adoption of online learning and assessment systems, programming education has generated large-scale interaction data, including students' answer records, submitted code, assessment results, and the skill tags associated with each problem. Programming knowledge tracing extends knowledge tracing to programming education: it aims to characterize the evolution of students' programming abilities over time by analyzing their historical programming interaction data, and to predict whether they will answer subsequent programming problems correctly.

[0003] In programming knowledge tracing tasks, fully leveraging the correlation between questions is key to improving prediction performance. Students' learning process involves knowledge transfer, meaning their performance on new questions is often related to their past learning and answering experience with similar questions. Therefore, if the model can accurately identify the correlation between the current question and past questions, it can transfer knowledge of similar past questions to the current prediction, thereby improving prediction accuracy.

[0004] However, existing knowledge tracing methods have the following limitations in modeling the relevance of problems: 1. Limited dimensions in problem relationship modeling. Existing programming knowledge tracing methods typically do not explicitly model the relationships between programming problems, or measure them along a single dimension only, such as problem-skill tags or problem text, and so fail to characterize the connections between programming problems from multiple perspectives. In practice, the code written for programming problems carries semantic information such as syntax rules, data structures, and algorithmic logic. This information reflects the inherent connections between programming problems in terms of problem-solving approach and implementation. Because existing methods do not characterize problem relationships along the code semantics dimension, they struggle to capture these deeper connections.

[0005] 2. Limited sources of information for knowledge state modeling. Existing methods typically model students' knowledge states based solely on their historical interaction sequences or solely on the relationships between problems, without integrating the two. This makes it difficult to depict students' knowledge mastery fully.

Summary of the Invention

[0006] To address the shortcomings of existing technologies, this invention provides a multi-dimensional problem-relationship enhancement programming knowledge tracing method, which aims to solve the problems mentioned in the background technology.

[0007] To achieve the above objectives, the present invention provides the following technical solution: a multi-dimensional problem-relationship-enhanced programming knowledge tracing method, comprising the following steps. A student programming behavior dataset is acquired, containing code submission records of multiple students on multiple programming problems; each code submission record includes a problem description, submitted code, skill tags, and evaluation results. The code submission records are formalized into programming interaction events represented as quadruples, each comprising a programming problem, the set of skills associated with that problem, the code text submitted by the student, and the student's answer result; the programming interaction events of each student are organized in chronological order into a historical programming interaction sequence. A multi-dimensional problem-relationship-enhanced programming knowledge tracing model is constructed, consisting of a problem representation module, a problem relationship modeling module, and a sequence modeling module. The problem representation module takes the relationship between programming problems and skills, the programming problem, and the code text submitted by the student as input, and generates embedding vectors of programming problems in three dimensions: problem-skill, programming problem text, and code semantics. The embedding vectors in the three dimensions are concatenated and mapped to generate a unified problem embedding vector. The problem relationship modeling module calculates the correlation between programming problems based on the embedding vectors in the three dimensions and constructs the corresponding problem relationship matrices. The sequence modeling module uses a long short-term memory network to model the student's historical interaction sequence based on the unified problem embedding vectors generated by the problem representation module; it queries the problem relationship matrices constructed by the problem relationship modeling module for the correlation between the current programming problem and the historical programming problems; and it uses two attention mechanisms to construct a historical-interaction-aware knowledge state and a problem-relationship-aware knowledge state, which are fused to predict the student's answer performance.

[0008] Furthermore, the embedding vectors of programming problems in the problem-skill dimension are generated as follows. A problem-skill bipartite graph is constructed from the relationship between programming problems and skills. The set of all programming problems is defined as $Q = \{q_1, q_2, \dots, q_N\}$, where $q_i$ denotes the $i$-th programming problem; the set of all skills is defined as $S = \{s_1, s_2, \dots, s_M\}$, where $s_j$ denotes the $j$-th skill; and the vertex set is defined as $\mathcal{V} = Q \cup S$. A binary adjacency matrix $A \in \{0, 1\}^{N \times M}$ is constructed, where $A_{ij}$ denotes the element in row $i$ and column $j$ of $A$, corresponding to the $i$-th programming problem $q_i$ and the $j$-th skill $s_j$: $A_{ij} = 1$ when $q_i$ is associated with $s_j$, and $A_{ij} = 0$ otherwise. Based on $A$, every pair $(q_i, s_j)$ satisfying $A_{ij} = 1$ is taken as a vertex pair, and the set of all such vertex pairs is defined as the edge set $\mathcal{E}$. The problem-skill bipartite graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is constructed from the vertex set and the edge set. The PEBG model is then used to encode $\mathcal{G}$, yielding the set of problem-skill embedding vectors $E^{\mathrm{skill}} = \{e^{\mathrm{skill}}_1, \dots, e^{\mathrm{skill}}_N\}$, where $e^{\mathrm{skill}}_i$ denotes the problem-skill embedding vector of the $i$-th programming problem.

[0009] Furthermore, the embedding vectors of programming problems in the programming problem text dimension are generated as follows. For the $i$-th programming problem, the problem description is extracted from the code submission records in the student programming behavior dataset and used as the description text of that problem. The description text is encoded with a GPT-2 encoder: it is first segmented into a token sequence, each token is mapped to a token embedding vector, and the average of all token embedding vectors is taken as the text embedding vector $e^{\mathrm{text}}_i$ of the $i$-th programming problem. Text embedding vectors are generated for all $N$ programming problems, giving the set of text embedding vectors $E^{\mathrm{text}} = \{e^{\mathrm{text}}_1, \dots, e^{\mathrm{text}}_N\}$, where $e^{\mathrm{text}}_i$ denotes the text embedding vector of the $i$-th programming problem.

[0010] Furthermore, the embedding vectors of programming problems in the code semantics dimension are generated as follows. For the $i$-th programming problem, the code texts of all student submissions that were answered correctly are collected. Each code text is encoded with the CodeBERT model to obtain a code vector, and the average of all code vectors is taken as the code semantic embedding vector $e^{\mathrm{code}}_i$ of the $i$-th programming problem. Code semantic embedding vectors are generated for all $N$ programming problems, giving the set of code semantic embedding vectors $E^{\mathrm{code}} = \{e^{\mathrm{code}}_1, \dots, e^{\mathrm{code}}_N\}$, where $e^{\mathrm{code}}_i$ denotes the code semantic embedding vector of the $i$-th programming problem.

[0011] Furthermore, the specific operations of the problem relationship modeling module are as follows. For any two programming problems, the cosine similarity of their problem-skill embedding vectors is calculated to construct the problem-skill-based problem relationship matrix; the cosine similarity of their text embedding vectors is calculated to construct the problem relationship matrix based on programming problem text; and the cosine similarity of their code semantic embedding vectors is calculated to construct the problem relationship matrix based on code semantics. Each of the three problem relationship matrices is then normalized, mapping its elements to a fixed interval [interval missing in source], to obtain the normalized problem relationship matrices.

[0012] Furthermore, the specific operations of the sequence modeling module are as follows. Based on the student's historical programming interaction sequence, the unified problem embedding vector is combined with the answer result to construct an interaction embedding, and the interaction embeddings form an interaction embedding sequence. The initial hidden state of the long short-term memory (LSTM) network is randomly initialized. The student's interaction embedding sequence is input into the LSTM network, and the hidden state is updated step by step at each time step, yielding the student's historical hidden state sequence. The problem-skill embedding vector is mapped to a query vector through a fully connected layer, and the scaled dot-product similarity between the query vector and the historical hidden state sequence is calculated to obtain the historical interaction attention coefficients. The historical-interaction-aware knowledge state representation is obtained as a weighted sum with exponential decay, computed from the historical interaction attention coefficients and the historical hidden state sequence. For the current programming problem, the relevance between it and each historical programming problem in the student's historical programming interaction sequence is obtained in each of the three dimensions by querying the normalized problem-skill-based problem relationship matrix, the normalized problem relationship matrix based on programming problem text, and the normalized problem relationship matrix based on code semantics. The problem relationship attention coefficients are obtained by averaging the relevance values over the three dimensions. The problem-relationship-aware knowledge state representation is obtained as a weighted sum with exponential decay, computed from the problem relationship attention coefficients and the historical hidden state sequence. The historical-interaction-aware knowledge state representation, the problem-relationship-aware knowledge state representation, and the query vector are concatenated to obtain a fused representation. The fused representation is input into a feedforward neural network to obtain a predicted feature representation. Finally, the predicted feature representation is passed through a Sigmoid activation function to output the predicted probability that the student answers correctly.

[0013] Furthermore, the historical-interaction-aware knowledge state representation $k^{h}_t$ is obtained as a weighted sum with exponential decay [formula missing in source]. In the formula, $\lambda$ denotes the decay-rate hyperparameter; $\Delta_t$ denotes the time-distance vector, where $t$ is the current time step; $\odot$ denotes element-wise multiplication; an activation function is applied [name missing in source]; $\alpha_t$ denotes the historical interaction attention coefficients; and $H_{t-1}$ denotes the historical hidden state sequence. The problem-relationship-aware knowledge state representation $k^{r}_t$ is likewise obtained as a weighted sum with exponential decay [formula missing in source], where $\beta_t$ denotes the problem relationship attention coefficients.

[0014] Furthermore, the method also includes model training and evaluation. The student programming behavior dataset is divided proportionally into training, validation, and test sets, with the data of any one student appearing in only one subset. The multi-dimensional problem-relationship-enhanced programming knowledge tracing model is trained on the training set: the prediction loss is calculated with the binary cross-entropy loss function, and the model parameters are updated through the backpropagation algorithm. The model hyperparameters are tuned on the validation set and the best model is selected. The final prediction performance of the trained model is evaluated on the test set.
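
The student-level split and the loss in [0014] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the 8:1:1 ratio, the function names, and the fixed seed are assumptions, since the text only says the split is proportional.

```python
import math
import random

def split_by_student(student_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split unique student ids into train/validation/test sets so that
    each student's data lands in exactly one subset.
    The 8:1:1 default ratio is an assumption, not taken from the patent."""
    unique = sorted(set(student_ids))
    random.Random(seed).shuffle(unique)
    n_train = int(len(unique) * ratios[0])
    n_val = int(len(unique) * ratios[1])
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    test = set(unique[n_train + n_val:])
    return train, val, test

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

Splitting on student ids rather than on individual submissions is what keeps one student's records from leaking across subsets.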

[0015] An electronic device includes a processor, a memory, and a bus, wherein the processor and the memory are connected via the bus, the memory is used to store a set of program code, and the processor is used to call the program code stored in the memory to execute a multi-dimensional problem relationship enhancement programming knowledge tracing method.

[0016] A non-volatile computer storage medium storing computer-executable instructions that execute a programming knowledge tracing method that enhances multi-dimensional problem relationships.

[0017] Compared with existing technologies, the present invention has the following advantages: (1) This invention models the relationships between programming problems in multiple dimensions and integrates a dual attention mechanism to comprehensively model students' knowledge mastery status. It overcomes the limitations of existing programming knowledge tracing methods, which model problem relationships in a single dimension and draw on a single source of information for knowledge state modeling. It captures the deep internal relationships between programming problems and depicts students' dynamic programming knowledge states, improving the prediction accuracy for students' answers to subsequent programming problems and providing more reliable technical support for personalized learning guidance in programming education.

[0018] (2) This invention measures the correlation between programming problems from three dimensions: problem-skill, programming problem text, and code semantics, and constructs a corresponding problem relationship matrix. It fully explores the multi-dimensional internal connections between programming problems in terms of skill association, text description, code implementation logic and problem-solving ideas, and makes up for the shortcomings of existing methods that only model problem relationships from a single dimension. It realizes a comprehensive and in-depth characterization of the relationship between programming problems, and lays a solid foundation for the subsequent accurate transfer of students' learning experience of similar historical problems.

[0019] (3) This invention designs two attention mechanisms, historical interaction attention and problem relationship attention, to construct the historical-interaction-aware knowledge state and the problem-relationship-aware knowledge state respectively, and fuses the two. It also uses a long short-term memory network to perform temporal modeling of the student's historical programming interaction sequence. The modeling of the student's knowledge state therefore integrates both the temporal information of historical interactions and the correlation information between programming problems, which addresses the single information source used for knowledge state modeling in existing methods. It can reflect more comprehensively and realistically how the student's programming knowledge mastery changes as learning progresses.

Attached Figure Description

[0020] Figure 1 is a schematic diagram of the structure of the multi-dimensional problem-relationship-enhanced programming knowledge tracing model according to the present invention.

Detailed Implementation

[0021] Example 1

[0022] This invention provides a technical solution: a programming knowledge tracing method enhanced by multi-dimensional problem relationships, comprising the following steps. Step S1: Obtain the student programming behavior dataset from the online assessment platform. The dataset is the CodeWorkout (Spring and Fall 2019) dataset publicly released by the 2nd CSEDM Data Challenge, containing 46,825 code submission records from 246 students on 50 programming problems. Each code submission record includes a problem description, submitted code, skill tags, and the assessment results from the online assessment system. The dataset is preprocessed by removing incomplete records and retaining only students with at least 3 interactions. The code submission records are formalized into programming interaction events, each represented by a quadruple: the programming interaction event at time step $t$ is $I_t = (q_t, S_t, c_t, r_t)$, where $q_t$ denotes the programming problem at time step $t$, $S_t$ denotes the set of skills associated with $q_t$, $c_t$ denotes the code text submitted by the student at time step $t$, and $r_t \in \{0, 1\}$ denotes the student's answer result at time step $t$, with $r_t = 1$ indicating a correct answer and $r_t = 0$ an incorrect answer. Each student's programming interaction events are organized in chronological order into a historical programming interaction sequence, whose maximum length is set to 200.
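
The formalization in step S1 can be sketched as below. The class and function names are hypothetical, and truncating to the first 200 events is an assumption: the text fixes the maximum length but does not say how longer sequences are handled.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractionEvent:
    """One quadruple: problem, associated skills, submitted code, result."""
    problem_id: int
    skills: frozenset
    code: str
    correct: int  # 1 = correct answer, 0 = incorrect answer

def build_sequences(records, min_len=3, max_len=200):
    """Group (student_id, timestamp, InteractionEvent) records into one
    chronologically ordered sequence per student.
    Students with fewer than min_len interactions are dropped.
    Assumption: sequences longer than max_len keep their first max_len events."""
    by_student = defaultdict(list)
    for student_id, timestamp, event in records:
        by_student[student_id].append((timestamp, event))
    sequences = {}
    for student_id, items in by_student.items():
        if len(items) < min_len:
            continue
        items.sort(key=lambda pair: pair[0])  # order by time
        sequences[student_id] = [event for _, event in items][:max_len]
    return sequences
```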

[0023] Programming knowledge tracing task definition: given a student's historical programming interaction sequence $\mathcal{I}_{t-1} = \{I_1, I_2, \dots, I_{t-1}\}$, the goal of the programming knowledge tracing task is to predict, at each time step $t$, the probability that the student answers programming problem $q_t$ correctly, i.e., to compute the conditional probability $P(r_t = 1 \mid q_t, \mathcal{I}_{t-1})$, where $I_i$ denotes the programming interaction event at time step $i$ and $\mathcal{I}_{t-1}$ denotes the student's historical programming interaction sequence up to time step $t-1$.

[0024] Step S2: Construct a multi-dimensional problem-relationship-enhanced programming knowledge tracing model. This model consists of a problem representation module, a problem relationship modeling module, and a sequence modeling module, as shown in Figure 1.

[0025] Step S3: The problem representation module takes the relationship between programming problems and skills, the programming problem, and the code text submitted by the student as input, and generates embedding vectors of programming problems in three dimensions: problem-skill, programming problem text, and code semantics. The embedding vectors in the three dimensions are concatenated and mapped to generate a unified problem embedding vector.

[0026] The specific operation of the problem representation module in step S3 is as follows. Step S3.1, problem-skill embedding vector generation: construct a problem-skill bipartite graph based on the relationship between programming problems and skills. The set of all programming problems is defined as $Q = \{q_1, q_2, \dots, q_N\}$, where $q_i$ denotes the $i$-th programming problem; the set of all skills is defined as $S = \{s_1, s_2, \dots, s_M\}$, where $s_j$ denotes the $j$-th skill; and the vertex set is defined as $\mathcal{V} = Q \cup S$. Construct a binary adjacency matrix $A \in \{0, 1\}^{N \times M}$, where $A_{ij}$ denotes the element in row $i$ and column $j$ of $A$, corresponding to the $i$-th programming problem $q_i$ and the $j$-th skill $s_j$: $A_{ij} = 1$ when $q_i$ is associated with $s_j$, and $A_{ij} = 0$ otherwise. Based on $A$, every pair $(q_i, s_j)$ satisfying $A_{ij} = 1$ is taken as a vertex pair, and the set of all such vertex pairs is defined as the edge set $\mathcal{E}$. The problem-skill bipartite graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is constructed from the vertex set and the edge set. The PEBG model is used to encode $\mathcal{G}$, yielding the set of problem-skill embedding vectors $E^{\mathrm{skill}} = \{e^{\mathrm{skill}}_1, \dots, e^{\mathrm{skill}}_N\}$, where $e^{\mathrm{skill}}_i \in \mathbb{R}^{d_{\mathrm{skill}}}$ denotes the problem-skill embedding vector of the $i$-th programming problem, $\mathbb{R}$ denotes the real number field, and $d_{\mathrm{skill}}$ denotes the dimension of the problem-skill embedding vector. In Figure 1, $e^{\mathrm{skill}}_i$ likewise denotes the problem-skill embedding vector of the $i$-th programming problem.
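
The graph construction in step S3.1 can be sketched with NumPy as follows. Only the adjacency matrix and edge set are built; the PEBG encoding itself is not reproduced. The function name and the index-based input format are assumptions made for illustration.

```python
import numpy as np

def build_bipartite(problem_skills, num_skills):
    """Build the binary problem-skill adjacency matrix and the edge set.

    problem_skills[i] lists the skill indices associated with problem i
    (a hypothetical input format). Returns (A, edges), where A[i, j] == 1
    exactly when problem i is associated with skill j, and edges holds
    the (problem, skill) index pairs with A[i, j] == 1.
    """
    A = np.zeros((len(problem_skills), num_skills), dtype=np.int8)
    for i, skills in enumerate(problem_skills):
        A[i, list(skills)] = 1
    edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(A))]
    return A, edges
```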

[0027] Step S3.2, text embedding vector generation: for the $i$-th programming problem, the problem description is extracted from the code submission records in the student programming behavior dataset described in step S1 and used as the description text of that programming problem. The description text is encoded with a GPT-2 encoder: it is first segmented into a token sequence, each token in the sequence is mapped to a token embedding vector, and the average of all token embedding vectors is taken as the text embedding vector $e^{\mathrm{text}}_i$ of the $i$-th programming problem. Text embedding vectors are generated for all $N$ programming problems, giving the set of text embedding vectors $E^{\mathrm{text}} = \{e^{\mathrm{text}}_1, \dots, e^{\mathrm{text}}_N\}$, where $e^{\mathrm{text}}_i \in \mathbb{R}^{d_{\mathrm{text}}}$ denotes the text embedding vector of the $i$-th programming problem and $d_{\mathrm{text}}$ denotes the dimension of the text embedding vector. In Figure 1, $e^{\mathrm{text}}_i$ likewise denotes the text embedding vector of the $i$-th programming problem.

[0028] Step S3.3, code semantic embedding vector generation: for the $i$-th programming problem, the code texts of all student submissions that were answered correctly are collected. Each code text is encoded with the CodeBERT model to obtain a code vector, and the average of all code vectors is taken as the code semantic embedding vector $e^{\mathrm{code}}_i$ of the $i$-th programming problem. Code semantic embedding vectors are generated for all $N$ programming problems, giving the set of code semantic embedding vectors $E^{\mathrm{code}} = \{e^{\mathrm{code}}_1, \dots, e^{\mathrm{code}}_N\}$, where $e^{\mathrm{code}}_i \in \mathbb{R}^{d_{\mathrm{code}}}$ denotes the code semantic embedding vector of the $i$-th programming problem and $d_{\mathrm{code}}$ denotes the dimension of the code semantic embedding vector. In Figure 1, $e^{\mathrm{code}}_i$ likewise denotes the code semantic embedding vector of the $i$-th programming problem.
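
Steps S3.2 and S3.3 both reduce a stack of vectors (token embeddings, or per-submission code vectors) to one vector by averaging. A minimal sketch of that pooling step is shown below; it assumes the vectors have already been produced by the respective encoder, and the function name is hypothetical.

```python
import numpy as np

def mean_pool(vectors):
    """Average an (n, d) stack of vectors into a single (d,) vector.

    In the text-dimension case the rows would be token embedding vectors;
    in the code-semantics case, the code vectors of the correct submissions.
    """
    vectors = np.asarray(vectors, dtype=float)
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        raise ValueError("expected a non-empty (n, d) array")
    return vectors.mean(axis=0)
```

A problem with no correct submissions would give an empty stack; the sketch raises an error in that case because the text does not say how such problems are handled.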

[0029] Step S3.4, unified problem embedding vector generation: the problem-skill embedding vector $e^{\mathrm{skill}}_i$, the text embedding vector $e^{\mathrm{text}}_i$, and the code semantic embedding vector $e^{\mathrm{code}}_i$ of the $i$-th programming problem are concatenated and mapped through a fully connected layer to generate the unified problem embedding vector: $e_i = \mathrm{FC}\left(e^{\mathrm{skill}}_i \oplus e^{\mathrm{text}}_i \oplus e^{\mathrm{code}}_i\right)$. In the formula, $e_i$ denotes the unified problem embedding vector of the $i$-th programming problem; $\mathrm{FC}$ denotes a fully connected layer; and $\oplus$ denotes the vector concatenation operation.
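
The concatenate-then-project step of S3.4 can be sketched as a single affine map. The weight matrix `W` and bias `b` stand in for the fully connected layer and are hypothetical parameters; whether the layer also applies a nonlinearity is not stated, so none is applied here.

```python
import numpy as np

def unified_embedding(e_skill, e_text, e_code, W, b):
    """Concatenate the three per-dimension embeddings and project them.

    W has shape (d_out, d_skill + d_text + d_code) and b has shape (d_out,).
    Assumption: the fully connected layer is a plain affine map.
    """
    concat = np.concatenate([e_skill, e_text, e_code])
    return W @ concat + b
```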

[0030] Step S4: The problem relationship modeling module calculates the correlation between programming problems based on three-dimensional embedding vectors and constructs the corresponding problem relationship matrix.

[0031] The specific operation of the problem relationship modeling module in step S4 is as follows. Step S4.1: for any two programming problems, calculate the cosine similarity of their problem-skill embedding vectors and construct the problem-skill-based problem relationship matrix $R^{\mathrm{skill}}$: $R^{\mathrm{skill}}_{ij} = \dfrac{e^{\mathrm{skill}}_i \cdot e^{\mathrm{skill}}_j}{\lVert e^{\mathrm{skill}}_i \rVert \, \lVert e^{\mathrm{skill}}_j \rVert}$. In the formula, $R^{\mathrm{skill}}_{ij}$ denotes the element in row $i$ and column $j$ of $R^{\mathrm{skill}}$, i.e., the relevance between the $i$-th and the $j$-th programming problem along the problem-skill dimension; $e^{\mathrm{skill}}_i$ and $e^{\mathrm{skill}}_j$ denote the problem-skill embedding vectors of the $i$-th and the $j$-th programming problem.

[0032] Step S4.2: for any two programming problems, calculate the cosine similarity of their text embedding vectors and construct the problem relationship matrix based on programming problem text $R^{\mathrm{text}}$: $R^{\mathrm{text}}_{ij} = \dfrac{e^{\mathrm{text}}_i \cdot e^{\mathrm{text}}_j}{\lVert e^{\mathrm{text}}_i \rVert \, \lVert e^{\mathrm{text}}_j \rVert}$. In the formula, $R^{\mathrm{text}}_{ij}$ denotes the element in row $i$ and column $j$ of $R^{\mathrm{text}}$, i.e., the relevance between the $i$-th and the $j$-th programming problem along the programming problem text dimension; $e^{\mathrm{text}}_i$ and $e^{\mathrm{text}}_j$ denote the text embedding vectors of the $i$-th and the $j$-th programming problem.

[0033] Step S4.3: for any two programming problems, calculate the cosine similarity of their code semantic embedding vectors and construct the problem relationship matrix based on code semantics $R^{\mathrm{code}}$: $R^{\mathrm{code}}_{ij} = \dfrac{e^{\mathrm{code}}_i \cdot e^{\mathrm{code}}_j}{\lVert e^{\mathrm{code}}_i \rVert \, \lVert e^{\mathrm{code}}_j \rVert}$. In the formula, $R^{\mathrm{code}}_{ij}$ denotes the element in row $i$ and column $j$ of $R^{\mathrm{code}}$, i.e., the relevance between the $i$-th and the $j$-th programming problem along the code semantics dimension; $e^{\mathrm{code}}_i$ and $e^{\mathrm{code}}_j$ denote the code semantic embedding vectors of the $i$-th and the $j$-th programming problem.
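
Steps S4.1 to S4.3 apply the same pairwise cosine similarity to three different embedding tables. A minimal NumPy sketch of that shared computation follows; the function name and the small `eps` guard against zero-length vectors are additions for illustration.

```python
import numpy as np

def cosine_similarity_matrix(E, eps=1e-12):
    """Pairwise cosine similarity between the rows of an (N, d) matrix.

    Entry [i, j] of the result is the cosine similarity between the
    embedding of problem i and the embedding of problem j.
    """
    E = np.asarray(E, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.maximum(norms, eps)  # guard against zero vectors
    return unit @ unit.T
```

Calling it once per embedding table (problem-skill, text, code semantics) gives the three relationship matrices.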

[0034] Step S4.4: normalize the problem-skill-based problem relationship matrix $R^{\mathrm{skill}}$, the problem relationship matrix based on programming problem text $R^{\mathrm{text}}$, and the problem relationship matrix based on code semantics $R^{\mathrm{code}}$, mapping the elements of each matrix to a fixed interval to obtain the normalized problem relationship matrices [interval and formula missing in source]. In the formula, $R_{ij}$ denotes the element in row $i$ and column $j$ of a problem relationship matrix $R \in \{R^{\mathrm{skill}}, R^{\mathrm{text}}, R^{\mathrm{code}}\}$; $\tilde{R}_{ij}$ denotes the element in row $i$ and column $j$ of the corresponding normalized problem relationship matrix $\tilde{R} \in \{\tilde{R}^{\mathrm{skill}}, \tilde{R}^{\mathrm{text}}, \tilde{R}^{\mathrm{code}}\}$, where $\tilde{R}^{\mathrm{skill}}$ denotes the normalized problem-skill-based problem relationship matrix, $\tilde{R}^{\mathrm{text}}$ denotes the normalized problem relationship matrix based on programming problem text, and $\tilde{R}^{\mathrm{code}}$ denotes the normalized problem relationship matrix based on code semantics.
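
The normalization formula of step S4.4 is not legible in the text. One common choice for cosine similarities, shown here purely as an assumption, is to shift and scale the range $[-1, 1]$ to $[0, 1]$; the patent may use a different interval or a different rule such as min-max scaling.

```python
import numpy as np

def rescale_cosine(R):
    """Map cosine similarities from [-1, 1] to [0, 1].

    Assumption: this linear rescaling is an illustrative stand-in for the
    normalization in step S4.4, whose exact formula is not given here.
    """
    return (np.asarray(R, dtype=float) + 1.0) / 2.0
```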

[0035] Step S5: The sequence modeling module uses a long short-term memory network to model the student's historical interaction sequence based on the unified question embedding vector generated by the question representation module; it queries the correlation between the current programming problem and the historical programming problem based on the question relationship matrix constructed by the question relationship modeling module, and uses two attention mechanisms to construct the knowledge state perceived by historical interaction and the knowledge state perceived by question relationship, and then predicts the student's answer performance after fusion.

[0036] The specific operations of the sequence modeling module in step S5 are as follows. Step S5.1, interaction embedding construction: based on the student's historical programming interaction sequence, the unified problem embedding vector is combined with the answer result to construct the interaction embedding [formula missing in source]. In the formula, $x_t$ denotes the interaction embedding at time step $t$; $\oplus$ denotes the vector concatenation operation; $\mathbf{0}$ denotes a zero vector with the same dimension as the unified problem embedding vector; $e_{q_t}$ denotes the unified problem embedding vector at time step $t$; and $r_t$ denotes the answer result at time step $t$. In Figure 1, the same notation labels the answer results, interaction embeddings, and unified problem embedding vectors at individual time steps.
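
The symbols listed for step S5.1 (a concatenation, a zero vector of matching dimension, the unified embedding, and the answer result) suggest the familiar encoding in which the embedding occupies one half of the vector depending on correctness. The sketch below follows that reading; which half corresponds to a correct answer is an assumption, as is the function name.

```python
import numpy as np

def interaction_embedding(e, correct):
    """Combine a unified problem embedding with the answer result.

    Assumption: a correct answer places the embedding in the first half
    and zeros in the second; an incorrect answer does the opposite.
    """
    e = np.asarray(e, dtype=float)
    zeros = np.zeros_like(e)
    if correct == 1:
        return np.concatenate([e, zeros])
    return np.concatenate([zeros, e])
```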

[0037] Step S5.2, historical interaction sequence modeling: the initial hidden state $h_0$ of the long short-term memory (LSTM) network is randomly initialized. The student's interaction embedding sequence from time step 1 to $t-1$, $(x_1, x_2, \dots, x_{t-1})$, is input into the LSTM network, and the hidden state is updated step by step at each time step: $h_i = \mathrm{LSTM}(x_i, h_{i-1})$. In the formula, $h_i$ denotes the hidden state at time step $i$; $x_i$ denotes the interaction embedding at time step $i$; and $h_{i-1}$ denotes the hidden state at time step $i-1$. This yields the student's historical hidden state sequence $H_{t-1} = (h_1, h_2, \dots, h_{t-1})$, where $h_i$ denotes the hidden state at time step $i$. In Figure 1, $h_i$ likewise denotes the hidden state at time step $i$.
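
The step-by-step hidden state update of S5.2 can be sketched with a standard LSTM cell written in NumPy. The stacked parameter layout (`W`, `U`, `b` with gates ordered input, forget, candidate, output) is a choice made for this sketch, not something the text specifies.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step.

    W: (4d, input_dim), U: (4d, d), b: (4d,), with the four gate blocks
    stacked in the order input, forget, candidate, output.
    """
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i, f = sigmoid(z[:d]), sigmoid(z[d:2 * d])
    g, o = np.tanh(z[2 * d:3 * d]), sigmoid(z[3 * d:])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def run_lstm(xs, W, U, b, h0, c0):
    """Feed the interaction embeddings in order; return all hidden states
    as an array of shape (number of steps, d)."""
    h, c, states = h0, c0, []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        states.append(h)
    return np.stack(states)
```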

[0038] Step S5.3, calculation of the historical interaction attention coefficients: the problem-skill embedding vector $e^{\mathrm{skill}}_{q_t}$ of the programming problem at time step $t$ is mapped through a fully connected (FC) layer to the query vector $u_t$ at time step $t$, and the scaled dot-product similarity between $u_t$ and the historical hidden state sequence $H_{t-1}$ is calculated to obtain the historical interaction attention coefficients $\alpha_t$: $u_t = \mathrm{FC}\left(e^{\mathrm{skill}}_{q_t}\right)$, $\alpha_t = \dfrac{u_t H_{t-1}^{\top}}{\sqrt{d}}$. In the formula, $d$ denotes the dimension of the query vector $u_t$, and $\mathrm{FC}$ denotes a fully connected layer.
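
The scaled dot-product similarity of step S5.3 can be sketched as below. The query is assumed to have already passed through the fully connected layer and to share its dimension with the hidden states; the function name is hypothetical.

```python
import numpy as np

def attention_scores(query, H):
    """Scaled dot-product similarity between a query and each hidden state.

    query: (d,) query vector; H: (n, d) historical hidden states.
    Returns n unnormalized scores, one per historical time step.
    """
    query = np.asarray(query, dtype=float)
    H = np.asarray(H, dtype=float)
    return H @ query / np.sqrt(query.shape[0])
```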

[0039] Step S5.4, historical-interaction-aware knowledge state modeling: using the historical interaction attention coefficients $\alpha_t$ and the historical hidden state sequence $H_{t-1}$, a weighted sum with exponential decay is calculated to obtain the historical-interaction-aware knowledge state representation $k^{h}_t$ [formula missing in source]. In the formula, $\lambda$ denotes the decay-rate hyperparameter; $\Delta_t$ denotes the time-distance vector, where $t$ is the current time step; $\odot$ denotes element-wise multiplication; and an activation function is applied [name missing in source].
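
Because the formula for step S5.4 is not legible, the sketch below shows only one plausible reading of "weighted sum with exponential decay": scores are multiplied element-wise by `exp(-decay_rate * distance)`, passed through a softmax, and used to weight the hidden states. The softmax, the exact placement of the decay term, and all names are assumptions.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def decayed_knowledge_state(scores, H, decay_rate):
    """Weighted sum of hidden states with exponentially decayed attention.

    scores: (n,) attention coefficients; H: (n, d) historical hidden states.
    Assumed form: weights = softmax(scores * exp(-decay_rate * distance)),
    where distance[i] is how many steps row i lies before the current step.
    """
    scores = np.asarray(scores, dtype=float)
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    distance = np.arange(n, 0, -1, dtype=float)  # oldest row is n steps back
    weights = softmax(scores * np.exp(-decay_rate * distance))
    return weights @ H, weights
```

Under this reading, equal positive scores give more weight to recent steps as the decay rate grows.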

[0040] Step S5.5, Calculation of the Problem Relationship Attention Coefficients: for the problem-skill embedding vector of the programming problem the student faces at the current time step, the three matrices obtained in step S4.4 are queried: the normalized problem-skill-based problem relationship matrix, the normalized problem relationship matrix based on programming problem text, and the normalized problem relationship matrix based on code semantics. This gives the correlation, in each of the three dimensions, between the current programming problem and each historical programming problem in the student's historical programming interaction sequence, starting from time step 1. The three correlations are aggregated by averaging to obtain the problem relationship attention coefficients.
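The lookup-and-average in Step S5.5 can be sketched as follows, assuming the three normalized relationship matrices are stored as nested lists indexed by problem id. The function name `relation_attention` and the integer problem ids are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def relation_attention(current_problem: int, history_problems: List[int],
                       skill_rel: Matrix, text_rel: Matrix, code_rel: Matrix) -> List[float]:
    """For each historical problem, average its relation to the current problem
    across the problem-skill, problem-text, and code-semantics matrices."""
    return [
        (skill_rel[current_problem][j]
         + text_rel[current_problem][j]
         + code_rel[current_problem][j]) / 3.0
        for j in history_problems
    ]
```

Because the matrices are precomputed, this step is a table lookup per historical interaction and adds no trainable parameters.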

[0041] Step S5.6, Problem-Relationship-Aware Knowledge State Modeling: a weighted sum with exponential decay is calculated from the problem relationship attention coefficients and the historical hidden state sequence to obtain the problem-relationship-aware knowledge state representation.

[0042] Step S5.7, Knowledge State Fusion and Prediction: the historical-interaction-aware knowledge state representation, the problem-relationship-aware knowledge state representation, and the query vector are concatenated to obtain the fused representation at the current time step. The fused representation is input into a feedforward neural network (FFN) to obtain the predictive feature representation at that time step. The FFN is parameterized by the weight matrix and bias vector of its first fully connected layer and the weight matrix and bias vector of its second fully connected layer, together with an activation function. Finally, the predictive feature representation is passed through the Sigmoid activation function to output the predicted probability, that is, the probability that the student answers correctly at the current time step.
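The fusion and prediction in Step S5.7 can be sketched as a two-layer feedforward network followed by a sigmoid. This sketch assumes ReLU as the hidden activation and a scalar output from the second layer, neither of which the text confirms. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def _affine(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_correct_probability(history_state: Vector, relation_state: Vector,
                                query: Vector, w1: Matrix, b1: Vector,
                                w2: Matrix, b2: Vector) -> float:
    """Concatenate the two knowledge states with the query, apply a two-layer
    feedforward network, and squash the result with a sigmoid."""
    fused = history_state + relation_state + query  # vector concatenation
    hidden = [max(0.0, v) for v in _affine(w1, b1, fused)]  # assumed ReLU activation
    logit = _affine(w2, b2, hidden)[0]  # assumed: second layer has one output unit
    return 1.0 / (1.0 + math.exp(-logit))
```

The returned value lies strictly between 0 and 1 for moderate logits and is read as the probability that the student answers the current problem correctly.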

[0043] The method also includes model training and evaluation steps. The dataset is divided into training, validation, and test sets in an 8:1:1 ratio, with each student's data appearing in only one subset. The training set is used to train the multi-dimensional problem-relationship-enhanced programming knowledge tracing model: a binary cross-entropy loss function calculates the prediction loss, and the model parameters are updated by backpropagation. The validation set is used to tune the model's hyperparameters and select the optimal model, and the test set is used for the final evaluation of the trained model's prediction performance. The prediction loss is computed from the true value at each time step, that is, the actual answer result (0 or 1), the predicted probability at that time step, and the total number of interactions (sample size) involved in calculating the loss.
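The student-level split and the loss described above can be sketched as follows. Two assumptions are made: the 8:1:1 ratio is applied to the number of students, which guarantees that each student's data falls in exactly one subset, and the binary cross-entropy is averaged over the interactions. The function names, the fixed shuffle seed, and the clipping constant `eps` are hypothetical.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple


def split_students(student_ids: Iterable[int],
                   seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split unique student ids 8:1:1 so no student appears in two subsets."""
    ids = sorted(set(student_ids))
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 8 // 10
    n_val = len(ids) // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def binary_cross_entropy(targets: Sequence[int], probabilities: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between true answer results and predicted probabilities."""
    total = 0.0
    for r, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithm finite
        total += r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return -total / len(targets)
```

Splitting by student rather than by interaction prevents a student's later submissions from leaking into training while earlier ones are used for evaluation.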

[0044] A second embodiment of the present invention also provides an electronic device including a processor, a memory, and a bus, wherein the processor and the memory are connected via the bus, the memory is used to store a set of program code, and the processor is used to call the program code stored in the memory to execute the multi-dimensional problem-relationship-enhanced programming knowledge tracing method.

[0045] A third embodiment of the present invention also provides a non-volatile computer storage medium storing computer-executable instructions which, when executed, perform the multi-dimensional problem-relationship-enhanced programming knowledge tracing method.

[0046] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A multi-dimensional problem-relationship-enhanced programming knowledge tracing method, characterized in that the method includes the following steps: obtaining a student programming behavior dataset containing code submission records from multiple students on multiple programming problems, each code submission record including a problem description, submitted code, skill tags, and evaluation results; formalizing the code submission records as programming interaction events represented by quadruples, each programming interaction event including a programming problem, the set of skills associated with the programming problem, the code text submitted by the student, and the student's answer result, and organizing the programming interaction events of each student in chronological order into a historical programming interaction sequence; constructing a multi-dimensional problem-relationship-enhanced programming knowledge tracing model consisting of a problem representation module, a problem relationship modeling module, and a sequence modeling module; wherein the problem representation module takes the relationship between programming problems and skills, the programming problem, and the code text submitted by the student as input, generates embedding vectors of the programming problems in three dimensions, namely problem-skill, programming problem text, and code semantics, and concatenates and maps the embedding vectors of the three dimensions to generate a unified problem embedding vector; the problem relationship modeling module calculates the correlation between programming problems based on the embedding vectors of the three dimensions and constructs the corresponding problem relationship matrices; and the sequence modeling module uses a long short-term memory network to model the student's historical interaction sequence based on the unified problem embedding vector generated by the problem representation module, queries the correlation between the current programming problem and the historical programming problems from the problem relationship matrices constructed by the problem relationship modeling module, constructs the historical-interaction-aware knowledge state and the problem-relationship-aware knowledge state through two attention mechanisms, and fuses them to predict the student's answer performance.

2. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 1, characterized in that the specific process for generating the embedding vectors of programming problems in the problem-skill dimension is as follows: a problem-skill bipartite graph is constructed based on the relationship between programming problems and skills; the set of all programming problems and the set of all skills are defined, and the vertex set is defined from these two sets; a binary adjacency matrix is constructed in which the element in a given row and column corresponds to a given programming problem and a given skill and indicates whether an association exists between that programming problem and that skill; based on the binary adjacency matrix, every programming problem and skill between which an association exists are taken as a vertex pair, and all such vertex pairs are defined as the edge set; the problem-skill bipartite graph is constructed from the vertex set and the edge set; and the PEBG model is employed to encode the problem-skill bipartite graph into a set of problem-skill embedding vectors, one for each programming problem.

3. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 2, characterized in that the specific process for generating the embedding vectors of programming problems in the text dimension is as follows: for each programming problem, the problem description is extracted from the code submission records in the student programming behavior dataset and encoded using a GPT-2 encoder; the description text is first segmented into a token sequence, each token in the sequence is mapped to a token embedding vector, and the average of all token embedding vectors is calculated to obtain the text embedding vector of that programming problem; text embedding vectors are generated for all programming problems, yielding a set of text embedding vectors, one for each programming problem.

4. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 3, characterized in that the specific process for generating the embedding vectors of programming problems in the code semantics dimension is as follows: for each programming problem, the code texts submitted by students that correspond to correct answers are collected; each code text is encoded using the CodeBERT model to obtain a corresponding code vector; the average of all code vectors is calculated to obtain the code semantic embedding vector of that programming problem; code semantic embedding vectors are generated for all programming problems, yielding a set of code semantic embedding vectors, one for each programming problem.

5. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 4, characterized in that the specific operation of the problem relationship modeling module is as follows: for any two programming problems, the cosine similarity of their problem-skill embedding vectors is calculated to construct a problem-skill-based problem relationship matrix; for any two programming problems, the cosine similarity of their text embedding vectors is calculated to construct a problem relationship matrix based on programming problem text; for any two programming problems, the cosine similarity of their code semantic embedding vectors is calculated to construct a problem relationship matrix based on code semantics; and the problem-skill-based problem relationship matrix, the problem relationship matrix based on programming problem text, and the problem relationship matrix based on code semantics are each normalized so that the matrix elements are mapped to the specified interval, yielding the normalized problem relationship matrices.

6. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 5, characterized in that the specific operations of the sequence modeling module are as follows: based on the student's historical programming interaction sequence, the unified problem embedding vector is combined with the answer result to construct an interaction embedding, and an interaction embedding sequence is constructed from the interaction embeddings; the initial hidden state of the long short-term memory (LSTM) network is randomly initialized, the student's interaction embedding sequence is input into the LSTM network, and the hidden state is updated step by step at each time step, thereby obtaining the student's historical hidden state sequence; the problem-skill embedding vector is mapped to a query vector through a fully connected layer, and the scaled dot-product similarity between the query vector and the historical hidden state sequence is calculated to obtain the historical interaction attention coefficients; a weighted sum with exponential decay is calculated from the historical interaction attention coefficients and the historical hidden state sequence to obtain the historical-interaction-aware knowledge state representation; for the problem-skill embedding vector of the student's current programming problem, the normalized problem-skill-based problem relationship matrix, the normalized problem relationship matrix based on programming problem text, and the normalized problem relationship matrix based on code semantics are queried to obtain the correlation, in the three dimensions, between the current programming problem and the corresponding historical programming problems in the student's historical programming interaction sequence, and the correlations in the three dimensions are averaged to obtain the problem relationship attention coefficients; a weighted sum with exponential decay is calculated from the problem relationship attention coefficients and the historical hidden state sequence to obtain the problem-relationship-aware knowledge state representation; the historical-interaction-aware knowledge state representation, the problem-relationship-aware knowledge state representation, and the query vector are concatenated to obtain a fused representation; the fused representation is input into a feedforward neural network to obtain a predictive feature representation; and the predictive feature representation is passed through a Sigmoid activation function to output the predicted probability that the student answers correctly.

7. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 6, characterized in that the historical-interaction-aware knowledge state representation is obtained by a formula defined in terms of an attenuation rate hyperparameter, a time distance vector relative to the current time step, element-wise multiplication, an activation function, the historical interaction attention coefficients, and the historical hidden state sequence; and the problem-relationship-aware knowledge state representation is obtained by the corresponding formula in which the problem relationship attention coefficients are used.

8. The multi-dimensional problem-relationship-enhanced programming knowledge tracing method according to claim 6, characterized in that the method also includes model training and evaluation: the student programming behavior dataset is divided proportionally into training, validation, and test sets, with the data of any one student appearing in only one subset; the multi-dimensional problem-relationship-enhanced programming knowledge tracing model is trained on the training set, the prediction loss is calculated using the binary cross-entropy loss function, and the model parameters are updated through the backpropagation algorithm; the model hyperparameters are tuned and the optimal model is selected based on the validation set; and the final prediction performance of the trained model is evaluated on the test set.

9. An electronic device comprising a processor, a memory, and a bus, wherein the processor and the memory are connected via the bus, the memory is used to store a set of program code, and the processor is used to call the program code stored in the memory, characterized in that the processor executes the multi-dimensional problem-relationship-enhanced programming knowledge tracing method as described in any one of claims 1-8.

10. A non-volatile computer storage medium storing computer-executable instructions, characterized in that the computer-executable instructions, when executed, perform the multi-dimensional problem-relationship-enhanced programming knowledge tracing method as described in any one of claims 1-8.