A programming knowledge tracing method based on code prediction enhancement
By combining a multi-head attention mechanism and a long short-term memory network with historical programming problems and answers, this method predicts the probability of learners answering target programming problems correctly, solving the problem of low accuracy in programming performance prediction in existing technologies and achieving more accurate programming performance prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUAZHONG NORMAL UNIV
- Filing Date
- 2025-06-16
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies have low accuracy in predicting programming performance, cannot effectively capture the syntax and logic information of code, and cannot reflect the cognitive evolution of learners during multiple submissions.
By employing a multi-head attention mechanism and long short-term memory network approach, this method predicts the probability of learners answering target programming questions correctly by using historical programming questions and answer information. Combined with learners' current knowledge status and historical interaction data, it achieves accurate prediction of code characteristics and performance.
It improves the accuracy and robustness of programming performance prediction, better reflects learners' individual programming skills and common code patterns, and enhances the accuracy and generalization ability of prediction.
Figure CN120706473B_ABST
Description
Technical Field
[0001] This application belongs to the field of computer technology and, more specifically, relates to a programming knowledge tracing method based on code prediction enhancement.
Background Technology
[0002] Benefiting from the development of deep learning, numerous deep knowledge tracing (KT) models have been proposed. However, most of these classic KT methods target general academic disciplines and are not designed for programming performance prediction. They model learners solely from answer results and predict performance by constructing features of the learner's latent cognitive state; this type of method is called Result-Driven State-Based KT. As a result, such methods struggle to capture the syntactic and logical information of code, and they cannot reflect the cognitive evolution of learners across multiple code submissions. Subsequently, researchers incorporated the semantic and structural information of code for programming problems and designed specialized neural network architectures to achieve more accurate performance predictions. This type of method is called Semantic-Driven State-Based KT. Its core idea is to let the model perceive the semantic information of the code and thereby characterize the learner's programming cognitive state more accurately. On many real-world programming datasets, it has achieved significantly better results than Result-Driven State-Based KT.
[0003] However, these methods are essentially end-to-end model architectures that directly use answer performance as input to construct the learner's latent cognitive state and predict future performance. In programming, this approach has a significant shortcoming: whether a learner correctly solves a programming problem depends primarily on whether the submitted code passes the test cases, and it cannot be inferred solely from a cognitive level estimated from past answer performance. These deficiencies in existing technologies lead to low prediction accuracy. How to accurately predict learners' programming performance is a pressing technical problem in this field.
Summary of the Invention
[0004] In view of the shortcomings of the existing technology, the purpose of this application is to achieve accurate prediction of learners' programming performance.
[0005] To achieve the above objectives, in a first aspect, this application provides a programming knowledge tracing method based on code prediction enhancement, the method comprising:
[0006] Based on historical programming questions and corresponding historical programming answers, the embedding representation is extracted to obtain the historical question embedding representation and the historical answer embedding representation. The historical programming answer is the source code input by the test subject (e.g., a student learning programming) for the historical programming question. The source code is used to solve the programming question.
[0007] Based on the embedding representations of historical questions, historical answers, and the target programming question (which is different from historical questions), a multi-head attention mechanism model (including the multi-head attention mechanism layer MHA and a fully connected layer) is used to predict the embedding representation of the source code input by the test subject for the target programming question. The predicted embedding representation of the source code is used as the programming answer prediction result corresponding to the target programming question.
[0008] Based on the embedded representation of the target programming question, the predicted programming answer, and the current knowledge state of the test subject, the probability of the test subject answering the target programming question correctly is predicted. The current knowledge state is obtained by analyzing the temporal evolution of the knowledge state based on the test subject's historical programming answers.
[0009] In one possible implementation, the above-described extraction of the embedding representation includes:
[0010] Input historical programming questions into the large language model, extract the original embedding representation of the historical programming questions through the large language model, and input historical programming answers into the large language model, extract the original embedding representation of the historical programming answers through the large language model;
[0011] Dimensionality reduction is performed on the original embedding representation of historical programming questions to obtain the embedding representation of historical questions. Dimensionality reduction is also performed on the original embedding representation of historical programming answers to obtain the embedding representation of historical answers.
[0012] Optionally, the above method for extracting the embedding representation of historical programming problems can also be applied to target programming problems to extract the embedding representation of the target programming problem.
[0013] In one possible implementation, the prediction of the probability of the test subject answering the target programming question correctly, based on the embedded representation of the target programming question, the predicted programming answer result, and the current knowledge state of the test subject, includes:
[0014] Based on the embedded representation of the target programming question, the programming answer prediction result, and the current knowledge state of the tested object, feature concatenation is performed to obtain the concatenated features;
[0015] The concatenated features are input into a multilayer perceptron, and the multilayer perceptron performs dimensionality compression to obtain the dimensionally compressed features output by the multilayer perceptron.
[0016] Input the compressed features into the activation function and obtain the correct answer probability output by the activation function.
[0017] In one possible implementation, before predicting the probability of the tested object answering the target programming question correctly, the following is also included:
[0018] Based on whether the source code (i.e., the programming answer) input by the tested object for the programming question at the current time step is correct, the embedded representation $c_t$ of that programming answer is projected to obtain the correctness information $a_t$ at the current time step;
[0019] Based on the knowledge state $h_{t-1}$ of the tested object at the previous time step, the embedded representation $q_t$ of the programming question at the current time step, and the correctness information $a_t$ at the current time step, the temporal evolution of the knowledge state is analyzed through a Long Short-Term Memory (LSTM) network to obtain the knowledge state $h_t$ of the tested object at the current time step, which serves as the current knowledge state of the tested object.
[0020] In one possible implementation, the model used to extract the embedding representation is based on a large language model and a linear dimensionality reduction module. The large language model is used to extract the original embedding representation, and the linear dimensionality reduction module is used to reduce the dimensionality of the original embedding representation.
[0021] The linear dimensionality reduction module is obtained through the following steps:
[0022] Based on programming question samples, programming answer samples, and sample labels, an evaluation simulator is trained by minimizing the binary cross-entropy loss function.
[0023] The sample label is used to indicate whether the programming answer sample can solve (pass) the programming problem sample. The evaluation simulator is built by cascading linear dimensionality reduction module and multilayer perceptron. The multilayer perceptron is used to predict the probability that the programming answer can solve (pass) the programming problem based on the embedding representation corresponding to the programming problem and the embedding representation corresponding to the programming answer.
[0024] In one possible implementation, the multi-head attention mechanism model is trained through the following steps:
[0025] Based on programming question samples, programming answer samples, and sample labels, a multi-head attention mechanism model is trained using the total loss function of code prediction.
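[0026] The total loss function for code prediction is determined by the following formulas:
[0027] $\mathcal{L}_{code} = \lambda_1 \mathcal{L}_{align} + \lambda_2 \mathcal{L}_{con} + \lambda_3 \mathcal{L}_{eval}$;
[0028] $\mathcal{L}_{align} = -\log \sigma\left(\langle \hat{c}_{t+1}, c_{t+1} \rangle\right)$;
[0029] $\mathcal{L}_{con} = -\frac{1}{|T|} \sum_{(\hat{c}_{t+1},\, c^{+}_{j},\, c^{-}_{k}) \in T} \log \sigma\left(\langle \hat{c}_{t+1}, c^{+}_{j} \rangle - \langle \hat{c}_{t+1}, c^{-}_{k} \rangle\right)$;
[0030] $\mathcal{L}_{eval} = -\frac{1}{N} \sum_{t} \left[ y_{t+1} \log \hat{p}_{t+1} + (1 - y_{t+1}) \log (1 - \hat{p}_{t+1}) \right]$;
[0031] where $\mathcal{L}_{code}$ represents the total loss function for code prediction, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters. Different samples are numbered according to the order in which they were answered. $q_{t+1}$ indicates the embedded representation of the programming problem sample at time step $t+1$; $c_{t+1}$ indicates the embedded representation of the programming answer sample at time step $t+1$; $\hat{c}_{t+1}$ indicates the programming answer predicted by the multi-head attention mechanism model based on the embedding representations of the samples at the previous $t$ time steps and $q_{t+1}$. $\sigma$ represents the sigmoid function, and $\langle \cdot, \cdot \rangle$ represents the inner product operation. $c^{+}_{j}$ is the $j$-th item in the set of correct code corresponding to $q_{t+1}$, and $c^{-}_{k}$ is the $k$-th item in the set of erroneous code corresponding to $q_{t+1}$. $T$ is the set of triplets $(\hat{c}_{t+1}, c^{+}_{j}, c^{-}_{k})$, and $N$ indicates the number of samples. $y_{t+1}$ indicates the label corresponding to the sample at time step $t+1$, and $\hat{p}_{t+1}$ is the probability, predicted by the multilayer perceptron in the evaluation simulator based on $q_{t+1}$ and $\hat{c}_{t+1}$, that the predicted programming answer solves the programming problem.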
[0032] In one possible implementation, the prediction model used to predict the probability of the tested object answering the target programming question correctly is obtained through the following steps:
[0033] Based on programming question samples, programming answer samples, and sample labels, a Long Short-Term Memory (LSTM) network and a prediction model are trained using a binary cross-entropy loss function.
[0034] Among them, the Long Short-Term Memory (LSTM) network is used to obtain the knowledge state of the tested object at the current time step based on the knowledge state of the tested object at the previous time step, the embedded representation of the programming question at the current time step, and the right and wrong information at the current time step, and use it as the current knowledge state of the tested object.
[0035] The prediction model is used to predict the probability of the test subject answering the target programming question correctly based on the embedded representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject. The target programming question is the programming question in the next time step.
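[0036] The binary cross-entropy loss function $\mathcal{L}_{BCE}$ used to train the Long Short-Term Memory (LSTM) network and the prediction model is determined by the following formulas:
[0037] $\mathcal{L}_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$;
[0038] $\hat{y}_{t+1} = \sigma\left(\mathrm{MLP}(u_{t+1})\right)$;
[0039] $u_{t+1} = q_{t+1} \oplus \hat{c}_{t+1} \oplus h_t$;
[0040] where $y_i$ and $\hat{y}_i$ are the true label and the predicted probability of the $i$-th sample, and $N$ is the number of samples. $\hat{y}_{t+1}$ is the prediction result given by the prediction model for the programming problem sample (the target programming problem) at time step $t+1$ (the next time step), MLP stands for multilayer perceptron, and $\sigma$ is the sigmoid function. $q_{t+1}$ is the embedded representation of the target programming problem, $\hat{c}_{t+1}$ is the programming answer prediction result, $h_t$ represents the current knowledge state of the tested object as given by the LSTM network, and $\oplus$ indicates the concatenation operation.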
[0041] In a second aspect, this application provides a programming knowledge tracing device based on code prediction enhancement, comprising:
[0042] The embedding representation extraction module is used to extract embedding representations based on historical programming questions and corresponding historical programming answers. It obtains historical question embedding representations and historical answer embedding representations. The historical programming answers are the source code input by the test subject for the historical programming questions. The source code is used to solve the programming questions.
[0043] The code prediction module is used to predict the embedding representation of the source code input by the test subject to the target programming question based on the embedding representation of historical questions, historical answers, and the embedding representation of the target programming question. The predicted embedding representation of the source code is used as the programming answer prediction result corresponding to the target programming question.
[0044] The programming performance prediction module is used to predict the probability of the test subject answering the target programming question correctly based on the embedded representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject. The current knowledge state is obtained by analyzing the temporal evolution of the knowledge state based on the test subject's historical programming answers.
[0045] In a third aspect, this application provides an electronic device, comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to execute the method described in the first aspect or any possible implementation thereof.
[0046] In a fourth aspect, this application provides a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation thereof.
[0047] Overall, the technical solutions conceived in this application have the following beneficial effects compared with the prior art:
[0048] This application decomposes the programming prediction task into code prediction (predicting the embedded representation of the source code that the test subject would input for the target programming question, i.e., the programming answer prediction result) and performance prediction. First, it uses the code submitted by the learner (test subject) in the past to predict the likely current code characteristics (the programming answer prediction result). Second, it constructs the learner's current cognitive state from the historical answer performance. Finally, it combines the programming answer prediction result, the current cognitive state, and the problem (question) characteristics to jointly predict the learner's performance, that is, whether the learner can solve (pass) the target programming question. The target programming question serves as a test case. By jointly considering the influence of the learner's past answer performance on the target programming question and the probability that the programming answer prediction result can solve the target programming question, the method realizes programming knowledge tracing based on code prediction enhancement and can accurately predict the learner's programming performance.
Attached Figure Description
[0049] Figure 1 is a flowchart of the programming knowledge tracing method based on code prediction enhancement provided in an embodiment of this application;
[0050] Figure 2 is a schematic diagram of the structure of the programming knowledge tracing device based on code prediction enhancement provided in the embodiments of this application;
[0051] Figure 3 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0052] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0053] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.
[0054] In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more, for example, multiple processing units means two or more processing units, multiple elements means two or more elements, etc.
[0055] To achieve accurate programming performance prediction, a more reasonable approach is to decompose the programming prediction task into two parts: code prediction and performance prediction. First, use learners' historical code submissions to predict potential current code characteristics. Second, combine historical answer performance to construct the learner's cognitive state. Finally, combine these two with question features to jointly predict whether the learner can answer the question correctly. This type of method can be called Semantic-Driven Code-Based Knowledge Tracing (SCKT).
[0056] The core idea of SCKT focuses on code feature prediction; however, this task is inherently more difficult than performance prediction and requires balancing both (A) individuality and (B) commonality. First, regarding (A) individuality, it's crucial to fully consider the individualized programming skills and algorithmic thinking hidden within historical code sequences. Dynamically extracting skill features relevant to the target problem from historical code sequences is a fundamental and challenging problem in code prediction. Second, regarding (B) commonality, similar programming problems often implicitly share common code patterns; for a given problem, there are often only a limited number of mainstream correct solutions. If only the learner's individualized programming patterns are considered, prediction results can easily become severely biased when the learner's interaction data is too short. Therefore, the SCKT method needs to balance the learner's individualized programming characteristics with the common code patterns of the problem in code feature prediction to improve the accuracy and robustness of the prediction.
[0057] The embodiments of this application are described below with reference to the accompanying drawings.
[0058] As shown in Figure 1, SCKT consists of three main modules, each corresponding to a training step.
[0059] Step 1 involves encoding the question text and student code using a large language model to obtain their high-dimensional semantic embeddings, fully capturing the code's logical structure and the question's semantic information. Based on this, an evaluation simulation model is trained, and its internal dimensionality reduction module is used to extract features from the original embeddings generated by the large model, providing a unified, low-dimensional, and semantically preserved representation for subsequent stages.
[0060] Step 2 takes the embedded representations of the questions and code and trains a code predictor that jointly considers students' individual programming habits and the common problem-solving patterns in historical responses. Based on a student's historical response sequence, this module predicts the answer code the student is likely to submit for a new question (in this application, programming problems are also referred to as "questions").
[0061] Step 3: Model the student's knowledge state based on the embedded representation of the student's answer sequence provided in Step 1, and use the code predictor in Step 2 to generate the answer code for the question to be predicted, so as to achieve accurate prediction based on the student's possible answer patterns and knowledge state.
[0062] Finally, a student programming performance prediction model was trained based on the above modules.
[0063] The following is a detailed explanation of Steps 1 through 3.
[0064] Step 1: Obtain the Embedding.
[0065] The CodeQwen1.5-7B-chat model (a code-specific large language model) was selected to generate the embeddings of the problem descriptions and source code.
[0066] Given a tokenized input sequence $x = (x_1, \dots, x_n)$ (which can be a problem description or student code), the model calculates the context-aware representation matrix of the sequence through a multi-layer self-attention mechanism: $H = \mathrm{LLM}(x) \in \mathbb{R}^{n \times D}$, where $n$ represents the length of the sequence and $D$ represents the hidden layer dimension of the Transformer model.
[0067] To extract a fixed-dimensional vector from the representation matrix $H$ as a global semantic representation of the entire input sequence, an aggregation function is applied to all row vectors of $H$. In this application, mean pooling is used as the aggregation function to obtain the final sequence embedding $E = \frac{1}{n} \sum_{i=1}^{n} H_i \in \mathbb{R}^{D}$, where $H_i$ is the $i$-th row of $H$.
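As an illustration only, the mean pooling step can be sketched in plain Python. The matrix `H` below is a toy stand-in for the hidden states of the large language model (the real model call is omitted), and the function name `mean_pool` is hypothetical.

```python
def mean_pool(hidden_states):
    """Average the n row vectors of an n x D matrix into one D-dimensional vector."""
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one token vector")
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(row[j] for row in hidden_states) / n for j in range(dim)]


# Toy stand-in for the last hidden states H (n = 3 tokens, D = 4).
H = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
embedding = mean_pool(H)
print(embedding)  # [2.0, 2.0, 2.0, 2.0]
```

The result has a fixed length D regardless of how many tokens the problem description or source code contains, which is what allows questions and code of different lengths to share one embedding space.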
[0068] Subsequently, an evaluation simulator was designed to extract features based on the high-dimensional semantic embedding of the questions and code extracted by the Large Language Model (LLM), and to determine whether the target code can solve the corresponding programming problem.
[0069] This evaluation simulator primarily performs two tasks. First, while traditional online judge (OJ) systems rely on predefined test cases to determine code correctness, this simulator analyzes the semantic embeddings of questions and code to capture their logical similarities, enabling code pass/fail prediction even without test cases. Second, it reduces the dimensionality of the raw embeddings. Let $E^{q}$ denote the raw embedding representation extracted by the LLM when a programming problem is input, and let $E^{c}$ denote the raw embedding representation extracted by the LLM when a student's programming answer is input. The raw embedding representations generated by the LLM often contain a lot of redundant information. To improve prediction accuracy and computational efficiency, an independent dimensionality reduction network is constructed to extract more discriminative low-dimensional feature representations.
[0070] Specifically, a dual-channel linear dimensionality reduction module is first designed (one channel for reducing the dimensionality of programming problems, and the other channel for reducing the dimensionality of programming answers). A linear dimensionality reduction module is a tool that maps (projects) high-dimensional data to a low-dimensional space through mathematical transformations, aiming to reduce data complexity and computational costs while preserving as much important information and structure as possible from the original data.
[0071] $q = W_q E^{q}, \quad c = W_c E^{c}$;
[0072] In the formula, $W_q \in \mathbb{R}^{d \times D}$ and $W_c \in \mathbb{R}^{d \times D}$ are the projection matrices. The dimensionality-reduced problem and code are concatenated to form a joint feature vector $z = q \oplus c$, which is then fed into a multilayer perceptron (MLP) classifier for prediction: $\hat{p} = \sigma(\mathrm{MLP}(z))$, where the weight matrices of the MLP are learnable parameters, its additive terms are biases, and $\sigma$ is the sigmoid function. The prediction result $\hat{p}$ is the probability, as predicted by the model, that the code $c$ solves (passes) the corresponding problem $q$. Training is performed by minimizing the binary cross-entropy loss function.
[0073] $\mathcal{L}_{sim} = -\frac{1}{N} \sum_{t=1}^{N} \left[ y_t \log \hat{p}_t + (1 - y_t) \log (1 - \hat{p}_t) \right], \quad \hat{p}_t = \sigma(\mathrm{MLP}(z_t))$;
[0074] Different samples are numbered according to the order in which they were answered. The subscript $t$ indicates the time step number; each time step corresponds to a programming problem and a programming answer. $z_t$ indicates the joint feature vector corresponding to the sample at time step $t$ (the sample at each time step consists of a programming question sample and the corresponding programming answer sample). $y_t$ indicates the label corresponding to the sample at time step $t$, that is, whether the programming answer can solve (pass) the programming problem. $N$ indicates the number of samples. This optimization process not only improves prediction performance but also effectively filters out features highly correlated with code correctness judgments, suppressing redundant and noisy information.
[0075] During the training process in Step 1, the parameters of the dual-channel linear dimensionality reduction module and the multilayer perceptron (MLP) are adjusted based on the loss function value.
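The forward pass and loss of the evaluation simulator can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the class name, the single ReLU hidden layer, the layer sizes, and the random initialization are all assumptions, and the gradient-based parameter update is omitted because it would normally be delegated to a deep learning framework.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class EvaluationSimulator:
    """Dual-channel linear reduction followed by a small MLP classifier."""

    def __init__(self, D, d, hidden, seed=0):
        rng = random.Random(seed)
        self.W_q = random_matrix(d, D, rng)  # question channel, D -> d
        self.W_c = random_matrix(d, D, rng)  # code channel, D -> d
        self.W1 = random_matrix(hidden, 2 * d, rng)
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def reduce(self, E_q, E_c):
        """Project the raw embeddings to the low-dimensional space."""
        return matvec(self.W_q, E_q), matvec(self.W_c, E_c)

    def pass_probability(self, q, c):
        """Probability that code embedding c solves question embedding q."""
        z = q + c  # list concatenation forms the joint feature vector
        hidden = [max(0.0, h + b) for h, b in zip(matvec(self.W1, z), self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return sigmoid(logit)


def bce(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


sim = EvaluationSimulator(D=8, d=3, hidden=4)
data_rng = random.Random(1)
E_q = [data_rng.uniform(-1, 1) for _ in range(8)]  # toy raw question embedding
E_c = [data_rng.uniform(-1, 1) for _ in range(8)]  # toy raw code embedding
q, c = sim.reduce(E_q, E_c)
p = sim.pass_probability(q, c)
loss = bce([1], [p])
```

After training, only `reduce` is needed by the later steps; the classifier is reused when the quality of a predicted code embedding has to be scored.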
[0076] Using the dimensionality reduction matrices obtained after training, features can be extracted from the semantic embeddings of the large model: $q_t = W_q E^{q}_t$, $c_t = W_c E^{c}_t$, where $E^{q}_t$ and $E^{c}_t$ are the problem and code embeddings output by the LLM (dimension $D$), and $q_t$ and $c_t$ are the extracted low-dimensional representations (dimension $d$). The subscript $t$ indicates the time step number; each time step corresponds to a programming problem and a programming answer. $E^{q}_t$ is the raw embedding representation extracted by the LLM when the programming problem at time step $t$ is input, and $E^{c}_t$ is the raw embedding representation extracted by the LLM when the programming answer at time step $t$ is input. $q_t$ is the embedding representation obtained by reducing the dimensionality of $E^{q}_t$, and $c_t$ is the embedding representation obtained by reducing the dimensionality of $E^{c}_t$.
[0077] Step 2: Code Predictor.
[0078] Based on the embeddings of the question text and code from Step 1, an attention-based code predictor with dual constraints of individuality and commonality is designed. It jointly models individual historical behavior and group statistical priors to achieve fine-grained code embedding prediction. Given a student's historical interaction sequence $\{(q_1, c_1), \dots, (q_t, c_t)\}$, the predictor estimates the embedded representation $\hat{c}_{t+1}$ of the student's answer code for the question $q_{t+1}$:
[0079] $\hat{c}_{t+1} = f_{\theta}\left(q_{t+1}, \{(q_i, c_i)\}_{i=1}^{t}\right)$;
[0080] Code prediction employs a multi-head attention (MHA) mechanism, generating code embeddings by leveraging the semantic association between the target question and historical interaction data. The model inputs are: $q_{t+1} \in \mathbb{R}^{B \times 1 \times d}$, the embedding of the target problem, where $B$ represents the batch size; $Q_{1:t} \in \mathbb{R}^{B \times t \times d}$, the embedding sequence of historical problems; and $C_{1:t} \in \mathbb{R}^{B \times t \times d}$, the corresponding historical code embedding sequence. The calculation can be represented as $O = \mathrm{MHA}(q_{t+1}, Q_{1:t}, C_{1:t}) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h) W^{O}$, where $h$ is the number of attention heads and $W^{O}$ is the output transformation matrix. The attention output $O$ captures a weighted semantic representation of the target problem and the historical code. It is then mapped to the target code embedding space through a fully connected layer: $\hat{c}_{t+1} = W_f O + b_f$, where $W_f$ and $b_f$ are the learnable parameters of the fully connected layer. For the predicted code embedding, a semantic alignment loss is constructed to maximize the semantic similarity between the predicted code and the actual answer code for the target question: $\mathcal{L}_{align} = -\log \sigma\left(\langle \hat{c}_{t+1}, c_{t+1} \rangle\right)$, where $\sigma$ is the sigmoid function and the inner product $\langle \cdot, \cdot \rangle$ measures the semantic similarity of two embeddings.
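The attention-based code prediction described above can be illustrated with a small, unbatched sketch. It assumes that the target question embedding acts as the query, the historical question embeddings as keys, and the historical code embeddings as values; it also assumes the alignment loss takes the form of a negative log-sigmoid of the inner product. The class name `CodePredictor`, the random weights, and the toy data are hypothetical.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class CodePredictor:
    """Multi-head attention over the history, then a fully connected layer."""

    def __init__(self, d, heads, seed=0):
        assert d % heads == 0, "embedding size must be divisible by head count"
        rng = random.Random(seed)
        self.heads = heads
        self.d_k = d // heads
        self.W_Q = [rand_matrix(self.d_k, d, rng) for _ in range(heads)]
        self.W_K = [rand_matrix(self.d_k, d, rng) for _ in range(heads)]
        self.W_V = [rand_matrix(self.d_k, d, rng) for _ in range(heads)]
        self.W_O = rand_matrix(d, d, rng)   # output transformation
        self.W_fc = rand_matrix(d, d, rng)  # fully connected layer
        self.b_fc = [0.0] * d

    def attention_weights(self, head, q_target, hist_q):
        """Scaled dot-product weights of one head over the history."""
        query = matvec(self.W_Q[head], q_target)
        scale = math.sqrt(self.d_k)
        scores = [dot(query, matvec(self.W_K[head], k)) / scale for k in hist_q]
        return softmax(scores)

    def predict(self, q_target, hist_q, hist_c):
        concat = []
        for head in range(self.heads):
            alpha = self.attention_weights(head, q_target, hist_q)
            values = [matvec(self.W_V[head], c) for c in hist_c]
            concat += [sum(a * v[j] for a, v in zip(alpha, values))
                       for j in range(self.d_k)]
        attn_out = matvec(self.W_O, concat)
        return [v + b for v, b in zip(matvec(self.W_fc, attn_out), self.b_fc)]


def alignment_loss(c_pred, c_true):
    """-log(sigmoid(<c_pred, c_true>)), computed in a numerically stable way."""
    x = dot(c_pred, c_true)
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


data_rng = random.Random(7)
hist_q = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
hist_c = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
q_target = [data_rng.uniform(-1, 1) for _ in range(4)]

predictor = CodePredictor(d=4, heads=2)
c_pred = predictor.predict(q_target, hist_q, hist_c)
weights = predictor.attention_weights(0, q_target, hist_q)
```

Using the historical questions as keys means that past submissions for problems similar to the target contribute more to the predicted code embedding, which is one way to realize the individuality constraint.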
[0081] The personalized modeling above focuses only on individual students' historical programming behavior and neglects the common answer characteristics of programming exercises. For example, most programming problems have only a limited number of mainstream correct solutions, and their error patterns also share commonalities, such as incorrect boundary condition handling. In addition, when a student's historical interaction data is sparse, relying solely on the personalized attention mechanism can bias predictions because of insufficient historical information. Therefore, a group-aware contrastive loss is designed so that the predicted code $\hat{c}_{t+1}$ moves close to similar correct code patterns $c^{+}$ in the embedding space and away from erroneous code patterns $c^{-}$: $\mathcal{L}_{con} = -\frac{1}{|T|} \sum_{(\hat{c}_{t+1},\, c^{+}_{j},\, c^{-}_{k}) \in T} \log \sigma\left(\langle \hat{c}_{t+1}, c^{+}_{j} \rangle - \langle \hat{c}_{t+1}, c^{-}_{k} \rangle\right)$, where $T$ is the set of all possible triplets $(\hat{c}_{t+1}, c^{+}_{j}, c^{-}_{k})$. For each problem $q$, a positive sample pool $\mathcal{P}_q$ (the set of correct code) and a negative sample pool $\mathcal{N}_q$ (the set of erroneous code) are maintained; at each prediction point $t+1$, positive samples $c^{+}_{j}$ and negative samples $c^{-}_{k}$ are randomly sampled, and their similarities to $\hat{c}_{t+1}$ are calculated respectively.
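A hedged sketch of the group-aware contrastive term follows. The source text fixes only the ingredients (sigmoid, inner product, triplets drawn from a correct-code pool and an erroneous-code pool); the pairwise ranking form used here, the number of sampled triplets, and all names are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_log_sigmoid(x):
    """-log(sigmoid(x)) computed without overflow for large |x|."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def group_contrastive_loss(c_pred, positive_pool, negative_pool,
                           n_samples=4, seed=0):
    """Average ranking loss over randomly sampled (prediction, c+, c-) triplets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c_pos = rng.choice(positive_pool)  # a correct submission for this problem
        c_neg = rng.choice(negative_pool)  # an erroneous submission for this problem
        total += neg_log_sigmoid(dot(c_pred, c_pos) - dot(c_pred, c_neg))
    return total / n_samples


# Toy pools of code embeddings for one problem (hypothetical values).
positive_pool = [[1.0, 0.0], [0.9, 0.1]]
negative_pool = [[-1.0, 0.0], [-0.8, 0.2]]

loss_near_correct = group_contrastive_loss([1.0, 0.0], positive_pool, negative_pool)
loss_near_wrong = group_contrastive_loss([-1.0, 0.0], positive_pool, negative_pool)
```

A prediction that resembles the correct pool yields a loss below log 2, while one that resembles the erroneous pool yields a loss above it, so minimizing this term pulls predictions toward common correct patterns even when a learner's own history is short.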
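[0082] To evaluate the quality of the predicted code embedding, $\hat{c}_{t+1}$ is concatenated with the corresponding question embedding $q_{t+1}$ to obtain $\hat{z}_{t+1} = q_{t+1} \oplus \hat{c}_{t+1}$, which is input into the evaluation simulation model for prediction; the resulting loss $\mathcal{L}_{eval}$ directly reflects the quality of the code embedding prediction. Therefore, the total loss for code prediction is designed as follows:
[0083] $\mathcal{L}_{code} = \lambda_1 \mathcal{L}_{align} + \lambda_2 \mathcal{L}_{con} + \lambda_3 \mathcal{L}_{eval}$;
[0084] where $\mathcal{L}_{align}$ is the semantic alignment loss, $\mathcal{L}_{con}$ is the group-aware contrastive loss, $\mathcal{L}_{eval}$ is the evaluation loss, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters that represent the weights of the loss terms.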
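How the three loss terms might be combined is sketched below. The weight values, the function names, and `toy_simulator` (a stand-in for the evaluation simulator trained in Step 1) are hypothetical; the source states only that the terms are weighted by hyperparameters.

```python
import math


def bce_single(y, p, eps=1e-12):
    """Binary cross-entropy for one label/probability pair."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def toy_simulator(q, c_pred):
    """Hypothetical stand-in for the Step 1 evaluation simulator."""
    score = sum(a * b for a, b in zip(q, c_pred))
    return 1.0 / (1.0 + math.exp(-score))


def total_code_loss(l_align, l_con, l_eval, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the alignment, contrastive, and evaluation losses."""
    w_align, w_con, w_eval = weights
    return w_align * l_align + w_con * l_con + w_eval * l_eval


q_next = [1.0, 0.0]   # toy target question embedding
c_pred = [2.0, 0.5]   # toy predicted code embedding
y_next = 1            # the student's real submission passed

p_pass = toy_simulator(q_next, c_pred)
l_eval = bce_single(y_next, p_pass)
loss = total_code_loss(l_align=0.3, l_con=0.2, l_eval=l_eval,
                       weights=(1.0, 0.5, 0.1))
```

The evaluation term penalizes predicted embeddings that the simulator judges inconsistent with the observed pass/fail label, so it complements the two similarity-based terms.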
[0085] During the training process in Step 2, the parameters of the multi-head attention mechanism are adjusted based on the loss function value.
[0086] Step 3: Predicting Programming Performance.
[0087] In programming practice scenarios, the semantic features of the problem (such as the problem description and input/output requirements) combined with the answer code can serve as a fine-grained indicator of knowledge state. Therefore, this model constructs a knowledge state representation by combining the semantic encoding of the problem text with the answer code information to capture the student's level of understanding of the knowledge points. To better track the student's knowledge state, two projection matrices $W_{+}$ and $W_{-}$ are used to introduce the correctness information into the code embedding. First, the code embedding is projected differently depending on whether the answer is correct:
[0088] $a_t = W_{+} c_t$ if $y_t = 1$; $a_t = W_{-} c_t$ if $y_t = 0$;
[0089] In the formula, $y_t = 1$ indicates that the programming answer can solve (pass) the corresponding programming problem, and $y_t = 0$ indicates that the programming answer cannot solve (cannot pass) the corresponding programming problem. To better preserve information from the original code embeddings, the projection matrices $W_{+}$ and $W_{-}$ do not participate in training. To capture the temporal evolution of students' knowledge states, this application uses a Long Short-Term Memory (LSTM) network to model students' knowledge states, with the following input features: $x_t = q_t \oplus a_t$. The LSTM calculation at each time step $t$ is as follows: $h_t = \mathrm{LSTM}(x_t, h_{t-1})$, where $h_t$ is the student's current knowledge state and $h_{t-1}$ is the student's knowledge state at the previous time step.
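The correctness-conditioned projection and the recurrent update of the knowledge state can be sketched as follows. The LSTM cell is a standard textbook cell written out by hand; the fixed projection matrices (identity and negative identity), the dimensions, and the toy history are assumptions chosen only to keep the example small.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def correctness_projection(c_t, y_t, W_pos, W_neg):
    """Project the code embedding with a different fixed matrix per outcome."""
    return matvec(W_pos if y_t == 1 else W_neg, c_t)


class LSTMCell:
    """Standard LSTM cell with input, forget, candidate, and output gates."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        size = input_dim + hidden_dim
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(size)]
                         for _ in range(hidden_dim)] for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifgo"}

    def step(self, x, h_prev, cell_prev):
        xh = x + h_prev  # concatenate input and previous hidden state
        pre = {gate: [v + b for v, b in zip(matvec(self.W[gate], xh), self.b[gate])]
               for gate in "ifgo"}
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        cand = [math.tanh(v) for v in pre["g"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        cell = [f * cp + i * g for f, cp, i, g in zip(f_gate, cell_prev, i_gate, cand)]
        h = [o * math.tanh(cv) for o, cv in zip(o_gate, cell)]
        return h, cell


d, hidden = 3, 4
# Fixed (untrained) projection matrices; these particular values are assumptions.
W_pos = [[1.0 if r == k else 0.0 for k in range(d)] for r in range(d)]
W_neg = [[-1.0 if r == k else 0.0 for k in range(d)] for r in range(d)]

# Toy history of (question embedding, code embedding, pass/fail label).
history = [
    ([0.1, 0.2, 0.3], [0.2, -0.1, 0.4], 1),
    ([0.0, 0.5, -0.2], [0.3, 0.3, -0.1], 0),
]

lstm = LSTMCell(input_dim=2 * d, hidden_dim=hidden)
h_state, cell_state = [0.0] * hidden, [0.0] * hidden
for q_t, c_t, y_t in history:
    a_t = correctness_projection(c_t, y_t, W_pos, W_neg)
    h_state, cell_state = lstm.step(q_t + a_t, h_state, cell_state)
```

After the loop, `h_state` plays the role of the current knowledge state that is passed to the performance prediction stage.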
[0090] To predict whether a student can answer the target question, the embedded representation of the code predicted by the code predictor in Step 2 is concatenated with the student's knowledge state to form a feature vector $z$. A multilayer perceptron (MLP) is then used to predict the result: $\hat{y} = \sigma(\mathrm{MLP}(z))$, where MLP is a feature distiller consisting of a four-layer fully connected network that performs progressive dimensionality compression, and the sigmoid activation function $\sigma$ maps the features to the probability of a correct answer in the interval [0,1].
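A prediction head of this kind can be sketched as below. The layer widths (8, 6, 4, 2, 1), the ReLU activations between layers, and the random weights are assumptions made for illustration; the text only specifies four fully connected layers with progressive compression followed by a sigmoid. Following claim 1, the sketch also includes the target question embedding in the concatenation.

```python
import math
import random

def linear(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def make_mlp(sizes, seed=0):
    # Build randomly initialised (W, b) pairs for consecutive layer sizes.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers

def predict_correct_probability(q_target, code_pred, knowledge_state, layers):
    # Concatenate the three feature vectors, compress them layer by layer,
    # and squash the final scalar into (0, 1) with a sigmoid.
    x = q_target + code_pred + knowledge_state
    for idx, (W, b) in enumerate(layers):
        x = linear(x, W, b)
        if idx < len(layers) - 1:   # assumed ReLU on hidden layers only
            x = relu(x)
    return 1.0 / (1.0 + math.exp(-x[0]))

# Assumed sizes: 2 (question) + 2 (predicted code) + 4 (knowledge state) = 8 inputs.
layers = make_mlp([8, 6, 4, 2, 1])
p = predict_correct_probability(
    q_target=[0.2, -0.1],
    code_pred=[0.4, 0.3],
    knowledge_state=[0.1, -0.2, 0.05, 0.3],
    layers=layers,
)
print(p)  # probability that the target question is answered correctly
```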
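[0091] Here, binary cross-entropy is chosen as the loss function for SCKT:
[0092] $\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$;
[0093] where $y_i$ is the true label of the $i$-th sample, $\hat{y}_i$ is the corresponding prediction result, and $N$ is the number of predicted samples.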
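As a worked illustration of the binary cross-entropy loss, the following Python sketch averages the per-sample loss over the predicted samples. The clipping constant `eps` and the function name are assumptions added for numerical safety and are not part of the described method.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    # Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples.
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

labels = [1, 0]
uncertain = binary_cross_entropy(labels, [0.5, 0.5])   # equals log(2)
confident = binary_cross_entropy(labels, [0.9, 0.1])   # equals -log(0.9)
print(uncertain, confident)
```

Predictions that are both confident and correct yield a smaller loss than uninformative predictions of 0.5, which is the signal used to adjust the LSTM and MLP parameters.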
[0094] During the training process in Step 3, the parameters of LSTM and MLP are adjusted based on the loss function value.
[0095] To comprehensively evaluate model performance, three categories of baseline methods were compared on three real-world programming datasets: classic knowledge tracing models, code-enhanced programming knowledge tracing models, and knowledge tracing methods based on pre-trained code large language models.
[0096] Classic knowledge tracing methods: This category includes several classic knowledge tracing methods based on temporal modeling. DKT models the evolution of knowledge states using RNNs, DKVMN uses memory networks to store skill mastery, AKT employs self-attention mechanisms and adaptive knowledge retrieval, and SimpleKT improves efficiency by simplifying interaction modeling. DTransformer builds models from the question level to the knowledge level, explicitly diagnosing students' knowledge proficiency on each question. LPKT monitors knowledge states by directly modeling students' learning process. ReKT proposes a simplified structure that reduces the complexity of existing methods while retaining strong predictive power. extraKT focuses on expanding the context window to handle longer interaction sequences.
[0097] Programming knowledge tracing methods: These methods improve traditional knowledge tracing frameworks by integrating code structure information. CodeDKT was the first to combine Code2Vec with an attention mechanism to extract code features. ECKT, based on CodeDKT, uses a large model to extract knowledge components from the code. PST constructs a Code Information Graph and a Code Tracing Graph to model code evolution. PDKT uses a fine-tuned CodeBERT to extract code features. SQKT automatically extracts skill information from student questions. However, SQKT did not reach its theoretical limit here, because the dataset contains no student question information and only its basic structure was used.
[0098] Methods based on pre-trained language models: These methods leverage the powerful representational capabilities of pre-trained code models and combine them with an LSTM to track students' knowledge states. CodeBERT (CB), GraphCodeBERT (GCB), CodeQwen1.5-7B-Chat, and Qwen2.5-Coder-7B-Instruct were selected as encoders to embed the question text and student code. An LSTM was then used to trace the learner's detailed learning trajectory and model their knowledge state.
[0099] Table 1 below compares the performance of SCKT with 17 baseline models on three programming datasets (Java, Python, and C data collected from AIZU.org). The experimental results show that SCKT achieves state-of-the-art performance on all datasets, with improvements in AUC and ACC over the baselines ranging from 0.16% to 6.06%, validating its effectiveness and generalization ability in programming knowledge tracing tasks.
[0100] Table 1 Performance Comparison Table
[0101]
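For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute ACC and AUC from predicted probabilities. The 0.5 decision threshold, the function names, and the toy labels and scores are assumptions for illustration only; they are not taken from the experiments.

```python
def accuracy(labels, scores, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label.
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def auc(labels, scores):
    # Probability that a random positive sample is scored above a random
    # negative sample (ties count as half), computed by pair counting.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, not experimental results.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2]
print(accuracy(toy_labels, toy_scores), auc(toy_labels, toy_scores))
```

AUC is independent of the threshold and therefore complements ACC when the proportion of passing and failing submissions is unbalanced.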
[0102] The programming knowledge tracing device based on code prediction enhancement provided in this application is described below. The device described below and the programming knowledge tracing method based on code prediction enhancement described above may be read with reference to each other.
[0103] Figure 2 is a schematic diagram of the structure of the programming knowledge tracing device based on code prediction enhancement provided in the embodiments of this application. As shown in Figure 2, the device includes an embedding representation extraction module 10, a code prediction module 20, and a programming performance prediction module 30, wherein:
[0104] The embedding representation extraction module 10 is used to extract embedding representations from the historical programming questions and the corresponding historical programming answers, obtaining the historical question embedding representation and the historical answer embedding representation. A historical programming answer is the source code that the test subject input for a historical programming question, and the source code is used to solve the programming question.
[0105] The code prediction module 20 is used to predict, through a multi-head attention mechanism model, the embedding representation of the source code that the test subject inputs for the target programming question, based on the historical question embedding representation, the historical answer embedding representation, and the embedding representation of the target programming question. The predicted embedding representation of the source code serves as the programming answer prediction result for the target programming question.
[0106] The programming performance prediction module 30 is used to predict the probability of the test subject answering the target programming question correctly based on the embedded representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject. The current knowledge state is obtained by analyzing the temporal evolution of the knowledge state based on the test subject's historical programming answers.
[0107] It is understood that the detailed functional implementation of each of the above units / modules can be found in the description in the aforementioned method embodiments, and will not be repeated here.
[0108] It should be understood that the above-described device is used to execute the methods in the above embodiments. The implementation principle and technical effect of the corresponding program modules in the device are similar to those described in the above methods. The working process of the device can be referred to the corresponding process in the above methods, and will not be repeated here.
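The division of the device into three modules can be pictured as a simple composition of callables. The sketch below is a hypothetical outline only: the class name, the method name `run`, the type aliases, and the toy stand-ins for each module are assumptions, and real modules would wrap the large language model encoder, the multi-head attention predictor, and the LSTM plus MLP predictor.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

@dataclass
class KnowledgeTracingDevice:
    extract: Callable[[str], Vector]                      # module 10
    predict_code: Callable[[Sequence[Vector], Sequence[Vector], Vector], Vector]  # module 20
    predict_performance: Callable[[Vector, Vector, Vector], float]                # module 30

    def run(self, history: Sequence[Tuple[str, str]],
            target_question: str, knowledge_state: Vector) -> float:
        # Module 10: embed historical questions, historical answers, and the target.
        q_hist = [self.extract(q) for q, _ in history]
        a_hist = [self.extract(a) for _, a in history]
        q_target = self.extract(target_question)
        # Module 20: predict the embedding of the code the learner would write.
        code_pred = self.predict_code(q_hist, a_hist, q_target)
        # Module 30: predict the probability of a correct answer.
        return self.predict_performance(q_target, code_pred, knowledge_state)

# Toy stand-ins (hypothetical) so the outline can be executed end to end.
def toy_extract(text: str) -> Vector:
    return [len(text) / 10.0]

def toy_predict_code(q_hist, a_hist, q_target) -> Vector:
    return [sum(v[0] for v in a_hist) / len(a_hist)]

def toy_predict_performance(q_target, code_pred, state) -> float:
    return 1.0 / (1.0 + math.exp(-(state[0] - q_target[0])))

device = KnowledgeTracingDevice(toy_extract, toy_predict_code, toy_predict_performance)
prob = device.run(
    history=[("sum two ints", "print(a+b)")],
    target_question="abcde",
    knowledge_state=[0.5],
)
print(prob)
```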
[0109] Based on the methods in the above embodiments, this application provides an electronic device. Figure 3 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application. As shown in Figure 3, the electronic device may include a processor 810, a communications interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communications interface 820, and the memory 830 communicate with each other through the communication bus 840. The processor 810 can call logical instructions in the memory 830 to execute the methods in the above embodiments.
[0110] Furthermore, the logical instructions in the aforementioned memory 830 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
[0111] Based on the methods in the above embodiments, this application provides a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to execute the methods in the above embodiments.
[0112] Based on the methods in the above embodiments, this application provides a computer program product that, when run on a processor, causes the processor to execute the methods in the above embodiments.
[0113] It is understood that the processor in the embodiments of this application can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. A general-purpose processor can be a microprocessor or any conventional processor.
[0114] The method steps in this application embodiment can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, portable hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and the storage medium can reside in an ASIC.
[0115] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).
[0116] It is understood that the various numerical designations used in the embodiments of this application are merely for the convenience of description and are not intended to limit the scope of the embodiments of this application.
[0117] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A programming knowledge tracing method based on code prediction enhancement, characterized in that it comprises: extracting embedding representations from historical programming questions and the corresponding historical programming answers to obtain a historical question embedding representation and a historical answer embedding representation, wherein a historical programming answer is the source code input by the test subject for a historical programming question, and the source code is used to solve the programming question; predicting, through a multi-head attention mechanism model and based on the historical question embedding representation, the historical answer embedding representation, and the embedding representation of the target programming question, the embedding representation of the source code input by the test subject for the target programming question, wherein the predicted embedding representation of the source code is used as the programming answer prediction result corresponding to the target programming question; and predicting the probability of the test subject answering the target programming question correctly based on the embedding representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject, wherein the current knowledge state is obtained by analyzing the temporal evolution of the knowledge state based on the test subject's historical programming answers;
wherein predicting the probability of the test subject answering the target programming question correctly, based on the embedding representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject, comprises: performing feature concatenation on the embedding representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject to obtain concatenated features; inputting the concatenated features into a multilayer perceptron, which performs dimensionality compression and outputs dimensionally compressed features; and inputting the compressed features into an activation function to obtain the correct-answer probability output by the activation function; the model used for extracting the embedding representations is based on a large language model and a linear dimensionality reduction module, wherein the large language model is used to extract the original embedding representation and the linear dimensionality reduction module is used to reduce the dimensionality of the original embedding representation; the linear dimensionality reduction module is obtained through the following step: based on programming question samples, programming answer samples, and sample labels, an evaluation simulator is trained by minimizing a binary cross-entropy loss function, wherein the sample label indicates whether the programming answer sample can solve the programming question sample, and the evaluation simulator is constructed from the linear dimensionality reduction module cascaded with a multilayer perceptron;
the multilayer perceptron of the evaluation simulator is used to predict the probability that the programming answer can solve the programming question based on the embedding representation corresponding to the programming question and the embedding representation corresponding to the programming answer; the prediction model used to predict the probability of the test subject answering the target programming question correctly is obtained through the following step: based on programming question samples, programming answer samples, and sample labels, a Long Short-Term Memory (LSTM) network and the prediction model are trained using a binary cross-entropy loss function, wherein the LSTM network is used to obtain the knowledge state of the test subject at the current time step based on the knowledge state of the test subject at the previous time step, the embedding representation of the programming question at the current time step, and the correctness information at the current time step, and this knowledge state is used as the current knowledge state of the test subject; the prediction model is used to predict the probability of the test subject answering the target programming question correctly based on the embedding representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject; and the target programming question is the programming question at the next time step.
2. The programming knowledge tracing method based on code prediction enhancement according to claim 1, characterized in that, The extracted embedding representation includes: Input historical programming questions into the large language model, extract the original embedding representation of the historical programming questions through the large language model, and input historical programming answers into the large language model, extract the original embedding representation of the historical programming answers through the large language model; Dimensionality reduction is performed on the original embedding representation of historical programming questions to obtain the embedding representation of historical questions. Dimensionality reduction is also performed on the original embedding representation of historical programming answers to obtain the embedding representation of historical answers.
3. The programming knowledge tracing method based on code prediction enhancement according to claim 1, characterized in that, Before predicting the probability of a test subject answering a target programming question correctly, the following steps are also included: Based on the correctness or incorrectness of the source code input by the tested object in response to the programming question at the current time step, the embedded representation of the programming answer corresponding to the programming question is projected to obtain the correctness or incorrectness information at the current time step; Based on the knowledge state of the tested object at the previous time step, the embedded representation of the programming question at the current time step, and the correctness information at the current time step, the temporal evolution of the knowledge state is analyzed through a Long Short-Term Memory (LSTM) network to obtain the knowledge state of the tested object at the current time step, which is then used as the current knowledge state of the tested object.
4. The programming knowledge tracing method based on code prediction enhancement according to claim 1, characterized in that the multi-head attention mechanism model is obtained through the following step: based on programming question samples, programming answer samples, and sample labels, the multi-head attention mechanism model is trained using the total loss function of code prediction. The total loss function for code prediction is determined by formulas over the following quantities: the total loss function for code prediction; the hyperparameters; the time-step index t, where different samples are numbered according to the order in which they were answered; the embedded representation of the programming question sample at time step t+1; the embedded representation of the programming answer sample at time step t+1; the programming answer predicted by the multi-head attention mechanism model from the embedding representations of the samples at the preceding t time steps and the embedded representation of the programming question sample at time step t+1; the sigmoid function; the inner product operation; an item in the corresponding set of correct codes; an item in the corresponding set of incorrect codes; the set of triplets; the number of samples; the label of the sample at each time step; and the probability, predicted by the multilayer perceptron in the evaluation simulator from the question embedding and the predicted programming answer, that the predicted programming answer solves the programming question.
5. The programming knowledge tracing method based on code prediction enhancement according to claim 4, characterized in that the binary cross-entropy loss function used to train the Long Short-Term Memory (LSTM) network and the prediction model is determined by formulas over the following quantities: the prediction result given by the prediction model for the programming question sample at each time step; MLP, the multilayer perceptron; the current knowledge state of the test subject given by the LSTM network; and the concatenation operation.
6. A programming knowledge tracing device based on code prediction enhancement, characterized in that it applies the programming knowledge tracing method based on code prediction enhancement according to any one of claims 1-5 and comprises: an embedding representation extraction module, used to extract embedding representations from historical programming questions and the corresponding historical programming answers to obtain a historical question embedding representation and a historical answer embedding representation, wherein a historical programming answer is the source code input by the test subject for a historical programming question, and the source code is used to solve the programming question; a code prediction module, used to predict the embedding representation of the source code input by the test subject for the target programming question based on the historical question embedding representation, the historical answer embedding representation, and the embedding representation of the target programming question, wherein the predicted embedding representation of the source code is used as the programming answer prediction result corresponding to the target programming question; and a programming performance prediction module, used to predict the probability of the test subject answering the target programming question correctly based on the embedding representation of the target programming question, the programming answer prediction result, and the current knowledge state of the test subject, wherein the current knowledge state is obtained by analyzing the temporal evolution of the knowledge state based on the test subject's historical programming answers.
7. An electronic device, characterized in that it comprises: at least one memory for storing a computer program; and at least one processor for executing the program stored in the memory, wherein, when the program stored in the memory is executed, the processor performs the method according to any one of claims 1-5.
8. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is run on the processor, it causes the processor to perform the method as described in any one of claims 1-5.