Test question classification and labeling method and device, electronic equipment and storage medium
By acquiring the text and attribute data of test questions and utilizing anchor block matching and filtering technology in the knowledge point base, efficient classification and annotation of test questions can be achieved, solving the problems of low efficiency and low accuracy in existing technologies, and improving the utilization rate of test question resources and learning efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 安徽爱学教育科技有限公司
- Filing Date
- 2022-12-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies suffer from low efficiency and accuracy in test item classification and labeling, resulting in low utilization of test item resources. Furthermore, existing classification models lack integration with domain knowledge, making it difficult to accurately and quickly select suitable test items.
By acquiring the text and attribute data of the target test questions, the associated knowledge points are determined, and these are matched with anchor blocks in a pre-built knowledge point library. Based on the matching results and attribute data, knowledge point tags are filtered to achieve accurate classification and labeling of test questions.
It improves the efficiency and accuracy of test question classification and labeling, enabling better utilization of test question resources, providing personalized learning solutions, reducing the randomness of test question selection, and improving learners' learning efficiency.
Figure CN115964493B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of natural language processing technology, and in particular to a method, apparatus, electronic device, and storage medium for classifying and labeling test questions.
Background Technology
[0002] In the field of education, different test questions examine different knowledge points and are suitable for different groups of learners. To improve learners' learning efficiency, it is necessary to accurately and quickly select appropriate test questions from a massive pool of questions. Current technology mainly collects test questions offline, classifies them manually, and adds tags manually. The efficiency and accuracy of current technology in classifying and labeling test questions are low, resulting in low utilization of test question resources.
Summary of the Invention
[0003] This invention provides a method, apparatus, electronic device, and storage medium for classifying and labeling test questions, in order to solve the problems of low efficiency, low accuracy, and low utilization rate of test question resources in the prior art for classifying and labeling test questions.
[0004] This invention provides a method for classifying and labeling test questions, including:
[0005] Obtain the text data of the target test questions to be classified and labeled, as well as the attribute data of the target test questions;
[0006] Identify the knowledge points associated with the target test question, and assign the target test question to the test question database corresponding to the knowledge points associated with the target test question;
[0007] Determine the target question knowledge set corresponding to the knowledge points associated with the target question, wherein the target question knowledge set includes multiple anchor blocks;
[0008] The text data of the target test question is matched with each anchor block in the target question knowledge set, and knowledge point tags are obtained based on the knowledge points corresponding to each matched anchor block.
[0009] Based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, the knowledge point tags are filtered, and the filtered target knowledge point tags are used as tags for the target test question.
[0010] The attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, which includes the attribute data of the knowledge points.
[0011] In some embodiments, determining the knowledge points associated with the target test question includes:
[0012] Extract the anchor point data corresponding to the target test question from the text data of the target test question;
[0013] Based on the anchor point data corresponding to the target test question, the knowledge points associated with the target test question are obtained.
[0014] In some embodiments, determining the target question knowledge set corresponding to the knowledge points associated with the target test question includes:
[0015] Based on the knowledge points associated with the target test questions, questions corresponding to the knowledge points associated with the target test questions are selected from the question knowledge base to obtain the target question knowledge set;
[0016] The question knowledge base contains different types of questions, and the different types of questions test different knowledge points;
[0017] The target question knowledge set contains multiple typical questions corresponding to the knowledge points associated with the target question, and each typical question is composed of several anchor blocks.
[0018] In some embodiments, matching the text data of the target test question with each anchor block in the target test question knowledge set includes:
[0019] The text data of the target test question is divided into several segments, and the text data corresponding to each segment is matched with each anchor block in the knowledge set of the target question.
[0020] In some embodiments, obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block includes:
[0021] Based on the feature data corresponding to each matched anchor block, the knowledge points corresponding to each matched anchor block are determined. The feature data corresponding to each matched anchor block includes the knowledge points associated with each matched anchor block, and the relevance between the knowledge points associated with each matched anchor block and each matched anchor block.
[0022] The knowledge point tags are obtained based on the relevance between each matched anchor block and the knowledge points associated with it.
[0023] In some embodiments, filtering the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block includes:
[0024] Determine the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block;
[0025] Based on the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, the confidence level of each knowledge point tag is calculated.
[0026] If the confidence level of the knowledge point label is greater than or equal to the preset confidence threshold, the knowledge point label is retained; if the confidence level of the knowledge point label is less than the preset confidence threshold, the knowledge point label is removed.
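The filtering described in this embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the source does not specify how the correlation or the confidence level is computed, so the sketch takes the share of common attribute fields whose values agree as the correlation and uses that value directly as the confidence level. The names `attribute_correlation`, `filter_tags`, and `point_library`, and all attribute values, are hypothetical.

```python
def attribute_correlation(question_attrs: dict, point_attrs: dict) -> float:
    """Assumed correlation: fraction of shared attribute fields with equal values."""
    shared = set(question_attrs) & set(point_attrs)
    if not shared:
        return 0.0
    agreeing = sum(1 for key in shared if question_attrs[key] == point_attrs[key])
    return agreeing / len(shared)


def filter_tags(question_attrs, candidate_tags, point_library, threshold):
    """Keep a tag when its confidence is >= threshold; remove it otherwise."""
    kept = []
    for tag in candidate_tags:
        # Assumption: the confidence level equals the attribute correlation.
        confidence = attribute_correlation(question_attrs, point_library.get(tag, {}))
        if confidence >= threshold:
            kept.append(tag)
    return kept


# Hypothetical attribute data for one question and two candidate tags.
question_attrs = {"subject": "Chinese", "version": "2021"}
point_library = {
    "Encouraging Learning": {"subject": "Chinese", "version": "2021"},
    "linear equation in one variable": {"subject": "Mathematics", "version": "2019"},
}
kept = filter_tags(
    question_attrs,
    ["Encouraging Learning", "linear equation in one variable"],
    point_library,
    threshold=0.5,
)
print(kept)
```

A tag whose knowledge point is missing from the library receives a confidence of 0.0 here and is removed; a real system might prefer to flag such cases for review.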
[0027] In some embodiments, after labeling the filtered target knowledge point tags as tags for the target test questions, the method further includes:
[0028] For any unmatched segment in the target question, select the anchor block with the highest relevance to the unmatched segment from the target question knowledge set;
[0029] Supplementary labels are obtained based on the feature data corresponding to the selected anchor blocks;
[0030] The supplementary tags are checked. If the knowledge point tag contains the supplementary tag, the supplementary tag is discarded; if the knowledge point tag does not contain the supplementary tag, the supplementary tag is retained, and the supplementary tag is marked on the target test question.
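The supplementary-tag check can be sketched as follows. This is an illustration under stated assumptions: the relevance between an unmatched segment and an anchor block is approximated by word overlap (Jaccard similarity), which the source does not prescribe, and the names `word_overlap`, `supplement_tags`, and the dictionary keys `text` and `knowledge_points` are hypothetical.

```python
def word_overlap(a: str, b: str) -> float:
    """Assumed relevance measure: Jaccard similarity of lower-cased word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def supplement_tags(unmatched_segments, anchor_blocks, knowledge_point_tags):
    """Return supplementary tags that are not already among the knowledge point tags."""
    supplements = []
    for segment in unmatched_segments:
        # Select the anchor block with the highest relevance to the segment.
        best = max(anchor_blocks, key=lambda block: word_overlap(segment, block["text"]))
        for tag in best["knowledge_points"]:
            # Discard a supplementary tag that duplicates an existing tag.
            if tag not in knowledge_point_tags and tag not in supplements:
                supplements.append(tag)
    return supplements


# Hypothetical anchor blocks from a target question knowledge set.
anchor_blocks = [
    {"text": "annual interest rate of the bank", "knowledge_points": ["interest rate problem"]},
    {"text": "solve the equation for x", "knowledge_points": ["linear equation in one variable"]},
]
extra = supplement_tags(
    ["what is the annual interest rate"],
    anchor_blocks,
    knowledge_point_tags=["linear equation in one variable"],
)
print(extra)
```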
[0031] The present invention also provides a test question classification and labeling device, comprising:
[0032] The acquisition unit is used to acquire the text data of the target test question to be labeled, as well as the attribute data of the target test question;
[0033] A classification unit is used to determine the knowledge points associated with the target test question and assign the target test question to the test question database corresponding to the knowledge points associated with the target test question;
[0034] A determining unit is used to determine the target question knowledge set corresponding to the knowledge points associated with the target question, wherein the target question knowledge set includes multiple anchor blocks;
[0035] The matching unit is used to match the text data of the target test question with each anchor block in the target question knowledge set, and obtain knowledge point tags based on the knowledge points corresponding to each matched anchor block;
[0036] The annotation unit is used to filter the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, and to annotate the target knowledge point tags obtained after filtering as the tags of the target test question.
[0037] The attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, which includes the attribute data of the knowledge points.
[0038] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the test question classification and labeling method as described above.
[0039] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the test item classification and labeling method as described above.
[0040] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the test question classification and annotation method as described above.
[0041] This invention provides a method, apparatus, electronic device, and storage medium for classifying and labeling test questions. By identifying the knowledge points associated with a target test question, the method classifies the target test question, matches the text data of the target test question with each anchor block in the target question knowledge set to obtain knowledge point tags, and filters the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block to obtain the target knowledge point tags for the target test question. This invention can effectively improve the efficiency and accuracy of test question classification and labeling, facilitating the efficient utilization of test question resources.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0043] Figure 1 is the first flowchart of the test question classification and annotation method provided in an embodiment of the present invention;
[0044] Figure 2 is a flowchart of determining the knowledge points associated with a target test question and assigning the target test question to the test question database corresponding to those knowledge points, as provided in an embodiment of the present invention;
[0045] Figure 3 is the second flowchart of the test question classification and annotation method provided in an embodiment of the present invention;
[0046] Figure 4 is a schematic diagram of the structure of the test question classification and annotation device provided in an embodiment of the present invention;
[0047] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0049] The terms "first," "second," etc., used in the specification and claims of this invention are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such terms can be used interchangeably where appropriate so that embodiments of the invention can be implemented in orders other than those illustrated or described herein, and the objects distinguished by "first" and "second" are generally of the same class and are not limited in number; for example, the first object can be one or more. Furthermore, in the specification and claims, "and/or" indicates at least one of the connected objects, and the character "/" generally indicates that the preceding and following objects are in an "or" relationship.
[0050] In all subject areas of education, test resources are of paramount importance. How to better identify suitable test questions from a vast pool of resources based on learners' needs, providing personalized learning plans and effectively improving the utilization rate of test resources and learners' learning efficiency, has become a hot topic. Given the context of differentiated instruction, to avoid homogenized and ineffective homework, more precise assignments are needed. However, faced with a massive amount of test resources, teachers cannot accurately and quickly select appropriate questions. Therefore, a comprehensive mechanism is needed to classify and label the vast amount of test questions, providing a basis for adaptive teaching.
[0051] Currently, the collection, classification, and labeling of test questions mainly rely on manual labor, which has certain limitations. It is difficult to collect a large number of test question resources and make reasonable and effective use of them. The classification and labeling of test questions are highly subjective, and the quality judgment of the questions relies solely on personal experience. The efficiency of test question classification and labeling is low, and inaccurate classification and labeling are prone to occur.
[0052] Currently, some methods for automatic question classification and annotation employ classification models commonly used in the field of natural language processing (NLP). However, these general NLP classification models primarily focus on algorithm implementation and improvement, lacking the integration of domain knowledge and neglecting the attribute characteristics of knowledge points and questions. Consequently, their accuracy in classifying and annotating questions is low. Specifically, they lack the extraction and research of question attribute features and semantic information. Furthermore, most existing classification models are binary classification models, while questions are often associated with multiple and varying numbers of knowledge point labels. There are also certain relationships between question attributes and knowledge point attributes. Most existing classification models struggle to capture the semantic information of questions while considering the connections between question attributes and knowledge point attributes. Moreover, existing automatic knowledge point annotation methods rarely focus on multi-label text classification.
[0053] To address this, the present invention provides a method, apparatus, electronic device, and storage medium for classifying and labeling test questions. By identifying the knowledge points associated with a target test question, the invention classifies the target test question, matches the text data of the target test question with each anchor block in the target question knowledge set to obtain knowledge point tags, and filters the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block to obtain the target knowledge point tags for the target test question. This invention can effectively improve the efficiency and accuracy of test question classification and labeling, facilitating the efficient utilization of test question resources.
[0054] Figure 1 is the first flowchart of the test question classification and annotation method provided in an embodiment of the present invention. As shown in Figure 1, a method for classifying and labeling test questions is provided, including the following steps: step 110, step 120, step 130, step 140, and step 150. These steps are merely one possible implementation of the present invention.
[0055] Step 110: Obtain the text data of the target test questions to be classified and labeled, as well as the attribute data of the target test questions.
[0056] The target test questions can be past exam questions, mock exam questions, or practice questions.
[0057] The text data of the target test questions includes the question stem text data, as well as at least one of the answer text data and answer analysis text data; the attribute data of the target test questions includes feature data in multiple dimensions, such as subject value, version value, test question difficulty value, test question type, regional information, test frequency, score rate, etc.
[0058] For example, the text data of the target test question is: "Fill in the blanks in the following sentence: The Zhuangzi says that those who go to the countryside need only bring one day's food and return the same day with a full stomach; those who go a hundred miles away need to ____; and those who go a thousand miles away must ____." The attribute data of the target test question is: Chinese language subject, 2021 version, Hubei region, College Entrance Examination Paper I.
[0059] Specifically, the text data and attribute data of the target test questions can be collected through terminal devices, such as mobile phones, personal computers, and tablets.
[0060] Optionally, the text data of the target test question can be obtained through various means, such as text uploaded by users, text obtained by speech recognition of voice data uploaded by users, or text obtained by image recognition of image data uploaded by users.
[0061] Optionally, the attribute data of the target test questions can be obtained based on big data technology.
[0062] Specifically, data on the target test questions collected from various channels are processed, by calculation or statistical analysis, to obtain the attribute data of the target test questions, such as test frequency, version value, score rate, and test question difficulty value.
[0063] For example, the difficulty level of a target test question can be calculated using the following formula:
[0064] H(x) = m*h1(x) + p*h2(x)
[0065] Wherein, H(x) represents the difficulty value of the target test question, and x represents the target test question; h1(x) is the first difficulty value, which is obtained by classifying the difficulty based on the version value, regional information, knowledge points tested, and difficulty labels of the knowledge points tested; m represents the weight of the first difficulty value, and the weight m is dynamically adjusted according to the difficulty matching between the target test question and the knowledge points tested; h2(x) represents the second difficulty value, which is obtained based on the difficulty coefficient of the target test question; p represents the weight of the second difficulty value, and the weight p is related to the frequency of the target test question and whether the source is diverse. The higher the frequency and the more diverse the source, the larger the weight p.
[0066] The formula for calculating the difficulty coefficient of the target test questions is as follows:
[0067] f(z) = 1 - z / w
[0068] Where f(z) represents the difficulty coefficient of the target test question, z represents the score of the target test question, w is the total score of the target test question, and the difficulty coefficient f(z) of the target test question forms a mapping relationship with the second difficulty value h2(x); the higher the frequency of the target test question and the more diverse the sources, the higher the confidence level of the calculated difficulty coefficient of the target test question.
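The two formulas above can be worked through with a short Python sketch. The numbers are hypothetical, and the mapping from the difficulty coefficient f(z) to the second difficulty value h2(x) is assumed here to be the identity, since the source only states that a mapping relationship exists. The function names are illustrative.

```python
def difficulty_coefficient(score: float, total_score: float) -> float:
    """f(z) = 1 - z / w, with z the score obtained and w the total score."""
    if total_score <= 0:
        raise ValueError("total_score must be positive")
    return 1.0 - score / total_score


def difficulty_value(h1: float, h2: float, m: float, p: float) -> float:
    """H(x) = m*h1(x) + p*h2(x)."""
    return m * h1 + p * h2


# Hypothetical data: learners score 6 points on an 8-point question.
f = difficulty_coefficient(6.0, 8.0)

# Assumption: h2(x) is taken to be f(z) itself (identity mapping).
H = difficulty_value(h1=0.5, h2=f, m=0.5, p=0.5)
print(f, H)
```

With these inputs the difficulty coefficient is 0.25 and the difficulty value is 0.5 * 0.5 + 0.5 * 0.25 = 0.375. A lower score relative to the total score raises f(z) and therefore the overall difficulty value.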
[0069] Alternatively, if the target test item has a low frequency of occurrence and the confidence level of the calculated difficulty coefficient is low, the difficulty value of the target test item can be calculated using the following formula:
[0070] H(x) = m*h1(x) + p*h2(x) + (1 - p - m)*h3(y)
[0071] Where h3(y) represents the compensation difficulty value, and y represents a question similar to the target question. The difficulty distribution of questions similar to the target question is statistically analyzed to obtain the compensation difficulty value.
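A minimal sketch of the compensated formula follows. Two assumptions are made and should be read as such: the third weight is taken to be 1 - p - m, so that the three weights sum to one, and the compensation difficulty value h3(y) is taken as the mean difficulty of the similar questions, whereas the source only says that their difficulty distribution is statistically analyzed. All names and numbers are hypothetical.

```python
from statistics import mean


def compensation_difficulty(similar_difficulties) -> float:
    """h3(y): assumed here to be the mean difficulty of similar questions."""
    if not similar_difficulties:
        raise ValueError("at least one similar question is required")
    return mean(similar_difficulties)


def compensated_difficulty_value(h1, h2, h3, m, p) -> float:
    """H(x) = m*h1 + p*h2 + (1 - p - m)*h3, assuming the weights sum to one."""
    remainder = 1.0 - p - m
    if remainder < 0:
        raise ValueError("m + p must not exceed 1")
    return m * h1 + p * h2 + remainder * h3


# Hypothetical difficulties of three questions similar to the target question.
h3 = compensation_difficulty([0.25, 0.5, 0.75])

# A low-frequency question: the weight p of the second difficulty value is small.
H = compensated_difficulty_value(h1=0.5, h2=0.25, h3=h3, m=0.5, p=0.25)
print(h3, H)
```

Here h3 is 0.5 and H is 0.5 * 0.5 + 0.25 * 0.25 + 0.25 * 0.5 = 0.4375, so the similar questions pull the estimate toward their typical difficulty when the question's own statistics are unreliable.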
[0072] In this embodiment of the invention, the obtained attribute data of the target test questions is highly accurate and diverse. The obtained text data and attribute data of the target test questions provide a reliable basis for the classification and labeling of the target test questions.
[0073] Step 120: Determine the knowledge points associated with the target test question, and assign the target test question to the test question database corresponding to the knowledge points associated with the target test question.
[0074] The number of knowledge points associated with the target test question is one, two, or more.
[0075] Figure 2 is a flowchart of determining the knowledge points associated with a target test question and assigning the target test question to the test question database corresponding to those knowledge points, as provided in an embodiment of the present invention. As shown in Figure 2, step 120 includes steps 121, 122, and 123.
[0076] Step 121: Extract the anchor point data corresponding to the target test question from the text data of the target test question;
[0077] The anchor point data corresponding to the target test question is data related to the content tested by the target test question, and it contains rich information about that content.
[0078] Step 122: Based on the anchor point data corresponding to the target test question, obtain the knowledge points associated with the target test question;
[0079] Step 123: Assign the target test question to the test question database corresponding to the knowledge point associated with the target test question.
[0080] For example, the text data of the target test question is: "Xiaoming deposited 500 yuan into a bank for a fixed term of one year. After one year, he received a total of 510.8 yuan in principal and interest. What is the annual interest rate of the bank for one year?" Combining the question stem text data and the answer text data of the target test question, the anchor point data corresponding to the target test question is extracted as: "linear equation in one variable, interest rate problem, involving the calculation method of bank annual interest rate". Thus, the knowledge point associated with the target test question is "linear equation in one variable". The target test question is then classified into the test question database that examines the knowledge point "linear equation in one variable".
[0081] In this embodiment of the invention, anchor point data corresponding to the target test question is extracted to obtain the knowledge points associated with the target test question. The target test question is then classified according to the knowledge points associated with it, enabling learners to select appropriate test questions from the corresponding test question database for practice based on the knowledge points they need to learn. This reduces the randomness of test question selection and improves learning efficiency.
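Steps 121 to 123 can be sketched as a small pipeline. This is only an illustration: a keyword table stands in for the anchor point classification layer, and the tables `ANCHOR_CUES` and `ANCHOR_TO_KNOWLEDGE_POINT`, the function names, and the question identifier `q1` are hypothetical.

```python
from collections import defaultdict

# Hypothetical cue table standing in for the anchor point classification layer.
ANCHOR_CUES = {
    "interest rate": "interest rate problem",
    "principal and interest": "linear equation in one variable",
}

# Hypothetical mapping from anchor point data to knowledge points.
ANCHOR_TO_KNOWLEDGE_POINT = {
    "interest rate problem": "linear equation in one variable",
    "linear equation in one variable": "linear equation in one variable",
}


def extract_anchor_points(text: str) -> list:
    """Step 121: extract anchor point data from the question text."""
    lowered = text.lower()
    return [anchor for cue, anchor in ANCHOR_CUES.items() if cue in lowered]


def associated_knowledge_points(anchor_points: list) -> list:
    """Step 122: derive the associated knowledge points, without duplicates."""
    points = []
    for anchor in anchor_points:
        point = ANCHOR_TO_KNOWLEDGE_POINT.get(anchor)
        if point and point not in points:
            points.append(point)
    return points


def assign_to_databases(question_id: str, knowledge_points: list, databases) -> None:
    """Step 123: file the question under each associated knowledge point."""
    for point in knowledge_points:
        databases[point].append(question_id)


text = (
    "Xiaoming deposited 500 yuan into a bank for a fixed term of one year. "
    "After one year, he received a total of 510.8 yuan in principal and interest. "
    "What is the annual interest rate of the bank for one year?"
)
databases = defaultdict(list)
anchors = extract_anchor_points(text)
points = associated_knowledge_points(anchors)
assign_to_databases("q1", points, databases)
print(anchors, points, dict(databases))
```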
[0082] In some embodiments, anchor point data corresponding to the target test question is extracted from the text data of the target test question based on the anchor point classification layer. The process of determining the anchor point classification layer includes:
[0083] Based on the initial anchor point classification layer, the anchor point data corresponding to the test question sample is extracted from the text data of the test question sample;
[0084] Based on the anchor point data corresponding to the test question sample and the actual anchor point data corresponding to the test question sample, the parameters of the initial anchor point classification layer are iterated to obtain the anchor point classification layer.
[0085] Optionally, the text data of the test sample includes question stem text data, and also includes at least one of answer text data and answer explanation text data.
[0086] It should be noted that the anchor point data corresponding to the test question sample is the anchor point data predicted by the initial anchor point classification layer, while the actual anchor point data corresponding to the test question sample is the anchor point data annotated manually.
[0087] Specifically, a loss function value is calculated based on the anchor point data corresponding to the test question sample and the actual anchor point data corresponding to the test question sample. The parameters of the initial anchor point classification layer are iterated based on the loss function value. After the iteration is completed, the anchor point classification layer is obtained, which is then used to predict the anchor point data corresponding to the target test question and improves the accuracy of that prediction.
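The iteration of the initial anchor point classification layer can be illustrated with a deliberately small Python sketch. The source does not specify the layer architecture, the loss function, or the optimizer, so everything here is an assumption: a single logistic unit, binary cross-entropy loss, plain stochastic gradient descent, and two hand-made binary text features. It shows only the general predict, compare, and update loop.

```python
import math


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def predict(weights, bias, features) -> float:
    """Predicted probability that the anchor point applies to the sample."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def loss(weights, bias, samples, labels) -> float:
    """Mean binary cross-entropy between predicted and actual anchor labels."""
    total = 0.0
    for features, target in zip(samples, labels):
        p = predict(weights, bias, features)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(samples)


def train(samples, labels, epochs=200, learning_rate=0.5):
    """Iterate the parameters of the initial layer by gradient descent."""
    weights = [0.0] * len(samples[0])  # initial layer parameters
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            # Gradient of the cross-entropy loss with respect to the logit.
            error = predict(weights, bias, features) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical features: [mentions "interest", mentions "triangle"].
samples = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Manually annotated labels: 1 if the anchor point "interest rate problem" applies.
labels = [1, 1, 0, 0]

initial_loss = loss([0.0, 0.0], 0.0, samples, labels)
weights, bias = train(samples, labels)
final_loss = loss(weights, bias, samples, labels)
print(initial_loss, final_loss)
```

A production system would more likely use a neural text encoder with a multi-label output; the toy model is chosen only so that the loop is fully visible.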
[0088] Step 130: Determine the target question knowledge set corresponding to the knowledge points associated with the target question. The target question knowledge set includes multiple anchor blocks.
[0089] Specifically, determining the target question knowledge set corresponding to the knowledge points associated with the target test question includes:
[0090] Based on the knowledge points associated with the target test question, questions corresponding to the knowledge points associated with the target test question are selected from the question knowledge base to obtain the target question knowledge set.
[0091] Optionally, when there are a large number of knowledge points associated with the target question, the knowledge points associated with the target question are filtered based on the degree of correlation between the knowledge points associated with the target question and the target question. Based on the filtered knowledge points associated with the target question, the corresponding questions are selected from the question knowledge base to obtain the target question knowledge set.
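The selection of the target question knowledge set, including the optional filtering of knowledge points by relevance, might look like the following sketch. Keeping the top `max_points` knowledge points is an assumed filtering rule; the source only says that filtering is based on the degree of correlation. The function name, dictionary keys, and relevance scores are hypothetical.

```python
def select_knowledge_set(associated_points: dict, question_knowledge_base: list, max_points: int = 2) -> list:
    """Select the questions that test the most relevant associated knowledge points.

    associated_points maps each knowledge point to its relevance to the target question.
    """
    # Assumed filter: keep only the max_points most relevant knowledge points.
    ranked = sorted(associated_points, key=associated_points.get, reverse=True)
    kept = set(ranked[:max_points])
    return [q for q in question_knowledge_base if q["knowledge_point"] in kept]


# Hypothetical relevance scores and question knowledge base entries.
associated_points = {
    "linear equation in one variable": 0.9,
    "interest rate problem": 0.7,
    "decimals": 0.2,
}
question_knowledge_base = [
    {"id": "t1", "knowledge_point": "linear equation in one variable"},
    {"id": "t2", "knowledge_point": "decimals"},
    {"id": "t3", "knowledge_point": "interest rate problem"},
]
knowledge_set = select_knowledge_set(associated_points, question_knowledge_base)
print([q["id"] for q in knowledge_set])
```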
[0092] The question knowledge base contains different types of questions, and the different types of questions test different knowledge points.
[0093] Optionally, the questions in the question knowledge base are collected based on the knowledge points specified in the textbook or examination syllabus, and can be past exam questions, test questions, or regular practice questions.
[0094] The target question knowledge set contains multiple typical questions corresponding to the knowledge points associated with the target question, and each typical question is composed of several anchor blocks.
[0095] Optionally, each question in the question knowledge base includes question stem text data, as well as at least one of question knowledge point information, answer text data, and answer analysis text data.
[0096] It needs to be explained that each question in the question knowledge base is pre-divided into several metadata fragments, which are called anchor blocks. Each anchor block is a collection of multi-dimensional feature data. For example, the feature data of an anchor block may include the anchor block's location information, the knowledge points associated with the anchor block, and the degree of correlation between the knowledge points associated with the anchor block and the anchor block.
[0097] For example, the stem text data of a question in the question knowledge base is: "Complete the following sentences by filling in the blanks: In 'Encouraging Learning' by Xun Kuang, the two sentences '____' and '____' use the perseverance of a lame horse as an analogy to emphasize that learning must persevere." The answer text data of this question is: "A lame horse travels a thousand li through perseverance." The anchor blocks of this question are "Complete the following sentences by filling in the blanks", "Encouraging Learning by Xun Kuang", "A lame horse travels a thousand li through perseverance", and "Use the perseverance of a lame horse as an analogy to emphasize that learning must persevere". The feature data of the anchor block "Complete the following sentences by filling in the blanks" includes: located at the beginning of the stem; knowledge point examination method: fill in the blanks by rote. The feature data of the anchor block "Encouraging Learning by Xun Kuang" includes: located before the blank part in the stem; associated knowledge point: "Encouraging Learning by Xun Kuang". The feature data of the anchor block "A lame horse travels a thousand li through perseverance" includes: located in the blank part of the stem; examined knowledge point: "A lame horse travels a thousand li through perseverance". The feature data of the anchor block "Use the perseverance of a lame horse as an analogy to emphasize that learning must persevere" includes: located after the blank part in the stem; semantic explanation of the examined knowledge point.
[0098] In an embodiment of the present invention, by determining the target question knowledge set corresponding to the knowledge points associated with the target test question, the multi-dimensional feature data of each anchor block in the target question knowledge set can be obtained, which facilitates matching the anchor blocks with the text data of the target test question to obtain the knowledge point tags corresponding to the target test question.
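One way to picture an anchor block as a collection of multi-dimensional feature data is the following data-structure sketch, populated with the "Encouraging Learning" example. The class and field names are hypothetical, and the relevance scores are invented for illustration; the source does not give numeric values.

```python
from dataclasses import dataclass, field


@dataclass
class AnchorBlock:
    text: str                      # text data of the anchor block
    location: str                  # location information within the question
    knowledge_points: dict = field(default_factory=dict)  # knowledge point -> relevance


@dataclass
class TypicalQuestion:
    stem: str
    answer: str
    anchor_blocks: list = field(default_factory=list)


question = TypicalQuestion(
    stem="Complete the following sentences by filling in the blanks: ...",
    answer="A lame horse travels a thousand li through perseverance",
    anchor_blocks=[
        AnchorBlock(
            "Complete the following sentences by filling in the blanks",
            "beginning of the stem",
            {"fill in the blanks by rote": 0.5},  # hypothetical relevance
        ),
        AnchorBlock(
            "Encouraging Learning by Xun Kuang",
            "before the blank part in the stem",
            {"Encouraging Learning by Xun Kuang": 0.9},  # hypothetical relevance
        ),
        AnchorBlock(
            "A lame horse travels a thousand li through perseverance",
            "blank part of the stem",
            {"A lame horse travels a thousand li through perseverance": 0.95},
        ),
        AnchorBlock(
            "Use the perseverance of a lame horse as an analogy to emphasize "
            "that learning must persevere",
            "after the blank part in the stem",
            {"semantic explanation of the examined knowledge point": 0.6},
        ),
    ],
)
print(len(question.anchor_blocks), question.anchor_blocks[1].location)
```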
[0099] Step 140: Match the text data of the target question with each anchor block in the target question knowledge set, and obtain a knowledge point label based on the knowledge points corresponding to the matched anchor blocks.
[0100] In some embodiments, the matching of the text data of the target question with each anchor block in the target question knowledge set includes:
[0101] Segment the text data of the target question into several segments, and match the text data corresponding to each segment with each anchor block in the target question knowledge set.
[0102] Specifically, segment the stem text data of the target question and other types of text data of the target question, such as answer text data and answer analysis text data, to obtain the information representation of the target question, for example $I = \{s_1, s_2, \ldots, s_n\}$, where the target question consists of $n$ segments and $s_n$ is the $n$-th segment of the target question.
[0103] Optionally, if the question stem text data of the target test question is a fill-in-the-blank question type, the question stem text data of the target test question is supplemented based on the answer text data or answer analysis text data of the target test question, and then the complete question stem text data of the target test question is segmented.
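The segmentation and the optional completion of a fill-in-the-blank stem can be sketched as follows. This is an illustration under stated assumptions, not the segmentation used by the invention: blanks are assumed to be marked by empty quotation marks or a run of underscores, segments are assumed to be delimited by punctuation, and the function names `fill_blanks` and `segment` are hypothetical.

```python
from __future__ import annotations

import re

# Assumed blank markers: a pair of empty double quotes or two or more underscores.
BLANK = re.compile(r'""|_{2,}')

# Assumed segment delimiters: common Western and Chinese punctuation marks.
DELIMITERS = re.compile(r"[,;:.!?，；：。！？]")


def fill_blanks(stem: str, answers: list[str]) -> str:
    """Replace each blank in the stem with the next answer, in order."""
    remaining = iter(answers)
    # If there are fewer answers than blanks, the extra blanks are left as they are.
    return BLANK.sub(lambda m: next(remaining, m.group(0)), stem)


def segment(text: str) -> list[str]:
    """Split text into the segments s_1 ... s_n at punctuation marks."""
    return [part.strip() for part in DELIMITERS.split(text) if part.strip()]
```

A production system would more likely use a sentence or phrase segmenter suited to the language of the test questions; punctuation splitting is used here only to keep the sketch self-contained.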
[0104] In some embodiments, obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block includes:
[0105] Based on the feature data corresponding to each matched anchor block, the knowledge points corresponding to each matched anchor block are determined. The feature data corresponding to each matched anchor block includes the knowledge points associated with each matched anchor block, and the relevance between the knowledge points associated with each matched anchor block and each matched anchor block.
[0106] The knowledge point labels are obtained based on the relevance between each matched anchor block and the knowledge points associated with it.
[0107] The feature data corresponding to the anchor block includes the anchor block's text data, the anchor block's location information, the knowledge points associated with the anchor block, and the degree of association between the knowledge points associated with the anchor block and the anchor block.
[0108] Optionally, the knowledge points associated with each matched anchor block are sorted according to the relevance between the knowledge points associated with the matched anchor block and the matched anchor block. The higher the relevance between the knowledge points associated with the matched anchor block and the matched anchor block, the higher the ranking of the knowledge points associated with the matched anchor block. The first N knowledge points associated with the matched anchor blocks are selected according to the ranking order, and the selected first N knowledge points associated with the matched anchor blocks are determined as the knowledge points corresponding to each matched anchor block, thereby obtaining the knowledge point labels.
[0109] In this embodiment of the invention, each segment of the text data of the target test question is matched with each anchor block in the target question knowledge set, and invalid knowledge points with low relevance to their matched anchor blocks are filtered out of the knowledge points associated with each matched anchor block, thereby obtaining the knowledge point tags.
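The matching and the top-N selection described above can be sketched as follows. The matching criterion is not specified in the text, so case-insensitive substring containment is assumed here. The names `AnchorBlock`, `match_blocks`, and `top_n_labels`, as well as the relevance values used in any call, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorBlock:
    text: str
    knowledge_point: str
    relevance: float  # degree of association between the block and its knowledge point


def match_blocks(segments: list[str], blocks: list[AnchorBlock]) -> list[AnchorBlock]:
    """Return the anchor blocks whose text occurs in at least one segment.

    Assumption: a block "matches" when its text is a substring of a segment.
    """
    lowered = [s.lower() for s in segments]
    return [b for b in blocks if any(b.text.lower() in s for s in lowered)]


def top_n_labels(matched: list[AnchorBlock], n: int) -> list[str]:
    """Rank knowledge points by relevance, highest first, and keep the first n."""
    best: dict[str, float] = {}
    for block in matched:
        # A knowledge point reached through several blocks keeps its highest relevance.
        previous = best.get(block.knowledge_point, 0.0)
        best[block.knowledge_point] = max(previous, block.relevance)
    ranked = sorted(best, key=lambda kp: best[kp], reverse=True)
    return ranked[:n]
```

Choosing a small N discards the low-relevance knowledge points, which corresponds to the filtering of invalid knowledge points mentioned in this embodiment.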
[0110] It should be noted that this step only matches each segment of the target test question's text data with each anchor block in the target question's knowledge set. Each segment of the target test question's text data can match multiple anchor blocks. However, not all anchor blocks are valuable. For example, if the target test question is a junior high school test question testing a certain knowledge point, it may involve some basic knowledge points from elementary school. Although the basic knowledge points from elementary school are matched, they can be considered as secondary related knowledge points. Therefore, it is also necessary to consider the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block.
[0111] Step 150: Based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, filter the knowledge point tags and use the filtered target knowledge point tags as tags for the target test question.
[0112] The attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, which includes the attribute data of the knowledge points.
[0113] Optionally, the pre-built knowledge point base is a knowledge point base built based on the textbook. The pre-built knowledge point base includes each knowledge point in the textbook, as well as the attribute data of each knowledge point.
[0114] Specifically, based on the exam syllabus, the attribute data of knowledge points includes attribute data in multiple dimensions, such as the subject to which the knowledge point belongs, the importance of the knowledge point, the difficulty value of the knowledge point, and the examination points of the knowledge point.
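As an illustration of such a knowledge point library, the sketch below stores the four attribute dimensions named above and looks up the attribute data for a list of knowledge points. The class name, the field names, the value scales, and the single sample entry are all assumptions made for this example and are not taken from any real library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgePointAttributes:
    subject: str                   # subject to which the knowledge point belongs
    importance: int                # assumed scale: 1 (low) to 5 (high)
    difficulty: float              # assumed scale: 0.0 (easy) to 1.0 (hard)
    exam_points: tuple[str, ...]   # examination points of the knowledge point


# Hypothetical library content; a real library would be built from the textbook.
KNOWLEDGE_POINT_LIBRARY: dict[str, KnowledgePointAttributes] = {
    "Encouraging Learning by Xun Kuang": KnowledgePointAttributes(
        subject="Chinese",
        importance=4,
        difficulty=0.5,
        exam_points=("recitation of famous sentences",),
    ),
}


def attributes_for(knowledge_points: list[str]) -> dict[str, KnowledgePointAttributes]:
    """Look up attribute data for each knowledge point; unknown points are skipped."""
    return {
        kp: KNOWLEDGE_POINT_LIBRARY[kp]
        for kp in knowledge_points
        if kp in KNOWLEDGE_POINT_LIBRARY
    }
```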
[0115] In some embodiments, step 150 includes steps 151, 152, and 153.
[0116] Step 151: Determine the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block;
[0117] Specifically, the attribute data set of the target test question is $A = \{a_1, a_2, \ldots, a_m\}$, where $a_1, a_2, \ldots, a_m$ represent attribute values of the target test question such as its subject value, version value, and difficulty value. The attribute data set of the knowledge point corresponding to the $i$-th matched anchor block is $B_i = \{b_{i1}, b_{i2}, \ldots, b_{in}\}$, where $b_{i1}, b_{i2}, \ldots, b_{in}$ represent attributes such as the subject to which the knowledge point belongs, its importance, and its difficulty level, and $i$ denotes the $i$-th matched anchor block. The correlation between the attribute data set $A$ of the target question and the attribute data set $B_i$ of the knowledge point corresponding to each matched anchor block is then obtained statistically.
[0118] Step 152: Based on the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, calculate the confidence level of each knowledge point label;
[0119] Specifically, the higher the correlation between the attribute data of the target question and the attribute data of the knowledge point corresponding to the matched anchor block, the higher the confidence level of the knowledge point label obtained based on the matched anchor block.
[0120] Step 153: If the confidence level of the knowledge point label is greater than or equal to the preset confidence threshold, retain the knowledge point label; if the confidence level of the knowledge point label is less than the preset confidence threshold, remove the knowledge point label.
[0121] In this embodiment of the invention, by determining the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, the confidence level of each knowledge point label is calculated, thereby filtering the knowledge point labels and improving the accuracy of test question labeling.
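Steps 151 to 153 can be sketched as follows. The text does not define how the correlation or the confidence level is computed, so two assumptions are made here: the correlation is the fraction of shared attribute names on which the two attribute sets agree, and the confidence level is taken to be equal to the correlation, which satisfies the stated requirement that a higher correlation yields a higher confidence level. The function names and the attribute names in any call are hypothetical.

```python
from __future__ import annotations


def correlation(question_attrs: dict[str, object], kp_attrs: dict[str, object]) -> float:
    """Step 151: fraction of shared attribute names with equal values (assumed measure)."""
    shared = question_attrs.keys() & kp_attrs.keys()
    if not shared:
        return 0.0
    agree = sum(1 for key in shared if question_attrs[key] == kp_attrs[key])
    return agree / len(shared)


def filter_labels(
    question_attrs: dict[str, object],
    label_attrs: dict[str, dict[str, object]],
    threshold: float,
) -> list[str]:
    """Steps 152 and 153: keep the labels whose confidence reaches the threshold."""
    kept = []
    for label, kp_attrs in label_attrs.items():
        # Step 152: assumption, the confidence level equals the correlation.
        confidence = correlation(question_attrs, kp_attrs)
        # Step 153: retain when confidence >= threshold, otherwise remove.
        if confidence >= threshold:
            kept.append(label)
    return kept
```

With a junior high school question and a threshold of 0.5, a knowledge point whose attributes all agree with the question is retained, while an elementary school knowledge point that agrees only on the subject is removed, mirroring the secondary knowledge points discussed before step 150.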
[0122] It should be noted that each embodiment of the present invention can be freely combined, rearranged, or executed individually, and does not need to rely on or depend on a fixed execution order.
[0123] Figure 3 is the second flowchart of the test question classification and annotation method provided in this embodiment of the invention. As shown in Figure 3, in some embodiments, step 150 is followed by steps 310, 320 and 330.
[0124] Step 310: For the unmatched segments in the target test questions, select the anchor blocks with the highest relevance to the unmatched segments from the target question knowledge set;
[0125] Specifically, the relevance between the selected anchor block and the unmatched fragment is obtained based on the feature data of the selected anchor block and the text data and attribute data of the unmatched fragment.
[0126] Optionally, for the unmatched segments in the target test question, based on the anchor block position information, anchor blocks that are close to the position information of the unmatched segments in the target test question are initially selected from the target question knowledge set, and the anchor block with the highest relevance to the unmatched segments is selected from the initially selected anchor blocks.
[0127] For example, if the unmatched segment in the target question is located at the beginning of the question stem text data, then based on the anchor block position information, anchor blocks located at the beginning of the corresponding question stem text data and near the beginning are initially selected, and then the anchor block with the highest relevance to the unmatched segment is selected from the initially selected anchor blocks.
[0128] Step 320: Based on the feature data corresponding to the selected anchor blocks, obtain supplementary labels;
[0129] Step 330: Detect the supplementary tag. If the knowledge point tag contains the supplementary tag, discard the supplementary tag; if the knowledge point tag does not contain the supplementary tag, retain the supplementary tag and mark the target question with the supplementary tag.
[0130] In this embodiment of the invention, for unmatched segments in the target test question, the anchor block with the highest relevance to the unmatched segment is selected from the target question knowledge set to obtain supplementary tags. The supplementary tags are then detected, and valid supplementary tags are retained, making the knowledge point tags marked on the target test question richer and more comprehensive, and preventing the omission of knowledge point tags.
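Steps 310 to 330 can be sketched as follows. The relevance measure is not specified in the text, so the Jaccard similarity of word sets is used here as a stand-in, and anchor block positions are reduced to coarse string labels. All names in the sketch (`AnchorBlock`, `word_overlap`, `supplementary_label`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnchorBlock:
    text: str
    position: str         # assumed coarse positions, e.g. "beginning", "middle", "end"
    knowledge_point: str


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of lowercase word sets; a stand-in relevance measure."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def supplementary_label(
    segment: str,
    position: str,
    blocks: list[AnchorBlock],
    existing_labels: set[str],
) -> Optional[str]:
    """Return a supplementary label for an unmatched segment, or None if there is none."""
    # Step 310: preselect blocks at the same position, falling back to all blocks,
    # then pick the block with the highest relevance to the unmatched segment.
    candidates = [b for b in blocks if b.position == position] or blocks
    if not candidates:
        return None
    best = max(candidates, key=lambda b: word_overlap(segment, b.text))
    # Step 320: derive the supplementary label from the block's feature data.
    label = best.knowledge_point
    # Step 330: discard the label if the question already carries it.
    return None if label in existing_labels else label
```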
[0131] The test question classification and annotation device provided in the embodiments of the present invention is described below. The test question classification and annotation device described below can be referred to in correspondence with the test question classification and annotation method described above.
[0132] Figure 4 is a schematic diagram of the structure of the test question classification and annotation device provided in an embodiment of the present invention. As shown in Figure 4, the device 400 includes:
[0133] The acquisition unit 410 is used to acquire the text data of the target test question to be labeled, as well as the attribute data of the target test question;
[0134] The classification unit 420 is used to determine the knowledge points associated with the target test question and assign the target test question to the test question database corresponding to the knowledge points associated with the target test question;
[0135] The determining unit 430 is used to determine the target question knowledge set corresponding to the knowledge points associated with the target question, wherein the target question knowledge set includes multiple anchor blocks;
[0136] The matching unit 440 is used to match the text data of the target test question with each anchor block in the target question knowledge set, and obtain knowledge point tags based on the knowledge points corresponding to each matched anchor block;
[0137] The annotation unit 450 is used to filter the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, and to annotate the target knowledge point tags obtained after filtering as the tags of the target test question.
[0138] The attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, which includes the attribute data of the knowledge points.
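The division of device 400 into five units can be pictured as a pipeline of pluggable functions. The sketch below is only one possible software arrangement: the class name `LabelingDevice`, the field names, and the simplified signatures are assumptions, and the assignment of the question to a test question database by unit 420 is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class LabelingDevice:
    """Device 400 as five units; the behavior of each unit is supplied by the caller."""
    acquire: Callable[[str], tuple[str, dict]]        # unit 410: question id -> (text, attributes)
    classify: Callable[[str], list[str]]              # unit 420: text -> associated knowledge points
    determine: Callable[[list[str]], list[str]]       # unit 430: knowledge points -> anchor blocks
    match: Callable[[str, list[str]], list[str]]      # unit 440: text, anchor blocks -> labels
    annotate: Callable[[list[str], dict], list[str]]  # unit 450: labels, attributes -> kept labels

    def run(self, question_id: str) -> list[str]:
        """Pass one question through units 410 to 450 in order."""
        text, attributes = self.acquire(question_id)
        knowledge_points = self.classify(text)
        anchor_blocks = self.determine(knowledge_points)
        labels = self.match(text, anchor_blocks)
        return self.annotate(labels, attributes)
```

Because each unit is an independent callable, the units may run in one process or be distributed across several machines without changing the order of the steps.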
[0139] Optionally, in some embodiments, determining the knowledge points associated with the target test question includes:
[0140] Extract the anchor point data corresponding to the target test question from the text data of the target test question;
[0141] Based on the anchor point data corresponding to the target test question, the knowledge points associated with the target test question are obtained.
[0142] Optionally, determining the target question knowledge set corresponding to the knowledge points associated with the target question includes:
[0143] Based on the knowledge points associated with the target test questions, questions corresponding to the knowledge points associated with the target test questions are selected from the question knowledge base to obtain the target question knowledge set;
[0144] The question knowledge base contains different types of questions, and the different types of questions test different knowledge points;
[0145] The target question knowledge set contains multiple typical questions corresponding to the knowledge points associated with the target question, and each typical question is composed of several anchor blocks.
[0146] Optionally, matching the text data of the target test question with each anchor block in the target question knowledge set includes:
[0147] The text data of the target test question is divided into several segments, and the text data corresponding to each segment is matched with each anchor block in the knowledge set of the target question.
[0148] Optionally, the step of obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block includes:
[0149] Based on the feature data corresponding to each matched anchor block, the knowledge points corresponding to each matched anchor block are determined. The feature data corresponding to each matched anchor block includes the knowledge points associated with each matched anchor block, and the relevance between the knowledge points associated with each matched anchor block and each matched anchor block.
[0150] The knowledge point labels are obtained based on the relevance between each matched anchor block and the knowledge points associated with it.
[0151] Optionally, the filtering of knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block includes:
[0152] Determine the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block;
[0153] Based on the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, the confidence level of each knowledge point tag is calculated.
[0154] If the confidence level of the knowledge point label is greater than or equal to the preset confidence threshold, the knowledge point label is retained; if the confidence level of the knowledge point label is less than the preset confidence threshold, the knowledge point label is removed.
[0155] Optionally, the test item classification and labeling device further includes:
[0156] The selection unit is used to select the anchor block with the highest relevance to the unmatched segment in the target question from the target question knowledge set;
[0157] The generation unit is used to obtain supplementary labels based on the feature data corresponding to the selected anchor blocks;
[0158] The detection unit is used to detect the supplementary label, and if the knowledge point label contains the supplementary label, discard the supplementary label; if the knowledge point label does not contain the supplementary label, retain the supplementary label, and mark the target test question with the supplementary label.
[0159] It should be noted that the test question classification and labeling device provided in this embodiment of the invention can implement all the method steps implemented in the above-mentioned test question classification and labeling method embodiments, and can achieve the same technical effect. Here, the parts that are the same as those in the method embodiments and the beneficial effects will not be described in detail.
[0160] Figure 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 5, the electronic device may include: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 can call logical instructions in the memory 530 to execute a test question classification and labeling method. This method includes: acquiring text data of a target test question to be classified and labeled, and attribute data of the target test question; determining the knowledge points associated with the target test question, and assigning the target test question to a test question database corresponding to the knowledge points associated with the target test question; determining a target question knowledge set corresponding to the knowledge points associated with the target test question, the target question knowledge set including multiple anchor blocks; matching the text data of the target test question with each anchor block in the target question knowledge set, and obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block; filtering the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, and labeling the filtered target knowledge point tags as tags for the target test question; wherein the attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point database, the knowledge point database including the attribute data of the knowledge points.
[0161] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0162] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the test question classification and annotation method provided in the above-described method embodiments. The method includes: acquiring text data of a target test question to be classified and annotated, and attribute data of the target test question; determining the knowledge points associated with the target test question, and assigning the target test question to the test question database corresponding to the knowledge points associated with the target test question; determining the target question knowledge set corresponding to the knowledge points associated with the target test question, wherein the target question knowledge set includes multiple anchor blocks; matching the text data of the target test question with each anchor block in the target question knowledge set, and obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block; filtering the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, and using the filtered target knowledge point tags as tags for the target test question; wherein the attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, and the knowledge point library includes the attribute data of the knowledge points.
[0163] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the test question classification and annotation method provided in the above-described method embodiments. The method includes: acquiring text data of a target test question to be classified and annotated, and attribute data of the target test question; determining the knowledge points associated with the target test question, and assigning the target test question to a test question database corresponding to the knowledge points associated with the target test question; determining a target question knowledge set corresponding to the knowledge points associated with the target test question, the target question knowledge set including multiple anchor blocks; matching the text data of the target test question with each anchor block in the target question knowledge set, and obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block; filtering the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, and using the filtered target knowledge point tags as tags for the target test question; wherein the attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point database, which includes the attribute data of the knowledge points.
[0164] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0165] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0166] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A test item classification and tagging method, characterized by comprising: obtaining the text data of the target test question to be classified and labeled, as well as the attribute data of the target test question; identifying the knowledge points associated with the target test question, and assigning the target test question to the test question database corresponding to the knowledge points associated with the target test question; determining the target question knowledge set corresponding to the knowledge points associated with the target question, wherein the target question knowledge set includes multiple anchor blocks, and the multiple anchor blocks are obtained by segmenting each question in the target question knowledge set; based on the feature data corresponding to each anchor block in the target question knowledge set, matching the text data of the target question with each anchor block in the target question knowledge set, and obtaining knowledge point tags based on the knowledge points corresponding to each matched anchor block, wherein the feature data corresponding to the anchor block includes: the text data of the anchor block, the location information of the anchor block, the knowledge points associated with the anchor block, and the degree of correlation between the knowledge points associated with the anchor block and the anchor block; and based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, filtering the knowledge point tags, and using the filtered target knowledge point tags as tags for the target test question; wherein the attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, which includes the attribute data of the knowledge points.
2. The test item classification and labeling method according to claim 1, characterized in that, The determination of the knowledge points associated with the target test question includes: Extract the anchor point data corresponding to the target test question from the text data of the target test question; Based on the anchor point data corresponding to the target test question, the knowledge points associated with the target test question are obtained.
3. The test question classification and annotation method according to claim 1, characterized in that, The determination of the target question knowledge set corresponding to the knowledge points associated with the target test question includes: Based on the knowledge points associated with the target test questions, questions corresponding to the knowledge points associated with the target test questions are selected from the question knowledge base to obtain the target question knowledge set; The question knowledge base contains different types of questions, and the different types of questions test different knowledge points; The target question knowledge set contains multiple typical questions corresponding to the knowledge points associated with the target question, and each typical question is composed of several anchor blocks.
4. The test item classification and labeling method according to claim 1, characterized in that, The step of matching the text data of the target test question with each anchor block in the target test question knowledge set includes: The text data of the target test question is divided into several segments, and the text data corresponding to each segment is matched with each anchor block in the knowledge set of the target question.
5. The test item classification and labeling method according to claim 1, characterized in that obtaining the knowledge point tags based on the knowledge points corresponding to each matched anchor block includes: determining, based on the feature data corresponding to each matched anchor block, the knowledge points corresponding to each matched anchor block, wherein the feature data corresponding to each matched anchor block includes the knowledge points associated with each matched anchor block, and the relevance between the knowledge points associated with each matched anchor block and each matched anchor block; and obtaining the knowledge point labels based on the relevance between each matched anchor block and the knowledge points associated with it.
6. The test item classification and labeling method according to claim 1, characterized in that, The filtering of knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block includes: Determine the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block; Based on the correlation between the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, the confidence level of each knowledge point tag is calculated. If the confidence level of the knowledge point label is greater than or equal to the preset confidence threshold, the knowledge point label is retained; if the confidence level of the knowledge point label is less than the preset confidence threshold, the knowledge point label is removed.
7. The test question classification and labeling method according to any one of claims 2-6, characterized in that, After labeling the selected target knowledge point tags as tags for the target test questions, the method further includes: For any unmatched segment in the target question, select the anchor block with the highest relevance to the unmatched segment from the target question knowledge set; Supplementary labels are obtained based on the feature data corresponding to the selected anchor blocks; The supplementary tags are checked. If the knowledge point tag contains the supplementary tag, the supplementary tag is discarded; if the knowledge point tag does not contain the supplementary tag, the supplementary tag is retained, and the supplementary tag is marked on the target test question.
8. A test item classification and tagging apparatus, characterized by comprising: an acquisition unit, used to acquire the text data of the target test question to be labeled, as well as the attribute data of the target test question; a classification unit, used to determine the knowledge points associated with the target test question and assign the target test question to the test question database corresponding to the knowledge points associated with the target test question; a determining unit, used to determine the target question knowledge set corresponding to the knowledge points associated with the target question, wherein the target question knowledge set includes multiple anchor blocks, and the multiple anchor blocks are obtained by segmenting each question in the target question knowledge set; a matching unit, used to match the text data of the target question with each anchor block in the target question knowledge set based on the feature data corresponding to each anchor block in the target question knowledge set, and to obtain knowledge point tags based on the knowledge points corresponding to each matched anchor block, wherein the feature data corresponding to the anchor block includes: the text data of the anchor block, the location information of the anchor block, the knowledge points associated with the anchor block, and the degree of correlation between the knowledge points associated with the anchor block and the anchor block; and an annotation unit, used to filter the knowledge point tags based on the attribute data of the target test question and the attribute data of the knowledge points corresponding to each matched anchor block, and to annotate the target knowledge point tags obtained after filtering as the tags of the target test question; wherein the attribute data of the knowledge points corresponding to each matched anchor block is determined based on the knowledge points corresponding to each matched anchor block and a pre-built knowledge point library, which includes the attribute data of the knowledge points.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the test item classification and labeling method as described in any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that, When the computer program is executed by the processor, it implements the test item classification and labeling method as described in any one of claims 1 to 7.