A method, apparatus, and computer storage medium for classifying questions.
By employing multimodal feature fusion and mathematical logic reasoning, this method addresses the issues of limited feature extraction and insufficient logical understanding in mathematical problem classification. It achieves a deep understanding of mathematical problems and accurate classification across multiple dimensions, thereby improving the accuracy and robustness of the classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- IFLYTEK CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies cannot effectively handle the multimodal features in mathematical problems, resulting in single feature extraction for classification tasks, superficial understanding of the internal logic of the problems, and poor synergy in multi-dimensional label prediction.
A multimodal feature fusion method is adopted to obtain the text, formula and graphic features of mathematical problems, perform logical reasoning using a preset mathematical logic chain, and classify them through multi-task output branches. Combined with a mathematical dictionary and invalid data cleaning rules, a fuzzy sample classifier and rejection mechanism are used to achieve accurate classification in multiple dimensions.
It achieves a deep understanding and multi-dimensional accurate classification of mathematical problems, overcoming the problems of single features and insufficient logical understanding in traditional methods, and improving the accuracy and robustness of classification.
Figure CN122309745A_ABST
Description
Technical Field
[0001] This application relates to the field of natural language processing technology, and in particular to a question classification method, a question classification device, and a computer storage medium.
Background Technology
[0002] In the process of digitalization and intelligentization in education, the automated and refined classification of massive amounts of question resources (such as question banks, test papers, and teaching materials) is the foundation for building intelligent education systems. For mathematics, the classification of questions is not just a simple text classification, but also requires understanding their inherent mathematical logic, formula structure, and possible graphical information.
[0003] Mathematical problems are often multimodal, potentially including text descriptions, mathematical formulas, and geometric figures. General text models cannot effectively process and integrate key features such as formulas and figures, resulting in current mathematical problem classification tasks exhibiting limited feature extraction, superficial understanding of the problem's internal logic, and poor synergy in multi-dimensional label prediction.
Summary of the Invention
[0004] To address the aforementioned technical problems, this application proposes a question classification method, a question classification device, and a computer storage medium.
[0005] To address the aforementioned technical problems, this application proposes a question classification method, which includes: obtaining mathematical problem information; extracting mathematical features from the mathematical problem information, wherein the mathematical features include text features, formula features, and / or graphic features; performing logical reasoning on the mathematical features according to a preset mathematical logic chain; and inputting the logical reasoning result into a multi-task output branch to determine the question classification result of the mathematical problem information.
[0006] Obtaining the mathematical problem information includes: parsing optically recognized (OCR) text, mathematical problem slices, and original mathematical text from the question bank out of the mathematical problem information.
[0007] After obtaining the mathematical problem information, the question classification method further includes: establishing a dedicated mathematical dictionary database; using the mathematical dictionary database to transform the mathematical problem information according to the same source rule; and processing the converted result using a mathematical term weighted similarity algorithm, in which the weights of mathematical terms in the converted result are set higher than those of ordinary words. The mathematical dictionary database includes a mathematical formula mapping table and / or symbol correction rules.
[0008] After obtaining the mathematical problem information, the question classification method further includes: establishing a rule base for classifying invalid mathematical data; and traversing the math-specific rules in that rule base to perform data cleaning on the portion of the mathematical problem information that matches those rules. The math-specific rules include blank data rules, duplicate data rules, and image-text-unrelated data rules.
[0009] After the data cleaning performed by traversing the mathematical invalid data classification rule base, the question classification method further includes: inputting the cleaned mathematical problem information into a fuzzy sample classifier to remove or correct fuzzy information in the mathematical problem information. The fuzzy sample classifier is trained using incomplete mathematical text, which includes: mathematical text with questions but no stems, mathematical text with stems but no questions, mathematical text with stems but no graphic cutouts, and / or mathematical text with graphic cutouts but no stems.
[0010] Each output branch in the multi-task output branch corresponds to at least one mathematical label in the question classification result; the output branches include: question type branch, modality branch, completeness branch, and / or validity branch.
[0011] The step of inputting the logical reasoning result into the multi-task output branch to determine the question classification result of the mathematical problem information includes: obtaining the mathematical labels output by the multi-task output branch; comparing the mathematical labels with preset rejection priority labels; and, when a mathematical label meets the rejection criteria, setting the question classification result of the mathematical problem information to a rejection label together with the corresponding mathematics-specific reason.
[0012] The preset mathematical logic chain is a logical reasoning process consisting of conditions, formulas, and questions in sequence.
[0013] To address the aforementioned technical problems, this application proposes a question classification device, which includes a memory and a processor coupled to the memory; wherein the memory is used to store program data, and the processor is used to execute the program data to implement the question classification method described above.
[0014] To address the aforementioned technical problems, this application proposes a computer storage medium for storing program data, which, when executed by a computer, is used to implement the question classification method described above.
[0015] Compared with existing technologies, the beneficial effects of this application are as follows. By integrating multimodal features such as text, formulas, and graphics, performing deep reasoning with a preset mathematical logic chain, and producing collaborative output through multi-task branches, the method covers the complete process from reading the surface information of a mathematical problem, to grasping its internal logic, to accurate multi-dimensional classification. This overcomes the shortcomings of traditional methods, which rely on single features and lack logical understanding.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:
Figure 1 is a flowchart of the first embodiment of the question classification method provided in this application;
Figure 2 is a flowchart of the second embodiment of the question classification method provided in this application;
Figure 3 is a flowchart of the third embodiment of the question classification method provided in this application;
Figure 4 is a flowchart of the fourth embodiment of the question classification method provided in this application;
Figure 5 is a schematic diagram of the structure of an embodiment of the question classification device provided in this application;
Figure 6 is a schematic diagram of the structure of an embodiment of the computer storage medium provided in this application.
Detailed Implementation
[0017] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0018] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the application described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0019] This application provides a novel problem classification method that deeply integrates multimodal features such as text, formulas, and graphs to simulate and understand the internal logical reasoning process of mathematical problems, and on this basis, collaboratively and accurately outputs multi-dimensional classification labels. To this end, this application proposes a three-tiered multi-label prediction architecture for elementary and middle school math problems, consisting of a "mathematical data-specific governance layer + BERT-QRNN multi-task model layer + lightweight inference engine layer." The core includes a refined governance module for elementary and middle school math problem banks, a mathematics-specific BERT-QRNN multi-task model module, and a lightweight inference engine module for mathematical scenarios. This achieves end-to-end processing from multi-source mathematical data input to multi-label output, and each module can be independently iteratively optimized according to the characteristics of the mathematics discipline.
[0020] Please refer to Figure 1, which is a flowchart of the first embodiment of the question classification method provided in this application.
[0021] The question classification method of this application is applied to a question classification device, which can be a server, a terminal device, or a system in which the server and the terminal device cooperate with each other. Accordingly, the various parts of the question classification device, such as each unit, subunit, module, and submodule, can all be set in the server, all in the terminal device, or separately in the server and the terminal device.
[0022] Furthermore, the aforementioned server can be either hardware or software. When the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers, or as a single server. When the server is software, it can be implemented as multiple software programs or software modules, such as software or software modules used to provide distributed servers, or as a single software program or software module; no specific limitations are made here.
[0023] As shown in Figure 1, the specific steps are as follows: Step S11: Obtain information about the math problem.
[0024] In the embodiments of this application, mathematical problem information refers to the original mathematical problem data to be classified, which may be in the form of plain text strings, rich text containing text and formulas, composite data with text and accompanying images (such as geometric figures), or structured data from a question bank.
[0025] Furthermore, this application parses three kinds of content out of the mathematical problem information: optically recognized text, mathematical problem slices, and original mathematical text from the question bank.
[0026] Optical character recognition (OCR) text refers to text extracted from scanned textbooks and exam papers using OCR technology. This type of text often contains recognition errors, such as misrecognizing a radical sign (√) as the letter "v".
[0027] Math problem slices are image areas that are cropped from documents or images and contain the main content of the problem, especially those containing geometric figures, function graphs, charts, etc.
[0028] The original mathematical text in the question bank is the standard question text directly obtained from the structured question bank. It is usually formatted in a standardized way and may contain standard mathematical formula tags (such as LaTeX).
[0029] The question classification device in this application needs to have a corresponding parser to receive and process input in these three formats. For example, it calls an OCR engine for image files; for composite PDF or Word documents, it identifies and segments the question area and graphic area; and for database records, it directly reads the raw text fields.
[0030] This embodiment makes the method compatible with multi-source, heterogeneous mathematical problem data and defines the standard for the system input interface, so that the system can connect directly to complex real-world data sources, including scanned documents, image materials, and digital question banks, laying the foundation for subsequent unified processing.
[0031] Furthermore, the dedicated governance layer for mathematical data provided in this application primarily cleanses mathematical problem information, including but not limited to: data normalization, data cleaning, and data filtering. The specific process of this governance layer is explained below through specific embodiments. Please refer to Figure 2, which is a flowchart of the second embodiment of the question classification method provided in this application.
[0032] As shown in Figure 2, the specific steps are as follows: Step S21: Determine the dedicated mathematical dictionary database.
[0033] In this embodiment, the mathematical dictionary database constructed by the question classification device is a pre-built knowledge base. The mathematical formula mapping table is used to directly convert common OCR errors, such as converting "v2" to "√2". The symbol correction rules define context-based correction logic; for example, when "1 / 2" is misidentified as "12", correction is performed if the context is a fractional context.
[0034] Step S22: Use the mathematical dictionary database to transform the mathematical problem information according to the same source rule.
[0035] In this embodiment, the question classification device traverses the input question text, such as OCR text, and performs searches and replacements according to the dictionary database to correct obvious formula and symbol errors and convert them into standard form.
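The search-and-replace pass described in steps S21 and S22 can be illustrated with a minimal Python sketch. The application does not disclose the contents of the dictionary database, so the mapping entries, the context pattern, and the function name `normalize` below are illustrative assumptions only.

```python
import re

# Hypothetical formula mapping table: common OCR errors -> standard form.
FORMULA_MAP = {"v2": "√2", "v3": "√3"}

# Hypothetical symbol correction rules: (context pattern, misread token, corrected token).
SYMBOL_RULES = [(re.compile(r"fraction", re.IGNORECASE), "12", "1/2")]


def normalize(text: str) -> str:
    """Apply direct mappings first, then context-dependent corrections."""
    for wrong, right in FORMULA_MAP.items():
        # Replace only stand-alone tokens, so longer tokens such as "v2x" stay untouched.
        pattern = rf"(?<![A-Za-z0-9]){re.escape(wrong)}(?![A-Za-z0-9])"
        text = re.sub(pattern, right, text)
    for context, wrong, right in SYMBOL_RULES:
        if context.search(text):
            # The correction fires only when the context pattern is present.
            text = re.sub(rf"(?<!\d){re.escape(wrong)}(?!\d)", right, text)
    return text


print(normalize("The diagonal of the unit square is v2."))
print(normalize("Write the fraction 12 as a decimal."))
print(normalize("There are 12 apples."))
```

In this sketch the context check is a plain keyword match; a production rule base would likely need richer context detection to avoid over-correcting ordinary numbers.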
[0036] Step S23: Process the converted result using a mathematical term weighted similarity algorithm, and set the weight of mathematical terms in the converted result to be higher than the weight of ordinary words.
[0037] In this embodiment, the question classification device compares the converted question text with standard questions in the question bank for similarity. When calculating similarity, mathematical terms such as "Pythagorean theorem" and "solving equations" are given higher weight than common words like "of" and "and". This ensures that even if the expression styles differ, as long as the core mathematical elements are the same, they can be identified as the same question, and the expression may be standardized in the future.
[0038] For example, the corrected OCR text "The two legs of a triangle are 3 and 4, calculate the hypotenuse" and the original text "Given that the two legs of a right triangle are 3 and 4, find the length of the hypotenuse" achieve a high similarity score even though their function words differ considerably, because the high-weight terms such as "triangle", "leg", "3", "4" and "hypotenuse" match.
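A minimal sketch of a term-weighted similarity follows. The application does not specify the formula, so this example assumes a weighted Jaccard measure over token sets, a hypothetical vocabulary `MATH_TERMS`, and a hypothetical weight of 3.0 for mathematical terms and numbers.

```python
import re

# Hypothetical math vocabulary; in practice it would come from the math dictionary database.
MATH_TERMS = {"triangle", "leg", "legs", "hypotenuse", "theorem", "equation"}


def tokenize(text: str) -> set:
    """Lower-case the text and keep alphabetic words and integers as tokens."""
    return set(re.findall(r"[a-z]+|\d+", text.lower()))


def weight(token: str, math_weight: float) -> float:
    # Numbers and math terms count more than ordinary words (assumed scheme).
    return math_weight if token.isdigit() or token in MATH_TERMS else 1.0


def weighted_similarity(a: str, b: str, math_weight: float = 3.0) -> float:
    """Weighted Jaccard: weight of shared tokens divided by weight of all tokens."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0
    shared = sum(weight(t, math_weight) for t in ta & tb)
    total = sum(weight(t, math_weight) for t in union)
    return shared / total


ocr_text = "The two legs of a triangle are 3 and 4, calculate the hypotenuse"
bank_text = ("Given that the two legs of a right triangle are 3 and 4, "
             "find the length of the hypotenuse")
print(weighted_similarity(ocr_text, bank_text))        # math terms up-weighted
print(weighted_similarity(ocr_text, bank_text, 1.0))   # plain Jaccard for comparison
```

With the up-weighting, the unmatched ordinary words ("given", "find", "calculate") contribute less to the denominator, so the score for the pair above is higher than the plain Jaccard score.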
[0039] This application provides a data standardization process for mathematics, which can effectively improve the consistency and standardization of question data from different channels with varying quality. In particular, it greatly improves the usability of OCR data, providing high-quality and unified input for subsequent feature extraction and classification, thereby indirectly improving classification accuracy.
[0040] Please refer to Figure 3, which is a flowchart of the third embodiment of the question classification method provided in this application.
[0041] As shown in Figure 3, the specific steps are as follows: Step S31: Determine the rule base for classifying invalid mathematical data.
[0042] In this embodiment of the application, the mathematical invalid data classification rule library constructed by the question classification device is a predefined set of rules specifically used to identify invalid data in mathematical questions.
[0043] Step S32: Traverse the mathematical-specific rules in the mathematical invalid data classification rule library and perform data cleaning on the portion of the mathematical problem information that conforms to the mathematical-specific rules.
[0044] In this embodiment, the math-specific rules include, but are not limited to: blank data rules, duplicate data rules, and image-text-unrelated data rules. The question classification device applies the rules in the rule base sequentially for matching and filtering.
[0045] Among them, the blank data rule is: questions with "less than 10 text characters and no valid graphics" are defined as blank data and will be removed.
[0046] Duplicate data rule: Questions with "text similarity greater than 95% and key mathematical parameters, such as numbers and graphic types, being completely identical" are defined as duplicate data and only one copy is retained.
[0047] Image-text-unrelated data rule: A question whose text describes, for example, an algebraic calculation but which is accompanied by a geometric figure that the text never mentions is considered image-text-unrelated data and will be marked or removed.
[0048] Specifically, the question classification device automatically scans the input question information, and once a rule is matched, it performs the corresponding cleaning operation, including but not limited to deletion and marking.
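The three rules can be sketched as sequential filters. The 10-character and 95% thresholds come from the text above; the `Problem` record, the use of `difflib.SequenceMatcher` as the text similarity measure, and the keyword list `FIGURE_WORDS` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical cue words suggesting that the text refers to a figure.
FIGURE_WORDS = ("figure", "diagram", "shown", "triangle", "circle", "rectangle", "graph")


@dataclass
class Problem:
    text: str
    figure_type: Optional[str] = None  # e.g. "triangle"; None means no valid figure


def is_blank(p: Problem) -> bool:
    # Blank data rule: fewer than 10 text characters and no valid figure.
    return len(p.text.strip()) < 10 and p.figure_type is None


def key_params(p: Problem) -> Tuple:
    # Key mathematical parameters: the numbers in the text plus the figure type.
    return tuple(re.findall(r"\d+(?:\.\d+)?", p.text)), p.figure_type


def is_duplicate(p: Problem, q: Problem) -> bool:
    # Duplicate data rule: similarity above 95% and identical key parameters.
    similar = SequenceMatcher(None, p.text, q.text).ratio() > 0.95
    return similar and key_params(p) == key_params(q)


def is_image_text_unrelated(p: Problem) -> bool:
    # Image-text-unrelated rule: a figure is attached but the text never refers to one.
    return p.figure_type is not None and not any(w in p.text.lower() for w in FIGURE_WORDS)


def clean(problems: List[Problem]) -> Tuple[List[Problem], List[Problem]]:
    """Return (kept, flagged): blanks and duplicates are dropped, mismatches are flagged."""
    kept, flagged = [], []
    for p in problems:
        if is_blank(p) or any(is_duplicate(p, k) for k in kept):
            continue
        (flagged if is_image_text_unrelated(p) else kept).append(p)
    return kept, flagged
```

Flagged items are returned separately rather than deleted, which mirrors the "marked or removed" wording and leaves the final decision to a later step or a human reviewer.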
[0049] The embodiments of this application implement automated, rule-based data quality filtering, which can quickly and in batches remove obviously invalid, duplicate, or contradictory question data before model processing, significantly reduce data noise input to the model, improve the efficiency of subsequent processing, and avoid interference from invalid data on model prediction results.
[0050] Step S33: Input the cleaned mathematical problem information into a fuzzy sample classifier to remove or correct fuzzy information in the mathematical problem information.
[0051] In this embodiment of the application, the fuzzy sample classifier is trained using incomplete mathematical text, which includes: mathematical text with questions but no stems, mathematical text with stems but no questions, mathematical text with stems but no graphic cutouts, and mathematical text with graphic cutouts but no stems, etc.
[0052] Specifically, after the rule cleaning steps described above, some "fuzzy" questions remain where the rules are difficult to determine definitively. This step uses a lightweight machine learning classifier, namely a fuzzy sample classifier, for secondary judgment. This classifier is specifically designed to identify samples that are structurally incomplete but not necessarily completely invalid.
[0053] This application requires a large amount of manually labeled, structurally incomplete mathematical text as training data in order to teach the classifier to recognize "fuzzy" samples. This includes, but is not limited to: A question without a stem: "What is the area of this rectangle?"
[0054] A stem without a question: "In an isosceles triangle, the base is 10 and the leg is 13."
[0055] These types of samples help the classifier learn the feature patterns of incompleteness. In application, the classifier can determine which type of incompleteness the current question belongs to and make a decision to "suggest correction" or "reject".
[0056] This application's embodiment solves the blind spot problem of rule cleaning by introducing a dedicated "fuzzy sample classifier," which can handle complex cases that are on the edge of being valid and invalid. It further refines the granularity of data cleaning, improves the purity and quality of the input data of the entire system, and ensures that the questions that ultimately participate in the core classification are all clearly structured and meaningful.
[0057] Step S12: Extract mathematical features from the mathematical problem information, wherein the mathematical features include text features, formula features, and / or graphic features.
[0058] In this embodiment of the application, text feature extraction refers to the use of natural language processing technology to extract semantic features from the question text, such as identifying keywords like "calculate", "prove", and "as shown in the figure".
[0059] Formula feature extraction involves using a specialized mathematical formula encoder to convert mathematical formulas (which may exist as LaTeX, MathML, or OCR-recognized text) into numerical vectors while preserving their mathematical structure information, such as operators and variable relationships.
[0060] Graphic feature extraction involves using computer vision techniques, such as convolutional neural networks (CNNs), to extract visual features such as the shape and structure of the figures, if the problem contains any.
[0061] Step S13: Perform logical reasoning on the mathematical features according to the preset mathematical logic chain.
[0062] In this embodiment, the various features extracted in step S12 are fused and input into a sequence modeling module, such as QRNN, LSTM, or Transformer. This module is trained to capture typical mathematical logic chains in math problems, such as the reasoning process of known conditions → applied formulas / theorems → the final question. Through this step, the model understands the logical connections between the parts of the problem, rather than just the surface features.
[0063] This application's embodiments clarify the core logical patterns that the sequence modeling module (such as QRNN) needs to learn and capture. In mathematical problems, especially problem-solving and application problems, there is usually an implicit reasoning structure:
Conditions: The known information, data, and graphical state provided in the problem.
[0064] Formula: The mathematical knowledge, theorems, formulas, or methods required to solve the problem (may be explicitly stated or implicit).
[0065] Question: The ultimate goal that the problem requires you to solve or prove.
[0066] During training, the model learns to identify which parts of the text constitute conditions, which parts hint at the application of formulas, and where the question is posed, by solving a large number of math problems. During reasoning, the model captures these three elements and their sequential relationships to gain a deeper understanding of the question's intent and solution path, rather than simply performing keyword matching.
[0067] This application's embodiments reveal the key to the model's deep understanding. By explicitly defining the core reasoning chain of mathematical problem-solving, "condition-formula-question", as the model's learning objective, the model is guided to go beyond surface features and model the semantics and logic of mathematical problems. The model's classification decisions are therefore based on an understanding of the essence of the problem, which improves the accuracy and robustness of classification and makes judgments more reliable, especially for complex problems.
[0068] Step S14: Input the logical reasoning result into the multi-task output branch to determine the question classification result of the mathematical question information.
[0069] In this embodiment, the logical reasoning result output in step S13 is simultaneously fed into multiple parallel classifiers, i.e., multi-task output branches. Each branch is responsible for predicting a label in a specific dimension, and the prediction results of all branches together constitute the complete question classification result.
[0070] For example, the four branches might output: Question Type = "Geometric Calculation Problem", Modality = "Multi-modality (Text and Image)", Completeness = "Complete", and Validity = "Valid". Together, these constitute the multi-label classification of the question.
[0071] This step clarifies the specific dimensions of multi-task collaborative prediction. The system contains multiple independent output layers (branches), each of which is a classifier:
Question type branch: predicts the specific question type, such as true / false, fill-in-the-blank, calculation, geometry operation, problem-solving, or multiple choice.
[0072] Modality branch: predicts the modality in which the question is presented, such as single modality (text only) or multiple modalities (text and image combined).
[0073] Completeness branch: predicts whether the question structure is complete, such as complete (the question stem, conditions, and questions are all present) or incomplete (necessary parts are missing).
[0074] Validity branch: predicts whether the content of the question is valid, such as valid (logically consistent) or invalid (contradictory conditions, non-mathematical content).
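The parallel branch structure can be illustrated with a small sketch in plain Python. It assumes each branch is a single linear layer followed by softmax over a shared representation vector; the label lists follow the text, while the toy weights, the helper names, and the vector size are hypothetical and untrained.

```python
import math
import random
from typing import Dict, List, Tuple

BRANCH_LABELS: Dict[str, List[str]] = {
    "question_type": ["true/false", "fill-in-the-blank", "calculation",
                      "geometry operation", "problem-solving", "multiple choice"],
    "modality": ["single-modal", "multimodal"],
    "completeness": ["complete", "incomplete"],
    "validity": ["valid", "invalid"],
}

Head = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(head: Head, x: List[float]) -> List[float]:
    weights, biases = head
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def predict(shared: List[float], heads: Dict[str, Head]) -> Dict[str, Tuple[str, float]]:
    """Feed the same shared vector to every branch; return (label, confidence) per branch."""
    result = {}
    for name, head in heads.items():
        probs = softmax(linear(head, shared))
        best = max(range(len(probs)), key=probs.__getitem__)
        result[name] = (BRANCH_LABELS[name][best], probs[best])
    return result


def make_toy_heads(dim: int, seed: int = 0) -> Dict[str, Head]:
    # Random, untrained weights used only to make the sketch executable.
    rng = random.Random(seed)
    return {name: ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in labels],
                   [0.0] * len(labels))
            for name, labels in BRANCH_LABELS.items()}


print(predict([0.2, -0.4, 0.9], make_toy_heads(dim=3)))
```

Because every branch reads the same shared vector, one forward pass through the upstream layers yields all four labels at once.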
[0075] The embodiments of this application clarify that the classification implemented by the present invention is multi-dimensional and fine-grained. Through a parallel multi-task architecture, it can complete a comprehensive evaluation of multiple key attributes of a math problem in one go, which meets the actual needs of refined management and labeling of problems in educational scenarios. The output information is rich and has high practical value.
[0076] The aforementioned multi-task output branch uses a mathematics-specific multi-task loss function for loss calculation and training optimization. This application fuses the losses of the individual tasks by weighted summation, with the weights set according to the importance of each mathematical question type. For example, the loss function formula is: Loss_total = 1.0 × Loss_Question Type (excluding geometry operation questions) + 1.3 × Loss_Geometry Operation Questions + 1.3 × Loss_Modality + 1.0 × Loss_Completeness + 1.0 × Loss_Validity.
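As a small worked illustration, the weighted summation can be written as follows. The weights are the example values given above; the dictionary keys and the function name `total_loss` are hypothetical names chosen for this sketch.

```python
from typing import Mapping

# Example task weights from the formula above.
LOSS_WEIGHTS = {
    "question_type": 1.0,        # excluding geometry operation questions
    "geometry_operation": 1.3,
    "modality": 1.3,
    "completeness": 1.0,
    "validity": 1.0,
}


def total_loss(losses: Mapping[str, float]) -> float:
    """Weighted sum of the per-task losses."""
    return sum(w * losses[name] for name, w in LOSS_WEIGHTS.items())


print(total_loss({"question_type": 0.40, "geometry_operation": 0.90,
                  "modality": 0.20, "completeness": 0.10, "validity": 0.05}))
```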
[0077] The mathematical sample processing mechanism is as follows: Loss Masking: This method uses binary masks to block the meaningless label loss of invalid mathematical samples (such as garbled text without mathematical logic) so that they do not participate in backpropagation. Such data is marked as "invalid" during labeling, and this is encoded so that it does not participate in loss calculation during model parameter training.
[0078] Handling of controversial samples: For mathematical samples with controversial labels (such as “half-complete geometry problems”), an ignore mechanism (labeled as -1) is adopted, or the loss is calculated by combining soft labels (such as “complete” probability 0.55, “incomplete problem” probability 0.45) with KL divergence (compared with the consensus distribution labeled by math teachers) to reduce the difficulty of model discrimination.
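The masking and controversial-sample handling can be sketched for a single branch as follows. The sample layout (dictionaries with `probs`, `label`, and `valid` fields) and the function names are assumptions for illustration; the ignore value -1 and the soft-label example come from the text above.

```python
import math
from typing import List

IGNORE = -1  # label value for controversial samples that should be skipped


def cross_entropy(probs: List[float], label: int) -> float:
    return -math.log(probs[label])


def kl_divergence(target: List[float], predicted: List[float], eps: float = 1e-12) -> float:
    # KL(target || predicted), where target is the annotators' consensus distribution.
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)


def masked_mean_loss(batch: List[dict]) -> float:
    """Average loss over the samples that survive the mask.

    Each sample has 'probs' (model output), 'valid' (bool), and 'label', which is
    a class index, IGNORE, or a soft-label distribution given as a list.
    """
    total, count = 0.0, 0
    for sample in batch:
        if not sample["valid"] or sample["label"] == IGNORE:
            continue  # mask = 0: the sample contributes nothing to backpropagation
        if isinstance(sample["label"], list):
            total += kl_divergence(sample["label"], sample["probs"])
        else:
            total += cross_entropy(sample["probs"], sample["label"])
        count += 1
    return total / count if count else 0.0


batch = [
    {"probs": [0.8, 0.2], "label": 0, "valid": True},              # ordinary sample
    {"probs": [0.5, 0.5], "label": 1, "valid": False},             # invalid: masked out
    {"probs": [0.5, 0.5], "label": IGNORE, "valid": True},         # controversial: ignored
    {"probs": [0.6, 0.4], "label": [0.55, 0.45], "valid": True},   # soft label with KL
]
print(masked_mean_loss(batch))
```

In a deep learning framework the same effect is usually obtained by multiplying per-sample losses by a 0/1 mask tensor before reduction; the loop here only makes the logic explicit.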
[0079] Training strategy: A two-stage training approach is adopted. The first stage (12 epochs) freezes the BERT-Math layer, trains the QRNN and output layer, and optimizes the single-task loss. The second stage (25 epochs) unfreezes the BERT-Math layer, uses the mathematics-specific multi-task loss function, and enables loss masking and the handling of controversial samples. The optimal model is saved after each epoch of validation (based on the mathematical multi-label F1 score on the test set).
[0080] Furthermore, this application also designs a rejection mechanism to improve the ability to identify valid data in mathematical problem stems, thereby indirectly improving system capabilities, data availability, and the quality of the problem bank data. Please refer to Figure 4, which is a flowchart of the fourth embodiment of the question classification method provided in this application.
[0081] As shown in Figure 4, the specific steps are as follows: Step S41: Obtain the mathematical labels output by the multi-task output branch.
[0082] Step S42: Compare the mathematical tag with the preset rejection priority tag.
[0083] Step S43: When the mathematical tag meets the rejection criteria, set the question classification result of the mathematical question information to a rejection tag and attach the corresponding mathematics-specific reason.
[0084] In the embodiments of this application, the rejection mechanism can be executed after the classification decision or before the classification decision.
[0085] Specifically, taking execution after the classification decision as an example, the question classification device obtains preliminary predicted labels and their confidence levels from each branch. This application pre-defines a rejection priority list, for example: geometry operation questions > multimodal math questions > incomplete math questions > quick calculation questions > invalid math data. This list defines which types of questions require careful handling when the judgment is ambiguous.
[0086] When a question's predicted label belongs to a high-priority category, such as a geometry problem, and its prediction confidence is lower than the high threshold set for that category, or when the logic reasoning module finds a serious conflict between text and graphic features, it meets the rejection criteria. In this case, the question classification device does not output an uncertain classification result, but instead outputs a special rejection label with a mathematics-specific reason, such as: "The question type is predicted as a geometry problem, but the confidence level is too low (65%), and the correlation between the text and graphics is insufficient."
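A minimal sketch of the post-classification rejection check follows. The priority order is taken from the example list above; the per-category thresholds, the label strings, and the function name `check_rejection` are hypothetical, since the application does not disclose concrete values.

```python
from typing import Dict, Optional, Tuple

# Rejection priority, highest first (order taken from the example list in the text).
REJECTION_PRIORITY = ["geometry operation", "multimodal", "incomplete",
                      "quick calculation", "invalid"]

# Hypothetical confidence thresholds; higher-priority categories get stricter ones.
THRESHOLDS = {"geometry operation": 0.90, "multimodal": 0.85, "incomplete": 0.80,
              "quick calculation": 0.75, "invalid": 0.70}


def check_rejection(predictions: Dict[str, Tuple[str, float]],
                    text_figure_conflict: bool = False) -> Optional[str]:
    """Return a rejection reason, or None when the prediction can be accepted.

    `predictions` maps a branch name to its (label, confidence) pair.
    """
    if text_figure_conflict:
        return "Rejected: serious conflict between text and figure features."
    for label in REJECTION_PRIORITY:  # examine high-priority categories first
        for branch, (predicted, confidence) in predictions.items():
            if predicted == label and confidence < THRESHOLDS[label]:
                return (f"Rejected: {branch} predicted as '{label}', "
                        f"but confidence is too low ({confidence:.0%}).")
    return None


print(check_rejection({"question_type": ("geometry operation", 0.65),
                       "modality": ("multimodal", 0.97)}))
```

Returning a reason string rather than a bare flag gives manual reviewers the category and confidence that triggered the rejection.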
[0087] This application's embodiments introduce a rejection mechanism that can identify and proactively reject complex or ambiguous questions that the model struggles to classify reliably. This prevents low-confidence predictions from being misused and contaminating the question bank or affecting instructional analysis. By outputting rejection labels and reasons, clear guidance is provided for manual review, thus ensuring the overall reliability and high quality of the final output while pursuing automation.
[0088] Those skilled in the art will understand that, in the methods of the specific embodiments described above, the order in which the steps are written does not imply a strict execution order and does not limit the implementation process in any way. The specific execution order of the steps should be determined by their functions and possible internal logic.
[0089] To implement the above question classification method, this application also proposes a question classification device. Please refer to Figure 5, which is a schematic diagram of an embodiment of the question classification device provided in this application.
[0090] The question classification device 400 in this embodiment includes a processor 41, a memory 42, an input / output device 43, and a bus 44.
[0091] The processor 41, memory 42, and input / output device 43 are respectively connected to the bus 44. The memory 42 stores program data, and the processor 41 is used to execute the program data to implement the question classification method described in the above embodiment.
[0092] In this embodiment, processor 41 can also be referred to as a CPU (Central Processing Unit). Processor 41 may be an integrated circuit chip with signal processing capabilities. Processor 41 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The general-purpose processor can be a microprocessor, or processor 41 can be any conventional processor.
[0093] This application also provides a computer storage medium. Please refer to Figure 6, which is a schematic diagram of a computer storage medium according to an embodiment of the present application. The computer storage medium 600 stores a computer program 61, which, when executed by a processor, is used to implement the question classification method of the above embodiment.
[0094] When the embodiments of this application are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0095] The above description is merely an embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
Claims
1. A method for classifying questions, characterized in that the question classification method includes: obtaining mathematical question information; extracting mathematical features from the mathematical question information, wherein the mathematical features include text features, formula features, and / or graphic features; performing logical reasoning on the mathematical features according to a preset mathematical logic chain; and inputting the logical reasoning results into the multi-task output branch to determine the question classification result of the mathematical question information.
2. The question classification method according to claim 1, characterized in that obtaining the mathematical question information includes: extracting the optical recognition text, the mathematical problem slices, and the original mathematical text from the question bank from the mathematical question information.
3. The question classification method according to claim 2, characterized in that, after obtaining the mathematical question information, the question classification method further includes: establishing a dedicated mathematical dictionary database; using the mathematical dictionary database to transform the mathematical question information according to the same source rule; and processing the converted results using a mathematical term weighted similarity algorithm, wherein the weights of mathematical terms in the converted results are set higher than those of ordinary words; wherein the mathematical dictionary database includes a mathematical formula mapping table and / or symbol correction rules.
4. The question classification method according to claim 1, characterized in that, after obtaining the mathematical question information, the question classification method further includes: establishing a mathematical invalid data classification rule base; and traversing the mathematical invalid data classification rule base to perform data cleaning on the portion of the mathematical question information that conforms to the mathematics-specific rules; wherein the mathematics-specific rules include blank data rules, duplicate data rules, and text-unrelated data rules.
5. The question classification method according to claim 4, characterized in that, after traversing the mathematical invalid data classification rule base to perform data cleaning on the portion of the mathematical question information that conforms to the mathematics-specific rules, the question classification method further includes: inputting the cleaned mathematical question information into a fuzzy sample classifier to remove or correct fuzzy information in the mathematical question information; wherein the fuzzy sample classifier is trained using incomplete mathematical text, which includes: mathematical text with questions but no stems, mathematical text with stems but no questions, mathematical text with stems but no graphic cutouts, and / or mathematical text with graphic cutouts but no stems.
6. The question classification method according to claim 1, characterized in that each output branch in the multi-task output branch outputs at least one mathematical label of the question classification result; the output branches include: a question type branch, a modality branch, a completeness branch, and / or a validity branch.
7. The question classification method according to claim 1, characterized in that the step of inputting the logical reasoning results into the multi-task output branch to determine the question classification result of the mathematical question information includes: obtaining the mathematical labels output by the multi-task output branch; comparing the mathematical labels with preset rejection priority labels; and, when a mathematical label meets the rejection criteria, setting the question classification result of the mathematical question information to a rejection label together with the corresponding mathematics-specific reason.
8. The question classification method according to claim 1, characterized in that the preset mathematical logic chain is a logical reasoning process consisting of conditions, formulas, and questions in sequence.
9. A question classification device, characterized in that the question classification device includes a memory and a processor coupled to the memory; the memory is used to store program data, and the processor is used to execute the program data to implement the question classification method as described in any one of claims 1 to 8.
10. A computer storage medium, characterized in that the computer storage medium is used to store program data, which, when executed by a computer, is used to implement the question classification method as described in any one of claims 1 to 8.