Geometric test question generation and evaluation method and system for personalized teaching

By defining a formal description language and a symbolic reasoning engine, a structured geometric scenario is established, generating solvable and graphically consistent personalized geometric questions. This solves the problems of resource mismatch and lack of diagnostic evaluation in existing technologies, and achieves efficient and accurate personalized teaching support.

CN122242468A · Pending · Publication Date: 2026-06-19 · XI AN JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XI AN JIAOTONG UNIV
Filing Date
2026-03-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing geometry question banks cannot adapt to personalized teaching needs in real time. Manual question generation is inefficient and error-prone. Geometry questions generated by large language models are often logically unsolvable or describe figures that cannot be drawn. Traditional assessment methods lack diagnostic capability.

Method used

Define a formal description language, establish a structured geometric scene, use a symbolic reasoning engine for logic verification and option generation, render text and graphics from the same source and perform diagnostic evaluation, and generate geometric questions of varying difficulty by controlling the recursion depth.

Benefits of technology

It enables the automated generation and accurate assessment of personalized geometry questions, ensuring that the questions are solvable and that the text and graphics are consistent. It can accurately diagnose students' knowledge gaps, supports multiple answer verification and difficulty stratification, and improves teaching efficiency and assessment accuracy.


Smart Images

  • Figure CN122242468A_ABST

Abstract

A method and system for generating and evaluating geometry test questions for personalized teaching is disclosed. The method includes defining a formal description language and using it to establish structured geometric scenarios; performing logical reasoning-based verification and option generation on the established structured geometric scenarios; and using the verified structured geometric scenarios and generated options for image-text homology rendering and diagnostic evaluation. This invention constructs a mathematical closed loop of "formal construction" and "symbolic reasoning" based on symbolic logic, resolving the challenges of personalized teaching. Traditional manual question generation relies on static question banks and cannot generate a large number of "variant questions" in real time to meet students' personalized training needs. This invention can automatically and on a large scale generate verifiable, drawable, and solvable geometry test questions according to difficulty and knowledge point requirements, and can provide accurate diagnostic evaluation. Simultaneously, the method of this invention accurately eliminates "guessing" by employing a multiple-answer mechanism and strict matching indicators, enhancing the diagnostic value of the evaluation.

Description

Technical Field

[0001] This invention belongs to the field of computer technology, specifically relating to a method and system for generating and evaluating geometry test questions for personalized teaching.

Background Technology

[0002] Geometric proofs and reasoning are a core component of mathematics education, requiring problem solvers to possess rigorous logical thinking skills. Problems typically rely on both verbal conditions and geometric figures, demanding multi-step reasoning from the solver.

[0003] Existing geometry problem resources and evaluation methods mainly rely on datasets manually compiled from textbooks and competitions and on small-scale template-based generation, such as Geometry3K [1], GeoQA [2], UniGeo [3], and PGPS9K [4]; evaluation sets emphasizing multimodal understanding have also emerged, such as MATH [5], GeoEval [6], and MathVerse [7].

[0004] In the current educational landscape, personalized learning has become an industry trend. However, traditional classroom teaching typically uses a fixed set of exercises for all students, making it difficult to provide targeted training for an individual student's weak areas (e.g., identifying specific geometric structures or applying specific theorems). Existing technologies therefore have the following significant shortcomings in supporting personalized geometry instruction:

1. Existing question bank resources are static and cannot adapt in real time: most existing online homework platforms are built on static databases and can only retrieve questions that have already been entered. When a series of variant questions must be generated for intensive training on a specific student error (e.g., "always confusing the incenter and the circumcenter"), static question banks fall short and cannot meet the need for massive, fine-grained, targeted practice.

[0005] 2. Manual problem creation is inefficient and difficult to scale: Relying on manual creation of high-quality geometry proofs is not only extremely time-consuming, but also prone to errors such as "inconsistencies between text and diagrams" or "redundant / missing conditions." Manual methods cannot meet the real-time interactive needs of millions of users in online education applications.

[0006] 3. Existing generative models suffer from "hallucination" and unsolvability: although large language models (LLMs) have some text generation capability, when dealing with geometric problems they often generate problems that are "textually plausible but cannot be drawn" or "logically unsolvable" (i.e., hallucinated problems). They lack rigorous mathematical verification mechanisms and cannot be directly applied to serious educational products.

[0007] 4. The assessment methods are limited and lack diagnostic value: traditional multiple-choice questions are easily affected by guessing and the elimination method and cannot truly reflect a student's level of mastery, while open-ended proof questions are difficult for machines to grade automatically.

[0008] References
[1] Lu, P., Gong, R., Jiang, S., Qiu, L., Huang, S., Liang, X., & Zhu, S. C. (2021). Inter-gps: Interpretable geometry problem solving with formal language and symbolic reasoning. arXiv preprint arXiv:2105.04165.
[2] Chen, J., Tang, J., Qin, J., Liang,
[3] Chen, J., Li, T., Qin, J., Lu, P., Lin, L., Chen, C., & Liang, X. (2022). Unigeo: Unifying geometry logical reasoning via reformulating mathematical expression. arXiv preprint arXiv:2212.02746.
[4] Zhang, M. L., Yin, F., & Liu, C. L. (2023). A multi-modal neural geometric solver with textual clauses parsed from diagram. arXiv preprint arXiv:2302.11097.
[5] Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., ... & Steinhardt, J. (2021). Measuring mathematical problem solving with the math dataset. arXiv preprint arXiv:2103.03874.
[6] Zhang, J., Li, Z. Z., Zhang, M. L., Yin, F., Liu, C. L., & Moshfeghi, Y. (2024, August). Geoeval: benchmark for evaluating llms and multi-modal models on geometry problem-solving. In Findings of the Association for Computational Linguistics: ACL 2024 (pp. 1258-1276).
[7] Zhang, R., Jiang, D., Zhang, Y., Lin, H., Guo, Z., Qiu, P., ... & Li, H. (2024, September). Mathverse: Does your multi-modal llm truly see the diagrams in visual math problems?. In European Conference on Computer Vision (pp. 169-186). Cham: Springer Nature Switzerland.

Summary of the Invention

The purpose of this invention is to address the problems in the prior art by providing a method and system for generating and evaluating geometry test questions for personalized teaching. The method and system can automatically generate, on a large scale, verifiable, drawable, and solvable geometry test questions according to difficulty and knowledge point requirements, and can provide accurate diagnostic assessments.

[0009] To achieve the above objectives, the present invention provides the following technical solutions. In a first aspect, a method for generating and evaluating geometry test questions for personalized teaching is provided, including: defining a formal description language and using it to establish a structured geometric scene; performing logical reasoning-based verification and option generation on the established structured geometric scene; and using the verified structured geometric scene and the generated options for image-text homology rendering and diagnostic evaluation.

[0010] As a preferred embodiment, the formal description language is defined as follows:
Clause: describes the geometric relationship between defined points and geometric objects, in the form f(X1, X2, ..., Xn), where f() is a predefined relation and Xi (i = 1, 2, ..., n) are existing points.
Construction: an operation that introduces a new point x through geometric rules, uniquely determining x by one or two clauses; when there are two clauses, their intersection is taken as x; when a single clause uniquely determines the point, the second clause is omitted.
Premise: a series of constructions arranged in sequence, used to introduce points and relationships in order and form a complete geometric configuration.
Problem: a premise together with candidate options, where each option is a conclusion; multiple correct answers are allowed.

[0011] As a preferred embodiment, the structured geometric scene is established as follows: starting from a basic seed, set the maximum recursion depth N; at each level, randomly sample templates from a predefined clause template library, which contains a variety of basic geometric structures, and instantiate them with existing points as parameters to generate geometric objects; perform geometric constraint verification on the newly generated geometric objects and, if the verification passes, add them to the premise sequence; based on the premise sequence and the maximum recursion depth N, apply annealed branch control, gradually reducing the number of branches as the depth increases, to form a candidate premise pool.

[0012] As a preferred embodiment, in the step of performing logical reasoning-based verification and option generation on the established structured geometric scene, a rule-based symbolic reasoning engine is invoked to perform forward chaining: match the currently known geometric relations against the axiom and theorem library R; handle numerical relationships, primarily those based on angles; and iterate the forward chain, repeatedly generating new geometric facts, until no new conclusions can be derived or the depth limit is reached.

[0013] As a preferred embodiment, in the step of performing logical reasoning-based verification and option generation on the established structured geometric scene, option generation produces test questions for assessment as follows. Correct options are filtered by calculating the prior difficulty of each provable conclusion; conclusions with a difficulty score greater than a predetermined value are selected as correct options, so that the question has discriminating power. To probe error-prone points, distractors are constructed by generating incorrect options; the construction methods include relation inversion, numerical perturbation, and equivalence traps. Relation inversion replaces parallel with non-parallel, equal with unequal, or parallel with perpendicular. Numerical perturbation modifies ratio or angle values based on common erroneous calculation logic. Equivalence traps generate conclusions that appear correct but do not hold under certain conditions. Correct options may additionally be rewritten into synonymous forms. A question is allowed to contain multiple correct options for multiple-answer verification: the generated candidate options are fed back into the inference engine for re-verification, the options judged true by the inference engine are marked as the set of correct answers S, and the rest are incorrect options.

[0014] As a preferred embodiment, in the step of using the verified structured geometric scene and the generated options for image-text homology rendering and diagnostic evaluation, the image-text homology rendering is performed as follows. For text generation, the formal description language is translated into natural language using rule templates, which are bilingual (Chinese and English). For graphics rendering, the graphics engine is called to read the same set of formalized coordinate data and draw vector graphics; the placement of the text labels is calculated, and the specific lines or angles involved in the question are marked.

[0015] As a preferred embodiment, in the step of using the verified structured geometric scene and the generated options for image-text homology rendering and diagnostic evaluation, the diagnostic evaluation includes: obtaining the set of submitted answers, and evaluating it with Exact Match, option-level metrics, and difficulty stratification metrics so as to identify knowledge gaps from the evaluation results. Under the option-level metrics, a "missed selection" indicates an incomplete reasoning chain, and a "wrong selection" indicates conceptual confusion. Under the difficulty stratification metrics, the accuracy rate is calculated separately for the Easy, Medium, and Hard levels of questions.

[0016] In a second aspect, a geometry test question generation and evaluation system for personalized teaching is provided, including: a structured geometric scene creation module, used to define a formal description language and use it to establish a structured geometric scene; a logical reasoning verification and option generation module, used to perform logical reasoning-based verification and option generation on the established structured geometric scene; and an image-text homology rendering and diagnostic evaluation module, used to perform image-text homology rendering and diagnostic evaluation using the verified structured geometric scene and the generated options.

[0017] In a third aspect, a computer device is provided, comprising a processor and a computer-readable storage medium. The processor is used to execute computer programs. The computer-readable storage medium stores a computer program that, when executed by the processor, implements the method for generating and evaluating geometry test questions for personalized teaching described in the first aspect.

[0018] In a fourth aspect, a computer-readable storage medium is provided, storing a computer program adapted to be loaded by a processor to execute the method for generating and evaluating geometry test questions for personalized teaching described in the first aspect.

[0019] Compared with the prior art, the present invention has at least the following beneficial effects:

The method defines a formal description language, uses it to build structured geometric scenes, and performs logical reasoning-based verification and option generation on these scenes, constructing a mathematical closed loop of "formal construction" and "symbolic reasoning" based on symbolic logic. On the one hand, the symbolic reasoning engine rigorously derives and verifies each option, so that only mathematically proven options are marked correct while incorrect options serve as distractors. This eliminates the need for secondary manual review before questions go online, improving the efficiency and accuracy of test question generation. On the other hand, by controlling the recursive search depth parameter N, questions of varying difficulty (Easy / Medium / Hard) can be generated naturally to meet the diverse difficulty requirements of personalized teaching, and the computation is theoretically feasible.

This invention resolves the challenges of personalized teaching and addresses the insufficient scalability and accuracy of existing intelligent tutoring systems. Traditional manual question generation relies on static question banks, which cannot generate a large number of "variant questions" in real time to meet students' personalized training needs. This invention starts from the same knowledge point seed and varies the geometric construction parameters to generate a massive number of "variant questions" in real time. Meanwhile, a single formal language drives both text generation and graphics rendering, eliminating ambiguity between text and graphics and reducing the operational costs of large-scale applications.

Experiments show that the question bank generated by this invention is of high quality. Compared with conventional datasets, the accuracy of large models (GPT-4o, Gemini 1.5 Pro, etc.) drops significantly on the geometric dataset generated by this invention. This indicates that the questions contain complex logic and long-chain reasoning that go beyond the "shallow reasoning" of general models, so the dataset can serve as a high-discrimination assessment tool.

This invention uses the verified structured geometric scenes and the generated options for image-text homology rendering and diagnostic evaluation. Image-text homology rendering ensures strict consistency between text conditions and visual graphics, and diagnostic evaluation accurately locates knowledge blind spots and can generate capability radar charts.

[0020] Furthermore, the method of this invention suppresses "guessing" by employing a multiple-answer mechanism and a strict (exact) matching indicator, enhancing the diagnostic value of the assessment. Experiments show that when a model answers this question bank, there is a large gap between its option-level F1 score and its strict matching score, which reflects scoring by guessing. The method of this invention can distinguish students who have truly mastered the knowledge from those who answer by guessing, providing accurate input to the recommendation algorithms of intelligent tutoring systems, which is superior to traditional multiple-choice questions.

[0021] Furthermore, the method of this invention achieves difficulty stratification by controlling the construction depth N parameter. The difficulty stratification is effective. Experiments show that the strict matching scores of the general model differ significantly on different difficulty subsets, while the human baseline remains stable. This proves that the method of this invention can stably generate test questions that conform to the preset difficulty gradient, meeting the "gradual" teaching requirements.

[0022] Furthermore, the method of this invention allows questions to contain multiple correct options for multi-answer verification. The multi-answer evaluation protocol can not only score but also output diagnostic information, locate error patterns in teaching assessment, and provide a basis for teaching improvement.

Description of the Drawings

[0023] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0024]
Figure 1: A schematic diagram of a predefined clause template for generating geometric problems according to an embodiment of the present invention;
Figure 2: Overall flowchart of the three-stage question generation process of this invention;
Figure 3: A schematic diagram of the rule template for generating English question text in an embodiment of the present invention;
Figure 4: A schematic diagram of the rule template for generating Chinese question text in an embodiment of the present invention;
Figure 5: A schematic diagram of the richness of geometric elements covered by the geometric problems generated in the embodiments of the present invention;
Figure 6: A schematic diagram of the percentage of error types for different large models according to embodiments of the present invention;
Figure 7: A schematic diagram of an example of a simple geometric problem in an embodiment of the present invention;
Figure 8: A schematic diagram of an example of a medium-difficulty geometric problem in an embodiment of the present invention;
Figure 9: A schematic diagram of an example of a difficult geometric problem in an embodiment of the present invention.

Detailed Implementation

[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, those skilled in the art can obtain other embodiments without creative effort.

[0026] This invention proposes a method for generating and evaluating geometry test questions for personalized teaching, aiming to solve problems in existing technologies such as the inability of traditional manual question generation to meet personalized needs, the lack of logical correctness guarantees in generative models, and the lack of diagnostic accuracy in traditional evaluation. The method of this invention mainly includes the following steps:
S1. Define a formal description language and use it to build a structured geometric scene;
S2. Perform logical reasoning-based verification and option generation on the established structured geometric scene;
S3. Use the verified structured geometric scene and the generated options to perform image-text homology rendering and diagnostic evaluation.

[0027] Step S1, the geometric scene construction (generation stage) based on the formal system, involves establishing a formal description language that includes geometric objects such as points, lines, and circles, and their interrelationships. Parametric sampling is performed using a predefined clause template library (covering relationships such as perpendicularity, parallelism, and tangency), and a breadth-first search algorithm with an annealing mechanism is used to construct a sequence of geometric premises. This step, by controlling the construction depth and template type, directly determines the basic difficulty of the problem and the knowledge points tested.

[0028] Step S2, based on symbolic reasoning, involves logical verification and option generation (verification phase): A symbolic reasoning engine is used to perform forward chain derivation on the generated geometric premises, enumerating all mathematically valid conclusions. The "prior difficulty" is calculated based on the derivation path length and the number of elements involved, selecting suitable correct options. Simultaneously, highly persuasive incorrect options are generated through algorithms such as equivalent rewriting, relational inversion, and numerical perturbation. This process ensures the "absolute solvability" of each question, eliminating dead or incorrect questions from the outset.

[0029] Step S3: Image and Text Homologous Rendering and Diagnostic Assessment (Output and Application Stage): The validated formal description is simultaneously mapped to natural language question stems (supporting multiple languages) and geometric figures, ensuring strict consistency between textual conditions and visual graphics. In the assessment stage, a multi-answer mechanism and an exact match metric are employed, combined with option-level recall analysis, to eliminate the possibility of students guessing scores, thereby accurately identifying students' knowledge gaps.

[0030] Referring to Figure 1 and Figure 2, the method of the present invention is described in detail below, taking the construction of a complete intelligent test question generation pipeline as an example. The pipeline can be embedded as a backend service into online education platforms or tutoring apps.

[0031] To enable computers to understand and manipulate geometric objects, the formal description language is defined as follows: Clause: describes the geometric relationship between defined points and geometric objects, in the form f(X1, X2, ..., Xn), where f() is a predefined relation and Xi (i = 1, 2, ..., n) are existing points; for example, triangle(a,b,c), midpoint(a,b), etc.

[0032] Construction: An operation that introduces a new point through geometric rules, used to uniquely determine the new point x by one or two clauses; when there are two clauses, the intersection is taken as x; when only one clause uniquely determines the point, the second clause is omitted; for example, x = on_line(a,b), on_circle(o,c), etc.

[0033] Premise: a series of constructions arranged in sequence, used to introduce points and relationships in order and form a complete geometric configuration.
Problem: a premise together with candidate options. Each option takes the form of a conclusion (e.g., parallel, perpendicular, equal, concyclic, proportional, angle, etc.). A problem allows multiple answers; that is, more than one of the four options can be true.
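The four notions defined above map naturally onto simple record types. The following is a minimal sketch, assuming Python dataclasses; the class and field names (`Clause`, `Construction`, `Premise`, `Problem`, `new_points`, `correct`) are illustrative choices and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Clause:
    """A predefined relation f applied to existing points, e.g. midpoint(a, b)."""
    relation: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.relation}({', '.join(self.args)})"


@dataclass(frozen=True)
class Construction:
    """Introduces new point(s) by one clause, or by the intersection of two."""
    new_points: Tuple[str, ...]
    first: Clause
    second: Optional[Clause] = None  # omitted when one clause is sufficient

    def __str__(self) -> str:
        clauses = [self.first]
        if self.second is not None:
            clauses.append(self.second)
        return f"{' '.join(self.new_points)} = {', '.join(map(str, clauses))}"


@dataclass
class Premise:
    """An ordered sequence of constructions forming one configuration."""
    constructions: List[Construction] = field(default_factory=list)

    def points(self) -> List[str]:
        return [p for c in self.constructions for p in c.new_points]


@dataclass
class Problem:
    """A premise plus candidate options; several options may be correct."""
    premise: Premise
    options: List[Clause]
    correct: FrozenSet[int]  # indices of the options that are true


premise = Premise([
    Construction(("a", "b", "c"), Clause("triangle", ("a", "b", "c"))),
    Construction(("x",), Clause("on_line", ("a", "b")),
                 Clause("on_circle", ("c", "a"))),
])
print(premise.constructions[1])
```

Keeping `Clause` and `Construction` immutable makes them usable as set members, which is convenient when a reasoning engine later needs to deduplicate facts.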

[0034] The predefined clause template for generating geometric problems in this embodiment of the invention is as follows:

[0035] In one possible implementation, a recursive expansion strategy is used to establish the structured geometric scene as follows. Starting from a basic seed (such as two points), set the maximum recursion depth N (for example, N∈[3,6] for junior high school level difficulty and N∈[5,8] for senior high school level difficulty). At each level, 1-2 templates are randomly sampled from the predefined clause template library and instantiated with existing points as parameters to generate geometric objects; the predefined clause template library in this embodiment contains dozens of basic geometric structures such as triangles, circles, parallelograms, and angle bisectors. Consistency check: geometric constraint verification is performed on the newly generated geometric objects (e.g., checking for degenerate triangles or overlapping points); if the verification passes, they are added to the premise sequence. To balance the diversity of the generated data against computational efficiency, annealed branch control is applied based on the premise sequence and the maximum recursion depth N: more combination branches are explored in the early stage, and the number of branches is gradually reduced as the depth increases, eventually forming a large-scale candidate premise pool.
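The expansion strategy described above can be sketched as a level-by-level search whose branching factor shrinks with depth. The code below is a simplified illustration under stated assumptions: the three templates (`midpoint`, `reflect`, `rotate90`), the decay schedule in `branches_at`, and the overlap-only consistency check are all hypothetical stand-ins for the embodiment's template library and constraint verification, and one template is sampled per branch.

```python
import random


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def reflect(p, q):
    """Reflection of p about q."""
    return (2 * q[0] - p[0], 2 * q[1] - p[1])


def rotate90(p, q):
    """p rotated by 90 degrees counter-clockwise about q."""
    return (q[0] - (p[1] - q[1]), q[1] + (p[0] - q[0]))


# Hypothetical miniature template library (the embodiment has dozens).
TEMPLATES = {"midpoint": midpoint, "reflect": reflect, "rotate90": rotate90}


def branches_at(depth, initial=4, decay=0.5):
    """Annealed branching: many branches early, fewer as depth grows."""
    return max(1, round(initial * decay ** depth))


def consistent(new_pt, points, eps=1e-9):
    """Reject a new point that coincides with an existing one."""
    return all(abs(new_pt[0] - p[0]) > eps or abs(new_pt[1] - p[1]) > eps
               for p in points.values())


def expand(seed, max_depth, rng):
    """Return a pool of premise sequences grown from the seed points."""
    frontier = [(dict(seed), [])]  # (coordinates, premise sequence)
    for depth in range(max_depth):
        next_frontier = []
        for points, premise in frontier:
            for _ in range(branches_at(depth)):
                name = rng.choice(sorted(TEMPLATES))
                a, b = rng.sample(sorted(points), 2)
                new_pt = TEMPLATES[name](points[a], points[b])
                if not consistent(new_pt, points):
                    continue  # failed the consistency check
                label = f"p{len(points)}"
                next_frontier.append(
                    ({**points, label: new_pt},
                     premise + [f"{label} = {name}({a}, {b})"]))
        frontier = next_frontier or frontier  # keep old level if all failed
    return [premise for _, premise in frontier]


pool = expand({"a": (0.0, 0.0), "b": (1.0, 0.0)}, max_depth=3,
              rng=random.Random(0))
print(len(pool), pool[0])
```

With the assumed schedule the branching factor is 4, 2, 1 over three levels, so the pool holds at most eight premise sequences; a production system would tune the schedule to trade diversity against cost.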

[0036] For example, in the example of Figure 1, the generated premise is: abcd = isquare abcd; efgh = ninepoints efghdac; i = on_circum ieah, on_dia ifh; jklm = centroid jklmgcb. The corresponding Chinese expression (in translation) is: In square ABCD, construct the nine-point circle ⊙H of △DAC, which intersects AC, DC, and DA at E, F, and G. Point I lies on the circle determined by points E, A, and H, and FI⊥HI. Point M is the centroid of △GCB, and J and K are the midpoints of CB and GB, respectively.

[0037] In one possible implementation, step S2 invokes a rule-based symbolic reasoning engine on the established structured geometric scene to perform forward chaining, including the following steps: Match the currently known geometric relations with the axioms and theorem library R (such as the criteria for congruent triangles and the inscribed angle theorem); Handling numerical relationships primarily based on angles (such as equal angles, supplementary angles, or specific values, etc.); A forward chain iteration is performed, repeatedly generating new geometric facts, until no new conclusions can be derived or the depth limit is reached. This step not only verifies the validity of the graph but also outputs the set of all mathematically valid conclusions under that graph, forming a candidate pool of correct answers.
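As an illustration of the forward-chaining loop described in this paragraph, the following minimal sketch saturates a fact set under a tiny rule library. It is not the embodiment's engine: the fact encoding (`("para", l1, l2)`, `("perp", l1, l2)`), the three rules for coplanar lines, and the `max_rounds` limit are assumptions chosen to keep the example short.

```python
from itertools import product


def rules(facts):
    """Yield facts derivable in one step from the known facts.

    Facts are tuples (kind, line1, line2) with kind 'para' or 'perp'.
    """
    for kind, x, y in facts:
        yield (kind, y, x)  # both relations are symmetric
    for (k1, a, b), (k2, c, d) in product(facts, repeat=2):
        if b != c or a == d:
            continue  # need a shared middle line and distinct end lines
        if k1 == "para" and k2 == "para":
            yield ("para", a, d)  # transitivity of parallelism
        elif k1 == "perp" and k2 == "perp":
            yield ("para", a, d)  # both perpendicular to the same line
        else:
            yield ("perp", a, d)  # one parallel step, one perpendicular


def forward_chain(premises, max_rounds=10):
    """Apply the rules until a fixed point or the depth limit is reached."""
    known = set(premises)
    for _ in range(max_rounds):
        new = set(rules(known)) - known
        if not new:
            break  # no new conclusions can be derived
        known |= new
    return known


closure = forward_chain({
    ("para", "AB", "CD"),
    ("perp", "CD", "EF"),
    ("perp", "EF", "GH"),
})
print(("para", "AB", "GH") in closure)
```

In this example the conclusion AB ∥ GH needs two chained steps, so it only appears in the second round; the number of rounds at which a fact first appears is one natural proxy for proof length.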

[0038] The rules for deriving new conclusions using the symbolic reasoning engine in this embodiment of the invention are as follows:

[0039] The code for the forward chaining algorithm in this embodiment of the invention is as follows:

[0040] In one possible implementation, during the option generation process of step S2, the following operations are performed to generate test questions for assessment. Correct option selection: the prior difficulty of each provable conclusion is calculated in order to select the correct options. The formula is D = Σ_i (w_i · x_i), where each x_i is a factor such as the length of the proof, the number of lemmas required, or the number of geometric elements involved, and w_i is the weight of that factor. Conclusions with a difficulty score greater than a predetermined value are selected as correct options to ensure that the questions have discriminating power.
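The weighted sum above is straightforward to compute. The sketch below assumes three features and illustrative weight values; neither the feature names nor the numbers come from the embodiment, which does not disclose them.

```python
# Assumed weights; the embodiment does not state concrete values.
WEIGHTS = {"proof_steps": 0.5, "lemmas": 0.3, "elements": 0.2}


def prior_difficulty(features, weights=WEIGHTS):
    """D = sum_i w_i * x_i over the named difficulty factors."""
    return sum(weights[name] * features[name] for name in weights)


def select_correct(candidates, threshold):
    """Keep the provable conclusions whose difficulty exceeds the threshold."""
    return [conclusion for conclusion, feats in candidates.items()
            if prior_difficulty(feats) > threshold]


candidates = {
    "AB ∥ GH": {"proof_steps": 2, "lemmas": 1, "elements": 4},
    "M is the midpoint of AB": {"proof_steps": 1, "lemmas": 0, "elements": 3},
}
print(select_correct(candidates, threshold=2.0))
```

Here the first conclusion scores about 2.1 and the second about 1.1, so only the first passes a threshold of 2.0.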

[0041] To probe common pitfalls, incorrect options are generated to serve as distractors. The construction methods include relation inversion, numerical perturbation, and equivalence traps. Relation inversion replaces parallel with non-parallel, equal with unequal, or parallel with perpendicular. Numerical perturbation modifies ratios or angles based on common erroneous calculation logic (such as forgetting to multiply by a coefficient), e.g., "AB:CD=1:2→AB:CD=1:4", "∠ABC=π/4→∠ABC=π/8". Equivalence traps generate conclusions that appear correct but are invalid under certain conditions. Correct options may also be paraphrased, such as "midpoint → line segments are equal or the ratio is 1" or "point on circle → line segment length equals radius". Multiple-answer verification: a question is allowed to contain multiple correct options. The generated candidate options are fed back into the inference engine for re-verification. Options judged true by the inference engine are marked as the set of correct answers S (where |S|≥1 is allowed), and the rest are incorrect options. This step ensures the rigor of the standard answer.
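Two of the distractor strategies and the final re-verification step can be sketched as follows. This is an illustration only: the tuple encoding of facts, the `INVERSIONS` table, and the use of set membership in `proven` as a stand-in for the inference engine's verdict are assumptions made for the example.

```python
from fractions import Fraction

# Assumed inversion table: each relation maps to its plausible wrong variants.
INVERSIONS = {"para": ["not_para", "perp"], "eq": ["neq"]}


def invert_relation(fact):
    """Relation inversion, e.g. parallel -> non-parallel or perpendicular."""
    kind, *args = fact
    return [(alt, *args) for alt in INVERSIONS.get(kind, [])]


def perturb_ratio(fact, factor=2):
    """Numerical perturbation of ('ratio', x, y, r), e.g. 1:2 -> 1:4."""
    kind, x, y, r = fact
    return (kind, x, y, r / factor)


def build_options(correct, proven):
    """Return all candidates and the verified answer set S.

    Membership in `proven` stands in for re-running the inference engine,
    so a distractor that happens to be true is moved into S.
    """
    candidates = list(correct)
    for fact in correct:
        if fact[0] == "ratio":
            candidates.append(perturb_ratio(fact))
        candidates.extend(invert_relation(fact))
    answer_set = {c for c in candidates if c in proven}
    return candidates, answer_set


proven = {("para", "AB", "GH"), ("ratio", "AB", "CD", Fraction(1, 2))}
correct = [("para", "AB", "GH"), ("ratio", "AB", "CD", Fraction(1, 2))]
candidates, answer_set = build_options(correct, proven)
print(len(candidates), sorted(c[0] for c in answer_set))
```

Deriving S from the verification result rather than from the generation step is what guards against a distractor that is accidentally true.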

[0042] Furthermore, to prevent the accidental generation of "accidentally correct" solutions when generating incorrect options, embodiments of the present invention avoid simple entity substitution (such as...). Otherwise, in some extremely special cases, it may lead to false results.

[0043] In one possible implementation, step S3 employs a "single source, dual display" technique, in which the image and the text are rendered from the same source. For text generation, formal descriptions are translated into natural language using rule templates. These rule templates are bilingual (Chinese and English), mapping Midpoint(M,A,B) to a Chinese sentence (in translation, "M is the midpoint of segment AB") and to the English sentence "M is the midpoint of line segment AB," respectively. This rule-based generation avoids the semantic drift inherent in machine translation.
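Rule-template rendering of this kind amounts to a lookup followed by string formatting. The sketch below is a minimal illustration; the template dictionary, the `perp` entry, and the exact Chinese wording are assumptions supplied for the example rather than the templates of Figures 3 and 4.

```python
# Hypothetical bilingual templates; the Chinese strings are illustrative.
TEMPLATES = {
    "midpoint": {
        "en": "{0} is the midpoint of line segment {1}{2}",
        "zh": "{0}是线段{1}{2}的中点",
    },
    "perp": {
        "en": "{0}{1} is perpendicular to {2}{3}",
        "zh": "{0}{1}⊥{2}{3}",
    },
}


def render(clause, lang):
    """Render one (relation, args) clause with the template for `lang`."""
    relation, args = clause
    return TEMPLATES[relation][lang].format(*(a.upper() for a in args))


def render_premise(clauses, lang):
    """Join the rendered clauses with a language-appropriate separator."""
    separator = {"en": "; ", "zh": "；"}[lang]
    return separator.join(render(c, lang) for c in clauses)


clauses = [("midpoint", ("m", "a", "b")), ("perp", ("f", "i", "h", "i"))]
print(render_premise(clauses, "en"))
print(render_premise(clauses, "zh"))
```

Because both languages are produced from the same clause list, the two question stems cannot drift apart in meaning.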

[0044] For graphics rendering, the graphics engine (implemented with Python's Matplotlib library) reads the same set of formal coordinate data and draws vector graphics. Intelligent annotation: the placement of text labels is calculated automatically so that they do not obscure key geometric features, and the specific lines or angles involved in the question are highlighted.
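The embodiment does not disclose its label placement rule. One simple heuristic, shown here purely as an assumption, is to push each label away from the centroid of all points so that labels fall outside the figure; the function name `label_positions` and the `offset` parameter are hypothetical.

```python
import math


def label_positions(points, offset=0.15):
    """Place each label `offset` units from its point, away from the centroid.

    `points` maps a label to its (x, y) coordinates. This is a heuristic
    sketch, not the placement rule used in the embodiment.
    """
    cx = sum(x for x, _ in points.values()) / len(points)
    cy = sum(y for _, y in points.values()) / len(points)
    placed = {}
    for name, (x, y) in points.items():
        dx, dy = x - cx, y - cy
        norm = math.hypot(dx, dy)
        if norm < 1e-12:
            dx, dy, norm = 0.0, 1.0, 1.0  # point at the centroid: push upward
        placed[name] = (x + offset * dx / norm, y + offset * dy / norm)
    return placed


square = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (0.0, 1.0)}
for name, (lx, ly) in label_positions(square).items():
    print(f"{name}: ({lx:.3f}, {ly:.3f})")
```

The resulting coordinates could then be passed to a plotting call such as Matplotlib's `Axes.annotate`, so that the figure and the labels are both driven by the same coordinate data.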

[0045] The template for generating English question text in this embodiment of the invention is shown in Figure 3, and the template for generating Chinese question text is shown in Figure 4.

[0046] The examples above demonstrate a template for constructing multiple-choice questions using translated question stems. This consistent expression reduces the risk of misunderstanding due to unclear word order or references, and improves the robustness of generated geometry problems.

[0047] In one possible implementation, the diagnostic assessment in step S3 includes: when a student or a tested model submits a set of answers, the set of submitted answers is obtained, and the following indicators are used for evaluation to support accurate teaching diagnosis. Exact Match (EM) metric: a response is counted as correct if and only if the submitted answer set is identical to the correct answer set S. This indicator is used for advanced skill assessment and awards no points for "guessing half the answer".

[0048] Option-level metrics include F1 score and Hamming loss. If students frequently miss selections (high precision but low recall), analysis suggests their reasoning chain is incomplete; if they frequently make incorrect selections, it indicates conceptual confusion.
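As a non-limiting illustration, the Exact Match and option-level metrics may be computed per question as sketched below. The per-question definitions of F1 and Hamming loss and the function names are assumptions of this sketch, as the text does not give formulas.

```python
def exact_match(submitted: set, correct: set) -> bool:
    """EM: correct only when the submitted set is identical to S."""
    return submitted == correct


def option_f1(submitted: set, correct: set) -> float:
    """Option-level F1 from precision and recall over the selected options."""
    true_positives = len(submitted & correct)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(submitted)
    recall = true_positives / len(correct)
    return 2 * precision * recall / (precision + recall)


def hamming_loss(submitted: set, correct: set, options: list) -> float:
    """Fraction of options whose selected/unselected state differs from S."""
    return sum((o in submitted) != (o in correct) for o in options) / len(options)


def diagnose(submitted: set, correct: set) -> list:
    """Map missed and incorrect selections to the diagnostic readings in the text."""
    findings = []
    if correct - submitted:
        findings.append("incomplete reasoning chain (missed selection)")
    if submitted - correct:
        findings.append("conceptual confusion (incorrect selection)")
    return findings


options = ["A", "B", "C", "D"]
correct = {"A", "C"}
submitted = {"A"}
```

For this example the student selects one of two correct options: EM is false, precision is 1, recall is 0.5, and the diagnosis reports a missed selection only.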

[0049] In addition, a difficulty stratification index is used to calculate the accuracy rate based on the Easy / Medium / Hard level of the questions, thereby generating a radar chart of students' abilities.
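As a non-limiting illustration, the difficulty stratification index may be computed by grouping exact-match outcomes by level. The record format `(level, is_correct)` and the function name are assumptions of this sketch. The resulting per-level accuracies are the values that a radar chart would plot.

```python
from collections import defaultdict


def accuracy_by_level(records: list) -> dict:
    """Compute the accuracy rate separately for each difficulty level."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for level, is_correct in records:
        totals[level] += 1
        hits[level] += int(is_correct)
    return {level: hits[level] / totals[level] for level in totals}


records = [("Easy", True), ("Easy", False), ("Medium", True), ("Hard", False)]
profile = accuracy_by_level(records)
```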

[0050] In summary, the technical feasibility of the embodiments of this invention rests on the "zero hallucination" guarantee of symbolic logic, which is built upon a mathematical closed loop of "formal construction" and "symbolic reasoning": (1) Guarantee of absolute solvability: Unlike probability-based Large Language Models (LLMs), which are prone to generating hallucinated questions that are "textually coherent but geometrically unsolvable," each option generated by the method in this embodiment must undergo rigorous derivation and verification by a Symbolic Reasoning Engine. Only conclusions that are mathematically proven to be true are selected as correct options, while those that are proven to be false are selected as distractors. This determinism allows the system to fully meet industrial deployment standards and eliminates the need for secondary manual review.

[0051] (2) Algorithm basis for controllable difficulty: The difficulty of geometry problems is positively correlated with the construction depth and the number of reasoning steps. The method of this invention, by controlling the depth parameter N of the recursive search, naturally possesses the ability to generate Easy / Medium / Hard level problems at the algorithm level, which is completely feasible in computational theory.

[0052] The method described in this invention resolves the contradiction between scalability and accuracy in personalized teaching. Existing intelligent tutoring systems (ITS) in industrial practice suffer from two core technical pain points that traditional manual question generation cannot address. On the one hand, manual question generation cannot meet the real-time variation requirement of individualized practice for every student ("a thousand people, a thousand faces"), because it relies on static question banks. When the system detects that a student is weak on the "inscribed angle theorem", the static question bank often holds only a limited number of questions on that topic, and practice ends once they are exhausted. By contrast, this invention can generate hundreds or thousands of "variant questions" in real time from the same knowledge-point seed by changing the geometric construction parameters, supporting intensive, saturation-style practice.

[0053] On the other hand, manual verification is costly and prone to mismatch between text and graphics: manually written geometry problems often exhibit inconsistencies where "the textual conditions describe tangents, but the lines in the accompanying diagrams are not tangent," seriously misleading machine grading and student comprehension. This invention uses a formal language as a single information source to simultaneously drive text generation and graphics rendering, eliminating text-graphic ambiguity at its source. This is crucial for reducing operating costs in large-scale industrial applications.

[0054] The following content verifies that the geometry test item generation and evaluation method for personalized teaching in this embodiment of the invention has strong diagnostic ability and high discrimination compared with existing tasks.

[0055] To verify the quality of the generated test questions, this embodiment of the invention statistically analyzed the question parameters of the generated question bank, as shown in the table below:

[0056] Referring to Figure 5, Figure 5 shows the richness of the geometric elements covered by the geometric problems generated by the embodiments of the present invention. Furthermore, the geometric problems generated by the embodiments of the present invention represent a significant improvement over the existing datasets shown in the table below, offering features such as "automatic generation," "complexity rating," "concise expression and long-step reasoning," "bilingual support (Chinese and English)," and "multiple-choice question testing."

[0057]

[0058] In the table, AG: Automatically generated. IF: Input format, T represents text, I represents image. CR: Complexity rating. AvgPL: Average proof length. AvgDL: Average description length. FA: Formal annotation. LT: Language type, EN represents English, ZH represents Chinese. QT: Question type, SA represents single-choice question, MA represents multiple-choice question, OE represents open-ended question.

[0059] Meanwhile, comparative tests were conducted on existing large models (GPT-4o, Gemini 1.5 Pro, etc.).

[0060] Experimental data: The performance comparison of different large models on different datasets is shown in the table below:

[0061] On conventional mathematical datasets (such as MATH), GPT-4o achieves an accuracy of 57.72%, but on the geometric dataset generated in this embodiment of the invention, its accuracy drops sharply to 17.51%.

[0062] Effect Analysis: This indicates that the questions generated by the method in the embodiments of the present invention cannot be solved by simple pattern matching; they contain complex logical traps and long reasoning chains that effectively expose the "shallow reasoning" of general-purpose models. The generated question bank is therefore suitable as a high-discrimination assessment tool for screening top students or evaluating the logical shortcomings of AI models.

[0063] The method of this invention effectively eliminates "guessing" noise through a "no-guessing" evaluation protocol.

[0064] The multi-answer mechanism combined with the exact match metric employed in this invention significantly enhances the diagnostic value of the assessment.

[0065] Experimental data: The table below shows the overall model performance for the following datasets, including both text and image inputs. EMA / EME / EMM / EMH in the table report the exact match (EM) for the full / easy / medium / difficult groupings, respectively.

[0066] The table below shows the overall model performance for the following dataset, which contains only text input. EMA / EME / EMM / EMH in the table report the exact match (EM) for the full / easy / medium / difficult groupings, respectively.

[0067] As can be seen from the two tables above, when models answer this question bank, the option-level F1 score (which measures partial correctness) is typically around 60%, whereas the exact match (EM) score is only around 20%. This gap of about 40 percentage points reflects the room for "partial understanding" or "guessed scores." In educational scenarios, the method of this invention can therefore distinguish students who "fully grasp the knowledge points" from students who "guess correctly through elimination," providing accurate input to the recommendation algorithms of intelligent tutoring systems, which traditional multiple-choice questions cannot achieve.

[0068] Experimental data validating the effectiveness of difficulty stratification: The two tables above report the exact match (EM) on the Easy, Medium, and Hard difficulty subsets (EME, EMM, and EMH, respectively), demonstrating the effectiveness of the "controllable difficulty stratification" method of this invention. Taking general-purpose models as examples, under text and image input: GPT-4o: EME 29.09 → EMM 13.81 → EMH 9.24; Gemini 1.5 Pro: EME 38.18 → EMM 22.27 → EMH 9.24; Qwen2-VL-7B: EME 59.64 → EMM 17.37 → EMH 0.0 (approaching collapse on Hard). Meanwhile, the human baseline remains stable across the three difficulty levels (approximately 94.12 / 94.92 / 95.24).

[0069] This proves that the parameter control of the construction depth N in this invention is effective and can stably produce test questions that meet the preset difficulty gradient, satisfying the "gradual" practice requirements in teaching.

[0070] Error type analysis and "diagnostic assessment": The method in this embodiment of the invention categorizes model outputs into four types: RIGHT_ANSWER, WRONG_ANSWER, NO_ANSWER, and OUT_OF_LENGTH, and provides the error composition distribution for different models. As shown in Figure 6, the errors of general-purpose models are mainly driven by WRONG_ANSWER (accounting for approximately 2 / 3 to 3 / 4 of the total), while different systems show significant differences in NO_ANSWER and OUT_OF_LENGTH, reflecting issues such as uncertainty handling and output truncation. Reasoning-oriented models tend to show "higher RIGHT_ANSWER + moderate NO_ANSWER." Therefore, this embodiment of the invention suggests reporting the error type composition in addition to EM to aid diagnosis. This evidence shows that the multi-answer evaluation protocol of this invention not only assigns scores but also outputs diagnostic information, making it suitable for locating error patterns such as "over-selection / under-selection / non-convergence" in educational assessments.
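As a non-limiting illustration, the four-way categorization and the error composition may be sketched as follows. The classification rules (a truncation flag for OUT_OF_LENGTH, `None` for an output with no parsable answer) and the function names are assumptions of this sketch.

```python
from collections import Counter
from typing import Optional

ERROR_TYPES = ("WRONG_ANSWER", "NO_ANSWER", "OUT_OF_LENGTH")


def categorize(answer: Optional[set], correct: set, truncated: bool) -> str:
    """Assign one of the four output types to a single model response."""
    if truncated:
        return "OUT_OF_LENGTH"
    if answer is None:
        return "NO_ANSWER"
    return "RIGHT_ANSWER" if answer == correct else "WRONG_ANSWER"


def error_composition(labels: list) -> dict:
    """Share of each error type among the erroneous responses only."""
    errors = [label for label in labels if label != "RIGHT_ANSWER"]
    if not errors:
        return {}
    counts = Counter(errors)
    return {kind: counts[kind] / len(errors) for kind in ERROR_TYPES}


labels = [
    categorize({"A", "C"}, {"A", "C"}, truncated=False),
    categorize({"A"}, {"A", "C"}, truncated=False),
    categorize({"B"}, {"A", "C"}, truncated=False),
    categorize(None, {"A", "C"}, truncated=False),
    categorize(None, {"A", "C"}, truncated=True),
]
composition = error_composition(labels)
```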

[0071] The following provides three examples, corresponding to geometry problems of easy, medium, and hard difficulty levels, respectively. Each example is automatically generated and the answer set is labeled using the method of this invention. Furthermore, the text description of the problem and the geometric diagram are generated under the same formal premise, ensuring consistency between the text and the diagram. The problem format uses a four-option (multiple selections allowed) structure, and the output includes the problem stem (Chinese / English), the geometric diagram, four candidate conclusions, and the set of correct options.

[0072] The complete list of templates and theorems used in this section is shown in the table of the preceding examples. For each example, a summary of parameters related to reproduction and key points for generation / verification are given to illustrate the feasibility and stability of the invention under different levels of difficulty.

[0073] An easy geometry problem (Easy) is shown in Figure 7. This problem involves a basic geometric figure (a circle) and a small number of constraints (seven points). Its construction depth is relatively low, making it suitable for introductory classroom exercises. The estimated construction depth is N∈[3,5], and the estimated number of geometric points is |P|∈[5,8]. Regarding difficulty attribution, because the problem description is short, the proof steps are few, and the proof search depth is low, the problem is rated Easy.

[0074] A medium-difficulty geometry problem (Medium) is shown in Figure 8. Compared with Figure 7, this problem contains more construction points (ten points) and combination constraints (circles and triangles being tangent, etc.), resulting in a longer construction sequence and requiring a more complex geometric reasoning chain. The estimated premise construction depth is N∈[5,7], and the estimated number of geometric points is |P|∈[8,12]. Regarding difficulty attribution, the increased number of points and relationships leads to a greater proof search depth and proof length. Despite symbolic refinement, the problem stem maintains a high information density, so the problem is rated Medium.

[0075] A difficult geometry problem (Hard) is shown in Figure 9. This problem has a deeper premise construction and a more complex combination structure, containing a large number of elements (two circles, eleven points, and relations such as tangency). The provable conclusions obtained through reasoning enumeration involve a longer reasoning chain, meeting the requirements of difficult problems for comprehensiveness and multi-step derivation. The estimated premise construction depth is N∈[7,8], and the estimated number of geometric points is |P|∈[10,14]. In terms of difficulty attribution, the greater proof search depth and proof length mean that strict Exact Match better reflects differences in actual reasoning ability.

[0076] Compared to traditional methods, the geometry test question generation and evaluation method for personalized teaching in this invention has at least the following main advantages: 1. Guaranteeing 100% solvability and correctness of questions (industrial-grade reliability): Unlike generation by probabilistic large models, the solution in this invention adopts a closed-loop logic of "formal construction + symbolic reasoning verification". All options are deduced and verified by a mathematical engine, eliminating both the oversights of manual question generation and the hallucination problems of generative AI, so that the solution can be deployed directly in educational software with extremely high accuracy requirements.

[0077] 2. Support for personalized, tiered difficulty levels: By controlling the construction depth (N), template subsets, and branching coefficients, the solution in this embodiment can quantitatively output questions at different levels: Easy, Medium, and Hard. The tutoring system can dynamically generate variations of specific difficulty based on students' historical performance, truly achieving individualized instruction.

[0078] 3. Strict alignment of text and images enhances user experience: The question text and geometric figures are generated by the same formal data source, which solves the ambiguity caused by the "inconsistency between text and images" common in existing question banks, and is suitable for automated rendering and interaction in mobile apps.

[0079] 4. "Guess-Free" Assessment Improves Diagnostic Accuracy: Employing multiple-answer, multiple-choice questions and a strict matching mechanism effectively suppresses students' guessing behavior. Compared to traditional single-choice questions, the solution in this invention can distinguish between "complete mastery," "partial mastery (missed selection)," and "misunderstanding (incorrect selection)," providing more refined data support for the intelligent tutoring system to recommend subsequent learning paths.

[0080] Another embodiment of the present invention also proposes a geometry test question generation and assessment system for personalized teaching, comprising: a structured geometric scene creation module, used to define a formal description language and use the formal description language to establish a structured geometric scene; a logical reasoning verification and option generation module, used to perform logical reasoning-based verification and option generation on the established structured geometric scene; and an image-text homology rendering and diagnostic evaluation module, used to perform image-text homology rendering and diagnostic evaluation using the verified structured geometric scene and the generated options.

[0081] Another embodiment of the present invention provides a computer device comprising: a processor and a computer-readable storage medium; A processor is used to execute computer programs; A computer-readable storage medium storing a computer program, which, when executed by the processor, implements the method for generating and evaluating geometry test questions for personalized teaching.

[0082] Another embodiment of the present invention provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the method for generating and evaluating geometry test questions for personalized teaching.

[0083] The computer program includes computer program code, which can be in the form of source code, object code, executable file, or some intermediate form. The computer-readable storage medium can include any entity or device capable of carrying the computer program code, a medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, and a software distribution medium, etc. It should be noted that the content included in the computer-readable medium can be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals. For ease of explanation, the above content only shows the parts related to the embodiments of the present invention; for specific technical details not disclosed, please refer to the method section of the embodiments of the present invention. This computer-readable storage medium is non-transitory and can be stored in storage devices formed by various electronic devices, enabling the execution process described in the method of the embodiments of the present invention.

[0084] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0085] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0086] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0087] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0088] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.

Claims

1. A method for generating and evaluating geometry test questions for personalized teaching, characterized in that it comprises: defining a formal description language and using it to establish a structured geometric scene; performing logical reasoning-based verification and option generation on the established structured geometric scene; and using the verified structured geometric scene and the generated options to perform image-text homology rendering and diagnostic evaluation.

2. The method for generating and evaluating geometry test questions for personalized teaching according to claim 1, characterized in that the formal description language is defined as follows: Clause: used to describe the geometric relationship between defined points and a geometric object, in the form f(X1, X2, …, Xn), where f() represents a predefined relationship and Xi (i = 1, 2, …, n) are existing points; Construction: an operation that introduces a new point through geometric rules, in which the new point x is uniquely determined by one or two clauses; when there are two clauses, their intersection point is taken as x; when a single clause uniquely determines the point, the second clause is omitted; Premise: consists of a series of constructions arranged in sequence, used to introduce points and relationships one after another to form a complete geometric configuration; Problem: consists of a premise and candidate options, where the options are conclusions and multiple answers are allowed.
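As a non-limiting illustration, the four levels of the formal description language may be represented by the data structures sketched below. The class and field names, and the use of option indices to record correct answers, are assumptions of this sketch rather than the notation of the invention.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clause:
    """A predefined relation f applied to existing points X1..Xn."""
    relation: str
    points: tuple


@dataclass(frozen=True)
class Construction:
    """Introduces one new point, determined by one or two clauses."""
    new_point: str
    clauses: tuple

    def __post_init__(self):
        if not 1 <= len(self.clauses) <= 2:
            raise ValueError("a construction takes one or two clauses")


@dataclass
class Premise:
    """An ordered sequence of constructions forming the configuration."""
    constructions: list = field(default_factory=list)

    def points(self) -> list:
        return [c.new_point for c in self.constructions]


@dataclass
class Problem:
    """A premise plus candidate conclusions; several options may be correct."""
    premise: Premise
    options: list
    answers: set  # indices into options (assumed encoding)


midpoint = Construction("M", (Clause("midpoint", ("A", "B")),))
intersection = Construction(
    "X", (Clause("on_line", ("A", "B")), Clause("on_circle", ("O", "A")))
)
premise = Premise([midpoint, intersection])
```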

3. The method for generating and evaluating geometry test questions for personalized teaching according to claim 1, characterized in that the structured geometric scene is established as follows: Starting from the basic seed, set the maximum recursion depth N; At each level, templates are randomly sampled from a predefined clause template library and instantiated using existing points as parameters to generate geometric objects, wherein the predefined clause template library contains a variety of basic geometric structures; Perform geometric constraint verification on the newly generated geometric objects and, if the verification passes, add them to the premise sequence; Based on the premise sequence and the set maximum recursion depth N, annealing branch control is adopted, in which the number of branches is gradually reduced as the depth increases, to form a candidate premise pool.
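As a non-limiting illustration, the recursive construction with annealing branch control may be sketched as follows. The geometric decay schedule, the parameters `initial` and `decay`, the representation of templates as plain strings, and the function names are assumptions of this sketch. Instantiating templates with existing points is omitted for brevity.

```python
import math
import random


def branches_at(depth: int, initial: int = 4, decay: float = 0.5) -> int:
    """Number of branches explored at a given depth; shrinks as depth grows."""
    return max(1, math.floor(initial * decay ** depth))


def build_premise_pool(seed, templates, is_valid, max_depth, rng):
    """Recursively extend the seed, keeping every extension that passes validation."""
    pool = []

    def expand(premise, depth):
        if depth >= max_depth:
            return
        for _ in range(branches_at(depth)):
            candidate = premise + [rng.choice(templates)]
            if is_valid(candidate):  # geometric constraint verification
                pool.append(candidate)
                expand(candidate, depth + 1)

    expand(list(seed), 0)
    return pool


pool = build_premise_pool(
    seed=["triangle"],
    templates=["midpoint", "circle", "perpendicular"],
    is_valid=lambda premise: True,  # placeholder check, always passes
    max_depth=3,
    rng=random.Random(0),
)
```

With these assumed settings the schedule explores 4, 2, and 1 branches at depths 0, 1, and 2, so the breadth of the search narrows as the premise grows.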

4. The method for generating and evaluating geometry test questions for personalized teaching according to claim 1, characterized in that, In the step of verifying and generating options based on logical reasoning of the established structured geometric scene, a rule-based symbolic reasoning engine is invoked to perform forward chaining inference on the established structured geometric scene: Match the currently known geometric relations with the axioms and theorem library R; Handling numerical relationships primarily based on angles; Perform forward chain iterations, repeatedly generating new geometric facts, until no new conclusions can be derived or the depth limit is reached.
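As a non-limiting illustration, forward chaining over a rule library may be sketched as a fixed-point loop. The two parallelism rules, the tuple encoding of facts, and the `max_rounds` limit are assumptions of this sketch. The handling of numerical relationships is not modeled.

```python
def para_symmetric(known: set) -> set:
    """If AB is parallel to CD, then CD is parallel to AB."""
    return {("para", f[2], f[1]) for f in known if f[0] == "para"}


def para_transitive(known: set) -> set:
    """If AB is parallel to CD and CD is parallel to EF, then AB is parallel to EF."""
    paras = [f for f in known if f[0] == "para"]
    return {
        ("para", a[1], b[2])
        for a in paras
        for b in paras
        if a[2] == b[1] and a[1] != b[2]
    }


def forward_chain(facts, rules, max_rounds: int = 10) -> set:
    """Apply every rule repeatedly until nothing new appears or the limit is hit."""
    known = set(facts)
    for _ in range(max_rounds):
        derived = set()
        for rule in rules:
            derived |= rule(known) - known
        if not derived:
            break
        known |= derived
    return known


closure = forward_chain(
    {("para", "AB", "CD"), ("para", "CD", "EF")},
    [para_symmetric, para_transitive],
)
```

The loop stops either at a fixed point, where no rule yields a new fact, or when the round limit is reached, matching the two termination conditions stated in the claim.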

5. The method for generating and evaluating geometry test questions for personalized teaching according to claim 1, characterized in that, in the step of performing logical reasoning-based verification and option generation on the established structured geometric scene, option generation includes performing the following steps to generate test questions for assessment: Calculate the prior difficulty of each provable conclusion to filter the correct options, and select conclusions with a difficulty score greater than the predetermined value as correct options to ensure that the question has discrimination. To detect error-prone points, construct distractor options by generating incorrect options. The methods for constructing distractor options include relation inversion, numerical perturbation, and equivalence traps. Relation inversion includes replacing parallel with non-parallel, equal with unequal, and parallel with perpendicular. Numerical perturbation modifies ratio or angle values based on common erroneous calculation logic. Equivalence traps generate conclusions that appear correct but are not valid under certain conditions. Option generation also includes performing synonym rewriting of correct options. For multiple-answer verification, the question is allowed to contain multiple correct options: the generated candidate options are input into the inference engine for backtesting, the options judged as true by the inference engine are marked as the set of correct answers S, and the rest are incorrect options.

6. The method for generating and evaluating geometry test questions for personalized teaching according to claim 1, characterized in that, In the step of performing image-text homology rendering and diagnostic evaluation using verification through structured geometric scenes and generated options, image-text homology rendering includes: Formal description language is translated into natural language to generate text using rule templates, wherein the rule templates are bilingual templates in Chinese and English. For graphics rendering, the graphics engine is called to read the same set of formalized coordinate data and draw vector graphics. Calculate the placement of the text labels and mark the specific lines or angles involved in the question.

7. The method for generating and evaluating geometry test questions for personalized teaching according to claim 1, characterized in that, in the step of performing image-text homology rendering and diagnostic evaluation using the verified structured geometric scene and the generated options, the diagnostic evaluation includes: obtaining the set of submitted answers; and evaluating using the Exact Match, option-level, and difficulty stratification metrics to identify knowledge gaps based on the evaluation results. When evaluating using option-level metrics, a "missed selection" indicates an incomplete reasoning chain, and a "wrong selection" indicates conceptual confusion. When evaluating using difficulty stratification metrics, the accuracy rate is calculated separately for the Easy, Medium, and Hard levels of the questions.

8. A geometry test question generation and assessment system for personalized teaching, characterized in that it comprises: a structured geometric scene creation module, used to define a formal description language and use the formal description language to establish a structured geometric scene; a logical reasoning verification and option generation module, used to perform logical reasoning-based verification and option generation on the established structured geometric scene; and an image-text homology rendering and diagnostic evaluation module, used to perform image-text homology rendering and diagnostic evaluation using the verified structured geometric scene and the generated options.

9. A computer device, characterized in that it comprises: a processor and a computer-readable storage medium; the processor is used to execute computer programs; the computer-readable storage medium stores a computer program that, when executed by the processor, implements the method for generating and evaluating geometry test questions for personalized teaching as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to execute the method for generating and evaluating geometry test questions for personalized teaching as described in any one of claims 1 to 7.