Content revision method, apparatus, medium, and electronic device

By constructing subject-specific OCR adjustment and correction methods, combined with dynamic rule chains and feedback optimization mechanisms, the problems of misrecognition and cross-disciplinary correction of handwritten OCR in the education field have been solved, achieving efficient and accurate content correction and meeting the needs of rapid adaptation and efficient correction of educational products.

CN122244875A · Pending · Publication Date: 2026-06-19 · NEW ORIENTAL EDUCATION & TECH GRP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NEW ORIENTAL EDUCATION & TECH GRP CO LTD
Filing Date
2026-03-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the education field, applying OCR technology to handwritten characters suffers from misrecognition of handwritten characters and propagation of the resulting errors, as well as conflicting rules in cross-disciplinary grading, leading to high misjudgment rates, long rule update cycles, reliance on human experience, and low efficiency.

Method used

By constructing a morphological mapping table, expanding the rule base, similarity decision tree, and character confusion matrix, the OCR recognition results are adjusted and corrected for different disciplines. The positive compensation and reverse defense rules are updated in combination with the feedback results, realizing a dynamic rule chain and feedback optimization mechanism, supporting seamless adaptation to multiple disciplines.

Benefits of technology

It improves the accuracy of OCR post-processing, reduces the probability of misjudgment due to cursive writing and confusion of subject symbols, shortens the rule update cycle, reduces manual maintenance costs, and meets the high-concurrency grading needs of educational products.

✦ Generated by Eureka AI based on patent content.

Figure CN122244875A_ABST

Abstract

This disclosure belongs to the field of intelligent education technology and relates to a content correction method, device, medium, and electronic device. The method includes: acquiring the OCR recognition result of the target content to be corrected and the corresponding standard answer, and determining the subject to which the target content belongs; adjusting the OCR recognition result according to the subject to obtain an adjusted result; and correcting the adjusted result according to the standard answer to obtain a corrected result. Because the adjustment is made according to the subject to which the target content belongs, this disclosure supports seamless adaptation to multi-subject adjustment scenarios, shortens the adaptation cycle for adding new subjects, improves the accuracy of OCR post-processing of target content, and effectively reduces the probability of misjudgment in scenarios such as cursive writing and confusion of subject symbols.

Description

Technical Field

[0001] This disclosure relates to the field of intelligent education technology, and more specifically, to a content correction method, a content correction device, a non-transitory computer-readable storage medium, and an electronic device.

Background Technology

[0002] In recent years, OCR (Optical Character Recognition) technology has begun to be applied in the education field.

[0003] However, applying OCR technology to handwritten characters in the education field suffers from problems such as misrecognition of handwritten characters and propagation of the resulting errors, as well as rule conflicts in cross-disciplinary grading.

Summary of the Invention

[0004] To overcome the problems existing in the related technologies, this disclosure provides a content correction method, a content correction device, a non-transitory computer-readable storage medium, and an electronic device.

[0005] According to a first aspect of the present disclosure, a content correction method is provided, the method comprising: obtaining the OCR recognition result of the target content to be corrected and the corresponding standard answer, and determining the subject to which the target content belongs; adjusting the OCR recognition result according to the subject to obtain an adjustment result; and correcting the adjustment result according to the standard answer to obtain a corrected result.

[0006] Optionally, adjusting the OCR recognition result according to the subject to obtain the adjustment result includes: constructing a morphological mapping table, which is used to store the mapping relationship between characters and letters; and, based on the subject and the standard answer, converting the OCR recognition result according to the morphological mapping table to obtain the adjustment result.

[0007] Optionally, adjusting the OCR recognition result according to the subject to obtain the adjustment result includes: constructing an extended rule base, which is used to store the mapping relationship between abbreviations and expanded terms; and expanding the OCR recognition result based on the subject and the extended rule base to obtain the adjustment result.

[0008] Optionally, adjusting the OCR recognition result according to the subject to obtain the adjustment result includes: constructing a similarity decision tree, which is used to store the mapping relationship between characters with similar overall structure; and converting the OCR recognition result based on the subject and the similarity decision tree to obtain the adjustment result.

[0009] Optionally, adjusting the OCR recognition result according to the subject to obtain the adjustment result includes: constructing a glyph confusion matrix, and retaining the OCR recognition result as the adjustment result based on the subject and the glyph confusion matrix, wherein the glyph confusion matrix is used to store the mapping relationship between characters with similar glyph structures; and / or normalizing the OCR recognition result according to the subject to obtain the adjustment result.

[0010] Optionally, after correcting the adjustment result according to the standard answer to obtain the corrected result, the method further includes: obtaining the feedback result of the corrected result and the corresponding positive compensation rule, and performing positive compensation processing on the adjustment result according to the feedback result and the positive compensation rule to obtain the compensation result; and obtaining the reverse defense rule, and updating the compensation result according to the reverse defense rule to obtain the target result of the target content.

[0011] Optionally, after updating the compensation result according to the reverse defense rule to obtain the target result of the target content, the method further includes: constructing a feedback rule set based on the corrected result and the target result; and mapping the adjustment result according to the feedback rule set to obtain the mapped adjustment result, so that the mapped adjustment result is then corrected.

[0012] According to a second aspect of the present disclosure, a content correction apparatus is provided, comprising: a subject identification module configured to obtain the OCR recognition result of the target content to be corrected and the corresponding standard answer, and to determine the subject to which the target content belongs; a result adjustment module configured to adjust the OCR recognition result according to the subject to obtain the adjustment result; and a result correction module configured to correct the adjustment result based on the standard answer to obtain a corrected result.

[0013] According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, having stored thereon computer program instructions that, when executed by a processor, implement the steps of the method described in any of the first aspects of the present disclosure.

[0014] According to a fourth aspect of the present disclosure, an electronic device is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement the steps of any of the methods described in the first aspect of this disclosure.

[0015] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects: In the methods and apparatus provided by the exemplary embodiments of this disclosure, adjustments are made according to the subject to which the target content belongs, supporting seamless adaptation to multi-subject adjustment scenarios, shortening the adaptation cycle of new subjects, improving the accuracy of OCR post-processing of target content, and effectively reducing the probability of misjudgment in scenarios such as cursive writing and confusion of subject symbols.

[0016] Other features and advantages of this disclosure will be described in detail in the following detailed description section.

[0017] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Description of the Drawings

[0018] The accompanying drawings are provided to further illustrate the present disclosure and form part of the specification. They are used together with the following detailed description to explain the present disclosure, but do not constitute a limitation thereof. In the drawings:

Figure 1 is a schematic flowchart of a content correction method according to an exemplary embodiment of the present disclosure;
Figure 2 is a schematic flowchart of a first method for adjusting OCR recognition results in an exemplary embodiment of the present disclosure;
Figure 3 is a schematic flowchart of a second method for adjusting OCR recognition results in an exemplary embodiment of this disclosure;
Figure 4 is a schematic flowchart of a third method for adjusting OCR recognition results in an exemplary embodiment of this disclosure;
Figure 5 is a schematic flowchart of a fourth method for adjusting OCR recognition results in an exemplary embodiment of this disclosure;
Figure 6 is a schematic flowchart of a method for updating adjustment results based on feedback results in an exemplary embodiment of this disclosure;
Figure 7 is a schematic flowchart of a method for processing according to a feedback rule set in an exemplary embodiment of this disclosure;
Figure 8 is a schematic flowchart of a content correction method in an application scenario of an exemplary embodiment of this disclosure;
Figure 9 is a schematic flowchart of a dynamic rule engine in an application scenario of an exemplary embodiment of this disclosure;
Figure 10 is a schematic flowchart of the feedback optimization rule module in an application scenario of an exemplary embodiment of this disclosure;
Figure 11 is a schematic flowchart of the application method of the positive compensation rule in an application scenario of an exemplary embodiment of this disclosure;
Figure 12 is a schematic flowchart of the application method of the reverse defense rule in an application scenario of an exemplary embodiment of this disclosure;
Figure 13 is a schematic flowchart of the application method of the subject constraint strategy in an application scenario of an exemplary embodiment of this disclosure;
Figure 14 is a schematic structural diagram of a content correction device according to an exemplary embodiment of the present disclosure;
Figure 15 schematically shows an electronic device for implementing a content correction method according to an exemplary embodiment of the present disclosure;
Figure 16 schematically shows another electronic device for implementing a content correction method in an exemplary embodiment of the present disclosure.

Detailed Implementation

[0019] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.

[0020] It should be noted that all actions involving the acquisition of signals, information, or data in this disclosure are carried out in compliance with the relevant data protection laws and policies of the country where the location is situated, and with authorization from the owner of the relevant device.

[0021] The core requirements for educational products are accuracy, subject scalability, and operational cost control. Accuracy means that fill-in-the-blank grading must reach an accuracy of at least 98% (far above the 75% required in general OCR scenarios); subject scalability means that new subject symbol systems such as mathematics and chemistry must be integrated quickly (existing solutions take an average of 2 person-months per subject); and operational cost control means a rule update cycle of ≤24 hours (traditional development models require 1-2 weeks).

[0022] Feedback gathered from a survey of teachers' daily homework corrections revealed that a significant number of teachers need to manually review electronic assignments due to system errors; and that handwritten assignments had a significantly higher error rate in grading fill-in-the-blank questions in subjects like English, Mathematics, and Physics compared to handwritten assignments in Chinese.

[0023] Student feedback indicates that many students have been wrongly penalized for cursive handwriting, and that in certain subjects, such as chemistry, points have been deducted because of subscript recognition errors, such as H₂O → H2O.

[0024] Therefore, general OCR technology has three major pain points when applied to handwritten text in the education field: (1) Misrecognition of handwritten characters and error propagation. Traditional general-purpose OCR engines treat misrecognitions of handwritten characters, such as r / v / i, as independent events, so the errors are amplified in subsequent stages such as syntax verification and answer comparison.

[0025] (2) Rule conflicts in interdisciplinary grading. The existing system uses a unified rule set to handle multi-disciplinary scenarios, so the verification logic for mathematical symbols, such as √, X, and Σ, interferes with that for English characters. Furthermore, adding a new subject requires developing a completely new verification module.

[0026] (3) Lag in manual rule maintenance. Hard-coded rules face two major challenges: they cannot respond promptly to new writing error patterns among students, and the adjustment of rule weights depends on the experience of developers.

[0027] To address the misrecognition of cursive characters, traditional solutions combine multiple general-purpose OCR engines, such as Tesseract (an open-source OCR engine) and Azure (a deep learning-based optical character recognition service), develop a dedicated character confusion matrix (manually defined replacement rules for easily misrecognized characters such as r / v / i), and increase the intensity of image preprocessing (binarization / denoising).

[0028] However, this cursive writing processing solution has the following drawbacks: Wasted computing resources: multi-engine voting wastes GPU (Graphics Processing Unit) computing power; High labor costs: maintaining the character confusion matrix requires the involvement of teachers or linguistics experts; Risk of overcorrection: strong preprocessing can lose original information, for example image sharpening can make the dot of the letter "i" disappear.

[0029] For example, when processing the handwritten "very", the "v" in the original image had a slight cursive stroke. The multi-engine voting result was "u" (Tesseract) / "v" (Azure), and "u" was randomly selected. In the end, "very" was misjudged as "uery", resulting in a correction error.

[0030] For interdisciplinary grading problems, the traditional approach is to develop an independent validation module for each subject, use a regular expression whitelist (for example, restricting math questions to characters such as [0-9√XΣ]), and manually label the subject category to which each question belongs.

[0031] The subject-specific matching scheme has the following drawbacks: Low development efficiency: adding a mathematical symbol verification module requires 2 person-months of development work; False positives: the whitelist mechanism rejects correct but uncommon characters, such as "π" being mistakenly identified as an illegal character; Accumulated classification errors: if the subject label of a question is incorrect, all subsequent validations fail.
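The false-positive drawback can be reproduced with a minimal sketch. The character class is the whitelist quoted above; the function name `passes_whitelist` and the sample answers are illustrative assumptions, not part of the disclosure.

```python
import re

# Whitelist quoted in the text: digits plus the symbols √, X and Σ.
MATH_WHITELIST = re.compile(r"[0-9√XΣ]+")


def passes_whitelist(answer: str) -> bool:
    """Return True only if every character of the answer is whitelisted."""
    return MATH_WHITELIST.fullmatch(answer) is not None


print(passes_whitelist("√16"))  # True
print(passes_whitelist("2π"))   # False: a valid answer is rejected
```

Any symbol that was not anticipated when the whitelist was written is treated as illegal, which is why each new subject symbol forces a rule change.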

[0032] For rule updates, the traditional approach is to collect misjudgment cases regularly (monthly / quarterly) and manually code new rules, deploy the new rules gradually using A / B testing (split testing), and establish an expert committee to review rule changes.

[0033] The rule update scheme has the following drawbacks: Long response time: It takes an average of 7 days from discovering a new error pattern to deploying the rule; The cold start dilemma: New rules require a large number of training samples before deployment; Human bias injection: The new rules rely on the developer's subjective experience.

[0034] This disclosure provides a content correction method. Figure 1 is a flowchart of a content correction method according to an exemplary embodiment. As shown in Figure 1, the method may include at least the following steps: Step S110. Obtain the OCR recognition result of the target content to be corrected and the corresponding standard answer, and determine the subject to which the target content belongs.

[0035] Step S120. Adjust the OCR recognition results according to the subject to obtain the adjusted results.

[0036] Step S130. Correct the adjustment result according to the standard answer to obtain the corrected result.

[0037] In the exemplary embodiments of this disclosure, adjustments are made based on the subject to which the target content belongs, supporting seamless adaptation to multi-subject adjustment scenarios, shortening the adaptation cycle for adding new subjects, improving the accuracy of OCR post-processing of the target content, and effectively reducing the probability of misjudgment in scenarios such as cursive writing and confusion of subject symbols.
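Steps S110 to S130 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the subject is passed in already determined, and the per-subject adjusters (`adjust_english`, `adjust_math`) are hypothetical placeholders for the rule tables described later.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Adjuster = Callable[[str, str], str]


def adjust_english(ocr_text: str, standard_answer: str) -> str:
    # Hypothetical English adjuster: trim and lowercase.
    return ocr_text.strip().lower()


def adjust_math(ocr_text: str, standard_answer: str) -> str:
    # Hypothetical math adjuster: drop spaces, keep digits and symbols.
    return ocr_text.replace(" ", "")


def keep_unchanged(ocr_text: str, standard_answer: str) -> str:
    return ocr_text


ADJUSTERS: Dict[str, Adjuster] = {"english": adjust_english, "math": adjust_math}


@dataclass
class CorrectionResult:
    subject: str
    adjusted: str
    is_correct: bool


def correct_content(ocr_text: str, standard_answer: str, subject: str) -> CorrectionResult:
    # S110: the OCR result, standard answer and subject are the inputs.
    # S120: adjust the OCR result with the adjuster registered for the subject.
    adjuster = ADJUSTERS.get(subject, keep_unchanged)
    adjusted = adjuster(ocr_text, standard_answer)
    # S130: correct (grade) the adjustment result against the standard answer.
    return CorrectionResult(subject, adjusted, adjusted == standard_answer)


print(correct_content(" Very ", "very", "english"))
```

Registering adjusters in a dictionary keyed by subject means a new subject only needs a new entry, which is one way to read the "seamless adaptation" goal.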

[0038] The following provides a detailed explanation of each step in the content correction method.

[0039] In step S110, the OCR recognition result of the target content to be corrected and the corresponding standard answer are obtained, and the subject to which the target content belongs is determined.

[0040] In the exemplary embodiments of this disclosure, the target content to be corrected can be a student's exam paper or homework that needs to be graded, or other content; this exemplary embodiment does not impose any special limitations on this. Since the content to be corrected is presented in paper form, it can be preprocessed using OCR technology to obtain the corresponding OCR recognition result. The standard answer corresponding to the target content to be corrected can be the exam paper answer or the homework answer, etc.; this exemplary embodiment does not impose any special limitations on this.

[0041] Since subject identifiers for each discipline are pre-defined, the subject to which the target content belongs can be determined based on the subject identifiers presented by the OCR recognition results. Alternatively, the subject can be determined from subject identifiers presented in other content or methods. This exemplary embodiment does not impose any special limitations on this.

[0042] In step S120, the OCR recognition results are adjusted according to the subject to obtain the adjustment results.

[0043] In an exemplary embodiment of this disclosure, after determining the OCR recognition result of the target content and its subject, the obtained OCR recognition result can be adjusted according to the subject.

[0044] In an optional embodiment, Figure 2 is a flowchart of the first method for adjusting OCR recognition results. As shown in Figure 2, the method may include at least the following steps: In step S210, a morphological mapping table is constructed, which is used to store the mapping relationship between characters and letters.

[0045] To address the visual confusion between numbers, symbols, and letters, a morphological mapping table can be constructed to specifically resolve morphological misjudgments in single-character scenarios.

[0046] In step S220, based on the subject and standard answer, the OCR recognition result is converted according to the morphological mapping table to obtain the adjusted result.

[0047] For example, when the standard answer is a single character, the general OCR result is first forced to be mapped to the letter field according to the morphological mapping table to obtain the adjusted result; when the standard answer contains numbers / symbols, the reverse letter-to-symbol compensation mapping is performed according to the morphological mapping table to obtain the adjusted result.
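The bidirectional use of the morphological mapping table might look like the following sketch. The table entries and the function name `adjust_with_mapping` are assumptions for illustration; the disclosure does not list the table contents.

```python
from typing import Dict

# Hypothetical morphological mapping table: look-alike digit -> letter.
SYMBOL_TO_LETTER: Dict[str, str] = {"0": "o", "1": "l", "5": "s"}
# Reverse direction, used for letter-to-symbol compensation.
LETTER_TO_SYMBOL: Dict[str, str] = {v: k for k, v in SYMBOL_TO_LETTER.items()}


def adjust_with_mapping(ocr_text: str, standard_answer: str) -> str:
    """Convert the OCR text in the direction suggested by the standard answer."""
    if len(standard_answer) == 1 and standard_answer.isalpha():
        # Single-letter answer: force look-alike symbols into the letter domain.
        return "".join(SYMBOL_TO_LETTER.get(ch, ch) for ch in ocr_text)
    if any(not ch.isalpha() for ch in standard_answer):
        # Answer contains digits or symbols: compensate letters back to symbols.
        return "".join(LETTER_TO_SYMBOL.get(ch, ch) for ch in ocr_text)
    return ocr_text


print(adjust_with_mapping("0", "o"))      # o
print(adjust_with_mapping("x+l", "x+1"))  # x+1
```

The reverse branch here replaces every mapped letter, which is a simplification; a production rule would likely restrict the replacement to positions where the standard answer actually holds a digit or symbol.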

[0048] In an optional embodiment, Figure 3 is a flowchart of the second method for adjusting OCR recognition results. As shown in Figure 3, the method may include at least the following steps: In step S310, an extended rule base is constructed, which is used to store the mapping relationship between abbreviations and expanded words.

[0049] By building an extended rule base for abbreviations, the problem of misjudgment of grammatical structure caused by cursive writing can be solved.

[0050] In step S320, the OCR recognition results are extended based on the subject and the extended rule base to obtain the adjusted results.

[0051] The OCR recognition results are expanded with abbreviations to support branch verification for multiple semantic mappings, such as "'d" → "would / had". It's worth noting that, in addition to expanding the OCR recognition results using the extended rule base, the standard answer can also be expanded according to the extended rule base to broaden the representation of the standard answer.

[0052] In an optional embodiment, Figure 4 is a flowchart of the third method for adjusting OCR recognition results. As shown in Figure 4, the method may include at least the following steps: In step S410, a similarity decision tree is constructed, which is used to store the mapping relationship between characters with similar overall structure.

[0053] To address the unique character combination confusion phenomenon in handwritten characters, a similarity decision tree is constructed. This similarity decision tree is used to represent the mapping relationship between characters with similar morphology.

[0054] In step S420, the OCR recognition results are transformed according to the subject and similarity decision tree to obtain the adjusted results.

[0055] For example, the cursive features of "rn" and "m" are similar in character structure, so "rn" can be converted to "m" to obtain the adjusted result.
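A minimal sketch of this conversion, assuming the similarity decision tree is flattened into a lookup of confusable sequences. Only the "rn" → "m" pair comes from the text; the other entries and all function names are hypothetical.

```python
from typing import Dict, Set

# Flattened stand-in for the similarity decision tree.
# "rn" -> "m" is from the text; "cl" -> "d" and "vv" -> "w" are assumed.
SIMILAR_SEQUENCES: Dict[str, str] = {"rn": "m", "cl": "d", "vv": "w"}


def candidates(text: str) -> Set[str]:
    """Return the text plus every variant with exactly one sequence replaced."""
    results = {text}
    for seq, repl in SIMILAR_SEQUENCES.items():
        start = text.find(seq)
        while start != -1:
            results.add(text[:start] + repl + text[start + len(seq):])
            start = text.find(seq, start + 1)
    return results


def adjust_by_similarity(ocr_text: str, standard_answer: str) -> str:
    """Use the candidate equal to the standard answer, else keep the OCR text."""
    if standard_answer in candidates(ocr_text):
        return standard_answer
    return ocr_text


print(adjust_by_similarity("rnodern", "modern"))  # modern
```

Generating candidates one replacement at a time keeps genuine "rn" sequences intact when the standard answer really contains them.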

[0056] In an optional embodiment, Figure 5 is a flowchart of the fourth method for adjusting OCR recognition results. As shown in Figure 5, the method may include at least the following steps: In step S510, a glyph confusion matrix is constructed, and the OCR recognition result is retained as the adjustment result according to the subject and the glyph confusion matrix. The glyph confusion matrix is used to store the mapping relationship between characters with similar glyph structures.

[0057] For example, the glyph confusion matrix may include similarity groups of I / l / 1, similarity groups of 0 / o / O, etc., and this exemplary embodiment does not impose any special limitations on it.

[0058] Therefore, when the OCR recognition result includes characters from a similarity group in the glyph confusion matrix, the OCR recognition result can remain unchanged; that is, characters in the same group are allowed to be considered equivalent.

[0059] In step S520, the OCR recognition results are normalized according to the subject to obtain the adjusted results.

[0060] In addition, the OCR recognition results, which include both uppercase and lowercase letters, can be normalized to unify the OCR recognition results into lowercase representation, thus obtaining the adjusted result.
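One way to treat characters of the same group as equivalent is to compare canonical forms rather than rewrite the OCR text. In this sketch the two groups come from the text, while the choice of representatives, the `GROUP_SUBJECTS` set, and the function names are assumptions.

```python
from typing import Dict, List, Set

# Similarity groups named in the text; the first character is the representative.
CONFUSION_GROUPS: List[str] = ["lI1", "0oO"]
CANONICAL: Dict[str, str] = {ch: g[0] for g in CONFUSION_GROUPS for ch in g}

# Hypothetical: subjects in which group equivalence is allowed.
GROUP_SUBJECTS: Set[str] = {"english"}


def canonical(text: str) -> str:
    """Map group members to their representative, then lowercase the rest."""
    return "".join(CANONICAL.get(ch, ch) for ch in text).lower()


def equivalent(ocr_text: str, standard_answer: str, subject: str) -> bool:
    if subject in GROUP_SUBJECTS:
        return canonical(ocr_text) == canonical(standard_answer)
    # Other subjects: only case normalization, digits stay distinct from letters.
    return ocr_text.lower() == standard_answer.lower()


print(equivalent("He1lo", "hello", "english"))  # True
print(equivalent("1O", "10", "math"))           # False
```

The representatives "l" and "0" are unchanged by lowercasing, so the case normalization step cannot accidentally merge a group with an unrelated letter such as "i".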

[0061] In step S130, the adjustment result is corrected according to the standard answer to obtain the corrected result.

[0062] In an exemplary embodiment of this disclosure, after determining the adjustment result corresponding to the OCR recognition result, the adjustment result can be corrected or modified according to the standard answer to obtain the corresponding corrected result.

[0063] In an optional embodiment, Figure 6 is a flowchart of the method for updating adjustment results based on feedback results. As shown in Figure 6, the method may include at least the following steps: In step S610, the feedback result of the corrected result and the corresponding positive compensation rule are obtained, and positive compensation processing is performed on the adjustment result according to the feedback result and the positive compensation rule to obtain the compensation result.

[0064] Feedback on the corrected result can be provided by students through appeals or by teachers through review. For example, the feedback may indicate that the correction was wrong, that is, that the AI (Artificial Intelligence) made a mistake.

[0065] The positive compensation rules may be the four rules shown in Figures 2 to 5, and may also include other rules; this exemplary embodiment does not impose any special limitations on them.

[0066] Therefore, if the feedback results indicate that the correction result is incorrect, the adjustment result can be positively compensated according to the positive compensation rule to obtain the compensation result.

[0067] In step S620, the reverse defense rules are obtained, and the compensation results are updated according to the reverse defense rules to obtain the target result of the target content.

[0068] The reverse defense rule is used to prevent erroneous corrections. For example, if a student reports that the automatic substitution "0→o" caused an error in a math problem, a reverse defense rule is generated to prohibit this conversion. According to this reverse defense rule, any prohibited positive compensation in the compensation result is rolled back to the adjustment result before the substitution, thereby obtaining the corresponding target result.
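The interplay of the two rule types can be sketched as follows. The rule representation (source, replacement pairs), the rule contents, and the names `POSITIVE_RULES`, `REVERSE_DEFENSE`, and `target_result` are assumptions for illustration; only the "0→o" math example comes from the text.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str]  # (source, replacement)

# Hypothetical positive compensation rules learned from feedback.
POSITIVE_RULES: List[Rule] = [("0", "o"), ("rn", "m")]
# Reverse defense rules: substitutions prohibited for a given subject.
REVERSE_DEFENSE: Dict[str, Set[Rule]] = {"math": {("0", "o")}}


def positive_compensation(text: str, rules: List[Rule]) -> str:
    """Apply each substitution rule in order."""
    for src, dst in rules:
        text = text.replace(src, dst)
    return text


def target_result(adjusted: str, subject: str) -> str:
    """Compensate, leaving out the rules that reverse defense prohibits.

    Skipping a prohibited rule has the same effect as applying it and then
    restoring the text from before that substitution.
    """
    blocked = REVERSE_DEFENSE.get(subject, set())
    allowed = [rule for rule in POSITIVE_RULES if rule not in blocked]
    return positive_compensation(adjusted, allowed)


print(positive_compensation("x=10", POSITIVE_RULES))  # x=1o (overcorrected)
print(target_result("x=10", "math"))                  # x=10
print(target_result("hell0", "english"))              # hello
```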

[0069] In an optional embodiment, Figure 7 is a flowchart of the method for processing based on a feedback rule set. As shown in Figure 7, the method may include at least the following steps: In step S710, a feedback rule set is constructed based on the corrected result and the target result.

[0070] After the optimization shown in Figure 6, a corresponding feedback rule set can be constructed to serve as the basis for further adjustments.

[0071] In step S720, the adjustment result is mapped according to the feedback rule set to obtain the mapped adjustment result, so as to correct the mapped adjustment result.

[0072] After the feedback rule set is constructed, the adjustment results obtained by the adjustment methods shown in Figures 2 to 5 can be further mapped according to the feedback rule set to obtain the mapped adjustment results. The mapped adjustment results are then corrected according to the standard answer to obtain the final corrected result.

[0073] The content correction method in this embodiment will be described in detail below with reference to an application scenario.

[0074] Figure 8 is a flowchart of the content correction method in an application scenario. As shown in Figure 8, in step S810, the user makes a grading request.

[0075] The user's grading request can be initiated by either the teacher or the student; this exemplary embodiment does not impose any special limitations on this.

[0076] In step S820, OCR preprocessing is performed.

[0077] OCR technology is used to process test papers, assignments, questions, and other content that users need to grade, and obtain corresponding computer text content.

[0078] In step S830, subject-specific policy routing is performed.

[0079] This part can be determined by the feedback optimization rule module. The subject constraint strategy adjusts the execution of the corresponding rules according to the symbol system of the subject to which the question belongs.

[0080] In step S840, the subject dynamic rule chain engine is executed.

[0081] Figure 9 is a flowchart of the dynamic rule engine in an application scenario. As shown in Figure 9, in step S910, the input layer receives the original OCR result together with its confidence.

[0082] In step S920, the dynamic rule chain engine (subject policy routing of a subject rule chain) is activated.

[0083] In step S930, level one, character-level filtering, is performed.

[0084] The technical principle of this layer is to address the visual confusion among numbers, symbols, and letters by establishing a heterogeneous character mapping table, focusing on resolving morphological misjudgments in single-character scenarios. A bidirectional conversion mechanism is employed: when the standard answer is a single character, the general OCR result is first forcibly mapped to the letter domain based on the heterogeneous character mapping table; when the standard answer contains numbers / symbols, a reverse letter-to-symbol compensation mapping is performed based on the heterogeneous character mapping table.

[0085] For example, when the input is the general OCR recognition result "0" and the standard answer is "o", the execution steps can be to first detect that the standard answer has a length of 1 and is a letter, then query the heterogeneous character mapping table to perform the "0" → "o" conversion, and finally compare the conversion result with the standard answer and trigger a short-circuit return if the conversion result is correct.
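The short-circuit behaviour of the rule chain can be sketched as below. This assumes each level receives the output of the previous level; the level functions are simplified stand-ins and `run_chain` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Level = Callable[[str, str], str]


def level_one(text: str, answer: str) -> str:
    # Simplified character-level filtering with a one-entry mapping table.
    table = {"0": "o"}
    if len(answer) == 1 and answer.isalpha():
        return "".join(table.get(ch, ch) for ch in text)
    return text


def level_four(text: str, answer: str) -> str:
    # Simplified unified downgrading: case normalization only.
    return text.lower()


def run_chain(text: str, answer: str, levels: List[Level]) -> Tuple[str, int]:
    """Run levels in order; return (result, level number that matched).

    The level number is 0 when no level produced the standard answer.
    """
    current = text
    for number, level in enumerate(levels, start=1):
        current = level(current, answer)
        if current == answer:
            return current, number  # short-circuit: skip remaining levels
    return current, 0


print(run_chain("0", "o", [level_one, level_four]))      # ('o', 1)
print(run_chain("Cat", "cat", [level_one, level_four]))  # ('cat', 2)
```

Returning as soon as a level matches avoids running later, broader rules that could overcorrect an already correct answer.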

[0086] In step S940, level two, abbreviation and cursive form analysis, is performed.

[0087] The technical principle behind this layer is to build an abbreviation expansion rule base to resolve grammatical structure misjudgments caused by cursive writing. A two-way expansion strategy is employed: abbreviation expansion is performed on both the general OCR result and the standard answer, and branch validation with multiple semantic mappings (such as "'d" → "would / had") is supported.

[0088] For example, when the input is the OCR recognition result "They'd done" and the standard answer is "They had done", the execution steps could be to first expand "'d" to "would" or "had", then verify grammatically that "They had done" conforms to the perfect tense structure, and finally return the corrected result, recording the new mapping rule if it does not yet exist.
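The two-way expansion with branch validation can be sketched like this. The rule base contents beyond "'d" → "would / had" and all function names are assumptions, and the grammatical verification step is replaced here by a simple intersection with the expanded standard answer.

```python
import re
from itertools import product
from typing import Dict, List, Set

# Hypothetical abbreviation expansion rule base; only "'d" is from the text.
EXPANSION_RULES: Dict[str, List[str]] = {
    "'d": ["would", "had"],
    "'ll": ["will"],
    "'ve": ["have"],
}
_SUFFIX = re.compile(r"^(\w+)('d|'ll|'ve)$")


def expand(text: str) -> Set[str]:
    """Return every expansion branch of the abbreviations in the text."""
    options: List[List[str]] = []
    for word in text.split():
        match = _SUFFIX.match(word)
        if match:
            stem, suffix = match.groups()
            options.append([f"{stem} {full}" for full in EXPANSION_RULES[suffix]])
        else:
            options.append([word])
    return {" ".join(branch) for branch in product(*options)}


def matches(ocr_text: str, standard_answer: str) -> bool:
    """Two-way expansion: the answers match if any branches coincide."""
    return bool(expand(ocr_text) & expand(standard_answer))


print(sorted(expand("They'd done")))  # ['They had done', 'They would done']
print(matches("They'd done", "They had done"))  # True
```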

[0089] In step S950, level three: special letters and characters that are easily misidentified.

[0090] The technical principle of this layer is to establish a morphological similarity decision tree to address the character-combination confusion that is unique to handwriting. A triple verification mechanism is introduced: character structure similarity analysis (e.g., the stroke characteristics of "rn" and "m"); contextual semantic validity verification; and subject-specific whitelist filtering (e.g., excluding false positives for the letter "g" in math problems).

[0091] The core rule matrix of the morphological similarity decision tree established at this level is shown in Table 1:

[0092] Table 1

For example, when the input is the general OCR recognition result "algebro" and the standard answer is "diego", the execution steps can be to first detect that the "al" combination is at the beginning of the word, then trigger the conversion of "al" to "d" to get "dgebro", and finally verify that "dgebro" is invalid, revert to the original result, and proceed to the next level of processing.
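The convert, validate, and revert behaviour in this example can be sketched as below. The contents of Table 1 are not available here, so `COMBO_RULES`, `VOCABULARY`, and `SUBJECT_WHITELIST` are purely hypothetical stand-ins; in particular, a small word set substitutes for contextual semantic validation.

```python
# Hypothetical confusable combinations (misread form, intended form).
COMBO_RULES = [("rn", "m"), ("al", "d"), ("cl", "d")]
# Hypothetical stand-in for contextual semantic validity verification.
VOCABULARY = {"modern", "diego", "summer"}
# Hypothetical subject whitelist: combinations never rewritten in a subject.
SUBJECT_WHITELIST = {"math": {"rn"}}


def resolve_confusable(ocr_word: str, subject: str) -> str:
    """Try each confusable rewrite; keep it only if the result is a valid word."""
    protected = SUBJECT_WHITELIST.get(subject, set())
    for wrong, right in COMBO_RULES:
        if wrong not in ocr_word or wrong in protected:
            continue
        candidate = ocr_word.replace(wrong, right)
        if candidate in VOCABULARY:
            return candidate
    # No rewrite produced a valid word: revert to the original OCR result.
    return ocr_word
```

Under these assumptions, "algebro" is tentatively rewritten to "dgebro", fails validation, and is returned unchanged so that the next level can process it.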

[0093] In step S960, level four: characters and symbols are uniformly downgraded.

[0094] The technical principle of this level is to improve the robustness of the system through a morphological normalization strategy. On the one hand, character set dimensionality reduction is applied, including full-width to half-width conversion and standardization of special fonts; on the other hand, similarity-based fuzzy matching is applied, which includes building a glyph confusion matrix with similarity groups such as I / l / 1.

[0095] For example, when the input is the general OCR recognition result "H3ll0 W0r1d", the steps can be to first normalize the case, then preserve the numeric symbols (to avoid disrupting the formula structure), and finally mark similar character groups. For example, "0 / o / O" are marked as the same group, and characters in the same group are allowed to be considered equivalent. Therefore, the result after performing this level of downgrading is "h3llowor1d".
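One possible way to realize normalization plus similarity groups is sketched below. It is an assumption-laden illustration: Unicode NFKC normalization is used here for full-width to half-width conversion, the groups in `CONFUSION_GROUPS` are hypothetical, and the text itself is left untouched while the comparison treats members of one group as equivalent.

```python
import unicodedata

# Hypothetical glyph confusion groups; the first member is the representative.
CONFUSION_GROUPS = [("o", "0"), ("l", "1", "i")]
CANONICAL = {ch: group[0] for group in CONFUSION_GROUPS for ch in group}


def normalize(text: str) -> str:
    """Fold full-width forms to half-width (NFKC) and normalize case."""
    return unicodedata.normalize("NFKC", text).lower()


def fuzzy_equal(ocr_text: str, answer: str) -> bool:
    """Compare two strings, treating characters of one group as equivalent."""
    def key(text: str) -> str:
        return "".join(CANONICAL.get(ch, ch) for ch in normalize(text))
    return key(ocr_text) == key(answer)
```

Because only the comparison key is rewritten, digits in the original OCR text are preserved, which is in line with the stated goal of not disrupting formula structure.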

[0096] In step S970, level five: real-time iterative optimization rules.

[0097] The technical principle of this level is to add a case processed by the feedback optimization rule module to the rule set for matching and verification.

[0098] Figure 10 shows a flowchart of the feedback optimization rule module in the application scenario. As shown in Figure 10, in step S1010, feedback data is collected (student appeals / teacher review results).

[0099] In step S1020, feedback data cleaning (noise removal + data volume support) is performed.

[0100] Data cleaning strategies may include the following noise filtering mechanisms: 1. Exclude isolated cases (error patterns with an occurrence frequency of <0.1%). 2. Exclude grading errors for which fewer than 200 different students and teachers provided feedback on the same test question.

[0101] For feedback data, error pattern generalization can also be used. For example, input example: the general OCR misrecognized the handwritten "t" as "seven". Output pattern: { "Error type": "Character morphology confusion", "Triggering condition": "Isolated letter and no Chinese characters in the context", "Modify rule": "t seven" }.

In step S1030, the rule generation engine (error pattern generalization + rule encoding) is used.
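The two noise filters can be sketched as follows, assuming each feedback record is a `(question_id, error_pattern, reporter_id)` tuple. The record layout, the function name, and the reading of the second filter as a minimum-support threshold are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (question_id, error_pattern, reporter_id)


def clean_feedback(records: Iterable[Record],
                   min_frequency: float = 0.001,
                   min_reporters: int = 200) -> Set[Tuple[str, str]]:
    """Return the (question_id, error_pattern) pairs that survive cleaning."""
    records = list(records)
    total = len(records)
    pattern_counts = defaultdict(int)
    reporters = defaultdict(set)
    for question_id, pattern, reporter_id in records:
        pattern_counts[pattern] += 1
        reporters[(question_id, pattern)].add(reporter_id)
    kept = set()
    for (question_id, pattern), who in reporters.items():
        # Filter 1: drop isolated patterns (below 0.1% of all feedback).
        frequent = pattern_counts[pattern] / total >= min_frequency
        # Filter 2: require enough distinct reporters for the same error.
        supported = len(who) >= min_reporters
        if frequent and supported:
            kept.add((question_id, pattern))
    return kept
```

Exposing both thresholds as parameters keeps the 0.1% and 200-reporter values from the text as defaults while allowing them to be tuned per deployment.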

[0102] The rule generation and encoding logic includes positive compensation rule generation logic, reverse defense rule generation logic, and subject constraint strategy generation logic.

[0103] The principle of the positive compensation rule generation logic is to use prior knowledge or statistical analysis to identify common error patterns in OCR and actively correct easily confused characters.

[0104] For example, to address misidentification caused by cursive handwriting or similar shapes, a one-to-one mapping can be designed, such as "r cursive → v". High-frequency errors can be located by combining character morphology analysis and language models; for example, when numbers and letters are mixed, the digits "1" and "0" in "10.00" might be recognized as "l" and "O" (the letters L and O). By predefining these compensation relationships, errors can be corrected directly during the OCR output stage. Forward rules often employ a two-way mapping mechanism: mapping from the OCR result to the correct character, and also mapping in reverse based on the character type of the expected answer; for example, when the answer is a letter, the OCR number "0" is mapped to "o".

[0105] Figure 11 shows a flowchart of the application method of the positive compensation rule in the application scenario. As shown in Figure 11, in step S1110, the OCR recognition result is input.

[0106] In step S1120, the forward rule chain mapping table is searched.

[0107] In step S1130, it is determined whether an error pattern is matched.

[0108] In step S1140, the original result is output.

[0109] In step S1150, character replacement is performed.

[0110] In step S1160, it is determined whether bidirectional mapping verification is enabled.

[0111] In step S1170, the replacement result is output.

[0112] In step S1180, the mapping is optimized according to the answer type.

[0113] In step S1190, the compensated result is output.

[0114] As can be seen, in the engineering implementation, positive compensation rules are encapsulated as rule objects. Each rule contains matching conditions (such as strings or regular expressions) and replacement logic (target characters). The rules are stored in a mapping table to support fast lookups and batch matching. Furthermore, a chain-of-responsibility pattern is employed, in which multiple rules are linked into a processing chain. Each rule object in turn judges and processes the OCR output; if a match is found, the replacement is executed, and a decision is made on whether to terminate the subsequent rules.

[0115] For example, when statistical results show that OCR often misidentifies the handwritten cursive "r" as "v", the mapping {"rv": "v"} is added (indicating that when "rv" appears in the context, it should be corrected to "v"); as another example, for the confusion between numbers and English letters, the mapping {"0": "o"} is established (correcting the number 0 to the letter o). In practical applications, mappings can also be generated automatically from a domain dictionary: for example, if it is recognized that "chest control" should be "thoracic cavity", then {"chest control": "thoracic cavity"} is added to the confusion dictionary, and subsequent occurrences of "chest control" are automatically replaced with "thoracic cavity". With such preset compensation rules, high-frequency errors such as the letters "l" and "O" appearing in place of digits in "10.00", or "r cursive → v", can be corrected directly at the OCR output stage, improving recognition accuracy.
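The rule-object and chain-of-responsibility structure described above might look like the following sketch. The class `CompensationRule`, its `stop_chain` flag, and `run_forward_chain` are hypothetical names; only the two mappings in the usage note come from the text.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CompensationRule:
    pattern: str              # matching condition (regular expression)
    replacement: str          # replacement logic (target characters)
    stop_chain: bool = False  # whether a hit terminates subsequent rules

    def apply(self, text: str) -> Tuple[str, bool]:
        """Return the rewritten text and whether this rule matched."""
        new_text, hits = re.subn(self.pattern, self.replacement, text)
        return new_text, hits > 0


def run_forward_chain(text: str,
                      rules: List[CompensationRule]) -> Tuple[str, List[str]]:
    """Pass the OCR output through each rule in order, recording the hits."""
    applied = []
    for rule in rules:
        text, hit = rule.apply(text)
        if hit:
            applied.append(rule.pattern)
            if rule.stop_chain:
                break
    return text, applied
```

For instance, a chain built from `CompensationRule("chest control", "thoracic cavity", stop_chain=True)` and `CompensationRule("0", "o")` rewrites the domain term and stops before the digit rule is consulted. Returning the list of applied patterns is a design choice that lets a later review stage see exactly which replacements were made.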

[0116] The principle of the reverse defense rule generation logic is to prevent incorrect corrections (erroneous rules). When user feedback shows that certain forward corrections should not actually occur, a "prohibited conversion" rule is defined: if the OCR output and the target both conform to a specific pattern, the replacement is blocked. For example, if a student reports that the automatic replacement "0 → o" causes errors in math problems, a reverse rule is generated to prohibit this conversion. In scenarios with strict semantics and formats, such as number strings and proper nouns, such rules can resist invalid changes. Reverse rules can be regarded as a blacklist or filtering condition: when a correction is marked as a misjudgment, the corresponding blocking rule is added to the rule library.

[0117] Figure 12 shows a flowchart of the application method of the reverse defense rule in the application scenario. As shown in Figure 12, in step S1210, the forward rule replacement result is received.

[0118] In step S1220, search for the reverse rule blacklist.

[0119] In step S1230, determine whether the prohibited condition is hit.

[0120] In step S1240, retain the replacement result.

[0121] In step S1250, roll back to the original OCR output.

[0122] In step S1260, record the interception log.

[0123] In step S1270, output the restored result.

[0124] As can be seen, in engineering, the reverse defense rules are also encapsulated as rule objects. Each rule contains matching conditions and prohibition logic (if a match is found, the replacement is rejected). All rules are uniformly maintained in the `forbidden_map` mapping table, supporting fast filtering and context constraint judgment. After the forward compensation rule chain performs the replacement, the replacement result is immediately submitted to the reverse rule chain for review. If a replacement matches the prohibition condition of a reverse rule, the rollback logic is triggered, restoring the original OCR output before the replacement and marking it as "intercepted". If no rule is matched, the subsequent process continues.

[0125] Based on feedback, it was discovered that the OCR was mistakenly converting the number "0" in mathematical formulas to the letter "O," and this correction was marked as incorrect by students. Therefore, a reverse rule was added to prevent "0→o." For example, when the OCR output "10.00" is incorrectly recognized as "lO," the forward rule might attempt to restore "lO" to "10," but the reverse rule can determine that the conversion between numbers and letters is unreliable in the current context, thus skipping the correction. Similarly, the system can prevent the incorrect replacement of lowercase letters with other characters, such as blocking the conversion of "x→×" in specific mathematical contexts. Through this rule, the system avoids introducing new misjudgments due to over-correction.
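A minimal sketch of the review-and-rollback step is shown below. The text names the `forbidden_map` table but not its layout, so the structure used here (subject mapped to a set of prohibited source/target pairs) and the function `review_replacement` are assumptions.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source characters, target characters)

# Hypothetical layout: subject -> replacements that must be blocked.
forbidden_map: Dict[str, Set[Pair]] = {"math": {("0", "o"), ("x", "×")}}


def review_replacement(original: str, replaced: str, applied: List[Pair],
                       subject: str, log: List[dict]) -> Tuple[str, str]:
    """Roll back to the original OCR output if any applied pair is prohibited."""
    prohibited = forbidden_map.get(subject, set())
    blocked = [pair for pair in applied if pair in prohibited]
    if blocked:
        # Record the interception, then restore the pre-replacement text.
        log.append({"original": original, "rejected": replaced, "blocked": blocked})
        return original, "intercepted"
    return replaced, "kept"
```

In this sketch a "0 → o" replacement is rolled back and logged for a math question, while the same replacement is retained for a subject that has no matching blacklist entry.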

[0126] The principle behind the subject-specific constraint strategy generation logic is that the strategy adjusts the execution of corresponding rules based on the symbol system of the subject to which the question belongs. Different subjects have different meanings for characters; some rules that are effective in one subject may cause errors in another. For example, in math problems, "x" is usually treated as a multiplication sign or an unknown and should not be treated as an English letter; numbers in chemical formulas represent subscripts and require special handling. When designing rules, it is advisable to maintain independent rule configurations or switches for each subject. In this way, OCR post-processing in different subject scenarios can isolate their respective rule sets, avoiding interference.

[0127] Figure 13 shows a flowchart of the application method of the subject constraint strategy in the application scenario. As shown in Figure 13, in step S1310, the subject type of the question is identified.

[0128] In step S1320, the discipline-specific rule chain is loaded.

[0129] In step S1330, the subject strategy execution rule chain is executed.

[0130] In step S1340, a positive compensation rule is applied.

[0131] In step S1350, the reverse defense rule is verified.

[0132] In step S1360, the final result is output.

[0133] In implementation, subject identifiers (question type tags) can be used to load or filter rule sets. One approach is to predefine a `subject_rules` dictionary that maps subject names to lists of enabled / disabled rules. Based on the input question type, the system enters the corresponding "sandbox" mode: in this mode, specific compensation or defense rules are enabled for that subject, and rules that conflict with other subjects are disabled. For example, the current subject is checked before the matching process: if it is mathematics, the conversion from English letters to mathematical symbols is excluded from the forward rule set; if it is chemistry, correction rules for chemical formula scripts are added. The matching process is as follows: first, the subject environment is determined and the rule set is adjusted; then, the remaining rules are executed according to the aforementioned forward / reverse process.

[0134] In mathematical scenarios, certain letter conversion rules are disabled to avoid misinterpreting mathematical symbols as English letters. For example, the misinterpretation of the letter "g" is handled through a subject-specific whitelist filtering mechanism, excluding cases where "g" is mistakenly identified as a non-symbol in mathematical mode. Similarly, to preserve the distinction between the multiplication sign "×" and the lowercase letter x, automatic conversion "x→×" is disabled. In chemistry scenarios, subscript validation rules for chemical formulas are activated: if the OCR output text contains an element symbol followed by digits, such as "H2O", the rule automatically converts the digits to the subscript form "H₂O" to conform to chemical formula standards (e.g., "H2O → H₂O", so that students do not lose points in their assignments). Furthermore, custom rule libraries are provided for subjects such as physics and English; for example, conversion rules between "Ω" and "O" are allowed in physics problems, while such rules are disabled in language arts scenarios.
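The subject "sandbox" can be sketched with a `subject_rules` dictionary as below. The rule names, the `BASE_RULES` list, and both functions are hypothetical; the subscript converter simply treats digits that directly follow a letter or a closing parenthesis as subscripts, which is a simplifying assumption.

```python
import re
from typing import Dict, List

# Hypothetical default rule names shared by all subjects.
BASE_RULES: List[str] = ["zero_to_letter_o", "x_to_times"]

subject_rules: Dict[str, Dict[str, List[str]]] = {
    "math": {"enabled": [], "disabled": ["zero_to_letter_o", "x_to_times"]},
    "chemistry": {"enabled": ["formula_subscript"], "disabled": []},
    "physics": {"enabled": ["omega_letter_o"], "disabled": []},
}


def load_rule_chain(subject: str) -> List[str]:
    """Drop rules disabled for the subject, then append its extra rules."""
    config = subject_rules.get(subject, {"enabled": [], "disabled": []})
    chain = [name for name in BASE_RULES if name not in config["disabled"]]
    chain += [name for name in config["enabled"] if name not in chain]
    return chain


SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def formula_subscript(text: str) -> str:
    """Convert digits that follow a letter or ')' to subscript digits."""
    return re.sub(r"(?<=[A-Za-z)])\d+",
                  lambda m: m.group(0).translate(SUBSCRIPTS), text)
```

Keeping the per-subject configuration in one dictionary means that adding a subject only requires a new entry, and an unknown subject falls back to the base rules.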

[0135] In step S1040, the rule distribution system and effect analysis (manual review + gray-scale verification / full deployment) are performed.

[0136] The grayscale verification phase includes: review of identical correction requests, in which 5% of historical correction requests are extracted and run through the new rule engine, and the difference in accuracy between the old and new rules is marked; and key sample capture, in which high-frequency correction items with a rule trigger rate >80% are manually labeled and verified (500 samples are checked daily).

[0137] The full deployment mechanism can gradually increase to 100% within 24 hours, and automatically revert to the stable version in case of anomalies; the long-term monitoring system can continuously track the rule hit rate after deployment, and manually review the accuracy rate by sampling different proportions of full data every month.
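The replay part of grayscale verification could be approximated as in the sketch below, which samples a fraction of historical requests and compares old and new engines. The function name, the `(ocr_text, expected_text)` record layout, and the use of exact-match accuracy are assumptions for illustration only.

```python
import random
from typing import Callable, List, Tuple


def replay_accuracy_delta(history: List[Tuple[str, str]],
                          old_engine: Callable[[str], str],
                          new_engine: Callable[[str], str],
                          fraction: float = 0.05,
                          seed: int = 0) -> float:
    """Return new-minus-old accuracy on a random sample of the history."""
    if not history:
        return 0.0
    rng = random.Random(seed)  # fixed seed keeps the replay reproducible
    size = max(1, round(len(history) * fraction))
    sample = rng.sample(history, size)
    old_hits = sum(old_engine(ocr) == expected for ocr, expected in sample)
    new_hits = sum(new_engine(ocr) == expected for ocr, expected in sample)
    return (new_hits - old_hits) / size
```

A negative return value would indicate that the new rules perform worse on the sample, which is the kind of anomaly that could trigger a revert to the stable version.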

[0138] In step S980, the output layer outputs the corrected result or the original result.

[0139] In step S850, the correction results are output.

[0140] After the corrected or original result is obtained, it can be graded against the standard answer to output the grading result.

[0141] In step S860, the feedback optimization rule module is used.

[0142] The execution method of this feedback optimization rule module is as shown in Figure 10 and will not be elaborated further here.

[0143] Based on this, this solution addresses the misidentification of cursive writing by implementing a five-level rule chain with two mechanisms. Hierarchical error interception: scanning noise is eliminated in the second and third levels, and a decision tree is established for easily confused characters. Short-circuit detection: when a certain level reaches the confidence threshold, subsequent processing is terminated to block the error propagation path.
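The short-circuit behaviour of the rule chain can be sketched as follows. The function `run_rule_chain` is hypothetical, and an exact match with the standard answer stands in for the confidence threshold, which the text does not define numerically.

```python
from typing import Callable, List, Optional, Tuple

Level = Callable[[str, str], str]  # (current text, standard answer) -> text


def run_rule_chain(ocr_text: str, answer: str,
                   levels: List[Level]) -> Tuple[str, Optional[int]]:
    """Run levels in order; stop at the first level whose output matches."""
    current = ocr_text
    for index, level in enumerate(levels, start=1):
        current = level(current, answer)
        if current == answer:
            # Short-circuit: later levels are never invoked.
            return current, index
    return current, None
```

Returning the index of the level that resolved the input makes it possible to see where in the chain a given error was intercepted.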

[0144] To address the issue of interdisciplinary grading, a subject-based constraint strategy is used to configure independent identifiers and verification parameter groups for different subjects; dynamic rules can be loaded to automatically switch rule chain engines such as mathematical formula detection and chemical formula analysis based on the question type.

[0145] To address the issue of lagging rule updates, a feedback-driven rule evolution is introduced: student error correction feedback is automatically converted into <error pattern, compensation rule> mapping pairs.

[0146] Regarding the necessity of technical implementation, the dynamic rule chain solves the problem of morphological differences between handwritten and printed characters (the morphology of handwritten r / v is highly overlapping); the feedback optimization mechanism addresses the unique error pattern evolution in educational scenarios (multiple new abbreviations / symbols emerge every year); and the subject constraint strategy meets the isolation requirements of multidisciplinary symbol systems and satisfies the differences in feature dimensions between mathematical symbol libraries and chemical formula parsers.

[0147] This solution overcomes the challenge of adaptability in general handwriting OCR for educational scenarios. It achieves layer-by-layer error filtering through dynamic rule chains and forms a self-evolution capability through feedback mechanisms, providing technical support for improving the accuracy of general handwriting OCR in intelligent education products and enabling rapid subject transfer.

[0148] Therefore, the system architecture of this solution consists of two core components or modules: Dynamic Rule Chain Engine: Proposes an interpretable post-processing method based on configurable rule chains to perform five-layer progressive correction processing on OCR results; Feedback Optimization Rule Module: Introduces a closed loop of student / teacher feedback, and uses feedback to evaluate the effectiveness of rules and update and iterate their implementation.

[0149] The exemplary embodiments disclosed herein provide the following benefits. Accuracy improvement: through real-time combination optimization of dynamic rule chains, the accuracy of general OCR post-processing for educational handwriting is improved, reducing the probability of misjudgment in scenarios such as cursive writing and confusion of subject symbols.

[0150] Efficiency breakthrough: The rule generation and update cycle is significantly shortened compared to the industry average, achieving near real-time rule iteration and meeting the needs of high-concurrency batch modification.

[0151] Cost advantage: The fully automated rule iteration mechanism significantly reduces manual intervention, resulting in an order-of-magnitude decrease in operation and maintenance costs.

[0152] Cross-disciplinary compatibility: Supports seamless adaptation to multi-disciplinary grading scenarios, and the adaptation cycle for adding new subjects is shortened to an extremely short time.

[0153] Furthermore, in an exemplary embodiment of this disclosure, a content correction device is also provided. Figure 14 shows a schematic diagram of the content correction device. As shown in Figure 14, the content correction device 1400 may include: a subject identification module 1410, a result adjustment module 1420, and a result correction module 1430. The subject identification module 1410 is configured to obtain the OCR recognition result of the target content to be corrected and the corresponding standard answer, and to determine the subject to which the target content belongs; the result adjustment module 1420 is configured to adjust the OCR recognition result according to the subject to obtain the adjustment result; and the result correction module 1430 is configured to correct the adjustment result according to the standard answer to obtain a corrected result.

[0154] In an exemplary embodiment of the present invention, the result adjustment module 1420 is configured to: Construct a morphology mapping table, which is used to store the mapping relationship between characters and letters; Based on the subject and the standard answer, the OCR recognition result is transformed according to the morphological mapping table to obtain the adjusted result.

[0155] In an exemplary embodiment of the present invention, the result adjustment module 1420 is configured to: Construct an extended rule base, which is used to store the mapping relationship between abbreviations and extended terms; The OCR recognition result is adjusted by extending the processing based on the subject and the extended rule base.

[0156] In an exemplary embodiment of the present invention, the result adjustment module 1420 is configured to: Construct a similarity decision tree, which is used to store the mapping relationship between characters with similar overall structure; The OCR recognition result is transformed and adjusted based on the subject and the similarity decision tree.

[0157] In an exemplary embodiment of the present invention, the result adjustment module 1420 is configured to: Construct a glyph confusion matrix, and adjust the OCR recognition result based on the subject and the glyph confusion matrix, wherein the glyph confusion matrix is used to store the mapping relationship between characters with similar glyph structures; and / or The adjusted result is obtained by normalizing the OCR recognition result according to the subject.

[0158] In an exemplary embodiment of the present invention, the content correction device 1400 is further configured to: Obtain the feedback result of the correction result and the corresponding positive compensation rule, and perform positive compensation processing on the adjustment result according to the feedback result and the positive compensation rule to obtain the compensation result; Obtain the reverse defense rules, and update the compensation result according to the reverse defense rules to obtain the target result of the target content.

[0159] In an exemplary embodiment of the present invention, the content correction device 1400 is further configured to: A set of feedback rules is constructed based on the correction results and the target results; The adjustment result is mapped according to the feedback rule set to obtain the mapped adjustment result, and then the mapped adjustment result is corrected.

[0160] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.

[0161] Figure 15 is a block diagram illustrating an electronic device 1500 according to an exemplary embodiment. As shown in Figure 15, the electronic device 1500 may include: a processor 1501 and a memory 1502. The electronic device 1500 may also include one or more of a multimedia component 1503, an input / output (I / O) interface 1504, and a communication component 1505.

[0162] The processor 1501 controls the overall operation of the electronic device 1500 to complete all or part of the steps in the content correction method described above. The memory 1502 stores various types of data to support the operation of the electronic device 1500. This data may include, for example, instructions for any application or method operating on the electronic device 1500, and application-related data such as contact data, sent and received messages, pictures, audio, video, etc. The memory 1502 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The multimedia component 1503 may include a screen and an audio component. The screen may be, for example, a touchscreen, and the audio component is used to output and / or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 1502 or transmitted via the communication component 1505. The audio component also includes at least one speaker for outputting audio signals. The I / O interface 1504 provides an interface between the processor 1501 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual or physical buttons. The communication component 1505 is used for wired or wireless communication between the electronic device 1500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, and so on, or a combination of one or more of them, which is not limited here. Therefore, the corresponding communication component 1505 may include: a Wi-Fi module, a Bluetooth module, an NFC module, etc.

[0163] In an exemplary embodiment, the electronic device 1500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the content correction method described above.

[0164] In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, which, when executed by a processor, implement the steps of the content correction method described above. For example, the computer-readable storage medium may be the memory 1502 including the program instructions described above, which may be executed by the processor 1501 of the electronic device 1500 to complete the content correction method described above.

[0165] Figure 16 is a block diagram illustrating an electronic device 1600 according to an exemplary embodiment. For example, the electronic device 1600 may be provided as a server. Referring to Figure 16, the electronic device 1600 includes one or more processors 1622 and a memory 1632 for storing computer programs executable by the processor 1622. The computer program stored in the memory 1632 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processor 1622 may be configured to execute the computer program to perform the aforementioned content correction method.

[0166] Additionally, the electronic device 1600 may also include a power supply component 1626 and a communication component 1650. The power supply component 1626 can be configured to perform power management of the electronic device 1600, and the communication component 1650 can be configured to enable communication of the electronic device 1600, such as wired or wireless communication. Furthermore, the electronic device 1600 may also include an input / output (I / O) interface 1658. The electronic device 1600 can operate on an operating system stored in memory 1632.

[0167] In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, which, when executed by a processor, implement the steps of the content correction method described above. For example, the non-transitory computer-readable storage medium may be the memory 1632 including the program instructions described above, which may be executed by the processor 1622 of the electronic device 1600 to complete the content correction method described above.

[0168] In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable device, the computer program having a code portion for performing the above-described content correction method when executed by the programmable device.

[0169] The preferred embodiments of this disclosure have been described in detail above with reference to the accompanying drawings. However, this disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of this disclosure, various simple modifications can be made to the technical solutions of this disclosure, and these simple modifications all fall within the protection scope of this disclosure.

[0170] It should also be noted that the various specific technical features described in the above embodiments can be combined in any suitable manner without contradiction. To avoid unnecessary repetition, this disclosure will not describe the various possible combinations separately.

[0171] Furthermore, the various embodiments of this disclosure can be combined in any way; as long as such combinations do not violate the spirit of this disclosure, they should also be regarded as content disclosed herein.

Claims

1. A content correction method, characterized in that, The method includes: Obtain the OCR recognition result of the target content to be corrected and the corresponding standard answer, and determine the subject to which the target content belongs; Adjust the OCR recognition result according to the subject to obtain the adjustment result; Correct the adjustment result according to the standard answer to obtain a corrected result.

2. The content correction method according to claim 1, characterized in that, The adjustment of the OCR recognition result according to the subject to obtain the adjustment result includes: Construct a morphology mapping table, which is used to store the mapping relationship between characters and letters; Based on the subject and the standard answer, the OCR recognition result is transformed according to the morphological mapping table to obtain the adjusted result.

3. The content correction method according to claim 1, characterized in that, The adjustment of the OCR recognition result according to the subject to obtain the adjustment result includes: Construct an extended rule base, which is used to store the mapping relationship between abbreviations and extended terms; The OCR recognition result is adjusted by extending the processing based on the subject and the extended rule base.

4. The content correction method according to claim 1, characterized in that, The adjustment of the OCR recognition result according to the subject to obtain the adjustment result includes: Construct a similarity decision tree, which is used to store the mapping relationship between characters with similar overall structure; The OCR recognition result is transformed and adjusted based on the subject and the similarity decision tree.

5. The content correction method according to claim 1, characterized in that, The adjustment of the OCR recognition result according to the subject to obtain the adjustment result includes: Construct a glyph confusion matrix, and adjust the OCR recognition result based on the subject and the glyph confusion matrix, wherein the glyph confusion matrix is used to store the mapping relationship between characters with similar glyph structures; and / or The adjusted result is obtained by normalizing the OCR recognition result according to the subject.

6. The content correction method according to claim 1, characterized in that, After correcting the adjustment result according to the standard answer to obtain the corrected result, the method further includes: Obtain the feedback result of the correction result and the corresponding positive compensation rule, and perform positive compensation processing on the adjustment result according to the feedback result and the positive compensation rule to obtain the compensation result; Obtain the reverse defense rules, and update the compensation result according to the reverse defense rules to obtain the target result of the target content.

7. The content correction method according to claim 6, characterized in that, After updating the compensation result according to the reverse defense rule to obtain the target result of the target content, the method further includes: A set of feedback rules is constructed based on the correction results and the target results; The adjustment result is mapped according to the feedback rule set to obtain the mapped adjustment result, and then the mapped adjustment result is corrected.

8. A content correction device, characterized in that it comprises: The subject identification module is configured to obtain the OCR recognition result of the target content to be corrected and the corresponding standard answer, and to determine the subject to which the target content belongs; The result adjustment module is configured to adjust the OCR recognition result according to the subject to obtain the adjustment result; The result correction module is configured to correct the adjustment result based on the standard answer to obtain a corrected result.

9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, When executed by a processor, the program implements the steps of the method described in any one of claims 1-7.

10. An electronic device, characterized in that it comprises: A memory on which computer programs are stored; A processor for executing the computer program in the memory to implement the steps of the method according to any one of claims 1-7.