An artificial intelligence-based written test data analysis system

The AI-based written test data analysis system solves the problem of misjudgment caused by illegible handwriting and character interference in written tests. By combining feature extraction, identification and classification, target analysis, and confusion response modules, it improves the accuracy of character recognition and the efficiency of data processing.

CN121482809B · Active · Publication Date: 2026-06-23 · BEIJING ZHIDIAN MIJIN EDUCATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING ZHIDIAN MIJIN EDUCATION TECHNOLOGY CO LTD
Filing Date: 2025-11-07
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing technologies struggle to handle complex situations such as illegible handwriting and interference between characters in written test data analysis, leading to misjudgments or omissions and reducing the accuracy of character recognition.

Method used

An AI-based written test data analysis system is adopted. The feature extraction module extracts character continuity features, the identification and classification module combines them with the axial offset of adjacent characters to calculate the character recognition degree characterization parameter, the target analysis module identifies compact character regions and analyzes their confusion intensity, and the confusion response module performs verification marking and optimization processing.

Benefits of technology

The system improves the accuracy of character recognition, enables structured management of written test data, and significantly improves the efficiency and reliability of large-scale written test data processing.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application relates to the field of written test data analysis, and in particular to an artificial intelligence-based written test data analysis system. The system comprises: a feature extraction module, which calls character scanning data of a plurality of target files to extract the character continuity features of the single-line characters of each target file; an identification and classification module, connected to the feature extraction module, which combines the character continuity features with the axial offset of adjacent characters in a single column of characters to calculate a character recognition degree characterization parameter for the target file and thereby classify the target file into a recognition category; a target analysis module, connected to the identification and classification module, which performs recognition analysis on the target file based on the recognition category; and a confusion response module, connected to the target analysis module, which performs verification marking on the compact character region in response to the determination result of the target analysis module. The present application handles character confusion efficiently, improves character recognition accuracy, and enables structured management of written test data.

Description

Technical Field

[0001] This invention relates to the field of written test data analysis, and more particularly to a written test data analysis system based on artificial intelligence.

Background Technology

[0002] In fields such as education and examinations, written tests serve as a core method for objectively measuring knowledge mastery and ability levels, with widespread application scenarios and continuously growing demand. As the number of participants grows rapidly, the volume of written test data increases exponentially. Traditional models relying on manual grading, data entry, and analysis can no longer meet the efficiency, accuracy, and structured processing requirements of modern written tests, which drives written test data analysis technology towards automation and intelligence.

[0003] When faced with diverse writing styles and complex paper layout interference in actual written test scenarios, the limitations of existing technology gradually become apparent. The rise of artificial intelligence technology has provided new solutions for written test data analysis: for the core preprocessing stage of written test data, it offers a complete technical approach to accurate character recognition and structured conversion, addressing the complex character shapes and paper layout interference found in actual written test scenarios.

[0004] Chinese Patent Application Publication No. CN118644362A discloses an intelligent automated marking method and system based on generative AI technology, belonging to the field of intelligent analysis technology. The method includes: a user entering a project exam, drawing questions according to preset configuration rules, generating and pushing the corresponding exam paper to the user; after the user submits the exam paper, obtaining the user's answer data; calling a preset generative AI model, assembling prompts, and requesting the generative AI model to perform marking analysis on the exam answer data, calculating the user's answer accuracy rate; calculating the user's current knowledge mastery level and progress statistics for the project according to preset project configuration, generating the final marking result; the above project configuration includes question drawing rule configuration and progress statistics rule configuration. This invention can achieve accurate and efficient automated marking, intuitively and comprehensively showcasing employees' knowledge mastery, thereby helping enterprises quickly train employees.

[0005] However, the following problems still exist in the existing technology.

[0006] Recognizing written test data from single-character morphological features alone makes it difficult to handle complex situations such as illegible handwriting and interference between characters in written tests, which can easily lead to misjudgments or omissions and reduce the accuracy of character recognition.

Summary of the Invention

[0007] To address this, the present invention provides an artificial intelligence-based written test data analysis system. It overcomes the problem in existing technologies that recognition of written test data relies on single-character morphological features, which makes it difficult to handle complex situations such as illegible handwriting and interference between characters in written tests, easily leads to misjudgments or omissions, and reduces the accuracy of character recognition.

[0008] To achieve the above objectives, the present invention provides an artificial intelligence-based written test data analysis system, comprising:

[0009] The feature extraction module is used to call character scanning data of several target files to extract the character continuity features of the corresponding single line characters of each target file. The character continuity features include the average gap distance between adjacent characters and the maximum difference in gap distance between characters.

[0010] The identification and classification module, which is connected to the feature extraction module, is used to combine the character continuity features and the axial offset of adjacent characters in a single column of characters to calculate the character recognition degree characterization parameters for the target file, so as to classify the corresponding target file into recognition categories.

[0011] A target analysis module, connected to the identification and classification module, is used to perform recognition analysis on the target file based on the recognition category, including:

[0012] The target file is divided into several character text regions. Based on the character compactness of each character text region, a character compact region is identified. Based on the overlapping area of characters in the character compact region and the proportion of ink halo range at the character edge, the confusion intensity characterization value of the character compact region is analyzed to determine whether the character compact region meets the confusion identification benchmark.

[0013] Alternatively, the character scan data corresponding to the target file can be entered into the scan database;

[0014] A confusion response module, connected to the target analysis module, performs verification marking on compact character regions in response to the determination result of the target analysis module.

[0015] Furthermore, the identification and classification module is used to calculate the character recognition degree characterization parameter for the target file, including:

[0016] The first character recognition feature is the sum of two ratios: the ratio of the average gap distance threshold to the average gap distance of adjacent characters in the single line of characters, and the ratio of the maximum difference in character gap distance to the maximum gap distance difference threshold.

[0017] The ratio of the axial offset of adjacent characters in a single column to the axial offset threshold is used as the second character recognition feature.

[0018] The first character recognition feature and the second character recognition feature are weighted and summed to determine the character recognition degree characterization parameter.

[0019] Furthermore, the identification and classification module is used to classify the corresponding target file into identification categories, including:

[0020] If the character recognition degree characterization parameter of the target file is greater than or equal to the character recognition degree characterization parameter threshold, the target file is classified into the low-recognition category;

[0021] If the character recognition degree characterization parameter of the target file is less than the character recognition degree characterization parameter threshold, the target file is classified into the high-recognition category.

[0022] Furthermore, the target analysis module is used to identify and analyze the target file, including:

[0023] If the target file is in the low-recognition category, the target file is divided into several character text regions. Based on the character compactness of each character text region, the character compact region is identified. Based on the overlapping area of characters in the character compact region and the proportion of ink halo range at the character edge, the confusion intensity characterization value of the character compact region is analyzed to determine whether the character compact region meets the confusion identification benchmark.

[0024] If the target file is of a high-recognition category, the character scan data corresponding to the target file will be entered into the scan database.

[0025] Furthermore, the target analysis module is used to identify compact regions of characters, including:

[0026] If any character text region has a character compactness greater than the character compactness threshold, then the character text region is identified as the character compact region.

[0027] Furthermore, the target analysis module is used to analyze the confusion intensity characterization value of the compact character region, including:

[0028] The ratio of the overlapping area of characters to the overlapping area threshold is used as the first confusion intensity feature;

[0029] The ratio of the proportion of ink halo range at the character edge to the threshold of the proportion of ink halo range at the character edge is used as the second confusion intensity feature;

[0030] The sum of the first confusion intensity feature and the second confusion intensity feature is used as the confusion intensity characterization value.

[0031] Furthermore, the target analysis module is used to determine whether the compact character region meets the confusion identification benchmark, including:

[0032] If the confusion intensity characterization value of a compact character region is greater than or equal to the confusion intensity characterization threshold, then the compact character region is determined to not meet the confusion identification benchmark.

[0033] If the confusion intensity characterization value of a compact character region is less than the confusion intensity characterization threshold, then the compact character region is determined to meet the confusion identification benchmark.
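The confusion intensity calculation and the benchmark determination described in paragraphs [0028] to [0033] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and all numeric values in the usage note are hypothetical, since the text does not give concrete threshold values for these quantities.

```python
def confusion_intensity(overlap_area, halo_ratio,
                        overlap_area_threshold, halo_ratio_threshold):
    """Sum of the two threshold-normalized confusion intensity features."""
    # First feature: character overlapping area relative to its threshold.
    first_feature = overlap_area / overlap_area_threshold
    # Second feature: edge ink halo proportion relative to its threshold.
    second_feature = halo_ratio / halo_ratio_threshold
    return first_feature + second_feature


def meets_confusion_benchmark(intensity, intensity_threshold):
    """Per the text, a value at or above the threshold does NOT meet the benchmark."""
    return intensity < intensity_threshold
```

For instance, with an assumed overlapping area of 12 pixels against a threshold of 20 and an assumed halo proportion of 0.15 against a threshold of 0.25, both features are about 0.6 and the characterization value is about 1.2; against an assumed characterization threshold of 1.5 the region would meet the benchmark and be passed on for character matching.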

[0034] Furthermore, the confusion response module responds to the determination result of the target analysis module, including:

[0035] If the target analysis module determines that the compact character region meets the confusion identification benchmark, the confusion response module compares the characters in the compact character region against the character database to determine their matching degree, locks the confused characters, and determines whether to perform elimination optimization verification on each confused character based on the area of the distorted region between the confused character and its close sample.

[0036] If the target analysis module determines that the compact character region does not meet the confusion identification benchmark, the compact character region is marked.

[0037] Furthermore, the confusion response module is used to lock confused characters, including:

[0038] Matching the close sample corresponding to each of several characters within the compact character region;

[0039] Identifying confused characters based on the matching degree between each character and its corresponding close sample;

[0040] If the matching degree between any character and its corresponding close sample is less than the matching degree threshold, that character is locked as a confused character.

[0041] Furthermore, the confusion response module is used to determine whether to perform elimination optimization verification on the confused characters, including:

[0042] If the area of the distorted region between the confused character and its close sample is less than the distorted region area threshold, it is determined that elimination optimization verification is to be performed on the confused character.

[0043] The elimination optimization verification includes recognizing the character after eliminating the ink halo range and performing contextual verification.
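The two decisions made by the response module, locking characters whose matching degree falls below a threshold and then selecting those with a small distorted region for elimination optimization verification, can be sketched as below. The `CharMatch` record, the function names, and the threshold values in the usage note are hypothetical; the text only specifies the comparisons.

```python
from dataclasses import dataclass


@dataclass
class CharMatch:
    char_id: int            # hypothetical identifier of a character in the region
    matching_degree: float  # matching degree against the character's close sample
    distorted_area: int     # distorted region area versus the close sample, in pixels


def lock_confused(chars, matching_threshold):
    """Lock every character whose matching degree is below the threshold."""
    return [c for c in chars if c.matching_degree < matching_threshold]


def needs_elimination_verification(char, distorted_area_threshold):
    """Verification is performed only when the distorted region is small enough."""
    return char.distorted_area < distorted_area_threshold
```

With an assumed matching threshold of 0.8 and an assumed area threshold of 30 pixels, a character with matching degree 0.6 and a distorted area of 12 pixels would be locked and sent to elimination optimization verification, while one with matching degree 0.95 would not be locked at all.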

[0044] Compared with existing technologies, this invention calls the character scanning data of several target files to extract the character continuity features of the single-line characters of each target file; combines the character continuity features with the axial offset of adjacent characters in a single column of characters to calculate the character recognition degree characterization parameter for each target file, thereby classifying the target file into a recognition category; performs recognition analysis on the target file based on the recognition category; and, in response to the result of the recognition analysis, performs verification marking on compact character regions. This invention efficiently handles character confusion, improves character recognition accuracy, and enables structured management of written test data.

[0045] In particular, this invention incorporates an identification and classification module that focuses on the spatial relationship features of characters in written test data. This module captures the character spacing anomalies common in written tests, avoiding misidentification due to irregular character arrangement and improving the initial identification accuracy of low-quality handwritten text. It uses the average gap distance between adjacent characters in a single line to quantify how compact or dispersed the overall character arrangement is, avoiding misjudgments of overall recognition caused by test-taker writing habits. It also uses the maximum difference in character gap distance to capture abnormal local fluctuations in character arrangement, supplementing the local features that an average distance alone cannot cover and thus improving the identification accuracy of irregular handwritten text. Furthermore, it introduces the axial offset of adjacent characters in a single column to cover vertical arrangement issues, filling the gap left by traditional techniques that focus only on horizontal features and reducing identification omissions caused by vertical offset. This invention therefore uses the character recognition degree characterization parameter to characterize the clarity of characters in the target file, quantifying the combined impact of character arrangement regularity and axial alignment on recognition difficulty and providing a quantitative basis for the subsequent classification into recognition categories. In batch data processing scenarios for large-scale written tests, such as campus recruitment written tests and qualification examinations, this can significantly improve the overall processing efficiency of the system.

[0046] In particular, in real-world scenarios involving character overlap and ink halo interference, to avoid misjudgment based on a single feature and thus improve the targeting of abnormal character processing, this invention sets up a target analysis module. For target files with low recognition categories, it first filters areas requiring key attention based on character compactness, excluding areas with loose character arrangement and no risk of confusion, concentrating computing power on high-risk areas, significantly reducing the system's processing load. Then, it combines the character overlap area and the proportion of edge ink halo range to analyze the confusion intensity characterization value, forming a multi-dimensional quantitative assessment of the degree of character confusion. Specifically, the character overlap area quantifies the physical degree of character intersection, reflecting the degree to which overlap hinders recognition; the proportion of edge ink halo range captures the blurring of character edges caused by ink bleeding, quantifying the degree of damage caused by ink bleeding to character edges, thereby covering most character confusion scenarios in written tests and improving the comprehensiveness of confusion judgment. Therefore, this invention analyzes the confusion intensity characterization value to represent the degree of recognition confusion caused by character interference, and quantifies the comprehensive negative impact of interference factors on character recognizability, forming an assessment of the overall confusion of compact character regions. This further reflects the difficulty of recognizing compact character regions and provides data support for subsequent determination of whether compact character regions meet the confusion identification benchmark. This invention can improve the targeting of abnormal character processing.

[0047] In particular, this invention incorporates a confusion response module to optimize compact character regions that meet the confusion identification benchmark. Optimization is performed only on characters with a low matching degree to their close samples and a small distorted region area, balancing correction accuracy against processing efficiency to achieve precise identification and differentiated optimization verification of confused characters. Contextual verification is also integrated into the process, supplementing the judgment criteria at the overall semantic level and further confirming character content through contextual logic. This dual protection significantly reduces the recognition error of confused characters and improves the credibility of written test data.

Attached Figure Description

[0048] Figure 1 is a functional block diagram of an artificial intelligence-based written test data analysis system according to an embodiment of the invention;

[0049] Figure 2 is a logic decision diagram for classifying the corresponding target files into recognition categories according to an embodiment of the invention;

[0050] Figure 3 is a logic diagram for determining whether a compact character region meets the confusion identification benchmark according to an embodiment of the invention;

[0051] Figure 4 is a logic diagram for determining whether to perform elimination optimization verification on confused characters according to an embodiment of the invention.

Detailed Implementation

[0052] To make the objectives and advantages of the present invention clearer, the present invention will be further described below with reference to embodiments; it should be understood that the specific embodiments described herein are merely for explaining the present invention and are not intended to limit the present invention.

[0053] Preferred embodiments of the present invention will now be described with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are merely illustrative of the technical principles of the present invention and are not intended to limit the scope of protection of the present invention.

[0054] It should be noted that in the description of this invention, the terms "upper", "lower", "left", "right", "inner", "outer", etc., which indicate directions or positional relationships, are based on the directions or positional relationships shown in the accompanying drawings. This is only for the convenience of description and is not intended to indicate or imply that the device or element must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, it should not be construed as a limitation of this invention.

[0055] Furthermore, it should be noted that, in the description of this invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0056] Referring to Figure 1, which is a functional block diagram of an artificial intelligence-based written test data analysis system according to an embodiment of the present invention, the system includes:

[0057] The feature extraction module is used to call character scanning data of several target files to extract the character continuity features of the corresponding single line characters of each target file. The character continuity features include the average gap distance between adjacent characters and the maximum difference in gap distance between characters.

[0058] The identification and classification module, which is connected to the feature extraction module, is used to combine the character continuity features and the axial offset of adjacent characters in a single column of characters to calculate the character recognition degree characterization parameters for the target file, so as to classify the corresponding target file into recognition categories.

[0059] A target analysis module, connected to the identification and classification module, is used to perform recognition analysis on the target file based on the recognition category, including:

[0060] The target file is divided into several character text regions. Based on the character compactness of each character text region, a character compact region is identified. Based on the overlapping area of characters in the character compact region and the proportion of ink halo range at the character edge, the confusion intensity characterization value of the character compact region is analyzed to determine whether the character compact region meets the confusion identification benchmark.

[0061] Alternatively, the character scan data corresponding to the target file can be entered into the scan database;

[0062] A confusion response module, connected to the target analysis module, performs verification marking on compact character regions in response to the determination result of the target analysis module.

[0063] Specifically, the character scanning data includes the character continuity features of a single line of characters, the axial offset of adjacent characters in a single column of characters, the character compactness of the character text region, the overlapping area of characters within the compact character region, the proportion of ink halo range at the character edge, the matching degree of characters within the compact character region, and the area of the distorted region between confused characters and their close samples, among others.

[0064] It will be understood that relevant data can only be collected from the target file after basic scanning and preprocessing, which ensure data validity. In practice, a high-resolution scanner converts the paper test document into a digital image, for example in TIFF format, preserving character edge details and ink gradation. The image is then converted into a black-and-white binary image to eliminate interference from the paper background color, stains, and the like, retaining only character pixels and background pixels. A Gaussian filtering algorithm removes noise generated during scanning while preserving character edges, such as ink gradation areas. Furthermore, so that features such as the axial offset of adjacent characters in a single column can be acquired reliably, a Hough transform can be used to detect the tilt angle of the text lines and automatically correct the tilt of the test paper; this will not be elaborated further.

[0065] Specifically, the method for acquiring the character continuity features of a single line of characters is not specifically limited. A connected component analysis algorithm can be used to identify the independent connected region of each character within the line and to mark the horizontal boundary coordinates of each character, i.e., the minimum X-coordinate of the left edge and the maximum X-coordinate of the right edge. For example, the character "A" has left edge X=100 and right edge X=120, and the next character "B" has left edge X=125 and right edge X=145, so the gap distance between the two characters is 125-120=5 (pixels). The gap distances of all adjacent characters within the line are then collected, and their average is taken as the average gap distance. The maximum and minimum of all gap distances in the line are identified, and the difference between them is taken as the maximum difference in gap distance.
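The gap distance calculation in this paragraph can be illustrated with a short sketch. It assumes the horizontal boundaries of each character have already been obtained (for example by connected component analysis) and are passed in as `(left_x, right_x)` pairs; the function name and the third character in the usage note are hypothetical.

```python
def gap_features(boxes):
    """Return (average gap distance, maximum difference in gap distance).

    boxes: (left_x, right_x) pixel boundaries of each character in one line.
    """
    ordered = sorted(boxes)  # left-to-right reading order
    # Gap = left edge of the next character minus right edge of the current one.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("at least two characters are needed to measure a gap")
    average_gap = sum(gaps) / len(gaps)
    max_gap_difference = max(gaps) - min(gaps)
    return average_gap, max_gap_difference
```

Using the characters "A" (100, 120) and "B" (125, 145) from the paragraph plus an assumed third character at (153, 170), the gaps are 5 and 8 pixels, giving an average gap distance of 6.5 and a maximum difference of 3.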

[0066] The vertical boundary coordinates of each character, i.e., the minimum Y-coordinate of its top edge and the maximum Y-coordinate of its bottom edge, are marked using connected component analysis. A vertical reference point is defined for each character, for example the Y-coordinate of the character's vertical center point, i.e., (top edge Y + bottom edge Y) / 2. Then, taking the vertical reference point of the first character in a column as the reference Y0, the absolute value of the deviation between the vertical reference point of each subsequent character and Y0 is calculated. For example, if the vertical reference point of character 2 is Y = 200 and Y0 = 180, the offset is 20. The average of the offsets of all characters in the column is then taken as the axial offset of adjacent characters.
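A minimal sketch of the axial offset calculation follows, assuming the vertical boundaries of each character are already available as `(top_y, bottom_y)` pairs. The function name and the sample coordinates in the usage note are illustrative only.

```python
def axial_offset(boxes):
    """Mean absolute deviation of vertical centers from the first character's center.

    boxes: (top_y, bottom_y) pixel boundaries of each character in one column,
    with the reference character first.
    """
    centers = [(top + bottom) / 2 for top, bottom in boxes]
    if len(centers) < 2:
        raise ValueError("at least two characters are needed to measure an offset")
    y0 = centers[0]  # vertical reference point of the first character
    offsets = [abs(center - y0) for center in centers[1:]]
    return sum(offsets) / len(offsets)
```

For a first character spanning Y = 170 to 190 (Y0 = 180) and a second spanning 190 to 210 (center 200), the single offset is 20, matching the example in the paragraph; adding an assumed third character spanning 172 to 192 (offset 2) gives an axial offset of 11.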

[0067] Specifically, there is no specific limitation on the method of acquiring the character compactness of a character text region. The total number of characters in each character text region can be counted by connected component analysis. The area of the rectangle obtained after merging the windows is taken as the region area, and the ratio of the total number of characters to the region area is taken as the character compactness.

[0068] The character text regions are divided using a sliding window method. A fixed-size window is used, for example 50×50 pixels, which can be adjusted according to the font size. The window slides across the preprocessed image, and each window that contains character pixels is determined to be part of a character text region. Adjacent dense windows are then merged to form a continuous character text region, such as a passage of answer content for a subjective question, so that window boundaries do not cut through complete text. This will not be elaborated further.
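The window-based region division and the compactness ratio can be sketched in pure Python as below. Several details are assumptions rather than statements from the text: the window advances in non-overlapping steps equal to its size, "adjacent" means sharing an edge, the merged region is the bounding rectangle of the grouped windows, and the character count is supplied by a separate connected component step.

```python
def dense_windows(image, win=50):
    """Grid indices (row, col) of windows that contain at least one character pixel.

    image: list of rows, 1 = character pixel, 0 = background.
    """
    rows, cols = len(image), len(image[0])
    hits = set()
    for wy in range(0, rows, win):
        for wx in range(0, cols, win):
            if any(image[y][x]
                   for y in range(wy, min(wy + win, rows))
                   for x in range(wx, min(wx + win, cols))):
                hits.add((wy // win, wx // win))
    return hits


def merge_windows(hits, win=50):
    """Group edge-adjacent windows; return bounding rectangles (x0, y0, x1, y1)."""
    remaining = set(hits)
    regions = []
    while remaining:
        stack = [remaining.pop()]
        group = []
        while stack:  # flood fill over the window grid
            r, c = stack.pop()
            group.append((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        rs = [r for r, _ in group]
        cs = [c for _, c in group]
        regions.append((min(cs) * win, min(rs) * win,
                        (max(cs) + 1) * win, (max(rs) + 1) * win))
    return sorted(regions)


def compactness(char_count, region):
    """Character compactness: number of characters divided by region area."""
    x0, y0, x1, y1 = region
    return char_count / ((x1 - x0) * (y1 - y0))
```

A region whose compactness exceeds the character compactness threshold would then be treated as a character compact region. Note that the bounding rectangle can extend past the image edge when the image size is not a multiple of the window size; a fuller implementation would clip it.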

[0069] Specifically, there is no specific limitation on the method for collecting the overlapping area of characters within a compact character region. The number of overlapping pixels in the connected regions of adjacent characters can be calculated through pixel-level intersection analysis. The overlapping area is the product of the number of overlapping pixels and the area of a single pixel, which will not be elaborated further.
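The pixel-level intersection can be written as a set operation. This sketch assumes each character's region is already available as a collection of `(x, y)` pixel coordinates; how the regions of touching characters are separated is not stated in the text, so that step is left outside the example.

```python
def overlap_area(pixels_a, pixels_b, pixel_area=1.0):
    """Overlapping area of two character regions.

    pixels_a, pixels_b: iterables of (x, y) pixel coordinates per character.
    pixel_area: area of a single pixel (1.0 gives the area in pixels).
    """
    overlapping_pixels = set(pixels_a) & set(pixels_b)
    return len(overlapping_pixels) * pixel_area
```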

[0070] Specifically, there is no specific limitation on the acquisition method of the proportion of ink halo range at the character edge. By statistically analyzing the total pixel area of a single character, i.e., all pixels in the connected domain, and the pixel area of the ink halo, i.e., the pixels in the gray-scale transition area at the character edge, the ratio of the ink halo pixel area to the total pixel area is taken as the proportion of ink halo range. The semi-transparent transition pixel area at the character edge in the binarized image is taken as the ink halo area.
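One way to picture the halo proportion is to classify the gray levels of a character's pixels into a solid core and a transition band. The two cut-off values below (`core_max`, `halo_max`) are hypothetical; the text does not say how the gray-scale transition area is delimited.

```python
def ink_halo_ratio(gray_values, core_max=80, halo_max=200):
    """Proportion of a character's pixels that fall in the gray transition band.

    gray_values: gray levels (0 = black, 255 = white) of all pixels in one
    character's connected domain. core_max and halo_max are assumed cut-offs:
    values above core_max and up to halo_max count as ink halo.
    """
    if not gray_values:
        raise ValueError("the character has no pixels")
    halo_pixels = sum(1 for g in gray_values if core_max < g <= halo_max)
    return halo_pixels / len(gray_values)
```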

[0071] Specifically, the matching degree of characters within a compact character region is acquired by comparing the characters against a pre-constructed standard character sample database. The contour coordinates of the characters within the compact character region are extracted using an edge detection algorithm, and each character's contour coordinates are compared with those of its close sample in the standard character sample database using Euclidean distance to determine the matching degree. The standard character sample database includes the standard contour coordinates of several character samples.
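The text names Euclidean distance but does not say how a distance becomes a matching degree, so the sketch below makes two labeled assumptions: contours are resampled to the same number of points before comparison, and the matching degree is `1 / (1 + mean distance)`, which equals 1 for identical contours and falls towards 0 as they diverge. The function names and sample database layout are hypothetical.

```python
import math


def contour_distance(contour_a, contour_b):
    """Mean point-to-point Euclidean distance between two equal-length contours."""
    if len(contour_a) != len(contour_b):
        raise ValueError("contours must be resampled to the same number of points")
    total = sum(math.dist(p, q) for p, q in zip(contour_a, contour_b))
    return total / len(contour_a)


def closest_sample(contour, samples):
    """Return (label, matching_degree) of the closest sample.

    samples: dict mapping a character label to its standard contour.
    The distance-to-degree mapping is an assumption, not taken from the text.
    """
    label = min(samples, key=lambda k: contour_distance(contour, samples[k]))
    distance = contour_distance(contour, samples[label])
    return label, 1.0 / (1.0 + distance)
```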

[0072] Specifically, the method for acquiring the area of the distorted region between the confused character and the corresponding close sample is not specifically limited. The confused character and the corresponding close sample can be compared using pixel-level difference analysis. The binarized images of the two characters are superimposed, and the non-overlapping area of the two binarized images is taken as the distorted region. The number of pixels in the distorted region is then determined as the area of the distorted region.
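Superimposing two binarized images and counting the non-overlapping pixels amounts to a pixel-wise exclusive-or. The sketch assumes the two masks have already been scaled and aligned to the same size, which the text does not detail.

```python
def distorted_area(mask_a, mask_b):
    """Number of pixels set in exactly one of two equal-size binary masks.

    mask_a, mask_b: lists of rows of 0/1 values for the confused character
    and its close sample.
    """
    if len(mask_a) != len(mask_b) or any(
            len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same dimensions")
    return sum(pa != pb
               for ra, rb in zip(mask_a, mask_b)
               for pa, pb in zip(ra, rb))
```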

[0073] Specifically, there are no restrictions on the specific structure of the feature extraction module, the identification and classification module, the target analysis module, and the confusion response module. Each module or its units can be composed of logic components or combinations of logic components. Logic components include field-programmable processors, computers, or microprocessors in computers.

[0074] Specifically, the identification and classification module is used to calculate the character recognition degree characterization parameter for the target file, including:

[0075] The first character recognition feature is the sum of two ratios: the ratio of the average gap distance threshold to the average gap distance of adjacent characters in the single line of characters, and the ratio of the maximum difference in character gap distance to the maximum gap distance difference threshold.

[0076] The ratio of the axial offset of adjacent characters in a single column to the axial offset threshold is used as the second character recognition feature.

[0077] The first character recognition feature and the second character recognition feature are weighted and summed to determine the character recognition degree characterization parameter.

[0078] Specifically, the two features involved in character continuity focus on the horizontal arrangement pattern of characters in a single line, directly determining whether characters are overly compact, scattered, or have abnormal local spacing. Furthermore, abnormal horizontal spacing is the primary factor causing recognition confusion in written tests; for example, characters sticking together in illegible handwriting or spacing fluctuations caused by irregular writing directly interfere with character boundary recognition. The vertical axial offset of a single column of characters reflects the alignment of the characters' vertical arrangement, only affecting the overall regularity of the character arrangement and not directly disrupting the character's boundary and shape. The interference of vertical offset on recognition can be corrected through subsequent image processing, and the probability of misjudgment caused by vertical offset in actual written tests is far lower than that of abnormal horizontal spacing. Therefore, the first character recognition feature calculated based on character continuity is given a higher weight coefficient, set to 0.6, and correspondingly, the weight coefficient of the second character recognition feature calculated based on the axial offset of adjacent characters in a single column is set to 0.4.
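
The calculation in paragraphs [0075] to [0078] can be sketched as below, using the weight coefficients 0.6 and 0.4 stated above. The function and argument names are hypothetical, and the inputs are assumed to be positive measurements in consistent units.

```python
W_CONTINUITY = 0.6  # weight of the first character recognition feature
W_AXIAL = 0.4       # weight of the second character recognition feature


def recognition_parameter(avg_gap, max_gap_diff, axial_offset,
                          avg_gap_thr, max_gap_diff_thr, axial_offset_thr):
    """Weighted sum of the two character recognition features."""
    # First feature: note the first ratio is threshold / measured value,
    # so a smaller average gap (denser writing) increases the feature.
    first = avg_gap_thr / avg_gap + max_gap_diff / max_gap_diff_thr
    # Second feature: measured axial offset relative to its threshold.
    second = axial_offset / axial_offset_thr
    return W_CONTINUITY * first + W_AXIAL * second


# A file whose measurements sit exactly on the three thresholds.
boundary = recognition_parameter(9.0, 5.5, 2.3, 9.0, 5.5, 2.3)
# Denser writing with larger spacing fluctuation and offset.
dense = recognition_parameter(6.0, 8.0, 3.0, 9.0, 5.5, 2.3)
```

A larger parameter value corresponds to lower legibility, which is consistent with the classification rule in which values at or above the threshold fall into the low-recognition category.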

[0079] In this embodiment, the average gap distance threshold, the maximum gap distance difference threshold, and the axial offset threshold are set to characterize situations where the legibility of characters in the target file is low. Historical character scanning data of several target files that have completed the marking process is acquired, and the average gap distance data, the maximum gap distance difference data, and the axial offset data of adjacent characters are retrieved. The mean average gap distance, the mean maximum gap distance difference, and the mean axial offset are then calculated and used as baseline values under normal conditions. In line with the purpose of the three thresholds, the average gap distance threshold is determined as the product of the mean average gap distance and the first offset coefficient; the maximum gap distance difference threshold is determined as the product of the mean maximum gap distance difference and the second offset coefficient; and the axial offset threshold is determined as the product of the mean axial offset and the third offset coefficient. The first offset coefficient is selected within the interval [0.9, 0.95], preferably 0.9; the second offset coefficient is selected within the interval [1.1, 1.15], preferably 1.1; and the third offset coefficient is selected within the interval [1.15, 1.2], preferably 1.15.
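
A minimal sketch of this threshold derivation, assuming the historical values are available as plain lists of numbers. The default coefficients are the preferred values given above; `derive_thresholds` and the dictionary keys are hypothetical names.

```python
from statistics import mean


def derive_thresholds(hist_avg_gaps, hist_max_gap_diffs, hist_axial_offsets,
                      c1=0.9, c2=1.1, c3=1.15):
    """Scale each historical mean by its offset coefficient."""
    return {
        "avg_gap": mean(hist_avg_gaps) * c1,
        "max_gap_diff": mean(hist_max_gap_diffs) * c2,
        "axial_offset": mean(hist_axial_offsets) * c3,
    }


thresholds = derive_thresholds([10, 10], [4, 6], [2, 2])
```

The first coefficient is below 1 while the other two are above 1, which matches the direction of each ratio: a small average gap and a large spacing difference or axial offset both indicate reduced legibility.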

[0080] Specifically, this invention includes an identification and segmentation module that focuses on the spatial relationship features of characters in written test data. This module accurately captures common character spacing anomalies in written tests, such as characters written too compactly or too far apart in a candidate's handwriting, which avoids misidentification due to irregular character arrangement and improves the initial recognition accuracy of low-quality handwritten text, such as illegible handwriting. The average gap distance between adjacent characters in a single line quantifies the overall compactness or dispersion of the character arrangement, avoiding misjudgments of overall recognition caused by candidate writing habits, such as dense handwriting or excessive spacing. The maximum difference in gap distance between characters captures abnormal fluctuations in local character arrangement, such as two characters being too close while other spacing is normal, supplementing local features that the average distance alone cannot cover and thus improving the recognition accuracy of irregular handwritten text. Furthermore, the axial offset of adjacent characters in a single column covers vertical arrangement issues, such as overall character tilt and vertical alignment deviations, filling the gap left by traditional technologies that focus only on horizontal features and reducing recognition omissions caused by vertical offset. Therefore, this invention uses the character recognition degree representation parameter to characterize the clarity of characters in the target file, quantifies the combined impact of character arrangement regularity and axial alignment on recognition difficulty, and provides a quantitative basis for the subsequent classification of recognition categories. In batch data processing scenarios for large-scale written tests, such as campus recruitment written tests and qualification examinations, this can significantly improve the overall processing efficiency of the system.

[0081] Specifically, please refer to Figure 2, which is a logical decision diagram for classifying the recognition category of the corresponding target file according to an embodiment of the present invention. The identification and segmentation module is used to classify the recognition category of the corresponding target file, including:

[0082] If the character recognition degree representation parameter of the target file is greater than or equal to the character recognition degree representation parameter threshold, then the target file is classified as the low-recognition category;

[0083] If the character recognition degree representation parameter of the target file is less than the character recognition degree representation parameter threshold, then the target file is classified as the high-recognition category.

[0084] The character recognition degree representation parameter threshold is predetermined. It is the value of the character recognition degree representation parameter calculated under the following conditions: the average gap distance of adjacent characters in a single line is equal to the average gap distance threshold; the maximum difference in gap distance between characters is equal to the maximum gap distance difference threshold; and the axial offset of adjacent characters in a single column is equal to the axial offset threshold.
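
Under these boundary conditions every ratio equals 1, so the first character recognition feature is 1 + 1 = 2 and the second is 1. A small sketch of the resulting threshold and the classification rule follows, assuming the 0.6 and 0.4 weight coefficients of this embodiment; the function names and the category labels `"low"` and `"high"` are hypothetical.

```python
def recognition_threshold(w_first=0.6, w_second=0.4):
    """Parameter value when every measurement equals its threshold."""
    # First feature = 1 + 1 (two ratios of 1); second feature = 1.
    return w_first * (1.0 + 1.0) + w_second * 1.0


def classify(parameter, threshold=None):
    """Return 'low' for the low-recognition category, otherwise 'high'."""
    if threshold is None:
        threshold = recognition_threshold()
    return "low" if parameter >= threshold else "high"


threshold = recognition_threshold()
```

With these weights the threshold works out to 0.6 × 2 + 0.4 × 1 = 1.6.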

[0085] Specifically, the target analysis module is used to identify and analyze the target file, including:

[0086] If the target file is of the low-recognition category, the target file is divided into several character text regions. Based on the character compactness of each character text region, the compact character region is identified. Based on the overlapping area of characters in the compact character region and the proportion of the ink halo range at the character edge, the confusion intensity characterization value of the compact character region is analyzed to determine whether the compact character region meets the confusion identification benchmark.

[0087] If the target file is of a high-recognition category, the character scan data corresponding to the target file will be entered into the scan database.

[0088] Specifically, the target analysis module is used to identify compact regions of characters, including:

[0089] If any character text region has a character compactness greater than the character compactness threshold, then the character text region is identified as the character compact region.

[0090] In this embodiment, the character compactness threshold is set to characterize situations where the characters in a character text region are so close together that they strongly interfere with character recognition. Historical character scanning data of several target files that have completed the marking process is obtained, the character compactness data of the corresponding character text regions is retrieved, and the mean character compactness is calculated. In line with the purpose of the threshold, the character compactness threshold is determined as the product of the mean character compactness and the compactness offset coefficient, where the compactness offset coefficient is selected within the interval [1.2, 1.4] and is preferably 1.2.
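
The region filter can be sketched as follows. The sketch takes the character compactness of each region as a given number, since its calculation is not described in this passage; `compact_regions` and the region identifiers are hypothetical.

```python
from statistics import mean


def compact_regions(region_compactness, hist_compactness, offset_coeff=1.2):
    """Return the ids of regions whose compactness exceeds the threshold."""
    threshold = mean(hist_compactness) * offset_coeff
    return [rid for rid, c in region_compactness.items() if c > threshold]


# Historical mean compactness is 5, so the threshold is about 6.
flagged = compact_regions({"r1": 3, "r2": 8, "r3": 5}, [4, 6])
```

Only the flagged regions proceed to the confusion intensity analysis, which is how the system limits computation to high-risk areas.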

[0091] Specifically, the target analysis module is used to analyze the confusion intensity characterization value of the compact character region, including:

[0092] The ratio of the overlapping area of characters to the overlapping area threshold is used as the first confusion intensity feature;

[0093] The ratio of the proportion of ink halo range at the character edge to the threshold of the proportion of ink halo range at the character edge is used as the second confusion intensity feature;

[0094] The sum of the first confusion intensity feature and the second confusion intensity feature is used as the confusion intensity characterization value.

[0095] In this embodiment, the overlapping area threshold and the character edge ink halo range proportion threshold are set to characterize situations where the recognition confusion caused by character interference is strong. Historical character scanning data of several target files that have completed the marking process is obtained, and the overlapping area data and the character edge ink halo range proportion data are retrieved to calculate the mean overlapping area and the mean character edge ink halo range proportion, which are used as baseline values under normal conditions. In line with the purpose of the two thresholds, the overlapping area threshold is determined as the product of the mean overlapping area and the overlap deviation coefficient, and the character edge ink halo range proportion threshold is determined as the product of the mean character edge ink halo range proportion and the ink halo deviation coefficient. The overlap deviation coefficient is selected within the interval [1.1, 1.2], preferably 1.1. The ink halo deviation coefficient is selected within the interval [1.1, 1.2], preferably 1.1.
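
A short sketch of the two thresholds and of the confusion intensity characterization value from paragraphs [0092] to [0094], using the preferred coefficient of 1.1 for both. The function and parameter names are hypothetical, and the overlapping area and ink halo proportion are assumed to be measured elsewhere.

```python
from statistics import mean


def confusion_thresholds(hist_overlap_areas, hist_halo_ratios,
                         overlap_coeff=1.1, halo_coeff=1.1):
    """Return (overlap area threshold, ink halo proportion threshold)."""
    return (mean(hist_overlap_areas) * overlap_coeff,
            mean(hist_halo_ratios) * halo_coeff)


def confusion_intensity(overlap_area, halo_ratio, overlap_thr, halo_thr):
    """Unweighted sum of the two confusion intensity features."""
    first = overlap_area / overlap_thr   # first confusion intensity feature
    second = halo_ratio / halo_thr       # second confusion intensity feature
    return first + second


overlap_thr, halo_thr = confusion_thresholds([10, 10], [0.2, 0.2])
value = confusion_intensity(22.0, 0.11, overlap_thr, halo_thr)
```

Unlike the character recognition degree representation parameter, the two features here are summed without weight coefficients.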

[0096] Specifically, please refer to Figure 3, which is a logic diagram for determining whether a compact character region meets the confusion identification benchmark according to an embodiment of the present invention. The target analysis module is used to determine whether the compact character region meets the confusion identification benchmark, including:

[0097] If the confusion intensity characterization value of a compact character region is greater than or equal to the confusion intensity characterization threshold, then the compact character region is determined to not meet the confusion identification benchmark.

[0098] If the confusion intensity characterization value of a compact character region is less than the confusion intensity characterization threshold, then the compact character region is determined to meet the confusion identification benchmark.

[0099] The confusion intensity characterization threshold is predetermined. It is the confusion intensity characterization value calculated when the overlapping area of characters is equal to the overlapping area threshold and the proportion of the ink halo range at the character edge is equal to the character edge ink halo range proportion threshold.
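
Because both ratios equal 1 at the boundary, the confusion intensity characterization threshold is 1 + 1 = 2. The decision in paragraphs [0097] and [0098] can then be sketched as below; the function names and the returned labels are hypothetical.

```python
# Both ratios equal 1 when each measurement equals its threshold.
CONFUSION_INTENSITY_THRESHOLD = 1.0 + 1.0


def meets_confusion_benchmark(intensity,
                              threshold=CONFUSION_INTENSITY_THRESHOLD):
    """True when the region is below the threshold and can be processed."""
    return intensity < threshold


def route_region(intensity):
    """Pick the follow-up handling for a compact character region."""
    if meets_confusion_benchmark(intensity):
        return "match_and_verify"  # handled by sample matching and verification
    return "mark_region"           # too confused: the region is only marked
```

Regions at or above the threshold do not meet the benchmark and are marked, while regions below it go on to character matching.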

[0100] Specifically, in real-world scenarios, complex situations arise from character overlap and ink halo interference, such as blurred character edges caused by ink bleeding during pen writing and overlapping of adjacent characters. To avoid misjudgment based on a single feature, such as recognizing only overlap while ignoring the influence of ink halos, and thus to improve the targeting of abnormal character processing, this invention sets up a target analysis module. For target files of the low-recognition category, it first filters the areas requiring attention based on character compactness, excluding areas with loose character arrangement and no risk of confusion, such as multiple-choice options with normal spacing and blank answer areas. It then concentrates computational power on high-risk areas, such as densely written areas for subjective questions, significantly reducing the system's processing load. Furthermore, it combines the character overlapping area and the proportion of the ink halo range at the character edge to analyze the confusion intensity characterization value, forming a multi-dimensional quantitative assessment of the degree of character confusion. The character overlapping area quantifies the physical degree of character intersection, reflecting the degree to which overlap hinders recognition. The proportion of the ink halo range at the character edge captures the blurring of character edges caused by ink bleeding and quantifies the resulting damage to the character edges, thereby covering most character confusion scenarios in written tests and improving the comprehensiveness of confusion judgment. Therefore, this invention uses the confusion intensity characterization value to represent the degree of recognition confusion caused by character interference, and quantifies the combined negative impact of interference factors on character recognizability, forming an assessment of the overall confusion of compact character regions. This in turn reflects the difficulty of recognizing compact character regions and provides data support for the subsequent determination of whether compact character regions meet the confusion identification benchmark.

[0101] Specifically, the confusion response module responds to the determination result of the target analysis module, including:

[0102] If the target analysis module determines that the compact character region meets the confusion identification benchmark, the characters are compared and matched against the character database to determine the matching degree of the characters in the compact character region, the confused characters are locked, and whether to perform elimination optimization verification on a confused character is determined based on the area of the distorted region between the confused character and its close sample.

[0103] If the target analysis module determines that the compact character region does not meet the confusion identification benchmark, then the compact character region is marked.

[0104] Specifically, the confusion response module is used to lock confused characters, including:

[0105] Matching the close sample corresponding to each of the several characters within the compact character region;

[0106] Identifying confused characters based on the matching degree between each character and its corresponding close sample;

[0107] If the matching degree between any character and its corresponding close sample is less than the matching degree threshold, then that character is locked as a confused character.

[0108] In this embodiment, the matching degree threshold is set to characterize situations where character recognition is difficult. Historical character scanning data of several target files that have completed the marking process is obtained, the matching degree data between each character and its corresponding close sample is retrieved, and the average matching degree is calculated. In line with the purpose of the threshold, the average matching degree is used as the matching degree threshold.
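
The locking rule in paragraphs [0105] to [0108] can be sketched as follows, assuming the matching degree of each character with its close sample has already been computed. `lock_confused` is a hypothetical name, and characters are referred to by their index within the compact character region.

```python
from statistics import mean


def lock_confused(matching_degrees, hist_matching_degrees):
    """Return indices of characters whose matching degree is below the mean."""
    threshold = mean(hist_matching_degrees)  # average matching degree
    return [i for i, m in enumerate(matching_degrees) if m < threshold]


# Historical average is 0.75; only the second character falls below it.
locked = lock_confused([0.9, 0.3, 0.75], [0.5, 0.75, 1.0])
```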

[0109] Specifically, please refer to Figure 4, which is a logic diagram for determining whether to perform elimination optimization verification on confused characters according to an embodiment of the present invention. The confusion response module is used to determine whether to perform elimination optimization verification on the confused characters, including:

[0110] If the area of the distorted region between the confused character and its close sample is less than the distorted region area threshold, then it is determined that elimination optimization verification is to be performed on the confused character.

[0111] If the area of the distorted region between the confused character and its close sample is greater than or equal to the distorted region area threshold, then it is determined that no elimination optimization verification is to be performed on the confused character.

[0112] The elimination optimization verification includes eliminating the ink halo range before recognizing the character, and performing contextual verification.

[0113] In this embodiment, the distorted region area threshold is set to characterize situations where the difference between a confused character and its close sample is large enough to seriously affect the comparison and recognition of characters. Historical character scanning data of several target files that have completed the marking process is acquired, and the distorted region area data of confused characters and their close samples is retrieved to calculate the mean distorted region area. In line with the purpose of the threshold, the distorted region area threshold is determined as the product of the mean distorted region area and the distortion deviation coefficient. The distortion deviation coefficient is selected within the interval [0.9, 0.95], preferably 0.95.
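
A minimal sketch of the decision in paragraphs [0110] to [0113], using the preferred distortion deviation coefficient of 0.95. The function name is hypothetical, and the areas are assumed to be pixel counts obtained as described in paragraph [0072].

```python
from statistics import mean


def needs_elimination_verification(area, hist_areas, deviation_coeff=0.95):
    """True when the distorted area is below the derived threshold."""
    threshold = mean(hist_areas) * deviation_coeff
    return area < threshold


# Historical mean area is 100 pixels, so the threshold is about 95.
small_distortion = needs_elimination_verification(50, [90, 110])
large_distortion = needs_elimination_verification(120, [90, 110])
```

Only characters with a small distorted area are sent to elimination optimization verification; larger distortions are left without it.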

[0114] Specifically, this invention incorporates a confusion response module to optimize compact character regions that meet the confusion identification benchmark. Optimization is performed only on characters that have a low matching degree with their close samples and a small distorted region area, balancing correction accuracy and processing efficiency to achieve precise identification and differentiated optimization verification of confused characters. In addition, contextual verification is integrated into the process, supplementing the judgment from the overall semantic level and further confirming the character content using contextual logic. This dual protection significantly reduces the recognition error for confused characters and improves the credibility of written test data.

[0115] The technical solution of the present invention has been described above with reference to the preferred embodiments shown in the accompanying drawings. However, it will be readily understood by those skilled in the art that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after these changes or substitutions will all fall within the scope of protection of the present invention.

Claims

1. A written test data analysis system based on artificial intelligence, characterized in that it comprises: a feature extraction module, used to call character scanning data of several target files to extract the character continuity features of the single-line characters corresponding to each target file, the character continuity features including the average gap distance between adjacent characters and the maximum difference in gap distance between characters; an identification and segmentation module, connected to the feature extraction module, used to combine the character continuity features and the axial offset of adjacent characters in a single column of characters to calculate a character recognition degree representation parameter for the target file, so as to classify the recognition category of the corresponding target file; a target analysis module, connected to the identification and segmentation module, used to perform identification and analysis on the target file based on the recognition category, including: if the target file is of the low-recognition category, dividing the target file into several character text regions, identifying the compact character region based on the character compactness of each character text region, and analyzing the confusion intensity characterization value of the compact character region based on the overlapping area of characters in the compact character region and the proportion of the ink halo range at the character edge, so as to determine whether the compact character region meets the confusion identification benchmark; and if the target file is of the high-recognition category, entering the character scan data corresponding to the target file into the scan database; and a confusion response module, connected to the target analysis module, used to verify and mark the compact character region in response to the determination result of the target analysis module.

2. The artificial intelligence-based written test data analysis system according to claim 1, characterized in that the identification and segmentation module is used to calculate the character recognition degree representation parameter for the target file, including: the sum of the ratio of the average gap distance threshold to the average gap distance of adjacent characters in a single line and the ratio of the maximum difference in gap distance between characters to the maximum gap distance difference threshold is used as the first character recognition feature; the ratio of the axial offset of adjacent characters in a single column to the axial offset threshold is used as the second character recognition feature; and the first character recognition feature and the second character recognition feature are weighted and summed to determine the character recognition degree representation parameter.

3. The artificial intelligence-based written test data analysis system according to claim 2, characterized in that the identification and segmentation module is used to classify the recognition category of the corresponding target file, including: if the character recognition degree representation parameter of the target file is greater than or equal to the character recognition degree representation parameter threshold, then the target file is classified as the low-recognition category; and if the character recognition degree representation parameter of the target file is less than the character recognition degree representation parameter threshold, then the target file is classified as the high-recognition category.

4. The artificial intelligence-based written test data analysis system according to claim 1, characterized in that the target analysis module is used to identify the compact character region, including: if any character text region has a character compactness greater than the character compactness threshold, then the character text region is identified as the compact character region.

5. The artificial intelligence-based written test data analysis system according to claim 1, characterized in that the target analysis module is used to analyze the confusion intensity characterization value of the compact character region, including: the ratio of the overlapping area of characters to the overlapping area threshold is used as the first confusion intensity feature; the ratio of the proportion of the ink halo range at the character edge to the character edge ink halo range proportion threshold is used as the second confusion intensity feature; and the sum of the first confusion intensity feature and the second confusion intensity feature is used as the confusion intensity characterization value.

6. The artificial intelligence-based written test data analysis system according to claim 5, characterized in that the target analysis module is used to determine whether the compact character region meets the confusion identification benchmark, including: if the confusion intensity characterization value of the compact character region is greater than or equal to the confusion intensity characterization threshold, then the compact character region is determined not to meet the confusion identification benchmark; and if the confusion intensity characterization value of the compact character region is less than the confusion intensity characterization threshold, then the compact character region is determined to meet the confusion identification benchmark.

7. The artificial intelligence-based written test data analysis system according to claim 6, characterized in that the confusion response module responds to the determination result of the target analysis module, including: if the target analysis module determines that the compact character region meets the confusion identification benchmark, comparing and matching with the character database to determine the matching degree of the characters in the compact character region, locking the confused characters, and determining whether to perform elimination optimization verification on a confused character based on the area of the distorted region between the confused character and its close sample; and if the target analysis module determines that the compact character region does not meet the confusion identification benchmark, marking the compact character region.

8. The artificial intelligence-based written test data analysis system according to claim 7, characterized in that the confusion response module is used to lock confused characters, including: matching the close sample corresponding to each of the several characters within the compact character region; identifying confused characters based on the matching degree between each character and its corresponding close sample; and if the matching degree between any character and its corresponding close sample is less than the matching degree threshold, locking that character as a confused character.

9. The artificial intelligence-based written test data analysis system according to claim 8, characterized in that the confusion response module is used to determine whether to perform elimination optimization verification on the confused characters, including: if the area of the distorted region between the confused character and its close sample is less than the distorted region area threshold, then it is determined that elimination optimization verification is to be performed on the confused character; the elimination optimization verification includes eliminating the ink halo range before recognizing the character, and performing contextual verification.