Density clustering-based abnormal response group detection method and device, and storage medium
By using density clustering algorithms to identify the error rate and similarity among test takers, abnormal cheating groups with high confidence levels are screened out, solving the problem that existing technologies cannot identify large-scale cheating groups and ensuring the fairness of the examination.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 人力资源和社会保障部人事考试中心
- Filing Date
- 2026-04-29
- Publication Date
- 2026-06-19
AI Technical Summary
Current technology cannot accurately identify potential cheating groups among a large number of test takers, which makes it impossible to guarantee the fairness of the exam.
A density-based clustering approach is adopted. By determining the error rate among candidates, a similarity evaluation index is constructed. The density clustering algorithm is used to identify potential cheating groups, and the intra-cluster similarity evaluation index is used to screen out abnormal groups with higher confidence.
It enables reliable and accurate identification of potential cheating groups, ensuring the fairness of the exam, reducing accidental interference, and is suitable for complex cross-exam cheating detection scenarios.
Figure CN122241272A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of cheating detection technology, and in particular to a method, apparatus and storage medium for detecting abnormal answer groups based on density clustering.
Background Technology
[0002] Existing technologies have proposed methods to prevent cheating, including intelligent software monitoring, physical device interception, and enhanced traditional invigilation. However, all of these methods have certain drawbacks. For example, these cheating detection methods typically focus on the examination room itself, analyzing individual student behavior, and cannot effectively analyze cross-examination room cheating. Furthermore, cross-examination room cheating usually involves multiple students and often utilizes modern communication methods to transmit cheating information between multiple examination rooms. Therefore, cross-examination room cheating is far more complex than single-room cheating and allows students in different examination rooms to answer questions simultaneously, increasing the secrecy of the cheating.
[0003] For example, the above-mentioned cheating detection methods usually analyze the individual behavior of candidates; that is, they can only determine whether a certain candidate has cheated, without considering whether the candidate is associated with a potential abnormal cheating group. Therefore, in most cases, traditional cheating detection methods can only detect one or more cheating candidates, but cannot detect whether a cheating candidate belongs to an abnormal cheating group, nor identify the other cheating candidates in that group.
[0004] Publication No. CN118397658A, titled "A Method, Electronic Device, and Storage Medium for Detecting Cheating in Student Examinations," discloses a method that sets a straight line on the desktop within the screen of each examination seat, configures a camera equipped with an intelligent algorithm to acquire student information, calculates scores for hand and head features, determines scores for looking down and looking around, and combines multiple criteria to determine whether the examinee has engaged in suspected cheating behavior and to send an alert.
[0005] Publication No. CN112613436A, titled "Method and Apparatus for Detecting Cheating in Examinations," discloses a method that acquires frame images of a video to be detected, judges changes in body movements between frames, obtains a two-dimensional examination vector to determine whether a candidate is cheating, and combines a preset threshold with a trained human detection model and a cheating classifier to detect cheating in examinations.
[0006] There is currently no effective solution to the technical problem that existing cheating detection methods cannot accurately identify potential cheating groups among a large number of test takers, thus failing to guarantee the fairness of the examination.
Summary of the Invention
[0007] The embodiments of this disclosure provide a method, apparatus, and storage medium for detecting abnormal answer groups based on density clustering, so as to at least solve the technical problem that existing cheating detection methods in the prior art cannot accurately identify whether there are potential abnormal cheating groups among a large number of candidates, thus failing to guarantee the fairness of the examination.
[0008] According to one aspect of the present disclosure, a method for detecting abnormal answer groups based on density clustering is provided, comprising: determining the error-same-answer rate between any two candidates among a plurality of candidates, and constructing a similarity evaluation index based on the error-same-answer rate, wherein the similarity evaluation index is used to indicate the ratio of the number of questions that the two candidates both answer incorrectly with the same incorrect answer to the number of questions on which the two candidates have incorrect answers; identifying a first abnormal group among the plurality of candidates using a density clustering algorithm, wherein the first abnormal group is used to indicate a potential cheating group; determining an intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index is used to indicate the average value of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and filtering out a second abnormal group from each first abnormal group based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates within each first abnormal group, wherein the confidence level of the second abnormal group is greater than the confidence level of the first abnormal group.
[0009] According to another aspect of the present disclosure, a storage medium is also provided, the storage medium including a stored program, wherein, when the program is executed, a processor performs any of the methods described above.
[0010] According to another aspect of the present disclosure, an abnormal answer group detection device based on density clustering is also provided, comprising: a first similarity evaluation index determination module, configured to determine the error-same-answer rate between any two candidates among a plurality of candidates, and construct a similarity evaluation index based on the error-same-answer rate, wherein the similarity evaluation index is used to indicate the ratio of the number of questions that the two candidates both answer incorrectly with the same incorrect answer to the number of questions on which the two candidates have incorrect answers; a first abnormal group identification module, configured to identify a first abnormal group among the plurality of candidates using a density clustering algorithm, wherein the first abnormal group is used to indicate a potential cheating group; a second similarity evaluation index determination module, configured to determine the intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index is used to indicate the average value of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and an abnormal group filtering module, configured to filter out a second abnormal group from each first abnormal group based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates within each first abnormal group, wherein the confidence level of the second abnormal group is greater than the confidence level of the first abnormal group.
[0011] According to another aspect of the present disclosure, an abnormal answer group detection device based on density clustering is also provided, comprising: a processor; and a memory connected to the processor, configured to provide the processor with instructions for the following steps: determining the error-same-answer rate between any two candidates among a plurality of candidates, and constructing a similarity evaluation index based on the error-same-answer rate, wherein the similarity evaluation index is used to indicate the ratio of the number of questions that the two candidates both answer incorrectly with the same incorrect answer to the number of questions on which the two candidates have incorrect answers; identifying a first abnormal group among the plurality of candidates using a density clustering algorithm, wherein the first abnormal group is used to indicate a potential cheating group; determining an intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index is used to indicate the average value of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and filtering out a second abnormal group from each first abnormal group based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates within each first abnormal group, wherein the confidence level of the second abnormal group is greater than the confidence level of the first abnormal group.
[0012] This application discloses an abnormal answer group detection method based on density clustering. First, the processor determines the error-same-answer rate between any two candidates among a plurality of candidates and constructs a similarity evaluation index based on the error-same-answer rate. Then, the processor uses a density clustering algorithm to identify first abnormal groups among the candidates. Further, the processor determines the intra-cluster similarity evaluation index for each first abnormal group. Finally, based on the intra-cluster similarity evaluation index for each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group, the processor selects second abnormal groups from the first abnormal groups.
[0013] As described above, this application selects the rand index, based on the error-same-answer rate, as the similarity evaluation index, thus providing a more rigorous similarity judgment that is suitable for abnormal group detection in large-scale data. That is, it not only considers the consistency of answers on incorrectly answered questions but also integrates the various error scenarios, making the detection results more reliable and accurate, and reducing interference from randomness. Therefore, through these improvements, the density clustering algorithm can more effectively identify potential abnormal cheating groups, making it more suitable for complex examination scenarios.
[0014] Secondly, this application reasonably sets the neighborhood radius (i.e., eps) and minimum number of samples (i.e., min-samples) in the density clustering algorithm, which directly determines which candidate among multiple candidates is identified as a potential anomalous group (i.e., the first anomalous group). This enables the density clustering algorithm to effectively aggregate highly similar candidate groups and isolate independent candidate data that do not belong to any group.
[0015] Finally, the clustered candidate groups obtained using the density clustering algorithm (i.e., the first abnormal groups) are further evaluated using the intra-cluster similarity evaluation index to ultimately determine the risk candidate groups (i.e., the second abnormal groups). This more stringent similarity measurement filters out the second abnormal groups, which exhibit a high degree of similarity in overall answer patterns.
[0016] Therefore, this application achieves the technical effect of reliably and accurately determining whether there is a potential abnormal cheating group among multiple candidates, and ensuring the fairness of the examination. This solves the technical problem that existing cheating detection methods cannot accurately identify whether there is a potential abnormal cheating group among a large number of candidates, thus failing to guarantee the fairness of the examination.
Attached Figure Description
[0017] The accompanying drawings, which are included to provide a further understanding of this disclosure and form part of this application, illustrate exemplary embodiments of this disclosure and are used to explain this disclosure, but do not constitute an undue limitation of this disclosure. In the drawings:
Figure 1 is a hardware structure block diagram of a computing device for implementing the method according to Embodiment 1 of this disclosure;
Figure 2 is a schematic diagram of the abnormal response group detection system based on density clustering according to Embodiment 1 of this application;
Figure 3 is a flowchart of the abnormal response group detection method based on density clustering according to Embodiment 1 of this application;
Figure 4 is a schematic diagram of the abnormal response group detection device based on density clustering according to Embodiment 2 of this application;
Figure 5 is a schematic diagram of the abnormal response group detection device based on density clustering according to Embodiment 3 of this application.
Detailed Implementation
[0018] To enable those skilled in the art to better understand the technical solutions of this disclosure, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this disclosure, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort should fall within the scope of protection of this disclosure.
[0019] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0020] Embodiment 1
[0021] According to this embodiment, a method embodiment for detecting abnormal response groups based on density clustering is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.
[0022] The method embodiments provided in this example can be executed on mobile terminals, computer terminals, servers, or similar computing devices. Figure 1 shows a hardware block diagram of a computing device for implementing the density clustering-based abnormal response group detection method. As shown in Figure 1, a computing device may include one or more processors (processors may include, but are not limited to, microprocessors such as MCUs or programmable logic devices such as FPGAs), a memory for storing data, a transmission device for communication functions, and an input / output interface. The memory, transmission device, and input / output interface are connected to the processor via a bus. In addition, it may also include a display, keyboard, and cursor control device connected to the input / output interface. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, a computing device may also include more or fewer components than those shown in Figure 1, or have a configuration different from that shown in Figure 1.
[0023] It should be noted that the aforementioned one or more processors and / or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be integrated, in whole or in part, into any other element in a computing device. As involved in the embodiments of this disclosure, the data processing circuits serve as processor control (e.g., selection of a variable resistor termination path connected to an interface).
[0024] The memory can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the density clustering-based abnormal response group detection method in this embodiment of the present disclosure. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the density clustering-based abnormal response group detection method of the aforementioned application. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory remotely located relative to the processor, and these remote memories can be connected to the computing device via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0025] The transmission device is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the computing device's communication provider. In one example, the transmission device includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device may be a Radio Frequency (RF) module used for wireless communication with the Internet.
[0026] The display can be, for example, a touchscreen liquid crystal display (LCD), which allows users to interact with the user interface of the computing device.
[0027] It should be noted here that, in some optional embodiments, the computing device shown in Figure 1 above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that Figure 1 is only one specific example, and is intended to illustrate the types of components that may exist in the aforementioned computing devices.
[0028] Figure 2 is a schematic diagram of the density clustering-based abnormal response group detection system described in this embodiment. As shown in Figure 2, the system includes a terminal device 100 and a processor 200. The terminal device 100 is communicatively connected to the processor 200, and users can upload exam papers corresponding to multiple examinees through the terminal device 100. The exam papers corresponding to the multiple examinees are identical. The processor 200 is used to determine the error-same-answer rate between any two examinees among the multiple examinees and construct a similarity evaluation index based on the error-same-answer rate. The processor 200 is also used to identify first abnormal groups among the multiple examinees using a density clustering algorithm. The processor 200 is further used to determine the intra-cluster similarity evaluation index corresponding to each first abnormal group, and, based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two examinees in each first abnormal group, to filter out second abnormal groups from the first abnormal groups.
[0029] Furthermore, the terminal device 100 is also used to receive and display information related to the second abnormal group sent by the processor 200.
[0030] It should be noted that the terminal device 100 and processor 200 in the system can both use the hardware structure described above.
[0031] Under the aforementioned operating environment, according to the first aspect of this embodiment, a method for detecting abnormal response groups based on density clustering is provided. This method is implemented by the processor 200 shown in Figure 2. Figure 3 shows a flowchart of the method. As shown in Figure 3, the method includes:
S302: Determine the error-same-answer rate between any two candidates among multiple candidates, and construct a similarity evaluation index based on the error-same-answer rate, wherein the similarity evaluation index is used to indicate the ratio of the number of questions that the two candidates both answer incorrectly with the same incorrect answer to the total number of questions on which the two candidates have incorrect answers;
S304: Use a density clustering algorithm to identify first abnormal groups among the multiple candidates, wherein a first abnormal group is used to indicate a potential cheating group;
S306: Determine the intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index is used to indicate the average of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and
S308: Based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group, select second abnormal groups from the first abnormal groups, wherein the confidence of the second abnormal group is greater than the confidence of the first abnormal group.
[0032] Specifically, the user first uploads the completed exam papers from multiple candidates to the processor 200 via the terminal device 100. Upon receiving the exam papers, the processor 200 treats the answers of any two candidates on the same exam paper as the clustering results of two different clustering models on the same set, and uses the rand index to measure the similarity between the two candidates' answers. Then, after determining the error-same-answer rate between any two candidates, the processor 200 constructs a similarity evaluation index based on the rand index and the error-same-answer rate (S302). This similarity evaluation index indicates the ratio of the number of questions that the two candidates both answer incorrectly with identical incorrect answers to the total number of questions on which the two candidates have incorrect answers.
[0033] The operation in which the processor 200 constructs a similarity evaluation index based on the error-same-answer rate includes the following. First, the processor 200 determines the number of first-type questions, which any two candidates among the multiple candidates both answer incorrectly with the same incorrect answer. Then, the processor 200 determines the number of second-type questions, which the two candidates both answer incorrectly but with different incorrect answers. Further, the processor 200 determines the number of third-type questions, which one of the two candidates answers incorrectly while the other candidate answers correctly. Finally, the processor 200 calculates the ratio of the number of first-type questions to the total number of first-type, second-type, and third-type questions, and determines this ratio as the similarity evaluation index corresponding to the two candidates. The above will be described in detail later, and therefore will not be repeated here.
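The counting described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than the application's implementation: the function name `error_same_answer_rate`, the representation of answers and the answer key as equal-length strings of single-choice options, and the choice to return 0.0 when neither candidate has any incorrect answer are all assumptions made for the example.

```python
def error_same_answer_rate(answers_a, answers_b, answer_key):
    """Similarity evaluation index for one pair of candidates (illustrative sketch)."""
    same_wrong = 0   # first type: both wrong, identical incorrect answer
    diff_wrong = 0   # second type: both wrong, different incorrect answers
    one_wrong = 0    # third type: exactly one of the two candidates is wrong
    for a, b, key in zip(answers_a, answers_b, answer_key):
        a_wrong, b_wrong = a != key, b != key
        if a_wrong and b_wrong:
            if a == b:
                same_wrong += 1
            else:
                diff_wrong += 1
        elif a_wrong or b_wrong:
            one_wrong += 1
    total = same_wrong + diff_wrong + one_wrong
    # Assumption: a pair with no incorrect answers at all gets an index of 0.0.
    return same_wrong / total if total else 0.0


key = "ABCDABCD"
candidate_1 = "ABCDBBCA"  # wrong on questions 5 and 8
candidate_2 = "ABCDBBDD"  # wrong on questions 5 and 7
index = error_same_answer_rate(candidate_1, candidate_2, key)
print(round(index, 3))
```

In this example the two candidates share one identical incorrect answer (question 5) and each has one further error that the other does not, so the index is one first-type question out of three questions with any incorrect answer.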
[0034] Then, processor 200 uses a density clustering algorithm to identify the first anomalous group among multiple candidates (S304). The first anomalous group is used to indicate potential cheating groups. The process of processor 200 identifying the first anomalous group among multiple candidates using the density clustering algorithm includes: First, processor 200 analyzes the distance distribution among candidates based on a grid search method and pre-sets a minimum similarity threshold for determining the similarity between candidates. Then, processor 200 determines the search range of the neighborhood radius corresponding to the density clustering algorithm based on the minimum similarity threshold. Further, processor 200, based on the search range of the neighborhood radius, traverses parameter combinations including the neighborhood radius and the minimum number of samples, and selects the target parameter combination that optimizes the clustering effect of the density clustering algorithm. Finally, processor 200 uses the density clustering algorithm and the target parameter combination to identify the first anomalous group among multiple candidates. The above will be described in detail later, so it will not be repeated here.
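The parameter search outlined above can be illustrated with a small, dependency-free Python sketch. Several points here are assumptions that the text does not specify: the distance between two candidates is taken as one minus their similarity evaluation index, the upper bound of the neighborhood-radius search range is taken as one minus the minimum similarity threshold, and the "clustering effect" is scored by the mean pairwise similarity inside clusters, with ties broken by the number of clustered candidates. The `dbscan` function is a minimal textbook DBSCAN over a precomputed distance matrix, and all function names are hypothetical.

```python
from itertools import combinations, product

NOISE = -1


def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; returns one label per point."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]  # includes p itself
        if len(neighbors) < min_samples:
            labels[p] = NOISE
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins the cluster, not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_samples:  # q is a core point, keep expanding
                queue.extend(q_neighbors)
    return labels


def mean_intra_similarity(sim, labels):
    """Assumed quality score: mean similarity over all same-cluster pairs."""
    pairs = [sim[i][j] for i, j in combinations(range(len(labels)), 2)
             if labels[i] == labels[j] != NOISE]
    return sum(pairs) / len(pairs) if pairs else None


def grid_search(sim, min_similarity, eps_steps=5, min_samples_options=(2, 3, 4)):
    """Traverse (eps, min_samples) combinations and keep the best-scoring one."""
    n = len(sim)
    dist = [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    eps_max = 1.0 - min_similarity  # assumed link between threshold and radius
    eps_grid = [eps_max * k / eps_steps for k in range(1, eps_steps + 1)]
    best = None
    for eps, min_samples in product(eps_grid, min_samples_options):
        labels = dbscan(dist, eps, min_samples)
        score = mean_intra_similarity(sim, labels)
        if score is None:  # no cluster found for this combination
            continue
        clustered = sum(label != NOISE for label in labels)
        if best is None or (score, clustered) > best[:2]:
            best = (score, clustered, eps, min_samples, labels)
    return best


sim = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 1.0],
]
score, clustered, eps, min_samples, labels = grid_search(sim, min_similarity=0.6)
print(labels)
```

With this toy similarity matrix, candidates 0 to 2 form one cluster and candidates 3 and 4 are left as isolated points, which mirrors the behavior described above of aggregating highly similar candidates while isolating independent ones. A production system would more likely rely on an established clustering library than on a hand-written loop.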
[0035] Further, the processor 200 determines the intra-cluster similarity evaluation index corresponding to each first abnormal group (S306). The intra-cluster similarity evaluation index indicates the average of multiple similarity evaluation indices corresponding to any two candidates within the first abnormal group. Referring to the above description, since the processor 200 has already determined the similarity evaluation indices corresponding to any two candidates in the above process, when the processor 200 has determined the first abnormal group, it can determine the similarity evaluation index between any two candidates within the first abnormal group.
[0036] For example, the processor 200 identifies multiple candidates $c_1$ ~ $c_n$ and further determines the similarity evaluation indices between any two of these candidates, where the similarity evaluation index $S_{i,j}$ corresponds to candidate $c_i$ and candidate $c_j$: the index $S_{1,2}$ corresponds to candidates $c_1$ and $c_2$, the index $S_{1,3}$ corresponds to candidates $c_1$ and $c_3$, ..., and the index $S_{n-1,n}$ corresponds to candidates $c_{n-1}$ and $c_n$.
[0037] The processor 200 then identifies multiple first abnormal groups $G_1$ ~ $G_m$. If the first abnormal group $G_1$ includes the candidates $c_1$ ~ $c_3$, the processor 200 can determine the similarity evaluation index $S_{1,2}$ between candidates $c_1$ and $c_2$, the similarity evaluation index $S_{2,3}$ between candidates $c_2$ and $c_3$, and the similarity evaluation index $S_{1,3}$ between candidates $c_1$ and $c_3$.
[0038] Thus, the intra-cluster similarity evaluation index $\bar{S}_{G_1}$ corresponding to the first abnormal group $G_1$ can be calculated based on the following formula:
$$\bar{S}_{G_1} = \frac{S_{1,2} + S_{2,3} + S_{1,3}}{3}$$
[0039] Similarly, based on the same operations described above, the processor 200 can determine the intra-cluster similarity evaluation indices $\bar{S}_{G_1}$ ~ $\bar{S}_{G_m}$ corresponding to the multiple first abnormal groups $G_1$ ~ $G_m$.
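As a minimal sketch of this averaging step, the Python function below computes the intra-cluster similarity evaluation index of one group as the mean of the pairwise indices of its members. The function name, the use of integer candidate identifiers, and the dictionary keyed by sorted identifier pairs are assumptions made for illustration; returning 0.0 for a group with fewer than two members is likewise an assumption.

```python
from itertools import combinations


def intra_cluster_index(members, pair_similarity):
    """Average of the pairwise similarity indices within one first abnormal group."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0  # assumption: a group with fewer than two members scores 0.0
    return sum(pair_similarity[pair] for pair in pairs) / len(pairs)


# Hypothetical pairwise indices for a group of three candidates.
pair_similarity = {(1, 2): 0.8, (2, 3): 0.6, (1, 3): 0.7}
group_index = intra_cluster_index([1, 2, 3], pair_similarity)
print(round(group_index, 3))
```

For a group of three candidates the average runs over three pairs; for a group of k candidates it runs over k(k-1)/2 pairs.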
[0040] Finally, the processor 200 selects second abnormal groups from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group (S308). The confidence level of the second abnormal group is greater than that of the first abnormal group. Specifically, for each first abnormal group, the processor 200 determines the first similarity evaluation index corresponding to the first candidate and the second candidate, the second similarity evaluation index corresponding to the second candidate and the third candidate, and the third similarity evaluation index corresponding to the first candidate and the third candidate. The first candidate, the second candidate, and the third candidate each represent any one of the multiple candidates. Furthermore, if the first similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, the second similarity evaluation index is greater than that intra-cluster similarity evaluation index, and the third similarity evaluation index is less than that intra-cluster similarity evaluation index, the first abnormal group is removed, and the second abnormal group is thereby obtained.
[0041] Therefore, the processor 200 can analyze the clustering results (i.e., the first abnormal group) by combining the intra-cluster similarity evaluation index corresponding to each first abnormal group, and exclude the cases in the first abnormal group where the first candidate and the second candidate are similar, the second candidate and the third candidate are similar, but the first candidate and the third candidate are not similar, thereby selecting the second abnormal group with higher confidence and outputting it.
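The screening rule can be sketched as follows. This is one reading of the text, labeled as an assumption: a first abnormal group is discarded entirely as soon as it contains any three candidates where two of the pairwise indices exceed the group's intra-cluster index while the remaining one falls below it, and the groups that survive are treated as the second abnormal groups. The function names, integer candidate identifiers, and the dictionary keyed by sorted identifier pairs are hypothetical.

```python
from itertools import combinations


def pair_index(pair_similarity, a, b):
    """Look up the similarity index of a pair regardless of argument order."""
    return pair_similarity[(a, b) if a < b else (b, a)]


def has_broken_chain(members, pair_similarity, threshold):
    """True if some A~B and B~C are similar while A~C is not, relative to threshold."""
    for trio in combinations(members, 3):
        for middle in trio:
            first, third = (m for m in trio if m != middle)
            if (pair_index(pair_similarity, first, middle) > threshold
                    and pair_index(pair_similarity, middle, third) > threshold
                    and pair_index(pair_similarity, first, third) < threshold):
                return True
    return False


def select_second_groups(first_groups, pair_similarity):
    """Keep only the first abnormal groups without a broken similarity chain."""
    kept = []
    for members in first_groups:
        pairs = list(combinations(sorted(members), 2))
        if not pairs:
            continue
        intra = sum(pair_similarity[p] for p in pairs) / len(pairs)
        if not has_broken_chain(members, pair_similarity, intra):
            kept.append(members)
    return kept


pair_similarity = {
    (1, 2): 0.9, (2, 3): 0.9, (1, 3): 0.9,   # uniformly similar group
    (4, 5): 0.9, (5, 6): 0.9, (4, 6): 0.3,   # 4~5 and 5~6 similar, 4~6 not
}
second_groups = select_second_groups([[1, 2, 3], [4, 5, 6]], pair_similarity)
print(second_groups)
```

In the example, the group of candidates 4, 5, and 6 is removed because candidates 4 and 6 are each similar to candidate 5 but not to each other, which is the chained-similarity case the text sets out to exclude.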
[0042] As described in the background section, existing technologies have proposed methods to prevent cheating, such as intelligent software monitoring, physical device interception, and enhanced traditional invigilation. However, all of these methods have certain drawbacks. For example, these cheating detection methods typically focus on the examination room itself, analyzing individual student behavior, and cannot effectively analyze cross-examination room cheating. However, since cross-examination room cheating usually involves multiple students and often utilizes modern communication methods to transmit cheating information between multiple examination rooms, the complexity of cross-examination room cheating is far greater than that of single cheating within an examination room. Furthermore, it allows students in different examination rooms to answer questions simultaneously during the exam, increasing the secrecy of the cheating.
[0043] For example, the above-mentioned cheating detection methods usually analyze the individual behavior of candidates; that is, they can only determine whether a certain candidate has cheated, without considering whether the candidate is associated with a potential abnormal cheating group. Therefore, in most cases, traditional cheating detection methods can only detect one or more cheating candidates, but cannot detect whether a cheating candidate belongs to an abnormal cheating group, nor identify the other cheating candidates in that group.
[0044] In view of this, this application provides an abnormal response group detection method based on density clustering. Furthermore, as described above, this application selects the rand index, based on the error-same-answer rate, as the similarity evaluation index, thus providing a more rigorous similarity judgment that is suitable for abnormal group detection on large-scale data. That is, it not only considers the consistency of answers on incorrectly answered questions but also integrates the various error scenarios, making the detection results more reliable and accurate, and reducing interference from randomness. Therefore, through the above improvements, the density clustering algorithm can more effectively identify potential abnormal cheating groups, making it more suitable for complex examination scenarios.
[0045] Furthermore, since this application does not merely analyze the individual behavior of test takers, but rather identifies abnormal cheating groups among them, the solution provided by this application is not limited to the examination room itself, but can also effectively analyze cheating across examination rooms, thereby further ensuring the fairness of the examination.
[0046] Therefore, this application achieves the technical effect of reliably and accurately determining whether there is a potential abnormal cheating group among multiple candidates, and ensuring the fairness of the examination. This solves the technical problem that existing cheating detection methods cannot accurately identify whether there is a potential abnormal cheating group among a large number of candidates, thus failing to guarantee the fairness of the examination.
[0047] Optionally, the operation of selecting second abnormal groups from each first abnormal group based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group includes: determining, respectively, the first similarity evaluation index corresponding to the first candidate and the second candidate, the second similarity evaluation index corresponding to the second candidate and the third candidate, and the third similarity evaluation index corresponding to the first candidate and the third candidate in each first abnormal group, wherein the first candidate, the second candidate, and the third candidate represent any one of the multiple candidates; and removing the first abnormal group and obtaining the second abnormal group if the first similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, the second similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, and the third similarity evaluation index is less than the corresponding intra-cluster similarity evaluation index.
[0048] Specifically, taking one first abnormal group as an example, the processor 200 first determines, for the candidates within that group, the similarity evaluation index between the first candidate and the second candidate, the similarity evaluation index between the first candidate and the third candidate, and the similarity evaluation index between the second candidate and the third candidate.
[0049] Having determined the similarity evaluation indices between pairs of candidates in the first abnormal group, the processor 200 takes the index corresponding to the first candidate and the second candidate as the first similarity evaluation index, the index corresponding to the second candidate and the third candidate as the second similarity evaluation index, and the index corresponding to the first candidate and the third candidate as the third similarity evaluation index.
[0050] The processor 200 then compares the first, second, and third similarity evaluation indices with the intra-cluster similarity evaluation index corresponding to the first abnormal group. If the first similarity evaluation index is greater than the intra-cluster similarity evaluation index and the second similarity evaluation index is greater than the intra-cluster similarity evaluation index, but the third similarity evaluation index is less than the intra-cluster similarity evaluation index, the confidence level of the first abnormal group is low, and the processor 200 therefore removes that first abnormal group.
[0051] For example, suppose a first abnormal group includes several candidates. The processor 200 first determines, within that group, the similarity evaluation index between the first candidate and the second candidate, the similarity evaluation index between the first candidate and the third candidate, and the similarity evaluation index between the second candidate and the third candidate.
[0052] Having determined the similarity evaluation indices between pairs of candidates in the first abnormal group, the processor 200 takes the index corresponding to the first candidate and the second candidate as the first similarity evaluation index, the index corresponding to the second candidate and the third candidate as the second similarity evaluation index, and the index corresponding to the first candidate and the third candidate as the third similarity evaluation index.
[0053] The processor 200 then compares the first, second, and third similarity evaluation indices with the intra-cluster similarity evaluation index corresponding to the first abnormal group. If the size relationship between these three indices and the intra-cluster similarity evaluation index does not meet the removal conditions (the first and second indices greater than the intra-cluster index and the third index less than it), it is not necessary to remove the first abnormal group.
[0054] Furthermore, the processor 200 analyzes whether the size relationship between the similarity evaluation indices corresponding to the first candidate, the second candidate, and the third candidate and the intra-cluster similarity evaluation index of the first abnormal group satisfies the above conditions. If it does, the confidence level of the first abnormal group is low and the group needs to be removed. If it does not, it is not necessary to remove the first abnormal group.
[0055] Similarly, the processor 200 analyzes the other cases within the first abnormal group, that is, the other assignments of candidates to the roles of first candidate, second candidate, and third candidate, and determines whether each assignment satisfies the above conditions. If so, the confidence level of the first abnormal group is low and the group needs to be removed. If not, it is not necessary to remove the first abnormal group.
[0056] Thus, through the above method, the processor 200 determines the confidence level of each first abnormal group and eliminates the first abnormal groups with lower confidence levels, thereby obtaining the second abnormal groups.
[0057] Therefore, the clustered candidate groups obtained using density clustering analysis (i.e., the first abnormal groups) are further evaluated using the intra-cluster similarity evaluation index to ultimately determine the risk candidate groups (i.e., the second abnormal groups). This more stringent similarity measure retains only the second abnormal groups, whose members exhibit a high degree of similarity in their overall answer patterns.
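The screening step described in paragraphs [0047] to [0057] can be sketched roughly as follows. The sketch computes the intra-cluster similarity evaluation index as the mean of the pairwise indices and drops any first abnormal group in which some assignment of three candidates satisfies the removal condition. The function names, the layout of the `sim` dictionary, and the sample values are illustrative assumptions and do not come from the application.

```python
from itertools import combinations, permutations
from statistics import mean


def pair_sim(sim, a, b):
    """Look up the pairwise index regardless of key order (assumed dict layout)."""
    return sim[(a, b)] if (a, b) in sim else sim[(b, a)]


def intra_cluster_index(group, sim):
    """Mean of the pairwise similarity indices within one group (needs >= 2 members)."""
    return mean(pair_sim(sim, a, b) for a, b in combinations(group, 2))


def is_low_confidence(group, sim):
    """True if some (first, second, third) assignment meets the removal condition."""
    m = intra_cluster_index(group, sim)
    for first, second, third in permutations(group, 3):
        if (pair_sim(sim, first, second) > m
                and pair_sim(sim, second, third) > m
                and pair_sim(sim, first, third) < m):
            return True
    return False


def screen_groups(first_groups, sim):
    """Keep only the groups that are not flagged as low confidence."""
    return [g for g in first_groups if not is_low_confidence(g, sim)]


# Hypothetical pairwise indices for two first abnormal groups.
sim = {
    ("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.1,  # a and c barely agree
    ("d", "e"): 0.8, ("e", "f"): 0.8, ("d", "f"): 0.8,  # uniformly similar
}
second_groups = screen_groups([["a", "b", "c"], ["d", "e", "f"]], sim)
```

In this toy input the group a, b, c is removed because a and c are each similar to b but not to each other, while the uniformly similar group d, e, f is retained.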
[0058] Optionally, the operation of identifying the first anomalous group among multiple candidates using a density clustering algorithm includes: analyzing the distance distribution among candidates based on a grid search method, and pre-setting a minimum similarity threshold for determining the similarity between candidates; determining the search range of the neighborhood radius corresponding to the density clustering algorithm based on the minimum similarity threshold; traversing various parameter combinations according to the search range of the neighborhood radius, and selecting the target parameter combination that makes the clustering effect of the density clustering algorithm optimal, wherein the parameter combination includes the neighborhood radius and the minimum number of samples; and identifying the first anomalous group among multiple candidates using the density clustering algorithm and according to the target parameter combination.
[0059] Specifically, since cheating candidates typically constitute only a minority of the overall test-taker population, while the majority of legitimate candidates represent noise, the data corresponding to legitimate candidates deviates from the characteristic patterns of the cheating group. This makes traditional clustering methods susceptible to interference from noise points (i.e., legitimate candidates), resulting in unclear cluster boundaries and affecting detection accuracy. Therefore, this application selects density clustering as the clustering analysis algorithm.
[0060] When using density clustering algorithms to identify the first anomalous group, the key lies in setting two hyperparameters appropriately: the neighborhood radius (i.e., eps) and the minimum sample size (i.e., min-samples). The neighborhood radius represents the maximum interval at which candidates are considered to be of the same category. In the cheating group detection of this application, a smaller neighborhood radius is typically chosen to ensure that candidates are clustered only when they are highly similar. The minimum sample size reflects the minimum number of individuals required to identify a group as anomalous. Considering the indicative role of small clusters in anomaly detection, setting a lower minimum sample size helps identify small-scale anomalous groups without overlooking large-scale anomalous groups, and also filters out candidates who answered correctly.
[0061] Therefore, in order to select the optimal neighborhood radius and minimum number of samples, the processor 200 uses a grid search method to analyze the distance distribution between the candidates and, combined with empirical analysis of multiple data runs, pre-sets a minimum similarity threshold for the eps search space and determines the search range of the neighborhood radius corresponding to the density clustering algorithm.
[0062] Then, the processor 200 systematically traverses each parameter combination (i.e., neighborhood radius and minimum number of samples) using a grid search method to select the target parameter combination that makes the density clustering algorithm achieve the best clustering effect, so as to achieve accurate abnormal group detection.
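As a rough illustration of the parameter search in paragraphs [0061] and [0062], the sketch below derives an eps search range from a minimum similarity threshold and enumerates the parameter combinations. It assumes that distance is defined as 1 minus the similarity evaluation index, that eps values are evenly spaced, and that the caller supplies the scoring function. The application does not state these details or the criterion for the "best clustering effect", so `eps_search_range`, `parameter_grid`, `select_best`, and all numeric values are hypothetical.

```python
from itertools import product


def eps_search_range(min_similarity, steps=5):
    """Evenly spaced eps values in (0, 1 - min_similarity] (assumes distance = 1 - similarity)."""
    upper = 1.0 - min_similarity
    return [round(upper * (i + 1) / steps, 4) for i in range(steps)]


def parameter_grid(min_similarity, min_samples_values=(2, 3, 4), steps=5):
    """All (eps, min_samples) combinations to be traversed by the grid search."""
    return list(product(eps_search_range(min_similarity, steps), min_samples_values))


def select_best(grid, score):
    """Pick the combination with the highest score; `score` is supplied by the caller."""
    return max(grid, key=score)


grid = parameter_grid(min_similarity=0.8)

# Placeholder score for demonstration only; a real score would run the
# clustering for each combination and rate the resulting clusters.
best = select_best(grid, score=lambda p: -abs(p[0] - 0.12) - abs(p[1] - 3))
```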
[0063] Furthermore, once the processor 200 has determined the target parameter combination, it uses the density clustering algorithm to cluster the candidate data, thereby identifying potential first anomalous groups. This operation includes: starting from any candidate among the multiple candidates, the processor 200 determines, based on the set neighborhood radius, the number of similar candidates around that candidate.
[0064] Furthermore, if the number of similar candidates is greater than or equal to a predetermined minimum sample size, the candidate is identified as a core candidate. Starting with the core candidates, the processor 200 continuously searches for density-connected candidates using a density-based clustering algorithm, grouping connected candidates into a cluster until all candidates have been included. The resulting clusters represent potential cheating groups (i.e., the first anomalous group), while candidates without sufficient similar neighbors are marked as noise points and excluded from the cheating group. Thus, density-based clustering can effectively aggregate highly similar candidate groups and isolate independent candidate data that does not belong to any group.
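The clustering procedure in paragraphs [0063] and [0064] follows the general pattern of DBSCAN-style density clustering, which can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: distance is taken as 1 minus the similarity evaluation index, a candidate counts as its own neighbor, and the candidate names and similarity values are made up for illustration.

```python
def dbscan(points, dist, eps, min_samples):
    """Minimal density clustering; returns {point: cluster label}, with -1 for noise."""
    labels = {}
    cluster_id = -1

    def neighbors(p):
        # The neighborhood includes p itself (an assumption of this sketch).
        return [q for q in points if dist(p, q) <= eps]

    for p in points:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_samples:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1  # p is a core candidate: start a new cluster
        labels[p] = cluster_id
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster_id  # noise point reached from a core: border point
            if q in labels:
                continue
            labels[q] = cluster_id
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_samples:
                queue.extend(q_nbrs)  # q is also a core candidate: keep expanding
    return labels


# Hypothetical pairwise similarity indices; unlisted pairs are treated as 0.
sim = {("s1", "s2"): 0.9, ("s1", "s3"): 0.85, ("s2", "s3"): 0.95}


def dist(a, b):
    if a == b:
        return 0.0
    return 1.0 - sim.get((a, b), sim.get((b, a), 0.0))


labels = dbscan(["s1", "s2", "s3", "s4", "s5"], dist, eps=0.2, min_samples=3)
```

Here s1, s2, and s3 form one cluster (a potential first anomalous group), while s4 and s5 have no sufficiently similar neighbors and are marked as noise.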
[0065] Therefore, this application reasonably sets the neighborhood radius (i.e., eps) and the minimum number of samples (i.e., min-samples) in the density clustering algorithm, which directly determine which candidates among the multiple candidates are identified as a potential anomalous group (i.e., the first anomalous group). This enables the density clustering algorithm to effectively aggregate highly similar candidate groups and isolate independent candidate data that do not belong to any group.
[0066] Optionally, the operation of constructing a similarity evaluation index based on the error-sameness rate includes: determining the number of first questions, on which any two candidates among multiple candidates both answer incorrectly and have the same wrong answer; determining the number of second questions, on which the two candidates both answer incorrectly and have different wrong answers; determining the number of third questions, on which one of the two candidates answers incorrectly and the other candidate answers correctly; and calculating the proportion of the number of first questions to the total number of first, second, and third questions, and determining that proportion as the similarity evaluation index corresponding to the two candidates.
[0067] Specifically, the processor 200 first determines, for any two candidates among the multiple candidates, the number of first questions, that is, the number of questions that both candidates answer incorrectly with identical incorrect answers, where j = 1 to n. One such count is obtained for each pair of candidates.
[0068] Then, the processor 200 determines, for any two candidates among the multiple candidates, the number of second questions, that is, the number of questions that both candidates answer incorrectly but with different incorrect answers. One such count is obtained for each pair of candidates.
[0069] Furthermore, the processor 200 determines, for any two candidates among the multiple candidates, the number of third questions, that is, the number of questions that one of the two candidates answers incorrectly while the other answers correctly. One such count is obtained for each pair of candidates.
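[0070] Then, the processor 200 calculates the similarity evaluation index for any two candidates as the ratio of the number of first questions to the total number of first, second, and third questions for that pair:

$$\text{similarity evaluation index} = \frac{\text{number of first questions}}{\text{number of first questions} + \text{number of second questions} + \text{number of third questions}}$$

[0071] where each quantity is taken for the same pair of candidates, so that one similarity evaluation index is obtained for each pair of candidates.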
[0072] Thus, by using the Rand index based on the error-sameness rate to evaluate the similarity of answers between any two candidates, the above operation provides the necessary support for the subsequent use of the density clustering algorithm to discover potential abnormal cheating groups.
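The calculation in paragraphs [0066] to [0071] can be illustrated with a short Python sketch. It assumes each candidate's answers and the answer key are given as equal-length strings, one character per question, and it returns 0.0 when neither candidate has any wrong answer. The application does not address that edge case, and the function names and sample data are hypothetical.

```python
from itertools import combinations


def similarity_index(answers_a, answers_b, key):
    """Ratio of first questions to first + second + third questions for one pair."""
    same_wrong = diff_wrong = one_wrong = 0
    for a, b, k in zip(answers_a, answers_b, key):
        a_wrong, b_wrong = a != k, b != k
        if a_wrong and b_wrong:
            if a == b:
                same_wrong += 1  # first questions: both wrong, same wrong answer
            else:
                diff_wrong += 1  # second questions: both wrong, different answers
        elif a_wrong or b_wrong:
            one_wrong += 1  # third questions: exactly one of the two is wrong
    total = same_wrong + diff_wrong + one_wrong
    # Assumption: a pair with no wrong answers at all gets index 0.0.
    return same_wrong / total if total else 0.0


def pairwise_similarity(responses, key):
    """Similarity evaluation index for every pair of candidates."""
    return {
        (i, j): similarity_index(responses[i], responses[j], key)
        for i, j in combinations(sorted(responses), 2)
    }


key = "ABCDABCD"
responses = {
    "s1": "ABCDDDDD",  # wrong on questions 5-7
    "s2": "ABCDDDDA",  # wrong on questions 5-8, same wrong answers as s1 on 5-7
    "s3": "ABCDABCD",  # all correct
}
sim = pairwise_similarity(responses, key)
```

For s1 and s2 there are three first questions, no second questions, and one third question, giving an index of 3 / 4 = 0.75. Either of them paired with s3 gives 0.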
[0073] Optionally, it further includes: determining the number of identical errors within each first abnormal group, and quantifying the consistency of incorrect questions among the candidates in each first abnormal group based on the number of identical errors within each group, wherein the number of identical errors within each group is used to indicate the number of questions in the first abnormal group where candidates have answered the same question incorrectly and have the same answer.
[0074] Specifically, after the processor 200 uses the density clustering algorithm to identify the first anomalous groups among the multiple candidates, it further determines the intra-cluster error count corresponding to each first anomalous group. The intra-cluster error count indicates the number of questions in the first anomalous group that the candidates have answered incorrectly with identical answers. For example, for a first anomalous group that includes several candidates, the intra-cluster error count is the number of questions on the same test paper that the candidates in that group answered incorrectly with the same answer.
[0075] The processor 200 can therefore determine the intra-cluster error count corresponding to each first anomalous group based on the same operation as described above. The specific steps will not be elaborated here.
[0076] Thus, once the processor 200 has determined the intra-cluster error count corresponding to each first anomalous group, it can quantify, based on that count, the consistency of incorrect answers among all candidates within each first anomalous group, which provides a direct measure of the abnormal structure within each first anomalous group.
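One possible reading of the intra-cluster error count in paragraphs [0073] to [0076] is sketched below: a question is counted when every candidate in the group gave the same answer and that answer is wrong. The application's own example may instead count per pair of candidates, so this group-wide interpretation, the function name, and the sample data are assumptions.

```python
def intra_cluster_error_count(group, responses, key):
    """Count questions on which all group members share one identical wrong answer."""
    count = 0
    for q, correct in enumerate(key):
        given = {responses[c][q] for c in group}
        # One distinct answer across the group, and it differs from the key.
        if len(given) == 1 and correct not in given:
            count += 1
    return count


key = "ABCDABCD"
responses = {
    "s1": "ABCDDDDD",
    "s2": "ABCDDDDA",
}
count = intra_cluster_error_count(["s1", "s2"], responses, key)
```

In this toy input the two candidates share the same wrong answer on questions 5, 6, and 7, so the count is 3. On question 8 their answers differ, so it is not counted.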
[0077] Thus, according to the first aspect of this embodiment, it is possible to reliably and accurately determine whether there is a potential abnormal cheating group among multiple candidates, thereby achieving the technical effect of ensuring the fairness of the examination.
[0078] In addition, referring to Figure 1, according to a second aspect of this embodiment, a storage medium is provided. The storage medium includes a stored program, wherein, when the program is executed, a processor performs any of the methods described above.
[0079] Therefore, according to this embodiment, it is possible to reliably and accurately determine whether there is a potential group of cheaters among multiple candidates, and to ensure the fairness of the examination.
[0080] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, because according to the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention.
[0081] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.
[0082] Example 2
[0083] Figure 4 shows an abnormal response group detection device based on density clustering according to this embodiment, which corresponds to the method described in Embodiment 1. Referring to Figure 4, the device includes: a first similarity evaluation index determination module 410, used to determine the error-sameness rate between any two candidates among multiple candidates and to construct a similarity evaluation index based on the error-sameness rate, wherein the similarity evaluation index indicates the proportion of the number of questions that the two candidates both answer incorrectly with the same incorrect answer to the total number of questions on which the two candidates have incorrect answers; a first abnormal group identification module 420, used to identify a first abnormal group among the multiple candidates using a density clustering algorithm, wherein the first abnormal group indicates a potential cheating group; a second similarity evaluation index determination module 430, used to determine the intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index indicates the average value of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and an abnormal group screening module 440, used to screen out a second abnormal group from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates within each first abnormal group, wherein the confidence level of the second abnormal group is greater than the confidence level of the first abnormal group.
[0084] Optionally, the abnormal group screening module 440 includes: a third similarity evaluation index determination module, used to determine, respectively, the first similarity evaluation index corresponding to the first candidate and the second candidate, the second similarity evaluation index corresponding to the second candidate and the third candidate, and the third similarity evaluation index corresponding to the first candidate and the third candidate in each first abnormal group, wherein the first candidate, the second candidate, and the third candidate represent any one of the multiple candidates; and an abnormal group screening submodule, used to remove the first abnormal group and screen out the second abnormal group when the first similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, the second similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, and the third similarity evaluation index is less than the corresponding intra-cluster similarity evaluation index.
[0085] Optionally, the first abnormal group identification module 420 includes: a minimum similarity threshold determination module, used to analyze the distance distribution between candidates based on a grid search method and pre-set a minimum similarity threshold for determining the similarity between candidates; a search range determination module, used to determine the search range of the neighborhood radius corresponding to the density clustering algorithm based on the minimum similarity threshold; a target parameter combination determination module, used to traverse various parameter combinations according to the search range of the neighborhood radius and select the target parameter combination that makes the clustering effect of the density clustering algorithm the best, wherein the parameter combination includes the neighborhood radius and the minimum number of samples; and a first abnormal group identification module, used to identify the first abnormal group among multiple candidates using the density clustering algorithm and according to the target parameter combination.
[0086] Optionally, the first similarity evaluation index determination module 410 includes: a first question quantity determination module, used to determine the number of first questions in which any two candidates among multiple candidates answer the same question incorrectly at the same time, and the incorrect answers are the same; a second question quantity determination module, used to determine the number of second questions in which any two candidates among multiple candidates answer the same question incorrectly at the same time, and the incorrect answers are different; a third question quantity determination module, used to determine the number of third questions in which any two candidates among multiple candidates answer the same question incorrectly, and the other candidate answers correctly; and a ratio calculation module, used to calculate the ratio of the number of first questions to the total number of first questions, the total number of second questions, and the total number of third questions, and to determine the ratio as the similarity evaluation index corresponding to any two candidates.
[0087] Optionally, the device further includes: a consistency quantification module, used to determine the number of identical questions within each cluster corresponding to each first abnormal group, and to quantify the consistency of each examinee in each first abnormal group on the wrong questions based on the number of identical questions within each cluster, wherein the number of identical questions within each cluster is used to indicate the number of questions in the first abnormal group where the examinees have answered the same questions incorrectly and have the same answers.
[0088] Therefore, according to this embodiment, it is possible to reliably and accurately determine whether there is a potential group of cheaters among multiple candidates, and to ensure the fairness of the examination.
[0089] Example 3
[0090] Figure 5 shows an abnormal response group detection device based on density clustering according to this embodiment, which corresponds to the method described in Embodiment 1. Referring to Figure 5, the device includes: a processor 510; and a memory 520 connected to the processor 510, for providing the processor 510 with instructions to process the following steps: determining the error-sameness rate between any two candidates among a plurality of candidates, and constructing a similarity evaluation index based on the error-sameness rate, wherein the similarity evaluation index indicates the proportion of the number of questions that the two candidates both answer incorrectly with the same incorrect answer to the total number of questions on which the two candidates have incorrect answers; identifying a first abnormal group among the plurality of candidates using a density clustering algorithm, wherein the first abnormal group indicates a potential cheating group; determining an intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index indicates the average of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and screening out a second abnormal group from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates within each first abnormal group, wherein the confidence level of the second abnormal group is greater than the confidence level of the first abnormal group.
[0091] Optionally, the operation of selecting second abnormal groups from each first abnormal group based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group includes: determining, respectively, the first similarity evaluation index corresponding to the first candidate and the second candidate, the second similarity evaluation index corresponding to the second candidate and the third candidate, and the third similarity evaluation index corresponding to the first candidate and the third candidate in each first abnormal group, wherein the first candidate, the second candidate, and the third candidate represent any one of the multiple candidates; and removing the first abnormal group and obtaining the second abnormal group if the first similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, the second similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, and the third similarity evaluation index is less than the corresponding intra-cluster similarity evaluation index.
[0092] Optionally, the operation of identifying the first anomalous group among multiple candidates using a density clustering algorithm includes: analyzing the distance distribution among candidates based on a grid search method, and pre-setting a minimum similarity threshold for determining the similarity between candidates; determining the search range of the neighborhood radius corresponding to the density clustering algorithm based on the minimum similarity threshold; traversing various parameter combinations according to the search range of the neighborhood radius, and selecting the target parameter combination that makes the clustering effect of the density clustering algorithm optimal, wherein the parameter combination includes the neighborhood radius and the minimum number of samples; and identifying the first anomalous group among multiple candidates using the density clustering algorithm and according to the target parameter combination.
[0093] Optionally, the operation of constructing a similarity evaluation index based on the error-sameness rate includes: determining the number of first questions, on which any two candidates among multiple candidates both answer incorrectly and have the same wrong answer; determining the number of second questions, on which the two candidates both answer incorrectly and have different wrong answers; determining the number of third questions, on which one of the two candidates answers incorrectly and the other candidate answers correctly; and calculating the proportion of the number of first questions to the total number of first, second, and third questions, and determining that proportion as the similarity evaluation index corresponding to the two candidates.
[0094] Optionally, the steps further include: determining the number of identical errors within each first abnormal group, and quantifying the consistency of incorrect answers among the candidates in each first abnormal group based on the number of identical errors within each group, wherein the number of identical errors within each group indicates the number of questions in the first abnormal group that the candidates have answered incorrectly with the same answer.
[0095] Therefore, according to this embodiment, it is possible to reliably and accurately determine whether there is a potential group of cheaters among multiple candidates, and to ensure the fairness of the examination.
[0096] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0097] In the above embodiments of the present invention, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0098] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual coupling, direct coupling, or communication connection may be through some interfaces; the indirect coupling or communication connection between units or modules may be electrical or other forms.
[0099] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0100] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0101] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0102] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A method for detecting abnormal response groups based on density clustering, characterized in that it comprises: determining the error rate between any two candidates among a plurality of candidates, and constructing a similarity evaluation index based on the error rate, wherein the similarity evaluation index indicates the proportion of the number of questions that the two candidates both answer incorrectly with the same wrong answer to the total number of questions in which the two candidates have wrong answers to the same question; identifying first abnormal groups among the plurality of candidates using a density clustering algorithm, wherein each first abnormal group indicates a potential cheating group; determining an intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index indicates the average of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and selecting a second abnormal group from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group, wherein the confidence level of the second abnormal group is greater than that of the first abnormal group.
2. The method according to claim 1, characterized in that the operation of selecting a second abnormal group from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group includes: determining, in each first abnormal group, a first similarity evaluation index corresponding to a first candidate and a second candidate, a second similarity evaluation index corresponding to the second candidate and a third candidate, and a third similarity evaluation index corresponding to the first candidate and the third candidate, wherein the first candidate, the second candidate, and the third candidate each represent any one of the plurality of candidates; and if the first similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, the second similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, and the third similarity evaluation index is less than the corresponding intra-cluster similarity evaluation index, removing that first abnormal group, so that the second abnormal group is obtained by this screening.
3. The method according to claim 1, characterized in that the operation of identifying the first abnormal groups among the plurality of candidates using a density clustering algorithm includes: analyzing the distance distribution between candidates based on a grid search method, and presetting a minimum similarity threshold for determining the similarity between the candidates; determining, based on the minimum similarity threshold, the search range of the neighborhood radius corresponding to the density clustering algorithm; traversing parameter combinations based on the search range of the neighborhood radius, and selecting the target parameter combination that gives the best clustering effect for the density clustering algorithm, wherein each parameter combination includes the neighborhood radius and the minimum number of samples; and identifying the first abnormal groups among the plurality of candidates using the density clustering algorithm with the target parameter combination.
4. The method according to claim 1, characterized in that the operation of constructing a similarity evaluation index based on the error rate includes: determining the number of first questions, which are questions that any two candidates among the plurality of candidates both answer incorrectly with the same wrong answer; determining the number of second questions, which are questions that the two candidates both answer incorrectly with different wrong answers; determining the number of third questions, which are questions that one of the two candidates answers incorrectly and the other answers correctly; and calculating the ratio of the number of first questions to the total number of first, second, and third questions, and determining the ratio as the similarity evaluation index corresponding to the two candidates.
5. The method according to claim 1, characterized in that it further comprises: determining the number of identical questions within the cluster corresponding to each first abnormal group, and quantifying, based on that number, the consistency of the candidates' incorrect answers in each first abnormal group, wherein the number of identical questions within the cluster indicates the number of questions that the candidates in the first abnormal group answer incorrectly with the same wrong answer.
6. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program is run, a processor performs the method according to any one of claims 1 to 5.
7. An abnormal response group detection device based on density clustering, characterized in that it comprises: a first similarity evaluation index determination module, used to determine the error rate between any two candidates among a plurality of candidates and construct a similarity evaluation index based on the error rate, wherein the similarity evaluation index indicates the proportion of the number of questions that the two candidates both answer incorrectly with the same wrong answer to the total number of questions in which the two candidates have wrong answers to the same question; a first abnormal group identification module, used to identify first abnormal groups among the plurality of candidates using a density clustering algorithm, wherein each first abnormal group indicates a potential cheating group; a second similarity evaluation index determination module, used to determine the intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index indicates the average of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and an abnormal group filtering module, used to select a second abnormal group from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group, wherein the confidence level of the second abnormal group is greater than that of the first abnormal group.
8. The apparatus according to claim 7, characterized in that the abnormal group filtering module includes: a third similarity evaluation index determination module, used to determine, in each first abnormal group, a first similarity evaluation index corresponding to a first candidate and a second candidate, a second similarity evaluation index corresponding to the second candidate and a third candidate, and a third similarity evaluation index corresponding to the first candidate and the third candidate, wherein the first candidate, the second candidate, and the third candidate each represent any one of the plurality of candidates; and an abnormal group filtering submodule, used to remove the first abnormal group, so that the second abnormal group is obtained by this screening, when the first similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, the second similarity evaluation index is greater than the corresponding intra-cluster similarity evaluation index, and the third similarity evaluation index is less than the corresponding intra-cluster similarity evaluation index.
9. The apparatus according to claim 7, characterized in that the first abnormal group identification module includes: a minimum similarity threshold determination module, used to analyze the distance distribution between candidates based on a grid search method and to preset a minimum similarity threshold for determining the similarity between the candidates; a search range determination module, used to determine, based on the minimum similarity threshold, the search range of the neighborhood radius corresponding to the density clustering algorithm; a target parameter combination determination module, used to traverse parameter combinations based on the search range of the neighborhood radius and select the target parameter combination that gives the best clustering effect for the density clustering algorithm, wherein each parameter combination includes the neighborhood radius and the minimum number of samples; and a first abnormal group identification submodule, used to identify the first abnormal groups among the plurality of candidates using the density clustering algorithm with the target parameter combination.
10. An abnormal response group detection device based on density clustering, characterized in that it comprises: a processor; and a memory, connected to the processor, for providing the processor with instructions to perform the following processing steps: determining the error rate between any two candidates among a plurality of candidates, and constructing a similarity evaluation index based on the error rate, wherein the similarity evaluation index indicates the proportion of the number of questions that the two candidates both answer incorrectly with the same wrong answer to the total number of questions in which the two candidates have wrong answers to the same question; identifying first abnormal groups among the plurality of candidates using a density clustering algorithm, wherein each first abnormal group indicates a potential cheating group; determining an intra-cluster similarity evaluation index corresponding to each first abnormal group, wherein the intra-cluster similarity evaluation index indicates the average of the similarity evaluation indices corresponding to any two candidates within the first abnormal group; and selecting a second abnormal group from the first abnormal groups based on the intra-cluster similarity evaluation index corresponding to each first abnormal group and the similarity evaluation index corresponding to any two candidates in each first abnormal group, wherein the confidence level of the second abnormal group is greater than that of the first abnormal group.
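For illustration only, the intra-cluster similarity evaluation index of claim 1 and the screening condition of claim 2 can be sketched in Python as follows. This is one literal reading of the claim language, not a definitive implementation: a first abnormal group is removed as soon as any ordered triple of its members has two pairwise indices above the group average and the third below it. The function names, the `sim` dictionary keyed by `frozenset` pairs, and the sample values are assumptions, and each group is assumed to contain at least two candidates.

```python
from itertools import combinations, permutations


def intra_cluster_index(members, sim):
    """Average pairwise similarity index over all candidate pairs in the group."""
    pairs = list(combinations(members, 2))
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)


def passes_screening(members, sim):
    """Return False if some triple meets the removal condition of claim 2."""
    avg = intra_cluster_index(members, sim)
    for a, b, c in permutations(members, 3):
        first = sim[frozenset((a, b))]    # first and second candidate
        second = sim[frozenset((b, c))]   # second and third candidate
        third = sim[frozenset((a, c))]    # first and third candidate
        if first > avg and second > avg and third < avg:
            return False                  # inconsistent triple: remove the group
    return True


# Hypothetical pairwise indices for two first abnormal groups.
loose = {
    frozenset(("x", "y")): 0.9,
    frozenset(("y", "z")): 0.8,
    frozenset(("x", "z")): 0.2,
}
tight = {
    frozenset(("x", "y")): 0.75,
    frozenset(("y", "z")): 0.75,
    frozenset(("x", "z")): 0.75,
}
members = ["x", "y", "z"]
second_groups = [name for name, sim in (("loose", loose), ("tight", tight))
                 if passes_screening(members, sim)]
print(second_groups)  # only the consistent group remains
```

In the `loose` group, candidate y is similar to both x and z while x and z are not similar to each other, which is the non-transitive pattern the screening condition targets; the `tight` group has no such triple and is kept as a second abnormal group.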