Model risk detection method and device, storage medium, product and electronic equipment
By using a marking model to detect the model under test, selecting detection tasks based on question types, and combining the use of small and large models, the problems of time-consuming, labor-intensive, and low-accuracy manual annotation are solved, achieving efficient and accurate model risk detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
- Filing Date
- 2026-03-04
- Publication Date
- 2026-06-19
Description
Technical Field
[0001] This application relates to the field of computer technology, and more specifically, to a model risk detection method, apparatus, storage medium, product, and electronic device in the field of computer technology.
Background Technology
[0002] With the development of internet and artificial intelligence technologies, models have been applied to all aspects of people's daily lives. For example, Large Language Models (LLMs) can engage in question-and-answer sessions to solve everyday problems. However, after model training, performance testing is required to ensure a good user experience and prevent the model from outputting risky information. Current technologies use manually labeled questions and answers to test the model, which consumes a lot of time and manpower. Furthermore, because a large number of sample questions are needed for model testing, multiple people often need to collaborate on the labeling work. Since different staff members have different understandings of the model's functions, application scenarios, and technical content, they may label the same questions differently, leading to a decrease in the accuracy of model detection. Therefore, there is a need to provide an accurate and efficient model detection method.
Summary of the Invention
[0003] This application provides a model risk detection method, apparatus, storage medium, product, and electronic device. The method can perform detection processing on the model to be detected using a marking model, thereby improving the efficiency of model risk detection. Furthermore, it can call different detection tasks to perform detection processing on answers to different types of questions, thereby improving the targeting of detection for different types of questions and thus improving the accuracy of model risk detection.
[0004] In a first aspect, embodiments of this application provide a model risk detection method, the method comprising: Input the detection question into the model to be detected, and obtain the target answer output by the model to be detected; Obtain the detection type corresponding to the detection problem, and determine the target detection task corresponding to the detection type; The target answer is detected by calling the target detection task using the marking model, and the detection result of the model to be detected is obtained.
[0005] The above technical solution enables the detection and processing of the model to be tested based on the marking model, which improves the efficiency of model risk detection. Furthermore, by calling different detection tasks to detect and process answers to different types of questions, the targeting of detection for different types of questions is improved, thereby enhancing the accuracy of model risk detection.
[0006] In conjunction with the first aspect, in some possible implementations, the detection type includes one or more of the following: security detection type, authenticity detection type, and horizontal comparison detection type.
[0007] In conjunction with the first aspect, in some possible implementations, obtaining the detection type corresponding to the detection problem and determining the target detection task corresponding to the detection type includes: If the detection type corresponding to the detection problem is the security detection type, then the target detection task corresponding to the security detection type is a point-by-point scoring task; If the detection type corresponding to the detection problem is the authenticity detection type, then the target detection task corresponding to the authenticity detection type is a pairwise comparison task; If the detection type corresponding to the detection problem is the horizontal comparison detection type, then the target detection task corresponding to the horizontal comparison detection type is a scoring comparison task.
[0008] The above technical solution enables the classification of detection problems and the invocation of corresponding detection tasks to perform targeted risk detection on the answers to different types of detection problems, thus ensuring the accuracy of risk detection of the model under test in different aspects.
[0009] In conjunction with the first aspect, in some possible implementations, the target detection task is a point-by-point scoring task or a pairwise comparison task; The step of using a marking model to invoke the target detection task to detect the target answer and obtain the detection result of the model to be detected includes: The target answer is detected and processed by the target detection task using the marking model, and a target detection score is obtained for the target answer. If the target detection score is greater than the preset score, the detection result is determined to be a risk-free result. If the target detection score is less than a preset score, the detection result is determined to be a risky result.
[0010] The above technical solution enables the determination of test results based on the test scores generated by the test processing, and the digital results improve the accuracy of the test results.
[0011] In conjunction with the first aspect, in some possible implementations, the marking model includes a small marking model and a large marking model; The step of using a marking model to call the target detection task to detect the target answer and obtain the target detection score of the target answer includes: The target answer is detected and processed by the small marking model calling the target detection task, and an initial detection score is obtained. If the initial detection score is not within the re-examination score range, the initial detection score is determined as the target detection score; If the initial detection score is within the re-examination score range, the target detection task is called by the large marking model to detect and process the target answer, and the target detection score is obtained.
[0012] The above technical solution enables risk detection processing by combining large and small models, which not only ensures the accuracy of detection and scoring but also saves computing resources and improves response efficiency.
[0013] In conjunction with the first aspect, in some possible implementations, the target detection task is a pairwise comparison task; The step of using a marking model to call the target detection task to detect the target answer and obtain the target detection score of the target answer includes: Retrieve the preset answers corresponding to the detection questions from the preset question bank; The pairwise comparison task is invoked using a marking model to calculate the similarity score between the target answer and the preset answer; The similarity score is determined as the target detection score for the target answer.
[0014] The above technical solution enables the detection of authenticity risks in the model to be tested based on questions and answers in a preset question bank, thereby improving the targeting and accuracy of authenticity risk detection.
[0015] In conjunction with the first aspect, in some possible implementations, the target detection task is a scoring comparison task; The step of using a marking model to invoke the target detection task to detect the target answer and obtain the detection result of the model to be detected includes: Determine at least one comparison model corresponding to the model to be detected; The marking model is used to invoke the scoring comparison task to obtain the comparison answers of each comparison model for the detection question; The target answer and each of the comparison answers are processed to obtain a ranking result of the target answer and each of the comparison answers.
[0016] The above technical solution enables risk detection in the horizontal comparison of the model to be tested by combining the comparison answer of the comparison model, thereby improving the pertinence and accuracy of the horizontal comparison detection.
[0017] In a second aspect, embodiments of this application provide a model risk detection device, the device comprising: A problem detection unit, used to input the detection problem into the model to be detected and obtain the target answer output by the model to be detected; A detection task determination unit, used to obtain the detection type corresponding to the detection problem and determine the target detection task corresponding to the detection type; A detection result acquisition unit, used to call the target detection task using the marking model to detect the target answer and obtain the detection result of the model to be detected.
[0018] In a third aspect, embodiments of this application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the above-described method steps.
[0019] In a fourth aspect, embodiments of this application provide a computer program product storing a plurality of instructions adapted to be loaded by a processor to execute the above-described method steps.
[0020] In a fifth aspect, embodiments of this application provide an electronic device that may include a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to execute the above-described method steps.
[0021] In one or more embodiments of this application, a detection question is input into the model to be detected, the target answer output by the model is obtained, the detection type corresponding to the detection question is obtained, the target detection task corresponding to the detection type is determined, and a marking model is used to call the target detection task to process the target answer, thereby obtaining the detection result of the model to be detected. By using a marking model to process the model to be detected, the efficiency of model risk detection is improved, and by calling different detection tasks to process different types of question answers, the targeting of detection for different types of questions is improved, thereby improving the accuracy of model risk detection.
Attached Figure Description
[0022] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0023] Figure 1 is an example diagram illustrating a model risk detection method provided in an embodiment of this application;
Figure 2 is a schematic flowchart of a model risk detection method provided in an embodiment of this application;
Figure 3 is a schematic diagram of a detection process provided in an embodiment of this application;
Figure 4 is a schematic diagram of a score acquisition process based on hybrid marking, provided in an embodiment of this application;
Figure 5 is a flowchart illustrating a pairwise comparison task provided in an embodiment of this application;
Figure 6 is a flowchart illustrating a scoring comparison task provided in an embodiment of this application;
Figure 7 is a schematic diagram of the structure of a model risk detection device provided in an embodiment of this application;
Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0024] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0025] With the development of internet and artificial intelligence technologies, people have become accustomed to using various models to solve problems encountered in life and work. For example, large language models possess semantic understanding capabilities, enabling them to understand user input and output answers accordingly. Understandably, before being deployed, models require extensive training based on numerous samples to optimize their capabilities. Furthermore, after training, risk detection is necessary to ensure the accuracy of the model's output. In existing technologies, questions are typically constructed manually, and answers are manually labeled before being input into the model for risk detection. However, manual labeling is time-consuming, labor-intensive, and results in a limited number of labeled answers. Moreover, due to varying understandings of the model's functions, application scenarios, and technical content among different individuals, the labeled content can differ, reducing the accuracy of model risk detection. This application provides a model risk detection device that can select the appropriate risk detection method based on the question type and achieve automatic detection, thereby improving the targeting, accuracy, and efficiency of risk monitoring. The model risk detection method provided in this application can be implemented using a computer program and can run on a model risk detection device based on the von Neumann architecture. This computer program can be integrated into applications or run as a standalone utility application.
[0026] Please also refer to Figure 1, which is an example diagram of model risk detection provided in an embodiment of this application. The model to be detected can be a model that has completed training and needs risk detection. A detection question can be input into the model to be detected, and the target answer output by the model in response to the detection question can be obtained. The model risk detection device can construct a marking model, which is a model that can be used to automatically perform risk detection on the model to be detected. The model risk detection device can construct at least one detection task in the marking model. Different detection tasks use different risk detection methods, and different detection tasks can be used to detect different aspects of the risk of the model to be detected, such as security, authenticity, and horizontal comparison. The model risk detection device can classify the detection questions to determine which aspect of risk each detection question is suitable for detecting, thereby controlling the marking model to call the corresponding detection task to detect the target answer. As shown in Figure 1, the marking model can have at least three different detection tasks, such as detection task A, detection task B, and detection task C. If the detection task corresponding to the detection question is determined to be detection task B, the marking model can call detection task B to perform detection processing on the target answer, thereby obtaining the detection result of the model to be detected. The detection result can be used to indicate whether the model to be detected has any risks.
[0027] The model risk detection method provided in this application will be described in detail below with reference to specific embodiments.
[0028] Please refer to Figure 2, which is a schematic flowchart of a model risk detection method provided in an embodiment of this application. As shown in Figure 2, the method described in this application embodiment may include the following steps S102-S106.
[0029] S102, input the detection question into the model to be detected, and obtain the target answer output by the model to be detected.
[0030] Specifically, the model risk detection device can acquire detection questions for risk detection of the model to be tested. These questions can be input into the model, allowing it to respond and output the target answer. The detection questions can be manually input by relevant personnel or retrieved from a pre-set question bank by the device. These questions can be pre-set manually by relevant personnel or generated by the device based on other large language models that have already completed training and passed risk detection.
[0031] S104, Obtain the detection type corresponding to the detection problem, and determine the target detection task corresponding to the detection type.
[0032] Specifically, in order to detect risks in various aspects of the performance of the model to be tested, the model risk detection device can categorize detection problems into different detection types and call different detection tasks for different detection types. It can therefore obtain the detection type corresponding to the detection problem, where different detection types correspond to different aspects of the risk of the model to be tested, such as security, authenticity, and horizontal comparison. It can then determine the target detection task corresponding to that detection type.
[0033] S106, the marking model calls the target detection task to detect the target answer and obtain the detection result of the model to be detected.
[0034] Specifically, the target answer can be input into the marking model, and the marking model can call the target detection task to detect the target answer, thereby obtaining the detection result of the model to be tested. The detection result is used to indicate whether there is a risk in the model to be tested. For example, the detection result can be a risk-free result or a risky result. A risk-free result means that there is no risk in the answer of the model to be tested for the detection question, while a risky result means that there is a risk in the answer of the model to be tested for the detection question. When the detection result is a risky result, risk warning information can be output to prompt relevant staff to conduct further model training on the model to be tested.
[0035] In this embodiment, the detection question is input into the model to be detected, the target answer output by the model is obtained, the detection type corresponding to the detection question is determined, the target detection task corresponding to the detection type is identified, and the marking model is used to call the target detection task to process the target answer, thereby obtaining the detection result of the model to be detected. By using the marking model to process the model to be detected, the efficiency of model risk detection is improved, and by calling different detection tasks to process different types of question answers, the targeting of detection for different types of questions is improved, thus enhancing the accuracy of model risk detection.
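Steps S102-S106 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the source does not define a programming interface, so `MarkingModel`, `run_detection`, the callable signatures, and the result strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical types: the model to be detected maps a question to an answer,
# and a detection task maps (question, answer) to a detection result string.
ModelUnderTest = Callable[[str], str]
DetectionTask = Callable[[str, str], str]


@dataclass
class MarkingModel:
    """Hypothetical marking model holding one detection task per detection type."""
    tasks: Dict[str, DetectionTask]

    def detect(self, detection_type: str, question: str, answer: str) -> str:
        # S104: determine the target detection task for the detection type.
        target_task = self.tasks[detection_type]
        # S106: call the target detection task on the target answer.
        return target_task(question, answer)


def run_detection(model: ModelUnderTest, marking_model: MarkingModel,
                  question: str, detection_type: str) -> str:
    # S102: input the detection question and obtain the target answer.
    target_answer = model(question)
    return marking_model.detect(detection_type, question, target_answer)
```

In this sketch the detection type is passed in by the caller; how the device classifies a detection question into a type is not specified here and is left outside the example.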
[0036] In one or more embodiments of this application, the detection type includes one or more of the following: security detection type, authenticity detection type, and horizontal comparison detection type.
[0037] Among them, the security detection type of detection questions is used to detect the security of the answers given by the model under test. It can determine whether there is any content in the target answer that affects security. After the model under test is launched for users to use, there may be situations where users ask the model under test to obtain unsafe information, such as illegal information. For example, a user may send the model under test the question "Please tell me how guns are manufactured". If the model under test's answer actually contains the method of manufacturing guns, it can be determined that the answer contains content that affects security, and the detection result can be a risky result.
[0038] The authenticity detection type of test question is used to detect the authenticity of the answer given by the model to be tested. It can determine whether the target answer is the correct answer to the test question. For example, the test question can be "What day is International Labor Day?" If the answer of the model to be tested is "May 1st", it means that there is no authenticity problem and the test result can be a risk-free result. If the answer of the model to be tested is "May 2nd", it means that there is an authenticity problem and the test result can be a risky result.
[0039] The horizontal comparison detection type of detection question is used to evaluate the answer of the model to be tested through horizontal comparison. For the same question, the answer of the model to be tested is compared with the answers of other models that have completed model training and passed risk detection, so as to determine the performance difference between the model to be tested and the other models.
[0040] In one or more embodiments of this application, step S104 may include the following steps: If the detection type corresponding to the detection problem is a security detection type, then the target detection task corresponding to the security detection type is a point-by-point scoring task; if the detection type corresponding to the detection problem is an authenticity detection type, then the target detection task corresponding to the authenticity detection type is a pairwise comparison task; if the detection type corresponding to the detection problem is a horizontal comparison detection type, then the target detection task corresponding to the horizontal comparison detection type is a scoring comparison task.
[0041] Specifically, the model risk detection device can construct at least one detection task in the marking model. Different detection tasks can perform risk detection for different types of detection questions. The model risk detection device can construct point-by-point scoring (pointwise grading) tasks, pairwise comparison tasks, and scoring comparison tasks in the marking model. Among them, the scoring comparison task can be a with/without reference (W / WO Reference) grading / comparison task. The point-by-point scoring task can output an independent detection result for each input target answer. The pairwise comparison task can compare two inputs, such as comparing the target answer with the true answer of the detection question, and obtain the detection result by judging the consistency or similarity of the two answers. The scoring comparison task can rank multiple inputs, with or without a reference, to obtain the detection result.
[0042] If the detection type of the problem is a security detection type, then the security detection type corresponds to a point-by-point scoring task. If the detection type of the problem is an authenticity detection type, then the authenticity detection type corresponds to a pairwise comparison task. If the detection type of the problem is a horizontal comparison detection type, then the horizontal comparison detection type corresponds to a scoring comparison task.
[0043] In this embodiment, by classifying the detection questions and calling the corresponding detection tasks to perform targeted risk detection on the answers to different types of detection questions, the accuracy of risk detection of the model to be detected in different aspects is ensured.
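The correspondence between detection types and detection tasks described in this embodiment can be written as a simple lookup table. The sketch below is illustrative only; the enum and function names are hypothetical and are not defined by the source.

```python
from enum import Enum


class DetectionType(Enum):
    SECURITY = "security"
    AUTHENTICITY = "authenticity"
    HORIZONTAL_COMPARISON = "horizontal_comparison"


class DetectionTask(Enum):
    POINT_BY_POINT_SCORING = "point_by_point_scoring"
    PAIRWISE_COMPARISON = "pairwise_comparison"
    SCORING_COMPARISON = "scoring_comparison"


# Mapping taken from the embodiment: one target detection task per detection type.
TASK_FOR_TYPE = {
    DetectionType.SECURITY: DetectionTask.POINT_BY_POINT_SCORING,
    DetectionType.AUTHENTICITY: DetectionTask.PAIRWISE_COMPARISON,
    DetectionType.HORIZONTAL_COMPARISON: DetectionTask.SCORING_COMPARISON,
}


def select_target_task(detection_type: DetectionType) -> DetectionTask:
    """Return the target detection task for a detection type (step S104)."""
    return TASK_FOR_TYPE[detection_type]
```

A table keeps the mapping in one place, so adding a further detection type would only require a new entry rather than a new branch.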
[0044] When the detection problem corresponds to a security detection type or an authenticity detection type, that is, when the target detection task is a point-by-point scoring task or a pairwise comparison task, the obtained detection result can be a detection score. In other words, the size of the detection score can be used to intuitively evaluate the security and authenticity of the model to be detected. For example, the larger the detection score, the stronger the security or authenticity of the model to be detected; conversely, the smaller the detection score, the weaker the security or authenticity of the model to be detected.
[0045] Please refer to Figure 3, which is a schematic flowchart of a detection process provided in an embodiment of this application. As shown in Figure 3, in one or more embodiments of this application, step S106 may include the following steps S202-S206.
[0046] S202, the marking model is used to call the target detection task to detect and process the target answer, and obtain the target detection score of the target answer.
[0047] Specifically, the target detection task is a detection task determined based on the detection type of the detection question, used to detect the risk of the target answer. The marking model can call the target detection task to detect the target answer and obtain the target detection score output by the marking model. The target detection score is used to represent the degree of safety or authenticity of the target answer of the model to be detected for the detection question.
[0048] S204. If the target detection score is greater than the preset score, the detection result is determined to be a risk-free result.
[0049] Specifically, if the target detection score is greater than the preset score, it indicates that the target answer given by the model to be detected for the detection question is safe or authentic, and the detection result can be determined to be a risk-free result. The preset score is the threshold used to decide which detection result a score represents. It can be an initial setting of the model risk detection device or be set in advance by relevant personnel. For example, if the detection score ranges from 0 to 100, the preset score can be 50.
[0050] S206. If the target detection score is less than the preset score, the detection result is determined to be a risky result.
[0051] Specifically, if the target detection score is less than the preset score, it indicates that the target answer of the model to be detected has a safety or authenticity risk, and the detection result can be determined as a risky result.
[0052] In this embodiment, a marking model is used to invoke a target detection task to detect the target answer and obtain a target detection score. If the target detection score is greater than a preset score, the detection result is determined to be a risk-free result; if the target detection score is less than the preset score, the detection result is determined to be a risky result. Judging the detection result through the detection score generated by the detection process, the digital result improves the accuracy of the detection result.
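The decision in steps S204 and S206 reduces to a threshold comparison, sketched below. The function name and result strings are hypothetical. The source does not say what happens when the score equals the preset score, so the sketch returns `None` in that case rather than guessing.

```python
from typing import Optional

PRESET_SCORE = 50.0  # example value from the text, for scores in the range 0-100


def detection_result(target_score: float,
                     preset_score: float = PRESET_SCORE) -> Optional[str]:
    """Map a target detection score to a detection result (S204 / S206)."""
    if target_score > preset_score:
        return "risk-free"  # S204
    if target_score < preset_score:
        return "risky"      # S206
    # Assumption: the equal case is not specified in the source, so it is left undecided.
    return None
```

A deployment would need to pick a rule for the boundary case, for example treating it as risky to stay on the cautious side.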
[0053] The marking models created by the model risk detection device can include small marking models and large marking models. Small models have fewer parameters and lower computational resource requirements, and can process shorter data lengths, making them suitable for simpler tasks, such as outputting raw output scores (logits). Large models have more parameters and higher computational resource requirements, possessing stronger generation and understanding capabilities, and can be used for rule-based classification and interpretation of data. Therefore, large marking models output more accurate results but require more computational resources and have longer response times, while small marking models require fewer computational resources and have shorter response times. Thus, a combination of small and large marking models can be used for risk detection, thereby improving efficiency while maintaining accuracy.
[0054] Please refer to Figure 4, which is a schematic diagram of a score acquisition process based on hybrid marking provided in an embodiment of this application. As shown in Figure 4, in one or more embodiments of this application, step S202 may include the following steps S302-S306.
[0055] S302, the small marking model is used to call the target detection task to detect and process the target answer and obtain the initial detection score.
[0056] Specifically, a small marking model can be used to call the target detection task to detect the target answer and obtain the initial detection score output by the small marking model. The initial detection score is the detection score obtained based on the computing power of the small marking model, which represents the preliminary detection result for the target answer of the model to be detected.
[0057] S304. If the initial detection score is not within the range of the re-examination score, the initial detection score shall be determined as the target detection score.
[0058] Specifically, to avoid ambiguous and inaccurate initial detection results from the small-scale marking model, the model risk detection device can set a re-examination score range. This range is used to determine whether further calculation and analysis using the large-scale marking model is needed based on the initial detection score. The re-examination score range includes a preset score, which can be the initial setting for the model risk detection device or set by relevant personnel. For example, if the detection score range is 0-100, the preset score could be 50, and the re-examination score range could be 40-60. If the initial detection score is not within the re-examination score range, it indicates that the preliminary detection result obtained by the small-scale marking model is accurate, and the initial detection score can be directly determined as the target detection score. For example, if the initial detection score is 90, since the initial detection score is large and not within the re-examination score range, the initial detection score can be determined as the target detection score. Furthermore, since the target detection score is greater than the preset score, the detection result can be determined as a risk-free result.
[0059] S306. If the initial detection score is within the range of the re-examination score, the large-scale marking model is used to call the target detection task to detect and process the target answer, and obtain the target detection score.
[0060] Specifically, if the initial detection score falls within the re-examination score range, it indicates that the preliminary detection result obtained by the small marking model may be inaccurate. For example, the re-examination score range may be 40-60 and the initial detection score may be 55, which may not be the accurate value of the detection score. Due to the limited computing and understanding capabilities of the small model, the accurate value of the detection score may fluctuate around 55. To avoid the situation where the accurate value of the detection score is less than the preset score while the initial detection score is greater than the preset score, which would produce an incorrect detection result, the large marking model can call the target detection task to detect and process the target answer and obtain a more accurate target detection score.
[0061] In this embodiment, a small-scale marking model is used to call a target detection task to detect and process the target answer, obtaining an initial detection score. If the initial detection score is not within the range of the re-examination score, it is determined as the target detection score. If the initial detection score is within the range of the re-examination score, a large-scale marking model is used to call a target detection task to detect and process the target answer, obtaining the target detection score. By combining the large and small models for risk detection processing, the accuracy of the detection score is ensured while saving computational resources and improving response efficiency.
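The hybrid marking flow of steps S302-S306 can be sketched as a two-stage cascade. The names below are hypothetical, the two scorers are passed in as plain callables standing in for the small and large marking models, and treating the range bounds as inclusive is an assumption that the source does not state.

```python
from typing import Callable, Tuple

Scorer = Callable[[str], float]  # hypothetical: target answer -> detection score


def hybrid_score(target_answer: str,
                 small_scorer: Scorer,
                 large_scorer: Scorer,
                 recheck_range: Tuple[float, float] = (40.0, 60.0)) -> Tuple[float, bool]:
    """Return (target detection score, whether the large marking model was used)."""
    # S302: the small marking model produces the initial detection score.
    initial_score = small_scorer(target_answer)
    low, high = recheck_range
    # S306: a score inside the re-examination range is re-scored by the large model.
    # Assumption: both bounds are treated as inside the range.
    if low <= initial_score <= high:
        return large_scorer(target_answer), True
    # S304: otherwise the initial score is kept as the target detection score.
    return initial_score, False
```

With the example values from the text, an initial score of 90 is kept as is, while an initial score of 55 triggers the large marking model, so the more expensive model runs only for answers near the preset score.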
[0062] When the detection type corresponding to the detection question is the authenticity detection type, that is, when the target detection task is a pairwise comparison task, the marking model needs to call the pairwise comparison task and compare the target answer with the accurate answer to the detection question, thereby determining the authenticity of the target answer. Therefore, the accurate answer to the detection question needs to be set in advance.
[0063] Please refer to Figure 5, which provides a flowchart of a pairwise comparison task in an embodiment of this application. As shown in Figure 5, in one or more embodiments of this application, step S202 may include the following steps S402-S406.
[0064] S402. Retrieve the preset answer corresponding to the detection question from the preset question bank.
[0065] Specifically, the model risk detection device can store at least one detection question and its corresponding preset answer in a preset question bank. The preset answer is the accurate answer to the detection question. Therefore, the preset answer corresponding to the detection question can be looked up in the preset question bank. The detection questions and their corresponding preset answers in the preset question bank can be set manually by relevant staff, or generated by the model risk detection device based on other large language models that have completed model training and passed risk detection.
[0066] Optionally, to ensure consistency in thought process and understanding between the detection questions and the preset answers, both the detection questions and the preset answers can be generated based on other large language models that have already completed model training and passed risk detection.
[0067] S404. Use a marking model to invoke a pairwise comparison task to calculate the similarity score between the target answer and the preset answer.
[0068] Specifically, the marking model calls a pairwise comparison task to calculate the similarity score between the target answer and the preset answer. The similarity score can be used to represent the similarity between the target answer and the preset answer. The higher the similarity score, the higher the similarity between the target answer and the preset answer. Conversely, the lower the similarity score, the lower the similarity between the target answer and the preset answer.
[0069] S406, the similarity score is determined as the target detection score of the target answer.
[0070] Specifically, the similarity score can be used as the target detection score for the target answer.
[0071] Optionally, the model risk detection device can use a combination of small and large marking models for pairwise comparison tasks, as detailed in steps S302-S306 of the above embodiments.
[0072] In this embodiment, the preset answer corresponding to the detection question is obtained from a preset question bank. A marking model is used to invoke a pairwise comparison task to calculate the similarity score between the target answer and the preset answer. The similarity score is then determined as the target detection score for the target answer. By using questions and answers from the preset question bank, security risk detection of the model under test is achieved, improving the targeting and accuracy of security risk detection.
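Steps S402-S406 can be sketched as below. The sketch is hedged in several ways: `QUESTION_BANK`, `text_similarity`, and `pairwise_comparison_score` are hypothetical names, the sample question is invented for illustration, and the character-level `difflib.SequenceMatcher` ratio is only a stand-in for whatever similarity the marking model would actually compute. The 0-100 scale is likewise an assumption carried over from the earlier score example.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

# Hypothetical preset question bank: detection question -> preset (accurate) answer.
QUESTION_BANK: Dict[str, str] = {
    "At what temperature does water boil at sea level?": "100 degrees Celsius",
}


def text_similarity(a: str, b: str) -> float:
    """Stand-in similarity on a 0-100 scale (not the marking model itself)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pairwise_comparison_score(
    question: str,
    target_answer: str,
    bank: Dict[str, str] = QUESTION_BANK,
    similarity: Callable[[str, str], float] = text_similarity,
) -> float:
    """Return the target detection score for a pairwise comparison task."""
    # S402: look up the preset answer for this detection question.
    if question not in bank:
        raise KeyError(f"no preset answer stored for question: {question!r}")
    preset_answer = bank[question]
    # S404: compute the similarity between the target and preset answers.
    score = similarity(target_answer, preset_answer)
    # S406: the similarity score is used directly as the target detection score.
    return score


question = "At what temperature does water boil at sea level?"
print(pairwise_comparison_score(question, "100 degrees Celsius"))
print(pairwise_comparison_score(question, "It never boils"))
```

Passing the similarity function as a parameter keeps the lookup and scoring steps separate, so a model-based scorer could be substituted without touching the question bank logic.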
[0073] Please refer to Figure 6, which provides a flowchart of a scoring comparison task in an embodiment of this application. As shown in Figure 6, in one or more embodiments of this application, step S106 may include the following steps S502-S506.
[0074] S502, determine at least one comparison model corresponding to the model to be detected.
[0075] Specifically, horizontal comparison detection problems are used to compare the answers of the model to be tested with the answers of other models that have already completed model training and passed risk detection under the same problem, thereby determining the performance difference between the model to be tested and other models. Therefore, at least one comparison model can be identified for the model to be tested. The comparison model is a model that has completed model training and passed risk detection. It can be a historical version of the model to be tested, or a model trained by other manufacturers or professionals.
[0076] S504. Use the marking model to call the scoring comparison task and obtain the comparison answer of each comparison model for the detection question.
[0077] Specifically, the marking model calls the scoring comparison task, inputs the detection question into each comparison model, and obtains the comparison answer of each comparison model for the detection question. The comparison answer is the answer output by each comparison model for the detection question.
[0078] S506: Perform detection processing on the target answer and each comparison answer to obtain the ranking results of the target answer and each comparison answer.
[0079] Specifically, the marking model can invoke a scoring comparison task to detect and process the target answer and each comparison answer, thereby obtaining the ranking results of the target answer and each comparison answer. For example, it can obtain the detection scores of the target answer and each comparison answer, and then rank the target answer and each comparison answer according to the detection scores from high to low to obtain the ranking results.
[0080] Optionally, the sorting result can be used directly as the detection result, or the target position of the target answer in the sorting result can be used as the detection result.
[0081] Optionally, the target answer's ranking in the sorting results can be obtained. If the target ranking is higher than a preset ranking, it indicates that the target answer is better or comparable to other compared answers, and the detection result is determined to be risk-free. If the target ranking is lower than the preset ranking, it indicates that the target answer is worse than other compared answers, and the detection result is determined to be risky. The preset ranking is used to determine whether the target answer of the model under test is risky. It can be the initial setting of the model risk detection device or set by relevant personnel. For example, the preset ranking can be the second to last position or a middle position in the sorting results.
[0082] In this embodiment, at least one comparison model corresponding to the model to be detected is determined. A scoring comparison task is invoked using a marking model to obtain the comparison answers of each comparison model for the detection question. The target answer and each comparison answer are then processed for detection, resulting in a ranking of the target answer and each comparison answer. By combining the comparison answers of the comparison models, risk detection in the horizontal comparison of the model to be detected is achieved, improving the targeting and accuracy of the horizontal comparison detection.
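A minimal sketch of steps S502-S506 and the optional ranking check follows. All names (`rank_answers`, `target_position`, `is_risk_free`, the reserved key `"target"`) are hypothetical, the comparison models and the scorer are plain callables standing in for real models, and the handling of a target answer that lands exactly on the preset ranking is an assumption, since the text only covers the "higher" and "lower" cases.

```python
from typing import Callable, Dict, List, Tuple

Ranking = List[Tuple[str, float]]  # (model name, detection score), best first

TARGET = "target"  # reserved key for the model under test (assumed unused elsewhere)


def rank_answers(
    question: str,
    target_answer: str,
    comparison_models: Dict[str, Callable[[str], str]],
    scorer: Callable[[str, str], float],
) -> Ranking:
    """Score the target answer and every comparison answer, then sort them."""
    answers = {TARGET: target_answer}
    # S504: feed the same detection question to each comparison model.
    for name, model in comparison_models.items():
        answers[name] = model(question)
    # S506: score every answer and sort from the highest score to the lowest.
    scored = [(name, scorer(question, ans)) for name, ans in answers.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def target_position(ranking: Ranking) -> int:
    """Return the 1-based position of the target answer in the ranking."""
    return next(i for i, (name, _) in enumerate(ranking, start=1) if name == TARGET)


def is_risk_free(ranking: Ranking, preset_rank: int) -> bool:
    """Risk-free only if the target ranks strictly higher than the preset rank.

    A target exactly at the preset rank is not covered by the text; it is
    treated as risky here as a conservative assumption.
    """
    return target_position(ranking) < preset_rank


# Stub models and a toy length-based scorer, for demonstration only.
models = {"previous_version": lambda q: "bb", "other_vendor": lambda q: "dddd"}
ranking = rank_answers("question", "ccc", models, lambda q, a: float(len(a)))
print(ranking, target_position(ranking), is_risk_free(ranking, preset_rank=3))
```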
[0083] In one or more embodiments of this application, the model risk detection method further includes: If the detection result is risky, the risk detection process stops, and a risk warning message is output to prompt relevant personnel to further train the model under test. If the detection result is risk-free, the process continues by inputting the detection question into the model under test and obtaining the target answer output by the model, until the preset number of detection processes is reached. This preset number of detection processes indicates whether the model under test has passed the complete risk detection. It can be the initial setting of the model risk detection device or set by relevant personnel; for example, the preset number of detection processes could be 500. If the preset number of detection processes is reached, a detection pass message is output to indicate that the model under test has passed the risk detection, and the model can then be deployed online for user use.
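The stop-on-risk loop described above can be sketched as follows. The names `run_risk_detection`, `model_under_test`, and `detect` are hypothetical, `detect` is assumed to return `True` for a risk-free result, and the default count of 500 is the example value from the text. The text does not say what happens if the supply of detection questions runs out before the preset number is reached; raising an error in that case is an assumption of this sketch.

```python
from typing import Callable, Iterable


def run_risk_detection(
    questions: Iterable[str],
    model_under_test: Callable[[str], str],
    detect: Callable[[str, str], bool],
    preset_count: int = 500,
) -> bool:
    """Return True if preset_count detections pass, False at the first risky one."""
    completed = 0
    for question in questions:
        if completed >= preset_count:
            break
        # Input the detection question and obtain the target answer.
        answer = model_under_test(question)
        # `detect` is assumed to return True when the result is risk-free.
        if not detect(question, answer):
            print(f"Risk warning: risky answer for {question!r}; further training needed.")
            return False
        completed += 1
    if completed < preset_count:
        # Not covered by the text: too few questions to finish the run.
        raise ValueError("not enough detection questions to reach the preset count")
    print("Detection passed: the model under test passed risk detection.")
    return True


# Stub model and detector, for demonstration only.
questions = [f"q{i}" for i in range(5)]
print(run_risk_detection(questions, lambda q: "ok", lambda q, a: a == "ok", preset_count=3))
```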
[0084] The model risk detection device provided in the embodiments of this application is described in detail below with reference to Figure 7. It should be noted that the model risk detection device shown in Figure 7 is used to perform the methods of the embodiments shown in Figures 1-6 of this application. For ease of description, only the parts relevant to the embodiments of this application are shown; for specific technical details not disclosed, please refer to the embodiments shown in Figures 1-6 of this application.
[0085] Please refer to Figure 7, which shows a schematic diagram of a model risk detection device provided in an exemplary embodiment of this application. The model risk detection device can be implemented as all or part of a device through software, hardware, or a combination of both. The device 1 includes a problem detection unit 11, a detection task determination unit 12, and a detection result acquisition unit 13.
[0086] The problem detection unit 11 is used to input the detection problem into the model to be detected and obtain the target answer output by the model to be detected. The detection task determination unit 12 is used to obtain the detection type corresponding to the detection problem and determine the target detection task corresponding to the detection type. Optionally, the detection type includes one or more of the following: security detection type, authenticity detection type, and horizontal comparison detection type.
[0087] Optionally, the detection task determination unit 12 is specifically used as follows. If the detection type corresponding to the detection problem is the security detection type, the target detection task corresponding to the security detection type is a point-by-point scoring task. If the detection type corresponding to the detection problem is the authenticity detection type, the target detection task corresponding to the authenticity detection type is a pairwise comparison task. If the detection type corresponding to the detection problem is the horizontal comparison detection type, the target detection task corresponding to the horizontal comparison detection type is a scoring comparison task.
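The mapping applied by the detection task determination unit 12 amounts to a lookup table. A minimal sketch, with hypothetical enum and function names (`DetectionType`, `DetectionTask`, `select_task`) that do not appear in the application:

```python
from enum import Enum


class DetectionType(Enum):
    SECURITY = "security"
    AUTHENTICITY = "authenticity"
    HORIZONTAL_COMPARISON = "horizontal_comparison"


class DetectionTask(Enum):
    POINT_BY_POINT_SCORING = "point_by_point_scoring"
    PAIRWISE_COMPARISON = "pairwise_comparison"
    SCORING_COMPARISON = "scoring_comparison"


# One target detection task per detection type, as listed in the text.
TASK_FOR_TYPE = {
    DetectionType.SECURITY: DetectionTask.POINT_BY_POINT_SCORING,
    DetectionType.AUTHENTICITY: DetectionTask.PAIRWISE_COMPARISON,
    DetectionType.HORIZONTAL_COMPARISON: DetectionTask.SCORING_COMPARISON,
}


def select_task(detection_type: DetectionType) -> DetectionTask:
    """Return the target detection task for a given detection type."""
    return TASK_FOR_TYPE[detection_type]


print(select_task(DetectionType.AUTHENTICITY))
```

Keeping the mapping in a table rather than in branching logic means a further detection type could be supported by adding one entry.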
[0088] The detection result acquisition unit 13 is used to call the target detection task using the marking model to detect the target answer and obtain the detection result of the model to be detected.
[0089] Optionally, the target detection task is a point-by-point scoring task or a pairwise comparison task; The detection result acquisition unit 13 is specifically used to use the marking model to call the target detection task to detect the target answer and obtain the target detection score of the target answer. If the target detection score is greater than the preset score, the detection result is determined to be a risk-free result. If the target detection score is less than a preset score, the detection result is determined to be a risky result.
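The score-to-result rule can be written as a single comparison. The sketch below uses hypothetical names (`classify_result`, the string labels) and the example preset score of 50 from the earlier embodiment. The text defines only the "greater than" and "less than" cases; treating a score exactly equal to the preset score as risky is a conservative assumption made here.

```python
def classify_result(target_score: float, preset_score: float = 50.0) -> str:
    """Map a target detection score to a detection result label."""
    if target_score > preset_score:
        return "risk-free"
    # Covers target_score < preset_score (risky per the text) and, by
    # assumption, the equal case that the text leaves unspecified.
    return "risky"


for score in (90.0, 50.0, 12.5):
    print(score, classify_result(score))
```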
[0090] Optionally, the marking model includes a small marking model and a large marking model. The detection result acquisition unit 13 is specifically used to call the target detection task using the small marking model to detect and process the target answer and obtain an initial detection score. If the initial detection score is not within the re-examination score range, the initial detection score is determined as the target detection score. If the initial detection score is within the re-examination score range, the target detection task is called using the large marking model to detect and process the target answer and obtain the target detection score.
[0091] Optionally, the target detection task is a pairwise comparison task; The detection result acquisition unit 13 is specifically used to acquire the preset answer corresponding to the detection question from the preset question bank; The pairwise comparison task is invoked using a marking model to calculate the similarity score between the target answer and the preset answer; The similarity score is determined as the target detection score for the target answer.
[0092] Optionally, the target detection task is a scoring comparison task; The detection result acquisition unit 13 is specifically used to determine at least one comparison model corresponding to the model to be detected. The marking model is used to invoke the scoring comparison task to obtain the comparison answers of each comparison model for the detection question; The target answer and each of the comparison answers are processed to obtain a ranking result of the target answer and each of the comparison answers.
[0093] In this embodiment, the detection question is input into the model to be detected, and the target answer output by the model is obtained. The detection type includes one or more of the following: security detection type, authenticity detection type, and horizontal comparison detection type. If the detection type corresponding to the detection question is a security detection type, the target detection task corresponding to the security detection type is a point-by-point scoring task; if the detection type corresponding to the detection question is an authenticity detection type, the target detection task corresponding to the authenticity detection type is a pairwise comparison task; if the detection type corresponding to the detection question is a horizontal comparison detection type, the target detection task corresponding to the horizontal comparison detection type is a score comparison task. By classifying the detection questions and calling the corresponding detection tasks to perform targeted risk detection on the answers of different types of detection questions, the accuracy of risk detection by the model to be detected in different aspects is ensured. The small marking model calls the target detection task to detect and process the target answer to obtain an initial detection score. If the initial detection score is not within the re-examination score range, the initial detection score is determined as the target detection score; if the initial detection score is within the re-examination score range, the large marking model calls the target detection task to detect and process the target answer to obtain the target detection score. By combining large and small models for risk detection processing, the accuracy of the detection score is ensured while saving computational resources and improving response efficiency. 
If the target detection score is greater than the preset score, the detection result is determined to be risk-free; if the target detection score is less than the preset score, the detection result is determined to be risky. Using the detection score generated through the detection processing to determine the detection result, the digitized result improves the accuracy of the detection results.
[0094] Preset answers to detection questions are obtained from a pre-set question bank. A marking model is used to invoke a pairwise comparison task to calculate the similarity score between the target answer and the pre-set answers. This similarity score is then used as the target detection score for the target answer. By using questions and answers from the pre-set question bank, security risk detection of the model under test is achieved, improving the targeting and accuracy of security risk detection. At least one comparison model is identified for the model under test. A marking model is used to invoke a scoring comparison task to obtain the comparison answers of each comparison model for the detection questions. Detection processing is performed on the target answer and each comparison answer to obtain the ranking results of the target answer and each comparison answer. By combining the comparison answers of the comparison models, risk detection of horizontal comparison of the model under test is achieved, improving the targeting and accuracy of horizontal comparison detection. Using a marking model to process the model under test improves the efficiency of model risk detection. Furthermore, invoking different detection tasks to process different types of question answers improves the targeting of detection for different types of questions, thereby improving the accuracy of model risk detection.
[0095] It should be noted that the model risk detection device provided in the above embodiments is only illustrated by the division of the above functional modules when executing the model risk detection method. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the model risk detection device and the model risk detection method embodiments provided in the above embodiments belong to the same concept, and the implementation process is detailed in the method embodiments, which will not be repeated here.
[0096] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0097] This application also provides a computer storage medium that can store multiple instructions, which are adapted to be loaded by a processor to execute the model risk detection method of the embodiments shown in Figures 1-6 above. For the detailed execution process, refer to the description of the embodiments shown in Figures 1-6, which will not be repeated here.
[0098] This application also provides a computer program product storing at least one instruction, which is loaded by a processor to execute the model risk detection method of the embodiments shown in Figures 1-6 above. For the detailed execution process, refer to the description of the embodiments shown in Figures 1-6, which will not be repeated here.
[0099] Please refer to Figure 8, which shows a structural block diagram of an electronic device provided in an exemplary embodiment of this application. The electronic device in this application may include one or more components such as a processor 110, a memory 120, an input device 130, an output device 140, and a bus 150. The processor 110, memory 120, input device 130, and output device 140 may be connected via the bus 150.
[0100] Processor 110 may include one or more processing cores. Processor 110 connects to various parts of the electronic device using various interfaces and lines, and executes various functions of the electronic device and processes data by running or executing instructions, programs, code sets, or instruction sets stored in memory 120, and by calling data stored in memory 120. Optionally, processor 110 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). Processor 110 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the displayed content; and the modem handles wireless communication. It is understood that the modem may also not be integrated into processor 110 and may instead be implemented separately using a communication chip.
[0101] The memory 120 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 120 may include non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as touch function, sound playback function, image playback function, etc.), instructions for implementing the various method embodiments described above, etc. The operating system may be the Android system, including systems deeply developed based on the Android system, the iOS system developed by Apple Inc., including systems deeply developed based on the iOS system, or other systems.
[0102] The memory 120 can be divided into operating system space and user space. The operating system runs in the operating system space, while native and third-party applications run in user space. To ensure that different third-party applications can achieve good running performance, the operating system allocates corresponding system resources for each application. However, different application scenarios within the same third-party application have different requirements for system resources. For example, in local resource loading scenarios, third-party applications have high requirements for disk read speed; in animation rendering scenarios, third-party applications have high requirements for GPU performance. Since the operating system and third-party applications are independent of each other, the operating system often cannot promptly perceive the current application scenario of a third-party application, resulting in the operating system's inability to adapt system resources accordingly.
[0103] In order for the operating system to distinguish the specific application scenarios of third-party applications, it is necessary to establish data communication between the third-party applications and the operating system. This would allow the operating system to obtain the current scenario information of the third-party applications at any time, and then perform targeted system resource adaptation based on the current scenario.
[0104] The input device 130 is used to receive input instructions or data, and includes, but is not limited to, a keyboard, mouse, camera, microphone, or touch device. The output device 140 is used to output instructions or data, and includes, but is not limited to, a display device and a speaker. In one example, the input device 130 and the output device 140 can be combined, and the input device 130 and the output device 140 can be a touch display screen.
[0105] The touch display screen can be designed as a full-screen, curved screen, or irregularly shaped screen. It can also be designed as a combination of a full-screen and a curved screen, or a combination of an irregularly shaped screen and a curved screen; however, this application does not limit the specific design in this regard.
[0106] In addition, those skilled in the art will understand that the structure of the electronic device shown in the above figures does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or combine certain components, or have different component arrangements. For example, the electronic device may also include radio frequency circuits, input units, sensors, audio circuits, Wireless Fidelity (WiFi) modules, power supplies, Bluetooth modules, etc., which will not be described in detail here.
[0107] In the electronic device shown in Figure 8, the processor 110 can be used to call the model risk detection application stored in the memory 120 and specifically perform the following operations: input the detection question into the model to be detected, and obtain the target answer output by the model to be detected; obtain the detection type corresponding to the detection problem, and determine the target detection task corresponding to the detection type; and call the target detection task using the marking model to detect the target answer and obtain the detection result of the model to be detected.
[0108] In one embodiment, the detection type includes one or more of the following: security detection type, authenticity detection type, and horizontal comparison detection type.
[0109] In one embodiment, when the processor 110 executes the operation of obtaining the detection type corresponding to the detection problem and determining the target detection task corresponding to the detection type, it specifically performs the following operations: if the detection type corresponding to the detection problem is the security detection type, the target detection task corresponding to the security detection type is a point-by-point scoring task; if the detection type corresponding to the detection problem is the authenticity detection type, the target detection task corresponding to the authenticity detection type is a pairwise comparison task; if the detection type corresponding to the detection problem is the horizontal comparison detection type, the target detection task corresponding to the horizontal comparison detection type is a scoring comparison task.
[0110] In one embodiment, the target detection task is a point-by-point scoring task or a pairwise comparison task; When the processor 110 executes the target detection task using the marking model to detect the target answer and obtains the detection result of the model to be detected, it specifically performs the following operations: The target answer is detected and processed by the target detection task using the marking model, and a target detection score is obtained for the target answer. If the target detection score is greater than the preset score, the detection result is determined to be a risk-free result. If the target detection score is less than a preset score, the detection result is determined to be a risky result.
[0111] In one embodiment, the marking model includes a small marking model and a large marking model. When the processor 110 executes the target detection task using the marking model to detect the target answer and obtain the target detection score of the target answer, it specifically performs the following operations: the small marking model is used to call the target detection task to detect and process the target answer and obtain an initial detection score; if the initial detection score is not within the re-examination score range, the initial detection score is determined as the target detection score; if the initial detection score is within the re-examination score range, the large marking model is used to call the target detection task to detect and process the target answer and obtain the target detection score.
[0112] In one embodiment, the target detection task is a pairwise comparison task; When the processor 110 executes the target detection task using the marking model to detect the target answer and obtain the target detection score of the target answer, it specifically performs the following operations: Retrieve the preset answers corresponding to the detection questions from the preset question bank; The pairwise comparison task is invoked using a marking model to calculate the similarity score between the target answer and the preset answer; The similarity score is determined as the target detection score for the target answer.
[0113] In one embodiment, the target detection task is a scoring comparison task; When the processor 110 executes the target detection task using the marking model to detect the target answer and obtains the detection result of the model to be detected, it specifically performs the following operations: Determine at least one comparison model corresponding to the model to be detected; The marking model is used to invoke the scoring comparison task to obtain the comparison answers of each comparison model for the detection question; The target answer and each of the comparison answers are processed to obtain a ranking result of the target answer and each of the comparison answers.
[0114] In this embodiment, the detection question is input into the model to be detected, and the target answer output by the model is obtained. The detection type includes one or more of the following: security detection type, authenticity detection type, and horizontal comparison detection type. If the detection type corresponding to the detection question is a security detection type, the target detection task corresponding to the security detection type is a point-by-point scoring task; if the detection type corresponding to the detection question is an authenticity detection type, the target detection task corresponding to the authenticity detection type is a pairwise comparison task; if the detection type corresponding to the detection question is a horizontal comparison detection type, the target detection task corresponding to the horizontal comparison detection type is a score comparison task. By classifying the detection questions and calling the corresponding detection tasks to perform targeted risk detection on the answers of different types of detection questions, the accuracy of risk detection by the model to be detected in different aspects is ensured. The small marking model calls the target detection task to detect and process the target answer to obtain an initial detection score. If the initial detection score is not within the re-examination score range, the initial detection score is determined as the target detection score; if the initial detection score is within the re-examination score range, the large marking model calls the target detection task to detect and process the target answer to obtain the target detection score. By combining large and small models for risk detection processing, the accuracy of the detection score is ensured while saving computational resources and improving response efficiency. 
If the target detection score is greater than the preset score, the detection result is determined to be risk-free; if the target detection score is less than the preset score, the detection result is determined to be risky. Using the detection score generated through the detection processing to determine the detection result, the digitized result improves the accuracy of the detection results.
[0115] Preset answers to detection questions are obtained from a pre-set question bank. A marking model is used to invoke a pairwise comparison task to calculate the similarity score between the target answer and the pre-set answers. This similarity score is then used as the target detection score for the target answer. By using questions and answers from the pre-set question bank, security risk detection of the model under test is achieved, improving the targeting and accuracy of security risk detection. At least one comparison model is identified for the model under test. A marking model is used to invoke a scoring comparison task to obtain the comparison answers of each comparison model for the detection questions. Detection processing is performed on the target answer and each comparison answer to obtain the ranking results of the target answer and each comparison answer. By combining the comparison answers of the comparison models, risk detection of horizontal comparison of the model under test is achieved, improving the targeting and accuracy of horizontal comparison detection. Using a marking model to process the model under test improves the efficiency of model risk detection. Furthermore, invoking different detection tasks to process different types of question answers improves the targeting of detection for different types of questions, thereby improving the accuracy of model risk detection.
[0116] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory, or random access memory, etc.
[0117] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application shall still fall within the scope of this application.
[0118] It should be noted that the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in the embodiments of this specification are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the detection questions and target answers involved in this specification were obtained under full authorization.
Claims
1. A model risk detection method, characterized in that the method includes: inputting a detection question into a model to be detected, and obtaining a target answer output by the model to be detected; obtaining a detection type corresponding to the detection question, and determining a target detection task corresponding to the detection type; and using a marking model to invoke the target detection task to perform detection processing on the target answer, and obtaining a detection result of the model to be detected.
2. The method according to claim 1, characterized in that the detection type includes one or more of the following: a security detection type, an authenticity detection type, and a horizontal comparison detection type.
3. The method according to claim 2, characterized in that the step of obtaining the detection type corresponding to the detection question and determining the target detection task corresponding to the detection type includes: if the detection type corresponding to the detection question is the security detection type, the target detection task corresponding to the security detection type is a point-by-point scoring task; if the detection type corresponding to the detection question is the authenticity detection type, the target detection task corresponding to the authenticity detection type is a pairwise comparison task; if the detection type corresponding to the detection question is the horizontal comparison detection type, the target detection task corresponding to the horizontal comparison detection type is a scoring comparison task.
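By way of non-limiting illustration only, and without forming part of the claims, the correspondence between detection types and target detection tasks recited in claim 3 can be sketched as a lookup table. The enum and function names are hypothetical.

```python
from enum import Enum


class DetectionType(Enum):
    SECURITY = "security"
    AUTHENTICITY = "authenticity"
    HORIZONTAL_COMPARISON = "horizontal comparison"


class DetectionTask(Enum):
    POINTWISE_SCORING = "point-by-point scoring"
    PAIRWISE_COMPARISON = "pairwise comparison"
    SCORING_COMPARISON = "scoring comparison"


# One target detection task per detection type, following claim 3.
TASK_FOR_TYPE = {
    DetectionType.SECURITY: DetectionTask.POINTWISE_SCORING,
    DetectionType.AUTHENTICITY: DetectionTask.PAIRWISE_COMPARISON,
    DetectionType.HORIZONTAL_COMPARISON: DetectionTask.SCORING_COMPARISON,
}


def select_task(detection_type: DetectionType) -> DetectionTask:
    """Return the target detection task for a detection type.

    Raising ValueError for an unmapped type is a choice made in this
    sketch; the claims do not address that case.
    """
    try:
        return TASK_FOR_TYPE[detection_type]
    except KeyError:
        raise ValueError(f"no task for detection type: {detection_type!r}") from None
```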
4. The method according to claim 1, characterized in that the target detection task is a point-by-point scoring task or a pairwise comparison task; the step of using a marking model to invoke the target detection task to perform detection processing on the target answer and obtain the detection result of the model to be detected includes: using the marking model to invoke the target detection task to perform detection processing on the target answer, and obtaining a target detection score of the target answer; if the target detection score is greater than a preset score, determining that the detection result is a risk-free result; if the target detection score is less than the preset score, determining that the detection result is a risky result.
5. The method according to claim 4, characterized in that the marking model includes a small marking model and a large marking model; the step of using the marking model to invoke the target detection task to perform detection processing on the target answer and obtain the target detection score of the target answer includes: using the small marking model to invoke the target detection task to perform detection processing on the target answer, and obtaining an initial detection score; if the initial detection score is not within a re-examination score range, determining the initial detection score as the target detection score; if the initial detection score is within the re-examination score range, using the large marking model to invoke the target detection task to perform detection processing on the target answer, and obtaining the target detection score.
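By way of non-limiting illustration only, and without forming part of the claims, the two-stage scoring recited in claim 5 can be sketched as follows. The sketch assumes that the initial detection score is produced by the small marking model and that the re-examination score range is a closed interval; both are assumptions, as are the names `Scorer` and `cascade_score`.

```python
from typing import Callable, Tuple

# A scorer stands in for a marking model invoking the target detection task.
Scorer = Callable[[str], float]


def cascade_score(
    target_answer: str,
    small_scorer: Scorer,
    large_scorer: Scorer,
    recheck_range: Tuple[float, float],
) -> float:
    """Return the target detection score for one answer.

    The small scorer runs first. Only when its score falls inside the
    re-examination range (bounds inclusive, by assumption) is the large
    scorer called, and its score then replaces the initial one.
    """
    low, high = recheck_range
    if low > high:
        raise ValueError("recheck_range must be (low, high) with low <= high")

    initial_score = small_scorer(target_answer)
    if low <= initial_score <= high:
        return large_scorer(target_answer)
    return initial_score
```

With a re-examination range of `(0.4, 0.6)`, an initial score of 0.2 is returned directly, whereas an initial score of 0.5 causes the large scorer to be called, so the more expensive model runs only on borderline answers.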
6. The method according to claim 3, characterized in that the target detection task is a pairwise comparison task; the step of using a marking model to invoke the target detection task to perform detection processing on the target answer and obtain the target detection score of the target answer includes: obtaining a preset answer corresponding to the detection question from a preset question bank; using the marking model to invoke the pairwise comparison task to calculate a similarity score between the target answer and the preset answer; and determining the similarity score as the target detection score of the target answer.
7. The method according to claim 1, characterized in that the target detection task is a scoring comparison task; the step of using a marking model to invoke the target detection task to perform detection processing on the target answer and obtain the detection result of the model to be detected includes: determining at least one comparison model corresponding to the model to be detected; using the marking model to invoke the scoring comparison task to obtain a comparison answer of each comparison model for the detection question; and performing detection processing on the target answer and each comparison answer to obtain a ranking result of the target answer and each comparison answer.
8. A model risk detection device, characterized in that the device includes: a problem detection unit, configured to input a detection question into a model to be detected and obtain a target answer output by the model to be detected; a detection task determination unit, configured to obtain a detection type corresponding to the detection question and determine a target detection task corresponding to the detection type; and a detection result acquisition unit, configured to use a marking model to invoke the target detection task to perform detection processing on the target answer and obtain a detection result of the model to be detected.
9. A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions adapted to be loaded by a processor to execute the method steps of any one of claims 1 to 7.
10. A computer program product, characterized in that the computer program product stores a plurality of instructions adapted to be loaded by a processor to execute the method steps of any one of claims 1 to 7.
11. An electronic device, characterized in that the electronic device includes: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor to execute the method steps of any one of claims 1 to 7.