A method, apparatus, device, and readable storage medium for determining interface stability.

By combining a multi-round invocation strategy with dynamic-feature correction, the method addresses the low accuracy of AI interface stability assessment, evaluates the interface layer and the content layer together, and improves the accuracy and comprehensiveness of the assessment.

CN122309285A · Pending · Publication Date: 2026-06-30 · FOUNDER SECURITIES CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: FOUNDER SECURITIES CO LTD
Filing Date: 2026-05-21
Publication Date: 2026-06-30

Figure CN122309285A_ABST

Abstract

This invention discloses a method, apparatus, device, and readable storage medium for determining interface stability, applied in the field of artificial intelligence. The method includes: initiating, based on a multi-round call strategy, a preset number of calls to a service interface to be evaluated and obtaining the execution status and response data of each call; determining at least two interface stability indicators from the execution status and response data; determining the output content of successful calls from the execution status and response data, and determining at least two content stability indicators from that output content; and determining a comprehensive stability parameter from the parameter values of the interface stability indicators and of the content stability indicators. By designing a multi-round call strategy, the invention collects stability data at the AI service interface layer in batches and reproducibly, solving the problem of passive data collection; it also builds a two-dimensional stability evaluation system covering the interface layer and the content layer, evaluates both together, and improves the accuracy of stability evaluation.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method, apparatus, device, and readable storage medium for determining interface stability.

Background Technology

[0002] Currently, the industry assesses and monitors service interface stability mainly by collecting and analyzing performance metrics of interface calls, such as response time, throughput, success rate, and the distribution of HTTP (Hypertext Transfer Protocol) error codes (e.g., 5XX server errors). Based on time-series data of these metrics, anomalies are detected with static thresholds or simple historical baselines to decide whether the interface service is running stably and to trigger alarms when a metric becomes abnormal.

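[0003] These metrics, however, only describe whether and how quickly a call executes. For an AI (artificial intelligence) interface, a call can succeed while the returned content is still unstable, for example inconsistent across calls, fabricated, or self-contradictory, and metrics collected passively from live traffic cannot be gathered in batches or reproduced. How to accurately assess the stability of AI interfaces is therefore a technical problem that those skilled in the art urgently need to solve.

Summary of the Invention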

[0004] In view of this, the purpose of the present invention is to provide a method, apparatus, device, and readable storage medium for determining interface stability, which solve the technical problem in the prior art that the stability of AI interfaces is assessed with low accuracy.

[0005] To address the above technical problem, the present invention provides a method for determining interface stability, comprising: initiating, based on a multi-round call strategy, a preset number of calls to a service interface to be evaluated, and obtaining the execution status and response data of each call; determining at least two interface stability indicators based on the execution status and the response data; determining, based on the execution status and the response data, the output content of successful calls, and determining at least two content stability indicators based on that output content; and determining a comprehensive stability parameter based on the parameter values of the interface stability indicators and the parameter values of the content stability indicators.

[0006] Optionally, determining the comprehensive stability parameter based on the parameter values of the interface stability indicators and the parameter values of the content stability indicators includes: determining a first weight for each interface stability indicator, and determining a target interface stability parameter from the first weights and the parameter values of the interface stability indicators; determining a second weight for each content stability indicator, and determining a target content stability parameter from the second weights and the parameter values of the content stability indicators; and determining a third weight for the target interface stability parameter and a fourth weight for the target content stability parameter, and determining the comprehensive stability parameter from the third weight, the target interface stability parameter, the fourth weight, and the target content stability parameter.

[0007] Optionally, the interface stability indicators include at least the call success rate, the average response time, the response-time fluctuation rate, the exception code ratio, and the timeout rate; the content stability indicators include at least the content consistency rate, the hallucination-free rate, and the contradiction-free rate.

[0008] Optionally, after the comprehensive stability parameter is determined based on the parameter values of the interface stability indicators and of the content stability indicators, the method further includes: extracting dynamic change features from the call data, the dynamic change features including at least one of indicator fluctuation features, indicator change trend features, abnormal mutation features, and output drift features, where the indicator fluctuation features characterize how widely an indicator is dispersed around its trend line over multiple rounds of calls, the indicator change trend features characterize how the indicator trends over multiple rounds of calls, the abnormal mutation features characterize whether the indicator undergoes an abrupt change exceeding a preset threshold during multiple rounds of calls, and the output drift features characterize whether the output deviates continuously from a preset baseline level; determining a dynamic change correction value based on the dynamic change features; and correcting the comprehensive stability parameter with the dynamic change correction value to obtain a corrected comprehensive stability parameter.

[0009] Optionally, correcting the comprehensive stability parameter with the dynamic change correction value to obtain the corrected comprehensive stability parameter includes: when the dynamic change features indicate that the interface exhibits at least one of increased indicator fluctuation, a deteriorating indicator trend, an abrupt abnormal change in an indicator, or output drift, determining a dynamic change correction value greater than zero, the correction value increasing with the degree of abnormality of the dynamic change features; and subtracting the dynamic change correction value from the comprehensive stability parameter to obtain the corrected comprehensive stability parameter.

[0010] Optionally, after the comprehensive stability parameter is determined based on the parameter values of the interface stability indicators and of the content stability indicators, the method further includes: determining a stability level based on the comprehensive stability parameter; and generating a stability assessment report based on the interface stability indicators, the content stability indicators, and the stability level, the report including at least suggestions for adjusting the interface.

[0011] Optionally, the multi-round invocation strategy can be any one of the following: a repeated invocation strategy with the same input, a continuous session invocation strategy with context inheritance, a perturbation invocation strategy, or a concurrent invocation strategy.

[0012] The present invention also provides an interface stability determination apparatus, comprising: an execution status and response data acquisition module, configured to initiate, based on a multi-round call strategy, a preset number of calls to the service interface to be evaluated and obtain the execution status and response data of each call; an interface stability indicator determination module, configured to determine at least two interface stability indicators based on the execution status and the response data; a content stability indicator determination module, configured to determine the output content of successful calls based on the execution status and the response data, and to determine at least two content stability indicators based on that output content; and a comprehensive stability parameter determination module, configured to determine the comprehensive stability parameter based on the parameter values of the interface stability indicators and the parameter values of the content stability indicators.

[0013] The present invention also provides an interface stability determination device, comprising: a memory, configured to store a computer program; and a processor, configured to execute the computer program to implement the steps of the interface stability determination method described above.

[0014] The present invention also provides a readable storage medium (i.e., a computer-readable storage medium) on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the above-described interface stability determination method are implemented.

[0015] The present invention also provides a computer program product, including a computer program / instructions, which, when executed by a processor, implement the steps of the above-described method.

[0016] As can be seen, this invention initiates, based on a multi-round call strategy, a preset number of calls to the service interface to be evaluated and obtains the execution status and response data of each call; determines at least two interface stability indicators from the execution status and response data; determines the output content of successful calls from the execution status and response data and derives at least two content stability indicators from that output; and determines a comprehensive stability parameter from the parameter values of the interface stability indicators and of the content stability indicators. Compared with existing approaches that judge interface stability from simple statistical indicators and fixed thresholds, the multi-round call strategy collects stability data at the AI service interface layer in batches and reproducibly rather than passively; the invention also builds a two-dimensional stability evaluation system covering the interface layer and the content layer, evaluates both together, and thereby improves the accuracy of stability evaluation.

[0017] In addition, the present invention also provides an interface stability determination device, apparatus, and readable storage medium, which also have the above-mentioned beneficial effects.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0019] Figure 1 is a flowchart of an interface stability determination method provided in an embodiment of the present invention;
Figure 2 is a flowchart of an interface stability determination method provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of an interface stability determination device provided in an embodiment of the present invention;
Figure 4 is a schematic diagram of an interface stability determination device provided in an embodiment of the present invention.

Detailed Implementation

[0020] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0021] Some terms used in the description of the embodiments of this application are interpreted as follows. In this invention, an AI service interface may be an application programming interface (API) that provides capabilities such as natural language processing, generative dialogue, text summarization, question answering, content rewriting, reasoning analysis, code generation, and image description.

[0022] Referring to Figure 1, Figure 1 is a flowchart of a method for determining interface stability according to an embodiment of the present invention. The method may be executed by a server, a terminal device, or another electronic device, and includes the following steps:

S101. Based on the multi-round call strategy, initiate a preset number of calls to the service interface to be evaluated, and obtain the execution status and response data of each call.

[0023] In this embodiment, the service interface to be evaluated may be an AI (artificial intelligence) service interface. The multi-round call strategy can be configured flexibly according to the evaluation objective. In one implementation, it may include any one or more of the following. First, a repeated call strategy with the same input: multiple independent calls are made with the same input to examine how stably the AI service interface returns results under identical conditions. Second, a continuous session call strategy with context inheritance: in a multi-turn dialogue, each later call inherits the output of the previous round or the historical context, to evaluate how stably the interface performs over a continuous session. Third, a perturbation call strategy: minor perturbations such as synonym substitution, format changes, redundant interfering content, and reordering are applied to the original input to assess the robustness and stability of the AI service interface against input perturbations. Fourth, a concurrent call strategy: multiple calls are initiated at nearly the same time or within overlapping time windows to evaluate how the AI service interface responds under concurrent load. The preset number of calls can be set according to evaluation needs, for example 10, 50, 100, or more. The execution status and response data of each call are recorded. The execution status may include success, failure, timeout, abnormal termination, and other status information; the response data may include the response content, response time, exception code, return identifier, and other information.
Specifically, this embodiment can receive user-configured evaluation parameters, including: the address of the service interface to be evaluated, interface request parameters (question content, model parameters, etc.), the number of repeated calls N, the call time interval, and the call termination threshold (e.g., termination if the single failure rate exceeds 50%). Based on the configured parameters, under the same network environment and request parameters, N repeated call requests are automatically initiated to the service interface to be evaluated at preset time intervals to ensure the reproducibility of the evaluation; the execution status (success / failure / timeout) of each call is recorded in real time, and the original response data of each call (response time, status code, exception information, AI output content, etc.) is also recorded.
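The repeated-call loop configured above (call count N, call interval, and a failure-rate termination threshold) can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: `call_interface` is a hypothetical client function, treating HTTP 200 as success and the `min_calls_before_stop` guard are assumptions, and the 50% stop threshold simply mirrors the example in the text.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CallRecord:
    round_no: int
    status: str                 # "success", "failure" or "timeout"
    elapsed_s: float            # measured response time in seconds
    status_code: Optional[int]
    output: Optional[str]
    error: Optional[str] = None


def run_repeated_calls(
    call_interface: Callable[[dict], Tuple[int, str]],  # hypothetical: request -> (status code, body)
    request: dict,
    n_calls: int,
    interval_s: float = 0.0,
    timeout_s: float = 30.0,
    stop_failure_rate: float = 0.5,   # example termination threshold from the text
    min_calls_before_stop: int = 5,   # assumption: do not stop after a single bad call
) -> List[CallRecord]:
    """Send the same request n_calls times and record status and response data."""
    records: List[CallRecord] = []
    for round_no in range(1, n_calls + 1):
        start = time.perf_counter()
        try:
            status_code, output = call_interface(request)
            elapsed = time.perf_counter() - start
            # Assumption: a real client would enforce timeout_s itself; here an
            # over-long call is only classified as a timeout after it returns.
            if elapsed > timeout_s:
                status = "timeout"
            elif status_code == 200:
                status = "success"
            else:
                status = "failure"
            records.append(CallRecord(round_no, status, elapsed, status_code, output))
        except TimeoutError as exc:
            records.append(CallRecord(round_no, "timeout", time.perf_counter() - start, None, None, str(exc)))
        except Exception as exc:  # any other client error counts as a failed call
            records.append(CallRecord(round_no, "failure", time.perf_counter() - start, None, None, repr(exc)))

        # Terminate early once the running failure rate exceeds the threshold.
        failed = sum(r.status != "success" for r in records)
        if len(records) >= min_calls_before_stop and failed / len(records) > stop_failure_rate:
            break
        if interval_s > 0 and round_no < n_calls:
            time.sleep(interval_s)
    return records
```

For example, `run_repeated_calls(lambda req: (200, "Paris"), {"question": "What is the capital of France?"}, n_calls=10)` would return ten success records, while an interface that always returns 500 would be abandoned after five calls.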

[0024] It should be noted that the multi-round call strategy in this embodiment may be any one of the following: a repeated call strategy with the same input, a continuous session call strategy with context inheritance, a perturbation call strategy, or a concurrent call strategy. Under the repeated call strategy, multiple mutually independent calls are made to the service interface to be evaluated with the same input content and identical configuration parameters. Under the continuous session call strategy with context inheritance, the input of a later call depends at least partly on the output content or historical session information of an earlier call, forming a context inheritance relationship across rounds. Under the perturbation call strategy, the original input is perturbed while the task objective stays essentially the same, and multiple calls are then made to the service interface to be evaluated with the perturbed inputs. Perturbations may include, but are not limited to, synonym replacement, word-order adjustment, sentence rewriting, insertion of irrelevant information, format changes, punctuation changes, and local noise. This reveals whether the AI service interface is overly sensitive to small input changes, that is, whether inputs that differ in form but are similar in meaning produce significantly different results, and thus reflects the stability of the interface under the complex inputs seen in practice. Under the concurrent call strategy, multiple call requests are sent to the service interface to be evaluated in parallel within the same time period or overlapping time windows to simulate high-concurrency or burst access. This reproduces peak access in real business systems and helps identify the performance degradation, increased anomalies, or fluctuations in output quality that an AI service interface may exhibit under high load.
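A rough sketch of the perturbation and concurrent call strategies follows. The perturbation rules, the tiny `SYNONYMS` table, and the noise sentence are hypothetical examples of the perturbation types listed above, not rules taken from the patent. Concurrency uses the standard library's `concurrent.futures.ThreadPoolExecutor`, and `call_fn` stands in for any client function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

R = TypeVar("R")

# Hypothetical synonym table; a real evaluation would use a richer resource.
SYNONYMS = {"capital": "capital city", "largest": "biggest"}


def synonym_swap(text: str) -> str:
    return " ".join(SYNONYMS.get(word, word) for word in text.split(" "))


def punctuation_change(text: str) -> str:
    # Swap a trailing question mark for a period, or add one if absent.
    return text[:-1] + "." if text.endswith("?") else text + "?"


def format_change(text: str) -> str:
    # Same content, different layout: surrounding whitespace and newlines.
    return "\n  " + text + "  \n"


def insert_noise(text: str) -> str:
    # An irrelevant sentence appended as local noise.
    return text + " (Unrelated note: the weather is mild today.)"


def reorder_clauses(text: str) -> str:
    # Reverse comma-separated clauses; single-clause input is left unchanged.
    parts = text.split(", ")
    return ", ".join(reversed(parts)) if len(parts) > 1 else text


PERTURBATIONS: List[Callable[[str], str]] = [
    synonym_swap, punctuation_change, format_change, insert_noise, reorder_clauses,
]


def make_perturbed_inputs(text: str) -> List[str]:
    """One variant per perturbation rule, all with the same task objective."""
    return [perturb(text) for perturb in PERTURBATIONS]


def call_concurrently(call_fn: Callable[[str], R], prompts: List[str], max_workers: int = 8) -> List[R]:
    """Send all prompts in parallel; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_fn, prompts))
```

Combining the two, `call_concurrently(client, make_perturbed_inputs(question))` would send every perturbed variant at once, so the same run exercises both input robustness and behavior under concurrent load.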

[0025] S102. Determine at least two interface stability metrics based on execution status and response data.

[0026] In this embodiment, the interface stability indicators characterize the stability of the AI service interface at the technical execution level. Optionally, they include at least two of the following: the call success rate, the percentage of all calls that successfully return a valid result; the average response time, the mean time the interface takes to complete a call; the response-time fluctuation rate, the dispersion of response times across calls; the exception code ratio, the percentage of all calls that return an exception code; and the timeout rate, the percentage of calls that do not complete within a preset timeout period. These indicators can be computed statistically from the execution status and response data, and they quantify the interface's reachability, efficiency, and anomaly behavior. Specifically, this embodiment can receive the response data of the multiple rounds of calls and extract the core interface operation data: the response time of each call, the returned status code, the success / failure / timeout flag of each call, and the type of exception information. From this batch of data, the interface-layer stability indicators are calculated with standardized formulas. The ratio indicators are expressed as percentages (0-100%) and the average response time in time units: call success rate = number of successful calls / total number of calls × 100%; average response time = sum of the response times of successful calls / number of successful calls; response-time fluctuation rate = (maximum single-call response time - minimum single-call response time) / average response time × 100%; exception code ratio = number of calls returning an exception status code / total number of calls × 100%; timeout rate = number of timed-out calls / total number of calls × 100%.
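The interface-layer formulas above translate directly into code. The sketch below assumes each call has already been reduced to a `(status, response_time_ms, status_code)` tuple and that any non-2xx status code counts as an exception code; both are illustrative assumptions. It also returns the coefficient-of-variation alternative for the fluctuation rate mentioned later in the description.

```python
import statistics
from typing import Dict, List, Optional, Tuple

# (status, response_time_ms, status_code); status is "success", "failure" or "timeout"
Record = Tuple[str, float, Optional[int]]


def interface_metrics(records: List[Record]) -> Dict[str, float]:
    """Interface-layer indicators using the formulas in the text.

    Assumption: any non-2xx status code counts as an exception code.
    """
    total = len(records)
    if total == 0:
        raise ValueError("no calls recorded")
    ok_times = [t for status, t, _ in records if status == "success"]
    n_ok = len(ok_times)
    avg_time = sum(ok_times) / n_ok if n_ok else float("nan")

    if n_ok and avg_time > 0:
        # (max - min) / mean, as a percentage of the average response time
        fluctuation = (max(ok_times) - min(ok_times)) / avg_time * 100
        # Alternative: coefficient of variation (population std / mean)
        cv = statistics.pstdev(ok_times) / avg_time * 100
    else:
        fluctuation = cv = float("nan")

    exception_codes = sum(1 for _, _, code in records if code is not None and not 200 <= code < 300)
    timeouts = sum(1 for status, _, _ in records if status == "timeout")
    return {
        "call_success_rate": n_ok / total * 100,
        "average_response_time_ms": avg_time,
        "response_time_fluctuation_rate": fluctuation,
        "response_time_cv": cv,
        "exception_code_ratio": exception_codes / total * 100,
        "timeout_rate": timeouts / total * 100,
    }
```

With two successful calls of 100 ms and 300 ms, one HTTP 500 failure, and one timeout, this gives a 50% success rate, a 200 ms average, a 100% fluctuation rate, and 25% for both the exception code ratio and the timeout rate.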

[0027] S103. Determine the output content under successful call based on execution status and response data, and determine at least two content stability indicators based on the output content.

[0028] Unlike traditional interfaces, an AI service interface may still produce unstable output even when a call succeeds. Therefore, this embodiment further analyzes the output content of successful calls and extracts content stability indicators, including at least two of the following: the content consistency rate, the degree to which the outputs of multiple calls under the same or similar inputs agree with each other; the hallucination-free rate, the proportion of outputs that contain no fabricated facts, no content inconsistent with the input requirements, and no unfounded content; and the contradiction-free rate, the proportion of outputs that contain no obvious logical conflicts, self-contradictions, or inconsistent conclusions. The content consistency rate can be determined by semantic comparison, structural comparison, key-information comparison, or classification and aggregation of the outputs of successful calls. The hallucination-free rate and the contradiction-free rate can be obtained through rule-based detection, a fact verification module, a logical consistency analysis module, or manual annotation feedback.
Specifically, this embodiment can receive the raw response data from the multi-round call module and extract the AI output content of all successful calls to form a multi-round output set. Based on SelfCheckGPT (a sampling-based, zero-resource hallucination detection method for large language models), three core checks are run automatically on this set: a semantic consistency check, which determines whether the core semantics of the outputs across rounds agree and counts the semantic inconsistencies; hallucination detection, which determines whether an output contains hallucinated content that contradicts the facts or has no basis, and counts the hallucinations; and a logical contradiction check, which determines whether a single output or the outputs of multiple rounds contain internal logical contradictions, and counts the contradictions. From the check results, the content-layer stability indicators are calculated, each as a percentage from 0 to 100%: content consistency rate = number of semantically consistent calls / number of successful calls × 100%; hallucination-free rate = number of hallucination-free calls / number of successful calls × 100%; contradiction-free rate = number of contradiction-free calls / number of successful calls × 100%.
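The counting step that turns per-call check results into content-layer rates can be sketched as below. This does not implement SelfCheckGPT: the hallucination and contradiction flags are assumed to come from a separate checker, and the semantic consistency check is replaced by a deliberately crude stand-in (each output is normalized and compared with the most common normalized answer) so that the example stays self-contained.

```python
import re
from collections import Counter
from typing import Dict, List


def normalize(text: str) -> str:
    """Lower-case, drop punctuation, collapse whitespace (a crude stand-in for semantics)."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def content_metrics(outputs: List[str], hallucinated: List[bool], contradictory: List[bool]) -> Dict[str, float]:
    """Content-layer rates, in percent, over the outputs of successful calls.

    hallucinated / contradictory: one flag per output, assumed to come from an
    external checker (hypothetical here).
    """
    n = len(outputs)
    if n == 0 or len(hallucinated) != n or len(contradictory) != n:
        raise ValueError("need one flag per successful output")
    # An output counts as "semantically consistent" if it matches the majority answer.
    keys = [normalize(o) for o in outputs]
    majority_key, _ = Counter(keys).most_common(1)[0]
    consistent = sum(key == majority_key for key in keys)
    return {
        "content_consistency_rate": consistent / n * 100,
        "hallucination_free_rate": sum(not h for h in hallucinated) / n * 100,
        "contradiction_free_rate": sum(not c for c in contradictory) / n * 100,
    }
```

In practice the `normalize` comparison would be replaced by an embedding- or model-based semantic comparison, which is closer to what the text describes.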

[0029] S104. Determine the comprehensive stability parameter based on the parameter values of the interface stability indicators and the parameter values of the content stability indicators.

[0030] To obtain more accurate evaluation results, this embodiment fuses the interface stability indicators and the content stability indicators into a comprehensive stability parameter. The fusion method is not limited. In one implementation, a first weight is determined for each interface stability indicator, and these indicators are weighted and fused into the target interface stability parameter; a second weight is then determined for each content stability indicator, and these indicators are weighted and fused into the target content stability parameter; finally, a third weight and a fourth weight are assigned to the target interface stability parameter and the target content stability parameter respectively, and the two are fused into the comprehensive stability parameter. The weights can be set from empirical rules, historical sample statistics, expert knowledge, or business preferences. For example, where system availability matters most, the weights of the call success rate and the timeout rate can be increased; where output quality matters most, the weights of the content consistency rate, the hallucination-free rate, or the contradiction-free rate can be increased. In another implementation, the target interface stability parameter and the target content stability parameter can simply be added to obtain the comprehensive stability parameter. This embodiment can also adjust, using other indicators, the comprehensive stability parameter obtained from the interface and content stability indicators.

[0031] Specifically, this embodiment supports user-defined weights (by default, 50% for the interface layer and 50% for the content layer) and also supports configuring a weight for each sub-indicator within the interface layer and the content layer (for example, 30% for the call success rate and 20% for the response-time fluctuation rate in the interface layer). Using the preset weights, stability scores for the interface layer and the content layer are calculated separately, each in the range 0-100; the better a sub-indicator performs, the higher the score. In the following example formulas, each rate is expressed as a fraction between 0 and 1: interface layer stability score = [call success rate × 30% + (1 - response-time fluctuation rate) × 20% + (1 - exception code ratio) × 20% + (1 - timeout rate) × 30%] × 100; content layer stability score = [content consistency rate × 40% + hallucination-free rate × 30% + contradiction-free rate × 30%] × 100. The interface layer and content layer scores are then combined into a two-dimensional comprehensive stability score for the AI service: comprehensive stability parameter = interface layer stability score × interface layer weight + content layer stability score × content layer weight. The interface layer score, content layer score, and comprehensive stability score can then be passed to the evaluation report generation module. Note that the response-time fluctuation rate can alternatively be computed as the coefficient of variation (standard deviation / mean) of the response times, which likewise quantifies how much the response time fluctuates.
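The layered scoring can be sketched as follows. The interface sub-weights of 30/20/20/30 come from reading the example formula so that the four sub-indicator weights sum to 100%, and the content sub-weights of 40/30/30 and the 50/50 layer split follow the example values; treat all of them as illustrative defaults. Clamping `1 - fluctuation` to [0, 1] is an added assumption, because a fluctuation rate can exceed 100%.

```python
from typing import Dict

# Example sub-indicator weights; all rates are passed as fractions in [0, 1].
INTERFACE_WEIGHTS = {"success": 0.30, "fluctuation": 0.20, "exception": 0.20, "timeout": 0.30}
CONTENT_WEIGHTS = {"consistency": 0.40, "hallucination_free": 0.30, "contradiction_free": 0.30}


def _clamp01(x: float) -> float:
    return min(1.0, max(0.0, x))


def interface_layer_score(success: float, fluctuation: float, exception: float, timeout: float,
                          w: Dict[str, float] = INTERFACE_WEIGHTS) -> float:
    # Higher-is-worse indicators enter as (1 - rate); the clamp is an assumption.
    return 100 * (w["success"] * success
                  + w["fluctuation"] * _clamp01(1 - fluctuation)
                  + w["exception"] * (1 - exception)
                  + w["timeout"] * (1 - timeout))


def content_layer_score(consistency: float, hallucination_free: float, contradiction_free: float,
                        w: Dict[str, float] = CONTENT_WEIGHTS) -> float:
    # All three content indicators are higher-is-better, so they enter directly.
    return 100 * (w["consistency"] * consistency
                  + w["hallucination_free"] * hallucination_free
                  + w["contradiction_free"] * contradiction_free)


def comprehensive_stability(interface_score: float, content_score: float,
                            interface_weight: float = 0.5, content_weight: float = 0.5) -> float:
    """Weighted fusion of the two layer scores (default 50% / 50%)."""
    if abs(interface_weight + content_weight - 1.0) > 1e-9:
        raise ValueError("layer weights must sum to 1")
    return interface_score * interface_weight + content_score * content_weight
```

For instance, a 90% success rate, 20% fluctuation, 5% exception codes, and 5% timeouts give an interface score of about 90.5; content rates of 80%, 90%, and 100% give 89.0; and the default 50/50 split yields about 89.75.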

[0032] It should further be explained that, to improve the accuracy of the comprehensive stability parameter, determining it from the parameter values of the interface stability indicators and of the content stability indicators may include: determining a first weight for each interface stability indicator, and determining the target interface stability parameter from the first weights and the parameter values of the interface stability indicators; determining a second weight for each content stability indicator, and determining the target content stability parameter from the second weights and the parameter values of the content stability indicators; and determining a third weight for the target interface stability parameter and a fourth weight for the target content stability parameter, and determining the comprehensive stability parameter from the third weight, the target interface stability parameter, the fourth weight, and the target content stability parameter. Because this embodiment determines weights separately for each interface stability indicator, for the target interface stability parameter, for each content stability indicator, and for the target content stability parameter, the weights are set more precisely, which in turn improves the accuracy of the comprehensive stability parameter.

[0033] It should be further noted that, based on the above embodiments, the present invention can further introduce a dynamic change correction mechanism. After obtaining the comprehensive stability parameters, the above method may further include the following steps: S1. Extract dynamic change features based on call data; wherein, the dynamic change features include at least one of the following: indicator fluctuation features, indicator change trend features, abnormal mutation features, and output drift features; the indicator fluctuation features are used to characterize the degree of dispersion of the indicator around its trend line in multiple rounds of calls; the indicator change trend features are used to characterize the monotonic trend of the indicator showing deterioration or improvement in multiple rounds of calls; the abnormal mutation features are used to characterize whether the indicator has a step change exceeding a preset threshold in multiple rounds of calls; and the output drift features are used to characterize whether the output deviates continuously from the preset benchmark level.

[0034] During multiple rounds of calls, even if the overall average metric is good, there may be issues such as rapid deterioration, sudden anomalies, or gradual deviation of the output in localized periods. Therefore, this embodiment further extracts dynamic change features from the call sequence. The dynamic change features in this embodiment include at least one of the following: metric fluctuation features, used to characterize the degree of dispersion of a certain metric around a trend line in multiple rounds of calls; metric change trend features, used to characterize whether the metric shows a continuous upward, continuous downward, or phased deterioration trend in multiple rounds of calls; anomalous mutation features, used to characterize whether the metric exhibits an anomalous jump exceeding a preset threshold in certain call rounds; and output drift features, used to characterize whether the output result continuously deviates from a preset baseline level. For example, if the overall average interface response time is normal, but the call time in the latter half continues to increase, it indicates a performance degradation trend; or, for example, if the output result is relatively stable in the early stage but gradually deviates from the same semantic center in the later stage under the same input, it indicates an output drift problem.
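A minimal sketch of extracting the four dynamic change features from one per-round indicator series follows. It assumes the trend line is an ordinary least-squares fit against the round index, measures fluctuation as the root-mean-square of the residuals around that line, marks a mutation point wherever the jump from the previous round exceeds a threshold, and measures drift as the longest run of consecutive rounds outside a band around a baseline. These are one reasonable reading of the definitions above, not formulas given in the text. Whether a positive slope means deterioration depends on the indicator: it does for response time, but not for a consistency score.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeatureSummary:
    fluctuation: float           # RMS of residuals around the fitted trend line
    trend_slope: float           # change per round; its meaning depends on the indicator
    mutation_points: List[int]   # rounds whose jump from the previous round exceeds the threshold
    drift_rounds: int            # longest run of consecutive rounds outside the baseline band


def trend_line(values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value against round index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0, (values[0] if values else 0.0)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dynamic_features(values: List[float], mutation_threshold: float,
                     baseline: float, drift_threshold: float) -> FeatureSummary:
    slope, intercept = trend_line(values)
    residuals = [y - (intercept + slope * i) for i, y in enumerate(values)]
    fluctuation = (sum(r * r for r in residuals) / len(residuals)) ** 0.5 if values else 0.0
    mutation_points = [i for i in range(1, len(values))
                       if abs(values[i] - values[i - 1]) > mutation_threshold]
    longest = run = 0
    for y in values:
        # Count consecutive rounds that sit outside the baseline band.
        run = run + 1 if abs(y - baseline) > drift_threshold else 0
        longest = max(longest, run)
    return FeatureSummary(fluctuation, slope, mutation_points, longest)
```

Applied to the response-time example in the text, a series whose second half keeps rising would show a positive `trend_slope` even if its overall mean looks normal.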

[0035] S2. Determine the dynamic change correction value based on the dynamic change characteristics.

[0036] After detecting dynamic change characteristics, a dynamic change correction value can be determined based on the degree of anomaly. The dynamic change correction value can be designed to be positively correlated with the degree of anomaly; that is, the greater the fluctuation, the worse the trend, the more obvious the mutation, and the more severe the drift, the larger the dynamic change correction value. In this embodiment, the parameter value corresponding to each type of dynamic change characteristic is determined, thereby determining the dynamic change correction value, i.e., Dynamic change correction value = Indicator fluctuation correction value + Indicator change trend correction value + Anomaly mutation correction value + Output drift correction value. This embodiment can determine the indicator fluctuation correction value based on the fluctuation degree of indicators such as interface response time, content consistency score, and result credibility score in consecutive rounds of calls. The indicator fluctuation correction value can be determined based on a comparison with a preset standard, thereby avoiding misjudgments where the average score is good but the actual performance is extremely unstable. This embodiment can analyze the changing trends of various indicators based on the continuous call sequence. When an indicator is detected to have a continuous downward trend, a continuous deviation trend, or a phased degradation trend, an indicator change trend correction value is generated based on the trend strength, avoiding the static average value from masking the interface degradation process. This embodiment can detect abnormal mutations in various indicators within a continuous evaluation sequence. 
When the variation between adjacent rounds exceeds a preset mutation threshold, the corresponding round is marked as an abnormal mutation point, and an abnormal mutation correction value is generated based on the frequency and impact of the abnormal mutation point, avoiding misjudging sudden severe anomalies as accidental and negligible problems. This embodiment can compare the output results of subsequent rounds with the initial reference output, stable baseline output, or standard reference results. When the deviation exceeds a preset drift threshold, an output drift correction value is generated based on the degree and duration of the deviation, thereby identifying long-term quality degradation problems.
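A minimal sketch of the additive correction described above follows. The description only requires each term to grow with the degree of anomaly. The specific forms used here are hypothetical: "excess over a preset standard" for fluctuation, "positive slope means deterioration" for the trend, "sum of jump excess over the threshold" for mutations (which grows with both frequency and impact), and "deviation times duration" for drift. The scaling constants are hypothetical placeholders as well.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class CorrectionScales:
    """Hypothetical tuning constants; real values would be calibrated per scenario."""
    fluctuation: float = 0.5
    trend: float = 2.0
    mutation: float = 1.0
    drift: float = 0.1


def dynamic_correction_value(fluctuation: float, fluctuation_standard: float,
                             deterioration_slope: float,
                             jumps: Sequence[float], mutation_threshold: float,
                             drift_deviation: float, drift_rounds: int,
                             scales: Optional[CorrectionScales] = None) -> Dict[str, float]:
    s = scales or CorrectionScales()
    parts = {
        # Only dispersion beyond the preset standard is penalized.
        "fluctuation": s.fluctuation * max(0.0, fluctuation - fluctuation_standard),
        # deterioration_slope > 0 means the indicator gets worse each round.
        "trend": s.trend * max(0.0, deterioration_slope),
        # Grows with both how many jumps exceed the threshold and by how much.
        "mutation": s.mutation * sum(abs(j) - mutation_threshold
                                     for j in jumps if abs(j) > mutation_threshold),
        # Degree of deviation multiplied by how many rounds it persisted.
        "drift": s.drift * abs(drift_deviation) * drift_rounds,
    }
    # Dynamic change correction value = sum of the four correction values.
    parts["total"] = sum(parts.values())
    return parts
```

Returning the per-term breakdown alongside the total lets the assessment report show which kind of dynamic anomaly drove the deduction.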

[0037] S3. Correct the comprehensive stability parameter based on the dynamic change correction value to obtain the corrected comprehensive stability parameter.

[0038] This embodiment does not limit the specific method for correcting the comprehensive stability parameter based on the dynamic change correction value. For example, each dynamic change correction value can be deducted directly from the comprehensive stability parameter. Alternatively, a dynamic risk level can be calculated from the dynamic change features, where the dynamic risk level includes low risk, medium risk, and high risk: at low risk, the comprehensive stability parameter is left uncorrected or only slightly corrected; at medium risk, a moderate deduction is applied to it; and at high risk, a significant deduction is applied to it. Alternatively, when the dynamic change features mainly manifest as fluctuations in interface response time or an increase in timeout anomalies, the weight of the interface layer stability score in the comprehensive stability parameter can be increased; when they mainly manifest as decreased content consistency or increased output drift, the weight of the content layer stability score in the comprehensive stability parameter can be increased. The comprehensive stability parameter is then recalculated with the adjusted weights. The beneficial effect of this embodiment is that it evaluates not only the average performance of static indicators but also how the indicators change over time during the call process, so the evaluation results more closely reflect actual operational risk and the accuracy of interface stability assessment is improved.
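The risk-level deduction and the weight-adjustment alternatives above could be sketched as follows. The band boundaries, the deduction amounts, the fixed weight shift of 0.1, and clamping the corrected score at 0 are all hypothetical choices. The description names only the three risk levels and the direction in which each adjustment moves.

```python
from typing import Tuple

# Hypothetical risk bands (lower bounds on the correction value) and deductions.
RISK_BANDS = ((15.0, "high"), (5.0, "medium"))
DEDUCTIONS = {"low": 0.0, "medium": 5.0, "high": 15.0}


def risk_level(correction_value: float) -> str:
    for lower_bound, level in RISK_BANDS:
        if correction_value >= lower_bound:
            return level
    return "low"


def correct_by_risk(score: float, correction_value: float) -> Tuple[str, float]:
    """Deduct a fixed amount per risk level; clamping at 0 is an assumption."""
    level = risk_level(correction_value)
    return level, max(0.0, score - DEDUCTIONS[level])


def reweighted_score(interface_score: float, content_score: float,
                     w_interface: float, w_content: float,
                     dominant_layer: str, shift: float = 0.1) -> float:
    """Shift weight toward the layer where the dynamic anomalies mainly appear,
    keeping the sum of the two weights unchanged, then recompute the score."""
    if dominant_layer == "interface":
        moved = min(shift, w_content)
        w_interface, w_content = w_interface + moved, w_content - moved
    elif dominant_layer == "content":
        moved = min(shift, w_interface)
        w_interface, w_content = w_interface - moved, w_content + moved
    return w_interface * interface_score + w_content * content_score
```

Shifting weight toward the anomalous layer only lowers the recomputed score when that layer also scores worse, so in practice this option is likely to be combined with a deduction rather than used alone.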

[0039] It should be further explained that, based on any of the above embodiments, correcting the comprehensive stability parameter based on the dynamic change correction value to obtain the corrected comprehensive stability parameter may include: when the dynamic change features indicate that at least one of the following occurs during interface operation, namely increased indicator fluctuation, a deteriorating indicator trend, an abnormal abrupt change in an indicator, or output drift, determining a dynamic change correction value greater than zero, where the dynamic change correction value increases with the degree of abnormality of the dynamic change features; and subtracting the dynamic change correction value from the comprehensive stability parameter to obtain the corrected comprehensive stability parameter. This embodiment thus provides a specific method for correcting the comprehensive stability parameter based on the dynamic change correction value, improving the accuracy of the correction.
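Written as a formula, let $S$ be the comprehensive stability parameter and $S'$ the corrected parameter. The four terms are the indicator fluctuation, indicator change trend, abnormal mutation, and output drift correction values listed earlier (the symbol names are ours):

$$S' = S - \Delta, \qquad \Delta = \Delta_{\mathrm{fluc}} + \Delta_{\mathrm{trend}} + \Delta_{\mathrm{mut}} + \Delta_{\mathrm{drift}}, \qquad \Delta \ge 0$$

Here $\Delta = 0$ when no anomaly is present, and $\Delta > 0$ otherwise. For example, with hypothetical values $S = 86$, $\Delta_{\mathrm{fluc}} = 1$, $\Delta_{\mathrm{trend}} = 3$, $\Delta_{\mathrm{mut}} = 0$ and $\Delta_{\mathrm{drift}} = 2$, the correction value is $\Delta = 6$ and the corrected parameter is $S' = 80$.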

[0040] It should be further noted that, based on any of the above embodiments, after the comprehensive stability parameter is determined based on the parameter values corresponding to the interface stability indicators and the content stability indicators, the method may further include: S301. Determine the stability level based on the comprehensive stability parameter.

[0041] This embodiment can divide the stability level into multiple levels such as excellent, good, average, poor, and risky, so that users can quickly understand the overall status of the service interface to be evaluated.

[0042] S302. Generate a stability assessment report based on interface stability indicators, content stability indicators, and stability levels.

[0043] The stability assessment report provided in this embodiment may include, but is not limited to: the total number of interface calls, number of successful calls, number of failed calls, and number of timeouts; the call success rate, average response time, response time fluctuation rate, abnormal code ratio, and timeout rate; the content consistency rate, hallucination-free rate, and logical inconsistency rate; the comprehensive stability parameter and the corrected comprehensive stability parameter, with their level results; the dynamic change feature analysis results; and interface adjustment suggestions. The interface adjustment suggestions can be generated automatically from the assessment results. For example, when the call success rate is low, optimizing service availability or the retry mechanism is recommended; when the response time fluctuation rate is high, investigating resource allocation, caching strategies, or the model inference chain is recommended; and when the content consistency rate is low or output drift is present, optimizing prompt templates, model version control, or context management strategies is recommended. By generating a structured assessment report, this embodiment supports operation and maintenance monitoring, version comparison, go-live acceptance, and continuous optimization of AI service interfaces.
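The rule-based adjustment suggestions described above could be generated along these lines. All thresholds (95% success rate, 0.3 fluctuation rate, 90% consistency) are hypothetical placeholders. The response time fluctuation rate is assumed here to be the coefficient of variation of response times.

```python
from typing import List


def adjustment_suggestions(success_rate: float, fluctuation_rate: float,
                           consistency_rate: float, drift_detected: bool,
                           min_success: float = 0.95, max_fluctuation: float = 0.3,
                           min_consistency: float = 0.90) -> List[str]:
    """Map assessment results to the three suggestion families in the report."""
    suggestions = []
    if success_rate < min_success:
        suggestions.append("Optimize service availability or the retry mechanism.")
    if fluctuation_rate > max_fluctuation:
        suggestions.append("Investigate resource allocation, caching strategies, "
                           "or the model inference chain.")
    if consistency_rate < min_consistency or drift_detected:
        suggestions.append("Optimize prompt templates, model version control, "
                           "or context management strategies.")
    return suggestions
```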

[0044] This invention provides a method for determining interface stability, which may include: S101, initiating a preset number of calls to the service interface to be evaluated based on a multi-round call strategy, obtaining the execution status and response data corresponding to each call; S102, determining at least two interface stability indicators based on the execution status and response data; S103, determining the output content under a successful call based on the execution status and response data, and determining at least two content stability indicators based on the output content; S104, determining a comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators. Compared with the current method of determining interface stability based on simple statistical indicators and corresponding thresholds, this invention, by designing a multi-round call strategy, achieves batch and reproducible collection of AI service interface layer stability data, solves the problem of passive collection, constructs a dual-dimensional stability evaluation system for the interface layer and content layer, realizes the integrated evaluation of the two, and improves the accuracy of stability evaluation. Furthermore, this invention also introduces dynamic change features to detect fluctuations, deterioration trends, mutations, and drifts during interface operation, and corrects the comprehensive stability parameter, making the evaluation results more consistent with the actual operating state. This invention can also output stability levels and evaluation reports, and further provide suggestions for interface adjustments, thereby supporting AI service interface optimization, model governance, quality monitoring and operation and maintenance decisions.

[0045] The purpose of this invention is to design a proactive, multi-round, repeated call technology solution to achieve batch, reproducible collection of stability data for AI service interfaces, solving the problem of passive data collection. It constructs a dual-dimensional stability assessment system for both the interface and content layers, achieving integrated assessment and addressing the issue of single-dimensional evaluation. A unified quantitative scoring model is established, integrating multi-dimensional indicators into a comparable comprehensive stability score, providing enterprises with an objective basis for judgment. Finally, it achieves fully automated, closed-loop stability assessment, meeting the full-scenario needs of enterprises for AI service procurement evaluation, deployment acceptance, and continuous monitoring.

[0046] For a clearer understanding of this invention, refer to Figure 2, which is a flowchart of an interface stability determination method provided in an embodiment of the invention. The method may specifically include: S201. Receive configuration parameters, including the address of the service interface to be evaluated, the interface request parameters, the number of repeated calls N, the call time interval, and the call termination threshold.

[0047] Instead of a fixed number of calls N, this embodiment can use continuous calls over a fixed duration: calls are initiated continuously within a preset time period, and the indicators are calculated from the actual number of calls, which serves the same interface data collection purpose. Alternatively, calls can be triggered by multiple distributed nodes simultaneously to simulate interface stability assessment under high-concurrency scenarios; the assessment logic and result determination remain the same.

[0048] S202. Based on the configuration parameters, automatically initiate N repeated call requests to the service interface to be evaluated at the preset time interval, under the same network environment and with the same request parameters.

[0049] S203. Record in real time the execution status (success / failure / timeout) of each call, together with its response data (response time, status code, exception information, AI output content, etc.).
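Steps S202 and S203 could be realized by a collection loop like the one below. This is a sketch under several assumptions. The caller supplies `call`, a hypothetical client function that returns `(status_code, output_text)` and raises `TimeoutError` on timeout. Any 2xx code counts as success. A call that completes but exceeds `timeout_s` is recorded as a timeout. The "call termination threshold" is interpreted, hypothetically, as a maximum number of consecutive unsuccessful calls.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CallRecord:
    round_no: int
    status: str                  # "success" | "failure" | "timeout"
    elapsed_ms: float
    status_code: Optional[int]
    error: Optional[str]
    output: Optional[str]


def run_multi_round(call: Callable[[dict], Tuple[int, str]], request: dict, n_calls: int,
                    interval_s: float = 0.0, timeout_s: float = 30.0,
                    max_consecutive_failures: Optional[int] = None) -> List[CallRecord]:
    """Issue up to n_calls identical requests and record the outcome of each one."""
    records: List[CallRecord] = []
    consecutive_failures = 0
    for i in range(n_calls):
        start = time.perf_counter()
        try:
            code, output = call(request)
            elapsed = (time.perf_counter() - start) * 1000
            if elapsed > timeout_s * 1000:
                status = "timeout"  # completed, but slower than the timeout budget
            else:
                status = "success" if 200 <= code < 300 else "failure"
            records.append(CallRecord(i, status, elapsed, code, None, output))
        except TimeoutError as exc:
            elapsed = (time.perf_counter() - start) * 1000
            records.append(CallRecord(i, "timeout", elapsed, None, str(exc), None))
        except Exception as exc:  # any other client error counts as a failure
            elapsed = (time.perf_counter() - start) * 1000
            records.append(CallRecord(i, "failure", elapsed, None, repr(exc), None))
        consecutive_failures = 0 if records[-1].status == "success" else consecutive_failures + 1
        # Hypothetical termination rule: stop after too many consecutive unsuccessful calls.
        if max_consecutive_failures is not None and consecutive_failures >= max_consecutive_failures:
            break
        if interval_s > 0 and i < n_calls - 1:
            time.sleep(interval_s)
    return records
```

Keeping the client behind a plain callable makes it straightforward to wrap any HTTP client, or to substitute a fake client when testing the evaluator itself.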

[0050] S204. Determine at least two interface stability indicators based on the execution status and response data.
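The sketch below computes the five interface stability indicators named later in this description from per-call records. The records are represented as plain dictionaries with hypothetical keys `status`, `elapsed_ms` and `status_code`. Three choices here are assumptions: defining the time fluctuation rate as the coefficient of variation (population standard deviation divided by mean) of successful-call response times, averaging response time over successful calls only, and counting any non-2xx code as abnormal.

```python
from statistics import mean, pstdev
from typing import Dict, List


def interface_metrics(records: List[dict]) -> Dict[str, float]:
    n = len(records)
    if n == 0:
        raise ValueError("no calls recorded")
    successes = [r for r in records if r["status"] == "success"]
    latencies = [r["elapsed_ms"] for r in successes]
    avg = mean(latencies) if latencies else float("nan")
    # Coefficient of variation: dispersion relative to the mean response time.
    fluctuation = pstdev(latencies) / avg if latencies and avg > 0 else 0.0
    codes = [r["status_code"] for r in records if r.get("status_code") is not None]
    abnormal = sum(1 for c in codes if not 200 <= c < 300)
    return {
        "call_success_rate": len(successes) / n,
        "avg_response_ms": avg,
        "time_fluctuation_rate": fluctuation,
        "abnormal_code_ratio": abnormal / len(codes) if codes else 0.0,
        "timeout_rate": sum(1 for r in records if r["status"] == "timeout") / n,
    }
```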

[0051] S205. Determine the output content of successful calls based on the execution status and response data, and determine at least two content stability indicators based on the output content.
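The three content stability indicators named later (content consistency rate, hallucination-free rate, logical inconsistency rate) could be computed over the outputs of successful calls as sketched below. Exact matching after case and whitespace normalization is a deliberately crude stand-in for the semantic consistency check the description has in mind. The two detector callbacks are hypothetical placeholders for a detection model, manual annotation, or a cross-validation step.

```python
from collections import Counter
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace before comparing outputs."""
    return " ".join(text.lower().split())


def content_metrics(outputs: List[str],
                    is_hallucination: Callable[[str], bool],
                    is_logically_inconsistent: Callable[[str], bool]) -> Dict[str, float]:
    if not outputs:
        raise ValueError("no successful outputs to evaluate")
    n = len(outputs)
    # Consistency: share of outputs that agree with the most common answer.
    _, modal_count = Counter(normalize(o) for o in outputs).most_common(1)[0]
    return {
        "content_consistency_rate": modal_count / n,
        "hallucination_free_rate": sum(1 for o in outputs if not is_hallucination(o)) / n,
        "logical_inconsistency_rate": sum(1 for o in outputs if is_logically_inconsistent(o)) / n,
    }
```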

[0052] In this embodiment, content layer stability indicators can be quantified using a pre-trained content consistency detection model, self-checking assisted by manual annotation, or multi-model cross-validation. Indicators can also be calculated as calls complete: the indicator data are updated after each completed call and real-time stage scores are output, with the final comprehensive score consistent with that of the original scheme.
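The sketch below shows how two interface indicators could be updated as each call completes. It uses Welford's online algorithm, a standard technique that this description does not name. Its running mean and variance match the batch values up to floating-point rounding, which is one way to make the final stage score agree with the batch computation. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningInterfaceStats:
    """Incremental per-call update so stage scores can be reported before all N calls finish."""
    calls: int = 0
    successes: int = 0
    mean_ms: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean (Welford)

    def update(self, success: bool, elapsed_ms: float) -> None:
        self.calls += 1
        if success:
            self.successes += 1
            delta = elapsed_ms - self.mean_ms
            self.mean_ms += delta / self.successes
            self.m2 += delta * (elapsed_ms - self.mean_ms)

    @property
    def success_rate(self) -> float:
        return self.successes / self.calls if self.calls else 0.0

    @property
    def fluctuation_rate(self) -> float:
        """Population coefficient of variation of successful-call latency."""
        if self.successes < 2 or self.mean_ms == 0:
            return 0.0
        return math.sqrt(self.m2 / self.successes) / self.mean_ms
```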

[0053] S206. Determine the target interface stability indicator based on each interface stability indicator and its corresponding weight, and determine the target content stability indicator based on each content stability indicator and its corresponding weight.

[0054] In this embodiment, weights can be automatically assigned based on the scenario. The system has built-in weight templates for scenarios such as finance, office, and industry. After the user selects a scenario, weights are automatically assigned, achieving the same goal of integrated scoring.

[0055] S207. Determine the comprehensive stability parameter based on the target interface stability indicator and the target content stability indicator.
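Steps S206 and S207, together with the scenario weight templates, could be sketched as a two-level weighted mean. The template numbers are hypothetical, since the description names the scenarios but not the weights. Each indicator is assumed to be pre-normalized to a 0-100 score where higher is better (for example, timeout rate and logical inconsistency rate already inverted). The first and second weights are applied within each layer, and the third and fourth weights combine the two layers.

```python
from typing import Dict

# Hypothetical weight template; only the scenario name comes from the description.
SCENARIO_TEMPLATES: Dict[str, dict] = {
    "finance": {
        "interface": {"call_success_rate": 0.30, "avg_response_time": 0.20,
                      "time_fluctuation_rate": 0.20, "abnormal_code_ratio": 0.15,
                      "timeout_rate": 0.15},
        "content": {"content_consistency_rate": 0.40, "hallucination_free_rate": 0.40,
                    "logical_inconsistency_rate": 0.20},
        "layers": (0.4, 0.6),  # (third weight: interface layer, fourth weight: content layer)
    },
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean, normalized so the weights need not sum exactly to 1."""
    return sum(scores[k] * w for k, w in weights.items()) / sum(weights.values())


def comprehensive_stability(interface_scores: Dict[str, float],
                            content_scores: Dict[str, float], scenario: str) -> float:
    t = SCENARIO_TEMPLATES[scenario]
    target_interface = weighted_score(interface_scores, t["interface"])  # first weights
    target_content = weighted_score(content_scores, t["content"])        # second weights
    w3, w4 = t["layers"]
    return w3 * target_interface + w4 * target_content
```

Normalizing by the weight sum inside each layer keeps user-edited templates from silently inflating or deflating the score when their weights do not add up to exactly 1.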

[0056] S208. Based on the above information, a stability assessment report is generated.

[0057] This embodiment automatically generates a standardized AI service stability assessment report based on indicator data and comprehensive stability parameters. The report includes core content: assessment parameter configuration, details and scores of interface-level stability indicators, details and scores of content-level stability indicators, a two-dimensional comprehensive stability score, stability level determination (90-100 points is Grade A, 80-89 points is Grade B, 70-79 points is Grade C, and below 70 points is Grade D), and problem warnings (such as excessively high latency fluctuations and excessively low content consistency). The report supports visualization (indicator trend charts, score radar charts) and file export, enabling intuitive presentation and traceability of assessment results. The report output in this embodiment can be in the form of visualization and file export, or it can output assessment results to an API interface, providing comprehensive stability parameters and indicator data externally via API for other systems to interface with, achieving the same purpose of outputting assessment results.
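The grade bands given above (90-100 is A, 80-89 is B, 70-79 is C, below 70 is D) map directly to a small function. Treating the bands as half-open intervals, so that a non-integer score such as 89.5 falls into grade B, is an assumption the description does not settle.

```python
def stability_grade(score: float) -> str:
    """Map a 0-100 comprehensive stability score to grades A-D.

    Bands are treated as half-open ([90, 100] -> A, [80, 90) -> B, [70, 80) -> C),
    which is an assumption for non-integer scores.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, grade in ((90, "A"), (80, "B"), (70, "C")):
        if score >= lower_bound:
            return grade
    return "D"
```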

[0058] The core application scenarios of this invention include pre-launch stability acceptance of externally procured AI services by enterprises, continuous monitoring of the production environment of their services by AI service providers, and commercial stability rating of AI services by third-party institutions. The recommended configuration range for the number of multi-round calls (N) is 50-200 times. This range ensures the accuracy of the evaluation results while controlling evaluation time and resource consumption. This method can adopt a modular and decoupled design, with each module deployable independently and flexibly expandable. For example, a content compliance verification dimension can be added to the content self-inspection module, or new indicators can be added to the scoring module, without modifying the overall architecture. All quantitative indicators and algorithms in this invention are standardized, ensuring consistent evaluation results for the same AI service by different users under the same configuration, guaranteeing the objectivity and reproducibility of the evaluation. The system can operate on a general-purpose server / cloud server, requiring no dedicated hardware, resulting in low deployment costs and easy implementation and promotion. This invention can be integrated with existing AI operation and maintenance platforms and monitoring platforms, incorporating stability evaluation results into the enterprise's existing AI management system for integrated control.

[0059] The beneficial effects of the embodiments of the present invention are as follows: (1) The present invention designs an active multi-round repeated call scheme, which can quantitatively evaluate the stability of AI service interfaces under continuous calls, and solves the problem that the existing technology can only passively record and cannot evaluate in advance.

[0060] (2) This invention realizes the integrated evaluation of the stability of the interface layer and the content layer, and comprehensively measures the capabilities of AI services.

[0061] (3) This invention establishes a unified quantitative scoring model and evaluation standard, outputting a comprehensive score of 0-100 points, which solves the problem of subjective evaluation in existing technologies.

[0062] (4) The system can connect to various AI service API interfaces, supports user-defined configuration of evaluation parameters and weights, and adapts to different commercial scenarios such as finance, office, and industry. It can also be integrated with open-source frameworks such as Langfuse (an open-source LLM engineering and observability platform), reusing their logging capabilities to achieve integrated monitoring and evaluation.

[0063] The interface stability determination device provided in the embodiments of the present invention will be described below. The interface stability determination device described below and the interface stability determination method described above can be referred to in correspondence.

[0064] Referring to Figure 3, which is a schematic diagram of an interface stability determination device provided in an embodiment of the present invention, the device may include: an execution status and response data acquisition module 100, used to initiate a preset number of calls to the service interface to be evaluated based on a multi-round call strategy and to obtain the execution status and response data corresponding to each call; an interface stability index determination module 200, used to determine at least two interface stability indices based on the execution status and the response data; a content stability index determination module 300, used to determine the output content of successful calls based on the execution status and the response data, and to determine at least two content stability indices based on the output content; and a comprehensive stability parameter determination module 400, used to determine the comprehensive stability parameter based on the parameter values corresponding to the interface stability indices and the parameter values corresponding to the content stability indices.

[0065] Furthermore, based on any of the above embodiments, the comprehensive stability parameter determination module 400 may include: The target interface stability parameter determination unit is used to determine the first weight corresponding to each interface stability index, and to determine the target interface stability parameter based on the first weight corresponding to each interface stability index and the parameter value corresponding to the interface stability index. The target content stability parameter determination unit is used to determine the second weight corresponding to each content stability index, and to determine the target content stability parameter based on the second weight of each content stability index and the parameter value corresponding to the content stability index. The comprehensive stability parameter determination unit is used to determine the third weight corresponding to the target interface stability parameter and the fourth weight corresponding to the target content stability parameter, and to determine the comprehensive stability parameter based on the third weight, the target interface stability parameter, the fourth weight and the target content stability parameter.

[0066] Furthermore, based on any of the above embodiments, the interface stability indicators include at least the call success rate, average response time, time fluctuation rate, abnormal code ratio, and timeout rate; the content stability indicators include at least the content consistency rate, hallucination-free rate, and logical inconsistency rate.

[0067] Furthermore, based on any of the above embodiments, the interface stability determination device may further include: a dynamic change feature determination module, used to extract dynamic change features based on the call data, wherein the dynamic change features include at least one of the following: indicator fluctuation features, indicator change trend features, abnormal mutation features, and output drift features; the indicator fluctuation features characterize the degree of dispersion of an indicator around its trend line over multiple rounds of calls; the indicator change trend features characterize the change trend of an indicator over multiple rounds of calls; the abnormal mutation features characterize whether an indicator exhibits an abnormal mutation exceeding a preset threshold over multiple rounds of calls; and the output drift features characterize whether the output continuously deviates from a preset benchmark level; a dynamic change correction value determination module, used to determine a dynamic change correction value based on the dynamic change features; and a modified comprehensive stability parameter determination module, used to modify the comprehensive stability parameter based on the dynamic change correction value to obtain the modified comprehensive stability parameter.

[0068] Furthermore, based on any of the above embodiments, the modified comprehensive stability parameter determination module may include: The dynamic change correction value determination unit is used to determine a dynamic change correction value greater than zero when the dynamic change feature indicates that there is at least one of the following situations during the operation of the interface: increased index fluctuation, deteriorating index trend, abnormal abrupt change in index, or output drift; wherein the dynamic change correction value increases with the degree of abnormality of the dynamic change feature. The modified comprehensive stability parameter determination unit is used to subtract the dynamic change correction value from the comprehensive stability parameter to obtain the modified comprehensive stability parameter.

[0069] Furthermore, based on any of the above embodiments, the interface stability determination device may further include: a stability level determination module, used to determine the stability level based on the comprehensive stability parameter; and a stability assessment report determination module, used to generate a stability assessment report based on the interface stability indices, the content stability indices, and the stability level, wherein the stability assessment report includes at least interface adjustment suggestions.

[0070] Furthermore, based on any of the above embodiments, the multi-round invocation strategy is any one of the following: a repeated invocation strategy with the same input, a continuous session invocation strategy with context inheritance, a perturbation invocation strategy, and a concurrent invocation strategy.
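The four multi-round invocation strategies could be turned into request plans as sketched below. The request fields (`prompt`, `session_id`, `turn`) are hypothetical. Reading the perturbation strategy as "cycle through caller-supplied, semantically equivalent input variants" is also an assumption, since the description names the strategy without defining it. Each batch in the returned plan is meant to be sent together, so only the concurrent strategy produces batches larger than one request.

```python
from enum import Enum
from typing import List, Optional, Sequence


class Strategy(Enum):
    SAME_INPUT = "same_input"                  # identical request every round
    CONTINUOUS_SESSION = "continuous_session"  # rounds share one session so context is inherited
    PERTURBATION = "perturbation"              # equivalent input variants across rounds
    CONCURRENT = "concurrent"                  # several requests issued at once


def build_plan(strategy: Strategy, prompt: str, n: int,
               variants: Optional[Sequence[str]] = None,
               concurrency: int = 1) -> List[List[dict]]:
    """Return a list of batches of n requests in total; requests in one batch are sent together."""
    if strategy is Strategy.SAME_INPUT:
        return [[{"prompt": prompt}] for _ in range(n)]
    if strategy is Strategy.CONTINUOUS_SESSION:
        # Same session id with increasing turn numbers, so each call sees prior context.
        return [[{"prompt": prompt, "session_id": "eval-session", "turn": i}] for i in range(n)]
    if strategy is Strategy.PERTURBATION:
        if not variants:
            raise ValueError("perturbation strategy needs input variants")
        return [[{"prompt": variants[i % len(variants)]}] for i in range(n)]
    if strategy is Strategy.CONCURRENT:
        if concurrency < 1:
            raise ValueError("concurrency must be at least 1")
        return [[{"prompt": prompt} for _ in range(min(concurrency, n - start))]
                for start in range(0, n, concurrency)]
    raise ValueError(f"unknown strategy: {strategy}")
```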

[0071] It should be noted that the order of the modules and units in the aforementioned interface stability determination device can be changed without affecting the logic.

[0072] This invention provides an interface stability determination device, which may include: an execution status and response data acquisition module 100, used to initiate a preset number of calls to the service interface to be evaluated based on a multi-round call strategy, and obtain the execution status and response data corresponding to each call; an interface stability index determination module 200, used to determine at least two interface stability indices based on the execution status and the response data; a content stability index determination module 300, used to determine the output content under a successful call based on the execution status and the response data, and to determine at least two content stability indices based on the output content; and a comprehensive stability parameter determination module 400, used to determine a comprehensive stability parameter based on the parameter values corresponding to the interface stability indices and the parameter values corresponding to the content stability indices. Compared with the current method of determining interface stability based on simple statistical indicators and corresponding thresholds, this invention, by designing a multi-round call strategy, achieves batch and reproducible collection of AI service interface layer stability data, solves the problem of passive collection, constructs a dual-dimensional stability evaluation system for the interface layer and content layer, realizes the integrated evaluation of the two, and improves the accuracy of stability evaluation.

[0073] The following describes an interface stability determination device provided by an embodiment of the present invention. The interface stability determination device described below and the interface stability determination method described above can be referred to in correspondence.

[0074] Referring to Figure 4, which is a schematic diagram of an interface stability determination device provided in an embodiment of the present invention, the device may include: memory 10, used to store a computer program; and processor 20, used to execute the computer program to implement the interface stability determination method described above.

[0075] The memory 10, processor 20, and communication interface 30 all communicate with each other through the communication bus 40.

[0076] In this embodiment of the invention, the memory 10 is used to store one or more programs. The programs may include program code, which includes computer operation instructions. In this embodiment of the invention, the memory 10 may store programs for implementing the following functions: initiating, based on a multi-round call strategy, a preset number of calls to the service interface to be evaluated, and obtaining the execution status and response data corresponding to each call; determining at least two interface stability indicators based on the execution status and response data; determining the output content of successful calls based on the execution status and response data, and determining at least two content stability indicators based on the output content; and determining the comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators.

[0077] In one possible implementation, the memory 10 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and applications required for at least one function; and the data storage area may store data created during use.

[0078] Furthermore, memory 10 may include read-only memory and random access memory, which provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (NVRAM). The memory stores an operating system and operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations, and the operating system may include various system programs for implementing basic tasks and handling hardware-based tasks.

[0079] Processor 20 can be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or other programmable logic device. Processor 20 can be a microprocessor or any conventional processor. Processor 20 can call programs stored in memory 10.

[0080] The communication interface 30 can be an interface for the communication module, used to connect with other devices or systems.

[0081] It should be noted that the structure shown in Figure 4 does not constitute a limitation on the interface stability determination device in the embodiments of the present invention. In practical applications, the interface stability determination device may include more or fewer components than shown in Figure 4, or may combine certain components.

[0082] The following describes the computer-readable storage medium provided in the embodiments of the present invention. The computer-readable storage medium described below and the interface stability determination method described above can be referred to in correspondence.

[0083] The present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the interface stability determination method described above.

[0084] The computer-readable storage medium may include various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0085] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0086] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0087] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0088] The present invention has provided a detailed description of an interface stability determination method, apparatus, device, and readable storage medium. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method of interface stability determination, the method comprising: initiating, based on a multi-round call strategy, a preset number of calls to a service interface to be evaluated, and obtaining the execution status and response data corresponding to each call; determining at least two interface stability indicators based on the execution status and the response data; determining, based on the execution status and the response data, the output content of successful calls, and determining at least two content stability indicators based on the output content; and determining a comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators.

2. The interface stability determination method according to claim 1, wherein determining the comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators comprises: determining a first weight corresponding to each interface stability indicator, and determining a target interface stability parameter based on the first weight of each interface stability indicator and the parameter value corresponding to that indicator; determining a second weight corresponding to each content stability indicator, and determining a target content stability parameter based on the second weight of each content stability indicator and the parameter value corresponding to that indicator; and determining a third weight corresponding to the target interface stability parameter and a fourth weight corresponding to the target content stability parameter, and determining the comprehensive stability parameter based on the third weight, the target interface stability parameter, the fourth weight, and the target content stability parameter.
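One plausible reading of the two-level weighting in claim 2 is a weighted sum at each level. The claim names the weights but not the combining function, so the linear form here is an assumption. The sketch also assumes three further things: every indicator has already been normalized to [0, 1] with higher meaning more stable (e.g. `1 - timeout_rate`), each weight set sums to 1, and the indicator names and example weights are illustrative only. `weighted_score` and `comprehensive_stability` are hypothetical helpers.

```python
from typing import Dict, Tuple


def weighted_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized indicator values; weights are assumed to sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    return sum(weights[name] * values[name] for name in weights)


def comprehensive_stability(interface_values: Dict[str, float], first_weights: Dict[str, float],
                            content_values: Dict[str, float], second_weights: Dict[str, float],
                            third_weight: float, fourth_weight: float) -> Tuple[float, float, float]:
    target_interface = weighted_score(interface_values, first_weights)  # first weights
    target_content = weighted_score(content_values, second_weights)     # second weights
    # Third and fourth weights combine the two target parameters into one score.
    combined = weighted_score({"interface": target_interface, "content": target_content},
                              {"interface": third_weight, "content": fourth_weight})
    return target_interface, target_content, combined


target_i, target_c, score = comprehensive_stability(
    {"call_success_rate": 0.98, "timeout_score": 0.90},  # timeout_score = 1 - timeout_rate
    {"call_success_rate": 0.6, "timeout_score": 0.4},
    {"content_consistency_rate": 0.90, "hallucination_free_rate": 0.95},
    {"content_consistency_rate": 0.5, "hallucination_free_rate": 0.5},
    third_weight=0.6, fourth_weight=0.4)
print(round(target_i, 4), round(target_c, 4), round(score, 4))
```

With these example numbers the target interface parameter is 0.6 × 0.98 + 0.4 × 0.90 = 0.948. The target content parameter is 0.925, and the comprehensive parameter is 0.6 × 0.948 + 0.4 × 0.925 = 0.9388.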

3. The interface stability determination method according to claim 1, wherein the interface stability indicators include at least a call success rate, an average response time, a time fluctuation rate, an exception code ratio, and a timeout rate; and the content stability indicators include at least a content consistency rate, a hallucination-free rate, and a logical inconsistency rate.
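The claim names the indicators but does not define them. The formulas below are therefore one possible set of definitions, offered as assumptions rather than the applicant's:

- The average response time is taken over successful calls.
- The time fluctuation rate is the coefficient of variation (population standard deviation divided by mean) of those latencies.
- The exception code ratio is the share of calls that returned a non-2xx code.
- Content consistency is the share of successful outputs matching the most common output.

Hallucination and logical contradiction are assumed to come from a hypothetical external checker, represented here as boolean flags on each record.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional


@dataclass
class Record:
    status: str                  # "success", "error" or "timeout" (assumed vocabulary)
    status_code: Optional[int]   # None when the call timed out
    latency_ms: float
    output: Optional[str] = None
    hallucinated: bool = False   # assumed flag from a hypothetical fact checker
    contradictory: bool = False  # assumed flag from a hypothetical logic checker


def interface_indicators(records: List[Record]) -> Dict[str, float]:
    n = len(records)
    ok = [r for r in records if r.status == "success"]
    latencies = [r.latency_ms for r in ok]
    avg = mean(latencies) if latencies else 0.0
    return {
        "call_success_rate": len(ok) / n,
        "average_response_time_ms": avg,
        # Coefficient of variation: one possible reading of "time fluctuation rate".
        "time_fluctuation_rate": pstdev(latencies) / avg if avg else 0.0,
        "exception_code_ratio": sum(1 for r in records if r.status_code is not None
                                    and not 200 <= r.status_code < 300) / n,
        "timeout_rate": sum(1 for r in records if r.status == "timeout") / n,
    }


def content_indicators(records: List[Record]) -> Dict[str, float]:
    outs = [r for r in records if r.status == "success"]
    if not outs:
        return {"content_consistency_rate": 0.0, "hallucination_free_rate": 0.0,
                "logical_inconsistency_rate": 0.0}
    m = len(outs)
    modal_count = Counter(r.output for r in outs).most_common(1)[0][1]
    return {
        "content_consistency_rate": modal_count / m,
        "hallucination_free_rate": sum(not r.hallucinated for r in outs) / m,
        "logical_inconsistency_rate": sum(r.contradictory for r in outs) / m,
    }


records = [
    Record("success", 200, 100.0, "A"),
    Record("success", 200, 300.0, "A", hallucinated=True),
    Record("error", 500, 50.0),
    Record("timeout", None, 1000.0),
    Record("success", 200, 200.0, "B", contradictory=True),
]
iface = interface_indicators(records)
content = content_indicators(records)
print(iface)
print(content)
```

Exact-match consistency is the simplest choice. A semantic-similarity comparison between outputs would be a natural substitute for free-text answers, but that is a different technique from the one shown here.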

4. The interface stability determination method according to any one of claims 1 to 3, characterized in that, after determining the comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators, the method further comprises: extracting dynamic change features from the call data, wherein the dynamic change features include at least one of the following: an indicator fluctuation feature, an indicator change trend feature, an abnormal mutation feature, and an output drift feature; the indicator fluctuation feature characterizes the degree of dispersion of an indicator around its trend line across the multiple rounds of calls; the indicator change trend feature characterizes the trend of an indicator across the multiple rounds of calls; the abnormal mutation feature characterizes whether an indicator undergoes an abnormal mutation exceeding a preset threshold across the multiple rounds of calls; and the output drift feature characterizes whether the output continuously deviates from a preset benchmark level; determining a dynamic change correction value based on the dynamic change features; and correcting the comprehensive stability parameter based on the dynamic change correction value to obtain a corrected comprehensive stability parameter.
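The four dynamic change features of claim 4 can be computed from a per-round series of one indicator, for example the success rate of each round. The sketch below rests on several assumptions that go beyond the claim:

- The trend line is an ordinary least-squares line.
- Fluctuation is the population standard deviation of the residuals around that line.
- A mutation is a jump between consecutive rounds larger than `jump_threshold`.
- Drift means the last `drift_window` rounds all deviate from `baseline` by more than `drift_tolerance`.

All names and parameters are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple


def fit_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of values against round index 0..n-1."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def extract_dynamic_features(series: Sequence[float], jump_threshold: float,
                             baseline: float, drift_tolerance: float,
                             drift_window: int) -> Dict[str, object]:
    slope, intercept = fit_trend(series)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(series)]
    tail = list(series[-drift_window:])
    return {
        # Dispersion around the trend line, not around the plain mean.
        "fluctuation": pstdev(residuals) if len(residuals) > 1 else 0.0,
        "trend_slope": slope,
        "abrupt_change": any(abs(b - a) > jump_threshold for a, b in zip(series, series[1:])),
        # Drift requires a sustained deviation over the whole window.
        "output_drift": len(tail) == drift_window
                        and all(abs(v - baseline) > drift_tolerance for v in tail),
    }


stable = extract_dynamic_features([0.9, 0.9, 0.9, 0.9], 0.2, 0.9, 0.1, 2)
degrading = extract_dynamic_features([0.95, 0.9, 0.6, 0.55], 0.2, 0.9, 0.1, 2)
print(stable)
print(degrading)
```

For the degrading series the fitted slope is -0.15 per round. The 0.3 drop between rounds 2 and 3 counts as a mutation, and the last two rounds both sit more than 0.1 below the baseline, so drift is flagged.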

5. The interface stability determination method according to claim 4, characterized in that correcting the comprehensive stability parameter based on the dynamic change correction value to obtain the corrected comprehensive stability parameter comprises: when the dynamic change features indicate that the interface exhibits at least one of increased indicator fluctuation, a deteriorating indicator trend, an abnormal indicator mutation, or output drift, determining a dynamic change correction value greater than zero, wherein the dynamic change correction value increases with the degree of abnormality indicated by the dynamic change features; and obtaining the corrected comprehensive stability parameter by subtracting the dynamic change correction value from the comprehensive stability parameter.
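Claim 5 fixes only three properties of the correction: the value is positive when any abnormality is present, it grows with the degree of abnormality, and it is subtracted from the comprehensive parameter. The additive penalty below is one assumed way to satisfy those properties. The indicator is assumed to be higher-is-better, so only a negative slope counts as deterioration. The coefficient values and the function names `dynamic_correction` and `corrected_stability` are hypothetical.

```python
from typing import Dict


def dynamic_correction(features: Dict[str, object], fluctuation_weight: float = 1.0,
                       trend_weight: float = 1.0, mutation_penalty: float = 0.05,
                       drift_penalty: float = 0.05) -> float:
    """Non-negative correction that grows with each kind of abnormality (assumed form)."""
    delta = fluctuation_weight * float(features["fluctuation"])
    # Only a falling trend is penalized, and more steeply falling trends cost more.
    delta += trend_weight * max(0.0, -float(features["trend_slope"]))
    if features["abrupt_change"]:
        delta += mutation_penalty
    if features["output_drift"]:
        delta += drift_penalty
    return delta


def corrected_stability(score: float, features: Dict[str, object], **coefficients) -> float:
    """Corrected comprehensive parameter = comprehensive parameter - correction value."""
    return score - dynamic_correction(features, **coefficients)


abnormal = {"fluctuation": 0.02, "trend_slope": -0.05, "abrupt_change": True, "output_drift": False}
normal = {"fluctuation": 0.0, "trend_slope": 0.01, "abrupt_change": False, "output_drift": False}
print(round(corrected_stability(0.94, abnormal), 4))  # 0.94 - (0.02 + 0.05 + 0.05)
print(corrected_stability(0.94, normal))              # no abnormality, so no correction
```

A rising trend contributes nothing, and a stable interface keeps its original score. The correction therefore only ever lowers the comprehensive parameter, as the claim requires.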

6. The interface stability determination method according to claim 1, characterized in that, after determining the comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators, the method further comprises: determining a stability level based on the comprehensive stability parameter; and generating a stability assessment report based on the interface stability indicators, the content stability indicators, and the stability level, wherein the stability assessment report includes at least interface adjustment suggestions.
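The claim does not give level thresholds, a report format, or rules for producing suggestions. The sketch below fills those in with plainly hypothetical choices: bands at 0.90 and 0.75, a dictionary report, and three example rules keyed on indicator names. It only shows how a level and interface adjustment suggestions could be derived from the indicators and the comprehensive parameter.

```python
from typing import Dict, List

# Assumed level bands, checked from highest to lowest.
LEVEL_BANDS = ((0.90, "high"), (0.75, "medium"))


def stability_level(score: float) -> str:
    for threshold, label in LEVEL_BANDS:
        if score >= threshold:
            return label
    return "low"


def build_report(interface_ind: Dict[str, float], content_ind: Dict[str, float],
                 score: float) -> Dict[str, object]:
    suggestions: List[str] = []
    # Hypothetical rules mapping weak indicators to interface adjustment suggestions.
    if interface_ind.get("timeout_rate", 0.0) > 0.05:
        suggestions.append("Review interface timeout settings and retry policy.")
    if interface_ind.get("call_success_rate", 1.0) < 0.95:
        suggestions.append("Investigate failing calls grouped by exception code.")
    if content_ind.get("content_consistency_rate", 1.0) < 0.80:
        suggestions.append("Review generation parameters that affect output consistency.")
    return {
        "interface_indicators": interface_ind,
        "content_indicators": content_ind,
        "comprehensive_stability": score,
        "stability_level": stability_level(score),
        "interface_adjustment_suggestions": suggestions,
    }


report = build_report({"timeout_rate": 0.10, "call_success_rate": 0.97},
                      {"content_consistency_rate": 0.90}, 0.82)
print(report["stability_level"], report["interface_adjustment_suggestions"])
```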

7. The interface stability determination method according to claim 1, characterized in that the multi-round call strategy is any one of the following: a repeated call strategy with identical input, a continuous session call strategy with context inheritance, a perturbation call strategy, and a concurrent call strategy.
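The four multi-round call strategies of claim 7 differ mainly in how inputs are generated and issued. The helpers below are a hypothetical sketch of each one:

- Repeated inputs send the same prompt several times.
- Perturbation applies small assumed edits: case, whitespace, and punctuation spacing.
- The session runner passes the accumulated history, including earlier replies, into every call.
- The concurrent runner issues identical calls through a `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def repeated_inputs(prompt: str, n: int) -> List[str]:
    """Repeated call strategy: the identical input n times."""
    return [prompt] * n


# Assumed perturbations: small surface edits that should not change the meaning.
PERTURBATIONS: List[Callable[[str], str]] = [
    lambda s: s,                     # unchanged baseline
    lambda s: s.lower(),             # casing change
    lambda s: "  " + s,              # leading whitespace
    lambda s: s.replace("?", " ?"),  # punctuation spacing
]


def perturbed_inputs(prompt: str, n: int) -> List[str]:
    """Perturbation call strategy: cycle through the perturbations above."""
    return [PERTURBATIONS[i % len(PERTURBATIONS)](prompt) for i in range(n)]


def run_session(call_fn: Callable[[List[str]], str], turns: List[str]) -> List[str]:
    """Continuous session strategy: each call inherits all earlier turns and replies."""
    history: List[str] = []
    replies: List[str] = []
    for turn in turns:
        reply = call_fn(history + [turn])
        history += [turn, reply]
        replies.append(reply)
    return replies


def run_concurrent(call_fn: Callable[[str], str], prompt: str, n: int, workers: int) -> List[str]:
    """Concurrent strategy: n identical calls issued from a thread pool (results in input order)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_fn, [prompt] * n))


print(perturbed_inputs("What is X?", 4))
print(run_session(lambda msgs: f"reply{len(msgs)}", ["a", "b"]))
print(run_concurrent(str.upper, "q", 4, 2))
```

Each helper only produces or issues calls. Recording status and latency for every call would still be handled by a collection step like the one in claim 1.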

8. An interface stability determination apparatus, characterized by comprising: an execution status and response data acquisition module, configured to make, based on a multi-round call strategy, a preset number of calls to a service interface to be evaluated, and to obtain the execution status and response data corresponding to each call; an interface stability indicator determination module, configured to determine at least two interface stability indicators based on the execution status and the response data; a content stability indicator determination module, configured to determine, based on the execution status and the response data, the output content of successful calls, and to determine at least two content stability indicators based on the output content; and a comprehensive stability parameter determination module, configured to determine the comprehensive stability parameter based on the parameter values corresponding to the interface stability indicators and the parameter values corresponding to the content stability indicators.
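The four modules of claim 8 can be pictured as pluggable callables wired into one pipeline. The class name `StabilityApparatus`, its field names, the `(status, output)` record shape, and the equal-weight combination are all hypothetical. They are chosen only to show the data flow between the modules, not how the applicant structures the apparatus.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Indicators = Dict[str, float]


@dataclass
class StabilityApparatus:
    acquisition_module: Callable[[], List[Any]]                   # execution status and response data
    interface_module: Callable[[List[Any]], Indicators]           # interface stability indicators
    content_module: Callable[[List[Any]], Indicators]             # content stability indicators
    comprehensive_module: Callable[[Indicators, Indicators], float]

    def evaluate(self) -> Tuple[Indicators, Indicators, float]:
        records = self.acquisition_module()
        interface_ind = self.interface_module(records)
        content_ind = self.content_module(records)
        return interface_ind, content_ind, self.comprehensive_module(interface_ind, content_ind)


def interface_module(records: List[Tuple[str, Any]]) -> Indicators:
    return {"call_success_rate": sum(s == "success" for s, _ in records) / len(records)}


def content_module(records: List[Tuple[str, Any]]) -> Indicators:
    outs = [o for s, o in records if s == "success"]  # output content of successful calls
    return {"content_consistency_rate": max(outs.count(o) for o in outs) / len(outs)}


apparatus = StabilityApparatus(
    acquisition_module=lambda: [("success", "A"), ("error", None), ("success", "A"), ("success", "B")],
    interface_module=interface_module,
    content_module=content_module,
    comprehensive_module=lambda i, c: (i["call_success_rate"] + c["content_consistency_rate"]) / 2,
)
print(apparatus.evaluate())
```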

9. An interface stability determination device, characterized by comprising: a memory, configured to store a computer program; and a processor, configured to execute the computer program to implement the steps of the interface stability determination method according to any one of claims 1 to 7.

10. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the steps of the interface stability determination method according to any one of claims 1 to 7.