A method for analyzing the prompt word sensitivity of a large language model
An instance-level prompt word sensitivity analysis method that combines the prompt word sensitivity analysis index PSS with decoding confidence addresses the problem of accurately measuring how sensitive a large language model is to its prompt words. It supports finer-grained analysis and a deeper understanding of the model's underlying mechanisms, improving both the accuracy of the analysis and the guidance it provides.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI ARTIFICIAL INTELLIGENCE INNOVATION CENT
- Filing Date
- 2024-09-13
- Publication Date
- 2026-06-26
AI Technical Summary
Existing techniques cannot accurately measure how sensitive large language models are to prompt words. They also neglect real-world application scenarios and user needs, and do not investigate the underlying mechanisms of that sensitivity.
An instance-level prompt word sensitivity analysis method is used. Instance data are acquired, and the prompt word sensitivity analysis index PSS and the decoding confidence are calculated, so that the prompt word sensitivity of a large language model is evaluated comprehensively and its underlying mechanism is revealed.
The method improves the accuracy of prompt word sensitivity analysis for large language models, covers both objective and subjective evaluation, and provides guidance for model development.
Figure CN119474254B_ABST
Description
Technical Field
[0001] This invention relates to the technical field of prompt word sensitivity analysis for large language models, and in particular to a prompt word sensitivity analysis method for large language models.
Background Technology
[0002] Large language models are highly sensitive to subtle changes in prompt words. Even minor modifications, such as adding a few spaces, can significantly alter the performance of a large language model. This sensitivity makes it difficult for developers of large language models to determine whether changes in test set scores are due to performance improvements or prompt word selection when evaluating the model. Furthermore, users often need to fine-tune the prompt words multiple times to obtain higher-quality output.
[0003] Traditional prompt word sensitivity analysis techniques work primarily at the dataset level, focusing on the overall performance trend under different prompt word formats. For example, the PromptBench benchmark [1] evaluates the performance of several mainstream large language models under different prompt words. Pezeshkpour et al. [2] found that even small grammatical or lexical changes can lead to a significant drop in the performance of large language models. These studies reveal the high sensitivity of large language models to prompt words, laying the foundation for further understanding and mitigation of this problem.
[0004] However, existing methods primarily analyze prompt word sensitivity at the dataset level, often based on score variations across different prompt word templates on the same dataset. This evaluation method is inaccurate and fails to reflect the model's true prompt word sensitivity. Furthermore, existing analyses of large language models do not involve subjective evaluation and lack consideration for practical application scenarios and user needs. Existing work also fails to investigate the root causes of prompt word sensitivity in large language models, limiting our understanding of its underlying mechanisms.
Summary of the Invention
[0005] The purpose of this invention is to provide a method for analyzing the sensitivity of prompt words in large language models in order to improve the accuracy of prompt word sensitivity analysis.
[0006] The objective of this invention can be achieved through the following technical solutions:
[0007] A method for analyzing the prompt word sensitivity of a large language model, comprising the following steps:
[0008] S1. Obtain instance data comprising multiple instances, each instance corresponding to a set of prompt words that includes prompt words for multiple-choice questions. Input the instance data into the large language model under test, and calculate the prompt word sensitivity analysis index PSS from the model's answers.
[0009] S2. Input the multiple-choice prompt words in the instance data into the large language model under test. For each multiple-choice prompt word, calculate the decoding confidence from the probability of the token that the model predicts with the highest probability. Then calculate the average decoding confidence from the decoding confidences of the multiple-choice prompt words of the multiple instances.
[0010] S3. Analyze the overall prompt word sensitivity of the large language model under test based on the prompt word sensitivity analysis index PSS and the average decoding confidence.
[0011] Furthermore, the prompt word sensitivity analysis index PSS is the average sensitivity of all instances in the instance data.
[0012] Furthermore, the sensitivity is:
[0013] $$S = \frac{1}{C(|P|,2)} \sum_{i<j} \left| Y(p_i) - Y(p_j) \right|$$
[0014] where S is the sensitivity of an instance, Y(p) represents the performance metric under prompt word p, C(|P|,2) represents the number of prompt word pairs in the same instance, and i and j are the indices of two different prompt words.
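By way of non-limiting illustration, the instance sensitivity can be worked through by hand, reading it as the mean absolute difference of Y(p) over all prompt word pairs of the instance. The three prompt words and their correctness values below are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
% Hypothetical instance: |P| = 3, with Y(p_1) = 1, Y(p_2) = 0, Y(p_3) = 1
\begin{aligned}
C(|P|,2) &= \binom{3}{2} = 3 \\
S &= \frac{|Y(p_1)-Y(p_2)| + |Y(p_1)-Y(p_3)| + |Y(p_2)-Y(p_3)|}{C(|P|,2)} \\
  &= \frac{|1-0| + |1-1| + |0-1|}{3} = \frac{2}{3}
\end{aligned}
```

Two of the three prompt word pairs disagree on correctness, so the sensitivity of this instance is 2/3. An instance answered identically under every prompt word would have sensitivity 0.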
[0015] Furthermore, the performance metric under prompt word p includes the correctness of the large language model's answer when a true value is given in the instance data.
[0016] Furthermore, the correctness of the large language model's answer is the similarity between the answer and the given true value.
[0017] Furthermore, the performance metric under prompt word p includes the score of the large language model's answer when no true value is given in the instance data.
[0018] Furthermore, the decoding confidence level corresponding to an instance is:
[0019] $$\mathrm{Confidence} = \frac{1}{|P|} \sum_{p \in P} \mathrm{Probability}(t_{next} \mid p)$$
[0020] where Probability(t_next | p) represents the probability of the token that the model predicts with the highest probability under prompt word p in the prompt word set P.
[0021] Furthermore, in the process of analyzing the overall prompt word sensitivity of the large language model under test, in objective evaluation the sensitivity analysis index PSS represents the probability of inconsistency in correctness between any two prompt words in the same instance.
[0022] Furthermore, in the process of analyzing the overall prompt word sensitivity of the large language model under test, in subjective evaluation the sensitivity analysis index PSS represents the difference in average response quality between two prompt words in the same instance.
[0023] Furthermore, the average decoding confidence is positively correlated with the sensitivity of the prompt words.
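By way of non-limiting illustration, the relationship between average decoding confidence and prompt word sensitivity could be checked across several models with a correlation coefficient. The source does not name a statistic, so the use of the Pearson coefficient here is an assumption, and the numbers are placeholders rather than measurements.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only, one entry per model; NOT measurements from the source.
avg_confidence = [0.60, 0.70, 0.80, 0.90]
pss_values = [0.10, 0.15, 0.20, 0.25]

r = pearson(avg_confidence, pss_values)
print(f"Pearson r = {r:.3f}")  # r > 0 indicates a positive correlation
```

A coefficient near zero would indicate no linear relationship, so the sign and magnitude of r should be read together with the number of models compared.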
[0024] Compared with the prior art, the present invention has the following beneficial effects:
[0025] This invention provides a prompt word sensitivity metric, PSS, for measuring the prompt word sensitivity of large language models at the instance level. This enables a more comprehensive and detailed analysis of the prompt word sensitivity of large language models, covering both objective and subjective evaluation. Furthermore, the invention analyzes the mechanism of prompt word sensitivity in large language models and proposes a methodology for analyzing prompt word sensitivity from the perspective of decoding confidence, thereby improving the accuracy of prompt word sensitivity analysis.
Attached Figure Description
[0026] Figure 1 is a flowchart of the present invention;
[0027] Figure 2 is a graph showing the experimental results on differences in prompt word sensitivity in the present invention.
Detailed Implementation
[0028] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. These embodiments are implemented based on the technical solution of the present invention, providing detailed implementation methods and specific operating procedures. However, the scope of protection of the present invention is not limited to the following embodiments.
[0029] This invention introduces the ProSA framework, which focuses on instance-level analysis, and proposes a method for analyzing the prompt word sensitivity of large language models. As shown in Figure 1, the method quantifies the average difference in an LLM's responses to different prompt word variants of the same instance and uses decoding confidence to reveal the underlying mechanisms of prompt word sensitivity. ProSA includes a novel sensitivity metric, PSS, and can analyze the prompt word sensitivity of large language models in both objective and subjective evaluation. By using decoding confidence to explore the root causes of prompt word sensitivity, the framework aims to provide guidance for developing more robust and user-responsive large language models.
[0030] The method of the present invention includes the following steps:
[0031] S1. Obtain instance data comprising multiple instances, each instance corresponding to a set of prompt words that includes prompt words for multiple-choice questions. Input the instance data into the large language model under test, and calculate the prompt word sensitivity analysis index PSS from the model's answers.
[0032] S2. Input the multiple-choice prompt words in the instance data into the large language model under test. For each multiple-choice prompt word, calculate the decoding confidence from the probability of the token that the model predicts with the highest probability. Then calculate the average decoding confidence from the decoding confidences of the multiple-choice prompt words of the multiple instances.
[0033] S3. Analyze the overall prompt word sensitivity of the large language model under test based on the prompt word sensitivity analysis index PSS and the average decoding confidence.
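By way of non-limiting illustration, steps S1 to S3 can be sketched end to end as follows. The `Instance` type, the `Model` callable interface, and `toy_model` are hypothetical stand-ins introduced for this sketch. Exact-match correctness and the averaging of the top-token probability over the prompt word set are assumptions about how the steps are realized.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass
class Instance:
    prompts: List[str]  # prompt word set P: variants of one multiple-choice question
    truth: str          # true value, e.g. the correct option letter


# Hypothetical model interface: prompt -> (answer, probability of the top token)
Model = Callable[[str], Tuple[str, float]]


def analyze(instances: List[Instance], model: Model) -> Tuple[float, float]:
    """Return (PSS, average decoding confidence) for the model under test."""
    sensitivities, confidences = [], []
    for inst in instances:
        outputs = [model(p) for p in inst.prompts]
        # S1: Y(p) is 1.0 when the answer matches the true value, else 0.0
        y = [1.0 if answer == inst.truth else 0.0 for answer, _ in outputs]
        pairs = list(combinations(y, 2))
        sensitivities.append(sum(abs(a - b) for a, b in pairs) / len(pairs))
        # S2: decoding confidence, taken here as the mean top-token probability over P
        confidences.append(sum(prob for _, prob in outputs) / len(outputs))
    # S3: both quantities are reported together for the overall analysis
    return (sum(sensitivities) / len(sensitivities),
            sum(confidences) / len(confidences))


def toy_model(prompt: str) -> Tuple[str, float]:
    # Stand-in for a real LLM call: it answers wrongly when the prompt ends in spaces
    return ("A", 0.5) if prompt.endswith("  ") else ("B", 0.9)


data = [
    Instance(["Q1? Answer:", "Q1? Answer:  ", "Question: Q1?"], "B"),
    Instance(["Q2? Answer:", "Please answer. Q2?", "Question: Q2?"], "B"),
]
pss, avg_conf = analyze(data, toy_model)
print(f"PSS = {pss:.4f}, average decoding confidence = {avg_conf:.4f}")
```

In practice the `model` callable would wrap an inference call that returns both the decoded answer and the probability of the first generated token.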
[0034] This invention designs an instance-level prompt word sensitivity analysis index (PSS), defined as follows:
[0035] $$S = \frac{1}{C(|P|,2)} \sum_{i<j} \left| Y(p_i) - Y(p_j) \right|, \qquad \mathrm{PSS} = \frac{1}{N} \sum_{n=1}^{N} S_n$$
[0036] For a given instance, Y(p) represents the performance metric for prompt word p within the prompt word set P. For instances with a given true value, Y(p) refers to the correctness of the large language model's answer. For tasks without a clear true value, the answer typically has a score representing the quality of generation, and Y(p) refers to that score, in the range [0, 1]. |Y(p_i) - Y(p_j)| represents the absolute difference in the performance metric between prompt word p_i and prompt word p_j. C(|P|, 2) represents the number of prompt word pairs in the same instance. S_n is the sensitivity S of the n-th instance, and PSS is the average of this sensitivity over all N instances in the same dataset.
[0037] Due to differences in task type and evaluation method, PSS has different meanings in objective and subjective evaluation. In objective evaluation, PSS represents the probability of inconsistency in correctness between any two prompts in the same instance. In subjective evaluation, PSS represents the difference in average response quality between two prompts in the same instance.
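By way of non-limiting illustration, the following sketch computes PSS for an objective and a subjective setting, reading PSS as the average over instances of the mean absolute pairwise difference of Y(p). The function names and all score values are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def instance_sensitivity(scores: Sequence[float]) -> float:
    """Mean absolute difference of Y(p) over all unordered prompt word pairs."""
    pairs = list(combinations(scores, 2))
    if not pairs:
        raise ValueError("an instance needs at least two prompt words")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def pss(score_table: Sequence[Sequence[float]]) -> float:
    """Average the per-instance sensitivity over all instances."""
    return sum(instance_sensitivity(row) for row in score_table) / len(score_table)


# Hypothetical objective results: 1.0 = correct, 0.0 = wrong, four prompt words each.
objective = [
    [1.0, 1.0, 1.0, 1.0],  # no pair disagrees
    [1.0, 0.0, 1.0, 0.0],  # 4 of 6 pairs disagree
    [0.0, 0.0, 0.0, 1.0],  # 3 of 6 pairs disagree
]

# Hypothetical subjective results: quality scores in [0, 1], three prompt words each.
subjective = [
    [0.8, 0.6, 0.7],
    [0.5, 0.5, 0.5],
]

objective_pss = pss(objective)
subjective_pss = pss(subjective)
print(f"objective PSS:  {objective_pss:.4f}")
print(f"subjective PSS: {subjective_pss:.4f}")
```

With binary correctness each pair contributes either 0 or 1, so the objective value equals the fraction of prompt word pairs whose correctness disagrees, averaged over instances. With graded scores the same computation yields an average quality gap.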
[0038] Compared to statistical analysis at the dataset level, PSS provides a more accurate and intuitive representation of prompt word sensitivity.
[0039] Furthermore, this invention uses token probabilities to calculate the decoding confidence of the large language model in a multiple-choice setting. The decoding confidence is defined as follows for one instance:
[0040] $$\mathrm{Confidence} = \frac{1}{|P|} \sum_{p \in P} \mathrm{Probability}(t_{next} \mid p)$$
[0041] Here, Probability(t_next | p) represents the probability of the token that the model predicts with the highest probability under prompt word p in the prompt word set P. The average decoding confidence of a large language model is the average of this decoding confidence over the different instances. The average decoding confidence in this invention is positively correlated with the prompt word sensitivity.
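By way of non-limiting illustration, the decoding confidence can be sketched from next-token logits. The averaging over the prompt word set P is an assumption about how the per-prompt probabilities are aggregated, and the four-entry logit vectors are a hypothetical toy vocabulary rather than real model output.

```python
import math
from typing import Sequence


def top_token_probability(logits: Sequence[float]) -> float:
    """Softmax the next-token logits and return the largest probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    return max(exps) / sum(exps)


def instance_confidence(logits_per_prompt: Sequence[Sequence[float]]) -> float:
    """Mean top-token probability over the prompt words of one instance (assumed)."""
    probs = [top_token_probability(logits) for logits in logits_per_prompt]
    return sum(probs) / len(probs)


def average_confidence(instances: Sequence[Sequence[Sequence[float]]]) -> float:
    """Average the instance-level decoding confidence over all instances."""
    return sum(instance_confidence(inst) for inst in instances) / len(instances)


# Hypothetical logits: two instances, two prompt words each, toy vocabulary of 4 tokens.
undecided = [0.0, 0.0, 0.0, 0.0]           # uniform, so the top probability is 0.25
leaning = [math.log(3.0), 0.0, 0.0, 0.0]   # top probability is 3 / (3 + 1 + 1 + 1) = 0.5
instances = [
    [undecided, leaning],
    [leaning, leaning],
]
avg = average_confidence(instances)
print(f"average decoding confidence = {avg:.4f}")
```

With a real model, the logits would be taken at the first generated position for each multiple-choice prompt word.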
[0042] The key points of this invention, a framework for analyzing the prompt word sensitivity of large language models, are the following two:
[0043] (1) This invention provides a metric, PSS, for measuring the prompt word sensitivity of a large language model at the instance level.
[0044] This enables a more comprehensive and detailed analysis of the prompt word sensitivity of large language models.
[0045] (2) This invention is the first to analyze the mechanism of prompt word sensitivity in large language models, and proposes a methodology for analyzing prompt word sensitivity from the perspective of the model's decoding confidence.
[0046] The present invention has the following advantages:
[0047] (1) Compared with dataset-level prompt word sensitivity indices, the instance-level index PSS of the present invention characterizes the prompt word sensitivity of large language models more comprehensively and in finer detail.
[0048] (2) This invention is the first to analyze the prompt word sensitivity of a large language model from the perspective of interpretability. Experiments were conducted on four objective datasets and two subjective datasets.
[0049] First, on four datasets (CommonsenseQA [3], ARC-Challenge [4], MATH [5], and HumanEval [6]), we used eight models from the InternLM2 series [7], the Llama3 series [8], the Qwen1.5 series [9], and Mistral-7B [10] to calculate the PSS for 12 prompt words. The experimental results show significant differences in prompt word sensitivity among the models, as shown in Figure 2.
[0050] In addition, we conducted experiments with three sets of prompt words and five large language models on two subjective datasets: LC AlpacaEval 2.0 [11] and Arena Hard Auto [12]. The results are shown in Table 1.
[0051] Table 1 Experimental Results
[0052]
Model                  LC AlpacaEval 2.0    Arena Hard Auto
InternLM2-20B-chat     0.022                0.249
Llama3-8B-instruct     0.013                0.266
Llama3-70B-instruct    0.016                0.258
Qwen1.5-14B-chat       0.022                0.249
Qwen1.5-72B-chat       0.036                0.250
[0053] It is evident that different models exhibit varying prompt word sensitivities on different subjective benchmarks, with every model showing greater prompt word sensitivity on the Arena Hard Auto benchmark than on LC AlpacaEval 2.0.
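By way of non-limiting illustration, the comparison between the two benchmarks can be made concrete by averaging the Table 1 columns. The values are copied from Table 1. Reading them as PSS values is an assumption, since the table header does not name the metric.

```python
# Table 1 values: model -> (LC AlpacaEval 2.0, Arena Hard Auto)
table_1 = {
    "InternLM2-20B-chat": (0.022, 0.249),
    "Llama3-8B-instruct": (0.013, 0.266),
    "Llama3-70B-instruct": (0.016, 0.258),
    "Qwen1.5-14B-chat": (0.022, 0.249),
    "Qwen1.5-72B-chat": (0.036, 0.250),
}

lc_values = [lc for lc, _ in table_1.values()]
arena_values = [arena for _, arena in table_1.values()]

mean_lc = sum(lc_values) / len(lc_values)
mean_arena = sum(arena_values) / len(arena_values)

# Check, model by model, whether the Arena Hard Auto value is the larger one
arena_always_higher = all(arena > lc for lc, arena in table_1.values())

print(f"mean LC AlpacaEval 2.0: {mean_lc:.4f}")
print(f"mean Arena Hard Auto:   {mean_arena:.4f}")
print(f"Arena Hard Auto higher for every model: {arena_always_higher}")
```

The column means are 0.0218 and 0.2544, a gap of roughly an order of magnitude.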
[0054] The references cited in the above experiment are as follows:
[0055] [1] PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
[0056] [2] Large Language Models Sensitivity to the Order of Options in Multiple-Choice Questions
[0057] [3] CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
[0058] [4] Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
[5] Measuring Mathematical Problem Solving with the MATH Dataset
[0059] [6] Evaluating Large Language Models Trained on Code
[0060] [7] InternLM2 Technical Report
[0061] [8] Llama 3 Model Card
[0062] [9] Qwen Technical Report
[0063] [10] Mistral 7B
[0064] [11] Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
[12] From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline
[0065] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.
Claims
1. A method for analyzing the prompt word sensitivity of a large language model, characterized in that the method includes the following steps:
S1. Obtain instance data comprising multiple instances, each instance corresponding to a set of prompt words that includes prompt words for multiple-choice questions. Input the instance data into the large language model under test, and calculate the prompt word sensitivity analysis index PSS from the model's answers.
S2. Input the multiple-choice prompt words in the instance data into the large language model under test. For each multiple-choice prompt word, calculate the decoding confidence from the probability of the token that the model predicts with the highest probability. Then calculate the average decoding confidence from the decoding confidences of the multiple-choice prompt words of the multiple instances.
S3. Analyze the overall prompt word sensitivity of the large language model under test based on the prompt word sensitivity analysis index PSS and the average decoding confidence;
The prompt word sensitivity analysis index PSS is the average sensitivity of all instances in the instance data;
The sensitivity of an instance is: $$S = \frac{1}{C(|P|,2)} \sum_{i<j} \left| Y(p_i) - Y(p_j) \right|$$ where S is the sensitivity of a single instance, Y(p) represents the performance metric under prompt word p, C(|P|,2) represents the number of prompt word pairs in the same instance, and i and j are the indices of two different prompt words;
The decoding confidence of an instance is: $$\mathrm{Confidence} = \frac{1}{|P|} \sum_{p \in P} \mathrm{Probability}(t_{next} \mid p)$$ where Probability(t_next | p) represents the probability of the token that the model predicts with the highest probability under prompt word p in the prompt word set P;
The average decoding confidence is positively correlated with the prompt word sensitivity.
2. The method for analyzing the prompt word sensitivity of a large language model according to claim 1, characterized in that the performance metric under prompt word p includes the correctness of the large language model's answer when a true value is given in the instance data.
3. The method for analyzing the prompt word sensitivity of a large language model according to claim 2, characterized in that the correctness of the large language model's answer is defined as the similarity between the large language model's answer and the given true value.
4. The method for analyzing the prompt word sensitivity of a large language model according to claim 1, characterized in that the performance metric under prompt word p includes the score of the large language model's answer when no true value is given in the instance data.
5. The method for analyzing the prompt word sensitivity of a large language model according to claim 1, characterized in that, in the process of analyzing the overall prompt word sensitivity of the large language model under test, in objective evaluation the sensitivity analysis index PSS represents the probability of inconsistency in correctness between any two prompt words in the same instance.
6. The method for analyzing the prompt word sensitivity of a large language model according to claim 1, characterized in that, in the process of analyzing the overall prompt word sensitivity of the large language model under test, in subjective evaluation the sensitivity analysis index PSS represents the difference in average response quality between two prompt words in the same instance.