Method, system and device for automatic extraction and formalization of security goals in security protocols

By constructing an expert-level labeled dataset and fine-tuning a large language model using a random negative downsampling strategy, an automated conversion from security protocol documents to formal security attribute descriptions was achieved. This solves the problem of low extraction and conversion efficiency in existing technologies and improves the accuracy and efficiency of security target extraction and analysis.

CN122242504APending Publication Date: 2026-06-19BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING UNIV OF POSTS & TELECOMM
Filing Date
2026-02-11
Publication Date
2026-06-19

Smart Images

  • Figure CN122242504A_ABST
    Figure CN122242504A_ABST
Patent Text Reader

Abstract

This application provides a method, system, and device for automatic extraction and formal transformation of security objectives in security protocols. The method includes: extracting security target statements from a target security protocol document based on a pre-defined domain-specific identification and extraction model; wherein the domain-specific identification and extraction model is obtained by fine-tuning a pre-defined basic large language model based on an expert-level labeled dataset; the expert-level labeled dataset includes initial security target candidate statements extracted from the security protocol document, and labels added by annotation experts based on literature to the initial security target candidate statements to indicate whether the initial security target candidate statements are security target statements; and performing formal transformation on the security target statements based on a pre-defined formal protocol model to generate formal security attribute description data of the targets. This application can improve the accuracy of security objective extraction and the efficiency of formal analysis and verification, while reducing system maintenance and re-verification costs.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, system, and device for automatic extraction and formal transformation of security targets in a security protocol. Background Technology

[0002] Formal verification, as a core standard for ensuring the security of encryption protocols, communication protocols, and distributed systems, occupies a crucial position in the field of modern information security. By using formal analysis tools such as Tamarin Prover (a security protocol verification tool based on multiset rewriting), ProVerif (an automated security protocol verification tool based on Pi calculus), or DeepSec (an automated verification tool based on Pi calculus for verifying the equivalence of security protocol traces), security experts can perform rigorous mathematical modeling of complex protocol interaction logic. This modeling process aims to use mathematical means to prove whether the protocol strictly satisfies specific security goals or properties under an attacker's model, such as confidentiality, authentication, forward security, and non-repudiation. When the analysis tool discovers potential vulnerabilities in the protocol logic, it outputs the corresponding attack path, thus intuitively demonstrating the triggering mechanism of the security flaw.

[0003] However, despite the increasing maturity of formal analysis tools, extracting and transforming security attribute descriptions usable by formal analysis tools from lengthy, multi-page protocol specifications described in natural language, such as IETF RFC documents, 3GPP mobile communication standards, and various complex industrial technical specifications, remains an extremely challenging area lacking effective automation methods. Currently, the primary approach still heavily relies on the manual labor of formal analysis experts. Experts must possess deep domain backgrounds, retrieve relevant security target statements from massive technical documents, and then manually transform them into the specific security attribute logical descriptions required by formal analysis tools based on their personal experience. This process is not only extremely time-consuming and labor-intensive, often resulting in weeks or even months of analyzing a single protocol, but it is also highly susceptible to human bias. Due to the high complexity and fragmented nature of long text specifications, experts inevitably make omissions, oversights, or misunderstandings during manual analysis. Such modeling biases caused by human factors can directly lead to the overlooking of critical vulnerabilities, resulting in false security conclusions. Another potential technical approach is to utilize existing natural language processing techniques to assist in generating security target statements and converting them into formal security attribute descriptions. However, existing general-purpose NLP techniques and basic large-model techniques struggle to locate security objective statements that are only mentioned briefly or described implicitly within protocol documents that often run to hundreds of pages. First, existing general-purpose NLP techniques cannot fully grasp the stringent logical requirements of formal analysis of security protocols. Due to their broad semantic understanding, they often output a large amount of descriptive statements that, while related to security concepts, lack modeling value. This redundant information constitutes significant noise for formal modeling. Second, existing automated frameworks generally lack large-scale, expert-annotated datasets and corresponding evaluation benchmarks for extracting protocol security objectives. This leaves cutting-edge large-model techniques without a foundation for evolution in the vertical field of formal analysis of security protocols, making it difficult to achieve substantial improvements in accuracy and logical consistency without high-quality negative feedback. Summary of the Invention

[0004] In view of this, embodiments of this application provide a method, system, and device for automatic extraction and formal transformation of security targets in security protocols, so as to eliminate or improve one or more defects existing in the prior art.

[0005] One aspect of this application provides a method for automatic extraction and formal transformation of security objectives in a security protocol, the method comprising the following steps: Based on a pre-defined domain-specific identification and extraction model, the target security protocol document is identified and extracted to obtain the corresponding security target statement. The domain-specific identification and extraction model is pre-tuned using a random negative downsampling strategy on a pre-constructed expert-level labeled dataset. The expert-level labeled dataset includes initial security target candidate statements extracted from the security protocol document, and labels added by annotation experts based on formal analysis of literature to the initial security target candidate statements, indicating whether the initial security target candidate statements are security target statements. Based on a preset formal protocol model, the security target statement is formally transformed to generate corresponding target formal security attribute description data.

[0006] In some embodiments of this application, the step of identifying and extracting the target security protocol document based on a preset domain-specific identification and extraction model to obtain the security target statement corresponding to the target security protocol document includes: The target security protocol document is semantically segmented to obtain multiple corresponding protocol text blocks; The protocol text blocks are input into a preset domain-specific identification and extraction model, so that the domain-specific identification and extraction model can perform identification and extraction processing on the protocol text blocks to obtain security target candidate statements; Semantic clustering and deduplication are performed on the candidate security target statements to obtain the security target statements corresponding to the target security protocol document.

[0007] In some embodiments of this application, the step of performing formal transformation processing on the security target statement based on a preset formal protocol model to generate corresponding target formal security attribute description data includes: Using the security target statement as a semantic query, a search is performed in a preset protocol document knowledge base to obtain target-related data associated with the security target statement; The target associated data is input into the formal protocol model so that the formal protocol model can perform a formal description of the target associated data to obtain the corresponding initial formal security attribute description data. The initial formal security attribute description data is subjected to syntax validation and logical aggregation to obtain the target formal security attribute description data.

[0008] In some embodiments of this application, the step of inputting the target associated data into the formal protocol model, so that the formal protocol model performs a formal description of the target associated data to obtain the corresponding initial formal security attribute description data, includes: The target associated data is input into the formal protocol model, so that the formal protocol model performs mandatory security attribute symbol variable matching mapping on the target associated data based on predefined legal symbols and variables, and obtains the initial formal security attribute description data using a preset security attribute classification and grading description template.

[0009] In some embodiments of this application, before the step of identifying and extracting the target security protocol document based on a preset domain-specific identification and extraction model to obtain the security target statement corresponding to the target security protocol document, the following steps are included: The acquired original security protocol document is input into the first large language model so that the first large language model can perform multiple rounds of extraction on the original security protocol document to generate initial security target candidate statements; Based on a pre-defined formal analysis of the literature, annotation experts are able to annotate the initial security target candidate statements to obtain the corresponding annotation result data. The annotation results are checked and standardized to obtain the expert-level annotation dataset.

[0010] In some embodiments of this application, the random negative downsampling strategy includes: using a specific proportion of protocol operational logic paragraphs as negative samples to perform decision boundary calibration on the basic large language model in order to obtain the domain-specific recognition and extraction model.

[0011] Another aspect of this application provides a system for automatic extraction and formal transformation of security objectives in a security protocol, the system comprising: The dataset building module is used to build expert-level labeled datasets; The model fine-tuning module is used to fine-tune the preset basic large language model based on the expert-level labeled dataset using a random negative downsampling strategy to obtain a domain-specific recognition and extraction model. The security target identification and extraction module is used to identify and extract target security protocol documents based on a preset domain-specific identification and extraction model, so as to obtain the security target statement corresponding to the target security protocol document. The security target formalization module is used to perform formal transformation processing on the security target statement based on a preset formal protocol model, and generate corresponding target formal security attribute description data.

[0012] A third aspect of this application provides an electronic device including a processor and a memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement a method for automatic extraction and formal transformation of security objectives in the security protocol.

[0013] A fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a method for automatically extracting and formally transforming security objectives in the security protocol.

[0014] The fifth aspect of this application provides a computer program product comprising a computer program that, when executed by a processor, implements a method for automatically extracting and formally transforming security objectives in the security protocol.

[0015] This application presents a method for automatic extraction and formal transformation of security objectives in security protocols. The method includes the following steps: Based on a pre-defined domain-specific identification and extraction model, the target security protocol document is identified and extracted to obtain the corresponding security objective statements. The domain-specific identification and extraction model is pre-tuned using a random negative downsampling strategy on a pre-constructed expert-level labeled dataset. The expert-level labeled dataset includes initial security objective candidate statements extracted from the security protocol document, and labels added by annotation experts based on formal analysis literature to the initial security objective candidate statements, indicating whether the initial security objective candidate statements are security objective statements. Based on a pre-defined formal protocol model, the security objective statements are formally transformed to generate corresponding target formal security attribute description data. This method can accurately eliminate non-security operational logic noise in specifications, improve the accuracy of security objective extraction, shorten the protocol security audit cycle, improve the efficiency of formal analysis and verification, solve the symbol naming illusion problem commonly encountered by neural networks when generating structured logic code, and reduce the cost of system maintenance and re-verification.

[0016] Additional advantages, objectives, and features of this application will be set forth in part in the description which follows, and will in part become apparent to those skilled in the art upon review of the following description, or may be learned by practice of the application. The objectives and other advantages of this application can be realized and obtained by means of the structures specifically pointed out in the specification and drawings.

[0017] Those skilled in the art will understand that the purposes and advantages that can be achieved with this application are not limited to those specifically described above, and that the above and other purposes that this application can achieve will be more clearly understood from the following detailed description. Attached Figure Description

[0018] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, do not constitute a limitation thereof. The components in the drawings are not drawn to scale but are merely for illustrating the principles of this application. For ease of illustration and description of certain parts of this application, corresponding portions in the drawings may be enlarged, i.e., may appear larger relative to other components in an exemplary device actually manufactured according to this application. In the drawings: Figure 1 This is a schematic diagram of the first process of the automatic extraction and formal transformation method of security targets in a security protocol in an embodiment of this application.

[0019] Figure 2 This is a schematic diagram of the second process of the method for automatic extraction and formal transformation of security targets in a security protocol in one embodiment of this application.

[0020] Figure 3 This is a schematic diagram of the third process of the automatic extraction and formal transformation method of security targets in the security protocol in one embodiment of this application.

[0021] Figure 4 This is a schematic diagram of a system for automatically extracting and formalizing security targets in a security protocol according to an embodiment of this application.

[0022] Figure 5 This is a schematic diagram of the first method for automatically extracting and formalizing security objectives in a security protocol, as illustrated in a specific example of this application.

[0023] Figure 6 This is a schematic diagram of the second method for automatically extracting and formalizing security objectives in a security protocol, as illustrated in a specific example of this application.

[0024] Figure 7 This is a schematic diagram of the third method for automatically extracting and formalizing security objectives in a security protocol, as illustrated in a specific example of this application.

[0025] Figure 8 This is a schematic diagram of the fourth method for automatically extracting and formalizing security objectives in a security protocol, as illustrated in a specific example of this application.

[0026] Figure 9 This is a schematic diagram of the fifth method for automatically extracting and formalizing security objectives in a security protocol, as illustrated in a specific example of this application. Detailed Implementation

[0027] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and their descriptions are used to explain this application, but are not intended to limit it.

[0028] It should also be noted that, in order to avoid obscuring this application with unnecessary details, only the structures and / or processing steps closely related to the solution according to this application are shown in the accompanying drawings, while other details that are not closely related to this application are omitted.

[0029] It should be emphasized that the term "including / comprises" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.

[0030] It should also be noted that, unless otherwise specified, the term "connection" in this article can refer not only to a direct connection, but also to an indirect connection involving an intermediary.

[0031] In the following description, embodiments of the present application will be illustrated with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar parts, or the same or similar steps.

[0032] It should be noted that this application differs fundamentally from conventional text summarization or code generation logic. The core of this application lies in utilizing a domain fine-tuning and precise security attribute mapping mechanism to extract security target statements required for formal analysis from protocol specifications, and to formalize them into security attribute descriptions that accurately match the variables in the formal protocol flow model. The inventors of this application innovatively conceived of automatic extraction and formal transformation of security targets for security protocols based on a large language model. This enables precise target identification of unstructured protocol standards and generates logically consistent formal security attribute descriptions. Unlike other verification aids based on general natural language processing or simple code generation, this application employs a cascaded processing framework, focusing on the consistency of mapping from natural language semantics to formal symbolic models, rather than simple text extraction. By constructing an expert-level labeled dataset and introducing a random negative downsampling training method, this application can solve the problem of false alarms caused by the extreme sparsity of security target statements in protocol documents. On the other hand, the transformation mechanism based on Retrieval Enhanced Generation (RAG) and formal protocol flow model constraints provided in this application ensures that the generated security attributes are strictly aligned with the underlying formal model in terms of variable naming and logical structure through a formal mapping method that consistently expresses security attribute variables. Combining domain fine-tuning models, RAG technology, and formal protocol flow models, a complete automated transformation process from natural language specifications to the security attribute descriptions required for formal verification is formed. Compared to other manual analysis or general extraction schemes, this application provides higher transformation accuracy, achieves symbolic consistency through closed-loop mapping without large-scale expert intervention, and can quickly perform secondary transformations by adjusting the index library or template when protocol versions iterate or verification requirements change. This application fills a gap in the field of automated security protocol modeling and provides strong support for developing a high-efficiency, highly reliable protocol security analysis system.

[0033] The following examples will provide a detailed description.

[0034] This application provides a method for automatically extracting and formally transforming security objectives in a security protocol. See [link to previous section]. Figure 1 The method includes the following steps: Step 100: Based on a preset domain-specific identification and extraction model, identify and extract the target security protocol document to obtain the security target statement corresponding to the target security protocol document; wherein, the domain-specific identification and extraction model is obtained by fine-tuning a preset basic large language model using a random negative downsampling strategy based on a pre-constructed expert-level annotation dataset; the expert-level annotation dataset includes initial security target candidate statements extracted from the security protocol document, and labels marked by annotation experts based on formal analysis of literature to indicate whether the initial security target candidate statement is a security target statement; In step 100, the target security protocol document can be the protocol specification of TLS 1.3 draft 22, the 5G-AKA protocol specification document, or the EDHOC protocol specification. The security target statement can be statements such as "After verifying message_3, the Responder is assured that the Initiator has calculated the key PRK_4x3m (explicit key confirmation) and that no other party than the Responder can compute the key." or "Compromise of the long-term keys does not enable apassive attacker to compromise future session keys." The basic large language model can be Qwen2.5-7b-instruction, Gemma2-9b-instruction, llama3.1-8b-instruction, or other pre-trained basic open-source large language models. The basic large language model is fine-tuned using the constructed expert-level labeled dataset to enable it to extract security target statements for protocol security analysis; that is, a domain-specific recognition and extraction model for extracting security target statements. The domain-specific identification and extraction model obtained after fine-tuning has significantly improved the accuracy of extracting security target statements compared with the general model, and can solve the problem of high false alarm rate in the identification and extraction stage of the general model.

[0035] Step 200: Based on the preset formal protocol model, perform formal transformation processing on the security target statement to generate corresponding target formal security attribute description data.

[0036] In step 200, the formal protocol model can be a model based on a formalization tool like ProVerif, or a predefined symbolic protocol flow interaction logic model for the target protocol. The target formal security attribute description data can be a structured description object (such as JSON), which includes predefined symbol fields such as entity identifiers (e.g., initiator and receiver), security attribute categories, and security strength parameters. Using the formal protocol flow model as symbolic constraints, semantic retrieval and symbolic anchoring logic replace ambiguous entities in natural language with legal and unique symbolic names in the protocol model.

[0037] As described above, the automatic extraction and formal transformation method for security targets in security protocols provided in this application constructs an expert-level labeled dataset through a fine-tuning strategy based on random negative downsampling. This accurately eliminates non-security operational logic noise in the specification, improves the accuracy of security target extraction, and avoids a large number of false targets caused by "keyword bias" in traditional large model schemes. This reduces protocol modeling defects caused by omissions or misinterpretations during manual auditing, effectively ensuring the accuracy of formal verification input and improving the reliability of security protocol auditing. A cascaded automated processing framework is adopted to achieve end-to-end transformation from unstructured natural language protocol documents to formal security attribute descriptions. Compared to the traditional mode of experts manually searching and writing formal scripts, this application eliminates the need for experts to read and logically map hundreds of pages of documents line by line, shortening the protocol security audit cycle and improving the efficiency of formal analysis and verification. By introducing a retrieval-enhanced symbolic matching technique, abstract natural language descriptions are forcibly associated with variable definitions in the original protocol text, thus resolving the symbolic naming illusion problem commonly encountered by neural networks when generating structured logic code. The generated formal attribute descriptions are strictly consistent with the formal protocol flow model in terms of variable naming and role identities, ensuring that the generated code snippets can be directly used in backend analysis tools, reducing the cost of secondary manual correction. The general-purpose, neural symbolic processing-based security protocol verification auxiliary framework provided in this application, along with the first expert-annotated benchmark dataset, can not only handle classic academic protocols but also be widely applied to various highly complex industrial-grade protocol verification scenarios such as telecommunications, industrial IoT, and web security. Because the formal protocol model learns the deep paradigm of protocol security logic rather than simple word matching, it maintains extremely high adaptability and generalization performance when faced with new and unseen protocol specifications. Furthermore, when the narrative style of the protocol document is fine-tuned or the template syntax of the formal verification tool changes, there is no need to retrain the massive underlying language model. By simply adjusting the semantic slots in the preprocessing module or updating the retrieval index, adaptation and verification for new requirements can be completed quickly, reducing the cost of system maintenance and re-verification.

[0038] To further improve the accuracy of security target extraction and reduce protocol modeling defects caused by omissions or misinterpretations during manual auditing, this application provides an embodiment of a method for automatic extraction and formal transformation of security targets in a security protocol, see [link to relevant documentation]. Figure 2 Step 100 includes: Step 110: Perform semantic segmentation on the target security protocol document to obtain multiple corresponding protocol text blocks; In step 110: the target security protocol document is cleaned and dynamically semantically segmented to generate protocol text blocks with semantic self-containment.

[0039] Step 120: Input the protocol text blocks into a preset domain-specific identification and extraction model, so that the domain-specific identification and extraction model can identify and extract the protocol text blocks to obtain security target candidate statements; In step 120: The aforementioned domain-specific recognition and extraction model is used to scan the text blocks, and the candidate statements of the security target that meet the formal verification requirements are accurately captured from the massive amount of redundant text blocks.

[0040] Step 130: Perform semantic clustering and deduplication on the candidate security target statements to obtain the security target statements corresponding to the target security protocol document.

[0041] In step 130: Semantic clustering and deduplication are performed on the candidate security target statements to obtain security target statements with uniqueness and accuracy.

[0042] In one or more embodiments of this application, an efficient noise reduction and recognition mechanism is used to transform unstructured long documents into concise, security target statements that can be processed in subsequent stages.

[0043] To further achieve a closed-loop mapping from natural language intent to the security attributes required for formal analysis, thereby shortening the protocol security audit cycle and improving the efficiency of formal analysis and verification, this application provides a method for automatic extraction and formal transformation of security objectives in a security protocol. (See also...) Figure 3 Step 200 includes: Step 210: Using the security target statement as a semantic query, search in the preset protocol document knowledge base to obtain target association data associated with the security target statement; In step 210, the security target statement is used as a semantic query to retrieve the associated variable definitions and role context information in the protocol document knowledge base. The target-related data can include the definition of protocol interaction entities, the description of cryptographic primitives, the role relationships of participating parties, and logical context fragments of variables in protocol interactions.

[0044] Step 220: Input the target associated data into the formal protocol model so that the formal protocol model can formally describe the target associated data to obtain the corresponding initial formal security attribute description data; Step 230: Perform syntax verification and logical aggregation on the initial formal security attribute description data to obtain the target formal security attribute description data.

[0045] In one or more embodiments of this application, the generated target formal security attribute description data can be directly parsed by formal verification tools, thereby achieving a closed mapping from natural language intent to the security attributes required for formal analysis.

[0046] To further shorten the security audit cycle of protocols and improve the efficiency of formal analysis and verification, in a method for automatic extraction and formal transformation of security targets in a security protocol provided in this application embodiment, step 220 includes: Step 221: Input the target associated data into the formal protocol model, so that the formal protocol model performs mandatory security attribute symbol variable matching mapping on the target associated data based on predefined legal symbols and variables, and obtains the initial formal security attribute description data using a preset security attribute classification and grading description template.

[0047] In step 221, security attribute symbols are accurately mapped based on formal protocol flow model constraints. Query enhancement generation technology is used to extract variable definition context. Natural language statements of security objectives are instantiated into formal security attribute representations consistent with formal protocol model symbols through preset structured pattern templates and formal protocol flow models.

[0048] In one or more embodiments of this application, a preset security attribute classification and grading description template is used. This template can be a framework that specifies security attribute categories (such as confidentiality and authentication), attribute level strengths (such as non-injective consistency and injective consistency), and their corresponding formal symbolic variable architectures, and introduces a formal protocol model as the core symbolic constraint. A mandatory security attribute symbolic variable matching and mapping is performed. By anchoring the retrieved context entities to legal symbols and variables in the formal protocol model, it is ensured that the generated initial formal security attribute description data is completely consistent with the formal protocol model in variable naming.

[0049] To further improve the accuracy of security target extraction, in a method for automatic extraction and formal transformation of security targets in a security protocol provided in this application embodiment, the method includes the following steps before step 100: Step 010: Input the obtained original security protocol document into the first large language model so that the first large language model can perform multiple rounds of extraction on the original security protocol document to generate initial security target candidate statements; In step 010, the first large language model can be an advanced and powerful large language model such as GPT-5 (a large language model developed by OpenAI), Gemini 3 Pro (a large language model developed by Google), or DeepSeekV3.2 (a large language model developed by DeepSeek). Representative key security protocol documents are collected, and multiple rounds of heuristic extraction are performed using the large language model's cue word engineering to produce preliminary candidate security targets, i.e., initial candidate security target statements.

[0050] Step 020: Based on the preset formal analysis of the literature, the annotation experts will annotate the initial security target candidate statements to obtain the corresponding annotation result data; In step 020, annotation experts, by referring to relevant formal analysis papers or literature, classify the initial candidate statements for security targets, check for omissions in the model generation, determine whether they are security target statements, and manually write their corresponding formal security attributes to establish a truth value mapping between natural language intent and formal logic. The system automatically detects the results using the annotation consistency judgment template. If semantic conflicts exist, a final review by senior arbitration experts and a conflict resolution branch are triggered, where experts perform logical correction.

[0051] Step 030: Check and standardize the annotation results to obtain the expert-level annotation dataset.

[0052] In step 030, through data deduplication and standardization encapsulation, a high-confidence safety target / attribute labeled dataset is obtained, which is the expert-level labeled dataset. This expert-level labeled dataset transforms fragmented expert knowledge into structured training data, providing irreplaceable data for subsequent domain fine-tuning of the model.

[0053] In one or more embodiments of this application, a multi-level review mechanism is used to establish a precise mapping from natural language protocol descriptions to security target statements and formal security attributes, and classification levels and annotation specifications are formulated for protocol targets, which can improve the accuracy of security target extraction.

[0054] To further accurately eliminate non-security operational logic noise in the specification, improve the accuracy of security target extraction, avoid a large number of false targets generated in traditional large model solutions, and reduce protocol modeling defects caused by omissions or misinterpretations in manual auditing, the random negative downsampling strategy in the automatic extraction and formal transformation method of security targets in a security protocol provided in this application embodiment includes: using a specific proportion of protocol operational logic paragraphs as negative samples to calibrate the decision boundary of the basic large language model to obtain the domain-specific recognition and extraction model.

[0055] In one or more embodiments of this application, a specific proportion of protocol operational logic paragraphs are introduced as negative feedback signals by configuring the ratio of positive to negative samples. This process fine-tunes the decision boundaries of the training and calibration model, enabling it to accurately distinguish between the security target statements that must be modeled and the purely security operation implementation details. This effectively overcomes the inherent keyword bias and the model's understanding of the security target statements required for security protocol analysis when processing security protocol text. In other words, the random negative downsampling strategy can balance the distribution of positive and negative samples and achieve automated filtering of protocol operational logic noise by fine-tuning the decision boundaries of the training and calibration model.

[0056] This application also provides a system for automatic extraction and formal transformation of security objectives in security protocols, the system comprising: Dataset building module 10 is used to build expert-level labeled datasets; The model fine-tuning module 20 is used to fine-tune the preset basic large language model based on the expert-level labeled dataset using a random negative downsampling strategy to obtain a domain-specific recognition and extraction model. The security target identification and extraction module 30 is used to identify and extract the target security protocol document based on a preset domain-specific identification and extraction model to obtain the security target statement corresponding to the target security protocol document. The security target formalization module 40 is used to perform formal transformation processing on the security target statement based on a preset formal protocol model, and generate corresponding target formal security attribute description data.

[0057] In a specific example of the method for automatic extraction and formal transformation of security objectives in the security protocol of this application, see [link to relevant documentation]. Figure 5 The method includes the following steps: First, a labeled dataset is constructed using expert knowledge through a dataset construction module. Second, a large language model is fine-tuned using a model fine-tuning module to produce a domain-specific recognition and extraction model. Next, the original protocol document is input into a security target recognition and extraction module, which uses the aforementioned domain-specific recognition model to identify and capture candidate security target statements. Finally, a security target formalization module transforms the candidate statements into formalized security attribute descriptions.

[0058] Specifically, the dataset construction module needs to complete the construction and encapsulation of the expert-level labeled dataset SecGoal. See [link / reference]. Figure 6First, representative key security protocol documents need to be collected. Multi-round heuristic extraction is performed using large language model prompt word engineering to generate preliminary candidate security targets. Then, in the expert annotation and gap-filling module, annotation experts, by referring to relevant formal analysis papers or literature, classify the candidate targets, check for omissions in the model's generation, determine whether they are security target statements, and manually write their corresponding formal security attributes to establish a truth value mapping between natural language intent and formal logic. The system automatically detects the results using an annotation consistency judgment template. If semantic conflicts exist, a final review by senior arbitration experts and a conflict resolution branch are triggered, where experts perform logical correction. Finally, through data deduplication and standardization encapsulation, a high-confidence security target / attribute annotation dataset is output. This dataset transforms fragmented expert knowledge into structured training data, providing irreplaceable data for subsequent domain fine-tuning of the model.

[0059] The model fine-tuning module requires using the aforementioned dataset to fine-tune the large language model's instructions, enabling it to extract secure target statements for protocol security analysis. (See also...) Figure 7 The core of this module lies in implementing a random negative downsampling strategy optimized for the non-uniform distribution of protocol security target statements. When constructing the fine-tuning training task, the system strategically configures the ratio of positive to negative samples by introducing a specific proportion of protocol operational logic paragraphs as negative feedback signals. This process fine-tunes the decision boundaries of the training calibration model, enabling it to accurately distinguish between the security target statements that must be modeled and purely security operation implementation details. This effectively overcomes the inherent keyword bias and the model's understanding of the security target statements required for security protocol analysis when processing security protocol text, and outputs a domain-specific model for identifying and extracting security target statements. The fine-tuned model, as a domain-specific identification and extraction model, significantly improves the accuracy of extracting security target statements compared to general models, largely solving the high false alarm rate problem of general models in the identification and extraction stage.

[0060] The security target identification and extraction module needs to automatically extract security target statements from the input raw protocol specifications. (See also...) Figure 8 First, the document preprocessing module cleanses and dynamically segments the input security protocol document, generating semantically self-contained protocol text blocks. Then, the aforementioned domain-specific recognition and extraction model scans these text blocks, accurately capturing security target statements that meet formal verification requirements from a vast amount of redundant text. Finally, the post-processing module performs semantic clustering and deduplication on the extraction results, outputting unique and accurate candidate security target statements. This module, through efficient noise reduction and recognition mechanisms, transforms unstructured, lengthy documents into concise security target statements suitable for subsequent processing stages.

[0061] In the formalization module for security objectives, it is necessary to map the security objective statements, expressed in natural language, to a strict formal security attribute symbol namespace. See also... Figure 9 First, the relevant context retrieval module uses the security target statement as a semantic query to retrieve the associated variable definitions and role context information from the protocol document knowledge base. Then, the formal description generation module combines a pre-defined security attribute classification and hierarchical description template and introduces a formal protocol model as the core symbolic constraint. This module performs mandatory security attribute symbol variable matching mapping, anchoring the retrieved context entities to legal symbols and variables in the formal protocol model, ensuring that the generated code is completely consistent with the formal protocol model in variable naming. Finally, the post-processing module performs syntax verification and logical aggregation, outputting the final formal description of the security attributes. The code snippets generated by this module can be directly parsed by formal verification tools, achieving a closed mapping from natural language intent to the security attributes required for formal analysis.

[0062] This application also provides an electronic device, which may include a processor, a memory, a receiver, and a transmitter. The processor is used to execute the automatic extraction and formal transformation method for security targets in the security protocol mentioned in the above embodiments. The processor and the memory can be connected via a bus or other means, taking a bus connection as an example. The receiver can be connected to the processor and the memory via wired or wireless means.

[0063] The processor can be a central processing unit (CPU). The processor can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above types of chips.

[0064] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions / modules corresponding to the automatic extraction and formal transformation method of security targets in the security protocol described in the embodiments of this application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory, thereby implementing the automatic extraction and formal transformation method of security targets in the security protocol described in the above method embodiments.

[0065] The memory may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created by the processor, etc. Furthermore, the memory may include high-speed random access memory and non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory remotely located relative to the processor, which can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0066] The one or more modules are stored in the memory, and when executed by the processor, the method for automatic extraction and formal transformation of security targets in the security protocol described in the embodiment is executed.

[0067] In some embodiments of this application, the user equipment may include a processor, a memory, and a transceiver unit. The transceiver unit may include a receiver and a transmitter. The processor, memory, receiver, and transmitter may be connected via a bus system. The memory is used to store computer instructions, and the processor is used to execute the computer instructions stored in the memory to control the transceiver unit to send and receive signals.

[0068] As one implementation method, the functions of the receiver and transmitter in this application can be implemented by transceiver circuits or dedicated transceiver chips, and the processor can be implemented by dedicated processing chips, processing circuits or general-purpose chips.

[0069] As another implementation approach, the server provided in this application embodiment can be implemented using a general-purpose computer. That is, the program code implementing the processor, receiver, and transmitter functions is stored in memory, and the general-purpose processor implements the processor, receiver, and transmitter functions by executing the code in memory.

[0070] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the automatic extraction and formal transformation method for security targets in the aforementioned security protocol. The computer-readable storage medium can be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disks, removable storage disks, CD-ROMs, or any other form of storage medium known in the art.

[0071] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the automatic extraction and formal transformation method for security objectives in the aforementioned security protocol.

[0072] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. The programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave.

[0073] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.

[0074] In this application, features described and / or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and / or combined with or in place of features of other embodiments.

[0075] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to the embodiments of this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for automatic extraction and formal transformation of security objectives in a security protocol, characterized in that, The method includes: Based on a pre-defined domain-specific identification and extraction model, the target security protocol document is identified and extracted to obtain the corresponding security target statement. The domain-specific identification and extraction model is pre-tuned using a random negative downsampling strategy on a pre-constructed expert-level labeled dataset. The expert-level labeled dataset includes initial security target candidate statements extracted from the security protocol document, and labels added by annotation experts based on formal analysis of literature to the initial security target candidate statements, indicating whether the initial security target candidate statements are security target statements. Based on a preset formal protocol model, the security target statement is formally transformed to generate corresponding target formal security attribute description data.

2. The method according to claim 1, characterized in that, The preset domain-specific identification and extraction model identifies and extracts the target security protocol document to obtain the security target statement corresponding to the target security protocol document, including: The target security protocol document is semantically segmented to obtain multiple corresponding protocol text blocks; The protocol text blocks are input into a preset domain-specific identification and extraction model, so that the domain-specific identification and extraction model can perform identification and extraction processing on the protocol text blocks to obtain security target candidate statements; Semantic clustering and deduplication are performed on the candidate security target statements to obtain the security target statements corresponding to the target security protocol document.

3. The method according to claim 1, characterized in that, The method, based on a preset formal protocol model, performs formal transformation processing on the security target statement to generate corresponding target formal security attribute description data, including: Using the security target statement as a semantic query, a search is performed in a preset protocol document knowledge base to obtain target-related data associated with the security target statement; The target associated data is input into the formal protocol model so that the formal protocol model can perform a formal description of the target associated data to obtain the corresponding initial formal security attribute description data. The initial formal security attribute description data is subjected to syntax validation and logical aggregation to obtain the target formal security attribute description data.

4. The method according to claim 3, characterized in that, The step of inputting the target associated data into the formal protocol model, so that the formal protocol model performs a formal description of the target associated data to obtain the corresponding initial formal security attribute description data, includes: The target associated data is input into the formal protocol model, so that the formal protocol model performs mandatory security attribute symbol variable matching mapping on the target associated data based on predefined legal symbols and variables, and obtains the initial formal security attribute description data using a preset security attribute classification and grading description template.

5. The method according to claim 1, characterized in that, Before the target security protocol document is identified and extracted based on the preset domain-specific identification and extraction model to obtain the security target statement corresponding to the target security protocol document, the process includes: The acquired original security protocol document is input into the first large language model so that the first large language model can perform multiple rounds of extraction on the original security protocol document to generate initial security target candidate statements; Based on a pre-defined formal analysis of the literature, annotation experts are able to annotate the initial security target candidate statements to obtain the corresponding annotation result data. The annotation results are checked and standardized to obtain the expert-level annotation dataset.

6. The method according to claim 1, characterized in that, The random negative downsampling strategy includes: using a specific proportion of protocol operational logic paragraphs as negative samples to calibrate the decision boundary of the basic large language model in order to obtain the domain-specific recognition and extraction model.

7. A system for automatic extraction and formal transformation of security objectives in a security protocol, characterized in that, The system includes: The dataset building module is used to build expert-level labeled datasets; The model fine-tuning module is used to fine-tune the preset basic large language model based on the expert-level labeled dataset using a random negative downsampling strategy to obtain a domain-specific recognition and extraction model. The security target identification and extraction module is used to identify and extract target security protocol documents based on a preset domain-specific identification and extraction model, so as to obtain the security target statement corresponding to the target security protocol document. The security target formalization module is used to perform formal transformation processing on the security target statement based on a preset formal protocol model, and generate corresponding target formal security attribute description data.

8. An electronic device, characterized in that, It includes a processor and a memory; when the processor executes the running program stored in the memory, it implements the method for automatic extraction and formal transformation of security objectives in the security protocol as described in any one of claims 1 to 6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by the processor, it implements the method for automatic extraction and formal transformation of security objectives in the security protocol according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program, characterized in that, When executed by a processor, the computer program implements the method for automatic extraction and formal transformation of security objectives in the security protocol as described in any one of claims 1 to 6.