A diversified seed generation method for fuzz testing of multimedia protocols based on a large model
By analyzing the source code of multimedia protocols using a large language model, diverse fuzz test seeds are generated, solving the problem of insufficient resource type coverage in existing technologies and achieving more efficient fuzz testing and code coverage.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2025-08-13
- Publication Date
- 2026-06-30
AI Technical Summary
Existing fuzzing tools fail to adequately cover resource types in multimedia protocols, resulting in low fuzzing coverage and an inability to discover potential vulnerabilities.
The target program source code is analyzed using a large language model to extract all resource file types and generate corresponding request URLs. Fuzz test seeds are generated based on actual communication traffic to automatically fill in missing resource file types.
It improves the seed diversity of multimedia protocol fuzzing, saves manual analysis costs, and enhances code coverage and fuzzing efficiency.
Figure CN121887689B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of fuzz testing technology, and in particular relates to a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model.
Background Technology
[0002] With the rapid development of internet technology, network protocols, as standards for communication and information exchange between entities in computer networks, are becoming increasingly important. Multimedia protocols refer to the protocols followed when transmitting and processing multimedia data in computer networks, and are widely used in real-time communication, video software, and other fields. However, as application requirements continue to evolve, the complexity of multimedia protocols is also increasing. This leads to potential vulnerabilities introduced by developers due to misunderstandings of protocol specifications, coding errors, and other reasons. Attackers can exploit these vulnerabilities to access or damage target systems without authorization, thus threatening system security.
[0003] Rapidly discovering vulnerabilities in network protocol programs is a crucial means of maintaining network security. Among these, fuzz testing is currently the most effective automated vulnerability discovery technique. Its basic idea is to automatically or semi-automatically generate a large amount of random data and input this random data into a dynamically running target program. By monitoring whether any abnormal behavior occurs in the target program, software vulnerabilities can be discovered. It has advantages such as high automation and low false positive rate, and is widely used in vulnerability discovery for various software such as file processing and network protocols.
[0004] However, multimedia protocols are sensitive to and diverse in terms of resource types, requiring the use of multiple test cases during fuzzing. Existing fuzzing tools, when testing multimedia protocols, do not consider the diversity of resource types in the seed library, resulting in generated seeds covering a limited number of resource types. Consequently, the fuzzing tools fail to adequately test the resource types that the program may handle, limiting the coverage of the target program code by the fuzzing.
[0005] It is evident that generating corresponding request seeds for various resource types in multimedia protocols, thereby constructing a diverse fuzzing seed library, is an important means of improving fuzzing coverage. Increasing fuzzing coverage helps to discover more potential vulnerabilities.
[0006] From the perspective of existing technology, there are two types of fuzzing schemes for multimedia protocol programs: black-box fuzzing technology and gray-box fuzzing technology. These two types of schemes will be described in detail below.
[0007] The first type of approach is black-box fuzz testing. This approach abstracts the target program into a black box with an unknown internal structure. It does not consider the program's internal structure and characteristics. It intercepts the actual traffic of the network protocol program as the initial seed, directly inputs the test cases generated by the mutation of the initial seed into the target program's interaction interface, and then observes whether the target program produces unexpected behavior.
[0008] As shown in Figure 1, the first type of solution generates test cases by randomly mutating the captured real traffic, inputs the test cases into the user interface of the target program, and finally observes whether the target program exhibits abnormal behavior to determine whether a vulnerability has been found.
[0009] The first type of approach is fast and easy to understand, making it easy for testers to extend to any binary program. However, this approach has two main problems. First, it is inefficient; randomly mutating test cases may generate a large number of invalid test cases that do not conform to the program's input constraints. To avoid this, a lot of manual intervention is needed to constrain the input test cases based on relevant prior knowledge. Second, this approach makes it difficult for testers to analyze the reasons for abnormal behavior in the target program and understand the program's specific operating logic.
[0010] The second type of approach is gray-box fuzzing. This approach involves testers possessing partial information about the program's internal structure and implementation. They use internal program state information (such as code coverage) to guide test input generation, and continuously optimize the seed based on coverage feedback, allowing the gray box to gradually reach deeper into the code. This type of approach has been extensively researched in existing technologies.
[0011] As shown in Figure 2, the second type of approach instruments the target program's source code, that is, it inserts code segments at some controllable locations to collect dynamic context information. It then randomly mutates seeds to generate test cases and inputs them into the target program's interaction interface, observes whether the target program exhibits any abnormalities to determine whether vulnerabilities exist, and uses coverage feedback to guide the further mutation of test cases.
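The coverage-feedback loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `target` stands in for an instrumented program and simply returns the set of branch labels an input reached, `mutate` is a single-byte random mutation, and crash detection is left out. It is not the workflow of any particular fuzzing tool.

```python
import random


def target(data: bytes) -> set:
    """Hypothetical instrumented target: returns the branch labels the input reached."""
    covered = {"entry"}
    if data[:1] == b"R":
        covered.add("R")
        if data[1:2] == b"T":
            covered.add("RT")
    return covered


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    if not seed:
        return bytes([rng.randrange(256)])
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, iterations=1000, rng_seed=0):
    """Keep a mutated input in the queue only if it reaches a branch not seen before."""
    rng = random.Random(rng_seed)
    queue = list(seeds)
    seen = set()
    for seed in queue:
        seen |= target(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(queue), rng)
        coverage = target(candidate)
        if not coverage <= seen:  # new coverage: keep the input as a seed
            seen |= coverage
            queue.append(candidate)
    return queue, seen
```

The sketch also shows why the initial seeds matter: the loop only explores what mutation can reach from the queue, so a resource type that no seed requests is unlikely to be exercised.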
[0012] The second type of approach is widely used in current network protocol fuzzing processes. However, for multimedia protocol programs that are sensitive to and diverse in resource types, multiple types of test cases are required for fuzzing. This approach, like the first type, does not pay attention to the diversity of resource types in the seed library, resulting in fewer resource types covered by the generated seeds. Consequently, the fuzzing tool fails to adequately test the resource types that the program may handle, thus limiting the coverage of the fuzzing test.
Summary of the Invention
[0013] This invention proposes a diversified seed generation scheme (including method, system, electronic device, and storage medium) for fuzz testing of multimedia protocols based on a large model, aiming to address the shortcomings of existing technologies.
[0014] The first aspect of this invention proposes a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, the method comprising:
[0015] Step S1: Input the target program source code into the large language model;
[0016] Step S2: Use a large language model to extract all resource file types involved in the target program source code;
[0017] Step S3: Use the large language model to generate a corresponding request URL for each type of resource file;
[0018] Step S4: Capture the request traffic in actual communication and generate a fuzz test seed for the request traffic based on the request URL.
[0019] According to the method of the first aspect of the present invention, in step S1, the target program source code is the source code context related to resource file processing in the target program, and the target program is a multimedia program.
[0020] According to the method of the first aspect of the present invention, in step S2, a large language model is driven by constructing a corresponding prompt word template, thereby extracting all resource file types involved in the target program source code.
[0021] According to the method of the first aspect of the present invention, in step S3, a large language model is driven by constructing corresponding prompt word templates, thereby generating a corresponding request URL for each type of resource file; specifically including:
[0022] Retrieve N types of resource files and generate a corresponding request URL for each type of resource file; or
[0023] Select a resource file Fi of type i from the N types of resource files, generate a request URL for Fi, denoted URL(Fi), and extract a URL template from URL(Fi). Fill the URL template with the relevant information of the resource files of the other N-1 types, thereby obtaining the request URLs for the resource files of the other N-1 types.
[0024] According to the method of the first aspect of the present invention, in step S4, request traffic in actual communication is captured, a URL payload is extracted from the request traffic, and the URL payload extracted from the request traffic is replaced with the request URL generated in step S3, thereby generating a fuzz test seed for the request traffic.
[0025] According to the method of the first aspect of the present invention, the method further includes: determining whether the initial seed library covers all resource file types; if not, adding corresponding resource files for the missing resource file types for performing subsequent fuzz testing.
[0026] According to the method of the first aspect of the present invention, the method further includes: determining whether the initial seed library contains a fuzz test seed for the request traffic; if not, adding the fuzz test seed to the initial seed library.
[0027] A second aspect of this invention proposes a diversified seed generation system for fuzz testing of multimedia protocols based on a large model, the system comprising a processing unit configured to perform:
[0028] Input the target program source code into the large language model;
[0029] Use a large language model to extract all resource file types involved in the target program's source code;
[0030] The large language model is invoked to generate corresponding request URLs for each type of resource file;
[0031] Capture request traffic in actual communication and generate fuzz test seeds for the request traffic based on the request URL.
[0032] A third aspect of this invention discloses an electronic device. The electronic device includes a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, as disclosed in the first aspect of this invention.
[0033] A fourth aspect of this invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program, which, when executed by a processor, implements a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, as described in the first aspect of this disclosure.
[0034] This invention, from the perspective of practical needs and applications, proposes a diversified seed generation scheme for multimedia protocol fuzzing based on a large model. This invention primarily utilizes the analytical capabilities of a large language model to comprehensively and accurately analyze the multimedia resource types supported by the source code of the program under test, and then uses this analysis combined with methods such as structured mutation to construct diversified seeds. This invention solves the problem in existing technologies where the initial seed library contains a limited number of resource types, resulting in fuzzing tools being unable to fully test the target program.
Attached Figure Description
[0035] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0036] Figure 1 is a flowchart illustrating the first type of solution in the prior art.
[0037] Figure 2 is a flowchart illustrating the second type of solution in the prior art.
[0038] Figure 3 is a flowchart illustrating a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, as proposed in this invention.
[0039] Figure 4 is a flowchart illustrating the second embodiment of the present invention.
Detailed Implementation
[0040] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0041] The definitions of key terms involved in this invention include:
[0042] Large language model: refers to a deep learning model trained using a large amount of text data.
[0043] Fuzz testing: an automated vulnerability discovery technique, regarded here as the most effective currently available. Its basic idea is to automatically or semi-automatically generate a large amount of random data and input this random data into a dynamically running target program. By monitoring whether any abnormal situations occur in the target program, software vulnerabilities can be discovered.
[0044] Seed: the initial test case in the fuzzing process; subsequent test cases are generated by mutating the seed.
[0045] Multimedia protocol: refers to a multimedia transmission protocol, a standardized method for transmitting audio and video data over a network.
[0046] The first aspect of this invention proposes a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model. As shown in Figure 3, the method includes:
[0047] Step S1: Input the target program source code into the large language model;
[0048] Step S2: Use a large language model to extract all resource file types involved in the target program source code;
[0049] Step S3: Use the large language model to generate a corresponding request URL for each type of resource file;
[0050] Step S4: Capture the request traffic in actual communication and generate a fuzz test seed for the request traffic based on the request URL.
[0051] According to the method of the first aspect of the present invention, in step S1, the target program source code is the source code context related to resource file processing in the target program, and the target program is a multimedia program.
[0052] According to the method of the first aspect of the present invention, in step S2, a large language model is driven by constructing a corresponding prompt word template, thereby extracting all resource file types involved in the target program source code.
[0053] According to the method of the first aspect of the present invention, in step S3, a large language model is driven by constructing corresponding prompt word templates, thereby generating a corresponding request URL for each type of resource file; specifically including:
[0054] Retrieve N types of resource files and generate a corresponding request URL for each type of resource file; or
[0055] Select a resource file Fi of type i from the N types of resource files, generate a request URL for Fi, denoted URL(Fi), and extract a URL template from URL(Fi). Fill the URL template with the relevant information of the resource files of the other N-1 types, thereby obtaining the request URLs for the resource files of the other N-1 types.
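The second alternative (derive a URL template from one URL(Fi) and fill it for the remaining types) can be sketched as follows. In the method itself this step is driven by the large language model; the sketch replaces it with plain string handling purely for illustration. The function names, the `{filename}` placeholder, the `sample` file stem, and the `example.invalid` host are all hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def extract_template(url: str) -> str:
    """Turn a concrete request URL into a template with a {filename} slot."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    template_path = posixpath.join(directory, "{filename}")
    return urlunsplit(
        (parts.scheme, parts.netloc, template_path, parts.query, parts.fragment)
    )


def fill_template(template: str, other_types, stem: str = "sample") -> dict:
    """Build one request URL per remaining resource type (keyed by file extension)."""
    return {
        ext: template.replace("{filename}", f"{stem}.{ext}") for ext in other_types
    }


# Example: URL(Fi) for an mp4 resource, then URLs for two other types.
template = extract_template("rtsp://example.invalid:8554/media/sample.mp4")
urls = fill_template(template, ["mkv", "wav"])
```

Compared with asking the model for every URL separately, this variant needs only one generated URL, at the cost of assuming that all resource types share the same path layout.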
[0056] According to the method of the first aspect of the present invention, in step S4, request traffic in actual communication is captured, a URL payload is extracted from the request traffic, and the URL payload extracted from the request traffic is replaced with the request URL generated in step S3, thereby generating a fuzz test seed for the request traffic.
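A minimal sketch of the URL payload replacement in step S4, assuming the captured request is available as raw bytes and that its first line has the form `METHOD URL VERSION`, as in RTSP. The captured request, the host name, and the function name are made up for illustration; a real capture would come from a traffic recording tool.

```python
def replace_request_url(request: bytes, new_url: str) -> bytes:
    """Swap the URL in the request line and leave headers and body untouched."""
    head, sep, rest = request.partition(b"\r\n")
    parts = head.split(b" ")
    if len(parts) != 3:
        raise ValueError("unexpected request line: %r" % head)
    method, _old_url, version = parts
    new_head = b" ".join([method, new_url.encode("ascii"), version])
    return new_head + sep + rest


# Hypothetical captured request (not taken from a real session).
captured = (
    b"DESCRIBE rtsp://example.invalid:8554/media/sample.mp4 RTSP/1.0\r\n"
    b"CSeq: 2\r\n"
    b"User-Agent: MyClient\r\n"
    b"\r\n"
)
seed = replace_request_url(captured, "rtsp://example.invalid:8554/media/sample.wav")
```

Because only the request line changes, the resulting seed keeps the headers of real traffic and should remain acceptable to the target's parser.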
[0057] According to the method of the first aspect of the present invention, the method further includes: determining whether the initial seed library covers all resource file types; if not, adding corresponding resource files for the missing resource file types for performing subsequent fuzz testing.
[0058] According to the method of the first aspect of the present invention, the method further includes: determining whether the initial seed library contains a fuzz test seed for the request traffic; if not, adding the fuzz test seed to the initial seed library.
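The two checks above (whether the seed library covers every resource type, and whether a generated seed is already present) might look like the sketch below. It assumes, for illustration only, that a resource type can be identified by the file extension in the request URL and that the seed library is an in-memory list of raw requests; the helper names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def resource_type(url: str) -> str:
    """Return the lower-case file extension of the requested resource."""
    ext = posixpath.splitext(urlsplit(url).path)[1]
    return ext.lstrip(".").lower()


def missing_types(all_types, seed_urls) -> list:
    """Resource types extracted from the source code that no existing seed requests."""
    covered = {resource_type(url) for url in seed_urls}
    return sorted({t.lower() for t in all_types} - covered)


def add_if_absent(library: list, seed: bytes) -> bool:
    """Append the seed only if the library does not already contain it."""
    if seed in library:
        return False
    library.append(seed)
    return True
```

For example, `missing_types(["mp4", "mkv", "wav"], ["rtsp://example.invalid/media/a.mp4"])` yields the two types for which seeds still need to be generated.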
[0059] In some embodiments, extracting all resource file types from the source code using a large language model (step S2) and generating the corresponding request URI using a large language model (step S3) respectively require constructing corresponding prompt word templates to drive the large language model to complete the extraction work.
[0060] First Embodiment
[0061] Input the source code context related to resource file processing in the target program into the large language model.
[0062] The constructed prompt word template drives a large language model to extract all resource file types in the target source code. The prompt word template is, for example: {"role":"user", "content":"Analyze which of these files handle various multimedia resources. Return the file names:{file_name}"}.
[0063] The constructed prompt word templates are used to drive a large language model to generate URIs for resource file types that are missing from the seed library in the target source code.
[0064] The prompt word template is, for example: {"role":"user", "content":"Generate the URI of the missing resource file type by following the file path generated above. Return the file paths:{file_path}"}.
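The prompt word templates above can be held as plain dictionaries and filled just before they are sent to the model. The sketch below only builds the message; how it is sent is omitted because the text does not name a model or API. `build_message` and the file names passed to it are hypothetical.

```python
# Template text quoted from the first example above; {file_name} is its placeholder.
EXTRACT_TEMPLATE = {
    "role": "user",
    "content": (
        "Analyze which of these files handle various multimedia resources. "
        "Return the file names:{file_name}"
    ),
}


def build_message(template: dict, **fields) -> dict:
    """Fill the placeholders of a prompt word template and return a chat message."""
    return {
        "role": template["role"],
        "content": template["content"].format(**fields),
    }


# Hypothetical source file names, used only to show the filled prompt.
message = build_message(EXTRACT_TEMPLATE, file_name="MediaFileA.cpp, MediaFileB.cpp")
```

Keeping the template separate from the filled message lets the same prompt be reused for every batch of source files.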
[0065] Capture request traffic during actual communication, replace the URI payload in the real traffic with the generated URI, and generate the seed corresponding to the missing resource file.
[0066] For files with missing resource types, the path where the resource files are stored is found by traversing and analyzing the file directory of the source code, and the corresponding resource type file is obtained from the Internet and placed in that path.
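One way to locate the storage path is sketched below. The text does not specify how the directory is recognized, so the sketch assumes a simple heuristic: a directory that already contains at least one media file is treated as a resource directory. The extension set and the function name are assumptions, and fetching the missing files is not shown.

```python
import os

# Assumed set of media extensions; adjust to the types extracted for the target.
MEDIA_EXTENSIONS = {".mp4", ".mkv", ".wav", ".mp3"}


def find_resource_dirs(root: str, extensions=MEDIA_EXTENSIONS) -> list:
    """Return every directory under root that holds at least one media file."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(os.path.splitext(name)[1].lower() in extensions for name in filenames):
            hits.append(dirpath)
    return sorted(hits)
```

If the heuristic finds more than one directory, a tester would still need to pick the one the server actually serves from.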
[0067] Second Embodiment
[0068] A company plans to develop a multimedia protocol program based on Live555 and now needs to conduct security testing on the program. When the official developers used traditional fuzzing tools for security testing, the seed library they provided lacked many resource file types supported by Live555. Therefore, by applying the technology of this invention, the code context related to resource file processing in the project source code is input into a large language model, and a pre-written prompt word template is used to obtain the missing file types and their corresponding URIs. Then, real traffic is captured, and the URI payload is replaced to generate seeds. For missing file types, the source code's file directory is traversed and analyzed to find the path where the resource files are stored, and the corresponding resource type files are obtained from the Internet and placed in that path. Security testers save a significant amount of time that would otherwise be spent manually extracting missing resource file types and constructing corresponding seeds, and achieve better fuzzing efficiency and code coverage.
[0069] As shown in Figure 4, the specific process includes:
[0070] Input the target source code into the large language model.
[0071] The missing file types and URIs are extracted using the prompt words, as detailed below:
[0072] MP4 and rtsp://live555:live555@xxx.com:8090/Source/xxx.mp4
[0073] Capture real traffic and use the generated URI to replace the URI payload to generate a seed.
[0074] GET rtsp://live555:live555@xxx.com:8090/Source/xxx.mp4 RTSP/1.0
[0075] Cseq: 2
[0076] User-Agent: MyClient
[0077] Traverse the source code file directory and retrieve the missing files, placing them in the corresponding directory /Source/xxx.mp4.
[0078] Complete the generation of diverse seeds.
[0079] A second aspect of this invention proposes a diversified seed generation system for fuzz testing of multimedia protocols based on a large model, the system comprising a processing unit configured to perform:
[0080] Input the target program source code into the large language model;
[0081] Use a large language model to extract all resource file types involved in the target program's source code;
[0082] The large language model is invoked to generate corresponding request URLs for each type of resource file;
[0083] Capture request traffic in actual communication and generate fuzz test seeds for the request traffic based on the request URL.
[0084] A third aspect of this invention discloses an electronic device. The electronic device includes a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, as disclosed in the first aspect of this invention.
[0085] A fourth aspect of this invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program, which, when executed by a processor, implements a method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, as described in the first aspect of this disclosure.
[0086] This invention, from the perspective of practical needs and applications, proposes a diversified seed generation scheme for multimedia protocol fuzzing based on a large model. This invention primarily utilizes the analytical capabilities of a large language model to comprehensively and accurately analyze the multimedia resource types supported by the source code of the program under test, and then uses this analysis combined with methods such as structured mutation to construct diversified seeds. This invention solves the problem in existing technologies where the initial seed library contains a limited number of resource types, resulting in fuzzing tools being unable to fully test the target program.
[0087] Compared with the prior art, the present invention has three main technical advantages: (1) it increases the diversity of fuzz test seeds for multimedia protocol programs; (2) it saves the cost of manually analyzing source code to extract relevant seeds; and (3) it increases the code coverage in the fuzz test process.
[0088] Firstly, when the code coverage of fuzzing a multimedia protocol program is limited by a lack of seed types, this invention can automatically generate diverse seeds using a large language model, thereby comprehensively covering the various resource file types processed by multimedia protocol programs.
[0089] Secondly, this invention can save the cost of manually analyzing source code to extract relevant seeds. In the traditional process of fuzz testing of multimedia protocols, it is necessary to generate diverse seeds, which requires manual auditing of the project source code, finding missing resource file types and manually constructing seeds. The process is tedious and prone to errors. This invention automates this process.
[0090] Thirdly, this invention can improve code coverage during fuzzing because the generation of seeds for missing resource file types can trigger the code logic in the source code that processes the corresponding resource files, effectively improving code coverage.
[0091] Please note that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments have been described. However, as long as the combination of these technical features does not contradict each other, it should be considered within the scope of this specification. The above embodiments only illustrate several implementations of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be pointed out that for those skilled in the art, several modifications and improvements can be made without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the appended claims.
Claims
1. A method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model, characterized in that, The method includes: Step S1: Input the target program source code into the large language model; Step S2: Use a large language model to extract all resource file types involved in the target program source code; Step S3: Use the large language model to generate a corresponding request URL for each type of resource file; Step S4: Capture the request traffic in actual communication and generate a fuzz test seed for the request traffic based on the request URL.
2. The method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model according to claim 1, characterized in that, In step S1, the target program source code is the source code context related to resource file processing in the target program, and the target program is a multimedia program.
3. The method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model according to claim 2, characterized in that, In step S2, the large language model is driven by constructing corresponding prompt word templates, thereby extracting all resource file types involved in the target program source code.
4. The method for generating diverse seeds for multimedia protocol fuzz testing based on a large model according to claim 3, characterized in that, In step S3, the large language model is driven by constructing corresponding prompt word templates, thereby generating corresponding request URLs for each type of resource file; specifically including: Retrieve N types of resource files and generate a corresponding request URL for each type of resource file; or Select a resource file Fi of type i from the N types of resource files, generate a request URL for Fi, denoted URL(Fi), and extract a URL template from URL(Fi). Fill the URL template with the relevant information of the resource files of the other N-1 types, thereby obtaining the request URLs for the resource files of the other N-1 types.
5. The method for generating diverse seeds for fuzz testing of multimedia protocols based on a large model according to claim 4, characterized in that, In step S4, the request traffic in the actual communication is captured, the URL payload is extracted from the request traffic, and the URL payload extracted from the request traffic is replaced with the request URL generated in step S3, thereby generating a fuzz test seed for the request traffic.
6. The method for generating diverse seeds for multimedia protocol fuzz testing based on a large model according to claim 5, characterized in that, The method further includes: determining whether the initial seed library covers all resource file types; if not, adding corresponding resource files for the missing resource file types for subsequent fuzz testing.
7. The method for generating diverse seeds for multimedia protocol fuzz testing based on a large model according to claim 6, characterized in that, The method further includes: determining whether the initial seed library contains a fuzz test seed for the request traffic; if not, adding the fuzz test seed to the initial seed library.