Sample detection method, file detection method, computing device, storage medium and program product

Using a large model to identify the script behavior of samples to be detected, combined with abnormal behavior rules, addresses the limited accuracy of existing malicious script detection and achieves efficient, accurate sample detection while reducing resource consumption.

WO2026138133A1 · PCT designated stage · Publication Date: 2026-07-02 · CLOUD INTELLIGENCE ASSETS HOLDING (SINGAPORE) PTE LTD +1

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: CLOUD INTELLIGENCE ASSETS HOLDING (SINGAPORE) PTE LTD
Filing Date: 2025-10-28
Publication Date: 2026-07-02


Abstract

Provided in the embodiments of the present disclosure are a sample detection method, a file detection method, a computing device, a storage medium and a program product. The method comprises: acquiring a sample to be checked; determining whether a target sample satisfying a similarity condition with the sample to be checked is stored; if a target sample satisfying the similarity condition with the sample to be checked is present, using a recognition result stored corresponding to the target sample as a first recognition result of the sample to be checked; if no target sample satisfying the similarity condition with the sample to be checked is present, using a large model to perform recognition on a script behavior of the sample to be checked, so as to obtain a first recognition result, and storing the sample to be checked and the first recognition result in correspondence; and, on the basis of the first recognition result, detecting whether the sample to be checked is abnormal. The technical solution provided in the embodiments of the present disclosure achieves effective and accurate detection for samples, reduces resource consumption and reduces detection costs.

Description

Sample detection method, file detection method, computing device, storage medium and program product

[0001] This disclosure claims priority to Chinese Patent Application No. 202411948161.3, filed with the China Patent Office on December 26, 2024, entitled “Sample Detection Method, File Detection Method, Computing Device, Storage Medium and Program Product”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to the field of data security technology, and in particular to a sample detection method, a file detection method, a computing device, a storage medium, and a program product.

Background

[0003] As attack methods continue to evolve, the complexity and diversity of malicious scripts have increased significantly. Malicious scripts are script programs with offensive intent, which typically exhibit self-replication, propagation, and destructive behaviors and thus pose security risks.

[0004] Malicious scripts are often embedded in samples of various file types, such as script files, executable files, and document files, through which they spread and cause harm and damage to systems and networks. It is therefore necessary to detect whether these samples contain malicious scripts in order to determine whether the samples are abnormal.

[0005] Therefore, how to detect samples effectively and accurately while reducing detection costs is a technical problem that urgently needs to be solved.

Summary of the Invention

[0006] This disclosure provides a sample detection method, a file detection method, a computing device, a storage medium, and a program product to solve the technical problem that the prior art cannot achieve effective and accurate detection.

[0007] In a first aspect, this disclosure provides a sample detection method, including:

[0008] Acquire a sample to be detected;

[0009] Check whether a target sample satisfying a similarity condition with the sample to be detected is stored;

[0010] If such a target sample exists, use the stored identification result corresponding to the target sample as the first identification result of the sample to be detected;

[0011] If no such target sample exists, use a large model to identify the script behavior of the sample to be detected to obtain a first identification result, and save the sample to be detected in correspondence with the first identification result;

[0012] Based on the first identification result, it is determined whether the sample to be detected is abnormal.

[0013] In a second aspect, this disclosure provides a sample detection method, including:

[0014] Acquire a sample to be detected;

[0015] The script behavior of the sample to be detected is identified using a large model to obtain a first identification result;

[0016] Based on the first identification result, it is determined whether the sample to be detected is abnormal.

[0017] In a third aspect, this disclosure provides a file detection method, including:

[0018] Acquire a program file from a cloud server;

[0019] Check whether a target file satisfying a similarity condition with the program file is stored;

[0020] If no such target file exists, use a large model to identify the script behavior of the program file to obtain a first identification result, and save the program file in correspondence with the first identification result;

[0021] Based on the first identification result, it is determined whether the program file is abnormal.

[0022] In a fourth aspect, this disclosure provides a computing device, including a processing component and a storage component;

[0023] The storage component stores one or more computer instructions; the one or more computer instructions are invoked and executed by the processing component to implement the sample detection method as described in the first aspect above, or the sample detection method as described in the second aspect above, or the file detection method as described in the third aspect above.

[0024] In a fifth aspect, this disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processing component, implements the sample detection method as described in the first aspect, the sample detection method as described in the second aspect, or the file detection method as described in the third aspect.

[0025] In a sixth aspect, this disclosure provides a computer program product, including a computer program/instructions which, when executed by a processing component, implement the sample detection method as described in the first aspect above, or the sample detection method as described in the second aspect above, or the file detection method as described in the third aspect above.

[0026] In the embodiments of this disclosure, a large model is used to identify the script behaviors in the sample to be detected to obtain a first identification result, based on which it can be determined whether the sample to be detected is abnormal. The script behaviors identified by the large model enable dynamic analysis of the sample: relying on the strong comprehension ability of the large model, script behaviors can be analyzed automatically, accurately and efficiently without manual analysis, which improves the effectiveness and accuracy of detection. Moreover, the identification results of the large model can be saved in correspondence with the samples to be detected. For a new sample to be detected, a target sample satisfying the similarity condition can first be searched for in the saved data, and the identification result corresponding to that target sample can be used as the identification result of the new sample, so that the large model does not need to be called again. Resource consumption and detection costs are thus reduced while effective and accurate detection is ensured.

[0027] These and other aspects of this disclosure will become more apparent in the following description of the embodiments.

Brief Description of the Drawings

[0028] The accompanying drawings, which are included to provide a further understanding of this disclosure and form part of this disclosure, illustrate exemplary embodiments of the present disclosure and are used to explain the disclosure, but do not constitute an undue limitation of the disclosure. In the drawings:

[0029] Figure 1 shows a flowchart of an embodiment of a sample detection method provided in this disclosure;

[0030] Figure 2 shows a flowchart of yet another embodiment of a sample detection method provided in this disclosure;

[0031] Figure 3 illustrates a schematic diagram of the large model recognition process in one possible implementation of an embodiment of this disclosure;

[0032] Figure 4 illustrates a schematic diagram of the prompt message generation process in one possible implementation of an embodiment of this disclosure;

[0033] Figure 5 illustrates a schematic diagram of the file detection process of a program file in a practical application according to an embodiment of the present disclosure;

[0034] Figure 6 shows a schematic diagram of the interface in a practical application of an embodiment of the present disclosure;

[0035] Figure 7 shows a schematic diagram of the structure of an embodiment of a sample detection device provided in this disclosure;

[0036] Figure 8 shows a schematic diagram of the structure of one embodiment of a computing device provided in this disclosure.

Detailed Description

[0037] To make the objectives, technical solutions, and advantages of this disclosure clearer, the technical solutions of this disclosure will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.

[0038] It should be noted that, in the cases involving user information in the embodiments of this disclosure, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the embodiments of this disclosure are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entry points are provided for users to choose to authorize or refuse. In addition, the various models involved in this disclosure (including but not limited to language models or large models) comply with relevant laws and standards.

[0039] The technical solutions of this disclosure can be applied to sample detection scenarios to determine whether a sample has abnormal issues such as malicious scripts. The samples described in this disclosure may refer to files into which malicious scripts can easily be embedded and thereby hidden, such as seemingly harmless script files, executable files, document files, configuration files, or archive files.

[0040] In the process of realizing this disclosure, the inventors discovered that in traditional methods, static analysis is usually used, such as checking the code structure in the sample to achieve detection. However, since static analysis cannot determine code behavior, this method still cannot achieve effective and accurate sample detection.

[0041] To achieve effective and accurate sample detection, the inventors recognized that, with the rapid development of large model technology, the strong contextual understanding of large models enables them to understand the semantic structure of code, not just its syntax: they can identify logical relationships, function call chains, control flow, and the like, and thus accurately determine the behavior of the code. Furthermore, large models can process a large number of samples in a short time, enabling efficient detection, and automated detection reduces reliance on manual analysis, improving detection efficiency and consistency. The inventors therefore proposed the technical solution of this disclosure, which uses a large model to identify the script behavior in the sample to be detected and detects the sample using the identification result, achieving effective and accurate sample detection.

[0042] Building on the use of large models for detection, the inventors further considered the following: large models consume significant computing resources, and in some application scenarios hundreds, thousands, or even hundreds of millions of samples need to be processed, which can increase detection costs and place high demands on computing resources, making practical deployment challenging. The inventors therefore propose that the recognition results of the large model be stored in correspondence with the samples to be detected. For a sample to be detected, a target sample satisfying the similarity condition can first be searched for in the stored data, and the recognition result corresponding to that target sample can be used as the recognition result of the sample to be detected, eliminating the need to call the large model. By exploiting text similarity, resource consumption and detection costs can be reduced while effective and accurate detection is ensured.

[0043] It should be noted that the technical solutions of this disclosure are applicable to virtual network environments, and the users described herein generally refer to "virtual users": real users obtain a user identity in the network environment by registering a user account on the server. The users referred to herein can be individuals or organizations such as enterprises, and this disclosure imposes no specific limitation in this regard.

[0044] In a practical application, the technical solution of this disclosure can be applied to a cloud computing scenario to detect samples involved in the cloud computing scenario.

[0045] Cloud computing is one of the fastest-growing trends in computer technology, involving the provision of hosted services over a network. A cloud computing environment provides computing and storage resources as a service to end users. End users can then request processing from the provided services. The processing capacity of these services is typically limited by the available resources.

[0046] It should be understood that although this disclosure includes a detailed description of cloud computing, the implementation of the teachings herein is not limited to a cloud computing environment. Rather, embodiments of this disclosure can be implemented in conjunction with any other type of computing environment now known or developed hereafter.

[0047] Cloud computing is a service delivery model designed to enable on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and deployed with minimal management effort or interaction with service providers.

[0048] In cloud computing scenarios, the samples to be detected can be program files within cloud computing products purchased by users, such as ECS (Elastic Compute Service) servers. These include operating system files; application files, such as web application code written in languages like PHP, Python, and Java; log files; login files; and so on. Attackers can embed malicious scripts in these files to carry out malicious activities such as cryptocurrency mining for profit. The technical solution of this disclosure can detect such samples effectively and accurately.

[0049] The technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, and not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0050] Figure 1 is a flowchart of an embodiment of a sample detection method provided by this disclosure. The technical solution of this embodiment can be applied to the server.

[0051] The server can provide various services and can be implemented as a distributed server cluster composed of multiple servers or as a single server. The server can also be a server in a distributed system, or a server integrated with blockchain. The server can also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology, etc.; this disclosure does not limit this.

[0052] The sample detection method described in the embodiment shown in Figure 1 may include the following steps:

[0053] 101: Acquire the sample to be detected.

[0054] The sample to be detected can be, for example, a program file; a document file such as a Word (an electronic document format) or PDF (an electronic document format) file; or an archive file such as a ZIP (a file format for data compression and archiving) file. Program files can include, for example, script files such as HTML (Hypertext Markup Language), XML (Extensible Markup Language), and JSON (a lightweight data interchange format) files; executable files such as DLL (Dynamic Link Library) files; script interpreter files such as Python (a programming language) files; and configuration files such as INI (Initialization) files. Of course, these are only examples, and this disclosure is not limited to them.

[0055] The sample to be detected may be acquired in response to a sample detection request.

[0056] In one alternative approach, the sample detection request may be triggered by a user, and the sample detection request may include a sample to be detected provided by the user. The user can upload the sample to be detected using a front-end tool such as a client.

[0057] In another alternative approach, the sample detection request may be generated, for example, upon detecting a specific event, such as the publication of the sample to be detected or the execution of the sample to be detected.

[0058] In another alternative approach, the sample detection request can be generated periodically by the server. The server can generate the sample detection request at predetermined intervals. The sample to be detected can be any sample corresponding to the object to be detected. The object to be detected can be, for example, a user or a cloud server in a cloud computing scenario, thereby realizing timed sample detection.

[0059] 102: Use a large model to identify the script behavior of the sample to be detected and obtain the first identification result.

[0060] Large models can refer to machine learning models with a large number of parameters and complex structures, capable of processing massive amounts of data and completing various complex tasks, such as natural language processing, computer vision, and speech recognition.

[0061] Large models can be implemented using large language models (LLMs) or large multimodal models (LMMs), such as generative pre-trained models or BERT (Bidirectional Encoder Representations from Transformers). This disclosure does not limit the implementation in this regard.

[0062] The technical solution of this disclosure can use the powerful understanding ability of large models to identify the sample to be detected, thereby determining the script behavior in the sample to be detected and obtaining a first identification result.

[0063] The first identification result can include at least one script behavior corresponding to the sample to be detected. Of course, the first identification result can also be empty, indicating that no script behavior was detected from the sample to be detected.

[0064] Script behavior refers to the various operations and activities exhibited by a script during execution. These behaviors can be legitimate (normal) or malicious (abnormal), and understanding them is crucial for detecting and preventing malicious scripts and ensuring system and data security. Script behavior can include, for example: system operations such as file copying, file creation, file deletion, file compression, file attribute modification, and operating system information collection; network activities such as network communication and network service discovery; process management such as process creation, adding root users, invoking sudo privileges (superuser or administrator privileges), and creating or modifying system-level processes and services; encryption and obfuscation such as key generation, key file modification, and encoding obfuscation; registry and environment variable operations such as setting environment variables and modifying the registry; and anti-debugging and sandbox detection such as sandbox bypass and privilege escalation exploitation. Of course, these are only examples, and this disclosure is not limited to them.

[0065] The large model may identify all script behaviors or only abnormal script behaviors; corresponding instructions can be sent to the large model according to the actual situation.

[0066] In the case where the sample to be detected is a program file, the script behavior identified by the large model may include the code behavior executed by the sample itself, or the code behavior executed by the malicious script embedded in the sample; in the case where the sample to be detected is not a program file, the script behavior identified by the large model is the code behavior executed by the malicious script embedded in the sample.

[0067] Large models are typically pre-trained models, trained on massive datasets and possessing rich linguistic knowledge. Through transfer learning, they can quickly adapt to malicious script detection tasks, reducing training time and resource consumption while exhibiting strong generalization: a pre-trained large model generalizes better to unseen data, improving the detection capability for new threats.

[0068] To further improve the accuracy of identification, the large model can also be fine-tuned and trained in this embodiment. Through continuous learning and updating, the large model can learn common patterns and characteristics of scripts, identify malicious scripts that have been obfuscated, encrypted or mutated, and enhance its defense against adversarial attacks.

[0069] Therefore, the large model can be obtained through fine-tuning using training data. The training data may include training scripts and their corresponding script behaviors, where the training scripts can be malicious scripts, normal scripts, etc. The script behaviors of a training script refer to the various operations performed when the training script is executed. Optionally, the method may further include:

[0070] Obtain the training script and its corresponding script behavior; train a large model based on the training script and its corresponding script behavior.
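The disclosure does not specify a training data format or fine-tuning procedure. As a minimal sketch of the data-preparation half only, assuming a prompt/completion JSON Lines layout (a common but here hypothetical convention), each training script and its labelled script behaviors could be turned into one record as follows; `TrainingExample` and `to_finetune_records` are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass
class TrainingExample:
    script: str            # training script (malicious or normal)
    behaviors: list[str]   # labelled script behaviors of that script


def to_finetune_records(examples: list[TrainingExample]) -> list[str]:
    """Convert (script, behaviors) pairs into JSON Lines records for supervised fine-tuning."""
    records = []
    for ex in examples:
        record = {
            # Hypothetical field names; adapt to whatever the fine-tuning tool expects.
            "prompt": "Identify the script behaviors of the following script:\n" + ex.script,
            # Sorted so that the same behavior set always yields the same target text.
            "completion": json.dumps(sorted(ex.behaviors)),
        }
        records.append(json.dumps(record, ensure_ascii=False))
    return records
```

A normal script can simply carry an empty behavior list, which teaches the model that an empty result is a valid answer.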

[0071] Alternatively, the sample to be detected and its corresponding script behavior can also be used as training data to further fine-tune the large model.

[0072] 103: Based on the first identification result, determine whether the sample to be detected is abnormal.

[0073] The first identification result may include at least one script behavior corresponding to the sample to be detected. For example, it can be determined whether the sample to be detected is abnormal by judging whether there is an abnormal script behavior in at least one script behavior.

[0074] Of course, in practical applications, whether a sample to be detected is abnormal can be determined by judging whether at least one script behavior meets the abnormal behavior judgment rule. Therefore, optionally, the above-mentioned detection of whether a sample to be detected is abnormal based on the first identification result may include: determining the detection result of the sample to be detected according to the abnormal behavior rule hit by at least one script behavior of the sample to be detected. In practical applications, multiple abnormal behavior rules can be preset, each defining one or more abnormal script behaviors. Thus, whether a sample to be detected is abnormal can be determined by individual or combined script behaviors. If at least one script behavior of the sample to be detected hits any one of the abnormal behavior rules, the sample to be detected can be determined to be abnormal; otherwise, the sample to be detected is normal. Each of the multiple abnormal behavior rules can be set with a corresponding abnormal behavior level, so that the abnormal behavior level of the sample to be detected can be determined according to the abnormal behavior rule hit by the sample. The abnormal behavior level may include, for example, a high-risk level, a medium-risk level, and a low-risk level. By combining abnormal behavior rules for script behavior judgment, the effectiveness and accuracy of sample detection can be guaranteed. Moreover, the abnormal behavior rules can be set according to actual needs, making them highly versatile and able to meet different practical requirements.
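The rule-based judgment described above can be sketched as follows, assuming that a rule is hit only when every behavior it lists was identified (so a one-element rule models an individual behavior and a larger rule a combination) and that the sample's level is the highest level among the hit rules. Both assumptions, and all rule names, are illustrative rather than taken from the disclosure.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class RiskLevel(IntEnum):
    """Abnormal behavior levels; a higher value means a higher risk."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class AbnormalBehaviorRule:
    name: str
    behaviors: frozenset[str]  # all of these must be identified for the rule to be hit
    level: RiskLevel


def detect(identified: set[str],
           rules: list[AbnormalBehaviorRule]) -> tuple[bool, Optional[RiskLevel], list[str]]:
    """Return (is_abnormal, highest hit level or None, names of the hit rules)."""
    # Empty rules are skipped so they cannot match every sample.
    hits = [rule for rule in rules if rule.behaviors and rule.behaviors <= identified]
    if not hits:
        return False, None, []
    return True, max(rule.level for rule in hits), [rule.name for rule in hits]
```

Usage with two hypothetical rules: `detect({"network communication", "process creation"}, rules)` hits a combined "download-and-run" rule, while a sample showing only "file copying" hits nothing and is judged normal.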

[0075] In this embodiment, a large model is used to identify script behaviors in the sample to be detected, and the detection of the sample is achieved based on the identification results, thus realizing effective and accurate sample detection.

[0076] In some embodiments, to reduce detection costs and resource consumption, the sample to be detected and the first identification result can be saved after the first identification result is obtained. The saved data can then be searched first to determine whether a target sample satisfying the similarity condition exists, and the large model is used for identification only if no such target sample exists. Accordingly, using the large model to identify the script behavior of the sample to be detected to obtain the first identification result may include:

[0077] Check whether there is a target sample satisfying the similarity condition with the sample to be detected;

[0078] If no such target sample exists, use the large model to identify the script behavior of the sample to be detected to obtain the first identification result, and save the sample to be detected in correspondence with the first identification result.

[0079] Optionally, if a target sample exists that meets similar conditions to the sample to be detected, the recognition result corresponding to the target sample can be used as the first recognition result of the sample to be detected. For ease of understanding, Figure 2 shows a flowchart of another embodiment of the sample detection method provided in this disclosure. The method may include the following steps:

[0080] 201: Acquire the sample to be detected.

[0081] 202: Check whether there is a target sample satisfying the similarity condition with the sample to be detected. If yes, proceed to step 203; otherwise, proceed to step 204.

[0082] This can be achieved by calculating the similarity between the sample to be detected and a stored sample, where the similarity condition can be that the similarity is greater than a similarity threshold. Furthermore, the stored data may include multiple samples, and the similarity between the sample to be detected and each stored sample can be calculated separately; the similarity condition may then be, for example, having the highest similarity, or having a similarity that both exceeds the similarity threshold and is the highest. This disclosure does not limit this. A sample to be detected and a target sample that satisfy the similarity condition can be considered highly matched, that is, the same or similar samples, which may embed the same malicious script.
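The variants of the similarity condition listed above (threshold only, highest only, or highest and above the threshold) can be expressed as one selection function. This is a sketch over precomputed similarity scores; the default threshold of 0.9 and the function name `select_target` are assumptions, not values from the disclosure.

```python
from typing import Optional


def select_target(similarities: dict[str, float],
                  threshold: Optional[float] = 0.9,
                  use_highest: bool = True) -> Optional[str]:
    """Return the id of the stored sample meeting the similarity condition, or None.

    threshold only (use_highest=False): first sample whose similarity > threshold
    highest only (threshold=None):      the most similar sample
    both (default):                     the most similar sample, if it exceeds threshold
    """
    if threshold is None and not use_highest:
        raise ValueError("at least one similarity condition must be enabled")
    if not similarities:
        return None
    candidates = similarities
    if threshold is not None:
        candidates = {sid: s for sid, s in similarities.items() if s > threshold}
        if not candidates:
            return None
    if use_highest:
        return max(candidates, key=candidates.get)
    return next(iter(candidates))
```

Combining both conditions avoids reusing the result of a merely "least dissimilar" sample when nothing stored is actually close.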

[0083] 203: Use the saved recognition result corresponding to the target sample as the first recognition result of the sample to be detected.

[0084] 204: Use a large model to identify the script behavior of the sample to be detected and obtain the first identification result.

[0085] 205: Save the sample to be detected in correspondence with the first identification result.

[0086] The sample to be detected and the first identification result can be saved to a target database; accordingly, in step 202, the target database can be searched for a target sample satisfying the similarity condition with the sample to be detected.

[0087] The target database contains multiple samples. To find a target sample satisfying the similarity condition with the sample to be detected, fast search algorithms such as binary search, hash search, binary tree search, and string matching can be used; this disclosure does not limit the search method.
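Of the search methods listed, hash search is the simplest to illustrate. The sketch below, with the hypothetical class name `ExactMatchIndex`, keys stored results by a SHA-256 digest of the sample bytes, giving average constant-time lookup. It only finds byte-identical samples (the strictest form of the similarity condition); near-duplicates still require a similarity-based search such as the feature-vector comparison described in this disclosure.

```python
import hashlib
from typing import Optional


class ExactMatchIndex:
    """Hash-based lookup of stored identification results for identical samples."""

    def __init__(self) -> None:
        self._results: dict[str, list[str]] = {}

    @staticmethod
    def _digest(sample: bytes) -> str:
        return hashlib.sha256(sample).hexdigest()

    def save(self, sample: bytes, result: list[str]) -> None:
        """Store the identification result under the sample's digest."""
        self._results[self._digest(sample)] = result

    def lookup(self, sample: bytes) -> Optional[list[str]]:
        """Return the stored result for an identical sample, or None if unseen."""
        return self._results.get(self._digest(sample))
```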

[0088] To facilitate similarity calculation, the sample to be detected can be converted into a feature vector, and the feature vector and its corresponding first recognition result can then be saved to the target database.

[0089] In this context, the target database can store the sample to be detected and its first recognition result in a key-value pair, where the feature vector of the sample to be detected can be used as the key and the first recognition result as the corresponding value. However, this disclosure is not limited to this, and the data storage structure in the target database is not restricted.

[0090] Optionally, the above-mentioned search for whether there are target samples that meet the similarity conditions with the sample to be detected may include: converting the sample to be detected into a feature vector; and searching for target samples whose feature vectors have a feature similarity greater than a feature threshold and / or have the largest feature similarity with the feature vectors of the sample to be detected.

[0091] The above-mentioned method of saving the sample to be detected in correspondence with the first recognition result includes: saving the feature vector of the sample to be detected in correspondence with the first recognition result.

[0092] Specifically, the feature vector of the sample to be detected and the corresponding first recognition result can be saved to the target database, and the target database can be searched for a target sample whose feature vector has a feature similarity with that of the sample to be detected greater than the feature threshold and/or the highest feature similarity. Converting samples into feature vectors allows convenient and accurate similarity comparison, ensuring both the efficiency and accuracy of the target sample search.
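A minimal in-memory sketch of such a target database, assuming cosine similarity as the feature similarity and the "highest and above the feature threshold" variant of the condition. Because Python lists are not hashable, the feature-vector "keys" are kept as a list of (vector, result) pairs and scanned linearly; a production system would more likely use an approximate nearest-neighbour index, which is an assumption and not something the disclosure specifies.

```python
import math
from typing import Optional


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


class TargetDatabase:
    """Stores (feature vector, first recognition result) pairs and searches them."""

    def __init__(self, feature_threshold: float = 0.9) -> None:
        self.feature_threshold = feature_threshold
        self._entries: list[tuple[list[float], list[str]]] = []

    def save(self, vector: list[float], result: list[str]) -> None:
        self._entries.append((vector, result))

    def search(self, vector: list[float]) -> Optional[list[str]]:
        """Return the result of the most similar stored vector if it exceeds the threshold."""
        best_score, best_result = -1.0, None
        for stored, result in self._entries:
            score = cosine(vector, stored)
            if score > best_score:
                best_score, best_result = score, result
        if best_result is not None and best_score > self.feature_threshold:
            return best_result
        return None
```

For example, after saving `[1.0, 0.0]` with `["file deletion"]`, a query `[0.99, 0.05]` (cosine about 0.999) reuses that result, while `[0.7, 0.7]` (cosine about 0.707) returns `None` and would fall through to the large model.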

[0093] 206: Based on the first identification result, detect whether the sample to be detected is abnormal.

[0094] In this embodiment, by leveraging the text similarity characteristic, resource consumption and detection costs can be reduced while ensuring effective and accurate detection.
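Steps 201 to 206 can be tied together in one function. This is a sketch only: the embedding, search, storage, large-model call and abnormality judgment are injected as callables with hypothetical names, since the disclosure leaves each of them open. Note that a stored empty result (no script behavior found) counts as a hit, so the large model is not called again for that sample.

```python
from typing import Callable, Optional

Vector = list[float]
Result = list[str]  # first identification result: identified script behaviors (may be empty)


def detect_sample(
    sample: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector], Optional[Result]],
    save: Callable[[Vector, Result], None],
    call_large_model: Callable[[str], Result],
    is_abnormal: Callable[[Result], bool],
) -> tuple[Result, bool]:
    """Return (first identification result, whether the sample is abnormal)."""
    vector = embed(sample)                 # feature vector used for the similarity search
    result = search(vector)                # step 202: stored target sample meeting the condition?
    if result is None:
        result = call_large_model(sample)  # step 204: large-model identification
        save(vector, result)               # step 205: store in correspondence
    # Otherwise (step 203) the stored result of the target sample is reused as-is.
    return result, is_abnormal(result)     # step 206
```

With a cache-backed `search`/`save` pair, a second sample that maps to the same stored vector is answered without a large-model call, which is exactly the cost saving the embodiment describes.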

[0095] In some embodiments, converting the sample to be detected into a feature vector may include: using a feature extraction model to convert the sample to be detected into a feature vector.

[0096] The feature vector can be, for example, an embedding vector.

[0097] The feature extraction model can be trained based on training samples and their corresponding feature vectors. Therefore, in some embodiments, the method may further include:

[0098] Obtain training samples and their corresponding feature vectors; train a feature extraction model based on the training samples and their corresponding feature vectors.

[0099] The feature extraction model is a machine learning model, such as a BERT model or a multilingual text embedding model.

[0100] Using a feature extraction model to convert samples into feature vectors ensures the accuracy of the feature vectors, which improves the accuracy of the search and, in turn, the accuracy of detection.
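To illustrate the conversion step without a trained model, the sketch below uses feature hashing of script tokens instead of a model such as BERT. The function name `extract_features`, the token pattern and the 64-dimension size are assumptions for this example. Unlike a learned embedding, this only captures token overlap, but it still produces normalized vectors that can be compared with cosine similarity.

```python
from __future__ import annotations

import hashlib
import math
import re

# Hypothetical tokenization: identifier-like words only.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_features(script_text: str, dim: int = 64) -> list[float]:
    """Map script text to an L2-normalized vector by hashing its tokens.

    Illustrative stand-in for a trained feature extraction model; it reflects
    token overlap only, not semantics.
    """
    vector = [0.0] * dim
    for token in TOKEN_PATTERN.findall(script_text.lower()):
        # SHA-256 keeps bucket assignment stable across processes; Python's
        # built-in hash() is randomized per process for strings.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector
```

Because the vector is normalized, the dot product of two such vectors equals their cosine similarity.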

[0101] In some embodiments, to improve recognition accuracy, using a large model to identify the script behavior of the sample to be detected and obtain the first recognition result includes:

[0102] Using a large model, the script behavior of the sample to be detected is identified from multiple pre-defined script behaviors according to pre-defined recognition requirements, and the first recognition result is obtained.

[0103] Optionally, the multiple pre-defined script behaviors, pre-defined recognition requirements, and samples to be detected can be combined to generate a prompt instruction according to the prompt template, and the prompt instruction can be input into the large model to obtain the first recognition result.

[0104] A prompt is the information input to the large model. It can be natural-language input and is used to prompt or guide the large model to produce the expected output.

[0105] In some embodiments, using a large model to identify the script behavior of the sample to be detected and obtaining a first identification result may include: using a large model to identify the script behavior of the sample to be detected and obtaining the output data of the large model; and parsing the output data to obtain the first identification result.

[0106] Specifically, the large model can be used to identify the script behavior of the sample to be detected from multiple pre-defined script behaviors according to pre-defined recognition requirements, and to produce output data that meets the output requirements.

[0107] In this case, the prompt instruction can be generated by combining the multiple pre-defined script behaviors, the pre-defined recognition requirements, the sample to be detected, and the output requirements.
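A prompt instruction of this kind can be assembled from a template by plain string substitution. The template text, its section headings and the function name `build_prompt` below are hypothetical and only loosely modelled on the example prompt in [0110] to [0138]. This is a sketch, not the prompt used in the disclosure.

```python
from __future__ import annotations

# Hypothetical template combining task, requirements, behavior list and sample.
PROMPT_TEMPLATE = """#Role and Task
{task}

#Task Principles
{principles}

#Constraints
{constraints}
<list>
{behaviors}
</list>

#Sample
{sample}
"""


def _numbered(items: list[str]) -> str:
    """Render items as a 1-based numbered list, one per line."""
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))


def build_prompt(
    behaviors: list[str],
    recognition_requirements: list[str],
    output_requirements: list[str],
    sample: str,
    task: str = "Extract JSON-formatted script behavior logs from the program file below.",
) -> str:
    """Fill the template. str.format does not re-parse braces inside the
    substituted values, so JSON examples or script code pass through safely."""
    return PROMPT_TEMPLATE.format(
        task=task,
        principles=_numbered(recognition_requirements),
        constraints=_numbered(output_requirements),
        behaviors="\n".join(behaviors),
        sample=sample,
    )
```

Keeping the behavior list, the requirements and the sample as separate inputs lets the behavior list be extended without touching the rest of the prompt.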

[0108] The output requirements mentioned above may include format requirements, such as the JSON format. The first recognition result can then be obtained by parsing the output data according to the format requirements, for example by parsing at least one script behavior corresponding to the sample to be detected from the JSON output. Parsing the output data of the large model yields the first recognition result in a structured form, which facilitates subsequent operations.
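Parsing the model output according to the JSON format requirement could look like the following Python sketch. It assumes the output is a JSON array of objects with a "behavior" key, possibly surrounded by extra text. As an added safeguard (an assumption, not part of the disclosure), it discards values outside the pre-defined behavior list and repeated values. The name `parse_behaviors` is hypothetical.

```python
from __future__ import annotations

import json


def parse_behaviors(output: str, allowed: set[str]) -> list[str]:
    """Parse the model's JSON output into a de-duplicated list of behaviors.

    Looks for the outermost [...] span, so leading or trailing chatter is
    tolerated. Malformed JSON or an unexpected shape yields an empty list.
    """
    start, end = output.find("["), output.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(output[start : end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    behaviors: list[str] = []
    for item in items:
        value = item.get("behavior") if isinstance(item, dict) else None
        # Keep only known behaviors, each at most once, in output order.
        if isinstance(value, str) and value in allowed and value not in behaviors:
            behaviors.append(value)
    return behaviors
```

Filtering against the allowed list enforces the prompt's "choose only from the list" constraint on the consumer side as well, in case the model does not follow it.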

[0109] For ease of understanding, an example prompt instruction is given below. It should be noted that this is merely an example; in actual applications, the prompt can be customized to meet specific needs. The prompt instruction could be, for example:

[0110] #Role and Task

[0111] Your task is to extract JSON-formatted script behavior logs from a computer program file. You need strong knowledge of computer program scripting and programming skills to accurately parse and understand the behavior of the script. Additionally, you need to be familiar with the JSON format and able to apply it to the output script behavior logs. Please generate the JSON logs directly; do not reply with irrelevant or unnecessary messages.

[0112] #Task Principles

[0113] 1. You need to have strong computer program analysis skills, be able to understand and analyze complex computer programs, including variables, functions, control structures, etc. If the source code has features such as encoding obfuscation, you need to try to decode it before analyzing it.

[0114] 2. Try to summarize the behavior of the script from a global perspective, while also paying attention to details that may contain malicious code characteristics.

[0115] 3. You need to output the behavior logs corresponding to the script in JSON format. The output JSON is a list (array), and each element in the array is an object containing the key-value pair "behavior".

[0116] 4. "behavior" represents a behavior tag, used to describe the various script behaviors of a computer program script. These tags need to accurately reflect the behavioral characteristics of the script.

[0117] 5. Please note that the most important principle is to ensure that the "behavior" field is selected from the list of behaviors I provide. Please do not generate any other "behavior" values, as this is crucial for me.

[0118] 6. Not every script behavior needs to be mapped to a "behavior". If there is no corresponding "behavior", it can be ignored.

[0119] 7. The same "behavior" tag can appear at most once in the output log.

[0120] #Constraints

[0121] 1. The generated JSON logs should select the corresponding "behavior" from the following behavior list library. Do not create new "behaviors" yourself; simply choose from these. Please strictly adhere to the data in the behavior list library I have provided.

[0122] 2. The same "behavior" tag can appear at most once in the output log.

[0123] 3. Determine the corresponding behavior solely from the code itself, without extensive reasoning. For example, executing code via a pipe does not necessarily have characteristics such as 'moving or operating the original system commands'.

[0124] 4. The behavior list library is given in <list> below. Each line contains one "behavior" value, and each "behavior" value takes a form such as "aaa, bbb". Please return a structure in which each line contains a single value.

[0125] 5. Each "behavior" value is returned at most once.

[0126] 6. If encoding exists, try decoding it before interpreting the code behavior.

[0127] <list>

[0128] Discover local sensitive information

[0129] Malicious tool download and delivery

[0130] Defense component destruction

[0131] Modify file attributes

[0132] Script command execution

[0133] Turn off the firewall

[0134] Network communication

[0135] Account addition

[0136] Set up a scheduled task

[0137] (…omitted here, not listed further)

[0138] "

[0139] It can be seen that, in the above prompt instruction, the behavior list library in <list> consists of multiple pre-defined script behaviors, while the task principles and constraints contain the pre-defined recognition requirements and output requirements.

[0140] For ease of understanding, Figure 3 shows a schematic diagram of the large model recognition process in one possible implementation of an embodiment of the present disclosure. As shown in Figure 3, a prompt instruction can be generated based on the sample to be detected (step 301), the prompt instruction is input into the large model to obtain output data (step 302), and then the output data can be parsed (step 303) so as to obtain at least one script behavior corresponding to the sample to be detected.

[0141] In some embodiments, to further improve detection accuracy, the method may further include:

[0142] The sample to be tested is executed in a sandbox environment; based on the execution result of the sample to be tested, the script behavior of the sample to be tested is determined, and a second recognition result is obtained.

[0143] The above-mentioned detection of whether the sample to be detected is abnormal based on the first identification result may include: detecting whether the sample to be detected is abnormal based on the first identification result and the second identification result.

[0144] The second identification result may include at least one script behavior corresponding to the sample to be detected.

[0145] The sandbox environment can be a virtual, isolated environment that simulates the real running environment of the sample to be tested. The sample to be tested can run in the sandbox environment, and the script behavior of the sample to be tested can be determined based on the execution results. Sandbox detection can make the recognition results more realistic and reliable.

[0146] In the sandbox environment, the script behavior of the sample to be detected can be recorded, and the second identification result can also include at least one script behavior corresponding to the sample to be detected.

[0147] This allows for the combination of the first and second identification results to detect whether the sample to be detected is abnormal, thereby improving detection accuracy.

[0148] The script behaviors in the first identification result and the script behaviors in the second identification result can be aggregated and duplicate script behaviors can be removed, thereby determining whether the sample to be detected is abnormal based on at least one aggregated script behavior.
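The aggregation and de-duplication step can be written compactly. The sketch assumes each identification result is a list of behavior strings, and `merge_behaviors` is a hypothetical name. `dict.fromkeys` is used because it removes duplicates while preserving the order in which behaviors first appear.

```python
from __future__ import annotations


def merge_behaviors(first: list[str], second: list[str]) -> list[str]:
    """Union of the first and second identification results.

    Behaviors from the first result come first; duplicates are dropped.
    """
    return list(dict.fromkeys([*first, *second]))
```

A plain `set` union would also remove duplicates but would lose the ordering, which is convenient to keep for reporting.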

[0149] In some embodiments, the method may further include: saving the sample to be detected in correspondence with the second identification result.

[0150] Optionally, the sample to be detected can be saved in the target database in correspondence with the second recognition result, or the sample to be detected can be converted into a feature vector and then saved in the target database in correspondence with the second recognition result.

[0151] Optionally, the duplicate content in the second identification result that is identical to that in the first identification result may be removed before the result is saved in correspondence with the sample to be detected.

[0152] As described above, it can be determined whether the sample to be detected is abnormal based on the abnormal behavior rules. The first identification result and the second identification result can include at least one script behavior corresponding to the sample to be detected. Therefore, in some embodiments, the detection result of the sample to be detected can be determined based on the abnormal behavior rules hit by at least one script behavior of the sample to be detected.

[0153] Furthermore, in practical applications, samples in which malicious scripts are embedded are usually small. In addition, the samples may already have been checked by third-party detection tools. Therefore, in some embodiments, the method may further include: determining the sample size (file size) of the sample to be detected and / or the third-party detection result of the sample to be detected.

[0154] Detecting whether the sample to be detected is abnormal based on the first identification result and the second identification result may then include: detecting whether the sample to be detected is abnormal based on at least one of the first identification result, the second identification result, the sample size of the sample to be detected, and whether a third-party detection result exists for the sample to be detected.

[0155] The third-party detection result may indicate whether the sample to be detected is abnormal or normal, and may also include identification information of the third-party detection tool.

[0156] Where the sample to be detected is determined in response to a sample detection request, the request may include the third-party detection result, etc. In practical applications, when a third-party detection tool detects an anomaly in the sample, a sample detection request can be triggered, and the technical solution of the embodiments of this disclosure can then be applied to achieve effective and accurate detection. In other words, where the sample detection request is generated in response to a specific event, that event may, in another implementation, be a third-party detection tool detecting an anomaly in the sample.

[0157] In this embodiment of the disclosure, in addition to combining the first identification result and the second identification result for anomaly determination, at least one of the sample size and third-party detection results can also be combined for anomaly determination.

[0158] As described above, whether a sample to be detected is abnormal can be determined based on abnormal behavior rules. Each abnormal behavior rule may include one or more abnormal script behaviors, as well as a file-size condition and / or a third-party detection result condition.

[0159] If the sample to be detected contains at least some of the abnormal script behaviors defined by an abnormal behavior rule; or if it contains at least some of those abnormal script behaviors and also meets at least one of the following conditions: its file size meets the definition of the abnormal behavior rule, or a third-party detection result exists for it; then the sample can be considered to have hit the abnormal behavior rule and can be determined to be abnormal.

[0160] Conversely, if the sample to be detected does not contain at least some of the abnormal script behaviors defined by the abnormal behavior rule; or if it does not contain them and also meets at least one of the following conditions: its file size does not meet the definition of the abnormal behavior rule, or no third-party detection result exists for it; then the sample can be considered normal.
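One possible encoding of the rule-hit logic in [0158] to [0160] is sketched below. The rule fields are assumptions: "at least some" of the abnormal script behaviors is modelled as a minimum match count `min_matches`, and the file-size condition as an upper bound `max_file_size`, following the observation in [0153] that samples embedding malicious scripts are usually small. When a rule configures extra conditions, at least one of them must hold in addition to the behavior match. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbnormalBehaviorRule:
    """Hypothetical rule: abnormal behaviors plus optional extra conditions."""

    name: str
    behaviors: frozenset[str]
    min_matches: int = 1                 # how many listed behaviors must be present
    max_file_size: Optional[int] = None  # bytes; None means no size condition
    use_third_party: bool = False        # whether a third-party result counts


def rule_hit(
    rule: AbnormalBehaviorRule,
    behaviors: list[str],
    file_size: int,
    has_third_party_result: bool,
) -> bool:
    """Return True if the sample hits the rule."""
    if len(rule.behaviors & set(behaviors)) < rule.min_matches:
        return False
    extra_conditions: list[bool] = []
    if rule.max_file_size is not None:
        extra_conditions.append(file_size <= rule.max_file_size)
    if rule.use_third_party:
        extra_conditions.append(has_third_party_result)
    # No extra conditions: the behavior match alone decides.
    # Otherwise at least one configured extra condition must also hold.
    return not extra_conditions or any(extra_conditions)
```

Keeping the extra conditions optional per rule covers both alternatives in [0159]: a rule with neither condition configured reduces to a pure behavior match.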

[0161] Of course, the sample to be detected can also be preliminarily screened based on its file size and / or third-party detection result. In that case, whether it is abnormal is detected based on at least one of the first identification result, the second identification result, the sample size of the sample to be detected, and whether a third-party detection result exists for it.

[0162] If a sample is determined to be suspicious based on at least one of its sample size and third-party detection result, whether the sample is abnormal is then determined based on the first identification result and the second identification result. Specifically, the detection result can be determined based on the abnormal behavior rules matched by at least one script behavior of the sample in the first and second identification results.

[0163] The detection results can include abnormal or normal, and can also include the level of abnormal behavior, as well as abnormal script behavior, etc.

[0164] As described above, the abnormal script behaviors defined by multiple abnormal behavior rules can constitute an abnormal behavior list. Therefore, in some embodiments, script behaviors not defined in the abnormal behavior list can first be filtered out of the at least one script behavior of the sample to be detected, and the detection result of the sample can then be determined from the remaining script behaviors.

[0165] In some embodiments, the method may further include:

[0166] Based on the detection result, a prompt message is generated and sent to the detection user corresponding to the sample to be detected.

[0167] As described above, the detection result can include, for example, abnormal behavior levels and abnormal script behaviors. A sample to be detected may hit one or more abnormal behavior rules, so one or more detection results can be obtained for the sample, and the prompt message can be generated based on these detection results. For example, the prompt message can include the abnormal script behaviors of the sample corresponding to different abnormal behavior levels.
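A prompt message grouped by abnormal behavior level, similar to the example of Figure 6, might be assembled as follows. The level names, the message wording and the function `build_notification` are hypothetical. The sketch only shows grouping labels by level and dropping repeats when a sample hits several rules.

```python
from __future__ import annotations


def build_notification(hits: list[tuple[str, str]]) -> str:
    """Group (abnormal_level, behavior_label) pairs by level into one message.

    Levels appear in the order first seen; repeated labels are dropped.
    Returns an empty string when there are no hits.
    """
    if not hits:
        return ""
    by_level: dict[str, list[str]] = {}
    for level, label in hits:
        labels = by_level.setdefault(level, [])
        if label not in labels:
            labels.append(label)
    lines = ["Abnormal behavior detected in the sample:"]
    lines += [f"[{level}] " + "; ".join(labels) for level, labels in by_level.items()]
    return "\n".join(lines)
```

Example: two hits at level "High" with the same label and one at "Medium" produce a two-line body, one line per level.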

[0168] The notification message can be sent to the detection user so that the user can identify the issue and decide whether to handle the sample. This user could be the one who triggered the sample detection request, or the relevant maintenance personnel.

[0169] In a cloud computing scenario, the detection user can be a user who has purchased cloud computing products. The notification message can be sent to the detection user's client and displayed there, indicating that the cloud computing product may be under attack.

[0170] Each abnormal behavior rule can also include abnormal explanation information, so the prompt information can also include abnormal explanation information corresponding to different levels of abnormal behavior, in order to help users understand the abnormality.

[0171] The notification message can be sent to the detection user based on a communication account, such as a user account, an email account, or an SMS number. For example, based on the user account, the server can send the notification message to the corresponding client so that it is displayed on the client's interface; alternatively, the message can be sent by email based on an email account, or by SMS based on an SMS number. This disclosure does not limit the specific method of sending the notification message.

[0172] In some embodiments, detecting whether a sample to be detected is abnormal based on the first identification result may include:

[0173] Convert each of the at least one script behavior of the sample to be detected into a target behavior label.

[0174] The detection result of the sample to be detected is determined based on the abnormal behavior rules that are matched by at least one target behavior label corresponding to the sample to be detected.

[0175] The at least one script behavior of the sample to be detected may be determined based on the first identification result alone, or based on both the first identification result and the second identification result.

[0176] The abnormal behavior rule can specifically define one or more abnormal behavior labels. Optionally, the prompt information generated based on the detection results can include the abnormal behavior labels corresponding to the abnormal behavior level.

[0177] Behavior labels can be script behavior descriptions that are easy to understand or follow a unified standard. Converting script behaviors into corresponding target behavior labels facilitates management and processing.

[0178] A behavior label list can be pre-defined, which specifies the behavior labels corresponding to different script behaviors. For example, the behavior label corresponding to the script behavior "network communication" is "execution, network communication", and the behavior label corresponding to the script behavior "account addition" is "persistent control, account addition".

[0179] In some embodiments, converting at least one script behavior corresponding to the sample to be detected into a target behavior label may include:

[0180] Filter out, from the at least one script behavior of the sample to be detected, the script behaviors that are not defined in the behavior label list;

[0181] Each of the filtered script behaviors is converted into a target behavior label defined in the behavior label list.

[0182] Therefore, if at least one target behavior label corresponding to the sample to be detected includes one or more abnormal behavior labels defined in an abnormal behavior rule, then the sample to be detected can be considered to have hit the abnormal behavior rule, and the sample to be detected is abnormal, and a corresponding detection result can be generated.
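The filter, convert and match sequence of [0179] to [0182] could be sketched as below. The two label-list entries reproduce the examples in [0178]. The dictionary names, the rule format (a rule name mapped to a set of abnormal behavior labels) and the function names are assumptions. Following one reading of [0182], a rule is treated as hit when the sample's target labels include one or more of the labels the rule defines.

```python
from __future__ import annotations

# Hypothetical excerpt of a behavior label list; these two entries follow the
# examples in the text, and the rest of the list is not reproduced.
BEHAVIOR_LABELS: dict[str, str] = {
    "Network communication": "execution, network communication",
    "Account addition": "persistent control, account addition",
}


def to_target_labels(behaviors: list[str]) -> list[str]:
    """Filter out behaviors absent from the label list, then map the rest to labels."""
    unique = dict.fromkeys(behaviors)  # de-duplicate, keep order
    return [BEHAVIOR_LABELS[b] for b in unique if b in BEHAVIOR_LABELS]


def matched_rules(labels: list[str], rules: dict[str, set[str]]) -> list[str]:
    """Names of rules sharing one or more abnormal behavior labels with the sample."""
    present = set(labels)
    return [name for name, rule_labels in rules.items() if rule_labels & present]
```

If a rule should instead require all of its labels, the intersection test can be replaced with the subset test `rule_labels <= present`.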

[0183] For ease of understanding, Figure 4 shows a schematic diagram of the prompt information generation process in one possible implementation of the present disclosure embodiment. As shown in Figure 4, for at least one script behavior generated by the large model, it is converted into target behavior labels (step 401), and undefined script behaviors in the behavior label list are filtered out (step 402). Then, the filtered target behavior labels can be judged in combination with abnormal behavior rules (step 403) to obtain the detection result, and prompt information is generated based on the detection result (step 404). The prompt information can be sent to the detection user for alarm, etc.

[0184] In some embodiments, the method may further include:

[0185] Execute the sample to be tested in a sandbox environment; based on the execution results of the sample to be tested, identify whether the sample to be tested exhibits the target script behavior;

[0186] Using the large model to identify the script behavior of the sample to be detected and obtain the first identification result may then include:

[0187] When the sample to be detected exhibits target script behavior, a large model is used to identify the script behavior of the sample to be detected, and the first identification result is obtained.

[0188] If the sample to be tested does not exhibit the target script behavior, the process can be terminated, and the sample will no longer be tested. Sandbox detection can first filter out suspicious samples containing target script behavior, and then use a large model to identify these suspicious samples, thereby further reducing detection costs and resource consumption.

[0189] In addition, file size and the presence of third-party detection results can also be considered to determine whether a sample to be detected is suspicious. Therefore, in some embodiments, when a sample to be detected exhibits target script behavior, using a large model to identify the script behavior of the sample to be detected and obtaining a first identification result may include:

[0190] Under the condition that the sample to be detected has target script behavior, the sample size of the sample to be detected meets the constraints, and the sample to be detected has third-party detection results, the script behavior of the sample to be detected is identified by a large model to obtain the first identification result.

[0191] This constraint could be, for example, that the sample size is greater than a predetermined value.

[0192] In some embodiments, the method may further include:

[0193] Execute the sample to be tested in a sandbox environment; based on the execution results of the sample to be tested, identify whether the sample to be tested exhibits the target script behavior;

[0194] The above search for whether there are target samples that meet the similarity criteria to the sample to be detected can include:

[0195] If the sample to be detected exhibits target script behavior, its sample size meets the constraint, or a third-party detection result exists for it, check whether there is a target sample that satisfies the similarity condition with the sample. Otherwise, the process can be terminated and the sample will no longer be tested.
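The two pre-screening variants differ only in how their conditions are combined. [0190] requires all of them before the large model is called, while [0195] lets any one of them trigger the similarity search. The minimal sketch below, with hypothetical names, takes the size constraint as "greater than a predetermined value" per [0191].

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PreScreen:
    """Signals gathered before the large model is called (hypothetical shape)."""

    has_target_behavior: bool     # from the sandbox run
    file_size: int                # bytes
    has_third_party_result: bool


def should_call_model(s: PreScreen, size_limit: int) -> bool:
    """Conjunctive gate of [0190]: every condition must hold."""
    return s.has_target_behavior and s.file_size > size_limit and s.has_third_party_result


def should_search_cache(s: PreScreen, size_limit: int) -> bool:
    """Disjunctive gate of [0195]: any one condition suffices."""
    return s.has_target_behavior or s.file_size > size_limit or s.has_third_party_result
```

When a gate returns False, the process terminates for that sample, which is where the cost saving described in [0188] comes from.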

[0196] The aforementioned target script behaviors can include, for example, predefined abnormal script behaviors or code branch adversarial behaviors.

[0197] As described above, the technical solutions of this disclosure can be applied to cloud computing scenarios. The sample to be detected can refer to a program file in a cloud server (ECS). Therefore, this disclosure also provides a file detection method, which may include:

[0198] Obtain program files from the cloud server; use a large model to identify the script behavior of the program files and obtain a first identification result; based on the first identification result, determine whether the program files are abnormal.

[0199] Optionally, in some embodiments, using a large model to identify the script behavior of the program file and obtaining a first identification result may include:

[0200] Check whether there is a target file that satisfies the similarity condition with the program file; if there is none, use a large model to identify the script behavior of the program file to obtain the first identification result, and save the program file in correspondence with the first identification result.

[0201] Optionally, in some embodiments, if there is a target file that satisfies the similarity condition with the program file, the identification result saved for the target file can be used as the first identification result of the program file.

[0202] For ease of understanding, the following example of detecting program files in a cloud server (ECS) will be used to introduce the technical solution of this disclosure embodiment. The sample to be detected can refer to the program files in the cloud server.

[0203] As shown in Figure 5, for any program file in the cloud server 400, the server 500 can first execute the program file in the sandbox environment, determine the script behavior of the program file based on the execution result, and obtain the second identification result (step 501). It then searches the target database for a target file that satisfies the similarity condition with the program file (step 502). If such a file exists, the identification result corresponding to the target file is obtained from the target database as the first identification result of the program file (step 503). If not, the large model is called to identify the script behavior of the program file and obtain the first identification result (step 504), and the program file and the first identification result are saved in the target database (step 505). Filtering, conversion, and other operations are then performed on the script behaviors of the program file in the first identification result and the second identification result to obtain at least one target behavior label of the program file (step 506). Next, rule judgment is performed, and the detection result of the program file is generated based on the abnormal behavior rules hit by the at least one target behavior label (step 507). Prompt information is also generated (step 508) and sent to the client of the detection user, which displays it on its interface to alert the user. As shown in Figure 6, the notification message 600 displayed on the interface can include an abnormal behavior level 601 and the abnormal behavior labels 602 corresponding to each level.
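The Figure 5 flow can be summarized as an orchestration function in which each step is an injected callable. This is a sketch under assumptions: all parameter names are hypothetical, the callables stand in for the sandbox, the target database, the large model and the rule engine, and the filtering and label conversion of step 506 are assumed to happen inside `evaluate_rules`. Prompt generation (step 508) is omitted.

```python
from __future__ import annotations

from typing import Callable, Optional


def detect_program_file(
    file_text: str,
    run_in_sandbox: Callable[[str], list[str]],
    lookup_similar: Callable[[str], Optional[list[str]]],
    save_result: Callable[[str, list[str]], None],
    call_large_model: Callable[[str], list[str]],
    evaluate_rules: Callable[[list[str]], list[str]],
) -> list[str]:
    """Run steps 501 to 507 of Figure 5 and return the names of the rules hit."""
    second = run_in_sandbox(file_text)                    # step 501
    first = lookup_similar(file_text)                     # steps 502 and 503
    if first is None:
        first = call_large_model(file_text)               # step 504
        save_result(file_text, first)                     # step 505
    behaviors = list(dict.fromkeys([*first, *second]))    # step 506: merge, de-duplicate
    return evaluate_rules(behaviors)                      # step 507
```

Injecting the steps keeps the control flow testable. For instance, an exact-match dictionary can stand in for the similarity lookup, and a second call on the same file then skips the large model.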

[0204] Figure 7 is a schematic diagram of a sample detection device according to an embodiment of the present disclosure. The device may include:

[0205] The sample acquisition module 701 is used to acquire the sample to be tested.

[0206] The first detection module 702 is used to identify the script behavior of the sample to be detected using a large model and obtain the first identification result;

[0207] A second detection module 703 is used to determine whether the sample to be detected is abnormal based on the first identification result. In some embodiments, the device may further include:

[0208] A sample search module, used to check whether there is a target sample that satisfies the similarity condition with the sample to be detected; if not, the first detection module is triggered to execute.

[0209] In some embodiments, the device may further include a third detection module, used to take the recognition result corresponding to the target sample as the first recognition result of the sample to be detected if there is a target sample that satisfies the similarity condition with the sample to be detected.

[0210] The device may also include:

[0211] The storage module is used to save the sample to be detected in correspondence with the first identification result.

[0212] In some embodiments, the third detection module may include: converting the sample to be detected into a feature vector; and finding the target sample whose feature vector has a feature similarity greater than a similarity threshold and / or has the largest feature similarity with the feature vector of the sample to be detected.

[0213] The aforementioned storage module can be specifically used to save the feature vector of the sample to be detected in correspondence with the first recognition result.

[0214] In some embodiments, the third detection module converts the sample to be detected into a feature vector by: using a feature extraction model to convert the sample to be detected into a feature vector; wherein the feature extraction model is trained based on training sample data and its corresponding feature vectors.

[0215] In some embodiments, the device may further include:

[0216] The first sandbox detection module is used to execute the sample to be detected in the sandbox environment; based on the execution result of the sample to be detected, the script behavior of the sample to be detected is determined, and a second recognition result is obtained.

[0217] The second detection module is specifically used to detect whether the sample to be detected is abnormal based on the first recognition result and the second recognition result.

[0218] In some embodiments, the device may further include:

[0219] The fourth detection module is used to determine the sample size of the sample to be tested and / or the third-party detection results of the sample to be tested;

[0220] The aforementioned second detection module can specifically detect whether the sample to be detected is abnormal based on at least one of the first identification result, the second identification result, the sample size of the sample to be detected, and whether a third-party detection result exists for the sample to be detected.

[0221] In some embodiments, the first detection module may specifically utilize a large model to identify the script behavior of the sample to be detected from multiple predetermined script behaviors according to predetermined identification requirements, and obtain a first identification result.

[0222] In some embodiments, the first identification result includes at least one script behavior identified for the sample to be detected;

[0223] The second detection module, based on the first identification result, may detect whether the sample to be detected is abnormal by: determining the detection result of the sample to be detected according to the abnormal behavior rule hit by at least one script behavior of the sample to be detected.

[0224] In some embodiments, the device may further include:

[0225] The prompting module is used to generate prompt messages based on the detection results and send the prompt messages to the detection user corresponding to the sample to be tested.

[0226] In some embodiments, the first identification result includes at least one script behavior of the sample to be detected;

[0227] The second detection module described above, based on the first recognition result, can detect whether the sample to be detected is abnormal by: converting at least one script behavior into target behavior labels respectively; and determining the detection result of the sample to be detected based on the abnormal behavior rules matched by at least one target behavior label corresponding to the sample to be detected.

[0228] In some embodiments, the first detection module uses a large model to identify the script behavior of the sample to be detected and obtains a first identification result by: using the large model to identify the script behavior of the sample to be detected and obtaining the output data of the large model; and parsing the output data to obtain the first identification result.

[0229] In some embodiments, the device may further include:

[0230] A second sandbox detection module, configured to execute the sample to be detected in a sandbox environment and to identify, based on the execution result of the sample to be detected, whether the sample to be detected exhibits a target script behavior.

[0231] In this case, the first detection module may use the large model to identify the script behavior of the sample to be detected when the sample to be detected exhibits the target script behavior, thereby obtaining the first identification result.

[0232] In some embodiments, the device may further include:

[0233] A second sandbox detection module, configured to execute the sample to be detected in a sandbox environment and to identify, based on the execution result of the sample to be detected, whether the sample to be detected exhibits a target script behavior.

[0234] In this case, the sample search module may search for a stored target sample satisfying the similarity condition with the sample to be detected when the sample to be detected exhibits the target script behavior.
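The two sandbox embodiments above use sandbox execution as a pre-filter. The following sketch combines that gate with the similarity search and the large-model fallback described in claim 1: a sample that shows no target script behavior is not escalated; otherwise a sufficiently similar stored sample supplies the identification result, and only a miss triggers a model call whose result is then stored. The model wrapper and the sandbox gate are passed in as plain callables (hypothetical), `difflib.SequenceMatcher` text similarity is a stand-in for the feature-vector comparison of claim 2, and the threshold value is likewise an assumption.

```python
from difflib import SequenceMatcher

# Hypothetical similarity threshold for the stand-in text similarity below.
SIMILARITY_THRESHOLD = 0.9


class SampleDetector:
    """Sandbox gate, then similarity lookup, then large-model fallback (sketch)."""

    def __init__(self, identify_with_model, sandbox_gate):
        # identify_with_model(sample) -> list of script behaviors (hypothetical model wrapper)
        # sandbox_gate(sample) -> True if sandbox execution showed a target script behavior
        self._identify = identify_with_model
        self._gate = sandbox_gate
        self._store = []  # (stored sample, its identification result)
        self.model_calls = 0

    def _find_target(self, sample):
        """Return the result of the most similar stored sample, if above the threshold."""
        best_result, best_score = None, 0.0
        for stored, result in self._store:
            # Stand-in for the feature-vector similarity of claim 2.
            score = SequenceMatcher(None, sample, stored).ratio()
            if score > best_score:
                best_result, best_score = result, score
        return best_result if best_score >= SIMILARITY_THRESHOLD else None

    def first_identification(self, sample):
        """Return the first identification result, or None if not escalated."""
        if not self._gate(sample):
            return None  # no target script behavior: skip search and model call
        cached = self._find_target(sample)
        if cached is not None:
            return cached  # reuse the result stored for a similar sample
        result = self._identify(sample)
        self.model_calls += 1
        self._store.append((sample, result))  # store sample with its result
        return result
```

Keeping the store in memory and scanning every stored sample is only for readability; a deployment handling many samples would presumably use an indexed vector search instead.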

[0235] In some embodiments, the technical solutions of this disclosure can be applied to cloud computing scenarios, where:

[0236] The aforementioned sample acquisition module is specifically used to acquire program files from the cloud server;

[0237] The first detection module is specifically used to identify the script behavior of program files using a large model to obtain the first identification result;

[0238] The second detection module is specifically used to determine whether the program file is abnormal based on the first identification result.
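Claim 15 below further associates a cloud-sourced sample with its cloud resource identifier and uses the resource operation logs as auxiliary information. A minimal sketch, assuming log entries are dictionaries with a hypothetical schema (`resource`, `call`, `duration_ms`, `cpu_pct`) and a hypothetical CPU threshold for flagging; the disclosure does not specify how the auxiliary information is combined with the first identification result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudResourceId:
    server_id: str
    bucket: str
    function_name: str


def auxiliary_info(resource, log_entries):
    """Summarize the log entries that belong to one cloud resource.

    Each entry is assumed to be a dict with keys 'resource', 'call',
    'duration_ms' and 'cpu_pct' (hypothetical schema).
    """
    own = [e for e in log_entries if e["resource"] == resource]
    return {
        "calls": [e["call"] for e in own],
        "total_duration_ms": sum(e["duration_ms"] for e in own),
        "peak_cpu_pct": max((e["cpu_pct"] for e in own), default=0.0),
    }


def combine(first_result, aux, cpu_limit=90.0):
    """Hypothetical combination: add a flag for high peak CPU usage."""
    flags = list(first_result)
    if aux["peak_cpu_pct"] > cpu_limit:
        flags.append("high_resource_usage")
    return flags
```

Using a frozen dataclass makes the identifier hashable and comparable, so it can key log lookups directly.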

[0239] The sample detection device shown in FIG. 7 can execute the sample detection method of the embodiment shown in FIG. 1 or FIG. 2; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the sample detection device in the above embodiments performs its operations has been described in detail in the method embodiments and is not elaborated upon here.

[0240] This disclosure also provides a computing device which, as shown in FIG. 8, may include a storage component 801 and a processing component 802.

[0241] The storage component 801 stores one or more computer instructions, which are invoked and executed by the processing component 802 to implement the sample detection method of the embodiment shown in FIG. 1, the sample detection method of the embodiment shown in FIG. 2, or the file detection method described in the foregoing embodiments.

[0242] Of course, the computing device may also include other components, such as an input/output interface, a communication component and a power supply component.

[0243] The input/output interface provides an interface between the processing component and peripheral interface modules, which may be output devices, input devices, and the like. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.

[0244] The processing component may include one or more processors to execute computer instructions to complete all or part of the steps in the above-described method. Alternatively, the processing component may be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.

[0245] The storage component may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

[0246] The communication component is configured to facilitate wired or wireless communication between the device housing the communication component and other devices. The device housing the communication component can access wireless networks based on communication standards, such as mobile communication networks, or combinations thereof. In one exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel.

[0247] A power supply unit provides power to the various components of the device in which it resides. A power supply unit may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which it resides.

[0248] It should be noted that the aforementioned computing devices can be physical devices or elastic computing hosts provided by cloud computing platforms. They can be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device.

[0249] This disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processing component, can implement the sample detection method of the embodiment shown in FIG. 1, the sample detection method of the embodiment shown in FIG. 2, or the file detection method described in the foregoing embodiments. The computer-readable medium may be included in the computing device described in the above embodiments, or it may exist independently without being incorporated into the computing device. The computer-readable storage medium may be volatile, non-volatile, or a combination thereof, and may be removable or non-removable. Examples of computer-readable storage media include, but are not limited to, phase-change random-access memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital video disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium.

[0250] This disclosure also provides a computer program product comprising a computer program carried on a computer-readable storage medium. When executed by a processing component, the computer program can implement the sample detection method of the embodiment shown in FIG. 1 or FIG. 2, or the file detection method described in the foregoing embodiments. In such embodiments, the computer program may be downloaded and installed from a network and/or installed from a removable medium. When the computer program is executed by the processing component, it performs the functions defined in the system of this disclosure.

[0251] It should be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0252] The above are merely embodiments of this disclosure and are not intended to limit the scope of this disclosure. Various modifications and variations can be made to this disclosure by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the scope of the claims of this disclosure.

Claims

1. A sample detection method, comprising: acquiring a sample to be detected; checking whether a stored target sample satisfies a similarity condition with the sample to be detected; if a target sample satisfying the similarity condition with the sample to be detected exists, using the identification result stored in correspondence with the target sample as a first identification result of the sample to be detected; if no target sample satisfying the similarity condition with the sample to be detected exists, identifying a script behavior of the sample to be detected using a large model to obtain the first identification result, and storing the sample to be detected in correspondence with the first identification result; and determining, based on the first identification result, whether the sample to be detected is abnormal.

2. The method according to claim 1, wherein checking whether a stored target sample satisfies the similarity condition with the sample to be detected comprises: converting the sample to be detected into a feature vector; and searching for a target sample whose feature vector has a feature similarity with the feature vector of the sample to be detected that is greater than a similarity threshold and/or is the largest; and storing the sample to be detected in correspondence with the first identification result comprises: storing the feature vector of the sample to be detected in correspondence with the first identification result.

3. The method according to claim 2, wherein converting the sample to be detected into a feature vector comprises: converting the sample to be detected into the feature vector using a feature extraction model, the feature extraction model being obtained by training on training sample data and the feature vectors corresponding thereto.

4. The method according to any one of claims 1 to 3, further comprising: executing the sample to be detected in a sandbox environment; and determining, based on an execution result of the sample to be detected, the script behavior present in the sample to be detected to obtain a second identification result, and storing the sample to be detected in correspondence with the second identification result; wherein determining, based on the first identification result, whether the sample to be detected is abnormal comprises: determining, based on the first identification result and the second identification result, whether the sample to be detected is abnormal.

5. The method according to claim 4, further comprising: determining a size of the sample to be detected and/or a third-party detection result of the sample to be detected; wherein determining, based on the first identification result and the second identification result, whether the sample to be detected is abnormal comprises: detecting whether the sample to be detected is abnormal based on the first identification result, the second identification result, and at least one of the size of the sample to be detected and the third-party detection result of the sample to be detected.

6. The method according to any one of claims 1 to 5, wherein identifying the script behavior of the sample to be detected using the large model to obtain the first identification result comprises: identifying, using the large model and according to a predetermined identification requirement, the script behavior of the sample to be detected from a plurality of predetermined script behaviors, to obtain the first identification result.

7. The method according to any one of claims 1 to 6, wherein the first identification result includes at least one script behavior matched by the sample to be detected; and determining, based on the first identification result, whether the sample to be detected is abnormal comprises: determining a detection result of the sample to be detected according to an abnormal behavior rule matched by the at least one script behavior of the sample to be detected.

8. The method according to claim 7, further comprising: generating a prompt message based on the detection result; and sending the prompt message to a user corresponding to the sample to be detected.

9. The method according to any one of claims 1 to 8, wherein the first identification result includes at least one script behavior of the sample to be detected; and determining, based on the first identification result, whether the sample to be detected is abnormal comprises: converting each of the at least one script behavior into a target behavior label; and determining a detection result of the sample to be detected according to the abnormal behavior rules matched by the at least one target behavior label corresponding to the sample to be detected.

10. The method according to any one of claims 1 to 9, wherein identifying the script behavior of the sample to be detected using the large model to obtain the first identification result comprises: identifying the script behavior of the sample to be detected using the large model to obtain output data of the large model; and parsing the output data to obtain the first identification result.

11. The method according to any one of claims 1 to 10, further comprising: executing the sample to be detected in a sandbox environment; and identifying, based on an execution result of the sample to be detected, whether the sample to be detected exhibits a predefined target script behavior; wherein identifying the script behavior of the sample to be detected using the large model to obtain the first identification result comprises: when the sample to be detected exhibits the target script behavior, identifying the script behavior of the sample to be detected using the large model to obtain the first identification result.

12. The method according to any one of claims 1 to 11, further comprising: periodically collecting the detection result, the first identification result and the second identification result of the sample to be detected to form a model optimization dataset; incrementally training the large model based on the model optimization dataset to update script behavior recognition parameters of the large model; and using the updated large model to perform script behavior recognition on samples to be detected.

13. The method according to any one of claims 1 to 12, wherein after detecting, based on the first identification result, whether the sample to be detected is abnormal, the method further comprises: if the detection result of the sample to be detected is abnormal, extracting abnormal features of the sample to be detected, the abnormal features including an execution path, a triggering condition and associated files of the abnormal script behavior; matching the abnormal features against a preset threat handling strategy library to determine a target handling strategy, the target handling strategy including isolating the sample to be detected, deleting the associated files and blocking script execution; and automatically handling the sample to be detected according to the target handling strategy.

14. The method according to any one of claims 7 to 13, further comprising: collecting, in real time, malicious script behavior data across the entire network, the data including execution behaviors, obfuscation methods and attack targets of new types of malicious scripts; updating the abnormal behavior rules based on the malicious script behavior data, including adding new abnormal script behavior definitions, adjusting the classification of abnormal behavior levels, and supplementing the combined judgment logic for abnormal behaviors; and synchronizing the updated abnormal behavior rules to the detection process of the sample to be detected for use in subsequent sample anomaly determination.

15. The method according to any one of claims 1 to 14, wherein acquiring the sample to be detected comprises: if the sample to be detected originates from a cloud computing scenario, acquiring a cloud resource identifier corresponding to the sample to be detected, the cloud resource identifier including a cloud server ID, a cloud storage bucket name and a cloud function name; associating resource operation logs corresponding to the cloud resource identifier, and extracting call records, execution duration and resource usage data of the sample to be detected from the resource operation logs; and using the call records, execution duration and resource usage data as auxiliary information, in combination with the first identification result, for anomaly detection of the sample to be detected.

16. A sample detection method, comprising: acquiring a sample to be detected; identifying a script behavior of the sample to be detected using a large model to obtain a first identification result; and determining, based on the first identification result, whether the sample to be detected is abnormal.

17. A file detection method, comprising: acquiring a program file from a cloud server; checking whether a stored target file satisfies a similarity condition with the program file; if no target file satisfying the similarity condition with the program file exists, identifying a script behavior of the program file using a large model to obtain a first identification result, and storing the program file in correspondence with the first identification result; and determining, based on the first identification result, whether the program file is abnormal.

18. A computing device, comprising a processing component and a storage component, wherein the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the sample detection method according to any one of claims 1 to 15, the sample detection method according to claim 16, or the file detection method according to claim 17.

19. A computer-readable storage medium storing a computer program which, when executed by a processing component, implements the sample detection method according to any one of claims 1 to 15, the sample detection method according to claim 16, or the file detection method according to claim 17.

20. A computer program product, comprising a computer program/instructions which, when executed by a processing component, implement the sample detection method according to any one of claims 1 to 15, the sample detection method according to claim 16, or the file detection method according to claim 17.
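Claims 2 and 3 compare samples through feature vectors. The sketch below assumes the vectors have already been produced by the feature extraction model of claim 3 and uses cosine similarity, which is one common choice but is not named in the claims. It applies both criteria of claim 2 together, returning the stored result with the largest similarity only if that similarity exceeds the threshold. `VectorStore` and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class VectorStore:
    """In-memory store of (feature vector, identification result) pairs."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # hypothetical similarity threshold
        self._items = []

    def add(self, vector, result):
        """Store a sample's feature vector in correspondence with its result."""
        self._items.append((list(vector), result))

    def find_target(self, vector):
        """Return the result of the most similar stored vector if above the threshold."""
        scored = [(cosine(vector, v), r) for v, r in self._items]
        if not scored:
            return None
        best_score, best_result = max(scored, key=lambda t: t[0])
        return best_result if best_score > self.threshold else None
```

Using only the threshold, or only the maximum, would correspond to the "and/or" alternatives in claim 2; requiring both avoids reusing a result from a stored sample that is merely the least dissimilar.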