A software legalization detection method, device, medium and product

By acquiring static features and dynamic behavior data of software and combining them with network communication data to generate risk scores, the problem of misjudgment and missed detection caused by the easy tampering of static features in existing technologies has been solved, and highly accurate software authenticity detection has been achieved.

CN121690747B · Active · Publication Date: 2026-06-26 · 北京企慕科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
北京企慕科技有限公司
Filing Date
2025-12-13
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies that rely on static feature information for software authenticity detection are easily tampered with, leading to the risk of false positives and false negatives, and are unable to effectively identify counterfeit software.

Method used

By acquiring static feature data and dynamic behavior data of the target software, and combining them with network communication data, a risk score is generated. By integrating behavioral deviation records and network anomaly records, multi-level detection is achieved.

Benefits of technology

It improves the accuracy and robustness of software compliance judgment, reduces the risk of missed detection due to feature forgery, and has the ability to learn and continuously evolve.

✦ Generated by Eureka AI based on patent content.


Abstract

A software legalization detection method, device, medium and product. In the method, static feature data of target software is acquired, and declared identity information is determined from the static feature data; while the target software runs, dynamic behavior data is collected and network communication data is acquired; a genuine behavior baseline corresponding to the declared identity information is matched from a baseline library; the dynamic behavior data is compared with the genuine behavior baseline to obtain an offset value, and a behavior deviation record is generated when the offset value is greater than or equal to an offset threshold; the network communication data is analyzed, and a network anomaly record is generated when the network communication data meets a preset network anomaly condition; the behavior deviation record and the network anomaly record are fused to generate a risk score for the target software; and a detection result and a disposal instruction are generated according to the risk score and a preset disposal strategy. The technical solution provided in the application improves the accuracy and robustness of software compliance judgment and reduces the risk of missed detection caused by feature forgery.
Description

Technical Field

[0001] This application relates to the technical field of network security, specifically to a method, device, medium, and product for detecting software authenticity.

Background Technology

[0002] In current software asset management practices, to ensure software compliance, existing solutions typically deploy agent programs on terminal devices to periodically scan the operating system and collect static characteristic information of installed software. This static characteristic information mainly includes metadata such as the software's installation path, filename, version number, and file hash value. After collection, the agent program reports the static characteristic information to a central management server. The server then compares this information with a pre-defined whitelist of authorized software or a blacklist of known non-compliant software to determine the compliance of the software on the terminal.

[0003] However, the aforementioned technical solutions, which rely entirely on comparing static feature information, suffer from a fundamental technical problem: the accuracy of their detection conclusions depends entirely on the authenticity of the collected static feature information itself. In practice, software's static feature information, such as filenames and version numbers, is extremely easy to tamper with. Malicious or unauthorized software can use technical means to disguise its static features, making them completely identical to those of genuine, licensed software. When existing detection methods encounter such tampering, because their judgment logic is limited to surface-level matching of information and lacks a mechanism to verify the authenticity of the information, they will draw incorrect conclusions based on these forged static features, misjudging non-compliant software as compliant, thus leading to detection failure and a serious risk of missed detections.

Summary of the Invention

[0004] To address the aforementioned technical problems, this application provides a method, device, medium, and product for detecting software authenticity.

[0005] The first aspect of this application provides a method for detecting software authenticity, which adopts the following technical solution:

[0006] Obtain static feature data of the target software on the terminal device, and determine the declared identity information of the target software based on the static feature data;

[0007] During the execution of the target software, dynamic behavior data of the target software is collected, and network communication data of the target software is obtained;

[0008] Based on the declared identity information, the corresponding genuine behavior baseline is matched and retrieved from the preset baseline library;

[0009] The dynamic behavior data is compared with the genuine behavior baseline to obtain the offset value. When the offset value is greater than or equal to a preset offset threshold, a behavior deviation record is generated.

[0010] In response to generating the behavior deviation record, the network communication data is analyzed, and when the network communication data meets preset network anomaly conditions, a network anomaly record is generated;

[0011] By combining the behavioral deviation records and the network anomaly records, a risk score for the target software is generated;

[0012] Based on the risk score and the preset handling strategy, the detection results and handling instructions are generated.

[0013] By adopting the above technical solution, the problems of misjudgment and missed detection caused by the susceptibility of static features to tampering in existing technologies are effectively solved. The declared identity of the software is determined based on static features, providing a comparison benchmark for subsequent dynamic verification. Dynamic behavior data is collected during software runtime and compared with the corresponding genuine behavior baseline, enabling the identification of anomalies in the runtime logic and resource calls of counterfeit software, thereby generating behavior deviation records. Furthermore, network communication analysis is introduced to perform network-level verification on target software exhibiting behavioral anomalies. By judging whether its communication behavior meets preset anomaly conditions, network anomaly records are generated. The integration of behavioral and network anomaly records generates a risk score, and corresponding handling strategies are triggered based on the score. This technical approach, by deepening from "surface feature matching" to "behavioral authenticity verification," improves the accuracy and robustness of software compliance judgment and reduces the risk of missed detection due to feature forgery.
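The final step, generating a detection result and a handling instruction from the risk score and a preset handling strategy, can be sketched as a simple tiered lookup. The following is a minimal illustration only: the thresholds, result labels, and instruction names are hypothetical and do not come from this application.

```python
from dataclasses import dataclass

# Hypothetical handling strategy, ordered from the highest tier to the lowest:
# (minimum risk score, detection result, handling instruction).
# All values here are illustrative assumptions.
HANDLING_STRATEGY = [
    (80.0, "suspected counterfeit", "terminate_and_quarantine"),
    (50.0, "suspicious", "alert_administrator"),
    (0.0, "compliant", "no_action"),
]


@dataclass(frozen=True)
class Verdict:
    risk_score: float
    detection_result: str
    handling_instruction: str


def decide(risk_score: float) -> Verdict:
    """Return the first (highest) tier whose minimum score is reached."""
    for minimum, result, instruction in HANDLING_STRATEGY:
        if risk_score >= minimum:
            return Verdict(risk_score, result, instruction)
    # Scores below every tier fall back to the lowest tier.
    return Verdict(risk_score, "compliant", "no_action")
```

Keeping the strategy as data rather than branching logic means the tiers can be changed per deployment without touching the decision function.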

[0014] Optionally, the step of collecting dynamic behavior data of the target software and obtaining network communication data of the target software during runtime includes:

[0015] The running status of the target software is monitored in real time by deploying a terminal agent program on the terminal device.

[0016] When the running state is running, the application interface call sequence and system resource usage pattern during the operation of the target software are recorded to form the dynamic behavior data;

[0017] The outbound network traffic generated by the target software is acquired, and the target communication address and communication protocol are parsed from the outbound network traffic to form the network communication data.

[0018] By adopting the above technical solutions, real-time and comprehensive monitoring of the software's runtime status is achieved, improving the depth and reliability of data collection. By recording API (Application Programming Interface) call sequences and system resource usage patterns, unique "behavioral fingerprints" exhibited by the software during operation can be captured. These dynamic behavioral data, compared to easily tampered static filenames or version numbers, are more difficult to forge and possess greater uniqueness, providing a genuine and difficult-to-counterfeit data foundation for subsequent behavior comparisons. Simultaneously, by analyzing the outbound network traffic generated by the software and obtaining its communication addresses and protocols, indirect evidence of the software's behavior can be established from the perspective of network communication. This step significantly enhances the detection system's identification capabilities.

[0019] Optionally, the step of matching and retrieving the corresponding genuine behavior baseline from the preset baseline library based on the declared identity information includes:

[0020] The software identifier and version information are parsed from the declared identity information;

[0021] Based on the software identifier and the version information, a primary key query condition is constructed, and the primary key query condition is used to match the baseline index in the preset baseline library;

[0022] Based on the found baseline index, the corresponding genuine behavior baseline is retrieved from the preset baseline library.

[0023] By adopting the above technical solution, the retrieval of genuine behavior baselines has been made more accurate and efficient. This solution constructs unique primary key query conditions by parsing software identifiers and version information, ensuring that genuine behavior benchmarks perfectly corresponding to the target software's declared identity can be quickly and accurately located from massive baseline data. This indexing and matching mechanism based on precise identity information effectively avoids baseline mismatches caused by software version differences or ambiguous identifiers, providing a unique and authoritative reference standard for subsequent dynamic behavior comparisons. This not only guarantees the accuracy and reliability of behavior deviation detection but also improves the automation level of the detection process and the accuracy of the judgment conclusions.
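The lookup described in paragraphs [0020] to [0022] can be sketched as a two-stage dictionary query. This is an illustrative sketch under stated assumptions: the in-memory dictionaries stand in for the preset baseline library, and the key normalization, baseline identifier "bl-0001", and feature names are hypothetical.

```python
from typing import Optional

# Hypothetical baseline library: an index from primary key to baseline ID,
# and a store from baseline ID to the genuine behavior baseline itself.
BASELINE_INDEX = {("adobe photoshop cc", "23.5.1"): "bl-0001"}
BASELINE_STORE = {
    "bl-0001": {
        "file_writes_per_min": 12.0,
        "registry_reads_per_min": 40.0,
        "cpu_percent": 18.0,
    }
}


def build_primary_key(declared_identity: dict) -> tuple:
    """Parse the software identifier and version into a normalized key."""
    software_id = declared_identity["software_identifier"].strip().lower()
    version = declared_identity["version"].strip()
    return (software_id, version)


def retrieve_baseline(declared_identity: dict) -> Optional[dict]:
    """Match the baseline index first, then fetch the baseline it points to."""
    baseline_id = BASELINE_INDEX.get(build_primary_key(declared_identity))
    if baseline_id is None:
        return None  # no baseline registered for this identity and version
    return BASELINE_STORE.get(baseline_id)
```

Because the version is part of the key, a different version of the same product resolves to a different baseline or to none at all, which is the mismatch-avoidance property the paragraph above describes.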

[0024] Optionally, comparing the dynamic behavior data with the genuine behavior baseline to obtain an offset value, and generating a behavior deviation record when the offset value is greater than or equal to a preset offset threshold, includes:

[0025] The genuine behavior baseline is parsed into a baseline vector, and the baseline vector contains a reference value component that corresponds one-to-one with multiple preset behavior features;

[0026] Extract multiple real-time feature values corresponding to the multiple preset behavioral features from the dynamic behavioral data, and generate a real-time behavioral vector according to the same component order as the baseline vector;

[0027] Calculate the preset vector distance between the real-time behavior vector and the baseline vector, and specify the preset vector distance as the offset value;

[0028] When the offset value is greater than or equal to the preset offset threshold, the absolute value of the difference between each corresponding component of the real-time behavior vector and the baseline vector is calculated, the preset behavior feature corresponding to the component with the largest absolute value is determined as the maximum deviation feature, and the maximum deviation feature is combined with the offset value to generate the behavior deviation record.

[0029] By adopting the above technical solution, a precise detection leap from qualitative judgment to quantitative analysis has been achieved. Vectorizing behavioral features and calculating Euclidean distance quantifies complex software behavior differences into an intuitive offset value, providing a mathematical basis for judging behavioral anomalies and effectively avoiding the uncertainty of subjective judgment. A sensitive and reliable anomaly triggering mechanism is established by triggering alarms through preset offset thresholds. More importantly, by locating the maximum deviation feature through component difference analysis, not only is the existence of the anomaly confirmed, but the specific dimensions of the anomaly are also precisely identified, providing crucial target guidance for subsequent problem diagnosis and response.
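The comparison in paragraphs [0025] to [0028] can be illustrated with a short sketch. It uses the Euclidean distance mentioned above as the preset vector distance; the feature names and the threshold used in the example are hypothetical, and in practice the features would likely need to be normalized to comparable scales first.

```python
import math
from typing import Optional

# Hypothetical preset behavior features; the fixed order defines the
# component order of both the baseline vector and the real-time vector.
FEATURES = ["file_writes_per_min", "registry_reads_per_min", "cpu_percent"]


def to_vector(values: dict) -> list:
    """Build a vector whose components follow the FEATURES order."""
    return [float(values[name]) for name in FEATURES]


def compare_with_baseline(realtime: dict, baseline: dict,
                          offset_threshold: float) -> Optional[dict]:
    """Return a behavior deviation record, or None if below the threshold."""
    rt_vec, base_vec = to_vector(realtime), to_vector(baseline)
    # Offset value: Euclidean distance between the two vectors.
    offset = math.sqrt(sum((r - b) ** 2 for r, b in zip(rt_vec, base_vec)))
    if offset < offset_threshold:
        return None
    # Maximum deviation feature: component with the largest absolute difference.
    diffs = [abs(r - b) for r, b in zip(rt_vec, base_vec)]
    max_feature = FEATURES[diffs.index(max(diffs))]
    return {"offset_value": offset, "max_deviation_feature": max_feature}
```

For example, with a baseline of (12, 40, 18) and a real-time vector of (15, 44, 18), the component differences are 3, 4 and 0, so the offset value is 5 and the maximum deviation feature is the second one.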

[0030] Optionally, the step of fusing the behavioral deviation records and the network anomaly records to generate a risk score for the target software includes:

[0031] Extract the behavior process identifier and behavior occurrence timestamp from the behavior deviation record, and extract the network process identifier and network occurrence timestamp from the network anomaly record;

[0032] Determine whether the time difference between the behavior occurrence timestamp and the network occurrence timestamp is within a preset time window;

[0033] When the time difference is within the preset time window, the operating system of the terminal device is queried and the first process chain information corresponding to the behavior process identifier and the second process chain information corresponding to the network process identifier are obtained based on the behavior process identifier and the network process identifier, respectively.

[0034] Based on the first process chain information, the second process chain information, and the time difference, the time-process correlation strength is determined;

[0035] Based on the time-process correlation strength, a risk weight coefficient is matched and obtained from a preset scenario weight library, and the offset value is weighted using the risk weight coefficient to generate the risk score.

[0036] By adopting the above technical solution, multi-dimensional evidence correlation and fusion analysis of behavioral anomalies and network anomalies is achieved, significantly improving the accuracy and reliability of risk assessment. This solution, by verifying the proximity of timestamps and the homogeneity of process chains, can effectively confirm whether behavioral deviations and network anomalies originate from the coordinated malicious activities of the same software entity, thereby avoiding misjudging unrelated independent events as related risks. Based on this, by calculating the correlation strength of time processes and matching corresponding risk weight coefficients, the system can intelligently assess the logical correlation between the two types of anomaly evidence and accurately weight the initial behavioral deviation value. This weighting mechanism based on evidence correlation provides a quantitative basis for generating differentiated handling instructions, reducing the risk of misjudgment that may arise from a single evidence dimension.
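The fusion step can be sketched as a time-window check followed by a banded weight lookup. This is a hedged illustration: the window length, the strength bands, and the weights are hypothetical, the correlation strength is taken as an already computed input, and the behavior when the two events fall outside the window (returning the unweighted offset value) is an assumption, since the text above does not specify it.

```python
from bisect import bisect_right

TIME_WINDOW_SECONDS = 60.0  # hypothetical preset time window

# Hypothetical scenario weight library: lower bounds of correlation-strength
# bands and the risk weight coefficient assigned to each band.
STRENGTH_BOUNDS = [0.0, 0.3, 0.7]
RISK_WEIGHTS = [1.0, 1.5, 2.5]


def within_window(behavior_ts: float, network_ts: float,
                  window: float = TIME_WINDOW_SECONDS) -> bool:
    """Check whether the two occurrence timestamps are close enough."""
    return abs(behavior_ts - network_ts) <= window


def match_weight(strength: float) -> float:
    """Pick the weight of the band that contains the given strength."""
    band = bisect_right(STRENGTH_BOUNDS, strength) - 1
    return RISK_WEIGHTS[max(band, 0)]


def risk_score(offset_value: float, behavior_ts: float, network_ts: float,
               strength: float) -> float:
    """Weight the offset value by the matched risk weight coefficient."""
    if not within_window(behavior_ts, network_ts):
        # Assumption: unrelated events leave the offset value unweighted.
        return offset_value
    return offset_value * match_weight(strength)
```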

[0037] Optionally, determining the time-process correlation strength based on the first process chain information, the second process chain information, and the time difference includes:

[0038] Based on the time difference, a time correlation score is calculated using a preset time decay function;

[0039] Based on the first process chain information and the second process chain information, the process chain distance is determined, and the process chain correlation score is calculated using a preset inverse correlation function according to the process chain distance.

[0040] The time-process correlation strength is generated by nonlinearly combining the time correlation score and the process chain correlation score.

[0041] By employing the above technical solution, a precise quantitative assessment of the correlation strength between behavior and network anomalies is achieved. This solution calculates temporal correlation scores using a time decay function, ensuring that within a preset time window, the closer the occurrence of anomalies, the stronger their temporal correlation. This aligns with the objective law that security events typically exhibit temporal continuity. Simultaneously, by analyzing process chain information and calculating process chain distances, it is possible to trace and verify whether two anomalies share a common process ancestor at the system level, thereby confirming their homology. The non-linear combination of correlation scores across the time and process chain dimensions, rather than a simple weighted average, more accurately characterizes the complex interaction between the two, avoiding biases that may result from a single dimension dominating or linear superposition. This provides a scientifically reliable basis for subsequent risk-weighted calculations.
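One possible instantiation of paragraphs [0038] to [0040] is sketched below. The application does not fix the specific functions, so every choice here is an assumption made for illustration: exponential decay as the time decay function, 1/(1+d) as the inverse correlation function, the hop count through the nearest common ancestor as the process chain distance, and the geometric mean as the nonlinear combination. Process chains are modeled as lists of process IDs ordered from the process itself up to the root.

```python
import math
from typing import Optional, Sequence


def time_score(time_diff: float, tau: float = 30.0) -> float:
    """Exponential decay: 1.0 at zero difference, falling as the gap grows."""
    return math.exp(-abs(time_diff) / tau)


def chain_distance(chain_a: Sequence[int],
                   chain_b: Sequence[int]) -> Optional[int]:
    """Hops from each process up to the nearest common ancestor, summed.

    Returns None when the two chains share no process at all.
    """
    depth_in_b = {pid: depth for depth, pid in enumerate(chain_b)}
    for depth_a, pid in enumerate(chain_a):
        if pid in depth_in_b:
            return depth_a + depth_in_b[pid]
    return None


def chain_score(distance: Optional[int]) -> float:
    """Inverse correlation: a shorter distance gives a higher score."""
    return 0.0 if distance is None else 1.0 / (1.0 + distance)


def correlation_strength(time_diff: float, chain_a: Sequence[int],
                         chain_b: Sequence[int], tau: float = 30.0) -> float:
    """Geometric mean of the two scores, so either one at zero yields zero."""
    t = time_score(time_diff, tau)
    c = chain_score(chain_distance(chain_a, chain_b))
    return math.sqrt(t * c)
```

With a multiplicative combination, a strong score in one dimension cannot compensate for a zero in the other, which is one way to avoid the single-dimension dominance that the paragraph above warns about.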

[0042] Optionally, the method further includes:

[0043] After the disposal instruction is executed, the system health status indicators of the terminal device are continuously collected;

[0044] By analyzing the changing trends of the system health status indicators before and after the execution of the disposal instruction, the causal correlation strength between the disposal instruction and the return of the system health status indicator values to a preset stable state range is determined;

[0045] When the causal correlation strength is higher than a preset correlation threshold, the combination of the behavior deviation record and the network anomaly record is determined as a verified attack event;

[0046] The maximum deviation feature is extracted from the behavioral deviation records, and the maximum deviation feature is associated and combined with the network anomaly records to obtain a dynamic threat template;

[0047] When the causal correlation strength is not higher than the preset correlation threshold, the behavior deviation record is determined as a benign baseline drift event, and the genuine behavior baseline is corrected based on the benign baseline drift event.

[0048] By adopting the above technical solution, a closed-loop detection system with self-learning and continuous evolution capabilities was constructed. By monitoring changes in the system's health status after the execution of handling commands and analyzing their causal relationship with the handling actions, effective verification and knowledge accumulation of security incident judgments were achieved. When the causal correlation is strong, the system can confirm that previously detected events are genuine attacks and automatically extract key features to generate dynamic threat templates, enriching the system's threat intelligence database. Conversely, when the correlation is insufficient, it is judged as a benign baseline drift, and this feedback is used to adaptively correct the original legitimate behavior baseline, enabling the baseline to be dynamically updated as legitimate software behavior changes. This mechanism gives the entire system the adaptive ability to maintain accuracy in a constantly changing software environment.
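The feedback loop in paragraphs [0043] to [0047] can be sketched as follows. The application does not specify how the causal correlation strength is computed or how the baseline is corrected, so both are assumptions here: the strength is approximated as the increase in the share of health samples inside the stable range after disposal, and the baseline is corrected with an exponential moving average. Function names and record fields are hypothetical.

```python
def share_in_range(samples, low, high):
    """Fraction of health indicator samples inside the stable state range."""
    if not samples:
        return 0.0
    return sum(low <= s <= high for s in samples) / len(samples)


def causal_strength(before, after, low, high):
    """Assumed proxy: how much more often the indicator is stable afterwards."""
    gain = share_in_range(after, low, high) - share_in_range(before, low, high)
    return max(0.0, gain)


def feedback(strength, threshold, deviation_record, network_record,
             baseline, observed, alpha=0.2):
    """Return (event type, dynamic threat template, possibly corrected baseline)."""
    if strength > threshold:
        # Verified attack: combine the maximum deviation feature with the
        # network anomaly record into a dynamic threat template.
        template = {
            "max_deviation_feature": deviation_record["max_deviation_feature"],
            "network_anomaly": network_record,
        }
        return "verified_attack", template, baseline
    # Benign drift: move each baseline component toward the observed value.
    corrected = {
        name: (1 - alpha) * value + alpha * observed[name]
        for name, value in baseline.items()
    }
    return "benign_baseline_drift", None, corrected
```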

[0049] A second aspect of this application provides an electronic device including a processor, a memory, a user interface, and a network interface, wherein the memory is used to store instructions, the user interface and the network interface are both used to communicate with other devices, and the processor is used to execute the instructions stored in the memory to cause the electronic device to perform the method as described in any of the foregoing.

[0050] A third aspect of this application provides a computer-readable storage medium storing instructions that, when executed, perform the method described in any of the preceding descriptions.

[0051] A fourth aspect of this application provides a computer program product that, when run on an electronic device, causes the electronic device to perform the method as described in any of the preceding claims.

[0052] In summary, one or more technical solutions provided in the embodiments of this application have at least the following technical effects or advantages:

[0053] By constructing a multi-layered detection system encompassing "static feature recognition, dynamic behavior verification, network communication correlation, and closed-loop feedback learning," this approach effectively addresses the false negatives caused by the susceptibility to tampering with static features in traditional solutions. The solution first uses vectorized comparison of difficult-to-forge dynamic behavior data with a legitimate baseline, achieving a fundamental leap from surface feature matching to intrinsic behavioral authenticity. Then, through dual verification using time windows and process chains, it intelligently correlates behavioral anomalies with network anomalies, forming a multi-dimensional evidence chain. Finally, through system status feedback after handling, it achieves closed-loop optimization of attack event verification and baseline self-learning. This technical approach not only significantly improves the accuracy and robustness of software compliance determination but also endows the system with continuous evolutionary adaptive capabilities, reducing the risk of false negatives due to feature forgery.

Attached Figure Description

[0054] Figure 1 is a schematic diagram of the system architecture of an embodiment of a software authenticity detection method according to this application;

[0055] Figure 2 is a flowchart illustrating a software authenticity detection method disclosed in an embodiment of this application;

[0056] Figure 3 is another flowchart illustrating a software authenticity detection method disclosed in an embodiment of this application;

[0057] Figure 4 is a schematic diagram of the structure of an electronic device disclosed in an embodiment of this application.

[0058] Explanation of reference numerals in the attached figures: 100, System architecture; 101, First terminal device; 102, Second terminal device; 103, Third terminal device; 104, Network; 105, Server; 401, Processor; 402, Communication bus; 403, User interface; 404, Network interface; 405, Memory.

Detailed Implementation

[0059] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.

[0060] In the description of the embodiments of this application, the words "for example" or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design that is described as "for example" or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design options. Rather, the use of the words "for example" or "for instance" is intended to present the relevant concepts in a specific manner.

[0061] In the description of the embodiments of this application, the term "multiple" means two or more. For example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof all mean "including but not limited to," unless otherwise specifically emphasized.

[0062] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, etc.

[0063] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as model training applications, video recognition applications, web browser applications, social platform software, etc.

[0064] Terminal devices 101, 102, and 103 can be either hardware or software. When terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with displays, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops, and desktop computers, etc. When terminal devices 101, 102, and 103 are software, they can be installed in the aforementioned electronic devices. They can be implemented as multiple software programs or software modules (e.g., multiple software programs or software modules used to provide distributed services) or as a single software program or software module. No specific limitations are imposed here.

[0065] This embodiment discloses a method for detecting software authenticity. Figure 2 is a flowchart illustrating a software authenticity detection method disclosed in an embodiment of this application. As shown in Figure 2, the method includes the following steps:

[0066] S201. Obtain static feature data of the target software on the terminal device, and determine the declared identity information of the target software based on the static feature data;

[0067] The core purpose of this step is to create a preliminary identity profile for a specific software instance installed or running on any computing node within the enterprise network, that is, to accurately identify the identity "declared" by the software to the operating system and the external world. Here, "terminal device" can be understood as any computing entity that hosts the software, such as, but not limited to, desktop computers and laptops used by employees daily, as well as physical servers, virtual machines, or containers deployed in the data center. "Target software" specifically refers to any software program instance on these terminal devices that is of interest to this detection system and requires compliance analysis, such as a running process named "photoshop.exe" or its corresponding installed program file on the hard drive. This step establishes a clear and comparable identity benchmark for subsequent dynamic behavior verification, serving as the logical starting point for the entire intelligent detection process.

[0068] To determine the claimed identity of target software, the system first needs to comprehensively acquire its "static characteristic data." This "static characteristic data" refers to relatively fixed and unchanging descriptive information that is independent of the software's actual running state, can be directly read from the software files themselves or their configuration records in the operating system, and collectively constitutes the software's "static fingerprint." Specifically, there are various ways to acquire this data. A common implementation is that an agent program deployed on the terminal device directly parses the target software's main program file. By reading the file's attribute metadata, information such as the product name, file description, company or developer name, and crucial file version and product version number can be obtained. Simultaneously, the agent program performs a hash operation on the file to generate a unique digital fingerprint (e.g., a SHA-256 hash value) and rigorously verifies its accompanying digital signature to confirm that the signature was indeed issued by a legitimate and trustworthy software vendor. This is a key basis for determining the file's originality and integrity. In addition, as another important source of information, especially in the Windows operating system environment, the agent program can also actively query the system registry. By traversing key paths such as "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall", it can collect a large amount of official information written by the software's standard installer, including its display name, display version, and publisher registered in the system.
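The file hashing part of this acquisition step can be illustrated with a short sketch using Python's standard hashlib module. The function name and chunk size are illustrative choices; signature verification and registry queries are platform-specific and are not shown.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use bounded even for large program files, which matters for an agent running on end-user terminals.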

[0069] After successfully acquiring static feature data from one or more of the aforementioned sources, the system performs a standardized processing procedure to ultimately determine a structured "claimed identity information." This process aims to extract the most core and authoritative identity identifier from potentially mixed and inconsistent raw data through cleaning, deduplication, cross-validation, and field mapping. For example, the system may design a set of priority rules, prioritizing the acceptance of vendor information and product names from digital signatures, and combining them with the version number in the registry to construct the final identity. The final generated "claimed identity information" can be understood as a precise and standardized data object; it is no longer fragmented raw data but a logical whole. In a preferred embodiment, this information contains at least two core fields: a "software identifier" (e.g., uniformly specified as "Adobe Photoshop CC") to uniquely identify the software product line, and a "version information" (e.g., "23.5.1") to indicate the specific release version.
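The priority rule in the example above can be sketched as a "first usable value wins" selection across sources. This is only an illustration: the dictionary field names (such as "product_name" and "display_version") and the exact fallback order beyond what the paragraph states are assumptions.

```python
from typing import Optional


def first_usable(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is non-empty after trimming."""
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None


def determine_declared_identity(signature: dict, file_metadata: dict,
                                registry: dict) -> dict:
    """Merge raw static feature data into one structured identity object.

    Vendor and product name prefer the digital signature; the version
    prefers the registry. Other sources serve only as fallbacks.
    """
    return {
        "software_identifier": first_usable(
            signature.get("product_name"),
            registry.get("display_name"),
            file_metadata.get("product_name"),
        ),
        "vendor": first_usable(
            signature.get("vendor"),
            registry.get("publisher"),
            file_metadata.get("company_name"),
        ),
        "version": first_usable(
            registry.get("display_version"),
            file_metadata.get("product_version"),
        ),
    }
```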

[0070] Regarding the triggering timing of step S201, this application also provides several flexible and configurable solutions to adapt to different management needs and application scenarios. The first solution can be configured as a "periodic full scan" mode. In this mode, the terminal agent performs a full scan of the entire terminal device's file system and registry at preset time intervals (e.g., every 24 hours) to inventory all installed software and establish or update its static identity profile. This solution is suitable for periodic IT asset inventory and comprehensive compliance audits. The second solution, as a more efficient and real-time implementation, can be configured as an "event-driven real-time monitoring" mode. In this mode, the agent monitors critical operating system events in real time. For example, when a new process is created or a new software installer is executed, the agent is immediately activated and, only for the target software corresponding to that new process or installer, immediately acquires the static characteristic data and determines the declared identity. This solution has the advantages of rapid response and low system resource consumption, and can detect newly introduced unknown or potentially risky software in a timely manner, making it particularly suitable for implementing real-time access control for software installation and operation.

[0071] S202. When the target software is running, collect the dynamic behavior data of the target software and obtain the network communication data of the target software;

[0072] The "target software runtime" step in this section specifically refers to the point where the target software's program code has been loaded into the terminal device's memory and its instructions have begun to be executed by the Central Processing Unit (CPU), manifesting as an active process at the operating system level. At this stage, the software is no longer a static file but a dynamic entity that interacts in real time with the operating system and external networks. Its every action can reveal its true intentions, regardless of how its static characteristics may be disguised. The "dynamic behavioral data" here can be defined as a set of observable events, actions, and state changes generated during software runtime through its interactions with the operating system kernel, system resources, and other applications. This data constitutes a "behavioral profile" depicting the actual working logic of the software.

[0073] Optionally, during the operation of the target software, collecting dynamic behavior data of the target software and obtaining network communication data of the target software includes: monitoring the running status of the target software in real time through a terminal agent program deployed on the terminal device; when the running status is running, recording the application programming interface call sequence and system resource usage pattern during the operation of the target software to form the dynamic behavior data; obtaining outbound network traffic generated by the target software, and parsing the target communication address and communication protocol from the outbound network traffic to form the network communication data.

[0074] This embodiment achieves this by deploying a driver agent at the operating system kernel level. This terminal agent registers a process creation callback routine by calling a kernel function (e.g., PsSetCreateProcessNotifyRoutine in Windows), thereby monitoring and accurately determining in real time whether the target software has entered the "runtime" state. Upon confirmation, the agent uses a kernel-level event tracing mechanism (e.g., Event Tracing for Windows, ETW) to subscribe to the kernel event providers related to the target process (such as file I/O and registry access), non-intrusively recording its complete application programming interface (API) call sequence. Simultaneously, it periodically obtains precise system resource usage patterns, such as CPU time and memory working set, by directly accessing the process control block (EPROCESS structure) in the kernel, forming the dynamic behavior data. For network communication, the driver agent uses the Windows Filtering Platform (WFP) to register a callout (callback) function at the network connection establishment layer (FWPM_LAYER_ALE_FLOW_ESTABLISHED_V4). When the target process initiates an outbound network connection, the callout function can directly obtain the process ID, target communication address, and communication protocol (such as TCP/UDP and port) of the connection from the metadata provided by the system, thereby accurately forming the network communication data.

[0075] Another alternative implementation employs a lightweight agent program running in user space. This terminal agent periodically calls system APIs (such as the combination of CreateToolhelp32Snapshot and Process32First/Process32Next in Windows) to traverse the list of active system processes, comparing process image paths to determine whether the target software is in a "running" state. When the target software is detected to be running, the agent program loads a monitoring DLL (Dynamic Link Library) into the target software's process space using remote thread injection. This DLL modifies the target process's import address table to hook critical application interfaces (such as CreateFile and send), recording their call parameters and sequences before the original functions are called. Simultaneously, the agent program obtains the target process's system resource usage patterns by calling standard APIs such as GetProcessMemoryInfo and GetProcessTimes; together these constitute the dynamic behavior data. To acquire network communication data, the agent program uses the libpcap/WinPcap library to capture all outbound network traffic on the local machine. In parallel, it calls APIs such as GetExtendedTcpTable to obtain the system network connection table (containing PID and connection information). By comparing the captured data packets with the connection table, it can filter out the outbound network traffic generated by the target software and parse out the target communication address and communication protocol.
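The final correlation step of this user-space scheme, matching captured packets against the connection table to isolate the target software's traffic, can be sketched as below. The sketch works on already-parsed records; the `Connection` and `Packet` types and the 4-tuple matching rule are illustrative assumptions, and no real capture library or Windows API is called.

```python
from typing import List, NamedTuple, Tuple


class Connection(NamedTuple):
    """One row of a (hypothetical, already parsed) system connection table."""
    pid: int
    local_port: int
    remote_addr: str
    remote_port: int
    protocol: str


class Packet(NamedTuple):
    """One captured outbound packet, reduced to the fields needed for matching."""
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


def outbound_for_pid(packets: List[Packet], connections: List[Connection],
                     target_pid: int) -> List[Tuple[str, int, str]]:
    """Return the distinct (address, port, protocol) endpoints contacted by target_pid."""
    # Index the connections owned by the target process by their 4-tuple.
    owned = {(c.protocol, c.local_port, c.remote_addr, c.remote_port)
             for c in connections if c.pid == target_pid}
    endpoints, seen = [], set()
    for p in packets:
        if (p.protocol, p.src_port, p.dst_addr, p.dst_port) in owned:
            endpoint = (p.dst_addr, p.dst_port, p.protocol)
            if endpoint not in seen:  # keep first-seen order, drop duplicates
                seen.add(endpoint)
                endpoints.append(endpoint)
    return endpoints
```

A real agent would also have to handle connections that open and close between two snapshots of the connection table, which this sketch ignores.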

[0076] S203. Based on the declared identity information, match and retrieve the corresponding genuine behavior baseline from the preset baseline library;

[0077] To achieve the above objectives, the system relies on a core knowledge asset: a "pre-built baseline library." This "pre-built baseline library" is not a simple list of software, but a large-scale, structured database pre-built and continuously maintained by the system's operator or security service provider. The database is typically built in a highly controlled, cleanroom environment isolated from external networks. In this environment, those skilled in the art install and run a vast number of official, multi-version genuine software programs, simulating real-world user operations in various typical scenarios (such as first-time startup, execution of core functions, online updates, and idle standby) to comprehensively collect dynamic behavioral data and network communication data exhibited by these genuine software programs under completely normal conditions. After undergoing a series of data science processes, including statistical analysis, feature extraction, and pattern clustering, this raw data is refined and solidified into standard, machine-readable "genuine behavior baselines," which are then stored in the library. Furthermore, the establishment and updating of this baseline library can be continuously learned from numerous trusted production environments, automatically summarizing the normal behavioral patterns of software in real business scenarios to further improve the coverage and adaptability of the baselines.

[0078] Optionally, the step of matching and retrieving the corresponding genuine behavior baseline from the preset baseline library based on the declared identity information includes: parsing the software identifier and version information from the declared identity information; constructing a primary key query condition based on the software identifier and the version information, and using the primary key query condition to match the baseline index in the preset baseline library; and retrieving the corresponding genuine behavior baseline from the preset baseline library based on the found baseline index.

[0079] In one specific embodiment, the preset baseline library is stored and managed using a relational database (such as MySQL or PostgreSQL). From the declared identity information uploaded by the terminal agent, the system first parses out the hash value (e.g., SHA256 value) that uniquely identifies the main software program as the "software identifier," and the version number string of the executable file as the "version information." Subsequently, the system combines the hash value and the version number string to construct a primary key query condition and initiates a query to the "BaselineIndex Table" in the baseline library. This index table uses the combination of the software hash value and the version number as a composite primary key. A query statement is, for example: SELECT BaselineDataID FROM BaselineIndex WHERE SoftwareIdentifier='...' AND Version='...'. After a successful query, the system obtains a unique baseline data ID, which is used as a foreign key to link to the "baseline data table." The system then uses this ID to perform a secondary query in the data table, such as SELECT BaselineContent FROM BaselineData WHERE ID='...', thereby retrieving the complete, serialized, genuine behavior baseline data structure stored in a large binary object or JSONB field.
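The two-step lookup described above can be sketched with Python's built-in sqlite3 module standing in for MySQL or PostgreSQL. The table and column names follow the example queries in the text; the demo row, the "sha256:demo" identifier, and the baseline content are placeholders invented for illustration. Parameterized queries are used here instead of string concatenation, which is a design choice of the sketch.

```python
import json
import sqlite3


def open_demo_library() -> sqlite3.Connection:
    """Create an in-memory baseline library with one illustrative row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BaselineData (
            ID INTEGER PRIMARY KEY,
            BaselineContent TEXT NOT NULL
        );
        CREATE TABLE BaselineIndex (
            SoftwareIdentifier TEXT NOT NULL,
            Version TEXT NOT NULL,
            BaselineDataID INTEGER NOT NULL REFERENCES BaselineData(ID),
            PRIMARY KEY (SoftwareIdentifier, Version)
        );
    """)
    content = json.dumps({"cpu_percent": [5, 20], "memory_mb": [400, 600]})
    conn.execute("INSERT INTO BaselineData (ID, BaselineContent) VALUES (?, ?)", (1, content))
    conn.execute("INSERT INTO BaselineIndex VALUES (?, ?, ?)", ("sha256:demo", "23.5.1", 1))
    return conn


def fetch_baseline(conn: sqlite3.Connection, software_identifier: str, version: str):
    """Query the index by composite primary key, then load the baseline content."""
    row = conn.execute(
        "SELECT BaselineDataID FROM BaselineIndex "
        "WHERE SoftwareIdentifier = ? AND Version = ?",
        (software_identifier, version),
    ).fetchone()
    if row is None:
        return None  # no genuine baseline registered for this claimed identity
    data = conn.execute(
        "SELECT BaselineContent FROM BaselineData WHERE ID = ?", (row[0],)
    ).fetchone()
    return json.loads(data[0]) if data else None
```

A `None` result means the claimed identity has no registered baseline; how the system treats that case is outside this sketch.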

[0080] As an alternative implementation, the preset baseline library can be constructed using a document-oriented NoSQL database (such as MongoDB or Elasticsearch). In this scheme, the software identifier (e.g., Adobe_Photoshop) and version information (e.g., 23.5.1) parsed from the declared identity information are used to construct a unique document ID. Specifically, the system concatenates the two using a preset separator (such as an underscore) to form the string Adobe_Photoshop_23.5.1, which serves as the primary key query condition in the database and directly corresponds to the _id field of a document in the baseline library collection. The system performs an efficient key-value lookup operation, such as db.baselines.findOne({_id:"Adobe_Photoshop_23.5.1"}) in MongoDB. Since MongoDB indexes the _id field by default, this matching process is extremely fast. Once a match is successful, the database directly returns the entire JSON document corresponding to the _id. The system then extracts the preset baseline field content from the document. This content records the corresponding genuine behavior baseline in detail in the form of a hierarchical JSON object, including its API call pattern, network communication characteristics, and resource consumption range.

[0081] S204. Compare the dynamic behavior data with the genuine behavior baseline to obtain the offset value. When the offset value is greater than or equal to a preset offset threshold, generate a behavior deviation record.

[0082] The system first decomposes the dynamic behavior data and the genuine behavior baseline into multiple quantifiable comparison dimensions. For example, for application programming interface call sequences, the system can use an edit-distance-based sequence alignment algorithm (such as the Levenshtein distance algorithm) to calculate the difference between the dynamically collected API call sequence and the legitimate call pattern defined in the baseline, obtaining a "sequence difference score". For system resource usage, the system compares the collected average CPU usage, peak memory usage, and other values with the normal ranges set in the baseline (e.g., CPU [5%, 20%], memory [400MB, 600MB]), calculating the percentage by which the range is exceeded as a "resource over-limit score". Next, the system assigns a preset weight to the score of each dimension (e.g., the weight of abnormal API sequences is higher than the weight of a slight memory over-limit), and aggregates these sub-scores into a comprehensive "offset value" through weighted summation. The formula for calculating this offset value can be configured as: Offset value = w1·sequence difference score + w2·resource over-limit score + ..., where w1 and w2 are preset weights. The system then compares the calculated offset value with a dynamically adjustable "preset offset threshold" (e.g., 0.8). If the offset value is greater than or equal to the threshold, the system determines that a significant behavioral deviation has occurred and immediately generates a detailed "behavior deviation record." This record not only includes the final offset value and the triggered threshold, but also specifically indicates the dimension comparison that led to the deviation judgment (e.g., "API call 'CreateRemoteThread' not in the baseline was found" or "CPU utilization exceeds the baseline range by 35%").
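The weighted-sum calculation above can be sketched as follows. The normalization of the edit distance by the longer sequence length, the way the over-limit percentage is taken relative to the nearest bound, and the example weights are assumptions of this sketch; the text fixes only the general form Offset value = w1·sequence difference score + w2·resource over-limit score + ....

```python
from typing import Dict, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming edit distance between two call sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def sequence_difference_score(observed: Sequence[str], baseline: Sequence[str]) -> float:
    """Edit distance normalized to [0, 1] by the longer sequence (assumed normalization)."""
    longest = max(len(observed), len(baseline))
    return levenshtein(observed, baseline) / longest if longest else 0.0


def over_limit_score(value: float, low: float, high: float) -> float:
    """Fraction by which `value` leaves [low, high], relative to the nearest bound."""
    if value > high:
        return (value - high) / high if high else 0.0
    if value < low:
        return (low - value) / low if low else 0.0
    return 0.0


def offset_value(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-dimension sub-scores."""
    return sum(weights[name] * score for name, score in scores.items())


observed = ["CreateFile", "ReadFile", "CreateRemoteThread", "send"]
baseline = ["CreateFile", "ReadFile", "send"]
scores = {
    "sequence": sequence_difference_score(observed, baseline),  # one extra call out of four
    "cpu": over_limit_score(27.0, 5.0, 20.0),                   # 35% above the upper bound
}
offset = offset_value(scores, {"sequence": 0.7, "cpu": 0.3})    # hypothetical weights
```

With these illustrative weights the offset is about 0.28, which stays below the example threshold of 0.8, so no behavior deviation record would be generated in this case.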

[0083] Optionally, the step of comparing the dynamic behavior data with the genuine behavior baseline to obtain an offset value, and generating a behavior deviation record when the offset value is greater than or equal to a preset offset threshold, includes: parsing the genuine behavior baseline into a baseline vector, the baseline vector containing a base value component corresponding one-to-one with multiple preset behavior features; extracting multiple real-time feature values corresponding to the multiple preset behavior features from the dynamic behavior data, and generating a real-time behavior vector in the same component order as the baseline vector; calculating a preset vector distance between the real-time behavior vector and the baseline vector, and designating the preset vector distance as the offset value; when the offset value is greater than or equal to the preset offset threshold, calculating the absolute value of the difference between each corresponding component of the real-time behavior vector and the baseline vector, determining the preset behavior feature corresponding to the component with the largest absolute value as the maximum deviation feature, and combining the maximum deviation feature with the offset value to generate the behavior deviation record.

[0084] To achieve this comparison, a preferred embodiment first vectorizes the behavioral data. Specifically, the system predefines a behavioral feature space with N (N is a positive integer) dimensions, where each dimension i (i = 1, 2, ..., N) corresponds to a quantifiable software behavior indicator, such as the call frequency of a specific API, the number of connections to a network port, or the read/write entropy of a file. The genuine behavior baseline is represented as an N-dimensional baseline vector $b = (b_1, b_2, \ldots, b_N)$, where component $b_i$ represents the distribution parameter of the standard or expected value of the software on the i-th behavioral feature. Correspondingly, the dynamic behavior data collected from real-time monitoring is also mapped to an N-dimensional real-time behavior vector $r = (r_1, r_2, \ldots, r_N)$ in the same feature space, where component $r_i$ represents the actual observed value of the software on the i-th behavioral feature.

[0085] After vectorizing the behavior data, the calculation of the offset value is transformed into measuring the distance or dissimilarity between the real-time behavior vector r and the baseline vector b in the N-dimensional feature space. This application provides a variety of optional measurement schemes.

[0086] As a preferred implementation, this offset value can be determined by calculating the Euclidean distance between the two vectors. This distance is the classic measure of the straight-line distance between two points in multidimensional space, and its calculation formula is as follows:

$$D_{\mathrm{E}} = \lVert r - b \rVert_2 = \sqrt{\sum_{i=1}^{N} (r_i - b_i)^2}$$

[0087] In this formula, $D_{\mathrm{E}}$ is the calculated offset value; $\lVert r - b \rVert_2$ denotes the L2 norm of the difference vector (r - b); N is the total number of dimensions in the behavioral feature space; $r_i$ is the component value of the real-time behavior vector r in the i-th dimension, that is, the actual observed value of the i-th behavioral feature; and $b_i$ is the component value of the baseline vector b in the i-th dimension, that is, the baseline value of the i-th behavioral feature.

[0088] As an alternative implementation, this offset value can also be determined by calculating the Manhattan distance, also known as the L1 distance, between the two vectors. This method sums the absolute differences between the two points along each coordinate axis, and the formula is as follows:

[0089] $$D_{\mathrm{M}} = \lVert r - b \rVert_1 = \sum_{i=1}^{N} \lvert r_i - b_i \rvert$$

[0090] In this formula, $D_{\mathrm{M}}$ is the calculated offset value, and $\lVert r - b \rVert_1$ denotes the L1 norm of the difference vector (r - b). The meanings of the other symbols are the same as those defined in the aforementioned Euclidean distance scheme. Compared with the Euclidean distance, the Manhattan distance linearly accumulates deviations across all dimensions with equal weights. Its result is less sensitive to extreme outliers in individual dimensions than the Euclidean distance, and it better reflects the overall cumulative amount of behavioral deviation.
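The two distance schemes, together with the selection of the maximum deviation feature described in the optional step above, can be sketched as follows. The feature names and the dictionary layout of the record are illustrative assumptions; the sketch also assumes the features have already been scaled to comparable ranges, which the text does not specify.

```python
import math
from typing import Callable, List, Optional, Sequence


def euclidean(r: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: square root of the summed squared component differences."""
    return math.sqrt(sum((ri - bi) ** 2 for ri, bi in zip(r, b)))


def manhattan(r: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance: sum of the absolute component differences."""
    return sum(abs(ri - bi) for ri, bi in zip(r, b))


def deviation_record(features: List[str], r: Sequence[float], b: Sequence[float],
                     threshold: float,
                     distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean
                     ) -> Optional[dict]:
    """Return a behavior deviation record, or None when the offset is below the threshold."""
    if not (len(features) == len(r) == len(b)):
        raise ValueError("features, r and b must share the same component order and length")
    offset = distance(r, b)
    if offset < threshold:
        return None
    # The component with the largest absolute difference is the maximum deviation feature.
    diffs = [abs(ri - bi) for ri, bi in zip(r, b)]
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    return {
        "offset_value": offset,
        "distance_method": distance.__name__,
        "max_deviation_feature": features[worst],
        "baseline_value": b[worst],
        "observed_value": r[worst],
    }


features = ["api_call_frequency", "port_connections", "file_rw_entropy"]  # hypothetical
baseline_vector = [10.0, 2.0, 4.0]
realtime_vector = [13.0, 6.0, 4.0]
record = deviation_record(features, realtime_vector, baseline_vector, threshold=4.0)
```

For these sample vectors the component differences are 3, 4 and 0, so the Euclidean offset is 5 and the Manhattan offset is 7; the record names "port_connections" as the maximum deviation feature.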

[0091] After calculating the offset value using any of the above methods, the system compares this value with a dynamically configurable preset offset threshold. This threshold is the critical value for determining whether the behavior constitutes an anomaly. Its specific value can be flexibly set by the security administrator based on the severity level of the security policy, the importance of the monitored software, and the characteristics of the selected distance measurement method, thereby achieving a balance between detection sensitivity and false alarm rate.

[0092] If the system determines that the calculated offset value is greater than or equal to the preset offset threshold, it indicates that the target software's actual behavior has significantly deviated from its known legitimate behavior pattern, constituting a potential security risk. At this point, the system automatically generates a structured behavior deviation record. This record, as a detailed event snapshot, preferably includes the event timestamp, the target software's process information (such as process ID and path), the final calculated offset value, the distance calculation method used in this determination, the key deviation dimensions that triggered the threshold (i.e., one or more behavioral features that contribute the most to the total offset value), and a comparison between the baseline values and actual observations for these key dimensions. This detailed record provides crucial data support for subsequent security responses, threat tracing, and digital forensics.

[0093] S205. In response to generating the behavior deviation record, analyze the network communication data, and when the network communication data meets the preset network anomaly conditions, generate a network anomaly record;

[0094] Specifically, in response to generating the behavior deviation record, the system can immediately perform a specialized analysis on the network communication data collected within the same time window to determine whether it meets one or more preset network anomaly conditions. These conditions can be configured as various rules, such as, but not limited to: comparing the target IP address or domain name in the communication data with a malicious reputation database containing known command and control servers; analyzing the packet size and time interval of network traffic to identify periodic "heartbeat" communication patterns pointing to unconventional servers; or detecting whether there is protocol abuse, such as transmitting non-standard command data through the DNS protocol. Once the network communication data triggers any preset anomaly condition, the system generates a network anomaly record, which not only indicates the specific anomaly type (e.g., "connected to a known malicious server"), but also contains relevant communication metadata (such as target IP, port, timestamp, etc.) and is associated with the previous behavior deviation record, thereby achieving dual corroboration of internal behavior anomalies and external network anomalies.
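Two of the example conditions above, the reputation lookup and the periodic "heartbeat" pattern, can be sketched as simple rules. The reputation set (documentation-reserved addresses), the coefficient-of-variation test for periodicity, and all thresholds are assumptions chosen for illustration; the text does not prescribe a particular periodicity test, and the sketch does not check whether the heartbeat target is an "unconventional" server.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical reputation entries (reserved documentation host and address).
MALICIOUS_HOSTS = {"c2.example.net", "203.0.113.7"}


def is_heartbeat(timestamps: List[float], max_cv: float = 0.1, min_events: int = 5) -> bool:
    """Flag near-constant intervals between connections to the same host."""
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    # Coefficient of variation: small values mean highly regular intervals.
    return average > 0 and pstdev(gaps) / average <= max_cv


def network_anomalies(connections: List[dict]) -> List[dict]:
    """Each connection is a dict with 'host', 'port' and 'timestamp' keys."""
    records: List[dict] = []
    by_host: Dict[str, List[float]] = {}
    for conn in connections:
        by_host.setdefault(conn["host"], []).append(conn["timestamp"])
        if conn["host"] in MALICIOUS_HOSTS:
            records.append({"type": "connected to a known malicious server",
                            "host": conn["host"], "port": conn["port"],
                            "timestamp": conn["timestamp"]})
    for host, stamps in by_host.items():
        if is_heartbeat(stamps):
            records.append({"type": "periodic heartbeat communication",
                            "host": host, "timestamp": min(stamps)})
    return records
```

A production rule set would also link each record to the triggering behavior deviation record, as the paragraph describes.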

[0095] S206. Merge the behavioral deviation records and the network anomaly records to generate a risk score for the target software;

[0096] In this step, the fusion process is not simply about juxtaposing two records, but rather about deeply analyzing and correlating the rich information they contain. The behavioral deviation record details specific abnormal behaviors of the process in areas such as file operations, registry access, process creation, or memory operations, along with their context; while the network anomaly record provides evidence of threats at the network communication level, such as communication with malicious IPs, the existence of covert channels, or data leakage. This step aims to combine these two types of evidence, representing "internal modus operandi" and "external communication channels" respectively, and use an algorithmic model to determine the strength of their inherent logical correlation, thereby providing a comprehensive risk assessment that is far more reliable than that of a single piece of evidence.

[0097] Optionally, the step of fusing the behavior deviation record and the network anomaly record to generate a risk score for the target software includes: extracting a behavior process identifier and a behavior occurrence timestamp from the behavior deviation record, and extracting a network process identifier and a network occurrence timestamp from the network anomaly record; determining whether the time difference between the behavior occurrence timestamp and the network occurrence timestamp is within a preset time window; when the time difference is within the preset time window, querying and obtaining first process chain information corresponding to the behavior process identifier and second process chain information corresponding to the network process identifier from the operating system of the terminal device, respectively, based on the behavior process identifier and the network process identifier; determining the time process correlation strength based on the first process chain information, the second process chain information, and the time difference; matching and obtaining risk weight coefficients from a preset scene weight library according to the time process correlation strength, and using the risk weight coefficients to perform weighted calculations on the offset value to generate the risk score.

[0098] In this embodiment, the scheme is used to detect high-risk behaviors directly triggered by the same process. First, the system extracts a behavior process identifier (PID) of 1234 (e.g., corresponding to svchost.exe) and a behavior occurrence timestamp of 1678886400.500 from the behavior deviation record. The recorded behavior is writing a file to a critical system directory. Simultaneously, the system extracts a network process identifier of 1234 and a network occurrence timestamp of 1678886400.650 from the network anomaly record. The recorded network behavior is connecting to a known malicious domain. The system determines that the difference between the two timestamps (0.15 seconds) is within a preset time window (e.g., 2 seconds). Since the two process identifiers are the same, the first process chain information and the second process chain information obtained by the system after querying the operating system are completely consistent; for example, both are "services.exe->svchost.exe(1234)". Based on this, the system sets the time-process association strength to the highest level, "direct association within the same process". Subsequently, based on this association strength, the system matches and obtains a high risk weight coefficient from a preset scenario weight library, for example, 3.0. Finally, the system uses this coefficient to weight the offset value taken from the behavior deviation record (e.g., 60), generating a final risk score of 60 × 3.0 = 180 and thereby upgrading a moderately suspicious behavior to a high-risk alarm.

[0099] In another embodiment, this scheme is used to identify covert attacks involving collaboration between parent and child processes. The system extracts the behavior process identifier PID 2345 (corresponding to WINWORD.EXE) and timestamp 1678886401.200 from the behavior deviation record, which indicates the execution of a suspicious macro script; and extracts the network process identifier PID 6789 (corresponding to powershell.exe) and timestamp 1678886401.800 from the network anomaly record, which indicates the initiation of DNS tunnel communication. After confirming that the 0.6-second time difference is within the preset time window, the system queries the process chain information for both PIDs: the first process chain is "explorer.exe->WINWORD.EXE(2345)", and the second process chain is "explorer.exe->WINWORD.EXE(2345)->powershell.exe(6789)". By comparison, the system finds that the first process chain is a prefix of the second process chain, and thus determines that WINWORD.EXE is the parent process of powershell.exe. Based on this parent-child relationship and the small time difference, the system sets the time-process association strength to the level "indirect triggering between parent and child processes". Based on this strength, a secondary but still significant risk weight coefficient, such as 2.0, is matched from the scenario weight library, and the original offset value (e.g., 50) is weighted to generate a risk score of 50 × 2.0 = 100, which effectively reveals an attack chain that uses a legitimate program to launch a malicious payload.

[0100] Optionally, determining the temporal process correlation strength based on the first process chain information, the second process chain information, and the time difference includes: calculating a time correlation score using a preset time decay function based on the time difference; determining the process chain distance based on the first process chain information and the second process chain information, and calculating the process chain correlation score using a preset inverse correlation function based on the process chain distance; and performing a nonlinear combination of the time correlation score and the process chain correlation score to generate the temporal process correlation strength.

[0101] Specifically, the system extracts the timestamps from the behavior deviation record and the network anomaly record, calculates the absolute value of the time difference between them, and denotes it as Δt. Then, the system calls a preset time decay function to calculate the time correlation score S_time. A preferred implementation is to use an exponential decay function, with the formula, for example: S_time = exp(-λ·Δt). In this formula, exp is the natural exponential function, and λ is a time decay coefficient that can be configured according to the actual security scenario requirements and controls how quickly the score decays as the time difference increases. For example, when λ is configured to 0.5, two events that are 1 second apart have a time correlation score of approximately 0.607, while two events that are 4 seconds apart have a score of only approximately 0.135. This method maps any non-negative time difference to a continuous score in the interval (0, 1], thereby quantifying the common understanding that the closer two events are in time, the stronger their correlation.
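The exponential decay above is small enough to check numerically. In this sketch the function and parameter names are illustrative; only the formula S_time = exp(-λ·Δt) and the example value λ = 0.5 come from the text.

```python
import math


def time_correlation_score(t_behavior: float, t_network: float, decay: float = 0.5) -> float:
    """S_time = exp(-decay * |t_behavior - t_network|), a value in (0, 1]."""
    if decay <= 0:
        raise ValueError("the time decay coefficient must be positive")
    delta_t = abs(t_behavior - t_network)
    return math.exp(-decay * delta_t)


one_second_apart = time_correlation_score(0.0, 1.0)    # exp(-0.5)
four_seconds_apart = time_correlation_score(0.0, 4.0)  # exp(-2.0)
```

Simultaneous events score exactly 1, and the score never reaches 0, which is why a separate preset time window is still used to discard unrelated events.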

[0102] Furthermore, the system determines the process chain distance D_proc between the first process chain information (e.g., A->B->C) and the second process chain information (e.g., A->B->D or A->B->C->E). This distance can be defined as follows: 0 when the two processes are the same process; 1 when they are direct parent-child; 2 when they are siblings (sharing a common direct parent process); and for other cases without direct association, the distance can be set to a preset large value or infinity. After determining the process chain distance D_proc, the system maps this discrete distance value to a process chain correlation score S_proc using a preset inverse correlation function or lookup table. For example, it can be set as follows: S_proc=1.0 when D_proc=0; S_proc=0.8 when D_proc=1; S_proc=0.5 when D_proc=2; and for other greater distances, S_proc can be set to a smaller value such as 0.1.
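The distance rules and the lookup table above can be sketched as follows. The sketch assumes process chains are given as "->"-separated strings whose last element is the process itself, as in the examples in the text; the parsing convention and the fallback value handling are otherwise assumptions.

```python
import math
from typing import List

# Lookup table from the example in the text; other distances fall back to 0.1.
S_PROC_TABLE = {0: 1.0, 1: 0.8, 2: 0.5}


def parse_chain(text: str) -> List[str]:
    """Split 'A->B->C' into ['A', 'B', 'C']; the last element is the process itself."""
    return [part.strip() for part in text.split("->") if part.strip()]


def process_chain_distance(first: str, second: str) -> float:
    a, b = parse_chain(first), parse_chain(second)
    if not a or not b:
        return math.inf
    if a == b:
        return 0  # same process
    shorter, longer = sorted((a, b), key=len)
    if len(longer) == len(shorter) + 1 and longer[:len(shorter)] == shorter:
        return 1  # direct parent and child
    if len(a) == len(b) and len(a) > 1 and a[:-1] == b[:-1]:
        return 2  # siblings sharing a direct parent
    return math.inf  # no direct association


def process_chain_score(distance: float) -> float:
    """Map the discrete distance to S_proc."""
    return S_PROC_TABLE.get(distance, 0.1)
```

With the chains from the text, "A->B->C" against "A->B->D" gives a distance of 2, and "A->B->C" against "A->B->C->E" gives a distance of 1.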

[0103] Furthermore, after obtaining S_time, which quantifies the temporal closeness, and S_proc, which quantifies the process affinity, the system combines these two scores to obtain a final strength value, Strength_corr, that comprehensively reflects the correlation between the two. A preferred nonlinear combination method is weighted multiplication, and its formula can be: Strength_corr = (S_time)^w3 × (S_proc)^w4. Here, w3 and w4 are the weight coefficients corresponding to the time and process dimensions, respectively. Their sum can be 1 (e.g., w3=0.4, w4=0.6), used to adjust the relative importance of the two dimensions in the final evaluation. Using multiplicative combination instead of addition has the technical effect that the final score will only be high when both the time and process dimensions show a strong correlation; a low score in either dimension will significantly lower the total score, which is more in line with the logic of composite event correlation analysis. The finally calculated Strength_corr is a continuous value in the interval [0, 1], which is the quantified time-process correlation strength, and can be directly used to match and obtain the corresponding risk weight coefficients from the preset scenario weight library.
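The weighted multiplication and the subsequent weight lookup can be sketched as below, reading the combination as Strength_corr = (S_time)^w3 × (S_proc)^w4. The strength bands in `SCENARIO_WEIGHTS` are hypothetical: the text gives the example coefficients 3.0 and 2.0 but not the strength ranges that select them, so the bands were chosen here only so that the two worked examples in the earlier embodiments are reproduced.

```python
# Hypothetical scenario weight library: (minimum strength, risk weight coefficient),
# checked from the strongest band downwards.
SCENARIO_WEIGHTS = [(0.9, 3.0), (0.6, 2.0), (0.3, 1.5)]


def correlation_strength(s_time: float, s_proc: float,
                         w3: float = 0.4, w4: float = 0.6) -> float:
    """Weighted multiplication of the time and process chain correlation scores."""
    for score in (s_time, s_proc):
        if not 0.0 <= score <= 1.0:
            raise ValueError("correlation scores must lie in [0, 1]")
    return (s_time ** w3) * (s_proc ** w4)


def risk_weight(strength: float) -> float:
    """Match a risk weight coefficient for the given correlation strength."""
    for floor, coefficient in SCENARIO_WEIGHTS:
        if strength >= floor:
            return coefficient
    return 1.0  # weak correlation: keep the offset value unchanged


def risk_score(offset_value: float, s_time: float, s_proc: float) -> float:
    """Weight the offset value by the coefficient matched from the library."""
    return offset_value * risk_weight(correlation_strength(s_time, s_proc))


# Same-process case: delta_t = 0.15 s gives S_time of about 0.9277, and S_proc = 1.0.
same_process = risk_score(60, 0.9277, 1.0)
# Parent-child case: delta_t = 0.6 s gives S_time of about 0.7408, and S_proc = 0.8.
parent_child = risk_score(50, 0.7408, 0.8)
```

Under these assumed bands the same-process case has a strength of about 0.97 and a score of 180, and the parent-child case has a strength of about 0.78 and a score of 100. Because the scores are multiplied, a value of 0 in either dimension drives the strength to 0 regardless of the other.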

[0104] S207. Based on the risk score and the preset handling strategy, generate detection results and handling instructions.

[0105] In a specific embodiment, after the system calculates the comprehensive risk score according to the aforementioned steps, it executes step S207, which generates corresponding detection results and handling instructions based on the risk score and the preset handling strategy, thereby transforming the quantified risk into specific, executable response actions. Here, the "preset handling strategy" is a set of rules or a policy library that can be pre-configured by the security administrator. It establishes a mapping relationship between continuous risk scores and discrete handling levels and specific operations. Specific technical solutions for implementing this mapping relationship include, but are not limited to: The first solution is a threshold-based hierarchical strategy. For example, the risk score range can be divided into multiple levels, such as observation level (e.g., score 0-40), alert level (e.g., score 41-70), blocking level (e.g., score 71-90), and emergency level (e.g., score 91-100), with each level corresponding to a set of progressive operations; The second solution is a dynamic matching strategy based on a multi-dimensional strategy matrix. This matrix not only considers the risk score but also combines dimensions such as anomaly type, asset importance, and user identity to jointly determine the most appropriate handling measures, thereby achieving a more refined response; The third solution is an adaptive strategy based on a machine learning model. The system can use a pre-trained classification model, taking the risk score and related contextual features as input, and the model dynamically outputs the optimal handling suggestions. Regardless of the approach used, this step ultimately generates two types of outputs: First, the detection result, which is typically a structured log or alert event. 
This log details the unique ID of the detection, the final risk score, the risk level, the specific behavioral deviation that triggered the score, and the chain of evidence linking the network anomaly event (including time, process, user, etc.). The format can be JSON or XML to facilitate subsequent auditing, tracing, and situational awareness. Second, the handling instructions, which are one or more machine commands that can be directly executed by downstream security components. For example, for observation-level security, only an enhanced logging instruction might be generated; for alert-level security, an instruction to notify the administrator might be generated; and for blocking or emergency-level security, mandatory instructions such as terminating processes or isolating hosts might be generated. These instructions can be sent to different execution points such as endpoint detection and response agents, firewalls, and security orchestration automation and response platforms to achieve automated, closed-loop handling of security threats.
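The first solution, the threshold-based hierarchical strategy, can be sketched as follows. The score bands follow the example in the text; the instruction names, the JSON field names, and the clamping of scores above 100 into the emergency level are assumptions of this sketch.

```python
import json
from typing import List, Tuple

# (inclusive upper bound, risk level, handling instructions); bounds follow the text's example,
# instruction names are hypothetical placeholders for commands sent to downstream components.
TIERS = [
    (40, "observation", ["enable_enhanced_logging"]),
    (70, "alert", ["notify_administrator"]),
    (90, "blocking", ["terminate_process"]),
    (100, "emergency", ["terminate_process", "isolate_host"]),
]


def dispose(detection_id: str, risk_score: float, evidence: dict) -> Tuple[str, List[str]]:
    """Return (detection result as a JSON string, list of handling instructions)."""
    # Assumption: scores outside 0-100 are clamped into the configured range.
    score = max(0.0, min(100.0, float(risk_score)))
    level, instructions = next(
        (lvl, ins) for upper, lvl, ins in TIERS if score <= upper
    )
    result = json.dumps({
        "detection_id": detection_id,
        "risk_score": score,
        "risk_level": level,
        "evidence": evidence,
    })
    return result, list(instructions)


detection_result, handling_instructions = dispose("d-1", 85, {"pid": 1234})
```

Keeping the tiers in a data table rather than in branching code lets a security administrator reconfigure the mapping without changing the logic, which matches the "pre-configured" nature of the handling strategy.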

[0106] Please refer to Figure 3, which is another flowchart illustrating a software authenticity detection method in an embodiment of this application.

[0107] S301. After the disposal instruction is executed, the system health status indicators of the terminal device are continuously collected.

[0108] The "system health status indicators" aim to objectively quantify, from multiple dimensions, whether the terminal's operating status has returned to a normal baseline. Specific content may include, but is not limited to: 1. Resource utilization indicators, such as overall CPU and kernel utilization, physical and virtual memory usage, disk I/O rate, network throughput, and concurrent connections; 2. Process and service activity indicators, such as the total number of currently running processes, whether any abnormal or unauthorized new processes are started, and the operating status of critical system services; 3. File and registry integrity indicators, such as periodically verifying the hash values of core system files or critical configuration files, and monitoring the read/write behavior of sensitive registry entries.

[0109] Specific technical solutions for achieving "continuous data collection" can include: The first solution is fixed-period polling collection, which involves collecting data at a preset fixed time interval (e.g., every 30 seconds for the first 10 minutes after the event). This solution is simple to implement and has predictable resource overhead. The second solution is dynamic adaptive collection, where the collection frequency changes dynamically. High-frequency collection (e.g., every 5 seconds) is used in the initial stage after the event. If the indicators show stability for several consecutive periods, the collection interval is gradually extended (e.g., adjusted to 1 minute, then 5 minutes) to reduce the impact on terminal performance while maintaining monitoring sensitivity. These indicator collection actions can be executed by a lightweight agent program deployed on the terminal, or initiated by the management center through a remote management protocol (e.g., Windows WMI or Linux SSH). By executing this step, this application can provide quantitative data support for the success or failure of the event handling action, avoiding the misjudgment that the threat has been eliminated simply because the command was successfully issued. It can also detect persistent threat behaviors such as malicious process "resurrection" in a timely manner, thus forming an automated closed-loop verification mechanism for security incident response.
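As a non-limiting illustration of the second solution (dynamic adaptive collection), the following Python sketch lengthens the collection interval after several consecutive stable periods and falls back to high-frequency collection when an indicator becomes unstable again. The class name, the 5 s / 60 s / 300 s schedule, and the requirement of three consecutive stable periods are assumptions chosen to mirror the example values in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AdaptiveSampler:
    """Chooses the next collection interval from how stable recent samples were."""
    schedule: Tuple[float, ...] = (5.0, 60.0, 300.0)  # seconds, assumed values
    required_streak: int = 3                          # stable periods before slowing down
    level: int = 0
    streak: int = 0

    def update(self, is_stable: bool) -> float:
        """Record one collection result and return the next interval in seconds."""
        if is_stable:
            self.streak += 1
            at_last_level = self.level == len(self.schedule) - 1
            if self.streak >= self.required_streak and not at_last_level:
                self.level += 1   # extend the interval one step
                self.streak = 0
        else:
            self.level = 0        # instability: return to high-frequency collection
            self.streak = 0
        return self.schedule[self.level]


sampler = AdaptiveSampler()
intervals = [sampler.update(True) for _ in range(6)]
print(intervals)              # interval chosen after each stable sample
print(sampler.update(False))  # an unstable sample resets the interval
```

Resetting to the shortest interval on any unstable sample favors monitoring sensitivity over terminal overhead; a gentler step-down would be an equally valid design choice.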

[0110] S302. By analyzing the changing trend of the system health status indicators before and after the execution of the disposal instruction, determine the causal correlation strength between the disposal instruction and the event where the values of the system health status indicators return to a preset stable state range.

[0111] "Analyzing trends" specifically refers to comparing the time-series data of one or more system health status indicators (such as CPU utilization and network connection count) before and after the execution time of the disposal instruction (T_exec). The "preset stable state range" is the range within which the indicators fluctuate during normal system operation. It can be configured in two ways: as a static threshold range, such as CPU utilization below 15%; or as a dynamic baseline learned from historical data, such as the indicator value lying within plus or minus one standard deviation of its average over the past 24 hours. Specific technical solutions for determining the causal correlation strength may include, but are not limited to, the following. The first solution is a correlation analysis method based on time proximity. This method calculates the time difference Δt = T_recover - T_exec between the execution time of the disposal instruction (T_exec) and the time point at which the system health status indicator first enters and remains in the stable state range (T_recover). The causal correlation strength can be calculated as a function that decreases as Δt increases, for example, Strength = 1 / (1 + k·Δt), where k is an adjustable coefficient. The smaller the time difference Δt, the stronger the causal correlation. The second solution is an attribution method based on statistical change point detection. The system applies a change point detection algorithm (e.g., the CUSUM algorithm or Bayesian change point detection) to the collected indicator time series to automatically identify the key time point (T_change) at which an indicator abruptly changes from an abnormal mode to a stable mode. If T_change and T_exec closely coincide in time (e.g., within a few seconds or one data collection cycle), a strong causal relationship is determined. The third solution is a comparative analysis method based on a counterfactual prediction model. The system uses pre-intervention data to train a short-term time series prediction model (such as an ARIMA model) to predict the indicator trend under the condition of "no intervention" (i.e., the counterfactual trajectory). Then, it compares the actually observed, recovered indicator trajectory with this predicted trajectory. The larger and more significant the difference between the two, the stronger the effect of the disposal instruction and the higher the causal correlation strength. Finally, this step outputs a quantitative correlation strength score (e.g., a value from 0 to 1). This score not only confirms the effectiveness of a single response action but also serves as feedback data for continuously optimizing the preset disposal strategy library, thereby constructing a closed-loop security response system that can self-verify, self-learn, and self-evolve.
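As a non-limiting illustration of the first two solutions, the following Python sketch computes Strength = 1 / (1 + k·Δt) from a recovery time found in the samples, and locates a change point with a simple one-sided CUSUM. The function names, the hold count of three samples, the value k = 0.1, the CUSUM drift and threshold, and the rule that a recovery preceding T_exec yields a strength of 0 are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def first_recovery_time(samples: List[Tuple[float, float]], low: float,
                        high: float, hold: int = 3) -> Optional[float]:
    """Return the timestamp at which the value first enters [low, high]
    and stays there for `hold` consecutive samples, else None."""
    run_start, run_length = None, 0
    for timestamp, value in samples:
        if low <= value <= high:
            if run_length == 0:
                run_start = timestamp
            run_length += 1
            if run_length >= hold:
                return run_start
        else:
            run_length = 0
    return None


def time_proximity_strength(t_exec: float, t_recover: float, k: float = 0.1) -> float:
    """Strength = 1 / (1 + k * dt). A recovery before T_exec is scored 0 (assumption)."""
    if t_recover < t_exec:
        return 0.0
    return 1.0 / (1.0 + k * (t_recover - t_exec))


def cusum_downward_change(values: List[float], reference: float,
                          drift: float, threshold: float) -> Optional[int]:
    """One-sided CUSUM for a downward shift from `reference`.
    Returns the index where the detected drop started, else None."""
    cumulative, candidate = 0.0, None
    for index, value in enumerate(values):
        cumulative = max(0.0, cumulative + (reference - value - drift))
        if cumulative == 0.0:
            candidate = None
        elif candidate is None:
            candidate = index
        if cumulative > threshold:
            return candidate
    return None


# (timestamp in seconds, CPU utilization in percent); illustrative data only.
samples = [(0.0, 90.0), (5.0, 85.0), (10.0, 12.0), (15.0, 10.0), (20.0, 11.0)]
t_exec = 8.0
t_recover = first_recovery_time(samples, low=0.0, high=15.0)
strength = time_proximity_strength(t_exec, t_recover)
change_index = cusum_downward_change([v for _, v in samples],
                                     reference=90.0, drift=5.0, threshold=100.0)
print(t_recover, round(strength, 3), samples[change_index][0])
```

In this illustrative data both methods place the recovery at the 10-second sample, two seconds after T_exec, which the time-proximity formula scores at roughly 0.83.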

[0112] S303. When the causal correlation strength is higher than the preset correlation threshold, the combination of the behavior deviation record and the network anomaly record is determined as a verified attack event.

[0113] The "preset correlation threshold" is a key parameter that can be flexibly configured by the system administrator according to security policies and risk tolerance. For example, it can be set to 0.8 or 0.9 to define the criterion for judging a "strong correlation". The "verified attack event" represents a high-confidence security conclusion. It indicates that the initial alarm has been upgraded from a pending "suspicious event" to a "confirmed event" with a complete chain of evidence and a closed-loop handling mechanism. Specific technical solutions for implementing this determination step may include the following. The first solution is the single threshold comparison method, which directly compares the calculated causal correlation strength score (e.g., a floating-point number between 0 and 1) with a single threshold. Once the strength score is higher than the threshold, the determination is completed immediately. The second solution is the dynamic hierarchical threshold method, in which the correlation threshold is not fixed but is negatively correlated with the initial risk score (e.g., the score calculated in S206). That is, for events with extremely high initial risk, even if their causal correlation strength is slightly lower (e.g., 0.7), they can be determined as verified attacks. Conversely, for events with low initial risk, a very high causal correlation strength (e.g., 0.95) is required for final confirmation, thereby achieving a more refined and dynamic determination logic. Once an event is determined to be a verified attack, the system can execute a series of subsequent automated actions. For example, it can permanently archive all records related to the event, including the original behavior deviation records, network anomaly records, generated risk scores, executed handling instructions, and health status indicator data collected during the verification process, and label them as verified attacks for subsequent security audits and attack reviews. 
Furthermore, the system can automatically extract key features of this attack event (such as process names, file hashes, target IP addresses, attack sequences, etc.) and solidify them as high-value attack patterns or vulnerability indicators into a local knowledge base or rule base for optimizing future detection models and handling strategies.
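As a non-limiting illustration of the dynamic hierarchical threshold method, the following Python sketch lowers the correlation threshold linearly as the initial risk score rises. The linear interpolation, the 0 to 100 score range, and the function names are assumptions; the end points 0.95 and 0.70 are taken from the example values in the text.

```python
def dynamic_threshold(risk_score: float,
                      low_risk_threshold: float = 0.95,
                      high_risk_threshold: float = 0.70,
                      max_score: float = 100.0) -> float:
    """Correlation threshold that decreases as the initial risk score increases."""
    ratio = min(max(risk_score / max_score, 0.0), 1.0)  # clamp to [0, 1]
    return low_risk_threshold - ratio * (low_risk_threshold - high_risk_threshold)


def classify_event(strength: float, risk_score: float) -> str:
    """Apply the S303 / S305 decision using the risk-dependent threshold."""
    if strength > dynamic_threshold(risk_score):
        return "verified_attack"
    return "benign_baseline_drift"


print(classify_event(0.75, risk_score=100.0))  # high initial risk
print(classify_event(0.75, risk_score=0.0))    # low initial risk
```

The same causal correlation strength of 0.75 is therefore confirmed as an attack when the initial risk was extreme, but treated as benign drift when the initial risk was low.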

[0114] S304. Extract the maximum deviation feature from the behavior deviation record, and associate and combine the maximum deviation feature with the network anomaly record to obtain a dynamic threat template.

[0115] The "maximum deviation feature" refers to the key feature, among the multiple atomic behaviors that constitute a behavioral deviation, that best represents the core intention of the attack or causes the most significant anomaly. Specific technical solutions for extracting this feature may include the following. The first solution is an extraction method based on risk contribution: when the risk score is computed for the behavior (for example, in S206), the atomic behavior feature that contributes the most to the overall risk score is identified and extracted. For example, a process named svchost.exe creating a scheduled task may contribute a higher risk score than a 20% increase in CPU usage, so the former is extracted as the maximum deviation feature. The second solution is an extraction method based on statistical anomaly amplitude: for all behavior indicators that deviate from the baseline, their relative or absolute deviation amplitude (such as a Z-score or the difference from the baseline mean) is calculated, and the feature with the largest deviation amplitude is extracted. The third solution is an extraction method based on a preset severity level: the system pre-defines a behavior severity level library (for example, kernel module loading > process injection > sensitive file reading) and extracts the behavior with the highest severity level observed in this event as the maximum deviation feature. The extracted maximum deviation feature (for example, a specific malicious command line) is associated and combined with the network anomaly record related to it (for example, the IP address and port of a C2 server), thus forming a structured dynamic threat template.
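As a non-limiting illustration of the second solution (statistical anomaly amplitude), the following Python sketch computes an absolute Z-score for each observed indicator and returns the one with the largest amplitude. The feature names, the baseline values, and the choice to skip features with a non-positive standard deviation are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def max_deviation_feature(observed: Dict[str, float],
                          baseline: Dict[str, Tuple[float, float]]
                          ) -> Tuple[Optional[str], float]:
    """Return (feature name, |z|) for the feature farthest from its baseline.

    `baseline` maps each feature to (mean, standard deviation).
    Features missing from the baseline or with std <= 0 are skipped (assumption).
    """
    best_name, best_z = None, 0.0
    for name, value in observed.items():
        if name not in baseline:
            continue
        mean, std = baseline[name]
        if std <= 0:
            continue
        z = abs(value - mean) / std
        if best_name is None or z > best_z:
            best_name, best_z = name, z
    return best_name, best_z


observed = {"cpu_percent": 36.0, "scheduled_tasks_created": 3.0}
baseline = {"cpu_percent": (30.0, 5.0), "scheduled_tasks_created": (0.0, 0.5)}
print(max_deviation_feature(observed, baseline))
```

Normalizing by the standard deviation lets indicators with different units be compared, which is why the scheduled-task count outranks the CPU increase in this illustrative data.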

[0116] This template is a high-fidelity description of the attack pattern, and its specific form may include the following. The first is the specific indicator type template, which contains specific and directly searchable indicators of compromise (IoC), such as "{'source IP': '1.2.3.4', 'destination port': '4444', 'associated behavior': 'powershell.exe -enc <Base64-encoded malicious payload>'}". The second is the abstract rule type template, in which some specific indicators are generalized into more general patterns, such as "{'source IP reputation': 'low', 'destination service': 'uncommon high port communication', 'associated process': 'script interpreter', 'behavior pattern': 'execute encoded commands'}". The abstract template has a stronger ability to detect variant attacks. Through this step, the system can automatically refine and transform each successfully verified and handled attack event into immediately reusable, high-value threat intelligence. These dynamically generated threat templates can be directly distributed to all protected endpoints for proactive threat hunting or for enhancing real-time detection rules, thereby enabling the entire security protection system to learn and evolve from actual combat.
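As a non-limiting illustration of the two template forms, the following Python sketch combines a maximum deviation feature with a network anomaly record into a specific indicator template and then generalizes it into an abstract rule template. The record field names, the set of common ports, the list of script interpreters, and the "-enc" check are assumptions made for the example; the IP reputation field is omitted because it would require an external reputation source.

```python
from typing import Dict

# Illustrative reference sets (assumptions, not part of this application).
COMMON_PORTS = {22, 53, 80, 443}
SCRIPT_INTERPRETERS = {"powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe"}


def build_specific_template(max_deviation_feature: Dict, network_anomaly: Dict) -> Dict:
    """Combine the behavior feature and the network record into an IoC-style template."""
    return {
        "source_ip": network_anomaly["source_ip"],
        "destination_port": network_anomaly["destination_port"],
        "associated_process": max_deviation_feature["process"],
        "associated_behavior": max_deviation_feature["command_line"],
    }


def generalize_template(specific: Dict) -> Dict:
    """Replace concrete indicators with broader categories to catch variants."""
    port = specific["destination_port"]
    process = specific["associated_process"].lower()
    command = specific["associated_behavior"].lower()
    uncommon_high_port = port >= 1024 and port not in COMMON_PORTS
    return {
        "destination_service": "uncommon_high_port" if uncommon_high_port else "common_service",
        "associated_process": "script_interpreter" if process in SCRIPT_INTERPRETERS else "other",
        "behavior_pattern": "execute_encoded_commands" if " -enc" in command else "other",
    }


feature = {"process": "powershell.exe",
           "command_line": "powershell.exe -enc <Base64-encoded payload>"}
anomaly = {"source_ip": "1.2.3.4", "destination_port": 4444}

specific = build_specific_template(feature, anomaly)
abstract = generalize_template(specific)
print(specific)
print(abstract)
```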

[0117] S305. When the causal correlation strength is not higher than the preset correlation threshold, determine the behavior deviation record as a benign baseline drift event, and correct the genuine behavior baseline based on the benign baseline drift event.

[0118] When executing step S305, the system has determined that the causal correlation strength calculated in the aforementioned step S302 is not higher than the preset correlation threshold. This usually means that, although the system state eventually returned to normal, the recovery and the handling instructions executed by the system are not strongly correlated in time or in statistical terms, indicating that the initial behavioral deviation may not have been caused by a malicious attack. In this case, the system determines the behavior deviation record as a "benign baseline drift event." Here, a "benign baseline drift event" specifically refers to a permanent or semi-permanent change in the host behavior pattern caused by legitimate system changes (such as software version upgrades, patch installations, business logic updates, or normal administrator configuration adjustments). Although this change deviates from the old "genuine behavior baseline," it is essentially harmless and should be considered a new normal behavior. Subsequently, the system corrects the genuine behavior baseline based on this benign baseline drift event.

[0119] Specific technical solutions for baseline correction may include, but are not limited to: The first solution is an automated learning and incremental update method. The system automatically extracts behavioral data (such as new process relationships, file access paths, network port usage, etc.) from the benign baseline drift event and uses it as a new legitimate behavioral sample to update the original baseline model. For example, if the baseline is a statistical model, parameters such as the mean and variance can be updated; if the baseline is a behavioral graph, new nodes or edges can be added. The second solution is a supervised correction method based on manual review. The system does not directly update the baseline but marks the benign baseline drift event as a baseline change to be confirmed and pushes it to the security administrator's review interface for manual analysis and confirmation. Only after the administrator confirms that the behavior is a legitimate business change will the system perform baseline correction. This method increases the accuracy of correction and prevents potential attackers from using this mechanism to poison the baseline. The third solution is a confidence-weighted progressive fusion method. The system calculates a confidence level for the drift event and makes only small, low-weight adjustments to the baseline when it first occurs. When similar behavioral deviations are observed again in the future and are also judged as benign drifts, the system will gradually increase its confidence in such behavior as the new normal and correspondingly increase the adjustment weight of the baseline model until the behavioral pattern is fully integrated into the new baseline. The preset correlation threshold here can be configured according to the system's security level and tolerance for false alarms; for example, it can be set to 0.5. 
Through this design step, this application not only provides a complete closed-loop processing mechanism for security alerts, effectively distinguishing between real attacks and benign changes, significantly reducing false alarm rates and alleviating the analytical burden on security operations personnel, but more importantly, it enables the legitimate behavioral baseline to dynamically adapt to changes in business and environment, thereby continuously improving the robustness and accuracy of the entire anomaly detection system.
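As a non-limiting illustration of the third solution (confidence-weighted progressive fusion), the following Python sketch blends the observed values into the baseline with a weight that grows each time a similar drift is again judged benign. The linear weight growth, the base weight of 0.1, and the representation of the baseline as a dictionary of numeric feature values are assumptions made for the example.

```python
from typing import Dict


def fuse_baseline(baseline: Dict[str, float], observed: Dict[str, float],
                  occurrences: int, base_weight: float = 0.1,
                  max_weight: float = 1.0) -> Dict[str, float]:
    """Return a corrected baseline.

    `occurrences` counts how many times this drift has been judged benign;
    the adjustment weight grows with it and is capped at `max_weight`.
    Features absent from `observed` are left unchanged.
    """
    weight = min(max_weight, base_weight * occurrences)
    return {
        name: (1.0 - weight) * value + weight * observed.get(name, value)
        for name, value in baseline.items()
    }


baseline = {"cpu_percent": 10.0, "open_ports": 4.0}
observed = {"cpu_percent": 20.0}

first = fuse_baseline(baseline, observed, occurrences=1)    # small, cautious adjustment
later = fuse_baseline(baseline, observed, occurrences=10)   # fully adopted as new normal
print(first)
print(later)
```

Starting with a low weight limits how far a single event can move the baseline, which also reduces the baseline poisoning risk noted for the fully automated method.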

[0120] This embodiment also discloses an electronic device. Referring to Figure 4, the electronic device may include: at least one processor 401, at least one communication bus 402, a user interface 403, a network interface 404, and at least one memory 405.

[0121] The communication bus 402 is used to enable communication between these components. The user interface 403 may include a display screen and a camera; optionally, the user interface 403 may also include a standard wired interface or a wireless interface. The network interface 404 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface).

[0122] The processor 401 may include one or more processing cores. The processor 401 connects to various parts of the server using various interfaces and lines, and performs various server functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in memory 405, and by calling data stored in memory 405. Optionally, the processor 401 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 401 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content to be displayed on the screen; and the modem handles wireless communication.

[0123] The memory 405 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 405 may include a non-transitory computer-readable storage medium. The memory 405 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 405 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch function, sound playback function, image playback function, etc.), instructions for implementing the above-described method embodiments, etc.; the data storage area may store data involved in the above-described method embodiments, etc. Optionally, the memory 405 may also be at least one storage device located remotely from the aforementioned processor 401. As shown in Figure 4, the memory 405, which serves as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program for a software authenticity detection method.

[0124] In some embodiments of this application, a computer-readable storage medium is provided, including instructions that, when executed on the electronic device, cause the electronic device to perform a software authenticity detection method according to an embodiment of this application.

[0125] In some embodiments of this application, a computer program product is also provided, which, when run on an electronic device, causes the electronic device to execute a software authenticity detection method according to an embodiment of this application.

[0126] The foregoing description is merely an exemplary embodiment of this disclosure and should not be construed as limiting the scope of this disclosure. Any equivalent changes and modifications made in accordance with the teachings of this disclosure shall still fall within the scope of this disclosure. Other embodiments of this disclosure will be readily apparent to those skilled in the art upon consideration of the disclosure in this specification. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not described in this disclosure. The specification and embodiments are to be considered exemplary only, and the scope and spirit of this disclosure are defined by the claims.

Claims

1. A method for detecting software authenticity, characterized in that, Applied to a server, the method includes: Obtain static feature data of the target software on the terminal device, and determine the declared identity information of the target software based on the static feature data; During the execution of the target software, dynamic behavior data of the target software is collected, and network communication data of the target software is obtained; Based on the declared identity information, the corresponding genuine behavior baseline is matched and retrieved from the preset baseline library; The dynamic behavior data is compared with the genuine behavior baseline to obtain the offset value. When the offset value is greater than or equal to a preset offset threshold, a behavior deviation record is generated. In response to generating the behavior deviation record, the network communication data is analyzed, and when the network communication data meets preset network anomaly conditions, a network anomaly record is generated; By combining the behavioral deviation records and the network anomaly records, a risk score for the target software is generated; Based on the risk score and the preset handling strategy, the detection results and handling instructions are generated; The step of comparing the dynamic behavior data with the genuine behavior baseline to obtain an offset value, and generating a behavior deviation record when the offset value is greater than or equal to a preset offset threshold, includes: The genuine behavior baseline is parsed into a baseline vector, and the baseline vector contains a reference value component that corresponds one-to-one with multiple preset behavior features; Extract multiple real-time feature values corresponding to the multiple preset behavioral features from the dynamic behavioral data, and generate a real-time behavioral vector according to the same component order as the baseline vector; Calculate the preset vector distance between the real-time behavior vector and the baseline vector, and specify the preset vector distance as the offset value; When the offset value is greater than or equal to the preset offset threshold, the absolute value of the difference between each corresponding component of the real-time behavior vector and the baseline vector is calculated. The preset behavior feature corresponding to the component with the largest absolute value is determined as the maximum deviation feature. The maximum deviation feature is then combined with the offset value to generate the behavior deviation record. The process of fusing the behavioral deviation records and the network anomaly records to generate a risk score for the target software includes: Extract the behavior process identifier and behavior occurrence timestamp from the behavior deviation record, and extract the network process identifier and network occurrence timestamp from the network anomaly record; Determine whether the time difference between the timestamp of the behavior and the timestamp of the network occurrence is within a preset time window; When the time difference is within the preset time window, the operating system of the terminal device is queried and the first process chain information corresponding to the behavior process identifier and the second process chain information corresponding to the network process identifier are obtained based on the behavior process identifier and the network process identifier, respectively. Based on the first process chain information, the second process chain information, and the time difference, the time process association strength is determined; Based on the time process association strength, risk weight coefficients are matched and obtained from a preset scenario weight library, and the offset values are weighted using the risk weight coefficients to generate the risk score.

2. The method according to claim 1, characterized in that, The step of collecting dynamic behavior data of the target software and obtaining network communication data of the target software during runtime includes: The running status of the target software is monitored in real time by deploying a terminal agent program on the terminal device. When the running state is running, the application interface call sequence and system resource usage pattern during the operation of the target software are recorded to form the dynamic behavior data; The outbound network traffic generated by the target software is acquired, and the target communication address and communication protocol are parsed from the outbound network traffic to form the network communication data.

3. The method according to claim 1, characterized in that, The step of matching and retrieving the corresponding legitimate behavior baseline from the preset baseline library based on the declared identity information includes: The software identifier and version information are parsed from the declared identity information; Based on the software identifier and the version information, a primary key query condition is constructed, and the primary key query condition is used to match the baseline index in the preset baseline library; Based on the found baseline index, the corresponding genuine behavior baseline is retrieved from the preset baseline library.

4. The method according to claim 1, characterized in that, The determination of the time process correlation strength based on the first process chain information, the second process chain information, and the time difference includes: Based on the time difference, a time correlation score is calculated using a preset time decay function; Based on the first process chain information and the second process chain information, the process chain distance is determined, and the process chain correlation score is calculated using a preset inverse correlation function according to the process chain distance. The time-process correlation strength is generated by nonlinearly combining the time correlation score and the process chain correlation score.

5. The method according to claim 1, characterized in that, The method further includes: After the disposal instruction is executed, the system health status indicators of the terminal device are continuously collected; By analyzing the changing trends of the system health status indicators before and after the execution of the disposal instruction, the strength of the causal relationship between the disposal instruction and the event where the values of the system health status indicators return to a preset stable state range is determined. When the causal correlation strength is higher than a preset correlation threshold, the combination of the behavior deviation record and the network anomaly record is determined as a verified attack event; The maximum deviation feature is extracted from the behavioral deviation records, and the maximum deviation feature is associated and combined with the network anomaly records to obtain a dynamic threat template; When the causal correlation strength is not higher than the preset correlation threshold, the behavior deviation record is determined as a benign baseline drift event, and the genuine behavior baseline is corrected based on the benign baseline drift event.

6. An electronic device, characterized in that, The device includes a processor, a memory, a user interface, and a network interface. The memory is used to store instructions. The user interface and the network interface are both used to communicate with other devices. The processor is used to execute the instructions stored in the memory to cause the electronic device to perform the method as described in any one of claims 1-5.

7. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores instructions that, when executed, perform the method as described in any one of claims 1-5.

8. A computer program product, characterized in that, When the computer program product is run on an electronic device, it causes the electronic device to perform the method as described in any one of claims 1-5.