A method, device and electronic equipment for identifying an encrypted malicious traffic family

The method works in four stages. It calculates the information entropy of server and client negotiation phase information in network traffic to remove random strings. It counts consecutive visible characters, and it hashes the remaining information to obtain a negotiation family fingerprint. This addresses the difficulty of identifying malicious encrypted traffic on the internet efficiently and accurately, and enables efficient identification of malicious traffic families.

CN115577335B · Active · Publication Date: 2026-06-23 · VIEWINTECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: VIEWINTECH
Filing Date: 2021-07-06
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively and accurately identify families of malicious encrypted traffic that communicate using encryption protocols on the internet.

Method used

By calculating the information entropy of the server and client negotiation phase information in network traffic, removing randomly generated strings, determining the number of consecutively visible characters, and using hash calculation to obtain the negotiation family fingerprint, encrypted malicious traffic families can be identified.

Benefits of technology

It improves the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification, and can quickly identify malicious traffic families in high-speed transmission network environments.

✦ Generated by Eureka AI based on patent content.

Figure CN115577335B_ABST (abstract drawing)

Abstract

The application provides a method, a device and an electronic device for identifying an encrypted malicious traffic family. Information entropy is used to remove randomly generated strings from multiple pieces of server negotiation stage information and multiple pieces of client negotiation stage information in network traffic, yielding to-be-identified information. To-be-identified information with a large number of consecutive visible characters is then treated as a manually set string and removed, yielding detection information. Finally, the encrypted malicious traffic family to which the network traffic belongs is determined using a negotiation family fingerprint obtained by hashing the detection information. This improves the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and more specifically, to a method, apparatus, and electronic device for identifying encrypted malicious traffic families.

Background Technology

[0002] Currently, with the widespread adoption of encrypted services and increased security awareness, encrypted communication is becoming increasingly common on the internet. Various encrypted network protocols, such as Transport Layer Security (TLS), provide enhanced security for network communications. However, a growing amount of malicious encrypted traffic also uses these same protocols for communication. Effectively identifying malicious encrypted traffic is a pressing issue that needs to be addressed.

Summary of the Invention

[0003] To address the aforementioned problems, the present invention aims to provide a method, apparatus, and electronic device for identifying encrypted malicious traffic families.

[0004] In a first aspect, embodiments of the present invention provide a method for identifying encrypted malicious traffic families, including:

[0005] Obtain network traffic and parse out multiple server-side negotiation phase information and multiple client-side negotiation phase information carried in the network traffic;

[0006] Calculate the information entropy of multiple server-side negotiation phase information and multiple client-side negotiation phase information respectively;

[0007] Determine, as information to be identified, the information among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information whose information entropy is less than or equal to an information entropy threshold;

[0008] Determine the number of consecutive visible characters in the information to be identified;

[0009] Determine, as detection information, the information to be identified whose number of consecutive visible characters is less than a character number threshold;

[0010] The detection information is hashed to obtain the negotiated family fingerprint of the detection information;

[0011] Based on the negotiated family fingerprint of the detection information, the encrypted malicious traffic family to which the network traffic belongs is determined.

[0012] In a second aspect, embodiments of the present invention further provide an encrypted malicious traffic family identification device, comprising:

[0013] The acquisition module is used to acquire network traffic and parse out multiple server negotiation phase information and multiple client negotiation phase information carried in the network traffic;

[0014] The calculation module is used to calculate the information entropy of multiple server-side negotiation phase information and multiple client-side negotiation phase information respectively;

[0015] The first determining module is used to determine information whose information entropy is less than or equal to the information entropy threshold among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information as information to be identified;

[0016] The second determining module is used to determine the number of consecutive visible characters in the information to be identified;

[0017] The third determining module is used to determine, as detection information, the information to be identified whose number of consecutive visible characters is less than the character number threshold;

[0018] The hash calculation module is used to perform hash calculations on the detection information to obtain the negotiated family fingerprint of the detection information;

[0019] The fourth determination module is used to determine the encrypted malicious traffic family to which the network traffic belongs based on the negotiated family fingerprint of the detection information.

[0020] In a third aspect, embodiments of the present invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method described in the first aspect.

[0021] In a fourth aspect, embodiments of the present invention further provide an electronic device including a memory, a processor, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method according to any one of claims 1-4.

[0022] In the solutions provided by the first to fourth aspects of the present invention, the method proceeds as follows:

- Among the multiple server-side and client-side negotiation phase information in network traffic, information whose information entropy is less than or equal to an information entropy threshold is determined as information to be identified.
- Information to be identified whose number of consecutive visible characters is less than a character count threshold is determined as detection information.
- A hash calculation is then performed on the detection information to obtain a negotiation family fingerprint.
- Based on that fingerprint, the encrypted malicious traffic family to which the network traffic belongs is determined.

Related technologies cannot identify families of encrypted malicious traffic in network traffic. This method uses information entropy to remove randomly generated strings from the multiple server-side and client-side negotiation phase information, obtaining the information to be identified. It then checks the number of consecutive visible characters in that information. Information with a large number of consecutive visible characters is treated as manually set strings and removed, leaving the detection information. Finally, the negotiation family fingerprint obtained by hashing the detection information is used to determine the encrypted malicious traffic family to which the network traffic belongs. This improves the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification.

[0023] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a flowchart of an encrypted malicious traffic family identification method provided in Embodiment 1 of the present invention;

[0026] Figure 2 is a schematic structural diagram of an encrypted malicious traffic family identification device provided in Embodiment 2 of the present invention;

[0027] Figure 3 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present invention.

Detailed Implementation

[0028] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," and "counterclockwise," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this invention.

[0029] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0030] In this invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," "linking," and "fixing," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0031] Currently, with the widespread adoption of encrypted services and increased security awareness, encrypted communication is becoming increasingly common on the internet. Various encrypted network protocols, such as Transport Layer Security (TLS), provide enhanced security for network communications. However, a growing amount of malicious encrypted traffic also uses these same protocols for communication. Effectively identifying malicious encrypted traffic is a pressing issue that needs to be addressed.

[0032] Based on this, various embodiments of this application propose a method, apparatus, and electronic device for identifying encrypted malicious traffic families. This method utilizes information entropy to remove randomly generated strings from multiple server-side negotiation phase information and multiple client-side negotiation phase information of network traffic, obtaining information to be identified. Then, by judging the number of consecutive visible characters in the information to be identified, information with a large number of consecutive visible characters is identified as artificially set strings and removed, obtaining detection information. Finally, the negotiation family fingerprint obtained after hashing the detection information is used to determine the encrypted malicious traffic family to which the network traffic belongs, thereby improving the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification.

[0033] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0034] Embodiment 1

[0035] The encrypted malicious traffic family identification method proposed in this embodiment is executed by a server.

[0036] Referring to the flowchart shown in Figure 1, this embodiment proposes a method for identifying encrypted malicious traffic families, which includes the following specific steps:

[0037] Step 100: Obtain network traffic and parse out multiple server negotiation phase information and multiple client negotiation phase information carried in the network traffic.

[0038] In step 100 above, the specific process of parsing out the multiple server negotiation phase information and multiple client negotiation phase information carried by the network traffic belongs to the prior art and will not be described in detail here.

[0039] The multiple server-side negotiation phase information includes, but is not limited to: the server-selected TLS version identifier (Version), the server-selected random number (Random), the server-selected session identifier (Session ID), the server-selected cipher suite identifier (Cipher Suite), the server-selected compression method identifier (Compression Methods), and server-side extensions.

[0040] The server-side extension items include, but are not limited to: server_name, status_request, supported_groups, ec_point_formats, heartbeat, application_layer_protocol_negotiation, signed_certificate_timestamp, extended_master_secret, session_ticket, pre_shared_key, supported_versions, cookie, key_share, alipay2019, next_protocol_negotiation, channel_id, and renegotiation_info.

[0041] The multiple client negotiation phase information includes, but is not limited to: the TLS version, the client random number, the client session ID, the list of cipher suites supported by the client, the identifiers of compression methods supported by the client, and client extensions.

[0042] The client extensions include, but are not limited to: server_name, status_request, status_request_v2, ec_point_formats, supported_groups, signature_algorithms, heartbeat, application_layer_protocol_negotiation, signed_certificate_timestamp, padding, encrypt_then_mac, extended_master_secret, token_binding, compress_certificate, record_size_limit, session_ticket, Reservedkey_share, pre_shared_key, early_data, supported_versions, psk_key_exchange_modes, post_handshake_auth, signature_algorithms_cert, key_share, alipay2019, next_protocol_negotiation, draft-ietf-tokbind-negotiation, channel_id_old, channel_id and renegotiation_info.

[0043] Each item in the multiple server-side negotiation phase information and the multiple client-side negotiation phase information is a distinct string.

[0044] After parsing out the multiple server negotiation phase information and multiple client negotiation phase information carried by the network traffic, the following step 102 is performed to calculate the information entropy of the multiple server negotiation phase information and multiple client negotiation phase information respectively.

[0045] Step 102: Calculate the information entropy of multiple server-side negotiation phase information and multiple client-side negotiation phase information respectively.

[0046] In step 102 above, in order to calculate the information entropy of multiple server negotiation phase information, the following steps (1) to (4) can be performed:

[0047] (1) For each piece of the multiple server negotiation phase information, count the number of bytes in the string it contains;

[0048] (2) Count the number of occurrences of each byte value in the strings contained in the negotiation phase information of each server;

[0049] (3) Calculate the probability of occurrence of each byte value in the string contained in the negotiation phase information of each server using the following formula 1:

[0050] $$Q(y_i) = \frac{x_i}{m} \qquad (1)$$

[0051] where $Q(y_i)$ is the probability that byte value $y_i$ occurs in the string contained in the server negotiation phase information; $x_i$ is the number of occurrences of byte value $y_i$ in that string; and $m$ is the number of bytes in that string.

[0052] (4) Based on the probability of occurrence of each byte value in the string contained in the obtained server negotiation phase information, calculate the information entropy of each server negotiation phase information in multiple server negotiation phase information.

[0053] In step (1) above, one byte consists of 8 bits. On that basis, the server counts the number of bytes in the string contained in each piece of server negotiation phase information.

[0054] In step (2) above, the value of a byte ranges from 0 to 255. The server first determines the value of each byte in the payload string of each piece of server negotiation phase information. It then tallies these values to obtain the number of occurrences of each byte value in that information.

[0055] For example, suppose the string of one piece of server-side negotiation phase information contains 28 bytes, of which the 3rd, 12th, and 26th bytes have the value 22.

[0056] Therefore, for this server-side negotiation phase information, the number of occurrences of the value 22 is 3.

[0057] Once step (2) has counted the occurrences of each byte value in each piece of server negotiation phase information, step (3) can be executed. Step (3) calculates the probability of occurrence of each byte value in the string contained in each piece of server negotiation phase information.

[0058] In step (3) above, each byte can take any value from 0 to 255. $y_i$ denotes the i-th possible value of a single byte in the string, where i is any value from 0 to 255.

[0059] Take i = 22, that is, compute the probability $Q(y_{22})$ that the value 22 occurs in the string of the server-side negotiation phase information, continuing the example above. The string contains 28 bytes, so $m = 28$. $x_{22}$ is the number of bytes in the string whose value is 22, so $x_{22} = 3$. Therefore, $Q(y_{22})$ can be calculated using formula 1:

[0060] $$Q(y_{22}) = \frac{x_{22}}{m} = \frac{3}{28} \approx 0.107$$
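Steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:

- The function name `byte_probabilities` is hypothetical.
- The input is assumed to be the raw bytes of one piece of negotiation phase information.

The sketch reproduces the 28-byte example above.

```python
from collections import Counter


def byte_probabilities(data: bytes) -> dict[int, float]:
    """Steps (1)-(3): probability Q(y_i) = x_i / m of each byte value in data."""
    m = len(data)  # step (1): number of bytes in the string
    if m == 0:
        return {}
    counts = Counter(data)  # step (2): x_i, occurrences of each byte value 0..255
    # step (3): formula 1; byte values that never occur are omitted (probability 0)
    return {value: x_i / m for value, x_i in counts.items()}


# The 28-byte example: the 3rd, 12th and 26th bytes have the value 22.
example = bytearray(range(100, 128))  # 28 distinct placeholder byte values
for position in (3, 12, 26):
    example[position - 1] = 22  # 1-based positions -> 0-based indices
probs = byte_probabilities(bytes(example))
print(probs[22])  # 3/28, about 0.107
```

Iterating over a `bytes` object yields integers, so `Counter` directly tallies the byte values 0 to 255.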

[0061] The process of calculating the information entropy of information from multiple client negotiation phases is similar to the process of calculating the information entropy of information from multiple server negotiation phases, and will not be repeated here.

[0062] In step (4) above, the information entropy of each server negotiation phase information in the multiple server negotiation phase information is calculated using the following formula:

[0063] $$K(x) = -\sum_{i=0}^{z-1} Q(y_i)\,\log_2 Q(y_i)$$

[0064] where $K(x)$ is the information entropy of each piece of server negotiation phase information among the multiple server negotiation phase information, and $z = 256$ is the number of possible byte values; terms with $Q(y_i) = 0$ are taken as 0.
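Step (4) can be sketched as below. The sketch rests on two assumptions:

- The logarithm is base 2. The base is not stated in the text.
- Byte values with zero probability contribute nothing to the sum, which is the usual convention.

The function name `information_entropy` is hypothetical. Under these assumptions, the entropy of a byte string lies between 0 and log2(256) = 8 bits.

```python
import math
from collections import Counter


def information_entropy(data: bytes) -> float:
    """K(x) = -sum Q(y_i) * log2 Q(y_i) over the z = 256 possible byte values.

    Assumes a base-2 logarithm; byte values that never occur are skipped,
    which matches the convention 0 * log 0 = 0.
    """
    m = len(data)
    if m == 0:
        return 0.0
    entropy = 0.0
    for x_i in Counter(data).values():  # only byte values that actually occur
        q = x_i / m
        entropy -= q * math.log2(q)
    return entropy


print(information_entropy(bytes(range(256))))  # 8.0
print(information_entropy(b"\x16" * 28))       # 0.0
```

A string containing every byte value exactly once reaches the 8-bit maximum. A string of one repeated byte has zero entropy.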

[0065] Step 104: Identify the information whose information entropy is less than or equal to the information entropy threshold among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information as information to be identified.

[0066] In step 104 above, the information entropy threshold can be set to 8; other values may also be used depending on the actual situation.
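A minimal sketch of step 104 follows. It assumes entropies have already been computed per field and keyed by hypothetical field names. The default threshold is the value 8 given above. The usage example passes a lower, purely illustrative threshold: with base-2 logarithms no byte string exceeds 8 bits, so a threshold of 8 would keep every field.

```python
def select_to_be_identified(entropies: dict[str, float], threshold: float = 8.0) -> list[str]:
    """Step 104: keep fields whose information entropy is <= the threshold.

    Higher-entropy fields (e.g. random numbers) are treated as randomly
    generated strings and dropped.
    """
    return [name for name, k in entropies.items() if k <= threshold]


# Hypothetical per-field entropies (bits) and an illustrative threshold of 4.5.
entropies = {
    "client_random": 4.9,
    "client_cipher_suites": 3.2,
    "server_random": 4.8,
    "client_supported_groups": 2.5,
}
print(select_to_be_identified(entropies, threshold=4.5))
# ['client_cipher_suites', 'client_supported_groups']
```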

[0067] As can be seen from the descriptions in steps 102 to 104 above, by using information entropy to remove randomly generated strings from multiple server-side negotiation phase information and multiple client-side negotiation phase information of network traffic, the information to be identified can be obtained. Thus, a fast and practical method of calculating information entropy can be used to determine and remove randomly generated strings from multiple server-side negotiation phase information and multiple client-side negotiation phase information. In the process of identifying encrypted malicious traffic families, this method can save the resource overhead of real-time detection calculation and greatly improve the efficiency of random string judgment, making this technology applicable in real-world network environments with high-speed transmission.

[0068] Step 106: Determine the number of consecutive visible characters in the information to be identified.

[0069] In step 106 above, "consecutive visible characters" refers to consecutive English letters and/or digits in the information to be identified.

[0070] The information to be identified is mostly a partially encrypted string, in which the encrypted characters may be represented by runs of "*" or "?".

[0071] In one implementation, when the string of the information to be identified is “?????????abc******”, it indicates that the information to be identified is a partially encrypted string, in which the consecutive visible characters are “abc”; then the server can determine that the number of consecutive visible characters in the string of the information to be identified, “?????????abc******”, is 3.
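Step 106 can be sketched with a regular expression. The sketch makes two assumptions:

- Visible characters are ASCII letters and digits, as stated above.
- "The number of consecutive visible characters" means the length of the longest such run. The text does not say how several runs are combined.

The function name is hypothetical.

```python
import re

# Assumption: visible characters are ASCII letters and digits (step 106).
VISIBLE_RUN = re.compile(r"[A-Za-z0-9]+")


def consecutive_visible_count(text: str) -> int:
    """Length of the longest run of consecutive visible characters in text."""
    return max((len(run) for run in VISIBLE_RUN.findall(text)), default=0)


print(consecutive_visible_count("?????????abc******"))  # 3
```

The `default=0` argument covers strings that contain no visible characters at all.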

[0072] Step 108: The information to be identified where the number of consecutive visible characters is less than the character count threshold is determined as the detection information.

[0073] In step 108 above, if the number of consecutive visible characters is greater than or equal to the character count threshold, it indicates that the information to be identified may be unencrypted. Therefore, the server determines the information to be identified with the number of consecutive visible characters greater than or equal to the character count threshold as a manually set string and excludes such strings from the detection information.

[0074] In one implementation, the client extensions server_name and application_layer_protocol_negotiation, as well as the server extensions server_name and application_layer_protocol_negotiation, are determined to be manually set strings and excluded from the detection information because the number of consecutive visible characters is greater than or equal to the character count threshold.
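Step 108 can be sketched as a partition of the fields by their visible-run length. The character count threshold is not specified in the text, so the value 6 below is purely illustrative. The field names and counts are also hypothetical.

```python
def split_detection_info(visible_counts: dict[str, int], char_threshold: int) -> tuple[list[str], list[str]]:
    """Step 108: fields whose visible run is shorter than the threshold become
    detection information; the rest are treated as manually set strings."""
    detection = [name for name, n in visible_counts.items() if n < char_threshold]
    excluded = [name for name, n in visible_counts.items() if n >= char_threshold]
    return detection, excluded


# Hypothetical counts; a server_name such as "update.example.com" has a run of 7.
counts = {"client.server_name": 7, "client.cipher_suites": 0, "client.supported_groups": 1}
detection, excluded = split_detection_info(counts, char_threshold=6)
print(detection)  # ['client.cipher_suites', 'client.supported_groups']
print(excluded)   # ['client.server_name']
```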

[0075] Step 110: Perform hash calculation on the detection information to obtain the negotiated family fingerprint of the detection information.

[0076] In step 110 above, the negotiated family fingerprint of the detection information is the hash value obtained after hashing the detection information.
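The text does not name the hash function, nor how the detection information is serialized before hashing. The sketch below follows the spirit of JA3-style TLS fingerprints (which MD5-hash a concatenation of ClientHello fields) and assumes the following:

- The fields are sorted by name.
- Each field is written as `name=hexvalue`.
- The fields are joined with `;`.
- The result is hashed with MD5.

All of these choices, and the function name, are assumptions.

```python
import hashlib


def negotiation_family_fingerprint(detection_info: dict[str, bytes]) -> str:
    """Step 110: hash the detection information into a fingerprint.

    Assumed serialisation: fields sorted by name, "name=hexvalue", joined by ";".
    Assumed hash: MD5, returned as a hex digest.
    """
    canonical = ";".join(f"{name}={value.hex()}" for name, value in sorted(detection_info.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


fp = negotiation_family_fingerprint({"cipher_suites": bytes.fromhex("c02bc02f"), "ec_point_formats": b"\x00"})
print(fp)  # a 32-character hex string
```

Sorting the fields makes the fingerprint independent of the order in which they were parsed. Any stable serialization would work, as long as the same one is used when building the malicious traffic fingerprint database.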

[0077] Step 112: Based on the negotiated family fingerprint of the detection information, determine the encrypted malicious traffic family to which the network traffic belongs.

[0078] To determine the family of encrypted malicious traffic to which the network traffic belongs, the following steps (1) to (2) can be performed:

[0079] (1) Using the negotiated family fingerprint of the detection information, a traversal operation is performed in the malicious traffic fingerprint database; wherein, the malicious traffic fingerprint database stores the correspondence between malicious traffic fingerprints and malicious traffic family names;

[0080] (2) When a malicious traffic fingerprint that is the same as the negotiation family fingerprint is found in the malicious traffic fingerprint database, it is determined that the network traffic belongs to the malicious traffic family indicated by the malicious traffic family name corresponding to the malicious traffic fingerprint that is the same as the negotiation family fingerprint of the detection information.
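Steps (1) and (2) amount to an exact-match search. The sketch below performs the traversal literally over a dictionary that maps malicious traffic fingerprints to family names. The fingerprints and family names shown are hypothetical placeholders.

```python
from typing import Optional


def identify_family(fingerprint: str, fingerprint_db: dict[str, str]) -> Optional[str]:
    """Traverse the database and return the family name whose stored malicious
    traffic fingerprint equals the negotiation family fingerprint."""
    for malicious_fp, family_name in fingerprint_db.items():
        if malicious_fp == fingerprint:
            return family_name
    return None  # no matching fingerprint in this database


# Hypothetical database entries (fingerprint -> family name).
fingerprint_db = {
    "0f1e2d3c4b5a69788796a5b4c3d2e1f0": "FamilyA",
    "00112233445566778899aabbccddeeff": "FamilyB",
}
print(identify_family("00112233445566778899aabbccddeeff", fingerprint_db))  # FamilyB
print(identify_family("ffffffffffffffffffffffffffffffff", fingerprint_db))  # None
```

Because the match is exact, a hash map lookup such as `fingerprint_db.get(fingerprint)` returns the same result in constant expected time. That may suit high-speed links better than a linear traversal.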

[0081] In step (1) above, the malicious traffic fingerprint database is built by extracting malicious traffic fingerprints from a large volume of SSL/TLS protocol traffic generated by malware with known family names. It stores the correspondence between these fingerprints and the malicious traffic family names.

[0082] The specific process of extracting malicious traffic fingerprints from SSL/TLS protocol traffic generated by malware with known family names is similar to the process described in steps 100 to 110 above, and will not be repeated here.
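Steps 102 to 112 can be combined into one pipeline. The sketch below builds a fingerprint database from labelled samples and then identifies new traffic. It assumes negotiation fields have already been parsed into a mapping from field name to raw bytes, so step 100 is out of scope. All names are hypothetical, and the sketch relies on these assumptions:

- Entropy is computed with a base-2 logarithm.
- Visible characters are ASCII letters and digits.
- The visible count is the length of the longest run.
- Fields are hashed with MD5 over a sorted `name=hexvalue` serialization.
- The thresholds 4.0 and 6 are illustrative only.

```python
import hashlib
import math
import re
from collections import Counter
from typing import Optional

# Illustrative thresholds (assumptions, not values taken from the text).
ENTROPY_THRESHOLD = 4.0
CHAR_THRESHOLD = 6
VISIBLE_RUN = re.compile(rb"[A-Za-z0-9]+")  # visible = ASCII letters and digits


def entropy(data: bytes) -> float:
    """Base-2 Shannon entropy of the byte-value distribution of data."""
    m = len(data)
    return -sum((x / m) * math.log2(x / m) for x in Counter(data).values()) if m else 0.0


def longest_visible_run(data: bytes) -> int:
    return max((len(run) for run in VISIBLE_RUN.findall(data)), default=0)


def fingerprint(fields: dict[str, bytes]) -> str:
    # Steps 102-104: drop high-entropy (randomly generated) fields.
    to_identify = {k: v for k, v in fields.items() if entropy(v) <= ENTROPY_THRESHOLD}
    # Steps 106-108: drop fields with long visible runs (manually set strings).
    detection = {k: v for k, v in to_identify.items() if longest_visible_run(v) < CHAR_THRESHOLD}
    # Step 110: hash a canonical serialisation of the detection information.
    canonical = ";".join(f"{k}={v.hex()}" for k, v in sorted(detection.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def build_fingerprint_db(samples: list[tuple[str, dict[str, bytes]]]) -> dict[str, str]:
    """Map fingerprint -> family name, from traffic of malware with known families."""
    return {fingerprint(fields): family for family, fields in samples}


def identify(fields: dict[str, bytes], db: dict[str, str]) -> Optional[str]:
    """Step 112: exact-match lookup of the negotiation family fingerprint."""
    return db.get(fingerprint(fields))


db = build_fingerprint_db([
    ("FamilyA", {
        "cipher_suites": bytes.fromhex("c02bc02f"),
        "random": bytes(range(32)),            # entropy 5.0 bits: removed
        "server_name": b"update.example.com",  # visible run of 7: removed
    }),
])
capture = {
    "cipher_suites": bytes.fromhex("c02bc02f"),
    "random": bytes(range(32, 64)),
    "server_name": b"cdn.example.net",
}
print(identify(capture, db))  # FamilyA
```

The captured traffic has a different random value and server name from the labelled sample. Both fields are filtered out, so only the cipher suites are hashed and the fingerprints match.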

[0083] In summary, this embodiment proposes a method for identifying encrypted malicious traffic families. The method works as follows:

- Among the multiple server-side and client-side negotiation phase information in network traffic, information whose information entropy is less than or equal to an information entropy threshold is identified as information to be identified.
- Information to be identified whose number of consecutive visible characters is less than a character count threshold is identified as detection information.
- A hash calculation is then performed on the detection information to obtain a negotiation family fingerprint.
- Based on this fingerprint, the encrypted malicious traffic family to which the network traffic belongs is determined.

Related technologies cannot identify families of encrypted malicious traffic in network traffic. This method uses information entropy to remove randomly generated strings from the server-side and client-side negotiation phase information, obtaining the information to be identified. It then checks the number of consecutive visible characters in that information. Information with a large number of consecutive visible characters is removed as manually set strings, leaving the detection information. Finally, the negotiation family fingerprint obtained by hashing the detection information is used to determine the encrypted malicious traffic family to which the network traffic belongs. This improves the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification.

[0084] Embodiment 2

[0085] This embodiment proposes an encrypted malicious traffic family identification device for executing the encrypted malicious traffic family identification method proposed in Embodiment 1 above.

[0086] Referring to the structural diagram shown in Figure 2, this embodiment proposes an encrypted malicious traffic family identification device, comprising:

[0087] The acquisition module 200 is used to acquire network traffic and parse out multiple server negotiation phase information and multiple client negotiation phase information carried in the network traffic;

[0088] Calculation module 202 is used to calculate the information entropy of multiple server-side negotiation phase information and multiple client-side negotiation phase information respectively;

[0089] The first determining module 204 is used to determine information whose information entropy is less than or equal to the information entropy threshold among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information as information to be identified;

[0090] The second determining module 206 is used to determine the number of consecutive visible characters in the information to be identified;

[0091] The third determining module 208 is used to determine, as detection information, the information to be identified whose number of consecutive visible characters is less than the character number threshold;

[0092] Hash calculation module 210 is used to perform hash calculation on the detection information to obtain the negotiated family fingerprint of the detection information;

[0093] The fourth determining module 212 is used to determine the encrypted malicious traffic family to which the network traffic belongs based on the negotiated family fingerprint of the detection information.

[0094] The calculation module 202 is used to calculate the information entropy of multiple server negotiation phase information, including:

[0095] Count the number of bytes in the strings contained in each of the multiple server negotiation phase information;

[0096] Count the number of occurrences of each byte value in the strings contained in the negotiation phase information of each server;

[0097] The probability of each byte value appearing in the strings contained in the negotiation phase information of each server is calculated using the following formula:

[0098] $$Q(y_i) = \frac{x_i}{m}$$

[0099] where $Q(y_i)$ is the probability that byte value $y_i$ occurs in the string contained in the server negotiation phase information; $x_i$ is the number of occurrences of byte value $y_i$ in that string; and $m$ is the number of bytes in that string.

[0100] Based on the probability of occurrence of each byte value in the strings contained in the obtained server negotiation phase information, the information entropy of each server negotiation phase information in the multiple server negotiation phase information is calculated.

[0101] The calculation module 202 is used to calculate the information entropy of each server negotiation stage information in multiple server negotiation stage information based on the probability of occurrence of each byte value in the string contained in each server negotiation stage information, including:

[0102] The information entropy of each server negotiation phase information in multiple server negotiation phase information is calculated using the following formula:

[0103] $$K(x) = -\sum_{i=0}^{z-1} Q(y_i)\,\log_2 Q(y_i)$$

[0104] where $K(x)$ is the information entropy of each piece of server negotiation phase information among the multiple server negotiation phase information, and $z = 256$ is the number of possible byte values; terms with $Q(y_i) = 0$ are taken as 0.

[0105] The fourth determining module 212 is specifically used for:

[0106] Using the negotiated family fingerprint of the detection information, a traversal operation is performed in the malicious traffic fingerprint database; wherein, the malicious traffic fingerprint database stores the correspondence between malicious traffic fingerprints and malicious traffic family names;

[0107] When a malicious traffic fingerprint identical to the negotiated family fingerprint of the detection information is found in the malicious traffic fingerprint database, the network traffic is determined to belong to the malicious traffic family indicated by the malicious traffic family name corresponding to that matching malicious traffic fingerprint.
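The traversal described above amounts to searching a table of (malicious traffic fingerprint, family name) pairs for an exact match. A minimal sketch follows; the function name, fingerprint strings, and family names are all hypothetical placeholders.

```python
from typing import Iterable, Optional, Tuple


def match_family(
    fingerprint: str,
    database: Iterable[Tuple[str, str]],
) -> Optional[str]:
    """Traverse (malicious_fingerprint, family_name) pairs and return the family
    whose fingerprint is identical to the given negotiated family fingerprint."""
    for malicious_fingerprint, family_name in database:
        if malicious_fingerprint == fingerprint:
            return family_name
    return None  # no identical fingerprint: no family is determined


# Hypothetical database entries, for illustration only.
db = [
    ("0a1b2c3d", "FamilyA"),
    ("4e5f6a7b", "FamilyB"),
]
print(match_family("4e5f6a7b", db))  # FamilyB
print(match_family("ffffffff", db))  # None
```

A linear traversal mirrors the wording of the text; for a large database, a `dict` keyed by fingerprint would give the same result with average constant-time lookups, provided each fingerprint maps to a single family.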

[0108] In summary, this embodiment proposes a device for identifying encrypted malicious traffic families. Among the multiple server-side and client-side negotiation phase information in network traffic, the device identifies the information whose information entropy is less than or equal to an information entropy threshold as information to be identified, and identifies the information to be identified whose number of consecutive visible characters is less than a character count threshold as detection information. It then performs a hash calculation on the detection information to obtain a negotiated family fingerprint and, based on this fingerprint, determines the encrypted malicious traffic family to which the network traffic belongs. Related technologies cannot identify encrypted malicious traffic families in network traffic. By contrast, this device uses information entropy to remove randomly generated strings from the server-side and client-side negotiation phase information, treats information to be identified that contains a large number of consecutive visible characters as manually set strings and removes it, and uses the negotiated family fingerprint obtained by hashing the remaining detection information to determine the family, thereby improving the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification.
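The two filtering stages summarized above (entropy threshold, then consecutive visible characters) can be sketched in Python as follows. This is a minimal illustration under assumptions the text does not state: each negotiation phase field is a byte string, entropy uses a base-2 logarithm, "visible characters" means printable ASCII bytes 0x21 to 0x7E, "number of consecutive visible characters" means the longest such run, and all function names and threshold values are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(data: bytes) -> float:
    """Shannon entropy in bits per byte (base-2 log is an assumption)."""
    m = len(data)
    if m == 0:
        return 0.0
    return -sum((c / m) * math.log2(c / m) for c in Counter(data).values())


def longest_visible_run(data: bytes) -> int:
    """Length of the longest run of printable ASCII bytes 0x21-0x7E
    (treating these as 'visible characters' is an assumption)."""
    longest = current = 0
    for b in data:
        if 0x21 <= b <= 0x7E:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def select_detection_info(
    fields: list[bytes],
    entropy_threshold: float,
    char_threshold: int,
) -> list[bytes]:
    # Keep low-entropy fields, dropping randomly generated strings.
    to_identify = [f for f in fields if entropy_bits(f) <= entropy_threshold]
    # Drop fields with long visible runs, treated as manually set strings.
    return [f for f in to_identify if longest_visible_run(f) < char_threshold]


fields = [bytes(range(32)), b"example-hostname.test", b"\x03\x03\x00\x2f\x00\x35"]
detection = select_detection_info(fields, entropy_threshold=4.0, char_threshold=8)
print(len(detection))  # 1
```

With these hypothetical thresholds, the 32 distinct bytes (5.0 bits) are dropped as random, `b"example-hostname.test"` passes the entropy test but is dropped for its 21-character visible run, and only the short binary field survives as detection information.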

[0109] Embodiment 3

[0110] This embodiment proposes a computer-readable storage medium storing a computer program. When the computer program is run by a processor, it executes the steps of the encrypted malicious traffic family identification method described in Embodiment 1 above. For a detailed implementation, please refer to Method Embodiment 1, which will not be repeated here.

[0111] In addition, referring to Figure 3, which shows the structure of an electronic device, this embodiment also proposes an electronic device that includes a bus 51, a processor 52, a transceiver 53, a bus interface 54, a memory 55, and a user interface 56.

[0112] In this embodiment, the electronic device further includes one or more programs that are stored in the memory 55, are executable on the processor 52, and are configured to perform the following steps (1) to (7) when executed by the processor:

[0113] (1) Obtain network traffic and parse out multiple server negotiation phase information and multiple client negotiation phase information carried by the network traffic;

[0114] (2) Calculate the information entropy of multiple server-side negotiation phase information and multiple client-side negotiation phase information respectively;

[0115] (3) Identify, as information to be identified, the information among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information whose information entropy is less than or equal to the information entropy threshold;

[0116] (4) Determine the number of consecutive visible characters in the information to be identified;

[0117] (5) Determine the information to be identified as detection information if its number of consecutive visible characters is less than the character number threshold;

[0118] (6) Perform hash calculation on the detection information to obtain the negotiated family fingerprint of the detection information;

[0119] (7) Based on the negotiated family fingerprint of the detection information, determine the encrypted malicious traffic family to which the network traffic belongs.
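Steps (6) and (7) can be sketched as hashing the detection information into a fixed-length fingerprint and looking it up. The text does not specify the hash function or how multiple pieces of detection information are combined, so this sketch assumes each field is hex-encoded, the fields are joined with commas in parsing order, and the result is hashed with MD5 (chosen only for illustration). The function names and the family name are hypothetical.

```python
import hashlib
from typing import Optional


def negotiated_family_fingerprint(detection_info: list[bytes]) -> str:
    """Step (6): hash the detection information into a fingerprint.

    Assumptions (not specified in the text): fields are hex-encoded,
    joined with ',' in the order they were parsed, and hashed with MD5.
    """
    canonical = ",".join(field.hex() for field in detection_info)
    return hashlib.md5(canonical.encode("ascii")).hexdigest()


def identify_family(
    detection_info: list[bytes], fingerprint_db: dict[str, str]
) -> Optional[str]:
    """Step (7): look up the fingerprint; return the family name or None."""
    return fingerprint_db.get(negotiated_family_fingerprint(detection_info))


# Usage: build a hypothetical one-entry database from a known sample, then match.
sample = [b"\x03\x03", b"\x00\x2f\x00\x35"]
db = {negotiated_family_fingerprint(sample): "FamilyA"}
print(identify_family(sample, db))         # FamilyA
print(identify_family([b"\x03\x04"], db))  # None
```

Hex-encoding each field before joining keeps the comma separator unambiguous, so two different field lists cannot produce the same canonical string.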

[0120] Transceiver 53 is used to receive and send data under the control of processor 52.

[0121] The bus architecture (represented by bus 51) can include any number of interconnected buses and bridges, linking various circuits including one or more processors represented by processor 52 and memory represented by memory 55. Bus 51 can also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore will not be further described in this embodiment. Bus interface 54 provides an interface between bus 51 and transceiver 53. Transceiver 53 can be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. For example, transceiver 53 receives external data from other devices and transmits data processed by processor 52 to other devices. Depending on the nature of the computing system, a user interface 56 may also be provided, such as a keypad, display, speaker, microphone, or joystick.

[0122] Processor 52 is responsible for managing bus 51 and general processing, such as running a general-purpose operating system as described above. Memory 55 can be used to store data used by processor 52 during operation.

[0123] Optionally, the processor 52 may be, but is not limited to, a central processing unit, a microcontroller, a microprocessor, or a programmable logic device.

[0124] It is understood that the memory 55 in the embodiments of the present invention can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 55 of the systems and methods described in this embodiment is intended to include, but is not limited to, these and any other suitable types of memory.

[0125] In some implementations, memory 55 stores elements such as executable modules or data structures, or subsets thereof, or extended sets thereof: operating system 551 and application programs 552.

[0126] The operating system 551 includes various system programs, such as the framework layer, core library layer, and driver layer, used to implement various basic business functions and handle hardware-based tasks. The application program 552 includes various applications, such as a media player and a browser, used to implement various application functions. The program implementing the method of this embodiment can be included in the application program 552.

[0127] In summary, this embodiment proposes a computer-readable storage medium and an electronic device. Among the multiple server-side and client-side negotiation phase information in network traffic, the information whose information entropy is less than or equal to an information entropy threshold is identified as information to be identified, and the information to be identified whose number of consecutive visible characters is less than a character count threshold is identified as detection information. A hash calculation is then performed on the detection information to obtain a negotiated family fingerprint, and based on this fingerprint the encrypted malicious traffic family to which the network traffic belongs is determined. Related technologies cannot identify families of encrypted malicious traffic in network traffic. By contrast, this method uses information entropy to remove randomly generated strings from the server-side and client-side negotiation phase information, treats information to be identified that contains a large number of consecutive visible characters as manually set strings and removes it, and uses the negotiated family fingerprint obtained by hashing the remaining detection information to determine the family, thereby improving the efficiency and accuracy of encrypted traffic analysis and malicious traffic identification.

[0128] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for identifying encrypted malicious traffic families, characterized by comprising: obtaining network traffic and parsing out multiple server-side negotiation phase information and multiple client-side negotiation phase information carried in the network traffic; calculating the information entropy of the multiple server-side negotiation phase information and of the multiple client-side negotiation phase information, respectively; wherein calculating the information entropy of the multiple server-side negotiation phase information comprises: counting the number of bytes in the string contained in each of the multiple server negotiation phase information; counting the number of occurrences of each byte value in the string contained in each server negotiation phase information; calculating the probability of each byte value appearing in the string contained in each server negotiation phase information using the following formula: $Q(y_i) = \frac{x_i}{m}$, where $Q(y_i)$ denotes the probability that byte value $y_i$ appears in the string contained in the server negotiation phase information, $x_i$ denotes the number of occurrences of that byte value in the string, and $m$ denotes the number of bytes in the string; and calculating, based on the obtained probability of occurrence of each byte value in the string contained in each server negotiation phase information, the information entropy of each server negotiation phase information among the multiple server negotiation phase information; identifying, as information to be identified, the information among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information whose information entropy is less than or equal to an information entropy threshold, so as to remove randomly generated strings from the multiple server-side negotiation phase information and the multiple client-side negotiation phase information; determining the number of consecutive visible characters in the information to be identified; identifying the information to be identified whose number of consecutive visible characters is less than a character number threshold as detection information, and identifying the information to be identified whose number of consecutive visible characters is greater than or equal to the character number threshold as a manually set string, such strings being excluded from the detection information; hashing the detection information to obtain the negotiated family fingerprint of the detection information; and determining, based on the negotiated family fingerprint of the detection information, the encrypted malicious traffic family to which the network traffic belongs; wherein determining the encrypted malicious traffic family to which the network traffic belongs based on the negotiated family fingerprint of the detection information comprises: performing a traversal operation in a malicious traffic fingerprint database using the negotiated family fingerprint of the detection information, wherein the malicious traffic fingerprint database stores the correspondence between malicious traffic fingerprints and malicious traffic family names; and when a malicious traffic fingerprint identical to the negotiated family fingerprint is found in the malicious traffic fingerprint database, determining that the network traffic belongs to the malicious traffic family indicated by the malicious traffic family name corresponding to the malicious traffic fingerprint identical to the negotiated family fingerprint of the detection information.

2. The method according to claim 1, characterized in that calculating, based on the obtained probability of occurrence of each byte value in the string contained in each server negotiation phase information, the information entropy of each server negotiation phase information among the multiple server negotiation phase information comprises: calculating the information entropy using the following formula: $K(x) = -\sum_{i=1}^{z} Q(y_i)\log Q(y_i)$, where $K(x)$ denotes the information entropy of each server negotiation phase information among the multiple server negotiation phase information, and $z$ is 256.

3. A device for identifying encrypted malicious traffic families, characterized by comprising: an acquisition module, configured to acquire network traffic and parse out multiple server negotiation phase information and multiple client negotiation phase information carried in the network traffic; a calculation module, configured to calculate the information entropy of the multiple server-side negotiation phase information and of the multiple client-side negotiation phase information, respectively; wherein calculating the information entropy of the multiple server-side negotiation phase information comprises: counting the number of bytes in the string contained in each of the multiple server negotiation phase information; counting the number of occurrences of each byte value in the string contained in each server negotiation phase information; calculating the probability of each byte value appearing in the string contained in each server negotiation phase information using the following formula: $Q(y_i) = \frac{x_i}{m}$, where $Q(y_i)$ denotes the probability that byte value $y_i$ appears in the string contained in the server negotiation phase information, $x_i$ denotes the number of occurrences of that byte value in the string, and $m$ denotes the number of bytes in the string; and calculating, based on the obtained probability of occurrence of each byte value in the string contained in each server negotiation phase information, the information entropy of each server negotiation phase information among the multiple server negotiation phase information; a first determining module, configured to determine, as information to be identified, the information among the multiple server-side negotiation phase information and the multiple client-side negotiation phase information whose information entropy is less than or equal to an information entropy threshold, so as to remove randomly generated strings from the multiple server-side negotiation phase information and the multiple client-side negotiation phase information; a second determining module, configured to determine the number of consecutive visible characters in the information to be identified; a third determining module, configured to determine the information to be identified whose number of consecutive visible characters is less than a character number threshold as detection information, to determine the information to be identified whose number of consecutive visible characters is greater than or equal to the character number threshold as a manually set string, and to exclude such strings from the detection information; a hash calculation module, configured to perform a hash calculation on the detection information to obtain the negotiated family fingerprint of the detection information; and a fourth determining module, configured to determine, based on the negotiated family fingerprint of the detection information, the encrypted malicious traffic family to which the network traffic belongs; wherein the fourth determining module is specifically configured to: perform a traversal operation in a malicious traffic fingerprint database using the negotiated family fingerprint of the detection information, wherein the malicious traffic fingerprint database stores the correspondence between malicious traffic fingerprints and malicious traffic family names; and when a malicious traffic fingerprint identical to the negotiated family fingerprint is found in the malicious traffic fingerprint database, determine that the network traffic belongs to the malicious traffic family indicated by the malicious traffic family name corresponding to the malicious traffic fingerprint identical to the negotiated family fingerprint of the detection information.

4. The device according to claim 3, characterized in that the calculation module calculates the information entropy of each server negotiation phase information among the multiple server negotiation phase information, based on the probability of occurrence of each byte value in the string contained in each server negotiation phase information, using the following formula: $K(x) = -\sum_{i=1}^{z} Q(y_i)\log Q(y_i)$, where $K(x)$ denotes the information entropy of each server negotiation phase information among the multiple server negotiation phase information, and $z$ is 256.

5. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1-2.

6. An electronic device, characterized in that the electronic device comprises a memory, a processor, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method according to any one of claims 1-2.