Encrypted traffic identification method and apparatus, storage medium, and electronic device
By differentiating the terminal types of encrypted traffic and using different application classification models, the problem of low accuracy in encrypted traffic identification is solved, achieving refined analysis and efficient identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA TELECOM CORP LTD
- Filing Date
- 2022-04-25
- Publication Date
- 2026-06-23
AI Technical Summary
In existing technologies, the accuracy of identifying encrypted traffic is low. Traditional methods have a low recognition rate when faced with HTTPS session reuse and network quality fluctuations, making it difficult to distinguish encrypted traffic from different applications.
Encrypted traffic is forwarded to the first packet parsing device via a splitter to determine the terminal type, and then forwarded to the corresponding second packet parsing device. The encrypted traffic is classified into applications using a pre-trained application classification model, and a terminal type library and application classification model are constructed using machine learning algorithms.
It enables refined analysis of encrypted traffic, improves the accuracy of encrypted traffic identification, and meets the needs of log retention, business awareness, and information security management.
Figure CN117014156B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of communication technology, and more specifically, to an encrypted traffic identification method, an encrypted traffic identification device, a computer-readable storage medium, and an electronic device.
Background
[0002] With the development of internet technology, more and more applications and scenarios use encrypted traffic. Encrypted traffic may be based on general-purpose encryption protocols such as HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) and QUIC (Quick UDP Internet Connections), or on proprietary encryption protocols; HTTPS is the predominant encryption protocol.
[0003] Taking HTTPS traffic as an example, there are currently two main methods for identifying it. The first identifies traffic based on the certificate chain; however, HTTPS session reuse is very common, and when a session is reused the certificate chain is not exchanged again, so this method fails. The second identifies traffic based on the statistical characteristics of flows; however, these statistics are strongly affected by network quality, and the statistics of different applications may not be clearly distinguishable, resulting in a high misidentification rate.
[0004] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0005] This disclosure provides an encrypted traffic identification method, an encrypted traffic identification device, a computer-readable storage medium, and an electronic device, thereby overcoming, to at least a certain extent, the problem of low accuracy in identifying encrypted traffic due to limitations in related technologies.
[0006] According to a first aspect of this disclosure, an encrypted traffic identification method is provided, applied to an encrypted traffic identification system, the encrypted traffic identification system including a traffic splitter, a first packet parsing device, and a second packet parsing device, the method comprising:
[0007] The target encrypted traffic is forwarded to the first packet parsing device via the traffic splitter;
[0008] The first packet parsing device determines the terminal type of the target encrypted traffic and forwards the target encrypted traffic to the second packet parsing device corresponding to that terminal type;
[0009] The second packet parsing device uses a pre-trained application classification model to classify the target encrypted traffic by application, thereby obtaining the application type of the target encrypted traffic.
[0010] In one exemplary embodiment of this disclosure, when forwarding target encrypted traffic to the first packet parsing device via the traffic splitter, the method further includes:
[0011] The splitter forwards the interface information associated with the target encrypted traffic to the first packet parsing device.
[0012] In an exemplary embodiment of this disclosure, determining the terminal type of the target encrypted traffic through the first packet parsing device and forwarding the target encrypted traffic to a second packet parsing device corresponding to the terminal type of the target encrypted traffic includes:
[0013] The first packet parsing device extracts the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic;
[0014] The terminal type of the target encrypted traffic is determined based on the five-tuple information and the associated interface information;
[0015] The terminal type of the target encrypted traffic is matched against the terminal types in a terminal type library, and the target encrypted traffic is forwarded to the second packet parsing device corresponding to the matched terminal type.
[0016] In one exemplary embodiment of this disclosure, the pre-trained application classification model includes multiple application classification models corresponding to the terminal type of the target encrypted traffic; the step of classifying the target encrypted traffic by the second packet parsing device using the pre-trained application classification model to obtain the application type of the target encrypted traffic includes:
[0017] The second packet parsing device classifies the target encrypted traffic using the multiple application classification models corresponding to its terminal type, obtaining multiple application prediction values;
[0018] The application type of the target encrypted traffic is determined based on the multiple application prediction values.
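Paragraph [0018] does not say how the multiple prediction values are combined. One plausible reading, assumed here and not stated in the source, is that each per-application model outputs a score for "this flow belongs to my application" and the highest score is selected if it clears a threshold. The function name `select_application` and the default threshold are hypothetical.

```python
from typing import Mapping, Optional


def select_application(
    predictions: Mapping[str, float], threshold: float = 0.5
) -> Optional[str]:
    """Pick one application type from several per-application prediction values.

    Assumption: each model returns a score in [0, 1]; the largest score wins,
    but only if it reaches `threshold` (otherwise the flow stays unidentified).
    """
    if not predictions:
        return None
    # Choose the application whose model produced the largest prediction value.
    best_app, best_score = max(predictions.items(), key=lambda item: item[1])
    return best_app if best_score >= threshold else None


# Three models for mobile terminal 1 (video applications A, B and C).
print(select_application({"video_app_A": 0.91, "video_app_B": 0.07, "video_app_C": 0.02}))
```

A threshold keeps low-confidence flows out of the statistics instead of forcing them into the nearest class; whether the patented system does this is not specified.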
[0019] In one exemplary embodiment of this disclosure, after the second packet parsing device classifies the target encrypted traffic using a pre-trained application classification model to obtain its application type, the method further includes:
[0020] The second packet parsing device filters the target encrypted traffic and generates a call detail record (CDR) file in a preset format, the CDR file containing the application type of the target encrypted traffic.
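The "preset format" of the CDR file is not given. The sketch below assumes, purely for illustration, a pipe-delimited line holding the five-tuple, the terminal type and the application type; the field set, field order, and the names `FlowRecord` and `to_cdr_line` are hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One detail record for a classified encrypted flow (hypothetical layout)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    terminal_type: str
    app_type: str


def to_cdr_line(record: FlowRecord, sep: str = "|") -> str:
    # Serialize fields in a fixed order so downstream systems can parse them.
    return sep.join(str(value) for value in astuple(record))


rec = FlowRecord("10.0.0.8", 51324, "203.0.113.5", 443, "TCP",
                 "mobile_terminal_1", "video_app_A")
print(to_cdr_line(rec))
```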
[0021] In one exemplary embodiment of this disclosure, the method further includes: obtaining a training set, the training set including encrypted traffic of multiple terminals and the application types corresponding to that encrypted traffic;
[0022] The traffic splitter forwards the encrypted traffic of each of the multiple terminals, together with the corresponding application types, to the second packet parsing device corresponding to that terminal;
[0023] Based on the encrypted traffic of each terminal and its corresponding application types, the second packet parsing device corresponding to that terminal trains a machine learning classification algorithm to obtain the application classification model for that terminal;
[0024] The terminal type library is constructed from the application classification models corresponding to the multiple terminals.
[0025] In one exemplary embodiment of this disclosure, the machine learning classification algorithm includes any one of the following: a Bayesian classifier, a decision tree, or a support vector machine.
[0026] According to a second aspect of this disclosure, an encrypted traffic identification device is provided, comprising:
[0027] an encrypted traffic forwarding module, configured to forward target encrypted traffic to the first packet parsing device through the traffic splitter;
[0028] a terminal type determination module, configured to determine the terminal type of the target encrypted traffic through the first packet parsing device and to forward the target encrypted traffic to the second packet parsing device corresponding to that terminal type;
[0029] an application type determination module, configured to classify the target encrypted traffic by application through the second packet parsing device, using a pre-trained application classification model, to obtain the application type of the target encrypted traffic.
[0030] According to a third aspect of this disclosure, an encrypted traffic identification system is provided, the encrypted traffic identification system comprising a traffic splitter, a first packet parsing device, and a second packet parsing device, wherein,
[0031] The traffic splitter is configured to forward the target encrypted traffic to the first packet parsing device;
[0032] The first packet parsing device is configured to determine the terminal type of the target encrypted traffic and forward the target encrypted traffic to the second packet parsing device corresponding to that terminal type;
[0033] The second packet parsing device is configured to classify the target encrypted traffic using a pre-trained application classification model to obtain the application type of the target encrypted traffic; to filter the target encrypted traffic and generate a call detail record (CDR) file in a preset format, wherein the CDR file contains the application type of the target encrypted traffic; and to train an application classification model corresponding to each of a plurality of terminals using a machine learning classification algorithm.
[0034] According to a fourth aspect of this disclosure, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, implements any of the methods described above.
[0035] According to a fifth aspect of this disclosure, an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform any of the methods described above by executing the executable instructions.
[0036] The exemplary embodiments disclosed herein may have some or all of the following beneficial effects:
[0037] In the encrypted traffic identification method provided in the exemplary embodiments of this disclosure, target encrypted traffic is forwarded to a first packet parsing device via a traffic splitter; the first packet parsing device determines the terminal type of the target encrypted traffic and forwards the target encrypted traffic to a second packet parsing device corresponding to the terminal type of the target encrypted traffic; the second packet parsing device uses a pre-trained application classification model to classify the target encrypted traffic by application, thereby obtaining the application type of the target encrypted traffic. This disclosure achieves refined analysis of encrypted traffic by distinguishing the terminal type of encrypted traffic and using different application classification models to identify encrypted traffic of different terminal types, thus improving the accuracy of encrypted traffic identification.
[0038] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
Brief Description of the Drawings
[0039] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.
[0040] Figure 1 is a schematic diagram of an exemplary system architecture to which the encrypted traffic identification method and apparatus of embodiments of the present disclosure can be applied;
[0041] Figure 2 is a schematic flowchart of an encrypted traffic identification method according to an embodiment of the present disclosure;
[0042] Figure 3 is a schematic flowchart of determining the terminal type of encrypted traffic according to an embodiment of the present disclosure;
[0043] Figure 4 is a schematic flowchart of an encrypted traffic identification method according to another embodiment of the present disclosure;
[0044] Figure 5 is a schematic block diagram of an encrypted traffic identification device according to an embodiment of the present disclosure;
[0045] Figure 6 is a schematic structural diagram of a computer system suitable for implementing the embodiments of the present disclosure.
Detailed Description
[0046] Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided to make this disclosure more comprehensive and complete, and to fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of embodiments of this disclosure. However, those skilled in the art will recognize that the technical solutions of this disclosure can be practiced with one or more of the specific details omitted, or other methods, components, apparatus, steps, etc., can be employed. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring various aspects of this disclosure.
[0047] Furthermore, the accompanying drawings are merely illustrative of this disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted. Some block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.
[0048] Figure 1 is a schematic diagram of the system architecture of an exemplary application environment in which the encrypted traffic identification method and apparatus of embodiments of the present disclosure can be applied.
[0049] As shown in Figure 1, the encrypted traffic identification system 100 may include one or more terminal devices 101, a network 102, a collector 103, a first packet parsing device 104, and a second packet parsing device 105. The terminal devices 101 may be various electronic devices on which applications of different types are installed, including but not limited to smartphones and tablets; the terminal devices 101 may also be IoT terminal devices, such as... The network 102 is the medium that provides communication links between the terminal devices 101 and the collector 103, the first packet parsing device 104, and the second packet parsing device 105, and may include various connection types, such as wired or wireless communication links or fiber optic cables. The collector 103 can aggregate small traffic volumes, split large traffic volumes, and report message types and message summaries. It also has simple mirroring capabilities, allowing it to copy traffic based on network layer information (such as source IP address, source port number, destination IP address, destination port number, and protocol type) and provide it to upper-layer applications. An upper-layer application may be the first packet parsing device 104, the second packet parsing device 105, a security analysis system, or the like; this disclosure does not specifically limit it. In the exemplary embodiments of this disclosure, the collector 103 serves as the traffic splitter and can be used to forward the target encrypted traffic to the first packet parsing device 104. The first packet parsing device 104 can be a DPI (Deep Packet Inspection) device, used to determine the terminal type of the target encrypted traffic and forward the target encrypted traffic to the second packet parsing device 105 corresponding to that terminal type.
The second packet parsing device 105 can also be a DPI device, used to classify the target encrypted traffic with a pre-trained application classification model to obtain its application type; to filter the target encrypted traffic and generate a call detail record (CDR) file in a preset format; to train an application classification model for each terminal using a machine learning classification algorithm; and so on. It should be understood that the numbers of terminal devices 101, networks 102, collectors 103, first packet parsing devices 104, and second packet parsing devices 105 shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, collectors, first packet parsing devices, and second packet parsing devices may be included.
[0050] The encrypted traffic identification method provided in the exemplary embodiments of this disclosure is generally executed by a server comprising the collector 103, the first packet parsing device 104, and the second packet parsing device 105; correspondingly, the encrypted traffic identification device is generally deployed in this server. For example, the collector 103 forwards the collected target encrypted traffic to the first packet parsing device 104. After determining the terminal type of the target encrypted traffic, the first packet parsing device 104 forwards the traffic to the second packet parsing device 105 corresponding to that terminal type. The second packet parsing device 105 then classifies the target encrypted traffic by application using a pre-trained application classification model, obtains its application type, and records the application type in the generated call detail record (CDR) file. Finally, the CDR file can be sent to the terminal device 101 as a statistical report for display to the user.
[0051] The technical solutions of the embodiments of this disclosure are described in detail below:
[0052] Currently, to meet business monitoring needs such as internet access log retention and business awareness, as well as information security management needs, a unified DPI system can be deployed in the mobile core network. Acting as a distributed probe for big data collection, the unified DPI system can capture internet access data from various network nodes and, through big data processing, perform complex correlation analysis, traffic modeling, and routing and quality analysis; for example, it can analyze plaintext traffic using payload-signature recognition methods.
[0053] However, with the widespread use of encrypted traffic, traditional DPI (Deep Packet Inspection) technology cannot identify it accurately. For example, traditional DPI technology uses machine learning algorithms to model and analyze the statistical characteristics of encrypted traffic. For the same application (such as video application A), however, the statistical characteristics of its encrypted traffic differ significantly across mobile terminals (or IoT terminals). A single model is therefore insufficient to capture how one application manifests on different mobile terminals (or IoT terminals), which reduces the accuracy of encrypted traffic identification.
[0054] To address one or more of the above problems, this exemplary embodiment provides an encrypted traffic identification method that improves identification accuracy by distinguishing the terminal type of encrypted traffic and using different application classification models to identify encrypted traffic from different terminal types. The method can be applied to an encrypted traffic identification system, which may include a traffic splitter, a first packet parsing device, and a second packet parsing device. Referring to Figure 2, the method may include the following steps S210 to S230:
[0055] Step S210. Forward the target encrypted traffic to the first packet parsing device through the traffic splitter;
[0056] Step S220. Determine the terminal type of the target encrypted traffic through the first packet parsing device, and forward the target encrypted traffic to the second packet parsing device corresponding to that terminal type;
[0057] Step S230. Classify the target encrypted traffic by application through the second packet parsing device using a pre-trained application classification model, to obtain the application type of the target encrypted traffic.
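Steps S210 to S230 can be sketched as a small routing pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: a flow is modelled as a list of parsed packets, terminal-type resolution and the per-terminal classifiers are passed in as plain functions, and the class names `FirstLevelParser` and `SecondLevelParser` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A flow is a list of parsed packets (dicts); a classifier maps a flow to an
# application type. Both shapes are illustrative assumptions.
Flow = List[dict]
Classifier = Callable[[Flow], str]


class SecondLevelParser:
    """Second packet parsing device for one terminal type (step S230)."""

    def __init__(self, classify: Classifier) -> None:
        self.classify = classify

    def handle(self, flow: Flow) -> str:
        # Apply the pre-trained model(s) for this terminal type.
        return self.classify(flow)


class FirstLevelParser:
    """First packet parsing device: resolve terminal type, then route (step S220)."""

    def __init__(self, resolve_terminal: Callable[[Flow], str],
                 second_level: Dict[str, SecondLevelParser]) -> None:
        self.resolve_terminal = resolve_terminal
        self.second_level = second_level

    def handle(self, flow: Flow) -> Optional[str]:
        terminal_type = self.resolve_terminal(flow)
        device = self.second_level.get(terminal_type)
        # No matching second-level device: leave the flow unclassified.
        return device.handle(flow) if device is not None else None


def splitter_forward(flow: Flow, first_level: FirstLevelParser) -> Optional[str]:
    """Traffic splitter (step S210): hand the flow to the first-level device."""
    return first_level.handle(flow)


# Usage with stub models standing in for trained classifiers.
second = {
    "terminal_1": SecondLevelParser(lambda flow: "video_app_A"),
    "terminal_2": SecondLevelParser(lambda flow: "video_app_B"),
}
first = FirstLevelParser(lambda flow: flow[0]["terminal"], second)
print(splitter_forward([{"terminal": "terminal_2"}], first))  # video_app_B
```

Returning `None` for an unknown terminal type is a design choice of this sketch; the source does not describe how unmatched terminals are handled.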
[0058] In the encrypted traffic identification method provided in the exemplary embodiments of this disclosure, target encrypted traffic is forwarded to a first packet parsing device via a traffic splitter; the first packet parsing device determines the terminal type of the target encrypted traffic and forwards the target encrypted traffic to a second packet parsing device corresponding to the terminal type of the target encrypted traffic; the second packet parsing device uses a pre-trained application classification model to classify the target encrypted traffic by application, thereby obtaining the application type of the target encrypted traffic. This disclosure achieves refined analysis of encrypted traffic by distinguishing the terminal type of encrypted traffic and using different application classification models to identify encrypted traffic of different terminal types, thus improving the accuracy of encrypted traffic identification.
[0059] The steps described above in this example implementation will now be explained in more detail.
[0060] In step S210, the target encrypted traffic is forwarded to the first packet parsing device through the traffic splitter.
[0061] In the exemplary embodiments of this disclosure, HTTPS traffic is used as an example. The target encrypted traffic may be the HTTPS traffic of a mobile terminal (or IoT terminal) running a specific application, for example the HTTPS traffic of mobile terminal 1 running video application A. It may also be the HTTPS traffic of multiple mobile terminals (or IoT terminals) running different applications simultaneously; this disclosure does not specifically limit this. Applications may be of different application types. An application type may be a first-level category, such as video, game, or social communication applications, or a second-level category, such as video application A, video application B, or video application C.
[0062] A traffic splitter is a device installed between a production network mirror port and an analytics device cluster. It aggregates traffic mirrored from one or more production network devices and distributes it to one or more data analytics devices. In this example embodiment, the data analytics device can be a packet parsing device, such as a DPI device. Furthermore, this example embodiment deploys two levels of packet parsing devices to accurately identify HTTPS traffic: a first packet parsing device and a second packet parsing device. Correspondingly, the first packet parsing device can be a first-level DPI device, and the second packet parsing device can be a second-level DPI device.
[0063] For example, after HTTPS traffic from multiple mobile terminals running different applications is input into the traffic splitter, the splitter can aggregate, filter, and replicate it. HTTPS traffic consists of data packets, each identified by its five-tuple information; correspondingly, the HTTPS traffic of one operation can be saved in a message, and each message can record the five-tuple information, application name, application type, and so on. The five-tuple information includes the source IP address, source port number, destination IP address, destination port number, and protocol type. For instance, the splitter can replicate HTTPS traffic based on network layer information (such as the five-tuple fields above) and provide it to the first-level DPI device. The first-level DPI device then parses out the terminal type of the HTTPS traffic, and the second-level DPI device identifies the application type of the HTTPS traffic using the models for that terminal type, achieving refined analysis of the HTTPS traffic. It should be noted that the splitter can forward HTTPS traffic to the first-level DPI device according to the same-source, same-destination principle, ensuring that all data packets of the same session, or all data packets of the same user, are output from the same interface.
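The same-source, same-destination principle requires that both directions of a session leave the splitter on the same output interface. A common way to achieve this, assumed here and not taken from the source, is a symmetric hash of the five-tuple: sorting the two (IP, port) endpoints makes the key independent of direction. The function name `select_output_port` is hypothetical.

```python
import hashlib
from typing import Tuple

FiveTuple = Tuple[str, int, str, int, str]  # (src_ip, src_port, dst_ip, dst_port, proto)


def select_output_port(ft: FiveTuple, num_ports: int) -> int:
    """Map a flow to a splitter output port so both directions share a port."""
    src_ip, src_port, dst_ip, dst_port, proto = ft
    # Order the endpoints so (A -> B) and (B -> A) produce the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}-{b[0]}:{b[1]}-{proto}".encode()
    # Use a stable digest; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_ports


uplink = ("10.0.0.8", 51324, "203.0.113.5", 443, "TCP")
downlink = ("203.0.113.5", 443, "10.0.0.8", 51324, "TCP")
print(select_output_port(uplink, 4) == select_output_port(downlink, 4))  # True
```

Keying on the subscriber IP address alone, instead of the full five-tuple, would instead keep all packets of the same user on one interface, which corresponds to the second variant mentioned above.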
[0064] When forwarding the target encrypted traffic to the first packet parsing device, the traffic splitter can also forward the interface information associated with that traffic. For example, the interface information associated with HTTPS traffic can be forwarded to the first-level DPI device. This interface information can include the source IP address of the UPF (User Plane Function) uplink traffic, the gtpTe-ID (which identifies the GTP tunnel), and the pei (which identifies the terminal type). Specifically, the source IP address and gtpTe-ID of the UPF uplink traffic are interface information of the core network N3 interface, while the gtpTe-ID and pei are interface information of the core network N11 interface. The source IP address of the UPF uplink traffic associates the encrypted traffic with the N3 interface, and the gtpTe-ID associates the N3 interface with the N11 interface. The first-level DPI device can therefore determine the terminal type of the HTTPS traffic from the source IP address, gtpTe-ID, and pei. Here, the N3 interface is the communication interface between the UPF network element and the base station, and the N11 interface is the communication interface between the AMF (Access and Mobility Management Function) network element and the SMF (Session Management Function) network element.
[0065] In this example, identifying the application type of encrypted traffic in the mobile core network based on source-side (terminal) characteristics can improve the accuracy of fine-grained encrypted application identification by DPI devices, effectively serving business monitoring needs such as log retention and business awareness, as well as security management needs such as mobile malware detection.
[0066] In step S220, the terminal type of the target encrypted traffic is determined by the first packet parsing device, and the target encrypted traffic is forwarded to the second packet parsing device corresponding to that terminal type.
[0067] Network traffic generally includes both plaintext and encrypted traffic, so the traffic that the first packet parsing device receives from the splitter may contain both. For plaintext traffic, the first-level DPI device can parse the traffic directly and generate call detail records in a preset format, such as XDRs. For encrypted traffic, a terminal type library can be created in advance, before the first-level DPI device parses and identifies the traffic, so that the terminal type of the encrypted traffic can be determined from this library.
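The source does not say how the first-level DPI device separates plaintext from encrypted traffic. As one hedged illustration, a payload can be tested for a TLS record header: one content-type byte (20 to 23), followed by a protocol version whose major byte is 3. This heuristic covers TLS-based HTTPS only (QUIC and proprietary protocols would need other checks), and the names `looks_like_tls` and `first_level_dispatch` are hypothetical.

```python
from typing import Callable

# TLS record content types: change_cipher_spec, alert, handshake, application_data.
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def looks_like_tls(payload: bytes) -> bool:
    """Heuristic: does the payload start with a 5-byte TLS record header?"""
    return (
        len(payload) >= 5
        and payload[0] in TLS_CONTENT_TYPES
        and payload[1] == 0x03       # major version 3 (SSL 3.0 / TLS 1.x)
        and payload[2] <= 0x04       # minor version 0..4
    )


def first_level_dispatch(payload: bytes,
                         parse_plaintext: Callable[[bytes], str],
                         route_encrypted: Callable[[bytes], str]) -> str:
    # Encrypted traffic goes to terminal-type routing; plaintext is parsed directly.
    if looks_like_tls(payload):
        return route_encrypted(payload)
    return parse_plaintext(payload)


print(first_level_dispatch(bytes([22, 3, 1, 0, 5]), lambda p: "xdr", lambda p: "route"))
```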
[0068] For example, a training set can be obtained that includes encrypted traffic from multiple terminals and the corresponding application types. The traffic splitter forwards the encrypted traffic of each terminal, together with its application types, to the second packet parsing device corresponding to that terminal. Using this traffic and these application types, each terminal's second packet parsing device trains a machine learning classification algorithm to obtain the application classification model for that terminal. A terminal type library is then constructed from these application classification models.
[0069] Specifically, encrypted traffic from multiple terminals running different applications can be collected as a training set, and machine learning classification algorithms, such as Bayesian, decision tree, or support vector machine algorithms, can be used to train multiple application classification models for different terminals. For example, after the encrypted traffic of mobile terminal 1 is sent to the corresponding second-level DPI device I, the application type of that traffic (for example, video application A) is used as the label value, and a support vector machine algorithm produces a predicted value of the application type. The model parameters of application classification model I are adjusted iteratively based on the label value and the predicted value, and after training, application classification model I for mobile terminal 1 is obtained. Similarly, multiple application classification models can be obtained for mobile terminal 1, for example three models used to identify video application A, video application B, and video application C, respectively. Application classification models can likewise be obtained for other terminals, and a terminal type library can be constructed from these models so that the second-level DPI device corresponding to each terminal can use its own application classification models for application identification.
[0070] Table 1 illustrates the relationships among mobile terminals, second-level DPI devices, and application classification models in the terminal type library. The library can include multiple terminals, such as mobile terminal 1 and mobile terminal 2. For each application, each mobile terminal corresponds to an application classification model used to identify that application; for example, for video application A, mobile terminal 1 corresponds to video application A classification model I, and mobile terminal 2 corresponds to video application A classification model II. In addition, mobile terminal 1 corresponds to second-level DPI device I, and mobile terminal 2 corresponds to second-level DPI device II. In other words, each mobile terminal corresponds to one second-level DPI device, which holds the application classification models for multiple applications.
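Per-terminal training can be sketched with a Bayesian classifier, one of the algorithm families named above. The sketch below is a minimal Gaussian naive Bayes written from scratch, assuming each flow is summarised by numeric statistics (here, hypothetically, mean packet size and mean inter-arrival time). For brevity it fits one multi-class model per terminal, whereas the text also describes one model per application; the training data, feature choice, and names `GaussianNB` and `train_per_terminal` are illustrative.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (flow features, application label)


class GaussianNB:
    """Minimal Gaussian naive Bayes classifier."""

    def fit(self, samples: List[Sample]) -> "GaussianNB":
        by_label: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for features, label in samples:
            by_label[label].append(features)
        total = len(samples)
        self.params = {}
        for label, rows in by_label.items():
            columns = list(zip(*rows))
            # Per-feature mean and variance; the small floor avoids division by zero.
            stats = [(mean(col), pvariance(col) + 1e-6) for col in columns]
            self.params[label] = (math.log(len(rows) / total), stats)
        return self

    def predict(self, features: Sequence[float]) -> str:
        def log_posterior(label: str) -> float:
            log_prior, stats = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
                for x, (mu, var) in zip(features, stats)
            )
        return max(self.params, key=log_posterior)


def train_per_terminal(training_set: Dict[str, List[Sample]]) -> Dict[str, GaussianNB]:
    """Fit one model per terminal type on that terminal's labelled flows."""
    return {terminal: GaussianNB().fit(samples) for terminal, samples in training_set.items()}


training = {
    "mobile_terminal_1": [([1200.0, 5.0], "video_app_A"), ([1180.0, 6.0], "video_app_A"),
                          ([300.0, 40.0], "video_app_B"), ([320.0, 38.0], "video_app_B")],
    "mobile_terminal_2": [([900.0, 9.0], "video_app_A"), ([880.0, 10.0], "video_app_A"),
                          ([150.0, 60.0], "video_app_B"), ([160.0, 58.0], "video_app_B")],
}
models = train_per_terminal(training)
print(models["mobile_terminal_1"].predict([1190.0, 5.5]))  # video_app_A
```

The toy data shows why separate models help: video application A has a different packet-size profile on each terminal, so a single shared model would have to blur the two.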
[0071] Table 1
[0072]
[0073] By modeling encrypted traffic based on terminal differences, the accuracy of encrypted traffic identification and the precision of the DPI system can be further improved, thus enabling large-scale deployment in existing networks. Furthermore, in mobile core network scenarios, modeling and application identification based on source characteristics can enhance the performance of application classification models, meeting the requirements of network big data collection.
[0074] In one example implementation, as shown in Figure 3, HTTPS traffic can be parsed and its terminal type determined through steps S310 to S330.
[0075] Step S310. Extract, through the first packet parsing device, the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic.
[0076] For example, the first packet parsing device can extract the source IP address from the five-tuple information of the target encrypted traffic. The primary DPI device can also be associated with the N3 and N11 interfaces of the core network: it extracts the source IP address and gtpTe-ID (GTP tunnel endpoint identifier) of the UPF (user plane function) uplink traffic from the N3 interface, and then extracts the gtpTe-ID and pei (permanent equipment identifier) interface information from the N11 interface.
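As an illustration of the five-tuple part of step S310, the sketch below parses a raw IPv4 packet carrying TCP or UDP. It is a minimal example that assumes the GTP-U encapsulation used on the N3 interface has already been removed and that there is no link-layer header; the function name `extract_five_tuple` is hypothetical, and IPv6 and fragmented packets are not handled.

```python
import socket
import struct
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP


def extract_five_tuple(packet: bytes) -> FiveTuple:
    """Extract the five-tuple from an IPv4 packet whose payload is TCP or UDP."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL field counts 32-bit words
    protocol = packet[9]
    if protocol not in (6, 17):
        raise ValueError(f"protocol {protocol} carries no ports")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP and UDP both start with 16-bit source and destination ports (network byte order).
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return FiveTuple(src_ip, dst_ip, src_port, dst_port, protocol)
```

For an uplink HTTPS flow, the `src_ip` field of the result is the value that step S320 associates with the N3 interface records.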
[0077] Step S320. Determine the terminal type of the target encrypted traffic based on the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic.
[0078] Correspondingly, the terminal type of the HTTPS traffic can be determined from the source IP address and gtpTe-ID of the UPF uplink traffic and the pei interface information. Specifically, the source IP address links the target encrypted traffic to the N3 interface, and the gtpTe-ID of the N3 interface links the N3 interface to the N11 interface. Because the pei carried on the N11 interface identifies the terminal type stored in the user session, the terminal type corresponding to the source IP address, and hence the terminal type of the HTTPS traffic, can be determined.
[0079] For example, if the source IP address of the HTTPS traffic is 1.1.1.1, the gtpTe-ID of the N3 interface associated with that source IP address is looked up. If that gtpTe-ID is 123, the pei of the N11 interface associated with gtpTe-ID 123 is looked up in turn. If that pei is 2, the terminal type is mobile terminal 2.
[0080] Step S330. Look up, in the terminal type library, the second packet parsing device corresponding to the terminal type of the target encrypted traffic, and forward the target encrypted traffic to that second packet parsing device.
[0081] After the terminal type of the HTTPS traffic is determined, the terminal type library can be queried to identify the corresponding secondary DPI device, and the HTTPS traffic can then be forwarded to that device. For example, if querying the terminal type library shows that mobile terminal 2 corresponds to secondary DPI device II, the HTTPS traffic is forwarded to secondary DPI device II for application identification.
[0082] In this example, associating signaling from multiple interfaces of the core network makes it easier to distinguish the terminal type of encrypted traffic, enabling large-scale application across the network.
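Steps S320 and S330 amount to a chain of lookups, which can be sketched with in-memory tables. This is a minimal illustration under stated assumptions: the N3 and N11 records are reduced to dictionaries, the pei-to-terminal mapping simply mirrors the worked example (pei 2 means mobile terminal 2), the source IP address `10.0.0.1` is hypothetical, and all table and function names are invented for the sketch. In practice these tables would presumably be filled from live core network signaling.

```python
from typing import Optional

# Hypothetical signaling tables; the real N3/N11 record layouts are not given in the source.
N3_UPLINK = {"10.0.0.1": 123}                 # source IP of UPF uplink traffic -> gtpTe-ID
N11_SESSIONS = {123: "2"}                     # gtpTe-ID -> pei
PEI_TO_TERMINAL = {"2": "Mobile Terminal 2"}  # pei -> terminal type (mirrors the example)
TERMINAL_TYPE_LIBRARY = {                     # terminal type -> secondary DPI device
    "Mobile Terminal 1": "Secondary DPI Device I",
    "Mobile Terminal 2": "Secondary DPI Device II",
}


def terminal_type_of(src_ip: str) -> Optional[str]:
    """S320: source IP -> gtpTe-ID (N3) -> pei (N11) -> terminal type."""
    teid = N3_UPLINK.get(src_ip)
    if teid is None:
        return None
    pei = N11_SESSIONS.get(teid)
    return PEI_TO_TERMINAL.get(pei) if pei is not None else None


def route(src_ip: str) -> Optional[str]:
    """S330: match the secondary DPI device for the flow's terminal type."""
    terminal = terminal_type_of(src_ip)
    return TERMINAL_TYPE_LIBRARY.get(terminal) if terminal else None


print(route("10.0.0.1"))  # Secondary DPI Device II
```

Returning `None` when any link in the chain is missing keeps the sketch explicit about flows whose signaling has not been seen; how the real system handles such flows is not described in the source.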
[0083] In step S230, the second packet parsing device uses a pre-trained application classification model to classify the target encrypted traffic by application, thereby obtaining the application type of the target encrypted traffic.
[0084] After receiving the target encrypted traffic, the second packet parsing device can apply the pre-trained application classification models corresponding to the terminal type of the target encrypted traffic to obtain multiple application prediction values, and then determine the application type of the target encrypted traffic from these prediction values.
[0085] For example, if the HTTPS traffic of an application is determined to come from mobile terminal 2 and is forwarded to secondary DPI device II, secondary DPI device II can evaluate the HTTPS traffic with each of its classification models, such as Video Application A Classification Model II and Video Application B Classification Model II, and obtain the corresponding identification results, which can be probability values or scores. For example, Video Application A Classification Model II may output a probability of 90% and Video Application B Classification Model II a probability of 98%. After the probability values output by all application classification models are obtained, they can be sorted, for example in descending order, and the application corresponding to the model with the highest probability value, here Video Application B, is taken as the application type of the HTTPS traffic.
[0086] After parsing and identifying the target encrypted traffic, the second packet parsing device can filter the traffic and generate a call detail record (CDR) file in a preset format that includes the application type of the target encrypted traffic. For example, the preset format could be XDR (external data representation), which provides an architecture-independent representation of data and thereby resolves differences in byte order, byte size, data representation, and data alignment. Correspondingly, the CDR file can be an xDR CDR on which the application type is marked. Finally, information analysis and statistics can be performed on the encrypted traffic identification results to generate statistical reports for the user.
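The selection rule in the example above (sort the scores of the terminal's application classification models in descending order and keep the top one) can be sketched in a few lines. The function name is hypothetical and the scores are the probability values from the example. Because each model scores its own application independently, the values need not sum to 1, as 90% and 98% show.

```python
def identify_application(scores: dict[str, float]) -> str:
    """Return the application whose classification model produced the highest score."""
    if not scores:
        raise ValueError("no application classification model produced a score")
    # Sort (application, score) pairs by score in descending order, as in the example.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0]


scores = {
    "Video Application A": 0.90,  # output of Video Application A Classification Model II
    "Video Application B": 0.98,  # output of Video Application B Classification Model II
}
print(identify_application(scores))  # Video Application B
```

`max(scores, key=scores.get)` gives the same answer without a full sort; the sort is kept here only because it matches the description and also yields the full ranking.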
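Paragraph [0086] names XDR (External Data Representation, standardized in RFC 4506) as one possible preset format. A minimal sketch of serializing a CDR with XDR encoding rules follows: unsigned integers are 4-byte big-endian values, and strings are a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The CDR fields chosen here (source IP, terminal type, application type, uplink byte count) and the function names are hypothetical; the source does not define the record layout.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, then ASCII bytes zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def encode_cdr(src_ip: str, terminal_type: str, app_type: str, uplink_bytes: int) -> bytes:
    """Serialize a hypothetical CDR whose application type was marked by the secondary DPI device."""
    return (
        xdr_string(src_ip)
        + xdr_string(terminal_type)
        + xdr_string(app_type)
        + xdr_uint(uplink_bytes)
    )


record = encode_cdr("10.0.0.1", "Mobile Terminal 2", "Video Application B", 1500)
print(len(record))  # 64
```

Because every field is aligned to 4 bytes and byte order is fixed, a downstream statistics system can decode the record regardless of the architecture that produced it, which is the property the paragraph attributes to XDR.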
[0087] In one example implementation, as shown in Figure 4, the application type of encrypted traffic can be accurately identified through steps S410 to S440.
[0088] Step S410. The traffic splitter forwards HTTPS traffic from different terminals to the primary DPI device, following the same-origin, same-destination principle (packets of the same session, in both directions, are sent to the same output);
[0089] Step S420. The primary DPI device determines the terminal type corresponding to the terminal IP address from the interface information associated with the HTTPS traffic. The primary DPI device associates with interfaces such as N3 and N11 in the core network, extracts interface information such as the source IP address, gtpTe-ID, and pei, and obtains the terminal type from the terminal IP address, gtpTe-ID, and pei;
[0090] Step S430. The primary DPI device queries the terminal type library and forwards HTTPS traffic from different terminals to the corresponding secondary DPI devices. Based on the terminal type of each terminal, the primary DPI device queries the terminal type library to determine the application classification models corresponding to that terminal, and forwards the terminal's HTTPS traffic to the matching secondary DPI device, which then parses the traffic using those application classification models;
[0091] Step S440. The secondary DPI device analyzes, processes, and identifies the HTTPS traffic from different terminals. Each secondary DPI device can use multiple pre-trained application classification models to analyze and identify the received HTTPS traffic, and mark the application type on the xDR call detail record (CDR) according to the identification results. For example, the application type marked on user A's xDR CDR is video application B.
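One common way to honor the same-origin, same-destination principle of step S410 is a symmetric flow hash, so that uplink and downlink packets of one session always leave the splitter on the same port. The sketch below assumes the splitter load-balances over `n_ports` outputs toward the primary DPI side; the key format and function names are hypothetical, and SHA-256 is used only because, unlike Python's built-in `hash`, it is stable across processes.

```python
import hashlib


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, protocol: int) -> bytes:
    """Build a direction-independent key: A->B and B->A map to the same bytes."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    low, high = (a, b) if a <= b else (b, a)  # canonical endpoint order
    return f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{protocol}".encode()


def splitter_port(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  protocol: int, n_ports: int) -> int:
    """Pick an output port so both directions of a session land on the same port."""
    digest = hashlib.sha256(flow_key(src_ip, src_port, dst_ip, dst_port, protocol)).digest()
    return int.from_bytes(digest[:8], "big") % n_ports


up = splitter_port("10.0.0.1", 51000, "203.0.113.5", 443, 6, n_ports=4)
down = splitter_port("203.0.113.5", 443, "10.0.0.1", 51000, 6, n_ports=4)
print(up == down)  # True
```

Sorting the two endpoints before hashing is what makes the hash symmetric; hashing the raw five-tuple would usually send the two directions of a session to different ports.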
[0092] In this example, accurately identifying the application type of HTTPS traffic enables analysis of network traffic composition and application performance. This meets business monitoring needs such as internet access log retention and business awareness, as well as information security management needs such as mobile malware detection. Furthermore, deploying a unified DPI system in a mobile core network scenario can meet the requirements for network big data collection, analysis, and operation and maintenance.
[0093] In the encrypted traffic identification method provided in the exemplary embodiments of this disclosure, target encrypted traffic is forwarded to a first packet parsing device via a traffic splitter; the first packet parsing device determines the terminal type of the target encrypted traffic and forwards the target encrypted traffic to a second packet parsing device corresponding to the terminal type of the target encrypted traffic; the second packet parsing device uses a pre-trained application classification model to classify the target encrypted traffic by application, thereby obtaining the application type of the target encrypted traffic. This disclosure achieves refined analysis of encrypted traffic by distinguishing the terminal type of encrypted traffic and using different application classification models to identify encrypted traffic of different terminal types, thus improving the accuracy of encrypted traffic identification.
[0094] It should be noted that although the steps of the method in this disclosure are described in a specific order in the accompanying drawings, this does not require or imply that the steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or a step may be broken down into multiple steps.
[0095] Furthermore, this example embodiment also provides an encrypted traffic identification device, which can be applied to a server or a terminal device. As shown in Figure 5, the encrypted traffic identification device 500 may include an encrypted traffic forwarding module 510, a terminal type determination module 520, and an application type determination module 530, wherein:
[0096] The encrypted traffic forwarding module 510 is used to forward the target encrypted traffic to the first packet parsing device through the splitter;
[0097] The terminal type determination module 520 is used to determine the terminal type of the target encrypted traffic through the first packet parsing device, and to forward the target encrypted traffic to the second packet parsing device corresponding to that terminal type;
[0098] The application type determination module 530 is used to classify the target encrypted traffic by application through the second packet parsing device, using a pre-trained application classification model, to obtain the application type of the target encrypted traffic.
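The division of device 500 into modules 510, 520, and 530 can be pictured as a small class whose collaborators are injected. This is a sketch of the structure only: the class name, the callables standing in for each module, and the dictionary-based flow record are hypothetical, and the final step reuses the highest-score rule from the method description.

```python
from dataclasses import dataclass
from typing import Callable

Flow = dict  # hypothetical flow record
Model = Callable[[Flow], float]  # returns a score for one application


@dataclass
class EncryptedTrafficIdentificationDevice:
    # Module 510: hands the flow to the first packet parsing device via the splitter.
    forward_to_first_parser: Callable[[Flow], Flow]
    # Module 520: terminal type as determined by the first packet parsing device.
    determine_terminal_type: Callable[[Flow], str]
    # Module 530: each second packet parsing device's models, keyed by terminal type.
    models_by_terminal: dict[str, dict[str, Model]]

    def identify(self, flow: Flow) -> str:
        flow = self.forward_to_first_parser(flow)              # module 510
        terminal_type = self.determine_terminal_type(flow)     # module 520
        models = self.models_by_terminal[terminal_type]        # matched second device
        scores = {app: model(flow) for app, model in models.items()}  # module 530
        return max(scores, key=scores.get)


device = EncryptedTrafficIdentificationDevice(
    forward_to_first_parser=lambda flow: flow,
    determine_terminal_type=lambda flow: flow["terminal_type"],
    models_by_terminal={
        "Mobile Terminal 2": {
            "Video Application A": lambda flow: 0.90,
            "Video Application B": lambda flow: 0.98,
        }
    },
)
print(device.identify({"terminal_type": "Mobile Terminal 2"}))  # Video Application B
```

Injecting each module as a callable mirrors the note in [0129] that modules may be merged or split: any of the three can be replaced without touching the others.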
[0099] In an optional implementation, the encrypted traffic identification device 500 further includes:
[0100] The interface information forwarding module is used to forward the interface information associated with the target encrypted traffic to the first packet parsing device through the splitter.
[0101] In one optional implementation, the terminal type determination module 520 includes:
[0102] The traffic information extraction unit is used to extract, through the first packet parsing device, the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic;
[0103] A terminal type determination unit is used to determine the terminal type of the target encrypted traffic based on the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic;
[0104] An encrypted traffic forwarding unit is used to match a second packet parsing device corresponding to the terminal type of the target encrypted traffic from a terminal type library, and forward the target encrypted traffic to the second packet parsing device.
[0105] In one optional implementation, the pre-trained application classification model includes multiple application classification models corresponding to the terminal type of the target encrypted traffic; the application type determination module 530 is configured to classify the target encrypted traffic by the second packet parsing device using the multiple application classification models corresponding to the terminal type of the target encrypted traffic to obtain multiple application prediction values; and determine the application type of the target encrypted traffic based on the multiple application prediction values.
[0106] In an optional implementation, the encrypted traffic identification device 500 further includes:
[0107] The call detail record (CDR) file generation module is used to filter the target encrypted traffic through the second packet parsing device and generate a CDR file in a preset format, wherein the CDR file contains the application type of the target encrypted traffic.
[0108] In an optional implementation, the encrypted traffic identification device 500 further includes:
[0109] The training set acquisition module is used to acquire a training set, which includes encrypted traffic from multiple terminals and the application types corresponding to the encrypted traffic.
[0110] The training set forwarding module is used to forward the encrypted traffic of each terminal among the multiple terminals and the application type corresponding to the encrypted traffic to the second packet parsing device corresponding to each of the multiple terminals through the traffic splitter;
[0111] The classification model training module is used to train, through the second packet parsing device corresponding to each of the multiple terminals and using a machine learning classification algorithm, an application classification model for that terminal based on the terminal's encrypted traffic and the application types corresponding to that traffic.
[0112] The terminal type library construction module is used to construct the terminal type library from the application classification model corresponding to each of the plurality of terminals.
[0113] In one alternative implementation, the machine learning classification algorithm in the classification model training module includes any of the following: Bayesian, decision tree, and support vector machine.
[0114] The specific details of each module in the above-mentioned encrypted traffic identification device have been described in detail in the corresponding encrypted traffic identification method, so they will not be repeated here.
[0115] Exemplary embodiments of this disclosure also provide a computer-readable storage medium having a program product stored thereon capable of implementing the methods described above in this specification. In some possible embodiments, various aspects of this disclosure may also be implemented as a program product including program code that, when run on an electronic device, causes the electronic device to perform the steps described in the "Exemplary Methods" section of this specification according to various exemplary embodiments of this disclosure. This program product may be a portable compact disc read-only memory (CD-ROM) including program code and may run on an electronic device, such as a personal computer. However, the program product of this disclosure is not limited thereto. In this document, the readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
[0116] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0117] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting programs for use by or in conjunction with an instruction execution system, apparatus, or device.
[0118] The program code contained on the readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.
[0119] Program code for performing the operations of this disclosure can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing devices can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0120] Exemplary embodiments of this disclosure also provide an electronic device capable of implementing the above-described method. An electronic device 600 according to such an exemplary embodiment of the present disclosure is described below with reference to Figure 6. The electronic device 600 shown in Figure 6 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments disclosed herein.
[0121] As shown in Figure 6, the electronic device 600 can take the form of a general-purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting different system components (including storage unit 620 and processing unit 610), and a display unit 640.
[0122] Storage unit 620 stores program code that can be executed by processing unit 610, causing processing unit 610 to perform the steps described in the "Exemplary Methods" section of this specification according to various exemplary embodiments of this disclosure. For example, processing unit 610 can execute any one or more of the method steps shown in Figures 2 to 4.
[0123] Storage unit 620 may include readable media in the form of volatile storage units, such as random access memory (RAM) 621 and/or cache memory 622, and may further include read-only memory (ROM) 623.
[0124] Storage unit 620 may also include a program/utility 624 having a set (at least one) of program modules 625, including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
[0125] Bus 630 can represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
[0126] Electronic device 600 can also communicate with one or more external devices 700 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with electronic device 600, and/or with any device that enables electronic device 600 to communicate with one or more other computing devices (e.g., router, modem, etc.). This communication can be performed via input/output (I/O) interface 650. Furthermore, electronic device 600 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and/or public networks, such as the Internet) via network adapter 660. As shown, network adapter 660 communicates with other modules of electronic device 600 via bus 630. It should be understood that, although not shown in the figures, other hardware and/or software modules can be used in conjunction with electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
[0127] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the method according to the exemplary embodiments of this disclosure.
[0128] Furthermore, the above figures are merely illustrative representations of the processes included in the methods according to exemplary embodiments of this disclosure, and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal order of these processes. Additionally, it is readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.
[0129] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of this disclosure, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.
[0130] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
Claims
1. A method for identifying encrypted traffic, characterized in that the method is applied to an encrypted traffic identification system including a traffic splitter, a first packet parsing device, and multiple second packet parsing devices, and includes: forwarding the target encrypted traffic to the first packet parsing device via the traffic splitter; determining, by the first packet parsing device, the terminal type of the target encrypted traffic, and forwarding the target encrypted traffic to the second packet parsing device corresponding to the terminal type of the target encrypted traffic; and classifying, by the second packet parsing device, the target encrypted traffic by application using a pre-trained application classification model corresponding to the terminal type of the target encrypted traffic, to obtain the application type of the target encrypted traffic; wherein an application classification model is pre-trained for each of multiple different terminals, so that the second packet parsing devices for different terminal types can use different application classification models for application identification.
2. The encrypted traffic identification method according to claim 1, characterized in that, when the target encrypted traffic is forwarded to the first packet parsing device through the traffic splitter, the method further includes: forwarding, by the traffic splitter, the interface information associated with the target encrypted traffic to the first packet parsing device.
3. The encrypted traffic identification method according to claim 2, characterized in that the step of determining the terminal type of the target encrypted traffic through the first packet parsing device and forwarding the target encrypted traffic to the second packet parsing device corresponding to the terminal type of the target encrypted traffic includes: extracting, by the first packet parsing device, the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic; determining the terminal type of the target encrypted traffic based on the five-tuple information of the target encrypted traffic and the interface information associated with the target encrypted traffic; and matching, from a terminal type library, the second packet parsing device corresponding to the terminal type of the target encrypted traffic, and forwarding the target encrypted traffic to that second packet parsing device.
4. The encrypted traffic identification method according to claim 1, characterized in that the pre-trained application classification model includes multiple application classification models corresponding to the terminal type of the target encrypted traffic, and the step of classifying the target encrypted traffic by the second packet parsing device using the pre-trained application classification model to obtain the application type of the target encrypted traffic includes: classifying, by the second packet parsing device, the target encrypted traffic using the multiple application classification models corresponding to the terminal type of the target encrypted traffic to obtain multiple application prediction values; and determining the application type of the target encrypted traffic based on the multiple application prediction values.
5. The encrypted traffic identification method according to claim 1, characterized in that, after the target encrypted traffic is classified by the second packet parsing device using a pre-trained application classification model to obtain the application type of the target encrypted traffic, the method further includes: filtering, by the second packet parsing device, the target encrypted traffic and generating a call detail record (CDR) file in a preset format, the CDR file containing the application type of the target encrypted traffic.
6. The encrypted traffic identification method according to claim 1, characterized in that the method further includes: obtaining a training set, which includes encrypted traffic from multiple terminals and the application types corresponding to the encrypted traffic; forwarding, via the traffic splitter, the encrypted traffic of each of the multiple terminals and the corresponding application types to the second packet parsing device corresponding to that terminal; training, by the second packet parsing device corresponding to each of the multiple terminals and using a machine learning classification algorithm, the application classification model corresponding to that terminal based on the terminal's encrypted traffic and the corresponding application types; and constructing a terminal type library from the application classification models corresponding to the multiple terminals.
7. The encrypted traffic identification method according to claim 6, characterized in that the machine learning classification algorithm includes any of the following: Bayesian, decision tree, and support vector machine.
8. An encrypted traffic identification device, characterized in that it includes: an encrypted traffic forwarding module, used to forward the target encrypted traffic to the first packet parsing device through the traffic splitter; a terminal type determination module, used to determine the terminal type of the target encrypted traffic through the first packet parsing device and forward the target encrypted traffic to the second packet parsing device corresponding to the terminal type of the target encrypted traffic; and an application type determination module, used to classify the target encrypted traffic through the second packet parsing device using a pre-trained target application classification model corresponding to the terminal type of the target encrypted traffic, thereby obtaining the application type of the target encrypted traffic; wherein a corresponding application classification model is pre-trained for each of multiple different terminals, so that the second packet parsing devices corresponding to different terminal types can use different application classification models for application identification.
9. An encrypted traffic identification system, characterized in that the encrypted traffic identification system includes a traffic splitter, a first packet parsing device, and multiple second packet parsing devices, wherein: the traffic splitter is used to forward the target encrypted traffic to the first packet parsing device; the first packet parsing device is used to determine the terminal type of the target encrypted traffic and forward the target encrypted traffic to the second packet parsing device corresponding to the terminal type of the target encrypted traffic; and the second packet parsing device is configured to classify the target encrypted traffic by application using a pre-trained target application classification model corresponding to the terminal type of the target encrypted traffic, thereby obtaining the application type of the target encrypted traffic; to filter the target encrypted traffic to generate a call detail record (CDR) file in a preset format, wherein the CDR file contains the application type of the target encrypted traffic; and to train an application classification model corresponding to each terminal using a machine learning classification algorithm, so that the second packet parsing devices for different terminal types use different application classification models for application identification.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1-7.
11. An electronic device, characterized in that it includes: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the method of any one of claims 1-7 by executing the executable instructions.