Attack detection method and system
An attack detection technology, applied in the Internet field, that addresses the problems of low detection efficiency and a low detection rate for unknown attacks
Inactive Publication Date: 2016-02-17
BEIJING NORMAL UNIVERSITY
AI-Extracted Technical Summary
Problems solved by technology
This method has a relatively high detection rate and low false detection rate for known attacks, but the detection rate for unknown attacks is very low. At th...
Method used
[0092] Applying the embodiment shown in FIG. 1 of the present invention can actively discover unknown attacks, improve the detection rate of unknown attacks, and reduce the false detection rate. In addition, multiple detection models are used for detection, and the weighted value of each model is obtained through an optimization algorithm, which avoids the limitations of a single detection model, reduces the occurrence of false positives and false negatives, and yields a low false detection rate and strong detection strength....
Abstract
The embodiment of the invention discloses an attack detection method and system. The method comprises the following steps: establishing a plurality of detection models related to an HTTP request in advance, and detecting each record after web access logs are decomposed by utilizing each detection model respectively to obtain a parameter abnormal value of each detection model aiming at the record; calculating an optimized weighted value corresponding to the parameter abnormal value of each detection model, carrying out weighted calculation to obtain a final parameter abnormal value, and determining a final abnormal threshold; judging whether the final parameter abnormal value calculated aiming at the log record to be detected is greater than the determined final abnormal threshold; and if yes, determining the HTTP request of the log record to be detected as attack behavior. By applying the embodiment of the invention, unknown attacks can be actively discovered, so that the detection rate of the unknown attacks can be improved; and optimized weighting of multiple detection models is adopted for detecting, so that the limitation of a single detection model is avoided, false-reporting and under-reporting conditions are reduced, and the false detection rate is lowered.
Application Domain
Transmission
Example Embodiment
[0047] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
[0048] In order to solve the problems of the prior art, the embodiments of the present invention provide an attack detection method and system. The following first introduces an attack detection method provided by an embodiment of the present invention.
[0049] A preset number of detection models related to HTTP requests are established in advance.
[0050] FIG. 1 is a schematic flowchart of an attack detection method provided by an embodiment of the present invention, which may include:
[0051] S101: Obtain web access logs;
[0052] Wherein, the web access log includes multiple records, and each record includes multiple parameters of the HTTP request of the record;
[0053] S102: Decompose the obtained web access log to obtain multiple records;
[0054] S103: For each record obtained, judge whether the HTTP request of the record is in a successful state, and if so, execute S104;
[0055] S104: Extract the first data of the record;
[0056] Wherein, the first data includes at least: multiple parameters of the HTTP request of the record;
[0057] S105: Detect the first data using each pre-established detection model, and obtain the parameter abnormal value of each detection model for the record.
[0058] S106: Using the web access log sample set, calculate according to an optimization algorithm the optimized weight value corresponding to the parameter abnormal value of each detection model; then, from the optimized weight values and the parameter abnormal value of each detection model for the record, obtain the final parameter abnormal value of the record by weighted calculation;
[0059] S107: According to the final parameter abnormal values of all records in the web access log sample set, determine the final abnormal threshold in an iterative manner;
[0060] S108: For the log record to be detected, obtain the parameter abnormal value of each detection model for the log record to be detected;
[0061] S109: Calculate the final parameter abnormal value of the log record to be detected according to the optimized weight value and the parameter abnormal value for the log record to be detected;
[0062] S110: Determine whether the final parameter abnormal value of the log record to be detected is greater than the determined final abnormal threshold, and if so, perform S111;
[0063] S111: Determine the HTTP request of the log record to be detected as an attack.
[0064] Specifically, in practical applications, the obtained web access log can be decomposed and processed through a database to obtain multiple records. A database can process a large amount of data, is simple to operate, and makes the data easy to store and use, which greatly improves data processing efficiency and the operating efficiency of each module.
[0065] After the obtained web access log is decomposed into multiple records, it can be determined for each record whether the HTTP request of the record is in a successful state. Specifically, each record also includes the response status code of the HTTP request of the record. The value of the response status code is obtained, and it is determined whether this value is within a preset value range; if it is, the HTTP request of the record is in a successful state. In practical applications, the preset value range can be 200 to 300.
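The decomposition and success check described above may be sketched as follows. This is a minimal illustration only: the Common Log Format layout, the names `LOG_PATTERN`, `parse_record` and `is_successful`, and the treatment of the upper bound 300 as exclusive are all assumptions, since the text does not specify the log layout or whether 300 itself is included.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumed layout (Common Log Format), for example:
# 1.2.3.4 - - [10/Oct/2015:13:55:36 +0800] "GET /item?id=7&sort=asc HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_record(line):
    """Split one log line into a record dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    target = urlsplit(match.group("target"))
    return {
        "path": target.path,
        "status": int(match.group("status")),
        # keep_blank_values so that "a=&b=1" still reports parameter "a"
        "params": parse_qsl(target.query, keep_blank_values=True),
    }


def is_successful(record, low=200, high=300):
    """True when the status code lies in the preset range (upper bound assumed exclusive)."""
    return low <= record["status"] < high
```

Only records for which `is_successful` returns True would be passed on to parameter extraction; the `params` list of (name, value) pairs plays the role of the first data.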
[0066] When the HTTP request of the record is judged to be in a successful state, multiple parameters of the HTTP request of the record can be extracted; each pre-established detection model is then used to detect these parameters and obtain the parameter abnormal value of each detection model for the record.
[0067] In practical applications, there can be four pre-established detection models: an enumeration type model, a parameter association model, a length distribution model, and a character distribution model.
[0068] The multiple parameters of the HTTP request for the record are detected by using the enumerated type model, the parameter association model, the length distribution model, and the character distribution model to obtain the parameter abnormal value of each detection model for the record.
[0069] The enumeration type model is used to detect the multiple parameters of the HTTP request of the record and obtain the parameter abnormal value of the enumeration type model for the record. This value can be determined according to the type of the HTTP request parameter of the record.
[0070] Specifically, parameters can be divided into random and enumerated types. To determine the type of a parameter, check whether the number of distinct values of the parameter is bounded by a threshold. When the number of distinct values keeps increasing as the sample grows, the parameter is of random type; when the number of distinct values reaches a certain value and no longer increases as the sample grows, the parameter is of enumerated type.
[0071] When the HTTP request parameter of the record is of enumerated type, it is judged whether the processed data contains a parameter value identical to the HTTP request parameter value of the record. If it does, the first parameter abnormal value is determined as the parameter abnormal value of the enumeration type model for the record; if it does not, the second parameter abnormal value is determined as the parameter abnormal value of the enumeration type model for the record.
[0072] When the HTTP request parameter of the record is of random type, the first parameter abnormal value is determined as the parameter abnormal value of the enumeration type model for the record.
[0073] In practical applications, the first parameter abnormal value may be 0, and the second parameter abnormal value may be 1.
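The enumeration type model may be illustrated with the following sketch. The class name `EnumerationModel` and the threshold `max_distinct` are hypothetical: the text only states that the number of values is compared against a threshold, not what the threshold is. Treating a parameter that was never seen in training as random-typed is likewise an assumption.

```python
from collections import defaultdict


class EnumerationModel:
    """Scores a parameter value 0 (first abnormal value) or 1 (second abnormal value)."""

    def __init__(self, max_distinct=10):
        # Hypothetical threshold: a parameter whose number of distinct training
        # values stays at or below it is treated as enumerated, otherwise random.
        self.max_distinct = max_distinct
        self.values = defaultdict(set)

    def train(self, records):
        """records: iterable of lists of (name, value) pairs from processed requests."""
        for params in records:
            for name, value in params:
                self.values[name].add(value)

    def is_enumerated(self, name):
        seen = self.values.get(name)
        return seen is not None and len(seen) <= self.max_distinct

    def score(self, name, value):
        if not self.is_enumerated(name):
            return 0  # random-typed (or unseen) parameter: first abnormal value
        # Enumerated parameter: 0 if the value was seen before, otherwise 1.
        return 0 if value in self.values[name] else 1
```

With this arrangement, an injected string in an enumerated parameter such as a sort order is flagged because it was never observed, while a free-form parameter such as an identifier is left to the other models.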
[0074] The parameter association model is used to detect the first data and obtain the parameter abnormal value of the parameter association model for the record. This value can be determined from the occurrence of parameters in the query string of the HTTP request of the record and the occurrence of parameters in the set of parameter subsets obtained through training.
[0075] Specifically, all parameters of each log record constitute a parameter subset, and the parameter subsets of all log records constitute a parameter set. Repeated parameter subsets in the parameter set are deduplicated through a database or a hash table to obtain the set of parameter subsets. It is then detected whether the occurrence of parameters in the HTTP request query string of the record matches the occurrence of parameters in the set of parameter subsets obtained through training. If the match succeeds, the third parameter abnormal value is determined as the parameter abnormal value of the parameter association model for the record; if the match fails, the fourth parameter abnormal value is determined as the parameter abnormal value of the parameter association model for the record.
[0076] In practical applications, the parameter association model can check whether parameters in the query string are repeated, whether parameters are missing, whether parameters appear together that should not, and so on; the third parameter abnormal value can be 0, and the fourth parameter abnormal value can be 1.
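The matching against the deduplicated set of parameter subsets may be sketched as follows. The class name `ParameterAssociationModel` is hypothetical, and a Python set of frozensets stands in for the hash table mentioned in the text; scoring a repeated parameter name as abnormal before the subset lookup is an assumption about how the repetition check is combined with the match.

```python
class ParameterAssociationModel:
    """Scores a query string 0 (third abnormal value) or 1 (fourth abnormal value)."""

    def __init__(self):
        # A set of frozensets acts as the hash table: repeated subsets collapse.
        self.known_subsets = set()

    def train(self, records):
        """records: iterable of lists of (name, value) pairs from processed requests."""
        for params in records:
            self.known_subsets.add(frozenset(name for name, _ in params))

    def score(self, params):
        names = [name for name, _ in params]
        if len(names) != len(set(names)):
            return 1  # a parameter is repeated in the query string
        # Missing or unexpected parameters produce a subset that was never seen.
        return 0 if frozenset(names) in self.known_subsets else 1
```

Because only parameter names are compared, the order of parameters in the query string does not affect the result.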
[0077] The length distribution model is used to detect the first data and obtain the parameter abnormal value of the length distribution model for the record. This value can be determined from the length of the HTTP request parameter value of the record and the length of normal request parameter values obtained from the training set.
[0078] Specifically, the length distribution model can use Chebyshev's inequality to detect abnormality in the length of the HTTP request parameter value of the record. If there is an attack, there will be script injection or additional character implantation, so that the length of the parameter value will differ from the length of normal request parameter values, and the HTTP request of the record is determined to be abnormal.
[0079] Specifically, statistics are computed in advance on the processed samples to obtain the mean $\mu$ and the variance $\sigma^2$ of the parameter value length of each HTTP request parameter. Chebyshev's inequality is then used to detect and judge abnormality in the length of the parameter value, approximating all values of the random variable as an even distribution. Assume that the length of the HTTP request parameter value of the record is L; when L When
[0080] In practical applications, the fifth parameter abnormal value may be 0, and the sixth parameter abnormal value may be 1.
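A length check based on Chebyshev's inequality may be sketched as follows. The decision rule is an assumption, because the condition on L is not recoverable from the text: the sketch uses the standard bound P(|X - μ| ≥ t) ≤ σ²/t² with t = |L - μ|, and scores a value as abnormal when that bound falls below a hypothetical cutoff `min_probability`. The class name `LengthModel`, and the mapping of 0 to a normal length and 1 to an abnormal one, are also assumptions.

```python
from statistics import fmean, pvariance


class LengthModel:
    """Scores a parameter value length 0 (assumed normal) or 1 (assumed abnormal)."""

    def __init__(self, min_probability=0.1):
        # Hypothetical cutoff on the Chebyshev bound; not given in the text.
        self.min_probability = min_probability
        self.stats = {}

    def train(self, records):
        """records: iterable of lists of (name, value) pairs from normal requests."""
        lengths = {}
        for params in records:
            for name, value in params:
                lengths.setdefault(name, []).append(len(value))
        for name, sample in lengths.items():
            self.stats[name] = (fmean(sample), pvariance(sample))

    def chebyshev_bound(self, name, value):
        """Upper bound on the probability of a deviation at least this large."""
        mean, variance = self.stats[name]
        deviation = abs(len(value) - mean)
        if deviation == 0:
            return 1.0
        return min(1.0, variance / deviation ** 2)

    def score(self, name, value):
        if name not in self.stats:
            return 0  # no statistics for this parameter: treated as normal here
        return 0 if self.chebyshev_bound(name, value) >= self.min_probability else 1
```

Chebyshev's inequality makes no assumption about the shape of the length distribution, so the bound is loose; a long injected payload nevertheless drives it close to zero.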
[0081] The character distribution model is used to detect the first data and obtain the parameter abnormal value of the character distribution model for the record. Using the character probability distribution of each parameter obtained from the training set, the chi-square value between the character distribution of the HTTP request parameter of the record and the character probability distribution of that parameter obtained from the training set can be calculated; the parameter abnormal value of the character distribution model for the record is then determined according to the calculated chi-square value.
[0082] Specifically, the character distribution model is based on the fact that the character distribution of an HTTP request parameter value has specific characteristics. When attack code is injected, the probability distribution of the characters is usually affected, and the record is judged to be abnormal. Statistics are computed on the parameter value strings of each parameter in the training set to obtain the character probability distribution of the parameter, which is stored in a preset database. The character distribution of the parameter of the record is then obtained and matched against the character distribution of the same parameter stored in the database, and the chi-square value is calculated through a chi-square test. The corresponding credibility is determined by querying the chi-square distribution table, and the determined credibility is taken as the parameter abnormal value of the character distribution model for the record.
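The chi-square computation may be illustrated with the following sketch. Several choices here are assumptions rather than details from the text: characters are grouped into four coarse classes (`CLASSES`) so that no expected count is zero, the training distribution is add-one smoothed, and the final lookup of credibility in a chi-square distribution table is omitted. The function names are hypothetical.

```python
from collections import Counter

# Assumed grouping of characters into coarse classes (bins).
CLASSES = ("letter", "digit", "space", "other")


def char_class(ch):
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"


def train_distribution(values):
    """Estimate class probabilities from training strings (add-one smoothed)."""
    counts = Counter(char_class(ch) for value in values for ch in value)
    total = sum(counts.values()) + len(CLASSES)
    return {c: (counts[c] + 1) / total for c in CLASSES}


def chi_square(value, expected):
    """Pearson chi-square statistic of one value against the trained distribution."""
    n = len(value)
    if n == 0:
        return 0.0
    observed = Counter(char_class(ch) for ch in value)
    # Sum over bins of (observed - expected count)^2 / expected count.
    return sum(
        (observed[c] - expected[c] * n) ** 2 / (expected[c] * n)
        for c in CLASSES
    )
```

A larger statistic indicates a character mix further from what was seen in training; with four bins the statistic would be compared against a chi-square distribution with three degrees of freedom.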
[0083] After the records of the web access log sample set have been detected by the above four detection models, the optimized weight value corresponding to the parameter abnormal value of each detection model is calculated according to the optimization algorithm. From the parameter abnormal value of each detection model for the record and the optimized weight values, the final parameter abnormal value of the record is obtained by weighted calculation, which can be done according to the following formula:
[0084] The final parameter abnormal value of the record $= \sum_{m} W_m \cdot P_m$;
[0085] where $m$ ranges over the pre-established detection models, $W_m$ is the optimized weight value for detection model $m$, and $P_m$ is the parameter abnormal value of detection model $m$ for the record.
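The weighted calculation may be sketched as follows. The function name `final_abnormal_value`, the dictionary keys, and the example weights are hypothetical; only the form of the sum, each model's weight multiplied by its parameter abnormal value, comes from the text.

```python
def final_abnormal_value(weights, scores):
    """Weighted sum over detection models m of W_m * P_m."""
    if weights.keys() != scores.keys():
        raise ValueError("weights and scores must cover the same detection models")
    return sum(weights[m] * scores[m] for m in weights)


# Hypothetical weights and per-model abnormal values for one record.
example_weights = {"enum": 0.4, "assoc": 0.3, "length": 0.2, "char": 0.1}
example_scores = {"enum": 1, "assoc": 0, "length": 1, "char": 0.5}
example_value = final_abnormal_value(example_weights, example_scores)
```

Keying both mappings by model name keeps the pairing of each weight with its model explicit when further detection models are added.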
[0086] At this time, the pre-established detection models related to HTTP requests are: the enumeration type model, the parameter association model, the length distribution model, and the character distribution model.
[0087] In actual applications, other detection models related to HTTP requests can also be established, and the embodiment of the present invention does not introduce other established detection models related to HTTP requests here.
[0088] After the final parameter abnormal value has been calculated for all records in the web access log sample set, the final abnormal threshold can be determined iteratively according to the final parameter abnormal values of all records in the web access log sample set.
[0089] Specifically, an abnormal probability value can be determined from the final parameter abnormal values of all records in the web access log sample set; the false positive rate is obtained from the cases in which the abnormal probability value is greater than the abnormal threshold. While the false positive rate is not less than the preset false positive rate, the abnormal threshold is adjusted, until the obtained false positive rate is less than the preset false positive rate; the current abnormal threshold is then determined as the final abnormal threshold.
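The iterative determination of the threshold may be sketched as follows. The sketch assumes that the sample records are normal requests, so that any record scoring above the threshold counts as a false positive, and that the threshold is adjusted by raising it in fixed steps; the starting value, the step size, and the function names are hypothetical.

```python
def false_positive_rate(normal_scores, threshold):
    """Fraction of normal records whose final abnormal value exceeds the threshold."""
    flagged = sum(1 for score in normal_scores if score > threshold)
    return flagged / len(normal_scores)


def find_threshold(normal_scores, target_rate, start=0.0, step=0.05, max_steps=1000):
    """Raise the threshold until the false positive rate drops below target_rate."""
    for i in range(max_steps):
        threshold = start + i * step
        if false_positive_rate(normal_scores, threshold) < target_rate:
            return threshold
    raise RuntimeError("no threshold met the target false positive rate")
```

Raising the threshold lowers the false positive rate at the cost of missing weaker attacks, so the preset false positive rate expresses how that trade-off is set.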
[0090] At this time, the optimized weight value for each detection model and the final abnormal threshold have been obtained. For the log record to be detected, the parameter abnormal value of each detection model for that log record is obtained; the final parameter abnormal value of the log record to be detected is calculated from the optimized weight values and these parameter abnormal values; and it is determined whether this final parameter abnormal value is greater than the determined final abnormal threshold. When it is greater, the HTTP request of the log record to be detected is considered an abnormal request, that is, the HTTP request of the log record to be detected is an attack.
[0091] Specifically, in practical applications, the optimization algorithm used to obtain the optimized weight values of the detection models can be a gradient descent algorithm combined with normalization.
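Gradient descent followed by normalization may be sketched as follows. The text names the technique but not the objective, so the sketch assumes a labeled sample set (1 for attack, 0 for normal) and minimizes the mean squared error between the weighted sum and the label; clipping negative weights to zero before rescaling them to sum to 1 is also an assumption, as are the function name and the hyperparameters.

```python
def fit_weights(score_rows, labels, learning_rate=0.1, epochs=500):
    """score_rows: per-record lists of model abnormal values; labels: 1 attack, 0 normal."""
    n_models = len(score_rows[0])
    n_records = len(score_rows)
    weights = [1.0 / n_models] * n_models  # start from equal weights

    for _ in range(epochs):
        gradient = [0.0] * n_models
        for row, label in zip(score_rows, labels):
            error = sum(w * p for w, p in zip(weights, row)) - label
            for m, p in enumerate(row):
                # Derivative of the mean squared error with respect to weight m.
                gradient[m] += 2.0 * error * p / n_records
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]

    # Normalization: clip negative weights, then rescale so the weights sum to 1.
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / n_models] * n_models
    return [w / total for w in clipped]
```

Under this objective, a model whose abnormal values track the labels closely receives most of the weight, while a model that fires indiscriminately is weighted down.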
[0092] Applying the embodiment of the present invention shown in FIG. 1 can actively discover unknown attacks, improve the detection rate of unknown attacks, and reduce the false detection rate. In addition, multiple detection models are used for detection, and the weighted value of each model is obtained through an optimization algorithm, which avoids the limitations of a single detection model, reduces the occurrence of false positives and under-reports, and yields a low false detection rate and strong detection strength.
[0093] FIG. 2 is a schematic structural diagram of an attack detection system provided by an embodiment of the present invention, which may include: a data preprocessing module 201, a detection module 202, an optimization module 203, a testing module 204, and a preset number of detection models 205 related to HTTP requests, wherein:
[0094] The data preprocessing module 201 is configured to: obtain a web access log, where the web access log includes multiple records and each record includes multiple parameters of the HTTP request of the record; decompose the obtained web access log to obtain multiple records; for each record obtained, determine whether the HTTP request of the record is in a successful state; and if so, extract the first data of the record, where the first data includes at least multiple parameters of the HTTP request of the record;
[0095] Specifically, each record also includes the response status code of the HTTP request of the record. The value of the response status code is obtained, and it is determined whether this value is within a preset value range; if it is, the HTTP request of the record is in a successful state. In practical applications, the preset value range can be 200 to 300.
[0096] The detection module 202 is configured to detect the first data extracted by the data preprocessing module 201 by using each detection model in the detection models 205 related to HTTP requests, to obtain the parameter abnormal value of each detection model for the record.
[0097] In practical applications, the detection models 205 related to HTTP requests may include four detection models: an enumeration type model, a parameter association model, a length distribution model, and a character distribution model.
[0098] In practical applications, when the first data is detected using the enumeration type model to obtain the parameter abnormal value of the enumeration type model for the record, this value can be determined according to the type of the HTTP request parameter of the record.
[0099] In practical applications, when the first data is detected using the parameter association model to obtain the parameter abnormal value of the parameter association model for the record, this value can be determined from the occurrence of parameters in the query string of the HTTP request of the record and the occurrence of parameters in the set of parameter subsets obtained through training.
[0100] In practical applications, when the first data is detected using the length distribution model to obtain the parameter abnormal value of the length distribution model for the record, this value can be determined from the length of the HTTP request parameter value of the record and the length of normal request parameter values obtained from the training set.
[0101] In practical applications, when the first data is detected using the character distribution model to obtain the parameter abnormal value of the character distribution model for the record, the chi-square value between the character distribution of the HTTP request parameter of the record and the character probability distribution of that parameter obtained from the training set can be calculated; the parameter abnormal value of the character distribution model for the record is then determined according to the calculated chi-square value.
[0102] The optimization module 203 is configured to: use the web access log sample set to calculate, according to the optimization algorithm, the optimized weight value corresponding to the parameter abnormal value of each detection model; obtain the final parameter abnormal value of each record by weighted calculation from the optimized weight values and the parameter abnormal value of each detection model for the record; and determine the final abnormal threshold in an iterative manner according to the final parameter abnormal values of all records in the web access log sample set;
[0103] Specifically, in practical applications, the optimization module 203 shown in the embodiment of the present invention can be specifically used for:
[0104] Using the web access log sample set, calculate according to the optimization algorithm the optimized weight value corresponding to the parameter abnormal value of each detection model, and, from these weight values and the parameter abnormal value of each detection model for the record, obtain by weighted calculation the final parameter abnormal value of the record $= \sum_{m} W_m \cdot P_m$;
[0105] where $m$ ranges over the pre-established detection models, $W_m$ is the optimized weight value for detection model $m$, and $P_m$ is the parameter abnormal value of detection model $m$ for the record.
[0106] Determine the abnormal probability value according to the final parameter abnormal value of all records in the web access log sample set;
[0107] Obtain the false positive rate from the cases in which the abnormal probability value is greater than the abnormal threshold;
[0108] While the false positive rate is not less than the preset false positive rate, adjust the abnormal threshold until the obtained false positive rate is less than the preset false positive rate, and determine the current abnormal threshold as the final abnormal threshold.
[0109] The testing module 204 is configured to: obtain, for the log record to be detected, the parameter abnormal value of each detection model; calculate the final parameter abnormal value of the log record to be detected according to the optimized weight values and these parameter abnormal values; determine whether the final parameter abnormal value of the log record to be detected is greater than the final abnormal threshold determined by the optimization module 203; and if so, determine the HTTP request of the log record to be detected as an attack.
[0110] Applying the embodiment of the present invention shown in FIG. 2 can actively discover unknown attacks, improve the detection rate of unknown attacks, and reduce the false detection rate. In addition, multiple detection models are used for detection, and the weighted value of each model is obtained through an optimization algorithm, which avoids the limitations of a single detection model, reduces the occurrence of false positives and under-reports, and yields a low false detection rate and strong detection strength.
[0111] It should be noted that in this document, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
[0112] The various embodiments in this specification are described in a related manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, as for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the partial description of the method embodiment.
[0113] A person of ordinary skill in the art can understand that all or part of the steps in the above method embodiments can be implemented by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium; the storage medium referred to herein is, for example, ROM/RAM, a magnetic disk, an optical disk, etc.
[0114] The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are all included in the protection scope of the present invention.