HTTP protocol information extraction method and device
An HTTP protocol information extraction technology, applied in the field of data analysis, that addresses problems such as slow speed and low efficiency and makes extraction simple.
Examples
Embodiment 1
[0024] Embodiment 1 of the present invention provides a method for extracting HTTP protocol information and mainly describes the extraction process. Referring to Figure 1, the method may include the following steps:
[0025] Step S102: Load the extraction rules for HTTP protocol information extraction and store them in memory.
[0026] When extracting HTTP protocol information, the extraction rules are first loaded and stored in memory. Reflecting the characteristics of the HTTP protocol, the extraction rules comprise a plurality of rules, each of which matches hosts and URLs in a different situation.
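As a minimal sketch of how several rules might be held in memory and matched against a host and URL, the following Python snippet keeps rules in a dictionary keyed by host. The names `RULES` and `find_rule`, the sample hosts, and the prefix-matching policy are illustrative assumptions; the source does not specify the matching semantics.

```python
from typing import Optional

# Hypothetical in-memory rule table: host -> list of (url_prefix, rule_name).
# Loaded once at startup so that matching never touches disk.
RULES = {
    "example.com": [("/search", "search_rule"), ("/login", "login_rule")],
    "news.example.org": [("/article/", "article_rule")],
}


def find_rule(host: str, url: str) -> Optional[str]:
    """Return the name of the first rule whose host and URL prefix match."""
    # Host names are case-insensitive, so normalise before the lookup.
    for prefix, rule_name in RULES.get(host.lower(), []):
        if url.startswith(prefix):
            return rule_name
    return None  # no rule applies to this piece of data
```

Keying the table by host means only the rules for that host are scanned, which is one plausible way to keep per-record matching cheap.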
[0027] Step S104: Obtain the host and URL of one piece of data from the data to be analyzed.
[0028] The data to be analyzed may be a large data set. During processing, HTTP protocol information is extracted from it one piece of data at a time. In this step, for each piece of data, the host and url in the data are obta...
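The per-record step of obtaining the host and URL can be sketched as follows. This assumes each piece of data is the raw text of an HTTP request, which the source does not state; the function names `get_host_and_url` and `iter_hosts_and_urls` are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def get_host_and_url(raw_request: str) -> Tuple[Optional[str], Optional[str]]:
    """Pull the Host header and the request target out of a raw HTTP request."""
    lines = raw_request.replace("\r\n", "\n").split("\n")

    # Request line looks like: "GET /path?query HTTP/1.1"
    parts = lines[0].split()
    url = parts[1] if len(parts) >= 2 else None

    host = None
    for line in lines[1:]:
        if line == "":
            break  # blank line marks the end of the headers
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip()
            break
    return host, url


def iter_hosts_and_urls(records: Iterable[str]) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Process the data to be analyzed one record at a time."""
    for raw in records:
        yield get_host_and_url(raw)
```

Using a generator keeps memory use flat even when the input is very large, since only one record is held at a time.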
Embodiment 2
[0035] This embodiment is a further preferred method for extracting HTTP protocol information, building on Embodiment 1. Referring to Figure 2, the method may include the following steps:
[0036] Step S202: Load the extraction rules for HTTP protocol information extraction.
[0037] Preferably, the extraction rules are written as an XML configuration file. The extraction rules are loaded using the following steps:
[0038] Use SAXReader to read in the XML configuration file; traverse the host tags to construct HostInfo entity objects; traverse the urlinfo tags under each host tag to construct UrlInfo entity objects, and verify the validity of the protocol subclass code and custom class; traverse the getinfo tags under each urlinfo tag to construct GetInfo entity objects, and verify the validity of the pType and srcData attributes and the custom class; traverse the todata tags under each getinfo tag to construct Todata entity objects, and verify the validity of keystring and custom cla...
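The loading steps above can be sketched in Python. Note that this swaps in the standard library's `xml.etree.ElementTree` for the SAXReader class named in the source, and assumes a nesting of host > urlinfo > getinfo > todata. The tag names and the `pType`, `srcData`, and `keystring` attributes come from the text; the `name` and `path` attributes, the `<rules>` root element, and the sample file are hypothetical, and validation is reduced to a presence check.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Todata:
    keystring: str


@dataclass
class GetInfo:
    p_type: str
    src_data: str
    todata: List[Todata] = field(default_factory=list)


@dataclass
class UrlInfo:
    path: str  # hypothetical attribute name
    getinfos: List[GetInfo] = field(default_factory=list)


@dataclass
class HostInfo:
    name: str  # hypothetical attribute name
    urlinfos: List[UrlInfo] = field(default_factory=list)


def require(elem: ET.Element, attr: str) -> str:
    """Minimal validity check: the attribute must exist and be non-empty."""
    value = elem.get(attr)
    if not value:
        raise ValueError(f"<{elem.tag}> is missing attribute '{attr}'")
    return value


def load_rules(xml_text: str) -> List[HostInfo]:
    """Walk host > urlinfo > getinfo > todata and build entity objects."""
    root = ET.fromstring(xml_text)
    hosts = []
    for h in root.iter("host"):
        host = HostInfo(name=require(h, "name"))
        for u in h.findall("urlinfo"):
            url = UrlInfo(path=require(u, "path"))
            for g in u.findall("getinfo"):
                info = GetInfo(require(g, "pType"), require(g, "srcData"))
                for t in g.findall("todata"):
                    info.todata.append(Todata(require(t, "keystring")))
                url.getinfos.append(info)
            host.urlinfos.append(url)
        hosts.append(host)
    return hosts


# Hypothetical configuration file, for illustration only.
SAMPLE = """
<rules>
  <host name="example.com">
    <urlinfo path="/search">
      <getinfo pType="1" srcData="url">
        <todata keystring="q"/>
      </getinfo>
    </urlinfo>
  </host>
</rules>
"""
```

Failing fast with a `ValueError` at load time mirrors the idea of verifying each attribute while the entities are constructed, so a malformed rule file is rejected before any data is processed.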
Embodiment 3
[0112] Corresponding to the method for extracting HTTP protocol information provided in Embodiment 1 of the present invention, this embodiment also provides a device for extracting HTTP protocol information. Referring to Figure 3, the device may include a rule loader and a rule parser.
[0113] The rule loader loads the extraction rules for HTTP protocol information extraction and stores them in memory; the specific process is as described in Embodiment 2 above and is not repeated here. To accommodate the diversity and complexity of HTTP protocol content, a personalized extraction interface can also be implemented according to specific requirements, enabling custom extraction. The rule parser obtains the host and URL of a piece of data from the data to be analyzed and judges whether the obtained host and URL match the extraction rules, and ...
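One way to picture the personalized extraction interface is as an abstract base class that custom extractors implement. The sketch below is an assumption about how such an interface could be shaped, not the patent's actual design; `Extractor`, `QueryParamExtractor`, and `PathSegmentExtractor` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict
from urllib.parse import parse_qs, urlsplit


class Extractor(ABC):
    """Hypothetical extraction interface that custom classes implement."""

    @abstractmethod
    def extract(self, host: str, url: str) -> Dict[str, str]:
        """Return the extracted fields for one matched host and URL."""


class QueryParamExtractor(Extractor):
    """Generic extractor: read one named parameter from the query string."""

    def __init__(self, key: str) -> None:
        self.key = key

    def extract(self, host: str, url: str) -> Dict[str, str]:
        values = parse_qs(urlsplit(url).query).get(self.key)
        return {self.key: values[0]} if values else {}


class PathSegmentExtractor(Extractor):
    """Custom extractor: treat the last path segment as an identifier."""

    def extract(self, host: str, url: str) -> Dict[str, str]:
        segment = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
        return {"id": segment} if segment else {}
```

Because every extractor exposes the same `extract` method, a rule parser could call whichever implementation a rule names without knowing how the value is obtained.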