A method of normalizing URLs
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHANGHAI GUAN AN INFORMATION TECH
- Publication Date: 2019-10-01
- Estimated Expiration: Not applicable · inactive patent
Abstract
Description
Technical field
[0001] The invention relates to the field of URL normalization, and in particular to a method for normalizing URLs.
Background technique
[0002] When analyzing web logs, we often need to compute statistics for web pages, such as the number of visits to a page per hour, the number of IPs visiting it, and the distribution of response status codes. These statistics can be fed into a time series model, or used as features to build a more complex anomaly detection model that discovers pages accessed abnormally within a certain period of time. In actual analysis, however, we cannot see the real page visited by the user; the access log shows only the URL (Uniform Resource Locator, the address of a standard resource on the Internet) that the user requested. Strictly speaking, therefore, the object of our analysis is not the "page" but the "URL".
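The per-URL statistics mentioned in this paragraph can be illustrated with a minimal Python sketch. It assumes the access log has already been parsed into (timestamp, client IP, URL, status code) tuples; the sample records, the record layout, and the function name `url_statistics` are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, already-parsed access-log records:
# (timestamp, client IP, URL, response status code).
records = [
    ("2019-01-01 08:05:12", "10.0.0.1", "/index.html", 200),
    ("2019-01-01 08:47:30", "10.0.0.2", "/index.html", 200),
    ("2019-01-01 09:02:01", "10.0.0.1", "/index.html", 404),
    ("2019-01-01 09:15:44", "10.0.0.3", "/login", 200),
]


def url_statistics(records):
    """Aggregate hourly visits, distinct client IPs, and status codes per URL."""
    visits_per_hour = defaultdict(Counter)  # url -> {hour bucket -> visits}
    client_ips = defaultdict(set)           # url -> distinct client IPs
    status_codes = defaultdict(Counter)     # url -> {status code -> count}
    for timestamp, ip, url, status in records:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        hour = parsed.strftime("%Y-%m-%d %H:00")  # truncate to the hour
        visits_per_hour[url][hour] += 1
        client_ips[url].add(ip)
        status_codes[url][status] += 1
    return visits_per_hour, client_ips, status_codes


visits, ips, statuses = url_statistics(records)
```

Because every counter is keyed by the raw URL string, two URLs that differ only in incidental details are counted as separate objects, which is why the statistics depend on how URLs are normalized.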
[0003] Regardless of whether the server uses Apache, nginx or IIS, the log format they rec...