Automatic mining method of website channel
An automatic mining method for website channels. It addresses the problems that capturing and classifying webpages is time-consuming and labor-intensive and occupies a large amount of storage space. The method occupies little disk space, improves capture and classification efficiency, and reduces the error rate.
Examples
Embodiment 1
[0048] Figure 1 is a flow chart of the method for automatic mining of website channels.
[0049] As shown in Figure 1, the automatic website channel mining method includes the following steps:
[0050] Step 1, grabbing the URL data of each website from Internet data;
[0051] Step 2, decomposing the URL data into multiple URL patterns;
[0052] Step 3, filtering the multiple URL patterns obtained by the decomposition, removing URL patterns that are redundantly included, and obtaining candidate URL patterns;
[0053] Step 4, sampling the URL data included in the filtered candidate URL patterns;
[0054] Step 5, crawling the webpage content of the URL data retained by the sampling, and classifying the webpages;
[0055] Step 6, counting the classifications of the URL data contained in each URL pattern, setting a proportion threshold for URL data sharing the same classification, and retaining the patterns in which the proportion of URL data with the same classification exceeds the threshold;
[0056] Step 7, merging the patterns in the URL patte...
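The decomposition in Step 2 and the threshold test in Step 6 can be illustrated with a minimal Python sketch. The text does not define what a URL pattern looks like, so this sketch assumes a pattern is the host plus a directory prefix of the path. The function names `url_patterns` and `dominant_patterns`, the 0.8 threshold, and the sample URLs and category labels are all hypothetical and used only for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def url_patterns(url):
    """Return the host and every directory prefix of the path as patterns.

    Assumption: a "URL pattern" is host + directory prefix; the source
    text does not specify the exact decomposition rule.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    patterns = [host + "/"]
    prefix = host
    # The last segment is treated as the page itself, not as a directory.
    for segment in segments[:-1]:
        prefix += "/" + segment
        patterns.append(prefix + "/")
    return patterns


def dominant_patterns(labeled_urls, threshold=0.8):
    """Keep patterns whose most common category share exceeds the threshold."""
    counts = defaultdict(Counter)
    for url, category in labeled_urls:
        for pattern in url_patterns(url):
            counts[pattern][category] += 1
    kept = {}
    for pattern, counter in counts.items():
        category, hits = counter.most_common(1)[0]
        if hits / sum(counter.values()) > threshold:
            kept[pattern] = category
    return kept


if __name__ == "__main__":
    # Hypothetical sampled pages with already assigned categories.
    sample = [
        ("http://example.com/sports/nba/1.html", "sports"),
        ("http://example.com/sports/nba/2.html", "sports"),
        ("http://example.com/sports/3.html", "sports"),
        ("http://example.com/finance/4.html", "finance"),
        ("http://example.com/finance/5.html", "finance"),
    ]
    for pattern, category in sorted(dominant_patterns(sample).items()):
        print(pattern, category)
```

In this toy data the site root mixes two categories, so it falls below the threshold and is dropped, while the directory-level patterns are kept as channel candidates.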
Embodiment 2
[0058] Figure 1 is a flow chart of the method for automatic mining of website channels.
[0059] As shown in Figure 1, the automatic website channel mining method includes the following steps:
[0060] Step 1, grabbing the URL data of each website from Internet data;
[0061] URL data of various websites on the Internet is collected through customized web crawlers and/or from the broadcast data of Internet advertising networks;
[0062] The specific steps of collecting URL data of various websites on the Internet through a customized web crawler are as follows: the customized web crawler crawls web pages from several large portal websites, collects the URLs in those web pages, and adds the URLs to the candidate queue; it then continues to crawl the URLs in the candidate queue, collects URLs from the crawled web pages, adds them to the candidate queue as well, and removes duplicate URLs, repeating this process until hundreds of millions of URL data items are collected;
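The candidate-queue procedure described above can be sketched in Python as a breadth-first traversal with a seen-set for duplicate removal. This is only an illustration under stated assumptions: `collect_urls`, `fetch_links`, and `max_urls` are hypothetical names, and the link extraction is passed in as a function so that the sketch runs without network access. A real crawler would download and parse each page at that point.

```python
from collections import deque


def collect_urls(seed_urls, fetch_links, max_urls=1000):
    """Collect URLs breadth-first from seed pages, skipping duplicates.

    fetch_links(url) is a caller-supplied function (an assumption of this
    sketch) that returns the URLs found on the page at `url`.
    """
    queue = deque()   # candidate queue of URLs still to be crawled
    seen = set()      # every URL ever queued, used to remove duplicates
    collected = []    # URLs in the order they were discovered

    def enqueue(url):
        if url not in seen and len(collected) < max_urls:
            seen.add(url)
            collected.append(url)
            queue.append(url)

    for url in seed_urls:
        enqueue(url)
    # Stop when the queue is empty or the collection target is reached.
    while queue and len(collected) < max_urls:
        for link in fetch_links(queue.popleft()):
            enqueue(link)
    return collected


if __name__ == "__main__":
    # Hypothetical link graph standing in for real portal pages.
    graph = {
        "portal/": ["portal/news/", "portal/sports/"],
        "portal/news/": ["portal/", "portal/news/1.html"],
        "portal/sports/": ["portal/news/1.html"],
    }
    print(collect_urls(["portal/"], lambda url: graph.get(url, [])))
```

The `max_urls` cap stands in for the stopping condition in the text, where collection continues until the target volume of URL data is reached.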
[0063] The specific steps of...
Embodiment 3
[0072] Step 1, grabbing the URL data of each website from Internet data;
[0073] URL data of various websites on the Internet is collected through customized web crawlers and/or from the broadcast data of Internet advertising networks;
[0074] The specific steps of collecting URL data of various websites on the Internet through a customized web crawler are as follows: the customized web crawler crawls web pages from several large portal websites, collects the URLs in those web pages, and adds the URLs to the candidate queue; it then continues to crawl the URLs in the candidate queue, collects URLs from the crawled web pages, adds them to the candidate queue as well, and removes duplicate URLs, repeating this process until hundreds of millions of URL data items are collected;
[0075] The specific steps of collecting the URL data of each website on the Internet from the broadcast data of the Internet advertising network are as follows: each Internet advertising network will broadcast all the URLs accessed by ...
