
Automatic mining method for website channels

An automatic website channel mining technology, applied in the field of automatic mining of website channels. It addresses large disk space occupation, long capture and classification times, and time-consuming, labor-intensive work, and achieves small disk space occupation, more efficient capture and classification, and a lower error rate.

Active Publication Date: 2018-03-30
上海晶赞企业管理咨询有限公司 (Shanghai Jingzan Enterprise Management Consulting Co., Ltd.)
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0007] To solve the problems of long capture and classification times, large disk space occupation, and time-consuming, labor-intensive work, the present invention provides an automatic mining method for website channels. The technical solution is as follows:



Examples


Embodiment 1

[0048] Figure 1 is a flow chart of the automatic website channel mining method.

[0049] As shown in Figure 1, the automatic website channel mining method includes the following steps:

[0050] Step 1, capturing the URL data of each website from Internet data;

[0051] Step 2, decomposing the URL data into multiple URL patterns;

[0052] Step 3, filtering the URL patterns obtained by decomposition, removing redundantly included URL patterns, and obtaining candidate URL patterns;

[0053] Step 4, sampling the URL data included in the filtered candidate URL patterns;

[0054] Step 5, crawling the webpage content of the URL data retained by sampling, and classifying the webpages;

[0055] Step 6, counting the URL data contained in each URL pattern, setting a common proportion threshold for URL data of the same classification, and retaining the patterns in which the proportion of URL data sharing one classification exceeds the threshold;

[0056] Step 7, merging the patterns in the URL patte...
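The seven steps above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The text shown does not define how a URL is decomposed, so the sketch assumes a URL pattern is the host plus a directory prefix of the path (for example `news.example.com/sports/`). It reads "redundantly included" patterns as child patterns whose URL set is identical to their parent's, and it merges a pattern into its nearest retained ancestor when both carry the same label (the abstract only says patterns "having an inclusion relationship" are combined). The `classify` callback is hypothetical and stands in for crawling and classifying a page; the function names and the defaults `k=50` and `threshold=0.8` are illustrative, not values from the patent.

```python
import random
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def decompose(url):
    """Step 2 (assumed scheme): host plus every directory prefix of the path."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.path.endswith("/"):
        segments = segments[:-1]  # drop the trailing page name, e.g. "1.html"
    prefix = parts.netloc + "/"
    patterns = [prefix]
    for seg in segments:
        prefix += seg + "/"
        patterns.append(prefix)
    return patterns


def parent(pattern):
    """Next shorter pattern, or None for a host-level pattern."""
    stripped = pattern.rstrip("/")
    return stripped.rsplit("/", 1)[0] + "/" if "/" in stripped else None


def group(urls):
    groups = defaultdict(set)  # pattern -> set of URLs it contains
    for url in urls:
        for pattern in decompose(url):
            groups[pattern].add(url)
    return groups


def filter_patterns(groups):
    """Step 3 (assumed reading): drop a pattern whose URL set equals its parent's."""
    return {p: urls for p, urls in groups.items()
            if parent(p) is None or groups.get(parent(p)) != urls}


def sample(groups, k, seed=0):
    """Step 4: keep at most k URLs per pattern (seeded for reproducibility)."""
    rng = random.Random(seed)
    return {p: rng.sample(sorted(urls), min(k, len(urls)))
            for p, urls in groups.items()}


def keep_consistent(samples, classify, threshold):
    """Steps 5-6: keep patterns whose majority-label share exceeds threshold."""
    # Each sampled URL is classified once, even if it sits in nested patterns.
    labels = {u: classify(u) for urls in samples.values() for u in urls}
    kept = {}
    for p, urls in samples.items():
        if not urls:
            continue
        label, count = Counter(labels[u] for u in urls).most_common(1)[0]
        if count / len(urls) > threshold:
            kept[p] = label
    return kept


def merge(kept):
    """Step 7 (assumed rule): fold a pattern into its nearest kept ancestor
    when both carry the same label; the remainder is the channel list."""
    channels = {}
    for p, label in kept.items():
        a = parent(p)
        while a is not None and a not in kept:
            a = parent(a)
        if a is None or kept[a] != label:
            channels[p] = label
    return channels


def mine_channels(urls, classify, k=50, threshold=0.8, seed=0):
    candidates = filter_patterns(group(urls))
    return merge(keep_consistent(sample(candidates, k, seed), classify, threshold))


URLS = [
    "http://news.example.com/sports/nba/1.html",
    "http://news.example.com/sports/nba/2.html",
    "http://news.example.com/sports/soccer/3.html",
    "http://news.example.com/finance/4.html",
    "http://news.example.com/finance/5.html",
]


def toy_classify(url):
    # Hypothetical stand-in for crawling the page and running a classifier.
    return "sports" if "/sports/" in url else "finance"


print(mine_channels(URLS, toy_classify))
# {'news.example.com/sports/': 'sports', 'news.example.com/finance/': 'finance'}
```

In this toy run the host-level pattern is rejected because only 3 of its 5 sampled URLs share a label (0.6 does not exceed 0.8), while `news.example.com/sports/nba/` and `news.example.com/sports/soccer/` fold into `news.example.com/sports/`. Classifying each sampled URL once, rather than once per pattern, reflects the efficiency goal the text states for sampling.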

Embodiment 2

[0058] Figure 1 is a flow chart of the automatic website channel mining method.

[0059] As shown in Figure 1, the automatic website channel mining method includes the following steps:

[0060] Step 1, capturing the URL data of each website from Internet data;

[0061] URL data of websites across the Internet is collected through a customized web crawler and/or from the broadcast data of Internet advertising networks;

[0062] The specific steps for collecting URL data through a customized web crawler are as follows: the customized web crawler crawls webpages from several large portal websites, collects the URLs in those webpages, and adds them to a candidate queue; it then continues to fetch the URLs in the candidate queue, collects the URLs found on those pages, adds them to the candidate queue as well, removes duplicate URLs, and repeats this process until hundreds of millions of URLs have been collected;
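The customized crawler described in [0062] amounts to a breadth-first traversal over a de-duplicated candidate queue. The Python sketch below is a minimal illustration under assumptions: `fetch_links` is a hypothetical callback that would download a page and return its outgoing URLs (here replaced by a toy in-memory link graph), and `limit` stands in for the "hundreds of millions" stopping point.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          limit: int) -> List[str]:
    """Collect up to `limit` distinct URLs breadth-first from portal seeds."""
    seen = set()      # every URL ever queued; used to remove duplicates
    collected = []    # URLs in the order they were discovered
    queue = deque()   # the candidate queue described in the text

    def enqueue(url: str) -> None:
        if url not in seen:
            seen.add(url)
            collected.append(url)
            queue.append(url)

    for url in seeds:
        enqueue(url)
    while queue and len(collected) < limit:
        url = queue.popleft()
        for link in fetch_links(url):  # hypothetical: fetch page, extract links
            enqueue(link)
            if len(collected) >= limit:
                break
    return collected[:limit]


# Toy link graph standing in for real portal pages (hypothetical data).
LINKS = {
    "http://portal-a.example/": ["http://portal-a.example/news/",
                                 "http://portal-b.example/"],
    "http://portal-b.example/": ["http://portal-b.example/sports/",
                                 "http://portal-a.example/news/"],
    "http://portal-a.example/news/": ["http://portal-a.example/news/1.html"],
}

urls = crawl(["http://portal-a.example/", "http://portal-b.example/"],
             lambda u: LINKS.get(u, []), limit=10)
print(urls)
```

At the scale the text describes, an in-memory `seen` set would typically give way to a disk-backed store or a Bloom filter; the queue logic stays the same.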

[0063] The specific steps of...

Embodiment 3

[0072] Step 1, capturing the URL data of each website from Internet data;

[0073] URL data of websites across the Internet is collected through a customized web crawler and/or from the broadcast data of Internet advertising networks;

[0074] The specific steps for collecting URL data through a customized web crawler are as follows: the customized web crawler crawls webpages from several large portal websites, collects the URLs in those webpages, and adds them to a candidate queue; it then continues to fetch the URLs in the candidate queue, collects the URLs found on those pages, adds them to the candidate queue as well, removes duplicate URLs, and repeats this process until hundreds of millions of URLs have been collected;

[0075] The specific steps for collecting the URL data of each website from the broadcast data of Internet advertising networks are as follows: each Internet advertising network will broadcast all the URLs accessed by ...


Abstract

The present invention belongs to the field of website channel mining technology and provides an automatic website channel mining method. The method comprises: capturing URL data of each website from Internet data; decomposing the URL data into a plurality of URL patterns; filtering the URL patterns obtained through decomposition and removing repeated URL patterns to obtain candidate URL patterns; sampling the URL data contained in the filtered candidate URL patterns; crawling the webpage content of the sampled URL data and classifying the webpages; counting the URL data contained in each URL pattern, setting a proportion threshold for URL data of the same classification, and keeping the patterns in which the proportion of URL data sharing one classification exceeds the threshold; and merging URL patterns that have an inclusion relationship to obtain a channel list. The method can automatically discover and classify the channels of each website, improves the efficiency of capturing and classifying URL data, occupies little disk space, saves time and effort, and achieves higher classification accuracy.

Description

Technical field

[0001] The invention belongs to the technical field of website channel mining, and in particular relates to an automatic website channel mining method that analyzes and processes large-scale webpage URLs, automatically discovers the channels of each website, and classifies those channels.

Background art

[0002] With the continuous development of Internet technology and the continuous growth of online information, people's demand for network information keeps rising. How to analyze and manage massive numbers of URLs is a difficult problem that many network applications must face. An automatic website channel mining method analyzes and processes large-scale webpage URLs, automatically discovers and classifies the channels of each website, and thereby classifies the URLs themselves.

[0003] With the development of the Internet advertising industry, the mainstream advertising delivery method has changed from placing the same advertisement...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8) G06F17/30
CPC: G06F16/951, G06F16/958
Inventors: 汤奇峰 (Tang Qifeng), 刘作涛 (Liu Zuotao)
Owner: 上海晶赞企业管理咨询有限公司 (Shanghai Jingzan Enterprise Management Consulting Co., Ltd.)