Classification recognition method based on URL (uniform resource locator)

A classification recognition technology, applied in the Internet field, that addresses problems such as storage overload, the inability to crawl and index pages in advance, and the inability to complete online service requests.

Active Publication Date: 2015-02-11
ZHEJIANG PANSHI INFORMATION TECH
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

However, the visit frequency of web pages follows a power-law distribution: most pages are rarely visited. These pages are not crawled and indexed for online advertising services because the storage cost of offline processing and indexing is prohibitive, and storing all web page URLs together with their categories would lead to storage overload.
Moreover, pages with dynamic content and user-generated pages that require authentication cannot be crawled and indexed in advance.
Web pages that are not in the ad serving index are called non-indexed web pages. An ad request for a non-indexed web page cannot be handled by the usual method, because the page is not in the index at all, so the online service request cannot be completed.


Examples

Embodiment Construction

[0030] In Internet advertising matching, the current practice is to crawl the media pages, classify each web page by parsing and classifying its text content, and store the resulting categories in an index file in one-to-one correspondence with the page URLs. When an advertisement request arrives, the category of the requested URL is looked up in the index and a matching advertisement is selected. Storing all web page URLs in this way overloads storage capacity. In addition, every page must be processed and classified offline, so new pages that have not yet been processed cannot be classified in time and the online service request cannot be completed.
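The limitation described in [0030] can be illustrated with a minimal sketch. The index contents, the example URLs, and the function name `lookup_category` are hypothetical and are not taken from the patent; the sketch only shows that an exact-match index has no answer for a URL that was never crawled.

```python
def lookup_category(index, url):
    """Return the stored category for an exact URL match, or None if the URL was never indexed."""
    return index.get(url)


# Hypothetical offline index: one entry per crawled and classified page.
offline_index = {
    "http://example.com/sports/nba/1.html": "sports",
    "http://example.com/finance/stocks/1.html": "finance",
}

# An indexed page resolves to its category.
indexed_result = lookup_category(offline_index, "http://example.com/sports/nba/1.html")

# A new page under the same path was never crawled, so the lookup finds nothing.
unindexed_result = lookup_category(offline_index, "http://example.com/sports/nba/99.html")
```

The index also grows with every stored URL, which is the storage problem the paragraph describes.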

[0031] To address these bottlenecks, the method proposed by the present invention assumes that URLs may carry classification structure information, and that web pages whose URLs are similar may fall into similar categories. The idea...

Abstract

The invention relates to a classification recognition method based on URL (uniform resource locator), comprising the following steps: step 1, classifying the web pages that serve advertisements with a classifier to obtain the categories of the web pages corresponding to all URLs of a website; step 2, generating a URL tree of the website from all URLs of the website; step 3, matching the URL of an advertisement request against the URL tree and returning the matching result. The method addresses the problems of delayed advertisement matching, large URL storage space, and non-indexed pages not being classified in time.
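The three steps in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the abstract does not say how tree nodes are labelled or how a partial match is resolved, so the majority vote per node and the deepest-matching-node rule below are assumptions, as are all names (`url_tokens`, `UrlTreeNode`, `build_url_tree`, `match_url`) and the example URLs. Step 1 (the classifier) is represented only by its output, a list of (URL, category) pairs.

```python
from collections import Counter
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL into its host followed by its non-empty path segments."""
    parts = urlsplit(url if "//" in url else "//" + url)
    return [parts.netloc.lower()] + [seg for seg in parts.path.split("/") if seg]


class UrlTreeNode:
    def __init__(self):
        self.children = {}                 # token -> UrlTreeNode
        self.category_counts = Counter()   # categories of classified pages below this node


def build_url_tree(classified_urls):
    """Step 2: build a URL tree from (url, category) pairs produced by step 1."""
    root = UrlTreeNode()
    for url, category in classified_urls:
        node = root
        for token in url_tokens(url):
            node = node.children.setdefault(token, UrlTreeNode())
            node.category_counts[category] += 1
    return root


def match_url(root, url):
    """Step 3: walk the tree along the request URL and return the majority
    category of the deepest node that matches (assumption), or None."""
    node, best = root, None
    for token in url_tokens(url):
        node = node.children.get(token)
        if node is None:
            break
        best = node.category_counts.most_common(1)[0][0]
    return best


# Hypothetical output of step 1 for one website.
classified = [
    ("http://example.com/sports/nba/1.html", "sports"),
    ("http://example.com/sports/nba/2.html", "sports"),
    ("http://example.com/finance/stocks/1.html", "finance"),
]
tree = build_url_tree(classified)
```

With this tree, a request for the unseen page `http://example.com/sports/nba/99.html` matches down to the `nba` node and is assigned that node's majority category, while a request for an unknown host matches nothing. A real system could also collapse subtrees whose pages all share one category so that individual page URLs need not be stored; that refinement is not shown here.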

Description

Technical field

[0001] The invention relates to the Internet, in particular to a URL-based classification recognition method.

Background technique

[0002] In Internet advertising, advertisements related to the content of a web page are embedded in the page. When a user visits a web page, its publisher requests advertisements from ad networks such as Google, Microsoft, and Yahoo. Because of stringent requirements on latency, communication cost, and storage capacity, it is not feasible for the publisher to send the entire page to the ad network, nor for the ad network to crawl the entire page, analyze its content, and select the most relevant advertisement within milliseconds.

[0003] At present, the common practice is to crawl the media page offline and extract its classification and keywords from the content of the page. The same is true for the advertisement itself, with a set of classification and keywords provided by the advertis...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 吴欢琴, 田宁, 刘崟, 谭磊
Owner: ZHEJIANG PANSHI INFORMATION TECH