A web crawler system and method

A web crawler system and method, applied in the field of web search. It addresses the inability of existing crawlers to effectively extract dynamic URLs, with the stated effects of improving web search efficiency and performance and helping to maintain the safe application of web pages.

Inactive Publication Date: 2011-11-30
BEIJING VENUS INFORMATION TECH +1

AI Technical Summary

Problems solved by technology

[0008] The technical problem to be solved by the present invention is to provide a web crawler system and method that overcome the prior art's inability to effectively extract dynamic URLs.


Embodiment Construction

[0044] The implementation of the present invention is described in detail below with reference to the accompanying drawings and examples, so that the way the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and put into practice.

[0045] First, provided there is no conflict, the embodiments of the present invention and the individual features of those embodiments can be combined with one another, and all such combinations fall within the protection scope of the present invention. In addition, the steps shown in the flow diagrams of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flow diagrams, in some cases the steps shown or described may be performed in an order different from the one given here.

[0046] Figure 1 is a schematic diagram of the principle of a static crawler process in the prior art. As shown in Figure 1, ...


Abstract

The invention discloses a web crawler system and method that address the prior art's inability to effectively extract dynamic URLs. The method includes: setting a first deduplication queue; receiving a target page; crawling the target page with a static crawler; treating each Uniform Resource Locator (URL) in the target page that the static crawler cannot parse as a dynamic URL; submitting the dynamic URLs to the first deduplication queue; and using a dynamic crawler to continue crawling the dynamic URLs in the first deduplication queue. The invention overcomes the prior art's inability to effectively extract dynamic URLs, improves the efficiency and performance of web page search, and helps maintain the safe application of web pages.
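The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`DedupQueue`, `static_crawl`, `crawl_page`, `fake_dynamic_crawler`) is hypothetical, and two assumptions are made that the source does not state: a URL counts as one the static crawler "cannot parse" when its `href` begins with `javascript:`, and the dynamic crawler is replaced by a stub that reads the argument of a made-up `go('...')` script call instead of executing scripts.

```python
import re
from collections import deque


class DedupQueue:
    """FIFO queue that ignores any URL it has already accepted (hypothetical helper)."""

    def __init__(self):
        self._seen = set()
        self._pending = deque()

    def submit(self, url):
        """Enqueue url unless it was submitted before; return True if enqueued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def pop(self):
        return self._pending.popleft()

    def __len__(self):
        return len(self._pending)


HREF_RE = re.compile(r'href="([^"]+)"')


def static_crawl(html):
    """Split href values into (static_urls, dynamic_urls).

    Assumption: hrefs starting with "javascript:" stand in for the URLs
    that a static crawler cannot parse.
    """
    static_urls, dynamic_urls = [], []
    for href in HREF_RE.findall(html):
        if href.startswith("javascript:"):
            dynamic_urls.append(href)
        else:
            static_urls.append(href)
    return static_urls, dynamic_urls


def crawl_page(html, dynamic_queue, dynamic_crawler):
    """Static pass first, then hand the leftover URLs to the dynamic crawler."""
    static_urls, dynamic_urls = static_crawl(html)
    for url in dynamic_urls:
        dynamic_queue.submit(url)  # duplicates are dropped here
    resolved = []
    while len(dynamic_queue):
        resolved.append(dynamic_crawler(dynamic_queue.pop()))
    return static_urls, resolved


def fake_dynamic_crawler(url):
    """Stub for a script-executing crawler: maps go('x') to /x.html."""
    match = re.search(r"go\('(\w+)'\)", url)
    return "/" + match.group(1) + ".html"


PAGE = '''
<a href="/about.html">About</a>
<a href="javascript:go('news')">News</a>
<a href="javascript:go('news')">News again</a>
<a href="javascript:go('shop')">Shop</a>
'''

static_found, dynamic_found = crawl_page(PAGE, DedupQueue(), fake_dynamic_crawler)
print(static_found)
print(dynamic_found)
```

In this sketch the repeated `javascript:go('news')` link enters the queue only once, so the dynamic stage does not process the same dynamic URL twice.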

Description

Technical field

[0001] The invention relates to web page search technology, and in particular to a web crawler system and method.

Background art

[0002] A web crawler is a program that automatically extracts web pages. It downloads web pages from the Internet for search engines and is an important component of a search engine. A traditional crawler starts from the Uniform Resource Locators (URLs) of one or several initial web pages and obtains the URLs found on those pages. It repeats this process until it has traversed the entire Internet, or stops once a stop condition of the system is met.

[0003] Crawlers are mainly used in search engines such as Google and Baidu and in specialized vertical search engines (such as job search engines). They are also used in the collection of virus samples, network security monitoring, cloud security, and so on.

[0004] According to whether the webpage contains script...
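The traditional crawl loop described in paragraph [0002] can be sketched as a breadth-first traversal. This is an illustrative sketch only: `traditional_crawl`, `get_links`, and `max_pages` are hypothetical names, the stop condition is assumed to be a simple page budget, and an in-memory link table (`LINKS`) stands in for downloading and parsing real pages.

```python
from collections import deque


def traditional_crawl(seeds, get_links, max_pages):
    """Visit pages breadth-first from the seed URLs.

    get_links(url) returns the URLs found on a page (hypothetical callback).
    The loop ends when no URLs remain or max_pages pages have been visited.
    """
    frontier = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link graph standing in for the web (assumed data, not from the source).
LINKS = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": [],
}

full_crawl = traditional_crawl(["a"], lambda u: LINKS.get(u, []), max_pages=10)
budgeted_crawl = traditional_crawl(["a"], lambda u: LINKS.get(u, []), max_pages=3)
print(full_crawl)
print(budgeted_crawl)
```

With a budget of 10 the loop ends because the frontier is empty; with a budget of 3 it ends on the stop condition before page `d` is visited.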


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 肖小剑, 李天武
Owner: BEIJING VENUS INFORMATION TECH