[0032] Embodiment: A method for establishing web crawler tasks for a physician database system, that is, a method for quickly building a fast and stable web crawler for the corresponding website whenever a different website needs to be crawled. The specific implementation is as follows:
[0033] A. In step S11, write a template for storing page link addresses. This step creates a template that stores the link address information of each page to be crawled. The template is equivalent to a blank record book of page addresses: it saves the link address of a crawled page together with the depth of that page. For example, the link to a detail page of a Wanfang paper is:
[0034] (http://d.wanfangdata.com.cn/Periodical_ahzylczz201203001.aspx), and the page depth is 3; the content stored in the template is then the above link address and the depth value 3.
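As an illustration of what such a template can look like, the following minimal Python sketch stores a link address together with its page depth (the class and field names are illustrative assumptions, not part of the disclosure):

```python
from dataclasses import dataclass

@dataclass
class PageLinkRecord:
    """Template from step S11: one page link address and its depth."""
    url: str    # link address of the page
    depth: int  # depth of the page within the site

# Example from the embodiment: a Wanfang paper detail page at depth 3.
record = PageLinkRecord(
    url="http://d.wanfangdata.com.cn/Periodical_ahzylczz201203001.aspx",
    depth=3,
)
```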
[0035] B. In step S12, write a link parser. First, establish a regular expression: analyze the website to be crawled and, according to the characteristics of the link address of each type of page to be crawled, write a regular expression that can extract that type of link address from the page content. Then write the concrete implementation of the link parser. The input of the link parser is a string representing the content of a web page; the parser uses the regular expression to extract the links that meet the requirements from the page content, stores these links in the template written in step A, and returns them as the result. For example, to write a link parser that extracts links to paper detail pages from the web content of a Wanfang paper list, first write a regular expression to extract the links:
[0036] "
[0037] href='(? http://d.wanfangdata.com.cn/Periodical_.+?\.aspx)'> (?.+?)", and then write a method in the parser to extract the link based on the regular expression. image 3 Is a page of Wanfang's paper list, Figure 4 It is the source code of this page. The source code is used as input, and a parser is used to parse it to extract the links of the five articles displayed in the web content. The extracted links are:
[0038] [1].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203001.aspx,
[0039] [2].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203002.aspx,
[0040] [3].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203003.aspx,
[0041] [4].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203004.aspx,
[0042] [5].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203005.aspx. After parsing, the above links and their corresponding page depths are stored in the corresponding templates.
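A minimal Python sketch of such a link parser is given below; it repeats the PageLinkRecord template from step S11 so the sketch is self-contained, and it adapts the named-group syntax to Python's (?P<Url>...) form. The function name and the use of the re module are illustrative assumptions, not the disclosed implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class PageLinkRecord:
    """Template from step S11 (repeated here for self-containment)."""
    url: str
    depth: int

# Regular expression from the embodiment; the named group "Url" captures
# the link of a paper detail page (Python named-group syntax).
LINK_PATTERN = re.compile(
    r'href="(?P<Url>http://d\.wanfangdata\.com\.cn/Periodical_.+?\.aspx)">'
)

def parse_links(page_content: str, depth: int) -> list[PageLinkRecord]:
    """Extract detail-page links from the list-page HTML and store each
    link, together with its page depth, in a template record."""
    return [
        PageLinkRecord(url=m.group("Url"), depth=depth)
        for m in LINK_PATTERN.finditer(page_content)
    ]
```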
[0043] C. In step S13, write the executor. For the client to work normally, an executor that submits crawling tasks must also be implemented. The executor sets the priority for the links extracted by the parser from the response content, encapsulates the links that have not yet been crawled into a collection of crawling tasks, and returns this collection to the server as the function's return value. The executor usually has a fixed format, and the default executor is used. If the website to be crawled has a specific directory structure, the default executor needs to be modified accordingly. For example, the 39 Health Network website uses a province/city directory structure, so the executor needs to establish that directory structure in advance.
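A minimal Python sketch of such an executor follows; the class names, the priority_fn callable, and the province/city dictionary are illustrative assumptions rather than the disclosed implementation:

```python
class CrawlTask:
    """One crawling task submitted to the server: a link plus its priority."""
    def __init__(self, url: str, depth: int, priority: int):
        self.url = url
        self.depth = depth
        self.priority = priority

class DefaultExecutor:
    """Default executor (step S13): wraps links that have not yet been
    crawled into a collection of crawling tasks."""
    def __init__(self, crawled_urls: set[str]):
        self.crawled_urls = crawled_urls

    def submit(self, records, priority_fn):
        # priority_fn maps a link record to a priority value (see step D).
        return [
            CrawlTask(r.url, r.depth, priority_fn(r))
            for r in records
            if r.url not in self.crawled_urls
        ]

class DirectoryExecutor(DefaultExecutor):
    """Executor modified for a site organised by province/city directories
    (e.g. the 39 Health Network); the directory tree is built in advance."""
    def __init__(self, crawled_urls: set[str], provinces: dict[str, list[str]]):
        super().__init__(crawled_urls)
        self.provinces = provinces  # hypothetical pre-built directory structure
```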
[0044] D. In step S13, the priority is set. For each web page, the client's executor assigns a priority value to the page task, and the crawler program controls the crawling order through the order of these priority values. The present invention balances the content crawled by multiple distributed web crawlers through this priority setting. The specific method is to first assign a priority interval to each page level and then, while the program is running, randomly assign each page a priority value within its corresponding interval; pages with larger priority values are crawled first.
[0045] For example, if the priority interval of Wanfang's paper list pages is set to [1,50] and the priority interval of Wanfang's paper detail pages is set to [30,80], a paper list page A (http://c.wanfangdata.com.cn/Periodical-ahzylczz.aspx) may be assigned a priority value of 35, while two paper detail pages B1 (http://d.wanfangdata.com.cn/Periodical_ahzylczz201203001.aspx) and B2 (http://d.wanfangdata.com.cn/Periodical_ahzylczz201203002.aspx) may be assigned priority values of 30 and 60 respectively.
[0046] The order in which these three pages are crawled is then B2, A, B1.
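The random assignment within per-level priority intervals and the resulting crawl order can be sketched as follows (the interval table and function name follow the example above but are otherwise illustrative):

```python
import random

# Priority intervals per page level, as in the example above:
# paper list pages get [1, 50], paper detail pages get [30, 80].
PRIORITY_INTERVALS = {
    "paper_list": (1, 50),
    "paper_detail": (30, 80),
}

def assign_priority(page_level: str) -> int:
    """Randomly assign a priority value within the interval of the page level."""
    low, high = PRIORITY_INTERVALS[page_level]
    return random.randint(low, high)

# Pages with larger priority values are crawled first.
pages = {
    "A": assign_priority("paper_list"),
    "B1": assign_priority("paper_detail"),
    "B2": assign_priority("paper_detail"),
}
crawl_order = sorted(pages, key=pages.get, reverse=True)
print(crawl_order)  # e.g. ['B2', 'A', 'B1'] when the values are 60, 35, 30
```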
[0047] E. In step S21, the server and the client communicate through the WCF protocol. The client obtains a web page address from the queue of addresses to be crawled, encapsulates it into an HTTP request, and sends it to the server. The server receives the request from the client, sends a request to the corresponding URL, and returns the requested content to the client. The client extracts the link addresses of the pages to be crawled from the content returned by the server and adds them to the queue to be crawled, until the entire website has been crawled.
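For the server side of this exchange, a minimal sketch is given below; the embodiment communicates over WCF, and Python's standard urllib is used here only as an illustrative stand-in for receiving a URL and returning the requested content:

```python
from urllib.request import urlopen

def fetch_for_client(url: str, timeout: float = 10.0) -> bytes:
    """Server side: send a request to the URL to be crawled and return
    the requested content so it can be handed back to the client."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```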
[0048] The client communicates with the server through the WCF protocol. The client parses out useful URL addresses with regular expressions. Before encapsulating a parsed URL into a request and sending it to the server, the client checks whether the corresponding request already exists in the database. If it exists, the request does not need to be sent again, because the request has already been made and the page does not need to be processed again; otherwise, the request is sent to the server and the URL is stored in the database.
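A minimal sketch of this duplicate check, using SQLite as a stand-in for the database (the table name requested_urls and the send_request callable are assumptions):

```python
import sqlite3

def send_if_new(conn: sqlite3.Connection, url: str, send_request) -> None:
    """Send a crawl request for `url` only if it has not been requested before."""
    conn.execute("CREATE TABLE IF NOT EXISTS requested_urls (url TEXT PRIMARY KEY)")
    already = conn.execute(
        "SELECT 1 FROM requested_urls WHERE url = ?", (url,)
    ).fetchone()
    if already is not None:
        return  # the request was already made; the page needs no further processing
    send_request(url)  # encapsulate into a request and send it to the server
    conn.execute("INSERT INTO requested_urls (url) VALUES (?)", (url,))
    conn.commit()
```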
[0049] F. In step S22, if the crawler terminates unexpectedly during the crawling process, it does not need to crawl from the beginning. Only the server and the client need to be restarted; the client reads the unfinished crawling tasks from the database and re-sends the requests to the server until the entire website has been crawled.
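A minimal sketch of this recovery step, again with SQLite standing in for the database (the crawl_tasks table and its completed flag are assumed names):

```python
import sqlite3

def recover_unfinished_tasks(conn: sqlite3.Connection, send_request) -> None:
    """On restart, read every task not yet marked as completed and
    re-send its request to the server."""
    rows = conn.execute(
        "SELECT url FROM crawl_tasks WHERE completed = 0"
    ).fetchall()
    for (url,) in rows:
        send_request(url)
```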
[0050] Based on the above content, the implementation process of the present invention is summarized as follows:
[0051] 1. The client obtains the link address of the web page to be crawled. For the web crawler to crawl the website normally, the client must be given one or more initial crawling link addresses.
[0052] 2. The client takes out a link address to be crawled from the list of links to be crawled in the database, and sends this link address to the server.
[0053] 3. The server sends an HTTP request to the page to be crawled, and returns the requested content A to the corresponding client.
[0054] 4. The client receives the crawled content A returned by the server, and needs to do the following operations on content A:
[0055] 1) The client's executor stores content A on the hard disk.
[0056] 2) The client's parser uses regular expressions to parse the required links B out of content A (there may be one or more parsed links). For example, Figure 3 shows part of one page of Wanfang's paper list, with the names of five papers circled in boxes; Figure 4 shows the source code of this part of the web page, with the links to the five papers circled in boxes and the identical parts of these links circled in green boxes. On inspection, the differences between these links are the characters between the two green boxes, and this part can be matched with the wildcard ".+?" in a regular expression. A regular expression that extracts these links can therefore be written as: href="(?<Url>http://d.wanfangdata.com.cn/Periodical_.+?\.aspx)">. The "(?<Url>...)" in the regular expression means that the matching result is stored in the group Url. Applying this regular expression to the web content in Figure 4 extracts the following five web links:
[0057] [1].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203001.aspx
[0058] [2].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203002.aspx
[0059] [3].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203003.aspx
[0060] [4].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203004.aspx
[0061] [5].http://d.wanfangdata.com.cn/Periodical_ahzylczz201203005.aspx.
[0063] 3) The client's executor sets priorities for the links B extracted in step 2). The specific method is as follows: before the web crawler runs, a priority interval must be manually assigned to each level of pages to be crawled; during the running of the program, the program then randomly assigns each page a priority value within its corresponding interval, and pages with higher priority values are crawled first. For example, if the priority interval of Wanfang's paper list pages is set to [1,50] and the priority interval of Wanfang's paper detail pages is set to [30,80], then a paper list page A (http://c.wanfangdata.com.cn/Periodical-ahzylczz.aspx) is assigned a priority value of 35 (randomly assigned within the interval [1,50]), and two paper detail pages B1 (http://d.wanfangdata.com.cn/Periodical_ahzylczz201203001.aspx) and B2 (http://d.wanfangdata.com.cn/Periodical_ahzylczz201203002.aspx) are assigned priority values of 30 and 60 respectively (randomly assigned within the interval [30,80]); the order in which the three pages are crawled is then B2, A, B1. In this way, the order in which pages are crawled can be controlled by setting the priority of page crawling.
[0064] 4) The link address B extracted from the content is added to the list to be crawled in the database.
[0065] 5. Repeat steps 2-4 until the entire website has been crawled.
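Pulling steps 2 to 4 together, the client loop can be condensed into the following Python skeleton; db, server, parser, executor, storage and priority_fn are placeholders for the components sketched in the preceding steps, not the disclosed implementation:

```python
def client_loop(db, server, parser, executor, storage, priority_fn):
    """Repeat steps 2-4 until no links remain in the to-be-crawled list."""
    while True:
        task = db.pop_next_task()          # step 2: take out a link to crawl
        if task is None:
            break                          # the entire website has been crawled
        content = server.fetch(task.url)   # step 3: server returns content A
        storage.save(task.url, content)    # step 4.1: store content A on disk
        records = parser.parse_links(content, task.depth + 1)  # step 4.2: extract links B
        tasks = executor.submit(records, priority_fn)          # step 4.3: set priorities
        db.add_tasks(tasks)                # step 4.4: add B to the to-be-crawled list
        db.mark_completed(task.url)
```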
[0066] The following functions in the above process are all completed by the client:
[0067] 1. Send a new task to the server and add the unfinished task information to the database;
[0068] 2. Request the content that has been crawled from the website from the server's response queue, and mark the corresponding task in the database as completed. Parse the response: if crawling needs to continue, encapsulate the URL of the page to be crawled into a task request, send it to the server, and record it in the database; otherwise, save the response content;
[0069] 3. Recovery mode: read the unfinished tasks from the database and resend them to the server (as shown in Figure 2).