Webpage information processing method and device

A webpage information processing method and device, applied in the field of webpage information processing, which address problems such as the time-consuming crawling of webpage information by a single web crawler.

Pending Publication Date: 2020-07-17

行吟信息科技(上海)有限公司

8 Cites, 0 Cited by

Problems solved by technology

[0003] However, because the content of webpage information in the Internet is huge, it is time-consuming to crawl the webpage information of a certain website or a certain type of resource through a single web crawler.

Embodiment Construction

[0054] At present, webpage information can be captured by a single server based on network resource data (such as the above-mentioned URL), but because the amount of webpage information is huge, capturing it with a single server takes a lot of time. For this reason, this embodiment provides a distributed webpage information crawling system. As shown in Figure 1, the architecture of the system may include: a master server 10, slave servers 20 (Figure 1 illustrates the first slave server through the n-th slave server as an example), a queue 30, and a database 40.
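The architecture in [0054] can be illustrated with a minimal single-process sketch. It is not the patent's implementation: threads stand in for the slave servers 20, an in-memory `queue.Queue` stands in for the queue 30, a dictionary stands in for the database 40, and `fake_fetch` and `run_crawl` are hypothetical names introduced here so the example runs without network access.

```python
import queue
import threading


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request; returns placeholder page text.
    return f"<html>content of {url}</html>"


def run_crawl(urls, n_slaves=3, fetch=fake_fetch):
    """Master enqueues URLs; n_slaves workers drain the queue into a shared store."""
    url_queue = queue.Queue()      # plays the role of queue 30
    database = {}                  # plays the role of database 40
    db_lock = threading.Lock()

    # Master server 10: store the network resource data (URLs) in the queue.
    for url in urls:
        url_queue.put(url)

    def slave():
        # Slave server 20: take URLs until the queue is empty, then stop.
        while True:
            try:
                url = url_queue.get_nowait()
            except queue.Empty:
                return
            page = fetch(url)
            with db_lock:
                database[url] = page

    workers = [threading.Thread(target=slave) for _ in range(n_slaves)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return database
```

Because every worker pulls from the same queue, each URL is fetched by exactly one worker, and adding workers shortens the total time when fetching is the slow step. A real deployment would presumably replace the in-memory queue and dictionary with networked services shared by separate machines.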

[0055] The master server 10 is used to determine the network resource data corresponding to the subject to be captured, where the subject to be captured can be designated by the user. After the network resource data corresponding to the subject to be captured is stored in the queue 30, each slave server 20 changes from the standby state into the w...

Abstract

The invention provides a webpage information processing method and device. A master server stores acquired network resource data corresponding to a to-be-captured theme into a queue. Each slave server acquires at least one piece of network resource data from the queue, sends an acquisition request to the network resource data, and receives the webpage information corresponding to the network resource data, so that the webpage information is captured at least through each slave server and the capturing efficiency is improved. After the storage time of the network resource data corresponding to the to-be-captured theme reaches the expiration time, the master server empties the queue and re-stores the updated network resource data corresponding to the to-be-captured theme in the queue. Because the network resource data in the queue is updated regularly, webpage information can be captured based on the changed network resource data, so that webpage information is acquired incrementally and is prevented from being acquired repeatedly.
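The expiration step in the abstract can be sketched as follows, under stated assumptions. `ExpiringUrlQueue`, its method names, the `ttl_seconds` parameter, and the injectable `clock` are all hypothetical and chosen only for illustration; the source does not say how the expiration time is tracked or how the updated network resource data is produced.

```python
import time
from collections import deque


class ExpiringUrlQueue:
    """In-memory queue that is emptied and reloaded once its contents expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds      # assumed expiration time, in seconds
        self.clock = clock          # injectable so the behavior can be tested
        self.items = deque()
        self.stored_at = None

    def store(self, urls):
        # Master side: store network resource data and record the storage time.
        self.items = deque(urls)
        self.stored_at = self.clock()

    def expired(self):
        return (self.stored_at is not None
                and self.clock() - self.stored_at >= self.ttl)

    def refresh_if_expired(self, load_urls):
        # Master side: empty the queue and re-store updated data after expiry.
        if self.expired():
            self.items.clear()
            self.store(load_urls())
            return True
        return False

    def take(self):
        # Slave side: take one URL, or None when the queue is empty.
        return self.items.popleft() if self.items else None
```

A master loop would call `refresh_if_expired` periodically, passing a function that returns the current URLs for the theme; slaves keep calling `take` and only ever see the most recently stored batch.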

Description

Technical field

[0001] The invention belongs to the technical field of distributed processing, and in particular relates to a webpage information processing method and device.

Background technique

[0002] A web crawler embeds one or more buried points in the Internet, obtains URL (Uniform Resource Locator) addresses through these buried points, and sends acquisition requests to the URL addresses to obtain webpage information from them. It then extracts new URL addresses from the webpage information and sends requests to the new URL addresses to obtain their webpage information, and so on, obtaining more webpage information by continuously obtaining new URL addresses.

[0003] However, because the amount of webpage information on the Internet is huge, it is time-consuming to crawl the webpage information of a certain website or a certain type of resource through a single web crawler.

Contents of the inve...
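The single-crawler loop described in [0002] can be sketched as below. This is an illustration only: the starting URLs (`seed_urls`) stand in for the addresses obtained through the buried points, `fetch` is a caller-supplied function that returns page HTML or `None`, and `LinkExtractor`, `crawl`, and `max_pages` are hypothetical names. Link extraction uses the standard library `html.parser` module.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Fetch pages breadth-first, following newly discovered URLs once each."""
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)            # acquisition request to the URL address
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:    # new URL addresses found in the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

For example, with an in-memory site such as `site = {"/a": '<a href="/b">b</a>', "/b": "end"}`, calling `crawl(["/a"], site.get)` visits both pages. Every fetch happens sequentially in one loop, which is the bottleneck that [0003] points out.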

Application Information

IPC(8): G06F16/951

CPC: G06F16/951

Inventor: 何鲁敏, 宋子杰

Owner: 行吟信息科技(上海)有限公司
