Webpage information processing method and device

A webpage information processing method and device in the field of distributed processing, addressing the problem that crawling webpage information with a single web crawler is time-consuming.

Pending Publication Date: 2020-07-17
行吟信息科技(上海)有限公司

AI Technical Summary

Problems solved by technology

[0003] However, because the volume of webpage information on the Internet is huge, crawling the webpage information of a particular website or a particular type of resource with a single web crawler is time-consuming.



Examples


Detailed Description of the Embodiments

[0054] At present, a single server can crawl webpage information based on network resource data (such as the above-mentioned URL), but because the amount of webpage information is huge, crawling it with a single server takes a lot of time. For this reason, this embodiment provides a distributed webpage information crawling system. As shown in Figure 1, the architecture of the system may include: a master server 10, slave servers 20 (Figure 1 illustrates the first slave server through the nth slave server as an example), a queue 30, and a database 40.

[0055] The master server 10 is used to determine the network resource data corresponding to the topic to be captured, which can be designated by the user. After the network resource data corresponding to the topic to be captured is stored in the queue 30, each slave server 20 changes from the standby state into the w...
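The division of roles in paragraphs [0054] and [0055] can be sketched in a single process, with threads standing in for the slave servers. This is a minimal illustration under stated assumptions, not the patented implementation: `fetch` is a stub that makes no network request, the database 40 is modelled as a dictionary, a blocking `get` on the queue stands in for the standby state, and the `None` shutdown signal and every function name are hypothetical.

```python
import queue
import threading


def fetch(url):
    """Stub for sending an acquisition request; returns fake page content."""
    return f"<html>content of {url}</html>"


def master(url_queue, urls):
    """Master server: store the network resource data (URLs) in the queue."""
    for url in urls:
        url_queue.put(url)


def slave(url_queue, database, lock):
    """Slave server: wait on the queue, then fetch pages and store them."""
    while True:
        url = url_queue.get()  # blocks while the queue is empty ("standby")
        if url is None:  # hypothetical shutdown signal, not from the source
            url_queue.task_done()
            break
        page = fetch(url)
        with lock:  # the dict stands in for database 40
            database[url] = page
        url_queue.task_done()


def run_system(urls, n_slaves=3):
    """Start the slaves, let the master fill the queue, and collect results."""
    url_queue = queue.Queue()
    database = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=slave, args=(url_queue, database, lock))
        for _ in range(n_slaves)
    ]
    for worker in workers:
        worker.start()
    master(url_queue, urls)
    url_queue.join()  # wait until every stored URL has been processed
    for _ in workers:
        url_queue.put(None)
    for worker in workers:
        worker.join()
    return database
```

Calling `run_system(["https://example.com/a", "https://example.com/b"])` returns a dictionary with one entry per URL. In a real deployment the queue and database would be shared services reachable from separate machines.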



Abstract

The invention provides a webpage information processing method and device. A master server stores the acquired network resource data corresponding to a topic to be captured in a queue. Each slave server acquires at least one piece of network resource data from the queue, sends an acquisition request to that network resource data, and receives the corresponding webpage information. The webpage information is thus captured at least by the slave servers, which improves capture efficiency. After the storage time of the network resource data corresponding to the topic reaches the expiration time, the master server empties the queue and stores the updated network resource data corresponding to the topic in the queue again. Because the network resource data in the queue is updated regularly, webpage information can be captured based on the changed network resource data, so that webpage information is acquired incrementally and is not acquired repeatedly.
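The expiration behaviour described in the abstract can be illustrated with a small in-memory queue. This is a sketch under assumptions: the class name `ExpiringQueue`, the `ttl_seconds` parameter, the injectable `clock`, and the `load_updated_urls` callback are all hypothetical, and the source does not say how the master server obtains the updated network resource data.

```python
import time
from collections import deque


class ExpiringQueue:
    """In-memory stand-in for the queue, with a single expiration time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.items = deque()
        self.stored_at = None

    def store(self, urls):
        """Master side: store network resource data and record the time."""
        self.items = deque(urls)
        self.stored_at = self.clock()

    def expired(self):
        """True once the storage time has reached the expiration time."""
        if self.stored_at is None:
            return False
        return self.clock() - self.stored_at >= self.ttl

    def refresh_if_expired(self, load_updated_urls):
        """Master side: empty the queue and re-store updated data on expiry."""
        if not self.expired():
            return False
        self.items.clear()
        self.store(load_updated_urls())
        return True

    def take(self):
        """Slave side: take one piece of network resource data, if any."""
        return self.items.popleft() if self.items else None
```

A master loop would call `refresh_if_expired` periodically, while slaves call `take`. Passing a fake clock makes it possible to exercise the expiry path without waiting.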

Description

Technical field

[0001] The invention belongs to the technical field of distributed processing, and in particular relates to a webpage information processing method and device.

Background

[0002] A web crawler sets one or more buried points in the Internet, obtains URL (Uniform Resource Locator) addresses through these buried points, and sends acquisition requests to the URL addresses to obtain webpage information from them. It then extracts new URL addresses from the webpage information, sends requests to the new URL addresses to obtain their webpage information, and so on, obtaining more webpage information by continuously obtaining new URL addresses.

[0003] However, because the volume of webpage information on the Internet is huge, crawling the webpage information of a particular website or a particular type of resource with a single web crawler is time-consuming.

Contents of the inve...
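The single-crawler loop described in paragraph [0002] (request a URL, extract new URLs from the returned page, repeat) can be sketched as follows. This is an illustration under assumptions: the in-memory `FAKE_SITE` replaces real network requests, links are extracted with a simple `href` regular expression rather than an HTML parser, and the `seen` set and `max_pages` limit are additions that keep the loop from revisiting pages or running forever.

```python
import re
from collections import deque

# Simplified link extraction; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seed_urls, fetch, max_pages=100):
    """Fetch pages starting from seed_urls, following newly found URLs."""
    pending = deque(seed_urls)
    seen = set(seed_urls)  # assumption: skip URLs that were already queued
    pages = {}
    while pending and len(pages) < max_pages:
        url = pending.popleft()
        html = fetch(url)  # stands in for the acquisition request
        if html is None:
            continue
        pages[url] = html
        for link in LINK_RE.findall(html):
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return pages


# Hypothetical three-page site used instead of real network access.
FAKE_SITE = {
    "http://example.test/": '<a href="http://example.test/a">A</a> '
                            '<a href="http://example.test/b">B</a>',
    "http://example.test/a": '<a href="http://example.test/b">B</a>',
    "http://example.test/b": '<a href="http://example.test/">home</a>',
}

pages = crawl(["http://example.test/"], FAKE_SITE.get)
```

Every page is fetched by this one loop in sequence, which is the bottleneck paragraph [0003] points to.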


Application Information

IPC(8): G06F16/951
CPC: G06F16/951
Inventor: 何鲁敏, 宋子杰
Owner: 行吟信息科技(上海)有限公司