Method for crawling webpage contents with paging

A technique for crawling paginated web page content, applied in the field of the JAVA platform, that solves the problem that the paginated parts of a page cannot be captured directly.

Status: Inactive. Publication Date: 2018-09-21

ZHUHAI HENGQIN SHENGDA ZHAOYE TECH INVESTMENT CO LTD



Problems solved by technology

[0003] The technical problem to be solved by the present invention is to provide a method for crawling webpage content with paging, which solves the problem that the paginated parts of a paginated webpage cannot be crawled directly.


Embodiment Construction

[0016] As shown in Figure 1, the present invention adopts the following steps:

[0017] Step 1. Check whether the URL of the current page to be crawled contains a page number. If it does not, use the browser's developer tools to analyze the page, find its query parameters and the requested URL, and splice them into a URL that carries a page number:

[0018] 1) Open the webpage to be crawled in a mainstream browser such as 360 or Google Chrome;

[0019] 2) Open the developer tools;

[0020] 3) Find the Headers sub-tab in the Network tab;

[0021] 4) Find the main request URL and the request method under General;

[0022] 5) Obtain the contents of Request Headers;

[0023] 6) Find the parameters under Query String Parameters and assemble them with the main URL found above to generate a URL with a page number;
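The splicing in sub-step 6 can be sketched in Java as follows. This is a minimal illustration, not the patent's own code: the class name `PagedUrlBuilder`, the host `example.com`, and the parameter names `keyword` and `page` are all assumptions standing in for whatever the developer tools show for a real site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: joins the main URL from "General" with the
// name/value pairs listed under "Query String Parameters".
final class PagedUrlBuilder {

    static String build(String mainUrl, Map<String, String> queryParams) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : queryParams.entrySet()) {
            // Percent-encode both name and value so the URL stays valid.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        if (query.length() == 0) {
            return mainUrl;
        }
        // Append with '&' if the main URL already has a query string.
        String separator = mainUrl.contains("?") ? "&" : "?";
        return mainUrl + separator + query;
    }

    public static void main(String[] args) {
        // Assumed parameters; a real page will use its own names.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("keyword", "patent");
        params.put("page", "1");
        // prints https://example.com/list?keyword=patent&page=1
        System.out.println(build("https://example.com/list", params));
    }
}
```

A `LinkedHashMap` is used so the parameters keep the order in which they were copied from the developer tools.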

[0024] Step 2. Use a network tool to load the URL and obtain the Html content;

[0025] // 1) Initialize the network tool according to the request header information

[0026...
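The source text is cut off at this point, so the network tool it uses is not known. As a sketch only, Step 2 could be done with the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). The class name `HtmlLoader`, the header values, and the URL are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the "network tool" of Step 2.
final class HtmlLoader {

    // Builds a GET request carrying the headers copied from "Request Headers".
    static HttpRequest buildRequest(String url, Map<String, String> headers) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        headers.forEach(builder::header);
        return builder.build();
    }

    // Sends the request and returns the response body as Html text.
    static String loadHtml(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Assumed header values; copy the real ones from the developer tools.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0");
        headers.put("Accept", "text/html");
        HttpRequest request = buildRequest("https://example.com/list?page=1", headers);
        // Only the request is built here; no network call is made.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To fetch a page, pass the request to `loadHtml(HttpClient.newHttpClient(), request)`. Note that `HttpClient` rejects a few restricted headers such as `Host` and `Content-Length`, so those should be left out when copying from the browser.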


Abstract

The invention relates to the technical field of JAVA platforms, and in particular to a method for crawling webpage contents with paging. The method includes the following steps. First, check whether the URL of the current page to be crawled has a page number; if not, use a developer tool to find the query parameters and the requested URL, and splice a URL with a page number from them. A network tool then loads this URL to obtain the Html content, and a crawler tool extracts information such as the total number of pages and the current page number. With the total number of pages as the end value and the current page number as the starting value, a loop is run in which the loop variable replaces the page number in the URL, generating the URL of each page. Finally, the network tool loads the URL of each page, the crawler tool extracts the required contents, and the obtained data is saved to a database. The method solves the problem that the undisplayed pages of a paginated webpage cannot be crawled directly.
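The extraction and loop described in the abstract might look like the sketch below. The abstract does not name the crawler tool, so a plain regular expression is swapped in here. The class name `PageUrlGenerator`, the pager text `Page 3 of 5`, and the parameter name `page` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "extract the page counts, then loop over pages".
final class PageUrlGenerator {

    // Assumed pager text, e.g. <span class="pager">Page 3 of 5</span>.
    private static final Pattern PAGER =
            Pattern.compile("Page\\s+(\\d+)\\s+of\\s+(\\d+)");

    // Returns {currentPage, totalPages} parsed from the Html.
    static int[] readPager(String html) {
        Matcher m = PAGER.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no pager found in Html");
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    // Loops from the current page to the total, replacing the page number
    // in the URL with the loop variable to produce one URL per page.
    static List<String> pageUrls(String urlWithPage, String pageParam,
                                 int current, int total) {
        // Lookbehind matches only the digits that follow "?page=" or "&page=".
        Pattern number = Pattern.compile(
                "(?<=[?&]" + Pattern.quote(pageParam) + "=)\\d+");
        List<String> urls = new ArrayList<>();
        for (int page = current; page <= total; page++) {
            urls.add(number.matcher(urlWithPage).replaceFirst(String.valueOf(page)));
        }
        return urls;
    }

    public static void main(String[] args) {
        int[] pager = readPager("<span class=\"pager\">Page 3 of 5</span>");
        List<String> urls = pageUrls(
                "https://example.com/list?keyword=patent&page=3", "page", pager[0], pager[1]);
        urls.forEach(System.out::println);
    }
}
```

Each generated URL would then be loaded and parsed in turn, with the extracted records written to the database as the final step.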

Description

Technical field

[0001] The invention relates to the technical field of the JAVA platform, in particular to a method for crawling webpage content with paging.

Background

[0002] When crawling webpages for information, the content to be crawled is often paginated. Only the data of the page currently being viewed can be captured; the data of the other pages is loaded only after the pagination button is clicked. If the content spans tens of thousands of pages, manually clicking the button to load each page for crawling is not practical. To solve this problem, it is necessary to implement a function that simulates clicking the pagination button and obtains the URLs of all paginated pages, so that the content that has not yet been loaded can be captured.

Contents of the invention

[0003] The technical problem solved by the present invention is to provide a method for crawling webpage content with paging; it so...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventor: 陈林, 张来卿, 庞严冬

Owner: ZHUHAI HENGQIN SHENGDA ZHAOYE TECH INVESTMENT CO LTD
