Method and device for capturing webpage content

A webpage content crawling technology that addresses the high complexity and low efficiency of crawling webpage content, with the effect of reducing complexity and improving efficiency.

Status: Inactive. Publication Date: 2015-08-26

SMART CITY INFORMATION TECH

3 Cites, 17 Cited by

Problems solved by technology

[0005] Embodiments of the present invention provide a method and device for capturing webpage content, which solve the problems of high complexity and low efficiency when capturing different types of webpage content.

Embodiment Construction

[0031] The embodiments address the high complexity and low efficiency of crawling different types of webpage content. In the embodiment of the present invention, when a webpage to be crawled is detected, its URL is looked up in a preset crawling rule base. When the rule base contains no crawling rule corresponding to the URL, the content of the webpage is analyzed, and a crawling rule is generated for the webpage if it meets the conditions. Because the rule is generated automatically from the analysis result, crawling rules do not need to be set manually, which effectively reduces the complexity of webpage content crawling and improves the efficiency of web content cra...
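The lookup-then-generate flow in [0031] can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent text shown here does not say how page content is analyzed or what the qualifying condition is, so the heuristic (pick the `div` with an `id` that holds the most text, and require at least `min_chars` characters) and the names `generate_rule`, `rule_for`, and `rule_base` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextBlockFinder(HTMLParser):
    """Hypothetical analyzer: counts the text held by each <div id=...>."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[Optional[str]] = []  # ids of currently open divs
        self.text_len: dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._stack.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost enclosing div that has an id.
        for div_id in reversed(self._stack):
            if div_id:
                self.text_len[div_id] = self.text_len.get(div_id, 0) + len(data.strip())
                break


def generate_rule(html: str, min_chars: int = 20) -> Optional[str]:
    """Analyze the page; return a rule only if the page qualifies (assumed condition)."""
    finder = _TextBlockFinder()
    finder.feed(html)
    if not finder.text_len:
        return None
    best_id, best_len = max(finder.text_len.items(), key=lambda kv: kv[1])
    return f"div#{best_id}" if best_len >= min_chars else None


def rule_for(url: str, html: str, rule_base: dict[str, str]) -> Optional[str]:
    """Look the URL up in the rule base; generate and store a rule if none exists."""
    rule = rule_base.get(url)
    if rule is None:
        rule = generate_rule(html)
        if rule is not None:
            rule_base[url] = rule
    return rule
```

Called twice with the same URL, `rule_for` analyzes the page only the first time; the second call is answered from `rule_base`, which is the part of the scheme that removes the manual rule-setting step.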

Abstract

The invention discloses a method and a device for capturing webpage content, and aims to solve the problems in the prior art that, when different types of webpage content are captured, the capturing complexity is high and the capturing efficiency is low. According to the method provided by the embodiment of the invention, when a webpage to be captured is detected, the URL of the webpage is looked up in a preset capturing rule base; when no capturing rule corresponding to the URL exists in the capturing rule base, the content of the webpage is analyzed, and a capturing rule is generated for the webpage if it qualifies. With this technical scheme, the capturing rule corresponding to the webpage is generated automatically from the analysis result and does not need to be set manually, so the webpage content capturing complexity is effectively reduced and the capturing efficiency is improved.

Description

Technical field

[0001] The invention relates to the technical field of computer applications, in particular to a method and device for capturing webpage content.

Background

[0002] Web crawlers are a fundamental part of search engine technology. A web crawler starts from the URLs (Uniform Resource Locators) of one or several initial webpages and obtains the URLs on those pages. While crawling, it keeps extracting new URLs from the current page and putting them in a queue until a stop condition is met. The captured webpage information is then stored on the search engine's server, which speeds up user searches.

[0003] At present, crawling rules are set manually when webpages are crawled with web crawler technology. Different types of webpages each need a manually set crawling rule. When there are many types of webpages to be crawled, a lot of manpower is required to set up cra...
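The generic crawl loop described in [0002] (seed URLs, a queue of newly found URLs, a stop condition) can be sketched as below. This is a minimal illustration, not the patent's implementation: `crawl`, `fetch`, and `max_pages` are hypothetical names, the stop condition (a page budget) is an assumption, and `fetch` is passed in by the caller so the sketch makes no network requests itself.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.parse import urljoin


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seeds: Iterable[str], fetch: Callable[[str], str],
          max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl from the seed URLs until the queue is empty
    or max_pages pages have been fetched (the assumed stop condition)."""
    seeds = list(seeds)
    queue = deque(seeds)
    seen = set(seeds)
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In practice `fetch` would wrap an HTTP client; for experiments it can be a lookup into an in-memory dictionary of page sources, which keeps the loop easy to exercise.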

Application Information

IPC(8): G06F17/30, G06F17/27

Inventors: 狄东杰, 孙德山, 姚臻

Owner: SMART CITY INFORMATION TECH
