Web page crawler method and device and electronic equipment

A web crawler and URL technology, applied in the computer field. It addresses the high barrier to crawling data, poor generality, and increased workload of existing approaches, and achieves reduced dependence of the underlying code on crawler configuration, improved code utilization efficiency, and lower maintenance costs.

Status: Inactive
Publication Date: 2018-07-27
湖北省楚天云有限公司

AI Technical Summary

Problems solved by technology

The prior-art approach raises the barrier to crawling data and also increases the workload; at the same time, due to any field or website change, the co



Examples


Embodiment Construction

[0054] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments of this specification, without creative effort, shall fall within the scope of protection of this application.

[0055] Figure 1 is a schematic diagram of the webpage crawling process involved in the solution of this specification in a practical application scenario. For example, if a user wants to crawl content related to a topic, the user can set the URL to be crawled, field information to be crawled, URL rules, field rules, and other information through t...


Abstract

An embodiment of the invention discloses a web page crawler method, a device, and electronic equipment. The method comprises the following steps: configuration information is set on the basis of a configuration template, the configuration information comprising list page configuration information and detail page configuration information; list page information is crawled according to the list page configuration information; and designated information is crawled according to the obtained list page information and the detail page configuration information. The configuration template decouples the configuration information from the code and reduces the dependency of the underlying code on crawler configuration, so the user is not required to analyze the rules and content of the web page to be crawled, and new requirements can be met without modifying code. Code maintenance cost is reduced and code utilization efficiency is improved. In addition, configuration information corresponding to multiple website rules, field rules, web page encoding types, and the like is preset in the configuration template, so that the user's varied and expanding requirements can be met and the applicable range of the code is widened.
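The two-stage, configuration-driven flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `CONFIG` key names, the use of regular expressions as rules, the `crawl` function, and the in-memory `FAKE_SITE` with its `example.com` URLs are all assumptions introduced here so that the sketch runs offline.

```python
import re
from typing import Callable, Dict, List

# Hypothetical configuration; every key name is an assumption for illustration.
CONFIG = {
    "list_page": {
        "url": "https://example.com/news",
        # Regex whose first group captures a detail-page URL.
        "link_rule": r'<a class="item" href="([^"]+)">',
    },
    "detail_page": {
        # Field name -> regex whose first group captures the field value.
        "field_rules": {
            "title": r"<h1>(.*?)</h1>",
            "date": r'<span class="date">(.*?)</span>',
        },
    },
}


def crawl(config: dict, fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Two-stage crawl driven only by `config`; `fetch` maps a URL to HTML."""
    list_cfg = config["list_page"]
    detail_cfg = config["detail_page"]

    # Stage 1: crawl the list page and collect detail-page URLs.
    list_html = fetch(list_cfg["url"])
    detail_urls = re.findall(list_cfg["link_rule"], list_html)

    # Stage 2: crawl each detail page and extract the configured fields.
    records = []
    for url in detail_urls:
        html = fetch(url)
        record = {"url": url}
        for field, rule in detail_cfg["field_rules"].items():
            match = re.search(rule, html, re.S)
            # A field whose rule does not match is recorded as an empty string.
            record[field] = match.group(1).strip() if match else ""
        records.append(record)
    return records


# In-memory stand-in for the network (hypothetical pages).
FAKE_SITE = {
    "https://example.com/news": (
        '<a class="item" href="https://example.com/news/1">A</a>'
        '<a class="item" href="https://example.com/news/2">B</a>'
    ),
    "https://example.com/news/1": '<h1>First</h1><span class="date">2018-01-01</span>',
    "https://example.com/news/2": "<h1>Second</h1>",
}


def fake_fetch(url: str) -> str:
    return FAKE_SITE[url]


if __name__ == "__main__":
    for row in crawl(CONFIG, fake_fetch):
        print(row)
```

In this sketch, supporting a new site or a new field means editing only `CONFIG`; `crawl` itself stays unchanged, which is the decoupling the abstract attributes to the configuration template.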

Description

Technical field

[0001] This specification relates to the field of computer technology, and in particular to a web page crawler method, device, and electronic equipment.

Background

[0002] With the development of network technology, the network contains more and more data. People who want to obtain data usually use crawler technology to obtain it from web pages or databases.

[0003] In prior-art crawler schemes, one generally first selects the website or websites from which data is to be crawled and determines the specific content and other information to be crawled. Next, professionals analyze the website and, after the analysis, determine the extraction rules. Code is written separately for each website; if a new website needs to be crawled, or new content on a previously crawled website needs to be crawled, the code must be modified again and the crawler task then executed with the modified code. It raises the thre...


Application Information

IPC(8): G06F17/30, G06F8/35, G06F8/41
CPC: G06F8/35, G06F8/443, G06F16/951
Inventor: 罗立志
Owner: 湖北省楚天云有限公司