Method and device for configuring page crawling rules

A method for configuring page crawling rules, applied in the computer field, that addresses the time-consuming, labor-intensive, and slow manual generation of crawling rules and speeds up rule generation.

Active Publication Date: 2021-10-15
BEIJING GRIDSUM TECH CO LTD
AI Technical Summary

Problems solved by technology

[0004] However, as the number of crawled pages continues to increase, manually writing page element paths and the regular expressions used to verify crawled content requires writers to have professional knowledge and analytical skills, and it costs time and effort, so crawling rules are generated slowly.

Examples

Embodiment Construction

[0047] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present invention and to fully convey its scope to those skilled in the art.

[0048] An embodiment of the present invention provides a method for configuring page crawling rules. As shown in Figure 1, the method configures the page crawling rule file from the display rules of the generated page elements in the page to be crawled and the position information of those page elements in that page, so that page crawling rules can be generated automatically and the crawling rate is improved. The gene...

Abstract

The invention discloses a method and device for configuring page crawling rules. It relates to the field of computer technology, and its main purpose is to generate page crawling rules automatically and so speed up rule generation. The main technical solution is as follows: select the page element to be crawled from the page for which crawling rules need to be configured; generate the path information of the page element from the attribute information corresponding to that element; set a regular expression template that matches the page element to be crawled, in order to generate a regular expression matching the content of that element; and configure the page crawling rules of the page to be crawled according to the display rules of the page element in the page and the position information of the page element in the page. The present invention is mainly used for the configuration of page crawling rules.
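The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `REGEX_TEMPLATES` table, the XPath-like path format, and the shape of the rule dictionary are all hypothetical and do not come from the patent text.

```python
import json
import re

# Hypothetical regex templates keyed by content type (an assumption,
# not taken from the patent).
REGEX_TEMPLATES = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "number": r"\d+(?:\.\d+)?",
    "text": r".+",
}


def build_path(tag, attrs):
    """Build an XPath-like path from an element's tag and attributes."""
    if "id" in attrs:
        # An id is normally unique, so it alone can identify the element.
        return f"//{tag}[@id='{attrs['id']}']"
    if "class" in attrs:
        return f"//{tag}[@class='{attrs['class']}']"
    return f"//{tag}"


def build_regex(content_type):
    """Anchor the chosen template so it must match the whole content."""
    return f"^{REGEX_TEMPLATES[content_type]}$"


def configure_rule(tag, attrs, content_type, display_rule, position):
    """Assemble one crawling-rule entry for a selected page element."""
    return {
        "path": build_path(tag, attrs),
        "regex": build_regex(content_type),
        "display_rule": display_rule,  # hypothetical field
        "position": position,          # hypothetical field
    }


rule = configure_rule(
    tag="span",
    attrs={"id": "publish-date"},
    content_type="date",
    display_rule="visible",
    position={"index": 0},
)
print(json.dumps(rule, indent=2))
```

In this sketch the user only selects the element and a content type; the path and the regular expression are derived rather than written by hand, which is the speed-up the abstract describes.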

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a method and device for configuring page crawling rules.

Background technique

[0002] With the in-depth development of cloud computing and big data technology, searching and mining the large amount of structured and unstructured information on the web has become a hot research topic. Analyzing the data often takes a lot of time and energy. In the era of big data, crawler technology has become an important way to obtain network data.

[0003] Usually, crawling rules need to be manually configured before crawlers crawl pages. Crawling rules include the path information of the page elements to be crawled and the regular expressions used to check the crawled content. Because web page structures are complex and changeable, configuring crawling rules in the prior art involves first locating the page elements manually, and then the page elements are analyze...
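To make the two parts of a crawling rule named in [0003] concrete, the following sketch applies hand-written rules to a small page: the path locates the element, and the regular expression checks the extracted content. The sample page, rule names, and rule format are illustrative assumptions. The example uses the standard-library `xml.etree.ElementTree`, which only accepts well-formed markup; real web pages usually need a tolerant HTML parser instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption for illustration).
PAGE = """<html><body>
<div class="article">
  <span id="publish-date">2021-10-15</span>
  <span id="views">n/a</span>
</div>
</body></html>"""

# Manually written rules: a path to locate the element and a regular
# expression that the crawled content must satisfy.
RULES = [
    {"name": "publish_date",
     "path": ".//span[@id='publish-date']",
     "regex": r"\d{4}-\d{2}-\d{2}"},
    {"name": "views",
     "path": ".//span[@id='views']",
     "regex": r"\d+"},
]


def crawl(page, rules):
    """Extract each rule's element text; keep it only if the regex accepts it."""
    root = ET.fromstring(page)
    result = {}
    for rule in rules:
        node = root.find(rule["path"])
        text = (node.text or "").strip() if node is not None else ""
        # Content that fails the check is recorded as None.
        result[rule["name"]] = text if re.fullmatch(rule["regex"], text) else None
    return result


extracted = crawl(PAGE, RULES)
print(extracted)
```

Every entry in `RULES` here had to be written and checked by hand against the page structure, which is the manual effort the background section identifies as the bottleneck.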

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/953
CPC: G06F16/951
Inventor: 满悦
Owner: BEIJING GRIDSUM TECH CO LTD