Crawler crawling method and management system for the field of automated vertical subdivision

A crawler crawling method and its management system for vertical-subdivision domains. It addresses problems such as the inability to predict crawler running time, inefficient crawler scheduling, and the lack of a complete crawler management system.

Active Publication Date: 2018-01-16
杭州金智塔科技有限公司

AI Technical Summary

Problems solved by technology

[0012] 1. Existing methods are all relatively general. In application scenarios that require a large number of customized crawlers, they cannot predict crawler running time or schedule crawlers efficiently;
[0013] 2. They do not form a relatively complete system covering crawler configuration, crawler scheduling, crawler execution, and data processing.

Examples

Embodiment Construction

[0090] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments:

[0091] As shown in Figure 2, the management system for the crawler crawling method in the field of automated vertical subdivision includes a crawler crawling core layer and a crawler control management layer, which provide parameter configuration, operation management, and real-time monitoring of the crawlers.

[0092] The crawler crawling core layer is based on the Scrapy crawler application framework. Scrapy uses the Twisted asynchronous network library to handle network communication. Its structure is clear, and it exposes a range of middleware interfaces that can flexibly meet varied requirements. Figure 3 is the algorithm flow chart of the crawler core layer.

[0093] The crawler crawling core layer specifically includes the following components:

[0094] Engine: controls the data processing flow of th...

Abstract

The invention relates to crawler crawling and management scheduling technology, and aims to provide a crawler crawling method and a management system for the field of automated vertical subdivision. The method comprises the steps of predicting crawler running time; optimizing batch crawler scheduling according to the predicted times and the number of parallel crawlers; and carrying out the crawl. Compared with the prior art, the method and the system achieve relatively high crawling efficiency in the field of automated vertical subdivision: a time prediction model for crawlers is introduced by exploiting the characteristics of vertical-subdivision crawlers, and the parallel crawlers are scheduled efficiently with the longest processing time algorithm, so the overall crawling time is reduced.
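
The scheduling step named in the abstract can be illustrated with a minimal sketch of the classic longest processing time (LPT) rule: sort jobs by predicted duration, longest first, and always give the next job to the least-loaded parallel slot. This is not the patent's implementation. The function name `lpt_schedule`, the spider names, and the predicted durations are hypothetical, and the predictions are assumed to come from some separate time prediction model that is not shown here.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(
    predicted_seconds: Dict[str, float], workers: int
) -> Tuple[List[List[str]], float]:
    """Assign crawlers to parallel slots with the LPT rule.

    predicted_seconds maps a crawler name to its predicted running time.
    Returns the per-slot assignment and the predicted makespan.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")

    # Min-heap of (current predicted load, slot index).
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(workers)]

    # Longest predicted job first; ties are broken by name for determinism.
    ordered = sorted(predicted_seconds.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, slot = heapq.heappop(heap)  # least-loaded slot so far
        assignment[slot].append(name)
        heapq.heappush(heap, (load + seconds, slot))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical predicted running times (seconds) for five crawlers.
predicted = {"spider_a": 7, "spider_b": 5, "spider_c": 4, "spider_d": 3, "spider_e": 1}
slots, makespan = lpt_schedule(predicted, workers=2)
print(slots, makespan)
```

LPT is a greedy heuristic, so the resulting schedule is not guaranteed to be optimal, and its quality depends on how accurate the predicted running times are.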

Description

Technical field

[0001] The invention relates to the technical field of crawler crawling and management scheduling, in particular to a crawler crawling method and a management system in the field of automated vertical subdivision.

Background technique

[0002] The information age, with its data explosion, offers massive amounts of information and data from all walks of life, but people are limited in how much information they can take in and process. We are often swamped by large amounts of redundant and useless information, and obtaining information that is relevant to the individual is becoming more and more difficult. Vertical segments and personalized recommendation have emerged in response. Vertical segmentation focuses attention and services on a specific category, and data crawling for vertical segments is important groundwork for services such as personalized recommendation.

[0003] A web crawler is a program that automatically...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/18
Inventors: 郑小林, 张建勇, 林炜华
Owner: 杭州金智塔科技有限公司