Distributed vertical crawler method and terminal equipment

A distributed vertical crawler technology applied in the field of information retrieval. It addresses problems such as poorly supported queries, difficulty finding and obtaining data, and low crawling efficiency, with the aims of avoiding pressure and single points of failure, improving data processing efficiency, and improving accuracy and performance.

Pending Publication Date: 2020-03-31
贵州小叮当信息技术有限公司

AI Technical Summary

Problems solved by technology

However, existing general search engines have certain limitations. Users in different fields and with different backgrounds often have different retrieval purposes and needs, yet the results returned by general search engines include a large number of web pages that users do not care about. The goal of a general search engine is to cover as much of the network as possible, so the contradiction between limited search engine server resources and unlimited network data resources will deepen further. The abundance of data forms on the World Wide Web and the continuous development of networ



Examples


Example Embodiment

[0035] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

[0036] Referring to Figures 1-3, the present invention provides a technical solution: a distributed vertical crawler method comprising the following steps:

[0037] A. First, the web crawler in the data capture unit crawls web resource data;

[0038] B. Next, preprocess the crawled web resource data;

[0039] C. Classify the preprocessed web resource data to obtain classified data;

[0040] D. Transmit the classified data to the data analysis un...


Abstract

The invention discloses a distributed vertical crawler method and terminal equipment. The distributed vertical crawler method comprises the following steps: A, a web crawler in a data crawling unit crawls web page resource data; B, the crawled web page resource data is preprocessed; C, the preprocessed web page resource data is classified to obtain classified data; D, the classified data is transmitted to a data analysis unit for data analysis; E, the analyzed data is transmitted to a storage unit for encrypted storage; and F, finally, the encrypted and stored data is transmitted to a background monitoring terminal. Web page resource data can thus be quickly crawled, preprocessed, classified, and encrypted, which improves data processing efficiency, provides high security, and avoids data leakage.
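
Steps A to F above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the names `preprocess`, `classify`, `analyze`, `Record`, and `run_pipeline` are hypothetical, as are the keyword-count classifier and the word-count analysis, because the source does not say how each unit works internally. Fetching, encryption, and delivery to the monitoring terminal are injected as callables (`fetch`, `encrypt`, `notify`) so that no particular network library or cipher is implied.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def preprocess(html: str) -> str:
    """Step B (assumed form): strip markup, collapse whitespace, lowercase."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split()).lower()


def classify(text: str, topics: Dict[str, List[str]]) -> str:
    """Step C (assumed form): topic with the most keyword hits, else 'other'."""
    words = text.split()
    best, best_hits = "other", 0
    for topic, keywords in topics.items():
        hits = sum(words.count(k) for k in keywords)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


def analyze(text: str) -> Dict[str, int]:
    """Step D (assumed form): a trivial analysis, here only a word count."""
    return {"word_count": len(text.split())}


@dataclass
class Record:
    url: str
    topic: str
    stats: Dict[str, int]
    ciphertext: bytes  # output of the caller-supplied encrypt function


def run_pipeline(
    urls: Iterable[str],
    fetch: Callable[[str], str],          # step A: returns raw HTML for a URL
    encrypt: Callable[[bytes], bytes],    # step E: caller-supplied cipher
    notify: Callable[[Record], None],     # step F: hand-off to the monitor
    topics: Dict[str, List[str]],
) -> List[Record]:
    """Runs steps A to F in order for each URL and returns the stored records."""
    store: List[Record] = []
    for url in urls:
        html = fetch(url)                               # A: crawl
        text = preprocess(html)                         # B: preprocess
        topic = classify(text, topics)                  # C: classify
        stats = analyze(text)                           # D: analyze
        record = Record(url, topic, stats,
                        encrypt(text.encode("utf-8")))  # E: encrypted storage
        store.append(record)
        notify(record)                                  # F: monitoring terminal
    return store
```

Injecting the stages keeps the sketch testable without network access: a test can pass a dictionary lookup as `fetch` and a list's `append` as `notify`. In any real use, `encrypt` would need to be a vetted cipher from a maintained cryptography library; the sketch deliberately does not provide one.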

Description

Technical field

[0001] The invention relates to the technical field of information retrieval, and in particular to a distributed vertical crawler method and terminal equipment.

Background art

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information, and how to extract and use this information effectively has become a huge challenge. As a tool that helps people retrieve information, the search engine has become the entrance and guide for users accessing the World Wide Web. However, existing general search engines have certain limitations. Users in different fields and with different backgrounds often have different retrieval purposes and needs, yet the results returned by general search engines include a large number of web pages that users do not care about. The goal of a general search engine is to cover as much of the network as possible, and the contradiction between limited search engine server resources and u...


Application Information

IPC(8): G06F21/60, G06F21/62, G06F16/951, G06F16/906, G06F16/9535
CPC: G06F16/906, G06F16/951, G06F16/9535, G06F21/602, G06F21/6218, G06F2221/2107
Inventor: 侯林勇, 方程, 张亮, 杨坤, 袁率, 王俊, 李亚萍, 刘婉莹
Owner: 贵州小叮当信息技术有限公司