Distributed vertical crawler method and terminal equipment

A distributed crawler technology, applied in the field of information retrieval. It addresses problems such as difficulty supporting queries based on semantic information, data that cannot be readily discovered and acquired, and low crawling efficiency. The intended effects are to avoid load pressure and single points of failure, improve data processing efficiency, and improve accuracy and performance.

Pending Publication Date: 2020-03-31
贵州小叮当信息技术有限公司

AI Technical Summary

Problems solved by technology

However, existing general-purpose search engines have several limitations. Users in different fields and with different backgrounds often have different retrieval goals and needs, yet the results returned by a general search engine include many web pages the user does not care about. A general search engine aims to cover as much of the network as possible, so the conflict between limited search engine server resources and unlimited network data resources will deepen further. As data forms on the World Wide Web grow richer and network technology continues to develop, large volumes of varied data such as pictures, databases, audio, video, and multimedia appear; general search engines often cannot handle such information-dense, structured data and cannot discover and acquire it well. Most general search engines provide keyword-based retrieval and have difficulty supporting queries based on semantic information.
At present, when the amount of data that web crawlers need to crawl is huge, the crawling efficiency of the existing distributed crawler architecture is low, so the architecture needs improvement.

Embodiment Construction

[0035] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0036] Referring to Figures 1-3, the present invention provides a technical solution: a distributed vertical crawler method comprising the following steps:

[0037] A. The web crawler in the data capture unit crawls web page resource data;

[0038] B. The captured web page resource data is preprocessed;

[0039] C. The preprocessed web page resource data is classified to obtain classified data;

[0040] D. The classified data is transferred to ...
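
Steps B and C can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how preprocessing or classification is performed, so the tag-stripping approach, the keyword-count classifier, the names `preprocess`, `classify`, and `CATEGORY_KEYWORDS`, and the sample categories are all hypothetical. A hard-coded HTML string stands in for a page fetched in step A.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def preprocess(html):
    """Step B (assumed form): strip markup, collapse whitespace, lowercase."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip().lower()


# Hypothetical vertical-domain categories; the source names none.
CATEGORY_KEYWORDS = {
    "finance": {"stock", "bank", "loan"},
    "sports": {"match", "league", "goal"},
}


def classify(text):
    """Step C (assumed form): choose the category with the most keyword hits."""
    words = set(re.findall(r"[a-z]+", text))
    best, best_hits = "other", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


if __name__ == "__main__":
    # Stand-in for a page captured in step A.
    page = "<html><body><h1>Bank raises loan rates</h1><script>var x = 1;</script></body></html>"
    cleaned = preprocess(page)
    print(cleaned, "->", classify(cleaned))
```

A keyword count is only one possible classifier; any function that maps cleaned text to a category label would fit the same slot.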

Abstract

The invention discloses a distributed vertical crawler method and terminal equipment. The distributed vertical crawler method comprises the following steps: A, crawling web page resource data with a web crawler in a data crawling unit; B, preprocessing the captured web page resource data; C, classifying the preprocessed web page resource data to obtain classified data; D, transmitting the classified data to a data analysis unit for data analysis; E, transmitting the analyzed data to a storage unit for encrypted storage; and F, finally transmitting the encrypted and stored data to a background monitoring terminal. Web page resource data can thus be quickly captured, preprocessed, classified, and encrypted, which improves data processing efficiency, provides high security, and avoids data leakage.
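
The six steps form a linear pipeline in which each unit hands its output to the next. The sketch below shows only that hand-off order, assuming each unit can be modeled as a plain function; it is not the patented implementation. All names (`Stages`, `run_pipeline`, `demo`, and the stage fields) are hypothetical. The source does not name an encryption algorithm, so the `encrypt` stage is supplied by the caller, and the demo uses a byte-reversal stand-in that is explicitly not encryption.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """One callable per step of the abstract (all hypothetical signatures)."""
    crawl: Callable[[str], str]            # A: URL -> raw page
    preprocess: Callable[[str], str]       # B: raw page -> cleaned text
    classify: Callable[[str], str]         # C: cleaned text -> category
    analyze: Callable[[str, str], dict]    # D: (category, text) -> record
    encrypt: Callable[[bytes], bytes]      # E: plaintext -> ciphertext
    notify: Callable[[str, bytes], None]   # F: send to monitoring terminal


def run_pipeline(urls, stages, store):
    """Run steps A-F in order for each URL; `store` models the storage unit."""
    for url in urls:
        raw = stages.crawl(url)
        text = stages.preprocess(raw)
        category = stages.classify(text)
        record = stages.analyze(category, text)
        plaintext = json.dumps(record, sort_keys=True).encode("utf-8")
        blob = stages.encrypt(plaintext)
        store[url] = blob          # E: only the encrypted form is stored
        stages.notify(url, blob)   # F: forward the stored, encrypted data
    return store


def demo():
    """Wire the pipeline with trivial stand-ins to show the data flow."""
    received = []
    stages = Stages(
        crawl=lambda url: "<p>Sample page for %s</p>" % url,
        preprocess=lambda raw: raw.replace("<p>", "").replace("</p>", "").lower(),
        classify=lambda text: "sample" if "sample" in text else "other",
        analyze=lambda category, text: {"category": category, "length": len(text)},
        encrypt=lambda data: data[::-1],  # stand-in only, NOT real encryption
        notify=lambda url, blob: received.append(url),
    )
    store = run_pipeline(["https://example.com/1"], stages, {})
    return store, received
```

Passing the stages in as callables keeps the ordering logic separate from any particular crawler, classifier, or cipher; a real deployment would supply a vetted encryption routine for `encrypt`.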

Description

Technical field

[0001] The invention relates to the technical field of information retrieval, in particular to a distributed vertical crawler method and terminal equipment.

Background

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information, and how to effectively extract and use this information has become a huge challenge. As tools that assist people in retrieving information, search engines have become the entrance and guide for users accessing the World Wide Web. However, existing general-purpose search engines have several limitations. Users in different fields and with different backgrounds often have different retrieval goals and needs, yet the results returned by a general search engine include many web pages the user does not care about. A general search engine aims to cover as much of the network as possible, and the contradiction between limited search engine server resources and u...

Application Information

IPC(8): G06F21/60, G06F21/62, G06F16/951, G06F16/906, G06F16/9535
CPC: G06F16/906, G06F16/951, G06F16/9535, G06F21/602, G06F21/6218, G06F2221/2107
Inventor: 侯林勇, 方程, 张亮, 杨坤, 袁率, 王俊, 李亚萍, 刘婉莹
Owner 贵州小叮当信息技术有限公司