Web crawler system based on big data

A web crawler system based on big data technology, applied in the field of web crawlers. It addresses the problems that existing crawlers cannot identify valid data, crawl with low efficiency, and are difficult to apply to large-scale information crawling. It reduces the amount of invalid crawled data and improves crawling efficiency.

Pending Publication Date: 2020-12-01
BEIJING JINHER SOFTWARE
Cites: 3; Cited by: 1

AI Technical Summary

Problems solved by technology

[0003] Existing web crawler systems cannot identify valid data and blindly crawl all the information on a web page. This is time-consuming, yields low crawling efficiency, and makes such systems difficult to apply to information crawling on big data networks. To overcome this problem, at least to some extent, this application provides a web crawler system based on big data, including:

Embodiment Construction

[0030] To make the purpose, technical solution, and advantages of the present application clearer, the technical solution of the present application is described in detail below. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.

[0031] Figure 1 is a functional structure diagram of a web crawler system based on big data according to an embodiment of the present application. As shown in Figure 1, the big data-based web crawler system includes:

[0032] The configuration module 11 is used to configure crawler parameters;

[0033] In some embodiments, crawler parameters include, but are not limited to, network URLs and data fields in web pages.
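As a minimal sketch, assuming Python and a regex-based way of locating fields (the source does not say how data fields are identified within a page), the crawler parameters of [0033] could be held in a small configuration object that is checked before any crawling starts. The names `CrawlerConfig`, `urls`, `fields`, and `validate` are hypothetical and are not taken from the application.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class CrawlerConfig:
    """Hypothetical container for the crawler parameters of [0033]:
    target URLs plus the data fields to extract from each page."""
    urls: List[str]
    # Field name -> regular expression with exactly one capture group.
    # Assumption: the source does not specify how fields are located.
    fields: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError for a malformed URL or field pattern."""
        for url in self.urls:
            parts = urlparse(url)
            # Accept only absolute http(s) URLs with a host part.
            if parts.scheme not in ("http", "https") or not parts.netloc:
                raise ValueError(f"invalid URL: {url!r}")
        for name, pattern in self.fields.items():
            # Each field pattern must capture exactly one value.
            if re.compile(pattern).groups != 1:
                raise ValueError(f"field {name!r} needs exactly one capture group")
```

Usage: `CrawlerConfig(urls=["https://example.com/news"], fields={"title": r"<title>(.*?)</title>"}).validate()` passes silently, while a URL such as `"ftp:/bad"` raises `ValueError`. Validating parameters up front is one way to keep a misconfigured crawler from fetching pages it cannot use.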

[0034] By configuring the URL of the da...

Abstract

The invention relates to a web crawler system based on big data. The system comprises a configuration module used to configure crawler parameters, a generation module used to generate a crawler program according to the crawler parameters, and a crawling module used to crawl information according to the crawler program. Only the content specified by the configured parameters is crawled, so the volume of invalid crawled data is reduced, crawling efficiency is improved, and the system meets the requirements of big data networks.
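The configure, generate, and crawl flow described in the abstract can be illustrated with the hypothetical Python sketch below. It is not the applicant's implementation: it assumes that a "crawler program" is a generated extraction function built from regular expressions, and the function names `configure`, `generate`, `crawl`, and `default_fetch` are invented for illustration. Only the configured fields are kept from each fetched page, which is the mechanism the abstract credits with reducing invalid crawled data.

```python
import re
from typing import Callable, Dict, Iterable
from urllib.request import urlopen

# Hypothetical type: a generated "crawler program" maps page HTML to fields.
Extractor = Callable[[str], Dict[str, str]]


def configure(urls: Iterable[str], fields: Dict[str, str]) -> dict:
    """Configuration module: collect crawler parameters (URLs + data fields)."""
    return {"urls": list(urls), "fields": dict(fields)}


def generate(params: dict) -> Extractor:
    """Generation module: compile the configured field patterns into an
    extractor that returns only those fields (assumption: regex-based)."""
    compiled = {name: re.compile(p, re.S) for name, p in params["fields"].items()}

    def extract(html: str) -> Dict[str, str]:
        result = {}
        for name, rx in compiled.items():
            match = rx.search(html)
            if match:  # fields not found on the page are simply omitted
                result[name] = match.group(1).strip()
        return result

    return extract


def default_fetch(url: str) -> str:
    """Download a page with the standard library (no retries or politeness)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(params: dict, program: Extractor,
          fetch: Callable[[str], str] = default_fetch) -> Dict[str, Dict[str, str]]:
    """Crawling module: fetch each configured URL and keep only the
    configured fields, discarding the rest of the page."""
    return {url: program(fetch(url)) for url in params["urls"]}
```

For example, with a stub fetcher instead of live network access:

```python
pages = {"https://example.com/a": "<title>Hello</title><p>ignored</p>"}
params = configure(list(pages), {"title": r"<title>(.*?)</title>"})
print(crawl(params, generate(params), fetch=pages.get))
# {'https://example.com/a': {'title': 'Hello'}}
```

Injecting `fetch` keeps the generated program independent of how pages are downloaded, so a distributed or rate-limited downloader could replace `default_fetch` without changing the configuration or generation steps.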

Description

Technical Field

[0001] This application belongs to the technical field of web crawlers, and in particular relates to a web crawler system based on big data.

Background

[0002] With the rapid development of Internet technology, the era of big data has arrived, and data collection has become a crucial link. As an important source of collected data, the crawler system plays an irreplaceable role. The largest application scenario for traditional web crawlers is search engines, whereas ordinary enterprises more often build websites or applications. Web crawlers collect and store Internet information from many websites and supply it as data support for the search services of search engine companies. They make the huge Internet searchable and its explosive growth easier to access. In the foreseeable future, Internet crawler technology will continue to develop. As the largest knowledge warehouse in h...

Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/951; G06F8/71
CPC: G06F16/951; G06F8/71
Inventors: 梁强, 杨刚, 乌兰, 连守财
Owner BEIJING JINHER SOFTWARE