A method for significantly improving network information capture and storage speed

A technique for speeding up network information capture and storage, applied in the field of big data. It addresses the complexity and time cost of existing network information capture methods and the cluttered content they handle, and aims to improve both collection and capture speed and storage speed.

Status: Inactive. Publication date: 2019-04-23
河南大瑞物联网科技有限公司

AI Technical Summary

Problems solved by technology

[0002] Network data is large in volume and messy in content. Existing crawler-based big data collection techniques are complex in how they capture network information and take a lot of time.

Embodiment Construction

[0019] A method that dramatically increases the speed at which web information is captured and stored, including:

[0020] Step 1: capture the required information from the Internet with a web crawler and extract the required keywords;

[0021] Step 2: supply the crawler, through the URL queue, with the URLs of the sites whose data is to be crawled. These URLs are only a subset of all seed URLs. Put them into the to-be-crawled URL queue; take each URL out of that queue, resolve DNS to obtain the IP of the host, download the web page corresponding to the URL, and store it in the downloaded web page library; then put the URL of the downloaded page into the crawled URL queue and analyze the URLs in the crawled queue;

[0022] Step 3: process the content captured by the crawler through the data classification module;

[0023] Step 4: store the URL information of the websites to be crawled, the data extracted from the web pages by the crawler, and t...
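
The queue handling in Step 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `crawl`, the injected `fetch` and `resolve` hooks, the per-host DNS cache, and the skipping of duplicate URLs are all hypothetical choices that the source does not specify. The source also does not say how the crawled queue is analyzed, so that part is left out.

```python
import socket
from collections import deque
from urllib.parse import urlparse


def crawl(seed_urls, fetch, resolve=socket.gethostbyname):
    """Drain the to-be-crawled queue; return (page_library, crawled_queue).

    fetch(url) -> str and resolve(hostname) -> str are hypothetical hooks,
    injected so the loop can be tried without network access.
    """
    to_crawl = deque(seed_urls)  # URL queue to be crawled
    crawled = deque()            # crawled URL queue
    page_library = {}            # downloaded web page library: URL -> record
    dns_cache = {}               # assumption: resolve each host only once

    while to_crawl:
        url = to_crawl.popleft()
        if url in page_library:  # assumption: skip URLs already downloaded
            continue
        host = urlparse(url).hostname
        if host not in dns_cache:
            dns_cache[host] = resolve(host)  # resolve DNS, get the host IP
        html = fetch(url)                    # download the page for this URL
        page_library[url] = {"ip": dns_cache[host], "html": html}
        crawled.append(url)                  # move the URL to the crawled queue

    return page_library, crawled
```

Passing a stub for `fetch` (and optionally `resolve`) lets the queue logic be checked offline; in a real run `fetch` would wrap an HTTP client of the reader's choice.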

Abstract

The invention discloses a method capable of remarkably improving network information capture and storage speed, comprising the following steps: (1) capturing required information from the Internet through a web crawler and extracting the required keywords; (2) defining that the URL is only a part of all seed URLs, putting the URLs into a to-be-crawled URL queue, taking each URL out of that queue, resolving DNS to obtain the IP of the host, downloading the web page corresponding to the URL, storing the page in a downloaded web page library, putting the URL of the downloaded page into the crawled URL queue, and analyzing the URLs in the crawled queue; (3) processing the content captured by the crawler through a data classification module; and (4) storing, through a data storage module, the URL information of the websites whose data is to be captured, the data extracted from the web pages by the crawler, and the data processed by the DP. According to the invention, the collection and capture speed of network information is improved, and the storage speed of the captured information is improved.
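
Steps 1, 3 and 4 of the abstract (keyword extraction, classification, storage) might look something like the sketch below. Everything concrete here is an assumption made for illustration: the source does not describe how keywords are extracted, what categories the data classification module uses, or what backs the data storage module. The frequency-based `extract_keywords`, the `CATEGORY_KEYWORDS` table, and the SQLite `pages` schema are hypothetical stand-ins.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical category vocabulary; the source does not define categories.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team"},
    "finance": {"stock", "market"},
}


def extract_keywords(text, top_n=5):
    """Assumed keyword extraction: the most frequent words of 3+ letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


def classify(keywords):
    """Assumed classification: first category sharing a keyword, else 'other'."""
    for category, vocab in CATEGORY_KEYWORDS.items():
        if vocab & set(keywords):
            return category
    return "other"


def open_store(path=":memory:"):
    """Open (or create) the assumed storage table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, raw TEXT, keywords TEXT, category TEXT)"
    )
    return conn


def store(conn, url, text):
    """Store the URL, the extracted data, and the processed data together."""
    keywords = extract_keywords(text)
    category = classify(keywords)
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, raw, keywords, category) "
        "VALUES (?, ?, ?, ?)",
        (url, text, ",".join(keywords), category),
    )
    conn.commit()
    return category
```

Keeping the URL, the raw extract, and the processed fields in one row mirrors the three kinds of data the abstract says the storage module holds; a production system would likely choose a different store and classifier.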

Description

Technical field

[0001] The invention relates to the field of big data, in particular to a method that can significantly improve the speed of network information capture and storage.

Background

[0002] Network data is large in volume and messy in content. Existing crawler-based big data collection techniques are complex in how they capture network information and take a lot of time. As a result, companies that want to obtain more resources can only expand their server scale, which is costly.

Contents of the invention

[0003] To solve the above problems, the present invention provides a method that can significantly improve the speed of network information capture and storage, including:

[0004] Step 1: capture the required information from the Internet with a web crawler and extract the required keywords;

[0005] Step 2: supply the crawler, through the URL queue, with the URLs of the sites whose data is to be crawled; the URL is only a par...

Application Information

IPC(8): G06F16/951, G06F16/955
Inventor: 韩金花
Owner: 河南大瑞物联网科技有限公司