Data crawling method and device, storage medium and processor

A data crawling technology involving a storage medium and a processor, applied in network data retrieval, network data indexing, and other database retrieval. It addresses the problem that a crawler cannot crawl all of a webpage's content before the webpage server has finished processing the page's network requests, which reduces the accuracy of analysis results, and it enables comprehensive and accurate analysis.

Status: Inactive
Publication date: 2019-07-30
BEIJING GRIDSUM TECH CO LTD
7 cites, 0 cited by

AI Technical Summary

Problems solved by technology

With existing data crawling technology, the crawler starts crawling a webpage as soon as a device has visited it. At that point the webpage server may not yet have processed all of the network requests corresponding to the webpage, so part of the page content has not finished loading.
In this case, the crawler cannot crawl all of the content of the webpage, which reduces the accuracy of any analysis performed on the crawled data.

Embodiment Construction

[0046] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0047] Referring to Figure 1, this embodiment discloses a data crawling method, which specifically includes the following steps:

[0048] S101. Record a network request sent to a webpage server when a first device loads a first webpage, and obtain a network request record.

[0049] The method shown in Figure 1 can be applied to a server, and can also be applied to an intermediate device that is communicatively connected to the server and ...

Abstract

The invention discloses a data crawling method and device, a storage medium and a processor. The method comprises: recording the network requests sent to a webpage server when a first device loads a first webpage, to obtain a network request record; updating the network request record according to the responses to those network requests that the webpage server returns to the first device; then determining, according to the network request record, whether the webpage server has processed all of the network requests sent to it when the first device loaded the first webpage; and, when it has, controlling a crawler to perform data crawling on the first webpage. The invention thereby ensures that all data corresponding to the network requests sent to the webpage server when the first device loads the first webpage is crawled, which in turn enables comprehensive and accurate analysis of a website.
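The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class `NetworkRequestRecord`, its methods, the string request identifiers, and the helper `crawl_when_ready` are all hypothetical names chosen for illustration. The sketch also assumes that every request made during the page load has already been recorded before the check runs; how to detect that no further requests will be sent is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkRequestRecord:
    """Hypothetical record of the requests sent while a page loads."""
    pending: set = field(default_factory=set)
    completed: set = field(default_factory=set)

    def record_request(self, request_id: str) -> None:
        # The first device sent a request to the webpage server.
        self.pending.add(request_id)

    def record_response(self, request_id: str) -> None:
        # The server returned a response: move the request to "completed".
        if request_id in self.pending:
            self.pending.remove(request_id)
            self.completed.add(request_id)

    def all_processed(self) -> bool:
        # True when no recorded request is still waiting for a response.
        return not self.pending


def crawl_when_ready(record, crawl):
    """Call crawl() only if the record shows no outstanding requests."""
    if record.all_processed():
        return crawl()
    return None  # page content may still be loading, so do not crawl yet
```

In use, a caller would invoke `record_request` and `record_response` as traffic is observed, and call `crawl_when_ready` again after each response until the crawl actually runs.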

Description

Technical field

[0001] The present invention relates to the technical field of network data processing, and more specifically to a data crawling method and device, a storage medium and a processor.

Background

[0002] A web crawler is a program that automatically extracts web content, crawling Internet information according to certain rules. The crawler starts from the URLs (Uniform Resource Locators) of one or several initial webpages, obtains the content of those pages, and keeps extracting new URLs that meet the rules and putting them into a queue, repeating until a stop condition set by the system is met.

[0003] At present, a large number of websites use AJAX (Asynchronous JavaScript and XML) technology. When a device accesses a webpage, it may need to send multiple network requests to the webpage server corresponding to the webpage in order to load multiple pieces of page content,...
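The generic crawl loop described in paragraph [0002] can be sketched as follows. This is an illustration of an ordinary queue-based crawler, not of the invention itself. The function name `crawl` and its parameters `fetch_links`, `url_allowed` and `max_pages` are assumptions made for the example: `fetch_links` stands in for downloading a page and extracting its links, `url_allowed` for "URLs that meet the rules", and `max_pages` for the stop condition.

```python
from collections import deque


def crawl(seed_urls, fetch_links, url_allowed, max_pages):
    """Breadth-first crawl starting from seed_urls.

    fetch_links(url) is assumed to return the URLs found on that page;
    url_allowed(url) decides whether a newly found URL meets the rules.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    # Stop when the queue is empty or the page limit is reached.
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen and url_allowed(link):
                seen.add(link)      # never queue the same URL twice
                queue.append(link)
    return visited
```

Passing the fetcher in as a function keeps the sketch independent of any particular HTTP or HTML-parsing library.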

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F16/951
CPC: G06F16/951
Inventor: 满悦
Owner: BEIJING GRIDSUM TECH CO LTD