Method and equipment for capturing data of website

A website data-capture technology, applied in the Internet field, that addresses the problems of low data-capture accuracy, high script maintenance cost, and low data-capture efficiency, with the effects of reducing maintenance costs, ensuring accuracy, and improving efficiency.

Active Publication Date: 2013-08-14
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0002] In the prior art, fetching data from data-providing websites generally requires executing a separate script for each website. When the number of data-providing websites is large, multiple sets of fetching scripts must be maintained, so script maintenance costs are high and data-fetching efficiency is low. At the same time, after classification information is set on a data-providing website, the cookie information of the last classification information is stored on the server side, but because the traditional dat...

Examples

Detailed Description of the Embodiments

[0034] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0035] Figure 1 shows a schematic diagram of a device for capturing website data according to one aspect of the present invention. The grabbing device 1 includes a first obtaining device 111, a judging device 112, a first circulation device 113, a first grabbing device 114 and a second circulation device 115.

[0036] Here, the grabbing device 1 is a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. The cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers.

[0037] Here, any communication method can be used between the grab...

Abstract

The invention provides a method and equipment for capturing data of a website. The method includes: (a) selecting an unvisited link from all links in the current root page according to information about the topological structure of the website, and acquiring the next-layer page that the link points to; (b) judging, according to a first preset rule, whether the next-layer page is a target information page; (c1) when the next-layer page is not a target information page, taking it as the current root page and repeating steps a and b until a first preset condition is met; (c2) when the next-layer page is a target information page, capturing it; and, when a second preset condition is met, taking the previous root page as the current root page and repeating steps a, b, c1 and c2. Compared with the prior art, the method and equipment adopt a depth-first traversal, so the target data of the whole website can be captured, the accuracy of target data capture is guaranteed, and the efficiency of data capture is improved.
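
The depth-first traversal described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the website is modeled as an in-memory dictionary instead of being fetched over HTTP, the first preset rule is assumed to be a caller-supplied predicate on the page URL, and the second preset condition is assumed to be "the current root page has no unvisited links left". The names `crawl_depth_first`, `site`, and `is_target` are hypothetical.

```python
from typing import Callable, Dict, List


def crawl_depth_first(
    site: Dict[str, List[str]],
    root: str,
    is_target: Callable[[str], bool],
) -> List[str]:
    """Return target pages in the order a depth-first crawl captures them.

    `site` maps each page URL to the links found on that page (a stand-in
    for the topological structure information; no network access is done).
    """
    visited = {root}
    captured: List[str] = []
    # Chain of root pages; the last element is the current root page.
    root_chain = [root]

    while root_chain:
        current_root = root_chain[-1]

        # Step a: pick one unvisited link on the current root page.
        next_page = next(
            (link for link in site.get(current_root, []) if link not in visited),
            None,
        )
        if next_page is None:
            # Assumed second preset condition: nothing left to visit here,
            # so the previous root page becomes the current root page.
            root_chain.pop()
            continue
        visited.add(next_page)

        # Step b: apply the (assumed) first preset rule.
        if is_target(next_page):
            # Step c2: capture the target information page.
            captured.append(next_page)
        else:
            # Step c1: descend, making the next-layer page the current root.
            root_chain.append(next_page)

    return captured


if __name__ == "__main__":
    # Hypothetical two-category site; item pages are the target pages.
    demo_site = {
        "/": ["/cat/a", "/cat/b"],
        "/cat/a": ["/item/1", "/item/2"],
        "/cat/b": ["/item/3", "/item/1"],
    }
    print(crawl_depth_first(demo_site, "/", lambda url: url.startswith("/item/")))
```

Because each category page is fully explored before the crawl moves to the next one, every target page is captured while its own category is the most recently selected one, which is the property the abstract relies on for accuracy.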

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a technology for grabbing website data.

Background

[0002] In the prior art, fetching data from data-providing websites generally requires executing a separate script for each website. When the number of data-providing websites is large, multiple sets of fetching scripts must be maintained, so script maintenance costs are high and data-fetching efficiency is low. At the same time, after classification information is set on a data-providing website, the cookie information of the last classification information is stored on the server side. Because traditional data capture generally adopts a breadth-first crawling method, and the uniform resource locator (URL) of a page link does not change when the classification information is changed on the same page, after visiting each classification information li...

Application Information

IPC(8): G06F17/30
Inventor: 江军, 余庆生
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD