Website data acquisition method and device

A website data collection technology, applied in the field of communication, that can solve problems such as the inability to collect data.

Status: Inactive; publication date: 2014-03-05
BEIJING PEOPLE ONLINE NETWORK
Cites: 3 · Cited by: 21

AI Technical Summary

Problems solved by technology

[0004] The invention provides a method and device for collecting website data, which are used to solve the problem that …

Embodiment Construction

[0021] The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

[0022] Figure 1 is a schematic flowchart of a website data collection method provided by an embodiment of the present invention. In this embodiment, the method is executed by a website data collection device. As shown in Figure 1, the website data collection method includes the following steps:

[0023] Step 101: Preset, in the web crawler program, the identity information for logging in to the website to be collected. The identity information includes a login account and a login password.

[0024] In this embodiment, a web crawler program is set up in the website data collection device. The administrator of the website data collection device can register with the website to be collected and obtain the identity information for logging in to that website. The identity information may include a login account and a login ...
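
A minimal sketch, assuming a Python crawler, of how step 101 might preset the identity information before crawling starts. The class and method names (`IdentityInfo`, `CrawlerConfig`, `preset_identity`, `identity_for`) are hypothetical; the patent text does not specify a storage format. Masking the password in `__repr__` is an added precaution, not part of the described method.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityInfo:
    """Login account and login password preset for one website (step 101)."""
    login_account: str
    login_password: str

    def __repr__(self) -> str:
        # Hypothetical precaution: keep the password out of logs and debug output.
        return f"IdentityInfo(login_account={self.login_account!r}, login_password='***')"


@dataclass
class CrawlerConfig:
    """Hypothetical crawler configuration holding one preset identity per site."""
    identities: dict[str, IdentityInfo] = field(default_factory=dict)

    def preset_identity(self, site: str, account: str, password: str) -> None:
        # The administrator registers on the site beforehand, then stores
        # the resulting credentials here before the crawler runs.
        self.identities[site] = IdentityInfo(account, password)

    def identity_for(self, site: str) -> IdentityInfo:
        try:
            return self.identities[site]
        except KeyError:
            raise LookupError(f"no identity information preset for {site}") from None


config = CrawlerConfig()
config.preset_identity("example.com", "collector01", "s3cret")
print(config.identity_for("example.com"))
# IdentityInfo(login_account='collector01', login_password='***')
```

Keying identities by site lets one crawler program hold credentials for several websites to be collected; the sample account and password are placeholders.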

Abstract

The invention provides a website data acquisition method and device. The method comprises: first, presetting in a web crawler (web spider) program the identity information used for logging in to the website to be acquired, the identity information comprising a login account and a login password; second, writing the identity information into the login form of the website's login page; third, encrypting the login form and sending it to the server corresponding to the website, so that the server can verify the validity of the identity information; and fourth, receiving an identification code for accessing the website, which the server sends only after verifying that the identity information is valid, and performing data acquisition on the website with the web crawler program, which uses the identification code to access all the web pages of the website. With this method and device, the identity information for logging in to the website to be acquired is stored in the web crawler program in advance, the identification code is obtained from the server by means of that identity information, and the web crawler program then accesses all the web pages of the website based on the identification code, so that data acquisition on the website can be achieved.
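
The four steps in the abstract can be sketched as a login-then-crawl flow. This is a minimal, self-contained illustration under stated assumptions, not the patented implementation: `MockSiteServer` is a hypothetical in-process stand-in for the website's server, the form field names `account` and `password` are assumptions, and the identification code is modeled as a random session token. The encryption step is not modeled; in practice the form body would be sent over HTTPS so that it is encrypted in transit.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qs, urlencode


class MockSiteServer:
    """Hypothetical in-process stand-in for the website's server."""

    def __init__(self, accounts: dict[str, str], pages: dict[str, str]) -> None:
        self._accounts = accounts  # registered login account -> password
        self._pages = pages        # page path -> page content
        self._issued: set[str] = set()

    def login(self, form_body: str) -> Optional[str]:
        """Verify the submitted identity information; issue a code if valid."""
        form = parse_qs(form_body)
        account = form.get("account", [""])[0]
        password = form.get("password", [""])[0]
        if not account or self._accounts.get(account) != password:
            return None  # identity information is invalid
        code = secrets.token_hex(16)  # identification code for this login
        self._issued.add(code)
        return code

    def get(self, path: str, code: Optional[str]) -> str:
        """Serve a page only to requests carrying an issued code."""
        if code not in self._issued:
            raise PermissionError(f"{path}: login required")
        return self._pages[path]


def build_login_form(account: str, password: str) -> str:
    # Step 2: write the identity information into the login form fields.
    # Field names are assumptions; real login pages use their own names.
    return urlencode({"account": account, "password": password})


def login_and_collect(server: MockSiteServer, account: str, password: str,
                      paths: list[str]) -> dict[str, str]:
    form_body = build_login_form(account, password)
    # Step 3: submit the form. Encryption is NOT modeled here; a real
    # crawler would post this body over HTTPS.
    code = server.login(form_body)
    if code is None:
        raise PermissionError("server rejected the identity information")
    # Step 4: attach the identification code to every page request.
    return {path: server.get(path, code) for path in paths}


server = MockSiteServer({"collector01": "s3cret"}, {"/a": "page A", "/b": "page B"})
print(login_and_collect(server, "collector01", "s3cret", ["/a", "/b"]))
# {'/a': 'page A', '/b': 'page B'}
```

On a real site the identification code would typically travel as a session cookie or token header; the mock keeps that detail abstract so the sketch runs without network access.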

Description

Technical Field

[0001] The invention relates to the field of communication, and in particular to a website data collection method and device.

Background Art

[0002] At present, web crawler programs are mostly used to collect data from websites. A web crawler program is a program that roams a collection of Web documents (web pages) by following links. The web crawler reads a web document through a given Uniform Resource Locator (URL) link, using a standard protocol such as the Hypertext Transfer Protocol (HTTP). It then takes every unvisited URL link contained in that document as a new starting point and continues roaming until no unvisited URL links remain. After completing the roaming, the web crawler program downloads the pages pointed to by all the URL links, saves them, and performs element analysis to obtain the data collection results for the website.

[0003] Nowadays, there are some websites on the Intern...
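
The roaming procedure in [0002] is essentially a breadth-first traversal of the link graph. The sketch below is a minimal Python illustration under stated assumptions: `fetch` is a hypothetical callable standing in for an HTTP GET, links are followed only within the start URL's host, and "element analysis" is reduced to extracting each page's `<title>`. Because a crawler must read a page to discover its links, this sketch downloads and analyzes each page during roaming rather than in a separate pass afterwards.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndTitleParser(HTMLParser):
    """Collects the href targets and the <title> text of one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def roam(start_url: str, fetch: Callable[[str], Optional[str]]) -> dict[str, str]:
    """Visit every page reachable from start_url on the same host, once each.

    Returns {url: page title} for every page that could be fetched.
    """
    host = urlparse(start_url).netloc
    results: dict[str, str] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:  # roaming ends when no unvisited URL links remain
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreadable page: skip it
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        parser.close()
        results[url] = parser.title.strip()  # minimal "element analysis"
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return results


SITE = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="http://other.org/">x</a>',
    "http://example.com/a": '<title>A</title><a href="/">home</a><a href="b#top">B</a>',
    "http://example.com/b": "<title>B</title>",
}
print(roam("http://example.com/", SITE.get))
# {'http://example.com/': 'Home', 'http://example.com/a': 'A', 'http://example.com/b': 'B'}
```

The `seen` set is what guarantees termination: each URL is queued at most once, so roaming stops once every reachable link has been visited. Politeness delays and robots.txt handling are omitted.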

Application Information

IPC (8): H04L12/26; H04L29/06
Inventors: 杜璞, 周凌燕, 胡羽中
Owner: BEIJING PEOPLE ONLINE NETWORK