Method and device for collecting website data

A data acquisition and website technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the inability to obtain website data by category, and it makes formatted storage easier, improves efficiency, and saves storage space.

Status: Inactive
Publication Date: 2015-07-08
TVMINING BEIJING MEDIA TECH

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method and device for collecting website data, which solve the problem that website data cannot be obtained by category and make it possible to quickly obtain the required data by category.



Examples


Embodiment 1

[0077] Figure 4 shows the website data collection method provided in Embodiment 1 of the present invention. In this embodiment, the acquired website data is stored hierarchically, which removes the need for a separate data classification process. As shown in Figure 4, the method includes the following steps S401-S406:

[0078] Step S401: pre-configure the root URL of the website.

[0079] Step S402: obtain the navigation bar information of the website according to the root URL. The navigation bar information includes the channel information of each channel.

[0080] Step S403: match the required channels from the channel information.

[0081] Step S404: obtain the content list in each matched channel.

[0082] Step S405: obtain the content data according to the classification of the content list. This content data is the required website data.

[0083] Step S406: store the website data hierarchically, and perform unifi...
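
The flow of steps S401-S406 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes the navigation bar is a <nav> element, that channels are matched by exact name, and that a nested dictionary serves as the hierarchical store. The names LinkParser, extract_links, collect, PAGES, and the example.com URLs are all hypothetical, and the in-memory PAGES dictionary stands in for real HTTP requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect (text, href) pairs for <a> tags, optionally only inside one container tag."""

    def __init__(self, container=None):
        super().__init__()
        self.container = container  # e.g. "nav"; None means the whole page
        self.depth = 0              # > 0 while inside the container
        self.href = None            # href of the <a> currently open
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container:
            self.depth += 1
        elif tag == "a" and (self.container is None or self.depth > 0):
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == self.container:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None


def extract_links(html, base_url, container=None):
    """Return (text, absolute URL) pairs for the links found in html."""
    parser = LinkParser(container)
    parser.feed(html)
    return [(text, urljoin(base_url, href)) for text, href in parser.links]


def collect(root_url, wanted_channels, fetch):
    """Walk navigation bar -> matched channels -> content lists -> content pages."""
    store = {}  # assumed hierarchical store: channel -> {title: page source}
    nav = extract_links(fetch(root_url), root_url, container="nav")   # S402
    for channel, channel_url in nav:
        if channel not in wanted_channels:                            # S403
            continue
        items = extract_links(fetch(channel_url), channel_url)        # S404
        store[channel] = {title: fetch(url) for title, url in items}  # S405, S406
    return store


# Hypothetical in-memory site standing in for real HTTP responses.
PAGES = {
    "http://example.com/": '<nav><a href="/news/">News</a>'
                           '<a href="/sports/">Sports</a></nav>'
                           '<a href="/about">About</a>',
    "http://example.com/news/": '<ul><li><a href="a1.html">Story A</a></li>'
                                '<li><a href="a2.html">Story B</a></li></ul>',
    "http://example.com/news/a1.html": "<p>Body of story A</p>",
    "http://example.com/news/a2.html": "<p>Body of story B</p>",
}

ROOT_URL = "http://example.com/"  # S401: pre-configured root URL
result = collect(ROOT_URL, {"News"}, PAGES.__getitem__)
print(result)
```

Because each piece of content is filed under its channel as it is fetched, the stored data already mirrors the site's structure, which is the property the embodiment relies on to skip a later classification pass.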

Embodiment 2

[0086] Figure 5 shows the website data collection method provided in Embodiment 2 of the present invention. In this embodiment, the content data is obtained from the page source code, and the obtained website data is classified and stored according to the website structure cluster, which removes the need for a separate data classification process. As shown in Figure 5, the method includes the following steps, beginning with S501:

[0087] Step S501: pre-configure the root URL of the website.

[0088] Step S502: obtain the navigation bar information of the website according to the root URL. The navigation bar information includes the channel information of each channel.

[0089] Step S503: match the required channels from the channel information.

[0090] Step S504: obtain the content list in each matched channel.

[0091] Step S505: determine the address of each corresponding content page according to the content list.

[0092] Step S506, determine the ...
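
Step S505 can be illustrated with a short sketch that turns the links in a channel's content list into absolute content page addresses. It assumes the content list is ordinary HTML whose entries are <a> links, and it drops repeated addresses, which is an added assumption rather than something the text states. HrefCollector, content_page_addresses, LIST_URL, and LIST_HTML are hypothetical names and sample data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Record the href of every <a> tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def content_page_addresses(list_html, list_url):
    """Resolve each link in a content list against the URL of the list page."""
    collector = HrefCollector()
    collector.feed(list_html)
    seen, addresses = set(), []
    for href in collector.hrefs:
        address = urljoin(list_url, href)  # handles relative and root-relative links
        if address not in seen:            # assumption: keep first occurrence only
            seen.add(address)
            addresses.append(address)
    return addresses


# Hypothetical content list source for one matched channel.
LIST_URL = "http://example.com/news/index.html"
LIST_HTML = """
<ul>
  <li><a href="2015/a1.html">Story A</a></li>
  <li><a href="/news/2015/a2.html">Story B</a></li>
  <li><a href="2015/a1.html">Story A (repeat)</a></li>
</ul>
"""
addresses = content_page_addresses(LIST_HTML, LIST_URL)
print(addresses)
```

Resolving against the list page's own URL, rather than the root URL, matters because relative links in a channel page are relative to that page.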


Abstract

The invention provides a method and a device for collecting website data. They solve the problem that website data cannot be obtained by category, and make it possible to quickly obtain the required data by category. The method includes: configuring the root URL of a website in advance; obtaining the navigation bar information of the website according to the root URL, where the navigation bar information includes channel information; matching the required channels from the channel information; and obtaining the website data step by step according to the matched channels. Because the method obtains the website data step by step for each matched channel, the data is obtained by category. At the same time, the obtained data corresponds to the website structure cluster, so a later website data classification process is avoided and data collection efficiency is improved.

Description

Technical field

[0001] The invention relates to the technical field of data collection, and in particular to a method and device for website data collection.

Background technique

[0002] As network resources grow richer and network information keeps expanding, people depend on the network more and more, but it also becomes harder for users to quickly find the specific resources they need among vast Internet resources. Information has always had immense value. As times have changed, humanity has entered the information age almost without noticing it. Every industry is flooded with information, and the value of information lies in the circulation of data: its real value can only be realized when data is circulated and transmitted in a timely manner. Under the conditions of a market economy, data collection has become ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 王兰莎
Owner: TVMINING BEIJING MEDIA TECH