Method and device for collecting website data
A website data acquisition technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It solves the problem that website data cannot be classified as it is obtained, and it makes formatted storage easier, improves efficiency, and saves storage space.
Examples
Embodiment 1
[0077] Figure 4 shows the website data collection method provided in Embodiment 1 of the present invention. In this embodiment, the acquired website data is stored hierarchically, which removes the need for a separate data classification step. As shown in Figure 4, the method includes the following steps S401-S406:
[0078] Step S401: pre-configure the root URL of the website.
[0079] Step S402: obtain the navigation bar information of the website according to the root URL; the navigation bar information includes the channel information of each channel.
[0080] Step S403: match the required channels from the channel information.
[0081] Step S404: obtain the content list in each matched channel.
[0082] Step S405: obtain the content data according to the classification of the content list; this content data is the required website data.
[0083] Step S406: store the website data hierarchically, and perform unifi...
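The flow of steps S401-S406 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `LinkCollector`, `links_in`, `collect`, and `fetch` are hypothetical, and the page layout is an assumption: the navigation bar is taken to be a `<nav>` element whose links are the channels, and each channel page is taken to hold its content list in a `<ul>`. Only the hierarchical storage part of step S406 is sketched, since the rest of that step is cut off in the text above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for <a> tags nested inside a container tag."""

    def __init__(self, container):
        super().__init__()
        self.container = container
        self.depth = 0      # greater than 0 while inside the container element
        self.href = None    # href of the <a> currently open, if any
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container:
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == self.container and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None


def links_in(html, container):
    parser = LinkCollector(container)
    parser.feed(html)
    return parser.links


def collect(root_url, wanted_channels, fetch):
    # S401: root_url is configured in advance by the caller.
    # S402: read the navigation bar; each link is treated as one channel.
    channels = links_in(fetch(root_url), "nav")
    store = {}
    for name, href in channels:
        # S403: keep only the channels that were asked for.
        if name not in wanted_channels:
            continue
        channel_url = urljoin(root_url, href)
        # S404: read the content list of the channel (assumed to be a <ul>).
        items = links_in(fetch(channel_url), "ul")
        # S405 and S406 (storage part): fetch each content page and file it
        # under its channel, giving a channel -> title -> content hierarchy.
        store[name] = {
            title: fetch(urljoin(channel_url, item_href))
            for title, item_href in items
        }
    return store


# In-memory stand-in for a website (made-up pages, no network access).
PAGES = {
    "http://example.com/": '<nav><a href="/news/">News</a>'
                           '<a href="/sports/">Sports</a></nav>',
    "http://example.com/news/": '<ul><li><a href="a1.html">First story</a></li></ul>',
    "http://example.com/news/a1.html": "<p>Body of the first story</p>",
}
result = collect("http://example.com/", {"News"}, PAGES.__getitem__)
```

Passing `fetch` as a parameter keeps the sketch runnable without network access; a real collector might wrap `urllib.request.urlopen` instead.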
Embodiment 2
[0086] Figure 5 shows the website data collection method provided in Embodiment 2 of the present invention. In this embodiment, the content data is obtained from the page source code, and the obtained website data is classified and stored according to website structure clustering, which removes the need for a separate data classification step. As shown in Figure 5, the method includes the following steps S501-S504:
[0087] Step S501: pre-configure the root URL of the website.
[0088] Step S502: obtain the navigation bar information of the website according to the root URL; the navigation bar information includes the channel information of each channel.
[0089] Step S503: match the required channels from the channel information.
[0090] Step S504: obtain the content list in each matched channel.
[0091] Step S505: determine the address of the corresponding content page according to the content list.
[0092] Step S506, determine the ...
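Step S505 can be sketched in Python as follows. The function names `content_page_addresses` and `group_by_site_structure` are hypothetical. Grouping addresses by their directory path is only an assumed stand-in for the website structure clustering mentioned in this embodiment, whose details are not given in the text above. Step S506 is truncated in the source and is not sketched.

```python
from collections import defaultdict
from urllib.parse import urljoin, urlparse


def content_page_addresses(channel_url, hrefs):
    """S505: resolve content-list hrefs to absolute content page addresses."""
    seen = set()
    addresses = []
    for href in hrefs:
        url = urljoin(channel_url, href)
        if url not in seen:     # drop duplicates, keep the list order
            seen.add(url)
            addresses.append(url)
    return addresses


def group_by_site_structure(addresses):
    """Group addresses by directory path (an assumption, not the patent's method)."""
    groups = defaultdict(list)
    for url in addresses:
        path = urlparse(url).path
        directory = path.rsplit("/", 1)[0] or "/"
        groups[directory].append(url)
    return dict(groups)


# Made-up content list taken from a channel page at /news/.
hrefs = ["a1.html", "/news/a1.html", "2024/a2.html", "../sports/s1.html"]
addresses = content_page_addresses("http://example.com/news/", hrefs)
groups = group_by_site_structure(addresses)
```

Resolving every link against the channel URL first means that relative and absolute forms of the same address collapse to one entry before any page is fetched.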