Method and terminal for crawling third-party website data

This website and data technology is applied in the field of crawling third-party website data. It addresses the drawbacks of manual crawling, which is time-consuming and labor-intensive, prone to omissions, and unable to guarantee data accuracy. It ensures data integrity while saving manpower and time costs.

Pending Publication Date: 2021-05-25
宝宝巴士股份有限公司

AI Technical Summary

Problems solved by technology

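[0002] At present, when the data of a third-party platform needs to be crawled, one must manually log in to the platform, copy the required content, and store it locally. Because the amount of data to be crawled is huge and the platforms are numerous, manual crawling is not only time-consuming and laborious but also prone to omissions, and it cannot guarantee the accuracy of the data.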

Examples

Example Embodiment

[0063] Example 1

[0064] As shown in Figure 1, a method for crawling third-party website data includes:

[0065] S1: obtain, from a database, the URL of each website whose data needs to be crawled, together with its corresponding account and password, and log in to the website using the URL, account, and password;

[0066] S2: crawl the data of the website according to preset rules;

[0067] S3: verify the integrity of the data and, if the data is complete, store it in the database.
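The S1 to S3 flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the table names, the regex-based `PRESET_RULES`, and the `fetch` callable (which stands in for logging in and downloading a page) are all hypothetical, and "complete" is assumed to mean that every preset field was extracted, because the source does not define the integrity criterion.

```python
import re
import sqlite3
from typing import Callable, Dict, List, Optional

# Hypothetical "preset rules" (S2): field name -> regex with one capture group.
PRESET_RULES = {
    "title": re.compile(r"<h1>(.*?)</h1>", re.S),
    "downloads": re.compile(r'<span class="downloads">(\d+)</span>'),
}

# Hypothetical stand-in for S1: logs in with (url, account, password), returns page HTML.
Fetch = Callable[[str, str, str], str]


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE sites (url TEXT PRIMARY KEY, account TEXT, password TEXT)")
    conn.execute("CREATE TABLE crawled (url TEXT, title TEXT, downloads INTEGER)")


def extract(html: str) -> Dict[str, Optional[str]]:
    """S2: apply every preset rule; a rule that does not match yields None."""
    record: Dict[str, Optional[str]] = {}
    for field, pattern in PRESET_RULES.items():
        match = pattern.search(html)
        record[field] = match.group(1).strip() if match else None
    return record


def is_complete(record: Dict[str, Optional[str]]) -> bool:
    """S3 (assumed criterion): every preset field was found and is non-empty."""
    return all(record.get(field) for field in PRESET_RULES)


def run(conn: sqlite3.Connection, fetch: Fetch) -> List[str]:
    """Run S1-S3 for every site in the database; return the URLs whose data was stored."""
    stored: List[str] = []
    sites = conn.execute("SELECT url, account, password FROM sites ORDER BY url").fetchall()
    for url, account, password in sites:
        html = fetch(url, account, password)  # S1: log in and load the page
        record = extract(html)                # S2: apply the preset rules
        if is_complete(record):               # S3: store only complete data
            conn.execute(
                "INSERT INTO crawled (url, title, downloads) VALUES (?, ?, ?)",
                (url, record["title"], int(record["downloads"])),
            )
            stored.append(url)
    conn.commit()
    return stored


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    conn.executemany(
        "INSERT INTO sites VALUES (?, ?, ?)",
        [("https://a.example/stats", "alice", "pw1"), ("https://b.example/stats", "bob", "pw2")],
    )
    pages = {
        "https://a.example/stats": '<h1>App A</h1><span class="downloads">120</span>',
        "https://b.example/stats": "<h1>App B</h1>",  # downloads missing -> incomplete
    }
    print(run(conn, lambda url, account, password: pages[url]))  # ['https://a.example/stats']
```

Regular expressions keep the sketch dependency-free; a real crawler would more likely use an HTML parser, and the integrity rule could just as well compare expected record counts or checksums.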

[0068] Before S1, the method further includes:

[0069] S0: store each website whose data needs to be crawled, together with its corresponding account and password, in the database.
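A minimal sketch of S0, plus reading the rows back at the start of S1, assuming an SQLite table named `sites`. The schema, the `SiteCredential` type, and the upsert behavior are illustrative assumptions. The source does not say how passwords are protected at rest, so they are kept as plain text here purely for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SiteCredential:
    """Hypothetical record: one website to crawl and the credentials for it."""
    url: str
    account: str
    password: str


def save_sites(conn: sqlite3.Connection, sites: List[SiteCredential]) -> None:
    """S0: store each site's URL with its account and password (one row per URL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sites ("
        " url TEXT PRIMARY KEY, account TEXT NOT NULL, password TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps a single row per URL when credentials are updated later.
    conn.executemany(
        "INSERT OR REPLACE INTO sites (url, account, password) VALUES (?, ?, ?)",
        [(s.url, s.account, s.password) for s in sites],
    )
    conn.commit()


def load_sites(conn: sqlite3.Connection) -> List[SiteCredential]:
    """First half of S1: read back every site that needs to be crawled."""
    rows = conn.execute("SELECT url, account, password FROM sites ORDER BY url").fetchall()
    return [SiteCredential(*row) for row in rows]
```

Keying the table on the URL means that re-running S0 with new credentials for the same site overwrites the old row instead of creating a duplicate.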

[0070] In S1, logging in to the website using the URL, account, and password includes:

[0071] The website is opened at its URL, and it is determined whether the website detects manual operation behavior. If it does, manual operation is simulated (for example, with tools such as Selenium) to enter the account and password, and it is judged whether the...
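The login branch described in [0071] can be sketched as follows. `LoginPage` is a hypothetical interface (a real implementation might wrap a Selenium WebDriver), and `FakePage` is an in-memory stand-in so the sketch runs without a browser. Typing one character at a time with random pauses is one assumed way to "simulate manual operation", and the final `is_logged_in()` check is an assumption, since the source text is cut off at that point.

```python
import random
import time
from typing import Callable, Dict, Optional, Protocol


class LoginPage(Protocol):
    """Hypothetical page interface; a real one might wrap a Selenium WebDriver."""

    def detects_manual_operation(self) -> bool: ...
    def set_field(self, name: str, value: str) -> None: ...
    def press_key(self, name: str, char: str) -> None: ...
    def submit(self) -> None: ...
    def is_logged_in(self) -> bool: ...


def human_type(page: LoginPage, field: str, text: str,
               sleep: Callable[[float], None], rng: random.Random) -> None:
    """Enter text one keystroke at a time with a short random pause after each."""
    for char in text:
        page.press_key(field, char)
        sleep(rng.uniform(0.05, 0.25))


def log_in(page: LoginPage, account: str, password: str,
           sleep: Callable[[float], None] = time.sleep,
           rng: Optional[random.Random] = None) -> bool:
    rng = rng or random.Random()
    if page.detects_manual_operation():
        # The site checks for human behavior: simulate manual keystrokes.
        human_type(page, "account", account, sleep, rng)
        human_type(page, "password", password, sleep, rng)
    else:
        # No detection: fill the form fields directly.
        page.set_field("account", account)
        page.set_field("password", password)
    page.submit()
    return page.is_logged_in()  # assumed success check (source text is truncated here)


class FakePage:
    """In-memory stand-in for a login page (hypothetical, for demonstration only)."""

    def __init__(self, strict: bool, account: str, password: str) -> None:
        self.strict = strict
        self.expected: Dict[str, str] = {"account": account, "password": password}
        self.fields: Dict[str, str] = {"account": "", "password": ""}
        self.keystrokes = 0
        self.logged_in = False

    def detects_manual_operation(self) -> bool:
        return self.strict

    def set_field(self, name: str, value: str) -> None:
        self.fields[name] = value

    def press_key(self, name: str, char: str) -> None:
        self.fields[name] += char
        self.keystrokes += 1

    def submit(self) -> None:
        # A strict page accepts the form only if every character arrived as a keystroke.
        typed = self.keystrokes == sum(len(v) for v in self.fields.values())
        self.logged_in = (typed or not self.strict) and self.fields == self.expected

    def is_logged_in(self) -> bool:
        return self.logged_in
```

For example, `log_in(FakePage(True, "alice", "pw"), "alice", "pw", sleep=lambda s: None)` takes the keystroke path and returns `True`. Injecting `sleep` and `rng` keeps the sketch fast and deterministic when it is exercised without a real browser.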

Example Embodiment

[0082] Example 2

[0083] As shown in Figure 2, a terminal for crawling third-party website data includes a memory 2, a processor 3, and a computer program stored in the memory 2 and executable on the processor 3. When executing the computer program, the processor 3 implements the steps of Example 1.

[0084] In summary, the method and terminal of the present invention for crawling third-party website data obtain, from the database, the information on the websites to be crawled, crawl the data of these websites according to the preset rules, and judge the integrity of the data to ensure that the crawled data is complete. This avoids manually logging in to multiple websites to crawl data and saves manpower and time costs.

Abstract

The invention relates to a method and terminal for crawling third-party website data. The method comprises the following steps: S1, obtaining, from a database, the URL of a website whose data needs to be crawled together with the corresponding account and password, and logging in to the website using the URL, account, and password; S2, crawling the data of the website according to a preset rule; and S3, verifying the integrity of the data and, if the data is complete, storing the data in the database. With this technical solution, once the information on the websites to be crawled is obtained from the database, their data is crawled automatically according to the preset rule and checked for integrity. This ensures that the crawled data is complete, avoids manually logging in to multiple websites, and saves manpower and time costs.

Description

Technical field

[0001] The invention relates to the field of computer software, in particular to a method and a terminal for crawling third-party website data.

Background

[0002] At present, when the data of a third-party platform needs to be crawled, one must manually log in to the platform, copy the required content, and store it locally. Because the amount of data to be crawled is huge and the platforms are numerous, manual crawling is not only time-consuming and laborious but also prone to omissions, and it cannot guarantee the accuracy of the data.

Contents of the invention

[0003] (1) Technical problem to be solved

[0004] To solve the above problems in the prior art, the present invention provides a method and terminal for crawling third-party website data that can save manpower.

[0005] (2) Technical solution

[0006] To achieve the above purpose, a technical solution adopted by the present inventi...

Application Information

IPC (8): G06F16/951; G06F21/64
CPC: G06F16/951; G06F21/64
Inventors: 陈翔, 唐光宇, 闫乃永, 卢学明, 林智明
Owner: 宝宝巴士股份有限公司