Method and terminal for crawling third-party website data

A technique for crawling third-party website data. It addresses the problems of manual collection (it is time-consuming and labor-intensive, content is easily missed, and data accuracy cannot be guaranteed) by ensuring data integrity and saving manpower and time costs.

Pending Publication Date: 2021-05-25
宝宝巴士股份有限公司

AI Technical Summary

Problems solved by technology

[0002] At present, crawling data from a third-party platform requires manually logging in to the platform, copying the required content, and storing it locally. Because the volume of data and the number of platforms are both large, manual crawling is time-consuming and laborious, content is easily missed, and the accuracy of the data cannot be guaranteed.



Examples


Embodiment 1

[0064] Referring to Figure 1, a method for crawling third-party website data includes:

[0065] S1. Obtain from the database the address of each website whose data is to be crawled, together with the corresponding account and password, and log in to the website using that address, account, and password;

[0066] S2. Crawl the data of the website according to preset rules;

[0067] S3. Verify the integrity of the data, and if the data is complete, store the data in the database.
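
Steps S1 to S3 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the table layout (`sites`, `crawled`), the `login` and `crawl` callables, and the completeness rule in `is_complete` (every record has a non-empty `title` and `views`) are all assumptions introduced here, since the source does not specify the schema, the preset rules, or how integrity is verified.

```python
import sqlite3
from typing import Callable, Dict, List

# Hypothetical record shape: one dict per crawled item.
Record = Dict[str, str]
# Assumed completeness rule; the source does not define one.
REQUIRED_FIELDS = ("title", "views")


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the assumed tables for site credentials and crawled data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sites ("
        "id INTEGER PRIMARY KEY, url TEXT, account TEXT, password TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawled (site_id INTEGER, title TEXT, views TEXT)"
    )


def is_complete(records: List[Record]) -> bool:
    """S3 check (assumed): non-empty, and every required field is filled."""
    return bool(records) and all(
        all(rec.get(field) for field in REQUIRED_FIELDS) for rec in records
    )


def crawl_all(
    conn: sqlite3.Connection,
    login: Callable[[str, str, str], object],
    crawl: Callable[[object], List[Record]],
) -> int:
    """Run S1 to S3 for every stored site; return the number of records saved."""
    saved = 0
    sites = conn.execute("SELECT id, url, account, password FROM sites").fetchall()
    for site_id, url, account, password in sites:
        session = login(url, account, password)  # S1: log in with stored credentials
        records = crawl(session)                 # S2: crawl according to preset rules
        if not is_complete(records):             # S3: verify integrity
            continue                             # incomplete data is not stored
        conn.executemany(
            "INSERT INTO crawled (site_id, title, views) VALUES (?, ?, ?)",
            [(site_id, rec["title"], rec["views"]) for rec in records],
        )
        saved += len(records)
    conn.commit()
    return saved
```

Passing `login` and `crawl` in as parameters keeps the site-specific logic (the "preset rules") separate from the fixed S1 to S3 flow, so the flow can be exercised with stub functions.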

[0068] Before S1, the method further includes:

[0069] S0. Store the addresses of the websites whose data is to be crawled, together with their corresponding accounts and passwords, in the database.
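
As an illustration of S0, the sketch below stores one row per website in a SQLite table. The table name `sites`, its columns, and the helper names `init_site_store` and `register_site` are hypothetical; the source states only that the website, account, and password are stored in the database.

```python
import sqlite3


def init_site_store(conn: sqlite3.Connection) -> None:
    """Create the (assumed) table holding one row per website to crawl."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sites ("
        "url TEXT PRIMARY KEY, account TEXT NOT NULL, password TEXT NOT NULL)"
    )


def register_site(conn: sqlite3.Connection, url: str, account: str, password: str) -> None:
    """S0: store a website address with its account and password.

    Registering the same URL again replaces the stored credentials.
    The password is stored as given; how credentials are protected is
    not described in the source and is left out of this sketch.
    """
    conn.execute(
        "INSERT OR REPLACE INTO sites (url, account, password) VALUES (?, ?, ?)",
        (url, account, password),
    )
    conn.commit()
```

Using the URL as the primary key is a design choice of this sketch: it keeps each site registered once and lets a credential change be recorded by calling `register_site` again.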

[0070] Logging in to the website in S1 according to the website address, account, and password includes:

[0071] Open the website at its address and determine whether the website checks for manual operation behavior; if it does, simulate manual ope...
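
The paragraph above is truncated in the source and does not say how manual operation is simulated. Purely as an assumption, the sketch below shows one possible reading of the branch: when the site checks for manual operation, the credentials are typed one character at a time with short random pauses; otherwise they are entered in one step. The `page` object with `type_text` and `click` methods, the field names, and the delay range are all hypothetical.

```python
import random
import time


def login(page, account, password, detects_automation,
          pause=time.sleep, rng=random.uniform):
    """Fill in and submit a login form on a hypothetical `page` object.

    If `detects_automation` is true, type character by character with a
    random pause after each keystroke (an assumed way to mimic a person).
    """
    def enter(field, text):
        if detects_automation:
            for ch in text:
                page.type_text(field, ch)
                pause(rng(0.05, 0.25))  # assumed delay range, in seconds
        else:
            page.type_text(field, text)

    enter("account", account)
    enter("password", password)
    page.click("submit")
```

The `pause` and `rng` parameters exist only so the timing can be replaced in tests; a real browser-automation library would supply the `page` object.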

Embodiment 2

[0083] Referring to Figure 2, a terminal 1 for crawling third-party website data includes a memory 2, a processor 3, and a computer program stored in the memory 2 and executable on the processor 3. When the processor 3 executes the computer program, the steps of Embodiment 1 are implemented.

[0084] In summary, the present invention provides a method and terminal for crawling third-party website data. After the information about the websites to be crawled is obtained from the database, the data of these websites is crawled automatically according to the preset rules, and the integrity of the crawled data is verified. This ensures that the crawled data is complete and avoids manual login to multiple websites, saving manpower and time costs.


Abstract

The invention relates to a method and terminal for crawling third-party website data. The method comprises the following steps: S1, obtaining from a database the address of a website whose data is to be crawled and the corresponding account and password, and logging in to the website according to the address, the account, and the password; S2, crawling the data of the website according to a preset rule; and S3, verifying the integrity of the data and, if the data is complete, storing the data in the database. Under this technical scheme, once the information about the websites to be crawled has been obtained from the database, the data of those websites is crawled automatically according to the preset rule and checked for integrity. This ensures that the crawled data is complete and avoids manual login to multiple websites, saving manpower and time costs.

Description

Technical field

[0001] The invention relates to the field of computer software, in particular to a method and a terminal for crawling third-party website data.

Background

[0002] At present, crawling data from a third-party platform requires manually logging in to the platform, copying the required content, and storing it locally. Because the volume of data and the number of platforms are both large, manual crawling is time-consuming and laborious, content is easily missed, and the accuracy of the data cannot be guaranteed.

Summary of the invention

[0003] (1) Technical problem to be solved

[0004] To solve the above problems in the prior art, the present invention provides a method and terminal for crawling third-party website data, which can save manpower.

[0005] (2) Technical solution

[0006] In order to achieve the above purpose, a technical solution adopted by the present inventi...


Application Information

IPC(8): G06F16/951, G06F21/64
CPC: G06F16/951, G06F21/64
Inventor: 陈翔, 唐光宇, 闫乃永, 卢学明, 林智明
Owner: 宝宝巴士股份有限公司