Method and terminal for crawling third-party website data

A website data technology, applied to crawling third-party website data, that addresses the problems of manual crawling (it is time-consuming, labor-intensive, prone to omissions, and cannot guarantee data accuracy) by ensuring data integrity and saving manpower and time costs.

Pending Publication Date: 2021-05-25

宝宝巴士股份有限公司

Problems solved by technology

[0002] At present, crawling the data of a third-party platform requires manually logging in to the platform, copying the required content and storing it locally. Because the amount of data and the number of platforms to be crawled are both large, manual crawling is time-consuming, laborious and prone to omissions, and it cannot guarantee the accuracy of the data.

Examples

Embodiment 1

[0064] As shown in figure 1, a method of crawling third-party website data includes:

[0065] S1. Obtain from the database the address of each website whose data is to be crawled, together with the corresponding account and password, and log in to the website using that address, account and password;

[0066] S2. Crawl the data of the website according to preset rules;

[0067] S3. Verify the integrity of the data, and if the data is complete, store the data in the database.
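
Steps S1 to S3 can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the patented implementation: the visible text does not define the preset rules or the integrity check, so here integrity is assumed to mean "every record carries every required field". The table names `sites` and `crawled`, the fields `date` and `downloads`, and the injected `login` and `crawl` callables are all hypothetical.

```python
import sqlite3

# Assumption: the "preset rule" names the fields every crawled record must have.
REQUIRED_FIELDS = ("date", "downloads")


def is_complete(records, required=REQUIRED_FIELDS):
    """S3 check (assumed meaning): non-empty, and no required field missing or blank."""
    return bool(records) and all(
        rec.get(field) not in (None, "") for rec in records for field in required
    )


def crawl_all(conn, login, crawl):
    """Run S1-S3 for each row of the sites table; return the URLs that were stored."""
    stored = []
    rows = conn.execute("SELECT url, account, password FROM sites").fetchall()
    for url, account, password in rows:
        session = login(url, account, password)  # S1: log in with stored credentials
        records = crawl(session)                 # S2: crawl according to the preset rule
        if is_complete(records):                 # S3: store only complete data
            conn.executemany(
                "INSERT INTO crawled (url, date, downloads) VALUES (?, ?, ?)",
                [(url, r["date"], r["downloads"]) for r in records],
            )
            stored.append(url)
    conn.commit()
    return stored


def demo():
    """Exercise the pipeline with an in-memory database and fake site responses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (url TEXT PRIMARY KEY, account TEXT, password TEXT)")
    conn.execute("CREATE TABLE crawled (url TEXT, date TEXT, downloads INTEGER)")
    conn.executemany(
        "INSERT INTO sites VALUES (?, ?, ?)",
        [("https://a.example", "alice", "pw-a"), ("https://b.example", "bob", "pw-b")],
    )
    fake_pages = {
        "https://a.example": [{"date": "2021-01-01", "downloads": 10}],
        "https://b.example": [{"date": "2021-01-01", "downloads": None}],  # incomplete
    }
    login = lambda url, account, password: url  # stand-in: the "session" is just the URL
    crawl = lambda session: fake_pages[session]
    stored = crawl_all(conn, login, crawl)
    count = conn.execute("SELECT COUNT(*) FROM crawled").fetchone()[0]
    return stored, count


if __name__ == "__main__":
    print(demo())
```

Passing `login` and `crawl` in as callables keeps the loop independent of any particular website; a real deployment would supply site-specific versions of both.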

[0068] Before S1, the method further includes:

[0069] S0. Store the addresses of the websites whose data is to be crawled, together with their corresponding accounts and passwords, in the database.
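
Step S0 might look like the following sketch, which uses Python's built-in `sqlite3` module. The source does not name a database engine or schema, so the `sites` table, its columns, and the helper names `init_db`, `register_site` and `load_sites` are assumptions made for illustration. Passwords are kept in plain text only for brevity; the source does not say how credentials are protected.

```python
import sqlite3


def init_db(conn):
    """Create the table that holds sites to crawl (table and column names are assumed)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sites ("
        "url TEXT PRIMARY KEY, account TEXT NOT NULL, password TEXT NOT NULL)"
    )


def register_site(conn, url, account, password):
    """S0: store a site with its credentials; re-registering a URL overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO sites (url, account, password) VALUES (?, ?, ?)",
        (url, account, password),
    )
    conn.commit()


def load_sites(conn):
    """First half of S1: read back every site and its credentials, ordered by URL."""
    return conn.execute(
        "SELECT url, account, password FROM sites ORDER BY url"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    register_site(conn, "https://store.example", "ops", "old-secret")
    register_site(conn, "https://store.example", "ops", "new-secret")  # overwrites
    print(load_sites(conn))
```

Using the URL as the primary key means each website appears once, so updating a changed password is a single call rather than a delete followed by an insert.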

[0070] In S1, logging in to the website using the address, account and password includes:

[0071] Open the website at its address and determine whether the website checks for manual operation behavior; if it does, simulate manual ope...

Embodiment 2

[0083] Referring to figure 2, a terminal 1 for crawling third-party website data includes a memory 2, a processor 3, and a computer program stored in the memory 2 and executable on the processor 3; when the processor 3 executes the computer program, the steps of Embodiment 1 are implemented.

[0084] In summary, the present invention provides a method and terminal for crawling third-party website data. After the information on the websites to be crawled is obtained from the database, the data of these websites is crawled automatically according to the preset rules and is checked for integrity. This ensures the integrity of the crawled data while avoiding manual login to multiple websites, saving manpower and time costs.

Abstract

The invention relates to a method and terminal for crawling third-party website data. The method comprises the following steps: S1, obtaining from a database the address of a website whose data is to be crawled, together with the corresponding account and password, and logging in to the website using that address, account and password; S2, crawling the data of the website according to a preset rule; and S3, verifying the integrity of the data and, if the data is complete, storing the data in the database. Under this technical scheme, once the information on the websites to be crawled has been obtained from the database, the data of those websites is crawled automatically according to the preset rule and checked for integrity. This ensures the integrity of the crawled data, avoids manual login to multiple websites, and saves manpower and time costs.

Description

Technical field

[0001] The invention relates to the field of computer software, in particular to a method and a terminal for crawling third-party website data.

Background technique

[0002] At present, crawling the data of a third-party platform requires manually logging in to the platform, copying the required content and storing it locally. Because the amount of data and the number of platforms to be crawled are both large, manual crawling is time-consuming, laborious and prone to omissions, and it cannot guarantee the accuracy of the data.

Contents of the invention

[0003] (1) Technical problems to be solved

[0004] In order to solve the above-mentioned problems in the prior art, the present invention provides a method and terminal for crawling third-party website data, which can save manpower.

[0005] (2) Technical solution

[0006] In order to achieve the above purpose, a technical solution adopted by the present inventi...

Application Information

IPC(8): G06F16/951, G06F21/64

CPC: G06F16/951, G06F21/64

Inventor: 陈翔, 唐光宇, 闫乃永, 卢学明, 林智明

Owner: 宝宝巴士股份有限公司
