Method and terminal for crawling third-party website data

A technology for crawling third-party website data. It addresses the problems of manual collection (data is easily omitted, the work is time-consuming and labor-intensive, and accuracy cannot be guaranteed), ensuring data integrity and saving labor and time.

Pending Publication Date: 2021-05-25
宝宝巴士股份有限公司

AI-Extracted Technical Summary

Problems solved by technology

[0002] At present, when it is necessary to crawl the data of a third-party platform, it is necessary to manually log in to the platform, copy the required content and store it locally. Due to the huge am...

Abstract

The invention relates to a method and terminal for crawling third-party website data. The method comprises the following steps: S1, obtaining from a database the URL of a website whose data needs to be crawled, together with the corresponding account and password, and logging in to the website using the URL, account, and password; S2, crawling the website's data according to preset rules; and S3, verifying the integrity of the data and, if the data is complete, storing it in the database. Under this scheme, once the information on the websites to be crawled has been obtained from the database, their data is crawled automatically according to the preset rules and checked for integrity. This ensures the integrity of the crawled data, avoids manually logging in to multiple websites to crawl data, and saves labor and time.

Examples

Example Embodiment

[0063] Example 1
[0064] Referring to Figure 1, a method for crawling third-party website data includes:
[0065] S1: obtain from the database the URL of each website whose data needs to be crawled, together with its corresponding account and password, and log in to the website using the URL, account, and password;
[0066] S2: crawl the website's data according to preset rules;
[0067] S3: verify the integrity of the data and, if the data is complete, store it in the database.
[0068] S1 is preceded by:
[0069] S0: store the URL of each website whose data needs to be crawled, and its corresponding account and password, in the database.
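The storage in S0 and the lookup at the start of S1 can be sketched as below. This is a minimal illustration, not the patent's implementation: the patent only refers to "the database", so the use of SQLite, the table name `crawl_targets`, the column names, and the helper functions `register_target` and `load_targets` are all assumptions.

```python
import sqlite3

# Hypothetical schema: the patent does not name a database engine, table,
# or columns. A real deployment would likely protect the stored passwords.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crawl_targets (
    url      TEXT PRIMARY KEY,
    account  TEXT NOT NULL,
    password TEXT NOT NULL
)
"""


def register_target(conn, url, account, password):
    """S0: store a website to crawl together with its credentials."""
    conn.execute(
        "INSERT OR REPLACE INTO crawl_targets (url, account, password) "
        "VALUES (?, ?, ?)",
        (url, account, password),
    )
    conn.commit()


def load_targets(conn):
    """First half of S1: read every registered website and its credentials."""
    rows = conn.execute(
        "SELECT url, account, password FROM crawl_targets ORDER BY url"
    )
    return [{"url": u, "account": a, "password": p} for u, a, p in rows]


# Usage with an in-memory database and placeholder credentials.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
register_target(conn, "https://example.com/login", "demo_user", "demo_pass")
targets = load_targets(conn)
```

Keying the table on the URL means that registering the same website twice updates its credentials instead of creating a duplicate entry; that is a design choice of this sketch, not something the patent specifies.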
[0070] In S1, logging in to the website using the URL, account, and password includes:
[0071] Open the website using the URL and determine whether the website detects manual operation behavior. If it does, simulate the manual operation (for example with Selenium or similar tools). Then enter the account and password and determine whether the website requires a verification code. If it does, parse the verification code image to obtain the code, and enter the code to log in to the website.
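The order of checks in the login step can be sketched as follows. The `LoginPage` interface is hypothetical: it stands in for whatever browser automation is used (the patent mentions Selenium as one option), and none of its method names come from the patent or from Selenium. The `solve_captcha` callable and the final `submit` call are likewise assumptions. `RecordingPage` is an in-memory stand-in used only to show the call order.

```python
from typing import Callable, Optional, Protocol


class LoginPage(Protocol):
    """Hypothetical wrapper around a browser session; names are assumed."""

    def open(self, url: str) -> None: ...
    def detects_manual_operation(self) -> bool: ...
    def simulate_manual_operation(self) -> None: ...
    def enter_credentials(self, account: str, password: str) -> None: ...
    def captcha_image(self) -> Optional[bytes]: ...
    def enter_captcha(self, code: str) -> None: ...
    def submit(self) -> bool: ...


def log_in(page: LoginPage, url: str, account: str, password: str,
           solve_captcha: Callable[[bytes], str]) -> bool:
    """Follow the sequence in [0071]; returns True if the login succeeded."""
    page.open(url)
    if page.detects_manual_operation():
        page.simulate_manual_operation()
    page.enter_credentials(account, password)
    image = page.captcha_image()  # None means no verification code is required
    if image is not None:
        page.enter_captcha(solve_captcha(image))
    return page.submit()


class RecordingPage:
    """In-memory stand-in that records which steps were executed."""

    def __init__(self, manual_check: bool, captcha: Optional[bytes]):
        self.manual_check, self.captcha, self.calls = manual_check, captcha, []

    def open(self, url): self.calls.append("open")
    def detects_manual_operation(self): return self.manual_check
    def simulate_manual_operation(self): self.calls.append("simulate")
    def enter_credentials(self, account, password): self.calls.append("credentials")
    def captcha_image(self): return self.captcha
    def enter_captcha(self, code): self.calls.append("captcha:" + code)

    def submit(self):
        self.calls.append("submit")
        return True


page = RecordingPage(manual_check=True, captcha=b"fake-image-bytes")
logged_in = log_in(page, "https://example.com/login", "demo_user", "demo_pass",
                   solve_captcha=lambda image: "AB12")
```

Passing the page and the verification-code solver in as parameters keeps the control flow independent of any particular browser driver or recognition service.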
[0072] In S2, crawling the website's data according to preset rules includes:
[0073] Determine whether the website exposes a data interface;
[0074] If so, obtain the interface data from the data interface and store the interface data in the database;
[0075] If not, crawl the website's web pages, parse them to obtain the web page data, and store the web page data in the database.
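The branch between interface data and page parsing can be sketched as below. Several things here are assumptions: the patent does not say how the presence of a data interface is determined, so this sketch simply treats a configured `api_url` as the signal; the keys `api_url` and `page_url`, the JSON response format, and the table-based page layout are hypothetical; and `fetch_text` stands in for a request made through the logged-in session. Storing the result in the database is left to the caller.

```python
import json
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def crawl_site(site, fetch_text):
    """Use the data interface when one is configured, else parse the page.

    site: dict with 'page_url' and an optional 'api_url' (hypothetical keys).
    fetch_text: callable taking a URL and returning the response body as str.
    """
    api_url = site.get("api_url")
    if api_url:
        return json.loads(fetch_text(api_url))
    parser = TableCellParser()
    parser.feed(fetch_text(site["page_url"]))
    parser.close()
    return parser.rows


# Usage with canned responses in place of real network requests.
responses = {
    "https://example.com/api/stats": '[{"day": "2021-01-01", "installs": 120}]',
    "https://example.org/report":
        "<table><tr><td>2021-01-01</td><td>120</td></tr></table>",
}
from_api = crawl_site(
    {"page_url": "https://example.com/", "api_url": "https://example.com/api/stats"},
    responses.__getitem__,
)
from_html = crawl_site({"page_url": "https://example.org/report"},
                       responses.__getitem__)
```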
[0076] S3 includes:
[0077] Compare the data with historical data and determine whether the difference in data volume and the difference in data values are within a preset range.
[0078] If so, the data is judged complete and is stored in the database;
[0079] If not, the data is judged incomplete: crawl the website's data again and verify its integrity again. If the result is still incomplete, send a notification email to the developer's account.
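One way to read the comparison in [0077] is as two relative-difference checks against the historical data, sketched below. The patent does not define how the differences are measured or what the preset range is, so the use of record counts, mean values, relative thresholds, and the default values `max_count_diff=0.2` and `max_value_diff=0.5` are all assumptions of this sketch.

```python
def is_complete(current, history, max_count_diff=0.2, max_value_diff=0.5):
    """Return True if `current` is plausibly complete compared to `history`.

    current, history: lists of numeric values from the new and previous crawl.
    max_count_diff: allowed relative difference in the number of records.
    max_value_diff: allowed relative difference between the mean values.
    """
    if not history:
        # Assumption: with nothing to compare against, accept any non-empty result.
        return bool(current)

    # Check 1: difference in data volume.
    count_diff = abs(len(current) - len(history)) / len(history)
    if count_diff > max_count_diff:
        return False

    # Check 2: difference in data values, using the mean as a simple summary.
    hist_mean = sum(history) / len(history)
    cur_mean = sum(current) / len(current) if current else 0.0
    if hist_mean == 0:
        return cur_mean == 0
    return abs(cur_mean - hist_mean) / abs(hist_mean) <= max_value_diff


history = [100, 105, 98]
similar = is_complete([100, 110, 95], history)   # same volume, similar values
truncated = is_complete([100], history)          # far fewer records
shifted = is_complete([10, 10, 10], history)     # same volume, very different values
```

In this example the first crawl passes both checks, the second fails the volume check, and the third fails the value check.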
[0080] Judging the data incomplete further includes:
[0081] Determine whether the login information has expired. If it has, log in to the website again.
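The retry path described in [0079] to [0081] can be sketched as a single function. All five callables are hypothetical hooks introduced for illustration: `crawl` and `verify` correspond to S2 and S3, `session_expired` and `relogin` to the login check, and `notify` to the email to the developer (which could, for example, be backed by Python's `smtplib`). The patent describes a single retry, and this sketch follows that.

```python
def crawl_with_verification(crawl, verify, session_expired, relogin, notify):
    """Crawl once, retry once if the data is incomplete, then notify.

    Returns the verified data, or None if both attempts were incomplete.
    """
    data = crawl()
    if verify(data):
        return data

    # Incomplete data may be caused by an expired login, so check that first.
    if session_expired():
        relogin()

    data = crawl()
    if verify(data):
        return data

    notify("Crawled data is still incomplete after one retry")
    return None


# Usage: the first attempt returns nothing, the retry succeeds after re-login.
events = []
attempts = iter([[], [1, 2, 3]])
result = crawl_with_verification(
    crawl=lambda: next(attempts),
    verify=bool,                      # stand-in for a real integrity check
    session_expired=lambda: True,
    relogin=lambda: events.append("relogin"),
    notify=events.append,
)
```

Returning `None` on failure lets the caller skip the database write for that website while continuing with the others.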

Example Embodiment

[0082] Example 2
[0083] Referring to Figure 2, a terminal for crawling third-party website data includes a memory 2, a processor 3, and a computer program stored in the memory 2 and executable on the processor 3. When the processor 3 executes the computer program, it implements the steps of Example 1.
[0084] In summary, with the method and terminal for crawling third-party website data provided by the present invention, the information on the websites to be crawled is obtained from the database, the data of those websites is crawled according to preset rules, and the data is checked for integrity. This ensures the integrity of the crawled data, avoids manually logging in to multiple websites to crawl data, and saves labor and time.