A universal anti-anti-crawling system and method for Internet data acquisition

A data acquisition and Internet technology, applied in transmission systems, electrical components, etc. It addresses the problems that the universality and flexibility of anti-anti-crawling methods cannot be improved, that the party collecting web page data cannot cope with diverse anti-crawling verification methods, and that the Internet information acquisition rate is low. It achieves the effects of making identification more difficult, improving rationality, and obtaining information effectively.

Active Publication Date: 2019-05-07
北京宸瑞科技股份有限公司

AI Technical Summary

Problems solved by technology

This is mainly because the diversity of anti-crawling verification methods, and the ways in which they are combined, make interception diverse and complicated, while the universality and flexibility of anti-anti-crawling methods cannot be improved. As a result, the data collector is unable to cope with diversified anti-crawling verification methods, and the Internet information acquisition rate is low.



Examples


Embodiment

[0173] The anti-anti-crawling method provided by the present invention is used to obtain web page information from Facebook and Sina Weibo:

[0174] 1. The UA header sending module 011 receives the UA verification request from the server, randomly extracts a UA header from the UA header list 012, and provides this random UA header to the server. The UA header management module 013 divides the UA headers in the UA header list 012 into different UA header sublists according to the version information of the browser they carry, and the probability that a UA header is selected from each sublist is consistent with the market share of the corresponding browser. The UA header list 012 is shown in Table 1.
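
The weighted selection described in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `UA_SUBLISTS`, `MARKET_SHARE` and `pick_user_agent`, the UA strings, and the share values are all assumptions, since Table 1 is not reproduced here.

```python
import random

# Hypothetical UA header sublists, one per browser (placeholder strings, not Table 1).
UA_SUBLISTS = {
    "chrome": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
    ],
    "firefox": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) "
        "Gecko/20100101 Firefox/63.0",
    ],
    "safari": [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/12.0 Safari/605.1.15",
    ],
}

# Assumed market shares used as selection weights (illustrative values only).
MARKET_SHARE = {"chrome": 0.60, "firefox": 0.25, "safari": 0.15}


def pick_user_agent(rng=random):
    """Pick a sublist with probability proportional to its browser's
    market share, then pick one UA header uniformly within that sublist."""
    browsers = list(UA_SUBLISTS)
    weights = [MARKET_SHARE[b] for b in browsers]
    browser = rng.choices(browsers, weights=weights, k=1)[0]
    return rng.choice(UA_SUBLISTS[browser])
```

Calling `pick_user_agent()` once per request returns a string that could be sent as the `User-Agent` header. Passing a seeded `random.Random` instance as `rng` makes the selection reproducible for testing.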

[0175] 2. The proxy IP sending module 021 receives the IP verification request from the server and sends a request for a proxy IP to the proxy IP management module 022, and the proxy IP management module 022 obtains a random proxy IP from ...



Abstract

The invention discloses a universal anti-anti-crawling method and system for Internet data acquisition. The method comprises the following steps: a random UA header is provided to the server through a UA verification unit (01); a random proxy IP is provided to the server through an IP verification unit (02); requests are issued at random intervals that conform to the specification through an interval verification unit (03); login is simulated through an authorization state verification unit (04); and the verification code is recognized through a verification code identification unit (05). These units, individually or in combination, respond respectively to request UA verification, request IP verification, request interval verification, authorization status verification, manual operation verification, or a combination thereof in Internet anti-crawling verification. The method can bypass interception by various combinations of anti-crawling verification means and achieve effective acquisition of website information.
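
The random request interval attributed to the interval verification unit (03) might look like the sketch below. The abstract does not say how the interval is drawn, so the uniform distribution, the bounds `min_s` and `max_s`, and the names `random_interval` and `paced` are assumptions made for illustration.

```python
import random
import time


def random_interval(min_s=1.0, max_s=5.0, rng=random):
    """Return a delay in seconds drawn uniformly from [min_s, max_s].

    The bounds are assumed values, not taken from the patent.
    """
    if not 0 <= min_s <= max_s:
        raise ValueError("require 0 <= min_s <= max_s")
    return rng.uniform(min_s, max_s)


def paced(items, min_s=1.0, max_s=5.0, sleep=time.sleep, rng=random):
    """Yield items one by one, sleeping a random interval between
    consecutive items so that requests are not evenly spaced."""
    for index, item in enumerate(items):
        if index > 0:  # no wait before the first item
            sleep(random_interval(min_s, max_s, rng))
        yield item
```

Iterating over `paced(urls)` and issuing one request per yielded item spaces the requests by a fresh random delay each time. The `sleep` parameter is injectable so the pacing can be tested without actually waiting.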

Description

Technical field

[0001] The invention mainly relates to Internet data collection technology, and in particular to common Internet anti-crawling verification means and to a universal anti-anti-crawling system and method for Internet data collection.

Background technique

[0002] The astonishing growth of the network has made the World Wide Web a treasure trove of information resources, and search engines built on those resources have made it possible to extract and use information effectively. The arrival of the big data era, however, has created new demands for Internet information, so Internet data collection that performs automatic batch collection through programming, that is, crawlers, came into being. Large numbers of crawlers greatly increase the load on web data servers. Out of consideration for server load or the nature of the data, web page data owners use anti-cr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/06, H04L29/08
Inventor: 白晓哲, 尚林林
Owner: 北京宸瑞科技股份有限公司