
Crawler recognition and processing method and related device

A crawler identification and processing method in the field of computer networks. It addresses high development and maintenance costs, the tight coupling between identification and processing, and the impact of real-time identification on overall throughput. Its stated effects are improved scalability, fewer false positives, and a reduced adverse effect on system throughput.

Active Publication Date: 2017-11-24
BEIJING XIAODU INFORMATION TECH CO LTD
Cites: 5; Cited by: 15

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present invention provide a crawler identification and processing method and related devices. They address two problems in the prior art. First, crawler identification and processing are performed largely in real time, which hurts overall throughput. Second, identification is tightly coupled with processing, which raises development and maintenance costs.
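One way to read the decoupling goal in [0004] is that identification runs offline and publishes its verdicts, while the request path only performs a lookup and applies a processing policy. The sketch below illustrates that split under stated assumptions: `ScoreTable`, `Policy`, `handle_request`, and the 0.5/0.9 thresholds are hypothetical names and values, not structures described in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScoreTable:
    """Hypothetical snapshot of per-user crawler probabilities from an offline job."""
    scores: Dict[str, float] = field(default_factory=dict)

    def publish(self, new_scores: Dict[str, float]) -> None:
        # Replace the whole snapshot; the online path never recomputes scores.
        self.scores = dict(new_scores)

    def probability(self, user_id: str) -> float:
        # Users absent from the snapshot default to 0.0 (treated as human).
        return self.scores.get(user_id, 0.0)


@dataclass
class Policy:
    """Hypothetical processing rules, tunable without retraining the identifier."""
    challenge_at: float = 0.5
    block_at: float = 0.9

    def decide(self, p: float) -> str:
        if p >= self.block_at:
            return "block"
        if p >= self.challenge_at:
            return "challenge"
        return "allow"


def handle_request(user_id: str, table: ScoreTable, policy: Policy) -> str:
    # Hot path: one dictionary lookup plus a comparison; no counting or classification.
    return policy.decide(table.probability(user_id))
```

With `table.publish({"u1": 0.95, "u2": 0.6})`, user `u1` is blocked, `u2` is challenged, and an unseen user is allowed. Because the policy is a separate object, its thresholds can change without touching the identification side, which is the kind of loose coupling [0004] aims for.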



Detailed Description of the Embodiments

[0021] In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention.

[0022] Some of the processes described in the specification, the claims, and the drawings contain multiple operations that appear in a specific order. It should be clearly understood, however, that these operations need not be executed in the order in which they appear herein and may be executed in parallel. Operation numbers such as S101 and S102 serve only to distinguish different operations; the numbers themselves do not imply any execution order. In addition, these processes may include more or fewer operations, which may be performed sequentially or in parallel. It should be n...


Abstract

The embodiments of the invention provide a crawler identification and processing method and a related device. The method reads web page access record data and counts the feature fields in the access records per user identifier to generate a detection dataset. The detection dataset and a pre-labeled training dataset are input into multiple classifiers for classification learning, which yields, for first detection data in the detection dataset, one classification result from each classifier. The classification probability that the first user identifier corresponding to the first detection data is a crawler is then determined from the number of those results that classify it as a crawler relative to the total number of results. Because crawlers are identified by learning from large volumes of statistical data, the adverse effect of real-time identification on system throughput is avoided. In addition, when crawlers change their behavior, they can still be identified effectively by adjusting the feature fields, which improves the scalability of crawler identification.
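The pipeline in the abstract (aggregate per-user features, run several trained classifiers, take the fraction of "crawler" votes as the probability) can be sketched as follows. This is a minimal illustration under assumptions: the record layout, the three feature fields (request count, distinct paths, error rate), and the two classifier types (single-feature threshold stumps and a nearest-centroid model) are hypothetical stand-ins, since the abstract does not name the features or classifiers it uses.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Classifier = Callable[[Vector], bool]  # True means "crawler"
# Hypothetical feature fields counted per user identifier.
FEATURES = ("requests", "distinct_paths", "error_rate")


def build_detection_dataset(records: Sequence[dict]) -> Dict[str, Vector]:
    """Aggregate raw access records into one feature vector per user id."""
    requests: Dict[str, int] = defaultdict(int)
    paths: Dict[str, Set[str]] = defaultdict(set)
    errors: Dict[str, int] = defaultdict(int)
    for r in records:
        uid = r["user_id"]
        requests[uid] += 1
        paths[uid].add(r["path"])
        if r["status"] >= 400:
            errors[uid] += 1
    return {
        uid: (float(n), float(len(paths[uid])), errors[uid] / n)
        for uid, n in requests.items()
    }


def train_stump(train: List[Tuple[Vector, bool]], i: int) -> Classifier:
    """Threshold on feature i at the midpoint between the two class means."""
    bots = mean(x[i] for x, y in train if y)
    humans = mean(x[i] for x, y in train if not y)
    cut = (bots + humans) / 2
    if bots >= humans:
        return lambda x: x[i] > cut
    return lambda x: x[i] < cut


def train_centroid(train: List[Tuple[Vector, bool]]) -> Classifier:
    """Nearest-centroid classifier over all (unscaled) features."""
    def centroid(label: bool) -> Vector:
        rows = [x for x, y in train if y == label]
        return tuple(mean(col) for col in zip(*rows))

    bot_c, human_c = centroid(True), centroid(False)

    def dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return lambda x: dist(x, bot_c) < dist(x, human_c)


def crawler_probability(detection: Dict[str, Vector],
                        train: List[Tuple[Vector, bool]]) -> Dict[str, float]:
    """Fraction of classifiers that label each user id as a crawler."""
    classifiers = [train_stump(train, i) for i in range(len(FEATURES))]
    classifiers.append(train_centroid(train))
    return {
        uid: sum(clf(x) for clf in classifiers) / len(classifiers)
        for uid, x in detection.items()
    }
```

The training data must contain both classes, or `statistics.mean` raises on an empty sequence. With four classifiers, a user that only one of them flags receives a probability of 0.25. A real deployment would normalize features before the centroid step and would likely use stronger learners; the voting rule is the part the abstract actually specifies.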

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computer networks, and in particular to a crawler identification and processing method and related devices.

Background

[0002] In the related art, crawlers are usually identified, and real-time access by users and crawlers is defended against, by means of whitelists and blacklists. For example, the access requests of every IP address are stored in real time in local memory or in a store such as Redis or Memcached, and the counter for an IP is incremented by 1 on each access. When the count exceeds a predetermined threshold, the IP is restricted, for example by returning a verification code. If verification fails repeatedly, the IP is identified as a crawler and added to the blacklist, which prohibits further access.

[0003] In the related art, crawler identification is geared toward real-time processing, and the identification and processing of crawlers...
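The counter-based prior-art scheme in [0002] can be sketched in memory as below. The threshold, the number of allowed verification failures, and the reset-on-success behavior are hypothetical parameters chosen for illustration; the background text does not specify them, nor does it mention a time window, which a production version would typically add (for example by keeping counters in Redis with `INCR` and an expiry).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CounterGuard:
    """In-memory sketch of the per-IP counter, whitelist, and blacklist scheme."""
    threshold: int = 100   # hypothetical request limit per IP
    max_failures: int = 3  # hypothetical failed verifications before blacklisting
    counts: Dict[str, int] = field(default_factory=dict)
    failures: Dict[str, int] = field(default_factory=dict)
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)

    def on_request(self, ip: str) -> str:
        if ip in self.whitelist:
            return "allow"  # whitelisted IPs are never counted or restricted
        if ip in self.blacklist:
            return "deny"
        self.counts[ip] = self.counts.get(ip, 0) + 1  # one increment per access
        if self.counts[ip] > self.threshold:
            return "verify"  # e.g. return a verification code
        return "allow"

    def on_verification(self, ip: str, passed: bool) -> None:
        if passed:
            # Hypothetical: a successful check resets both counters.
            self.counts[ip] = 0
            self.failures[ip] = 0
            return
        self.failures[ip] = self.failures.get(ip, 0) + 1
        if self.failures[ip] >= self.max_failures:
            self.blacklist.add(ip)  # repeated failures: treat the IP as a crawler
```

Every request passes through the counter on the hot path, and the decision logic is embedded in the same object as the counting. That combination is the real-time, tightly coupled design the background criticizes.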


Application Information

IPC (8): G06F21/56; G06F17/30; H04L29/06
CPC: G06F21/566; G06F16/285; H04L63/0236; H04L63/101; H04L63/1441; H04L63/205
Inventor: 李晨曦 (Li Chenxi)
Owner: BEIJING XIAODU INFORMATION TECH CO LTD