A web crawler system and implementation method supporting artificial session grafting

In the field of web crawlers, artificial session grafting is used to enhance the crawling ability of a web crawler. It addresses problems such as verification codes that crawlers find difficult to recognize.

Active Publication Date: 2018-01-19
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

The starting point is to hand the problems that block web robots over to humans. For example, verification codes are difficult for crawlers to recognize but very easy for humans.



Embodiment Construction

[0028] The present invention will be further described below through specific embodiments and accompanying drawings.

[0029] Figure 1 is a schematic diagram of the four parts of the present invention cooperating to complete session grafting, Figure 2 is a schematic diagram of the specific module composition, and Figure 3 is a flow chart of the manual session grafting operation. Within the web crawler program, the invention separates out the network access module and gives it an HTTP access interface. It also adds a user simulation module that shares the same network access module with the web crawler module. The user simulation module further provides an interface through which the human intelligence participation module supplies information that requires human judgment, including user names, passwords, verification codes, and answers to other Turing test questions.
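The module split described in [0029] can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `ask_human` callback (standing in for the human participation module), and the in-memory `fake_site` transport are all assumptions made so the example runs offline. The point it shows is that the user simulation module and the crawler module hold a reference to the same network access object, so session state created during a human-assisted login is visible to the crawler.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical transport signature: (url, cookies, form) -> response dict.
Transport = Callable[[str, Dict[str, str], Dict[str, str]], dict]


class NetworkAccessModule:
    """Shared HTTP access layer; it alone holds the session cookies."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self.cookies: Dict[str, str] = {}

    def request(self, url: str, form: Optional[Dict[str, str]] = None) -> str:
        response = self._transport(url, dict(self.cookies), form or {})
        # Keep whatever session state the site hands back.
        self.cookies.update(response["set_cookies"])
        return response["body"]


class UserSimulationModule:
    """Logs in with answers supplied by a human through `ask_human`."""

    FIELDS = ("username", "password", "captcha")

    def __init__(self, net: NetworkAccessModule,
                 ask_human: Callable[[str], str]) -> None:
        self._net = net
        self._ask_human = ask_human

    def login(self, login_url: str) -> str:
        form = {field: self._ask_human(field) for field in self.FIELDS}
        return self._net.request(login_url, form)


class CrawlerModule:
    """Fetches pages through the same network access module."""

    def __init__(self, net: NetworkAccessModule) -> None:
        self._net = net

    def crawl(self, urls: List[str]) -> List[str]:
        return [self._net.request(url) for url in urls]


def fake_site(url: str, cookies: Dict[str, str], form: Dict[str, str]) -> dict:
    """In-memory stand-in for the target website (assumption, offline demo)."""
    if (url == "/login" and form.get("password") == "secret"
            and form.get("captcha") == "7G4K"):
        return {"body": "welcome", "set_cookies": {"sid": "abc123"}}
    if url.startswith("/data") and cookies.get("sid") == "abc123":
        return {"body": f"content of {url}", "set_cookies": {}}
    return {"body": "login required", "set_cookies": {}}


net = NetworkAccessModule(fake_site)
answers = {"username": "alice", "password": "secret", "captcha": "7G4K"}
user = UserSimulationModule(net, ask_human=answers.__getitem__)
crawler = CrawlerModule(net)

before = crawler.crawl(["/data/1"])  # ["login required"]: no session yet
user.login("/login")                 # human-assisted login creates the session
after = crawler.crawl(["/data/1"])   # ["content of /data/1"]: session reused
```

In a real deployment the `ask_human` callback would block on an operator prompt and the transport would issue actual HTTP requests; both are replaced here by stubs so the session handoff can be seen in isolation.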

[0030] After locating the target website, the user simulation module initiates access to the targe...


Abstract

The invention relates to a web crawler system and implementation method that support artificial session grafting. First, the target website to be crawled is analyzed and its login page is set as the initial page. The user simulation module establishes a network connection with the target website and provides an information input interface for manual operation. The human intelligence participation module enters the required information and passes it to the user simulation module. The user simulation module locates the input boxes and the login button on the login page, fills in the information, and sends it to the target website through the network access module to log in. After login, the user simulation module opens the page the crawler needs to crawl and passes that page's response to the crawler module. Once it obtains execution permission, the crawler module continues the network session that was manually created in the user simulation module, accesses the target website, and collects website content. Because the crawler module uses a network session created with human participation, the web crawler gains the same network access capability as a real person browsing the internet.
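The step in which the login page's input boxes and login button are located can be sketched with Python's standard `html.parser`. The source does not say how the elements are found, so this is only one plausible approach: the `LoginFormLocator` class, the `locate_login_form` helper, and the sample `LOGIN_PAGE` markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LoginFormLocator(HTMLParser):
    """Collects the fillable inputs and the submit control of a login page."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None  # where the form is posted
        self.inputs: List[str] = []        # field names a human must fill in
        self.submit: Optional[str] = None  # identifier of the login button

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action")
        elif tag == "input":
            kind = (a.get("type") or "text").lower()
            if kind in ("submit", "image"):
                self.submit = a.get("name") or a.get("id") or a.get("value")
            elif kind != "hidden" and a.get("name"):
                # Hidden fields are skipped: no human input is needed for them.
                self.inputs.append(a["name"])
        elif tag == "button" and (a.get("type") or "submit").lower() == "submit":
            self.submit = a.get("name") or a.get("id")


def locate_login_form(html: str) -> Tuple[Optional[str], List[str], Optional[str]]:
    """Return (form action, visible input names, submit control identifier)."""
    locator = LoginFormLocator()
    locator.feed(html)
    return locator.action, locator.inputs, locator.submit


# Hypothetical login page used only for illustration.
LOGIN_PAGE = """
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="text" name="captcha">
  <input type="hidden" name="token" value="x1">
  <button type="submit" id="login-btn">Log in</button>
</form>
"""

action, fields, button = locate_login_form(LOGIN_PAGE)
```

The resulting list of field names is what would be shown to the human operator for input. Pages that build their login form with JavaScript would need a browser-based approach instead of static parsing.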

Description

Technical field

[0001] The invention belongs to the technical field of computer applications and relates to web crawler technology, in particular to a method that uses artificial session grafting to enhance the crawling ability of web crawlers.

Background technique

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information, and effectively extracting and using this information has become a huge challenge. Users search for information through search engines, but the returned results often contain information that users do not care about, so users must filter the many results again. Data now also takes increasingly varied forms, such as pictures, databases, audio, and video, and search engines are often powerless against such information-dense, structured data. Moreover, search engines usually search by keyword, and it is difficult to support queries b...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30; H04L29/06
CPC: G06F16/951
Inventor: 龚晓锐, 孙骁永, 朴爱花, 邹维
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI