Scrapy-based distributed dark-net resource mining system and method

A distributed dark web crawling technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as memory bottlenecks and memory exhaustion on a single machine, and achieves faster crawling and high flexibility.

Inactive Publication Date: 2018-11-16
成都康乔电子有限责任公司 +1

AI Technical Summary

Problems solved by technology

When Scrapy crawls on a single machine, memory becomes the bottleneck, and the machine is likely to run out of memory.



Examples


Embodiment Construction

[0030] Hidden service domain names are rarely published, so they are difficult to collect. The present invention runs a dark web crawler and a clear web crawler as independent, cooperating components. Broad and deep crawling of the clear web yields additional darknet domain names. Starting from the domain names that were supplied manually or found through clear web crawling, the dark web crawler then extracts more darknet domain names from dark web pages. When darknet domain names no longer need to be obtained through the clear web, the central control node module can send an HTTP request to the slave node crawling module to terminate clear web crawling without affecting dark web crawling. When the central control node finds that many dark web tasks are waiting to be crawled, it can send an HTTP request to a slave node to start a new dark web crawler and increase the dark web crawling speed.
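The control decisions in this paragraph can be illustrated with a minimal Python sketch. The source does not specify the HTTP interface of the slave nodes, so the `/start` and `/stop` paths, the `crawler` form field, the `SlaveNode` class, and the backlog threshold are all hypothetical. The sketch only builds the requests and does not send them.

```python
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import Request


@dataclass
class SlaveNode:
    """A slave crawling node reachable over HTTP (hypothetical model)."""
    base_url: str


def build_command(node, action, crawler):
    """Build, but do not send, a POST request for a slave node.

    The endpoint layout and the 'crawler' field are assumptions made
    for illustration; the patent text does not define them.
    """
    body = urlencode({"crawler": crawler}).encode("ascii")
    return Request(f"{node.base_url}/{action}", data=body, method="POST")


def plan_commands(node, darknet_backlog, need_clearnet, backlog_threshold=1000):
    """Decide which commands the central control node would issue.

    - Stop clear web crawling once it is no longer needed.
    - Start an extra dark web crawler when the backlog is large.
    The threshold value is an arbitrary placeholder.
    """
    commands = []
    if not need_clearnet:
        commands.append(build_command(node, "stop", "clearnet"))
    if darknet_backlog > backlog_threshold:
        commands.append(build_command(node, "start", "darknet"))
    return commands
```

A real controller would pass each returned `Request` to `urllib.request.urlopen` (or an equivalent HTTP client) and handle failures. Keeping the decision logic separate from the network call makes it easy to test.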

[0031] The present...



Abstract

The invention relates to the field of data mining and discloses a Scrapy-based distributed dark web resource mining system and method that improve the efficiency, scope, and flexibility of dark web resource mining. The system comprises a central node control module and a slave node crawling module. The central node control module includes a crawler seed task queue, a task preprocessing module, a dark web task queue, and a clear web task queue. The slave node crawling module includes a dark web crawling module, a clear web crawling module, and a crawler manager. Starting from darknet domain names that are supplied manually or crawled from the clear web, the dark web and clear web crawling modules extract further darknet domain names from dark web pages and clear web pages, so that a large number of darknet domain names are acquired and dark web pages are stored. The system and method are suitable for mining dark web resources.
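As a rough illustration of how seeds might flow through the queues named in the abstract, the following Python sketch extracts `.onion` domain names from page text and routes seed URLs to a dark web or clear web queue. The abstract does not describe the task preprocessing rules, so the routing criterion (hostname ends in `.onion`), the function names, and the use of in-memory `deque` objects are assumptions. A distributed deployment would presumably use a shared queue service instead.

```python
import re
from collections import deque
from urllib.parse import urlsplit

# Onion service names are base32 (a-z, 2-7): 16 characters for v2
# addresses and 56 characters for v3 addresses.
ONION_RE = re.compile(r"\b((?:[a-z2-7]{56}|[a-z2-7]{16})\.onion)\b")


def extract_onion_domains(page_text):
    """Return the set of .onion domain names found in a page's text."""
    return set(ONION_RE.findall(page_text.lower()))


def route_seed(url, darknet_queue, clearnet_queue):
    """Place a seed URL on the dark web or clear web task queue.

    Assumed rule: a hostname ending in '.onion' is a dark web task,
    anything else (including URLs without a parsable hostname) is
    treated as a clear web task.
    """
    host = urlsplit(url).hostname or ""
    if host.endswith(".onion"):
        darknet_queue.append(url)
    else:
        clearnet_queue.append(url)


# Usage: route two seeds, then turn newly discovered domains into tasks.
dark_tasks, clear_tasks = deque(), deque()
route_seed("http://abcdefghij234567.onion/index", dark_tasks, clear_tasks)
route_seed("https://example.com/links", dark_tasks, clear_tasks)
for domain in extract_onion_domains("mirror: abcdefghij234567.onion"):
    route_seed(f"http://{domain}/", dark_tasks, clear_tasks)
```

The addresses shown are synthetic placeholders. A production system would also deduplicate seeds before queuing them.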

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a Scrapy-based distributed dark web resource mining system and method.

Background

[0002] The dark web refers to networks that can only be accessed through special software or through non-standard communication protocols and ports. Tor is currently the most widely used anonymous communication system on the dark web. Because the dark web is completely anonymous, it has given rise to a large number of illegal transactions, so research on mining dark web resources is of great significance. Traditional search engines and crawler technology can crawl only a small part of the information available on the Internet, namely the clear web, and cannot mine dark web resources. Most existing research addresses the non-surface content on the Internet that standard search engines cannot index, that is, the deep web, no...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
Inventor: 刘丹, 杜凤媛, 王永松, 郑云彬
Owner: 成都康乔电子有限责任公司