Content aggregation method based on distributed web crawlers

This distributed web crawler technology, applied to content aggregation, addresses the inability of existing crawlers to aggregate and classify customized network information at large scale, and achieves faster computation and more accurate similarity comparison.

Inactive Publication Date: 2016-01-27
江苏未来网络集团有限公司
5 Cites, 43 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present invention implements a content aggregation method based on distributed web crawlers. It aims to solve the problem that prior-art web crawler technology cannot effectively aggregate and classify customized network information at large scale.



Embodiment Construction

[0029] The present invention provides a content aggregation method based on a distributed web crawler. To make the object, technical solution and effect of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific implementations described here only explain the present invention and do not limit it.

[0030] The structure of the content aggregation system provided by the present invention is shown in Figure 6. The system includes:

[0031] User interface: the user manages the system and schedules tasks through the graphical user interface. The scheduling service is handled by the crawler on each node and mainly provides task start, task stop and task status services; the graphical user interface is the visualization provided to the user by the content aggregation platform Operat...
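The task start, task stop and task status services described in [0031] can be illustrated with a minimal sketch. Everything named here (`NodeCrawlerService`, `TaskStatus`, the in-memory task table) is a hypothetical illustration and does not come from the patent text, which does not specify an interface.

```python
from enum import Enum


class TaskStatus(Enum):
    UNKNOWN = "unknown"
    RUNNING = "running"
    STOPPED = "stopped"


class NodeCrawlerService:
    """Hypothetical per-node service exposing task start, stop and status."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._tasks = {}  # task_id -> TaskStatus (in-memory, assumption)

    def start_task(self, task_id):
        # Starting an already running task simply leaves it running.
        self._tasks[task_id] = TaskStatus.RUNNING
        return self._tasks[task_id]

    def stop_task(self, task_id):
        # Only tasks this node knows about can be stopped.
        if task_id not in self._tasks:
            raise KeyError(f"unknown task: {task_id}")
        self._tasks[task_id] = TaskStatus.STOPPED
        return self._tasks[task_id]

    def task_status(self, task_id):
        # Tasks never seen on this node are reported as UNKNOWN.
        return self._tasks.get(task_id, TaskStatus.UNKNOWN)


if __name__ == "__main__":
    node = NodeCrawlerService("node-1")
    node.start_task("news-crawl")
    print(node.task_status("news-crawl"))
    node.stop_task("news-crawl")
    print(node.task_status("news-crawl"))
```

In a real deployment the graphical user interface would presumably call such a service remotely on each node; the transport is left out here because the source does not describe it.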


Abstract

The invention provides a content aggregation method based on distributed web crawlers. First, different crawler platforms are deployed on different devices and send requests to the network information sources to be crawled; each crawler platform formulates crawling rules according to the target information required by a user and crawls the information in which the target user is interested. The crawled network information is then processed: similarity detection is carried out using a data transmission and conversion method in a real-time database combined with locality sensitive hashing (LSH), so as to reduce the redundancy of the information. Finally, the system classifies and sorts the information by category, heat and keywords and displays it on user equipment. The method applies LSH and similarity comparison to data acquired from an actual network to obtain the comparison result. Compared with the traditional prior-art approach of checking the whole data set for duplicates, the method computes faster and compares similarity more accurately.
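The redundancy-reduction step in the abstract can be sketched as follows. The abstract names LSH but does not say which hash family or parameters are used, so this sketch assumes MinHash signatures over word shingles with banding; the constants `NUM_HASHES` and `BANDS`, the 0.8 threshold, and all function names are illustrative assumptions, not the patented method.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32              # signature length (assumed value)
BANDS = 8                    # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS   # rows per band


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    # Seeded, deterministic 64-bit hash (Python's built-in hash() is salted).
    digest = hashlib.md5(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text):
    """MinHash signature: the minimum hash per seed over all shingles."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(docs, threshold=0.8):
    """Keep the first document of each near-duplicate group.

    Only documents sharing at least one LSH band bucket are compared,
    which avoids comparing every document against every other one.
    """
    buckets = defaultdict(list)  # (band index, band values) -> doc ids
    signatures = {}
    kept = []
    for doc_id, text in docs.items():
        sig = minhash_signature(text)
        keys = [(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)]
        candidates = set()
        for key in keys:
            candidates.update(buckets[key])
        if any(estimated_similarity(sig, signatures[c]) >= threshold
               for c in candidates):
            continue  # near-duplicate of a document already kept
        signatures[doc_id] = sig
        kept.append(doc_id)
        for key in keys:
            buckets[key].append(doc_id)
    return kept
```

The banding step is what gives the speed-up over whole-data duplicate checking: a new item is compared only against items that collide with it in at least one band, rather than against everything crawled so far.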

Description

Technical field

[0001] The present invention relates to the technical field related to web crawlers, in particular to a content aggregation method based on distributed web crawlers.

Background technique

[0002] With the continuous development of the Internet, the era of big data is coming, and the value of massive data will be more reflected. Due to the increasing amount of Internet information such as massive streaming video resources and rich web content, it is difficult for specific users to accurately and effectively obtain the network data they need through handheld devices in a limited fragmented time period. However, most of the existing content aggregation technologies use simulations based on the upper-level architecture to prove the superiority of their content aggregation systems, and lack the implementation and application of specific information corresponding to the real network environment and target user groups.

[0003] The filter conditions selected by the...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9566, G06F16/951
Inventor: 黄韬, 魏亮, 魏静波, 邓晓涛, 周洪利
Owner: 江苏未来网络集团有限公司