Distributed real-time news information acquisition system

A distributed information-collection technology, applied in the field of information collection, that addresses problems such as the bottleneck of the URL distribution module and achieves good scalability, low cost, and easy deployment.

Inactive Publication Date: 2011-05-25
SICHUAN UNIV
Cites: 3; Cited by: 41

AI Technical Summary

Problems solved by technology

The disadvantage is that if the central host fails, the entire system will stop working, and th



Example Embodiment

[0031] The present invention will be further described below in conjunction with the drawings:

[0032] In the present invention, each independent collection node is responsible for collecting news pages. All collection nodes communicate with the central node over TCP/IP, and each sub-collection node forwards the pages it collects to the central node. The central node is responsible for storing all downloaded news pages in the database.
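The forwarding path in [0032] can be illustrated with a minimal sketch. The patent text only states that pages travel over TCP/IP and are stored by the central node; the length-prefixed JSON framing, the SQLite table, and all function names below (`send_page`, `recv_page`, `open_store`, `store_page`) are assumptions made for illustration, not the patented implementation.

```python
import json
import sqlite3
import struct


def send_page(sock, url, html):
    """Sub-collection node side: forward one collected page to the central node."""
    payload = json.dumps({"url": url, "html": html}).encode("utf-8")
    # Assumed framing: 4-byte big-endian length prefix, then the JSON payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket or raise if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_page(sock):
    """Central node side: receive one framed page and return (url, html)."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    message = json.loads(_recv_exact(sock, length).decode("utf-8"))
    return message["url"], message["html"]


def open_store(path=":memory:"):
    """Open the page database (SQLite stands in for the database server)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)")
    return db


def store_page(db, url, html):
    """Central node side: persist one downloaded news page."""
    db.execute("INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html))
    db.commit()
```

In this sketch a sub-collection node would call `send_page` on a connected TCP socket, while the central node loops over `recv_page` and passes each result to `store_page`. The length prefix is one common way to delimit messages on a byte stream; other framings would serve equally well.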

[0033] The Web information collection system can be divided into seven parts: URL processor, protocol processor, duplicate content detector, URL extractor, meta information obtainer, semantic information parser, and database. These parts work together to obtain information from the Web.
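How the parts listed in [0033] might cooperate can be sketched as a single collection loop. This is a simplified illustration under several assumptions: the `fetch` callable stands in for the protocol processor, a SHA-256 digest of the page body stands in for the duplicate content detector, a plain dictionary stands in for the database, and the semantic information parser is omitted. The names `collect` and `_LinkAndMetaParser` are hypothetical.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkAndMetaParser(HTMLParser):
    """URL extractor and meta information obtainer in one parsing pass."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]


def collect(seed_urls, fetch, max_pages=100):
    """Crawl from seed_urls; fetch(url) returns the page text or None on failure."""
    queue = deque(seed_urls)       # URL processor: a simple FIFO queue
    seen_urls = set(seed_urls)
    seen_digests = set()           # duplicate content detector state
    database = {}                  # stand-in for the database
    while queue and len(database) < max_pages:
        url = queue.popleft()
        html = fetch(url)          # protocol processor
        if html is None:
            continue
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue               # same content already stored under another URL
        seen_digests.add(digest)
        parser = _LinkAndMetaParser()
        parser.feed(html)
        database[url] = {"meta": parser.meta, "html": html}
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links against the page URL
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return database
```

Passing `fetch` as a parameter keeps the loop independent of any particular network library. An exact-hash duplicate check only catches byte-identical pages; a production detector for news would likely need near-duplicate matching.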

[0034] 1. URL processor: This component mainly sorts the URLs to be collected and allocates URLs to the protocol processor according to a certain strategy. Depending on the scale of the collection system, the URL can be multiple collection queues or a URL ...
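One possible reading of the URL processor in [0034] is sketched below. The source does not say which allocation strategy is used, so the strategy here is an assumption: each URL is assigned to one of several collection queues by hashing its host name, which keeps all URLs of a site in the same queue. The class name `UrlProcessor` and its methods are hypothetical.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


class UrlProcessor:
    """Holds several collection queues and allocates URLs to them by host hash."""

    def __init__(self, num_queues):
        if num_queues < 1:
            raise ValueError("num_queues must be at least 1")
        self.queues = [deque() for _ in range(num_queues)]
        self._seen = set()

    def _queue_index(self, url):
        # Assumed strategy: hash the lowercased host so one site maps to one queue.
        host = urlsplit(url).netloc.lower()
        digest = hashlib.sha256(host.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.queues)

    def add(self, url):
        """Queue a URL once; return False if it was already queued before."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self.queues[self._queue_index(url)].append(url)
        return True

    def next_url(self, queue_id):
        """Hand the next URL of one queue to a protocol processor, or None if empty."""
        queue = self.queues[queue_id]
        return queue.popleft() if queue else None
```

Each protocol processor would poll its own `queue_id`. Hashing by host is only one option; round-robin or priority ordering by expected update frequency would fit the same interface.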


Abstract

The invention discloses a distributed real-time news information acquisition system. The architecture of the system comprises a central server, a plurality of sub-acquisition nodes, and a database server. Each sub-acquisition node is responsible for acquiring news pages, communicating with the central server through TCP/IP (Transmission Control Protocol/Internet Protocol), and transmitting the acquired news pages to the central server. The central server stores all downloaded news pages in the database server. The system overcomes the defects of the prior art and is mainly intended for efficient and stable acquisition of news data, which is characterized by large volume, fast updates, and high repetition. Furthermore, the cost is low and deployment is easy.

Description

Technical field

[0001] The invention relates to the technical field of information collection, in particular to a distributed system capable of real-time discovery and collection of news information on the Internet.

Background technique

[0002] In the field of information collection, the design of the collector is often studied from two aspects: one is the system architecture and topology of the information collector; the other is the way the collector downloads network resources and the task allocation strategy. At present, the system architecture of information collectors is mainly divided into two types, centralized and distributed, but there are not many dedicated studies on the architecture of the acquisition system. Centralized collectors are mainly used in small systems such as intelligent agents, which do not require high performance. Distributed collectors, however, are most widely used in large search engines and have high performance requirements. The main purpose ...


Application Information

IPC(8): G06F17/30
Inventors: 章毅, 彭德中, 张蕾, 吕建成, 张海仙, 徐小伟
Owner: SICHUAN UNIV