MapReduce-based FTP distributed collection method

The collection method uses distributed technology and applies to multiprogramming arrangements, digital transmission systems, electrical components, and similar fields. It addresses slow single-threaded collection and troublesome maintenance, improving collection speed and simplifying maintenance work.

Pending Publication Date: 2017-05-31
上海轻维软件有限公司

AI Technical Summary

Problems solved by technology

Traditional single-threaded collection is slow, while deploying multiple applications that each collect with multiple threads makes maintenance troublesome.

Method used

Server information and log file paths for multiple FTP servers are stored in HDFS as the MapReduce input. MapReduce distributes the records across the cluster nodes, and each node connects to its FTP server and writes the log files into HDFS through an IO stream.

Detailed Description of the Embodiments

[0019] The present invention is further described below with reference to the accompanying drawings and embodiments.

[0020] Figure 1 is a flowchart of the MapReduce-based FTP distributed collection of the present invention.

[0021] Referring to Figure 1, the MapReduce-based FTP distributed collection method provided by the present invention comprises the following steps:

[0022] S1) Pre-configure the server information and log file paths for a plurality of FTP servers, and store this configuration information in Hadoop's HDFS as the data input of MapReduce;

[0023] S2) Set the MapReduce input directory and the number of Reduce tasks;

[0024] S3) Use MapReduce to distribute different log records to different HDFS cluster nodes for processing;

[0025] S4) After each HDFS cluster node reads the FTP server information, it uses the account and password to connect to the FTP server, expands the pre-configured log file path, and writes the files into HDFS through an IO stream, so as to real...
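
Steps S1 to S4 can be illustrated with a small Python sketch. It is not the patented implementation, which runs as a Hadoop MapReduce job; it only mimics the data flow under several assumptions. The record format (host,port,user,password,log_dir), the names FtpSource, parse_record, partition, assign and collect, and the open_sink callback that stands in for an HDFS output stream are all hypothetical. The CRC32-based partition function is a stand-in for whatever partitioner the job actually uses.

```python
import ftplib
import zlib
from collections import defaultdict
from typing import BinaryIO, Callable, NamedTuple


class FtpSource(NamedTuple):
    """One pre-configured FTP server and log directory (S1)."""
    host: str
    port: int
    user: str
    password: str
    log_dir: str


def parse_record(line: str) -> FtpSource:
    # Assumed record format: host,port,user,password,log_dir
    # (fields must not contain commas in this simplified format).
    host, port, user, password, log_dir = line.strip().split(",")
    return FtpSource(host, int(port), user, password, log_dir)


def partition(key: str, num_reduce_tasks: int) -> int:
    # Deterministic hash partitioning: the same key always maps to the
    # same Reduce task index in the range [0, num_reduce_tasks).
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def assign(lines, num_reduce_tasks: int) -> dict:
    # S2/S3: group configuration records by the Reduce task that will
    # collect them; blank lines are skipped.
    tasks = defaultdict(list)
    for line in lines:
        if line.strip():
            source = parse_record(line)
            tasks[partition(source.host, num_reduce_tasks)].append(source)
    return dict(tasks)


def collect(source: FtpSource, open_sink: Callable[[str], BinaryIO]) -> list:
    # S4: log in, list the configured directory, and stream each entry
    # into the sink. open_sink(name) is a hypothetical factory that
    # returns a writable binary stream (an HDFS stream in the patent).
    # Note: nlst() may also return subdirectories; this sketch assumes
    # the directory contains only files.
    copied = []
    with ftplib.FTP() as ftp:
        ftp.connect(source.host, source.port)
        ftp.login(source.user, source.password)
        ftp.cwd(source.log_dir)
        for name in ftp.nlst():
            with open_sink(name) as out:
                ftp.retrbinary("RETR " + name, out.write)
            copied.append(name)
    return copied
```

Calling assign(lines, 3) on the configuration lines returns a mapping from Reduce task index to the servers that task would collect. Each task would then call collect once per server, for example with open_sink=lambda name: open(name, "wb") when writing to local disk instead of HDFS.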

Abstract

The invention discloses a MapReduce-based FTP distributed collection method comprising the following steps: (S1) pre-configuring the server information and log file paths of multiple FTP servers and storing this configuration information in Hadoop's HDFS as the data input of MapReduce; (S2) setting the MapReduce input directory and the number of Reduce tasks; (S3) using MapReduce to distribute different log records to different HDFS cluster nodes for processing; and (S4) having each HDFS cluster node read the FTP server information, connect to the FTP server with the account and password, expand the pre-configured log file path, and write the files into HDFS through an IO stream, so that multiple HDFS cluster nodes collect the log information of multiple FTP servers simultaneously. The method improves collection speed and simplifies maintenance work.

Description

Technical field

[0001] The invention relates to a remote data collection method, in particular to a MapReduce-based FTP distributed collection method.

Background art

[0002] At present, the commonly used ways to download data from a remote server over FTP are:

[0003] 1) Single-threaded: using Apache FTP to download data from remote servers;

[0004] 2) Multi-threaded: using Apache FTP with multiple threads and multiple FTP clients to download data from remote servers;

[0005] 3) Multiple deployed services: using Apache FTP, each service starting multiple threads and downloading data from remote servers with multiple FTP clients.

[0006] The main disadvantages of the prior art are as follows:

[0007] 1) When Apache FTP downloads data from a remote server with a single thread, the collection speed is clearly insufficient, because neither the bandwidth nor the IO rate can be used to anything near its full capacity. [000...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/24, H04L29/06, H04L29/08, G06F9/46
CPC: H04L41/069, H04L63/0428, H04L63/083, H04L67/06, H04L67/1097, H04L67/30, G06F9/466
Inventor: 程永新, 谢涛, 廖德辉
Owner: 上海轻维软件有限公司