Hadoop-based data processing method and system

A Hadoop-based data processing method and system, applied in the field of data processing, that addresses the problems of intermediate data occupying a large amount of disk space and of insufficient flexibility, with the effects of reducing network bandwidth, saving CPU time, and reducing disk space usage.

Inactive Publication Date: 2014-05-21
BEIJING IZP NETWORK TECH CO LTD
Cites: 4, Cited by: 11

AI Technical Summary

Problems solved by technology

This method has three disadvantages. First, it requires manual effort to collect requirements. Second, the generated intermediate data occupies a large amount of disk space. Third, it is not flexible enough: if the requirements of the MAP program change, the intermediate data has to be regenerated.



Examples


Embodiment 1

[0036] Figure 2a is a schematic diagram of data transmission in a Hadoop distributed system, and Figure 2b is a schematic diagram of data transmission in the present invention. As shown in Figures 2a and 2b, the main improvement of the present invention is that, on the server where the source data is located, an intermediate processing module is added before the MAP program reads its input. This module filters out unnecessary fields to form a corresponding intermediate file.

[0037] Figure 3 is a schematic diagram of the overall technical solution of the present invention described in this embodiment. As shown in Figure 3, the present invention formats the source data before the MAP program reads its input, that is, it distinguishes each column of data. After the source data has been formatted into column-structured data, the column data is converted into KEY/VALUE format through MAP/REDUCE and, according to the required fields requested by the MAP program, unnecessary fields are filtered out to form a corresponding inte...
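The field filtering described in this embodiment can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `filter_fields`, the field names F1 to F4, and the sample values are hypothetical, and the records are assumed to be already in a key/value form.

```python
from typing import Dict, Iterable, Iterator, List


def filter_fields(records: Iterable[Dict[str, str]],
                  required: List[str]) -> Iterator[Dict[str, str]]:
    """Yield each record reduced to the required fields, in the requested order."""
    for record in records:
        # Keep only the fields the MAP program asked for; skip fields a record lacks.
        yield {name: record[name] for name in required if name in record}


# Hypothetical column-structured source rows (field names and values are assumptions).
source_rows = [
    {"F1": "u001", "F2": "2014-05-21", "F3": "/index.html", "F4": "Mozilla/5.0"},
    {"F1": "u002", "F2": "2014-05-21", "F3": "/about.html", "F4": "Opera/9.80"},
]

# Only F1 and F3 are requested, so F2 and F4 never leave the source server.
intermediate = list(filter_fields(source_rows, ["F1", "F3"]))
```

Because the filtering runs where the source data lives, only the requested columns would need to be written to the intermediate file and sent to the MAP program.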

Embodiment 2

[0041] Figure 4 is a flow chart of the Hadoop-based data processing method described in this embodiment. As shown in Figure 4, the method includes:

[0042] S401. Obtain the source data and the required fields requested by the MAP program, and convert the source data into KEY/VALUE format through MAP/REDUCE;

[0043] The source data may take various forms, such as file data stored on disk, XML-format data stored on disk, and/or two-dimensional table data stored in a database.

[0044] S402. Determine whether the source data is column-structured data; if so, execute step S404, otherwise execute step S403;

[0045] S403. Format the source data into column-structured data;

[0046] That is, the source data is formatted into column-structured data by distinguishing each column of data. For example, after formatting, the source data is column-structured data including fields F1, F2, F3, F4,...
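Steps S402 and S403 can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented method: it assumes raw source data arrives as delimited text lines, that "column-structured" means each row is already a mapping from field name to value, and that a tab is the delimiter. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, str]


def is_column_structured(data: Sequence[Union[str, Row]]) -> bool:
    """S402 (assumed check): every item is already a field-name-to-value mapping."""
    return all(isinstance(item, dict) for item in data)


def format_to_columns(lines: Sequence[str], field_names: List[str],
                      delimiter: str = "\t") -> List[Row]:
    """S403: split each raw line into columns and label them with field names."""
    rows = []
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        rows.append(dict(zip(field_names, values)))
    return rows


def ensure_column_structured(data, field_names, delimiter="\t"):
    """Run S403 only when the S402 check fails."""
    if is_column_structured(data):
        return list(data)
    return format_to_columns(data, field_names, delimiter)


# Hypothetical raw lines with four tab-separated columns.
raw = ["a\tb\tc\td", "e\tf\tg\th"]
rows = ensure_column_structured(raw, ["F1", "F2", "F3", "F4"])
```

Data that already passes the check is returned unchanged, which mirrors the branch that skips step S403.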

Embodiment 3

[0053] Based on the same inventive concept, the present invention also provides a Hadoop-based data processing system.

[0054] Figure 5 is a structural block diagram of the Hadoop-based data processing system described in this embodiment. As shown in Figure 5, the system is used for data interaction between the data server and the MAP program, wherein the data server includes a data formatting module and a data filtering module, and the MAP program includes a data request module and an adaptive identification module. The modules are introduced as follows:

[0055] Data request module: used to send a data request to the data server; the data request specifies the requested source data and the requested required fields;

[0056] The source data may take various forms, such as file data stored on disk, XML-format data stored on disk, and/or two-dimensional table d...
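As an illustration of what the data request module might send, the following Python sketch models a request that names the source data and the required fields. The class name `DataRequest`, its attributes, the `access_log` identifier, and the use of JSON as the wire format are all assumptions made for the example; the source does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DataRequest:
    source: str                 # hypothetical identifier of the requested source data
    required_fields: List[str]  # fields the MAP program needs

    def to_json(self) -> str:
        """Serialize the request (JSON is an assumed wire format)."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(payload: str) -> "DataRequest":
        """Rebuild the request on the data server side."""
        data = json.loads(payload)
        return DataRequest(source=data["source"],
                           required_fields=list(data["required_fields"]))


request = DataRequest(source="access_log", required_fields=["F1", "F3"])
wire = request.to_json()
received = DataRequest.from_json(wire)
```

Carrying the required fields inside the request is what lets the data server filter before transmission rather than after.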


Abstract

The invention discloses a Hadoop-based data processing method and a Hadoop-based data processing system, which are used for data interaction between a data server and a cluster data server to which a MAP program belongs. The Hadoop-based data processing method comprises the following steps: S1, when the data server receives a data request from the cluster data server, extracting the requested required field from the data request and, meanwhile, converting the source data into KEY/VALUE format; S2, extracting, by the data server, the data corresponding to the required field from the data converted into KEY/VALUE format, and sending the data corresponding to the required field to the cluster data server; S3, when the cluster data server receives the data corresponding to the required field, adaptively identifying that data according to preset configuration information and performing subsequent operations. According to the Hadoop-based data processing method and system, by screening the data before transmitting it, the network bandwidth used during data transmission can be reduced and the program execution efficiency can be improved.
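Step S3 does not spell out the form of the preset configuration information in this excerpt. The Python sketch below shows one possible reading, assuming the configuration maps each field name to a parser that the receiving side applies to whichever fields arrive. `PRESET_CONFIG`, `identify`, and the field types are hypothetical.

```python
from typing import Any, Callable, Dict

# Assumed shape of the preset configuration: field name -> parser for its value.
PRESET_CONFIG: Dict[str, Callable[[str], Any]] = {
    "F1": str,
    "F2": int,
    "F3": float,
}


def identify(record: Dict[str, str],
             config: Dict[str, Callable[[str], Any]] = PRESET_CONFIG) -> Dict[str, Any]:
    """Parse each received field with its configured parser; unknown fields stay strings."""
    return {name: config.get(name, str)(value) for name, value in record.items()}


# Hypothetical filtered record received by the cluster data server.
received = {"F1": "u001", "F3": "0.75"}
typed = identify(received)
```

Under this reading, the same receiving code works for any subset of fields, so a change in the requested fields would not require regenerating intermediate data.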

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a Hadoop-based data processing method and system.

Background art

[0002] Hadoop is a reliable, efficient, and scalable software framework capable of distributed processing of large amounts of data. It is a distributed system based on a shared-nothing architecture for massive data storage and computing. It consists of several members, mainly including: HDFS (Hadoop Distributed File System, a distributed file system), MAPREDUCE (a framework for Hadoop parallel computing, including MAP and REDUCE programs), HBase (an open source implementation of Google BigTable), etc. Among them, MAPREDUCE, as an open parallel computing framework, can be combined with various popular distributed products to realize flexible parallel computing and distributed computing functions. HDFS, HBase, Cassandra (a hybrid non-relational database) and other platforms are used as the input ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F9/5044, G06F9/5066
Inventor: 薛洪贺罗峰黄苏支李娜
Owner: BEIJING IZP NETWORK TECH CO LTD