Hadoop-based data processing method and system

A data processing system and data processing technology in the field of data processing. It addresses the problems that intermediate data occupies a large amount of disk space and that the existing approach is not flexible enough, and it reduces network bandwidth, saves CPU time, and reduces disk space usage.

Inactive Publication Date: 2014-05-21

BEIJING IZP NETWORK TECH CO LTD

Cites: 4; Cited by: 11


AI Technical Summary

Problems solved by technology

The disadvantages of this method are, first, that requirements must be collected manually; second, that the generated intermediate data occupies a large amount of disk space; and third, that it is not flexible enough: for example, if the requirements of the MAP program change, the intermediate data must be regenerated.



Examples


Embodiment 1

[0036] Figure 2a is a schematic diagram of data transmission in a Hadoop distributed system, and Figure 2b is a schematic diagram of data transmission in the present invention. As shown in Figures 2a and 2b, the main improvement of the present invention is that, on the server where the source data is located, an intermediate processing module is added before the MAP program reads its input data. This module filters out unnecessary fields to form a corresponding intermediate file.

[0037] Figure 3 is a schematic diagram of the overall technical solution of the present invention in this embodiment. As shown in Figure 3, the present invention formats the source data before the MAP program reads its input, that is, it distinguishes each column of data. After the source data has been formatted into column-structured data, the column data is converted into KEY/VALUE format through MAP/REDUCE, and, according to the required fields requested by the MAP program, unnecessary fields are filtered out to form a corresponding inte...
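The flow in [0036] and [0037] (format the source into columns, convert to KEY/VALUE, drop the fields the MAP program did not request) can be sketched in Python as below. This is a minimal, non-authoritative sketch: it assumes comma-delimited text input and uses the row index as the KEY (the excerpt does not say how keys are chosen), and the helper names `format_columns`, `to_key_value`, and `filter_fields` are hypothetical, not taken from the patent.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# A KEY/VALUE record: the key identifies the row (assumed: its index) and the
# value maps field names such as F1, F2, ... to that row's column values.
Record = Tuple[int, Dict[str, str]]


def format_columns(lines: Iterable[str], field_names: List[str],
                   delimiter: str = ",") -> Iterator[List[str]]:
    """Format raw source lines into column-structured rows (assumed delimiter)."""
    for line in lines:
        columns = line.rstrip("\n").split(delimiter)
        if len(columns) != len(field_names):
            raise ValueError(f"expected {len(field_names)} columns in {line!r}")
        yield columns


def to_key_value(rows: Iterable[List[str]],
                 field_names: List[str]) -> Iterator[Record]:
    """Convert column-structured rows into KEY/VALUE records."""
    for index, columns in enumerate(rows):
        yield index, dict(zip(field_names, columns))


def filter_fields(records: Iterable[Record],
                  required: List[str]) -> Iterator[Record]:
    """Keep only the fields requested by the MAP program (the intermediate data)."""
    for key, value in records:
        yield key, {name: value[name] for name in required}


if __name__ == "__main__":
    fields = ["F1", "F2", "F3", "F4"]
    source = ["a,1,x,10", "b,2,y,20"]
    records = to_key_value(format_columns(source, fields), fields)
    print(list(filter_fields(records, ["F1", "F4"])))
    # [(0, {'F1': 'a', 'F4': '10'}), (1, {'F1': 'b', 'F4': '20'})]
```

Because the helpers are generators, filtered records can be streamed without first materializing the unfiltered data; [0036] instead describes writing the filtered result to an intermediate file, which this sketch does not model.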

Embodiment 2

[0041] Figure 4 is the flow chart of the Hadoop-based data processing method of this embodiment. As shown in Figure 4, the method includes:

[0042] S401. Obtain the source data and the required fields requested by the MAP program, and convert the source data into KEY/VALUE format through MAP/REDUCE;

[0043] The source data may take various forms, such as file data stored on disk, XML-format data stored on disk, and/or two-dimensional table data stored in a database.

[0044] S402. Determine whether the source data is column-structured data; if so, execute step S404; otherwise, execute step S403;

[0045] S403. Format the source data into column-structured data;

[0046] That is, the source data is formatted into column-structured data by distinguishing each column. For example, after formatting, the source data is column-structured data containing fields F1, F2, F3, F4,...
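Steps S402 and S403 amount to a branch: data that is already column-structured goes straight on, and anything else is formatted first. The Python sketch below assumes a simple, hypothetical test for "column-structured" (each row is already a sequence with the expected number of columns, as rows fetched from a database table would be) and a tab delimiter for raw text lines; the function names are illustrative. Step S404 is elided in this excerpt, so the sketch only returns the columns that would be handed to it.

```python
from typing import List, Sequence, Union

# A source row is either a raw text line or an already split sequence of columns.
Row = Union[str, Sequence[str]]


def is_column_structured(rows: Sequence[Row], width: int) -> bool:
    """S402 (assumed criterion): every row is already a sequence of `width` columns."""
    return all(not isinstance(row, str) and len(row) == width for row in rows)


def format_to_columns(rows: Sequence[Row], width: int,
                      delimiter: str = "\t") -> List[List[str]]:
    """S403: split raw text rows into `width` columns (F1, F2, F3, ...)."""
    formatted = []
    for row in rows:
        columns = row.split(delimiter) if isinstance(row, str) else list(row)
        if len(columns) != width:
            raise ValueError(f"row does not have {width} columns: {row!r}")
        formatted.append(columns)
    return formatted


def prepare_columns(rows: Sequence[Row], width: int,
                    delimiter: str = "\t") -> List[List[str]]:
    """Branch of S402: skip formatting when the data is already column-structured."""
    if is_column_structured(rows, width):
        return [list(row) for row in rows]  # go straight to S404
    return format_to_columns(rows, width, delimiter)  # S403, then S404


if __name__ == "__main__":
    table_rows = [("a", "1", "x"), ("b", "2", "y")]  # e.g. rows from a database table
    text_rows = ["a\t1\tx", "b\t2\ty"]               # e.g. lines of a disk file
    print(prepare_columns(table_rows, 3))  # [['a', '1', 'x'], ['b', '2', 'y']]
    print(prepare_columns(text_rows, 3))   # [['a', '1', 'x'], ['b', '2', 'y']]
```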

Embodiment 3

[0053] Based on the same inventive concept, the present invention also provides a Hadoop-based data processing system.

[0054] Figure 5 is a structural block diagram of the Hadoop-based data processing system of this embodiment. As shown in Figure 5, the system is used for data interaction between the data server and the MAP program. The data server includes a data formatting module and a data filtering module, and the MAP program includes a data request module and an adaptive recognition module. The modules are described as follows:

[0055] Data request module: sends a data request to the data server; the data request specifies the requested source data and the requested required fields;

[0056] The source data may take various forms, such as file data stored on disk, XML-format data stored on disk, and/or two-dimensional table d...
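The excerpt does not specify how the data request in [0055] is encoded. As a hedged illustration only, the Python sketch below models it as a small JSON message carrying the requested source and the required fields; the attribute names `source` and `required_fields`, the JSON encoding, the example path, and the `check_fields` validation on the data-server side are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataRequest:
    """Hypothetical message sent by the data request module to the data server."""
    source: str  # identifies the requested source data (assumed: a path or table name)
    required_fields: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize for transmission (JSON is an assumption, not from the patent)."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "DataRequest":
        """Decode a request on the data-server side."""
        data = json.loads(payload)
        return cls(source=data["source"], required_fields=list(data["required_fields"]))


def check_fields(request: DataRequest, available: List[str]) -> None:
    """Reject requests for fields the formatted source data does not contain."""
    missing = [name for name in request.required_fields if name not in available]
    if missing:
        raise KeyError(f"unknown fields requested: {missing}")


if __name__ == "__main__":
    request = DataRequest(source="/data/source.txt", required_fields=["F1", "F3"])
    payload = request.to_json()
    print(payload)  # {"source": "/data/source.txt", "required_fields": ["F1", "F3"]}
    received = DataRequest.from_json(payload)
    check_fields(received, ["F1", "F2", "F3", "F4"])
    print(received == request)  # True
```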


Abstract

The invention discloses a Hadoop-based data processing method and system for data interaction between a data server and the cluster data server to which a MAP program belongs. The method comprises the following steps: S1, when the data server receives a data request from the cluster data server, it extracts the requested required fields from the data request and, at the same time, converts the source data into KEY/VALUE format; S2, the data server extracts the data corresponding to the required fields from the data converted into KEY/VALUE format and sends that data to the cluster data server; S3, when the cluster data server receives the data corresponding to the required fields, it adaptively identifies that data according to preset configuration information and performs subsequent operations. By first screening and then transmitting the data, the method and system reduce the network bandwidth used during data transmission and improve program execution efficiency.
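Steps S1 to S3 of the abstract can be simulated in a single process, as in the Python sketch below. It is a sketch under stated assumptions, not the patented implementation: the network transfer is replaced by a function return, the KEY is the row index, and the "preset configuration information" used for adaptive identification in S3 is assumed to be a hypothetical mapping from field name to a type converter (fields without an entry stay strings). The class and function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

KeyValue = Tuple[int, Dict[str, object]]


class DataServer:
    """Hypothetical data server holding column-structured source data."""

    def __init__(self, field_names: Sequence[str], rows: Sequence[Sequence[str]]):
        self.field_names = list(field_names)
        self.rows = [list(row) for row in rows]

    def handle_request(self, required_fields: Sequence[str]) -> List[KeyValue]:
        # S1: take the required fields from the request and convert the
        # source data into KEY/VALUE records.
        unknown = [f for f in required_fields if f not in self.field_names]
        if unknown:
            raise KeyError(f"unknown fields requested: {unknown}")
        records = [(i, dict(zip(self.field_names, row)))
                   for i, row in enumerate(self.rows)]
        # S2: keep only the required fields; only this reduced data is "sent".
        return [(key, {f: value[f] for f in required_fields})
                for key, value in records]


def adapt_records(records: List[KeyValue],
                  config: Dict[str, Callable[[str], object]]) -> List[KeyValue]:
    """S3: interpret each received field with the converter preset in `config`."""
    return [(key, {f: config.get(f, str)(v) for f, v in value.items()})
            for key, value in records]


if __name__ == "__main__":
    server = DataServer(["F1", "F2", "F3"],
                        [["alice", "30", "beijing"], ["bob", "25", "shanghai"]])
    sent = server.handle_request(["F1", "F2"])
    print(adapt_records(sent, {"F2": int}))
    # [(0, {'F1': 'alice', 'F2': 30}), (1, {'F1': 'bob', 'F2': 25})]
```

In this example only F1 and F2 leave the server; F3 is never transmitted, which is where the bandwidth saving described in the abstract would come from.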

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a Hadoop-based data processing method and system.

Background

[0002] Hadoop is a reliable, efficient, and scalable software framework for the distributed processing of large amounts of data. It is a distributed system for massive data storage and computation based on a shared-nothing architecture. It consists of several components, mainly including HDFS (Hadoop Distributed File System), MAPREDUCE (the Hadoop parallel computing framework, comprising MAP and REDUCE programs), and HBase (an open-source implementation of Google BigTable). Among them, MAPREDUCE, as an open parallel computing framework, can be combined with various popular distributed products to provide flexible parallel and distributed computing. HDFS, HBase, Cassandra (a hybrid non-relational database) and other platforms are used as the input ...


Application Information


Patent type & authority: Applications (China)

IPC (8): G06F17/30

CPC: G06F9/5044, G06F9/5066

Inventors: 薛洪贺, 罗峰, 黄苏支, 李娜

Owner: BEIJING IZP NETWORK TECH CO LTD
