User-defined serializable data structure, Hadoop cluster, server and application method thereof

A Hadoop cluster and data structure technology, applied in the field of serializable data structures. It addresses the problem that Hadoop's built-in Writable classes cannot meet the requirements of more complex objects, and achieves the effect of reducing errors and being easy to use.

Active Publication Date: 2016-07-13
上海晶赞企业管理咨询有限公司
Cites: 4; Cited by: 23

AI Technical Summary

Problems solved by technology

[0007] Hadoop offers a variety of built-in Writable classes, and it provides raw-byte comparators (the RawComparator interface) for the Writable wrappers of the Java primitive types, so these objects can be sorted at the byte-stream level without deserialization, which greatly shortens processing time. However, when more complex objects are needed, Hadoop's built-in Writable classes cannot meet the requirements. Users must then define their own Writable class, especially when it is used as a key, in order to achieve more efficient storage and fast comparison.
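To make the idea of byte-stream-level comparison concrete, here is a minimal sketch in plain Java. It does not use Hadoop itself; the `compare` method only mirrors the parameter shape of Hadoop's `RawComparator.compare(byte[], int, int, byte[], int, int)`. The composite key (`userId`, `eventType`) and the class name `RawKeySketch` are assumptions made for illustration and do not come from the patent.

```java
import java.nio.ByteBuffer;

// Hypothetical composite key (userId, eventType); illustration only.
final class RawKeySketch {
    // Serialize the key as 8 bytes of userId followed by 4 bytes of eventType.
    // ByteBuffer is big-endian by default, matching DataOutput's byte order.
    static byte[] serialize(long userId, int eventType) {
        return ByteBuffer.allocate(12).putLong(userId).putInt(eventType).array();
    }

    // Compare two serialized keys directly from their bytes.
    // No key object is rebuilt; only the primitive fields are decoded in place.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        ByteBuffer x = ByteBuffer.wrap(b1, s1, l1);
        ByteBuffer y = ByteBuffer.wrap(b2, s2, l2);
        int byUser = Long.compare(x.getLong(), y.getLong());
        if (byUser != 0) {
            return byUser;
        }
        return Integer.compare(x.getInt(), y.getInt());
    }
}
```

The fields are decoded rather than compared byte by byte because a plain lexicographic comparison of big-endian bytes would order negative values after positive ones.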



Examples


Embodiment Construction

[0029] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings; please refer to Figures 1 and 2.

[0030] The present invention provides a user-defined data structure whose schema is defined in the Protobuf language (Protobuf is short for Protocol Buffers) and whose data is stored in Protobuf's format, so it retains the advantages that Protobuf provides. At the same time, the data structure implements the Writable interface of the Hadoop platform, so it can be stored on Hadoop and its data can be read and written there directly. The present invention names this data structure object PBWritable.

[0031] The user-defined serializable data structure of the present invention comprises a data content and tag value class, a tag value structure class, and a data mapping relationship class, all implemented in Java. The tag content of the data source is placed at the front of the data ...
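One way to picture a record whose tag sits at the front of the data is the plain-Java sketch below. It is not the patent's PBWritable implementation: the class name `TaggedRecordSketch`, the 4-byte integer tag, and the length-prefixed layout are all assumptions, and the payload is treated as opaque bytes standing in for Protobuf-encoded content. The `write` and `readFields` methods only mirror the method shapes of Hadoop's Writable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative stand-in for a tag-prefixed record; names and layout are assumptions.
final class TaggedRecordSketch {
    private int tag;         // identifies the data source (assumed to be an int)
    private byte[] payload;  // opaque bytes; Protobuf-encoded content in practice

    TaggedRecordSketch() {
        this(0, new byte[0]);
    }

    TaggedRecordSketch(int tag, byte[] payload) {
        this.tag = tag;
        this.payload = payload.clone();
    }

    int tag() { return tag; }

    byte[] payload() { return payload.clone(); }

    // Same method shape as Writable.write(DataOutput): tag first, then the data.
    void write(DataOutput out) throws IOException {
        out.writeInt(tag);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Same method shape as Writable.readFields(DataInput): read the tag before the data.
    void readFields(DataInput in) throws IOException {
        tag = in.readInt();
        int length = in.readInt();
        payload = new byte[length];
        in.readFully(payload);
    }

    // Convenience helper: serialize to a byte array.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience helper: rebuild a record from a byte array.
    static TaggedRecordSketch fromBytes(byte[] bytes) {
        TaggedRecordSketch record = new TaggedRecordSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Because the tag is read before the payload, a reader can decide how to interpret the remaining bytes without inspecting them first.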


Abstract

The invention discloses a user-defined serializable data structure, a Hadoop cluster, a server and an application method thereof. The user-defined serializable data structure comprises a data content and tag value class, a tag value structure class and a data mapping relationship class. The tag content of a data source is placed at the front of the data. The data content and tag value class is used for parsing data tag values and data contents; the tag value structure class is used for reading, recognizing and writing tag contents; the data mapping relationship class is used for storing and loading the mapping between data tag contents and different data sources, and the mapping between data tag contents and the corresponding Protobuf compiled classes. The Hadoop cluster comprises the user-defined serializable data structure, and the server comprises the Hadoop cluster. The user-defined data structure has the characteristics of both Protobuf and Writable, can be used to implement a deserialization interface in Hive, and is faster and more convenient to use in Hive than ordinary text data, so errors can be reduced.
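The two mappings described in the abstract (tag to data source, and tag to the class that decodes the payload) can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: `TagMappingSketch` and its method names are hypothetical, the tag is assumed to be an integer, a `Function<byte[], ?>` stands in for a Protobuf compiled class's parser, and the storing and loading of the mappings that the abstract mentions is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry; a Function<byte[], ?> stands in for a Protobuf parser.
final class TagMappingSketch {
    private final Map<Integer, String> sourceByTag = new HashMap<>();
    private final Map<Integer, Function<byte[], ?>> parserByTag = new HashMap<>();

    // Register which data source a tag belongs to and how its payload is decoded.
    void register(int tag, String dataSource, Function<byte[], ?> parser) {
        sourceByTag.put(tag, dataSource);
        parserByTag.put(tag, parser);
    }

    // Look up the data source for a tag; returns null for an unregistered tag.
    String sourceOf(int tag) {
        return sourceByTag.get(tag);
    }

    // Decode a payload with the parser registered for its tag.
    Object parse(int tag, byte[] payload) {
        Function<byte[], ?> parser = parserByTag.get(tag);
        if (parser == null) {
            throw new IllegalArgumentException("unknown tag: " + tag);
        }
        return parser.apply(payload);
    }
}
```

With such a registry, a consumer needs only the tag at the front of a record to find both the originating data source and the right decoder, rather than knowing the data layout in advance.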

Description

Technical field

[0001] The invention relates to the field of computer applications, in particular to a user-defined serializable data structure and an application method thereof.

Background

[0002] The existing Writable data classes on Hadoop (such as Text, LongWritable, IntWritable and FloatWritable) handle basic, flat data types. Multi-level structured data, such as a structure nested inside another structure or a structure that contains a list, can currently be transmitted and stored on the Hadoop platform only through Hadoop's own BytesWritable type. That class cannot parse the data; it can only carry it. These types are therefore extremely inconvenient to use: before using any piece of data, users must first understand the data and its structure in order to parse it, which leads to a series of problems when data fields differ between versions.

[0003] ProtocolBuffers ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2219, G06F16/80
Inventor: 汤奇峰小米
Owner: 上海晶赞企业管理咨询有限公司