Data processing method and system under Hadoop platform

A data processing technique, applied in the field of data processing, that addresses the heavy computation and delayed result feedback involved in querying and analyzing archived data. Its effects are to ensure compatibility of stored data structures, improve data query and analysis efficiency, and save computing resources.

Status: Inactive. Publication Date: 2016-03-30
BEIJING ADVANCED DIGITAL TECH

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved in this application is to provide a method and system for data processing under the Hadoop platform that address the heavy computation and delayed result feedback that arise during data archiving and during data query and analysis.

Examples

Embodiment 1

[0050] Figure 1 shows a flowchart of a data processing method under the Hadoop platform according to the present application. The method applies to scenarios in which data must be archived, stored, queried, and analyzed, and includes the following steps.

[0051] Step 100, obtain the Hive table structure information of the stock data (the data already stored) on the Hadoop platform, compare the structure information of the data to be stored against that Hive table structure information, and obtain data structure change information.

[0052] The Hive table structure information of the data includes at least the attributes of each data column and the position of each data column. The data structure change information is either "no change" or any one or a combination of the following flags: adding a column, deleting a column, adjusting a column position.
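
The comparison in Step 100 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Column` type, the `diff_structure` function, and the flag strings are hypothetical names, and the sketch assumes that two columns count as the same only when both name and attribute match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor: a name plus one attribute (its type)."""
    name: str
    dtype: str


def diff_structure(stock, incoming):
    """Return the set of change flags between two ordered column lists.

    Assumption: columns are equal only if name and dtype both match.
    """
    flags = set()
    if any(col not in stock for col in incoming):
        flags.add("add_column")
    if any(col not in incoming for col in stock):
        flags.add("delete_column")
    # Compare the relative order of the columns that both sides share.
    shared_stock = [col for col in stock if col in incoming]
    shared_incoming = [col for col in incoming if col in stock]
    if shared_stock != shared_incoming:
        flags.add("adjust_position")
    return flags or {"no_change"}


stock = [Column("id", "int"), Column("name", "string")]
incoming = [Column("name", "string"), Column("id", "int"), Column("email", "string")]
print(diff_structure(stock, incoming))
```

Returning a set keeps the "combination of multiple flags" case natural: a single comparison can report an added column and a position change at the same time.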

[0053] When the attributes of the data columns in the data structure information of the data to be stored are exactly the same ...

Embodiment 2

[0061] As shown in Figure 2, building on the data processing method of Embodiment 1 of the present application, another embodiment of the present application further includes:

[0062] Step 160, re-store the stored data according to the updated Hive table structure.

[0063] When the data format of the data to be stored differs from that of the stock data, that is, when the obtained data structure change information includes the "new column" flag, the Hive table structure is updated by appending the corresponding data columns after the acquired structure of the stock data. Consequently, when the method disclosed in Embodiment 1 of the present application is used for data storage, the number of data columns in the Hive table may grow as the volume of stored data increases and as time passes. To facilitate compliance management of the stored data, the data structure of all stored data is preferably adjusted to the updated Hive table structure, that...
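
As a rough sketch of Step 160, previously stored rows can be rewritten into the updated column layout, with a null placed in every column they never had. The function name `restore_rows` and the list-of-lists row representation are assumptions made for illustration; the source does not specify how rows are physically rewritten.

```python
def restore_rows(rows, old_columns, new_columns):
    """Rewrite rows laid out by old_columns into the layout of new_columns.

    Columns that the old layout lacks are filled with None (null).
    """
    restored = []
    for row in rows:
        by_name = dict(zip(old_columns, row))
        restored.append([by_name.get(col) for col in new_columns])
    return restored


old_columns = ["id", "name"]
new_columns = ["id", "name", "email"]  # "email" was appended later
rows = [[1, "a"], [2, "b"]]
print(restore_rows(rows, old_columns, new_columns))
```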

Embodiment 3

[0067] Building on the data processing method under the Hadoop platform of Embodiment 1 of the present application, another embodiment of the present application, as shown in Figure 2, further includes:

[0068] Step 180, determine the Hive table structure of the stored data corresponding to the time point specified in the data query instruction;

[0069] Step 200, read the stored data, and add null data columns or delete redundant data columns at the end of the stored data according to the determined Hive table structure to obtain query data.
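
Steps 180 and 200 can be illustrated with the sketch below. It assumes, hypothetically, that the system keeps a history of table structures as `(effective_from, columns)` pairs sorted by date; the source does not say how the structure for a time point is looked up. `structure_at` and `shape_row` are illustrative names.

```python
from bisect import bisect_right


def structure_at(history, time_point):
    """Return the column list in effect at time_point.

    Assumption: history is a list of (effective_from, columns) pairs,
    sorted by effective_from, with ISO-8601 date strings as keys.
    """
    starts = [start for start, _ in history]
    idx = bisect_right(starts, time_point) - 1
    if idx < 0:
        raise ValueError("no table structure recorded for that time point")
    return history[idx][1]


def shape_row(row, columns):
    """Pad the end of a row with nulls, or drop trailing redundant values."""
    width = len(columns)
    return list(row[:width]) + [None] * (width - len(row))


history = [
    ("2015-01-01", ["id", "name"]),
    ("2015-06-01", ["id", "name", "email"]),
]
target = structure_at(history, "2015-03-15")
print(shape_row([1, "a", "a@example.com"], target))
```

Because new columns are only ever appended, shaping a row for an older or newer structure reduces to trimming or padding its tail, which is why no per-column conversion is needed at query time.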

[0070] During archival storage, when the data format of the data to be stored differs from that of the stock data, that is, when the obtained data structure change information contains the "new column" flag, the corresponding data columns are appended after the acquired structure of the stock data and the Hive table structure is updated. In other words, as the amount of stored data increases, the data columns of the Hive table may increase....

Abstract

The present application provides a data processing method under a Hadoop platform, and belongs to the field of data processing. The method comprises: acquiring Hive table structure information of the stock data of a Hadoop platform, comparing the structure information of the to-be-stored data with the Hive table structure information, and obtaining data structure change information; updating the Hive table structure according to the obtained data structure change information and the acquired Hive table structure information of the stock data; and formatting the to-be-stored data according to the updated Hive table structure, and storing the formatted to-be-stored data. The method disclosed by the present application effectively ensures compatibility of the data structure of the stored data. When the archived data needs to be analyzed and queried, the data formats do not need to be tallied and converted, so computing resources are saved, analysis and query results can be fed back rapidly, and data query and analysis efficiency is improved.
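
The update-then-format portion of the method can be sketched as below, under stated assumptions: `update_structure` and `format_record` are hypothetical names, columns are identified by name only, and new columns are appended at the end as the embodiments describe. Writing the formatted row to Hive is out of scope here.

```python
def update_structure(stock_columns, incoming_columns):
    """Append columns that appear only in the incoming data to the end."""
    new_cols = [c for c in incoming_columns if c not in stock_columns]
    return list(stock_columns) + new_cols


def format_record(record, updated_columns):
    """Lay a {column: value} record out in the updated column order.

    Columns the record does not carry are filled with None (null).
    """
    return [record.get(col) for col in updated_columns]


updated = update_structure(["id", "name"], ["name", "id", "email"])
print(updated)
print(format_record({"name": "a", "id": 1, "email": "a@example.com"}, updated))
```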

Description

Technical field

[0001] This application relates to the field of data processing, in particular to a data processing method and system under the Hadoop platform.

Background art

[0002] With the development of computer technology, more and more data needs to be stored and processed, and the structure of data generated by different terminals or services in different periods may also differ.

[0003] For example, when a Hadoop cluster is used for data management, the historical data of the business system needs to be archived and saved in the archive system. In the prior art, the massive data from the business system is stored in Hive tables, which is convenient for data management and query. However, because of changes in business requirements and other reasons, some tables in the business system will inevitably change in structure, which causes the data formats of the archiving source data from different periods to mismatch.

[0004] W...

Application Information

IPC(8): G06F17/30
CPC: G06F16/235, G06F16/22
Inventors: 朱大勇, 完献忠, 滕一勤
Owner: BEIJING ADVANCED DIGITAL TECH