Orderly table data management method and system based on Hadoop distributed file system (HDFS)

A distributed file management technology, applied to the orderly management of table data based on the Hadoop Distributed File System (HDFS). It addresses problems such as long query delays, the lack of an optimization mechanism for multi-channel data aggregation, and the inability to serve manual queries, with the effect of improving efficiency.

Active Publication Date: 2013-10-16
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

However, because Hive is designed as a general-purpose data warehouse, it must convert every query into MapReduce tasks, which causes a long...



Embodiment Construction

[0042] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

[0043] The technical scheme proposed by the invention mainly manages table data imported in batches, and applies slice management to the table data belonging to the same table. That is, each imported batch of a table's data is sorted using Hadoop and serves as a Base / Patch of the table; index data is then generated for that Base / Patch, and the sorted batch data and the generated index data are stored as separate files. In this way, a logically globally ordered table management system can be constructed. In addition, in the system of the present invention, an operation on any table record (such as inserting a record, modifying a field in a record, or deleting a record) is itself expressed as an operation record that manipulates data. Therefore, the batch data imported each time actually includes multiple operation recor...
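The sort-then-index step described in [0043] can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the `(key, operation, value)` record layout, the function names `build_slice` and `lookup`, and the sparse index with an `index_every` spacing are all assumptions made for illustration, and plain Python lists stand in for the separate data and index files that the described system stores on HDFS.

```python
import bisect
from typing import List, Tuple

# Hypothetical operation record: (key, operation, value).
# The source does not specify the record layout; this is an assumption.
Record = Tuple[str, str, str]
IndexEntry = Tuple[str, int]


def build_slice(batch: List[Record], index_every: int = 2):
    """Sort one imported batch by key and build a sparse index over it.

    In the described system the two results would be stored as two
    separate files; here they are simply returned as lists.
    """
    # sorted() is stable, so operations on the same key keep arrival order.
    data = sorted(batch, key=lambda rec: rec[0])
    # Sparse index: (key, position) for every index_every-th record.
    index = [(data[i][0], i) for i in range(0, len(data), index_every)]
    return data, index


def lookup(data: List[Record], index: List[IndexEntry], key: str) -> List[Record]:
    """Return all operation records for `key`, using the index to skip ahead."""
    keys = [k for k, _ in index]
    # Last index entry whose key is strictly smaller than the target:
    # every record for `key` lies at or after that entry's position.
    j = bisect.bisect_left(keys, key) - 1
    start = index[j][1] if j >= 0 else 0
    hits = []
    for rec in data[start:]:
        if rec[0] > key:
            break  # data is sorted, so nothing further can match
        if rec[0] == key:
            hits.append(rec)
    return hits


if __name__ == "__main__":
    batch = [
        ("c", "insert", "3"),
        ("a", "insert", "1"),
        ("b", "insert", "2"),
        ("a", "update", "9"),
        ("d", "delete", ""),
    ]
    data, index = build_slice(batch)
    print(lookup(data, index, "a"))
```

Because each batch is sorted and indexed on its own, a reader only has to consult the index of a slice to find where a key could be, instead of scanning the whole slice.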


Abstract

The invention provides an orderly table data management method based on a Hadoop distributed file system (HDFS) and a method and a system for reading table data. The orderly table data management method based on the HDFS comprises the following steps of: receiving the name of a to-be-operated table input by a user and bulk data comprising multiple operation records; sorting the received bulk data based on Hadoop, generating index data from the sorted bulk data, storing the sorted bulk data and the generated index data in a file form under a specified directory of the HDFS, and transmitting the name of the table, the file name of a file stored with the bulk data, the file name of a file stored with the index data and path data of the specified directory to a master server.
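The import flow summarized in the abstract (receive a table name and a batch, sort, index, store two files under a directory, report the metadata to a master server) can be sketched as follows. This is only an illustration under stated assumptions: a local directory stands in for the specified HDFS directory, the `Master` class and its `register` method are hypothetical stand-ins for the master server, and the file naming scheme, the JSON encoding, and the `seq` parameter are invented for the example.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

# Hypothetical operation record: (key, operation, value).
Record = Tuple[str, str, str]


class Master:
    """Hypothetical stand-in for the master server: tracks slices per table."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def register(self, table: str, data_file: str, index_file: str,
                 directory: str) -> None:
        # Record the table name, both file names, and the directory path.
        self.tables.setdefault(table, []).append(
            {"data_file": data_file, "index_file": index_file,
             "directory": directory}
        )


def import_batch(table: str, batch: List[Record], directory: str,
                 master: Master, seq: int) -> Tuple[str, str]:
    """Sort a batch, index it, store both as files, and notify the master."""
    data = sorted(batch, key=lambda rec: rec[0])
    # Index: position of the first record for each distinct key (assumption).
    index: Dict[str, int] = {}
    for pos, rec in enumerate(data):
        index.setdefault(rec[0], pos)

    # File names are illustrative only; the source does not specify them.
    data_file = f"{table}.{seq}.data"
    index_file = f"{table}.{seq}.index"
    with open(os.path.join(directory, data_file), "w", encoding="utf-8") as f:
        json.dump(data, f)
    with open(os.path.join(directory, index_file), "w", encoding="utf-8") as f:
        json.dump(index, f)

    master.register(table, data_file, index_file, directory)
    return data_file, index_file


if __name__ == "__main__":
    master = Master()
    with tempfile.TemporaryDirectory() as directory:
        import_batch("urls", [("b", "insert", "2"), ("a", "insert", "1")],
                     directory, master, seq=0)
        print(master.tables["urls"])
```

Keeping the data and the index in separate files, and registering both with the master, means the master only holds small metadata while the bulk of each slice stays on the file system.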

Description

Technical field

[0001] This application relates to a method for orderly managing table data based on the Hadoop Distributed File System (HDFS) and a system using the method, and in particular to a method and system that manage table data in HDFS in slices, sort each slice and generate index data for it, and manage the table data in the form of files.

Background technique

[0002] Many systems need to manage massive data, and Hadoop technology has been widely used for this purpose. Massive data such as logs, web pages, and URLs can be stored in the Hadoop Distributed File System (HDFS). Generally, the following processing and operations are required on these data:

[0003] 1. In log analysis and problem tracing, manually check one or more pieces of data;

[0004] 2. Read data in batches;

[0005] 3. Traverse all or part of the data in batches in a specific order, such as traversing all URLs of a site;

[00...


Application Information

IPC(8): G06F17/30
Inventor 张众谭待
Owner BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD