Orderly table data management method and system based on Hadoop distributed file system (HDFS)

A distributed file management technology, applied to the orderly management of table data based on the Hadoop distributed file system (HDFS), that addresses long query latency, the lack of an optimization mechanism for aggregating multi-channel data, and the inability to meet the needs of manual queries, thereby improving efficiency.

Active Publication Date: 2013-10-16

BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Citations: 4; cited by: 11


AI Technical Summary

Problems solved by technology

However, because Hive is designed primarily as a general-purpose data warehouse, it converts every query into MapReduce tasks. This causes long latency, fails to meet the needs of manual queries, and leaves Hive without an optimization mechanism for aggregating multi-channel data by key.



Embodiment Construction

[0042] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

[0043] The technical scheme proposed by the invention mainly manages table data imported in batches and performs slice management on the table data belonging to the same table. That is, each batch of data imported into a table is sorted using Hadoop and serves as a Base or Patch of the table; index data is then generated for that Base/Patch, and the sorted batch data and the generated index data are stored as separate files. In this way, a logically globally ordered table management system can be constructed. In addition, in the system of the present invention, any operation on a table record (such as inserting a record, modifying a field in a record, or deleting a record) is treated as an operation record that manipulates the data. Therefore, the batch data imported each time actually includes multiple operation records...
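The batch-import step in [0043] can be illustrated with a small local sketch. It is not the patented implementation: a plain Python sort stands in for the Hadoop-based sort, a local temporary directory stands in for the HDFS directory, and the record fields (`key`, `seq`, `op`, `value`), the `.data`/`.index` file suffixes, the JSON-lines layout, and the sparse-index interval `every` are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical operation types; the text names inserting, modifying and deleting records.
INSERT, UPDATE, DELETE = "insert", "update", "delete"


@dataclass(frozen=True)
class OpRecord:
    key: str              # key of the table record the operation applies to
    seq: int              # arrival order within the batch (hypothetical field)
    op: str               # INSERT, UPDATE or DELETE
    value: Optional[str]  # new value; None for DELETE


def sort_batch(batch: List[OpRecord]) -> List[OpRecord]:
    """Sort by key; operations on the same key keep their arrival order.
    A local stand-in for the Hadoop-based sort."""
    return sorted(batch, key=lambda r: (r.key, r.seq))


def build_sparse_index(records: List[OpRecord], every: int) -> List[Tuple[str, int]]:
    """Keep (key, record number) for every `every`-th record of a sorted slice."""
    return [(r.key, i) for i, r in enumerate(records) if i % every == 0]


def write_slice(directory: str, name: str, batch: List[OpRecord],
                every: int = 2) -> Tuple[str, str]:
    """Store the sorted batch and its index as two separate files under `directory`."""
    records = sort_batch(batch)
    data_path = os.path.join(directory, name + ".data")
    index_path = os.path.join(directory, name + ".index")
    with open(data_path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps([r.key, r.seq, r.op, r.value]) + "\n")
    with open(index_path, "w", encoding="utf-8") as f:
        for key, pos in build_sparse_index(records, every):
            f.write(json.dumps([key, pos]) + "\n")
    return data_path, index_path


if __name__ == "__main__":
    demo = [
        OpRecord("url/b", 0, INSERT, "B"),
        OpRecord("url/a", 1, INSERT, "A"),
        OpRecord("url/c", 2, INSERT, "C"),
        OpRecord("url/a", 3, UPDATE, "A2"),
    ]
    with tempfile.TemporaryDirectory() as d:
        _, index_file = write_slice(d, "base_0001", demo)  # hypothetical slice name
        with open(index_file, encoding="utf-8") as f:
            print(f.read())
```

A sparse index keeps the index file small while still letting a reader seek close to any key; whether the invention indexes every record or only some of them is not stated in this excerpt.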


Abstract

The invention provides an orderly table data management method based on the Hadoop distributed file system (HDFS), as well as a method and a system for reading table data. The HDFS-based orderly table data management method comprises the following steps: receiving, from a user, the name of a table to be operated on and bulk data comprising multiple operation records; sorting the received bulk data based on Hadoop; generating index data from the sorted bulk data; storing the sorted bulk data and the generated index data as files under a specified directory of the HDFS; and transmitting the name of the table, the file name of the file storing the bulk data, the file name of the file storing the index data, and the path of the specified directory to a master server.
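The final step of the abstract, reporting a newly written slice to the master server, might look roughly like the sketch below. The four reported items come from the abstract; the `SliceInfo` and `MasterServer` names, the in-process method call (standing in for whatever remote call a real deployment would use), the example paths, and the rule of keeping slices in import order are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SliceInfo:
    """The four items the abstract says are sent to the master server."""
    table_name: str  # name of the table being operated on
    data_file: str   # file name of the file storing the sorted bulk data
    index_file: str  # file name of the file storing the generated index data
    directory: str   # path of the specified HDFS directory


@dataclass
class MasterServer:
    """Hypothetical in-memory master: keeps each table's slices in import order."""
    tables: Dict[str, List[SliceInfo]] = field(default_factory=dict)

    def register(self, info: SliceInfo) -> int:
        """Record a newly written slice and return its position in import order."""
        slices = self.tables.setdefault(info.table_name, [])
        slices.append(info)
        return len(slices) - 1

    def slices_of(self, table_name: str) -> List[SliceInfo]:
        """Return a copy of the table's slice list (empty if the table is unknown)."""
        return list(self.tables.get(table_name, []))
```

A reader could then ask the master for `slices_of("url_table")` to learn which Base and Patch files to open, and in which order.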

Description

Technical Field

[0001] This application relates to a method for orderly managing table data based on the Hadoop Distributed File System (HDFS) and a system using the method, and in particular to a method and system that perform slice management on table data in HDFS, sort the slices and generate index data for them, and manage the table data in the form of files.

Background Art

[0002] Many systems need to manage massive amounts of data, and Hadoop technology has therefore been widely adopted: massive data such as logs, web pages, and URLs can be stored in the Hadoop Distributed File System (HDFS). Typically, the following operations need to be performed on such data:

[0003] 1. In log analysis and problem-tracing programs, manually inspecting one or more pieces of data;

[0004] 2. Reading data in batches;

[0005] 3. Traversing all or part of the data in batches in a specific order, such as traversing all URLs of a site;

[00...
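One way the access patterns listed in [0003] to [0005] could be served over a Base and its Patches is a k-way merge of the key-sorted slices, sketched below. This read path is an assumption rather than something stated in this excerpt: each slice is modeled as an in-memory list of `(key, op, value)` records sorted by key, slice 0 is the Base, later slices are Patches whose operations are applied after earlier ones, and an update simply replaces the whole value instead of a single field. The function names are hypothetical.

```python
import heapq
from typing import Iterator, List, Optional, Tuple

# (key, op, value): op is "insert", "update" or "delete"; value is None for deletes.
Record = Tuple[str, str, Optional[str]]


def _tagged(slice_no: int, records: List[Record], prefix: str) -> Iterator[tuple]:
    """Tag each matching record with its slice number and position so that
    heapq.merge orders ties on the same key by slice (Base first), then position."""
    for pos, (key, op, value) in enumerate(records):
        if key.startswith(prefix):
            yield (key, slice_no, pos, op, value)


def merged_scan(slices: List[List[Record]], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (key, value) in global key order after applying every operation per key.
    Each slice must already be sorted by key; slices[0] is the Base, the rest are Patches."""
    merged = heapq.merge(*[_tagged(i, s, prefix) for i, s in enumerate(slices)])
    current_key: Optional[str] = None
    current_value: Optional[str] = None
    for key, _slice_no, _pos, op, value in merged:
        if key != current_key:
            # Emit the finished key unless its last operation was a delete.
            if current_key is not None and current_value is not None:
                yield current_key, current_value
            current_key, current_value = key, None
        current_value = None if op == "delete" else value
    if current_key is not None and current_value is not None:
        yield current_key, current_value


def lookup(slices: List[List[Record]], key: str) -> Optional[str]:
    """Point lookup for manual inspection: scan only keys that start with `key`."""
    for k, v in merged_scan(slices, prefix=key):
        if k == key:
            return v
    return None
```

For example, with a Base inserting `url/a`, `url/b`, `url/c` and a Patch that updates `url/a`, deletes `url/b`, and inserts `url/d`, `merged_scan` yields `url/a`, `url/c`, and `url/d` in key order. A real reader would use the index files to seek directly to the first matching key rather than filtering every record.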


Application Information


IPC(8): G06F17/30

Inventors: 张众, 谭待
