Data parallel processing system based on Cassandra

A Cassandra-based data parallel processing system in the field of parallel processing and data technology, achieving high reliability, high availability, and strong scalability.

Active Publication Date: 2013-05-15
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] To address the defects of the prior art, the present invention provides a Cassandra-based data parallel processing system that remedies the existing Cassandra system's lack of complex data processing functions. The system offers high reliability, good scalability, and a high throughput rate. It responds quickly to simple data queries and can also perform complex processing on massive amounts of data.

Examples

[0042] Table 1 shows the basic hardware configuration of the server for master node 1 of the present invention. If the system load is heavy, the modules on master node 1 can each be installed on a separate server.

[0043]
CPU              Memory   Hard disk   Operating system   Network
E5620 2.40 GHz   16 GB    800 GB      Linux 2.6          Dual 1000M network card or above, switch

[0044] Table 1 Hardware and network configuration of the master node

[0045] Node 2 is a computing and storage cluster that can be dynamically expanded. The basic configuration is shown in Table 2:

[0046]
CPU              Memory   Hard disk   Operating system   Network
E5620 2.40 GHz   4 GB     500 GB      Linux 2.6          10M network card

[0047] Table 2 Hardware and network configuration of child nodes

[0048] The present invention effectively combines the MapReduce data processing technology with the decentralized distributed storage technology. It utilizes the advantages of high r...

Abstract

The invention discloses a data parallel processing system based on Cassandra. The system comprises a Hadoop master node, a plurality of Hadoop child nodes, and a Cassandra storage back end deployed on the Hadoop child nodes. The master node comprises a user interface module, a Cassandra query module, a job scheduling module, a job queue module, and a job tracker. Each child node comprises a task tracker, an input module, an output module, and a MapReduce module. The user interface module receives a user request and determines whether it is a data query request, a request to submit a data processing job, or a job information query request. A data query request is sent to the Cassandra query module. A job submission request or a job information query request is sent to the job scheduling module. The system offers high reliability, good scalability, and a high throughput rate. It responds quickly to simple data queries and can also perform complex processing on massive data.
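
The routing rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, method names, and string return values below are hypothetical stand-ins chosen for illustration, and the two back-end modules are stubs that only echo what they received.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RequestType(Enum):
    """The three request types named in the abstract."""
    DATA_QUERY = auto()
    SUBMIT_JOB = auto()
    JOB_INFO_QUERY = auto()


@dataclass
class UserRequest:
    type: RequestType
    payload: str  # hypothetical: a query string or job description


class CassandraQueryModule:
    """Stub for the Cassandra query module (hypothetical interface)."""

    def handle(self, request: UserRequest) -> str:
        return f"cassandra-query:{request.payload}"


class JobSchedulingModule:
    """Stub for the job scheduling module (hypothetical interface)."""

    def handle(self, request: UserRequest) -> str:
        return f"job-scheduling:{request.type.name}:{request.payload}"


class UserInterfaceModule:
    """Classifies a request by type and forwards it to the matching module."""

    def __init__(self, query_module: CassandraQueryModule,
                 scheduling_module: JobSchedulingModule) -> None:
        self.query_module = query_module
        self.scheduling_module = scheduling_module

    def dispatch(self, request: UserRequest) -> str:
        # Simple data queries go straight to the Cassandra query module.
        if request.type is RequestType.DATA_QUERY:
            return self.query_module.handle(request)
        # Job submissions and job information queries go to the scheduler.
        if request.type in (RequestType.SUBMIT_JOB, RequestType.JOB_INFO_QUERY):
            return self.scheduling_module.handle(request)
        raise ValueError(f"unknown request type: {request.type!r}")


ui = UserInterfaceModule(CassandraQueryModule(), JobSchedulingModule())
print(ui.dispatch(UserRequest(RequestType.DATA_QUERY, "row-key-1")))
print(ui.dispatch(UserRequest(RequestType.SUBMIT_JOB, "word-count")))
```

Separating the two paths this way reflects the stated design goal: simple queries avoid the job queue and can return quickly, while heavier processing is handed to the scheduler.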

Description

Technical field

[0001] The invention belongs to the field of distributed computing and system architecture in computer science, and more specifically relates to a Cassandra-based data parallel processing system.

Background

[0002] Cassandra is an open-source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, column-oriented non-relational database with tunable consistency. It is based on the distributed design of Amazon's Dynamo and the data model of Google's BigTable. It was created at Facebook and is already used by some of the most popular websites. With the rise of Web 2.0, the amount of data is increasing rapidly, and the storage and processing requirements of massive data pose a challenge to traditional relational databases, which cannot meet the demands of ultra-large-scale, highly concurrent data processing. For example, a Web2.0 website needs to genera...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 石宣化, 金海, 吴松, 刘炜
Owner: HUAZHONG UNIV OF SCI & TECH