System and method for carrying out real-time statistics on mass data based on Hadoop

The technology applies a Hadoop cluster to real-time statistics of massive data. It addresses the inability of existing approaches to meet real-time statistics requirements and improves data storage speed.

Inactive Publication Date: 2015-05-27
INSPUR GROUP CO LTD
4 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

[0005] In the existing technology, massive data is uploaded to the HBase database offline, so it cannot meet the requirements of real-time data statistics.

Embodiment Construction

[0039] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0040] An embodiment of the present invention proposes a method for real-time statistics of massive data based on Hadoop. Referring to Figure 1, the method includes:

[0041] Step 101: Set up a Hadoop cluster consisting of multiple nodes, and an HBase database in the Hadoop cluster;

[0042] Step 102: Set up an in-memory database;

[0043] Step 103: Obtain network data, and parse the obtained network data;

[0044] Step 104: Organize the parsed data into a s...
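Steps 103 and 104 can be illustrated with a minimal sketch. It assumes the structured format and MQ queue described in the abstract, and everything concrete here is hypothetical: the space-separated record layout, the field names, the use of JSON as the structured format, and Python's standard `queue.Queue` standing in for a real message queue. The patent text does not specify any of these.

```python
import json
import queue

# Stand-in for the MQ queue; a real deployment would use a message broker.
mq = queue.Queue()

def parse_network_record(raw_line):
    """Parse one hypothetical 'timestamp source_ip url bytes' line into a dict."""
    timestamp, source_ip, url, size = raw_line.split()
    return {
        "timestamp": int(timestamp),
        "source_ip": source_ip,
        "url": url,
        "bytes": int(size),
    }

def enqueue_record(record, target_queue):
    """Serialize the parsed record to JSON (one possible structured format) and enqueue it."""
    target_queue.put(json.dumps(record, sort_keys=True))

# Hypothetical sample input, for illustration only.
raw_lines = [
    "1432684800 10.0.0.1 /index.html 512",
    "1432684801 10.0.0.2 /login 128",
]
for line in raw_lines:
    enqueue_record(parse_network_record(line), mq)
```

Decoupling parsing from enqueueing keeps the producer side simple: a downstream consumer only needs to agree on the serialized record format, not on how the raw network data was obtained.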

Abstract

The invention provides a system and a method for real-time statistics on massive data based on Hadoop. The system comprises a Hadoop cluster composed of a plurality of nodes, an HBase database in the Hadoop cluster, and an in-memory database. The system further comprises a network data processing unit, an enqueue unit, a Storm processing unit, and an uploading unit. The network data processing unit obtains network data and parses it; the enqueue unit organizes the parsed data into a structured data format and stores it in an MQ queue; the Storm processing unit performs stream computation on the data in the MQ queue through Storm and stores the processed data in the in-memory database; and the uploading unit summarizes and persists a predetermined amount of data in the in-memory database and uploads it to the HBase database of the Hadoop cluster. With the system and the method, massive data can be stored in the HBase database in real time.
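The interaction between the stream processing step and the uploading unit can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a plain Python class stands in for both the Storm computation and the in-memory database, a list stands in for the HBase table, and the per-URL count, the class name `InMemoryStats`, and the threshold of 3 are all hypothetical.

```python
from collections import Counter

class InMemoryStats:
    """Stand-in for the in-memory database: running counts per key."""

    def __init__(self, flush_threshold, persist):
        self.flush_threshold = flush_threshold  # the "predetermined amount" of records
        self.persist = persist                  # callback standing in for the HBase upload
        self.counts = Counter()
        self.pending = 0

    def process(self, record):
        """Streaming step: update the running count for this record's key."""
        self.counts[record["url"]] += 1
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Summarize the buffered counts and hand them to the persistent store."""
        if self.pending == 0:
            return
        self.persist(dict(self.counts))
        self.counts.clear()
        self.pending = 0

hbase_stub = []  # list of uploaded summaries; hypothetical replacement for an HBase table
stats = InMemoryStats(flush_threshold=3, persist=hbase_stub.append)
for url in ["/a", "/b", "/a", "/c"]:
    stats.process({"url": url})
```

Uploading summaries in batches rather than record by record means the persistent store receives fewer, larger writes, while the in-memory counts stay current between uploads.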

Description

Technical field

[0001] The invention relates to the technical field of network communication, in particular to a method and device for real-time statistics of massive data based on Hadoop.

Background

[0002] With the explosive growth of information data and the business needs of various industries, Hadoop, a distributed system infrastructure, has emerged. Hadoop provides high-speed computing and massive storage for clusters.

[0003] HBase is a distributed, column-oriented, highly reliable, and scalable open source database. It is a sub-project of the Hadoop project. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.

[0004] At present, offline statistics on massive data can be performed based on Hadoop. The implementation methods include: acquiring and caching massive data; processing the stored massive data; Upload to Hadoop's HBas...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/27
Inventor: 牛硕, 徐正礼, 魏金雷, 臧勇真, 赵明超
Owner: INSPUR GROUP CO LTD