Mass data real-time acquisition and processing method under supercomputing environment

A massive-data real-time collection technology, applied in the fields of electronic digital data processing, computing, and resource allocation. It addresses data loss, impacts on system security, and the failure of existing tools to meet the reliability requirements of data collection and processing, and it achieves high availability, guaranteed isolation, and improved data reliability.

Inactive Publication Date: 2018-10-19
XI AN JIAOTONG UNIV
1 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

However, common data collection tools can no longer meet the reliability requirements of data collection and processing in a supercomputing environment built from a very large number of computing nodes. If data is generated too fast, data loss occurs, and if the node storing metadata goes down, the security of the entire system is affected.


Examples

Embodiment

[0042] First, a request sent to the application by an external visitor is forwarded through the nginx proxy on the manager node and mapped to a specific container application. The flume process sends the log information in the log directory (including IP, date, addr, port, number of cores, memory, running time, etc.) to Kafka, and Spark then reads the messages from Kafka and computes statistics. The statistics simply accumulate the number of visits to the same entry, and the results are displayed through the data visualization tool ichart. When there is no data, the system is in a waiting state.
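The counting step described above can be sketched in plain Python, without Flume, Kafka, or Spark. This is only an illustration under stated assumptions: the comma-separated log layout, the field names, and the choice of the IP field as the counting key are all hypothetical, since the text does not specify them.

```python
from collections import Counter

# Hypothetical log layout (an assumption, not taken from the patent):
# ip,date,addr,port,cores,memory_mb,runtime_s
FIELDS = ("ip", "date", "addr", "port", "cores", "memory_mb", "runtime_s")


def parse_record(line):
    """Split one log line into a dict, or return None if it is malformed."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def update_counts(counts, batch):
    """Accumulate visit counts per IP across successive batches of log lines."""
    for line in batch:
        record = parse_record(line)
        if record is not None:
            counts[record["ip"]] += 1
    return counts


visit_counts = Counter()
batch_1 = [
    "10.0.0.1,2018-10-19,/app,8080,4,2048,12",
    "10.0.0.2,2018-10-19,/app,8080,2,1024,7",
]
batch_2 = [
    "10.0.0.1,2018-10-19,/app,8080,4,2048,3",
    "garbled line that is skipped",
]
update_counts(visit_counts, batch_1)
update_counts(visit_counts, batch_2)
print(dict(visit_counts))
```

The running counter stands in for the state a streaming job would keep between micro-batches; an empty batch leaves the counts unchanged, which corresponds to the waiting state mentioned in the text.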

[0043] The data source may generate data very quickly, while Spark's processing speed varies with the complexity of the computation logic. When the speeds of the two ends are not matched, data loss occurs. Kafka is therefore used as an intermediate buffer. Here, Kafka's role as a cache is the same as that of the cache between the computer's...
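The buffering idea can be illustrated with a small in-process sketch that uses only the Python standard library. It is an analogy, not Kafka: a bounded `queue.Queue` makes a fast producer wait when the buffer is full, whereas Kafka retains messages on disk for later consumption. The function name `run_pipeline`, the message format, and the delay values are assumptions made for this example.

```python
import queue
import threading
import time


def run_pipeline(n_messages, buffer_size=8, consumer_delay=0.001):
    """Run a fast producer and a slow consumer joined by a bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    received = []
    done = object()  # sentinel that marks the end of the stream

    def producer():
        for i in range(n_messages):
            buffer.put(f"log-{i}")  # blocks while the buffer is full
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            time.sleep(consumer_delay)  # simulate slower processing
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


received = run_pipeline(50)
print(len(received), "messages processed, none dropped")
```

Because the two ends only meet at the buffer, a mismatch in speed delays processing instead of dropping messages.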


Abstract

The invention discloses a method for real-time acquisition and processing of mass data in a supercomputing environment. First, the messages produced at the data source side by the supercomputing cluster are collected through the source end of the flume software. The collected messages are then gathered into the Kafka software through flume and stored after being buffered by Kafka. Finally, the messages to be processed are extracted from Kafka by the Spark software for data processing, so that real-time collection and processing of massive data in the supercomputing environment is realized. Kafka is used as an intermediate buffer, which improves the reliability of the data. A distributed message subscription system based on Kafka can have multiple message producers and multiple consumers, which guarantees high availability of the system's messages. The method is combined with docker containerization technology and load balancing technology to realize container orchestration and management, and it can be applied to a real, scalable, very large cluster environment with distributed data collection and real-time processing.
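The multiple-producer, multiple-consumer arrangement mentioned in the abstract can be sketched as an append-only log in which each consumer group keeps its own read offset. This is a toy, single-process model written for illustration; the class `TopicLog`, its methods, and the group names are hypothetical and are not the Kafka API.

```python
class TopicLog:
    """A toy append-only message log with one read offset per consumer group."""

    def __init__(self):
        self._messages = []
        self._offsets = {}

    def publish(self, message):
        """Append a message and return its position in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group, max_messages=10):
        """Return the next unread messages for a group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()

# Two producers write to the same log.
for producer_id in ("node-a", "node-b"):
    for seq in range(3):
        log.publish(f"{producer_id}:{seq}")

# Two consumer groups read independently of each other.
stats_batch = log.poll("stats", max_messages=4)
archive_batch = log.poll("archive", max_messages=10)
stats_rest = log.poll("stats", max_messages=10)
print(stats_batch, stats_rest, len(archive_batch))
```

Since reading does not remove messages from the log, a slow or restarted group can resume from its own offset without affecting the others.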

Description

Technical field

[0001] The present invention relates to a software architecture for massive data collection and processing in a supercomputing center environment, and in particular to a framework for real-time collection and processing of massive data under the requirements of high concurrency, high availability, data security, and data completeness. Stream data processing technology, distributed message subscription technology, and distributed storage technology are applied to build a platform that spans massive data collection through processing.

Background technique

[0002] With the implementation of China's innovation-driven strategy, the continuous advancement of industrial transformation and upgrading, and the deep integration of informatization and industrialization, industrial product research and development has received unprecedented attention. The Ministry of Science and Technology of China proposed: Relying on the national high-performance computing environment, combine...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/505
Inventors: 伍卫国, 张祥俊
Owner: XI AN JIAOTONG UNIV