IndexR real-time data analyzing library

A real-time data analysis technology applied in the Internet field. It addresses problems such as existing data formats that scan and compress well but lack effective indexes and flexibility, and old data that cannot be deleted. It achieves a simple and reliable structure, high availability, and efficient hardware utilization.

Publication date: 2017-09-05 (status: inactive)
Owner: 广州舜飞信息科技有限公司 (Guangzhou Shunfei Information Technology Co., Ltd.)
Cites: 0; cited by: 33

AI Technical Summary

Problems solved by technology

4. Scalability, low cost, and easy maintenance: the business grows rapidly, new data sources keep appearing, new tables are added, and old data cannot be deleted, all of which create heavy pressure on cost and on operations and maintenance.
Current data formats such as Parquet and ORC usually have good scanning and compression performance, but lack effective indexes and the necessary flexibility.
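The gap named above is the lack of an index inside the file. As a rough illustration of what such an index can buy, the sketch below assumes a column split into fixed-size packs, each carrying its minimum and maximum value, so that a range filter can skip whole packs without reading them. The pack size and the names `Pack`, `build_packs`, and `count_in_range` are hypothetical; this is a generic min/max (zone-map style) technique, not IndexR's actual file format or API.

```python
# Hypothetical illustration of a per-pack min/max index over one column;
# not IndexR's actual file format or API.
from dataclasses import dataclass
from typing import List, Tuple

PACK_SIZE = 4  # hypothetical; kept tiny so the example is easy to trace


@dataclass
class Pack:
    """A fixed-size slice of one column plus its min/max summary."""
    values: List[int]
    lo: int
    hi: int


def build_packs(column: List[int], pack_size: int = PACK_SIZE) -> List[Pack]:
    """Split a column into packs and record the min and max of each."""
    packs = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        packs.append(Pack(chunk, min(chunk), max(chunk)))
    return packs


def count_in_range(packs: List[Pack], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high] using the min/max summaries.

    Returns (matching_count, packs_scanned_value_by_value).
    """
    count = 0
    scanned = 0
    for pack in packs:
        if pack.hi < low or pack.lo > high:
            continue  # no value can match: skip the pack entirely
        if low <= pack.lo and pack.hi <= high:
            count += len(pack.values)  # every value matches: no scan needed
            continue
        scanned += 1  # partial overlap: fall back to checking each value
        count += sum(1 for v in pack.values if low <= v <= high)
    return count, scanned


packs = build_packs(list(range(16)))
print(count_in_range(packs, 5, 9))  # (5, 2): two of the four packs are skipped
```

Min/max summaries pay off most when values are clustered, for example when rows arrive roughly in time order; on randomly ordered data most packs overlap the filter and little is skipped.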



Examples


Specific embodiment

[0031] An IndexR real-time data analysis library comprises a system architecture, a deployment architecture, a storage structure, and real-time modules.

[0032] In the system architecture, IndexR is responsible for the file storage format (including indexes and data), real-time data import, table definition operations, query optimization, and data caching. The distributed computing framework (Drill / Spark) is responsible for the actual query operations on IndexR data, as well as other computing tasks. Hadoop and its peripheral tools provide distributed file storage, offline batch computing, offline data management, and various offline ETL tasks; combined with Hadoop, IndexR can serve as a highly compressed file format with its own index and is compatible with all Hive operations. Kafka acts as the message queue through which data flows into IndexR, and Zookeeper handles cluster state management.
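The data path in [0032] (Kafka into IndexR, files on HDFS, queries through Drill or Spark) can be pictured with a toy model. The sketch below is an assumption-laden illustration, not IndexR's code: a plain Python list stands in for the Kafka topic, an in-memory buffer holds rows not yet written out, and a list of column-oriented dicts stands in for segment files on HDFS. The class `RealtimeTable` and its flush threshold are hypothetical. What it shows is that a query reading both the buffer and the flushed segments sees every imported row immediately.

```python
# Hypothetical toy model of real-time ingestion; not IndexR's code.
# A plain list stands in for a Kafka topic and a list of dicts for HDFS segment files.
from typing import Dict, List


class RealtimeTable:
    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.buffer: List[Dict[str, int]] = []          # rows not yet flushed
        self.segments: List[Dict[str, List[int]]] = []  # flushed, column-oriented

    def ingest(self, row: Dict[str, int]) -> None:
        """Append one message's row; flush once the buffer reaches the threshold."""
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Convert buffered rows into one column-oriented segment.

        Assumes every row has the same keys as the first one.
        """
        if not self.buffer:
            return
        columns = {name: [r[name] for r in self.buffer] for name in self.buffer[0]}
        self.segments.append(columns)
        self.buffer = []

    def sum_column(self, name: str) -> int:
        """Read flushed segments and the live buffer, so no imported row is missed."""
        total = sum(sum(seg[name]) for seg in self.segments)
        total += sum(r[name] for r in self.buffer)
        return total


topic = [{"clicks": 1}, {"clicks": 2}, {"clicks": 3}, {"clicks": 4}]  # stand-in for Kafka
table = RealtimeTable(flush_threshold=3)
for message in topic:
    table.ingest(message)
print(len(table.segments), len(table.buffer))  # 1 1
print(table.sum_column("clicks"))  # 10: includes the row that is still only in memory
```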

[0033] In the Hadoop system environment, deploying IndexR on an existing cluster can usually be completed within hal...



Abstract

The invention discloses an IndexR real-time data analysis library. IndexR implements a structured, indexed, columnar data format that can be deployed in a distributed environment and processed in parallel. On top of this format, IndexR builds a data warehouse system that supports fast statistical analysis (OLAP) over massive data sets within the Hadoop ecosystem; data can be imported in real time and can be queried with zero delay. IndexR is designed to solve problems such as slow analysis in big data scenarios, data latency, and system complexity. In the IndexR real-time data analysis library, data is stored in HDFS; Zookeeper is used for communication and coordination within the cluster; Hive is used to manage partitioned data conveniently; data is imported rapidly in real time through Kafka; and the distributed query engine Apache Drill is used in the query layer.
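The abstract delegates partition management to Hive. Hive stores each partition of a table in a directory named `column=value`, which lets a query engine skip partitions that a filter rules out. A minimal sketch of that pruning step follows, assuming a single date partition column named `dt` and a table root `/warehouse/ad_events`; both names are hypothetical, and the functions are illustrative rather than part of IndexR or Hive.

```python
# Hypothetical illustration of Hive-style partition pruning; the table root and
# the partition column name "dt" are assumptions, not taken from IndexR.
from datetime import date
from typing import List


def partition_path(table_root: str, day: date) -> str:
    """Build a Hive-style partition directory of the form <root>/dt=YYYY-MM-DD."""
    return f"{table_root}/dt={day.isoformat()}"


def prune(paths: List[str], start: date, end: date) -> List[str]:
    """Keep only partition directories whose dt value lies in [start, end].

    Paths without a dt component are ignored.
    """
    kept = []
    for path in paths:
        for component in path.split("/"):
            if component.startswith("dt="):
                day = date.fromisoformat(component[len("dt="):])
                if start <= day <= end:
                    kept.append(path)
                break  # one dt component per path
    return kept


root = "/warehouse/ad_events"
paths = [partition_path(root, date(2017, 9, d)) for d in range(1, 8)]
print(prune(paths, date(2017, 9, 4), date(2017, 9, 5)))
# ['/warehouse/ad_events/dt=2017-09-04', '/warehouse/ad_events/dt=2017-09-05']
```

Pruning happens before any data file is opened, so a query restricted to two days touches two directories instead of the whole table.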

Description

Technical field

[0001] The invention belongs to the technical field of the Internet, and in particular relates to an IndexR real-time data analysis library.

Background art

[0002] The programmatic advertising business needs to connect with major media across the entire network and generates millions of analytical data records per second. These data track and describe the process of advertising activities in detail, such as the number of creative impressions, clicks, registrations generated by campaigns, and return visits. We need to analyze and process these data in real time for purposes including customer reporting, delivery optimization, fraud analysis, and fee settlement. The query patterns of data users are not fixed and cannot be predicted, and as business volume surges, the amount of data also grows sharply. We need a new technology that meets the following requirements: 1. Very large data sets with low query latency: the query mode cannot be predicted, so results cannot be pre-computed; the amount of table ...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30; G06Q30/02
CPC: G06F16/172; G06F16/13; G06F16/182; G06F16/22; G06F16/221; G06F16/2228; G06F16/245; G06F16/2453; G06F16/2465; G06F16/27; G06F2216/03; G06Q30/0246
Inventors: 李华煜, 韦万
Owner: 广州舜飞信息科技有限公司