A Hive-based hierarchical design method of a university data warehouse

A data warehouse design method, applied in the database field, that addresses the poor flexibility of the traditional three-tier design framework and its inability to support combined incremental and full data synchronization, and that achieves strong scalability.

Inactive Publication Date: 2019-01-11
北京桃花岛信息技术有限公司
6 Cites, 29 Cited by

AI Technical Summary

Problems solved by technology

[0004] The traditional data warehouse is mainly divided into three layers: the ODS data storage layer (which essentially keeps the full data set), the DW data warehouse layer, and the DM (data mart) layer. This traditional three-tier design framework cannot achieve combined incremental and full data synchronization, and it places the complex data logic in the DW layer, which makes it less flexible.
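
To make "incremental + full" synchronization concrete, the following is a minimal sketch in Python of one common way to fold an incremental batch into a full snapshot. It is not taken from the patent text: the function name `merge_incremental`, the key column `id`, and the timestamp column `updated_at` are all assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_incremental(full: List[Row], delta: List[Row],
                      key: str = "id", ts: str = "updated_at") -> List[Row]:
    """Fold an incremental batch into a full snapshot.

    Hypothetical helper: rows are matched on `key`; when a key appears in
    both inputs, the row with the newer (or equal) `ts` value wins.
    """
    merged = {row[key]: row for row in full}
    for row in delta:
        current = merged.get(row[key])
        if current is None or row[ts] >= current[ts]:
            merged[row[key]] = row
    # Sort by key so the result is deterministic.
    return sorted(merged.values(), key=lambda r: r[key])


# Illustrative data only; the columns are assumptions, not from the source.
full_snapshot = [
    {"id": 1, "name": "Alice", "updated_at": "2019-01-01"},
    {"id": 2, "name": "Bob", "updated_at": "2019-01-01"},
]
daily_delta = [
    {"id": 2, "name": "Robert", "updated_at": "2019-01-02"},  # changed row
    {"id": 3, "name": "Carol", "updated_at": "2019-01-02"},   # new row
]
new_snapshot = merge_incremental(full_snapshot, daily_delta)
```

The point of the sketch is only that the full data set and the daily increment are kept separately and then reconciled by key, rather than reloading everything on each run.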

Examples

Embodiment 1

[0048] As shown in Figure 1, the university data warehouse framework is divided into four layers: the data source, the data storage layer, the data analysis layer, and the data application layer.

[0049] The data sources include data from the school's various systems, in formats that include structured tables and unstructured log data.

[0050] ETL tools, such as Sqoop or the open-source Kettle, clean, transform, and load data from the data sources onto the Hadoop distributed platform, where it is stored in HDFS (the Hadoop Distributed File System) and processed with Hive.

[0051] Using Hive, the data in the data storage layer is built into a data warehouse, which forms the data analysis layer. The data warehouse is divided into an ODS data storage layer, a DWD data detail layer, a DW data summary layer, and a DWA data application layer.
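
The four warehouse layers named above can be sketched as one Hive database per layer. The Python below only builds the HiveQL statements as strings and does not connect to Hive; the database names and the `univ` prefix are assumptions for illustration, not names given in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str  # hypothetical Hive database suffix
    role: str  # role of the layer as named in the text


# Layer order follows the description: ODS -> DWD -> DW -> DWA.
LAYERS = [
    Layer("ods", "ODS data storage layer"),
    Layer("dwd", "DWD data detail layer"),
    Layer("dw", "DW data summary layer"),
    Layer("dwa", "DWA data application layer"),
]


def create_database_statements(prefix: str = "univ") -> List[str]:
    """Return one CREATE DATABASE statement per warehouse layer."""
    return [
        f"CREATE DATABASE IF NOT EXISTS {prefix}_{layer.name} "
        f"COMMENT '{layer.role}';"
        for layer in LAYERS
    ]


statements = create_database_statements()
for statement in statements:
    print(statement)
```

Keeping each layer in its own database is one possible layout; a table-name prefix per layer would serve the same purpose.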

[0052] Among them, the ODS data storage layer is a data cache layer, which is used to store the acquired original data, re...

Abstract

The invention relates to a Hive-based hierarchical design method for a university data warehouse. The method comprises the following steps: acquiring data; extracting the data with an ETL tool; synchronizing the acquired structured and unstructured data to the Hive platform; constructing a data warehouse with Hive and dividing it into an ODS data storage layer, a DWD data detail layer, a DW data aggregation layer, and a DWA data application layer; performing data warehouse modeling by determining the analysis topic, applying a dimensional modeling method, and designing the dimension tables at the smallest granularity; and designing the fact tables, which are divided into non-partitioned fact tables and partitioned fact tables. The hierarchical design provided by the invention is more flexible and extensible than the three-layer design of other big data warehouses, allows analysis topics to be added later according to service requirements, and effectively combines the advantages of the Hive big data platform with the star schema design method for data warehouses.
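
The distinction between non-partitioned and partitioned fact tables can be illustrated with a small DDL generator. This is a sketch under stated assumptions: the table names, column names, the `dt` partition column, and the ORC storage format are hypothetical choices, and the code only produces HiveQL text without executing it.

```python
from typing import List, Optional, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def fact_table_ddl(table: str, columns: List[Column],
                   partition_column: Optional[Column] = None) -> str:
    """Build a CREATE TABLE statement for a fact table.

    When `partition_column` is given, a PARTITIONED BY clause is added;
    otherwise the table is non-partitioned.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)"
    if partition_column is not None:
        name, hive_type = partition_column
        ddl += f"\nPARTITIONED BY ({name} {hive_type})"
    return ddl + "\nSTORED AS ORC;"


# Hypothetical star-schema fact table: foreign keys to dimension tables
# plus one measure. None of these names come from the source.
columns = [
    ("student_key", "BIGINT"),
    ("date_key", "INT"),
    ("amount", "DECIMAL(10,2)"),
]
plain_ddl = fact_table_ddl("dwd.fact_card_payment_all", columns)
partitioned_ddl = fact_table_ddl("dwd.fact_card_payment", columns,
                                 partition_column=("dt", "STRING"))
print(partitioned_ddl)
```

A date partition of this kind is a typical fit for fact data that arrives in daily batches, while small or rarely changing fact tables can stay non-partitioned.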

Description

Technical field

[0001] The invention belongs to the technical field of databases, and in particular relates to a Hive-based hierarchical design method for university data warehouses.

Background technique

[0002] As university information systems mature and management requirements continue to rise, data warehouse technology can be introduced to restructure the data of these systems. The warehouse can be designed around the characteristics and development needs of universities, from a perspective that is more conducive to decision analysis. Data mining and other analyses are then carried out on the data warehouse so that these valuable data resources deliver their real informational value, improving the utilization of management information data and, in turn, the management level of universities.

[0003] Hive is a data warehouse tool based on Hadoop, which can map structured data files into a database table, and ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/21, G06F16/28
Inventor: 杨连群
Owner: 北京桃花岛信息技术有限公司