Method for changing Hive data based on indexes

A data indexing technique in the fields of electronic digital data processing, digital information retrieval, and special-purpose data processing applications. It addresses the problems that Hive does not support data changes, that changes made in the source database therefore cannot be reflected synchronously, and that records cannot be updated accurately one by one. The result is incremental updating of Hive data.

Pending Publication Date: 2022-01-28
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

However, this scenario raises an important problem: historical data can be collected in a single pass while the system is being built, but it is unclear how the incremental data subsequently produced by business departments should be brought into the lake so that it enters accurately and the data stays consistent.
The main problem with using HDFS and Hive as the storage engine of a data lake today is that data modified by a business department cannot be accurately updated record by record when it is synchronized to Hive.
[0005] When users synchronize data from traditional databases to a data lake, Hive cannot support data changes, so changes made in the database are not reflected synchronously. To resolve this, the invention proposes a method for changing Hive data based on indexes.


Examples

Embodiment Construction

[0032] To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions are described clearly and completely below with reference to the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0033] The method for changing Hive data based on indexes includes the following steps:

[0034] In the first step, Spark (the computing engine) obtains the primary keys (Keys) of the stock data HData and the primary keys (Keys) of the incremental data SData;

[0035] In the second step, Spark obtains the file path of the corresponding historical stoc...

Abstract

The invention relates in particular to a method for changing Hive data based on indexes. In the method, Spark first obtains the primary keys (Keys) of the stock data HData and the primary keys (Keys) of the incremental data SData. Spark then uses the primary keys of the incremental data SData to look up, in an HBase index table, the file path of the corresponding historical stock data HData. Next, Spark merges the incremental data SData with the corresponding historical stock data HData according to the primary keys and a flag bit, producing a new file. Finally, Spark determines from the primary keys and the file path whether each file is still valid, and marks any file that is no longer valid as invalid. Because the HBase index table records file locations and Spark merges files quickly, the files affected by a modification can be located and read quickly, so newly added data is merged with the historical stock data and the Hive data is updated incrementally.
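
The four steps in the abstract can be illustrated with a small, self-contained sketch. It is not the patented implementation: plain Python dictionaries stand in for Spark, the HBase index table, and the Hive files, and every name in it (`Lake`, `apply_increment`, `read_valid`, the `part-…` paths) is hypothetical. The source mentions a flag bit but does not define its values, so the `UPSERT` and `DELETE` flags below are an assumption made for illustration.

```python
from dataclasses import dataclass, field

# Assumed flag values; the source only says a "flag bit" drives the merge.
UPSERT, DELETE = "U", "D"


@dataclass
class Lake:
    """In-memory stand-in: `files` plays the Hive files, `index` the HBase index table."""
    files: dict = field(default_factory=dict)   # file path -> {primary key: row}
    index: dict = field(default_factory=dict)   # primary key -> path of the file holding it
    invalid: set = field(default_factory=set)   # paths marked invalid
    version: int = 0                            # counter used to name new files


def apply_increment(lake, sdata):
    """Apply incremental data, given as (key, flag, row) tuples, and return the new file path."""
    # Step 1: collect the primary keys of the incremental data.
    keys = [key for key, _, _ in sdata]

    # Step 2: use the index to find which stock files hold those keys.
    touched = {lake.index[key] for key in keys if key in lake.index}

    # Step 3: merge the rows of the touched files with the increment.
    merged = {}
    for path in sorted(touched):
        merged.update(lake.files[path])
    for key, flag, row in sdata:
        if flag == DELETE:
            merged.pop(key, None)
        else:
            merged[key] = row
    lake.version += 1
    new_path = f"part-{lake.version:05d}"
    lake.files[new_path] = merged

    # Step 4: mark the replaced files invalid and repoint the index.
    lake.invalid |= touched
    for key, flag, _ in sdata:
        if flag == DELETE:
            lake.index.pop(key, None)
    for key in merged:
        lake.index[key] = new_path
    return new_path


def read_valid(lake):
    """Return all rows from files that have not been marked invalid."""
    rows = {}
    for path, content in lake.files.items():
        if path not in lake.invalid:
            rows.update(content)
    return rows
```

In this sketch only the files that the index associates with a changed key are read and rewritten; untouched files stay in place, which is the property the abstract attributes to the index-based approach.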

Description

Technical field

[0001] The invention relates to the technical field of big data computing applications, and in particular to a method for changing Hive data based on indexes.

Background

[0002] Building a new generation of information technology infrastructure, developing the digital economy, and the industrial Internet are currently among the most closely watched topics. These concepts touch every sector and all depend on the support of big data. For example, 5G networks, data centers, and artificial intelligence are all driven by technological innovation and built on information networks, and they support society's digital transformation, intelligent upgrading, and basic services. These technologies rest on data fusion and big data integration, draw on artificial intelligence and machine learning, and provide a data foundation and computing power for enterprises, governments, education, and other sectors. How to collect and st...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/13, G06F16/174, G06F16/182
CPC: G06F16/134, G06F16/1756, G06F16/182
Inventors: 周永进, 胡清
Owner: SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD