Kettle-based method for data extraction and statistics on a big data platform
Big data platform and data extraction technology, applied in database models, relational databases, electrical digital data processing, etc. It can address problems such as network resource consumption in the big data cluster.
Examples
Embodiment 1
[0030] A Kettle-based method for data extraction and statistics on a big data platform. The method modifies the Kettle source code to capture the status of each data extraction task and records it in an HBase table, called the historical situation table, which records the data volume of every data table;
[0031] The relational database is incrementally extracted by a Sqoop task on a daily schedule, so the amount of data extracted in one run is one day's data increment. The daily increment of each data table is recorded and written into the HBase table, so that data volume can be queried by any combination of table and time.
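The claim that one run extracts exactly one day's increment depends on the extractor remembering where the previous run stopped. The Python sketch below shows that bookkeeping under one assumption: the source table has a monotonically increasing integer check column, similar in spirit to Sqoop's `--incremental append` mode with `--check-column` and `--last-value`. The class and field names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalExtractor:
    """Hypothetical tracker for one table's daily incremental extraction.

    Assumes an increasing integer check column ("id"); this is an
    illustration, not the patent's Kettle/Sqoop implementation.
    """
    table: str
    last_value: int = 0  # highest check-column value extracted so far
    history: dict = field(default_factory=dict)  # day -> rows extracted that day

    def run(self, day: str, source_ids: list) -> int:
        """Extract only rows whose id exceeds last_value; record the increment."""
        new_ids = [i for i in source_ids if i > self.last_value]
        if new_ids:
            self.last_value = max(new_ids)
        # Accumulate in case more than one run happens on the same day.
        self.history[day] = self.history.get(day, 0) + len(new_ids)
        return len(new_ids)


ext = IncrementalExtractor("person_info")
ext.run("20150603", [1, 2, 3])        # first run: ids 1-3 are new
ext.run("20150604", [1, 2, 3, 4, 5])  # next day: only ids 4 and 5 are new
print(ext.history)  # {'20150603': 3, '20150604': 2}
```

The per-day counts in `history` are the values that would then be written to the HBase historical situation table.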
[0032] Traditional relational databases that support online systems will coexist for a long time with big data technologies that handle offline statistical analysis. Between these two systems, Kettle acts as a bridge responsible for data transmission. Through the transformation of the source co...
Embodiment 2
[0034] On the basis of Embodiment 1, the method in this embodiment records the daily data increment of each data table in a data history table in HBase, and designs the rowkey (row primary key) of this history table as follows:
[0035]-[0038]
| No. | Rowkey format | Rowkey example | Qualifier |
|---|---|---|---|
| 1 | {table name} | person_info | data volume |
| 2 | {table name}{spacer}{time} | person_info@20150604 | data volume |
| 3 | {time}{spacer}{table name} | 20150604@person_info | data volume |
[0039] Here, the table name in each rowkey is the name of the data table being tracked, not the name of the historical situation table;
[0040] For rowkey 1, the data volume qualifier holds the total amount of data in the data table named by the rowkey, so the data volume of any given data table can be queried quickly;
[0041] In rowkey 2, the rowkey is composed of the data table name and the time, with the spacer separating the two. The amount of data in t...
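HBase stores rows sorted by rowkey, so the order of the fields decides which queries are cheap. Rowkey 2 keeps one table's days adjacent, and rowkey 3 keeps one day's tables adjacent. The Python sketch below emulates this with a plain dict and sorted key walks. It assumes "@" as the spacer (as in the examples above), times formatted as yyyyMMdd, and table names that never contain the spacer. The helper names and sample volumes are hypothetical.

```python
SEP = "@"  # spacer between table name and time


def total_key(table: str) -> str:
    """Rowkey 1: table name only; its cell holds the table's total volume."""
    return table


def table_time_key(table: str, day: str) -> str:
    """Rowkey 2: {table name}{spacer}{time}, e.g. person_info@20150604."""
    return f"{table}{SEP}{day}"


def time_table_key(day: str, table: str) -> str:
    """Rowkey 3: {time}{spacer}{table name}, e.g. 20150604@person_info."""
    return f"{day}{SEP}{table}"


def scan(rows: dict, start: str, stop: str) -> dict:
    """Emulate an HBase scan over [start, stop) by walking keys in sorted order."""
    return {k: rows[k] for k in sorted(rows) if start <= k < stop}


def prefix_scan(rows: dict, prefix: str) -> dict:
    """Emulate a prefix scan: every rowkey that starts with `prefix`."""
    return {k: rows[k] for k in sorted(rows) if k.startswith(prefix)}


# Hypothetical history table contents: rowkey -> data volume.
rows = {
    total_key("person_info"): 5,
    table_time_key("person_info", "20150603"): 3,
    table_time_key("person_info", "20150604"): 2,
    time_table_key("20150603", "person_info"): 3,
    time_table_key("20150604", "person_info"): 2,
    total_key("order_info"): 7,
    table_time_key("order_info", "20150604"): 7,
    time_table_key("20150604", "order_info"): 7,
}

# One table over time (rowkey 2).
print(prefix_scan(rows, "person_info" + SEP))
# {'person_info@20150603': 3, 'person_info@20150604': 2}

# All tables on one day (rowkey 3).
print(prefix_scan(rows, "20150604" + SEP))
# {'20150604@order_info': 7, '20150604@person_info': 2}

# One table over a date window; yyyyMMdd strings sort chronologically.
print(scan(rows, table_time_key("person_info", "20150604"),
           table_time_key("person_info", "20150605")))
# {'person_info@20150604': 2}
```

Including the spacer in each prefix matters. Without it, a scan for a hypothetical table named `person` would also pick up `person_info` rows.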
Embodiment 3
[0045] On the basis of Embodiment 2, once the Sqoop task information has been obtained, this embodiment records the data volume of that task in the historical situation table; all three rowkeys in the historical situation table must be written or updated.
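Each finished task touches three rows: the table's running total, the table-first daily row, and the time-first daily row. The Python sketch below uses a `defaultdict` as an in-memory stand-in for the historical situation table, mapping rowkey to data volume. It assumes the daily rows accumulate when a table has more than one extraction task on the same day; the patent only says the rows are "written or updated". The function name is hypothetical.

```python
from collections import defaultdict

SEP = "@"  # spacer, as in the rowkey examples


def record_task(history: defaultdict, table: str, day: str, rows_extracted: int) -> None:
    """Write or update all three rowkeys for one finished extraction task.

    `history` stands in for the HBase historical situation table
    (rowkey -> data volume). Hypothetical helper, not the patent's code.
    """
    history[table] += rows_extracted                 # rowkey 1: running total
    history[f"{table}{SEP}{day}"] += rows_extracted  # rowkey 2: table first
    history[f"{day}{SEP}{table}"] += rows_extracted  # rowkey 3: time first


history = defaultdict(int)
record_task(history, "person_info", "20150603", 3)
record_task(history, "person_info", "20150604", 2)
record_task(history, "person_info", "20150604", 1)  # second task on the same day
print(history["person_info"])           # 6
print(history["person_info@20150604"])  # 3
print(history["20150604@person_info"])  # 3
```

Against a real cluster, each line could map to an atomic counter increment on the data volume qualifier, for example via the HBase Java client's `Table.incrementColumnValue`. HBase guarantees atomicity only within a single row, though, so the three updates are not atomic as a group. A failed task could leave them briefly inconsistent.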