Unified data development and distributed scheduling system

A scheduling system and distributed technology, applied in the field of big data. It addresses the lack of a unified architecture and development experience, error-prone operation, and the high difficulty for non-developers, and it improves data development efficiency, reduces usage costs, and enriches the supported business scenarios.

Pending Publication Date: 2021-10-22
深圳银兴智能数据有限公司

AI Technical Summary

Problems solved by technology

[0004] Data source: To unify data development, the first problem is how to unify many different types of data sources (such as SQL Server, MySQL, Oracle, MongoDB, HBase, Hive, FTP, HDFS, ES, etc.). The traditional data warehouse solution splits the development of ODS, EDW and ETL across tools from different vendors, which forces users to work on multiple platforms or with multiple different tools. The steps are cumbersome and error-prone, and the lack of a unified architecture and development experience makes operation complicated and operation and maintenance inefficient;
[0005] Data processing: The traditional method is based on offline analysis in stand-alone relational databases. The primary problem is that the business scenarios are not rich enough: only relational databases are supported, while others such as Hive and Impala are not, let alone the real-time message queue Kafka. Secondly, the processing speed of a single machine is low, which is unacceptable for many enterprises and organizations. These are the main challenges in data processing;
[0006] Scheduling problem: There is no unified scheduling system, and everything relies on manual Linux command operations. Even bringing a workflow online and configuring timed scheduling requires writing Python files for execution. This is too difficult for non-developers, and without a visual interface errors are hard to detect. At the same time, the original Airflow is too simple and its operation experience is not friendly enough, so the biggest problem in scheduling is technical difficulty;
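
The scheduling problem in [0006] comes down to running tasks in dependency order without hand-written commands. As a minimal sketch (not the system described in this application, and not Airflow's API), a workflow can be declared as plain data and ordered with the Python standard library's `graphlib.TopologicalSorter`. The task names and the `run_workflow` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract_mysql": set(),
    "extract_ftp": set(),
    "load_ods": {"extract_mysql", "extract_ftp"},
    "build_edw": {"load_ods"},
}


def run_workflow(dag, runner):
    """Run every task after all of its dependencies; return the execution order."""
    order = []
    # Iterating static_order() raises graphlib.CycleError if the graph has a cycle.
    for task in TopologicalSorter(dag).static_order():
        runner(task)
        order.append(task)
    return order


if __name__ == "__main__":
    executed = run_workflow(workflow, lambda name: print(f"running {name}"))
    print(executed)
```

Declaring the workflow as data rather than as a script is what would let a visual interface generate and validate it, which is the kind of entry point a non-developer needs.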

Embodiment Construction

[0040] To make the purpose, technical solution and advantages of the present invention clearer, the implementation of the invention is described in further detail below with reference to the accompanying drawings.

[0041] The invention provides a unified data development and distributed scheduling system. It solves the problem of how to unify data sources, improves data processing speed, enriches the supported business scenarios, reduces the technical difficulty of command-based scheduling, and improves O&M efficiency. Referring to Figure 1 to Figure 3, the system includes a big data self-service platform;

[0042] Further, one end of the big data self-service platform is connected to the home page, and the platform is provided with My Workbench, account management, resource management, project space, data authority management and system management;

[0043] The project space is respectively equip...

Abstract

The invention belongs to the technical field of big data, and in particular relates to a unified data development and distributed scheduling system. The system comprises a big data self-service platform, one end of which is connected to a homepage. The platform is provided with My Workbench, an account management module, a resource management module, a project space, a data authority management module and a system management module. A data management module, a data development module, a scheduling monitoring module and a project configuration module are arranged in the project space. An HDFS (Hadoop Distributed File System) file query module, a table maintenance module and a data query module are arranged in the data management module. A data source, a data exchange platform, a task development module and a workflow development module are arranged in the data development module. The data development module comprises a data integration module, a batch calculation module and a scheduling monitoring module, and the data integration module comprises a DataX task and a Sqoop task. The system solves the problems of high technical difficulty and unfriendly user experience, and improves the scheduling efficiency of operation and maintenance personnel.
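
The module hierarchy in the abstract is easier to check when written down as nested data. The following is a minimal sketch in Python: only the module names come from the abstract, while the dictionary layout and the `module_paths` helper are illustrative assumptions. The abstract lists scheduling monitoring under both the project space and the data development module; the sketch places it once, under the project space.

```python
# Module names are taken from the abstract; the nesting format is an assumption.
PLATFORM = {
    "big data self-service platform": {
        "my workbench": {},
        "account management": {},
        "resource management": {},
        "project space": {
            "data management": {
                "HDFS file query": {},
                "table maintenance": {},
                "data query": {},
            },
            "data development": {
                "data source": {},
                "data exchange platform": {},
                "task development": {},
                "workflow development": {},
                "data integration": {
                    "DataX task": {},
                    "Sqoop task": {},
                },
                "batch calculation": {},
            },
            "scheduling monitoring": {},
            "project configuration": {},
        },
        "data authority management": {},
        "system management": {},
    }
}


def module_paths(tree, prefix=()):
    """Yield the full path to every module, parents before children."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield " / ".join(path)
        yield from module_paths(children, path)


if __name__ == "__main__":
    for line in module_paths(PLATFORM):
        print(line)
```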

Description

Technical field

[0001] The invention relates to the field of big data technology, in particular to a unified data development and distributed scheduling system.

Background technique

[0002] Big data refers to a collection of data that cannot be captured, managed, and processed by conventional software tools within a certain period of time. It is a massive, high-growth and diverse information asset that requires new processing models to deliver stronger decision-making power, insight and discovery, and process optimization capabilities.

[0003] At present, in the process of big data workflow development, task development and scheduling monitoring, the following problems are usually faced:

[0004] Data source: To unify data development, the first problem is how to unify many different types of data sources (such as SQL Server, MySQL, Oracle, MongoDB, HBase, Hive, FTP, HDFS, ES, etc.). The traditional data warehouse solution is to divide the development of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F16/27, G06F16/28
CPC: G06F16/252, G06F16/27, G06F16/283
Inventor: 李勇
Owner: 深圳银兴智能数据有限公司