
Distributed scheduling analysis method, system and device for big data, and storage medium

Big data and distributed technology, applied in the field of data processing, addresses the low efficiency and poor standardization of big data analysis and processing and the poor scheduling ability for big data, and improves job processing performance and data analysis efficiency.

Inactive Publication Date: 2017-12-22
CHINA CONSTRUCTION BANK

AI Technical Summary

Problems solved by technology

[0005] In view of the above-mentioned defects of the prior art, the embodiments of the present invention provide a big data distributed scheduling analysis method, system, device and storage medium. By adopting a distributed scheduling method and a mature job framework, they can effectively solve problems that current approaches cannot overcome, such as poor big data scheduling ability, low processing efficiency, and low efficiency and poor standardization of big data analysis and processing.



Examples


Specific Embodiment

[0026] Figure 1 is a schematic flowchart of a big data distributed scheduling analysis method according to an embodiment of the present invention. Referring to Figure 1, the method includes:

[0027] In step S1, the scheduling server module obtains the usage status information of the distributed job servers in sequence and, according to that information, selects a job server that meets the resource idle standard to run the job program. The ETL systems deployed on the multiple job servers monitor their own status by regularly updating a status record in the database, which helps maintain stability.
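The selection in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ServerStatus` fields, the thresholds in `is_idle`, and the server names are all assumptions, since the text does not define the resource idle standard.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ServerStatus:
    """Usage snapshot for one job server (all fields are hypothetical)."""
    name: str
    cpu_load: float    # fraction of CPU in use, 0.0 to 1.0
    running_jobs: int  # jobs currently executing on this server


def is_idle(status: ServerStatus, max_cpu: float = 0.7, max_jobs: int = 4) -> bool:
    """Assumed 'resource idle standard': CPU load and job count both under a limit."""
    return status.cpu_load < max_cpu and status.running_jobs < max_jobs


def select_job_server(statuses: Iterable[ServerStatus]) -> Optional[ServerStatus]:
    """Check the servers in sequence and return the first one that is idle."""
    for status in statuses:
        if is_idle(status):
            return status
    return None  # no server currently meets the standard


servers = [
    ServerStatus("etl-01", cpu_load=0.92, running_jobs=5),
    ServerStatus("etl-02", cpu_load=0.35, running_jobs=1),
    ServerStatus("etl-03", cpu_load=0.10, running_jobs=0),
]
chosen = select_job_server(servers)
print(chosen.name if chosen else "no idle server")  # prints "etl-02"
```

Returning the first idle server mirrors the "in sequence" wording; a scheduler could equally rank all idle servers and pick the least loaded one.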

[0028] In step S2, the big data analysis module uses a componentized job framework to perform big data analysis and processing on the data produced by running the job program.
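One way to read "componentized" in step S2 is that analysis is assembled from small, reusable processing steps. The sketch below assumes that reading; the component names (`drop_incomplete`, `sum_by_branch`), the record fields, and `run_pipeline` are hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Component = Callable[[List[Record]], List[Record]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Component: discard records that have no amount."""
    return [r for r in records if r.get("amount") is not None]


def sum_by_branch(records: List[Record]) -> List[Record]:
    """Component: aggregate amounts per branch, sorted by branch name."""
    totals: Dict[str, float] = {}
    for r in records:
        totals[r["branch"]] = totals.get(r["branch"], 0.0) + r["amount"]
    return [{"branch": b, "total": t} for b, t in sorted(totals.items())]


def run_pipeline(records: List[Record], components: List[Component]) -> List[Record]:
    """Apply each component in order, feeding its output to the next one."""
    for component in components:
        records = component(records)
    return records


job_output = [
    {"branch": "A", "amount": 10.0},
    {"branch": "B", "amount": None},
    {"branch": "A", "amount": 5.0},
    {"branch": "B", "amount": 2.5},
]
result = run_pipeline(job_output, [drop_incomplete, sum_by_branch])
print(result)
```

Because each component has the same signature, components can be reordered or reused across jobs without changing the runner.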

[0029] The present invention can effectively improve job processing performance and data analysis efficiency based on big data by adopting a distributed scheduling mod...

Embodiment 2

[0043] In another embodiment of the present invention, in addition to the above-mentioned processing method, the method further includes adopting a dual monitoring mode to monitor and maintain the running process of the scheduling server module. In dual monitoring mode, check_monitor monitors monitor, and monitor in turn monitors check_monitor. At the same time, monitor watches the ETL server process and scans version patches to keep the ETL system version consistent.

[0044] By adopting the above-mentioned dual monitoring modules, the operation stability of each Server can be effectively improved and manual intervention can be reduced.
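The mutual supervision described above can be illustrated with a small in-memory simulation. Only the names monitor and check_monitor come from the text; the `Watchdog` class, the restart logic, and the event log are assumptions made for illustration, and a real deployment would check operating-system processes rather than flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Watchdog:
    """One of the two mutually monitoring processes, simulated in memory."""
    name: str
    alive: bool = True
    restarts: int = 0


def check_peer(watcher: Watchdog, peer: Watchdog, log: List[str]) -> None:
    """If the watcher is alive and its peer is down, restart the peer."""
    if watcher.alive and not peer.alive:
        peer.alive = True
        peer.restarts += 1
        log.append(f"{watcher.name} restarted {peer.name}")


def monitoring_round(monitor: Watchdog, check_monitor: Watchdog, log: List[str]) -> None:
    """One cycle of dual monitoring: each process checks the other."""
    check_peer(check_monitor, monitor, log)
    check_peer(monitor, check_monitor, log)


monitor = Watchdog("monitor")
check_monitor = Watchdog("check_monitor")
events: List[str] = []

monitor.alive = False          # simulate monitor crashing
monitoring_round(monitor, check_monitor, events)
check_monitor.alive = False    # simulate check_monitor crashing
monitoring_round(monitor, check_monitor, events)
print(events)
```

The scheme recovers from the failure of either process alone; if both fail in the same cycle, nothing is left to restart them, which is why manual intervention is reduced rather than eliminated.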

Embodiment 3

[0046] In another embodiment of the present invention, in addition to the above-mentioned processing method, the method further includes: the scheduling server module selects the corresponding language execution tool according to the job language category, and uses dynamic loading to execute programs written in multiple languages. As the core module, the scheduling server module supports multilingual scheduling; it refreshes an Oracle status flag field to share state between different servers, between job flows, and between jobs, and it loads jobs and adjusts job relationships to complete job scheduling.

[0047] By supporting collaborative development among developers who work in different languages, the system lets developers focus on business logic rather than on a specific language.
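Selecting an execution tool by job language can be sketched as a lookup table. The table contents, the function names, and the file names below are hypothetical; the patent does not list the actual tools, and the table lookup here only stands in for whatever dynamic loading mechanism the system uses.

```python
import subprocess
import sys
from typing import Dict, List

# Hypothetical mapping from job language category to the tool that runs it.
EXECUTION_TOOLS: Dict[str, List[str]] = {
    "python": [sys.executable],
    "perl": ["perl"],
    "bash": ["bash"],
    "java": ["java", "-jar"],
}


def build_command(language: str, program_path: str) -> List[str]:
    """Return the command line for a job, chosen by its language category."""
    key = language.lower()
    if key not in EXECUTION_TOOLS:
        raise ValueError(f"unsupported job language: {language}")
    return EXECUTION_TOOLS[key] + [program_path]


def run_job(language: str, program_path: str) -> int:
    """Run the job with the selected tool and return its exit code."""
    return subprocess.run(build_command(language, program_path), check=False).returncode


print(build_command("Perl", "load_accounts.pl"))  # prints ['perl', 'load_accounts.pl']
```

Adding a language then means adding one table entry, so job authors do not need to know how the scheduler launches their programs.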


Abstract

The invention discloses a distributed scheduling analysis method, system and device for big data, and a storage medium. The method comprises the following steps: a scheduling server module obtains the usage status information of the distributed job servers in sequence and, according to that information, selects a job server that meets the resource idle standard to run a job program; and a big data analysis module uses a componentized job framework to perform big data analysis and processing on the data produced by running the job program. Through distributed scheduling and a mature job framework, job processing performance and data analysis efficiency for big data can be effectively improved.

Description

Technical field

[0001] The present invention relates to the field of data processing, and more specifically to a big data distributed scheduling analysis method, system, device and storage medium.

Background technique

[0002] With the popularity of the Internet and e-commerce, the scheduling and analysis of big data have become increasingly complicated. For big data processing in actual production, the complexity and diversity of real data make it essential to combine multiple technologies (such as the distributed platform Hadoop and the distributed database GreenPlum) and multiple languages (commonly Java, Perl, Python and Bash), and even distributed processing is required.

[0003] Generally speaking, existing ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it, and loading it into a destination) systems are mainly used to construct a Data Warehouse (DW), whose ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2219, G06F16/252
Inventor: 李威范会善苏建标王泽龙吴仰波
Owner: CHINA CONSTRUCTION BANK