[0027] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
[0028] The inventor identified a technical problem with the existing stand-alone deployment, in which the business system is deployed on one or a few fixed machines. Its disadvantages include low reliability: a task is fixedly deployed on one server, so if that server becomes abnormal, the operation of the tasks deployed on it is affected. In addition, stand-alone performance is limited: as the business grows and the data volume increases, the system may face performance bottlenecks. Under the stand-alone deployment mode, when the batch running system hits a performance bottleneck, the only remedy is to add servers and split tasks. However, this approach lacks effective resource management and is relatively costly; moreover, the reliability and robustness of the system still need improvement, and the system is not easy to maintain.
[0029] Therefore, in view of the above technical problems, and considering that with the development of Internet finance the business scale, business types, and business data volume of banking systems have expanded rapidly, the performance and reliability of the stand-alone batch running architecture can no longer meet business requirements, and changes at the software architecture level are needed to introduce a highly available, scalable, and easy-to-maintain batch task management framework.
[0030] The present invention combines existing distributed technology with banking business requirements to design a batch running solution based on distributed deployment, that is, a batch running solution based on a distributed architecture; see Figure 2. The solution adopts distributed cluster deployment: it consists of a scheduling module and a task cluster, and each server in the cluster is an independent task node with the same capability. Nodes execute tasks according to scheduling instructions. The solution can be based on the Dubbo framework to introduce distributed technology; through customized development of the scheduling module, the scheduling module coordinates task execution and evenly distributes tasks to the task nodes in the distributed cluster. Dubbo is an open-source, high-performance distributed service framework that exposes and consumes services through high-performance RPC. In this solution, the scheduling module assigns execution instructions for batch running tasks to the task nodes in the cluster, which reduces the coupling between tasks and servers and enhances the reliability of the system. At the same time, load balancing of the task cluster can be realized through the scheduling module, which improves the overall resource utilization of the cluster; the cost is low and maintenance is simple. The batch processing solution based on the distributed architecture is introduced in detail below.
[0031] Figure 2 is a schematic structural diagram of a batch processing device based on a distributed architecture in an embodiment of the present invention. As shown in Figure 2, the device includes: a task trigger module 01, a scheduling module 02, and a task cluster 03 including multiple task execution nodes; wherein:
[0032] A task trigger module, configured to initiate a batch running task execution request to the scheduling module according to a pre-configured batch running task execution strategy;
[0033] A scheduling module, configured to determine, from a plurality of task execution nodes, a task execution node that executes the batch running task execution request according to the batch running task execution request, and send the batch running task execution instruction to the determined task execution node;
[0034] Multiple task execution nodes are used to execute batch running tasks according to the batch running task execution instructions sent by the scheduling module.
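By way of illustration only, the following Java sketch shows how the three roles described above could interact; all class, record, and method names (TaskTriggerModule, Scheduler, TaskExecutionNode, BatchTaskRequest, BatchTaskInstruction) and the simple hash-based node selection are assumptions made for this sketch and do not limit the claimed device.

    import java.util.List;

    // Hypothetical contract of a task execution node in the task cluster.
    interface TaskExecutionNode {
        // Executes a batch running task according to the instruction from the scheduling module.
        void execute(BatchTaskInstruction instruction);
    }

    // Hypothetical scheduling module: picks a node for a request and sends it the instruction.
    class Scheduler {
        private final List<TaskExecutionNode> cluster;
        Scheduler(List<TaskExecutionNode> cluster) { this.cluster = cluster; }

        void dispatch(BatchTaskRequest request) {
            TaskExecutionNode node = selectNode(request);   // here: simple hash; could be load-aware
            node.execute(new BatchTaskInstruction(request.taskId()));
        }

        private TaskExecutionNode selectNode(BatchTaskRequest request) {
            return cluster.get(Math.floorMod(request.taskId().hashCode(), cluster.size()));
        }
    }

    // Hypothetical task trigger module: initiates an execution request per the configured strategy.
    class TaskTriggerModule {
        private final Scheduler scheduler;
        TaskTriggerModule(Scheduler scheduler) { this.scheduler = scheduler; }

        void onTrigger(String taskId) {
            scheduler.dispatch(new BatchTaskRequest(taskId));
        }
    }

    record BatchTaskRequest(String taskId) {}
    record BatchTaskInstruction(String taskId) {}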
[0035] The batch processing device based on the distributed architecture provided by the embodiment of the present invention improves the reliability, system performance and ease of maintenance of the batch processing.
[0036] Each step involved in the embodiment of the present invention is described in detail below with reference to Figure 2.
[0037] 1. First, introduce the steps to transform the batch system.
[0038] The Dubbo service governance framework is introduced to build the task node cluster, and the batch tasks are transformed into RPC services that are published externally. During this transformation, the inventor overcame the following technical difficulties:
[0039] 1. Transforming the existing stand-alone batch tasks into RPC services under the Dubbo framework. The workload of this work is proportional to the number of existing batch tasks, but the transformation from a stand-alone system to a distributed system is a "cross-generation" change for the system.
[0040] 2. How to integrate the Dubbo framework into the existing framework as an "auxiliary component" while retaining the original main framework is a major difficulty in the transformation process. Dubbo components are generally used as the core main framework of a distributed software project. In this patent, in order to reduce the cost of system transformation, the inventor creatively retained the main architecture of the original batch system, opened up the original architecture, and then inserted Dubbo into the original framework as an "auxiliary" sub-framework.
[0041] RPC stands for Remote Procedure Call and is based on the remote procedure call protocol. A complete RPC framework includes service discovery, load balancing, fault tolerance, network transmission, and other mechanism modules. The Dubbo framework on which this patent is based is one of the best-known RPC implementations in the industry.
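As an illustrative sketch only, the following Java fragment shows one way an existing batch task could be exposed as an RPC service, assuming a recent Apache Dubbo release that provides the @DubboService annotation; the interface and class names (BatchTaskService, InterestAccrualTask) are hypothetical, and the interface and implementation would normally reside in separate source files.

    import org.apache.dubbo.config.annotation.DubboService;

    // Hypothetical RPC contract for a batch task, published to the task cluster via Dubbo.
    public interface BatchTaskService {
        // Runs one batch task for the given business date and returns an execution status.
        String runBatch(String taskId, String businessDate);
    }

    // The original stand-alone batch logic is retained and merely wrapped as a Dubbo service,
    // so Dubbo acts as an "auxiliary" sub-framework rather than replacing the main framework.
    @DubboService
    public class InterestAccrualTask implements BatchTaskService {
        @Override
        public String runBatch(String taskId, String businessDate) {
            // ... invoke the original stand-alone batch logic here, unchanged ...
            return "SUCCESS";
        }
    }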
[0042] 2. Second, introduce the batch running processing scheme based on the distributed architecture.
[0043] 1. First, introduce the pre-configured execution strategy of batch running tasks and the scheme of triggering batch running tasks.
[0044] In one embodiment, the execution strategy of a batch running task includes: the batch execution time and period of the task, the batch execution range, and the batch trigger mode.
[0045] In a specific implementation, the configuration of the batch running task execution strategy can be completed in a task configuration management center, and the configuration content can include the batch execution time and period, the batch execution range, the batch trigger mode, and so on. Tasks are generally run periodically: when the preset time point is reached, the task configuration management center triggers the task, and after the task is triggered, the task scheduling module completes the task scheduling.
[0046] In a specific implementation, the execution strategy belongs to the business category; that is, the corresponding execution plan is set according to business requirements. For example: the batch execution cycle plan (whether execution is yearly or monthly); the batch supplementary-run plan (whether to automatically rerun after an execution failure, and when); and the batch execution data range setting (control of the range of data to be processed).
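Purely as an illustrative sketch, such an execution strategy could be represented by a configuration object like the following Java record; the field names, enum values, and granularity are assumptions for illustration and do not prescribe the actual configuration format of the task configuration management center.

    import java.time.LocalTime;

    // Hypothetical batch execution period options.
    enum Period { DAILY, MONTHLY, YEARLY }

    // Hypothetical batch trigger mode options.
    enum TriggerMode { SCHEDULED, MANUAL }

    // Hypothetical representation of a pre-configured batch running task execution strategy.
    record BatchExecutionStrategy(
            String taskId,
            LocalTime executionTime,        // time of day at which the batch is triggered
            Period period,                  // batch execution period
            String dataRange,               // batch execution data range, e.g. "LAST_MONTH"
            TriggerMode triggerMode,        // batch trigger mode
            boolean autoRerunOnFailure) {   // supplementary-run plan after an execution failure
    }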
[0047] 2. Next, introduce the status monitoring of task execution nodes, which further improves the accuracy of batch running, achieves load balancing to improve resource utilization, and reduces costs.
[0048] The inventor identified the following technical problems in the existing stand-alone deployment:
[0049] (1) Resources are wasted and utilization is low. Under the stand-alone deployment scheme, the tasks configured on each task node differ, and so does the hardware performance consumed, so nodes may be idle or overloaded. That is, under the stand-alone system, each task is fixedly deployed on only one task node. As the business evolves and develops, the performance consumption of the tasks on each node increases or decreases, which eventually leads to some task nodes being overloaded while the performance resources of others sit idle.
[0050] (2) The cost is high. Because of the low resource utilization of the stand-alone deployment architecture mentioned in (1) above, the stand-alone architecture is relatively costly for the same processing capacity.
[0051] In one embodiment, the above-mentioned distributed architecture-based batch processing device may further include: a first monitoring module, configured to monitor the status of multiple task execution nodes;
[0052] The scheduling module is specifically configured to: determine, from the multiple task execution nodes, a task execution node that executes the batch running task execution request according to the batch running task execution request and the states of the multiple task execution nodes, and send the batch running task execution instruction to the determined task execution node.
[0053] In one embodiment, the state of the task execution node includes: a health state of the task execution node and a pressure load state of the task execution node.
[0054] In a specific implementation, the first monitoring module monitors the states of the multiple task execution nodes and feeds the states back to the scheduling module. When scheduling, the scheduling module can therefore take the states of the task execution nodes into account and dynamically select the most suitable node to execute a task, realizing load balancing of the task cluster, improving the overall resource utilization of the cluster, and further improving the accuracy of batch running. Because overloading or idling of any single task node is avoided, server resources are fully utilized and the overall cost is reduced. Examples of the load balancing solution and of further improving the accuracy of batch running are given below.
[0055] (1) The health state of a task execution node may include a normal state, a fault/abnormal state, a performance-degradation state, and so on. For example, if the first monitoring module detects that a certain task execution node has failed, it feeds the fault state back to the scheduling module, and the scheduling module no longer assigns tasks to that node. The scheduling module can thus sense task node failures, avoid task execution failures caused by faulty nodes, and improve the accuracy of batch running. As another example, if the first monitoring module detects that the performance of a certain task execution node has degraded, it feeds the performance-degradation state back to the scheduling module, and the scheduling module assigns fewer tasks to that node, achieving reasonable scheduling.
[0056] (2) The pressure load state of a task execution node may include whether the node is overloaded or idle, that is, its degree of busyness. For example, if the first monitoring module detects that a certain task execution node is busy, it feeds the busy state back to the scheduling module, and the scheduling module assigns fewer tasks to that node; the opposite also holds. In this way, load balancing of the task cluster is realized, the overall resource utilization of the cluster is improved, and the cost is reduced.
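As a minimal, non-limiting sketch, the following Java fragment shows one way the scheduling module could choose a node from the states reported by the first monitoring module; the NodeState fields and the "lightest load among healthy nodes, degraded nodes de-prioritised" rule are assumptions made for illustration.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    // Hypothetical health states fed back by the first monitoring module.
    enum Health { NORMAL, DEGRADED, FAULTY }

    // Hypothetical snapshot of a task execution node's state (health plus pressure load).
    record NodeState(String nodeId, Health health, int runningTasks) {}

    class LoadAwareSelector {
        // Excludes faulty nodes, de-prioritises degraded nodes, and among the remainder
        // picks the node currently running the fewest tasks.
        Optional<NodeState> select(List<NodeState> states) {
            return states.stream()
                    .filter(s -> s.health() != Health.FAULTY)
                    .min(Comparator
                            .comparing((NodeState s) -> s.health() == Health.DEGRADED ? 1 : 0)
                            .thenComparingInt(NodeState::runningTasks));
        }
    }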
[0057] 3. Next, introduce another optimization scheme.
[0058] In one embodiment, the above distributed architecture-based batch processing device may further include: a second monitoring module, configured to monitor execution results of multiple task execution nodes, and feed back the execution results to operation and maintenance personnel.
[0059] In a specific implementation, a task execution node receives the command (the batch running task execution instruction), completes the task execution, records a work log, and returns the task execution status; it then enters a waiting state until the next scheduled execution command arrives. The second monitoring module periodically scans the execution results of each task, analyzes and displays the results, and pushes them in near real time, so that operation and maintenance personnel can keep track of the overall batch running status and respond to exceptions in a timely manner.
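For illustration only, a periodic result scan of this kind could be sketched as follows in Java, assuming a hypothetical result store keyed by task identifier and a simple console push; the 30-second interval, method names, and status strings are not prescribed by this embodiment.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical second monitoring module: scans task results and pushes them to operations staff.
    class ResultMonitor {
        private final ScheduledExecutorService scanner = Executors.newSingleThreadScheduledExecutor();
        private final Map<String, String> resultStore;   // taskId -> "SUCCESS" / "FAILED" / "RUNNING"

        ResultMonitor(Map<String, String> resultStore) {
            this.resultStore = resultStore;
        }

        // Starts a quasi-real-time scan, e.g. every 30 seconds.
        void start() {
            scanner.scheduleAtFixedRate(this::scanOnce, 0, 30, TimeUnit.SECONDS);
        }

        private void scanOnce() {
            List<String> failed = resultStore.entrySet().stream()
                    .filter(e -> "FAILED".equals(e.getValue()))
                    .map(Map.Entry::getKey)
                    .toList();
            if (!failed.isEmpty()) {
                // Push to operation and maintenance personnel (dashboard, mail, IM, etc.).
                System.out.println("Batch tasks requiring attention: " + failed);
            }
        }
    }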
[0060] In a specific implementation, the task trigger module may be a task timing server, and the scheduling module may be a scheduling server. The task execution nodes in the task cluster are shown in Figure 2 as Node 1, Node 2, ..., Node n.
[0061] Based on the same inventive concept, an embodiment of the present invention also provides a batch processing method based on a distributed architecture, as described in the following embodiments. Since the principle by which the batch processing method based on the distributed architecture solves the problem is similar to that of the batch processing device based on the distributed architecture, for the implementation of the method, reference may be made to the implementation of the device, and repeated descriptions are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
[0062] Figure 3 is a schematic flow diagram of the batch processing method based on the distributed architecture in an embodiment of the present invention. As shown in Figure 3, the method includes:
[0063] Step 101: The task trigger module initiates a batch running task execution request to the scheduling module according to the pre-configured batch running task execution strategy;
[0064] Step 102: The scheduling module determines, from the multiple task execution nodes, the task execution node that executes the batch running task execution request according to the batch running task execution request, and sends the batch running task execution instruction to the determined task execution node;
[0065] Step 103: The multiple task execution nodes execute batch running tasks according to the batch running task execution instructions sent by the scheduling module.
[0066] In one embodiment, the above-mentioned method for running batch processing based on a distributed architecture may further include: the first monitoring module monitors the status of multiple task execution nodes;
[0067] The step in which the scheduling module determines, from the multiple task execution nodes, a task execution node that executes the batch running task execution request according to the batch running task execution request, and sends the batch running task execution instruction to the determined task execution node includes:
[0068] The scheduling module determines, from the multiple task execution nodes, the task execution node that executes the batch running task execution request according to the batch running task execution request and the states of the multiple task execution nodes, and sends the batch running task execution instruction to the determined task execution node.
[0069] In one embodiment, the state of the task execution node includes: a health state of the task execution node and a pressure load state of the task execution node.
[0070] In one embodiment, the above-mentioned method for running batch processing based on a distributed architecture may further include: a second monitoring module monitors execution results of multiple task execution nodes, and feeds back the execution results to operation and maintenance personnel.
[0071] In one embodiment, the execution strategy of a batch running task includes: the batch execution time and period of the task, the batch execution range, and the batch trigger mode.
[0072] An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above-mentioned batch processing method based on the distributed architecture.
[0073] An embodiment of the present invention also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for executing the above method for running batch processing based on a distributed architecture.
[0074] The beneficial technical effects of the technical solution provided by the embodiments of the present invention are: the reliability, system performance, and ease of maintenance of batch running are improved, resource utilization is improved, and costs are reduced.
[0075] Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
[0076] The present invention is described with reference to flowcharts and/or block diagrams of the method, apparatus (system), and computer program product according to the embodiments of the present invention. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a general-purpose computer, a special-purpose computer, an embedded processor, or a processor of other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
[0077] These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
[0078] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, such that a series of operational steps are performed on the computer or other programmable apparatus to produce a computer-implemented process, so that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
[0079] The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Those skilled in the art may make various modifications and changes to the embodiments of the present invention. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.