Fault tolerance optimizing method of intermediate data in cloud computing environment

Intermediate data technology for cloud computing environments, applied in data exchange networks, digital transmission systems, electrical components, etc. It addresses problems such as reduced overall system performance, delayed task completion, and occupied network resources, with the aim of resolving contention for network resources, ensuring low interference, and reducing the high overhead of backup.

Inactive Publication Date: 2011-06-22

AI-Extracted Technical Summary

Problems solved by technology

While improving the fault tolerance of intermediate data, this method will occupy network resou...


The invention relates to a fault tolerance optimizing method for intermediate data in a cloud computing environment. The method comprises four steps: 1. collecting network load, task execution progress, and position information; 2. analyzing the task input data, chiefly to determine where the data was generated; 3. classifying the working mode according to the current environment and adjusting resource allocation according to a control parameter; and 4. feeding back the backup status and updating the data backup information. The invention first takes into account the frequency of node failure in the cloud computing environment, replicates the intermediate data according to the task completion time requirement and the resource usage, and monitors and flexibly manages the network load and task execution progress in real time, so that intermediate data is replicated without affecting system performance. The fault tolerance optimizing method provided by the invention has wide practical value and application prospects in the field of cloud computing data management.

Application Domain

Data switching networks

Technology Topic

Cloud computing, Control parameters



Example Embodiment

[0018] In order to make the objectives, technical solutions and advantages of the present invention more clearly expressed, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.
[0019] The main idea of the present invention is to dynamically adjust the replication of intermediate data according to the network load, task execution progress, and resource usage during task execution in the cloud computing environment, and to control intermediate data replication in real time by combining network load statistics with task execution status, so that the intermediate data is replicated without interfering with the execution of the foreground task.
[0020] First, the data is classified to reduce replication overhead. Data generated locally is automatically deleted after it is used in the next stage, while data generated by other nodes is automatically retained after being used on a different node. The collected information includes: network load information, gathered over a period of 1800 seconds, with the load analyzed in each time period (200 seconds); task execution information, such as the time at which execution completes, used as one of the metrics for intermediate data replication; and the backup copy time of intermediate data, used as a reference for the copy time of data with the same parameters. This information must be collected in real time and updated promptly to ensure that intermediate data replication is efficient and causes little interference.
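The collection scheme in [0020] can be illustrated with a minimal sketch that bins samples into 200-second slices within an 1800-second period. Only the two durations come from the text; the class name `LoadWindow`, its methods, and the use of round-trip delay in milliseconds as the sampled metric are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

PERIOD_S = 1800  # collection period stated in the text
SLICE_S = 200    # analysis time period stated in the text


class LoadWindow:
    """Hypothetical helper: groups load samples by 200 s slice of an 1800 s period."""

    def __init__(self):
        self.slices = defaultdict(list)

    def add_sample(self, t_seconds, rtt_ms):
        # Wrap the timestamp into the current period, then pick the slice (0..8).
        slot = int(t_seconds % PERIOD_S) // SLICE_S
        self.slices[slot].append(rtt_ms)

    def slice_means(self):
        # Average delay per slice, ordered by slice index.
        return {slot: mean(values) for slot, values in sorted(self.slices.items())}


window = LoadWindow()
for t, rtt in [(0, 10), (199, 30), (200, 50), (1800, 20)]:
    window.add_sample(t, rtt)
print(window.slice_means())
```

With these durations each period holds nine slices, so a sample taken at 1800 seconds falls back into slice 0 of the next period.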
[0021] An example is described below. As shown in Figure 2, it includes the following steps.
[0022] Step 201: Collect network load statistics in real time, observing the round-trip delay and packet loss of the network in each time period (200 seconds), and predict future network conditions on that basis. Also monitor task execution progress in real time, collecting the TaskID of each task in the task set, the location of the node executing the task, the time the task has been executing, and the task progress score, and predict the task completion time from the task progress score. The prediction algorithm is as follows:
[0023] (Formula not reproduced in this text.)
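The prediction formula itself does not survive in this text, so the following is not the patent's algorithm. It is a minimal sketch of one common approach, linear extrapolation from the progress score, under the assumption that the score lies in (0, 1] and that the task advances at a constant rate. The function name and parameters are hypothetical.

```python
def predict_remaining_time(elapsed_s, progress_score):
    """Estimate remaining run time, assuming constant progress rate (an assumption).

    elapsed_s: time the task has been executing, in seconds.
    progress_score: fraction of work completed, in (0, 1].
    """
    if not 0.0 < progress_score <= 1.0:
        raise ValueError("progress_score must be in (0, 1]")
    # rate = progress_score / elapsed_s; remaining = (1 - progress_score) / rate
    return elapsed_s * (1.0 - progress_score) / progress_score


# A task that is 25% done after 30 s is estimated to need 90 s more.
print(predict_remaining_time(30, 0.25))
```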
[0024] Step 202: While waiting for input data to be transferred, determine in real time where the task's input data is located. The computing node either sends a query to the master node as part of its heartbeat information to find where the input data was generated, or queries the generation location through the current execution thread. Once this is known, the source of the data is determined: if the input data comes from the local node, it will be automatically deleted after the task executes, so it needs to be backed up; data from other nodes will be retained.
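The classification in Step 202 reduces to comparing the node that produced the input with the node that consumes it. The sketch below assumes each input block carries the identifier of its producing node; the `InputBlock` type and `needs_backup` function are hypothetical names, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputBlock:
    block_id: str
    producer_node: str  # node where the data was generated


def needs_backup(block, consumer_node):
    # Locally generated input is deleted after the task runs, so it must be
    # backed up; input fetched from another node is retained.
    return block.producer_node == consumer_node


print(needs_backup(InputBlock("b1", "node-1"), "node-1"))
print(needs_backup(InputBlock("b2", "node-2"), "node-1"))
```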
[0025] Step 203: When the data is ready to be replicated, first make a judgment based on the current network load. If the network load is high, suspend replication and wait until the network is idle; if the network load is low, proceed to prepare for replication. The network load continues to be monitored during replication: when it increases, the replication speed is reduced or replication is suspended while monitoring continues; when it decreases, the replication speed is increased or replication is resumed. Before replication starts, the replication cost is evaluated: using the collected task completion time and the backup time of intermediate data with the same parameters as references, the overhead of re-executing the task is compared with the overhead of replicating the data. If the replication overhead is high, replication is abandoned. If it is low, the timing is assessed from the monitored task execution and the prediction formula: if the copy time is less than the task completion time, the copy starts; otherwise it is abandoned. Predicting the outcome of replication in this way saves system resources and improves replication efficiency.
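The two decisions in Step 203 can be sketched as follows. The text gives no thresholds, units, or rate steps, so everything numeric here is an assumption: load is taken as a normalized value in [0, 1], costs are compared as times in seconds, and the names `decide_replication`, `next_rate`, `high`, `step`, and `max_rate` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    REPLICATE = "replicate"
    ABANDON = "abandon"


def decide_replication(copy_time_s, reexec_time_s, task_remaining_s):
    """Cost check first, then timing check, as described in Step 203."""
    # Abandon if copying costs at least as much as re-executing the task.
    if copy_time_s >= reexec_time_s:
        return Action.ABANDON
    # Start only if the copy is expected to finish before the task does.
    if copy_time_s < task_remaining_s:
        return Action.REPLICATE
    return Action.ABANDON


def next_rate(rate, load, prev_load, high=0.8, step=10.0, max_rate=100.0):
    """Adjust the replication rate from the latest load observation."""
    if load >= high:
        return 0.0                          # high load: suspend replication
    if load > prev_load:
        return max(rate - step, 0.0)        # load rising: slow down
    if load < prev_load:
        return min(rate + step, max_rate)   # load falling: speed up or resume
    return rate


print(decide_replication(5, 60, 20))
print(next_rate(50.0, 0.6, 0.5))
```

A rate of 0.0 stands for suspended replication, so a later drop in load raises the rate again and thereby resumes the copy.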
[0026] Step 204: After the replication is completed, the computing node feeds back information such as the data storage location and size to the master control node. The master control node updates the stored data information after receiving the information.
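The feedback in Step 204 amounts to the master control node keeping a table of backup records that is updated whenever a computing node reports. A minimal sketch follows, assuming an in-memory dictionary keyed by a data identifier; the `BackupRegistry` class and its method names are hypothetical.

```python
class BackupRegistry:
    """Hypothetical master-side table of intermediate data backups."""

    def __init__(self):
        self._records = {}

    def report(self, data_id, node, size_bytes):
        # A later report for the same data replaces the earlier record.
        self._records[data_id] = {"node": node, "size_bytes": size_bytes}

    def locate(self, data_id):
        record = self._records.get(data_id)
        return record["node"] if record else None


registry = BackupRegistry()
registry.report("map-0007", "node-3", 1024)
print(registry.locate("map-0007"))
```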
[0027] In this example, parameters such as network load and task execution progress are checked and updated cyclically, once per task execution cycle, throughout the data replication process. The intermediate data replication method described above achieves as high a replication efficiency as possible with low interference to the foreground tasks, so as to meet the requirements of fault tolerance.
[0028] Finally, it should be noted that the above embodiments are only used to illustrate rather than limit the technical solutions of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention can still be modified or equivalently replaced, and that any modifications or partial replacements that do not depart from the spirit and scope of the present invention shall be covered by the scope of the claims of the present invention.

