Application fault injection method and apparatus, and computing device cluster

By injecting faults into the intrinsic fault module of the target application through a fault injection controller, the problem of a large fault impact range is solved, the convenience and security of fault injection are achieved, the node deployment requirements are reduced, and the use cases are expanded.

WO2026123727A1 · PCT designated stage · Publication Date: 2026-06-18 · HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD
Filing Date
2025-08-06
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

In existing technologies, fault injection agents inject faults directly into nodes, so a fault can affect every application on the node and disrupt normal business processing. In addition, the agents must be installed on the nodes, which increases workload and deployment requirements.

Method used

Fault indication information is generated by the fault injection controller, and faults are directly injected into the intrinsic fault module of the target application to control the scope of fault impact, avoid installing agents on nodes, and realize fault injection by utilizing the fault mode relation library and the intrinsic fault module.

Benefits of technology

It achieves controllable fault impact range, improves the convenience and security of fault injection, reduces deployment requirements for nodes, and expands application scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

Disclosed are an application fault injection method and apparatus, and a computing device cluster, relating to the technical field of computers. Upon receiving a fault injection request of a user, an endogenous fault module of a target application directly injects a fault into a service module of the target application. The fault injection scope is limited to the service module of the target application, and the fault impact scope is limited to a service of the target application. The fault injection scope is controllable, thereby ensuring that the fault impact scope is controllable, and further providing a guarantee for normal processing of the service. In addition, the endogenous fault module of the target application directly injects the fault, and fault injection can be implemented without the need to install a fault injection agent on a node on which the target application is deployed, thereby improving the convenience of use, reducing the requirements for the node on which the target application is deployed, and expanding the use scenarios.

Description

Application fault injection method, apparatus, and computing device cluster

[0001] This application claims priority to Chinese Patent Application No. 202411825622.8, filed on December 10, 2024, entitled "Application Fault Injection Method, Apparatus, and Computing Device Cluster", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of computer technology, and in particular to an application fault injection method, apparatus and computing device cluster.

Background Technology

[0003] Fault injection is a reliability verification technique that assesses system reliability by injecting faults into the system and observing the system's behavior after the fault injection. Typically, computing devices can install fault injection agents on each node in the system, and inject faults into the nodes with the fault injection agents installed through these agents, then observe the system's performance after the fault injection to evaluate system reliability.

[0004] In the above process, the computing device uses the fault injection agent installed on the node to inject faults into the node, which may affect all applications deployed on the node. The scope of the fault impact is large and affects the normal processing of business.

Summary of the Invention

[0005] This application provides an application fault injection method, apparatus, and computing device cluster to solve the problem that directly injecting faults into nodes with fault injection agents installed results in a large scope of fault impact and affects the normal processing of business operations.

[0006] In a first aspect, this application provides an application fault injection method. The method is applied to a fault injection controller that runs on an infrastructure. At least one application of a user is deployed on the infrastructure, and each of the at least one application includes a business module and an intrinsic fault module. The method includes: obtaining a fault injection request input by the user, where the fault injection request includes target application information and fault mode information, the target application information indicates the target application into which a fault is to be injected, the target application belongs to the at least one application, and the fault mode information indicates the fault to be injected into the target application; generating fault indication information based on the fault injection request, where the fault indication information instructs the intrinsic fault module of the target application to inject the fault into the business module of the target application; and sending the fault indication information to the target application.

[0007] In the related approach, the computing device injects a fault into a node via a fault injection agent, so the fault affects all applications on that node. In contrast, in the method of the first aspect, the fault injection controller generates fault indication information based on the user's fault injection request after receiving the request. This allows the target application's intrinsic fault module to inject the fault directly into the target application's business module based on the fault indication information. The fault injection scope is limited to the target application's business module, and the impact of the fault is limited to the target application's business operations. Because the fault injection scope is controllable, the impact range is controllable as well, which helps guarantee normal business processing. Furthermore, because the target application's intrinsic fault module injects the fault directly, no fault injection agent needs to be installed on the node where the target application is deployed, which improves ease of use, reduces the requirements on that node, and expands application scenarios.
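The controller-side flow of the first aspect (obtain the request, generate fault indication information, send it to the target application) can be sketched as follows. This is a minimal illustration only: the class names, the field layout, and the `send` callback standing in for the real transport are assumptions, not part of the application text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FaultInjectionRequest:
    """User request: which applications to target and which fault to inject."""
    target_apps: List[str]  # target application information (hypothetical shape)
    fault_mode: str         # fault mode information, e.g. "cpu"


@dataclass(frozen=True)
class FaultIndication:
    """Tells one application's intrinsic fault module which fault to inject."""
    app_id: str
    fault: str


class FaultInjectionController:
    """Hypothetical controller; `send` stands in for the real delivery channel."""

    def __init__(self, deployed_apps: List[str],
                 send: Callable[[FaultIndication], None]) -> None:
        self._deployed = set(deployed_apps)
        self._send = send

    def handle(self, request: FaultInjectionRequest) -> List[FaultIndication]:
        # The target application must belong to the deployed applications.
        unknown = [a for a in request.target_apps if a not in self._deployed]
        if unknown:
            raise ValueError(f"not deployed on the infrastructure: {unknown}")
        # Generate one indication per target application, then send each one.
        indications = [FaultIndication(app, request.fault_mode)
                       for app in request.target_apps]
        for indication in indications:
            self._send(indication)
        return indications
```

Note that nothing in this sketch touches the node itself; the indication is addressed to an application, which is the point of contrast with agent-based injection.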

[0008] In one possible implementation, the infrastructure stores a fault mode relation library, which indicates the correspondence between faults and application-native faults. Generating fault indication information based on the fault injection request includes: converting the fault into an application-native fault based on the fault mode relation library; and generating the fault indication information based on the target application information and the application-native fault. In this way, users can inject faults into the target application in the way they already use, without changing how they inject faults, which improves the user-friendliness and ease of use of fault injection.

[0009] In another possible implementation, the application fault injection method further includes: obtaining application-native fault library configuration information input by the user, and generating the application-native fault library based on the configuration information. In this way, users can build application-native fault libraries according to the needs of actual applications, expanding the use cases.

[0010] In another possible implementation, the infrastructure includes multiple nodes. When the target application is deployed on a target node among these nodes, sending the fault indication information to the target application includes sending the fault indication information to the target node. Thus, when the target application is deployed on a target node, sending the fault indication information to the target node is sufficient to inject faults into the target application, which keeps the operation simple.

[0011] In another possible implementation, the infrastructure comprises multiple nodes, and the target applications include a first target application and a second target application. When the first target application is deployed on a first node and the second target application is deployed on a second node, sending the fault indication information to the target applications includes sending the fault indication information to both the first node and the second node. Thus, a single operation can inject faults into both target applications deployed on two nodes, simplifying the process and increasing fault injection efficiency.
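The single-node and two-node cases above both reduce to working out which nodes host a target application and sending the fault indication information to each of them once. A minimal sketch, assuming a hypothetical `deployment` map from application name to the nodes it runs on:

```python
from typing import Dict, List, Set


def nodes_to_notify(target_apps: List[str],
                    deployment: Dict[str, List[str]]) -> List[str]:
    """Return the nodes that host at least one target application, sorted by name."""
    nodes: Set[str] = set()
    for app in target_apps:
        # deployment[app] lists every node the application is deployed on.
        nodes.update(deployment[app])
    return sorted(nodes)


# Hypothetical layout matching the two-node case in the text.
deployment = {"first-target-app": ["node-1"], "second-target-app": ["node-2"]}
recipients = nodes_to_notify(["first-target-app", "second-target-app"], deployment)
# recipients == ["node-1", "node-2"]
```

Collecting the nodes into a set means a node hosting several target applications still receives the indication only once; whether the real controller deduplicates this way is an assumption of the sketch.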

[0012] In another possible implementation, before generating fault indication information based on the fault injection request, the method further includes providing a configuration interface. The configuration interface includes an application-inherent fault configuration area. In response to the user's first action on the configuration interface, the method retrieves the application-inherent fault from the application-inherent fault configuration area. In this way, users can configure application-inherent faults according to the actual application needs, expanding the use cases.

[0013] In another possible implementation, the method further includes: in response to a second user action on the configuration interface, establishing a mapping between application-native faults and general faults; and adding the mapping between application-native faults and general faults to a fault mode relation library. This allows users to set the mapping between application-native faults and general faults according to the needs of their actual application, improving flexibility.

[0014] In another possible implementation, obtaining the fault injection request input by the user includes: obtaining the user's permission information; and, if the permission information meets the permission conditions, obtaining the fault injection request. In this way, only authorized users can inject faults into the target application, improving security.

[0015] In a second aspect, this application provides an application fault injection apparatus. The apparatus includes modules for performing the method described in the first aspect or any possible implementation of the first aspect.

[0016] In a third aspect, this application provides a processor. The processor is used to perform the operational steps of the method described in the first aspect or any possible implementation of the first aspect.

[0017] In a fourth aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, each computing device including a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, causing the computing device cluster to perform the operational steps of the method described in the first aspect or any possible implementation of the first aspect.

[0018] In a fifth aspect, this application provides a computer-readable storage medium comprising computer software instructions which, when executed in a computing device cluster, cause the computing device cluster to perform the operational steps of the method described in the first aspect or any possible implementation of the first aspect.

[0019] In a sixth aspect, this application provides a computer program product. When the computer program product is run on a computing device cluster, it causes the computing device cluster to perform the operational steps of the method described in the first aspect or any possible implementation of the first aspect.

[0020] For the beneficial effects of the second through sixth aspects, refer to the description of the first aspect or any implementation thereof; they are not repeated here. The implementations provided in the above aspects can be further combined to provide more implementations.

Attached Figure Description

[0021] Figure 1 is a schematic diagram of an infrastructure architecture provided in this application;

[0022] Figure 2 is a flowchart illustrating an application fault injection method provided in this application;

[0023] Figure 3 is an example diagram of a fault mode relation library provided in this application;

[0024] Figure 4A is an example diagram of the first type of fault injection provided in this application;

[0025] Figure 4B is an example diagram of the second type of fault injection provided in this application;

[0026] Figure 4C is an example diagram of the third type of fault injection provided in this application;

[0027] Figure 4D is an example diagram of the fourth type of fault injection provided in this application;

[0028] Figure 5 is an example diagram of a node architecture provided in this application;

[0029] Figure 6 is a flowchart illustrating a method for establishing a fault mode relation library provided in this application;

[0030] Figure 7 is an example diagram of an application fault injection method provided in this application;

[0031] Figure 8 is a structural schematic diagram of an application fault injection device provided in this application;

[0032] Figure 9 is a schematic diagram of the structure of a computing device provided in this application;

[0033] Figure 10 is a schematic diagram of the structure of a chip provided in this application;

[0034] Figure 11 is a schematic diagram of the structure of a computing device cluster provided in this application;

[0035] Figure 12 is a schematic diagram of the connection between computing devices provided in this application.

Detailed Implementation

[0036] This application provides an application fault injection method. In this method, upon receiving a fault injection request from a user, the inherent fault module of the target application directly injects a fault into the target application's business module. The scope of fault injection is the target application's business module, and the scope of fault impact is the target application's business operations. The controllable fault injection scope ensures the controllability of the fault impact, thereby guaranteeing the normal processing of business operations. Furthermore, the direct injection of faults by the target application's inherent fault module eliminates the need to install a fault injection agent on the node where the target application is deployed, improving ease of use, reducing requirements on the node where the target application is deployed, and expanding application scenarios.

[0037] To ensure clarity and brevity in the description of the following embodiments, some concepts that may be involved in this application will be briefly introduced first.

[0038] (1) Reliability.

[0039] Reliability refers to a product's ability to perform its intended function under specified conditions and within a specified time. Reliability tests can be used to verify the reliability of a product.

[0040] (2) Distributed system.

[0041] A distributed system is a software system built on a network. One or more applications can be deployed on a distributed system. Typically, fault diagnosis and chaos engineering are used to assess the reliability of one or more applications deployed on a distributed system, thereby evaluating the overall reliability of the distributed system. Specifically, faults can be actively injected into the applications of a distributed system, and the system's resilience after the fault injection can be observed to assess its ability to withstand faults, and thus, its reliability.

[0042] (3) Fault.

[0043] A fault is an event that prevents a device or system (such as a distributed system) from performing its intended function.

[0044] (4) Fault injection.

[0045] Fault injection refers to the deliberate introduction of faults into a device or system (such as a distributed system) through controlled experiments and the observation of the device or system's behavior after the fault is introduced. Fault injection can be used to evaluate the reliability of a device or system. Fault injection can include hardware-based fault injection, software-based fault injection, and simulation-based fault injection. Software-based fault injection refers to causing hardware-level failures by generating errors at the software level, such as modifying memory data.

[0046] Having briefly introduced some concepts that may be involved in this application above, the following is a brief introduction to the relevant technologies in conjunction with the accompanying drawings.

[0047] Fault injection is a reliability verification technique. Computing devices can use fault injection to verify the reliability of a system. Specifically, the computing device injects a fault into the system and observes the system after the fault injection to detect the system's reliability. Typically, computing devices can inject faults into the system using the following methods (1) to (3) to verify the system's reliability.

[0048] (1) The computing device installs a fault injection agent (such as an agent or a probe) on each node in the system.

[0049] (2) The computing device sends a fault injection request to the nodes in the system.

[0050] (3) The node receives the fault injection request and injects the fault into the node using the fault injection agent.

[0051] In the above process, the fault injection agent directly injects faults into the node, which may affect all applications deployed on that node, resulting in a wide impact. Furthermore, the computing device needs to install the fault injection agent on the system's nodes, which is intrusive to the nodes, and a separate fault injection agent needs to be developed to inject faults into the nodes, resulting in a large workload.

[0052] Based on this, this application provides an application fault injection method. This fault injection method can be applied to the infrastructure shown in Figure 1. In some possible scenarios, the infrastructure can also constitute a distributed system. For example, if the infrastructure includes multiple nodes that communicate with each other, these nodes deploy applications, and these nodes collaboratively execute the business logic corresponding to the applications, then the infrastructure can be considered a distributed system. Figure 1 is a schematic diagram of the architecture of an infrastructure provided by this application. As shown in Figure 1, a fault injection controller 110 runs on the infrastructure 100, and at least one user application is also deployed on the infrastructure 100. Each application in the at least one application includes a business module and an intrinsic fault module. The infrastructure 100 may include one or more nodes, which can be computing devices, such as personal computers, desktop computers, mobile phones, servers, etc. For more details on computing devices, please refer to the relevant description in Figure 9 below, which will not be repeated here.

[0053] The fault injection controller 110 is used to: acquire a fault injection request input by a user, generate fault indication information based on the fault injection request, and send the fault indication information to the target application. The target application belongs to at least one application. The fault injection controller 110 can be a device with a physical entity, such as a node in infrastructure 100. The fault injection controller 110 can also be a device without a physical entity, such as a container (Docker), virtual machine (VM), etc., running on a node of infrastructure 100. The target application can include one or more applications deployed on infrastructure 100, as described below in different scenarios.

[0054] In the first alternative scenario, the target application is one of the at least one application deployed on infrastructure 100.

[0055] For example, the one application is deployed on a single node of infrastructure 100: if the target application is the second application, the second application is deployed on node 1.

[0056] For example, the application is deployed on multiple nodes of infrastructure 100. If the target application is a first application, the first application is deployed on nodes 1 to 3.

[0057] In the second alternative scenario, the target application includes multiple applications from at least one application deployed on infrastructure 100.

[0058] For example, the multiple applications are deployed on one node of infrastructure 100. If the target applications are a second application and a fourth application, the second application and the fourth application are deployed on node 1.

[0059] For example, the multiple applications are deployed on multiple nodes of infrastructure 100. If the target applications are application 1 to application 3, application 1 is deployed on node 1 to node 3, application 2 is deployed on node 1, and application 3 is deployed on node 2 and node 3.

[0060] The infrastructure to which the application fault injection method applies has been described above with reference to Figure 1. The application fault injection method provided in this application will be described in detail below with reference to Figures 2 to 7.

[0061] Figure 2 is a flowchart illustrating an application fault injection method provided in this application. As shown in Figure 2, this method is applied to a fault injection controller. The fault injection controller runs on an infrastructure on which at least one user application is deployed. For more details about the infrastructure, please refer to the relevant description in Figure 1 above, which will not be repeated here. As shown in Figure 2, the application fault injection method provided in this application may include the following steps S210 to S240.

[0062] S210, the fault injection controller obtains the fault injection request input by the user.

[0063] The fault injection request includes target application information and fault mode information. The target application information indicates the target application for which the fault is injected. The target application belongs to at least one application. The fault mode information indicates the fault to be injected into the target application. The fault can refer to a failure affecting the device or software system supporting the application, such as infrastructure failures, operating system failures, network communication failures, container failures, etc. Faults can include, but are not limited to: processor (central processing unit, CPU) failures, memory (MEM) failures, disk failures, network (net) failures, etc.

[0064] The target application information may include one or more of the following: an application identifier indicating an application, and a node identifier indicating a node. The application is one of the at least one application deployed on the infrastructure, and the node is a node included in the infrastructure. The target application indicated by the target application information differs depending on what the target application information includes, as explained below.

[0065] In the first possible scenario, the target application information includes the application identifier.

[0066] The application identifier can be the identifier of a single application or the identifiers of multiple applications. For example, if the target application information includes the identifier of a single application (such as the first application), the target application information indicates that single application. Similarly, if the target application information includes the identifiers of multiple applications (such as the first application and the second application), the target application information indicates all of those applications.

[0067] In the second possible scenario, the target application information includes node identifiers.

[0068] The node identifier can be the identifier of a single node or the identifiers of multiple nodes. If the target application information includes the identifier of a single node, it indicates all applications deployed on that node. For example, if the target application information includes the identifier of node 1 of the infrastructure, it indicates all applications deployed on node 1. Similarly, if the target application information includes the identifiers of multiple nodes, it indicates all applications deployed on each of those nodes. For example, if the target application information includes the identifiers of node 1 and node 2 of the infrastructure, it indicates all applications deployed on node 1 and all applications deployed on node 2.

[0069] In the third possible scenario, the target application information includes the application identifier and the node identifier.

[0070] In this scenario, the target application information indicates that faults are to be injected into the application indicated by the application identifier and into all applications deployed on the node indicated by the node identifier. For example, the target application information includes the application identifier of a third application and the node identifier of node 1. The third application is deployed on nodes 2 and 3. The first, second, and fourth applications are deployed on node 1. In this scenario, the target applications indicated by the target application information are the first through fourth applications.
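The three scenarios for target application information (application identifiers only, node identifiers only, or both) can be expressed as one resolution step. The sketch below is illustrative; the function name, the `deployment` map, and the `app-N`/`node-N` labels are assumptions introduced here.

```python
from typing import Dict, Iterable, List


def resolve_target_apps(deployment: Dict[str, List[str]],
                        app_ids: Iterable[str] = (),
                        node_ids: Iterable[str] = ()) -> List[str]:
    """Return the applications indicated by the target application information.

    `deployment` maps each application to the nodes it is deployed on.
    """
    targets = set(app_ids)               # applications named directly
    wanted_nodes = set(node_ids)
    for app, nodes in deployment.items():
        # Every application deployed on a named node is also a target.
        if wanted_nodes.intersection(nodes):
            targets.add(app)
    return sorted(targets)


# Deployment from the example: the third application runs on nodes 2 and 3,
# and the first, second, and fourth applications run on node 1.
deployment = {
    "app-1": ["node-1"],
    "app-2": ["node-1"],
    "app-3": ["node-2", "node-3"],
    "app-4": ["node-1"],
}
targets = resolve_target_apps(deployment, app_ids=["app-3"], node_ids=["node-1"])
# targets == ["app-1", "app-2", "app-3", "app-4"]
```

Passing only `app_ids` or only `node_ids` reproduces the first and second scenarios respectively.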

[0071] The fault type varies depending on the content included in the target application information. If the target application information includes an application identifier, the fault type is application-level fault. If the target application information includes the node identifier of a single node, the fault type is node-level fault. If the target application information includes the node identifiers of multiple nodes, the fault type is cluster-level fault. Different fault types correspond to different fault contents; Table 1 provides an example of one type of fault content.

[0072] Table 1 Examples of Fault Contents

[0073] Table 1 provides examples of fault content corresponding to the three types of faults. Depending on the needs of the actual application scenario, more types of faults and more fault content can be defined, which is not limited in this application.
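The rule in paragraph [0071] for deriving the fault type from the target application information can be sketched as a small classifier. The text does not state a fault type for a request that carries both an application identifier and a node identifier, so this sketch rejects that combination; that choice, like the function name, is an assumption.

```python
from typing import Sequence


def fault_type(app_ids: Sequence[str], node_ids: Sequence[str]) -> str:
    """Classify the fault type from what the target application information contains."""
    if app_ids and not node_ids:
        return "application-level"
    if node_ids and not app_ids:
        # One node identifier means node-level; several mean cluster-level.
        return "node-level" if len(set(node_ids)) == 1 else "cluster-level"
    # Empty or mixed target information is not covered by the text.
    raise ValueError("fault type is not defined for this target application information")
```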

[0074] The above describes the content of a fault injection request, using the example of a user sending a fault injection request on its own to the fault injection controller. In some possible scenarios, to simplify user operations, the fault injection controller supports a single request that carries target application information, fault mode information, and pending business information. The fault injection controller receives the request carrying these three types of information and parses it to obtain the target application information and the fault mode information. When a user sends such a request, the fault injection controller also supports the user including a fault injection identifier in the request. The fault injection identifier instructs the intrinsic fault module of the target application to inject a fault into the business module of the target application. Fault injection identifiers include, but are not limited to, control commands, and control commands include, but are not limited to, command words. A command word can be a preset keyword or a keyword customized by the user according to actual application needs; this application does not limit this. The command word can be carried in the request header, the request body, or the Uniform Resource Identifier (URI) of the request; this application does not limit how the fault injection identifier is carried in the request.
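Detecting a command word in the request header, the request body, or the URI might look like the sketch below. The keyword `inject-fault`, the header name `X-Fault-Injection`, the query key `fault`, and the body key `fault_injection` are all hypothetical; the text only says that a preset or user-defined keyword can be carried in any of these three places.

```python
from typing import Mapping
from urllib.parse import parse_qs, urlsplit

COMMAND_WORD = "inject-fault"      # hypothetical preset keyword
HEADER_NAME = "X-Fault-Injection"  # hypothetical header name
QUERY_KEY = "fault"                # hypothetical URI query parameter
BODY_KEY = "fault_injection"       # hypothetical request body field


def has_fault_injection_identifier(uri: str,
                                   headers: Mapping[str, str],
                                   body: Mapping[str, object]) -> bool:
    """Return True if the command word appears in the header, body, or URI."""
    # Header names are matched case-insensitively.
    for name, value in headers.items():
        if name.lower() == HEADER_NAME.lower() and value == COMMAND_WORD:
            return True
    # Request body, assumed to be already parsed into a mapping.
    if body.get(BODY_KEY) == COMMAND_WORD:
        return True
    # URI query string; parse_qs returns a list of values per key.
    query = parse_qs(urlsplit(uri).query)
    return COMMAND_WORD in query.get(QUERY_KEY, [])
```

A request without the identifier would simply be treated as an ordinary business request in this sketch.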

[0075] The fault injection controller can support users to send fault injection requests to the fault injection controller in a variety of ways, including but not limited to: console interface, open application programming interface (API), command-line interface (CLI), etc., which are not limited in this application.

[0076] In some possible scenarios, to improve the security of fault injection and prevent unauthorized operators from injecting faults into at least one application of the infrastructure, the fault injection controller can detect the permissions of the user entering the fault injection request and support authorized users to send fault injection requests to the fault injection controller. Specifically, the fault injection controller can perform the following ① and ② to ensure that only authorized users can inject faults.

[0077] ① The fault injection controller obtains the permission information of the user who sent the fault injection request.

[0078] In this scenario, the fault injection controller can provide an access control interface. This interface may include a username input area and a password input area. The fault injection controller can retrieve the username entered by the user from the username input area and the password entered by the user from the password input area. Based on the username and password, the fault injection controller can then generate the user's permission information.

[0079] ② If the permission information meets the permission conditions, the fault injection controller obtains the fault injection request.

[0080] Permission conditions may include, but are not limited to, a permission list. The permission list includes multiple usernames that support fault injection and their corresponding passwords. If the fault injection controller can find the username entered by the user in the permission list, and the password entered by the user matches the password corresponding to that username in the permission list, then the permission information is considered to satisfy the permission conditions.
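The permission-list check described above can be sketched as follows. The function name and the in-memory mapping are assumptions, and the plaintext passwords only mirror the description; a production system would normally store salted password hashes rather than the passwords themselves.

```python
import hmac
from typing import Mapping


def is_authorized(username: str, password: str,
                  permission_list: Mapping[str, str]) -> bool:
    """Return True if the username is listed and the password matches its entry."""
    expected = permission_list.get(username)
    if expected is None:
        return False  # username not found in the permission list
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(password.encode(), expected.encode())
```

Only when this check passes would the controller go on to obtain the fault injection request.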

[0081] The above example, using username and password as the permission information and a permission list as the permission condition, illustrates how to enable authorized users to send fault injection requests to the fault injection controller. Depending on the needs of the actual application, other permission information and conditions can be set, which are not limited in this application.

[0082] The preceding text described the content of fault injection requests. As can be seen, the fault type differs depending on the content of the target application information. The fault injection controller can convert a fault into a fault that the application's intrinsic fault module is able to inject into the business module (i.e., an application-intrinsic fault), so that the target application's intrinsic fault module can inject the fault into the target application's business module. The process of converting a fault into an application-intrinsic fault is explained below in conjunction with S220.

[0083] S220, the fault injection controller generates fault indication information based on the fault injection request.

[0084] The fault indication information is used to instruct the intrinsic fault module of the target application to inject faults into the business module of the target application.

[0085] In some possible scenarios, the infrastructure stores a fault mode relation library. This library indicates the correspondence between faults and application-inherent faults. The fault injection controller can generate fault indication information by performing the following operations: the fault injection controller converts the fault into an application-inherent fault based on the fault mode relation library; and the fault injection controller generates the fault indication information based on the target application information and the application-inherent fault.
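The two operations in S220 (look up the application-inherent fault, then combine it with the target application information) can be sketched as below. The library contents are invented purely for illustration; the text does not say which fault corresponds to which application-inherent fault, and the dictionary-based indication format is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical fault mode relation library: fault -> application-inherent fault.
# These pairings are made up for the example.
RELATION_LIBRARY: Dict[str, str] = {
    "cpu": "application response delay",
    "network": "application unresponsiveness",
    "disk": "application response anomaly",
}


def to_app_inherent_fault(fault: str, library: Dict[str, str]) -> str:
    """Convert a fault into its application-inherent fault via the library."""
    try:
        return library[fault]
    except KeyError:
        raise ValueError(f"no application-inherent fault mapped for {fault!r}") from None


def generate_fault_indications(target_apps: List[str], fault: str,
                               library: Dict[str, str]) -> List[Dict[str, str]]:
    """Build one fault indication per target application."""
    inherent = to_app_inherent_fault(fault, library)
    return [{"app": app, "inherent_fault": inherent} for app in target_apps]
```

Because the lookup happens in the controller, the user keeps specifying faults such as CPU or network failures, and only the controller needs to know the application-level equivalent.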

[0086] The fault mode relational database can include the correspondence between multiple faults and multiple application-inherent faults. A fault belongs to multiple faults, and an application-inherent fault belongs to multiple application-inherent faults. An application-inherent fault can refer to a fault acting on an application, including but not limited to: application unresponsiveness, application response delay, application response anomalies, etc. For example, if the application-inherent fault is application unresponsiveness, and the target application is the first application, if the application-inherent fault module of the first application injects the application-inherent fault into the application-inherent fault's business module, the first application will exhibit unresponsiveness when receiving a business request.

[0087] Figure 3 is an example diagram of a fault mode relation library provided in this application. As shown in Figure 3, the fault mode relation library includes the correspondence between multiple faults (such as application-level faults, node-level faults, and cluster-level faults) and multiple application-inherent faults (such as application-inherent fault 1 to application-inherent fault n). These multiple faults can be stored in a fault library, and these multiple application-inherent faults can be stored in an application-inherent fault library. The fault library, application-inherent fault library, and fault mode relation library can be stored in the same location (such as the fault injection controller) or in different locations (such as the fault injection controller storing the fault mode relation library, and other nodes of the infrastructure storing the fault library and the application-inherent fault library). This application does not limit the storage method of the fault library, application-inherent fault library, and fault mode relation library. The fault mode relation library can be preset or set by the user according to actual needs; this application does not limit this. When the fault mode relation library is set by the user according to actual needs, the process shown in Figure 6 can be used to construct the fault mode relation library. Please refer to the following text for relevant descriptions; they will not be repeated here.
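As a minimal sketch of the conversion in S220, the fault mode relation library can be modeled as a lookup table from faults to application-inherent faults. The fault names, the dataclass names, and `generate_fault_indication` are hypothetical and chosen only for illustration; this application does not specify a data format.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical fault mode relation library: fault -> application-inherent fault.
FAULT_MODE_RELATION_LIBRARY: Dict[str, str] = {
    "node_cpu_overload": "application_response_delay",
    "node_power_off": "application_unresponsive",
    "cluster_network_partition": "application_response_anomaly",
}


@dataclass(frozen=True)
class FaultInjectionRequest:
    target_application: str  # target application information
    fault: str               # fault mode information


@dataclass(frozen=True)
class FaultIndication:
    target_application: str
    application_inherent_fault: str


def generate_fault_indication(
    request: FaultInjectionRequest,
    library: Dict[str, str] = FAULT_MODE_RELATION_LIBRARY,
) -> FaultIndication:
    """Convert the requested fault and build the fault indication information."""
    try:
        inherent = library[request.fault]
    except KeyError:
        raise ValueError(
            f"no application-inherent fault mapped for {request.fault!r}"
        ) from None
    return FaultIndication(request.target_application, inherent)
```

In this sketch a fault with no entry in the library is rejected; whether such a request is rejected or handled in some other way is a design choice that this application leaves open.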

[0088] In some possible scenarios, the application-native fault library can be pre-defined or created by the user according to the actual application needs; this application does not limit this. When the application-native fault library is created by the user, the fault injection controller can perform the following operations to create it. Specifically, before the fault injection controller obtains the fault injection request input by the user, the fault injection controller obtains the application-native fault library configuration information input by the user, and generates the application-native fault library based on that configuration information. The application-native fault library configuration information may include information used to create the application-native fault library, such as the database type and the storage location of the application-native fault library.

[0089] S230, the fault injection controller sends fault indication information to the target application.

[0090] In this step, the fault injection controller sends the fault indication information to the nodes in the infrastructure that match the target application.

[0091] In some possible scenarios, the infrastructure includes multiple nodes and the target application is deployed on a target node among the multiple nodes. In this case, the fault injection controller sends the fault indication information to the target node. There can be one or more target applications, and likewise one or more target nodes; this application does not limit this. For example, the target applications may include a first target application and a second target application. The first target application is deployed on a first node among the multiple nodes, and the second target application is deployed on a second node among the multiple nodes. In this case, the fault injection controller sends the fault indication information to both the first node and the second node. That is, the node matching the target application depends on where the target application is deployed. The target application can be deployed on one node of the infrastructure or on multiple nodes of the infrastructure, as explained in detail below.

[0092] In the first possible scenario, there is one target application, which is deployed on a single node of the infrastructure.

[0093] In this scenario, there is one target application and one target node, and the fault injection controller sends the fault indication information to this single node. For example, if the target application is the second application, and the second application is deployed on node 1 of the infrastructure, the fault injection controller sends the fault indication information to node 1.

[0094] In the second possible scenario, there is one target application, which is deployed on multiple nodes of the infrastructure.

[0095] In this scenario, there is one target application, more than one target node, and the fault injection controller sends fault indication information to these multiple nodes. For example, if the target application is the first application, and the first application is deployed on nodes 1 through 3 of the infrastructure, the fault injection controller sends fault indication information to nodes 1 through 3.

[0096] In the third possible scenario, there are multiple target applications, and they are deployed on a single node of the infrastructure.

[0097] In this scenario, the number of target applications is greater than one, the number of target nodes is one, and the fault injection controller sends fault indication information to that single node. For example, the target applications are the second and fourth applications. Both the second and fourth applications are deployed on node 1 of the infrastructure. In this case, the fault injection controller sends fault indication information to node 1.

[0098] In the fourth possible scenario, there are multiple target applications, deployed on multiple nodes of the infrastructure.

[0099] In this scenario, the number of target applications and target nodes is greater than one, and the fault injection controller sends fault indication information to these multiple nodes. For example, the target applications are Application 1, Application 2, and Application 4. Application 1 is deployed on nodes 1 through 3 of the infrastructure, while Applications 2 and 4 are both deployed on node 1 of the infrastructure. In this case, the fault injection controller sends fault indication information to nodes 1 through 3.
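The four deployment scenarios reduce to one rule: the target nodes are the union of the nodes on which each target application is deployed. A minimal sketch follows, assuming a hypothetical deployment map that mirrors the example in the fourth scenario; the names `DEPLOYMENT` and `resolve_target_nodes` are illustrative and not defined by this application.

```python
from typing import Dict, Iterable, List

# Hypothetical deployment map: application -> nodes on which it is deployed.
DEPLOYMENT: Dict[str, List[str]] = {
    "application_1": ["node_1", "node_2", "node_3"],
    "application_2": ["node_1"],
    "application_4": ["node_1"],
}


def resolve_target_nodes(target_applications: Iterable[str],
                         deployment: Dict[str, List[str]]) -> List[str]:
    """Return every node that hosts at least one target application."""
    nodes = set()
    for application in target_applications:
        nodes.update(deployment[application])
    # Sort so that the result is deterministic.
    return sorted(nodes)
```

Each node appears once in the result even if it hosts several target applications, which corresponds to node 1 in the third and fourth scenarios.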

[0100] Accordingly, the target application receives fault indication information. For example, the target application can use a request receiving layer to receive fault indication information. The request receiving layer can be responsible for receiving requests, invoking the business logic layer (or Service layer) or the data access object (DAO) layer to process the requests, and returning the processing results. For more information on request receiving layers, please refer to the description of general techniques; it will not be repeated here. The target application then executes S240 in response to the fault indication information.

[0101] S240, The intrinsic fault module of the target application injects the application's intrinsic fault into the target application's business module.

[0102] The process of injecting faults varies depending on the number of target applications and their deployment methods. The following sections will explain the different scenarios.

[0103] In scenario A, the target application consists of an application that is deployed on a single node.

[0104] Figure 4A is an example diagram of the first type of fault injection provided in this application. As shown in Figure 4A, the target application includes a first application, which is deployed on node 1 of the infrastructure. In this case, the fault injection controller sends fault indication information to node 1. Node 1 receives the fault indication information, and the inherent fault module of the first application on node 1 injects the application's inherent fault into the service module of the first application. The fault injection controller can send fault indication information to node 1 based on the fault injection control logic.

[0105] In scenario B, the target application includes multiple applications, and these multiple applications are deployed on a single node.

[0106] Figure 4B is an example diagram of the second type of fault injection provided in this application. As shown in Figure 4B, the target application includes a first application and a second application, and the first application and the second application are deployed on node 1 of the infrastructure. In this case, the fault injection controller sends fault indication information to node 1. Node 1 receives the fault indication information. The intrinsic fault module of the first application on node 1 injects an application intrinsic fault into the business module of the first application, and the intrinsic fault module of the second application on node 1 injects an application intrinsic fault into the business module of the second application. Similarly, the fault injection controller can send fault indication information to node 1 based on the fault injection control logic.

[0107] In scenario C, the target application consists of one application that is deployed on multiple nodes.

[0108] Figure 4C is an example diagram of the third type of fault injection provided in this application. As shown in Figure 4C, the target application includes a first application, which is deployed on nodes 1 to 3 of the infrastructure. In this case, the fault injection controller sends fault indication information to nodes 1 to 3. Nodes 1 to 3 receive the fault indication information. The intrinsic fault module of the first application at node 1 injects an application-intrinsic fault into the service module of the first application; the intrinsic fault module of the first application at node 2 injects an application-intrinsic fault into the service module of the first application; and the intrinsic fault module of the first application at node 3 injects an application-intrinsic fault into the service module of the first application. The fault injection controller can send fault indication information to nodes 1 to 3 based on the fault injection control logic.

[0109] In scenario D, the target application includes multiple applications and these multiple applications are deployed on multiple nodes.

[0110] Figure 4D is an example diagram of the fourth type of fault injection provided in this application. As shown in Figure 4D, the target application includes a first application and a second application. The first application is deployed on nodes 1 to 3 of the infrastructure, and the second application is deployed on node 3 of the infrastructure. In this case, the fault injection controller sends fault indication information to nodes 1 to 3. Nodes 1 to 3 receive the fault indication information. The intrinsic fault module of the first application on node 1 injects an application-intrinsic fault into the service module of the first application. The intrinsic fault module of the first application on node 2 injects an application-intrinsic fault into the service module of the first application. The intrinsic fault module of the first application on node 3 injects an application-intrinsic fault into the service module of the first application, and the intrinsic fault module of the second application on node 3 injects an application-intrinsic fault into the service module of the second application. The fault injection controller can send fault indication information to nodes 1 to 3 based on the fault injection control logic.

[0111] In some possible scenarios, application-native faults are implemented in code, such as by using first code to implement the application-native fault. Before injecting the application-native fault into the target application, first code for implementing the application-native fault can be written into the target application. Furthermore, upon receiving fault indication information, the target application can invoke and execute the first code to inject the application-native fault into the target application.

[0112] Depending on the needs of the actual application, the user can also send a service request carrying a cancel fault injection flag to the fault injection controller, so that the intrinsic fault module of the target application cancels the injection of application intrinsic faults into the service module of the target application. Carrying the cancel fault injection flag in the service request thus lets the user control the target application to cancel the injection, improving operational convenience. Specifically, the fault injection controller can also receive service requests. If the service request carries a cancel fault injection flag, the fault injection controller sends a first command to the target application. The first command carries information instructing the intrinsic fault module of the target application to cancel the injection of application intrinsic faults into the service module of the target application. Correspondingly, the target application receives the first command and, in response, causes its intrinsic fault module to cancel the injection of application intrinsic faults into its service module. This allows the service module of the target application to execute normal logic, that is, the logic for processing service requests when no application intrinsic fault is injected. Similar to the fault injection flag described in S210, the cancel fault injection flag includes, but is not limited to, control commands. For details regarding control commands and the method of carrying the cancel fault injection flag in the service request, please refer to the description above; it will not be repeated here.
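The injection and cancellation behavior described above can be illustrated with a small sketch in which the "first code" for each application intrinsic fault is a function already built into the application. The class `InherentFaultModule`, the `FIRST_CODE` table, and the fault names are assumptions made for illustration, not the implementation of this application.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], Optional[str]]


def normal_logic(request: str) -> Optional[str]:
    """Normal logic of the service module (illustrative)."""
    return f"ok:{request}"


def _unresponsive(request: str) -> Optional[str]:
    # Hypothetical first code for "application unresponsive": no response.
    return None


def _response_anomaly(request: str) -> Optional[str]:
    # Hypothetical first code for "application response anomaly".
    return "error:injected"


# First code written into the application in advance, keyed by fault name.
FIRST_CODE: Dict[str, Handler] = {
    "application_unresponsive": _unresponsive,
    "application_response_anomaly": _response_anomaly,
}


class InherentFaultModule:
    def __init__(self, business_logic: Handler) -> None:
        self._normal = business_logic
        self._active: Optional[Handler] = None

    def inject(self, fault: str) -> None:
        """Handle fault indication information by activating the first code."""
        if fault not in FIRST_CODE:
            raise ValueError(f"no first code for fault {fault!r}")
        self._active = FIRST_CODE[fault]

    def cancel(self) -> None:
        """Handle the first command: return to normal logic."""
        self._active = None

    def handle(self, request: str) -> Optional[str]:
        handler = self._active if self._active is not None else self._normal
        return handler(request)
```

Because the fault code lives inside the application, cancelling an injection only clears a piece of in-process state; nothing on the node outside the application is modified.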

[0113] The above, with accompanying diagrams, describes the process of injecting application-inherent faults into the target application's business modules using the target application's inherent fault module, and the process of canceling the injection of application-inherent faults. After injecting application-inherent faults into the target application's business modules, the target application can use fault injection logic to process all received business requests, or, depending on the actual application needs, it can use fault injection logic to process some of the received business requests and use normal logic to process the rest. Fault injection logic refers to the logic for handling business requests when an application-inherent fault is injected.

[0114] In some possible scenarios, the target application can intercept business requests that meet the fault logic injection conditions and process them using fault injection logic, while processing business requests that do not meet the conditions using normal logic. In this way, different processing logic is executed for different business requests, meeting the diverse needs of business processing.

[0115] The target application can use interceptors or filters to separate business requests that meet the fault logic injection conditions from those that do not. In this case, the target application's interceptors or filters intercept business requests based on the obtained fault logic injection conditions. If the first information carried by a business request meets the fault logic injection conditions, the business request is processed using fault injection logic. If the first information carried by a business request does not meet the fault logic injection conditions, the business request is processed using normal logic. Specifically, the following (1) to (3) can be used to implement the business request processing logic.

[0116] (1) The target application receives the service request.

[0117] The business request carries first information. The first information includes one or more of the following: user identifier, request type, and deployment environment.

[0118] The user identifier is used to indicate the user who sent the service request. The user identifier may include, but is not limited to, the user name and the user code.

[0119] The request type is used to indicate the scope of injected faults to be activated. The request type may include, but is not limited to, a first request type and a second request type. The first request type indicates the activation of all injected faults, and the business module that has activated all injected faults processes the business request. The second request type indicates the activation of some of the injected faults, and the business module that has activated some of the faults processes the business request. The request type and the scope of injected faults to be activated can be preset or set by the user according to the actual application needs; this application does not limit this.

[0120] For example, the application-inherent faults injected into the target application receiving the service request include faults 21 to 2i. If the first information carried by the service request includes the first request type, the target application receiving the service request activates the injected faults 21 to 2i and processes the service request after activating faults 21 to 2i. As another example, if the first information carried by the service request includes the second request type, the target application receiving the service request activates the injected faults 21 to 2j and processes the service request after activating faults 21 to 2j, where j is less than or equal to i.

[0121] The deployment environment is used to indicate the application's operational information. Operational information can refer to the application being in one of the following states: testing state, normal operation state, maintenance state, and fault state, etc.

[0122] (2) If the first information carried by the business request meets the fault logic injection condition, the target application will use the fault injection logic to process the business request.

[0123] The fault logic injection conditions include one or more of the following: the user identifier matches the set user identifier, the request type matches the set request type, and the deployment environment matches the set deployment environment. Fault injection logic refers to the logic with which the target application's business module processes business requests after the first code corresponding to an application-inherent fault has been executed. Fault logic injection conditions can be preset or set by the user according to the actual application needs; this application does not limit this. For example, the node receiving the business request receives the fault logic injection conditions issued by the fault injection controller.

[0124] For example, if the user identifier carried by the service request is Xiaohong, and the set user identifier is also Xiaohong, the target application receiving the service request can use fault injection logic to process the service request.

[0125] For example, if the request type is the first request type and the request type is also set to the first request type, the business module of the target application receiving the business request can use fault injection logic to process the business request.

[0126] For example, if the deployment environment is in a test state and is also set to a test state, the business module of the target application receiving the business request can use fault injection logic to process the business request.

[0127] (3) If the first information carried by the business request does not meet the fault logic injection conditions, the target application will process the business request using normal logic.

[0128] For example, if the first information carried by the business request does not meet the fault logic injection conditions, the business request is processed by the business logic layer using normal logic.
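Steps (1) to (3) can be sketched as a small interceptor that compares the first information of a request with the set conditions. The sketch assumes that every configured condition must match and that a request is never intercepted when no condition is configured; this application does not fix either choice, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstInformation:
    """First information carried by a business request; fields may be absent."""
    user_identifier: Optional[str] = None
    request_type: Optional[str] = None
    deployment_environment: Optional[str] = None


@dataclass(frozen=True)
class InjectionConditions:
    """Set values to match against; None means the condition is not configured."""
    user_identifier: Optional[str] = None
    request_type: Optional[str] = None
    deployment_environment: Optional[str] = None

    def is_met_by(self, info: FirstInformation) -> bool:
        pairs = [
            (self.user_identifier, info.user_identifier),
            (self.request_type, info.request_type),
            (self.deployment_environment, info.deployment_environment),
        ]
        configured = [(want, got) for want, got in pairs if want is not None]
        # Assumption: with no configured condition, nothing is intercepted.
        return bool(configured) and all(want == got for want, got in configured)


def select_logic(info: FirstInformation, conditions: InjectionConditions) -> str:
    """Return which processing logic the interceptor routes the request to."""
    if conditions.is_met_by(info):
        return "fault_injection_logic"
    return "normal_logic"
```

With the set user identifier `Xiaohong`, a request from Xiaohong is routed to fault injection logic, while requests from any other user continue to be processed with normal logic.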

[0129] In some possible scenarios, before the target application uses interceptors or filters to intercept or filter business requests, code for implementing interception or filtering functions can be written into the target application to achieve the interception and / or filtering of business requests. In this case, the software development framework can include: code for implementing interceptors or filters, code for implementing application-inherent faults, and business logic code for implementing the business logic of business modules. Depending on the needs of the actual application, the code for implementing filters or interceptors, fault injection logic, and business logic code can be encapsulated in one application, or they can be encapsulated in multiple applications; this application does not limit this. Figure 5 is an example diagram of a node architecture provided by this application. As shown in Figure 5, the node encapsulates the code for implementing interceptors or filters, the code for implementing application-inherent faults, and the business logic code for implementing business functions (or business logic) in one application.

[0130] The above text, with reference to the accompanying drawings, describes the process by which the target application handles business requests after an application intrinsic fault is injected. The following text, with reference to the accompanying drawings, describes the process of establishing a fault mode relation library.

[0131] Before the fault injection controller converts the fault into an application-native fault based on the fault mode relation library, the fault mode relation library can also be established using the method described in Figure 6. Figure 6 is a flowchart example of a method for establishing a fault mode relation library provided in this application. As shown in Figure 6, the method may include the following (1) to (4).

[0132] (1) The fault injection controller provides a configuration interface.

[0133] This configuration interface includes an application-inherent fault configuration area.

[0134] (2) The fault injection controller responds to the user’s first operation on the configuration interface and obtains the application-inherent fault from the application-inherent fault configuration area.

[0135] For example, a user performs a first operation in the application-inherent fault configuration area to configure application-inherent faults. In response to the user's first operation on the configuration interface, the fault injection controller obtains the application-inherent faults from the application-inherent fault configuration area. For instance, if the user performs the first operation to configure application-inherent faults including faults 21 to 2N, the fault injection controller, in response to the first operation, obtains the application-inherent faults including faults 21 to 2N.

[0136] In some possible scenarios, if the target application does not contain the code corresponding to the faults (such as fault 22) included in the user-configured application-inherent faults, a code input field is provided. In response to the user's interaction with the code input field, code implementing fault 22 is written to the target application.

[0137] (3) The fault injection controller responds to the user's second operation on the configuration interface and establishes the correspondence between application-inherent faults and faults.

[0138] For example, a user performs a second action in the configuration interface to input the mapping between application-inherent faults and faults. In response to the user's second action in the configuration interface, the mapping between application-inherent faults and faults is established.

[0139] (4) The fault injection controller adds the correspondence between application-inherent faults and faults to the fault mode relation library.

[0140] For example, the fault injection controller adds the correspondence established in (3) between application-inherent faults and faults to the fault mode relation library.

[0141] The above example illustrates the process of establishing a fault mode relation library from user-defined application-inherent faults and the user-defined correspondence between those application-inherent faults and faults. In some possible examples, the fault mode relation library can also be pre-defined; this application does not limit the method of obtaining the fault mode relation library.
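Steps (2) to (4) of Figure 6 can be sketched as follows, assuming that the configuration interface hands over the configured application-inherent faults as a list and the user-entered correspondences as a mapping. The function name and the validation rule (a correspondence may only reference a configured application-inherent fault) are illustrative assumptions.

```python
from typing import Dict, Iterable


def build_fault_mode_relation_library(
    configured_inherent_faults: Iterable[str],
    correspondences: Dict[str, str],
) -> Dict[str, str]:
    """Build a fault -> application-inherent fault mapping from user input."""
    known = set(configured_inherent_faults)
    library: Dict[str, str] = {}
    for fault, inherent_fault in correspondences.items():
        # Assumed rule: only configured application-inherent faults may be mapped.
        if inherent_fault not in known:
            raise ValueError(
                f"{inherent_fault!r} is not a configured application-inherent fault"
            )
        library[fault] = inherent_fault
    return library
```

For example, configuring `fault_21` and `fault_22` and mapping a node-level fault to `fault_21` yields a library with a single entry.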

[0142] The process of establishing a fault mode relation library has been described above with reference to the accompanying drawings. The application fault injection method provided in this application will be summarized below with reference to the accompanying drawings.

[0143] Figure 7 is an example diagram of an application fault injection method provided in this application. As shown in Figure 7, the process may include the following (1) to (9).

[0144] (1) The fault injection controller receives the fault injection request input by the user.

[0145] (2) In response to the fault injection request, the fault injection controller generates fault indication information.

[0146] This fault indication information is used to instruct the inherent fault module of the target application to inject faults into the business modules of the target application.

[0147] (3) The fault injection controller sends fault indication information to the nodes in the infrastructure that match the target application.

[0148] During this process, the fault injection controller can receive a service request (such as a first service request) that is sent by the user and carries a control command. The fault injection controller parses the first service request. If the first service request carries a fault injection identifier, the fault injection controller sends the fault indication information to the node matching the target application. If the first service request carries a cancel fault injection identifier, the fault injection controller discards the fault indication information.

[0149] (4) The node that matches the target application receives the fault indication information.

[0150] (5) In response to the fault indication information received by the node matching the target application, the intrinsic fault module of the target application injects the application intrinsic fault into the business module of the target application.

[0151] After the infrastructure executes (1) to (5) above so that the inherent fault module of the target application injects the application-inherent fault into the business module of the target application, it can also execute (6) to (9) below to process business requests. The following takes the case where the target application is the first application deployed on the first node of the infrastructure as an example to illustrate how the target application processes business requests after the fault is injected.

[0152] (6) The target application can use the request receiving layer to receive the second business request.

[0153] (7) The target application uses an interceptor or filter to filter the second service request. If the second service request meets the fault logic injection conditions, then execute (8). If the second service request does not meet the fault logic injection conditions, then execute (9).

[0154] (8) The target application uses fault injection logic to process business requests.

[0155] (9) The target application uses a business logic layer with normal logic to process business requests.
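The controller-side decision in step (3), where the fault indication information is sent when the first service request carries a fault injection identifier and discarded when it carries a cancel fault injection identifier, can be sketched as below. The identifier strings `"inject"` and `"cancel"` and the function name are hypothetical; this application does not define the encoding of the control command.

```python
from typing import List, Tuple


def dispatch_fault_indication(identifier: str, fault_indication: str,
                              target_nodes: List[str]) -> List[Tuple[str, str]]:
    """Return the (node, fault indication) messages the controller would send."""
    if identifier == "inject":
        # Send the fault indication information to every matching node.
        return [(node, fault_indication) for node in target_nodes]
    if identifier == "cancel":
        # Discard the fault indication information: nothing is sent.
        return []
    raise ValueError(f"unknown identifier {identifier!r}")
```

Returning the list of messages instead of sending them directly keeps the sketch free of any assumed network API.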

[0156] Upon receiving a user's fault injection request, the target application's inherent fault module directly injects faults into the target application's business modules. The scope of fault injection is limited to the target application's business modules, and the impact of the fault is limited to the target application's business. Because the scope of fault injection is controllable, the scope of fault impact is also controllable, which helps guarantee normal business processing. Furthermore, because the target application's inherent fault module injects the fault directly, no fault injection agent needs to be installed on the nodes where the target application is deployed, which improves ease of use, reduces the requirements on those nodes, and expands the application scenarios.

[0157] It is understood that, in order to achieve the functions in the above embodiments, the fault injection platform includes hardware structures and / or software modules corresponding to the execution of each function. Those skilled in the art should readily recognize that, based on the units and method steps described in conjunction with the embodiments disclosed in this application, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.

[0158] The method steps in this embodiment can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, portable hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and storage medium can reside in an ASIC. Alternatively, the ASIC can reside in a computing device. Of course, the processor and storage medium can also exist as discrete components in a network device or terminal device.

[0159] The application fault injection method provided in this embodiment has been described in detail above with reference to Figures 1 to 7. The application fault injection device provided in this embodiment will be described below with reference to Figure 8.

[0160] This application also provides an application fault injection device, and Figure 8 is a schematic diagram of the structure of an application fault injection device provided in this application. This application fault injection device can be used to implement the function of the fault injection controller in the above-described application fault injection method embodiments, so as to send fault indication information to the target application according to the fault injection request, and thus also achieve the beneficial effects of the above-described method embodiments. In this embodiment, the application fault injection device can be the computing device 900 in Figure 9 below, or a module (such as chip 1000) of the computing device shown in subsequent embodiments.

[0161] As shown in Figure 8, the application fault injection device 800 includes a transceiver module 810 and a processing module 820. The transceiver module 810 is used to acquire a fault injection request input by a user. The fault injection request includes target application information and fault mode information. The target application information indicates the target application into which the fault is to be injected, and the target application is one of at least one application. The fault mode information indicates the fault to be injected into the target application. The processing module 820 is used to generate fault indication information according to the fault injection request. The fault indication information instructs the inherent fault module of the target application to inject a fault into the business module of the target application. The transceiver module 810 is also used to send the fault indication information to the target application.

[0162] In some possible scenarios, the application fault injection device 800 stores a fault mode relation library, which is used to indicate the correspondence between faults and application-native faults. The processing module 820 is specifically used to: convert faults into application-native faults based on the fault mode relation library. The processing module 820 is also specifically used to: generate fault indication information based on target application information and application-native faults.
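
The conversion step can be illustrated with a simple lookup. The sketch below assumes the fault mode relation library can be modeled as a dictionary; the fault names and the application-native fault identifiers are invented for the example and are not taken from this application.

```python
# Hypothetical fault mode relation library: generic fault -> application-native fault.
FAULT_MODE_RELATION_LIBRARY = {
    "response_delay": "svc.sleep_before_reply",
    "request_failure": "svc.raise_internal_error",
}


def to_application_native_fault(fault: str, library: dict) -> str:
    """Convert a generic fault into an application-native fault."""
    try:
        return library[fault]
    except KeyError:
        raise ValueError(f"no application-native fault registered for {fault!r}")


def generate_fault_indication(target_application: str, fault: str, library: dict) -> dict:
    """Build fault indication information from the target application
    information and the converted application-native fault."""
    return {
        "target_application": target_application,
        "application_native_fault": to_application_native_fault(fault, library),
    }
```

Rejecting an unmapped fault, rather than forwarding it unchanged, is a design choice of this sketch: it keeps the inherent fault module from receiving a fault it may not understand.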

[0163] In some possible scenarios, the transceiver module 810 is also configured to: acquire application-native fault library configuration information input by the user. The processing module 820 is also configured to: generate an application-native fault library based on the application-native fault library configuration information.

[0164] In some possible scenarios, the infrastructure includes multiple nodes. In the case where the target application is deployed on a target node among the multiple nodes, the transceiver module 810 is specifically used to send the fault indication information to the target node.

[0165] In some possible scenarios, the infrastructure includes multiple nodes. When the target application includes a first target application and a second target application, with the first target application deployed on a first node and the second target application deployed on a second node, the transceiver module 810 is specifically used to: send the fault indication information to the first node. The transceiver module 810 is also specifically used to: send the fault indication information to the second node.
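
The single-node and two-node cases above can be sketched as one routing step. The sketch assumes the controller knows which node each application is deployed on; the `deployments` mapping and all names in it are hypothetical.

```python
def plan_dispatch(fault: str, target_applications: list, deployments: dict) -> dict:
    """Group fault indication information by the node that hosts each target application.

    deployments maps an application name to the name of the node it runs on
    (an assumed data structure for this example).
    """
    plan = {}
    for application in target_applications:
        node = deployments[application]
        plan.setdefault(node, []).append(
            {"target_application": application, "fault": fault}
        )
    return plan
```

For example, with a first target application on `node-1` and a second on `node-2`, the plan contains one entry per node, and nodes that host no target application receive nothing.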

[0166] In some possible scenarios, the processing module 820 is further configured to: provide a configuration interface. The configuration interface includes an application-inherent fault configuration area. The processing module 820 is further configured to: in response to a user's first operation on the configuration interface, retrieve application-inherent faults from the application-inherent fault configuration area.

[0167] In some possible scenarios, the processing module 820 is further configured to: establish a correspondence between application-inherent faults and faults in response to a second user operation on the configuration interface. The processing module 820 is also configured to: add the correspondence between application-inherent faults and faults to the fault mode relation library.
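
The two configuration operations can be sketched as follows, assuming the application-inherent faults read from the configuration area are available as a set and the fault mode relation library is a small in-memory class. The class, function, and fault names are illustrative assumptions.

```python
class FaultModeRelationLibrary:
    """Minimal in-memory stand-in for the fault mode relation library."""

    def __init__(self) -> None:
        self._inherent_by_fault = {}

    def add_correspondence(self, fault: str, inherent_fault: str) -> None:
        self._inherent_by_fault[fault] = inherent_fault

    def lookup(self, fault: str):
        return self._inherent_by_fault.get(fault)


def on_second_operation(library, configured_inherent_faults, fault, inherent_fault):
    """Handle the user's second operation: link a fault to an application-inherent
    fault that was obtained earlier from the configuration area."""
    if inherent_fault not in configured_inherent_faults:
        raise ValueError(f"{inherent_fault!r} is not in the configuration area")
    library.add_correspondence(fault, inherent_fault)
```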

[0168] In some possible scenarios, the transceiver module 810 is further configured to: receive a service request. The processing module 820 is further configured to: if the service request carries a fault injection cancellation flag, send a first command to the target application, where the first command instructs the target application's intrinsic fault module to cancel the injection of the fault into the target application's service module.
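
A minimal sketch of the cancellation path, assuming the service request is a dictionary and the flag is a boolean field. The field names and the shape of the first command are hypothetical.

```python
def handle_service_request(service_request: dict, send_to_application):
    """Send a cancel command only when the request carries the cancellation flag.

    send_to_application is any callable that delivers a command to the
    target application (an assumption of this sketch).
    """
    if not service_request.get("cancel_fault_injection", False):
        return None  # ordinary service request: nothing to cancel
    first_command = {
        "type": "cancel_fault_injection",
        "target_application": service_request["target_application"],
    }
    send_to_application(first_command)
    return first_command
```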

[0169] In some possible scenarios, the transceiver module 810 is specifically used to: obtain user permission information. The transceiver module 810 is also specifically used to: if the permission information meets the permission conditions, obtain a fault injection request.
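
The permission gate can be sketched as a check that runs before the request is read. This application does not specify the permission condition; the role-based condition and the names below are assumptions made for illustration.

```python
def acquire_request_if_permitted(permission_info: dict, read_request,
                                 required_role: str = "fault_injector"):
    """Obtain the fault injection request only if the permission condition is met.

    read_request is a callable that returns the user's request; it is not
    invoked at all when the permission check fails.
    """
    if required_role not in permission_info.get("roles", ()):
        return None
    return read_request()
```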

[0170] For more details on the transceiver module 810 and the processing module 820, please refer to the above description of the application fault injection method, which will not be repeated here.

[0171] When the application fault injection device 800 corresponds to the steps executed by the fault injection controller in the application fault injection method described in the embodiments of this application, the above and other operations and / or functions of each module in the application fault injection device 800 are respectively to implement the method flow executed by the fault injection controller in the foregoing figures.

[0172] It is worth noting that, if the above application fault injection device is implemented as software modules, the software modules can, for example, be provided to users through a cloud service subscription model, in which users choose different subscription levels according to their needs. As another example, the software modules can be offered as enterprise-level customized services, with domain-specific customization, personalized interfaces, and extended functions according to the needs of users or enterprises.

[0173] In addition, the application fault injection device 800 provided in this application can also be provided to users as a value-added service, and this application does not limit this.

[0174] The application fault injection device 800 in this embodiment can also be implemented in hardware, such as a computing device. Figure 9 is a schematic diagram of the structure of a computing device provided in this application. As shown in Figure 9, the computing device 900 includes: a communication interface 914, a processor 911, and a memory 912. The communication interface 914 is used to communicate with devices located outside the computing device 900. For example, a user inputs a fault injection request to the computing device 900 through the communication interface 914, and the computing device 900 obtains the processing result (such as generating fault indication information) based on the received fault injection request. The communication interface 914 can be an input / output (I / O) interface.

[0175] Processor 911 is the computing and control core of computing device 900. It can include a central processing unit (CPU), other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. In practical applications, computing device 900 may also include multiple processors. Processor 911 may include one or more processor cores. An operating system and other software programs run on processor 911, enabling it to access memory 912 and various peripheral component interconnect express (PCIe) devices.

[0176] The processor 911 is connected to the memory 912 via a bus 916. The bus 916 can be a double data rate (DDR) bus or another type of bus. The memory 912 is the main memory of the computing device 900 and is typically used to store the various software running in the operating system. To improve the access speed of the processor 911, the memory 912 needs to have a high access speed. In traditional computer devices, dynamic random access memory (DRAM) is typically used as the memory 912. Besides DRAM, the memory 912 can also be another random access memory, such as static random access memory (SRAM). Alternatively, the memory 912 can be a read-only memory (ROM), for example a programmable read-only memory (PROM) or an erasable programmable read-only memory (EPROM). This embodiment does not limit the number or type of memory 912.

[0177] For example, the processor 911 in Figure 9 can be implemented by a chip, as shown in Figure 10. Figure 10 is a schematic diagram of the structure of a chip provided in this application. For example, the chip 1000 includes a core 1001, a CPU 1002, a system buffer 1003, an input / output (I / O) device 1005, and a DDR 1006.

[0178] In this implementation, CPU 1002 accepts tasks (such as compression, decompression, and fault injection tasks) and calls core 1001 to execute those tasks. When chip 1000 has multiple cores 1001, CPU 1002 also handles task scheduling. For example, CPU 1002 can be implemented using an ARM processor, which is small, low-power, and uses a 32-bit reduced instruction set, offering simple and flexible addressing. Of course, in some implementations, CPU 1002 can also be implemented using other processors.

[0179] Core 1001 provides the computing power required for tasks such as fault injection. In one optional configuration, core 1001 includes a load / store unit (LSU), a cube computing unit, a scalar computing unit, a vector computing unit, and a buffer. The LSU loads data to be processed and stores processed data; it also manages read / write operations between different buffers within the core and performs format conversions. The cube computing unit provides the core computing power for matrix multiplication. The scalar computing unit is a single-instruction, single-data-stream unit.

[0180] A single instruction, single data (SISD) unit processes only one data item (usually an integer or floating-point number) at a time. A vector computation unit, also known as an array processor, can operate directly on an array or vector. The core may contain one or more buffers; here, the buffer primarily refers to the level 1 cache (L1 buffer). The buffer temporarily stores data that the core 1001 needs to access repeatedly, thus reducing bus reads and writes. Additionally, certain data format conversion functions require the source data to be located in the buffer. In this embodiment, because the buffer is located in the core, the distance between the cube computation unit and the storage area holding the data is shortened, and the cube computation unit accesses the DDR 1006 less often, which reduces data access latency and the data processing latency of the core.

[0181] System buffer 1003 mainly refers to the secondary cache, which is used to temporarily store input data, intermediate results or final results that have passed through the chip.

[0182] DDR 1006 is an off-chip memory that can be replaced by high bandwidth memory (HBM) or other off-chip memory. Located between the chip and external memory, DDR 1006 overcomes the access speed limitations of shared memory read / write operations in computing resource sharing.

[0183] The I / O devices 1005 included in chip 1000 refer to the hardware that performs data transmission, or devices that interface with the I / O interface. Common I / O devices 1005 include network cards, printers, keyboards, and mice. All external storage devices can also be used as I / O devices 1005, such as hard drives, floppy disks, and optical discs.

[0184] Core 1001, CPU 1002, system buffer 1003, I / O devices 1005, and DDR 1006 are connected via a bus. The bus may include a pathway for transferring information between the aforementioned components (such as CPU 1002 and system buffer 1003). In addition to a data bus, the bus may also include a power bus, a control bus, and a status signal bus. The bus may be a PCIe bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc. For example, core 1001 can access the I / O devices 1005 via the PCIe bus. Core 1001 is connected to system buffer 1003 via the DDR bus. Different system buffers 1003 may use different data buses to communicate with the core 1001, so the DDR bus can also be replaced with other types of data bus. This embodiment does not limit the bus type.

[0185] For example, after CPU 1002 loads the data to be processed for fault injection (such as a fault injection request) into DDR 1006, the LSU in core 1001 reads (loads) the data from DDR 1006, and the core processes the data to obtain the processing result (such as fault indication information). The LSU then writes (stores) the processing result back into DDR 1006, and the network interface card sends the processing result to the client device.

[0186] It is understood that the structure illustrated in this embodiment does not constitute a specific limitation on the computing device or processor. In other embodiments, the computing device or processor may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0187] This application also provides a computing device cluster. The computing device cluster includes at least one computing device, which may be a server. In some embodiments, the computing device may also be a desktop computer, a laptop computer, or a smartphone, or other terminal device.

[0188] As shown in Figure 11, which is a schematic diagram of a computing device cluster provided in this application, the computing device cluster includes at least one computing device 900. The memory 912 of one or more computing devices 900 in the computing device cluster may store the same instructions for executing application fault injection methods.

[0189] In some possible implementations, the memory 912 of one or more computing devices 900 in the computing device cluster may each store only part of the instructions for executing the application fault injection method. In other words, a combination of one or more computing devices 900 can jointly execute the instructions for the application fault injection method. In this case, one or more computing devices in the computing device cluster can implement the function of the fault injection controller, and other computing devices in the cluster can implement the function of the nodes matching the target application. For those other computing devices, the function of a single node among the nodes matching the target application can be implemented using one computing device, or the function of multiple nodes among the nodes matching the target application can be implemented using multiple computing devices; this application does not limit this.

[0190] It should be noted that the memories 912 in different computing devices 900 within the computing device cluster can store different instructions, which are used to execute the functions of the fault injection controller and the nodes matched with the target application, respectively. When the computing device cluster is used to implement the function of the fault injection controller, the instructions stored in the memories 912 of different computing devices 900 can implement the functions of one or more modules in the transceiver module 810 and the processing module 820. When the computing device cluster is used to implement the function of the fault injection platform, the instructions stored in the memories 912 of different computing devices 900 can implement the functions of the fault injection controller and one or more nodes in the distributed system. The following description uses the example of the computing device cluster being used to implement the function of the fault injection controller to illustrate the computing device cluster provided in this application.

[0191] In this scenario, one or more computing devices in a computing device cluster can be connected via a network. This network can be a wide area network (WAN) or a local area network (LAN). Figure 12 illustrates one possible implementation. As shown in Figure 12, this application provides a schematic diagram of a connection between computing devices, where two computing devices 900A and 900B are connected via a network. Specifically, they are connected to the network through communication interfaces in each computing device. In this type of possible implementation, the instructions stored in the memory 912 of computing device 900A can implement the functions implemented by the transceiver module 810. Simultaneously, the instructions stored in the memory 912 of computing device 900B can implement the functions implemented by the processing module 820.
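
The split between computing devices 900A and 900B can be sketched by passing serialized messages between two objects. This is only an illustration under stated assumptions: a direct function call stands in for the network, JSON is an assumed wire format, and the class and field names are hypothetical.

```python
import json


class ProcessingDevice:
    """Plays the role of computing device 900B (processing module 820)."""

    def handle_message(self, payload: bytes) -> bytes:
        request = json.loads(payload)
        indication = {
            "target_application": request["target_application"],
            "fault": request["fault_mode"],
        }
        return json.dumps(indication).encode("utf-8")


class TransceiverDevice:
    """Plays the role of computing device 900A (transceiver module 810)."""

    def __init__(self, network_call) -> None:
        # network_call stands in for a request sent over the WAN or LAN.
        self._network_call = network_call
        self.outbox = []  # indications that would be sent to the target application

    def on_user_request(self, target_application: str, fault_mode: str) -> dict:
        payload = json.dumps(
            {"target_application": target_application, "fault_mode": fault_mode}
        ).encode("utf-8")
        indication = json.loads(self._network_call(payload))
        self.outbox.append(indication)
        return indication
```

A usage example: construct `ProcessingDevice()`, pass its `handle_message` method to `TransceiverDevice`, and call `on_user_request`; only serialized bytes cross the boundary between the two objects, as they would between two networked devices.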

[0192] This application also provides a computer program product containing instructions. The computer program product may be a software or program product containing instructions capable of running on a computing device or stored on any usable medium. When the computer program product is run on at least one computing device, it causes the at least one computing device to execute an application fault injection method.

[0193] This application also provides a computer-readable storage medium. The computer-readable storage medium can be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive). The computer-readable storage medium includes instructions that instruct the computing device to perform the application fault injection method.

[0194] This application also provides a chip. The chip includes an interface circuit and a control circuit. The interface circuit is used to acquire a fault injection request, and the control circuit is used to implement the function of the fault injection controller in the application fault injection method.

[0195] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are performed entirely or partially. The computer can be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user equipment, or other programmable device. The computer program or instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another. For example, the computer program or instructions can be transferred from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; it can also be an optical medium, such as a digital video disc (DVD); or it can be a semiconductor medium, such as a solid-state drive (SSD).

[0196] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. An application fault injection method, characterized in that the method is applied to a fault injection controller, which runs on an infrastructure on which at least one application of a user is deployed, each of the at least one application including a business module and an inherent fault module, and the method includes: obtaining a fault injection request input by the user, wherein the fault injection request includes target application information and fault mode information, the target application information is used to indicate the target application into which a fault is to be injected, the target application belongs to the at least one application, and the fault mode information is used to indicate the fault to be injected into the target application; generating, based on the fault injection request, fault indication information, which is used to instruct the inherent fault module of the target application to inject the fault into the business module of the target application; and sending the fault indication information to the target application.

2. The method according to claim 1, characterized in that the infrastructure stores a fault mode relation library, which indicates the correspondence between the fault and application-native faults, and the step of generating fault indication information based on the fault injection request includes: converting the fault into an application-native fault based on the fault mode relation library; and generating the fault indication information based on the target application information and the application-native fault.

3. The method according to claim 1, characterized in that, before obtaining the fault injection request input by the user, the method further includes: obtaining application-native fault library configuration information input by the user; and generating an application-native fault library based on the application-native fault library configuration information.

4. The method according to any one of claims 1-3, characterized in that, The infrastructure includes multiple nodes. When the target application is deployed on a target node among these multiple nodes, sending the fault indication information to the target application includes: The fault indication information is sent to the target node.

5. The method according to any one of claims 1-3, characterized in that, The infrastructure includes multiple nodes. In the case where the target application includes a first target application and a second target application, the first target application is deployed on a first node among the multiple nodes, and the second target application is deployed on a second node among the multiple nodes,... Sending the fault indication information to the target application includes: Send the fault indication information to the first node; The fault indication information is sent to the second node.

6. The method according to any one of claims 1-5, characterized in that, Before generating fault indication information based on the fault injection request, the method further includes: A configuration interface is provided; the configuration interface includes an application-inherent fault configuration area. In response to the user's first operation on the configuration interface, the application-inherent fault is obtained from the application-inherent fault configuration area.

7. The method according to claim 6, characterized in that, The method further includes: In response to the user's second operation on the configuration interface, a correspondence between the application's inherent faults and the faults is established; Add the correspondence between the application's inherent faults and the faults to the fault mode relationship library.

8. The method according to any one of claims 1-7, characterized in that obtaining the fault injection request input by the user includes: obtaining the user's permission information; and, if the permission information meets the permission conditions, obtaining the fault injection request.

9. A fault injection device, characterized in that the device includes: a transceiver module, used to obtain a fault injection request input by a user, wherein the fault injection request includes target application information and fault mode information, the target application information is used to indicate the target application into which a fault is to be injected, the target application belongs to the at least one application, and the fault mode information is used to indicate the fault to be injected into the target application; and a processing module, configured to generate fault indication information based on the fault injection request, wherein the fault indication information is used to instruct the inherent fault module of the target application to inject the fault into the business module of the target application; wherein the transceiver module is also used to send the fault indication information to the target application.

10. The apparatus according to claim 9, characterized in that, The device stores a fault mode relationship database, which indicates the correspondence between the fault and application-inherent faults. The processing module is specifically used to: convert the fault into an application-native fault based on the fault mode relation library; The processing module is further specifically used to: generate the fault indication information based on the target application information and the application's inherent faults.

11. The apparatus according to claim 9, characterized in that, The transceiver module is further configured to: obtain the application-inherent fault configuration information input by the user; The processing module is further configured to: generate an application-specific intrinsic fault library based on the intrinsic fault configuration information.

12. The apparatus according to any one of claims 9-11, characterized in that, The infrastructure includes multiple nodes, with the target application deployed on a target node among these nodes. The transceiver module is specifically used to send the fault indication information to the target node.

13. The apparatus according to any one of claims 9-11, characterized in that, The infrastructure includes multiple nodes, and the target applications include a first target application and a second target application. The first target application is deployed on a first node among the multiple nodes, and the second target application is deployed on a second node among the multiple nodes. The transceiver module is specifically used to: send the fault indication information to the first node; The transceiver module is also specifically used to: send the fault indication information to the second node.

14. The apparatus according to any one of claims 9-13, characterized in that, The processing module is further configured to: provide a configuration interface; the configuration interface includes an application-inherent fault configuration area; The processing module is further configured to: in response to a user's first operation on the configuration interface, obtain the application-inherent fault from the application-inherent fault configuration area.

15. The apparatus according to claim 14, characterized in that, The processing module is further configured to: in response to a second operation by the user on the configuration interface, establish a correspondence between the application-inherent fault and the fault; The processing module is further configured to: add the correspondence between the application-inherent faults and the faults to the fault mode relationship library.

16. The apparatus according to any one of claims 9-14, characterized in that, The transceiver module is specifically used to: obtain the user's permission information; The transceiver module is further specifically used to: if the permission information meets the permission conditions, obtain the fault injection request.

17. A processor, characterized in that, The processor includes an interface circuit and a control circuit; the interface circuit is used to acquire a fault injection request and, in conjunction with the control circuit, execute the method of any one of claims 1-8.

18. A computing device cluster, characterized in that, The computing device cluster includes at least one computing device, and each computing device includes a processor and memory; The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method as described in any one of claims 1-8.

19. A computer-readable storage medium, characterized in that, The computer-readable storage medium includes computer instructions; when the computer instructions are executed in a computing device, the computing device performs the method of any one of claims 1-8.

20. A computer program product, characterized in that, When the computer program product is run in a computing device, the computing device performs the method of any one of claims 1-8.