Method, apparatus, and machine readable storage medium for commissioning an aerial work platform

A target reinforcement learning model built on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, with Actor and Critic networks, addresses the low debugging efficiency and poor controllability of aerial work vehicles and enables an efficient and reliable debugging process.

CN117284981B · Active · Publication Date: 2026-06-12 · ZOOMLION INTELLIGENT ACCESS MASCH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ZOOMLION INTELLIGENT ACCESS MASCH CO LTD
Filing Date: 2023-09-04
Publication Date: 2026-06-12

Abstract

The application discloses a method, device and machine readable storage medium for debugging aerial work machinery. The method comprises the following steps: determining a target machine; determining a variable step current debugging strategy according to a target reinforcement learning model; debugging the target machine according to the variable step current debugging strategy; judging whether the target machine completes the debugging; in the case that the target machine does not complete the debugging, obtaining a fixed step debugging strategy and the target machine which does not complete the debugging; and debugging the target machine which does not complete the debugging according to the fixed step debugging strategy to complete the debugging of the aerial work machinery. The application can improve the debugging efficiency, standardization and controllability of the aerial work engineering vehicle by using the target reinforcement learning model and the fixed step debugging strategy to debug the aerial work machinery.

Description

Technical Field

[0001] This application relates to the field of high-altitude work technology, and more specifically to a method, apparatus, and machine-readable storage medium for debugging high-altitude work machinery.

Background Technology

[0002] Aerial work platforms operate in complex environments and must adopt diverse equipment postures to suit the target scenario. For safety and comfort, commissioning workers must, before the equipment leaves the factory, adjust its current to an appropriate level for each operating posture so that the equipment completes each action transition within a specified time. However, because of the inherent systematic errors of the many components, the optimal current value for each action varies significantly from one piece of equipment to the next, which makes the work of commissioning personnel highly repetitive and time-consuming. Besides manual commissioning, fixed-step commissioning schemes have been successfully implemented, but they are inefficient. Variable-step schemes can make up for this inefficiency, but because relevant experience and data are lacking in the aerial work field, variable-step schemes are unstable and not widely used. Existing commissioning schemes for aerial work platforms therefore suffer from low efficiency, poor standardization, and poor controllability.

Summary of the Invention

[0003] The purpose of this application is to provide a method, apparatus, and machine-readable storage medium for debugging aerial work machinery, in order to solve the problems of low debugging efficiency, standardization, and controllability of aerial work vehicles in the prior art.

[0004] To achieve the above objectives, the first aspect of this application provides a method for commissioning aerial work machinery, the method comprising:

[0005] Acquire the target machinery;

[0006] Determine the variable step size current tuning strategy based on the target reinforcement learning model;

[0007] Debug the target machinery according to the variable-step current debugging strategy;

[0008] Determine whether the target machine has completed commissioning;

[0009] In the case where the target machine has not completed debugging, obtain the fixed-step debugging strategy and the target machine that has not completed debugging;

[0010] The target machinery that has not yet completed the debugging is debugged according to the fixed step length debugging strategy in order to complete the debugging of the aerial work machinery.

[0011] In this embodiment of the application, the method further includes:

[0012] A reinforcement learning model architecture is constructed based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm;

[0013] Determine the initial reinforcement learning model based on the reinforcement learning model architecture;

[0014] The initial reinforcement learning model is trained to obtain the target reinforcement learning model.

[0015] In this embodiment of the application, training the initial reinforcement learning model to obtain the target reinforcement learning model includes:

[0016] Obtain the initial current value and the maximum number of interactions;

[0017] Input the initial current value into the target machine to output the first round of current debugging results;

[0018] The results of the first round of current tuning are input into the initial reinforcement learning model to output the predicted current value in the first round.

[0019] The predicted current value from the previous round is used as the input value for debugging the target machinery, and the resulting current debugging result of the target machinery is used as the input value for the initial reinforcement learning model in the next round.

[0020] Determine if the maximum number of interactions has been reached;

[0021] When the maximum number of interactions is reached, all data from the interaction process will be used as training data.

[0022] The initial reinforcement learning model is trained based on the training data to obtain the target reinforcement learning model.

[0023] In this embodiment of the application, the reinforcement learning model architecture includes:

[0024] Multiple Actor networks are configured to determine action strategies based on the current environment;

[0025] Multiple Critic networks are configured to evaluate the value of action strategies.

[0026] In this embodiment of the application, the target reinforcement learning model satisfies formula (1):

[0027] $a_1 = f(t_{\mathrm{area}}, a_0, t_0)$; (1)

[0028] where $a_1$ is the current value set in the current program, $t_{\mathrm{area}}$ is the required range for the equipment attitude change time, $a_0$ is the current value set in the previous round of debugging, and $t_0$ is the time of the previous round of debugging action.

[0029] In this embodiment of the application, debugging the engineering vehicle according to the target reinforcement learning model includes:

[0030] Obtain the current value set in the previous round of debugging, the time of the previous round of debugging actions, and the required range of equipment attitude change time;

[0031] Based on the current value set in the previous round of debugging, the time of the previous round of debugging actions, and the required range of equipment attitude change time, the current value set in this program is output through the Actor network.

[0032] In this embodiment of the application, the method further includes:

[0033] Obtain the current value set in the previous round of debugging, the time of the previous round of debugging actions, the required range of device attitude change time, and the current value set in the current program;

[0034] Based on the current value set in the previous round of debugging, the time of the previous round of debugging actions, the required range of equipment attitude change time, and the current value set in this program, the decision value is output through the Critic network.

[0035] In this embodiment of the application, the method further includes:

[0036] After the target machinery that has not yet completed debugging is debugged according to the fixed step size debugging strategy, the target reinforcement learning model is optimized.

[0037] A second aspect of this application provides a device for debugging aerial work machinery, comprising:

[0038] The memory is configured to store instructions; and

[0039] The processor is configured to retrieve instructions from memory and, when executing the instructions, to implement the aforementioned method for debugging aerial work machinery.

[0040] A third aspect of this application provides a machine-readable storage medium storing instructions for causing a machine to perform the method described above for commissioning aerial work machinery.

[0041] The above technical solution acquires the target machinery and determines a variable-step current debugging strategy based on the target reinforcement learning model. The target machinery is then debugged according to the variable-step current debugging strategy. The debugging status of the target machinery is then determined. If the target machinery has not completed debugging, a fixed-step debugging strategy and the un-debugged target machinery are acquired. The un-debugged target machinery is then debugged according to the fixed-step debugging strategy to complete the debugging of the aerial work platform machinery. This improves the debugging efficiency, standardization, and controllability of the aerial work platform machinery.

[0042] Other features and advantages of the embodiments of this application will be described in detail in the following detailed description section.

Brief Description of the Drawings

[0043] The accompanying drawings are provided to further illustrate the embodiments of this application and form part of the specification. They are used together with the following detailed description to explain the embodiments of this application, but do not constitute a limitation on the embodiments of this application. In the drawings:

[0044] Figure 1 schematically shows a flowchart of a method for commissioning aerial work machinery according to an embodiment of this application;

[0045] Figure 2 schematically shows a flowchart of a dual-strategy debugging method according to a specific embodiment of this application;

[0046] Figure 3 schematically shows a structural block diagram of a device for commissioning aerial work machinery according to an embodiment of this application;

[0047] Figure 4 schematically shows a TD3 network structure according to a specific embodiment of this application.

Detailed Implementation

[0048] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only for illustration and explanation of the embodiments of this application and are not intended to limit the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.

[0049] It should be noted that if the embodiments of this application involve directional indicators (such as up, down, left, right, front, back, etc.), the directional indicators are only used to explain the relative positional relationship and movement of the components in a certain specific posture (as shown in the figure). If the specific posture changes, the directional indicators will also change accordingly.

[0050] Furthermore, if the embodiments of this application involve descriptions such as "first" or "second," these descriptions are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, features defined with "first" or "second" may explicitly or implicitly include at least one of those features. Additionally, the technical solutions of various embodiments can be combined with each other, but this must be based on the ability of those skilled in the art to implement them. If the combination of technical solutions is contradictory or impossible to implement, it should be considered that such a combination of technical solutions does not exist and is not within the scope of protection claimed in this application.

[0051] Figure 1 schematically shows a flowchart of a method for commissioning aerial work machinery according to an embodiment of this application. As shown in Figure 1, an embodiment of this application provides a method for debugging aerial work machinery, which may include the following steps.

[0052] Step 101: Obtain the target machinery;

[0053] Step 102: Determine the variable step size current tuning strategy based on the target reinforcement learning model;

[0054] Step 103: Adjust the target machinery according to the variable step current adjustment strategy;

[0055] Step 104: Determine whether the target machine has completed the debugging process;

[0056] Step 105: If the target machine has not completed debugging, obtain the fixed-step debugging strategy and the target machine that has not completed debugging;

[0057] Step 106: Adjust the target machinery that has not been adjusted according to the fixed step length adjustment strategy to complete the adjustment of the aerial work machinery.

[0058] In this embodiment, the target machinery is an aerial work platform machine to be debugged. The target reinforcement learning model refers to a model constructed for debugging the aerial work platform machine. The target reinforcement learning model can be obtained by writing a reinforcement learning model program based on the reinforcement learning model architecture and then training the reinforcement learning model program. By constructing the target reinforcement learning model, variable step current debugging of the aerial work platform machine can be achieved. After obtaining the target reinforcement learning model, a variable step current debugging strategy is determined based on the target reinforcement learning model. Then, the target machinery is debugged according to the variable step current debugging strategy. It is determined whether the target machinery has completed debugging. If the target machinery has completed debugging, the debugging of the aerial work platform machine is complete. If the target machinery has not completed debugging, a fixed step debugging strategy and the target machinery that has not completed debugging are obtained. Fixed step debugging is an automated debugging method. During the process, the equipment first tests a default current value, and then adjusts the current value by a fixed step each time based on the equipment's action feedback. By debugging the target machinery that has not completed debugging using the fixed step debugging strategy, the debugging of the aerial work platform machine can be completed. By employing a fixed-step debugging strategy when the target reinforcement learning model is unsuitable, the reliability of the overall debugging scheme can be enhanced.
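The flow of steps 101 to 106 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of this application: `toy_machine` is a hypothetical stand-in for a real vehicle (action time inversely proportional to current), the two policies are hand-written placeholders for the target reinforcement learning model, and all names, the round limits, and the numeric values are invented for the example.

```python
from typing import Callable, Optional, Tuple

TimeRange = Tuple[float, float]      # (t_min, t_max): required action time, in seconds
Machine = Callable[[float], float]   # current -> measured action time (hypothetical interface)
Policy = Callable[[TimeRange, float, float], float]  # (t_area, a0, t0) -> a1


def in_range(t: float, t_area: TimeRange) -> bool:
    return t_area[0] <= t <= t_area[1]


def variable_step_debug(run_action: Machine, policy: Policy, a_init: float,
                        t_area: TimeRange, max_rounds: int = 10) -> Optional[float]:
    """Variable-step strategy: the model proposes each new current value."""
    a = a_init
    for _ in range(max_rounds):
        t = run_action(a)
        if in_range(t, t_area):
            return a                  # debugging completed
        a = policy(t_area, a, t)      # step size is whatever the model decides
    return None                       # debugging not completed


def fixed_step_debug(run_action: Machine, a_default: float, step: float,
                     t_area: TimeRange, max_rounds: int = 200) -> Optional[float]:
    """Fixed-step strategy: test a default current, then move it by a constant step."""
    a = a_default
    for _ in range(max_rounds):
        t = run_action(a)
        if in_range(t, t_area):
            return a
        # Assumption: more current makes the action faster, so a slow action needs more current.
        a += step if t > t_area[1] else -step
    return None


def dual_strategy_debug(run_action: Machine, policy: Policy, a_init: float,
                        step: float, t_area: TimeRange) -> Tuple[Optional[float], str]:
    """Try the variable-step strategy first; fall back to fixed-step if it fails."""
    a = variable_step_debug(run_action, policy, a_init, t_area)
    if a is not None:
        return a, "variable-step"
    return fixed_step_debug(run_action, a_init, step, t_area), "fixed-step"


def toy_machine(a: float) -> float:
    """Hypothetical vehicle model: action time falls as current rises."""
    return 6000.0 / a


def proportional_policy(t_area: TimeRange, a0: float, t0: float) -> float:
    """Placeholder for a well-trained model: aim at the middle of the time range."""
    return a0 * t0 / (0.5 * (t_area[0] + t_area[1]))


def stuck_policy(t_area: TimeRange, a0: float, t0: float) -> float:
    """Placeholder for an unsuitable model: never changes the current."""
    return a0


good = dual_strategy_debug(toy_machine, proportional_policy, 400.0, 50.0, (9.0, 11.0))
fallback = dual_strategy_debug(toy_machine, stuck_policy, 400.0, 50.0, (9.0, 11.0))
```

With the toy machine, the placeholder for a suitable model finishes in the variable-step stage, while the unsuitable one exhausts its rounds and the fixed-step strategy completes the debugging instead.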

[0059] The above technical solution acquires the target machinery and determines a variable-step current debugging strategy based on the target reinforcement learning model. The target machinery is then debugged according to the variable-step current debugging strategy. The debugging status of the target machinery is then determined. If the target machinery has not completed debugging, a fixed-step debugging strategy and the un-debugged target machinery are acquired. The un-debugged target machinery is then debugged according to the fixed-step debugging strategy to complete the debugging of the aerial work platform machinery. This improves the debugging efficiency, standardization, and controllability of the aerial work platform machinery.

[0060] In this embodiment of the application, the method may further include:

[0061] A reinforcement learning model architecture is constructed based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm;

[0062] Determine the initial reinforcement learning model based on the reinforcement learning model architecture;

[0063] The initial reinforcement learning model is trained to obtain the target reinforcement learning model.

[0064] Specifically, when constructing the target reinforcement learning model, the reinforcement learning model architecture is first built based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The TD3 algorithm can autonomously learn and optimize strategies for specific engineering vehicle attitude transformation targets. For engineering vehicles whose many components increase systematic error, the TD3 algorithm can adapt to and handle these uncertainties; it generalizes better, which makes it suitable for complex high-altitude operations. This model architecture can consist of six neural networks: two Actor networks and four Critic networks. The Actor networks determine action strategies based on the current environment, while the Critic networks evaluate the value of the Actor networks' action strategies. The training objective of the Actor networks is to obtain higher Critic network evaluations for their action strategies, while the training objective of the Critic networks is to make their evaluations closer to the true value of the evaluation function. After constructing the reinforcement learning model architecture, the initial reinforcement learning model is determined based on the architecture. The model inputs are the required range of equipment attitude transformation time, the current value set in the previous round of debugging, and the time of the previous round of debugging actions; the output is the current value set in the current program. After obtaining the initial reinforcement learning model, the initial reinforcement learning model is trained to obtain the target reinforcement learning model.
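The count of six networks can be illustrated with the sketch below. It assumes the usual TD3 layout, in which one online Actor and two online Critics each have a slowly updated target copy; the text above gives only the totals, so this split is an assumption. Each network is reduced to a flat list of weights, and `TD3Networks`, `init_params`, `soft_update`, and the value of `tau` are hypothetical names and settings chosen for the example.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

Params = List[float]  # flattened weights of one network (placeholder for a real neural network)


def init_params(n: int, rng: random.Random) -> Params:
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


@dataclass
class TD3Networks:
    """Six networks: 2 Actors (online + target) and 4 Critics (2 online + 2 target)."""
    actor: Params
    critic_1: Params
    critic_2: Params
    actor_target: Params = field(init=False)
    critic_1_target: Params = field(init=False)
    critic_2_target: Params = field(init=False)

    def __post_init__(self) -> None:
        # Target networks start as exact copies of their online counterparts.
        self.actor_target = list(self.actor)
        self.critic_1_target = list(self.critic_1)
        self.critic_2_target = list(self.critic_2)

    def actors(self) -> Dict[str, Params]:
        return {"actor": self.actor, "actor_target": self.actor_target}

    def critics(self) -> Dict[str, Params]:
        return {
            "critic_1": self.critic_1,
            "critic_2": self.critic_2,
            "critic_1_target": self.critic_1_target,
            "critic_2_target": self.critic_2_target,
        }


def soft_update(target: Params, online: Params, tau: float) -> None:
    """Move each target weight a small fraction tau toward the online weight (in place)."""
    for i, w in enumerate(online):
        target[i] = (1.0 - tau) * target[i] + tau * w


rng = random.Random(0)
nets = TD3Networks(init_params(8, rng), init_params(10, rng), init_params(10, rng))
```

Keeping separate target copies is what lets the Critic training targets change slowly, which is one of the stabilizing measures TD3 relies on.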

[0065] In this embodiment of the application, training an initial reinforcement learning model to obtain a target reinforcement learning model may include:

[0066] Obtain the initial current value and the maximum number of interactions;

[0067] Input the initial current value into the target machine to output the first round of current debugging results;

[0068] The results of the first round of current tuning are input into the initial reinforcement learning model to output the predicted current value in the first round.

[0069] The predicted current value from the previous round is used as the input value for debugging the target machinery, and the resulting current debugging result of the target machinery is used as the input value for the initial reinforcement learning model in the next round.

[0070] Determine if the maximum number of interactions has been reached;

[0071] When the maximum number of interactions is reached, all data from the interaction process will be used as training data.

[0072] The initial reinforcement learning model is trained using the training data to obtain the target reinforcement learning model. Specifically, once the initial reinforcement learning model has been obtained, it can be trained. The initial reinforcement learning model program is deployed to a mobile device and allowed to interact freely with the engineering vehicle. In the first round, a default current value, i.e., the initial current value, is used: the initial current value is input to the target machine, which outputs the first-round current debugging result. That result is input to the initial reinforcement learning model, which outputs the first-round predicted current value, and this predicted value becomes the input for the second round of debugging the target machine. The process repeats: the predicted current value from the previous round is the input for debugging the target machine, and the resulting current debugging result is the input to the initial reinforcement learning model in the next round. During the interaction, it is determined whether the maximum number of interactions has been reached; if so, all data from the interaction process is taken as training data. The initial reinforcement learning model adjusts the initial current value based on the current debugging result obtained in the first round. For example, if the device moves too quickly, an ideal model would lower the current below the initial value, whereas an untrained reinforcement learning model often raises or lowers the current at random. If, after an adjustment, the device's action time moves closer to the target time range, the adjustment was in the right direction. The data generated in this process can be used as training data. Finally, the initial reinforcement learning model can be trained using the training data to obtain the target reinforcement learning model.
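The interaction loop described above can be sketched as follows. The sketch rests on several assumptions that are not part of this application: the `reward` function (zero inside the target time range, negative distance outside it) is one possible way to encode "closer to the target range is better", `toy_machine` is a hypothetical vehicle model, and the untrained model is imitated by a seeded random perturbation with invented current limits.

```python
import random
from typing import Callable, List, Tuple

TimeRange = Tuple[float, float]
State = Tuple[float, float, float, float]        # (t_min, t_max, a0, t0)
Transition = Tuple[State, float, float, State]   # (state, action a1, reward, next state)


def reward(t: float, t_area: TimeRange) -> float:
    """Hypothetical reward: 0 inside the required range, negative distance outside it."""
    t_min, t_max = t_area
    if t < t_min:
        return -(t_min - t)
    if t > t_max:
        return -(t - t_max)
    return 0.0


def collect_interactions(run_action: Callable[[float], float],
                         policy: Callable[[TimeRange, float, float], float],
                         a_init: float, t_area: TimeRange,
                         max_interactions: int) -> List[Transition]:
    """Alternate between machine and model until the maximum number of interactions."""
    data: List[Transition] = []
    a, t = a_init, run_action(a_init)      # first round uses the initial current value
    for _ in range(max_interactions):
        state = (t_area[0], t_area[1], a, t)
        a_next = policy(t_area, a, t)      # model predicts the next current value
        t_next = run_action(a_next)        # machine returns the next debugging result
        next_state = (t_area[0], t_area[1], a_next, t_next)
        data.append((state, a_next, reward(t_next, t_area), next_state))
        a, t = a_next, t_next
    return data


def toy_machine(a: float) -> float:
    """Hypothetical vehicle model: action time falls as current rises."""
    return 6000.0 / a


def make_untrained_policy(seed: int, a_min: float = 100.0, a_max: float = 1000.0):
    """Imitates an untrained model that raises or lowers the current at random."""
    rng = random.Random(seed)

    def policy(t_area: TimeRange, a0: float, t0: float) -> float:
        return min(a_max, max(a_min, a0 + rng.uniform(-100.0, 100.0)))

    return policy


data = collect_interactions(toy_machine, make_untrained_policy(0), 400.0, (9.0, 11.0), 20)
```

Each stored tuple links one round to the next, so the collected list can be fed directly to an off-policy learner such as TD3.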

[0073] In this embodiment of the application, the reinforcement learning model architecture may include:

[0074] Multiple Actor networks are configured to determine action strategies based on the current environment;

[0075] Multiple Critic networks are configured to evaluate the value of action strategies.

[0076] Specifically, the reinforcement learning model architecture can consist of six neural networks: two Actor networks and four Critic networks. The Actor networks determine action strategies based on the current environment, while the Critic networks evaluate the value of the Actor networks' action strategies. The training goal of the Actor networks is to obtain higher evaluations from the Critic networks for their action strategies. The training goal of the Critic networks is to make their evaluations closer to the true value of the evaluation function.

[0077] In this embodiment of the application, the target reinforcement learning model can satisfy formula (1):

[0078] $a_1 = f(t_{\mathrm{area}}, a_0, t_0)$; (1)

[0079] where $a_1$ is the current value set in the current program, $t_{\mathrm{area}}$ is the required range for the equipment attitude change time, $a_0$ is the current value set in the previous round of debugging, and $t_0$ is the time of the previous round of debugging action.

[0080] Specifically, the target reinforcement learning model can satisfy the formula $a_1 = f(t_{\mathrm{area}}, a_0, t_0)$. The output $a_1$ of the target reinforcement learning model is the current value set in the current program, and the inputs of the target reinforcement learning model are $t_{\mathrm{area}}$, $a_0$, and $t_0$. Here, $t_{\mathrm{area}}$ represents the required range for the equipment attitude change time, $a_0$ represents the current value set in the previous round of debugging, and $t_0$ represents the time of the previous round of debugging action. By constructing a target reinforcement learning model to debug aerial work machinery, the accuracy and efficiency of debugging can be improved.

[0081] In this embodiment of the application, debugging the engineering vehicle according to the target reinforcement learning model may include:

[0082] Obtain the current value set in the previous round of debugging, the time of the previous round of debugging actions, and the required range of equipment attitude change time;

[0083] Based on the current value set in the previous round of debugging, the time of the previous round of debugging actions, and the required range of equipment attitude change time, the current value set in this program is output through the Actor network.

[0084] Specifically, when debugging the engineering vehicle based on the target reinforcement learning model, the current value set in the previous debugging round, the time of the previous debugging action, and the required range of the equipment attitude change time can be obtained first. The current value set in the previous debugging round, the time of the previous debugging action, and the required range of the equipment attitude change time are then input into the Actor network. The Actor network can output the current value set in the current program.
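A minimal sketch of such an Actor forward pass is given below. The layer sizes, the random untrained weights, the normalization constant `t_scale`, and the current limits `a_min` and `a_max` are all assumptions made for illustration; this application does not specify the network shape. The required time range is represented here by its two bounds, so the network takes four input values.

```python
import math
import random
from typing import List, Sequence, Tuple

TimeRange = Tuple[float, float]


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class Actor:
    """Tiny two-layer network mapping (t_area, a0, t0) to a bounded current a1."""

    def __init__(self, a_min: float, a_max: float, t_scale: float = 60.0,
                 hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.a_min, self.a_max, self.t_scale = a_min, a_max, t_scale
        # Random, untrained weights; training would adjust these.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]]
        self.b2 = [0.0]

    def __call__(self, t_area: TimeRange, a0: float, t0: float) -> float:
        # Scale inputs to roughly unit size (scaling constants are assumptions).
        x = [t_area[0] / self.t_scale, t_area[1] / self.t_scale,
             a0 / self.a_max, t0 / self.t_scale]
        h = [max(0.0, v) for v in dense(x, self.w1, self.b1)]   # ReLU hidden layer
        y = math.tanh(dense(h, self.w2, self.b2)[0])            # squashed to [-1, 1]
        # Map [-1, 1] onto the allowed current range.
        return self.a_min + 0.5 * (y + 1.0) * (self.a_max - self.a_min)


actor = Actor(a_min=100.0, a_max=1000.0)
a1 = actor((9.0, 11.0), 400.0, 15.0)
```

Squashing the output with `tanh` and rescaling keeps every proposed current inside a fixed interval, which is a common choice when the action must respect hardware limits.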

[0085] In this embodiment of the application, the method may further include:

[0086] Obtain the current value set in the previous round of debugging, the time of the previous round of debugging actions, the required range of device attitude change time, and the current value set in the current program;

[0087] Based on the current value set in the previous round of debugging, the time of the previous round of debugging actions, the required range of equipment attitude change time, and the current value set in this program, the decision value is output through the Critic network.

[0088] Specifically, after the current value set in the current program is obtained through the Actor network, the current value set in the previous round of debugging, the time of the previous round of debugging actions, the required range of device attitude change time, and the current value set in the current program are input together into the Critic network. The Critic network can output the decision value. The decision value is the value of the Actor network's action strategy.
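The Critic's inputs and the way TD3 typically builds a training target from two Critics can be sketched as follows. The `linear_critic` is a deliberately trivial placeholder for a neural network, and the reward and discount factor `gamma` are assumptions, since this application does not define them. The `td3_target` function follows the standard TD3 rule of taking the smaller of the two target-Critic estimates; target policy smoothing noise is omitted for brevity.

```python
from typing import List, Sequence, Tuple

TimeRange = Tuple[float, float]


def critic_input(t_area: TimeRange, a0: float, t0: float, a1: float) -> List[float]:
    """State (t_area, a0, t0) concatenated with the Actor's action a1."""
    return [t_area[0], t_area[1], a0, t0, a1]


def linear_critic(x: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Placeholder Critic: a single linear layer that returns one decision value."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def td3_target(reward: float, q1_next: float, q2_next: float,
               gamma: float = 0.99, done: bool = False) -> float:
    """Clipped double-Q target: use the smaller of the two target-Critic estimates."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


x = critic_input((9.0, 11.0), 400.0, 15.0, 600.0)
q = linear_critic(x, [0.0, 0.0, 0.0, -0.1, 0.001])   # arbitrary example weights
y = td3_target(reward=-1.0, q1_next=2.0, q2_next=3.0, gamma=0.5)
```

Taking the minimum of the two estimates counteracts the tendency of a single Critic to overestimate action values.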

[0089] In this embodiment of the application, the method may further include:

[0090] After the target machinery that has not yet completed debugging is debugged according to the fixed step size debugging strategy, the target reinforcement learning model is optimized.

[0091] Specifically, when the target machine is not fully debugged, a fixed-step debugging strategy is needed to debug it. At this point, the target reinforcement learning model needs to be optimized. After debugging the target machine according to the fixed-step debugging strategy, the target reinforcement learning model is trained again to improve its debugging accuracy.
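One possible way to feed a fixed-step run back into later training is sketched below. This application states only that the target reinforcement learning model is trained again; storing the fixed-step trajectory in a replay buffer is an assumption of this example, and `ReplayBuffer`, `record_fixed_step_run`, the buffer capacity, and the sample trajectory are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

TimeRange = Tuple[float, float]
State = Tuple[float, float, float, float]   # (t_min, t_max, a0, t0)
Transition = Tuple[State, float, State]     # (state, action a1, next state)


class ReplayBuffer:
    """Bounded store of transitions; the oldest entries are dropped when full."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def record_fixed_step_run(buffer: ReplayBuffer, t_area: TimeRange,
                          trajectory: Sequence[Tuple[float, float]]) -> int:
    """Convert consecutive (current, action time) pairs into transitions."""
    added = 0
    for (a0, t0), (a1, t1) in zip(trajectory, trajectory[1:]):
        buffer.add(((t_area[0], t_area[1], a0, t0), a1, (t_area[0], t_area[1], a1, t1)))
        added += 1
    return added


buffer = ReplayBuffer(capacity=1000)
run = [(400.0, 15.0), (450.0, 13.3), (500.0, 12.0), (550.0, 10.9)]  # made-up measurements
n_added = record_fixed_step_run(buffer, (9.0, 11.0), run)
batch = buffer.sample(2)
```

Because the fixed-step run ends inside the required time range, its transitions show the model a successful path for exactly the kind of machine it previously failed on.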

[0092] The above technical solution acquires the target machinery and determines a variable-step current debugging strategy based on the target reinforcement learning model. The target machinery is then debugged according to the variable-step current debugging strategy. The debugging status of the target machinery is then determined. If the target machinery has not completed debugging, a fixed-step debugging strategy and the un-debugged target machinery are acquired. The un-debugged target machinery is then debugged according to the fixed-step debugging strategy to complete the debugging of the aerial work platform machinery. This improves the debugging efficiency, standardization, and controllability of the aerial work platform machinery.

[0093] Figure 2 schematically shows a flowchart of a dual-strategy debugging method according to a specific embodiment of this application. As shown in Figure 2, a specific embodiment of this application provides a dual-strategy debugging method. In this embodiment, a TD3 reinforcement learning model (i.e., the target reinforcement learning model of this application) is first constructed. The TD3 reinforcement learning model is then trained interactively using an engineering vehicle prototype. After the target reinforcement learning model is obtained through training, it is used to perform variable-step current debugging on a batch of un-debugged engineering vehicles, and it is determined whether the debugging has been fully completed. If it has, the debugging of the aerial work platform is complete. If it has not, a fixed-step debugging strategy is invoked to debug the engineering vehicles that remain un-debugged. After that debugging is completed, the target reinforcement learning model also undergoes subsequent optimization to improve its accuracy.

[0094] Figure 3 schematically shows a structural block diagram of a device for commissioning aerial work machinery according to an embodiment of this application. As shown in Figure 3, an embodiment of this application provides a device for debugging aerial work machinery, which may include:

[0095] Memory 310 is configured to store instructions; and

[0096] The processor 320 is configured to retrieve instructions from the memory 310 and, when executing the instructions, to implement the aforementioned method for debugging aerial work machinery.

[0097] Specifically, in this embodiment of the application, the processor 320 can be configured to:

[0098] Acquire the target machinery;

[0099] Determine the variable step size current tuning strategy based on the target reinforcement learning model;

[0100] Debug the target machinery according to the variable-step current debugging strategy;

[0101] Determine whether the target machine has completed commissioning;

[0102] In the case where the target machine has not completed debugging, obtain the fixed-step debugging strategy and the target machine that has not completed debugging;

[0103] The target machinery that has not yet completed the debugging is debugged according to the fixed step length debugging strategy in order to complete the debugging of the aerial work machinery.

[0104] Furthermore, the processor 320 can also be configured as follows:

[0105] A reinforcement learning model architecture is constructed based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm;

[0106] Determine the initial reinforcement learning model based on the reinforcement learning model architecture;

[0107] The initial reinforcement learning model is trained to obtain the target reinforcement learning model.

[0108] Furthermore, the processor 320 can also be configured as follows:

[0109] Training the initial reinforcement learning model to obtain the target reinforcement learning model includes:

[0110] Obtain the initial current value and the maximum number of interactions;

[0111] Input the initial current value into the target machine to output the first round of current debugging results;

[0112] The results of the first round of current tuning are input into the initial reinforcement learning model to output the predicted current value in the first round.

[0113] The predicted current value from the previous round is used as the input value for debugging the target machinery, and the resulting current debugging result of the target machinery is used as the input value for the initial reinforcement learning model in the next round.

[0114] Determine if the maximum number of interactions has been reached;

[0115] When the maximum number of interactions is reached, all data from the interaction process will be used as training data.

[0116] The initial reinforcement learning model is trained based on the training data to obtain the target reinforcement learning model.

Furthermore, the processor 320 can also be configured as follows:

[0117] Reinforcement learning model architectures include:

[0118] Multiple Actor networks are configured to determine action strategies based on the current environment;

[0119] Multiple Critic networks are configured to evaluate the value of action strategies.

[0120] Furthermore, the processor 320 can also be configured as follows:

[0121] The target reinforcement learning model satisfies formula (1):

[0122] $a_1 = f(t_{\mathrm{area}}, a_0, t_0)$; (1)

[0123] where $a_1$ is the current value set in the current program, $t_{\mathrm{area}}$ is the required range for the equipment attitude change time, $a_0$ is the current value set in the previous round of debugging, and $t_0$ is the time of the previous round of debugging action.

[0124] Furthermore, the processor 320 can also be configured as follows:

[0125] Debugging the engineering vehicle based on the target reinforcement learning model includes:

[0126] Obtain the current value set in the previous round of debugging, the time of the previous round's debugging action, and the required range of the equipment attitude change time;

[0127] Based on the current value set in the previous round of debugging, the time of the previous round's debugging action, and the required range of the equipment attitude change time, output the current value set in the current round of debugging through the Actor network.
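The Actor step above can be pictured as a small function from the three inputs to one current value. The sketch below is only an assumption-laden illustration: the class `TinyActor`, its layer sizes, the input scaling, and the current limits `i_min` and `i_max` are hypothetical, the required time range is represented as a pair `(t_low, t_high)`, and the weights are random rather than trained.

```python
import math
import random
from typing import List


class TinyActor:
    """Illustrative one-hidden-layer Actor: (t_low, t_high, a0, t0) -> a1."""

    def __init__(self, hidden: int = 8, i_min: float = 200.0,
                 i_max: float = 800.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Untrained random weights; a real Actor would be obtained by training.
        self.w1: List[List[float]] = [
            [rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(hidden)
        ]
        self.w2: List[float] = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.i_min, self.i_max = i_min, i_max

    def act(self, t_low: float, t_high: float, a0: float, t0: float) -> float:
        span = self.i_max - self.i_min
        # Crude input scaling (assumed: times on the order of 10 s).
        x = [t_low / 10.0, t_high / 10.0, (a0 - self.i_min) / span, t0 / 10.0]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        y = math.tanh(sum(w * hi for w, hi in zip(self.w2, h)))  # lies in [-1, 1]
        # Map the squashed output onto the admissible current range.
        return self.i_min + (y + 1.0) / 2.0 * span


actor = TinyActor()
a1 = actor.act(t_low=3.9, t_high=4.1, a0=400.0, t0=5.0)
```

Squashing the output with `tanh` and rescaling it keeps every proposed current inside the assumed limits, which is one simple way to make the Actor's output safe to apply.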

[0128] Furthermore, the processor 320 can also be configured as follows:

[0129] Obtain the current value set in the previous round of debugging, the time of the previous round's debugging action, the required range of the equipment attitude change time, and the current value set in the current round of debugging;

[0130] Based on the current value set in the previous round of debugging, the time of the previous round's debugging action, the required range of the equipment attitude change time, and the current value set in the current round of debugging, output the decision value through the Critic network.
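In contrast to the Actor, the Critic takes the chosen current as an additional input and returns a single score. A minimal sketch, assuming inputs that are already normalised to roughly unit scale; `make_critic` and `q_value` are hypothetical names and the weights are untrained:

```python
import math
import random
from typing import Callable, Sequence


def make_critic(hidden: int = 8, seed: int = 1) -> Callable[[Sequence[float], float], float]:
    """Return an untrained, illustrative Critic Q(state, action) -> float."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def q_value(state: Sequence[float], action: float) -> float:
        # state = (t_low, t_high, a0, t0); the action a1 is appended as a fifth input.
        x = list(state) + [action]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return sum(w * hi for w, hi in zip(w2, h))  # scalar decision value

    return q_value


critic = make_critic()
state = (0.39, 0.41, 0.33, 0.5)  # assumed normalised inputs
q = critic(state, 0.5)
```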

[0131] Furthermore, the processor 320 can also be configured as follows:

[0132] After the target machinery that has not yet completed debugging is debugged according to the fixed step size debugging strategy, the target reinforcement learning model is optimized.

[0133] The above technical solution acquires the target machinery and determines a variable-step current debugging strategy based on the target reinforcement learning model. The target machinery is then debugged according to the variable-step current debugging strategy. The debugging status of the target machinery is then determined. If the target machinery has not completed debugging, a fixed-step debugging strategy and the un-debugged target machinery are acquired. The un-debugged target machinery is then debugged according to the fixed-step debugging strategy to complete the debugging of the aerial work platform machinery. This improves the debugging efficiency, standardization, and controllability of the aerial work platform machinery.
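The two-phase flow summarised above (variable-step adjustment first, fixed-step fallback second) could be organised roughly as below. This is a sketch under stated assumptions: the function `debug_machine`, the round limits, the 5-unit step, and the toy machine are hypothetical, debugging is taken to be complete when the measured action time lies within the required range, and a higher current is assumed to shorten the action time.

```python
from typing import Callable, Tuple


def debug_machine(
    run_machine: Callable[[float], float],
    propose_current: Callable[[float, float, float, float], float],
    t_low: float,
    t_high: float,
    a0: float,
    max_rl_rounds: int = 5,
    step: float = 5.0,
    max_fixed_rounds: int = 50,
) -> Tuple[float, str]:
    """Return (final current, phase that completed the debugging)."""
    current, t = a0, run_machine(a0)
    if t_low <= t <= t_high:
        return current, "initial"

    # Phase 1: variable-step adjustment proposed by the learned model.
    for _ in range(max_rl_rounds):
        current = propose_current(t_low, t_high, current, t)
        t = run_machine(current)
        if t_low <= t <= t_high:
            return current, "variable-step"

    # Phase 2: fixed-step fallback (assumed: more current means a shorter time).
    for _ in range(max_fixed_rounds):
        current += step if t > t_high else -step
        t = run_machine(current)
        if t_low <= t <= t_high:
            return current, "fixed-step"

    raise RuntimeError("debugging did not converge within the round limits")


def toy_machine(current: float) -> float:
    return 2000.0 / current  # purely illustrative plant


# A model that proposes a suitable current finishes in phase 1.
good = debug_machine(toy_machine, lambda lo, hi, a, t: 500.0, 3.9, 4.1, 400.0)
# A model that never changes the current forces the fixed-step fallback.
stuck = debug_machine(toy_machine, lambda lo, hi, a, t: a, 3.9, 4.1, 400.0)
```

The fallback bounds the worst case: even when the learned strategy makes no progress, the fixed-step search still walks the current towards the required range.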

[0134] Figure 4 illustrates a TD3 network architecture according to a specific embodiment of this application. As shown in Figure 4, a specific embodiment of this application provides a TD3 network structure. In this embodiment, the TD3 network may include 2 Actor networks and 4 Critic networks. In the figure, $S_t$ represents the environment in which the agent is located at time t; in the scenario of this invention, $S_t$ comprises the required range of the equipment attitude change time, the current value set in the previous round of debugging, and the time of the previous round's debugging action. $a_t$ is the agent's decision at time t; in the scenario of this invention, $a_t$ is the current value output by the agent in the current round of debugging. $Q_0(S_t, a_t)$ is the value assigned to the current output by the agent in the current round of debugging. The Actor network can determine the action strategy based on the current environment, while the Critic network can evaluate the value of the Actor network's action strategy. The training goal of the Actor network is to obtain a higher evaluation from the Critic network for its action strategy, while the training goal of the Critic network is to make its evaluation closer to the true value of the evaluation function.
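The count of 2 Actor networks and 4 Critic networks is consistent with the standard TD3 formulation, in which one online Actor and two online Critics each have a slowly updated target copy. The sketch below shows how the Critic training target is formed in that standard formulation; it is not taken from this application. The reward, discount factor `gamma`, noise parameters, and update rate `tau` are assumptions, and the function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

State = Sequence[float]


def td3_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], float],
    target_q1: Callable[[State, float], float],
    target_q2: Callable[[State, float], float],
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    a_low: float = -1.0,
    a_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Standard TD3 target: smoothed target action plus clipped double-Q."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(a_low, min(a_high, target_actor(next_state) + noise))
    # Clipped double-Q: take the smaller of the two target Critic estimates.
    q_min = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


def soft_update(target: List[float], online: List[float], tau: float = 0.005) -> List[float]:
    """Move each target weight a fraction tau towards its online counterpart."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


y = td3_target(1.0, (0.0,), False, lambda s: 0.5,
               lambda s, a: 2.0, lambda s, a: 3.0, gamma=0.5, noise_std=0.0)
y_done = td3_target(1.0, (0.0,), True, lambda s: 0.5,
                    lambda s, a: 2.0, lambda s, a: 3.0, gamma=0.5, noise_std=0.0)
new_target = soft_update([0.0], [1.0], tau=0.5)
```

Taking the minimum of the two target Critics counteracts value overestimation, which is the usual motivation for keeping a pair of Critics rather than one.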

[0136] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0137] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0138] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0139] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0140] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0141] Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0142] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0143] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0144] The above are merely embodiments of this application and are not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. A method for debugging aerial work machinery, characterized in that the method includes: determining a target machine; determining a variable-step current debugging strategy according to a target reinforcement learning model; debugging the target machine according to the variable-step current debugging strategy; determining whether the target machine has completed debugging; in the case where the target machine has not completed debugging, obtaining a fixed-step debugging strategy and the target machine that has not completed debugging; and debugging the target machine that has not completed debugging according to the fixed-step debugging strategy to complete the debugging of the aerial work machinery; wherein the target reinforcement learning model satisfies formula (1): $a_1 = f(t_{area}, a_0, t_0)$ (1), where $a_1$ is the current value set in the current round of debugging, $t_{area}$ is the required range of the equipment attitude change time, $a_0$ is the current value set in the previous round of debugging, and $t_0$ is the time of the previous round's debugging action.

2. The method according to claim 1, characterized in that the method further includes: constructing a reinforcement learning model architecture based on the twin delayed deep deterministic policy gradient (TD3) algorithm; determining an initial reinforcement learning model based on the reinforcement learning model architecture; and training the initial reinforcement learning model to obtain the target reinforcement learning model.

3. The method according to claim 2, characterized in that training the initial reinforcement learning model to obtain the target reinforcement learning model includes: obtaining an initial current value and a maximum number of interactions; inputting the initial current value into the target machine to output a first-round current debugging result; inputting the first-round current debugging result into the initial reinforcement learning model to output a first-round predicted current value; interacting by using the predicted current value from the previous round as the input for debugging the target machine and using the resulting current debugging result as the input to the initial reinforcement learning model in the next round; determining whether the number of interactions has reached the maximum number of interactions; in the case where the number of interactions reaches the maximum number of interactions, determining all data from the interaction process as training data; and training the initial reinforcement learning model based on the training data to obtain the target reinforcement learning model.

4. The method according to claim 2, characterized in that the reinforcement learning model architecture includes: multiple Actor networks, configured to determine action strategies based on the current environment; and multiple Critic networks, configured to evaluate the value of the action strategies.

5. The method according to claim 1, characterized in that the step of debugging the engineering vehicle according to the target reinforcement learning model includes: obtaining the current value set in the previous round of debugging, the time of the previous round's debugging action, and the required range of the equipment attitude change time; and, based on the current value set in the previous round of debugging, the time of the previous round's debugging action, and the required range of the equipment attitude change time, outputting the current value set in the current round of debugging through the Actor network.

6. The method according to claim 1, characterized in that the method further includes: obtaining the current value set in the previous round of debugging, the time of the previous round's debugging action, the required range of the equipment attitude change time, and the current value set in the current round of debugging; and, based on the current value set in the previous round of debugging, the time of the previous round's debugging action, the required range of the equipment attitude change time, and the current value set in the current round of debugging, outputting the decision value through the Critic network.

7. The method according to claim 1, characterized in that, The method further includes: After the target machine that has not completed debugging is debugged according to the fixed step size debugging strategy, the target reinforcement learning model is optimized.

8. A device for debugging aerial work machinery, characterized in that the device includes: a memory configured to store instructions; and a processor configured to retrieve the instructions from the memory and, when executing the instructions, to implement the method for debugging aerial work machinery according to any one of claims 1 to 7.

9. A machine-readable storage medium, characterized in that, The machine-readable storage medium stores instructions for causing the machine to perform the method for commissioning aerial work machinery according to any one of claims 1 to 7.