Control device, control method, and control program
The control technique addresses the limitations of MPPI by allowing for a higher degree of freedom in cost function design, enabling risk-sensitive control and management of causal systems through state sequence-dependent costs.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- NEC CORP
- Filing Date
- 2024-12-09
- Publication Date
- 2026-06-19
AI Technical Summary
The Model Predictive Path Integral (MPPI) control method lacks flexibility in cost function design, limiting its ability to perform risk-sensitive control due to constraints on state sequence-dependent costs.
A control technique that calculates an input sequence to minimize a cost function comprising a state sequence-dependent cost that cannot be decomposed into a sum of individual state-dependent costs, allowing for a higher degree of freedom in cost function design, including risk-sensitive indices.
Enables risk-sensitive control and the ability to manage causal systems by incorporating non-linear and multiple state-dependent costs, enhancing the flexibility and effectiveness of the control method.
Abstract
Description
[Technical Field]
[0001] This disclosure relates to a control device, a control method, and a control program.
[Background Art]
[0002] Model predictive path integral (MPPI) control is known as one of the control methods for systems such as robots and moving objects. In the model predictive path integral control method, the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system is calculated as the input sequence U = (u0, u1, …, u_{T-1}) that minimizes a cost function Φ(U). This method is disclosed in, for example, Non-Patent Document 1.
[Prior Art Documents]
[Non-Patent Literature]
[0003] [Non-Patent Document 1] Williams G., Drews P., Goldfain B., Rehg J. M., Theodorou E. A., Information-theoretic model predictive control: Theory and applications to autonomous driving, IEEE Transactions on Robotics, 34 (6) (2018), pp. 1603-1622
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0004] However, the model predictive path integral control method described in Non-Patent Document 1 has the problem that the degree of freedom in the choice of cost function is low. For example, a cost function that includes a risk-sensitivity index cannot be used, and as a result risk-sensitive stochastic control cannot be realized.
[0005] The present disclosure has been made in view of the above problems, and an exemplary object thereof is to provide a control technique with a high degree of freedom of available cost functions.
[Means for Solving the Problems]
[0006] A control device according to an exemplary aspect of the present disclosure is a control device for controlling a system in which a state sequence X(V; x0) = (x0, x1, …, x_T) is determined from an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}). The control device includes an input sequence calculation means for calculating the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system as an approximate solution of a minimization problem of minimizing a cost function Φ(U) that is a function of an input sequence U = (u0, u1, …, u_{T-1}). The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U. The first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T.
[0007] A control method according to an exemplary aspect of this disclosure is a control method for controlling a system in which a state sequence X(V; x0) = (x0, x1, …, x_T) is determined from an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}). The control method includes an input sequence calculation process in which at least one processor calculates the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system as an approximate solution of a minimization problem of minimizing a cost function Φ(U) that is a function of an input sequence U = (u0, u1, …, u_{T-1}). The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U. The first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T.
[0008] A control program according to an exemplary aspect of this disclosure is a control program for controlling a system in which a state sequence X(V; x0) = (x0, x1, …, x_T) is determined from an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}). The control program causes at least one processor to execute an input sequence calculation process of calculating the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system as an approximate solution of a minimization problem of minimizing a cost function Φ(U) that is a function of an input sequence U = (u0, u1, …, u_{T-1}). The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U. The first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T.
[Effects of the Invention]
[0009] One exemplary effect of this disclosure is that a control technique with a high degree of freedom in the choice of cost function can be provided.
[Brief Description of the Drawings]
[0010]
[Figure 1] This diagram shows the configuration of the control device related to this disclosure.
[Figure 2] This is a flowchart showing the flow of the control method related to this disclosure.
[Figure 3] This diagram shows the configuration of the control device related to this disclosure.
[Figure 4] This is a flowchart showing the flow of the control method related to this disclosure.
[Figure 5] This diagram shows the configuration of the control device related to this disclosure.
[Figure 6] This is a flowchart showing the flow of the control method related to this disclosure.
[Figure 7] This is a block diagram showing the configuration of a computer that functions as a control device according to this disclosure.
[Figure 8] This figure shows one embodiment of the control device according to the present disclosure.
[Modes for Carrying Out the Invention]
[0011] The following are examples of embodiments of the present invention. However, the present invention is not limited to the exemplary embodiments shown below, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining some or all of the technologies (things or methods) employed in each of the exemplary embodiments shown below may also be included in the scope of the present invention. Furthermore, embodiments obtained by appropriately omitting some of the technologies employed in each of the exemplary embodiments shown below may also be included in the scope of the present invention. In addition, the effects mentioned in each of the exemplary embodiments shown below are examples of effects that can be expected in that exemplary embodiment and do not define the scope of the present invention. That is, embodiments that do not produce the effects mentioned in each of the exemplary embodiments shown below may also be included in the scope of the present invention.
[0012] [Model predictive path integral control method] Before the exemplary embodiments are described, the model predictive path integral control method on which this disclosure is based is briefly described.
[0013] The model predictive path integral control method is a control method for controlling a system in which a state sequence X(V; x0) = (x0, x1, …, x_T) is determined from an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}). For example, if the system to be controlled is a moving object, one or both of the velocity and angular velocity of the moving object at time t can be used as the input v_t, and one or both of the position and direction of the moving object at time t can be used as the state x_t. Alternatively, if the system to be controlled is a robot, one or both of the velocity and attitude change of the robot at time t can be used as the input v_t, and one or both of the position and orientation of the robot at time t can be used as the state x_t.
[0014] The system to be controlled has uncertainty in its behavior. To achieve control that takes this uncertainty into account, the model predictive path integral control method assumes that, instead of the calculated input u_t itself, a stochastically perturbed input v_t ~ N(u_t, Σ) is input to the system. Here, N(u_t, Σ) is the multidimensional normal distribution with mean u_t and variance-covariance matrix Σ.
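As an illustration of the noise model in paragraph [0014], the following minimal sketch draws one perturbed input sequence V from a planned input sequence U. It is not taken from the disclosure: the function name `sample_noisy_inputs` and the numerical values are hypothetical, and Σ is assumed to be diagonal so that each input component can be perturbed independently.

```python
import random


def sample_noisy_inputs(U, sigma, rng):
    """Draw V = (v_0, ..., v_{T-1}) with v_t ~ N(u_t, diag(sigma**2)).

    Assumption: the covariance matrix is diagonal, so component i of every
    input is perturbed independently with standard deviation sigma[i].
    """
    return [
        [rng.gauss(u_i, s_i) for u_i, s_i in zip(u_t, sigma)]
        for u_t in U
    ]


rng = random.Random(0)                     # fixed seed for repeatability
U = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.2]]   # T = 3 planned inputs, 2 components each
sigma = [0.05, 0.02]                       # hypothetical per-component std. deviations
V = sample_noisy_inputs(U, sigma, rng)
```

A full covariance matrix would need a Cholesky factor of Σ in place of the per-component standard deviations.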
[0015] In the model predictive path integral control method, the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system is calculated as an approximate solution of the minimization problem of minimizing the cost function Φ(U), which is a function of the input sequence U = (u0, u1, …, u_{T-1}). The cost function Φ(U) is, for example, the expected value E_{Q_{U,Σ}}[S(V; x0) + T(U)], under the probability distribution Q_{U,Σ}, of the sum of a first cost S(V; x0), which is a function of the input sequence V, and a second cost T(U), which is a function of the input sequence U. Here, x0 is the initial state of the system, and V = (v0, v1, …, v_{T-1}) is an input sequence that follows the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U, Σ) defined by the following equation (a1). In equation (a1), Z is the constant defined by Z = ((2π)^m |Σ|)^{1/2}.
[0016] [Equation (a1)]
[0017] Therefore, in the model predictive path integral control method, the input sequence U* input to the system is an approximate solution of the minimization problem shown on the right side of the following equation (a2).
[0018] [Equation (a2)]
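The expected value that defines Φ(U) can be approximated by averaging over sampled input sequences. The sketch below is only an illustration under stated assumptions: inputs and states are scalars, the noise is N(u_t, sigma^2), and the first and second costs are passed in as callables because their concrete forms are not reproduced in this text. All names (`estimate_cost`, `step`, `sum_of_squares`, `input_penalty`) are hypothetical.

```python
import random


def estimate_cost(x0, U, F, first_cost, second_cost, sigma, num_samples, rng):
    """Monte Carlo estimate of Phi(U) = E[S(V; x0) + T(U)] for scalar inputs."""
    total = 0.0
    for _ in range(num_samples):
        # Sample V with v_t ~ N(u_t, sigma^2).
        V = [rng.gauss(u, sigma) for u in U]
        # Roll the model forward to obtain X(V; x0).
        X = [x0]
        for v in V:
            X.append(F(X[-1], v))
        total += first_cost(X)
    # T(U) does not depend on V, so it is added once outside the average.
    return total / num_samples + second_cost(U)


def step(x, v):
    """Hypothetical 1-D integrator used as the model F."""
    return x + v


def sum_of_squares(X):
    return sum(x * x for x in X)


def input_penalty(U):
    return 0.5 * sum(u * u for u in U)


phi_estimate = estimate_cost(
    0.0, [1.0, 1.0], step, sum_of_squares, input_penalty,
    sigma=0.1, num_samples=200, rng=random.Random(0),
)
```

With `sigma=0.0` every sample equals U, so the estimate reduces to the deterministic cost of the planned sequence.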
[0019] In the model predictive path integral control method described in Non-Patent Document 1, the state x_{t+1} at time t+1 is determined from the state x_t at time t and the input v_t at time t according to x_{t+1} = F(x_t, v_t). As the first cost S(V; x0), the function defined by the following equation (a4) is used, based on (a) the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), …) corresponding to the input sequence V and (b) the state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T). Here, c(x0), c(x1), …, c(x_{T-1}), φ(x_T) are the state-dependent costs corresponding to the states x0, x1, …, x_T, respectively. The state-dependent cost φ(x_T) corresponding to the final state x_T is also called the terminal cost.
[0020] [Equation (a4)]
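The conventional construction described in paragraphs [0019] and [0023] can be sketched as follows: the state sequence is obtained by repeatedly applying F, and the cost is the sum of per-state costs plus the terminal cost. The model `step` and the costs `running_cost` and `terminal_cost` are hypothetical choices made only so that the example runs.

```python
def rollout(x0, V, F):
    """Return X(V; x0) = (x_0, ..., x_T) with x_{t+1} = F(x_t, v_t)."""
    X = [x0]
    for v in V:
        X.append(F(X[-1], v))
    return X


def additive_cost(X, c, phi):
    """Conventional cost: phi(x_T) plus the sum of c(x_t) for t = 0..T-1."""
    return phi(X[-1]) + sum(c(x) for x in X[:-1])


def step(x, v):
    """Hypothetical 1-D integrator used as the model F."""
    return x + v


def running_cost(x):
    return x * x


def terminal_cost(x):
    return 10.0 * (x - 3.0) ** 2


X = rollout(0.0, [1.0, 1.0, 1.0], step)
S = additive_cost(X, running_cost, terminal_cost)
```

Here the rollout visits the states 0, 1, 2, 3, so the running costs add up to 5 and the terminal cost is 0.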
[0021] Also, in the model predictive path integral control method described in Non-Patent Document 1, a function defined by the following formula (a5) is used as the second cost T(U). Here, β0, β1, …, β_{T-1} and c0, c1, …, c_{T-1} are affine terms, and λ is a temperature parameter.
[0022] [Equation (a5)]
[0023] A point to note in the model predictive path integral control method described in Non-Patent Document 1 is that the state sequence-dependent cost C(X) is given by the sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T).
[0024] Therefore, the model predictive path integral control method described in Non-Patent Document 1 cannot handle a state sequence-dependent cost C(X) that depends non-linearly on the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T), for example, a state sequence-dependent cost C(X) that includes a risk-sensitivity index. For this reason, risk-sensitive control cannot be performed.
[0025] Also, the model predictive path integral control method described in Non-Patent Document 1 cannot handle a state sequence-dependent cost C(X) that includes, as terms, multiple-state-dependent costs such as a two-state-dependent cost c(x_{t1}, x_{t2}) or a three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}). For this reason, it is impossible to control a causal system, that is, a system in which the cost of the current state changes according to past states.
[0026] Non-Patent Document 1 shows a method for constructing an approximate solution of the minimization problem shown on the right side of equation (a2) above. The inventors found that, even without the constraint that the state sequence-dependent cost C(X) is given by the sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T), an approximate solution of the minimization problem shown on the right side of equation (a2) can be constructed in the same way as in the method described in Non-Patent Document 1, and succeeded in proving this mathematically. The exemplary embodiments described below are based on this finding. In the exemplary embodiments described below, as in the control method described in Non-Patent Document 1, this disclosure is applied to a system in which the state x_{t+1} at time t+1 is determined from the state x_t at time t and the input v_t at time t according to x_{t+1} = F(x_t, v_t). However, this disclosure is not limited to such a system and can be applied to any system in which a state sequence X(V; x0) = (x0, x1, …, x_T) is determined from an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}).
[0027] [First Exemplary Embodiment] A first exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. This exemplary embodiment is the basic form for each of the exemplary embodiments described later. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems occur. Furthermore, each technology shown in the drawings referenced to explain this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems occur.
[0028] (Control device configuration) The control device 1 according to this disclosure is a control device that controls a system 2 in which the state x_{t+1} at time t+1 is determined from the state x_t at time t and the input v_t at time t according to x_{t+1} = F(x_t, v_t). Instead of the input u_t calculated by the control device 1 itself, a stochastically perturbed input v_t ~ N(u_t, Σ) is input to the system 2. Here, N(u_t, Σ) is the multidimensional normal distribution with mean u_t and variance-covariance matrix Σ.
[0029] The configuration of the control device 1 will be described with reference to Figure 1. Figure 1 is a block diagram showing the configuration of the control device 1. As shown in Figure 1, the control device 1 includes an input sequence calculation unit 10. The input sequence calculation unit 10 calculates the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system 2 as an approximate solution of the minimization problem of minimizing the cost function Φ(U), which is a function of the input sequence U = (u0, u1, …, u_{T-1}).
[0030] The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under the probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U.
[0031] The probability distribution Q_U is, for example, the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U, Σ) defined by the above formula (a1), with Σ as the variance-covariance matrix and Z = ((2π)^m |Σ|)^{1/2}.
[0032] In the control device 1, the input sequence U* input to the system 2 is an approximate solution of the minimization problem shown on the right side of the above formula (a2).
[0033] In the control device 1, the first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), …) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T. Here, c(x0), c(x1), …, c(x_{T-1}), φ(x_T) are the state-dependent costs corresponding to the states x0, x1, …, x_T, respectively. The state-dependent cost φ(x_T) corresponding to the final state x_T is also called the terminal cost.
[0034] In the control device 1, as an example, the second cost T(U) is a function defined by the above formula (a5) with β0, β1, …, β_{T-1} and c0, c1, …, c_{T-1} as affine terms.
[0035] (Effect of the control device) As described above, the control device 1 includes the input sequence calculation unit 10, which calculates the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system 2 as an approximate solution of the minimization problem of minimizing the cost function Φ(U), which is a function of the input sequence U = (u0, u1, …, u_{T-1}). The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under the probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U. The first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), …) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T. Therefore, the control device 1 provides a high degree of freedom in the choice of cost function.
[0036] Also, in the control device 1, the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U, Σ) defined by the above formula (a1), with Σ as the variance-covariance matrix and Z = ((2π)^m |Σ|)^{1/2}, and the second cost T(U) is a function defined by the above formula (a5) with β0, β1, …, β_{T-1} and c0, c1, …, c_{T-1} as affine terms. Therefore, the control device 1 provides a higher degree of freedom in the choice of cost function than the control device described in Non-Patent Document 1, while reusing part of the assets, such as programs, developed to realize the control device described in Non-Patent Document 1.
[0037] (Specific examples of state sequence-dependent cost) Several examples of state sequence-dependent costs C(X) available in the control device 1 are described below.
[0038] The state sequence-dependent cost C(X) may be, for example, a function that includes, as a term, a nonlinear function of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T).
[0039] This makes it possible to handle a state sequence-dependent cost C(X) that includes a risk-sensitivity index, such as C(X) = {φ(x_T) + Σ_{t∈[0,T-1]} c(x_t)} - (β/2){φ(x_T) + Σ_{t∈[0,T-1]} c(x_t)}^2 or C(X) = exp(-β{φ(x_T) + Σ_{t∈[0,T-1]} c(x_t)}). Therefore, risk-sensitive control of the system 2 becomes possible.
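The two risk-sensitive costs quoted in paragraph [0039] can be written directly as functions of a state sequence. The sketch below follows those two formulas; the per-state cost, the terminal cost, the value of β and the sample state sequence are hypothetical and serve only to make the example concrete.

```python
import math


def total_cost(X, c, phi):
    """J = phi(x_T) + sum of c(x_t) for t = 0..T-1."""
    return phi(X[-1]) + sum(c(x) for x in X[:-1])


def quadratic_risk_cost(X, c, phi, beta):
    """C(X) = J - (beta / 2) * J**2, a nonlinear function of the summed cost."""
    J = total_cost(X, c, phi)
    return J - 0.5 * beta * J * J


def exponential_risk_cost(X, c, phi, beta):
    """C(X) = exp(-beta * J)."""
    return math.exp(-beta * total_cost(X, c, phi))


def identity_cost(x):
    """Hypothetical cost, used for both c and phi in this example."""
    return x


X = [0.0, 1.0, 2.0]   # states x_0, x_1, x_2, so J = 2 + (0 + 1) = 3
quad = quadratic_risk_cost(X, identity_cost, identity_cost, beta=0.5)
expo = exponential_risk_cost(X, identity_cost, identity_cost, beta=0.5)
```

Neither function can be rewritten as a sum of terms that each depend on a single state, which is the property the disclosure relies on.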
[0040] Furthermore, the state sequence-dependent cost C(X) may include, as a term, at least one of the following: a two-state-dependent cost c(x_{t1}, x_{t2}) that depends on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; a three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}) that depends on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; and a T-state-dependent cost c(x_{t1}, x_{t2}, …, x_{tT}) that depends on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0041] This makes it possible to handle a state sequence-dependent cost C(X) that includes at least one of the multiple-state-dependent costs, such as the two-state-dependent cost c(x_{t1}, x_{t2}) and the three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}). Therefore, it becomes possible to control a causal system 2, that is, a system 2 in which the cost of the current state changes depending on past states.
[0042] Furthermore, the state sequence-dependent cost C(X) may be, for example, the total of the following sums: the sum of the _{T+1}C_2 two-state-dependent costs c(x_{t1}, x_{t2}) that depend on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; the sum of the _{T+1}C_3 three-state-dependent costs c(x_{t1}, x_{t2}, x_{t3}) that depend on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; and the sum of the _{T+1}C_T T-state-dependent costs c(x_{t1}, x_{t2}, …, x_{tT}) that depend on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0043] This makes it possible to handle a state sequence-dependent cost C(X) that includes all of the multiple-state-dependent costs, such as the two-state-dependent cost c(x_{t1}, x_{t2}) and the three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}). Therefore, it becomes possible to control a causal system 2, that is, a system 2 in which the cost of the current state changes depending on past states. For example, in control that moves a moving object from region A to region B, it becomes possible to control the object so that it passes through region C before approaching region B.
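One way to picture a sum of two-state-dependent costs is an ordering penalty for the region example in paragraph [0043]. The sketch below is an assumption-laden illustration, not the cost used in the disclosure: positions are scalars, regions B and C are hypothetical intervals, and a pair of states is penalised when the earlier one lies in B and the later one lies in C.

```python
from itertools import combinations


def in_interval(x, lo, hi):
    return lo <= x <= hi


def order_cost(X, region_b, region_c, penalty=1.0):
    """Sum of two-state costs c(x_t1, x_t2) over all pairs with t1 < t2.

    A pair contributes `penalty` when the earlier state is in region B and
    the later state is in region C, i.e. B was reached before C.
    """
    return sum(
        penalty
        for x_early, x_late in combinations(X, 2)
        if in_interval(x_early, *region_b) and in_interval(x_late, *region_c)
    )


region_b = (4.0, 5.0)   # hypothetical goal region B
region_c = (1.0, 2.0)   # hypothetical waypoint region C

c_then_b = [0.0, 1.5, 3.0, 4.5]   # passes through C, then reaches B
b_then_c = [0.0, 4.5, 1.5, 3.0]   # reaches B first, then C

cost_ok = order_cost(c_then_b, region_b, region_c)
cost_bad = order_cost(b_then_c, region_b, region_c)
```

This term alone only discourages the wrong order; a terminal cost for reaching B would still be needed to make the object go there at all.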
[0044] (Control method flow) The flow of the control method S1 will be explained with reference to Figure 2. Figure 2 is a flowchart showing the flow of the control method S1. The control method S1 is a control method for controlling the system 2, in which the state x_{t+1} at time t+1 is determined from the state x_t at time t and the input v_t at time t according to x_{t+1} = F(x_t, v_t). Instead of the calculated input u_t itself, a stochastically perturbed input v_t ~ N(u_t, Σ) is input to the system 2. Here, N(u_t, Σ) is the multidimensional normal distribution with mean u_t and variance-covariance matrix Σ.
[0045] The control method S1 includes an input sequence calculation process S10, as shown in Figure 2. In the input sequence calculation process S10, at least one processor calculates the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system 2 as an approximate solution of the minimization problem of minimizing the cost function Φ(U), which is a function of the input sequence U = (u0, u1, …, u_{T-1}).
[0046] The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under the probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U.
[0047] The probability distribution Q_U is, for example, the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U, Σ) defined by the above equation (a1), with Σ as the variance-covariance matrix and Z = ((2π)^m |Σ|)^{1/2}.
[0048] In the control method S1, the input sequence U* input to the system 2 is an approximate solution of the minimization problem shown on the right side of the above equation (a2).
[0049] In the control method S1, the first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), …) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T.
[0050] (Effects of the control method) As described above, the control method S1 includes the input sequence calculation process S10, in which at least one processor calculates the input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system 2 as an approximate solution of the minimization problem of minimizing the cost function Φ(U), which is a function of the input sequence U = (u0, u1, …, u_{T-1}). The cost function Φ(U) is the expected value E_{Q_U}[S(V; x0) + T(U)], under the probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V; x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) that follows the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U. The first cost S(V; x0) is a function defined by S(V; x0) = C(X(V; x0)) using (a) the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), …) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that depends on the state sequence X = (x0, x1, …, x_T) and that cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the individual states x0, x1, …, x_T. Therefore, the control method S1 provides a high degree of freedom in the choice of cost function.
[0051] [Second exemplary embodiment] A second exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same function as those described in the above-described exemplary embodiment are denoted by the same reference numerals, and their descriptions are omitted as appropriate. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise. Furthermore, each technology shown in the drawings referenced to describe this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise.
[0052] (Control device configuration) The configuration of the control device 1 will be explained with reference to Figure 3. Figure 3 is a block diagram showing the configuration of the control device 1. The control device 1 is a control device that controls the system 2, in which the state x_{t+1} at time t+1 is determined from the state x_t at time t and the input v_t at time t according to x_{t+1} = F(x_t, v_t). The control device 1 includes an input sequence calculation unit 10. The input sequence calculation unit 10 includes an input sequence generation unit 11, a state sequence calculation unit 12, a weight calculation unit 13, and an input calculation unit 14.
[0053] The input sequence generation unit 11 generates an ensemble of input sequences V = (v0, v1, …, v_{T-1}) that follow the probability distribution Q_{U^} corresponding to the input sequence U^ = (u^0, u^1, …, u^_{T-1}).
[0054] The state sequence calculation unit 12 calculates the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1),...) corresponding to each input sequence V generated by the input sequence generation unit 11.
[0055] The weight calculation unit 13 calculates the weight w(V) corresponding to each input sequence V generated by the input sequence generation unit 11 from the state sequence-dependent cost C(X(V; x0)) of the state sequence X(V; x0) corresponding to that input sequence V, as calculated by the state sequence calculation unit 12.
[0056] As an example, the weight calculation unit 13 calculates the weight w(V) corresponding to each input sequence V generated by the input sequence generation unit 11 according to the following formula (a7), with U~ = (u~0, u~1, …, u~_{T-1}) as the nominal input, p(V) = q(V|U~, Σ) as the base probability density function, η as the constant defined by the following formula (a6), and λ as the temperature parameter. In formula (a7), u_t with ^ above it corresponds to u^_t in this specification, and u_t with ~ above it corresponds to u~_t in this specification.
[0057] [Equations (a6) and (a7)]
[0058] The input calculation unit 14 calculates each input u*_t constituting the input sequence U* to be input to the system 2 as the expected value E_{Q_{U^}}[w(V) v_t], under the probability distribution Q_{U^}, of the product w(V) v_t obtained by multiplying the input v_t constituting each input sequence V generated by the input sequence generation unit 11 by the weight w(V) corresponding to that input sequence V, as calculated by the weight calculation unit 13.
[0059] (Effects of the control device) As described above, in the control device 1, the input sequence calculation unit 10 includes: the input sequence generation unit 11, which generates an ensemble of input sequences V = (v0, v1, …, v_{T-1}) that follow the probability distribution Q_{U^} corresponding to the input sequence U^ = (u^0, u^1, …, u^_{T-1}); the state sequence calculation unit 12, which calculates the state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), …) corresponding to each input sequence V generated by the input sequence generation unit 11; the weight calculation unit 13, which calculates the weight w(V) corresponding to each input sequence V generated by the input sequence generation unit 11 from the state sequence-dependent cost C(X(V; x0)) of the state sequence X(V; x0) corresponding to that input sequence V, as calculated by the state sequence calculation unit 12; and the input calculation unit 14, which calculates each input u*_t constituting the input sequence U* to be input to the system 2 as the expected value E_{Q_{U^}}[w(V) v_t], under the probability distribution Q_{U^}, of the product w(V) v_t obtained by multiplying the input v_t constituting each input sequence V generated by the input sequence generation unit 11 by the weight w(V) corresponding to that input sequence V, as calculated by the weight calculation unit 13. Therefore, the control device 1 can efficiently calculate each input u*_t constituting the input sequence U* to be input to the system 2, in a manner similar to that of the control device described in Non-Patent Document 1.
[0060] (Control method flow) The flow of control method S1 will be explained with reference to Figure 4. Figure 4 is a flowchart showing the flow of control method S1. Control method S1 includes an input sequence calculation process S10. The input sequence calculation process S10 includes an input sequence generation process S11, a state sequence calculation process S12, a weight calculation process S13, and an input calculation process S14.
[0061] In the input sequence generation process S11, at least one processor generates an ensemble of input sequences V = (v0, v1, …, v_{T-1}) following the probability distribution Q_Û corresponding to the input sequence Û = (û0, û1, …, û_{T-1}).
[0062] In the state sequence calculation process S12, at least one processor calculates a state sequence X(V;x0)=(x0,F(x0,v0),F(F(x0,v0),v1),…) corresponding to each input sequence V generated in the input sequence generation process S11.
[0063] In the weight calculation process S13, at least one processor calculates the weight w(V) corresponding to each input sequence V generated in the input sequence generation process S11 from the state sequence-dependent cost C(X(V;x0)) of the state sequence X(V;x0) that was calculated for that input sequence V in the state sequence calculation process S12.
[0064] In the input calculation process S14, at least one processor calculates, as each input u*_t constituting the input sequence U* to be input to the system 2, the expected value E_{Q_Û}[w(V) v_t] under the probability distribution Q_Û. Here, w(V) v_t is obtained by multiplying the input v_t constituting each input sequence V generated in the input sequence generation process S11 by the weight w(V) calculated for that input sequence V in the weight calculation process S13.
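The four processes S11 to S14 can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the implementation of this disclosure: the function name sampling_update, the per-time-step Gaussian sampling with covariance sigma, and in particular the normalized exponential weight exp(-C/λ) are assumptions standing in for the weight formula of the disclosure, which is not reproduced in this passage. F is the system model, and C is any state sequence-dependent cost that takes the whole state sequence.

```python
import numpy as np


def sampling_update(x0, u_hat, F, C, sigma, lam, n_samples, rng):
    """Return a sample estimate of u*_t = E[w(V) v_t] for t = 0..T-1.

    u_hat: array (T, m), the mean input sequence used for sampling.
    F:     callable (x, v) -> next state (system model).
    C:     callable taking the full state sequence, shape (T+1, n).
    """
    T, m = u_hat.shape
    # S11: ensemble of input sequences V, Gaussian around u_hat (assumed form).
    noise = rng.multivariate_normal(np.zeros(m), sigma, size=(n_samples, T))
    V = u_hat + noise  # shape (n_samples, T, m)

    costs = np.empty(n_samples)
    for k in range(n_samples):
        # S12: state sequence X(V; x0) = (x0, F(x0, v0), F(F(x0, v0), v1), ...).
        X = [np.asarray(x0, dtype=float)]
        for t in range(T):
            X.append(F(X[-1], V[k, t]))
        # The cost sees the whole sequence, so it need not be a per-state sum.
        costs[k] = C(np.stack(X))

    # S13: assumed weights, normalized exp(-C / lam); subtracting the minimum
    # only improves numerical stability and cancels in the normalization.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()

    # S14: weighted average of the sampled inputs at each time step.
    return np.einsum("k,ktm->tm", w, V)
```

For example, with F = lambda x, v: x + v and C = lambda X: float(np.max(np.abs(X - 1.0))), the cost is the largest deviation from a target along the trajectory, which cannot be written as a sum of per-state costs, yet the update runs unchanged because the cost is evaluated once per sampled state sequence.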
[0065] [Third Exemplary Embodiment] A third exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same function as those described in the above-described exemplary embodiments are denoted by the same reference numerals, and their descriptions are omitted as appropriate. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise. Furthermore, each technology shown in the drawings referenced to describe this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise.
[0066] (Control device configuration) The configuration of the control device 1 will be explained with reference to Figure 5. Figure 5 is a block diagram showing the configuration of the control device 1. The control device 1 is a control device that controls the system 2, in which the state x_{t+1} at time t+1 is determined from the state x_t at time t and the input v_t at time t according to x_{t+1} = F(x_t, v_t). The system 2 is a robot or a mobile body.
[0067] The control device 1 comprises an input sequence calculation unit 10, a sensor unit 15, and a transmission unit 16. The sensor unit 15 detects the state of system 2. The sensor unit 15 provides the detected state of system 2 to the input sequence calculation unit 10 as the initial state x0.
[0068] The transmission unit 16 transmits to the system 2 a control signal for controlling the system 2, the control signal representing the input sequence U* calculated by the input sequence calculation unit 10.
[0069] (Effects of the control device) As described above, the control device 1 adopts a configuration in which the system 2 is a robot or a mobile body, and the control device 1 further includes a transmission unit 16 that transmits to the system 2 a control signal for controlling the system 2, the control signal representing the input sequence U* calculated by the input sequence calculation unit 10. Therefore, the control device 1 can control the system 2.
[0070] (Control method flow) The flow of control method S1 will be explained with reference to Figure 6. Figure 6 is a flowchart showing the flow of control method S1. Control method S1 includes a state acquisition process S15, an input sequence calculation process S10, and a control signal transmission process S16.
[0071] In the state acquisition process S15, the sensor unit 15 detects the state of the system 2. The input sequence calculation unit 10 then acquires the detected state of the system 2 from the sensor unit 15.
[0072] In the control signal transmission process S16, the transmission unit 16 acquires the input sequence U* calculated by the input sequence calculation unit 10 and transmits a control signal representing that input sequence U* to the system 2.
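The order of the processes S15, S10, and S16 can be sketched as a single control cycle. The sketch below is a hypothetical illustration: the function control_cycle and its three callables (sense, compute_input_sequence, transmit) are names introduced here and stand in for the sensor unit 15, the input sequence calculation unit 10, and the transmission unit 16, respectively.

```python
from typing import Callable, List

State = List[float]
InputSequence = List[List[float]]


def control_cycle(
    sense: Callable[[], State],
    compute_input_sequence: Callable[[State], InputSequence],
    transmit: Callable[[InputSequence], None],
) -> InputSequence:
    """Run one cycle: acquire the state, compute U*, send the control signal."""
    # S15: the detected state of the system is used as the initial state x0.
    x0 = sense()
    # S10: calculate the input sequence U* from x0.
    u_star = compute_input_sequence(x0)
    # S16: transmit a control signal representing U* to the system.
    transmit(u_star)
    return u_star
```

Passing the three roles as callables keeps the sketch independent of any particular sensor or communication interface; in practice the cycle would be repeated each time a new state is detected.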
[0073] [Examples of implementation using software] Some or all of the functions of the control device 1 may be implemented by hardware such as an integrated circuit (IC chip), or by software.
[0074] In the latter case, each of the above devices is implemented, for example, by a computer that executes instructions for a program, which is software that realizes each function. An example of such a computer (hereinafter referred to as Computer C) is shown in Figure 7. Figure 7 is a block diagram showing the hardware configuration of Computer C, which functions as each of the above devices.
[0075] Computer C comprises at least one processor C1 and at least one memory C2. Memory C2 stores a program P for operating Computer C as each of the above-mentioned devices. In Computer C, the processor C1 reads and executes the program P from memory C2, thereby realizing each function of the control device 1.
[0076] For processor C1, for example, a CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating-Point Unit), PPU (Physics Processing Unit), TPU (Tensor Processing Unit), quantum processor, microcontroller, or a combination thereof can be used. For memory C2, for example, flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination thereof can be used.
[0077] Computer C may also be equipped with RAM (Random Access Memory) for loading program P at runtime and for temporarily storing various data. Furthermore, computer C may be equipped with communication interfaces for sending and receiving data with other devices. Additionally, computer C may be equipped with input / output interfaces for connecting input / output devices such as keyboards, mice, displays, and printers.
[0078] Furthermore, program P can be recorded on a non-transitory, tangible recording medium M that is readable by computer C. Such a recording medium M may be, for example, a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. Computer C can acquire program P via such a recording medium M. Program P can also be transmitted via a transmission medium. Such a transmission medium may be, for example, a communication network or broadcast waves. Computer C can also acquire program P via such a transmission medium.
[0079] Furthermore, each of the above functions of each of the above devices may be implemented by a single processor in a single computer, by multiple processors in a single computer working together, or by multiple processors in each of multiple computers working together. In addition, the programs for implementing each of the above functions in each of the above devices may be stored in a single memory in a single computer, distributed and stored in multiple memories in a single computer, or distributed and stored in multiple memories in each of multiple computers.
[0080] [Examples] An example of the control device 1 will be described with reference to Figure 8. In this example, a mobile body was used as the system 2 to be controlled.
[0081] Figure 8 shows the simulation results for two cases: the case where the mobile body behaves risk-neutrally using the conventional state sequence-dependent cost C(X) shown in equation (a3) above, and the case where the mobile body behaves risk-aversely using the risk-sensitive state sequence-dependent cost C(X) defined by C(X) = exp(-β{φ(x_T) + Σ_{t∈[0,T-1]} c(x_t)}) with β = -1. For each case, Figure 8 shows the state (position) of the mobile body at t = 0 s, 5 s, 10 s, and 15 s, together with the state sequence (trajectory) predicted from the input sequence U* calculated by the control device 1.
[0082] When the conventional state sequence-dependent cost C(X) was used, the mobile body passed between narrowly spaced obstacles without considering the increased collision risk that doing so entails. In other words, it was confirmed that the mobile body behaved risk-neutrally. On the other hand, when the risk-sensitive state sequence-dependent cost C(X) was used, the mobile body avoided passing between narrowly spaced obstacles, taking the increased collision risk into account. In other words, it was confirmed that the mobile body behaved risk-aversely.
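The difference between the two costs can be illustrated numerically. The following Python sketch is an illustration only: it assumes that the conventional cost has the additive form φ(x_T) + Σ c(x_t) (equation (a3) is not reproduced in this passage), and the functions additive_cost, risk_sensitive_cost, and ensemble_mean, as well as the toy trajectories, are hypothetical. Because the exponential is convex, averaging exp(cost) over an ensemble penalizes an ensemble that contains occasional high-cost trajectories more heavily than one with the same mean cost and no spread.

```python
import math


def additive_cost(X, c, phi):
    """Assumed conventional cost: terminal cost plus the sum of running costs."""
    return phi(X[-1]) + sum(c(x) for x in X[:-1])


def risk_sensitive_cost(X, c, phi, beta=-1.0):
    """C(X) = exp(-beta * {phi(x_T) + sum_t c(x_t)}); beta = -1 as in the example."""
    return math.exp(-beta * additive_cost(X, c, phi))


def ensemble_mean(cost_fn, trajectories):
    """Average a state sequence cost over an ensemble of trajectories."""
    return sum(cost_fn(X) for X in trajectories) / len(trajectories)


# Toy scalar states: running cost c(x) = x, terminal cost phi(x) = 0.
c = lambda x: float(x)
phi = lambda x: 0.0

steady = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]  # additive costs 2 and 2
risky = [[0.0, 0.0, 0.0], [2.0, 2.0, 0.0]]   # additive costs 0 and 4

neutral_steady = ensemble_mean(lambda X: additive_cost(X, c, phi), steady)
neutral_risky = ensemble_mean(lambda X: additive_cost(X, c, phi), risky)
averse_steady = ensemble_mean(lambda X: risk_sensitive_cost(X, c, phi), steady)
averse_risky = ensemble_mean(lambda X: risk_sensitive_cost(X, c, phi), risky)
```

Here neutral_steady and neutral_risky are equal, so the additive cost does not distinguish the two ensembles, whereas averse_risky is larger than averse_steady, so the exponential cost prefers the ensemble without the high-cost outcome.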
[0083] [Additional Note A] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0084] (Appendix A1) A control device that controls a system in which a state sequence X(V;x0) = (x0, x1, …, x_T) is determined according to an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}), the control device comprising an input sequence calculation means for calculating, as an input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system, an approximate solution to a minimization problem that minimizes a cost function Φ(U), which is a function of an input sequence U = (u0, u1, …, u_{T-1}), wherein the cost function Φ(U) is the expected value E_{Q_U}[S(V;x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V;x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) following the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U, and the first cost S(V;x0) is a function defined by S(V;x0) = C(X(V;x0)) using (a) the state sequence X(V;x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that corresponds to a state sequence X = (x0, x1, …, x_T) and cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the respective states x0, x1, …, x_T.
[0085] (Appendix A2) The control device according to Appendix A1, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (1) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, and the second cost T(U) is a function defined by equation (2) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term.
[Equation (1)]
[Equation (2)]
[0086] (Appendix A3) The control device according to Appendix A1 or A2, wherein the state sequence-dependent cost C(X) includes, as a term, a nonlinear function of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T).
[0087] (Appendix A4) The control device according to Appendix A1 or A2, wherein the state sequence-dependent cost C(X) includes, as a term: a two-state-dependent cost c(x_{t1}, x_{t2}) that depends on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; a three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}) that depends on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; or a T-state-dependent cost c(x_{t1}, x_{t2}, …, x_{tT}) that depends on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0088] (Appendix A5) The control device according to Appendix A4, wherein the state sequence-dependent cost C(X) is the total of: the sum of the _{T+1}C_2 two-state-dependent costs c(x_{t1}, x_{t2}), each depending on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; the sum of the _{T+1}C_3 three-state-dependent costs c(x_{t1}, x_{t2}, x_{t3}), each depending on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; and the sum of the _{T+1}C_T T-state-dependent costs c(x_{t1}, x_{t2}, …, x_{tT}), each depending on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0089] (Appendix A6) The control device according to any one of Appendices A1 to A5, wherein the input sequence calculation means comprises: an input sequence generation means for generating an ensemble of input sequences V = (v0, v1, …, v_{T-1}) following the probability distribution Q_Û corresponding to an input sequence Û = (û0, û1, …, û_{T-1}); a state sequence calculation means for calculating the state sequence X(V;x0) corresponding to each input sequence V generated by the input sequence generation means; a weight calculation means for calculating the weight w(V) corresponding to each input sequence V generated by the input sequence generation means from the state sequence-dependent cost C(X(V;x0)) of the state sequence X(V;x0) calculated for that input sequence V by the state sequence calculation means; and an input calculation means for calculating, as each input u*_t constituting the input sequence U* to be input to the system, the expected value E_{Q_Û}[w(V) v_t] under the probability distribution Q_Û, where w(V) v_t is obtained by multiplying the input v_t constituting each input sequence V generated by the input sequence generation means by the weight w(V) calculated for that input sequence V by the weight calculation means.
[0090] (Appendix A7) The control device according to Appendix A6, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (3) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, the second cost T(U) is a function defined by equation (4) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term, and the weight calculation means calculates the weight w(V) corresponding to each input sequence V generated by the input sequence generation means according to equation (6) below, where Ũ = (ũ0, ũ1, …, ũ_{T-1}) is a nominal input, P(V) = q(V|Ũ,Σ) is a base probability density function, η is a constant defined by equation (5) below, and λ is a temperature parameter.
[Equation (3)]
[Equation (4)]
[Equation (5)]
[Equation (6)]
[0091] (Appendix A8) The control device according to any one of Appendices A1 to A7, wherein the system is a robot or a mobile body, and the control device further comprises a transmission means for transmitting to the system a control signal for controlling the system, the control signal representing the input sequence U* calculated by the input sequence calculation means.
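The multi-state costs of Appendices A4 and A5 can be illustrated with a short Python sketch. It is a hypothetical illustration, not part of the disclosure: the functions k_state_cost_sum and multi_state_cost, the mapping costs_by_order, and the toy costs c2 and c3 are names introduced here. For a state sequence of T+1 states, itertools.combinations enumerates exactly the _{T+1}C_k selections of k states, so each k-state-dependent cost is summed over every selection.

```python
from itertools import combinations
from math import comb


def k_state_cost_sum(X, c_k, k):
    """Sum a k-state-dependent cost over all selections of k states from X."""
    return sum(c_k(*states) for states in combinations(X, k))


def multi_state_cost(X, costs_by_order):
    """Total over the given orders k of the summed k-state-dependent costs.

    costs_by_order maps k (number of states) to a cost taking k states.
    """
    return sum(k_state_cost_sum(X, c_k, k) for k, c_k in costs_by_order.items())


# Toy example with T = 2, i.e. three scalar states x0, x1, x2.
X = [0.0, 1.0, 3.0]
c2 = lambda a, b: abs(a - b)                       # two-state-dependent cost
c3 = lambda a, b, d: max(a, b, d) - min(a, b, d)   # three-state-dependent cost

n_pairs = comb(len(X), 2)                  # number of two-state selections
total = multi_state_cost(X, {2: c2, 3: c3})
```

In this toy case there are three pairs with costs 1, 3, and 2, and one triple with cost 3, so the pairwise part contributes 6 and the total is 9. A cost of this kind depends on several states at once and therefore cannot be written as a sum of per-state costs.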
[0092] [Additional Note B] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0093] (Appendix B1) A control method for controlling a system in which a state sequence X(V;x0) = (x0, x1, …, x_T) is determined according to an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}), the control method including an input sequence calculation process in which at least one processor calculates, as an input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system, an approximate solution to a minimization problem that minimizes a cost function Φ(U), which is a function of an input sequence U = (u0, u1, …, u_{T-1}), wherein the cost function Φ(U) is the expected value E_{Q_U}[S(V;x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V;x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) following the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U, and the first cost S(V;x0) is a function defined by S(V;x0) = C(X(V;x0)) using (a) the state sequence X(V;x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that corresponds to a state sequence X = (x0, x1, …, x_T) and cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the respective states x0, x1, …, x_T.
[0094] (Appendix B2) The control method according to Appendix B1, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (1) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, and the second cost T(U) is a function defined by equation (2) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term.
[Equation (1)]
[Equation (2)]
[0095] (Appendix B3) The control method according to Appendix B1 or B2, wherein the state sequence-dependent cost C(X) includes, as a term, a nonlinear function of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T).
[0096] (Appendix B4) The control method according to Appendix B1 or B2, wherein the state sequence-dependent cost C(X) includes, as a term: a two-state-dependent cost c(x_{t1}, x_{t2}) that depends on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; a three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}) that depends on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; or a T-state-dependent cost c(x_{t1}, x_{t2}, …, x_{tT}) that depends on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0097] (Appendix B5) The control method according to Appendix B4, wherein the state sequence-dependent cost C(X) is the total of: the sum of the _{T+1}C_2 two-state-dependent costs c(x_{t1}, x_{t2}), each depending on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; the sum of the _{T+1}C_3 three-state-dependent costs c(x_{t1}, x_{t2}, x_{t3}), each depending on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; and the sum of the _{T+1}C_T T-state-dependent costs c(x_{t1}, x_{t2}, …, x_{tT}), each depending on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0098] (Appendix B6) The control method according to any one of Appendices B1 to B5, wherein, in the input sequence calculation process, the at least one processor executes: an input sequence generation process of generating an ensemble of input sequences V = (v0, v1, …, v_{T-1}) following the probability distribution Q_Û corresponding to an input sequence Û = (û0, û1, …, û_{T-1}); a state sequence calculation process of calculating the state sequence X(V;x0) corresponding to each input sequence V generated in the input sequence generation process; a weight calculation process of calculating the weight w(V) corresponding to each input sequence V generated in the input sequence generation process from the state sequence-dependent cost C(X(V;x0)) of the state sequence X(V;x0) calculated for that input sequence V in the state sequence calculation process; and an input calculation process of calculating, as each input u*_t constituting the input sequence U* to be input to the system, the expected value E_{Q_Û}[w(V) v_t] under the probability distribution Q_Û, where w(V) v_t is obtained by multiplying the input v_t constituting each input sequence V generated in the input sequence generation process by the weight w(V) calculated for that input sequence V in the weight calculation process.
[0099] (Appendix B7) The control method according to Appendix B6, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (3) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, the second cost T(U) is a function defined by equation (4) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term, and, in the weight calculation process, the at least one processor calculates the weight w(V) corresponding to each input sequence V generated in the input sequence generation process according to equation (6) below, where Ũ = (ũ0, ũ1, …, ũ_{T-1}) is a nominal input, P(V) = q(V|Ũ,Σ) is a base probability density function, η is a constant defined by equation (5) below, and λ is a temperature parameter.
[Equation (3)]
[Equation (4)]
[Equation (5)]
[Equation (6)]
[0100] (Appendix B8) The control method according to any one of Appendices B1 to B7, wherein the system is a robot or a mobile body, and the control method further includes a transmission process of transmitting to the system a control signal for controlling the system, the control signal representing the input sequence U* calculated in the input sequence calculation process.
[0101] [Additional Note C] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0102] (Appendix C1) A control program for controlling a system in which a state sequence X(V;x0) = (x0, x1, …, x_T) is determined according to an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}), the control program causing at least one processor to execute an input sequence calculation process of calculating, as an input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system, an approximate solution to a minimization problem that minimizes a cost function Φ(U), which is a function of an input sequence U = (u0, u1, …, u_{T-1}), wherein the cost function Φ(U) is the expected value E_{Q_U}[S(V;x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V;x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) following the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U, and the first cost S(V;x0) is a function defined by S(V;x0) = C(X(V;x0)) using (a) the state sequence X(V;x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that corresponds to a state sequence X = (x0, x1, …, x_T) and cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the respective states x0, x1, …, x_T.
[0103] (Appendix C2) The control program according to Appendix C1, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (1) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, and the second cost T(U) is a function defined by equation (2) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term.
[Equation (1)]
[Equation (2)]
[0104] (Appendix C3) The control program according to Appendix C1 or C2, wherein the state sequence-dependent cost C(X) includes, as a term, a nonlinear function of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T).
[0105] (Appendix C4) The control program according to Appendix C1 or C2, wherein the state sequence-dependent cost C(X) includes, as a term: a two-state-dependent cost c(x_{t1}, x_{t2}) that depends on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; a three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}) that depends on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; or a T-state-dependent cost c(x_{t1}, x_{t2}, …, x_{tT}) that depends on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0106] (Appendix C5) The control program according to Appendix C4, wherein the state sequence-dependent cost C(X) is the total of: the sum of the _{T+1}C_2 two-state-dependent costs c(x_{t1}, x_{t2}), each depending on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; the sum of the _{T+1}C_3 three-state-dependent costs c(x_{t1}, x_{t2}, x_{t3}), each depending on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; and the sum of the _{T+1}C_T T-state-dependent costs c(x_{t1}, x_{t2}, …, x_{tT}), each depending on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0107] (Appendix C6) The control program according to any one of Appendices C1 to C5, wherein the input sequence calculation process includes: an input sequence generation process of generating an ensemble of input sequences V = (v0, v1, …, v_{T-1}) following the probability distribution Q_Û corresponding to an input sequence Û = (û0, û1, …, û_{T-1}); a state sequence calculation process of calculating the state sequence X(V;x0) corresponding to each input sequence V generated in the input sequence generation process; a weight calculation process of calculating the weight w(V) corresponding to each input sequence V generated in the input sequence generation process from the state sequence-dependent cost C(X(V;x0)) of the state sequence X(V;x0) calculated for that input sequence V in the state sequence calculation process; and an input calculation process of calculating, as each input u*_t constituting the input sequence U* to be input to the system, the expected value E_{Q_Û}[w(V) v_t] under the probability distribution Q_Û, where w(V) v_t is obtained by multiplying the input v_t constituting each input sequence V generated in the input sequence generation process by the weight w(V) calculated for that input sequence V in the weight calculation process.
[0108] (Appendix C7) The control program according to Appendix C6, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (3) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, the second cost T(U) is a function defined by equation (4) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term, and the weight calculation process calculates the weight w(V) corresponding to each input sequence V generated in the input sequence generation process according to equation (6) below, where Ũ = (ũ0, ũ1, …, ũ_{T-1}) is a nominal input, P(V) = q(V|Ũ,Σ) is a base probability density function, η is a constant defined by equation (5) below, and λ is a temperature parameter.
[Equation (3)]
[Equation (4)]
[Equation (5)]
[Equation (6)]
[0109] (Appendix C8) The control program according to any one of Appendices C1 to C7, wherein the system is a robot or a mobile body, and the control program further causes the at least one processor to execute a transmission process of transmitting to the system a control signal for controlling the system, the control signal representing the input sequence U* calculated in the input sequence calculation process.
[0110] [Additional Note D] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0111] (Appendix D1) A control device that controls a system in which a state sequence X(V;x0) = (x0, x1, …, x_T) is determined according to an initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}), the control device comprising at least one processor, wherein the at least one processor executes an input sequence calculation process of calculating, as an input sequence U* = (u*_0, u*_1, …, u*_{T-1}) to be input to the system, an approximate solution to a minimization problem that minimizes a cost function Φ(U), which is a function of an input sequence U = (u0, u1, …, u_{T-1}), the cost function Φ(U) is the expected value E_{Q_U}[S(V;x0) + T(U)], under a probability distribution Q_U corresponding to the input sequence U, of the sum of a first cost S(V;x0), which is a function of the initial state x0 and an input sequence V = (v0, v1, …, v_{T-1}) following the probability distribution Q_U, and a second cost T(U), which is a function of the input sequence U, and the first cost S(V;x0) is a function defined by S(V;x0) = C(X(V;x0)) using (a) the state sequence X(V;x0) corresponding to the input sequence V and (b) a state sequence-dependent cost C(X) that corresponds to a state sequence X = (x0, x1, …, x_T) and cannot be decomposed into a sum of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T) corresponding to the respective states x0, x1, …, x_T.
[0112] The control device may also include memory. Furthermore, the memory may store a program for causing at least one processor to perform each of the aforementioned processes.
[0113] (Appendix D2) The control device according to Appendix D1, wherein the probability distribution Q_U is the probability distribution Q_{U,Σ} corresponding to the probability density function q(V|U,Σ) defined by equation (1) below, where Σ is a variance-covariance matrix and Z is ((2π)^m |Σ|)^{1/2}, and the second cost T(U) is a function defined by equation (2) below, with β0, β1, …, β_{T-1}, c0, c1, …, c_{T-1} defining the affine term.
[Equation (1)]
[Equation (2)]
[0114] (Appendix D3) The control device according to Appendix D1 or D2, wherein the state sequence-dependent cost C(X) includes, as a term, a nonlinear function of the state-dependent costs c(x0), c(x1), …, c(x_{T-1}), φ(x_T).
[0115] (Appendix D4) The control device according to Appendix D1 or D2, wherein the state sequence-dependent cost C(X) includes, as a term: a two-state-dependent cost c(x_{t1}, x_{t2}) that depends on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; a three-state-dependent cost c(x_{t1}, x_{t2}, x_{t3}) that depends on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; or a T-state-dependent cost c(x_{t1}, x_{t2}, …, x_{tT}) that depends on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0116] (Appendix D5) The control device according to Appendix D4, wherein the state sequence-dependent cost C(X) is the total of: the sum of the _{T+1}C_2 two-state-dependent costs c(x_{t1}, x_{t2}), each depending on two states x_{t1}, x_{t2} selected from the states x0, x1, …, x_T; the sum of the _{T+1}C_3 three-state-dependent costs c(x_{t1}, x_{t2}, x_{t3}), each depending on three states x_{t1}, x_{t2}, x_{t3} selected from the states x0, x1, …, x_T; …; and the sum of the _{T+1}C_T T-state-dependent costs c(x_{t1}, x_{t2}, …, x_{tT}), each depending on T states x_{t1}, x_{t2}, …, x_{tT} selected from the states x0, x1, …, x_T.
[0117] (Appendix D6) The control device according to any one of Appendices D1 to D5, wherein, in the input sequence calculation process, the at least one processor executes: an input sequence generation process of generating an ensemble of input sequences V = (v0, v1, …, v_{T-1}) following the probability distribution Q_Û corresponding to an input sequence Û = (û0, û1, …, û_{T-1}); a state sequence calculation process of calculating the state sequence X(V;x0) corresponding to each input sequence V generated in the input sequence generation process; a weight calculation process of calculating the weight w(V) corresponding to each input sequence V generated in the input sequence generation process from the state sequence-dependent cost C(X(V;x0)) of the state sequence X(V;x0) calculated for that input sequence V in the state sequence calculation process; and an input calculation process of calculating, as each input u*_t constituting the input sequence U* to be input to the system, the expected value E_{Q_Û}[w(V) v_t] under the probability distribution Q_Û, where w(V) v_t is obtained by multiplying the input v_t constituting each input sequence V generated in the input sequence generation process by the weight w(V) calculated for that input sequence V in the weight calculation process.
[0118] (Note D7) The probability distribution $Q_U$ is the probability distribution $Q_{U,\Sigma}$ corresponding to the probability density function $q(V \mid U, \Sigma)$ defined by equation (3) below, where $\Sigma$ is a variance-covariance matrix and $Z = ((2\pi)^m |\Sigma|)^{1/2}$; the second cost $T(U)$ is a function defined by equation (4) below, with $\beta_0, \beta_1, \ldots, \beta_{T-1}, c_0, c_1, \ldots, c_{T-1}$ as the affine term; and in the weight calculation process, the at least one processor calculates the weight $w(V)$ corresponding to each input sequence $V$ generated in the input sequence generation process according to equation (6) below, where $\tilde{U} = (\tilde{u}_0, \tilde{u}_1, \ldots, \tilde{u}_{T-1})$ is the nominal input, $P(V) = q(V \mid \tilde{U}, \Sigma)$ is the base probability density function, $\eta$ is a constant defined by equation (5) below, and $\lambda$ is the temperature parameter. The control device according to Note D6.
[Math 3] [Math 4] [Math 5] [Math 6]
[0119] (Note D8) The system is a robot or a mobile body, and the control program further causes the at least one processor to execute a transmission process of transmitting, to the system, a control signal for controlling the system, the control signal representing the input sequence $U^*$ calculated in the input sequence calculation process. The control program according to any one of Notes D1 to D7. [Additional Note E] This disclosure includes the technologies described in the following notes. However, the present invention is not limited to the technologies described in the following notes, and various modifications are possible within the scope of the claims.
[0120] (Note E1) A non-transitory recording medium storing a control program that causes a computer to function as a control device that controls a system in which a state sequence $X(V; x_0) = (x_0, x_1, \ldots, x_T)$ is determined by an initial state $x_0$ and an input sequence $V = (v_0, v_1, \ldots, v_{T-1})$, wherein the control program causes at least one processor of the computer to execute an input sequence calculation process of calculating, as an input sequence $U^* = (u^*_0, u^*_1, \ldots, u^*_{T-1})$ to be input to the system, an approximate solution of a minimization problem of minimizing a cost function $\Phi(U)$ that is a function of an input sequence $U = (u_0, u_1, \ldots, u_{T-1})$; the cost function $\Phi(U)$ is the expected value $E_{Q_U}[S(V; x_0) + T(U)]$, under a probability distribution $Q_U$ corresponding to the input sequence $U$, of the sum of a first cost $S(V; x_0)$, which is a function of the initial state $x_0$ and an input sequence $V$ that follows the probability distribution $Q_U$, and a second cost $T(U)$, which is a function of the input sequence $U$; and the first cost $S(V; x_0)$ is a function defined by $S(V; x_0) = C(X(V; x_0))$ using (a) the state sequence $X(V; x_0)$ corresponding to the input sequence $V$ and (b) a state sequence-dependent cost $C(X)$ of a state sequence $X = (x_0, x_1, \ldots, x_T)$ that cannot be decomposed into a sum of state-dependent costs $c(x_0), c(x_1), \ldots, c(x_{T-1}), \phi(x_T)$ corresponding to the respective states $x_0, x_1, \ldots, x_T$. [Explanation of Symbols]
[0121]
1: Control device
10: Input sequence calculation unit
11: Input sequence generation unit
12: State sequence calculation unit
13: Weight calculation unit
14: Input calculation unit
2: System
Claims
1. A control device that controls a system in which a state sequence $X(V; x_0) = (x_0, x_1, \ldots, x_T)$ is determined by an initial state $x_0$ and an input sequence $V = (v_0, v_1, \ldots, v_{T-1})$, the control device comprising input sequence calculation means for calculating, as an input sequence $U^* = (u^*_0, u^*_1, \ldots, u^*_{T-1})$ to be input to the system, an approximate solution of a minimization problem of minimizing a cost function $\Phi(U)$ that is a function of an input sequence $U = (u_0, u_1, \ldots, u_{T-1})$, wherein the cost function $\Phi(U)$ is the expected value $E_{Q_U}[S(V; x_0) + T(U)]$, under a probability distribution $Q_U$ corresponding to the input sequence $U$, of the sum of a first cost $S(V; x_0)$, which is a function of the initial state $x_0$ and an input sequence $V$ that follows the probability distribution $Q_U$, and a second cost $T(U)$, which is a function of the input sequence $U$, and the first cost $S(V; x_0)$ is a function defined by $S(V; x_0) = C(X(V; x_0))$ using (a) the state sequence $X(V; x_0)$ corresponding to the input sequence $V$ and (b) a state sequence-dependent cost $C(X)$ of a state sequence $X = (x_0, x_1, \ldots, x_T)$ that cannot be decomposed into a sum of state-dependent costs $c(x_0), c(x_1), \ldots, c(x_{T-1}), \phi(x_T)$ corresponding to the respective states $x_0, x_1, \ldots, x_T$.
2. The control device according to claim 1, wherein the probability distribution $Q_U$ is the probability distribution $Q_{U,\Sigma}$ corresponding to the probability density function $q(V \mid U, \Sigma)$ defined by equation (1) below, where $\Sigma$ is a variance-covariance matrix and $Z = ((2\pi)^m |\Sigma|)^{1/2}$, and the second cost $T(U)$ is a function defined by equation (2) below, with $\beta_0, \beta_1, \ldots, \beta_{T-1}, c_0, c_1, \ldots, c_{T-1}$ as the affine term. [Math 1] [Math 2]
3. The control device according to claim 1 or 2, wherein the state sequence-dependent cost $C(X)$ includes, as a term, a nonlinear function of the state-dependent costs $c(x_0), c(x_1), \ldots, c(x_{T-1}), \phi(x_T)$.
4. The control device according to claim 1 or 2, wherein the state sequence-dependent cost $C(X)$ includes, as a term, a two-state-dependent cost $c(x_{t_1}, x_{t_2})$ that depends on two states $x_{t_1}, x_{t_2}$ selected from the states $x_0, x_1, \ldots, x_T$; a three-state-dependent cost $c(x_{t_1}, x_{t_2}, x_{t_3})$ that depends on three states $x_{t_1}, x_{t_2}, x_{t_3}$ selected from the states $x_0, x_1, \ldots, x_T$; ...; or a $T$-state-dependent cost $c(x_{t_1}, x_{t_2}, \ldots, x_{t_T})$ that depends on $T$ states $x_{t_1}, x_{t_2}, \ldots, x_{t_T}$ selected from the states $x_0, x_1, \ldots, x_T$.
5. The control device according to claim 4, wherein the state sequence-dependent cost $C(X)$ is the total of: the sum of the ${}_{T+1}C_2$ two-state-dependent costs $c(x_{t_1}, x_{t_2})$ that depend on two states $x_{t_1}, x_{t_2}$ selected from the states $x_0, x_1, \ldots, x_T$; the sum of the ${}_{T+1}C_3$ three-state-dependent costs $c(x_{t_1}, x_{t_2}, x_{t_3})$ that depend on three states $x_{t_1}, x_{t_2}, x_{t_3}$ selected from the states $x_0, x_1, \ldots, x_T$; ...; and the sum of the ${}_{T+1}C_T$ $T$-state-dependent costs $c(x_{t_1}, x_{t_2}, \ldots, x_{t_T})$ that depend on $T$ states $x_{t_1}, x_{t_2}, \ldots, x_{t_T}$ selected from the states $x_0, x_1, \ldots, x_T$.
6. The control device according to claim 1 or 2, wherein the input sequence calculation means comprises: input sequence generation means for generating an ensemble of input sequences $V = (v_0, v_1, \ldots, v_{T-1})$ that follow a probability distribution $Q_{\hat{U}}$ corresponding to an input sequence $\hat{U} = (\hat{u}_0, \hat{u}_1, \ldots, \hat{u}_{T-1})$; state sequence calculation means for calculating the state sequence $X(V; x_0)$ corresponding to each input sequence $V$ generated by the input sequence generation means; weight calculation means for calculating the weight $w(V)$ corresponding to each input sequence $V$ generated by the input sequence generation means from the state sequence-dependent cost $C(X(V; x_0))$ of the state sequence $X(V; x_0)$ that corresponds to that input sequence $V$ and was calculated by the state sequence calculation means; and input calculation means for calculating, as each input $u^*_t$ constituting the input sequence $U^*$ to be input to the system, the expected value $E_{Q_{\hat{U}}}[w(V) v_t]$, under the probability distribution $Q_{\hat{U}}$, of $w(V) v_t$, which is obtained by multiplying the input $v_t$ constituting each input sequence $V$ generated by the input sequence generation means by the weight $w(V)$ corresponding to that input sequence $V$ calculated by the weight calculation means.
7. The control device according to claim 6, wherein the probability distribution $Q_U$ is the probability distribution $Q_{U,\Sigma}$ corresponding to the probability density function $q(V \mid U, \Sigma)$ defined by equation (3) below, where $\Sigma$ is a variance-covariance matrix and $Z = ((2\pi)^m |\Sigma|)^{1/2}$; the second cost $T(U)$ is a function defined by equation (4) below, with $\beta_0, \beta_1, \ldots, \beta_{T-1}, c_0, c_1, \ldots, c_{T-1}$ as the affine term; and the weight calculation means calculates the weight $w(V)$ corresponding to each input sequence $V$ generated by the input sequence generation means according to equation (6) below, where $\tilde{U} = (\tilde{u}_0, \tilde{u}_1, \ldots, \tilde{u}_{T-1})$ is the nominal input, $P(V) = q(V \mid \tilde{U}, \Sigma)$ is the base probability density function, $\eta$ is a constant defined by equation (5) below, and $\lambda$ is the temperature parameter. [Math 3] [Math 4] [Math 5] [Math 6] Note that, in equation (6) above, the symbol $u_t$ with a circumflex above it corresponds to $\hat{u}_t$, and the symbol $u_t$ with a tilde above it corresponds to $\tilde{u}_t$.
8. The control device according to claim 1 or 2, wherein the system is a robot or a mobile body, and the control device further comprises transmission means for transmitting, to the system, a control signal for controlling the system, the control signal representing the input sequence $U^*$ calculated by the input sequence calculation means.
9. A control method for controlling a system in which a state sequence $X(V; x_0) = (x_0, x_1, \ldots, x_T)$ is determined by an initial state $x_0$ and an input sequence $V = (v_0, v_1, \ldots, v_{T-1})$, the control method including an input sequence calculation process in which at least one processor calculates, as an input sequence $U^* = (u^*_0, u^*_1, \ldots, u^*_{T-1})$ to be input to the system, an approximate solution of a minimization problem of minimizing a cost function $\Phi(U)$ that is a function of an input sequence $U = (u_0, u_1, \ldots, u_{T-1})$, wherein the cost function $\Phi(U)$ is the expected value $E_{Q_U}[S(V; x_0) + T(U)]$, under a probability distribution $Q_U$ corresponding to the input sequence $U$, of the sum of a first cost $S(V; x_0)$, which is a function of the initial state $x_0$ and an input sequence $V$ that follows the probability distribution $Q_U$, and a second cost $T(U)$, which is a function of the input sequence $U$, and the first cost $S(V; x_0)$ is a function defined by $S(V; x_0) = C(X(V; x_0))$ using (a) the state sequence $X(V; x_0)$ corresponding to the input sequence $V$ and (b) a state sequence-dependent cost $C(X)$ of a state sequence $X = (x_0, x_1, \ldots, x_T)$ that cannot be decomposed into a sum of state-dependent costs $c(x_0), c(x_1), \ldots, c(x_{T-1}), \phi(x_T)$ corresponding to the respective states $x_0, x_1, \ldots, x_T$.
10. A control program for controlling a system in which a state sequence $X(V; x_0) = (x_0, x_1, \ldots, x_T)$ is determined by an initial state $x_0$ and an input sequence $V = (v_0, v_1, \ldots, v_{T-1})$, the control program causing at least one processor to execute an input sequence calculation process of calculating, as an input sequence $U^* = (u^*_0, u^*_1, \ldots, u^*_{T-1})$ to be input to the system, an approximate solution of a minimization problem of minimizing a cost function $\Phi(U)$ that is a function of an input sequence $U = (u_0, u_1, \ldots, u_{T-1})$, wherein the cost function $\Phi(U)$ is the expected value $E_{Q_U}[S(V; x_0) + T(U)]$, under a probability distribution $Q_U$ corresponding to the input sequence $U$, of the sum of a first cost $S(V; x_0)$, which is a function of the initial state $x_0$ and an input sequence $V$ that follows the probability distribution $Q_U$, and a second cost $T(U)$, which is a function of the input sequence $U$, and the first cost $S(V; x_0)$ is a function defined by $S(V; x_0) = C(X(V; x_0))$ using (a) the state sequence $X(V; x_0)$ corresponding to the input sequence $V$ and (b) a state sequence-dependent cost $C(X)$ of a state sequence $X = (x_0, x_1, \ldots, x_T)$ that cannot be decomposed into a sum of state-dependent costs $c(x_0), c(x_1), \ldots, c(x_{T-1}), \phi(x_T)$ corresponding to the respective states $x_0, x_1, \ldots, x_T$.
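To illustrate the kind of state sequence-dependent cost recited in claims 1 and 3, the following Python sketch compares an ordinary additive cost with two nonlinear functions of the per-state costs. It is a hedged example only: the squared-error stage cost, the function names, and the sharpness parameter `theta` are assumptions made for illustration and do not appear in the disclosure. The two trajectories are chosen so that the additive cost cannot tell them apart, while the worst-case cost and its smooth log-sum-exp variant penalise the trajectory with a single large excursion.

```python
import numpy as np


def stage_costs(X, target=0.0):
    """Per-state costs c(x_0), ..., c(x_{T-1}), phi(x_T).

    ASSUMPTION: every stage cost is a squared error to a scalar target.
    """
    return (np.asarray(X, dtype=float) - target) ** 2


def additive_cost(X):
    """Conventional decomposable cost: the plain sum of stage costs."""
    return float(stage_costs(X).sum())


def worst_case_cost(X):
    """Nonlinear function of the stage costs: the largest one."""
    return float(stage_costs(X).max())


def soft_worst_case_cost(X, theta=2.0):
    """Smooth variant: (1/theta) * log(sum(exp(theta * c_t))).

    The maximum is subtracted inside the exponent for numerical stability.
    """
    c = stage_costs(X)
    c_max = c.max()
    return float(c_max + np.log(np.exp(theta * (c - c_max)).sum()) / theta)


# Two hypothetical scalar trajectories with the same additive cost.
X_steady = [1.0, 1.0, 1.0, 1.0]   # stage costs 1, 1, 1, 1
X_spike = [0.0, 0.0, 0.0, 2.0]    # stage costs 0, 0, 0, 4

same_sum = additive_cost(X_steady) == additive_cost(X_spike)
riskier = worst_case_cost(X_spike) > worst_case_cost(X_steady)
```

Here both trajectories have an additive cost of 4, whereas the worst-case cost is 1 for the steady trajectory and 4 for the spiking one, which is the behaviour a risk-sensitive design would typically want to capture.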