Musculoskeletal robot control method and device based on multi-brain region fusion

By employing a multi-brain region fusion control method, muscle control signals are generated using the basal ganglia and cerebellum network, solving the problem of insufficient motion adaptability in musculoskeletal robots and achieving high-precision and highly robust motion control.

CN117681192B · Active · Publication Date: 2026-06-12 · INST OF AUTOMATION CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INST OF AUTOMATION CHINESE ACAD OF SCI
Filing Date
2023-12-01
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Musculoskeletal robots suffer from poor motion adaptability in motion tasks, making it difficult to effectively explore the action space to obtain optimal control signals.

Method used

A control method based on multi-brain region fusion is adopted. The observation state information of the musculoskeletal robot is acquired, and a basal ganglia network and a cerebellar network are used to generate muscle control signals. The network model is updated and optimized with a dopamine-based experience replay method, thereby improving learning efficiency and motor adaptation.

Benefits of technology

This improves the learning efficiency and motion adaptation capabilities of musculoskeletal robots, enabling them to complete tasks with high precision and robustness.

Abstract

The application provides a musculoskeletal robot control method and device based on multi-brain region fusion, and applies to the technical field of robot control. The method comprises the following steps: acquiring observation state information of a musculoskeletal robot, wherein the observation state information comprises joint angle, joint angular velocity, muscle signal energy, and current position and target position of an end effector; generating a muscle control signal by using a network fusion model based on the observation state information; and driving the musculoskeletal robot to perform an action based on the muscle control signal; wherein the network fusion model comprises a basal ganglia network and a cerebellum network, the basal ganglia network and the cerebellum network are communicatively connected based on a brain region communication neural mechanism, and the brain region communication neural mechanism is generated by simulating communication between two brain regions under the cerebral cortex.

Description

Technical Field

[0001] This invention relates to the field of robot control technology, and in particular to a control method and device for musculoskeletal robots based on multi-brain region fusion.

Background Technology

[0002] Musculoskeletal robots are tendon-driven systems built by mimicking the arrangement of bones and muscles in living organisms. Compared to rigid-body robots, musculoskeletal robots can perform high-precision tasks more flexibly and smoothly.

[0003] In related technologies, network models of various brain regions can be established through biosimulation, and the output signals of all network models can be weighted and summed to obtain control signals. Finally, these control signals can be used to control the movement of the robot.

[0004] However, for musculoskeletal robots, multiple muscle excitation patterns can complete the same movement task. Therefore, there is an urgent need for effective methods to explore the action space to obtain the optimal control signal.

Summary of the Invention

[0005] This invention provides a musculoskeletal robot control method and device based on multi-brain region fusion, which solves the problem of poor motion adaptability of musculoskeletal robots in the prior art.

[0006] This invention provides a musculoskeletal robot control method based on multi-brain region fusion, comprising: acquiring observational state information of the musculoskeletal robot, the observational state information including: joint angles, joint angular velocities, muscle signal energy, and the current and target positions of the end effector; generating muscle control signals from the observational state information through a network fusion model; and driving the musculoskeletal robot to perform actions based on the muscle control signals; wherein, the network fusion model includes a basal ganglia network and a cerebellar network, the basal ganglia network and the cerebellar network are connected by a brain region communication neural mechanism, the brain region communication neural mechanism being generated by simulating communication between two brain regions under the cerebral cortex.

[0007] According to the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention, the basal ganglia network includes a basal ganglia action network and a basal ganglia evaluation network. The basal ganglia action network includes a dorsal striatum network layer, an external globus pallidus network layer, a subthalamic nucleus network layer, a substantia nigra pars reticulata network layer, and an internal globus pallidus network layer. The transmission pathway of the basal ganglia action network includes a direct transmission pathway and an indirect transmission pathway. The direct transmission pathway takes input through the dorsal striatum network layer and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The indirect transmission pathway takes input through the dorsal striatum network layer, passes sequentially through the external globus pallidus network layer and the subthalamic nucleus network layer, and then outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The basal ganglia evaluation network includes a substantia nigra pars compacta network layer, a ventral tegmental area network layer, and a ventral striatum network layer. Its transmission pathway takes input through the substantia nigra pars compacta network layer and the ventral tegmental area network layer, and outputs through the ventral striatum network layer.

[0008] According to the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention, the cerebellar network includes a granule cell network layer, a deep cerebellar nucleus network layer, a Purkinje cell network layer, and an inferior olivary nucleus network layer. The transmission pathway of the cerebellar network includes a first transmission pathway and a second transmission pathway. The first transmission pathway outputs directly through the deep cerebellar nucleus network layer. The second transmission pathway takes input through the granule cell network layer, passes through the Purkinje cell network layer, and then outputs through the deep cerebellar nucleus network layer. The inferior olivary nucleus network layer encodes error signals and transmits them to the Purkinje cell network layer and the deep cerebellar nucleus network layer to correct their network weights.

[0009] According to the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention, the brain region communication neural mechanism includes: transmitting the feature extraction results of the basal ganglia action network to the granule cell network layer through the subthalamic nucleus network layer; and transmitting the output of the deep cerebellar nucleus network layer to the dorsal striatum network layer.

[0010] According to the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention, before the muscle control signals are generated from the observation state information through the network fusion model, the method further includes: acquiring training samples based on a dopamine-based experience replay method; and performing an initial update and a secondary update of the network fusion model based on the training samples. The initial update updates the basal ganglia action network and the basal ganglia evaluation network separately, and the secondary update is achieved by controlling the interaction between the basal ganglia network and the cerebellar network.

[0011] According to the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention, acquiring the training samples based on the dopamine-based experience replay method comprises: determining the most recent sample set and the optimal sample set from the experience replay buffer; and uniformly sampling the training samples from the most recent sample set and the optimal sample set based on the dopamine proportion coefficient.

[0012] According to the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention, the method further comprises: updating the dopamine proportion coefficient if, in a time step, the policy entropy of the output signal of the basal ganglia network is within a preset interval; wherein the preset interval is determined based on the target entropy of the output signal of the basal ganglia network, and the functional relationship between the target entropy and the dopamine proportion coefficient ρ is as follows:

[0013] This invention also provides a musculoskeletal robot control device based on multi-brain region fusion, comprising: an acquisition module and a processing module; the acquisition module is used to acquire observational state information of the musculoskeletal robot, the observational state information including: joint angles, joint angular velocities, muscle signal energy, and the current and target positions of the end effector; the processing module is used to generate muscle control signals from the observational state information through a network fusion model; and drive the musculoskeletal robot to perform actions based on the muscle control signals; wherein, the network fusion model includes a basal ganglia network and a cerebellar network, the basal ganglia network and the cerebellar network are connected by a brain region communication neural mechanism, the brain region communication neural mechanism being generated by simulating communication between two brain regions under the cerebral cortex.

[0014] According to the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention, the basal ganglia network includes a basal ganglia action network and a basal ganglia evaluation network. The basal ganglia action network includes a dorsal striatum network layer, an external globus pallidus network layer, a subthalamic nucleus network layer, a substantia nigra pars reticulata network layer, and an internal globus pallidus network layer. The transmission pathway of the basal ganglia action network includes a direct transmission pathway and an indirect transmission pathway. The direct transmission pathway takes input through the dorsal striatum network layer and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The indirect transmission pathway takes input through the dorsal striatum network layer, passes sequentially through the external globus pallidus network layer and the subthalamic nucleus network layer, and then outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The basal ganglia evaluation network includes a substantia nigra pars compacta network layer, a ventral tegmental area network layer, and a ventral striatum network layer. Its transmission pathway takes input through the substantia nigra pars compacta network layer and the ventral tegmental area network layer, and outputs through the ventral striatum network layer.

[0015] According to the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention, the cerebellar network includes a granule cell network layer, a deep cerebellar nucleus network layer, a Purkinje cell network layer, and an inferior olivary nucleus network layer. The transmission pathway of the cerebellar network includes a first transmission pathway and a second transmission pathway. The first transmission pathway outputs directly through the deep cerebellar nucleus network layer. The second transmission pathway takes input through the granule cell network layer, passes through the Purkinje cell network layer, and then outputs through the deep cerebellar nucleus network layer. The inferior olivary nucleus network layer encodes error signals and transmits them to the Purkinje cell network layer and the deep cerebellar nucleus network layer to correct their network weights.

[0016] According to the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention, the brain region communication neural mechanism includes: transmitting the feature extraction results of the basal ganglia action network to the granule cell network layer through the subthalamic nucleus network layer; and transmitting the output of the deep cerebellar nucleus network layer to the dorsal striatum network layer.

[0017] According to the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention, the acquisition module is further used to acquire training samples based on a dopamine-based experience replay method, and the processing module is further used to perform an initial update and a secondary update of the network fusion model based on the training samples. The initial update updates the basal ganglia action network and the basal ganglia evaluation network separately, while the secondary update is achieved by controlling the interaction between the basal ganglia network and the cerebellar network.

[0018] According to the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention, the acquisition module can be used to determine the most recent sample set and the optimal sample set from the experience replay buffer, and to uniformly sample the training samples from the most recent sample set and the optimal sample set based on the dopamine proportion coefficient.

[0019] According to the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention, the processing module can update the dopamine proportion coefficient if, in a time step, the policy entropy of the output signal of the basal ganglia network is within a preset interval; wherein the preset interval is determined based on the target entropy of the output signal of the basal ganglia network, and the functional relationship between the target entropy and the dopamine proportion coefficient ρ is as follows:

[0020]

[0021] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the musculoskeletal robot control method based on multi-brain region fusion as described above.

[0022] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the musculoskeletal robot control method based on multi-brain region fusion as described above.

[0023] This invention provides a musculoskeletal robot control method and apparatus based on multi-brain region fusion. Observation state information of the musculoskeletal robot is acquired, including joint angles, joint angular velocities, muscle signal energy, and the current and target positions of the end effector. Muscle control signals are generated from this observation state information using a network fusion model, and the musculoskeletal robot is driven to perform actions based on these muscle control signals. The network fusion model includes a basal ganglia network and a cerebellar network, which are connected via a brain region communication neural mechanism generated by simulating communication between two subcortical brain regions. Because the network fusion model simulates the subcortical communication between the basal ganglia and the cerebellum, it improves the learning efficiency and motor adaptability of the musculoskeletal robot, enabling it to complete tasks with high precision and robustness.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0025] Figure 1 is the first flowchart of the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention;

[0026] Figure 2 is a schematic diagram of the basal ganglia network provided by the present invention;

[0027] Figure 3 is a schematic diagram of the communication between the basal ganglia network and the cerebellar network provided by the present invention;

[0028] Figure 4 is the second flowchart of the musculoskeletal robot control method based on multi-brain region fusion provided by the present invention;

[0029] Figure 5 is a schematic diagram of the musculoskeletal robot control device based on multi-brain region fusion provided by the present invention;

[0030] Figure 6 is a schematic diagram of the structure of the electronic device provided by the present invention.

Detailed Implementation

[0031] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0032] It should be noted that in the embodiments of the present invention, the terms "exemplary" or "for example" are used to indicate examples, illustrations, or descriptions. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of terms such as "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0033] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of the present invention is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0034] To facilitate a clear description of the technical solutions of the embodiments of the present invention, the terms "first" and "second" are used in the embodiments of the present invention to distinguish the same or similar items with essentially the same function and effect. Those skilled in the art can understand that the terms "first" and "second" are not intended to limit the quantity or execution order.

[0035] The embodiments of the present invention have been described for illustrative purposes. It should be understood that the present invention may be implemented in other ways not specifically shown in the accompanying drawings.

[0036] The above implementation method will be described in detail below with reference to specific embodiments and accompanying drawings.

[0037] As shown in Figure 1, this embodiment of the invention provides a musculoskeletal robot control method based on multi-brain region fusion, which can be applied to a musculoskeletal robot control device based on multi-brain region fusion. The method may include steps S101-S103:

[0038] S101. A musculoskeletal robot control device based on multi-brain region fusion acquires the observation state information of the musculoskeletal robot.

[0039] The observation state information may include: joint angle, joint angular velocity, muscle signal energy, and the current and target positions of the end effector.

[0040] Specifically, the musculoskeletal robot control device based on multi-brain region fusion can first determine the joint angle $q_t$, the joint angular velocity, the muscle signal energy, the current position $x_t$ of the end effector, and the target position $x^*$. Together, these quantities constitute the observation state information $s_t$.
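
The assembly of the observation state information can be illustrated with a minimal Python sketch. The class name `Observation`, its field names, the concatenation order, and the example dimensions are all assumptions made for illustration; the source only lists which quantities make up $s_t$.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical container for the quantities listed in [0040]."""
    joint_angles: List[float]      # q_t
    joint_velocities: List[float]  # joint angular velocity
    muscle_energy: List[float]     # muscle signal energy
    effector_pos: List[float]      # x_t, current end-effector position
    target_pos: List[float]        # x*, target end-effector position

    def to_vector(self) -> List[float]:
        # Assumption: s_t is a flat concatenation in this field order.
        return (self.joint_angles + self.joint_velocities
                + self.muscle_energy + self.effector_pos + self.target_pos)


obs = Observation(
    joint_angles=[0.1, 0.2],
    joint_velocities=[0.0, 0.0],
    muscle_energy=[0.5, 0.4, 0.3],
    effector_pos=[0.0, 0.1, 0.2],
    target_pos=[0.3, 0.3, 0.3],
)
s_t = obs.to_vector()
```

Keeping the fields separate until the final concatenation makes it easy to change the number of joints or muscles without touching the controller code.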

[0041] S102. The musculoskeletal robot control device based on multi-brain region fusion generates muscle control signals from the observed state information through a network fusion model.

[0042] The network fusion model includes a basal ganglia network and a cerebellar network. The basal ganglia network and the cerebellar network are connected based on a brain region communication neural mechanism, which is generated by simulating the communication between two brain regions under the cerebral cortex.

[0043] Optionally, the network fusion model can serve as the controller of the musculoskeletal robot. At time step t, the musculoskeletal robot control device based on multi-brain region fusion can use the network fusion model to generate the muscle control signal $a_t$ from the current observation state information $s_t$, and obtain the reward signal $r(s_t, a_t)$ and the observation state information $s_{t+1}$ at the next time step (t+1).
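
One control step of this kind can be sketched as follows. The function `control_step` and the callables `policy` and `env_step` are hypothetical stand-ins for the network fusion model and the robot (or its simulator); they are not named in the source.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]
Transition = Tuple[State, Action, float, State]


def control_step(s_t: State,
                 policy: Callable[[State], Action],
                 env_step: Callable[[Action], Tuple[State, float]]) -> Transition:
    """Generate a_t from s_t, apply it, and return (s_t, a_t, r_t, s_{t+1})."""
    a_t = policy(s_t)               # muscle control signal from the controller
    s_next, r_t = env_step(a_t)     # robot executes the action
    return (s_t, a_t, r_t, s_next)


# Toy stand-ins, for illustration only.
def toy_policy(state: State) -> Action:
    return [0.5, 0.5]


def toy_env_step(action: Action) -> Tuple[State, float]:
    return ([sum(action)], -1.0)


transition = control_step([0.0], toy_policy, toy_env_step)
```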

[0044] Optionally, the reward signal $r(s_t, a_t)$ can be expressed as:

[0045]

[0046] where $v_t$ is the velocity of the end effector; $\eta_i$ ($i \in \{1,2,3,4\}$) are constants that balance the weights of the terms in the reward function; and ζ is the additional reward granted when the end effector reaches the target point, which drives the movement toward the task completion stage.

[0047]

[0048] Where δ is the distance threshold.
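
The formula for ζ is not reproduced in this text. As a hedged sketch only, one plausible reading is a fixed bonus that is granted when the end effector is within the distance threshold δ of the target. The Euclidean distance, the strict inequality, and the `bonus` value below are assumptions, not the patent's definition.

```python
import math
from typing import Sequence


def target_bonus(x_t: Sequence[float],
                 x_target: Sequence[float],
                 delta: float,
                 bonus: float = 1.0) -> float:
    """Return the extra reward when the end effector is within delta of the target.

    Assumption: Euclidean distance and a constant bonus; the source formula is lost.
    """
    return bonus if math.dist(x_t, x_target) < delta else 0.0


near = target_bonus([0.0, 0.0, 0.0], [0.0, 0.0, 0.01], delta=0.05)
far = target_bonus([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], delta=0.05)
```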

[0049] Optionally, the basal ganglia network includes a basal ganglia action network and a basal ganglia evaluation network. The basal ganglia action network includes a dorsal striatum network layer, an external globus pallidus network layer, a subthalamic nucleus network layer, a substantia nigra pars reticulata network layer, and an internal globus pallidus network layer. The transmission pathway of the basal ganglia action network includes a direct transmission pathway and an indirect transmission pathway. The direct transmission pathway takes input through the dorsal striatum network layer and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The indirect transmission pathway takes input through the dorsal striatum network layer, passes sequentially through the external globus pallidus network layer and the subthalamic nucleus network layer, and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The basal ganglia evaluation network includes a substantia nigra pars compacta network layer, a ventral tegmental area network layer, and a ventral striatum network layer. Its transmission pathway takes input through the substantia nigra pars compacta network layer and the ventral tegmental area network layer, and outputs through the ventral striatum network layer.

[0050] Specifically, the basal ganglia network is constructed by mimicking the anatomical structure of the basal ganglia. As shown in Figure 2, in the basal ganglia action network, the dorsal striatum (DS) network layer is the input layer, while the substantia nigra pars reticulata (SNr) and internal segment of globus pallidus (GPi) network layers work together as the output layer. The DS network layer can transmit signals to other parts of the action network through the direct and indirect transmission pathways. In the direct transmission pathway, the DS network layer transmits signals directly to the SNr-GPi network layer. In the indirect transmission pathway, the DS network layer transmits signals sequentially to the external segment of globus pallidus (GPe) network layer, the subthalamic nucleus (STN) network layer, and finally to the SNr-GPi network layer. The basal ganglia action network is denoted as $\pi_\phi(a|s)$, where φ is the network parameter.

[0051] Specifically, the indirect transmission pathway of the basal ganglia action network can include two stages: environmental feature extraction (DS→GPe→STN) and action selection (STN→SNr/GPi). The feature extraction stage is denoted as $\nu_\psi(s)$, where ψ is the network parameter.
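
The two pathways of the action network can be sketched as a small feed-forward pass. This is a deterministic simplification under stated assumptions: the source describes a policy $\pi_\phi(a|s)$ but does not give layer sizes, activation functions, or how the two pathways are merged at the SNr-GPi layer. The `tanh` layers, the additive merge, the sigmoid output, and all weight values below are illustrative choices.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dense(x: List[float], w: Matrix) -> List[float]:
    """One fully connected layer: tanh(W x), with W stored as a list of rows."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def action_network(s: List[float], w_ds: Matrix, w_gpe: Matrix, w_stn: Matrix,
                   w_direct: Matrix, w_indirect: Matrix) -> Tuple[List[float], List[float]]:
    ds = dense(s, w_ds)                # DS: input layer
    direct = dense(ds, w_direct)       # direct pathway: DS -> SNr-GPi
    gpe = dense(ds, w_gpe)             # indirect pathway: DS -> GPe
    stn = dense(gpe, w_stn)            # GPe -> STN (end of feature extraction)
    indirect = dense(stn, w_indirect)  # STN -> SNr-GPi (action selection)
    # Assumption: pathways are summed, then squashed to muscle activations in (0, 1).
    activation = [1.0 / (1.0 + math.exp(-(d + i))) for d, i in zip(direct, indirect)]
    return activation, stn


def const_matrix(rows: int, cols: int, value: float) -> Matrix:
    return [[value] * cols for _ in range(rows)]


s_t = [0.2, -0.1, 0.4]
a_t, stn_features = action_network(
    s_t,
    w_ds=const_matrix(4, 3, 0.1),
    w_gpe=const_matrix(4, 4, 0.1),
    w_stn=const_matrix(2, 4, 0.1),
    w_direct=const_matrix(2, 4, 0.1),
    w_indirect=const_matrix(2, 2, 0.1),
)
```

Returning the STN activations alongside the action keeps the feature extraction output available to other components.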

[0052] Continuing to refer to Figure 2, in the basal ganglia evaluation network, the substantia nigra pars compacta (SNc) and ventral tegmental area (VTA) network layers are responsible for encoding dopamine hyperparameter information and calculating temporal difference errors; the ventral striatum (VS) network layer receives input from the SNc-VTA network layer and performs state-action value estimation. The basal ganglia evaluation network can be denoted as $Q_\theta(s,a)$, where θ is the network parameter.

[0053] In particular, to address the bias problem in state-action value estimation, two different evaluation networks $Q_{\theta_i}(s,a)$ can be used to estimate the state-action value, where $i \in \{1,2\}$ and $\theta_i$ are the network parameters of each network. The two evaluation networks are trained independently, and the minimum of their outputs is used as the final state-action value estimate.
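
Taking the minimum of two independently trained evaluation networks can be sketched in a few lines. The function name `min_q` and the two toy critics are hypothetical; real evaluation networks would be learned function approximators.

```python
from typing import Callable, List

Critic = Callable[[List[float], List[float]], float]


def min_q(q1: Critic, q2: Critic, s: List[float], a: List[float]) -> float:
    """Final state-action value estimate: the smaller of the two critic outputs."""
    return min(q1(s, a), q2(s, a))


# Toy critics, for illustration only.
def toy_q1(s: List[float], a: List[float]) -> float:
    return sum(s) + sum(a)


def toy_q2(s: List[float], a: List[float]) -> float:
    return sum(s) + sum(a) - 0.5


q_value = min_q(toy_q1, toy_q2, [1.0], [2.0])
```

Using the smaller estimate counteracts the tendency of a single learned critic to overestimate values.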

[0054] Optionally, the cerebellar network includes a granule cell network layer, a deep cerebellar nucleus network layer, a Purkinje cell network layer, and an inferior olivary nucleus network layer. The transmission pathway of the cerebellar network includes a first transmission pathway and a second transmission pathway. The first transmission pathway outputs directly through the deep cerebellar nucleus network layer. The second transmission pathway takes input through the granule cell network layer, passes through the Purkinje cell network layer, and then outputs through the deep cerebellar nucleus network layer. The inferior olivary nucleus network layer encodes error signals and transmits them to the Purkinje cell network layer and the deep cerebellar nucleus network layer to correct their network weights.

[0055] Specifically, the cerebellar network is constructed by mimicking the connection patterns of important nuclei in the cerebellum. As shown in Figure 3, the input signals of the cerebellar network can be transmitted to the output layer of the cerebellar network through a first transmission pathway and a second transmission pathway. The first transmission pathway outputs directly through the deep cerebellar nuclei (DCN) network layer; the second transmission pathway passes sequentially through the granule cell (GC) network layer, the Purkinje cell (PC) network layer, and the DCN network layer. Furthermore, the inferior olivary (IO) network layer is responsible for encoding error signals and projecting them to the PC and DCN network layers to modify their network weights. The cerebellar network is denoted as $\mu_\omega(x)$, where ω is the network parameter.
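
The two cerebellar pathways and the error-driven weight correction can be sketched as follows. The linear layers, the additive merge at the DCN layer, and the delta-rule form of the IO correction are assumptions for illustration; the source states only which layers are connected and that the IO layer's error signals modify the PC and DCN weights.

```python
from typing import List

Matrix = List[List[float]]


def linear(x: List[float], w: Matrix) -> List[float]:
    """Plain matrix-vector product W x, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def cerebellar_forward(x: List[float], w_in_dcn: Matrix, w_gc: Matrix,
                       w_pc: Matrix, w_pc_dcn: Matrix) -> List[float]:
    first = linear(x, w_in_dcn)    # first pathway: input -> DCN
    gc = linear(x, w_gc)           # second pathway: input -> GC
    pc = linear(gc, w_pc)          # GC -> PC
    second = linear(pc, w_pc_dcn)  # PC -> DCN
    # Assumption: both pathways are summed at the DCN output layer.
    return [a + b for a, b in zip(first, second)]


def io_correct(w: Matrix, pre: List[float], error: List[float], lr: float) -> Matrix:
    """Assumed delta rule: w[j][i] -= lr * error[j] * pre[i], driven by the IO error."""
    return [[wji - lr * error[j] * pre[i] for i, wji in enumerate(row)]
            for j, row in enumerate(w)]


dcn_out = cerebellar_forward([1.0, 2.0], [[1.0, 0.0]], [[1.0, 1.0]], [[0.5]], [[-1.0]])
new_w = io_correct([[0.0, 0.0]], pre=[1.0, 2.0], error=[0.5], lr=0.1)
```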

[0056] Optionally, the brain region communication neural mechanism includes: transmitting the feature extraction results of the basal ganglia action network to the granule cell network layer through the subthalamic nucleus network layer; and transmitting the output of the deep cerebellar nucleus network layer to the dorsal striatum network layer.

[0057] Specifically, referring to Figure 3, the connection between the basal ganglia network and the cerebellar network is constructed by mimicking the neural mechanism of communication between the two subcortical brain regions. The feature extraction results of the basal ganglia action network $\pi_\phi(a|s)$ can be passed from the STN network layer to the GC network layer of the cerebellar network $\mu_\omega(x)$ and then evaluated in the cerebellar network $\mu_\omega(x)$. The output of the DCN network layer of $\mu_\omega(x)$ can be transmitted to the DS network layer of the basal ganglia action network $\pi_\phi(a|s)$, where it is used to correct the network parameter ψ of the feature extraction stage $\nu_\psi(s)$.

[0058] S103. A musculoskeletal robot control device based on multi-brain region fusion drives the musculoskeletal robot to perform actions based on the muscle control signals.

[0059] In this embodiment of the invention, muscle control signals for musculoskeletal robots can be generated through a network fusion model. Since the network fusion model can achieve communication by simulating the subcortical interconnection between the basal ganglia and the cerebellum, it can improve the learning efficiency and motor adaptability of musculoskeletal robots, thereby enabling them to complete tasks with high precision and robustness.

[0060] Optionally, as shown in Figure 2, before S102 is performed, the musculoskeletal robot control method based on multi-brain region fusion provided in this embodiment of the invention may further include S104-S105:

[0061] S104. The musculoskeletal robot control device based on multi-brain region fusion obtains training samples based on the dopamine-based experience replay method.

[0062] Optionally, the observation state information $s_t$ at time step t, the muscle control signal $a_t$, the reward signal $r(s_t, a_t)$, and the observation state information $s_{t+1}$ at the next time step (t+1) can together form one sample $(s_t, a_t, r(s_t, a_t), s_{t+1})$ in the experience replay buffer.
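
A fixed-capacity experience replay buffer that stores such samples can be sketched with the standard library. The class name `ReplayBuffer` and the choice to discard the oldest sample when full are assumptions; the source does not specify the buffer's eviction behaviour.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[List[float], List[float], float, List[float]]


class ReplayBuffer:
    """Stores (s_t, a_t, r_t, s_{t+1}) samples in arrival order."""

    def __init__(self, capacity: int) -> None:
        # Assumption: when full, the oldest sample is dropped first.
        self.data: Deque[Sample] = deque(maxlen=capacity)

    def add(self, s: List[float], a: List[float], r: float, s_next: List[float]) -> None:
        self.data.append((s, a, r, s_next))

    def __len__(self) -> int:
        return len(self.data)


buffer = ReplayBuffer(capacity=2)
buffer.add([0.0], [0.1], -1.0, [0.1])
buffer.add([0.1], [0.2], -0.5, [0.3])
buffer.add([0.3], [0.0], 0.0, [0.3])  # evicts the first sample
```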

[0063] Optionally, the musculoskeletal robot control device based on multi-brain region fusion can determine the most recent sample set and the optimal sample set from the experience replay buffer, and obtain the training samples by uniformly sampling from the most recent sample set and the optimal sample set based on the dopamine proportion coefficient.

[0064] Specifically, the musculoskeletal robot control device based on multi-brain region fusion can first construct the most recent sample set and the optimal sample set from the experience replay buffer. Samples are then drawn from these two sets according to the dopamine proportion coefficient ρ for gradient updates. Here, a sampling function is defined that denotes extracting $n_S$ samples from a given sample set by a specified method.

[0065] The specific process for obtaining training samples is as follows:

[0066] As the learning process progresses, the robot's motion performance gradually improves, and newly collected samples have high learning value. Because performance early in learning is poor, the use of samples generated early on should be reduced. Therefore, the musculoskeletal robot control device based on multi-brain region fusion can extract the latest data from the experience replay buffer to construct the most recent sample set:

[0067]

[0068] Here, the formula uses the capacity of the experience replay buffer, the capacity limit $n_r$, and the floor function $\lfloor\cdot\rfloor$. As the learning process progresses, the dopamine proportion coefficient ρ gradually approaches 1, so the most recent sample set gradually extends toward the full set of samples. However, because of the limit imposed by $n_r$, the oldest samples are always excluded.

[0069] To further utilize high-performing samples from recent moments, a musculoskeletal robot control device based on multi-brain region fusion can construct an optimal sample set:

[0070]

[0071] Here, m is the mini-batch size used for network parameter updates. The optimal sample set is constructed in two steps.

[0072] The musculoskeletal robot control device based on multi-brain region fusion can first extract the 2m most recent samples. Then, the best m samples are selected from these 2m samples by sorting. The sorting criterion can be the absolute value |δ| of the temporal-difference (TD) error of each sample.

[0073] Optionally, since two basal ganglia evaluation networks are used, |δ| can be obtained by calculating the mean absolute error of the two networks:

[0074]

[0075] Here, the quantities in the formula are the state-action target value, the state-action target value network, whose parameters are an exponentially weighted moving average, and the smoothing coefficient τ. Similar to the calculation of $Q_\theta(s,a)$, the minimum of two different value target networks can be used as the final state-action value target estimate:

[0076]
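The TD-error computation described in these paragraphs can be sketched as follows. This is an assumption-laden illustration: the discount factor `gamma`, the function names, and the moving-average convention (weight τ on the online parameters) follow common Soft Actor-Critic practice and are not taken from the original formulas, and any entropy term in the target is omitted.

```python
from typing import List


def td_target(reward: float, next_value_1: float, next_value_2: float,
              gamma: float) -> float:
    """Target value using the minimum of two target-network estimates."""
    return reward + gamma * min(next_value_1, next_value_2)


def mean_abs_td_error(q1: float, q2: float, target: float) -> float:
    """|delta| as the mean absolute error of the two evaluation networks."""
    return 0.5 * (abs(q1 - target) + abs(q2 - target))


def ema_update(target_params: List[float], online_params: List[float],
               tau: float) -> List[float]:
    """Exponentially weighted moving average with smoothing coefficient tau."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]
```

Taking the minimum of the two target estimates is the usual guard against value overestimation.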

[0077] Finally, m samples are uniformly sampled from the most recent sample set and the optimal sample set according to a ratio related to the dopamine proportion coefficient ρ:

[0078]

[0079] Here, Γ > 1 is a constant for adjusting the ratio, and $\lceil\cdot\rceil$ is the ceiling function.
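The overall sampling procedure can be sketched as follows. The original formulas are not reproduced in this text, so several details are assumptions: the recent-set size is taken as $\lfloor\rho\cdot|\text{buffer}|\rfloor$ capped by $n_r$, "best" is taken to mean largest |δ|, and the share drawn from the optimal set is passed in as the hypothetical argument `frac_best` instead of being derived from ρ and Γ.

```python
import math
import random
from typing import List, Sequence


def recent_set(buffer: Sequence, rho: float, n_r: int) -> List:
    """Newest samples; size grows with rho and is capped by n_r (assumed form)."""
    size = min(n_r, max(1, math.floor(rho * len(buffer))))
    return list(buffer[-size:])


def optimal_set(buffer: Sequence, abs_td: Sequence[float], m: int) -> List:
    """Among the 2m newest samples, keep the m with the largest |delta| (assumed order)."""
    idx = list(range(len(buffer)))[-2 * m:]
    idx.sort(key=lambda i: abs_td[i], reverse=True)
    return [buffer[i] for i in idx[:m]]


def mixed_batch(recent: List, best: List, m: int, frac_best: float,
                rng: random.Random) -> List:
    """Draw m samples uniformly from the two sets; frac_best is a placeholder ratio."""
    n_best = min(len(best), math.ceil(frac_best * m))
    n_recent = m - n_best
    return rng.sample(best, n_best) + rng.choices(recent, k=n_recent)
```

With `buffer` ordered oldest to newest, a caller would build both sets once per update step and pass them to `mixed_batch`.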

[0080] S105. The musculoskeletal robot control device based on multi-brain region fusion implements the initial update and secondary update of the network fusion model according to the training samples.

[0081] Here, the initial update updates the basal ganglia action network and the basal ganglia evaluation network separately, and the secondary update is realized through the interaction between the basal ganglia network and the cerebellar network.

[0082] First, the musculoskeletal robot control device based on multi-brain region fusion can update the basal ganglia evaluation network $Q_\theta(s,a)$ using the above training samples. The training objective of $Q_\theta(s,a)$ is to minimize the Bellman residual:

[0083]

[0084] Here, the sampling notation indicates that the sample comes from the experience replay buffer, and the remaining term is expressed as follows:

[0085]

[0086] Therefore, the unbiased estimate of the gradient of $J_Q(\theta)$ is as follows:

[0087]

[0088] Therefore, the parameters of $Q_\theta(s,a)$ are updated as follows:

[0089]

[0090] Here, $\lambda_Q$ and the subsequent $\lambda_\pi$, $\lambda_\nu$, $\lambda_\mu$, $\lambda_\alpha$ denote the respective learning rates.
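For orientation, the evaluation-network update described above has the following shape in the standard Soft Actor-Critic formulation. This is a hedged reconstruction and not the patent's own formulas: $\hat{Q}$ (target value) and $\mathcal{D}$ (experience replay buffer) are assumed symbol names, and the original equations may differ in detail.

$$J_Q(\theta) = \mathbb{E}_{(s_t,a_t)\sim\mathcal{D}}\left[\tfrac{1}{2}\big(Q_\theta(s_t,a_t) - \hat{Q}(s_t,a_t)\big)^2\right]$$

$$\hat{\nabla}_\theta J_Q(\theta) = \nabla_\theta Q_\theta(s_t,a_t)\,\big(Q_\theta(s_t,a_t) - \hat{Q}(s_t,a_t)\big)$$

$$\theta \leftarrow \theta - \lambda_Q\,\hat{\nabla}_\theta J_Q(\theta)$$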

[0091] Then, the musculoskeletal robot control device based on multi-brain region fusion can update the basal ganglia action network $\pi_\phi(a|s)$ using the above training samples. The training objective of $\pi_\phi(a|s)$ is to minimize the Kullback-Leibler divergence:

[0092]

[0093] Here, $Z(s_t)$ is the partition function used to normalize the distribution. To simplify the calculation, the objective can be multiplied by α and the constant $\ln Z(s_t)$, which does not affect the gradient, can be removed, giving:

[0094]

[0095] $J_\pi(\phi)$ is minimized using the reparameterization technique, that is, a neural network $f_\phi$ is used to represent $\pi_\phi(a|s)$:

[0096] $a = f_\phi(s, \epsilon)$

[0097] Here, $\epsilon$ is independent noise sampled from a fixed distribution. In practice, actions can be obtained through a squashed Gaussian policy:

[0098] $a = \tanh(\chi_\phi(s) + \sigma_\phi(s) \odot \epsilon)$

[0099] Here, $\chi_\phi(s)$ is the mean output by the Gaussian network, and $\sigma_\phi(s)$ is the standard deviation obtained from the $\ln\sigma_\phi$ output by the Gaussian network. The reparameterization technique rewrites the expectation over actions as an expectation over the noise, so that the expectation in $J_\pi(\phi)$ no longer depends on the policy parameters, which effectively reduces the variance of the unbiased gradient estimate. Therefore, $J_\pi(\phi)$ is rewritten as follows:

[0100]
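The squashed Gaussian sampling step $a = \tanh(\chi_\phi(s) + \sigma_\phi(s) \odot \epsilon)$ can be sketched as follows. This is a minimal illustration that assumes standard normal noise and takes the network outputs (mean and log standard deviation) as plain lists; the linear map to [0, 1] in `to_muscle_range` is an assumed form of the rescaling applied before the muscle model is driven.

```python
import math
import random
from typing import List, Sequence


def squashed_gaussian_action(mean: Sequence[float], log_std: Sequence[float],
                             rng: random.Random) -> List[float]:
    """Sample a = tanh(mean + std * eps) per dimension, with eps ~ N(0, 1) (assumed)."""
    action = []
    for mu, ls in zip(mean, log_std):
        eps = rng.gauss(0.0, 1.0)          # noise independent of the policy parameters
        action.append(math.tanh(mu + math.exp(ls) * eps))
    return action


def to_muscle_range(action: Sequence[float]) -> List[float]:
    """Rescale actions from [-1, 1] to [0, 1] (assumed linear map)."""
    return [0.5 * (a + 1.0) for a in action]
```

Because the randomness enters only through `eps`, the sampled action is a deterministic function of the network outputs, which is what allows gradients to flow through the sample.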

[0101] The unbiased estimate of the gradient of $J_\pi(\phi)$ is as follows:

[0102]

[0103] Therefore, the parameters of $\pi_\phi(a|s)$ are updated for the first time to $\phi_\pi$:

[0104]

[0105] Finally, the musculoskeletal robot control device based on multi-brain region fusion can let the basal ganglia network interact with the cerebellar network to further update the network parameters. In the STN→GC pathway, the state features $\nu_\psi(s_t)$ encoded in the STN network layer are projected to the cerebellar network $\mu_\omega(x)$. After evaluating the quality of the feature extraction, the cerebellar network outputs $\mu_\omega(\nu_\psi(s_t))$. In the DCN→DS pathway, $\mu_\omega(\nu_\psi(s_t))$ is projected to the DS as a loss to modify the network weights ψ of the feature extraction stage. Before ψ is updated, the relative complement $\phi_\pi'$ of ψ in $\phi_\pi$ is saved:

[0106] $\phi_\pi' \leftarrow \phi_\pi \setminus \psi$

[0107] Then, ψ is updated as follows:

[0108]

[0109] Therefore, the parameters of $\pi_\phi(a|s)$ are updated for the second time to φ:

[0110] $\phi \leftarrow \psi \cup \phi_\pi'$
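The set operations $\phi_\pi' \leftarrow \phi_\pi \setminus \psi$ and $\phi \leftarrow \psi \cup \phi_\pi'$ can be sketched with named parameters as follows. This is an illustration only: parameters are stored in a dictionary, the keys belonging to ψ are supplied by the caller, and the ψ update is assumed to be a plain gradient step whose gradient (`psi_grad`) comes from the cerebellar output, since the original update formula is not reproduced here.

```python
from typing import Dict, Set, Tuple


def split_policy_params(phi_pi: Dict[str, float],
                        psi_keys: Set[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (psi, phi_pi_prime), where phi_pi_prime = phi_pi minus psi."""
    psi = {k: v for k, v in phi_pi.items() if k in psi_keys}
    rest = {k: v for k, v in phi_pi.items() if k not in psi_keys}
    return psi, rest


def second_update(phi_pi: Dict[str, float], psi_keys: Set[str],
                  psi_grad: Dict[str, float], lr: float) -> Dict[str, float]:
    """Update only psi (assumed gradient step), then merge: phi = psi U phi_pi_prime."""
    psi, phi_pi_prime = split_policy_params(phi_pi, psi_keys)
    psi_new = {k: v - lr * psi_grad[k] for k, v in psi.items()}
    return {**psi_new, **phi_pi_prime}
```

Saving the complement first guarantees that the secondary update touches only the feature extraction weights and leaves the rest of the first update intact.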

[0111] The $J_\pi(\phi)$ index is used to measure the effect of the two updates of $\pi_\phi(a|s)$:

[0112]

[0113] Here, h is an arbitrary squashing function that limits $\Delta J_\pi$ to a reasonable range to prevent the gradient from becoming too large; here, $h = \tanh(\cdot)$ is used. $J_\pi(\phi)$ is helpful for optimizing ω because the update of φ makes use of the cerebellar network parameters ω. The parameters $\phi_\pi$ in $J_\pi(\phi_\pi)$ are independent of ω, but $J_\pi(\phi_\pi)$ can serve as a baseline, which benefits the stability of the learning process. The IO network layer of the cerebellar network encodes the evaluation error $\Delta J_\pi$, which is used to update the parameters of $\mu_\omega(x)$:

[0114]

[0115] Optionally, if the policy entropy of the basal ganglia network output signal is within a preset interval at a time step, the dopamine proportion coefficient is updated; the preset interval is determined based on the target entropy of the basal ganglia network output signal, and the target entropy is functionally related to the dopamine proportion coefficient ρ.

[0116] Specifically, since the activity level of dopamine in the basal ganglia affects the probability distribution of the output signal, and the relationship between the entropy of the basal ganglia output signal and the dopamine proportion coefficient is approximately an affine function, the dopamine proportion coefficient can be dynamically adjusted through the target entropy.

[0117] The specific dynamic adjustment process is as follows:

[0118] (1) Determine the functional relationship between the target entropy and the dopamine proportion coefficient ρ

[0119] To facilitate network optimization, each control signal a in the action space is normalized to the interval [-1, 1]. When the robot interacts with the environment, this normalized action space is rescaled to the interval [0, 1] so that the muscle model can be controlled normally. The normalized action space is expressed in terms of the dimension of the action signal. Following the Soft Actor-Critic literature, a lower bound is set for the target entropy. The upper bound of the target entropy occurs when all actions are sampled uniformly within the action space, which yields the corresponding stochastic policy function $\pi_{\max}(a|s)$ as follows:

[0120]

[0121] The maximum value of the target entropy is calculated as follows:

[0122]
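As a worked check of this upper bound, assume the normalized action space is $[-1,1]^{n}$, where $n$ is an assumed symbol for the dimension of the action signal. A uniform policy then has constant density, and its entropy follows directly:

$$\pi_{\max}(a|s) = \frac{1}{2^{n}}, \qquad a \in [-1,1]^{n}$$

$$\mathcal{H}(\pi_{\max}) = -\int_{[-1,1]^{n}} \frac{1}{2^{n}} \ln\frac{1}{2^{n}}\, da = n \ln 2$$

For example, with $n = 2$ this gives $2\ln 2 \approx 1.386$. The original formulas are not reproduced in this text, so this derivation should be read as an illustration of the stated reasoning and not as the patent's exact expression.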

[0123] Therefore, the target entropy as a function of the dopamine proportion coefficient ρ is obtained as follows:

[0124]

[0125] (2) Determine the target entropy adaptive update algorithm

[0126] At each update time step, it is determined whether the current policy entropy lies within a neighborhood of the target entropy, where ξ > 0 is the threshold. If this condition is met, the dopamine proportion coefficient is updated while ensuring that it does not exceed the maximum proportion value:

[0127] $\rho \leftarrow \min\{\rho \cdot k,\, 1\}$

[0128] Here, k is the numerical growth factor. To avoid a singularity in the update, the initial value of the dopamine proportion coefficient is set to $\rho_{init} = 0.01$. Furthermore, so that the target entropy does not stagnate at a particular value and the update proceeds smoothly, the above update is forced to run once j time steps have passed since the last update.
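The adaptive update rule can be sketched as follows. This is a minimal illustration: the neighborhood is assumed to be the symmetric interval of half-width ξ around the target entropy, and the function name and the step counter are hypothetical.

```python
from typing import Tuple


def update_dopamine(rho: float, entropy: float, target_entropy: float,
                    xi: float, k: float, steps_since_update: int,
                    j: int) -> Tuple[float, int]:
    """Apply rho <- min(rho * k, 1) when the policy entropy is near the target
    entropy (assumed symmetric interval) or when j steps have passed since the
    last update. Returns the new rho and the new step counter."""
    near_target = abs(entropy - target_entropy) <= xi
    forced = steps_since_update >= j
    if near_target or forced:
        return min(rho * k, 1.0), 0
    return rho, steps_since_update + 1
```

Starting from $\rho_{init} = 0.01$ and not 0 matters here: a multiplicative update can never move ρ away from zero.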

[0129] Optionally, the regularization parameter α can be adaptively adjusted using the following optimization function:

[0130]

[0131] Therefore, α is updated as follows:
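A hedged sketch of such an update is given below. It assumes the standard Soft Actor-Critic temperature objective $J(\alpha) = \mathbb{E}[-\alpha \ln \pi_\phi(a|s) - \alpha \bar{\mathcal{H}}]$, where $\bar{\mathcal{H}}$ is an assumed symbol for the target entropy, and a plain gradient step with learning rate $\lambda_\alpha$; the patent's own formula is not reproduced in this text and may differ. The positive floor on α is an added safeguard of this sketch.

```python
from typing import Sequence


def temperature_gradient(log_probs: Sequence[float], target_entropy: float) -> float:
    """Gradient of the assumed objective with respect to alpha,
    estimated as the batch mean of (-log pi - target_entropy)."""
    return sum(-lp - target_entropy for lp in log_probs) / len(log_probs)


def update_alpha(alpha: float, log_probs: Sequence[float], target_entropy: float,
                 lr: float, floor: float = 1e-8) -> float:
    """One gradient step on alpha, kept positive by a small floor (assumption)."""
    grad = temperature_gradient(log_probs, target_entropy)
    return max(alpha - lr * grad, floor)
```

Under this assumed objective, α decreases when the estimated policy entropy exceeds the target entropy and increases when it falls below it.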

[0132] The foregoing mainly describes the solutions provided by the embodiments of the present invention from the perspective of the method. To achieve the above functions, the device includes corresponding hardware structures and/or software modules for executing each function. Those skilled in the art should readily recognize that, in conjunction with the units and algorithm steps of the various examples described in the embodiments disclosed herein, the embodiments of the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.

[0133] The musculoskeletal robot control method based on multi-brain region fusion provided in this embodiment of the invention can be executed by a musculoskeletal robot control device based on multi-brain region fusion, or a control module for controlling the musculoskeletal robot based on multi-brain region fusion within that control device. This embodiment of the invention uses the execution of the musculoskeletal robot control method based on multi-brain region fusion by the control device as an example to illustrate the musculoskeletal robot control device based on multi-brain region fusion provided in this embodiment of the invention.

[0134] It should be noted that, according to the above method examples, the functional modules of the musculoskeletal robot control device based on multi-brain region fusion can be divided. For example, each function can be divided into its own functional modules, or two or more functions can be integrated into one processing module. The integrated modules can be implemented in hardware or as software functional modules. Optionally, the module division in the embodiments of the present invention is illustrative and only represents one logical functional division; other division methods may be used in actual implementation.

[0135] As shown in Figure 5, this embodiment of the invention provides a musculoskeletal robot control device 500 based on multi-brain region fusion. The musculoskeletal robot control device 500 includes an acquisition module 501 and a processing module 502. The acquisition module 501 is used to acquire observation state information of the musculoskeletal robot, including joint angles, joint angular velocities, muscle signal energy, and the current and target positions of the end effector. The processing module 502 is used to generate muscle control signals from the observation state information using a network fusion model, and to drive the musculoskeletal robot to perform actions based on the muscle control signals. The network fusion model includes a basal ganglia network and a cerebellar network, which are communicatively connected via a brain region communication neural mechanism generated by simulating communication between two subcortical brain regions.

[0136] Optionally, the basal ganglia network includes a basal ganglia action network and a basal ganglia evaluation network. The basal ganglia action network includes a dorsal striatum (DS) network layer, an external globus pallidus network layer, a subthalamic nucleus (STN) network layer, a substantia nigra pars reticulata network layer, and an internal globus pallidus network layer. The transmission pathways of the basal ganglia action network include a direct transmission pathway and an indirect transmission pathway. The direct transmission pathway takes input through the dorsal striatum network layer and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The indirect transmission pathway takes input through the dorsal striatum network layer, passes sequentially through the external globus pallidus network layer and the subthalamic nucleus network layer, and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer. The basal ganglia evaluation network includes a substantia nigra pars compacta network layer, a ventral tegmental area network layer, and a ventral striatum network layer; its transmission pathway takes input through the substantia nigra pars compacta network layer and the ventral tegmental area network layer and outputs through the ventral striatum network layer.

[0137] Optionally, the cerebellar network includes a granule cell (GC) network layer, a deep cerebellar nucleus (DCN) network layer, a Purkinje cell network layer, and an inferior olivary nucleus (IO) network layer. The transmission pathways of the cerebellar network include a first transmission pathway and a second transmission pathway. The first transmission pathway outputs directly through the deep cerebellar nucleus network layer. The second transmission pathway takes input through the granule cell network layer, passes through the Purkinje cell network layer, and outputs through the deep cerebellar nucleus network layer. The inferior olivary nucleus network layer is used to encode error signals and transmit them to the Purkinje cell network layer and the deep cerebellar nucleus network layer to correct their network weights.

[0138] Optionally, the brain region communication neural mechanism includes: transmitting the feature extraction results of the basal ganglia action network to the granule cell network layer through the subthalamic nucleus network layer; and transmitting the output of the deep cerebellar nucleus network layer to the dorsal striatum network layer.
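The layer connectivity described in the preceding paragraphs can be summarized as a small directed graph. This is an illustrative data structure only and says nothing about layer sizes or weights. The abbreviations DS, STN, GC, DCN, and IO appear in the description; GPe, GPi, SNr, SNc, VTA, VS, and PC are assumed shorthand introduced for this sketch.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Basal ganglia action network: direct and indirect transmission pathways.
ACTION_PATHWAYS: Dict[str, List[str]] = {
    "direct": ["DS", "SNr+GPi"],
    "indirect": ["DS", "GPe", "STN", "SNr+GPi"],
}
# Basal ganglia evaluation network: input layers, then output layer.
EVALUATION_PATHWAY: List[str] = ["SNc+VTA", "VS"]
# Cerebellar network: first (direct DCN output) and second pathways.
CEREBELLAR_PATHWAYS: Dict[str, List[str]] = {
    "first": ["DCN"],
    "second": ["GC", "PC", "DCN"],
}
# The IO layer sends error signals that correct PC and DCN weights.
ERROR_PROJECTIONS: List[Edge] = [("IO", "PC"), ("IO", "DCN")]
# Brain region communication mechanism between the two networks.
INTER_REGION: List[Edge] = [("STN", "GC"), ("DCN", "DS")]


def edges(path: List[str]) -> List[Edge]:
    """Consecutive layer pairs along one pathway."""
    return list(zip(path, path[1:]))


def all_edges() -> List[Edge]:
    """Every connection named in the description, in listing order."""
    paths = (list(ACTION_PATHWAYS.values()) + [EVALUATION_PATHWAY]
             + list(CEREBELLAR_PATHWAYS.values()))
    result: List[Edge] = []
    for path in paths:
        result.extend(edges(path))
    return result + ERROR_PROJECTIONS + INTER_REGION
```

Listing the two inter-region edges separately makes the loop explicit: features leave the basal ganglia at the STN, and the cerebellar evaluation returns to the DS.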

[0139] Optionally, the acquisition module 501 can be used to acquire training samples based on the dopaminergic experience replay method; the processing module 502 can be used to implement the initial update and the secondary update of the network fusion model according to the training samples; wherein the initial update updates the basal ganglia action network and the basal ganglia evaluation network separately, and the secondary update is realized through the interaction between the basal ganglia network and the cerebellar network.

[0140] Optionally, the acquisition module 501 can be used to determine the most recent sample set and the optimal sample set from the experience replay buffer, and to obtain the training samples by uniformly sampling from the most recent sample set and the optimal sample set based on the dopamine proportion coefficient.

[0141] Optionally, the processing module 502 can be used to update the dopamine proportion coefficient if the policy entropy of the basal ganglia network output signal is within a preset interval at a time step; wherein the preset interval is determined based on the target entropy of the basal ganglia network output signal, and the functional relationship between the target entropy and the dopamine proportion coefficient ρ is as follows:

[0142]

[0143] In this embodiment of the invention, muscle control signals for musculoskeletal robots can be generated through a network fusion model. Since the network fusion model can achieve communication by simulating the subcortical interconnection between the basal ganglia and the cerebellum, it can improve the learning efficiency and motor adaptability of musculoskeletal robots, thereby enabling them to complete tasks with high precision and robustness.

[0144] Figure 6 is a schematic diagram of the physical structure of an exemplary electronic device. As shown in Figure 6, the electronic device may include a processor 610, a communication interface 620, a memory 630, and a communication bus 640. The processor 610, communication interface 620, and memory 630 communicate with each other via the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute a musculoskeletal robot control method based on multi-brain region fusion. This method includes: acquiring observation state information of the musculoskeletal robot, including joint angles, joint angular velocities, muscle signal energy, and the current and target positions of the end effector; generating muscle control signals from the observation state information using a network fusion model; and driving the musculoskeletal robot to perform actions based on the muscle control signals. The network fusion model includes a basal ganglia network and a cerebellar network, which are communicatively connected based on a brain region communication neural mechanism generated by simulating communication between two subcortical brain regions.

[0145] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0146] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, and when the program instructions are executed by the computer, the computer is able to execute the musculoskeletal robot control method based on multi-brain region fusion provided by the above methods, the method comprising: acquiring observation state information of the musculoskeletal robot, the observation state information including: joint angles, joint angular velocities, muscle signal energy, current position and target position of the end effector; generating muscle control signals from the observation state information through a network fusion model; and driving the musculoskeletal robot to perform actions based on the muscle control signals; wherein, the network fusion model includes a basal ganglia network and a cerebellar network, the basal ganglia network and the cerebellar network achieving communication connection based on a brain region communication neural mechanism, the brain region communication neural mechanism being generated by simulating communication between two brain regions under the cerebral cortex.

[0147] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program is implemented to perform the aforementioned musculoskeletal robot control methods based on multi-brain region fusion. The method includes: acquiring observational state information of the musculoskeletal robot, the observational state information including: joint angles, joint angular velocities, muscle signal energy, and the current and target positions of the end effector; generating muscle control signals from the observational state information through a network fusion model; and driving the musculoskeletal robot to perform actions based on the muscle control signals. The network fusion model includes a basal ganglia network and a cerebellar network, the basal ganglia network and the cerebellar network being connected through a brain region communication neural mechanism, which is generated by simulating communication between two brain regions under the cerebral cortex.

[0148] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0149] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0150] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A musculoskeletal robot control method based on multi-brain region fusion, characterized in that it comprises: acquiring observation state information of a musculoskeletal robot, the observation state information comprising joint angles, joint angular velocities, muscle signal energy, and the current position and target position of the end effector; generating muscle control signals from the observation state information through a network fusion model; and driving the musculoskeletal robot to perform actions based on the muscle control signals; wherein the network fusion model comprises a basal ganglia network and a cerebellar network, the basal ganglia network and the cerebellar network are communicatively connected based on a brain region communication neural mechanism, and the brain region communication neural mechanism is generated by simulating communication between two brain regions beneath the cerebral cortex; the basal ganglia network comprises a basal ganglia action network and a basal ganglia evaluation network; the basal ganglia action network comprises a dorsal striatum network layer, an external globus pallidus network layer, a subthalamic nucleus network layer, a substantia nigra pars reticulata network layer, and an internal globus pallidus network layer; the transmission pathways of the basal ganglia action network comprise a direct transmission pathway and an indirect transmission pathway; the direct transmission pathway takes input through the dorsal striatum network layer and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer; the indirect transmission pathway takes input through the dorsal striatum network layer, passes sequentially through the external globus pallidus network layer and the subthalamic nucleus network layer, and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer; the basal ganglia evaluation network comprises a substantia nigra pars compacta network layer, a ventral tegmental area network layer, and a ventral striatum network layer; the transmission pathway of the basal ganglia evaluation network takes input through the substantia nigra pars compacta network layer and the ventral tegmental area network layer and outputs through the ventral striatum network layer; the cerebellar network comprises a granule cell network layer, a deep cerebellar nucleus network layer, a Purkinje cell network layer, and an inferior olivary nucleus network layer; the transmission pathways of the cerebellar network comprise a first transmission pathway and a second transmission pathway; the first transmission pathway outputs directly through the deep cerebellar nucleus network layer; the second transmission pathway takes input through the granule cell network layer, passes through the Purkinje cell network layer, and outputs through the deep cerebellar nucleus network layer; and the inferior olivary nucleus network layer is used to encode error signals and transmit the error signals to the Purkinje cell network layer and the deep cerebellar nucleus network layer to correct their network weights.

2. The musculoskeletal robot control method based on multi-brain region fusion according to claim 1, characterized in that the brain region communication neural mechanism comprises: transmitting the feature extraction results of the basal ganglia action network to the granule cell network layer through the subthalamic nucleus network layer; and transmitting the output of the deep cerebellar nucleus network layer to the dorsal striatum network layer.

3. The musculoskeletal robot control method based on multi-brain region fusion according to claim 1, characterized in that, before generating muscle control signals from the observation state information through the network fusion model, the method further comprises: obtaining training samples based on a dopaminergic experience replay method; and performing an initial update and a secondary update of the network fusion model based on the training samples; wherein the initial update updates the basal ganglia action network and the basal ganglia evaluation network separately, and the secondary update is realized through the interaction between the basal ganglia network and the cerebellar network.

4. The musculoskeletal robot control method based on multi-brain region fusion according to claim 3, characterized in that obtaining training samples based on the dopaminergic experience replay method comprises: determining a most recent sample set and an optimal sample set from an experience replay buffer; and obtaining the training samples by uniformly sampling from the most recent sample set and the optimal sample set based on a dopamine proportion coefficient.

5. The musculoskeletal robot control method based on multi-brain region fusion according to claim 4, characterized in that the method further comprises: if the policy entropy of the basal ganglia network output signal is within a preset interval at a time step, updating the dopamine proportion coefficient; wherein the preset interval is determined based on the target entropy of the basal ganglia network output signal, and the functional relationship between the target entropy and the dopamine proportion coefficient is as follows:

6. A musculoskeletal robot control device based on multi-brain region fusion, characterized in that it comprises: an acquisition module and a processing module; the acquisition module is used to acquire observation state information of a musculoskeletal robot, the observation state information comprising joint angles, joint angular velocities, muscle signal energy, and the current position and target position of the end effector; the processing module is used to generate muscle control signals from the observation state information through a network fusion model, and to drive the musculoskeletal robot to perform actions based on the muscle control signals; wherein the network fusion model comprises a basal ganglia network and a cerebellar network, the basal ganglia network and the cerebellar network are communicatively connected based on a brain region communication neural mechanism, and the brain region communication neural mechanism is generated by simulating communication between two brain regions beneath the cerebral cortex; the basal ganglia network comprises a basal ganglia action network and a basal ganglia evaluation network; the basal ganglia action network comprises a dorsal striatum network layer, an external globus pallidus network layer, a subthalamic nucleus network layer, a substantia nigra pars reticulata network layer, and an internal globus pallidus network layer; the transmission pathways of the basal ganglia action network comprise a direct transmission pathway and an indirect transmission pathway; the direct transmission pathway takes input through the dorsal striatum network layer and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer; the indirect transmission pathway takes input through the dorsal striatum network layer, passes sequentially through the external globus pallidus network layer and the subthalamic nucleus network layer, and outputs through the substantia nigra pars reticulata network layer and the internal globus pallidus network layer; the basal ganglia evaluation network comprises a substantia nigra pars compacta network layer, a ventral tegmental area network layer, and a ventral striatum network layer; the transmission pathway of the basal ganglia evaluation network takes input through the substantia nigra pars compacta network layer and the ventral tegmental area network layer and outputs through the ventral striatum network layer; the cerebellar network comprises a granule cell network layer, a deep cerebellar nucleus network layer, a Purkinje cell network layer, and an inferior olivary nucleus network layer; the transmission pathways of the cerebellar network comprise a first transmission pathway and a second transmission pathway; the first transmission pathway outputs directly through the deep cerebellar nucleus network layer; the second transmission pathway takes input through the granule cell network layer, passes through the Purkinje cell network layer, and outputs through the deep cerebellar nucleus network layer; and the inferior olivary nucleus network layer is used to encode error signals and transmit the error signals to the Purkinje cell network layer and the deep cerebellar nucleus network layer to correct their network weights.