Humanoid robot multi-modal motion command distribution analysis method based on master-slave architecture
By adopting a multimodal motion command distribution and parsing method under a master-slave architecture, the problem of intention understanding deviation caused by multimodal data misalignment in humanoid robots is solved, realizing accurate distribution and consistency judgment of motion commands, and improving the stability and security of the system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 上海友帙信息科技有限公司
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-26
AI Technical Summary
In the intelligent application of humanoid robots, due to factors such as mismatched sampling rates of multimodal sensors, inconsistent time delays, and noise interference, motion commands cannot accurately reflect user intentions, and the master end lacks verifiable expectations of the slave end's execution behavior, making it difficult to achieve consistency verification.
A multimodal motion command distribution and parsing method based on master-slave architecture is adopted. By constructing a time window index to align multimodal input data, candidate motion commands are generated and adapted and evaluated. Packet identification information and sequence identification are introduced to ensure command integrity. Combined with the verification and parsing of the slave end, expected observable results are generated and sent back to the master end for consistency judgment, thereby realizing closed-loop control.
It improves the stability and reliability of task intent determination, avoids problems such as instruction misordering, repeated execution or expired execution, ensures the accuracy of safety constraints and environmental interaction during the execution phase, and improves the success rate and system fault tolerance in complex interaction scenarios.
Figure CN122284520A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of robot control and intelligent interaction technology, and in particular to a method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture.
Background Technology
[0002] In intelligent applications of humanoid robots, the generation of motion commands often relies on the deep fusion of multimodal information, including voice interaction, visual perception, gesture recognition, and proprioception. However, due to the differences in physical characteristics of different modal sensors, the input data naturally exhibits heterogeneous features such as sampling rate mismatch, inconsistent latency, and frequent noise interference. In the absence of a unified spatiotemporal benchmark, this "misalignment" of cross-modal data can easily lead to fluctuations in semantic association, resulting in biased understanding of intent and causing the generated motion commands to fail to accurately reflect the user's true intentions.
[0003] On the other hand, existing technologies typically decouple "multimodal semantic understanding" from "low-level motion execution": the master end focuses on intent recognition and directly issues open-loop commands, while the slave end only focuses on local planning and control. This separation model results in the master end lacking "verifiable expectations" of the slave end's execution behavior. That is, the master end has difficulty predicting the state change trends of the slave end when executing specific commands, making it difficult to verify the consistency between real-time observation results and theoretical expectations.
[0004] Therefore, we propose a multimodal motion command distribution and parsing method for humanoid robots based on a master-slave architecture. The information disclosed above in the background section is only used to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0005] The purpose of this invention is to address the shortcomings of existing technologies by providing a method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture, thereby solving the technical problems mentioned in the background section.
[0006] To achieve the above objectives, the present invention provides the following technical solution: A multimodal motion command distribution and parsing method for humanoid robots based on a master-slave architecture includes the following steps:
S1. The master control node acquires multimodal input data and constructs a time window index, preprocesses and time-aligns the multimodal input data to form multimodal semantic features, and, based on these features, determines the task intent and generates a candidate motion command set.
S2. The master control node adapts and evaluates the candidate motion command set based on the robot's current running state and the current environment state to determine the target motion command, generates execution stage division information, forms the constraint set and binds it to the stages, and encapsulates the target motion command to generate a motion command distribution packet.
S3. The master control node sends the motion command distribution packet to the slave execution node. The slave execution node verifies and parses the motion command distribution packet to obtain the motion command to be executed, the execution stage division information, the constraint set and the prediction generation requirements, forms the stage constraints, and writes the motion command to be executed and the stage constraints into the execution queue.
S4. The slave execution node generates a local motion execution plan aligned with the time window index based on the motion command to be executed and the stage constraints, determines the observable set and establishes the observable mapping relationships, generates expected observable results, encapsulates them into an expected observable return packet, and sends it back to the master control node.
S5. The master control node receives the expected observable return packet and completes the association, extracts and aligns the real-time observation results, performs a consistency judgment between the real-time observation results and the expected observable results to output the consistency judgment result, and generates a summary of reasons for non-compliance.
S6. The master control node issues execution confirmation information, a clarifying motion command distribution packet, or a modal supplementation request based on the consistency judgment result, and updates the multimodal input data according to the acknowledgment and supplementation results to enter the closed-loop process.
[0007] S1 includes: acquiring multimodal input data and constructing a time window index; preprocessing the multimodal input data and aligning it according to the time window index; extracting and associating speech semantic features, visual semantic features, and state semantic features to form multimodal semantic features; calculating modality fusion weights and fusion consistency scores based on the multimodal semantic features to determine the task intent and generating a set of candidate motion instructions.
[0008] S2 includes: adapting and evaluating the candidate motion command set based on the robot's current operating state and the current environment state to determine the target motion command, and generating execution stage division information; extracting the execution constraints corresponding to the target motion command to form a constraint set and binding them to stages according to the execution stage division information; parameterizing and encapsulating the target motion command to generate a motion command distribution package. The header of the motion command distribution package includes at least package identification information, sequence identification information, time window index, validity period information, and integrity verification field, and the package body includes at least the target motion command, execution stage division information, constraint set, and prediction generation requirements.
[0009] S3 includes: the master control node sending motion command distribution packets and retransmitting them using an exponential backoff algorithm if no acknowledgment is received within a timeout period; the slave execution node performing integrity verification (based on a CRC or hash algorithm), order determination, and validity period determination on the motion command distribution packets and writing the results into the distribution parsing status record; parsing the motion command distribution packets after successful verification to extract the target motion command, execution stage division information, constraint set, and prediction generation requirements, and converting the target motion command into a motion command to be executed; binding the constraint set according to the execution stage division information to form stage constraints, and writing the motion command to be executed and the stage constraints into the execution queue while keeping the to-be-executed state separate from the executing state.
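The slave-side checks in S3 can be illustrated with a minimal Python sketch. The field names (`packet_id`, `seq`, `expires_at_ms`), the status strings, and the rule that only a strictly larger sequence value is accepted are assumptions made for illustration; the text only requires integrity, order, and validity-period determination.

```python
def check_packet(header, crc_ok, last_seq_by_packet, now_ms):
    """Classify an incoming distribution packet on the slave side.

    header: dict with hypothetical keys packet_id, seq, expires_at_ms.
    crc_ok: result of the integrity check, computed elsewhere.
    last_seq_by_packet: dict tracking the last accepted seq per packet_id.
    """
    if not crc_ok:
        return "corrupt"
    last = last_seq_by_packet.get(header["packet_id"], -1)
    if header["seq"] == last:
        return "duplicate"          # same iteration already accepted
    if header["seq"] < last:
        return "out_of_order"       # older iteration arrived late
    if now_ms > header["expires_at_ms"]:
        return "expired"            # validity period has elapsed
    last_seq_by_packet[header["packet_id"]] = header["seq"]
    return "accepted"


seen = {}
hdr = {"packet_id": "task-1", "seq": 2, "expires_at_ms": 1000}
first = check_packet(hdr, True, seen, now_ms=500)
repeat = check_packet(hdr, True, seen, now_ms=510)
```

Each returned status would be written into the distribution parsing status record; only an accepted packet proceeds to parsing.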
[0010] S4 includes: the slave execution node generating a local motion execution plan aligned with the time window index based on the motion instructions to be executed and stage constraints; determining the observable set and establishing an observable mapping relationship according to the prediction generation requirements; generating expected observable results containing expected values and expected ranges, and encapsulating them into expected observable return packets to be sent back to the master control node. The expected observable return packets contain at least packet identification information, sequence identification value, time window index, observable set, observable mapping relationship, prediction time range, and expected values and expected ranges.
[0011] S5 includes: the master control node receiving the expected observable return packet and verifying its association fields and that its observable set covers the prediction generation requirements; associating the expected observable return packet with the motion command distribution packet based on the packet identification information, sequence identification value and time window index; extracting and aligning real-time observation results according to the observable set and observable mapping relationship, and performing a consistency judgment against the expected observable results to output the consistency judgment result; and, when the consistency conditions are not met, generating a summary of reasons for non-compliance and writing it into the distribution parsing status record.
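A minimal Python sketch of the consistency judgment follows. The text does not define how the consistency score is computed, so the score used here (the fraction of observables whose real-time value falls inside the expected range) is an assumption, as are the observable names and the default threshold.

```python
def judge_consistency(expected, observed, threshold=0.8):
    """Compare real-time observations with expected ranges.

    expected: {observable_name: (low, high)} from the return packet.
    observed: {observable_name: value} extracted on the master side.
    Returns (score, is_consistent, reasons); the score formula is an
    illustrative assumption, not taken from the source.
    """
    reasons = []
    for name, (low, high) in expected.items():
        if name not in observed:
            reasons.append(f"{name}: no observation")
        elif not (low <= observed[name] <= high):
            reasons.append(f"{name}: {observed[name]} outside [{low}, {high}]")
    score = 1.0 - len(reasons) / len(expected) if expected else 0.0
    return score, score >= threshold, reasons


expected = {"wrist_force_N": (0.0, 5.0), "hand_height_m": (0.8, 1.0)}
observed = {"wrist_force_N": 3.2, "hand_height_m": 1.2}
score, ok, reasons = judge_consistency(expected, observed)
```

The `reasons` list plays the role of the summary of reasons for non-compliance that S6 later uses to build a clarifying packet or a modal supplementation request.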
[0012] S6 includes: the master control node selects an execution confirmation distribution strategy or a clarification distribution strategy based on the consistency score and consistency threshold; when the consistency condition is met, it issues an execution confirmation message containing packet identifier information and sequence identifier value to trigger the slave execution node to execute the motion command to be executed and performs idempotent processing on duplicate confirmations; when the consistency condition is not met, it generates a clarification motion command distribution packet or modal supplementation request based on the summary of reasons for non-compliance, and updates the multimodal input data after receiving the acknowledgment and supplementation results to re-enter the closed-loop process, and triggers a distribution degradation strategy when the number of closed-loop iterations reaches a preset upper limit.
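The idempotent handling of duplicate confirmations can be sketched as follows. The class name `SlaveQueue`, its methods, and the choice to enforce idempotency on the slave side are assumptions for illustration; the sketch also shows one way to keep the to-be-executed state separate from the executed state.

```python
class SlaveQueue:
    """Hypothetical slave-side queue keyed by (packet_id, seq)."""

    def __init__(self):
        self.pending = {}      # commands waiting for confirmation
        self.executed = set()  # keys that have already been executed
        self.log = []          # stand-in for actually running a command

    def enqueue(self, packet_id, seq, command):
        self.pending[(packet_id, seq)] = command

    def confirm(self, packet_id, seq):
        key = (packet_id, seq)
        if key in self.executed:
            return "already_executed"   # duplicate confirmation, no effect
        if key not in self.pending:
            return "unknown"
        self.log.append(self.pending.pop(key))
        self.executed.add(key)
        return "executed"


queue = SlaveQueue()
queue.enqueue("task-1", 1, "reach")
results = [queue.confirm("task-1", 1), queue.confirm("task-1", 1)]
```

A repeated confirmation for the same packet identifier and sequence value leaves the execution log unchanged, so a retransmitted confirmation cannot trigger the motion twice.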
[0013] The beneficial effects of this invention are as follows: This invention constructs a unified time window index and performs time alignment, missing state and reduced-weight state processing, and fusion consistency evaluation on multimodal input data such as speech, vision, and proprioception. This avoids semantic mismatch problems caused by sampling differences and latency inconsistencies between different modalities, making the determination of task intent more stable and reliable, and reducing the risk of erroneous command generation from the source. By introducing packet identification information, sequence identification information, validity information, and integrity verification fields into the motion command distribution packet, combined with the slave-side sequence determination and validity determination mechanism, it effectively avoids problems such as command misordering, duplicate execution, or expired execution, giving the entire motion command distribution process clear version association and traceability capabilities.
[0014] This invention divides the target motion command into execution stages on the master end and generates a constraint set corresponding to each stage. The slave end, after parsing, binds the constraints to each stage and executes them as stage constraints. This ensures that the safety constraints, kinematic constraints, and environmental interaction constraints of different execution stages are accurately effective, reducing the safety risks caused by stage switching during complex movements. The slave end generates expected observable results aligned with the time window index based on the local motion execution plan and sends them back to the master end. The master end can determine the consistency between the real-time observation results and the expected observable results before or during execution, achieving verifiable control over the slave end's execution behavior and avoiding the problem of correct understanding but deviation in execution.
[0015] This invention enables the master terminal to generate a clarifying motion command distribution packet or modal supplementation request based on a summary of the reasons for the failure when the consistency determination does not meet preset conditions. This allows for targeted refinement and conservative updates to the task objective description, action parameter range, execution constraints, or prediction generation requirements, giving the command distribution process adaptive correction capabilities and improving the success rate in complex interaction scenarios. By introducing exponential backoff retransmission and CRC check mechanisms, combined with periodic heartbeat detection and local execution lock strategies, it effectively overcomes the problems of command loss and state asynchrony that easily occur in humanoid robots in unstable wireless communication scenarios such as Wi-Fi. Furthermore, it can autonomously trigger degradation protection in the event of network anomalies, greatly improving the fault tolerance and system security of the master-slave architecture in complex dynamic network environments.
Attached Figure Description
[0016] Figure 1 is a schematic diagram of the multimodal motion command distribution and parsing method for humanoid robots based on a master-slave architecture, as described in this invention.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] Example 1: As shown in Figure 1, this embodiment provides a method for distributing and parsing multimodal motion commands for a humanoid robot based on a master-slave architecture, including the following steps:
S1. The master control node acquires multimodal input data and constructs a time window index, preprocesses and time-aligns the multimodal input data to form multimodal semantic features, and, based on these features, determines the task intent and generates a candidate motion command set.
S2. The master control node adapts and evaluates the candidate motion command set based on the robot's current running state and the current environment state to determine the target motion command, generates execution stage division information, forms the constraint set and binds it to the stages, and encapsulates the target motion command to generate a motion command distribution packet.
S3. The master control node sends the motion command distribution packet to the slave execution node. The slave execution node verifies and parses the motion command distribution packet to obtain the motion command to be executed, the execution stage division information, the constraint set and the prediction generation requirements, forms the stage constraints, and writes the motion command to be executed and the stage constraints into the execution queue.
S4. The slave execution node generates a local motion execution plan aligned with the time window index based on the motion command to be executed and the stage constraints, determines the observable set and establishes the observable mapping relationships, generates expected observable results, encapsulates them into an expected observable return packet, and sends it back to the master control node.
S5. The master control node receives the expected observable return packet and completes the association, extracts and aligns the real-time observation results, performs a consistency judgment between the real-time observation results and the expected observable results to output the consistency judgment result, and generates a summary of reasons for non-compliance.
S6. The master control node issues execution confirmation information, clarifying motion command distribution packets and/or modal supplementation requests based on the consistency judgment result, and updates the multimodal input data according to the acknowledgment and supplementation results to enter the closed-loop process.
[0019] S1 specifically includes the following sub-steps: S110. The master control node acquires multimodal input data for the humanoid robot. The multimodal input data includes at least two of the following: voice input data, visual input data, and body sensor input data. The master control node writes a unified time identifier into the multimodal input data to form an aligned multimodal input data sequence. The unified time identifier uses a fixed time window as the alignment unit and assigns a unique time window index to each fixed time window, for data association in subsequent time alignment and consistency determination. When any modality is missing or unavailable in a given fixed time window, the master control node marks the modality as missing in that fixed time window and records the reason, so that weight reduction can be applied to it in subsequent fusion understanding.
[0020] In this embodiment, the length of the fixed time window can be set according to the communication bandwidth and control frequency, preferably 10ms to 50ms (e.g., 20ms) to balance real-time performance and computational load.
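As an illustration of S110, the following Python sketch assigns a time window index to timestamped samples and marks modalities with no data in a window as missing. The sample tuple layout, the modality names, and the reason string are assumptions; the 20 ms window is the example value given in the text.

```python
WINDOW_MS = 20  # fixed window length; the text suggests 10 ms to 50 ms


def window_index(timestamp_ms, window_ms=WINDOW_MS):
    """Map a timestamp on the unified clock to its fixed-window index."""
    return int(timestamp_ms // window_ms)


def build_windows(samples, modalities, window_ms=WINDOW_MS):
    """Group (modality, timestamp_ms, value) samples by window index.

    Returns (windows, missing): windows[idx][modality] is the list of
    values in that window; missing[idx] maps each modality without
    samples in that window to a recorded reason.
    """
    windows = {}
    for modality, ts, value in samples:
        idx = window_index(ts, window_ms)
        per_modality = windows.setdefault(idx, {m: [] for m in modalities})
        per_modality[modality].append(value)
    missing = {
        idx: {m: "no samples in window" for m, vals in per_mod.items() if not vals}
        for idx, per_mod in windows.items()
    }
    return windows, missing


samples = [("voice", 5, "a"), ("vision", 12, "f0"), ("voice", 25, "b")]
windows, missing = build_windows(samples, ["voice", "vision"])
```

Here the vision modality has no sample in window 1, so it is recorded as missing there and can be down-weighted during fusion.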
[0021] S120. The master control node preprocesses the multimodal input data sequence to obtain preprocessed multimodal input data; the preprocessing includes at least:
Data cleaning: Remove silent segments from and apply noise reduction to voice input data; remove blurry frames and severely occluded frames from visual input data; remove obvious abrupt changes from and smooth the body sensor input data.
Anomaly removal: Taking the fixed time window as the unit, if the proportion of valid data for a modality within a fixed time window is lower than a preset proportion threshold, the modality is marked as anomalous within that fixed time window, and the anomaly is converted into a missing state or a reduced-weight state for subsequent fusion understanding.
Time alignment: Multimodal input data with different sampling rates are mapped to the same fixed time window. For high-sampling-rate body sensor input data, window aggregation is used to obtain a representative value for the fixed time window; for visual input data, a representative frame is selected within the fixed time window; for voice input data, the corresponding speech segment is extracted within the fixed time window and the corresponding semantic segment is generated. This ensures that the preprocessed multimodal input data keeps the same time structure as the time window index, avoiding dangling semantics or broken references in subsequent steps.
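The anomaly removal and window aggregation steps can be sketched in Python as below. Using the mean as the representative value, representing rejected samples as `None`, and mapping "no valid data" to the missing state and "below threshold" to the reduced-weight state are all assumptions; the text leaves these choices open.

```python
from statistics import mean


def aggregate_window(values, valid_ratio_threshold=0.5):
    """Aggregate one modality's samples within one fixed time window.

    values: floats, with None for samples rejected during data cleaning.
    Returns (representative_value, state), where state is "ok",
    "reduced_weight", or "missing".
    """
    if not values:
        return None, "missing"
    valid = [v for v in values if v is not None]
    if not valid:
        return None, "missing"
    ratio = len(valid) / len(values)
    state = "ok" if ratio >= valid_ratio_threshold else "reduced_weight"
    return mean(valid), state


healthy = aggregate_window([1.0, 2.0, 3.0, None])
degraded = aggregate_window([4.0, None, None, None])
```

A window with three valid samples out of four stays usable, while a window with one valid sample out of four still yields a value but is flagged for weight reduction.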
[0022] S130. The master control node performs intra-modal feature extraction and inter-modal association establishment on the preprocessed multimodal input data to obtain multimodal semantic features; wherein:
Intra-modal feature extraction: Speech semantic features are extracted from voice input data, visual semantic features from visual input data, and state semantic features from body sensor input data. Speech semantic features include at least action words, direction words, and target words; visual semantic features include at least target category, target location, and gesture category; state semantic features include at least contact state and force change trend.
Inter-modal association establishment: Using the time window index as the key, the speech semantic features, visual semantic features and state semantic features within the same fixed time window are bound into the same multimodal segment, and semantic correspondences are established between action words and gesture categories, and between target words and target categories.
Multimodal semantic feature structuring: The action candidates, direction candidates, target candidates, and modal confidence of each multimodal segment are written into a unified structure to form the multimodal semantic features, which serve as direct inputs for subsequent fusion understanding and generation of the candidate motion instruction set.
[0023] S140. The master control node performs fusion understanding based on the multimodal semantic features, outputs the task intent, and generates a candidate motion instruction set based on the task intent; the fusion understanding includes at least fusion weight calculation and fusion consistency score calculation, and the task intent and candidate motion instruction set satisfy a unified field definition so as to form a closed-loop consistent referential relationship with the subsequent target motion instruction determination steps; wherein:
Fusion weight calculation: The master control node calculates the modality fusion weight for each modality. The modality fusion weight characterizes the contribution of that modality to task intent inference within the current fixed time window. Let $M$ be the number of modalities; $i$ be the modality index, with $i \in \{1, \dots, M\}$; $j$ be the summation index over modalities, with $j \in \{1, \dots, M\}$; $c_i$ be the modal confidence of the $i$-th modality; $c_j$ be the modal confidence of the $j$-th modality; and $w_i$ be the modality fusion weight of the $i$-th modality. The modality fusion weight $w_i$ is then determined by the following formula:
$$w_i = \frac{\exp(c_i)}{\sum_{j=1}^{M} \exp(c_j)}$$
where $\exp(\cdot)$ is the exponential function, used to map modal confidences to positive values and normalize them so that the sum of all modality fusion weights is 1.
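The weight calculation described above (exponentiate each modal confidence, then normalize so the weights sum to 1) is the softmax function. A minimal Python sketch follows; the modality names and confidence values are illustrative, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def fusion_weights(confidences):
    """Softmax over modal confidences; the returned weights sum to 1.

    confidences: {modality_name: confidence}.
    """
    peak = max(confidences.values())  # shift for numerical stability
    exps = {m: math.exp(c - peak) for m, c in confidences.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


weights = fusion_weights({"voice": 0.9, "vision": 0.6, "state": 0.3})
```

A modality marked as reduced-weight in preprocessing could be handled by lowering its confidence before this step, which is one possible reading of the weight reduction mentioned in S110.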
[0024] Fusion consistency score calculation: The master control node calculates a fusion consistency score for each candidate task intent within the same fixed time window. The fusion consistency score measures the degree of semantic support that the different modalities give to the candidate task intent. Let $s_i$ be the semantic matching score of the $i$-th modality against the candidate task intent, determined by the degree of matching between the semantic features of that modality and the candidate task intent fields and normalized to a fixed range (e.g. $[0, 1]$); and let $S$ be the fusion consistency score of the candidate task intent. $S$ is determined by the following formula:
$$S = \sum_{i=1}^{M} w_i \, s_i$$
Rules for task intent output and candidate motion instruction set generation:
Task intent output: The master control node calculates the fusion consistency score $S$ for multiple candidate task intents within the same fixed time window. The candidate task intent with the highest fusion consistency score that also meets the preset threshold condition is selected as the task intent. The task intent includes at least an action type field, a target object field, a direction field, an interaction method field, and task urgency and risk level fields obtained from multimodal semantic feature matching, so that it can later be used to generate the candidate motion instruction set and the execution priority.
Candidate motion instruction set generation: Based on the task intent, the master control node retrieves action primitives matching the action type field from the action primitive library, combines the direction field and target object field to generate the action parameter range, and writes the action primitive identifier, action parameter range, execution constraint summary, and corresponding fusion consistency score $S$ into instruction entries to form a candidate motion instruction set. The master control node sorts the candidate motion instruction set by fusion consistency score $S$ and retains a preset number of the top-ranked entries as the final candidate motion instruction set, on which the subsequent target motion instruction determination steps perform adaptation evaluation and selection.
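The intent selection rule can be sketched in Python under the assumption that the fusion consistency score is the sum of per-modality semantic matching scores weighted by the modality fusion weights. The intent names, matching scores, and threshold below are illustrative only.

```python
def fusion_consistency(weights, match_scores):
    """Weighted sum of per-modality semantic matching scores (assumed form)."""
    return sum(weights[m] * match_scores.get(m, 0.0) for m in weights)


def select_intent(weights, candidates, threshold):
    """Return (intent, score) for the best candidate, or None.

    candidates: {intent_name: {modality: matching_score}}.
    A candidate is returned only if its score meets the threshold.
    """
    scored = [(fusion_consistency(weights, s), name) for name, s in candidates.items()]
    best_score, best_name = max(scored)
    return (best_name, best_score) if best_score >= threshold else None


weights = {"voice": 0.5, "vision": 0.3, "state": 0.2}
candidates = {
    "grasp_cup": {"voice": 0.9, "vision": 0.8, "state": 0.5},
    "push_cup": {"voice": 0.2, "vision": 0.8, "state": 0.5},
}
chosen = select_intent(weights, candidates, threshold=0.6)
```

When no candidate reaches the threshold, returning `None` leaves room for the clarification or modal supplementation path rather than forcing a low-confidence intent.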
[0025] S2 specifically includes the following sub-steps: S210. The master control node acquires the current operating state and current environmental state of the humanoid robot, and performs an adaptation evaluation on each candidate motion command in the candidate motion command set, so as to output the adaptation evaluation score and executability mark corresponding to each candidate motion command; wherein the current operating state includes at least joint position, joint velocity, actuator temperature, battery status and contact status, and the current environmental state includes at least target object pose, obstacle spatial distribution, passable area and ground support conditions. The adaptation evaluation includes at least a kinematic feasibility evaluation, a safety margin evaluation and an execution efficiency evaluation, and the results of the three are weighted to obtain the adaptation evaluation score. Let $k$ be the index of the candidate motion command; $F_k$ be the adaptation evaluation score of the $k$-th candidate motion command; $F_k^{\mathrm{kin}}$ be its kinematic feasibility score; $F_k^{\mathrm{safe}}$ be its safety margin score; $F_k^{\mathrm{eff}}$ be its execution efficiency score; $\alpha$ be the kinematic feasibility weighting coefficient; $\beta$ be the safety margin weighting coefficient; and $\gamma$ be the execution efficiency weighting coefficient. The adaptation evaluation score $F_k$ is then determined by the following formula:
$$F_k = \alpha F_k^{\mathrm{kin}} + \beta F_k^{\mathrm{safe}} + \gamma F_k^{\mathrm{eff}}$$
subject to the weight constraint:
$$\alpha + \beta + \gamma = 1$$
The kinematic feasibility evaluation determines whether the candidate motion command can generate a feasible trajectory and avoid self-collision and boundary crossing under joint limit, velocity limit and acceleration limit constraints. The safety margin evaluation determines the risk of the candidate motion command based on obstacle distance, support stability and expected contact force margin. The execution efficiency evaluation determines the efficiency of the candidate motion command based on the expected execution time, expected energy consumption and expected heat load change trend. When a candidate motion command does not meet the preset executability conditions, the master control node marks it as unexecutable and either reduces its adaptation evaluation score or eliminates it directly.
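[0026] The specific weight allocation depends on the task scenario. For example, in scenarios involving close human-machine collaboration, the kinematic feasibility weight can be set to $\alpha = 0.3$, the safety margin weight to $\beta = 0.5$, and the execution efficiency weight to $\gamma = 0.2$, to prioritize safety.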
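A short Python sketch of the weighted adaptation score, using the example weights from the human-machine collaboration scenario (0.3, 0.5, 0.2). The component scores passed in are illustrative, and the assumption that the three weights must sum to 1 is taken from the example values.

```python
def adaptation_score(kinematic, safety, efficiency, weights=(0.3, 0.5, 0.2)):
    """Weighted sum of the three evaluation scores.

    weights: (kinematic, safety, efficiency) coefficients, assumed to sum to 1.
    """
    w_kin, w_safe, w_eff = weights
    if abs(w_kin + w_safe + w_eff - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_kin * kinematic + w_safe * safety + w_eff * efficiency


score = adaptation_score(kinematic=0.8, safety=0.6, efficiency=0.9)
```

With these weights, a drop in the safety score lowers the result more than an equal drop in either of the other two components.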
[0027] S220. Based on the fusion consistency score $S$ and the adaptation evaluation score $F_k$, the master control node determines the target motion command from the candidate motion command set and generates the execution priority and execution stage division information consistent with the target motion command. The master control node first performs consistency screening on the candidate motion command set based on the fusion consistency score $S$ and eliminates the candidate motion commands whose fusion consistency score does not meet the preset threshold condition. The candidate motion commands that pass the consistency screening are then sorted by adaptation evaluation score $F_k$, and the candidate that is marked as executable and has the largest adaptation evaluation score is taken as the target motion command. When multiple candidate motion commands have adaptation evaluation scores that are equal or differ by less than a preset difference threshold, the master control node selects, according to preset decision rules, the candidate with more conservative constraints, fewer execution stages, and a lower cost of switching from the current running state as the target motion command. The master control node further generates the execution priority based on task urgency, risk level, and resource consumption, and writes the execution priority into the target motion command. The master control node divides the target motion command into at least two execution phases. The execution phase division information includes at least the phase identifier and phase switching conditions for each execution phase. The phase switching conditions include at least the target object spatial threshold condition, contact state change condition, attitude stability condition, and time window index advancement condition, so that the slave execution node can subsequently form a local motion execution plan and generate expected observable results based on the execution phase division information.
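The screening and tie-breaking logic of S220 can be sketched in Python as follows. The `Candidate` fields `conservativeness` and `switch_cost`, and the lexicographic order in which the three tie-break rules are applied, are assumptions; the text only says that preset decision rules favor these properties.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    consistency: float     # fusion consistency score
    adaptation: float      # adaptation evaluation score
    executable: bool
    conservativeness: int  # hypothetical rank, higher = more conservative
    stages: int            # number of execution stages
    switch_cost: float     # hypothetical cost of switching from current state


def select_target(candidates, consistency_min, diff_threshold):
    """Screen by consistency, then pick the best executable candidate."""
    pool = [c for c in candidates if c.executable and c.consistency >= consistency_min]
    if not pool:
        return None
    best = max(c.adaptation for c in pool)
    near = [c for c in pool
            if c.adaptation == best or best - c.adaptation < diff_threshold]
    # Tie-break: more conservative, then fewer stages, then lower switch cost.
    return min(near, key=lambda c: (-c.conservativeness, c.stages, c.switch_cost))


cands = [
    Candidate("fast_reach", 0.8, 0.80, True, 1, 2, 0.2),
    Candidate("slow_reach", 0.8, 0.78, True, 2, 2, 0.3),
    Candidate("side_reach", 0.8, 0.60, True, 3, 2, 0.1),
    Candidate("bold_reach", 0.3, 0.95, True, 0, 1, 0.1),
]
target = select_target(cands, consistency_min=0.5, diff_threshold=0.05)
```

In this example `bold_reach` is removed by consistency screening despite its high adaptation score, and `slow_reach` wins the near-tie with `fast_reach` because its constraints are more conservative.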
[0028] S230. The master control node extracts the execution constraints corresponding to the target motion command, forms a constraint set, and binds the constraint set to the execution stages according to the execution stage division information; wherein the execution constraints are determined by at least the robot's own capabilities and safety strategy, the current environmental state, the target object attributes, and the execution stage division information. The execution constraints include at least safety constraints, kinematic constraints, and environmental interaction constraints. Safety constraints include at least minimum safety distance constraints, maximum contact force or maximum torque constraints, upper limit constraints on collision risk, and support stability constraints. Kinematic constraints include at least joint position boundary constraints, joint velocity boundary constraints, joint acceleration boundary constraints, and end-effector pose reachability constraints. Environmental interaction constraints include at least foot support area constraints, allowed contact points, allowed contact directions, and contact force change rate limits. The master control node writes a "constraint applicable stage identifier" into each constraint in the constraint set, so that every execution constraint is mapped to the execution stage of the target motion command in which it applies. When constraints from different sources conflict, the master control node adopts the more conservative constraint as the effective constraint, and records the source of the conflict and the adopted strategy in the constraint set for subsequent parsing and tracing.
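The stage binding and conflict rule can be sketched in Python for the simple case of upper-bound limits, where the smaller value is the more conservative one. The tuple layout, source names, and constraint names are assumptions; lower-bound constraints such as a minimum safety distance would need the opposite comparison and are not covered here.

```python
def bind_constraints(entries):
    """Bind upper-bound constraints to stages, keeping the tighter limit.

    entries: iterable of (source, stage_id, constraint_name, limit).
    Returns (effective, conflicts): effective maps (stage_id, name) to
    (limit, source); conflicts records each conflict and the adopted side.
    """
    effective, conflicts = {}, []
    for source, stage, name, limit in entries:
        key = (stage, name)
        if key not in effective:
            effective[key] = (limit, source)
            continue
        old_limit, old_source = effective[key]
        if limit != old_limit:
            winner = min((old_limit, old_source), (limit, source))
            conflicts.append({
                "stage": stage,
                "constraint": name,
                "sources": (old_source, source),
                "adopted": winner[1],
                "strategy": "most conservative",
            })
            effective[key] = winner
    return effective, conflicts


entries = [
    ("robot_capability", "approach", "max_contact_force_N", 30.0),
    ("safety_policy", "approach", "max_contact_force_N", 15.0),
    ("safety_policy", "grasp", "max_contact_force_N", 20.0),
]
effective, conflicts = bind_constraints(entries)
```

The conflict records stay alongside the effective constraints, which matches the requirement that the conflict source and adopted strategy remain traceable after parsing.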
[0029] S240. The master control node parameterizes and encapsulates the target motion command to generate a motion command distribution packet; the motion command distribution packet includes at least a header and a body, wherein: The packet header includes at least packet identification information, sequence identification information, time window index, validity period information, and integrity verification field. The packet identification information is used to uniquely identify this motion command distribution task. The sequence identification information is used to characterize the iterative version of the target motion command and increments according to the number of distributions, so that the slave execution node can perform out-of-order and duplicate checks. The integrity verification field is used by the slave execution node to perform transmission integrity checks on the motion command distribution packet. The packet body includes at least the target motion command, execution phase division information, constraint set, and prediction generation requirements. Among them, the prediction generation requirements include at least the observable set, prediction time range, and output format requirements. The observable set is used to indicate the observable categories that the slave execution node needs to generate for the expected observable results. The prediction time range is used to cover all execution phases or at least the key execution phases corresponding to the execution phase division information. The output format requirements are used to indicate that the expected observable results are output in the form of a trend or value range and aligned with the time window index. The master control node uses the motion command distribution packet as a consistency benchmark for subsequent distribution and parsing to establish a traceable association with the expected observable return packet sent back from the slave.
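As a rough illustration of the header/body layout in S240, the sketch below assembles a packet and fills the integrity verification field. The JSON serialization, the field names (`packet_id`, `seq`, `window_index`, `expires_at`, `crc32`), and the use of CRC-32 via Python's `zlib.crc32` are all assumptions chosen for the example.

```python
import json
import zlib


def _checksum(header, body):
    # Serialize deterministically so both ends compute the same bytes.
    payload = json.dumps({"header": header, "body": body}, sort_keys=True)
    return zlib.crc32(payload.encode("utf-8"))


def build_packet(packet_id, seq, window_index, expires_at, body):
    """Assemble header and body; the checksum covers every other field."""
    header = {
        "packet_id": packet_id,        # identifies the distribution task
        "seq": seq,                    # increments with each distribution
        "window_index": window_index,  # time window index
        "expires_at": expires_at,      # validity period information
    }
    return {"header": dict(header, crc32=_checksum(header, body)), "body": body}


def verify_packet(packet):
    """Recompute the checksum over the packet without its checksum field."""
    header = {k: v for k, v in packet["header"].items() if k != "crc32"}
    return _checksum(header, packet["body"]) == packet["header"]["crc32"]
```

In this sketch the body would carry the target motion command, the execution phase division information, the constraint set, and the prediction generation requirements as nested dictionaries.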
[0030] S3 specifically includes the following sub-steps: S310: The master control node sends motion command distribution packets to the slave execution node through a real-time communication network (such as EtherCAT bus, real-time Ethernet or customized high-reliability wireless LAN), and carries packet identification information and sequence identification information corresponding to the motion command distribution packet when sending. The master control node writes an integrity check field generated based on CRC (cyclic redundancy check) or a hash algorithm into the header of the motion command distribution packet, so that the slave execution node can perform strict transmission integrity checks on the motion command distribution packet. After completing the transmission, the master control node waits for the slave execution node to return the reception confirmation information. If the reception confirmation information is not received within the preset timeout, the master control node uses the exponential backoff algorithm to dynamically adjust the retransmission time interval and retransmits the motion command distribution packet according to the preset number of retransmissions. When the number of retransmissions reaches the preset limit and no confirmation of receipt is received, the master control node triggers a distribution degradation strategy. The distribution degradation strategy includes at least regenerating the clarifying motion instruction distribution packet and / or triggering a modal supplementation request to ensure that the distribution link is recoverable and traceable.
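The retransmission behaviour in S310 can be sketched as follows. The callables `send` and `wait_for_ack`, the default retry count, and the doubling factor are hypothetical; the text only requires exponential backoff with a preset retransmission limit.

```python
def send_with_backoff(send, wait_for_ack, max_retries=3,
                      base_timeout=0.01, factor=2.0):
    """Send once, then retransmit with exponentially growing timeouts.

    send():                transmits the distribution packet (hypothetical).
    wait_for_ack(timeout): returns True if a confirmation arrived in time.
    Returns (acknowledged, attempts).
    """
    timeout = base_timeout
    attempts = 0
    for _ in range(1 + max_retries):
        send()
        attempts += 1
        if wait_for_ack(timeout):
            return True, attempts
        timeout *= factor  # back off before the next retransmission
    return False, attempts
```

When the function returns `(False, ...)`, the caller would trigger the distribution degradation strategy rather than retry indefinitely.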
[0031] S320. After receiving the motion command distribution packet, the slave execution node performs a consistency check on the packet to confirm that it is not out of order, duplicated, or expired, and returns a reception confirmation message to the master control node after the check passes. The consistency check includes at least the following. Integrity verification: the slave execution node performs transmission integrity verification on the motion command distribution packet based on the integrity verification field. When the integrity verification fails, the slave execution node determines the motion command distribution packet to be invalid, discards it, and sends a retransmission request back to the master control node. Sequence determination: the slave execution node determines whether the motion command distribution packet is an acceptable version based on the sequence identification information. Let $s$ be the sequence identifier value corresponding to the sequence identification information, and let $s_{\text{last}}$ be the latest sequence identifier value that the slave execution node has successfully received and passed. The slave execution node determines that the motion command distribution packet is an acceptable version when the following formula is satisfied: $s > s_{\text{last}}$. When this condition is not met, the slave execution node determines the motion command distribution packet to be a duplicate or out-of-order packet and discards it. When the condition is met but the difference $s - s_{\text{last}}$ exceeds the preset skip number threshold, the slave execution node temporarily stores the motion command distribution packet and sends a retransmission request back to the master control node to fill in the missing versions before entering the parsing process.
Validity determination: the slave execution node determines whether the motion command distribution packet has expired based on the validity period information. Let $t_{\text{recv}}$ be the time at which the slave execution node receives the motion command distribution packet, and let $t_{\text{exp}}$ be the expiration time of the motion command distribution packet. The slave execution node determines that the motion command distribution packet has not expired when the following formula is satisfied: $t_{\text{recv}} \le t_{\text{exp}}$. When this condition is not met, the slave execution node determines the motion command distribution packet to be an expired packet, discards it, and sends an expired response back to the master control node to prompt re-distribution. After the consistency check passes, the slave execution node writes the packet identification information, the sequence identifier value, and the time window index to the local distribution parsing status record and sends back the reception confirmation information.
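The three checks in S320 can be combined into one decision function, sketched below. It assumes that a packet is acceptable when its sequence value is strictly greater than the last accepted one, and unexpired when the receive time does not exceed the expiration time; the order in which the checks run, the `max_skip` default, and the result labels are also assumptions for illustration.

```python
def check_packet(seq, last_seq, received_at, expires_at,
                 integrity_ok, max_skip=1):
    """Classify a received distribution packet on the slave side.

    Returns one of: "invalid", "duplicate_or_out_of_order",
    "expired", "buffer", "accept".
    """
    if not integrity_ok:
        return "invalid"                    # discard, request retransmission
    if seq <= last_seq:
        return "duplicate_or_out_of_order"  # discard
    if received_at > expires_at:
        return "expired"                    # discard, send expired response
    if seq - last_seq > max_skip:
        return "buffer"                     # hold packet, request missing versions
    return "accept"                         # record state, send confirmation
```

Only the `"accept"` outcome would update the local distribution parsing status record and produce a reception confirmation.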
[0032] S330. After the consistency check passes, the slave execution node performs field parsing on the motion command distribution packet, extracts the target motion command, execution stage division information, constraint set, and prediction generation requirements, and converts the target motion command into a locally executable format to obtain the motion command to be executed. When converting the target motion command into the locally executable format, the slave execution node performs at least unit consistency processing and coordinate system consistency processing, so that the motion command to be executed can be called directly by the local control interface. The slave execution node further performs semantic consistency verification, which includes at least: verifying whether the stage identifiers in the execution stage division information match the constraint applicable stage identifiers in the constraint set, and verifying whether the motion parameters of the target motion command fall within the capability boundary of the robot body. When the semantic consistency verification fails, the slave execution node determines the motion command distribution packet to be a semantically inconsistent packet, sends back a semantic verification failure response, and does not write the motion command to be executed into the execution queue.
[0033] S340. After the field parsing and semantic consistency verification pass, the slave execution node binds the constraint set according to the execution stage division information and writes the bound execution constraints into the local execution structure of the motion command to be executed, so as to obtain stage constraints that correspond one-to-one with the execution stages. The slave execution node writes the motion command to be executed and the stage constraints into the execution queue and keeps the executing state separate from the pending state: when a command is being executed, a motion command newly written into the execution queue does not overwrite it, unless a cancellation or abort command is received from the master control node or a local safety policy is triggered. After enqueueing is completed, the slave execution node sends an enqueue confirmation message back to the master control node. The enqueue confirmation message includes at least the packet identification information and the sequence identifier value, which the master control node uses to confirm that the distribution and parsing process has been completed and to establish the association for the generation and return of the subsequent expected observable results.
[0034] In addition, a periodic heartbeat detection mechanism is established between the slave execution node and the master control node. When the slave execution node detects a loss of heartbeat or a network connection interruption exceeding a safety threshold, it triggers a network outage fault tolerance strategy: the slave execution node immediately freezes or clears the pending execution queue, blocks any pending execution commands, and autonomously calls the underlying hardware controller to switch to a "safe emergency stop" or "maintain current stable posture" degradation protection mode until the network connection is restored and the master-slave clock alignment and state handshake are re-completed.
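The network outage fault tolerance strategy can be sketched as a small slave-side watchdog. The class, method names, and mode labels below are hypothetical, and time is passed in explicitly so the logic can be exercised without a real clock.

```python
class HeartbeatWatchdog:
    """Drops pending commands when the master has been silent for too long."""

    def __init__(self, safety_threshold_s):
        self.safety_threshold_s = safety_threshold_s
        self.last_beat = None
        self.pending = []      # pending execution queue
        self.mode = "normal"

    def on_heartbeat(self, now):
        self.last_beat = now

    def on_handshake_complete(self, now):
        # Call only after clock alignment and the state handshake succeed.
        self.last_beat = now
        self.mode = "normal"

    def submit(self, command):
        if self.mode != "normal":
            return False       # block pending commands while degraded
        self.pending.append(command)
        return True

    def check(self, now):
        silent = (self.last_beat is not None
                  and now - self.last_beat > self.safety_threshold_s)
        if silent and self.mode == "normal":
            self.pending.clear()        # clear the pending queue
            self.mode = "hold_posture"  # or a safe emergency stop
        return self.mode
```

Note that a heartbeat alone does not leave the degraded mode in this sketch; recovery requires the explicit handshake call, mirroring the requirement that clock alignment and state handshake be completed again.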
[0035] S4 specifically includes the following sub-steps: S410. The slave execution node generates a local motion execution plan based on the motion command to be executed and the stage constraints. The local motion execution plan uses the execution stage division information as its stage skeleton and is aligned with the time window index. The local motion execution plan includes at least the sequence of action stages arranged by stage identifier, the planned time range of each action stage, the control target of each action stage, and the stage switching conditions consistent with the execution stage division information. The stage constraints are written into the corresponding action stages as effective constraints. The slave execution node performs unit consistency processing and coordinate system consistency processing on the local motion execution plan, so that the control targets, planned stage time ranges, and stage switching conditions can be executed by the slave control interface on a unified time base and a unified spatial reference, and so that a unified data foundation is available for calculating the subsequent expected observable results.
[0036] S420. The slave execution node determines the set of observables according to the prediction generation requirements and establishes an observable mapping relationship for each observable in the set. The set of observables includes at least one of the following: end-effector pose change trend, contact force or joint torque change trend, target object relative pose change trend, and change trend of stability-related quantities. The set of observables should cover the observable categories specified in the prediction generation requirements. The observable mapping relationship includes at least the source sensor or estimator, the unit, the coordinate system, and the sampling interval or time-window-indexed output mode of each observable. When the source sensor or estimator corresponding to an observable is unavailable, the slave execution node marks that observable as missing and records the reason. The missing state is explicitly marked in subsequent data transmission so that the master control node does not misuse the missing data in the consistency determination.
[0037] S430. The slave execution node generates expected observable results based on the local motion execution plan, the stage constraints, and the observable mapping relationship. The expected observable results characterize the expected change trend or expected value range of each observable during the execution of the motion command to be executed, and are output aligned with the time window index. To ensure that the expected observable results are reproducible and verifiable, the slave execution node outputs the expected value and expected range of each observable within the prediction time range in a structured manner. Let: $o_i$ be the $i$-th observable, where $i$ is the observable index and $i = 1, 2, \dots, N$; $t$ be the prediction time, which is aligned with the time window index or calculated from the slave control cycle; $t_0$ be the prediction start time; $\hat{y}_i(t)$ be the expected value of the $i$-th observable at prediction time $t$; $y_i(t_0)$ be the current value of the $i$-th observable at the prediction start time; $r_i$ be the expected rate of change of the $i$-th observable, which is determined by the rate of change of the control target or the planned speed of the corresponding action stage in the local motion execution plan and is subject to the stage constraints; $\delta_i$ be the expected tolerance of the $i$-th observable, which is determined by the stage constraints, the upper limit of sensing noise, and the upper limit of environmental uncertainty; $L_i(t)$ be the expected lower bound of the $i$-th observable at prediction time $t$; and $U_i(t)$ be the expected upper bound of the $i$-th observable at prediction time $t$. The slave execution node determines the expected value and expected range at least by the following formulas:
$$\hat{y}_i(t) = y_i(t_0) + r_i\,(t - t_0), \qquad L_i(t) = \hat{y}_i(t) - \delta_i, \qquad U_i(t) = \hat{y}_i(t) + \delta_i$$
The slave execution node can adopt different expected rates of change $r_i$ and expected tolerances $\delta_i$ for different action stages. Furthermore, during high-risk action stages, it outputs a more conservative expected range to improve the robustness of the consistency determination. When an observable is marked as missing, the slave execution node outputs only the missing state identifier and missing reason of that observable, without outputting its expected value and expected range.
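The expected value and expected range of one observable can be illustrated with a short sketch. It assumes a linear trend (start value plus rate of change times elapsed time) with a symmetric tolerance band around the expected value; this specific form, and the function and parameter names, are assumptions for the example.

```python
def expected_series(y0, rate, tolerance, times, t0=0.0):
    """Return [(t, expected, lower, upper)] for one observable.

    y0:        current value at the prediction start time t0
    rate:      expected rate of change for the current action stage
    tolerance: expected tolerance (half-width of the expected range)
    times:     prediction times aligned with the time window index
    """
    series = []
    for t in times:
        expected = y0 + rate * (t - t0)
        series.append((t, expected, expected - tolerance, expected + tolerance))
    return series
```

For an observable marked as missing, a caller would skip this computation and emit only the missing state identifier and the missing reason.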
[0038] S440. The slave execution node encapsulates the expected observable results in a structured manner and writes them into association fields consistent with the motion command distribution packet to form the expected observable return packet. The expected observable return packet includes at least the packet identification information, the sequence identifier value, the time window index, the observable set, the observable mapping relationship, the prediction time range, and the expected values and expected ranges output over the prediction time range. For observables marked as missing, the expected observable return packet should include at least the missing state identifier and the reason for the missing state. Before encapsulation, the slave execution node verifies whether the set of observables covers the prediction generation requirements... After encapsulation, the expected observable return packet is transmitted back to the master control node via the wireless network.
[0039] To prevent the master and slave states from falling out of synchronization when expected observable return packets are lost in unstable wireless communication scenarios such as Wi-Fi, the slave execution node automatically activates a local execution lock after sending the return packet, which suspends activation of the command in the pending execution queue. The local execution lock is released and the execution state is entered only after the slave execution node receives the execution confirmation information sent by the master control node (i.e., the consistency determination result of stage S6 is a pass). If the slave execution node does not receive the execution confirmation within the preset synchronization waiting window, it actively sends a status query packet or directly discards the command to be executed, to ensure strong consistency of the command states at the master and slave ends.
[0040] S5 specifically includes the following sub-steps: S510. After receiving the expected observable return packet from the slave execution node, the master control node performs a return verification on the packet. The return verification includes at least: verifying whether the expected observable return packet contains the association fields, such as the packet identification information, the sequence identifier value, and the time window index; verifying whether the set of observables covers the prediction generation requirements; and verifying whether observables marked as missing carry missing state identifiers and missing reasons. When the return verification fails, the master control node generates a return exception response, sends it back to the slave execution node, and does not enter the consistency determination process. The return exception response includes at least one of the following: a retransmission request, a context missing response, a version inconsistency response, or a response indicating that the prediction generation requirements are not satisfied.
[0041] S520. After the return verification passes, the master control node associates the expected observable return packet with the corresponding motion command distribution packet based on an association key to form a task entry to be judged. The association key includes at least the packet identification information, the sequence identifier value, and the time window index, so that a one-to-one correspondence is established between the expected observable return packet and the motion command distribution packet under the same distribution task, the same version, and the same time alignment benchmark. When the association fails, the master control node marks the task entry to be judged as having failed association, generates an association failure response and sends it back to the slave execution node, and does not enter the real-time observation result extraction and consistency determination process, so as to avoid dangling semantics.
[0042] S530. The master control node extracts the real-time observation results corresponding to the prediction time range from the multimodal input data based on the observable set and the observable mapping relationship, and forms real-time observation results that can be used for judgment. Specifically, the master control node time-aligns the real-time observation results according to the time window index, so that they are comparable to the expected observable results on the same sequence of prediction times. The master control node performs unit consistency processing and coordinate system consistency processing on the real-time observation results, so that they can be compared numerically against the expected lower bound and the expected upper bound. The master control node further performs quality verification on the real-time observation results. When data are missing, change abruptly, or carry noise exceeding the limits, the corresponding observable is marked as being in a missing state or a reduced-weight state at the corresponding prediction time and the reason is recorded. The missing state or reduced-weight state information is written into the real-time observation results that can be used for judgment, so that the subsequent consistency determination can remove or down-weight it.
[0043] S540. The master control node performs a consistency determination between the real-time observation results that can be used for judgment and the expected observable results, outputs the consistency determination result, and writes it into the distribution parsing status record. The consistency determination includes at least determining whether each observable falls within the expected range at each prediction time and calculating the consistency score. Let: $z_i(t)$ be the real-time observed value of the $i$-th observable at prediction time $t$; $L_i(t)$ and $U_i(t)$ be the expected lower bound and expected upper bound of the $i$-th observable at prediction time $t$; $c_i(t)$ be the consistency indicator of the $i$-th observable at prediction time $t$; $N$ be the number of observables; $T$ be the set of prediction times, determined by the prediction time range and the sampling rules; $|T|$ be the number of elements in the set of prediction times; $S$ be the consistency score, which characterizes the overall level of consistency between the expected observable results and the real-time observation results; and $\theta$ be the consistency threshold used to determine whether consistency is achieved. The consistency indicator $c_i(t)$ is determined by the following formula:
$$c_i(t) = \begin{cases} 1, & L_i(t) \le z_i(t) \le U_i(t) \\ 0, & \text{otherwise} \end{cases}$$
The consistency score $S$ is determined by the following formula:
$$S = \frac{1}{N\,|T|} \sum_{i=1}^{N} \sum_{t \in T} c_i(t)$$
Specifically, when missing states or reduced-weight states exist in the real-time observation results that can be used for judgment, the master control node removes the corresponding observables and prediction times from the consistency score calculation and adjusts the denominator accordingly to the product of the number of valid observables and the number of valid prediction times remaining after removal. At the same time, the reason for removal is written into the distribution parsing status record. When $S \ge \theta$, the master control node outputs a consistency determination result that satisfies the consistency condition; when $S < \theta$, the master control node outputs a consistency determination result that does not meet the consistency condition and generates a summary of reasons for the non-compliance.
The summary of reasons for the non-compliance should at least include the observable categories that exceed the expected range and the corresponding time window index interval, so as to provide a basis for subsequent motion command distribution strategy selection and clarification distribution generation.
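The consistency score can be sketched as the fraction of valid (observable, prediction time) pairs whose real-time value lies inside the expected range. The dictionary layout and the `excluded` set are assumptions for illustration. Counting valid pairs matches the product of valid observables and valid prediction times whenever whole observables or whole prediction times are removed.

```python
def consistency_score(observed, bounds, excluded=frozenset()):
    """Compute the share of in-range samples.

    observed: {observable: {time: value}} real-time observations
    bounds:   {observable: {time: (lower, upper)}} expected ranges
    excluded: set of (observable, time) pairs flagged as missing or
              reduced-weight; they are dropped from numerator and denominator
    Returns None when no valid sample remains.
    """
    hits = 0
    total = 0
    for name, series in observed.items():
        for t, value in series.items():
            if (name, t) in excluded:
                continue
            lower, upper = bounds[name][t]
            if lower <= value <= upper:
                hits += 1
            total += 1
    return hits / total if total else None
```

A caller would compare the returned score with the consistency threshold and, on failure, collect the out-of-range observables and time indices into the summary of reasons for non-compliance.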
[0044] S6 specifically includes the following sub-steps: S610. The master control node selects a motion command distribution strategy based on the consistency determination result. The motion command distribution strategy includes at least an execution confirmation distribution strategy and a clarification distribution strategy. To ensure the reproducibility of the strategy selection, the master control node uses the consistency score $S$ and the consistency threshold $\theta$ as the triggering condition. The execution confirmation distribution strategy is triggered when the following formula is met: $S \ge \theta$. The clarification distribution strategy is triggered when the following condition is met: $S < \theta$. After sending execution confirmation information, a clarifying motion command distribution packet, or a modal supplementation request, the master control node waits for the response of the slave execution node. If no response is received within a preset timeout, the master control node retransmits up to a preset number of times. If no response is received after the preset maximum number of retransmissions, or if the consistency condition is not met multiple consecutive times, the master control node triggers a distribution degradation strategy. The distribution degradation strategy includes at least one of issuing a stop or hold-posture command and triggering a manual confirmation request, so as to avoid an endless closed loop and ensure safety and controllability.
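The strategy selection in S610 reduces to a threshold comparison plus a guard against repeated failures, as in the sketch below. It assumes that a score at or above the threshold counts as consistent; the failure limit and the strategy labels are hypothetical.

```python
def select_strategy(score, threshold, consecutive_failures=0,
                    max_consecutive_failures=3):
    """Map a consistency score to a distribution strategy.

    consecutive_failures counts earlier rounds that missed the threshold;
    the current round is added to it when it also fails.
    """
    if score >= threshold:
        return "execution_confirmation"
    if consecutive_failures + 1 >= max_consecutive_failures:
        # e.g. stop or hold-posture command, or a manual confirmation request
        return "degradation"
    return "clarification"
```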
[0045] S620. When the consistency determination result meets the consistency condition, the master control node sends execution confirmation information to the slave execution node to trigger execution of the motion command to be executed. The execution confirmation information includes at least the packet identification information and the sequence identifier value, which establishes a version binding between the execution confirmation information and the motion command to be executed in the queue. After receiving the execution confirmation information, the slave execution node checks whether the packet identification information and sequence identifier value in it match the corresponding entry in the pending execution queue. Only if they match is the corresponding motion command dequeued and moved to the execution state. When they do not match, the slave execution node ignores the execution confirmation information and sends back a version inconsistency response, to avoid accidentally triggering an old or incorrect version of the command. When the master control node repeatedly sends execution confirmation information for the same version, the slave execution node handles it idempotently, sending back only an acknowledgment without triggering execution again. During execution of the motion command, the slave execution node performs real-time constraint control according to the stage constraints, and when a local safety policy is triggered it stops execution and sends back the reason for the stop, so as to maintain the safety closed loop and traceability.
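The version binding and idempotent handling in S620 can be sketched with a pending queue keyed by packet identifier and sequence value. The class name, method names, and result labels are hypothetical.

```python
class PendingExecutionQueue:
    """Slave-side queue that releases a command only on a matching confirmation."""

    def __init__(self):
        self._pending = {}     # (packet_id, seq) -> command awaiting confirmation
        self._started = set()  # keys already moved to the execution state

    def enqueue(self, packet_id, seq, command):
        self._pending[(packet_id, seq)] = command

    def on_execution_confirmation(self, packet_id, seq):
        key = (packet_id, seq)
        if key in self._started:
            return ("ack_only", None)          # duplicate: acknowledge, do not rerun
        if key not in self._pending:
            return ("version_mismatch", None)  # old or unknown version: ignore
        self._started.add(key)
        return ("execute", self._pending.pop(key))
```

Keeping the set of already started keys is what makes a repeated confirmation harmless: the second message finds the key in `_started` and yields an acknowledgment only.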
[0046] S630. When the consistency determination result does not meet the consistency condition, the master control node generates a clarifying motion command distribution packet and / or a modal supplementation request based on the summary of reasons for non-compliance, which includes at least the observable categories that exceed the expected range and the corresponding time window index intervals. The master control node takes the summary of reasons for non-compliance, the current task intent, the current target motion command, and the current constraint set as input, and performs at least one of the following clarifying generation actions to form the clarifying motion command distribution packet. Task objective description refinement: the action type field, target object field, direction field, and interaction method field in the task intent are further refined into executable sub-objective fields. The executable sub-objective fields include at least one of the following: stopping distance, allowed contact part, contact force limit, or stability requirement, in order to eliminate semantic ambiguity and improve predictability. Action parameter range convergence: the range of action parameters of the target motion command is narrowed according to a preset convergence rule. The preset convergence rule includes at least one of the following: reducing the speed limit, reducing the step size range, tightening the allowable end-effector pose deviation, and shortening the single-stage prediction time range, so as to reduce execution uncertainty. Conservative update of constraints: for the time window index intervals and observable categories identified in the summary of reasons for non-compliance, the constraint set is tightened. The tightening includes at least one of the following: increasing the minimum safe distance, reducing the maximum contact force or maximum torque, tightening the stability constraint, and limiting the rate of change of the contact force. The updated constraints are then rebound to the execution stages. Prediction generation requirement reconstruction: when inconsistencies are concentrated in a specific execution stage or a specific observable category, the master control node increases the output granularity for that execution stage or adds output requirements for key stages in the prediction generation requirements, and, when missing states occur frequently, adds a requirement to prioritize completing the missing modalities, so as to improve the robustness of the subsequent consistency determination. Meanwhile, the master control node generates a modal supplementation request based on the summary of reasons for non-compliance. The modal supplementation request is used to obtain the key modal information behind the inconsistency, which includes at least one of visual positioning information, force or tactile information, and speech clarification information. The slave execution node is required to send back the supplementation result so that it can enter the closed-loop update.
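The conservative constraint update can be illustrated with a minimal sketch. The multiplicative tightening rule, the `factor` parameter, and the `"upper"` / `"lower"` labels are assumptions for the example; the sketch also assumes positive constraint values.

```python
def tighten_constraints(constraints, affected, factor=0.8):
    """Return a tightened copy of the constraint set.

    constraints: {name: (kind, value)}, kind is "upper" or "lower"
    affected:    names implicated by the summary of reasons for non-compliance
    factor:      hypothetical tightening ratio, strictly between 0 and 1
    """
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    updated = {}
    for name, (kind, value) in constraints.items():
        if name in affected:
            # Upper limits shrink (e.g. max contact force);
            # lower limits grow (e.g. minimum safe distance).
            value = value * factor if kind == "upper" else value / factor
        updated[name] = (kind, value)
    return updated
```

Constraints not named in `affected` pass through unchanged, so the update stays local to the observable categories that actually violated their expected range.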
[0047] S640. The master control node sends the clarifying motion command distribution packet and / or the modal supplementation request to the slave execution node. After receiving the acknowledgment and the supplementation result from the slave execution node, it updates the multimodal input data and, accordingly, the multimodal semantic features, the task intent, and the candidate motion command set. Based on the updated candidate motion command set, the master control node redetermines the target motion command and generates a new motion command distribution packet, re-entering the closed-loop process of packet generation, distribution, return of the expected observable return packet, and consistency determination. In the distribution parsing status record, the master control node records the version chain of each iteration, the summary of reasons for non-compliance that triggered the clarification distribution, the type of clarifying generation action taken, and whether the execution confirmation distribution strategy was finally entered. When the number of closed-loop iterations reaches the preset maximum and the consistency condition is still not met, the master control node triggers the distribution degradation strategy and stops the automatic closed loop to ensure system safety and controllability.
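The bounded closed loop of S640 can be sketched as follows. The callables `evaluate` and `clarify` stand in for the distribution, prediction, and judgment steps and are hypothetical, as are the outcome labels and the record layout of the version chain.

```python
def run_closed_loop(evaluate, clarify, threshold, max_iterations=3):
    """Iterate distribute -> predict -> judge until consistent or exhausted.

    evaluate(version):       returns the consistency score of that version
    clarify(version, score): prepares the next, clarified version
    Returns (outcome, version_chain).
    """
    version_chain = []
    for version in range(1, max_iterations + 1):
        score = evaluate(version)
        passed = score >= threshold
        version_chain.append({"version": version, "score": score, "passed": passed})
        if passed:
            return "execution_confirmed", version_chain
        if version < max_iterations:
            clarify(version, score)
    # Iteration budget exhausted: stop the automatic loop and degrade.
    return "degraded", version_chain
```

The returned version chain corresponds to the per-iteration record kept in the distribution parsing status record.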
[0048] The above formulas are all dimensionless calculations. The formulas are obtained by software simulation on a large amount of collected data so as to approximate the real situation as closely as possible. The preset parameters and thresholds in the formulas are set by those skilled in the art according to the actual situation.
[0049] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions according to the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. Computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, radio, microwave, etc.) means. A computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that includes one or more sets of available media. Available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media. Semiconductor media can be solid-state drives.
[0050] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0051] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, devices, and modules described above are not repeated here; reference may be made to the corresponding processes in the foregoing method embodiments.
[0052] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For instance, the division of modules is only a logical functional division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
[0053] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0054] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
[0055] If a function is implemented as a software module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes over the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0056] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0057] In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture, characterized in that it includes the following steps:
S1. The master control node acquires multimodal input data, constructs a time window index, and preprocesses and time-aligns the multimodal input data to form multimodal semantic features. Based on these features, it determines the task intent and generates a candidate motion command set.
S2. The master control node adapts and evaluates the candidate motion command set based on the robot's current operating state and the current environment state to determine the target motion command, generates execution stage division information, forms and binds the stage constraint set, and encapsulates the target motion command to generate a motion command distribution packet.
S3. The master control node sends the motion command distribution packet to the slave execution node. The slave execution node verifies and parses the packet to obtain the motion command to be executed, the execution stage division information, the constraint set, and the prediction generation requirements, forms the stage constraints, and writes the motion command to be executed and the stage constraints into the execution queue.
S4. The slave execution node generates a local motion execution plan aligned with the time window index based on the motion command to be executed and the stage constraints, determines the observable set and establishes the observable mapping relationship, generates expected observable results, encapsulates them into an expected observable return packet, and sends the packet back to the master control node.
S5. The master control node receives the expected observable return packet and completes the association, extracts and aligns the real-time observation results, performs a consistency judgment between the real-time observation results and the expected observable results to output the consistency judgment result, and generates a summary of reasons for non-compliance.
2. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 1, characterized in that it further includes: S6. The master control node issues execution confirmation information, a clarifying motion command distribution packet, or a modal supplementation request based on the consistency judgment result, and updates the multimodal input data according to the receipt and supplementation results to re-enter the closed-loop process.
3. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 1, characterized in that S1 specifically includes: acquiring multimodal input data and constructing a time window index; preprocessing the multimodal input data and aligning it according to the time window index; extracting and associating speech semantic features, visual semantic features, and state semantic features to form multimodal semantic features; and calculating modality fusion weights and fusion consistency scores based on the multimodal semantic features to determine the task intent and generate a candidate motion command set.
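As an illustration of the time window index in claim 3, the following minimal sketch buckets timestamped samples from different modalities into fixed-width windows and reports which windows contain every required modality. The function names, the `(modality, timestamp_ms, value)` tuple layout, and the fixed window width are assumptions made for illustration; the claims do not specify how the index is built.

```python
from collections import defaultdict


def build_window_index(samples, window_ms):
    """Group (modality, timestamp_ms, value) samples by time window index.

    Hypothetical helper: the window index is simply timestamp // window_ms.
    """
    index = defaultdict(lambda: defaultdict(list))
    for modality, ts_ms, value in samples:
        index[ts_ms // window_ms][modality].append(value)
    return index


def aligned_windows(index, required):
    """Return the sorted window indices in which every required modality has data."""
    return sorted(w for w, mods in index.items() if required.issubset(mods))


samples = [
    ("speech", 10, "a"),
    ("vision", 35, "b"),
    ("state", 49, "c"),
    ("speech", 60, "d"),   # falls in window 1, where vision and state are missing
    ("vision", 120, "e"),
]
index = build_window_index(samples, window_ms=50)
complete = aligned_windows(index, {"speech", "vision", "state"})
```

Only windows that appear in `complete` would be passed on to feature extraction in this sketch; how incomplete windows are handled (dropped, interpolated, or used to trigger a modal supplementation request) is left open here.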
4. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 1, characterized in that S2 specifically includes: adapting and evaluating the candidate motion command set based on the robot's current operating state and the current environment state to determine the target motion command, and generating execution stage division information; extracting the execution constraints corresponding to the target motion command to form a constraint set, and binding the constraint set according to the execution stage division information; and parameterizing and encapsulating the target motion command to generate a motion command distribution packet, wherein the packet header includes at least packet identification information, sequence identification information, the time window index, validity period information, and an integrity verification field, and the packet body includes at least the target motion command, the execution stage division information, the constraint set, and the prediction generation requirements.
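The header and body layout described in claim 4 can be sketched as follows. This is only one possible encoding: the JSON serialization, the field names (`packet_id`, `seq`, `window_index`, `expires_at_ms`, `crc32`), and the choice of CRC-32 as the integrity verification field are assumptions, not details given in the claims.

```python
import json
import zlib


def _canonical(obj):
    """Serialize deterministically so both ends compute the same checksum."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def encapsulate(packet_id, seq, window_index, expires_at_ms, body):
    """Build a distribution packet; the CRC covers the header fields and the body."""
    header = {
        "packet_id": packet_id,
        "seq": seq,
        "window_index": window_index,
        "expires_at_ms": expires_at_ms,
    }
    header["crc32"] = zlib.crc32(_canonical(header) + _canonical(body))
    return {"header": header, "body": body}


def verify_integrity(packet):
    """Recompute the CRC without the crc32 field and compare."""
    header = {k: v for k, v in packet["header"].items() if k != "crc32"}
    expected = zlib.crc32(_canonical(header) + _canonical(packet["body"]))
    return expected == packet["header"]["crc32"]


packet = encapsulate(
    packet_id="pkt-001",
    seq=7,
    window_index=42,
    expires_at_ms=1_000,
    body={
        "target_command": "walk",
        "stages": ["approach", "stop"],
        "constraints": {"approach": {"max_speed": 0.5}},
        "prediction_requirements": ["torso_pitch_deg"],
    },
)
```

A CRC detects accidental corruption but offers no protection against deliberate tampering; a keyed hash would be the natural substitute where that matters.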
5. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 1, characterized in that S3 specifically includes: the master control node sends the motion command distribution packet and, if no acknowledgment is received within a timeout period, retransmits it using an exponential backoff algorithm; the slave execution node performs integrity verification of the packet based on a CRC or hash algorithm, together with order determination and validity determination, and writes the results into the distribution parsing status record; after verification passes, the packet is parsed to extract the target motion command, the execution stage division information, the constraint set, and the prediction generation requirements, and the target motion command is transformed into the motion command to be executed; the constraint set is bound stage by stage according to the execution stage division information to form the stage constraints; and the motion command to be executed and the stage constraints are written into the execution queue, with the to-be-executed state and the execution state kept separate.
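Two mechanisms in claim 5 lend themselves to a short sketch: the retransmission schedule and the order and validity checks on the receiving side. The code below is a hedged illustration only. The doubling factor, the delay cap, the strictly increasing sequence rule, and all names (`backoff_delays_ms`, `SlaveReceiver`, `status_log`) are assumptions; integrity verification is assumed to have passed before `accept` is called.

```python
def backoff_delays_ms(base_ms, max_retries, cap_ms):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap_ms."""
    return [min(cap_ms, base_ms * 2 ** attempt) for attempt in range(max_retries)]


class SlaveReceiver:
    """Order and validity determination with a simple status record."""

    def __init__(self):
        self.last_seq = -1
        self.status_log = []  # stands in for the distribution parsing status record

    def accept(self, header, now_ms):
        if header["seq"] <= self.last_seq:
            verdict = "rejected: duplicate or out of order"
        elif now_ms > header["expires_at_ms"]:
            verdict = "rejected: expired"
        else:
            verdict = "accepted"
            self.last_seq = header["seq"]
        self.status_log.append((header["packet_id"], verdict))
        return verdict == "accepted"


delays = backoff_delays_ms(base_ms=100, max_retries=5, cap_ms=1000)

rx = SlaveReceiver()
first = rx.accept({"packet_id": "p1", "seq": 1, "expires_at_ms": 500}, now_ms=100)
repeat = rx.accept({"packet_id": "p1", "seq": 1, "expires_at_ms": 500}, now_ms=120)
stale = rx.accept({"packet_id": "p2", "seq": 2, "expires_at_ms": 500}, now_ms=600)
```

Rejecting by sequence value is what prevents misordered or repeated execution, and the expiry check is what prevents expired execution; a production sender would typically also add random jitter to the delays.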
6. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 1, characterized in that S4 specifically includes: the slave execution node generates a local motion execution plan aligned with the time window index based on the motion command to be executed and the stage constraints; determines the observable set and establishes the observable mapping relationship based on the prediction generation requirements; and generates expected observable results containing expected values and expected ranges, encapsulates them into an expected observable return packet, and sends the packet back to the master control node.
7. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 6, characterized in that the expected observable return packet includes at least packet identification information, a sequence identification value, the time window index, the observable set, the observable mapping relationship, a prediction time range, and the expected values and expected ranges.
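The fields that claims 6 and 7 list for the expected observable return packet can be captured in a small data structure. The sketch below is an assumption-laden illustration: the class and field names, the symmetric tolerance used to derive the expected range, and the example observables are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedObservable:
    value: float  # expected value
    low: float    # lower bound of the expected range
    high: float   # upper bound of the expected range


@dataclass(frozen=True)
class ExpectedObservableReturnPacket:
    packet_id: str       # copied from the distribution packet header
    seq: int             # sequence identification value
    window_index: int    # time window index
    observables: tuple   # observable set
    mapping: dict        # observable name -> master-side measurement channel
    horizon_ms: tuple    # prediction time range (start_ms, end_ms)
    expected: dict       # observable name -> ExpectedObservable


def build_return_packet(dist_header, predictions, tolerances, mapping, horizon_ms):
    """Wrap predicted values and symmetric tolerances into a return packet."""
    expected = {
        name: ExpectedObservable(v, v - tolerances[name], v + tolerances[name])
        for name, v in predictions.items()
    }
    return ExpectedObservableReturnPacket(
        packet_id=dist_header["packet_id"],
        seq=dist_header["seq"],
        window_index=dist_header["window_index"],
        observables=tuple(sorted(predictions)),
        mapping=mapping,
        horizon_ms=horizon_ms,
        expected=expected,
    )


ret = build_return_packet(
    dist_header={"packet_id": "pkt-001", "seq": 7, "window_index": 42},
    predictions={"torso_pitch_deg": 2.0, "left_foot_force_n": 300.0},
    tolerances={"torso_pitch_deg": 1.0, "left_foot_force_n": 100.0},
    mapping={"torso_pitch_deg": "imu.pitch", "left_foot_force_n": "ft_left.fz"},
    horizon_ms=(0, 500),
)
```

Copying the identification fields unchanged from the distribution packet header is what lets the master control node associate the return packet with the command it describes.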
8. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 1, characterized in that S5 specifically includes: the master control node receives the expected observable return packet and verifies its association fields and whether the observable set covers the prediction generation requirements; the expected observable return packet is associated with the motion command distribution packet based on the packet identification information, sequence identification value, and time window index; based on the observable set and the observable mapping relationship, the real-time observation results are extracted and aligned, and their consistency with the expected observable results is determined to output the consistency judgment result; and if the consistency condition is not met, a summary of reasons for non-compliance is generated and written into the distribution parsing status record.
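A minimal sketch of the association and consistency judgment in claim 8 follows. The claims do not define how consistency is scored, so the score used here (the fraction of observables whose real-time value lies inside the expected range) is an assumption, as are the function names and the example observables.

```python
def association_key(fields):
    """Key used to match a return packet to the distribution packet it answers."""
    return (fields["packet_id"], fields["seq"], fields["window_index"])


def judge_consistency(expected_ranges, observed):
    """Compare real-time observations against expected ranges.

    expected_ranges: observable name -> (low, high)
    observed:        observable name -> real-time value
    Returns (score, reasons); reasons is the summary of non-compliance.
    """
    reasons = []
    for name, (low, high) in expected_ranges.items():
        if name not in observed:
            reasons.append(f"{name}: no real-time observation")
        elif not low <= observed[name] <= high:
            reasons.append(f"{name}: observed {observed[name]} outside [{low}, {high}]")
    score = 1.0 - len(reasons) / len(expected_ranges) if expected_ranges else 0.0
    return score, reasons


sent = {"packet_id": "pkt-001", "seq": 7, "window_index": 42}
pending = {association_key(sent): sent}
returned = {"packet_id": "pkt-001", "seq": 7, "window_index": 42}
matched = association_key(returned) in pending

score, reasons = judge_consistency(
    expected_ranges={"torso_pitch_deg": (1.0, 3.0), "left_foot_force_n": (200.0, 400.0)},
    observed={"torso_pitch_deg": 2.0, "left_foot_force_n": 450.0},
)
```

In this example one of two observables falls outside its range, so the reason list has a single entry that could be written to the status record and used to build a clarifying packet.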
9. The method for distributing and parsing multimodal motion commands for humanoid robots based on a master-slave architecture according to claim 2, characterized in that S6 specifically includes: the master control node selects either an acknowledgment-based distribution strategy or a clarification-based distribution strategy based on the consistency score and the consistency threshold; when the consistency condition is met, it sends execution confirmation information containing the packet identification information and sequence identification value to trigger the slave execution node to execute the motion command to be executed, with duplicate confirmations handled idempotently; when the consistency condition is not met, it generates a clarifying motion command distribution packet or a modal supplementation request based on the summary of reasons for non-compliance and, after receiving the receipt and supplementation results, updates the multimodal input data to re-enter the closed-loop process; and when the number of closed-loop iterations reaches the preset upper limit, the distribution degradation strategy is triggered.
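The decision logic of claim 9 can be illustrated with a short sketch. The comparison `score >= threshold`, the action labels, and the class and method names are assumptions for illustration; the claims state only that the strategy is selected from the consistency score and threshold, that duplicate confirmations are handled idempotently, and that degradation is triggered at the iteration limit.

```python
def choose_action(score, threshold, iteration, max_iterations):
    """Pick the next closed-loop step on the master control node."""
    if score >= threshold:
        return "confirm"   # acknowledgment-based distribution strategy
    if iteration >= max_iterations:
        return "degrade"   # distribution degradation strategy
    return "clarify"       # clarification-based distribution strategy


class SlaveExecutor:
    """Executes a queued command at most once per (packet_id, seq) confirmation."""

    def __init__(self):
        self.confirmed = set()
        self.executed = []

    def on_confirm(self, packet_id, seq):
        key = (packet_id, seq)
        if key in self.confirmed:
            return False   # duplicate confirmation: no second execution
        self.confirmed.add(key)
        self.executed.append(key)
        return True


executor = SlaveExecutor()
first = executor.on_confirm("pkt-001", 7)
duplicate = executor.on_confirm("pkt-001", 7)
```

Keying idempotence on the pair of packet identification and sequence value means a confirmation that is retransmitted after a lost acknowledgment cannot cause the same motion to run twice.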