Dance gesture action generation method, device and equipment of dexterous hand and readable storage medium
By analyzing audio signals and generating layered motions, combined with kinematic optimization, dexterous hand gesture dance movements are generated, solving the problem that existing technologies cannot generate creative hand gesture dances, and realizing a deep connection between movements and music and stable execution.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHENZHEN ZHAOWEI MACHINERY&ELECTRONICS CO LTD
- Filing Date: 2026-03-19
- Publication Date: 2026-06-19
AI Technical Summary
Existing technologies cannot deeply understand the inherent characteristics of music, and cannot automatically, smoothly, and creatively generate dexterous hand gestures that highly match it.
The method analyzes the audio signal to extract an audio feature sequence, generates an initial action sequence through layered action generation, optimizes the initial action sequence using kinematics to obtain the target action sequence, and controls the dexterous hand to perform the gesture dance.
The generated movements not only follow the rhythm, but also express the emotions and colors of the melody, possessing human-like creativity, and can be executed safely and stably in real, dexterous hands.
Figure CN122244942A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of control technology, and in particular to a method, apparatus, device, and readable storage medium for generating dexterous hand gesture dance movements.
Background Technology
[0002] With the development of humanoid robots and highly dexterous hands, their applications in entertainment, education, and companionship are increasing. Dexterous hand gestures, as a highly expressive form of nonverbal communication, are key to enhancing the anthropomorphism and emotional interaction capabilities of robots.
[0003] Currently, the relevant technologies are limited in their ability to deeply understand the inherent characteristics of music and to automatically, smoothly, and creatively generate dexterous hand gestures that closely match it.
Summary of the Invention
[0004] In view of this, the purpose of the present invention is to overcome the shortcomings of the prior art and provide a method, apparatus, device and readable storage medium for generating dexterous hand gesture dance movements.
[0005] This invention provides the following technical solution:
In a first aspect, the present invention provides a method for generating dexterous hand gesture dance movements, the method comprising:
analyzing an audio signal to obtain an audio feature sequence;
performing layered action generation based on the audio feature sequence to obtain an initial action sequence;
optimizing the initial action sequence based on kinematics to obtain a target action sequence; and
controlling the dexterous hand to perform a gesture dance according to the target action sequence.
[0006] In an optional implementation, performing layered action generation based on the audio feature sequence to obtain the initial action sequence includes:
obtaining high-level semantic features, melodic beat features, and low-level acoustic features based on the audio feature sequence;
generating an action style framework based on the high-level semantic features, generating a key pose sequence based on the melodic beat features, and generating action detail angles based on the low-level acoustic features; and
obtaining the initial action sequence based on the action style framework, the key pose sequence, and the action detail angles.
[0007] In an optional implementation, obtaining the initial motion sequence based on the motion style framework, the key pose sequence, and the motion detail angle includes: Candidate action sequences are obtained based on the action style framework, the key pose sequence, and the action detail angle; The candidate action sequence is optimized based on the audio feature sequence to obtain the initial action sequence.
[0008] In an optional implementation, optimizing the initial action sequence based on kinematics to obtain the target action sequence includes: The initial motion sequence is subjected to joint limiting processing, self-collision processing, motion smoothing processing, and style filter processing to obtain the target motion sequence.
[0009] In an optional implementation, the step of performing joint limiting processing, self-collision processing, motion smoothing processing, and style filter processing on the initial action sequence to obtain the target action sequence includes:
performing a joint limit check on the initial action sequence to obtain a limit check result;
if the limit check result is abnormal, angularly clipping the initial action sequence to obtain a clipped action sequence, and performing self-collision detection on the clipped action sequence to obtain a collision detection result;
if the limit check result is normal, performing self-collision detection on the initial action sequence to obtain the collision detection result;
if the collision detection result is abnormal, adjusting the pose of the clipped action sequence or the initial action sequence to obtain an adjusted action sequence, and performing a motion smoothness check on the adjusted action sequence to obtain a smoothness check result;
if the collision detection result is normal, performing a motion smoothness check on the clipped action sequence or the initial action sequence to obtain the smoothness check result;
if the smoothness check result is abnormal, performing motion smoothing filtering on the adjusted action sequence or the initial action sequence to obtain a filtered action sequence, and performing style filter detection on the filtered action sequence to obtain a style filter detection result;
if the smoothness check result is normal, performing style filter detection on the adjusted action sequence or the initial action sequence to obtain the style filter detection result;
if the style filter detection result is abnormal, modifying the filtered action sequence or the initial action sequence with the style filter to obtain the target action sequence; and
if the style filter detection result is normal, taking the adjusted action sequence or the initial action sequence as the target action sequence.
[0010] In an optional implementation, controlling the dexterous hand to perform a gesture dance according to the target action sequence includes: Generate executable driving instructions based on the target action sequence; The executable driving instructions control the dexterous hand to perform the gesture dance.
[0011] In an optional implementation, generating executable driving instructions based on the target action sequence includes: The first instruction is obtained by mapping the target action sequence; Based on the first instruction, the joint space is mapped to the motor angle to obtain the second instruction; Based on the second instruction, dynamic feedforward compensation is performed to obtain the third instruction; The third instruction is encapsulated according to the protocol corresponding to the dexterous hand to obtain the executable driver instruction.
[0012] In a second aspect, the present invention provides a dexterous hand gesture dance motion generation device, the device comprising:
a feature extraction module, used to analyze the audio signal and obtain the audio feature sequence;
an action generation module, used to perform layered action generation based on the audio feature sequence to obtain an initial action sequence;
a motion optimization module, used to optimize the initial motion sequence based on kinematics to obtain the target motion sequence; and
a motion control module, used to control the dexterous hand to perform a gesture dance according to the target motion sequence.
[0013] In a third aspect, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, implements the dexterous hand gesture dance motion generation method as described in any of the foregoing embodiments.
[0014] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the dexterous hand gesture dance motion generation method as described in any of the foregoing embodiments.
[0015] This invention discloses a method, apparatus, device, and readable storage medium for generating dexterous hand gesture dance movements. The method involves parsing an audio signal to obtain an audio feature sequence; generating an initial movement sequence by layering the movements based on the audio feature sequence; optimizing the initial movement sequence based on kinematics to obtain a target movement sequence; and controlling the dexterous hand to perform a gesture dance based on the target movement sequence. In this way, layered movement generation based on audio features deeply associates music and movement, resulting in movements that not only follow the rhythm but also express the emotion and color of the melody, exhibiting human-like creativity. Simultaneously, kinematic optimization ensures that all generated movements can be safely and stably executed on a real dexterous hand.
Description of the Drawings
[0016] To more clearly illustrate the technical solution of the present invention, the accompanying drawings used in the embodiments will be briefly described below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope of protection of the present invention. In the various drawings, similar components are numbered similarly.
[0017] Figure 1 is a flowchart of the dexterous hand gesture dance motion generation method proposed in this embodiment; Figure 2 is another flowchart of the dexterous hand gesture dance motion generation method proposed in this embodiment; Figure 3 is a further flowchart of the dexterous hand gesture dance motion generation method proposed in this embodiment; Figure 4 is a schematic diagram of the dexterous hand gesture dance motion generation device proposed in this embodiment.
[0018] Explanation of reference numerals in the drawings: 400 - Dexterous hand gesture dance motion generation device; 401 - Feature extraction module; 402 - Motion generation module; 403 - Motion optimization module; 404 - Motion control module.
Detailed Implementation
[0019] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.
[0020] The components of the embodiments of the invention described and illustrated herein can typically be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.
[0021] In the following, the terms “comprising,” “having,” and their cognates, as may be used in various embodiments of the invention, are intended only to indicate a particular feature, number, step, operation, element, component, or combination thereof, and should not be construed as excluding the presence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
[0022] Furthermore, the terms "first," "second," and "third" are used only to distinguish descriptions and should not be interpreted as indicating or implying relative importance.
[0023] Unless otherwise specified, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the invention pertain. Terms (such as those defined in commonly used dictionaries) shall be interpreted as having the same meaning as their contextual meaning in the relevant technical field and shall not be interpreted as having an idealized or overly formal meaning, unless clearly defined in the various embodiments of the invention.
Embodiment 1
[0024] This disclosure provides a method for generating dexterous hand gesture dance movements.
[0025] Referring to Figure 1, the method for generating dexterous hand gesture dance movements includes steps S101 to S104, each of which is described in detail below.
[0026] Step S101: Analyze the audio signal to obtain the audio feature sequence.
[0027] In this embodiment, the audio signal is analyzed at multiple levels to extract a structured audio feature sequence, providing an orthogonal feature basis for subsequent layered action generation.
[0028] By way of example, the audio feature sequence includes low-level acoustic features, high-level semantic features, and melodic beat features. Low-level acoustic features include, but are not limited to, Mel spectrograms, Mel-frequency cepstral coefficients, spectral centroid, spectral bandwidth, and zero-crossing rate. These features characterize the instantaneous frequency distribution and energy of the music.
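As a minimal sketch of two of the low-level acoustic features named above, the following computes a per-frame zero-crossing rate together with a root-mean-square energy value. The frame length, hop size, and function names are illustrative assumptions and are not specified in this disclosure; spectral features such as the Mel spectrogram would in practice come from a signal-processing library.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def low_level_features(samples, frame_len=256, hop=128):
    """Return one (zero-crossing rate, RMS energy) pair per frame."""
    return [(zero_crossing_rate(f), rms_energy(f))
            for f in frame_signal(samples, frame_len, hop)]

# Illustrative input: one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = low_level_features(tone)
```

For a pure tone the zero-crossing rate is roughly twice the frequency divided by the sample rate, which gives a quick sanity check on the output.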
[0029] Melodic beat features include rhythm and meter features, and melody and harmony features. Rhythm and meter features are obtained by extracting beat points, tempo, and rhythm intensity contours through a beat tracking algorithm; they are the foundation for aligning the timing of actions. Melody and harmony features are obtained by extracting fundamental frequency contours and chroma features to capture melodic direction and harmonic changes.
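The following sketch illustrates how beat points could be turned into timing anchors for actions. It assumes the beat times have already been produced by some beat tracking algorithm (the tracking itself is not shown), and the function names and the 50 frames-per-second action rate are hypothetical.

```python
from statistics import median

def tempo_bpm(beat_times):
    """Estimate tempo from the median inter-beat interval (seconds -> beats per minute)."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)

def beats_to_frames(beat_times, fps):
    """Map each beat time to the nearest action frame index."""
    return [round(t * fps) for t in beat_times]

# Illustrative beat times in seconds (assumed output of a beat tracker).
beat_times = [0.0, 0.5, 1.0, 1.5, 2.0]
bpm = tempo_bpm(beat_times)
key_frames = beats_to_frames(beat_times, fps=50)
```

Using the median rather than the mean keeps the tempo estimate stable when a few beats are detected early or late.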
[0030] High-level semantic features are obtained by extracting emotional tags (such as passionate and melancholic), style tags (such as classical and electronic dance music), and instrument recognition information using a pre-trained music model.
[0031] Furthermore, the low-level acoustic features, high-level semantic features, and melodic beat features are aligned along the time axis and fused into a multi-dimensional time-series audio feature sequence.
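One possible way to perform the alignment and fusion described above is sketched below. It assumes each feature stream carries its own timestamps and resamples every stream onto a common frame grid by holding the most recent value; the disclosure does not state which resampling rule is used, so this choice and all names here are assumptions.

```python
from bisect import bisect_right

def sample_hold(times, values, t):
    """Return the most recent value at or before time t (the first value if t precedes all)."""
    idx = bisect_right(times, t) - 1
    return values[max(idx, 0)]

def fuse_features(streams, frame_rate, duration):
    """streams maps a name to (sorted timestamps, list of feature vectors).

    Returns a list of (time, fused vector) with one entry per output frame.
    Streams are concatenated in alphabetical order of their names.
    """
    n_frames = int(round(duration * frame_rate))
    fused = []
    for k in range(n_frames):
        t = k / frame_rate
        row = []
        for name in sorted(streams):
            times, values = streams[name]
            row.extend(sample_hold(times, values, t))
        fused.append((t, row))
    return fused

# Illustrative streams sampled at different rates.
streams = {
    "acoustic": ([0.0, 0.1, 0.2, 0.3], [[0.1], [0.2], [0.3], [0.4]]),
    "beat": ([0.0, 0.25], [[1.0], [0.0]]),
    "semantic": ([0.0], [[0.8, 0.2]]),
}
fused = fuse_features(streams, frame_rate=10, duration=0.4)
```

Slowly varying streams, such as segment-level semantic labels, are simply repeated across frames, so every fused row has the same dimensionality.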
[0032] Step S102: Generate an initial action sequence by performing action layering based on the audio feature sequence.
[0033] In this embodiment, the model's hierarchical action generator generates actions hierarchically based on the audio feature sequence to obtain the initial action sequence for the dexterous hand gesture dance. The model includes, but is not limited to, Hierarchical Conditional GAN or Diffusion Model architectures.
[0034] Referring to Figure 2, in one specific embodiment, step S102 includes steps S1021 to S1023, each of which is described in detail below.
[0035] Step S1021: Obtain high-level semantic features, melodic beat features, and low-level acoustic features based on the audio feature sequence.
[0036] In this embodiment, the audio feature sequence is encoded into a series of conditional latent vectors by the model's conditional encoder, and high-level semantic features, melodic beat features, and low-level acoustic features are obtained from the audio feature sequence as the data basis for layered action generation.
[0037] Step S1022: Generate an action style framework based on the high-level semantic features, generate a key pose sequence based on the melodic beat features, and generate action detail angles based on the low-level acoustic features.
[0038] In this embodiment, the model's hierarchical action generator generates corresponding actions based on the high-level semantic features, melodic beat features, and low-level acoustic features.
[0039] By way of example, the hierarchical action generator includes a macro-planning layer, a meso-structural layer, and a micro-execution layer, which map different levels of musical features to different levels of action planning.
[0040] The macro-planning layer generates a movement style framework based on the high-level semantic features (emotion, style) and overall rhythm of the music. For example, it determines whether the entire passage is dominated by large, wavy movements or delicate fingertip movements.
[0041] The meso-structural layer plans the phrasing and breathing of the movements based on the melodic beat features, generates key pose sequences and transition intentions, and ensures that the movement structure is synchronized with the musical structure.
[0042] The micro-execution layer generates motion detail angles based on the low-level acoustic features (especially high-frequency details and instantaneous energy), namely the fine finger joint angles, speeds, and accelerations for each frame, so that subtle movements (such as finger tremors) can accurately respond to the high-frequency harmonics or transients of the music (such as cymbal sounds).
[0043] Step S1023: Obtain the initial motion sequence based on the motion style framework, the key pose sequence, and the motion detail angle.
[0044] In this embodiment, the initial motion sequence is obtained by integrating the motion style framework, key pose sequence, and motion detail angle.
[0045] It can be understood that the style framework determines the overall topology of the hand configuration, the key poses anchor the combined finger joint angles at the strong beats, and the detail angles fine-tune the inter-frame transitions. The three are generated in parallel and then merged, which ensures both macro-level artistry and micro-level feasibility and avoids the technical defect, found in unified sequence generation, of high-frequency details drowning out low-frequency style tendencies.
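The merging of the three layers can be pictured with the toy sketch below. It is only an illustration under strong assumptions: the style framework is reduced to a single amplitude gain, key poses are linearly interpolated between beat-anchored frames, and detail angles are added as per-frame offsets. In the disclosure these outputs come from a learned hierarchical generator, which is not reproduced here, and all names are hypothetical.

```python
def interpolate_key_poses(key_poses, n_frames):
    """key_poses: list of (frame_index, [joint angles]) sorted by frame index.

    Linearly interpolates between neighbouring key poses and holds the
    first/last key pose outside the keyed range.
    """
    frames = []
    for f in range(n_frames):
        prev = max((kp for kp in key_poses if kp[0] <= f),
                   key=lambda kp: kp[0], default=key_poses[0])
        nxt = min((kp for kp in key_poses if kp[0] >= f),
                  key=lambda kp: kp[0], default=key_poses[-1])
        if nxt[0] == prev[0]:
            frames.append(list(prev[1]))
        else:
            w = (f - prev[0]) / (nxt[0] - prev[0])
            frames.append([a + w * (b - a) for a, b in zip(prev[1], nxt[1])])
    return frames

def merge_layers(style_amplitude, key_poses, detail_offsets):
    """Scale the interpolated key poses by the style gain, then add per-frame detail offsets."""
    base = interpolate_key_poses(key_poses, len(detail_offsets))
    return [[style_amplitude * b + d for b, d in zip(base_frame, detail_frame)]
            for base_frame, detail_frame in zip(base, detail_offsets)]

# Illustrative two-joint example (angles in degrees).
key_poses = [(0, [0.0, 0.0]), (4, [40.0, 20.0])]
detail_offsets = [[0.0, 0.0], [0.0, 0.0], [1.0, -1.0], [0.0, 0.0], [0.0, 0.0]]
initial_sequence = merge_layers(0.5, key_poses, detail_offsets)
```

Keeping the detail layer additive means a small high-frequency offset cannot change which key pose the hand is moving toward, which mirrors the separation of concerns described above.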
[0046] In one specific embodiment, step S1023 includes: obtaining a candidate action sequence based on the action style framework, the key pose sequence, and the action detail angle; and optimizing the candidate action sequence based on the audio feature sequence to obtain the initial action sequence.
[0047] In this embodiment, candidate action sequences are obtained based on the action style framework, key pose sequences, and action detail angles. The discriminator in the model receives the generated candidate action sequences and their corresponding audio feature sequences, and determines whether the candidate action sequences are authentic. After training, the discriminator can identify whether the actions are coordinated with the music in terms of emotion and rhythm, and whether the actions themselves are smooth and natural, thereby performing preliminary optimization of the action sequences to obtain the initial action sequences.
[0048] Step S103: Optimize the initial action sequence based on kinematics to obtain the target action sequence.
[0049] In this embodiment, the initial action sequence is kinematically optimized to obtain the target action sequence, ensuring that all generated actions can be executed safely and stably on a real dexterous hand.
[0050] In one specific embodiment, step S103 includes: performing joint limiting processing, self-collision processing, motion smoothing processing, and style filter processing on the initial motion sequence to obtain the target motion sequence.
[0051] In this embodiment, the initial motion sequence is sequentially processed by joint limiting, self-collision, motion smoothing, and style filter.
[0052] Specifically, joint limit checks are performed on the initial motion sequence to obtain the limit check results.
[0053] If the limit check result is abnormal, the joint angles of the initial action sequence are clipped to a safe range to obtain the clipped action sequence, and self-collision detection is performed on the clipped action sequence to obtain the collision detection result.
[0054] If the limit check result is normal, then perform self-collision detection on the initial action sequence to obtain the collision detection result.
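A minimal sketch of the joint limit check and angle clipping is given below. The per-joint limits and the two-joint sequence are made-up values for illustration; real limits would come from the specification of the dexterous hand in use.

```python
def check_joint_limits(sequence, limits):
    """Return (frame index, joint index) pairs whose angle lies outside its limits."""
    return [(f, j)
            for f, frame in enumerate(sequence)
            for j, angle in enumerate(frame)
            if not (limits[j][0] <= angle <= limits[j][1])]

def clip_to_limits(sequence, limits):
    """Clamp every joint angle into its [lower, upper] range."""
    return [[min(max(angle, lo), hi) for angle, (lo, hi) in zip(frame, limits)]
            for frame in sequence]

# Hypothetical limits in degrees for two joints.
limits = [(0.0, 90.0), (0.0, 110.0)]
sequence = [[10.0, 20.0], [95.0, -5.0], [45.0, 120.0]]

violations = check_joint_limits(sequence, limits)
clipped = clip_to_limits(sequence, limits) if violations else sequence
```

An empty violation list corresponds to a normal limit check result, in which case the sequence is passed on unchanged.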
[0055] As an example, a 1:1 scale 3D model of the dexterous hand is created. During the execution of a motion sequence that passes joint limit checks, the spatial relationships of each finger joint of the dexterous hand are calculated frame by frame. If a distance less than a preset threshold (e.g., 5mm) is detected, it is determined to be a potential collision.
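The distance-threshold test described above could look roughly like the following. This sketch assumes the 3D positions of the monitored points (here fingertips, in millimetres) have already been computed for every frame from the hand model; the forward kinematics step and the choice of which point pairs to monitor are not shown and are assumptions.

```python
import math
from itertools import combinations

def find_potential_collisions(frames, threshold_mm=5.0):
    """frames: list of dicts mapping a point name to an (x, y, z) position in mm.

    Returns (frame index, name_a, name_b) for every pair closer than the threshold.
    """
    hits = []
    for idx, points in enumerate(frames):
        for (name_a, pos_a), (name_b, pos_b) in combinations(sorted(points.items()), 2):
            if math.dist(pos_a, pos_b) < threshold_mm:
                hits.append((idx, name_a, name_b))
    return hits

# Illustrative positions: the two fingertips come within 3 mm in frame 1.
frames = [
    {"thumb_tip": (0.0, 0.0, 0.0), "index_tip": (30.0, 0.0, 0.0)},
    {"thumb_tip": (0.0, 0.0, 0.0), "index_tip": (3.0, 0.0, 0.0)},
]
collisions = find_potential_collisions(frames)
```

A practical implementation would exclude pairs that are mechanically adjacent, since neighbouring links are always close to each other.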
[0056] If the collision detection result is abnormal, the pose of the clipped motion sequence or the initial motion sequence that has passed the joint limit check is adjusted to avoid collision, resulting in an adjusted motion sequence. The smoothness of the adjusted motion sequence is then checked to obtain the smoothness check result.
[0057] If the collision detection result is normal, then the smoothness of motion is checked on the clipped motion sequence or the initial motion sequence that has passed the joint limit check and self-collision detection to obtain the smoothness check result.
[0058] As an example, the joint accelerations of adjacent frames in an action sequence that has passed joint limit checks and self-collision detection are calculated, and trajectory continuity is evaluated using cubic spline interpolation. If a sudden acceleration change exceeds a threshold, it is marked as a non-smooth point. Simultaneously, it is necessary to check whether the time intervals between action frames are uniform to avoid stuttering caused by frame rate fluctuations.
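The sketch below illustrates the two checks in this example for a single joint. Note that it swaps in second-order finite differences to estimate acceleration rather than the cubic spline evaluation mentioned above, and the thresholds and function names are assumptions chosen for illustration.

```python
def acceleration(angles, dt):
    """Second-order finite difference; entry i corresponds to frame i + 1."""
    return [(angles[i + 1] - 2 * angles[i] + angles[i - 1]) / (dt * dt)
            for i in range(1, len(angles) - 1)]

def non_smooth_points(angles, dt, max_accel_jump):
    """Frame indices where acceleration changes by more than max_accel_jump between frames."""
    acc = acceleration(angles, dt)
    return [i + 2 for i in range(len(acc) - 1)
            if abs(acc[i + 1] - acc[i]) > max_accel_jump]

def intervals_uniform(timestamps, tolerance):
    """True if the gaps between consecutive frame timestamps differ by at most tolerance."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max(gaps) - min(gaps) <= tolerance

# Illustrative single-joint trajectory with a sudden jump between frames 3 and 4.
angles = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
marked = non_smooth_points(angles, dt=1.0, max_accel_jump=5.0)
uniform = intervals_uniform([0.0, 0.02, 0.04, 0.06], tolerance=1e-6)
```

The marked frames are the candidates that the subsequent smoothing filter would need to treat.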
[0059] If the smoothness check result is abnormal, motion smoothing filtering is applied to the adjusted motion sequence, or to the initial motion sequence that has passed the joint limit check and self-collision detection, to obtain the filtered motion sequence; style filter detection is then performed on the filtered motion sequence to obtain the style filter detection result.
[0060] If the smoothness check result is normal, style filter detection is performed on the adjusted motion sequence, or on the initial motion sequence that has passed the joint limit check, self-collision detection, and motion smoothness check, to obtain the style filter detection result.
[0061] If the style filter detection result is abnormal, the filtered motion sequence, or the initial motion sequence that has passed the joint limit check, self-collision detection, and motion smoothness check, is modified by the style filter to obtain the target motion sequence.
[0062] If the style filter detection result is normal, the adjusted motion sequence, or the initial motion sequence that has passed the joint limit check, self-collision detection, motion smoothness check, and style filter detection, is used as the target motion sequence.
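The check-then-correct flow of the four stages can be summarized by the sketch below, in which each stage runs its check and applies its correction only when the check fails. The single-joint sequence, the thresholds, the moving-average smoother, and the placeholder collision and style stages are all assumptions made to keep the example short; they stand in for the checks described above rather than implement them.

```python
def moving_average(seq):
    """Three-point moving average with shortened windows at the ends."""
    out = []
    for i in range(len(seq)):
        window = seq[max(i - 1, 0):i + 2]
        out.append(sum(window) / len(window))
    return out

def optimize(sequence, stages):
    """Run each (name, is_ok, fix) stage in order; fix only when the check fails."""
    applied = []
    for name, is_ok, fix in stages:
        if not is_ok(sequence):
            sequence = fix(sequence)
            applied.append(name)
    return sequence, applied

stages = [
    ("joint_limit",
     lambda s: all(0 <= a <= 90 for a in s),
     lambda s: [min(max(a, 0), 90) for a in s]),
    ("self_collision", lambda s: True, lambda s: s),  # placeholder: always passes
    ("smoothing",
     lambda s: all(abs(b - a) <= 30 for a, b in zip(s, s[1:])),
     moving_average),
    ("style_filter", lambda s: True, lambda s: s),    # placeholder: always passes
]

target, applied = optimize([0, 10, 120, 20, 30], stages)
```

Because each stage receives whatever the previous stage produced, a sequence that passes every check flows through unchanged, which matches the "normal" branches described above.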
[0063] Step S104: Control the dexterous hand to perform a gesture dance according to the target action sequence.
[0064] In this embodiment, the dexterous hand is controlled to precisely execute gesture dance movements that are deeply associated with audio information according to the target action sequence. This can be applied to fields such as robot performance, virtual idol driving, music visualization interaction, special education (sign language learning assistance), and metaverse virtual human animation generation.
[0065] Referring to Figure 3, in one specific embodiment, step S104 includes steps S1041 to S1042, each of which is described in detail below.
[0066] Step S1041: Generate executable driving instructions based on the target action sequence.
[0067] In this embodiment, executable driving instructions are generated based on the target action sequence, so that the dexterous hand can directly perform actions based on the instructions.
[0068] In one specific embodiment, step S1041 includes: performing instruction mapping based on the target action sequence to obtain a first instruction; performing joint space-to-motor angle mapping based on the first instruction to obtain a second instruction; performing dynamic feedforward compensation based on the second instruction to obtain a third instruction; and encapsulating the third instruction according to the protocol corresponding to the dexterous hand to obtain the executable drive instruction.
[0069] In this embodiment, a first instruction is obtained by instruction mapping based on the target action sequence. A second instruction is obtained by mapping the joint space to motor angles based on the first instruction, which adapts to the nonlinear transmission ratios unique to dexterous hands (such as tendon sheath stretching and gear backlash) and eliminates static position errors under open-loop control. A third instruction is obtained by performing dynamic feedforward compensation based on the second instruction, for example by pre-compensating acceleration- and velocity-related torques based on the dynamics model of the dexterous hand (including finger mass, inertia, and Coulomb friction terms), which significantly suppresses trajectory tracking lag during high-speed actions. Finally, the third instruction is encapsulated according to the protocol corresponding to the dexterous hand to obtain the executable drive instruction, which can be used immediately.
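The chain from joint command to drive instruction might be sketched as follows. Everything specific here is hypothetical: the linear joint-to-motor mapping with a fixed offset, the simple inertia, viscous, and Coulomb friction torque model, and in particular the 7-byte packet layout, which does not correspond to any real dexterous hand protocol.

```python
import struct

def joint_to_motor(joint_deg, ratio, offset_deg=0.0):
    """Map a joint angle to a motor angle (assumed linear ratio plus a fixed offset)."""
    return joint_deg * ratio + offset_deg

def feedforward_torque(vel, acc, inertia, viscous, coulomb):
    """Feedforward torque: inertia * acceleration + viscous * velocity + Coulomb friction."""
    sign = (vel > 0) - (vel < 0)
    return inertia * acc + viscous * vel + coulomb * sign

def encapsulate(motor_id, motor_deg, torque_nm):
    """Hypothetical little-endian frame:
    header 0xAA (uint8), motor id (uint8), angle in 0.01 deg (int16),
    torque in mN*m (int16), checksum = sum of previous bytes mod 256 (uint8).
    """
    payload = struct.pack("<BBhh", 0xAA, motor_id,
                          round(motor_deg * 100), round(torque_nm * 1000))
    return payload + bytes([sum(payload) & 0xFF])

# Illustrative values for one joint in one frame.
motor_deg = joint_to_motor(30.0, ratio=2.0, offset_deg=0.5)
torque = feedforward_torque(vel=1.0, acc=2.0, inertia=0.001, viscous=0.002, coulomb=0.005)
packet = encapsulate(motor_id=3, motor_deg=motor_deg, torque_nm=torque)
```

A nonlinear transmission, such as a tendon drive, would replace the linear mapping with a calibrated lookup table or fitted curve.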
[0070] Step S1042: Control the dexterous hand to perform the dexterous hand gesture dance according to the executable drive instruction.
[0071] In this embodiment, the dexterous hand is controlled to perform dexterous hand gestures according to executable driving instructions, which lowers the threshold for engineering deployment and accelerates the transformation and application of technology.
[0072] The dexterous hand gesture dance generation method proposed in this embodiment analyzes the audio signal to obtain an audio feature sequence; generates an initial action sequence by layering the actions based on the audio feature sequence; optimizes the initial action sequence based on kinematics to obtain a target action sequence; and controls the dexterous hand to perform a gesture dance according to the target action sequence. In this way, layered action generation based on audio features deeply associates music and movement, resulting in actions that not only follow the rhythm but also express the emotion and color of the melody, possessing human-like creativity. Simultaneously, kinematic optimization ensures that all generated actions can be safely and stably executed on a real dexterous hand.
Embodiment 2
[0073] Furthermore, this disclosure provides a dexterous hand gesture dance motion generation device 400. Referring to Figure 4, the device includes:
the feature extraction module 401, used to analyze the audio signal to obtain an audio feature sequence;
the action generation module 402, used to perform layered action generation based on the audio feature sequence to obtain an initial action sequence;
the motion optimization module 403, used to optimize the initial motion sequence based on kinematics to obtain the target motion sequence; and
the motion control module 404, used to control the dexterous hand to perform a gesture dance according to the target motion sequence.
[0074] In an optional implementation, the motion generation module 402 is further configured to obtain high-level semantic features, melodic beat features, and low-level acoustic features based on the audio feature sequence; generate a motion style framework based on the high-level semantic features; generate a key pose sequence based on the melodic beat features; generate motion detail angles based on the low-level acoustic features; and obtain the initial motion sequence based on the motion style framework, the key pose sequence, and the motion detail angles.
[0075] In an optional implementation, the motion generation module 402 is further configured to obtain a candidate motion sequence based on the motion style framework, the key pose sequence, and the motion detail angle; and to optimize the candidate motion sequence based on the audio feature sequence to obtain the initial motion sequence.
[0076] In an optional implementation, the motion optimization module 403 is further configured to perform joint limiting processing, self-collision processing, motion smoothing processing, and style filter processing on the initial motion sequence to obtain the target motion sequence.
[0077] In an optional implementation, the motion optimization module 403 is further configured to: perform a joint limit check on the initial motion sequence to obtain a limit check result; if the limit check result is abnormal, angularly clip the initial motion sequence to obtain a clipped motion sequence, and perform self-collision detection on the clipped motion sequence to obtain a collision detection result; if the limit check result is normal, perform self-collision detection on the initial motion sequence to obtain the collision detection result; if the collision detection result is abnormal, adjust the pose of the clipped motion sequence or the initial motion sequence to obtain an adjusted motion sequence, and perform a motion smoothness check on the adjusted motion sequence to obtain a smoothness check result; if the collision detection result is normal, perform a motion smoothness check on the clipped motion sequence or the initial motion sequence to obtain the smoothness check result; if the smoothness check result is abnormal, perform motion smoothing filtering on the adjusted motion sequence or the initial motion sequence to obtain a filtered motion sequence, and perform style filter detection on the filtered motion sequence to obtain a style filter detection result; if the smoothness check result is normal, perform style filter detection on the adjusted motion sequence or the initial motion sequence to obtain the style filter detection result; if the style filter detection result is abnormal, modify the filtered motion sequence or the initial motion sequence with the style filter to obtain the target motion sequence; and if the style filter detection result is normal, use the adjusted motion sequence or the initial motion sequence as the target motion sequence.
[0078] In an optional implementation, the motion control module 404 is further configured to generate executable driving instructions based on the target motion sequence; and control the dexterous hand to perform the gesture dance based on the executable driving instructions.
[0079] In an optional implementation, the motion control module 404 is further configured to perform instruction mapping based on the target motion sequence to obtain a first instruction; perform joint space-to-motor angle mapping based on the first instruction to obtain a second instruction; perform dynamic feedforward compensation based on the second instruction to obtain a third instruction; and encapsulate the third instruction according to the protocol corresponding to the dexterous hand to obtain the executable drive instruction.
[0080] The apparatus provided in this embodiment can execute the steps of the dexterous hand gesture dance motion generation method provided in Embodiment 1. To avoid repetition, it will not be described again.
[0081] The dexterous hand gesture dance generation device proposed in this embodiment analyzes audio signals to obtain audio feature sequences; generates initial action sequences by layering actions based on the audio feature sequences; optimizes the initial action sequences based on kinematics to obtain target action sequences; and controls the dexterous hand to perform gesture dances according to the target action sequences. In this way, layered action generation based on audio features deeply associates music and movement, resulting in generated actions that not only follow the rhythm but also express the emotion and color of the melody, possessing human-like creativity. Simultaneously, kinematic optimization ensures that all generated actions can be safely and stably executed on a real dexterous hand.
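As an illustration of how layered generation can tie each level of audio feature to a different level of motion, the following Python sketch composes one frame from three layers: a style amplitude chosen from a high-level semantic label, a key pose toggled on each beat, and a small detail offset scaled by a low-level acoustic value. The feature fields (`mood`, `on_beat`, `energy`), the amplitude table, and the open/closed toggle are hypothetical stand-ins, not the generation model described in this disclosure.

```python
from typing import List, NamedTuple


class AudioFeature(NamedTuple):
    mood: str       # high-level semantic feature (hypothetical label)
    on_beat: bool   # rhythm feature: does a beat fall on this frame?
    energy: float   # low-level acoustic feature, assumed in [0, 1]


# Hypothetical style framework: flexion amplitude (degrees) per mood.
STYLE_AMPLITUDE = {"calm": 20.0, "lively": 60.0}
DEFAULT_AMPLITUDE = 40.0
DETAIL_RANGE_DEG = 5.0   # maximum detail offset added by the acoustic layer


def generate_initial_sequence(features: List[AudioFeature],
                              n_joints: int = 3) -> List[List[float]]:
    """Compose style, key pose, and detail layers into joint-angle frames."""
    sequence = []
    pose = 0.0  # 0.0 = open hand, 1.0 = closed hand
    for f in features:
        amplitude = STYLE_AMPLITUDE.get(f.mood, DEFAULT_AMPLITUDE)  # style layer
        if f.on_beat:                                               # key pose layer
            pose = 1.0 - pose
        detail = DETAIL_RANGE_DEG * f.energy                        # detail layer
        sequence.append([pose * amplitude + detail] * n_joints)
    return sequence
```

The resulting frames are only an initial sequence; they would still pass through the kinematic optimization stage before being sent to the hand.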
[0082] Embodiment 3. Furthermore, this disclosure provides a computer device including a memory and a processor. The memory stores a computer program which, when executed by the processor, implements the dexterous hand gesture dance motion generation method described in Embodiment 1.
[0083] The device provided in this embodiment can execute the steps of the dexterous hand gesture dance motion generation method provided in Embodiment 1. To avoid repetition, it will not be described again.
[0084] Embodiment 4. This disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the dexterous hand gesture dance motion generation method described in Embodiment 1.
[0085] In this embodiment, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, etc.
[0086] The computer-readable storage medium provided in this embodiment can implement the dexterous hand gesture dance motion generation method provided in Embodiment 1. To avoid repetition, it will not be described again here.
[0087] In all examples shown and described herein, any specific values should be interpreted as merely exemplary and not as limitations; therefore, other examples of exemplary embodiments may have different values.
[0088] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0089] The above-described embodiments are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention.
Claims
1. A method for generating dexterous hand gesture dance movements, characterized in that, The method includes: The audio signal is analyzed to obtain the audio feature sequence; Based on the audio feature sequence, action layering is performed to generate an initial action sequence; The initial action sequence is optimized based on kinematics to obtain the target action sequence; Control the dexterous hand to perform a gesture dance according to the target action sequence.
2. The method for generating dexterous hand gesture dance movements according to claim 1, characterized in that, The step of performing action layering based on the audio feature sequence to generate an initial action sequence includes: Based on the audio feature sequence, high-level semantic features, melodic rhythm features, and low-level acoustic features are obtained; An action style framework is generated based on the high-level semantic features, a key pose sequence is generated based on the melodic rhythm features, and action detail angles are generated based on the low-level acoustic features; The initial action sequence is obtained based on the action style framework, the key pose sequence, and the action detail angles.
3. The method for generating dexterous hand gesture dance movements according to claim 2, characterized in that, The step of obtaining the initial action sequence based on the action style framework, the key pose sequence, and the action detail angles includes: A candidate action sequence is obtained based on the action style framework, the key pose sequence, and the action detail angles; The candidate action sequence is optimized based on the audio feature sequence to obtain the initial action sequence.
4. The method for generating dexterous hand gesture dance movements according to claim 1, characterized in that, The step of optimizing the initial action sequence based on kinematics to obtain the target action sequence includes: The initial action sequence is subjected to joint limiting processing, self-collision processing, motion smoothing processing, and style filter processing to obtain the target action sequence.
5. The method for generating dexterous hand gesture dance movements according to claim 4, characterized in that, The step of performing joint limiting processing, self-collision processing, motion smoothing processing, and style filter processing on the initial action sequence to obtain the target action sequence includes: A joint limit check is performed on the initial action sequence to obtain a limit check result; If the limit check result is abnormal, the joint angles of the initial action sequence are clipped to obtain a clipped action sequence, and self-collision detection is performed on the clipped action sequence to obtain a collision detection result; If the limit check result is normal, self-collision detection is performed on the initial action sequence to obtain the collision detection result; If the collision detection result is abnormal, the pose of the clipped action sequence or the initial action sequence is adjusted to obtain an adjusted action sequence, and a motion smoothness check is performed on the adjusted action sequence to obtain a smoothness check result; If the collision detection result is normal, the motion smoothness check is performed on the clipped action sequence or the initial action sequence to obtain the smoothness check result; If the smoothness check result is abnormal, motion smoothing filtering is applied to the adjusted action sequence or the initial action sequence to obtain a filtered action sequence, and style filter detection is performed on the filtered action sequence to obtain a style filter detection result; If the smoothness check result is normal, style filter detection is performed on the adjusted action sequence or the initial action sequence to obtain the style filter detection result; If the style filter detection result is abnormal, filter modification is applied to the filtered action sequence or the initial action sequence to obtain the target action sequence; If the style filter detection result is normal, the adjusted action sequence or the initial action sequence is taken as the target action sequence.
6. The method for generating dexterous hand gesture dance movements according to claim 1, characterized in that, The step of controlling a dexterous hand to perform a gesture dance according to the target action sequence includes: Generate executable driving instructions based on the target action sequence; The executable driving instructions control the dexterous hand to perform the gesture dance.
7. The method for generating dexterous hand gesture dance movements according to claim 6, characterized in that, The step of generating executable driving instructions based on the target action sequence includes: Instruction mapping is performed based on the target action sequence to obtain a first instruction; Based on the first instruction, joint-space-to-motor-angle mapping is performed to obtain a second instruction; Based on the second instruction, dynamic feedforward compensation is performed to obtain a third instruction; The third instruction is encapsulated according to the protocol corresponding to the dexterous hand to obtain the executable driving instructions.
8. A device for generating dexterous hand gesture dance movements, characterized in that, The device includes: The feature extraction module is used to analyze the audio signal and obtain the audio feature sequence; The action generation module is used to perform action layering based on the audio feature sequence to generate an initial action sequence; The motion optimization module is used to optimize the initial action sequence based on kinematics to obtain the target action sequence; The motion control module is used to control the dexterous hand to perform a gesture dance according to the target action sequence.
9. A computer device, characterized in that, It includes a memory and a processor, the memory storing a computer program that, when executed by the processor, implements the method for generating dexterous hand gesture dance movements as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, It stores a computer program that, when executed by a processor, implements the method for generating dexterous hand gesture dance movements as described in any one of claims 1 to 7.