A method and system for recovering human motion capture data
By using a combined model of Transformer encoder and multilayer perceptron, the problems of insufficient efficiency and effectiveness in motion capture data recovery are solved, achieving efficient data recovery and accurate feedback.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF JINAN
- Filing Date
- 2023-04-06
- Publication Date
- 2026-06-23
Figure CN116416680B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of motion capture data recovery technology, and specifically relates to a method and system for recovering human motion capture data.
Background Technology
[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.
[0003] Motion capture (MoCap) data is a digital representation of human movement recorded by a professional motion capture system. It is widely used in many fields such as 3D animation, film production, medical analysis, human-computer interaction, and virtual reality. How to process and analyze this motion capture data has been a hot research topic in computer graphics and animation for decades.
[0004] However, even with professional motion capture equipment, the original motion capture data may be incomplete or corrupted during the motion capture process due to occlusion issues. Therefore, the missing motion must be recovered from the original data before use.
[0005] Motion capture data is essentially sequential data, making it suitable for processing with recurrent neural networks (RNNs). Recently, researchers have used Long Short-Term Memory (LSTM) networks as the basic framework of their models, adding constraints, self-attention mechanisms, bidirectional structures, and varying depths. However, these methods do not change the sequential computational nature of RNNs, which limits both model efficiency and recovery accuracy.
Summary of the Invention
[0006] To address the aforementioned problems, this invention proposes a method and system for recovering human motion capture data. This invention explores and utilizes the spatial and temporal correlations in motion capture data for recovery.
[0007] According to some embodiments, the first solution of the present invention provides a method for recovering human motion capture data, which adopts the following technical solution:
[0008] A method for recovering human motion capture data includes:
[0009] Acquire a motion capture data matrix, which consists of sequentially arranged frames;
[0010] Based on linear projection, each frame of motion capture data in the motion capture data matrix is mapped to a high-dimensional feature space, and spatial location information is embedded into the frame data to obtain high-dimensional single-frame data; and a self-attention mechanism is used to extract the spatial features of the correlation between all marker points in the high-dimensional single-frame data to obtain high-dimensional single-frame spatial features.
[0011] Temporal information is embedded into all high-dimensional single-frame spatial features of the motion capture data matrix, and the temporal features of the correlation between high-dimensional single-frame spatial features are obtained by using a self-attention mechanism to obtain the spatiotemporal motion capture data matrix.
[0012] Based on the spatiotemporal motion capture data matrix, a multilayer perceptron is used to reconstruct complete motion capture data.
[0013] Furthermore, the motion capture data matrix consists of a series of frames, where each frame records the 3D position of all marker points.
[0014] Furthermore, the extraction of spatial features of correlations among all marker points in high-dimensional single-frame data using a self-attention mechanism specifically involves:
[0015] Self-attention extraction of high-dimensional single-frame data based on Transformer encoder;
[0016] The extracted self-attention features are residually connected to capture the correlation between all marker points in the high-dimensional single-frame data;
[0017] High-dimensional single-frame spatial features are obtained.
[0018] Furthermore, the method of utilizing the self-attention mechanism to obtain the temporal features of the correlation between high-dimensional single-frame spatial features specifically involves:
[0019] Self-attention extraction of high-dimensional motion capture data matrices with embedded temporal information based on Transformer encoder;
[0020] The extracted self-attention features are subjected to residual connections to capture the correlation between high-dimensional single-frame spatial features in each frame.
[0021] The spatiotemporal motion capture data matrix is obtained.
[0022] Furthermore, the Transformer encoder consists of multi-head self-attention (MSA) blocks and multilayer perceptron (MLP) blocks, with a residual connection following each block.
[0023] Furthermore, the Transformer encoder performs L iterations of calculation, specifically:
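[0024] $Z'_l = \mathrm{MSA}(\mathrm{LN}(Z_{l-1})) + Z_{l-1}, \quad l = 1, 2, \ldots, L$
[0025] $Z_l = \mathrm{MLP}(\mathrm{LN}(Z'_l)) + Z'_l, \quad l = 1, 2, \ldots, L$
[0026] $Y = \mathrm{LN}(Z_L)$
[0027] where $\mathrm{LN}(\cdot)$ denotes the layer normalization operator, MSA and MLP denote the multi-head self-attention block and the multilayer perceptron block, $Z_l$ denotes the data produced by the $l$-th computation of the encoder, $l$ is the current iteration index, and there are $L$ iterations in total.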
[0028] Furthermore, based on the spatiotemporal motion capture data matrix, a multilayer perceptron is used to reduce the data dimensionality and restore the complete motion capture data.
[0029] According to some embodiments, the second aspect of the present invention provides a human motion capture data recovery system, which adopts the following technical solution:
[0030] A human motion capture data recovery system includes:
[0031] The data acquisition module is configured to acquire a motion capture data matrix, which consists of sequentially arranged frames;
[0032] The spatial feature extraction module is configured to map each frame of motion capture data matrix to a high-dimensional feature space based on linear projection, and embed spatial location information into the frame data to obtain high-dimensional single-frame data; and use a self-attention mechanism to extract the spatial features of the correlation between all marker points in the high-dimensional single-frame data to obtain high-dimensional single-frame spatial features.
[0033] The temporal feature extraction module is configured to embed temporal information into all high-dimensional single-frame spatial features of the motion capture data matrix, and use a self-attention mechanism to obtain the temporal features of the correlation between high-dimensional single-frame spatial features to obtain the spatiotemporal motion capture data matrix.
[0034] The data recovery module is configured to restore complete motion capture data using a multilayer perceptron based on a spatiotemporal motion capture data matrix.
[0035] According to some embodiments, a third aspect of the present invention provides a computer-readable storage medium.
[0036] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of a human motion capture data recovery method as described in the first aspect above.
[0037] According to some embodiments, a fourth aspect of the present invention provides a computer device.
[0038] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of a human motion capture data recovery method as described in the first aspect above.
[0039] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0040] This invention uses a Transformer to learn the spatiotemporal dependencies between motion capture data, exploring and utilizing the spatial and temporal correlations in the motion capture data for recovery. Structurally, it overcomes the limitation that traditional RNN models cannot be parallelized: the number of operations required to calculate the correlation between two positions does not increase with their distance, which greatly reduces model training time. In terms of recovery performance, it uses temporal and spatial modules to learn the spatiotemporal information of the data separately, enhancing the model's understanding of the data and resulting in excellent recovery performance. In addition, the novel and comprehensive loss function enables the model to comprehensively measure the effect of data recovery and provide accurate feedback information.
Attached Figure Description
[0041] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0042] Figure 1 is a flowchart of a human motion capture data recovery method according to an embodiment of the present invention;
[0043] Figure 2 is a schematic diagram of the spatiotemporal Transformer network structure described in an embodiment of the present invention;
[0044] Figure 3 is a diagram of the recovered markers for a basketball motion in an embodiment of the present invention.
Detailed Implementation
[0045] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0046] It should be noted that the following detailed description is illustrative and intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0047] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.
[0048] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.
[0049] Example 1
[0050] This embodiment provides a method for recovering human motion capture data. The method is illustrated here as applied to a server. It is understood that the method can also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network servers, cloud communication, middleware services, domain name services, CDN security services, and big data and artificial intelligence platforms. The terminal can be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, etc., but is not limited to these. The terminal and server can be directly or indirectly connected via wired or wireless communication, which is not limited herein. In this embodiment, the method includes the following steps:
[0051] Step S01: Obtain the motion capture data matrix, which consists of sequentially arranged frames;
[0052] Step S02: Based on linear projection, map each frame of motion capture data matrix to a high-dimensional feature space, and embed spatial location information into the frame data to obtain high-dimensional single-frame data; and use a self-attention mechanism to extract the spatial features of the correlation between all marker points in the high-dimensional single-frame data to obtain high-dimensional single-frame spatial features.
[0053] Step S03: Embed the temporal information into all the high-dimensional single-frame spatial features of the motion capture data matrix, and use the self-attention mechanism to obtain the temporal features of the correlation between the high-dimensional single-frame spatial features to obtain the spatiotemporal motion capture data matrix.
[0054] Step S04: Based on the spatiotemporal motion capture data matrix, use a multilayer perceptron to reconstruct the complete motion capture data.
[0055] This embodiment makes full use of data correlation and proposes a novel neural network model for motion capture data recovery. It has two levels of Transformer, namely a spatial Transformer encoder and a temporal Transformer encoder, followed by a regression head. The first and second level Transformers explore and utilize spatial and temporal correlations, respectively.
[0056] To learn the spatiotemporal dependencies between motion capture data, this embodiment uses a Transformer to construct the backbone network. Such a backbone is adept at uncovering correlations in sequence data through its self-attention mechanism. As shown in Figure 2, the motion capture data recovery neural network model proposed in this embodiment consists of a two-stage Transformer (spatial Transformer and temporal Transformer) and a regression head. The spatial Transformer encoder processes each frame, and then the temporal Transformer encoder processes the features extracted from all frames. Finally, the regression head reconstructs the complete motion capture data.
[0057] Specifically, in step S01, the motion capture data matrix consists of a series of frames, where each frame records the 3D position of all marker points.
[0058] For step S02, the spatial Transformer encoder
[0059] The spatial Transformer encoder in the motion capture data recovery neural network model is used to extract and utilize the spatial features of the correlation between marker points in each frame of the motion capture data matrix. Specifically:
[0060] The correlation between marker points reflects the characteristics of different actions. This step aims to extract and utilize the spatial features of the correlation between marker points in each frame of data. A separate spatial Transformer encoder was designed for each frame.
[0061] Through a learnable linear projection, each frame $x_i \in \mathbb{R}^{3J}$ is mapped to a high-dimensional feature space $\mathbb{R}^{J \times C}$;
[0062] Spatial location information $E_{pos} \in \mathbb{R}^{J \times C}$ is embedded into it to obtain the high-dimensional single-frame data;
[0063] The high-dimensional single-frame data is sent to the spatial Transformer encoder, which uses its self-attention mechanism to capture the spatial features of the correlation between marker points in the high-dimensional single-frame data.
[0064] The spatial Transformer encoder employs a Transformer encoder, consisting of a multi-head self-attention (MSA) block and an MLP block. The Transformer's self-attention mechanism helps capture the correlations between marker points, resulting in high-dimensional single-frame spatial features extracted from the i-th frame. The above process is applied to each frame of data in the motion capture data matrix to obtain a series of spatial feature combinations composed of high-dimensional single-frame spatial features.
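As a non-limiting illustration, the embedding step that precedes the spatial Transformer encoder can be sketched in plain Python. The sketch assumes that the projection is applied per marker (3 coordinates to C channels), which is only one possible reading of the mapping from $\mathbb{R}^{3J}$ to $\mathbb{R}^{J \times C}$. The names `embed_frame`, `weight`, `bias`, and `pos` are hypothetical, and in a trained model the projection weights and the positional embedding would be learned rather than fixed.

```python
def embed_frame(frame, weight, bias, pos):
    """Map one frame (flat list of 3*J coordinates) to J tokens of C channels.

    Assumption: each marker's (x, y, z) is projected independently by the
    same 3 x C weight matrix; pos is a J x C positional embedding.
    """
    if len(frame) % 3 != 0:
        raise ValueError("frame length must be a multiple of 3")
    num_markers = len(frame) // 3
    channels = len(bias)
    tokens = []
    for j in range(num_markers):
        xyz = frame[3 * j: 3 * j + 3]
        token = [
            sum(xyz[k] * weight[k][c] for k in range(3)) + bias[c] + pos[j][c]
            for c in range(channels)
        ]
        tokens.append(token)
    return tokens


# Toy example: J = 2 markers, C = 4 channels.
weight = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0, 1.0],
          [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.0, 0.0]
pos = [[0.0] * 4, [0.5] * 4]
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tokens = embed_frame(frame, weight, bias, pos)
```

Each of the J resulting tokens is what the self-attention mechanism would then compare against the others to capture correlations between marker points.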
[0065] For step S03, the temporal Transformer encoder
[0066] The temporal Transformer encoder in the motion capture data recovery neural network model is used to extract and utilize the temporal features of the correlation between the high-dimensional single-frame spatial features output by the spatial Transformer encoder. Specifically:
[0067] There are specific correlations between each frame of motion; therefore, this step aims to extract and utilize the temporal features of the correlations between captured frames.
[0068] The spatial features extracted by the spatial Transformer encoder are combined into $Z = (z_1, z_2, \ldots, z_i, \ldots, z_N) \in \mathbb{R}^{N \times J \cdot C}$, and temporal information is embedded before $Z$ is sent to the temporal Transformer encoder;
[0069] The self-attention mechanism of the temporal Transformer encoder is used to capture the temporal features of the correlation between the high-dimensional single-frame spatial features of the frames in the combined spatial features.
[0070] The self-attention mechanism of the temporal Transformer encoder helps capture the correlation between frames. Its output $Y \in \mathbb{R}^{N \times J \cdot C}$ is sent to the regression head, which generates the recovered motion capture data $X_{rec} \in \mathbb{R}^{N \times 3J}$.
[0071] Note that the regression head is a simple MLP block with layer norm and linear layers.
[0072] The spatial Transformer encoder of step S02 and the temporal Transformer encoder of step S03 both use the Transformer encoder model.
[0073] The Transformer encoder model consists of a multi-head self-attention (MSA) block and an MLP block. A layer norm is applied before each block, and a residual connection is applied after each block. The Transformer encoder and the initial embedded features Z0 are used for L iterations of computation, as shown below:
[0074] $Z'_l = \mathrm{MSA}(\mathrm{LN}(Z_{l-1})) + Z_{l-1}, \quad l = 1, 2, \ldots, L$
[0075] $Z_l = \mathrm{MLP}(\mathrm{LN}(Z'_l)) + Z'_l, \quad l = 1, 2, \ldots, L$
[0076] $Y = \mathrm{LN}(Z_L)$
[0077] where $\mathrm{LN}(\cdot)$ denotes the layer normalization operator, $Z_l$ denotes the data produced by the $l$-th computation of the encoder, $l$ is the current iteration index, and there are $L$ iterations in total. The value of $L$ can be adjusted, and the default value is 4.
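As a non-limiting illustration, the pre-norm residual iteration of the Transformer encoder can be sketched in plain Python. The sketch deliberately simplifies both blocks: the attention uses a single head with identity query, key, and value projections, and the MLP block is replaced by an element-wise ReLU. A real encoder would use multi-head attention and learned weights that differ from layer to layer. All function names are hypothetical.

```python
import math


def layer_norm(rows, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def self_attention(rows):
    """Single-head scaled dot-product attention with identity Q/K/V projections.

    Simplification: a real MSA block has several heads and learned projections.
    """
    dim = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in rows]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, rows)) for c in range(dim)])
    return out


def mlp(rows):
    """Stand-in for the MLP block: an element-wise ReLU (no learned weights)."""
    return [[max(0.0, v) for v in row] for row in rows]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (the residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encoder(z0, num_layers=4):
    """Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}; Z_l = MLP(LN(Z'_l)) + Z'_l; Y = LN(Z_L)."""
    z = z0
    for _ in range(num_layers):
        z_prime = add(self_attention(layer_norm(z)), z)
        z = add(mlp(layer_norm(z_prime)), z_prime)
    return layer_norm(z)


z0 = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
y = encoder(z0)
```

The output keeps the shape of the input, so the same loop can serve as either the spatial stage (rows are marker tokens) or the temporal stage (rows are frames).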
[0078] For step S04, the regression head
[0079] The regression head is a simple MLP block with layer norm and linear layers.
[0080] Based on the spatiotemporal motion capture data matrix, a multilayer perceptron is used to reduce the dimensionality of the data and restore the complete motion capture data.
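As a non-limiting illustration, the regression head can be sketched as a layer normalization followed by a single linear layer that reduces each frame from J·C features to 3J coordinates. Using exactly one linear layer is an assumption, since the text only states that the head contains layer norm and linear layers. The names `regression_head`, `weight`, and `bias` are hypothetical.

```python
import math


def regression_head(features, weight, bias, eps=1e-5):
    """Layer norm followed by one linear layer, applied to each frame's features."""
    out = []
    for row in features:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        normed = [(v - mean) / math.sqrt(var + eps) for v in row]
        out.append([
            sum(n * weight[d][o] for d, n in enumerate(normed)) + bias[o]
            for o in range(len(bias))
        ])
    return out


# Toy sizes: N = 2 frames, J*C = 4 features in, 3J = 3 coordinates out (J = 1).
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]]
weight = [[0.0] * 3 for _ in range(4)]   # all-zero weights keep the output easy to check
bias = [0.1, 0.2, 0.3]
x_rec = regression_head(features, weight, bias)
```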
[0081] As shown in Figure 2, in a specific embodiment, during the training of the motion capture data recovery neural network model, the complete data is occluded to obtain damaged data so that the effect of data recovery can be better compared, and the damaged data is then input into the motion capture data recovery neural network model described in this embodiment for training.
[0082] The preparation of the dataset is as follows:
[0083] For a motion clip, the motion capture data matrix consists of a series of frames (poses), where each frame records the 3D position of all marker points.
[0084] The motion capture data matrix is represented as $X = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^{N \times 3J}$, where $N$ is the number of frames, $J$ is the number of markers in the human skeleton, and the frame at the $i$-th time step ($1 \le i \le N$) is represented as $x_i \in \mathbb{R}^{3J}$.
[0085] The symbol $X_{tru}$ denotes complete data and $X_{cor}$ denotes corrupted data, where $X_{cor} = X_{tru} \odot M$, $M \in \mathbb{R}^{N \times 3J}$. The masking matrix $M$ consists of elements 0 and 1, where 0 masks the original data and 1 retains the original data.
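As a non-limiting illustration, the corruption $X_{cor} = X_{tru} \odot M$ can be sketched in plain Python. The sketch assumes that a missing marker is hidden as a whole, so that its x, y, and z entries are zeroed together over a contiguous run of frames. The helper names `build_mask` and `corrupt` and the toy values are hypothetical.

```python
def build_mask(num_frames, num_markers, missing_markers, start, length):
    """Return an N x 3J mask of 1s with 0s for the chosen markers and frames.

    A marker is hidden as a whole, so its x, y, and z columns are zeroed
    together for frames start .. start + length - 1.
    """
    mask = [[1] * (3 * num_markers) for _ in range(num_frames)]
    for i in range(start, min(start + length, num_frames)):
        for j in missing_markers:
            for k in range(3):
                mask[i][3 * j + k] = 0
    return mask


def corrupt(x_tru, mask):
    """Element-wise product X_cor = X_tru * M."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(x_tru, mask)]


# Toy clip: N = 4 frames, J = 2 markers, so each frame has 6 values.
x_tru = [[float(10 * i + c) for c in range(1, 7)] for i in range(4)]
mask = build_mask(num_frames=4, num_markers=2, missing_markers=[1], start=1, length=2)
x_cor = corrupt(x_tru, mask)
```

Here marker 1 is occluded in frames 1 and 2, while frames 0 and 3 are left intact.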
[0086] The specific training process:
[0087] The dataset was divided into a test set and a training set in a 3:7 ratio.
[0088] Based on the training set, the motion capture data recovery neural network model is trained to recover motion capture data, and the recovered motion capture data is obtained.
[0089] The overall loss function is used to calculate the difference between the recovered motion capture data and the real motion capture data, and the parameters in the network are optimized by backpropagation through the optimizer.
[0090] The test set data is input into the motion capture data recovery neural network model, and the RMSE of the model is calculated. The RMSE is the error value; the smaller, the better. It is used to determine whether the model has converged. If it has, training ends and the trained model is saved; otherwise, training continues. It can be understood that the training process sets a number of epochs; once that number is reached, training stops. Generally, the model tends to converge within the set number of epochs.
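As a non-limiting illustration, the 3:7 division into test and training sets can be sketched as follows. The sketch assumes that the split is made at the level of whole motion clips after a reproducible shuffle; the text does not state the unit of splitting or whether shuffling is used. The name `split_clips` and the clip labels are hypothetical.

```python
import random


def split_clips(clips, test_fraction=0.3, seed=0):
    """Shuffle clips reproducibly and split them into (train, test)."""
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)
    num_test = round(len(shuffled) * test_fraction)
    return shuffled[num_test:], shuffled[:num_test]


clips = [f"clip_{i:02d}" for i in range(10)]
train, test = split_clips(clips)
```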
[0091] Optimizer
[0092] The overall loss function consists of three parts: reconstruction loss, rigidity loss, and smoothing loss, and the formula is as follows:
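[0093] $L_{sum} = \lambda_{rec} \times L_{rec} + \lambda_{ri} \times L_{ri} + \lambda_{sm} \times L_{sm}$
[0094] $\lambda_{rec} + \lambda_{ri} + \lambda_{sm} = 1$
[0095] where $\lambda_{rec}$, $\lambda_{ri}$, and $\lambda_{sm}$ are the weight parameters of the three loss terms.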
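As a non-limiting illustration, the weighted combination of the three loss terms can be sketched as follows. The default weights 0.8, 0.1, and 0.1 are the values reported for the example experiment of this embodiment; the function name `total_loss` and the sample loss values are hypothetical.

```python
def total_loss(l_rec, l_ri, l_sm, w_rec=0.8, w_ri=0.1, w_sm=0.1):
    """Weighted sum L_sum = w_rec*L_rec + w_ri*L_ri + w_sm*L_sm, weights summing to 1."""
    if abs(w_rec + w_ri + w_sm - 1.0) > 1e-9:
        raise ValueError("loss weights must sum to 1")
    return w_rec * l_rec + w_ri * l_ri + w_sm * l_sm


example = total_loss(l_rec=2.0, l_ri=1.0, l_sm=0.5)
```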
[0096] The three loss functions are as follows:
[0097] Reconstruction loss is calculated by comparing the ground-truth sequence with the recovered MoCap sequence.
[0098] $L_{rec} = \lVert X_{rec} - X_{tru} \rVert_2$
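As a non-limiting illustration, the reconstruction loss can be sketched as follows. The sketch assumes that the norm is taken over all entries of the matrix difference (the Frobenius norm) and is not squared; the extracted formula does not make this distinction unambiguous. The name `reconstruction_loss` is hypothetical.

```python
import math


def reconstruction_loss(x_rec, x_tru):
    """L2 (Frobenius) norm of the difference between two N x 3J matrices."""
    return math.sqrt(sum(
        (r - t) ** 2
        for row_r, row_t in zip(x_rec, x_tru)
        for r, t in zip(row_r, row_t)
    ))


x_tru = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
x_rec = [[3.0, 0.0, 0.0], [1.0, 5.0, 1.0]]
loss = reconstruction_loss(x_rec, x_tru)
```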
[0099] Rigidity loss, a novel loss proposed in this embodiment, calculates the distance loss between the marker pairs of set B in each frame, where set B consists of marker pairs with rigid structures. Specifically:
[0100]
[0101] where $i$ and $s$ represent the frame index and the marker pair index, respectively, $b_{i,s}$ represents the distance between the $s$-th rigid marker pair in the $i$-th frame of the real motion capture sequence, and the corresponding term for the recovered sequence represents the same distance measured in the recovered motion capture sequence.
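As a non-limiting illustration, a rigidity loss of the kind described above can be sketched as follows. The exact formula is not reproduced in the text, so the sketch assumes a mean absolute difference between the real and recovered pair distances; the choice of absolute difference and of averaging is purely illustrative. The names `pair_distance`, `rigidity_loss`, and `rigid_pairs` are hypothetical.

```python
import math


def pair_distance(frame, a, b):
    """Euclidean distance between markers a and b in one flat 3J frame."""
    return math.sqrt(sum((frame[3 * a + k] - frame[3 * b + k]) ** 2 for k in range(3)))


def rigidity_loss(x_rec, x_tru, rigid_pairs):
    """Mean absolute difference between real and recovered pair distances.

    Assumption: the absolute difference and the averaging are illustrative
    choices; the source only states that distances of rigid pairs are compared.
    """
    total, count = 0.0, 0
    for frame_rec, frame_tru in zip(x_rec, x_tru):
        for a, b in rigid_pairs:
            total += abs(pair_distance(frame_tru, a, b) - pair_distance(frame_rec, a, b))
            count += 1
    return total / count if count else 0.0


# Two markers that should stay 1 unit apart; the recovery stretches them to 2.
x_tru = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
x_rec = [[0.0, 0.0, 0.0, 2.0, 0.0, 0.0]]
loss = rigidity_loss(x_rec, x_tru, rigid_pairs=[(0, 1)])
```

A recovery that preserves every rigid pair distance yields a loss of zero, regardless of where the markers are placed in space.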
[0102] Smoothness loss is calculated by comparing the loss between each frame of the recovered MoCap sequence and the average of its neighboring frames. Specifically:
[0103]
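As a non-limiting illustration, a smoothness loss of the kind described above can be sketched as follows. The exact formula is not reproduced in the text, so the sketch assumes that the neighborhood of a frame is its previous and next frame, that the first and last frames are skipped, and that the squared error is averaged over all values. The name `smoothness_loss` is hypothetical.

```python
def smoothness_loss(x_rec):
    """Mean squared gap between each interior frame and the mean of its two neighbors.

    Assumptions: the neighborhood is the previous and next frame, boundary
    frames are skipped, and the squared error is averaged over all values.
    """
    total, count = 0.0, 0
    for i in range(1, len(x_rec) - 1):
        for prev, cur, nxt in zip(x_rec[i - 1], x_rec[i], x_rec[i + 1]):
            total += (cur - 0.5 * (prev + nxt)) ** 2
            count += 1
    return total / count if count else 0.0


smooth = [[0.0], [1.0], [2.0], [3.0]]   # constant velocity
jitter = [[0.0], [3.0], [2.0], [3.0]]   # frame 1 jumps
```

Under these assumptions, a sequence moving at constant velocity incurs no penalty, while a sudden jump in one frame does.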
[0104] Following the protocol established in many previous works, this embodiment selected 25 topics from the CMU dataset for experiments, including walking, jumping, basketball, and other sports. All MoCap data was stored in C3D format, providing the coordinates of 41 marker points; therefore, each frame can be represented as $x_i \in \mathbb{R}^{3 \times 41}$.
[0105] Example Experiment
[0106] This example implements the neural network using PyTorch 1.9 and Python 3.7, and trains it on an Nvidia GeForce RTX 3090 GPU. The Adam optimizer is used, with a learning rate of 0.001 and a batch size of 256. The hyperparameters $\lambda_{rec}$, $\lambda_{ri}$, and $\lambda_{sm}$ were set to 0.8, 0.1, and 0.1, respectively.
[0107] To simulate missing data, data and markers within certain frames are occluded. Two control strategies are employed: varying the missing interval and varying the number of missing markers. The missing interval controls the length of the gap (in frames), while the number of missing markers controls how many markers are missing within a single frame. These two control strategies work together to simulate potential missing data scenarios in real-world applications. Two types of experiments are designed to verify the robustness of the proposed model with respect to the two control variables. The root mean square error (RMSE) is used as the evaluation metric, as shown in the formula below.
[0108]
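As a non-limiting illustration, the RMSE metric can be sketched as follows. The sketch uses the standard definition averaged over all entries of the two matrices; whether the original evaluation averages over all entries or only over the occluded ones is not stated, so this is an assumption. The name `rmse` is hypothetical.

```python
import math


def rmse(x_rec, x_tru):
    """Root mean square error over all entries of two equally shaped matrices."""
    squared = [
        (r - t) ** 2
        for row_r, row_t in zip(x_rec, x_tru)
        for r, t in zip(row_r, row_t)
    ]
    return math.sqrt(sum(squared) / len(squared))


x_tru = [[0.0, 0.0], [0.0, 0.0]]
x_rec = [[2.0, 2.0], [2.0, 2.0]]
error = rmse(x_rec, x_tru)
```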
[0109] Experimental results
[0110] Experiments were conducted under different missing gaps.
[0111] In this experiment, the gap in missing data ranged from 10 to 200 frames, with the number of missing markers fixed at 5 (10%). The RMSE for playing basketball, walking, and boxing under LSTM, A-LSTM+LS, and our method are shown in Tables 1(a), (b), and (c). It can be seen that the model in this embodiment achieved excellent results under all missing interval conditions. Even with a missing frame count of 200 frames, the model's performance remained stable. This indicates that the model can effectively utilize the temporal correlation of motion capture data and handle long-term missing data gaps.
[0112] Table 1
[0113]
[0114]
[0115] Experiments were conducted with different numbers of missing markers.
[0116] In this experiment, the number of missing markers was set to 5, 10, and 15 (corresponding to 10%, 20%, and 30% of the missing data, respectively), and as in previous experiments, the sampled values for the missing intervals were from a Gaussian distribution with a mean of 10 and a standard deviation of 5. The RMSE for basketball, boxing, and jumping are shown in Table 2 for Window, LSTM, Li, and the method of this embodiment. It can be seen that the model of this embodiment performs best in basketball and jumping turns, but poorly in boxing. This is because boxing is included in the test set, not the training set. Nevertheless, the proposed method produces better results than Window in boxing, demonstrating the model's scalability.
[0117] Table 2
[0118]
[0119]
[0120] Figure 3 visualizes a recovered basketball motion with a missing marker rate of 30%, where recovered markers and original markers are shown in orange and blue, respectively. Figure 3 demonstrates the good quality of the recovered motion.
[0121] Example 2
[0122] This embodiment provides a human motion capture data recovery system, including:
[0123] The data acquisition module is configured to acquire a motion capture data matrix, which consists of sequentially arranged frames;
[0124] The spatial feature extraction module is configured to map each frame of motion capture data matrix to a high-dimensional feature space based on linear projection, and embed spatial location information into the frame data to obtain high-dimensional single-frame data; and use a self-attention mechanism to extract the spatial features of the correlation between all marker points in the high-dimensional single-frame data to obtain high-dimensional single-frame spatial features.
[0125] The temporal feature extraction module is configured to embed temporal information into all high-dimensional single-frame spatial features of the motion capture data matrix, and use a self-attention mechanism to obtain the temporal features of the correlation between high-dimensional single-frame spatial features to obtain the spatiotemporal motion capture data matrix.
[0126] The data recovery module is configured to restore complete motion capture data using a multilayer perceptron based on a spatiotemporal motion capture data matrix.
[0127] The examples and application scenarios implemented by the above modules are the same as those of the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that the above modules, as part of the system, can be executed in a computer system, for example as a set of computer-executable instructions.
[0128] The descriptions of each embodiment in the above embodiments have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0129] The proposed system can be implemented in other ways. For example, the system embodiments described above are merely illustrative, and the division of modules described above is only a logical functional division. In actual implementation, there may be other division methods. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed.
[0130] Example 3
[0131] This embodiment provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of a human motion capture data recovery method as described in Embodiment 1 above.
[0132] Example 4
[0133] This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps in the human motion capture data recovery method described in Embodiment 1 above.
[0134] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of hardware embodiments, software embodiments, or embodiments combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
[0135] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a device for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.
[0136] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.
[0137] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.
[0138] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0139] While the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.
Claims
1. A method for recovering human motion capture data, characterized in that it comprises:
acquiring a motion capture data matrix, which consists of sequentially arranged frames;
mapping, based on linear projection, each frame of motion capture data in the motion capture data matrix to a high-dimensional feature space, and embedding spatial location information into the frame data to obtain high-dimensional single-frame data; and using a self-attention mechanism to extract spatial features of the correlation between all marker points in the high-dimensional single-frame data to obtain high-dimensional single-frame spatial features;
wherein using the self-attention mechanism to extract the spatial features of the correlation between all marker points in the high-dimensional single-frame data specifically comprises: performing self-attention extraction on the high-dimensional single-frame data based on a Transformer encoder; applying residual connections to the extracted self-attention features to capture the correlation between all marker points in the high-dimensional single-frame data; obtaining the high-dimensional single-frame spatial features; and designing a separate spatial Transformer encoder for each frame;
embedding temporal information into all high-dimensional single-frame spatial features of the motion capture data matrix, and using a self-attention mechanism to obtain temporal features of the correlation between the high-dimensional single-frame spatial features, so as to obtain a spatiotemporal motion capture data matrix;
wherein using the self-attention mechanism to obtain the temporal features of the correlation between the high-dimensional single-frame spatial features specifically comprises: performing self-attention extraction, based on a Transformer encoder, on the high-dimensional motion capture data matrix with embedded temporal information; applying residual connections to the extracted self-attention features to capture the correlation between the high-dimensional single-frame spatial features of the frames; and obtaining the spatiotemporal motion capture data matrix;
reconstructing the complete motion capture data from the spatiotemporal motion capture data matrix using a multilayer perceptron;
wherein the Transformer encoder consists of a multi-head self-attention block and a multilayer perceptron block, with a residual connection applied after each block;
wherein, during training of the neural network model for motion capture data recovery, the overall loss function comprises a reconstruction loss, a rigidity loss, and a smoothing loss, with the specific formula as follows: [formula not reproduced in this text], where the three terms denote the reconstruction loss, the rigidity loss, and the smoothing loss, respectively, and the weight parameters weight the three loss terms;
the reconstruction loss is calculated by the following formula: [formula not reproduced in this text], where the two symbols denote the complete data and the recovered motion capture data, respectively;
the rigidity loss is used to calculate, for each frame, the distance loss over a set of marker pairs, where the set consists of marker pairs with rigid structures, with the specific formula as follows: [formula not reproduced in this text], where the two indices denote the frame index and the marker pair index, respectively, and the two distance terms denote the distance of the indexed rigid marker pair in the indexed frame of the real motion capture sequence and the corresponding distance in the recovered motion capture sequence;
the smoothing loss is used to calculate the loss between each frame of the recovered sequence and the average of its neighboring frames, with the specific formula as follows: [formula not reproduced in this text].
2. The method for recovering human motion capture data as described in claim 1, characterized in that the motion capture data matrix consists of a series of frames, where each frame records the 3D positions of all marker points.
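The three loss terms named in claim 1 can be illustrated with a minimal sketch. Because the claim's formulas are not reproduced in this text, the concrete choices below are assumptions: squared error for the reconstruction and smoothing terms, absolute difference for the rigidity term, and the immediately preceding and following frames as the "neighboring frames". All function names and the default weights are hypothetical.

```python
import math

# Data layout (assumed): sequence[t][m] is the (x, y, z) position of marker m in frame t.

def reconstruction_loss(truth, recovered):
    """Mean squared error over every coordinate (the norm is an assumption)."""
    total, count = 0.0, 0
    for f_true, f_rec in zip(truth, recovered):
        for p, q in zip(f_true, f_rec):
            for a, b in zip(p, q):
                total += (a - b) ** 2
                count += 1
    return total / count

def rigidity_loss(truth, recovered, rigid_pairs):
    """Mean absolute difference between true and recovered rigid-pair distances."""
    total, count = 0.0, 0
    for f_true, f_rec in zip(truth, recovered):
        for a, b in rigid_pairs:
            d_true = math.dist(f_true[a], f_true[b])
            d_rec = math.dist(f_rec[a], f_rec[b])
            total += abs(d_true - d_rec)
            count += 1
    return total / count

def smoothing_loss(recovered):
    """Mean squared deviation of each interior frame from the mean of its two neighbours."""
    total, count = 0.0, 0
    for t in range(1, len(recovered) - 1):
        for prev, cur, nxt in zip(recovered[t - 1], recovered[t], recovered[t + 1]):
            for a, b, c in zip(prev, cur, nxt):
                total += (b - 0.5 * (a + c)) ** 2
                count += 1
    return total / count if count else 0.0

def total_loss(truth, recovered, rigid_pairs, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical defaults."""
    w_rec, w_rig, w_smooth = weights
    return (w_rec * reconstruction_loss(truth, recovered)
            + w_rig * rigidity_loss(truth, recovered, rigid_pairs)
            + w_smooth * smoothing_loss(recovered))
```

For a sequence that matches the ground truth and moves linearly, all three terms are zero; displacing a single marker in one frame raises all three, which is the behavior the combined loss is meant to penalize.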
3. The method for recovering human motion capture data as described in claim 1, characterized in that the Transformer encoder performs L iterations, calculated as follows: [formula not reproduced in this text], where the operator denotes layer normalization, the data term denotes the data processed in the l-th computation of the encoder, and l denotes the current iteration number, with L iterations in total.
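For orientation, an encoder built from a multi-head self-attention block and a multilayer perceptron block, each followed by a residual connection and using layer normalization, is commonly written in the pre-norm form below. This is an assumed, conventional formulation and not the formula of the claim, which is not reproduced in this text. The symbols Z, MSA, MLP, and LN are illustrative, and the placement of the normalization before each block is an assumption.

```latex
\begin{aligned}
Z'_{l} &= \mathrm{MSA}\bigl(\mathrm{LN}(Z_{l-1})\bigr) + Z_{l-1}, \\
Z_{l}  &= \mathrm{MLP}\bigl(\mathrm{LN}(Z'_{l})\bigr) + Z'_{l}, \qquad l = 1, \dots, L.
\end{aligned}
```

Here Z_0 would be the embedded input and Z_L the output after the final iteration.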
4. The method for recovering human motion capture data as described in claim 1, characterized in that, based on the spatiotemporal motion capture data matrix, a multilayer perceptron performs data dimensionality reduction to recover the complete motion capture data.
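The dimensionality-reducing recovery step can be sketched as a small perceptron that maps each frame's feature vector down to flattened 3D marker coordinates. This is a minimal illustration under assumptions: the layer sizes, the ReLU activation, the random untrained weights, and all names are hypothetical and are not taken from the patent.

```python
import random

def make_linear(n_in, n_out, rng):
    """Create a (weights, bias) pair with small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def linear(x, layer):
    """Apply y = W x + b to a single vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def mlp_head(frame_features, hidden, out):
    """Map one frame's feature vector to flattened (x, y, z) marker coordinates."""
    h = [max(0.0, v) for v in linear(frame_features, hidden)]  # ReLU (assumed activation)
    return linear(h, out)

rng = random.Random(0)
feature_dim, hidden_dim, num_markers = 16, 8, 4  # hypothetical sizes
hidden = make_linear(feature_dim, hidden_dim, rng)
out = make_linear(hidden_dim, 3 * num_markers, rng)

# Five frames of stand-in spatiotemporal features, one vector per frame.
features = [[rng.uniform(-1.0, 1.0) for _ in range(feature_dim)] for _ in range(5)]
recovered = [mlp_head(f, hidden, out) for f in features]
```

Each recovered frame has 3 × num_markers values, fewer than the feature dimension, which is the sense in which the perceptron reduces dimensionality here.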
5. A human motion capture data recovery system, employing the human motion capture data recovery method as described in any one of claims 1-4, characterized in that it comprises:
a data acquisition module, configured to acquire a motion capture data matrix, which consists of sequentially arranged frames;
a spatial feature extraction module, configured to map each frame of motion capture data in the motion capture data matrix to a high-dimensional feature space based on linear projection, and embed spatial location information into the frame data to obtain high-dimensional single-frame data; and to use a self-attention mechanism to extract spatial features of the correlation between all marker points in the high-dimensional single-frame data to obtain high-dimensional single-frame spatial features;
a temporal feature extraction module, configured to embed temporal information into all high-dimensional single-frame spatial features of the motion capture data matrix, and to use a self-attention mechanism to obtain temporal features of the correlation between the high-dimensional single-frame spatial features, so as to obtain a spatiotemporal motion capture data matrix;
a data recovery module, configured to recover the complete motion capture data using a multilayer perceptron based on the spatiotemporal motion capture data matrix.
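The order in which the spatial and temporal modules operate can be sketched with a deliberately simplified attention function. The code below is an illustration under stated assumptions, not the patented model: it uses single-head scaled dot-product self-attention with identity query, key, and value projections plus a residual connection, and it omits the linear projection, the spatial and temporal embeddings, layer normalization, the multilayer perceptron block, and the per-frame separate encoders described in the claims. All names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attend(tokens):
    """Single-head self-attention with identity projections and a residual connection."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        attended = [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        out.append([x + a for x, a in zip(q, attended)])  # residual connection
    return out

def spatial_then_temporal(frames):
    """frames[t][m] is the feature vector of marker m in frame t."""
    # Spatial stage: within each frame, the markers attend to one another.
    spatial = [attend(frame) for frame in frames]
    # Flatten each frame's marker features into a single token per frame.
    frame_tokens = [[v for marker in frame for v in marker] for frame in spatial]
    # Temporal stage: the frames attend to one another.
    return attend(frame_tokens)

frames = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
          [[0.1, 0.0, 0.0], [1.1, 0.0, 0.0]],
          [[0.2, 0.0, 0.0], [1.2, 0.0, 0.0]]]
features = spatial_then_temporal(frames)
```

The result holds one feature vector per frame, which is the shape a recovery module would then map back to marker coordinates.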
6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps of the human motion capture data recovery method as described in any one of claims 1-4.
7. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the human motion capture data recovery method as described in any one of claims 1-4.