A method and system for generating motion data based on first-person perspective video

By using a motion data generation method based on first-person perspective video, video image information is automatically processed, feature points and lens behavior are extracted, and camera motion trajectory is optimized. This solves the problems of low efficiency and high cost in existing motion data production technologies, and achieves more efficient and accurate motion data generation and immersive experience.

CN116206019B · Active · Publication Date: 2026-06-19 · SHANDONG UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANDONG UNIV
Filing Date: 2023-02-27
Publication Date: 2026-06-19

Application Information

Patent Timeline

27 Feb 2023: Application

19 Jun 2026: Publication (CN116206019B)

IPC: G06T13/20; G06T7/80; G06T7/246; G06V10/75; G06V10/44

AI Tagging

Application Domain

Image analysis; Character and pattern recognition

Technical Efficacy Phrases

Reduce manufacturing cost; reduce cumulative error




Abstract

This invention provides a method and system for generating motion data based on first-person perspective video. The method involves acquiring video image information and determining the camera's internal parameters; extracting feature points from adjacent video frames to estimate camera motion; determining whether the camera has returned to a previously visited environmental area based on the acquired image features, and establishing loop closure constraints; detecting camera shake or stillness in the video to provide position commands for generating special effects; optimizing the camera pose measured at different times and using loop closure detection information to obtain globally consistent camera motion trajectory data; mapping the generated globally consistent camera motion trajectory data onto the motion range of a motion seat as the main driving signal, superimposing the position commands generated by special effects, and generating corresponding motion data. This invention offers high processing efficiency and provides users with a more realistic experience.


Description

Technical Field

[0001] This invention belongs to the field of video data processing technology, and relates to a method and system for generating motion data based on first-person perspective video.

Background Technology

[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.

[0003] Driven by emerging technologies such as communications, ultra-high definition, and virtual reality, the immersive video industry has experienced explosive growth. Motion seats, driven by motion data, simulate motion effects in real space through pitch, roll, and yaw movements. They can also integrate vestibular, haptic, vibration, auditory, and wind technologies to enhance the viewer's immersion. Therefore, motion seats can provide users with a higher level of interactivity and presence in immersive video experiences. However, since motion seats need to acquire corresponding motion data during immersive video playback, an effective method for generating motion data is required.

[0004] According to the inventors, producing motion effects for immersive video with existing technologies still relies on manual creation and is highly labor-intensive. Motion data is currently created by one of two methods: manual editing or joystick capture. In the manual method, technical experts watch the video in advance and reproduce the camera movement it implies by writing or editing motion curves by hand in 3D software. Simulator input is also used to obtain motion data for the motion seats, producing a motion parameter file for the seats and the special effects equipment. During video playback, this parameter file is transmitted to the seats so that they move in sync with the plot. In the joystick method, technicians watch the video while operating a six-degree-of-freedom joystick, moving it to follow the camera angles; measuring devices record the joystick's trajectory, from which motion data for the motion seats is generated. Both methods require specialized technicians, consume considerable manpower, and are inefficient and costly. From an industrial perspective, the process cannot be automated or made intelligent, and standardized implementation is difficult. Furthermore, the technicians' perception of the video varies from person to person, so the resulting motion data is inconsistent and cannot be standardized.

Summary of the Invention

[0005] To address the aforementioned problems, this invention proposes a method and system for generating motion data based on first-person perspective video. This invention offers high processing efficiency and enables users to have a more realistic experience.

[0006] According to some embodiments, the present invention adopts the following technical solution:

[0007] A method for generating motion data based on first-person perspective video includes the following steps:

[0008] Acquire video image information to determine the camera's internal parameters;

[0009] Extract feature points from adjacent video frames, estimate camera motion, determine whether the camera has returned to a previously visited environmental area based on the obtained image features, and establish loop closure constraints;

[0010] Detect camera shake or stillness in a video and provide positional instructions for generating special effects;

[0011] Optimize the camera poses measured at different times together with the loop closure detection information to obtain globally consistent camera motion trajectory data;

[0012] The generated globally consistent camera motion trajectory data is mapped onto the motion range of the motion seat as the main drive signal, and position commands generated by special effects are superimposed to generate corresponding motion data.

[0013] As an alternative implementation method, the specific process of acquiring video image information and determining the camera's internal parameters is as follows: the video data is processed into frame images, and the camera's internal parameters are determined based on the frame width and frame height of the frame images, combined with the camera's focal length which is set to a fixed value.

[0014] As an alternative implementation, the specific process of extracting feature points from adjacent video frame images and estimating camera motion includes: extracting FAST corner points and describing features by detecting areas with significant local pixel grayscale changes in the image, and matching feature points in adjacent video frames using a fast approximate nearest neighbor algorithm; eliminating incorrect matching pairs, and calculating the relative pose of the camera using the epipolar geometric constraints of the two images, until the number of successful pairings meets a predetermined value.
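The matching step described above can be illustrated with a minimal sketch. It assumes binary descriptors stored as Python integers and uses a brute-force nearest-neighbour search with a ratio test in place of the fast approximate nearest neighbor algorithm named in the text. The function names, the 8-bit toy descriptors, and the `max_ratio` value are hypothetical and chosen only for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_prev, desc_curr, max_ratio=0.7):
    """Match each descriptor in desc_prev to its nearest neighbour in desc_curr.

    A match is kept only if the best distance is clearly smaller than the
    second-best distance (ratio test), which discards ambiguous pairs.
    Returns a list of (index_prev, index_curr, distance) tuples.
    """
    matches = []
    for i, d in enumerate(desc_prev):
        ranked = sorted((hamming(d, e), j) for j, e in enumerate(desc_curr))
        if not ranked:
            continue
        best_dist, best_j = ranked[0]
        if len(ranked) > 1 and best_dist >= max_ratio * ranked[1][0]:
            continue  # ambiguous: the second-best candidate is almost as close
        matches.append((i, best_j, best_dist))
    return matches


# Toy 8-bit descriptors (assumption: real BRIEF descriptors are much longer).
prev_frame = [0b10110010, 0b00001111, 0b11110000]
curr_frame = [0b00001110, 0b10110011, 0b01010101]
matches = match_descriptors(prev_frame, curr_frame)
```

In this toy input the third descriptor of the previous frame is dropped because its two closest candidates are nearly equidistant. A production pipeline would more likely use an approximate index to avoid the quadratic search.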

[0015] As an alternative implementation, the process of determining whether the camera has returned to a previously visited environmental area based on the obtained image features and establishing loop closure constraints includes comparing the current frame image with the processed video frame images one by one, adding the image with the highest similarity to the loop closure candidate frame set, quickly completing the coarse search, traversing the determined loop closure candidate frame set, solving the relative camera pose between the current frame and the loop closure candidate frames, performing reprojection matching, determining the number of matches, and finally determining the loop closure.

[0016] As an alternative implementation, the specific process of detecting camera shake or stationary behavior in a video and providing positional instructions for generating special effects includes detecting camera shake in the video and using the Euclidean distance of feature matching points to determine whether the camera has moved within a certain period of time, so as to determine the stationary behavior of the camera.

[0017] As an alternative implementation method, the camera pose optimization process includes:

[0018] Multiple unary connections are established between multiple map points in the 3D world corresponding to a single image frame to form a graph structure; local optimization is performed on shared keyframes by placing the map points corresponding to multiple keyframes with shared views into the graph structure for optimization.

[0019] As an alternative implementation method, the optimization process for loop closure detection information includes:

[0020] The detected loop closure constraint information is subjected to essential graph optimization. The map points corresponding to the common keyframes and loop closure frames are added to the graph optimization. The poses of the beginning and end of the loop are connected in the graph structure to optimize the camera pose.

[0021] By adding the map points corresponding to the global keyframes to the graph optimization, the optimized pose of the global camera is obtained.

[0022] A motion data generation system based on first-person perspective video includes:

[0023] The data reading module is configured to acquire video image information and determine the camera's internal parameters;

[0024] The visual odometry module is configured to extract feature points from adjacent video frame images to estimate camera motion;

[0025] The loop closure detection module is configured to determine whether the camera has returned to a previously visited environmental area based on the acquired image features, and to establish loop closure constraints.

[0026] The special effects detection module is configured to detect camera shake or stillness in a video and provide positional instructions for generating special effects.

[0027] The backend optimization module is configured to optimize the camera pose measured at different times and the loop closure detection information to obtain globally consistent camera motion trajectory data.

[0028] The motion simulation module is configured to map the generated globally consistent camera motion trajectory data onto the motion range of the motion seat as the main drive signal, and superimpose position commands generated by special effects to generate corresponding motion data.

[0029] As an alternative implementation, the visual odometry module includes:

[0030] The feature extraction and matching module is configured to extract image features by detecting areas of significant local pixel grayscale changes in the image, extracting FAST corner points, calculating descriptors, and using a fast approximate nearest neighbor algorithm to match feature points in adjacent video frames.

[0031] The camera pose initial estimation module is configured to eliminate incorrect matching pairs based on matching point pairs and calculate the relative camera pose using the epipolar geometric constraints of the two images.

[0032] As an alternative implementation, the loop closure detection module includes:

[0033] The appearance verification module is configured to compare the current frame image with the processed video frame images one by one, and add the image with the highest similarity to the loop closure candidate frame set;

[0034] The geometric verification module is configured to traverse the loop closure candidate frame set determined by the appearance verification module, solve the relative camera pose between the current frame and the loop closure candidate frames, perform reprojection matching, determine the number of matches, and finally determine the loop closure.

[0035] As an alternative implementation, the backend optimization module includes:

[0036] The pose graph optimization module is configured to establish multiple unary connections for multiple map points in the corresponding 3D world in a single image frame to form a graph structure to optimize the relative camera pose.

[0037] The local BA optimization module is configured to put the map points corresponding to multiple keyframes with co-view relationships into a graph structure for optimization;

[0038] The essential graph optimization module adds map points corresponding to common keyframes and loop closure frames to the graph optimization, and connects the poses of the beginning and end of the loop closure in the graph structure to optimize the camera pose.

[0039] The global BA optimization module is configured to add map points corresponding to global keyframes to the graph optimization and perform global camera pose optimization.

[0040] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0041] This invention can automatically extract motion data from POV (first-person perspective) videos, greatly saving the production cost of motion effects.

[0042] This invention adds backend optimization and loop closure detection modules to reduce the cumulative error caused by solving camera pose between two video frames and improve the accuracy of estimating the camera's global motion trajectory.

[0043] This invention detects the camera shake effect during movement and superimposes this effect onto the main drive signal of the camera movement, thus covering a wider range of video and improving the user's immersion and satisfaction during the experience.

Description of the Attached Figures

[0044] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.

[0045] Figure 1 is a flowchart of the method for generating motion data from POV video using visual algorithms provided in this embodiment;

[0046] Figure 2 is a schematic diagram of the four solutions obtained by decomposing the essential matrix;

[0047] Figure 3 is a schematic diagram of the process of calculating camera motion between adjacent video frames;

[0048] Figure 4 is a schematic diagram of the graph structure optimized based on g2o.

Detailed Implementation

[0049] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0050] It should be noted that the following detailed description is illustrative and intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0051] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.

[0052] As shown in Figure 1, in this embodiment, the method for generating motion data based on POV video visual algorithm processing includes the following steps:

[0053] (1) Read video image information and obtain camera internal parameters.

[0054] (2) Estimate camera motion by extracting feature points from adjacent video frames; determine whether the camera has returned to the previously visited environment area by using the obtained image features, and establish loop closure constraints; detect lens shaking or stationary behavior in the video and provide position instructions for special effects generation.

[0055] (3) Based on the camera pose measured at different times and the information of loop closure detection, optimize them to reduce errors and obtain globally consistent camera motion trajectory data.

[0056] (4) The generated global camera motion trajectory parameters are mapped onto the motion range of the motion seat through a washout filter to form the main drive signal. In addition, a shaking signal or a stationary signal is generated according to the special effect position command and superimposed on the main drive signal to generate the corresponding motion data.

[0057] In step 1, the video data is processed into frame images and the camera's intrinsic parameter matrix K is determined.

[0058] $$K = \begin{bmatrix} f & 0 & w/2 \\ 0 & f & h/2 \\ 0 & 0 & 1 \end{bmatrix} \quad (1)$$

[0059] w and h represent the frame width and frame height of the video, respectively, and f is the camera's focal length. In motion simulation, we need to simulate the trend of virtual camera movement in the video, not its specific pose. The value of parameter f only creates a constant scaling factor between the recovered camera motion and the actual camera motion. Humans perceive motion as acceleration, not velocity; therefore, we set f to a constant value, which does not affect the overall motion simulation.
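As a minimal sketch of how the intrinsic matrix might be assembled from the frame size and a fixed focal length: the code below assumes a pinhole model with square pixels and a principal point at the image centre (w/2, h/2). That placement is an assumption made for illustration, and the helper names and the numeric values of w, h, and f are hypothetical.

```python
def intrinsic_matrix(width: int, height: int, focal: float):
    """Return a 3x3 pinhole intrinsic matrix as nested lists.

    Assumption: the principal point is the image centre (width/2, height/2)
    and the pixels are square, so one focal value serves both axes.
    """
    cx, cy = width / 2.0, height / 2.0
    return [
        [focal, 0.0, cx],
        [0.0, focal, cy],
        [0.0, 0.0, 1.0],
    ]


def project(K, point):
    """Project a 3D point (X, Y, Z) with Z > 0, given in camera coordinates, to pixels."""
    X, Y, Z = point
    u = K[0][0] * X / Z + K[0][2]
    v = K[1][1] * Y / Z + K[1][2]
    return u, v


K = intrinsic_matrix(1920, 1080, focal=1000.0)  # hypothetical frame size and f
centre = project(K, (0.0, 0.0, 5.0))            # a point on the optical axis
```

A point on the optical axis lands on the principal point regardless of the chosen f, which is one way to see that f only changes the scale of the projection around the image centre.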

[0060] In step 2, the specific methods include:

[0061] (2-1) As shown in Figure 3, ORB features are obtained by extracting FAST corner points in areas of significant local pixel grayscale change and computing BRIEF descriptors for them. The Fast Approximate Nearest Neighbor (FLANN) algorithm is then used to match ORB feature points in adjacent video frames. Based on the matching points, the RANSAC strategy is used to eliminate incorrect matching pairs, and the relative camera pose, i.e., the rotation matrix R and translation vector t, is calculated using the epipolar geometry constraints of the two images. Eight pairs of successfully matched feature points are randomly selected from the two images, and four possible relative poses (e.g., ...) are obtained by computing and decomposing the essential matrix E (as shown in Figure 2). Each candidate is checked by counting the map points it generates with positive depth values: if one relative pose yields more than 1.5 times as many positive-depth map points as the remaining three, the RANSAC process stops and that relative pose is taken as the correct solution; if no relative pose meets the condition, another 8 pairs of successfully matched feature points are selected and the calculation is repeated.

[0062] $$E = t^{\wedge} R \quad (2)$$
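The relation between E, R, and t can be checked numerically with a short sketch. It assumes the standard form in which E is the skew-symmetric matrix of t multiplied by R, with the second camera's coordinates given by X2 = R X1 + t, and it verifies the epipolar constraint on one synthetic point. The `pick_pose` helper is one possible reading of the 1.5x positive-depth rule (the winning count must exceed 1.5 times each of the other three counts). All names and numbers are hypothetical.

```python
import math


def skew(t):
    """Skew-symmetric matrix t^ such that (t^) v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_y(angle):
    """Rotation about the y axis by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def essential(R, t):
    """E = t^ R for the convention X2 = R X1 + t."""
    return matmul(skew(t), R)


def epipolar_residual(E, x1, x2):
    """x2^T E x1 for normalized image coordinates; zero for a perfect match."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))


def pick_pose(counts, ratio=1.5):
    """Index of the candidate whose positive-depth count exceeds ratio times
    every other candidate's count, or None if no candidate dominates."""
    for i, c in enumerate(counts):
        if all(c > ratio * o for j, o in enumerate(counts) if j != i):
            return i
    return None


# Synthetic check: one 3D point seen from two camera poses.
R, t = rot_y(0.1), [1.0, 0.0, 0.2]
P1 = [0.5, -0.3, 4.0]
P2 = [a + b for a, b in zip(matvec(R, P1), t)]
x1 = [p / P1[2] for p in P1]   # normalized coordinates in frame 1
x2 = [p / P2[2] for p in P2]   # normalized coordinates in frame 2
residual = epipolar_residual(essential(R, t), x1, x2)
```

The residual vanishes up to rounding because X2 is orthogonal to t x (R X1) by construction.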

[0063] (2-2) Loop closure detection. Based on the Bag-of-Words (BoW) model, the current frame image is compared one by one with the processed video frames, and the image with the highest similarity is added to the loop closure candidate frame set for rapid coarse retrieval. The determined loop closure candidate frame set is traversed, and the RANSAC method is used to solve the relative camera pose between the current frame and the loop closure candidate frames. Reprojection matching is then performed to determine the number of matches, ultimately identifying the loop. This loop closure constraint is added to step 3 to eliminate the cumulative error in camera pose estimation and maintain the accuracy of data acquisition over long periods.
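The coarse retrieval stage can be sketched as follows, assuming each frame has already been quantized into a list of visual-word ids. The cosine similarity score, the `min_gap` exclusion of the most recent frames, and the `min_score` threshold are assumptions introduced for illustration and are not taken from the text.

```python
import math
from collections import Counter


def bow_vector(word_ids):
    """Histogram of the visual-word ids observed in one frame."""
    return Counter(word_ids)


def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def loop_candidate(current, history, min_gap=2, min_score=0.8):
    """Return (index, score) of the most similar earlier frame, or None.

    history holds the BoW vectors of processed frames, oldest first. The last
    min_gap frames are skipped (an assumption) because temporally adjacent
    frames look alike without indicating a revisit.
    """
    best = None
    usable = history[:max(0, len(history) - min_gap)]
    for idx, past in enumerate(usable):
        score = cosine_similarity(current, past)
        if score >= min_score and (best is None or score > best[1]):
            best = (idx, score)
    return best


history = [bow_vector([1, 2, 3, 3]), bow_vector([4, 5, 6]),
           bow_vector([7, 8, 9]), bow_vector([7, 8, 10])]
current = bow_vector([1, 2, 3, 3, 11])
candidate = loop_candidate(current, history)
```

Here the current frame shares most of its words with frame 0, so that frame becomes the loop closure candidate to be passed on to geometric verification.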

[0064] (2-3) Detect special effects in the video. Feature point matching is used to detect camera shake, and the Euclidean distance between ORB matching points is used to determine whether the camera has moved within a given period, which identifies a stationary camera. Both special effects are passed to step 4.
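The stillness test can be illustrated with a small sketch. It assumes the matched feature points are available as pairs of pixel coordinates and declares the camera stationary when the mean Euclidean displacement stays below a threshold for several consecutive frame pairs. The threshold `max_pixels` and the window `min_frames` are hypothetical tuning values.

```python
import math


def mean_displacement(matches):
    """Mean Euclidean distance, in pixels, between matched points.

    matches: list of ((x_prev, y_prev), (x_curr, y_curr)) pairs.
    """
    if not matches:
        raise ValueError("no matches to measure")
    return sum(math.dist(p, q) for p, q in matches) / len(matches)


def is_stationary(displacements, max_pixels=0.5, min_frames=3):
    """True when the last min_frames displacements are all below max_pixels."""
    recent = displacements[-min_frames:]
    return len(recent) == min_frames and all(d < max_pixels for d in recent)


still_pair = [((10.0, 10.0), (10.3, 10.4)), ((50.0, 20.0), (50.0, 20.0))]
moving_pair = [((10.0, 10.0), (13.0, 14.0))]

d_still = mean_displacement(still_pair)    # about 0.25 px
d_moving = mean_displacement(moving_pair)  # 5.0 px

held = is_stationary([d_moving, d_still, d_still, d_still])
interrupted = is_stationary([d_still, d_moving, d_still])
```

Requiring several consecutive quiet frame pairs avoids flagging a single low-motion frame in the middle of a camera move.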

[0065] In step 3, the specific methods include:

[0066] (3-1) Optimize the camera pose data obtained in step 2 based on the g2o optimization library. For the relative camera pose calculated in (2-1), perform pose graph optimization by establishing multiple unary connections between multiple map points in the 3D world corresponding to a single image frame, forming a graph structure to optimize the camera pose data and reduce errors. Perform local BA optimization on common-view keyframes by adding map points corresponding to multiple keyframes with common-view relationships into the graph structure for optimization. Perform essential graph optimization on the loop closure constraint information detected in (2-2) by adding map points corresponding to common-view keyframes and loop closure frames to the graph optimization, connecting the beginning and end poses of the loop closure in the graph structure for camera pose optimization. Finally, after all of the above is completed, add the map points corresponding to the global keyframes to the graph optimization to obtain the globally optimized camera pose.
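How a loop closure constraint reduces accumulated drift can be seen in a deliberately simplified sketch. It assumes a one-dimensional pose graph with equally weighted edges, solved by Gauss-Seidel sweeps. It does not use g2o or its API, and the function name and measurements are hypothetical.

```python
def optimise_pose_graph(num_poses, edges, sweeps=200):
    """Least-squares 1D pose graph solved by Gauss-Seidel sweeps.

    edges: list of (i, j, measured) meaning x[j] - x[i] should equal measured.
    Pose 0 is held fixed at 0 to remove the gauge freedom. Each update sets
    one pose to the mean of the positions implied by its incident edges,
    which minimises the squared error with respect to that pose alone.
    """
    x = [0.0] * num_poses
    for _ in range(sweeps):
        for k in range(1, num_poses):
            targets = []
            for i, j, m in edges:
                if j == k:
                    targets.append(x[i] + m)
                elif i == k:
                    targets.append(x[j] - m)
            if targets:
                x[k] = sum(targets) / len(targets)
    return x


# Odometry says: forward 1, forward 1, back 1, back 1.2 (0.2 of drift).
odometry = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, -1.0), (3, 4, -1.2)]
# Loop closure says pose 4 coincides with pose 0.
loop_closure = [(0, 4, 0.0)]

poses = optimise_pose_graph(5, odometry + loop_closure)
```

Dead reckoning alone would end at -0.2. With the loop closure edge the 0.2 discrepancy is spread evenly over the five edges, giving poses close to 0, 1.04, 2.08, 1.12, and -0.04.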

[0067] This invention also provides product examples:

[0068] A system for generating motion data based on visual algorithms for POV video processing includes:

[0069] The data reading module is configured to read camera image information and acquire internal camera parameters.

[0070] The visual odometry module is configured to estimate camera motion by extracting and matching feature points from adjacent video frame images.

[0071] The loop closure detection module is configured to determine whether the camera has returned to a previously visited environmental area based on the acquired image features, establish loop closure constraints and add them to the backend global optimization module, thereby eliminating the cumulative error of camera pose estimation and maintaining the accuracy of data acquisition over a long period of time.

[0072] The backend optimization module is configured to receive camera pose measurements from the visual odometry module at different times, as well as loop closure detection information, optimize them, reduce errors, and obtain a globally consistent camera motion trajectory.

[0073] The special effects detection module is configured to detect camera shake or stillness in the video, providing positional instructions for the motion simulation module to generate special effects.

[0074] The motion simulation module is configured to map the generated global camera motion trajectory parameters onto the motion range of the motion seat through a washout filter to form the main drive signal. In addition, it generates a shaking signal or a stationary signal according to the special effect position command, and superimposes it on the main drive signal to generate the corresponding motion data.
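The mapping performed by the motion simulation module might look like the following sketch for a single axis. It assumes a first-order discrete high-pass filter as the filtering stage, a constant gain, and hard clamping to the seat's travel. The filter order, the parameter names, and all numeric values are assumptions and not details given in the text.

```python
def high_pass(signal, dt, time_constant):
    """First-order discrete high-pass filter.

    Sustained input decays back to zero, so the seat returns to neutral and
    keeps travel available for the next movement.
    """
    if not signal:
        return []
    alpha = time_constant / (time_constant + dt)
    out = []
    prev_in, prev_out = signal[0], 0.0
    for value in signal:
        prev_out = alpha * (prev_out + value - prev_in)
        prev_in = value
        out.append(prev_out)
    return out


def seat_command(trajectory, effects, dt, time_constant, gain, limit):
    """Main drive signal plus superimposed effect signal, clamped to +/- limit."""
    washed = high_pass(trajectory, dt, time_constant)
    command = []
    for main, effect in zip(washed, effects):
        value = gain * main + effect
        command.append(max(-limit, min(limit, value)))
    return command


trajectory = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # camera steps to a new offset
effects = [0.0, 0.0, 0.0, 0.05, -0.05, 0.0]   # hypothetical shake command
command = seat_command(trajectory, effects, dt=0.1,
                       time_constant=0.4, gain=0.5, limit=0.3)
```

The step in the trajectory produces an initial push that saturates at the travel limit and then fades, while the shake values are added on top of the fading main signal.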

[0075] In the above system, the visual odometry module includes:

[0076] The feature extraction and matching module is configured to extract ORB features from the image. ORB is a locally invariant feature detector, meaning it remains robust even when the image undergoes rotational transformations. It extracts FAST corner points by detecting areas of significant local pixel grayscale changes in the image, calculates BRIEF descriptors, and uses the Fast Approximate Nearest Neighbor (FLANN) algorithm to match ORB feature points in adjacent video frames.

[0077] The camera pose initial estimation module is configured to use the RANSAC strategy to eliminate incorrect matching pairs based on the matching point pairs, and to calculate the relative camera pose using the epipolar geometric constraints of the two images. The RANSAC algorithm first randomly selects a sample of points, then fits a model to the sample (a straight line in the classic illustration), and finally checks which points are inliers of that model to verify whether they are correct matching points.
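The sample, fit, and verify loop of RANSAC is easiest to see in the line-fitting form mentioned above. The sketch below is that textbook form only; in the pipeline described here the fitted model would be an essential matrix estimated from matched point pairs rather than a line. The threshold, iteration count, and data are hypothetical.

```python
import random


def fit_line(p, q):
    """Line through p and q as (a, b, c) with a*x + b*y + c = 0 and (a, b) of unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # the two sample points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold=0.1, iterations=100, seed=0):
    """Return (line, inliers) for the sampled line supported by the most points."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)          # 1. random minimal sample
        line = fit_line(p, q)                 # 2. fit the model
        if line is None:
            continue
        a, b, c = line
        inliers = [pt for pt in points        # 3. verify against all points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


on_line = [(float(x), 2.0 * x + 1.0) for x in range(8)]   # y = 2x + 1
outliers = [(1.0, 9.0), (5.0, -3.0), (6.0, 20.0)]
line, inliers = ransac_line(on_line + outliers)
```

Points that fall outside the threshold of the best model play the role of the incorrect matching pairs that the module discards.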

[0078] In this embodiment, the loop closure detection module includes:

[0079] The appearance verification module is configured to compare the current frame image with the processed video frame images one by one based on the Bag-of-Words (BoW) model, and add the image with the highest similarity to the loop closure candidate frame set.

[0080] The geometric verification module is configured to traverse the loop closure candidate frame set determined by the appearance verification module, use the RANSAC method to solve the relative camera pose between the current frame and the loop closure candidate frames, perform reprojection matching, determine the number of matches, and finally determine the loop closure.
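The reprojection matching used for geometric verification can be sketched as a simple inlier count. The code assumes map points expressed in the candidate frame's coordinates, a relative pose (R, t) with Xc = R X + t, and a pixel error threshold. The `max_error` and `min_matches` values and the function names are hypothetical.

```python
import math


def project(K, R, t, X):
    """Project 3D point X into pixels for a camera with Xc = R X + t.

    Returns None when the point lies behind the camera.
    """
    xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def count_reprojection_matches(K, R, t, map_points, observations, max_error=2.0):
    """Count map points whose projection lies within max_error pixels of its observation."""
    count = 0
    for X, (u_obs, v_obs) in zip(map_points, observations):
        uv = project(K, R, t, X)
        if uv is None:
            continue
        if math.hypot(uv[0] - u_obs, uv[1] - v_obs) <= max_error:
            count += 1
    return count


def accept_loop(num_matches, min_matches=3):
    """A loop closure is accepted when enough points reproject consistently."""
    return num_matches >= min_matches


K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
map_points = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 4.0), (1.0, 1.0, 10.0)]
observations = [(320.5, 240.0), (421.0, 241.0), (320.0, 365.0), (400.0, 290.0)]

num_matches = count_reprojection_matches(K, R, t, map_points, observations)
is_loop = accept_loop(num_matches)
```

Three of the four observations agree with the projected map points, so this candidate would be accepted under the assumed threshold of three matches.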

[0081] In this embodiment, the backend optimization module includes:

[0082] The pose graph optimization module is configured to establish multiple unary connections between multiple map points in the corresponding 3D world within a single image frame, forming a graph structure to optimize the relative camera pose.

[0083] The local BA optimization module is configured to put the map points corresponding to multiple keyframes with co-visibility into a graph structure for optimization.

[0084] The essential graph optimization module adds map points corresponding to common keyframes and loop closure frames to the graph optimization, connecting the poses of the beginning and end of the loop closure in the graph structure to optimize the camera pose.

[0085] The global BA optimization module is configured to add map points corresponding to global keyframes to the graph optimization and perform global camera pose optimization.

[0086] The graph structures in this embodiment are all based on the graph structures used in the g2o optimization library, and the selected motion seat is a six-degree-of-freedom parallel mechanism STEWART platform.

[0087] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0088] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0089] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0090] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0091] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

[0092] While the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.

Claims

1. A method for generating motion data based on first-person perspective video, characterized in that it includes the following steps:
acquiring video image information to determine the camera's internal parameters;
extracting feature points from adjacent video frames, estimating camera motion, determining whether the camera has returned to a previously visited environmental area based on the obtained image features, and establishing loop closure constraints;
detecting camera shake or stillness in the video and providing positional instructions for generating special effects;
optimizing the camera poses measured at different times together with the loop closure detection information to obtain globally consistent camera motion trajectory data;
mapping the generated globally consistent camera motion trajectory data onto the motion range of the motion seat as the main drive signal, and superimposing the position commands generated by special effects to generate corresponding motion data;
wherein the specific process of acquiring video image information and determining the camera's internal parameters is as follows: the video data is processed into frame images, and the camera's internal parameters are determined based on the frame width and frame height of the frame images, combined with the camera's focal length, which is set to a fixed value;
the process of determining whether the camera has returned to a previously visited environmental area based on the obtained image features and establishing loop closure constraints includes: comparing the current frame image with the processed video frame images one by one, adding the image with the highest similarity to the loop closure candidate frame set to complete the coarse search, traversing the determined loop closure candidate frame set, solving the relative camera pose between the current frame and the loop closure candidate frames, performing reprojection matching, determining the number of matches, and finally determining the loop closure;
the process of optimizing camera pose includes: establishing multiple unary connections between multiple map points in the 3D world corresponding to a single image frame to form a graph structure; and performing local optimization on shared keyframes by placing the map points corresponding to multiple keyframes with shared view relationships into the graph structure for optimization.

2. The method for generating motion data based on first-person perspective video as described in claim 1, characterized in that extracting feature points from adjacent video frames and estimating the camera motion comprises: detecting areas of the image with significant local changes in pixel grayscale, extracting FAST corner points, and computing feature descriptors; matching the feature points of adjacent video frames using a fast approximate nearest neighbor algorithm; eliminating incorrect matching pairs; and calculating the relative pose of the camera using the epipolar geometric constraints of the two images, until the number of successful pairings meets a predetermined value.
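
The epipolar constraint mentioned in the last step can be illustrated with a short Python sketch. This is only a check of the constraint for a known pose, not the pose estimation procedure of the claim. It assumes normalized image coordinates and the convention P2 = R * P1 + t, and all function names and numeric values are hypothetical.

```python
def skew(t):
    """Skew-symmetric matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def essential_matrix(rotation, translation):
    """E = [t]x R for the relative pose P2 = R * P1 + t."""
    return matmul(skew(translation), rotation)


def epipolar_residual(E, x1, x2):
    """x2^T E x1 for normalized homogeneous points; zero for a perfect match."""
    Ex1 = [sum(E[i][k] * x1[k] for k in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))


def project(point):
    """Project a 3D point in camera coordinates to normalized image coordinates."""
    x, y, z = point
    return [x / z, y / z, 1.0]


# Hypothetical pose: no rotation, one unit of translation along x.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
E = essential_matrix(R, t)

P1 = [0.5, 0.2, 4.0]                    # 3D point in the first camera frame
P2 = [P1[0] + t[0], P1[1], P1[2]]       # same point in the second camera frame
good = epipolar_residual(E, project(P1), project(P2))
bad = epipolar_residual(E, project(P1), [0.375, 0.30, 1.0])  # wrong match
```

A correct correspondence gives a residual of zero up to rounding, while the mismatched point does not. In practice the direction is reversed: the essential matrix is estimated from many matched pairs, and R and t are then recovered from it.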

3. The method for generating motion data based on first-person perspective video as described in claim 1, characterized in that detecting camera shake or stillness in the video and providing position commands for generating special effects comprises: detecting camera shake in the video; and determining whether the camera is still by using the Euclidean distance between matched feature points to judge whether the camera has moved within a certain period of time.
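
The stillness test based on the Euclidean distance of matched feature points can be sketched as follows. This is a minimal Python illustration under assumptions not stated in the claim: the mean displacement per frame pair is used as the statistic, the threshold of 0.5 pixels is arbitrary, and the names `mean_displacement` and `is_still` are hypothetical.

```python
import math


def mean_displacement(matches):
    """Mean Euclidean distance, in pixels, between matched feature points.

    matches: list of ((x1, y1), (x2, y2)) pairs for two adjacent frames.
    """
    if not matches:
        raise ValueError("at least one matched pair is required")
    return sum(math.dist(p, q) for p, q in matches) / len(matches)


def is_still(window, threshold_px=0.5):
    """True when every adjacent-frame pair in the time window stays below the threshold.

    window: list of match lists, one per adjacent-frame pair in the period.
    threshold_px: assumed tuning value, not taken from the patent.
    """
    return all(mean_displacement(m) < threshold_px for m in window)


# Hypothetical data: two frame pairs with sub-pixel motion, one with a clear shift.
still_window = [
    [((10.0, 10.0), (10.1, 10.0)), ((50.0, 20.0), (50.0, 20.2))],
    [((10.1, 10.0), (10.1, 10.1)), ((50.0, 20.2), (50.1, 20.2))],
]
moving_window = [[((10.0, 10.0), (14.0, 13.0))]]

still_result = is_still(still_window)
moving_result = is_still(moving_window)
```

Requiring every frame pair in the window to stay below the threshold reflects the "within a certain period of time" condition. A median or a per-point test would be equally plausible choices.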

4. The method for generating motion data based on first-person perspective video as described in claim 1, characterized in that optimizing the loop closure detection information comprises: performing essential graph optimization on the detected loop closure constraint information, wherein the map points corresponding to the common keyframes and the loop closure frames are added to the graph optimization, and the poses at the beginning and the end of the loop are connected in the graph structure to optimize the camera pose; and adding the map points corresponding to the global keyframes to the graph optimization to obtain the globally optimized camera pose.
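
The effect of connecting the two ends of a loop in a graph can be shown with a deliberately simplified one-dimensional analogue. The Python sketch below is not the essential graph optimization of the claim, which operates on full camera poses and map points. It assumes scalar poses, equally weighted edges, a single loop closure edge, and the first pose fixed at zero, and the function name `close_loop_1d` is hypothetical.

```python
def close_loop_1d(odometry, loop_measurement):
    """Least-squares poses for a 1D chain with one loop closure edge.

    odometry[k] is the measured displacement from pose k to pose k + 1.
    loop_measurement is the measured displacement from the first pose to
    the last pose, supplied by loop closure detection.

    With n equally weighted odometry edges plus the loop edge, the
    least-squares solution spreads the accumulated error evenly, so each
    of the n + 1 edges absorbs error / (n + 1).
    """
    n = len(odometry)
    error = sum(odometry) - loop_measurement
    correction = error / (n + 1)
    poses = [0.0]  # first pose is fixed as the reference
    for step in odometry:
        poses.append(poses[-1] + step - correction)
    return poses


# Hypothetical drift: odometry reports 4.0 in total, the loop edge reports 3.5.
poses = close_loop_1d([1.0, 1.0, 1.0, 1.0], 3.5)
```

Here the accumulated drift of 0.5 is shared by the four odometry edges and the loop edge, so the final pose moves from 4.0 to about 3.6. The same principle lets a loop closure constraint reduce cumulative error along the whole trajectory.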

5. A motion data generation system based on first-person perspective video, employing the method described in claim 1, characterized in that it comprises: a data reading module configured to acquire video image information and determine the camera's internal (intrinsic) parameters; a visual odometry module configured to extract feature points from adjacent video frame images and estimate the camera motion; a loop closure detection module configured to determine, based on the acquired image features, whether the camera has returned to a previously visited area of the environment, and to establish loop closure constraints; a special effects detection module configured to detect camera shake or stillness in the video and provide position commands for generating special effects; a backend optimization module configured to optimize the camera poses measured at different times together with the loop closure detection information to obtain globally consistent camera motion trajectory data; and a motion simulation module configured to map the globally consistent camera motion trajectory data onto the motion range of the motion seat as the main drive signal, and to superimpose the position commands generated for the special effects to produce the corresponding motion data.
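
The final mapping step of the motion simulation module can be sketched for a single axis as follows. This Python example is an illustration under assumptions the claim does not state: a linear min-max rescaling into the seat range, additive special effect offsets, and clamping of the sum to the seat limits. All names and values are hypothetical.

```python
def map_to_seat_range(trajectory, seat_min, seat_max):
    """Linearly rescale one trajectory axis into [seat_min, seat_max]."""
    lo, hi = min(trajectory), max(trajectory)
    if hi == lo:
        # A constant trajectory maps to the middle of the seat range.
        mid = (seat_min + seat_max) / 2.0
        return [mid for _ in trajectory]
    scale = (seat_max - seat_min) / (hi - lo)
    return [seat_min + (v - lo) * scale for v in trajectory]


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def generate_motion_data(trajectory, effect_offsets, seat_min, seat_max):
    """Main drive signal plus superimposed effect offsets, kept within seat limits."""
    main_drive = map_to_seat_range(trajectory, seat_min, seat_max)
    return [clamp(m + e, seat_min, seat_max)
            for m, e in zip(main_drive, effect_offsets)]


# Hypothetical axis: camera travel of 0..4 units, seat stroke of +/-0.1 m.
motion = generate_motion_data(
    trajectory=[0.0, 2.0, 4.0],
    effect_offsets=[0.0, 0.01, 0.05],
    seat_min=-0.1,
    seat_max=0.1,
)
```

The clamp keeps the seat within its stroke when an effect offset would push the command past the limit, as in the last sample. A real system would probably also limit velocity and acceleration, which this sketch leaves out.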

6. The motion data generation system based on first-person perspective video as described in claim 5, characterized in that the visual odometry module comprises: a feature extraction and matching module configured to extract image features by detecting areas of the image with significant local changes in pixel grayscale, extracting FAST corner points, and calculating descriptors, and to match the feature points of adjacent video frames using a fast approximate nearest neighbor algorithm; and a camera pose initial estimation module configured to eliminate incorrect matching pairs from the matched point pairs and to calculate the relative camera pose using the epipolar geometric constraints of the two images.

7. The motion data generation system based on first-person perspective video as described in claim 5, characterized in that the loop closure detection module comprises: an appearance verification module configured to compare the current frame image one by one with the already processed video frame images and to add the image with the highest similarity to a loop closure candidate frame set; and a geometric verification module configured to traverse the loop closure candidate frame set determined by the appearance verification module, solve the relative camera pose between the current frame and each loop closure candidate frame, perform reprojection matching, determine the number of matches, and make the final loop closure decision.
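
The two-stage check (appearance first, geometry second) can be outlined in Python as follows. The claim does not specify the similarity measure or the acceptance rule, so this sketch assumes cosine similarity over hypothetical per-frame descriptor vectors and a minimum number of reprojection matches. The pose solving and reprojection steps are not implemented here; their match counts are passed in as input.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coarse_search(current, processed):
    """Compare the current frame with each processed frame; return (best index, score)."""
    scores = [cosine_similarity(current, d) for d in processed]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def confirm_loop(candidates, match_counts, min_matches=30):
    """Return the candidate with the most reprojection matches, or None.

    match_counts maps a candidate frame index to its number of reprojection
    matches; min_matches is an assumed threshold, not taken from the patent.
    """
    confirmed = [c for c in candidates if match_counts.get(c, 0) >= min_matches]
    if not confirmed:
        return None
    return max(confirmed, key=lambda c: match_counts[c])


# Hypothetical descriptors for three processed frames and the current frame.
processed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
current = [1.0, 0.1, 0.0]

best, score = coarse_search(current, processed)
accepted = confirm_loop([best], {best: 45})
rejected = confirm_loop([best], {best: 12})
```

The appearance stage only nominates candidates; a candidate becomes a loop closure only if the geometric stage finds enough matches, which guards against visually similar but distinct places.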

8. The motion data generation system based on first-person perspective video as described in claim 5, characterized in that the backend optimization module comprises: a pose graph optimization module configured to establish multiple unary connections for the multiple map points in the 3D world that correspond to a single image frame, forming a graph structure, so as to optimize the relative camera pose; a local BA optimization module configured to place the map points corresponding to multiple keyframes with co-view relationships into the graph structure for optimization; an essential graph optimization module configured to add the map points corresponding to the common keyframes and the loop closure frames to the graph optimization, and to connect the poses at the beginning and the end of the loop in the graph structure to optimize the camera pose; and a global BA optimization module configured to add the map points corresponding to the global keyframes to the graph optimization and perform global camera pose optimization.