A management method and system for querying spatiotemporal multimodal data required for embodied intelligence training

By constructing hardware meta-languages and embodied configurations, the problems of time-varying topological relationships and heterogeneous data consistency in multimodal data management of embodied intelligent robots are solved, realizing efficient management and querying of multimodal data and improving data availability and query convenience.

CN122309583A · Pending · Publication Date: 2026-06-30 · WISDOM CORNERSTONE (SHANGHAI) TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WISDOM CORNERSTONE (SHANGHAI) TECHNOLOGY CO LTD
Filing Date
2026-04-01
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing database systems are unable to effectively manage multimodal sensor data from embodied intelligent robots, especially in handling time-varying topological relationships, spatial consistency of heterogeneous data, and time alignment, resulting in low data availability and logical fragmentation.

Method used

We construct hardware meta-languages and embodied configurations, represent robot coordinate system transformation relationships through spatial graphs, generate topological flow records of hardware mounting events, and use ALIGN operators and expiration protection mechanisms for data querying to achieve efficient management of multimodal data.

Benefits of technology

It ensures spatial consistency and temporal alignment of multimodal data, supports mapping between traditional databases and robot-specific concepts, enables efficient querying and data traceability, and improves the convenience and accuracy of data retrieval.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122309583A_ABST
Patent Text Reader

Abstract

This invention relates to the field of embodied intelligent robots, and provides a management method and system for spatiotemporal multimodal data querying required for embodied intelligent training. The method includes: setting hardware metadata, which stores device definitions for multiple hardware devices, including performance definitions, category definitions, and flow mode definitions for multimodal devices; creating an embodied configuration, including a spatial graph, a hardware inventory registry, and hardware mounting configurations; automatically generating one or more data streams based on the device definitions and hardware mounting configurations, wherein the data streams continuously receive time-series observations generated by the hardware devices to form a time data stream, which includes a topology stream used to record topology events of hardware device installation and uninstallation; organizing the time data stream into segments, each segment being accompanied by a descriptive label representing a complete recording session; and, in response to receiving a data query request including one or more segments, traversing and executing the data query request to obtain the query results.

Description

Technical Field

[0001] This invention relates to the field of embodied intelligent robots, and in particular to a management method and system for querying spatiotemporal multimodal data required for embodied intelligent training.

Background Technology

[0002] Embodied intelligence refers to intelligent systems that perceive, learn, and dynamically interact with their environment through a physical body. It represents the evolution of artificial intelligence from the "virtual world" to the "physical world." With embodied intelligence, robots can behave much like natural organisms, interacting closely with the real world and completing tasks from natural-language commands. Traditional database systems, such as time-series databases, Geographic Information System (GIS) databases, and SQL database systems, face the following problems when storing and managing sensor data for such embodied intelligent robots.

First, a robot consists of various sensors and actuators, which often generate multiple data streams at different frequencies, such as RGB, depth, and IMU. Current data management systems can store this data but treat the streams as unrelated, so they cannot support effective retrieval or training.

Second, because robots form time-varying transformation trees (TF trees) during movement, the spatial relationships between sensors also change as the robot moves. Traditional database systems cannot accurately represent this temporal and spatial complexity.

Third, because robots support tool changers, reconfigurable payloads, and modular components that can be attached or detached during operation, a database system must be able to represent and query this time-varying topology, which existing systems cannot do.

Fourth, different sensors in a robot system have different sampling frequencies. For example, cameras, inertial measurement units (IMUs), and joint-state sources typically operate at 30 Hz, 400 Hz, and 500 Hz, respectively. The JOIN operations of traditional database systems cannot correctly align these time-mismatched data streams.

Finally, robot applications require queries for physical properties, such as the center of mass and the inertia tensor, concepts that traditional databases do not have.

[0003] Therefore, existing general-purpose data storage technologies lack intrinsic support for the unique "spatiotemporal-physical" semantics of embodied intelligent systems. Specifically, existing technologies struggle to handle time-varying topological relationships caused by the dynamic configuration evolution of robots, cannot automatically perform physical spatial consistency verification and precise time alignment of heterogeneous multimodal data at the storage layer, and lack an adaptive instantiation mechanism based on hardware metadata templates. This results in low data availability and logical fragmentation when performing complex task backtracking and physical attribute queries. A data governance architecture that integrates dynamic spatiotemporal maps and physical perception is urgently needed to address these technical bottlenecks.

Summary of the Invention

[0004] To overcome the aforementioned technical deficiencies, the present invention aims to provide a management method and system for spatiotemporal multimodal data querying required for embodied intelligent training. This method builds hardware primitives for embodied intelligent robots, creates embodied configurations of spatial graphs and hardware mounting configurations, automatically generates temporal data streams containing topological flows, and organizes them according to recorded session fragments, thereby achieving efficient management of robot sensor data.

[0005] This invention discloses a management method for querying spatiotemporal multimodal data required for embodied intelligence training, comprising the following steps:

Setting hardware metadata, which stores standardized device definitions for multiple hardware devices. Each device definition is a hardware metadata template decoupled from physical hardware instances. The hardware metadata template includes the capability mode and the multimodal data mode of the hardware device, and the multimodal data mode includes the boundaries of data generation, the data type, and the storage method.

Creating an embodied configuration, which includes a spatial graph, a hardware inventory registry, and a hardware mounting configuration. The spatial graph is configured with the static and dynamic transformation relationships between the robot's coordinate systems. The hardware inventory registry registers the hardware devices to be mounted as available system resources. The hardware mounting configuration mounts the instantiated hardware metadata template to a coordinate system of the spatial graph.

For any hardware device mounting, automatically generating one or more data streams based on the device definition and the hardware mounting configuration. Each data stream continuously receives the time-series observations generated by the hardware device to form a time data stream. The time data streams include a topology stream, which records the topology events of hardware devices being mounted and unmounted.

Organizing the time data streams into segments, each accompanied by a descriptive label and representing a complete recording session.

In response to receiving a data query request that includes one or more segments, traversing and executing the data query request to obtain the query results.

[0006] Preferably, the method further includes the following steps: At the physical storage layer, the changes in the embodied configuration are continuously monitored. In response to changes in the spatial map or hardware mount, the configuration fingerprints of consecutive segments are compared. When the configuration fingerprints are different, the time-series observations are automatically truncated and stored as segments. At the logical query layer, in response to the start and end times of the query, a logical session view spanning one or more segments is constructed, and the logical session view does not display the segments.
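The fingerprint-based segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how a configuration fingerprint is computed, so the SHA-256 hash of a canonical JSON form, and the names `config_fingerprint`, `Segment`, and `SegmentWriter`, are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form of the embodied configuration (assumed scheme)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Segment:
    fingerprint: str
    observations: list = field(default_factory=list)  # (timestamp, value) pairs


class SegmentWriter:
    """Physical layer: starts a new segment whenever the fingerprint changes."""

    def __init__(self):
        self.segments = []

    def append(self, timestamp: float, value, config: dict) -> None:
        fp = config_fingerprint(config)
        if not self.segments or self.segments[-1].fingerprint != fp:
            self.segments.append(Segment(fp))  # truncate and open a new segment
        self.segments[-1].observations.append((timestamp, value))

    def session_view(self, start: float, end: float) -> list:
        """Logical layer: one flat view across segments, hiding the boundaries."""
        return [
            (t, v)
            for seg in self.segments
            for (t, v) in seg.observations
            if start <= t <= end
        ]


# Hypothetical configurations: a gripper is mounted before the third sample.
cfg_a = {"mounts": {"camera": "head"}}
cfg_b = {"mounts": {"camera": "head", "gripper": "tool0"}}
writer = SegmentWriter()
writer.append(0.0, "obs0", cfg_a)
writer.append(1.0, "obs1", cfg_a)
writer.append(2.0, "obs2", cfg_b)
view = writer.session_view(0.5, 2.0)
```

Here the configuration change yields two stored segments, while `session_view` returns the observations in the requested time range as a single sequence.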

[0007] Preferably, the traversal execution of the data query request includes: The query requests for multi-source heterogeneous sensor data are executed by traversing through the ALIGN operator. When the robot structure undergoes dynamic changes, topological flow queries can be used to look up the coordinate relationship transformations in the spatial graph at any timestamp. When the robot's structure undergoes dynamic changes, physical perception is used to query the robot's overall physical properties.

[0008] Preferably, executing the query request for multi-source heterogeneous sensor data by traversal through the ALIGN operator includes the following steps:

Receive a query request for multi-source heterogeneous sensor data that specifies a first time data stream and a second time data stream to be time-aligned, the two streams having different frequencies.

For each first timestamp in the first time data stream, loop to identify the latest observation in the second time data stream whose second timestamp is less than or equal to the first timestamp.

Calculate the difference between the first timestamp and the second timestamp as the expiration value, and compare it with a preset threshold.

If the expiration value is greater than the threshold, return the first time-series observation of the first time data stream together with a null value.

If the expiration value is less than or equal to the threshold, define a time aggregation window with the first timestamp as its center or boundary, obtain all observations of the second time data stream within the window, perform a statistical aggregation on them, and return the result aligned with the first time-series observation.
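The ALIGN steps above can be illustrated with a minimal sketch. The function name `align`, its parameters, the choice of the mean as the statistical aggregation, and the use of the first timestamp as the window's trailing boundary are all assumptions made for the example. Returning a null value when the second stream has no earlier observation is likewise an assumption.

```python
from bisect import bisect_left, bisect_right
from statistics import fmean


def align(first, second, max_staleness, window):
    """ASOF-style alignment of `second` onto the timestamps of `first`.

    first, second: lists of (timestamp, value), sorted by timestamp.
    max_staleness: preset threshold for the expiration value.
    window: length of the aggregation window, which ends at the
            first-stream timestamp (the "boundary" variant).
    """
    second_ts = [t for t, _ in second]
    out = []
    for t1, v1 in first:
        # Latest second-stream observation with timestamp <= t1.
        i = bisect_right(second_ts, t1) - 1
        if i < 0:
            out.append((t1, v1, None))  # assumption: nothing earlier -> null
            continue
        staleness = t1 - second_ts[i]   # the "expiration value"
        if staleness > max_staleness:
            out.append((t1, v1, None))  # expired: return null, not stale data
            continue
        # Window [t1 - window, t1]; always keep the latest valid sample.
        lo = min(bisect_left(second_ts, t1 - window), i)
        values = [v for _, v in second[lo : i + 1]]
        out.append((t1, v1, fmean(values)))
    return out


# Illustrative data: three camera frames and four higher-rate scalar samples.
frames = [(1.0, "frame0"), (2.0, "frame1"), (5.0, "frame2")]
samples = [(0.5, 10.0), (0.9, 20.0), (1.9, 30.0), (2.0, 50.0)]
aligned = align(frames, samples, max_staleness=0.5, window=0.2)
```

In this example the frame at 5.0 has no sample within the threshold, so it is paired with `None` rather than with the last, expired value.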

[0009] Preferably, the transformation of coordinate relationships in a spatial graph using topological flow queries at any timestamp includes the following steps: Receive data query requests based on topological events and construct the time-varying spatial graph state under the specified timestamp; Determine whether a path exists between the source coordinate system and the target coordinate system in the time-varying spatial graph state; If no path exists, return null; if a path exists, combine the transformations on the path and return the transformation result.
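As an illustration of the query steps above, the sketch below replays topology events up to a timestamp, checks whether a path exists, and composes the transformations along it. To stay short it assumes every transformation is a pure translation; a real system would compose full rigid transformations. The names `graph_at` and `lookup_transform` and the event tuple layout are hypothetical.

```python
from collections import deque


def graph_at(static_edges, topology_events, t):
    """Replay mount/unmount events up to time t on top of the static edges.

    Edges map (parent, child) -> translation of child expressed in parent.
    Events are (timestamp, kind, parent, child, offset).
    """
    edges = dict(static_edges)
    for ts, kind, parent, child, offset in sorted(topology_events, key=lambda e: e[0]):
        if ts > t:
            break
        if kind == "mount":
            edges[(parent, child)] = offset
        elif kind == "unmount":
            edges.pop((parent, child), None)
    return edges


def lookup_transform(edges, source, target):
    """Return the translation of `target` in `source`, or None if no path exists."""
    neighbours = {}
    for (parent, child), off in edges.items():
        neighbours.setdefault(parent, []).append((child, off))
        # Traversing an edge backwards negates the translation.
        neighbours.setdefault(child, []).append((parent, tuple(-x for x in off)))
    queue = deque([(source, (0.0, 0.0, 0.0))])
    seen = {source}
    while queue:
        frame, acc = queue.popleft()
        if frame == target:
            return acc
        for nxt, off in neighbours.get(frame, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, tuple(a + b for a, b in zip(acc, off))))
    return None


# Illustrative robot: a gripper is mounted at t=10 and removed at t=20.
static = {("base", "arm"): (0.0, 0.0, 0.5)}
events = [
    (10.0, "mount", "arm", "gripper", (0.0, 0.0, 0.25)),
    (20.0, "unmount", "arm", "gripper", None),
]
during = lookup_transform(graph_at(static, events, 15.0), "base", "gripper")
after = lookup_transform(graph_at(static, events, 25.0), "base", "gripper")
```

The same query returns a composed transformation while the gripper is mounted and `None` once the path no longer exists.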

[0010] Preferably, constructing the time-varying spatial graph state at the specified timestamp includes: loading the static transformation relationships in the robot's embodied configuration; on the basis of the static transformation relationships, combining the timestamped dynamic transformation relationships and completing, by interpolation, the coordinate system offsets corresponding to the specified timestamp; and, based on the topological flow, adding or removing coordinate systems and correcting the connection relationships between coordinate systems to obtain the time-varying spatial graph state.
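The interpolation step can be sketched as follows, assuming a dynamic transformation is stored as timestamped translation samples and that linear interpolation is acceptable. The patent does not specify the interpolation scheme, and rotations would need a dedicated method such as spherical interpolation. The function name `interpolate` is hypothetical.

```python
from bisect import bisect_right


def interpolate(samples, t):
    """Linearly interpolate a timestamped translation at time t.

    samples: list of (timestamp, (x, y, z)) sorted by timestamp.
    Returns None outside the sampled range (no extrapolation).
    """
    times = [ts for ts, _ in samples]
    if not samples or t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t) - 1
    if i == len(samples) - 1:
        return samples[i][1]  # t coincides with the last sample
    (t0, p0), (t1, p1) = samples[i], samples[i + 1]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


# Illustrative dynamic edge sampled at two timestamps.
joint_samples = [(0.0, (0.0, 0.0, 0.0)), (2.0, (1.0, 0.0, 4.0))]
offset = interpolate(joint_samples, 0.5)
```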

[0011] Preferably, using physical perception to query the overall physical properties of the robot includes the following steps: Obtain and parse the robot definition file to obtain the inertial properties of the robot base and of each link. Based on the topology flow and the query timestamp, dynamically reconstruct the time-varying tool attributes, which represent how the mounting of tools on the robot changes over time. Receive a query request for the robot's physical properties, with the timestamp to be queried and the joint configuration as inputs; the joint configuration includes, but is not limited to, the position or angle of each joint of the robot. Based on the topology flow log, determine the tool attributes at the timestamp to be queried. Calculate the robot's physical properties from the inertial properties, the joint configuration, and the determined tool attributes. These physical properties include, but are not limited to, the center of mass, the inertia tensor, and the kinetic energy.
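A minimal sketch of the center-of-mass part of such a query is given below. It assumes that forward kinematics has already been applied, so each body's center is expressed in a common frame for the queried joint configuration. The inertia tensor and kinetic energy are omitted. The names `tool_at`, `center_of_mass`, and `robot_com` and the log layout are hypothetical.

```python
def tool_at(topology_log, t):
    """Return the tool mounted at time t, or None, by replaying the log.

    Log entries are (timestamp, kind, payload), where payload is the
    tool's (mass, (x, y, z)) for "mount" events.
    """
    tool = None
    for ts, kind, payload in sorted(topology_log, key=lambda e: e[0]):
        if ts > t:
            break
        tool = payload if kind == "mount" else None
    return tool


def center_of_mass(bodies):
    """Mass-weighted mean of body centers; bodies are (mass, (x, y, z))."""
    total = sum(m for m, _ in bodies)
    return tuple(sum(m * c[k] for m, c in bodies) / total for k in range(3))


def robot_com(link_bodies, topology_log, t):
    """Center of mass of the links plus whatever tool is mounted at time t."""
    bodies = list(link_bodies)
    tool = tool_at(topology_log, t)
    if tool is not None:
        bodies.append(tool)
    return center_of_mass(bodies)


# Illustrative values: two links, and a 5 kg tool mounted from t=3 to t=8.
links = [(10.0, (0.0, 0.0, 0.0)), (5.0, (0.0, 0.0, 1.0))]
log = [(3.0, "mount", (5.0, (0.0, 0.0, 2.0))), (8.0, "unmount", None)]
com_with_tool = robot_com(links, log, 5.0)
com_without_tool = robot_com(links, log, 9.0)
```

The result depends on the query timestamp because the tool contributes mass only while the topology log says it is mounted.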

[0012] Preferably, the method of using topological flow to query coordinate relationship transformations in a spatial graph at any timestamp further includes: For the floating base coordinate system of the base-mobile robot, a state estimation flow is set up, which represents the root transformation of the floating base coordinate system relative to the world coordinate system; By combining the root transformation, the static transformation relationship, the dynamic transformation relationship, and the connection relationship between coordinate systems, the time-varying spatial graph state of the base mobile robot is constructed.

[0013] Preferably, the method further includes: A derived stream is generated by fusing multiple data streams through a processor. When the data stream is version controlled, the derived stream is independent of the version control of the data stream. Maintain the dependency graph attributes between derived streams and the original data streams. When a segment is updated due to data backflow or configuration changes, automatically mark the affected derived stream as invalid or trigger incremental recalculation based on the dependency graph attributes.
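The dependency tracking described above might look like the following sketch. The class `DerivedStreamRegistry` and its methods are hypothetical, and the sketch only marks derived streams invalid; incremental recalculation is reduced to a version bump.

```python
from collections import defaultdict


class DerivedStreamRegistry:
    """Tracks which derived streams depend on which source streams."""

    def __init__(self):
        self.dependents = defaultdict(set)  # source -> derived streams
        self.valid = {}                     # derived stream -> bool
        self.version = {}                   # derived stream -> its own version

    def register(self, derived, sources):
        for s in sources:
            self.dependents[s].add(derived)
        self.valid[derived] = True
        self.version[derived] = 1

    def on_segment_updated(self, source):
        """Mark every transitively dependent derived stream as invalid."""
        stack, hit = [source], set()
        while stack:
            for d in self.dependents.get(stack.pop(), ()):
                if d not in hit:
                    hit.add(d)
                    self.valid[d] = False
                    stack.append(d)
        return hit

    def recompute(self, derived):
        """Stand-in for recalculation: revalidate and bump the derived version."""
        self.valid[derived] = True
        self.version[derived] += 1


# Illustrative streams: a fused pose built from IMU and odometry,
# and a trajectory built from the fused pose.
reg = DerivedStreamRegistry()
reg.register("fused_pose", ["imu", "odom"])
reg.register("trajectory", ["fused_pose"])
affected = reg.on_segment_updated("imu")
reg.recompute("fused_pose")
```

Keeping a separate version counter per derived stream mirrors the statement that derived streams are versioned independently of their sources.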

[0014] Preferably, the method further includes: The data stream is stored in unstructured and semi-structured formats. When executing a query request, a query engine is invoked to load the hardware metadata template as a semantic interpreter to dynamically parse the stored binary or JSON data and obtain the parsed data. The hardware metadata template also includes physical constraints, which are used to perform real-time verification on the parsed data.
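A hedged sketch of the template acting as a semantic interpreter follows. The template layout, the field names, and the acceleration limits are invented for illustration and do not describe any real device; only the JSON case is shown.

```python
import json

# Hypothetical template for an IMU stream: field types plus physical limits.
IMU_TEMPLATE = {
    "fields": {"ax": float, "ay": float, "az": float},
    "limits": {"ax": (-160.0, 160.0), "ay": (-160.0, 160.0), "az": (-160.0, 160.0)},
}


def interpret(raw: bytes, template: dict):
    """Parse a stored JSON record using the template and check its limits.

    Returns (record, violations), where violations lists the fields whose
    values fall outside the physical constraints of the template.
    """
    data = json.loads(raw)
    record = {name: cast(data[name]) for name, cast in template["fields"].items()}
    violations = [
        name
        for name, (lo, hi) in template["limits"].items()
        if not lo <= record[name] <= hi
    ]
    return record, violations


ok_record, ok_violations = interpret(b'{"ax": 0.1, "ay": 0, "az": 9.8}', IMU_TEMPLATE)
bad_record, bad_violations = interpret(b'{"ax": 0.0, "ay": 0.0, "az": 500.0}', IMU_TEMPLATE)
```

Because the stored bytes carry no schema of their own, the same payload can only be typed and validated once the matching template is loaded.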

[0015] This invention discloses a management system for querying spatiotemporal multimodal data required for embodied intelligence training, comprising:

A hardware metadata setting module, which sets hardware metadata that stores standardized device definitions for multiple hardware devices. Each device definition is a hardware metadata template decoupled from physical hardware instances. The hardware metadata template includes the capability mode and the multimodal data mode of the hardware device, and the multimodal data mode includes the boundaries of data generation, the data type, and the storage method.

An embodied configuration creation module, which creates an embodied configuration that includes a spatial graph, a hardware inventory registry, and a hardware mounting configuration. The spatial graph is configured with the static and dynamic transformation relationships between the robot's coordinate systems. The hardware inventory registry registers the hardware devices to be mounted as available system resources. The hardware mounting configuration mounts the instantiated hardware metadata template to a coordinate system of the spatial graph.

A data stream generation module, which, for any hardware device mounting, automatically generates one or more data streams based on the device definition and the hardware mounting configuration. Each data stream continuously receives the time-series observations generated by the hardware device to form a time data stream. The time data streams include a topology stream, which records the topology events of hardware devices being mounted and unmounted.

A query request execution module, which organizes the time data streams into segments, each accompanied by a descriptive label and representing a complete recording session, and which, in response to receiving a data query request that includes one or more segments, traverses and executes the data query request to obtain the query results.

[0016] Compared with existing technologies, the above technical solution has the following advantages:

1. The immutable metadata of the hardware is separated from the dynamic time data streams, and a unified hardware meta-language is constructed. A single hardware device can define multiple asynchronous multimodal data streams of different frequencies in the hardware meta-language. These definitions are mapped to actual data streams, so independent data streams of different frequencies are generated automatically, and all of them share the same coordinate system in the spatial graph. This solves the problem of existing technologies treating multimodal data as unrelated streams and ensures the spatial consistency of cross-modal data.

2. A spatial graph structure represents the transformation relationships between all coordinate systems of the robot, and supports querying both static transformations (fixed coordinate system relationships) and dynamic transformations (real-time relationships). In addition, a topology flow records hardware mounting and unmounting events, enabling topology validity verification: when spatial relationships are queried, the system automatically verifies whether the coordinate system path exists at the specified timestamp. This addresses the spatiotemporal complexity caused by robot motion and the inability to process time-varying topology data.

3. An ALIGN operator with expiration protection is provided. Taking the low-frequency time data stream as the base, each of its timestamps is matched with the latest valid observation of the high-frequency time data stream, and invalid data is filtered out by a preset threshold: if the expiration value exceeds the threshold, a null value is returned instead of expired data. This forms an ASOF join with an explicit expiration protection mechanism, which prevents the silent use of expired data and solves the problem that existing JOIN operations cannot align time-mismatched data streams.

4. Physical perception queries designed specifically for robots are provided for querying the center of mass, the inertia tensor, and so on. Traditional database concepts are mapped to robot-specific concepts, so traditional SQL queries are supported while robot-specific semantics are preserved.

5. With segments as logical units, segmentation trigger conditions are configured so that data is segmented automatically, enabling efficient querying. Virtual derived streams are generated on the basis of the segments. When an original data stream is updated or re-segmented, the derived stream is automatically recalculated and keeps an independent version to ensure traceability.

6. The root coordinate system of a mobile robot is set as a dynamic, configurable floating base driven by a dedicated state estimation flow. Queries do not need to distinguish between a fixed base and a floating base; the root transformation calculation adapts automatically, which makes data queries more convenient.

Attached Figure Description

[0017] Figure 1 is a flowchart illustrating the steps of a management method for querying spatiotemporal multimodal data required for embodied intelligence training, as disclosed in an embodiment of the present invention.

Figure 2 is a structural framework diagram of a management system for querying spatiotemporal multimodal data required for embodied intelligence training, as disclosed in an embodiment of the present invention.

Detailed Implementation

[0018] The advantages of the present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments.

[0019] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.

[0020] The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The singular forms “a,” “an,” and “the” as used in this disclosure and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

[0021] It should be understood that although the terms first, second, third, etc., may be used in this disclosure to describe various information, such information should not be limited to these terms. These terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of this disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if," as used herein, can be interpreted as "when," "upon," or "in response to a determination." In the description of this invention, it should be understood that the terms "longitudinal", "lateral", "up", "down", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate an orientation or positional relationship based on that shown in the accompanying drawings. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this invention.

[0022] In the description of this invention, unless otherwise specified and limited, it should be noted that the terms "installation", "connection" and "linking" should be interpreted broadly. For example, they can refer to mechanical or electrical connections, or internal connections between two components. They can be direct connections or indirect connections through an intermediate medium. Those skilled in the art can understand the specific meaning of the above terms according to the specific circumstances.

[0023] In the following description, suffixes such as "module," "part," or "unit" used to denote elements are used only for the convenience of the description of the invention and have no specific meaning in themselves. Therefore, "module" and "part" can be used interchangeably.

[0024] To achieve the above objectives, one embodiment of the present invention discloses a management method for querying spatiotemporal multimodal data required for embodied intelligence training. This management method can be understood as constructing an abstract database. It is not a piece of physical database software, but a dynamic data management method that maps the physical world to the digital world and efficiently manages the storage, querying, and analysis of robot sensor data. Specifically, it includes the following steps:

Step S101: Set hardware metadata. The hardware metadata stores standardized device definitions for multiple hardware devices. The device definition is a hardware metadata template decoupled from physical hardware instances. The hardware metadata template includes the capability mode and multimodal data mode of the hardware device. The multimodal data mode includes the boundaries of data generation, data type, and storage method.

Specifically, a hardware abstraction layer is constructed: a hardware meta-term is set as the single source of truth, storing standardized hardware metadata templates decoupled from physical instances. The hardware meta-term is used to verify whether the time-series observations generated at runtime conform to the physical specifications of the hardware, such as frequency upper limits and resolution ranges. Furthermore, the hardware meta-term enforces a unified rigid transformation relationship in the spatial graph for its multiple subordinate multimodal data streams.

The device definition is a standardized definition of the inherent attributes of the hardware device itself. Here, hardware devices include the robot itself and the tools mounted on the robot. The capability modes of a hardware device include the video, range, motion, force, system state, position, command, and general-purpose sensor types:

  • Video: standard color images (RGB), depth maps, stereo vision, thermal imaging, and event cameras.
  • Range: lidar, radar, and ultrasonic sensors.
  • Motion: joint positions / angles, inertial measurement units (IMU), and odometry (odom).
  • Force: contact-based devices such as torque, tactile, and pressure sensors.
  • System state: gripper opening / closing status, battery level, voltage, health status, and diagnostic information.
  • Position: GPS and magnetometers.
  • Command: joint position / angle commands, linear or angular velocity commands, and gripper control commands.
  • General-purpose sensors: accelerometers, gyroscopes, and temperature sensors.

When a device is defined, the specific capability modes above are referenced to standardize the description of the hardware device's functions.
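One possible shape for such a hardware metadata template is sketched below, assuming immutable dataclasses and a rate check as the only physical specification. The names `StreamPattern`, `DeviceTemplate`, and `check_rate`, and the 30 Hz limit of the example camera, are hypothetical; resolution ranges could be checked the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamPattern:
    name: str            # e.g. "rgb"
    data_type: str       # e.g. "image"
    max_rate_hz: float   # frequency upper limit from the physical specification
    storage: str         # e.g. "binary" or "json"


@dataclass(frozen=True)
class DeviceTemplate:
    model: str
    capability: str      # one of the capability modes, e.g. "video"
    patterns: tuple      # StreamPattern entries, one per output modality

    def check_rate(self, pattern_name: str, timestamps: list) -> bool:
        """True if no two consecutive samples are closer than 1 / max rate."""
        pattern = next(p for p in self.patterns if p.name == pattern_name)
        min_gap = 1.0 / pattern.max_rate_hz
        return all(b - a >= min_gap for a, b in zip(timestamps, timestamps[1:]))


# Hypothetical camera template with a single 30 Hz RGB pattern.
camera = DeviceTemplate(
    model="example_camera",
    capability="video",
    patterns=(StreamPattern("rgb", "image", 30.0, "binary"),),
)
plausible = camera.check_rate("rgb", [0.0, 0.04, 0.08])
too_fast = camera.check_rate("rgb", [0.0, 0.01])
```

Freezing the dataclasses reflects the idea that the template holds immutable hardware properties, separate from any mounted instance.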

[0025] More importantly, since a hardware device can simultaneously output multiple types of data, such as RGB images, depth images, and IMU data, this embodiment of the invention defines a pattern for each type of output in the hardware meta-language. For example, this multimodal data pattern includes the boundaries of data generation, data type, and storage method, thereby enabling standardized identification and management of multimodal data generated by the same hardware device. In other words, by pre-defining the multimodal data pattern, when hardware mounting occurs, data streams of different frequencies are automatically created without reconfiguration each time mounting occurs, thereby decoupling the inherent attribute definition of the hardware device (in the hardware meta-language) from the specific mounting instance (in the embodied configuration below). Therefore, this multimodal data pattern defines the rules for mapping a single physical hardware interface to multiple independent logical data streams. Each logical data stream has an independent frequency, message type, and topic, but shares the same parent coordinate system in the spatial graph or has a fixed relative transformation relationship.

[0026] Step S102: Create an embodied configuration, which includes a space graph, a hardware inventory registry, and a hardware mounting configuration. The space graph is configured with static and dynamic transformation relationships between the robot's coordinate systems. The hardware inventory registry registers the hardware devices to be mounted as available system resources. The hardware mounting configuration mounts the instantiated hardware metadata template to the coordinate system of the space graph. Specifically, this step can be understood as implementing embodied dynamic mapping, that is, creating an embodied configuration, including a spatial graph, a hardware inventory registry, and mounting configuration; then, in response to the mounting command, instantiating the hardware metadata template and generating the corresponding heterogeneous data flow channels in the spatial graph. It is worth noting that the embodied configuration is composable, capable of dynamically merging the definitions of multiple independent hardware subsystems into a single embodied entity at the logical layer. For example, the upper body of a robot and its chassis can be dynamically assembled into a complete robot entity to accommodate different robot forms.

[0027] More specifically, a spatial graph describes the transformation relationships between the robot's coordinate systems. Static transformation relationships do not change over time, such as the fixed transformation between two rigidly connected links; while dynamic transformation relationships change over time, such as the transformation between links driven by joints. Spatial graphs are usually represented by a tree or graph structure, where nodes are coordinate systems and edges are transformations between coordinate systems, thus allowing the calculation of the transformation between any two coordinate systems.

[0028] Hardware mounting configuration specifically associates the hardware devices defined in the hardware meta-language with a specific coordinate system in the spatial graph. For example, a camera device is defined in the hardware meta-language and associated with the robot's "head" coordinate system through hardware mounting. Based on this, the camera data can be described in the robot's head coordinate system. Furthermore, the positional relationship of the camera relative to other parts of the robot can be calculated through the spatial graph, such as the positional relationship of the camera relative to the robot's base.

[0029] Simultaneously, the embodied configuration also includes a hardware inventory registry. Before mounting hardware to a specific spatial coordinate system, the hardware devices to be mounted are registered as pending entities in the database, establishing a unique correspondence between virtual identifiers in the database and devices in the physical world, thereby supporting subsequent dynamic topology mounting. In other words, this hardware inventory registry is used to register mountable hardware devices, while the topology events recorded in the topology flow record establish dynamic connections between registered hardware devices in the hardware inventory and interface coordinate systems in the spatial graph.

[0030] The above steps S101-S102 can be understood as follows: First, a hardware abstraction layer is established, in which the hardware metadata stores standardized hardware metadata templates decoupled from physical instances. Then, in response to a mounting instruction for a physical device registered in the embodied configuration, the hardware metadata template is instantiated, and the logical data stream pattern corresponding to the physical device, which contains multiple heterogeneous channels, is automatically mapped and generated. It is worth noting that the embodied configuration is not a static file; it can be understood as a dynamic spatial database schema.

[0031] Step S103: For any hardware device mounting, one or more data streams are automatically generated according to the device definition and hardware mounting configuration. The data stream continuously receives time-series observations generated by the hardware device to form a time data stream. The time data stream includes a topology stream, which is used to record topology events of hardware device installation and unloading. Furthermore, in response to the mounting of multimodal hardware devices, multiple data streams with the same or different frequencies are generated, and the frequency corresponds to the mounting time.

[0032] Specifically, when a hardware device is mounted within an embodied configuration, that is, when a hardware device definition is associated with a coordinate system in the spatial graph, the corresponding data streams are automatically created based on the streaming modes described in the device definition. For example, the device definition of a RealSense D435i camera includes three streaming modes: an RGB stream, a depth stream, and an IMU stream. When this device is mounted, three data stream instances are automatically created to receive RGB images, depth images, and IMU data, respectively. It is important to note that robot sensor data is not uploaded directly to the data streams defined in this embodiment. Current embodied intelligence data formats include BAG, MCAP, Parquet with a custom schema, and HDF5. Regardless of the format, the robot first records the external environment observed by its sensors as RGB, RGB-D, and depth data, while also recording the state of its joints and end effector. In other words, the hardware device is defined first; this is a logical concept, similar to but not identical with a database concept. For example, the device definition of the RealSense camera describes its relevant information. The embodied robot is then assembled from the device definitions of its hardware devices. Finally, datasets can be created from the sensor data of this embodied robot. A 1:1 mapping to the most basic hardware is thus achieved, migrating concepts of the physical space into the digital world. During runtime, each data stream continuously receives time-series observations from its hardware device and timestamps them; in chronological order, these observations form time-series data. Furthermore, data streams typically receive data asynchronously: each stream generates data at its own frequency and rhythm, and time synchronization between different streams is not mandatory.
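As a minimal sketch of this automatic stream creation, the snippet below creates one stream per streaming mode when a device is mounted. The names StreamMode, DataStream, and mount_device are hypothetical, and the rates are illustrative assumptions rather than values taken from the device definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamMode:
    name: str
    rate_hz: float


@dataclass
class DataStream:
    stream_id: str
    rate_hz: float
    observations: list = field(default_factory=list)  # (timestamp, payload)

    def append(self, timestamp, payload):
        # Each stream receives data on its own clock; keep it in time order.
        self.observations.append((timestamp, payload))
        self.observations.sort(key=lambda obs: obs[0])


# Assumed streaming modes of the camera definition (rates are illustrative).
D435I_MODES = (
    StreamMode("rgb", 30.0),
    StreamMode("depth", 30.0),
    StreamMode("imu", 400.0),
)


def mount_device(device_id, frame, modes):
    """Create one data stream per streaming mode of the mounted device."""
    return {
        mode.name: DataStream(f"{device_id}/{frame}/{mode.name}", mode.rate_hz)
        for mode in modes
    }


streams = mount_device("camera_001", "head", D435I_MODES)
print(sorted(streams))  # ['depth', 'imu', 'rgb']
```

No synchronization is imposed between the three streams; alignment is deferred to query time.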

[0033] The topology stream does not receive sensor observations from hardware devices. Instead, it records topology events occurring within the system. A topology event is generated when a hardware mount is added to or removed from the embodied configuration, or when hardware is dynamically mounted or unmounted during runtime (e.g., a tool change). The topology stream tracks the history of changes in the robot's hardware mounts, so that the topology state at the relevant time can be reconstructed when historical data is queried. It is worth noting that this embodiment is built on a hardware-definition abstraction rather than on simply recording data: an abstract model of the "hardware type" is first established at the database level, and this model is then instantiated into specific data streams through mounting actions. This differs from the traditional approach of hard-coding sensor parameters into configuration files. Furthermore, because the topology stream records device mounting and unmounting events, it can be understood as a spatial relationship query mechanism based on event sourcing. The topology event stream is stored, and when responding to a query request, the topology events are replayed or interpolated in real time according to the query timestamp, dynamically reconstructing in memory the spatial graph structure that was in effect at that moment. Coordinate transformation calculations or data associations are then performed on the reconstructed graph structure, thereby supporting queries over robot data whose structure changes over time.
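The event-sourcing idea can be illustrated with a short replay routine. This is a sketch under assumptions: the TopologyEvent fields, the event kinds "mount" and "unmount", and the example device are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyEvent:
    timestamp: float
    kind: str        # "mount" or "unmount"
    device_id: str
    frame: str


def mounted_at(events, query_time):
    """Replay events up to query_time and return device_id -> frame."""
    state = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > query_time:
            break
        if event.kind == "mount":
            state[event.device_id] = event.frame
        elif event.kind == "unmount":
            state.pop(event.device_id, None)
    return state


events = [
    TopologyEvent(10.0, "mount", "gripper_01", "flange"),
    TopologyEvent(40.0, "unmount", "gripper_01", "flange"),
    TopologyEvent(45.0, "mount", "camera_001", "flange"),
]
print(mounted_at(events, 20.0))  # {'gripper_01': 'flange'}
print(mounted_at(events, 50.0))  # {'camera_001': 'flange'}
```

Because only the events are stored, the mount state at any historical timestamp is derived on demand rather than persisted.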

[0034] Step S104: Organize the time data stream into segments, each accompanied by a descriptive label and representing a complete recording session. In response to receiving a data query request that includes one or more segments, traverse and execute the data query request to obtain the query results. Specifically, a segment is a physical storage unit automatically generated based on a configuration fingerprint, whereas a recording session is a user-defined logical time range. A recording session can span multiple segments, and a segment can contain multiple recording sessions. This architecture automatically optimizes storage as the hardware configuration changes, without interrupting the user's logical view.

[0035] In one specific embodiment, the time data stream can be continuous images from a 30Hz camera or inertial data from a 400Hz IMU. Configuration fingerprints are used to automatically generate segments with a fixed threshold of 5 minutes, each segment being the smallest physical storage unit for camera / IMU data. Specifically, the time data stream is organized around complete recording sessions. Each recording session is visible to the user and has a clear time range, a unique identifier, and descriptive tags that describe the attributes of the session, including but not limited to task type (e.g., part picking, path planning), environmental information, and custom annotations. Thus, a complete recording session can be "a 15-minute logistics patrol of warehouse area A," where the time range is 15 minutes, the task type is "logistics patrol," the environmental information is "warehouse area A," and the unique identifier is its episode_id. The continuous camera images and IMU inertial data of this 15-minute recording session are automatically split by the configuration fingerprint mechanism into three 5-minute physical segments: 0-5 minutes, 5-10 minutes, and 10-15 minutes. The recording session therefore spans three segments, and its descriptive tags and unique episode_id are associated with the corresponding data in each of them. When a user queries the session, all data belonging to it is automatically assembled from the three segments without the user having to handle the split manually. Furthermore, organizing time data streams in this way allows data to be managed intuitively by the content of the descriptive tags; for instance, data can be extracted by task type. Referring to the example above, a command such as "Retrieve all logistics patrol records for warehouse area A" can be issued directly. Users can also filter target data quickly by descriptive tag and locate the time range of the target segments without traversing all the raw data, which helps reduce data management and query costs.
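The warehouse patrol example can be sketched as a mapping from one tagged recording session to the physical segments it overlaps. The Segment and Session classes, the tag keys, and the identifiers below are hypothetical; times are in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    segment_id: str
    start_s: float
    end_s: float  # half-open interval [start_s, end_s)


@dataclass(frozen=True)
class Session:
    episode_id: str
    start_s: float
    end_s: float
    tags: dict


def segments_for(session, segments):
    """Identifiers of the physical segments overlapping the session's time range."""
    return [
        seg.segment_id
        for seg in segments
        if seg.start_s < session.end_s and session.start_s < seg.end_s
    ]


def find_sessions(sessions, **wanted):
    """Filter sessions by descriptive tags without touching the raw data."""
    return [s for s in sessions if all(s.tags.get(k) == v for k, v in wanted.items())]


segments = [Segment("seg-0", 0, 300), Segment("seg-1", 300, 600), Segment("seg-2", 600, 900)]
patrol = Session(
    "ep-001", 0, 900,
    {"task": "logistics patrol", "environment": "warehouse area A"},
)

hits = find_sessions([patrol], task="logistics patrol", environment="warehouse area A")
print([segments_for(s, segments) for s in hits])  # [['seg-0', 'seg-1', 'seg-2']]
```

The tag filter runs on session metadata only, which is why the target time range can be located without scanning the stored observations.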

[0036] Based on the above steps S101-S104, the embodiments of the present invention can be understood as follows: A hierarchical hardware model is constructed, including immutable hardware primitives and a dynamic embodied configuration, and a hardware inventory is registered, declaring in the embodied configuration the list of hardware that can be dynamically mounted. Multiple heterogeneous data streams with independent frequencies are automatically generated according to the streaming modes, and a topology stream is generated to record the events in which inventory hardware is mounted to or unmounted from coordinate systems in the spatial graph. A two-layer storage management system is also constructed, comprising a physical layer and a logical layer. The physical layer monitors configuration fingerprint changes and automatically splits continuous observations into storage segments. The logical layer responds to user definitions, mapping time ranges to logical sessions, and each logical session points to one or more storage segments through indexes. Finally, a spatiotemporal query is executed: based on the time range of the logical session, the spatial graph structure at the queried moment is reconstructed using the topology stream, and the query results are returned.

[0037] Furthermore, this management approach also includes: At the physical storage layer, the changes in the embodied configuration are continuously monitored. In response to changes in the spatial graph or hardware mount, the configuration fingerprints of consecutive segments are compared. When the configuration fingerprints are different, the time-series observations are automatically truncated and stored as segments, making the query more efficient and faster. At the logical query layer, in response to the start and end times of the query, a logical session view spanning one or more segments is constructed, and the logical session view does not display the segments.

[0038] Specifically, a configuration fingerprint is a unique string identifier generated by a hash algorithm from the set of core configuration parameters that affect the consistency of data collection. As a result, all data within the same segment shares the same hardware configuration and operating parameters, which avoids data deviations caused by configuration or parameter changes and ensures the accuracy of subsequent analysis. It should be noted that this segmentation is completely transparent. When a user queries a recording session, the data within the underlying segments is read automatically, while the mapping from the session to its segments is hidden from the user. In other words, the user queries by recording session only and never by segment. This can be understood as follows: a hash fingerprint is calculated in real time over the hardware configuration parameters, such as frequency, resolution, and calibration parameters; when a fingerprint change is detected, the current storage segment is automatically truncated and a new one is created at the physical storage layer, while the continuity index of the user-level logical recording session is maintained; when a user queries data with a specific start and end time, the multiple heterogeneous storage segments are automatically aggregated at the underlying level, and the user is presented with a unified logical session view. For example, a mobile robot equipped with an RGB camera and a LiDAR conducts an experiment, such as walking around obstacles, between 10:00:00 and 10:05:00. At 10:02:30, the LiDAR's USB cable disconnects and the LiDAR hardware disappears from the configuration, so a new configuration fingerprint is automatically created. In this scenario, the physical storage layer saves the data in two segments. The first segment, from 10:00:00 to 10:02:30, includes both the RGB and LiDAR data streams. The second segment, from 10:02:30 to 10:05:00, contains only the RGB data stream. If a user then queries all RGB images and LiDAR data between 10:00:00 and 10:05:00, parsing the start and end times shows that the stored data spans two segments. The data streams of the first and second segments are extracted and concatenated into a continuous logical session view, and a unified interface is returned to the user. The underlying segmentation is therefore hidden: the user does not need to know that the queried data is divided into two segments, nor to handle the missing data manually.
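The fingerprint-driven segmentation in the LiDAR example can be sketched as below. The specification only says "a hash algorithm"; SHA-256 over a canonical JSON encoding is an assumption made here, and the function names and stream rates are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a canonical encoding of the configuration parameters (SHA-256 assumed)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_into_segments(timeline, end_time):
    """timeline: time-ordered (start_time, config) snapshots of the hardware configuration."""
    segments = []
    for start, config in timeline:
        fingerprint = config_fingerprint(config)
        if segments and segments[-1]["fingerprint"] == fingerprint:
            continue  # configuration unchanged, keep writing to the current segment
        if segments:
            segments[-1]["end"] = start  # truncate the previous segment
        segments.append({
            "start": start,
            "end": end_time,
            "fingerprint": fingerprint,
            "streams": sorted(config),
        })
    return segments


timeline = [
    ("10:00:00", {"rgb": {"rate_hz": 30}, "lidar": {"rate_hz": 10}}),
    ("10:02:30", {"rgb": {"rate_hz": 30}}),  # LiDAR disconnected
]
segments = split_into_segments(timeline, "10:05:00")
for seg in segments:
    print(seg["start"], seg["end"], seg["streams"])
```

Sorting the keys before hashing makes the fingerprint independent of the order in which parameters are listed, so only a real configuration change opens a new segment.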

[0039] Further, in step S104, executing the data query request includes the following. In one embodiment, a query request for multi-source heterogeneous sensor data is executed by applying the ALIGN operator, which time-aligns heterogeneous sensor data of different frequencies, and includes the following steps: Receive a query request for multi-source heterogeneous sensor data that specifies a first time data stream and a second time data stream to be time-aligned, wherein the two time data streams have different frequencies; For each first timestamp in the first time data stream, loop to identify the latest observation in the second time data stream whose second timestamp is less than or equal to the first timestamp; Calculate the difference between the first timestamp and the second timestamp as the expiration value, and compare the expiration value with a preset threshold; If the expiration value is greater than the threshold, return the first time-series observation of the first time data stream together with a null value; If the expiration value is less than or equal to the threshold, define a time aggregation window with the first timestamp as its center or boundary, obtain all observations of the second time data stream within the aggregation window, perform a statistical aggregation such as mean, maximum, or integral, and return the aggregation result aligned with the first time-series observation.

[0040] Specifically, the first time data stream serves as the traversal reference stream, and each of its timestamps is traversed in a loop as an alignment anchor; the second time data stream serves as the matching search stream, in which the latest observation meeting the condition is sought. For example, the first time data stream is a 30Hz image stream, the second is a 100Hz IMU stream, and the two streams need to be time-aligned. For each timestamp t1 in the image stream, a loop looks up a timestamp t2 in the IMU stream buffer such that t2≤t1 and t2 is the most recent such timestamp. For example, t1=100ms, and the IMU stream has data at 99.3ms, 99.8ms, and 101.1ms. At 99.3ms the x-axis acceleration is 1, meaning that at that instant the robot's velocity in the x-direction is increasing by 1m / s per second, for example from 1m / s to 2m / s over 1 second. At 99.8ms the x-axis acceleration is 2, and at 101.1ms it is 3. A threshold of 10ms is set (the threshold is generally determined by the expected data rate of the second time data stream), and the aggregation window is [t1-10ms, t1]. The t2 found is 99.8ms, because 101.1ms is greater than t1 and 99.3ms is not the latest. Next, the expiration value t1-t2 is calculated: 100ms-99.8ms=0.2ms. Comparing it with the threshold gives 0.2ms < 10ms, indicating that the data has not expired. That is, the latest observation found in the IMU stream is fresh relative to the current timestamp t1 of the image stream, can represent the motion state of the robot at time t1, and can be fused with the image data. The IMU observations within the window are those at 99.3ms and 99.8ms. Using the mean as the aggregation function, (1+2)÷2=1.5, and [image frame, 1.5] is returned.

[0041] Conversely, if t1=100ms and the IMU stream has data at 80.3ms, 81.8ms, and 101.1ms, then t2=81.8ms and the expiration value t1-t2=18.2ms is greater than the threshold of 15ms, indicating that the data has expired. This means that at 100ms there is no valid IMU data available to pair with the image data, so a null value is returned in place of the expired data. The operator thus behaves as an ASOF JOIN with expiration protection.
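The two worked examples above can be reproduced with a small sketch of the alignment logic. The function name align and its parameters are hypothetical, the window is taken as [t1 - window, t1], and the IMU values in the second example are placeholders because the specification does not give them.

```python
import bisect
from statistics import mean


def align(ref_stream, other_stream, threshold, window, aggregate=mean):
    """ASOF-style alignment with expiration protection.

    Both streams are lists of (timestamp, value) sorted by timestamp.
    Returns (t1, ref_value, aggregated_value_or_None) per reference observation.
    """
    other_ts = [t for t, _ in other_stream]
    aligned = []
    for t1, ref_value in ref_stream:
        idx = bisect.bisect_right(other_ts, t1)  # observations with t2 <= t1
        if idx == 0 or t1 - other_ts[idx - 1] > threshold:
            aligned.append((t1, ref_value, None))  # nothing fresh enough
            continue
        lo = bisect.bisect_left(other_ts, t1 - window)
        values = [v for _, v in other_stream[lo:idx]]
        aligned.append((t1, ref_value, aggregate(values)))
    return aligned


image_stream = [(100.0, "image frame")]

fresh_imu = [(99.3, 1.0), (99.8, 2.0), (101.1, 3.0)]
print(align(image_stream, fresh_imu, threshold=10.0, window=10.0))
# [(100.0, 'image frame', 1.5)]

stale_imu = [(80.3, 1.0), (81.8, 2.0), (101.1, 3.0)]  # values are placeholders
print(align(image_stream, stale_imu, threshold=15.0, window=10.0))
# [(100.0, 'image frame', None)]
```

Passing max or another callable as aggregate swaps the statistic without changing the matching or expiration logic.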

[0042] In another embodiment, when the robot structure changes dynamically, the coordinate transformation in the spatial graph at any timestamp is queried using the topology stream, so that data containing topology events can be represented and conveniently queried; this includes the following steps: The topology stream is stored in ascending order of timestamp and records the topology events of all hardware devices; each topology event includes a timestamp, an event type, a hardware identifier, a coordinate system, and so on. Receive a data query request based on topology events, which specifies a query timestamp, a source coordinate system, and a target coordinate system and asks for the transformation from the source coordinate system to the target coordinate system at the query timestamp. Construct the time-varying spatial graph state at the specified timestamp. Determine whether a path exists between the source coordinate system and the target coordinate system in the time-varying spatial graph state; if no path exists, return null; if a path exists, compose the transformations along the path and return the result.

[0043] In this embodiment, constructing the time-varying spatial graph state at the specified timestamp includes the following steps: Load the static transformation relationships in the robot's physical configuration, such as the fixed offsets between the links of the robot body and the fixed transformations between permanently installed sensors and their mounting bases; On the basis of these static transformation relationships, evaluate the timestamped dynamic transformation relationships at the specified timestamp by interpolation, where linear interpolation, spherical linear interpolation (SLERP), or spline interpolation can be used to obtain an accurate transformation value at the query timestamp; Based on the topology stream, add or remove coordinate systems and correct the connection relationships between them. Specifically, for a mount event, a new coordinate system is added to the spatial graph, an edge is established from the parent coordinate system to the new coordinate system, and the transformation is initialized; for an unmount event, the corresponding coordinate system and all its edges are removed from the spatial graph. The time-varying spatial graph state is obtained through these three steps, and the existence of a path is then determined on it.

[0044] Specifically, the robot's spatial graph is not fixed or static. Tool changes or the mounting and unmounting of modular components cause the connection relationships between coordinate systems in the spatial graph to change over time. For example, the end flange of a robotic arm can carry different tools, and the arm changes tools during operation according to task requirements. Consider querying the precise position of a tool tip relative to the robot base at an arbitrary historical moment. The robot's embodied configuration defines the static transformation relationships between the robotic arm base, a series of joints (joint1, joint2,...), and the end flange, such as the fixed offset from the robotic arm base to joint1. The hardware meta-language defines the installable tools, such as a welding torch. The recorded topology events of the hardware device are: at timestamp 100, the welding torch is mounted on the end flange; at timestamp 200, the welding torch is unmounted from the end flange.

[0045] Suppose the query request asks for the position of the welding torch tip relative to the robotic arm base during welding: the source coordinate system is the welding torch tip, the target coordinate system is the robotic arm base, and the query timestamp is 150 seconds. Constructing the time-varying spatial graph state then proceeds as follows: First, the static transformation relationships are loaded, including all static transformations such as robotic arm base to joint1, joint1 to joint2, and so on; Second, the dynamic transformation relationships are calculated by interpolation, which includes interpolating the angle of each joint at 150 seconds from the joint angle data stream; Finally, the coordinate system connection relationships are corrected. Since the query timestamp of 150 seconds falls after the welding torch is mounted and before it is unmounted, the mount event is applied: the mounting interface coordinate system and the tip coordinate system of the welding torch are added to the spatial graph and connected to the flange coordinate system. Next, in the constructed time-varying spatial graph state, a path is found from the welding torch tip to the robotic arm base: welding torch tip → welding torch mounting interface → flange → ... → robotic arm base; Finally, composing all transformations along this path yields the transformation matrix from the welding torch tip to the robotic arm base, which is returned as the result.
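A simplified sketch of the welding torch query follows. It makes several assumptions that are not in the specification: the kinematic chain is shortened to base, joint1, and flange; every transformation is a pure translation with made-up offsets in meters; joint-angle interpolation is omitted; and the target is assumed to be an ancestor of the source in the graph. All function and frame names are hypothetical.

```python
def translation(x, y, z):
    """4x4 homogeneous transform containing only a translation."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def compose(a, b):
    """Matrix product a @ b for 4x4 lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


# Static edges: child frame -> (parent frame, transform of child in parent).
STATIC_EDGES = {
    "joint1": ("base", translation(0, 0, 0.3)),
    "flange": ("joint1", translation(0, 0, 0.5)),
}
TORCH_EDGES = {
    "torch_mount": ("flange", translation(0, 0, 0.05)),
    "torch_tip": ("torch_mount", translation(0, 0, 0.2)),
}
# Topology events: (timestamp, kind, edges added or removed by the event).
EVENTS = [(100, "mount", TORCH_EDGES), (200, "unmount", TORCH_EDGES)]


def graph_at(static_edges, events, query_time):
    """Replay topology events on top of the static graph."""
    edges = dict(static_edges)
    for timestamp, kind, tool_edges in sorted(events, key=lambda e: e[0]):
        if timestamp > query_time:
            break
        if kind == "mount":
            edges.update(tool_edges)
        else:
            for frame in tool_edges:
                edges.pop(frame, None)
    return edges


def transform_to(edges, source, target):
    """Transform of source expressed in target, or None if no path exists."""
    result, frame = translation(0, 0, 0), source
    while frame != target:
        if frame not in edges:
            return None
        parent, parent_from_child = edges[frame]
        result = compose(parent_from_child, result)
        frame = parent
    return result


tip_in_base = transform_to(graph_at(STATIC_EDGES, EVENTS, 150), "torch_tip", "base")
print(tip_in_base[2][3])  # z offset of the tip in the base frame, about 1.05
print(transform_to(graph_at(STATIC_EDGES, EVENTS, 250), "torch_tip", "base"))  # None
```

At timestamp 250 the torch frames have been removed by the unmount event, so no path exists and the query returns a null result, matching the behavior described in the steps above.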

[0046] Furthermore, for robots with movable bases, querying the coordinate transformation in the spatial graph at any timestamp using the topology stream involves the following steps: For the floating base coordinate system of the mobile robot, a state estimation stream is set up, which represents the root transformation of the floating base coordinate system relative to the world coordinate system. In other words, the root coordinate system of the mobile robot is set to a dynamic, configurable floating base, and a unified query interface is used for both fixed and floating bases. By combining the root transformation, the static transformation relationships, the dynamic transformation relationships, and the connection relationships between coordinate systems, the time-varying spatial graph state of the mobile-base robot is constructed. Path determination and the subsequent operations are then performed on this time-varying spatial graph state as described above and are not repeated here.

[0047] Specifically, the root node of the spatial graph is configured as a dynamic floating base, and the state estimation stream is set as its driving source in the embodied configuration. When the time-varying spatial graph state is constructed, the data of the driving source is injected dynamically to calculate the real-time transformation of the root node relative to the world coordinate system, thereby unifying the query logic for fixed-base and mobile robots.

[0048] Specifically, in one embodiment, the robot has a mobile base and a robotic arm with a tool mounting interface at its end for mounting a camera sensor. The robot undergoes the following state transitions: At t=0, the robot starts with the mobile base located at the origin of the world coordinate system; At t=10 seconds, a camera sensor is mounted on the end effector of the robotic arm at position (1,0,0); At t=20 seconds, the robot moves to position (2,0,0) while the robotic arm also moves, changing the position of the camera sensor relative to the mobile base; At t=30 seconds, the camera sensor is unmounted.

[0049] Two topology events are thus recorded, an installation event and an unload event: Installation event: timestamp=10, event type=installation, hardware identifier=camera_001, coordinate system=arm_end_effector (robotic arm end effector); Unload event: timestamp=30, event type=unload, hardware identifier=camera_001, coordinate system=arm_end_effector. Subsequently, a query request is received for the transformation of the camera coordinate system relative to the world coordinate system at t=15. The time-varying spatial graph state at t=15 is constructed as follows. The static transformation relationships are loaded, such as the static transformation from the mobile base to the robotic arm base, denoted as... The static transformations between the links inside the robotic arm are also known; assume the static transformation chain from the robotic arm base to the end effector is... The dynamic transformation relationships are calculated by interpolation: for example, the angle of each joint in the robotic arm's joint state is interpolated to obtain the joint angles at t=15, giving the dynamic transformation from the robotic arm base to the robotic arm end effector, denoted as... ; Next, since the base is mobile, the state estimation stream provides the transformation of the mobile base coordinate system relative to the world coordinate system, denoted as... ; Based on the topology stream, the connection relationships are corrected: at t=15 the installation event has occurred but the unload event has not, so the camera is mounted. A camera coordinate system is therefore added at the end effector of the robotic arm, with a static transformation from the end effector to the camera coordinate system, denoted as... ; Finally, the above transformations are composed, that is... The complete time-varying spatial graph state is thereby constructed, and the transformation result is returned.
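The symbols for the individual transformations are not legible in this text. Purely as an illustration, and using hypothetical notation in which $T^{A}_{B}$ denotes the pose of frame $B$ expressed in frame $A$ and $q(t)$ denotes the interpolated joint angles, the composition described above could be written as:

```latex
T^{\mathrm{world}}_{\mathrm{camera}}(15)
  = T^{\mathrm{world}}_{\mathrm{base}}(15)\;
    T^{\mathrm{base}}_{\mathrm{arm\_base}}\;
    T^{\mathrm{arm\_base}}_{\mathrm{ee}}\bigl(q(15)\bigr)\;
    T^{\mathrm{ee}}_{\mathrm{camera}}
```

Here the first factor would come from the state estimation stream, the second is static, the third combines the static link offsets with the joint angles interpolated at t=15, and the last is the static mount transformation that exists only while the camera is mounted (between t=10 and t=30).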

[0050] In another embodiment, when the robot structure changes dynamically, a physical perception query is used to obtain the overall physical properties of the robot, realizing a physical attribute query specific to the robotics field; it includes the following steps: Obtain and parse the robot definition file to obtain the inertial properties of the robot base and of multiple links; Based on the topology stream and the query timestamp, dynamically reconstruct the tool attributes that change over time, which represent how tools are mounted on the robot over time; Receive a query request for the robot's physical properties that supplies the timestamp to be queried and the joint configuration, where the joint configuration includes, but is not limited to, the position or angle of each joint of the robot; Based on the topology stream log, determine the tool attributes at the timestamp to be queried; Calculate the robot's physical properties from the inertial properties, the joint configuration, and the determined tool attributes. These physical properties include, but are not limited to, the center of mass, the inertia tensor, and the kinetic energy.

[0051] Specifically, when calculating the robot's physical attributes at a specific historical moment or at the current moment, the system first determines from the topology stream which tools were mounted at that time. Then, combining the robot's definition file with the joint configuration at that time, it calculates the overall physical attributes, including those of the tools. This allows force control and motion planning to rely on accurate physical attributes, improving control precision and safety. It is important to note that this physical perception query method uses the topology stream to determine the hardware mounts valid at the query moment and parses the inertial parameters of the robot's definition file and of the validly mounted hardware. Based on the current joint state and the dynamically synthesized kinematic chain, the system calculates the overall center of mass, inertia tensor, and so on. The result is synthesized dynamically and calculated in real time rather than being pre-stored.

[0052] For example, consider a six-axis robotic arm combined with a gripper tool. The arm is used for part gripping on a production line, and the tool is changed in real time according to operational requirements. The following steps allow the overall physical properties of the robot and the tool to be queried accurately at any given time, to support subsequent force control and motion planning: The official definition file of the six-axis robotic arm is read and parsed in advance to extract the inertial properties of the robot base and the six links, including the mass, center-of-mass coordinates, and inertia tensor of each component. For example, the mass of the base is 50kg, the mass of the first link is 8kg, and the mass of the second link is 6kg. The end flange of the six-axis robotic arm is the tool mounting position; the mounting and removal of tools on the robot is recorded in real time there, and the tool attributes are generated; Query the overall physical attributes of the robot and tools at a certain moment. For example, receive a query request with timestamp = 09:10:00 and joint configuration = first joint (30°), second joint (90°), ... sixth joint (0°). Traverse the topology stream log, match the time node 09:10:00, and obtain the tool attributes for that timestamp. For example, if the mounted tool is a gripper, the tool attributes include: the mass of the gripper is 2.5kg, the distance from the center of mass of the gripper to the end flange is 5cm, and the inertia tensor of the gripper is I1. By combining the above inertial properties, the joint configuration, and the determined tool attributes, the robot's physical properties are finally obtained.
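The center-of-mass part of such a query can be sketched as follows. The masses are the ones given above (only the base and the first two links are listed there, so only those are included); the center-of-mass positions, the tool log times, and all names are hypothetical. Forward kinematics is omitted: positions are assumed to be already expressed in the base frame for the queried joint configuration.

```python
def tool_at(tool_log, query_time):
    """Tool mounted at query_time, or None. Log entries: (time, kind, tool)."""
    current = None
    for time, kind, tool in sorted(tool_log, key=lambda entry: entry[0]):
        if time > query_time:
            break
        current = tool if kind == "mount" else None
    return current


def total_mass_and_com(bodies):
    """bodies: (mass_kg, (x, y, z)) pairs with positions in a common frame."""
    total = sum(mass for mass, _ in bodies)
    com = tuple(sum(mass * pos[i] for mass, pos in bodies) / total for i in range(3))
    return total, com


# Masses from the example; positions (meters, base frame) are illustrative.
LINKS = [
    (50.0, (0.0, 0.0, 0.0)),  # base
    (8.0, (0.0, 0.0, 0.5)),   # first link
    (6.0, (0.0, 0.0, 1.0)),   # second link
]
GRIPPER = {"mass": 2.5, "com": (0.0, 0.0, 1.5)}
TOOL_LOG = [("09:00:00", "mount", GRIPPER), ("09:30:00", "unmount", GRIPPER)]


def physical_properties(query_time):
    """Total mass and center of mass, including whichever tool is mounted."""
    bodies = list(LINKS)
    tool = tool_at(TOOL_LOG, query_time)
    if tool is not None:
        bodies.append((tool["mass"], tool["com"]))
    return total_mass_and_com(bodies)


print(physical_properties("09:10:00"))  # gripper mounted, total mass 66.5 kg
print(physical_properties("09:40:00"))  # gripper removed, total mass 64.0 kg
```

The result is computed at query time from the topology log and the link properties, so nothing about the combined robot-plus-tool body needs to be stored in advance.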

[0053] In one embodiment of the present invention, the management method further includes: The processor merges multiple data streams to generate a derived stream. When the data streams are version controlled, the derived stream is versioned independently of them: updating, deleting, or modifying a version of an original data stream does not affect an already generated derived stream. The derived stream has its own version iteration logic, which addresses version confusion, untraceability, and dependence on the original data. Dependency graph attributes between the derived streams and the original data streams are maintained. When a segment is updated due to data backflow or configuration changes, the affected derived streams are automatically marked as invalid based on the dependency graph attributes, or incremental recalculation is triggered, to ensure time and version consistency between derived data and original data.

[0054] Specifically, derived streams carry dependency graph attributes. When a segment containing the original data stream changes, only the affected portion of the derived stream needs to be recalculated according to the dependency relationship, that is, a local recalculation rather than a full one. This approach is similar to data management in incremental map construction. Data backflow refers to the offline recording and correction of existing original data within segments. Configuration changes mainly refer to changes in hardware configuration and sensor parameters, such as hardware topology changes or the mounting of new sensors, which cause the segments of the original data stream to be split. When an underlying segment is updated, the dependency graph attributes are traversed to find all derived streams affected by that segment. One of two automated processing strategies is then executed: First, the affected derived streams are automatically marked as invalid; Second, based on the dependency graph attributes, the derived streams are kept synchronized with the original data in real time by locking only the changed segment range and recalculating the part of each derived stream that depends on the changed segment, rather than recalculating the entire derived stream.
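One way to picture the dependency graph attributes is a reverse index from segments to the derived streams that read them. The sketch below is an assumption about how such bookkeeping might look; the class name, method names, and stream identifiers are hypothetical.

```python
from collections import defaultdict


class DerivedStreamIndex:
    """Tracks which derived streams depend on which storage segments."""

    def __init__(self):
        self._by_segment = defaultdict(set)  # segment id -> derived stream ids
        self.stale = defaultdict(set)        # derived stream id -> segments to redo

    def add_dependency(self, derived_id, segment_id):
        self._by_segment[segment_id].add(derived_id)

    def on_segment_updated(self, segment_id):
        """Strategy 1: mark every affected derived stream as invalid for this segment."""
        affected = sorted(self._by_segment.get(segment_id, ()))
        for derived_id in affected:
            self.stale[derived_id].add(segment_id)
        return affected

    def recompute(self, derived_id, recompute_fn):
        """Strategy 2: recompute only the parts that depend on changed segments."""
        for segment_id in sorted(self.stale.pop(derived_id, ())):
            recompute_fn(derived_id, segment_id)


index = DerivedStreamIndex()
index.add_dependency("fused_pose", "seg-1")
index.add_dependency("fused_pose", "seg-2")
index.add_dependency("depth_cloud", "seg-2")

print(index.on_segment_updated("seg-2"))  # ['depth_cloud', 'fused_pose']

calls = []
index.recompute("fused_pose", lambda derived, segment: calls.append((derived, segment)))
print(calls)  # [('fused_pose', 'seg-2')]
```

Only the portion of "fused_pose" that depends on "seg-2" is recomputed; the portion built from "seg-1" is left untouched.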

[0055] In one embodiment of the present invention, the method further includes: The data stream is stored in unstructured and semi-structured formats (such as a single wide table). When executing a query request, a query engine is invoked to load the hardware metadata template as a semantic interpreter to dynamically parse the stored binary or JSON data and obtain the parsed data. The hardware metadata template also includes physical constraints, which are used to perform real-time verification on the parsed data.

[0056] Specifically, regardless of the underlying physical storage structure, such as a single wide table or hierarchical storage, the query engine is equipped with an interpretation layer. During query execution, this layer dynamically decodes the raw binary data stream and maps it into a structured view with physical semantics, such as pixel matrices and physical units, based on predefined hardware metadata templates. It also uses the physical constraints in the hardware metadata template to perform integrity checks on the decoded data. These physical constraints are the dimensions, ranges, precision, units, and sampling characteristics that the hardware device must satisfy when outputting data in a real physical environment. In short, during a query the hardware metadata template acts as a translator, transforming the raw data into unified structured data while verifying the data's validity against the hardware parameter rules within the template.
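The role of the template as both decoder and validator can be sketched with Python's standard struct module. The template layout, field names, binary format, and range limits below are hypothetical and chosen only to illustrate the idea for a simple range sensor.

```python
import struct

# Hypothetical metadata template for a range sensor.
RANGE_SENSOR_TEMPLATE = {
    "format": "<Qf",  # little-endian: uint64 timestamp (ns), float32 range (m)
    "fields": ("timestamp_ns", "range_m"),
    "constraints": {"range_m": (0.05, 12.0)},  # assumed physical min and max
}


def decode(template, payload):
    """Decode a raw binary record and check it against the physical constraints."""
    values = struct.unpack(template["format"], payload)
    record = dict(zip(template["fields"], values))
    violations = [
        name
        for name, (low, high) in template["constraints"].items()
        if not low <= record[name] <= high
    ]
    return record, violations


valid_payload = struct.pack("<Qf", 1_000, 3.5)
print(decode(RANGE_SENSOR_TEMPLATE, valid_payload))
# ({'timestamp_ns': 1000, 'range_m': 3.5}, [])

out_of_range_payload = struct.pack("<Qf", 2_000, 50.0)
print(decode(RANGE_SENSOR_TEMPLATE, out_of_range_payload)[1])
# ['range_m']
```

The stored bytes carry no schema of their own; the same payload is meaningful only when read through the template, which is also where the physical range check lives.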

[0057] In one embodiment of the present invention, the management method further includes: the standard database concepts are mapped to the robotics domain, as shown in Table 1.
Table 1: Mapping Table
In one embodiment of the present invention, in the data writing or querying phase, physical specification parameters in the hardware meta-language are invoked, including but not limited to maximum range and field of view, to perform semantic-level verification of the data stream; among the multimodal data streams, the spatial transformation consistency between different data streams is verified based on rigid body kinematic constraints, wherein the different data streams can be visual streams and inertial streams.

[0058] In summary, one embodiment of the present invention mainly constructs the following four virtual-real mapping relationships. First, the mapping relationship between physical device models and metadata templates at the database level. Specifically, a hardware meta-language is built as the carrier of metadata templates at the database level. This hardware meta-language stores the device definitions of multiple hardware devices. The device definitions cover the core attributes of physical device models, including performance definitions, category definitions, and stream mode definitions for multimodal devices. This metadata template provides a unified configuration basis for subsequent hardware mounting and data stream generation, thereby avoiding hard-coded configuration of physical device models at the lowest layer.

[0059] Second, the mapping relationship between physical mounting actions and dynamic graph reconstruction instructions at the database level. Specifically, hardware mounting and unmounting actions at the physical layer are transformed into reconstruction instructions for the robot's spatial graph at the database level, driving the spatial graph to be updated in real time and thereby preventing the TF tree from being treated as static.

[0060] Third, the mapping relationship between user task logic and the adaptive sharded storage of the database. Specifically, a complete recording session is mapped to the adaptive sharded storage of the database, i.e., segments, based on changes in robot configuration, rather than being handled as simple file storage only.

[0061] Fourth, the mapping relationship between the robot's movement and the state flow injection drive in the database layer. Specifically, the actual movement of the robot's floating base in the physical layer is mapped to the state estimation flow in the database layer. The state estimation flow integrates the chassis movement and the robotic arm coordinate system into the same spatial graph, thereby preventing the chassis and robotic arm from being treated separately.

[0062] Therefore, the embodiments of the present invention resolve the contradiction between the dynamic changes in physical configuration and the static mode of traditional databases in the field of robotics, and realize standardized data management for time-varying physical systems.

[0063] Another embodiment of the present invention discloses a management system for querying spatiotemporal multimodal data required for embodied intelligence training, comprising: a hardware metadata setting module 10, which stores device definitions for multiple hardware devices, wherein each device definition is a hardware metadata template that includes the capability mode and multimodal data mode of the hardware device, and the stream mode defines multiple data streams with different frequencies generated by the same hardware device; an embodied configuration creation module 20, which creates an embodied configuration including a spatial graph and a hardware mounting configuration, wherein the spatial graph is configured with static and dynamic transformation relationships between the robot's coordinate systems, and the hardware mounting configuration mounts the device definition in the hardware meta-language to a coordinate system of the spatial graph; a data stream generation module 30, which, for any hardware device to be mounted, automatically generates one or more data streams based on the device definition and hardware mounting configuration, wherein the data streams continuously receive the time-series observations generated by the hardware device to form a time data stream, and the time data stream includes a topology stream used to record the topology events of hardware device installation and uninstallation; and a query request execution module 40, which organizes the time data stream into segments, each segment being accompanied by a descriptive label representing a complete recording session, and which, in response to receiving a data query request that includes one or more segments, traverses and executes the data query request to obtain the query results.

[0064] It should be noted that this system corresponds to the management method described above; for any other parts not described, please refer to the content of the methods described above.

[0065] The present invention also discloses a computer device, including a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the above-described management method by executing the computer instructions.

[0066] This disclosure also provides a computer-readable storage medium. The methods described in this disclosure can be implemented in hardware or firmware, or as software or computer code recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium and subsequently stored on a local storage medium after being downloaded over a network. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium may also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods shown in the above embodiments.

[0067] It should be noted that the above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any way. Any person skilled in the art may use the technical content disclosed above to change or modify it into equivalent embodiments. However, any modifications or equivalent changes made to the above embodiments based on the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall still fall within the scope of the technical solution of the present invention.

Claims

1. A management method for querying spatiotemporal multimodal data required for embodied intelligence training, characterized in that it includes the following steps: setting up hardware metadata, which stores standardized device definitions for multiple hardware devices, wherein the device definition is a hardware metadata template decoupled from physical hardware instances, the hardware metadata template includes the capability mode and multimodal data mode of the hardware device, and the multimodal data mode includes the boundaries of data generation, data type, and storage method; creating an embodied configuration, which includes a spatial graph, a hardware inventory registry, and a hardware mounting configuration, wherein the spatial graph is configured with static and dynamic transformation relationships between the robot's coordinate systems, the hardware inventory registry registers the hardware devices to be mounted as available system resources, and the hardware mounting configuration mounts the instantiated hardware metadata template to a coordinate system of the spatial graph; for any hardware device mounting, automatically generating one or more data streams based on the device definition and hardware mounting configuration, wherein the data streams continuously receive time-series observations generated by the hardware devices to form a time data stream, and the time data stream includes a topology stream used to record topology events of hardware device installation and uninstallation; organizing the time data stream into segments, each segment being accompanied by a descriptive label representing a complete recording session; and, in response to receiving a data query request that includes one or more segments, traversing and executing the data query request to obtain query results.

2. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 1, characterized in that the method further includes the following steps: at the physical storage layer, continuously monitoring changes in the embodied configuration; in response to changes in the spatial graph or hardware mounting, comparing the configuration fingerprints of consecutive segments; when the configuration fingerprints differ, automatically truncating the time-series observations and storing them as segments; and, at the logical query layer, in response to the start and end times of the query, constructing a logical session view spanning one or more segments, wherein the logical session view does not expose the segments.
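By way of illustration only, the fingerprint comparison and the logical session view of claim 2 can be sketched in Python as follows. The hashing scheme (SHA-256 over canonical JSON), the function names, and the configuration dictionaries are assumptions for the example; the claim does not prescribe how a configuration fingerprint is computed.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form of the embodied configuration (assumed scheme)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_into_segments(observations):
    """Physical layer: start a new segment whenever the fingerprint changes.

    `observations` is a time-ordered iterable of (timestamp, config, payload).
    """
    segments = []
    current_fp = None
    for ts, config, payload in observations:
        fp = config_fingerprint(config)
        if fp != current_fp:
            segments.append({"fingerprint": fp, "start": ts, "end": ts, "rows": []})
            current_fp = fp
        segment = segments[-1]
        segment["end"] = ts
        segment["rows"].append((ts, payload))
    return segments


def session_view(segments, start, end):
    """Logical layer: rows in [start, end] across segments, with no segment boundaries shown."""
    return [row for seg in segments for row in seg["rows"] if start <= row[0] <= end]


bare_arm = {"mounts": {"flange": None}}
with_gripper = {"mounts": {"flange": "gripper"}}
recording = [
    (0.0, bare_arm, "a"),
    (1.0, bare_arm, "b"),
    (2.0, with_gripper, "c"),  # mounting the gripper changes the fingerprint
    (3.0, with_gripper, "d"),
]

segments = split_into_segments(recording)
view = session_view(segments, 1.0, 3.0)
```

Here the recording is stored as two segments, while the query over 1.0 to 3.0 returns one continuous list of rows.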

3. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 1, characterized in that, The process of traversing and executing the data query request includes: The query requests for multi-source heterogeneous sensor data are executed by traversing through the ALIGN operator. When the robot structure undergoes dynamic changes, topological flow queries can be used to look up the coordinate relationship transformations in the spatial graph at any timestamp. When the robot's structure undergoes dynamic changes, physical perception is used to query the robot's overall physical properties.

4. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 3, characterized in that the process of traversing and executing query requests for multi-source heterogeneous sensor data using the ALIGN operator includes the following steps: receiving a query request for multi-source heterogeneous sensor data and specifying a first time data stream and a second time data stream to be time-aligned, wherein the first time data stream and the second time data stream are two time data streams with different frequencies; for each first timestamp in the first time data stream, executing a loop to identify the latest observation in the second time data stream whose second timestamp is less than or equal to the first timestamp; calculating the difference between the first timestamp and the second timestamp as the expiration value, setting a preset threshold, and comparing the expiration value with the threshold; if the expiration value is greater than the threshold, returning the first time-series observation of the first time data stream together with a null value; if the expiration value is less than or equal to the threshold, defining a time aggregation window with the first timestamp as the center or boundary, obtaining all observations of the second time data stream within the aggregation window, performing a statistical aggregation calculation, and returning the calculation result aligned with the first time-series observation.
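The steps of claim 4 can be illustrated with the following minimal Python sketch. It makes several assumptions not fixed by the claim: the function name `align`, the window taken as the interval ending at the first timestamp (the "boundary" variant), the arithmetic mean as the statistical aggregation, and a null result when the second stream has no observation at or before the first timestamp.

```python
from bisect import bisect_left, bisect_right
from statistics import fmean


def align(first, second, max_staleness, window):
    """Align `second` onto the timestamps of `first`.

    Both inputs are lists of (timestamp, value) sorted by timestamp.
    Returns a list of (timestamp, first_value, aggregated_second_value_or_None).
    """
    second_ts = [t for t, _ in second]
    out = []
    for t1, v1 in first:
        hi = bisect_right(second_ts, t1)  # second[:hi] holds timestamps <= t1
        # Expiration guard: no earlier observation, or the latest one is too old.
        if hi == 0 or t1 - second_ts[hi - 1] > max_staleness:
            out.append((t1, v1, None))
            continue
        # Aggregation window [t1 - window, t1]; always keep the latest observation.
        lo = min(bisect_left(second_ts, t1 - window), hi - 1)
        out.append((t1, v1, fmean(v for _, v in second[lo:hi])))
    return out


camera = [(1.0, "img0"), (2.0, "img1"), (3.0, "img2")]
imu = [(0.8, 1.0), (0.9, 2.0), (1.0, 3.0), (1.95, 5.0)]

aligned = align(camera, imu, max_staleness=0.1, window=0.25)
# aligned == [(1.0, "img0", 2.0), (2.0, "img1", 5.0), (3.0, "img2", None)]
```

The third camera frame receives a null value because the newest inertial sample is 1.05 s old, which exceeds the 0.1 s threshold; returning null there avoids silently pairing a frame with stale data.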

5. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 3, characterized in that the method of using topological flow to query coordinate relationship transformations in a spatial graph at any timestamp includes the following steps: receiving a data query request based on topological events and constructing the time-varying spatial graph state at the specified timestamp; determining whether a path exists between the source coordinate system and the target coordinate system in the time-varying spatial graph state; if no path exists, returning null; if a path exists, combining the transformations along the path and returning the transformation result; wherein the step of constructing the time-varying spatial graph state at the specified timestamp includes the following steps: obtaining the static transformation relationships of the coordinate systems in the robot's embodied configuration; based on the static transformation relationships, and combined with the timestamped dynamic transformation relationships, completing the coordinate system offset corresponding to the specified timestamp by interpolation; and, based on the topological flow, adding or removing coordinate systems and correcting the connection relationships between coordinate systems to obtain the time-varying spatial graph state.
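The lookup in claim 5 can be sketched in Python under a strong simplification: transformations are pure translations, so composing them along a path is vector addition and inverting an edge is negation. A real system would compose full rigid-body transforms (rotation and translation). The function names, the frame names, and the event tuple format are hypothetical.

```python
from collections import deque


def lerp_translation(samples, t):
    """Linearly interpolate a timestamped translation; clamp outside the sampled range."""
    if t <= samples[0][0]:
        return samples[0][1]
    if t >= samples[-1][0]:
        return samples[-1][1]
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return tuple(x0 + a * (x1 - x0) for x0, x1 in zip(p0, p1))


def graph_at(static_edges, dynamic_edges, topology_events, t):
    """Build the spatial graph state at time t as {(parent, child): translation}."""
    edges = dict(static_edges)
    for key, samples in dynamic_edges.items():
        edges[key] = lerp_translation(samples, t)
    # Replay topology events up to t: (timestamp, "mount" | "unmount", edge, offset).
    for ts, action, key, offset in sorted(topology_events, key=lambda e: e[0]):
        if ts > t:
            break
        if action == "mount":
            edges[key] = offset
        elif action == "unmount":
            edges.pop(key, None)
    return edges


def lookup_transform(edges, source, target):
    """Breadth-first search for a path; return the composed translation or None."""
    adjacency = {}
    for (parent, child), off in edges.items():
        adjacency.setdefault(parent, []).append((child, off))
        adjacency.setdefault(child, []).append((parent, tuple(-c for c in off)))
    queue = deque([(source, (0.0, 0.0, 0.0))])
    seen = {source}
    while queue:
        frame, acc = queue.popleft()
        if frame == target:
            return acc
        for nxt, off in adjacency.get(frame, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, tuple(a + o for a, o in zip(acc, off))))
    return None


static_edges = {("base", "arm"): (0.0, 0.0, 0.5)}
dynamic_edges = {("arm", "flange"): [(0.0, (0.0, 0.0, 0.25)), (10.0, (1.0, 0.0, 0.25))]}
events = [
    (4.0, "mount", ("flange", "camera"), (0.0, 0.0, 0.125)),
    (8.0, "unmount", ("flange", "camera"), None),
]

mounted = lookup_transform(graph_at(static_edges, dynamic_edges, events, 5.0), "base", "camera")
removed = lookup_transform(graph_at(static_edges, dynamic_edges, events, 9.0), "base", "camera")
# mounted == (0.5, 0.0, 0.875); removed is None because the camera was unmounted at t = 8.0
```

The same query returns a transformation at t = 5.0 and null at t = 9.0, which is the behaviour a static transform tree cannot express.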

6. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 3, characterized in that the process of querying the overall physical attributes of the robot using physical perception includes the following steps: obtaining and parsing the robot definition file to obtain the inertial properties of the robot base and multiple links; based on the topology stream and the query timestamp, dynamically reconstructing the tool attributes that change over time, wherein the tool attributes describe which tools are mounted on or unmounted from the robot over time; receiving a query request for the robot's physical attributes, and inputting the timestamp to be queried and the joint configuration, wherein the joint configuration includes, but is not limited to, the position or angle of each joint of the robot; based on the topology stream log, determining the tool attributes at the timestamp to be queried; and calculating the robot's physical properties using the inertial properties, the joint configuration, and the determined tool attributes, wherein the physical properties include, but are not limited to, the center of mass, the inertia tensor, and the kinetic energy.
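As a simplified, non-limiting example of the query in claim 6, the Python sketch below computes only the total mass and center of mass of a planar serial arm. The assumptions are considerable and all hypothetical: revolute joints in a plane, uniform links whose center of mass lies at the midpoint, a tool modelled as a point mass at the arm tip, and a topology log of (timestamp, action, tool) entries. The inertia tensor and kinetic energy are not covered.

```python
import math


def tool_at(topology_log, t):
    """Replay mount/unmount events up to time t and return the mounted tool, or None."""
    tool = None
    for ts, action, payload in sorted(topology_log, key=lambda e: e[0]):
        if ts > t:
            break
        tool = payload if action == "mount" else None
    return tool


def center_of_mass(base, links, joint_angles, tool=None):
    """Return (total mass, (x, y) center of mass) for a planar serial arm."""
    total = base["mass"]
    mx = base["mass"] * base["com"][0]
    my = base["mass"] * base["com"][1]
    x = y = heading = 0.0
    for link, q in zip(links, joint_angles):
        heading += q  # each joint angle is relative to the previous link
        dx = link["length"] * math.cos(heading)
        dy = link["length"] * math.sin(heading)
        total += link["mass"]
        mx += link["mass"] * (x + 0.5 * dx)  # uniform link: center of mass at the midpoint
        my += link["mass"] * (y + 0.5 * dy)
        x, y = x + dx, y + dy
    if tool is not None:
        total += tool["mass"]  # tool treated as a point mass at the arm tip
        mx += tool["mass"] * x
        my += tool["mass"] * y
    return total, (mx / total, my / total)


base = {"mass": 10.0, "com": (0.0, 0.0)}
links = [{"mass": 2.0, "length": 1.0}, {"mass": 2.0, "length": 1.0}]
topology_log = [(5.0, "mount", {"name": "gripper", "mass": 1.0})]

mass_before, com_before = center_of_mass(base, links, (0.0, 0.0), tool_at(topology_log, 1.0))
mass_after, com_after = center_of_mass(base, links, (0.0, 0.0), tool_at(topology_log, 6.0))
```

With the arm fully extended, mounting the 1 kg gripper raises the total mass from 14 kg to 15 kg and moves the center of mass outward along the x axis, so the answer depends on the query timestamp as well as on the joint configuration.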

7. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 5, characterized in that the method of using topological flow to query coordinate relationship transformations in a spatial graph at any timestamp also includes: for the floating base coordinate system of a mobile-base robot, setting up a state estimation flow, which represents the root transformation of the floating base coordinate system relative to the world coordinate system; and, by combining the root transformation, the static transformation relationship, the dynamic transformation relationship, and the connection relationships between coordinate systems, constructing the time-varying spatial graph state of the mobile-base robot.

8. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 2, characterized in that, The method further includes: The processor merges multiple data streams to generate a derived stream. When the original data stream is version controlled, the derived stream is independent of the version control of the original data stream. Maintain the dependency graph attributes between the derived stream and the original data stream. When the segment is updated due to data backflow or configuration change, automatically mark the affected derived stream as invalid or trigger incremental recalculation based on the dependency graph attributes.

9. The management method for querying spatiotemporal multimodal data required for embodied intelligence training as described in claim 1, characterized in that, The method further includes: The data stream is stored in unstructured and semi-structured formats. When executing a query request, a query engine is invoked to load the hardware metadata template as a semantic interpreter to dynamically parse the stored binary or JSON data and obtain the parsed data. The hardware metadata template also includes physical constraints, which are used to perform real-time verification on the parsed data.

10. A management system for querying spatiotemporal multimodal data required for embodied intelligence training, characterized in that the system includes: a hardware metadata setting module, which sets hardware metadata storing standardized device definitions for multiple hardware devices, wherein the device definition is a hardware metadata template decoupled from the physical hardware instance, the hardware metadata template includes the capability mode and multimodal data mode of the hardware device, and the multimodal data mode includes the boundaries of data generation, data type, and storage method; an embodied configuration creation module, which creates an embodied configuration including a spatial graph, a hardware inventory registry, and a hardware mounting configuration, wherein the spatial graph is configured with static and dynamic transformation relationships between the robot's coordinate systems, the hardware inventory registry registers the hardware devices to be mounted as available system resources, and the hardware mounting configuration mounts the instantiated hardware metadata template to a coordinate system of the spatial graph; a data stream generation module, which, for any hardware device mounting, automatically generates one or more data streams based on the device definition and hardware mounting configuration, wherein the data streams continuously receive time-series observations generated by the hardware device to form a time data stream, and the time data stream includes a topology stream used to record topology events of hardware device installation and uninstallation; and a query request execution module, which organizes the time data stream into segments, each segment being accompanied by a descriptive label representing a complete recording session, and which, in response to receiving a data query request that includes one or more segments, traverses and executes the data query request to obtain query results.