Data processing method based on sliding window

By processing real-time video frames, first constraint frames, and second key frames in the SLAM system to form a stable frame set, the system instability and positioning accuracy degradation caused by data fluctuations in existing technologies are solved, achieving stable data processing and high-precision positioning.

CN115346156B · Active · Publication Date: 2026-06-19 · BEIJING LINGYU CENTURY INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING LINGYU CENTURY INFORMATION TECH CO LTD
Filing Date
2022-08-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing SLAM methods suffer from data fluctuations during optimization due to significant differences between the overall data of consecutive frames and the data of key frames. This affects system stability and positioning accuracy, making it impossible to simultaneously meet the requirements of real-time and high-precision positioning.

Method used

By processing real-time video frames, first constraint frames, and second key frames in a sliding window, a set of first constraint frames, a set of second key frames, and a set of real-time video frames are formed to perform 3D environmental target tracking, avoid data jumps, and ensure the stability and real-time performance of data processing.

Benefits of technology

This effectively avoids data jumps between constraint frames and key frames during data processing, ensuring the stability of data processing and meeting the requirements for high-precision positioning, thereby improving the overall performance of the SLAM system.

Abstract

This application discloses a data processing method based on a sliding window. The method includes: based on a preset sliding window size, performing, on video frame data in a real-time acquired environmental image video stream, first constraint frame processing for global constraints on an acquired global map, second keyframe processing for constraints on the environmental image within the sliding window, and real-time video frame acquisition processing, to obtain a corresponding first constraint frame set, second keyframe set, and real-time video frame set; and performing target tracking in the 3D environment corresponding to the environmental image based on the first constraint frame set, second keyframe set, and real-time video frame set included in the sliding window. This method effectively avoids data jumps caused by constraint frames and keyframes during data processing, while meeting both the real-time and the high-precision positioning requirements of the data processing.

Description

Technical Field

[0001] This application relates to the field of computer vision, and more particularly to a data processing method based on a sliding window.

Background Technology

[0002] As a core technology for positioning and navigation in mobile robots, virtual reality, and augmented reality devices, the algorithm framework for Simultaneous Localization and Mapping (SLAM) technology is becoming increasingly sophisticated, gradually shifting from theoretical research to market application.

[0003] Existing SLAM methods typically use a sliding window to continuously read image frames of limited capacity for local optimization to reduce computation. Simultaneously, these methods also employ keyframe construction (i.e., selectively promoting ordinary image frames as keyframes for storage) to perform fast loop closure detection and global optimization. This feedback is then used to optimize the pose of ordinary frames within the sliding window, achieving a trade-off between accuracy and speed, as well as globally consistent state information.

[0004] Although keyframes are constructed in the backend of a SLAM system to skip information from a large number of ordinary frames and accelerate nonlinear optimization, as the system runs, ordinary image frames acquired by sensors are continuously upgraded to keyframes and cumulatively stored in the keyframe database. This leads to a continuous increase in the number of parameters to be optimized, which in turn reduces the speed and accuracy of nonlinear optimization and may even affect the positioning accuracy and memory usage of the entire SLAM system. Therefore, in the backend optimization process of a SLAM system, it is necessary to use a longer period of continuous frames as a whole to maintain and optimize a keyframe of a reasonable volume. This is to maintain real-time performance for a longer period of time and improve robustness in real-world applications (such as robot navigation or AR wearable devices).

[0005] However, while the existing sliding window strategy improves robustness to a certain extent, the large difference between the overall data of consecutive frames and the key frame data means that optimizing the key frame using the overall data of consecutive frames can lead to data fluctuations, resulting in instability of the SLAM system, decreased positioning accuracy, and an inability to meet the requirements of real-time and high-precision positioning.

Summary of the Invention

[0006] In view of this, embodiments of this application provide a data processing method based on a sliding window to at least partially solve the above-mentioned problems.

[0007] According to a first aspect of the embodiments of this application, a data processing method based on a sliding window is provided, comprising: based on a preset sliding window size, performing first constraint frame processing for global constraints on an acquired global map, second key frame processing for constraints on the environmental image in the sliding window, and real-time video frame acquisition processing on video frame data in a real-time acquired environmental image video stream, respectively, to obtain a corresponding first constraint frame set, second key frame set, and real-time video frame set;

[0008] Based on the first set of constraint frames, the second set of key frames, and the set of real-time video frames included in the sliding window, target tracking is performed in the three-dimensional environment corresponding to the environmental image.

[0009] In another implementation of this application, the step of performing real-time video frame acquisition processing to obtain a corresponding real-time video frame set includes: selecting at least one video frame adjacent to the current video frame from the video stream to obtain an adjacent video frame set; and integrating the adjacent video frame set with the current video frame to obtain a real-time video frame set.

[0010] In another implementation of this application, the first constraint frame processing for globally constraining the obtained global map includes: selecting multiple first video frames from the video stream whose time distance from the current video frame is greater than a first preset time threshold and whose corresponding environment has a similarity greater than a preset similarity threshold with the environment corresponding to the current video frame; and performing global constraint processing on the obtained global map based on the multiple first video frames.

[0011] In another implementation of this application, the step of performing global constraint processing on the obtained global map based on the plurality of first video frames includes: obtaining the target pose corresponding to the plurality of first video frames; and performing global constraint processing on the global map formed by historical keyframes in the video stream based on the target pose.

[0012] In another implementation of this application, the second keyframe processing for constraining the environmental image in the sliding window includes: selecting multiple second video frames from the video stream that are within a preset time range from the current video frame; and constraining the environmental image contained in the sliding window based on the multiple second video frames.

[0013] In another implementation of this application, the method further includes: performing target tracking in the three-dimensional environment corresponding to the environmental image based on the first constraint frame set, the second key frame set, and the real-time video frame set included in the sliding window, comprising: performing target recognition and tracking based on each video frame in the real-time video frame set; and correcting the recognized and tracked target according to the constraints formed by the first constraint frame set and the second key frame set.

[0014] According to a second aspect of the embodiments of this application, a data processing apparatus based on a sliding window is provided, comprising: a processing module, configured to, based on a preset sliding window size, process video frame data in a real-time acquired environmental image video stream by performing first constraint frame processing for global constraints on an acquired global map, second key frame processing for constraints on the environmental image in the sliding window, and real-time video frame acquisition processing, to obtain a corresponding first constraint frame set, second key frame set, and real-time video frame set;

[0015] The display module is used to perform target tracking in the three-dimensional environment corresponding to the environmental image based on the first set of constraint frames, the second set of key frames, and the real-time video frame set included in the sliding window.

[0016] According to a third aspect of the present application, an electronic device is provided, including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, which causes the processor to perform the operation corresponding to the method described in the first aspect.

[0017] According to a fourth aspect of the embodiments of this application, a computer storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the method described in the first aspect.

[0018] In the solution of this application embodiment, by displaying the first constraint frame set, the second key frame set, and the real-time video frame set in the sliding window, it is possible to effectively avoid data jumps between constraint frames and key frames during data processing, ensure the stability of data processing, and take into account both the real-time performance and high-precision positioning requirements of data processing.

Attached Figure Description

[0019] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings.

[0020] Figure 1 is a schematic block diagram of an example SLAM algorithm.

[0021] Figure 2 is a flowchart illustrating the steps of a sliding window-based data processing method according to an embodiment of this application.

[0022] Figure 3 is a schematic block diagram of a sliding window-based data processing apparatus according to an embodiment of this application.

[0023] Figure 4 is a schematic diagram of the structure of an electronic device according to an embodiment of this application.

Detailed Implementation

[0024] To enable those skilled in the art to better understand the technical solutions in the embodiments of this application, the technical solutions in the embodiments of this application will be clearly and thoroughly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art should fall within the protection scope of the embodiments of this application.

[0025] It should be understood that the terms "first," "second," and "third," etc., in the claims, specification, and drawings of this disclosure are used to distinguish different objects, not to describe a specific order. The terms "comprising" and "including" as used in the specification and claims of this disclosure indicate the presence of the described features, integers, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof.

[0026] It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this disclosure. As used in this disclosure and claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in this disclosure and claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes such combinations.

[0027] Figure 1 is a schematic block diagram of an example SLAM algorithm. Existing SLAM algorithms include tracking, small map optimization, large map optimization, and loop closure detection. Specifically: Step 1: Perform tracking algorithm processing, and then add frame information to the small map. Step 2: Since the small map should maintain only a small amount of data, frames with excessive duplicate information are deleted, and the remaining frames are called keyframes. This way, although the small map maintains less data, the information loss is not excessive. With each new frame, the data in the small map is optimized to provide more accurate data. Here, the concept of a small map is consistent with the concept of a window. Step 3: To make the SLAM calculation results more accurate, all keyframes generated in the small map are saved to the large map. Step 4: When a loop closure occurs (that is, when the image shows a position that the SLAM system previously passed through), the data in the large map is optimized to reduce the drift of the calculation results.
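
Steps 2 and 3 of this example pipeline can be sketched as follows. This is only an illustrative sketch: the source does not say how duplicate information is measured, so the `Frame` type, the feature-overlap measure, and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    index: int
    features: frozenset  # hypothetical: IDs of landmarks seen in this frame


def overlap(a: Frame, b: Frame) -> float:
    """Fraction of a's features that also appear in b (0.0 when a has none)."""
    if not a.features:
        return 0.0
    return len(a.features & b.features) / len(a.features)


@dataclass
class SmallMap:
    max_overlap: float = 0.8  # assumed redundancy threshold, not from the source
    keyframes: list = field(default_factory=list)

    def add(self, frame: Frame) -> bool:
        """Keep the frame as a keyframe unless it mostly repeats the last keyframe."""
        if self.keyframes and overlap(frame, self.keyframes[-1]) > self.max_overlap:
            return False  # Step 2: drop frames with excessive duplicate information
        self.keyframes.append(frame)
        return True


small_map = SmallMap()
large_map = []  # Step 3: every keyframe from the small map is also saved here
observations = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {4, 5, 6, 7, 8}]
for i, feats in enumerate(observations):
    frame = Frame(i, frozenset(feats))
    if small_map.add(frame):
        large_map.append(frame)
```

In this run the second frame repeats the first one entirely and is dropped, so only frames 0 and 2 reach the large map.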

[0028] The SLAM algorithm in this example has relatively good performance in computer vision processing. However, when optimizing keyframes with large map data, data fluctuations occur, causing problems such as data instability and low accuracy, which affect the overall stability and data calculation accuracy of the SLAM system.

[0029] Figure 2 shows an exemplary flow of a sliding window-based data processing method according to an embodiment of this application. This application applies to the field of SLAM (Simultaneous Localization and Mapping) technology, which uses cameras to perceive the environment. For example, when an autonomous robot is in an unknown environment, it uses information acquired by its camera to estimate its own position and the surrounding environment. During its movement, the robot performs localization based on position estimation and the data perceived by the camera, while continuously building and updating the map and planning its path.

[0030] The data processing method based on a sliding window in this embodiment includes:

[0031] S210: Based on the preset sliding window size, the video frame data in the real-time acquired environmental image video stream are processed by first constraint frame processing for global constraints on the obtained global map, second key frame processing for constraints on the environmental image in the sliding window, and real-time video frame acquisition processing, to obtain the corresponding first constraint frame set, second key frame set, and real-time video frame set.

[0032] It should be noted that since the input data for SLAM is a real-time acquired environmental image video stream, and this stream is composed of video frame data, processing of the video frame data is necessary. Here, a video frame refers to an image frame, the smallest unit that makes up a video. A real-time video frame refers to the image frame corresponding to the current time; it should be understood that there is exactly one current video frame, while the total number of video frames is not limited here. Keyframe processing refers to extracting at least one image frame that meets specific requirements from the video frames, and then processing the extracted video frames to obtain the set of image frames, which is the keyframe set. Constraint frame processing refers to applying constraints to the video frame data; the resulting set of image frames is the constraint frame set.

[0033] It should be noted that the sliding window in this embodiment is of a fixed size, and the size of the sliding window can be preset based on human experience. Specifically, the sliding window size can be set to X, the size of the current video frame or the current video frame set to W, the size of the video frame constraint data to N, and the size of the keyframe set to S; wherein the sum of S, N, and W is equal to the sliding window size X, and S, N, and W can be set based on human experience.
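
The size relation S + N + W = X can be checked in code. The sketch below is a minimal illustration, not the patented implementation: the names `WindowBudget` and `make_budget` are hypothetical, and requiring at least one real-time slot is an assumption based on the statement that there is exactly one current video frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowBudget:
    keyframes: int    # S: size of the keyframe set
    constraints: int  # N: size of the video frame constraint data
    realtime: int     # W: size of the current video frame set

    def __post_init__(self):
        if min(self.keyframes, self.constraints, self.realtime) < 0:
            raise ValueError("slot counts must be non-negative")

    @property
    def size(self) -> int:
        """Total sliding window size X = S + N + W."""
        return self.keyframes + self.constraints + self.realtime


def make_budget(x: int, s: int, n: int) -> WindowBudget:
    """Derive W from preset X, S, and N so that S + N + W == X."""
    w = x - s - n
    if w < 1:
        # Assumption: the window must always hold the current video frame.
        raise ValueError("window must leave at least one slot for the current frame")
    return WindowBudget(s, n, w)


budget = make_budget(x=10, s=5, n=3)
```

With X = 10, S = 5, and N = 3, the remaining real-time share W is 2.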

[0034] S220: Based on the first set of constraint frames, the second set of key frames, and the set of real-time video frames included in the sliding window, target tracking is performed in the three-dimensional environment corresponding to the environmental image.

[0035] The sliding window here includes the first constraint frame set, the second keyframe set, and the real-time video frame set. For example, with a sliding window structure of 10 frames per second, the first 5 frames display the keyframe set, the middle 3 frames display the constraint frame set, and the last 2 frames display the current video frame set.
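
The 5 / 3 / 2 layout in this example can be sketched as a simple concatenation. The function names and the string frame labels are hypothetical, and taking the most recent frames of each set is an assumption; the source only fixes the order and the counts.

```python
def tail(items, count):
    """Return the last `count` items (an empty list when count is 0)."""
    return list(items[-count:]) if count > 0 else []


def assemble_window(keyframes, constraint_frames, realtime_frames, sizes=(5, 3, 2)):
    """Lay out the window as keyframes first, constraint frames next, real-time frames last."""
    s, n, w = sizes
    return tail(keyframes, s) + tail(constraint_frames, n) + tail(realtime_frames, w)


keyframes = [f"K{i}" for i in range(7)]          # more candidates than slots
constraint_frames = [f"C{i}" for i in range(3)]
realtime_frames = [f"R{i}" for i in range(4)]

window = assemble_window(keyframes, constraint_frames, realtime_frames)
```

Here `window` holds K2 to K6, then C0 to C2, then R2 and R3, for 10 frames in total.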

[0036] In the solution of this application embodiment, by displaying the first set of constraint frames, the second set of key frames, and the current set of video frames in the sliding window, it is possible to effectively avoid data jumps between constraint frames and key frames during data processing, ensure the stability of data processing, and take into account both the real-time performance and high-precision positioning requirements of data processing.

[0037] In one possible implementation, the real-time video frame acquisition process to obtain a corresponding real-time video frame set includes: selecting at least one video frame adjacent to the current video frame from the video stream to obtain an adjacent video frame set; and integrating the adjacent video frame set with the current video frame to obtain a real-time video frame set.

[0038] It should be noted that, because a single current video frame carries relatively little data, it cannot guarantee that enough information is obtained during tracking. Therefore, at least one video frame adjacent to the current video frame is selected and combined with the current video frame to form the real-time video frame set. Here, "adjacent video frames" refers to image frames that precede the current video frame and are closest to it in time. For example, if the image frame corresponding to the current moment is the current video frame, then the image frame corresponding to the previous moment is an adjacent video frame, and the frames temporally adjacent to that frame in turn make up the adjacent video frame set. This method increases the number of frames used for target tracking, thereby improving data processing accuracy.
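
A minimal sketch of forming the real-time video frame set is given below. It assumes the video stream is an indexable list in time order and that adjacent frames are taken from just before the current frame; the function name and the `neighbors` parameter are hypothetical.

```python
def realtime_frame_set(stream, current_index, neighbors=1):
    """Combine the current frame with up to `neighbors` frames immediately before it."""
    if not 0 <= current_index < len(stream):
        raise IndexError("current_index is outside the stream")
    if neighbors < 1:
        raise ValueError("at least one adjacent frame must be requested")
    start = max(0, current_index - neighbors)
    adjacent = list(stream[start:current_index])  # adjacent video frame set
    return adjacent + [stream[current_index]]     # integrated with the current frame


stream = list(range(100, 110))  # stand-in frame IDs
frames = realtime_frame_set(stream, current_index=5, neighbors=2)
```

At the very start of the stream no earlier frames exist, so the set falls back to the current frame alone.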

[0039] In one possible implementation, the first constraint frame processing for globally constraining the obtained global map includes: selecting multiple first video frames from the video stream that are more than a first preset time threshold from the current video frame and whose corresponding environments have a similarity greater than a preset similarity threshold with the environment corresponding to the current video frame; and performing global constraint processing on the obtained global map based on the multiple first video frames.

[0040] It should be noted that by selecting multiple first video frames from the video stream that are more than a first preset time threshold from the current video frame and whose corresponding environments have a similarity greater than a preset similarity threshold to the environment corresponding to the current video frame; and by performing global constraint processing on the obtained global map based on these multiple first video frames, the data processing can be made more accurate.
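
The two selection conditions, a time distance above the first preset time threshold and an environment similarity above the preset similarity threshold, can be sketched as a filter. The source does not define how similarity is computed; cosine similarity over a per-frame descriptor vector is an assumption made purely for illustration, as are the function names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_first_constraint_frames(frames, current, min_time_gap, min_similarity):
    """Pick frames far from the current frame in time but similar in environment.

    `frames` and `current` are (timestamp, descriptor) pairs; the descriptor is a
    hypothetical stand-in for whatever represents the observed environment.
    """
    t_now, d_now = current
    return [
        (t, d)
        for t, d in frames
        if abs(t_now - t) > min_time_gap
        and cosine_similarity(d, d_now) > min_similarity
    ]


history = [(0.0, [1.0, 0.0]), (1.0, [0.0, 1.0]), (9.5, [1.0, 0.1])]
current = (10.0, [1.0, 0.0])
selected = select_first_constraint_frames(history, current, min_time_gap=5.0, min_similarity=0.9)
```

Only the frame at time 0.0 passes both tests: the frame at 1.0 shows a different environment, and the frame at 9.5 is too recent.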

[0041] In one possible implementation, the step of performing global constraint processing on the obtained global map based on the plurality of first video frames includes: obtaining the target pose corresponding to the plurality of first video frames; and performing global constraint processing on the global map formed by historical keyframes in the video stream based on the target pose.

[0042] It should be noted that the pose data here refers to both position and orientation. Position is described using a position vector, and orientation is described by attaching a coordinate system to the object and then providing a description of this coordinate system relative to the reference system, i.e., using a rotation matrix to describe the direction. This method allows for more accurate keyframe data.
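
As a small worked example of this pose description, the sketch below represents orientation with a 3x3 rotation matrix and position with a 3-vector, then maps a point from the object's coordinate system into the reference system as R p + t. The helper names are hypothetical, and the rotation about the z-axis is chosen only to keep the numbers easy to check.

```python
import math


def rotation_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_point(rotation, position, point):
    """Map a point from the object's frame into the reference frame: R @ p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + position[i]
        for i in range(3)
    ]


rotation = rotation_z(math.pi / 2)  # orientation: quarter turn about z
position = [1.0, 2.0, 3.0]          # position vector
world_point = transform_point(rotation, position, [1.0, 0.0, 0.0])
```

The point on the object's x-axis lands at approximately (1, 3, 3) in the reference system, up to floating-point rounding.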

[0043] In one possible implementation, the second keyframe processing for constraining the environmental image in the sliding window includes: selecting multiple second video frames from the video stream that are within a preset time range from the current video frame; and constraining the environmental image contained in the sliding window based on the multiple second video frames.

[0044] It should be noted that the constraint processing here can also be achieved by selecting multiple second video frames whose time distance from the current video frame is within a preset time range. Specifically, these second video frames can be keyframes from the previous sliding window. This method ensures that the sliding window is always constrained, preventing significant data drift.
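
Selecting second video frames by time range can be sketched as below. The keyframes are modeled as (timestamp, id) pairs, and the `limit` parameter that caps the result at the keyframe share of the window is an assumption; the source only states the time-range condition.

```python
def select_second_keyframes(keyframes, current_time, time_range, limit):
    """Pick up to `limit` keyframes no older than `time_range` before the current time."""
    recent = [kf for kf in keyframes if 0 <= current_time - kf[0] <= time_range]
    recent.sort(key=lambda kf: kf[0])  # oldest first, newest last
    return recent[-limit:] if limit > 0 else []


previous_window_keyframes = [(1.0, "a"), (4.0, "b"), (6.0, "c"), (7.5, "d")]
second_keyframes = select_second_keyframes(
    previous_window_keyframes, current_time=8.0, time_range=3.0, limit=2
)
```

With a 3-second range at time 8.0, only the keyframes at 6.0 and 7.5 qualify.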

[0045] In one possible implementation, the target tracking in the 3D environment corresponding to the environmental image based on the first constraint frame set, the second key frame set, and the real-time video frame set included in the sliding window includes: target recognition and tracking based on each video frame in the real-time video frame set; and correction of the recognized and tracked target according to the constraints formed by the first constraint frame set and the second key frame set.
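
The two-stage flow, tracking on the real-time frames and then correcting with the constraints, can be illustrated with a deliberately simple stand-in. The source does not specify the correction method; the weighted blend toward the mean of the constraint and keyframe positions below is an assumption chosen for illustration and is not the optimization a real SLAM backend would run. All names and the `weight` parameter are hypothetical.

```python
def mean_position(positions):
    """Component-wise mean of a non-empty list of 3D positions."""
    n = len(positions)
    return tuple(sum(p[i] for p in positions) / n for i in range(3))


def track_and_correct(realtime_estimates, constraint_positions, keyframe_positions, weight=0.5):
    """Track from real-time frames, then pull the result toward the constraint anchors."""
    tracked = mean_position(realtime_estimates)  # stand-in for recognition and tracking
    anchors = list(constraint_positions) + list(keyframe_positions)
    if not anchors:
        return tracked  # nothing to correct against
    anchor = mean_position(anchors)
    # Assumed correction: linear blend between the tracked value and the anchor.
    return tuple((1 - weight) * t + weight * a for t, a in zip(tracked, anchor))


corrected = track_and_correct(
    realtime_estimates=[(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    constraint_positions=[(1.0, 0.0, 0.0)],
    keyframe_positions=[(1.0, 0.0, 0.0)],
)
```

The blend limits how far a single noisy real-time estimate can move the result, which is one simple way to picture the avoidance of data jumps that the text describes.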

[0046] Figure 3 is a schematic block diagram of a sliding window-based data processing apparatus according to another embodiment of this application. The solutions of this application embodiment can be applied to electronic devices, including but not limited to electronic devices with data processing capabilities.

[0047] The sliding window-based data processing device of this embodiment includes: a processing module 310, which is used to process video frame data in the real-time acquired environmental image video stream according to the size of a preset sliding window, performing first constraint frame processing for global constraints on the obtained global map, second key frame processing for constraints on the environmental image in the sliding window, and real-time video frame acquisition processing to obtain corresponding first constraint frame set, second key frame set and real-time video frame set.

[0048] The display module 320 is used to perform target tracking in the three-dimensional environment corresponding to the environmental image based on the first set of constraint frames, the second set of key frames, and the real-time video frame set included in the sliding window.

[0049] In other examples, the processing module 310 is specifically used to: select at least one video frame adjacent to the current video frame from the video stream to obtain a set of adjacent video frames;

[0050] The adjacent video frame set is integrated with the current video frame to obtain a real-time video frame set.

[0051] In other examples, the processing module 310 is specifically used to: select from the video stream a plurality of first video frames whose time from the current video frame is greater than a first preset time threshold and whose corresponding environment has a similarity to the environment corresponding to the current video frame greater than a preset similarity threshold.

[0052] Based on the multiple first video frames, global constraint processing is performed on the obtained global map.

[0053] In other examples, the processing module 310 is specifically used to: obtain the target pose corresponding to the plurality of first video frames;

[0054] Based on the target pose, a global constraint process is performed on the global map formed by the historical keyframes in the video stream.

[0055] In other examples, the processing module 310 is specifically used to: select from the video stream a plurality of second video frames whose time distance from the current video frame is within a preset time range;

[0056] Based on the plurality of second video frames, the environmental image contained in the sliding window is constrained.

[0057] In other examples, processing module 310 is specifically used to: perform target recognition and tracking based on each video frame in the real-time video frame set;

[0058] and correct the recognized and tracked target according to the constraints formed by the first constraint frame set and the second key frame set.

[0059] Figure 4 shows a schematic diagram of an electronic device according to another embodiment of the present application. The specific embodiments of the present application do not limit the specific implementation of the electronic device.

[0060] As shown in Figure 4, the electronic device may include: a processor 402, a communications interface 404, a memory 406 storing a program 410, and a communications bus 408.

[0061] The processor, communication interface, and memory communicate with each other via a communication bus. The communication interface is used to communicate with other electronic devices or servers. The processor executes programs, specifically the steps described in the method embodiments above. Specifically, the program may include program code, which includes computer operation instructions.

[0062] The processor may be a CPU, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application. The one or more processors included in the smart device may be processors of the same type, such as one or more CPUs; or they may be processors of different types, such as one or more CPUs and one or more ASICs.

[0063] Memory is used to store programs. Memory may include high-speed RAM, and may also include non-volatile memory, such as at least one disk drive.

[0064] Specifically, the program can be used to enable the processor to perform the following operations: based on the size of a preset sliding window, the video frame data in the real-time acquired environmental image video stream is processed by performing first constraint frame processing for global constraints on the obtained global map, second key frame processing for extracting the environmental image in the sliding window, and current video frame processing for target tracking, to obtain the corresponding first constraint frame set, second key frame set, and current video frame set; the first constraint frame set, second key frame set, and current video frame set are displayed in the sliding window and updated in real time.

[0065] The above embodiments are only used to illustrate the embodiments of this application and are not intended to limit the embodiments of this application. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of this application. Therefore, all equivalent technical solutions also fall within the scope of the embodiments of this application, and the patent protection scope of the embodiments of this application should be defined by the claims. The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or physical entities, or by products with certain functions.

[0066] For ease of description, the above devices are described separately by function as various units. Of course, in implementing this application, the functions of each unit can be implemented in one or more software and / or hardware.

[0067] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0068] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0069] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0070] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0071] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0072] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0073] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element.

[0075] This application can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in both local and remote computer storage media, including storage devices.

[0076] The various embodiments in this specification are described in a progressive manner. For parts that are identical or similar between embodiments, the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments, so they are described only briefly; for the relevant details, refer to the corresponding descriptions in the method embodiments.

Claims

1. A data processing method based on a sliding window, characterized in that it comprises:
based on a preset sliding window size, subjecting video frame data in a real-time acquired environmental image video stream to first constraint frame processing for globally constraining an obtained global map, second key frame processing for constraining the environmental image in the sliding window, and real-time video frame acquisition processing, to obtain a corresponding first constraint frame set, second key frame set, and real-time video frame set; and
performing target tracking in the three-dimensional environment corresponding to the environmental image based on the first constraint frame set, the second key frame set, and the real-time video frame set included in the sliding window;
wherein the first constraint frame processing for globally constraining the obtained global map comprises: selecting, from the video stream, multiple first video frames whose time distance from the current video frame is greater than a first preset time threshold and whose corresponding environment has a similarity greater than a preset similarity threshold with the environment corresponding to the current video frame; and performing global constraint processing on the obtained global map based on the multiple first video frames; and
wherein the second key frame processing for constraining the environmental image in the sliding window comprises: selecting, from the video stream, multiple second video frames whose time distance from the current video frame is within a preset time range; and constraining the environmental image contained in the sliding window based on the multiple second video frames.

2. The method according to claim 1, wherein the real-time video frame acquisition processing to obtain the corresponding real-time video frame set comprises: selecting, from the video stream, at least one video frame adjacent to the current video frame to obtain an adjacent video frame set; and integrating the adjacent video frame set with the current video frame to obtain the real-time video frame set.

3. The method according to claim 1, wherein performing global constraint processing on the obtained global map based on the multiple first video frames comprises: obtaining the target pose corresponding to the multiple first video frames; and, based on the target pose, performing global constraint processing on the global map formed by the historical key frames in the video stream.

4. The method according to claim 1, wherein performing target tracking in the three-dimensional environment corresponding to the environmental image based on the first constraint frame set, the second key frame set, and the real-time video frame set included in the sliding window comprises: performing target recognition and tracking based on each video frame in the real-time video frame set; and correcting the recognized and tracked target based on the constraints formed by the first constraint frame set and the second key frame set.

5. A data processing apparatus based on a sliding window, comprising:
a processing module configured to, based on a preset sliding window size, subject video frame data in a real-time acquired environmental image video stream to first constraint frame processing for globally constraining an obtained global map, second key frame processing for constraining the environmental image in the sliding window, and real-time video frame acquisition processing, to obtain a corresponding first constraint frame set, second key frame set, and real-time video frame set;
wherein the first constraint frame processing for globally constraining the obtained global map comprises: selecting, from the video stream, multiple first video frames whose time distance from the current video frame is greater than a first preset time threshold and whose corresponding environment has a similarity greater than a preset similarity threshold with the environment corresponding to the current video frame; and performing global constraint processing on the obtained global map based on the multiple first video frames;
wherein the second key frame processing for constraining the environmental image in the sliding window comprises: selecting, from the video stream, multiple second video frames whose time distance from the current video frame is within a preset time range; and constraining the environmental image contained in the sliding window based on the multiple second video frames; and
a display module configured to perform target tracking in the three-dimensional environment corresponding to the environmental image based on the first constraint frame set, the second key frame set, and the real-time video frame set included in the sliding window.

6. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus; and the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the method according to any one of claims 1-4.

7. A computer storage medium having a computer program stored thereon, which, when executed by a processor, implements the method as described in any one of claims 1-4.
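
For illustration only, the frame-selection rules recited in claims 1 and 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the `Frame` type, the use of cosine similarity over a per-frame scene descriptor, the parameter names (`min_time_gap`, `min_similarity`, `time_range`, `num_adjacent`), and the reading of "adjacent" as the immediately preceding frames are all hypothetical. The claims do not state how the preset sliding window size bounds each set, so the sketch leaves that out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a video frame; not a type from the application."""
    index: int                     # position in the video stream
    timestamp: float               # capture time in seconds
    descriptor: Tuple[float, ...]  # assumed scene descriptor used for similarity


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; the claims do not specify one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_first_constraint_frames(
    history: Sequence[Frame],
    current: Frame,
    min_time_gap: float,
    min_similarity: float,
) -> List[Frame]:
    """First video frames: far from the current frame in time, yet similar in scene."""
    return [
        f for f in history
        if current.timestamp - f.timestamp > min_time_gap
        and cosine_similarity(f.descriptor, current.descriptor) > min_similarity
    ]


def select_second_key_frames(
    history: Sequence[Frame], current: Frame, time_range: float
) -> List[Frame]:
    """Second video frames: earlier frames within a preset time range of the current frame."""
    return [f for f in history if 0 < current.timestamp - f.timestamp <= time_range]


def build_realtime_frame_set(
    history: Sequence[Frame], current: Frame, num_adjacent: int = 1
) -> List[Frame]:
    """Real-time set: up to num_adjacent preceding frames plus the current frame.

    Assumes history is ordered by stream position.
    """
    if num_adjacent < 1:
        raise ValueError("num_adjacent must be at least 1")
    earlier = [f for f in history if f.index < current.index]
    return earlier[-num_adjacent:] + [current]
```

In this sketch, `history` holds the frames already received and `current` is the newest frame. The three returned lists correspond to the first constraint frame set, the second key frame set, and the real-time video frame set on which the tracking step of claim 1 operates.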