Retrieving images for video scrubbing at a client device

CN122196231A · Pending · Publication Date: 2026-06-12 · AXIS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
AXIS
Filing Date
2025-12-05
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies suffer from latency and slow response during video scrubbing, making it difficult for users to quickly locate and navigate to parts of interest in the video. Furthermore, existing caching technologies fail to provide a seamless and smooth navigation experience.

Method used

User input is detected at the client device, and video images are retrieved adaptively using a precision margin and relevance scores. Cached images that meet the criteria are served first from the client cache, and the most relevant image is retrieved from the server when necessary. The precision margin is adjusted to the timeline length to optimize responsiveness and efficiency.

Benefits of technology

Improved video scrubbing responsiveness and user experience: seamless, smooth navigation, reduced latency, images that closely match user intent, and optimized cache usage.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure relates to retrieving images for video scrubbing at a client device, in particular to a method 400, software, apparatus and system for video scrubbing enabling a client device to retrieve images for scrubbing based on a user requested time along a video timeline of a video stored in a server. The client device checks S404 if a cached image meets a specified condition, which is a timestamp included within a precision margin around the requested time. The precision margin scales with the timeline length, providing a smaller margin for shorter timelines and a larger margin for longer timelines. If a relevant cached image is found, the image is retrieved S416; if not, the image with the highest relevance score within the precision margin is fetched S418 from the server and stored in memory.

Description

Technical Field

[0001] This disclosure relates to video scrubbing, and more particularly to methods, apparatus, and software for retrieving images for video scrubbing at a client device.

Background

[0002] Scrubbing is a technique that enables users (e.g., via a client device) to navigate and browse data such as video data. It allows users to explore and interact with specific points or segments within a dataset by manipulating a control element, such as a slider, along a specified timeline or axis.

[0003] Existing technologies for supporting smoother video scrubbing, such as caching, aim to reduce the need to retrieve images from servers storing the video. While these caching techniques attempt to provide a seamless experience, they fall short of allowing smooth, uninterrupted navigation because users may still experience latency when manually navigating through segments due to repetitive image retrieval. These limitations can result in a laggy and sluggish scrubbing experience. Furthermore, current technologies may not reliably allow users to quickly locate and navigate to specific portions of interest within the video.

[0004] Therefore, there is a need for improvement in this context.

Summary of the Invention

[0005] In view of the foregoing, it would be beneficial to address or at least mitigate one or more of the disadvantages discussed above, as set forth in the appended independent patent claims.

[0006] According to a first aspect of this disclosure, a method is provided for retrieving images for video scrubbing at a client device, the method comprising: detecting user input indicating a requested time along a timeline of a video stored at a server device, wherein each image of the video is associated with a corresponding relevance score; checking whether cached images satisfying each of one or more conditions are stored in the memory of the client device, wherein a first condition of the one or more conditions includes the cached images having timestamps within a precision margin of the requested time; retrieving cached images from memory once it is determined that cached images satisfying each of the one or more conditions exist in memory; and retrieving images from video from the server device once it is determined that cached images satisfying each of the one or more conditions do not exist in memory, and storing the retrieved images in memory, wherein the precision margin is proportional to the length of the timeline such that a smaller margin is used for short timelines and a larger margin is used for long timelines, and wherein retrieving images from the server device includes retrieving the image with the highest relevance score among images having timestamps within the precision margin.

[0007] As used herein, "precision margin" defines a range on the video timeline around the time requested by the user. It allows a degree of flexibility in selecting images for scrubbing: an image may not precisely match the requested timestamp but is close enough to meet the user's intent. The precision margin adjusts according to the length of the timeline: a smaller precision margin is used for shorter timelines (i.e., a shorter portion of the video currently available for scrubbing), providing a more accurate timestamp match, while a larger precision margin is used for longer timelines (i.e., a longer portion of the video currently available for scrubbing), thus widening the acceptable range of timestamps. This scaling balances accuracy and efficiency by adapting to the timeline length, providing relevant frames without requiring exact timestamp matches in the cache, which improves performance and responsiveness during scrubbing.

[0008] The timeline represents the portion of the video currently available for scrubbing. While the entire video may be long, such as 90 minutes, the timeline can be zoomed in to display only a specific segment, such as between 10 and 14 minutes. This focused view allows users to navigate within the manageable portion of the video, providing finer control and more detailed scrubbing within the selected range. Precision margins are then adjusted to the length of this zoomed-in section, ensuring that the retrieved images closely match the user's intent within the selected timeframe.

[0009] The technique described in this disclosure optimizes the video scrubbing experience by effectively utilizing the client device's local cache and selectively retrieving images from the server based on both temporal proximity and relevance. By applying an adaptive precision margin proportional to the timeline length, the method adjusts the allowable range of timestamps, using a narrower margin for shorter timelines and a wider margin for longer timelines. This method minimizes unnecessary server retrievals by prioritizing cached images when they meet all conditions. When a server retrieval is required, the method selects the image with the highest relevance score within the precision margin, facilitating the display of the most representative image within the precision margin during scrubbing. This combination of adaptive precision and relevance-based selection improves responsiveness and enhances the user experience.
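The retrieval flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Image`, `fetch_best_from_server`, and `retrieve_for_scrub` are hypothetical, the cache is modeled as a dictionary keyed by timestamp, the server round trip is replaced by an in-memory list, and only the first condition (timestamp within the precision margin) is checked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Image:
    timestamp: float  # seconds along the video
    relevance: float  # higher means more relevant


def fetch_best_from_server(video: List[Image], lo: float, hi: float) -> Optional[Image]:
    """Stand-in for the server round trip: most relevant image in [lo, hi]."""
    candidates = [img for img in video if lo <= img.timestamp <= hi]
    return max(candidates, key=lambda img: img.relevance) if candidates else None


def retrieve_for_scrub(
    requested: float,
    margin: float,
    cache: Dict[float, Image],
    video: List[Image],
) -> Tuple[Optional[Image], str]:
    """Return (image, source), where source is "cache" or "server"."""
    lo, hi = requested - margin, requested + margin
    # First condition: a cached image whose timestamp lies within the margin.
    hits = [img for img in cache.values() if lo <= img.timestamp <= hi]
    if hits:
        return min(hits, key=lambda img: img.timestamp), "cache"
    # No suitable cached image: fetch the most relevant one and store it.
    best = fetch_best_from_server(video, lo, hi)
    if best is not None:
        cache[best.timestamp] = best
    return best, "server"
```

In this sketch, a second request whose margin overlaps a previously fetched image is answered from the cache without contacting the server.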

[0010] In some examples, a second condition of the one or more conditions includes the cached image having the highest relevance score among the images of the video with timestamps within the precision margin.

[0011] In this example, if a cached image is not the most relevant within the video's adaptive precision margin, a more relevant image is retrieved from the server. This prioritization of relevance within a time constraint can improve the scrubbing experience by presenting images that best represent the video within the time segment corresponding to the precision margin, thereby reducing visual noise by not presenting less relevant images.
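A minimal sketch of checking both conditions on the client is given below. It assumes the client already holds a mapping from timestamp to relevance score for the images of the video; the function name `satisfies_conditions` and the dictionary layout are hypothetical.

```python
from typing import Dict


def satisfies_conditions(
    cached_ts: float,
    requested: float,
    margin: float,
    scores: Dict[float, float],
) -> bool:
    """scores maps each image timestamp of the video to its relevance score."""
    lo, hi = requested - margin, requested + margin
    # First condition: the cached image lies within the precision margin.
    if not (lo <= cached_ts <= hi):
        return False
    # Second condition: no image of the video in the margin scores higher.
    best_in_margin = max(s for ts, s in scores.items() if lo <= ts <= hi)
    return scores[cached_ts] == best_in_margin
```

If the function returns `False` for every cached image, the client would fall back to fetching the most relevant image from the server.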

[0012] In some examples, the method further includes: once it is determined that the memory includes currently cached images that have timestamps within the precision margin but do not have the highest relevance score among images with timestamps within the precision margin, and once it is determined that memory utilization will exceed a predefined threshold when the retrieved images are stored in memory, deleting the currently cached images from the memory.

[0013] This approach facilitates the removal of less relevant images within the current precision margin when the cache is nearing its capacity (based on a predefined threshold that could be 100% or lower, e.g., 80%), to make room for new, more relevant images. Conversely, if the memory threshold is not reached, previously cached images can be retained, allowing them to potentially be reused for adjacent precision margins where they are likely the most relevant images. This strategy optimizes cache usage by prioritizing the most relevant images while maintaining flexibility in timeline navigation.
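The eviction rule described above might look like the following sketch. Several details are assumptions made for illustration: memory utilization is approximated by the number of cached images divided by a fixed `capacity`, the cache is a dictionary from timestamp to relevance score, and the name `store_with_eviction` is hypothetical.

```python
from typing import Dict


def store_with_eviction(
    cache: Dict[float, float],
    new_ts: float,
    new_score: float,
    lo: float,
    hi: float,
    capacity: int,
    threshold: float = 0.8,
) -> Dict[float, float]:
    """Store a newly fetched image; evict less relevant in-margin images if needed."""
    # Utilization the cache would reach once the new image is stored.
    projected = (len(cache) + 1) / capacity
    if projected > threshold:
        # Remove cached images inside [lo, hi] that the new image outranks.
        stale = [ts for ts, s in cache.items() if lo <= ts <= hi and s < new_score]
        for ts in stale:
            del cache[ts]
    cache[new_ts] = new_score
    return cache
```

When the projected utilization stays at or below the threshold, the older in-margin image is kept so that it can still serve an adjacent precision margin.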

[0014] In some examples, the client device can access metadata specifying the relevance score of each image with timestamps within a precision margin, wherein checking by the client device whether a cached image satisfying one or more conditions is stored in the client device's memory includes: the client device using the metadata when checking whether a cached image satisfies a second condition. In some examples, the client device can access metadata specifying the relevance score of each image of a video.

[0015] Client devices can have access to metadata in any suitable manner. For example, metadata can be provided as a separate metadata stream. This approach is common in systems like ONVIF (a protocol for networked devices) that allow metadata to be streamed along with the main data stream (i.e., video), rather than embedding the metadata directly into the video stream. Metadata can be further provided to client devices via a custom API that implements on-demand retrieval of relevance scores for specific video frames. This API can be customized to dynamically deliver metadata as the user scrubs the timeline (i.e., corresponding to precision margins), thereby reducing data transmission to only the data required in real time and optimizing network usage. Alternatively, metadata for the entire video can be pre-buffered when the scrubbing session is initialized, enabling rapid relevance-based decision-making without additional network requests during the scrubbing operation.

[0016] In some examples, the server device can access metadata specifying the relevance score of each image with a timestamp within a precision margin. The method further includes: the client device querying the server device for the highest relevance score of the images with timestamps within the precision margin, wherein the client device checking whether cached images satisfying one or more conditions are stored in the client device's memory includes: the client device using the response of the query when checking whether cached images satisfy a second condition.

[0017] Advantageously, this example reduces the processing load on client devices because the server handles the relevance score evaluation and only provides the data needed to make caching decisions (e.g., the highest score, timestamps or indexes of the image with the highest score, etc.). This offloading allows client devices to work more efficiently, especially in resource-constrained environments. Furthermore, this example optimizes network usage by minimizing unnecessary data transfers by sending only the necessary metadata instead of the entire set of relevance scores.
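As a rough illustration of this server-assisted check, the sketch below has the server reply with only the top-scoring entry in the margin, and the client compare that reply against its cached timestamps. The function names and the shape of the reply (a dictionary with `timestamp` and `score` keys) are hypothetical; the disclosure does not specify a wire format.

```python
from typing import Dict, Optional, Set


def handle_best_score_query(
    server_scores: Dict[float, float], lo: float, hi: float
) -> Optional[dict]:
    """Server side: reply with only the top entry in [lo, hi], or None."""
    in_margin = sorted((ts, s) for ts, s in server_scores.items() if lo <= ts <= hi)
    if not in_margin:
        return None
    ts, score = max(in_margin, key=lambda pair: pair[1])
    return {"timestamp": ts, "score": score}


def cached_image_is_best(cached_timestamps: Set[float], reply: Optional[dict]) -> bool:
    """Client side: the second condition holds if the top image is already cached."""
    return reply is not None and reply["timestamp"] in cached_timestamps
```

Only one small record crosses the network per check, rather than the full set of relevance scores.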

[0018] In some examples, the metadata specifying the relevance score of an image in a video includes one or more of the following: the number of objects detected in the image; the number of object categories detected in the image; or a score indicating the relevance of the image.

[0019] By considering the number of objects, frames with more visual detail can be prioritized, thus providing the user with more information during the scrubbing process. Additionally or alternatively, using the diversity of object categories allows the system to select images representing a wider range of content. Additionally or alternatively, specific relevance scores provide flexibility, allowing for customized relevance measures that can take into account scene importance, event importance, or other context-specific factors.

[0020] In some examples, once it is determined that multiple cached images satisfying each of the one or more conditions are stored in memory, the cached image with the earliest timestamp among the multiple cached images is retrieved.

[0021] Advantageously, this example helps to prioritize the earliest relevant frame when multiple relevant images exist in the cache, which can provide a more orderly or logical flow when the user is scrubbing the video.

[0022] In some instances, retrieving images from a server device further includes: once it is determined that multiple images with timestamps within a precision margin each have the same highest relevance score, retrieving the image with the earliest timestamp from among the multiple images.

[0023] Advantageously, this example helps to prioritize the earliest relevant frame when multiple related images exist in a video, which can provide a more orderly or logical flow when users quickly browse the video.
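Both tie-breaking rules (earliest cached image, and earliest image among those sharing the highest relevance score) can be expressed with a single ordering key. The sketch below represents images as hypothetical `(timestamp, relevance)` tuples and assumes a non-empty input.

```python
from typing import Iterable, Tuple

ImageRecord = Tuple[float, float]  # (timestamp in seconds, relevance score)


def pick_earliest(images: Iterable[ImageRecord]) -> ImageRecord:
    """Among several qualifying cached images, take the earliest timestamp."""
    return min(images, key=lambda item: item[0])


def pick_most_relevant(images: Iterable[ImageRecord]) -> ImageRecord:
    """Highest relevance first; on equal relevance, the earliest timestamp wins."""
    return min(images, key=lambda item: (-item[1], item[0]))
```

Sorting on `(-relevance, timestamp)` keeps the choice deterministic regardless of the order in which candidates are listed.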

[0024] In some examples, the size of the precision margin is adjusted in response to changes in the zoom level of the timeline, since a change in zoom level changes the length of the timeline.

[0025] Adjusting the precision margin in response to changes in the timeline's zoom level allows for maintaining the accuracy and relevance of image selection. When the timeline is zoomed in, a smaller precision margin provides finer control, helping to ensure that retrieved images closely match the user-specified time. Conversely, when the timeline is zoomed out, a larger precision margin helps to capture a wider range of representative frames without filling the cache with nearly duplicated images.

[0026] In some examples, once it is determined that a cached image satisfying each of the one or more conditions exists in memory, the cached image is displayed via the user interface of the client device; and once it is determined that no cached image satisfying each of the one or more conditions exists in memory, the retrieved image is displayed via the user interface of the client device.

[0027] In this method, if a cached image meets all the conditions, the client device displays the cached image; otherwise, if no cached image meets all the conditions, the image is retrieved from the server and displayed (and stored). This setup helps minimize latency because the cached image is displayed immediately when it is available, improving responsiveness and providing a smoother scrubbing experience.

[0028] In some examples, the user input indicating the requested time along the video's timeline is the selection of a visual marker positioned along the length of the timeline, corresponding to the requested time along the video's timeline.

[0029] Using user-input visual markers to indicate the requested time on the timeline allows for precise and intuitive navigation, enabling users to quickly and accurately select specific points in a video.

[0030] According to a second aspect of this disclosure, the aforementioned objective is achieved by a non-transitory computer-readable storage medium having instructions stored thereon that, when executed on one or more devices having processing capabilities, are used to implement the method according to the first aspect.

[0031] According to a third aspect of this disclosure, the above objective is achieved by a client device providing a video scrubbing function, the client device being configured to retrieve images for video scrubbing by: detecting user input indicating a requested time along a timeline of a video stored at a server device, wherein each image of the video is associated with a corresponding relevance score; checking whether cached images satisfying each of one or more conditions are stored in the memory of the client device, wherein a first condition of one or more conditions includes the cached images having timestamps within a precision margin of the requested time; retrieving cached images from memory once it is determined that cached images satisfying each of one or more conditions exist in memory; and retrieving images from the video from the server device once it is determined that cached images satisfying each of one or more conditions do not exist in memory, and storing the retrieved images in memory, wherein the precision margin is proportional to the length of the timeline such that a smaller margin is used for short timelines and a larger margin is used for long timelines, and wherein retrieving images from the server device includes retrieving the image with the highest relevance score among images having timestamps within the precision margin.

[0032] According to the fourth aspect of this disclosure, the above objective is achieved by a system comprising a server and a client device of the third aspect, wherein the server is configured to: receive from the client device a query for an image with the highest relevance score among images having timestamps within a precision margin; and transmit the image to the client device.

[0033] The second, third, and fourth aspects may generally have the same features and advantages as the first aspect. It should be further noted that, unless otherwise expressly stated, this disclosure relates to all possible combinations of features.

Brief Description of the Drawings

[0034] The above and additional objects, features, and advantages of the present invention will be better understood through the following illustrative and non-limiting detailed description of embodiments of the present disclosure with reference to the accompanying drawings, in which the same reference numerals will be used for similar elements, wherein:

[0035] Figure 1 shows a system for video scrubbing according to an embodiment;

[0036] Figure 2 shows a cache of images for video scrubbing according to an embodiment;

[0037] Figure 3 shows a cache of images for video scrubbing according to an embodiment;

[0038] Figure 4 is a flowchart illustrating a method for retrieving images for video scrubbing at a client device.

Detailed Description

[0039] In today's digital environment, video scrubbing (the ability to quickly navigate through video content by moving along a timeline) is a valuable feature for users seeking efficient access to specific scenes or moments within a video. With a vast amount of video content available across platforms, the ability to quickly locate relevant sections is increasingly important for professional analytics, personal entertainment, and content creation. Effective scrubbing requires a smooth and responsive experience where users can scroll through representative images of a video without significant delays or irrelevant frames. However, achieving this responsiveness, especially in client-server setups where videos are stored remotely, can be challenging. The techniques described in this disclosure address these challenges by optimizing caching strategies and intelligently retrieving the most relevant images from the server, providing users with a seamless and efficient scrubbing experience that balances speed, accuracy, and contextual relevance.

[0040] Figure 1 illustrates a system 100 for video scrubbing of a video 142 stored on a server device 140, according to various embodiments. The exemplary system 100 includes a display 102 that presents content from the video 142 (via a user interface 104). The video 142 can be streamed to the display 102 using protocols such as MPEG-DASH, HLS, or similar adaptive streaming protocols that adapt to network conditions to achieve smooth transmission and playback.

[0041] Display 102 includes a timeline 110 representing video 142. The timeline 110 enables video scrubbing via user input 106, which specifies a desired time point along the timeline. For example, a user can select a specific time by positioning a visual marker 108 (e.g., a slider) along the length of the timeline 110. The position of the visual marker 108 corresponds to the requested time within video 142, allowing the user to quickly navigate to and view the frame associated with that specific point. Any other suitable method of selecting a specific time may be employed.

[0042] Users can adjust the zoom level of timeline 110 to navigate within specific sections of video 142. Zooming in on timeline 110 allows for more precise control, enabling users to scrub in shorter time intervals and pinpoint specific moments with greater accuracy. This is particularly beneficial when searching for fine details in information-dense or event-heavy sections of the video. Conversely, zooming out provides a broader view of video 142, making it easier to navigate between larger segments or quickly locate key scenes throughout video 142. This flexibility in zoom level enhances the overall user experience by adapting to varying navigation needs within the video. As used herein, the "length of the timeline" corresponds to the portion of the video currently available for scrubbing. This length can vary depending on the zoom level of the timeline: shorter lengths represent zoomed-in views focused on specific segments of the video (e.g., lengths corresponding to 5 minutes, 1 minute, etc. in video 142), while longer lengths correspond to zoomed-out views (e.g., lengths corresponding to 20 minutes, 30 minutes, full length, etc. in video 142), thus covering a wider span of the video.

[0043] System 100 also includes a client device 120 responsible for retrieving images for video scrubbing based on user input 106. Client device 120 may be, for example, a computer directly connected to display 102, or it may be connected to the display via a local area network such as Wi-Fi or Ethernet. Client device 120 detects user input 106 specifying the requested time along timeline 110 via connection 112 to display 102.

[0044] Client device 120 includes a precision margin determiner 122 configured to establish a precision margin proportional to the length of timeline 110. Specifically, a smaller margin is used for shorter timelines, allowing for finer control, while a larger margin is applied as the timeline length increases, providing a wider range around the requested time. Therefore, the length of timeline 110 directly affects the precision margin applied later when selecting relevant images for scrubbing. For example, if the requested time indicated by user input 106 is 43:20, and the timeline length is set to display a one-hour segment, the precision margin can be set, for example, to 5 seconds. With a margin of 5 seconds, system 100 retrieves images for video scrubbing within a 10-second range around 43:20 (i.e., from 43:15 (5 seconds earlier than the requested time) to 43:25 (5 seconds later than the requested time)). In another example, if the requested time indicated by user input 106 is 10:24:30 and the timeline length is set to display a 24-hour segment, the precision margin can be set to 120 seconds, allowing system 100 to retrieve images for video scrubbing within a 4-minute range (10:22:30 to 10:26:30) around the requested time. These examples illustrate possible settings, and specific precision margins can be adjusted based on system requirements, user preferences, or the desired balance between retrieval accuracy and performance.
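Both numeric examples above are consistent with a margin equal to 1/720 of the timeline length (3600 s / 720 = 5 s, and 86400 s / 720 = 120 s). The sketch below uses that ratio purely for illustration; the disclosure does not prescribe a particular constant, and the names `MARGIN_DIVISOR`, `precision_margin`, `margin_window`, and `hms` are hypothetical.

```python
from typing import Tuple

# Assumed constant of proportionality, chosen to match the two examples.
MARGIN_DIVISOR = 720


def precision_margin(timeline_length_s: float) -> float:
    """Margin in seconds, proportional to the visible timeline length."""
    return timeline_length_s / MARGIN_DIVISOR


def margin_window(requested_s: float, timeline_length_s: float) -> Tuple[float, float]:
    """Closed interval [requested - margin, requested + margin] in seconds."""
    margin = precision_margin(timeline_length_s)
    return requested_s - margin, requested_s + margin


def hms(seconds: float) -> str:
    """Format a number of seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# One-hour timeline, requested time 43:20 (2600 s): window 43:15 to 43:25.
one_hour = margin_window(43 * 60 + 20, 3600)
# 24-hour timeline, requested time 10:24:30: window 10:22:30 to 10:26:30.
one_day = margin_window(10 * 3600 + 24 * 60 + 30, 24 * 3600)
```

Because the margin is recomputed from the visible timeline length, zooming the timeline in or out changes the window without any other change to the lookup.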

[0045] Client device 120 includes an image determiner 124 configured to check whether a cached image that satisfies each of one or more specified conditions is stored in the memory (cache) 126 of client device 120 and is therefore available for scrubbing. A first condition requires that the cached image has a timestamp within the precision margin of the requested time. In some examples, a second condition of the one or more conditions includes that the cached image has the highest relevance score among images of the video with timestamps within the precision margin. These conditions are described further below with reference to Figure 2 and Figure 3.

[0046] When the image determiner 124 identifies a cached image in memory 126 that meets each of the specified conditions, it retrieves the cached image from memory and sends it (via connection 112) to display 102 for display as a scrubbing image via user interface 104. This process utilizes cached images to provide a fast response, thereby improving the scrubbing experience by minimizing latency.

[0047] Once the image determiner 124 determines that no cached image satisfying each of the one or more conditions is present in the memory 126, the image determiner 124 is configured to instead retrieve the image from the video 142 on the server device 140 and store the retrieved image in the memory 126. The client device 120 is typically connected to the server device 140 via a network connection 132 through the Internet 134, which can be implemented using HTTP / HTTPS protocols for data transmission. The server device 140 may be, for example, a network camera.

[0048] System 100 is configured to retrieve images from server device 140, the image retrieval including retrieving the image with the highest relevance score from images having timestamps within a precision margin.

[0049] Server device 140 includes an image retrieval unit 146 configured to receive a query from client device 120 via network connection 132. This query requests an image with the highest relevance score among images with timestamps within a specified precision margin. Upon receiving the request, image retrieval unit 146 locates the relevant image from video 142 stored on server device 140 and transmits it back to client device 120 via network connection 132 (e.g., back to image determiner 124). When an image is received from server device 140, image determiner 124 is configured to store the retrieved image in memory 126. The retrieved image can then be sent (via connection 112) to display on display 102 as a scrubbing image via user interface 104, thereby delivering a context-sensitive visual response to the user based on user input 106.

[0050] The data required to determine the relevance score of an image can reside in client device 120 and / or server device 140. In one example, client device 120 can access metadata 128 that specifies the relevance score of each image with a timestamp within the precision margin (or metadata indicating the relevance score of each image in video 142). Such metadata 128 can be used when the one or more conditions include the second condition, namely that a cached image has the highest relevance score among images of the video with timestamps within the precision margin. By comparing the relevance scores in metadata 128, client device 120 can determine whether a locally cached image meets the specified conditions without requesting additional information from server device 140. This setup simplifies the checking process, enabling fast access to relevant images based on the pre-stored scores in metadata 128.

[0051] In some examples, server device 140 maintains metadata 144 specifying the relevance score of each image within a specified precision margin (or of each image of the entire video). When a condition requires that a cached image must have the highest relevance score among the images in that range (the second condition), client device 120 can verify this by querying server device 140. Specifically, client device 120 sends a request to server device 140 to obtain the highest relevance score, the index of the image with that score, or another identifier representing the most relevant image within the precision margin.

[0052] The server can therefore respond with a relevance score, an index, or any other identifier uniquely associated with the image that has the highest relevance score. The client device 120 can then use this information to check whether the cached image meets the second condition.

[0053] Various embodiments of the metadata 128, 144 used to store relevance scores can affect how client device 120 retrieves an image with the highest relevance score within a precision margin from server device 140. For example, if metadata 128 is stored locally on client device 120, client device 120 can use that data 128 to identify an image with the highest relevance score within a precision margin and include only the image's identifier (such as its index or timestamp) in the query sent to server device 140.

[0054] Alternatively, if metadata 144 is stored on server device 140, queries from client device 120 can specify only the precision margin without any specific image identifier. Server device 140 can then use metadata 144 to identify the image with the highest relevance score within the specified margin and transmit that image directly back to client device 120. This approach leverages server data resources and offloads relevance calculations from the client, thus simplifying the client-side process.

[0055] In some embodiments, the metadata 128, 144 specifying the relevance score for each image within a precision margin includes various measures that evaluate the relevance of each image using unique identifiers such as indexes or timestamps, which associate each image with its corresponding data. The metadata 128, 144 may combine several types of measures that contribute to the relevance score.

[0056] One example metric is the number of objects detected in each image, where a higher object count can indicate greater visual complexity or importance, suggesting that the image is more information-rich. Another example metric is the number of object categories detected, which counts the distinct categories present in the image (e.g., people, vehicles, animals). Images with a wider range of object categories can be prioritized because they capture more diverse content and provide richer snapshots of video footage. In addition to these metrics, a general relevance score can be assigned to each image, for example, through a custom algorithm tailored to specific application needs. This score can combine one or more factors, such as the metrics mentioned above, motion intensity (to highlight frames with significant motion), or event detection (where images characterized by detected events or actions are labeled as highly relevant).
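One possible way to combine these metrics into a single score is a weighted sum, sketched below. The weights, the `FrameMetadata` structure, and the `relevance` function are hypothetical and chosen only for illustration; the disclosure leaves the scoring algorithm open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameMetadata:
    timestamp: float
    object_labels: List[str]      # one label per detected object
    event_detected: bool = False  # e.g., flagged by an analytics module


def relevance(
    meta: FrameMetadata,
    w_objects: float = 1.0,
    w_categories: float = 2.0,
    w_event: float = 5.0,
) -> float:
    """Weighted sum of object count, distinct categories, and an event flag."""
    n_objects = len(meta.object_labels)
    n_categories = len(set(meta.object_labels))
    event_term = w_event if meta.event_detected else 0.0
    return w_objects * n_objects + w_categories * n_categories + event_term
```

With these assumed weights, a frame showing two people and one vehicle scores 3 × 1.0 + 2 × 2.0 = 7.0, and the same frame with a detected event scores 12.0.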

[0057] These metadata elements 128 and 144 together allow the system to evaluate and compare images within a precision margin, so as to select meaningful and representative frames during scrubbing.

[0058] The functional division for retrieving images for video scrubbing at the client device, as illustrated in Figure 1, is provided for descriptive purposes only. Components such as the image determiner 124, the precision margin determiner 122, and the image retrieval unit 146 are shown as separate entities to clearly convey the roles and processes involved in retrieving images for video scrubbing. However, it should be understood that the techniques discussed herein can be implemented in various ways, and the specific organization of components can vary depending on system architecture and design preferences. For example, certain functions may be combined into a single module, distributed across multiple systems, or implemented using alternative methods to achieve the same objective. Therefore, the structure described with reference to Figure 1 is not intended to be restrictive, and any configuration that can be used to retrieve images for video scrubbing at the client device 120 falls within the scope of this disclosure.

[0059] Figure 2 illustrates, by way of example, the cache 126 of the client device 120. The cache includes three cached images 202a to 202c. Each cached image 202a to 202c is associated with a respective timestamp 204a to 204c. Optionally, each cached image 202a to 202c may be further associated with a respective relevance score 206a to 206c.

[0060] In some examples, the one or more conditions include only the first condition, namely that the cached image has a timestamp within the precision margin of the requested time. In these examples, if the user specifies a requested time of 33 seconds and the precision margin is set to 3 seconds (so that the image used for video scrubbing is retrieved from a 6-second range around 33 seconds, i.e., 30 to 36 seconds), then the first cached image 202a satisfies the condition and can be retrieved from cache 126.

[0061] If the user specifies a requested time of 10 seconds and the precision margin is set to, for example, 2 seconds, then cached images 202b and 202c satisfy this condition. In this case, once it is determined that multiple cached images 202b and 202c satisfying each of the one or more conditions are stored in memory 126, the cached image 202c, which has the earliest timestamp 204c among them, is retrieved and used for scrubbing.
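As a minimal sketch of the first condition and the earliest-timestamp rule described above, the cache lookup could look as follows. The timestamps assigned to images 202a to 202c are hypothetical values chosen only to be consistent with the worked examples; they are not taken from Figure 2. Treating the margin boundaries as inclusive is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CachedImage:
    label: str
    timestamp: float  # seconds along the video timeline


def within_margin(timestamp: float, requested_time: float, margin: float) -> bool:
    """First condition: timestamp lies within the precision margin (inclusive, assumed)."""
    return abs(timestamp - requested_time) <= margin


def pick_cached(cache: Sequence[CachedImage],
                requested_time: float,
                margin: float) -> Optional[CachedImage]:
    """Return the earliest cached image satisfying the first condition, or None."""
    candidates = [img for img in cache
                  if within_margin(img.timestamp, requested_time, margin)]
    if not candidates:
        return None  # the image would have to be retrieved from the server
    return min(candidates, key=lambda img: img.timestamp)


# Hypothetical timestamps, chosen to match the examples in the text.
example_cache = [
    CachedImage("202a", 31.0),
    CachedImage("202b", 12.0),
    CachedImage("202c", 9.0),
]
```

Under these assumed timestamps, a request at 33 seconds with a 3-second margin yields 202a, a request at 10 seconds with a 2-second margin yields 202c, and a request at 17 seconds with a 3-second margin yields no cached image.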

[0062] If the user specifies a requested time of 17 seconds and the precision margin is set to, for example, 3 seconds, then none of the cached images 202a to 202c meets the first condition, since none has a timestamp within the 14 to 20 second time span of the video. In this case, as discussed previously, a new image 210 is retrieved from the server that meets the first condition and also has the highest relevance score within the precision margin. This image 210 is stored in cache 126 for possible later use. If, when storing the new image 210, cache 126 is full or about to be full (i.e., storing the not-yet-stored image would cause memory utilization to exceed a predefined threshold, as shown in Figure 2), the newly retrieved image 210 can replace one of the previously cached images 202a to 202c. For example, it can replace image 202b, which has the timestamp 204b closest to the precision margin, or image 202c, which has the lowest relevance score 206c among the cached images. This replacement strategy can help optimize cache 126 by retaining images that are substantially different in time from the precision margin, or images with the highest relevance among the cached images, thereby increasing the likelihood that future scrubbing requests can be efficiently satisfied.
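The two replacement strategies mentioned above can be sketched as follows. This is an illustration under stated assumptions: memory utilization is modeled simply as a maximum number of cached images, the strategy names are hypothetical, and the timestamps and relevance scores are invented values consistent with the worked examples rather than values from Figure 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CachedImage:
    label: str
    timestamp: float  # seconds
    relevance: float


def distance_to_margin(img: CachedImage, requested_time: float, margin: float) -> float:
    """Seconds between the timestamp and the nearest edge of the margin (0 if inside)."""
    return max(0.0, abs(img.timestamp - requested_time) - margin)


def store_with_eviction(cache: List[CachedImage],
                        new_image: CachedImage,
                        requested_time: float,
                        margin: float,
                        max_images: int,
                        strategy: str = "closest_to_margin") -> Optional[CachedImage]:
    """Store new_image, evicting one image first if the threshold would be exceeded.

    Returns the evicted image, or None if nothing was evicted.
    """
    evicted = None
    if cache and len(cache) + 1 > max_images:
        if strategy == "closest_to_margin":
            evicted = min(cache, key=lambda i: distance_to_margin(i, requested_time, margin))
        elif strategy == "lowest_relevance":
            evicted = min(cache, key=lambda i: i.relevance)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        cache.remove(evicted)
    cache.append(new_image)
    return evicted
```

With hypothetical cached images at 31, 12, and 9 seconds and a request at 17 seconds with a 3-second margin, the first strategy evicts the image at 12 seconds, while the second evicts whichever cached image has the lowest relevance score.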

[0063] In some examples, the one or more conditions further include a second condition: that the cached image has the highest relevance score among images in the video with timestamps within the precision margin. For example, if the user specifies a requested time of 16 seconds and the precision margin is set to, for example, 4 seconds, then the second cached image 202b satisfies the first condition, i.e., it has a timestamp 204b within the allowed time span of 12 to 20 seconds. However, it can be determined that the cached image 202b does not satisfy the second condition. In this case, as discussed previously, a new image 210 that satisfies both conditions is retrieved from the server. This image 210 is stored in cache 126 for possible later use. If, when storing the new image 210, cache 126 is full or about to be full (i.e., storing the not-yet-cached image would cause memory utilization to exceed a predefined threshold, as shown in Figure 2), then cached image 202b, which is in the correct time span but is not the most relevant image in that time span, can be deleted and replaced by the newly retrieved image 210. Alternatively, any other strategy described above can be used.

[0064] Figure 3 illustrates, by way of example, the cache 126 of client device 120. The cache includes three cached images 202e to 202g. Each cached image 202e to 202g is associated with a respective timestamp 204e to 204g. As discussed above, in some examples, the one or more conditions further include a second condition, namely, that the cached image has the highest relevance score among the images in the video with timestamps within the precision margin. In the example of Figure 3, each cached image 202e to 202g is therefore associated with a respective relevance score 206e to 206g. For example, if the user specifies a requested time of 11 seconds and the precision margin is set to, for example, 2 seconds, then both images 202f and 202g satisfy the first condition. Furthermore, using the metadata at client device 120 and/or server device 140 discussed earlier, it can be determined that the highest relevance score within the allowed time span of 9 to 13 seconds is 11. Therefore, both images 202f and 202g also satisfy the second condition. In this case, the client device can be configured to retrieve the cached image 202g, which has the earliest timestamp 204g among the multiple cached images 202f and 202g that satisfy both conditions.

[0065] The example described in conjunction with Figure 3 can also be used on the server side for retrieving images with the highest relevance score within a precision margin. In other words, a server device (e.g., image retrieval unit 146) can be configured to retrieve the image with the earliest timestamp from a plurality of images once it is determined that each of the multiple images with timestamps within the precision margin has the same highest relevance score.
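The selection rule described above (highest relevance score within the precision margin, with ties resolved in favor of the earliest timestamp) can be sketched as follows. The tuple representation and the function name are assumptions for illustration, and the sample data are invented.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of a video image: (timestamp in seconds, relevance score).
Frame = Tuple[float, float]


def best_frame(frames: Sequence[Frame],
               requested_time: float,
               margin: float) -> Optional[Frame]:
    """Pick the most relevant frame within the margin; earliest timestamp wins ties."""
    in_margin = [f for f in frames if abs(f[0] - requested_time) <= margin]
    if not in_margin:
        return None
    # Sort key: highest relevance first (negated), then earliest timestamp.
    return min(in_margin, key=lambda f: (-f[1], f[0]))


# Invented sample data: two frames inside the 9 to 13 second span share the top score 11.
sample_frames = [(8.0, 15.0), (9.5, 11.0), (10.0, 7.0), (12.5, 11.0), (14.0, 20.0)]
```

For a requested time of 11 seconds and a margin of 2 seconds, the frames at 9.5 and 12.5 seconds both have the highest score within the span, and the sketch returns the one at 9.5 seconds.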

[0066] In some examples, if a user navigates to a new segment on the timeline (e.g., a segment that does not overlap with the currently viewed segment, or a segment that differs from it by more than a threshold), cache 126 (e.g., as shown in Figure 2 and Figure 3) can be cleared. This optimizes memory usage and ensures relevance to the newly accessed part of the timeline. This approach is useful when cached images no longer correspond to the current precision margin set by the new segment, as these images are unlikely to be useful for scrubbing within the new timeline.
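A minimal sketch of this cache-clearing behavior is given below, covering only the non-overlap case. Representing a timeline segment as a (start, end) pair in seconds and the cache as a dictionary keyed by timestamp are assumptions made for the illustration.

```python
from typing import Dict, Tuple

Segment = Tuple[float, float]  # hypothetical (start, end) of a timeline segment, in seconds


def segments_overlap(a: Segment, b: Segment) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def maybe_clear_cache(cache: Dict[float, bytes],
                      current: Segment,
                      new: Segment) -> bool:
    """Clear the cache when the new segment does not overlap the current one.

    Returns True if the cache was cleared.
    """
    if not segments_overlap(current, new):
        cache.clear()
        return True
    return False
```

Navigating from a segment covering 0 to 60 seconds to one covering 30 to 90 seconds keeps the cache, whereas jumping to 300 to 360 seconds clears it.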

[0067] Figure 4 shows a flowchart of a method 400 for retrieving images for video scrubbing at a client device.

[0068] Method 400 includes detecting S402, by the client device, user input indicating a requested time along the timeline of the video. The video is stored on a server device, and each frame of the video is associated with a corresponding relevance score.

[0069] Method 400 further includes checking S404, by the client device, whether a cached image satisfying one or more conditions is stored in the memory of the client device. Checking S404 includes checking S406 whether the cached image satisfies a first condition. The first condition includes that the cached image has a timestamp within a precision margin of the requested time. In some examples, checking S404 includes checking S408 whether the cached image satisfies a second condition. The second condition includes that the cached image has the highest relevance score among images in the video with timestamps within the precision margin.

[0070] Method 400 further includes determining S410 whether a cached image satisfying each of the one or more conditions exists in memory. If it is determined S412 that a cached image satisfying each of the one or more conditions exists in memory, method 400 includes retrieving S416 the cached image from memory. If it is determined S414 that no cached image satisfying each of the one or more conditions exists in memory, method 400 includes retrieving S418, by the client device, an image of the video from the server device and storing the retrieved image in memory.
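The flow of method 400 can be illustrated with the following sketch, which applies only the first condition. The `FakeServer` class, the function names, and the proportionality factor of 0.05 between timeline length and precision margin are assumptions introduced for this example; they are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Image:
    timestamp: float  # seconds
    relevance: float


class FakeServer:
    """Hypothetical stand-in for the server device holding the video."""

    def __init__(self, images: Sequence[Image]) -> None:
        self.images = list(images)
        self.requests = 0  # counts server round trips, for illustration

    def fetch_best(self, requested_time: float, margin: float) -> Optional[Image]:
        self.requests += 1
        in_margin = [i for i in self.images
                     if abs(i.timestamp - requested_time) <= margin]
        if not in_margin:
            return None
        # Highest relevance first; earliest timestamp breaks ties.
        return min(in_margin, key=lambda i: (-i.relevance, i.timestamp))


def precision_margin(timeline_length: float, fraction: float = 0.05) -> float:
    """Margin proportional to timeline length (the factor 0.05 is assumed)."""
    return timeline_length * fraction


def retrieve_for_scrubbing(requested_time: float,
                           timeline_length: float,
                           cache: List[Image],
                           server: FakeServer) -> Optional[Image]:
    margin = precision_margin(timeline_length)
    # S404/S406: look for cached images whose timestamp is within the margin.
    hits = [img for img in cache if abs(img.timestamp - requested_time) <= margin]
    if hits:
        # S412 -> S416: a suitable cached image exists; use the earliest one.
        return min(hits, key=lambda i: i.timestamp)
    # S414 -> S418: fetch the most relevant image from the server and cache it.
    image = server.fetch_best(requested_time, margin)
    if image is not None:
        cache.append(image)
    return image
```

In this sketch, a first request near a given time triggers one server round trip and populates the cache, and a second request whose margin still covers the cached timestamp is served from the cache without contacting the server.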

[0071] In various instances, the methods (e.g., method 400) and functions described in this document can be implemented using a non-transitory computer-readable storage medium containing instructions that, when executed by one or more processing devices, perform the methods and functions. The storage medium may include, for example, flash memory, a solid-state drive, a hard disk drive, or other types of memory capable of holding program instructions. The execution of these instructions can be performed by various types of processors, including general-purpose processors (e.g., processors found in standard desktop and laptop computers) and dedicated microprocessors designed for specific tasks. These processors can operate as independent processing units or as part of a multi-core or multi-processor system that can improve processing efficiency by distributing tasks among multiple cores. Processors can be supplemented by or incorporated into an ASIC (Application-Specific Integrated Circuit).

[0072] The above embodiments should be understood as illustrative examples of this disclosure. Further embodiments of this disclosure are contemplated. For example, additional example metrics may be used to determine relevance scores. These additional metrics may include highlighting frames with a wider palette to potentially indicate color diversity in visually different scenes, or may include face detection counts that may prioritize images with identifiable human features in applications where human presence is important. Additional metrics may include “action recognition” and / or links to other meta-knowledge (e.g., traffic lights changing from red to green or from green to red).

[0073] As another example, the relevance score can be supplemented by a measure of the quality of the cropping of objects detected in the video. Therefore, a third condition for selecting images to retrieve and display could be the existence of a better crop than the one already available in the cache. If, for example, an image frame with a relevance score of 10 and a timestamp within the precision margin is already available in the cache, and the user scrubs back to the same point on the timeline, the third condition could be satisfied by another image frame with a slightly lower relevance score but containing a new, optimal crop that is not present in the already cached image frame.

[0074] As yet another example, the precision margin can be made even more dynamic. If a user scrubs back and forth across a section of the timeline, this can indicate that the user is particularly interested in that specific part of the video. Therefore, the precision margin can be refined so that it narrows as the user shows this interest. This functionality can be represented in different ways in a graphical user interface. One way would be to stretch the timeline and use heterogeneous distances between the bars on the timeline. If each bar initially represents 5 minutes (i.e., 5 minutes | 5 minutes | 5 minutes | 5 minutes | 5 minutes | 5 minutes), this could be changed so that a bar in the center of the time interval of interest represents only 30 seconds (e.g., 5 minutes | 5 minutes | 1 minute | 1 minute | 30 seconds | 1 minute | 5 minutes | 5 minutes). Another way would be to add a second “pop-up” timeline for interesting time intervals. Using the same example numbers as the heterogeneous timeline, the first timeline could have bars each representing 5 minutes, and the second, pop-up timeline could have bars each representing 1 minute. Then, a third pop-up timeline could have bars each representing 30 seconds.
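One possible way to narrow the precision margin as the user scrubs back and forth is sketched below. The heuristic of counting direction reversals in recent scrub positions, the halving factor, and the lower bound are all assumptions for illustration; the disclosure does not prescribe how the refinement is computed.

```python
from typing import Sequence


def count_reversals(positions: Sequence[float]) -> int:
    """Count how often the scrubbing direction changes in a series of positions."""
    reversals = 0
    prev_direction = 0
    for earlier, later in zip(positions, positions[1:]):
        delta = later - earlier
        if delta == 0:
            continue  # no movement, direction unchanged
        direction = 1 if delta > 0 else -1
        if prev_direction and direction != prev_direction:
            reversals += 1
        prev_direction = direction
    return reversals


def refined_margin(base_margin: float,
                   positions: Sequence[float],
                   shrink: float = 0.5,
                   min_margin: float = 0.5) -> float:
    """Shrink the margin once per reversal, but never below min_margin (assumed rule)."""
    return max(min_margin, base_margin * shrink ** count_reversals(positions))
```

For instance, steady forward scrubbing leaves an 8-second margin unchanged, while three reversals within a section reduce it to 1 second under these assumed parameters.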

[0075] The proposed method can be further improved by incorporating the concept of dynamic resolution when retrieving images for video scrubbing from a server device. Dynamic resolution can adjust the quality of the retrieved images based on factors such as relevance score levels or user interaction patterns. This can be achieved in several ways. For example, scalable video coding techniques can be used to deliver images at varying resolutions depending on the user's current needs. For instance, during rapid scrubbing, lower-resolution images can be provided to prioritize speed, while higher-resolution images can be delivered when scrubbing is slow or stops at specific timestamps. Alternatively, the server device can store images at multiple resolutions, allowing clients to request images at the resolution best suited to the current situation.

[0076] The resolution transmitted may depend on factors such as display size, available bandwidth, or relevance score. For example, the choice of resolution can be constrained by the relevance score. In one example, if all images within a defined precision margin have low relevance scores, the server device may send images with lower resolutions because the content of the images is unlikely to provide important value or detail to the user.
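The resolution choice described above could, for example, be sketched as a simple rule. The two resolution labels, the scrubbing-speed unit (timeline seconds traversed per wall-clock second), and both thresholds are hypothetical values chosen for illustration only.

```python
def choose_resolution(scrub_speed: float,
                      best_relevance: float,
                      fast_speed: float = 10.0,
                      low_relevance: float = 3.0) -> str:
    """Return "low" or "high" for the image to be delivered (thresholds are assumed).

    scrub_speed: timeline seconds traversed per wall-clock second.
    best_relevance: highest relevance score within the precision margin.
    """
    if scrub_speed >= fast_speed:
        return "low"   # rapid scrubbing: prioritize speed over detail
    if best_relevance <= low_relevance:
        return "low"   # low-relevance content is unlikely to need detail
    return "high"      # slow or stopped scrubbing on relevant content
```

A real system could extend the rule with further factors named in the text, such as display size or available bandwidth.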

[0077] It should be understood that any feature described with respect to any embodiment may be used alone or in combination with other described features, and may also be used in combination with one or more features of any other embodiment or any combination of any other embodiment. Furthermore, equivalents and modifications not described above may be employed without departing from the scope of this disclosure as defined by the appended claims.

Claims

1. A method (400) for retrieving images for video scrubbing at a client device (120), the method comprising: detecting (S402), by the client device, user input (106) indicating a requested time along a timeline (110) of a video (142) stored at a server device (140), wherein each image of the video is associated with a corresponding relevance score; checking (S404), by the client device, whether a cached image (202) that satisfies one or more conditions is stored in a memory (126) of the client device, wherein a first condition of the one or more conditions includes the cached image having a timestamp (204) within a precision margin of the requested time; once it is determined (S412) that a cached image that satisfies each of the one or more conditions exists in the memory, retrieving (S416) the cached image from the memory; and once it is determined (S414) that no cached image satisfying each of the one or more conditions exists in the memory, retrieving (S418), by the client device, an image of the video from the server device and storing the retrieved image in the memory; wherein the precision margin is defined as a range around the requested time along the timeline of the video, wherein the precision margin is proportional to the length of the timeline, such that a smaller margin is used for short timelines and a larger margin is used for long timelines, wherein the length of the timeline defines the length of the video available for video scrubbing, such that shorter timelines define a shorter length of the video available for video scrubbing and longer timelines define a longer length of the video available for video scrubbing; and wherein retrieving the image from the server device includes retrieving the image with the highest relevance score among the images having timestamps within the precision margin.

2. The method according to claim 1, wherein a second condition of the one or more conditions includes the cached image having the highest relevance score (206) among the images in the video with timestamps within the precision margin.

3. The method according to claim 2, further comprising: once it is determined that the memory includes a currently cached image that has a timestamp within the precision margin but does not have the highest relevance score among the images with timestamps within the precision margin, and once it is determined that memory utilization will exceed a predefined threshold when the retrieved image is stored in the memory, deleting the currently cached image from the memory.

4. The method according to claim 2, wherein the client device accesses metadata (128) specifying the relevance score of each image having a timestamp within the precision margin, and wherein checking by the client device whether a cached image satisfying the one or more conditions is stored in the memory of the client device includes: the client device using the metadata when checking whether the cached image satisfies the second condition.

5. The method according to claim 4, wherein the client device accesses metadata specifying the relevance score for each image of the video.

6. The method according to claim 2, wherein the server device accesses metadata (144) specifying the relevance score for each image having a timestamp within the precision margin, and the method further includes: querying, by the client device, the server device for the highest relevance score among the images with timestamps within the precision margin; wherein checking by the client device whether a cached image satisfying the one or more conditions is stored in the memory of the client device includes: the client device using the response to the query when checking whether the cached image satisfies the second condition.

7. The method according to claim 4, wherein the metadata specifying the relevance score of an image in the video includes one or more of the following: the number of objects detected in the image; the number of object categories detected in the image; and a score indicating the relevance of the image.

8. The method of claim 2, further comprising: once it is determined that multiple cached images satisfying each of the one or more conditions are stored in the memory, retrieving, from among the multiple cached images, the cached image with the earliest timestamp.

9. The method according to claim 1, wherein retrieving images from the server device further includes: once it is determined that multiple images with timestamps within the precision margin each have the same highest relevance score, retrieving, from among the multiple images, the image with the earliest timestamp.

10. The method according to claim 1, wherein the magnitude of the precision margin is adjusted in response to changes in the scaling level of the timeline as the length of the timeline changes.

11. The method of claim 1, further comprising: once it is determined that the cached image that satisfies each of the one or more conditions exists in the memory, displaying the cached image via the user interface (104) of the client device; and once it is determined that the cached image that satisfies each of the one or more conditions does not exist in the memory, displaying the retrieved image via the user interface of the client device.

12. The method according to claim 1, wherein the user input indicating the requested time along the timeline of the video is the selection of a visual marker (108) located along the length of the timeline at a position corresponding to the requested time along the timeline of the video.

13. A non-transitory computer-readable storage medium having instructions stored thereon, which, when executed on one or more means having processing capabilities, are used to perform the method according to any one of claims 1 to 11.

14. A client device (120) providing video scrubbing functionality, the client device being configured to retrieve an image for the video scrubbing via the following steps: detecting (S402) user input (106) indicating a requested time along a timeline (110) of a video (142) stored at a server device (140), wherein each image in the video is associated with a corresponding relevance score; checking (S404) whether a cached image (202) that satisfies one or more conditions is stored in a memory (126) of the client device, wherein a first condition of the one or more conditions includes the cached image having a timestamp (204) within a precision margin of the requested time; once it is determined (S412) that a cached image that satisfies each of the one or more conditions exists in the memory, retrieving (S416) the cached image from the memory; and once it is determined (S414) that no cached image satisfying each of the one or more conditions exists in the memory, retrieving (S418) an image of the video from the server device and storing the retrieved image in the memory; wherein the precision margin is defined as a range around the requested time along the timeline of the video, wherein the precision margin is proportional to the length of the timeline, such that a smaller margin is used for short timelines and a larger margin is used for long timelines, wherein the length of the timeline defines the length of the video available for video scrubbing, such that shorter timelines define a shorter length of the video available for video scrubbing and longer timelines define a longer length of the video available for video scrubbing; and wherein retrieving the image from the server device includes retrieving the image with the highest relevance score among the images having timestamps within the precision margin.

15. A system comprising a server (140) and a client device according to claim 14, wherein the server is configured to: receive from the client device a query for the image with the highest relevance score among the images having timestamps within the precision margin; and transmit the image to the client device.