Providing relevant video scenes in response to a video search query

CN116881501B · Active · Publication Date: 2026-06-26 · ADOBE INC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ADOBE INC
Filing Date: 2017-06-06
Publication Date: 2026-06-26

IPC: G06F16/738; G06F16/735; G06F16/783

CPC: G06F16/735; G06F16/739; G06F16/7847; G06F16/7837; G06F16/784; G06F16/73; G06F16/738

AI Technical Summary

Technical Problem

Conventional media hosting systems often return results that do not indicate specific content when searching for videos, and the preview images are irrelevant to the user's search query, causing users to spend time searching for relevant content.

Method used

By identifying content features within keyframes of video content, machine learning techniques are used to generate preview images most relevant to search queries. Multiple preview images are provided to represent specific features of the video content, and keyframes are selected as preview images based on confidence values.

Benefits of technology

The approach improves the user experience, saves users time and effort, reduces downloads of irrelevant content, and optimizes search query processing and computing resource usage.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present application relate to providing relevant video scenes in response to video search queries. The present disclosure relates to methods and systems for providing relevant video scenes in response to video search queries. The systems and methods identify a plurality of keyframes of a media object and detect one or more content features represented in the plurality of keyframes. Based on the one or more detected content features, the systems and methods associate a tag indicative of the detected content features with the plurality of keyframes of the media object. In response to receiving a search query comprising a search term, the systems and methods compare the search term to the tags of the selected keyframes, identify a selected keyframe that depicts at least one content feature related to the search term, and provide a preview image of a media item that depicts the at least one content feature.

Description

[0001] Divisional Application Instructions

[0002] This application is a divisional application of the Chinese invention patent application filed on June 6, 2017, with application number 201710417832.7 and entitled "Providing relevant video scenes in response to video search queries".

Technical Field

[0003] Various embodiments of this application relate to providing relevant video scenes in response to video search queries.

Background

[0004] The development of communication technology has led to significant advancements in media hosting technology, particularly enabling users to freely upload content to, search for, and download content from media hosting systems. For example, a user can use a search engine to search for videos hosted by a media hosting system. Based on the search query, many conventional media hosting systems search video titles or categories associated with the videos to identify one or more search results. In response to processing a user's search, the media hosting system can return a list of search results that the user might be interested in (e.g., a list of video titles with links to the identified videos). The user can then select a result (e.g., select a link) to access the video.

[0005] Despite advancements in media hosting technology, conventional media hosting systems suffer from several problems. One issue with many conventional media hosting systems is that when a user searches for media objects (e.g., videos) related to specific content (e.g., topics within a video), the results returned by the system often fail to indicate the content within that specific result that might be relevant to the user's search. For example, as discussed above, returning a list of video titles often does not indicate much information about the specific content within the video.

[0006] Furthermore, some conventional media hosting systems return an image representing the identified video in response to a search, but this image is almost always irrelevant to the user's search for specific content. For example, many conventional media hosting systems assign a single frame image from the video to represent it. Some systems assign the first frame of the video (e.g., as a thumbnail) to represent the video's content, while others allow the user to manually select a frame from the video. As a result, the selected frame image rarely shows or indicates specific content features relevant to the user-initiated search query.

[0007] Therefore, conventional media hosting systems often return results that seem irrelevant to the user's search because frame images do not allow the user to easily discern whether a video is related to the search query. In light of this, conventional media hosting systems often cause users to skip (e.g., not select, consider, or view) videos relevant to their search query, as the preview images appear unrelated. Consequently, most conventional media hosting systems provide an inefficient and time-consuming search process. For example, due to the aforementioned problems with conventional media hosting systems, users often have to spend a significant amount of time performing multiple searches and viewing most of the resulting videos in order to finally find the video containing the content they are looking for.

[0008] Accordingly, these and other disadvantages exist with respect to conventional media hosting systems.

Summary of the Invention

[0009] The various embodiments described below provide benefits and / or solve one or more of the aforementioned or other problems in the art with systems and methods for identifying and providing relevant preview images of video content to a user in response to a search query. For example, the systems and methods disclosed herein identify potential preview images (e.g., video scenes, poster frames, etc.) for media objects (e.g., videos) that include specific content features (e.g., items depicted within a video frame). Based on a received search query, the systems and methods select, from the potential preview images, the specific preview images that are most relevant to the search query. Furthermore, the systems and methods provide the selected preview images to the user (i.e., to the user via a client device), allowing the user to easily view images of the portions of the media object most relevant to the search query.

[0010] In one or more embodiments, the system and method further determine a confidence value for the probability that a specific keyframe indicating video content includes a specific content feature (e.g., a depiction of a dog). Based on the confidence value determined for each identified content feature within each keyframe, the system and method can identify which keyframe is most likely to include the specific content feature. Therefore, the system and method can sort, identify, or otherwise organize one or more keyframes corresponding to a specific content feature based on the determined confidence values. Thus, based on the confidence values determined for the content features within keyframes, the system and method can select the keyframe most relevant to the search query as a preview image of the video content. For example, upon receiving a search query for "dog," the system and method can identify a specific keyframe depicting a dog and provide an image of the keyframe as a preview image in response to the search query.

[0011] Furthermore, as briefly stated above, the system and method generate a set of potential preview images that may potentially be used as preview images to represent specific content features found within the video content. For example, the set of potential preview images is based on images of keyframes (e.g., thumbnails of keyframes). Additionally, the system and method generate and assign tags to each potential preview image indicating the content features depicted in each potential preview image. Based on a received search query, the system and method can search the tags of the potential preview images to identify media objects relevant to the search query, and can also search the tags to identify preview images from the set of potential preview images that most closely match the search query. Therefore, because the system and method provide relevant portions of one or more media objects related to the search request as preview images, users can easily and efficiently identify video content of interest without having to manually view the video to locate the relevant content.

[0012] Additional features and advantages of the embodiments will be set forth in the following description, and will be apparent in part from the description, or may be learned by practice of such exemplary embodiments. The features and advantages of such embodiments may be realized and obtained by means of the means and combinations particularly pointed out in the appended claims. These and other features will become fully apparent from the following description and the appended claims, or may be learned by practice of such exemplary embodiments set forth below.

Brief Description of the Drawings

[0013] Various embodiments will be described and illustrated in more specific and detailed manner using the accompanying drawings, in which:

[0014] Figure 1 illustrates a schematic diagram of a communication system according to one or more embodiments;

[0015] Figures 2A-2C illustrate a sequence-flow diagram of analyzing a media object to identify and provide a relevant preview image of the media object in response to a search query, according to one or more embodiments;

[0016] Figure 3 illustrates an example data table of a media object database according to one or more embodiments;

[0017] Figure 4 illustrates another example data table of a media object database according to one or more embodiments;

[0018] Figure 5 illustrates a schematic representation of a media system according to one or more embodiments;

[0019] Figure 6 illustrates a flowchart of an example method for providing a relevant preview image in response to a search query, according to one or more embodiments;

[0020] Figure 7 illustrates a flowchart of another example method for providing a relevant preview image in response to a search query, according to one or more embodiments;

[0021] Figure 8 illustrates a block diagram of an example computing device according to one or more embodiments.

Detailed Description

[0022] The embodiments described below provide a media system for identifying and providing preview images (e.g., poster frames) of media objects (e.g., video content) that are relevant to a search query received by the media system. In one or more embodiments, the media system identifies content features (e.g., objects, activities, emotions, animals, landscapes, locations, colors) depicted in a set of keyframes from the video content. Additionally, based on the identified content features, the media system selects one or more keyframes to generate a set of potential preview images (e.g., potential poster frames), which can be used to represent the video content in search results returned in response to a search query. Furthermore, in one or more embodiments, upon receiving a search query, the media system identifies a specific video related to the search query, and selects, from the set of potential preview images for that specific video, the preview image that most closely matches the search query.

[0023] Therefore, in response to a search query for video content, the media system identifies and provides a specific preview image for the search query, indicating specific content features related to the search query. For example, a particular digital video may have two video segments comprising two different content features. The media system can determine that a first content feature is related to a first search query, and therefore, in response to the first search query, provide a first preview image extracted from keyframes of the first video segment. Furthermore, the media system can determine that a second content feature is related to a second search query, and therefore, provide a second preview image extracted from keyframes of the second video segment. In other words, the media system can provide different, customized preview images for a single instance of video content to provide a preview image related to a given search query.

[0024] In one or more embodiments, as briefly stated above, the media system generates a set of potential preview images (e.g., thumbnails) based on keyframes selected from video content, and associates each potential preview image with a tag indicating one or more content features identified in each selected keyframe. Therefore, based on a received search query, the media system identifies media objects using tags associated with each media object (e.g., associated with a potential preview image of the media object), and additionally identifies keyframes and / or preview images of the media object relevant to the search query.

[0025] As noted above, the media system identifies keyframes of video content within a media object. For example, the media system can utilize content-based and non-content-based methods to identify keyframes of video content (e.g., scene detection analysis). Furthermore, as also mentioned above, after identifying keyframes, the media system employs content feature recognition technology to identify the content features within the keyframes. For example, in one or more embodiments, the media system uses machine learning techniques to train a neural network model that can accurately identify the content features depicted within each keyframe.

[0026] Furthermore, after detecting the content features of each identified keyframe, the media system can determine a confidence value for each detected content feature in the keyframe. In one or more embodiments, using the confidence values, the media system filters or discards one or more keyframes to produce a set of keyframes most likely to actually include any of the identified content features. In particular, the media system can select keyframes based on the confidence values assigned to the content features of the identified keyframes. In other words, the media system can select keyframes with the highest confidence values regarding content features to use as potential preview images.

[0027] In one or more embodiments, based on a determined confidence value, for each detected content feature of a media object, the media system selects a single keyframe to be included as a potential preview image of the media object. In other words, if a specific content feature is identified in two keyframes, the media system selects the keyframe with the highest confidence value to represent the media object for that specific content feature. In other embodiments, the media system can generate multiple preview images related to a single search query, provided that the confidence value associated with the corresponding keyframe is greater than a defined confidence value threshold. For example, based on a received query, the media system can select two preview images to represent a single media object (e.g., the system can provide two preview images as two separate results, or alternatively, as discussed further below, can provide a single result including a combination of the two preview images).
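
The two selection strategies in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout (keyframe ID mapped to per-feature confidence values) and the function names `potential_previews` and `previews_above_threshold` are assumptions introduced here.

```python
from typing import Dict, List

# Assumed layout: keyframe ID -> {content feature: confidence value in [0, 1]}.
Detections = Dict[int, Dict[str, float]]


def potential_previews(detections: Detections) -> Dict[str, int]:
    """For each content feature, keep only the keyframe with the highest confidence."""
    best: Dict[str, tuple] = {}
    for keyframe_id, features in detections.items():
        for feature, confidence in features.items():
            if feature not in best or confidence > best[feature][1]:
                best[feature] = (keyframe_id, confidence)
    return {feature: keyframe_id for feature, (keyframe_id, _) in best.items()}


def previews_above_threshold(
    detections: Detections, feature: str, threshold: float
) -> List[int]:
    """Return every keyframe whose confidence for `feature` exceeds `threshold`,
    ordered from most to least confident."""
    hits = [
        (features[feature], keyframe_id)
        for keyframe_id, features in detections.items()
        if features.get(feature, 0.0) > threshold
    ]
    return [keyframe_id for _, keyframe_id in sorted(hits, reverse=True)]


example: Detections = {
    1: {"dog": 0.60, "tree": 0.90},
    2: {"dog": 0.95},
    3: {"dog": 0.80, "tree": 0.50},
}
print(potential_previews(example))
print(previews_above_threshold(example, "dog", 0.7))
```

The first function corresponds to the single-keyframe-per-feature embodiment; the second corresponds to the embodiments that allow several preview images for one query when each exceeds a confidence threshold.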

[0028] Therefore, as will be described in further detail below, in one or more embodiments, the media system receives a search query and, in response to the search query, may query tags associated with a media object to identify keyframes and / or preview images of the media object related to the search query. For example, the media system may compare the terms of the search query with tags assigned to media objects to identify one or more preview image(s) associated with the search terms. After selecting a relevant preview image of the media object, the media system may provide the preview image of the media object within a result set that the system provides to a user's client device for display.
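
One way to picture the comparison of search terms against tags is the sketch below. The index structure, the identifiers such as `video_a` and `a_kf3`, and the overlap-count scoring are all hypothetical; the description above does not specify how tags are stored or how matches are ranked.

```python
from typing import Dict, Set

# Assumed index: media object ID -> {preview image ID: set of tags}.
Index = Dict[str, Dict[str, Set[str]]]

index: Index = {
    "video_a": {"a_kf3": {"dog", "park"}, "a_kf9": {"beach", "sunset"}},
    "video_b": {"b_kf1": {"car", "street"}},
}


def search(query: str, index: Index) -> Dict[str, str]:
    """Map each matching media object to the preview image whose tags
    share the most terms with the query."""
    terms = set(query.lower().split())
    results: Dict[str, str] = {}
    for media_id, previews in index.items():
        best_id, best_overlap = None, 0
        for preview_id, tags in previews.items():
            overlap = len(terms & tags)
            if overlap > best_overlap:
                best_id, best_overlap = preview_id, overlap
        if best_id is not None:
            results[media_id] = best_id
    return results


print(search("dog", index))
print(search("sunset beach", index))
```

Note that the same media object (`video_a`) is represented by a different preview image depending on the query, which is the behavior the paragraph describes.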

[0029] Therefore, the various embodiments of the media system described herein offer advantages over conventional media hosting systems. For example, unlike conventional media hosting systems, the media system can provide preview images for video content search results based on video scenes within video content relevant to a search query. Specifically, instead of simply providing the first frame of the video (e.g., in a conventional manner), the media system can provide preview images representing media objects depicting content features particularly relevant to the search query. Furthermore, unlike conventional media hosting systems, the media system can provide multiple preview images for a single media object, including content features relevant to the search query, to indicate various examples of content features relevant to the search query within the media object.

[0030] Therefore, by providing customized preview images based on specific search queries, the media system allows users to easily determine whether they are interested in a media object without manually viewing the entire video to try to locate relevant content within video search results. This offers a significant advantage over conventional media systems, which often provide random and therefore irrelevant frame samples. Consequently, because the media system provides relevant preview images of video content in response to search queries, users have a more pleasant, efficient, and less frustrating experience compared to conventional media systems. In particular, receiving relevant preview images in response to search queries saves users time and effort in finding content of interest.

[0031] Various embodiments of the media system also offer advantages to authors of media objects. For example, the media system of this disclosure eliminates any need for authors to manually select frames of their media objects to represent them, saving authors time and effort. Furthermore, authors can be assured that when a user provides a search related to the content features included in an author's media object, the media system will provide relevant preview images depicting the content features within the media object that the user is most likely to be interested in. As a result, the user is most likely to download and purchase the author's media object.

[0032] Furthermore, the media system described herein provides performance improvements for computer systems. For example, because the media system provides relevant preview images of media objects, it can lead to faster processing of search queries. Moreover, because the media system provides relevant preview images in response to search queries, it can optimize the number of searches received from users, as users can more easily determine whether their search yielded media objects relevant to their interests. Furthermore, the media system results in fewer downloads of video content that users ultimately deem irrelevant to their interests, leading to less data transfer and less bandwidth usage by the computer system. In other words, the media system requires less processing power and communication bandwidth compared to conventional media hosting systems. Finally, because preview images can be generated before a search query is received (e.g., in anticipation of the search query), the media system of this disclosure can lead to faster search query processing compared to conventional media systems.

[0033] As used herein, the term "media object" refers to digital media data that includes at least some video content. For example, a media object may include digital video. Additionally, a media object may include both digital video and other types of digital media (e.g., digital photographs, digital audio, slideshow presentations, text, and / or any other type of digital media data).

[0034] Additionally, as used herein, the terms “digital video,” “video content,” or simply “video” refer to encoded digital data comprising representations of one or more visual images. For example, video content may include one or more frames (e.g., digital images), and as is typically the case, video content also includes audio data accompanying the visual images.

[0035] As used herein, the term "content feature" refers to digital elements included and / or depicted in one or more frames of video content. For example, digital elements may include, but are not limited to: objects (e.g., bicycles, cars, trees), people, activities (e.g., running, skydiving, hiking), image types (e.g., macro, portrait, panorama), emotions (e.g., smiling, crying), animals (e.g., dogs, cats), landscapes (e.g., beaches, forests, mountains), geographic locations, structures (e.g., houses, bridges), colors, and / or any other items or elements depicted within the video content.

[0036] Figure 1 illustrates a schematic diagram of an example communication system 100 in which a media system operates, according to one or more embodiments. As illustrated, the communication system 100 includes a client device 102, a media hosting server device 104, and a network 106. The client device 102 and the media hosting server device 104 can communicate via the network 106. The network 106 may include one or more networks, such as the Internet, and may use one or more communication platforms or technologies suitable for transmitting data and / or communication signals. Although Figure 1 illustrates a specific arrangement of the client device 102, the media hosting server device 104, and the network 106, various additional arrangements are possible. For example, the media hosting server device 104 can bypass the network 106 and communicate directly with the client device 102.

[0037] As illustrated in Figure 1, a user 110 interfaces with the client device 102, for example, to access media objects stored on the media hosting server device 104. The user 110 can be an individual (i.e., a human user), a business, a group, or any other entity. Additionally, the user 110 can be a media object author who uploads media objects to the media hosting server device 104 via the client device 102. Alternatively, the user 110 can be a media object consumer who searches for and downloads media objects for various purposes. Although Figure 1 shows only one user 110 associated with the client device 102, the communication system 100 may include any number of users, each of whom interacts with the communication system 100 using a respective client device.

[0038] In addition, as shown in Figure 1, the client device 102 may include a search engine 112. Specifically, the client device 102 may include the search engine 112 for providing search queries to the media hosting server device 104 to locate media objects of interest to the user 110 that are stored on the media hosting server device 104. In an additional embodiment, the search engine 112 may reside on a third-party device (e.g., a separate server) accessed by the client device 102 via the network 106. Regardless, in response to a search query, the media hosting server device 104 may transmit media object search results (e.g., video content related to the search query) to the client device 102. Specifically, the media hosting server device 104 may provide a list of search results to the client device 102 via the network 106, the list of search results including links to media objects related to the search query.

[0039] Both the client device 102 and the media hosting server device 104 can represent various types of computing devices with which users and media hosting administrators can interact. For example, the client device 102 and / or the media hosting server device 104 can be mobile devices (e.g., cellular phones, smartphones, PDAs, tablets, laptops, watches, wearable devices, etc.). However, in some embodiments, the client device 102 and / or the media hosting server device 104 can be non-mobile devices (e.g., desktop computers or servers). Additional details regarding the client device 102 and the media hosting server device 104 are discussed below with respect to Figure 8.

[0040] As shown in Figure 1, the media hosting server device 104 includes a media system 108. As described in further detail below with respect to Figures 2A-5, the media system 108, in conjunction with the media hosting server device 104, identifies content features depicted within frames of video content and, as a result, can generate preview images that represent the video content in search results. Specifically, the preview images are generated and provided so that they include content features corresponding to the search query (e.g., in response to receiving a search query including the term "dog," the media system generates and provides preview images from video frames of the video content that include a dog).

[0041] Figures 2A-2C illustrate example embodiments of the media system 108 using sequence-flow diagrams. For example, Figures 2A-2C illustrate one or more embodiments of a sequence flow that the media system 108 uses to identify content features included in a media object, generate potential preview images for the media object based on the identified content features, and select preview images with content features relevant to a search query. Specifically, the media hosting server device 104 shown in Figures 2A-2C can be an example embodiment of the media hosting server device 104 described with respect to Figure 1, and the client device 102 shown in Figure 2C can be an example embodiment of the client device 102 described with respect to Figure 1.

[0042] As shown in step 202 of Figure 2A, the media hosting server device 104 can receive media objects (e.g., from a client device). Specifically, the media hosting server device 104 can receive media objects and store them in a media object database. In some embodiments, a client device (e.g., client device 102) provides media objects to the media hosting server device 104. For example, the client device 102 can upload media objects to the media hosting server device 104 via the network 106. In other words, a user 110 of the client device 102 can interact with the client device 102 to cause the client device 102 to provide media objects to the media hosting server device 104. Alternatively, in one or more embodiments, the media objects may already be stored on the media hosting server device 104. In that case, step 202 of receiving media objects may not occur in every embodiment.

[0043] As shown in step 204 of Figure 2A, in response to receiving a media object, in one or more embodiments, the media system 108 determines one or more specifications of the video content included in the media object. For example, the media system 108 detects one or more of the following: the type of the video content (e.g., .mp4, .avi, .mov, .flv, etc.), the frame rate of the video content, the total number of frames in the video content, and / or the video quality of the video content (e.g., resolution). Depending on the specific embodiment, the media system 108 may determine other specifications of the video content.

[0044] In addition, as shown in step 206 of Figure 2A, the media system 108 may use the one or more specifications of the video content in the process of identifying keyframes of the video content within the media object. As used herein, the term "keyframe" and any derived terms refer to a frame of video content within a media object that represents a portion of the video content (e.g., a plurality of sequential frames). For example, a portion of video content may relate to a scene within the video content and may include a limited number of frames that depict substantially the same content features. Thus, a keyframe is a single frame that includes content features representing the portion of the video that includes that scene. Therefore, the set of keyframes of the video content can provide a compact summary of the video content of the media object compared to using all frames within the video content.

[0045] Media system 108 can use any of a variety of methods to identify keyframes of video content. For example, media system 108 can utilize non-content-based methods, content-based methods, or combinations thereof to determine (e.g., identify) keyframes of video content. Each of the aforementioned methods will be described in detail below.

[0046] As noted above, media system 108 can use non-content-based methods for identifying keyframes of video content. For example, when using non-content-based methods to identify keyframes of video content, media system 108 can use spatial segmentation of each frame in a plurality of frames of video content to detect defined portions of the video content (e.g., clusters of frames). Defined portions of video content can be detected based on changes in the image from one frame to the next sequential frame or from one cluster of frames (e.g., a sequence of consecutive frames) to the next cluster of frames. Based on the detected defined portions of video content, media system 108 can identify one or more frames within each defined portion of video content as keyframes (e.g., frames representing defined portions of video content).
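
The idea of detecting defined portions from frame-to-frame change can be sketched as follows. This is a simplified stand-in under stated assumptions: each frame is modeled as a flat list of pixel intensities, the change measure is a mean absolute difference, and the middle frame of each portion is taken as its keyframe. None of these specifics, nor the function names, come from the description above, which refers more generally to spatial segmentation.

```python
from typing import List

Frame = List[int]  # assumed: a frame as a flat list of pixel intensities


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def segment_starts(frames: List[Frame], threshold: float) -> List[int]:
    """Indices at which a new portion begins (large change from the previous frame)."""
    starts = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            starts.append(i)
    return starts


def keyframes_from_segments(frames: List[Frame], threshold: float) -> List[int]:
    """Pick the middle frame of each detected portion as its keyframe."""
    if not frames:
        return []
    starts = segment_starts(frames, threshold)
    ends = starts[1:] + [len(frames)]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]


frames = [[10, 10], [11, 10], [12, 11], [200, 200], [201, 199]]
print(keyframes_from_segments(frames, threshold=50.0))
```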

[0047] Additionally, in one or more embodiments, the media system 108 may identify keyframes (e.g., select keyframes) at predetermined intervals within the video content. For example, the media system 108 may identify frames of the video content as keyframes for each given time interval (e.g., 3 seconds). In an additional embodiment, the media system 108 may identify frames of the video content as keyframes for each given number of frames (e.g., every 30 frames). In other words, for each given number of consecutive frames of the video content, the media system 108 selects one of these frames as a keyframe.
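
Interval-based selection is simple enough to show directly. The sketch below assumes frames are addressed by zero-based index and that the first frame of each interval is the one chosen; the function names are illustrative only.

```python
from typing import List


def keyframes_every_n_frames(total_frames: int, step: int) -> List[int]:
    """Select one keyframe index for each run of `step` consecutive frames."""
    return list(range(0, total_frames, step))


def keyframes_every_t_seconds(total_frames: int, fps: float, seconds: float) -> List[int]:
    """Select one keyframe index per time interval, converting seconds to frames."""
    step = max(1, round(fps * seconds))
    return list(range(0, total_frames, step))


print(keyframes_every_n_frames(100, 30))
print(keyframes_every_t_seconds(300, fps=30.0, seconds=3.0))
```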

[0048] In addition to non-content-based methods, and as briefly mentioned above, one or more embodiments of media system 108 may use content-based methods to identify keyframes of video content. For example, media system 108 uses machine learning to determine the content features (e.g., objects, activities, colors, etc.) included (e.g., depicted) in the frames of video content. Furthermore, based on the content features of the frames, media system 108 may group the frames of media objects into homogeneous clusters (e.g., frame clusters that share at least substantially the same content features). Thus, media system 108 may select at least one frame from each homogeneous frame cluster as a keyframe.
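
A minimal sketch of the clustering step is shown below. It assumes the content features of each frame have already been detected and are given as sets of labels. It also simplifies "substantially the same" to exact equality of the feature sets and groups only consecutive frames; both are assumptions made for illustration.

```python
from itertools import groupby
from typing import FrozenSet, List, Set, Tuple


def cluster_by_features(
    frame_features: List[Set[str]],
) -> List[Tuple[FrozenSet[str], List[int]]]:
    """Group consecutive frames that have identical feature sets."""
    clusters = []
    keyed = enumerate(frame_features)
    for features, group in groupby(keyed, key=lambda item: frozenset(item[1])):
        clusters.append((features, [index for index, _ in group]))
    return clusters


def keyframe_per_cluster(frame_features: List[Set[str]]) -> List[int]:
    """Select the middle frame of each homogeneous cluster as its keyframe."""
    return [
        indices[len(indices) // 2]
        for _, indices in cluster_by_features(frame_features)
    ]


frame_features = [
    {"dog"}, {"dog"}, {"dog"},
    {"beach", "dog"}, {"beach", "dog"},
    {"car"},
]
print(keyframe_per_cluster(frame_features))
```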

[0049] The media system 108 can determine which frames to include in a particular frame cluster based on one or more characteristics of each frame. For example, the media system 108 can determine whether one or more frames form a cluster based on whether they share one or more content features (e.g., the items depicted within each frame). Additionally, the media system 108 can identify keyframes of media objects by comparing non-adjacent frames, by using inter-frame entropy, histogram similarity, or wavelets, by selecting the frame with the largest object-to-background ratio (when compared with other frames of the video content), and / or by any combination thereof.

[0050] Additionally, in one or more embodiments, the media system 108 may perform keyframe analysis on only a subset of frames within the video content (e.g., as opposed to all frames). For example, depending on one or more specifications of the video content within the media object, the media system 108 may determine to perform keyframe analysis on every fourth frame. As a further example, based on determining that the video content has a low frame rate, the media system 108 may perform keyframe analysis on a higher percentage of video frames, whereas, based on determining that the video content has a high frame rate, the media system 108 may perform keyframe analysis on a lower percentage of video frames.

[0051] Based on identifying the keyframes of a media object, media system 108 generates and stores a data packet in the media object database of media hosting server device 104. The data packet includes a compilation (e.g., a set) of the keyframes associated with the media object. For example, media system 108 can assign a keyframe identifier ("keyframe ID number") (e.g., 1, 2, 3, 4, 5, etc.) to each keyframe of a media object within the media object database of media hosting server device 104 (e.g., in a table stored in the media object database). Furthermore, media system 108 can store data within the media object database representing a sequence (e.g., a list) of the keyframes of a media object, each represented by its corresponding keyframe ID number. Additionally, media system 108 stores data representing timestamps (e.g., timestamps indicating the position of a keyframe relative to other frames of video content) within the media object database and associates each timestamp with its corresponding keyframe. The media object database is discussed in more detail below with reference to Figures 3-5.

[0052] In addition to identifying keyframes of the video content of a particular media object, Figure 2A illustrates, in step 208, the media system 108 detecting the content features included and / or depicted in the keyframes. In some embodiments, the media system 108 may detect content features of keyframes while (e.g., in conjunction with) identifying keyframes of a media object. Alternatively, after identifying keyframes of a media object, the media system 108 may detect content features depicted in the keyframes. Specifically, the media system 108 detects objects, activities (e.g., running, skydiving, hiking), photo types (e.g., macro, portrait, etc.), emotions (e.g., smiling, crying, etc.), animals, landscapes (e.g., beach, forest, mountains), locations, colors, etc., depicted in keyframes of video content within the media object.

[0053] In one or more embodiments, the media system 108 detects content features of keyframes by analyzing them using content feature recognition techniques (e.g., object recognition techniques). For example, the content feature recognition techniques may use machine learning (e.g., deep learning) to identify (e.g., detect) the content features depicted in the keyframes. More specifically, the content feature recognition techniques may use machine learning algorithms to detect and identify the content features represented by keyframes of media objects.

[0054] Depending on the specific embodiment, media system 108 may use various machine learning techniques to detect content features within keyframes. For example, media system 108 may use neural networks to analyze keyframes to detect content features within those keyframes. Specifically, in one or more embodiments, media system 108 uses a region-based convolutional neural network (i.e., RCNN) or a fast region-based convolutional neural network (i.e., F-RCNN). Depending on the specific embodiment, media system 108 may use other forms of content feature detection. While media system 108 is specifically described herein as using machine learning for detecting content features depicted within keyframes, media system 108 may use any content feature recognition technique capable of detecting and identifying content features within frames of a video.

[0055] In addition to detecting and identifying intra-frame content features of video content within media objects, media system 108 can determine the characteristics of the content features. For example, media system 108 can determine the percentage of space occupied by one or more content features in a keyframe (e.g., the proportion of the content feature relative to the background). Furthermore, media system 108 can determine the orientation associated with a particular content feature. For example, if the content feature includes a person, media system 108 can determine whether the person is facing forward, facing to the side, facing backward, etc. Furthermore, media system 108 can determine the relative position of the content feature within a keyframe. For example, media system 108 can determine whether the content feature is centered within a keyframe.

[0056] In addition to detecting and identifying the characteristics of content features within a keyframe, and based on these characteristics, media system 108 can identify a name, type, or category for the content features depicted within the keyframe. For example, based on a trained machine learning model, media system 108 can identify a detected content feature within a keyframe as Babe Ruth (e.g., name), person (e.g., type), and / or man (e.g., category). Furthermore, in one or more embodiments, media system 108 can associate characteristics with the identified content features. For example, media system 108 can associate orientation, position within the frame, and / or other characteristics with the identified content features. For example, [Babe Ruth, facing forward, centered], [person, facing forward, centered], and / or [man, facing forward, centered] indicate a name, type, and / or category combined with one or more characteristics of a content feature. As will be discussed in further detail below with reference to step 210, media system 108 associates the identification and characteristic data for the content feature with the keyframe including that content feature.

[0057] In addition to identifying content features within keyframes of the video content, as illustrated in step 209 of Figure 2A, media system 108 may also determine a confidence value for each identified content feature. As used herein, a "confidence value" represents the probability that a content feature identified by the media system within a keyframe is actually that content feature. For example, media system 108 may assign a percentage value to the degree of confidence that a particular content feature is included and / or depicted in a keyframe. Additionally, various factors can influence the confidence value, such as image quality, the extent of the content feature included within the keyframe (e.g., half of a human head versus the entire head), the contrast between the content feature and the background, the similarity of one content feature to another different content feature (e.g., a toy car versus an actual car), and / or other factors.

[0058] In one or more embodiments, the machine learning model described above provides a confidence value as output. For example, based on a trained convolutional neural network model, the model can predict a confidence value for a specific content feature based on the degree to which the content feature matches one or more training examples of the trained content feature. For example, a convolutional neural network model can be trained to recognize dogs using a large training set of dog images. Therefore, essentially, the convolutional neural network model can compare the characteristics of the content feature identified within a keyframe with the characteristics of one or more dog images in the training set to calculate the probability that the dog identified in the keyframe is actually a dog.

[0059] In some embodiments, if the confidence value associated with a particular content feature in a keyframe is less than a defined threshold confidence value (e.g., less than 30%), then the media system 108 may decide not to identify that content feature. In other words, based on the confidence value being less than the threshold, the media system 108 determines that the probability that the content feature is accurate is insufficient to be used in the preview image to represent the particular content feature. The media system 108 may define the threshold confidence value at or below any probability value, for example, at or below 10%, 20%, 30%, 40%, etc.
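
The threshold test described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions: confidence values are expressed as fractions between 0 and 1, the 30% threshold is the example value from the text, and the function and variable names are hypothetical.

```python
# Example threshold from the description (30%); any other value may be defined.
THRESHOLD = 0.30


def filter_detections(detections: dict[str, float],
                      threshold: float = THRESHOLD) -> dict[str, float]:
    """Keep only content features whose confidence is not below the threshold.

    `detections` maps a content feature label to its confidence value for one
    keyframe. Features with a confidence less than `threshold` are dropped,
    i.e., they are not identified for that keyframe.
    """
    return {feature: conf for feature, conf in detections.items() if conf >= threshold}
```

A feature whose confidence equals the threshold is kept here, because the description drops only features whose confidence is less than the threshold.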

[0060] In addition to identifying keyframes, identifying the content features depicted in the keyframes, and determining the confidence values associated with the detected content features, as shown in step 210 of Figure 2A, the media system 108 associates the detected content features and their corresponding confidence values with the identified keyframes. Specifically, the media system 108 generates and stores data corresponding to the content features identified in each corresponding keyframe and their associated confidence values in its media object database. Step 210 of Figure 2A is described in further detail below with reference to Figure 3.

[0061] In particular, Figure 3 illustrates an example data table 300, which is used to associate identified content features and their corresponding confidence values with each corresponding keyframe. Specifically, data table 300 may include a keyframe column 302, which includes multiple identified keyframes indicated by keyframe ID numbers (e.g., keyframe 1, keyframe 2, keyframe 3, etc.). For example, the keyframes listed in keyframe column 302 include keyframes identified by the media system 108 of media hosting server device 104. Specifically, as discussed above with respect to step 206, in response to the media system 108 identifying a keyframe, the media system 108 may populate keyframe column 302 with keyframe ID numbers.

[0062] In addition, as illustrated in Figure 3, the data table 300 may include multiple content feature columns 304 (e.g., 304a, 304b, 304c, etc.) identified by respective content feature IDs (e.g., content feature A, content feature B, content feature C, etc.). As described above, each of the multiple content feature columns 304 represents a corresponding content feature identified by the media system 108. In other words, each detected content feature has its own corresponding content feature column 304. As a non-limiting example, as described above with regard to step 208 of Figure 2A, the media system 108 can generate data table 300 based on the content features that the media system identifies within the keyframes. Furthermore, the media system 108 can generate multiple content feature columns 304 based on the detected content features within the video content of the media object.

[0063] Furthermore, each detected content feature is associated with at least one keyframe in keyframe column 302 via an indicator (e.g., a check mark, X, or other symbol). As illustrated in Figure 3, media system 108 can associate the content features of the multiple content feature columns with keyframes by generating indicators in the rows of content feature columns 304 that correspond to the keyframes of keyframe column 302 in which the content features are identified. In other words, each content feature can be associated with a keyframe via an indicator in a row of data table 300. While media system 108 of this disclosure is described as using data tables within a database to associate detected content features with identified keyframes, this disclosure is not limited thereto, and media system 108 can use any method known in the art to associate data.

[0064] Referring back to Figure 2A, in addition to associating the detected content features with the identified keyframes, in one or more embodiments, as shown in step 212 of Figure 2A, media system 108 selects (e.g., designates) at least one keyframe from the identified keyframes to generate a potential preview image of the media object (e.g., a poster frame). Specifically, as described in more detail below with regard to Figure 2B, media system 108 selects keyframes to generate potential preview images of the media object that correspond to identified content features of the video content within the media object and that are relevant to a search query. In other words, in response to a search query, media hosting server device 104 provides a preview image selected from one or more potential preview images of the media object, based on the preview image depicting content features relevant to the search query.

[0065] In some embodiments, the media system 108 selects keyframes from the identified keyframes to include as preview images of a media object, using data about the keyframes (e.g., confidence values, timestamps, and identified content features of the keyframes) from the media object database of the media hosting server device 104. For example, the media system can use the data table 300 described above to select keyframes for generating a potential preview image. Additionally, Figure 4 illustrates a sample data table 400 within the media object database. The media system 108 can use the data table 400 to select at least one keyframe from the identified keyframes to generate a potential preview image of a media object.

[0066] Similar to the discussion above with reference to data table 300, data table 400 may include a content feature column 402. Specifically, content feature column 402 may include multiple content features of the media object identified by content feature IDs (e.g., A, B, C, D, etc.). For example, the multiple content features listed in content feature column 402 of data table 400 include the detected content features of the keyframes discussed above.

[0067] Additionally, data table 400 includes multiple keyframe columns 404 (e.g., 404a, 404b, 404c, etc.), and each of these keyframe columns 404 represents an identified keyframe. Each keyframe column 404 may indicate that the corresponding keyframe depicts one or more content features among the multiple content features listed in content feature column 402. For example, as shown in Figure 4, data table 400 can indicate that a specific keyframe includes a content feature by including a confidence value in the keyframe column 404 of that keyframe, in the row corresponding to the specific content feature in content feature column 402. If a keyframe is not detected to include a content feature, then the keyframe column 404 of that keyframe includes an indicator or blank space indicating that the content feature was not detected in that keyframe (or that the content feature does not have a confidence value greater than the defined confidence value threshold).

[0068] As a non-limiting example, as illustrated in Figure 4, data table 400 can indicate that the first keyframe 404a includes content feature A with a 98% confidence value and content feature B with a 70% confidence value. Additionally, as also illustrated in Figure 4, the second keyframe 404b includes content feature A with a 60% confidence value and content feature B with a 90% confidence value.
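
One way to picture data table 400 is as a nested mapping from content feature ID to keyframe ID number to confidence value. The sketch below is illustrative only: the nested-dictionary layout and the helper names are assumptions, and only the four confidence values stated above are filled in. A missing entry plays the role of the blank space that indicates a feature was not detected.

```python
# Hypothetical in-memory stand-in for data table 400:
# content feature ID -> {keyframe ID number -> confidence value}.
table_400: dict[str, dict[int, float]] = {
    "A": {1: 0.98, 2: 0.60},
    "B": {1: 0.70, 2: 0.90},
}


def keyframes_depicting(table: dict[str, dict[int, float]], feature: str) -> list[int]:
    """Return the keyframe ID numbers in which `feature` was detected."""
    return sorted(table.get(feature, {}))


def confidence(table: dict[str, dict[int, float]], feature: str, keyframe: int):
    """Return the confidence value for a feature in a keyframe, or None if blank."""
    return table.get(feature, {}).get(keyframe)
```

Counting the keyframes that depict a feature then reduces to taking the length of the list returned by `keyframes_depicting`.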

[0069] Referring back to step 212 of Figure 2A, in selecting keyframes to generate a potential preview image of the media object for each content feature in the media object, the media system 108 can perform various additional steps. Specifically, as shown in step 214 of Figure 2A, for each detected content feature of the media object, the media system 108 determines how many keyframes depict that specific content feature. For example, the media system 108 queries data table 400 (or data table 300) to determine how many keyframes include that specific content feature. Depending on the content feature, the media system 108 may determine that only one keyframe is associated with the content feature (e.g., in data table 400, keyframe 6 is the only keyframe associated with content feature F), while several keyframes may be associated with another content feature (e.g., in data table 400, keyframes 1 and 2 are both associated with content features A and B).

[0070] On the one hand, as shown in step 216 of Figure 2A, if the media system 108 determines that a single keyframe includes a specific content feature, then the media system 108 selects that single keyframe for use as a preview image of the media object for that specific content feature. In other words, as described in more detail below with regard to Figure 2B, media system 108 determines that the selected keyframe will be used to generate a preview image for the media object in response to a search query relating to the specific content feature depicted within that single keyframe.

[0071] On the other hand, as shown in step 218 of Figure 2A, if the media system 108 determines that multiple keyframes include the content feature, then the media system 108 may perform one or more additional steps to select keyframes to generate a potential preview image. For example, as shown in steps 220 and 222 of Figure 2A, in one or more embodiments, media system 108 determines whether the multiple keyframes appear in one or more keyframe clusters (e.g., groups) of the media object. In other words, media system 108 determines whether the multiple keyframes including or associated with a specific content feature appear in a single sequential keyframe cluster (e.g., keyframes 2, 3, 4) or in separate keyframe clusters (e.g., keyframes 2 and 3 and keyframes 7 and 8). For example, media system 108 queries data table 400 to determine whether multiple keyframe clusters (e.g., separate clusters) include the content feature or whether a single keyframe cluster includes the content feature.
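
The distinction between a single cluster and separate clusters can be sketched by grouping keyframe ID numbers into runs. This is a minimal illustration, assuming that "sequential" means consecutive keyframe ID numbers; the description does not define the clustering rule precisely, and the function name is hypothetical.

```python
def sequential_clusters(keyframe_ids: list[int]) -> list[list[int]]:
    """Group keyframe ID numbers into runs of consecutive IDs.

    Assumption: two keyframes belong to the same cluster when their ID
    numbers are adjacent (e.g., 2 and 3), and a gap starts a new cluster.
    """
    clusters: list[list[int]] = []
    for kf in sorted(set(keyframe_ids)):
        if clusters and kf == clusters[-1][-1] + 1:
            clusters[-1].append(kf)   # extends the current run
        else:
            clusters.append([kf])     # a gap starts a new cluster
    return clusters
```

With the examples from the text, keyframes 2, 3, 4 form one cluster, while keyframes 2, 3, 7, 8 form two.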

[0072] As shown in step 220 of Figure 2A, if media system 108 determines that the multiple keyframes appear in a single keyframe cluster of a media object, then media system 108 can select one keyframe from that cluster for use in generating a potential preview image. For example, because the multiple keyframes appear in a single cluster, the media system can determine that the content feature within each of the multiple keyframes is substantially the same content feature. To select that one keyframe from the cluster, media system 108 can compare the confidence values of each keyframe in the single keyframe cluster and can select the keyframe in the single keyframe cluster that has the highest confidence value for the detected content feature. For example, media system 108 can compare the confidence values of the keyframes in the single keyframe cluster that are included in data table 400.

[0073] In an alternative embodiment, media system 108 does not compare the confidence values of keyframes within a single keyframe cluster. Instead, media system 108 may randomly select keyframes from a single keyframe cluster to generate a potential preview image of the media object. For example, media system 108 may include a random number generator, which can be used to randomly select keyframes from a single keyframe cluster.

[0074] In other embodiments, the media system 108 may perform additional analysis on multiple keyframes to determine the likelihood that the multiple keyframes reflect or do not reflect the same content feature instance (e.g., the same dog versus two different dogs). Based on this additional analysis, if it is determined that the multiple keyframes are likely to depict the same content feature instance (e.g., the same dog), then the media system may select one keyframe. Alternatively, based on the additional analysis, if it is determined that the multiple keyframes depict different content instances (e.g., different dogs), then the media system may select two or more keyframes from the multiple keyframes. For example, keyframes associated with each content instance may be selected to generate a potential preview image.

[0075] In some instances, two or more keyframes in a single keyframe cluster may have the same confidence value regarding a specific content feature. In such instances, media system 108 may select the first keyframe among two or more keyframes with the same confidence value (e.g., the first keyframe when considering two or more keyframes sequentially based on timestamp information). In other words, media system 108 selects the first-ordered keyframe among two or more keyframes in a time-related sequence.
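
The highest-confidence rule of step 220, combined with the timestamp tie-break just described, can be sketched as follows. The `Keyframe` record and the function name are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple


class Keyframe(NamedTuple):
    keyframe_id: int
    timestamp_s: float   # position of the keyframe within the video
    confidence: float    # confidence value for the content feature of interest


def select_from_cluster(cluster: list[Keyframe]) -> Keyframe:
    """Pick the keyframe with the highest confidence value in one cluster.

    When two or more keyframes share the highest confidence value, the
    earliest one by timestamp is chosen.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one keyframe")
    # Sort key: highest confidence first, then earliest timestamp.
    return min(cluster, key=lambda kf: (-kf.confidence, kf.timestamp_s))
```

The other embodiments described here (random selection, feature-to-background ratio, orientation) would replace or extend the sort key.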

[0076] Alternatively, continuing the example of the preceding paragraph, media system 108 may compare the percentage of space that one or more content features occupy within each keyframe. In such an embodiment, keyframes with a higher content feature-to-background ratio may be favored, and media system 108 may select the keyframe with the higher ratio. In yet another embodiment, other characteristics of the content features may be considered in the selection of keyframes. For example, media system 108 may determine whether a content feature is forward-facing, side-facing, or back-facing, and content features with a specific orientation may be favored. Alternatively, media system 108 may randomly select a keyframe from the two or more keyframes with the same confidence value to include in a preview image of the media object. As noted above, media system 108 may include a random number generator, and may use the random number generator when selecting a keyframe from two or more keyframes with the same confidence value.

[0077] As briefly mentioned above, as shown in step 222 of Figure 2A, in some embodiments, the media system 108 determines that the multiple keyframes including the content feature appear in multiple separate (e.g., disconnected) keyframe clusters of the media object. In such instances, based on the assumption that each cluster may depict a different instance of the content feature, the media system 108 can select a keyframe from each keyframe cluster. As will be described in more detail below, various methods can be used to select a keyframe from each separate keyframe cluster.

[0078] For example, in some embodiments, in response to determining that multiple keyframes including specific content features appear in separate keyframe clusters, media system 108 may compare the confidence values of each keyframe within each separate keyframe cluster. If a keyframe within a particular keyframe cluster has a higher confidence value than the other keyframes, then media system 108 selects the keyframe with the highest confidence value for use in generating a potential preview image of the media object. Thus, media system 108 can select the highest-ranking keyframe from each separate keyframe cluster (e.g., two separate keyframe clusters produce two selected keyframes).

[0079] As briefly discussed above, if media system 108 determines that multiple keyframes in a separate keyframe cluster have the same highest confidence value, then media system 108 may use any of the methods described above with regard to step 220 of Figure 2A to select a single keyframe from the plurality of keyframes having the same highest confidence value to include in the preview image of the media object. In an additional embodiment, media system 108 may use any of the methods described above to identify the keyframe with the highest confidence value in each keyframe cluster, and then may compare the number of content features included in each highest confidence keyframe to select at least one keyframe to generate a potential preview image of the media object.

[0080] For example, after determining the highest confidence keyframes in the separate keyframe clusters, media system 108 queries data table 400 of the media object database to compare the number of content features included in each highest confidence keyframe of the separate keyframe clusters. By comparing the number of content features included in each highest confidence keyframe, media system 108 can select, from the highest confidence keyframes, the keyframe that includes the highest number of content features to include in the preview image of the media object. If two or more of the highest confidence keyframes include the same highest number of content features, then media system 108 can use any of the methods described above with regard to step 220 of Figure 2A to select one of the highest confidence keyframes.
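
The comparison by number of content features can be sketched against a table shaped like data table 400. This is a minimal illustration: the nested mapping (content feature ID to keyframe ID number to confidence value) and the function names are assumptions, and the sample values in the usage note are made up for the example.

```python
def feature_count(table: dict[str, dict[int, float]], keyframe: int) -> int:
    """Count how many content features the table records for one keyframe."""
    return sum(1 for confidences in table.values() if keyframe in confidences)


def pick_by_feature_count(best_per_cluster: list[int],
                          table: dict[str, dict[int, float]]) -> int:
    """Among the highest confidence keyframes (one per cluster), return the
    keyframe ID that depicts the highest number of content features.

    Ties are left to `max`, which returns the first candidate; the description
    instead falls back to the step 220 methods in that case.
    """
    if not best_per_cluster:
        raise ValueError("at least one candidate keyframe is required")
    return max(best_per_cluster, key=lambda kf: feature_count(table, kf))
```

For instance, if keyframe 2 depicts one feature and keyframe 7 depicts three, keyframe 7 is selected.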

[0081] Referring again to steps 212-222 of Figure 2A, when selecting at least one keyframe to use to generate a potential preview image of the media object for each identified content feature of the media object, the media system 108 may select one or more keyframes that include multiple content features of the media object. In other words, the media system 108 may select a single keyframe for generating a preview image of the media object corresponding to two or more content features. For example, the selected single keyframe may include two or more content features, and for each of these two or more content features, the media system 108 selects that single keyframe to include in the preview image of the media object.

[0082] In addition to selecting keyframes to use in generating the potential preview images of the media object, as shown in step 224 of Figure 2A, media system 108 discards unselected keyframes. For example, media system 108 discards keyframes that it did not select for use as a potential preview image for a media object. As used herein, the term "discard" and any derived terms refer to media system 108 removing the identification of unselected keyframes as keyframes. For example, media system 108 may remove unselected keyframes from data tables 300 and 400 of the media object database discussed above with regard to Figure 3 and Figure 4. In other embodiments, the term "discard" and any derived terms may refer to the media system 108 marking unselected keyframes as unusable as preview images for media objects. For example, the media system 108 may identify (e.g., mark) unselected keyframes within data tables 300 and 400 as unusable as preview images. In yet another embodiment, the term "discard" and any derived terms may refer to the media system 108 removing (e.g., deleting) unselected keyframes from the keyframe data packet discussed above with regard to step 206 of Figure 2A.

[0083] After selecting one or more keyframes for generating potential preview images related to specific content features, in one or more embodiments, as shown in step 226 of Figure 2B, media system 108 generates tags to assign to each selected keyframe. As used herein, the term "tag" or any derived term refers to associating tag data with a media object and / or a portion of a media object. For example, a tag may indicate that a content feature is associated with a selected keyframe from a media object.

[0084] As briefly noted above, media system 108 generates tags to assign to selected keyframes of a media object, the tags indicating detected content features of the selected keyframes. Each tag may indicate the content features of the selected keyframe with which it is associated. In other words, each tag may include data (e.g., text) indicating one or more content features of the associated keyframe. As a non-limiting example, a tag may include the text "dog" to indicate that the selected keyframe to which the tag is assigned depicts a dog. Accordingly, depending on the content feature(s) depicted in the selected keyframe(s), media system 108 may associate a single tag with a selected keyframe, or alternatively, media system 108 may associate multiple tags with the selected keyframe(s).

[0085] To generate tags and / or assign them to selected keyframes, media system 108 may query a first data table and / or a second data table in the media object database to determine the identified content features for each selected keyframe. After determining the identified content feature(s) for each selected keyframe, media system 108 generates tags indicating the content features of the selected keyframes and associates the tags with those keyframes. For example, media system 108 may store data representing each tag in the media object database and may associate each tag with its corresponding keyframe within the media object database (e.g., within data table 300 or data table 400).
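
Tag generation from the stored tables can be sketched as a lookup followed by a mapping from content feature IDs to tag text. This is a minimal illustration: the nested table layout, the `labels` mapping from feature ID to text, and the function name are all assumptions made for the example.

```python
def generate_tags(selected_keyframes: list[int],
                  table: dict[str, dict[int, float]],
                  labels: dict[str, str]) -> dict[int, list[str]]:
    """Build the list of text tags for each selected keyframe.

    `table` maps content feature ID -> {keyframe ID number -> confidence value}
    and `labels` maps content feature ID -> tag text (e.g., "A" -> "dog").
    """
    tags: dict[int, list[str]] = {}
    for kf in selected_keyframes:
        # One tag per content feature recorded for this keyframe.
        tags[kf] = [labels[feature] for feature in sorted(table) if kf in table[feature]]
    return tags
```

A keyframe that depicts several content features receives several tags, while a keyframe that depicts one feature receives a single tag.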

[0086] In addition to generating tags, as shown in step 228 of Figure 2B, media system 108 generates a potential preview image for a media object based on each selected keyframe. Specifically, media system 108 generates each potential preview image of the media object to include an image of the corresponding selected keyframe. In some embodiments, media system 108 generates a potential preview image to include the entire image within a particular selected keyframe (e.g., the preview image is an image of the selected keyframe). In other embodiments, media system 108 generates a potential preview image to include only a portion of the selected keyframe (e.g., a portion including content features). Furthermore, media system 108 stores data representing each of the generated potential preview images in the media object database.

[0087] Still referring to Figure 2B, as shown in step 230, media system 108 can associate data representing the selected keyframes, assigned tags, and potential preview images of a media object with the media object itself. Specifically, media system 108 can associate metadata with media objects, and the metadata can represent (e.g., indicate and / or include) the selected keyframes, assigned tags, and potential preview images. For example, media system 108 can store metadata and associate the metadata with media objects within the media object database of media hosting server device 104.

[0088] As briefly discussed above, media system 108 generates potential preview images of media objects so as to be able to provide relevant preview images of media objects in response to search queries. Figure 2C illustrates a sequence flowchart 250 of providing relevant preview images of video content for a media object in response to a search query. For example, as shown in step 252, Figure 2C illustrates media system 108 receiving a search query from client device 102. For example, media system 108 may receive a search query from search engine 112 of client device 102. The search query may include one or more search terms provided by the user attempting to locate video content of interest to the user.

[0089] In response to receiving a search query from client device 102, as shown in step 254 of Figure 2C, media system 108 can query its media object database, and specifically, the tags of selected keyframes of media objects stored in the media object database. As illustrated in step 255 of Figure 2C, based on the query, media system 108 can identify a media object as a search result of the search query. For example, media system 108 can query tags associated with a media object to identify the media object to be provided as a search result in response to a search query.

[0090] Although media system 108 is described herein as querying a media object database while searching for search terms within the tags of media objects, this disclosure is not limited thereto. For example, when searching for tags of media objects that match search terms of a search query, media system 108 may search for derivatives of the search terms, synonyms of the search terms, and / or related terms of the search terms. As a result, even if user 110 does not use the precise language of the tags of media objects in user 110's search query, media system 108 can identify media objects and preview images of media objects.
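
Matching tags against synonyms or related terms can be sketched by expanding the search terms before comparing them with the tags. This is a minimal illustration: the `related` mapping is a hypothetical lookup table supplied by the caller, and the description does not say how derivatives, synonyms, or related terms are obtained.

```python
def expand_terms(terms: list[str], related: dict[str, set[str]]) -> set[str]:
    """Return the search terms plus any synonyms or related terms for them."""
    expanded: set[str] = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= related.get(term, set())
    return expanded


def tags_match(tags: list[str], terms: list[str], related: dict[str, set[str]]) -> bool:
    """True if any tag equals a search term or one of its related terms."""
    return bool(expand_terms(terms, related) & {tag.lower() for tag in tags})
```

With a mapping such as `{"puppy": {"dog", "canine"}}`, a query for "puppy" matches a keyframe tagged "dog" even though the user did not use the exact tag text.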

[0091] In addition to identifying the media objects to be provided as search results, as shown in step 256 of Figure 2C, media system 108 can select at least one preview image to provide to client device 102 as a representation of the identified media object. For example, based on the identification of a specific media object, media system 108 can determine which of one or more potential preview images associated with that media object is most relevant to the search query. In particular, based on the tags associated with the selected keyframe and/or the corresponding potential preview images, media system 108 can select preview images that include content features related to the search query.

[0092] In some embodiments, when selecting preview images to provide to client device 102, media system 108 may select a first preview image of a media object that includes all content features associated with the search terms of a search query. Alternatively, if none of the preview images of a media object include all content features associated with the search terms of a search query, media system 108 may select a preview image of a media object that includes the highest number of content features associated with the search terms of the search query. In other words, media system 108 selects a preview image of a media object that depicts the most content features related to the most search terms of the search query.

[0093] For example, if a search query includes four search terms, and a first preview image includes two content features related to two of the four search terms, while a second preview image includes three content features related to three of the four search terms, then media system 108 selects the second preview image to provide to client device 102 in response to the search query. As a result, media system 108 provides the user who generated the search query with the most relevant preview image (e.g., a video scene) within the video content as the search result.
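
The selection rule of the two preceding paragraphs can be sketched as follows, under the assumption that each potential preview image is represented simply by the set of tags for the content features it depicts. The function names, preview identifiers, and tie-breaking behavior (the earliest preview wins) are illustrative choices, not details taken from the disclosure.

```python
from typing import Dict, List, Optional, Set


def count_matched_terms(query_terms: List[str], preview_tags: Set[str]) -> int:
    """Count how many search terms appear among a preview image's tags."""
    tags = {tag.lower() for tag in preview_tags}
    return sum(1 for term in query_terms if term.lower() in tags)


def select_preview(
    query_terms: List[str], previews: Dict[str, Set[str]]
) -> Optional[str]:
    """Pick the first preview covering every term, else the one covering the most."""
    best_id: Optional[str] = None
    best_count = -1
    for preview_id, tags in previews.items():
        matched = count_matched_terms(query_terms, tags)
        if matched == len(query_terms):
            return preview_id  # first preview depicting all content features
        if matched > best_count:
            best_id, best_count = preview_id, matched
    return best_id


# The four-term example from the text: the second preview covers three terms.
query = ["dog", "beach", "ball", "sunset"]
previews = {
    "preview-1": {"dog", "beach"},
    "preview-2": {"dog", "ball", "sunset"},
}
print(select_preview(query, previews))
```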

[0094] In one or more additional embodiments, media system 108 may select two or more preview images associated with a single media object to provide to client device 102 in response to a search query. For example, media system 108 may select a first preview image and a second preview image of a media object because both the first and second preview images are sufficiently relevant to the search query (e.g., each satisfies an association threshold for one or more search terms in the search query). In other words, in some embodiments, a media object may include multiple preview images associated with a search query based on the content features depicted within the preview images. As a non-limiting example, a first preview image of a media object may be associated with a first search term of the search query (e.g., dog), and a second preview image of the media object may be associated with a second search term of the search query (e.g., cat). In such an instance, media system 108 may select both the first and second preview images to provide to the client device in response to the search query. For example, media system 108 may provide the first and second preview images as separate results, where each preview image is linked to a corresponding keyframe of the same media object.
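
One way to read the association threshold is sketched below. The per-term scores, the threshold value of 0.8, and all names are assumptions made for illustration; the text does not specify how the association between a preview image and a search term is scored.

```python
from typing import Dict, List


def select_relevant_previews(
    query_terms: List[str],
    preview_scores: Dict[str, Dict[str, float]],
    threshold: float = 0.8,  # assumed value, not taken from the disclosure
) -> List[str]:
    """Return every preview whose score for at least one search term meets the threshold."""
    selected = []
    for preview_id, scores in preview_scores.items():
        if any(scores.get(term, 0.0) >= threshold for term in query_terms):
            selected.append(preview_id)
    return selected


# Hypothetical scores: one preview depicts a dog, another a cat.
scores = {
    "preview-1": {"dog": 0.9},
    "preview-2": {"cat": 0.85},
    "preview-3": {"dog": 0.3},
}
print(select_relevant_previews(["dog", "cat"], scores))
```

Here both the dog preview and the cat preview would be returned for the same media object, while the low-scoring third preview is left out.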

[0095] In addition to selecting one or more preview images of a single media object to provide to the client device, as shown in step 258 of Figure 2C, media system 108 may optionally generate a combined preview image for the client device. Specifically, when media system 108 selects two or more preview images of a single media object to provide to the client device, media system 108 may combine the two or more preview images to form a combined preview image. For example, in some instances, media system 108 may generate a collage of two or more preview images of a single media object. In such embodiments, media system 108 generates a combined preview image to ensure that as many content features as possible relevant to the search query are shown in the combined preview image. As a result, via the client device (e.g., client device 102), the user (e.g., user 110) can more easily identify media objects with multiple content features that may be of interest to the user based on the user's search query.

[0096] In one or more embodiments, the combined preview image may include a thumbnail "slideshow" arrangement, wherein a selected preview image is displayed for a period of time and then replaced by the next selected preview image. Alternatively, a combined preview image may present the most relevant preview image together with a graphic element indicating that one or more additional preview images are available. A user can interact with this graphic element (e.g., by clicking the graphic element or providing a touch gesture) to access the one or more additional preview images relevant to a specific media object and the search query provided by the user.

[0097] Once the preview image of the relevant media object has been selected and/or generated, as shown in step 260 of Figure 2C, media system 108 can provide the preview image to the client device for presentation to the user. For example, media system 108 can provide a preview image of the media object within the results page, causing the search engine to display the preview image to user 110 via the display of client device 102. Furthermore, the preview image may include a hyperlink to the media object, such that by selecting the preview image, client device 102 requests and downloads a copy of the media object to client device 102 for presentation to user 110.

[0098] Although the media hosting device is described above as performing steps 204-230, in some embodiments, the media hosting device may provide the media object to a third-party server, and the third-party server may perform steps 204-230. Furthermore, in some embodiments, the third-party server may provide a preview image to the client device in response to a search query.

[0099] Figure 5 illustrates a schematic diagram of a media hosting server device 104 having a media system 108 according to one or more embodiments. The media system 108 may be an example embodiment of the media system 108 described with reference to Figures 1-4. The media system 108 may include various components for performing the processes and features described herein. For example, as illustrated in Figure 5, media system 108 includes a keyframe identifier 502, a content feature detector 504, a tag assigner 506, a preview image generator 508, a query manager 510, and a media object database 512. Additionally, media system 108 may include additional components, such as those described below but not shown. The various components of media system 108 can communicate with each other using any suitable communication protocol.

[0100] Each component of the media system 108 may be implemented using a computing device (e.g., media hosting server device 104) including at least one processor that executes instructions causing the media system 108 to perform the processes described herein. As described above, the components of the media system 108 may be implemented using a single media hosting server device 104 or across multiple media hosting server devices 104. Although a specific number of components are shown in Figure 5, media system 108 may include more components, or these components may be combined into fewer components (such as a single component), as may be desired for a particular implementation.

[0101] As briefly mentioned above, the media system 108 includes a keyframe identifier 502. As described above with respect to step 206 of Figure 2A, the keyframe identifier 502 can manage the identification of keyframes for media objects. Specifically, the keyframe identifier 502 can utilize both non-content-based and content-based methods for identifying keyframes of media objects. Furthermore, as described above with respect to step 212 of Figure 2A, the keyframe identifier 502 can manage the selection of keyframes in order to generate potential preview images for media items.

[0102] As discussed above, the media system 108 further includes a content feature detector 504. The content feature detector 504 can manage the detection of content features within keyframes identified by the keyframe identifier 502. Specifically, the content feature detector 504 can use content feature recognition techniques (e.g., machine learning) to detect content features within keyframes. In some example embodiments, after detecting content features, the content feature detector 504 can store data representing the detected content features. For example, the content feature detector 504 can use any of the methods discussed above with respect to step 208 of Figure 2A to detect the content features of a media object and store data related to those content features.

[0103] As mentioned above, media system 108 includes a tag assigner 506. Tag assigner 506 can generate tags and assign them to selected keyframes. Specifically, tag assigner 506 can generate tags indicating content features of keyframes (such as those detected by content feature detector 504) and assign them to the keyframes. In some example embodiments, tag assigner 506 can store data representing tags in media object database 512 and can associate tags with corresponding keyframes of media objects. Furthermore, tag assigner 506 can use any of the methods described above with respect to step 226 of Figure 2B to assign a tag to a selected keyframe.

[0104] As briefly mentioned above, media system 108 includes a preview image generator 508. The preview image generator 508 can manage the generation of preview images of media objects based on selected keyframes. For example, after keyframe identifier 502 has selected keyframes to be used as the basis for potential preview images of media objects, as described above with respect to step 212 of Figure 2A, the preview image generator 508 can generate a preview image that includes the selected keyframe images. Furthermore, as noted above, the preview image generator 508 can combine two or more images from two or more selected keyframes to include within the preview image of the media object (e.g., the preview image generator 508 can combine two or more keyframes into a collage to include in the preview image). The preview image generator 508 can use any of the methods described above with respect to step 228 of Figure 2B or step 258 of Figure 2C to generate a preview image.

[0105] Additionally, as discussed above, media system 108 may further include query manager 510. Query manager 510 can manage receiving search queries from, for example, client device 102, and querying media object database 512 of media system 108 to identify media objects associated with the search query. For example, after receiving a search query with search terms, query manager 510 can query media object database 512 to compare the search terms of the search query with the tags of selected keyframes of media objects. After finding a match, if a preview image needs to be generated, query manager 510 can provide the associated selected keyframe(s) to preview image generator 508. Otherwise, after finding a match, media system 108 can provide the associated preview images to the client device, as described above with respect to step 260 of Figure 2C.

[0106] As mentioned above, media system 108 includes a media object database 512. The media object database 512 may include a single database or multiple databases. Additionally, the media object database 512 may be located within media system 108. Alternatively, the media object database 512 may be located outside media system 108, such as in the cloud or on a remote storage device. Furthermore, as discussed further below and as described above with respect to Figures 2A-4, the media object database 512 can store data and information and provide the data and information to the media system 108.

[0107] The media object database 512 may include media objects 514 provided to the media hosting server device 104. Each media object 514 may have a media object identifier number (or simply a "media object ID number") to provide a unique identifier. In some cases, media objects 514 may be organized according to their media object ID numbers. Alternatively, media objects 514 in the media object database 512 may be organized according to other criteria such as creation date, timestamp, last modification date, latest result, etc.

[0108] As shown in Figure 5, each media object 514 in the media object database 512 may include keyframes 516, tags 518, preview images 520, and metadata 522. The media system 108 may store keyframes 516 grouped according to their respective media objects 514. Additionally, each keyframe 516 may have a unique keyframe identifier number (or simply a "keyframe ID number"). In some cases, the keyframe ID number may also identify the media object 514 to which the keyframe 516 belongs. For example, all keyframes 516 from a specific media object 514 may include the media object ID number within the keyframe ID number.
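
As a hedged illustration of a keyframe ID number that embeds the media object ID number, the sketch below joins the two with a hyphen. The format, the function names, and the use of a frame index as the keyframe-specific part are assumptions; the text only states that the keyframe ID may contain the media object ID.

```python
def make_keyframe_id(media_object_id: int, frame_index: int) -> str:
    """Build a keyframe ID whose prefix is the non-negative media object ID (assumed format)."""
    return f"{media_object_id}-{frame_index}"


def media_object_id_from_keyframe_id(keyframe_id: str) -> int:
    """Recover the media object ID from the prefix of a keyframe ID."""
    prefix, _, _ = keyframe_id.partition("-")
    return int(prefix)


keyframe_id = make_keyframe_id(media_object_id=1042, frame_index=37)
print(keyframe_id)
print(media_object_id_from_keyframe_id(keyframe_id))
```

With an ID scheme of this kind, a lookup that starts from a tagged keyframe can find the owning media object without a separate join table.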

[0109] Additionally, each tag 518 of the media object 514 can be associated with a keyframe 516 of the media object 514. When the media system 108 detects the content features of the keyframe 516 and assigns a tag 518 to the keyframe 516, as discussed above with respect to Figure 2A, media system 108 can add the tag 518 to the tags 518 of the media object 514. Thus, tags 518 can comprise an accumulated set of tags 518 for media object 514. Additionally, each tag 518 can have a unique tag identifier (or simply a "tag ID"). In some instances, the tag ID can identify the media object 514 and/or the selected keyframe 516 associated with the tag 518. For example, based on the tag ID, media system 108 can identify the corresponding media object 514 and/or the selected keyframe 516 of media object 514.

[0110] As noted above, media objects 514 in the media object database 512 can be associated with one or more potential preview images 520. Furthermore, each preview image 520 for a media object 514 can be associated with at least one selected keyframe 516 and associated tag 518 for that media object 514. When the media system 108 generates a preview image 520 based on the detected content features of the media object 514 and the selected keyframe 516, as discussed above with respect to Figure 2A, media system 108 can add the preview image 520 to the potential preview images 520 of the media object 514. Thus, preview images 520 can comprise an accumulated set of preview images 520 for a given media object 514. Additionally, each preview image 520 can have a unique identifier (or simply a "preview image ID"). In some instances, the preview image ID can identify the media object 514 and/or the selected keyframe 516 associated with it. For example, based on the preview image ID, media system 108 can identify the corresponding media object 514 and/or the selected keyframe 516 of the media object 514.

[0111] As briefly mentioned above, media object 514 may further include metadata 522 associated with media object 514. For example, media object 514 may include metadata 522 such as the metadata described above with respect to Figure 2B. In particular, the metadata 522 can associate keyframes 516, tags 518, and preview images 520 with media object 514.

[0112] Figure 6 illustrates a flowchart of an example method 600 for providing a relevant video scene (e.g., a preview image) in response to a video search query. Method 600 can be implemented by the media system 108 described above. Method 600 involves an action 610 of analyzing video content within a media object 514 to determine multiple keyframes. For example, action 610 may include analyzing, by at least one processor, a media object that includes multiple frames of video content to determine multiple keyframes within the video content.

[0113] Additionally, method 600 involves an action 620 of identifying content features depicted in each of a plurality of keyframes. For example, action 620 may include identifying content features depicted in each of the plurality of keyframes using at least one processor. For example, media system 108 may utilize machine learning and/or deep learning to detect and identify one or more content features within each keyframe. Specifically, action 620 may include detecting objects, activities (e.g., running, skydiving, hiking), photo types (e.g., macro, portrait, etc.), emotions (e.g., smiling, crying, etc.), animals, landscapes (e.g., beach, forest, mountains), locations, colors, etc., depicted in keyframe 516 of media object 514. Furthermore, action 620 may include using machine learning to detect content features depicted in keyframe 516. Additionally, action 620 may include any of the actions described above with respect to step 208 of Figure 2A.

[0114] Furthermore, method 600 involves an action 630 of associating tags with content features identified within each keyframe. For example, action 630 may include associating, by at least one processor, a tag with each of a plurality of keyframes, wherein a given tag corresponds to a given content feature depicted in a given keyframe of the plurality of keyframes. For example, action 630 may include associating one or more tags 518 with one or more keyframes 516 of a plurality of keyframes 516 of a media object 514 within a media object database 512 (e.g., within a first or second data table of the media object database 512). Additionally, action 630 may include any of the actions described above with respect to step 226 of Figure 2B.

[0115] Method 600 also involves an action 640 of determining to provide media objects as search results for a search query. For example, action 640 may include determining to provide media objects as search results for a search query received from a client device. For example, a client device associated with a user may send a search query to media system 108 including one or more search terms, and media system 108 may determine one or more media objects to be provided in the search results list.

[0116] Furthermore, method 600 may further include an action 650 of selecting a keyframe from a plurality of keyframes as a preview image based on a search query corresponding to a tag. For example, action 650 may include selecting a keyframe from a plurality of keyframes as a preview image for a media object, based on a search query relating to specific content features depicted in the selected keyframe, as indicated by a specific tag associated with the selected keyframe.

[0117] Furthermore, method 600 involves an action 660 of providing a preview image in response to a search query. For example, action 660 may include providing a preview image of a media object to a client device in response to a search query. Additionally, method 600 may include selecting at least one keyframe 516 of a plurality of keyframes 516 of the media object 514 based on a confidence value associated with each detected content feature of each keyframe 516 in the plurality of keyframes 516 (e.g., selecting, for each detected content feature, at least one keyframe 516 of the plurality of keyframes 516 that has the highest confidence value, to be included in the preview image 520 of the media item). Furthermore, method 600 may also include discarding unselected keyframes 516 of the plurality of keyframes 516 of the media object 514. Furthermore, method 600 may include generating a plurality of preview images of the media object 514. Additionally, method 600 may include any of the actions described above with respect to Figures 2A-4.
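
The confidence-based keyframe selection mentioned above can be sketched as follows, assuming detections are available as a mapping from keyframe ID to per-feature confidence values. The data layout, names, and tie-breaking (the earliest keyframe wins on equal confidence) are illustrative assumptions.

```python
from typing import Dict, Set, Tuple


def select_keyframes_by_confidence(
    detections: Dict[str, Dict[str, float]],
) -> Dict[str, str]:
    """Map each detected content feature to the keyframe with the highest confidence."""
    best: Dict[str, Tuple[str, float]] = {}
    for keyframe_id, features in detections.items():
        for feature, confidence in features.items():
            if feature not in best or confidence > best[feature][1]:
                best[feature] = (keyframe_id, confidence)
    return {feature: keyframe_id for feature, (keyframe_id, _) in best.items()}


def unselected_keyframes(
    detections: Dict[str, Dict[str, float]], selected: Dict[str, str]
) -> Set[str]:
    """Return the keyframes that were not chosen for any feature and can be discarded."""
    return set(detections) - set(selected.values())


# Hypothetical detector output for three keyframes.
detections = {
    "kf-1": {"dog": 0.9, "beach": 0.4},
    "kf-2": {"dog": 0.6, "beach": 0.8},
    "kf-3": {"dog": 0.5},
}
selected = select_keyframes_by_confidence(detections)
print(selected)
print(unselected_keyframes(detections, selected))
```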

[0118] Figure 7 illustrates a flowchart of another example method 700 for providing a relevant video scene (e.g., preview image 520) in response to a video search query. Method 700 can be implemented by the media system 108 described above. Method 700 involves an action 710 of identifying keyframes within video content. For example, action 710 may include identifying, by at least one processor of the media hosting server device, keyframes from multiple frames maintained on the media hosting server device. Furthermore, action 710 may include utilizing content-based and non-content-based methods to identify keyframes 516 of media object 514.

[0119] Method 700 may further include action 720 of identifying at least one content feature depicted in each keyframe. Specifically, action 720 may include: identifying at least one content feature depicted in each keyframe by means of the at least one processor. For example, action 720 may further include: detecting objects, activities (e.g., running, skydiving, hiking), photo types (e.g., macro, portrait, etc.), emotions (e.g., smiling, crying, etc.), animals, landscapes (e.g., beach, forest, mountains), locations, colors, etc., depicted in keyframe 516 of media object 514. Furthermore, action 720 may include: using machine learning to detect content features depicted in keyframe 516.

[0120] Additionally, method 700 includes action 730 of determining a confidence value for the at least one content feature. Specifically, action 730 may include: determining a confidence value for the at least one content feature depicted in each keyframe 516 by at least one processor. For example, action 730 may include: assigning the confidence value to each detected content feature among one or more content features determined by content feature recognition techniques (such as, for example, machine learning as described above).

[0121] Method 700 also involves an action 740 of associating a tag indicating a given content feature with each keyframe based on a confidence value. For example, action 740 may include associating a tag with a keyframe based on the confidence value for the at least one content feature depicted in each keyframe, the given tag indicating the given content feature depicted in the given keyframe. For example, action 740 may include associating one or more tags 518 with one or more keyframes 516 of media object 514 within a media object database 512 (e.g., within a first or second data table 300, 400 of the media object database 512).
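
One plausible reading of associating tags "based on a confidence value" is to keep a tag only when the detector's confidence meets a minimum. The sketch below assumes that reading; the cutoff of 0.5, the names, and the sample values are hypothetical and are not stated in the text.

```python
from typing import Dict, List


def assign_tags(
    detected_features: Dict[str, float],
    min_confidence: float = 0.5,  # assumed cutoff, not taken from the disclosure
) -> List[str]:
    """Return, in sorted order, tags for features whose confidence meets the cutoff."""
    return sorted(
        feature
        for feature, confidence in detected_features.items()
        if confidence >= min_confidence
    )


# Hypothetical detector output for one keyframe.
print(assign_tags({"dog": 0.92, "cat": 0.2, "beach": 0.5}))
```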

[0122] Furthermore, method 700 involves an action 750 of selecting a keyframe as a preview image for the video content based on a received search query and on the search query's relation to specific content features of the keyframe (as indicated by associated tags). For example, action 750 may include selecting a keyframe as a preview image for the video content, based on a received search query, by determining that the search query is related to specific content features depicted in the selected keyframe, as indicated by specific tags associated with the selected keyframe.

[0123] In addition to the steps illustrated, method 700 may further include storing data in a media object database 512 that associates one or more tags 518 with keyframes 516 of media object 514. Additionally, method 700 may further include determining frame specifications, including determining the frame rate and number of frames of media object 514. Furthermore, method 700 may include selecting, for each detected content feature, at least one keyframe 516 of media object 514 for inclusion in a preview image 520 of media object 514. Additionally, method 700 may include any of the actions described above with respect to Figure 2A and Figure 2B.

[0124] Figure 8 illustrates a block diagram of an example computing device 800 that can be configured to perform one or more of the processes described above. It will be understood that one or more computing devices, such as computing device 800, may implement media system 108 and/or client device 102. As shown in Figure 8, computing device 800 may include a processor 802, a memory 804, a storage device 806, an I/O interface 808, and a communication interface 810, which can be communicatively coupled via communication infrastructure 812. Although an example computing device 800 is shown in Figure 8, the components illustrated in Figure 8 are not intended to be limiting. Additional or alternative components may be used in other embodiments. Furthermore, in some embodiments, the computing device 800 may include fewer components than those shown in Figure 8. The components of the computing device 800 shown in Figure 8 are described in more detail below.

[0125] In one or more embodiments, processor 802 includes hardware for executing instructions, such as those constituting a computer program. By way of example, and not limitation, to execute instructions, processor 802 may fetch (or retrieve) the instructions from an internal register, an internal cache, memory 804, or storage device 806, and decode and execute them. In one or more embodiments, processor 802 may include one or more internal caches for data, instructions, or addresses. By way of example, and not limitation, processor 802 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 804 or storage device 806.

[0126] Computing device 800 includes memory 804 coupled to processor(s) 802. Memory 804 can be used to store data, metadata, and programs for execution by processor(s). Memory 804 may include one or more of volatile and non-volatile memories such as random access memory (“RAM”), read-only memory (“ROM”), solid-state drive (“SSD”), flash memory, phase-change memory (“PCM”), or other types of data storage devices. Memory 804 may be internal memory or distributed memory.

[0127] Computing device 800 includes storage device 806, which includes storage for storing data or instructions. By way of example and not limitation, storage device 806 may include the non-transitory storage media described above. Storage device 806 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, storage device 806 may include removable or non-removable (or fixed) media. Storage device 806 may be internal or external to computing device 800. In one or more embodiments, storage device 806 is a non-volatile, solid-state memory. In other embodiments, storage device 806 includes read-only memory (ROM). Where appropriate, this ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.

[0128] The computing device 800 also includes one or more input or output ("I/O") devices/interfaces 808 provided to allow a user to provide input to the computing device 800, receive output from the computing device 800, and otherwise transfer data to and receive data from the computing device 800. The I/O devices/interfaces 808 may include a mouse, keypad or keyboard, touchscreen, camera, optical scanner, network interface, modem, other known I/O devices, or combinations of such I/O devices/interfaces. A touchscreen may be activated by a stylus or finger.

[0129] I/O devices/interfaces 808 may include one or more devices for presenting output to a user, including but not limited to: a graphics engine, a display (e.g., a screen), one or more output drivers (e.g., a display driver), one or more speakers, and one or more audio drivers. In some embodiments, I/O interface 808 is configured to provide graphics data to a display for presentation to a user. The graphics data may represent one or more graphical user interfaces and/or any other graphical content that may be used in a particular implementation.

[0130] The computing device 800 may further include a communication interface 810. The communication interface 810 may include hardware, software, or both. The communication interface 810 provides one or more interfaces for communication (such as packet-based communication) between the computing device 800 and one or more other computing devices or networks. By way of example and not limitation, the communication interface 810 may include a network interface controller (NIC) or network adapter for communicating with Ethernet or other wired networks, or a wireless NIC (WNIC) or wireless adapter for communicating with wireless networks (such as Wi-Fi). The computing device 800 may further include a bus 812. The bus 812 may include hardware, software, or both for coupling components of the computing device 800 to each other.

[0131] The foregoing specification has been described with reference to specific example embodiments. Various embodiments and aspects of this disclosure have been described with reference to the details discussed herein, and the accompanying drawings illustrate various embodiments. The foregoing description and drawings are illustrative and should not be construed as limiting. Numerous specific details have been described to provide a thorough understanding of various embodiments.

[0132] Additional or alternative embodiments may be implemented in other specific forms without departing from their spirit or essential characteristics. The described embodiments are to be considered in all respects as illustrative rather than restrictive. Therefore, the scope of the invention is indicated by the appended claims, not by the foregoing description. All modifications within the meaning and scope of the equivalents of the claims will be included within their scope.

[0133] The embodiments described above and illustrated in the accompanying drawings do not limit the scope of the invention, as these embodiments are merely examples of embodiments of the invention, which is defined by the appended claims and their equivalents. Any equivalent embodiments are intended to be within the scope of the invention. In fact, various modifications to this disclosure will become apparent to those skilled in the art from the specification, in addition to those modifications shown and described herein (such as alternative useful combinations of the described features). Such modifications and embodiments are also intended to fall within the scope of the appended claims and their legal equivalents.

Claims

1. A method for providing relevant overlay frames in response to a video search query, the method comprising: receiving a search query that includes one or more search terms; identifying a digital video set responsive to the search query; for each digital video from the digital video set, determining a video frame from the digital video for use in a preview image of the digital video by comparing the one or more search terms with tags associated with video frames of the digital video, the determined video frame depicting content features corresponding to the one or more search terms, wherein determining the video frame from the digital video for use in the preview image of the digital video for each digital video from the digital video set includes, for a first digital video: determining a first keyframe having an associated tag corresponding to each of the one or more search terms; determining a second keyframe having an associated tag corresponding to each of the one or more search terms; comparing one or more first confidence values for the first keyframe with one or more second confidence values for the second keyframe; and based on comparing the one or more first confidence values and the one or more second confidence values, selecting the first keyframe as the preview image for the first digital video; and in response to the search query, providing the preview image for each digital video from the digital video set, wherein each preview image for the corresponding digital video from the digital video set includes the determined video frame depicting the content features corresponding to the one or more search terms.

2. The method according to claim 1, wherein: receiving the search query includes receiving two or more search terms; determining the video frame includes determining a video frame having an associated tag corresponding to each of the two or more search terms; and providing the preview image in response to the search query includes providing the video frame having an associated tag corresponding to each of the two or more search terms and depicting the content features corresponding to each of the two or more search terms.

3. The method according to claim 1, wherein: determining the video frame from each digital video from the set of digital videos for use in the preview image for the digital video includes identifying a first video frame from a first digital video, the first video frame depicting the content features corresponding to the one or more search terms; and providing the preview image for each digital video from the set of digital videos in response to the search query includes providing the first video frame depicting the content features corresponding to the one or more search terms.

4. The method according to claim 3, further comprising: receiving a second search query that includes one or more different search terms; identifying, from the first digital video, a second video frame depicting content features corresponding to the one or more different search terms; and in response to the second search query, providing the second video frame as a preview image of the first digital video.

5. The method of claim 1, wherein identifying the set of digital videos responsive to the search query comprises identifying a plurality of digital videos having associated tags corresponding to at least one of the one or more search terms.

6. The method of claim 5, wherein identifying the plurality of digital videos having an associated tag corresponding to at least one of the one or more search terms comprises identifying digital videos having associated tags that are derived from, synonyms of, or related to a search term.
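As a rough illustration of the matching described in claims 5 and 6, the following Python sketch expands each search term with related terms before comparing against video-level tags. The `RELATED_TERMS` table and both function names are hypothetical; the claims do not specify how derived, synonymous, or related terms are obtained.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical lookup table of synonyms and related terms.
RELATED_TERMS: Dict[str, Set[str]] = {
    "dog": {"puppy", "canine"},
    "car": {"automobile"},
}


def expand_term(term: str, related: Dict[str, Set[str]] = RELATED_TERMS) -> Set[str]:
    # The term itself plus anything the table lists as related to it.
    return {term} | related.get(term, set())


def matching_videos(
    video_tags: Dict[str, Iterable[str]],
    terms: Iterable[str],
    related: Dict[str, Set[str]] = RELATED_TERMS,
) -> List[str]:
    """Return IDs of videos with a tag matching at least one expanded term."""
    expanded: Set[str] = set()
    for term in terms:
        expanded |= expand_term(term.lower(), related)
    return [
        video_id
        for video_id, tags in video_tags.items()
        if expanded & {tag.lower() for tag in tags}
    ]


video_tags = {"v1": {"puppy", "park"}, "v2": {"cat"}, "v3": {"Car"}}
hits = matching_videos(video_tags, ["dog"])
```

Here a query for "dog" returns `v1` even though that video is only tagged "puppy", which is the behavior claim 6 describes. Matching is case-insensitive by assumption.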

7. The method of claim 1, wherein providing the preview image for each digital video from the set of digital videos in response to the search query comprises providing a plurality of preview images.

8. The method of claim 7, further comprising: receiving a selection of a first preview image depicting the content features corresponding to the one or more search terms; and in response to the selection, providing access to a first digital video from which the first preview image was obtained.

9. A non-transitory computer-readable medium storing instructions thereon, the instructions, when executed by at least one processor, causing a computer system to: identify content features depicted in video frames of a digital video; associate a tag with a video frame of the digital video, the tag indicating an identified content feature depicted in the video frame; receive a search query that includes one or more search terms; identify a set of digital videos responsive to the search query; for each digital video from the set of digital videos, determine a video frame from the digital video for use in a preview image of the digital video by comparing the one or more search terms with one or more tags associated with one or more video frames of the digital video, wherein the determined video frame depicts content features corresponding to the one or more search terms; in response to the search query, provide the preview image for each digital video from the set of digital videos, wherein each preview image for a given digital video includes the determined video frame for the given digital video that depicts the content features corresponding to the one or more search terms; determine a confidence value for the content feature depicted within the video frame; and associate the tag indicating that the content feature is depicted within the video frame with the video frame based on the confidence value being higher than a threshold.
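The threshold-based tagging at the end of claim 9 can be sketched in a few lines of Python. This is an illustrative assumption only: the `tag_frame` and `tag_video` names, the detection dictionaries, and the default threshold of 0.5 are not taken from the patent, and the detector that produces the confidence values is out of scope here.

```python
from typing import Dict


def tag_frame(detections: Dict[str, float], threshold: float = 0.5) -> Dict[str, float]:
    """Keep only content features whose confidence is higher than the threshold.

    `detections` maps a detected content feature (e.g. "dog") to the
    confidence value reported by some upstream detector (hypothetical).
    """
    return {label: conf for label, conf in detections.items() if conf > threshold}


def tag_video(
    frame_detections: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Apply tag_frame to every frame of a video, keyed by frame ID."""
    return {
        frame_id: tag_frame(detections, threshold)
        for frame_id, detections in frame_detections.items()
    }


tags = tag_frame({"dog": 0.92, "ball": 0.5, "tree": 0.2}, threshold=0.5)
```

The comparison is strict (`>`), matching the claim wording "higher than a threshold", so a feature detected at exactly the threshold is not tagged. Keeping the confidence value alongside each tag lets a later step compare keyframes by confidence.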

10. The non-transitory computer-readable medium of claim 9, wherein the instructions, when executed by the at least one processor, cause the computer system to: identify the content features depicted in the video frames of the digital video by identifying objects, activities, or colors within keyframes of the digital video.

11. The non-transitory computer-readable medium of claim 9, further comprising instructions that, when executed by the at least one processor, cause the computer system to: group video frames with shared content features into a cluster of video frames; and select a representative video frame from the cluster of video frames as a keyframe.
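One possible reading of the clustering step in claim 11 is sketched below in Python. The claim does not name a clustering method, so everything here is an assumption: frames are grouped in temporal order while their feature sets stay similar (Jaccard similarity against the first frame of the current cluster), and the middle frame of each cluster is taken as its representative keyframe.

```python
from typing import List, Set, Tuple

Frame = Tuple[str, Set[str]]  # (frame ID, content features); hypothetical shape


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Similarity of two feature sets; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_frames(frames: List[Frame], min_similarity: float = 0.5) -> List[List[Frame]]:
    """Group temporally ordered frames that share content features."""
    clusters: List[List[Frame]] = []
    for frame_id, features in frames:
        # Compare against the first frame of the most recent cluster.
        if clusters and jaccard(clusters[-1][0][1], features) >= min_similarity:
            clusters[-1].append((frame_id, features))
        else:
            clusters.append([(frame_id, features)])
    return clusters


def pick_keyframes(clusters: List[List[Frame]]) -> List[str]:
    # Assumption: the middle frame of each cluster is its representative.
    return [cluster[len(cluster) // 2][0] for cluster in clusters]


frames = [
    ("f0", {"dog", "grass"}),
    ("f1", {"dog", "grass", "ball"}),
    ("f2", {"dog", "grass"}),
    ("f3", {"car", "road"}),
    ("f4", {"car", "road"}),
]
keyframes = pick_keyframes(cluster_frames(frames))
```

For this made-up input the frames fall into two clusters (the dog scene and the car scene), yielding one keyframe per scene. Other representative choices, such as the frame closest to the cluster centroid, would fit the claim language equally well.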

12. The non-transitory computer-readable medium of claim 9, wherein the instructions, when executed by the at least one processor, cause the computer system to: identify the content features depicted in the video frames of the digital video using a convolutional neural network.

13. The non-transitory computer-readable medium of claim 9, wherein the instructions, when executed by the at least one processor, cause the computer system to: receive the search query by receiving two or more search terms; determine the video frame depicting the content features corresponding to the one or more search terms by determining that the video frame has an associated tag corresponding to each of the two or more search terms; and in response to the search query, provide the preview image by providing the video frame having an associated tag corresponding to each of the two or more search terms and depicting the content features corresponding to each of the two or more search terms.

14. A system for providing relevant overlay frames in response to a video search query, comprising: a digital storage device including a database of digital videos; and at least one server configured to cause the system to: receive, from a client device, a search query that includes two or more search terms; identify, from the database of digital videos, a set of digital videos responsive to the search query by identifying digital videos having associated tags corresponding to at least one of the two or more search terms; for each digital video from the set of digital videos, determine a video frame from the digital video for use in a preview image of the digital video by comparing the two or more search terms with a tag associated with a video frame of the digital video, the determined video frame depicting content features corresponding to the two or more search terms; in response to the search query, send to the client device a preview image for each digital video from the set of digital videos, wherein each preview image for a corresponding digital video from the set of digital videos includes a corresponding video frame depicting the content features corresponding to the two or more search terms; and determine the video frame from each digital video in the set of digital videos for use in the preview image for the digital video by identifying the keyframe that has the highest confidence value for the tag corresponding to the two or more search terms.

15. The system of claim 14, wherein the at least one server is further configured to cause the system to attach, to each preview image, a hyperlink to a corresponding digital video.

16. The system of claim 14, wherein the at least one server is further configured to cause the system to select the preview image for a given digital video by: determining a first video frame and a second video frame that depict the content features corresponding to each of the two or more search terms; comparing a first confidence value for the first video frame with a second confidence value for the second video frame; and selecting the first video frame as the preview image for the given digital video based on determining that the first confidence value exceeds the second confidence value.

17. The system of claim 14, wherein the at least one server is further configured to cause the system to: analyze the digital videos from the database in response to the search query to determine a plurality of keyframes for each digital video; and for each digital video from the set of digital videos, determine the video frame depicting the content features corresponding to each of the two or more search terms by comparing the two or more search terms with the tags associated with the plurality of keyframes for each digital video.

18. A system for providing relevant overlay frames in response to a video search query, comprising: a digital storage device including a database of digital videos; and at least one server configured to cause the system to: receive, from a client device, a search query that includes two or more search terms; identify, from the database of digital videos, a set of digital videos responsive to the search query by identifying digital videos having associated tags corresponding to at least one of the two or more search terms; for each digital video from the set of digital videos, determine a video frame from the digital video for use in a preview image of the digital video by comparing the two or more search terms with a tag associated with a video frame of the digital video, the determined video frame depicting content features corresponding to the two or more search terms; in response to the search query, send to the client device a preview image for each digital video from the set of digital videos, wherein each preview image for a corresponding digital video from the set of digital videos includes a corresponding video frame depicting the content features corresponding to the two or more search terms; and select the preview image for a given digital video by: determining a first video frame and a second video frame that depict the content features corresponding to each of the two or more search terms; comparing a first confidence value for the first video frame with a second confidence value for the second video frame; and selecting the first video frame as the preview image for the given digital video based on determining that the first confidence value exceeds the second confidence value.

19. The system of claim 18, wherein the at least one server is further configured to cause the system to: determine the video frame from each digital video in the set of digital videos for use in the preview image for the digital video by identifying a keyframe that has the highest confidence value for a tag corresponding to the two or more search terms.

20. The system of claim 18, wherein the at least one server is further configured to cause the system to attach, to each preview image, a hyperlink to a corresponding digital video.

21. The system of claim 18, wherein the at least one server is further configured to cause the system to: analyze the digital videos from the database in response to the search query to determine a plurality of keyframes for each digital video; and for each digital video from the set of digital videos, determine the video frame depicting the content features corresponding to each of the two or more search terms by comparing the two or more search terms with the tags associated with the plurality of keyframes for each digital video.