Video search method, apparatus, medium, and computing device
By identifying images indicating preset categories, application scenarios, and/or purposes using target parameters of each frame in the video, and then searching for videos similar to the video using image features and the images to be retrieved, this technology solves the problem of low efficiency in finding similar videos in existing technologies and achieves efficient video search.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU NETZHIYI INNOVATION TECH CO LTD
- Filing Date
- 2023-08-10
- Publication Date
- 2026-06-19
AI Technical Summary
When searching for similar videos in a massive video database, existing technologies suffer from low efficiency, particularly image feature matching methods for finding similar videos.
This video search method uses the target parameters of each frame in a video to determine the images that indicate a preset category, application scenario, and/or purpose, and then uses the image features of those images to search for similar videos. In some embodiments, the remaining frames are clustered by image features so that only the cluster sets indicating the main content of the video are compared. Either way, the method reduces the number of comparisons between the frames of the video and the image to be searched, thus improving the efficiency of finding similar videos.
By identifying images that indicate preset categories, application scenarios, and/or purposes using target parameters of each frame in the video, and then searching for videos similar to the video by comparing the images with the images to be retrieved, the number of comparisons between the images and each frame in the video is reduced, thus improving the efficiency of finding similar videos.
Figure CN117112839B_ABST
Abstract
Description
Technical Field
[0001] Embodiments of this disclosure relate to the field of images, and more specifically to video search methods, apparatus, media, and computing devices.
Background Technology
[0002] This section is intended to provide background or context for embodiments of this disclosure. The description herein is not intended to imply that it is prior art simply because it is included in this section.
[0003] When searching for videos, you can use images as references, that is, search for videos that are similar to existing images.
[0004] An exemplary technique compares the image features of an image with the image features of each frame of every video in a database to find videos similar to the image. However, the database contains a massive amount of video data, making the search for similar videos time-consuming and inefficient.
Summary of the Invention
[0005] This disclosure provides a video search method, apparatus, medium, and computing device to address the low efficiency of searching for videos similar to an image.
[0006] In a first aspect of this disclosure, a video search method is provided, comprising: acquiring target parameters for each frame of a first image in a first video, the target parameters indicating at least one of an image category, an application scenario, and a purpose of the first image; determining a second image among each of the first images based on the target parameters, wherein the image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and / or the purpose of the second image is a preset purpose; determining a first similarity between the third image and each frame of the second image based on image features of the second image and image features of a third image to be searched; and determining a video similar to the third image among a plurality of second videos based on each of the first similarities, wherein the first video is any one of the second videos.
[0007] In one embodiment of this disclosure, determining a video similar to the third image based on each of the first similarities includes: acquiring image features of each frame of a fourth image, wherein the fourth image is a first image other than the second image; clustering each frame of the fourth image based on the image features of each of the fourth images to obtain multiple cluster sets, wherein each second similarity corresponding to the cluster set is greater than a first preset similarity, and the second similarity is used to indicate the similarity between any two frames of the fourth images in the cluster set; determining a first cluster set in each of the cluster sets, wherein the image content of the fourth images in the first cluster set is used to indicate the main content of the first video; determining a third similarity between the third image and each of the first cluster sets based on the image features of the fourth images in the first cluster set, and determining a video similar to the third image based on each of the first similarities and each of the third similarities.
[0008] In another embodiment of this disclosure, determining the video similar to the third image based on each of the first similarities and each of the third similarities includes: determining a fourth similarity corresponding to the first video based on each of the first similarities and each of the third similarities; determining a target similarity among the fourth similarity and each of the fifth similarities corresponding to the third video, wherein the target similarity is greater than a second preset similarity, and the third video is a second video other than the first video; and taking the first video or the third video corresponding to the target similarity as the video similar to the third image.
[0009] In another embodiment of this disclosure, determining the fourth similarity corresponding to the first video based on each of the first similarities and each of the third similarities includes: in response to any first similarity or any of the third similarities being greater than a third preset similarity, determining the first similarity or the third similarity being greater than the third preset similarity as the fourth similarity corresponding to the first video.
[0010] In another embodiment of this disclosure, determining the fourth similarity corresponding to the first video based on each of the first similarities and each of the third similarities includes: in response to each of the first similarities and each of the third similarities being less than or equal to a third preset similarity, determining a sixth similarity between the third image and each of the second cluster sets, wherein the second cluster sets are cluster sets other than the first cluster sets; in response to any of the sixth similarities being greater than the third preset similarity, determining the sixth similarity greater than the third preset similarity as the fourth similarity corresponding to the first video.
[0011] In another embodiment of this disclosure, determining the sixth similarity between the third image and each of the second cluster sets includes: determining the first target image features corresponding to the second cluster set based on the image features of each of the fourth images in the second cluster set; and determining the sixth similarity between the third image and the second cluster set based on the image features of the third image and the first target image features.
[0012] In another embodiment of this disclosure, determining the first cluster set in each of the cluster sets includes: obtaining a first number and a second number of fourth images in the cluster set, the second number indicating the number of second similarities greater than a fourth preset similarity, the fourth preset similarity being greater than or equal to the first preset similarity; determining the first cluster set in each of the cluster sets based on the first number and the second number, wherein the first number corresponding to the first cluster set is greater than a first preset number, and the second number corresponding to the first cluster set is greater than a second preset number.
[0013] In another embodiment of this disclosure, determining the third similarity between the third image and each of the first cluster sets based on the image features of the fourth images in the first cluster set includes: determining the second target image features corresponding to the first cluster set based on the image features of each of the fourth images in the first cluster set; and determining the third similarity between the third image and the first cluster set based on the image features of the third image and the second target image features.
[0014] In another embodiment of this disclosure, obtaining the target parameters of each frame of the first image in the first video includes: sampling the images of the first video at a preset interval to obtain multiple frames of fifth images; using each frame of the fifth image as a first image, and obtaining the target parameters of each frame of the first image.
[0015] In a second aspect of this disclosure, a video search apparatus is provided, comprising: an acquisition module, configured to acquire target parameters of each frame of a first image in a first video, the target parameters indicating at least one of an image category, an application scenario, and a purpose of the first image; a first determination module, configured to determine a second image among the respective first images based on the target parameters, wherein the image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and / or the purpose of the second image is a preset purpose; a second determination module, configured to determine a first similarity between the third image and each frame of the second image based on image features of the second image and image features of a third image to be searched; and a third determination module, configured to determine a video similar to the third image among a plurality of second videos based on each of the first similarities, wherein the first video is any one of the second videos.
[0016] In one embodiment of this disclosure, the third determining module includes: a first acquiring unit, configured to acquire image features of each frame of a fourth image, wherein the fourth image is a first image other than the second image; a clustering unit, configured to cluster each frame of the fourth images according to the image features of each fourth image to obtain multiple cluster sets, wherein each second similarity corresponding to the cluster set is greater than a first preset similarity, and the second similarity is used to indicate the similarity between any two frames of the fourth images in the cluster set; a first determining unit, configured to determine a first cluster set in each of the cluster sets, wherein the image content of the fourth images in the first cluster set is used to indicate the main content of the first video; and a second determining unit, configured to determine a third similarity between a third image and each of the first cluster sets based on the image features of the fourth images in the first cluster set, and to determine a video similar to the third image based on each of the first similarities and each of the third similarities.
[0017] In another embodiment of this disclosure, the second determining unit includes: a first determining subunit, configured to determine a fourth similarity corresponding to the first video based on each of the first similarities and each of the third similarities; a second determining subunit, configured to determine a target similarity among the fourth similarity and the fifth similarities corresponding to each of the third videos, wherein the target similarity is greater than a second preset similarity, and the third video is a second video other than the first video; and a third determining subunit, configured to regard the first video or the third video corresponding to the target similarity as a video similar to the third image.
[0018] In another embodiment of this disclosure, the first determining subunit includes: a first determining component, configured to determine the first similarity or the third similarity that is greater than the third preset similarity as the fourth similarity corresponding to the first video, in response to any first similarity or any third similarity being greater than the third preset similarity.
[0019] In another embodiment of this disclosure, the first determining subunit includes: a second determining component, configured to determine a sixth similarity between the third image and each of the second cluster sets, in response to each of the first similarities and each of the third similarities being less than or equal to a third preset similarity, wherein the second cluster set is a cluster set other than the first cluster set; and a third determining component, configured to determine the sixth similarity greater than the third preset similarity as a fourth similarity corresponding to the first video, in response to any sixth similarity being greater than the third preset similarity.
[0020] In another embodiment of this disclosure, the second determining component includes: a first determining module, configured to determine a first target image feature corresponding to the second cluster set based on the image features of each fourth image in the second cluster set; and a second determining module, configured to determine a sixth similarity between the third image and the second cluster set based on the image features of the third image and the first target image feature.
[0021] In another embodiment of this disclosure, the first determining unit includes: an acquisition subunit, configured to acquire a first number and a second number of fourth images in the cluster set, the second number indicating the number of second similarities greater than a fourth preset similarity, the fourth preset similarity being greater than or equal to the first preset similarity; and a fourth determining subunit, configured to determine a first cluster set in each of the cluster sets based on the first number and the second number, the first number corresponding to the first cluster set being greater than a first preset number, and the second number corresponding to the first cluster set being greater than a second preset number.
[0022] In another embodiment of this disclosure, the second determining unit includes: a fifth determining subunit, configured to determine a second target image feature corresponding to the first cluster set based on the image features of each fourth image in the first cluster set; and a sixth determining subunit, configured to determine a third similarity between the third image and the first cluster set based on the image features of the third image and the second target image feature.
[0023] In another embodiment of this disclosure, the acquisition module includes: a sampling unit, configured to sample the images of the first video at preset intervals to obtain multiple frames of fifth images; and a third determining unit, configured to use each frame of the fifth images as a first image and obtain target parameters for each frame of the first image.
[0024] In a third aspect of this disclosure, a medium is provided, comprising: computer-executable instructions which, when executed by a processor, implement the method described above.
[0025] In a fourth aspect of this disclosure, a computing device is provided, comprising:
[0026] a memory and a processor;
[0027] the memory stores computer-executable instructions;
[0028] the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method described above.
[0029] In the embodiments of this disclosure, a second image indicating a preset category, preset application scenario, and / or preset purpose is determined from the target parameters of each frame of first image in the video. Then, the image features of the second image are used to search for videos similar to the third image. This eliminates the need to compare the third image with every frame of first image in the video, reducing the number of comparisons with the image to be searched and improving the efficiency of finding similar videos.
Attached Figure Description
[0030] The above and other objects, features, and advantages of this disclosure will become readily apparent from the following detailed description of exemplary embodiments, taken in conjunction with the accompanying drawings. Several embodiments of this disclosure are illustrated in the drawings by way of example and not limitation, in which:
[0031] Figure 1 is a schematic diagram illustrating an application scenario of the video search method according to embodiments of the present disclosure;
[0032] Figure 2 is a schematic flowchart according to an embodiment of the present disclosure;
[0033] Figure 3 is a schematic flowchart according to another embodiment of the present disclosure;
[0034] Figure 4 is a schematic flowchart according to yet another embodiment of the present disclosure;
[0035] Figure 5 is a schematic flowchart according to another embodiment of the present disclosure;
[0036] Figure 6 is a schematic flowchart according to another embodiment of the present disclosure;
[0037] Figure 7 is a schematic flowchart according to an embodiment of the present disclosure;
[0038] Figure 8 is a schematic diagram of a program product provided according to an embodiment of the present disclosure;
[0039] Figure 9 is a schematic diagram of the structure of a video search device provided according to an embodiment of the present disclosure;
[0040] Figure 10 is a schematic diagram of the structure of a computing device provided according to an embodiment of the present disclosure.
[0041] In the accompanying drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Implementation
[0042] The principles and spirit of this disclosure will now be described with reference to several exemplary embodiments. It should be understood that these embodiments are given merely to enable those skilled in the art to better understand and implement this disclosure, and are not intended to limit the scope of this disclosure in any way. Rather, these embodiments are provided to make this disclosure more thorough and complete, and to fully convey the scope of this disclosure to those skilled in the art.
[0043] Those skilled in the art will recognize that embodiments of this disclosure can be implemented as a system, apparatus, device, method, or computer program product. Therefore, this disclosure can be specifically implemented in the following forms: entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
[0044] According to embodiments of this disclosure, a video search method, apparatus, medium, and computing device are proposed.
[0045] Furthermore, the number of any elements in the accompanying drawings is for illustrative purposes only and not for limitation, and any naming is for distinction only and has no limiting meaning.
[0046] In addition, the data involved in this disclosure may be data authorized by the user or fully authorized by all parties. The collection, dissemination and use of the data shall comply with the requirements of relevant national laws and regulations. The implementations / embodiments of this disclosure may be combined with each other.
[0047] The principles and spirit of this disclosure will be explained in detail below with reference to several representative embodiments.
Invention Overview
[0049] When searching for videos, you can use images as references, that is, search for videos that are similar to existing images.
[0050] The inventors of this disclosure have discovered that comparing the image features of an image with the image features of each frame of a video in a database can help find videos similar to the image. However, the database contains a massive amount of video data, making the search for similar videos time-consuming and inefficient.
[0051] The inventors of this disclosure therefore conceived of determining a second image indicating a preset category, preset application scenario, and / or preset purpose by using the target parameters of each frame of first image in the video, and then searching for videos similar to the third image by using the image features of the second image and the image features of the third image to be searched. This eliminates the need to compare the third image with every frame of first image in the video, reduces the number of comparisons with the image to be searched, and improves the efficiency of finding similar videos.
[0052] Application Scenarios Overview
[0053] Refer first to Figure 1, a schematic diagram illustrating an application scenario of the video search method according to an embodiment of the present disclosure. The video search device 100 retrieves multiple videos from the database 200, and then obtains the target parameters of each frame of first image in each video. The target parameters indicate at least one of the image category, application scenario, and purpose of the first image. The video search device 100 determines a second image among the first images based on the target parameters. The image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and / or the purpose of the second image is a preset purpose. The video search device 100 then determines the similarity between the third image and each frame of the second image based on the image features of the second image and the image features of the third image to be searched, and searches for videos similar to the third image among the multiple videos based on each similarity.
[0054] Exemplary methods
[0055] A video search method according to exemplary embodiments of the present disclosure is described below with reference to Figures 2-7, in conjunction with the application scenario of Figure 1. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, the embodiments of the present disclosure can be applied to any applicable scenario.
[0056] Referring to Figure 2, an exemplary flowchart of an embodiment of the video search method provided according to this disclosure is shown. The video search method includes:
[0057] Step S201: Obtain the target parameters of the first image in each frame of the first video. The target parameters are used to indicate at least one of the image category, application scenario, and purpose of the first image.
[0058] In this embodiment, the executing entity is a video search device. For ease of description, the term "device" will be used to refer to the video search device below. The device can be any terminal device with image processing capabilities.
[0059] When users need to search for videos, they can use an image as a reference point, that is, search for videos similar to the image. Videos similar to the image refer to videos whose content is for a specific purpose, are of a specific type, or can be used in a specific scenario. In this embodiment, the device defines the image used as a reference point as the third image to be searched.
[0060] Users can send search commands to the device via a terminal or directly input search commands into the device. The search command includes the third image. The device retrieves multiple videos from a database and checks whether each video is similar to the third image. Every video is processed in the same way, so any one of these videos is defined as the first video, and the description below uses the first video as the example.
[0061] The device acquires target parameters for each frame of the first image in the first video. These target parameters indicate at least one of the image category, application scenario, and purpose of the first image. For example, the first image has a corresponding tag, which can be manually labeled, indicating the image category, applicable application scenario, and purpose of the first image. The device can acquire the corresponding target parameters based on the tag of the first image. The tags and the identifier of the first image are stored in a database. The device can extract the tag corresponding to the first image from the database based on the identifier of the first image, thereby acquiring the target parameters of the first image through the tag.
[0062] Step S202: Based on the target parameters, determine the second image among the first images, wherein the image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and / or the purpose of the second image is a preset purpose.
[0063] After determining the target parameters of each frame of the first image, the device identifies the second image among the first images. For example, the search instruction includes a user-inputted category, purpose, and application scenario. The device takes that category as the preset category, that purpose as the preset purpose, and that application scenario as the preset application scenario. Based on the target parameters of the first image, the device determines whether the first image is a second image. If the image category of the first image is the preset category, the application scenario of the first image is the preset application scenario, and / or the purpose of the first image is the preset purpose, then the first image is a second image.
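The selection in step S202 can be sketched as a simple filter over per-frame target parameters. This is a minimal illustration, not the patented implementation: the `Frame` record, its field names, and the function `select_second_images` are hypothetical, and the "and / or" condition is interpreted here as "at least one supplied preset value matches", which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical record holding the target parameters of one first image."""
    index: int
    category: Optional[str] = None
    scenario: Optional[str] = None
    purpose: Optional[str] = None


def select_second_images(
    frames: List[Frame],
    preset_category: Optional[str] = None,
    preset_scenario: Optional[str] = None,
    preset_purpose: Optional[str] = None,
) -> List[Frame]:
    """Keep frames matching at least one supplied preset value (assumed reading of "and / or")."""
    selected = []
    for frame in frames:
        matches_category = preset_category is not None and frame.category == preset_category
        matches_scenario = preset_scenario is not None and frame.scenario == preset_scenario
        matches_purpose = preset_purpose is not None and frame.purpose == preset_purpose
        if matches_category or matches_scenario or matches_purpose:
            selected.append(frame)
    return selected
```

Only the frames returned by such a filter would go on to the feature comparison in step S203, which is where the reduction in comparisons comes from.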
[0064] Step S203: Based on the image features of the second image and the image features of the third image to be retrieved, determine the first similarity between the third image and each frame of the second image.
[0065] After determining the second image in each frame, the device determines the similarity between the third image and each frame of the second image based on the image features of the second image and the image features of the third image. This similarity is defined as the first similarity.
[0066] For example, the image features can be a feature vector composed of the pixel values of each pixel in the second image, and the image features can also include a label for the second image, which indicates the content of the second image. Furthermore, the label can be information configured by the creator; for example, the label may include the people included in the second image, the intended message, the purpose of the content, etc. The image features can be stored in a database, meaning the device retrieves the image features of each frame of the second image from the database.
[0067] The device can calculate the distance between the image features of the second image and the image features of the third image, and convert the distance into a first similarity. The smaller the distance, the greater the first similarity.
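As an illustration of converting a distance into a first similarity, the sketch below uses Euclidean distance and the mapping 1 / (1 + d). Both choices are assumptions made for the example; the text only requires that a smaller distance gives a greater similarity. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance to (0, 1]; smaller distance gives greater similarity."""
    return 1.0 / (1.0 + distance)


def first_similarities(
    third_features: Sequence[float],
    second_features_list: Sequence[Sequence[float]],
) -> List[float]:
    """One first similarity per frame of second image."""
    return [
        distance_to_similarity(euclidean_distance(third_features, features))
        for features in second_features_list
    ]
```

Any other monotonically decreasing mapping, or a different distance such as cosine distance, would satisfy the same requirement.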
[0068] Step S204: Based on each first similarity, determine the video that is similar to the third image among multiple second videos, where the first video is any one of the second videos.
[0069] The second image has multiple frames, therefore the first video has multiple first similarities. The device retrieves multiple videos from the database, defining each video as a second video, and the first video is any one of the second videos. Each second video obtains multiple first similarities in the manner described above. The device finds the largest first similarity among all the similarities, and the video corresponding to the largest first similarity is taken as the video similar to the third image. The device then outputs this similar video.
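The final selection in step S204 amounts to taking the maximum over all first similarities of all second videos. The sketch below assumes the similarities have already been computed and are held in a dictionary keyed by a hypothetical video identifier; the function name is also hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def most_similar_video(
    similarities_by_video: Dict[str, List[float]],
) -> Optional[Tuple[str, float]]:
    """Return (video_id, similarity) for the largest first similarity, or None if there is none."""
    best_id = None
    best_similarity = float("-inf")
    for video_id, similarities in similarities_by_video.items():
        if not similarities:
            # A video with no second images contributes no first similarity.
            continue
        video_best = max(similarities)
        if video_best > best_similarity:
            best_id, best_similarity = video_id, video_best
    if best_id is None:
        return None
    return best_id, best_similarity
```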
[0070] In this embodiment, a second image indicating a preset category, preset application scenario, and / or preset purpose is determined by the target parameters of the first image in each frame of the video. Then, the image features of the second image are used to search for videos similar to the third image. This eliminates the need to compare the image with the first image in each frame of the video, reducing the number of comparisons with the image to be searched and improving the efficiency of finding similar videos.
[0071] Referring to Figure 3, an exemplary flowchart of another embodiment of the video search method provided according to embodiments of this disclosure is shown. Based on the embodiment shown in Figure 2, step S204 includes:
[0072] Step S301: Obtain the image features of each frame of the fourth image. The fourth image is a first image other than the second image.
[0073] In this embodiment, the device can also search the first video for images representing the main content of the first video. For example, the device acquires the image features of each frame of the fourth image, where the fourth image is a first image in the first video other than the second image. The method for acquiring the image features of the fourth image is the same as the method for acquiring the image features of the second image, and will not be described again here.
[0074] Step S302: Cluster each frame of the fourth image according to the image features of each fourth image to obtain multiple cluster sets. The second similarity of each cluster set is greater than the first preset similarity. The second similarity is used to indicate the similarity between any two frames of the fourth image in the cluster set.
[0075] After determining each fourth image, the device clusters the fourth images in each frame based on their image features, resulting in multiple cluster sets.
[0076] For example, an image feature can be a feature vector. The device calculates the distance between any two image features. The two frames of fourth images corresponding to two image features with a distance less than a preset distance are initially treated as a set. Then, the distance between the fourth images in one set and the fourth images in another set is calculated. If the distance is less than the preset distance, the two sets are treated as a set, and the sets are then aggregated until multiple cluster sets are finally generated. The second similarity of each cluster set is greater than the first preset similarity. The second similarity is used to indicate the similarity between any two frames of fourth images in the cluster set.
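The merging procedure described above can be sketched as threshold-based agglomerative clustering. The version below merges two sets only when every cross-set pair of frames is closer than the preset distance (complete linkage), which is an assumed reading chosen so that any two frames in a resulting cluster set stay within the threshold; the function names and the use of Euclidean distance are likewise assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_frames(
    features: Sequence[Sequence[float]],
    preset_distance: float,
) -> List[List[int]]:
    """Group frame indices so that all pairs inside a cluster are closer than preset_distance."""
    # Start with each frame of fourth image in its own set.
    clusters = [[i] for i in range(len(features))]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            all_close = all(
                _distance(features[p], features[q]) < preset_distance
                for p in clusters[i]
                for q in clusters[j]
            )
            if all_close:
                # Merge set j into set i, then rescan from the start.
                clusters[i] = sorted(clusters[i] + clusters[j])
                del clusters[j]
                merged = True
                break
    return clusters
```

With this reading, a chain of frames in which only neighbouring frames are close is not collapsed into one set, which keeps the pairwise second similarity above the first preset similarity inside each cluster set.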
[0077] Step S303: Determine the first cluster set among the various cluster sets. The image content of the fourth image in the first cluster set is used to indicate the main content of the first video.
[0078] After obtaining multiple cluster sets, the device determines a first cluster set within each cluster set. The image content of the fourth image in the first cluster set is used to indicate the main content of the first video. For example, the first video has tags used to indicate the main content. The device identifies the content of the fourth image in the cluster set. If the content matches the tag and the fourth images in the cluster set are similar images, it can be determined that the content of the fourth images in each frame of the cluster set matches the tag, and the device uses this cluster set as the first cluster set.
[0079] Step S304: Based on the image features of the fourth image in the first cluster set, determine the third similarity between the third image and each of the first cluster sets, and determine the video similar to the third image according to each of the first similarity and each of the third similarity.
[0080] After determining each first cluster set, the device determines the third similarity between the third image and each first cluster set based on the image features of the fourth image in the first cluster set.
[0081] For example, the device can calculate the distance between the image features of any fourth image in the first cluster set and the image features of the third image, and convert the distance into a third similarity. The smaller the distance, the greater the third similarity.
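The disclosure does not specify how a distance is converted into a similarity, only that a smaller distance yields a greater similarity. The sketch below assumes Euclidean distance and the mapping 1 / (1 + distance), which is one common monotonically decreasing choice. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance):
    """Assumed mapping: 1.0 at distance 0, approaching 0 as distance grows."""
    return 1.0 / (1.0 + distance)


def third_similarity(query_feature, cluster_member_features):
    """Similarity between the image to be retrieved and one cluster set.

    The paragraph allows any member of the cluster set to be used; this
    sketch simply takes the first member as the representative.
    """
    representative = cluster_member_features[0]
    return distance_to_similarity(euclidean(query_feature, representative))


if __name__ == "__main__":
    print(third_similarity((0.0, 0.0), [(3.0, 4.0), (3.1, 4.0)]))
```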
[0082] There are multiple first cluster sets, therefore the first video has multiple third similarities. The device retrieves multiple videos from the database, defining each video as a second video, and the first video is any one of the second videos. Each second video obtains multiple third similarities in the manner described above. The device finds the video with the highest similarity among all similarities (all similarities refer to both first and third similarities), and the video corresponding to the highest similarity is taken as the video similar to the third image.
[0083] In this embodiment, a cluster set indicating the main content of the video is obtained by clustering the image features of the fourth image in the video. Then, the third similarity between the images in the cluster set and the image to be retrieved is determined by the image features of the images in the cluster set. Thus, the videos similar to the image to be retrieved are determined by the first similarity and the third similarity. That is, the video is found by comparing the image with the main content image in the video and the specified image in the video. It is not necessary to compare the image with every frame of the video, which reduces the number of image comparisons and improves the efficiency of finding similar videos.
[0084] Figure 4 shows an exemplary flowchart of yet another embodiment of the video search method provided according to embodiments of the present disclosure. Based on the embodiment shown in Figure 3, step S304 includes:
[0085] Step S401: Determine the fourth similarity corresponding to the first video based on each first similarity and each third similarity.
[0086] In this embodiment, the first video corresponds to multiple first similarities and multiple third similarities, and the device determines the fourth similarity corresponding to the first video based on each first similarity and each third similarity.
[0087] In one example, if any first similarity or any third similarity is greater than a third preset similarity, then the first similarity or third similarity that is greater than the third preset similarity is determined as the fourth similarity corresponding to the first video.
[0088] In another example, if there are multiple first similarities and third similarities that are greater than the third preset similarity, then the largest similarity among them is found as the fourth similarity corresponding to the first video.
[0089] Step S402: Determine the target similarity from the fourth similarity and the fifth similarity corresponding to each third video. The target similarity is greater than the second preset similarity. The third video is the second video other than the first video.
[0090] The device defines the second video other than the first video as the third video, and the third video has a corresponding similarity, which is defined as the fifth similarity. The method for determining the fifth similarity is the same as the method for determining the fourth similarity, and will not be repeated here.
[0091] The device determines a target similarity among the fourth similarity and each of the fifth similarities, where the target similarity is greater than a second preset similarity. In one example, a fourth or fifth similarity greater than the second preset similarity can be determined as the target similarity. In another example, the largest similarity is determined among the fourth similarity and each of the fifth similarities, and if the largest similarity is greater than the second preset similarity, then the largest similarity is determined as the target similarity.
[0092] Step S403: The first video or the third video corresponding to the target similarity is taken as the video similar to the third image.
[0093] After determining the target similarity, the device will use the first or third video corresponding to the target similarity as the video that is similar to the third image.
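Steps S401 to S403 can be sketched as follows. This is an illustrative reading that takes the "largest similarity" variant of both examples; the dictionary layout and the names `fourth_similarity` and `find_similar_video` are assumptions, not part of the disclosure.

```python
def fourth_similarity(first_sims, third_sims, third_preset):
    """Largest first/third similarity above the third preset similarity, else None."""
    candidates = [s for s in list(first_sims) + list(third_sims) if s > third_preset]
    return max(candidates) if candidates else None


def find_similar_video(per_video_sims, second_preset, third_preset):
    """Return (video_id, target_similarity), or (None, None) if nothing qualifies.

    per_video_sims maps a video id to a (first_sims, third_sims) pair. Each
    video is treated the same way, so the "fifth similarity" of every other
    video is computed exactly like the fourth similarity.
    """
    best_id, best_sim = None, None
    for video_id, (first_sims, third_sims) in per_video_sims.items():
        sim = fourth_similarity(first_sims, third_sims, third_preset)
        if sim is None or sim <= second_preset:
            continue  # the target similarity must exceed the second preset
        if best_sim is None or sim > best_sim:
            best_id, best_sim = video_id, sim
    return best_id, best_sim


if __name__ == "__main__":
    sims = {
        "video_a": ([0.2, 0.9], [0.5]),
        "video_b": ([0.3], [0.6]),
        "video_c": ([0.1], [0.2]),
    }
    print(find_similar_video(sims, second_preset=0.7, third_preset=0.4))
```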
[0094] In this embodiment, the device determines the fourth similarity corresponding to the first video based on each first similarity and each third similarity, and determines the target similarity among the fourth similarity and the fifth similarity of each third video, thereby accurately determining the video similar to the third image based on the target similarity.
[0095] Figure 5 shows an exemplary flowchart of another embodiment of the video search method provided according to embodiments of the present disclosure. Based on the embodiment shown in Figure 3, step S304 includes:
[0096] Step S501: In response to each first similarity and each third similarity being less than or equal to a third preset similarity, a sixth similarity is determined between the third image and each second cluster set, wherein the second cluster set is a cluster set other than the first cluster set.
[0097] In this embodiment, after obtaining each third similarity, if any first similarity or any third similarity is greater than the third preset similarity, then steps S301 to S304 are executed.
[0098] If all first similarities and all third similarities are less than or equal to the third preset similarity, the device can determine the fourth similarity of the first video from the cluster sets that indicate the secondary content of the first video. For example, the device determines the second cluster sets among the cluster sets; a second cluster set is a cluster set other than the first cluster set. The device determines a sixth similarity between the third image and each of the second cluster sets based on the image features of the fourth images in that second cluster set.
[0099] In one example, the device calculates the distance between the image features of any fourth image in the second cluster set and the image features of the third image to obtain the sixth similarity.
[0100] In another example, the device determines the first target image features corresponding to the second cluster set based on the image features of each fourth image in the second cluster set. For example, the first target image features can be obtained by averaging the image features of each fourth image in the second cluster set. The device then determines a sixth similarity using the first target image features and the image features of the third image.
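The averaging approach can be sketched as below. The sketch assumes Euclidean distance and the 1 / (1 + distance) mapping from distance to similarity, neither of which is fixed by the disclosure, and the function names are hypothetical.

```python
import math


def mean_feature(features):
    """Element-wise average of equally sized feature vectors (the target image feature)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def cluster_similarity(query_feature, cluster_member_features):
    """Similarity between the image to be retrieved and a cluster set's averaged feature."""
    target = mean_feature(cluster_member_features)
    distance = math.sqrt(sum((q - t) ** 2 for q, t in zip(query_feature, target)))
    return 1.0 / (1.0 + distance)  # assumed distance-to-similarity mapping


if __name__ == "__main__":
    members = [(0.0, 0.0), (2.0, 4.0)]
    print(mean_feature(members))
    print(cluster_similarity((1.0, 2.0), members))
```

Comparing against one averaged vector per cluster set costs a single distance computation, instead of one per fourth image in the set.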
[0101] Step S502: In response to any sixth similarity being greater than the third preset similarity, the sixth similarity being greater than the third preset similarity is determined as the fourth similarity corresponding to the first video.
[0102] If any sixth similarity is greater than the third preset similarity, then the sixth similarity that is greater than the third preset similarity is determined as the fourth similarity corresponding to the first video.
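The fallback in steps S501 and S502 can be sketched as follows. The sixth similarities are passed as a callable so that they are computed only when no first or third similarity exceeds the third preset similarity. Taking the largest qualifying value is an assumption, and `fourth_similarity_with_fallback` is a hypothetical name.

```python
def fourth_similarity_with_fallback(first_sims, third_sims, compute_sixth_sims, third_preset):
    """Fourth similarity of a video, consulting secondary cluster sets only if needed.

    compute_sixth_sims is a zero-argument callable returning the sixth
    similarities; it is invoked only when every first and third similarity
    is less than or equal to third_preset.
    """
    primary = [s for s in list(first_sims) + list(third_sims) if s > third_preset]
    if primary:
        return max(primary)
    secondary = [s for s in compute_sixth_sims() if s > third_preset]
    return max(secondary) if secondary else None


if __name__ == "__main__":
    # Main content does not match, secondary content does.
    print(fourth_similarity_with_fallback([0.1], [0.2], lambda: [0.3, 0.8], 0.5))
```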
[0103] In this embodiment, when each of the first similarity and each of the third similarity is less than or equal to the third preset similarity, the fourth similarity corresponding to the first video is accurately determined based on the second cluster set indicating the secondary content of the first video.
[0104] Figure 6 shows an exemplary flowchart of another embodiment of the video search method provided according to embodiments of the present disclosure. Based on any of the embodiments shown in Figures 3 to 5, step S303 includes:
[0105] Step S601: Obtain a first number, which is the number of fourth images in the cluster set, and a second number. The second number indicates the number of second similarities that are greater than the fourth preset similarity. The fourth preset similarity is greater than or equal to the first preset similarity.
[0106] Step S602: Based on the first quantity and the second quantity, determine the first cluster set in each cluster set. The first quantity corresponding to the first cluster set is greater than the first preset quantity, and the second quantity corresponding to the first cluster set is greater than the second preset quantity.
[0107] In this embodiment, the device determines the number of fourth images in the cluster set, which is defined as a first number. The device also acquires a second number, which is the number of second similarities that are greater than a fourth preset similarity, where the fourth preset similarity is greater than or equal to the first preset similarity.
[0108] After acquiring the first quantity and the second quantity, the device determines whether the cluster set is the first cluster set based on the first quantity and the second quantity.
[0109] For example, the device determines whether the first quantity is greater than a first preset quantity and whether the second quantity is greater than a second preset quantity. If the first quantity is greater than the first preset quantity, the cluster set includes a large number of fourth images describing the same content, and since the main content of the first video is usually described using a large number of images, the cluster set can be initially determined as a cluster set describing the main content. Given that the first quantity is greater than the first preset quantity, if the second quantity is also greater than the second preset quantity, it can be determined that most of the images in the cluster set describe the same content, and thus the cluster set can be determined as the first cluster set. If the first quantity is less than or equal to the first preset quantity, and/or the second quantity is less than or equal to the second preset quantity, the cluster set is determined as a second cluster set, that is, a cluster set describing the secondary content of the first video.
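The two-threshold test can be sketched as below. It assumes the second similarities are supplied as a flat list of pairwise similarities between the fourth images of one cluster set; the function name and return labels are hypothetical.

```python
def classify_cluster(image_count, pairwise_similarities, fourth_preset_similarity,
                     first_preset_quantity, second_preset_quantity):
    """Label a cluster set as "first" (main content) or "second" (secondary content)."""
    first_quantity = image_count
    # Count the pairwise (second) similarities above the fourth preset similarity.
    second_quantity = sum(1 for s in pairwise_similarities if s > fourth_preset_similarity)
    is_first = (first_quantity > first_preset_quantity
                and second_quantity > second_preset_quantity)
    return "first" if is_first else "second"


if __name__ == "__main__":
    # Three images with three pairwise similarities, all above 0.85.
    print(classify_cluster(3, [0.90, 0.95, 0.92], 0.85, 2, 2))
```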
[0110] In this embodiment, the device accurately determines whether a cluster set is a first cluster set describing the main content of the first video based on the number of fourth images in the cluster set and the number of second similarities greater than the fourth preset similarity.
[0111] In one embodiment, when it is necessary to determine the third similarity between the first cluster set and the third image, the device determines the second target image features corresponding to the first cluster set based on the image features of each fourth image in the first cluster set, that is, it calculates the average value of the image features of each fourth image as the second target image feature. The device then determines the third similarity between the third image and the first cluster set based on the image features of the third image and the second target image features.
[0112] In this embodiment, the device accurately determines the third similarity between the first cluster set and the third image based on the image features of the fourth image in each frame of the first cluster set.
[0113] Figure 7 shows an exemplary flowchart of an embodiment of the video search method provided according to this disclosure. Based on any of the embodiments shown in Figures 2 to 6, step S201 includes:
[0114] Step S701: Sample the images of the first video according to a preset interval to obtain multiple frames of the fifth image.
[0115] Step S702: Use the fifth image of each frame as the first image, and obtain the target parameters of the first image of each frame.
[0116] In this embodiment, because the database holds a massive amount of video data, searching for videos similar to the third image would be extremely time-consuming. To address this, the device samples images from the first video at a preset interval to obtain multiple frames of the fifth image. For example, if the preset interval is 1 second, one frame is extracted from the video every 1 second as a fifth image. After acquiring each frame of the fifth image, the device uses each frame of the fifth image as a first image and then retrieves the target parameters for each frame of the first image from the database.
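Interval sampling can be sketched by computing which frame indices to keep. The sketch assumes the frame rate and total frame count of the video are known; `sample_frame_indices` and its parameters are hypothetical names, and decoding the frames themselves is outside its scope.

```python
def sample_frame_indices(total_frames, fps, interval_seconds=1.0):
    """Indices of frames to keep when taking one frame per interval.

    Returns an empty list for non-positive inputs. The step is rounded to a
    whole number of frames and is never smaller than one frame.
    """
    if total_frames <= 0 or fps <= 0 or interval_seconds <= 0:
        return []
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 4-second clip at 25 frames per second, sampled once per second.
    print(sample_frame_indices(total_frames=100, fps=25, interval_seconds=1.0))
```

With these numbers only 4 of the 100 frames are passed on for further processing.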
[0117] In this embodiment, the device samples the first video at preset intervals to obtain multiple frames of the fifth image, and uses all the fifth images as the first image, thereby reducing the number of images in the first video that are compared with the third image and improving the search efficiency of similar videos to the third image.
[0118] Exemplary media
[0119] Having introduced the methods of exemplary embodiments of this disclosure, the storage medium of exemplary embodiments of this disclosure is described next with reference to Figure 8.
[0120] As shown in Figure 8, the storage medium 80 stores a program product for implementing the above-described method according to an embodiment of the present disclosure. This program product may be a portable compact disc read-only memory (CD-ROM) and includes computer-executable instructions for causing a computing device to execute the video search method provided in this disclosure. However, the program product of this disclosure is not limited thereto.
[0121] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0122] A readable signal medium may include data signals propagated in baseband or as part of a carrier wave, carrying computer-executed instructions. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium.
[0123] Computer-executable instructions for performing the operations disclosed herein can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The computer-executable instructions can be executed entirely on the user's computing device, partially on the user's computing device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing devices can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN).
[0124] Exemplary device
[0125] Having introduced the medium of exemplary embodiments of this disclosure, the device of exemplary embodiments of this disclosure is described next with reference to Figure 9. The device is used to implement the video search method described above, and its implementation principle and technical effect are similar.
[0126] Figure 9 shows a schematic diagram of the structure of a video search device provided according to an embodiment of the present disclosure.
[0127] As shown in Figure 9, the video search device includes: an acquisition module 910, configured to acquire target parameters of each frame of a first image in a first video, wherein the target parameters indicate at least one of the image category, application scenario, and purpose of the first image; a first determination module 920, configured to determine a second image among the various first images based on the target parameters, wherein the image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and/or the purpose of the second image is a preset purpose; a second determination module 930, configured to determine a first similarity between the third image and each frame of the second image based on the image features of the second image and the image features of the third image to be searched; and a third determination module 940, configured to determine a video similar to the third image among a plurality of second videos based on the respective first similarities, wherein the first video is any one of the second videos.
[0128] In one embodiment, the third determining module 940 includes: a first acquiring unit, configured to acquire image features of each frame of a fourth image, wherein the fourth image is a first image other than the second image; a clustering unit, configured to cluster each frame of the fourth images according to the image features of each fourth image to obtain multiple cluster sets, wherein each second similarity corresponding to a cluster set is greater than a first preset similarity, and the second similarity is used to indicate the similarity between any two frames of the fourth images in the cluster set; a first determining unit, configured to determine a first cluster set in each cluster set, wherein the image content of the fourth images in the first cluster set is used to indicate the main content of the first video; and a second determining unit, configured to determine a third similarity between a third image and each first cluster set based on the image features of the fourth images in the first cluster set, and to determine a video similar to the third image based on each first similarity and each third similarity.
[0129] In one embodiment, the second determining unit includes: a first determining subunit, configured to determine a fourth similarity corresponding to the first video based on each first similarity and each third similarity; a second determining subunit, configured to determine a target similarity among the fourth similarity and the fifth similarities corresponding to each third video, wherein the target similarity is greater than a second preset similarity, and the third video is a second video other than the first video; and a third determining subunit, configured to regard the first video or the third video corresponding to the target similarity as a video similar to the third image.
[0130] In one embodiment, the first determining subunit includes: a first determining component, configured to, in response to any first similarity or any third similarity being greater than the third preset similarity, determine that first similarity or third similarity as the fourth similarity corresponding to the first video.
[0131] In one embodiment, the first determining subunit includes: a second determining component, configured to determine a sixth similarity between the third image and each of the second cluster sets in response to each of the first similarities and each of the third similarities being less than or equal to a third preset similarity, wherein the second cluster sets are cluster sets other than the first cluster sets; and a third determining component, configured to, in response to any sixth similarity being greater than the third preset similarity, determine that sixth similarity as the fourth similarity corresponding to the first video.
[0132] In one embodiment, the second determining component includes: a first determining module, configured to determine a first target image feature corresponding to the second cluster set based on the image features of each fourth image in the second cluster set; and a second determining module, configured to determine a sixth similarity between the third image and the second cluster set based on the image features of the third image and the first target image feature.
[0133] In one embodiment, the first determining unit includes: an acquisition subunit, configured to acquire a first number and a second number of fourth images in a cluster set, the second number indicating the number of second similarities greater than a fourth preset similarity, the fourth preset similarity being greater than or equal to a first preset similarity; and a fourth determining subunit, configured to determine a first cluster set in each cluster set based on the first number and the second number, the first number corresponding to the first cluster set being greater than a first preset number, and the second number corresponding to the first cluster set being greater than a second preset number.
[0134] In one embodiment, the second determining unit includes: a fifth determining subunit, configured to determine the second target image features corresponding to the first cluster set based on the image features of each fourth image in the first cluster set; and a sixth determining subunit, configured to determine the third similarity between the third image and the first cluster set based on the image features of the third image and the second target image features.
[0135] In one embodiment, the acquisition module 910 includes: a sampling unit, used to sample the images of the first video at a preset interval to obtain multiple frames of fifth images; and a third determination unit, used to take each frame of fifth images as a first image and obtain the target parameters of each frame of first image.
[0136] Exemplary computing device
[0137] Having described the methods, media, and apparatus of exemplary embodiments of this disclosure, a computing device according to an exemplary embodiment of the present disclosure is described next with reference to Figure 10.
[0138] The computing device 100 shown in Figure 10 is merely an example and should not be construed as limiting the functionality and scope of the embodiments disclosed herein. As shown in Figure 10, the computing device 100 is presented in the form of a general-purpose computing device. The components of the computing device 100 may include, but are not limited to: at least one processing unit 1001, at least one storage unit 1002, and a bus 1003 connecting different system components (including the processing unit 1001 and the storage unit 1002). The at least one storage unit 1002 stores computer-executable instructions; the at least one processing unit 1001 includes a processor that executes the computer-executable instructions to implement the methods described above.
[0139] Bus 1003 includes a data bus, a control bus, and an address bus.
[0140] Storage unit 1002 may include readable media in the form of volatile memory, such as random access memory (RAM) 10021 and / or cache memory 10022, and may further include readable media in the form of non-volatile memory, such as read-only memory (ROM) 10023.
[0141] Storage unit 1002 may also include a program/utility 10025 having a set of (at least one) program modules 10024. Such program modules 10024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
[0142] The computing device 100 can also communicate with one or more external devices 1004 (e.g., keyboard, pointing device, etc.). This communication can be performed via the input/output (I/O) interface 1005. Furthermore, the computing device 100 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and/or public networks, such as the Internet) via a network adapter 1006. As shown in Figure 10, network adapter 1006 communicates with other modules of computing device 100 via bus 1003. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with computing device 100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
[0143] It should be noted that although several units / modules or sub-units / modules of the terminal device / server are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of this disclosure, the features and functions of two or more units / modules described above can be embodied in one unit / module. Conversely, the features and functions of one unit / module described above can be further divided and embodied by multiple units / modules.
[0144] Furthermore, although the operations of the methods disclosed herein are described in a specific order in the accompanying drawings, this does not require or imply that these operations must be performed in that specific order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and / or one step may be broken down into multiple steps.
[0145] While the spirit and principles of this disclosure have been described with reference to several specific embodiments, it should be understood that this disclosure is not limited to the disclosed specific embodiments, and the division of aspects does not imply that features in these aspects cannot be combined for benefit; such division is merely for convenience of expression. This disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims
1. A video search method, characterized in that it includes: obtaining target parameters of the first image in each frame of a first video, wherein the target parameters are used to indicate at least one of the image category, application scenario, and purpose of the first image; determining, based on the target parameters, a second image from each of the first images, wherein the image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and/or the purpose of the second image is a preset purpose; determining, based on the image features of the second image and the image features of a third image to be retrieved, a first similarity between the third image and each frame of the second image; obtaining the image features of each frame of a fourth image, wherein the fourth image is a first image other than the second image; clustering each frame of the fourth images based on the image features of each of the fourth images to obtain multiple cluster sets, wherein each second similarity corresponding to a cluster set is greater than a first preset similarity, and the second similarity is used to indicate the similarity between any two frames of the fourth images in the cluster set; determining a first cluster set among the cluster sets, wherein the image content of the fourth images in the first cluster set is used to indicate the main content of the first video; and determining, based on the image features of the fourth images in the first cluster set, a third similarity between the third image and each of the first cluster sets, and determining a video similar to the third image based on each of the first similarities and each of the third similarities.
2. The video search method according to claim 1, characterized in that, The step of determining the video similar to the third image based on each of the first similarity scores and each of the third similarity scores includes: Based on each of the first similarity scores and each of the third similarity scores, a fourth similarity score corresponding to the first video is determined; Among the fourth similarity and the fifth similarity corresponding to each third video, a target similarity is determined. The target similarity is greater than the second preset similarity. The third video is a second video other than the first video. The first or third video corresponding to the target similarity is taken as the video similar to the third image.
3. The video search method according to claim 2, characterized in that, The step of determining the fourth similarity corresponding to the first video based on each of the first similarities and each of the third similarities includes: In response to any of the first similarity or any of the third similarity being greater than a third preset similarity, the first similarity or the third similarity being greater than the third preset similarity is determined as the fourth similarity corresponding to the first video.
4. The video search method according to claim 2, characterized in that, The step of determining the fourth similarity corresponding to the first video based on each of the first similarities and each of the third similarities includes: In response to each of the first similarity and each of the third similarity being less than or equal to a third preset similarity, a sixth similarity is determined between the third image and each of the second cluster sets, wherein the second cluster sets are cluster sets other than the first cluster sets; In response to any of the sixth similarities being greater than the third preset similarity, the sixth similarity that is greater than the third preset similarity is determined as the fourth similarity corresponding to the first video.
5. The video search method according to claim 4, characterized in that, Determining the sixth similarity between the third image and each of the second cluster sets includes: Based on the image features of each fourth image in the second cluster set, determine the first target image features corresponding to the second cluster set; Based on the image features of the third image and the features of the first target image, a sixth similarity is determined between the third image and the second cluster set.
6. The video search method according to claim 1, characterized in that, Determining the first cluster set among each of the said cluster sets includes: Obtain a first number and a second number of fourth images in the cluster set, wherein the second number indicates the number of second similarities that are greater than a fourth preset similarity, and the fourth preset similarity is greater than or equal to the first preset similarity; Based on the first quantity and the second quantity, a first cluster set is determined in each of the cluster sets, wherein the first quantity corresponding to the first cluster set is greater than a first preset quantity, and the second quantity corresponding to the first cluster set is greater than a second preset quantity.
7. The video search method according to claim 1, characterized in that, The step of determining the third similarity between the third image and each of the first cluster sets based on the image features of the fourth image in the first cluster set includes: Based on the image features of each fourth image in the first cluster set, determine the second target image features corresponding to the first cluster set; Based on the image features of the third image and the features of the second target image, a third similarity is determined between the third image and the first cluster set.
8. The video search method according to any one of claims 1-7, characterized in that, The step of obtaining the target parameters of the first image in each frame of the first video includes: The images of the first video are sampled at preset intervals to obtain multiple frames of the fifth image; Each frame of the fifth image is used as the first image, and the target parameters of the first image in each frame are obtained.
9. A video search device, characterized by comprising: an acquisition module, configured to acquire target parameters of each frame of a first image in a first video, wherein the target parameters indicate at least one of the image category, application scenario, and purpose of the first image; a first determining module, configured to determine second images from the first images according to the target parameters, wherein the image category of the second image is a preset category, the application scenario of the second image is a preset application scenario, and/or the purpose of the second image is a preset purpose; a second determining module, configured to determine a first similarity between a third image to be retrieved and each frame of the second image based on the image features of the second image and the image features of the third image; and a third determining module, configured to determine, based on the first similarities, a video similar to the third image among a plurality of second videos, wherein the first video is any one of the second videos; wherein the third determining module includes: a first acquisition unit, configured to acquire the image features of each frame of a fourth image, wherein the fourth images are the first images other than the second images; a clustering unit, configured to cluster the fourth images according to the image features of each fourth image to obtain multiple cluster sets, wherein each second similarity corresponding to a cluster set is greater than a first preset similarity, and the second similarity indicates the similarity between any two frames of the fourth image in the cluster set; a first determining unit, configured to determine a first cluster set among the cluster sets, wherein the image content of the fourth images in the first cluster set indicates the main content of the first video; and a second determining unit, configured to determine a third similarity between the third image and each of the first cluster sets based on the image features of the fourth images in the first cluster set, and to determine the video similar to the third image based on the first similarities and the third similarities.
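The clustering unit requires that every pairwise similarity inside a cluster set exceed the first preset similarity. The claims do not name a clustering algorithm; the sketch below assumes a simple greedy, complete-linkage style assignment that satisfies that constraint by construction. The function name and the caller-supplied `similarity` function are hypothetical.

```python
def cluster_frames(features, similarity, first_preset):
    """Greedily group frames so all pairs within a cluster exceed first_preset.

    A frame joins the first existing cluster in which it is sufficiently
    similar to every member; otherwise it starts a new cluster.
    """
    clusters = []
    for feature in features:
        for cluster in clusters:
            if all(similarity(feature, member) > first_preset
                   for member in cluster):
                cluster.append(feature)
                break
        else:
            # No existing cluster accepted the frame.
            clusters.append([feature])
    return clusters
```

The result depends on the order in which frames are visited, so this is one possible grouping rather than an optimal one.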
10. The video search device according to claim 9, characterized in that the second determining unit includes: a first determining subunit, configured to determine a fourth similarity corresponding to the first video based on the first similarities and the third similarities; a second determining subunit, configured to determine a target similarity among the fourth similarity and the fifth similarities corresponding to the respective third videos, wherein the target similarity is greater than a second preset similarity, and a third video is a second video other than the first video; and a third determining subunit, configured to identify the first video or the third video corresponding to the target similarity as a video similar to the third image.
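The selection performed by the second and third determining subunits can be sketched as a threshold filter over per-video similarities. This is a minimal sketch assuming each video has already been reduced to a single score (the fourth similarity for the first video, a fifth similarity for each third video); the dictionary layout, the descending sort, and the name `find_similar_videos` are assumptions.

```python
def find_similar_videos(video_similarities, second_preset):
    """Return IDs of videos whose similarity exceeds second_preset, best first."""
    matches = [
        (score, video_id)
        for video_id, score in video_similarities.items()
        if score > second_preset
    ]
    matches.sort(key=lambda item: item[0], reverse=True)
    return [video_id for _, video_id in matches]
```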
11. The video search device according to claim 10, characterized in that the first determining subunit includes: a first determining component, configured to, in response to any first similarity or any third similarity being greater than a third preset similarity, determine the first similarity or third similarity that is greater than the third preset similarity as the fourth similarity corresponding to the first video.
12. The video search device according to claim 10, characterized in that the first determining subunit includes: a second determining component, configured to determine a sixth similarity between the third image and each second cluster set in response to each of the first similarities and each of the third similarities being less than or equal to a third preset similarity, wherein the second cluster set is a cluster set other than the first cluster set; and a third determining component, configured to, in response to any sixth similarity being greater than the third preset similarity, determine the sixth similarity that is greater than the third preset similarity as the fourth similarity corresponding to the first video.
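Claims 11 and 12 together describe a two-stage decision: use a first or third similarity if one exceeds the third preset similarity, and only otherwise compare against the remaining (second) cluster sets. The sketch below is one reading of that logic. Taking the maximum when several similarities qualify, returning `None` when nothing qualifies, and passing the sixth similarities as a lazily evaluated callable are all assumptions, as is the name `fourth_similarity`.

```python
def fourth_similarity(first_sims, third_sims, compute_sixth_sims, third_preset):
    """Per-video similarity with a fallback to the second cluster sets.

    compute_sixth_sims is called only if no first or third similarity
    exceeds third_preset, so the extra comparisons are skipped when possible.
    """
    primary = [s for s in (*first_sims, *third_sims) if s > third_preset]
    if primary:
        return max(primary)
    fallback = [s for s in compute_sixth_sims() if s > third_preset]
    return max(fallback) if fallback else None
```

Deferring the second-cluster comparison keeps the common case cheap: a video whose key frames or main-content clusters already match never triggers the additional similarity computations.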
13. The video search device according to claim 12, characterized in that the second determining component includes: a first determining module, configured to determine the first target image features corresponding to the second cluster set based on the image features of each fourth image in the second cluster set; and a second determining module, configured to determine the sixth similarity between the third image and the second cluster set based on the image features of the third image and the first target image features.
14. The video search device according to claim 9, characterized in that the first determining unit includes: an acquisition subunit, configured to acquire, for each cluster set, a first quantity and a second quantity, wherein the first quantity is the number of fourth images in the cluster set, the second quantity is the number of second similarities that are greater than a fourth preset similarity, and the fourth preset similarity is greater than or equal to the first preset similarity; and a fourth determining subunit, configured to determine the first cluster set among the cluster sets based on the first quantity and the second quantity, wherein the first quantity corresponding to the first cluster set is greater than a first preset quantity, and the second quantity corresponding to the first cluster set is greater than a second preset quantity.
15. The video search device according to claim 9, characterized in that the second determining unit includes: a fifth determining subunit, configured to determine the second target image features corresponding to the first cluster set based on the image features of each fourth image in the first cluster set; and a sixth determining subunit, configured to determine the third similarity between the third image and the first cluster set based on the image features of the third image and the second target image features.
16. The video search device according to any one of claims 9-15, characterized in that the acquisition module includes: a sampling unit, configured to sample the images of the first video at preset intervals to obtain multiple frames of a fifth image; and a third determining unit, configured to treat each frame of the fifth image as a first image and to obtain the target parameters of each frame of the first image.
17. A medium, characterized by comprising: computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 8.
18. A computing device, characterized by comprising: a memory and a processor; wherein the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1 to 8.