Video search methods and apparatus, electronic devices and storage media

By segmenting and extracting features from dynamic video data, a morphological codebook is constructed, which solves the problem of searching incomplete video data and enables effective searching of fragmented video data.

CN119025710B · Active · Publication Date: 2026-06-30 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: PING AN TECH (SHENZHEN) CO LTD
Filing Date: 2024-08-20
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, video data is prone to partial loss or unplayability due to network issues, preventing users from understanding insurance products through incomplete video data and hindering effective video search.

Method used

By performing video segmentation, image preprocessing, and subject-environment segmentation on dynamic video data, a subject morphology codebook is constructed. Dynamic features of video fragments are extracted, and similar features are searched in the codebook to achieve the search of video fragments.

Benefits of technology

It enables effective matching and searching of target video features even with incomplete video data, ensuring the accuracy and completeness of video search.

✦ Generated by Eureka AI based on patent content.

Figure CN119025710B_ABST

Abstract

This application provides a video search method, apparatus, electronic device, and storage medium, belonging to the fields of feature extraction and artificial intelligence technology. The method includes: performing video segmentation on dynamic video data containing a target object to obtain a sequence of motion image frames; performing image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence; performing image segmentation on the binarized image frame sequence to obtain a set of subject contours; constructing a subject morphology codebook based on the subject contour set; responding to a video search request sent by a client carrying video fragment data, extracting dynamic features from the video fragment data to obtain subject dynamic features; searching for features similar to the subject dynamic features from the subject morphology codebook to obtain a set of target subject morphology features; and performing video search on the video fragment data based on the target subject morphology feature set. This application can achieve the purpose of video search using video fragment data.

Description

Technical Field

[0001] This application relates to the fields of feature extraction and artificial intelligence technology, and in particular to a video search method and apparatus, electronic device and storage medium.

Background Technology

[0002] In the financial sector, watching introductory videos about insurance products can enhance users' understanding of these products. For example, users can increase their knowledge of car insurance by watching introductory videos.

[0003] Currently, video data can only effectively introduce insurance products when it is complete. However, due to various reasons such as network issues, video data is often incomplete or unplayable, preventing users from understanding insurance products based on incomplete video data. Therefore, how to achieve the goal of video search using fragmented video data has become an urgent technical problem to be solved.

Summary of the Invention

[0004] The main objective of this application is to provide a video search method, apparatus, electronic device, and storage medium that enable video search using video fragment data.

[0005] To achieve the above objectives, a first aspect of this application proposes a video search method applied to a server, the method comprising:

[0006] acquiring dynamic video data containing a target object, and performing video segmentation on the dynamic video data to obtain a motion image frame sequence;

[0007] performing image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence;

[0008] performing subject-environment segmentation on the binarized image frame sequence to obtain a subject contour set of the target object;

[0009] constructing a subject morphology codebook based on the subject contour set;

[0010] in response to a video search request sent by a client and carrying video fragment data, extracting dynamic features from the video fragment data to obtain subject dynamic features;

[0011] searching the subject morphology codebook for features similar to the subject dynamic features to obtain a target subject morphological feature set;

[0012] performing video search on the video fragment data based on the target subject morphological feature set.

[0013] In some embodiments, constructing a subject morphology codebook based on the subject contour set includes:

[0014] calculating three-dimensional Zernike moments on the subject contour set to obtain a subject morphological feature set;

[0015] performing feature clustering on the subject morphological feature set to obtain subject morphological feature categories;

[0016] generating the subject morphology codebook based on the subject morphological feature categories.

[0017] In some embodiments, the subject contour set includes multiple subject contours, and calculating three-dimensional Zernike moments on the subject contour set to obtain a subject morphological feature set includes:

[0018] normalizing the coordinates of each subject contour to obtain normalized contour data;

[0019] calculating Zernike moments based on the normalized contour data;

[0020] performing a three-dimensional transformation on the Zernike moments to obtain the subject morphological feature set.

[0021] In some embodiments, performing video search on the video fragment data based on the target subject morphological feature set includes:

[0022] querying the dynamic video data corresponding to the target subject morphological feature set to obtain target video data;

[0023] obtaining the number of clicks on the target video data;

[0024] sorting the target video data based on the number of clicks to obtain a target video sequence.
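The query-then-rank steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `feature_to_videos` index, the `click_counts` mapping, and the descending sort order are all assumptions, since the text only states that the target video data is sorted by number of clicks.

```python
def search_videos(target_feature_ids, feature_to_videos, click_counts):
    """Collect videos linked to the matched features, then rank them by clicks.

    feature_to_videos: hypothetical index {feature_id: [video_id, ...]}
    click_counts:      hypothetical mapping {video_id: number_of_clicks}
    """
    seen = set()
    hits = []
    for feature_id in target_feature_ids:
        for video_id in feature_to_videos.get(feature_id, []):
            if video_id not in seen:  # keep each video once
                seen.add(video_id)
                hits.append(video_id)
    # Assumption: most-clicked videos come first.
    return sorted(hits, key=lambda v: click_counts.get(v, 0), reverse=True)


index = {0: ["a", "b"], 1: ["b", "c"]}
clicks = {"a": 5, "b": 20, "c": 7}
ranking = search_videos([1, 0], index, clicks)
```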

[0025] In some embodiments, performing subject environment segmentation on the binarized image frame sequence to obtain the subject contour set of the target object includes:

[0026] smoothing the binarized image frame sequence to obtain a smoothed image frame sequence, wherein the smoothed image frame sequence includes multiple smoothed image frames;

[0027] performing gradient calculation on each smoothed image frame using a preset operator to obtain a candidate subject contour set;

[0028] performing non-maximum suppression on the candidate subject contour set to obtain a suppressed subject contour set;

[0029] performing edge connection on the suppressed subject contour set to obtain the subject contour set.

[0030] In some embodiments, the subject morphology codebook includes multiple subject morphological features, and searching the subject morphology codebook for features similar to the subject dynamic features to obtain a target subject morphological feature set includes:

[0031] calculating the feature similarity between the subject dynamic features and each of the subject morphological features;

[0032] based on the feature similarity and a preset similarity threshold, selecting from the subject morphological features those that are similar to the subject dynamic features to obtain a candidate morphological feature sequence;

[0033] sorting the candidate morphological feature sequence based on the feature similarity to obtain the target subject morphological feature set.
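A minimal sketch of the threshold-and-sort search described above, under stated assumptions: cosine similarity is used as the feature similarity (the text mentions it only as one common metric), the default threshold of 0.8 is arbitrary, and candidates are sorted from most to least similar.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_codebook(query, codebook, threshold=0.8):
    """Return (index, similarity) pairs at or above the threshold, best first."""
    scored = [(i, cosine_similarity(query, f)) for i, f in enumerate(codebook)]
    candidates = [pair for pair in scored if pair[1] >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


codebook = [[1, 0], [0, 1], [0.9, 0.1], [0.6, 0.8]]
matches = search_codebook([1, 0], codebook, threshold=0.8)
```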

[0034] In some embodiments, performing image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence includes:

[0035] performing grayscale transformation on the motion image frame sequence to obtain a grayscale image frame sequence;

[0036] binarizing the grayscale image frame sequence to obtain the binarized image frame sequence.

[0037] To achieve the above objectives, a second aspect of this application provides a video search device, the device comprising:

[0038] The video segmentation module is used to acquire dynamic video data containing the target object, and to perform video segmentation on the dynamic video data to obtain a sequence of motion image frames;

[0039] The video frame binarization module is used to perform image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence;

[0040] The edge detection module is used to perform subject-environment segmentation on the binarized image frame sequence to obtain the subject contour set of the target object;

[0041] The codebook construction module is used to construct a subject morphology codebook based on the subject contour set;

[0042] The feature extraction module is used to respond to a video search request sent by the client and carrying video fragment data, and to extract dynamic features from the video fragment data to obtain the subject dynamic features;

[0043] The feature approximation search module is used to search the subject morphology codebook for features similar to the subject dynamic features to obtain a target subject morphological feature set;

[0044] The video search module is used to perform video search on the video fragment data based on the target subject morphological feature set.

[0045] To achieve the above objectives, a third aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described in the first aspect.

[0046] To achieve the above objectives, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the first aspect.

[0047] The video search method, apparatus, electronic device, and storage medium proposed in this application perform subject-environment segmentation on acquired dynamic video data containing target objects to obtain a set of subject contours containing target objects. Based on this set of subject contours, a subject morphology codebook is constructed, forming a search database of video features. Furthermore, when a video search request carrying video fragment data is received from a client, the dynamic features of the subject in the video fragment data are extracted, and approximate features of the dynamic features are searched from the subject morphology codebook to obtain a set of target subject morphology features. This ensures that the video features in the incomplete video match the video features of the target video. Then, based on the set of target subject morphology features, video search is performed on the video fragment data, achieving the purpose of video search using video fragment data.

Description of the Drawings

[0048] Figure 1 is a flowchart of the video search method provided in the embodiments of this application;

[0049] Figure 2 is a flowchart of step S102 in Figure 1;

[0050] Figure 3 is a flowchart of step S103 in Figure 1;

[0051] Figure 4 is a flowchart of step S104 in Figure 1;

[0052] Figure 5 is a flowchart of step S401 in Figure 4;

[0053] Figure 6 is a flowchart of step S106 in Figure 1;

[0054] Figure 7 is a flowchart of step S107 in Figure 1;

[0055] Figure 8 is a schematic diagram of the structure of the video search device provided in the embodiments of this application;

[0056] Figure 9 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.

Detailed Description

[0057] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0058] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0059] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0060] First, some of the terms used in this application are explained:

[0061] Codebook: A codebook is used to define similarity metrics between different categories. For a new video to be identified, its feature vector can be compared with words in the codebook to calculate similarity. Common similarity metrics include Euclidean distance and cosine similarity. With a codebook, feature computation and matching become more efficient, allowing for the rapid identification of similar actions. The codebook transforms complex features in a video into lower-dimensional representations, simplifying subsequent classification and search processes. By continuously updating the codebook, new categories and features can be continuously incorporated, enhancing the model's adaptability.
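As an illustration of comparing a new feature vector against codebook entries, the sketch below assigns a feature to its nearest entry by Euclidean distance. The function names and the toy two-dimensional codebook are assumptions made for the example; the patent does not prescribe this interface.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_codeword(feature, codebook):
    """Index of the codebook entry closest to `feature` (smaller distance = more similar)."""
    return min(range(len(codebook)),
               key=lambda i: euclidean_distance(feature, codebook[i]))


# Toy codebook with two entries; a real one would hold clustered shape features.
toy_codebook = [[0.0, 1.0], [1.0, 0.0]]
best = nearest_codeword([0.9, 0.1], toy_codebook)
```

Replacing each high-dimensional feature with the index of its nearest entry is what makes later matching cheap: only a handful of codebook entries need to be compared instead of every stored feature.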

[0062] In the financial sector, watching introductory videos about insurance products can enhance users' understanding of these products. For example, users can increase their knowledge of car insurance by watching introductory videos.

[0063] Currently, video data can only effectively introduce insurance products when it is complete. However, due to various reasons such as network issues, video data is often incomplete or unplayable, preventing users from understanding insurance products based on incomplete video data. Therefore, how to achieve the goal of video search using fragmented video data has become an urgent technical problem to be solved.

[0064] Based on this, embodiments of this application provide a video search method and apparatus, an electronic device and a storage medium that enable video search using video fragment data.

[0065] The video search method, apparatus, electronic device, and storage medium provided in this application are specifically described through the following embodiments. First, the video search method in this application is described.

[0066] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0067] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.

[0068] The video search method provided in this application relates to the fields of feature extraction and artificial intelligence. The video search method provided in this application can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the video search method, but is not limited to the above forms.

[0069] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0070] Figure 1 is an optional flowchart of the video search method provided in the embodiments of this application. The method is applied to a server. The method in Figure 1 may include, but is not limited to, steps S101 to S107.

[0071] Step S101: Obtain dynamic video data containing the target object, and perform video segmentation on the dynamic video data to obtain a sequence of motion image frames;

[0072] Step S102: Perform image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence;

[0073] Step S103: Perform subject-environment segmentation on the binarized image frame sequence to obtain the subject contour set of the target object;

[0074] Step S104: Construct a subject morphology codebook based on the subject contour set;

[0075] Step S105: In response to a video search request sent by the client and carrying video fragment data, extract dynamic features from the video fragment data to obtain the subject dynamic features;

[0076] Step S106: Search the subject morphology codebook for features similar to the subject dynamic features to obtain the target subject morphological feature set;

[0077] Step S107: Based on the target subject morphological feature set, perform video search on the video fragment data.

[0078] In steps S101 to S107 of this embodiment, the server performs video segmentation on the acquired dynamic video data containing the target object to obtain a motion image frame sequence of the target object. Image preprocessing is then performed on the motion image frame sequence to obtain a binarized image frame sequence containing only black and white pixels. Next, the subject and the environment in the binarized image frame sequence are segmented to obtain a subject contour set containing only the target object, and a subject morphology codebook of the target object is constructed based on the subject contour set, forming a search database of video features. When a video search request carrying video fragment data is received from the client, feature extraction is performed on the video fragment data to obtain the subject dynamic features, and features similar to the subject dynamic features are searched for in the subject morphology codebook to obtain a target subject morphological feature set. This ensures that the video features in the incomplete video match the video features of the target video. Finally, a video search is performed on the video fragment data based on the target subject morphological feature set, which achieves the purpose of video search using video fragment data.

[0079] In step S101 of some embodiments, the target object can be a moving object or target text. For example, when the dynamic video data is an introductory video for a car insurance product, the target object can be text such as "car insurance" or "vehicle insurance." When the dynamic video data is a dance video, the target object can be a moving human body. The dynamic video data can be video data in which the target object in the video continuously moves and produces changes in motion.

[0080] This application can obtain the required dynamic video data of the target object through network search, and after obtaining the dynamic video data of the target object, it can extract the frames of the dynamic video data to obtain the motion image frames of the dynamic video data, and sort the motion image frames to obtain the motion image frame sequence.

[0081] In one application scenario, taking a car insurance product introduction video as an example, the server can search for car insurance products online, obtain car insurance product videos, filter the car insurance product videos to obtain dynamic car insurance video data, then perform video segmentation on the dynamic car insurance video data to obtain each frame of car insurance image in the dynamic car insurance video data, and then sort the car insurance images to obtain a car insurance image frame sequence.

[0082] In another application scenario, taking dance videos as an example, the server searches for dances on the network, obtains dance videos, filters the dance videos to obtain dynamic dance videos in which the target object is constantly moving and changing its movements. Furthermore, by extracting the dance dynamic video frame by frame, the dance image frame sequence of the dance dynamic video can be obtained.

[0083] In step S102 of some embodiments, the motion image frame sequence consists mostly of color images. Processing color images is computationally expensive, which lowers the processing efficiency of the motion image frame sequence. Therefore, after the motion image frame sequence is obtained, image preprocessing is required to convert it into a binarized image frame sequence containing only black and white pixels.

[0084] Referring to Figure 2, in some embodiments, step S102 may include, but is not limited to, steps S201 to S202:

[0085] Step S201: Perform grayscale transformation on the motion image frame sequence to obtain a grayscale image frame sequence;

[0086] Step S202: Binarize the grayscale image frame sequence to obtain a binarized image frame sequence.

[0087] In step S201 of some embodiments, after obtaining the motion image frame sequence, it is first necessary to convert the color images in the motion image frame sequence into grayscale images, that is, to perform grayscale transformation on the motion image frame sequence, thereby reducing the processing difficulty of the motion image frame sequence.

[0088] In step S202 of some embodiments, after obtaining the grayscale image frame sequence, a pixel threshold is selected, and each pixel in the grayscale image frame sequence is traversed to obtain its gray value. Each pixel is then judged against the pixel threshold according to its gray value, which binarizes the grayscale image frame sequence and yields the binarized image frame sequence.

[0089] Specifically, if the gray value of a pixel in the grayscale image frame sequence is greater than or equal to the pixel threshold, the pixel is set to 255 (white); if the gray value is less than the pixel threshold, the pixel is set to 0 (black).

[0090] In steps S201 to S202 of this embodiment, a grayscale image frame sequence is obtained by performing grayscale transformation on the motion image frame sequence, and the grayscale image frame sequence is then binarized to obtain a binarized image frame sequence. This reduces the computational cost of handling the motion image frame sequence and thus improves its processing efficiency.
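Steps S201 and S202 can be sketched as below, assuming frames are nested lists of (R, G, B) tuples. The ITU-R BT.601 luma weights and the default threshold of 128 are assumptions; the text specifies only the rule that values at or above the threshold become 255 and the rest become 0.

```python
def to_grayscale(rgb_frame):
    """Convert a frame of (R, G, B) tuples to gray values in 0..255.

    Assumption: BT.601 luma weights; the source does not name a formula.
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_frame]


def binarize(gray_frame, threshold=128):
    """Set pixels >= threshold to 255 (white) and all others to 0 (black)."""
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_frame]


frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
gray = to_grayscale(frame)
binary = binarize(gray)
```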

[0091] In step S103 of some embodiments, after obtaining the binarized image frame sequence, the subject and environment can be segmented according to the pixel values of the pixels in the binarized image frame sequence, thereby obtaining the subject contour set of the target object. It should be noted that the subject contour set includes multiple subject contours of the target object.

[0092] Referring to Figure 3, in some embodiments, step S103 may include, but is not limited to, steps S301 to S304:

[0093] Step S301: Smooth the binarized image frame sequence to obtain a smoothed image frame sequence, wherein the smoothed image frame sequence includes multiple smoothed image frames.

[0094] Step S302: Use a preset operator to perform gradient calculation on each smoothed image frame to obtain a set of candidate subject contours;

[0095] Step S303: Perform non-maximum suppression on the candidate subject contour set to obtain the suppressed subject contour set;

[0096] Step S304: Perform edge connection on the suppressed subject contour set to obtain the subject contour set.

[0097] In step S301 of some embodiments, a Gaussian filter can be used to smooth the binarized image frame sequence to obtain a smoothed image frame sequence including multiple smoothed image frames, thereby reducing the impact of noise on the binarized image frame sequence and improving the accuracy of the binarized image frame sequence.
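A minimal sketch of the smoothing step, assuming a fixed 3x3 binomial kernel as a coarse approximation of a Gaussian and edge replication at the borders. The kernel size, the weights, and the border handling are illustrative choices that the text does not specify.

```python
# 3x3 binomial kernel; its weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_smooth(frame):
    """Smooth a 2D frame (list of rows) with the 3x3 kernel above."""
    height, width = len(frame), len(frame[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += KERNEL[dy + 1][dx + 1] * frame[yy][xx]
            out[y][x] = acc / 16
    return out


spike = [[0, 0, 0], [0, 16, 0], [0, 0, 0]]
smoothed = gaussian_smooth(spike)
```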

[0098] In step S302 of some embodiments, the preset operator can be a pre-set Sobel operator. This application uses the operator to calculate the gradient of each smoothed image frame and, from the calculated gradient, obtains the gradient intensity and gradient direction of each pixel in the smoothed image frame. Based on the gradient intensity and gradient direction of the pixels, the contour of the target object can be roughly determined, yielding a set of candidate subject contours.
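The gradient computation can be sketched with the standard 3x3 Sobel kernels. Leaving border pixels at zero and reporting the direction in radians via `atan2` are assumptions made for this example.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradients(frame):
    """Return (magnitude, direction) grids for a 2D frame; borders stay 0."""
    height, width = len(frame), len(frame[0])
    magnitude = [[0.0] * width for _ in range(height)]
    direction = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(SOBEL_X[j][i] * frame[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * frame[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            magnitude[y][x] = math.hypot(gx, gy)   # gradient intensity
            direction[y][x] = math.atan2(gy, gx)   # gradient direction, radians
    return magnitude, direction


# A vertical black-to-white edge: the gradient points along +x.
edge = [[0, 0, 255, 255] for _ in range(4)]
mag, ang = sobel_gradients(edge)
```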

[0099] In step S303 of some embodiments, after obtaining the candidate subject contour set, non-maximum suppression is performed on the pixels on the candidate subject contours in the candidate subject contour set to further reduce the contour range of the target object and obtain the suppressed subject contour set.

[0100] In detail, this application determines whether to retain the gradient intensity of a pixel by comparing the gradient intensity of a pixel on the candidate subject contour with that of its neighboring pixels. For example, if the gradient intensity of the pixel is greater than the gradient intensity of two neighboring pixels, the gradient intensity of the pixel is retained; otherwise, the gradient intensity of the pixel is set to zero.
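A sketch of the comparison described above. It assumes that the two neighbours are taken along the gradient direction, quantized to four orientations as in the usual Canny formulation; the text itself says only that a pixel is compared with two neighbouring pixels.

```python
import math


def non_max_suppression(magnitude, direction):
    """Keep a pixel's gradient intensity only if it exceeds both neighbours
    along its gradient direction; otherwise set it to zero."""
    height, width = len(magnitude), len(magnitude[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            angle = math.degrees(direction[y][x]) % 180
            if angle < 22.5 or angle >= 157.5:
                dy, dx = 0, 1      # gradient roughly horizontal
            elif angle < 67.5:
                dy, dx = 1, 1      # diagonal (down-right / up-left)
            elif angle < 112.5:
                dy, dx = 1, 0      # gradient roughly vertical
            else:
                dy, dx = 1, -1     # diagonal (down-left / up-right)
            m = magnitude[y][x]
            if m > magnitude[y + dy][x + dx] and m > magnitude[y - dy][x - dx]:
                out[y][x] = m
    return out


ridge = [[0, 0, 0, 0, 0],
         [0, 1, 3, 1, 0],
         [0, 0, 0, 0, 0]]
angles = [[0.0] * 5 for _ in range(3)]
thinned = non_max_suppression(ridge, angles)
```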

[0101] In step S304 of some embodiments, after obtaining the suppressed subject contour set, the pixels are classified by setting a low gradient intensity threshold and a high gradient intensity threshold, and edge connection is then used to decide which of the classified pixels keep their gradient, thereby obtaining the subject contour set of the target object.

[0102] In detail, a low gradient intensity threshold T1 and a high gradient intensity threshold T2 are set. Pixels with gradient intensity greater than T2 are designated as strong pixels, and pixels with gradient intensity between T1 and T2 are designated as weak pixels. The gradient intensity of pixels with gradient intensity less than T1 is suppressed to zero. Furthermore, when a weak pixel is connected to a strong pixel, the gradient intensity of the weak pixel is considered a valid gradient intensity and is retained. If the weak pixel is not connected to a strong pixel, its gradient intensity is suppressed to zero. Finally, edge connection is performed based on the pixels with retained gradient intensity to obtain the subject contour set of the target object.
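The double-threshold rule can be sketched as a breadth-first traversal starting from the strong pixels. Two points are assumptions: connectivity is taken over the 8 neighbours, and a weak pixel counts as connected if it reaches a strong pixel directly or through a chain of other weak pixels.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Return a boolean mask of retained pixels.

    Strong pixels (> high) are always kept; weak pixels (low..high) are kept
    only when connected to a strong pixel; everything else is dropped.
    """
    height, width = len(magnitude), len(magnitude[0])
    keep = [[False] * width for _ in range(height)]
    queue = deque()
    for y in range(height):
        for x in range(width):
            if magnitude[y][x] > high:
                keep[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and not keep[ny][nx] and magnitude[ny][nx] >= low):
                    keep[ny][nx] = True   # weak pixel linked to a kept pixel
                    queue.append((ny, nx))
    return keep


row = [[0, 50, 120, 50, 0, 50]]
mask = hysteresis(row, low=40, high=100)
```

In the example, the isolated weak pixel at the end of the row is dropped because no strong pixel is reachable from it.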

[0103] In steps S301 to S304 of this embodiment, the subject contour set of the target object is obtained by smoothing the binarized image frame sequence, calculating gradients, performing non-maximum suppression, and connecting edges. This improves the accuracy of the subject contours and avoids feature extraction errors caused by incorrect subject contour segmentation.

[0104] In step S104 of some embodiments, the subject morphology codebook can be a collection of representative subject morphology features of the target object. The subject morphology features can be used to encode the video data input by the client, thereby achieving efficient representation and classification of different categories of subject morphology features. It should be noted that the subject morphology codebook includes multiple subject morphology features.

[0105] This application classifies the features of the target object based on the subject contour set of the target object, and then selects representative subject morphology features from each category and fills them into a pre-constructed codebook, thus obtaining the subject morphology codebook.

[0106] Referring to Figure 4, in some embodiments, step S104 may include, but is not limited to, steps S401 to S403:

[0107] Step S401: Calculate three-dimensional Zernike moments on the subject contour set to obtain the subject morphological feature set;

[0108] Step S402: Perform feature clustering on the subject morphological feature set to obtain the subject morphological feature categories;

[0109] Step S403: Generate a subject morphology codebook based on the subject morphological feature categories.
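Steps S402 and S403 can be illustrated with a small k-means routine whose cluster centroids serve as codebook entries. The choice of k-means, the deterministic initialization from the first k features, and the fixed iteration count are all assumptions; the text says only that feature clustering is performed and that representative features from each category populate the codebook.

```python
def kmeans_codebook(features, k, iterations=20):
    """Cluster feature vectors and return k centroids as codebook entries."""
    centroids = [list(f) for f in features[:k]]  # assumption: seed with first k
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(f, centroids[i])),
            )
            clusters[nearest].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy 2D features forming two obvious groups.
toy_features = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
toy_book = kmeans_codebook(toy_features, k=2)
```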

[0110] In step S401 of some embodiments, the main morphological features of each main contour are obtained by calculating the three-dimensional Zernike moment representation of each main contour in the main contour set, and then the main morphological features of each main contour are sorted and integrated to obtain the main morphological feature set.

[0111] Please see Figure 5 In some embodiments, step S401 may include, but is not limited to, steps S501 to S503:

[0112] Step S501: Perform coordinate normalization on each main contour to obtain normalized contour data;

[0113] Step S502: Calculate the Zernike moment based on the normalized contour data;

[0114] Step S503: Perform a three-dimensional transformation on the Zernike moments to obtain the set of main body morphological features.

[0115] In step S501 of some embodiments, the coordinates of each subject contour are normalized using a normalization formula to obtain normalized contour data. This maps each subject contour into the unit circle, which is the domain on which Zernike moments are defined.

[0116] In step S502 of some embodiments, after obtaining the normalized contour data, the normalized contour data is substituted into the calculation formula of the Zernike moment to obtain the Zernike moment of the normalized contour data.

[0117] It should be noted that the data required to calculate the Zernike moment includes the gray values of the normalized contour image (which can be represented using a binarized image).

[0118] In step S503 of some embodiments, the three-dimensional Zernike moments can represent the main morphological features of the target object within the main outline. Therefore, by converting the above Zernike moments into three-dimensional Zernike moments, the set of main morphological features of the target object can be obtained.

[0119] In steps S501 to S503 of this embodiment, the coordinates of the main contours in the main contour set are normalized to obtain normalized contour data. Then, the Zernike moments of each normalized contour data are calculated, and finally, the Zernike moments are transformed into three-dimensional Zernike moments. This allows for the acquisition of a set of main morphological features of the target object, thereby realizing the feature extraction of the target object within the main contour. The extracted main morphological features can effectively describe the shape and features of the target object within the main contour.
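
As a hedged, non-limiting sketch of steps S501 and S502, the following Python code normalizes a contour into the unit circle and evaluates conventional two-dimensional Zernike moments over the contour points, treating each point as a pixel of value 1. The centroid-and-maximum-radius normalization, the division by the number of points, the maximum order `max_order`, and the function names are assumptions of this example. The three-dimensional transformation of step S503 is not detailed in the text and is therefore not implemented here.

```python
import cmath
import math

def normalize_contour(points):
    """S501: shift to the centroid and scale so the farthest point has radius 1."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    r_max = max(math.hypot(x - cx, y - cy) for x, y in points) or 1.0
    return [((x - cx) / r_max, (y - cy) / r_max) for x, y in points]

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - |m| even and |m| <= n."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total

def zernike_moment(points, n, m):
    """S502: discrete Zernike moment A_nm of points already inside the unit circle."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    acc = 0j
    for x, y in points:
        rho = min(math.hypot(x, y), 1.0)  # guard against rounding just above 1
        theta = math.atan2(y, x)
        acc += radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    # Dividing by the point count is an assumption that reduces sampling dependence.
    return (n + 1) / math.pi * acc / len(points)

def zernike_descriptor(points, max_order=4):
    """Magnitudes |A_nm| for all valid (n, m) with n <= max_order and m >= 0."""
    pts = normalize_contour(points)
    return [abs(zernike_moment(pts, n, m))
            for n in range(max_order + 1)
            for m in range(n % 2, n + 1, 2)]
```

Because only the magnitudes are kept, the descriptor of a contour is unchanged (up to rounding) when the contour is translated, uniformly scaled, or rotated, which is one reason Zernike moments are a common choice for shape description.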

[0120] In step S402 of some embodiments, the subject morphology feature set is classified by performing feature clustering on the subject morphology feature set to obtain the subject morphology feature category.

[0121] It should be noted that this application can perform feature clustering on the subject morphological features in the subject morphological feature set using clustering methods such as K-means clustering, hierarchical clustering, and DBSCAN.

[0122] In step S403 of some embodiments, representative subject morphological features are selected from each subject morphological feature category based on the clustering and written into a pre-constructed codebook, thereby generating a subject morphological codebook.

[0123] In steps S401 to S403 of this embodiment, the three-dimensional Zernike moments of the subject contour set are calculated to obtain the subject morphological feature set. Then, based on the subject morphological feature set, the subject morphological features are clustered to determine the subject morphological feature categories. Finally, based on the subject morphological feature categories, representative subject morphological features are selected and stored in a pre-constructed codebook to obtain the subject morphological codebook. This realizes the construction of a feature catalog for complete dynamic video data, thereby providing a search basis for video search of incomplete videos.
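
As a non-limiting illustration of steps S402 and S403, the sketch below clusters feature vectors with K-means (one of the clustering methods named above) and takes, as the representative of each category, the member closest to the cluster centroid. The nearest-to-centroid selection rule, the fixed iteration count, the random seed, and the function names are assumptions of this example; the embodiment does not specify how representatives are chosen.

```python
import math
import random

def kmeans(features, k, iters=20, seed=0):
    """S402: plain K-means over equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each feature goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(f, centroids[c]))
                  for f in features]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def build_codebook(features, k):
    """S403: one representative feature per non-empty category."""
    centroids, labels = kmeans(features, k)
    codebook = []
    for c in range(k):
        members = [f for f, lab in zip(features, labels) if lab == c]
        if members:
            codebook.append(min(members, key=lambda f: math.dist(f, centroids[c])))
    return codebook
```

Storing an actual member feature rather than the centroid keeps every codebook entry traceable to a contour that really occurs in the dynamic video data.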

[0124] In step S105 of some embodiments, the client may be a terminal that inputs video clip data to perform a video search. The video clip data may be incomplete video data, such as a segment of a dance video, a segment of an insurance product introduction video, etc.

[0125] After receiving a video search request carrying video fragment data from a client, the server in this application can obtain the subject dynamic features of the video fragment data by performing feature extraction on the video fragment data.

[0126] In step S106 of some embodiments, after obtaining the dynamic features of the subject in the video segment data, subject morphology features similar to the dynamic features are selected from the subject morphology codebook, and subject morphology features with high similarity are used as target subject morphology features to form a target subject morphology feature set.

[0127] Referring to Figure 6, in some embodiments, step S106 may include, but is not limited to, steps S601 to S603:

[0128] Step S601: Calculate the feature similarity between the dynamic features of the main body and each morphological feature of the main body;

[0129] Step S602: Based on feature similarity and a preset similarity threshold, select features that are similar to the dynamic features of the subject from the subject's morphological features to obtain a candidate morphological feature sequence.

[0130] Step S603: Based on feature similarity, sort the candidate morphological feature sequences to obtain the target subject morphological feature set.

[0131] In step S601 of some embodiments, the dynamic features of the subject are compared with each morphological feature of the subject, and the similarity between the dynamic features of the subject and each morphological feature of the subject is calculated using a similarity algorithm to obtain the feature similarity between the morphological features of the subject and the dynamic features of the subject.

[0132] In step S602 of some embodiments, the preset similarity threshold can be the minimum feature similarity allowed in the candidate morphological feature sequence. It should be noted that the preset similarity threshold in this application can range from 75% to 100%. By comparing the feature similarity between each subject morphological feature and the subject dynamic features with the similarity threshold, the subject morphological features whose feature similarity is greater than the similarity threshold are selected from the subject morphological feature set and stored in a pre-constructed candidate feature sequence, thereby obtaining the candidate morphological feature sequence.

[0133] In step S603 of some embodiments, after obtaining the candidate morphological feature sequence, the main morphological features in the candidate morphological feature sequence are sorted from largest to smallest according to feature similarity, so as to obtain the target main morphological feature set.

[0134] In steps S601 to S603 of this embodiment, the feature similarity between the dynamic features of the subject and each morphological feature of the subject is calculated using a similarity calculation formula. Then, according to a pre-set similarity threshold, candidate morphological feature sequences with feature similarity greater than the similarity threshold are selected from the morphological features of the subject. Finally, the candidate morphological feature sequences are sorted according to feature similarity to obtain a set of target morphological features. This enables feature lookup of the dynamic features of the subject in the morphological codebook, thereby finding the morphological features of the subject that are closest to the dynamic features of the subject in the video clip data, thus improving the accuracy of video search.
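
Steps S601 to S603 can be illustrated with the following minimal sketch. Cosine similarity is used here only as one possible similarity algorithm, and the default threshold of 0.75 is taken from the lower end of the range stated above; the function names and the representation of features as numeric tuples are assumptions of this example.

```python
import math

def cosine_similarity(a, b):
    """S601: cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search_codebook(dynamic_feature, codebook, threshold=0.75):
    """Return (similarity, feature) pairs above the threshold, best match first."""
    scored = [(cosine_similarity(dynamic_feature, f), f) for f in codebook]
    # S602: keep only features whose similarity exceeds the preset threshold.
    candidates = [(s, f) for s, f in scored if s > threshold]
    # S603: sort from largest to smallest similarity.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates
```

For example, searching a codebook `[(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]` with the query `(1.0, 0.0)` keeps the first two entries, in that order, and discards the third, whose similarity is 0.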

[0135] In step S107 of some embodiments, after obtaining the target subject morphological feature set, video data is selected from the dynamic video data as the target video of the segment video data according to the target subject morphological feature set, thereby realizing video search of the segment video data.

[0136] Referring to Figure 7, in some embodiments, step S107 may include, but is not limited to, steps S701 to S703:

[0137] Step S701: Query the dynamic video data for the video data corresponding to the target subject morphological feature set to obtain the target video data;

[0138] Step S702: Obtain the number of clicks on the target video data;

[0139] Step S703: Sort the target video data based on the number of clicks to obtain the target video sequence.

[0140] In step S701 of some embodiments, the target video data can be obtained by querying the video data corresponding to the set of morphological features of the target subject in the dynamic video data.

[0141] It should be noted that there may be one or more items of target video data.

[0142] In steps S702 and S703 of some embodiments, the number of clicks can be the number of times the target video data is clicked by the user to play. In order to improve the watchability of the video data found by the segment video data search, it is often necessary to sort the searched target video data. Specifically, the sorting method of the target video data can be based on dimensions such as the number of plays, the number of clicks, and the video duration.

[0143] This application can crawl the click count of target video data using a web crawler, and then determine the sorting of the target video data based on the click count, thereby achieving the purpose of video search using video clip data.

[0144] In steps S701 to S703 of this embodiment, target video data is obtained by querying the video data corresponding to the target subject morphological feature set in the dynamic video data. Then, the target video data is sorted according to its number of clicks to obtain the target video sequence, which can improve the watchability of the target video data and improve the user experience.
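
A minimal sketch of steps S701 to S703 is given below for illustration. The in-memory index `feature_to_videos`, the `click_counts` dictionary (standing in for counts obtained by a crawler), the use of string identifiers for features and videos, and the function name are all hypothetical and are not prescribed by the embodiment.

```python
def rank_target_videos(target_features, feature_to_videos, click_counts):
    """Collect videos matching the target features and order them by clicks."""
    seen = set()
    videos = []
    # S701: look up the videos associated with each target feature, without duplicates.
    for feature_id in target_features:
        for video_id in feature_to_videos.get(feature_id, []):
            if video_id not in seen:
                seen.add(video_id)
                videos.append(video_id)
    # S702 and S703: fetch each click count and sort from most to least clicked.
    return sorted(videos, key=lambda v: click_counts.get(v, 0), reverse=True)
```

Videos with no recorded click count are treated as having zero clicks and therefore appear last in the target video sequence.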

[0145] This application performs subject-environment segmentation on the acquired dynamic video data containing the target object to obtain a subject contour set of the target object. Based on the subject contour set, a subject morphology codebook is constructed, forming a search database of video features. Furthermore, when a video search request carrying video fragment data is received from a client, the subject dynamic features in the video fragment data are extracted, and features similar to the subject dynamic features are searched from the subject morphology codebook to obtain a target subject morphology feature set. This ensures that the video features in the incomplete video match the video features of the target video. Then, based on the target subject morphology feature set, video search is performed on the video fragment data, thus achieving the purpose of video search using video fragment data.

[0146] Referring to Figure 8, this application also provides a video search device that can implement the above-described video search method. The device includes:

[0147] The video segmentation module 801 is used to acquire dynamic video data containing the target object and to perform video segmentation on the dynamic video data to obtain a motion image frame sequence;

[0148] The video frame binarization module 802 is used to perform image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence;

[0149] The edge detection module 803 is used to perform subject-environment segmentation on the binarized image frame sequence to obtain the subject contour set of the target object;

[0150] The codebook construction module 804 is used to construct a subject morphology codebook based on the subject contour set;

[0151] The feature extraction module 805 is used to extract dynamic features from the video fragment data in response to a video search request sent by the client and carrying the video fragment data, thereby obtaining the subject dynamic features;

[0152] The feature approximation search module 806 is used to search the subject morphology codebook for features similar to the subject dynamic features to obtain a target subject morphology feature set;

[0153] The video search module 807 is used to perform video search on the video fragment data based on the target subject morphology feature set.

[0154] The specific implementation of this video search device is basically the same as the specific implementation of the video search method described above, and will not be repeated here.

[0155] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the aforementioned video search method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.

[0156] Referring to Figure 9, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:

[0157] The processor 901 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.

[0158] The memory 902 can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 902 and is called by the processor 901 to execute the video search method of the embodiments of this application.

[0159] The input / output interface 903 is used to implement information input and output;

[0160] The communication interface 904 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).

[0161] The bus 905 is used to transmit information between the various components of the device (e.g., the processor 901, the memory 902, the input / output interface 903, and the communication interface 904);

[0162] The processor 901, memory 902, input / output interface 903, and communication interface 904 are connected to each other within the device via bus 905.

[0163] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the video search method described above.

[0164] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0165] The video search method, video search device, electronic device, and storage medium provided in the embodiments of this application perform subject-environment segmentation on the acquired dynamic video data containing the target object to obtain a subject contour set of the target object. Then, based on the subject contour set, a subject morphology codebook is constructed to form a search database of video features. Furthermore, when a video search request carrying video fragment data is received from a client, the subject dynamic features in the video fragment data are extracted, and features similar to the subject dynamic features are searched from the subject morphology codebook to obtain a target subject morphology feature set. This ensures that the video features in the incomplete video match the video features of the target video. Then, based on the target subject morphology feature set, video search is performed on the video fragment data, thereby achieving the purpose of video search using video fragment data.

[0166] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0167] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, which may include more or fewer steps than shown, combine certain steps, or include different steps.

[0168] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0169] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0170] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0171] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0172] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. The coupling or direct coupling or communication connection between the shown or discussed units may be through some interfaces, or indirect coupling or communication connection between the apparatus or units, and may be electrical, mechanical, or other forms.

[0173] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0174] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0175] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0176] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. A video search method, characterized in that it is applied to a server side, the method comprising:
acquiring dynamic video data containing a target object, and performing video segmentation on the dynamic video data to obtain a motion image frame sequence;
performing image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence;
performing subject-environment segmentation on the binarized image frame sequence to obtain a subject contour set of the target object, wherein the subject contour set includes multiple subject contours;
constructing a subject morphology codebook based on the subject contour set;
in response to a video search request sent by a client and carrying video fragment data, extracting dynamic features from the video fragment data to obtain subject dynamic features;
searching the subject morphology codebook for features similar to the subject dynamic features to obtain a target subject morphology feature set; and
performing video search on the video fragment data based on the target subject morphology feature set;
wherein constructing the subject morphology codebook based on the subject contour set includes:
performing three-dimensional Zernike moment calculation on the subject contour set to obtain a subject morphology feature set;
performing feature clustering on the subject morphology feature set to obtain subject morphology feature categories; and
generating the subject morphology codebook based on the subject morphology feature categories;
and wherein performing three-dimensional Zernike moment calculation on the subject contour set to obtain the subject morphology feature set includes:
performing coordinate normalization on each of the subject contours to obtain normalized contour data;
calculating Zernike moments based on the normalized contour data; and
performing a three-dimensional transformation on the Zernike moments to obtain the subject morphology feature set.

2. The method of claim 1, wherein the step of performing video search on the video fragment data based on the target subject morphology feature set includes: querying the dynamic video data for the video data corresponding to the target subject morphology feature set to obtain target video data; obtaining the number of clicks on the target video data; and sorting the target video data based on the number of clicks to obtain a target video sequence.

3. The method of claim 1, wherein the step of performing subject-environment segmentation on the binarized image frame sequence to obtain the subject contour set of the target object includes: smoothing the binarized image frame sequence to obtain a smoothed image frame sequence, wherein the smoothed image frame sequence includes multiple smoothed image frames; performing gradient calculation on each smoothed image frame using a preset operator to obtain a candidate subject contour set; performing non-maximum suppression on the candidate subject contour set to obtain a suppressed subject contour set; and performing edge connection on the suppressed subject contour set to obtain the subject contour set.

4. The method according to any one of claims 1 to 3, characterized in that the subject morphology codebook includes multiple subject morphology features, and the step of searching the subject morphology codebook for features similar to the subject dynamic features to obtain the target subject morphology feature set includes: calculating the feature similarity between the subject dynamic features and each of the subject morphology features; based on the feature similarity and a preset similarity threshold, selecting features that are similar to the subject dynamic features from the subject morphology features to obtain a candidate morphological feature sequence; and sorting the candidate morphological feature sequence based on the feature similarity to obtain the target subject morphology feature set.

5. The method according to any one of claims 1 to 3, characterized in that the step of performing image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence includes: performing grayscale transformation on the motion image frame sequence to obtain a grayscale image frame sequence; and binarizing the grayscale image frame sequence to obtain the binarized image frame sequence.

6. A video search apparatus applied to the video search method according to any one of claims 1 to 5, characterized in that the apparatus includes: a video segmentation module, used to acquire dynamic video data containing the target object and to perform video segmentation on the dynamic video data to obtain a motion image frame sequence; a video frame binarization module, used to perform image preprocessing on the motion image frame sequence to obtain a binarized image frame sequence; an edge detection module, used to perform subject-environment segmentation on the binarized image frame sequence to obtain the subject contour set of the target object; a codebook construction module, used to construct a subject morphology codebook based on the subject contour set; a feature extraction module, used to extract dynamic features from the video fragment data in response to a video search request sent by the client and carrying the video fragment data, so as to obtain the subject dynamic features; a feature approximation search module, used to search the subject morphology codebook for features similar to the subject dynamic features, so as to obtain a target subject morphology feature set; and a video search module, used to perform video search on the video fragment data based on the target subject morphology feature set.

7. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the video search method according to any one of claims 1 to 5.

8. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the video search method according to any one of claims 1 to 5.