A portrait instance tracking method, device, equipment and medium
By using a Kalman filter to predict the location of a person and segment the target person when the device malfunctions, the problem of missing person images is solved, and the accuracy of person instance tracking is improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HISENSE GRP HLDG CO LTD
- Filing Date
- 2021-09-27
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, because portrait segmentation and portrait tracking are independent processes, an equipment failure may cause the portrait to be lost, reducing the accuracy of portrait instance tracking.
If the target human image is not segmented in no more than a preset number of image frames, a preset number of image frames acquired before the first of those frames are obtained, the position of the human image is predicted using a Kalman filter, and the target human image is segmented using a human image segmentation model.
Even in the event of equipment failure, the lost human images can still be recovered, improving the accuracy of human image instance tracking.
Figure CN115880329B_ABST
Description
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to a method, apparatus, device and medium for tracking human portrait instances.
Background Technology
[0002] As technology develops, users place ever higher demands on image processing, and there are more and more ways to replace the background of a video. To replace everything in a video other than the human figure with another background, the usual approach is to track the human figure in the video, segment it, and then use the segmented figure for background replacement.
[0003] However, in existing technologies, when tracking people in videos, person segmentation and person tracking are two independent processes. Person segmentation involves determining the bounding box containing the person using a model, and then segmenting the person from within that bounding box. Person tracking, on the other hand, involves determining the bounding box using a model, then identifying the target person based on the similarity of features between the bounding boxes in adjacent image frames, and determining the image frames containing that target person. Finally, the target persons segmented from these image frames are sorted according to their acquisition time, and the sorted result is taken as the person tracking result. However, during person segmentation, an equipment malfunction may prevent the target person from being segmented from an image frame that contains it, so that the person is missing from the final tracking result and tracking accuracy is reduced.
Summary of the Invention
[0004] This application provides a method, apparatus, device, and medium for human image instance tracking, which solves the problem in the prior art where equipment failure during human image segmentation results in the failure to segment the target human image from an image frame containing the target human image, leading to the loss of the human image in the obtained human image instance tracking results and low accuracy of human image instance tracking.
[0005] In a first aspect, this application provides a human image instance tracking method, the method comprising:
[0006] If it is detected that the target human image has not been segmented in no more than a first preset number of first image frames, acquiring a second preset number of second image frames acquired before the acquisition time of the earliest first image frame;
[0007] Based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frame, predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frame;
[0008] The target human image is segmented from the first image frame based on the second location information.
[0009] In a second aspect, this application also provides a human image instance tracking device, the device comprising:
[0010] The acquisition module is used to acquire a second preset number of second image frames acquired before the acquisition time of the earliest first image frame if it is detected that the target human image has not been segmented in no more than a first preset number of first image frames;
[0011] The prediction module is used to predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frame based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frame;
[0012] The portrait segmentation module is used to segment the target portrait from the first image frame based on the second location information.
[0013] Thirdly, this application also provides an electronic device, which includes at least a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the steps of any of the above-described human image instance tracking methods.
[0014] Fourthly, this application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above-described human image instance tracking methods.
[0015] In this application, if the target human image is not segmented in no more than a first preset number of first image frames, a second preset number of second image frames acquired before the acquisition time of the earliest first image frame are obtained. Based on the first position information of the first target portrait frame corresponding to the target human image in the second image frames, the second position information of the second target portrait frame corresponding to the target human image in each first image frame is predicted, and the target human image is segmented from the first image frame based on that second position information. In other words, when the target human image is missing from no more than the first preset number of first image frames, it is considered that the human image was missed or mis-segmented during detection and segmentation. The lost human image can therefore be recovered even if the device malfunctions during segmentation and the target human image is not segmented from an image frame that contains it, yielding a complete human image instance tracking result and improving the accuracy of human image instance tracking.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0017] Figure 1 is a flowchart of a human image instance tracking process in the related art;
[0018] Figure 2 is a schematic diagram of a human image instance tracking process provided by some embodiments of this application;
[0019] Figure 3 is a schematic structural diagram of the human image instance tracking device provided in this application;
[0020] Figure 4 is a schematic structural diagram of an electronic device provided in this application.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0022] Figure 1 is a flowchart of the human image instance tracking process in the related art. The process includes:
[0023] S101: Obtain image frames from the video to be used for human portrait instance tracking.
[0024] S102: Input each image frame into the encoder-decoder and receive the feature map corresponding to the image frame output by the encoder-decoder.
[0025] S103: Input the feature map into the target portrait detection network to determine the target portrait bounding box corresponding to the target portrait.
[0026] S104: Segment the human portrait frame with the highest confidence from the image frame, and input the segmented human portrait frame into the human portrait segmentation model. The human portrait segmentation model generates an instance mask for the human portrait frame to obtain the mask image corresponding to the human portrait frame, and segments the human portrait from the mask image through the semantic segmentation network.
[0027] S105: Determine the image frame containing the target human image.
[0028] S106: Based on the determined image frame containing the target human image and the target human image segmented from the image frame, obtain the human image instance segmentation result.
[0029] In human image instance tracking, each image frame of the video to be tracked is acquired and input into the encoder-decoder to determine the corresponding feature map. The encoder is a series of convolutional networks, including at least convolutional layers, pooling layers, and batch normalization (BN) layers. The convolutional layers extract local features of the image frame, the pooling layers downsample the image frame and pass scale-invariant features to the next layer, and the BN layers normalize the distribution of the image frame. In other words, the encoder classifies and analyzes the low-level local pixel values of the image frame to obtain higher-order semantic information, such as "human image." The decoder upsamples the reduced feature map and then convolves the upsampled feature map to refine the geometry of the objects in it, which reduces the detail lost to downsampling in the encoder's pooling layers and yields a feature map of the same size as the image frame. The feature map corresponding to the image frame is then input into the portrait detection model, which can be, for example, Mask R-CNN. The portrait detection model outputs an image frame containing a portrait frame, together with the confidence score corresponding to that portrait frame.
[0030] The highest-confidence human portrait frames are segmented from the image frames and input into the human portrait segmentation model. The human portrait segmentation model generates an instance mask for the human portrait frame to obtain the mask image corresponding to the human portrait frame. The human portrait is then segmented from the mask image through a semantic segmentation network.
[0031] Furthermore, the highest-confidence human frame in each image frame of the video is identified and input into the target tracking network. This network transforms the size of the human frames to ensure consistency across all frames. Features are then extracted from the human frame corresponding to the first image frame, and an encoding is assigned to that frame. The similarity between the features in the human frame corresponding to the second image frame and those in the first image frame is compared. If the similarity is not lower than a preset similarity threshold, the encoding of the second image frame is set to that of the first image frame. If the similarity is lower than the preset threshold, a different encoding is assigned to the second image frame, and so on, until all image frames are encoded. The target encoding containing the most image frames is identified, and the human images segmented from these target image frames are identified as the target human images to be tracked. The segmented target human images are arranged according to the acquisition time of the target image frames, and the arrangement result is determined as the human image instance segmentation result.
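The encoding step of the related-art tracker described in [0031] can be sketched as follows. This is a minimal illustration only: the use of cosine similarity, the default threshold of 0.8, and the names `cosine_similarity`, `assign_codes` and `target_frames` are assumptions made for the example and do not come from the source, which does not specify the feature extractor or the similarity measure.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_codes(features, threshold=0.8):
    """Assign an encoding to each frame's portrait box.

    A frame reuses the previous frame's code when the two feature vectors
    are at least `threshold` similar; otherwise it receives a new code.
    """
    codes = []
    next_code = 0
    for i, feat in enumerate(features):
        if i > 0 and cosine_similarity(features[i - 1], feat) >= threshold:
            codes.append(codes[-1])
        else:
            codes.append(next_code)
            next_code += 1
    return codes


def target_frames(codes):
    """Indices of the frames that carry the most frequent code."""
    target_code, _ = Counter(codes).most_common(1)[0]
    return [i for i, c in enumerate(codes) if c == target_code]


# Three frames show the same person, the fourth shows someone else.
features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
codes = assign_codes(features)
frames = target_frames(codes)
```

Because frames are returned in index order, the segmented portraits from `frames` are already arranged by acquisition time, as long as the input list is.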
[0032] In this application, if it is detected that the target human image has not been segmented in no more than a first preset number of first image frames, a second preset number of second image frames acquired before the acquisition time of the earliest of those first image frames are obtained. Based on the first position information of the first target portrait frame corresponding to the target human image in the second image frames, the second position information of the second target portrait frame corresponding to the target human image in each first image frame is predicted. Based on the second position information, the target human image is segmented from the first image frame.
[0033] In order to obtain a complete image instance tracking result even when the target image is not segmented from an image frame containing the target image, thereby improving the accuracy of image instance tracking, this application provides an image instance tracking method, apparatus, device and medium.
[0034] Figure 2 is a schematic diagram of a human image instance tracking process provided by some embodiments of this application. The process includes:
[0035] S201: If it is detected that the target human image has not been segmented in no more than a first preset number of first image frames, acquire a second preset number of second image frames acquired before the acquisition time of the earliest first image frame.
[0036] The present application provides a method for tracking human images, which can be applied to electronic devices such as image acquisition devices, PCs, and servers.
[0037] In this application, since the human images contained in different image frames may differ, the human images segmented from the image frames may not all belong to the same person. When performing human image instance tracking, a human image is segmented from each image frame, the human image that appears in the most image frames is identified, and that human image is tracked. It is therefore necessary to determine the image frames corresponding to the most frequently occurring human image.
[0038] If the electronic device detects a first image frame in which the target person's image has not been segmented, it obtains the number of such first image frames. If the number of such first image frames does not exceed a first preset number, it is considered that the electronic device has malfunctioned, resulting in the failure to segment the target person's image from the first image frames not exceeding the first preset number. In this case, the position of the target person's image in each first image frame can be predicted. In this application, the first image frames not exceeding the first preset number are consecutive image frames in the video.
[0039] Specifically, if the electronic device detects that the target human image has not been segmented in no more than a first preset number of first image frames, it acquires a second preset number of second image frames acquired before the acquisition time of the earliest first image frame, so as to predict the position of the target human image in each first image frame based on those second image frames.
[0040] Furthermore, in this application, if the number of first image frames from which the target image is not segmented exceeds a first preset number, it is considered that the failure to segment the target image from the first image frame is not due to a malfunction of the electronic device, but rather because the target image does not actually exist in the first image frame.
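The decision described in [0038] to [0040] can be sketched as follows. This is a hedged illustration: the per-frame boolean list `segmented`, the parameter names `max_gap` (standing in for the first preset number) and `count` (standing in for the second preset number), and the function names are all assumptions introduced for the example.

```python
def find_recoverable_gaps(segmented, max_gap):
    """Find runs of consecutive frames where the target was not segmented.

    `segmented[i]` is True when the target portrait was segmented in frame i.
    Runs no longer than `max_gap` are treated as device faults and returned
    as inclusive (start, end) index pairs. Longer runs are skipped, on the
    assumption that the target is genuinely absent from those frames.
    """
    gaps = []
    i = 0
    n = len(segmented)
    while i < n:
        if segmented[i]:
            i += 1
            continue
        start = i
        while i < n and not segmented[i]:
            i += 1
        if i - start <= max_gap:
            gaps.append((start, i - 1))
    return gaps


def history_before(start, count):
    """Indices of up to `count` frames acquired just before frame `start`."""
    return list(range(max(0, start - count), start))


segmented = [True, True, False, False, True, False, False, False, False, True]
gaps = find_recoverable_gaps(segmented, max_gap=3)
```

With `max_gap=3`, the two-frame gap at indices 2 to 3 is kept for prediction, while the four-frame gap at indices 5 to 8 is treated as the target having left the scene.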
[0041] S202: Based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frame, predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frame.
[0042] In this application, after determining the second preset number of second image frames acquired before the acquisition time of the first image frame, it is necessary to determine the target human image in the first image frame based on these second image frames. In this application, when determining the target human image in the first image frame based on the second image frames, the position information of the target human image frame corresponding to the target human image in each first image frame is determined.
[0043] Specifically, in this application, after the second preset number of second image frames is determined, the first position information of the first target portrait frame corresponding to the target human image in each second image frame is determined, yielding a second preset number of pieces of first position information. These are input into a Kalman filter, which predicts the second position information of the second target portrait frame corresponding to the target human image in each first image frame. From the input first position information, the Kalman filter calculates how the velocity of the first target portrait frame changes in the X direction across the second image frames, obtaining a function of the velocity change in the X direction, and likewise determines how the velocity changes in the Y direction, obtaining a function of the velocity change in the Y direction. Based on these two functions, the second position information corresponding to the second target portrait frame in each first image frame is predicted.
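A minimal sketch of this prediction step is given below. It assumes a constant-velocity model with one independent two-state filter per box coordinate and a time step of one frame; the source does not specify the filter's state model or noise parameters, so the class name `ConstantVelocityKalman1D`, the function `predict_missing_boxes`, the box layout `(x_min, y_min, x_max, y_max)` and the variance values are all illustrative assumptions.

```python
class ConstantVelocityKalman1D:
    """Two-state (position, velocity) Kalman filter for one box coordinate.

    process_var and measurement_var are illustrative tuning values,
    not taken from the source.
    """

    def __init__(self, first_position, process_var=1e-3, measurement_var=1e-2):
        self.p = float(first_position)      # position estimate
        self.v = 0.0                        # velocity estimate, units per frame
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q = process_var
        self.r = measurement_var

    def predict(self):
        """Advance the state by one frame and return the predicted position."""
        (a, b), (c, d) = self.P
        self.p += self.v
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]] and Q = q * I
        self.P = [[a + b + c + d + self.q, b + d], [c + d, d + self.q]]
        return self.p

    def update(self, measured_position):
        """Correct the state with an observed position (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                      # innovation variance
        k0, k1 = a / s, c / s               # Kalman gain
        residual = measured_position - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def predict_missing_boxes(history_boxes, n_missing):
    """Fit one filter per coordinate on the history, then extrapolate.

    `history_boxes` holds (x_min, y_min, x_max, y_max) tuples, oldest first.
    Returns `n_missing` predicted boxes, one per missing frame.
    """
    filters = [ConstantVelocityKalman1D(c) for c in history_boxes[0]]
    for box in history_boxes[1:]:
        for f, z in zip(filters, box):
            f.predict()
            f.update(z)
    return [tuple(f.predict() for f in filters) for _ in range(n_missing)]


# A box drifting 2 px per frame in X and 1 px per frame in Y.
history = [(10 + 2 * t, 20 + t, 50 + 2 * t, 80 + t) for t in range(8)]
predicted = predict_missing_boxes(history, 3)
```

For this steadily moving box, the predictions continue the same drift, so the first predicted box lies close to (26, 28, 66, 88). With no measurements available during the gap, the filter simply propagates its last velocity estimate.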
[0044] S203: Based on the second location information, segment the target human image from the first image frame.
[0045] In this application, after determining the second position information of the second target image frame corresponding to the target image in each first image frame where the target image has not been segmented, for each first image frame, the electronic device determines the second target image frame in the first image frame based on the second position information corresponding to the second target image frame in the first image frame, and then segments the target image from the second target image frame.
[0046] In this application, if the target human image is not segmented in no more than a first preset number of first image frames, it is considered that the human image was missed or mis-segmented during human image detection and segmentation. In that case, the second position information of the target human image in each first image frame can be predicted from the first position information of the target human image in the second preset number of second image frames acquired before the acquisition time of the earliest first image frame, and the target human image can be segmented from the first image frame. This makes it possible to recover the lost human image even if the device malfunctions during human image segmentation and the target human image is not segmented from an image frame that contains it, thus obtaining a complete human image instance tracking result and improving the accuracy of human image instance tracking.
[0047] To improve the accuracy of the predicted second location information, based on the above embodiments, in this application, after predicting the second location information corresponding to the target portrait in the first image frame, the method further includes:
[0048] Obtain the first third image frame acquired after the acquisition time of the last first image frame;
[0049] Determine the deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame;
[0050] If the deviation exceeds a preset difference threshold, adjust, according to the deviation, the second position information in the no more than first preset number of first image frames.
[0051] In this application, since the second position information of the second target human image frame in each first image frame is predicted based on the position information of the target human image in the second image frame, and the second image frame is acquired before the acquisition time of the first first image frame, the second position information may deviate significantly from the third position information of the third target human image frame corresponding to the target human image in the first third image frame acquired after the acquisition time of the last first image frame. In this case, it can be considered that the prediction of the second position information in each first image frame is inaccurate.
[0052] To further improve the accuracy of the second position information, in this application, after the second position information has been predicted for each of the no more than first preset number of first image frames, its accuracy is judged using the first third image frame acquired after the acquisition time of the last first image frame, and the result determines whether the second position information in each first image frame is adjusted.
[0053] Specifically, in this application, after predicting the second position information corresponding to the second target person frame in the first image frame, the first third image frame acquired after the acquisition time of the last first image frame is obtained, and the third position information of the third target person frame corresponding to the target person in the third image frame is determined. The deviation between the third position information and the second position information of the second target person frame in the last first image frame is determined. If the deviation exceeds a preset difference threshold, the predicted second position information in each first image frame is considered inaccurate, and the predicted second position information in no more than a first preset number of first image frames is adjusted according to the deviation.
[0054] Furthermore, in this application, if the deviation between the third position information in the third image frame and the second position information in the last first image frame does not exceed a preset difference threshold, then the predicted second position information in each first image frame is considered accurate, and there is no need to adjust the second position information in each first image frame.
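The accuracy check in [0053] and [0054] can be sketched as follows. The source does not say how a per-vertex deviation is compared against a single threshold, so taking the largest absolute coordinate difference is an assumption made here, as are the function names `box_deviation` and `needs_adjustment`.

```python
def box_deviation(observed_box, predicted_box):
    """Per-vertex (dx, dy) differences between two portrait frames.

    Each box is a list of four (x, y) vertices in the same order.
    """
    return [
        (ox - px, oy - py)
        for (ox, oy), (px, py) in zip(observed_box, predicted_box)
    ]


def needs_adjustment(deviation, threshold):
    """True when the largest coordinate difference exceeds the threshold."""
    return max(max(abs(dx), abs(dy)) for dx, dy in deviation) > threshold


# Box observed in the third image frame vs. the prediction for the last gap frame.
observed = [(10, 10), (50, 10), (50, 90), (10, 90)]
predicted_last = [(4, 8), (44, 8), (44, 88), (4, 88)]
deviation = box_deviation(observed, predicted_last)
```

Here every vertex is off by (6, 2), so a threshold of 5 pixels triggers an adjustment while a threshold of 10 pixels leaves the predictions untouched.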
[0055] To improve the accuracy of the predicted second location information, based on the above embodiments, in this application, adjusting the second location information in no more than a first preset number of first image frames according to the deviation includes:
[0056] Based on the number of the first image frames, the deviation is divided into a corresponding number of sub-deviations, wherein the sum of the sub-deviations is the deviation;
[0057] The second position information in the first image frame is adjusted according to each of the sub-deviations.
[0058] In this application, if the deviation between the third position information in the determined third image frame and the second position information in the last first image frame exceeds a preset difference threshold, then the predicted second position information in each first image frame is considered inaccurate, and the predicted second position information in no more than a first preset number of first image frames needs to be adjusted according to the deviation.
[0059] Specifically, in this application, when adjusting the second position information in no more than a first preset number of first image frames, the deviation is divided into a corresponding number of sub-deviations based on the number of first image frames. The sum of these corresponding number of sub-deviations is the deviation. Then, the second position information in each first image frame is adjusted according to the sub-deviation. In this application, the value of each sub-deviation can be the same or different.
[0060] Furthermore, in this application, the second position information is the coordinate information of the four vertices of the second target portrait frame in the first image frame, and the third position information is the coordinate information of the four vertices of the third target portrait frame in the third image frame. The deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame is the difference between the coordinate information of each vertex of the third target portrait frame and the coordinate information of the vertex corresponding to the second target portrait frame in the last first image frame. When the deviation is divided into sub-deviations equal to the number of first image frames, each sub-deviation includes the sub-difference value of the coordinate information corresponding to the four vertices respectively, and the vertices corresponding to the second target portrait frame in each first image frame are adjusted according to the sub-difference value of the coordinate information corresponding to each vertex included in the sub-deviation.
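One way to read the sub-deviation adjustment in [0059] and [0060] is sketched below. The source allows equal or unequal sub-deviations and does not state whether a frame receives only its own sub-deviation or the running total of all sub-deviations up to that frame. This sketch uses equal sub-deviations and, by default, the running total so that the correction grows smoothly across the gap; both choices, and the names `split_deviation` and `adjust_boxes`, are assumptions.

```python
def split_deviation(deviation, n_frames):
    """Split a per-vertex (dx, dy) deviation into n_frames equal sub-deviations."""
    return [
        [(dx / n_frames, dy / n_frames) for dx, dy in deviation]
        for _ in range(n_frames)
    ]


def adjust_boxes(predicted_boxes, deviation, cumulative=True):
    """Shift each predicted box by its share of the deviation.

    With cumulative=True, frame k is shifted by the sum of the first k
    sub-deviations. With cumulative=False, each frame is shifted by its
    own sub-deviation only.
    """
    subs = split_deviation(deviation, len(predicted_boxes))
    running = [(0.0, 0.0)] * len(deviation)
    adjusted = []
    for box, sub in zip(predicted_boxes, subs):
        running = [(rx + sx, ry + sy) for (rx, ry), (sx, sy) in zip(running, sub)]
        shift = running if cumulative else sub
        adjusted.append([(x + dx, y + dy) for (x, y), (dx, dy) in zip(box, shift)])
    return adjusted


box = [(0, 0), (10, 0), (10, 20), (0, 20)]
deviation = [(6, 3)] * 4
adjusted = adjust_boxes([box, box, box], deviation)
```

With three gap frames and a deviation of (6, 3) per vertex, each sub-deviation is (2, 1), so the three boxes are shifted by (2, 1), (4, 2) and (6, 3) respectively.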
[0061] To reacquire the lost human image from the first image frame and improve the accuracy of human image instance tracking, based on the above embodiments, in this application, the step of segmenting the target human image from the first image frame according to the second location information includes:
[0062] Based on the second location information, a second target human figure frame is determined in the first image frame;
[0063] The second target portrait frame is input into the trained portrait segmentation model to segment the target portrait.
[0064] In this application, after determining the second position information of the second target human image frame in each first image frame, the target human image is segmented from the first image frame based on the second position information in that first image frame. Specifically, for each first image frame, the second target human image frame corresponding to the second position information in that first image frame is determined based on the second position information in that first image frame, and then the second target human image frame is input into the trained human image segmentation model to segment the target human image.
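The two sub-steps in [0062] and [0063] can be sketched as follows. The frame is represented as a plain list of pixel rows and the box as `(x_min, y_min, x_max, y_max)`; `toy_model` is only a stand-in for the trained portrait segmentation model, whose architecture and interface the source does not describe. All names here are hypothetical.

```python
def crop(frame, box):
    """Cut the box region out of a frame stored as a list of pixel rows.

    `box` is (x_min, y_min, x_max, y_max) and is clipped to the frame bounds.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    x0, y0 = max(0, int(round(box[0]))), max(0, int(round(box[1])))
    x1, y1 = min(width, int(round(box[2]))), min(height, int(round(box[3])))
    return [row[x0:x1] for row in frame[y0:y1]]


def segment_target(frame, box, segmentation_model):
    """Crop the predicted portrait frame and pass it to the segmentation model."""
    return segmentation_model(crop(frame, box))


def toy_model(patch):
    """Stand-in for a trained model: marks non-zero pixels as portrait."""
    return [[1 if value else 0 for value in row] for row in patch]


frame = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
mask = segment_target(frame, (1, 1, 3, 3), toy_model)
```

Clipping the box to the frame bounds matters here because a predicted box, unlike a detected one, can drift partly outside the image.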
[0065] Figure 3 is a schematic structural diagram of the human image instance tracking device provided in this application. As shown in Figure 3, the device includes:
[0066] The acquisition module 301 is used to acquire a second preset number of second image frames acquired before the acquisition time of the first image frame if it is detected that the target human image has not been segmented in no more than a first preset number of first image frames.
[0067] The prediction module 302 is used to predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frame based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frame;
[0068] The portrait segmentation module 303 is used to segment the target portrait from the first image frame based on the second location information.
[0069] In one possible implementation, the acquisition module 301 is further configured to acquire the first third image frame acquired after the acquisition time of the last first image frame.
[0070] The device further includes:
[0071] The adjustment module 304 is used to determine the deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame; if the deviation exceeds a preset difference threshold, the second position information in the no more than first preset number of first image frames is adjusted according to the deviation.
[0072] In one possible implementation, the adjustment module 304 is specifically configured to divide the deviation into a corresponding number of sub-deviations based on the number of the first image frames, wherein the sum of the sub-deviations is the deviation; and to adjust the second position information in the first image frame according to each sub-deviation.
[0073] In one possible implementation, the portrait segmentation module 303 is specifically used to determine a second target portrait frame in the first image frame based on the second location information; input the second target portrait frame into a trained portrait segmentation model to segment the target portrait.
[0074] Figure 4 is a schematic structural diagram of an electronic device provided in this application. Based on the above embodiments, this application also provides an electronic device which, as shown in Figure 4, includes: processor 401, communication interface 402, memory 403 and communication bus 404, wherein processor 401, communication interface 402 and memory 403 communicate with each other through communication bus 404.
[0075] The memory 403 stores a computer program, which, when executed by the processor 401, causes the processor 401 to perform the following steps:
[0076] If it is detected that the target human image has not been segmented in no more than a first preset number of first image frames, acquire a second preset number of second image frames acquired before the acquisition time of the earliest first image frame;
[0077] Based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frame, predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frame;
[0078] The target human image is segmented from the first image frame based on the second location information.
[0079] In one possible implementation, after predicting the second location information corresponding to the target portrait in the first image frame, the method further includes:
[0080] Obtain the first third image frame acquired after the acquisition time of the last first image frame;
[0081] Determine the deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame;
[0082] If the deviation exceeds a preset difference threshold, adjust, according to the deviation, the second position information in the no more than first preset number of first image frames.
[0083] In one possible implementation, adjusting the second position information in no more than a first preset number of first image frames according to the deviation includes:
[0084] Based on the number of the first image frames, the deviation is divided into a corresponding number of sub-deviations, wherein the sum of the sub-deviations is the deviation;
[0085] The second position information in the first image frame is adjusted according to each of the sub-deviations.
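One way to read this adjustment is sketched below. The text only requires that the sub-deviations sum to the deviation; the equal split and the cumulative application (so that the last first image frame is shifted by the full deviation) are assumptions of this sketch, and `split_deviation` and `adjust_predictions` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed encoding: (cx, cy, w, h)


def split_deviation(deviation: Box, n_frames: int) -> List[Box]:
    """Divide the deviation into n_frames equal sub-deviations (assumed equal split)."""
    sub = tuple(d / n_frames for d in deviation)
    return [sub for _ in range(n_frames)]


def adjust_predictions(predicted: List[Box], deviation: Box) -> List[Box]:
    """Shift the i-th predicted box by the sum of the first i sub-deviations."""
    subs = split_deviation(deviation, len(predicted))
    shift = [0.0, 0.0, 0.0, 0.0]
    adjusted = []
    for box, sub in zip(predicted, subs):
        shift = [s + d for s, d in zip(shift, sub)]          # accumulate sub-deviations
        adjusted.append(tuple(b + s for b, s in zip(box, shift)))
    return adjusted


predicted = [(12.0, 0.0, 4.0, 8.0), (14.0, 0.0, 4.0, 8.0)]
adjusted = adjust_predictions(predicted, (4.0, 0.0, 0.0, 0.0))
```

Spreading the correction across the frames avoids a visible jump between the last predicted box and the box detected in the third image frame.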
[0086] In one possible implementation, segmenting the target human image from the first image frame based on the second location information includes:
[0087] Based on the second location information, a second target human figure frame is determined in the first image frame;
[0088] The second target portrait frame is input into the trained portrait segmentation model to segment the target portrait.
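The two steps above can be sketched as cropping the predicted box and passing the crop to a segmentation model. The model here is a stand-in callable, not the trained portrait segmentation model of this application; `crop_box`, `segment_in_box`, the nested-list image type and the `(cx, cy, w, h)` box encoding are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                  # grayscale image as rows of pixel values
Box = Tuple[float, float, float, float]  # assumed encoding: (cx, cy, w, h)


def crop_box(frame: Image, box: Box) -> Tuple[Image, Tuple[int, int]]:
    """Cut the box out of the frame, clamped to the image; also return its top-left corner."""
    cx, cy, w, h = box
    rows, cols = len(frame), len(frame[0])
    x0, x1 = max(0, int(round(cx - w / 2))), min(cols, int(round(cx + w / 2)))
    y0, y1 = max(0, int(round(cy - h / 2))), min(rows, int(round(cy + h / 2)))
    return [row[x0:x1] for row in frame[y0:y1]], (x0, y0)


def segment_in_box(frame: Image, box: Box, model: Callable[[Image], Image]) -> Image:
    """Run the model on the cropped box and paste its mask back at full-frame size."""
    patch, (x0, y0) = crop_box(frame, box)
    patch_mask = model(patch)
    full = [[0] * len(frame[0]) for _ in frame]
    for dy, row in enumerate(patch_mask):
        for dx, value in enumerate(row):
            full[y0 + dy][x0 + dx] = value
    return full


def toy_model(patch: Image) -> Image:
    """Hypothetical stand-in for a trained model: marks non-zero pixels as portrait."""
    return [[1 if v > 0 else 0 for v in row] for row in patch]


frame = [[9, 9, 9, 9],
         [9, 0, 5, 9],
         [9, 7, 0, 9],
         [9, 9, 9, 9]]
mask = segment_in_box(frame, (2.0, 2.0, 2.0, 2.0), toy_model)
```

Pixels outside the predicted box stay at 0 in `mask`, which reflects that only the region given by the second position information is passed to the model.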
[0089] Since the principle by which the above electronic device solves the problem is similar to that of the portrait instance tracking method, the implementation of the electronic device can refer to the above embodiments, and the repeated parts will not be described again.
[0090] The communication bus mentioned in the above electronic device can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but this does not indicate that there is only one bus or one type of bus. The communication interface 402 is used for communication between the above electronic device and other devices. The memory can include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Optionally, the memory can also be at least one storage device located remotely from the aforementioned processor. The aforementioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
[0091] Based on the above embodiments, this application also provides a computer-readable storage medium storing a computer program executable by a processor. When the program is run on the processor, the processor executes the following steps:
[0092] If it is detected that the target portrait has not been segmented in first image frames whose number does not exceed a first preset number, acquire a second preset number of second image frames acquired before the acquisition time of the first image frames;
[0093] Based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frames, predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frames;
[0094] Segment the target portrait from the first image frames based on the second position information.
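The triggering condition in the first step can be sketched as a scan over per-frame segmentation results. This is an illustrative reading only: representing a missed segmentation as `None`, requiring that every history frame has a result, and the names `find_recoverable_gaps`, `max_gap` (first preset number) and `n_history` (second preset number) are assumptions.

```python
from typing import List, Optional, Tuple


def find_recoverable_gaps(
    masks: List[Optional[object]], max_gap: int, n_history: int
) -> List[Tuple[List[int], List[int]]]:
    """Return (missed frame indices, preceding history frame indices) for each short gap."""
    gaps = []
    i, n = 0, len(masks)
    while i < n:
        if masks[i] is not None:
            i += 1
            continue
        j = i
        while j < n and masks[j] is None:      # extend the run of missed frames
            j += 1
        history = list(range(max(0, i - n_history), i))
        usable = bool(history) and all(masks[k] is not None for k in history)
        if j - i <= max_gap and usable:        # gap no longer than the first preset number
            gaps.append((list(range(i, j)), history))
        i = j
    return gaps


masks = ["m", "m", "m", None, None, "m", None, None, None, None, "m"]
gaps = find_recoverable_gaps(masks, max_gap=2, n_history=3)
```

In this example only the two-frame gap at indices 3 and 4 qualifies; the four-frame gap exceeds `max_gap` and is left alone.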
[0095] In one possible implementation, after predicting the second location information corresponding to the target portrait in the first image frame, the method further includes:
[0096] Acquire the earliest third image frame acquired after the acquisition time of the last first image frame;
[0097] Determine the deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame;
[0098] If the deviation exceeds a preset difference threshold, adjust, according to the deviation, the second position information in the first image frames whose number does not exceed the first preset number.
[0099] In one possible implementation, adjusting the second position information in no more than a first preset number of first image frames according to the deviation includes:
[0100] Based on the number of the first image frames, the deviation is divided into a corresponding number of sub-deviations, wherein the sum of the sub-deviations is the deviation;
[0101] The second position information in the first image frame is adjusted according to each of the sub-deviations.
[0102] In one possible implementation, segmenting the target human image from the first image frame based on the second location information includes:
[0103] Based on the second location information, a second target human figure frame is determined in the first image frame;
[0104] The second target portrait frame is input into the trained portrait segmentation model to segment the target portrait.
[0105] Since the principle by which the above computer-readable medium solves the problem is similar to that of the portrait instance tracking method, the steps performed when the processor executes the computer program stored in the computer-readable medium can refer to the above embodiments, and the repeated parts will not be described again.
[0106] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0107] This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to this application. It should be understood that each process and/or box in the flowcharts and/or block diagrams, and combinations of processes and/or boxes in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, executed by the processor of the computer or other programmable data processing device, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.
[0108] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device, which implements the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.
[0109] These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.
[0110] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A portrait instance tracking method, characterized in that the method includes: if it is detected that the target portrait has not been segmented in first image frames whose number does not exceed a first preset number, acquiring a second preset number of second image frames acquired before the acquisition time of the first image frames; predicting, based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frames, the second position information of the second target portrait frame corresponding to the target portrait in the first image frames; segmenting the target portrait from the first image frames based on the second position information; wherein, after predicting the second position information corresponding to the target portrait in the first image frames, the method further includes: acquiring the earliest third image frame acquired after the acquisition time of the last first image frame; determining the deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame; and, if the deviation exceeds a preset difference threshold, adjusting, according to the deviation, the second position information in the first image frames whose number does not exceed the first preset number.
2. The method according to claim 1, characterized in that the step of adjusting, according to the deviation, the second position information in the first image frames whose number does not exceed the first preset number includes: dividing the deviation, based on the number of the first image frames, into a corresponding number of sub-deviations, wherein the sum of the sub-deviations is the deviation; and adjusting the second position information in the first image frames according to each of the sub-deviations.
3. The method according to claim 1, characterized in that the step of segmenting the target portrait from the first image frames based on the second position information includes: determining, based on the second position information, the second target portrait frame in the first image frame; and inputting the second target portrait frame into the trained portrait segmentation model to segment the target portrait.
4. A portrait instance tracking device, characterized in that the device includes: an acquisition module, used to acquire a second preset number of second image frames acquired before the acquisition time of the first image frames if it is detected that the target portrait has not been segmented in first image frames whose number does not exceed a first preset number; a prediction module, used to predict the second position information of the second target portrait frame corresponding to the target portrait in the first image frames based on the first position information of the first target portrait frame corresponding to the target portrait in the second image frames; a portrait segmentation module, used to segment the target portrait from the first image frames based on the second position information; the acquisition module is further used to acquire the earliest third image frame acquired after the acquisition time of the last first image frame; the device further includes: an adjustment module, used to determine the deviation between the third position information of the third target portrait frame corresponding to the target portrait in the third image frame and the second position information of the second target portrait frame in the last first image frame, and, if the deviation exceeds a preset difference threshold, to adjust, according to the deviation, the second position information in the first image frames whose number does not exceed the first preset number.
5. The device according to claim 4, characterized in that the adjustment module is specifically used to divide the deviation into a corresponding number of sub-deviations according to the number of the first image frames, wherein the sum of the sub-deviations is the deviation; and to adjust the second position information in the first image frames according to each sub-deviation.
6. The device according to claim 4, characterized in that the portrait segmentation module is specifically used to determine the second target portrait frame in the first image frame based on the second position information, and to input the second target portrait frame into the trained portrait segmentation model to segment the target portrait.
7. An electronic device, characterized in that the electronic device includes at least a processor and a memory, wherein the processor is used to execute a computer program stored in the memory to implement the steps of the portrait instance tracking method according to any one of claims 1-3.
8. A computer-readable storage medium, characterized in that it stores a computer program that, when executed by a processor, implements the steps of the portrait instance tracking method according to any one of claims 1-3.
Citation Information
Patent Citations
Face tracking method, face tracking device and computer storage medium
CN110210285A
Target tracking method and device, computer readable storage medium and robot
CN111563919A