Video frame processing method and device, computer device and storage medium

The pose and deflection degree value of key parts are determined in each video frame, and noise-induced jitter is filtered out with a backlash filter, which improves the stability of the processed video frames.

CN117689569B · Active · Publication Date: 2026-06-23 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2022-08-16
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

During video frame processing, noise can cause jitter, affecting the stability of the video frames.

Method used

The pose of key parts of the target object is determined in each video frame and used to generate a deflection degree value. This value is then filtered with a backlash filter to obtain a deflection update value, which is used for image processing.

Benefits of technology

It effectively eliminates noise-induced jitter and improves the stability of video frames after image processing.



Abstract

The application relates to a video frame processing method and apparatus, a computer device, a storage medium, and a computer program product. The method can be applied to cloud conferencing, cloud storage, artificial intelligence, and intelligent transportation scenarios, and comprises the following steps: determining the pose of a key part of a target object in each video frame carrying noise; determining the deflection degree value of the key part in each video frame according to the pose of the key part; generating a control signal based on the deflection degree value of the key part in each video frame; performing backlash filtering on the control signal with a backlash filter, according to a backlash filter coefficient, to obtain a deflection update value of the key part in each video frame; and sequentially performing image processing on each video frame based on the deflection update value in each video frame. The method can effectively eliminate noise-induced jitter when image processing is performed on the video frames, which helps improve the stability of the processed video frames.

Description

Technical Field

[0001] This application relates to the field of video processing technology, and in particular to a method, apparatus, computer device, storage medium, and computer program product for processing video frames.

Background Technology

[0002] With the continuous development of video and internet technologies, users can easily hold video conferences or watch various videos (such as live streams) on smart terminals. However, noise is inevitably introduced during video capture, which causes jitter and distortion during video conferencing or viewing and affects the stability of the processed video frames.

Summary of the Invention

[0003] Therefore, it is necessary to provide a video frame processing method, apparatus, computer device, computer-readable storage medium, and computer program product to address the above-mentioned technical problems. This method can effectively eliminate the jitter caused by noise during image processing of video frames, thereby improving the stability of the processed video frames.

[0004] In a first aspect, this application provides a method for processing video frames. The method includes:

[0005] Determine the pose of key parts of the target object in each noisy video frame;

[0006] Based on the poses of the key parts, the deflection degree value of the key parts in each video frame is determined; the magnitude of the deflection degree value is affected by the pose changes of the key parts and the noise.

[0007] A control signal is generated based on the degree of deflection of the key parts in each video frame;

[0008] A backlash filter performs backlash filtering on the control signal according to a backlash filter coefficient to obtain the deflection update value of the key part in each video frame.

[0009] Based on the deflection update value within each video frame, image processing is performed on each video frame sequentially.

[0010] In a second aspect, this application also provides a video frame processing apparatus. The apparatus includes:

[0011] The pose determination module is used to determine the pose of key parts of the target object in each video frame carrying noise.

[0012] The deflection degree value determination module is used to determine the deflection degree value of the key part in each video frame based on the various postures of the key part; the magnitude of the deflection degree value is affected by the deflection of the key part and the noise.

[0013] A control signal determination module is used to generate a control signal based on the degree of deflection of the key parts in each video frame;

[0014] The deflection degree value update module is used to perform backlash filtering on the control signal through a backlash filter, according to a backlash filter coefficient, to obtain the deflection update value of the key part in each video frame.

[0015] The image processing module is used to perform image processing on each video frame sequentially based on the deflection update value within each video frame.

[0016] In some embodiments, the pose determination module is further configured to acquire each video frame carrying noise acquired in real time; extract key points of the target object from each video frame; and determine the pose of the key parts of the target object in each video frame based on the key points.

[0017] In some embodiments, the control signal determination module is used to obtain the timing identifier of each video frame; generate a control signal based on the timing identifier and the deflection degree value of the key part in each video frame; the control signal is used to describe the trend of the deflection degree value changing over time.

[0018] In some embodiments, each of the video frames is a video frame acquired in real time, and the control signal is used to describe the trend of the deflection value changing over time;

[0019] The deflection degree value update module is further configured to: while the deflection degree value in the control signal increases or decreases over time, determine, through the backlash filter, a first deflection degree change relative to a target inflection point; compare the first deflection degree change with the backlash filter coefficient or with a sum value to obtain a comparison result, the sum value being obtained by adding a preset parameter to the backlash filter coefficient; and when the comparison result indicates that the first deflection degree change is less than or equal to the backlash filter coefficient, use the deflection update value at the target inflection point as the deflection update value of the key part in each video frame.

[0020] In some embodiments, the deflection degree value update module is further configured to, when the comparison result indicates that the first deflection degree change is greater than the backlash filter coefficient and less than or equal to the sum value, perform linear transformation processing on the increased deflection degree value to obtain the deflection update value of the key part in each of the video frames.

[0021] In some embodiments, the deflection degree value update module is further configured to, when the comparison result indicates that the first deflection degree change is greater than the sum value, obtain the deflection update value that corresponds to the increasing deflection degree value being equal to the sum value, and use the obtained deflection update value as the deflection update value of the key part in each of the video frames.
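The three comparison cases described in paragraphs [0019] to [0021] can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the function name `deflection_update`, the parameter names, and in particular the form of the linear transformation (a gain applied to the part of the change that exceeds the coefficient), which the text does not specify.

```python
def deflection_update(change: float, coeff: float, preset: float,
                      base_update: float, gain: float = 1.0) -> float:
    """Map a deflection degree change to a deflection update value.

    change      -- magnitude of the change relative to the target inflection point
    coeff       -- backlash filter coefficient
    preset      -- preset parameter; coeff + preset is the "sum value"
    base_update -- deflection update value at the target inflection point
    gain        -- slope of the ASSUMED linear transformation (hypothetical)
    """
    sum_value = coeff + preset
    if change <= coeff:
        # Case 1: the change stays inside the gap, so the update value
        # at the target inflection point is kept.
        return base_update
    if change <= sum_value:
        # Case 2: assumed linear transformation of the excess over the gap.
        return base_update + gain * (change - coeff)
    # Case 3: beyond the sum value, use the update value reached
    # when the change equals the sum value.
    return base_update + gain * (sum_value - coeff)


if __name__ == "__main__":
    for change in (0.05, 0.25, 0.9):
        print(change, deflection_update(change, coeff=0.1, preset=0.3, base_update=0.2))
```

Under these assumptions, small changes are ignored, moderate changes move the update value proportionally, and large changes saturate, which is one plausible reading of the three cases rather than the patented implementation.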

[0022] In some embodiments, the deflection degree value update module is further configured to, when it is detected that the deflection degree value in the control signal begins to decrease or increase over time, determine a second deflection degree change based on the deflection degree value at which it begins to decrease or increase; and when the second deflection degree change is less than the backlash filter coefficient, determine the deflection update value of the key part in each of the video frames based on the deflection update value at a new target inflection point, the new target inflection point being an inflection point formed after the target inflection point.

[0023] In some embodiments, the key part includes the face, and the various postures include the pitch angle, yaw angle, and roll angle of the face in each of the video frames;

[0024] The deflection degree value determination module is further configured to normalize the pitch angle, yaw angle and roll angle of the face in each of the video frames to obtain normalized pitch angle, yaw angle and roll angle; determine the product value of the normalized pitch angle, yaw angle and roll angle; and determine the deflection degree value of the face in each of the video frames based on the product value.
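As a rough illustration of paragraph [0024], the following Python sketch normalizes the three face angles, multiplies the normalized values, and derives a deflection degree value from the product. The text does not specify the normalization or how the product maps to the deflection degree value, so both are assumptions here: each angle is mapped to an alignment score in [0, 1] (1 at 0 degrees, 0 at or beyond a hypothetical `max_deg`), and the deflection degree value is taken as one minus the product.

```python
def normalize_angle(angle_deg: float, max_deg: float = 90.0) -> float:
    """Assumed normalization: 1.0 at 0 degrees, falling linearly to 0.0 at max_deg."""
    return max(0.0, 1.0 - abs(angle_deg) / max_deg)


def face_deflection(pitch: float, yaw: float, roll: float,
                    max_deg: float = 90.0) -> float:
    """Deflection degree value of the face from pitch, yaw and roll (degrees).

    The product of the normalized angles is 1.0 when the face matches the
    reference orientation; the deflection is ASSUMED to be 1 - product.
    """
    product = (normalize_angle(pitch, max_deg)
               * normalize_angle(yaw, max_deg)
               * normalize_angle(roll, max_deg))
    return 1.0 - product


if __name__ == "__main__":
    print(face_deflection(0.0, 0.0, 0.0))    # face at the reference orientation
    print(face_deflection(45.0, 0.0, 0.0))   # face pitched by 45 degrees
```

With this choice, a larger value means the face is turned further from the reference orientation, which matches the way the deflection degree value is used elsewhere in the description.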

[0025] In some embodiments, the posture determination module is further configured to determine the posture of the target object's face in each video frame carrying noise, when the eye contact function of the target application is enabled; wherein the target application is an application that plays each video frame.

[0026] In some embodiments, the image processing module is further configured to determine the location of the original eye feature points in each video frame; obtain target eye feature points based on the deflection update value in each video frame; and fuse the target eye feature points into each video frame based on the location to replace the original eye feature points in each video frame.

[0027] In some embodiments, the key part includes a hand; the image processing module is further configured to acquire special effects data when the deflection update value in each of the video frames meets the special effects addition conditions; and add the special effects data in each of the video frames.

[0028] In a third aspect, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:

[0029] Determine the pose of key parts of the target object in each noisy video frame;

[0030] Based on the poses of the key parts, the deflection degree value of the key parts in each video frame is determined; the magnitude of the deflection degree value is affected by the pose changes of the key parts and the noise.

[0031] A control signal is generated based on the degree of deflection of the key parts in each video frame;

[0032] A backlash filter performs backlash filtering on the control signal according to a backlash filter coefficient to obtain the deflection update value of the key part in each video frame.

[0033] Based on the deflection update value within each video frame, image processing is performed on each video frame sequentially.

[0034] In a fourth aspect, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the following steps:

[0035] Determine the pose of key parts of the target object in each noisy video frame;

[0036] Based on the poses of the key parts, the deflection degree value of the key parts in each video frame is determined; the magnitude of the deflection degree value is affected by the pose changes of the key parts and the noise.

[0037] A control signal is generated based on the degree of deflection of the key parts in each video frame;

[0038] A backlash filter performs backlash filtering on the control signal according to a backlash filter coefficient to obtain the deflection update value of the key part in each video frame.

[0039] Based on the deflection update value within each video frame, image processing is performed on each video frame sequentially.

[0040] In a fifth aspect, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps:

[0041] Determine the pose of key parts of the target object in each noisy video frame;

[0042] Based on the poses of the key parts, the deflection degree value of the key parts in each video frame is determined; the magnitude of the deflection degree value is affected by the pose changes of the key parts and the noise.

[0043] A control signal is generated based on the degree of deflection of the key parts in each video frame;

[0044] A backlash filter performs backlash filtering on the control signal according to a backlash filter coefficient to obtain the deflection update value of the key part in each video frame.

[0045] Based on the deflection update value within each video frame, image processing is performed on each video frame sequentially.

[0046] The aforementioned video frame processing method, apparatus, computer device, storage medium, and computer program product determine the pose of key parts of a target object in each noisy video frame, determine the deflection degree value of the key parts in each video frame based on that pose, generate a control signal based on the deflection degree values, perform backlash filtering on the control signal with a backlash filter according to a backlash filter coefficient to obtain the deflection update value of the key parts in each video frame, and then perform image processing on each video frame sequentially based on its deflection update value. Because the video frames carry noise, the determined pose of the key parts is affected by the noise, and so is their deflection degree value. The control signal therefore contains noise-induced jitter, which would propagate to the deflection update value and cause jitter during image processing, for example frequent switching between two different states. The method uses the backlash filter to filter this jitter out of the control signal according to the backlash filter coefficient, and determines the deflection update value of each video frame from the filtered control signal. In other words, the backlash filter removes the influence of noise on the deflection update value and eliminates noise-induced jitter during image processing, which improves the stability of the processed video frames.

Attached Figure Description

[0047] Figure 1 is an application environment diagram of a video frame processing method in one embodiment;

[0048] Figure 2 is a flowchart illustrating a video frame processing method in one embodiment;

[0049] Figure 3 is a schematic diagram of a visualized signal curve of a control signal in one embodiment;

[0050] Figure 4 is a schematic diagram of key points in a video frame in one embodiment;

[0051] Figure 5 is a schematic diagram of a video frame in one embodiment;

[0052] Figure 6 is a schematic diagram illustrating the extraction of key points from a video frame in one embodiment;

[0053] Figure 7 is a schematic diagram of key points in one embodiment;

[0054] Figure 8 is a flowchart illustrating the calculation of the deflection update value in another embodiment;

[0055] Figure 9 is a schematic diagram of a visualized signal curve of the control signal in another embodiment;

[0056] Figure 10 is a schematic diagram illustrating the conversion between the change in deflection degree and the deflection update value in one embodiment;

[0057] Figure 11 is a schematic diagram of the backlash phenomenon in one embodiment;

[0058] Figure 12 is a schematic diagram illustrating the conversion between the change in deflection degree and the deflection update value in another embodiment;

[0059] Figure 13 is a schematic diagram of the settings page of a target application in one embodiment;

[0060] Figure 14 is a schematic diagram illustrating a video frame processing method in a specific embodiment;

[0061] Figure 15 is a structural block diagram of a video frame processing device in one embodiment;

[0062] Figure 16 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0063] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0064] Before describing the embodiments of this application, the technology involved in this application will be explained in detail below:

[0065] Cloud storage is a new concept that extends and develops from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) refers to a storage system that uses cluster applications, grid technology, and distributed storage file systems to bring together a large number of storage devices of various types (storage devices are also called storage nodes) in the network to work together through application software or application interfaces to provide data storage and business access functions to the outside world.

[0066] Cloud conferencing is an efficient, convenient, and low-cost form of meeting based on cloud computing technology. Users only need to use an internet interface to quickly and efficiently share voice, data files, and video with teams and clients around the world. The complex technologies such as data transmission and processing during the meeting are handled by the cloud conferencing service provider.

[0067] In the era of cloud conferencing, data transmission, processing, and storage are all handled by the video conferencing vendor's computer resources. Users no longer need to purchase expensive hardware or install cumbersome software; they can simply open a browser, log in to the corresponding interface, and conduct efficient remote meetings. Cloud conferencing systems support dynamic multi-server cluster deployment and provide multiple high-performance servers, greatly improving meeting stability, security, and availability.

[0068] Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. AI technology is a comprehensive discipline involving a wide range of fields, encompassing both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, as well as machine learning / deep learning, autonomous driving, and intelligent transportation.

[0069] The video frame processing method provided in the embodiments of this application can be applied in the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated into server 104, or it can be located in the cloud or on another network server.

[0070] In one application scenario, the terminal acquires video frames of the target object, either by capturing them with its own camera or by obtaining them from a server. The terminal determines the pose of key parts of the target object in each noisy video frame. Based on the pose of the key parts, the terminal determines the deflection degree value of the key parts in each video frame and generates a control signal from these deflection degree values. The terminal then performs backlash filtering on the control signal with a backlash filter, according to a backlash filter coefficient, to obtain the deflection update value of the key parts in each video frame. Finally, the terminal performs image processing on each video frame based on the deflection update value.

[0071] In another application scenario, the terminal acquires video frames containing the target object in real time and sends each video frame to the server. The server receives each video frame carrying noise and determines the pose of the key parts of the target object in each noisy video frame. Based on the pose of the key parts, the server determines the deflection degree value of the key parts in each video frame and generates a control signal from these deflection degree values. The server then performs backlash filtering on the control signal with a backlash filter, according to a backlash filter coefficient, to obtain the deflection update value of the key parts in each video frame, and performs image processing on each video frame based on the deflection update value.

[0072] The terminal 102 can be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, IoT device, or portable wearable device. IoT devices can include smart speakers, smart TVs, smart air conditioners, and smart in-vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, and head-mounted devices, etc.

[0073] Server 104 can be an independent physical server or a service node in a blockchain system. The service nodes in the blockchain system form a peer-to-peer (P2P) network. The P2P protocol is an application layer protocol that runs on top of the Transmission Control Protocol (TCP).

[0074] In addition, server 104 can also be a server cluster consisting of multiple physical servers, which can be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0075] Terminal 102 and server 104 can be connected via Bluetooth, USB (Universal Serial Bus) or network, etc., and this application does not impose any restrictions.

[0076] In some embodiments, as shown in Figure 2, a video frame processing method is provided. The method can be applied to the terminal or the server in Figure 1. Taking its application to the terminal in Figure 1 as an example, the method includes the following steps:

[0077] Step S202: Determine the pose of the key parts of the target object in each video frame carrying noise.

[0078] The target object is an object included in the video frame. For example, the target object can be a person included in the video frame. The key parts of the target object can be body parts of a person. For example, the key parts of the target object can be, but are not limited to, the face and hands.

[0079] In some embodiments, the target object can be a single object within a video frame. For example, if the video frame contains only one person, that person is the target object; if the video frame contains at least two people, the person occupying the larger area of the video frame is the target object.
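The single-target rule described above can be sketched in a few lines of Python. The `PersonBox` type and its fields are hypothetical and stand in for whatever detection result an implementation produces; area is approximated here by bounding-box width times height, which is an assumption, since the text only says "larger area".

```python
from typing import List, NamedTuple, Optional


class PersonBox(NamedTuple):
    """Hypothetical detection result: one person's bounding box in pixels."""
    person_id: str
    width: float
    height: float


def select_target(people: List[PersonBox]) -> Optional[PersonBox]:
    """Return the person occupying the largest area, or None if nobody is detected."""
    if not people:
        return None
    return max(people, key=lambda p: p.width * p.height)


if __name__ == "__main__":
    frame_people = [PersonBox("a", 100, 200), PersonBox("b", 300, 150)]
    print(select_target(frame_people))
```

With a single detected person the function simply returns that person, which covers the first case in the paragraph.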

[0080] In some embodiments, the target object can be multiple objects in a video frame. For example, if the video frame includes multiple people, the target object can be multiple people.

[0081] A pose can characterize either the orientation of a key part in a video frame or the posture of that key part. For example, if the key part is the face, the pose characterizes the orientation of the face in the video frame; if the key part is the hand, the pose characterizes the posture (gesture) of the hand in the video frame.

[0082] Specifically, the terminal acquires each video frame. Each video frame can be captured by the terminal through its configured camera, or it can be obtained by the terminal from the server. In practical applications, a third-party device can capture each video frame and encode each video frame to obtain the corresponding bitstream. The third-party device sends the bitstream to the server, which then sends the bitstream to the terminal. The terminal acquires the bitstream and decodes it to obtain each video frame.

[0083] The acquisition process of each video frame introduces noise, and the encoding and decoding process of each video frame also introduces noise. Therefore, each video frame obtained by the terminal carries noise, and the noise carried by each video frame may be different.

[0084] The terminal processes each video frame to obtain the pose of the key parts of the target object in each video frame. Due to noise in each video frame, the pose of the key parts determined by the terminal may differ in each video frame even when the key parts remain stationary.

[0085] Step S204: Determine the deflection degree value of the key parts in each video frame based on the various postures of the key parts; the magnitude of the deflection degree value is affected by the deflection of the key parts and noise.

[0086] The deflection degree value characterizes the difference between a pose and a reference pose: the deflection degree value of the key parts in a video frame characterizes how far the pose of the key parts in that frame deviates from the reference pose.

[0087] For example, when the key part is the face, the deflection degree value of the face within a video frame characterizes the difference between the face's orientation (characterized by the facial pose) and a reference orientation (characterized by the reference pose) within the video frame. The larger the deflection degree value, the greater the difference between the face's orientation and the reference orientation; the smaller the deflection degree value, the smaller the difference.

[0088] The reference orientation can be the orientation of the face when facing the terminal's display screen. Therefore, the degree of deflection can be used to characterize the degree to which the face is facing the display screen. The smaller the degree of deflection, the closer the orientation of the face is to the orientation when the face is facing the display screen.

[0089] For example, when the key part is the hand, the deflection degree value of the hand within a video frame characterizes the difference between the hand's posture (characterized by the hand's pose) and a reference posture (characterized by the reference pose). The larger the deflection degree value, the greater the difference between the hand's posture and the reference posture; the smaller the deflection degree value, the smaller the difference.

[0090] Reference postures include, but are not limited to: the reference posture for "Yay" (hand clenched into a fist, index and middle fingers extended), the reference posture for "OK" (hand extended with fingers spread, thumb and index finger bent, forming a circle), and the reference posture for "1" (hand clenched into a fist, index finger extended). This degree of deflection can then be used to characterize the difference between the hand posture and the reference posture. For example, if the hand posture is the "Yay" posture, but the index and middle fingers are not straightened, there is a difference between the hand posture and the reference posture for "Yay".

[0091] Specifically, for each video frame, the terminal determines the pose of the key part in the video frame and the difference between it and the reference pose, thus obtaining the deflection value of the key part in the video frame.

[0092] The magnitude of the deflection degree value changes with the pose of the key part. For example, when the key part is the face and the face is oriented differently in each video frame, the deflection degree value of the key part also differs from frame to frame.

[0093] Because each video frame carries noise, the pose determined by the terminal may differ from frame to frame even when the key part remains stationary, and the deflection degree value of the key part then also differs from frame to frame.

[0094] Step S206: Generate a control signal based on the deflection degree values of the key parts in each video frame.

[0095] The video frames are ordered in time, so they can be regarded as a video frame sequence. The control signal is a time-series signal generated from the deflection degree values and is continuous in time. It reflects the changing trend of the deflection degree value of the key part across the video frames. For example, let the times (such as the acquisition times) corresponding to video frames f0 and f1 be t0 and t1, respectively. Then the control signal at time t0 can represent the deflection degree value, or an encoded value of the deflection degree value, of the key part in video frame f0, and the control signal at time t1 can represent the deflection degree value, or an encoded value of the deflection degree value, of the key part in video frame f1.

[0096] Specifically, control signals are generated based on the temporal sequence of each video frame and the degree of deflection of key parts within each video frame.

[0097] For example, the video frames are f1, f2, f3, f4, and f5 in chronological order, and the control signal consists of the corresponding deflection degree values p1, p2, p3, p4, and p5. Plotting these points and drawing a smooth curve through them gives a visualized signal curve of the control signal, whose horizontal axis is time and whose vertical axis is the deflection degree value.

[0098] For example, the visualized signal curve of the control signal is shown in Figure 3, where $g_{t1}$, $g_{t2}$, and $g_{t3}$ are the facial deflection degree values in video frames t1, t2, and t3, respectively. Video frame t1 precedes video frame t2 in playback order, and video frame t2 precedes video frame t3; $g_{t1}$ is less than $g_{t2}$, and $g_{t2}$ is greater than $g_{t3}$. Therefore, the deflection degree value increases with playback time between video frames t1 and t2 and decreases between video frames t2 and t3. The visualized signal curve shown in Figure 3 is merely an example; in practical applications, it can take other forms.
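A minimal Python sketch of step S206 follows, assuming each frame is represented as a (timing identifier, deflection degree value) pair. The function names `build_control_signal` and `trends` and the sample values are illustrative assumptions, not taken from the source.

```python
from typing import List, Tuple

# (timing identifier, deflection degree value) for one video frame
Sample = Tuple[float, float]


def build_control_signal(samples: List[Sample]) -> List[Sample]:
    """Order the per-frame deflection degree values by their timing identifier."""
    return sorted(samples, key=lambda s: s[0])


def trends(signal: List[Sample]) -> List[str]:
    """Label the change between each pair of consecutive samples."""
    labels = []
    for (_, prev), (_, cur) in zip(signal, signal[1:]):
        if cur > prev:
            labels.append("increasing")
        elif cur < prev:
            labels.append("decreasing")
        else:
            labels.append("flat")
    return labels


if __name__ == "__main__":
    # Frames may arrive out of order; the timing identifier restores the sequence.
    signal = build_control_signal([(2, 0.5), (1, 0.2), (3, 0.3)])
    print(signal)
    print(trends(signal))
```

The sample values mimic the three-frame example, in which the deflection degree value first rises and then falls.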

[0099] Step S208: Perform backlash filtering on the control signal with a backlash filter, according to the backlash filter coefficient, to obtain the deflection update value of the key parts in each video frame.

[0100] Here, backlash refers to the gap between two workpieces when they are engaged. The backlash phenomenon means that when two workpieces are engaged, the backlash causes one workpiece to need to complete the return stroke corresponding to the backlash before it can drive the other workpiece to switch its operating state. For example, when two gears mesh, there is backlash. When the master gear runs clockwise, the driven gear meshing with it runs counterclockwise. If the master gear switches to counterclockwise operation, it needs to complete the return stroke corresponding to the backlash before it can drive the driven gear to switch to clockwise operation.

[0101] A backlash filter is a filter derived from the backlash phenomenon described above. It filters out noise-induced jitter in the control signal, and the deflection update value is determined from the filtered control signal. The backlash filter coefficient reflects the hysteresis corresponding to the backlash.

[0102] Specifically, the control signal is analogous to the operating state of the master gear, and the deflection update value is analogous to the operating state of the slave gear.

[0103] An increasing (or decreasing) trend of the deflection degree value in the control signal is analogous to clockwise (or counterclockwise) rotation of the master gear, during which the slave gear rotates counterclockwise (or clockwise). A change in the deflection degree value from an increasing trend to a decreasing trend (or from a decreasing trend to an increasing trend) is analogous to the master gear switching to counterclockwise (or clockwise) rotation. After such a trend change, if the change in the deflection degree value does not exceed the backlash filter coefficient, this is analogous to the master gear having reversed but not yet completed the return stroke corresponding to the backlash, so the operating state of the slave gear does not change. If the change in the deflection degree value exceeds the backlash filter coefficient, this is analogous to the master gear having reversed and completed the return stroke, so the operating state of the slave gear changes. The master gear and the slave gear can also be called the driving gear and the driven gear, respectively.

[0104] In one possible scenario, the trend of the deflection degree value in the control signal shifts for the first time (from increasing to decreasing, or from decreasing to increasing), the deflection degree value then changes by no more than the backlash filter coefficient, and the trend shifts again so that the deflection degree value returns to the value it had at the first trend shift. Because the change in the deflection degree value throughout this process is no greater than the backlash filter coefficient, the backlash filter filters out this process: the deflection update value for every deflection degree value in the process is the same as the deflection update value at the first trend shift. In this case the change in the deflection degree value is caused by noise, which appears as jitter in the control signal, and the backlash filter filters out this noise-induced jitter.

[0105] In another possible scenario, after the trend of the deflection degree value shifts, the deflection degree value continues to change along the new trend and the change exceeds the backlash filter coefficient. The backlash filter then filters out the portion of the change that does not exceed the backlash filter coefficient; the deflection update value for that portion is the same as the deflection update value at the trend shift. The portion of the change that exceeds the backlash filter coefficient is not filtered out, and the deflection update value for that portion is determined from the corresponding deflection degree value. In this case the change in the deflection degree value is caused by a change in the posture of the key part.
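The two scenarios above resemble the behaviour of the classic play (backlash) operator used in hysteresis modelling. The sketch below uses that operator as an illustrative assumption; it is not the formula of this application. The name `backlash_filter`, the choice of reporting the lower edge of the dead band as the output, and the initial state are all hypothetical.

```python
def backlash_filter(signal, theta):
    """Play (backlash) operator with a dead band of total width theta.

    The output is the lower edge of the dead band. It moves only when the
    input pushes against one edge of the band, so reversals smaller than
    theta leave the output unchanged.
    """
    if not signal:
        return []
    lo = signal[0]  # assumed initial state: band starts at the first sample
    out = []
    for x in signal:
        if x > lo + theta:   # input pushes the upper edge, band follows upward
            lo = x - theta
        elif x < lo:         # input pushes the lower edge, band follows downward
            lo = x
        out.append(lo)
    return out


# Hypothetical control signal: a rise, two small dips near the top, then a real fall.
signal = [0.0, 0.5, 1.0, 0.875, 1.0, 0.875, 0.5, 0.0]
print(backlash_filter(signal, theta=0.25))
# [0.0, 0.25, 0.75, 0.75, 0.75, 0.75, 0.5, 0.0]
```

In this run the dips from 1.0 to 0.875 are smaller than `theta`, so the output stays at 0.75 (the first scenario), while the fall to 0.5 and 0.0 exceeds `theta` and moves the output (the second scenario).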

[0106] Step S210: Based on the deflection update value in each video frame, perform image processing on each video frame in sequence.

[0107] The deflection update value of a video frame reflects the image processing state of that video frame. The image processing state can indicate the degree to which the video frame is processed, or whether the video frame is processed at all.

[0108] In some embodiments, the deflection update value of a video frame can be used to reflect the degree of image processing performed on the video frame. The larger the deflection update value of the video frame, the higher the degree of image processing performed on the video frame; the smaller the deflection update value of the video frame, the lower the degree of image processing performed on the video frame. When the deflection update value of the video frame is at its minimum, the image processing state of the video frame is no image processing; when the deflection update value of the video frame is at its maximum, the image processing state of the video frame is image processing performed to the maximum extent.

[0109] For example, image processing of a video frame involves adjusting its image parameters. The larger the deflection update value of the video frame, the greater the difference between the video frame after the image parameter adjustment and the video frame before the adjustment. Conversely, the smaller the deflection update value of the video frame, the smaller the difference between the video frame after the image parameter adjustment and the video frame before the adjustment.
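As a rough illustration of the deflection update value controlling the degree of an image parameter adjustment, the sketch below scales a brightness gain by that value. Brightness as the adjusted parameter, the function name `apply_adjustment`, and the `max_gain` setting are assumptions made for the example only.

```python
def apply_adjustment(pixels, update_value, max_gain=0.5):
    """Brighten 8-bit pixel values in proportion to a deflection update value in [0, 1]."""
    u = min(max(update_value, 0.0), 1.0)   # clamp to the assumed [0, 1] range
    gain = 1.0 + max_gain * u              # u = 0 leaves the frame unchanged
    return [min(255, round(p * gain)) for p in pixels]


frame = [100, 200]                    # hypothetical pixel values
print(apply_adjustment(frame, 0.0))   # [100, 200]
print(apply_adjustment(frame, 1.0))   # [150, 255]
```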

[0110] In some embodiments, the deflection update value of a video frame can be used to reflect whether image processing is performed on the video frame. For example, if the deflection update value of a video frame is a first update value, then image processing is performed on the video frame; if the deflection update value of a video frame is a second update value, then image processing is not performed on the video frame.

[0111] Specifically, for each video frame, the image processing state corresponding to the deflection update value of that video frame is obtained, and the video frame is processed according to that state to obtain the image-processed video frame.

[0112] In the above video frame processing method, the posture of the key part of the target object in each noisy video frame is determined; the deflection degree value of the key part in each video frame is determined from that posture; a control signal is generated based on the deflection degree value in each video frame; the control signal is filtered by a backlash filter according to the backlash filter coefficient to obtain the deflection update value of the key part in each video frame; and image processing is performed on each video frame in sequence based on the deflection update value in each video frame. Because the video frames carry noise, the posture of the key part and therefore its deflection degree value are affected by the noise, so the control signal contains noise-induced jitter. Without filtering, this jitter would carry over to the deflection update value and cause jitter in the image processing, for example frequent switching between two different image processing states. In the above method, the backlash filter filters out the noise-induced jitter in the control signal according to the backlash filter coefficient, and the deflection update value of each video frame is determined from the filtered control signal. In other words, the backlash filter removes the influence of noise on the deflection update value and eliminates the noise-induced jitter in the image processing, thereby improving the stability of the processed video frames.

[0113] In some embodiments, determining the pose of key parts of a target object in each noisy video frame includes: acquiring each noisy video frame acquired in real time; extracting key points of the target object from each video frame; and determining the pose of key parts of the target object in each video frame based on the key points.

[0114] Here, the key points of the target object are the feature points of key parts of the target object. For example, if the key part is the face, the key points of the target object include, but are not limited to, pixels corresponding to the corners of the eyes, the pupils, and the corners of the mouth. For example, if the key part is the hand, the key points of the target object include, but are not limited to, pixels corresponding to the finger joints.

[0115] Specifically, the terminal acquires noisy video frames in real time, extracts key points from each video frame to obtain the key points of the target object in that frame, and determines the posture of the key part of the target object in each video frame based on those key points. Both extracting the key points of the target object and determining the posture of the key part from the key points can be implemented with existing methods; the embodiments of this application do not limit the specific process used.

[0116] For example, as shown in Figure 4, the terminal acquires video frames t1, t2, t3, t4, and t5, which carry noise t1, noise t2, noise t3, noise t4, and noise t5, respectively. The terminal extracts key points from the noisy video frames t1 to t5 to obtain key points t1, t2, t3, t4, and t5, respectively, and then performs posture estimation on these key points to obtain the postures of the key part in video frames t1, t2, t3, t4, and t5.
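The per-frame flow in this example (key point extraction followed by posture estimation) can be sketched as a small pipeline. Since this application leaves both stages to existing methods, the sketch takes them as caller-supplied functions; `poses_for_frames`, `fake_extract`, `fake_pose`, and the (pitch, yaw, roll) tuple representation are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Keypoints = Sequence[Tuple[float, float]]
Pose = Tuple[float, float, float]  # assumed representation: (pitch, yaw, roll)


def poses_for_frames(
    frames: Sequence[Frame],
    extract_keypoints: Callable[[Frame], Keypoints],
    estimate_pose: Callable[[Keypoints], Pose],
) -> List[Pose]:
    """Run key point extraction, then posture estimation, on each frame in order."""
    return [estimate_pose(extract_keypoints(frame)) for frame in frames]


# Stub stages for illustration only; a real detector and estimator would replace them.
def fake_extract(frame):
    return [(float(frame), 0.0)]


def fake_pose(keypoints):
    x, _ = keypoints[0]
    return (x, 0.0, 0.0)


poses = poses_for_frames([1, 2, 3], fake_extract, fake_pose)
print(poses)
```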

[0117] In some embodiments, the key part of the target object is the face. After acquiring each video frame, the terminal determines the target object in the video frame and detects its key points (the feature points of the face). For example, for the video frame shown in Figure 5, the feature points of the target object's face are detected as shown in Figure 6 and extracted as shown in Figure 7; the facial posture in the video frame is then determined based on these feature points.

[0118] In some embodiments, the key part of the target object is the hand. For each video frame acquired by the terminal, after the terminal acquires the video frame, it detects the key points (feature points of the hand) of the target object in the video frame, extracts the feature points of the hand, and determines the posture of the hand in the video frame based on the feature points of the hand.

[0119] In the above embodiments, the terminal extracts the key points of the target object, i.e., the feature points of the key part, from each noisy video frame and determines the posture of the key part in the video frame based on these feature points. This makes the determined posture of the key part more accurate, which facilitates the subsequent determination of the deflection degree value of the key part based on that posture.

[0120] In some embodiments, each video frame is a video frame acquired in real time, and the control signal describes the trend of the deflection degree value over time. Accordingly, as shown in Figure 8, S208 may specifically include:

[0121] Step S802: As the deflection degree value in the control signal begins to increase or decrease over time, determine, through the backlash filter, the first deflection degree change relative to the target inflection point.

[0122] The target inflection point is a point in the control signal: either a critical point where the deflection degree value changes from increasing to decreasing (such as a peak in the control signal), or a critical point where it changes from decreasing to increasing (such as a trough in the control signal). Points a and b in Figure 9 are target inflection points.
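For a sampled control signal, peaks and troughs of this kind can be located by checking where the direction of change flips. The sketch below is one simple way to do that; `inflection_indices` is a hypothetical helper, and it works offline on a complete list, whereas a real-time implementation would only recognize an inflection point once the following frame arrives.

```python
def inflection_indices(values):
    """Return indices where the signal turns from rising to falling or from falling to rising.

    Flat segments (equal neighbouring samples) are not treated as inflection points.
    """
    indices = []
    for i in range(1, len(values) - 1):
        before = values[i] - values[i - 1]
        after = values[i + 1] - values[i]
        is_peak = before > 0 and after < 0
        is_trough = before < 0 and after > 0
        if is_peak or is_trough:
            indices.append(i)
    return indices


# Hypothetical deflection degree values: a peak at index 1 and a trough at index 3.
print(inflection_indices([0.2, 0.6, 0.4, 0.1, 0.3]))  # [1, 3]
```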

[0123] The first deflection degree change is the difference between the deflection degree value at the target inflection point and the deflection degree value at the current time. If this difference is negative, its absolute value is taken so that the result is non-negative. Because the deflection degree value changes over time, the first deflection degree change also changes over time. For example, if the deflection degree value increases over time after the target inflection point, the first deflection degree change also increases over time. Since the first deflection degree change is the absolute value of the difference between two deflection degree values, it still increases over time even if the deflection degree value decreases over time.

[0124] In some embodiments, S802 may specifically include: after the deflection degree value in the control signal has just switched from decreasing (or increasing) to increasing (or decreasing), the backlash filter calculates the first deflection degree change based on the deflection degree value at the target inflection point, thereby obtaining the first deflection degree change relative to the target inflection point. Here, the target inflection point is the point where the deflection degree value has just switched from decreasing (or increasing) to increasing (or decreasing), such as point a in Figure 9.

[0125] In other embodiments, when the control signal is obtained from the initially acquired video frames, as the deflection degree value in the control signal begins to increase or decrease over time, the backlash filter calculates the first deflection degree change based on the deflection degree value of the key part in the first frame. For example, as shown in Figure 9, the backlash filter uses the deflection degree value corresponding to the video frame acquired at time 0 (i.e., the first frame) as the reference value, calculates the difference between the deflection degree value corresponding to each video frame acquired after time 0 and the reference value, and uses the absolute value of this difference as the first deflection degree change for each video frame between time 0 and t(i). After obtaining the first deflection degree change, the server can perform a linear transformation on the deflection degree value to obtain the deflection update value of the key part in each video frame, such as each video frame between time 0 and t(i).

[0126] In some embodiments, the terminal inputs the control signal to the backlash filter. Because the key part is affected by both posture changes and noise, the control signal may fluctuate. When the control signal changes from increasing to decreasing (i.e., the deflection degree value in the control signal begins to decrease gradually over time), the terminal calculates the first deflection degree change in real time starting from the target inflection point.

[0127] For example, as shown in Figure 9, the deflection degree value in the control signal decreases from time t(i), and the terminal calculates the first deflection degree change in real time starting from time t(i). When time t(j) is reached, the difference between the deflection degree values at times t(i) and t(j) is calculated, giving the first deflection degree change Δg = |g(t(i)) - g(t(j))| at time t(j). When time t(k) is reached, the difference between the deflection degree values at times t(i) and t(k) is calculated, giving the first deflection degree change Δg = |g(t(i)) - g(t(k))| at time t(k). Since point b is a target inflection point, the terminal can recalculate the first deflection degree change starting from point b.
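The calculation in this example reduces to an absolute difference against the value at the target inflection point. A minimal sketch follows; the function name `first_deflection_change` and the numeric values standing in for g(t(i)), g(t(j)), and g(t(k)) are hypothetical.

```python
def first_deflection_change(g_inflection: float, g_current: float) -> float:
    """Absolute difference between the value at the target inflection point and the current value."""
    return abs(g_inflection - g_current)


# Hypothetical deflection degree values; the signal decreases from the
# target inflection point at t(i) through t(j) to t(k).
g_ti, g_tj, g_tk = 0.75, 0.5, 0.25

delta_at_tj = first_deflection_change(g_ti, g_tj)
delta_at_tk = first_deflection_change(g_ti, g_tk)
print(delta_at_tj, delta_at_tk)  # 0.25 0.5
```

Although the signal itself is decreasing, the first deflection degree change grows from t(j) to t(k) because it is an absolute value.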

[0128] Step S804: Compare the first deflection degree change with the backlash filter coefficient or with the sum value to obtain a comparison result.

[0129] The sum value is obtained by summing the backlash filter coefficient and a preset parameter. The backlash filter coefficient is the filter coefficient of the backlash filter.

[0130] In some embodiments, the server can compare the first deflection degree change with the backlash filter coefficient to obtain a comparison result; or, when the first deflection degree change is greater than the backlash filter coefficient, the server can compare the first deflection degree change with the sum value to obtain a comparison result.

[0131] For example, as shown in Figure 10, the small black dots represent the target inflection points. After the terminal starts calculating the first deflection degree change in real time from target inflection point b, it compares the first deflection degree change with the backlash filter coefficient. Before point c is reached, i.e., before time t(l) in Figure 9, the first deflection degree change Δg = |g(t(k)) - g(t(l'))| is less than the backlash filter coefficient θ; when point c is reached, the first deflection degree change Δg = |g(t(k)) - g(t(l))| equals the backlash filter coefficient θ. Here, t(l') is between t(k) and t(l).

[0132] After point c is passed but before point d is reached, for example at point d' (a point between c and d), the first deflection degree change at time t(m'), Δg = |g(t(k)) - g(t(m'))|, is compared with the sum value; here Δg is less than the sum value, i.e., Δg < θ + τ. When point d is reached, the first deflection degree change at time t(m), Δg = |g(t(k)) - g(t(m))|, is compared with the sum value; here Δg is equal to the sum value, i.e., Δg = θ + τ. The sum value is θ + τ, where τ is the preset parameter, and t(m') lies between t(l) and t(m).

[0133] Step S806: When the comparison result indicates that the first deflection degree change is less than or equal to the backlash filter coefficient, use the deflection update value at the target inflection point as the deflection update value of the key part in each video frame.

[0134] The deflection update value is the output value of the backlash filter and is used to perform image processing on the video frame.

[0135] In one embodiment, the terminal obtains the deflection update value at the target inflection point and uses it as the deflection update value of the key part in each video frame. For example, as shown in Figure 10, each video frame in S806 refers to at least one video frame acquired during stage 1.

[0136] Step S808: When the comparison result indicates that the first deflection degree change is greater than the backlash filter coefficient and less than or equal to the sum value, perform a linear transformation on the increasing deflection degree value to obtain the deflection update value of the key part in each video frame.

[0137] In this case, a first deflection degree change greater than the backlash filter coefficient indicates that the backlash has been eliminated, and the output deflection update value gradually increases. For example, as shown in Figure 10, each video frame in S808 refers to at least one video frame acquired during stage 2.

[0138] Step S810: When the comparison result indicates that the first deflection degree change is greater than the sum value, obtain the deflection update value corresponding to the point at which the first deflection degree change equals the sum value, and use the obtained deflection update value as the deflection update value of the key part in each video frame.

[0139] Specifically, once the first deflection degree change has increased to the sum value, the output deflection update value no longer increases. Therefore, the terminal obtains the deflection update value corresponding to the sum value and uses it as the deflection update value of the key part in each video frame. Since the video frames are acquired in real time, each video frame in S810 refers to at least one video frame acquired at the time corresponding to the sum value.

[0140] To make the above embodiments clearer, they are explained below with reference to Figure 10:

[0141] The deflection degree value g changes from decreasing to increasing at point b. Before point c is reached, the first deflection degree change Δg is less than the backlash filter coefficient θ. Specifically, at point b, Δg = 0. As g gradually increases, Δg also gradually increases, and when point c is reached, Δg = |g(t(k)) - g(t(l))| = θ. During the stage from b to c (i.e., stage 1 in Figure 10), the backlash has not been eliminated, so the deflection update value output by the backlash filter in this stage equals the deflection update value at the time corresponding to point b. Here, the backlash filter coefficient θ corresponds to the backlash present when the driving wheel switches from clockwise to counterclockwise rotation. As shown in Figure 11, if the backlash has not been completely eliminated during the counterclockwise rotation of the driving wheel, the driven wheel does not switch from counterclockwise to clockwise rotation; that is, the driven wheel maintains its original state.

[0142] In the stage from c to d (i.e., stage 2), the backlash has been eliminated, so the deflection update value output by the backlash filter gradually increases during this stage. When point d is reached, the deflection update value output by the backlash filter reaches its maximum, that is, the deflection update value equals 1.
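The three cases of S806, S808, and S810 can be sketched as a piecewise function of the first deflection degree change Δg. This application states only that a linear transformation is used, so the particular mapping below (a ramp of slope 1/τ that starts at the held value, plus clamping to [0, 1]) is an assumption chosen so that an output held at 0 reaches 1 when Δg = θ + τ. The names `deflection_update`, `held`, and `direction` are hypothetical.

```python
def deflection_update(delta_g, theta, tau, held, direction=1):
    """Piecewise output of an assumed backlash stage.

    delta_g   : first deflection degree change relative to the target inflection point
    theta     : backlash filter coefficient
    tau       : preset parameter, so the sum value is theta + tau
    held      : deflection update value at the target inflection point
    direction : +1 if the deflection degree value is rising, -1 if falling (assumed)
    """
    if theta < 0 or tau <= 0:
        raise ValueError("theta must be non-negative and tau must be positive")
    if delta_g <= theta:
        # S806: backlash not yet eliminated, keep the held value.
        return held
    # S808: linear ramp once the backlash is eliminated.
    # S810: the ramp saturates when delta_g exceeds theta + tau.
    ramp = (min(delta_g, theta + tau) - theta) / tau
    return min(1.0, max(0.0, held + direction * ramp))


theta, tau = 0.25, 0.5   # hypothetical coefficient and preset parameter
for delta in (0.125, 0.5, 0.75, 1.0):
    print(delta, deflection_update(delta, theta, tau, held=0.0))
```

With these hypothetical settings the output stays at 0.0 for Δg = 0.125, is 0.5 at Δg = 0.5, and is 1.0 at both Δg = 0.75 and Δg = 1.0.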

[0143] In the above embodiments, as the deflection degree value in the control signal begins to increase or decrease over time, the control signal is input to the backlash filter. When the first deflection degree change is less than the backlash filter coefficient, the backlash has not been eliminated, so the deflection update value output by the backlash filter is not adjusted; that is, it equals the deflection update value at the target inflection point, which avoids misprocessing video frames because of noise. A first deflection degree change greater than the backlash filter coefficient indicates that the change in the deflection degree value is caused by a change in the posture of the key part. Therefore, the changing deflection degree value is linearly transformed to obtain the deflection update value only when the first deflection degree change is greater than the backlash filter coefficient, i.e., when the backlash has been eliminated, so that the video frame is processed effectively.

[0144] In some embodiments, after using the deflection update value at the target inflection point as the deflection update value of the key part in each video frame, the method further includes: when the terminal detects that the deflection degree value in the control signal begins to decrease or increase over time, determining a second deflection degree change based on the deflection degree value at which the decrease or increase begins; and, when the second deflection degree change is less than the backlash filter coefficient, determining the deflection update value of the key part in each video frame based on the deflection update value at a new target inflection point, where the new target inflection point is an inflection point formed after the target inflection point.

[0145] For example, as shown in Figure 9, the deflection degree value in the control signal increases from t(k) to t(m) and then begins to decrease from t(m); this process corresponds to stages 1, 2, and 3 in Figure 12. The second deflection degree change is calculated with the deflection degree value at which the decrease begins (i.e., the deflection degree value at point d) as the baseline. Since the decrease shown in Figure 9 is relatively small, the second deflection degree change is also small, i.e., less than the backlash filter coefficient. Afterward, the deflection degree value in the control signal resumes its increasing trend, and the deflection degree change can be calculated as described in the above embodiments. If instead the decrease from point d is significant, the second deflection degree change can grow from less than the backlash filter coefficient to greater than the backlash filter coefficient, and finally to greater than the sum value (refer to stages 3 and 4 in Figure 11). It should be noted that the case where the second deflection degree change is greater than the sum value is not shown in Figure 12. The sum value is θ + τ, and point d is the new target inflection point.

[0146] During stages 1 and 2, if the deflection degree value begins to decrease while the second deflection degree change has not yet reached the sum value, stage 5 is entered; if the deflection degree value continues to decrease, stage 4 is entered from stage 5. For the calculation of the deflection degree change and the deflection update value, refer to the above embodiments.

[0147] In the above embodiments, the deflection degree value in the control signal changes from increasing to decreasing, or from decreasing to increasing, and after the trend changes the control signal is input to the backlash filter. When the second deflection degree change is less than the backlash filter coefficient, the backlash has not been eliminated, so the deflection update value output by the backlash filter is not adjusted; that is, it equals the deflection update value at the new target inflection point, which avoids misprocessing video frames because of noise. A second deflection degree change greater than the backlash filter coefficient indicates that the change in the deflection degree value is caused by a change in the posture of the key part. Therefore, the changed deflection degree value is linearly transformed to obtain the deflection update value only when the second deflection degree change is greater than the backlash filter coefficient, i.e., when the backlash has been eliminated, so that the video frame is processed effectively.

[0148] In some embodiments, the key part includes the face, and each posture includes the pitch angle, yaw angle, and roll angle of the face in each video frame; based on the posture of the key part, the deflection degree value of the key part in each video frame is determined, including: normalizing the pitch angle, yaw angle, and roll angle of the face in each video frame to obtain normalized pitch angle, yaw angle, and roll angle; determining the product value of the normalized pitch angle, yaw angle, and roll angle; and determining the deflection degree value of the face in each video frame based on the product value.

[0149] The Cartesian coordinate system in three-dimensional space is used to explain pitch, yaw, and roll angles. The plane formed by the x-axis and z-axis of the Cartesian coordinate system is parallel to the horizontal plane. The x-axis corresponds to the direction from the left ear to the right ear of the human head, and the z-axis corresponds to the direction from the face to the back of the head. The y-axis of the Cartesian coordinate system is the direction axis of the height of the human body. Pitch angle rotates around the x-axis, such as the change in pitch angle of the face caused by tilting the head up or down. Yaw angle rotates around the y-axis, such as the change in yaw angle of the face caused by turning the head from left to right. Roll angle rotates around the z-axis, such as the change in roll angle caused by turning the head so that the top of the head is close to the left shoulder.

[0150] Specifically, the terminal obtains the preset maximum pitch angle and preset minimum pitch angle, the preset maximum yaw angle and preset minimum yaw angle, and the preset maximum roll angle and preset minimum roll angle. For each video frame, the terminal normalizes the pitch angle of the face in that video frame based on the preset maximum pitch angle and preset minimum pitch angle to obtain the normalized pitch angle of the face in that video frame. The terminal normalizes the yaw angle of the face in that video frame based on the preset maximum yaw angle and preset minimum yaw angle to obtain the normalized yaw angle of the face in that video frame. The terminal normalizes the roll angle of the face in that video frame based on the preset maximum roll angle and preset minimum roll angle to obtain the normalized roll angle of the face in that video frame.

[0151] The posture in which the face directly faces the terminal's display screen is used as the reference posture. The pitch angle corresponding to the reference posture is the preset minimum pitch angle, the yaw angle corresponding to the reference posture is the preset minimum yaw angle, and the roll angle corresponding to the reference posture is the preset minimum roll angle. Relative to the reference posture, the preset maximum pitch angle can be the maximum angle achievable by tilting the head up or down. Similarly, the preset maximum yaw angle can be the maximum angle achievable by rotating the head around the y-axis, and the preset maximum roll angle can be the maximum angle achievable by rotating the head around the z-axis. For example, the preset minimum pitch angle is 0 degrees and the preset maximum pitch angle is 90 degrees; the preset minimum yaw angle is 0 degrees and the preset maximum yaw angle is 90 degrees; and the preset minimum roll angle is 0 degrees and the preset maximum roll angle is 90 degrees.

[0152] In some embodiments, the terminal normalizes the pitch angle of the face in the video frame according to the preset maximum pitch angle and the preset minimum pitch angle to obtain the normalized pitch angle of the face in the video frame. This can be achieved by the terminal calculating the first pitch angle difference between the pitch angle of the face in the video frame and the preset minimum pitch angle, and calculating the second pitch angle difference between the preset maximum pitch angle and the preset minimum pitch angle, and calculating the ratio between the first pitch angle difference and the second pitch angle difference to obtain the normalized pitch angle, as shown in formula (1).

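[0153] $\mathrm{pitch}_{t1} = \dfrac{\mathrm{pitch} - \mathrm{pitch}_{\min}}{\mathrm{pitch}_{\max} - \mathrm{pitch}_{\min}}$ (1)

[0154] Here, $\mathrm{pitch}_{t1}$ is the normalized pitch angle of the face in video frame t1, $\mathrm{pitch}$ is the pitch angle of the face in video frame t1, $\mathrm{pitch}_{\min}$ is the preset minimum pitch angle, and $\mathrm{pitch}_{\max}$ is the preset maximum pitch angle. The normalized pitch angle lies in the range [0,1].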

[0155] In the same way, the terminal calculates the first yaw angle difference between the yaw angle of the face in the video frame and the preset minimum yaw angle, calculates the second yaw angle difference between the preset maximum yaw angle and the preset minimum yaw angle, and calculates the ratio of the first yaw angle difference to the second yaw angle difference to obtain the normalized yaw angle, as shown in formula (2).

[0156] $\mathrm{yaw}_{t1} = \dfrac{\mathrm{yaw} - \mathrm{yaw}_{\min}}{\mathrm{yaw}_{\max} - \mathrm{yaw}_{\min}}$ (2)

[0157] Here, $\mathrm{yaw}_{t1}$ is the normalized yaw angle of the face in video frame t1, $\mathrm{yaw}$ is the yaw angle of the face in video frame t1, $\mathrm{yaw}_{\min}$ is the preset minimum yaw angle, and $\mathrm{yaw}_{\max}$ is the preset maximum yaw angle. The normalized yaw angle lies in the range [0,1].

[0158] The terminal calculates the first roll angle difference between the roll angle of the face in the video frame and the preset minimum roll angle, calculates the second roll angle difference between the preset maximum roll angle and the preset minimum roll angle, and calculates the ratio of the first roll angle difference to the second roll angle difference to obtain the normalized roll angle, as shown in formula (3).

[0159] $\mathrm{roll}_{t1} = \dfrac{\mathrm{roll} - \mathrm{roll}_{\min}}{\mathrm{roll}_{\max} - \mathrm{roll}_{\min}}$ (3)

[0160] Here, $\mathrm{roll}_{t1}$ is the normalized roll angle of the face in video frame t1, $\mathrm{roll}$ is the roll angle of the face in video frame t1, $\mathrm{roll}_{\min}$ is the preset minimum roll angle, and $\mathrm{roll}_{\max}$ is the preset maximum roll angle. The normalized roll angle lies in the range [0,1].

[0161] The terminal multiplies the normalized pitch angle, normalized yaw angle, and normalized roll angle of the face in the video frame to obtain the deflection degree value of the face in the video frame, as shown in formula (4).

[0162] $g_{t1} = \mathrm{pitch}_{t1} \times \mathrm{yaw}_{t1} \times \mathrm{roll}_{t1}$ (4)

[0163] Here, $g_{t1}$ is the deflection degree value of the face in video frame t1, $\mathrm{pitch}_{t1}$ is the normalized pitch angle of the face in video frame t1, $\mathrm{yaw}_{t1}$ is the normalized yaw angle of the face in video frame t1, and $\mathrm{roll}_{t1}$ is the normalized roll angle of the face in video frame t1. Since the normalized pitch angle, normalized yaw angle and normalized roll angle all lie in the range [0,1], the deflection degree value also lies in the range [0,1].
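The normalization in formulas (1) to (3) and the product in formula (4) can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiments: the function names `normalize_angle` and `deflection_degree`, the default presets of 0 and 90 degrees (taken from the example given earlier), and the clamping of out-of-range angles to [0,1] are assumptions introduced here.

```python
def normalize_angle(angle, angle_min=0.0, angle_max=90.0):
    """Min-max normalization of one angle, following formulas (1)-(3)."""
    if angle_max <= angle_min:
        raise ValueError("angle_max must be greater than angle_min")
    ratio = (angle - angle_min) / (angle_max - angle_min)
    # Assumption: clamp so that a noisy measurement cannot leave [0, 1].
    return min(1.0, max(0.0, ratio))


def deflection_degree(pitch, yaw, roll, presets=None):
    """Product of the three normalized angles, following formula (4).

    `presets` maps "pitch", "yaw" and "roll" to (minimum, maximum) pairs.
    """
    if presets is None:
        presets = {"pitch": (0.0, 90.0), "yaw": (0.0, 90.0), "roll": (0.0, 90.0)}
    pitch_n = normalize_angle(pitch, *presets["pitch"])
    yaw_n = normalize_angle(yaw, *presets["yaw"])
    roll_n = normalize_angle(roll, *presets["roll"])
    return pitch_n * yaw_n * roll_n
```

With these example presets, a face at pitch 45, yaw 90 and roll 90 degrees gives 0.5 × 1 × 1 = 0.5, and the product is 0 whenever any one angle equals its preset minimum.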

[0164] In the above embodiments, when the key part is the face, the facial posture reflects the orientation of the face. The terminal determines the deflection degree value based on the facial posture. The deflection degree value of the face in a video frame represents the orientation of the face in that frame, that is, how far the face is deflected relative to the screen. The deflection degree value can therefore be used as a control signal to determine whether to perform image processing on the video frame, or to determine the degree of image processing applied to it.

[0165] In some embodiments, generating a control signal based on the deflection degree value of the key part in each video frame includes: obtaining the timing identifier of each video frame; generating a control signal based on the timing identifier and the deflection degree value of the key part in each video frame; the control signal is used to describe the trend of the deflection degree value changing over time.

[0166] The timing identifier of each video frame can be used to reflect the playback time order of each video frame. The playback time order of each video frame is consistent with the acquisition time order of each video frame. Therefore, the timing identifier of each video frame can be used to reflect the acquisition time order of each video frame. The timing identifier can be represented by a value. The smaller the value of the timing identifier, the later the playback time of the corresponding video frame. The larger the value of the timing identifier, the earlier the playback time of the corresponding video frame.

[0167] Specifically, the terminal can determine the deflection value of each video frame sequentially according to the playback time order (acquisition time order). For each video frame, the terminal determines the signal point of that video frame based on its deflection value and playback order. The terminal then generates the control signal from the signal points corresponding to the video frames. Since the control signal is determined based on the playback time order (acquisition time order) and the deflection values of the video frames, the trend of the deflection value changing with playback time (acquisition time) can be determined from the control signal.

[0168] In the above embodiments, the terminal determines the control signal based on the playback time sequence and the deflection value of each video frame. Through the control signal, the trend of the deflection value of each video frame changing with the playback time can be determined more intuitively.
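As a rough sketch of this step, the control signal can be represented as a list of (timing identifier, deflection degree value) points arranged in playback order. The names `build_control_signal` and `trend` and the tuple representation are assumptions made for illustration. The ordering flag follows the statement above that a smaller timing identifier corresponds to a later playback time, and would need to be flipped if the identifiers increase with time.

```python
def build_control_signal(frames, smaller_id_is_later=True):
    """Return (timing_id, deflection_value) points ordered from earliest
    to latest playback time.

    `frames` is an iterable of (timing_id, deflection_value) pairs in any order.
    """
    # Earliest frame first: descending identifiers if a smaller id means later.
    return sorted(frames, key=lambda point: point[0], reverse=smaller_id_is_later)


def trend(signal):
    """Differences between consecutive deflection values (positive = increasing)."""
    values = [value for _, value in signal]
    return [later - earlier for earlier, later in zip(values, values[1:])]
```

The list returned by `trend` makes the direction of change between neighbouring frames explicit, which is the information the later filtering stage relies on.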

[0169] In some embodiments, determining the pose of key parts of a target object in each video frame carrying noise includes: determining the pose of the target object's face in each video frame carrying noise, provided that the eye contact feature of the target application is enabled; wherein the target application is an application that plays each video frame.

[0170] In some scenarios, meetings can be conducted through the target application, where the target object in the video frame can be the participants, such as the speaker. In other scenarios, live streaming can be conducted through the target application, where the target object in the video frame can be the broadcaster. In still other scenarios, video calls can be conducted through the target application, where the target object in the video frame can be the two parties participating in the video call. The above scenarios are merely examples of possible scenarios. In practical applications, the target application can also be used in other scenarios, and this application does not limit these applications.

[0171] Eye contact refers to eye contact between the target object and the viewer, i.e., between the target object and the viewer watching each video frame. The eye contact function allows the target object to achieve the effect of eye contact with the viewer even when the target object is not looking directly at the camera of the video capture device.

[0172] In practical applications, there are many situations where the target object does not look directly at the camera of the video capture device. For example, during a video call, both parties look away from the camera in order to see each other's faces; during a meeting, the speaker looks away from the camera in order to see the content of their speech; and during a live broadcast, the host looks away from the camera in order to view the real-time comments from the audience.

[0173] The eye contact feature is a setting on the target application used to enable or disable the eye contact function of the target application.

[0174] Specifically, the terminal obtains the on / off state of the eye contact function item of the target application. If the eye contact function item is on, eye contact processing needs to be performed on each video frame. Eye contact processing is applied to the pixels corresponding to the eyes of the target object in the video frame. The pixels corresponding to the eyes of the target object in the video frame can only be processed when the target object's face is facing the display screen to the extent that the eye contact processing condition is met. Furthermore, the degree of processing of the pixels corresponding to the eyes of the target object is also related to the degree to which the target object's face is facing the display screen.

[0175] Therefore, the key part of the target object is the face. When the eye contact function of the target application is enabled, the terminal determines the posture of the target object's face in each noisy video frame, so as to determine whether to perform eye contact processing on the target object and the degree of eye contact processing based on the posture of the face in each noisy video frame.

[0176] In some embodiments, for a target application running on a terminal, the on / off state of the target application's eye contact function item can be adjusted via the terminal. As shown in Figure 13, the settings page 1300 of the target application includes an eye contact function item control 1301; in response to a trigger operation on the eye contact function item control 1301, the on / off state of the eye contact function item is modified. For example, if the eye contact function item control 1301 is in the on state before being triggered, then it is in the off state after being triggered; or, if the eye contact function item control 1301 is in the off state before being triggered, then it is in the on state after being triggered.

[0177] The on / off state of the eye contact function item control 1301 can be determined by the display style of the eye contact function item control 1301. For example, the eye contact function item control 1301 includes a checkbox. When the eye contact function item is in the on state, the checkbox of the eye contact function item control 1301 includes a selection icon; when the eye contact function item is in the off state, the checkbox does not include a selection icon. As shown in Figure 13, the checkbox of the eye contact function item control 1301 does not include the selection icon, so the eye contact function item is in the off state. The eye contact function item control 1301 may also include descriptive information about the eye contact function. As shown in Figure 13, the description of the eye contact function includes: "Enhance your eye contact with other participants."

[0178] In the above embodiments, when the eye contact function is enabled, the terminal determines the facial posture of the target object in each noisy video frame, so as to determine whether to perform eye contact processing on the target object and the degree of eye contact processing based on the facial posture in each noisy video frame.

[0179] In some embodiments, image processing is performed on each video frame sequentially based on the deflection update value within each video frame, including: determining the location of the original eye feature points in each video frame; obtaining target eye feature points based on the deflection update value within each video frame; and fusing the target eye feature points into each video frame based on their location to replace the original eye feature points in each video frame.

[0180] Among them, eye feature points are feature points that affect the direction of the gaze, i.e., the line of sight. Adjusting the eye feature points adjusts the direction of the gaze, i.e., adjusts the line of sight, so that the adjusted line of sight is directly facing the camera of the video capture device. Eye feature points can be the pixels corresponding to the eyeballs.

[0181] The location of the eye feature point in a video frame is the pixel area where the eye feature point is located in the video frame. When the eye feature point is the pixel corresponding to the eyeball, the location of the eye feature point is the pixel area of the pixel corresponding to the eyeball.

[0182] The target eye feature point can be the pixel point corresponding to the eyeball after adjustment based on the deflection update value.

[0183] In this embodiment, the deflection update value of the video frame is used to reflect the degree of eye contact processing on the video frame. The larger the deflection update value, the higher the degree of eye contact processing on the video frame. The smaller the deflection update value, the lower the degree of eye contact processing on the video frame.

[0184] Specifically, for each video frame, if the deflection update value of that video frame is 0, no eye contact processing is performed on the video frame. If the deflection update value is not 0, the pixel coordinates corresponding to the target eye feature points in that video frame are obtained. If the deflection update value is not 0 but the target object's eyes are closed in the video frame, the location of the original eye feature points cannot be obtained in that video frame, and therefore no eye contact processing is performed on that video frame.

[0185] Obtaining target eye feature points based on the deflection update value within the video frame can be achieved by performing eye contact processing on the video frame based on the deflection update value. This eye contact processing can be performed by feeding the deflection update value and facial feature points in the video frame into a deep neural network. The deep neural network then identifies eye-related information within these facial feature points and obtains the target eye feature points based on this information. The process of performing eye contact processing on the deflection update value and video frame using a deep neural network can be implemented using existing methods and will not be elaborated upon here.

[0186] Based on the location of the original eye feature points in the video frame, the target eye feature points are fused into the video frame. This can be achieved by replacing the pixel values corresponding to the original eye feature points with the pixel values corresponding to the target eye feature points, thereby replacing the original eye feature points in the video frame.

[0187] For example, the eye feature point is the pixel corresponding to the eyeball. In video frame t1, the original pixel s1 corresponding to the eyeball is located in region p1, and the pixel value corresponding to region p1 is r1 (the pixel value of the original pixel corresponding to the eyeball is r1). The target eye feature point is determined according to the deflection update value of video frame t1. The target eye feature point is the adjusted pixel corresponding to the eyeball, and the pixel value of the adjusted pixel corresponding to the eyeball is r2. The pixel value r1 corresponding to region p1 in video frame t1 is modified to r2 to replace the original pixel corresponding to the eyeball in video frame t1.
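The replacement described in this example can be illustrated with a small sketch in which a frame is a nested list of pixel values and region p1 is a list of (row, column) coordinates. The function name `replace_region` and this data layout are assumptions made for illustration; a practical implementation would typically operate on image arrays and might blend the region edges rather than overwrite them.

```python
def replace_region(frame, region, new_value):
    """Return a copy of `frame` with every pixel in `region` set to `new_value`.

    `frame` is a list of rows of pixel values; `region` is an iterable of
    (row, col) coordinates, standing in for region p1 in the example.
    """
    result = [list(row) for row in frame]  # copy so the input frame is untouched
    for row, col in region:
        result[row][col] = new_value
    return result
```

Here the original value r1 in region p1 is overwritten by r2, while pixels outside p1 keep their values.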

[0188] In the above embodiments, the terminal obtains the target eye feature points based on the deflection update value of the video frame and replaces the original eye feature points in the video frame with the target eye feature points. Since the deflection update value is affected by noise, the target eye feature points are also affected by noise. If the noise causes the deflection update value of each video frame to switch frequently between the deflection update values corresponding to two levels of eye contact processing, the target object's eyes may flicker frequently across video frames due to the eye contact processing, resulting in unstable and unnatural eye contact effects. In the above embodiments, the deflection update value of the video frame is output by a backslot filter. The backslot filter filters out the influence of noise on the deflection update value, preventing noise from causing the deflection update value to switch frequently between the values corresponding to the two levels of eye contact processing. This improves the stability of the deflection update value, thereby improving the stability of the eye contact effect and making it more natural.

[0189] In some embodiments, the key part includes the hand; based on the deflection update value in each video frame, image processing is performed on each video frame in sequence, including: when the deflection update value in each video frame meets the effect addition condition, obtaining effect data; and adding effect data to each video frame.

[0190] Specifically, when the key part is the hand, the deflection update value of the video frame is either the first update value or the second update value. When the deflection update value of the video frame is the first update value, it means that the video frame needs to be image processed. When the deflection update value of the video frame is the second update value, it means that the video frame does not need to be image processed. The conditions for adding special effects include: the deflection update value is the first update value.

[0191] Specifically, for each video frame, the terminal obtains the deflection update value of that video frame. If the deflection update value of that video frame is the first update value, then the special effects data is obtained and added to the video frame.

[0192] For example, the first update value is 1 and the second update value is 0; if the deflection update value of the video frame is 1, then special effects data is obtained and added to the video frame; if the deflection update value is 0, then special effects data is not added to the video frame.
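A minimal sketch of this gating logic is given below, using the example values 1 and 0 for the first and second update values. The names `meets_effect_condition` and `apply_effects`, and the representation of a frame as a dictionary with an `"effect"` entry, are hypothetical and serve only to show the control flow.

```python
FIRST_UPDATE_VALUE = 1   # frame needs image processing (example value from the text)
SECOND_UPDATE_VALUE = 0  # frame does not need image processing


def meets_effect_condition(update_value):
    """The effect addition condition: the update value is the first update value."""
    return update_value == FIRST_UPDATE_VALUE


def apply_effects(frames, update_values, effect_data):
    """Attach `effect_data` to each frame whose update value meets the condition.

    `frames` is a list of dicts; copies are returned and the inputs are not modified.
    """
    processed = []
    for frame, update_value in zip(frames, update_values):
        frame_out = dict(frame)
        if meets_effect_condition(update_value):
            frame_out["effect"] = effect_data
        processed.append(frame_out)
    return processed
```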

[0193] In some embodiments, determining the pose of key parts of a target object in each noisy video frame includes: in response to an activation request for a gesture effects control on a video playback page, determining the pose of the target object's hand in each noisy video frame.

[0194] The gesture effects control is used to add special effects data to video frames based on gestures. In practical applications, the video playback page may include multiple gesture effects controls to achieve different ways of adding various special effects data, including but not limited to: adding multiple special effects data based on a first gesture, adding multiple special effects data based on a second gesture, and adding multiple special effects data based on a third gesture. For example, the first gesture may be a gesture representing "Yay," the second gesture may be a gesture representing "OK," and the third gesture may be a gesture representing "1." Multiple special effects data include, but are not limited to, adding multiple special effects icons and adding multiple special effects text to the video frame. This application embodiment does not limit the use of multiple special effects icons and multiple special effects text.

[0195] Specifically, the target application runs on the terminal and includes a video playback page. The video playback page includes a gesture effects control. In response to a trigger operation on the gesture effects control, the terminal determines the posture of the target object's hand in each video frame carrying noise.

[0196] In the above embodiments, the terminal determines whether the deflection update value of the video frame meets the special effects addition condition. If it does, special effects data is added to the video frame. Since the deflection update value is affected by noise, whether special effects data is added to the video frame is also affected by noise. If noise causes the deflection update value of each video frame to switch frequently between meeting and not meeting the special effects addition condition, the video frames will also switch frequently between adding and not adding special effects data (for example, a special effects icon that appears in some frames and disappears in others), making the function of adding special effects data unstable. In the above embodiments, the deflection update value of the video frame is output by a backslot filter. The backslot filter filters out the influence of noise on the deflection update value, preventing noise from causing the deflection update value to switch frequently between meeting and not meeting the special effects addition condition. This improves the stability of the deflection update value, and thus the stability of the function of adding special effects data.

[0197] In a specific embodiment, as shown in Figure 14, the video frame processing method includes:

[0198] S1401, with the eye contact function of the target application enabled, acquire each video frame carrying noise in real time; extract key points of the target object from each video frame; based on the key points, determine the pose of the key part of the target object in each video frame; wherein the target application is the application that plays each video frame, the key part is the face, and the pose of the face in each video frame includes the pitch angle, yaw angle and roll angle of the face in each video frame;

[0199] S1402, normalize the pitch angle, yaw angle and roll angle of the face in each video frame to obtain normalized pitch angle, yaw angle and roll angle; determine the product value of the normalized pitch angle, yaw angle and roll angle; determine the deflection degree value of the face in each video frame based on the product value; the magnitude of the deflection degree value is affected by the attitude change of key parts and noise.

[0200] S1403, Obtain the timing identifier of each video frame; Based on the timing identifier and the deflection degree value of key parts in each video frame, generate a control signal; The control signal is used to describe the trend of the deflection degree value changing over time;

[0201] S1404, during the process of the deflection degree value in the control signal increasing or decreasing with time, the first deflection degree change relative to the target inflection point is determined by the back gap filter.

[0202] S1405, compare the change in the first deflection degree with the back gap filter coefficient or the sum value to obtain the comparison result; the sum value is obtained by summing the back gap filter coefficient with the preset parameter;

[0203] S1406A: When the comparison result indicates that the change in the first degree of deflection is less than or equal to the back gap filter coefficient, the deflection update value at the target inflection point is used as the deflection update value of the key part in each video frame.

[0204] S1407A, when it is detected that the deflection degree value in the control signal begins to decrease or increase over time, the second deflection degree change is determined based on the deflection degree value at which it begins to decrease or increase.

[0205] S1408A: When the change in the second deflection degree is less than the back gap filter coefficient, the deflection update value of the key part in each video frame is determined based on the deflection update value at the target inflection point.

[0206] S1406B: When the comparison result indicates that the change in the first degree of deflection is greater than the back gap filter coefficient and less than or equal to the sum, the increased degree of deflection is linearly transformed to obtain the deflection update value of the key part in each video frame.

[0207] S1406C, when the comparison result indicates that the change in the first degree of deflection is greater than the sum value, obtain the deflection update value corresponding to the increasing deflection value equal to the sum value; use the obtained deflection update value as the deflection update value of the key parts in each video frame;

[0208] S1409, determine the location of the original eye feature points in each video frame; obtain the target eye feature points based on the deflection update value in each video frame; and fuse the target eye feature points into each video frame according to their location to replace the original eye feature points in each video frame.
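The comparison in S1405 and the three branches S1406A, S1406B and S1406C can be pictured as a dead zone, followed by a linear ramp, followed by a saturation. The sketch below is one possible reading and not the implementation of the embodiments: the function name `back_gap_update`, the `gain` of the linear transformation, and the choice to measure the ramp from the edge of the dead zone are assumptions, since the text does not specify the form of the linear transformation. Tracking of the target inflection point (S1404, S1407A, S1408A) is left out; the caller is assumed to supply the change relative to that point.

```python
def back_gap_update(change, coeff, preset, inflection_update, gain=1.0):
    """Piecewise mapping from a deflection change to a deflection update value.

    change            -- first deflection degree change relative to the target
                         inflection point (positive = increasing)
    coeff             -- back gap filter coefficient (width of the dead zone)
    preset            -- preset parameter; coeff + preset is the "sum value"
    inflection_update -- deflection update value at the target inflection point
    gain              -- slope of the assumed linear transformation
    """
    magnitude = abs(change)
    sign = 1.0 if change >= 0 else -1.0
    sum_value = coeff + preset

    # S1406A: change within the back gap, so hold the value at the inflection point.
    if magnitude <= coeff:
        return inflection_update

    # S1406B: linear ramp; S1406C: saturate once the change exceeds the sum value.
    ramp = min(magnitude, sum_value) - coeff
    return inflection_update + sign * gain * ramp
```

Under these assumptions, small fluctuations around the inflection point leave the output unchanged, which is the property that suppresses noise-induced jitter.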

[0209] In the above video frame processing method, the pose of the key parts of the target object in the noisy video frame is obtained. Based on the pose of the key parts, the deflection degree value of the key parts in each video frame is determined. A control signal is generated based on the deflection degree value of each video frame. The control signal is then filtered by a backslot filter according to the backslot filter coefficient to obtain the deflection update value of the key parts in each video frame. Finally, based on the deflection update value in each video frame, image processing is performed on each video frame in sequence. Because video frames carry noise, the posture of key parts is affected by the noise, and the deflection value of key parts is also affected by the noise. The control signal has jitter caused by noise, which also affects the deflection update value. This results in jitter when processing video frames, such as noise causing frequent switching between two different states during image processing. The above video frame processing method uses a backslot filter to filter out the jitter caused by noise in the control signal based on the backslot filter coefficient. Then, the deflection update value of each video frame is determined based on the filtered control signal. In other words, the backslot filter removes the influence of noise on the deflection update value and eliminates the jitter caused by noise when processing video frames, thus improving the stability of the processed video frames.

[0210] In one scenario embodiment, the terminal views each video frame through the video playback page of the target application; the video playback page includes a gesture effect control, which is used to add a "petal" icon to the video frame when a hand gesture indicating "OK" is detected.

[0211] In response to a trigger operation on the gesture effect control on the video playback page, the terminal determines the posture of the target object's hand in each video frame carrying noise. Based on the hand's posture in each video frame, it determines the deflection value of the hand in each video frame. A control signal is generated based on the deflection value of the hand in each video frame. The control signal is then filtered by a backslot filter according to the backslot filter coefficient to obtain the deflection update value of the hand in each video frame. For each video frame, if the deflection update value in that video frame meets the effect addition condition, a "petal" icon is added to that video frame.

[0212] For example, if the hand's deflection update value is 0 in frame t1, then the "petal" icon is not added in frame t1. If the hand's deflection update value is 1 in frame t2, then the "petal" icon is added in frame t2. It should be noted that in this case, the change between the hand's deflection value in frame t1 and the hand's deflection value in frame t2 is greater than the backslot filter coefficient.
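The behaviour in this example, where the binary update value changes only when the deflection value moves by more than the backslot filter coefficient, can be illustrated with a simplified hold-and-switch rule. This is a hypothetical simplification and not the filter defined in the embodiments: the function name `binary_update_values`, the initial state of 0, and the rule for moving the reference value are all assumptions.

```python
def binary_update_values(deflection_values, coeff, initial_state=0):
    """Map noisy deflection values to 0/1 update values with a dead zone.

    The state switches only when the input has moved by more than `coeff`
    away from the reference value recorded at the last switch.
    """
    updates = []
    state = initial_state
    reference = None
    for value in deflection_values:
        if reference is None:
            reference = value  # the first sample only sets the reference
        elif abs(value - reference) > coeff:
            state = 1 if value > reference else 0
            reference = value
        updates.append(state)
    return updates
```

For instance, with a coefficient of 0.2, the input sequence 0.1, 0.12, 0.09, 0.8, 0.78, 0.81, 0.1 yields the update values 0, 0, 0, 1, 1, 1, 0: the small fluctuations are ignored and only the two large moves switch the state.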

[0213] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0214] Based on the same inventive concept, this application also provides a video frame processing apparatus for implementing the video frame processing method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations of one or more video frame processing apparatus embodiments provided below can be found in the limitations of the video frame processing method described above, and will not be repeated here.

[0215] In one embodiment, as shown in Figure 15, a video frame processing apparatus is provided, including: a pose determination module 1502, a deflection degree value determination module 1504, a control signal determination module 1506, a deflection degree value update module 1508, and an image processing module 1510, wherein:

[0216] The pose determination module 1502 is used to determine the pose of key parts of the target object in each video frame carrying noise.

[0217] The deflection degree value determination module 1504 is used to determine the deflection degree value of the key parts in each video frame based on the various postures of the key parts; the magnitude of the deflection degree value is affected by the deflection of the key parts and noise.

[0218] The control signal determination module 1506 is used to generate control signals based on the degree of deflection of key parts in each video frame.

[0219] The deflection degree value update module 1508 is used to perform back gap filtering on the control signal according to the back gap filtering coefficient through the back gap filter to obtain the deflection update value of the key parts in each video frame.

[0220] The image processing module 1510 is used to perform image processing on each video frame sequentially based on the deflection update value within each video frame.
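The data flow through the five modules can be sketched as a simple composition of callables. The class name `VideoFrameProcessingApparatus`, the field names, and the callable signatures are assumptions chosen for illustration; the comments map each field to the module number used above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class VideoFrameProcessingApparatus:
    determine_pose: Callable[[Any], Any]                         # module 1502
    determine_deflection: Callable[[Any], float]                 # module 1504
    build_control_signal: Callable[[List[float]], List[float]]   # module 1506
    update_deflection: Callable[[List[float]], List[float]]      # module 1508
    process_image: Callable[[Any, float], Any]                   # module 1510

    def run(self, frames: Sequence[Any]) -> List[Any]:
        """Pass the frames through the five modules in order."""
        poses = [self.determine_pose(frame) for frame in frames]
        deflections = [self.determine_deflection(pose) for pose in poses]
        signal = self.build_control_signal(deflections)
        updates = self.update_deflection(signal)
        return [self.process_image(frame, update)
                for frame, update in zip(frames, updates)]
```

Keeping each module as an injected callable mirrors the modular description and lets the filtering stage be swapped or tested in isolation.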

[0221] The pose of the key parts of the target object in noisy video frames is obtained. Based on the pose of each key part, the deflection degree value of the key part in each video frame is determined. A control signal is generated based on the deflection degree value of each video frame. The control signal is then filtered by a backslot filter according to the backslot filter coefficient to obtain the deflection update value of the key part in each video frame. Based on the deflection update value in each video frame, image processing is performed on each video frame in sequence. Because video frames carry noise, the posture of key parts is affected by the noise, and the deflection value of key parts is also affected by the noise. The control signal has jitter caused by noise, which also affects the deflection update value. This results in jitter when processing video frames, such as noise causing frequent switching between two different states during image processing. The above video frame processing method uses a backslot filter to filter out the jitter caused by noise in the control signal based on the backslot filter coefficient. Then, the deflection update value of each video frame is determined based on the filtered control signal. In other words, the backslot filter removes the influence of noise on the deflection update value and eliminates the jitter caused by noise when processing video frames, thus improving the stability of the processed video frames.

[0222] In some embodiments, the pose determination module 1502 is further configured to acquire each video frame carrying noise acquired in real time; extract key points of the target object from each video frame; and determine the pose of the key parts of the target object in each video frame based on the key points.

[0223] In some embodiments, the control signal determination module 1506 is further configured to acquire the timing identifier of each video frame; generate a control signal based on the timing identifier and the deflection degree value of the key part in each video frame; and use the control signal to describe the trend of the deflection degree value changing over time.

[0224] In some embodiments, each video frame is a video frame acquired in real time, and the control signal is used to describe the trend of the deflection value changing over time.

[0225] The deflection degree value update module 1508 is also used to determine the first deflection degree change relative to the target inflection point through a back gap filter as the deflection degree value in the control signal begins to increase or decrease over time; compare the first deflection degree change with the back gap filter coefficient or the sum value to obtain a comparison result; the sum value is obtained by summing the back gap filter coefficient and the preset parameter; when the comparison result indicates that the first deflection degree change is less than or equal to the back gap filter coefficient, the deflection update value at the target inflection point is used as the deflection update value of the key part in each video frame.

[0226] In some embodiments, the deflection degree value update module 1508 is further configured to perform linear transformation processing on the increased deflection degree value when the comparison result indicates that the change in the first deflection degree is greater than the back gap filter coefficient and less than or equal to the sum value, so as to obtain the deflection update value of the key part in each video frame.

[0227] In some embodiments, the deflection degree value update module 1508 is further configured to, when the comparison result indicates that the first deflection degree change is greater than the sum value, obtain the deflection update value corresponding to the increasing deflection degree value being equal to the sum value, and use the obtained deflection update value as the deflection update value of the key part in each video frame.

[0228] In the above embodiments, as the deflection degree value in the control signal begins to increase or decrease over time, the control signal is input to the backslot filter to eliminate the backslot. When the change in deflection degree does not exceed the backslot filter coefficient, the backslot has not yet been eliminated. In this case the backslot filter does not adjust its output: the output deflection update value remains equal to the deflection update value at the target inflection point, which avoids erroneous processing of video frames due to noise. A change in deflection degree greater than the backslot filter coefficient indicates that the change is caused by a pose change of the key part rather than by noise. The changed deflection degree value is therefore linearly transformed into the deflection update value only when the change exceeds the backslot filter coefficient, i.e., once the backslot has been eliminated, so that the video frame is processed effectively.
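The three comparison outcomes described in the embodiments above (hold, linear transformation, limit at the sum value) can be sketched as a piecewise function. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `deflection_update`, the `slope` of the linear transformation, and the choice to measure the linear segment from the edge of the gap are all hypothetical.

```python
def deflection_update(change: float, coeff: float, preset: float,
                      held_update: float, slope: float = 1.0) -> float:
    """Map a deflection degree change (relative to the target inflection point)
    to a deflection update value.

    coeff       -- back gap filter coefficient
    preset      -- preset parameter; coeff + preset is the "sum value"
    held_update -- deflection update value at the target inflection point
    slope       -- assumed gain of the linear transformation (not in the source)
    """
    if coeff < 0 or preset < 0:
        raise ValueError("coeff and preset must be non-negative")
    magnitude = abs(change)
    direction = 1.0 if change >= 0 else -1.0
    if magnitude <= coeff:
        # Inside the gap: treat the change as noise and hold the output.
        return held_update
    if magnitude <= coeff + preset:
        # Gap eliminated: follow the input linearly.
        return held_update + direction * slope * (magnitude - coeff)
    # Beyond the sum value: keep the value reached at the sum value.
    return held_update + direction * slope * preset
```

For example, with `coeff=0.1`, `preset=0.5` and `held_update=0.2`, a change of 0.05 leaves the output at 0.2, while a change of 1.0 is limited to the value reached at the sum value.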

[0229] In some embodiments, the deflection degree value update module 1508 is further configured to, upon detecting that the deflection degree value in the control signal begins to decrease or increase over time, determine a second deflection degree change based on the decreasing or increasing deflection degree value; and, when the second deflection degree change is less than the back gap filter coefficient, determine the deflection update value of the key part in each video frame based on the deflection update value at the new target inflection point; the new target inflection point is an inflection point formed after the target inflection point.
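The behaviour across successive inflection points resembles the classic backlash ("play") operator from control engineering, which is swapped in here as an illustration and is not confirmed by the source as the filter used. In this sketch the class name `BacklashFilter` is hypothetical and `gap` plays the role of the back gap filter coefficient: after every reversal the output holds until the input has moved by more than `gap` from the reversal point.

```python
class BacklashFilter:
    """Stateful backlash (play) operator; a sketch, not the patented filter."""

    def __init__(self, gap: float, initial: float = 0.0) -> None:
        if gap < 0:
            raise ValueError("gap must be non-negative")
        self.half = gap / 2.0   # output trails the input by half the gap
        self.output = initial

    def step(self, x: float) -> float:
        if x - self.output > self.half:
            # Input has pushed through the gap while rising.
            self.output = x - self.half
        elif self.output - x > self.half:
            # Input has pushed through the gap while falling.
            self.output = x + self.half
        # Otherwise the input is inside the gap: hold the previous output.
        return self.output


if __name__ == "__main__":
    f = BacklashFilter(gap=0.1)
    # Rising input followed by small jitter around 0.4 (illustrative values).
    for x in [0.0, 0.2, 0.4, 0.42, 0.38, 0.41, 0.39]:
        print(f"{x:.2f} -> {f.step(x):.2f}")
```

Small reversals inside the gap leave the output unchanged, so each new inflection point only takes effect once the subsequent change exceeds the gap.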

[0230] In some embodiments, the key part includes the face, and the pose includes the pitch angle, yaw angle and roll angle of the face in each video frame.

[0231] The deflection degree value determination module 1504 is also used to normalize the pitch angle, yaw angle and roll angle of the face in each video frame to obtain normalized pitch angle, yaw angle and roll angle; determine the product value of the normalized pitch angle, yaw angle and roll angle; and determine the deflection degree value of the face in each video frame based on the product value.
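The source states only that the three angles are normalized, multiplied, and that the deflection degree value is determined from the product. The sketch below fills in those steps with assumptions that should not be read as the patented formula: each angle is mapped to a score in [0, 1] (1 for a frontal face, 0 at an assumed limit angle), and the deflection degree value is taken as one minus the product. The names `normalize_angle`, `face_deflection` and the 90-degree limits are hypothetical.

```python
from typing import Tuple


def normalize_angle(angle_deg: float, max_abs_deg: float) -> float:
    """Map an angle to [0, 1]: 1 at 0 degrees, 0 at or beyond max_abs_deg."""
    if max_abs_deg <= 0:
        raise ValueError("max_abs_deg must be positive")
    return 1.0 - min(abs(angle_deg) / max_abs_deg, 1.0)


def face_deflection(pitch: float, yaw: float, roll: float,
                    limits: Tuple[float, float, float] = (90.0, 90.0, 90.0)) -> float:
    """Deflection degree value in [0, 1] from the product of normalized angles."""
    product = 1.0
    for angle, limit in zip((pitch, yaw, roll), limits):
        product *= normalize_angle(angle, limit)
    # Assumption: a larger product means a more frontal face, so invert it.
    return 1.0 - product
```

Under these assumptions a frontal face gives 0.0, and a face turned to the limit on any single axis gives 1.0.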

[0232] In some embodiments, the pose determination module 1502 is further configured to determine the pose of the target object's face in each video frame carrying noise when the eye contact function of the target application is enabled, wherein the target application is an application that plays each video frame.

[0233] In some embodiments, the image processing module 1510 is further configured to determine the location of the original eye feature points in each video frame; obtain the target eye feature points based on the deflection update value in each video frame; and fuse the target eye feature points into each video frame based on their location to replace the original eye feature points in each video frame.

[0234] In some embodiments, the key part includes the hand; the image processing module 1510 is further configured to acquire special effects data when the deflection update value in each video frame meets the special effects addition conditions; and add special effects data in each video frame.
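The source does not define the special effects addition condition. Assuming, for illustration, that the condition is a simple threshold on the deflection update value, the sketch below shows why a jittering input matters: values that hover around the threshold switch the effect on and off, while a held value does not. The names `effect_active` and `count_switches` and the threshold are hypothetical.

```python
from typing import List, Sequence


def effect_active(deflection_updates: Sequence[float], threshold: float) -> List[bool]:
    """Per-frame decision: add the special effect when the value reaches the threshold."""
    return [u >= threshold for u in deflection_updates]


def count_switches(states: Sequence[bool]) -> int:
    """Number of on/off transitions between consecutive frames."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)
```

Comparing `count_switches` on unfiltered and filtered values for the same clip gives a rough measure of how much jitter the filter removes.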

[0235] Each module in the aforementioned video frame processing device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0236] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in Figure 16. The computer device includes a processor, memory, input/output interfaces, a communication interface, a display unit, and an input device. The processor, memory, and input/output interfaces are connected via a system bus, and the communication interface, display unit, and input device are also connected to the system bus via the input/output interfaces. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The input/output interfaces are used for exchanging information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a video frame processing method. The display unit of the computer device is used to form a visible image. It can be a display screen, a projection device, or a virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen; buttons, trackballs, or touchpads set on the casing of the computer device; or an external keyboard, touchpad, or mouse.

[0237] Those skilled in the art will understand that the structure shown in Figure 16 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0238] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the video frame processing method described above.

[0239] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of the video frame processing method described above.

[0240] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the video frame processing method described above.

[0241] It should be noted that the video frames, user information (including but not limited to user image information, user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0242] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0243] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0244] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A method for processing video frames, characterized in that the method comprises: determining the pose of a key part of a target object in each video frame carrying noise, each video frame being a video frame acquired in real time; determining, based on the pose of the key part, the deflection degree value of the key part in each video frame, the magnitude of the deflection degree value being affected by pose changes of the key part and by the noise; generating a control signal based on the deflection degree value of the key part in each video frame, the control signal being used to describe the trend of the deflection degree value changing over time; as the deflection degree value in the control signal begins to increase or decrease over time, determining, by a back gap filter, a first deflection degree change relative to a target inflection point; comparing the first deflection degree change with a back gap filter coefficient or a sum value to obtain a comparison result, the sum value being obtained by summing the back gap filter coefficient and a preset parameter; when the comparison result indicates that the first deflection degree change is less than or equal to the back gap filter coefficient, using the deflection update value at the target inflection point as the deflection update value of the key part in each video frame; and sequentially performing image processing on each video frame based on the deflection update value in each video frame.

2. The method according to claim 1, characterized in that determining the pose of the key part of the target object in each video frame carrying noise comprises: acquiring, in real time, each video frame carrying noise; extracting key points of the target object from each video frame; and determining, based on the key points, the pose of the key part of the target object in each video frame.

3. The method according to claim 1, characterized in that generating the control signal based on the deflection degree value of the key part in each video frame comprises: acquiring the timing identifier of each video frame; and generating the control signal based on the timing identifier and the deflection degree value of the key part in each video frame, the control signal being used to describe the trend of the deflection degree value changing over time.

4. The method according to claim 1, characterized in that the method further comprises: when the comparison result indicates that the first deflection degree change is greater than the back gap filter coefficient and less than or equal to the sum value, performing linear transformation processing on the increased deflection degree value to obtain the deflection update value of the key part in each video frame.

5. The method according to claim 4, characterized in that the method further comprises: when the comparison result indicates that the first deflection degree change is greater than the sum value, obtaining the deflection update value corresponding to the increasing deflection degree value being equal to the sum value; and using the obtained deflection update value as the deflection update value of the key part in each video frame.

6. The method according to claim 1, characterized in that, after using the deflection update value at the target inflection point as the deflection update value of the key part in each video frame, the method further comprises: upon detecting that the deflection degree value in the control signal begins to decrease or increase over time, determining a second deflection degree change based on the decreasing or increasing deflection degree value; and, when the second deflection degree change is less than the back gap filter coefficient, determining the deflection update value of the key part in each video frame based on the deflection update value at a new target inflection point, the new target inflection point being an inflection point formed after the target inflection point.

7. The method according to any one of claims 1 to 6, characterized in that the key part includes the face, and the pose includes the pitch angle, yaw angle and roll angle of the face in each video frame; and determining, based on the pose of the key part, the deflection degree value of the key part in each video frame comprises: normalizing the pitch angle, yaw angle and roll angle of the face in each video frame to obtain a normalized pitch angle, yaw angle and roll angle; determining the product value of the normalized pitch angle, yaw angle and roll angle; and determining the deflection degree value of the face in each video frame based on the product value.

8. The method according to claim 7, characterized in that determining the pose of the key part of the target object in each video frame carrying noise comprises: when the eye contact function of a target application is enabled, determining the pose of the face of the target object in each video frame carrying noise, wherein the target application is an application that plays each video frame.

9. The method according to claim 8, characterized in that sequentially performing image processing on each video frame based on the deflection update value in each video frame comprises: determining the location of the original eye feature points in each video frame; obtaining target eye feature points based on the deflection update value in each video frame; and fusing the target eye feature points into each video frame according to the location, so as to replace the original eye feature points in each video frame.

10. The method according to any one of claims 1 to 6, characterized in that the key part includes the hand; and sequentially performing image processing on each video frame based on the deflection update value in each video frame comprises: when the deflection update value in each video frame meets the special effects addition conditions, acquiring special effects data; and adding the special effects data to each video frame.

11. A video frame processing apparatus, characterized in that the apparatus comprises: a pose determination module, used to determine the pose of a key part of a target object in each video frame carrying noise, each video frame being a video frame acquired in real time; a deflection degree value determination module, used to determine, based on the pose of the key part, the deflection degree value of the key part in each video frame, the magnitude of the deflection degree value being affected by pose changes of the key part and by the noise; a control signal determination module, used to generate a control signal based on the deflection degree value of the key part in each video frame, the control signal being used to describe the trend of the deflection degree value changing over time; a deflection degree value update module, used to: determine, by a back gap filter, a first deflection degree change relative to a target inflection point as the deflection degree value in the control signal begins to increase or decrease over time; compare the first deflection degree change with a back gap filter coefficient or a sum value to obtain a comparison result, the sum value being obtained by summing the back gap filter coefficient and a preset parameter; and, when the comparison result indicates that the first deflection degree change is less than or equal to the back gap filter coefficient, use the deflection update value at the target inflection point as the deflection update value of the key part in each video frame; and an image processing module, used to sequentially perform image processing on each video frame based on the deflection update value in each video frame.

12. The video frame processing apparatus according to claim 11, characterized in that the pose determination module is further used to acquire, in real time, each video frame carrying noise; extract key points of the target object from each video frame; and determine the pose of the key part of the target object in each video frame based on the key points.

13. The video frame processing apparatus according to claim 11, characterized in that the control signal determination module is further configured to acquire the timing identifier of each video frame, and to generate the control signal based on the timing identifier and the deflection degree value of the key part in each video frame, the control signal being used to describe the trend of the deflection degree value changing over time.

14. The video frame processing apparatus according to claim 11, characterized in that the deflection degree value update module is further configured to perform linear transformation processing on the increased deflection degree value when the comparison result indicates that the first deflection degree change is greater than the back gap filter coefficient and less than or equal to the sum value, so as to obtain the deflection update value of the key part in each video frame.

15. The video frame processing apparatus according to claim 14, characterized in that the deflection degree value update module is further configured to, when the comparison result indicates that the first deflection degree change is greater than the sum value, obtain the deflection update value corresponding to the increasing deflection degree value being equal to the sum value, and use the obtained deflection update value as the deflection update value of the key part in each video frame.

16. The video frame processing apparatus according to claim 11, characterized in that the deflection degree value update module is further configured to, upon detecting that the deflection degree value in the control signal begins to decrease or increase over time, determine a second deflection degree change based on the decreasing or increasing deflection degree value; and, when the second deflection degree change is less than the back gap filter coefficient, determine the deflection update value of the key part in each video frame based on the deflection update value at a new target inflection point, the new target inflection point being an inflection point formed after the target inflection point.

17. The video frame processing apparatus according to any one of claims 11 to 16, characterized in that the key part includes the face, and the pose includes the pitch angle, yaw angle and roll angle of the face in each video frame; and the deflection degree value determination module is further used to normalize the pitch angle, yaw angle and roll angle of the face in each video frame to obtain a normalized pitch angle, yaw angle and roll angle; determine the product value of the normalized pitch angle, yaw angle and roll angle; and determine the deflection degree value of the face in each video frame based on the product value.

18. The video frame processing apparatus according to claim 17, characterized in that the pose determination module is further configured to determine the pose of the target object's face in each video frame carrying noise when the eye contact function of a target application is enabled, wherein the target application is an application that plays each video frame.

19. The video frame processing apparatus according to claim 18, characterized in that the image processing module is further configured to determine the location of the original eye feature points in each video frame; obtain target eye feature points based on the deflection update value in each video frame; and fuse the target eye feature points into each video frame according to the location, so as to replace the original eye feature points in each video frame.

20. The video frame processing apparatus according to any one of claims 11 to 16, characterized in that the key part includes the hand; and the image processing module is further configured to acquire special effects data when the deflection update value in each video frame meets the special effects addition conditions, and to add the special effects data to each video frame.

21. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 10.

22. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 10.

23. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 10.