Video desensitization method, apparatus, electronic device, and computer program product

By recognizing gestures and hand contact with objects in video data, using a preset sensitivity judgment model to determine sensitive areas and levels, and then encrypting and irreversibly processing the data, the problem of hand operation information leakage is solved, thus improving personal information security.

CN117133028B · Active · Publication Date: 2026-06-26 · CHINA MOBILE GRP GUANGDONG CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
CHINA MOBILE GRP GUANGDONG CO LTD
Filing Date
2022-05-20
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies have failed to effectively desensitize objects touched or manipulated by the subjects' hands, leading to the leakage of personal information.

Method used

By recognizing gestures and hand contact with objects in video data, a preset sensitivity judgment model is used to determine sensitive areas and levels, and then encryption and irreversible processing are performed to achieve desensitization.

Benefits of technology

Without affecting video recording functionality and efficiency, the information security and privacy of the subjects being filmed are protected to the greatest extent possible.

✦ Generated by Eureka AI based on patent content.

Figure CN117133028B_ABST

Abstract

The application relates to the field of video applications and provides a video desensitization method and apparatus, an electronic device, and a computer program product. The method comprises: acquiring video data; identifying a human hand region based on the video data to obtain N gesture recognition results and M hand-contact object recognition results; determining a sensitive region and a target sensitivity level according to the N gesture recognition results and the M hand-contact object recognition results; and performing desensitization processing on the video data according to the sensitive region and the target sensitivity level. The video desensitization method provided in the embodiments of the application solves the technical problem that objects touched by, or operations performed with, the hands of a filmed person are not desensitized, which easily leaks that person's personal information, and thereby improves the security of the filmed person's personal information.

Description

Technical Field

[0001] This application relates to the field of video application technology, specifically to a video desensitization method, apparatus, electronic device, and computer program product.

Background

[0002] With the rapid development of internet technology, video cameras are ubiquitous, inevitably capturing a significant amount of personal privacy information. To protect personal privacy, numerous technologies now exist capable of analyzing video streams and blurring or altering sensitive areas based on characteristics, achieving desensitization. However, current technologies primarily focus on biometric information and publicly available information closely related to personal data, such as facial recognition and license plate information, while rarely paying attention to the subject's hand contact with objects or hand gestures. As video camera resolution increases, any sensitive actions or readings performed within the camera's field of view could result in the leakage of personal information.

[0003] In the prior art, a method is proposed to send video data in the form of image frames sequentially to a trained first object detection model, record the category of the detected first object, and send the image frames to a trained face detection model or license plate detection model according to their categories to obtain the coordinates of the face or license plate and map the coordinates to their source image frames, thereby performing privacy processing on the face and license plate regions in the image frames.

[0004] The aforementioned prior art has the following disadvantages:

[0005] The solution fails to desensitize objects touched by the subject's hands or the subject's hand movements, which can easily lead to leakage of the subject's personal information.

Summary of the Invention

[0006] This application provides a video desensitization method, apparatus, electronic device, and computer program product to solve the technical problem that failure to desensitize objects touched by the subject's hands or hand operations can easily lead to the leakage of the subject's personal information, thereby improving the security of the subject's personal information.

[0007] In a first aspect, embodiments of this application provide a video desensitization method, including:

[0008] Acquire video data, identify the human hand area based on the video data, and obtain N gesture recognition results and M hand-contact object recognition results;

[0009] Determine the sensitive areas and the target sensitivity level based on the N gesture recognition results and the M hand-contact object recognition results;

[0010] Desensitize the video data based on the sensitive areas and the target sensitivity level.

[0011] In one embodiment, the gesture recognition result includes the gesture recognition action and the gesture confidence score and hand coordinates corresponding to the gesture recognition action; the hand contact object recognition result includes the hand contact object category and the object confidence score and object coordinates corresponding to the hand contact object category.

[0012] Determining the sensitive areas and the target sensitivity level based on the N gesture recognition results and the M hand-contact object recognition results includes:

[0013] The P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand-contact object recognition results with the highest confidence among the M hand-contact object recognition results are sequentially input into several preset sensitivity judgment models for processing, the preset sensitivity judgment models being sorted by the sensitivity level corresponding to each model.

[0014] The sensitive area and the target sensitivity level are determined based on each processing result corresponding to each preset sensitivity judgment model.

[0015] In one embodiment, the P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand-contact object recognition results with the highest confidence among the M hand-contact object recognition results are sequentially input into several preset sensitivity judgment models for processing, where inputting the P gesture recognition results and the Q hand-contact object recognition results into the current preset sensitivity judgment model for processing includes:

[0016] The P gesture recognition actions and P gesture confidence scores from the P gesture recognition results, and the Q hand-contact object categories and Q object confidence scores from the Q hand-contact object recognition results, are imported into the sensitivity judgment expression of the current preset sensitivity judgment model. The sensitivity judgment expression contains preset feature factors, which include multiple preset gesture actions, a preset gesture confidence threshold for each preset gesture action, multiple preset item categories, and a preset category confidence threshold for each preset item category.

[0017] If any gesture recognition action matches one of the preset gesture actions and its confidence score is greater than the preset gesture confidence threshold of that preset gesture action, and any hand-contact object category matches one of the preset item categories and its confidence score is greater than the preset category confidence threshold of that preset item category, then the P gesture recognition results and Q hand-contact object recognition results are determined to have successfully matched the current preset sensitivity judgment model, and they are no longer input into the next preset sensitivity judgment model after the current one.

[0018] In one embodiment, determining the sensitive region and the target sensitivity level based on each processing result corresponding to each preset sensitivity judgment model includes:

[0019] If P gesture recognition results and Q hand contact object recognition results are successfully matched with the current preset sensitivity judgment model, then the area corresponding to the hand coordinates and the area corresponding to the object coordinates are determined as sensitive areas, and the sensitivity level corresponding to the current preset sensitivity judgment model is determined as the target sensitivity level.

[0020] If the P gesture recognition results and Q hand-touching object recognition results fail to match the current preset sensitivity judgment model, then the P gesture recognition results and Q hand-touching object recognition results will be input into the next preset sensitivity judgment model for processing.

[0021] In one embodiment, desensitizing the video data based on the sensitive region and the target sensitivity level includes:

[0022] The video data to be desensitized is extracted from the sensitive areas of the video data;

[0023] Asymmetric encryption is applied to the video data to be desensitized to form encrypted sensitive information, and the target sensitivity level is marked in the encrypted sensitive information;

[0024] The sensitive areas in the video data are removed using an irreversible algorithm to obtain desensitized video data.

[0025] In one embodiment, applying asymmetric encryption to the video data to be desensitized to form encrypted sensitive information includes:

[0026] A public/private key pair is generated based on the video timestamp corresponding to the video data to be desensitized and the preset feature factors; the private key is used to decrypt the encrypted sensitive information.

[0027] The video data to be desensitized is encrypted with the public key and saved, yielding the encrypted sensitive information.
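The application does not say how the key pair is derived from the timestamp and the preset feature factors. The sketch below is one hypothetical reading, for illustration only: it hashes both inputs with SHA-256, uses the digest to seed a PRNG, and draws textbook RSA primes from that stream, so the same inputs always reproduce the same key pair. Textbook RSA without padding, and keys seeded from guessable inputs such as timestamps, are not secure; a real system would rely on a vetted cryptography library and hybrid encryption. `derive_key_pair`, `encrypt`, and `decrypt` are hypothetical names.

```python
import hashlib
import random

# Hypothetical, illustration-only key derivation: textbook RSA (no padding) is NOT secure,
# and a key seeded from a timestamp is guessable. Use a vetted crypto library in practice.


def _is_probable_prime(n: int, rng: random.Random, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
    if n < 2:
        return False
    for p in small_primes:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witness found: n is composite
    return True


def _random_prime(rng: random.Random, bits: int) -> int:
    while True:
        # Set the top bit (exact size) and the lowest bit (odd) before testing.
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if _is_probable_prime(candidate, rng):
            return candidate


def derive_key_pair(timestamp: str, feature_factors: str, bits: int = 512):
    """Deterministically derive ((n, e), (n, d)) from the timestamp and feature factors."""
    seed = hashlib.sha256(f"{timestamp}|{feature_factors}".encode("utf-8")).digest()
    rng = random.Random(seed)  # same inputs -> same PRNG stream -> same key pair
    e = 65537
    while True:
        p, q = _random_prime(rng, bits // 2), _random_prime(rng, bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and phi % e != 0:  # e is prime, so this makes e invertible mod phi
            break
    n = p * q
    return (n, e), (n, pow(e, -1, phi))  # modular inverse needs Python 3.8+


def encrypt(public_key, data: bytes):
    n, e = public_key
    size = (n.bit_length() - 1) // 8  # chunk size that keeps every chunk integer below n
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Store each chunk's length so leading zero bytes survive the round trip.
    return [(len(c), pow(int.from_bytes(c, "big"), e, n)) for c in chunks]


def decrypt(private_key, blocks) -> bytes:
    n, d = private_key
    return b"".join(pow(c, d, n).to_bytes(length, "big") for length, c in blocks)


public_key, private_key = derive_key_pair("2022-05-20 10:15:30", "finger swipe>60%;smartphone>70%")
region = bytes([0, 0, 12, 200, 37, 255]) * 30  # stand-in for a sensitive region's pixel bytes
assert decrypt(private_key, encrypt(public_key, region)) == region
```

Because the derivation is deterministic, the same timestamp and feature factors regenerate the private key; that is what lets a key be tied to a specific clip, and also why the inputs must be treated as secret.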

[0028] In one embodiment, identifying a human hand region based on video data includes:

[0029] Human body coordinates are detected based on the video data, and each human body is segmented according to its coordinates to obtain K human body images to be detected.

[0030] A human body recognition classifier identifies the human body parts in each human body image to be detected and obtains the coordinates of each human body part; the human body parts include the head and limbs.

[0031] The human hand region is determined based on the hand coordinates among the human body part coordinates.

[0032] A gesture recognition classifier is used to recognize gestures in the human hand area, and an object recognition classifier is used to recognize objects touched by the human hand area.

[0033] In a second aspect, embodiments of this application provide a video desensitization device, comprising:

[0034] The hand recognition module is used to acquire video data, identify the human hand area based on the video data, and obtain N gesture recognition results and M hand contact object recognition results.

[0035] The sensitive information determination module is used to determine the sensitive areas and the target sensitivity level based on the N gesture recognition results and the M hand-contact object recognition results;

[0036] The video desensitization module is used to desensitize video data based on sensitive areas and target sensitivity levels.

[0037] In a third aspect, embodiments of this application provide an electronic device, including a processor and a memory storing a computer program, wherein the processor executes the program to implement the steps of the video desensitization method described in the first aspect.

[0038] In a fourth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the steps of the video desensitization method described in the first aspect.

[0039] The video desensitization method, apparatus, electronic device, and computer program product provided in this application acquire video data, identify the human hand area based on the video data to obtain N gesture recognition results and M hand-contact object recognition results, determine sensitive areas and a target sensitivity level based on these results, and desensitize the video data accordingly. This solves the technical problem that failure to desensitize the objects touched by, or the hand operations of, the subject can easily lead to leakage of the subject's personal information; it improves the security of the subject's personal information and protects the subject's information security and privacy to the greatest extent without affecting the video shooting function or efficiency.

Brief Description of the Drawings

[0040] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a first schematic flowchart of the video desensitization method provided in the embodiments of this application;

[0042] Figure 2 is a second schematic flowchart of the video desensitization method provided in the embodiments of this application;

[0043] Figure 3 is a third schematic flowchart of the video desensitization method provided in the embodiments of this application;

[0044] Figure 4 is a schematic diagram of the video desensitization device provided in the embodiments of this application;

[0045] Figure 5 is a schematic structural diagram of the electronic device provided in the embodiments of this application.

Detailed Description

[0046] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0047] Figure 1 is a first schematic flowchart of the video desensitization method provided in this application. As shown in Figure 1, the video desensitization method provided in this application may include:

[0048] Step 101: Acquire video data, identify the human hand area based on the video data, and obtain N gesture recognition results and M hand contact object recognition results.

[0049] In this embodiment of the application, video data can be acquired in real time through imaging devices such as cameras and webcams, or obtained by directly importing saved video data. In practice, the acquisition method is chosen according to the actual application scenario; this is not limited here.

[0050] If a subject performs a sensitive operation or accesses sensitive information within the imaging device's field of view, such as entering a password on an electronic device or touching an electronic device or paper document to view confidential business information, then the captured human hand area can easily leak this information, exposing personal or private information. Therefore, whereas previous technologies focused solely on the security of biometric information and publicly available information closely related to personal information, recognition of the human hand area needs to be added so that sensitive information within this area can be desensitized and threats to personal information security prevented. Specifically, whether a sensitive operation was performed is reflected in the gesture recognition results, and whether sensitive information was accessed is reflected in the hand-contact object recognition results.

[0051] Step 102: Determine the sensitive areas and target sensitivity levels based on the N gesture recognition results and the M hand contact object recognition results.

[0052] In this embodiment, because the human body is highly flexible, the posture of the hand changes over time; and because different objects may look similar, a single hand-contact object may be recognized as several possible categories. Recognizing the video data therefore yields multiple gesture recognition results and multiple hand-contact object recognition results, where N and M are both positive integers. By jointly analyzing the N gesture recognition results and the M hand-contact object recognition results, the sensitive areas and the corresponding target sensitivity level can be determined.

[0053] Step 103: Desensitize the video data according to the sensitive areas and the target sensitivity level.

[0054] In this embodiment, after identifying sensitive areas, desensitization can be achieved through a combination of reversible and irreversible processing. Reversible processing typically involves encrypting the data using cryptographic techniques, which can then be restored using corresponding decryption algorithms. Irreversible processing generally involves data replacement, such as Gaussian filtering, mean filtering, and binarization, or data deletion, such as mosaic, color block overlay processing, blurring, and erosion processing. Adding a target sensitivity level as a tag to the video data in the sensitive areas facilitates retrieval when sensitive data needs to be restored.
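Mosaic is one of the irreversible operations listed above. A minimal sketch on a grayscale frame stored as a list of pixel rows, assuming a half-open box (x0, y0, x1, y1): each block-by-block tile inside the box is replaced by its integer mean, which discards the per-pixel detail inside every tile. `mosaic` is a hypothetical helper, not taken from the source.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds


def mosaic(frame: Frame, box: Box, block: int = 8) -> Frame:
    """Return a copy of `frame` whose `box` region is pixelated.

    Each block x block tile inside the box is replaced by its integer mean,
    so per-pixel detail inside each tile is discarded.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in frame]  # leave the input frame untouched
    for ty in range(y0, y1, block):
        for tx in range(x0, x1, block):
            ys = range(ty, min(ty + block, y1))
            xs = range(tx, min(tx + block, x1))
            tile = [frame[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


frame = [[x + 10 * y for x in range(6)] for y in range(4)]
masked = mosaic(frame, (0, 0, 4, 4), block=2)
print(masked[0][:4])  # [5, 5, 7, 7]
```

Larger `block` values remove more detail; pixels outside the box are copied unchanged.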

[0055] The following beneficial effects can be seen from the above embodiments:

[0056] Video data is acquired, the human hand area is identified from it to obtain N gesture recognition results and M hand-contact object recognition results, the sensitive areas and the target sensitivity level are determined from these results, and the video data is then desensitized accordingly. This solves the technical problem that the subject's personal information is easily leaked when the subject's hand-contact objects or hand operations are not desensitized, improves the security of the subject's personal information, and protects the subject's information security and privacy to the greatest extent without affecting the video shooting function or efficiency.

[0057] To facilitate understanding, an example of the video desensitization method is provided below, in which several preset sensitivity judgment models are used to determine the sensitive areas and the target sensitivity level.

[0058] Figure 2 is a second schematic flowchart of the video desensitization method provided in the embodiments of this application. As shown in Figure 2, the video desensitization method provided in this application may include:

[0059] Step 201: Input the P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand contact object recognition results with the highest confidence among the M hand contact object recognition results into several preset sensitivity judgment models for processing.

[0060] In this embodiment, the gesture recognition result includes the gesture recognition action and the gesture confidence score and hand coordinates corresponding to that action; the hand-contact object recognition result includes the hand-contact object category and the object confidence score and object coordinates corresponding to that category. It is understood that gesture recognition actions can include, but are not limited to, finger swiping, finger tapping, and fist clenching; hand-contact object categories can include, but are not limited to, smartphones, rectangular paper, and rectangular mirrors; and a gesture or object confidence score can be any value between 0% and 100%. Furthermore, the hand coordinates and object coordinates can be coordinates in a two-dimensional coordinate system whose plane is the video frame, and serve to locate the human hand and the hand-contact object within the frame. For example, the i-th gesture recognition result can be expressed as gesture recognition result i = [finger swipe, 90%, $(X_{1i}, Y_{1i})$], and the j-th hand-contact object recognition result as hand-contact object recognition result j = [smartphone, 80%, $(X_{2j}, Y_{2j})$], where $i \le N$ and $j \le M$.

[0061] Before the P gesture recognition results are selected, all N gesture recognition results can be sorted by gesture confidence to form a gesture result sequence, from which the P results with the highest confidence are taken. Similarly, before the Q hand-contact object recognition results are selected, all M hand-contact object recognition results can be sorted by object confidence to form an object result sequence, from which the Q results with the highest object confidence are taken. Keeping only the highest-confidence results reduces the error rate. In this embodiment, P and Q can both be set to 3, or they can take different values; in practice, their values are determined by the specific application scenario, and this is not limited here.
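The result structures of [0060] and the top-P/top-Q selection of [0061] can be sketched as below. Confidence scores are stored as fractions (0.90 for 90%), and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GestureResult:
    action: str                  # gesture recognition action, e.g. "finger swipe"
    confidence: float            # gesture confidence score as a fraction (0.90 == 90%)
    hand_xy: Tuple[int, int]     # hand coordinates (X_1i, Y_1i) in the frame


@dataclass
class ObjectResult:
    category: str                # hand-contact object category, e.g. "smartphone"
    confidence: float            # object confidence score as a fraction
    object_xy: Tuple[int, int]   # object coordinates (X_2j, Y_2j) in the frame


def top_k(results: List, k: int) -> List:
    """Sort results by confidence, highest first, and keep the top k (P or Q)."""
    return sorted(results, key=lambda r: r.confidence, reverse=True)[:k]


gestures = [GestureResult("clenched fist", 0.40, (30, 40)),
            GestureResult("finger swipe", 0.90, (10, 20)),
            GestureResult("finger tap", 0.50, (12, 22)),
            GestureResult("finger swipe", 0.20, (11, 21))]
objects = [ObjectResult("rectangular paper", 0.40, (50, 60)),
           ObjectResult("smartphone", 0.80, (48, 58))]
print([g.action for g in top_k(gestures, 3)])  # ['finger swipe', 'finger tap', 'clenched fist']
print([o.category for o in top_k(objects, 3)])  # ['smartphone', 'rectangular paper']
```

A full sort is simple and fine for small N and M; for large result lists, `heapq.nlargest(k, results, key=...)` returns the same top k without sorting everything.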

[0062] In this embodiment, several preset sensitivity judgment models are defined in advance. Each model has a corresponding sensitivity level, which also determines its execution priority: the higher the sensitivity level, the higher the execution priority, because video content with a high sensitivity level should be screened first. In practice, the execution priority of the preset sensitivity judgment models can also be customized by the user; this is not limited here. Sorting the preset sensitivity judgment models by their sensitivity levels enables hierarchical management of the models and improves judgment efficiency.

[0063] Specifically, each preset sensitivity judgment model has its own sensitivity judgment expression. The expressions differ between models, but all of them contain preset feature factors, which include, but are not limited to, multiple preset gesture actions, a preset gesture confidence threshold for each preset gesture action, multiple preset item categories, and a preset category confidence threshold for each preset item category. For example, a sensitivity judgment expression can be written as ((finger swipe > 60% OR finger tap > 60%) AND (tablet > 70% OR smartphone > 70%)), where OR denotes logical disjunction and AND denotes logical conjunction. Taking any one preset sensitivity judgment model as an example, the P gesture recognition results and Q hand-contact object recognition results are input into the current model for processing: the P gesture recognition actions and P gesture confidence scores, together with the Q hand-contact object categories and Q object confidence scores, are imported into the sensitivity judgment expression of the current model.
If any gesture recognition action matches one of the preset gesture actions and its gesture confidence score is greater than the preset gesture confidence threshold of that preset gesture action, and any hand-contact object category matches one of the preset item categories and its object confidence score is greater than the preset category confidence threshold of that preset item category, then the P gesture recognition results and Q hand-contact object recognition results are determined to have successfully matched the current preset sensitivity judgment model, and they are no longer input into the next preset sensitivity judgment model.
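For expressions of the two-clause form used in the example, the evaluation described above can be sketched as follows. This assumes every expression is an AND of one OR-group over gesture actions and one OR-group over item categories; the source says the expressions differ between models and gives no general grammar. Confidences are fractions (0.90 for 90%), thresholds use a strict "greater than" as in the text, and `matches` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def matches(gestures: List[Tuple[str, float]],
            objects: List[Tuple[str, float]],
            gesture_thresholds: Dict[str, float],
            category_thresholds: Dict[str, float]) -> bool:
    """Evaluate (gesture_1 > t_1 OR gesture_2 > t_2 ...) AND (category_1 > u_1 OR ...)."""
    # Gesture clause: at least one recognized action is a preset action above its threshold.
    gesture_ok = any(action in gesture_thresholds and conf > gesture_thresholds[action]
                     for action, conf in gestures)
    # Object clause: at least one recognized category is a preset category above its threshold.
    object_ok = any(category in category_thresholds and conf > category_thresholds[category]
                    for category, conf in objects)
    return gesture_ok and object_ok


# The example expression from the text:
# ((finger swipe > 60% OR finger tap > 60%) AND (tablet > 70% OR smartphone > 70%))
gesture_thresholds = {"finger swipe": 0.60, "finger tap": 0.60}
category_thresholds = {"tablet": 0.70, "smartphone": 0.70}
gestures = [("finger swipe", 0.90), ("finger tap", 0.50), ("clenched fist", 0.40)]
objects = [("smartphone", 0.80), ("rectangular paper", 0.40), ("rectangular mirror", 0.30)]
print(matches(gestures, objects, gesture_thresholds, category_thresholds))  # True
```

With the P = Q = 3 inputs from the worked example, finger swipe (90% > 60%) satisfies the gesture clause and smartphone (80% > 70%) satisfies the object clause, so the expression evaluates to True.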

[0064] Assume the current preset sensitivity judgment model's expression is ((finger swipe > 60% OR finger tap > 60%) AND (tablet > 70% OR smartphone > 70%)), its sensitivity level is level 1, and P and Q are both 3. The three gesture recognition results are gesture recognition result 1 = [finger swipe, 90%, $(X_{11}, Y_{11})$], gesture recognition result 2 = [finger tap, 50%, $(X_{12}, Y_{12})$], and gesture recognition result 3 = [clenched fist, 40%, $(X_{13}, Y_{13})$]. The three hand-contact object recognition results are hand-contact object recognition result 1 = [smartphone, 80%, $(X_{21}, Y_{21})$], hand-contact object recognition result 2 = [rectangular paper, 40%, $(X_{22}, Y_{22})$], and hand-contact object recognition result 3 = [rectangular mirror, 30%, $(X_{23}, Y_{23})$]. Finger swipe and finger tap both match preset gesture actions in the expression, and the gesture confidence of finger swipe exceeds its preset gesture confidence threshold (90% > 60%), so the gesture recognition results meet the matching requirement even though finger tap does not (50% < 60%). Likewise, smartphone matches a preset item category in the expression and its object confidence exceeds the preset category confidence threshold (80% > 70%); although rectangular paper and rectangular mirror do not match any preset item category, the hand-contact object recognition results still meet the matching requirement.
Therefore, the P gesture recognition results and Q hand-contact object recognition results are determined to have successfully matched the current preset sensitivity judgment model. The output of the sensitivity judgment expression can then be set to TRUE or another form; this is not limited here.

[0065] Step 202: Determine the sensitive area and target sensitivity level based on each processing result corresponding to each preset sensitivity judgment model.

[0066] If the P gesture recognition results and Q hand-contact object recognition results successfully match the current preset sensitivity judgment model, the regions at the hand coordinates $(X_{1i}, Y_{1i})$ and the regions at the object coordinates $(X_{2j}, Y_{2j})$ are determined as sensitive regions, and the sensitivity level corresponding to the current preset sensitivity judgment model is determined as the target sensitivity level.

[0067] If the P gesture recognition results and Q hand-contact object recognition results fail to match the current preset sensitivity judgment model, they are input into the next preset sensitivity judgment model for processing, and so on until all preset sensitivity judgment models have been tried.
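The cascade through the ordered models in [0062] to [0067] might be sketched as follows. Assumptions, none fixed by the source: each model is a (level, gesture thresholds, category thresholds) tuple, a larger number denotes a higher sensitivity level, and the sensitive regions are taken as the coordinates of all P and Q input results. `cascade` and `_hit` are hypothetical names, and the level-2 model in the usage is invented for illustration.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float, Tuple[int, int]]  # (label, confidence, (x, y))
Model = Tuple[int, Dict[str, float], Dict[str, float]]  # (level, gesture thr., category thr.)


def _hit(results: List[Result], thresholds: Dict[str, float]) -> bool:
    # True if any label is a preset label whose confidence exceeds its threshold.
    return any(label in thresholds and conf > thresholds[label] for label, conf, _ in results)


def cascade(gestures: List[Result], objects: List[Result], models: List[Model]):
    """Try models from the highest sensitivity level down; stop at the first match.

    Returns (sensitive_points, target_level), or None if no model matches.
    """
    for level, gesture_thr, category_thr in sorted(models, key=lambda m: m[0], reverse=True):
        if _hit(gestures, gesture_thr) and _hit(objects, category_thr):
            points = [xy for _, _, xy in gestures] + [xy for _, _, xy in objects]
            return points, level
    return None  # every model was tried without a match


models = [
    (1, {"finger swipe": 0.60, "finger tap": 0.60}, {"tablet": 0.70, "smartphone": 0.70}),
    (2, {"finger tap": 0.90}, {"bank card": 0.50}),  # hypothetical stricter model
]
gestures = [("finger swipe", 0.90, (10, 20)), ("finger tap", 0.50, (12, 22))]
objects = [("smartphone", 0.80, (40, 50))]
print(cascade(gestures, objects, models))  # ([(10, 20), (12, 22), (40, 50)], 1)
```

Here the level-2 model is checked first and fails (finger tap is only 50%), so evaluation falls through to level 1, which matches and ends the cascade.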

[0068] The following beneficial effects can be seen from the above embodiments:

[0069] By sequentially inputting the P gesture recognition results with the highest confidence among N gesture recognition results and the Q hand-to-object recognition results with the highest confidence among M hand-to-object recognition results into several preset sensitivity judgment models for processing, the sensitive area and target sensitivity level are determined according to each processing result corresponding to each preset sensitivity judgment model. This improves the efficiency of sensitive area determination while ensuring the accuracy of sensitive area recognition, thereby effectively desensitizing sensitive areas and protecting the information security of the subject.

[0070] For ease of understanding, an example of a video desensitization method is provided below. In practical applications, the human body and its parts are detected based on video data, and then the human hand area is identified. This enables accurate gesture recognition and hand-touching object recognition of the human hand area. In addition, the video data to be desensitized corresponding to the sensitive area is extracted from the video data and subjected to asymmetric encryption to improve information security.

[0071] Figure 3 is a third schematic flowchart of the video desensitization method provided in this application. As shown in Figure 3, the video desensitization method provided in this application may include:

[0072] Step 301: Detect human body coordinates based on video data, and segment each human body according to its coordinates to obtain K human body images to be detected.

[0073] In this embodiment, human bodies in the video are detected using a method based on background modeling and machine learning, the bounding box of each human body is determined, and the coordinate position of each human body is marked. This method can specifically be a pedestrian detection algorithm, or another suitable method can be selected according to the actual application; this is not limited here.

[0074] Each human body is segmented according to its coordinate position to obtain a human body image to be detected. The human body images to be detected can be pictures or image regions; there are K of them in total, one per detected human body, where K is a positive integer.
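Assuming the pedestrian detector has already produced one bounding box per person, the segmentation into K human body images can be sketched as below. Frames are lists of pixel rows, boxes are half-open (x0, y0, x1, y1) and are clamped to the frame, and each crop keeps its origin so part coordinates found later can be mapped back to the frame. `crop_bodies` is a hypothetical helper.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds


def crop_bodies(frame: Frame, boxes: List[Box]) -> List[Tuple[Tuple[int, int], Frame]]:
    """Return one (origin, crop) pair per detected body, so K == len(boxes)."""
    height, width = len(frame), len(frame[0])
    crops = []
    for x0, y0, x1, y1 in boxes:
        # Clamp the box so detections that extend past the frame border stay valid.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        # Keep the crop origin so part coordinates can be mapped back to the frame.
        crops.append(((x0, y0), [row[x0:x1] for row in frame[y0:y1]]))
    return crops


frame = [[10 * y + x for x in range(5)] for y in range(4)]
bodies = crop_bodies(frame, [(1, 1, 3, 3), (3, -1, 9, 2)])
print(len(bodies))  # 2
```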

[0075] Step 302: Use a human body recognition classifier to identify human body parts in each human body image to obtain the coordinates of each human body part.

[0076] In this embodiment of the application, the human body recognition classifier can be obtained by training a deep neural network on human body sample videos or image sets. Applying this classifier to each segmented human body image to be detected identifies the key points of the human skeleton, from which the coordinate position of each human body part, i.e., the human body part coordinates, is determined. The human body parts include, but are not limited to, the head and limbs.

[0077] Step 303: Determine the human hand region based on the human hand coordinates in the human body part coordinates corresponding to each human body part. Perform gesture recognition on the human hand region using a gesture recognition classifier, and perform hand contact object recognition on the human hand region using an object recognition classifier.

[0078] The region corresponding to the human hand coordinates among the human body part coordinates is selected as the human hand region. The gesture recognition classifier may be a classifier obtained by training a deep neural network on a set of gesture action videos or images; it performs gesture recognition on the human hand region to obtain N gesture recognition results. The object recognition classifier may be a classifier obtained by training a deep neural network on a set of common object images; it performs hand contact object recognition on the human hand region to obtain M hand contact object recognition results.
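Step 303 can be sketched as selecting the hand regions from the part coordinates and running both classifiers on each region, collecting N gesture results and M hand contact object results. The result fields follow the description of the recognition results in this application; the classifier signatures, the `*_hand` naming convention, and the stub classifiers in the usage are hypothetical stand-ins for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class GestureResult:
    action: str        # gesture recognition action
    confidence: float  # gesture confidence score
    hand_box: Box      # hand coordinates


@dataclass
class ContactObjectResult:
    category: str      # hand contact object category
    confidence: float  # object confidence score
    object_box: Box    # object coordinates


# Hypothetical classifier signatures: each takes a hand region and returns zero
# or more results. Real classifiers would also receive the pixel data.
GestureClassifier = Callable[[Box], List[GestureResult]]
ObjectClassifier = Callable[[Box], List[ContactObjectResult]]


def recognize_hands(parts: Dict[str, Box],
                    gesture_clf: GestureClassifier,
                    object_clf: ObjectClassifier):
    """Select hand regions and run both classifiers on each.

    Returns (gestures, objects); their lengths are N and M respectively.
    """
    hand_regions = [box for name, box in parts.items() if name.endswith("hand")]
    gestures: List[GestureResult] = []
    objects: List[ContactObjectResult] = []
    for region in hand_regions:
        gestures.extend(gesture_clf(region))
        objects.extend(object_clf(region))
    return gestures, objects


# Usage with stub classifiers standing in for the trained models.
parts = {"head": (40, 10, 55, 25), "left_hand": (0, 0, 10, 10), "right_hand": (20, 20, 30, 30)}
g, o = recognize_hands(
    parts,
    lambda r: [GestureResult("pinch", 0.9, r)],
    lambda r: [ContactObjectResult("bank_card", 0.8, r), ContactObjectResult("pen", 0.3, r)],
)
print(len(g), len(o))  # N = 2, M = 4
```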

[0079] Step 304: Determine the sensitive areas and target sensitivity levels based on the N gesture recognition results and the M hand contact object recognition results.

[0080] In this embodiment of the application, the specific content of step 304 is the same as that of steps 201 to 202, and will not be repeated here.

[0081] Step 305: Desensitize the video data according to the sensitive areas and the target sensitivity level.

[0082] Specifically, the video data to be desensitized that corresponds to the sensitive areas is extracted from the video data and then asymmetrically encrypted. A paired public key and private key can be generated based on the video timestamp and the preset feature factors; the public key is used to encrypt and save the video data to be desensitized, forming encrypted sensitive information, and the target sensitivity level is marked in the encrypted sensitive information. An irreversible algorithm is then used to remove the sensitive areas from the video data: one or a combination of blurring, mosaic, alteration, and graphic occlusion can be applied, yielding desensitized video data. The desensitized video data, the private key, and the encrypted sensitive information are stored and managed together in a video management system. In special scenarios, such as evidence collection by law enforcement agencies, a user can log into the video management system with the highest privileges, decrypt the encrypted sensitive information using the private key, and overlay the decrypted video data onto the desensitized video data to restore the original video data.
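The restoration step at the end of the paragraph above, overlaying the decrypted video data onto the desensitized video data, can be sketched for a single frame as pasting the decrypted region back at its original position. This assumes that the region's top-left position is stored alongside the encrypted sensitive information (the description does not say how positions are recorded) and that decryption has already produced the region's pixels; `overlay` is a hypothetical name.

```python
from typing import List, Tuple

Frame = List[List[int]]


def overlay(desensitized: Frame, region: Frame, origin: Tuple[int, int]) -> Frame:
    """Paste a decrypted region back onto a copy of a desensitized frame.

    origin is the (x, y) of the region's top-left corner; it is assumed to be
    stored together with the encrypted sensitive information.
    """
    x0, y0 = origin
    restored = [row[:] for row in desensitized]
    for dy, row in enumerate(region):
        for dx, value in enumerate(row):
            restored[y0 + dy][x0 + dx] = value
    return restored


masked = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(overlay(masked, [[5, 6], [7, 8]], (1, 1)))  # [[0, 0, 0], [0, 5, 6], [0, 7, 8]]
```

Working on a copy leaves the stored desensitized video untouched, so restoration for evidence collection does not alter the archived version.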

[0083] The following beneficial effects can be seen from the above embodiments:

[0084] Human body coordinates are detected based on the video data, and each human body is segmented according to its coordinates to obtain K human body images to be detected. A human body recognition classifier identifies the human body parts in each image to obtain the coordinates of each human body part. The human hand region is determined from the human hand coordinates among these human body part coordinates. A gesture recognition classifier performs gesture recognition on the human hand region, and an object recognition classifier performs hand contact object recognition on the human hand region. The sensitive areas and the target sensitivity level are determined based on the N gesture recognition results and the M hand contact object recognition results, and the video data is then desensitized according to them. This solves the technical problem that, when the objects touched by or the hand operations of the photographed person are not desensitized, that person's personal information is easily leaked. It thereby improves the personal information security of the photographed person and protects their information security and privacy to the greatest extent possible without affecting the video shooting function or efficiency.

[0085] The video desensitization apparatus provided in the embodiments of this application is described below. The video desensitization apparatus described below and the video desensitization method described above may be cross-referenced with each other.

[0086] Figure 4 is a schematic diagram of the video desensitization device provided in an embodiment of this application. As shown in Figure 4, the video desensitization device provided in this application may include:

[0087] The hand recognition module 410 is used to acquire video data, recognize the human hand area based on the video data, and obtain N gesture recognition results and M hand contact object recognition results.

[0088] The sensitive information determination module 420 is used to determine the sensitive area and the target sensitivity level based on N gesture recognition results and M hand contact object recognition results;

[0089] The video desensitization module 430 is used to desensitize video data based on sensitive areas and target sensitivity levels.
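The three-module structure of the device can be sketched as a small class that calls the modules in sequence. The module interfaces here are hypothetical callables standing in for modules 410, 420 and 430, and returning the video unchanged when no sensitivity level is determined is an assumption, not something the description specifies.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical module interfaces mirroring modules 410, 420 and 430.
HandRecognition = Callable[[Any], Tuple[List[Any], List[Any]]]
SensitiveInfoDetermination = Callable[[List[Any], List[Any]], Tuple[List[Any], Optional[int]]]
VideoDesensitization = Callable[[Any, List[Any], int], Any]


class VideoDesensitizationDevice:
    """Wires the three modules together in the order described above."""

    def __init__(self, hand_module: HandRecognition,
                 sensitive_module: SensitiveInfoDetermination,
                 desensitize_module: VideoDesensitization):
        self.hand_module = hand_module                # module 410
        self.sensitive_module = sensitive_module      # module 420
        self.desensitize_module = desensitize_module  # module 430

    def process(self, video: Any) -> Any:
        gestures, objects = self.hand_module(video)              # N and M results
        areas, level = self.sensitive_module(gestures, objects)  # sensitive areas, target level
        if level is None:
            # Assumption: no sensitivity judgment model matched, so nothing is masked.
            return video
        return self.desensitize_module(video, areas, level)


device = VideoDesensitizationDevice(
    lambda v: (["g"], ["o"]),
    lambda g, o: (["area"], 2),
    lambda v, a, lvl: f"{v}-masked-L{lvl}",
)
print(device.process("clip"))  # clip-masked-L2
```

Injecting the modules as callables keeps each one replaceable, which matches the statement that the modules may be physically separate or combined.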

[0090] The video desensitization device provided in this application acquires video data, identifies the human hand area based on the video data, obtains N gesture recognition results and M hand-touching object recognition results, determines sensitive areas and target sensitivity levels based on the N gesture recognition results and M hand-touching object recognition results, and performs desensitization processing on the video data based on the sensitive areas and target sensitivity levels. This solves the technical problem that failure to desensitize the objects touched or the hand operations of the subject can easily lead to the leakage of the subject's personal information, improves the personal information security of the subject, and maximizes the protection of the subject's information security and privacy without affecting the video shooting function and efficiency.

[0091] In one embodiment, the gesture recognition result includes the gesture recognition action and the gesture confidence score and hand coordinates corresponding to the gesture recognition action; the hand contact object recognition result includes the hand contact object category and the object confidence score and object coordinates corresponding to the hand contact object category.

[0092] The sensitive information determination module is specifically used for:

[0093] The P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand contact object recognition results with the highest confidence among the M hand contact object recognition results are sequentially input into several preset sensitivity judgment models for processing. The several preset sensitivity judgment models are sorted according to the sensitivity level corresponding to each preset sensitivity judgment model.

[0094] The sensitive area and the target sensitivity level are determined based on each processing result corresponding to each preset sensitivity judgment model.
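The two preparatory steps above, keeping only the P (or Q) most confident results and ordering the preset sensitivity judgment models by sensitivity level, can be sketched with the standard library. The numeric level scale and the most-sensitive-first ordering are assumptions: this section only says the models are sorted by sensitivity level, not in which direction. `top_k` works equally on hand contact object results, since it only reads a `confidence` field.

```python
import heapq
from typing import List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]


class GestureResult(NamedTuple):
    action: str
    confidence: float
    hand_box: Box


class SensitivityModel(NamedTuple):
    name: str
    level: int  # hypothetical scale: a larger number means more sensitive


def top_k(results, k: int):
    """Keep the k results with the highest confidence, highest first."""
    return heapq.nlargest(k, results, key=lambda r: r.confidence)


def order_models(models: Sequence[SensitivityModel]) -> List[SensitivityModel]:
    """Assumed ordering: the most sensitive model is consulted first."""
    return sorted(models, key=lambda m: m.level, reverse=True)


gestures = [GestureResult("wave", 0.2, (0, 0, 5, 5)),
            GestureResult("pinch", 0.9, (1, 1, 6, 6)),
            GestureResult("point", 0.6, (2, 2, 7, 7))]
print([g.action for g in top_k(gestures, 2)])   # ['pinch', 'point']
models = [SensitivityModel("low", 1), SensitivityModel("high", 3), SensitivityModel("mid", 2)]
print([m.name for m in order_models(models)])   # ['high', 'mid', 'low']
```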

[0095] In one embodiment, the sensitive information determination module is specifically used for:

[0096] P gesture recognition actions and P gesture confidence scores from P gesture recognition results, and Q hand contact item categories and Q item confidence scores from Q hand contact item recognition results are imported into the sensitivity judgment expression of the current preset sensitivity judgment model. The sensitivity judgment expression contains preset feature factors, which include multiple preset gesture actions, preset gesture confidence thresholds corresponding to each preset gesture action, multiple preset item categories, and preset category confidence thresholds corresponding to each preset item category.

[0097] If any gesture recognition action matches one of the preset gesture actions and the confidence level of the current gesture recognition action is greater than the preset gesture confidence threshold corresponding to the current preset gesture action, and if any hand-touched item category matches one of the preset item categories and the confidence level of the current hand-touched item category is greater than the preset category confidence threshold corresponding to the current preset item category, then it is determined that P gesture recognition results and Q hand-touched item recognition results are successfully paired with the current preset sensitivity judgment model, and the input of P gesture recognition results and Q hand-touched item recognition results into the next preset sensitivity judgment model of the current preset sensitivity judgment model is stopped.
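The sensitivity judgment expression described above reduces to two existence checks joined by a logical AND. Below is a sketch of that check for one model; the field names and the example preset actions, categories, and thresholds are hypothetical, and "greater than" is implemented as a strict comparison, as the text states.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class SensitivityModel:
    level: int
    gesture_thresholds: Dict[str, float]   # preset gesture action -> preset gesture confidence threshold
    category_thresholds: Dict[str, float]  # preset item category -> preset category confidence threshold


def matches(model: SensitivityModel,
            gestures: Sequence[Tuple[str, float]],
            items: Sequence[Tuple[str, float]]) -> bool:
    """Evaluate the sensitivity judgment expression for one model.

    True only if at least one gesture AND at least one hand contact item each
    match a preset entry with a confidence strictly above its threshold.
    """
    gesture_hit = any(
        action in model.gesture_thresholds and conf > model.gesture_thresholds[action]
        for action, conf in gestures
    )
    item_hit = any(
        category in model.category_thresholds and conf > model.category_thresholds[category]
        for category, conf in items
    )
    return gesture_hit and item_hit


# Hypothetical "payment" model with illustrative presets.
payment = SensitivityModel(3, {"pinch": 0.7, "swipe": 0.8}, {"bank_card": 0.6, "phone": 0.5})
print(matches(payment, [("wave", 0.95), ("pinch", 0.8)], [("bank_card", 0.65)]))  # True
print(matches(payment, [("pinch", 0.7)], [("bank_card", 0.65)]))                 # False
```

The second call is false because 0.7 is not strictly greater than the 0.7 threshold.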

[0098] In one embodiment, the sensitive information determination module is specifically used for:

[0099] If P gesture recognition results and Q hand contact object recognition results are successfully matched with the current preset sensitivity judgment model, then the area corresponding to the hand coordinates and the area corresponding to the object coordinates are determined as sensitive areas, and the sensitivity level corresponding to the current preset sensitivity judgment model is determined as the target sensitivity level.

[0100] If the P gesture recognition results and Q hand-touching object recognition results fail to match the current preset sensitivity judgment model, then the P gesture recognition results and Q hand-touching object recognition results will be input into the next preset sensitivity judgment model for processing.
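Paragraphs [0097] to [0100] together describe a cascade: try each preset sensitivity judgment model in order, stop at the first successful match, and report that model's level together with the hand and object areas. The sketch below assumes that the models are already sorted, that the sensitive areas are the boxes of all P and Q input results (the text could also be read as only the matching ones), and that an empty result with no level is returned when no model matches.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class GestureResult:
    action: str
    confidence: float
    hand_box: Box


@dataclass
class ContactObjectResult:
    category: str
    confidence: float
    object_box: Box


@dataclass
class SensitivityModel:
    level: int
    gesture_thresholds: Dict[str, float]
    category_thresholds: Dict[str, float]


def _matches(model: SensitivityModel, gestures, objects) -> bool:
    # An unknown action or category gets an infinite threshold, so it never matches.
    gesture_hit = any(g.confidence > model.gesture_thresholds.get(g.action, float("inf"))
                      for g in gestures)
    object_hit = any(o.confidence > model.category_thresholds.get(o.category, float("inf"))
                     for o in objects)
    return gesture_hit and object_hit


def determine_sensitivity(models: Sequence[SensitivityModel],
                          gestures: Sequence[GestureResult],
                          objects: Sequence[ContactObjectResult]
                          ) -> Tuple[List[Box], Optional[int]]:
    for model in models:  # models are assumed to be pre-sorted by sensitivity level
        if _matches(model, gestures, objects):
            # Successful match: stop here; do not feed the next model.
            areas = [g.hand_box for g in gestures] + [o.object_box for o in objects]
            return areas, model.level
        # Failed match: fall through to the next model.
    return [], None  # assumption: no model matched, nothing is flagged


models = [SensitivityModel(3, {"pinch": 0.7}, {"bank_card": 0.6}),
          SensitivityModel(1, {"point": 0.5}, {"phone": 0.5})]
print(determine_sensitivity(models,
                            [GestureResult("point", 0.9, (0, 0, 10, 10))],
                            [ContactObjectResult("phone", 0.8, (5, 5, 20, 20))]))
```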

[0101] In one embodiment, the video desensitization module is specifically used for:

[0102] Extract the video data to be desensitized corresponding to the sensitive areas from the video data;

[0103] Perform asymmetric encryption on the video data to be desensitized to form encrypted sensitive information, and mark the target sensitivity level in the encrypted sensitive information;

[0104] Remove the sensitive areas from the video data using an irreversible algorithm to obtain desensitized video data.
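The extraction step and the irreversible removal step can be sketched for one frame. Mosaic is used here as the irreversible algorithm, one of the options listed in [0082]. Each block of the sensitive area is replaced by the integer mean of its pixels, which discards the detail that a viewer could otherwise recover. The grayscale grid representation, the block size, and the function names are assumptions, and the encryption of the extracted region is left out.

```python
from typing import List, Tuple

# Hypothetical representation: frame[y][x] pixel grid; box = (x0, y0, x1, y1), exclusive max.
Frame = List[List[int]]
Box = Tuple[int, int, int, int]


def extract_region(frame: Frame, box: Box) -> Frame:
    """Copy out the pixels of a sensitive area (the video data to be desensitized)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def mosaic(frame: Frame, box: Box, block: int = 2) -> Frame:
    """Return a copy of frame with the box area replaced by block-averaged pixels.

    Each block keeps only the integer mean of its pixels, so the original
    detail cannot be recovered from the output alone.
    """
    out = [row[:] for row in frame]
    x0, y0, x1, y1 = box
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            cells = [frame[y][x] for y in ys for x in xs]
            mean = sum(cells) // len(cells)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


frame = [[y * 4 + x for x in range(4)] for y in range(4)]
region = extract_region(frame, (1, 1, 3, 3))   # [[5, 6], [9, 10]]
masked = mosaic(frame, (0, 0, 4, 4))
print(masked[0])   # [2, 2, 4, 4]
```

Extracting the region before masking matters: once the mosaic has been applied, the only route back to the original pixels is the encrypted copy.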

[0105] In one embodiment, the video desensitization module is specifically used for:

[0106] Generate a paired public key and private key based on the video timestamp corresponding to the video data to be desensitized and the preset feature factors; the private key is used to decrypt the encrypted sensitive information.

[0107] Encrypt and save the video data to be desensitized using the public key to obtain the encrypted sensitive information.
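To show how a key pair can be derived deterministically from the video timestamp and the preset feature factors, the sketch below seeds a pseudo-random generator with a SHA-256 hash of both and builds a textbook RSA key pair. This is a toy illustration only. Textbook RSA without padding is insecure, and `random` is not a cryptographic generator. Because the derivation is deterministic, anyone who knows the same timestamp and feature factors can regenerate the private key. The seed format and all function names are hypothetical. A real system would use a vetted cryptographic library and hybrid encryption for large video payloads.

```python
import hashlib
import math
import random


def _is_probable_prime(n: int, rng: random.Random, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def _random_prime(rng: random.Random, bits: int) -> int:
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1  # full-size, odd
        if _is_probable_prime(candidate, rng):
            return candidate


def generate_key_pair(timestamp: str, feature_factors: str, bits: int = 256):
    """Toy: derive an RSA (public, private) key pair deterministically from the seed material."""
    seed = hashlib.sha256(f"{timestamp}|{feature_factors}".encode()).digest()
    rng = random.Random(seed)
    e = 65537
    while True:
        p, q = _random_prime(rng, bits), _random_prime(rng, bits)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            break
    n = p * q
    return (n, e), (n, pow(e, -1, phi))


def encrypt(public_key, data: bytes) -> int:
    n, e = public_key
    m = int.from_bytes(data, "big")
    if m >= n:
        raise ValueError("payload too large for this toy key")
    return pow(m, e, n)


def decrypt(private_key, ciphertext: int, length: int) -> bytes:
    n, d = private_key
    return pow(ciphertext, d, n).to_bytes(length, "big")


pub, priv = generate_key_pair("2022-05-20T10:00:00.000", "pinch|bank_card")
ct = encrypt(pub, b"region bytes")
print(decrypt(priv, ct, 12))  # b'region bytes'
```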

[0108] In one embodiment, the hand recognition module is specifically used for:

[0109] Human body coordinates are detected based on video data, and each human body is segmented according to its coordinates to obtain K images of the human body to be detected.

[0110] The human body recognition classifier identifies human body parts in each human body image to be detected, and obtains the coordinates of each human body part, including the head and limbs.

[0111] The human hand region is determined based on the human hand coordinates among the human body part coordinates corresponding to each human body part.

[0112] A gesture recognition classifier is used to perform gesture recognition on the human hand region, and an object recognition classifier is used to recognize the objects touched by the hand in the human hand region.

[0113] Figure 5 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 5, the electronic device may include: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with one another via the communication bus 540. The processor 510 can call a computer program in the memory 530 to execute the steps of the video desensitization method, for example:

[0114] Acquire video data, identify the human hand area based on the video data, and obtain N gesture recognition results and M hand contact object recognition results;

[0115] Sensitive areas and target sensitivity levels are determined based on N gesture recognition results and M hand-to-object contact recognition results;

[0116] Video data is anonymized based on sensitive areas and target sensitivity levels.

[0117] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0118] In another aspect, this application further provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can perform the steps of the video desensitization method provided in the above embodiments, for example:

[0119] Acquire video data, identify the human hand area based on the video data, and obtain N gesture recognition results and M hand contact object recognition results;

[0120] Sensitive areas and target sensitivity levels are determined based on N gesture recognition results and M hand-to-object contact recognition results;

[0121] Video data is anonymized based on sensitive areas and target sensitivity levels.

[0122] In yet another aspect, embodiments of this application further provide a processor-readable storage medium storing a computer program that causes a processor to perform the steps of the methods provided in the above embodiments, for example:

[0123] Acquire video data, identify the human hand area based on the video data, and obtain N gesture recognition results and M hand contact object recognition results;

[0124] Sensitive areas and target sensitivity levels are determined based on N gesture recognition results and M hand-to-object contact recognition results;

[0125] Video data is anonymized based on sensitive areas and target sensitivity levels.

[0126] The processor-readable storage medium can be any available medium or data storage device that the processor can access, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical memory (e.g., CD, DVD, BD, HVD), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).

[0127] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0128] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0129] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A video desensitization method, characterized by comprising: acquiring video data, identifying the human hand region based on the video data, and obtaining N gesture recognition results and M hand contact object recognition results; determining sensitive areas and a target sensitivity level based on the N gesture recognition results and the M hand contact object recognition results; and desensitizing the video data based on the sensitive region and the target sensitivity level; wherein the gesture recognition result includes the gesture recognition action and the gesture confidence score and hand coordinates corresponding to the gesture recognition action, and the hand contact object recognition result includes the hand contact object category and the item confidence score and item coordinates corresponding to the hand contact object category. The step of determining the sensitive area and the target sensitivity level based on the N gesture recognition results and the M hand contact object recognition results includes: sequentially inputting the P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand contact object recognition results with the highest confidence among the M hand contact object recognition results into several preset sensitivity judgment models for processing, the several preset sensitivity judgment models being sorted according to the sensitivity level corresponding to each preset sensitivity judgment model; and determining the sensitive region and the target sensitivity level based on each processing result corresponding to each preset sensitivity judgment model. The step of sequentially inputting the P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand contact object recognition results with the highest confidence among the M hand contact object recognition results into several preset sensitivity judgment models for processing is carried out as follows.
Specifically, inputting the P gesture recognition results and the Q hand contact object recognition results into the current preset sensitivity judgment model for processing includes: importing the P gesture recognition actions and P gesture confidence scores from the P gesture recognition results, and the Q hand contact item categories and Q item confidence scores from the Q hand contact object recognition results, into the sensitivity judgment expression of the current preset sensitivity judgment model, where the sensitivity judgment expression includes preset feature factors, which include multiple preset gesture actions, a preset gesture confidence threshold corresponding to each preset gesture action, multiple preset item categories, and a preset category confidence threshold corresponding to each preset item category; and if any gesture recognition action matches one of the preset gesture actions and the confidence level of the current gesture recognition action is greater than the preset gesture confidence threshold corresponding to the current preset gesture action, and any hand contact item category matches one of the preset item categories and the confidence level of the current hand contact item category is greater than the preset category confidence threshold corresponding to the current preset item category, determining that the P gesture recognition results and the Q hand contact object recognition results are successfully matched with the current preset sensitivity judgment model, and stopping the input of the P gesture recognition results and the Q hand contact object recognition results into the next preset sensitivity judgment model after the current preset sensitivity judgment model. The step of determining the sensitive region and the target sensitivity level based on each processing result corresponding to each preset sensitivity judgment model includes: if the P gesture recognition results and the Q hand contact object recognition results are successfully matched with the current preset sensitivity judgment model, determining the area corresponding to the hand coordinates and the area corresponding to the object coordinates as the sensitive area, and determining the sensitivity level corresponding to the current preset sensitivity judgment model as the target sensitivity level; and if the P gesture recognition results and the Q hand contact object recognition results fail to match the current preset sensitivity judgment model, inputting the P gesture recognition results and the Q hand contact object recognition results into the next preset sensitivity judgment model after the current preset sensitivity judgment model for processing.

2. The video desensitization method according to claim 1, characterized in that the process of desensitizing the video data based on the sensitive region and the target sensitivity level includes: extracting the video data to be desensitized corresponding to the sensitive area from the video data; performing asymmetric encryption on the video data to be desensitized to form encrypted sensitive information, and marking the target sensitivity level in the encrypted sensitive information; and removing the sensitive areas from the video data using an irreversible algorithm to obtain desensitized video data.

3. The video desensitization method according to claim 2, characterized in that the asymmetric encryption performed on the video data to be desensitized to form encrypted sensitive information includes: generating a paired public key and private key based on the video timestamp corresponding to the video data to be desensitized and the preset feature factors, the private key being used to decrypt the encrypted sensitive information; and encrypting and saving the video data to be desensitized using the public key to obtain the encrypted sensitive information.

4. The video desensitization method according to claim 1, characterized in that the process of identifying the human hand region based on the video data includes: detecting human body coordinate positions based on the video data, and segmenting each human body according to its coordinate position to obtain K human body images to be detected; identifying, by a human body recognition classifier, the human body parts in each human body image to be detected to obtain the coordinates of each human body part, the human body parts including the head and limbs; determining the human hand region based on the human hand coordinates among the human body part coordinates corresponding to each human body part; and performing, by a gesture recognition classifier, gesture recognition on the human hand region, and performing, by an object recognition classifier, hand contact object recognition on the human hand region.

5. A video desensitization device, characterized by comprising: a hand recognition module, used to acquire video data, identify the human hand region based on the video data, and obtain N gesture recognition results and M hand contact object recognition results; a sensitive information determination module, used to determine the sensitive area and the target sensitivity level based on the N gesture recognition results and the M hand contact object recognition results; and a video desensitization module, used to desensitize the video data according to the sensitive area and the target sensitivity level; wherein the gesture recognition result includes the gesture recognition action and the gesture confidence score and hand coordinates corresponding to the gesture recognition action, and the hand contact object recognition result includes the hand contact object category and the item confidence score and item coordinates corresponding to the hand contact object category. The sensitive information determination module is specifically used to: sequentially input the P gesture recognition results with the highest confidence among the N gesture recognition results and the Q hand contact object recognition results with the highest confidence among the M hand contact object recognition results into several preset sensitivity judgment models for processing, the several preset sensitivity judgment models being sorted according to the sensitivity level corresponding to each preset sensitivity judgment model; and determine the sensitive area and the target sensitivity level according to each processing result corresponding to each preset sensitivity judgment model.
The sensitive information determination module is specifically used to: import the P gesture recognition actions and P gesture confidence scores from the P gesture recognition results, and the Q hand contact item categories and Q item confidence scores from the Q hand contact object recognition results, into the sensitivity judgment expression of the current preset sensitivity judgment model, where the sensitivity judgment expression includes preset feature factors, which include multiple preset gesture actions, a preset gesture confidence threshold corresponding to each preset gesture action, multiple preset item categories, and a preset category confidence threshold corresponding to each preset item category; and if any gesture recognition action matches one of the preset gesture actions and the confidence level of the current gesture recognition action is greater than the preset gesture confidence threshold corresponding to the current preset gesture action, and any hand contact item category matches one of the preset item categories and the confidence level of the current hand contact item category is greater than the preset category confidence threshold corresponding to the current preset item category, determine that the P gesture recognition results and the Q hand contact object recognition results are successfully matched with the current preset sensitivity judgment model, and stop inputting the P gesture recognition results and the Q hand contact object recognition results into the next preset sensitivity judgment model after the current preset sensitivity judgment model. The sensitive information determination module is further specifically used to: if the P gesture recognition results and the Q hand contact object recognition results are successfully matched with the current preset sensitivity judgment model, determine the area corresponding to the hand coordinates and the area corresponding to the object coordinates as the sensitive area, and determine the sensitivity level corresponding to the current preset sensitivity judgment model as the target sensitivity level; and if the P gesture recognition results and the Q hand contact object recognition results fail to match the current preset sensitivity judgment model, input the P gesture recognition results and the Q hand contact object recognition results into the next preset sensitivity judgment model after the current preset sensitivity judgment model for processing.

6. An electronic device, comprising a processor and a memory storing a computer program, characterized in that, when the processor executes the computer program, the steps of the video desensitization method according to any one of claims 1 to 4 are implemented.

7. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the steps of the video desensitization method according to any one of claims 1 to 4 are implemented.