An image privacy processing method based on region division

By using region segmentation and key point detection models, pixel-level segmentation and differential blurring are performed on images captured by cameras, solving the problems of high computational load and low efficiency in existing technologies, and achieving efficient image privacy protection and visibility preservation.

CN122243729A · Pending · Publication Date: 2026-06-19 · GUANGZHOU HAOTIAN INTELLIGENT EQUIPMENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGZHOU HAOTIAN INTELLIGENT EQUIPMENT CO LTD
Filing Date
2026-03-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies perform facial recognition on the entire scene captured by the camera and then blur it, resulting in high computational load, low processing efficiency, and an inability to effectively protect privacy and preserve image visibility.

Method used

The original image is acquired through an image acquisition device, and pixel-level segmentation is performed using a region segmentation model. A region mask is generated to distinguish visible from invisible regions, and the invisible regions are globally blurred. The visible regions are input into a key point detection model to locate the target object region; the key feature region within it is pixelated and the rest of the target object region is Gaussian blurred. Finally, the images are fused.

Benefits of technology

It achieves efficient image privacy protection, protecting private information while preserving the visibility and usability of the image. Multi-level region division and differentiated blurring improve processing efficiency and the balance between privacy and retained information.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122243729A_ABST

Abstract

This invention relates to the field of image processing technology, specifically disclosing an image privacy processing method based on region segmentation. The method includes: segmenting the original image to generate a region mask, marking visible and invisible regions with the region mask; using the image corresponding to the visible region as input to a keypoint detection model, outputting the keypoint coordinates of the target object; determining the target object region and non-target object regions based on the keypoint coordinates, further dividing the target object region into a key feature region and other regions; applying pixelation processing to the key feature region and Gaussian blur processing to the other regions to generate a privacy processing region; and a processing unit fusing the images of the non-target object region, the privacy processing region, and the invisible region to obtain a fused image. This region-segmentation-based method addresses the problems of effectively distinguishing visible from invisible regions, protecting the privacy of the target object, and retaining necessary information during image acquisition and processing.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and more specifically, to an image privacy processing method based on region partitioning.

Background Technology

[0002] With the rapid development of artificial intelligence technology in the field of computer vision, the demand for intelligent processing of video content in scenarios such as camera-based intelligent monitoring and live streaming is increasing. In these scenarios, faces in video frames often need to be blurred to protect privacy and security.

[0003] However, current algorithms perform facial recognition directly on the entire image captured by the camera and then blur the detected faces. This becomes a problem when the camera captures a large amount of information. In a restaurant hall, for example, where the goal is to monitor how customers are eating, the focus should be on the tables; yet to protect customers' privacy, every face in the image must still be blurred. The conventional approach recognizes every face in the entire hall and then blurs each one, which requires a large amount of computation and is therefore very inefficient.

Summary of the Invention

[0004] To overcome the shortcomings of existing technologies, this invention provides an image privacy processing method based on region segmentation, aiming to solve the problems mentioned above in the prior art.

[0005] The technical solution adopted by this invention to solve its technical problem is an image privacy processing method based on region division, comprising the following steps:
S1: Acquire original images using an image acquisition device and transmit them to the processing unit;
S2: The processing unit loads the region segmentation model to perform pixel-level segmentation on the original image and generates a region mask that marks the visible region and the invisible region; the invisible region is then globally blurred;
S3: The image corresponding to the visible region is used as the input to the key point detection model, which outputs the key point coordinates of the target object;
S4: The processing unit determines the target object region and the non-target object region based on the key point coordinates, divides the target object region into a key feature region and other regions, applies pixelation to the key feature region and Gaussian blur to the other regions, and thereby generates a privacy processing region;
S5: The processing unit merges the non-target object region and the privacy processing region to obtain an updated visible region image, then merges the updated visible region image with the invisible region image to obtain a fused image.

[0006] It is worth noting that in step S2, a fixed region boundary is defined during the camera initialization phase. The processing unit applies the defined region boundary to segment the original image into an initial visible static region of interest and an initial invisible static region of interest. Image frames are acquired from the initial visible static region of interest and input into a pre-trained U-Net model for pixel-level segmentation, which outputs a region mask. The pixels of the original image covered by the region mask are marked as the visible region; all other pixels of the original image are marked as the invisible region, which includes the initial invisible static region of interest as well as the parts of the initial visible static region of interest not marked by the region mask.

[0007] Optionally, in step S2, the region segmentation model applies morphological operations, namely dilation and erosion, to the region mask to refine its boundaries.

[0008] Specifically, in step S2, the morphological operation includes: dilating the region mask with a predefined structuring element to obtain a dilated mask; eroding the dilated mask with the same structuring element to obtain an eroded mask; and using the eroded mask as the final region mask.

[0009] Preferably, in step S3, image data corresponding to the visible area is acquired, and the image data corresponding to the visible area is input into a pre-established key point detection model to obtain a set of facial key point coordinates from the key point detection model; the key point detection model is a human pose estimation key point detection model.

[0010] Optionally, in step S4, the minimum horizontal coordinate value is obtained from the key point coordinates in the set of facial key point coordinates as the left boundary of the bounding box, the minimum vertical coordinate value is obtained as the upper boundary, the maximum horizontal coordinate value is obtained as the right boundary, and the maximum vertical coordinate value is obtained as the lower boundary to generate the bounding box of the target object. The region within the bounding box of the target object is defined as the target object region, and the region outside the target object region is defined as the non-target object region. The key point coordinates of the eyes and the key point coordinates of the mouth are selected from the target object region, the minimum bounding rectangle of these coordinates is calculated, and the key feature region is determined. The key feature region is excluded from the target object region to generate other regions.

[0011] It is worth noting that in step S4, the key feature region is divided into multiple small squares, the average value of the pixels in each square is calculated, and that average is assigned to all pixels in the square to obtain the pixelated key feature region. For the other regions, Gaussian blur is applied to each pixel and its neighboring pixels to generate the Gaussian-blurred other regions. The pixelated key feature region and the Gaussian-blurred other regions are superimposed at their corresponding positions in the target object region to generate a privacy processing region.

[0012] Preferably, in step S4, the processing unit combines Kalman filtering and the Hungarian algorithm to track the moving target object region and update the coordinates corresponding to the target object region.

[0013] Specifically, in step S4, tracking the moving target object region includes the following. For each target object region, the boundary coordinates of its bounding box are obtained, and a Kalman filter is initialized with this state to predict the position in the next frame, yielding a predicted coordinate set. The overlap between the predicted coordinate set and the boundary coordinates of every target object bounding box detected in the current frame is calculated to construct a cost matrix. Among the overlaps in the cost matrix that exceed the second preset threshold, the bounding box detected in the current frame with the largest overlap is selected as the successful match. The matched coordinates are fed into the Kalman filter correction step, which updates the predicted coordinate set to obtain a corrected coordinate set. Based on the corrected coordinate set, the position of the target object region is adjusted to generate an updated bounding box, from which the updated target object region is determined.

[0014] Specifically, in step S5, the number of pixels n1 in the non-target object region and the number of pixels n2 in the visible region image are obtained, and the weight of the non-target object region is calculated as a1 = n1 / n2. The number of pixels n3 in the target object region is obtained, and the weight of the privacy processing region is calculated as a2 = n3 / n2. From the visible region image, the sub-region corresponding to the non-target object region is extracted, and the non-target object region is superimposed on it by weighted fusion with weight a1 to obtain a first intermediate image. From the first intermediate image, the sub-region corresponding to the target object region is extracted, and the privacy processing region is superimposed on it by weighted fusion with weight a2 to obtain the updated visible region image. Next, the number of pixels n4 in the original visible region image and the number of pixels n5 in the original image are obtained, and the weight of the updated visible region image is calculated as a3 = n4 / n5; the number of pixels n6 in the invisible region image is obtained, and the weight of the invisible region image is calculated as a4 = n6 / n5. From the original image, the sub-region corresponding to the updated visible region image is extracted, and the updated visible region image is superimposed on it by weighted fusion with weight a3 to obtain a second intermediate image. Finally, from the second intermediate image, the sub-region corresponding to the invisible region image is extracted, and the invisible region image is superimposed on it by weighted fusion with weight a4 to obtain the final fused image.
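The pixel-count weights in [0014] can be made concrete with a small sketch. It assumes binary masks stored as nested lists and assumes each weighted superimposition behaves like OpenCV's `cv2.addWeighted(src1, alpha, src2, beta, gamma)` restricted to the extracted sub-region, with `beta = 1 - alpha` and `gamma = 0`. The patent does not state the second weight, so that choice and the helper names `count_pixels`, `region_weight`, and `weighted_overlay` are hypothetical.

```python
def count_pixels(mask):
    """Number of pixels equal to 1 in a binary mask (list of rows)."""
    return sum(sum(row) for row in mask)


def region_weight(part_mask, whole_mask):
    """Pixel-count ratio such as a1 = n1 / n2 (hypothetical helper)."""
    whole = count_pixels(whole_mask)
    return count_pixels(part_mask) / whole if whole else 0.0


def weighted_overlay(base, layer, mask, alpha):
    """Blend `layer` onto `base` only where mask == 1.

    Inside the mask this equals alpha * layer + (1 - alpha) * base, i.e. an
    assumed addWeighted(layer, alpha, base, 1 - alpha, 0) on the sub-region;
    pixels outside the mask keep their `base` value.
    """
    return [
        [alpha * p + (1 - alpha) * b if m else b
         for b, p, m in zip(b_row, p_row, m_row)]
        for b_row, p_row, m_row in zip(base, layer, mask)
    ]


# Toy visible region: a 2x4 strip whose right half is the target (face) region.
visible = [[1, 1, 1, 1], [1, 1, 1, 1]]
target = [[0, 0, 1, 1], [0, 0, 1, 1]]
non_target = [[v - t for v, t in zip(vr, tr)] for vr, tr in zip(visible, target)]

a1 = region_weight(non_target, visible)  # n1 / n2
a2 = region_weight(target, visible)      # n3 / n2

base = [[100] * 4, [100] * 4]   # visible-region image
privacy = [[0] * 4, [0] * 4]    # stand-in privacy processing region
out = weighted_overlay(base, privacy, target, a2)
print(a1, a2, out[0])  # 0.5 0.5 [100, 100, 50.0, 50.0]
```

In this toy case the target and non-target regions partition the visible region, so a1 + a2 = 1. Taken literally, a weight below 1 keeps part of the original face pixels in the blend, so the choice of the second weight determines how completely the privacy processing region replaces the original content.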

[0015] The beneficial effects of this invention are as follows: In the image privacy processing method based on region segmentation, after acquiring the original image through an image acquisition device, the image is segmented at the pixel level using a region segmentation model to generate a region mask to distinguish between visible and invisible regions, and the invisible regions are subjected to global blurring. Simultaneously, the visible regions are input into a keypoint detection model to determine the target object region and non-target object regions. Then, the key feature regions of the target object region are pixelated, and other regions are subjected to Gaussian blurring to form privacy processing regions. Finally, the images of the non-target object regions, privacy processing regions, and invisible regions are fused to generate the final fused image. This invention, through multi-level region segmentation and differentiated blurring, protects privacy information while preserving the visibility and usability of the image, achieving a highly efficient technical effect of image privacy protection and information balance.

Attached Figure Description

[0016] Figure 1 is a flowchart of the image privacy processing method based on region partitioning.

[0017] Figure 2 is a schematic diagram of key points on the human body.

Detailed Implementation

[0018] Specific embodiments of the present invention are further described below with reference to the accompanying drawings. These descriptions are intended to aid understanding of the present invention and do not limit it. Furthermore, the technical features of the various embodiments described below can be combined with each other as long as they do not conflict.

[0019] With reference to Figure 1 and Figure 2, the image privacy processing method based on region partitioning includes the following steps:
S1: Acquire original images using an image acquisition device and transmit them to the processing unit;
S2: The processing unit loads the region segmentation model to perform pixel-level segmentation on the original image and generates a region mask that marks the visible region (such as the tabletop) and the invisible region (such as the walls and ceiling); the invisible region is then globally blurred. In this embodiment, global blurring applies a Gaussian blur to each pixel in the invisible region and its neighboring pixels, with a 5x5 kernel and a standard deviation of 1, to generate the globally blurred invisible region;
S3: The image corresponding to the visible region is used as the input to the key point detection model, which outputs the key point coordinates of the target object;
S4: The processing unit determines the target object region (such as a face) and the non-target object region (such as everything other than the face) based on the key point coordinates, divides the target object region into key feature regions (such as the eyes and mouth) and other regions (such as the remaining parts of the face), applies pixelation to the key feature regions and Gaussian blur to the other regions, and thereby generates privacy processing regions;
S5: The processing unit merges the non-target object region and the privacy processing region to obtain an updated visible region image, then merges the updated visible region image with the invisible region image to obtain a fused image.
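The masked global blur in S2 can be sketched in plain Python as follows, assuming a grayscale frame stored as a list of rows and a binary invisible-region mask. The 5x5 kernel and standard deviation of 1 come from the text; clamping at the frame border is an assumption (OpenCV's `cv2.GaussianBlur(img, (5, 5), 1)` uses reflected borders by default), and `gaussian_kernel` and `blur_masked` are hypothetical names. In practice one would blur the whole frame with OpenCV and copy the result back only where the mask is set.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def blur_masked(img, mask, size=5, sigma=1.0):
    """Gaussian-blur only the pixels where mask == 1 (grayscale, list of rows).

    Neighbors are read from the unblurred input; out-of-frame neighbors are
    clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    k = gaussian_kernel(size, sigma)
    half = size // 2
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # visible pixels are left untouched
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + half][dx + half] * img[yy][xx]
            out[y][x] = acc
    return out


# Toy 6x6 frame: one bright pixel; the left three columns are "invisible".
img = [[0] * 6 for _ in range(6)]
img[2][2] = 255
invisible = [[1 if x < 3 else 0 for x in range(6)] for _ in range(6)]
out = blur_masked(img, invisible)
print(round(out[2][2], 1), out[2][3])  # 41.3 0
```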

[0020] In the image privacy processing method based on region segmentation, after acquiring the original image through an image acquisition device, the image is segmented at the pixel level using a region segmentation model to generate a region mask to distinguish between visible and invisible regions, and the invisible regions are then globally blurred. Simultaneously, the visible regions are input into a keypoint detection model to determine target and non-target regions. Then, the key feature regions of the target regions are pixelated, and other regions are Gaussian blurred to form privacy-processing regions. Finally, the images of the non-target regions, privacy-processing regions, and invisible regions are fused to generate the final fused image. This invention, through multi-level region segmentation and differentiated blurring, protects privacy information while preserving the image's visibility and usability, achieving a highly efficient technical effect of image privacy protection and information balance.

[0021] It is worth noting that in step S2, a fixed region boundary, such as a polygonal region of interest, is defined during the camera initialization phase. The processing unit applies the defined region boundary to segment the original image into an initial visible static region of interest and an initial invisible static region of interest. Image frames are acquired from the initial visible static region of interest and input into a pre-trained U-Net model for pixel-level segmentation, which outputs a region mask. The pixels of the original image covered by the region mask are marked as the visible region; all other pixels of the original image are marked as the invisible region, which includes the initial invisible static region of interest as well as the parts of the initial visible static region of interest not marked by the region mask.

[0022] Specifically, in a restaurant setting, the initial area delineation during camera setup is to filter out irrelevant backgrounds and focus on the core display area. For example, users can draw polygons on the monitoring screen using client software, designating them as regions of interest (ROIs). These polygons cover the tables in the restaurant, excluding aisles, walls, and ceilings. The processing unit performs initial segmentation on the 1920x1080 resolution image based on these polygon boundaries. The area inside the polygon is marked as the initial visible static ROI, while the outer area, including aisles, walls, and ceilings, is directly marked as the initial invisible static ROI. This data is ignored in subsequent processing, thus reducing computational load.

[0023] In one possible implementation, the system extracts image frames only from the initial visible static regions of interest (ROIs) and inputs them into a pre-trained U-Net model. The U-Net model utilizes its encoder-decoder structure to perform deep feature extraction and pixel-level classification of the image. Assuming the model is trained to identify regions containing food items and regions not containing food items, when given an image frame containing both, the model determines each pixel's classification and outputs a binary region mask. In this mask, regions containing food items are marked as 1, and other regions not containing food items are marked as 0. Understandably, by mapping this region mask onto the original image, only pixels marked as 1 are ultimately identified as visible regions. At this point, all parts of the original image outside the visible regions are uniformly marked as invisible regions. This not only includes the initially excluded initial invisible static ROIs but also adds regions within the initial visible regions that were not identified as key content by the U-Net model (such as regions not containing food items). This dual filtering mechanism enables precise privacy protection and bandwidth saving, ensuring that the final output video stream contains only high-value dynamic information and effectively avoids interference from invalid background information.
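The dual filtering in [0022] and [0023] reduces to a logical AND of two masks. The sketch below assumes the static ROI is a polygon given in pixel coordinates and substitutes a given binary array for the U-Net output; the even-odd point-in-polygon test stands in for rasterizing the polygon (for example with `cv2.fillPoly`), and the helper names and toy values are hypothetical.

```python
def point_in_polygon(x, y, poly):
    """Even-odd (ray casting) test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def split_regions(width, height, roi_polygon, model_mask):
    """Return (visible, invisible) binary masks.

    visible = inside the static ROI AND marked 1 by the segmentation model;
    invisible = everything else (static non-ROI plus unmarked ROI pixels).
    """
    visible = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Test the pixel center against the polygon.
            if point_in_polygon(x + 0.5, y + 0.5, roi_polygon) and model_mask[y][x]:
                visible[y][x] = 1
    invisible = [[1 - v for v in row] for row in visible]
    return visible, invisible


# Toy 4x4 frame: the ROI covers the left two columns; the model marks only the top row.
roi = [(0, 0), (2, 0), (2, 4), (0, 4)]
model_mask = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
visible, invisible = split_regions(4, 4, roi, model_mask)
print(visible[0], sum(map(sum, invisible)))  # [1, 1, 0, 0] 14
```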

[0024] Preferably, in step S2, the region segmentation model applies morphological operations, namely dilation and erosion, to the region mask to refine its boundaries.

[0025] Optionally, in step S2, the morphological operation includes: dilating the region mask with a predefined structuring element to obtain a dilated mask; eroding the dilated mask with the same structuring element to obtain an eroded mask; and using the eroded mask as the final region mask.

[0026] Specifically, in the video stream processing workflow of a restaurant, the initially generated region mask often suffers from rough edges or internal holes, which is usually caused by pixel classification jitter due to uneven lighting. To optimize this mask, the system introduces a morphological processing mechanism.

[0027] Specifically, the processing unit uses a preset structuring element, such as a disc with a radius of 3 pixels or a 5x5 square, to perform a dilation operation on the region mask. This process "grows" the visible area in the region mask outward, so that fragmented pixel blocks become connected, small holes are filled, and the continuity of the target subject is preserved.

[0028] It should be noted that while a simple dilation operation repairs internal holes, it inevitably expands the boundaries of the visible area, causing some irrelevant background pixels to be incorrectly included. Therefore, based on the dilated mask, the system then performs an erosion operation using the same structuring element. The erosion process is a reverse contraction of the dilation result; it can remove edge noise introduced during dilation, smooth the outer contour of the object, and preserve the filling effect of the dilation operation on internal holes.

[0029] This combination of dilation followed by erosion is called "closing" in image processing. Its core value is that it fills small dark holes and gaps in the foreground object without significantly changing the object's area or shape. The final region mask obtained after this processing therefore has smoother edges and reflects the continuity of physical objects.
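Closing can be sketched without libraries by implementing dilation as a sliding-window maximum and erosion as a sliding-window minimum over a square structuring element. This is an illustrative sketch: it uses a 3x3 square rather than the 5x5 square or radius-3 disc mentioned above to keep the toy mask small, treats pixels outside the frame as neutral, and the function names are hypothetical. With OpenCV, the same operation is typically `cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)`.

```python
def _morph(mask, size, combine):
    """Slide a size x size square over the mask and reduce each window with `combine`."""
    h, w = len(mask), len(mask[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Windows are truncated at the frame edge (outside pixels are ignored).
            window = [mask[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = combine(window)
    return out


def dilate(mask, size=3):
    return _morph(mask, size, max)  # grow the foreground


def erode(mask, size=3):
    return _morph(mask, size, min)  # shrink the foreground


def close_mask(mask, size=3):
    """Morphological closing: dilation followed by erosion with the same element."""
    return erode(dilate(mask, size), size)


# 7x7 mask: a 3x3 foreground block with a one-pixel hole in its center.
mask = [[1 if 2 <= x <= 4 and 2 <= y <= 4 else 0 for x in range(7)] for y in range(7)]
mask[3][3] = 0
closed = close_mask(mask)
print(mask[3][3], closed[3][3], sum(map(sum, mask)), sum(map(sum, closed)))  # 0 1 8 9
```

The hole is filled while the block keeps its original 3x3 footprint, which is the "fill without growing" property described above.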

[0030] Specifically, in step S3, image data corresponding to the visible area is acquired, and the image data corresponding to the visible area is input into a pre-established key point detection model. The set of facial key point coordinates is obtained from the key point detection model. The key point detection model is a human pose estimation key point detection model, such as the MediaPipe Face Mesh model.

[0031] The set of facial key point coordinates is filtered using the non-maximum suppression method in the image processing library, and the facial key point coordinates with a confidence level higher than the first preset threshold are retained to obtain the filtered set of facial key point coordinates. Multi-frame verification is performed on the filtered set of facial key point coordinates. If facial key point coordinates at the same location are detected in multiple consecutive frames, the facial key point coordinates are determined to be valid coordinates.

[0032] The valid coordinates are output as the key point coordinates of the target object.

[0033] In one possible implementation, the system extracts high-resolution local image patches from the visible area determined in the preceding steps. Specifically, the processing unit locks onto the pixel clusters marked 1 in the region mask in real time and uses them as the input for keypoint detection.

[0034] This ensures that subsequent algorithms process only the valid pixels, including any facial features, that lie within the food placement area, avoiding redundant computation on irrelevant areas outside it. In this scheme, food is placed only in the food placement area, but faces may also appear there, so faces appearing in the food placement area must be blurred to protect privacy.
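One way to "lock onto" the mask-1 pixel clusters is to crop their tight bounding rectangle, blank the pixels inside it that the mask excludes, and remember the crop offset so that keypoints detected in the patch can be mapped back to full-frame coordinates. The zero fill, the offset mapping, and the helper names below are assumptions made for illustration; the patent does not specify how the patch is formed.

```python
def mask_bbox(mask):
    """Inclusive bounding box (x0, y0, x1, y1) of pixels equal to 1, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return min(xs), ys[0], max(xs), ys[-1]


def crop_visible(img, mask):
    """Crop the visible patch, zeroing pixels inside the crop that the mask excludes.

    Returns (patch, (x0, y0)); the offset maps patch coordinates back to the frame.
    """
    box = mask_bbox(mask)
    if box is None:
        return [], (0, 0)
    x0, y0, x1, y1 = box
    patch = [[img[y][x] if mask[y][x] else 0 for x in range(x0, x1 + 1)]
             for y in range(y0, y1 + 1)]
    return patch, (x0, y0)


def to_frame(point, offset):
    """Map a keypoint (x, y) found in the patch back to full-frame coordinates."""
    return point[0] + offset[0], point[1] + offset[1]


img = [[10 * y + x for x in range(5)] for y in range(4)]
mask = [[0] * 5, [0, 1, 1, 0, 0], [0, 1, 0, 0, 0], [0] * 5]
patch, (x0, y0) = crop_visible(img, mask)
print(patch, (x0, y0), to_frame((1, 0), (x0, y0)))  # [[11, 12], [21, 0]] (1, 1) (2, 1)
```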

[0035] In one possible implementation, the extracted image data of the visible region is input into a pre-established face mesh keypoint detection model. For example, as shown in Figure 2, this model uses a deep convolutional neural network to analyze the fine topological structure of the face and can annotate 32 points on the human body, including the eyes and mouth. The coordinates of the face-related points among these 32, such as the eyes, mouth, and nose, are then extracted to form the set of facial key point coordinates.

[0036] In one possible implementation, to eliminate redundant interference in the model output, the system uses non-maximum suppression to streamline the set of facial keypoint coordinates. Specifically, when the model reports multiple overlapping candidate points in the same region, the algorithm calculates the overlap of each point and takes 1 - overlap as its confidence score. With a first preset threshold of 0.8, a predicted point at the left corner of the eye with a confidence score of only 0.65 is considered unreliable and filtered out; only keypoints whose confidence score exceeds the first preset threshold are retained in the filtered set of facial keypoint coordinates. Next, to address flickering in the video stream, multi-frame verification is performed on the filtered set using a sliding window of three consecutive frames. Suppose the nose tip is detected at (960, 540) in frame 100; the system then checks the corresponding positions in frames 101 and 102. If the nose tip is found at (962, 541) and (961, 539) respectively, that is, within a small neighborhood of the original position, the key point is temporally stable and its coordinates are accepted as valid. Conversely, if no corresponding point is detected at this location in frame 101, or the coordinates jump abruptly to (1200, 800), the detection in frame 100 is treated as a false alarm and discarded. Finally, the valid coordinates are output as the key point coordinates of the target object. This process keeps the output data highly usable, supporting subsequent special-effect overlay or identity verification and improving the accuracy of conference video analysis.
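The confidence filter and the three-frame check in [0036] can be sketched as follows. The 0.8 threshold and the three-frame window come from the text; the 5-pixel neighborhood radius is a hypothetical value, the confidence scores are taken as already computed (the 1 - overlap scoring is not reproduced), and list indices 0 to 2 stand in for frames 100 to 102.

```python
import math

CONF_THRESHOLD = 0.8  # first preset threshold from the example above
WINDOW = 3            # frames t, t+1, t+2
RADIUS = 5.0          # hypothetical neighborhood radius in pixels


def filter_by_confidence(points):
    """Keep (x, y) of keypoints (x, y, score) whose score exceeds the threshold."""
    return [(x, y) for x, y, s in points if s > CONF_THRESHOLD]


def is_temporally_stable(frames, t, point):
    """True if `point` from frame t reappears near the same spot in the next WINDOW-1 frames.

    `frames` is a list of per-frame lists of filtered (x, y) keypoints.
    """
    if t + WINDOW > len(frames):
        return False  # not enough later frames to verify yet
    px, py = point
    for k in range(1, WINDOW):
        if not any(math.hypot(x - px, y - py) <= RADIUS for x, y in frames[t + k]):
            return False  # missing or jumped: treat frame t as a false alarm
    return True


print(filter_by_confidence([(960, 540, 0.95), (300, 200, 0.65)]))  # [(960, 540)]
good = [[(960, 540)], [(962, 541)], [(961, 539)]]
jump = [[(960, 540)], [(1200, 800)], [(961, 539)]]
print(is_temporally_stable(good, 0, (960, 540)), is_temporally_stable(jump, 0, (960, 540)))  # True False
```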

[0037] It is worth noting that in step S4, the minimum horizontal coordinate value is obtained from the key point coordinates in the set of facial key point coordinates as the left boundary of the bounding box, the minimum vertical coordinate value is obtained as the upper boundary, the maximum horizontal coordinate value is obtained as the right boundary, and the maximum vertical coordinate value is obtained as the lower boundary to generate the bounding box of the target object. The area within the bounding box of the target object is defined as the target object area, and the area outside the target object area is defined as the non-target object area; preferably, for the convenience of subsequent operations, the boundary of the bounding box of the target object is also included in the target object area. The key point coordinates of the eyes and the key point coordinates of the mouth are selected from the target object region, the minimum bounding rectangle of these coordinates is calculated, and the key feature region is determined. The key feature region is excluded from the target object region to generate other regions.

[0038] Specifically, the system first defines geometric boundaries based on the keypoint set output from the previous steps, aiming to accurately isolate the customer's facial region from the complex background. By traversing the horizontal and vertical coordinates of all keypoints, the algorithm quickly identifies extreme values. For example, it uses the leftmost cheek point (x-coordinate 200) as the left boundary and the bottommost chin point (y-coordinate 800) as the lower boundary, thus constructing a compact bounding box for the target object. This operation strictly divides the video frame into the target object region and the non-target object region, ensuring that subsequent privacy calculations only apply to the face area and do not visually interfere with the food content in the background.
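The bounding-box rule in [0037] and [0038] is a min/max over keypoint coordinates, applied once to all facial keypoints for the target object region and once to the eye and mouth keypoints for the key feature region. In the sketch below only the cheek x = 200 and chin y = 800 values come from the example above; the other keypoint names and coordinates are hypothetical, and a real face model would return several points per eye and for the mouth.

```python
def bbox(points):
    """Axis-aligned box (left, top, right, bottom) from an iterable of (x, y) points."""
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


def split_face(keypoints, feature_names=("left_eye", "right_eye", "mouth")):
    """Return (target_box, key_feature_box) from a dict name -> (x, y)."""
    target_box = bbox(keypoints.values())                   # all facial keypoints
    key_box = bbox(keypoints[n] for n in feature_names)     # eyes and mouth only
    return target_box, key_box


keypoints = {
    "left_cheek": (200, 500), "right_cheek": (420, 510),
    "forehead": (310, 380), "chin": (315, 800),
    "left_eye": (260, 520), "right_eye": (360, 518), "mouth": (312, 700),
}
target_box, key_box = split_face(keypoints)
print(target_box, key_box)  # (200, 380, 420, 800) (260, 518, 360, 700)
```

The "other regions" are then the pixels inside `target_box` but outside `key_box`.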

[0039] Preferably, in step S4, the key feature region is divided into multiple small squares, the average value of the pixels in each small square is calculated, and the average value is assigned to all pixels in the small square to obtain the pixelated key feature region. For the other regions, Gaussian blur is applied to each pixel and its neighboring pixels, with the kernel size set to 5x5 and the standard deviation to 1, to generate Gaussian blurred other regions; The pixelated key feature region and the Gaussian blurred other regions are superimposed onto the corresponding positions of the target object region to generate a privacy processing region. Specifically, the number of pixels m1 in the key feature region and the number of pixels m2 in the target object region are obtained, and the weight b1 = m1 / m2 of the pixelated key feature region is calculated; the number of pixels m3 in other regions is obtained, and the weight b2 = m3 / m2 of the Gaussian blurred other regions is calculated; for the image of the target object region, the sub-region corresponding to the key feature region is extracted using OpenCV's ROI operation, and the pixelated key feature region is superimposed on this sub-region using OpenCV's addWeighted function with a weight of b1 to obtain a transition image; for the transition image, the sub-region corresponding to other regions is extracted using OpenCV's ROI operation, and the Gaussian blurred other regions are superimposed on this sub-region using OpenCV's addWeighted function with a weight of b2 to obtain the privacy processing region.
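Because the other regions are defined as the target object region minus the key feature region, the two weights in [0039] always sum to one. For reference, `cv2.addWeighted` computes the per-pixel linear combination below (saturated for 8-bit images); the patent gives only the first weight for each call, so the second weight β is left open here. The pixel counts in the last line are a hypothetical example.

```latex
b_1 = \frac{m_1}{m_2}, \qquad b_2 = \frac{m_3}{m_2}, \qquad m_3 = m_2 - m_1
\;\Longrightarrow\; b_1 + b_2 = \frac{m_1 + (m_2 - m_1)}{m_2} = 1

\mathrm{dst}(x, y) = \alpha\,\mathrm{src}_1(x, y) + \beta\,\mathrm{src}_2(x, y) + \gamma

m_1 = 1000,\; m_2 = 4000 \;\Longrightarrow\; b_1 = 0.25,\; b_2 = \frac{3000}{4000} = 0.75
```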

[0040] For example, to preserve some interactive semantics while protecting privacy, and to balance privacy protection against visual quality, the system applies layered processing within the target object region. First, based on the key point indices of the eyes and mouth, the system calculates the smallest bounding rectangle that completely covers these highly recognizable areas and defines it as the key feature region. Pixelation is then applied to this region: the rectangle is divided into small squares, such as a 10x10 pixel grid, the arithmetic mean of the RGB values of all pixels within each square is calculated, and that mean fills the entire square. This erases biometric features such as iris texture and lip lines, making it impossible to identify the specific person. For the remaining parts of the target object region after the key feature region is removed, such as the forehead, cheeks, and sides of the nose, the system applies a Gaussian blur with a 5x5 convolution kernel and a standard deviation of 1, computing a weighted average of each pixel and its neighborhood. This smooths skin texture and hides moles, scars, or wrinkles that could be used for identification, while preserving the overall contour and three-dimensionality of the face. Finally, the pixelated facial features and blurred skin are overlaid back at their original positions, producing a privacy-preserving image that cannot be used to identify individuals yet still conveys the basic dining-room scene, thereby improving data security.
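The block-mean pixelation can be sketched as below for a single grayscale channel; an RGB image would be processed per channel. The text uses a 10x10 grid; the toy call uses 2x2 tiles so the result is easy to check, and averaging partial tiles at the right or bottom edge over the pixels they contain is an assumption. A common OpenCV shortcut for the same effect is downscaling with `cv2.INTER_AREA` and upscaling back with `cv2.INTER_NEAREST`, although tile alignment can differ when the region size is not a multiple of the tile size.

```python
def pixelate(region, block=10):
    """Replace each block x block tile with the mean of its pixels (list of rows)."""
    h, w = len(region), len(region[0])
    out = [row[:] for row in region]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))   # partial tiles at the edges
            xs = range(bx, min(bx + block, w))
            mean = sum(region[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean             # fill the whole tile with its mean
    return out


print(pixelate([[0, 4, 8, 8], [4, 8, 8, 8]], block=2))
# [[4.0, 4.0, 8.0, 8.0], [4.0, 4.0, 8.0, 8.0]]
```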

[0041] Optionally, in step S4, the processing unit combines Kalman filtering and the Hungarian algorithm to track the moving target object region and update the coordinates corresponding to the target object region.

[0042] Specifically, in step S4, tracking the moving target object region includes the following. For each target object region, the boundary coordinates of its bounding box are obtained. A Kalman filter is used to initialize the filter state and predict the position in the next frame, giving a predicted coordinate set. In this embodiment, this operation is performed independently for each target object region, so every target object region can be tracked in a scene containing several of them. The overlap between the predicted coordinate set and the boundary coordinates of every target object bounding box detected in the current frame is calculated to construct a cost matrix. Among the overlaps in the cost matrix that exceed a second preset threshold, the bounding box detected in the current frame with the largest overlap is selected, and its boundary coordinates are taken as the successfully matched coordinates. The successfully matched coordinates are then input to the Kalman filter correction step, which updates the predicted coordinate set to obtain a corrected coordinate set. The position of the target object region is adjusted based on the corrected coordinate set to generate an updated bounding box, and the updated target object region is determined from the updated bounding box.

[0043] In one embodiment, to ensure the continuity of privacy protection as guests move during the data collection process, the system introduces a dynamic tracking mechanism for each individual target area.

[0044] The system first acquires the bounding box coordinates of the current target object region (such as the face of a customer picking up food), and initializes its filtering state using a Kalman filter. This state includes not only the current position information but also motion vectors such as velocity and acceleration. As the customer's face moves from left to right in the frame, the filter predicts its possible position in the next frame based on the state of the previous frame, thus obtaining a set of predicted coordinates. This prediction mechanism can estimate the approximate range of the face before the detection algorithm completes scanning a new frame, laying the foundation for subsequent rapid matching and effectively addressing potential frame drops or delays during video transmission.
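The prediction step in [0044] can be sketched with a constant-acceleration Kalman model that keeps one state [position, velocity, acceleration] per bounding-box coordinate. The time step, the initial covariance of 10 and the process noise q are illustrative assumptions, not values from the source. The function names are hypothetical.

```python
import numpy as np


def transition_matrix(dt: float = 1.0) -> np.ndarray:
    """Constant-acceleration model: p' = p + v*dt + a*dt^2/2, v' = v + a*dt, a' = a."""
    return np.array([[1.0, dt, 0.5 * dt * dt],
                     [0.0, 1.0, dt],
                     [0.0, 0.0, 1.0]])


def init_state(box):
    """box = (x1, y1, x2, y2); velocity and acceleration start at zero (assumption)."""
    x = np.zeros((4, 3))
    x[:, 0] = box
    P = np.tile(np.eye(3) * 10.0, (4, 1, 1))   # one 3x3 covariance per coordinate
    return x, P


def predict(x: np.ndarray, P: np.ndarray, dt: float = 1.0, q: float = 1e-2):
    """Kalman prediction for all four coordinates: x' = F x, P' = F P F^T + Q."""
    F = transition_matrix(dt)
    x_pred = x @ F.T                      # applies F to each coordinate's state row
    P_pred = F @ P @ F.T + q * np.eye(3)  # broadcasts over the four covariances
    return x_pred, P_pred


# A face box moving right and down at 10 px/frame, accelerating by 2 px/frame^2.
state, cov = init_state((100.0, 50.0, 140.0, 90.0))
state[:, 1] = 10.0
state[:, 2] = 2.0
pred_state, pred_cov = predict(state, cov)
predicted_box = pred_state[:, 0]          # the "predicted coordinate set"
```

The predicted covariance grows at each prediction, so the filter trusts a lone prediction less until a matched detection corrects it.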

[0045] For example, to verify the prediction and locate the actual target, the system compares the predicted coordinate set with the bounding boxes of all target objects actually detected in the current frame, that is, with all faces actually detected in the visible region of the current frame. This comparison uses a cost matrix whose core metric is the Intersection over Union (IoU). Suppose the predicted coordinate set lies in the left part of the image and three face regions are detected in the current frame. The system calculates the IoU between the predicted coordinate set and each of the three detected bounding boxes. Suppose the IoU with one bounding box is 0.85, the IoU with the other two is only 0.1 and 0, and 0.85 exceeds the second preset threshold (e.g., 0.5). The system then determines that the high-overlap bounding box is the successful match. This filtering logic keeps each tracking association unique and accurate.
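The cost matrix and threshold test in [0042] and [0045] can be sketched in pure Python, with boxes assumed to be (x1, y1, x2, y2) tuples. The source pairs Kalman filtering with the Hungarian algorithm. This sketch uses a simpler greedy pass over the IoU matrix instead: pairs are visited from the highest IoU down, each prediction and detection is used at most once, and only IoU values strictly above the threshold are accepted. An optimal one-to-one assignment could be obtained by running `scipy.optimize.linear_sum_assignment` on 1 - IoU. Function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_matrix(predicted, detected):
    """Rows: predicted boxes; columns: boxes detected in the current frame."""
    return [[iou(p, d) for d in detected] for p in predicted]


def greedy_match(predicted, detected, threshold=0.5):
    """Return {prediction index: detection index}, highest IoU first, one-to-one."""
    matrix = iou_matrix(predicted, detected)
    pairs = sorted(
        ((matrix[i][j], i, j) for i in range(len(predicted)) for j in range(len(detected))),
        reverse=True,
    )
    matches, used_det = {}, set()
    for score, i, j in pairs:
        if score <= threshold:
            break                       # all remaining pairs score no higher
        if i not in matches and j not in used_det:
            matches[i] = j
            used_det.add(j)
    return matches


# A scenario like [0045]: one prediction, three detected faces.
prediction = [(0, 0, 100, 100)]
faces = [(0, 0, 100, 85), (90, 0, 190, 100), (300, 300, 400, 400)]
scores = iou_matrix(prediction, faces)[0]   # approximately [0.85, 0.053, 0.0]
matched = greedy_match(prediction, faces)   # {0: 0}
```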

[0046] Specifically, after the successfully matched coordinates are determined, the system updates the predicted coordinate set using the Kalman filter's correction equations. Raw detections can jump when lighting changes or occlusion occurs, so using the detected coordinates directly could make the privacy mask jitter in the image. Given the matched observation, the filter weights the predicted and observed values according to their covariances to compute an optimal estimate, i.e., the corrected coordinate set. For example, if the predicted x-coordinate of the face center is 305 and the detected value is 300, the filter might output 302 as the corrected position, smoothing the motion trajectory.
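For a single coordinate, the correction in [0046] reduces to the scalar Kalman update. The gain K = P / (P + R) sets how far the estimate moves from the prediction toward the measurement. The variances below are illustrative assumptions chosen to reproduce the 305 / 300 / 302 example: P = 3 for the prediction and R = 2 for the detection. The source does not state them. The function name is hypothetical.

```python
def kalman_correct_1d(predicted: float, predicted_var: float,
                      measured: float, measured_var: float):
    """Scalar Kalman correction: returns (corrected estimate, corrected variance)."""
    gain = predicted_var / (predicted_var + measured_var)
    estimate = predicted + gain * (measured - predicted)
    variance = (1.0 - gain) * predicted_var
    return estimate, variance


x_corrected, var_corrected = kalman_correct_1d(305.0, 3.0, 300.0, 2.0)
# gain = 3 / (3 + 2) = 0.6, so x = 305 + 0.6 * (300 - 305) = 302
```

The corrected variance (about 1.2) is smaller than both input variances, consistent with the smoothing described in [0046].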

[0047] Ultimately, the system adjusts the position of the target area based on the corrected coordinate set, so that subsequent privacy processing (such as blurring or mosaic) can closely and smoothly follow the guest's movement, avoiding the risk of privacy leakage due to target loss or positioning deviation, and greatly improving visual stability and security.

[0048] Preferably, in step S5, the number of pixels n1 in the non-target object region and the number of pixels n2 in the visible region image are obtained, and the weight of the non-target object region is calculated as a1 = n1 / n2. The number of pixels n3 in the target object region is obtained, and the weight of the privacy processing region is calculated as a2 = n3 / n2. For the visible region image, the sub-region corresponding to the non-target object region is obtained through a region extraction operation, and the non-target object region is superimposed on this sub-region by weighted fusion with weight a1 to obtain a first intermediate image. Specifically, OpenCV's ROI operation extracts the sub-region corresponding to the non-target object region from the visible region image, and OpenCV's addWeighted function superimposes the non-target object region on it with weight a1. For the first intermediate image, the sub-region corresponding to the target object region is obtained through a region extraction operation, and the privacy processing region is superimposed on this sub-region by weighted fusion with weight a2 to obtain an updated visible region image. Specifically, OpenCV's ROI operation extracts the sub-region corresponding to the target object region from the first intermediate image, and OpenCV's addWeighted function superimposes the privacy processing region on it with weight a2.
The number of pixels n4 in the original visible region image and the number of pixels n5 in the original image are obtained, and the weight of the updated visible region image is calculated as a3 = n4 / n5. The number of pixels n6 in the invisible region image is obtained, and the weight of the invisible region image is calculated as a4 = n6 / n5. For the original image, the sub-region corresponding to the updated visible region image is obtained through a region extraction operation, and the updated visible region image is superimposed on this sub-region by weighted fusion with weight a3 to obtain a second intermediate image. Specifically, OpenCV's ROI operation extracts the sub-region corresponding to the original visible region image from the original image, and OpenCV's addWeighted function superimposes the updated visible region image on it with weight a3. For the second intermediate image, the sub-region corresponding to the invisible region image is obtained through a region extraction operation, and the invisible region image is superimposed on this sub-region by weighted fusion with weight a4 to obtain the final fused image. Specifically, OpenCV's ROI operation extracts the sub-region corresponding to the invisible region image from the second intermediate image, and OpenCV's addWeighted function superimposes the invisible region image on it with weight a4 to obtain the fused image.
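The two-stage fusion of [0048] can be sketched with NumPy. The sketch assumes full-frame layers and boolean masks in place of rectangular ROIs, since the regions need not be rectangles. It also assumes each stated weight w is applied as w*layer + (1 - w)*existing pixels, the arithmetic of `cv2.addWeighted(layer, w, roi, 1 - w, 0)`. This weight pairing, the mask representation and the helper names are hypothetical.

```python
import numpy as np


def ratio(mask: np.ndarray, parent: np.ndarray) -> float:
    """Pixel-count weight, e.g. a1 = n1 / n2."""
    return np.count_nonzero(mask) / np.count_nonzero(parent)


def blend_masked(base, layer, mask, weight):
    """Inside `mask`: weight*layer + (1 - weight)*base; outside: base unchanged."""
    out = base.astype(np.float32)
    out[mask] = weight * layer.astype(np.float32)[mask] + (1.0 - weight) * out[mask]
    return np.clip(np.rint(out), 0, 255).astype(base.dtype)


def fuse(original, visible, target, non_target_layer, privacy_layer, invisible_layer):
    """Step S5 sketch; `target` is assumed to lie inside `visible`."""
    non_target = visible & ~target
    invisible = ~visible
    frame = np.ones(visible.shape, dtype=bool)
    a1, a2 = ratio(non_target, visible), ratio(target, visible)
    a3, a4 = ratio(visible, frame), ratio(invisible, frame)
    first = blend_masked(original, non_target_layer, non_target, a1)   # first intermediate
    updated_visible = blend_masked(first, privacy_layer, target, a2)
    second = blend_masked(original, updated_visible, visible, a3)      # second intermediate
    fused = blend_masked(second, invisible_layer, invisible, a4)
    return fused, (a1, a2, a3, a4)


original = np.full((10, 10, 3), 100, dtype=np.uint8)
visible = np.zeros((10, 10), dtype=bool)
visible[:, :5] = True                 # left half visible: n2 = n4 = 50, n5 = 100
target = np.zeros((10, 10), dtype=bool)
target[:2, :5] = True                 # target object region: n3 = 10
fused, weights = fuse(original, visible, target,
                      non_target_layer=original.copy(),
                      privacy_layer=np.zeros_like(original),
                      invisible_layer=np.full_like(original, 50))
```

With these inputs the target covers 10 of the 50 visible pixels, so a1 = 0.8, a2 = 0.2 and a3 = a4 = 0.5. In this sketch a blended pixel keeps part of its previous value. A target pixel becomes 0.2 x 0 + 0.8 x 100 = 80 after the second blend and 0.5 x 80 + 0.5 x 100 = 90 in the fused frame.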

[0049] Beneficial effects: the system establishes a fusion baseline by counting the pixels in each functional region.

[0050] For example, if the non-target object region accounts for 80% of the pixels in the visible region, its weight a1 = n1 / n2 is 0.8. Because the target and non-target object regions together make up the visible region, the weight a2 of the privacy processing region is then 0.2. Weighting by pixel count gives the subsequent image synthesis a quantifiable proportional basis and helps ensure smooth transitions between the different visual layers during fusion.

[0051] In one embodiment, the system uses OpenCV's Region of Interest (ROI) operation to locate the spatial coordinates of the non-target object region precisely. Using a weighted image fusion operator, it linearly superimposes the original background sub-region onto the preprocessed background layer. The weight a1 determines how much background information is preserved, and the resulting first intermediate image forms the base layer of the dining table background. The system then delineates the target object region in this first intermediate image and superimposes the privacy-desensitized face image onto it with weight a2. This layered superposition embeds the privacy-processed region naturally into the background, forming the updated visible region image.

[0052] To complete the global fusion, the system uses the ratio between the visible and invisible regions. Following the same logic, it overlays the updated visible region and the invisible region with their corresponding weights. The resulting fused image protects the guests' privacy while maintaining the overall structural integrity of the video frame.

[0053] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. For those skilled in the art, various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention, and these variations still fall within the protection scope of the present invention.

Claims

1. An image privacy processing method based on region partitioning, characterized in that it includes the following steps: S1: an image acquisition device acquires an original image and transmits it to a processing unit; S2: the processing unit loads a region segmentation model to perform pixel-level segmentation on the original image and generate a region mask, the region mask marking a visible region and an invisible region, and performs global blurring on the invisible region; S3: the image corresponding to the visible region is used as the input to a key point detection model, and the key point detection model outputs the key point coordinates of a target object; S4: the processing unit determines a target object region and a non-target object region based on the key point coordinates, divides the target object region into a key feature region and other regions, applies pixelation processing to the key feature region and Gaussian blur processing to the other regions, and generates a privacy processing region; S5: the processing unit fuses the non-target object region and the privacy processing region to obtain an updated visible region image, and then fuses the updated visible region image with the invisible region image to obtain a fused image.

2. The image privacy processing method based on region partitioning according to claim 1, characterized in that, in step S2, a fixed region boundary is defined during the camera initialization phase; the processing unit applies the defined region boundary to segment the original image into an initial visible static region of interest and an initial invisible static region of interest; image frames are acquired from the initial visible static region of interest; the acquired image frames are input into a pre-trained U-Net model for pixel-level segmentation, and the model outputs the region mask; the corresponding pixel region in the original image is marked as the visible region using the region mask; and the regions of the original image outside the visible region are marked as the invisible region, which includes the initial invisible static region of interest and the parts of the initial visible static region of interest not marked by the region mask.

3. The image privacy processing method based on region partitioning according to claim 1, characterized in that, in step S2, the region segmentation model applies morphological operations, including dilation and erosion, to the region mask to optimize its boundaries.

4. The image privacy processing method based on region partitioning according to claim 3, characterized in that, in step S2, the morphological operation includes: dilating the region mask with a preset structuring element to obtain a dilated mask; eroding the dilated mask with the same preset structuring element to obtain an eroded mask; and using the eroded mask as the final region mask.
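The dilation-then-erosion sequence of claim 4 is a morphological closing, which fills small holes and gaps in the mask while leaving its outline largely unchanged. The NumPy sketch below assumes a square k x k structuring element (k = 3 here). It treats pixels outside the image as background for dilation and as foreground for erosion. With OpenCV the equivalent would typically be `cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)`. The function names are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary dilation with a k x k square structuring element."""
    r = k // 2
    padded = np.pad(mask.astype(bool), r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary erosion with the same k x k square structuring element."""
    r = k // 2
    padded = np.pad(mask.astype(bool), r, mode="constant", constant_values=True)
    h, w = mask.shape
    out = np.ones((h, w), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def close_mask(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Dilate, then erode with the same element (morphological closing)."""
    return erode(dilate(mask, k), k)


mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True
mask[4, 4] = False                    # a one-pixel hole left by the segmentation model
final_mask = close_mask(mask)
```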

5. The image privacy processing method based on region partitioning according to claim 1, characterized in that, in step S3, image data corresponding to the visible region is obtained and input into a pre-established key point detection model, which outputs a set of face key point coordinates; the key point detection model is a human pose estimation key point detection model.

6. The image privacy processing method based on region partitioning according to claim 1, characterized in that, in step S4, the minimum x-coordinate among the coordinates in the face key point coordinate set is taken as the left boundary of the bounding box, the minimum y-coordinate as the upper boundary, the maximum x-coordinate as the right boundary, and the maximum y-coordinate as the lower boundary, generating the bounding box of the target object; the region within the bounding box of the target object is defined as the target object region, and the region outside the target object region is defined as the non-target object region; the eye key point coordinates and the mouth key point coordinates are selected from the target object region, and their minimum bounding rectangle is calculated to determine the key feature region; and the key feature region is excluded from the target object region to generate the other regions.
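The boundary rules of claim 6 reduce to coordinate-wise minima and maxima. In this pure-Python sketch, keypoints are assumed to arrive as a dictionary from name to (x, y). The keypoint names and coordinates are hypothetical. The other regions are represented only by their area (target box area minus key feature box area).

```python
def bounding_box(points):
    """(left, top, right, bottom) from the min/max of the x and y coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_area(box):
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


def split_target(keypoints, feature_names):
    """Return (target box, key feature box, area of the other regions)."""
    target = bounding_box(keypoints.values())
    key_feature = bounding_box([keypoints[n] for n in feature_names])
    # The feature points are a subset of all keypoints, so the key feature box
    # always lies inside the target box and the subtraction is valid.
    other_area = box_area(target) - box_area(key_feature)
    return target, key_feature, other_area


keypoints = {
    "left_eye": (40, 50), "right_eye": (70, 52), "nose": (55, 65),
    "mouth_left": (45, 80), "mouth_right": (68, 82),
    "left_ear": (25, 60), "chin": (56, 100),
}
target_box, key_box, other_area = split_target(
    keypoints, ["left_eye", "right_eye", "mouth_left", "mouth_right"])
```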

7. The image privacy processing method based on region partitioning according to claim 6, characterized in that, in step S4, the key feature region is divided into multiple small squares, and the mean pixel value of each square is calculated and assigned to all pixels in that square to obtain the pixelated key feature region; Gaussian blur is applied to each pixel of the other regions and its neighboring pixels to generate the Gaussian-blurred other regions; and the pixelated key feature region and the Gaussian-blurred other regions are superimposed at the corresponding positions of the target object region to generate the privacy processing region.

8. The image privacy processing method based on region partitioning according to claim 6, characterized in that, in step S4, the processing unit combines Kalman filtering and the Hungarian algorithm to track the moving target object region and update the coordinates corresponding to the target object region.

9. The image privacy processing method based on region partitioning according to claim 8, characterized in that, in step S4, tracking the moving target object region includes: for each target object region, obtaining the boundary coordinates of its bounding box, and using a Kalman filter to initialize the filter state and predict the position in the next frame to obtain a predicted coordinate set; calculating the overlap between the predicted coordinate set and the boundary coordinates of every target object bounding box detected in the current frame to construct a cost matrix; among the overlaps in the cost matrix that exceed a second preset threshold, selecting the boundary coordinates of the bounding box detected in the current frame with the largest overlap as the successfully matched coordinates; inputting the successfully matched coordinates to the Kalman filter correction step to update the predicted coordinate set and obtain a corrected coordinate set; adjusting the position of the target object region based on the corrected coordinate set to generate an updated bounding box; and determining the updated target object region based on the updated bounding box.

10. The image privacy processing method based on region partitioning according to claim 1, characterized in that, in step S5, the number of pixels n1 in the non-target object region and the number of pixels n2 in the visible region image are obtained, and the weight of the non-target object region is calculated as a1 = n1 / n2; the number of pixels n3 in the target object region is obtained, and the weight of the privacy processing region is calculated as a2 = n3 / n2; for the visible region image, the sub-region corresponding to the non-target object region is obtained through a region extraction operation, and the non-target object region is superimposed on this sub-region by weighted fusion with weight a1 to obtain a first intermediate image; for the first intermediate image, the sub-region corresponding to the target object region is obtained through a region extraction operation, and the privacy processing region is superimposed on this sub-region by weighted fusion with weight a2 to obtain the updated visible region image; the number of pixels n4 in the original visible region image and the number of pixels n5 in the original image are obtained, and the weight of the updated visible region image is calculated as a3 = n4 / n5; the number of pixels n6 in the invisible region image is obtained, and the weight of the invisible region image is calculated as a4 = n6 / n5; for the original image, the sub-region corresponding to the updated visible region image is obtained through a region extraction operation, and the updated visible region image is superimposed on this sub-region by weighted fusion with weight a3 to obtain a second intermediate image; and for the second intermediate image, the sub-region corresponding to the invisible region image is obtained through a region extraction operation, and the invisible region image is superimposed on this sub-region by weighted fusion with weight a4 to obtain the final fused image.