A method and system for location privacy protection of road infrastructure images
The method combines extracted building edges with the non-building parts of an image and uses a generator and a style learning network to generate fake images with different levels of privacy protection. This addresses the insufficient utility retention and low image quality of existing technologies, and achieves effective protection of building location privacy with controllable strength.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2025-02-28
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies for protecting building location privacy suffer from insufficient utility retention, low image quality, and a lack of control over the strength of privacy protection, making it difficult to achieve a balance between preserving image utility and improving the strength of privacy protection.
By extracting combined images of building edges and non-building parts, fake images with different levels of privacy protection are generated using a generator and style learning network. The privacy protection strength is controlled by adversarial training through a basic discriminator and a pixel-level semantic discriminator.
The generated fake images effectively protect the privacy of building locations, improve image authenticity, and enable flexible control over the strength of privacy protection, achieving a better balance between privacy and utility.
Figure CN120145448B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of visual image privacy protection, and relates to a method and system for protecting the location privacy of road and building images.
Background Technology
[0002] With the rapid development of the internet and the widespread deployment of deep learning models in daily life, image sharing and image-based mobile applications have become extremely common. People share the images they take through social media and online applications, and large numbers of images are collected and labeled for model training. This raises serious concerns about the risk of privacy leaks, as users' personal identities, social relationships, location, and other private information are likely to be leaked along with the image content. Location privacy is particularly critical: the leakage of personal location information can threaten personal safety, and criminals may use the obtained location information for criminal activities. Furthermore, the leakage of location information may also expose other private information, such as personal habits and social circles.
[0003] In research on location privacy, traditional location-based services (LBS) typically require users to explicitly share their location information to access related services (such as navigation and social media check-ins), making it a focal point of location privacy protection research. Much work has been done to protect user location privacy in LBS. However, compared to LBS, the potential for location privacy breaches in image content is often overlooked or underestimated. While images don't contain explicit geographic coordinates like LBS, the image content itself can still contain a wealth of exploitable geographic information. Buildings play a significant role in image location privacy breaches, especially in urban street view environments. Cities are densely populated with buildings and numerous landmarks, containing a wealth of location information. Without any protective measures, attackers can easily deduce the photographer's location through manual analysis or geolocation techniques. With more images, attackers may even infer more sensitive information, such as users' home addresses, routes, and behavioral habits. With the rapid development of machine learning technology, location inference methods have become more diverse and accurate, presenting new challenges to location privacy protection.
[0004] In this context, the importance of building location privacy protection becomes increasingly apparent. However, privacy protection and effective use of data are often in tension: strict privacy protection inevitably reduces the service quality of the data. Therefore, any building location privacy protection method must be designed to preserve the utility information in the image (so as not to affect normal use of the image) while ensuring sufficient location privacy. Some studies have attempted to protect the location privacy of buildings while preserving the utility objects in the image. Xiong et al. were the first to address location privacy protection for images captured by the cameras of autonomous vehicles. They used generative adversarial networks (GANs) to generate images that eliminate privacy-sensitive building objects while maintaining the utility of other valuable objects. They mainly used the structural similarity index (SSIM) and L1 distance as utility and privacy metrics, and used them as loss functions to guide GAN training. However, such metrics are coarse, and the quality of the generated buildings is not high. They therefore further proposed the ADGAN-I and ADGAN-II models, which no longer use SSIM and L1 distance but instead use semantic accuracy as the metric for utility and privacy, with a pre-trained FCN semantic segmentation model as the guiding model. ADGAN-I uses a generator built on a U-Net network to generate noise, which is added to the original image to create fake images. This method retains high image quality, but the changes to buildings are barely visible to the human eye, so it can only resist location inference attacks by machine models. ADGAN-II combines generative adversarial networks and variational autoencoders (VAEs) to directly generate fake street view building images. The images generated in this way have a stronger protective effect, but at the cost of greater utility loss.
[0005] Existing methods mostly reduce the risk of privacy breaches by removing location-related information from images. Specifically, they use generative models to eliminate privacy-sensitive objects in images while maintaining the usability of utility objects, so that the processed images can still be used in practical applications. However, some problems remain to be solved:
[0006] 1. Insufficient utility retention. Modifications to other utility objects while processing privacy objects reduce the overall usability of the image.
[0007] 2. Low quality of generated images. Although existing methods typically use high-quality generative models such as GANs and VAEs, interference from privacy processing makes the generated images significantly less realistic than images generated without privacy protection.
[0008] 3. Current location privacy protection methods lack consideration for controlling the strength of privacy protection. When using these models for privacy processing, users can only choose to protect or not protect, and there is no room for users to customize the strength of privacy protection, which severely limits the applicable scenarios of the models.
Summary of the Invention
[0009] To more effectively address the issue of location privacy leakage in road and building images, this invention proposes a method and system for protecting the location privacy of road and building images. First, edges are extracted from the original buildings and combined with the remaining non-building image content; the result is fed into the encoder of the generator to induce the generator to produce realistic images whose buildings differ in appearance from the originals. A skip connection is established between the generator's encoder and decoder. Second, the original road and building image is input into a style learning network; the resulting style statistics and the intermediate features output by the encoder undergo adaptive instance normalization, and the result is input into the decoder to obtain fake road and building images with varying degrees of style difference. Finally, a basic discriminator and a pixel-level semantic discriminator are used to distinguish the original real images from the fake images, and are trained adversarially against the generator.
[0010] To achieve the above objectives, the present invention provides the following technical solution:
[0011] In one aspect, a method for location privacy protection of road and building images is proposed, which includes the following steps:
[0012] S1. Extract the building edge image from the original road and building image $I_{real}$ and, using the building mask $M_b$, combine it with the remaining non-building portion to synthesize an edge composite image $I_B$;
[0013] S2. Input the edge composite image $I_B$ and the original image $I_{real}$ into the encoder $G_{down}$ of the generator to obtain the intermediate features $F_B$ and $F_{real}$, respectively;
[0014] S3. Take the mean and standard deviation of the intermediate features $F_{real}$ encoded from the original image $I_{real}$ as the minimum-difference style representation; input the original road and building image $I_{real}$ into a style learning network to generate a mean and standard deviation, and take these as the maximum-difference style representation;
[0015] S4. Use adaptive instance normalization to align the mean and standard deviation of the intermediate features $F_B$ of the edge composite image to the minimum-difference and maximum-difference styles, respectively, obtaining the aligned intermediate features $F_{min}$ and $F_{max}$;
[0016] S5. Input the aligned intermediate features $F_{min}$ and $F_{max}$ into the decoder of the generator for upsampling, generating fake road and building images $I_{min}$ and $I_{max}$ with the minimum and maximum privacy protection levels, respectively, and calculate the style difference loss;
[0017] S6. Use a multi-scale PatchGAN discriminator $D_b$ as the basic discriminator to judge whether the fake road and building images $I_{max}$, $I_{min}$ and the original real road and building image $I_{real}$ are real or fake, and calculate the adversarial loss;
[0018] S7. Calculate the correlation between building mask elements to reflect the relative positional distribution of building areas and non-building areas, and calculate the losses of the basic discriminator and the generator based on the mask-guided loss function;
[0019] S8. Use the pixel-level semantic discriminator $D_p$ to judge whether the fake road and building images $I_{max}$, $I_{min}$ and the original real road and building image $I_{real}$ are real or fake, and calculate the semantic discrimination loss;
[0020] S9. Backpropagate the calculated generator-related losses to iteratively train and update the generator parameters; with the trained generator, given a privacy protection strength $\alpha$, generate the required fake road and building image $I_\alpha$.
[0021] Furthermore, step S1 includes the following sub-steps:
[0022] S11. Use the Canny algorithm to extract the edge image of the buildings from the original road and building image. The extraction is expressed as follows:
[0023] $$B_\alpha = \mathrm{canny}(I_{real}, \alpha) \odot M_b$$
[0024] where $\odot$ represents the Hadamard product, $\alpha$ represents the privacy protection strength, and the value of $\alpha$ controls how complete the extracted building edges $B_\alpha$ are. When $\alpha$ is 0, the extracted edges are the most complete and are taken as the ground-truth edges, denoted $B_{gt}$: $B_{gt} = \mathrm{canny}(I_{real}, 0) \odot M_b$;
[0025] S12. Under the guidance of the building mask, construct the edge composite image $I_B$ from the extracted ground-truth edges $B_{gt}$ and the real street view image $I_{real}$:
[0026] $$I_B = B_{gt} \odot M_b + I_{real} \odot (1 - M_b).$$
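The compositing in S11 and S12 can be illustrated with a minimal NumPy sketch. Several things here are assumptions made for illustration only: the function names `toy_edges` and `edge_composite` are hypothetical, images are stored as H×W×C arrays in [0, 1], and a simple gradient-magnitude threshold stands in for the Canny detector, since the text does not say how $\alpha$ maps onto Canny's parameters.

```python
import numpy as np

def toy_edges(gray, alpha):
    """Stand-in for canny(I_real, alpha), NOT a real Canny detector.
    Thresholds the gradient magnitude; a larger alpha keeps fewer edges,
    so alpha = 0 gives the most complete edge map."""
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    threshold = alpha * magnitude.max()
    return (magnitude > threshold).astype(np.float64)

def edge_composite(image, mask, alpha=0.0):
    """I_B = B ⊙ M_b + I_real ⊙ (1 - M_b), with B = edges ⊙ M_b.
    image: (H, W, C) floats; mask: (H, W), 1 for building pixels."""
    gray = image.mean(axis=-1)
    edges = toy_edges(gray, alpha) * mask                 # B_alpha
    edges_rgb = np.repeat(edges[..., None], image.shape[-1], axis=-1)
    mask_rgb = mask[..., None]
    return edges_rgb * mask_rgb + image * (1.0 - mask_rgb)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
mask = np.zeros((8, 8))
mask[:, :4] = 1.0                                         # left half is "building"
composite = edge_composite(image, mask)
```

In the result, building pixels carry only binary edge information while non-building pixels are copied unchanged from the original image, which is the property the generator input relies on.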
[0027] Furthermore, step S3 includes the following sub-steps:
[0028] Minimum-difference style representation: along the channel and batch dimensions, calculate the mean $\mu_{min}$ and standard deviation $\sigma_{min}$ of the intermediate features $F_{real}$ encoded from the original image $I_{real}$:
[0029]
[0030] In the formula, $H$ and $W$ represent the height and width of the intermediate feature, respectively; $h$ and $w$ represent the indices of the elements of $F_{real}$ in the $H$ and $W$ dimensions, respectively;
[0031] Maximum-difference style representation: input the original image $I_{real}$ into a style learning network to generate a mean and standard deviation $(\mu_{max}, \sigma_{max})$ as the maximum-difference style representation of the image. The style learning network contains multiple convolutional modules and a final pooling layer; each convolutional module includes a convolutional layer, an activation function, and a batch normalization layer.
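As a rough illustration of the minimum-difference statistics, the sketch below computes one mean and one standard deviation per sample and per channel by reducing over the $H$ and $W$ axes. The function name `instance_stats`, the N×C×H×W layout, the use of the population variance, and the small stabilizing constant `eps` are assumptions; the original formula is not reproduced in the text.

```python
import numpy as np

def instance_stats(features, eps=1e-5):
    """Per-sample, per-channel mean and standard deviation over H and W.
    features: (N, C, H, W) -> two arrays of shape (N, C).
    eps is an assumed stabilizer, not taken from the source."""
    mean = features.mean(axis=(2, 3))
    var = features.var(axis=(2, 3))       # population variance over H*W elements
    return mean, np.sqrt(var + eps)

rng = np.random.default_rng(0)
f_real = rng.normal(size=(4, 16, 8, 8))   # random stand-in for encoder features
mu_min, sigma_min = instance_stats(f_real)
```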
[0032] Furthermore, step S4 includes the following sub-steps:
[0033] S41. Along the batch and channel dimensions, calculate the mean and standard deviation of the intermediate features $F_B$ encoded from the edge composite image $I_B$:
[0034]
[0035] S42. Perform style alignment to obtain the aligned intermediate features $F_{min}$ and $F_{max}$:
[0036]
[0037] In the formula, $(\mu_{min}, \sigma_{min})$ and $(\mu_{max}, \sigma_{max})$ represent the mean and standard deviation of the minimum-difference and maximum-difference styles, respectively.
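The alignment in S41 and S42 can be sketched with the commonly used form of adaptive instance normalization: normalize the content features per sample and channel, then rescale and shift them with the target statistics. This is an illustrative sketch under that assumption rather than the exact formula of the invention; `adain`, `eps`, and the random inputs are hypothetical.

```python
import numpy as np

def adain(content, target_mean, target_std, eps=1e-5):
    """Align per-channel statistics of `content` to the target statistics.
    content: (N, C, H, W); target_mean, target_std: (N, C)."""
    mean = content.mean(axis=(2, 3), keepdims=True)
    std = np.sqrt(content.var(axis=(2, 3), keepdims=True) + eps)
    normalized = (content - mean) / std
    return normalized * target_std[:, :, None, None] + target_mean[:, :, None, None]

rng = np.random.default_rng(0)
f_b = rng.normal(size=(4, 16, 8, 8))          # stand-in for F_B
mu_max = rng.random((4, 16))                  # stand-in for the learned mean
sigma_max = rng.random((4, 16)) + 0.5         # stand-in for the learned std
f_max = adain(f_b, mu_max, sigma_max)
```

After alignment, `f_max` keeps the spatial structure of `f_b` (the building edges and non-building content) while its channel-wise statistics match the target style.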
[0038] Furthermore, step S5 includes the following sub-steps:
[0039] S51. Input the intermediate features $F_{min}$ and $F_{max}$ into the decoder $G_{up}$ of the generator; upsampling them yields the privacy-protected fake road and building images $I_{min}$ and $I_{max}$, respectively:
[0040] $$I_{min} = G_{up}(F_{min}), \quad I_{max} = G_{up}(F_{max})$$
[0041] S52. Use the L1 distance between the Gram matrices of the two images' intermediate features in a pre-trained neural network to measure the style difference between the two images. With the objective that $I_{min}$ has a smaller style difference from the original image and $I_{max}$ has a larger style difference from the original image, the style difference loss function is:
[0042]
[0043] Here the loss defined above is the style difference loss, and $\phi_i(I)$ represents the feature map computed up to the $i$-th activation function after inputting $I$ into the pre-trained classification network VGG19, with shape $C_i \times H_i \times W_i$;
[0044] The Gram matrix is computed as follows:
[0045] $$\mathrm{Gram}(\phi_i(I)) = \phi_i(I)\,(\phi_i(I))^{T}$$
[0046] That is, $\phi_i(I)$ is flattened along the $H_i$ and $W_i$ dimensions into a $C_i \times H_i W_i$ matrix, and its Gram matrix is then calculated.
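The Gram-matrix comparison can be sketched as follows. Random arrays stand in for the VGG19 feature maps $\phi_i(I)$, and averaging the absolute differences is an assumed reduction (a sum would also qualify as an L1 distance); how the per-layer distances are combined into the full style difference loss is not shown here because the formula is not reproduced in the text.

```python
import numpy as np

def gram(feature_map):
    """feature_map: (C, H, W) -> (C, C) Gram matrix phi @ phi.T,
    after flattening the H and W axes into one axis."""
    channels = feature_map.shape[0]
    flat = feature_map.reshape(channels, -1)
    return flat @ flat.T

def gram_l1_distance(feat_a, feat_b):
    """Mean absolute difference between two Gram matrices (assumed reduction)."""
    return np.abs(gram(feat_a) - gram(feat_b)).mean()

rng = np.random.default_rng(0)
phi_real = rng.normal(size=(8, 16, 16))   # stand-in for phi_i(I_real)
phi_fake = rng.normal(size=(8, 16, 16))   # stand-in for phi_i(I_max)
distance = gram_l1_distance(phi_real, phi_fake)
```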
[0047] Furthermore, step S6 includes the following sub-steps:
[0048] S61. Input $I_{min}$, $I_{max}$, and $I_{real}$ into the basic discriminator for discrimination, calculate the adversarial loss on the basic discriminator, backpropagate, and update the parameters of the basic discriminator. The adversarial loss is:
[0049]
[0050] where $I_{fake}$ denotes the set of $I_{min}$ and $I_{max}$, and the loss above is the adversarial loss on the basic discriminator;
[0051] S62. Calculate the corresponding adversarial loss on the generator:
[0052]
[0053] where the loss above is the adversarial loss of the basic discriminator on the generator.
[0054] Furthermore, step S7 includes the following sub-steps:
[0055] S71. Given an image $I_{fake}$, denote the discrimination result obtained after inputting it into the basic discriminator as $f_I$. Select $N_s$ query elements in $f_I$; for each query element $p_i$, randomly sample $N_k$ elements and arrange them into a vector. The relevance between the query element and these elements is calculated as follows:
[0056]
[0057] The $N_s$ relevance queries are arranged into a relevance query set.
[0058] S72. Given the mask $M_I$ corresponding to the image $I_{fake}$, scaled to the same size as $I_{fake}$, calculate the relevance query set of the mask $M_I$.
[0059] S73. Based on the two relevance query sets, calculate the mask-guided loss:
[0060]
[0061]
[0062] where $S_c$ is the matrix obtained in the intermediate step and $p_{ij}$ is the element in the $i$-th row and $j$-th column of that matrix; that is, the mask-guided loss is the average of the elements of the matrix obtained by subtracting the two relevance query sets;
[0063] Based on the mask-guided loss and the adversarial loss of the basic discriminator, backpropagation is performed to iteratively train and update the parameters of the basic discriminator.
[0064] Furthermore, step S8 includes the following sub-steps:
[0065] S81. Given a fake road and building image $I_{fake}$, that is, either of the two types of images $I_{min}$ and $I_{max}$, input $I_{fake}$ into the pixel-level semantic discriminator for real/fake discrimination;
[0066] The pixel-level semantic discriminator adopts the U-Net architecture and performs a multi-class classification task on each pixel of the input image. The classification labels are: real building, fake building, real non-building, and fake non-building. The final output has the same spatial size as the input image. The cross-entropy between the discriminator's predictions and the ground-truth labels is used as the loss function, called the semantic discrimination loss, which is expressed as:
[0067]
[0068] where the loss above is the semantic discrimination loss on the pixel-level semantic discriminator; the discrimination result has size $4 \times H \times W$, where $H$ and $W$ are the height and width of $I_{real}$; and $t_{c,i,j}$ is the element at coordinates $(c, i, j)$ of the discrimination result, that is, the predicted probability that the pixel at coordinates $(i, j)$ of the input image belongs to class $c$;
[0069] Backpropagation is performed based on the calculated loss, and the parameters of the pixel-level semantic discriminator are iteratively trained and updated.
[0070] S82. Calculate the semantic discrimination loss on the generator:
[0071]
[0072] where the loss above is the semantic discrimination loss on the generator;
[0073] The generator parameters are iteratively trained and updated by backpropagating the calculated generator-related losses.
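The four-class, per-pixel cross-entropy used by the pixel-level semantic discriminator can be illustrated with the sketch below. The class numbering, the helper names `pixel_labels` and `pixel_cross_entropy`, and the mean reduction over pixels are assumptions for illustration; the exact loss formulas for the discriminator and the generator are not reproduced in the text.

```python
import numpy as np

# Class indices; this numbering is an assumption made for the sketch.
REAL_BUILDING, FAKE_BUILDING, REAL_OTHER, FAKE_OTHER = 0, 1, 2, 3

def pixel_labels(mask, is_real):
    """mask: (H, W), 1 for building pixels. Returns (H, W) integer class labels."""
    building = REAL_BUILDING if is_real else FAKE_BUILDING
    other = REAL_OTHER if is_real else FAKE_OTHER
    return np.where(mask > 0.5, building, other)

def pixel_cross_entropy(logits, labels):
    """logits: (4, H, W) raw scores; labels: (H, W). Mean per-pixel cross-entropy."""
    shifted = logits - logits.max(axis=0, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[None], axis=0)[0]
    return -picked.mean()

mask = np.zeros((4, 4))
mask[:2] = 1.0                                   # top half is "building"
uniform_logits = np.zeros((4, 4, 4))             # discriminator with no preference
loss_d_fake = pixel_cross_entropy(uniform_logits, pixel_labels(mask, is_real=False))
```

One plausible generator-side counterpart, again an assumption, is to evaluate the same cross-entropy on a fake image against the labels produced by `pixel_labels(mask, is_real=True)`, so that the generator is rewarded when its pixels are classified as real.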
[0074] Furthermore, in step S9, using the trained generator and given a privacy protection strength $\alpha$, $F_{min}$ and $F_{max}$ are interpolated in feature space using $\alpha$, and the interpolated features are then upsampled to obtain the target building image $I_\alpha$ with the specified level of privacy protection:
[0075] $$I_\alpha = G_{up}\big(\alpha F_{max} + (1-\alpha) F_{min}\big)$$
[0076] In the formula, $G_{up}(\cdot)$ denotes the upsampling operation.
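The feature-space interpolation can be sketched in a few lines. The function name `interpolate_features` is hypothetical, the restriction of $\alpha$ to [0, 1] is an assumption (the text does not state the range), and the decoder $G_{up}$ is omitted because it is a trained network.

```python
import numpy as np

def interpolate_features(f_min, f_max, alpha):
    """Blend aligned features: alpha * F_max + (1 - alpha) * F_min.
    alpha = 0 gives the minimum-difference style, alpha = 1 the maximum."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * f_max + (1.0 - alpha) * f_min

rng = np.random.default_rng(0)
f_min = rng.normal(size=(1, 16, 4, 4))    # stand-in for F_min
f_max = rng.normal(size=(1, 16, 4, 4))    # stand-in for F_max
f_half = interpolate_features(f_min, f_max, 0.5)
# In the full method, the decoder would then produce I_alpha = G_up(f_half).
```

Because only the interpolation weight changes at inference time, a single trained generator can serve any protection strength without retraining.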
[0077] In another aspect, a system for implementing the aforementioned method for location privacy protection of road and building images is also proposed, the system comprising:
[0078] The edge composite module, which extracts building edges and combines them with the non-building image content to form an edge composite image;
[0079] The generator module is used to generate fake road and building images with location privacy protected;
[0080] The mask guidance module is used to calculate the correlation between building mask elements, reflecting the relative positional distribution of building areas and non-building areas.
[0081] The basic discriminator module is used to distinguish between real and fake input images and to conduct adversarial training with the generator.
[0082] The pixel-level semantic discriminator module is used to distinguish between real and fake pixels of the input image, perform semantic classification of buildings, and conduct adversarial training with the generator.
[0083] The generator module contains the following sub-modules:
[0084] The encoding module performs multi-layer convolution on the input image to learn image features;
[0085] The decoding module decodes intermediate features and generates fake building images;
[0086] The style learning module contains multiple convolutional layers and one pooling layer. It takes an original road and building image as input and generates a mean and standard deviation $(\mu_{max}, \sigma_{max})$ as the maximum-difference style representation of the image;
[0087] The style difference loss calculation module uses the Gram matrix to calculate the style difference of an image, encouraging $I_{min}$ to have a small style difference from the original image and $I_{max}$ to have a large style difference from the original image.
[0088] The beneficial effects of this invention are as follows:
[0089] (1) The fake road and building images generated by the present invention significantly alter the content of the building images while retaining the non-building areas of the images, which can effectively protect the privacy of building locations and resist location inference attacks.
[0090] (2) By designing an additional pixel-level semantic discriminator and a mask-guided loss function, the present invention effectively improves the realism of the generated building images.
[0091] (3) This invention achieves control over the level of privacy protection. By setting different privacy protection strengths, it flexibly interpolates between the features of the minimum and maximum style differences to generate building styles that differ to varying degrees from the original building styles, thus achieving a better balance between privacy and utility.
[0092] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.
Description of the Drawings
[0093] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:
[0094] Figure 1 is a general flowchart of the method for protecting the location privacy of road and building images provided in an embodiment of the present invention.
[0095] Figure 2 is a detailed flowchart of the method for protecting the location privacy of road and building images provided in an embodiment of the present invention.
[0096] Figure 3 is a schematic diagram of a location privacy protection system for road and building images provided in an embodiment of the present invention.
[0097] Figure 4 is a schematic diagram illustrating the working principle of a location privacy protection system for road and building images provided in an embodiment of the present invention.
Detailed Implementation
[0098] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Unless otherwise specified, the following embodiments and features can be combined with each other.
[0099] The accompanying drawings are for illustrative purposes only and are schematic diagrams, not actual pictures. They should not be construed as limiting the invention. To better illustrate the embodiments of the invention, some parts in the drawings may be omitted, enlarged, or reduced, and do not represent the actual product dimensions. It is understandable to those skilled in the art that some well-known structures and their descriptions may be omitted in the drawings.
[0100] In the accompanying drawings of the embodiments of the present invention, the same or similar reference numerals correspond to the same or similar components. In the description of the present invention, it should be understood that if terms such as "upper," "lower," "left," "right," "front," and "rear" indicate the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, they are only for the convenience of describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, the terms used to describe positional relationships in the drawings are only for illustrative purposes and should not be construed as limiting the present invention. For those skilled in the art, the specific meaning of the above terms can be understood according to the specific circumstances.
[0101] Referring to Figures 1-4, this invention relates to a method and system for protecting the location privacy of road and building images.
[0102] Example 1
[0103] This embodiment provides specific steps for a method to protect the location privacy of road and building images. Figure 1 shows the overall steps of the method of the present invention, and Figure 2 shows its specific steps. The specific implementation steps of the present invention are illustrated by taking as an example the training of a generative adversarial network model on a street view image dataset containing 3000 training images. The goal is to modify the buildings in the input street view image, protecting the location privacy they contain, while leaving the image content of the non-building parts unchanged, thereby generating fake street and building images. The steps include:
[0104] Step S1: synthesize the edge composite image. Extract the building edge image from the original road and building image $I_{real}$ and, using the building mask $M_b$, combine it with the remaining non-building portion to obtain an edge composite image $I_B$. This includes the following sub-steps:
[0105] Step S1-1: Extract the edge image of the buildings from the original road and building image using the Canny algorithm. The extraction is expressed as follows:
[0106] $$B_\alpha = \mathrm{canny}(I_{real}, \alpha) \odot M_b$$
[0107] where $\odot$ represents the Hadamard product, $\alpha$ represents the privacy protection strength, and the value of $\alpha$ controls how complete the extracted building edges $B_\alpha$ are. When $\alpha$ is 0, the extracted edges are the most complete and are taken as the ground-truth edges, denoted $B_{gt}$: $B_{gt} = \mathrm{canny}(I_{real}, 0) \odot M_b$.
[0108] In this embodiment, let $I_{real}$ have a batch size of 4, 3 channels, and an image height and width of 512 and 512, respectively; the building mask $M_b$ is then:
[0109]
[0110] The ground-truth edges $B_{gt} = \mathrm{canny}(I_{real}, 0) \odot M_b$ are then calculated, giving:
[0111]
[0112] Step S1-2: Under the guidance of the building mask, construct the edge composite image $I_B$ from the ground-truth edges $B_{gt}$ extracted in step S1-1 and the real street view image:
[0113] $$I_B = B_{gt} \odot M_b + I_{real} \odot (1 - M_b)$$
[0114] In this embodiment, the constructed edge composite image is as follows:
[0115]
[0116] Step S2: downsampling and calculation of the minimum-difference style representation of the original building image. The edge composite image $I_B$ obtained in step S1 and the original image $I_{real}$ are input into the encoder $G_{down}$ of the generator, and the mean and standard deviation of the encoded intermediate features are calculated. The encoder downsamples the edge composite image $I_B$ and the original image $I_{real}$ to obtain the intermediate features $F_B$ and $F_{real}$, respectively.
[0117] In this embodiment, let:
[0118]
[0119] The encoder $G_{down}$ consists of 7 convolutional layers, each with a kernel size of (4,4), a stride of 2, and padding of 1. The intermediate features after convolution are...
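The spatial size produced by such an encoder can be checked with the standard convolution output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1. The sketch below assumes ordinary, non-dilated convolutions and a 512×512 input as in this embodiment; the helper name `conv_out` is hypothetical, and channel counts are left out because the text does not state them.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output length of one convolution along a single spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

# Trace the spatial size through 7 stride-2 layers, starting from 512.
sizes = [512]
for _ in range(7):
    sizes.append(conv_out(sizes[-1]))
```

Under these assumptions each layer halves the height and width, so the trace runs 512, 256, 128, 64, 32, 16, 8, 4.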
[0120] Step S3: calculation of the minimum-difference style representation and learning of the maximum-difference style representation.
[0121] Along the channel and batch dimensions, calculate the mean $\mu_{min}$ and standard deviation $\sigma_{min}$ of $F_{real}$, which together are called the minimum-difference style representation:
[0122]
[0123] In the formula, $H$ and $W$ represent the height and width of the intermediate feature, respectively; $h$ and $w$ represent the indices of the elements of $F_{real}$ in the $H$ and $W$ dimensions, respectively.
[0124] In this embodiment, we have:
[0125]
[0126] giving $\sigma_{min}$ = ([[0.6421, 0.0658, ..., 0.1819, 0.9962], ...]);
[0127]
[0128] giving $\mu_{min}$ = ([[0.1167, 0.8905, ..., 0.3577, 0.7530], ...]).
[0129] The original road and building image $I_{real}$ is input into a style learning network to generate a mean and standard deviation $(\mu_{max}, \sigma_{max})$, which serve as the maximum-difference style representation of the image. The style learning network consists of multiple convolutional modules and a final pooling layer. Each convolutional module includes a convolutional layer, an activation function, and a batch normalization layer.
[0130] In this embodiment, the input to the style learning network has size 4×3×512×512. It passes through two convolutional layers, each with a kernel size of (4,4), a stride of 2, and padding of 1, giving features of size 4×512×128×128. These features are then input into two separate convolutional layers to obtain two output features of size 4×512×64×64. Average pooling is performed along the W and H dimensions to obtain the mean and standard deviation of the maximum-difference style representation:
[0131] $\mu_{max}$ = ([[0.8664, 0.6567, ..., 0.7066, 0.6699], ...])
[0132] $\sigma_{max}$ = ([[0.8860, 0.2503, ..., 0.2748, 0.6551], ...])
[0133] Step S4: style alignment. Use adaptive instance normalization to align the mean and standard deviation of $F_B$ to $(\mu_{min}, \sigma_{min})$ and $(\mu_{max}, \sigma_{max})$, obtaining the aligned intermediate features $F_{min}$ and $F_{max}$.
[0134] Step S4-1: Along the batch and channel dimensions, calculate the mean and standard deviation of $F_B$.
[0135]
[0136] In this embodiment, the formula gives $\mu(F_B)$ = ([[0.3821, 0.3304, ..., 0.3751, 0.7162], ...]) and $\sigma(F_B)$ = ([[0.7427, 0.6046, ..., 0.8594, 0.2843], ...]).
[0137] Step S4-2: Perform style alignment to obtain the aligned intermediate features $F_{min}$ and $F_{max}$.
[0138]
[0139] In the formula, $(\mu_{min}, \sigma_{min})$ and $(\mu_{max}, \sigma_{max})$ represent the mean and standard deviation of the minimum-difference and maximum-difference styles, respectively.
[0140] In this embodiment, the following is calculated according to the formula:
[0141]
[0142] Step S5: generate fake road and building images and calculate the style difference loss. The aligned intermediate features $F_{min}$ and $F_{max}$ are fed into the decoder of the generator for upsampling, generating fake road and building images with the minimum and maximum privacy protection levels. The loss is then calculated using the image-based style difference loss function. This includes the following sub-steps:
[0143] Step S5-1: Input the intermediate features $F_{min}$ and $F_{max}$ into the decoder $G_{up}$ of the generator; upsampling them yields the privacy-protected fake road and building images $I_{min}$ and $I_{max}$, respectively:
[0144] $$I_{min} = G_{up}(F_{min}), \quad I_{max} = G_{up}(F_{max})$$
[0145] In this embodiment, the calculation is as follows:
[0146]
[0147] Step S5-2: Use the L1 distance between the Gram matrices of the two images' intermediate features in a pre-trained neural network to measure the style difference between the two images, such that $I_{min}$ has a small style difference from the original image and $I_{max}$ has a large style difference from the original image. The style difference loss function is formulated as follows:
[0148]
[0149] Here the loss defined above is the style difference loss, and $\phi_i(I)$ represents the feature map computed up to the $i$-th activation function after inputting $I$ into the pre-trained VGG19 network, with shape $C_i \times H_i \times W_i$. The relu1_1, relu2_1, relu3_1, and relu4_1 layers of VGG19 are used. $\phi_i(I)$ is flattened along the $H_i$ and $W_i$ dimensions into a $C_i \times H_i W_i$ matrix, and its Gram matrix is then calculated using the following formula.
[0150] $$\mathrm{Gram}(\phi_i(I)) = \phi_i(I)\,(\phi_i(I))^{T}$$
[0151] In this embodiment, the formula for the style difference loss function is used to calculate...
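The Gram-matrix comparison of step S5-2 can be sketched as follows. This is only an illustration: random arrays stand in for the VGG19 feature maps, the layer shapes are hypothetical, and the $L_1$ distance is taken as a mean absolute difference, which is an assumed normalization. How the distances for $I_{\min}$ and $I_{\max}$ are combined into the final loss is not shown, because that formula is not reproduced in the text.

```python
import numpy as np

def gram(phi):
    """Gram matrix of one feature map phi with shape (C, H, W)."""
    c, h, w = phi.shape
    flat = phi.reshape(c, h * w)   # vectorize the H and W dimensions
    return flat @ flat.T           # (C, C), i.e. Gram(phi) = phi phi^T

def style_distance(feats_a, feats_b):
    """Sum over layers of the mean absolute difference between Gram matrices."""
    return sum(np.abs(gram(a) - gram(b)).mean() for a, b in zip(feats_a, feats_b))

rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8)]   # hypothetical (C_i, H_i, W_i) per layer

feats_real = [rng.normal(size=s) for s in shapes]
# Features of an image that is stylistically close to the original.
feats_close = [f + 0.01 * rng.normal(size=f.shape) for f in feats_real]
# Features of an image that is stylistically far from the original.
feats_far = [rng.normal(size=s) for s in shapes]

d_close = style_distance(feats_real, feats_close)
d_far = style_distance(feats_real, feats_far)
```

In this toy setting `d_close` is much smaller than `d_far`, which mirrors the intended behaviour: $I_{\min}$ should stay near the original style and $I_{\max}$ should move away from it.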
[0152] Step S6: Base discriminator discrimination. A multi-scale PatchGAN discriminator $D_b$ judges whether the fake road and building images obtained in step S5 and the original real road and building images are real or fake, and the adversarial loss is calculated. This includes the following sub-steps:
[0153] Step S6-1: Input $I_{\min}$, $I_{\max}$, and $I_{real}$ into the base discriminator for discrimination, calculate the adversarial loss on the base discriminator, perform backpropagation, and update the parameters of the base discriminator.
[0154]
[0155] In the formula, $I_{fake}$ denotes the set of $I_{\min}$ and $I_{\max}$, and the loss term is the adversarial loss on the base discriminator.
[0156] In this embodiment, the result is calculated according to the above formula.
[0157] Step S6-2: Calculate the adversarial loss corresponding to the generator.
[0158]
[0159] In the formula, the loss term is the adversarial loss of the base discriminator on the generator.
[0160] In this embodiment, the result is calculated according to the above formula.
[0161] Step S7: Calculate the mask guidance loss. By calculating the correlation between building mask elements to reflect the relative positional distribution of building and non-building areas, the discrimination capability of the base discriminator is improved. The losses of the base discriminator and generator are calculated based on the mask guidance loss function. This includes the following sub-steps:
[0162] Step S7-1: Given an image $I_{fake}$ (that is, $I_{\min}$ or $I_{\max}$), the discrimination result obtained after inputting it into the base discriminator is denoted $f_I$. $N_s$ query elements are selected from $f_I$. For each query element $p_i$, $N_k$ elements are randomly sampled and arranged into a vector. The relevance between the query element and these sampled elements is calculated as follows:
[0163]
[0164] The $N_s$ relevance queries are arranged into a relevance query set.
[0165] In this embodiment, $N_s = 250$ and $N_k = 100$, and the calculation gives:
[0166]
[0167] Step S7-2: The mask $M_I$ corresponding to image $I_{fake}$ is scaled to the same size as $I_{fake}$. Following the procedure in step S7-1, the relevance query set of the mask $M_I$ is calculated.
[0168] In this embodiment, the following calculations are obtained:
[0169]
[0170] Step S7-3: Using the two relevance query sets obtained in steps S7-1 and S7-2, calculate the mask-guided loss:
[0171]
[0172]
[0173] In the formula, $S_c$ is the matrix obtained from the intermediate step, and $p_{ij}$ is the element in the $i$-th row and $j$-th column of that matrix. That is, the mask-guided loss is the average of the elements of the matrix obtained by subtracting the two relevance query sets.
[0174] Based on the mask-guided loss and the adversarial loss of the base discriminator, backpropagation is performed to iteratively train and update the parameters of the base discriminator.
[0175] In this embodiment, the following calculations are obtained:
[0176]
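The sampling structure of step S7 can be sketched in NumPy under several stated assumptions. The relevance formula is not reproduced in the text, so the product of the query value and the sampled value is used here purely as a placeholder. The sketch also assumes that the same sampled positions are used for $f_I$ and $M_I$, that the two maps already have the same size, and that the subtraction is taken in absolute value before averaging. The function names are hypothetical.

```python
import numpy as np

def relevance_queries(x, query_idx, key_idx):
    """Relevance matrix (N_s, N_k) between query elements and sampled elements.

    Assumption: relevance is the product of the two values. The source formula
    is not available, so this choice is only illustrative.
    """
    flat = x.reshape(-1)
    return flat[query_idx][:, None] * flat[key_idx]

def mask_guided_loss(f_i, m_i, n_s=250, n_k=100, seed=0):
    """Mean absolute difference between the two relevance query sets."""
    rng = np.random.default_rng(seed)
    n = f_i.size
    query_idx = rng.choice(n, size=n_s, replace=False)   # N_s query positions
    key_idx = rng.integers(0, n, size=(n_s, n_k))        # N_k samples per query
    s_f = relevance_queries(f_i, query_idx, key_idx)     # from the discriminator output
    s_m = relevance_queries(m_i, query_idx, key_idx)     # from the building mask
    s_c = np.abs(s_f - s_m)                              # assumed element-wise gap
    return s_c.mean()

rng = np.random.default_rng(1)
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0                 # toy building region
f_rand = rng.random((32, 32))          # discriminator output unrelated to the mask

loss_matched = mask_guided_loss(mask, mask)
loss_unrelated = mask_guided_loss(f_rand, mask)
```

With `n_s=250` and `n_k=100` as in the embodiment, the loss is zero when the discriminator output has the same spatial structure as the mask and positive otherwise, which is the behaviour the mask guidance is meant to encourage.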
[0177] Step S8: Pixel-level semantic discriminator discrimination. The fake road and building images obtained in step S5 and the original real road and building images are input into the pixel-level semantic discriminator $D_p$ for real/fake discrimination, and the corresponding adversarial loss is calculated. This includes the following sub-steps:
[0178] Step S8-1: Given the fake road and building images $I_{fake}$ (that is, $I_{\min}$ and $I_{\max}$), input $I_{fake}$ into the pixel-level semantic discriminator for real/fake discrimination. The pixel-level semantic discriminator uses the U-Net architecture and performs a multi-class classification task on each pixel of the input image, with four class labels: real building, fake building, real non-building, and fake non-building. The final output has the same size as the input image. Its loss function is the cross-entropy between the discriminator's prediction and the ground-truth labels, called the semantic discrimination loss, and is calculated as follows:
[0179]
[0180] In the formula, the loss term is the semantic discrimination loss on the pixel-level semantic discriminator. The discrimination result has size $4 \times H \times W$, where $H$ and $W$ are the image size of $I_{real}$, and $t_{c,i,j}$ is the value of the element at coordinates $(c, i, j)$ in the discrimination result, that is, the probability that the pixel at coordinates $(i, j)$ of the discriminated image is classified as class $c$.
[0181] The calculated loss is backpropagated to iteratively train and update the parameters of the pixel-level semantic discriminator.
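The four-class pixel-level cross-entropy of step S8-1 can be illustrated with a short NumPy sketch. The ordering of the class indices, the use of a softmax over raw logits, and the averaging over pixels are assumptions made for the example. The U-Net itself is not modelled, so hand-built logits stand in for its output.

```python
import numpy as np

def pixel_labels(building_mask, is_fake):
    """Class index map (H, W) from a binary building mask and a real/fake flag."""
    # Assumed order: 0 real building, 1 fake building,
    #                2 real non-building, 3 fake non-building.
    return np.where(building_mask > 0, 0, 2) + int(is_fake)

def semantic_ce(logits, labels):
    """Mean cross-entropy between 4 x H x W logits and an H x W label map."""
    z = logits - logits.max(axis=0, keepdims=True)              # stabilize the softmax
    log_t = z - np.log(np.exp(z).sum(axis=0, keepdims=True))    # log of t_{c,i,j}
    picked = np.take_along_axis(log_t, labels[None], axis=0)[0]
    return -picked.mean()

mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1                                   # toy building region
labels_fake = pixel_labels(mask, is_fake=True)       # labels for a generated image

uniform_logits = np.zeros((4, 4, 4))                 # undecided discriminator
confident_logits = 10.0 * np.eye(4)[labels_fake].transpose(2, 0, 1)

loss_uniform = semantic_ce(uniform_logits, labels_fake)
loss_confident = semantic_ce(confident_logits, labels_fake)
```

An undecided discriminator gives a loss of $\ln 4$, and the loss approaches zero as the per-pixel prediction concentrates on the correct one of the four classes.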
[0182] In this embodiment, the pixel-level semantic discriminator consists of 7 downsampling convolutional layers and 7 transposed convolutional layers. The loss of the pixel-level semantic discriminator is calculated according to the formula as follows:
[0183] Step S8-2, correspondingly, calculate the semantic discrimination loss on the generator.
[0184]
[0185] In the formula, the loss term is the semantic discrimination loss on the generator.
[0186] The calculated generator-related losses are backpropagated to iteratively train and update the generator parameters.
[0187] In this embodiment, the pixel-level semantic discrimination loss corresponding to the generator is calculated according to the formula as follows:
[0188] Step S9: For the trained generator, given the privacy protection strength $\alpha$, generate the desired fake road and building image. Feature interpolation between $F_{\min}$ and $F_{\max}$ is performed in the feature space using $\alpha$, and the interpolated feature is then upsampled. This yields a target building image $I_{\alpha}$ with the specified level of privacy protection:
[0189] $I_{\alpha} = G_{up}\left(\alpha F_{\max} + (1 - \alpha) F_{\min}\right)$
[0190] In this embodiment, α = 0.5 is set, and a fake road and building image is generated under this privacy protection level. The calculation results are as follows:
[0191]
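The interpolation of step S9 can be sketched as follows. The decoder is replaced by a hypothetical nearest-neighbour upsampling function, since the trained $G_{up}$ is not available here, and the restriction of $\alpha$ to the interval $[0, 1]$ is an assumption of the sketch.

```python
import numpy as np

def interpolate_features(f_min, f_max, alpha):
    """Blend F_min and F_max for a privacy protection strength alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * f_max + (1.0 - alpha) * f_min

def toy_decoder(feat, scale=2):
    """Hypothetical stand-in for G_up: nearest-neighbour upsampling only."""
    return feat.repeat(scale, axis=-2).repeat(scale, axis=-1)

f_min = np.zeros((1, 8, 4, 4))   # stand-in for the minimum-difference feature
f_max = np.ones((1, 8, 4, 4))    # stand-in for the maximum-difference feature

# alpha = 0.5 as in this embodiment
i_alpha = toy_decoder(interpolate_features(f_min, f_max, alpha=0.5))
```

Setting `alpha=0` reproduces the decoding of `f_min`, setting `alpha=1` reproduces the decoding of `f_max`, and intermediate values move continuously between the two.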
[0192] Example 2
[0193] This embodiment provides a location privacy protection system for road and building images, used to implement the aforementioned location privacy protection method for road and building images. As shown in Figure 3, it includes:
[0194] The edge composite module extracts the edges of buildings and stitches together non-building images to form an edge composite image.
[0195] The generator module is used to generate location-privacy-protected fake road and building images. It contains the following sub-modules:
[0196] The encoding module performs multi-layer convolution on the input image to learn image features;
[0197] The decoding module decodes intermediate features and generates fake building images;
[0198] The style learning module contains multiple convolutional layers and one pooling layer. It takes an original road and building image as input and generates a mean and standard deviation pair $(\mu_{\max}, \sigma_{\max})$ as the maximum-difference style representation of the image;
[0199] The style difference loss calculation module uses the Gram matrix to calculate the style difference between images, so that $I_{\min}$ has a small style difference from the original image and $I_{\max}$ has a large style difference from the original image;
[0200] The mask guidance module is used to calculate the correlation between building mask elements, reflecting the relative positional distribution of building areas and non-building areas.
[0201] The basic discriminator module is used to distinguish between real and fake input images and to conduct adversarial training with the generator.
[0202] The pixel-level semantic discriminator module is used to distinguish between real and fake pixels of the input image, perform semantic classification of buildings, and conduct adversarial training with the generator.
[0203] Figure 4 is a schematic diagram of the principle structure. This invention designs a method and system for protecting the location privacy of road and building images by using a generative adversarial network whose discriminators are improved with building semantics, combined with building edge information. The building edges of the image are extracted and input into the generator, the style features of the image are calculated, and a maximum style learning network is set up, so that two fake building images with different degrees of style difference are generated. The mask-guided base discriminator and the pixel-level semantic discriminator are then trained adversarially against the generator, and finally a fake road and building image with a specified level of privacy protection is output.
[0204] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A method for protecting the location privacy of road and building images, characterized in that the method includes the following steps:
S1. Extract the building edge image from the original road and building image $I_{real}$, and use the building mask $M_b$ to combine it with the remaining non-building portion of the image to form an edge composite image $I_B$;
S2. Input the edge composite image $I_B$ and the original image $I_{real}$ into the encoder $G_{down}$ of the generator to obtain the intermediate features $F_B$ and $F_{real}$;
S3. Use the mean and standard deviation of the intermediate feature $F_{real}$ encoded from the original image $I_{real}$ as the minimum-difference style representation; input the original road and building image $I_{real}$ into a style learning network to generate a standard deviation and mean, and use the generated standard deviation and mean as the maximum-difference style representation;
S4. Use adaptive instance normalization to align the mean and standard deviation of the intermediate feature $F_B$ of the edge composite image to the minimum-difference and maximum-difference styles, respectively, to obtain the aligned intermediate features $F_{\min}$ and $F_{\max}$;
S5. Input the aligned intermediate features $F_{\min}$ and $F_{\max}$ of the edge composite image into the decoder of the generator for upsampling, generating fake road and building images $I_{\min}$ and $I_{\max}$ with minimum and maximum privacy protection, respectively, and calculate the style difference loss;
S6. Use a multi-scale PatchGAN discriminator $D_b$ as the base discriminator to judge whether the fake road and building images $I_{\max}$, $I_{\min}$ and the original real road and building image $I_{real}$ are real or fake, and calculate the adversarial loss;
S7. Reflect the relative positional distribution of building areas and non-building areas by calculating the correlation between building mask elements, and calculate the losses of the base discriminator and the generator based on the mask-guided loss function;
S8. Use a pixel-level semantic discriminator $D_p$ to judge whether the fake road and building images $I_{\max}$, $I_{\min}$ and the original real road and building image $I_{real}$ are real or fake, and calculate the semantic discrimination loss;
S9. Backpropagate the calculated generator-related losses and iteratively train and update the generator parameters; for the trained generator, given the privacy protection strength $\alpha$, generate the desired fake road and building image.
2. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S1 includes the following sub-steps:
S11. Use the Canny algorithm to extract the building edge image from the original road and building image; the expression of the Canny algorithm uses the Hadamard product and a parameter indicating the privacy protection level; the value of this parameter controls the completeness of the extracted building edges, and when the value is 0 the extracted edges are the most complete and are denoted as the ground-truth edges;
S12. Jointly construct the edge composite image $I_B$ from the extracted ground-truth edges and the real street view image $I_{real}$ under the guidance of the building mask $M_b$.
3. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S3 includes the following sub-steps:
Minimum style difference representation: along the channel and batch dimensions, calculate the mean and standard deviation $(\mu_{\min}, \sigma_{\min})$ of the intermediate feature $F_{real}$ encoded from the original image $I_{real}$; in the formulas, $H$ and $W$ denote the height and width of the intermediate feature, respectively, and the element indices of $F_{real}$ run over the $H$ and $W$ dimensions;
Maximum style difference representation: input the original image $I_{real}$ into a style learning network to generate a standard deviation and mean pair $(\mu_{\max}, \sigma_{\max})$ as the maximum-difference style representation of the image; the style learning network contains multiple convolutional modules and a final pooling layer, and each convolutional module includes a convolutional layer, an activation function, and a batch normalization layer.
4. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S4 includes the following sub-steps:
S41. Calculate, along the batch and channel dimensions, the mean and standard deviation of the intermediate feature $F_B$ encoded from the edge composite image $I_B$;
S42. Perform style alignment to obtain the aligned intermediate features $F_{\min}$ and $F_{\max}$, where $(\mu_{\min}, \sigma_{\min})$ and $(\mu_{\max}, \sigma_{\max})$ denote the mean and standard deviation of the minimum-difference and maximum-difference styles, respectively.
5. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S5 includes the following sub-steps:
S51. Input the intermediate features $F_{\min}$ and $F_{\max}$ into the decoder $G_{up}$ of the generator; upsampling $F_{\min}$ and $F_{\max}$ yields the privacy-protected fake road and building images $I_{\min}$ and $I_{\max}$, respectively: $I_{\min} = G_{up}(F_{\min})$, $I_{\max} = G_{up}(F_{\max})$;
S52. Use the $L_1$ distance between the Gram matrices of the intermediate features of two images in a pre-trained neural network to measure the style difference between the two images, with the goal that $I_{\min}$ has a smaller style difference from the original image and $I_{\max}$ has a greater style difference from the original image; in the style difference loss function, the loss term is the style difference loss, and $\phi_i(I)$ denotes the feature map obtained after the activation function of the $i$-th layer when $I$ is input into the pre-trained classification network VGG19, with shape $C_i \times H_i \times W_i$; the Gram matrix is calculated as $\mathrm{Gram}(\phi_i(I)) = \phi_i(I)\,\phi_i(I)^{T}$, which means that the features are vectorized along the $H_i$ and $W_i$ dimensions before the Gram matrix is calculated.
6. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S6 includes the following sub-steps:
S61. Input $I_{\min}$, $I_{\max}$, and $I_{real}$ into the base discriminator for discrimination, calculate the adversarial loss on the base discriminator, perform backpropagation, and update the parameters of the base discriminator; in the adversarial loss, $I_{fake}$ denotes the set of $I_{\min}$ and $I_{\max}$, and the loss term is the adversarial loss on the base discriminator;
S62. Calculate the corresponding adversarial loss on the generator, where the loss term is the adversarial loss of the base discriminator on the generator.
7. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S7 includes the following sub-steps:
S71. Given an image $I_{fake}$, the discrimination result obtained after inputting it into the base discriminator is denoted $f_I$; select $N_s$ query elements from $f_I$, and for each query element randomly sample $N_k$ elements and arrange them into a vector; calculate the relevance between the query element and these sampled elements, and arrange the $N_s$ relevance queries into a relevance query set;
S72. Scale the mask $M_I$ corresponding to image $I_{fake}$ to the same size as $I_{fake}$, and calculate the relevance query set of the mask $M_I$;
S73. Based on the two relevance query sets, calculate the mask-guided loss, where $S_c$ is the matrix obtained from the intermediate step and $p_{ij}$ is the element in the $i$-th row and $j$-th column of the matrix; that is, the mask-guided loss is the average of the elements of the matrix obtained by subtracting the two relevance query sets; based on the mask-guided loss and the adversarial loss of the base discriminator, backpropagation is performed to iteratively train and update the parameters of the base discriminator.
8. The method for protecting the location privacy of road and building images according to claim 1, characterized in that step S8 includes the following sub-steps:
S81. Given the fake road and building images $I_{fake}$, that is, the two types of images $I_{\min}$ and $I_{\max}$, input $I_{fake}$ into the pixel-level semantic discriminator for real/fake discrimination; the pixel-level semantic discriminator adopts the U-Net architecture and performs a multi-class classification task on each pixel of the input image, with the class labels real building, fake building, real non-building, and fake non-building; the final output has the same size as the input image; the cross-entropy between the discriminator's predictions and the ground-truth labels is used as the loss function, called the semantic discrimination loss; in its expression, the loss term is the semantic discrimination loss on the pixel-level semantic discriminator, the discrimination result has size $4 \times H \times W$, where $H$ and $W$ are the image size of $I_{real}$, and $t_{c,i,j}$ is the value of the element at coordinates $(c, i, j)$ in the discrimination result, that is, the probability that the pixel at coordinates $(i, j)$ of the discriminated image is classified as class $c$; backpropagation is performed based on the calculated loss, and the parameters of the pixel-level semantic discriminator are iteratively trained and updated;
S82. Calculate the semantic discrimination loss on the generator; the generator parameters are iteratively trained and updated by backpropagating the calculated generator-related losses.
9. The method for protecting the location privacy of road and building images according to claim 1, characterized in that in step S9, for the trained generator, given the privacy protection strength $\alpha$, feature interpolation between $F_{\min}$ and $F_{\max}$ is performed in the feature space using $\alpha$, and the interpolated feature is then upsampled to obtain a target building image $I_{\alpha}$ with the specified level of privacy protection: $I_{\alpha} = G_{up}\left(\alpha F_{\max} + (1 - \alpha) F_{\min}\right)$, where $G_{up}$ denotes the upsampling operation.
10. A system for performing the location privacy protection method for road and building images according to any one of claims 1-9, characterized in that the system includes:
an edge composite module, which extracts building edges and stitches them together with the non-building image to form an edge composite image;
a generator module, used to generate fake road and building images with location privacy protected;
a mask guidance module, used to calculate the correlation between building mask elements, reflecting the relative positional distribution of building areas and non-building areas;
a base discriminator module, used to distinguish between real and fake input images and to conduct adversarial training with the generator;
a pixel-level semantic discriminator module, used to distinguish between real and fake pixels of the input image, perform semantic classification of buildings, and conduct adversarial training with the generator;
wherein the generator module contains the following sub-modules:
an encoding module, which performs multi-layer convolution on the input image to learn image features;
a decoding module, which decodes intermediate features and generates fake building images;
a style learning module, which contains multiple convolutional layers and one pooling layer, takes an original road and building image as input, and generates a mean and standard deviation pair $(\mu_{\max}, \sigma_{\max})$ as the maximum-difference style representation of the image;
a style difference loss calculation module, which uses the Gram matrix to calculate the style difference between images, with the goal that $I_{\min}$ has a smaller style difference from the original image and $I_{\max}$ has a greater style difference from the original image.