A multimedia data security measurement method based on multi-dimension correlation
By employing methods such as spectral clustering, content recognition models, ULDC, and higher-order logic theorem proofs, the shortcomings in security verification after multimedia data encryption are addressed, enabling multi-dimensional security assessment and ensuring content, privacy, and semantic security.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO LTD
- Filing Date
- 2025-08-14
- Publication Date
- 2026-06-26
AI Technical Summary
There is a lack of verification methods to determine whether encrypted multimedia data meets security requirements, especially in terms of content recognition, privacy protection, and semantic security.
The method employs a spectral clustering algorithm and a content recognition model to perform feature clustering and key-part identification on multimedia data. It combines ULDC for data cleaning and confidence assessment, performs semantic security verification through a chosen-plaintext attack, and adds noise perturbation in a higher-order logic theorem-proving environment for formal security verification. Together, these steps give a comprehensive evaluation of the security of the multimedia data.
It enables comprehensive and accurate security assessment of multimedia data, ensuring content security, privacy information security, and semantic security, and meeting multi-dimensional security verification needs.
Figure CN120951176B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of information security technology, and in particular to a multimedia data security measurement method and electronic device based on multi-dimensional correlation.
Background Technology
[0002] With the development of internet technology, multimedia data (such as images and videos) is widely used in various scenarios such as social platforms, e-commerce, and intelligent video communication. For example, with the dissemination and storage of visual data, security issues have gradually emerged, leading to problems such as the tampering, leakage, and misuse of visual data, seriously threatening user privacy and security. To ensure visual security, much multimedia data is encrypted. How to verify whether encrypted multimedia data meets security requirements has become an urgent problem to be solved.
Summary of the Invention
[0003] The purpose of this application is to provide a method and device for measuring the security of multimedia data based on multi-dimensional correlation, so as to solve the technical problem of how to verify whether encrypted multimedia data can meet security requirements.
[0004] In a first aspect, embodiments of this application provide a method for measuring the security of multimedia data based on multi-dimensional association, including: for multimedia data, using a spectral clustering algorithm based on spatial similarity, dividing multiple features of a single image in the multimedia data into multiple cluster groups; based on a content recognition model, performing content recognition on key parts of the features of the multiple cluster groups; and performing content security assessment on the multimedia data based on the content recognition results to obtain a content security score.
[0005] The confidence level of the multimedia data is assessed by ULDC. Based on the confidence level assessment results and Lagrange interpolation, the multimedia data is cleaned to obtain cleaned and restored data. The privacy information security of the restored data is assessed to obtain a privacy information security score.
[0006] Semantic security verification is performed on the multimedia data through a chosen-plaintext attack to obtain a semantic security score.
[0007] In the context of high-order logic theorem proof, the three processes of content security assessment, privacy information security assessment and semantic security verification are abstractly modeled. Invariant constraints of the three processes are constructed. After adding noise perturbation to the multimedia data, the invariant constraints are input. The key attributes of the invariant constraints are formally verified to obtain the formal security score of the multimedia data.
[0008] The security of the multimedia data is determined by combining content security score, privacy information security score, semantic security score, and formal security score.
[0009] Optionally, before performing content recognition of key parts based on the features of multiple cluster groups using the content recognition model, the method further includes:
[0010] The convolutional feature maps of the original samples are clustered into multiple different cluster groups;
[0011] Convolutional feature maps belonging to the same cluster group are fused into a mask map of key parts based on spatial similarity. The spatial size of the mask map is the same as that of the convolutional feature map.
[0012] The mask map of each key part is input into the denoising diffusion model, and forward denoising diffusion and reverse denoising diffusion are performed respectively until a new sample that has learned the probability distribution of the original sample is generated.
[0013] The mean squared error of the new sample is compared with that of the original sample, and the content recognition model is trained according to the inverse relationship between the mean squared error and the content recognition result.
[0014] Optionally, after training the content recognition model according to the inverse relationship between mean squared error and content recognition result, the method further includes:
[0015] The content recognition model uses a ResNet-50 network. The batch normalization layer, activation layer, and fully connected layer of the ResNet-50 network are removed, and the five convolutional layers of the ResNet-50 network are retained. The five convolutional layers realize the content recognition of the input image through forward propagation.
[0016] Optionally, based on spatial similarity, a spectral clustering algorithm is used to divide multiple features of a single image in the multimedia data into multiple cluster groups. Based on a content recognition model, content recognition of key parts of the features in the multiple cluster groups is performed. Based on the content recognition results, a content security assessment is conducted on the multimedia data to obtain a content security score, including:
[0017] For a single image in the multimedia data, a spectral clustering algorithm is used to group multiple features with a spatial distance less than a preset threshold into the same cluster group based on spatial similarity. Each cluster group includes multiple appearance features and multiple relationship features.
[0018] In each cluster group, the multiple appearance features are fused based on the multiple relational features to obtain the fused features corresponding to the key parts;
[0019] The fused features are input into the content recognition model, and the content recognition model is used to predict the content recognition category of multiple key parts of the multimedia data. The content recognition result is determined based on the mean square error between the true value of the category and the content recognition category of the multiple key parts.
[0020] The multimedia data is assessed for content security based on the content recognition results. The larger the mean square error, the greater the deviation of the content recognition results, and the higher the content security coefficient of the multimedia data.
[0021] Optionally, the multimedia data is assessed for confidence using ULDC, and the multimedia data is cleaned based on the confidence assessment results and Lagrange interpolation, including:
[0022] Features in the same cluster group are compressed to obtain low-dimensional features of key parts. The probability distribution of the low-dimensional features is determined based on ULDC. The confidence of the low-dimensional features and the intra-cluster threshold in the same cluster group are determined based on the Mahalanobis distance from the low-dimensional features to the center of the probability distribution.
[0023] Based on the confidence level of each feature in the low-dimensional features, the proportion of difficult samples retained in the low-dimensional features is maximized by using the intra-class threshold and Lagrange interpolation, and outliers in the low-dimensional features are deleted or corrected to obtain cleaned and repaired data. The outliers include at least one of duplicate data, data distribution outliers, format outliers, and logical outliers.
[0024] The repaired data is subjected to Lagrange interpolation, and the interpolated data is then deduplicated a second time to achieve data cleaning of the multimedia data.
[0025] Optionally, outliers in the low-dimensional features may be deleted or corrected, including at least one of the following methods:
[0026] Compare the attribute values of the low-dimensional features, define at least two sets of low-dimensional features with the same attribute values as duplicate data, and delete at least one set of duplicate data; and / or
[0027] Construct the quartiles of the low-dimensional features using a box plot, determine the interquartile range (IQR) based on the quartiles, delete the low-dimensional features corresponding to the box-whisker portion of the box plot; and / or
[0028] By using at least one of data type matching, regular expressions, and value range validation, determine whether the format of the low-dimensional feature conforms to expectations, and correct the format of outliers that do not conform to expectations according to a preset format; and / or
[0029] By using business logic constraints, outliers that do not meet expectations are corrected by filling in the median value.
[0030] Optionally, in a higher-order logic theorem proof environment, the three processes—content security assessment, privacy information security assessment, and semantic security verification—are abstractly modeled, and data flow invariant constraints for these three processes are constructed. Formal security verification is then performed on the key attributes of these invariant constraints, including:
[0031] The content security assessment process, the privacy information security assessment process, and the semantic security verification process are mapped to the formal verification tool Isabelle / HOL using the structured proof language Isar, forming three state transition rule modules.
[0032] For the image data input to the single state transition rule module, noise is added through random interference error or deterministic interference error;
[0033] Through the coupled differential equations of error propagation, the three state transition rule modules are subjected to interactive mechanical proof. When the solution space of the coupled differential equations satisfies the invariant constraints, it is verified whether the confidentiality of the content security assessment process falls below the minimum confidentiality threshold, whether the real-time performance of the privacy information security assessment process is achieved within the specified time window, and whether the reliability of the semantic security verification process satisfies the Lyapunov function proof.
[0034] Optionally, the method further includes: adding external noise to a simulated image of the image data input to the single state transition rule module, and adding internal noise to the state rule module;
[0035] If the total value of the external noise and the internal noise exceeds the noise tolerance, and the confidentiality function of the content security assessment process falls below the minimum confidentiality threshold, or the real-time function of the privacy information security assessment process falls within a specified time window, or the reliability function of the semantic security verification process satisfies the Lyapunov function proof, then the formal security verification is deemed to have failed.
[0036] Optionally, the method further includes: adding external noise to a simulated image of the image data input to the single state transition rule module, and adding internal noise to the state rule module;
[0037] If the total value of the external noise and the internal noise is not greater than the noise tolerance, and if the confidentiality function of the content security assessment process does not fall below the minimum confidentiality threshold, or if the real-time function of the privacy information security assessment process exceeds the specified time window, or if the reliability function of the semantic security verification process does not satisfy the Lyapunov function proof, then the formal security verification is deemed to have failed.
[0038] In a second aspect, embodiments of this application provide an electronic device, including:
[0039] A processor; and
[0040] A memory storing computer-readable instructions, which, when executed by the processor, implement the above method.
[0041] The beneficial effects of the technical solutions provided in this application include at least the following:
[0042] The embodiments of this application use a spectral clustering algorithm combined with a content recognition model to perform content recognition and content security scoring on key parts of the clustered features of multimedia data. ULDC is used to clean the multimedia data to obtain repaired data, and a privacy information security assessment is performed on the repaired data to determine whether content recognition or privacy information can be directly obtained from the multimedia data. A chosen-plaintext attack is used to perform semantic security verification on the multimedia data to determine whether semantic recognition can be performed on the cleaned multimedia data to extract the semantics of the multimedia data, thereby determining whether the confidentiality of the multimedia data meets security requirements. Noise perturbation is applied to the content security assessment process, privacy information security assessment process, and semantic security verification process in a higher-order logic theorem-proving environment to verify their formal security under noise perturbation. The four security scores are combined to obtain a more comprehensive and accurate multimedia data security assessment that meets security verification requirements.
Attached Figure Description
[0043] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a schematic diagram of a multimedia data security measurement method based on multi-dimensional correlation provided in an embodiment of this application.
Detailed Implementation
[0045] The present application will be described in detail below with reference to the specific embodiments shown in the accompanying drawings. However, these embodiments do not limit the present application. Any structural, methodological, or functional modifications made by those skilled in the art based on these embodiments are included within the protection scope of the present application.
[0046] In order to measure the security of multimedia data encryption more accurately, scientifically and objectively, the embodiments of this application can measure the security of multimedia data from multiple dimensions.
[0047] Multimedia data can be data obtained by encrypting image or video data.
[0048] As shown in Figure 1, an embodiment of this application provides a multimedia data security measurement method based on multi-dimensional correlation, including:
[0049] S101: For multimedia data, a spectral clustering algorithm is used to divide multiple features of a single image in the multimedia data into multiple cluster groups based on spatial similarity. Based on the content recognition model, the content of key parts of the features of multiple cluster groups is identified. Based on the content recognition results, the multimedia data is evaluated for content security and a content security score is obtained.
[0050] Convolutional neural networks (CNNs) can be used to perform deep semantic analysis and extract key features from image data. In step S101, the CNN can predict which features in a cluster belong to a key part. Combining the prediction results of multiple key parts, the content recognition result is obtained. If S101 can obtain the content recognition result for multimedia data, the content security score is set to zero; if S101 cannot obtain the content recognition result for multimedia data, the content security score is set to 1.
[0051] In the embodiments of this application, multimedia data may include encrypted video or images, such as encrypted visual data. Alternatively, multimedia data may also include encrypted visual data and audio data, etc., without limitation.
[0052] For example, suppose the multimedia data includes multiple images. The segmentation of multiple features in a single image can be achieved using a spatial similarity matrix, decomposing multiple convolutional feature maps into multiple feature vectors. A spectral clustering algorithm is then used to cluster these feature vectors into three groups: foreground, background, and edge regions. The feature vectors from each group are then fused to generate a mask map for the corresponding region (i.e., masks for the foreground, background, and edge regions). A heatmap is used to calculate and locate key areas (e.g., eye and nose regions) within the mask map, determining the probability of their presence in the corresponding region. A denoising diffusion model is then used to perform forward and reverse denoising diffusion on the mask map of these key areas, resulting in a restored image with the same size as the original single image. The restored image is then compared to the original single image to determine whether content recognition of the multimedia data is possible.
[0053] In the above denoising diffusion model, whether the new sample has learned the probability distribution of the original sample can be determined by judging whether the two images have highly similar structural features and whether there is a strong dependency between pixels, and hence whether they are spatially close. The similarity of structural features can be calculated using the following formulas for the similarity of local brightness, the similarity of contrast, and the similarity of structural features between the new and original samples.
[0054] $$l(P,C)=\frac{2\mu_P\mu_C+C_1}{\mu_P^{2}+\mu_C^{2}+C_1},\qquad c(P,C)=\frac{2\sigma_P\sigma_C+C_2}{\sigma_P^{2}+\sigma_C^{2}+C_2},\qquad s(P,C)=\frac{\sigma_{PC}+C_3}{\sigma_P\sigma_C+C_3}$$
[0055] Where $C_1$ represents the local brightness parameter of the mask image of the original sample, $C_2$ represents the contrast parameter of the original image, and $C_3$ represents the structural parameter of the original image; $\sigma_P$ and $\sigma_C$ are the standard deviations of the new sample P and the original sample C, respectively, and $\sigma_{PC}$ is the covariance between them; $\mu_P$ and $\mu_C$ denote the pixel means of the new sample and the original image, respectively; $l(P,C)$ represents the similarity of local brightness between the new sample and the original sample; $c(P,C)$ represents the similarity of contrast between the new sample and the original sample; and $s(P,C)$ represents the similarity of structural features between the new sample and the original sample.
[0056]
[0057] By combining the three similarities, a structural similarity index (SSIM) is obtained, which serves as the criterion for evaluating intra-class thresholds:
[0058] $$\mathrm{SSIM}(P,C)=\left[l(P,C)\right]^{\alpha}\left[c(P,C)\right]^{\beta}\left[s(P,C)\right]^{\gamma}$$
[0059] Where $\alpha$ is the parameter characterizing the importance of the brightness comparison, $\beta$ is the parameter characterizing the importance of the contrast comparison, and $\gamma$ is the parameter characterizing the importance of the structure comparison. In an alternative embodiment, the standard SSIM formula can be used, in which case $\alpha$, $\beta$, and $\gamma$ are all 1:
[0060] $$\mathrm{SSIM}(P,C)=l(P,C)\cdot c(P,C)\cdot s(P,C)$$
[0061] Using the above formula, the similarity distribution of the three indicators is calculated. The similarity ranges from 0 to 1. The closer it is to 1, the more similar the structures of the two images are. When SSIM becomes stable, it is determined that the denoising diffusion model has completed learning. At this time, new samples that have learned the probability distribution of the original samples can be output.
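The three similarity terms and their combination can be illustrated with a minimal Python sketch. It computes the statistics once over the whole image rather than over sliding windows, and the function name `ssim_global`, the constants `c1` and `c2`, and the choice `c3 = c2 / 2` are illustrative assumptions, not values taken from this application.

```python
import math


def ssim_global(p, c, c1=1e-4, c2=9e-4, alpha=1.0, beta=1.0, gamma=1.0):
    """Single-window SSIM between two equally sized grayscale images.

    p, c: flat lists of pixel intensities scaled to [0, 1].
    c1, c2: small stabilising constants (assumed values).
    """
    if len(p) != len(c) or not p:
        raise ValueError("images must be non-empty and equally sized")
    n = len(p)
    mu_p = sum(p) / n
    mu_c = sum(c) / n
    var_p = sum((x - mu_p) ** 2 for x in p) / n
    var_c = sum((y - mu_c) ** 2 for y in c) / n
    cov = sum((x - mu_p) * (y - mu_c) for x, y in zip(p, c)) / n
    sd_p, sd_c = math.sqrt(var_p), math.sqrt(var_c)
    c3 = c2 / 2  # assumption: common convention, not stated in the source

    lum = (2 * mu_p * mu_c + c1) / (mu_p ** 2 + mu_c ** 2 + c1)  # brightness
    con = (2 * sd_p * sd_c + c2) / (var_p + var_c + c2)          # contrast
    struct = (cov + c3) / (sd_p * sd_c + c3)                     # structure
    return (lum ** alpha) * (con ** beta) * (struct ** gamma)
```

With the default exponents of 1, an image compared with itself gives a value of 1 up to rounding, and any structural change lowers the value.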
[0062] Before performing content recognition on key parts of features from multiple clusters based on the content recognition model, the content recognition model can be trained in the following way:
[0063] The convolutional feature maps of the original samples are clustered into multiple different cluster groups;
[0064] Convolutional feature maps belonging to the same cluster group are fused into a mask map of key parts based on spatial similarity. The spatial size of the mask map is the same as that of the convolutional feature map.
[0065] The Denoising Diffusion Probabilistic Model (DDPM) is used to perform forward and reverse denoising diffusion on the mask images of various key parts to obtain new samples from the original samples. Both forward and reverse denoising diffusion can be implemented using Markov algorithms. During the forward denoising diffusion process, noise is gradually added to the mask image using DDPM until the mask image is transformed into isotropic Gaussian noise. During the reverse denoising diffusion process, noise is gradually removed from the aforementioned isotropic Gaussian noise using DDPM until a new sample that conforms to the probability distribution of the original sample is generated.
[0066] Compare the mean squared errors of the new samples with those of the original samples, and train the content recognition model according to the inverse relationship between the mean squared error and the content recognition result.
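The forward half of the denoising diffusion process used in the training steps above can be sketched as follows. This is only an illustration of the closed-form forward noising step of a DDPM; the linear schedule, its endpoints, the step count, and all function names are assumptions rather than details from this application, and the learned reverse (denoising) network is not shown.

```python
import math
import random


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t, rising linearly over the diffusion steps (assumed schedule)."""
    if steps < 2:
        raise ValueError("need at least two diffusion steps")
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def alpha_bar(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

Under this assumed schedule, the retained signal fraction falls close to zero by the last step, which corresponds to the mask map having been transformed into nearly isotropic Gaussian noise.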
[0067] The aforementioned key parts may include human joints (such as knees, elbows, shoulders, etc.), facial features (such as eyes, nose, mouth, ears, etc.), or key areas of objects (such as car doors, wheels, license plates, etc.), without limitation.
[0068] Assume that the new sample P and the original sample C have the same size $m \times n$. The mean squared error MSE(P, C) can be calculated using the following formula:
[0069] $$\mathrm{MSE}(P,C)=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(P(i,j)-C(i,j)\right)^{2}$$
[0070] A larger MSE(P,C) indicates a lower content recognition accuracy and a higher content security score for the content recognition model.
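As a small illustration of the mean squared error comparison, the following sketch computes MSE between two equally sized images represented as lists of rows. The function name `mse` and the list-of-rows representation are assumptions made for the example.

```python
def mse(p, c):
    """Mean squared error between two equally sized images given as lists of rows."""
    if len(p) != len(c) or any(len(rp) != len(rc) for rp, rc in zip(p, c)):
        raise ValueError("images must have the same size")
    total = 0.0
    count = 0
    for row_p, row_c in zip(p, c):
        for a, b in zip(row_p, row_c):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must be non-empty")
    return total / count
```

A value of 0 means the new sample reproduces the original exactly; under the inverse relationship described above, larger values correspond to poorer recognition and therefore a higher content security score.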
[0071] A spectral clustering algorithm based on spatial similarity is used to cluster the convolutional feature maps into different cluster groups. The feature maps are then fused into a mask map of key regions, where the value of each pixel in the mask map represents the probability of that key region in the entire image.
[0072] Assume the convolutional neural network described above is implemented using the convolutional layers of the ResNet-50 network. This ResNet-50 network consists of five convolutional sub-networks, each comprising a convolutional layer, a batch normalization layer, and an activation layer, all connected by fully connected layers. After the content recognition function of the ResNet-50 network has been trained using samples, only its five convolutional layers need to be retained.
[0073] For example, a 224×224 resolution image is input into the five convolutional layers of a ResNet-50 network. After forward propagation, the outputs of the third convolutional layer (Conv3), the fourth convolutional layer (Conv4), and the fifth convolutional layer (Conv5) are used as the convolutional feature maps of the image to be processed. These convolutional feature maps can be denoted as (…). Bilinear upsampling is performed on the Conv4 and Conv5 feature maps to bring them to a uniform size of 56×56. The outputs of the three layers, Conv3, Conv4, and Conv5, are then concatenated to obtain a fused feature map with a size of 56×56 and 1792 channels.
[0074] For multiple convolutional feature maps, the spatial similarity matrix between pixels is calculated using the Gaussian kernel function. The spectral clustering algorithm is then used to locate key parts, and convolutional feature maps belonging to the same key part are grouped into the same group.
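The combination of a Gaussian-kernel similarity matrix with spectral clustering can be sketched as below. This is a simplified two-cluster version that splits on the sign of the Fiedler vector of the unnormalized graph Laplacian; the function names, the bandwidth `sigma`, and the restriction to two clusters are assumptions for illustration, whereas the method described here groups features into several clusters.

```python
import numpy as np


def gaussian_affinity(features, sigma=1.0):
    """Pairwise similarity exp(-||xi - xj||^2 / (2 sigma^2)) between feature vectors."""
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def spectral_bipartition(features, sigma=1.0):
    """Split features into two groups using the second-smallest Laplacian eigenvector."""
    w = gaussian_affinity(features, sigma)
    np.fill_diagonal(w, 0.0)            # no self-loops in the similarity graph
    degree = w.sum(axis=1)
    laplacian = np.diag(degree) - w     # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)
```

For more than two clusters, a common approach is to run k-means on the first k eigenvectors instead of thresholding a single one.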
[0075] S102: Confidence assessment of multimedia data is performed using ULDC (Unsupervised Learning-Based Data Cleaning). Based on the confidence assessment results and Lagrange interpolation, the multimedia data is cleaned to obtain cleaned and restored data. A privacy information security assessment is then conducted on the restored data to obtain a privacy information security score. ULDC assesses the confidence of multimedia data through the distribution and similarity of low-dimensional samples. It does not rely on training samples, allows for rapid deployment, and can improve the overall efficiency of the system.
[0076] Features within the same cluster group are compressed to obtain low-dimensional features of key parts. The probability distribution of these low-dimensional features can be determined based on ULDC. The confidence level of each low-dimensional feature and the intra-cluster threshold can then be determined based on the Mahalanobis distance from the low-dimensional feature to the center of the probability distribution. This intra-cluster threshold allows difficult samples to be retained within the low-dimensional features of the same cluster group while improving the probability of identifying erroneous samples.
[0077] For example, suppose the low-dimensional features of key parts in the same cluster group are denoted $X_1, X_2, \dots, X_n$. The mean vector (i.e., the center of the probability distribution) of these low-dimensional features can be calculated using the following formula:
[0078] $$\mu=\frac{1}{n}\sum_{i=1}^{n}X_i$$
[0079] Where n represents the total number of low-dimensional features;
[0080] The covariance matrix between the low-dimensional features and the probability distribution center can be calculated using the following formula:
[0081] $$\Sigma=\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\mu\right)\left(X_i-\mu\right)^{T}$$
[0082] Where n represents the total number of low-dimensional features, $\mu$ represents the mean vector of the probability distribution of low-dimensional features in the same cluster group, and $X_i$ represents the i-th low-dimensional vector.
[0083] The Mahalanobis distance from a low-dimensional feature to the center of the probability distribution can be calculated using the following formula:
[0084] $$D_M(X_i)=\sqrt{\left(X_i-\mu\right)^{T}\Sigma^{-1}\left(X_i-\mu\right)}$$
[0085] Where $\Sigma^{-1}$ represents the inverse of the covariance matrix, $\mu$ represents the center of the probability distribution, $X_i$ represents the i-th low-dimensional vector, and $\left(X_i-\mu\right)^{T}$ is the transpose of the difference between the i-th low-dimensional vector and the center of the probability distribution.
[0086] The Mahalanobis distance represents the distance between the i-th low-dimensional vector and the center of the probability distribution. The smaller the Mahalanobis distance, the closer the i-th low-dimensional vector is to the center of the probability distribution within the class, and the higher its confidence level.
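The mean vector, covariance matrix, and Mahalanobis distance can be computed together as in the sketch below. The function name is hypothetical, and the use of the sample covariance (normalized by n - 1, which is what `np.cov` does by default) is an assumption made for the example.

```python
import numpy as np


def mahalanobis_distances(x):
    """Mahalanobis distance of each row of x to the mean of all rows."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)                      # center of the distribution
    cov = np.cov(x, rowvar=False)            # sample covariance (n - 1 normalization)
    cov_inv = np.linalg.inv(cov)             # raises LinAlgError if cov is singular
    centered = x - mu
    # Quadratic form (x_i - mu)^T cov_inv (x_i - mu) for every row i.
    sq = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(sq)
```

Rows with the smallest distances sit closest to the cluster center and would receive the highest confidence; a row far from the rest of the cluster stands out with the largest distance.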
[0087] The confidence level of the i-th low-dimensional vector can be obtained by the following probability density function (PDF) transformation:
[0088] $$\mathrm{conf}(X_i)=\frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}\left(X_i-\mu\right)^{T}\Sigma^{-1}\left(X_i-\mu\right)\right)$$
[0089] Where $X_i$ represents the i-th low-dimensional vector, $\Sigma$ represents the covariance matrix between the low-dimensional features and the center of the probability distribution, $\mu$ represents the center of the probability distribution, and $d$ is the dimension of the low-dimensional vectors.
[0090] When $\mathrm{conf}(X_i)$ is not less than the minimum confidence level $\mathrm{conf}_{\min}$, the confidence level of the low-dimensional feature meets the computational requirements, and its corresponding i-th low-dimensional vector is assigned to the same cluster group. The intra-cluster threshold $\tau$ of this cluster group can then be calculated using the following formula:
[0091] $$\tau=\sqrt{-2\ln\left(\mathrm{conf}_{\min}\,(2\pi)^{d/2}\,|\Sigma|^{1/2}\right)}$$
[0092] Where $\mathrm{conf}_{\min}$ characterizes the lowest confidence level, and $\Sigma$ represents the covariance matrix between the low-dimensional features and the center of the probability distribution.
[0093] During the data cleaning process, based on the confidence level of each feature in the low-dimensional features, the proportion of difficult samples retained in the low-dimensional features is maximized by using intra-class thresholds and Lagrange interpolation. Outliers in the low-dimensional features are deleted or corrected to obtain cleaned and repaired data. Then, Lagrange interpolation is performed on the repaired data, and the interpolated data is deduplicated a second time to achieve data cleaning of the multimedia data.
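The interpolation and second deduplication steps can be illustrated with a short sketch. The function names are hypothetical, and how interpolation points are chosen from the repaired data is not specified here, so the example only shows the two basic operations.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial passing through the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(set(xs)) != len(xs):
        raise ValueError("xs and ys must match in length and xs must be distinct")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # basis polynomial factor
        total += term
    return total


def deduplicate(rows):
    """Drop rows whose attribute values repeat an earlier row, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

For instance, interpolating through the points (0, 0), (1, 1), and (2, 4) reproduces the parabola through them, so a missing value at x = 3 would be filled with approximately 9.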
[0094] Outliers may include, but are not limited to, at least one of the following: duplicate data, data distribution outliers, format outliers, and logical outliers.
[0095] For example, data cleaning can be performed using one or more of the following methods:
[0096] Method 1: By comparing the attribute values of the low-dimensional features, at least two sets of low-dimensional features with the same attribute values are defined as duplicate data, and at least one set of duplicate data is deleted. After the redundant data is removed, the overall amount of data analyzed is reduced, and the system's efficiency is improved.
[0097] Method 2: Use a box plot to construct the quartiles of the low-dimensional features, determine the interquartile range (IQR) based on the quartiles, and delete as outliers the low-dimensional features whose values lie beyond the upper or lower limit, i.e., values that significantly deviate from the distribution.
[0098] A box plot consists of five data points: the minimum observation, the lower quartile (25th percentile) (Q1), the median, the upper quartile (75th percentile) (Q3), and the maximum observation. The minimum and maximum observations can be represented by the following formulas:
[0099] Minimum observation $= Q1 - 1.5 \times \mathrm{IQR}$,
[0100] Maximum observation $= Q3 + 1.5 \times \mathrm{IQR}$,
[0101] $\mathrm{IQR} = Q3 - Q1$,
[0102] where IQR represents the distance between the lower quartile Q1 and the upper quartile Q3;
[0103] The whiskers of the box plot are 1.5 IQR in length and form two segments.
[0104] The two whisker segments extend from Q1 to the minimum and from Q3 to the maximum, respectively.
[0105] Outliers that are identified as significant deviations from the distribution (i.e., values exceeding the maximum or falling below the minimum observed value) are directly deleted. By removing outliers, misleading analytical conclusions are reduced, and the accuracy of the assessment is improved.
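The box-plot rule of Method 2 can be sketched with the Python standard library. The function name `iqr_filter` and the choice of the "inclusive" quantile method are assumptions; different quantile conventions give slightly different fences.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the usual whisker length."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]
```

Applied to the values 1, 2, 3, 4, 5, 100, the value 100 lies above the upper fence and is dropped, while the other five values are kept.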
[0106] Method 3: Determine whether the format of the low-dimensional feature meets expectations by using at least one of data type matching, regular expression, and value range verification, and correct the format of outliers that do not meet expectations according to a preset format.
[0107] For example, inconsistent text formatting or inconsistent content fields in a table can be handled as format outliers. These include full-width characters mixed among half-width characters, inconsistent date formats, inconsistent time formats, and inconsistent numbers of decimal places. Without limitation, at least one of data type matching, regular expressions, and value range validation can be used to determine whether the formats of multiple low-dimensional features are consistent. Standardizing data formats enhances data comparability and reduces the complexity of analysis.
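The three checks named in Method 3 (data type matching, regular expressions, and value range validation) can be illustrated as follows. The target date format `YYYY-MM-DD`, the age range, and all function names are assumptions chosen for the example, not formats prescribed by this application.

```python
import re
import unicodedata

# Assumed preset format: ISO-style date using ASCII digits only.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date(text):
    """Regular-expression check against the preset date format."""
    return DATE_PATTERN.fullmatch(text) is not None


def is_valid_age(value, low=0, high=150):
    """Data type check followed by a value range check (assumed range)."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high


def correct_date(text):
    """Convert full-width characters and common separators to the preset format.

    Returns the corrected string, or None if it still does not match.
    """
    cleaned = unicodedata.normalize("NFKC", text).strip()
    cleaned = cleaned.replace("/", "-").replace(".", "-")
    return cleaned if is_valid_date(cleaned) else None
```

Values that cannot be brought into the preset format are returned as `None`, so a caller can decide whether to delete them or handle them another way.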
[0108] Method 4: Correct outliers that do not meet expectations by filling them with the median value through business logic constraints.
[0109] Logic that does not conform to expectations can result from logical errors in the identified content, for example obvious errors in the text of idioms or proverbs recognized in images, or logical errors in data such as an identified age of over 150 years.
[0110] If a low-dimensional feature has few logical errors and is numerical data, it is considered correctable, for example when a weight or an age is negative. The correction can be made by imputing the median value. Removing illogical data makes the data more realistic and avoids absurd errors, thus improving the accuracy of the analytical conclusions.
[0111] Suppose there's only one logical error in the low-dimensional feature: age -130 years. Then, we can extract the median of the normal data in the target format of the low-dimensional feature and replace the outlier with this value. The median can be the value at the middle of the sorted sequence, thus avoiding the influence of extreme values and preserving the central tendency of the data. For example, we can replace the outlier -130 years with the median of the age-formatted data in the low-dimensional feature, which is 35 years old. Using the median to impute missing data ensures a complete data structure, sufficient fields, prevents bias in analysis results due to missing values, and guarantees data sufficiency.
[0112] If a low-dimensional feature has too many logical errors, it can be directly deleted, or Lagrange interpolation can be performed on it, returning to the step of determining whether the logical errors are below the logical error threshold based on business logic constraints.
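The median imputation of Method 4 can be sketched as follows, reusing the age example above. The helper name `impute_with_median`, the validity rule (0 to 150 years), and the sample values are illustrative assumptions.

```python
from statistics import median


def impute_with_median(values, is_valid):
    """Replace values that violate a business rule with the median of the valid ones."""
    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid values to take a median from")
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]


# One logical error (-130) among otherwise plausible ages.
ages = [29, 33, 35, -130, 41, 52]
repaired = impute_with_median(ages, lambda a: 0 <= a <= 150)
```

Here the median of the valid ages is 35, so the outlier -130 is replaced by 35 and the number of records is preserved.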
[0113] S103: Perform semantic security verification on the multimedia data through a chosen-plaintext attack and obtain a semantic security score.
[0114]
[0115] The chosen-plaintext attack of S103 can be implemented through, but is not limited to, the following method:
[0116] A semantic security model is used to construct two oracles, left and right, which answer queries using different plaintexts. If an attacker can distinguish which plaintext was used to encrypt the multimedia data under an IND-CPA (chosen-plaintext) attack, the encryption method is deemed insecure; if the attacker cannot distinguish which plaintext was used, the encryption method is deemed semantically secure.
[0117] For example, suppose attacker A is given access to an oracle that resides in one of two different worlds, which differ in how the ciphertext is computed. Given any pair of messages of equal length as input, the oracle returns a ciphertext, and each world corresponds to one of the two possible ways of computing this ciphertext. The left-or-right oracle is denoted $\mathcal{E}_K(\mathrm{LR}(\cdot,\cdot,b))$, which indicates that one of the two messages is encrypted according to the bit b.
[0118] The two worlds in which the two oracles exist are as follows:
[0119] World0: The oracle provided to attacker A is $\mathcal{E}_K(\mathrm{LR}(\cdot,\cdot,0))$. Whenever the attacker sends a plaintext query $(M_0, M_1)$ to the oracle, where $|M_0| = |M_1|$, the oracle calculates $C \leftarrow \mathcal{E}_K(M_0)$ and returns C as the answer.
[0120] World1: The oracle provided to attacker A is $\mathcal{E}_K(\mathrm{LR}(\cdot,\cdot,1))$. Whenever the attacker sends a plaintext query $(M_0, M_1)$ to the oracle, where $|M_0| = |M_1|$, the oracle calculates $C \leftarrow \mathcal{E}_K(M_1)$ and returns C as the answer.
[0121] Consider the oracle as a subroutine accessible to attacker A. Attacker A performs an oracle query by calling the subroutine with parameters (M0, M1), and the oracle returns the answer C (i.e., the ciphertext). Attacker A has no control over how the ciphertext is computed and cannot see the internal workings of the subroutine, because attacker A is given only the interface to the subroutine and not the key. Attacker A can therefore only treat the oracle as a black box and try to determine whether the returned answer C comes from the left or the right oracle.
[0122] During semantic security verification, the attacker can use the chosen-plaintext attack to select the sequence of message pairs, and in this way attempts to break the symmetric encryption algorithm.
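[0123] Assume $\mathcal{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ is a symmetric encryption system. In the process of verifying semantic security through a chosen-plaintext attack, the left and right oracles can be represented by the following formula:
[0124] $\mathcal{E}_K(\mathrm{LR}(M_0, M_1, b)) = \mathcal{E}_K(M_b)$ if $|M_0| = |M_1|$, and $\bot$ otherwise, where $b \in \{0, 1\}$.
[0125] The reliability of this symmetric encryption system under the chosen-plaintext attack can be verified by the following formula:
[0126] $\mathrm{Adv}(A) = \Pr[A \text{ answers World1} \mid \text{World1}] - \Pr[A \text{ answers World1} \mid \text{World0}]$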
[0127] The world in which the oracle resides, left or right, is selected only once, before attacker A begins interacting with the oracle, and it remains fixed for all message inputs.
[0128] In World0, all message pairs sent to the oracle are responded to by the oracle encrypting the left-hand message of the pair. In World1, all message pairs are responded to by the oracle encrypting the right-hand message of the pair. Attacker A needs to determine whether the oracle they are interacting with is the left-hand oracle or the right-hand oracle.
[0129] The probability that attacker A answers World1 when interacting with the World1 oracle, minus the probability that attacker A answers World1 when interacting with the World0 oracle (that is, the probability of guessing incorrectly), is calculated to obtain the probability difference. Based on this probability difference, the semantic security verification result under the chosen-plaintext attack is obtained. If this probability difference is negligible, the attacker cannot distinguish whether the answer C encrypts plaintext M0 or M1 under the chosen-plaintext attack. In this case, it can be determined that the encryption algorithm is semantically secure, that is, the semantic security verification of the algorithm under the chosen-plaintext attack is successful.
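The left-or-right game described above can be sketched as follows. This is an illustration under stated assumptions, not the verification procedure of this application: the two toy ciphers (a SHA-256 keystream with and without a random nonce), the attacker `repeat_query_attacker`, and the trial count are all hypothetical, and the toy ciphers are not meant for real use.

```python
import hashlib
import os


def _keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce and a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_deterministic(key, message):
    """Same key and message always give the same ciphertext."""
    stream = _keystream(key, b"", len(message))
    return bytes(m ^ s for m, s in zip(message, stream))


def encrypt_randomized(key, message):
    """A fresh random nonce makes repeated encryptions differ."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(message))
    return nonce + bytes(m ^ s for m, s in zip(message, stream))


def make_lr_oracle(encrypt, key, b):
    """Left-or-right oracle: encrypts M0 in World0 (b=0) and M1 in World1 (b=1)."""
    def oracle(m0, m1):
        if len(m0) != len(m1):
            return None  # queries must have equal length
        return encrypt(key, m1 if b == 1 else m0)
    return oracle


def repeat_query_attacker(oracle):
    """Guess the world by checking whether the left message was encrypted twice."""
    x, y = b"\x00" * 16, b"\xff" * 16
    c_first = oracle(x, x)   # encrypts x in both worlds
    c_second = oracle(x, y)  # encrypts x in World0, y in World1
    return 0 if c_first == c_second else 1


def estimate_advantage(encrypt, attacker, trials=200):
    """Estimate Pr[answer World1 | World1] - Pr[answer World1 | World0]."""
    answered_one = {0: 0, 1: 0}
    for b in (0, 1):
        for _ in range(trials):
            oracle = make_lr_oracle(encrypt, os.urandom(16), b)
            answered_one[b] += attacker(oracle)
    return answered_one[1] / trials - answered_one[0] / trials
```

Against the deterministic cipher this attacker always identifies the world, so the estimated probability difference is 1 and the cipher fails the check. Against the randomized cipher the same attacker gains nothing. A small difference for one attacker does not by itself prove semantic security, which requires the difference to be negligible for every efficient attacker.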
[0130] S104: In a higher-order logic theorem-proving environment, abstract models are built for the three processes of content security assessment, privacy information security assessment, and semantic security verification, and invariant constraints for the three processes are constructed. After noise perturbation is added to the multimedia data, the invariant constraints are input, and formal security verification is performed on the key attributes of the invariant constraints to obtain the formal security score of the multimedia data.
[0131] In S104, the three processes of content security assessment, privacy information security assessment and semantic security verification can be mapped to the formal verification tool Isabelle/HOL using the structured proof language Isar, forming three state transition rule modules.
[0132] For the image data input to a single state transition rule module, noise is added through a random interference error or a deterministic interference error;
[0133] Through the coupled differential equations of error propagation, the three state transition rule modules are subjected to interactive mechanized proof. When the solution space of the coupled differential equations satisfies the invariant constraints, it is verified whether the confidentiality of the content security assessment process falls below the minimum confidentiality threshold, whether the real-time performance of the privacy information security assessment process is achieved within the specified time window, and whether the reliability of the semantic security verification process satisfies the Lyapunov function proof.
[0134] The above system of coupled differential equations can be expressed by the following formula:
[0135]
[0136] In the formula, the first coefficient characterizes the natural decay of each state; the second coefficient characterizes the positive coupling between states, for example the coupling from real-time performance to confidentiality indicates that real-time performance enhances confidentiality; and the third coefficient characterizes the negative impact of the error on each state.
[0137] The confidentiality function of the content security assessment process can be denoted by C(t). It indicates the security attribute of the multimedia data and characterizes whether sensitive data is visible only to authorized entities. The above steps can be used to formally prove the probability that the process leaks sensitive data.
[0138] The real-time function of the privacy information security assessment process can be denoted by R(t). It indicates a time constraint on the multimedia data and is used to verify whether the privacy information security assessment process for the multimedia data can be completed within a specified time window. This specified time window can be a task deadline, a processing delay limit, or a worst-case execution time (WCET).
[0139] The reliability function of the semantic security verification process can be denoted by N(t). It represents the fault tolerance mechanism for the multimedia data and is used to verify whether the semantic security of the multimedia data can still satisfy the Lyapunov function proof in a noisy environment, thereby ensuring stable output and data consistency under simulated external interference (such as simulated noise) and internal interference (such as network jitter).
[0140] For example, the confidentiality of the content security assessment process meets the following conditions:
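[0141] C(0) is the initial confidentiality before the noisy environment is added,
[0142] and, for every time t, C(t) is greater than or equal to the minimum confidentiality threshold; that is, C(t) must not fall below the confidentiality threshold.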
[0143] The real-time nature of the privacy information security assessment process meets the following conditions:
[0144]
[0145] That is, the real-time target is achieved within the time window T.
[0146] The reliability function N(t) of the semantic security verification process must meet the following conditions:
[0147] When the reliability function of the semantic security verification process is less than the noise tolerance and the reliability of the semantic security verification process satisfies the Lyapunov function proof, the semantic security verification process is considered successful.
[0148] When the reliability function of the semantic security verification process is not less than the noise tolerance, it is determined whether the total value of external noise and internal noise exceeds the noise tolerance. External noise is added to the simulated image of the image data, and internal noise is added in the state rule module. If the reliability of the semantic security verification process satisfies the Lyapunov function proof when the total value of external noise and internal noise exceeds the noise tolerance, then the formal security verification is deemed to have failed.
[0149] In proving the Lyapunov function, it is only necessary to find the eigenvalues of the coefficient matrix, and its stability can be determined according to the following process:
[0150] If all eigenvalues of the system matrix A lie in the open left half-plane, that is, all eigenvalues of A have negative real parts, then the system's equilibrium state is asymptotically stable during semantic security verification. If the system's equilibrium state is still asymptotically stable even though the total value of external and internal noise exceeds the noise tolerance during semantic security verification, the semantic security verification result can be determined to be inaccurate.
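The stability test above can be sketched for a system with three states as follows. Instead of computing the eigenvalues directly, this sketch applies the Routh-Hurwitz criterion to the characteristic polynomial, which is an equivalent test for a 3x3 real matrix and needs only the standard library. The coefficient matrices below are hypothetical values chosen for illustration and are not taken from this application.

```python
def char_poly_coeffs(a):
    """Coefficients (a2, a1, a0) of det(sI - A) = s^3 + a2*s^2 + a1*s + a0."""
    trace = a[0][0] + a[1][1] + a[2][2]
    # Sum of the three principal 2x2 minors.
    minors = (a[0][0] * a[1][1] - a[0][1] * a[1][0]
              + a[0][0] * a[2][2] - a[0][2] * a[2][0]
              + a[1][1] * a[2][2] - a[1][2] * a[2][1])
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
           - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
           + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    return -trace, minors, -det


def is_asymptotically_stable(a):
    """True if every eigenvalue of the 3x3 matrix has a negative real part."""
    a2, a1, a0 = char_poly_coeffs(a)
    # Routh-Hurwitz conditions for a cubic polynomial.
    return a2 > 0 and a0 > 0 and a2 * a1 > a0


# Hypothetical coefficient matrices: decay on the diagonal, coupling off it.
weak_coupling = [[-1.0, 0.2, 0.0],
                 [0.1, -0.8, 0.1],
                 [0.0, 0.3, -1.2]]
strong_coupling = [[-1.0, 2.0, 0.0],
                   [2.0, -1.0, 0.0],
                   [0.0, 0.0, -1.0]]
```

With these values, `is_asymptotically_stable(weak_coupling)` is true, whereas `strong_coupling` has an eigenvalue of +1 and the test returns false.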
[0151] The total value of the aforementioned external noise and the internal noise is calculated as follows:
[0152]
[0153] Wherein E(t) represents the total value of the external noise and the internal noise; one term characterizes the external noise; the two weighting coefficients weight the external noise and the internal noise, respectively; C(t) characterizes the confidentiality function of the content security assessment process; R(t) represents the real-time function of the privacy information security assessment process; and the internal noise is expressed as the difference between C(t) and R(t).
[0154] For example, the aforementioned external noise and / or internal noise can be added not only to the formal security verification process, but also to the content security assessment process, and / or to the privacy information security assessment process;
[0155] If the total value of external noise and internal noise exceeds the noise tolerance, and the confidentiality of the content security assessment process falls below the minimum confidentiality threshold, or the real-time performance of the privacy information security assessment process is achieved within the specified time window, or the reliability of the semantic security verification process satisfies the Lyapunov function proof, then the formal security verification is deemed to have failed.
[0156] For example, if the total value of the external noise and the internal noise is not greater than the noise tolerance, and if the confidentiality function of the content security assessment process does not fall below the minimum confidentiality threshold, or the real-time function of the privacy information security assessment process exceeds the specified time window, or the reliability function of the semantic security verification process does not satisfy the Lyapunov function proof, then the formal security verification is determined to have failed.
[0157] If, when the total value of external noise and internal noise is not greater than the noise tolerance, the confidentiality of the content security assessment process does not fall below the minimum confidentiality threshold, or the real-time performance of the privacy information security assessment process is achieved within the specified time window, or the reliability of the semantic security verification process satisfies the Lyapunov function proof; and if, when the total value of external noise and internal noise is greater than the noise tolerance, the confidentiality of the content security assessment process falls below the minimum confidentiality threshold, or the real-time performance of the privacy information security assessment process cannot be achieved within the specified time window, or the reliability of the semantic security verification process does not satisfy the Lyapunov function proof, then the formal security verification can be determined to be successful based on this change process.
[0158] S105: Combine content security score, privacy information security score, semantic security score and formal security score to determine whether multimedia data is secure.
[0159] For example, if any one of the content security score, privacy information security score, semantic security score, and formal security score is 0, the multimedia data is deemed insecure. If none of the above four scores are 0, the security score of the multimedia data is calculated using the following formula.
[0160]
[0161] Wherein C(t) represents the confidentiality function of the content security assessment process, and its weight is denoted by w_C; R(t) represents the real-time function of the privacy information security assessment process, and its weight is denoted by w_R; N(t) represents the reliability function of the semantic security verification process, and its weight is denoted by w_N; and F(t) represents the security function of the formal verification process, and its weight is denoted by w_F.
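The combination step of S105 can be sketched as follows. The sketch assumes that the four functions are combined as a weighted sum and that the four scores are the values of C(t), R(t), N(t) and F(t). Both are assumptions made for this example, as are the function name and the equal weights.

```python
def overall_security_score(scores, weights):
    """Combine the four scores; return None when any score is 0 (deemed insecure)."""
    names = ("C", "R", "N", "F")
    if any(scores[name] == 0 for name in names):
        return None
    # Assumed form of the combination: a weighted sum of the four functions.
    return sum(weights[name] * scores[name] for name in names)


equal_weights = {"C": 0.25, "R": 0.25, "N": 0.25, "F": 0.25}
score = overall_security_score({"C": 1, "R": 0.5, "N": 0.8, "F": 1}, equal_weights)
blocked = overall_security_score({"C": 1, "R": 0.5, "N": 0.8, "F": 0}, equal_weights)
```

A zero in any one dimension short-circuits the result, so a strong score elsewhere cannot compensate for a failed dimension.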
[0162] The confidentiality function C(t) can be obtained as follows:
[0163] The confidentiality function is determined based on confidentiality and data integrity.
[0164] The formula for calculating confidentiality is as follows:
[0165] $H = -\sum_{i=0}^{255} P_i \log_2 P_i$
[0166] In this formula, H represents the confidentiality of the encrypted image data to be detected (this parameter reflects whether the message of the encrypted image data has been leaked), and $P_i$ represents the probability with which pixel value i occurs in the encrypted image. The closer the encrypted image is to white noise, the better the confidentiality of the encrypted image data.
[0167] Here, 255 represents the maximum gray level (L-1) of a pixel value in a grayscale image, meaning the pixel value range of a grayscale image is [0, 255]. A color image is divided into three channels, R, G, and B, each with a maximum value of 255. With the entropy of the pixel values taken as the confidentiality of the image, the confidentiality of an original image is < 7, the confidentiality of a random image (or an ideal encrypted image) is ≈ 7.999, and a confidentiality > 7.8 for the encrypted image to be detected indicates that the confidentiality of the encrypted image data meets the standard.
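The entropy-based confidentiality measure can be sketched as follows for a single 8-bit channel. The function names and the flat list of pixel values are assumptions for the example; the 7.8 threshold is the one stated above.

```python
from collections import Counter
from math import log2


def pixel_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def meets_confidentiality(pixels, threshold=7.8):
    """True if the entropy of the encrypted image exceeds the threshold."""
    return pixel_entropy(pixels) > threshold


uniform_channel = list(range(256)) * 4   # every gray level equally likely
flat_channel = [128] * 1024              # a single gray level
```

A channel in which all 256 gray levels are equally likely reaches the maximum of 8 bits, while a channel with one gray level has entropy 0 and clearly fails the check.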
[0168] Integrity W can be obtained by using hash verification on the encrypted image data, thus quantifying the tampered frames or pixels. If the tampered data in the encrypted image data does not exceed 5% of the total data, then the data integrity is not a problem. W can be represented by 1 to indicate that the data integrity is not a problem, and 0 to indicate that the data integrity is a problem.
[0169] If both confidentiality and integrity are met, C(t) takes the value of 1; otherwise, it takes the value of 0.
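The integrity flag W and the resulting value of C(t) can be sketched as follows. The sketch assumes that a reference SHA-256 digest is available for each frame and that tampering is counted per frame. The function names and this per-frame granularity are illustrative assumptions; the 5% and 7.8 limits are the ones stated above.

```python
import hashlib


def frame_digest(frame):
    """SHA-256 digest of one frame given as bytes."""
    return hashlib.sha256(frame).hexdigest()


def integrity_flag(frames, reference_digests, max_tampered_ratio=0.05):
    """Return 1 if at most 5% of the frames fail hash verification, else 0."""
    if not frames or len(frames) != len(reference_digests):
        raise ValueError("frames and reference digests must be non-empty and aligned")
    tampered = sum(
        1 for frame, digest in zip(frames, reference_digests)
        if frame_digest(frame) != digest
    )
    return 1 if tampered / len(frames) <= max_tampered_ratio else 0


def confidentiality_function(entropy, integrity, entropy_threshold=7.8):
    """C(t) is 1 only when both confidentiality and integrity are met."""
    return 1 if entropy > entropy_threshold and integrity == 1 else 0
```

For example, with 100 frames, five mismatching digests still give W = 1, while six give W = 0 and therefore C(t) = 0 regardless of the entropy.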
[0170] Schematic representation: The confidentiality function described above can be expressed by the following formula:
[0171]
[0172] C(t) represents the confidentiality function of the encrypted image data to be detected, H represents the confidentiality of the encrypted image data, g(x) represents the confidentiality of the ideal encrypted image, x represents the ideal encrypted image, and W represents the integrity of the encrypted image data to be detected.
[0173] Let T (threshold) be the maximum allowed processing time for the privacy information security assessment process of the above multimedia data, and let t be the actual processing time. When t approaches 0, the value of R(t) approaches 1, indicating high real-time performance. As t increases, R(t) decreases; when t > T, the value of R(t) is 0, which does not meet the requirements.
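One function with the behaviour described above can be sketched as follows. The linear decay 1 - t/T is an assumption made for this example. The description only requires that R(t) approach 1 as t approaches 0, decrease as t increases, and equal 0 when t > T.

```python
def real_time_function(t, max_time):
    """Assumed linear form of R(t): 1 at t = 0, falling to 0 at the deadline."""
    if max_time <= 0:
        raise ValueError("max_time must be positive")
    if t < 0:
        raise ValueError("t must be non-negative")
    if t > max_time:
        return 0.0  # deadline missed: requirement not met
    return 1.0 - t / max_time
```

With a deadline of 10 time units, an assessment finishing at t = 5 scores 0.5 and one finishing at t = 11 scores 0.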
[0174] The reliability function N(t) can be obtained as follows: during the extraction of low-dimensional features from key parts of the multimedia data, white noise of a specified decibel level is added. The change in the MSE (mean squared error) over a certain time range is evaluated, this change is fitted to a one-dimensional function, and the range is then normalized to obtain N(t), whose value range is 0 to 1.
[0175] The security function F(t) of the formal verification process is obtained as follows:
[0176] During the formal security proof process, if all three functions satisfy the constraints, i.e. the set minimum target values, F(t) is set to 1; otherwise, it is immediately set to 0, indicating that the formal security requirement is not met.
[0177] During formal verification, the above weights can be adjusted according to environmental requirements.
[0178] For example: when there are potential attack risks (e.g., data leakage risks) in the environment where the multimedia data is located, increase w_C.
[0179] For example: when the environment where the multimedia data is located has a high risk of overload, or when security verification time is tight, increase w_R.
[0180] For example: when the noise in the environment where the multimedia data is located increases significantly (e.g., noticeable network jitter), increase w_N.
[0181] For example, excessive noise interference was detected when attacking encrypted images.
[0182] This application embodiment also provides a multimedia data security assessment device, including:
[0183] The clustering module is used to divide multiple features of a single image in multimedia data into multiple cluster groups based on spatial similarity using the spectral clustering algorithm.
[0184] The content security scoring module is used to identify key parts of the features of multiple clusters based on the content recognition model, and to evaluate the content security of multimedia data based on the content recognition results to obtain a content security score.
[0185] The privacy information security scoring module is used to assess the confidence level of multimedia data through ULDC, clean the multimedia data based on the confidence assessment results and Lagrange interpolation to obtain cleaned and repaired data, assess the privacy information security of the repaired data, and obtain a privacy information security score.
[0186] The semantic security scoring module is used to perform semantic security verification on multimedia data through a chosen-plaintext attack and obtain a semantic security score.
[0187] The formal security scoring module is used to abstractly model the three processes of content security assessment, privacy information security assessment, and semantic security verification in a higher-order logic theorem-proving environment. It constructs invariant constraints for the three processes, adds noise perturbation to the multimedia data, inputs the invariant constraints, performs formal security verification on the key attributes of the invariant constraints, and obtains the formal security score of the multimedia data.
[0188] The comprehensive scoring module is used to determine whether the multimedia data is secure by combining content security score, privacy information security score, semantic security score, and formal security score.
[0189] The content security scoring module in the aforementioned multimedia data security measurement device is also used for:
[0190] Before performing content recognition on key parts of features from multiple clusters based on content recognition models:
[0191] The convolutional feature maps of the original samples are clustered into multiple different cluster groups;
[0192] Convolutional feature maps belonging to the same cluster group are fused into a mask map of key parts based on spatial similarity. The spatial size of the mask map is the same as that of the convolutional feature map.
[0193] The mask map of each key part is input into the denoising diffusion model, and the forward diffusion process and the reverse denoising process are performed respectively until a new sample that has learned the probability distribution of the original sample is generated.
[0194] The mean squared error between the new sample and the original sample is calculated, and the content recognition model is trained according to the inverse relationship between the mean squared error and the content recognition result.
[0195] The content recognition model in the content security scoring module of the aforementioned multimedia data security measurement device adopts the ResNet-50 network. After the network training is completed, the batch normalization layer, activation layer and fully connected layer in the ResNet-50 network can be deleted, and the five convolutional layers of the ResNet-50 network can be retained. These five convolutional layers can realize the content recognition of the input image through forward propagation.
[0196] The aforementioned content security scoring module can also be used to classify multiple features with a spatial distance of less than a preset threshold into the same cluster group based on spatial similarity using a spectral clustering algorithm for a single image in the multimedia data. Each cluster group includes multiple appearance features and multiple relationship features.
[0197] In each cluster group, the multiple appearance features are fused based on the multiple relational features to obtain the fused features corresponding to the key parts;
[0198] The fused features are input into the content recognition model, which predicts the content recognition categories of multiple key parts of the multimedia data. The content recognition result is determined based on the mean squared error between the true category values and the content recognition categories of the multiple key parts. The mean squared error between the true category values and the content recognition categories is calculated using the following formula:
[0199] $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (P_i - C_i)^2$
[0200] Wherein MSE represents the mean square error, $P_i$ represents the content recognition category of the i-th key part, $C_i$ represents the true value of its category, and n represents the number of key parts;
[0201] The multimedia data is assessed for content security based on the content recognition results. The larger the mean square error, the greater the deviation of the content recognition results, and the higher the content security coefficient of the multimedia data.
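The mean squared error between predicted and true categories can be sketched as follows. Encoding each category as a number, and the sample label values, are assumptions made for this example.

```python
def mean_squared_error(predicted, actual):
    """Mean of the squared differences between predicted and true category values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - c) ** 2 for p, c in zip(predicted, actual)) / len(predicted)


# Numeric category labels for three key parts (illustrative values).
true_categories = [2, 1, 3]
recognized_after_encryption = [2, 0, 1]
mse = mean_squared_error(recognized_after_encryption, true_categories)
```

A recognizer that still reads every key part correctly gives an MSE of 0, which under the rule above corresponds to the lowest content security coefficient.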
[0202] The privacy information security scoring module can also be used for:
[0203] Features in the same cluster group are compressed to obtain low-dimensional features of key parts. The probability distribution of the low-dimensional features is determined based on ULDC. The confidence of the low-dimensional features and the intra-cluster threshold in the same cluster group are determined based on the Mahalanobis distance from the low-dimensional features to the center of the probability distribution.
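The Mahalanobis distance from a low-dimensional feature to the center of its cluster can be sketched as follows for two-dimensional features. The sketch estimates the mean and covariance directly from the cluster samples. How ULDC models the distribution and how the distance is mapped to a confidence value are not shown here, and the function name and sample points are illustrative assumptions.

```python
from math import sqrt


def mahalanobis_2d(point, samples):
    """Mahalanobis distance from a 2-D point to the mean of the samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # Quadratic form d^T * inverse(covariance) * d for the 2x2 case.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(squared)


cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
```

A feature at the cluster center has distance 0, and the distance grows as the feature moves away from the center relative to the spread of the cluster, so a smaller distance can be read as higher confidence.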
[0204] Based on the confidence level of each feature in the low-dimensional features, the proportion of difficult samples retained in the low-dimensional features is maximized by using the intra-class threshold and Lagrange interpolation, and outliers in the low-dimensional features are deleted or corrected to obtain cleaned and repaired data. The outliers include at least one of duplicate data, data distribution outliers, format outliers, and logical outliers.
[0205] The repaired data is subjected to Lagrange interpolation, and the interpolated data is then deduplicated a second time to achieve data cleaning of the multimedia data.
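Lagrange interpolation of a repaired value can be sketched as follows. The function name and the sample points are assumptions for the example, and the sample positions must be distinct.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # basis polynomial factor
        total += term
    return total


# Known samples at positions 0, 1 and 3; the value at position 2 is missing.
positions = [0, 1, 3]
values = [0, 1, 9]
filled = lagrange_interpolate(positions, values, 2)
```

The three samples lie on the curve y = x squared, so the interpolated value at position 2 is 4 up to floating-point rounding.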
[0206] Optionally, outliers in the low-dimensional features may be deleted or corrected, including at least one of the following methods:
[0207] Compare the attribute values of the low-dimensional features, define at least two sets of low-dimensional features with the same attribute values as duplicate data, and delete at least one set of duplicate data; and / or
[0208] Construct the quartiles of the low-dimensional features using a box plot, determine the interquartile range (IQR) based on the quartiles, delete the low-dimensional features corresponding to the box-whisker portion of the box plot; and / or;
[0209] By using at least one of data type matching, regular expressions, and value range validation, determine whether the format of the low-dimensional feature conforms to expectations, and correct the format of outliers that do not conform to expectations according to a preset format; and / or
[0210] By using business logic constraints, outliers that do not meet expectations are corrected by filling in the median value.
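The first option above, deleting duplicate data by comparing attribute values, can be sketched as follows. Representing each low-dimensional feature as a dictionary, and the attribute names, are assumptions for the example.

```python
def drop_duplicates(records, keys):
    """Keep the first record for each distinct combination of attribute values."""
    seen = set()
    unique = []
    for record in records:
        signature = tuple(record[key] for key in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(record)
    return unique


features = [
    {"part": "face", "x": 10, "y": 20},
    {"part": "plate", "x": 40, "y": 80},
    {"part": "face", "x": 10, "y": 20},  # same attribute values as the first
]
deduplicated = drop_duplicates(features, keys=("part", "x", "y"))
```

The third record repeats the attribute values of the first and is removed, leaving two records in their original order.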
[0211] The formal security scoring module can also be used for:
[0212] The content security assessment process, the privacy information security assessment process, and the semantic security verification process are mapped to the formal verification tool Isabelle/HOL using the structured proof language Isar, forming three state transition rule modules.
[0213] For the image data input to the single state transition rule module, noise is added through random interference error or deterministic interference error;
[0214] Through the coupled differential equations of error propagation, the three state transition rule modules are subjected to interactive mechanized proof. When the solution space of the coupled differential equations satisfies the invariant constraints, it is verified whether the confidentiality of the content security assessment process falls below the minimum confidentiality threshold, whether the real-time performance of the privacy information security assessment process is achieved within the specified time window, and whether the reliability of the semantic security verification process satisfies the Lyapunov function proof.
[0215] Optionally, the formal security scoring module can also be used to: add external noise to a simulated image of the image data input to the single state transition rule module, and add internal noise to the state rule module;
[0216] If the total value of the external noise and the internal noise exceeds the noise tolerance, and the confidentiality function of the content security assessment process falls below the minimum confidentiality threshold, or the real-time function of the privacy information security assessment process falls within a specified time window, or the reliability function of the semantic security verification process satisfies the Lyapunov function proof, then the formal security verification is deemed to have failed.
[0217] Optionally, the formal security scoring module can also be used to: add external noise to a simulated image of the image data input to the single state transition rule module, and add internal noise to the state rule module;
[0218] If the total value of the external noise and the internal noise is not greater than the noise tolerance, and if the confidentiality function of the content security assessment process does not fall below the minimum confidentiality threshold, or if the real-time function of the privacy information security assessment process exceeds the specified time window, or if the reliability function of the semantic security verification process does not satisfy the Lyapunov function proof, then the formal security verification is deemed to have failed.
[0219] Optionally, the total value of the external noise and the internal noise is calculated as follows:
[0220] Wherein E(t) represents the total value of the external noise and the internal noise; one term characterizes the external noise; the two weighting coefficients weight the external noise and the internal noise, respectively; C(t) characterizes the confidentiality function of the content security assessment process; R(t) represents the real-time function of the privacy information security assessment process; and the internal noise is expressed as the difference between C(t) and R(t).
[0221] This application embodiment uses a spectral clustering algorithm combined with a content recognition model to perform content recognition and content security scoring on key parts of the clustered features of multimedia data. ULDC is used to clean the multimedia data to obtain repaired data, and a privacy information security assessment is performed on the repaired data to determine whether recognizable content or privacy information can be directly obtained from the multimedia data. A chosen-plaintext attack is used to perform semantic security verification on the multimedia data to determine whether the semantics of the cleaned multimedia data can be extracted through semantic recognition, thereby determining whether the confidentiality of the multimedia data meets security requirements. Noise perturbation is applied to the content security assessment process, the privacy information security assessment process, and the semantic security verification process in a higher-order logic theorem-proving environment to verify their formal security under noise perturbation. The four security scores are combined to obtain a more comprehensive and accurate multimedia data security assessment that meets security verification requirements.
[0222] This application provides an electronic device, including:
[0223] processor;
[0224] A memory storing computer-readable instructions, which, when executed by the processor, implement the aforementioned method for security assessment of multimedia data.
[0225] It should be understood that the processor in the embodiments of this application can be a central processing unit (CPU), or it can be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor, etc.
[0226] It should also be understood that the memory in the embodiments of this application can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous linked DRAM (SLDRAM), and direct rambus RAM (DR RAM).
[0227] The above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions according to the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. Computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via infrared, microwave, or other means. A computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that contains one or more sets of available media. Available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), solid-state drives, etc.
[0228] It should be understood that the term "and/or" in this document merely describes an association between related objects and indicates that three relationships are possible. For example, A and/or B can represent: A alone, both A and B, or B alone, where A and B can each be singular or plural. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it, but it can also indicate an "and/or" relationship; refer to the context for the precise meaning.
[0229] In this application, "at least one" means one or more, and "more than one" means two or more. "At least one of the following" or similar expressions refer to any combination of the listed items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c can represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c can be a single item or multiple items.
[0230] The communication bus mentioned in the above electronic devices can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into address bus, data bus, control bus, etc.
[0231] The communication interface is used for communication between the aforementioned electronic device and other devices. The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor. The aforementioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0232] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
[0233] The embodiments in this specification are described in a related manner, and identical or similar parts of the embodiments may be understood by reference to one another. Each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the description of the method embodiments.
[0234] The above are merely preferred embodiments of this application and are not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application are included within the scope of protection of this application.
Claims
1. A multimedia data security measurement method based on multi-dimensional correlation, characterized in that the method comprises: for multimedia data, using a spectral clustering algorithm to divide multiple features of a single image in the multimedia data into multiple cluster groups based on spatial similarity; performing, based on a content recognition model, content recognition on key parts of the features of the multiple cluster groups; performing a content security assessment on the multimedia data based on the content recognition results to obtain a content security score; assessing the confidence of the multimedia data using the lower and upper bounds of the uncertainty confidence interval (ULDC); cleaning the multimedia data based on the confidence assessment results and Lagrange interpolation to obtain cleaned, repaired data; performing a privacy information security assessment on the repaired data to obtain a privacy information security score; performing semantic security verification on the multimedia data by means of a chosen-plaintext attack to obtain a semantic security score; in a higher-order logic theorem-proving environment, abstractly modeling the three processes of content security assessment, privacy information security assessment, and semantic security verification, and constructing invariant constraints for the three processes; after adding noise perturbation to the multimedia data, inputting the invariant constraints and formally verifying the key attributes of the invariant constraints to obtain a formal security score of the multimedia data; and determining the security of the multimedia data by combining the content security score, the privacy information security score, the semantic security score, and the formal security score.
2. The method as described in claim 1, characterized in that, before performing content recognition on key parts of the features of the multiple cluster groups based on the content recognition model, the method further comprises: clustering the convolutional feature maps of the original samples into multiple different cluster groups; fusing the convolutional feature maps belonging to the same cluster group into a mask map of a key part based on spatial similarity, the spatial size of the mask map being the same as that of the convolutional feature map; inputting the mask map of each key part into a denoising diffusion model, and performing forward diffusion and reverse diffusion respectively until a new sample that has learned the probability distribution of the original sample is generated; and comparing the new samples with the original samples by mean squared error, and training the content recognition model according to the inverse relationship between the mean squared error and the content recognition result.
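As an illustrative aside that is not part of the claim, the forward (noising) half of a diffusion model and the mean squared error comparison can be sketched as follows. The sketch assumes the widely used closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with a linear β schedule; the claim does not specify the schedule. The reverse process needs a trained denoising network and is omitted. The function names and the flattened four-element mask are hypothetical.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def mse(a, b):
    """Mean squared error between two equally sized flattened samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical flattened mask map of one key part.
mask = [0.0, 1.0, 1.0, 0.0]
alpha_bars = alpha_bar_schedule(10)
noisy = forward_diffuse(mask, alpha_bars[-1], random.Random(0))
error = mse(mask, noisy)  # grows as more noise is mixed in
```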
3. The method as described in claim 2, characterized in that, after training the content recognition model according to the inverse relationship between the mean squared error and the content recognition result, the method further comprises: using a ResNet-50 network as the content recognition model, wherein the batch normalization layer, activation layer, and fully connected layer of the ResNet-50 network are removed, the five convolutional layers of the ResNet-50 network are retained, and the five convolutional layers perform content recognition on the input image through forward propagation.
4. The method as described in claim 1, characterized in that using a spectral clustering algorithm to divide multiple features of a single image in the multimedia data into multiple cluster groups based on spatial similarity, performing content recognition on key parts of the features of the multiple cluster groups based on a content recognition model, and performing a content security assessment on the multimedia data based on the content recognition results to obtain a content security score comprises: for a single image in the multimedia data, using the spectral clustering algorithm to group multiple features whose spatial distance is less than a preset threshold into the same cluster group based on spatial similarity, each cluster group including multiple appearance features and multiple relationship features; in each cluster group, fusing the multiple appearance features based on the multiple relationship features to obtain the fused features corresponding to the key parts; inputting the fused features into the content recognition model and using the content recognition model to predict the content recognition categories of multiple key parts of the multimedia data; determining the content recognition result based on the mean squared error between the true category values and the predicted content recognition categories of the multiple key parts; and performing the content security assessment on the multimedia data based on the content recognition result, wherein the larger the mean squared error, the greater the deviation of the content recognition result and the higher the content security coefficient of the multimedia data.
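As an illustrative aside that is not part of the claim, the last two steps (mean squared error against the true category, then a coefficient that rises with that error) can be sketched as follows. Encoding the true category as a one-hot vector, comparing it with predicted class probabilities, and using the averaged error directly as the coefficient are all assumptions; the clustering and feature fusion steps are not shown.

```python
def one_hot(index, num_classes):
    """One-hot encoding of a true category index."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def mean_squared_error(true_vec, pred_vec):
    """MSE between a one-hot truth vector and a predicted probability vector."""
    if len(true_vec) != len(pred_vec):
        raise ValueError("vectors must have the same length")
    return sum((t - p) ** 2 for t, p in zip(true_vec, pred_vec)) / len(true_vec)


def content_security_coefficient(true_labels, predicted_probs):
    """Average per-key-part MSE; a larger value means recognition failed more
    badly on the encrypted data, which is read here as higher content security."""
    if not true_labels or len(true_labels) != len(predicted_probs):
        raise ValueError("need one prediction per key part")
    errors = [
        mean_squared_error(one_hot(label, len(probs)), probs)
        for label, probs in zip(true_labels, predicted_probs)
    ]
    return sum(errors) / len(errors)


# Two key parts, three categories: the recognizer is confidently wrong on both.
coefficient = content_security_coefficient(
    [0, 1], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
)
```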
5. The method as described in claim 1, characterized in that assessing the confidence of the multimedia data using ULDC and cleaning the multimedia data based on the confidence assessment results and Lagrange interpolation comprises: compressing the features in the same cluster group to obtain low-dimensional features of the key parts; determining the probability distribution of the low-dimensional features based on ULDC; determining the confidence of the low-dimensional features and the intra-cluster threshold of the same cluster group based on the Mahalanobis distance from the low-dimensional features to the center of the probability distribution; based on the confidence of each feature in the low-dimensional features, using the intra-cluster threshold and Lagrange interpolation to maximize the proportion of hard samples retained in the low-dimensional features, and deleting or correcting outliers in the low-dimensional features to obtain cleaned, repaired data, wherein the outliers include at least one of duplicate data, data distribution outliers, format outliers, and logical outliers; and performing Lagrange interpolation on the repaired data and deduplicating the interpolated data a second time to complete the data cleaning of the multimedia data.
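As an illustrative aside that is not part of the claim, the interplay of a Mahalanobis distance threshold and Lagrange interpolation can be sketched in one dimension, where the Mahalanobis distance reduces to |x − μ| / σ. The threshold of 2.0, the use of the four nearest clean neighbors as interpolation nodes, and the function names are assumptions; ULDC itself is not modeled.

```python
import statistics


def mahalanobis_1d(values):
    """Distance of each value to the distribution center, in standard deviations."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]


def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def repair_outliers(values, threshold=2.0, support=4):
    """Replace values beyond the distance threshold by interpolating from the
    nearest clean positions; all other values are returned unchanged."""
    distances = mahalanobis_1d(values)
    clean = [i for i, d in enumerate(distances) if d <= threshold]
    repaired = list(values)
    for i, d in enumerate(distances):
        if d > threshold and len(clean) >= 2:
            nearest = sorted(clean, key=lambda c: abs(c - i))[:support]
            xs = [float(c) for c in nearest]
            ys = [values[c] for c in nearest]
            repaired[i] = lagrange_interpolate(xs, ys, float(i))
    return repaired


# The value 40.0 lies about 2.2 standard deviations from the mean and is repaired.
repaired = repair_outliers([1.0, 2.0, 3.0, 40.0, 5.0, 6.0])
```

Keeping the number of interpolation nodes small limits the polynomial degree, which avoids the oscillation that high-degree Lagrange polynomials can show on evenly spaced points.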
6. The method as described in claim 5, characterized in that deleting or correcting outliers in the low-dimensional features includes at least one of the following: comparing the attribute values of the low-dimensional features, defining at least two sets of low-dimensional features with the same attribute values as duplicate data, and deleting at least one set of the duplicate data; and/or constructing the quartiles of the low-dimensional features using a box plot, determining the interquartile range (IQR) based on the quartiles, and deleting the low-dimensional features corresponding to the box-whisker portion of the box plot; and/or determining, by at least one of data type matching, regular expressions, and value range validation, whether the format of the low-dimensional features conforms to expectations, and correcting the format of outliers that do not conform to expectations according to a preset format; and/or correcting, by means of business logic constraints, outliers that do not meet expectations by filling in the median value.
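As an illustrative aside that is not part of the claim, the four cleaning options can be sketched with the Python standard library. Several details are assumptions: the box-plot option is read as deleting values beyond the conventional 1.5 × IQR whiskers, the preset format is taken to be a plain decimal number, and the business rule is passed in as a caller-supplied predicate. All function names are hypothetical.

```python
import re
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each row with identical attribute values."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_filter(values, k=1.5):
    """Delete values lying beyond the box-plot whiskers (assumed k * IQR fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")


def normalize_format(raw):
    """Coerce a raw string to the assumed preset format (a decimal number);
    return None when it still does not match after simple corrections."""
    text = raw.strip().replace(",", ".")
    return float(text) if NUMERIC.match(text) else None


def fill_with_median(values, is_valid):
    """Replace values that violate the business rule with the median of the rest."""
    valid = [v for v in values if is_valid(v)]
    if not valid:
        return list(values)
    median = statistics.median(valid)
    return [v if is_valid(v) else median for v in values]
```

For example, `fill_with_median([1.0, -5.0, 3.0], lambda v: v >= 0)` replaces the negative entry with the median of the two valid values.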
7. The method as described in claim 1, characterized in that, in the higher-order logic theorem-proving environment, abstractly modeling the three processes of content security assessment, privacy information security assessment, and semantic security verification, constructing data flow invariant constraints for the three processes, and formally verifying the key attributes of the invariant constraints comprises: mapping the content security assessment process, the privacy information security assessment process, and the semantic security verification process to the formal verification tool Isabelle/HOL using the structured proof language Isar, forming three state transition rule modules; for the image data input to a single state transition rule module, adding noise through random or deterministic interference errors; and subjecting the three state transition rule modules to interactive mechanized proof through the coupled differential equations of error propagation, wherein, when the solution space of the coupled differential equations satisfies the invariant constraints, it is verified whether the confidentiality of the content security assessment process falls below the minimum confidentiality threshold, whether the real-time performance of the privacy information security assessment process is achieved within the specified time window, and whether the reliability of the semantic security verification process satisfies the Lyapunov function proof.
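As an illustrative aside that is not part of the claim, the idea of coupled error propagation checked against a Lyapunov function can be explored numerically before attempting a mechanized proof. The sketch below is a simulation, not an Isabelle/HOL proof. It assumes a linear model de/dt = A·e for the errors of the three modules, a hypothetical coupling matrix A, explicit Euler integration, and the quadratic candidate V(e) = Σ eᵢ²; none of these are given in the claim.

```python
def step(errors, coupling, dt):
    """One explicit Euler step of de/dt = coupling @ errors."""
    n = len(errors)
    return [
        errors[i] + dt * sum(coupling[i][j] * errors[j] for j in range(n))
        for i in range(n)
    ]


def lyapunov(errors):
    """Quadratic Lyapunov candidate V(e) = sum of squared errors."""
    return sum(e * e for e in errors)


def simulate(initial, coupling, dt=0.01, steps=500):
    """Return the error trajectory of the three coupled modules."""
    trajectory = [list(initial)]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1], coupling, dt))
    return trajectory


def lyapunov_nonincreasing(trajectory):
    """True when V never increases along the simulated trajectory."""
    values = [lyapunov(e) for e in trajectory]
    return all(b <= a for a, b in zip(values, values[1:]))


# Hypothetical coupling: each module damps its own error and leaks a little
# error into its neighbor (content -> privacy -> semantic and back).
COUPLING = [
    [-1.0, 0.2, 0.0],
    [0.1, -1.0, 0.2],
    [0.0, 0.1, -1.0],
]

trajectory = simulate([0.3, -0.2, 0.1], COUPLING)
stable = lyapunov_nonincreasing(trajectory)
```

A trajectory on which V increases at any step is a counterexample for that candidate function, which is useful to find before formalizing the corresponding invariant.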
8. The method as described in claim 7, characterized in that the method further comprises: for the image data input to the single state transition rule module, adding external noise to the simulated image of the image data, and adding internal noise to the state transition rule module; and if the total value of the external noise and the internal noise exceeds the noise tolerance, and the confidentiality function of the content security assessment process falls below the minimum confidentiality threshold, or the real-time function of the privacy information security assessment process falls within a specified time window, or the reliability function of the semantic security verification process satisfies the Lyapunov function proof, then the formal security verification is deemed to have failed.
9. The method as described in claim 7, characterized in that the method further comprises: for the image data input to the single state transition rule module, adding external noise to the simulated image of the image data, and adding internal noise to the state transition rule module; and if the total value of the external noise and the internal noise is not greater than the noise tolerance, and if the confidentiality function of the content security assessment process does not fall below the minimum confidentiality threshold, or if the real-time function of the privacy information security assessment process exceeds the specified time window, or if the reliability function of the semantic security verification process does not satisfy the Lyapunov function proof, then the formal security verification is deemed to have failed.
10. An electronic device, characterized in that it comprises: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement the method as described in any one of claims 1 to 9.