A user information service privacy protection method based on identifier replacement

The method applies context feature encoding and anonymous relationship graph processing to IoT user information service request data and generates a replacement identifier pool. This addresses the re-identification risk in IoT user identifier anonymization while providing efficient privacy protection and preserving service quality.

CN122241748A · Pending · Publication Date: 2026-06-19 · SHANGHAI YUANTOU INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
SHANGHAI YUANTOU INFORMATION TECH CO LTD
Filing Date
2026-02-03
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In IoT privacy protection, existing technologies cannot adapt dynamically to user context and field sensitivity. As a result, identifiable fields can be reverse-constructed during anonymization, and service quality and availability suffer.

Method used

The method extracts service request data initiated by user terminals and performs context feature encoding. It then generates an anonymous sample set, analyzes sensitivity levels, and applies statistical fuzzing. Next, it constructs an anonymous relationship graph, strips identifiable information, and generates a replacement identifier pool. Finally, it establishes replacement mappings and outputs a replacement mapping body for anonymous interaction.

Benefits of technology

It makes replaced identifiers harder to re-identify in IoT environments. It also keeps replaced identifiers usable and behaviorally continuous along the service interaction link, and makes the privacy protection scheme more practical to deploy and more stable.

✦ Generated by Eureka AI based on patent content.

Figure CN122241748A_ABST

Abstract

This invention discloses a user information service privacy protection method based on identifier replacement, relating to the field of IoT privacy protection technology. The method includes: constructing an anonymous relationship graph based on the request data body and anonymity adaptation parameters; removing identifiable information from the anonymous relationship graph to output a de-identified relationship graph; dividing the anonymous relationship graph into multiple anonymous domains by calculating the entropy parameter of the de-identified relationship graph; statistically analyzing the entropy gradient and domain size information of each anonymous domain to generate a policy configuration body; extracting the topological features of the anonymous relationship graph and concatenating and encoding them to generate a replacement identifier pool; selecting candidate replacement identifiers from the replacement identifier pool according to the policy configuration body and establishing replacement mapping relationships; outputting a replacement mapping body; and replacing the request data body with the replacement mapping body for uplink anonymous interaction to form a response data payload. This invention improves the feasibility and stability of IoT privacy protection solutions in real-world service environments by dividing dynamic anonymous domains and constructing a replacement identifier pool.

Description

Technical Field

[0001] This invention relates to the field of Internet of Things (IoT) privacy protection technology, and in particular to a user information service privacy protection method based on identifier replacement.

Background Technology

[0002] IoT devices, mobile terminals, and user information services have spread rapidly. As a result, data exchange between terminals and services has become high-frequency, distributed, and cross-scenario, spanning browsers and smart homes, mobile applications and wearable devices, and in-vehicle devices and smart city platforms. User requests carry multi-source data, including path fields, behavioral actions, content expression, time context, network parameters, and terminal environment. This data is continuously fed into service decision-making for personalized recommendation, interest inference, content push, behavior prediction, and service quality optimization. At the same time, the industry is shifting from "data availability" to "data interpretability" and from "business statistics" to "behavioral modeling", so service requests are becoming increasingly fine-grained and strongly correlated. In IoT privacy protection, the joint expression of user identity, behavioral context, and environmental parameters forms implicit trajectories and implicit profiles. This makes service requests inherently identifiable and linkable. Traditional security models that focus only on data confidentiality can no longer cover these complex re-identification risks.

[0003] However, existing technologies still have shortcomings:

- Traditional encryption protects data content but cannot block re-identification paths.
- Identifier obfuscation and forgery change the form of user identifiers. In cross-request or cross-session scenarios, however, the original identifiers can still be recovered through pattern association and weak perturbation of value ranges.
- Anonymization based on models such as k-anonymity and l-diversity works for statistical data release. It struggles, however, to preserve business availability when the input is real-time service interaction behavior.
- Existing identifier anonymization mechanisms mostly rely on static rules or fixed perturbation strategies. They cannot adapt dynamically to user context, field sensitivity, or topological correlation, so identifiable fields can still be reverse-constructed under weak perturbation.
- Some methods pursue "unidentifiable data" while ignoring "service availability". The replacement identifier then becomes structurally or contextually inconsistent with the original identifier, which degrades service quality or causes interaction anomalies.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides a user information service privacy protection method based on identifier replacement. The method addresses two problems: the lack of dynamic adaptation to user context and field sensitivity, and the difficulty of preserving service availability.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] This invention provides a user information service privacy protection method based on identifier replacement, comprising: extracting service request data initiated by a user terminal and performing context feature encoding to obtain a request data body; performing field generalization and value range perturbation on the request data body to generate an anonymous sample set; analyzing the sensitivity level of the anonymous sample set and performing statistical fuzziness analysis to output anonymity adaptation parameters; constructing an anonymous relationship graph based on the request data body and the anonymity adaptation parameters, removing identifiable information from the anonymous relationship graph, and outputting a de-identified relationship graph; dividing the anonymous relationship graph into multiple anonymous domains by calculating the entropy parameter of the de-identified relationship graph; statistically analyzing the entropy gradient and domain size information of each anonymous domain to generate a policy configuration body; extracting the topological features of the anonymous relationship graph and performing concatenation encoding to generate a replacement identifier pool; selecting candidate replacement identifiers in the replacement identifier pool according to the policy configuration body and establishing a replacement mapping relationship, outputting a replacement mapping body; and replacing the request data body with the replacement mapping body for uplink anonymous interaction to form a response data payload.

[0008] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for obtaining the request data body are as follows:

[0009] Receive service interaction requests sent by user terminals and perform semantic extraction and field classification processing to form service request data;

[0010] Extract the contextual attributes of the service request data and perform feature encoding to generate the request data body.

[0011] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for generating the anonymous sample set are as follows:

[0012] Separate the user attribute fields and value range fields in the request data body and perform field generalization to form a field generalization group;

[0013] The field generalization group is subjected to value range perturbation and structural combination according to the field correspondence to generate an anonymous sample set.

[0014] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for outputting anonymous adaptation parameters are as follows:

[0015] Evaluate the field generalization level and perturbation magnitude of the anonymous sample set and detect sensitive fields, generating sensitivity level groups with sensitivity coefficients;

[0016] Based on the sensitivity coefficient, the fuzzy coverage range and field granularity coverage of the sensitivity level group are statistically analyzed, and anonymous adaptation parameters are generated.

[0017] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for outputting the de-identified relationship graph are as follows:

[0018] An anonymous relationship graph is constructed using the request data body as anonymous nodes and the anonymous adaptation parameters as edge weights;

[0019] The identifiable fields related to service accounts, terminal identifiers, and user identifiers in the anonymous relationship graph are removed to form a de-identified relationship graph.

[0020] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for dividing the anonymous relationship graph into multiple anonymous domains are as follows:

[0021] Statistically analyze the node density, edge connection count, and neighborhood distribution of the de-identified relationship graph to form a graph structure statistics group;

[0022] Calculate the local entropy parameters, structural entropy parameters, and diffusion entropy parameters of the graph structure statistics group to generate an entropy index group;

[0023] Based on the entropy index group, the anonymous nodes in the de-identified relation graph are divided into different anonymous domains, forming domain partitioning groups carrying anonymous domain numbers.

[0024] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for generating a policy configuration body by statistically analyzing the entropy gradient and domain size information of each anonymous domain are as follows:

[0025] The scale parameter and average entropy value of each anonymous domain are statistically analyzed. The entropy gradient parameters between adjacent anonymous domains are then calculated from their average entropy values, forming a gradient hierarchical group.

[0026] Based on the gradient hierarchical group, a substitution level is labeled for each anonymous domain. Each domain's substitution level is then associated with its scale parameter to output the policy configuration body.
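Paragraphs [0025]-[0026] can be illustrated with a minimal Python sketch. It rests on several assumptions:

- Each anonymous domain is given as a list of per-node entropy values.
- "Adjacent" means consecutive domain numbers.
- A domain's gradient is its largest absolute difference in average entropy to an adjacent domain.

The thresholds `low` and `high` and the three-level scheme are hypothetical, because the source does not state how gradients map to substitution levels.

```python
def build_policy(domain_entropies, low=0.1, high=0.3):
    """Sketch of a policy configuration body.

    domain_entropies: {domain_number: [node entropy values]} (assumed input shape).
    low, high: hypothetical gradient thresholds for substitution levels 1-3.
    """
    order = sorted(domain_entropies)
    avg = {d: sum(v) / len(v) for d, v in domain_entropies.items()}
    policy = {}
    for idx, d in enumerate(order):
        # adjacent domains = previous and next domain number (assumption)
        neighbours = [order[j] for j in (idx - 1, idx + 1) if 0 <= j < len(order)]
        gradient = max((abs(avg[d] - avg[n]) for n in neighbours), default=0.0)
        level = 3 if gradient >= high else 2 if gradient >= low else 1
        policy[d] = {
            "scale": len(domain_entropies[d]),  # domain size parameter
            "avg_entropy": avg[d],
            "gradient": gradient,
            "level": level,
        }
    return policy


policy = build_policy({1: [0.5, 0.7], 2: [0.65], 3: [1.2, 1.0, 1.1]})
print({d: p["level"] for d, p in policy.items()})  # {1: 1, 2: 3, 3: 3}
```

A domain that differs sharply in entropy from its neighbours receives a higher substitution level in this sketch. That is one plausible reading of "gradient hierarchical group", not a confirmed rule.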

[0027] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for extracting the topological features of the anonymous relationship graph and concatenating and encoding them to generate a replacement identifier pool are as follows:

[0028] Extract the topological fields from the anonymous relation graph, and count the number of node connections, neighborhood size and edge weight distribution for each anonymous node to form a topological field group.

[0029] The structural dispersion is calculated by statistically analyzing the node connection patterns and neighborhood diffusion of the topological field group, thus forming a privacy attribute group.

[0030] Traverse the anonymous domain numbers of the domain partitioning group to form an anonymous node sequence, and perform vector fusion with the privacy attribute group to generate an anti-association coding group;

[0031] Using the anonymous domain number as a prefix, the anti-association coding group is concatenated and encoded to generate a replacement identifier pool.
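The pool construction in [0028]-[0031] can be sketched in Python as follows. Several choices here are assumptions made for illustration, not the patented encoding:

- The feature set is degree, mean edge weight, and the population standard deviation of edge weights (standing in for "structural dispersion").
- HMAC-SHA-256, keyed with a hypothetical secret `key`, stands in for the anti-association encoding.
- Identifiers use a `D###-` prefix format.

```python
import hashlib
import hmac
import statistics


def build_identifier_pool(incident_weights, domain_of, key):
    """Sketch of a replacement identifier pool.

    incident_weights: {node_id: [weights of edges touching the node]} (assumed shape).
    domain_of: {node_id: anonymous domain number (int)}.
    key: secret bytes for HMAC; a hypothetical stand-in for "anti-association" coding.
    """
    pool = {}
    # traverse nodes in domain-number order to form the anonymous node sequence
    sequence = sorted(incident_weights, key=lambda n: (domain_of[n], n))
    for position, node in enumerate(sequence):
        weights = incident_weights[node]
        degree = len(weights)
        mean_w = statistics.fmean(weights) if weights else 0.0
        dispersion = statistics.pstdev(weights) if weights else 0.0
        # fuse the sequence position with the topological/privacy attributes
        fused = f"{position}|{degree}|{mean_w:.4f}|{dispersion:.4f}".encode()
        code = hmac.new(key, fused, hashlib.sha256).hexdigest()[:16]
        domain = domain_of[node]
        # the anonymous domain number is the identifier prefix
        pool.setdefault(domain, []).append(f"D{domain:03d}-{code}")
    return pool


pool = build_identifier_pool(
    {"a": [0.2, 0.4], "b": [0.5], "c": [0.1, 0.3, 0.9]}, {"a": 1, "b": 1, "c": 2}, b"demo-key"
)
print(pool)
```

A keyed hash keeps identifiers reproducible for the key holder while giving an observer no direct path back to the feature values. The sequence position is mixed in so that nodes with identical features still receive distinct identifiers.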

[0032] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for outputting the replacement mapping body are as follows:

[0033] The substitution level, substitution frequency, and substitution method fields of the policy configuration body are parsed per anonymous domain to form policy deconstruction groups;

[0034] Using the substitution level field of the policy deconstruction group as the screening criterion, replacement identifiers matching the corresponding substitution level are extracted from the replacement identifier pool to form a replacement candidate group;

[0035] Extract the identifier field value from the request data body as the mapping key and use the replacement candidate group as the mapping value to establish mapping key-value pairs. Associate and encapsulate these pairs with the policy deconstruction group to form a replacement mapping body.

[0036] As a preferred embodiment of the user information service privacy protection method based on identifier replacement described in this invention, the specific steps for forming the response data payload are as follows:

[0037] The identifier field values in the request data body are replaced according to the mapping key-value pairs in the replacement mapping body to form an anonymous request body;

[0038] An anonymous request body is sent to the user information server, and the data content returned by the user information server is received and encapsulated to form a response data payload.
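Steps [0033]-[0038] up to the point of sending can be illustrated with a minimal Python sketch. It rests on the following assumptions:

- `policy` is a dictionary of the form `{domain: substitution level}`.
- `pool` is a dictionary of the form `{domain: [(identifier, level)]}`.
- `secrets.choice` picks one replacement from the candidate group. The source does not say how a single candidate is chosen.

Transmission to the user information server is left out.

```python
import secrets


def build_mapping_body(request_body, id_fields, domain, policy, pool):
    """policy: {domain: substitution level}; pool: {domain: [(identifier, level)]}.
    Both shapes are assumptions made for this sketch."""
    level = policy[domain]
    # screening criterion: keep pool entries whose level matches the policy level
    candidates = [ident for ident, lvl in pool.get(domain, []) if lvl == level]
    if not candidates:
        raise ValueError(f"no level-{level} candidates in domain {domain}")
    # mapping key = original identifier value, mapping value = candidate group
    mapping = {request_body[f]: candidates for f in id_fields if f in request_body}
    return {"domain": domain, "level": level, "mapping": mapping}


def anonymize_request(request_body, id_fields, mapping_body):
    """Return a copy of the request with identifier values replaced."""
    anonymous = dict(request_body)
    for f in id_fields:
        value = anonymous.get(f)
        if value in mapping_body["mapping"]:
            # pick one replacement from the candidate group (selection rule assumed)
            anonymous[f] = secrets.choice(mapping_body["mapping"][value])
    return anonymous


request = {"user_id": "u-123", "device_id": "dev-9", "intent": 42}
body = build_mapping_body(request, ["user_id", "device_id"], 1, {1: 2},
                          {1: [("D001-aa", 2), ("D001-bb", 1), ("D001-cc", 2)]})
print(anonymize_request(request, ["user_id", "device_id"], body))
```

Non-identifier fields such as `intent` pass through unchanged, which is what lets the server keep serving the anonymous request.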

[0039] The beneficial effects of this invention are as follows. Anonymous domains are partitioned dynamically based on structural complexity and anonymity correlation, so anonymous nodes with similar identifiable features are automatically clustered into the same anonymous domain. This improves non-re-identifiability after identifier replacement and reduces the risk of reconstruction by splicing patterns across requests. At the same time, the replacement identifier pool keeps each replacement identifier consistent with the original identifier in structural and behavioral context. Replacement identifiers therefore remain usable and behaviorally continuous along the service interaction link, and the IoT privacy protection scheme becomes more practical and stable in real service environments.

Attached Figure Description

[0040] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a flowchart of the user information service privacy protection method based on identifier replacement.

[0042] Figure 2 is a flowchart of generating the anonymous sample set and outputting the anonymization adaptation parameters.

[0043] Figure 3 is a flowchart of constructing the anonymous relationship graph and dividing it into multiple anonymous domains.

[0044] Figure 4 is a flowchart of outputting the replacement mapping body and forming the response data payload.

Detailed Implementation

[0045] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0046] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0047] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0048] Referring to Figures 1-4, this embodiment of the present invention provides a user information service privacy protection method based on identifier replacement, comprising the following steps:

[0049] S1. Extract the service request data initiated by the user terminal and perform context feature encoding to obtain the request data body.

[0050] S1.1 Receive service interaction requests sent by user terminals and perform semantic extraction and field classification processing to form service request data.

[0051] It should be noted that the service interaction request is triggered when the user terminal performs a search, recommendation, content access, browsing, or query operation. The request is carried over a network channel to the user information server as a transmission payload. The service interaction request includes a request path field, an action field, a content field, a behavior time field, a network parameter field, and a terminal parameter field.

[0052] After receiving a service interaction request from a user terminal, the system performs field format validation and null value detection on the path, action, content, behavior time, network parameter, and terminal parameter fields parsed from the service interaction request. Fields with inconsistent formats are standardized, and value range adaptation is performed on the value range of similar fields to eliminate differences in field expression caused by cross-terminal, cross-network, and cross-client issues. Semantic reduction processing is then performed on each processed field. Specifically, the path and action fields are reduced to the business intent domain, the content field is reduced to the content expression domain, the behavior time field is reduced to the behavior context domain, and the network parameter and terminal parameter fields are reduced to the environment context domain, forming service request data represented by a four-element structure of business intent, content expression, behavior context, and environment context.

[0053] S1.2 Extract the context attributes of the service request data and perform feature encoding to generate the request data body.

[0054] It should be noted that the request data body is built from four encodings:

- Behavior time: the behavior time field of the service request data is parsed into a unified time format. The date, hour, and minute are split into numerical components, and a time period number is generated according to the time granularity.
- Network environment: the access method, IP address prefix, and round-trip delay of the network parameter field are parsed. The access method is mapped to a discrete code, the IP address prefix is truncated into a network segment marker, and the round-trip delay is divided into intervals and assigned interval numbers. These are combined into the network environment encoding.
- Terminal environment: the terminal type, operating device type, and client type of the terminal parameter field are each mapped to discrete codes to generate the terminal environment encoding.
- Business semantics: the operation type tag of the action field and the topic words and key phrases of the content field are extracted and concatenated into a business semantic string. This string is segmented into words and mapped to numbers to generate the business semantic encoding.

Finally, the behavior time number, network environment encoding, terminal environment encoding, and business semantic encoding are arranged in a fixed field order and encapsulated into the request data body.
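The encoding in [0054] might look like the following Python sketch. The following choices are hypothetical, picked only for illustration:

- The request field names.
- The 30-minute time slot.
- The /24 network segment.
- The round-trip-delay interval edges.
- The first-seen integer codebooks.

Whitespace splitting stands in for word segmentation, which is an assumption. Chinese-language content would need a proper segmenter.

```python
import bisect
import ipaddress
from datetime import datetime


def discrete(codebook, value):
    """Map a categorical value to an integer code, assigning new codes on first sight."""
    return codebook.setdefault(value, len(codebook))


def encode_request(req, books, slot_minutes=30, rtt_edges=(20, 50, 100, 200)):
    """Sketch of context feature encoding; field names and parameters are hypothetical."""
    ts = datetime.strptime(req["time"], "%Y-%m-%d %H:%M")
    period = (ts.hour * 60 + ts.minute) // slot_minutes  # time period number
    time_code = (ts.year * 10000 + ts.month * 100 + ts.day, ts.hour, ts.minute, period)

    segment = ipaddress.ip_network(f"{req['ip']}/24", strict=False)  # network segment marker
    net_code = (
        discrete(books.setdefault("access", {}), req["access"]),
        str(segment.network_address),
        bisect.bisect_right(rtt_edges, req["rtt_ms"]),  # round-trip delay interval number
    )
    term_code = tuple(
        discrete(books.setdefault(k, {}), req[k]) for k in ("terminal", "device", "client")
    )
    semantic = " ".join([req["action"], *req["topics"], *req["phrases"]])
    sem_code = tuple(discrete(books.setdefault("vocab", {}), tok) for tok in semantic.split())
    # fixed field order: time, network, terminal, business semantics
    return (time_code, net_code, term_code, sem_code)


books = {}
req = {"time": "2026-02-03 14:45", "ip": "192.168.7.42", "access": "wifi", "rtt_ms": 75,
       "terminal": "phone", "device": "android", "client": "app",
       "action": "search", "topics": ["coffee"], "phrases": ["near me"]}
print(encode_request(req, books))
```

The codebooks are shared across calls through `books`, so the same categorical value keeps the same code from one request to the next.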

[0055] S2. Perform field generalization and value range perturbation on the request data body to generate an anonymous sample set, analyze the sensitivity level of the anonymous sample set and perform statistical fuzziness, and output the anonymization adaptation parameters.

[0056] S2.1 Separate the user attribute fields and value range fields in the request data body and perform field generalization processing to form a field generalization group.

[0057] It should be noted that the request data body first undergoes field semantic parsing. Coded fields that express business intent, content topic, or operation type are labeled as user attribute fields. Fields that express the behavior time number, network round-trip delay interval number, or terminal environment value number are labeled as value range fields. Each user attribute field is replaced with its superordinate category term, which coarsens the granularity at which the field can identify a user. For value range fields, the minimum, maximum, and percentile values (e.g., the 25th, 50th, and 75th percentiles) are computed. The interval between the minimum and maximum is divided into several contiguous numerical segments, and each original value is replaced with its segment number. After both kinds of fields are generalized, the data is re-encapsulated in the field order of the request data body to generate the field generalization group.
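Both generalization steps of [0057] are shown in the minimal Python sketch below. The taxonomy reuses the "beverage" example from [0058]. Using the three quartiles as segment boundaries is an assumption, since the source only says percentiles are computed and the range is split into contiguous segments.

```python
import bisect
import statistics

# Example taxonomy taken from the "beverage" illustration in [0058].
SUPERORDINATE = {"milk tea": "beverage", "coffee": "beverage", "juice": "beverage"}


def generalize_attribute(value, taxonomy=SUPERORDINATE):
    """Replace a user attribute value with its superordinate category term if known."""
    return taxonomy.get(value, value)


def segment_numbers(values):
    """Replace each value with a segment number; quartile cut points are an assumption."""
    cuts = statistics.quantiles(values, n=4, method="inclusive")  # 25th/50th/75th
    return [bisect.bisect_right(cuts, v) for v in values]


print(generalize_attribute("coffee"))         # beverage
print(segment_numbers([10, 50, 30, 20, 40]))  # [0, 3, 2, 1, 3]
```

Segment numbers run from 0 (below the 25th percentile) to 3 (at or above the 75th percentile), so four contiguous segments cover the whole range between the minimum and maximum.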

[0058] It should also be noted that the superordinate category term refers to a more abstract and broader semantic label used to represent the common semantic range of a certain type of concept. For example, the superordinate category term corresponding to "milk tea, coffee and juice" is "beverage".

[0059] S2.2 Perform value range perturbation on the field generalization group and combine the structures according to the field correspondence to generate an anonymous sample set.

[0060] It should be noted that the difference between the minimum and maximum values of each value range field in the field generalization group is taken as the interval span. The interval span is divided into several equal span segments, and the length of a span segment is used as the offset value. The offset value and the original value of the value range field are combined by weighted summation to form the perturbed field content. The user attribute fields of the field generalization group and the perturbed field content are then combined in field order, and each combined record is encapsulated as a sample sequence. All sample sequences are collected to form the anonymous sample set.

[0061] It should also be noted that, in the weighted summation, the weight of the offset value is the ratio of the length of the span segment used for value range perturbation to the length of the whole interval span. The weight of the original value range field is 1 minus the weight of the offset value.
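With equal span segments, [0060]-[0061] reduce to x' = w·o + (1 − w)·x, where o = span/n is the segment length and w = o/span = 1/n. The Python sketch below implements that reading. The segment count `n_segments` is a hypothetical parameter.

```python
def perturb(values, n_segments=4):
    """Value range perturbation: x' = w*o + (1 - w)*x with o = span/n and w = o/span."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return list(values)  # nothing to perturb when all values coincide
    offset = span / n_segments  # length of one equal span segment
    w = offset / span           # weight of the offset value
    return [w * offset + (1 - w) * v for v in values]


print(perturb([0, 4, 8]))  # [0.5, 3.5, 6.5]
```

Under this reading the transform is deterministic and identical for every record. The source does not state how several distinct sample sequences are obtained, for example by choosing a different segment per sample.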

[0062] S2.3 Evaluate the field generalization level and perturbation magnitude of the anonymous sample set and detect sensitive fields, generating sensitivity level groups carrying sensitivity coefficients.

[0063] It should be noted that the fields in the anonymous sample set are grouped by field name. The values of the same field across different sample sequences are collected into a field value group, and each group receives a coefficient:

- User attribute fields: the number of distinct terms and the average character length of the field values across sample sequences are counted. Their product is the semantic refinement metric, and its reciprocal is the generalization evaluation coefficient.
- Value range fields: the number of distinct values of the field in the anonymous sample set and the field's minimum and maximum values are counted. The product of the number of distinct values and the numerical span between the minimum and maximum is the perturbation coverage metric, and its reciprocal is the perturbation evaluation coefficient.

The generalization evaluation coefficient of each user attribute field and the perturbation evaluation coefficient of each value range field serve as the sensitivity coefficients of those fields. All fields are sorted by sensitivity coefficient to form a sensitivity ranking sequence. This sequence is divided into several contiguous sensitivity segments, each assigned a level number. The field name, field type, sensitivity coefficient, and level number are associated and encapsulated to form the sensitivity level group carrying the sensitivity coefficients.

[0064] It should also be noted that the arithmetic mean and the dispersion of the sensitivity coefficients of all fields in the sensitivity ranking sequence are calculated. The dispersion is the mean of the absolute differences between each field's sensitivity coefficient and the arithmetic mean. Fields are then assigned to segments as follows:

- High-sensitivity segment: the coefficient exceeds the arithmetic mean by more than the dispersion.
- Low-sensitivity segment: the coefficient falls below the arithmetic mean by more than the dispersion.
- Medium-sensitivity segment: the coefficient lies within the dispersion of the mean.

Together these form several contiguous sensitivity segments.
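The Python sketch below covers the sensitivity coefficients of [0063] and the segmentation of [0064]. It assumes that the perturbation coverage metric is the product of the distinct-value count and the numerical span, by analogy with the semantic refinement metric. The zero-span guard is also an assumption.

```python
def generalization_coefficient(values):
    """Reciprocal of (distinct terms x average character length) for a user attribute field."""
    kinds = len(set(values))
    avg_len = sum(len(v) for v in values) / len(values)
    return 1.0 / (kinds * avg_len)


def perturbation_coefficient(values):
    """Reciprocal of (distinct values x numerical span) for a value range field."""
    coverage = len(set(values)) * (max(values) - min(values))
    return 1.0 / max(coverage, 1e-9)  # guard against a zero span (assumption)


def sensitivity_segments(coefficients):
    """Split fields into high/medium/low using mean +/- mean absolute deviation."""
    vals = list(coefficients.values())
    mean = sum(vals) / len(vals)
    dispersion = sum(abs(c - mean) for c in vals) / len(vals)
    segments = {}
    for name, c in coefficients.items():
        if c > mean + dispersion:
            segments[name] = "high"
        elif c < mean - dispersion:
            segments[name] = "low"
        else:
            segments[name] = "medium"
    return segments


print(sensitivity_segments({"a": 0.0, "b": 4.0, "c": 5.0, "d": 6.0, "e": 10.0}))
```

Here the mean is 5.0 and the dispersion is 2.4, so only `e` (above 7.4) is high and only `a` (below 2.6) is low.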

[0065] S2.4. Based on the sensitivity coefficient, calculate the fuzzy coverage range and field granularity coverage of the sensitivity level group, and generate anonymous adaptation parameters.

[0066] It should be noted that the percentages of fields in the high-sensitivity, medium-sensitivity, and low-sensitivity segments of the sensitivity level group are calculated. This percentage distribution is taken as the fuzzy coverage range of the sensitivity level group. The shares of user attribute fields and of value range fields in the total number of fields of the sensitivity level group are also calculated and encapsulated as the field granularity coverage. The fuzzy coverage range and the field granularity coverage are combined in a fixed field order into the anonymization adaptation parameters.
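The parameter assembly in [0066] is plain proportion counting, sketched here in Python. The tuple layout `(name, field_type, segment)` and the output order (high, medium, low, attribute, range) are assumptions standing in for the "fixed field order".

```python
def adaptation_parameters(fields):
    """fields: list of (name, field_type, segment); field_type is "attribute" or "range"."""
    total = len(fields)
    fuzzy = tuple(
        sum(1 for _, _, seg in fields if seg == s) / total for s in ("high", "medium", "low")
    )
    granularity = tuple(
        sum(1 for _, t, _ in fields if t == k) / total for k in ("attribute", "range")
    )
    # fixed field order: fuzzy coverage range first, then field granularity coverage
    return fuzzy + granularity


print(adaptation_parameters([
    ("intent", "attribute", "high"), ("topic", "attribute", "medium"),
    ("time", "range", "medium"), ("rtt", "range", "low"),
]))  # (0.25, 0.5, 0.25, 0.5, 0.5)
```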

[0067] S3. Based on the request data body and anonymous adaptation parameters, construct an anonymous relationship graph, remove identifiable information from the anonymous relationship graph, output the de-identified relationship graph, and divide the anonymous relationship graph into multiple anonymous domains by calculating the entropy parameter of the de-identified relationship graph.

[0068] S3.1 Construct an anonymous relationship graph using the request data body as anonymous nodes and the anonymous adaptation parameters as edge weights.

[0069] It should be noted that, with the request data body as the baseline object, each sample sequence in the anonymous sample set is aligned with the request data body by field name. The edge weights are then derived in three steps:

1. For the fields marked as belonging to the high-, medium-, and low-sensitivity segments of the sensitivity level group, count, per segment, the fields whose values in the sample sequence differ from the corresponding fields of the request data body. Divide each count by the total number of sensitive fields to form a difference ratio group containing the high-, medium-, and low-sensitivity difference ratios.
2. According to the fuzzy coverage range and field granularity coverage recorded in the anonymization adaptation parameters, take the difference ratios of the corresponding sensitivity segments from the difference ratio group. Combine them by weighted summation into an edge weight value that quantifies the anonymity association strength between the sample sequence and the request data body.
3. Take the request data body as the central anonymous node and each sample sequence in the anonymous sample set as a peripheral anonymous node. The edge weight values serve as the weights of the edges connecting the central node to each peripheral node, which establishes a node set and an edge set.

The anonymous node identifiers, node connection relationships, and corresponding edge weights are encapsulated together into the anonymous relationship graph.

[0070] It should also be noted that when weighting and summarizing the difference ratios corresponding to high-sensitivity, medium-sensitivity, and low-sensitivity segments, the proportion of each sensitive segment recorded in the anonymous adaptation parameters out of the total number of sensitive fields is used as the weight of the corresponding difference ratio in the weighted summary.
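The star-shaped graph of [0069]-[0070] can be sketched in Python as below. It follows the literal reading that each segment's difference count is divided by the total number of sensitive fields. The dictionary shapes, the node names, and the `shares` mapping (the per-segment proportions from the adaptation parameters) are assumptions.

```python
def build_anonymous_graph(request, samples, segments, shares):
    """request: {field: value}; samples: list of such dicts;
    segments: {field: "high"|"medium"|"low"}; shares: {segment: weight}."""
    total = len(segments)  # total number of sensitive fields
    nodes = ["request"] + [f"sample-{k}" for k in range(len(samples))]
    edges = []
    for k, sample in enumerate(samples):
        weight = 0.0
        for seg, share in shares.items():
            differing = sum(
                1 for f, s in segments.items() if s == seg and sample.get(f) != request.get(f)
            )
            weight += share * differing / total  # weighted difference ratio
        edges.append(("request", f"sample-{k}", weight))  # star: centre to periphery
    return {"nodes": nodes, "edges": edges}


request = {"intent": "coffee", "time": 29, "rtt": 2}
graph = build_anonymous_graph(
    request,
    [{"intent": "beverage", "time": 29, "rtt": 3}, dict(request)],
    {"intent": "high", "time": "medium", "rtt": "low"},
    {"high": 0.5, "medium": 0.25, "low": 0.25},
)
print(graph["edges"])
```

A sample identical to the request gets edge weight 0. Samples that differ in heavily weighted, high-sensitivity fields get larger weights.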

[0071] S3.2. Remove the identifiable fields related to service accounts, terminal identifiers, and user identifiers from the anonymous relationship graph to form a de-identified relationship graph.

[0072] It should be noted that all fields of the node set and edge set in the anonymous relation graph are scanned. Fields used to identify service login accounts, user IDs, and terminal device parameters are identified as identifiable fields. Simultaneously, fields in the edge attributes used to represent session tokens, request tracking tags, or fields that can be used to restore the one-to-one correspondence between service requests and terminal devices are identified as identifiable fields. After completing the identification of identifiable fields, all identifiable fields are deleted, and only the fields used to express the node connection relationship and the strength of the anonymous association are retained. The node set and edge set after deleting the identifiable fields are re-encapsulated according to the encapsulation format of the anonymous relation graph to form a de-identified relation graph composed of anonymous nodes and anonymous edges. The de-identified relation graph is designed for IoT privacy protection scenarios and helps to block re-identifiable links between device identifiers and service requests.
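The field removal in [0072] amounts to filtering node and edge attributes against a set of identifiable field names, sketched here in Python. The names in `IDENTIFIABLE` and the dictionary-based graph shape are hypothetical. In practice the identifiable fields would be configured per deployment.

```python
# Hypothetical names of identifiable fields (accounts, user IDs, device parameters,
# session tokens, request trace tags); a real deployment would configure these.
IDENTIFIABLE = {"account", "user_id", "device_id", "session_token", "trace_id"}


def deidentify(graph, identifiable=IDENTIFIABLE):
    """Return a copy of the graph with identifiable node and edge attributes removed."""
    nodes = {
        n: {k: v for k, v in attrs.items() if k not in identifiable}
        for n, attrs in graph["nodes"].items()
    }
    edges = [
        (u, v, {k: x for k, x in attrs.items() if k not in identifiable})
        for u, v, attrs in graph["edges"]
    ]
    return {"nodes": nodes, "edges": edges}


graph = {
    "nodes": {"n0": {"user_id": "u1", "intent": 3}, "n1": {"device_id": "d7", "intent": 4}},
    "edges": [("n0", "n1", {"weight": 0.4, "session_token": "abc"})],
}
print(deidentify(graph))
```

Only the connection structure and the association strength (`weight`) survive, matching the source's rule of keeping only "node connection relationship and the strength of the anonymous association".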

[0073] S3.3 Statistically analyze the node density, edge connection number, and neighborhood distribution of the de-identified relationship graph to form a graph structure statistics group.

[0074] It should be noted that the number of edges connected to each anonymous node in the de-identified relationship graph is counted and encapsulated as a connection count field. The ratio of the sum of the connection count fields of all anonymous nodes to the total number of anonymous nodes is taken as the node density field. For each anonymous node, the number of anonymous nodes it can reach through a single anonymous edge (its set of neighboring nodes) is counted and recorded as a neighborhood size field, and the anonymous edges existing within that set of neighboring nodes are also counted to form a neighborhood connection field. The node density, connection count, neighborhood size, and neighborhood connection fields are associated and encapsulated into a graph structure statistics group.
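The graph structure statistics of [0074] might be computed as below. The sketch assumes an undirected edge list of (node, node, weight) tuples and reads the "neighborhood connection field" as the number of edges among a node's neighbors; that reading is an interpretation, not something the text spells out.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]

def graph_stats(nodes: List[str], edges: List[Edge]) -> Dict[str, object]:
    adj = defaultdict(set)
    degree = {n: 0 for n in nodes}  # connection count field
    for u, v, _w in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    edge_set = {frozenset((u, v)) for u, v, _ in edges}
    neigh_size = {n: len(adj[n]) for n in nodes}  # nodes reachable in one hop
    # Edges present among each node's neighbors (assumed meaning of "neighborhood connection").
    neigh_conn = {
        n: sum(1 for a, b in combinations(sorted(adj[n]), 2) if frozenset((a, b)) in edge_set)
        for n in nodes
    }
    density = sum(degree.values()) / len(nodes)  # node density field
    return {"node_density": density, "connection_count": degree,
            "neighborhood_size": neigh_size, "neighborhood_connection": neigh_conn}
```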

[0075] S3.4 Calculate the local entropy parameter, structural entropy parameter, and diffusion entropy parameter from the graph structure statistics group to generate an entropy index group.

[0076] The expressions for calculating the local entropy parameter, structural entropy parameter, and diffusion entropy parameter from the graph structure statistics group are as follows:

[0077] ;

[0078] ;

[0079] ;

[0080] ;

[0081] where the quantities are: the set of neighboring nodes of anonymous node i; the set of neighboring nodes of anonymous node j; the edge weight value between anonymous nodes i and j; the number of edges connected to anonymous node i; the number of edges connected to anonymous node j; the set of all anonymous nodes in the de-identified relationship graph; the total number of anonymous nodes; the local entropy parameter, structural entropy parameter, and diffusion entropy parameter of anonymous node i; the cumulative edge weight, over all paths of no more than a given number of anonymous edges, from the current anonymous node to anonymous node j; the probability that the diffusion process lands on anonymous node j after that number of anonymous edges; and the set of identifiers of all anonymous nodes in the de-identified relationship graph reachable from the current anonymous node through no more than that number of anonymous edges.
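Because the expressions in [0077]-[0080] are not reproduced above, the following sketch only illustrates the general shape of a per-node entropy triple; it is not the patent's formulas. As clearly labeled stand-ins, it uses the Shannon entropy of a node's normalized incident edge weights for the local term, the node's term of the degree-distribution entropy for the structural term, and the Shannon entropy of a weighted random walk after k steps for the diffusion term. The hop count k and all names are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]

def _shannon(probs) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_stand_ins(nodes: List[str], edges: List[Edge],
                      k: int = 2) -> Dict[str, Tuple[float, float, float]]:
    """Illustrative (local, structural, diffusion) triples. NOT the patent's expressions."""
    w = defaultdict(dict)  # symmetric weighted adjacency
    for u, v, wt in edges:
        w[u][v] = w[u].get(v, 0.0) + wt
        w[v][u] = w[v].get(u, 0.0) + wt
    degree = {n: len(w[n]) for n in nodes}  # distinct neighbors used as degree
    total_deg = sum(degree.values()) or 1
    out = {}
    for n in nodes:
        # Stand-in local term: entropy of normalized incident edge weights.
        s = sum(w[n].values())
        local = _shannon([x / s for x in w[n].values()]) if s > 0 else 0.0
        # Stand-in structural term: this node's share of degree-distribution entropy.
        p = degree[n] / total_deg
        structural = -p * math.log(p) if p > 0 else 0.0
        # Stand-in diffusion term: entropy of a weighted random walk after k steps.
        dist = {n: 1.0}
        for _ in range(k):
            nxt = defaultdict(float)
            for u, mass in dist.items():
                s_u = sum(w[u].values())
                if s_u == 0:
                    nxt[u] += mass  # isolated node keeps its probability mass
                    continue
                for v, wt in w[u].items():
                    nxt[v] += mass * wt / s_u
            dist = nxt
        out[n] = (local, structural, _shannon(dist.values()))
    return out
```

On a two-leaf star, for example, the center has local entropy ln 2 (two equal edge weights), while a leaf's two-step walk splits evenly between both leaves, giving it diffusion entropy ln 2.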

[0082] It should be noted that the local, structural, and diffusion entropy parameters are calculated separately and numerically standardized to remove differences in scale, generating an entropy vector for each anonymous node. The entropy vectors are then concatenated to form the entropy index group.
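The standardization in [0082] could be done with z-scores, one entropy type at a time across all nodes. Choosing z-scores over, say, min-max scaling is an assumption; the text says only that the three parameters are standardized to remove differences in scale. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

def standardize(raw: Dict[str, Tuple[float, float, float]]) -> Dict[str, List[float]]:
    """Z-score each entropy type across nodes; return one 3-D entropy vector per node."""
    ids = sorted(raw)
    columns = list(zip(*(raw[i] for i in ids)))  # (local, structural, diffusion) columns
    scaled = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        # A constant column carries no information, so it is mapped to zeros.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    # Re-assemble the standardized values row by row into per-node entropy vectors.
    return {node: [scaled[c][r] for c in range(len(columns))] for r, node in enumerate(ids)}
```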

[0083] S3.5. Based on the entropy index group, divide the anonymous nodes in the de-identified relation graph into different anonymous domains to form a domain partitioning group carrying the anonymous domain number.

[0084] It should be noted that the entropy vector of each anonymous node is extracted from the entropy index group by anonymous node identifier. The Euclidean distance between the entropy vectors of every pair of anonymous nodes is calculated, and the average of these pairwise distances is taken as the global distance benchmark. Each anonymous node in the de-identified relationship graph initially forms its own candidate anonymous domain, whose entropy vector serves as the domain center vector, and the set of candidate anonymous domains is then merged iteratively. In each iteration, the Euclidean distance between the domain center vectors of every pair of distinct candidate anonymous domains is calculated as their cross-domain distance, and the pair with the smallest cross-domain distance is selected. If this minimum cross-domain distance is less than the global distance benchmark, all anonymous nodes of the two selected candidate anonymous domains are merged into a new candidate anonymous domain whose domain center vector is the mean of the entropy vectors of all its anonymous nodes; the new domain replaces the original two, the candidate set is updated, and the next iteration begins. The iteration terminates when the cross-domain distance of every pair of candidate anonymous domains is greater than or equal to the global distance benchmark. Each remaining candidate anonymous domain is then confirmed as an anonymous domain and assigned a unique anonymous domain number. The anonymous node identifier of each anonymous node in the de-identified relationship graph is combined with the number of the anonymous domain it belongs to, forming a node-anonymous domain tuple, and all such tuples are collected into a domain partitioning group carrying the anonymous domain numbers.
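The iterative merging in [0084] is a centroid-linkage agglomerative procedure whose stopping threshold is the mean pairwise distance. A minimal sketch, assuming the entropy index group is a dict from node identifier to a list of floats; the density-span check of [0088] is omitted, and domain numbers are simply assigned 1, 2, ... in the order the final domains happen to be held. Names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def partition_domains(vectors: Dict[str, List[float]]) -> Dict[str, int]:
    ids = sorted(vectors)
    pair_d = [euclid(vectors[a], vectors[b]) for a, b in combinations(ids, 2)]
    benchmark = sum(pair_d) / len(pair_d) if pair_d else 0.0  # global distance benchmark
    # Each node starts as its own candidate domain: (member ids, domain center vector).
    domains = [([i], list(vectors[i])) for i in ids]
    while len(domains) > 1:
        (x, y), dist = min(
            (((p, q), euclid(domains[p][1], domains[q][1]))
             for p, q in combinations(range(len(domains)), 2)),
            key=lambda t: t[1])
        if dist >= benchmark:
            break  # every cross-domain distance is >= the benchmark
        members = domains[x][0] + domains[y][0]
        dims = len(vectors[members[0]])
        center = [sum(vectors[m][d] for m in members) / len(members) for d in range(dims)]
        domains = [dm for k, dm in enumerate(domains) if k not in (x, y)] + [(members, center)]
    # Number the final domains and emit node -> anonymous domain number.
    return {m: num for num, (members, _c) in enumerate(domains, start=1) for m in members}
```

Note the naive pair scan is quadratic per iteration, which is fine for the small per-request graphs described here.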

[0085] The expression for calculating Euclidean distance is:

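[0086] $$d_{ij} = \sqrt{\left(H^{L}_{i} - H^{L}_{j}\right)^{2} + \left(H^{S}_{i} - H^{S}_{j}\right)^{2} + \left(H^{D}_{i} - H^{D}_{j}\right)^{2}}$$

[0087] where $d_{ij}$ is the Euclidean distance between anonymous nodes $i$ and $j$; $H^{L}$ is the local entropy parameter; $H^{S}$ is the structural entropy parameter; and $H^{D}$ is the diffusion entropy parameter.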

[0088] It should also be noted that, during iterative merging of candidate anonymous domains, the overall connectivity of the anonymous nodes within each candidate anonymous domain is constrained using the node density field in the graph structure statistics group: when the difference in node density between two candidate anonymous domains exceeds a preset density span threshold, the two domains are not merged, even if the Euclidean distance between their entropy vectors is less than the global distance benchmark.

[0089] The preset density span threshold is derived from the statistical distribution of the node density field in the de-identified relationship graph and takes values in [1.0, 2.0). The lower limit of 1.0 corresponds to the node density of the de-identified relationship graph at the minimum anonymous sample size: when the anonymous sample set contains only one sample sequence, a single anonymous edge connects the central anonymous node to one peripheral anonymous node, so the sum of the connection count fields (2) divided by the total number of anonymous nodes (2) equals 1.0. The upper limit of 2.0 is the theoretical limit of the node density field as the anonymous sample set grows: the central anonymous node connects to many peripheral anonymous nodes, each of which connects only to the central node, so each additional sample increases the sum of the connection count fields by two but the total number of anonymous nodes by only one, and the node density approaches, but never reaches, 2.0.
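The bounds in [0089] follow from simple arithmetic on a star graph with n peripheral nodes: the degree sum is 2n and the node count is n + 1, so the node density is 2n/(n + 1) = 2 - 2/(n + 1), which equals 1.0 at n = 1 and tends to 2.0 from below. The hypothetical helper below checks this with exact fractions.

```python
from fractions import Fraction

def star_density(n_samples: int) -> Fraction:
    """Node density of a star: one central node plus n_samples peripheral nodes."""
    degree_sum = 2 * n_samples  # each edge adds one to the center and one to a leaf
    node_count = n_samples + 1
    return Fraction(degree_sum, node_count)

for n in (1, 2, 10, 1000):
    print(n, star_density(n), float(star_density(n)))
```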

[0090] S4. Calculate the entropy gradient and domain size information of each anonymous domain to generate a policy configuration body, and extract, concatenate, and encode the topological features of the anonymous relationship graph to generate a substitution identifier pool.

[0091] S4.1 Calculate the scale parameter and mean entropy value of each anonymous domain, and calculate the entropy gradient parameter between adjacent anonymous domains based on the mean entropy value of each anonymous domain, forming a gradient hierarchical group.

[0092] It should be noted that, based on the domain partitioning group, the number of anonymous node identifiers in each anonymous domain is counted and taken as the scale parameter of that anonymous domain. The entropy vector of each anonymous node is extracted from the entropy index group by anonymous node identifier, and the entropy vectors of all anonymous nodes in the same anonymous domain are averaged dimension by dimension to obtain the domain entropy vector; the mean of the domain entropy vector's components is taken as the mean entropy value of that anonymous domain. The anonymous domains are then arranged in ascending order of mean entropy value to form an ordered anonymous domain sequence. For each pair of adjacent anonymous domains in this sequence, the absolute difference between their mean entropy values is taken as the entropy gradient parameter between them. The entropy gradient parameter of the first anonymous domain in the sequence is the absolute difference between its mean entropy value and that of the next anonymous domain, and that of the last anonymous domain is the absolute difference between its mean entropy value and that of the previous anonymous domain. For each anonymous domain, the anonymous domain number, scale parameter, mean entropy value, and entropy gradient parameter are associated and encapsulated, and all such records are collected into the gradient hierarchical group.
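The gradient hierarchical group of [0092] can be sketched as follows. The text fixes the gradient of the first and last domains but leaves open which neighbor an interior domain is compared with; the sketch assumes the next domain in the ascending sequence, and assigns 0.0 when there is only one domain. Both choices, and the function name, are assumptions.

```python
from statistics import mean
from typing import Dict, List

def gradient_groups(domain_of: Dict[str, int], vectors: Dict[str, List[float]]) -> List[dict]:
    members: Dict[int, List[str]] = {}
    for node, dom in domain_of.items():
        members.setdefault(dom, []).append(node)
    records = []
    for dom, nodes in members.items():
        dims = len(vectors[nodes[0]])
        domain_vec = [mean(vectors[n][d] for n in nodes) for d in range(dims)]  # domain entropy vector
        records.append({"domain": dom, "scale": len(nodes), "mean_entropy": mean(domain_vec)})
    records.sort(key=lambda r: r["mean_entropy"])  # ordered anonymous domain sequence
    last = len(records) - 1
    for i, r in enumerate(records):
        if last == 0:
            r["gradient"] = 0.0  # assumption: a lone domain has no gradient
        elif i < last:
            # First and (assumed) interior domains: gap to the next domain.
            r["gradient"] = abs(records[i + 1]["mean_entropy"] - r["mean_entropy"])
        else:
            # Last domain: gap to the previous domain.
            r["gradient"] = abs(r["mean_entropy"] - records[i - 1]["mean_entropy"])
    return records
```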

[0093] S4.2. Based on the gradient hierarchical group, label each anonymous domain with a substitution level, associate the substitution level of each anonymous domain with its scale parameter, and output the policy configuration body.

[0094] It should be noted that the minimum and maximum entropy gradient parameters in the gradient hierarchical group are determined, and their difference is taken as the entropy gradient span, which is divided into several consecutive gradient segments. Specifically, with the entropy gradient parameters sorted in ascending order, the anonymous domains whose entropy gradient parameters fall in the first part of the span (e.g., the first 30%) are assigned to the first gradient segment, those in the middle part (e.g., the middle 40%) to the second gradient segment, and those in the last part (e.g., the last 30%) to the third gradient segment. Anonymous domains in the first, second, and third gradient segments are labeled with the first, second, and third substitution levels, respectively. For each anonymous domain, the identifier substitution requirement is quantified and marked according to the scale parameter recorded in the gradient hierarchical group. The anonymous domain number, substitution level, and corresponding scale parameter are associated and encapsulated to form the policy configuration body.
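A sketch of the level labeling in [0094], assuming the 30%/40%/30% split is applied to each gradient's relative position within the entropy gradient span (value-based rather than rank-based); the function name and default cut points are illustrative.

```python
from typing import List

def label_levels(records: List[dict], low: float = 0.3, high: float = 0.7) -> List[dict]:
    grads = [r["gradient"] for r in records]
    g_min, span = min(grads), max(grads) - min(grads)  # entropy gradient span
    policy = []
    for r in records:
        pos = (r["gradient"] - g_min) / span if span > 0 else 0.0  # position within the span
        level = 1 if pos < low else (2 if pos < high else 3)
        policy.append({"domain": r["domain"], "level": level, "scale": r["scale"]})
    return policy
```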

[0095] S4.3 Extract the topological fields from the anonymous relationship graph and count, for each anonymous node, the number of node connections, the neighborhood size, and the edge weight distribution to form a topology field group.

[0096] It should be noted that, for each anonymous node identifier, the corresponding anonymous node and all of its edge records are extracted from the anonymous relationship graph. The number of times each anonymous node appears in the edge records is counted and recorded as the node connection number of that anonymous node. The identifiers of the nodes directly connected to the anonymous node are deduplicated, and the number of distinct connected nodes is recorded as its neighborhood size. The sum, arithmetic mean, and numerical dispersion index of all edge weights associated with the anonymous node are computed as its edge weight distribution field. The node identifier, node connection number, neighborhood size, and edge weight distribution field of each anonymous node are combined into a topology record, and all topology records are collected and encapsulated into the topology field group.
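The topology field group of [0096] might be assembled as below, assuming an edge list that may contain repeated records between the same pair of nodes (which is what makes the connection number and the deduplicated neighborhood size differ). Population variance stands in for the numerical dispersion index, one of the options listed in [0098]; the key names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]

def topology_fields(nodes: List[str], edges: List[Edge]) -> Dict[str, dict]:
    out = {}
    for n in nodes:
        incident = [(u, v, w) for u, v, w in edges if n in (u, v)]
        neighbors = {v if u == n else u for u, v, _ in incident}
        weights = [w for _, _, w in incident]
        out[n] = {
            "connections": len(incident),    # appearances in edge records
            "neighborhood": len(neighbors),  # deduplicated direct neighbors
            "w_sum": sum(weights),
            "w_mean": mean(weights) if weights else 0.0,
            "w_disp": pvariance(weights) if weights else 0.0,  # variance as dispersion index
        }
    return out
```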

[0097] It should also be noted that an edge record is a structured record consisting of anonymous node identifiers and associated edge weight values, used to describe the connection relationships and connection strength between anonymous nodes.

[0098] The numerical dispersion index is a statistical indicator (such as variance, standard deviation, or range) characterizing how widely the edge weights of an anonymous node are spread in numerical space.

[0099] S4.4 Calculate the structural dispersion from the node connection patterns and neighborhood diffusion of the topology field group to form a privacy attribute group.

[0100] It should be noted that the number of node connections, neighborhood size, and edge weight distribution field of each anonymous node are extracted from the topology field group by anonymous node identifier. The node connection numbers and the neighborhood sizes are each sorted in ascending order, and the absolute differences between adjacent values in each sorted list are computed to form an absolute value set. The arithmetic mean of the absolute value set for node connection numbers is taken as the connection mode parameter, which characterizes the magnitude of change in node connections; the arithmetic mean of the absolute value set for neighborhood sizes is taken as the neighborhood diffusion parameter, which characterizes neighborhood diffusion intensity; and the degree of dispersion recorded in the edge weight distribution field is taken as the edge weight dispersion parameter. The connection mode parameter, neighborhood diffusion parameter, and edge weight dispersion parameter are weighted and summed to obtain the structural dispersion. Each anonymous node identifier is combined with its structural dispersion to form a node dispersion record, and all node dispersion records are encapsulated to form the privacy attribute group.

[0101] It should also be noted that, when calculating the structural dispersion, the maximum and minimum values of the node connection number, neighborhood size, and edge weight distribution are determined across the topology field group. For each of these three categories, the ratio of the current anonymous node's value to the overall value range of that category is normalized and used as the weight of the corresponding parameter in the weighted summation.
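Putting [0100] and [0101] together gives the sketch below. It assumes topology records carry connections, neighborhood, and w_disp keys (hypothetical names), uses variance as the edge weight dispersion parameter, and reads the normalized ratio in [0101] as min-max normalization within each category. Under this reading the connection mode and neighborhood diffusion parameters are graph-wide values, so per-node variation comes from the weights and from each node's own edge weight dispersion.

```python
from typing import Dict, Iterable

def _mean_adjacent_gap(values: Iterable[float]) -> float:
    """Mean absolute difference between neighbors in the ascending-sorted values."""
    s = sorted(values)
    gaps = [abs(b - a) for a, b in zip(s, s[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0

def _minmax(x: float, lo: float, hi: float) -> float:
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def structural_dispersion(topo: Dict[str, dict]) -> Dict[str, float]:
    conn = {n: r["connections"] for n, r in topo.items()}
    neigh = {n: r["neighborhood"] for n, r in topo.items()}
    disp = {n: r["w_disp"] for n, r in topo.items()}
    conn_mode = _mean_adjacent_gap(conn.values())         # connection mode parameter
    neigh_diffusion = _mean_adjacent_gap(neigh.values())  # neighborhood diffusion parameter
    out = {}
    for n in topo:
        # Assumed reading of [0101]: min-max normalized value as the weight.
        w_c = _minmax(conn[n], min(conn.values()), max(conn.values()))
        w_n = _minmax(neigh[n], min(neigh.values()), max(neigh.values()))
        w_e = _minmax(disp[n], min(disp.values()), max(disp.values()))
        out[n] = w_c * conn_mode + w_n * neigh_diffusion + w_e * disp[n]
    return out
```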

[0102] S4.5 Traverse the anonymous domain numbers of the domain partitioning group to form an anonymous node sequence, and perform vector fusion with the privacy attribute group to generate an anti-association coding group.

[0103] It should be noted that the anonymous domains in the domain partitioning group are traversed in order of anonymous domain number. Within each anonymous domain, the anonymous node identifiers are sorted in ascending character-encoding order to form an anonymous node sequence. The structural dispersion of each node in the sequence is looked up in the privacy attribute group, and each node identifier is encapsulated with its structural dispersion to form a node attribute pair. All node attribute pairs of an anonymous domain are concatenated in sequence order to form that domain's anonymous domain structure vector, and all anonymous domain structure vectors are combined to form the anti-association coding group.
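The anti-association coding group of [0103] reduces to grouping and sorting. In this sketch, Python's default string ordering (by code point) stands in for "character encoding order", and a structure vector is represented as a list of (node identifier, structural dispersion) pairs; both representations, and the function name, are assumptions.

```python
from typing import Dict, List, Tuple

def anti_association_group(domain_of: Dict[str, int],
                           dispersion: Dict[str, float]) -> Dict[int, List[Tuple[str, float]]]:
    group: Dict[int, List[Tuple[str, float]]] = {}
    for dom in sorted(set(domain_of.values())):  # traverse in domain-number order
        # Anonymous node sequence: identifiers sorted by code point.
        seq = sorted(n for n, d in domain_of.items() if d == dom)
        # Anonymous domain structure vector: ordered node attribute pairs.
        group[dom] = [(n, dispersion[n]) for n in seq]
    return group
```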

[0104] S4.6. Using the anonymous domain number as a prefix, concatenate and encode the anti-association coding group to generate a substitution identifier pool.

[0105] It should be noted that, for each anonymous domain structure vector in the anti-association coding group, the anonymous domain number and the structural dispersion value of each anonymous node in that domain are parsed out. With the anonymous domain number as a prefix, the structural dispersion values are concatenated with delimiters in anonymous node order to form a substitution identifier string. Each substitution identifier string, together with its anonymous domain number, forms a substitution identifier entry, and all entries are grouped by anonymous domain number to form the substitution identifier pool.
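A sketch of the concatenation encoding in [0105]. The prefix format "D<number>:", the "-" delimiter, and four-decimal formatting are hypothetical choices, since the text fixes no concrete string layout. With one structure vector per domain, each domain yields one entry; the list-valued pool simply leaves room for more.

```python
from typing import Dict, List, Tuple

def build_pool(group: Dict[int, List[Tuple[str, float]]],
               sep: str = "-", digits: int = 4) -> Dict[int, List[str]]:
    pool: Dict[int, List[str]] = {}
    for dom, pairs in group.items():
        # Dispersion values joined in anonymous node order.
        body = sep.join(f"{disp:.{digits}f}" for _node, disp in pairs)
        # Domain number as prefix; entries grouped by domain number.
        pool.setdefault(dom, []).append(f"D{dom}:{body}")
    return pool
```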

[0106] S5. Based on the policy configuration body, select candidate substitution identifiers from the substitution identifier pool and establish substitution mapping relationships; output the substitution mapping body and use it to replace the request data body for uplink anonymous interaction, forming the response data payload.

[0107] S5.1. Parse the substitution level, substitution frequency, and substitution method fields of the policy configuration body for each anonymous domain to form a policy deconstruction group.

[0108] It should be noted that the policy configuration body is traversed, and for each anonymous domain record, keyed by its anonymous domain number, the substitution level field and scale parameter field are read. The substitution level field serves as the basis for filtering substitution identifiers by strength, and the scale parameter field is rounded up to obtain the substitution frequency, which limits the number of times identifiers in this anonymous domain are substituted within one service interaction. The substitution method is determined by the substitution level: for the first or second substitution level, the substitution method field is marked "rotation"; for the third substitution level, it is marked "single substitution". The substitution level, substitution frequency, and substitution method are encapsulated into a substitution policy entry, and all entries are collected with the anonymous domain number as the key to form the policy deconstruction group.
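The parsing rules of [0108] map directly onto a small lookup. A minimal sketch, assuming policy configuration records are dicts with domain, level, and scale keys (hypothetical names).

```python
import math
from typing import Dict, List

def deconstruct(policy: List[dict]) -> Dict[int, dict]:
    group = {}
    for rec in policy:
        level = rec["level"]
        group[rec["domain"]] = {
            "level": level,
            "frequency": math.ceil(rec["scale"]),  # scale parameter rounded up
            "method": "rotation" if level in (1, 2) else "single substitution",
        }
    return group
```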

[0109] S5.2. Using the substitution level field of the policy deconstruction group as the screening basis, extract the substitution identifiers matching the corresponding substitution level from the substitution identifier pool to form a substitution candidate group.

[0110] It should be noted that the substitution level field of each anonymous domain number recorded in the policy deconstruction group is read in turn and used as the filtering condition to locate, in the substitution identifier pool, the substitution identifier entries belonging to the current anonymous domain number. From these entries, only the substitution identifiers are retained and encapsulated into the substitution candidate subset of the current anonymous domain, which is stored in the substitution candidate group indexed by anonymous domain number.

[0111] S5.3 Extract the identifier field values from the request data body as mapping keys, use the substitution candidate group as the mapping values to establish mapping key-value pairs, and associate and encapsulate them with the policy deconstruction group to form a substitution mapping body.

[0112] It should be noted that the identifier fields expressing user identity, device identity, or service session identity are located in the request data body, and their field values are read as mapping keys. Based on the anonymous domain number and substitution level recorded in the policy deconstruction group, the set of substitution identifiers matching the current anonymous domain number is retrieved from the substitution candidate group as the mapping value. Each mapping key and mapping value are encapsulated as a key-value mapping and associated with the corresponding anonymous domain number, substitution level field, substitution method field, and substitution frequency field to form a substitution mapping entry, and all substitution mapping entries are collected in the order in which the identifier fields appear in the request data body to form the substitution mapping body.

[0113] It should also be noted that a key-value mapping is a pairing structure of "keys" and "values": the key uniquely identifies an input object (such as the original value of an identifier field), and the value represents the output object corresponding to that key (such as the corresponding set of substitution identifiers). Keys and values may be paired one-to-one or one-to-many, and a lookup with a key returns the corresponding value.
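Steps S5.2 and S5.3 ([0110]-[0112]) can be sketched together. The text does not say how an identifier field is tied to an anonymous domain, so the sketch takes a hypothetical field_domain dict for that purpose; it also records the field name in each entry so the replacement step can locate it again. Names and dict layouts are assumptions.

```python
from typing import Dict, List

def build_mapping_body(request: Dict[str, str],
                       field_domain: Dict[str, int],
                       pool: Dict[int, List[str]],
                       strategy: Dict[int, dict]) -> List[dict]:
    # S5.2: substitution candidate group, indexed by anonymous domain number.
    candidates = {dom: list(pool.get(dom, [])) for dom in strategy}
    body = []
    # S5.3: one mapping entry per identifier field, in order of appearance.
    for field, value in request.items():
        dom = field_domain.get(field)
        if dom is None or dom not in strategy:
            continue  # not an identifier field, or no policy for its domain
        s = strategy[dom]
        body.append({"field": field, "key": value, "values": candidates[dom],
                     "domain": dom, "level": s["level"],
                     "method": s["method"], "frequency": s["frequency"]})
    return body
```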

[0114] S5.4 Replace the identifier field values in the request data body using the mapping key-value pairs in the substitution mapping body to form an anonymous request body.

[0115] It should be noted that the request data body is scanned to locate the fields for which key-value mappings were established in the substitution mapping body, and their field values are recorded as identifier field values. Using each identifier field value as the key, the corresponding set of substitution identifiers is looked up in the substitution mapping body, and a substitution identifier is selected according to the substitution method and substitution frequency recorded for the current anonymous domain in the policy deconstruction group. The selected substitution identifier replaces the original identifier field value. Once all substituted fields in the request data body have been processed, with all other fields left unchanged, the request data body is re-encapsulated in field order to generate the anonymous request body.
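One way to honor the substitution method and frequency in [0115] is shown below. The rotation semantics are an assumption: each candidate is used frequency times before advancing to the next, while "single substitution" always uses the first candidate. The caller-supplied use_count dict, keyed by original identifier value, is likewise hypothetical.

```python
from typing import Dict, List

def anonymize(request: Dict[str, str], mapping_body: List[dict],
              use_count: Dict[str, int]) -> Dict[str, str]:
    out = dict(request)  # copy keeps the original field order; other fields untouched
    for entry in mapping_body:
        cands = entry["values"]
        if not cands:
            continue  # nothing to substitute with
        n = use_count.get(entry["key"], 0)
        if entry["method"] == "rotation":
            # Assumed semantics: use each candidate `frequency` times, then advance.
            idx = (n // max(1, entry["frequency"])) % len(cands)
        else:
            idx = 0  # "single substitution": always the same candidate
        out[entry["field"]] = cands[idx]
        use_count[entry["key"]] = n + 1
    return out
```

Keeping the counter outside the function lets a caller persist it across requests, which is what gives rotation its effect over a session.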

[0116] S5.5 Send the anonymous request body to the user information server and receive the data content returned by the user information server, and encapsulate it to form a response data payload.

[0117] It should be noted that the field names and values in the anonymous request body are parsed, in a fixed field order, into key-value pairs. The field values are processed with a unified character encoding: text field values are converted into character-encoded byte sequences, and numeric field values are first converted into strings and then into character-encoded byte sequences. These sequences are concatenated in the order "field name length + field name bytes + field value length + field value bytes" to form a message payload that can be transmitted over the network, which is sent to the user information server through its service request interface. When the user information server returns data, the returned data is parsed into field-name/field-value key-value pairs using the same character encoding and encapsulated into a response field set, which is combined with the anonymous request body in a fixed field order to form the response data payload.
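The framing in [0117] is a length-prefixed encoding. The text does not specify the width or byte order of the length fields, so this sketch assumes a 4-byte big-endian unsigned integer and UTF-8 text; the function names are hypothetical.

```python
import struct
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]

def _chunk(data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + data  # assumed 4-byte big-endian length prefix

def encode_payload(fields: List[Tuple[str, Value]], encoding: str = "utf-8") -> bytes:
    out = bytearray()
    for name, value in fields:  # fixed field order
        out += _chunk(name.encode(encoding))
        out += _chunk(str(value).encode(encoding))  # numbers become strings first
    return bytes(out)

def decode_payload(payload: bytes, encoding: str = "utf-8") -> Dict[str, str]:
    fields, pos = {}, 0

    def read() -> bytes:
        nonlocal pos
        (length,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        data = payload[pos:pos + length]
        pos += length
        return data

    while pos < len(payload):
        name = read().decode(encoding)
        fields[name] = read().decode(encoding)
    return fields
```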

[0118] In summary, through dynamic anonymous domain partitioning based on structural complexity and anonymous association, this invention improves the non-reidentifiability of replaced identifiers and reduces the risk of reconstruction by splicing patterns across requests. At the same time, by constructing a substitution identifier pool, it achieves context-consistent replacement of original identifiers at the level of structural behavior, ensuring the availability and behavioral continuity of the substitute identifiers along the service interaction link and improving the practicality and stability of IoT privacy protection solutions in real service environments.

[0119] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A user information service privacy protection method based on identifier replacement, characterized in that it comprises: extracting service request data initiated by a user terminal and performing context feature encoding to obtain a request data body; performing field generalization and value range perturbation on the request data body to generate an anonymous sample set, analyzing the sensitivity levels of the anonymous sample set and performing statistical fuzzy processing, and outputting anonymization adaptation parameters; constructing an anonymous relationship graph based on the request data body and the anonymization adaptation parameters, removing identifiable information from the anonymous relationship graph, and outputting a de-identified relationship graph; dividing the anonymous relationship graph into multiple anonymous domains by calculating entropy parameters of the de-identified relationship graph; statistically analyzing the entropy gradient and domain size information of each anonymous domain to generate a policy configuration body; extracting, concatenating, and encoding the topological features of the anonymous relationship graph to generate an alternative identifier pool; and, based on the policy configuration body, selecting candidate alternative identifiers from the alternative identifier pool, establishing alternative mapping relationships, and outputting an alternative mapping body, which is used to replace the request data body for uplink anonymous interaction to form a response data payload.

2. The user information service privacy protection method based on identifier replacement according to claim 1, characterized in that obtaining the request data body specifically comprises: receiving a service interaction request sent by the user terminal and performing semantic extraction and field classification to form the service request data; and extracting contextual attributes of the service request data and performing feature encoding to generate the request data body.

3. The user information service privacy protection method based on identifier replacement according to claim 1, characterized in that generating the anonymous sample set specifically comprises: separating the user attribute fields and value range fields in the request data body and performing field generalization to form a field generalization group; and performing value range perturbation and structural combination on the field generalization group according to field correspondence to generate the anonymous sample set.

4. The user information service privacy protection method based on identifier replacement according to claim 1, characterized in that outputting the anonymization adaptation parameters specifically comprises: evaluating the field generalization level and perturbation magnitude of the anonymous sample set and detecting sensitive fields to generate a sensitivity level group with sensitivity coefficients; and, based on the sensitivity coefficients, statistically analyzing the fuzzy coverage range and field granularity coverage of the sensitivity level group to generate the anonymization adaptation parameters.

5. The user information service privacy protection method based on identifier replacement according to claim 1, characterized in that outputting the de-identified relationship graph specifically comprises: constructing the anonymous relationship graph with the request data body as anonymous nodes and the anonymization adaptation parameters as edge weights; and removing identifiable fields related to service accounts, terminal identifiers, and user identifiers from the anonymous relationship graph to form the de-identified relationship graph.

6. The user information service privacy protection method based on identifier replacement according to claim 5, characterized in that dividing the anonymous relationship graph into multiple anonymous domains specifically comprises: statistically analyzing the node density, edge connection number, and neighborhood distribution in the graph to form a graph structure statistics group; calculating the local entropy parameter, structural entropy parameter, and diffusion entropy parameter of the graph structure statistics group to generate an entropy index group; and, based on the entropy index group, dividing the anonymous nodes in the de-identified relationship graph into different anonymous domains to form a domain partitioning group carrying anonymous domain numbers.

7. The user information service privacy protection method based on identifier replacement according to claim 6, characterized in that statistically analyzing the entropy gradient and domain size information of each anonymous domain to generate the policy configuration body specifically comprises: statistically analyzing the scale parameter and mean entropy value of each anonymous domain and calculating entropy gradient parameters between adjacent anonymous domains from the mean entropy values to form a gradient hierarchical group; and, based on the gradient hierarchical group, labeling each anonymous domain with a substitution level and associating the substitution level of each anonymous domain with its scale parameter to output the policy configuration body.

8. The user information service privacy protection method based on identifier replacement according to claim 7, characterized in that extracting, concatenating, and encoding the topological features of the anonymous relationship graph to generate the alternative identifier pool specifically comprises: extracting topological fields from the anonymous relationship graph and counting the number of node connections, neighborhood size, and edge weight distribution of each anonymous node to form a topology field group; calculating structural dispersion from the node connection patterns and neighborhood diffusion of the topology field group to form a privacy attribute group; traversing the anonymous domain numbers of the domain partitioning group to form anonymous node sequences and performing vector fusion with the privacy attribute group to generate an anti-association coding group; and, using the anonymous domain number as a prefix, concatenating and encoding the anti-association coding group to generate the alternative identifier pool.

9. The user information service privacy protection method based on identifier replacement according to claim 8, characterized in that outputting the alternative mapping body specifically comprises: parsing the substitution level, substitution frequency, and substitution method fields of the policy configuration body for each anonymous domain to form a policy deconstruction group; extracting, based on the substitution level field of the policy deconstruction group, substitution identifiers matching the corresponding substitution level from the alternative identifier pool to form a substitution candidate group; and extracting the identifier field values from the request data body as mapping keys, using the substitution candidate group as mapping values to establish mapping key-value pairs, and associating and encapsulating them with the policy deconstruction group to form the alternative mapping body.

10. The user information service privacy protection method based on identifier replacement according to claim 9, characterized in that forming the response data payload specifically comprises: replacing the identifier field values in the request data body with the mapping key-value pairs in the alternative mapping body to form an anonymous request body; and sending the anonymous request body to the user information server, receiving the data content returned by the user information server, and encapsulating it to form the response data payload.