Method for data transcoding for avoiding re-identification by AI

WO2026130894A1 · PCT designated stage · Publication Date: 2026-06-25 · RES IND SYST ENG RISE FORSCHUNGS ENTWICKLUNGS UND GROSSPROJEKTBERATUNG

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
RES IND SYST ENG RISE FORSCHUNGS ENTWICKLUNGS UND GROSSPROJEKTBERATUNG
Filing Date
2025-11-13
Publication Date
2026-06-25


Abstract

The invention relates to a computer-implemented method for anonymising personal or health-related input data sets (DSj), comprising the steps of: - providing at least one input data set (DSj) which comprises data entries (DEi), in particular medical measurement values, - determining a positive upper limit (pOG) and negative upper limit (nOG) for a deviation of the data entry (DEi) to be transcoded, - determining a positive lower limit (pUG) and negative lower limit (nUG) for a deviation of the data entry (DEi) to be transcoded, - wherein the positive upper limit (pOG) and the positive lower limit (pUG) form an upper value range (oWB) and the negative upper limit (nOG) and the negative lower limit (nUG) form a lower value range (uWB); - randomly selecting a random value in the upper or lower value range (oWB, uWB) in order to obtain a transcoded data entry (tDEi) - outputting an output data set containing the transcoded data entry (tDEi).

Description

[0001] Methods for data transcoding to prevent re-identification by AI

[0002] The invention relates to a computer-implemented method for anonymizing personal or health-related data sets.

[0003] The state of the art in anonymizing personal and health-related data has advanced considerably in recent years, driven by the significantly increased threat of re-identification, particularly through the use of artificial intelligence (AI). Traditional anonymization techniques, such as pseudonymization or the aggregation of data using k-anonymity, have significant weaknesses. Pseudonymization replaces identifiers with pseudonyms, but this does not offer complete protection, as re-identification is possible through correlation with external datasets. k-anonymity, which ensures that an individual is indistinguishable from at least k-1 others, provides a good approach to anonymization but is vulnerable to advanced AI methods that can detect hidden patterns in the data.

[0004] The increasing availability of large datasets and the exponentially growing computing power of AI systems make it possible to de-anonymize even highly anonymized data. Artificial intelligence can draw conclusions about individuals through cross-referencing with external databases or by analyzing behavioral patterns in datasets. Traditional techniques are therefore increasingly inadequate to guarantee data privacy in a world where AI-based analytics have become commonplace.

[0005] Against this backdrop, the current state of technology shows that anonymization methods need further improvement. While modern approaches offer enhanced protection, they reach their limits when processing large and complex datasets, particularly with regard to preventing re-identification by AI. Ongoing developments therefore require more robust solutions that can address these challenges.

[0006] The object of the invention is therefore to provide an improved method for anonymizing personal or health-related datasets. This object is achieved by a computer-implemented method for anonymizing personal or health-related datasets, comprising the following steps:

[0007] - Providing at least one input data set comprising data entries, in particular medical measurements, wherein the data entries are numeric data entries,

[0008] - Transcoding at least one data entry of the input data set, and outputting an output data set that includes the transcoded data entry, wherein the transcoding of at least one data entry is carried out by the following steps:

[0009] - Determining a positive and negative upper limit of the deviation of the data entry to be transcoded, where the positive upper limit is above the data entry and the negative upper limit is below the data entry,

[0010] - Determining a positive and negative lower limit of the deviation of the data entry to be transcoded, where the positive lower limit is above the data entry and the negative lower limit is below the data entry,

[0011] - where the positive upper limit and the positive lower limit form an upper range of values and the negative upper limit and the negative lower limit form a lower range of values;

[0012] - Randomly selecting a random value in the upper or lower range of values, in particular by randomly selecting the upper or lower range of values and randomly selecting a random value in the randomly selected range of values to obtain the transcoded data entry.
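The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `transcode`, the choice to express all four limits as positive offsets from the data entry, the equal probability of the two ranges, and the unweighted draw within the chosen range are all assumptions made for the example.

```python
import random

def transcode(value, p_lower, p_upper, n_lower, n_upper, rng=None):
    """Return a transcoded value drawn from one of two ranges around `value`.

    All four limits are given as positive offsets (assumption of this sketch):
      upper range = [value + p_lower, value + p_upper]
      lower range = [value - n_upper, value - n_lower]
    The limits on the two sides need not be symmetric.
    """
    if not (0 < p_lower < p_upper and 0 < n_lower < n_upper):
        raise ValueError("each side needs 0 < lower limit < upper limit")
    rng = rng or random.Random()
    # Step 1: pick the upper or lower range (equal probability assumed).
    if rng.random() < 0.5:
        # Step 2: pick an unweighted random value inside the upper range.
        return rng.uniform(value + p_lower, value + p_upper)
    # Step 2 (other branch): unweighted random value inside the lower range.
    return rng.uniform(value - n_upper, value - n_lower)

# Example: data entry 1.51, deviation between 0.03 and 0.10 on either side.
transcoded = transcode(1.51, 0.03, 0.10, 0.03, 0.10)
```

Because the lower limits are strictly positive, the result can never coincide with the original entry, and because the upper limits are finite, it can never drift further than the permitted deviation.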

[0013] According to the invention, the data entry is not merely "blurred" around the original data entry according to a standard distribution, but rather the data entry is randomly selected from one of two value ranges, which significantly complicates re-identification. It should be noted that the lower limit of the value ranges ensures that the original data entry is not included, thus enforcing a certain distance from it. The upper limit prevents the transcoded data entry from being too far removed from the original, which would otherwise render subsequent evaluation of the output data set impossible.

[0014] In other words, setting the lower bound serves to prevent re-identification as much as possible, while the upper bound enables subsequent analysis but also makes re-identification more difficult, as this becomes increasingly challenging the further the transcoded data entry deviates from the original data entry. Simultaneously, a large distance between the positive upper and lower bounds, and between the negative upper and lower bounds, is advantageous to hinder re-identification. This is because an original data entry shifted by a constant number could easily be reverse-coded back to the original data entry.

[0015] To further clarify this, it should be noted that the upper limit of the deviation from the baseline value can be determined by a discipline-specific maximum value, ensuring that any derived conclusion remains unchanged by the modification. Examples include blood test results from which no other disease diagnosis would be inferred, or financial values within an investment history that would not imply a different wealth or asset category or investment strategy. The lower limit of the deviation from the baseline value can be calculated using a combination of statistical methods. These methods aim to determine the minimum deviation necessary to ensure, based on the population or set of values and the value variance, that machine learning cannot re-identify the individual.

[0016] A further special feature of the inventive method is that the positive and negative lower limits do not need to be chosen symmetrically around the original data set, nor do the positive and negative upper limits need to be chosen symmetrically around the original data set. This can be particularly relevant if an upward deviation of the original data set is unproblematic for further evaluation, but a downward deviation of the original data set would complicate further evaluation, or vice versa.

[0017] It should be emphasized that “transcoding” in this context can be understood to mean in particular “non-reversible transcoding” or “non-reversible manipulation” of the data entry.

[0018] In a particularly preferred embodiment of the invention, the method comprises the following steps:

[0019] - Receiving or determining a characteristic to be examined, in particular a health condition such as a disease,

[0020] - Determining the positive and negative upper limits depending on the property under investigation, in particular by querying a database.

[0021] This allows each data entry to be individually transcoded, depending on the property being investigated. To illustrate this, consider the following hypothetical example. A third party may request anonymized datasets to investigate long COVID. In this case, it is known that for a data entry of a specific category, such as the ferritin measurement from a blood test, extremely high accuracy is required to draw conclusions about long COVID. The positive and negative upper limits are specified accordingly, for example, as +/- 0.1% of the measurement. However, if the third party requests anonymized datasets to investigate a specific type of cancer, the ferritin measurement may still be relevant but have less influence on the analysis, so a positive and negative upper limit of, for example, +/- 0.5% of the measured value can be used without influencing the evaluation. It is understood that with an upper limit of, for example, +/- 0.5%, re-identification is considerably more difficult than with an upper limit of, for example, +/- 0.1%. The individual upper limits thus make it possible to make re-identification as difficult as the property being investigated permits.

[0022] Furthermore, it is noted that for certain properties under investigation, an upward deviation or a deviation up to a specific threshold may be more or less relevant than a downward deviation or a deviation up to a specific threshold, so this asymmetry can also depend on the property being investigated. The positive and negative upper limits can, for example, be linked to the property under investigation in a database accessed during transcoding. In the example above, the database could contain, for instance, a first data entry "Long-COVID - Ferritin - 0.1%" and a second data entry "Cancer - Ferritin - 0.5%". Alternatively, the positive and negative upper limits could be specified directly in a third-party query.
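A database lookup of this kind might be sketched as follows. The in-memory dictionary `UPPER_LIMITS`, the function `upper_limits`, and the separate storage of a positive and a negative relative limit (to allow the asymmetry mentioned above) are assumptions for illustration; only the two example entries come from the text.

```python
# Hypothetical stand-in for the database: (property, category) ->
# (positive relative upper limit, negative relative upper limit).
UPPER_LIMITS = {
    ("Long-COVID", "Ferritin"): (0.001, 0.001),  # +/- 0.1 %
    ("Cancer", "Ferritin"): (0.005, 0.005),      # +/- 0.5 %
}

def upper_limits(prop, category, value):
    """Return the positive and negative upper limits as absolute offsets."""
    pos_rel, neg_rel = UPPER_LIMITS[(prop, category)]
    return abs(value) * pos_rel, abs(value) * neg_rel

# Example: a ferritin value of 200.0 requested for a cancer study.
pos_offset, neg_offset = upper_limits("Cancer", "Ferritin", 200.0)
```

An unknown combination raises `KeyError` in this sketch; a real system would need a policy for that case, for example falling back to limits supplied in the third-party query.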

[0023] In the simplest case, the positive and negative lower bounds can be chosen as default values. Preferably, however, the positive and negative lower bounds are selected based on corresponding data entries from other input or output data records that are to be passed to a third party. For example, four data records each contain a data entry "Ferritin," i.e., a data entry of the same category. The data entries are 1.51, 1.49, 1.52, and 1.41, respectively. If no lower bound or a lower bound of +/- 0.02 were chosen, it would not be possible to sufficiently anonymize the data entry of the fourth data record, thus making this data record re-identifiable. A lower bound of, for example, +/- 0.08 according to the invention therefore helps to significantly hinder re-identification. For the first three data records, however, a lower bound of +/- 0.03 might be sufficient. It is evident that the lower limit should be chosen depending on further data entries of further data records in order to make re-identification as difficult as possible. In other words, according to the invention, at least one further data record with at least one further data entry can be provided, which is of the same category as the data entry to be transcoded, wherein the positive and negative lower limits are determined depending on the at least one further data entry, in particular as at least half the distance to the nearest further data entry or at least the distance to the nearest further data entry.
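The nearest-neighbour rule from the last sentence can be sketched as below. The function names and the `factor` parameter (0.5 for "half the distance", 1.0 for "the distance") are assumptions of this example, and the result is only the floor named in the text, not a complete rule for choosing the lower bound.

```python
def nearest_distance(values, index):
    """Distance from values[index] to the closest other entry of the category."""
    target = values[index]
    # min() raises ValueError if there is no other entry to compare against.
    return min(abs(target - v) for i, v in enumerate(values) if i != index)

def minimum_lower_bound(values, index, factor=1.0):
    """Smallest admissible lower bound: factor times the nearest distance."""
    return factor * nearest_distance(values, index)

ferritin = [1.51, 1.49, 1.52, 1.41]
# The fourth entry (1.41) is isolated: its nearest neighbour is 1.49,
# so the floor is about 0.08, matching the example in the text.
floor_for_fourth = minimum_lower_bound(ferritin, 3)
```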

[0024] In particular, the positive and negative lower bounds can be determined depending on a statistical procedure, preferably chosen from one of the following statistical procedures: a calculation of intra- and inter-subject variability, especially using ANOVA, a data standardization, especially using z-transformation, a cluster analysis, a discriminant function analysis (DFA), a definition of the threshold using receiver operating characteristic (ROC) curves, or a validation using further data sets.

[0025] The above explanations regarding upper and lower limits are sufficient in many cases to both hinder re-identification and prevent it from influencing subsequent analysis. However, in some cases, re-identification is still possible if the upper and lower limits are too close together. Therefore, it is preferable for the positive and negative lower limits to maintain a predetermined minimum distance from the positive and negative upper limits. Thus, for each range of values, there are three degrees of freedom: the choice of the respective lower limit, the choice of the respective upper limit, and the choice of the distance between the upper and lower limits.
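The minimum-distance condition can be checked before any random value is drawn. The following is a small sketch with assumed names (`check_range`, `min_gap`); limits are again treated as positive offsets on one side of the data entry, which is a convention of this example only.

```python
def check_range(lower, upper, min_gap):
    """Validate one value range given as offsets from the data entry.

    Raises ValueError if the lower limit is not positive or if the two
    limits are closer together than the required minimum distance.
    """
    if lower <= 0:
        raise ValueError("lower limit must exclude the original value")
    if upper - lower < min_gap:
        raise ValueError("upper and lower limits are too close together")
    return lower, upper

# Example: offsets 0.03 and 0.10 with a required gap of 0.05 pass the check.
checked = check_range(0.03, 0.10, 0.05)
```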

[0026] In the simplest case, the random value is selected from the randomly chosen range without any weighting; that is, the random value is equally likely to be close to the upper limit or close to the lower limit. However, to minimize the influence on subsequent evaluation, it may be preferable to select the random value with a weighting, where the weighting preferably has its maximum value at the positive and/or negative lower limit. In another case, the lower and upper value ranges can be symmetrical, meaning they are of equal size and equidistant from the original data entry. In this case, it makes no difference whether a specific selection between the lower and upper value ranges is made first, and only then a random value within the chosen range is selected, or whether both selections are made simultaneously, that is, a single random number is chosen that simultaneously defines the value range and the random value within that range. For example, a random number in the range between 0 and 1 could be chosen; if the random number is less than 0.5, the lower value range is selected, and if the random number is greater than 0.5, the upper value range is selected. The exact value of the random number then defines the random value. This can also be implemented for more general cases, even if the lower and upper value ranges are not symmetrical.
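For the symmetric case, the single-random-number variant could look like the sketch below. The function name `map_single_draw` and the linear rescaling of each half of the unit interval are assumptions; the text only states that one number between 0 and 1 selects both the range and the value.

```python
import random

def map_single_draw(u, value, lower, upper):
    """Map one number u in [0, 1) to a transcoded value (symmetric ranges).

    `lower` and `upper` are positive offsets shared by both ranges.
    u below 0.5 selects the lower range, otherwise the upper range; the
    position of u inside its half fixes the value inside that range.
    """
    if not 0 <= u < 1:
        raise ValueError("u must lie in [0, 1)")
    width = upper - lower
    if u < 0.5:
        frac = u / 0.5                      # rescale [0, 0.5) to [0, 1)
        return value - upper + frac * width
    frac = (u - 0.5) / 0.5                  # rescale [0.5, 1) to [0, 1)
    return value + lower + frac * width

# One draw defines both the range and the value within it.
transcoded = map_single_draw(random.random(), 1.51, 0.03, 0.10)
```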

[0027] However, independently selecting the value range and the random value has the advantage of providing greater security against deanonymization, since a single random number generator can generate two random values with different seed values. That is, the value range is determined by the random number generator with a first seed value, and the random value is determined by the random number generator with a second seed value, which significantly increases security against deanonymization. In other words, it is preferable if the random selection of the upper or lower value range and the random selection of a random value within the randomly selected value range occur independently of each other and are each performed using a random number generator with different seed values.
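A sketch of the two independent selections with differently seeded generators is given below. The function name, the symmetric offsets, and drawing the seeds from the operating system via `secrets.randbits` are assumptions of this example. Note also that Python's `random.Random` is a Mersenne Twister and is not designed for security purposes, so a production system would likely substitute a cryptographically secure source.

```python
import random
import secrets

def transcode_independent(value, lower, upper, seed_range=None, seed_value=None):
    """Choose the range and the value with two separately seeded generators."""
    # Assumption: seeds come from the OS entropy pool unless supplied.
    if seed_range is None:
        seed_range = secrets.randbits(64)
    if seed_value is None:
        seed_value = secrets.randbits(64)
    range_rng = random.Random(seed_range)   # decides upper vs. lower range
    value_rng = random.Random(seed_value)   # decides the position in the range
    sign = 1 if range_rng.random() < 0.5 else -1
    deviation = value_rng.uniform(lower, upper)
    return value + sign * deviation

transcoded = transcode_independent(1.51, 0.03, 0.10)
```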

[0028] Up to now, it has always been assumed that the upper and lower bounds lie in the same range of values, i.e., data space, as the data entry to be manipulated; that is, the transcoded data entry was of the same order of magnitude as the original data entry. However, the procedure can also include the following step: mapping the transcoded data entry, the data entry to be transcoded, or the upper and lower bounds to a new range of values using a predefined function. Each of these three options will result in the transcoded data entry lying in a different range of values, i.e., data space, than the original data entry. While such a mapping of the data entry to a new range of values is reversible in itself, it creates an additional layer that complicates the interpretation of the output data set. For example, initial blood test results typically range from 1 to 10, and subsequent blood test results from 100 to 1000. If a third party sees a data entry of, say, 200, it is clear that this must be a subsequent blood test result, which simplifies the evaluation of the data set for that third party. However, if both (or one) of the blood test results are mapped to a range of, say, 2000 to 4000, it is impossible to determine from a single data entry whether it is an initial or subsequent blood test result. Surprisingly, it has been shown that even with the values mapped to the new range, it is possible to perform AI-based analysis regarding diseases or other characteristics to be investigated, since the relative properties of the measurements are preserved even within the new range. For example, a previously high cholesterol level will remain a "high" cholesterol level within the new range. However, mapping the results to the new range has further improved anonymization.
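The text leaves the predefined function open. As one possible illustration, the sketch below assumes a linear (affine) mapping, which keeps the ordering of values so that a "high" value stays high; the function name `map_to_range` and its parameters are hypothetical.

```python
def map_to_range(x, src_min, src_max, dst_min, dst_max):
    """Linearly map x from [src_min, src_max] to [dst_min, dst_max]."""
    if src_max == src_min:
        raise ValueError("source range must have non-zero width")
    scale = (dst_max - dst_min) / (src_max - src_min)
    return dst_min + (x - src_min) * scale

# Both kinds of blood test result from the example land in 2000..4000,
# so a single mapped value no longer reveals which test it came from.
initial_mapped = map_to_range(5.5, 1, 10, 2000, 4000)
subsequent_mapped = map_to_range(550, 100, 1000, 2000, 4000)
```

In this example both inputs sit at the midpoint of their source ranges and therefore map to the same midpoint of the target range.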

[0029] It goes without saying that not only can a single data entry from a single input data set be transcoded, but also multiple data entries from a single input data set or data entries from multiple input data sets.

[0030] In other words, the method according to the invention can include the step of completely transcoding an input data set, wherein in the complete transcoding of the input data set all numerical data entries of the input data set are transcoded; and / or the method can include the step of transcoding at least one data entry of a plurality of input data sets, wherein the method can preferably further include the step of completely transcoding a plurality of input data sets, wherein the output data sets are preferably stored in a database.

[0031] The process can also include further steps, such as providing the output dataset to a third party and, if necessary, the step of computer-aided processing of at least a part of the output dataset by the third party, where this part comprises the transcoded data entry. This part can, if necessary, be used as training data for training an AI model.

[0032] The invention further provides a device for data processing, comprising means for carrying out the steps of the aforementioned method. Furthermore, the invention provides a computer program, comprising instructions which, when the program is executed by a computer, cause it to carry out the steps of the aforementioned method, and a computer-readable data carrier on which the aforementioned computer program is stored.

[0033] To better illustrate the present invention and explain its operation in detail, reference is made below to the accompanying figure. This figure serves to clarify the technical features of the invention. The following description of the figure is intended to contribute to a thorough understanding of the structural and functional properties of the invention and to explain its advantages and technical advances compared to the prior art. It shows:

[0034] Figure 1 shows several personal data records, each containing multiple data entries.

[0035] Figure 2 shows the transcoding of a data entry in a data set.

[0036] Figure 3 shows the transcoding of a data entry using a first example on a number line.

[0037] Figure 4 shows the transcoding of a data entry using a second example on a number line.

[0038] As shown in Figure 1, the present invention is based on at least one data set DS1 containing personal data. Personal data, according to the European General Data Protection Regulation (GDPR), is any information relating to an identified or identifiable natural person. A person is considered identifiable if they can be identified, directly or indirectly, for example, by reference to an identifier such as a name, an identification number, location data, or an online identifier. This includes both obvious information such as name and address, and less obvious information such as IP addresses or GPS data. Sensitive data, such as information concerning health or religious beliefs, also falls under the definition of personal data, as long as it allows for the identification of a person. For example, the combination of a date of birth and an address can be used to infer a person's identity.

[0039] Health-related data is a special case of personal data and requires particularly careful protection. According to the European General Data Protection Regulation (GDPR), for example, health-related data falls under special categories of personal data that contain information about a person's physical or mental health and from which inferences can be drawn about that person's health status. This includes data relating to current, past, or future health, such as information about illnesses, medical treatments, or genetic and biometric data used to uniquely identify a person. Such data is subject to particularly high protection because it is sensitive and provides deep insight into a person's privacy.

[0040] Figure 1 shows a first data record DS1, which contains several data entries DE1, DE2, DE3, DE4, ... DEi. This first data record DS1 comprises "actual" data, such as medical measurements or a person's actual date of birth. Such a data record DSj can also be referred to as the "input record." Even if this data record DSj does not contain a name or similar information, it can still be associated with a person. In the above example of personal data, the combination of a date of birth with an address allowed for the identification of a person. However, even if the data record contains health-related data, such as a blood test without a name, it can still be assigned to a person, for example, if a third party possesses the same data record with a name or even just a subset of the first data record containing a name. The procedures described below in connection with Figures 2 to 4 are used to anonymize the data record DSj so that it can no longer be associated with the person, i.e., it can no longer be re-identified.

[0041] Before proceeding, however, we will revisit Figure 1, as it illustrates that multiple datasets DS1, DS2, DS3, ... DSj, may exist. These DSj datasets are also input datasets and comprise "actual" data, such as medical measurement data. The DSj datasets typically originate from different individuals but may also have been recorded by the same individual at different times. All of these DSj datasets may reside within a single computing unit, such as a security-critical data center, and may only be output by this unit in anonymized form. Alternatively, the input DSj datasets can be anonymized immediately upon receipt by the data center using the procedure described below and stored only in anonymized form.

[0042] The data records DSj can all contain the same number of data entries DEi, although this is not mandatory. In Figure 1, for example, data record DS3 is missing the data entry DE3. The data entries DEi to be transcoded according to the present invention are numeric data entries, in particular fractional or integer numbers. These numeric data entries are, in particular, medical measurements such as blood values, temperature readings, blood pressure readings, pulse rate, respiratory rate, body weight, height, or urine analyses. Figure 1 also shows that the data records DSj can include data entries DE4, which are not numeric data entries. These are not transcoded using the method according to the invention but can be transcoded using another method.

[0043] By definition, two data entries DEi from two or more data records DSj belong to the same category if they denote the same quantity. For example, in Figure 1, the data entry DE1 is the blood measurement "ferritin", so the data entry DE1 of the first data record DS1 and the data entry DE1 of the second data record DS2 belong to the same category.

[0044] An anonymized dataset, as defined here, is a dataset in which at least one of the data entries DEi of an input dataset DSj has been transcoded. The output dataset, i.e., the anonymized dataset or transcoded dataset tDSj, may, for example, include all data entries DEi (both of the same category and with the same numerical value) of the input dataset DSj, provided that at least one of the data entries DEi has been transcoded. However, some of the data entries DEi may also be omitted from the output dataset tDSj, particularly if they are not needed for further analysis; that is, the output dataset tDSj may be both transcoded and reduced compared to the input dataset DSj. If all remaining numerical data entries tDEi in the output dataset are transcoded, it is referred to here as a fully transcoded output dataset tDSj.

[0045] Figure 2 shows such an example, illustrating that an output data record tDS1 was generated from an input data record DS1. The numeric data entries DE1 and DE3 were transcoded and are listed as transcoded data entries tDE1 and tDE3 in the output data record tDS1. The data entry DE2 was not included in the output data record tDS1. The data entry DE4 is not a numeric data entry and is not considered for the purposes of this invention.
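The handling of a whole record as in Figure 2 might be sketched as follows. The dictionary representation, the names `transcode_record`, `limits` and `drop`, the symmetric offsets, and the decision to pass non-numeric entries through unchanged are all assumptions of this example; the text only says that non-numeric entries are outside the method.

```python
import random
from numbers import Real

def transcode_record(record, limits, drop=(), rng=None):
    """Build an output record from an input record.

    record: category name -> data entry
    limits: category name -> (lower offset, upper offset) for numeric entries
    drop:   categories omitted from the output record
    """
    rng = rng or random.Random()
    out = {}
    for key, entry in record.items():
        if key in drop:
            continue  # reduced output record: entry not needed downstream
        numeric = isinstance(entry, Real) and not isinstance(entry, bool)
        if numeric and key in limits:
            lower, upper = limits[key]
            sign = 1 if rng.random() < 0.5 else -1
            out[key] = entry + sign * rng.uniform(lower, upper)
        else:
            out[key] = entry  # assumption: left untouched in this sketch
    return out

ds1 = {"DE1": 1.51, "DE2": 7.2, "DE3": 120.0, "DE4": "A+"}
tds1 = transcode_record(
    ds1, limits={"DE1": (0.03, 0.10), "DE3": (1.0, 5.0)}, drop={"DE2"}
)
```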

[0046] Figure 3 shows how the data entries DEi are transcoded according to the present invention. According to this example, the original data entry DE1 has the numerical value 1.51.

[0047] For transcoding, a positive upper limit (pOG) and a negative upper limit (nOG) are defined. The purpose of the positive upper limit (pOG) and the negative upper limit (nOG) is to prevent the original data entries (DEi) from being transcoded too high, as this would render subsequent analysis impossible. For example, it might make no difference whether a measured value of 1.51 or 1.55 is present in determining the presence of a specific disease. However, if the transcoded measured value were 1.65, this would make a significant difference in determining the presence of a specific disease. For another disease, it's possible that even a transcoded measured value of 1.65 would not make a difference in diagnosis, but a transcoded value of 1.75 would.

[0048] Furthermore, a positive lower bound (pUG) and a negative lower bound (nUG) are determined for transcoding. The purpose of the positive lower bound (pUG) and the negative lower bound (nUG) is to ensure that the transcoded data entry tDEi reveals as little as possible about the original data entry DEi. The lower bounds are generally determined based on data entries DEi of the same category from other input data records DSj. In the example shown, for instance, lower bounds are to be found for the data entry DE1=1.51 of the first data record DS1. For this purpose, data entries DE1 belonging to the same category as the data entry DE1 to be transcoded are searched for in the other data records DS2, DS3, and DSj, i.e., the data entries DE1=1.49, DE1=1.52, and DE1=1.41 (Figure 1). All four data entries DE1 belong to the same category and are, for example, "ferritin" measurements. For the first data entry DE1, the positive lower bound can be set at, for example, 1.52 and the negative lower bound at 1.49, so that the transcoded data entry tDE1 lies within the range of the other data entries DE1. However, other statistical methods are also possible for choosing the lower limits, which do not necessarily have to consider data entries from other data sets, especially if statistical fluctuation ranges are already known.

[0049] It is understood that the positive and negative upper bounds pOG, nOG, as well as the positive and negative lower bounds pUG, nUG, do not correspond to the original data entry DEi. Furthermore, the positive upper bound pOG is further removed from the original data entry DEi than the positive lower bound pUG. Similarly, the negative upper bound nOG is further removed from the original data entry DEi than the negative lower bound nUG.

[0050] Once the positive and negative upper bounds pOG, nOG as well as the positive and negative lower bounds pUG, nUG have been determined, two value ranges are obtained: an upper value range oWB between the positive lower bound pUG and the positive upper bound pOG and a lower value range uWB between the negative lower bound nUG and the negative upper bound nOG.

[0051] To determine the transcoded data entry tDEi, a first step is to select whether the upper value range oWB or the lower value range uWB should be used. In the example shown in Figure 3, the upper value range oWB was used. Next, a random value is selected from the randomly chosen value range oWB, which represents the transcoded data entry tDEi. These two selection steps can be performed independently or simultaneously, for example, by fictitiously concatenating the two value ranges oWB and uWB and selecting a random number from this fictitiously contiguous range.
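The simultaneous variant, in which the two ranges are treated as one contiguous interval, can be sketched as below. The function name and the tuple representation of the ranges (absolute bounds, low to high) are assumptions. One consequence worth noting: with this approach each range is hit with a probability proportional to its width, which differs from an equal choice between the ranges whenever the ranges have different sizes.

```python
import random

def pick_from_concatenated(t, lower_range, upper_range):
    """Map a position t on the joined interval back to a real value.

    lower_range and upper_range are (low, high) pairs of absolute bounds;
    t must lie between 0 and the combined width of both ranges.
    """
    lo_a, lo_b = lower_range
    up_a, up_b = upper_range
    lo_len = lo_b - lo_a
    total = lo_len + (up_b - up_a)
    if not 0 <= t <= total:
        raise ValueError("t lies outside the concatenated range")
    if t < lo_len:
        return lo_a + t              # position falls into the lower range
    return up_a + (t - lo_len)       # remainder falls into the upper range

uwb, owb = (1.40, 1.48), (1.53, 1.58)   # illustrative bounds around 1.51
total_width = (uwb[1] - uwb[0]) + (owb[1] - owb[0])
transcoded = pick_from_concatenated(random.random() * total_width, uwb, owb)
```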

[0052] Figure 4 shows another example where the upper value range (oWB) and the lower value range (uWB) are asymmetrically positioned around the original data entry DE1. This asymmetry is expressed in two ways: firstly, the value ranges oWB and uWB have different sizes, and secondly, they are positioned at different distances from the original data entry DE1. However, an asymmetry can also be implemented where only one of these two factors is present.

[0053] In Figure 4, the triangular, hatched value ranges uWB, oWB further indicate that the random value is chosen with a weighting and is more likely to lie close to the positive or negative lower bound pUG, nUG than to the positive or negative upper bound pOG, nOG. In the example of Figure 4, a linear weighting is used, although it is understood that other weightings can also be used, in particular weightings based on a Gaussian curve, which is divided in the middle by the distance between the positive lower bound pUG and the negative lower bound nUG. Returning to Figure 3, it is evident that no weighting was applied and the probability of the random value is the same across the entire upper and lower value ranges uWB, oWB.
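A linear weighting with its maximum at the lower bound corresponds to a triangular distribution whose mode sits at the lower end of the range. The sketch below uses `random.triangular` from the Python standard library for this; the function names and the symmetric offsets are assumptions of the example.

```python
import random

def weighted_deviation(lower, upper, rng=None):
    """Draw a deviation in [lower, upper], most likely near `lower`.

    triangular(low, high, mode) with mode == low yields a density that
    falls off linearly from the lower bound to the upper bound.
    """
    rng = rng or random.Random()
    return rng.triangular(lower, upper, lower)

def transcode_weighted(value, lower, upper, rng=None):
    """Pick a side with equal probability, then a weighted deviation."""
    rng = rng or random.Random()
    sign = 1 if rng.random() < 0.5 else -1
    return value + sign * weighted_deviation(lower, upper, rng)

transcoded = transcode_weighted(1.51, 0.03, 0.10)
```

For such a distribution the expected deviation is (2 * lower + upper) / 3, which is closer to the lower bound than the midpoint obtained without weighting.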

[0054] It should be emphasized once again that all numerical examples shown herein serve only to illustrate special cases and are in particular not actual measurement data.

Claims


1. Computer-implemented method for anonymizing personal or health-related input datasets (DSj), comprising the steps: - Providing at least one input data set (DSj) comprising data entries (DEi), in particular medical measurements, wherein the data entries (DEi) are numeric data entries, - Transcoding at least one data entry (DEi) of the input data record (DSj), - Outputting an output data record (tDSj) that includes the transcoded data entry (tDEi), characterized in that the transcoding of at least one data entry (DEi) is carried out by the following steps: - Determining a positive upper limit (pOG) and a negative upper limit (nOG) of the deviation of the data entry (DEi) to be transcoded, where the positive upper limit (pOG) is above the data entry (DEi) and the negative upper limit (nOG) is below the data entry (DEi), - Determining a positive lower bound (pUG) and negative lower bound (nUG) of the deviation of the data entry (DEi) to be transcoded, where the positive lower bound (pUG) is above the data entry (DEi) and the negative lower bound (nUG) is below the data entry (DEi), - where the positive upper limit (pOG) and the positive lower limit (pUG) form an upper range of values (oWB) and the negative upper limit (nOG) and the negative lower limit (nUG) form a lower range of values (uWB); - Randomly selecting a random value in the upper or lower value range (oWB, uWB) to obtain the transcoded data entry (tDEi).

2. The method of claim 1, comprising the steps of: - Receiving or determining a characteristic to be examined, in particular a health condition such as a disease, - Determining the positive upper bound (pOG) and negative upper bound (nOG) depending on the property to be investigated, in particular by querying a database.

3. A method according to any of the preceding claims, wherein at least one further data record (DSj) is provided with at least one further data entry (DEi) of the same category as the data entry (DEi) to be transcoded, wherein the positive lower bound (pUG) and negative lower bound (nUG) are determined depending on the at least one further data entry (DEi), in particular as at least half the distance to the nearest further data entry (DEi).

4. A method according to one of the preceding claims, wherein the positive lower bound (pUG) and negative lower bound (nUG) are determined as a function of a statistical method, preferably selected from one of the following statistical methods: a calculation of intra- and inter-subject variability, in particular by means of Analysis of Variance (ANOVA), a data standardization, in particular by means of z-transformation, a cluster analysis, a discriminant function analysis (DFA), a definition of the threshold value by means of Receiver Operating Characteristic (ROC) curves, or a validation by means of further data sets.

5. Method according to any of the preceding claims, wherein the positive lower limit (pUG) and negative lower limit (nUG) assume a predetermined minimum distance to the positive upper limit (pOG) and negative upper limit (nOG).

6. Method according to one of the preceding claims, wherein the random value is selected with a weighting when randomly selecting, wherein the weighting preferably has the maximum value at the positive and / or negative lower limit.

7. Method according to any of the preceding claims, comprising the step of: - Random selection of the upper range of values (oWB) or lower range of values (uWB) and random selection of a random value in the randomly selected range of values to obtain the transcoded data entry (tDEi), wherein the random selection of the upper range of values (oWB) or lower range of values (uWB) and the random selection of a random value in the randomly selected range of values take place independently of each other and are each carried out using a random number generator with different starting values.

8. A method according to any of the preceding claims, further comprising the following step: Mapping a) the transcoded data entry (tDEi), b) the data entry to be transcoded (DEi), or c) the positive upper bound (pOG), negative upper bound (nOG), positive lower bound (pUG) and negative lower bound (nUG) to a new range of values using a predefined function.

9. A method according to any of the preceding claims, comprising the step of completely transcoding an input data set (DSj), wherein in the complete transcoding of the input data set (DSj) all numeric data entries (DEi) of the input data set (DSj) are transcoded.

10. A method according to any of the preceding claims, comprising the step of transcoding at least one data entry (DEi) of a plurality of input data records (DSj), wherein the method preferably further comprises the step of completely transcoding a plurality of input data records (DSj), wherein the output data records (tDSj) are preferably stored in a database.

11. Method according to one of the preceding claims, comprising the step of providing the output data set (tDSj) to a third party, preferably further comprising the step of computer-aided processing of at least a part of the output data set (tDSj) by the third party, wherein the part comprises the transcoded data entry (tDEi).

12. The method of claim 11 in combination with claim 2, comprising the step of computer-aided investigation of the property by means of the output data set (tDSj) by the third party.

13. Device for data processing, comprising means for carrying out the steps of the method according to any one of claims 1 to 10.

14. Computer program comprising instructions which, when the program is executed by a computer, cause it to perform the steps of the method according to any one of claims 1 to 10.

15. Computer-readable data carrier on which the computer program according to claim 14 is stored.