Method for encrypting genetic data of a subject

The method addresses genomic data processing challenges by integrating a digital DNA tag and double encryption to ensure secure, traceable, and compliant genomic data processing, enhancing cybersecurity and facilitating data access.

WO2026132269A1 · PCT designated stage · Publication Date: 2026-06-25 · GENARO

Patent Information

Authority / Receiving Office: WO
Patent Type: Application
Current Assignee / Owner: GENARO
Filing Date: 2025-12-18
Publication Date: 2026-06-25

AI Technical Summary

Technical Problem

Existing genomic data processing systems lack centralized traceability and cybersecurity measures, leading to potential breaches in confidentiality, integrity, and availability, and difficulties in managing identity verification and data sharing, especially in pre-analytical, analytical, and post-analytical phases.

Method used

A computer-implemented method involving the generation of a digital DNA tag encoding a unique proprietary key and public encryption key, which is used to create a synthetic exogenous DNA tag integrated with biological samples, ensuring traceability and double encryption of genomic data, and a centralized database system for secure storage and access control.

Benefits of technology

Ensures secure, traceable, and compliant processing of genomic data, facilitating access while preventing unauthorized decryption, reducing the risk of data breaches and unwanted disclosures, and streamlining data sharing for research.

✦ Generated by Eureka AI based on patent content.


Abstract

The invention relates to a computer-implemented method for processing genomic data, comprising the following steps: a) receiving first metadata (MDD1) comprising a request for genetic analysis of the genome of a subject, the genome belonging to an owner, wherein the first metadata (MDD1) comprises data associated with the subject and at least one area of the genome to be analysed, the owner's consent for the analysis having been received; b) generating a digital DNA tag encoding (i) a unique owner key (CU), (ii) a public encryption key (CCP) specific to the owner, and (iii) the first metadata (MDD1) associated with the subject, the first metadata and the unique key (CU) being stored in a first database, and the public encryption key (CCP) being associated with a private encryption key.

Description

[0001] DESCRIPTION

[0002] Method for encrypting a subject's genetic data

[0003] TECHNICAL FIELD

[0004] The invention relates to a computer-implemented method for processing genomic data from a biological sample.

[0005] STATE OF THE ART

[0006] For nearly 20 years, the world has been in an era of accelerated biological innovation. The rise of omics technologies, which simultaneously measure thousands of molecules in a complex biological sample, is at the core of the study of biological systems. These biological signals, including genomic data, are converted into digital data which, in order to be processed, must be stored, archived, and linked to medical data.

[0007] After numerous molecular biology reactions (pre-analytical and analytical steps), this digital data must be archived and aggregated with medical data. Then, the entire dataset must be encrypted so that the data can be processed, re-analyzed, and exchanged (post-analytical steps).

[0008] The data must be pseudonymized (a reversible operation) or anonymized (an irreversible operation) when shared. Each step in the processing of the sample and of the sequencing data requires quality procedures aimed at mitigating handling errors with regard to traceability and identity vigilance.

[0009] Managing data traceability is paramount, starting from the sampling stage and of course throughout all subsequent stages.

[0010] In the event of a failure, the consequences can be extremely serious for patients: delays in care, diagnostic errors, therapeutic errors, erroneous exchanges of information between professionals, recording of health data in a file that does not belong to the user concerned.

[0011] There are standards or recommendations (e.g., ISO 15189 standard, National Cancer Institute: "Next generation sequencing of a gene panel for somatic genetic analysis / Method validation") which require verification of the proper conduct of medical biology analyses (quality controls, automation of analyses, etc.), but no absolute control of traceability and, therefore, of identity vigilance.

[0012] Furthermore, the various pre-analytical, analytical and post-analytical stages are considered independent of each other, making it difficult to control them centrally.

[0013] Furthermore, when converting biological data into digital data, traceability and identity verification are essential. It is crucial to ensure that this digital data accurately reflects the individual to whom the biological sample belongs. Moreover, in post-analytical stages, cybersecurity methods do not currently consider traceability from the point of collection. This traceability is essential for the trust placed in these genomic and medical databases, which underpins their scientific and economic value.

[0014] Indeed, attacks targeting genomic data include attacks against its confidentiality, integrity, and availability. They can lead to its theft, but also its deletion or falsification.

[0015] The serious breaches of confidentiality they cause can harm individuals by making them vulnerable to blackmail, discrimination based on phenotype, risk of disease, general health, appearance, and physical abilities. This is in addition to the social problems that arise, for example, from the unwanted disclosure of hidden genetic links or a potential, as yet undeclared, genetic disease.

[0016] Unwanted disclosures following next-generation sequencing (NGS) analyses in routine and "recreational" genetic testing are common and inherent to the technology. Such analyses scan genomic regions not prescribed and therefore not consented to by the patient, in violation of the General Data Protection Regulation (GDPR).

[0017] Thus, human genomic data, its traceability, and its protection are at the heart of a dichotomy: on the one hand, an imperative need for traceability, identity verification, and security; on the other, a real need to facilitate access to this data to streamline its sharing for research purposes, as recommended by the Scientific Council of the World Health Organization in its report of July 12, 2022.

STATEMENT OF THE INVENTION

[0018] The invention makes it possible to respond infallibly to all the challenges raised by the nature of human genomic data in the pre-analytical, analytical and post-analytical phases of their processing and use, in terms of traceability, identity vigilance, access on demand, cybersecurity and management of their confidentiality.

[0019] The invention notably proposes a centralized solution for the processing of genomic data by providing traceability of this data.

[0020] To this end, the invention proposes a computer-implemented method for processing genomic data, comprising the following steps:
a) Receiving first metadata (MDD1) including a request for genetic analysis of a subject's genome, the genome belonging to an owner, the first metadata including data associated with the subject and at least one region of the genome to be analyzed, the owner's consent for the analysis having been obtained;
b) Generating a digital DNA tag encoding (i) a unique owner key (CU), (ii) a public encryption key (CCP) specific to the owner, and (iii) the first metadata associated with the subject, the first metadata and the unique key being stored in a first database, and the public encryption key being associated with a private encryption key;
c) Receiving a genomic data file resulting from the sequencing of a sample comprising a biological sample from the owner mixed with a synthetic exogenous DNA tag associated with the owner, the synthetic exogenous DNA tag resulting from a synthesis of the digital DNA tag generated in step b);
d) Processing the genomic data file so as to recover the digital DNA tag;
e) Decoding the digital DNA tag to identify a decoded unique owner key;
f) Comparing the decoded unique key with the owner's unique key; if the decoded unique key matches the owner's unique key, the method further comprises the steps of:
g) Generating an intermediate file comprising all the subject's genomic data;
h) Double-encrypting the intermediate file using a third-party private encryption key and the public encryption key, the doubly encrypted intermediate file thus being impossible to decrypt without the owner's authorization;
i) Storing the doubly encrypted intermediate file and the unique owner key in a second database linked to the first database.

[0021] The invention is advantageously complemented by the following features, taken alone or in any technically feasible combination thereof:

[0022] - The process includes a step of sending the digital DNA tag and the unique owner key to a laboratory terminal to prepare subject-specific sampling material, the laboratory generating the synthetic exogenous DNA tag and introducing it into the material, the unique owner key being affixed to the sampling material so as to be readable.

[0023] - The process includes, before sampling, receiving the unique key issued from a terminal in a sampling laboratory and retrieving the subject's identity from the first database and transmitting the subject's identity to the sampling laboratory terminal in order to verify the subject's identity before sampling to ensure that the sampling material corresponds to the subject.

[0024] - The process includes, after step g), extracting the genomic data corresponding to the area(s) specified in the request of the first metadata to obtain a first result file, and transmitting the first result file to the requester in accordance with the analysis request.

[0025] - The process includes: j) extracting from the first database the unique owner key (CU) and the subject identification data; k) extracting from the second database the intermediate file associated with the extracted unique owner key and corresponding to the area(s) of the request; l) receiving from an owner terminal the intermediate file decrypted a first time using the owner's private key, the reception corresponding to an authorization by the owner to access the intermediate file; m) decrypting the intermediate file a second time using the private key of the trusted third party.

[0026] - The process includes, before step j), receiving second metadata corresponding to a request for reanalysis of a subject's genome, the second metadata including sequences to be analyzed, and, after step m), extracting from the file decrypted in step m) second genomic data of interest corresponding to the reanalysis request to obtain a second result file; the process includes transmitting the second result file to the requester, and in step l) the owner authorizes the reanalysis request.

[0027] - The process includes, before step k), receiving third metadata corresponding to a request to create at least one cohort for research, the third metadata including the sequences desired for the cohort; and, after step m), extracting from the intermediate file a third result file including only the sequences desired for the cohort, and storing in a third database the third result file and anonymized data corresponding to the identification data of the subject from whom the genomic data originate; in step l) the owner authorizes participation in the cohort.

[0028] - The process includes storing the subject's intermediate file in a fourth database.

[0029] - The DNA tag encodes the owner's own public encryption key, the first metadata associated with the subject, and the unique owner key in the form of a binary code based on a combination of the 4 nucleotide bases A, T, G, C.

[0030] - Metadata includes medical information about the subject and / or information relating to the identity of the subject and / or the conditions of sample collection and / or the nature of the sample.

[0031] - The metadata includes at least one position of interest corresponding to the analysis / reanalysis request.

[0032] - The genomic data file, the intermediate file, and the results file are all text files.

The invention offers numerous advantages.

[0033] The invention, while respecting the requirements of standardized processes (ISO 15189:2022) and regulations in force (GDPR for Europe), ensures double security - physical and digital - of the biological sample and the data from sequencing on all processes: pre-analytical, analytical and post-analytical.

[0034] In the pre-analytical phase:

[0035] The synthetic exogenous DNA tag is uniquely encoded for the sample. Once produced as a tag, it is added to the sample as close as possible to the collection site to be sequenced simultaneously with the sample's genomic DNA. This provides physical security. Thanks to the metadata it contains, it guarantees sample traceability and facilitates patient identification.

[0036] - In addition, the metadata encoded in the synthetic exogenous DNA tag is encrypted and then inserted and catalogued in a secure database, BDD A. This will allow the certification of identity and the securing of genomic data during the post-analytical phase.

[0037] In the analytical phase:

[0038] - The synthetic exogenous DNA tag is automatically authenticated and discriminated from genomic DNA during the sequencing phase.

[0039] - Thanks to its unique nature, traceability is ensured in this phase, in direct continuity with the pre-analytical phase, even if a human error occurred during the previous phases.

[0040] In the post-analytical phase:

[0041] - The metadata encoded in the synthetic exogenous DNA tag allows for the automation of bioinformatic analyses, the encryption of all digital data from sequencing, and related data. It therefore enables, iteratively:

[0042] - A key traceability process, the delivery of a reliable result that strictly adheres to the agreed-upon prescription, and facilitated subsequent access, with a new prescription, without requiring new sequencing for analysis of other regions of interest in the genome.

- Full preservation of the data owner's right to withdraw consent. Consent is obtained through an existing integrated third-party application, for example, the AMELI™ service for France (e-prescription), or by scanning the paper version. This consent is recorded in the first database, allowing for subsequent withdrawal in compliance with current legislation.

[0043] - The subsequent anonymization of data and therefore the facilitation of its sharing for research. All the data from high-throughput sequencing are structured, doubly encrypted (by the owner and a trusted third party, for example) and stored in a second database. This database can later be used for, for example, clinical studies based on genomic data and related data (biological, clinical, real-life, etc.). Such data are transferred only in anonymized form and only with the consent of the owner, who performs the first decryption of the portions of the genome relevant to the study. The trusted third party then verifies the validity of the study, performs the second decryption, and carries out the necessary anonymization in collaboration with those responsible for the study.

[0044] Furthermore, the invention simplifies the high-throughput sequencing process. It saves costs and time for genetic reanalysis by avoiding the need to repeat the high-throughput sequencing phase and is suitable for large volumes of genetic analysis requests.

[0045] PRESENTATION OF THE FIGURES

[0046] Other features, purposes and advantages of the invention will become apparent from the following description, which is purely illustrative and not limiting, and which should be read in conjunction with the accompanying drawings on which:

[0047] - Figure 1 illustrates an implementation architecture of the invention according to one embodiment;

[0048] - Figure 2 illustrates steps in a genomic data processing method according to one embodiment.

[0049] Throughout the figures, similar elements bear identical references.

DETAILED DESCRIPTION

[0050] Architecture

[0051] Figure 1 illustrates an implementation architecture of the invention according to one embodiment.

[0052] An information system server (SI server) of a genomic data processing entity is connected to a first database BDD A, a second database BDD B, a third database BDD C, and a fourth database BDD D. The databases BDD A, BDD B, BDD C, BDD D are configured to store data during a genomic data processing procedure as described in this presentation.

[0053] Each database BDD A, BDD B, BDD C, BDD D is connected to the SI server via a local network or an Internet network.

[0054] Databases can be merged together by delimiting dedicated storage spaces within an overarching database.

[0055] Several terminals (TD, TP, TL, TL', TL") are connected to the SI server. Each TD, TP, TL, TL', TL" terminal includes a user interface that allows a user to interact with the SI server during a genomic data processing procedure as described in this document. A TD, TP, TL, TL', TL" terminal is a computer, tablet, or smartphone. Each terminal, like each database, is connected to the SI server via a local network or the Internet, wired or wireless.

[0056] The SI server includes one or more microprocessors, distributed or not, configured to implement the steps of the genomic data processing procedure described below.

[0057] Genomic data analysis

[0058] A requester D, who is, for example, a prescriber such as a doctor, pharmacist, veterinarian, dentist, genetic counselor, or laboratory, wishes to request a genetic analysis of a subject's genome. Requester D obtains consent from the subject or from the owner, if applicable (step E1), either through an existing integrated third-party application, such as the AMELI™ service for France (e-prescription), or by scanning the paper version. This consent is recorded in the first database, BDD A, allowing for subsequent withdrawal in accordance with applicable regulations.

[0059] The subject can be a human, an animal, a bacterium, a yeast, or a plant.

[0060] The genome has an owner P. When the subject is a human, the owner P is the subject itself if it has legal capacity, or otherwise the person who can exercise that capacity on its behalf. When the subject is not a human, the owner P is the one who possesses the subject.

[0061] Requester D, using a TD terminal with dedicated software, creates the analysis request by entering several pieces of information about the subject and the sequences to be analyzed. This includes specifying the region(s) of the subject's genome to be analyzed.

[0062] To this end, the requester D (via a TD terminal) generates first metadata MDD1 including information on the analysis request and data associated with the subject, including their identity and possibly location data such as GPS (Global Positioning System) coordinates (step E2).

[0063] This metadata can therefore be, for example, any information relating to the identity of the subject (e.g., name, barcode, database identification number, etc.); the conditions of sample collection (e.g., date and place); the nature of the sample (e.g., a blood sample taken from a patient with a specified condition) or even, in the case of a patient, the patient's medical record.

[0064] Simultaneously with the analysis request, owner P is required to generate a public key (CCP) and an associated private key (CCPRIV). This public key / private key pair (CCP / CCPRIV) is stored by owner P in a secure container, such as KeePass™. The private key (CCPRIV) is unique, associated with the subject, confidential, and possessed only by owner P. This public / private key pair is generated using proprietary software. This generation occurs at the time of consent. Owner P performs this operation using a terminal (TP) (which may be that of the requester D) (step E3).

[0065] The first MDD1 metadata and the CCP public key are issued by the requester D and the owner P and received by the SI server (step E4).

[0066] Upon receiving the metadata and the CCP public key, the SI server generates a unique key (CU) and associates it with the metadata (step E5). The unique key (CU) is a unique identifier, like the primary key of a database table. It is the thread that links the sample taken from the owner and the genomic data from the sequencing to the prescription and the subject. It is specific to the species in question; for example, it can be composed of two elements: the first referencing the species, and the second uniquely identifying the sample from that species. This unique key (CU), along with the associated MDD1 metadata, is stored in the first database, BDD A (step E6). In particular, the first database, BDD A, allows for the mapping between the subject's unique key (CU) and the subject's identity and plays a crucial role in the traceability of the subject's data.
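The two-element structure of the unique key and its role as a lookup key in BDD A can be sketched as follows. This is a minimal illustration, not the method of the application: the `species-sample` layout with a hyphen separator, the use of a random UUID for the sample element, the `FirstDatabase` class, and the `"identity"` metadata field are all assumptions introduced for the example.

```python
import uuid
from typing import Dict, Tuple


def make_unique_key(species_code: str) -> str:
    """Build a CU from a species element and a per-sample element (assumed layout)."""
    if not species_code or "-" in species_code:
        raise ValueError("species code must be non-empty and contain no '-'")
    # uuid4().hex is a 32-character random hexadecimal string.
    return f"{species_code}-{uuid.uuid4().hex}"


def split_unique_key(cu: str) -> Tuple[str, str]:
    """Return the (species element, sample element) of a CU."""
    species_code, sample_id = cu.split("-", 1)
    return species_code, sample_id


class FirstDatabase:
    """In-memory stand-in for BDD A: maps each CU to its MDD1 metadata."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def register(self, species_code: str, metadata: dict) -> str:
        """Generate a CU (step E5) and store it with its metadata (step E6)."""
        cu = make_unique_key(species_code)
        while cu in self._rows:  # regenerate on the (very unlikely) collision
            cu = make_unique_key(species_code)
        self._rows[cu] = dict(metadata)
        return cu

    def identity_for(self, cu: str) -> str:
        """Map a CU back to the subject's identity; raises KeyError if unknown."""
        return self._rows[cu]["identity"]
```

For example, `FirstDatabase().register("HSA", {"identity": "subject-001"})` returns a key beginning with `HSA-`, and `identity_for` on that key returns `"subject-001"`.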

[0067] A digital DNA tag (E-DNA) is then generated by the SI server using ad hoc software. This digital tag encodes i) the unique key (CU), ii) the owner's own public encryption key (CCP), iii) the initial metadata associated with the owner and the prescription (step E7).

[0068] The digital DNA tag includes data i), ii), and iii) in text format converted to a {A, T, C, G} format using one or more encoding configurations and a dictionary. Preferably, the data is encoded in the digital DNA tag using the four nucleotide bases, in the same way as the binary encoding used in computing, for example '00'='A'; '01'='T'; '11'='C'; '10'='G'.
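A 2-bit-per-base encoding of this kind can be sketched as below. The sketch assumes a one-to-one mapping between the four 2-bit codes and the four bases (here 00→A, 01→T, 10→G, 11→C) and UTF-8 text as input; the function names are hypothetical, and a real tag would likely add constraints (error correction, avoidance of homopolymer runs) that are not covered by the source.

```python
# Illustrative 2-bit mapping; it must be one-to-one so that decoding is unambiguous.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_tag(text: str) -> str:
    """Encode UTF-8 text as a nucleotide string, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_tag(bases: str) -> str:
    """Decode a nucleotide string produced by encode_tag back to text."""
    if len(bases) % 4 != 0:
        raise ValueError("expected a multiple of 4 bases (4 bases per byte)")
    try:
        bits = "".join(BASE_TO_BITS[base] for base in bases)
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

With this mapping, the single character `"A"` (byte `01000001`) encodes to `TAAT`, and `decode_tag(encode_tag(s)) == s` for any text `s`.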

[0069] This digital DNA tag, containing at least the unique key CU, is then sent (step E8) to a first laboratory L or a supplier who will prepare sampling equipment for collecting a sample from the subject. To this end, the first laboratory L will create a synthetic exogenous DNA tag associated with the owner P, which is the synthesized version of the digital DNA tag generated by the SI server (step E9). This synthetic exogenous DNA tag is preferably soluble (can be dissolved) in water, but not in the presence of alcohol and salt. Thus, the addition of ethanol or isopropyl alcohol (rubbing alcohol) will cause the DNA of the synthetic exogenous DNA tag to clump together, forming a visible white precipitate. The precipitate can then be collected as a pellet by centrifugation, either alone or simultaneously with the DNA of the subject of interest.

[0070] The synthetic exogenous DNA tag is, for example, synthesized using a known type of DNA synthesizer. The synthetic exogenous DNA tag is custom-made.

[0071] The synthetic exogenous DNA tag is then added to a biological sample collection device, such as a collection tube (or more generally, any collection device), and is therefore intended to be mixed with the biological sample that will be collected from the subject (step E10). The collection device is also accompanied by the unique CU key in the form of a barcode label affixed to the device.

[0072] The sampling material is then sent to a second laboratory responsible for collecting the biological sample from the subject if it is not the same laboratory that prepared the material.

[0073] The purpose of this synthetic exogenous DNA tag is to permanently accompany the biological sample physically and the resulting data digitally. Generally, all information pertaining to the subject can be encoded within the synthetic exogenous DNA tag to ensure the confidentiality of personal / sensitive information. Consequently, only a person in possession of the sample and capable of sequencing the DNA can access this information once it has been decoded by specialized software, unlike information typically printed on a label.

[0074] Prior to sample collection, the second laboratory L' verifies the subject's identity and ensures that the subject matches the sampling material. The second laboratory L' that performs the sample collection may be different from the one that prepares the sampling material.

[0075] To do this, the second laboratory, L', using a terminal TL', acquires the unique key CU, for example, by scanning the barcode that retrieves the unique key (step E11). Alternatively, the owner can perform this check using their terminal TP. The unique key (CU) is sent to the SI server, which returns the identity of the corresponding subject (step E13). If the match is correct, the laboratory collects the biological sample, which is mixed with the synthetic exogenous DNA tag (step E14) in the collection material.

[0076] If there is no match, the second laboratory L' replaces the sampling material with the correct one.

[0077] The collection material including the biological sample mixed with the synthetic exogenous DNA tag is sent to a third laboratory L” for sequencing (pre-analytical and sequencing stages) of this sample if it is not the same laboratory that performed the collection.

[0078] The third (analytical) laboratory L” first identifies the sample, notably by the barcode (i.e., the unique key CU) on the sampling equipment (step E15), and initiates the high-throughput sequencing process (step E16). Here again, the third laboratory L” may be the same as or different from either of the first or second laboratories L, L'.

[0079] This sample is specifically sequenced using a DNA sequencer configured to provide a genomic data file in text format (e.g., FASTQ data). The FASTQ format is a text-based format that allows for the storage of both a biological sequence (usually a nucleotide sequence) and the corresponding quality scores. The sequence letter and the quality score are each encoded with a single ASCII character for brevity.
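The FASTQ layout described above can be illustrated with a small reader. This sketch assumes the common four-line record layout (header starting with `@`, sequence, separator starting with `+`, quality string) and a Phred+33 quality encoding; both are assumptions about the sequencer output rather than statements from the source, and `FastqRecord` and `parse_fastq` are hypothetical names.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(lines: Iterable[str], phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no wrapped sequences)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence = next(stripped)
            separator = next(stripped)
            quality = next(stripped)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one score as chr(score + offset).
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)
```

Because the function accepts any iterable of lines, it can be given an open text file directly, for instance `parse_fastq(open(path))`.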

[0080] The genomic data file is then transmitted to the SI server (step E17) to extract the synthetic exogenous DNA tag and a decoded unique key CU' (step E18). This decoded unique key CU' is compared to the unique key CU expected for this analysis (step E19), that of the subject.

[0081] The pre-analytical and sequencing steps require the creation of a plate plan (this specifies, for a 96-well plate for example, which sample is in which well). Once the unique key CU' is decoded (if there is no key and the plate plan provides one, an error will be reported), several checks will be performed:

[0082] 0) If two or more CU' keys are detected, a probable-contamination report is issued and checks may be carried out to understand the causes.

[0083] 1) The CU' found is checked against the CU expected from the plate plan; if they are not consistent, an inconsistency error is issued.

[0084] 2) The key is checked in the SI server: it must exist. If it does not exist, an error is returned (in that case it should not have matched the one provided in the plate plan).

[0085] 3) If the key exists and corresponds to the one provided in the plate plan, the necessary identity vigilance check is carried out.

[0086] All the checks carried out allow for traceability, identity vigilance, and the assurance of delivering the right result to the right person.
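The sequence of checks listed above can be sketched as a single decision function for one well of the plate. The `TagCheck` names, the use of sets for the decoded and registered keys, and the exact ordering of the checks are assumptions made for illustration; the source only lists the checks.

```python
from enum import Enum
from typing import Set


class TagCheck(Enum):
    MISSING_TAG = "no key decoded although the plate plan expects one"
    CONTAMINATION = "two or more keys decoded: probable contamination"
    PLATE_MISMATCH = "decoded key differs from the plate plan"
    UNKNOWN_KEY = "decoded key is not registered in the SI server"
    PROCEED = "key consistent: continue with the identity vigilance check"


def check_well(decoded_keys: Set[str], expected_key: str,
               registered_keys: Set[str]) -> TagCheck:
    """Apply the checks to the keys (CU') decoded from one well."""
    if not decoded_keys:
        return TagCheck.MISSING_TAG
    if len(decoded_keys) >= 2:           # check 0: several CU' in one sample
        return TagCheck.CONTAMINATION
    (decoded,) = decoded_keys
    if decoded != expected_key:          # check 1: consistency with the plate plan
        return TagCheck.PLATE_MISMATCH
    if decoded not in registered_keys:   # check 2: key must exist in the SI server
        return TagCheck.UNKNOWN_KEY
    return TagCheck.PROCEED              # check 3: identity vigilance can follow
```

Returning an enumerated status rather than raising immediately lets the caller log every outcome, which fits the traceability goal stated in the text.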

[0087] If the decoded unique key CU' matches the expected unique key CU, then decoding of the genomic file is permitted.

[0088] The decoding (step E20) of the genomic file consists of aligning the genomic data file, indexing it against a reference genome (for example, human genome databases when the subject is human), and dividing it into blocks to generate an intermediate file containing at least all of the subject's genomic data and the unique key (CU). The intermediate file is a text file.

[0089] In one embodiment, from this intermediate file, the genomic data corresponding to the area(s) of the analysis request are extracted to obtain a results file (in the form of a BAM, for Binary Alignment Map) (step E21). The results file is then transmitted to the requester (step E22). This results file conforms to the analysis request and contains only the requested sequences. Thus, it is not possible to access unwanted sequences.
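Restricting a result to the requested regions can be sketched as an interval-overlap filter over aligned reads. The record layout (`AlignedRead`, `Region`), the 0-based half-open coordinates, and the function names are hypothetical; the source does not describe the internal structure of the intermediate file.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    sequence: str


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(read: AlignedRead, region: Region) -> bool:
    """True when the read shares at least one position with the region."""
    return (read.chrom == region.chrom
            and read.start < region.end
            and region.start < read.end)


def extract_regions(reads: Iterable[AlignedRead],
                    regions: Iterable[Region]) -> List[AlignedRead]:
    """Keep only the reads that overlap at least one requested region."""
    wanted = list(regions)
    return [read for read in reads if any(overlaps(read, r) for r in wanted)]
```

Note that this keeps whole reads, which may extend slightly beyond a region boundary; a stricter variant would also clip each read to the requested interval.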

[0090] In one embodiment, the intermediate file is doubly encrypted (step E23). Such double encryption is performed, for example, using a third-party private key specific to the server or via a trusted third party and the owner's public encryption key. In this way, the intermediate file cannot be decrypted without the owner's authorization. The advantage of double encryption (the first by a trusted third party, the second by the owner, so decryption is performed in reverse) is as follows:

[0091] 1) The owner's consent must be obtained for any new medical, scientific research, or recreational use (learning one's ancestry, etc.).

[0092] 2) The data decrypted by the owner remain encrypted by the trusted third party and must therefore be transmitted to it.

[0093] 3) Because this second layer of encryption makes the trusted third party's intervention mandatory to release the data consented to by the owner, the trusted third party can check the validity of the request, whether a new prescription, a scientific research project referenced and validated for the country of residence, or any other analysis, which must comply with the legislation in force (recreational tests, for example, are currently prohibited in France).

[0094] 4) A final validation can (and should) be considered, requesting confirmation from the owner via a third party in order to mitigate the risk of fraudulent use of the owner's terminal TP (despite the mandatory two-factor authentication warranted by the data concerned).
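The nesting of the two layers (trusted third party first, owner second, removal in reverse order) can be illustrated with the toy sketch below. It is an assumption-laden stand-in, not the scheme of the application: the asymmetric keys of the source (CCP / CCPRIV and the trusted third party's key) are replaced here by two symmetric secrets so that the example runs with the Python standard library alone, and the hash-based keystream is for illustration only and must not be used to protect real data. A real system would rely on a vetted cryptographic library.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Add one layer: 16-byte nonce || ciphertext || 32-byte HMAC tag."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Remove one layer; fails if the key is wrong or the data was altered."""
    if len(blob) < 48:
        raise ValueError("blob too short")
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


def double_encrypt(data: bytes, third_party_key: bytes, owner_key: bytes) -> bytes:
    """Inner layer by the trusted third party, outer layer for the owner."""
    return seal(owner_key, seal(third_party_key, data))


def double_decrypt(blob: bytes, owner_key: bytes, third_party_key: bytes) -> bytes:
    """Owner removes the outer layer first, then the third party the inner one."""
    return unseal(third_party_key, unseal(owner_key, blob))
```

Because each layer carries its own authentication tag, the trusted third party cannot remove its layer until the owner has removed theirs, which mirrors the consent ordering described in the list above.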

[0095] This doubly encrypted file is then stored in the second database, BDD B (step E24). The second database, BDD B, therefore contains the subject's complete genomic data file as well as the unique key (and the initial MDD1 metadata).

[0096] The aim is to securely store genomic data from an analysis for later reuse, while ensuring that the owner's consent is obtained.

[0097] In particular, this data can be reanalyzed or used to create cohorts for, for example, clinical studies based on genomic and related data (biological, clinical, real-world, etc.), which will be anonymized in order to be transferable. The owner's consent to participate in the cohort and authorization to transfer the necessary data are obtained at the time of the request. This temporary anonymized database is destroyed once the study's objective has been achieved. The study results are stored in a statistical database.

[0098] Reanalysis of genomic data based on a previously collected sample

[0099] Requester D wishes to request a genetic reanalysis of the genome of a subject for whom an analysis has already been performed. The consent of the owner P is required again if it has not already been obtained for this type of reanalysis.

[0100] Requester D, using the TD terminal, creates this reanalysis request by entering information about the sequences to be analyzed. The TD terminal generates the corresponding MDD2 metadata (step E25).

[0101] These MDD2 metadata are sent from the TD terminal and received by the SI server. These second MDD2 metadata include, in particular, the subject's identity.

[0102] Upon receipt of this MDD2 metadata, the SI server extracts from the first database BDD A the unique owner key (CU) corresponding to the subject for which the reanalysis request is made (step E26).

[0103] Next, the SI server extracts from the second database (BDD B), the intermediate file associated with the unique owner key extracted from the first database BDD A (step E27) corresponding to the area(s) of the reanalysis request.

[0104] The SI server then sends to the owner's terminal TP a request for authorization to access the intermediate file of the owner P (step E28).

[0105] Owner P, via their terminal TP, authorizes the use of the intermediate file extracted in step E27 and performs the first decryption using their private key CCPRIV (step E29). Authorization is obtained, for example, through a two-factor authentication interface, a smartphone application, SMS, email, or an internet link, etc.

[0106] This intermediate file, decrypted once by the owner, is then received by the SI server, which proceeds to the second decryption of the intermediate file using the private key that is either provided by the trusted third party or stored on the SI server (step E30).
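The order of the two decryptions in steps E29 and E30 can be illustrated with a minimal layering sketch. It assumes that the owner's layer is the outermost one, which is consistent with the owner decrypting first. The cipher below is a toy keyed XOR stream with an HMAC tag, built from the Python standard library, and is only a stand-in for the asymmetric schemes of the method (CCP/CCPRIV and the third-party key); the function names and key values are hypothetical, and the construction is not suitable for real use.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of a SHA-256 digest


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from `key` by hashing a counter (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def wrap(data: bytes, key: bytes) -> bytes:
    """Add one layer: XOR with the keystream, then prepend an HMAC tag."""
    body = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    return hmac.new(key, body, hashlib.sha256).digest() + body


def unwrap(blob: bytes, key: bytes) -> bytes:
    """Remove one layer; fail if the key does not match the outermost layer."""
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or wrong layer order")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))


# Hypothetical key material standing in for the real key pairs.
owner_key = b"owner secret"
third_party_key = b"third-party secret"

intermediate = b"chr1\t12345\tA\tG\n"
# Inner layer: third party. Outer layer: owner (assumed order).
stored = wrap(wrap(intermediate, third_party_key), owner_key)

once_decrypted = unwrap(stored, owner_key)            # step E29, on terminal TP
recovered = unwrap(once_decrypted, third_party_key)   # step E30, on the SI server
```

Calling `unwrap(stored, third_party_key)` raises `ValueError` in this sketch, which mirrors the point of the method: the second decryption cannot take place until the owner has removed the first layer.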

[0107] Once the intermediate file is decrypted, the SI server extracts the genomic data corresponding to the area(s) of the reanalysis request to obtain a second result file (step E31). The second result file is then transmitted to the requester D (step E32).
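The extraction of step E31 can be sketched as a filter over a text-format file. The tab-separated layout (chromosome, position, reference, alternate), the `Region` tuple and the function `extract_regions` are assumptions made for illustration; the description does not fix a file layout.

```python
from typing import Iterable, List, Tuple

# Hypothetical region format: (chromosome, start, end), bounds inclusive.
Region = Tuple[str, int, int]


def extract_regions(lines: Iterable[str], regions: List[Region]) -> List[str]:
    """Keep only the records whose position falls inside a requested region."""
    selected = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if any(c == chrom and start <= pos <= end for c, start, end in regions):
            selected.append(line.rstrip("\n"))
    return selected


# Decrypted intermediate file (made-up records).
intermediate_file = [
    "#chrom\tpos\tref\talt",
    "chr1\t150\tA\tG",
    "chr1\t900\tC\tT",
    "chr2\t150\tG\tA",
]
# Area requested in MDD2 (made-up value).
second_result = extract_regions(intermediate_file, [("chr1", 100, 200)])
```

Only the record at `chr1` position 150 is retained in `second_result`; the rest of the genome stays on the SI server and is not transmitted to the requester.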

[0108] Requester D need not be the requester of the initial analysis. Similarly, the data required for the reanalysis may be identical to or different from the data of the initial analysis.

[0109] Cohort creation

[0110] In a complementary or alternative manner, the analyzed data can be used to create cohorts for research.

[0111] The creation of cohorts is initiated by a requester D, who first obtains the consent of the owner P if it has not already been obtained.

[0112] Requester D, using the TD terminal, creates the cohort creation request by entering information about the desired sequences for the cohort. The TD terminal generates the corresponding MDD3 metadata (step E25).

[0113] Next, steps E27 to E31 are implemented in the same way as for the reanalysis.

[0114] Once a third result file has been extracted (step E31), the subject data are anonymized, and the third result file and the corresponding anonymized data are stored in a third database, BDD C, for the cohort.
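A minimal sketch of this storage step is given below, assuming that anonymization consists of dropping directly identifying fields and attaching a random cohort identifier for which no mapping back to the subject is kept. The field names, the list `bdd_c` and the functions `anonymize` and `add_to_cohort` are hypothetical; the description does not state which anonymization technique is used.

```python
import secrets
from typing import Dict, List

# Hypothetical names of directly identifying fields.
IDENTIFYING_FIELDS = {"name", "birth_date", "subject_id"}

# Hypothetical in-memory stand-in for BDD C.
bdd_c: List[Dict[str, object]] = []


def anonymize(subject_data: Dict[str, object]) -> Dict[str, object]:
    """Drop identifying fields and attach a random, unlinkable cohort ID."""
    kept = {k: v for k, v in subject_data.items() if k not in IDENTIFYING_FIELDS}
    kept["cohort_id"] = secrets.token_hex(8)  # 16 hex characters
    return kept


def add_to_cohort(third_result_file: List[str],
                  subject_data: Dict[str, object]) -> Dict[str, object]:
    """Store the third result file with the anonymized subject data."""
    entry = {"data": anonymize(subject_data), "result_file": third_result_file}
    bdd_c.append(entry)
    return entry


entry = add_to_cohort(
    ["chr1\t150\tA\tG"],
    {"name": "Jane Doe", "subject_id": "subject-001", "sample_type": "blood"},
)
```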

[0115] In parallel, if the subject has given their consent, the intermediate file containing the subject's complete genome is stored in a fourth database (BDD D). This database serves as a statistical database of the conventional type, which returns only statistical information computed over groups of records in response to user queries, and also provides data to improve the genomic mapping of the human genome (1000 Genomes, hg38, etc.) or of other species.
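The behaviour of a database that answers only with statistics over groups of records can be sketched as follows. The record layout, the function `carrier_frequency` and the minimum group size `MIN_GROUP_SIZE` are assumptions introduced for the example; the description does not specify a threshold.

```python
from typing import Dict, List, Optional, Set

# Hypothetical threshold: groups smaller than this are not answered.
MIN_GROUP_SIZE = 3


def carrier_frequency(records: List[Dict[str, Set[str]]],
                      variant: str) -> Optional[float]:
    """Return the fraction of records carrying `variant`, never the records."""
    if len(records) < MIN_GROUP_SIZE:
        return None  # group too small to answer with statistics only
    carriers = sum(1 for record in records if variant in record["variants"])
    return carriers / len(records)


# Made-up group of four records.
group = [
    {"variants": {"chr1:150A>G"}},
    {"variants": set()},
    {"variants": {"chr2:150G>A"}},
    {"variants": set()},
]
frequency = carrier_frequency(group, "chr1:150A>G")
```

For this made-up group, one record out of four carries the queried variant, so `frequency` is 0.25; no individual record is returned to the caller.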

Claims

1. A computer-implemented method for processing genomic data, comprising the following steps:
a) Receiving initial metadata (MDD1) including a request for genetic analysis of a subject's genome, the genome belonging to an owner, the initial metadata (MDD1) including data associated with the subject and at least one region of the genome to be analyzed, with consent from the owner for the analysis having been obtained;
b) Generating a digital DNA tag (E-DNA) encoding (i) a unique owner key (CU), (ii) a public encryption key (CCP) specific to the owner, (iii) the initial metadata (MDD1) associated with the subject, the initial metadata and the unique key (CU) being stored in a first database (BDD A); the public encryption key (CCP) being associated with a private encryption key (CCPRIV);
c) Receiving a genomic data file resulting from the sequencing of a sample comprising a biological sample from the owner mixed with a synthetic exogenous DNA tag associated with the owner, the synthetic exogenous DNA tag resulting from a synthesis of the digital DNA tag generated in step b);
d) Processing the genomic data file so as to recover the digital DNA tag;
e) Decoding the digital DNA tag to identify a decoded owner unique key (CLT);
f) Comparing the decoded unique key (CLT) to the owner unique key (CU) and, if the decoded unique key matches the owner unique key, the method comprises the steps of:
g) Generating an intermediate file comprising all the subject's genomic data;
h) Double-encrypting the intermediate file using a third-party private encryption key and the public encryption key; the intermediate file, being doubly encrypted, cannot be decrypted without the owner's permission;
i) Storing the doubly encrypted intermediate file and the unique owner key (CU) in a second database (BDD B) linked to the first database (BDD A).

2. A method according to claim 1, comprising a step of sending the digital DNA tag and the unique owner key (CU) to a terminal in a laboratory (L, L') to prepare a sampling material dedicated to the subject, the laboratory (L) generating the synthetic exogenous DNA tag and introducing it into the material, the unique owner key (CU) being affixed to the sampling material so as to be readable.

3. A method according to claim 2, comprising, prior to sampling, receiving the unique key (CU) from a terminal (TL) of a sampling laboratory, retrieving from the first database (BDD A) the identity of the subject, and transmitting the identity of the subject to the terminal of the sampling laboratory, so as to verify the identity of the subject before sampling in order to ensure that the sampling material corresponds to the subject.

4. A method according to any one of claims 1 to 3, comprising after step g) an extraction of the genomic data corresponding to the area(s) of the first metadata request (MDD1) to obtain a first result file and a transmission of the first result file to the requester (D) in accordance with the analysis request.

5. A method according to any one of claims 1 to 4, comprising j) Extracting from the first database (BDD A) the unique owner key (CU) and the subject identification data; k) Extracting from the second database (BDD B) the intermediate file corresponding to the area(s) of the request and associated with the extracted unique owner key; l) Receiving from an owner terminal (TP) the intermediate file decrypted once using the owner's (P) private key (CCPRIV), the reception corresponding to an authorization of access to the intermediate file by the owner (P); m) Decrypting the intermediate file a second time using the private key of the trusted third party.

6. A method according to claim 5, comprising, prior to step j), receiving second metadata (MDD2) corresponding to a request for reanalysis of a subject's genome, the second metadata (MDD2) comprising sequences to be analyzed, and, after step m), extracting from the file decrypted in step m) second genomic data of interest corresponding to the reanalysis request to obtain a second result file; the method comprising transmitting the second result file to the requester, wherein in step l) the owner (P) authorizes the reanalysis request.

7. A method according to claim 6, comprising, prior to step k), receiving third metadata (MDD3) corresponding to a request to create at least one cohort for research, the third metadata (MDD3) comprising sequences desired for the cohort; and, after step m), extracting from the intermediate file a third result file comprising only the sequences desired for the cohort, and storing in a third database (BDD C) the third result file and anonymized data corresponding to the identification data of the subject from whom the genomic data originate, wherein in step l) the owner (P) authorizes participation in the cohort.

8. A method according to claim 7, comprising storing the subject's intermediate file in a fourth database (BDD D).

9. A method according to any one of the preceding claims, wherein the digital DNA tag encodes the public encryption key specific to the owner, the first metadata associated with the subject and the unique owner key in the form of a binary code based on a combination of the 4 nucleotide bases A, T, G, C.

10. A method according to any one of the preceding claims, wherein the metadata (MDD1, MDD2, MDD3) includes medical information on the subject and / or information relating to the identity of the subject and / or the conditions of sample collection and / or the nature of the sample.

11. A method according to any one of the preceding claims, wherein the metadata includes at least one position of interest corresponding to the request for analysis / reanalysis.

12. A method according to any one of claims 1 to 11, wherein the genomic data file, the intermediate file and the result file are text format files.