A method and system for gene sequence alignment based on multi-party secure computation

By employing a multi-party secure computation method, the base sequence is encoded into integer and binary sequences, divided into complementary shares, and alignment scores are calculated. The core sequence is then determined for alignment, thus mitigating the risk of data leakage in gene sequence alignment and enhancing security.

CN115101131B · Active · Publication Date: 2026-06-23 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2022-05-23
Publication Date
2026-06-23

Abstract

The application provides a gene sequence alignment method and system based on multi-party secure calculation, comprising: a first participant and a second participant encode base sequence into integer sequence and binary sequence and distribute to each other; the first participant and the second participant respectively calculate two-by-two alignment first scores of their own held sequences under plaintext conditions, and send the first score split shares to each other through a secret sharing protocol; under the condition of holding two-party base sequence shares, the two parties jointly calculate second scores of two-by-two alignment of the sequences of the two parties, and send the second score split shares to each party holding respectively; the first participant and the second participant calculate the highest third score as an axis sequence according to the shares of the two-by-two alignment scores of all the sequences held; the first participant and the second participant respectively align their own held sequences with the axis sequence under the secret share condition. The application solves the problem of privacy leakage risk generally existing in the prior art plaintext sequence alignment.

Description

Technical Field

[0001] This invention relates to the field of gene sequence computation technology, and in particular to a gene sequence alignment method and system based on multi-party secure computation.

Background Technology

[0002] With the continuous decline in the cost of genome sequencing, the number of sequencing reads generated by high-throughput sequencers (such as the Illumina HiSeq series) has exploded, and human genome sequencing reads in particular are accumulating rapidly.

[0003] Currently, short-read alignment software (such as the Burrows-Wheeler Aligner, BWA) is typically used to align each sequencing read to a reference sequence, obtaining a pairwise alignment result for each read (including detailed information on matches, mismatches, insertions, and deletions between the read and the reference sequence). Then, based on the pairwise alignment results of all reads against the reference sequence, genomic variation results are obtained. When a new virus emerges, different regions may exhibit different variants. It is usually necessary to align and compare viral gene sequences from different regions to identify similarities and differences, facilitating vaccine development and research on related treatments. However, when both parties provide their respective gene sequences, there is a risk that one party's gene sequence information is leaked. Therefore, gene sequence alignment must be performed while ensuring information security.

Summary of the Invention

[0004] This invention provides a gene sequence alignment method and system based on multi-party secure computation, which addresses the shortcomings of existing technologies in gene alignment processes that pose a risk of gene sequence leakage, and enables gene sequence alignment while ensuring the security of gene sequence information.

[0005] This invention provides a gene sequence alignment method based on multi-party secure computation, comprising:

[0006] The first and second participants will encode the base sequence as an integer sequence and a binary sequence;

[0007] The first participant and the second participant each divide the encoded sequence into two complementary share sequences, send one share sequence to the other party, and keep the other share sequence for themselves.

[0008] The first participant and the second participant respectively calculate the first score of their respective sequences under plaintext conditions, and send the first score to each other in shares through a secret sharing protocol.

[0009] Given the base sequence shares held by the first and second participants, they jointly calculate the second score for the pairwise alignment of the sequences and then distribute the second score into shares to each party.

[0010] The first participant and the second participant calculate the highest third score based on the share of scores of all sequences held in pairs, and use the third score sequence as the pivot sequence.

[0011] The first participant and the second participant respectively align their holding sequence with the axis sequence under the secret share condition.

[0012] According to the gene sequence alignment method based on multi-party secure computation provided by the present invention, the first and second participating parties encode the base sequences they hold into integer sequences and binary sequences, specifically including:

[0013] When performing secret sharing of addition in the original base sequence, the base sequence is encoded into an integer sequence;

[0014] When determining whether bases are equal, the base sequence is encoded into a binary sequence.

[0015] According to the gene sequence alignment method based on multi-party secure computation provided by the present invention, the first participant and the second participant respectively divide the encoded sequence into two complementary share sequences, one share sequence is sent to the other party, and the other share sequence is held by themselves, specifically including:

[0016] The first and second participants send the encoded original sequence share to the other party through the additive secret sharing protocol, while each party holds another share. The original sequence can only be recovered by aggregating the shares of both parties.

[0017] In the aforementioned additive secret sharing process, a participant who needs to share a secret x randomly generates a number $x_a$ and calculates $x_b = x - x_a$; then $\{x_a, x_b\}$ constitutes a set of shares of x.
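The share-splitting step described above can be illustrated with a minimal sketch. The modulus `N = 2**32` and the function names `share` and `reconstruct` are assumptions made for illustration; the source does not fix a modulus or name any routine.

```python
import secrets

N = 2**32  # assumed modulus for Z_n; the source does not specify a value


def share(x: int, n: int = N) -> tuple[int, int]:
    """Split x into two additive shares (x_a, x_b) with x_a + x_b = x (mod n)."""
    x_a = secrets.randbelow(n)  # uniformly random first share
    x_b = (x - x_a) % n         # complementary second share
    return x_a, x_b


def reconstruct(x_a: int, x_b: int, n: int = N) -> int:
    """Recover the secret by adding both shares modulo n."""
    return (x_a + x_b) % n


if __name__ == "__main__":
    x_a, x_b = share(3)           # e.g. the integer code of one base
    print(reconstruct(x_a, x_b))  # prints 3
```

Each share on its own is a uniformly random element of Z_n, which is why a single party holding one share learns nothing about x.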

[0018] According to the present invention, a gene sequence alignment method based on multi-party secure computation is provided, wherein the first participant and the second participant respectively calculate the first score for aligning their respective share sequences in plaintext, and distribute the first score to both parties in shares through an additive secret sharing protocol, specifically including:

[0019] The Needleman-Wunsch algorithm is used to calculate the pairwise alignment scores of the holding sequences, and the first alignment score is obtained by calculating the score matrix.

[0020] The score matrices of the two sequences are calculated to satisfy:

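[0021] $$f(i,j)=\max\begin{cases} f(i-1,j-1)+s(i,j)\\ f(i-1,j)+\mathrm{indel}\\ f(i,j-1)+\mathrm{indel}\end{cases}\qquad s(i,j)=\begin{cases}\mathrm{match} & \text{if base } i \text{ of sequence 1 equals base } j \text{ of sequence 2}\\ \mathrm{mismatch} & \text{otherwise}\end{cases}$$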

[0022] Where i indexes the i-th base of sequence 1, j indexes the j-th base of sequence 2, indel represents the gap (insertion or deletion) penalty, match represents the match score, mismatch represents the mismatch penalty, and f(i,j) is the score in the i-th row and j-th column of the score matrix.
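As a plaintext illustration of the score matrix, the following sketch fills f(i,j) with the standard Needleman-Wunsch recurrence. The parameter values (`match=1`, `mismatch=-1`, `indel=-2`) and the function names are assumptions; the source does not state concrete scores.

```python
def nw_score_matrix(s1, s2, match=1, mismatch=-1, indel=-2):
    """Return the (len(s1)+1) x (len(s2)+1) Needleman-Wunsch score matrix."""
    rows, cols = len(s1) + 1, len(s2) + 1
    f = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against gaps only.
    for i in range(1, rows):
        f[i][0] = i * indel
    for j in range(1, cols):
        f[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            f[i][j] = max(
                f[i - 1][j - 1] + sub,  # align base i with base j
                f[i - 1][j] + indel,    # gap in sequence 2
                f[i][j - 1] + indel,    # gap in sequence 1
            )
    return f


def nw_score(s1, s2, **params):
    """Alignment score: the bottom-right cell of the score matrix."""
    return nw_score_matrix(s1, s2, **params)[-1][-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))  # prints 1 (three matches, one gap)
```

This is the computation each participant can run locally on its own sequences to obtain the first scores before splitting them into shares.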

[0023] According to the present invention, a gene sequence alignment method based on multi-party secure computation is provided, wherein the first participant and the second participant, under secure conditions, jointly calculate a second score for pairwise alignment of the two sequences by using their respective base sequence shares, and then distribute the second score in shares to each party. Specifically, this includes:

[0024] The first participant holds one sequence and the second participant holds another sequence, and the second score matrix for this pair is jointly calculated under additive secret sharing;

[0025] The joint calculation of the second score matrix under additive secret sharing includes: calculating the score matrix from the secret shares of the sequences held by each party, without disclosing the original sequences. The score matrix is held by both parties in secret-shared form, and its actual values can only be recovered by the two parties together.

[0026] According to the present invention, a gene sequence alignment method based on multi-party secure computation is provided, wherein the first participant and the second participant align all their sequences pairwise, calculate the highest third score, and use the third-score sequence as the axis sequence, specifically including:

[0027] With each party holding only shares of the other party's sequences, the alignment score of each sequence against every other sequence held by the two parties is calculated, these scores are accumulated to obtain the final (third) score of each sequence, the shares of the final scores are output to each party, and the sequence with the highest final score is taken as the pivot sequence.

[0028] According to the present invention, a gene sequence alignment method based on multi-party secure computation is provided, wherein the first participant and the second participant align their respective sequences with the core sequence under secret share conditions, specifically including:

[0029] Under the condition of secret sharing, the first participant and the second participant respectively obtain the share matrix of the score matrix through joint calculation, calculate the path matrix based on the share matrix of the score matrix, and send the share of the path matrix to each party for their respective holding;

[0030] The path matrix refers to the matrix formed by the paths from which the alignment sequence is obtained from the score matrix;

[0031] The final calculated core sequence is in secret sharing form, with shares held by the first participant and the second participant respectively. The first participant and the second participant align their own sequences with the core sequence, and finally output the shares of the aligned sequence.

[0032] This invention also provides a gene sequence alignment system based on multi-party secure computation, the system comprising:

[0033] The encoding module is used by the first and second participants to encode the base sequence they hold into integer and binary sequences, respectively.

[0034] The distribution module is used by the first participant and the second participant to divide the encoded sequence into two complementary share sequences, one share sequence is sent to the other party, and the other share sequence is held by themselves.

[0035] The self-alignment calculation module is used by the first participant and the second participant to calculate the first score of their respective share sequences under plaintext conditions, and to split the first score into shares and send them to both parties through an additive secret sharing protocol.

[0036] The joint alignment calculation module is used by the first participant and the second participant to jointly calculate the second score of the pairwise alignment of the two base sequence shares held by the first participant and the second participant under safe conditions, and then split the second score into shares and send them to each party respectively.

[0037] The pivot sequence determination module is used by the first participant and the second participant to align all the sequences they hold in pairs, calculate the highest third score, and take the third score sequence as the pivot sequence.

[0038] The update alignment module is used by the first participant and the second participant to align their respective held sequences with the axis sequence under the secret share condition.

[0039] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the gene sequence alignment method based on multi-party secure computation as described above.

[0040] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the gene sequence alignment method based on multi-party secure computation as described above.

[0041] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the gene sequence alignment method based on multi-party secure computation as described above.

[0042] This invention provides a gene sequence alignment method and system based on multi-party secure computation. The method involves a first participant and a second participant encoding their own base sequences and sharing their shares. Alignment scores are calculated for each participant's sequences, and further, under secret conditions, pairwise alignment scores are calculated for both participants' sequences. The sequence with the highest score is used as the pivot sequence, and each participant's own sequences are aligned with the pivot sequence. This avoids the privacy leaks commonly associated with plaintext sequence alignment, preventing direct access to the original sequences held by one participant by the other, thus enhancing the security of the gene alignment process.

Attached Figure Description

[0043] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0044] Figure 1 is a flowchart illustrating a gene sequence alignment method based on multi-party secure computation provided by the present invention.

[0045] Figure 2 is a schematic diagram of the module connections of a gene sequence alignment system based on multi-party secure computation provided by the present invention.

[0046] Figure 3 is a schematic diagram of the structure of the electronic device provided by the present invention.

[0047] Figure label:

[0048] 110: Encoding module; 120: Distribution module; 130: Self-alignment calculation module; 140: Joint alignment calculation module; 150: Axis sequence determination module; 160: Update alignment module; 310: Processor; 320: Communication interface; 330: Memory; 340: Communication bus.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0050] The following describes, with reference to Figure 1, a gene sequence alignment method based on multi-party secure computation according to the present invention, comprising:

[0051] S100, the first participant and the second participant will encode the base sequence into an integer sequence and a binary sequence;

[0052] S200, the first participant and the second participant respectively divide the encoded sequence into two complementary share sequences, send one share sequence to the other party, and keep the other share sequence themselves;

[0053] S300, the first participant and the second participant respectively calculate the first score of their respective paired sequences under plaintext conditions, and split the first score into shares and send them to each other through a secret sharing protocol;

[0054] S400, the first participant and the second participant, under the condition of holding the two base sequence shares, jointly calculate the second score of the pairwise alignment of the two sequences, and split the second score into shares and send them to each party for holding;

[0055] S500, the first participant and the second participant calculate the highest third score based on the share of scores of all sequences held in pairs, and take the third score sequence as the pivot sequence.

[0056] S600, the first participant and the second participant respectively align their own sequences with the axis sequence under the secret share conditions.

[0057] By calculating the alignment scores of the first and second participants, the core sequence is determined, and the existing sequence is aligned with the core sequence. This avoids the privacy leaks that are common in plaintext sequence alignment and prevents the original sequences held by each party from being directly obtained by the other party, thereby improving the security of the gene alignment process.

[0058] The first and second participants will hold base sequence encodings as integer and binary sequences, specifically including:

[0059] When performing additive secret sharing on the original base sequence, the base sequence is encoded into an integer sequence; the encoding rules for the six symbols "A, G, C, T, N, -" are: A→0, G→1, C→2, T→3, N→4, "-"→5;

[0060] When determining whether bases are equal, the base sequence is encoded into a binary sequence. The encoding rules for the six symbols "A, G, C, T, N, -" are: A→000, G→001, C→010, T→011, N→100, "-"→101.
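The two encodings can be sketched as simple lookups. The sketch assumes the binary codes are the 3-bit binary forms of the integer codes 0 to 5; the names `INT_CODE`, `encode_int`, `encode_bits`, and `decode_int` are hypothetical.

```python
# Integer codes for the six symbols, as listed in the text.
INT_CODE = {"A": 0, "G": 1, "C": 2, "T": 3, "N": 4, "-": 5}
SYMBOL = {code: base for base, code in INT_CODE.items()}


def encode_int(seq: str) -> list[int]:
    """Encode a base sequence as integers (used for additive secret sharing)."""
    return [INT_CODE[ch] for ch in seq.upper()]


def encode_bits(seq: str) -> list[str]:
    """Encode a base sequence as 3-bit strings (used for equality tests)."""
    return [format(INT_CODE[ch], "03b") for ch in seq.upper()]


def decode_int(codes: list[int]) -> str:
    """Map integer codes back to the base symbols."""
    return "".join(SYMBOL[c] for c in codes)


if __name__ == "__main__":
    print(encode_int("GATC-N"))  # prints [1, 0, 3, 2, 5, 4]
    print(encode_bits("ACT"))    # prints ['000', '010', '011']
```

Three bits suffice because there are only six symbols; a fixed width keeps every base the same length when bits are compared position by position.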

[0061] The first and second participants will each divide the encoded sequence into two complementary share sequences. One share sequence will be sent to the other party, and the other share sequence will be held by the participants themselves. Specifically, this includes:

[0062] The first and second participants send the encoded original sequence share to the other party through the additive secret sharing protocol, while each party holds another share. The original sequence can only be recovered by aggregating the shares of both parties.

[0063] In the aforementioned additive secret sharing process, a participant who needs to share a secret x randomly generates a number $x_a$ and calculates $x_b = x - x_a$; then $\{x_a, x_b\}$ constitutes a set of shares of x.

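[0064] Assume the first participant holds the share $\langle a\rangle_1$ of secret a and the share $\langle b\rangle_1$ of secret b, and the second participant holds the share $\langle a\rangle_2$ of secret a and the share $\langle b\rangle_2$ of secret b, where $a, b, \langle a\rangle_1, \langle a\rangle_2, \langle b\rangle_1, \langle b\rangle_2 \in Z_n$ satisfy $\langle a\rangle_1 + \langle a\rangle_2 = a$ and $\langle b\rangle_1 + \langle b\rangle_2 = b$.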
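One consequence of this share relation is that a sum of two secrets can be computed without any communication: each party adds the two shares it holds. The following sketch simulates both parties in one process; the modulus `N` and the names `share`, `add_shares`, and `demo_addition` are assumptions for illustration.

```python
import secrets

N = 2**32  # assumed modulus for Z_n


def share(x: int, n: int = N) -> tuple[int, int]:
    """Return shares (s1, s2) with s1 + s2 = x (mod n)."""
    s1 = secrets.randbelow(n)
    return s1, (x - s1) % n


def add_shares(a_share: int, b_share: int, n: int = N) -> int:
    """Local step run by ONE party: add its share of a to its share of b."""
    return (a_share + b_share) % n


def demo_addition(a: int, b: int) -> int:
    """Simulate both parties and reconstruct a + b from the result shares."""
    a1, a2 = share(a)
    b1, b2 = share(b)
    c1 = add_shares(a1, b1)  # computed by the first participant
    c2 = add_shares(a2, b2)  # computed by the second participant
    return (c1 + c2) % N     # only the combination reveals a + b


if __name__ == "__main__":
    print(demo_addition(7, 12))  # prints 19
```

Operations that are not linear, such as comparing two shared values or taking a maximum, need an interactive protocol between the parties; the source does not detail one, so none is sketched here.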

[0065] The first and second participants each calculate their respective pairwise aligned first scores under plaintext conditions, and then distribute these first scores as shares to both parties via an additive secret-sharing protocol. Specifically, this includes:

[0066] The Needleman-Wunsch algorithm is used to calculate the pairwise alignment scores of the holding sequences, and the first alignment score is obtained by calculating the score matrix.

[0067] The score matrices of the two sequences are calculated to satisfy:

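[0068] $$f(i,j)=\max\begin{cases} f(i-1,j-1)+s(i,j)\\ f(i-1,j)+\mathrm{indel}\\ f(i,j-1)+\mathrm{indel}\end{cases}\qquad s(i,j)=\begin{cases}\mathrm{match} & \text{if base } i \text{ of sequence 1 equals base } j \text{ of sequence 2}\\ \mathrm{mismatch} & \text{otherwise}\end{cases}$$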

[0069] Where i indexes the i-th base of sequence 1, j indexes the j-th base of sequence 2, indel represents the gap (insertion or deletion) penalty, match represents the match score, mismatch represents the mismatch penalty, and f(i,j) is the score in the i-th row and j-th column of the score matrix.

[0070] In the privacy-preserving Needleman-Wunsch algorithm, the first participant holds sequence S1 and the second participant holds sequence S2;

[0071] The first participant and the second participant each convert the sequences S1 and S2 that they hold into integers (e.g., A=1, G=2, C=3, T=4, N=5, -=6);

[0072] The first participant calculates the secret shares $\langle S1\rangle_1$ and $\langle S1\rangle_2$ of S1 through arithmetic secret sharing and sends $\langle S1\rangle_2$ to the second participant; the second participant calculates the secret shares $\langle S2\rangle_1$ and $\langle S2\rangle_2$ of S2 through arithmetic secret sharing and sends $\langle S2\rangle_1$ to the first participant;

[0073] Both parties jointly calculate the score matrix Sc and the path matrix P; the first participant obtains the secret share $\langle Sc\rangle_1$ of the score matrix and the secret share $\langle P\rangle_1$ of the path matrix, and the second participant obtains the secret share $\langle Sc\rangle_2$ of the score matrix and the secret share $\langle P\rangle_2$ of the path matrix;

[0074] Both parties jointly calculate the backtracking path P-back based on their respective path matrix shares and original sequence shares. The first participant obtains the secret share matrix $\langle P\text{-}back\rangle_1$ of the backtracking path, and the second participant obtains the secret share matrix $\langle P\text{-}back\rangle_2$ of the backtracking path;

[0075] Both parties jointly calculate the alignment of S1 and S2 based on the backtracking path; the first participant ultimately obtains $\langle S1'\rangle_1$ and $\langle S2'\rangle_1$, and the second participant obtains $\langle S1'\rangle_2$ and $\langle S2'\rangle_2$.

[0076] Wherein, S1 is the original sequence held by the first participant, S2 is the original sequence held by the second participant, and S1' and S2' are the aligned sequences; $\langle S1'\rangle_1$ is one secret share of the aligned sequence S1' and $\langle S1'\rangle_2$ is the other secret share of S1', so that S1' can be recovered from $\langle S1'\rangle_1$ and $\langle S1'\rangle_2$; Sc denotes the score matrix of dimension (len(S1)+1)×(len(S2)+1), P denotes the path matrix of dimension (len(S1)+1)×(len(S2)+1), and len($S_i$) denotes the length of sequence $S_i$; $\langle Sc\rangle_1$ and $\langle Sc\rangle_2$ are the two secret share matrices of the score matrix Sc, and $\langle P\rangle_1$ and $\langle P\rangle_2$ are the two secret share matrices of the path matrix P.
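To make the share-matrix notation concrete, the sketch below splits a small plaintext matrix into two share matrices and recombines them. This only illustrates the format in which each party holds its share of Sc or P; in the protocol described above the matrices are computed jointly and never exist in plaintext at either party. The modulus `N`, the signed decoding via `to_signed`, and all function names are assumptions.

```python
import secrets

N = 2**32  # assumed modulus for Z_n


def share_matrix(m, n=N):
    """Split an integer matrix element-wise into two additive share matrices."""
    m1 = [[secrets.randbelow(n) for _ in row] for row in m]
    m2 = [[(v - r) % n for v, r in zip(row, rand_row)]
          for row, rand_row in zip(m, m1)]
    return m1, m2


def to_signed(v, n=N):
    """Interpret a residue in [0, n) as a signed value (scores may be negative)."""
    return v - n if v >= n // 2 else v


def reconstruct_matrix(m1, m2, n=N):
    """Recombine two share matrices element-wise."""
    return [[to_signed((x + y) % n, n) for x, y in zip(r1, r2)]
            for r1, r2 in zip(m1, m2)]


if __name__ == "__main__":
    sc = [[0, -2, -4], [-2, 1, -1]]  # toy score matrix
    sc_1, sc_2 = share_matrix(sc)
    print(reconstruct_matrix(sc_1, sc_2) == sc)  # prints True
```

Both share matrices have the same (len(S1)+1)×(len(S2)+1) shape as the matrix they represent, so the dimensions themselves are visible to both parties even though the entries are not.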

[0077] The first and second participants will jointly calculate a second score for pairwise alignment of their respective base sequence shares under safe conditions, and then split the second score into shares and distribute them to their respective parties, specifically including:

[0078] The first participant holds one sequence and the second participant holds another sequence, and the second score matrix for this pair is jointly calculated under additive secret sharing;

[0079] The joint calculation of the second score matrix under additive secret sharing includes: calculating the score matrix from the secret shares of the sequences held by each party, without disclosing the original sequences. The score matrix is held by both parties in secret-shared form, and its actual values can only be recovered by the two parties together.

[0080] The first and second participants align all their sequences pairwise, calculate the highest third score, and use the third-score sequence as the pivot sequence, specifically including:

[0081] With each party holding only shares of the other party's sequences, the alignment score of each sequence against every other sequence held by the two parties is calculated, these scores are accumulated to obtain the final (third) score of each sequence, the shares of the final scores are output to each party, and the sequence with the highest final score is taken as the pivot sequence.

[0082] Selecting the highest-scoring sequence as the pivot sequence gives both parties a common reference, and the sequences of both parties are then aligned with the pivot sequence.
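The pivot selection rule (accumulate each sequence's pairwise scores, then take the maximum) can be sketched in plaintext as follows. The function name `select_pivot` and the input layout, a square table where `pair_scores[i][j]` is the alignment score of sequence i against sequence j, are assumptions. In the protocol described above the totals exist only as shares, and picking the maximum would require a secure comparison that is not shown here.

```python
def select_pivot(pair_scores):
    """Return (pivot_index, totals) given a square table of pairwise scores."""
    n = len(pair_scores)
    # Final score of sequence i: sum of its scores against all other sequences.
    totals = [sum(pair_scores[i][j] for j in range(n) if j != i)
              for i in range(n)]
    # The sequence with the highest accumulated score becomes the pivot;
    # on ties, the lowest index wins.
    pivot = max(range(n), key=totals.__getitem__)
    return pivot, totals


if __name__ == "__main__":
    scores = [
        [0, 5, 3],
        [5, 0, -1],
        [3, -1, 0],
    ]
    print(select_pivot(scores))  # prints (0, [8, 4, 2])
```

Because accumulation is a sum, each party can add its score shares locally; only the final maximum step needs interaction.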

[0083] The first and second participants, respectively, align their respective sequences with the axis sequence under the condition of a secret share, specifically including:

[0084] Under the condition of secret sharing, the first participant and the second participant respectively obtain the share matrix of the score matrix through joint calculation, calculate the path matrix based on the share matrix of the score matrix, and send the share of the path matrix to each party for their respective holding;

[0085] The path matrix refers to the matrix formed by the paths from which the alignment sequence is obtained from the score matrix;

[0086] The final calculated core sequence is in secret sharing form, with shares held by the first participant and the second participant respectively. The first participant and the second participant align their own sequences with the core sequence, and finally output the shares of the aligned sequence.
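The role of the path matrix can be seen in a plaintext sketch: while the score matrix is filled, the path matrix records which neighbouring cell produced each score, and backtracking from the bottom-right cell yields the aligned sequences. The scoring values, the direction constants `DIAG`, `UP`, `LEFT`, and the function name `nw_align` are assumptions; the protocol described above performs these steps on secret shares rather than on plaintext.

```python
DIAG, UP, LEFT = 0, 1, 2  # assumed encoding of the path matrix entries


def nw_align(s1, s2, match=1, mismatch=-1, indel=-2):
    """Return (aligned_s1, aligned_s2, score) using a score and a path matrix."""
    rows, cols = len(s1) + 1, len(s2) + 1
    f = [[0] * cols for _ in range(rows)]     # score matrix
    p = [[DIAG] * cols for _ in range(rows)]  # path matrix
    for i in range(1, rows):
        f[i][0], p[i][0] = i * indel, UP
    for j in range(1, cols):
        f[0][j], p[0][j] = j * indel, LEFT
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            candidates = [
                (f[i - 1][j - 1] + sub, DIAG),
                (f[i - 1][j] + indel, UP),
                (f[i][j - 1] + indel, LEFT),
            ]
            # Keep the best score and remember where it came from.
            f[i][j], p[i][j] = max(candidates, key=lambda c: c[0])
    # Backtrack from the bottom-right cell to the origin.
    a1, a2 = [], []
    i, j = len(s1), len(s2)
    while i > 0 or j > 0:
        move = p[i][j]
        if move == DIAG:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif move == UP:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2)), f[-1][-1]


if __name__ == "__main__":
    print(nw_align("ACGT", "AGT"))  # prints ('ACGT', 'A-GT', 1)
```

Aligning every held sequence against the same pivot in this way puts all sequences into a shared coordinate system, which is what allows the two parties' results to be compared afterwards.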

[0087] This invention provides a gene sequence alignment method based on multi-party secure computation. The method involves a first and second participant encoding their own base sequences and sharing the resulting shares. Alignment scores are calculated for each participant's own sequences, and further, under secret conditions, pairwise alignment scores are calculated for both participants' sequences. The sequence with the highest score is designated as the pivot sequence, and each participant's own sequences are aligned with this pivot sequence. This method avoids the privacy leaks commonly associated with plaintext sequence alignment and prevents the original sequences held by one participant from being directly obtained by the other, thus enhancing the security of the gene alignment process.

[0088] This invention also discloses a gene sequence alignment system based on multi-party secure computation, the system comprising:

[0089] Encoding module 110 is used by the first participant and the second participant to encode the held base sequence into an integer sequence and a binary sequence;

[0090] The distribution module 120 is used for the first participant and the second participant to divide the encoded sequence into two complementary share sequences, one share sequence is sent to the other party, and the other share sequence is held by themselves.

[0091] The self-alignment calculation module 130 is used by the first participant and the second participant to calculate the first score of their respective sequences under plaintext conditions, and to split the first score into shares and send them to the other party through a secret sharing protocol.

[0092] The joint alignment calculation module 140 is used to jointly calculate the second score of the pairwise alignment of the sequences of the first participant and the second participant under the condition of holding the two base sequence shares of the two parties, and to split the second score into shares and send them to each party respectively.

[0093] The axis sequence determination module 150 is used to calculate the highest third score based on the share of scores of all sequences held by the first participant and the second participant, and to take the third score sequence as the axis sequence.

[0094] The update alignment module 160 is used to align the sequences held by the first participant and the second participant with the axis sequence under the secret share condition.

[0095] When the original base sequence is secretly shared through addition, the encoding module 110 encodes the base sequence into an integer sequence; the encoding rules for the six symbols "A, G, C, T, N, -" are: A→0, G→1, C→2, T→3, N→4, "-"→5;

[0096] When determining whether bases are equal, the base sequence is encoded into a binary sequence. The encoding rules for the six symbols "A, G, C, T, N, -" are: A→000, G→001, C→010, T→011, N→100, "-"→101.

[0097] The distribution module 120 divides the encoded sequence of each of the first participant and the second participant into two shares. The original base sequence can be restored only by combining the shares held by the two parties.

[0098] The self-alignment calculation module 130 calculates the first scores for pairwise alignment of each participant's own sequences under plaintext conditions, which avoids performing these alignments under secret conditions and reduces the computational load.

[0099] After the joint alignment calculation module 140 performs base sequence alignment calculations for both parties, it helps to determine the alignment sequence with the highest score in the subsequent process; the axis sequence determination module 150 determines the axis sequence; and the update alignment module 160 aligns the sequences held by each party with the axis sequence.

[0100] This invention provides a gene sequence alignment system based on multi-party secure computation. The system involves a first and second participant encoding their own base sequences and sharing the resulting shares. Alignment scores are calculated for each participant's own sequences, and further, under secret conditions, for each pair of sequences from the two participants. The sequence with the highest score is designated as the pivot sequence, and each participant's own sequences are aligned with this pivot sequence. This avoids the privacy leaks commonly associated with plaintext sequence alignment and prevents the original sequences held by one participant from being directly obtained by the other, enhancing the security of the gene alignment process.

[0101] Figure 3 is a schematic diagram of the physical structure of an example electronic device. As shown in Figure 3, the electronic device may include a processor 310, a communications interface 320, a memory 330, and a communication bus 340, wherein the processor 310, communications interface 320, and memory 330 communicate with each other via the communication bus 340. The processor 310 can call logical instructions in the memory 330 to execute a gene sequence alignment method based on multi-party secure computation. This method includes: a first participant and a second participant encoding the held base sequences into integer sequences and binary sequences;

[0102] The first participant and the second participant each divide the encoded sequence into two complementary share sequences, send one share sequence to the other party, and keep the other share sequence for themselves.

[0103] The first participant and the second participant respectively calculate the first score of their respective sequences under plaintext conditions, and send the first score to each other in shares through a secret sharing protocol.

[0104] Given the base sequence shares held by the first and second participants, they jointly calculate the second score for the pairwise alignment of the sequences and then distribute the second score into shares to each party.

[0105] The first participant and the second participant calculate the highest third score based on the share of scores of all sequences held in pairs, and use the third score sequence as the pivot sequence.

[0106] The first participant and the second participant respectively align their holding sequence with the axis sequence under the secret share condition.

[0107] Furthermore, the logical instructions in the aforementioned memory 330 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0108] In another aspect, the present invention also provides a computer program product including a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the gene sequence alignment method based on multi-party secure computation provided by the above methods, the method including: a first participant and a second participant encoding the held base sequence into an integer sequence and a binary sequence;

[0109] The first participant and the second participant each divide the encoded sequence into two complementary share sequences, send one share sequence to the other party, and keep the other share sequence for themselves.

[0110] The first participant and the second participant each calculate, in plaintext, the first scores for pairwise alignment of the sequences they themselves hold, and send shares of the first scores to each other through a secret sharing protocol.

[0111] With each party holding shares of both parties' base sequences, the first participant and the second participant jointly calculate the second scores for pairwise alignment between the two parties' sequences, and split the second scores into shares held by each party respectively.

[0112] Based on the shares of the pairwise alignment scores of all sequences, the first participant and the second participant determine the highest third score, and use the sequence with the highest third score as the pivot sequence.

[0113] The first participant and the second participant each align the sequences they hold with the pivot sequence under secret sharing.

[0114] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a gene sequence alignment method based on multi-party secure computation provided by the methods described above, the method comprising: a first participant and a second participant encoding a base sequence they hold into an integer sequence and a binary sequence;

[0115] The first participant and the second participant each divide the encoded sequence into two complementary share sequences, send one share sequence to the other party, and keep the other share sequence for themselves.

[0116] The first participant and the second participant each calculate, in plaintext, the first scores for pairwise alignment of the sequences they themselves hold, and send shares of the first scores to each other through a secret sharing protocol.

[0117] With each party holding shares of both parties' base sequences, the first participant and the second participant jointly calculate the second scores for pairwise alignment between the two parties' sequences, and split the second scores into shares held by each party respectively.

[0118] Based on the shares of the pairwise alignment scores of all sequences, the first participant and the second participant determine the highest third score, and use the sequence with the highest third score as the pivot sequence.

[0119] The first participant and the second participant each align the sequences they hold with the pivot sequence under secret sharing.

[0120] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0121] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0122] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A gene sequence alignment method based on multi-party secure computation, characterized by comprising:
a first participant and a second participant encoding the base sequences they hold into an integer sequence and a binary sequence;
the first participant and the second participant each dividing the encoded sequence into two complementary share sequences, sending one share sequence to the other party, and keeping the other share sequence;
the first participant and the second participant each calculating, in plaintext, the first scores for pairwise alignment of the sequences they themselves hold, and sending shares of the first scores to each other through a secret sharing protocol;
with each party holding shares of both parties' base sequences, the first participant and the second participant jointly calculating the second scores for pairwise alignment between the two parties' sequences, and splitting the second scores into shares held by each party respectively;
the first participant and the second participant determining the highest third score based on the shares of the pairwise alignment scores of all sequences they hold, and using the sequence with the highest third score as the pivot sequence, specifically including: under the condition that each party holds only shares of the other party's sequences, calculating the third score of each sequence held by either party aligned with the other sequences of the two parties, accumulating these scores to obtain the final score of each sequence, outputting shares of the final scores to each party, and using the sequence with the highest final score as the pivot sequence;
the first participant and the second participant each aligning the sequences they hold with the pivot sequence under secret sharing, specifically including: under secret sharing, the first participant and the second participant obtaining shares of the score matrix through joint calculation, calculating the path matrix based on the shares of the score matrix, and distributing the shares of the path matrix to each party to hold; wherein the path matrix is the matrix of paths through the score matrix from which the aligned sequences are obtained; the finally calculated pivot sequence is in secret-shared form, with shares held by the first participant and the second participant respectively; and the first participant and the second participant aligning their own sequences with the pivot sequence and finally outputting shares of the aligned sequences.
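
The accumulation step of claim 1 can be illustrated with a minimal Python sketch. Because additive shares are linear, each party can sum its own shares of the pairwise scores locally and obtain a share of each sequence's final score. The ring size `MODULUS`, the helper names, and the demo scores are assumptions for illustration. The totals are opened at the end only to show the result; the claim itself outputs shares of the final scores, and finding the maximum without opening would need a secure comparison that is not sketched here.

```python
import secrets

MODULUS = 2**32  # assumed ring size; not stated in the claim

def share(x):
    """Split x into two additive shares (r, x - r) modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (x - r) % MODULUS

def to_signed(v):
    """Map a ring element back to a signed integer (scores may be negative)."""
    return v - MODULUS if v >= MODULUS // 2 else v

def accumulate_shares(score_shares):
    """Each party sums its own shares row by row; no communication is needed."""
    return [sum(row) % MODULUS for row in score_shares]

# Hypothetical pairwise scores for three sequences, used only to build the demo.
scores = [[0, 5, -1],
          [5, 0, 3],
          [-1, 3, 0]]
shared = [[share(v % MODULUS) for v in row] for row in scores]
party0 = [[p[0] for p in row] for row in shared]
party1 = [[p[1] for p in row] for row in shared]

tot0 = accumulate_shares(party0)  # party 0's shares of the final scores
tot1 = accumulate_shares(party1)  # party 1's shares of the final scores

# Illustration only: open the totals to show which sequence would be the pivot.
totals = [to_signed((a + b) % MODULUS) for a, b in zip(tot0, tot1)]
pivot_index = max(range(len(totals)), key=totals.__getitem__)
```

In this demo the second sequence has the largest accumulated score and would be chosen as the pivot sequence.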

2. The gene sequence alignment method based on multi-party secure computation according to claim 1, characterized in that the first participant and the second participant encoding the base sequences into an integer sequence and a binary sequence specifically includes: when additive secret sharing is performed on the original base sequence, encoding the base sequence into an integer sequence; and when determining whether bases are equal, encoding the base sequence into a binary sequence.
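
The two encodings of claim 2 can be illustrated with a minimal Python sketch. The concrete mapping (A, C, G, T to 0 through 3) and the 2-bit code per base are assumptions for illustration; the claim does not fix either.

```python
# Hypothetical mapping: the claim does not say which integer each base receives.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_integer(seq):
    """Encode a base sequence as integers (the form used for additive sharing)."""
    return [BASE_TO_INT[b] for b in seq]

def encode_binary(seq):
    """Encode each base as a 2-bit tuple (the form used for equality tests)."""
    return [((BASE_TO_INT[b] >> 1) & 1, BASE_TO_INT[b] & 1) for b in seq]

def bases_equal(bits_a, bits_b):
    """Two bases are equal exactly when the XOR of their bit codes is all zeros."""
    return all((x ^ y) == 0 for x, y in zip(bits_a, bits_b))

ints = encode_integer("GATTACA")
bits = encode_binary("GATTACA")
```

The binary form reduces an equality test to bitwise operations, which is typically cheaper to evaluate on shares than an integer comparison.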

3. The gene sequence alignment method based on multi-party secure computation according to claim 1, characterized in that the first participant and the second participant each dividing the encoded sequence into two complementary share sequences, sending one share sequence to the other party, and keeping the other share sequence specifically includes: the first participant and the second participant each sending one share of the encoded original sequence to the other party through an additive secret sharing protocol while keeping the other share, so that the original sequence can be recovered only by combining the shares of both parties; in the additive secret sharing process, a party shares a secret $x$ as follows: it randomly generates a random number $r$ and calculates $x - r$; then $(r,\ x - r)$ constitutes a set of shares of $x$.
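
The sharing step of claim 3 can be sketched in Python as follows. This is a minimal illustration, assuming arithmetic modulo `2**32`; the modulus and the function names are not given in the claim.

```python
import secrets

MODULUS = 2**32  # assumed ring size; the claim does not state one

def share(x):
    """Split x into two additive shares (r, x - r) modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (x - r) % MODULUS

def reconstruct(s0, s1):
    """Recover the secret by adding both shares."""
    return (s0 + s1) % MODULUS

def share_sequence(encoded):
    """Share every element; returns (shares kept, shares sent to the other party)."""
    pairs = [share(v) for v in encoded]
    return [p[0] for p in pairs], [p[1] for p in pairs]

kept, sent = share_sequence([2, 0, 3, 3, 0, 1, 0])
recovered = [reconstruct(a, b) for a, b in zip(kept, sent)]
```

Each share on its own is uniformly random in the ring, so neither party learns anything about the other's sequence from the share it receives.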

4. The gene sequence alignment method based on multi-party secure computation according to claim 1, characterized in that the first participant and the second participant each calculating, in plaintext, the first scores for pairwise alignment of their own sequences, and distributing shares of the first scores to both parties via an additive secret sharing protocol specifically includes: using the Needleman-Wunsch algorithm to calculate the pairwise alignment scores of the held sequences, the first alignment score being obtained by calculating the score matrix; the score matrix of two sequences satisfies:

$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(i,j)\\F(i-1,j)+d\\F(i,j-1)+d\end{cases},\qquad s(i,j)=\begin{cases}m,&\text{if base } i \text{ of sequence 1 equals base } j \text{ of sequence 2}\\p,&\text{otherwise}\end{cases}$$

where $i$ denotes the $i$-th base position of sequence 1; $j$ denotes the $j$-th base position of sequence 2; $d$ denotes the gap penalty; $m$ denotes the score for a match; $p$ denotes the mismatch penalty; and $F(i,j)$ is the score in row $i$, column $j$ of the score matrix.
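
The plaintext scoring step of claim 4 can be sketched with a standard Needleman-Wunsch score matrix in Python. The parameter values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions for illustration; the claim does not give numeric values.

```python
def nw_score(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return the Needleman-Wunsch global alignment score and the score matrix."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: aligning a prefix against nothing costs gaps.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align base i with base j
                          F[i - 1][j] + gap,    # gap in sequence 2
                          F[i][j - 1] + gap)    # gap in sequence 1
    return F[n][m], F

score, matrix = nw_score("ACGT", "AGT")
```

The bottom-right entry of the matrix is the alignment score of the two full sequences, which is the value that would then be split into shares.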

5. The gene sequence alignment method based on multi-party secure computation according to claim 1, characterized in that the first participant and the second participant jointly calculating, under secure conditions, the second scores for pairwise alignment of the base sequence shares they hold, and splitting the second scores into shares held by each party respectively specifically includes: the first participant holds one sequence, the second participant holds another sequence, and the second score matrix is jointly calculated under additive secret sharing; wherein jointly calculating the second score matrix under additive secret sharing includes: calculating the score matrix from the secret shares of the sequences held by each party without disclosing the original sequences, the score matrix being held by both parties in secret-shared form, so that the actual scores of the score matrix can be recovered only jointly by both parties.
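
Claim 5 does not say which primitives are used to compute on shares. As an illustrative stand-in, the Python sketch below shows Beaver-triple multiplication, a common building block for two-party additive secret sharing that yields shares of a product without revealing either input. The simulated trusted dealer, the modulus, and all names are assumptions and are not taken from the patent.

```python
import secrets

MODULUS = 2**32  # assumed ring size

def share(x):
    """Split x into two additive shares modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (x - r) % MODULUS

def open_value(s0, s1):
    """Combine both shares to reveal the underlying value."""
    return (s0 + s1) % MODULUS

def beaver_multiply(x_sh, y_sh):
    """Return additive shares of x*y given additive shares of x and y."""
    # Simulated trusted dealer: random a, b and c = a*b, handed out as shares.
    a = secrets.randbelow(MODULUS)
    b = secrets.randbelow(MODULUS)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % MODULUS)
    # The parties open only the masked values d = x - a and e = y - b.
    d = open_value((x_sh[0] - a_sh[0]) % MODULUS, (x_sh[1] - a_sh[1]) % MODULUS)
    e = open_value((y_sh[0] - b_sh[0]) % MODULUS, (y_sh[1] - b_sh[1]) % MODULUS)
    # x*y = c + d*b + e*a + d*e; the public d*e term is added by party 0 only.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MODULUS
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MODULUS
    return z0, z1

product_shares = beaver_multiply(share(6), share(7))
product = open_value(*product_shares)
```

Additions and multiplications by public constants need no interaction at all, so a score recurrence evaluated on shares only requires communication for products and comparisons.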

6. A gene sequence alignment system based on multi-party secure computation, characterized in that the system comprises:
an encoding module, used by a first participant and a second participant to encode the base sequences they hold into an integer sequence and a binary sequence;
a distribution module, used by the first participant and the second participant to each divide the encoded sequence into two complementary share sequences, send one share sequence to the other party, and keep the other share sequence;
a self-alignment calculation module, used by the first participant and the second participant to each calculate, in plaintext, the first scores for pairwise alignment of the sequences they themselves hold, and to split the first scores into shares and send them to both parties through an additive secret sharing protocol;
a joint alignment calculation module, used by the first participant and the second participant to jointly calculate, under secure conditions, the second scores for pairwise alignment of the base sequence shares held by the two parties, and to split the second scores into shares held by each party respectively;
a pivot sequence determination module, used by the first participant and the second participant to determine the highest third score based on the shares of the pairwise alignment scores of all sequences they hold, and to use the sequence with the highest third score as the pivot sequence, specifically including: under the condition that each party holds only shares of the other party's sequences, calculating the third score of each sequence held by either party aligned with the other sequences of the two parties, accumulating these scores to obtain the final score of each sequence, outputting shares of the final scores to each party, and using the sequence with the highest final score as the pivot sequence;
an update alignment module, used by the first participant and the second participant to each align the sequences they hold with the pivot sequence under secret sharing, specifically including: under secret sharing, the first participant and the second participant obtaining shares of the score matrix through joint calculation, calculating the path matrix based on the shares of the score matrix, and distributing the shares of the path matrix to each party to hold; wherein the path matrix is the matrix of paths through the score matrix from which the aligned sequences are obtained; the finally calculated pivot sequence is in secret-shared form, with shares held by the first participant and the second participant respectively; and the first participant and the second participant aligning their own sequences with the pivot sequence and finally outputting shares of the aligned sequences.
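
The role of the path matrix can be illustrated in plaintext with a minimal Python sketch: while the score matrix is filled, the path matrix records which neighbouring cell produced each score, and the alignment is read off by walking back from the bottom-right cell. The direction codes, scoring parameters, and function name are assumptions for illustration; in the claimed system both matrices exist only as secret shares.

```python
DIAG, UP, LEFT = 0, 1, 2  # hypothetical direction codes stored in the path matrix

def align_to_pivot(pivot, seq, match=1, mismatch=-1, gap=-2):
    """Globally align seq to pivot; returns both strings with '-' marking gaps."""
    n, m = len(pivot), len(seq)
    F = [[0] * (m + 1) for _ in range(n + 1)]     # score matrix
    P = [[DIAG] * (m + 1) for _ in range(n + 1)]  # path matrix
    for i in range(1, n + 1):
        F[i][0], P[i][0] = i * gap, UP
    for j in range(1, m + 1):
        F[0][j], P[0][j] = j * gap, LEFT
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if pivot[i - 1] == seq[j - 1] else mismatch
            options = [(F[i - 1][j - 1] + s, DIAG),
                       (F[i - 1][j] + gap, UP),
                       (F[i][j - 1] + gap, LEFT)]
            # Ties are broken in the order DIAG, UP, LEFT.
            F[i][j], P[i][j] = max(options, key=lambda t: t[0])
    # Traceback: follow the path matrix from the bottom-right cell to the origin.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if P[i][j] == DIAG:
            a.append(pivot[i - 1]); b.append(seq[j - 1]); i -= 1; j -= 1
        elif P[i][j] == UP:
            a.append(pivot[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(seq[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))

aligned_pivot, aligned_seq = align_to_pivot("ACGT", "AGT")
```

Here the shorter sequence receives a single gap opposite the `C` of the pivot, giving `ACGT` aligned with `A-GT`.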

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the gene sequence alignment method based on multi-party secure computation according to any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the gene sequence alignment method based on multi-party secure computation according to any one of claims 1 to 5.