The invention discloses a codeword design method for DNA storage that converts stored information into a DNA sequence through the following steps. First, the information is converted into binary data. Second, a minimum-variance Huffman tree is constructed and used to compress the binary data. Third, the compressed binary data is partitioned into non-overlapping 4-bit groups, giving at most 16 distinct combinations, and codewords are selected from a dictionary in order of the combinations' probabilities and mapped to them to obtain a DNA sequence. Finally, the GC content of the DNA sequence is computed; if it is higher than 60% or lower than 40%, the mapping is adjusted so that the GC content falls between 40% and 60%. The DNA sequence is then checked for homopolymer runs longer than 3, and any such run is replaced and modified. The method has a high coding rate and a simple structure, and the encoded DNA sequence satisfies the constraints that the GC content is between 40% and 60% and the homopolymer run length does not exceed 3.
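The partitioning, frequency-ranked mapping, and constraint checks can be illustrated with a minimal Python sketch. It is not the claimed implementation: the codeword dictionary `CODEWORDS` (the 16 dinucleotides in lexicographic order), the zero-padding of the final group, and all function names are assumptions made for illustration, since the abstract does not specify them. The Huffman compression step and the adjustment and replacement steps are omitted because the abstract does not describe how they are carried out.

```python
from collections import Counter
from itertools import groupby, product

# Hypothetical codeword dictionary: the 16 dinucleotides in lexicographic
# order. The source does not list its dictionary; this is for illustration.
CODEWORDS = ["".join(pair) for pair in product("ACGT", repeat=2)]


def split_into_groups(bits: str) -> list[str]:
    """Split a bit string into non-overlapping 4-bit groups.

    Assumption: the tail is zero-padded to a multiple of 4 bits.
    """
    padded = bits + "0" * (-len(bits) % 4)
    return [padded[i:i + 4] for i in range(0, len(padded), 4)]


def build_mapping(groups: list[str], codewords: list[str] = CODEWORDS) -> dict[str, str]:
    """Assign codewords, in dictionary order, to groups ranked by frequency."""
    ranked = [group for group, _ in Counter(groups).most_common()]
    return dict(zip(ranked, codewords))


def encode(bits: str) -> tuple[str, dict[str, str]]:
    """Map (already compressed) bits to a DNA string; also return the mapping."""
    groups = split_into_groups(bits)
    mapping = build_mapping(groups)
    return "".join(mapping[group] for group in groups), mapping


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def meets_constraints(seq: str) -> bool:
    """True if GC content is within 40% to 60% and no run is longer than 3."""
    return 0.40 <= gc_content(seq) <= 0.60 and longest_homopolymer(seq) <= 3
```

In this sketch, a sequence for which `meets_constraints` returns `False` would be the one passed on to the mapping adjustment or homopolymer replacement steps described above.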