Our research on the
genome sequences of more than 250 species (including viruses, microbes, multicellular
organisms, and humans) has shown that the occurrence of a particular subsequence (a so-called “motif” or “n-mer,” where n, the length of the subsequence, can be 25 or higher) in the
genome of a particular species can be considered as a nearly
random event; and that the occurrences of a particular subsequence in the
genome sequences of different species can be considered as nearly independent events (with the exception of the cases where extremely closely related species are compared). The set of subsequences that occur in a particular species' genome can therefore be used as a genomic “
fingerprint” of this species. This discovery leads to the concept of utilizing a set of pseudo-randomly designed subsequences for
species identification or discrimination. These subsequences (probes, primers, motifs, n-mers) can be used with hybridization-based technologies (including, but not limited to,
microarray or PCR technologies) and with any other technology that can detect the presence or absence of a particular subsequence in
genomic DNA for identification of species. The same approach can also be used to identify individuals of the same species (including the human species), to estimate the
genome size of unknown organisms, and to estimate the total
genome size in samples containing several viral, microbial, and eukaryotic genomes. The identification methods currently in use for these purposes require sequencing the genomes of the species or individuals of interest. The proposed computational method eliminates this requirement and will greatly reduce the cost of these tests.
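
The fingerprint idea can be illustrated with a minimal sketch. It is not the implementation described here, and it rests on simplifying assumptions that are not stated in the text: bases are treated as uniform and independent, only one DNA strand is scanned, and presence or absence is determined by exact string matching rather than by hybridization. Under that assumed model, an n-mer is present in a genome of length G with probability of roughly 1 - exp(-G / 4^n), so the fraction of probes found gives a rough genome size estimate. All function names (random_probes, fingerprint, estimate_genome_size, hamming) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def random_probes(count, n, seed=0):
    """Draw `count` distinct pseudo-random n-mers (hypothetical probe set)."""
    if count > 4 ** n:
        raise ValueError("cannot draw more distinct n-mers than exist")
    rng = random.Random(seed)
    probes = set()
    while len(probes) < count:
        probes.add("".join(rng.choice(ALPHABET) for _ in range(n)))
    return sorted(probes)


def fingerprint(genome, probes):
    """Presence/absence vector of each probe in `genome` (single strand only)."""
    n = len(probes[0])
    present = {genome[i:i + n] for i in range(len(genome) - n + 1)}
    return tuple(p in present for p in probes)


def estimate_genome_size(fp, n):
    """Estimate genome length from the fraction of probes present.

    Assumes the uniform random model: P(present) ~ 1 - exp(-G / 4**n),
    hence G ~ -4**n * ln(1 - f), where f is the observed fraction.
    """
    f = sum(fp) / len(fp)
    if f >= 1.0:
        return math.inf  # every probe present: n is too small to resolve size
    return -(4 ** n) * math.log(1.0 - f)


def hamming(a, b):
    """Number of probes whose presence/absence differs between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(1)
    # Two simulated, unrelated "genomes" stand in for two species.
    genome_a = "".join(rng.choice(ALPHABET) for _ in range(5000))
    genome_b = "".join(rng.choice(ALPHABET) for _ in range(5000))
    probes = random_probes(500, 6)
    fp_a = fingerprint(genome_a, probes)
    fp_b = fingerprint(genome_b, probes)
    print("probes differing between A and B:", hamming(fp_a, fp_b))
    print("estimated size of A:", round(estimate_genome_size(fp_a, 6)))
```

In this sketch the probe length n must be chosen so that roughly half of the probes are present (4^n comparable to the genome length). If nearly all or nearly none of the probes match, the fingerprint carries little information and the size estimate becomes unreliable.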