Spectrogram extraction from DNA sequences has been known since 2001.
A DNA spectrogram is generated by applying a Fourier transform to convert a symbolic DNA sequence, consisting of the letters A, T, C, and G, into a visual representation that highlights periodicities in the co-occurrence of DNA patterns. Given a DNA sequence or a whole genome, this method makes it easy to generate a large number of spectrogram images. The difficult part, however, is to determine where the repetitive patterns are and to associate biological and clinical meaning with them. The present disclosure provides systems and methods that facilitate the location and/or identification of repetitive DNA patterns, such as CpG islands, Alu repeats, tandem repeats, and various types of
satellite repeats. These repetitive elements can be found within a chromosome, within a genome, or across the genomes of various species. The disclosed systems and methods apply image processing operators to find prominent features in the vertical and horizontal directions of the DNA spectrograms. Systems and methods for fast,
full-scale analysis of the derived images using supervised machine learning methods are also disclosed. The disclosed systems and methods for detecting and/or classifying repetitive DNA patterns include: (a) a comparative histogram method, (b) feature selection and classification using support vector machines and genetic algorithms, and (c) generation of a spectrovideo from a plurality of spectral images.
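
The conversion of a symbolic sequence into spectrogram data described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes one common encoding (a binary indicator sequence per base) and a sliding-window discrete Fourier transform computed with NumPy. The names `indicator_sequences` and `dna_spectrogram`, the window length of 60, and the step of 1 are hypothetical choices made for illustration, and rendering the result as an image is omitted.

```python
import numpy as np

BASES = "ATCG"  # row order of the indicator array (an arbitrary choice)


def indicator_sequences(seq):
    """Return a (4, len(seq)) array; row b is 1.0 where seq equals BASES[b]."""
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    return np.stack([(codes == ord(b)).astype(float) for b in BASES])


def dna_spectrogram(seq, window=60, step=1):
    """Sliding-window Fourier power per base.

    Returns an array of shape (4, window // 2 + 1, n_windows), where
    entry [b, k, t] is the squared DFT magnitude at frequency index k
    of base BASES[b] in the window starting at position t * step.
    """
    u = indicator_sequences(seq)
    if u.shape[1] < window:
        raise ValueError("sequence is shorter than the window")
    starts = range(0, u.shape[1] - window + 1, step)
    # Stack the windows along a new last axis: (4, window, n_windows).
    frames = np.stack([u[:, i:i + window] for i in starts], axis=-1)
    # Real-input FFT along the within-window axis.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum) ** 2


if __name__ == "__main__":
    # Hypothetical input: a pure period-3 repeat.
    spec = dna_spectrogram("ATC" * 40, window=60)
    print(spec.shape)
    # Strongest non-DC frequency index for base A in the first window.
    print(int(np.argmax(spec[0, 1:, 0])) + 1)
```

In this sketch a repeat of period p inside a window of length N concentrates power at frequency index N / p, so the period-3 example peaks at index 20 for a window of 60. Persistent peaks of this kind are what would appear as horizontal or vertical features once the array is displayed as an image.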