Compositions and methods for rapid and highly efficient characterization of
genetic diversity in organisms are provided. The methods involve rapid sequencing and characterization of extrachromosomal
DNA, particularly plasmids, to identify and isolate useful
nucleotide sequences. The method targets
plasmid DNA and avoids repeated
cloning and sequencing of the host
chromosome, thus allowing one to focus on the genetic elements carrying maximum
genetic diversity. The method involves generating a
library of extrachromosomal
DNA clones, sequencing a portion of the clones, comparing the sequences against a
database of existing DNA sequences, using an
algorithm to select a novel
nucleotide sequence based on the presence or absence of the sequenced portion in the
database, and identifying at least one novel
nucleotide sequence. The DNA sequence can also be translated in all six frames, and the resulting
amino acid sequences can be compared against a
database of
protein sequences. The
integrated approach provides a rapid and efficient method to identify and isolate useful genes. Organisms of particular interest include, but are not limited to,
bacteria, fungi,
algae, and the like. Compositions comprise a mini-cosmid
vector comprising a stuffer fragment and at least one cos site.
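
The screening steps described above (six-frame translation of a clone sequence, and selection of sequences based on their presence or absence in a database) can be illustrated with a minimal Python sketch. This is not the method as claimed: the function names, the k-mer comparison, and the parameters k and max_shared are all assumptions introduced for illustration, and an actual workflow would more likely use an alignment-based database search. The standard genetic code is assumed for translation.

```python
# Illustrative sketch only. All names (six_frames, select_novel, k,
# max_shared) are hypothetical; the k-mer overlap test is a simple
# stand-in for an alignment-based database comparison.

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in T, C, A, G order.
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    return [translate(strand[offset:])
            for strand in (seq, reverse_complement(seq))
            for offset in range(3)]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_novel(reads, database, k=8, max_shared=0.5):
    """Return names of reads whose k-mers are mostly absent from database.

    reads: dict mapping a read name to its DNA sequence.
    database: iterable of known DNA sequences (both strands are indexed).
    """
    index = set()
    for known in database:
        known = known.upper()
        index |= kmers(known, k) | kmers(reverse_complement(known), k)
    novel = []
    for name, read in reads.items():
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:
            continue  # read shorter than k: too short to judge
        shared = len(read_kmers & index) / len(read_kmers)
        if shared <= max_shared:
            novel.append(name)
    return novel


if __name__ == "__main__":
    known = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    reads = {"clone_a": known[5:30],
             "clone_b": "TTTTTTTTTTCCCCCCCCCCAAAAAAAAAA"}
    for name in select_novel(reads, [known]):
        print(name, six_frames(reads[name]))
```

In this sketch a clone is kept when at most half of its k-mers occur in the indexed database; the resulting six protein translations could then be compared against a protein database in the same way.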