Compositions and methods for rapid and highly efficient characterization of 
genetic diversity in organisms are provided. The methods involve rapid sequencing and characterization of extrachromosomal 
DNA, particularly plasmids, to identify and isolate useful 
nucleotide sequences. The method targets 
plasmid DNA and avoids repeated 
cloning and sequencing of the host 
chromosome, thus allowing one to focus on the genetic elements carrying maximum 
genetic diversity. The method involves generating a 
library of extrachromosomal 
DNA clones, sequencing a portion of the clones, comparing the sequences against a 
database of existing DNA sequences, using an 
algorithm to select novel 
nucleotide sequences based on the presence or absence of the sequenced portion in the 
database, and identifying at least one novel 
nucleotide sequence. The DNA sequence can also be translated in all six frames, and the resulting 
amino acid sequences can be compared against a 
database of 
protein sequences. The 
integrated approach provides a rapid and efficient method to identify and isolate useful genes. Organisms of particular interest include, but are not limited to, 
bacteria, fungi, 
algae, and the like. Compositions comprise a mini-
cosmid vector comprising a stuffer fragment and at least one cos site.
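
The screening steps described above (comparing sequenced portions against a database, and translating a DNA sequence in all six frames) can be illustrated with a minimal Python sketch. It is not the algorithm of the disclosure: the function names (`select_novel`, `six_frame_translations`, and the helpers) are hypothetical, the standard genetic code is assumed, and an exact-substring check stands in for whatever similarity search a real pipeline would run against a sequence database.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order (assumed table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three forward and three reverse-complement reading frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def select_novel(reads, known_sequences):
    """Keep reads absent (on either strand) from every known sequence.

    Assumption: exact substring matching is a simplified placeholder for a
    database similarity search.
    """
    known = [k.upper() for k in known_sequences]
    novel = []
    for read in reads:
        forward = read.upper()
        reverse = reverse_complement(forward)
        if not any(forward in k or reverse in k for k in known):
            novel.append(read)
    return novel
```

For example, `select_novel(["ATGGCC", "GGGTTT"], ["CCATGGCCTT"])` keeps only the second read, and each frame returned by `six_frame_translations` could then be compared against a protein database in the same way.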