Sapelo Island Microbial Observatory

Bacterial Taxonomy

Overview

Taxonomic classification of large libraries of 16S rRNA sequences from bacterial isolates and environmental DNA remains a significant challenge, despite the wide availability of public sequence databases and associated bioinformatics and genomics software. The quality of taxonomic information in general public databases such as GenBank varies considerably, and new sequences are added so rapidly that phylogenetic trees and taxonomic placements quickly become obsolete.

To provide consistent and up-to-date taxonomic classifications for the thousands of 16S sequences in the SIMO database, we have developed an automated process for assigning unknown sequences to taxonomic ranks. The resulting taxonomic information is used to annotate sequence records for submission to GenBank, to support taxonomic searches of the SIMO database, and to provide classification and lineage information for each SIMO sequence record.

Taxonomic Assignments

SIMO taxonomic assignments are based on similarity to vetted type-species sequences in the Ribosomal Database Project (RDP) database. Unknown 16S sequences are compared to RDP type-species sequences using the RDP Sequence Match program; the highest-ranking matches are then retrieved from RDP and aligned with the unknown SIMO sequence using the Smith-Waterman pairwise local alignment algorithm (SSEARCH34 in the Pearson FASTA 3 package).
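
SSEARCH34 is a compiled program in the FASTA package, so the Python sketch below only illustrates the Smith-Waterman recurrence it implements and one common way of turning a local alignment into a percent identity. The function names, the linear scoring scheme (match +2, mismatch -1, gap -2) and the identity definition (identical columns divided by alignment length, gap columns included) are assumptions for illustration; SSEARCH uses its own scoring defaults and affine gap penalties.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    The scores are illustrative assumptions, not SSEARCH34 defaults.
    """
    a, b = a.upper(), b.upper()

    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H: List[List[int]] = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the highest-scoring cell until a zero cell is reached
    i, j = best_i, best_j
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, qa, sa = smith_waterman("ACGTACGTAC", "ACGTTCGTAC")
print(score, qa, sa, percent_identity(qa, sa))  # -> 17 ACGTACGTAC ACGTTCGTAC 90.0
```

A single mismatch in a 10-base overlap yields 90% identity under this definition; other tools may count gaps or overlap length differently, which shifts the value fed into the rank cut-off filter.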

The taxonomic lineage of the most similar type species is then parsed from the RDP tree, and the corresponding taxonomic ranks are assigned to the unknown SIMO sequence after applying a rank cut-off filter based on the percent nucleotide identity from the local alignment. The rank cut-offs listed below were determined empirically by comparing a large number of aligned sequences from known type species. The comparisons (between 104 and 335, depending on the taxonomic rank) were chosen to give a balanced representation of more than one bacterial phylum. For each taxonomic rank, the cut-off values are conservative: each is the 16S rRNA sequence similarity value that would include approximately 95% of the comparisons at that rank (Whitman and Dyszynski, unpublished data).
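
Read literally, the rule above makes each cut-off a lower-tail percentile of the identities observed between type species that share a rank. A minimal Python sketch of that reading follows; `empirical_cutoff`, the nearest-rank percentile and the "meet or exceed" convention are assumptions, since the underlying analysis is unpublished.

```python
import math
from typing import Sequence


def empirical_cutoff(identities: Sequence[float], coverage: float = 0.95) -> float:
    """Percent identity that roughly `coverage` of within-rank comparisons meet or exceed.

    `identities` are percent identities between pairs of type species that
    share the rank in question (e.g. the same genus). Hypothetical helper using
    a nearest-rank lower percentile; the original method may have differed.
    """
    if not identities:
        raise ValueError("at least one comparison is required")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(identities)
    # Number of lowest comparisons allowed to fall below the cut-off; the small
    # epsilon guards against float error such as 1 - 0.9 == 0.0999...
    excluded = math.floor((1.0 - coverage) * len(ordered) + 1e-9)
    return ordered[min(excluded, len(ordered) - 1)]
```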

Taxonomic Rank % Identity Cut-off*
Domain >0%
Phylum >75%
Class >85%
Order >91%
Family >92%
Genus >95%
Species >=100%

* For example, 93% identity would result in assignment of the following ranks from the closest type species:
Domain - Phylum - Class - Order - Family
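
The cut-off filter can be expressed as a walk down the closest type species' lineage that stops at the first rank whose threshold the alignment identity does not satisfy. The Python sketch below encodes the table values; the `assign_ranks` name, the dictionary layout of the lineage and the example lineage are illustrative assumptions, not the SIMO RDP Agent's actual data structures.

```python
from typing import Dict, List, Tuple

# (rank, threshold %, inclusive?) taken from the cut-off table; only the
# Species cut-off is inclusive (>=), all others are strict (>).
RANK_CUTOFFS: List[Tuple[str, float, bool]] = [
    ("Domain", 0.0, False),
    ("Phylum", 75.0, False),
    ("Class", 85.0, False),
    ("Order", 91.0, False),
    ("Family", 92.0, False),
    ("Genus", 95.0, False),
    ("Species", 100.0, True),
]


def assign_ranks(identity: float, lineage: Dict[str, str]) -> Dict[str, str]:
    """Copy ranks from the closest type species' lineage down to the deepest
    rank whose identity cut-off is satisfied."""
    assigned: Dict[str, str] = {}
    for rank, threshold, inclusive in RANK_CUTOFFS:
        passed = identity >= threshold if inclusive else identity > threshold
        if not passed:
            break  # thresholds never decrease with depth, so deeper ranks fail too
        if rank in lineage:
            assigned[rank] = lineage[rank]
    return assigned


# Illustrative lineage of a closest type species (example values only)
lineage = {
    "Domain": "Bacteria",
    "Phylum": "Proteobacteria",
    "Class": "Gammaproteobacteria",
    "Order": "Vibrionales",
    "Family": "Vibrionaceae",
    "Genus": "Vibrio",
    "Species": "Vibrio cholerae",
}
print(list(assign_ranks(93.0, lineage)))
# -> ['Domain', 'Phylum', 'Class', 'Order', 'Family']
```

This reproduces the footnote: at 93% identity the Family cut-off (>92%) is met but the Genus cut-off (>95%) is not.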

Taxonomic assignments are performed automatically when new sequence data are added to the SIMO database, and are updated approximately quarterly to reflect additions and refinements to the RDP database and phylogenetic tree.

SIMO RDP Agent

A set of software utilities was developed by Wade Sheldon to automate the entire process described above and to retrieve RDP trees for the 10 closest sequences overall (type and non-type) to supplement the taxonomic classification results. The classifications and trees returned from each analysis are uploaded to the SIMO database for use by SIMO investigators and for display on the SIMO database web pages.
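
Besides the single closest type species used for classification, the agent also gathers the 10 closest sequences regardless of type status for tree retrieval. The Python sketch below shows those two selections over a list of search hits; the `Hit` record, its fields and the use of a single similarity score for ranking are hypothetical, and the original agent was written in MATLAB.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Hit:
    """One database match for an unknown sequence (hypothetical layout)."""
    accession: str
    similarity: float  # higher means more similar
    is_type: bool      # True for type-strain sequences


def closest_type(hits: Sequence[Hit]) -> Optional[Hit]:
    """Most similar type sequence, used for taxonomic classification."""
    return max((h for h in hits if h.is_type),
               key=lambda h: h.similarity, default=None)


def closest_overall(hits: Sequence[Hit], n: int = 10) -> List[Hit]:
    """The n most similar sequences, type or non-type, used for tree retrieval."""
    return sorted(hits, key=lambda h: h.similarity, reverse=True)[:n]
```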

The software, termed the SIMO RDP Agent, was developed using MATLAB 6.5. The conceptual diagram on the RDP Taxonomy Agent page illustrates the workflow performed for each analysis.

[Figure: SIMO RDP Agent conceptual diagram]

SIMO RDPquery Program

An open source Java program (RDPquery) has also been developed by Glen Dyszynski and Wade Sheldon so that individuals can classify their own 16S sequences using the same approach. The program is fully described on the SIMO RDPquery web page.

The Sapelo Island Microbial Observatory is funded by the National Science Foundation.

This material is based upon work supported by the National Science Foundation under grant number MCB-0702125. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the authors and do not necessarily reflect the views of the National Science Foundation.