Introduction
RDPquery is a Java application
for retrieving taxonomic identifications for 16S rRNA prokaryotic gene
sequences. The program uses the Ribosomal Database Project's (RDP; http://rdp.cme.msu.edu)
online sequence match tool to retrieve classification information. RDPquery
was created by Glen Dyszynski and Wade
Sheldon in the Departments of Microbiology and Marine Sciences at
the University of Georgia. The program makes use of another Java application
created by Ahmed Moustafa called JAligner,
which creates alignments and performs comparisons on sequence data.
Classification Strategy
The general strategy used is as follows (Figure 1). For
each query sequence, RDPquery asks the RDP to find the 10 entries (or
some specified number of entries from 1-20) with the highest Sab values.
However, the sequence with the highest Sab value is frequently not the
sequence with the highest similarity, in the same way that the sequence
with the highest BLAST score frequently does not have the highest similarity.
Therefore, RDPquery uses JAligner to calculate the sequence similarity
for each of the sequences with high Sab values. To limit the number of
requests to the RDP, this comparison is done locally using a downloaded copy
of all the RDP sequences. It is therefore necessary to download the
most recent version of the RDP sequences so that all of the matching
sequences returned by RDP can be found in the local RDP FastA file.
Figure 1. Overview of RDPquery.
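The re-ranking idea described above can be illustrated with a minimal Java sketch. It is not RDPquery's actual code: the class and method names (RerankSketch, Hit, similarity, bestBySimilarity) are hypothetical, the sequences are made up, and the similarity function is a simple stand-in (percent identity over two already-aligned, equal-length strings) used in place of the JAligner alignment so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RerankSketch {

    /** One hit returned by RDP: identifier, reported Sab value, locally stored sequence. */
    static final class Hit {
        final String id;
        final double sab;
        final String sequence;

        Hit(String id, double sab, String sequence) {
            this.id = id;
            this.sab = sab;
            this.sequence = sequence;
        }
    }

    /**
     * Stand-in similarity (an assumption, not the JAligner calculation):
     * percent of identical, non-gap positions in two pre-aligned strings.
     */
    static double similarity(String a, String b) {
        if (a.isEmpty() || a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal, non-zero length");
        }
        int same = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) == b.charAt(i) && a.charAt(i) != '-') {
                same++;
            }
        }
        return 100.0 * same / a.length();
    }

    /** Picks the hit with the highest similarity to the query, ignoring the Sab ordering. */
    static Hit bestBySimilarity(String query, List<Hit> hits) {
        return hits.stream()
                .max(Comparator.comparingDouble((Hit h) -> similarity(query, h.sequence)))
                .orElseThrow(() -> new IllegalArgumentException("no hits"));
    }

    public static void main(String[] args) {
        String query = "ACGTACGTAC";
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("hitA", 0.95, "ACGTTCGTTC")); // highest Sab, 8 of 10 positions identical
        hits.add(new Hit("hitB", 0.90, "ACGTACGTAA")); // lower Sab, 9 of 10 positions identical
        Hit best = bestBySimilarity(query, hits);
        System.out.println(best.id + " " + similarity(query, best.sequence)); // prints "hitB 90.0"
    }
}
```

In this toy example the hit with the lower Sab value is selected because it has the higher similarity, which is the situation the local comparison is meant to catch.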
RDPquery then identifies the sequence with the highest
similarity and creates an output file with two sets of taxonomic identifications.
The first set contains all the taxonomic data provided by RDP for the
sequence with the highest similarity. The second set, however, contains
only those taxonomic identifications where the similarity value exceeds
a predetermined cutoff. These cutoffs were generated by a survey of the
taxonomy in Bergey's Manual of Systematic Bacteriology [2] (Figure 2). The
default cutoff values were set to represent the similarity value at which
one would be 95% confident in declaring a given taxonomic assignment.
For instance, 95% of the comparisons we surveyed between members of
different genera from within the same family possessed less than 95%
sequence similarity. Similarly, 95% of the comparisons between members
of different families from within the same order possessed less than
92% sequence similarity. Thus a clone possessing 94% sequence similarity
to a type strain would be classified in the same family but not in the
same genus. These guidelines are conservative and assign clones
to a taxonomic group only when there is a high level of confidence. Note,
however, that the guidelines were developed from nearly complete sequences,
so caution should be used when applying them to partial sequences, which
may be more or less conserved than the entire gene.
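The cutoff rule can be sketched in Java as follows. This is an illustration under stated assumptions, not RDPquery's implementation: the class and method names (CutoffSketch, supportedRanks) are hypothetical, only the two cutoffs quoted in the text (92% for family, 95% for genus) are included, and the strict "greater than" comparison simply follows the wording "exceeds"; how the program treats a similarity exactly equal to a cutoff is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CutoffSketch {

    /** Cutoffs (percent similarity) quoted in the text, ordered from broader to narrower rank. */
    static final Map<String, Double> CUTOFFS = new LinkedHashMap<>();
    static {
        CUTOFFS.put("family", 92.0);
        CUTOFFS.put("genus", 95.0);
    }

    /** Returns the ranks whose cutoff is exceeded by the given similarity (assumed strict comparison). */
    static List<String> supportedRanks(double similarityPercent) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> entry : CUTOFFS.entrySet()) {
            if (similarityPercent > entry.getValue()) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The 94% clone from the text: same family as the type strain, genus not assigned.
        System.out.println(supportedRanks(94.0)); // prints "[family]"
        System.out.println(supportedRanks(96.0)); // prints "[family, genus]"
        System.out.println(supportedRanks(90.0)); // prints "[]"
    }
}
```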

Figure 2. Survey of taxonomic assignments in Bergey's Manual of Systematic
Bacteriology. At each level, the rRNA sequence similarity was determined
for representatives of different taxa from within the same higher taxonomic
group. Thus, at the genus level, representatives of genera within the
same family were compared. At the family level, representatives within
the same order were compared. All sequences used were from type strains
and >1300 bp. No more than six sequences were selected from any
one taxon.
Licensing
The RDPquery source code is licensed under the GNU
General Public License. If you use RDPquery in a published work
or product, please include the citation:
Dyszynski, G. and Sheldon, W.M. RDPquery:
A Java program from the Sapelo Island Microbial Observatory for automatic
classification of bacterial 16S rRNA sequences based on Ribosomal Database
Project taxonomy and Smith-Waterman alignment. (http://simo.marsci.uga.edu/public_db/rdp_query.htm,
[version used]).
Downloads
RDPquery version 2.7 (October 2006)
Documentation only (Adobe PDF)
Source code and documentation (Zip archive)