About rrnDB

The ribosomal RNA operon copy number database is a publicly available, curated resource for ribosomal operon (rrn) copy number information for Bacteria and Archaea.

Previous releases of rrnDB are described in the following publications:

Stoddard S.F., Smith B.J., Hein R., Roller B.R.K. and Schmidt T.M. (2015) rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Research 2014; doi: 10.1093/nar/gku1201. [PMID:25414355]
Fulltext [HTML] [PDF]
Lee,Z.M., Bussema,C. 3rd and Schmidt,T.M. (2009) rrnDB: documenting the number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids Res., 37, Database issue, D489-D493; doi: 10.1093/nar/gkn689. [PMID:18948294]
Fulltext [HTML] [PDF]
Klappenbach,J.A., Saxman,P.R., Cole,J.R. and Schmidt,T.M. (2001) rrnDB: the ribosomal RNA operon copy number database. Nucleic Acids Res., 29(1), 181-4. [PMID:11125085]
Fulltext [HTML] [PDF]

General Description

The rrnDB is a curated database that catalogs the number of genes encoding 16S and 23S ribosomal RNAs in Bacteria and Archaea. Typically, a single copy of each of these genes is clustered into an rRNA operon, with 1 to 15 rRNA operons present per genome. Because the number of genes encoding tRNAs is positively correlated with the number of rRNA-encoding genes (1), tRNA gene copy numbers are included. Data are gathered primarily from published genome sequences and also from published articles that include estimates of the number of rRNA-encoding genes. These data are important to microbiologists because the number of ribosomal RNA genes is indicative of where a bacterium lies on a spectrum of ecological strategies between oligotrophy (few rRNA genes) and copiotrophy (many rRNA genes) (3,4). The demand for rapid synthesis of ribosomes in copiotrophic bacteria is proposed to be the selective pressure driving the maintenance of multiple copies of rRNA genes per genome.

The 16S rRNA gene is a popular target for culture-independent community composition surveys in microbiology. Analysis pipelines usually produce estimates of per-taxon relative abundances based on the number of copies of 16S genes recovered in a sequence library. Unfortunately, because the per-genome copy number of the 16S gene varies, a frequently recovered sequence may represent a high copy number taxon of lesser abundance or a low copy number taxon of higher abundance. Given knowledge of gene copy number, molecular surveys can be adjusted to reduce this source of error. The accuracy of such an adjustment depends on a reference database of known 16S copy numbers mapped to a taxonomy or phylogeny. To serve this need, the rrnDB provides curated 16S rRNA gene copy number information that can be accessed online and downloaded in a machine-processable format for use in other applications (the pan-taxa statistics).
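The adjustment described above can be illustrated with a minimal sketch. It assumes that per-taxon read counts and per-taxon 16S copy numbers are already available as dictionaries; the function name, taxon labels and numbers are hypothetical and chosen only for illustration. They are not taken from rrnDB.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Return copy-number-adjusted relative abundances per taxon.

    read_counts:  taxon -> number of 16S reads recovered (hypothetical input)
    copy_numbers: taxon -> 16S gene copies per genome (hypothetical input)
    """
    # Dividing reads by copy number estimates the number of genomes sampled.
    genome_equivalents = {
        taxon: reads / copy_numbers[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(genome_equivalents.values())
    # Normalise so that the adjusted abundances sum to 1.
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Illustrative values only: taxon_A yields more reads but has 7 copies per genome.
read_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}
abundances = correct_for_copy_number(read_counts, copy_numbers)
print(abundances)  # {'taxon_A': 0.25, 'taxon_B': 0.75}
```

In this example taxon_A accounts for 70% of the reads but only 25% of the estimated genomes once its higher copy number is taken into account.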

References

Major Data Sources

Starting with v5.0 the data sources for rrnDB are as follows.

NCBI genome assemblies and annotation data are acquired from the NCBI FTP server.
NCBI taxonomy data are acquired from the NCBI FTP server.
RDP taxonomy data are acquired from the RDP Classifier tool of the Ribosomal Database Project (RDP).
Records inherited from rrnDB v3.1.227 are based on empirical determination of rrn copy number using various methods not involving finished genome sequences.

Further information about data sources is available in the rrnDB version history.

Records Curation

Maintaining genome-based resources involves a trade-off between manual curation of records, which is slow but leads to improved data quality, and machine processing of records, which facilitates higher throughput but can compromise data quality owing to genome assembly or annotation errors. Starting with version 5.0, rrnDB obtains 16S gene copy numbers by blasting selected known 16S sequences against NCBI whole genome assemblies and processing the resulting BLAST alignments. The objective of this approach is to reduce dependency on genome annotation and to achieve more timely updates.

The majority of genome records in rrnDB are now identified by the NCBI assembly accessions on which they are based. Included in rrnDB are RefSeq and GenBank assemblies of the whole genome, without any gaps. GenBank assemblies that were withdrawn from RefSeq are excluded. When an assembly is in both RefSeq and GenBank, which is the usual case, the rrnDB record points to the RefSeq entry.

To determine the 16S copy number of a genome, a 16S reference sequence is blasted against the genome assembly and the resulting alignments are processed. For Archaea, NR_074239.1 (Methanohalophilus mahii DSM 5219) is used as the reference 16S gene, and for Bacteria, NR_024570.1 (Escherichia coli strain U 5/41). These were arbitrarily selected from the NCBI RefSeq Targeted Loci Project.
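As a rough illustration of how such alignments might be turned into a copy number, the sketch below counts distinct hit regions in BLAST tabular output (the standard 12-column `-outfmt 6` layout). This is not rrnDB's actual processing, which is not described here: the function name, the identity and length thresholds, the rule of merging overlapping hits, and the example hits on `contig_1` are all assumptions made for illustration.

```python
from collections import defaultdict


def count_16s_loci(blast_tabular, min_identity=80.0, min_length=1000):
    """Count non-overlapping hit regions in BLAST tabular (outfmt 6) text.

    The thresholds are hypothetical defaults, not values used by rrnDB.
    """
    intervals = defaultdict(list)
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject = fields[1]            # sseqid
        identity = float(fields[2])    # pident
        length = int(fields[3])        # alignment length
        # sstart > send for minus-strand hits, so order the coordinates.
        start, end = sorted((int(fields[8]), int(fields[9])))
        if identity >= min_identity and length >= min_length:
            intervals[subject].append((start, end))

    loci = 0
    for hits in intervals.values():
        hits.sort()
        current_end = None
        for start, end in hits:
            if current_end is None or start > current_end:
                loci += 1              # a new, separate region
                current_end = end
            else:
                current_end = max(current_end, end)  # overlaps previous hit
    return loci


# Made-up hits: two full-length matches (one on the minus strand) and one
# short fragment that fails the length threshold.
example = "\n".join([
    "\t".join(["NR_024570.1", "contig_1", "99.1", "1450", "13", "0",
               "1", "1450", "10001", "11450", "0.0", "2600"]),
    "\t".join(["NR_024570.1", "contig_1", "98.7", "1450", "19", "0",
               "1", "1450", "502450", "501001", "0.0", "2550"]),
    "\t".join(["NR_024570.1", "contig_1", "85.0", "200", "30", "0",
               "1", "200", "900001", "900200", "1e-40", "150"]),
])
copies = count_16s_loci(example)
print(copies)  # 2
```

Merging overlapping hits before counting avoids counting one locus twice when BLAST reports it as several alignments; a production pipeline would likely need further checks for fragmented or partial hits.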

A few genome records are withheld from rrnDB for more specific reasons. The withheld genomes page lists those records and gives a short description of the grounds for rejection.

Version Numbers

Using rrnDB v5.0 as the example, the left number represents a major internal change by rrnDB. The right number represents an update from at least one of the Major Data Sources. Versions before 5.0 had a third number representing incremental improvements to the software or website features not large enough to merit a major version increment.
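The numbering scheme can be made concrete with a small parsing sketch. The function and field names are hypothetical and are not part of any rrnDB tool; the sketch also assumes that the first two numbers of a three-part version carry the same meaning as in two-part versions.

```python
def parse_rrndb_version(version):
    """Split a version string such as 'v5.0' or 'v3.1.227' into named parts."""
    parts = [int(p) for p in version.lstrip("v").split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected version format: {version!r}")
    return {
        "major": parts[0],        # major internal change by rrnDB
        "data_update": parts[1],  # update from at least one major data source
        # Only versions before 5.0 carry a third, incremental number.
        "incremental": parts[2] if len(parts) == 3 else None,
    }


print(parse_rrndb_version("v5.0"))
# {'major': 5, 'data_update': 0, 'incremental': None}
print(parse_rrndb_version("v3.1.227"))
# {'major': 3, 'data_update': 1, 'incremental': 227}
```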

Links

Kyoto Encyclopedia of Genes and Genomes (KEGG). KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from genomic and molecular-level information. A major feature of KEGG is its creation and maintenance of various ontologies for understanding genomes and genome aggregates at higher levels than the sequence and annotation of individual genomes. The KEGG K number ontology for aggregating orthologous gene functions across genomes was used by rrnDB v4.

Ribosomal Database Project II. RDPII is a database of selected annotated bacterial 16S rRNA gene sequences. Additionally, it allows users to upload their own sequences to be aligned and classified based on RDPII’s taxonomic hierarchy, and to generate distance matrices for use with other analytical programs. It also provides analysis tools for users to build phylogenetic trees and to compare libraries.

NCBI Microbial Genomes has a large collection of publicly available genomic sequences from Bacteria and Archaea.

The Genomic tRNA Database. While we only provide the total number of tRNA genes in a genome, this database classifies tRNA gene copy numbers according to isotype and anticodon. Users can also view the secondary structure of selected tRNAs.

Integrated Microbial Genomes. IMG is a database that catalogs all publicly available genomes from all three domains of life.

NCBI Taxonomy. The classification system used in our database follows the NCBI Entrez Taxonomy.

Ribosomal Internal Spacer Sequence Collection. A database of internal transcribed spacer (ITS) sequences.

Protocols

Ribosomal RNA Operon Copy Number Determination Via Southern Hybridization Using Non-Radioactive Detection [Southern Hybridization.doc]