About rrnDB versions

rrnDB Version Number System: The leftmost number represents a major internal change to rrnDB. The middle number represents an update affecting only the data, such as new versions of KEGG, NCBI Taxonomy, or the RDP Classifier training set, new pan-taxa tables, or curation of individual records. The rightmost number, not used since 5.0, represents incremental improvements to the software or capabilities that are not large enough to merit a major version increment.
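The numbering scheme above can be sketched in a few lines of Python. This is only an illustration: the names `RrndbVersion` and `parse_rrndb_version` are hypothetical and not part of rrnDB, and treating a missing rightmost number as 0 is an assumption made here so that two-part and three-part versions can be compared.

```python
from typing import NamedTuple


class RrndbVersion(NamedTuple):
    major: int     # leftmost number: major internal change
    data: int      # middle number: data-only update
    software: int  # rightmost number: incremental software change (unused since 5.0)


def parse_rrndb_version(text: str) -> RrndbVersion:
    """Split a version string such as '5.9' or 'v4.4.3' into its components."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected rrnDB version format: {text!r}")
    # Assumption: two-part versions (5.0 and later) get a rightmost number of 0.
    if len(parts) == 2:
        parts.append(0)
    return RrndbVersion(*parts)


# Named tuples compare field by field, so versions sort chronologically.
history = sorted(parse_rrndb_version(v) for v in ["5.9", "4.4.3", "v3.1.227", "5.0"])
```

Because the components are compared left to right, a data-only update such as 5.8 to 5.9 sorts after every 4.x.x release without any special handling.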

Data Sources Used: The date and version of each of the major data sources that are used to build rrnDB are shown for each rrnDB version. Data source versions do not always change in concert with increments in the rrnDB version. For example, the rrnDB version could be incremented after curation of individual records, a software-only change, or a change in only one of the three major data sources.

rrnDB version history

rrnDB version KEGG NCBI Taxonomy NCBI Assembly RDP Classifier training set Changelog
5.9

current
April 24, 2024 April 24, 2024 RDP Naive Bayesian rRNA Classifier Version 2.14,
July 2023
Trainset No:19
Taxonomy Version:
RDP 16S rRNA training set No. 19 07/2023
Updated data from NCBI FTP server:
  • 40019 Bacteria records representing 10169 species
  • 585 Archaeal records representing 458 species
Changes:
  • File download URL paths changed to /downloads/. The old URLs should still work.
  • Estimate result file URL paths also changed. The old URLs should still work as well.
  • Estimate: The routine that checks whether an uploaded fasta file is acceptable was replaced. Some files that were accepted before might now be rejected, and vice versa. When a file is rejected, a message should now be displayed stating specifically what triggered the rejection.
5.8 June 9, 2022 June 9, 2022 RDP Classifier 2.13
Trainset No:18
Taxonomy Version:
RDP 16S rRNA training set 18
Updated data from NCBI FTP server:
  • 27655 Bacteria records representing 7203 species
  • 448 Archaeal records representing 348 species
5.7 Jan. 11, 2021 Jan. 11, 2021 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI FTP server:
  • 20255 Bacteria records representing 5716 species
  • 387 Archaeal records representing 292 species
5.6 Oct. 25, 2019 Oct. 25, 2019 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI FTP server:
  • 15486 Bacteria records representing 4568 species
  • 343 Archaeal records representing 261 species
New Features:
  • The first column of the pan-taxa tables, which are available in the download section, now contains the NCBI taxonomy id for each taxon. Previously this field exposed an internal database row identifier. Note that in the RDP version of the table some fields are empty due to the difficulties of mapping RDP to NCBI taxonomy.
5.5 Sept. 20, 2018 Sept. 20, 2018 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI FTP server:
  • 10996 Bacteria records representing 3484 species
  • 282 Archaeal records representing 220 species
5.4 Oct. 10, 2017 Oct. 10, 2017 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16

Updated data from NCBI FTP server:

  • 8223 Bacteria records representing 2913 species
  • 262 Archaeal records representing 201 species
Two genomes (GCF_002243515.1, GCF_002162355.1) that were introduced in 5.3 were removed due to unrealistically high 16S copy numbers (27 vs. 37).

Fixes:

  • Omit withheld genome records from download files. Previously, a small number of records that were not available on the website (5 or fewer) made it into the download files. Old files remain available unchanged.

5.3 Sept. 7, 2017 Sept. 7, 2017 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI FTP server:
  • 7951 Bacteria records representing 2821 species
  • 260 Archaeal records representing 199 species
New Features:
  • Enable annotation search with partial assembly accessions (data source record ids)
  • Add NCBI tax id column to downloadable rrnDB data set
5.2 May 17, 2017 May 17, 2017 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI FTP server:
  • 7198 Bacteria records representing 2644 species
  • 240 Archaeal records representing 188 species
New features:
  • We are providing all 16S rRNA sequences for download in a single file. The sequences are based on NCBI's annotation. Sequences of 50 genes (affecting 25 genomes) are currently missing due to missing or inconsistent annotations.
Fixes:
  • Include the correct name of the uploaded file in the hierarchical estimate output column header, instead of a random string
  • Fix search by tax id: it now shows only exactly matching results, not results where the tax id is a substring of another
5.1 Jan. 19, 2017 Jan. 19, 2017 RDP Classifier 2.12
Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI
6398 bacterial records representing 2466 species, and 234 archaeal records representing 183 species.
5.0 Oct. 17, 2016 Oct. 17, 2016 RDP Classifier 2.12 Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Dropped KEGG as a data source. Started using NCBI genome assemblies as primary data source. 16S copy numbers obtained by searching for 16S copies in genome assemblies using BLAST. See About rrnDB page for details.
5880 Bacteria records representing 2327 species, and 230 Archaeal records representing 179 species.
4.4.4 June 29, 2015 July 6, 2015 RDP Classifier 2.11 Train Set 15 Estimate: Updated the 16S copy numbers that RDP Classifier uses to make copy-number adjustments. The original training set used copy number data based on rrnDB 4.2.2. The update was done by retrieving 16S training set № 15, replacing the copy number file with one made from the most recent pan-taxa 16S copy number statistics for RDP, available at the download page. RDP Classifier's train command was then used to generate a set of custom training files.
4.4.3 June 29, 2015 July 6, 2015 RDP Classifier 2.11 Train Set 15
  1. BioSample and Assembly accessions were added to the database for most genomes.
  2. BioProject accessions were changed to the full letter/integer expression instead of using just the integer bioproject ID.
  3. The 16S sequence-based quality control test (QC Test 5) that genomes pass through before they are added to rrnDB was changed. Starting with rrnDB 4.4.3 invalid 16S sequences are identified by a cascade of BLAST similarity searches and RDP Classifier screens. Further information about QC Test 5 can be found on the About rrnDB page. The blast database for this version was built from RDP 11.4 and contained 256,507 subject 16S sequences.
4.3.3 April 28, 2014 May 6, 2014 2014-09-17,
RDP Classifier training set No. 10
  1. Added the Estimate service to the rrnDB website.
  2. The pan-taxa 16S copy number statistics files for RDP and NCBI taxonomy systems are more streamlined. Each row in the table (a distinct taxon) now has a parentid column identifying the row of the parent taxon in the same table. Rows that had previously represented gaps in the taxonomy lineage by using placeholder names were dropped from the table. In statistics calculations, each row contributes its mean directly to the statistics of its parent taxon.
  3. For RDP pan-taxa statistics only, the calculation of RDP genus statistics from the means of NCBI species aggregates has been discontinued. The stats of each RDP genus are now calculated directly from the 16S copy numbers of the rrnDB records that map to that genus. The change affected the pan-taxa 16S means for 55 RDP genera, 8 of which underwent an absolute change of more than 0.5, with a maximum of 1.08. Those 8 most-affected genera are Paenibacillus, Bacillus, Clostridium III, Lachnospiracea_incertae_sedis, Rhodospirillum, Prosthecochloris, Porphyromonas, and Roseburia. Statistics for family level and up are calculated from the means of their subtaxa, as before.
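The roll-up described in items 2 and 3 can be sketched as a walk over the parentid links. This is a minimal illustration under stated assumptions, not the rrnDB implementation: only the parentid column is named in the text, so the row layout, the taxon names, the copy numbers, and the function `pan_taxa_means` are all hypothetical.

```python
from statistics import mean

# Hypothetical table rows: (id, parentid, rank, name).
taxa = [
    (1, None, "family", "FamilyA"),
    (2, 1, "genus", "GenusX"),
    (3, 1, "genus", "GenusY"),
]

# Hypothetical 16S copy numbers of the records that map to each genus id.
genus_records = {2: [7, 7, 8, 6], 3: [2]}


def pan_taxa_means(taxa, genus_records):
    """Return {taxon id: mean 16S copy number} following the parentid links."""
    children = {}
    for taxon_id, parent_id, _rank, _name in taxa:
        children.setdefault(parent_id, []).append(taxon_id)

    means = {}

    def visit(taxon_id):
        if taxon_id in genus_records:
            # Genus level: computed directly from the records mapped to the genus.
            means[taxon_id] = mean(genus_records[taxon_id])
        else:
            # Family level and up: mean of the means of the subtaxa.
            means[taxon_id] = mean(visit(c) for c in children.get(taxon_id, []))
        return means[taxon_id]

    for root in children.get(None, []):
        visit(root)
    return means


means = pan_taxa_means(taxa, genus_records)
```

In this toy table each genus contributes a single value to its family, regardless of how many records it holds. A taxon with neither records nor subtaxa would raise a `StatisticsError` here; how such rows are handled in rrnDB is not stated in the text.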
4.2.2 April 28, 2014 May 6, 2014 2014-09-17,
RDP Classifier training set No. 10
  1. The RDP taxonomy assigned to rrnDB genomes was updated using RDP training set 10.
  2. Downloadable, tab-delimited ASCII files containing most of the data content of rrnDB versions have been added to the website and are accessible via a newly-added "Download" tab in the web site navigation menu.
  3. A set of tab-delimited tables holding 16S copy number statistics for each taxon in the database ("pan-taxa statistics") has been added to the website download area. In the new files, the mean and standard deviation of 16S copy numbers for each taxon have been calculated from the means of the child taxa below it. When an rrnDB Search or Browse result contains a taxon that is overrepresented compared to other taxa in the search (the genus Escherichia, for example), the 16S copy number statistics of the search result can be misleading. The pan-taxa statistics reduce genome overrepresentation bias by aggregating statistics for each taxon into a single number from which higher-rank statistics are calculated.
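The effect of overrepresentation can be seen in a small numeric sketch. The taxon names and copy numbers below are invented for illustration only; the comparison is between a mean pooled over every genome and a mean taken over one value per child taxon, as the pan-taxa statistics do.

```python
from statistics import mean, stdev

# Hypothetical copy numbers of the genomes under three child taxa of one parent.
child_taxa = {
    "overrepresented_taxon": [7] * 50,  # many sequenced genomes
    "taxon_b": [4, 5],
    "taxon_c": [3],
}

# Pooled statistics: every genome counts once, so the large taxon dominates.
pooled = [n for counts in child_taxa.values() for n in counts]
pooled_mean = mean(pooled)

# Pan-taxa style: reduce each child taxon to its mean, then aggregate the means.
child_means = [mean(counts) for counts in child_taxa.values()]
pan_taxa_mean = mean(child_means)
pan_taxa_sd = stdev(child_means)
```

With these made-up numbers the pooled mean is pulled toward 7 by the 50 genomes of the first taxon, while the mean of the three child means sits closer to the middle of the range.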
4.1.1 April 28, 2014 May 6, 2014 2012-06-01,
RDP Classifier training set No. 9
  1. Two additional quality control tests (QC Tests 5 & 6) were implemented. Sixteen rrnDB v4.0.0 genomes did not pass the tests and were removed. The removed genomes are (KEGG Accessions): T01955, T02726, T01956, T02880, T01813, T02384, T01818, T02205, T02827, T02615, T01527, T01684, T00988, T02618, T00383, T02933.
  2. The method of mapping RDP taxonomy to rrnDB records has changed. The new method uses the RDP Classifier to assign RDP taxonomy to rrnDB genomes. Assignments in rrnDB v4.1.1 were made using RDP Classifier training set No. 9. Previously we had reported that Classifier training set No. 10 had been used in rrnDB v4.1.1, but that was a mistake.
  3. This is still a beta version.
4.0.0 April 28, 2014 May 6, 2014 2014-03-07,
RDP Release 11.2
This version is an almost total replacement of rrnDB v3.1.227 with a major redesign of the database and user interface software, and a major change in the source of data used in rrnDB and the data processing methods. The number of records has approximately doubled. Data sources are NCBI/KEGG for genomes and genome annotations, and NCBI and RDP Release 11.2 for taxonomy. Genomes were mapped to RDP taxonomy by searching for 100% sequence identity between the 16S rRNA gene sequences of genomes in rrnDB and 16S rRNA sequences in RDP 11.2. This version uses various automated quality control filters to try to prevent genomes that show evidence of annotation errors from entering the database. 216 records from rrnDB v3.1.227, whose 16S gene copy number counts were derived from experimental evidence other than genome sequence annotations, were carried forward to v4.0.0.
3.1.227 rrnDB v3.1.227 is the final implementation of rrnDB as described in 2009 in Nucleic Acids Res. 29(1):181-4.