About rrnDB versions

rrnDB Version Number System: The leftmost number represents a major internal change to rrnDB. The middle number represents an update affecting only the data, such as new versions of KEGG, NCBI Taxonomy, or the RDP Classifier training set, new pan-taxa tables, or curation of individual records. The rightmost number, not used since 5.0, represents incremental improvements to the software or capabilities that are not large enough to merit a major version increment.
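As an illustration of the numbering scheme described above, the following minimal sketch splits a version string into its three parts. The names `RrnDBVersion` and `parse_version` are hypothetical and not part of rrnDB, and treating a missing rightmost number (as in "5.1") as 0 is an assumption made only so that versions can be compared.

```python
from typing import NamedTuple


class RrnDBVersion(NamedTuple):
    major: int     # leftmost number: major internal change
    data: int      # middle number: data-only update
    software: int  # rightmost number: incremental software change (unused since 5.0)


def parse_version(text: str) -> RrnDBVersion:
    """Parse strings such as '5.1', '4.4.4' or 'v3.1.227'."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected two or three numbers, got {text!r}")
    # Assumption: an absent rightmost number is treated as 0.
    while len(parts) < 3:
        parts.append(0)
    return RrnDBVersion(*parts)


if __name__ == "__main__":
    for label in ["5.1", "5.0", "4.4.4", "4.4.3", "3.1.227"]:
        print(label, parse_version(label))
```

Because a named tuple compares field by field, versions parsed this way sort in release order.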

Data Sources Used: The date and version of each of the major data sources that are used to build rrnDB are shown for each rrnDB version. Data source versions do not always change in concert with increments in the rrnDB version. For example, the rrnDB version could be incremented after curation of individual records, after a software-only change, or after a change in only one of the three major data sources.

rrnDB version history

Columns: rrnDB version; data source release dates (KEGG, NCBI Taxonomy, NCBI Assembly, RDP Classifier training set); changelog.
5.1 (current) Jan. 19, 2017 Jan. 19, 2017 RDP Classifier 2.12 Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Updated data from NCBI
6398 bacterial records representing 2466 species, and 234 archaeal records representing 183 species.
5.0 Oct. 17, 2016 Oct. 17, 2016 RDP Classifier 2.12 Trainset No:16
Taxonomy Version:
RDP 16S rRNA training set 16
Dropped KEGG as a data source. Started using NCBI genome assemblies as the primary data source. 16S copy numbers are obtained by searching for 16S copies in genome assemblies using BLAST. See the About rrnDB page for details.
5880 bacterial records representing 2327 species, and 230 archaeal records representing 179 species.
4.4.4 June 29, 2015 July 6, 2015 RDP Classifier 2.11 Train Set 15 Estimate: Updated the 16S copy numbers that RDP Classifier uses to make copy-number adjustments. The original training set used copy number data based on rrnDB 4.2.2. The update was done by retrieving 16S training set № 15 and replacing its copy number file with one made from the most recent pan-taxa 16S copy number statistics for RDP, available on the download page. RDP Classifier's train command was then used to generate a set of custom training files.
4.4.3 June 29, 2015 July 6, 2015 RDP Classifier 2.11 Train Set 15
  1. BioSample and Assembly accessions were added to the database for most genomes.
  2. BioProject accessions were changed to the full letter/integer expression instead of using just the integer bioproject ID.
  3. The 16S sequence-based quality control test (QC Test 5) that genomes pass through before they are added to rrnDB was changed. Starting with rrnDB 4.4.3, invalid 16S sequences are identified by a cascade of BLAST similarity searches and RDP Classifier screens. Further information about QC Test 5 can be found on the About rrnDB page. The BLAST database for this version was built from RDP 11.4 and contained 256,507 subject 16S sequences.
4.3.3 April 28, 2014 May 6, 2014 2014-09-17,
RDP Classifier training set No. 10
  1. Added the Estimate service to the rrnDB website.
  2. The pan-taxa 16S copy number statistics files for the RDP and NCBI taxonomy systems are more streamlined. Each row in the table (a distinct taxon) now has a parentid column identifying the row of the parent taxon in the same table. Rows that had previously represented gaps in the taxonomy lineage by using placeholder names were dropped from the table. In statistics calculations, each row contributes its mean directly to the statistics of its parent taxon.
  3. For RDP pan-taxa statistics only, the calculation of RDP genus statistics from the means of NCBI species aggregates has been discontinued. The statistics of each RDP genus are now calculated directly from the 16S copy numbers of the rrnDB records that map to that genus. The change affected the pan-taxa 16S means for 55 RDP genera, 8 of which underwent an absolute change of more than 0.5, with a maximum of 1.08. Those 8 most-affected genera are Paenibacillus, Bacillus, Clostridium III, Lachnospiracea_incertae_sedis, Rhodospirillum, Prosthecochloris, Porphyromonas, and Roseburia. Statistics for family level and up are calculated from the means of their subtaxa, as before.
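The roll-up described in the 4.3.3 changelog can be sketched roughly as follows. This is not rrnDB's actual code: the function name `pan_taxa_means`, the dictionary layout, and the taxon names and copy numbers are all invented for illustration. Only the ideas of a `parentid` link, genus means taken directly from records, and higher ranks taking the mean of their subtaxa means come from the text.

```python
from statistics import mean


def pan_taxa_means(rows, genus_records):
    """Compute a mean 16S copy number for every taxon in a parentid-linked table.

    rows: {taxon_id: {"name": str, "parentid": taxon_id or None}}
    genus_records: {genus_taxon_id: [16S copy number of each mapped record]}
    """
    # Index child rows under their parent so the tree can be walked top-down.
    children = {}
    for taxon_id, row in rows.items():
        children.setdefault(row["parentid"], []).append(taxon_id)

    means = {}

    def visit(taxon_id):
        if taxon_id in genus_records:
            # Genus: mean taken directly from the copy numbers of its records.
            means[taxon_id] = mean(genus_records[taxon_id])
        else:
            # Higher rank: mean of the means of its subtaxa.
            child_means = [visit(child) for child in children.get(taxon_id, [])]
            child_means = [m for m in child_means if m is not None]
            means[taxon_id] = mean(child_means) if child_means else None
        return means[taxon_id]

    for root in children.get(None, []):
        visit(root)
    return means


# Invented example: one family with two genera.
example_rows = {
    1: {"name": "FamilyA", "parentid": None},
    2: {"name": "GenusX", "parentid": 1},
    3: {"name": "GenusY", "parentid": 1},
}
example_records = {2: [7, 7, 7, 7, 6], 3: [2]}
example_means = pan_taxa_means(example_rows, example_records)

if __name__ == "__main__":
    for taxon_id, value in sorted(example_means.items()):
        print(example_rows[taxon_id]["name"], value)
```

In this sketch each genus contributes one number to its family regardless of how many records it holds.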
4.2.2 April 28, 2014 May 6, 2014 2014-09-17,
RDP Classifier training set No. 10
  1. The RDP taxonomy assigned to rrnDB genomes was updated using RDP training set 10.
  2. Downloadable, tab-delimited ASCII files containing most of the data content of rrnDB versions have been added to the website and are accessible via a newly-added "Download" tab in the web site navigation menu.
  3. A set of tab-delimited tables holding 16S copy number statistics for each taxon in the database ("pan-taxa statistics") has been added to the website download area. In the new files, the mean and standard deviation of 16S copy numbers for each taxon have been calculated from the means of the child taxa below it. When an rrnDB Search or Browse result contains a taxon that is overrepresented compared to other taxa in the search (the genus Escherichia, for example), the 16S copy number statistics of the search result can be misleading. The pan-taxa statistics reduce genome overrepresentation bias by aggregating statistics for each taxon into a single number from which higher-rank statistics are calculated.
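A small worked example may help show why aggregating per taxon reduces overrepresentation bias. The copy numbers and species labels below are invented. The use of the sample standard deviation is an assumption, since the text does not say which estimator rrnDB uses.

```python
from statistics import mean, stdev

# Invented 16S copy numbers: one heavily sequenced species and two sparse ones.
species_copy_numbers = {
    "species A": [7] * 50,
    "species B": [2],
    "species C": [3],
}

# Naive genus mean: every genome counts equally, so species A dominates.
all_genomes = [n for counts in species_copy_numbers.values() for n in counts]
naive_mean = mean(all_genomes)

# Pan-taxa style: each species is first reduced to its own mean,
# then the genus statistics are computed from those species means.
species_means = [mean(counts) for counts in species_copy_numbers.values()]
pan_taxa_mean = mean(species_means)
pan_taxa_sd = stdev(species_means)  # assumption: sample standard deviation

if __name__ == "__main__":
    print("naive mean over genomes:", round(naive_mean, 2))
    print("mean of species means:", pan_taxa_mean)
    print("SD of species means:", round(pan_taxa_sd, 2))
```

With these numbers the naive mean is pulled toward 7 by the 50 genomes of one species, while the mean of species means weights the three species equally.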
4.1.1 April 28, 2014 May 6, 2014 2012-06-01,
RDP Classifier training set No. 9
  1. Two additional quality control tests (QC Tests 5 & 6) were implemented. Sixteen rrnDB v4.0.0 genomes did not pass the tests and were removed. The removed genomes are (KEGG Accessions): T01955, T02726, T01956, T02880, T01813, T02384, T01818, T02205, T02827, T02615, T01527, T01684, T00988, T02618, T00383, T02933.
  2. The method of mapping RDP taxonomy to rrnDB records has changed. The new method uses the RDP Classifier to assign RDP taxonomy to rrnDB genomes. Assignments in rrnDB v4.1.1 were made using RDP Classifier training set No. 9. Previously we had reported that Classifier training set No. 10 had been used in rrnDB v4.1.1, but that was a mistake.
  3. This is still a beta version.
4.0.0 April 28, 2014 May 6, 2014 2014-03-07,
RDP Release 11.2
This version is an almost total replacement of rrnDB v3.1.227, with a major redesign of the database and user interface software, and a major change in the source of data used in rrnDB and the data processing methods. The number of records has approximately doubled. Data sources are NCBI/KEGG for genomes and genome annotations, and NCBI and RDP Release 11.2 for taxonomy. Genomes were mapped to RDP taxonomy by searching for 100% sequence identity between the 16S rRNA gene sequences of genomes in rrnDB and 16S rRNA sequences in RDP 11.2. This version uses various automated quality control filters to try to prevent genomes that show evidence of annotation errors from entering the database. 216 records from rrnDB v3.1.227, whose 16S gene copy number counts were derived from experimental evidence other than genome sequence annotations, were carried forward to v4.0.0.
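The 100% identity mapping mentioned for v4.0.0 could look something like the sketch below. It assumes identity means an exact full-length string match after simple normalization, which the text does not specify. The function names, the sequences, and the lineage strings are invented, and the actual search tool used by rrnDB is not stated here.

```python
def normalize(sequence: str) -> str:
    """Upper-case the sequence, drop whitespace, and write RNA 'U' as DNA 'T'."""
    return "".join(sequence.split()).upper().replace("U", "T")


def map_by_exact_identity(genome_16s, reference):
    """Assign reference lineages to genomes via exact 16S sequence matches.

    genome_16s: {genome_id: [16S sequence, ...]}
    reference: {16S sequence: lineage string}
    Returns {genome_id: sorted list of matched lineages (empty if none)}.
    """
    lookup = {normalize(seq): lineage for seq, lineage in reference.items()}
    assignments = {}
    for genome_id, sequences in genome_16s.items():
        hits = {lookup[normalize(s)] for s in sequences if normalize(s) in lookup}
        assignments[genome_id] = sorted(hits)
    return assignments


# Invented toy data; real 16S genes are roughly 1,500 bases long.
toy_reference = {"ACGTACGT": "Bacteria;GenusX", "GGGGCCCC": "Bacteria;GenusY"}
toy_genomes = {"g1": ["acguacgu", "ACGTACGA"], "g2": ["TTTT"]}
toy_assignments = map_by_exact_identity(toy_genomes, toy_reference)

if __name__ == "__main__":
    print(toy_assignments)
```

A genome with no exact match is left with an empty list. Under a strict identity criterion like this, even a single-base difference prevents a mapping.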
3.1.227 rrnDB v3.1.227 is the final implementation of rrnDB as described in 2009 in Nucleic Acids Res. 29(1):181-4.
  • 9 versions