How to use rrnDB
Searching the database
The database can be searched in several ways, all of which are accessible from the main search page, reachable via the Search link on the navigation menu.
- Taxonomy-based search
If you are interested in information about a specific taxon:
- Search Taxonomy
- You can enter a taxonomic name, or part of one, into the Search Taxonomy search field. A drop-down menu lets you specify which taxonomic system to use. The default is to use NCBI scientific names. You can choose to extend the search to include the other categories of names in the NCBI system, such as synonyms, misspellings, and historic names. The third option is to use the RDP taxonomic system.
- NCBI taxonomy ids can be entered directly into the Search Taxonomy search field.
- Browse Taxonomy
If you are undecided about what to look for, or uncertain of a taxon's spelling, you can use the Taxonomy Browser. The Taxonomy Browser has a drop-down menu for each taxonomic rank, listing the taxa for which rrnDB has a record. The browser can be switched between the NCBI and RDP taxonomies using the radio controls and the Change button on the right side of the browser.
- Search by 16S copy number
Use the Search Record Annotations form to find organisms with a specific 16S gene copy number or a range of copy numbers.
- For instance, to retrieve all records of organisms with six copies of the 16S gene, type 6 into the Search Record Annotations search field, select the 16S gene copy number radio control, and click the Search button.
- To get all records with six or more 16S gene copies, type >=6 into the search field.
- To get all records with copy numbers between three and five, type 3-5 into the search field. Remember to select the correct radio button for a 16S copy number search.
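The three query forms above can be illustrated with a small matcher. This is a minimal sketch, not rrnDB's actual search code: the function name `matches_copy_query`, the sample records, and the assumption that a range such as 3-5 includes both end points are all hypothetical.

```python
import re

def matches_copy_query(query: str, copies: int) -> bool:
    """Check a 16S copy number against one of the three documented query forms:
    "6" (exact), ">=6" (at least) or "3-5" (range).

    ASSUMPTION: ranges are inclusive at both ends.
    """
    q = query.replace(" ", "")
    if m := re.fullmatch(r">=(\d+)", q):
        return copies >= int(m.group(1))
    if m := re.fullmatch(r"(\d+)-(\d+)", q):
        return int(m.group(1)) <= copies <= int(m.group(2))
    if re.fullmatch(r"\d+", q):
        return copies == int(q)
    raise ValueError(f"unrecognized copy-number query: {query!r}")

# Hypothetical (record id, 16S copy number) pairs, for illustration only.
records = [("rec-a", 1), ("rec-b", 4), ("rec-c", 6), ("rec-d", 7)]

print([rid for rid, n in records if matches_copy_query("6", n)])    # ['rec-c']
print([rid for rid, n in records if matches_copy_query(">=6", n)])  # ['rec-c', 'rec-d']
print([rid for rid, n in records if matches_copy_query("3-5", n)])  # ['rec-b']
```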
- Keyword search
To find records that contain a certain keyword in their annotation, enter that term in the Search Record Annotations search field and select the Keyword search radio button. The search engine looks for your keyword in the data source organism name, curator, evidence, and reference fields of each record.
- List all records
To list all records, click the Search button of the Search Record Annotations form without entering anything into the search field. To get all records that the RDP classifier was able to classify, select RDP names from the drop-down list of the Search Taxonomy form and click the Search Taxonomy button with the search field left empty.
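The keyword search and the empty-query listing can be pictured as one filter over the four annotation fields named above. This is a sketch under stated assumptions: the field names in `SEARCHED_FIELDS`, the case-insensitive substring matching, and the sample records are hypothetical, and rrnDB's search engine may match terms differently.

```python
# Hypothetical names for the four annotation fields searched by keyword.
SEARCHED_FIELDS = ("data_source_org_name", "curator", "evidence", "reference")

def keyword_search(records, keyword):
    """Return records whose searched fields contain `keyword`.

    ASSUMPTIONS: matching is a case-insensitive substring test, and an
    empty keyword returns every record (the "list all records" case).
    """
    kw = keyword.strip().lower()
    if not kw:
        return list(records)
    return [
        rec for rec in records
        if any(kw in (rec.get(field) or "").lower() for field in SEARCHED_FIELDS)
    ]

# Made-up records for illustration only.
records = [
    {"id": "r1", "data_source_org_name": "Example organism A", "curator": "",
     "evidence": "genome sequence", "reference": ""},
    {"id": "r2", "data_source_org_name": "Example organism B", "curator": "",
     "evidence": "hybridization", "reference": ""},
]

print([r["id"] for r in keyword_search(records, "hybrid")])  # ['r2']
print([r["id"] for r in keyword_search(records, "")])        # ['r1', 'r2']
```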
Interpreting search results
After you start a search, the search result page shows a table listing all records of organisms that match your search criteria.
- 16S Statistics
Just above the record listing, the page displays statistics of the 16S copy numbers of the records found by your search: a table lists the range, mode, median, arithmetic mean, and standard deviation, as well as the number of records in the search result. The histogram to the right gives an alternative view of the distribution of 16S copy numbers among the organisms in your search result.
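The statistics panel can be reproduced offline with Python's standard `statistics` module. This is a minimal sketch: whether rrnDB reports the sample or the population standard deviation, and how it breaks ties for the mode, are not stated here, so the choices below (`statistics.stdev`, and `statistics.mode` returning the first mode encountered) are assumptions. The input values are made up.

```python
import statistics
from collections import Counter

def copy_number_summary(copies):
    """Summarize 16S copy numbers like the statistics panel above the record listing.

    ASSUMPTIONS: stddev is the sample standard deviation (statistics.stdev);
    on ties, statistics.mode returns the first mode encountered (Python 3.8+).
    At least two values are required because of stdev.
    """
    return {
        "range": (min(copies), max(copies)),
        "mode": statistics.mode(copies),
        "median": statistics.median(copies),
        "mean": statistics.mean(copies),
        "stddev": statistics.stdev(copies),
        "count": len(copies),
        # (copy number, number of records) pairs: the data behind the histogram
        "histogram": sorted(Counter(copies).items()),
    }

# Hypothetical copy numbers from a search result.
summary = copy_number_summary([1, 2, 2, 4, 6])
print(summary["range"], summary["mode"], summary["mean"], summary["stddev"])
# (1, 6) 2 3 2.0
```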
- Record listing
The records matching your search criteria are listed in a table. The first column contains a link to each record's detail page (see below). The following columns are displayed:
- Data source record id
- Record identifiers starting with rrnDBv3- were inherited from rrnDB 3.1.227. All other identifiers are NCBI Assembly accessions, indicating the genome assembly on which the rrnDB record is based.
- Data source organism name
- The organism's name as described in the data source.
- NCBI organism name
- For records inherited from rrnDB 3.1.227, the corresponding NCBI scientific name. The NCBI scientific name becomes visible when you hover the mouse pointer over the field.
- RDP name
- The name for the organism as classified by the RDP classifier, usually at the rank of genus.
- 16S copy number
- The gene copy number as listed in each record. An n/a indicates missing information. Where the 16S gene count is missing from a record inherited from rrnDB 3.1.227, the copy number is interpolated from the 23S count, which is available in these cases. The 23S copy number is not shown in the records table but is available on the record detail page. Although it is displayed as n/a, the interpolated value is used in the 16S count statistics, when sorting the table by the 16S column, and when searching the database by 16S copy number.
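The split between the displayed value (n/a) and the value used for statistics, sorting, and searching can be sketched as two functions. This page does not spell out the interpolation rule, so `from_23s` is a hypothetical placeholder whose identity default is for illustration only; the `Record` type and its field names are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Record:
    record_id: str
    copies_16s: Optional[int]  # None when the record has no 16S count
    copies_23s: Optional[int]  # shown on the detail page, not in the table

def effective_16s(rec: Record,
                  from_23s: Callable[[int], int] = lambda n: n) -> Optional[int]:
    """Value used for statistics, sorting and copy-number search.

    ASSUMPTION: `from_23s` stands in for rrnDB's interpolation rule, which is
    not described here; the identity default is a placeholder only.
    """
    if rec.copies_16s is not None:
        return rec.copies_16s
    if rec.copies_23s is not None:
        return from_23s(rec.copies_23s)
    return None

def displayed_16s(rec: Record) -> str:
    # The table shows n/a whenever the 16S count itself is missing,
    # even when an interpolated value is used behind the scenes.
    return "n/a" if rec.copies_16s is None else str(rec.copies_16s)

# Hypothetical records; sorting uses the effective (possibly interpolated) value.
recs = [Record("a", 7, 7), Record("b", None, 5), Record("c", 3, None)]
ordered = sorted(recs, key=effective_16s)
print([(r.record_id, displayed_16s(r)) for r in ordered])
# [('c', '3'), ('b', 'n/a'), ('a', '7')]
```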
- Detail page
Each organism record has a detail page. In addition to the data shown in the search result table the detail page contains:
- Data source record id
- Record identifiers starting with rrnDBv3- were inherited from rrnDB 3.1.227. The other record ids (of the form GCA_#########.# or GCF_#########.#) are the NCBI Assembly accessions from which the genome record is derived. Each assembly accession is linked to the corresponding NCBI web page, where sequence data are available.
- BioSample link
- A link to each organism's NCBI BioSample web page.
- BioProject link
- A link to each organism's NCBI BioProject web page. Sequence data can be obtained here as well.
- NCBI taxid
- The NCBI taxonomy id of each organism. Linked to the corresponding record at the NCBI Taxonomy Browser web page.
- NCBI taxonomic lineage
- The organism's taxonomic lineage using the NCBI taxonomy.
- RDP taxonomic lineage
- The organism's taxonomic lineage using the RDP taxonomy, as classified by the RDP classifier.
- Curator
- Records that were curated manually contain a reference to the curator. Typically, records inherited from rrnDB 3.1.227 have this field populated.
- Evidence
- A short description of the evidence on which the organism's 16S copy number is based.
- Note
- Additional information about some records.
- References
- Records inherited from rrnDB 3.1.227 typically have a bibliographic reference with a link to their PubMed web page.
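When working with downloaded records, the id prefix tells you where a record came from. The sketch below classifies ids by the two origins described above; the accession pattern (GCA_ or GCF_, nine digits, a dot, and a version number) is an assumption about NCBI's format, and both example ids are made up.

```python
import re

# Assumed shape of an NCBI Assembly accession: GCA_ or GCF_, nine digits, a dot, a version.
ASSEMBLY_ACCESSION = re.compile(r"GC[AF]_\d{9}\.\d+")

def record_id_origin(record_id: str) -> str:
    """Tell inherited rrnDB 3.1.227 records apart from assembly-based records by id."""
    if record_id.startswith("rrnDBv3-"):
        return "inherited from rrnDB 3.1.227"
    if ASSEMBLY_ACCESSION.fullmatch(record_id):
        return "NCBI Assembly accession"
    return "unrecognized"

# Both ids below are made up for illustration.
print(record_id_origin("rrnDBv3-0001"))     # inherited from rrnDB 3.1.227
print(record_id_origin("GCA_123456789.1"))  # NCBI Assembly accession
```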
Using Estimate
The Estimate function is accessed from the main navigation menu via the Estimate tab. Estimate lets you upload a FASTA-formatted file of 16S sequences and runs the RDP Classifier's classify command on them. For information on the RDP Classifier, consult the RDP Classifier web site operated by the Ribosomal Database Project (RDP).
- Uploading a FASTA file and running RDP Classifier
Use the file upload dialog to select a FASTA file on your computer. You may change the confidence cutoff from the default to any fraction between 0 and 1. Clicking the Upload button starts the process, which may take a while depending on the number of sequences uploaded.
- Getting the Estimate result
The result of running the classifier on your set of 16S sequences is a set of three output files:
- Classification file
- Full classification for each sequence with confidence value.
- Hierarchical abundance file
- Contains the abundance information, listing each taxon and the number of sequences that were mapped to this taxon.
- Copy-number adjusted hierarchical abundance file
- Contains 16S gene copy-number corrected abundance information. The values are obtained by dividing the counts in the hierarchical abundance file by the mean 16S copy number of the respective taxon. The mean 16S copy numbers of all taxa are available for download in the pan-taxa statistics files on the rrnDB Download page.
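The correction is a per-taxon division, which a few lines make concrete. This is a sketch with hypothetical taxon names and numbers; how Estimate handles a taxon without a mean copy number is not stated here, so skipping such taxa is an assumption.

```python
def adjust_abundance(counts, mean_copies):
    """Copy-number adjusted abundance: sequences per taxon / mean 16S copies of that taxon.

    counts: taxon -> number of sequences assigned (hierarchical abundance file)
    mean_copies: taxon -> mean 16S copy number (e.g. the pan-taxa 'mean' column)
    ASSUMPTION: taxa with no (or a zero) mean are left out.
    """
    return {taxon: n / mean_copies[taxon]
            for taxon, n in counts.items()
            if mean_copies.get(taxon)}

# Hypothetical numbers: a 4x difference in raw counts disappears after adjustment.
counts = {"Genus A": 120, "Genus B": 30}
means = {"Genus A": 6.0, "Genus B": 1.5}
print(adjust_abundance(counts, means))  # {'Genus A': 20.0, 'Genus B': 20.0}
```

A taxon with many 16S copies per genome contributes more sequences per cell, so dividing by the mean copy number moves the counts toward relative genome (cell) abundance.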
Downloading data
Current and older data sets from the rrnDB can be downloaded from the rrnDB Download page. The data are made available in individual tab-delimited text files suitable for import into spreadsheet software or relational databases. The first line in each file contains the column names. Each file's name contains a string indicating the version of rrnDB from which the files were generated. Consult the About rrnDB Versions page for information about the changes between current and older versions of rrnDB.
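Because each file is tab-delimited with a header line, Python's `csv.DictReader` can load it directly. This is a minimal sketch: the in-memory sample stands in for a downloaded file and its contents are made up, and treating quote characters as plain text (`csv.QUOTE_NONE`) is an assumption about how the files are written.

```python
import csv
import io

def read_rrndb_table(handle):
    """Parse a tab-delimited download whose first line holds the column names.

    Returns one dict per row keyed by column name; values stay strings.
    ASSUMPTION: quote characters are ordinary text (csv.QUOTE_NONE).
    """
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))

# io.StringIO stands in for open(path, newline=""); the contents are made up.
sample = io.StringIO("name\trank\tmean\nGenus A\tgenus\t6.0\n")
rows = read_rrndb_table(sample)
print(rows[0]["name"], float(rows[0]["mean"]))  # Genus A 6.0
```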
- Pan-taxa statistics
Pan-taxa statistics files contain per-taxon aggregated 16S copy number statistics, one taxon per row. There is one file for each taxonomic system, NCBI and RDP. A description of the table columns follows:
- id
- Row identifier, only meaningful within the table and may change between updates.
- rank
- Taxonomic rank.
- name
- The name of the taxon.
- childcount
- Number of subtaxa that contribute to this taxon's statistics, i.e. number of rows with parentid equal to this row's id (cf. parentid).
- min, max
- Minimum, maximum 16S gene copy number aggregated from the taxon's genome records.
- mode, median
- Mode and median of the 16S gene copy numbers of the genomes belonging to a taxon. In practice, these values are calculated only for taxa at the RDP genus and NCBI species levels. Since the statistics for higher ranks are aggregates over means, i.e. non-discrete values, it does not make sense to calculate mode and median above the lowest ranks.
- mean, stddev
- Arithmetic mean and standard deviation representing the 16S gene copy number of a pan-taxon. These values are calculated from the means of the pan-taxa of immediate lower rank.
- sum16slist
- List of the 16S copy numbers of the genomes contributing to a pan-taxon.
- parentid
- The row identifier of a taxon's parent. The mean 16S count of a parent is calculated from the means of all taxa sharing this parentid. Hence, the mean 16S count is a mean of means, except at the lowest level where the 16S copy numbers, over which the mean is taken, come from genome records. Note that the parent-child relation can skip ranks wherever there are gaps in a genome's lineage.
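The "mean of means" rule can be checked by recomputing means from the id, parentid, and sum16slist columns. This is a hypothetical re-derivation for illustration, not rrnDB's own code; the rows are made up, and only the mean is recomputed (stddev would follow the same parent-child structure).

```python
import statistics
from collections import defaultdict

def pan_taxa_means(rows):
    """Recompute per-taxon mean 16S counts as a mean of means.

    rows: dicts with 'id', 'parentid' (None at the top) and, for lowest-rank
    taxa only, 'sum16slist' (the genomes' 16S copy numbers).
    """
    children = defaultdict(list)
    for row in rows:
        if row["parentid"] is not None:
            children[row["parentid"]].append(row["id"])
    by_id = {row["id"]: row for row in rows}
    means = {}

    def mean_of(taxon_id):
        if taxon_id not in means:
            kids = children.get(taxon_id)
            if kids:
                # Higher rank: average of the children's means, however many genomes each has.
                means[taxon_id] = statistics.mean([mean_of(k) for k in kids])
            else:
                # Lowest rank: average over the genome records themselves.
                means[taxon_id] = statistics.mean(by_id[taxon_id]["sum16slist"])
        return means[taxon_id]

    for row in rows:
        mean_of(row["id"])
    return means

# Hypothetical family with two genera: four 1-copy genomes and one 7-copy genome.
rows = [
    {"id": 1, "parentid": None},
    {"id": 2, "parentid": 1, "sum16slist": [1, 1, 1, 1]},
    {"id": 3, "parentid": 1, "sum16slist": [7]},
]
means = pan_taxa_means(rows)
print(means[1])  # 4, whereas the pooled genome mean would be 11 / 5 = 2.2
```

Because children are found through parentid rather than by rank, the same logic works where a lineage skips ranks, and a row's childcount corresponds to `len(children[id])` in this sketch.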