Searching the database [contents]

The database can be searched in several ways, all of which are available on the main search page, reached via the Search link in the navigation menu.

  1. Search by 16S copy number [up]

    Use the Search Record Annotations form if you are interested in finding organisms with a specific 16S gene copy number or a range of numbers.

    • For instance, to retrieve all records of organisms having six copies of the 16S gene, type 6 into the Search Record Annotations search field, select the 16S gene copy number radio button, and click the Search button.
    • To get all records with six or more 16S gene copies, type >=6 into the search field.
    • To get all records with copy numbers between three and five, type 3-5 into the search field. Remember to select the correct radio button for the 16S copy number search.
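
    The three query forms above (a single number, >=N, and a range N-M) can be sketched as a small parser. This is only an illustration of the documented syntax, not rrnDB's actual implementation. The function names are hypothetical, and treating the range as inclusive at both ends is an assumption.

```python
import re


def parse_copy_number_query(query):
    """Turn a query such as '6', '>=6' or '3-5' into (low, high) bounds.

    high is None for an open-ended query. Only the three forms described
    in the help text are handled; anything else raises ValueError.
    """
    q = query.strip()

    # Exact copy number, e.g. "6"
    m = re.fullmatch(r"(\d+)", q)
    if m:
        n = int(m.group(1))
        return (n, n)

    # Lower bound, e.g. ">=6"
    m = re.fullmatch(r">=\s*(\d+)", q)
    if m:
        return (int(m.group(1)), None)

    # Range, e.g. "3-5" (assumed inclusive at both ends)
    m = re.fullmatch(r"(\d+)\s*-\s*(\d+)", q)
    if m:
        low, high = int(m.group(1)), int(m.group(2))
        if low > high:
            raise ValueError(f"empty range: {query!r}")
        return (low, high)

    raise ValueError(f"unsupported query: {query!r}")


def matches(copy_number, bounds):
    """Return True if copy_number falls within the parsed bounds."""
    low, high = bounds
    return copy_number >= low and (high is None or copy_number <= high)
```

    For example, matches(4, parse_copy_number_query("3-5")) is true, while matches(5, parse_copy_number_query(">=6")) is false.
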
  2. Keyword search [up]

    If you are interested in records that contain a certain keyword in their annotation, enter that term in the Search Record Annotations search field and select the Keyword search radio button. The search engine looks for your keyword in the data source organism name, curator, evidence, and reference fields of each rrnDB record.

  3. List all records [up]

    To list all records, click the Search button of the Search Record Annotations form without entering anything into the search field. To get all records that the RDP classifier was able to classify, select RDP names from the drop-down list in the Search Taxonomy form and click the Search Taxonomy button with the search field left empty.

Interpreting search results [contents]

After initiating a search, you will be shown the search result page with a table listing all records of organisms that were found to be relevant to your search criteria.

  1. 16S Statistics [up]

    Just above the record listing is a display of the statistics of 16S copy numbers calculated over the records found as a result of your search. There you will find a tabular listing of the range, the mode, the median, the arithmetic mean, the standard deviation, as well as the count of records in the search result. The histogram to the right gives an alternative view of the distribution of 16S copy numbers among the organisms in your search result.
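
    The same summary values can be reproduced offline from a list of copy numbers. The following is a minimal sketch using Python's standard statistics module. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since this page does not say whether rrnDB reports the sample or the population form.

```python
import statistics


def summarize_copy_numbers(copy_numbers):
    """Summarize a non-empty collection of 16S copy numbers."""
    values = list(copy_numbers)
    return {
        "count": len(values),
        "range": (min(values), max(values)),
        # On ties, statistics.mode returns the first mode encountered
        # (Python 3.8 and later).
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (assumption); undefined for one value.
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

    If the population standard deviation is wanted instead, statistics.pstdev can be substituted for statistics.stdev.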

  2. Record listing [up]

    The records matching your search criteria are listed in a table. The first column contains a link to each record's detail page (see below). The following columns are displayed:

    Data source record id
    Record identifiers starting with rrnDBv3- were inherited from rrnDB 3.1.227. All other identifiers are NCBI Assembly accessions, indicating the genome assembly on which the rrnDB record is based.
    Data source organism name
    The organism's name as described in the data source.
    NCBI organism name
    For records inherited from rrnDB 3.1.227, the corresponding NCBI scientific name. Hover the mouse over the field to make the NCBI scientific name visible.
    RDP name
    The name for the organism as classified by the RDP classifier, usually at the rank of genus.
    16S copy number
    Gene copy numbers as listed in each record. An n/a indicates missing information. Where the 16S gene count is missing in a record inherited from rrnDB 3.1.227, the copy number is interpolated from the 23S count, which is available in these cases. The 23S copy number is not shown in the records table but is available on the record detail page. Although displayed as n/a, the interpolated value is used in the 16S count statistics, when sorting the table by the 16S column, and when searching the database by 16S copy number.
  3. Detail page [up]

    Each organism record has a detail page. In addition to the data shown in the search result table the detail page contains:

    Data source record id
    Record identifiers starting with rrnDBv3- were inherited from rrnDB 3.1.227. The other record ids (GCA[F].########.#) are the NCBI Assembly accessions from which the genome record is derived. Each assembly accession is linked to the corresponding web page, maintained by NCBI, where sequence data are available.
    BioSample link
    A link to each organism's NCBI BioSample web page.
    BioProject link
    A link to each organism's NCBI BioProject web page. Sequence data can be obtained here as well.
    NCBI taxid
    The NCBI taxonomy id of each organism. Linked to the corresponding record at the NCBI Taxonomy Browser web page.
    NCBI taxonomic lineage
    The organism's taxonomic lineage using the NCBI taxonomy.
    RDP taxonomic lineage
    The organism's taxonomic lineage using the RDP taxonomy as classified by the RDP classifier.
    Curator
    Records that were curated manually contain a reference to the curator. Typically, records inherited from rrnDB 3.1.227 have this field populated.
    Evidence
    A short description of the evidence on which an organism's 16S copy number is based.
    Note
    Additional information about some records.
    References
    Records inherited from rrnDB 3.1.227 typically have a bibliographic reference with a link to their PubMed web page.

Using Estimate [contents]

The Estimate function is accessed from the main navigation menu via the Estimate tab. Estimate lets you upload a FASTA-formatted file of 16S sequences and runs the RDP Classifier's classify command on them. For information on the RDP Classifier, consult the RDP Classifier web site operated by the Ribosomal Database Project (RDP).

  1. Uploading a FASTA file and running RDP Classifier [up]

    Use the file upload dialog to select a FASTA file from your computer to upload. You may change the confidence cutoff value from the default to any fraction between 0 and 1. Clicking the Upload button will start the process, which may take a while depending on the number of sequences that were uploaded.

  2. Getting the Estimate result [up]

    The result of running the classifier on your set of 16S sequences is a set of three output files:

    Classification file
    Full classification for each sequence with confidence value.
    Hierarchical abundance file
    Contains the abundance information, listing each taxon and the number of sequences that were mapped to this taxon.
    Copy-number adjusted hierarchical abundance file
    Contains 16S gene copy-number corrected abundance information. The numbers were obtained by dividing the values from the second file by the mean 16S copy number of the respective taxon. The full set of each taxon's mean 16S count is available for download. Look for the pan-taxa statistics files on the rrnDB Download page.
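
    The correction described for the third file is a per-taxon division. The sketch below illustrates it with hypothetical names. How Estimate treats a taxon that has no mean copy number is not stated on this page, so skipping such taxa is an assumption made here.

```python
def adjust_abundance(counts, mean_copy_numbers):
    """Divide each taxon's sequence count by its mean 16S copy number.

    counts:            {taxon name: number of sequences mapped to the taxon}
    mean_copy_numbers: {taxon name: mean 16S copy number of the taxon}
    Taxa with a missing or zero mean are skipped (assumption).
    """
    adjusted = {}
    for taxon, count in counts.items():
        mean = mean_copy_numbers.get(taxon)
        if not mean:
            continue
        adjusted[taxon] = count / mean
    return adjusted
```

    With made-up numbers, a taxon with 70 mapped sequences and a mean copy number of 7.0 gets an adjusted abundance of 10.0, and a taxon with 10 sequences and a mean of 2.0 gets 5.0. The raw counts differ sevenfold, but the adjusted abundances differ only twofold.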

Downloading data [contents]

Current and older data sets from the rrnDB can be downloaded from the rrnDB Download page. The data are made available in individual tab-delimited text files suitable for import into spreadsheet software or relational databases. The first line in each file contains the column names. Each file's name contains a string indicating the version of rrnDB from which the files were generated. Consult the About rrnDB Versions page for information about the changes between current and older versions of rrnDB.
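
Because the files are tab-delimited with column names in the first line, they can also be read with a few lines of code. The following is a minimal sketch using Python's standard csv module. The function name is hypothetical, and the UTF-8 encoding shown in the usage note is an assumption.

```python
import csv


def parse_rrndb_table(lines):
    """Parse tab-delimited lines whose first line holds the column names.

    lines may be an open text file or any iterable of strings.
    Returns a list of dicts keyed by column name; all values are strings.
    """
    return list(csv.DictReader(lines, delimiter="\t"))
```

To read a downloaded file, open it in text mode and pass the handle, for example: with open(path, newline="", encoding="utf-8") as handle: rows = parse_rrndb_table(handle). Here path stands for wherever you saved the file. Numeric columns arrive as strings and need converting before any calculation.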

  1. Pan-taxa statistics [up]

    Pan-taxa statistics files contain per-taxon aggregated 16S copy number statistics, one taxon per row. There is one file for each taxonomic system, NCBI and RDP. A description of the table columns follows:

    id
    Row identifier, only meaningful within the table and may change between updates.
    rank
    Taxonomic rank.
    name
    The name of the taxon.
    childcount
    Number of subtaxa that contribute to this taxon's statistics, i.e. number of rows with parentid equal to this row's id (cf. parentid).
    min, max
    Minimum and maximum 16S gene copy number aggregated from the taxon's genome records.
    mode, median
    Mode and median of the 16S gene copy numbers of the genomes belonging to a taxon. In practice, these values are only calculated for taxa at RDP's genus and NCBI's species level. Since the statistics for higher ranks are aggregates over means, i.e. non-discrete values, it does not make sense to calculate mode and median beyond the lowest ranks.
    mean, stddev
    Arithmetic mean and standard deviation representing the 16S gene copy number of a pan-taxon. These values are calculated from the means of the pan-taxa of immediate lower rank.
    sum16slist
    List of the 16S copy numbers of the genomes contributing to a pan-taxon.
    parentid
    The row identifier of a taxon's parent. The mean 16S count of a parent is calculated from the means of all taxa sharing this parentid. Hence, the mean 16S count is a mean of means, except at the lowest level where the 16S copy numbers, over which the mean is taken, come from genome records. Note that the parent-child relation can skip ranks wherever there are gaps in a genome's lineage.
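
    The mean-of-means aggregation described for parentid can be sketched as follows. The function name is hypothetical, and the rows are assumed to be dicts with id, parentid, and mean keys whose values have already been converted to numbers. Following the description above, each child taxon contributes equally to its parent's mean, regardless of how many genomes it contains.

```python
import statistics
from collections import defaultdict


def parent_means(rows):
    """Return {parentid: unweighted mean of the means of its child taxa}.

    Rows whose parentid is None (assumed to mark a root) are not
    attached to any parent.
    """
    children = defaultdict(list)
    for row in rows:
        if row["parentid"] is not None:
            children[row["parentid"]].append(row["mean"])
    return {pid: statistics.mean(means) for pid, means in children.items()}
```

    For example, two child taxa with means 2.0 and 4.0 under the same parent give that parent a mean of 3.0, even if one child is represented by many more genomes than the other.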