Dataset
Indexed reference databases for KMA and CCMetagen

Creators: Philip Clausen; Dr Jan Buchmann; Dr Vanessa Rossetto Marcelino
Publisher: The University of Sydney
Date: 2019
DOI: 10.25910/5cc7cd40fca8e
Handle: http://hdl.handle.net/2123/20336
Subjects: Microbiology; Biological Sciences; Metagenomics; Metatranscriptomics
Language: English

Licence & Rights:

Non-Commercial Licence

CC BY-NC-SA: Attribution-NonCommercial-ShareAlike 4.0 International
https://creativecommons.org/licenses/by-nc-sa/4.0/

Access:

Open

Brief description

This database was built to identify taxa in metagenome samples using the CCMetagen pipeline. Because it is based on the whole NCBI nt collection, it gives a complete taxonomic overview, including any microbial eukaryotes present in the samples. The database is already indexed and ready to use with KMA and CCMetagen.

A manual describing how to use this dataset can be found at: https://github.com/vrmarcelino/CCMetagen

Additionally, a tutorial covering the complete analysis of a set of metatranscriptome samples is available at: https://github.com/vrmarcelino/CCMetagen/tree/master/tutorial

The database was built as follows:

The partially non-redundant NCBI nucleotide database (nt) was downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz) in January 2018. Its sequence headers were then reformatted to include taxids.
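The header reformatting step is not spelled out in detail here, so the following is only a minimal sketch of one way to do it. It assumes the accession-to-taxid mapping comes from an NCBI accession2taxid table with the four tab-separated columns accession, accession.version, taxid and gi (header row first), and it writes headers in a hypothetical `>accession|taxid description` layout. The function names, the `missing="0"` fallback and the header layout are all illustrative assumptions; the CCMetagen manual is the authority on the header format the pipeline actually expects.

```python
from typing import Dict, Iterable, Iterator


def parse_accession2taxid(lines: Iterable[str]) -> Dict[str, str]:
    """Map accession.version -> taxid from an accession2taxid-style table.

    Assumes tab-separated columns: accession, accession.version, taxid, gi.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3 or cols[0] == "accession":
            continue  # skip the header row and malformed lines
        mapping[cols[1]] = cols[2]
    return mapping


def add_taxid_to_headers(
    fasta_lines: Iterable[str], acc2taxid: Dict[str, str], missing: str = "0"
) -> Iterator[str]:
    """Yield FASTA lines with the taxid inserted into every header line.

    Hypothetical output layout: >accession|taxid description
    Accessions absent from the mapping get the placeholder `missing`.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            accession, _, description = line[1:].rstrip("\n").partition(" ")
            taxid = acc2taxid.get(accession, missing)
            # rstrip() drops the trailing space when there is no description
            yield f">{accession}|{taxid} {description}".rstrip() + "\n"
        else:
            yield line  # sequence lines pass through unchanged
```

Because both functions stream line by line, a file the size of nt never has to be held in memory; only the accession-to-taxid dictionary does.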

Indexing was then performed with KMA using the following command:

kma_index -i nt_taxid.fas -o ncbi_nt -NI -Sparse TG
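To rebuild an index from a script, the documented command can be wrapped as shown below. This is only a convenience sketch: it reproduces the flags quoted above verbatim and adds none, and it assumes the `kma_index` binary is installed and on `PATH`. The function names and default arguments are hypothetical.

```python
import shlex
import subprocess
from typing import List


def kma_index_command(fasta: str = "nt_taxid.fas", prefix: str = "ncbi_nt") -> List[str]:
    """Return the argument list for the kma_index call documented above."""
    # Flags copied verbatim from the documented command; nothing else is added.
    return ["kma_index", "-i", fasta, "-o", prefix, "-NI", "-Sparse", "TG"]


def run_kma_index(fasta: str = "nt_taxid.fas", prefix: str = "ncbi_nt") -> None:
    """Run kma_index, raising CalledProcessError if it exits non-zero."""
    cmd = kma_index_command(fasta, prefix)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string avoids shell quoting issues when the input or output paths contain spaces.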

The following indexed databases are provided:

  1. NCBI nucleotide collection
  2. RefSeq database of bacterial and fungal genomes

Notes

Update to dataset:

The NCBI nucleotide collection contains many environmental and artificial sequence entries that lack informative taxonomic assignments (e.g. uncultured marine bacteria). We therefore compiled an additional version of the database that excludes these entries.

The file ncbi_nt_no_env_11jun2019.zip therefore contains all NCBI nt entries except the descendants of environmental eukaryotes (taxid 61964), environmental prokaryotes (48479), unclassified sequences (12908) and artificial sequences (28384).
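Deciding whether an entry falls under one of these four taxids means walking up the NCBI taxonomy from the entry's taxid. The following is a minimal sketch of that check, assuming the parent links are read from `nodes.dmp` in the NCBI taxdump (fields separated by `\t|\t`, first field the taxid, second the parent taxid). The function names are hypothetical, and treating the four root taxids themselves as excluded is an assumption of this sketch.

```python
from typing import Dict, Iterable, Set

# The four taxids listed above as roots of the excluded subtrees.
EXCLUDED_ROOTS: Set[int] = {61964, 48479, 12908, 28384}


def parse_nodes_dmp(lines: Iterable[str]) -> Dict[int, int]:
    """Map each taxid to its parent taxid from nodes.dmp-style lines."""
    parents: Dict[int, int] = {}
    for line in lines:
        fields = line.split("\t|\t")
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def is_excluded(taxid: int, parents: Dict[int, int], roots: Set[int] = EXCLUDED_ROOTS) -> bool:
    """Return True if taxid is an excluded root or descends from one."""
    seen: Set[int] = set()
    current = taxid
    while current not in seen:  # the seen-set guards against malformed cycles
        if current in roots:
            return True
        seen.add(current)
        parent = parents.get(current)
        # The taxonomy root (taxid 1) is its own parent; unknown taxids stop the walk.
        if parent is None or parent == current:
            return False
        current = parent
    return False
```

Entries whose taxid makes `is_excluded` return True would be dropped before indexing. For the full taxonomy it may be faster to cache results per taxid, since many sequences share the same lineage.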

Issued: 30 April 2019

Data time period: 9 April 2019 to 30 April 2019
