This data set includes unprocessed sample .fastq files from two separate Illumina NextSeq runs, labelled as 'Run_1' and 'Run_2', respectively.
Sample names: e.g. STS15059. 'STS' is the abbreviation of Short-tailed shearwater. The first two digits of the numeric part give the year of collection, e.g. '15' = 2015. The remaining digits are a sequential ID that is unique within that year, e.g. '059' is the fifty-ninth sample of that year's collection.
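The naming scheme above can be unpacked programmatically. The following is a minimal Python sketch, not part of the original data set. It assumes that every sample name is 'STS' followed by exactly two year digits and three sequence digits (as in the examples given here) and that all collection years fall in the 2000s. The function name parse_sample_id is hypothetical.

```python
import re

# Assumed pattern: 'STS' + two-digit year + three-digit sequential number.
SAMPLE_ID = re.compile(r"^(STS)(\d{2})(\d{3})$")


def parse_sample_id(uid):
    """Split a sample name such as 'STS15059' into its parts."""
    match = SAMPLE_ID.match(uid)
    if match is None:
        raise ValueError(f"Unexpected sample name: {uid!r}")
    species, year_digits, sequence = match.groups()
    return {
        "species": species,
        # Assumption: two-digit years refer to 20xx.
        "year": 2000 + int(year_digits),
        "number": int(sequence),
    }


info = parse_sample_id("STS15059")
print(info["year"], info["number"])
```

Names that do not fit the assumed pattern raise a ValueError rather than being silently misread.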
Leg band numbers are also recorded. They are generally 5-digit numbers and are unique to the individual bird, so longitudinal samples can be identified by their shared band ID. E.g. in Run_2, the individual with band number 52196 was sampled in 2015 as 'STS15065' and again in 2017 as 'STS17044'.
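As an illustration of how band IDs link longitudinal samples, here is a small Python sketch. It assumes the band and sample name of each sample are available as (band, name) pairs. The function name longitudinal_samples is hypothetical, and the record with band '00000' is a made-up placeholder, not a real sample.

```python
from collections import defaultdict


def longitudinal_samples(records):
    """Return {band: [sample names]} for bands sampled more than once."""
    by_band = defaultdict(list)
    for band, uid in records:
        by_band[band].append(uid)
    return {band: sorted(uids) for band, uids in by_band.items() if len(uids) > 1}


records = [
    ("52196", "STS15065"),  # example given in this description
    ("52196", "STS17044"),  # same bird, sampled again in 2017
    ("00000", "STS16001"),  # hypothetical placeholder record
]
repeats = longitudinal_samples(records)
print(repeats)
```

Band numbers are kept as strings here so that any leading zeros are preserved.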
Run_1: N = 35 individual samples, each split across 4 lanes, e.g. 'STS16020_S35_L001(/L002/L003/L004)_R1_001.fastq'. The lane files for each sample need to be merged before conversion to .fasta format and downstream analysis.
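One possible way to merge the Run_1 lane files is sketched below in Python. It is not the procedure used by the authors. It assumes the files are uncompressed .fastq files named as in the example above ('<sample>_S<n>_L00<lane>_R1_001.fastq'), and it concatenates the lanes of each sample in lane order into a single file named like the Run_2 files. The function name merge_lanes and the directory names in the usage comment are hypothetical.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed file name layout: <sample>_S<n>_L<lane>_<read>_001.fastq
LANE_FILE = re.compile(
    r"^(?P<sample>.+_S\d+)_L(?P<lane>\d{3})_(?P<read>R[12])_001\.fastq$"
)


def merge_lanes(in_dir, out_dir):
    """Concatenate per-lane FASTQ files into one file per sample and read."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group lane files by (sample, read); other files are ignored.
    groups = defaultdict(list)
    for path in in_dir.iterdir():
        match = LANE_FILE.match(path.name)
        if match:
            groups[(match["sample"], match["read"])].append((match["lane"], path))

    merged = []
    for (sample, read), lanes in sorted(groups.items()):
        target = out_dir / f"{sample}_{read}_001.fastq"
        with open(target, "wb") as out:
            # Append lanes in ascending order: L001, L002, L003, L004.
            for _lane, path in sorted(lanes):
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(target)
    return merged


# Example call with hypothetical directory names:
# merge_lanes("Run_1", "Run_1_merged")
```

Plain concatenation is sufficient here because an uncompressed FASTQ file is a simple sequence of four-line records, so joining files end to end yields another valid FASTQ file.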
Run_2: N = 36 individual samples, each provided by the service provider as a single merged file, e.g. 'STS15059_S34_R1_001.fastq'.
Sample_info: This Excel spreadsheet contains the following information on the samples:
'Band': 5-digit number on leg band.
'Sample': Sample number within run.
'UID': The unique sample ID, which includes the collection year, e.g. STS15007.
'Age': The known age of the animal, rounded to a whole year.
'Index (NebNext)': The NEB index used for NGS sample identification.
'Note': Additional information on whether a sample was a between-run replicate, a within-run replicate or a longitudinal replicate.
Analysis of these data will be published in: [tba: R. De Paoli-Iseppi et al. 2018. Molecular Ecology Resources].