DDBJ Sequence Read Archive Handbook

DDBJ Sequence Read Archive

DDBJ Sequence Read Archive (DRA) is an archive database for output data generated by next-generation sequencing machines, including the Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD® System, and others. DRA is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and archives the data in close collaboration with the NCBI Sequence Read Archive (SRA) and the EBI Sequence Read Archive (ERA).

The three INSDC partners regularly exchange all data other than Analysis.

DRA accepts sequencing data from capillary sequencing platforms in fastq format. To submit sequencing chromatograms in addition to bases and qualities, please submit data to the DDBJ Trace Archive.

Metadata

Metadata objects

The metadata describe how the associated data were obtained. The metadata are composed of six objects: Submission, BioProject, BioSample, Experiment, Run and Analysis. Each of these objects is defined by its XML schema, and the objects are related to each other. Multiple Experiments can "point" to a single Sample, but not vice-versa.

Accession numbers with distinct prefixes are assigned to each object. The metadata and the accession number system are common to DRA, ERA and SRA. The Experiment, Run and Analysis are the SRA objects, and the BioProject and BioSample are external database objects.

For details, please see the DRA XML schema.

Submission

A container object only for grouping objects to be submitted.

BioProject

An overall description of a single research initiative; a project will typically relate to multiple samples and datasets.

BioSample

Description of biological source material; each physically unique specimen should be registered as a single BioSample with a unique set of attributes.

Experiment

A description of a sample-specific sequencing library and the sequencing methods. An Experiment references 1 BioProject and 1 BioSample. Multiple Experiments can "point" to a single Sample, but not vice-versa.

Run

Runs describe the files that belong to the previously created Experiments. They specify the data files for a specific sample to be processed by DRA. Note that all data files listed in a Run will be merged into a single SRA archive file, so files from different samples or replicates should not be grouped in the same Run. Paired-end data files, conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end.

Analysis

Packages data associated with sequence read objects that are intended for downstream usage or that otherwise need an archival home. Submit alignment data in bam files to Run. Please contact the DRA team to request mirroring of analysis data. Analysis files are provided on the DDBJ ftp site and are not indexed by DRASearch.
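The relationships among Experiment, Run and data files described above can be sketched as a small data model. This is only an illustration: the class names mirror the object names, but the fields, the helper `experiments_by_sample` and all identifiers are hypothetical and are not part of the DRA XML schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    alias: str
    # All files listed in a Run are merged into one archive file, so they
    # should come from a single sample (e.g. the two files of a paired-end run).
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    alias: str
    bioproject_id: str  # an Experiment references exactly one BioProject
    biosample_id: str   # and exactly one BioSample
    runs: List[Run] = field(default_factory=list)


def experiments_by_sample(experiments: List[Experiment]) -> Dict[str, List[str]]:
    """Group Experiment aliases by the BioSample they point to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for exp in experiments:
        grouped[exp.biosample_id].append(exp.alias)
    return dict(grouped)


# Hypothetical identifiers, for illustration only.
experiments = [
    Experiment("exp_wgs", "PROJECT_1", "SAMPLE_A",
               [Run("run_wgs", ["A_wgs_R1.fastq.gz", "A_wgs_R2.fastq.gz"])]),
    Experiment("exp_rna", "PROJECT_1", "SAMPLE_A",
               [Run("run_rna", ["A_rna_R1.fastq.gz", "A_rna_R2.fastq.gz"])]),
    Experiment("exp_b", "PROJECT_1", "SAMPLE_B",
               [Run("run_b", ["B_R1.fastq.gz", "B_R2.fastq.gz"])]),
]
print(experiments_by_sample(experiments))
```

Because each Experiment stores a single `biosample_id`, several Experiments can share one sample, while one Experiment cannot point to several samples.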

Data model

Organization of metadata objects

The following are examples of metadata organization. Submitters can organize metadata objects flexibly.

Simplest case

[Figure: Simplest case]

Comparative genome sequencing of three strains (paired-end)

Include paired-end read files in a Run.

[Figure: Comparative genome sequencing of three strains (paired-end)]

Technical and biological replicates (paired-end)

Related FAQ: How many samples do I need for my DRA submission?

[Figure: Technical and biological replicates (paired-end)]

Related sequencing data are reported in two publications.

[Figure: Related sequencing data are reported in two publications]

Items in each metadata object.

Required*
Conditionally required*

Submission

Center Name

Enter submitter's organization.

Center Name*

A submitter's center name (see the Center Name List). A center name abbreviation is required to submit data to DRA.

In the metadata creation tool, the center name is automatically filled in from the account information.

The Center Name is an abbreviation used operationally by SRA and does not indicate ownership of the submission. Submitters listed in Submitter hold ownership of the submission.

Lab Name*
Laboratory name within the submitting institution. The Lab Name is pre-filled with "Lab/Group", "Department (2)", "Department (1)" and "Organization" of the D-way account. The text can be edited.

Hold Until

Specify how to release the data.

Hold Until*
Direct the DRA to release the record on or after the specified date. The submitter can set the hold date for a maximum of 2 years and can change the date before the record is released.
Immediate Release*
Direct the DRA to release the record immediately after submission is processed.

Submitter

The DRA contacts the listed address(es) regarding the submission by e-mail. Include contact information of the PI and of non-PI member(s) who actually submit the data. The contact information is not made public. If you want to display the contact information, enter it in the BioProject.

Name*
Name of submitter.
E-mail*
E-mail of submitter.

BioProject

BioProject ID*
Select a project registered to BioProject or submit a new project. For submission to BioProject, please refer to the BioProject Handbook.

BioSample

BioSample ID*
Select samples registered to BioSample or create and submit new samples. For submission to BioSample, please refer to the BioSample Handbook.

Experiment

Alias
Name of the experiment designated by the archive. This alias is used to reference metadata objects without accession numbers.
BioSample Used*
Select the BioSample this experiment uses.
Title*
Short text that can be used to call out experiment records in searches or in displays. A title like "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]" (for example, "Illumina HiSeq 2000 paired end sequencing of SAMD00025741") is automatically constructed. To enter user-defined titles, download Experiment metadata into a tab-delimited text file, edit title values and upload it.
Library Name
The submitter's name for this library.
Library Source*
The Library Source specifies the type of source material that is being sequenced.
Library Source Description
GENOMIC Genomic DNA (includes PCR products from genomic DNA).
TRANSCRIPTOMIC Transcription products or non genomic DNA (EST, cDNA, RT-PCR, screened libraries).
METAGENOMIC Mixed material from metagenome.
METATRANSCRIPTOMIC Transcription products from community targets.
SYNTHETIC Synthetic DNA.
VIRAL RNA Viral RNA.
OTHER Other, unspecified, or unknown library source material.
Library Selection*
Whether any method was used to select and/or enrich the material being sequenced.
Library Selection Description
RANDOM Random shearing only.
PCR Source material was selected by designed primers.
RANDOM PCR Source material was selected by randomly generated primers.
RT-PCR Source material was selected by reverse transcription PCR.
HMPR Hypo-methylated partial restriction digest.
MF Methyl Filtrated.
repeat fractionation Selection for less repetitive (and more gene rich) sequence through Cot filtration (CF) or other fractionation techniques based on DNA kinetics.
size fractionation Physical selection of size appropriate targets.
MSLL Methylation Spanning Linking Library.
cDNA complementary DNA.
cDNA_randomPriming
cDNA_oligo_dT
PolyA PolyA selection or enrichment for messenger RNA (mRNA); should replace cDNA enumeration.
Oligo-dT enrichment of messenger RNA (mRNA) by hybridization to Oligo-dT.
Inverse rRNA depletion of ribosomal RNA by oligo hybridization.
ChIP Chromatin immunoprecipitation.
MNase Micrococcal Nuclease (MNase) digestion.
DNAse Deoxyribonuclease (DNase) digestion.
Hybrid Selection Selection by hybridization in array or solution.
Reduced Representation Reproducible genomic subsets, often generated by restriction fragment size selection, containing a manageable number of loci to facilitate re-sampling.
Restriction Digest DNA fractionation using restriction enzymes.
5-methylcytidine antibody Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C).
MBD2 protein methyl-CpG binding domain Enrichment by methyl-CpG binding domain.
CAGE Cap-analysis gene expression.
RACE Rapid Amplification of cDNA Ends.
MDA multiple displacement amplification.
padlock probes capture method Padlock Probes capture strategy to be used in conjunction with Bisulfite-Seq.
other Other library enrichment, screening, or selection process.
unspecified Library enrichment, screening, or selection is not specified.
Library Strategy*
Sequencing technique intended for this library.
Library Strategy Description
WGS Whole genome shotgun.
WGA Whole genome amplification.
WXS Random sequencing of exonic regions selected from the genome.
RNA-Seq Random sequencing of whole transcriptome.
miRNA-Seq Micro RNA and other small non-coding RNA sequencing.
ncRNA-Seq Capture of other non-coding RNA types, including post-translation modification types such as snRNA (small nuclear RNA) or snoRNA (small nucleolar RNA), or expression regulation types such as siRNA (small interfering RNA) or piRNA/piwi/RNA (piwi-interacting RNA).
ssRNA-seq strand-specific RNA sequencing
WCS Whole chromosome (or other replicon) shotgun.
CLONE Genomic clone based (hierarchical) sequencing.
POOLCLONE Shotgun of pooled clones (usually BACs and Fosmids).
AMPLICON Sequencing of overlapping or distinct PCR or RT-PCR products.
CLONEEND Clone end (5', 3', or both) sequencing.
FINISHING Sequencing intended to finish (close) gaps in existing coverage.
RAD-Seq Restriction Site Associated DNA Sequence
ChIP-Seq Direct sequencing of chromatin immunoprecipitates.
MNase-Seq Direct sequencing following MNase digestion.
DNase-Hypersensitivity Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI.
Bisulfite-Seq Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status.
EST Single pass sequencing of cDNA templates.
FL-cDNA Full-length sequencing of cDNA templates.
CTS Concatenated Tag Sequencing.
MRE-Seq Methylation-Sensitive Restriction Enzyme Sequencing strategy.
MeDIP-Seq Methylated DNA Immunoprecipitation Sequencing strategy.
MBD-Seq Direct sequencing of methylated fractions sequencing strategy.
Tn-Seq Gene fitness determination through transposon seeding.
FAIRE-seq Formaldehyde Assisted Isolation of Regulatory Elements
SELEX Systematic Evolution of Ligands by EXponential enrichment
RIP-Seq Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP).
ChIA-PET Direct sequencing of proximity-ligated chromatin immunoprecipitates.
Hi-C Chromosome Conformation Capture technique where a biotin-labeled nucleotide is incorporated at the ligation junction, enabling selective purification of chimeric DNA ligation junctions followed by deep sequencing
ATAC-seq Assay for Transposase-Accessible Chromatin (ATAC) strategy, used to study genome-wide chromatin accessibility. An alternative method to DNase-seq that uses an engineered Tn5 transposase to cleave DNA and to integrate primer DNA sequences into the cleaved genomic DNA.
Targeted-Capture
Tethered Chromatin Conformation Capture
Synthetic-Long-Read binning and barcoding of large DNA fragments to facilitate assembly of the fragment
Other Library strategy not listed.
Library Construction Protocol

Free form text describing the protocol by which the sequencing library was constructed. Please include protocols of DNA fragmentation, ligation and enrichment. If a library preparation kit is used, include the name and version (if any) of the kit (for example, Illumina Nextera DNA Library Preparation Kit).

Reference: Alnasir J, Shanahan HP. Investigation into the annotation of protocol sequencing steps in the sequence read archive. Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015. PMID: 25960871 (Open Access)

Instrument*
Select a sequencing instrument model.
Instrument Model
454 GS
454 GS 20
454 GS FLX
454 GS FLX+
454 GS FLX Titanium
454 GS Junior
Illumina Genome Analyzer
Illumina Genome Analyzer II
Illumina Genome Analyzer IIx
Illumina HiSeq 1000
Illumina HiSeq 1500
Illumina HiSeq 2000
Illumina HiSeq 2500
Illumina HiSeq 3000
Illumina HiSeq 4000
Illumina NovaSeq 6000
Illumina MiSeq
Illumina MiniSeq
Illumina iSeq 100
Illumina HiScanSQ
HiSeq X Five
HiSeq X Ten
NextSeq 500
NextSeq 550
Helicos HeliScope
AB SOLiD System
AB SOLiD System 2.0
AB SOLiD System 3.0
AB SOLiD 3 Plus System
AB SOLiD 4 System
AB SOLiD 4hq System
AB SOLiD PI System
AB 5500 Genetic Analyzer
AB 5500xl Genetic Analyzer
AB 5500xl-W Genetic Analysis System
Complete Genomics
MinION
GridION
PromethION
PacBio RS
PacBio RS II
Sequel
Ion Torrent PGM
Ion Torrent Proton
Ion Torrent S5
Ion Torrent S5 XL
AB 310 Genetic Analyzer
AB 3130 Genetic Analyzer
AB 3130xL Genetic Analyzer
AB 3500 Genetic Analyzer
AB 3500xL Genetic Analyzer
AB 3730 Genetic Analyzer
AB 3730xL Genetic Analyzer
Spot Type*
Select a layout of reads in sequencing data files.
Spot Type Description
single Single read
paired (FF) Paired reads with same direction.
paired (FR) Paired reads with opposite direction.
Nominal Length*
Size of the insert for Paired reads.
Nominal Sdev
Standard deviation of insert size.
Spot Length*

The read length in the submitted sequencing files. For paired reads, this number includes both mates but does not include gap lengths.

  • When the spot length is constant, enter a constant value.
  • For 454 platforms producing reads with variable length, enter a constant flow count.
  • For fastq files with variable length, enter an average length.

Run

Alias
Name of the run designated by the archive. This alias is used to reference metadata objects without accession numbers.
Title*
Short text that can be used to call out run records in searches or in displays. A title like "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]" (for example, "Illumina HiSeq 2000 paired end sequencing of SAMD00025741") is automatically constructed. To enter user-defined titles, download Run metadata into a tab-delimited text file, edit title values and upload it.
Experiment Referenced*
Select the experiment this run belongs to.

Data files for Run

Select data files for a Run.

Run/Analysis
Specify whether a data file belongs to the Run or Analysis. In the web submission form, this field is not editable and is automatically filled according to the selected Run or Analysis. When uploading metadata as a tsv file, this field needs to be specified manually.
File Name*
The name of a sequence data file. Uploaded filenames are automatically filled in.
Run/Analysis contains files*
Select a Run to which the data file belongs.
File Type*
The sequence data file format. For the fastq files with variable read length, select 'generic_fastq'. For the fastq files with constant read length, select 'fastq'.

File Type Description
generic_fastq fastq files with variable read length
fastq fastq files with constant read length
sff 454 Standard Flowgram Format file
hdf5 PacBio hdf5 Format file
bam Binary SAM format for use by loaders that combine alignment and sequencing data
tab A tab-delimited table that maps "SN in SQ line of BAM header" to "reference fasta file"
reference_fasta Reference sequence file in single fasta format used to construct SRA archive file format. Filename must end with ".fa"
MD5 Checksum*
MD5 checksum of a sequence data file. How to obtain the MD5 checksum values.
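One possible way to obtain the MD5 checksum of a data file is shown below. It is a minimal sketch using Python's standard `hashlib` module; the function name `md5_checksum` and the example file name are hypothetical. The file is read in chunks so that large sequencing files do not need to fit in memory.

```python
import hashlib


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical file name):
# print(md5_checksum("sample_1.fastq.gz"))
```

The checksum must be computed on the file exactly as it is uploaded, for example on the compressed fastq file rather than on its uncompressed content.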

Analysis

Alias
Name of the analysis designated by the archive. This alias is used to reference metadata objects without accession numbers.
Title*
Title of the analysis object.
Description*
Describes the contents of the analysis.
Analysis Type*
Select an Analysis type. Submit alignment data to Run in bam format.
Analysis Type Description
De Novo Assembly A placement of sequences including trace, SRA, GI records into a multiple alignment from which a consensus is computed.
Sequence Annotation Per sequence annotation of named attributes and values.
Example: Processed sequencing data for submission to dbEST without assembly.
Reads have already been submitted to one of the sequence read archives in raw form.
The fasta data submitted under this analysis object result from the following treatments, which may serve to filter reads from the raw dataset:
    - sequencing adapter removal
    - low quality trimming
    - poly-A tail removal
    - strand orientation
    - contaminant removal.
Abundance Measurement Identify the tools and processing steps used to produce the abundance measurements (coverage tracks).

Data files for Analysis

Select data files for an Analysis.

Run/Analysis
Specify whether a data file belongs to the Run or Analysis. In the web submission form, this field is not editable and is automatically filled according to the selected Run or Analysis. When uploading metadata as a tsv file, this field needs to be specified manually.
File Name*
The name of an analysis file.
Run/Analysis contains files*
Select an Analysis to which the data file belongs.
File Type*
The analysis data file format.
File Type Description
bam Binary form of the Sequence alignment/map format for read placements, from the SAM tools project.
See http://sourceforge.net/projects/samtools/.
tab A tab-delimited text file that can be viewed as a spreadsheet. The first line should contain column headers.
ace Multiple alignment file output from the phrap assembler and similar programs.
See http://www.phrap.org/consed/distributions/README.16.0.txt for a description of the ACE file format.
fasta Sequence data format indicating sequence base calls. The format is simple: a header line starting with the > character, followed by data lines containing base calls.
wig The wiggle (WIG) format allows display of continuous-valued data in track format. This display type is useful for GC percent, probability scores, and transcriptome data.
See http://genome.ucsc.edu/goldenPath/help/wiggle.html for a description of the Wiggle Track format.
BED BED format provides a flexible way to define the data lines that are displayed in an annotation track.
See http://genome.ucsc.edu/FAQ/FAQformat#format1 for a description of the BED format.
VCF Variant Call Format.
See http://www.1000genomes.org/wiki/analysis/variant%20call%20format/vcf-variant-call-format-version-41 for a description of the VCF format.
MAF Mutation Annotation Format
GFF General Feature Format
csv
tsv
MD5 Checksum*
MD5 checksum of an analysis data file. How to obtain the MD5 checksum values.

Run data files

  • The DRA does NOT accept fasta only datasets. The minimum submission level for SRA is base/color calls with quality scores.
  • Make sure the file names consist only of alphanumeric characters [A-Z,a-z,0-9], underscores [_], hyphens [-] and dots [.], with no whitespace, brackets, other punctuation or symbols.
  • Barcoded data files should be demultiplexed prior to submission and a unique BioSample should be created for each barcoded sample; in other words, each BioSample must be linked to one or more unique data files.
  • For fastq files, submit paired reads in separate files. For bam and sff files, paired reads need to be contained in a single file.
  • Upload data files directly under a submission directory. Submitted archive files should NOT contain any directory structure.
  • Binary data formats, including BAM, SFF and HDF5 should be submitted without applying any additional compression.
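The file name rule above can be checked before upload with a short script. This is a minimal sketch: the pattern encodes only the characters listed in this section, and the function name `is_valid_filename` is hypothetical.

```python
import re

# Letters, digits, underscores, hyphens and dots only; at least one character.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_filename(name):
    """Return True if `name` uses only the characters allowed for data files."""
    return ALLOWED_NAME.fullmatch(name) is not None


for name in ("strainA_R1.fastq.gz", "strain A (rep1).fastq.gz", "dir/reads.bam"):
    print(name, is_valid_filename(name))
```

A name containing a slash is rejected as well, which matches the requirement that files be uploaded directly under the submission directory without any directory structure.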

Formats of sequencing data files

The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). To submit raw data containing technical reads, or to use metadata elements that are defined in the DRA XML schema but are not available in the submission tool, submitters need to create metadata in XML files.

Generic formats

Format    Platform         Recommended
BAM       all platforms    Yes
fastq     all platforms    Yes

Platform specific formats

Format                     Platform               Recommended
SFF                        454 and Ion Torrent    Yes
PacBio HDF                 PacBio                 Yes
SOLiD csfasta/qual         SOLiD                  No (please convert to fastq/bam)
Illumina qseq and scarf    Illumina               No (please convert to fastq/bam)

BAM file

Binary Alignment/Map files (BAM) represent one of the preferred DRA submission formats. BAM is a compressed version of the Sequence Alignment/Map (SAM) format (see SAMv1.pdf). BAM files can be decompressed to a human-readable text format (SAM) using SAM/BAM-specific utilities (e.g. samtools) and can contain unaligned sequences as well. DRA recommends submitting BAM files that include unaligned reads as primary data to a Run.

SAM is a tab-delimited format including both the raw read data and information about the alignment of that read to one or more known reference sequences. There are two main sections in a SAM file, the header and the alignment (sequence read) sections, each of which is described below. Note that this documentation focuses on a description of the SAM format with respect to submission of BAM files to the DRA (i.e. DRA does not accept SAM files for submission). A more comprehensive discussion of the format specifications can be found at the samtools website.

SAM Header Example:

@HD    VN:1.4    SO:coordinate
@SQ    SN:CHROMOSOME_I    LN:15072423    UR:ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/invertebrates/Caenorhabditis_elegans/WBcel215/Primary_Assembly/assembled_chromosomes/FASTA/chrI.fa.gz    AS:ce10    SP:Caenorhabditis elegans
@SQ    SN:CHROMOSOME_II    LN:15279345    UR:ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/invertebrates/Caenorhabditis_elegans/WBcel215/Primary_Assembly/assembled_chromosomes/FASTA/chrII.fa.gz    AS:ce10    SP:Caenorhabditis elegans
@RG    ID:1    PL:ILLUMINA    LB:C_ele_05    DS:WGS of C elegans    PG:BamIndexDecoder
@PG    ID:bwa    PN:bwa    VN:0.5.10-tpx

SAM Alignment Example:

3658435    145    CHROMOSOME_I    1    0    100M    CHROMOSOME_II    2716898    0    GCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCT    @CCC?:CCCCC@CCCEC>AFDFDBEGHEAHCIGIHHGIGEGJGGIIIHFHIHGF@HGGIGJJJJJIJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFCCC    RG:Z:1    NH:i:1    NM:i:0

5482659    65    CHROMOSOME_I    1    0    100M    CHROMOSOME_II    11954696    0    GCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCT    CCCFFFFFHHGHGJJGIJHIJIJJJJJIJJJJJIJJGIJJJJJIIJIIJFJJJJJFIJJJJIIIIGIIJHHHHDEEFFFEEEEEDDDDCDCCCAAA?CC:    RG:Z:1    NH:i:1    NM:i:0

BAM file processing

In the example above, the header and alignment sections are internally consistent: each aligned read has an RNAME (reference sequence name, 3rd field) that matches an SN tag value from the header (e.g., CHROMOSOME_I), and, if provided, the alignment read group optional field (RG:Z:) is consistent with the read group ID in the header (1). It is also important to ensure that the FLAG fields (2nd field in each line) are correctly set for the data. The SRA pipeline will attempt to resolve incorrect FLAG values, but sufficiently incorrect values can lead to processing errors. The SRA does not archive optional and non-standard tags/field values contained in the alignment section. However, the entire header section of the bam file is retained. Additionally, although the SAM format allows an equal sign (=) in the sequence field to represent a match to the reference sequence, or only an asterisk (*) in both the sequence and quality fields, the DRA processing software does not recognize either of these notations.
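The consistency requirements in this paragraph can be checked on SAM text before submission. The following is a minimal sketch, assuming the BAM has first been converted to tab-delimited SAM text with its header (for example with `samtools view -h`); the function name `check_sam_consistency` and its messages are hypothetical, and the check is not the DRA processing software itself.

```python
def check_sam_consistency(sam_text):
    """Return a list of (line_number, message) for inconsistencies found."""
    sn_names, rg_ids, problems = set(), set(), []
    for line_no, line in enumerate(sam_text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("@"):  # header line: collect SN and read group IDs
            tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            if fields[0] == "@SQ" and "SN" in tags:
                sn_names.add(tags["SN"])
            elif fields[0] == "@RG" and "ID" in tags:
                rg_ids.add(tags["ID"])
            continue
        if len(fields) < 11:
            problems.append((line_no, "fewer than 11 mandatory fields"))
            continue
        rname, seq = fields[2], fields[9]
        # RNAME of an aligned read must match an SN value from the header.
        if rname != "*" and rname not in sn_names:
            problems.append((line_no, f"RNAME {rname!r} has no @SQ SN entry"))
        # An RG:Z: optional field must match a read group ID from the header.
        for opt in fields[11:]:
            if opt.startswith("RG:Z:") and opt[5:] not in rg_ids:
                problems.append((line_no, f"read group {opt[5:]!r} has no @RG ID entry"))
        # '=' or a lone '*' in the sequence field is not recognized.
        if "=" in seq or seq == "*":
            problems.append((line_no, "sequence field uses '=' or '*'"))
    return problems
```

An empty list means that none of the three checks failed; it does not guarantee that the file will pass archive validation.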

Please note that unexpected notations used to indicate paired reads can lead to failure to recognize the pairs and to an improper SRA archive (i.e. paired reads are treated like fragments). For example, using :0 and :1 at the end of the read names is atypical and is currently not recognized as an indication of read 1 and read 2 in a pair. It would be better to exclude these notations and give the two reads the same name. Expected notations for particular platforms will work; for example, Illumina reads with /1 or /2 appended use an expected notation. Further, neglecting to set the proper bits for paired reads in the SAM/BAM flags (e.g. multi-segment template, value 1; first segment, value 64; last segment, value 128) or splitting paired reads into separate bam files can result in an improper SRA archive or failure to generate the SRA archive.
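The pairing bits mentioned above can be inspected with simple bitwise tests. This is a minimal sketch: the bit values follow the SAM specification, while the function name `pair_flag_problem` and its messages are hypothetical and cover paired-end data only.

```python
PAIRED = 0x1   # template has multiple segments (value 1)
FIRST = 0x40   # first segment in the template (value 64)
LAST = 0x80    # last segment in the template (value 128)


def pair_flag_problem(flag):
    """Return a message if the pairing bits of `flag` look wrong, else None."""
    paired, first, last = flag & PAIRED, flag & FIRST, flag & LAST
    if not paired:
        if first or last:
            return "0x40/0x80 is set although 0x1 is not set"
        return None  # consistent single read
    if first and last:
        return "both 0x40 and 0x80 are set"
    if not (first or last):
        return "0x1 is set but neither 0x40 nor 0x80 is set"
    return None


# FLAG values taken from the SAM alignment example in this handbook.
for flag in (145, 65):
    print(flag, pair_flag_problem(flag))
```

FLAG 145 decomposes into 128 + 16 + 1 and FLAG 65 into 64 + 1, so both reads in the alignment example carry consistent pairing bits.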

To submit alignment data, you need to submit a BAM file, the references (INSDC/RefSeq accession numbers or a reference multi-fasta file) and an SN-reference mapping table. Submit one bam file per Run.

When submitting a bam file into Analysis instead of Run, the mapping table is unnecessary. However, please consider submitting a bam file that includes unaligned reads as primary data into Run.

When submitting unmapped bam (without SQ header line) from PacBio and IonTorrent, the mapping table and reference sequences are not necessary.

If only BAM alignment files are submitted, then please make sure that the BAM files also contain the unaligned reads. This is critical to enable primary re-analysis and re-alignment of the dataset using new tools or future genome assemblies.

[Figure: Mapping between bam and reference sequences]

BAM file submission

The alignment data can be submitted in the BAM format. The bam files should be readable by SAMtools and picard. The BAM files are nearly optimal in terms of compression and should be submitted uncompressed.

Specify reference by INSDC/RefSeq accession number

If the references are found in the list, they can be specified by their accession.version number (for example, NC_000001.11). The version number is necessary. Accession numbers for references can be searched in NCBI Assembly.

Specify reference by supplying multi-fasta

If references are not found in the list, submit a reference file in multi-fasta format. Select "reference_fasta" in the Run file type. The reference name in the bam header and reference sequence are linked by the name in bam header and fasta defline via the mapping table. If sequence length is different between @SQ-LN and multi-fasta, a warning is raised.

Specify reference by both INSDC/RefSeq accession number and multi-fasta

If only some of the references are found in the list, these references can be specified by their accession.version number (for example, NC_000001.11). The rest of the references need to be supplied by uploading a multi-fasta file. In the SN-reference mapping table, list the accession.version numbers and the sequence names of the multi-fasta deflines.

SN-reference mapping table

A tab-delimited text file describing the mapping between "SN in SQ line in BAM header" and "accession OR sequence name in fasta file". Select "tab" in the Run file type.

BAM header
@HD VN:1.0 GO:none SO:coordinate
@SQ SN:chr1 LN:249698942
@SQ SN:chr2 LN:242508799
@SQ SN:chr3 LN:198450956
...
SN-fasta mapping table. In the example, the reference named ref1 in multi-fasta file corresponds to the SN:chr1.
chr1 ref1
chr2 ref2
chr3 ref3
...
Reference multi-fasta.
>ref1
CGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGAAACACAAAGTGTGGGGTGTCT
...
>ref2
TCCACCAACGTTAGAAGGCCTTGGCCCCCAGAGAGCCAATTTCACAATCCAGAAGTCCCC
...
>ref3
GTGTGTGACCAGGGAGGTCCCCGGCCCAGCTCCCATCCCAGAACCCAGCTCACCTACCTT
...
SN-fasta mapping table. In the example, the reference "NC_000001.11" corresponds to the SN:chr1.
chr1 NC_000001.11
chr2 NC_000002.12
chr3 NC_000003.12
...
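The relationship among the BAM header, the mapping table and the reference multi-fasta shown above can be cross-checked with a short script. This is a minimal sketch working on text already in memory; the function names are hypothetical, and the accession.version pattern is a rough assumption rather than the archive's own validation rule.

```python
import re

# Rough pattern for accession.version such as NC_000001.11 (an assumption).
ACCESSION_VERSION = re.compile(r"[A-Z]+_?[0-9]+\.[0-9]+")


def sq_names(header_text):
    """Collect SN values from @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "@SQ":
            names.extend(f[3:] for f in fields[1:] if f.startswith("SN:"))
    return names


def fasta_names(fasta_text):
    """Collect the first word of each fasta defline."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def check_mapping(header_text, table_text, fasta_text=""):
    """Return messages for SN values that are not mapped to a usable reference."""
    mapping, problems = {}, []
    for line in table_text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append(f"expected 2 tab-separated columns: {line!r}")
            continue
        mapping[parts[0]] = parts[1]
    in_fasta = fasta_names(fasta_text)
    for sn in sq_names(header_text):
        ref = mapping.get(sn)
        if ref is None:
            problems.append(f"SN {sn!r} is missing from the mapping table")
        elif ref not in in_fasta and not ACCESSION_VERSION.fullmatch(ref):
            problems.append(f"{ref!r} is neither a fasta name nor accession.version")
    return problems
```

An accession without a version number, such as NC_000001, is reported because the version number is necessary.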

fastq

Run filetype needs to be specified depending on whether read length is constant or not.

  • For fastq files with constant read length, select 'fastq' as the file type of the Run. Paired reads should appear in the same order in the paired files.
  • For fastq files with variable read length, select 'generic_fastq' as the file type of the Run.

For details of the fastq format, please see the NCBI website.

  • Quality values must be in Phred scale. By default, 33 (!) is used as the Phred quality offset. If the offset is 64 (@), update the ascii_offset of the Run XML to 'ascii_offset="@"'.
  • No technical reads (adapters, linkers, barcodes) are allowed.
  • Paired reads must be split and submitted as two fastq files. The read names must have a suffix identifying the first and second read of the pair, for example '/1' and '/2'.
  • The first line for each read must start with '@'.
  • The base calls and quality scores must be separated by a line starting with '+'.
  • The Fastq files must be compressed using gzip or bzip2.
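Several of the rules listed above can be checked with a short script before upload. This is a minimal sketch, assuming four-line fastq records already read as text lines; the function names are hypothetical, and the offset guess is only a heuristic, since high-quality data with offset 33 can consist entirely of characters at or above '@'.

```python
def check_fastq_records(lines, mate=None):
    """Return messages for records that break the basic fastq rules.

    `mate` may be "1" or "2" to require a '/1' or '/2' read name suffix.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    problems = []
    if len(lines) % 4 != 0:
        problems.append("line count is not a multiple of 4")
    for start in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, sep, qual = lines[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {record}: first line does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {record}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record}: base and quality lengths differ")
        words = header.split()
        name = words[0] if words else ""
        if mate is not None and not name.endswith("/" + mate):
            problems.append(f"record {record}: read name lacks the '/{mate}' suffix")
    return problems


def guess_phred_offset(quality_strings):
    """Guess 33 if any quality character is below '@' (ASCII 64), else 64."""
    codes = [ord(char) for quality in quality_strings for char in quality]
    if not codes:
        raise ValueError("no quality characters given")
    return 33 if min(codes) < 64 else 64
```

For a gzip-compressed file, the lines can be obtained from a handle opened with `gzip.open(path, "rt")`.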

454

The DRA accepts sequencing run data from the 454 platform in the sff and fastq/bam formats. These files should reflect the sequencing run setup. If an sff file contains data derived from more than one sample, please split the resulting fastq file into files that each contain data from only one sample.

The read names found in the .sff file are meaningful and reflect the addressing scheme for the picotitre plate as well as a globally unique run id. Please do not rewrite these names in the sff file, as this addressing information would be lost. The sff file format is nearly optimal in terms of footprint, so there is little to be gained by further compressing the files. Therefore, please provide .sff files uncompressed.

Illumina Genome Analyzer

Illumina pipeline v1.4 and later

DRA does not accept qseq files. Please convert qseq to fastq/bam.

SOLiD

SOLiD Native Format

DRA does not accept SOLiD native files. Please convert the native files to fastq/bam.

Ion Torrent

Submit Ion Torrent data in the sff or fastq/bam format.

Helicos Heliscope

Submit Helicos data in the sms (helicos_native) format or in the fastq/bam format created with the fixed quality value "14".

Complete Genomics

Submit Complete Genomics data in the fastq/bam format.

Pacific Biosciences

Pacific Biosciences uses HDF5, a container file with a directory-like structure, to store raw data. The DRA accepts both bas.h5 and bax.h5 file submissions. Note that submission of data from the RS II instrument requires that one Run consist of one *.bas.h5 file and three *.bax.h5 files. Do not rename files.

Do NOT include files other than HDF5 in a Run.
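The file composition of an RS II Run described above can be checked with a few lines of code. This is a minimal sketch; the function name `check_rs2_run_files` and the file names in the usage example are hypothetical.

```python
def check_rs2_run_files(file_names):
    """Return messages if a Run is not one *.bas.h5 plus three *.bax.h5 files."""
    bas = [name for name in file_names if name.endswith(".bas.h5")]
    bax = [name for name in file_names if name.endswith(".bax.h5")]
    other = [name for name in file_names if name not in bas and name not in bax]
    problems = []
    if len(bas) != 1:
        problems.append(f"expected 1 bas.h5 file, found {len(bas)}")
    if len(bax) != 3:
        problems.append(f"expected 3 bax.h5 files, found {len(bax)}")
    if other:
        problems.append(f"unexpected files in the Run: {', '.join(other)}")
    return problems


# Hypothetical file names, for illustration only.
run_files = ["movie.bas.h5", "movie.1.bax.h5", "movie.2.bax.h5", "movie.3.bax.h5"]
print(check_rs2_run_files(run_files))
```

The check applies to HDF5 submissions from the RS II only; it does not apply to a Run that contains fastq files.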

The DRA also accepts Pacific Biosciences data in the fastq format. Because the read length varies, select "generic_fastq" for the Run file type.

Oxford Nanopore

Submit Oxford Nanopore data in the fastq/bam format.

Capillary sequencing platform

Submit capillary sequencing data in the fastq/bam format.

Analysis data files

PacBio Base Modification Files

PacBio sequence data also permit the analysis of methylated bases within the sequence, which can be extremely helpful to the scientific community. For example, the precise positions of those modified bases can be used to determine the specificity of the DNA methyltransferases that produced them. The PacBio analysis suite contains an analysis workflow (RS_Modification_and_Motif_Analysis) to extract these sequences and produce several files:

  • motif_summary.csv
  • modifications.csv
  • modifications.gff
  • motifs.gff

It would be beneficial to the scientific community if you were able to perform this analysis and submit at least the motif_summary.csv file for prokaryotes as a DRA Analysis object. Please submit these files as data files of an Analysis with the Sequence Annotation type, in addition to the sequencing reads in Run. For assistance, contact us.

NCBI guidelines of PacBio Base Modification Files

Submission to the DRA

Never submit data without the permission of the principal investigator.
Submission of research data from human subjects
When submitting data from human subjects (human data) to the databases of the DDBJ center, it is the submitter's responsibility to ensure that the dignity and rights of the human subjects are protected in accordance with all applicable laws, ordinances, guidelines and policies of the submitter's institution. In principle, make sure to remove any direct personal identifiers of human subjects from the data to be submitted. Before submitting human data, read the "Submission of research data from human subjects".
Submission of Patent Related Sequences
Please read "Submission of Patent Related Sequences" and "Patent Priority and Other Priority" before submitting patent related sequences.

Metadata and sequence data are required for submission to the DRA.

Please submit assembled sequence data to the DDBJ. The DDBJ Mass Submission System (MSS) accepts genomic or other high-volume sequence data generated by massively parallel sequencing platforms.

Data submission to DRA

1. Obtain a submission account

2. Create a DRA submission and upload data files

  • Create a new DRA submission (Add DRA submission functionality to your account)
    All sequencing data in a single submission will be released at the same time.
  • Upload data files by scp before submitting BioProject, BioSample, Experiment and Run

3. Submit project and sample information

BioProject

  • A description of the research effort
  • "Why" you sequenced your samples

BioSample

  • A description of biologically or physically unique specimens
  • "What" you sequenced

Metadata can be submitted as a tab-delimited text file.

4. Submit Experiment and Run

DRA Experiment

  • A description of a sample-specific sequencing library
  • "How" you performed the sequencing
  • Multiple Experiments “point” to a single Sample, but not vice-versa.

DRA Run

  • Validate data files after submitting Experiment and Run
  • All files linked to a Run are “merged” into a single SRA file format

5. Validate sequencing data files

  • Start converting the sequencing data files into SRA files for archiving.
  • Submissions that pass the validation step are reviewed and accessioned.

How to submit data to the DRA

Submission to BioProject, BioSample and DRA

Submission Account

At the DNA Data Bank of Japan (DDBJ) center, BioProject, BioSample, and DRA submissions are managed under the user's account.

Following the Submission Account Handbook, obtain a submission account and enable DRA submission for the account.

Organize data

Examples of metadata object organization are here. A single submission can register only one BioProject, but multiple BioSample, Experiment and Run objects. To organize your data into a submission, first consider the number of BioSamples.

In this chapter, the submission steps are explained using an example submission, "paired-end genome sequencing of three bacterial strains".

Genome sequencing data of three bacterial strains

Create a new submission

Log in to D-way (https://trace.ddbj.nig.ac.jp/D-way) to display the top page. Move to the DRA submission site from the “DRA” menu at the top.

Create a new submission by clicking [New submission]. At this point, a corresponding subdirectory is created under the submitter’s home directory on the DRA file server (ftp-private.ddbj.nig.ac.jp). Upload sequence data files to this subdirectory.

All data in a submission are released at the same time. If you want to release data at different times, split the data into separate submissions.

If submitters do not reply within three months of the initial contact, submissions will be cancelled.

Create a new submission

The submission statuses are listed below. The DRA team reviews submissions whose status is "submission_validated" or "data_error".

List of submission status
Status                  Explanation
New                     Metadata has not been submitted.
metadata_submitted      Metadata has been submitted.
data_validating         Validating data files.
data_error              Error occurred in data validation process.
submission_validated    Metadata and data have been validated.
completed               Accession numbers have been issued.
confidential            Archive files have been created and the submission is kept private.
Public                  Released to public.

Upload sequence data

Sequence data files need to be uploaded before creating metadata. To create metadata first, upload some files.

Transfer sequence data by using terminal (Linux/Mac OS X)

Transfer the files by executing the following command.

$ scp <Your Files> <D-way Login ID>@ftp-private.ddbj.nig.ac.jp:~/<DRA Submission ID>

  • <Your Files> Files to be transferred. Ex: file1 file2 (file1 and file2), file* (all files whose filenames start with “file”)
  • <D-way Login ID> D-way Login ID (ex. test07)
  • <DRA Submission ID> DRA Submission ID (ex. test07-0018)
  • command example: scp strainA_1.fastq test07@ftp-private.ddbj.nig.ac.jp:~/test07-0018

Enter the passphrase set for the keys.
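
When many files are uploaded from a script, the scp command line can also be assembled programmatically. The following is a minimal sketch: the helper name build_scp_command is hypothetical, and the login ID, submission ID and file names are the example values used in this handbook. The sketch only builds the argument list; it does not run the transfer.

```python
DRA_HOST = "ftp-private.ddbj.nig.ac.jp"  # DRA file server named in this handbook


def build_scp_command(files, login_id, submission_id):
    """Build the scp argument list for uploading files to a submission directory.

    build_scp_command is a hypothetical helper, not part of any DDBJ tool.
    """
    if not files:
        raise ValueError("at least one file is required")
    # Destination has the form <D-way Login ID>@<host>:~/<DRA Submission ID>
    destination = f"{login_id}@{DRA_HOST}:~/{submission_id}"
    return ["scp", *files, destination]


command = build_scp_command(
    ["strainA_1.fastq", "strainA_2.fastq"], "test07", "test07-0018"
)
print(" ".join(command))
```

To run the transfer, the list could be passed to subprocess.run(command, check=True); scp then asks for the key passphrase as described above.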

You can handle the transferred files directly by logging in to the server. Log in over SSH by executing the following command.

$ ssh <D-way Login ID>@ftp-private.ddbj.nig.ac.jp

Enter the passphrase set for the keys.

After logging in successfully, a command prompt is displayed.

The login environment is private to the submitter. Users other than the submitter cannot access the data. Executable commands are restricted to the following ones. Users can delete unnecessary files.
ls cd cp mv rm more mkdir tar gzip gunzip bzip2 bunzip2 zip unzip

Transfer sequence data by using WinSCP (Windows)

Submission to DRA ~upload data files (Windows)~

Install and run the “WinSCP” (http://winscp.net/eng/download.php) .

Set items as below and click the [Advanced...] button.

Be sure to select the "binary mode" for file transfer. Do NOT select the "text mode".

  • File protocol: SFTP
  • Host name: ftp-private.ddbj.nig.ac.jp
  • Port number: 22
  • User name: (D-way Login ID)
  • Password: (Leave empty)
Generate private key 1

Please select the private key, which you created beforehand, from "Private key file" in "Authentication".

Generate private key 2

Last, click the [Login] button in the lower center.

Login to the WinSCP

At the first login, a warning message is displayed; select “Yes” (this message will not be displayed again). Next, enter the passphrase set for the keys.

After logging in successfully, a folder on your PC is displayed on the left and your private directory on the server is displayed on the right. Select files in the left window and drag and drop them into the right window to transfer them to the server.

Transfer files by using the WinSCP

You can delete the transferred files by selecting the files and clicking the [Delete] button.

Transfer sequence data by using Cyberduck (Mac OS X)

Submission to DRA ~upload data files (Mac) ~

Download and install the Cyberduck (http://cyberduck.ch).

Run the Cyberduck and click the [Open Connection] button in the Cyberduck menu.

Open connection by using Cyberduck

Select “SFTP (SSH File Transfer Protocol)”.

SFTP in Cyberduck

Set as follows and tick “Use Public Key Authentication” in the More Options.

  • Server: ftp-private.ddbj.nig.ac.jp
  • Port: 22
  • Username: (D-way Login ID)
  • Password: (Leave empty)
  • Add to Keychain: (Check)
Key authentication in Cyberduck

By default, the private key is created in “User’s home folder > .ssh folder (invisible in Finder) > id_rsa”.

Private key in Mac OS X

At the first login, a warning message is displayed; select “Always” (this message will not be displayed again).

After logging in successfully, your private directory on the server is displayed in the window. Select files on your computer and drag and drop them into the window to transfer them to the server.

Transfer files by using Cyberduck

Users can log in to the ftp-private.ddbj.nig.ac.jp server over SSH by using a private key. Executable commands are restricted to the following ones. Users can delete unnecessary files.
ls cd cp mv rm more mkdir tar gzip gunzip bzip2 bunzip2 zip unzip

When sending submission files too large for e-mail attachment, submitters can upload the files for the DDBJ Mass Submission System (MSS) by using the DRA file server. After contacting the MSS team, upload the files to the /submission/[submitter ID]/mass directory.

Create metadata by using the tool

Move to the submission detail page by clicking the submission ID.

Move to the submission page

Click the [Enter / Update metadata] button to run the DRA metadata creation tool.

run the DRA metadata creation tool

When no file has been uploaded to the submission directory, the following message is displayed. To submit metadata, please upload data files.

To submit metadata first, upload some files (for example, an empty text file).

when no data file is uploaded

The metadata are composed of the Submission, BioProject, BioSample, Experiment, Run and Analysis (optional) objects. In the metadata creation tool, enter content tab by tab, from left to right.

Required items are marked with *.

The entered content is checked when submitters click the [Save] button or before moving to the other tab. When error messages are displayed, please revise the content.

Submission

Set the hold date within two years. In the Submitter, include the principal investigator(s) and the submitter(s) who actually submit the data. The DRA does not disclose the submitter information to the public.

All data in a submission are released at the same time. If you want to release data at different times, split the data into separate submissions.

Enter metadata in the tool

Study

Submit a new project by clicking [New submission], or select a project registered in the account.

Only one project can be submitted. To reference a project obtained in another account, please contact the DRA team.

Submit a new BioProject or select submitted one

To submit a BioProject, enter content tab by tab, from left to right. The second panel is for BioProject submission. Submitter information is copied from the DRA submission.

For BioProject metadata, please see the BioProject Handbook.

BioProject submission

To submit genome assemblies to DDBJ, a unique Locus tag prefix is necessary.

Locus tag prefix generation box will appear when [Project data type="Genome Sequencing" or "Metagenome"] AND [Capture="Whole"] AND [Objective="Sequence" or "Annotation" or "Assembly"]. Registration of a unique locus tag prefix is required for studies that result in genome assemblies.

The locus_tag prefix can contain only alphanumeric characters and must be 3 to 12 characters long. It should start with a letter, but numerals can be in the 2nd position or later in the string (e.g., A1C). There should be no symbols, such as -_*, in the prefix. The locus_tag prefix is separated from the tag value by an underscore ‘_’, e.g., A1C_00001.
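
The prefix rules above can be checked locally before submission. The following is a minimal sketch, not the check performed by DDBJ or NCBI: the function names are hypothetical, and the five-digit zero padding of the tag value is an assumption taken from the A1C_00001 example. It does not check whether a prefix is already reserved.

```python
import re

# 3 to 12 ASCII alphanumeric characters, the first of which is a letter.
_PREFIX_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{2,11}")


def is_valid_locus_tag_prefix(prefix):
    """Return True if prefix satisfies the format rules stated above."""
    return _PREFIX_RE.fullmatch(prefix) is not None


def make_locus_tag(prefix, number, width=5):
    """Join prefix and tag value with an underscore (width is an assumption)."""
    if not is_valid_locus_tag_prefix(prefix):
        raise ValueError(f"invalid locus_tag prefix: {prefix!r}")
    return f"{prefix}_{number:0{width}d}"


for candidate in ["A1C", "1AC", "AB", "A_C"]:
    print(candidate, is_valid_locus_tag_prefix(candidate))
print(make_locus_tag("A1C", 1))
```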

Please leave the prefix box empty when a prefix is not necessary (WGS-only submission).

Prefixes are managed by NCBI. When a project is submitted, our system tries to reserve the prefix at NCBI. If the prefix has already been reserved, an error message is displayed; enter a different prefix and submit again.

When multiple prefixes are necessary, please contact us.

Reserve locus tag prefix

Check the content in "OVERVIEW" and submit a project by clicking [Submit BioProject].

Submit BioProject

After a project is submitted, it is selected in Study.

Submitted project is selected

Sample

Submit new samples by clicking [New submission], or select samples submitted in the account.

Upper limit is about 2,000 samples per submission.

To select a range of samples, check one checkbox and then click another checkbox while holding down the Shift key. Filter samples by entering text in the upper box, and click [Select filtered BioSamples] to select all filtered samples.

To reference samples obtained in another account, please contact us.

Submit new samples or select submitted ones

To submit a BioSample, enter content tab by tab, from left to right. The second panel is for BioSample submission. Submitter information is copied from the DRA submission.

Biological and technical replicates are represented by separate BioSamples. For the number of samples needed for a sequence submission, please see the "FAQ: How many samples do I need for my DRA submission?"

For BioSample metadata, please see the BioSample Handbook.

BioSample submission

Select a sample type in the "SAMPLE TYPE". For genome samples, minimum sample attributes are defined by MIxS.

For the Sample type, please see the BioSample Handbook.

Select a sample type

Download a template text file according to the selected sample type to enter sample attributes.

The main step of sample submission is to describe the samples with required, optional and user-defined attributes.

BioSample attribute list. User-defined attributes can be added in the rightmost column.

BioSample submission file examples

The text file is tab-delimited and can be opened and edited in a spreadsheet editor (e.g. Excel®). Attribute names are in the header line. Attributes with "*" are required.

From the second line onward, enter one sample per line. For a project without a PRJD accession number, enter the PSUB submission ID in bioproject_id. For attributes without measured values, enter "missing" or "not applicable".
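
Empty cells in the attribute file can be filled before upload. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the header and the PSUB ID in the example are illustrative rather than a real template. Whether "missing" or "not applicable" is the right value for a given attribute remains the submitter's decision.

```python
import csv
import io


def fill_missing_values(tsv_text, placeholder="missing"):
    """Replace empty attribute cells with a placeholder, one sample per line."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, samples = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in samples:
        if not row:
            continue  # skip blank lines
        # Pad short rows so that every sample has one cell per attribute.
        row = row + [""] * (len(header) - len(row))
        writer.writerow([cell.strip() or placeholder for cell in row])
    return out.getvalue()


# Illustrative header and values only; use the downloaded template in practice.
example = (
    "*sample_name\tbioproject_id\tcollection_date\tgeo_loc_name\n"
    "genome bacteria strain A\tPSUB000001\t\tJapan\n"
)
print(fill_missing_values(example))
```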

Download a text file for entering sample attributes

Upload the BioSample submission file by selecting the file and clicking the Continue button. The validator checks the uploaded file and reports error and warning messages. Submitters cannot submit the BioSample until all errors are resolved.

For the validation rules and messages, please see the Validation rules page.

BioSample validation. In this example, an error for the future date in the collection_date and a warning for inconsistent countries between geo_loc_name and lat_lon of the sample "genome bacteria strain C" are displayed.

Check content in the last "OVERVIEW" and submit samples. In the "ATTRIBUTES" area, the submitted sample attribute file can be downloaded.

Submit BioSample

After submitting BioSamples, submitted BioSamples are selected in the "Sample" tab.

Submitted BioSamples are selected

Experiment

The same number of Experiments and Runs as selected BioSamples are created automatically when the Experiment tab is first displayed. Each Run references an Experiment, and each Experiment references a BioSample.

BioProject - BioSample (1) - Experiment (1) - Run (1)
  - BioSample (2) - Experiment (2) - Run (2)
  - BioSample (3) - Experiment (3) - Run (3)

In this example, three Experiments are created and each Experiment references a unique BioSample.

Add an Experiment by clicking [Add new Experiment(s)] and delete an Experiment by clicking [Delete]. An Experiment referenced by a Run cannot be deleted.

Experiment referencing selected BioSample, is automatically created

Experiments can be submitted in a tab-delimited text file. First, save and fix the Aliases (e.g., test07-0040_Experiment_0001 - 0003) by clicking [Save]. An alias is used as a name until accession numbers are issued.

Download content into a tab-delimited text file by clicking the [Download TSV file].

Save, fix aliases and download as a tab-delimited text file

Metadata can be edited in spreadsheet software (e.g. Excel®).

If "Title" values are empty, titles are automatically constructed as "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]" (e.g., "Illumina HiSeq 2000 paired end sequencing of SAMD00025741"). Submitters can provide user-defined text in the "Title".
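
The automatic title construction can be reproduced when preparing the file, for example to preview titles before upload. The following is a minimal sketch: the function name is hypothetical, and the layout text is passed in as a string because only the "paired end" wording appears in the example above.

```python
def experiment_title(title, instrument_model, layout, biosample_id):
    """Return the user-defined title, or a default built from the other fields."""
    if title and title.strip():
        return title.strip()
    # Pattern: "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]"
    return f"{instrument_model} {layout} sequencing of {biosample_id}"


print(experiment_title("", "Illumina HiSeq 2000", "paired end", "SAMD00025741"))
```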

Reference samples in "BioSample Used" by "SSUB BioSample Submission ID" : "Sample name" (example, SSUB003746 : Genome bacteria strain A). Spaces around ":" are ignored.
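
A "BioSample Used" value can be split into its two parts as follows. This is a minimal sketch with a hypothetical function name. It splits at the first ":" and trims surrounding spaces, which matches the statement that spaces around ":" are ignored; how the submission tool itself parses the value is not specified here.

```python
def parse_biosample_used(value):
    """Split 'SSUB ID : Sample name' into (submission_id, sample_name)."""
    submission_id, separator, sample_name = value.partition(":")
    if not separator:
        raise ValueError(f"no ':' found in {value!r}")
    return submission_id.strip(), sample_name.strip()


print(parse_biosample_used("SSUB003746 : Genome bacteria strain A"))
print(parse_biosample_used("SSUB003746:Genome bacteria strain A"))
```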

Experiment template file

Save the edited content in a tab-delimited text file, then select and upload it by clicking [Upload TSV file].

Upload Experiment in a tab-delimited text file

Upload a tab-delimited text file, NOT a file in a spreadsheet-specific format.

Run

The same number of Experiments and Runs as selected BioSamples are created automatically. Each Run references a unique Experiment.

In this example, three Runs are created and each Run references a unique Experiment.

Add a Run by clicking [Add another Run(s)] and delete a Run by clicking [Delete]. A Run linked to files cannot be deleted.

Save and fix Aliases

After fixing aliases by clicking the [Save], run content can be downloaded into a tab-delimited text file. To distinguish the data files for Run, enter "Run" in the leftmost "Run/Analysis" column.

Click the [Select data files for Run] and link uploaded files to Run.

Move to next site to link files to Run

All files uploaded to the submission directory are shown. Associate a file to a Run by selecting a Run alias in "Run/Analysis contains files".

Enter File type and MD5 Checksum for files. File attributes can be entered by uploading a tab-delimited text file.

Note that all data files listed in a Run will be merged into a single SRA archive file, so files from different samples or replicates should not be grouped in the same Run. Paired-end data files, conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end.
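
The grouping rule can be sketched in code: both files of a pair go to the same Run, and files from different samples go to different Runs. This is a minimal sketch with a hypothetical function name. It assumes a naming convention of <sample>_1.fastq and <sample>_2.fastq, following the strainA_1.fastq example; files named differently are each placed in their own group.

```python
import re
from collections import defaultdict

# Assumed convention: <sample>_1.fastq / <sample>_2.fastq for paired-end reads.
_PAIR_RE = re.compile(r"^(?P<stem>.+)_[12]\.fastq$")


def group_files_by_run(filenames):
    """Group paired-end files so that both mates end up in the same Run."""
    runs = defaultdict(list)
    for name in sorted(filenames):
        match = _PAIR_RE.match(name)
        # Files that do not follow the convention form a group of their own.
        key = match.group("stem") if match else name
        runs[key].append(name)
    return dict(runs)


files = ["strainB_2.fastq", "strainA_1.fastq", "strainA_2.fastq", "strainB_1.fastq"]
for run, members in group_files_by_run(files).items():
    print(run, members)
```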

For fastq with variable read length, select "generic_fastq" for filetype.

Enter file attributes and link files to Run

When an Analysis (optional) is unnecessary, submit metadata by clicking the [Submit/Update DRA metadata].

Submit DRA metadata

After submitting DRA metadata, start validation of data files. Click the link "Validate uploaded data files to finish this submission".

Go to data validation after submitting metadata

Analysis (optional)

Create as many Analyses as required and enter the content of each. An unnecessary Analysis can be deleted by clicking [Delete].

Click the [Select data files for Analysis] and link files to Analysis.

Enter Analysis content

Enter file attributes and associate them with Analysis. When submitting the file attributes by uploading the tab-delimited text file, to distinguish the data files for Analysis, enter "Analysis" in the leftmost "Run/Analysis" column.

Enter file attributes and link files to Analysis

Submit DRA metadata by clicking [Submit/Update DRA metadata] and proceed to the data validation process. Only the MD5 values of analysis files are checked during validation.

Create metadata in XML files

The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). To submit raw data containing technical reads, or to use metadata elements that are in the DRA XML schema but not in the submission tool, submitters need to create or edit metadata in XML files.

    • Create a new DRA submission.

    • Prepare the Submission, Experiment, Run and Analysis (optional) XML files.

    • Un-accessioned BioProject and BioSample can be referenced in Experiment XML as follows.

    • Validate XML files against xsd by following Unix commands. You cannot upload XML with any errors.

    • Upload validated XML files. Select the Submission, Experiment, Run and Analysis (optional) XML files and upload them at once.

      Uploaded XML files are validated against the SRA schema, and the relationships between XML objects are checked. If errors are displayed, modify and re-upload the XML files.
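
Before schema validation, it can help to confirm that an XML file at least parses. The following is a minimal sketch using only the Python standard library, with a hypothetical function name. It checks well-formedness only and does NOT validate against the xsd, so it does not replace the validation step above. The element names in the example are illustrative.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, root tag) if the XML parses, else (False, error message).

    This is a well-formedness check only; it does not validate against an xsd.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, root.tag


# Illustrative fragments, not complete DRA metadata.
good = "<EXPERIMENT_SET><EXPERIMENT alias='test07-0040_Experiment_0001'/></EXPERIMENT_SET>"
bad = "<EXPERIMENT_SET><EXPERIMENT></EXPERIMENT_SET>"
print(check_well_formed(good))
print(check_well_formed(bad)[0])
```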

Upload modified XML files

Edit metadata in XML files

The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). To submit raw data containing technical reads, or to use metadata elements that are in the DRA XML schema but not in the submission tool, submitters need to create or edit metadata in XML files.

    • Create metadata by using the submission tool and download them as XML files.

    • Edit the downloaded XML files. For how to describe technical reads, please see the example page. For available metadata elements, please see the explanation in DRA XML schema.

    • Un-accessioned BioProject and BioSample can be referenced in Experiment XML as follows.

    • Validate XML files against xsd by following Unix commands. You cannot upload XML with any errors.

    • Upload modified XML files. Select the Submission, Experiment, Run and Analysis (optional) XML files and upload them at once.

      Uploaded XML files are validated against the SRA schema, and the relationships between XML objects are checked. If errors are displayed, modify and re-upload the XML files.

Upload modified XML files

Validation of data files

Submitted data files are converted to the SRA files for archiving. During this conversion process, MD5 value, file format and integrity between files and metadata are validated.

In “Data Files”, the filenames and MD5 values in the Run and Analysis are displayed together with the MD5 values of the uploaded files.

Click [Validate data files] to validate the uploaded data files.

Start validation of data files

The files are validated in the following order.

FAQ: How to deal with validation errors?

MD5 Check

The MD5 values in the metadata are checked against those of the uploaded files. Any inconsistency in the MD5 values causes an error. When MD5 errors occur, revise metadata and re-upload files.
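
The same comparison can be run locally before starting validation. The following is a minimal sketch with a hypothetical function name; how the DRA performs the check internally is not described here. It takes two mappings from filename to MD5 value and compares the hexadecimal strings case-insensitively, since tools differ in letter case.

```python
def find_md5_mismatches(metadata_md5, uploaded_md5):
    """Return {filename: (expected, actual)} for files whose MD5 values differ.

    A file that is absent from uploaded_md5 is reported with actual set to None.
    """
    mismatches = {}
    for filename, expected in metadata_md5.items():
        actual = uploaded_md5.get(filename)
        if actual is None or actual.lower() != expected.lower():
            mismatches[filename] = (expected, actual)
    return mismatches


# Example values; the second uploaded value is deliberately wrong.
metadata = {
    "file1": "9F6E6800CFAE7749EB6C486619254B9C",
    "file2": "B636E0063E29709B6082F324C76D0911",
}
uploaded = {
    "file1": "9f6e6800cfae7749eb6c486619254b9c",
    "file2": "00000000000000000000000000000000",
}
print(find_md5_mismatches(metadata, uploaded))
```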

Data Check

Submitted data files are converted to the SRA files for archiving. During this conversion process, MD5 value, file format and integrity between files and metadata are validated. When errors occur, revise metadata and re-upload files. Validation of large files takes time.

If no errors occur, the submission status becomes "submission_validated" and the validated files are moved to a separate directory.

The DRA staff review submissions with the status "submission_validated". Please do not modify such submissions until the DRA staff contact you.

Revise a submission with "data_error"

Any error in the validation process changes the submission status to "data_error". Stop the validation by clicking the [Stop validation] button, then revise metadata and/or re-upload data files. After the revision, click the [Validate data files] button to start validation again.

FAQ: How to deal with validation errors?

Stop validation

The submission status reverts to "metadata_submitted". Revise and re-submit metadata or re-upload data files.

Revise submission

Accession numbers

When both the metadata and sequence data are validated (Status “submission_validated”), accession numbers with the prefix DR (Submission (DRA), Experiment (DRX), Run (DRR), Analysis (DRZ)) are assigned ("acc_issued", "complete" or "private"). Accession numbers are displayed in the “Component”.

Limited-time access to archived fastq/SRA files

To allow submitters to download and check the archived fastq/SRA files, the files are copied to the following directories on the ftp-private.ddbj.nig.ac.jp server. To save disk space, the copied files are automatically deleted after one month.

If available disk space decreases unexpectedly, copied fastq/SRA files may be deleted in less than one month or the copy service may be suspended. We will inform submitters on the website as far in advance as possible; however, the announcement could come immediately before the deletion or service suspension.

  • (submitter's home)/report/dra/(DRA submission accession)/fastq/
  • (submitter's home)/report/dra/(DRA submission accession)/sra/

  • submitter/report/dra/DRA000001/fastq/DRR000001.fastq.bz2
  • submitter/report/dra/DRA000001/fastq/DRR000002.fastq.bz2
  • submitter/report/dra/DRA000001/fastq/DRR000002_1.fastq.bz2
  • submitter/report/dra/DRA000001/fastq/DRR000002_2.fastq.bz2
  • submitter/report/dra/DRA000001/sra/DRR000001.sra
  • submitter/report/dra/DRA000001/sra/DRR000002.sra
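
The path patterns in the listing above can be generated for a given accession pair, for example to check which copies are present. This is a minimal sketch with a hypothetical function name. It returns candidate paths relative to the submitter's home directory; which of the fastq files actually exist for a Run depends on the data, so every candidate needs to be checked.

```python
def candidate_archive_paths(submission_accession, run_accession):
    """List candidate archived-file paths, relative to the submitter's home."""
    base = f"report/dra/{submission_accession}"
    return [
        f"{base}/fastq/{run_accession}.fastq.bz2",
        f"{base}/fastq/{run_accession}_1.fastq.bz2",
        f"{base}/fastq/{run_accession}_2.fastq.bz2",
        f"{base}/sra/{run_accession}.sra",
    ]


for path in candidate_archive_paths("DRA000001", "DRR000002"):
    print(path)
```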

Data release

After the registered data are loaded into the database, the Status becomes “complete (private)” and the submission is kept private until one of the following conditions is met.

All data in a submission are released at the same time. If you want to release data at different times, split the data into separate submissions.

  1. The submitter requests release of the data.
  2. The submitter has published the accession number(s) and this has been confirmed.
    We do not release the data when the accession number(s) have been published in error by someone other than the submitter.
    "Publish" means to disclose accession number(s) to the public through a paper, thesis, academic meeting, the internet, a press report, etc.
  3. The specified hold date has arrived.
  4. DDBJ/EMBL-Bank/GenBank records (e.g., TSA, WGS, CON, etc.) citing DRA Run (DRR) accession number(s) have been made public.

Data are released without permission from submitters in cases 2, 3 and 4. In case 4, the entire DRA submission containing the cited DRR Run(s) is made public.

FAQ: How are linked BioProject/BioSample/sequence data released?

When the data are released, they become searchable at DRASearch within a few days and are mirrored to the NCBI SRA.

The list of available fastq files at the DRA file server: fastqlist

Update submission

Update in each database

Change hold date

You can set the hold date for a maximum of 4 years and can change it. To change the hold date, click the [Change] button in the Hold Date and move to hold date change page.

Change the hold date

To release the submission immediately, click [Release Now]. The submission is released overnight; data files will be made available at ftp and metadata will be indexed by the DRA search system in a few days.

Update metadata

Update metadata by clicking the [Enter / Update metadata] button. Some fields are locked and cannot be edited. After editing your metadata, be sure to click the [Submit/Update DRA metadata] button so that the updates are reflected on the DRA server.

Add data files

Data files cannot be added directly to an archived Run. To add data files, create new Experiment and Run objects that reference the existing BioProject and BioSample records in another DRA submission.

As with a Run, data files cannot be added directly to an archived Analysis. To replace an archived Analysis, please contact the DRA team.

Log in to D-way and create a new submission by clicking [New submission]. Select the BioProject and BioSample IDs to which the data are to be added. Next, add the DRA Experiment and Run objects.

  • To add a new sample, share a BioProject ID and create a BioSample - Experiment - Run in a new DRA submission.
  • To add data files to an existing sample, share BioProject and BioSample IDs and create an Experiment - Run in a new DRA submission.

Submit metadata and validate the appended data files. Accession numbers will be issued to the appended Experiment/Run objects.

The BioProject ID remains the same, but a different DRA submission number is assigned.

Add data files
Add data files to existing sample

To add data files to the existing DRA submission, please contact us.

Withdraw archived objects

To withdraw archived Experiment, Run and Analysis objects, please contact us.

Supplement: MD5

MD5 (Message Digest Algorithm 5) is a hash function that calculates a hash value (the MD5 number, a string of 32 hexadecimal digits) for a given file. Because the MD5 number of a damaged file differs from that of the original, you can check whether a transferred file is intact by comparing the numbers before and after the file transfer.
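
The MD5 number can also be computed in the same way on any platform with Python. The following is a minimal sketch using the standard hashlib module, with a hypothetical function name. The file is read in chunks so that large sequence files do not have to fit in memory; the chunk size is an arbitrary choice.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 number of a file as 32 lowercase hexadecimal digits."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read the file piece by piece until an empty chunk signals the end.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, md5_of_file("strainA_1.fastq") returns the value to enter as the MD5 checksum of that file, assuming the file is in the current directory.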

Obtain MD5 number (Linux)

Obtain the MD5 numbers of the files by executing,

$ md5sum file1 file2
9f6e6800cfae7749eb6c486619254b9c  file1
b636e0063e29709b6082f324c76d0911  file2

Obtain MD5 number (Mac OS X)

Obtain the MD5 numbers of the files by executing,

$ md5 file1 file2
MD5 (file1) = 9f6e6800cfae7749eb6c486619254b9c
MD5 (file2) = b636e0063e29709b6082f324c76d0911

Obtain MD5 number (Windows)

Install and run the Fsum Frontend (sourceforge.net/projects/fsumfe/).
First, tick "md5".

Generate md5 in the tool 1

After clicking the [+] button, open the sequence data files that you need. You can select multiple files at the same time.

Generate md5 in the tool 2

Click the [Calculate hashes] button. The MD5 numbers of the files are displayed.
By clicking the [Export] button, you can obtain the list of the MD5 numbers as an HTML, a CSV, or an XML file.

Generate md5 in the tool 3