Sequence Read Archive
DRA Submission
Obtain a submission account
Obtain a D-way submission account, and register a public key and a center name with the account to enable DRA submission.
Create a new submission
Log in to D-way and go to the DRA submission site from the “DRA” menu at the top. Create a new submission by clicking [New submission]. At the same time, a corresponding directory is created under the submitter’s home directory on the DRA file server (ftp-private.ddbj.nig.ac.jp). Upload sequence data files to this directory.
Upload sequence data
Upload data files to the corresponding DRA submission directory on the file server. For how to upload your data files, please see “Data upload”.
Metadata submission
The DRA metadata are composed of the following objects (see Examples of object organization). BioProject and BioSample records are registered in the other databases and referenced from DRA.
- Submission (DRA)
- BioProject
- BioSample
- Experiment (DRA)
- Run (DRA)
- Analysis (DRA, optional)
You may submit the metadata in two ways: “Submit metadata by the web tool” or “Submit metadata by the excel”. When large-scale metadata (more than 100 Runs) are difficult to submit with the web tool, it is recommended to submit them by uploading XMLs generated from the excel.
How to submit metadata by using the web tool is explained here.
Move to the submission detail page by clicking the submission ID.
Click the [Enter/Update metadata] to run the DRA metadata submission web tool.
When no file has been uploaded to the submission directory, the following message is displayed. In that case, upload data files.
Enter the content in English. Required items are marked with *. The entered content is checked when you click the [Save] button or before you move to another tab. When error messages are displayed, please revise the content.
The web tool supports metadata preparation by tab-delimited text (tsv) files. For examples, please see the Metadata tsv examples sheet.
Submission
Enter submission information regarding data release and submitters.
BioProject
Select a project registered in the account or submit a new project from [New submission]. To reference a project registered in another account, please contact the DRA team.
Please see the “Project Submission” page for how to submit your project. Submitter information is copied from the DRA submission to the BioProject.
After submitting a project, the submitted project is selected in the Study tab.
BioSample
Select samples (more than one sample is common in a DRA submission) registered in the account or submit new samples from [New submission]. To select a range of samples, check one checkbox and then click another checkbox while holding the “Shift” key. Filter samples by entering text in the upper box, and click [Select filtered BioSamples] to select all filtered samples. To reference samples registered in another account, please contact the DRA team.
Please see the “Sample Submission” page for how to submit your samples.
After submitting BioSamples, the submitted BioSamples are selected in the Sample tab.
Experiment
As many Experiments and Runs as selected BioSamples are created automatically, and each set of BioSample, Experiment and Run is linked by reference. The Experiments and Runs are generated when the Experiment tab is first displayed. Samples selected after that first display are not reflected.
Auto-generation of Experiments and Runs after selecting three BioSamples.
| BioProject | - BioSample (1) | - Experiment (1) | - Run (1) |
| | - BioSample (2) | - Experiment (2) | - Run (2) |
| | - BioSample (3) | - Experiment (3) | - Run (3) |
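The one-to-one auto-generation can be pictured with a short Python sketch. It only illustrates the resulting structure and is not the web tool's implementation. The function name `auto_generate`, the dictionary layout, and the alias pattern (modelled on the example alias `test07-0040_Experiment_0001`) are assumptions.

```python
def auto_generate(submission_id, biosamples):
    """Create one Experiment and one Run per selected BioSample (illustrative only)."""
    records = []
    for index, sample in enumerate(biosamples, start=1):
        # Assumed alias pattern: <submission>_<object>_<4-digit serial number>
        experiment = f"{submission_id}_Experiment_{index:04d}"
        run = f"{submission_id}_Run_{index:04d}"
        # The Experiment references the BioSample; the Run references the Experiment.
        records.append({"biosample": sample, "experiment": experiment, "run": run})
    return records


# Hypothetical sample accessions, used only for illustration.
for row in auto_generate("test07-0040", ["SAMD00000001", "SAMD00000002", "SAMD00000003"]):
    print(row["biosample"], "<-", row["experiment"], "<-", row["run"])
```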
Add an Experiment by clicking [Add new Experiment(s)]. Delete an Experiment by clicking [Delete]. An Experiment referenced by a Run cannot be deleted.
Experiments can be submitted in a tab-delimited text file. First, fix the Aliases (e.g., test07-0040_Experiment_0001-0003) by clicking [Save]. An Alias is used as a name until accession numbers are issued.
Download content into a tab-delimited text file by clicking [Download TSV file].
Metadata can be edited in spreadsheet software (e.g. Excel).
If “Title” values are empty, titles are automatically constructed as “[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]” (e.g., “Illumina HiSeq 2000 paired end sequencing of SAMD00025741”). It is recommended to provide user-defined text in the “Title”.
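The fallback rule for empty titles can be sketched as below. This is a minimal illustration that assumes the three parts are joined with single spaces, as in the example title. The function and parameter names are hypothetical and are not part of the DRA tool.

```python
def experiment_title(title, instrument_model, library_layout, biosample_id):
    """Return the user-defined title, or a constructed one when it is empty."""
    title = (title or "").strip()
    if title:
        return title
    # Mirrors "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]"
    return f"{instrument_model} {library_layout} sequencing of {biosample_id}"


print(experiment_title("", "Illumina HiSeq 2000", "paired end", "SAMD00025741"))
# prints: Illumina HiSeq 2000 paired end sequencing of SAMD00025741
```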
Reference samples in “BioSample Used” by one of the following:
- SAMD accessions (example, SAMD00000001)
- “SSUB BioSample Submission ID : Sample name” (example, SSUB003746 : Genome bacteria strain A). Spaces around “:” are ignored.
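When preparing the TSV file with a script, the two reference forms can be told apart and normalized along these lines. This is a sketch under assumptions: the regular expression for SAMD accessions ("SAMD" followed by digits) and the function name `parse_biosample_used` are illustrative, not taken from the DRA tool.

```python
import re

# Assumption: a SAMD accession is "SAMD" followed by digits only.
SAMD_PATTERN = re.compile(r"SAMD\d+")


def parse_biosample_used(value):
    """Split a "BioSample Used" value into its parts."""
    value = value.strip()
    if SAMD_PATTERN.fullmatch(value):
        return {"accession": value}
    submission_id, separator, sample_name = value.partition(":")
    if not separator:
        raise ValueError(f"Unrecognized BioSample reference: {value!r}")
    # Spaces around ":" are ignored, so both parts are stripped.
    return {"submission_id": submission_id.strip(), "sample_name": sample_name.strip()}


print(parse_biosample_used("SAMD00000001"))
print(parse_biosample_used("SSUB003746 : Genome bacteria strain A"))
```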
Save the edited content in a tab-delimited text file, then select and upload it by clicking [Upload TSV file].
Run
As many Experiments and Runs as selected BioSamples are created automatically. Each Run references a unique Experiment.
In this example, three Runs are created and each Run references a unique Experiment.
Add a Run by clicking [Add another Run(s)]. Delete a Run by clicking [Delete]. A Run linked to files cannot be deleted.
After fixing the aliases by clicking [Save], the Run content can be downloaded into a tab-delimited text file. To distinguish the data files for a Run, enter “Run” in the leftmost “Run/Analysis” column.
Click [Select data files for Run] and link uploaded files to Run.
All files uploaded to the submission directory are shown. Associate a file to a Run by selecting a Run alias in “Run/Analysis contains files”.
Enter File type and MD5 Checksum for files. File attributes can be entered by uploading a tab-delimited text file.
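The MD5 checksum to enter can be computed locally before upload. The following is a minimal sketch using Python's standard `hashlib` module. The function name and the chunk size are arbitrary choices, and any other MD5 tool gives the same value.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequence files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("sample_R1.fastq.gz")` (a hypothetical filename) returns the 32-character hexadecimal string to enter as the MD5 Checksum.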
When an Analysis (optional) is unnecessary, submit metadata by clicking the [Submit/Update DRA metadata].
After submitting DRA metadata, start validation of data files. Click the link “Validate uploaded data files to finish this submission”.
Analysis (optional)
You may submit to Analysis data files that are related to the Run sequencing data and have no dedicated database. Analysis data are not shared with NCBI and EBI. Please check which database to submit to in “Submission Navigation” and “Databases and Data Submission Systems”.
Create as many Analyses as required and enter the content of each Analysis. An unnecessary Analysis can be deleted by clicking [Delete].
Click [Select data files for Analysis] and link files to Analysis.
Enter file attributes and associate them with Analysis. When submitting the file attributes by uploading the tab-delimited text file, to distinguish the data files for Analysis, enter “Analysis” in the leftmost “Run/Analysis” column.
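A file-attribute table that marks each row as “Run” or “Analysis” in the leftmost column could be written with a script such as the sketch below. Only the leftmost “Run/Analysis” column name comes from this page. The remaining column names, their order, and the example values are assumptions made for illustration, so take the real layout from the TSV file downloaded from the web tool.

```python
import csv
import io

# Only "Run/Analysis" is documented as the leftmost column; the rest are assumed.
COLUMNS = ["Run/Analysis", "Alias", "Filename", "File type", "MD5 Checksum"]


def file_attributes_tsv(rows):
    """Serialize file-attribute rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row.get("Run/Analysis") not in ("Run", "Analysis"):
            raise ValueError("The leftmost column must be 'Run' or 'Analysis'")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical aliases, filenames and checksums.
print(file_attributes_tsv([
    {"Run/Analysis": "Run", "Alias": "test07-0040_Run_0001",
     "Filename": "sample1.fastq", "File type": "fastq", "MD5 Checksum": "0" * 32},
    {"Run/Analysis": "Analysis", "Alias": "test07-0040_Analysis_0001",
     "Filename": "result.txt", "File type": "tab", "MD5 Checksum": "f" * 32},
]))
```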
Submit the DRA metadata by clicking [Submit/Update DRA metadata] and proceed to the data validation process. Only the MD5 values of Analysis files are checked during validation.
Excel-based submission
When large-scale metadata (more than 100 Runs) are difficult to submit with the web tool because its response is too slow, please submit the metadata by the excel.
Before filling in the metadata excel, you need to finish the following.
Download the DRA metadata excel and describe your metadata. Example excel
Next, “upload XMLs generated from the excel” or “send the excel to the DRA team by email attachment”.
Please upload XMLs if you are familiar with command lines.
You can submit metadata by uploading XMLs in the D-way submission page by using the metadata excel and container images. Generate metadata XMLs according to the GitHub page.
To add XML elements covered by neither the web tool nor the excel, such as technical reads, please refer to the metadata XML examples.
Log in to D-way and go to the DRA submission page. The following is an example of uploading the Submission/Experiment/Run XMLs to the DRA submission “test07-0040”.
Send us the excel by email attachment if you are not familiar with command lines.
Send your metadata excel together with the DRA submission ID as an email attachment. A DRA curator generates the XMLs and uploads them on your behalf. After uploading the XMLs, the curator sends back the metadata in a table file. Please check the file and, if it is correct, proceed to the data file validation step.
Validation of data files
The MD5 values, file format and content of the data files are validated during the validation process. The “Data Files” section displays the filenames in the Run and Analysis, the MD5 values entered in the Run and Analysis, and the MD5 values of the uploaded files.
Click [Validate data files] and validate uploaded data files.
MD5 Check
Consistency between the MD5 values in the metadata and those of the uploaded files is checked. Inconsistent MD5 values cause errors. Calculate the MD5 values of the files on your local computer and compare them with those in the metadata. If the values are the same, the file may have been corrupted during transfer, so re-upload the file. If the values in the metadata are wrong, revise them by clicking [Enter/Update metadata].
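The decision described in this paragraph can be written out as a small helper. It is a sketch of the reasoning only: the function name, the returned messages, and the case-insensitive comparison are assumptions, and the three MD5 strings are expected to come from the metadata, your local computer, and the “Data Files” display respectively.

```python
def diagnose_md5(metadata_md5, local_md5, uploaded_md5):
    """Suggest what to do when an MD5 error is reported."""
    metadata_md5, local_md5, uploaded_md5 = (
        value.strip().lower() for value in (metadata_md5, local_md5, uploaded_md5)
    )
    if metadata_md5 == uploaded_md5:
        return "consistent"
    if metadata_md5 == local_md5:
        # The metadata match the local file, so the uploaded copy is suspect.
        return "re-upload the file"
    # The metadata do not match the local file, so the entered value is wrong.
    return "revise the MD5 value in the metadata"
```

After revising the metadata value, run the validation again, since the uploaded file is only checked against the corrected value at that point.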
Data Check
The format and content of the data files are validated. If no errors occur, the submission status becomes “submission_validated”, and the validated files are moved to a separate directory.
The DRA staff review submissions with the status “submission_validated”. Please do not modify the submission until the DRA staff contact you.
Response to data_error
Any error in the validation process changes the submission status to “data_error”. Please see FAQ: How to deal with validation errors? for how to respond to errors. Click the [Stop validation] button to return the status to “metadata_submitted”. Then revise the metadata and/or re-upload the data files, and start validation again by clicking [Validate data files].
Accession numbers
When the metadata and sequence data are successfully registered, accession numbers with the prefix DR are assigned. Accession numbers are displayed in the “Component” and the status becomes “completed”.
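The status changes mentioned on this page can be summarized as a small transition table. This is a reading aid rather than a description of the actual system. The status names are taken from this page, whereas the event names are hypothetical, and the real system may have further intermediate statuses that are not modelled here.

```python
# Status names come from this page; event names are hypothetical labels.
TRANSITIONS = {
    ("metadata_submitted", "validation_passed"): "submission_validated",
    ("metadata_submitted", "validation_failed"): "data_error",
    ("data_error", "stop_validation"): "metadata_submitted",
    ("submission_validated", "registered"): "completed",
}


def next_status(status, event):
    """Return the status that follows `status` when `event` happens."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"No transition from {status!r} on {event!r}") from None


# A submission that fails validation once and then succeeds.
status = "metadata_submitted"
for event in ["validation_failed", "stop_validation", "validation_passed", "registered"]:
    status = next_status(status, event)
    print(event, "->", status)
```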