Submission of research data from human subjects
For all data from human subjects research submitted to DDBJ, it is the
submitter’s responsibility to ensure that the dignity and rights of
participants (human subjects) are protected in accordance with all
applicable laws, regulations and the policies of the submitter’s institute.
In principle, remove any direct personal identifiers of human subjects from your submissions.
Before submission, read “Submission of research data from human subjects”.
What is MSS?
Mass Submission System (MSS) is a service that accepts relatively large-scale
nucleotide sequence data (not reads) submitted as text files.
We at DDBJ recommend using MSS when:
- the submission is of a type not accepted by the Nucleotide Sequence Submission System
→ EST, STS, TSA, HTC, GSS, HTG, WGS, CON, TLS
- the submission contains long sequences
→ greater than 500 kb in length
- the submission is complex and contains many features
→ more than 30 features
- the submission consists of a large number of sequences
→ when the number of sequences is greater than 1024, you would have to submit two or more times via NSSS
Otherwise, DDBJ recommends using the DDBJ Nucleotide Sequence Submission System (NSSS).
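The criteria above can be sketched as a small decision helper. This is only an illustration of the listed thresholds, not a DDBJ tool: the function name `recommend_system` and its parameters are hypothetical, "500 kb" is read as 500,000 bases, and the feature count is assumed to be supplied by the caller.

```python
# Entry types listed above as not applicable for NSSS.
MSS_ONLY_TYPES = {"EST", "STS", "TSA", "HTC", "GSS", "HTG", "WGS", "CON", "TLS"}


def recommend_system(entry_type=None, max_length_bp=0, n_features=0, n_sequences=0):
    """Return ("MSS" or "NSSS", list of reasons) based on the criteria in the text.

    Hypothetical helper: names and the reading of "500 kb" as 500,000 bases
    are assumptions made for this sketch.
    """
    reasons = []
    if entry_type is not None and entry_type.upper() in MSS_ONLY_TYPES:
        reasons.append(f"entry type {entry_type.upper()} is not applicable for NSSS")
    if max_length_bp > 500_000:
        reasons.append("a sequence is longer than 500 kb")
    if n_features > 30:
        reasons.append("more than 30 features")
    if n_sequences > 1024:
        reasons.append("more than 1024 sequences")
    return ("MSS" if reasons else "NSSS"), reasons


if __name__ == "__main__":
    system, why = recommend_system(entry_type="TSA", n_sequences=2000)
    print(system, why)
```

If none of the criteria apply, the helper returns "NSSS" with an empty list of reasons, which matches the "Otherwise" recommendation above.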
The Flow of MSS
Please apply for your submission through the “Application form for MSS”. After receiving your application, DDBJ will email you instructions on how to use MSS.
Preparation for sequence data submission
- whole genome-scale sequence:
- transcriptome: TSA (Transcriptome Shotgun Assembly) and EST.
- TLS (Targeted Locus Study)
Description of annotation
- For whole genome-scale sequences, describing biological features other than source and assembly_gap is optional. However, for a highly novel species, i.e. one for which no sequence data is available, feature annotation is required for at least one genome as a representative.
- If you decide to submit a genome with annotation, you must reserve a locus_tag prefix in your BioSample submission.
- For feature annotation of prokaryotic genomes, we recommend using DFAST (DDBJ Fast Annotation and Submission Tool).
- For TSA data, describing biological features other than source and assembly_gap is optional (and basically unnecessary).
- For EST data, you cannot describe any biological features other than source.
2. Make submission files
Submission files required for MSS
Prepare the following files required to submit your sequence data.
- Sequence file
- A text file that contains all nucleotide sequences in a FASTA-like format.
- Details: Submission file format: Sequence file.
- Annotation file
- A tab-delimited text file that contains your data other than sequences, such as submitters, references and biological features.
- Details: Submission file format: Annotation file.
- AGP file (in case of CON entries)
- A tab-delimited text file of nine columns that describes how each CON entry is constructed, such as the order and orientation of the piece entries. If the sequence can be built from the AGP file, you do not need a sequence file.
- Details: Submission file format: AGP file.
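As a rough pre-check of the nine-column, tab-delimited layout described for the AGP file, something like the following could be run before using the official tools. It is a minimal sketch: the function name `check_agp_columns` is hypothetical, and treating lines that start with `#` as comments is an assumption based on the usual AGP convention. It only counts columns and does not validate their contents.

```python
def check_agp_columns(lines, expected=9):
    """Return (line_number, column_count) for every data line that does not
    have exactly `expected` tab-separated columns.

    Assumption: empty lines and lines starting with '#' are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue
        count = len(text.split("\t"))
        if count != expected:
            problems.append((number, count))
    return problems


if __name__ == "__main__":
    # Placeholder values only; these are not meaningful AGP records.
    sample = [
        "# comment line",
        "a\tb\tc\td\te\tf\tg\th\ti",
        "a b c",  # spaces instead of tabs
    ]
    print(check_agp_columns(sample))  # [(3, 1)]
```

To check a real file, pass an open file handle, for example `check_agp_columns(open(path, encoding="utf-8"))`, where `path` is your own AGP file.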
Check submission files
Before submitting to DDBJ, the files should be checked with the software tools provided by DDBJ.
- UME (Utilities for MSS file Error check)
- You can verify the syntax, format and amino acid translation of CDS features in the Sequence file and Annotation file. UME includes both Parser and transChecker.
- OS : Windows, unix/macOS
- Details : UME User’s Manual.
- Parser
- You can verify the syntax and format of the Sequence file and Annotation file.
- OS : Unix
- Details : Parser User’s Manual.
- transChecker
- If your data include CDS features (protein-coding sequences), you can validate the amino acid translation.
- OS : Unix
- Details : transChecker User’s Manual
Download => Validation tools for MSS data files
- The validation tools do not have any function for creating submission files. Please create your submission files with a text editor, spreadsheet software, or another suitable application on your PC.
- Syntax errors caused by undefined characters, stray control codes, and so on are a major obstacle when processing submitted data, and may significantly delay the issuing of accession numbers.
- If you describe protein-coding sequences in your annotation, check the annotation file containing the CDS feature(s) with UME or transChecker before submitting to DDBJ.
- Before installing the validation tools, see the End-user license agreement.
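Stray control codes are easy to introduce when files are exported from spreadsheets or copied between systems. The following is a minimal sketch of a scan for them, meant as a quick look before running UME or Parser and not as a replacement for those tools. The function name `find_control_characters` is hypothetical, and the choice to allow tabs and line breaks while flagging all other ASCII control codes is an assumption of this sketch, not a DDBJ rule.

```python
def find_control_characters(text):
    """Return (line, column, code point) for each ASCII control character
    other than tab and line-ending characters.

    Lines are split on "\n" only, so that other control characters are
    reported instead of being silently treated as line breaks.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col_no, ch in enumerate(line, start=1):
            code = ord(ch)
            if (code < 32 and ch != "\t") or code == 127:
                hits.append((line_no, col_no, code))
    return hits


if __name__ == "__main__":
    print(find_control_characters("ACGT\nAC\x00GT\n"))  # [(2, 3, 0)]
```

Each reported tuple gives a 1-based line and column, which makes the offending position easy to find in a text editor.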
3. Review the submission files
After validating the sequence and annotation files by using the check tools, send them to DDBJ.
- Before preparing the entire sequence and annotation files, you can send a part of your data on a trial basis and ask DDBJ whether the submission files have been created correctly. (This step can be omitted.)
DDBJ reviews the submission files and then informs the submitter of any correction requests and/or inquiries. If there is no problem with the contents of the files, DDBJ will issue accession number(s) for your data and send them to the contact person by email.
- Attach to e-mail
- File transfer by SCP
- If the total size of the files is more than 10 MB, we recommend transferring them by SCP using a public/private key pair.
- Please visit the DDBJ Submission Portal D-way to get a D-way login account and to upload files.
For details, see Upload sequence data or Tutorial movies.
- Tutorial movies
- Generate key pair (Windows / macOS)
- Upload data files (Windows / macOS)
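The choice between e-mail attachment and SCP depends on the total size of the files. A minimal sketch of that check follows. The function name `suggest_transfer` is hypothetical, and reading "10 M bytes" as 10,000,000 bytes is an assumption; the sketch only reports a suggestion and does not perform any transfer.

```python
import os

# Assumption: "10 M bytes" is read as 10,000,000 bytes.
SCP_THRESHOLD_BYTES = 10 * 1000 * 1000


def suggest_transfer(paths, threshold=SCP_THRESHOLD_BYTES):
    """Return ("scp" or "email", total size in bytes) for the given files."""
    total = sum(os.path.getsize(p) for p in paths)
    method = "scp" if total > threshold else "email"
    return method, total
```

For example, `suggest_transfer(["sequence.txt", "annotation.txt"])` would return the suggested method together with the combined size, where the two file names stand in for your own submission files.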
If you do not set any hold-date, your data will be released
When you set a hold-date for your data, we will release your data according to Principle of “Hold-Until-Published” data release.
The registered data will be published in a flat file format defined by DDBJ. Please refer to the figure “correspondence between annotation files and flat files”.