MSS - Mass Submission System

Submission of research data from human subjects

For all human subjects research data submitted to DDBJ, it is the submitter's responsibility to ensure that the dignity and rights of the participants (human subjects) are protected in accordance with all applicable laws, regulations and policies of the submitter's institution.
In principle, make sure to remove any direct personal identifiers of human subjects from your submissions.
Before submission, read "Submission of research data from human subjects".

What is MSS?

The Mass Submission System (MSS) is a service that accepts relatively large-scale nucleotide sequence data (not reads) submitted as text files.
We at DDBJ recommend using MSS when:

  • the submission belongs to a category that the Nucleotide Sequence Submission System (NSSS) does not accept:
    EST, STS, TSA, HTC, GSS, HTG, WGS, CON, TLS
  • the submission contains long sequences
    → longer than 500 kb
  • the submission is complex and contains many features
    → more than 30 features
  • the submission consists of a large number of sequences
    → if the number of sequences is greater than 1024, you would have to submit two or more times via NSSS

Otherwise, DDBJ recommends using the DDBJ Nucleotide Sequence Submission System (NSSS).
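
The recommendations above can be summarized as a small decision helper. This is only a minimal sketch, not a DDBJ tool: the function name `recommend_system` and its parameters are hypothetical, "500 kb" is assumed to mean 500,000 bases, and the feature count is treated as a single number supplied by the caller because the text does not say whether the limit applies per entry or per submission.

```python
# Categories listed above as outside the scope of NSSS.
MSS_ONLY_CATEGORIES = {"EST", "STS", "TSA", "HTC", "GSS", "HTG", "WGS", "CON", "TLS"}


def recommend_system(category, sequence_lengths, feature_count):
    """Return "MSS" or "NSSS" following the recommendations listed above.

    category: data category such as "WGS", or None for general data (hypothetical)
    sequence_lengths: list of sequence lengths in bases
    feature_count: number of features (scope is an assumption, see lead-in)
    """
    if category in MSS_ONLY_CATEGORIES:
        return "MSS"
    if any(length > 500_000 for length in sequence_lengths):  # longer than 500 kb
        return "MSS"
    if feature_count > 30:  # complex submission with many features
        return "MSS"
    if len(sequence_lengths) > 1024:  # would need two or more NSSS submissions
        return "MSS"
    return "NSSS"


if __name__ == "__main__":
    print(recommend_system("WGS", [1200, 3400], 2))
    print(recommend_system(None, [1200, 3400], 2))
```

The checks are ordered as in the list above; any single condition is enough to point to MSS.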

If you are submitting reads from sequencers, please refer to DRA (DDBJ Sequence Read Archive).
Please also check Categories for Sequence Data.

The Flow of MSS

1. Application

Please apply for your submission through the “Application form for MSS”. In response to your application, DDBJ will email you instructions on how to use MSS.

Preparation for sequence data submission

If your data fall under any of the following cases, please register BioProject and BioSample before using MSS.

  • whole genome-scale sequence:
    • Complete genome, nearly complete genome and draft genome (WGS, HTG).
    • Excluding sequence data consisting only of organelle, virus/phage or plasmid sequence(s).
  • transcriptome: TSA (Transcriptome Shotgun Assembly), EST
    • It is also required to submit non-assembled data (original reads, or the alternative) to DRA before TSA submission.
    • In the case of EST, BioProject and BioSample are not required, but recommended.
  • TLS (Targeted Locus Study)
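
The cases above can be expressed as a simple lookup. The following is a hedged sketch only: the function name, the data-type labels and the return values ("required", "recommended", "not covered") are hypothetical and merely restate the list above; they are not part of any DDBJ interface.

```python
GENOME_SCALE_TYPES = {"complete genome", "nearly complete genome", "draft genome", "WGS", "HTG"}


def registration_status(data_type, organelle_virus_or_plasmid_only=False):
    """Return the BioProject/BioSample status for a data type, per the list above.

    "not covered" means the list above does not ask for registration;
    it makes no statement about other DDBJ rules.
    """
    if data_type in GENOME_SCALE_TYPES:
        # Data consisting only of organelle, virus/phage or plasmid
        # sequences are excluded from the genome-scale case.
        if organelle_virus_or_plasmid_only:
            return "not covered"
        return "required"
    if data_type in {"TSA", "TLS"}:
        # For TSA, non-assembled data must also be submitted to DRA first.
        return "required"
    if data_type == "EST":
        return "recommended"
    return "not covered"


if __name__ == "__main__":
    for label in ("WGS", "TSA", "EST", "TLS"):
        print(label, registration_status(label))
```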

Description of annotation

  • For whole genome-scale sequences, describing biological features other than source and assembly_gap is optional. However, for a highly novel species, i.e. one for which no sequence data are available, feature annotation is required for at least one genome as a representative.
  • If you decide to submit a genome with annotation, you must reserve a locus_tag prefix when submitting the BioSample.
  • For feature annotation of prokaryotic genomes, we recommend using DFAST (DDBJ Fast Annotation and Submission Tool).
  • For TSA data, describing biological features other than source and assembly_gap is optional (and generally unnecessary).
  • For EST data, you cannot describe any biological features other than source.

2. Make submission files

Submission files required for MSS

Prepare the following files, which are required to submit your sequence data.
Sequence file
The text file that contains all nucleotide sequences in FASTA-like format.
Details : Submission file format:Sequence file.
Annotation file
The tab delimited text file that contains your data other than sequences, such as submitters, references and biological features.
Details : Submission file format:Annotation file.
AGP file (in case of CON entries)
The tab-delimited text file of nine columns that contains data such as the order and orientation of the component entries used to construct a CON entry.
If you can build a sequence from an AGP file, you do not need a sequence file.
Details : Submission file format: AGP file.
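
As a quick sanity check before running the official check tools, the nine-column, tab-delimited layout of an AGP file can be verified with a few lines of code. This is a minimal sketch, not a replacement for the DDBJ tools: the function name `check_agp_columns` is hypothetical, the sample lines are invented for illustration, and skipping lines that start with `#` assumes the usual AGP convention for comment lines.

```python
def check_agp_columns(lines):
    """Return (line_number, column_count) for every data line without nine columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        # Skip blank lines and comment lines (assumed to start with "#").
        if not stripped or stripped.startswith("#"):
            continue
        columns = stripped.split("\t")
        if len(columns) != 9:
            problems.append((number, len(columns)))
    return problems


if __name__ == "__main__":
    # Invented sample data: one well-formed line and one truncated line.
    sample = [
        "# example AGP content\n",
        "chr1\t1\t1000\t1\tW\tcontig1\t1\t1000\t+\n",
        "chr1\t1001\t1100\t2\tN\t100\n",
    ]
    for number, count in check_agp_columns(sample):
        print(f"line {number}: expected 9 columns, found {count}")
```

The check covers only the column count; the meaning of each column is described in the AGP file format details referenced above.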

Check submission files

Before submitting to DDBJ, check the files with the software tools provided by DDBJ.

UME (Utilities for MSS file Error check)
You can verify the syntax and format of the sequence file and annotation file, as well as the amino acid translation of CDS features. UME includes both Parser and transChecker.
OS : Windows, Unix/macOS
Details : UME User's Manual.
Parser
You can verify the syntax and format of the sequence file and annotation file.
OS : Unix
Details : Parser User's Manual.
transChecker
If your data include CDS features (protein-coding sequence), you can validate the amino acid translation.
OS : Unix
Details : transChecker User's Manual
  • The DDBJ check tools cannot create or edit submission files. Prepare your submission files with software such as a text editor or a spreadsheet.
  • Do not send submission files in which the check tools have detected errors. Send your files to DDBJ only after you have fixed all the errors.
  • If you describe CDS feature(s) in your annotation file, you should validate them with UME or transChecker.
  • Before installing the check tools, read the End-user license agreement.

3. Review the submission files

After validating the sequence and annotation files by using the check tools, send them to DDBJ.

  • Before preparing the entire sequence and annotation files, you can send part of your data on a trial basis and ask DDBJ whether the submission files have been created correctly. (This step can be omitted.)

DDBJ reviews the submission files and then sends the submitter any correction requests and/or inquiries.
If there is no problem with the contents of the files, DDBJ will issue accession number(s) for your data and notify the contact person of the accession number(s) by email.

File transfer

Attach to email
File transfer by SCP
If the total size of the files is more than 10 MB, we recommend transferring them by SCP using a public/private key pair.
Please visit the DDBJ Submission Portal D-way to get a D-way login account and to upload files.
For details, see Upload sequence data or Tutorial movies.
Tutorial movies
Generate key pair (Windows / macOS)
Upload data files (Windows / macOS)
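
The 10-megabyte guideline above can be checked before choosing a transfer method. This is a minimal sketch under stated assumptions: the function names are hypothetical, and one megabyte is taken as 1,000,000 bytes because the text does not specify decimal or binary units.

```python
import os

# Assumption: "10 M bytes" is read as 10 * 1,000,000 bytes.
SCP_THRESHOLD_BYTES = 10 * 1000 * 1000


def total_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(os.path.getsize(path) for path in paths)


def suggest_transfer_method(total_bytes):
    """Return "scp" when the total exceeds the guideline, otherwise "email"."""
    return "scp" if total_bytes > SCP_THRESHOLD_BYTES else "email"


if __name__ == "__main__":
    # Uses this script itself as a stand-in for real submission files.
    size = total_size([__file__])
    print(size, suggest_transfer_method(size))
```

Email attachment remains available below the guideline; the sketch only mirrors the recommendation and does not perform any transfer.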

4. Distribution

If you do not set any hold-date, your data will be released immediately.
If you set a hold-date for your data, DDBJ will release them according to the Principle of "Hold-Until-Published" data release.

The registered data will be published in a flat file format defined by DDBJ. Please refer to the figure showing the correspondence between annotation files and flat files.