DDBJ Annotated/Assembled Sequences

MSS - Mass Submission System

Submission of research data from human subjects

For all data from human subjects research submitted to DDBJ, it is the submitter’s responsibility to ensure that the dignity and rights of participants (human subjects) are protected in accordance with all applicable laws, regulations, and policies of the submitter’s institution.
In principle, remove any direct personal identifiers of human subjects from your submissions.
Before submission, read “Submission of research data from human subjects”.

  • Application form for MSS
  • Validation tools for MSS data files
  • Format of submission files required for MSS
    • Sequence file
    • Annotation file
    • Samples of annotation files

What is MSS?

Mass Submission System (MSS) is a service that accepts relatively large-scale nucleotide sequence data (not reads) submitted as text files.
DDBJ recommends using MSS when:

  • the submission is not accepted by the Nucleotide Sequence Submission System (NSSS)
    • EST, STS, TSA, HTC, GSS, HTG, WGS, CON, TLS
    • See Categories for Sequence Data for details.
  • the submission contains long sequences
    • greater than 500 kb in length
  • the submission is complex and contains many features
    • more than 30 features
  • the submission consists of a large number of sequences
    • more than 100

If none of the above applies to your data, DDBJ recommends using the DDBJ Nucleotide Sequence Submission System (NSSS).
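The criteria above can be written down as a small decision helper. The following Python sketch is illustrative only: the type `SubmissionSummary` and the function `recommend_system` are hypothetical names, not part of any DDBJ tool, and the sketch assumes that "500 kb" means 500,000 bases and that the feature count is given as a single number (the text does not say whether the limit of 30 applies per sequence or in total).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Categories that the text lists as not applicable for NSSS.
MSS_ONLY_CATEGORIES = {"EST", "STS", "TSA", "HTC", "GSS", "HTG", "WGS", "CON", "TLS"}


@dataclass
class SubmissionSummary:
    """Hypothetical summary of a planned submission."""
    category: Optional[str]    # e.g. "WGS", or None for a general submission
    max_sequence_length: int   # longest sequence, in bases
    feature_count: int         # assumption: one number for the submission
    sequence_count: int        # number of sequences


def recommend_system(s: SubmissionSummary) -> Tuple[str, List[str]]:
    """Return ("MSS" or "NSSS", reasons) following the criteria in the text."""
    reasons = []
    if s.category and s.category.upper() in MSS_ONLY_CATEGORIES:
        reasons.append(f"category {s.category.upper()} is not applicable for NSSS")
    if s.max_sequence_length > 500_000:  # assumption: 500 kb = 500,000 bases
        reasons.append("a sequence is longer than 500 kb")
    if s.feature_count > 30:
        reasons.append("more than 30 features")
    if s.sequence_count > 100:
        reasons.append("more than 100 sequences")
    return ("MSS" if reasons else "NSSS", reasons)


if __name__ == "__main__":
    print(recommend_system(SubmissionSummary(None, 1_200, 3, 5)))
    print(recommend_system(SubmissionSummary("WGS", 80_000, 0, 250)))
```

The helper returns the reasons together with the recommendation, so a borderline case can be checked against the list above.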

If you are submitting reads from sequencers, please refer to DRA (DDBJ Sequence Read Archive).

The Flow of MSS

1. Application

Please apply for your submission through the “Application form for MSS”. After receiving your application, DDBJ will email you instructions on how to use MSS.

Preparation for sequence data submission

If your data fall under any of the following cases, please register a BioProject and BioSample before using MSS.

  • whole genome-scale sequence:
    • Complete genome, nearly complete genome and draft genome (WGS and HTG).
    • Excluding sequence data consisting only of organelle, virus/phage or plasmid sequence(s).
  • transcriptome: TSA (Transcriptome Shotgun Assembly) and EST.
    • Non-assembled data (original reads, or an alternative) must also be submitted to DRA before TSA submission.
    • In the case of EST, BioProject and BioSample are not required, but recommended.
  • TLS (Targeted Locus Study)
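As a quick reference, the cases above can be encoded as a lookup. This is a minimal Python sketch with hypothetical names (`Prerequisites`, `prerequisites`). The value "not listed" is an assumption of this sketch: it means only that the text above does not list the case as requiring registration.

```python
from typing import NamedTuple


class Prerequisites(NamedTuple):
    bioproject_biosample: str  # "required", "recommended", or "not listed"
    reads_in_dra_first: bool   # True if non-assembled data go to DRA beforehand


# Whole genome-scale data types named in the list above.
GENOME_SCALE = {"COMPLETE GENOME", "NEARLY COMPLETE GENOME", "DRAFT GENOME", "WGS", "HTG"}


def prerequisites(data_type: str, organelle_virus_or_plasmid_only: bool = False) -> Prerequisites:
    """Map a data type to the registration steps described in the text."""
    kind = data_type.strip().upper()
    if kind in GENOME_SCALE:
        # Data consisting only of organelle, virus/phage or plasmid sequences are excluded.
        if organelle_virus_or_plasmid_only:
            return Prerequisites("not listed", False)
        return Prerequisites("required", False)
    if kind == "TSA":
        return Prerequisites("required", True)
    if kind == "EST":
        return Prerequisites("recommended", False)
    if kind == "TLS":
        return Prerequisites("required", False)
    return Prerequisites("not listed", False)


if __name__ == "__main__":
    for name in ("WGS", "TSA", "EST", "TLS"):
        print(name, prerequisites(name))
```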

Description of annotation

  • For whole genome-scale sequences, describing biological features other than source and assembly_gap is optional. However, for highly novel species, i.e. species for which no sequence data are available, feature annotation is required for at least one genome as a representative.
  • If you decide to submit a genome with annotation, you must reserve a locus_tag prefix during BioSample submission.
  • For feature annotation of prokaryotic genomes, we recommend using DFAST (DDBJ Fast Annotation and Submission Tool).
  • For TSA data, describing biological features other than source and assembly_gap is optional (and generally unnecessary).
  • For EST, you cannot describe any biological features other than source.

2. Make submission files

Submission files required for MSS

Prepare the following files required to submit your sequence data.

Sequence file
The text file that contains all nucleotide sequences in a FASTA-like format.
Details : Submission file format:Sequence file.
Annotation file
The tab-delimited text file that contains your data other than sequences, such as submitters, references, and biological features.
Details : Submission file format:Annotation file.
AGP file (in case of CON entries)
The tab-delimited text file of nine columns that contains data such as the order and orientation of the piece entries used to construct a CON entry. If the sequence can be built from the AGP file, you do not need a sequence file.
Details : Submission file format: AGP file.
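Because the AGP file is described as a tab-delimited file of nine columns, a simple structural pre-check can catch layout mistakes early. The Python sketch below uses a hypothetical function name, `check_agp`. It assumes the layout of the widely used AGP specification, in which lines starting with "#" are comments and columns 2 to 4 hold the object start, object end, and part number as integers. It does not replace the DDBJ validation tools or the format page referenced above.

```python
from typing import List, Tuple

AGP_COLUMNS = 9  # the text describes the AGP file as having nine columns


def check_agp(text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for structurally suspicious lines."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines and '#' comment lines carry no data.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != AGP_COLUMNS:
            problems.append(
                (lineno, f"expected {AGP_COLUMNS} tab-separated columns, found {len(fields)}")
            )
            continue
        beg, end, part = fields[1], fields[2], fields[3]
        if not (beg.isdigit() and end.isdigit() and part.isdigit()):
            problems.append((lineno, "columns 2-4 must be integers"))
        elif int(beg) > int(end):
            problems.append((lineno, "object start is greater than object end"))
    return problems


if __name__ == "__main__":
    # Invented example lines, for illustration only.
    sample = (
        "scaffold1\t1\t1000\t1\tW\tcontig1\t1\t1000\t+\n"
        "scaffold1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends\n"
        "scaffold1\t1101\t2000\n"
    )
    for lineno, message in check_agp(sample):
        print(f"line {lineno}: {message}")
```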

If you are submitting TSA, a complete genome, or a draft genome (WGS or HTG), please submit BioProject and BioSample first. Then describe their accession numbers in the annotation file.

Check submission files

Before submitting to DDBJ, check the files with the software tools provided by DDBJ.

UME (Utilities for MSS file Error check)
You can verify the syntax, format, and amino acid translation of CDS features in the Sequence file and Annotation file. UME includes both Parser and transChecker.
OS : Windows, Unix/macOS
Details : UME User’s Manual.
Parser
You can verify the syntax and format of the Sequence file and Annotation file.
OS : Unix
Details : Parser User’s Manual.
transChecker
If your data include CDS features (protein-coding sequences), you can validate the amino acid translation.
OS : Unix
Details : transChecker User’s Manual.

Download => Validation tools for MSS data files

  • The validation tools do not create submission files. Please create your submission files with a text editor, spreadsheet software, or another suitable application on your PC.
  • Syntax errors, such as undefined characters or stray control codes, are a major obstacle when processing submitted data and may significantly delay the issuing of accession numbers.
  • If your annotation describes protein-coding sequences, check the annotation file containing the CDS feature(s) with UME or transChecker before submitting to DDBJ.
  • Before installing the validation tools, see the End-user license agreement.
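Stray control codes and unexpected characters are easy to miss in a text editor. The Python sketch below scans a file byte by byte and reports them. The function name `find_suspicious_bytes` is hypothetical, and the rule it applies (only tab and printable ASCII are accepted) is an assumption of this sketch, not the character set defined by DDBJ. It is a rough pre-check to run before the official validation tools, not a substitute for them.

```python
import sys
from typing import List, Tuple


def find_suspicious_bytes(path: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, description) for bytes outside tab + printable ASCII."""
    findings = []
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            # Line endings (LF or CRLF) are not reported.
            line = raw.rstrip(b"\r\n")
            for col, byte in enumerate(line, start=1):
                if byte == 0x09:  # tab is the expected field delimiter
                    continue
                if byte < 0x20 or byte == 0x7F:
                    findings.append((lineno, col, "control code 0x%02X" % byte))
                elif byte > 0x7F:
                    findings.append((lineno, col, "non-ASCII byte 0x%02X" % byte))
    return findings


if __name__ == "__main__":
    # Usage: python check_chars.py FILE [FILE ...]
    for name in sys.argv[1:]:
        for lineno, col, what in find_suspicious_bytes(name):
            print(f"{name}:{lineno}:{col}: {what}")
```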

3. Review the submission files

After validating the sequence and annotation files by using the check tools, send them to DDBJ.

  • Before preparing the complete sequence and annotation files, you can send part of your data on a trial basis and ask DDBJ whether the submission files are created correctly. (This step can be omitted.)

DDBJ reviews the submission files and informs the submitter of any correction requests and/or inquiries. If there is no problem with the contents of the files, DDBJ will issue accession number(s) for your data and notify the contact person of them by email.

File transfer

Attach to e-mail
File transfer by SCP
If the total size of the files is more than 10 MB, we recommend transferring them by SCP using a public/private key pair.
Please visit the DDBJ Submission Portal D-way to obtain a D-way login account and to upload files.
For details, see Upload sequence data or Tutorial movies.
Tutorial movies
Generate key pair(Windows / macOS)
Upload data files(Windows / macOS)

4. Distribution

If you do not set any hold-date, your data will be released immediately.
When you set a hold-date for your data, we will release your data according to the principle of “Hold-Until-Published” data release.

The registered data will be published in the flat file format defined by DDBJ. Please refer to the figure showing the correspondence between annotation files and flat files.

DFAST for the submission of prokaryote genomes

DFAST(DDBJ Fast Annotation and Submission Tool)

DFAST is a rapid annotation pipeline service for prokaryote genomes that also generates annotation files that can be submitted directly to DDBJ. We strongly recommend that submitters use DFAST when registering prokaryote genomes in the Annotated/Assembled Sequences database.

Registration procedure for the prokaryote genome

  1. To register a prokaryote genome and its annotation in DDBJ, you need a D-way account obtained through DFAST. BioProject, BioSample and, when biological features are described, a locus_tag prefix must be registered in advance.
  2. If you log in to DFAST with a D-way account, you can manage the jobs analyzed in DFAST. If you do not have a login account, see “Create a D-way account in the website” to get a new account.

How to submit the data obtained in DFAST

  1. Log in to DFAST with your account. First, upload the FASTA file on the “job submission page” and start the job to analyze the genome. At this stage, you obtain a job ID. When the job is finished, click the “DDBJ submission” tab on the page. The annotation and sequence files needed for MSS submission are created after you fill in the form in the metadata section. (*1)
  2. On the job management page, check the job number that you would like to submit to DDBJ.
  3. Select “MSS” as the file format type and click “DOWNLOAD” to download the submission files. Please check the metadata carefully. If you encounter a warning, recheck and correct the metadata you entered (*2). If you would like to edit the annotation and metadata in a text file, download the files and open them in a text editor.
  4. Apply for the submission through the “Application form for MSS”. Following the process shown in “The Flow of MSS”, send the submission files downloaded from DFAST to DDBJ.

*1 You can use DFAST and obtain genome annotation results without logging in. In that case, keep a note of the job ID. After you log in to DFAST, you can import the job into your account using the job history function on the menu bar.

*2 The metadata check in DFAST is simple. DDBJ curators may ask you to correct the files after you submit the data.

Related pages

  • Submission File Format
  • Validation tools for MSS data files
  • UME User’s manual
  • Parser User’s Manual
  • transChecker User’s Manual
  • Validator error message
  • Application form for MSS