DDBJ Annotated/Assembled Sequences
MSS - Mass Submission System
Submission of research data from human subjects
For all data from research involving human subjects submitted to DDBJ, it is
the submitter’s responsibility to ensure that the dignity and rights of
participants (human subjects) are protected in accordance with all
applicable laws, regulations and policies of the submitter’s institute.
In principle, remove all direct personal identifiers of human subjects
from your submissions.
Before submission, read “Submission of research data from human subjects”.
- Application form for MSS
- Validation tools for MSS data files
- Format of submission files required for MSS
What is MSS?
Mass Submission System (MSS) is a service for submitting relatively
large-scale nucleotide sequence data (not reads) as text files.
DDBJ recommends using MSS when:
- the data cannot be submitted through the Nucleotide Sequence Submission System (NSSS)
- the submission contains long sequences (longer than 500 kb)
- the submission is complex, containing many features (more than 30)
- the submission consists of a large number of sequences (more than 100)
If none of the above applies to your data, DDBJ recommends using the DDBJ Nucleotide Sequence Submission System (NSSS).
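The recommendation rules above can be written as a small decision function, which may help when preparing many submissions. This is a sketch only: `SubmissionSummary` and `recommend_system` are hypothetical names, "500 kb" is read as 500,000 bases per sequence, and the feature count is assumed to be the total across the submission.

```python
from dataclasses import dataclass

# Thresholds quoted in the recommendations above (interpretation is an assumption).
MAX_NSSS_SEQUENCE_LENGTH = 500_000  # longer than 500 kb -> MSS
MAX_NSSS_FEATURES = 30              # more than 30 features -> MSS
MAX_NSSS_SEQUENCES = 100            # more than 100 sequences -> MSS


@dataclass
class SubmissionSummary:
    sequence_lengths: list[int]   # length of each sequence in bases
    feature_count: int            # assumed: total number of features in the submission
    nsss_applicable: bool = True  # False if the data type cannot go through NSSS


def recommend_system(s: SubmissionSummary) -> str:
    """Return "MSS" or "NSSS" following the recommendations above."""
    if not s.nsss_applicable:
        return "MSS"
    if any(length > MAX_NSSS_SEQUENCE_LENGTH for length in s.sequence_lengths):
        return "MSS"
    if s.feature_count > MAX_NSSS_FEATURES:
        return "MSS"
    if len(s.sequence_lengths) > MAX_NSSS_SEQUENCES:
        return "MSS"
    return "NSSS"
```

For example, two short sequences with four features, `recommend_system(SubmissionSummary([1200, 980], 4))`, return `"NSSS"`. The checks are ordered so that any single criterion is enough to recommend MSS.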
To submit reads from sequencers, refer to DRA (DDBJ Sequence Read Archive).
The Flow of MSS
1. Application
Apply for your submission through the “Application form for MSS”. In response, DDBJ MSS will send you instructions for using MSS by email.
Preparation for sequence data submission
If your data fall into any of the following categories, register BioProject and BioSample before using MSS.
- whole genome-scale sequences
- transcriptome: TSA (Transcriptome Shotgun Assembly) and EST.
- Before a TSA submission, the non-assembled data (the original reads or an alternative) must also be submitted to DRA.
- For EST, BioProject and BioSample are recommended but not required.
- TLS (Targeted Locus Study)
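When preparing several submissions, the registration prerequisites listed above can be kept as a simple lookup table. In this sketch the data-type keys and the function name `registration_checklist` are hypothetical labels, not DDBJ identifiers.

```python
# Hypothetical keys for the data types listed above.
BIOPROJECT_BIOSAMPLE = {
    "genome": "required",   # whole genome-scale sequences
    "TSA": "required",      # Transcriptome Shotgun Assembly
    "EST": "recommended",   # not required, but recommended
    "TLS": "required",      # Targeted Locus Study
}


def registration_checklist(data_type: str) -> list[str]:
    """List the registrations to complete before using MSS for a data type."""
    status = BIOPROJECT_BIOSAMPLE.get(data_type)
    if status is None:
        # The list above names no prerequisite for other data types.
        return []
    steps = [f"Register BioProject and BioSample ({status})"]
    if data_type == "TSA":
        # TSA additionally needs the non-assembled data in DRA first.
        steps.append("Submit the non-assembled data (e.g. original reads) to DRA")
    return steps
```

For example, `registration_checklist("TSA")` returns two steps, while `registration_checklist("EST")` returns a single step marked "recommended".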
Description of annotation
- For whole genome-scale sequences, describing biological features other than source and assembly_gap is optional. However, for a highly novel species, i.e. one for which no sequence data are yet available, feature annotation is required for at least one genome as a representative.
- If you submit a genome with annotation, you must reserve a locus_tag prefix when submitting the BioSample.
- For feature annotation of prokaryotic genomes, we recommend using DFAST (DDBJ Fast Annotation and Submission Tool).
- For TSA data, describing biological features other than source and assembly_gap is optional (and generally unnecessary).
- For EST, you cannot describe any biological features other than source.
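The annotation rules above can be checked against the feature keys planned for each entry before the annotation file is written. This is a minimal sketch, not a replacement for UME: the function name and the data-type labels ("EST", "genome") are hypothetical, and it assumes every entry carries a source feature, as the rules imply.

```python
# Feature keys that the rules above treat as always describable.
ALWAYS_ALLOWED = {"source", "assembly_gap"}


def check_features(data_type: str, features: list[str],
                   locus_tag_prefix_reserved: bool = False) -> list[str]:
    """Return problems found in the feature keys of one entry (sketch)."""
    problems = []
    if "source" not in features:
        # Assumption: every entry carries a source feature.
        problems.append("missing 'source' feature")
    if data_type == "EST":
        # EST entries may describe nothing but source.
        not_source = sorted(set(features) - {"source"})
        if not_source:
            problems.append(f"EST allows only 'source'; found {not_source}")
    elif data_type == "genome":
        # Any feature beyond source/assembly_gap means the genome is annotated,
        # which requires a locus_tag prefix reserved at BioSample submission.
        extra = set(features) - ALWAYS_ALLOWED
        if extra and not locus_tag_prefix_reserved:
            problems.append("annotated genome needs a locus_tag prefix reserved with BioSample")
    return problems
```

For example, `check_features("EST", ["source", "CDS"])` reports one problem, and `check_features("genome", ["source", "CDS"], locus_tag_prefix_reserved=True)` reports none.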
2. Make submission files
Submission files required for MSS
Prepare the following files to submit your sequence data.
- Sequence file
- A text file containing all nucleotide sequences in a FASTA-like format.
- Details: Submission file format: Sequence file.
- Annotation file
- A tab-delimited text file containing your data other than sequences, such as submitter, reference and biological feature information.
- Details: Submission file format: Annotation file.
- AGP file (in case of CON entries)
- A tab-delimited text file of nine columns describing the order and orientation of the piece entries used to construct a CON entry. If the sequence can be built from the AGP file, you do not need a sequence file.
- Details: Submission file format: AGP file.
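The sequence and annotation files can be generated from a script instead of by hand. The sketch below assumes a layout that should be confirmed against the file format pages referenced above: each sequence-file entry starts with a `>` line holding the entry name and ends with a `//` line, and each annotation row has the five tab-separated columns Entry, Feature, Location, Qualifier and Value. The function names and the 60-character line width are hypothetical choices.

```python
def format_sequence_file(entries: dict[str, str], width: int = 60) -> str:
    """Render entries as '>name', wrapped sequence lines, then '//' (assumed layout)."""
    lines = []
    for name, seq in entries.items():
        lines.append(f">{name}")
        # Wrap the sequence at a fixed width (60 is an arbitrary choice).
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
        lines.append("//")  # assumed entry terminator
    return "\n".join(lines) + "\n"


def format_annotation_file(rows: list[tuple[str, str, str, str, str]]) -> str:
    """Render tab-delimited rows: Entry, Feature, Location, Qualifier, Value (assumed)."""
    out = []
    for row in rows:
        # Reject rows that would break the tab-delimited layout.
        if len(row) != 5 or any("\t" in c or "\n" in c for c in row):
            raise ValueError(f"malformed annotation row: {row!r}")
        out.append("\t".join(row))
    return "\n".join(out) + "\n"
```

The strings can then be written with, for example, `pathlib.Path("example.fasta").write_text(format_sequence_file(entries))`. The results still need to be checked with UME or Parser, since this sketch only handles layout.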
To submit TSA, complete genome, or draft genome (WGS or HTG) data, first register BioProject and BioSample, then describe their accession numbers in the annotation file.
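The nine-column AGP file described above can be pre-checked line by line for column-count and length mismatches. This sketch assumes the column order of the NCBI AGP specification (version 2.x): N and U lines are gaps, and all other component types are sequence pieces. DDBJ's AGP format page may impose further rules, so treat it only as a pre-check. The function name is hypothetical.

```python
GAP_TYPES = {"N", "U"}  # AGP gap component types; others are sequence pieces


def check_agp_line(line: str) -> list[str]:
    """Return problems found in one AGP data line (empty list if none)."""
    if line.startswith("#") or not line.strip():
        return []  # comment or blank line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    try:
        obj_beg, obj_end = int(cols[1]), int(cols[2])
    except ValueError:
        return ["object_beg/object_end must be integers"]
    span = obj_end - obj_beg + 1
    problems = []
    if cols[4] in GAP_TYPES:
        # Columns 6-9: gap_length, gap_type, linkage, linkage_evidence
        if not cols[5].isdigit() or int(cols[5]) != span:
            problems.append("gap_length does not match object span")
    else:
        # Columns 6-9: component_id, component_beg, component_end, orientation
        try:
            comp_len = int(cols[7]) - int(cols[6]) + 1
        except ValueError:
            return ["component_beg/component_end must be integers"]
        if comp_len != span:
            problems.append("component length does not match object span")
        if cols[8] not in {"+", "-", "?", "0", "na"}:
            problems.append(f"unexpected orientation {cols[8]!r}")
    return problems
```

For example, `check_agp_line("scaffold1\t1\t1000\t1\tW\tcontig1\t1\t1000\t+")` returns an empty list.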
Check submission files
Before submitting to DDBJ, check the files with the software tools provided by DDBJ.
- UME (Utilities for MSS file Error check)
- Verifies the syntax and format of the Sequence file and Annotation file, and the amino acid translation of CDS features. UME includes both Parser and transChecker.
- OS: Windows, Unix/macOS
- Details: UME User’s Manual.
- Parser
- Verifies the syntax and format of the Sequence file and Annotation file.
- OS: Unix
- Details: Parser User’s Manual.
- transChecker
- If your data include CDS features (protein-coding sequences), validates the amino acid translation.
- OS: Unix
- Details: transChecker User’s Manual.
Download: Validation tools for MSS data files
- The validation tools cannot create submission files. Prepare your submission files with a text editor, spreadsheet software, or another suitable application on your PC.
- Syntax errors, such as undefined characters or stray control codes, seriously obstruct the processing of submitted data and may significantly delay the issue of accession numbers.
- If your annotation describes protein-coding sequences, check the annotation file containing the CDS feature(s) with UME or transChecker before submitting to DDBJ.
- Before installing the validation tools, read the End-user license agreement.
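Undefined characters and control codes often come from spreadsheet exports or copy-and-paste, so a quick scan for them can be run before UME. This is a hypothetical pre-check, not a DDBJ tool. It assumes that anything other than printable ASCII, tab and line feed is worth inspecting. Carriage returns are reported separately because they may or may not be accepted; UME remains the authoritative check.

```python
def find_suspicious_characters(data: bytes) -> list[tuple[int, int, str]]:
    """Return (line, column, description) for bytes other than printable ASCII, tab, LF."""
    issues = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0x09 or 0x20 <= byte <= 0x7E:
                continue  # tab or printable ASCII
            if byte == 0x0D:
                desc = "carriage return (CR)"
            elif byte < 0x20 or byte == 0x7F:
                desc = f"control code 0x{byte:02X}"
            else:
                desc = f"non-ASCII byte 0x{byte:02X}"
            issues.append((lineno, col, desc))
    return issues
```

Typical use is `find_suspicious_characters(pathlib.Path("annotation.tsv").read_bytes())`. The scan works on raw bytes, so it also catches multi-byte characters such as full-width spaces that a text editor may display as ordinary whitespace.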
3. Review the submission files
After validating the sequence and annotation files with the check tools, send them to DDBJ.
- Before preparing the complete sequence and annotation files, you can send part of your data on a trial basis and ask DDBJ whether the submission files are correctly formatted. (This step is optional.)
DDBJ reviews the submission files and sends the submitter any correction requests and/or inquiries. If the file contents have no problems, DDBJ issues accession number(s) for your data and notifies the contact person of them by email.
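For the optional trial step, a subset of the data can be cut from the full files so that the sample stays consistent between the two files. This sketch makes several assumptions that should be checked against the format pages: sequence entries end with a `//` line, annotation rows are tab-delimited with the entry name in the first column, a row with an empty first column continues the previous entry, and shared metadata sits in an entry named `COMMON`. The function name is hypothetical.

```python
def trial_subset(seq_text: str, ann_text: str, k: int) -> tuple[str, str]:
    """Keep the first k sequence entries and their annotation rows (sketch)."""
    # Split the sequence file into entries terminated by "//" (assumed layout).
    entries, current = [], []
    for line in seq_text.splitlines():
        current.append(line)
        if line.strip() == "//":
            entries.append(current)
            current = []
    kept = entries[:k]
    names = {e[0][1:].strip() for e in kept if e and e[0].startswith(">")}
    names.add("COMMON")  # assumed name of the shared-metadata entry

    rows, entry = [], None
    for row in ann_text.splitlines():
        first = row.split("\t")[0]
        if first:
            entry = first  # an empty Entry cell continues the previous entry (assumption)
        if entry in names:
            rows.append(row)

    seq_out = "\n".join(line for e in kept for line in e) + "\n"
    return seq_out, "\n".join(rows) + "\n"
```

Selecting whole entries, rather than truncating the files, keeps every annotation row pointing at a sequence that is present in the trial sample.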
File transfer
- Attach to e-mail
- File transfer by SCP
- If the total size of the files exceeds 10 MB, we recommend transferring them by SCP using a public/private key pair.
- Visit the DDBJ Submission Portal D-way to obtain a D-way login account and to upload files.
For details, see Upload sequence data or Tutorial movies.
Tutorial movies:
- Generate key pair (Windows / macOS)
- Upload data files (Windows / macOS)
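The 10 MB guideline above depends on the combined size of all submission files, not on any single file. This sketch computes that total and suggests a transfer method. Reading "10 M bytes" as 10 × 1024 × 1024 bytes is an assumption (10,000,000 is an equally plausible reading), and both function names are hypothetical.

```python
import os

# "more than 10 M bytes" from the text; the exact byte count is an assumption.
SCP_THRESHOLD_BYTES = 10 * 1024 * 1024


def total_size(paths: list[str]) -> int:
    """Sum the sizes in bytes of all submission files."""
    return sum(os.path.getsize(p) for p in paths)


def suggest_transfer(total_bytes: int) -> str:
    """Suggest 'SCP' above the threshold, otherwise an e-mail attachment."""
    return "SCP" if total_bytes > SCP_THRESHOLD_BYTES else "e-mail"
```

Usage: `suggest_transfer(total_size(["seq.fasta", "annotation.tsv"]))`, with your own file names.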
4. Distribution
If you do not set a hold-date, your data will be released immediately.
If you set a hold-date for your data, we will release the data according to the principle of “Hold-Until-Published” data release.
Registered data are published in the flat file format defined by DDBJ. For how annotation files correspond to flat files, refer to the figure “correspondence between annotation files and flat files”.
DFAST for the submission of prokaryote genomes
DFAST(DDBJ Fast Annotation and Submission Tool)
DFAST is a rapid annotation pipeline service for prokaryotic genomes that also generates annotation files which can be submitted directly to DDBJ. We strongly recommend that submitters use DFAST to register prokaryotic genomes in the Annotated/Assembled Sequences database.
Registration procedure for the prokaryote genome
- To register a prokaryotic genome and its annotation in DDBJ, you need a D-way account obtained through DFAST. BioProject and BioSample must be registered in advance, as must a locus_tag prefix if biological features are described.
- If you log in to DFAST with a D-way account, you can manage the jobs analyzed in DFAST. If you do not yet have a login account, see “Create a D-way account in the website” to get one.
How to submit the data obtained in DFAST
- Log in to DFAST with your account. Upload the FASTA file on the “job submission page” and start the job to analyze the genome; a job ID is issued at this stage. When the job has finished, click the “DDBJ submission” tab on the page. After you fill in the form in the metadata section, the annotation and sequence files needed for MSS submission are created. (*1)
- On the job management page, check the job number that you would like to submit to DDBJ.
- Select “MSS” as the file format type, and click “DOWNLOAD” to download the submission files. Check the metadata carefully. If a warning appears, review and correct the metadata you entered (*2). To edit the annotation and metadata as text, download the files and open them in a text editor.
- Apply for the submission through the “Application form for MSS”. Then, following the process described in “The Flow of MSS”, send the submission files downloaded from DFAST to DDBJ.
*1 You can use DFAST and obtain genome annotation results without logging in. In that case, keep a record of the job ID; after you log in to DFAST, you can import the job into your account with the job history function on the menu bar.
*2 DFAST's metadata check is only basic, so DDBJ curators may still ask you to correct the files after you submit the data.