Nucleotide Sequence Submission
Before Nucleotide Sequence Submission
Purpose and Significance of Nucleotide Sequence Submission
Many scientific journals require that a nucleotide sequence reported in a research paper be assigned an INSDC accession number, and that the number be cited in the paper. DDBJ is a member of the International Nucleotide Sequence Database Collaboration (INSDC).
If you wish to make your sequence public through DDBJ and the sequence is acceptable to DDBJ, you can submit it even if you have no plan to publish a research paper related to it.
Once released, nucleotide sequences submitted to INSDC members, including DDBJ, are available to everyone.
Submitting nucleotide sequences to DDBJ does NOT give you any priority for a patent.
New Submission or Update?
If you are unsure whether your sequence data should be submitted as a new entry or whether your previous entry should be modified, please contact us via the Contact form “Data updates / Corrections”.
The Nucleotide Sequence Submission System is for new submissions only; do not use it to send update requests. If you need to modify your previous entry, see the update request page and contact us via the Application Form for Data Update Requests.
Rights and Duties of Submitter
To submit a sequence to DDBJ, submitters must provide not only the nucleotide sequence but also the addresses of the submitters and the contact person, reference(s) (including the primary citation), the names of the source organisms, the functions and nature of the genes, and so on (collectively, the “registration information” of the entry).
DDBJ releases its data in the DDBJ flat file format. In principle, the submitter(s) and the contact person of an entry are shown in REFERENCE 1 of the DDBJ flat file.
As research progresses, personnel change, or errors are found, the submitters of an entry can revise and/or update their own nucleotide sequence and registration information.
As mentioned above and on the page explaining the data flow, nucleotide sequences released from DDBJ are available to everyone.
When a user other than the submitter points out error(s) in an entry, DDBJ passes the report on to the contact person of the entry. Because only the submitters of an entry can revise and/or update it, it is up to the submitters whether the entry is modified in response to the user’s claim.
As a rule, submitters are required to respond to users’ inquiries about their own entries.
If you wish to contact the submitter(s) of an entry of interest, please send us a brief explanation of your reasons through the inquiry form, and we will forward your message to the submitter(s).
Submitters should therefore not block E-mails from DDBJ.
When users and submitters disagree about the registration information of an entry, DDBJ remains neutral.
Releases of Primary Citation and Sequence Data
While the primary citation is being prepared or submitted, DDBJ can keep your registration information private.
If this is needed, submitters must include a hold date in their registration information.
An entry with a hold date is then stored privately at DDBJ.
DDBJ must keep the registration information confidential until the entry is published.
Held data will be made public according to the principle of data release.
In principle, even submitters cannot remove their own entry once it has been released or its accession number has been made public in a journal or elsewhere.
However, at the submitter’s request, DDBJ can suppress the entry in many of its services.
Required items for nucleotide sequence submission
The affiliation, postal address and phone number of the contact person, and the names of all submitters, are required.
Some of these items are shown in REFERENCE 1 of the flat file of the entry.
Since 2008, the E-mail address, phone number and fax number of the contact person are not displayed unless the submitters request their disclosure.
- Notice: An entry should not have only one submitter.
- The submitter of an entry is the person who is responsible for the data submitted in the entry.
We accept update requests only from the original submitter of the entry.
We strongly recommend listing two or more joint submitters, e.g. at least the person who actually did the work and an adviser, so that contact is not lost in the future.
In principle, we cannot accept sequence data from a student unless the student’s adviser is also named as a submitter.
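The requirements in this subsection can be turned into a simple pre-submission self-check. The sketch below is not part of any DDBJ tool: the class and function names are hypothetical, and it encodes only rules stated on this page, namely at least two submitters, the contact person listed among the submitters (stated elsewhere on this page as the rule "in principle"), and a non-empty affiliation, postal address and phone number for the contact person. It cannot check the student/adviser rule, which depends on who the people are.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data model for a pre-submission self-check (not a DDBJ API).
@dataclass
class ContactPerson:
    name: str
    affiliation: str
    postal_address: str
    phone: str

@dataclass
class SubmitterInfo:
    submitters: List[str]  # names of all submitters
    contact: ContactPerson

def check_submitter_info(info: SubmitterInfo) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # An entry should not have only one submitter.
    if len(info.submitters) < 2:
        problems.append("list at least two submitters (e.g. the worker and an adviser)")
    # In principle, the contact person must be one of the submitters.
    if info.contact.name not in info.submitters:
        problems.append("the contact person should be one of the submitters")
    # Affiliation, postal address and phone number of the contact person are required.
    for field_name in ("affiliation", "postal_address", "phone"):
        if not getattr(info.contact, field_name).strip():
            problems.append(f"contact person {field_name} is required")
    return problems

# Hypothetical example: one submitter and an empty phone number.
info = SubmitterInfo(
    submitters=["A. Worker"],
    contact=ContactPerson("A. Worker", "Example Univ.", "1-1 Example St.", ""),
)
for problem in check_submitter_info(info):
    print(problem)
```

With this example input, the check reports two problems: the single submitter and the missing phone number.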
Date of data release to the public
Submitters can choose the release status of their data: either “immediately release” or “hold until published”.
The “hold date” is the date on which distribution of the entry starts. The submitter can specify this date if necessary.
If you select “hold until published”, you must specify the “hold date” of your data.
Reference: Principle of “Hold-Until-Published” data release
Number of sequences
If you would like consecutive accession numbers, fix the number of entries before you submit.
Even if your sequence is identical to previously reported sequence(s), you can submit it as a “new” entry, provided that the sequence was determined independently.
As a rule, DDBJ accepts all independently determined sequence data, even if the sequences are identical to each other.
For variation studies, however, DDBJ also accepts submissions of multiple identical sequences together with their frequency and the total number of samples.
DDBJ recommends organizing data from variation studies into an appropriate set of entries; basically, the number of entries should equal the number of sequence polymorphisms multiplied by the number of sampled populations.
For details, see also representative submissions of identical sequences for variation studies.
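The rule of thumb for variation studies is a simple product. As a sketch, assuming that each planned entry corresponds to one combination of a polymorphism (for example, a haplotype) and a sampled population, the expected count can be cross-checked against an enumerated plan. The function names and labels below are hypothetical; the representative submissions page describes how DDBJ actually expects such entries to be written.

```python
from itertools import product
from typing import List, Tuple

def expected_entry_count(n_polymorphisms: int, n_populations: int) -> int:
    # Rule of thumb from the text: entries = polymorphisms x sampled populations.
    return n_polymorphisms * n_populations

def planned_entries(polymorphisms: List[str], populations: List[str]) -> List[Tuple[str, str]]:
    # Assumption: one (polymorphism, population) pair per planned entry.
    return list(product(polymorphisms, populations))

plan = planned_entries(["hap1", "hap2", "hap3"], ["pop_A", "pop_B"])
assert len(plan) == expected_entry_count(3, 2)
print(len(plan))  # 6
```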
Scientific paper, REFERENCE
You must provide the authors and title of the main paper for the sequence as the primary citation. Even if you have no plan to submit a paper about your sequence, please enter authors and a title formally.
If necessary, you can also list referenced papers that do not describe the submitted sequence itself.
Biological knowledge related to nucleotide sequence
Whether or not the species has been identified, you must describe the biological origin of your sequence, including the organism name and other relevant information.
As annotation for your sequence, features should be described wherever possible.
Describe features such as protein coding sequences (CDS), rRNA, tRNA and ncRNA together with their locations. Please also add qualifiers such as product and gene as appropriate.
- Notice: a CDS (protein coding sequence) feature should have gene and product qualifiers.
- See also the guideline of gene nomenclature at DDBJ before your submission.
In general, we accept sequences that are reliable enough to be believed to exist ‘as is’ in nature.
It is inappropriate to submit sequencer output ‘as is’ without scrutinizing it.
See Nucleotide Sequences for details.
Workflow of data submission to DDBJ
1 Data Submission
(A) Nucleotide Sequence Submission System (NSSS)
DDBJ generally recommends using the Nucleotide Sequence Submission System.
(B) Mass Submission System (MSS)
NSSS cannot accept the sequence data listed below.
Nucleotide sequence data that fall into any of the following cases should be submitted via MSS.
Note that the criteria are not limited to the number or length of your sequences.
a) Any of the following categories or amounts of sequence data
- EST, STS, TSA,
- See Categories for Sequence Data for details.
- Submissions containing long sequences, greater than 500 kb in length
- Complex submissions with many features on one sequence, more than 30 features
- Submissions consisting of a large number of sequences, greater than 100 in total
b) Sequence data of whole-length replicons, whether at finished or draft level
- (Nuclear) genome
- Organelle genome
- Virus/Phage genome/segments
c) Sequence data that refer to a BioProject or BioSample in DBLINK
Cases in which you need DBLINK to link to a BioProject or BioSample include, but are not limited to, the following.
- Sequence data from metagenome analyses, environmental profiling, and so on
- Sequence data of targeted genes that should be linked to each other
- When you are planning to submit, or have already submitted, whole-genome-scale data obtained from the same samples
- Prokaryotic 16S rRNA gene sequences that must be submitted for a phylogenetic report
- Submission, in advance of a paper, of any other targeted gene(s)/cluster region(s)
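Criteria (a) to (c) amount to a routing rule between NSSS and MSS. The following sketch, assuming a hypothetical in-memory description of a submission, returns which system to use. The category set holds only the categories named on this page, "500 kb" is taken as 500,000 bases, and all class, field and function names are illustrative; Categories for Sequence Data and DDBJ staff remain the authority.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only the categories named on this page; the full list is in "Categories for Sequence Data".
MSS_CATEGORIES = {"EST", "STS", "TSA"}

@dataclass
class Sequence:
    length_bp: int   # sequence length in bases
    n_features: int  # number of features annotated on this sequence

@dataclass
class Submission:
    category: Optional[str]          # e.g. "EST"; None for an ordinary submission
    sequences: List[Sequence]
    complete_replicon: bool = False  # (b) genome, organelle genome, virus/phage genome/segments
    needs_dblink: bool = False       # (c) links to a BioProject or BioSample

def choose_system(sub: Submission) -> str:
    """Return "MSS" if any criterion above applies, otherwise "NSSS"."""
    if sub.category in MSS_CATEGORIES:
        return "MSS"
    if len(sub.sequences) > 100:                           # more than 100 sequences
        return "MSS"
    if any(s.length_bp > 500_000 for s in sub.sequences):  # longer than 500 kb
        return "MSS"
    if any(s.n_features > 30 for s in sub.sequences):      # more than 30 features
        return "MSS"
    if sub.complete_replicon or sub.needs_dblink:
        return "MSS"
    return "NSSS"

print(choose_system(Submission(None, [Sequence(1_200, 3)])))  # NSSS
print(choose_system(Submission("EST", [Sequence(450, 1)])))   # MSS
```

The thresholds are strict inequalities, as in the text, so in this sketch exactly 100 sequences or exactly 30 features on a sequence still routes to NSSS.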
2 Annotation
We annotate in accordance with our rules and the international rules agreed upon by the DDBJ/ENA/GenBank consortium. During annotation, we may contact the Contact Person with inquiries about the data.
3 Assignment and Notification of Accession Number
We start curation and report any problems within 7 working days of receiving the submission. Accession numbers are sent to the contact person’s E-mail address after the problems have been corrected.
4 Report of Data Releasing
We notify the Contact Person of the data release by E-mail. Once the data are released, please check them with one of the retrieval tools accessible from the DDBJ homepage (e.g., getentry).
If you would like to update your data, please send a request via the Application Form for Data Update Requests with the necessary information. See Updates/Correction (after getting your accession number) for details.
5 General Information
|Purpose|Where to contact|
|---|---|
|For general inquiries about DDBJ|Contact form|
|For data submission|Contact form|
|For updating submitted data|Application Form for Data Update Requests|
Sequence Data Transition
The following figure shows the data flow at DDBJ from new submission to release and update.
It is now usual practice for authors to obtain accession numbers for their sequences from DDBJ (or ENA or GenBank) when they submit articles to journals.
* You can submit your sequences to DDBJ even if you have no plan to publish an article related to them.
Nucleotide Sequence Submission
DDBJ accepts nucleotide sequence submissions via the Nucleotide Sequence Submission System or the Mass Submission System, and issues an accession number for each sequence after processing the submitted data.
Hold until Publication
When submitting, the submitter can specify whether or not the data should be made public through DDBJ immediately. To hold the data until publication, the submitter must specify a hold date.
Release of Sequence Data
DDBJ releases data marked for immediate release as soon as possible after processing. An entry marked to be held until publication is released according to the principle of data release. Once the accession number of a held entry is published, the entry is released without exception and without the submitter’s permission. Anyone can ask DDBJ to release unpublished data whose accession numbers appear in published papers.
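The release rules above can be summarized as a predicate. This is a simplified sketch under stated assumptions: the status strings and the function name are hypothetical, `processed` stands for DDBJ having finished processing, and a held entry is treated as releasable once its hold date arrives or its accession number is published, whichever comes first. The principle of data release remains authoritative.

```python
from datetime import date
from typing import Optional

def is_releasable(status: str, processed: bool, hold_date: Optional[date],
                  today: date, accession_published: bool) -> bool:
    """Decide whether an entry may be made public (simplified sketch)."""
    if status == "immediate":
        # Immediate-release data go public as soon as processing is finished.
        return processed
    if status == "hold":
        # "Hold until published" requires a hold date.
        if hold_date is None:
            raise ValueError("'hold until published' requires a hold date")
        # A published accession number releases the entry regardless of the hold date.
        return accession_published or today >= hold_date
    raise ValueError(f"unknown status: {status!r}")

print(is_releasable("hold", True, date(2030, 1, 1), date(2025, 6, 1), True))  # True
```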
Availability of Released Data
Data released from DDBJ are first available via getentry and anonymous FTP. The data are then forwarded to GenBank and ENA and become available through them as well. They are also incorporated into DDBJ services such as Search and Analysis and ARSA. In principle, data released from DDBJ are available to everyone.
Citation of Released Data
Data released from DDBJ/ENA/GenBank are cited by many biological databases.
Feedback on Released Data
If you have comments on or questions about released data, please contact the submitters of the entry. If you cannot contact the submitters directly, please contact us through the inquiry form, briefly stating your reasons.
- Only the submitters of an entry can update and modify it. After modifying the data, the submitter can again choose either immediate release or hold until publication. In principle, however, an entry that has already been made public cannot be returned to hold status.
- Submitter
- In principle, the “submitter” of an entry is the person who is responsible for the data submitted in the entry.
Only the submitter can update his/her entry. The submitter is responsible for replying to inquiries from DDBJ or DDBJ users about his/her data. In principle, the submitter is shown in REFERENCE 1 of the DDBJ flat file.
- Contact person
- The “Contact person” is responsible for the descriptions in the entry and, as a representative, has a duty to correspond with DDBJ and its users.
- In principle, the “Contact person” must be one of the submitters.
- In principle, the “Contact person” communicates with DDBJ and its users about the entry, so do not block E-mails from DDBJ.
- In principle, the Contact person is shown in REFERENCE 1 of the DDBJ flat file. If you wish to contact the submitter(s) of an entry of interest, please send us a brief explanation of your reasons through the inquiry form, and we will forward your message to the submitter(s).
- Accept date
- In principle, the “Accept date” is the date on which DDBJ receives original data sufficient to assign an accession number.
- Hold date
- The “Hold date” is the date on which distribution of the entry starts. The submitter can specify this date if necessary.
Reference: Principle of “Hold-Until-Published” data release
- Working day
- DDBJ Center is closed on Saturdays and Sundays, Japanese national holidays, the year-end and New Year holidays (December 29 to January 3), and the summer holidays of the Research Organization of Information and Systems (two days in August). See also the DDBJ Calendar.
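To estimate a deadline such as the 7 working days mentioned in the workflow, these calendar rules can be applied day by day. A minimal sketch, assuming that counting starts on the day after receipt: weekends and December 29 to January 3 are built in, while Japanese national holidays and the two August summer holidays change from year to year and must be supplied by the caller from the DDBJ Calendar. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable

def is_working_day(d: date, extra_holidays: frozenset) -> bool:
    if d.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return False
    if (d.month == 12 and d.day >= 29) or (d.month == 1 and d.day <= 3):
        return False  # year-end and New Year holidays
    return d not in extra_holidays  # national and summer holidays supplied by the caller

def add_working_days(start: date, n: int, extra_holidays: Iterable[date] = ()) -> date:
    """Return the date that is n working days after start (start itself is not counted)."""
    holidays = frozenset(extra_holidays)
    d, counted = start, 0
    while counted < n:
        d += timedelta(days=1)
        if is_working_day(d, holidays):
            counted += 1
    return d

# Received Friday 2024-12-27; 2025-01-13 (Coming of Age Day) supplied as a national holiday.
print(add_working_days(date(2024, 12, 27), 7, [date(2025, 1, 13)]))  # 2025-01-15
```

Leaving out the national holiday in this example would give 2025-01-14 instead, which is why the holiday list has to come from the current DDBJ Calendar.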
- Flat file
- The “Flat file” is the format in which DDBJ distributes its data.
Reference: Explanation of DDBJ flat file Format
- Entry
- An “Entry” is the unit of data in DDBJ and INSDC. The database is a collection of entries.
Reference: Explanation of DDBJ flat file Format
- Primary entry
- A “Primary entry” is an entry that is publicly available in the DDBJ/ENA/GenBank databases and whose sequence has been determined experimentally by the submitter.
See also: TPA (Third Party Data)
- Primary citation
- The “Primary citation” is the main paper for the sequence of the entry. In principle, the primary citation is shown in REFERENCE 2 of the DDBJ flat file.
Because REFERENCE 2 indicates the publication status of the sequence, a reference that does not describe the submitted sequence is listed as REFERENCE 3 or later, not as REFERENCE 2.