The DDBJ Nucleotide Sequence Submission System can be used only for new submissions; you cannot use it to update submitted data.
To update your data, see Data Updates/Corrections.
In principle, both of the following conditions must be met before your sequence data can be deleted:
1) The sequence data have not yet been made public.
2) The accession number of the data has not yet been published.
Please send your request to
with the following contents in clear English.
For your information, we can restrict access to sequence data that have already been made public, if certain conditions are met.
See also the following item.
See Categories for Sequence Data.
If you are not sure which category your sequence data belong to, see the following:
If you still have any questions, please contact us via the contact form, selecting the item "Data Submission".
If your annotation meets the requirements of TPA Submission Guidelines, DDBJ can accept it as TPA (Third Party Data).
Choose one of the following two methods.
In general, we recommend using the DDBJ Nucleotide Sequence Submission System.
For a large number of sequences, many features, and/or long sequences, the Mass Submission System (MSS) is more suitable.
In general, you can submit amino acid sequences by describing CDS features on your nucleotide sequences.
However, DDBJ does not accept amino acid sequences alone, i.e. without any nucleotide sequence.
In that case, please submit them directly to UniProt.
You can submit amino acid sequences to UniProt through SPIN.
Please contact datasubs@ebi.ac.uk.
DDBJ automatically assigns a protein_id when your nucleotide sequence with a CDS feature is released.
DDBJ cannot accept assembled EST sequences on their own. However, DDBJ can accept assembled EST sequences as TSA together with the original (i.e. pre-assembly) sequence data. See also Data Submission from Transcriptome Project.
If the original sequence data (primary entries) were generated by next-generation sequencers, submit them to the DDBJ Sequence Read Archive (DRA); if they were generated by traditional sequencers, submit them as EST via the Mass Submission System (MSS).
DDBJ can then accept the assembled sequences (from both de novo and reference-mapping assembly) as TSA through MSS.
See Organism qualifier.
For details, see the case that applies:
2. Unidentified species names, proposals of new species, etc.
3. Environmental samples
For sequences obtained by direct molecular isolation from soil, sea water, etc., i.e. from a bulk environmental DNA sample by PCR (with or without subsequent cloning of the product), DGGE, or other anonymous methods, see What is ENV? – environmental samples.
For description of organism qualifier, see 3. Environmental samples.
Although frequently confused, the term 'environmental samples' does NOT mean "wild type". Sequences derived from isolated or cultured organisms are not classified as environmental samples.
For description of organism qualifier, see 4. Artificially constructed sequences.
You can use the /experiment or /inference qualifier to describe the evidence (experimental or inferred) supporting each feature.
See Categories for Sequence Data.
Please submit raw reads generated by next-generation sequencers to the DDBJ Sequence Read Archive (DRA).
See also Data Submission from Genome Project or Data Submission from Transcriptome Project.
If necessary, submit assembled sequences through the Mass Submission System.
Please submit raw reads of sequence-based expression data to the DDBJ Sequence Read Archive.
For sequence data related to the Barcode of Life project, please submit via the DDBJ Nucleotide Sequence Submission System or the Mass Submission System.
For chromatograms (traces), please submit them to the DDBJ Trace Archive.
As a rule, please submit every sequence that you have experimentally determined, whether it is derived from genomic DNA, mRNA, or any other source.
In principle, DDBJ accepts each experimentally determined sequence as a single contiguous sequence.
You can describe mRNA, CDS, and other features as annotation on a genomic sequence; however, in general, an mRNA feature does not mean that the mRNA sequence itself was experimentally determined.
If you have sequenced the mRNA, please submit the mRNA sequences to DDBJ. See also Acceptable data for DDBJ.
Yes, you can. The 'Instructions to Authors' of most journals probably require sequence data to be submitted to DDBJ (or EMBL-Bank or GenBank) before the paper is submitted.
When submitting sequence data, select the status of your REFERENCE as follows.
Your citations will appear as REFERENCE 2 and later in the DDBJ flat file.
DDBJ accepts sequence data submissions regardless of whether you plan to publish an academic paper.
If you have no plans to publish a paper, fill in the following REFERENCE items.
If your plans change after submission, i.e. you publish a paper, contact us via this form with the subject "Our paper was published".
Even if the journal does not require sequence data to be submitted to DDBJ (or EMBL-Bank or GenBank), we strongly recommend submitting them to DDBJ so that the data are available to readers of your paper.
DDBJ accepts updating requests only from the original submitter of the entry.
We strongly recommend listing at least two submitters, e.g. the person who actually did the work and an adviser, so that we do not lose contact with you in the future.
When sequence data are published, they are shared among DDBJ, EMBL-Bank, and GenBank. It is therefore necessary and sufficient to submit the data once, to any one of the three data banks.
If you submit sequence data to GenBank after submitting the same data to DDBJ, the data will be duplicated. Do not submit the same data to two or more data banks.
Although some journals instruct authors to submit sequence data to GenBank, accession numbers are shared by DDBJ, EMBL-Bank, and GenBank, which together make up INSD, so an accession number issued by DDBJ is equally valid.
Nucleotide sequence data related to patent applications are transferred from the Japan Patent Office (JPO) to DDBJ.
Therefore, you usually do not need to submit such sequence data to DDBJ yourself.
However, if you apply to another patent office, or if you need to publish a paper while your patent application is pending, check with the patent office whether you may submit the data to DDBJ.
Note that once sequence data are published by DDBJ, they become part of the public domain, as an "official notice".
Submitting nucleotide sequence data to DDBJ gives you NO priority for the data.
DDBJ takes no responsibility for any intellectual property or priority issues related to patents. For patent applications, consult the JPO or the relevant patent office.
DDBJ has no authority over gene nomenclature, and it has no official collaboration with any gene nomenclature committee. Unless there is a particular problem, gene names are recorded as provided by the submitter.
Even if you name a gene when submitting sequence data to DDBJ, there is no guarantee that the name will be accepted by the research community.
Check with the relevant gene nomenclature committee, e.g. the HUGO Gene Nomenclature Committee (HGNC) for human, MGI - Mouse Nomenclature for mouse, and so on.
In general, you can describe base substitutions using the variation feature with /replace and /note qualifiers.
When using the DDBJ Nucleotide Sequence Submission System, select 'other' as the template.
For the format of the feature annotation, see F01) polymorphism and variation in Example of Submission.
Although you can submit sequence data that include SNPs (single nucleotide polymorphisms) to DDBJ, the data will not automatically be reflected in dbSNP.
dbSNP is a database independent of INSDC, operated by NCBI.
For SNP data, we recommend submitting to dbSNP.
When submitting to DDBJ, see B13) polymorphism and variation in Example of Submission for the format of the feature annotation.
For instance, if a circular sequence is 199035 bp long and a CDS feature spans the origin, running from position 199001 to position 100, describe the location of the CDS feature as
join(199001..199035,1..100)
See also Description of Location in detail.
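As a rough illustration of the rule above, the following Python sketch builds such a location string for a feature on a circular sequence. The function name `circular_location` is hypothetical and not part of any DDBJ tool; it assumes 1-based, inclusive positions and treats any feature whose start is greater than its end as one that crosses the origin.

```python
def circular_location(start: int, end: int, seq_length: int) -> str:
    """Build a location string for a feature on a circular sequence.

    Positions are 1-based and inclusive. If start > end, the feature is
    assumed to cross the origin and is split into two joined spans.
    """
    if not (1 <= start <= seq_length and 1 <= end <= seq_length):
        raise ValueError("positions must lie within 1..seq_length")
    if start <= end:
        return f"{start}..{end}"
    # Crosses the origin: from start to the last base, then from base 1 to end
    return f"join({start}..{seq_length},1..{end})"


print(circular_location(199001, 100, 199035))  # join(199001..199035,1..100)
```

Splitting at the last base and restarting at 1 keeps both spans in ascending order, which is how the example location above is written.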
For feature annotation, we strongly recommend describing CDS (protein-coding sequence), rRNA, tRNA, and other features.
Please give us details when you apply to use the Mass Submission System.
First, please check that the appropriate genetic code is selected (see The Genetic Code).
Generally, if the /transl_table qualifier is set to the appropriate genetic code number, the nucleotide sequence is automatically translated into an amino acid sequence according to that genetic code.
For exceptional codons that do not follow the genetic code (e.g. selenocysteine), describe them with the /transl_except qualifier.
For RNA editing, ribosomal frameshifts, and mitochondrial TAA stop codons, see Example of submission and use /exception together with /translation, /ribosomal_slippage, and /transl_except, respectively.
In rare cases where translation starts with an amino acid other than methionine, begin the location of the CDS feature with "<", which operationally indicates that the 5' end is not complete, and briefly explain the translation mechanism in a /note qualifier.
See Contact person.
If your affiliation changed after sequencing, or if you belong to two or more institutes, enter the institute most responsible for the work as your representative affiliation.
In principle, accession numbers are sent to the contact person by e-mail (subject: "[DDBJ] Assigned Accession No.") within 5 working days (i.e. excluding holidays) after DDBJ accepts the submitted data.
See DDBJ Calendar for the working days of the DDBJ Center.
If you do not receive accession numbers, or an inquiry about your data, from DDBJ within 5 working days after submission, please contact us via the contact form, selecting the item "Data Submission".
Make sure that you do not block e-mails from DDBJ.
If you used the DDBJ Nucleotide Sequence Submission System, check whether you have received an e-mail from DDBJ with "DDBJ: Web submission completed" in its subject. This e-mail is sent automatically to the contact person when DDBJ accepts your sequence data via the Nucleotide Sequence Submission System.
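If you want to estimate when to expect the accession number e-mail, the 5-working-day rule can be sketched as date arithmetic. This Python example assumes that counting starts on the day after acceptance and that Saturdays and Sundays are non-working days; `add_working_days` is a hypothetical helper, and the holiday set is only an illustration that you would fill in from the DDBJ Calendar yourself.

```python
from datetime import date, timedelta


def add_working_days(accepted: date, days: int, holidays: frozenset = frozenset()) -> date:
    """Return the date `days` working days after `accepted`.

    Saturdays, Sundays, and any date in `holidays` are skipped.
    The holiday set is hypothetical; real holidays come from the DDBJ Calendar.
    """
    current = accepted
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday == 0 ... Sunday == 6
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Accepted on Friday 2024-03-01
print(add_working_days(date(2024, 3, 1), 5))  # 2024-03-08
# Same, but treating Monday 2024-03-04 as a holiday for illustration
print(add_working_days(date(2024, 3, 1), 5, frozenset({date(2024, 3, 4)})))  # 2024-03-11
```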
See Acceptable data for DDBJ.
If you have any questions, please contact us via the contact form.
If your data have an ID other than an accession number, such as an EntryID, contact us via the contact form, selecting the item "Updating Submitted Data", and include the ID and the e-mail address of the contact person.
If you are unsure, tell us as much of the following as you know, and we will search for your data.
If we cannot find your data, we will ask you to submit them as a new submission.
In general, check the journal's rules (i.e. Instructions to Authors) and follow them.
INSDC recommends giving the accession numbers in a footnote on the title page of your paper, as follows:
Note: Nucleotide sequence data reported are available in the DDBJ/EMBL/GenBank databases under the accession number(s) ----.
The term "Direct Submission" indicates that the data were submitted directly by the submitter. It is the opposite of "journal scan".
REFERENCE 1 contains information about the submitter(s); it is not a general literature reference.
Therefore, do not write "Direct Submission" as the title of the literature in REFERENCE 2 or later.
In general, you can find the accept date in the JOURNAL line of REFERENCE 1 in the DDBJ flat file.
Please note that some old entries do not include the accept date.
See "Why is the hold-date required?". Please specify the date.
Although DDBJ does not restrict the date, we strongly recommend setting it no more than two years ahead.
If no date is specified, the data will be published immediately.
After submission, you can change the hold date as needed.
Contact us via this form, selecting "Change the hold-date" in [Subject].
If you set the hold date for your data, the data will be published according to Principle of “Hold-Until-Published” data release.
When your data are published, an e-mail with "[DDBJ] Publicized your data" in its subject is sent to the contact person.
Therefore, do not block e-mails from DDBJ.
If the contact person's information is out of date or invalid, we may be unable to notify you of the publication of your data or of other important announcements.
To update it, contact us via this form, selecting the subject "Change the contact person, belonging, institution, etc.".
For entries that have already been published, we can restrict use of the data if certain conditions are met.
If use is restricted, DDBJ excludes the data from its periodical releases and removes them from all DDBJ services.
However, the data remain permanently available through getentry when queried by accession number.
# This rule does not apply when the data were published because of an error on the INSD side.
This policy is stated in Overview of International Nucleotide Sequence Databases Policies, a document prepared by the International Advisory Committee of INSD, as follows:
All database records submitted to the INSD will remain an entry accessible as part of the scientific record. Corrections of errors and update of the records by authors are welcome and erroneous records may be removed from the next database release, but all will remain permanently accessible by accession number.
In addition, a number of other databases are built partly from INSD data.
DDBJ cannot delete data from such databases. To delete cited data from another database, contact the managing staff of that database directly.
See Definition of Feature Key and Feature Table Definition.
If you cannot find a suitable feature, use misc_feature and enter the information as the value of a /note qualifier.
For instance, since DDBJ is a nucleotide sequence database, there is no specific feature for amino acid sequence motifs.
However, you can describe such information using misc_feature with a /note qualifier.
The amino acid sequence of a CDS feature is translated automatically from the nucleotide sequence according to its location and other items, and stored in the /translation qualifier. In general, therefore, do not enter it yourself.
The rules for translating a nucleotide sequence into an amino acid sequence follow the agreements of the International Nucleotide Sequence Database Collaboration.
The codon table used for a CDS feature is specified as the value of the /transl_table qualifier, using a genetic code number from The Genetic Codes.
There are three points that are frequently misunderstood.
There are some exceptional cases, such as RNA editing.
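To make the roles of /codon_start and /transl_table concrete, here is a minimal Python sketch of conceptual translation. It covers only the Standard Code (transl_table 1), written as the usual 64-letter string in TCAG codon order; the names `translate` and `STANDARD_TABLE` are hypothetical. It deliberately ignores /transl_except, alternative initiation codons, and the exceptional cases mentioned above, and maps any codon containing an ambiguity code to "X", so it is not DDBJ's own translation procedure.

```python
from itertools import product

# Standard Code (transl_table 1): amino acids for codons in TCAG order,
# first base varying slowest (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}


def translate(cds: str, codon_start: int = 1, table: dict = STANDARD_TABLE) -> str:
    """Conceptually translate a CDS; '*' marks a stop, 'X' an untranslatable codon."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    frame = cds.upper()[codon_start - 1:]
    # Step through complete codons only; trailing 1-2 bases are ignored
    return "".join(table.get(frame[i:i + 3], "X") for i in range(0, len(frame) - 2, 3))


print(translate("ATGGCCTAA"))                  # MA*
print(translate("CATGGCCTAA", codon_start=2))  # MA*
```

Shifting the reading frame with /codon_start is just an offset into the sequence, which is why a wrong value there produces a completely different (often stop-ridden) translation.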
The Nucleotide Sequence Submission System is an interactive application that guides you step by step through entering all the items required for your submission.
To use the Mass Submission System (MSS), submitters have to prepare the submission files themselves; DDBJ reviews the files and advises submitters while they prepare them.
Some submitters use the Nucleotide Sequence Submission System to submit many sequences, while others use MSS to submit only a few.
Based on the above, choose whichever suits your needs.
There is no restriction on the number of entries for using the Mass Submission System.
You can use it not only for many sequences but also for a single long sequence with many features (e.g. an annotated complete genome).
See Mass Submission System.
See Before your nucleotide sequence submission.
You can use VecScreen.
On 6. template, either a) select 'other' and click [Input annotation], or b) click [Upload annotation file].
You can then describe two or more features for each sequence as follows.
For a), see 7.Annotation (when “other” was selected at template).
For b), see 7. Annotation: upload an annotation file.
Since DDBJ constructs the DEFINITION according to its rules, there is no field for entering it.
Click [Select Qualifier], check the qualifiers you need in the dialog, and click the [Save] button.
Input fields for those qualifiers then appear on 7.Annotation.
Related to this, if you select "other" on 6. template, you must specify at least one feature other than source. Click [Add feature] and select a feature from the list.
These errors mean that the amino acid translation of the CDS (protein coding sequence) feature is not appropriate at the 5' or 3' end, respectively.
When the CDS feature is not complete (i.e. partial) at the 5' and/or 3' end, its location must include a 'not complete' flag.
According to the rules in Description of Location, partial sequences should be marked in the feature location with "<" for a 5' end that is not complete and/or ">" for a 3' end that is not complete.
| Location | Condition |
|---|---|
| <1..295 | [does not start with an initiation codon] and [ends with a termination codon] |
| 1..>295 | [starts with an initiation codon] and [does not end with a termination codon] |
| <1..>295 | [does not start with an initiation codon] and [does not end with a termination codon] |
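The table can be read as a simple rule: "<" when the CDS lacks an initiation codon at its 5' end, ">" when it lacks a termination codon at its 3' end. The Python sketch below encodes that rule for a plus-strand CDS with a single start..end span. It is only an illustration under stated assumptions (Standard Code stop codons, ATG as the only initiation codon, no complement() or join() locations), and both function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # Standard Code (transl_table 1) only


def partial_location(start: int, end: int, five_prime_partial: bool,
                     three_prime_partial: bool) -> str:
    """Add the '<' / '>' 'not complete' flags to a simple start..end location."""
    left = f"<{start}" if five_prime_partial else str(start)
    right = f">{end}" if three_prime_partial else str(end)
    return f"{left}..{right}"


def location_for_cds(cds: str, start: int, end: int) -> str:
    """Choose flags by checking the first and last codon of a plus-strand CDS.

    Assumes ATG is the only initiation codon, which is a simplification.
    """
    seq = cds.upper()
    starts_with_initiator = seq[:3] == "ATG"
    ends_with_terminator = seq[-3:] in STOP_CODONS
    return partial_location(start, end, not starts_with_initiator, not ends_with_terminator)


print(location_for_cds("GCCGCCTAA", 1, 9))  # <1..9
print(location_for_cds("ATGGCCGCC", 1, 9))  # 1..>9
print(location_for_cds("GCCGCCGCC", 1, 9))  # <1..>9
```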
This error message appears because you selected /translation for the CDS feature in the dialog opened by the [Select Qualifier] button.
Since the /translation qualifier is generated automatically from the items under the CDS feature, do not enter any amino acid sequence.
You can fix the error by removing the /translation qualifier.
For your information, the /translation qualifier is required only when the /exception qualifier is used.
Typically, the /exception qualifier indicates that "RNA editing" occurs on the mRNA. In that case, the conceptual amino acid translation of the genomic sequence differs from the protein product of the actual mRNA molecules.
This error occurs because the correct genetic code has not been entered.
See 7.Annotation -- How to input an organism name.
To specify the genetic code, enter its number in the input field.
The value is automatically applied to the /transl_table qualifier of the CDS feature.
For your information, for a previously reported organism, the genetic code is specified automatically from the scientific name (/organism qualifier) and the /organelle qualifier. If your sequence is derived from an organelle rather than the nucleus, you must specify the /organelle qualifier so that the appropriate genetic code for mitochondrion, chloroplast, etc. is set.
First, please save the URL of the Nucleotide Sequence Submission System page.
Then clear the browser cache and reopen the saved URL.
This is likely to resolve the problem.
If it does not, check whether you are using Firefox or Chrome, the browsers we recommend.
If you are not, switch to Firefox or Chrome and reopen the URL.
If you still have a problem, please contact us via the contact form with the following information, selecting the item "DDBJ Nucleotide Sequence Submission System".
You may have entered incorrect values for the location and/or /codon_start of the CDS feature.
If /codon_start is "2" or "3", the location of the CDS feature should be marked as 5' end not complete.
See Description of Location and add the "5' end not complete" flag to the location, for example by changing "1..300" to "<1..300".
If the CDS feature starts with an initiation codon, correct /codon_start to "1".
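The consistency rule above can be expressed as a small check. This Python sketch (hypothetical helper names, not a DDBJ validator) only understands single-span locations, optionally wrapped in complement(). For a complement location it assumes, following INSDC location conventions, that the feature's 5' end is the higher coordinate, so its 'not complete' flag is ">" rather than "<".

```python
import re

# A single span such as "1..300", "<1..300", or "1..>300"
SIMPLE_SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def is_five_prime_partial(location: str) -> bool:
    """Return True if the 5' end of a single-span location is flagged 'not complete'."""
    complement = location.startswith("complement(") and location.endswith(")")
    inner = location[len("complement("):-1] if complement else location
    match = SIMPLE_SPAN.match(inner)
    if match is None:
        raise ValueError(f"unsupported location: {location}")
    lt_flag, _start, gt_flag, _end = match.groups()
    # On the complementary strand the feature's 5' end is the higher coordinate
    return bool(gt_flag) if complement else bool(lt_flag)


def check_codon_start(location: str, codon_start: int) -> list:
    """Return a list of problems (empty if the combination looks consistent)."""
    if codon_start not in (1, 2, 3):
        return ["/codon_start must be 1, 2 or 3"]
    if codon_start in (2, 3) and not is_five_prime_partial(location):
        return [f"/codon_start={codon_start} but {location} is not 5' partial"]
    return []


print(check_codon_start("1..300", 2))   # reports one problem
print(check_codon_start("<1..300", 2))  # []
```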
You can modify your input on any page before completing your submission.
You can return to any page by clicking 1.Contact person, 2.Hold date, 3.Submitter, 4.Reference, 5.Sequence, 6.Template, or 7.Annotation in the progress bar at the top of the page.

Check the following points.
If you still have a problem, contact us via the contact form, selecting the item "Data Submission".
On 5.Sequence, enter all of your sequences in multi-FASTA format. We will assign consecutive accession numbers to them.
Then, on 7.Annotation, you can enter the feature annotation for all of the sequences at once.
In general, see How to describe CDS feature, when termination codon is found in the range.
You can also see Protein Coding Sequence; CDS feature for how to describe the CDS feature.
The following are case studies for this error.
1. Did you specify the /codon_start qualifier correctly to indicate the reading frame of the CDS feature?
Select 1, 2, or 3, as appropriate.
2. Did you specify the correct genetic code for the /transl_table qualifier?
See the following and specify the appropriate genetic code.
3. Are there really stop codons within the range of the CDS feature because of a frameshift, a nonsense mutation, or some other reason?
3-1. Pseudogenes
Click the [Select Qualifier] button beside CDS and add the /pseudogene qualifier. You can then specify the /pseudogene qualifier with its controlled vocabulary.
See also b) considered pseudogene in detail.
3-2. If you are unsure whether it is a pseudogene, if the cause of the stop codon is uncertain, or if the sequence is undergoing diversification related to acquired immunity, describe it as a misc_feature, not a CDS feature.
See a) Putative nonsense mutation, frameshift caused by uncertain reason, or on the process of diversity increasing related to acquired immunity for IgG etc. in detail.
In other cases, this error may be caused by ribosomal slippage, RNA editing, exceptional amino acid usage, transposon insertion, and so on.
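Before choosing among 3-1, 3-2, and the other cases, it can help to list where the in-frame stop codons actually are. This Python sketch assumes the Standard Code (transl_table 1) and a plus-strand CDS given as a plain string; `internal_stops` is a hypothetical helper, and the positions it returns are counted within the CDS string, not within the whole entry.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # Standard Code (transl_table 1) only


def internal_stops(cds: str, codon_start: int = 1) -> list:
    """Return 1-based positions (within `cds`) of in-frame stop codons,
    ignoring a stop codon that forms the last complete codon."""
    seq = cds.upper()
    offset = codon_start - 1
    n_codons = (len(seq) - offset) // 3
    stops = [
        offset + 3 * k + 1
        for k in range(n_codons)
        if seq[offset + 3 * k: offset + 3 * k + 3] in STOP_CODONS
    ]
    last_codon_pos = offset + 3 * (n_codons - 1) + 1
    if stops and stops[-1] == last_codon_pos:
        stops.pop()  # a terminal stop codon is expected, not an error
    return stops


print(internal_stops("ATGTAAGCCTAA"))  # [4]
print(internal_stops("ATGGCCTAA"))     # []
```

An empty list with the intended /codon_start, but a non-empty one with another frame, points back to cases 1 and 2 rather than to a real premature stop.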
You can check the amino acid sequences of CDS features as follows.
A function for checking amino acid sequences will be provided in the DDBJ Nucleotide Sequence Submission System.
You clicked the [Confirm] button before entering /organism and/or /mol_type in the annotation table.
You must fill in the mandatory annotation items (feature, location, qualifier) before clicking the [Confirm] button.
On 7.Annotation, click the [Select Qualifier] button beside 'source' and select qualifiers as needed. Then click the [Edit] button beside the entry name and enter /organism and other values. Note that at least one feature other than source is required.
See also 7.Annotation – How to input an organism name.
INSD (International Nucleotide Sequence Database) is composed of DDBJ, ENA, and NCBI, and collects experimentally determined nucleotide sequence data.
The unique accession number that INSD issues for each submitted sequence is called the INSD accession number.
In the DDBJ flat file, the accession number appears in the ACCESSION line.
If multiple entries are merged into one, or if an entry is extensively modified after submission, the responsible data bank may assign it a new accession number. In that case, the new accession number is called the primary accession number, and the old accession number(s) are called secondary accession number(s).
In the flat file, the primary accession number comes first, followed by the secondary accession number(s).
Example: `ACCESSION   AB999999 AB888888 AB777777`
In general, the updated entry can be retrieved with either the primary or a secondary accession number.
However, if the old entry with the secondary accession number had previously been made public, it is not removed, so you can still find the old record via getentry.
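Because the primary accession number always comes first, separating it from the secondary numbers is straightforward. A minimal Python sketch, assuming a single-line ACCESSION field with whitespace-separated numbers; real flat files may continue the field onto further lines, which this hypothetical `parse_accession_line` does not handle.

```python
def parse_accession_line(line: str) -> tuple:
    """Split an ACCESSION line into (primary, [secondary, ...])."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "ACCESSION":
        raise ValueError("expected 'ACCESSION' followed by at least one accession number")
    return fields[1], fields[2:]


primary, secondary = parse_accession_line("ACCESSION   AB999999 AB888888 AB777777")
print(primary)    # AB999999
print(secondary)  # ['AB888888', 'AB777777']
```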
DDBJ does not provide any official services for accepting SNV, CGH analysis, microarray, variation data, and so on.
We expect that you can submit such data to NCBI or EBI.
Please submit to one of the following. If you have any questions, please contact that database directly.
NCBI: Gene Expression Omnibus (GEO), dbSNP, dbVar, ClinVar
EBI: ArrayExpress, European Variation Archive (EVA), Database of Genomic Variants archive (DGVa)
If your data are derived from human subjects, you may be required to submit them to one of the following controlled-access databases.
The database of Genotypes and Phenotypes (dbGaP)
European Genome-phenome Archive (EGA)
Japanese Genotype-phenotype Archive (JGA)
For the DDBJ Nucleotide Sequence Submission System (NSSS), you must enter nucleotide sequences in FASTA format (for a single sequence) or in multi-FASTA format (for two or more sequences).
Related page: Format of the nucleotide sequences that you can paste or upload
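Before pasting or uploading sequences, you may want to check the multi-FASTA text locally. The following Python sketch is not the NSSS validator; it is a hypothetical checker (`parse_multi_fasta`) that assumes each record starts with a '>' line carrying a unique name and that sequence lines contain only a, c, g, t and the IUPAC ambiguity codes. Confirm the exact accepted characters on the related page above.

```python
import re

# Assumed allowed characters: a, c, g, t plus IUPAC ambiguity codes (case-insensitive)
ALLOWED = re.compile(r"^[ACGTMRWSYKVHDBN]+$", re.IGNORECASE)


def parse_multi_fasta(text: str) -> dict:
    """Parse (multi-)FASTA text into {name: sequence}, raising ValueError on problems."""
    records = {}
    name = None
    chunks = []

    def finish():
        if not chunks:
            raise ValueError(f"record '{name}' has no sequence")
        records[name] = "".join(chunks)

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                finish()
            name = line[1:].strip()
            if not name or name in records:
                raise ValueError(f"line {lineno}: empty or duplicate sequence name")
            chunks = []
        elif name is None:
            raise ValueError(f"line {lineno}: sequence data before the first '>' line")
        elif not ALLOWED.match(line):
            raise ValueError(f"line {lineno}: unexpected character in sequence")
        else:
            chunks.append(line.lower())  # normalised to lower case (an arbitrary choice)
    if name is not None:
        finish()
    return records


print(parse_multi_fasta(">seq1\nACGTN\nacgt\n>seq2\nggcc\n"))
# {'seq1': 'acgtnacgt', 'seq2': 'ggcc'}
```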
When you use MSS for the submission, you must insert the end flag (//) at the end of each sequence. Please see the page "How to Make Sequence File".
See also Wikipedia, FASTA format
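As described above, each sequence in an MSS sequence file must end with the end flag (//). A minimal Python sketch of writing such a file from named sequences, assuming each entry is a '>' name line followed by sequence lines and then a '//' line; check the exact layout against "How to Make Sequence File" before relying on it. The function name `to_mss_sequence_file` and the 60-character line width are assumptions, not MSS requirements.

```python
def to_mss_sequence_file(records: dict, width: int = 60) -> str:
    """Write {name: sequence} as FASTA-style entries, each closed by the '//' end flag."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        # Wrap the sequence; the 60-column width is an arbitrary choice here
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
        lines.append("//")
    return "\n".join(lines) + "\n"


print(to_mss_sequence_file({"CLN01": "acgtacgt", "CLN02": "ggcc"}), end="")
```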