In principle, DDBJ accepts update requests only from the original submitter of an entry, except for reference updates. Therefore, if you are not the submitter, you will need the submitter's authorization before requesting changes to the entry.
DDBJ can forward your comments and suggestions to the submitter.
Please contact us via Inquiry to the sequence submitters (submitted to DDBJ).
The DDBJ Nucleotide Sequence Submission System can be used only for new submissions, so you cannot update submitted data with it.
To update data, see Data Updates/Corrections.
Please contact us through the contact form, selecting the item "Updating Submitted Data", and provide the following items:
We will reply with the contents of your data.
DDBJ does not accept reservations for updating sequence data.
Therefore, when published data are updated, the updated data are redistributed immediately.
Please select one of the following methods.
In principle, we can delete only sequence data that have not yet been made public.
See the relevant item in Data Updates/Corrections and e-mail us with the data.
For your information, we can restrict access to sequence data that have already been made public if certain conditions are met. See also the following item.
If you are not sure which category your sequence data should be submitted to, see the following:
If you still have any questions, please contact us through the contact form, selecting the item "Data Submission".
Select one of the following two methods.
In general, we recommend using the DDBJ Nucleotide Sequence Submission System.
For a large number of sequences, many features, and/or long sequences, the Mass Submission System (MSS) is more useful.
In general, you can submit amino acid sequences by describing a CDS feature for your nucleotide sequences.
However, DDBJ does not accept amino acid sequences alone, i.e. without the corresponding nucleotide sequences.
In that case, please submit directly to UniProt.
You can submit amino acid sequences to UniProt through SPIN.
Please contact firstname.lastname@example.org.
DDBJ cannot accept assembled EST sequences alone. However, DDBJ can accept assembled EST sequences as TSA when they are accompanied by the original (i.e. pre-assembly) sequence data. See also Data Submission from Transcriptome Project.
If the original sequence data (primary entries) were generated by next-generation sequencers, submit them to the DDBJ Sequence Read Archive (DRA); if they were generated by traditional sequencers, submit them as EST via the Mass Submission System (MSS).
DDBJ can then accept the assembled sequences (both de novo and reference mapping) as TSA through MSS.
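The routing described above can be summarized as a small lookup. This is only an illustrative sketch: the function name `submission_route` and the keys `"next_generation"` and `"traditional"` are hypothetical labels chosen for this example, not part of any DDBJ tool.

```python
# Hypothetical helper summarizing where transcriptome data go, per the text above.
PRIMARY_ROUTES = {
    "next_generation": "DRA",      # raw reads from next-generation sequencers
    "traditional": "EST via MSS",  # reads from traditional sequencers
}


def submission_route(sequencer_type: str) -> dict:
    """Return where the primary reads and the assembly are submitted."""
    try:
        primary = PRIMARY_ROUTES[sequencer_type]
    except KeyError:
        raise ValueError(f"unknown sequencer type: {sequencer_type!r}") from None
    # Assembled sequences are accepted as TSA through MSS once primary data exist.
    return {"primary": primary, "assembly": "TSA via MSS"}


if __name__ == "__main__":
    print(submission_route("traditional"))
```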
See Organism qualifier.
For details, see whichever of the following cases applies:
2. In case of unidentified species names, proposing a new species etc.
3. environmental sample
For sequences obtained by direct molecular isolation from a bulk environmental DNA sample (soil, sea water, etc.) by PCR with or without subsequent cloning of the product, DGGE, or other anonymous methods, see What is ENV ? – environmental samples.
For how to describe the organism qualifier, see 3. Environmental samples.
Although the two are frequently confused, the term 'environmental samples' does NOT mean "wild type". Sequences derived from isolated or cultured organisms are not classified as environmental samples.
See Categories for Sequence Data.
Please submit raw reads generated by next-generation sequencers to the DDBJ Sequence Read Archive.
See also Data Submission from Genome Project or Data Submission from Transcriptome Project.
Please submit assembled sequences through Mass Submission System, if necessary.
Please submit raw reads of sequence-based expression data to DDBJ Sequence Read Archive.
Basically, please submit every sequence that you have experimentally determined, whether its source is genome, mRNA, or anything else.
In principle, DDBJ accepts experimentally determined sequences in their contiguous structure.
You can describe mRNA features, CDS features, and so on as annotation for a genomic sequence; in general, however, describing an mRNA feature does not mean that the mRNA sequence was experimentally determined.
If you have sequenced mRNA, please submit the mRNA sequences to DDBJ. See also Acceptable data for DDBJ.
Yes, you can. The 'instructions to authors' of most journals are likely to require that sequence data be submitted to DDBJ (or EMBL-Bank or GenBank) before the paper is submitted.
When submitting sequence data, select the status of your REFERENCE as follows.
DDBJ accepts your sequence data whether or not you plan to publish an academic paper.
If you have no plans to publish a paper, fill in the following items of REFERENCE.
If your plans change after submission, i.e. if you do publish a paper, see "When our paper was published, what should I do?" in Data Updates/Corrections and send us a request from this form with the subject "Our paper was published".
Even if the journal does not require sequence data to be submitted to DDBJ (or EMBL-Bank or GenBank), we strongly recommend submitting them to DDBJ so that the data are available to readers of your paper.
DDBJ accepts update requests only from the original submitter of the entry.
We therefore strongly recommend listing two or more joint submitters, e.g. at least the person who did the work and an adviser, to avoid losing contact in the future.
When sequence data are published, they are shared among DDBJ, EMBL-Bank, and GenBank. It is therefore necessary and sufficient to submit sequence data once, to any one of the three data banks.
If you submit sequence data to GenBank after submitting the same data to DDBJ, the data will be duplicated. Do not submit the same data to two or more data banks.
Nucleotide sequence data related to patent applications are transferred from the Japan Patent Office (JPO) to DDBJ.
Usually, therefore, you do not have to submit such sequence data to DDBJ.
However, if you apply to any other patent office, or if you need to publish a paper while the patent application is pending, confirm with the patent office whether you may submit the data to DDBJ.
Note that when sequence data are published by DDBJ, the data become part of the public domain, as "official notice".
Submitting nucleotide sequence data to DDBJ gives you NO priority for the data.
DDBJ takes no responsibility for any property or priority issues in patenting. For patent applications, please consult the JPO or the relevant patent office.
DDBJ has no authority over gene nomenclature, nor does it officially collaborate with any gene nomenclature committee. Unless there is a particular problem, descriptions related to gene nomenclature are recorded as provided by the submitter.
Even if you name a gene when submitting sequence data to DDBJ, there is no guarantee that the name will be accepted by the research community.
In general, you can describe base substitutions using a variation feature with replace and note qualifiers.
If you use the DDBJ Nucleotide Sequence Submission System, select 'other' as the template.
For the format of the feature annotation, see F01) polymorphism and variation in Example of Submission.
Although you can submit sequence data that include SNPs (Single Nucleotide Polymorphisms) to DDBJ, the data will not automatically be reflected in dbSNP.
dbSNP is a database operated by NCBI, independent of INSDC.
For SNP data, we recommend submitting to dbSNP.
For instance, when the length of sequence is 199035 bp and a CDS feature is located in the range from 199001 to 100, you should describe the location of CDS feature as
For details, see also Description of Location.
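For a feature that runs across the origin of a circular sequence, such as the 199035 bp example above, the location can be built as two spans joined together. The sketch below assumes the `join(...)` operator of the INSDC location syntax covered in Description of Location; the function name `circular_location` is hypothetical and used only for illustration.

```python
def circular_location(start: int, end: int, seq_length: int) -> str:
    """Build a location string for a feature on a circular sequence.

    If the feature crosses the origin (start > end), it is written as two
    spans joined together; otherwise a plain start..end range is returned.
    """
    if not (1 <= start <= seq_length and 1 <= end <= seq_length):
        raise ValueError("positions must lie within 1..seq_length")
    if start <= end:
        return f"{start}..{end}"
    # Crosses the origin: first span runs to the last base, second restarts at 1.
    return f"join({start}..{seq_length},1..{end})"


if __name__ == "__main__":
    print(circular_location(199001, 100, 199035))
```

The same function returns an ordinary range for features that do not cross the origin, so it can be applied uniformly to every feature on the sequence.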
As feature annotation, we strongly recommend that you describe CDS (protein-coding sequence), rRNA, tRNA, and so on.
Please give us the details when you apply to use the Mass Submission System.
If annotating your genome sequence is difficult, we recommend using a public service such as MiGAP.
First, please confirm that the genetic code is selected appropriately (see The Genetic Code).
Generally, if the /transl_table qualifier is given the appropriate genetic code number, the nucleotide sequence is automatically translated into an amino acid sequence according to that genetic code.
For specific codons that do not follow the genetic code (selenocysteine, etc.), describe the /transl_except qualifier appropriately.
In the rare case where translation starts with an amino acid other than methionine, begin the location of the CDS feature with "<", which operationally marks the 5' end as not complete, and give a brief explanation of the translation mechanism in the /note qualifier.
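To check a translation yourself before submitting, a conceptual translation can be sketched as follows. This is a minimal illustration, not DDBJ's own translation procedure: it hard-codes only genetic code 1 (the standard code), ignores /transl_except, and the names `CODON_TABLE_1` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of genetic code 1 (standard), with codons enumerated in TCAG order.
AMINO_ACIDS_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE_1 = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS_1)
}


def translate(seq: str, codon_start: int = 1, table: dict = CODON_TABLE_1) -> str:
    """Translate a nucleotide sequence; '*' marks a stop, 'X' an unknown codon."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    seq = seq.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(codon_start - 1, len(seq) - 2, 3):
        protein.append(table.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

For an organelle or organism that uses another genetic code, a different table would have to be supplied through the `table` argument.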
See Contact person.
If your affiliation changed after sequencing, or if you belong to two or more institutes, please give the one most responsible for the work as the representative.
Since 2007, we have removed e-mail addresses and phone numbers from sequence data.
If a related paper is listed under REFERENCE in the DDBJ flat file, contact information should be available in that paper.
If you wish to contact the submitter(s) of an entry, please contact us via Inquiry to the sequence submitters (submitted to DDBJ), briefly stating your reasons; we will forward your message to the submitter(s).
In principle, accession numbers are sent to the contact person by e-mail (Subject: "[DDBJ] Assigned Accession No.") within 5 working days (i.e. excluding holidays) after DDBJ accepts the submitted data.
See DDBJ Calendar for the working days of DDBJ Center.
If you receive neither accession numbers nor an inquiry about your data from DDBJ within 5 working days of submission, please contact us through the contact form, selecting the item "Data Submission".
To make sure you receive our messages, do not block e-mails from DDBJ.
If you used the DDBJ Nucleotide Sequence Submission System, please check whether you received an e-mail from DDBJ with "DDBJ: Web submission completed" in its subject. This e-mail is sent automatically to the contact person when DDBJ accepts your sequence data via the Nucleotide Sequence Submission System.
If you have an ID for your data other than an accession number, such as an EntryID, contact us through the contact form, selecting the item "Updating Submitted Data", and provide the ID and the e-mail address of the contact person.
If you are unsure, tell us as many of the following items as you know, and we will search for your data.
If we cannot find your data, we will ask you to submit them as a new submission.
In general, follow the rules of the journal (i.e. its Instructions to Authors).
INSDC recommends citing accession numbers in a footnote on the title page of your paper, as follows:
'Note: Nucleotide sequence data reported are available in the DDBJ/EMBL/GenBank databases under the accession number(s) ----'.
It indicates that the data were submitted directly by the submitter. The term is the opposite of "journal scan".
REFERENCE 1 contains information about the submitter(s); it is not a general literature reference.
Therefore, do not use "Direct Submission" as the title of literature in REFERENCE 2 or later.
A hold date is required if you want the data held until your paper is published, so please specify the date.
Although DDBJ does not restrict the date, we strongly recommend specifying a date within two years.
If no date is specified, the data will be published immediately.
We will check and support it.
Please contact us through the contact form, selecting the item "Updating Submitted Data", and provide the accession numbers.
See Principle of "Hold-Until-Published" data release.
If your data were submitted before 1998, they may still be unpublished after the hold date.
If the contact person's information is old or invalid, we may be unable to notify you of the publication of the data or of other important announcements.
See the relevant item in Data Updates/Corrections and send us a request from this form, selecting the subject "Change the contact person, belonging, institution, etc..".
If you set the hold date for your data, the data will be published according to Principle of “Hold-Until-Published” data release.
Once the data are set to be published, an e-mail with "[DDBJ] Publicized your data" in its subject is sent to the contact person.
Therefore, do not block e-mails from DDBJ.
If the contact person's information is old or invalid, we may be unable to notify you of the publication of your data or of other important announcements.
See the relevant item in Data Updates/Corrections and send us a request from this form, selecting the subject "Change the contact person, belonging, institution, etc..".
For entries that have already been published, we can restrict use of the data if certain conditions are met.
When data are restricted, DDBJ excludes them from its periodical release and removes them from all DDBJ services.
However, the data remain permanently available through getentry when queried by accession number.
# This rule does not apply when the data were published through an error by INSD.
This policy is stated as follows in the document prepared by the International Advisory Committee of INSD, Overview of International Nucleotide Sequence Databases Policies:
All database records submitted to the INSD will remain an entry accessible as part of the scientific record. Corrections of errors and update of the records by authors are welcome and erroneous records may be removed from the next database release, but all will remain permanently accessible by accession number.
In addition, a number of databases are built using data taken from INSD from time to time.
DDBJ cannot help delete data from such databases. If you want cited data deleted from another database, you must contact that database's staff directly.
Please confirm whether the ID in the paper is an Accession Number Assigned by INSD.
If the paper gives accession numbers, please contact us through the contact form, selecting the item "Updating Submitted Data", and provide the following items.
It is getentry.
"getentry" is a system for data retrieval by accession numbers, etc.
In general, sequence data become available on getentry from the day after they are processed for release.
DDBJ is one of the international nucleotide sequence databases; the two other members are EMBL-Bank/EBI in Europe and GenBank/NCBI in the USA.
When DDBJ releases submitted data, EMBL-Bank and GenBank each load the data into their own services.
See Sequence Data Transition.
Note that the data are converted into EMBL-Bank or GenBank format.
In general, data released by EMBL-Bank or GenBank are loaded into DDBJ services and published by DDBJ on the day of their release.
Data released by DDBJ are loaded into ENA/EBI and GenBank and published by them within a few days.
However, the release process at any of the three databases may be delayed by system maintenance, network trouble, or other reasons. Therefore, we cannot specify the time lag among them.
The amino acid sequence of a CDS feature is translated automatically from the nucleotide sequence according to the location and other items, and is placed in the /translation qualifier. In general, therefore, do not enter it yourself.
The rules for translating a nucleotide sequence into an amino acid sequence follow the agreements of the International Nucleotide Sequence Database Collaboration.
The codon table used for a CDS feature is specified by the value of the /transl_table qualifier, which is a number from The Genetic Codes.
Three points are frequently misunderstood.
The Nucleotide Sequence Submission System is an interactive application in which you enter all the items required for your submission step by step.
To use the Mass Submission System (MSS), submitters must prepare the submission files themselves; DDBJ reviews the files and advises submitters during their preparation.
Some submitters use the Nucleotide Sequence Submission System to submit many sequences, while others use MSS to submit only a few.
Based on the above, select whichever suits your needs.
The Mass Submission System places no restriction on the number of entries.
You can use it not only for many sequences but also for one long sequence with many features (e.g. a complete genome with annotation).
See Mass Submission System.
At 6. template, either a) select 'other' and click [Input annotation], or b) click [Upload annotation file].
Then, you can describe two or more features for each sequence as follows.
In case of a), see 7.Annotation (when “other” was selected at template).
In case of b), see 7. Annotation: upload an annotation file.
DEFINITION is constructed by DDBJ according to its rules, so there is no field for entering it.
Click [Select Qualifier], check qualifiers in the dialog as needed and click [Save] button.
Then, you can find input fields for qualifiers on 7.Annotation.
Relatedly, if you select "other" at 6. template, you must specify at least one feature other than source: click [Add feature] and select a feature from the list.
These errors mean that the amino acid translation of the CDS (protein-coding sequence) feature is not appropriate at the 5' or 3' end, respectively.
When the CDS feature is not complete (i.e. partial) at the 5' and/or 3' end, its location must include the flag for 'not complete'.
According to the rules in Description of Location, a partial feature must be marked in its location with "<" for a 5' end that is not complete and/or ">" for a 3' end that is not complete.
|<1..295|[not start with initiation codon] and [stop with termination codon]|
|1..>295|[start with initiation codon] and [not stop with termination codon]|
|<1..>295|[not start with initiation codon] and [not stop with termination codon]|
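The three cases in the table can be generated from two yes/no answers about the CDS. The following is a small sketch of that rule; the function name `cds_location` and its parameter names are hypothetical.

```python
def cds_location(start: int, end: int,
                 starts_with_initiation_codon: bool,
                 ends_with_termination_codon: bool) -> str:
    """Return a CDS location with partial flags.

    "<" marks a 5' end that is not complete, ">" a 3' end that is not complete.
    """
    five_prime_flag = "" if starts_with_initiation_codon else "<"
    three_prime_flag = "" if ends_with_termination_codon else ">"
    return f"{five_prime_flag}{start}..{three_prime_flag}{end}"


if __name__ == "__main__":
    # Partial at the 5' end only, as in the first row of the table.
    print(cds_location(1, 295, False, True))
```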
This error message appears because you selected /translation for the CDS feature in the [Select Qualifier] dialog.
Generally, the /translation qualifier is created automatically from the items under the CDS feature, so do not enter any amino acid sequence.
You can therefore fix the error by removing the /translation qualifier.
For your information, the /translation qualifier is required only when the feature is described with the /exception qualifier.
Typically, the /exception qualifier indicates that "RNA editing" occurs on the mRNA. In that case, the conceptual amino acid translation of the genome sequence differs from the protein product of the real mRNA molecules.
This error occurs because the correct genetic code has not been entered.
See 7.Annotation -- How to input an organism name.
To specify the genetic code, enter its number in the input field.
The value is applied automatically to the /transl_table qualifier of the CDS feature.
For your information, for a previously reported organism the genetic code is set automatically from the scientific name (/organism qualifier) and the /organelle qualifier. If your sequence is derived from an organelle rather than the nucleus, you must specify the /organelle qualifier so that the genetic code for the mitochondrion, chloroplast, etc. is set appropriately.
First, please save the URL of the page on the Nucleotide Sequence Submission System.
Then close the browser and reopen the saved URL.
When you have a relatively large number of sequences, this is likely to resolve the problem.
If you still have a problem, please contact us through the contact form, selecting the item "DDBJ Nucleotide Sequence Submission System", and provide the following.
See Description of Location and add the flag for "5' end not complete" to the location, for example changing "1..300" to "<1..300".
If the CDS feature starts with an initiation codon, correct /codon_start to "1".
You can modify your inputs on any pages before finishing your submission.
You can go back to any page by clicking 1.Contact person, 2.Hold date, 3.Submitter, 4.Reference, 5.Sequence, 6.Template, or 7.Annotation in the progress bar at the top of the page.
Confirm the following points.
If you still have a problem, contact us through the contact form, selecting the item "Data Submission".
On 5.Sequence, input all of your sequences in multi-FASTA format. We will assign consecutive accession numbers to your sequences.
After moving to 7.Annotation, you can enter feature annotation for each of the sequences at once.
In general, see How to describe CDS feature, when termination codon is found in the range.
See also Protein Coding Sequence; CDS feature for how to describe a CDS feature.
The following items are typical cases for this error.
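Before pasting many sequences into 5.Sequence, it can help to check that the text is well-formed multi-FASTA. The sketch below follows generic FASTA conventions only (a ">" header line followed by sequence lines); any additional rules applied by the submission system are not covered, and the function name `parse_multi_fasta` and the entry names in the example are hypothetical.

```python
def parse_multi_fasta(text: str) -> dict:
    """Parse multi-FASTA text into {entry_name: sequence}.

    Raises ValueError for sequence lines before the first header,
    empty entry names, duplicate names, or entries with no sequence.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an entry name")
            name = fields[0]  # first word after '>' is taken as the name
            if name in records:
                raise ValueError(f"duplicate entry name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    sequences = {n: "".join(parts) for n, parts in records.items()}
    for n, seq in sequences.items():
        if not seq:
            raise ValueError(f"entry {n} has no sequence")
    return sequences


if __name__ == "__main__":
    example = ">clone1\nATGGCC\nTAA\n>clone2\nGGGCCC\n"
    print(parse_multi_fasta(example))
```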
1. Did you correctly specify /codon_start qualifier to indicate reading frame of the CDS feature?
Select 1, 2 or 3, appropriately.
3. Are there really stop codons within the range of the CDS feature because of a frameshift, nonsense mutation, or some other reason?
3-1. In case of pseudogene
Click the [Select Qualifier] button beside CDS and add the /pseudogene qualifier. You can then set the /pseudogene qualifier using its controlled vocabulary.
For details, see also b) considered pseudogene.
3-2. If you are unsure whether it is a pseudogene, if the reason for the stop codon is uncertain, or if the sequence is in the process of diversification related to acquired immunity, describe a misc_feature instead of a CDS feature.
For details, see a) Putative nonsense mutation, frameshift caused by uncertain reason, or on the process of diversity increasing related to acquired immunity for IgG etc.
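To work through the checks above, you can list where stop codons fall for each possible /codon_start value. The sketch below is an illustration only: it assumes the stop codons of genetic code 1 (TAA, TAG, TGA), which do not hold for every genetic code, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: genetic code 1 (standard)


def internal_stops(cds: str, codon_start: int) -> list:
    """Return 0-based codon indexes of stop codons before the final codon."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(codon_start - 1, len(cds) - 2, 3)]
    # The last complete codon may be the legitimate terminator, so skip it.
    return [idx for idx, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def candidate_codon_starts(cds: str) -> list:
    """Return the codon_start values (1, 2, 3) that give no internal stop."""
    return [frame for frame in (1, 2, 3) if not internal_stops(cds, frame)]


if __name__ == "__main__":
    print(candidate_codon_starts("GATGAAATGGTAA"))
```

If no frame is free of internal stops, the sequence falls under item 3, and the pseudogene or misc_feature descriptions above apply.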
You can confirm amino acid sequences for CDS features as follows.
A function for confirming amino acid sequences will be added to the DDBJ Nucleotide Sequence Submission System.
You clicked the [Confirm] button before entering /organism or /mol_type in the annotation table.
You must fill in the mandatory annotation items (feature, location, qualifier) before clicking the [Confirm] button.
On 7.Annotation, click the [Select Qualifier] button beside 'source' and select qualifiers as needed. Then click the [Edit] button beside the entry name and input /organism and the other qualifiers. Note that at least one feature other than source is required.
See also 7.Annotation – How to input an organism name.
INSD (International Nucleotide Sequence Databases) is composed of DDBJ, ENA, and NCBI, and collects experimentally determined nucleotide sequence data.
The INSD accession number is the unique accession number issued by INSD for each submitted sequence.
In the DDBJ flat file, the accession number appears on the ACCESSION line.
If multiple entries are merged into one entry, or if an entry is extensively modified after submission, the responsible data bank may assign a new accession number to it. In these cases, the new accession number is called the primary accession number, and the old accession number(s) is/are called the secondary accession number(s).
In the flat file, the primary accession number is listed first, followed by the secondary accession number(s).
In general, you can retrieve the same updated entry with either the primary or a secondary accession number.
However, if the old entry with the secondary accession number was previously open to the public, the old record is not removed, so you can still find it with getentry.
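Because the primary accession number comes first on the ACCESSION line, the two kinds can be separated mechanically. The sketch below assumes a whitespace-separated ACCESSION line; the function name `split_accessions` and the accession numbers in the example are hypothetical.

```python
def split_accessions(accession_line: str) -> tuple:
    """Split a flat-file ACCESSION line into (primary, [secondary, ...])."""
    fields = accession_line.split()
    if not fields or fields[0] != "ACCESSION":
        raise ValueError("not an ACCESSION line")
    if len(fields) < 2:
        raise ValueError("ACCESSION line has no accession number")
    # The first number is the primary; any that follow are secondary.
    primary, *secondary = fields[1:]
    return primary, secondary


if __name__ == "__main__":
    # Hypothetical accession numbers, for illustration only.
    print(split_accessions("ACCESSION   AB999999 AB000001 AB000002"))
```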
samtools view -hX
Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]
Options: -h print header for the SAM output
         -X output FLAG in string (samtools-C specific)
DDBJ does not provide any official service for accepting SNV, CGH analysis, microarray, variation data, and so on.
You should be able to submit such data to NCBI or EBI.
Please submit to one of the following. If you have any questions, please ask the relevant database directly.
If your data are derived from human subjects, you may be required to submit them to one of the following controlled-access databases.
The database of Genotypes and Phenotypes (dbGaP)
European Genome-phenome Archive (EGA)
Japanese Genotype-phenotype Archive (JGA)
To get TSA or WGS entries in FASTA format, please use "getentry" with the following values.
ID : Specify the Accession Number.
Output format : Select "total nt seq FASTA" for the result.
Result : Select one of the following file types for the output.
For more information about each value, please see getentry HELP.
Please see the following video as well.