Sequence data submitted to DDBJ are classified and stored in the following categories.
Division, conventional sequence data
Data type, bulk sequence data
If you are not sure which database you should submit your data to, see the following sites:
When the Mass Submission System is used, submitted nucleotide sequences are classified into one of the categories according to the DATATYPE, DIVISION, and KEYWORD descriptions.
Data that are not classified into any of the categories described in the other sections are called general data and belong here.
In principle, general data must have at least one source feature and at least one other biological feature.
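The feature requirement for general data can be expressed as a simple check. The following is a minimal sketch, assuming that an entry's features are available as a list of feature-key strings such as "source" or "CDS". The function name has_required_features and this input representation are hypothetical and are not part of any DDBJ tool.

```python
def has_required_features(feature_keys):
    """Return True if the entry has at least one source feature
    and at least one feature other than source.

    feature_keys is a hypothetical representation: a list of
    feature-key strings, one per feature in the entry.
    """
    n_source = sum(1 for key in feature_keys if key == "source")
    n_other = len(feature_keys) - n_source
    return n_source >= 1 and n_other >= 1


if __name__ == "__main__":
    print(has_required_features(["source", "CDS"]))  # source plus one other feature
    print(has_required_features(["source"]))         # source only
```

An entry with only a source feature, or with no source feature at all, fails the check.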
Submitted sequences are automatically classified into one of the following divisions on the basis of the taxonomy of the source organisms.
Environmental samples and artificially constructed sequences are classified into the ENV (environmental samples) and SYN divisions, respectively.
In principle, ENV and SYN data must have at least one source feature and at least one other biological feature.
Sequences derived from high-throughput projects, such as large-scale analyses like EST datasets and ongoing whole-genome-scale sequencing, are classified into the following divisions.
As a rule, only one source feature should be described for an entry in these divisions.
However, entries in the HTC or HTG division can have biological features, as general data do, if necessary.
For STS submissions, use of the primer_bind feature and the PCR_conditions qualifier is recommended.
Many genome projects that submit large numbers of HTG and/or WGS entries can also provide the information needed to assemble a series of their entries and reconstruct a genome structure. An accession number is assigned to such a contig tiling path, a so-called "CON entry", which is classified into the CON division.
See also steps of genome sequencing, categories of sequence data and their correspondences.
We cannot accept the submission of a CON entry alone.
First, you must submit all the piece entries that make up the contig; the CON entry is then constructed from them.
An AGP file is required to submit CON entries.
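Because a CON entry is built from piece entries that have already been submitted, it can be useful to list the pieces an AGP file refers to before submission. The following is a minimal sketch, assuming the nine-column, tab-delimited layout of the NCBI AGP specification (this page does not describe the file layout itself). The function names and the identifiers in the sample lines are hypothetical placeholders.

```python
GAP_COMPONENT_TYPES = {"N", "U"}  # gap lines in the AGP layout assumed here


def parse_agp_line(line):
    """Parse one non-comment AGP line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    record = {
        "object": cols[0],
        "object_beg": int(cols[1]),
        "object_end": int(cols[2]),
        "part_number": int(cols[3]),
        "component_type": cols[4],
    }
    if cols[4] in GAP_COMPONENT_TYPES:
        # Gap line: columns 6-9 describe the gap.
        record.update(gap_length=int(cols[5]), gap_type=cols[6],
                      linkage=cols[7], linkage_evidence=cols[8])
    else:
        # Component line: columns 6-9 describe the piece entry used.
        record.update(component_id=cols[5], component_beg=int(cols[6]),
                      component_end=int(cols[7]), orientation=cols[8])
    return record


def read_agp(lines):
    """Parse all data lines, skipping blank lines and '#' comments."""
    return [parse_agp_line(l) for l in lines
            if l.strip() and not l.startswith("#")]


def component_ids(records):
    """Return the piece identifiers referenced, in order, without gaps."""
    return [r["component_id"] for r in records if "component_id" in r]


if __name__ == "__main__":
    sample = [
        "# hypothetical example lines\n",
        "obj1\t1\t1000\t1\tW\tpiece1.1\t1\t1000\t+\n",
        "obj1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends\n",
        "obj1\t1101\t1600\t3\tW\tpiece2.1\t1\t500\t-\n",
    ]
    print(component_ids(read_agp(sample)))
```

The list returned by component_ids can be compared against the piece entries already submitted, since every piece must exist before the CON entry is constructed.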
A large set of contigs, or finished sequences without annotation, from an ongoing genome project can be submitted to DDBJ/EMBL-Bank/GenBank as WGS data.
Please note that the accession number format of WGS data differs from that of other data.
See also steps of genome sequencing, categories of sequence data and their correspondences.
Since 2008, DDBJ/EMBL-Bank/GenBank have accepted Transcriptome Shotgun Assembly (TSA) data, a category for assembled RNA transcript sequences.
As a rule, only one source feature should be described for a TSA entry.
TSA entries can have biological features, as general data do, if necessary.
Please note that the accession number format of TSA data may differ from that of other data.
See also steps of transcriptome project, categories of sequence data and their correspondences.
TPA (Third Party Data) is a nucleotide sequence data collection in which each entry is obtained by assembling primary entries released by DDBJ/EMBL-Bank/GenBank, Trace Archive, and/or Sequence Read Archive, with additional feature annotation determined by the TPA submitter using experimental or inferential methods. These assemblies cover two cases: those that use one or more primary entries, and those that also contain newly determined sequence. TPA sequence data should be submitted to DDBJ/EMBL-Bank/GenBank as part of the process of publishing biological research on primary nucleotide sequences.
See also TPA Submission Guidelines.