General data: classified by source species
Data that do not fall into any of the categories described in the
other sections are called general data and belong here.
In principle, general data must have at least one source feature and at least one other biological feature.
Submitted sequences are automatically classified into one of the following divisions on the basis of the taxonomy of the source organisms.
|PRI||Primates (other than human)|
|MAM||Mammals (other than primates or rodents)|
|VRT||Vertebrates (other than mammals)|
|PLN||Plants or fungi|
ENV/SYN: impossible to identify source species, Environmental Samples and Synthetic Constructs
Environmental samples and artificially constructed sequences are
classified into the ENV and SYN divisions, respectively.
In principle, ENV and SYN data must have at least one source feature and at least one other biological feature.
|ENV||Sequences obtained via environmental sampling methods, direct PCR, DGGE, etc.
For ENV submissions, the source feature must carry an environmental_sample qualifier.
|SYN||Synthetic constructs; sequences constructed by artificial manipulations
SYN entries often have multiple source features, so take care when describing them.
See also Example of Submission; E05) synthetic construct.
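The feature requirements stated above for general, ENV and SYN data can be sketched as a small check. This is only an illustration under stated assumptions: the `Feature` class and the `check_features` function are hypothetical names introduced here, not part of any DDBJ tool, and the check covers only the rules quoted in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical minimal feature record: a key plus its qualifier names."""
    key: str                                  # e.g. "source", "CDS", "rRNA"
    qualifiers: set = field(default_factory=set)


def check_features(division, features):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    sources = [f for f in features if f.key == "source"]
    others = [f for f in features if f.key != "source"]

    # Rule for general, ENV and SYN data: one source plus one other feature.
    if not sources:
        problems.append("at least one source feature is required")
    if not others:
        problems.append("at least one other biological feature is required")

    # Extra rule for ENV: the source feature carries environmental_sample.
    if division == "ENV" and sources:
        if not any("environmental_sample" in s.qualifiers for s in sources):
            problems.append(
                "ENV: source feature needs an environmental_sample qualifier"
            )
    return problems
```

For example, `check_features("ENV", [Feature("source"), Feature("rRNA")])` reports the missing qualifier, while the same entry with `Feature("source", {"environmental_sample"})` passes.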
CON: Contig/Constructed, Tiling of Entries
Genome projects that submit many HTG and/or WGS entries can often
provide the information needed to assemble a series of their entries
and reconstruct a genome structure.
An accession number is assigned to such a contig tiling path, the
so-called “CON entry”, which is classified into the CON division.
See also Steps of genome sequencing, categories of sequence data and their correspondences.
We cannot accept the submission of a CON entry on its own.
First, submit all of the component entries that make up the contig; the CON entry is then constructed from them.
An AGP file is required to submit CON entries.
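Because a CON entry is a tiling path over already submitted entries, it can help to check the AGP file for gaps or overlaps before submitting. The following is a minimal sketch, assuming the standard nine-column, tab-separated AGP layout (object, object_beg, object_end, part_number, component_type, then component or gap columns). The function name `check_agp` is hypothetical, and the checks are illustrative rather than a full validation.

```python
def check_agp(text):
    """Return a list of tiling problems found in AGP-formatted text."""
    problems = []
    last_end = {}    # object name -> object_end of its previous line
    last_part = {}   # object name -> part_number of its previous line

    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 columns, got {len(cols)}")
            continue

        obj, ctype = cols[0], cols[4]
        beg, end, part = int(cols[1]), int(cols[2]), int(cols[3])

        # Each part must start right after the previous one in the same object.
        expected_beg = last_end.get(obj, 0) + 1
        if beg != expected_beg:
            problems.append(f"line {n}: {obj} should continue at {expected_beg}")

        # Part numbers count up from 1 within each object.
        if part != last_part.get(obj, 0) + 1:
            problems.append(f"line {n}: unexpected part number {part}")

        # Gap lines (N or U) give a length; component lines give begin and end.
        if ctype in ("N", "U"):
            length = int(cols[5])
        else:
            length = int(cols[7]) - int(cols[6]) + 1
        if end - beg + 1 != length:
            problems.append(f"line {n}: object span and part length differ")

        last_end[obj], last_part[obj] = end, part
    return problems
```

An empty result means only that the parts tile each object without gaps or overlaps; it does not confirm that the component accessions exist.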
EST/GSS/HTC/HTG/STS: Divisions for Feasibility of Sequencing
Sequences derived from high throughput projects, such as large scale
EST datasets or ongoing whole genome scale sequencing, are classified
into the following divisions.
In general, an entry in these divisions should have only one source feature.
However, entries in the HTC or HTG division may, if necessary, carry biological features in the same way as general data.
|EST||Expressed sequence tags: short, single-pass cDNA reads.|
|GSS||Genome survey sequences: short, single-pass genomic reads.|
|STS||Sequence tagged sites, tagged sequences for genome sequencing.
Use of the primer_bind feature and the PCR_conditions qualifier is recommended.
|HTC||High throughput cDNA sequences from cDNA sequencing projects, not EST.
This division includes unfinished high throughput cDNA sequences.
|HTG||High throughput genomic sequences mainly from genome sequencing projects.
Unfinished HTG entries are classified into different levels.
Data type, bulk sequence data
WGS: Fragment Sequences during WGS Assembling Process
A large set of contigs from an ongoing genome project can be
submitted as one type of bulk sequence data, Whole Genome Shotgun (WGS).
Please note that WGS data use a different accession number format from other data.
See also Steps of genome sequencing, categories of sequence data and their correspondences.
TSA: Transcriptome Shotgun Assembly
Since 2008, we have accepted Transcriptome Shotgun Assembly (TSA),
a type of bulk sequence data for assembled RNA sequences.
In general, a TSA entry should have only one source feature.
TSA entries may, if necessary, carry biological features in the same way as general data.
Please note that TSA data may use a different accession number format from other data.
See also Steps of transcriptome project, categories of sequence data and their correspondences.
TLS: Targeted Locus Study
Since 2016, we have accepted Targeted Locus Study (TLS), a type of
bulk sequence data covering 16S rRNA or other targeted loci, mainly
for clustering into operational taxonomic units.
TLS entries may carry biological features in the same way as general data.
Please note that TLS data use a different accession number format from other data.
Distinguishing nucleotide sequences that were not determined by the submitters
TPA: Third Party Data and primary sequence data
TPA (Third Party Data) is a nucleotide sequence data collection in
which each entry is obtained by assembling primary entries published
in DDBJ/EMBL-Bank/GenBank, Trace Archive or Sequence Read Archive,
with additional feature annotation(s) determined by the TPA submitter
using experimental or inferential methods. Such assemblies cover two
cases: those built from one or more primary entries, and those that
also contain newly determined sequence. TPA
sequence data should be submitted to DDBJ/EMBL-Bank/GenBank as a part of
the process to publish biological research for primary nucleotide
See also TPA Submission Guidelines.
Data types in MSS submission
|WGS: Whole Genome Shotgun||The sequences are WGS (draft genome) excluding MAG or SAG.|
|GNM: Finished Level Genome Sequence, non-WGS||The sequences are Finished Level Genomic Sequences (not WGS) excluding MAG or SAG.|
|MAG: Metagenome-Assembled Genome||The sequences are MAG.|
|SAG: Single Amplified Genome||The sequences are SAG.|
|TLS: Targeted Locus Study||The sequences are TLS.|
|HTG: High Throughput Genomic Sequences||The sequences are HTG.|
|TSA: Transcriptome Shotgun Assembly||The sequences are TSA.|
|HTC: High Throughput cDNA Sequences||The sequences are HTC.|
|EST: Expressed Sequence Tags||The sequences are EST.|
|MISC: Sequences that are not included in above types||The sequences do not match any of the above types.|
|ASK: Ask DDBJ curator to judge a correct datatype||Consult DDBJ curators about the data type.|
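The genome-related rows of the table above can be read as a short decision rule. The sketch below is one hedged reading of it: the function name `choose_genome_datatype` and its boolean parameters are hypothetical, and returning ASK when a dataset is flagged as both MAG and SAG is an assumption made here, not a documented DDBJ rule.

```python
def choose_genome_datatype(is_mag, is_sag, is_finished):
    """Pick an MSS data type code for a genome submission.

    is_mag      -- the sequences are a metagenome-assembled genome
    is_sag      -- the sequences are a single amplified genome
    is_finished -- the sequences are finished level (not a draft)
    """
    if is_mag and is_sag:
        # Assumption: contradictory flags are left to the curators.
        return "ASK"
    if is_mag:
        return "MAG"
    if is_sag:
        return "SAG"
    # WGS and GNM both exclude MAG and SAG, so only the
    # finishing level remains to separate them.
    return "GNM" if is_finished else "WGS"
```

Non-genome data (TLS, HTG, TSA, HTC, EST) map directly to their own codes in the table, so they are left out of this sketch.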