DDBJ Annotated/Assembled Sequences

Feature key

The feature keys used and recommended for DDBJ submissions are as follows.

Feature/Qualifier Usage Matrix

The chart, Feature/Qualifier usage matrix, explains recommended combinations of feature and qualifier keys for DDBJ submissions.

For more details on the available combinations of feature and qualifier keys in INSDC entries, see 7.2 Appendix II: Feature keys reference in the Feature Table Definition.

Definition of Feature key

assembly_gap

gap between two components of a genome or transcriptome assembly

C_region

Constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
Includes one or more exons depending on the particular chain.

CDS

coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon).
See also the page about the CDS feature.

centromere

region of biological interest identified as a centromere and which has been experimentally characterized

D-loop

displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region;
also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein

D_segment

Diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain.

exon

region of genome that codes for a portion of spliced mRNA, rRNA and tRNA

gap

gap in the sequence; sequencing gap other than assembly_gap

intron

a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it

J_segment

Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.

mat_peptide

mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification.
The location does not include the stop codon (unlike the corresponding CDS)
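
The difference between the two locations is simple arithmetic. The sketch below assumes the simplest case only: a complete CDS on the forward strand, given as a single contiguous interval with 1-based inclusive coordinates. The function name `last_coding_base` is hypothetical and not part of any DDBJ tool.

```python
def last_coding_base(cds_start, cds_end):
    """Return the last position before the stop codon of a CDS.

    Assumes a complete, contiguous, forward-strand CDS whose location
    includes the stop codon (1-based, inclusive coordinates).
    """
    length = cds_end - cds_start + 1
    # A complete CDS holds at least one codon plus the stop codon,
    # and its length is a multiple of 3.
    if length < 6 or length % 3 != 0:
        raise ValueError("expected a complete CDS whose length is a multiple of 3")
    return cds_end - 3
```

For a CDS at 1..300, the stop codon occupies 298..300, so a mat_peptide covering the whole translated product could end no later than position 297.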

misc_binding

site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other binding key (primer_bind or protein_bind)

misc_difference

feature sequence is different from that presented in the entry and cannot be described by any other “difference” key (variation, modified_base)
Comment: The misc_difference feature should be used to describe variability that arises as a result of genetic manipulation (e.g. site directed mutagenesis).

misc_feature

region of biological interest which cannot be described by any other feature key; a new or rare feature.

misc_RNA

any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, ncRNA, rRNA, tRNA, tmRNA)

misc_structure

any secondary or tertiary nucleotide structure or conformation that cannot be described by other “Structure” keys (stem_loop, D-loop)

mobile_element

region of genome sequence containing mobile element

modified_base

the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value)

mRNA

messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR)

ncRNA

a non-protein-coding gene, other than ribosomal RNA (rRNA) and transfer RNA (tRNA), the functional molecule of which is the RNA transcript

operon

region containing polycistronic transcript including a cluster of genes that are under the control of the same regulatory sequences/promoter and in the same biological pathway

oriT

origin of transfer; region of a DNA molecule where transfer is initiated during the process of conjugation or mobilization

precursor_RNA

any RNA species that is not yet the mature RNA product

primer_bind

Non-covalent primer binding site for initiation of replication, transcription, or reverse transcription.
Includes site(s) for synthetic elements, e.g., PCR primers.

propeptide

propeptide coding sequence; coding sequence for the domain of a proprotein that is cleaved to form the mature protein product.

protein_bind

non-covalent protein binding site on nucleic acid

regulatory

any region of sequence that functions in the regulation of transcription or translation
Since December 2014, the following old features have been merged into this feature.

  • attenuator -> regulatory feature with /regulatory_class="attenuator"
  • CAAT_signal -> regulatory feature with /regulatory_class="CAAT_signal"
  • enhancer -> regulatory feature with /regulatory_class="enhancer"
  • GC_signal -> regulatory feature with /regulatory_class="GC_signal"
  • -35_signal -> regulatory feature with /regulatory_class="minus_35_signal"
  • -10_signal -> regulatory feature with /regulatory_class="minus_10_signal"
  • polyA_signal -> regulatory feature with /regulatory_class="polyA_signal_sequence"
  • promoter -> regulatory feature with /regulatory_class="promoter"
  • RBS -> regulatory feature with /regulatory_class="ribosome_binding_site"
  • TATA_signal -> regulatory feature with /regulatory_class="TATA_box"
  • terminator -> regulatory feature with /regulatory_class="terminator"
  • misc_signal -> regulatory feature with /regulatory_class="other"
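
When updating older annotation, the list above amounts to a lookup from a legacy feature key to a /regulatory_class value. The following is a minimal Python sketch of that lookup; the dictionary name and the helper `to_regulatory` are hypothetical and are not part of any DDBJ tool.

```python
# Legacy feature keys merged into "regulatory" (December 2014), mapped to
# the /regulatory_class value given in the list above.
LEGACY_TO_REGULATORY_CLASS = {
    "attenuator": "attenuator",
    "CAAT_signal": "CAAT_signal",
    "enhancer": "enhancer",
    "GC_signal": "GC_signal",
    "-35_signal": "minus_35_signal",
    "-10_signal": "minus_10_signal",
    "polyA_signal": "polyA_signal_sequence",
    "promoter": "promoter",
    "RBS": "ribosome_binding_site",
    "TATA_signal": "TATA_box",
    "terminator": "terminator",
    "misc_signal": "other",
}


def to_regulatory(feature_key):
    """Return (feature_key, qualifier) after applying the merge.

    Keys that were not merged are returned unchanged with no qualifier.
    Hypothetical helper, for illustration only.
    """
    regulatory_class = LEGACY_TO_REGULATORY_CLASS.get(feature_key)
    if regulatory_class is None:
        return feature_key, None
    return "regulatory", f'/regulatory_class="{regulatory_class}"'
```

For example, `to_regulatory("RBS")` yields the `regulatory` key with `/regulatory_class="ribosome_binding_site"`, while a key outside the list, such as `CDS`, passes through unchanged.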

repeat_region

region of genome containing repeating units

rep_origin

origin of replication; starting site for duplication of nucleic acid to give two identical copies

rRNA

mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins

sig_peptide

signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein;
this domain is involved in attaching nascent polypeptide to the membrane; leader sequence

source

identifies the biological source of the specified span of the sequence.
This key is mandatory. Every entry will have, as a minimum, a single source key spanning the entire sequence.
More than one source key per sequence is permissible
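
Read literally, the rule above can be checked mechanically. The sketch below assumes features are available as `(key, start, end)` tuples with 1-based inclusive coordinates; that representation and the function name `has_full_length_source` are assumptions made for this example, not a DDBJ data format or API.

```python
def has_full_length_source(features, sequence_length):
    """Check that at least one source feature spans the entire sequence.

    `features` is an iterable of (key, start, end) tuples with 1-based,
    inclusive coordinates (an assumed representation for illustration).
    """
    return any(
        key == "source" and start == 1 and end == sequence_length
        for key, start, end in features
    )
```

Additional source features covering only part of the sequence do not affect the result, which matches the statement that more than one source key is permissible.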

stem_loop

hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA

telomere

region of biological interest identified as a telomere and which has been experimentally characterized

tmRNA

transfer messenger RNA; tmRNA acts as a tRNA first, and then as an mRNA that encodes a peptide tag;
the ribosome translates this mRNA region of tmRNA and attaches the encoded peptide tag to the C-terminus of the unfinished protein;
this attached tag targets the protein for destruction or proteolysis.

transit_peptide

transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein;
this domain is involved in post-translational import of the protein into the organelle

tRNA

mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence

unsure

A small region of sequenced bases, generally 10 or fewer in its length, which could not be confidently identified.

V_region

Variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
Codes for the variable amino terminal portion. Can be made up from V_segments, D_segments, N_regions and J_segments.

V_segment

Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
Codes for most of the variable region (V_region) and the last few amino acids of the leader peptide

variation

a related strain contains stable mutations from the same gene (e.g., RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others).

3'UTR

1) region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein;
2) region at the 3' end of an RNA virus genome (following the last stop codon) that is not translated into a protein.

5'UTR

1) region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein;
2) region at the 5' end of an RNA virus genome (preceding the first initiation codon) that is not translated into a protein.