The biological features of submitted sequence data are described by a "Feature" key (the biological nature of the annotated feature), a "Location" (the region of the sequence to which the feature corresponds), and "Qualifiers" (supplementary information about the feature). In principle, EST and GSS entries carry no features other than "source".

FEATURES are written on the basis of the information provided by the submitter, and the databanks may modify them so that the annotation is described appropriately. The feature description rules agreed on by the three databanks are explained in detail in The DDBJ/EMBL/GenBank Feature Table: Definition.

Feature keys are broadly classified into three groups:

  • group 1: biological source of the sequence (source)
  • group 2: biological function features of the region
    e.g. CDS, rRNA, etc.
  • group 3: difference and/or change of the sequence data
    e.g. variation, conflict, etc.

The "source" feature (group 1) is mandatory for all entries in the international nucleotide database. The qualifiers "/organism" and "/mol_type" are mandatory for the source feature.
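The mandatory items stated above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `check_source` and the in-memory representation (a list of `(feature_key, qualifiers_dict)` pairs) are assumptions made for illustration only.

```python
# Mandatory qualifiers of the source feature, as stated in the text above.
REQUIRED_SOURCE_QUALIFIERS = ("organism", "mol_type")


def check_source(features):
    """Return a list of problems; an empty list means the checks passed.

    `features` is a hypothetical in-memory form of a feature table:
    a list of (feature_key, qualifiers_dict) pairs.
    """
    problems = []
    sources = [quals for key, quals in features if key == "source"]
    if not sources:
        problems.append("missing mandatory feature: source")
    for quals in sources:
        for name in REQUIRED_SOURCE_QUALIFIERS:
            # A missing or empty value both count as "not provided".
            if not quals.get(name):
                problems.append(f"source is missing mandatory qualifier: /{name}")
    return problems


# Example entry whose source feature lacks /mol_type.
entry = [
    ("source", {"organism": "Homo sapiens"}),
    ("CDS", {"product": "glyceraldehyde-3-phosphate dehydrogenase"}),
]
print(check_source(entry))
# prints ['source is missing mandatory qualifier: /mol_type']
```

The sketch only tests for presence. It does not check that the values themselves are valid.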

Feature keys in group 2 fall into families whose members are similar in function and are annotated in a similar manner. A functional family may have a "generic" or miscellaneous key, recognizable by the 'misc_' prefix, that can be used for instances not covered by the other defined keys of that family.

One of the most frequently used feature keys is "CDS", which describes a protein-coding sequence. See also the CDS feature page.

FEATURES             Location/Qualifiers
     source          1..450
                     /clone_lib="lambda gt11 human liver cDNA (GeneTech.
                     No.20)"
                     /organism="Homo sapiens"
     CDS             86..>450
                     /product="glyceraldehyde-3-phosphate dehydrogenase"
source      1..450 -- Bases 1 to 450 of the sequence are derived from the source described by the following qualifiers.
/chromosome="12" -- The sequence was obtained from chromosome 12.
/clone="GT200015" -- The name of the clone from which the sequence was obtained.
/clone_lib="lambda gt11 human liver cDNA (GeneTech. No.20)" -- The name of the clone library from which the sequence was obtained.
/map="12p13" -- The sequence is located at 12p13.
/db_xref="taxon:9606" -- The sequence is derived from the organism that corresponds to taxonomy database ID 9606 (human).
/mol_type="mRNA" -- The sequence is derived from mRNA.
/organism="Homo sapiens" -- The sequence was obtained from human.
/tissue_type="liver" -- The sequence was obtained from liver.
CDS      86..>450 -- Bases 86 to 450 of the sequence encode the protein described by the following qualifiers.
">" means that the CDS region is incomplete at its 3' end.
The rules for describing "Location" are explained in detail in Description of Location.
/codon_start=1 -- The reading frame for amino acid translation starts at the 1st base of this region (the 86th base of the entry); that base is the first base of the first codon.
/gene="GAPD" -- gene symbol, see gene qualifier
/product="glyceraldehyde-3-phosphate dehydrogenase" -- product name, see product qualifier
/protein_id="BAA12345.1" -- This is the ID assigned to the amino acid sequence by the international nucleotide database.
The ID consists of 3 letters followed by 5 digits.
The number after "." is the version number of the protein ID. If the amino acid sequence is updated, the version number is incremented (the protein_id itself is NOT changed).
/transl_table=1 -- The nucleotide sequence of the CDS region is translated into an amino acid sequence according to genetic code table 1.
/translation="MAKIKIGINGFGRIGRLVARVALQSDD(omitted)FTDKDKAVAQLKGGAKKV" -- The nucleotide sequence of the CDS region is conceptually translated into an amino acid sequence written in one-letter codes (Amino Acid Codes), unless the qualifier exception is set.
When the qualifier pseudogene or pseudo is set, /translation is NOT shown.
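
The location, codon_start, and protein_id conventions explained above can be illustrated with a short sketch. It is a simplified illustration, not a full parser: it handles only simple "start..end" spans on the forward strand, and the function names are hypothetical. The "<" marker for an incomplete 5' end does not appear in the example above and is included on the assumption that it mirrors ">". The protein_id pattern assumes uppercase letters, as in the example.

```python
import re

# Simple "start..end" spans only; "<" / ">" mark an incomplete 5' / 3' end.
# Location operators such as join() or complement() are out of scope here.
_SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")

# 3 letters, 5 digits, ".", version number (uppercase is an assumption).
_PROTEIN_ID = re.compile(r"^([A-Z]{3}\d{5})\.(\d+)$")


def parse_simple_location(location):
    """Split a simple span such as '86..>450' into its parts."""
    m = _SPAN.match(location)
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    return {
        "start": int(m.group(2)),
        "end": int(m.group(4)),
        "partial_5": m.group(1) == "<",
        "partial_3": m.group(3) == ">",
    }


def first_codon_base(location, codon_start=1):
    """Entry coordinate of the first base of the first codon (forward strand)."""
    return parse_simple_location(location)["start"] + codon_start - 1


def split_protein_id(protein_id):
    """Return (ID without version, version number)."""
    m = _PROTEIN_ID.match(protein_id)
    if m is None:
        raise ValueError(f"unexpected protein_id: {protein_id!r}")
    return m.group(1), int(m.group(2))


print(parse_simple_location("86..>450"))
# prints {'start': 86, 'end': 450, 'partial_5': False, 'partial_3': True}
print(first_codon_base("86..>450", codon_start=1))  # prints 86
print(split_protein_id("BAA12345.1"))  # prints ('BAA12345', 1)
```

With /codon_start=2 the same CDS would start translation at base 87 of the entry.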