Codes Used in Sequence Description

Nucleotide

Nucleotide Base Codes

The nucleotide base codes used by the International Nucleotide Sequence Databases are as follows.
Sequence data are expressed in lowercase letters only.
Uppercase letters are automatically converted to lowercase.

Symbol  Meaning           Explanation
a       a                 adenine
c       c                 cytosine
g       g                 guanine
t       t                 thymine in DNA; uracil in RNA
m       a or c            amino
r       a or g            purine
w       a or t
s       c or g
y       c or t            pyrimidine
k       g or t            keto
v       a or c or g       not t
h       a or c or t       not g
d       a or g or t       not c
b       c or g or t       not a
n       a or c or g or t  any

[References]

  • Cornish-Bowden, A. Nucl Acid Res 13, 3021-3030 (1985)
  • Feature Table Definition: 7.4.1 Nucleotide base codes (IUPAC)
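
The code table and the lowercase rule above can be sketched in a few lines of Python. This is only an illustration, not DDBJ's own validation logic: the names `IUPAC_BASES`, `normalize_sequence`, and `expand_code` are hypothetical, and rejecting unlisted symbols with an error is an assumption about how a checker might behave.

```python
# Nucleotide base codes from the table above, mapped to the bases they represent.
IUPAC_BASES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt", "n": "acgt",
}


def normalize_sequence(seq):
    """Convert a sequence to lowercase and reject symbols not in the table.

    Raising ValueError is an assumption made for this sketch.
    """
    lowered = seq.lower()
    invalid = sorted(set(lowered) - set(IUPAC_BASES))
    if invalid:
        raise ValueError(f"symbols not in the nucleotide code table: {invalid}")
    return lowered


def expand_code(symbol):
    """Return the set of concrete bases that a single code can stand for."""
    return set(IUPAC_BASES[symbol.lower()])
```

For instance, `normalize_sequence("ACGTN")` returns the lowercase string `acgtn`, and `expand_code("r")` returns the two purines, `a` and `g`.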

Modified Base Abbreviations

The following example shows how a modified base is described in the FEATURES section.

Example


      FEATURES             Location/Qualifiers
           modified_base   15
                           /mod_base="m2g"


Abbreviation  Modified base description
ac4c          4-acetylcytidine
chm5u         5-(carboxyhydroxylmethyl)uridine
cm            2’-O-methylcytidine
cmnm5s2u      5-carboxymethylaminomethyl-2-thiouridine
cmnm5u        5-carboxymethylaminomethyluridine
dhu           dihydrouridine
fm            2’-O-methylpseudouridine
gal q         beta,D-galactosylqueuosine
gm            2’-O-methylguanosine
i             inosine
i6a           N6-isopentenyladenosine
m1a           1-methyladenosine
m1f           1-methylpseudouridine
m1g           1-methylguanosine
m1i           1-methylinosine
m22g          2,2-dimethylguanosine
m2a           2-methyladenosine
m2g           2-methylguanosine
m3c           3-methylcytidine
m4c           N4-methylcytosine
m5c           5-methylcytidine
m6a           N6-methyladenosine
m7g           7-methylguanosine
mam5u         5-methylaminomethyluridine
mam5s2u       5-methoxyaminomethyl-2-thiouridine
man q         beta,D-mannosylqueuosine
mcm5s2u       5-methoxycarbonylmethyl-2-thiouridine
mcm5u         5-methoxycarbonylmethyluridine
mo5u          5-methoxyuridine
ms2i6a        2-methylthio-N6-isopentenyladenosine
ms2t6a        N-((9-beta-D-ribofuranosyl-2-methylthiopurin-6-yl)carbamoyl)threonine
mt6a          N-((9-beta-D-ribofuranosylpurine-6-yl)N-methyl-carbamoyl)threonine
mv            uridine-5-oxyacetic acid methylester
o5u           uridine-5-oxyacetic acid (v)
osyw          wybutoxosine
p             pseudouridine
q             queuosine
s2c           2-thiocytidine
s2t           5-methyl-2-thiouridine
s2u           2-thiouridine
s4u           4-thiouridine
m5u           5-methyluridine
t6a           N-((9-beta-D-ribofuranosylpurine-6-yl)carbamoyl)threonine
tm            2’-O-methyl-5-methyluridine
um            2’-O-methyluridine
yw            wybutosine
x             3-(3-amino-3-carboxypropyl)uridine, (acp3)u
OTHER         Other (*)

(*) A modified base not found in this list should be described in a /note qualifier.

[References]

  • Sprinzl, M. and Gauss, D.H. Nucl Acid Res 10, r1 (1982). (Note that in Cornish-Bowden, A. Nucl Acid Res 13, 3021-3030 (1985), the IUPAC-IUB declined to recommend a set of abbreviations for modified nucleotides.)
  • Feature Table Definition: 7.4.2 Modified base abbreviations
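
As a rough illustration of the rule above, the following Python sketch builds the lines of a modified_base feature and falls back to OTHER plus a /note qualifier when the abbreviation is not listed. The function name `modified_base_feature`, the abbreviated lookup table, the wording of the /note text, and the column widths (feature key at column 6, location and qualifiers at column 22) are assumptions made for this sketch; it is not an official formatter.

```python
# A small subset of the abbreviations listed above; a real checker would
# include the full table.
MOD_BASE_ABBREVIATIONS = {
    "ac4c": "4-acetylcytidine",
    "i": "inosine",
    "m2g": "2-methylguanosine",
    "m5c": "5-methylcytidine",
    "p": "pseudouridine",
    "q": "queuosine",
}

# Assumed layout: qualifiers are indented by 21 spaces.
QUALIFIER_INDENT = " " * 21


def modified_base_feature(position, abbreviation, description=None):
    """Return the feature lines for one modified base.

    Unlisted abbreviations are written as OTHER, with the description
    placed in a /note qualifier.
    """
    value = abbreviation if abbreviation in MOD_BASE_ABBREVIATIONS else "OTHER"
    if value == "OTHER" and not description:
        raise ValueError("an unlisted modified base needs a description for /note")
    lines = [
        f"     {'modified_base':<16}{position}",
        f'{QUALIFIER_INDENT}/mod_base="{value}"',
    ]
    if value == "OTHER":
        lines.append(f'{QUALIFIER_INDENT}/note="{description}"')
    return lines
```

Calling `modified_base_feature(15, "m2g")` yields the two lines of the example above (without the page indentation), while an unlisted base produces a third line carrying the /note qualifier.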

Amino Acid

Amino Acid Codes

The amino acid codes used by the International Nucleotide Sequence Databases are as follows.
These amino acids are described with their one-letter abbreviations in the /translation qualifier of a CDS feature.
The listed amino acid abbreviations are legal values for the /transl_except and /anticodon qualifiers.
For amino acids not included in “Amino Acid Codes”, please refer to Modified and Unusual Amino Acids.

Abbreviation  One-letter abbreviation  Amino acid name
Ala           A                        Alanine
Arg           R                        Arginine
Asn           N                        Asparagine
Asp           D                        Aspartic acid
Cys           C                        Cysteine
Gln           Q                        Glutamine
Glu           E                        Glutamic acid
Gly           G                        Glycine
His           H                        Histidine
Ile           I                        Isoleucine
Leu           L                        Leucine
Lys           K                        Lysine
Met           M                        Methionine
Phe           F                        Phenylalanine
Pro           P                        Proline
Pyl           O                        Pyrrolysine
Ser           S                        Serine
Sec           U                        Selenocysteine
Thr           T                        Threonine
Trp           W                        Tryptophan
Tyr           Y                        Tyrosine
Val           V                        Valine
Asx           B                        Aspartic acid or Asparagine
Glx           Z                        Glutamic acid or Glutamine
Xaa           X                        Any amino acid
Xle           J                        Leucine or Isoleucine
TERM                                   termination codon

[References]

  • IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and Symbolism for Amino Acids and Peptides. Eur. J. Biochem. 138: 9-37 (1984).
  • Feature Table Definition: 7.4.3 Amino acid abbreviations
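
The relationship between the two abbreviation columns can be illustrated with a short Python sketch that converts between the one-letter form used in /translation and the three-letter form used in qualifiers such as /transl_except. The names `ONE_TO_THREE`, `THREE_TO_ONE`, `to_three_letter`, and `to_one_letter` are hypothetical. TERM is left out because the table gives no one-letter abbreviation for it.

```python
# One-letter to three-letter abbreviations, taken from the table above.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "O": "Pyl", "S": "Ser", "U": "Sec", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "B": "Asx", "Z": "Glx", "X": "Xaa",
    "J": "Xle",
}

# Reverse lookup, built from the same table so the two stay consistent.
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(translation):
    """Expand a one-letter amino acid string into three-letter abbreviations."""
    return [ONE_TO_THREE[residue] for residue in translation.upper()]


def to_one_letter(abbreviations):
    """Collapse a list of three-letter abbreviations into a one-letter string."""
    return "".join(THREE_TO_ONE[abbreviation] for abbreviation in abbreviations)
```

A symbol that is not in the table raises `KeyError` in this sketch, which is a simple way to surface invalid input.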

Modified and Unusual Amino Acids

For amino acids not included in Amino Acid Codes, the abbreviations listed below are used.
All of these amino acids are represented by the one-letter abbreviation “X” in the /translation qualifier of a CDS feature.

Abbreviation Amino acid name
Aad 2-Aminoadipic acid
bAad 3-Aminoadipic acid
bAla beta-Alanine, beta-Aminopropionic acid
Abu 2-Aminobutyric acid
4Abu 4-Aminobutyric acid, piperidinic acid
Acp 6-Aminocaproic acid
Ahe 2-Aminoheptanoic acid
Aib 2-Aminoisobutyric acid
bAib 3-Aminoisobutyric acid
Apm 2-Aminopimelic acid
Dbu 2,4-Diaminobutyric acid
Des Desmosine
Dpm 2,2’-Diaminopimelic acid
Dpr 2,3-Diaminopropionic acid
EtGly N-Ethylglycine
EtAsn N-Ethylasparagine
Hyl Hydroxylysine
aHyl allo-Hydroxylysine
3Hyp 3-Hydroxyproline
4Hyp 4-Hydroxyproline
Ide Isodesmosine
aIle allo-Isoleucine
MeGly N-Methylglycine, sarcosine
MeIle N-Methylisoleucine
MeLys 6-N-Methyllysine
MeVal N-Methylvaline
Nva Norvaline
Nle Norleucine
Orn Ornithine
OTHER Other (*)

(*) An amino acid not found in this list should be described in a /note qualifier.

[References]

  • Feature Table Definition: 7.4.4 Modified and unusual Amino Acids
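
The “X” rule above can be sketched as follows. This is an illustration under stated assumptions: `translation_symbol` is a hypothetical helper, `STANDARD_SUBSET` holds only a few entries from Amino Acid Codes to keep the example short, and raising an error for unknown abbreviations is a choice made for this sketch.

```python
# A few entries from Amino Acid Codes, enough for the example.
STANDARD_SUBSET = {"Ala": "A", "Gly": "G", "Lys": "K", "Met": "M"}

# Abbreviations from the modified and unusual amino acid list above.
MODIFIED_AMINO_ACIDS = {
    "Aad", "bAad", "bAla", "Abu", "4Abu", "Acp", "Ahe", "Aib", "bAib",
    "Apm", "Dbu", "Des", "Dpm", "Dpr", "EtGly", "EtAsn", "Hyl", "aHyl",
    "3Hyp", "4Hyp", "Ide", "aIle", "MeGly", "MeIle", "MeLys", "MeVal",
    "Nva", "Nle", "Orn", "OTHER",
}


def translation_symbol(abbreviation):
    """Return the one-letter symbol written in /translation for a residue."""
    if abbreviation in STANDARD_SUBSET:
        return STANDARD_SUBSET[abbreviation]
    if abbreviation in MODIFIED_AMINO_ACIDS:
        # Every modified or unusual amino acid is written as X.
        return "X"
    raise ValueError(f"unknown amino acid abbreviation: {abbreviation}")


residues = ["Met", "Orn", "Gly", "4Hyp"]
translation = "".join(translation_symbol(r) for r in residues)
```

Here `translation` holds `MXGX`: the two standard residues keep their own letters, and both modified residues collapse to `X`.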

Related pages

  • The Genetic Codes
  • DDBJ flat file format
  • Qualifier key
  • Protein Coding Sequence; CDS feature
  • Description Examples of Sequence Data