• (July 9 10:00-11:00) Announcement of GEA/DRA/BP/BS system suspension

DDBJ Annotated/Assembled Sequences

  • Home
  • Submission
    • Before Submission
    • Web submission
    • Mass Submission
    • Data Update
  • Search
    • getentry
    • ARSA
  • Flat file
    • Feature Table
    • Feature key
    • Qualifier key
    • Nucleotide Sequences
    • Organism qualifier
    • Identifiers
    • Description of Location
    • Protein Coding Sequence
    • The Genetic Codes
    • Codes Used in Sequence Description
    • Description Examples of Sequence Data
  • Data categories
    • Data Submission from Genome Project
    • Pseudohaplotype
    • WGS
    • Finished level genomic sequences
    • Metagenome Assembly
    • Single amplified genome
    • HTG
    • Environmental sample
    • ENV
    • TLS
    • Data Submission from Transcriptome Project
    • TSA
    • EST
    • HTC
    • Third Party Data (TPA)
  • FAQ
  • Other
    • Patent
    • MGA
  • Home
  • ddbj
  • locus tag

locus tag

locus tag

The locus tags are identifiers systematically assigned to every gene in a genome.
INSD (DDBJ/ENA/GenBank) introduced the locus tag to prevent confusion caused by similar names assigned to different genes of different genomes. All component sequences of a genome assembly such as chromosomes and plasmids should use the same locus tag prefix in the /locus_tag qualifiers to systematically distinguish genes.

locus tag prefix registration

A genome assembly submission requires a BioProject and a BioSample.
Register a locus tag prefix during the BioSample submission. Enter a prefix you want to use in the locus_tag_prefix attribute provided by the BioSample packages used for genome assembly samples.
The locus_tag prefix should consist of 3-12 alpha-numeric characters and should start with a letter, but numerals can be used after the 2nd character (e.g. A1C). There should be no symbols such as ‘-‘, ‘' and ‘*’.
A locus tag prefix cannot be changed after registration, so avoid to register a prefix following an organism name or a strain name which may be changed in the future.

/locus_tag qualifier

Submit genome assembled sequences to DDBJ through Mass Submission System.
In /locus_tag qualifiers, enter the prefix which has been registered in the corresponding BioSample. Assign /locus_tag qualifiers to both protein-coding genes and non-coding RNA genes. Separate the prefix and the tag value by an underscore (e.g. A1C_00001). It is recommended to use the same numbering convention for all /locus_tag qualifiers in order of appearance regardless of types of annotating genes such as protein-coding, structural RNA and originating chromosome. However, if you want to include information regarding chromosome number and RNA type, you may add these information after the underscode following the prefix.

ABC_I00001 for gene 1, chromosome I
ABC_II00001 for gene 1, chromosome II
ABC_r1112 for ribosomal RNA genes
ABC_t1113 for tRNA genes

Add the /locus_tag qualifiers to the following features (do not add to repeat_region).

  • mRNA
  • CDS
  • 5’UTR
  • 3’UTR
  • intron
  • exon
  • tRNA
  • rRNA
  • ncRNA
  • tmRNA
  • precursor_RNA
  • prim_transcript
  • misc_RNA

Use the same value in the /locus_tag qualifiers of exon/CDS/mRNA features which constitute single gene.
One locus_tag should be associated with one /gene.

How to add /locus_tag

A new /locus_tag can be added in either of the following ways when updating genome sequences and annotations.

No. 1: Deletion and addition

Before     After
ABC_0022
           ABC_4568 (new gene)
ABC_0023   ABC_0023

No. 2: Add into the gap

Before     After
ABC_0020   ABC_0020
           ABC_0021 (new gene)
ABC_0030   ABC_0030

Decimal integers like versioning (e.g. ABC_0020.1) can not be used.