Last updated: 2017.3.14.

What is EST? – Expressed Sequence Tags

EST is a division of DDBJ/EMBL-Bank/GenBank that contains sequence data and other information on "single-pass" cDNA (i.e. mRNA or other RNA transcript) sequences, or "Expressed Sequence Tags", from a number of organisms.

You can submit EST data to DDBJ through the Mass Submission System (MSS).

Notes on EST submission

  • Before submission, remove cloning vector regions from your sequences.
  • A clone ID is required as the value of the clone qualifier.
  • It is strongly recommended to include qualifiers that indicate expression conditions, such as tissue (tissue_type), developmental stage (dev_stage), and mating type (mating_type or sex).
  • In principle, only sequences obtained by the Sanger method are accepted in the EST division.
    Sequence reads generated by so-called Next Generation Sequencers are accepted at the DDBJ Sequence Read Archive.
  • Assembled EST sequences are accepted as TSA (Transcriptome Shotgun Assembly).
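As an illustration of the qualifier notes above, the following minimal sketch checks one entry's source qualifiers before submission. The function name and the dictionary layout are hypothetical and are not part of MSS or any DDBJ tool; only the qualifier names come from the notes.

```python
# Hypothetical pre-submission check; not part of MSS or any DDBJ tool.
# Qualifier names are taken from the notes above.
REQUIRED_QUALIFIERS = ("clone",)
RECOMMENDED_QUALIFIERS = ("tissue_type", "dev_stage", "mating_type", "sex")


def check_est_qualifiers(qualifiers):
    """Return (errors, warnings) for one entry's source qualifiers.

    `qualifiers` is assumed to be a dict mapping qualifier name to value.
    """
    errors = [
        f"missing required qualifier: {name}"
        for name in REQUIRED_QUALIFIERS
        if not qualifiers.get(name)
    ]
    warnings = []
    # Expression-condition qualifiers are recommended, not required.
    if not any(qualifiers.get(name) for name in RECOMMENDED_QUALIFIERS):
        warnings.append("no qualifier describing expression conditions")
    return errors, warnings
```

For example, `check_est_qualifiers({"clone": "2310009A01", "tissue_type": "tongue"})` reports no errors and no warnings, while an empty dictionary yields one error and one warning.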

Aspects of EST entries in the DDBJ flat file

  • In principle, no feature information is provided except source.
  • The LOCUS line gives the division name, "EST".
  • The KEYWORDS line gives the keyword "EST" and one of the following three terms.
    *These controlled-vocabulary terms indicate the strategy used to obtain the ESTs, so they do not guarantee that a sequence is actually derived from the 5' or 3' end of the RNA transcript.
      • For 5' EST submissions --- 5'-end sequence (5'-EST)
      • For 3' EST submissions --- 3'-end sequence (3'-EST)
      • For all other cases --- unspecified EST
  • For 3' ESTs, to indicate whether your sequences correspond to the anti-sense or the sense strand, please include one of the following two COMMENTs.
      • For anti-sense strand:
        3'-EST sequences are presented as anti-sense strand.
      • For sense strand:
        3'-EST sequences are presented as sense strand.
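The KEYWORDS and COMMENT rules above can be sketched as a small lookup. The function names and the `end`/`strand` arguments are hypothetical. The strings are the controlled terms and COMMENT texts listed above, and the "EST; term." layout is assumed from the KEYWORDS line of the sample flat file.

```python
# Hypothetical helpers; only the strings come from the rules above.
KEYWORD_TERMS = {
    "5'": "5'-end sequence (5'-EST)",
    "3'": "3'-end sequence (3'-EST)",
    None: "unspecified EST",  # any case other than 5' or 3'
}

STRAND_COMMENTS = {
    "anti-sense": "3'-EST sequences are presented as anti-sense strand.",
    "sense": "3'-EST sequences are presented as sense strand.",
}


def est_keywords(end=None):
    """Return the KEYWORDS value for a 5' EST, a 3' EST, or any other EST."""
    if end not in KEYWORD_TERMS:
        raise ValueError(f"unknown EST end: {end!r}")
    return f"EST; {KEYWORD_TERMS[end]}."


def est_strand_comment(end, strand):
    """Return the COMMENT text for a 3' EST, or None for other ESTs."""
    if end != "3'":
        return None  # the strand COMMENT applies only to 3' ESTs
    if strand not in STRAND_COMMENTS:
        raise ValueError(f"unknown strand: {strand!r}")
    return STRAND_COMMENTS[strand]
```

Calling `est_keywords("3'")` gives the same KEYWORDS value as the sample entry, and `est_strand_comment("3'", "anti-sense")` gives its COMMENT text.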

Sample EST flat file

LOCUS       HY000000                 300 bp   mRNA     linear   EST 15-OCT-2008
DEFINITION  Mus musculus mRNA, clone: 2310009A01, 3' end sequence, expressed 
            in tongue.
VERSION     HY000000.1
KEYWORDS    EST; 3'-end sequence (3'-EST).
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE   1  (bases 1 to 300)
  AUTHORS   Mishima,H., Yamada,T. and Liu,G.Q.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-SEP-2008) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Yamada,T., Park,C.S. and Liu,G.Q.
  TITLE     Mus musculus EST
  JOURNAL   Unpublished (2008)
COMMENT     3'-EST sequences are presented as anti-sense strand.
FEATURES             Location/Qualifiers
     source          1..300
                     /clone_lib="full-length enriched mouse cDNA library A01"
                     /organism="Mus musculus"
BASE COUNT          86 a          90 c          73 g          51 t
        1 attaatataa gctaaatatg tttttcaata tatattgata atagaatatc aacaatttgg
        -- The rest of nucleotide sequence is omitted --
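To show how the EST-specific parts of such an entry can be read back, here is a minimal sketch that pulls the division name, the keywords, and the strand COMMENT out of the header lines. It is not a full flat-file parser: it assumes top-level keywords start in column 1 and continuation lines are indented by 12 spaces, as in the sample, and it ignores sub-keywords and everything from FEATURES onward. All function names are hypothetical.

```python
# Excerpt of the sample entry above (top-level header lines only).
SAMPLE = """\
LOCUS       HY000000                 300 bp   mRNA     linear   EST 15-OCT-2008
DEFINITION  Mus musculus mRNA, clone: 2310009A01, 3' end sequence, expressed
            in tongue.
VERSION     HY000000.1
KEYWORDS    EST; 3'-end sequence (3'-EST).
COMMENT     3'-EST sequences are presented as anti-sense strand.
FEATURES             Location/Qualifiers
"""


def read_top_level_fields(text):
    """Collect top-level header fields, joining their continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            break  # the feature table and the sequence are not handled here
        if not line.strip():
            continue
        if line[0] != " ":
            # New top-level keyword such as LOCUS or KEYWORDS.
            current, _, value = line.partition(" ")
            fields[current] = value.strip()
        elif line.startswith(" " * 12) and current is not None:
            # Continuation of the current top-level field.
            fields[current] += " " + line.strip()
        else:
            # Sub-keyword such as ORGANISM or AUTHORS: skipped in this sketch.
            current = None
    return fields


def summarize_est(text):
    """Return the locus name, division, keyword list, and COMMENT of an entry."""
    fields = read_top_level_fields(text)
    locus_tokens = fields["LOCUS"].split()
    keywords = [
        term.strip()
        for term in fields.get("KEYWORDS", "").rstrip(".").split(";")
        if term.strip()
    ]
    return {
        "locus_name": locus_tokens[0],
        "division": locus_tokens[-2],  # token just before the date
        "keywords": keywords,
        "comment": fields.get("COMMENT"),
    }
```

For the excerpt above, `summarize_est(SAMPLE)` reports the division "EST", the keywords "EST" and "3'-end sequence (3'-EST)", and the anti-sense COMMENT.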