DDBJ Annotated/Assembled Sequences

HTC

The HTC division of DDBJ/ENA/GenBank contains draft sequence data derived from cDNA libraries that were created using full-length insert cDNA (mRNA) cloning methods.
As with draft genome data (HTG), once a sequence is considered to be at the finished level, its entry is moved from HTC to the corresponding taxonomic division.

You can submit HTC data to DDBJ through the Mass Submission System (MSS).

Notes on HTC/full length insert cDNA submission
  • Before submission, remove cloning vector regions from your sequences.
  • A clone ID is required as the value of the clone qualifier.
  • It is strongly recommended to include qualifiers that indicate expression conditions, such as tissue (tissue_type), developmental stage (dev_stage) and mating type (mating_type or sex).
  • HTC is not the same as an assembly of EST sequences. Do not confuse HTC with TSA (Transcriptome Shotgun Assembly).
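
The qualifier notes above can be turned into a quick pre-submission check. The following is only a minimal sketch: the function name check_htc_source_qualifiers and the dictionary representation of a source feature are assumptions made for illustration and are not part of MSS or any DDBJ tool. It treats clone as required and the expression-condition qualifiers as recommended, with mating_type and sex accepted as alternatives.

```python
# Hypothetical helper for illustration; not part of MSS or any DDBJ tool.
REQUIRED = ("clone",)                      # a clone ID is required
RECOMMENDED = ("tissue_type", "dev_stage")  # strongly recommended
# Either qualifier in a group satisfies the recommendation.
RECOMMENDED_ANY = (("mating_type", "sex"),)


def check_htc_source_qualifiers(qualifiers):
    """Return a list of messages for a source feature given as a dict
    mapping qualifier names to values. An empty list means no findings."""
    problems = []
    for key in REQUIRED:
        if not qualifiers.get(key):
            problems.append(f"error: missing required qualifier /{key}")
    for key in RECOMMENDED:
        if not qualifiers.get(key):
            problems.append(f"warning: missing recommended qualifier /{key}")
    for group in RECOMMENDED_ANY:
        if not any(qualifiers.get(key) for key in group):
            names = " or ".join(f"/{key}" for key in group)
            problems.append(f"warning: missing recommended qualifier {names}")
    return problems


if __name__ == "__main__":
    source = {
        "clone": "2310009A01",
        "tissue_type": "tongue",
        "dev_stage": "adult",
        "sex": "male",
    }
    for message in check_htc_source_qualifiers(source):
        print(message)
```

A missing clone value is reported as an error, while missing expression-condition qualifiers are reported only as warnings, which mirrors the difference between "required" and "strongly recommended" in the notes.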

Sample flat file

Aspects of HTC/full length insert cDNA

  • If the sequence is considered finished, the LOCUS line gives the division name according to taxonomic lineage: one of “HUM”, “PRI”, “ROD”, “MAM”, “VRT”, “INV” or “PLN”.
    If the sequence is not at the finished level, the division name is “HTC”.
  • If the sequence is considered finished, the KEYWORDS line contains the keyword “FLI_CDNA”.
    If the sequence is not at the finished level, “HTC” appears as a keyword.
    Within HTC data, a sequence that is likely to be full length also carries the keyword “HTC_FLI”.
  • Optionally, the KEYWORDS line contains a methodological keyword such as “oligo capping” or “CAP trapper”.

LOCUS       AK000000              1450 bp    mRNA    linear   HTC 15-OCT-2008
DEFINITION  Mus musculus mRNA for hypothetical protein, complete cds, clone: 
            2310009A01, full insert sequence. 
ACCESSION   AK000000
VERSION     AK000000.1
KEYWORDS    HTC; HTC_FLI; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE   1  (bases 1 to 1450)
  AUTHORS   Mishima,H., Yamada,T. and Liu,G.Q.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-SEP-2008) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Yamada,T., Park,C.S. and Liu,G.Q.
  TITLE     Mus musculus full-length enriched cDNA
  JOURNAL   Unpublished (2008)
FEATURES             Location/Qualifiers
     source          1..1450
                     /clone="2310009A01"
                     /clone_lib="full-length enriched mouse cDNA library A01"
                     /db_xref="taxon:10090"
                     /dev_stage="adult"
                     /mol_type="mRNA"
                     /organism="Mus musculus"
                     /sex="male"
                     /tissue_type="tongue"
     CDS             124..1230
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="BAA12348.1"
                     /transl_table=1
                     /translation="--- omitted ---"
BASE COUNT          399 a          323 c          398 g          330 t
ORIGIN
        1 agtcgcacga aggtttcggc cttatgggcg gacgggtgag taacgcgtag gaatctatcc
        :
        -- The rest of nucleotide sequence is omitted --
        :
//
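
The division and keyword rules described for HTC entries can be applied to a flat file programmatically. The sketch below is an illustration under stated assumptions, not an official parser: the function names read_division_and_keywords and describe_status are hypothetical, the division is assumed to be the second-to-last whitespace-separated token on the LOCUS line (as in the sample), and keywords are assumed to be separated by semicolons and terminated by a period.

```python
# Division names used for finished entries, as listed on this page.
TAXONOMIC_DIVISIONS = {"HUM", "PRI", "ROD", "MAM", "VRT", "INV", "PLN"}


def read_division_and_keywords(flat_file_text):
    """Extract the division name and the keyword list from flat file text."""
    division = None
    keyword_text = ""
    in_keywords = False
    for line in flat_file_text.splitlines():
        if line.startswith("LOCUS"):
            # Assumption: the division precedes the date at the end of the line.
            division = line.split()[-2]
            in_keywords = False
        elif line.startswith("KEYWORDS"):
            keyword_text = line[len("KEYWORDS"):].strip()
            in_keywords = True
        elif in_keywords and line.startswith(" "):
            # Continuation line of a long KEYWORDS field.
            keyword_text += " " + line.strip()
        else:
            in_keywords = False
    keywords = [k.strip() for k in keyword_text.rstrip(".").split(";") if k.strip()]
    return division, keywords


def describe_status(division, keywords):
    """Apply the division and keyword rules to one entry."""
    if division == "HTC":
        if "HTC_FLI" in keywords:
            return "unfinished, likely full length"
        return "unfinished"
    if division in TAXONOMIC_DIVISIONS and "FLI_CDNA" in keywords:
        return "finished full-length insert cDNA"
    return "not classified by these rules"


if __name__ == "__main__":
    # LOCUS and KEYWORDS lines copied from the sample flat file.
    sample = (
        "LOCUS       AK000000              1450 bp    mRNA    linear   HTC 15-OCT-2008\n"
        "KEYWORDS    HTC; HTC_FLI; CAP trapper.\n"
    )
    division, keywords = read_division_and_keywords(sample)
    print(division, keywords, describe_status(division, keywords))
```

For the sample entry this reports the division HTC and the status "unfinished, likely full length". An entry whose LOCUS line shows a taxonomic division such as ROD and whose KEYWORDS line contains FLI_CDNA would be reported as finished.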
