DDBJ Annotated/Assembled Sequences

HTG

The HTG division was created to accommodate a growing need to make unfinished genomic sequence data available to the scientific community.

The HTG division of DDBJ contains unfinished genome sequences.
When a sequence is considered to be at finished level, its data are moved from HTG to the corresponding taxonomic division.

You can submit HTG data to DDBJ through the Mass Submission System (MSS).

Notes on HTG submission
  • Prior to sequence data submission, obtain a BioProject ID for your project from the BioProject Database.
  • The clone ID should be described in the clone qualifier.
    The main targets of the HTG division are unfinished sequences of BAC, YAC and fosmid clones.

Sample flat file

Aspects of HTG

  • If the sequence is considered to be finished, the LOCUS line gives the division name according to taxonomic lineage: one of “HUM”, “PRI”, “ROD”, “MAM”, “VRT”, “INV”, “PLN” or “BCT”.
    If the sequence is not at finished level, the division name is “HTG”.
  • If the sequence is considered to be finished, there is no keyword in KEYWORDS.
    If the sequence is not at finished level, “HTG” and one of “HTGS_PHASE0”, “HTGS_PHASE1” or “HTGS_PHASE2” appear as keywords.
    • HTGS_PHASE0: one-to-few pass reads of a single clone.
    • HTGS_PHASE1: unfinished, may be unordered, unoriented contigs, with gaps.
    • HTGS_PHASE2: unfinished, ordered, oriented contigs, with or without gaps.
  • Optionally, the KEYWORDS line may also carry other keywords: “HTGS_DRAFT”, “HTGS_ENRICHED”, “HTGS_POOLED_CLONE” or “HTGS_POOLED_MULTICLONE”.
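The LOCUS and KEYWORDS rules above can be sketched as a small check. This is a minimal sketch and not a DDBJ tool: the function names (`locus_division`, `parse_keywords`, `classify_entry`) are hypothetical, the division is assumed to be the token immediately before the date on the LOCUS line, and KEYWORDS is assumed to fit on a single line. The “PLN” LOCUS line in the usage part is an invented illustration of a finished entry.

```python
# Division names that a finished sequence may carry, as listed on this page.
FINISHED_DIVISIONS = {"HUM", "PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT"}
# Exactly one of these phase keywords is expected on an unfinished entry.
PHASE_KEYWORDS = {"HTGS_PHASE0", "HTGS_PHASE1", "HTGS_PHASE2"}


def locus_division(locus_line):
    """Return the division name, assumed to be the token just before the date."""
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return tokens[-2]


def parse_keywords(keywords_line):
    """Split a single-line KEYWORDS record into a list of keywords.

    An entry without keywords is written as 'KEYWORDS    .' and yields [].
    """
    if not keywords_line.startswith("KEYWORDS"):
        raise ValueError("not a KEYWORDS line")
    body = keywords_line[len("KEYWORDS"):].strip().rstrip(".")
    return [kw.strip() for kw in body.split(";") if kw.strip()]


def classify_entry(locus_line, keywords_line):
    """Return ('finished', None) or ('unfinished', <phase keyword>)."""
    division = locus_division(locus_line)
    keywords = set(parse_keywords(keywords_line))
    if division == "HTG":
        phases = sorted(keywords & PHASE_KEYWORDS)
        if "HTG" not in keywords or len(phases) != 1:
            raise ValueError("HTG entry needs 'HTG' plus exactly one phase keyword")
        return ("unfinished", phases[0])
    if division in FINISHED_DIVISIONS:
        return ("finished", None)
    raise ValueError("division not covered by this sketch: " + division)


if __name__ == "__main__":
    htg_locus = "LOCUS       AP000000              121001 bp    DNA    linear   HTG 15-OCT-2008"
    print(classify_entry(htg_locus, "KEYWORDS    HTG; HTGS_PHASE1."))
    # Hypothetical finished entry: same record with a taxonomic division and no keywords.
    pln_locus = "LOCUS       AP000000              121001 bp    DNA    linear   PLN 15-OCT-2008"
    print(classify_entry(pln_locus, "KEYWORDS    ."))
```

The sketch only reads the phase keyword; optional keywords such as “HTGS_DRAFT” are accepted but not reported.
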
LOCUS       AP000000              121001 bp    DNA    linear   HTG 15-OCT-2008
DEFINITION  Arabidopsis thaliana DNA, chromosome 1, BAC clone: CIC5D1, ***
            SEQUENCING IN PROGRESS ***, 10 unordered pieces.
ACCESSION   AP000000
VERSION     AP000000.1
DBLINK      BioProject:PRJDB04321
KEYWORDS    HTG; HTGS_PHASE1.
SOURCE      Arabidopsis thaliana (thale cress)
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; malvids; Brassicales; Brassicaceae; Camelineae;
            Arabidopsis.
REFERENCE   1  (bases 1 to 121001)
  AUTHORS   Mishima,H., Yamada,T. and Liu,G.Q.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-SEP-2008) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Yamada,T., Park,C.S. and Liu,G.Q.
  TITLE     Arabidopsis thaliana DNA
  JOURNAL   Unpublished (2008)
FEATURES             Location/Qualifiers
     source          1..121001
                     /chromosome="1"
                     /clone="CIC5D1"
                     /collection_date="2001"
                     /db_xref="taxon:3702"
                     /ecotype="Columbia"
                     /geo_loc_name="USA"
                     /map="between mi303 and mi259"
                     /mol_type="genomic DNA"
                     /organism="Arabidopsis thaliana"
     gap             2079..2128
                     /estimated_length=unknown
     gap             7295..7344
                     /estimated_length=unknown
     gap             15694..15743
                     /estimated_length=unknown
     gap             32780..32829
                     /estimated_length=unknown
     gap             40371..40420
                     /estimated_length=unknown
     gap             59441..59490
                     /estimated_length=unknown
     gap             79080..79129
                     /estimated_length=unknown
     gap             88074..88123
                     /estimated_length=unknown
     gap             107128..107177
                     /estimated_length=unknown
BASE COUNT         105 a          98 c          112 g          108 t
ORIGIN
        1 attaatataa gctaaatatg tttttcaata tatattgata atagaatatc aacaatttgg
        :
        -- The rest of nucleotide sequence is omitted --
        :
//
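
The DEFINITION line of the sample entry reports 10 unordered pieces, and its feature table lists nine gap features. A minimal sketch of a consistency check between the two follows. It rests on an assumption that is not stated on this page: each gap feature separates two pieces, so the number of pieces equals the number of gaps plus one. The function names are hypothetical, and the excerpt keeps only the DEFINITION and gap location lines of the sample (qualifier lines are omitted for brevity).

```python
import re

# Excerpt of the sample entry: DEFINITION plus the gap location lines only.
EXCERPT = """\
DEFINITION  Arabidopsis thaliana DNA, chromosome 1, BAC clone: CIC5D1, ***
            SEQUENCING IN PROGRESS ***, 10 unordered pieces.
     gap             2079..2128
     gap             7295..7344
     gap             15694..15743
     gap             32780..32829
     gap             40371..40420
     gap             59441..59490
     gap             79080..79129
     gap             88074..88123
     gap             107128..107177
"""

# A feature-table line holding the key "gap" followed by a simple start..end location.
GAP_LINE = re.compile(r"^[ \t]+gap[ \t]+\d+\.\.\d+[ \t]*$", re.MULTILINE)
# The piece count as phrased in the sample DEFINITION ("10 unordered pieces").
PIECES = re.compile(r"(\d+)\s+(?:un)?ordered\s+pieces")


def count_gap_features(flat_file_text):
    """Count gap features that have a simple start..end location."""
    return len(GAP_LINE.findall(flat_file_text))


def declared_pieces(flat_file_text):
    """Return the piece count stated in the text, or None if it is absent."""
    match = PIECES.search(flat_file_text)
    return int(match.group(1)) if match else None


def pieces_consistent(flat_file_text):
    """Check pieces == gaps + 1 (an assumption of this sketch)."""
    pieces = declared_pieces(flat_file_text)
    if pieces is None:
        return False
    return pieces == count_gap_features(flat_file_text) + 1


if __name__ == "__main__":
    print(count_gap_features(EXCERPT), declared_pieces(EXCERPT), pieces_consistent(EXCERPT))
```

For the excerpt, the check finds nine gaps and ten declared pieces, so the two agree under the stated assumption.
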

Related pages

  • Data Submission from Genome Project
  • Submission of environmental sequences
  • Data Submission from Transcriptome Project
  • Third Party Data (TPA)