DDBJ Annotated/Assembled Sequences

Finished level genomic sequences

Finished level genomic sequences (non-WGS)

To qualify as finished level genomic sequences, nucleotide sequences must meet the following requirements.

  • Finished level genomic sequences represent the full-length sequence of each replicon that makes up the genome, with one entry per replicon. An entry can contain sequencing gaps. In general, finished level genomic sequences refer to the full-length sequences of chromosomes.
  • Each chromosome entry must be a single contiguous sequence. In addition to chromosomes, finished level genomic sequences can include organelle sequences in eukaryotes or plasmid sequences in prokaryotes.
  • Each entry comprising a genome must be assigned to a chromosome, an organelle, or a plasmid. An entry with a missing chromosome number (e.g. unanchored) can also be included as part of the finished level genomic sequences set.
  • In prokaryotes, the full-length nucleotide sequence of each replicon (chromosome or plasmid) is expected to be submitted.
  • In eukaryotes, a chromosome sequence that contains sequencing gaps (difficult-to-read regions such as centromeres, telomeres, and repeats) can be registered as finished level. In such an entry, annotation of the sequencing gap regions is required.
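
The requirements above can be turned into a small pre-submission check. The following is a minimal sketch, not a DDBJ tool: the `Entry` record, its field names, and `check_entry_set` are hypothetical names introduced for illustration, and the check covers only the one-entry-per-replicon rule, the replicon assignment, and the gap annotation requirement.

```python
from collections import Counter
from dataclasses import dataclass

# Replicon assignments named in the requirements above; "unanchored"
# stands for an entry whose chromosome number is missing.
ALLOWED_TYPES = {"chromosome", "organelle", "plasmid", "unanchored"}


@dataclass
class Entry:
    """Hypothetical record describing one entry (not a DDBJ data structure)."""
    replicon_type: str         # expected to be one of ALLOWED_TYPES
    replicon_name: str         # e.g. a chromosome number or plasmid name
    unannotated_gaps: int = 0  # sequencing gaps that lack annotation


def check_entry_set(entries):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for e in entries:
        if e.replicon_type not in ALLOWED_TYPES:
            problems.append(
                f"{e.replicon_name}: unknown replicon type {e.replicon_type!r}"
            )
        if e.unannotated_gaps > 0:
            problems.append(
                f"{e.replicon_name}: {e.unannotated_gaps} sequencing gap(s) lack annotation"
            )
    # One entry per replicon: a named replicon must not appear twice.
    # Unanchored entries are skipped because several may coexist.
    counts = Counter(
        (e.replicon_type, e.replicon_name)
        for e in entries
        if e.replicon_type != "unanchored"
    )
    for (rtype, name), n in counts.items():
        if n > 1:
            problems.append(f"{rtype} {name}: {n} entries, expected 1")
    return problems
```

For example, `check_entry_set([Entry("chromosome", "1"), Entry("plasmid", "pA")])` returns an empty list, while a set containing two entries for the same chromosome is reported as a problem.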


How to submit to finished level genomic sequences and requirements

  • To submit finished level genomic sequences, please apply through the Mass Submission System (MSS).
  • Both BioProject and BioSample must be registered in advance of submitting finished level genomic sequences. A single BioProject accession number and a single BioSample accession number must be described in the finished level genomic sequences entries.
  • Raw read sequences can be registered at the DDBJ Sequence Read Archive (DRA). The accession numbers of the run data used to construct the assembled genome sequences should be written in the finished level genomic sequences entries.
  • If biological features such as CDS, tRNA, and rRNA are annotated on the sequences, a locus_tag prefix must be registered for each genome when submitting to the BioSample database.
  • Annotation of biological features is generally optional, but it is required for genome sequences from species whose genome sequences have not previously been available.
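
The prerequisites above amount to a short checklist, which can be sketched as follows. This is an illustrative sketch only: the dictionary keys (`bioprojects`, `biosamples`, `reads_in_dra`, `dra_runs`, `has_annotation`, `locus_tag_prefix`) and the function name are assumptions made for this example and are not part of MSS or any DDBJ interface.

```python
def missing_prerequisites(submission):
    """Return a list of unmet prerequisites for a planned submission.

    `submission` is a plain dict with hypothetical keys:
      bioprojects      - list of BioProject accession numbers
      biosamples       - list of BioSample accession numbers
      reads_in_dra     - True if raw reads were registered at DRA
      dra_runs         - list of DRA run accession numbers
      has_annotation   - True if CDS, tRNA, rRNA, etc. are annotated
      locus_tag_prefix - registered locus_tag prefix, or None
    """
    missing = []
    # Exactly one BioProject and one BioSample accession are expected.
    if len(submission.get("bioprojects", [])) != 1:
        missing.append("a single BioProject accession number")
    if len(submission.get("biosamples", [])) != 1:
        missing.append("a single BioSample accession number")
    # Run accessions should be cited when the reads are in DRA.
    if submission.get("reads_in_dra") and not submission.get("dra_runs"):
        missing.append("DRA run accession number(s)")
    # A locus_tag prefix is mandatory only when features are annotated.
    if submission.get("has_annotation") and not submission.get("locus_tag_prefix"):
        missing.append("locus_tag prefix")
    return missing
```

An unannotated submission with one BioProject and one BioSample accession and no reads in DRA yields an empty list; adding annotation without a locus_tag prefix yields `["locus_tag prefix"]`.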

For more details, please also see the following pages.

  • Data submission from Genome Project
  • INSDC standards for genome assembly submission


Example DDBJ flat file format

Aspects of Finished level genomic sequences

  • Accession number: In general, each finished level genomic sequence submitted to DDBJ is assigned an accession number consisting of 2 letters and 6 digits.
  • DEFINITION: The following information is displayed.
    • For prokaryotic genome sequences in which the entry consists of only a single chromosome, “complete genome” is shown to indicate that the entry is the full-length genome sequence.
    • In eukaryotes, an entry composed of the contiguous sequence of a single chromosome shows the chromosome number.
  • COMMENT: This block includes Genome-Assembly-Data, which holds information related to the genome assembly. The Genome-Assembly-Data tag names and their values are as follows.
    • Assembly Method: Name of the assembly algorithm(s), with the version number that was run.
    • Assembly Name: A brief name suitable for display that does not include the organism name. This is mandatory for eukaryotes.
    • Genome Coverage: The estimated base coverage across the genome.
    • Sequencing Technology: Sequencing platform(s) used.
  • Example flat files for prokaryotic genome sequence entries
    • Accession: AP025277-AP025279
    • Aeromonas hydrophila strain NUITM-VA1, chromosome and plasmid
  • Example flat files for eukaryotic genome sequence entries
    • Accession: AP023152-AP023171
    • Felis catus, chromosome genome assemblies
    • AP023152 chromosome A1 entry
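
The accession ranges in the examples above follow the 2-letter, 6-digit pattern and can be expanded into individual accession numbers. The sketch below assumes only that pattern; the page notes it applies in general, so other accession formats may exist and are rejected here. The names `ACCESSION_RE` and `expand_accession_range` are introduced for this example.

```python
import re

# Two uppercase letters followed by six digits, e.g. AP025277.
ACCESSION_RE = re.compile(r"([A-Z]{2})(\d{6})")


def expand_accession_range(text):
    """Expand 'AP025277-AP025279' into a list of individual accessions.

    A single accession such as 'AP023152' is returned as a one-item list.
    """
    first, _, last = text.strip().partition("-")
    start_match = ACCESSION_RE.fullmatch(first)
    end_match = ACCESSION_RE.fullmatch(last or first)
    if start_match is None or end_match is None:
        raise ValueError(f"not a 2-letter, 6-digit accession (range): {text!r}")
    prefix = start_match.group(1)
    if prefix != end_match.group(1):
        raise ValueError(f"range endpoints use different prefixes: {text!r}")
    start, end = int(start_match.group(2)), int(end_match.group(2))
    if end < start:
        raise ValueError(f"range end precedes range start: {text!r}")
    # Zero-pad back to six digits so leading zeros are preserved.
    return [f"{prefix}{n:06d}" for n in range(start, end + 1)]


print(expand_accession_range("AP025277-AP025279"))
# ['AP025277', 'AP025278', 'AP025279']
print(len(expand_accession_range("AP023152-AP023171")))
# 20
```

The prokaryotic example therefore spans 3 entries and the Felis catus example spans 20 entries.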

Related pages

  • Data Submission from Genome Project
  • Submission of environmental sequences
  • Data Submission from Transcriptome Project
  • Third Party Data (TPA)