Finished level genomic sequences

Finished level genomic sequences (non-WGS)

A nucleotide sequence must meet the following requirements to be registered as a finished level genomic sequence.

Finished level genomic sequences represent the full-length sequence of each replicon that makes up the genome, with one entry per replicon. An entry can contain sequencing gaps. In general, finished level genomic sequences refer to the full-length sequences of chromosomes.
Each chromosome entry must be a single contiguous sequence. Finished level genomic sequences can include organelle sequences in eukaryotes and plasmid sequences in prokaryotes, as well as chromosomes.
Each entry in a genome must be assigned to a chromosome, an organelle, or a plasmid. Chromosomes can include unlocalized sequences and unplaced sequences as part of a set of finished level genomic sequences. An unlocalized sequence is a sequence whose chromosome number has been determined but whose position or orientation is unknown. An unplaced sequence is a sequence whose chromosome number is unknown.
For prokaryotes and viruses/phages, the full-length nucleotide sequence of each replicon (chromosome, plasmid, or segment) is expected to be submitted.
For eukaryotes, the sequence of a chromosome that contains sequencing gaps (difficult-to-read regions such as centromeres, telomeres, and repeats) can be registered as finished level. In such an entry, the sequencing gap regions must be annotated.
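
The distinction between chromosome, unlocalized, and unplaced sequences described above can be sketched as a small decision function. This is only an illustration of the definitions, not part of any DDBJ tool; the function name `classify_placement` and its two boolean arguments are hypothetical.

```python
def classify_placement(chromosome_known: bool,
                       position_and_orientation_known: bool) -> str:
    """Classify a nuclear sequence using the definitions in the text.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    if not chromosome_known:
        # The chromosome number is unknown.
        return "unplaced"
    if not position_and_orientation_known:
        # The chromosome number is known, but the position or
        # orientation on that chromosome is not.
        return "unlocalized"
    # Chromosome, position, and orientation are all known.
    return "chromosome"
```

For example, a scaffold known to belong to a given chromosome but not yet ordered on it would be classified as unlocalized.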

How to submit finished level genomic sequences, and requirements

To submit finished level genomic sequences, please apply through the Mass Submission System (MSS).
Both BioProject and BioSample must be registered in advance of submitting finished level genomic sequences. A single BioProject accession number and a single BioSample accession number must be described in the finished level genomic sequence entries.
Raw read sequences can be registered at the DDBJ Sequence Read Archive (DRA). The accession numbers of the run data used to construct the assembled genome sequences should be written in the finished level genomic sequence entries.
If biological features such as CDS, tRNA, and rRNA are annotated on the sequences, a locus_tag prefix must be registered for each genome when submitting the BioSample.
Annotation of biological features is optional in general, but it is required for genome sequences from species whose genome sequences have not previously been available.
The chromosome number, organelle name, plasmid name, and segment number must be described in the source feature using the specified qualifiers. For unlocalized and unplaced sequences, please describe the scaffold or contig number in the note qualifier.
Unlocalized and unplaced sequences must not be mechanically joined into a biologically meaningless sequence that does not exist in nature and then registered under a chromosome number such as 0 (or 99). Please register each such sequence as a separate scaffold or contig sequence.
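
The source-feature rules above can be expressed as a simple pre-submission check. This is a minimal sketch under stated assumptions: the function `check_source_qualifiers`, the `placement` labels, and the dictionary layout are hypothetical, and the qualifier names are assumed to be the INSDC qualifiers `chromosome`, `organelle`, `plasmid`, `segment`, and `note`. The placeholder set contains only the two values mentioned in the text and is not an exhaustive list.

```python
# Qualifiers assumed to carry the replicon identity in the source feature.
LOCATION_QUALIFIERS = ("chromosome", "organelle", "plasmid", "segment")

# Placeholder chromosome numbers named in the text as examples.
PLACEHOLDER_CHROMOSOMES = {"0", "99"}


def check_source_qualifiers(qualifiers: dict, placement: str) -> list:
    """Return a list of problems found in one entry's source qualifiers.

    `placement` is a hypothetical label: "placed", "unlocalized",
    or "unplaced".
    """
    problems = []
    if placement == "placed":
        # A placed entry needs one of the replicon qualifiers.
        if not any(name in qualifiers for name in LOCATION_QUALIFIERS):
            problems.append(
                "missing chromosome, organelle, plasmid, or segment qualifier")
    elif placement in ("unlocalized", "unplaced"):
        # Scaffold or contig number goes in the note qualifier.
        if not qualifiers.get("note"):
            problems.append("scaffold or contig number missing from note")
    else:
        raise ValueError(f"unknown placement: {placement!r}")
    # Artificially joined sequences must not be given a dummy chromosome.
    if qualifiers.get("chromosome") in PLACEHOLDER_CHROMOSOMES:
        problems.append("placeholder chromosome number is not allowed")
    return problems
```

An empty list means that none of these particular checks failed; it does not imply that the entry satisfies every submission requirement.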

For more details, please also visit the following website.

The components of the sequences

The components of the sequences are as follows:
- chromosome level assembly sequence (chromosome)
- unlocalized sequence (scaffold and contig)
- unplaced sequence (scaffold and contig)
- organelle genome sequence
- plasmid sequence
- segment sequence (for virus genome)
Examples of components
- Eukaryotic genome
  - chromosome, unlocalized, unplaced, organelle
- Prokaryotic genome
  - chromosome, plasmid
- Virus or phage genome
  - chromosome, segment
- Synthetic genome
  - chromosome
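
The examples above can be written as a lookup table, which is convenient when checking a batch of entries. This is a sketch only: the keys, component labels, and the function `unexpected_components` are hypothetical, and the table encodes the examples as given rather than an exhaustive rule.

```python
# Components listed in the examples for each kind of genome.
EXAMPLE_COMPONENTS = {
    "eukaryote": {"chromosome", "unlocalized", "unplaced", "organelle"},
    "prokaryote": {"chromosome", "plasmid"},
    "virus_or_phage": {"chromosome", "segment"},
    "synthetic": {"chromosome"},
}


def unexpected_components(genome_type: str, components) -> list:
    """Return components not listed in the examples for this genome type."""
    expected = EXAMPLE_COMPONENTS[genome_type]
    return sorted(set(components) - expected)
```

A non-empty result only signals that a component falls outside the listed examples and may deserve a second look.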

Glossary

replicon	A structural unit of DNA or RNA that is initiated from a single origin of replication and replicated sequentially by a series of regulatory factors.
chromosome	A set of one or more chromosomes, with or without gaps. Unlocalized and unplaced sequences may be included.
unlocalized sequence	A sequence that belongs to a particular chromosome but whose position or orientation on that chromosome has not been determined.
unplaced sequence	A sequence whose chromosome number has not been determined.

Example DDBJ flat file format

Aspects of finished level genomic sequences

Accession number: In general, each finished level genomic sequence submitted to DDBJ is assigned an accession number that consists of 2 letters and 6 digits.
DEFINITION: The following information is displayed.
- For prokaryotic genome sequences, when an entry consists of only a single chromosome, “complete genome” is shown to indicate that the entry is the full-length genome sequence.
- In eukaryotes, an entry composed of the contiguous sequence of a single chromosome shows the chromosome number.
COMMENT: This block includes Genome-Assembly-Data and information related to the genome assembly. The tag names of Genome-Assembly-Data are as follows.

Tag name	Value (information)
Assembly Method	Name of the assembly algorithm(s), with the version number that was run.
Assembly Name	A brief name suitable for display that does not include the organism name. This is mandatory for eukaryotes.
Genome Coverage	The estimated base coverage across the genome.
Sequencing Technology	Sequencing platform(s) used.
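
The one conditional rule in the table, that Assembly Name is mandatory for eukaryotes, can be checked before submission. This is a minimal sketch: the function `missing_required_tags` and the dictionary representation of the tags are hypothetical, and only the requirement stated in the table is encoded.

```python
# Tag names of Genome-Assembly-Data as listed in the table.
ASSEMBLY_TAGS = (
    "Assembly Method",
    "Assembly Name",
    "Genome Coverage",
    "Sequencing Technology",
)


def missing_required_tags(tags: dict, is_eukaryote: bool) -> list:
    """Return required Genome-Assembly-Data tags that are absent or empty.

    Only the rule stated in the table is applied: Assembly Name is
    mandatory for eukaryotes.
    """
    required = ["Assembly Name"] if is_eukaryote else []
    return [name for name in required
            if not str(tags.get(name, "")).strip()]
```

Calling it with a eukaryotic entry whose tags lack an Assembly Name returns `["Assembly Name"]`; for a prokaryotic entry the same tags return an empty list.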

Example flat files for prokaryotic genome sequence entries
- Accession: AP025277-AP025279
- Aeromonas hydrophila strain NUITM-VA1, chromosome and plasmid
Example flat files for eukaryotic genome sequence entries
- Accession: AP023152-AP023171
- Felis catus, chromosome genome assemblies
- AP023152 chromosome A1 entry
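
Accession ranges such as those above can be expanded into individual accession numbers when fetching each entry. This is a sketch under the assumption that every accession in the range follows the general pattern of 2 uppercase letters and 6 digits; the function name `expand_accession_range` is hypothetical.

```python
import re

# General pattern: 2 uppercase letters followed by 6 digits.
ACCESSION_RE = re.compile(r"([A-Z]{2})([0-9]{6})")


def expand_accession_range(text: str) -> list:
    """Expand "AP025277-AP025279" into a list of accession numbers.

    A single accession without a hyphen is returned as a one-item list.
    """
    first, _, last = text.strip().partition("-")
    last = last or first
    start = ACCESSION_RE.fullmatch(first)
    end = ACCESSION_RE.fullmatch(last)
    if start is None or end is None:
        raise ValueError(f"not a 2-letter, 6-digit accession: {text!r}")
    if start.group(1) != end.group(1):
        raise ValueError("range endpoints have different prefixes")
    lo, hi = int(start.group(2)), int(end.group(2))
    if lo > hi:
        raise ValueError("range start is greater than range end")
    # Zero-pad so the numeric part always keeps 6 digits.
    return [f"{start.group(1)}{n:06d}" for n in range(lo, hi + 1)]
```

Applied to the prokaryotic example, `expand_accession_range("AP025277-AP025279")` yields the three accessions AP025277, AP025278, and AP025279.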