DDBJ Annotated/Assembled Sequences

WGS

In the whole genome shotgun approach, the whole genome is first broken into millions of fragments, which are sequenced and reassembled to produce a series of sequence ‘scaffolds’. This approach has been used to sequence the genomes of various organisms.

The large set of contigs from an ongoing genome project can be submitted to DDBJ/ENA/GenBank as WGS data.
See also the INSDC standards for genome assembly submission.

See the list of publicized WGS data.

You can submit WGS data to DDBJ via the Mass Submission System (MSS).

Acceptable WGS data

In principle, DDBJ/ENA/GenBank accept appropriately assembled sequences (i.e. assemblies of overlapping reads) and cannot accept redundant reads (i.e. raw read sequences). If you wish to publicize raw read sequences, we recommend that you contact the DDBJ Sequence Read Archive (DRA).
  • WGS entries are contigs (overlapping reads) and/or scaffolds (assembled contigs separated by gaps).
  • WGS entries can contain consecutive "n"s to represent sequencing gaps.
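As an illustration of how runs of consecutive "n"s can be located in a contig or scaffold before submission, here is a minimal Python sketch. The function name `find_gap_runs` and the `min_length` threshold are hypothetical and not part of any DDBJ tool; what run length counts as a gap rather than an ambiguous base is an assumption you would need to settle for your own data.

```python
import re


def find_gap_runs(sequence, min_length=1):
    """Return (start, end, length) for each run of n/N.

    Coordinates are 1-based and inclusive, matching flat file locations.
    min_length is a hypothetical threshold for ignoring short runs.
    """
    runs = []
    for match in re.finditer(r"[nN]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            # match.start() is 0-based, match.end() is exclusive
            runs.append((match.start() + 1, match.end(), length))
    return runs


# Toy sequence: one 10-base run of n and one isolated n.
example = "acgtacgt" + "n" * 10 + "acgt" + "n" + "acgt"
for start, end, length in find_gap_runs(example, min_length=10):
    print(f"{start}..{end} length={length}")
```

With `min_length=10`, only the longer run is reported and the isolated "n" is skipped.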

Unacceptable WGS data

  • Assembled genome sequences from multiple organisms that are not metagenomes.
  • The following cases, when submitted without a chromosome assembly (contigs and scaffolds):
    • Organelle genome contigs alone.
    • Plasmid contigs alone.

Submission of WGS entry

Submitters apply through the MSS form site.

  • Before submitting assembly sequence data, you must register with the BioProject and BioSample databases.
  • If you wish to annotate all protein-coding genes and non-protein-coding RNA genes on the assembly sequences, please register a locus_tag prefix when submitting each BioSample.
  • Sample annotation: WGS sample annotation.

Sample flat file

Aspects of WGS

  • In general, each WGS sequence submitted to DDBJ is assigned an accession number that consists of either 6 letters and 9 digits (since January 2024) or 4 letters and 8 digits.
  • “WGS” and one of the controlled terms (STANDARD_DRAFT, HIGH_QUALITY_DRAFT, IMPROVED_HIGH_QUALITY_DRAFT, ANNOTATION_GRADE, NON_CONTIGUOUS_FINISHED) indicating the degree of completion of the genome sequence appear in the KEYWORDS line. The definition of each keyword can be found on the following website (INSDC agreed methodological keywords).
  • A summary of the assembly is displayed in the COMMENT.
    Tag name: Value (information)
    • Assembly Method: Name of the assembly algorithm(s), with the version number that was run.
    • Assembly Name: A brief name suitable for display that does not include the organism name. This is mandatory for eukaryotes.
    • Genome Coverage: The estimated base coverage across the genome.
    • Sequencing Technology: Sequencing platform(s) used.
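The accession number format and the KEYWORDS rule described above can be checked with a short Python sketch. The pattern follows the wording on this page literally (6 letters and 9 digits, or 4 letters and 8 digits), so other lengths are rejected; that strictness is an assumption of this sketch. The helper names `is_wgs_accession` and `completion_term` are hypothetical.

```python
import re

# Literal reading of the description: 6 letters + 9 digits, or 4 letters + 8 digits.
WGS_ACCESSION = re.compile(r"[A-Z]{6}[0-9]{9}|[A-Z]{4}[0-9]{8}")

# Controlled terms listed for the KEYWORDS line.
COMPLETION_TERMS = {
    "STANDARD_DRAFT",
    "HIGH_QUALITY_DRAFT",
    "IMPROVED_HIGH_QUALITY_DRAFT",
    "ANNOTATION_GRADE",
    "NON_CONTIGUOUS_FINISHED",
}


def is_wgs_accession(accession):
    """True if the whole string matches one of the two described formats."""
    return WGS_ACCESSION.fullmatch(accession) is not None


def completion_term(keywords_value):
    """Return the completion term from a KEYWORDS value such as
    'WGS; STANDARD_DRAFT.', or None if 'WGS' is missing or there is
    not exactly one controlled term."""
    terms = [t.strip() for t in keywords_value.strip().rstrip(".").split(";")]
    if "WGS" not in terms:
        return None
    found = [t for t in terms if t in COMPLETION_TERMS]
    return found[0] if len(found) == 1 else None


print(is_wgs_accession("ZZZZZZ010000001"))
print(completion_term("WGS; STANDARD_DRAFT."))
```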


LOCUS       ZZZZZZ010000001              123456 bp    DNA    linear   ROD 07-AUG-2024
DEFINITION  Mus musculus C57BL6 DNA, EN0001. 
ACCESSION   ZZZZZZ010000001 ZZZZZZ010000000
VERSION     ZZZZZZ010000001.1
DBLINK      BioProject:PRJDB99999
            Sequence Read Archive:DRR999998, DRR999999
            BioSample:SAMD99999999
KEYWORDS    WGS; STANDARD_DRAFT.
SOURCE      Mus musculus
  ORGANISM  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; 
            Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE   1  (bases 1 to 123456)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-MAY-2024) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Mouse whole genome shotgun sequence
  JOURNAL   Unpublished (2024)
COMMENT     Whole genome shotgun sequencing project.
            ##Genome-Assembly-Data-START##
            Assembly Method       :: HGAP v. 1.0; Celera Assembler v. 7.0; 
                                     Quiver v. 1.4.0; Sequencher v. 5.1
            Assembly Name         :: MusC56 v1
            Genome Coverage       :: 238x
            Sequencing Technology :: PacBio RS, Illumina GAIIx
            ##Genome-Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..123456
                     /collection_date="missing: lab stock"
                     /db_xref="taxon:10090"
                     /geo_loc_name="Japan"
                     /mol_type="genomic DNA"
                     /organism="Mus musculus"
                     /strain="C57BL6"
                     /submitter_seqid="EN0001"
     CDS             complement(join(147..1241,1364..1816))
                     /codon_start=1
                     /locus_tag="DDBJGEN_0001G0001"
                     /product="hypothetical protein"
                     /protein_id="xxxxxxxxxx.1"
                     /transl_table=1
                     /translation="MTEHIFEKISLNLSNIINKCVYKQTTLNDAQNE
                     IKETMNVIINQYNHYITKDVMDEILILTSKLLYSQNIESLIIYLNKL
                     (snipped)
                     GFFRMYQIWNVS"
     assembly_gap    2982..3269
                     /estimated_length=288
                     /gap_type="within scaffold"
                     /linkage_evidence="paired_ends"
     tRNA            3569..3643
                     /locus_tag="DDBJGEN_t0001G0001"
                     /product="tRNA-Ser"

-- The rest is snipped --
//
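The assembly summary in the COMMENT of the sample flat file is a block of "tag :: value" lines between START and END markers, with long values continued on the next line. The following Python sketch shows one way such a block could be read into a dictionary. The function name `parse_assembly_data` is hypothetical, the embedded text is copied from the sample entry, and the continuation-line handling is an assumption based on that sample alone, not a description of an official parser.

```python
SAMPLE_COMMENT = """\
COMMENT     Whole genome shotgun sequencing project.
            ##Genome-Assembly-Data-START##
            Assembly Method       :: HGAP v. 1.0; Celera Assembler v. 7.0;
                                     Quiver v. 1.4.0; Sequencher v. 5.1
            Assembly Name         :: MusC56 v1
            Genome Coverage       :: 238x
            Sequencing Technology :: PacBio RS, Illumina GAIIx
            ##Genome-Assembly-Data-END##
"""


def parse_assembly_data(text):
    """Collect 'tag :: value' pairs found between the START and END markers."""
    fields = {}
    inside = False
    last_tag = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.endswith("Genome-Assembly-Data-START##"):
            inside = True
            continue
        if line.endswith("Genome-Assembly-Data-END##"):
            break
        if not inside or not line:
            continue
        if "::" in line:
            tag, value = line.split("::", 1)
            last_tag = tag.strip()
            fields[last_tag] = value.strip()
        elif last_tag is not None:
            # A line without '::' is treated as a continuation of the previous value.
            fields[last_tag] += " " + line
    return fields


for tag, value in parse_assembly_data(SAMPLE_COMMENT).items():
    print(f"{tag}: {value}")
```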

Related pages

  • Data Submission from Genome Project
  • CON
  • GSS
  • HTG
  • Submission of environmental sequences
  • ENV
  • TLS
  • Data Submission from Transcriptome Project
  • TSA
  • EST
  • HTC
  • Third Party Data (TPA)