Whole Genome Shotgun (WGS) sequence data

The whole genome shotgun approach, in which the whole genome is broken at once into millions of fragments that are sequenced and reassembled to produce a series of sequence 'scaffolds', has been used to sequence the genomes of various organisms.

A large set of contigs, or finished sequences without annotation, from an ongoing genome project can be submitted to DDBJ/EMBL-Bank/GenBank as WGS data.

A list of publicly released WGS data is also available.

You can submit WGS data to DDBJ via the Mass Submission System (MSS).

Acceptable WGS data

In principle, DDBJ/EMBL-Bank/GenBank accept assemblies, i.e. sequences appropriately assembled from overlapping reads, and do not accept redundant reads (i.e. raw read sequences). If you wish to release raw read sequences, please contact the DDBJ Trace Archive (DTA) or the DDBJ Sequence Read Archive (DRA) instead of DDBJ/EMBL-Bank/GenBank.

  • So that they can be cited in papers or used for other purposes, submitted WGS data should be open to the public* together with their accession numbers.
        * You can specify a hold-date for your WGS data.
        See also Principle of "Hold-Until-Published" data release.
  • Assemblies must not contain runs of consecutive "n"s representing sequencing gaps.
  • DDBJ can accept assembled sequences separated by gaps (i.e. supercontigs) as "scaffold CON entries", which describe how the contigs of the assembly are joined, in the AGP file format.
  • Higher-level structures built from the contigs, such as ultra-scaffolds and/or chromosomes, can also be submitted as CON entries in the AGP file format.
  • Before submitting sequence data, you are required to register with the BioProject Database and the BioSample Database.
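
As a rough pre-submission check for the rule on consecutive "n"s, the sketch below reports where such runs occur in a contig sequence. The function name find_n_runs and the default threshold of two bases are assumptions made for illustration; this page does not state how long a run must be before it counts as a sequencing gap.

```python
import re
from typing import List, Tuple


def find_n_runs(sequence: str, min_run: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of each run of
    at least `min_run` consecutive 'n' bases in `sequence`.

    `min_run` is a hypothetical threshold chosen for this sketch, not a
    value defined by DDBJ.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    pattern = re.compile(r"[nN]{%d,}" % min_run)
    # re match offsets are 0-based and end-exclusive, so start + 1 and end
    # give 1-based inclusive coordinates.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]


# A single 'n' is ignored; the run of four is reported.
print(find_n_runs("acgtnnnnacgtnacg"))  # prints [(5, 8)]
```

A contig for which this returns a non-empty list would likely need to be split at the reported positions, with the join described in an AGP file rather than in the sequence itself.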

The WGS data are expected to be updated as the project progresses. When genome sequencing is completed but the sequence is not yet annotated with appropriate features, i.e. CDS (protein-coding gene) and others, the data are still processed as WGS. After feature annotation is added, the complete genome sequence is assigned a new accession number consisting of two letters and six digits, and the WGS accession number becomes a secondary accession number. The complete genome entry is then moved to the taxonomic division determined by the source organism.

WGS accession number

The accession number assigned to each WGS entry consists of 4 letters + 8 digits (9 or 10 digits where necessary).

Example: ZZZZ01000001

4 letters -- Prefix to distinguish each project
2 digits -- Version number of the data set
6 digits -- ID of each individual sequence (this may be 7 or 8 digits, depending on the number of entries)

The version number of the data set (set_version) is incremented with every update of the data set. Example: ZZZZ02000001
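
The structure described above can be checked mechanically. The following is a minimal sketch that splits a WGS accession number into its three parts. The names WgsAccession and parse_wgs_accession are hypothetical, and the pattern is inferred only from the description on this page (4 letters, 2 digits, then 6 to 8 digits); it is not an official validator.

```python
import re
from typing import NamedTuple

# 4 letters (project prefix), 2 digits (data set version),
# 6 to 8 digits (ID of the individual sequence).
_WGS_PATTERN = re.compile(r"([A-Z]{4})(\d{2})(\d{6,8})")


class WgsAccession(NamedTuple):
    prefix: str
    set_version: int
    sequence_id: int


def parse_wgs_accession(accession: str) -> WgsAccession:
    """Split a WGS-style accession number into prefix, version and ID."""
    match = _WGS_PATTERN.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a WGS-style accession number: {accession!r}")
    prefix, version, sequence_id = match.groups()
    return WgsAccession(prefix, int(version), int(sequence_id))


first = parse_wgs_accession("ZZZZ01000001")
updated = parse_wgs_accession("ZZZZ02000001")
# Same project and same sequence ID, but a later version of the data set.
print(first.prefix == updated.prefix, updated.set_version)  # prints True 2
```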

Sample flat file of WGS data

LOCUS       ZZZZ01000001    123456 bp  DNA   linear HUM 01-MAY-2003
DEFINITION  Homo sapiens DNA, chromosome 7, contig: A01234B01.
ACCESSION   ZZZZ01000001 ZZZZ01000000
VERSION     ZZZZ01000001.1
DBLINK      BioProject:PRJDA12345
            BioSample:SAMD01234567
            Sequence Read Archive:DRR012345, DRR012346
KEYWORDS    WGS.
SOURCE      Homo sapiens
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 123456)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-APR-2003) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Human whole genome shotgun sequence
  JOURNAL   Unpublished (2003)
COMMENT     ##Genome-Assembly-Data-START##
            Finishing Goal           :: Finished
            Current Finishing Status :: High Quality Draft
            Assembly Method          :: Newbler v. 2.3
            Genome Coverage          :: 30x
            Sequencing Technology    :: 454/Illumina
            ##Genome-Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..123456
                     /db_xref="taxon:9606"
                     /chromosome="7"
                     /mol_type="genomic DNA"
                     /note="contig: A01234B01"
                     /organism="Homo sapiens"

-- The rest is snipped --
// 
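
To pull fields such as ACCESSION or DBLINK out of a record like the sample above, a small reader along the following lines may be enough. It is a sketch that assumes the layout seen in the sample: a keyword within the first 12 columns, the value from column 13 onward, and continuation lines that leave the keyword columns blank. The function name parse_header is hypothetical, and the sketch stops at the FEATURES line rather than parsing the feature table.

```python
from typing import Dict, List


def parse_header(flatfile_text: str) -> Dict[str, List[str]]:
    """Collect header values of one flat file record, keyed by keyword.

    Each keyword maps to the list of value lines seen for it, including
    continuation lines. Parsing stops at FEATURES or at the '//' terminator.
    """
    fields: Dict[str, List[str]] = {}
    current = None
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES") or line.startswith("//"):
            break
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            # New keyword (or sub-keyword such as ORGANISM).
            current = keyword
            fields.setdefault(current, []).append(value)
        elif current is not None and value:
            # Continuation line: belongs to the most recent keyword.
            fields[current].append(value)
    return fields
```

For the sample record, `parse_header(text)["DBLINK"]` would hold the BioProject, BioSample and Sequence Read Archive lines as three separate strings, and the ACCESSION value could be split on whitespace to obtain the individual accession numbers.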