The whole genome shotgun (WGS) approach, in which the whole genome is randomly broken into millions of fragments that are sequenced and then reassembled into a series of sequence ‘scaffolds’, has been used to sequence the genomes of various organisms.
The large set of contigs produced by an ongoing genome project can be submitted to DDBJ/ENA/GenBank as WGS data.
See also INSDC standards for genome assembly submission
You can submit WGS data to DDBJ via the Mass Submission System (MSS).
- Acceptable WGS data
- In principle, DDBJ/ENA/GenBank accept assemblies (i.e. sequences appropriately assembled from overlapping reads) and cannot accept redundant reads (i.e. raw read sequences). If you wish to make raw read sequences public, we recommend submitting them to the DDBJ Sequence Read Archive (DRA) instead of DDBJ/ENA/GenBank.
- Before submitting sequence data, you must first submit to the BioProject and BioSample databases.
- DDBJ accepts the following two formats for WGS submissions:
- a) WGS + scaffold CON:
- The WGS entries are the contigs (overlapping reads with no gaps).
- The WGS entries can NOT contain runs of consecutive "n"s to represent sequencing gaps.
- If you need to describe how the WGS entries are assembled into scaffolds or chromosomes, you can submit an AGP file.
- DDBJ can accept scaffolds (assembled contigs separated by gaps) as CON entries, described in AGP format.
- b) WGS with gaps:
- The WGS entries are the scaffolds (assembled contigs separated by gaps).
- The WGS entries can contain runs of consecutive "n"s to represent sequencing gaps.
- No AGP file is required.
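To illustrate how the two formats relate, the following Python sketch splits a gapped scaffold (format b) into gap-free contigs plus AGP rows describing how they join back into the scaffold (format a). It is a minimal sketch, not a DDBJ tool: the AGP 2.0 column layout (W rows for contigs, N rows for gaps of known length) is standard, but the function name, the contig naming scheme, the minimum gap length (`min_gap=2`, chosen to mirror the no-consecutive-"n" rule above), and the linkage evidence value `paired-ends` are hypothetical choices. Check DDBJ's own AGP requirements before submitting.

```python
import re
from typing import List, Tuple


def split_scaffold(scaffold_id: str, seq: str, min_gap: int = 2,
                   evidence: str = "paired-ends") -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a gapped scaffold into gap-free contigs and tab-separated AGP 2.0 rows.

    Runs of at least `min_gap` n/N characters are treated as gaps; shorter runs
    (a single n) stay inside the contig. Assumes the scaffold neither starts nor
    ends with a gap. `min_gap` and `evidence` defaults are illustrative only.
    """
    gap_re = re.compile("[nN]{%d,}" % min_gap)
    contigs: List[Tuple[str, str]] = []
    agp: List[str] = []
    part = 0   # AGP part_number: contigs and gaps are numbered together
    start = 0  # 0-based start of the contig currently being collected

    def emit_contig(end: int) -> None:
        nonlocal part
        if end <= start:
            return  # no bases before this gap
        part += 1
        contig_id = f"{scaffold_id}_ctg{len(contigs) + 1}"  # hypothetical naming scheme
        contigs.append((contig_id, seq[start:end]))
        # W = WGS contig; columns 6-9: component_id, component_beg, component_end, orientation
        agp.append("\t".join([scaffold_id, str(start + 1), str(end), str(part),
                              "W", contig_id, "1", str(end - start), "+"]))

    for gap in gap_re.finditer(seq):
        emit_contig(gap.start())
        part += 1
        length = gap.end() - gap.start()
        # N = gap of known length; columns 6-9: gap_length, gap_type, linkage, linkage_evidence
        agp.append("\t".join([scaffold_id, str(gap.start() + 1), str(gap.end()), str(part),
                              "N", str(length), "scaffold", "yes", evidence]))
        start = gap.end()
    emit_contig(len(seq))
    return contigs, agp


# Usage: a toy scaffold with a 5-base and a 3-base gap
contigs, agp = split_scaffold("scf1", "ACGTACGT" + "N" * 5 + "GGCC" + "n" * 3 + "TTAA")
for contig_id, contig_seq in contigs:
    print(contig_id, contig_seq)
print("\n".join(agp))
```

The contigs would correspond to format (a) WGS entries and the AGP rows to the scaffold CON description, while the unsplit input string is what a format (b) entry would contain.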
Sample flat file
Aspects of WGS
- In general, each WGS sequence submitted to DDBJ is assigned an accession number consisting of 4 letters followed by 8 digits.
- The KEYWORDS line contains “WGS” and one of the controlled terms indicating the degree of completion of the genome sequence (e.g. STANDARD_DRAFT in the sample flat file).
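The accession pattern described above can be checked mechanically. The Python sketch below assumes the common INSDC WGS convention that the first two of the eight digits are the assembly version and the remaining six number the individual sequences, with an all-zero serial number reserved for the master record (like ZZZZ01000000 in the sample flat file). The names `WgsAccession` and `parse_wgs_accession` are hypothetical, and only the 4-letter, 8-digit pattern described here is handled.

```python
import re
from typing import NamedTuple, Optional

# 4 uppercase letters + 2-digit assembly version + 6-digit serial number (assumed split)
WGS_ACCESSION = re.compile(r"([A-Z]{4})([0-9]{2})([0-9]{6})")


class WgsAccession(NamedTuple):
    prefix: str            # project prefix, e.g. "ZZZZ"
    assembly_version: int  # first two digits, e.g. 1 for "01"
    serial: int            # remaining six digits; 0 for the master record
    is_master: bool


def parse_wgs_accession(accession: str) -> Optional[WgsAccession]:
    """Return the parts of a 4-letter + 8-digit WGS accession, or None if it does not match."""
    m = WGS_ACCESSION.fullmatch(accession.strip())
    if m is None:
        return None
    prefix, version, serial = m.groups()
    return WgsAccession(prefix, int(version), int(serial), int(serial) == 0)


for acc in ("ZZZZ01000001", "ZZZZ01000000", "ZZZ01000001"):
    print(acc, parse_wgs_accession(acc))
```

Using `fullmatch` rather than `match` rejects strings with extra characters, such as the versioned form `ZZZZ01000001.1` from the VERSION line.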
LOCUS       ZZZZ01000001          123456 bp    DNA     linear   HUM 01-MAY-2003
DEFINITION  Homo sapiens DNA, chromosome 7, A01234B01.
ACCESSION   ZZZZ01000001 ZZZZ01000000
VERSION     ZZZZ01000001.1
DBLINK      BioProject:PRJDA12345
            BioSample:SAMD01234567
            Sequence Read Archive:DRR012345, DRR012346
KEYWORDS    WGS; STANDARD_DRAFT.
SOURCE      Homo sapiens
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 123456)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-APR-2003) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Human whole genome shotgun sequence
  JOURNAL   Unpublished (2003)
COMMENT     Whole genome shotgun sequencing project.
FEATURES             Location/Qualifiers
     source          1..123456
                     /db_xref="taxon:9606"
                     /chromosome="7"
                     /mol_type="genomic DNA"
                     /organism="Homo sapiens"
                     /submitter_seqid="A01234B01"
-- The rest is snipped --
//
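A flat file entry is line-oriented: each top-level keyword (LOCUS, ACCESSION, KEYWORDS, ...) starts in column 1, and indented lines continue the current field. The Python sketch below relies only on that layout to read the header of one entry and pull out the KEYWORDS terms, for example to confirm that an entry carries "WGS". The function names `header_fields` and `keyword_terms` are hypothetical; parsing stops at FEATURES, so the feature table and sequence are not handled.

```python
from typing import Dict, List, Optional


def header_fields(entry: str) -> Dict[str, List[str]]:
    """Map each top-level header keyword to its value lines (first line plus continuations)."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in entry.splitlines():
        if line.startswith(("FEATURES", "//")):
            break  # the header ends where the feature table (or entry) ends
        if line[:1].isalpha():
            # New top-level keyword in column 1, e.g. "ACCESSION   ZZZZ01000001 ..."
            parts = line.split(None, 1)
            current = parts[0]
            fields.setdefault(current, []).append(parts[1] if len(parts) > 1 else "")
        elif current is not None and line.strip():
            # Indented line: continuation or sub-keyword (e.g. "  ORGANISM") of the current field
            fields[current].append(line.strip())
    return fields


def keyword_terms(entry: str) -> List[str]:
    """Split the KEYWORDS value ("WGS; STANDARD_DRAFT.") into individual terms."""
    text = " ".join(header_fields(entry).get("KEYWORDS", []))
    return [term.strip() for term in text.rstrip(".").split(";") if term.strip()]


# Abridged excerpt of the sample entry
SAMPLE = """\
LOCUS       ZZZZ01000001          123456 bp    DNA     linear   HUM 01-MAY-2003
ACCESSION   ZZZZ01000001 ZZZZ01000000
VERSION     ZZZZ01000001.1
KEYWORDS    WGS; STANDARD_DRAFT.
SOURCE      Homo sapiens
  ORGANISM  Homo sapiens
FEATURES             Location/Qualifiers
//
"""

terms = keyword_terms(SAMPLE)
print(terms)
print("WGS" in terms)
```

Splitting on whitespace instead of slicing fixed columns keeps the sketch tolerant of minor spacing differences in the excerpt.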