DDBJ starts accepting Trace Archive data as part of the integrated database project

Trace Archive is defined by NCBI as a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects.

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi

DDBJ completed its first registration of Trace Archive data in July 2008, with support from the national project for integrating life science databases.

1. Trace data for Oryzias latipes (medaka) WGS sequences determined by the National Institute of Genetics (NIG):
TI numbers are as follows:
  • 2095022956-2095389675
  • 2095396176-2096435759
  • 2096858496-2096933759

* Relevant announcement: Release of WGS 134,429 entries and CON 6,928 entries for Medaka strain Hd-rR, and WGS 346,141 entries and CON 38,235 entries for strain HNI.
2. Trace data from the human gut metagenome project by the Center for Omics and Bioinformatics, University of Tokyo (UTCOB):
TI numbers are as follows:
  • 2097946941-2099007079

* Relevant announcement: Release of new human gut metagenome WGS data, 353,805 entries.
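The TI ranges listed above give a rough idea of how many traces each submission covers. The following is a minimal sketch, assuming each range is inclusive and that TI numbers are contiguous within a range (the announcement does not state either point); `count_ids` is a hypothetical helper name.

```python
# TI ranges copied from the announcement. Treating them as inclusive and
# contiguous is an assumption, not something the announcement states.
NIG_RANGES = [
    (2095022956, 2095389675),
    (2095396176, 2096435759),
    (2096858496, 2096933759),
]
UTCOB_RANGES = [(2097946941, 2099007079)]


def count_ids(ranges):
    """Total number of identifiers covered by inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)


if __name__ == "__main__":
    print("NIG TI numbers:", count_ids(NIG_RANGES))
    print("UTCOB TI numbers:", count_ids(UTCOB_RANGES))
```

Under these assumptions the counts are upper bounds on the number of registered traces, since some identifiers inside a range may be unused.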
(1) Assemble the trace data into WGS entries.
The sizes of the trace data are as follows:
(a) about 50 GB (from NIG; gzipped tar files including .qual, peak, .seq and .scf files)
(b) about 40 GB (from UTCOB; gzipped tar files including .scf files)
Both sets of trace data, (a) and (b), were assembled into WGS entries:
Trace data (a) were first assembled into part of the BAAF WGS entries (about 309 MB, a gzipped tar file of entries in flat file format) and then further assembled into the DG000001-DG000024 chromosome/CON entries. The Medaka genome sequencing project web site provides more details.
Trace data (b) were assembled into the BAAU-BABG WGS entries (about 272 MB, a gzipped tar file of entries in flat file format).
(2) Transfer the files from DDBJ to NCBI.
We first uploaded test data to the NCBI Trace Archive by conventional FTP, but the transfer took an intolerably long time. After investigating several alternative file transfer protocols and applications, we succeeded by transferring multiple files in parallel over conventional FTP. The transfer was completed in several hours, whereas a sequential FTP transfer had been expected to take two whole days.
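The announcement does not describe the tooling used for the parallel transfer. As an illustration only, the idea of sending several files at once over plain FTP can be sketched with Python's standard `ftplib` and a thread pool. The host, credentials, remote directory, and the function names `upload_one` and `upload_parallel` are all hypothetical; each worker opens its own connection because a single FTP session handles one transfer at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from pathlib import Path


def upload_one(path, host, user, password, remote_dir):
    """Upload a single file over its own FTP connection."""
    path = Path(path)
    with FTP(host) as ftp:               # one connection per file
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        with path.open("rb") as fh:
            ftp.storbinary(f"STOR {path.name}", fh)
    return path.name


def upload_parallel(paths, upload=upload_one, workers=4, **conn):
    """Run `upload` for every path, with up to `workers` transfers at a time.

    Results come back in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: upload(p, **conn), paths))


if __name__ == "__main__":
    # Hypothetical server and account; replace with real values before use.
    files = sorted(Path(".").glob("*.tar.gz"))
    done = upload_parallel(
        files,
        workers=4,
        host="ftp.example.org",
        user="anonymous",
        password="guest",
        remote_dir="/incoming",
    )
    print("uploaded:", done)
```

The number of workers is a tuning choice: more simultaneous connections can use more of the available bandwidth, but servers commonly limit connections per client.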

These data are now retrievable at the NCBI Trace Archive. DDBJ has begun preparing its own web page for retrieving trace data.

August 6, 2008