DDBJ Mail Magazine
No.44 & 45  Nov. 12, 2009
Published by DDBJ
 Indian Summer
In Japan, November 7 is called "Rittou", which marks the start of the winter season. In fact, however, the weather is warm and comfortable. Could this be related to global warming?
DDBJ issues DDBJ Mail Magazine (No.44 & 45). If you have any questions or suggestions about DDBJmag, please do not hesitate to write to ddbjmag@ddbj.nig.ac.jp. We would like to hear from you.
In April this year, DDBJ welcomed new staff and was reorganized following the retirement of some of its original members. Under the new members listed below, DDBJ has started activities toward a new generation of bioinformatics. DDBJ appreciates your continuing support for our activities.
 Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ)
General Director
Kousaku Okubo (Professor, Laboratory for Gene-Expression Analysis, CIB-DDBJ)
Secretary General
Takashi Gojobori (Professor, DNA Data Analysis Laboratory, CIB-DDBJ; Vice-President of NIG)
 DNA Data Bank of Japan (DDBJ) Staff
Yasukazu Nakamura (Professor, Gene-Product Informatics Laboratory, CIB-DDBJ)
Toshihisa Takagi (Professor, Laboratory for Research and Development of Biological Databases, CIB-DDBJ)
Eli Kaminuma (Assistant Professor, Gene-Product Informatics Laboratory, CIB-DDBJ)
Osamu Ogasawara (Assistant Professor, Laboratory for Gene-Expression Analysis, CIB-DDBJ)
(Photos: Prof. Okubo, Prof. Gojobori, Prof. Nakamura, Prof. Takagi, Assist. Prof. Kaminuma, Assist. Prof. Ogasawara)

DDBJ announces the release of the DDBJ Sequence Read Archive (DRA) and DDBJ Trace Archive (DTA) websites.


The DDBJ Sequence Read Archive (DRA) is an archive for primary analysis data from next-generation sequencers. The DDBJ Trace Archive (DTA) is an archive for DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from capillary sequencers. The DRA and DTA collect data in collaboration with NCBI and EBI.

The DRA/DTA websites provide instructions for data submission, file transfer, etc. They will be helpful when submitting data from next-generation and/or capillary sequencers. Please refer to them when you submit such data.

Moreover, the DTA site offers a search of the data submitted via DDBJ.

The DRA and DTA are part of the national project for integrating life science databases.

The International Nucleotide Sequence Database Collaboration (INSDC), made up of the three data banks DDBJ, EMBL-Bank/EBI, and GenBank/NCBI, holds an international collaborators meeting every year.
In 2009, the meeting was held at NCBI in the USA on 12-13 May.
DDBJ, EMBL-Bank, and GenBank each reported on their activities over the past year and discussed practical matters for maintaining and updating INSDC.
Also, starting this year (2009), INSDC has added a collaborative meeting to deal with the mass sequence data produced by next-generation sequencers (Short Read Archive) and the traces produced by traditional sequencers (Trace Archive).
The first meeting for this collaboration was held at NCBI in the USA on 14-15 May 2009.
Short Read Archive
DRA; DDBJ Sequence Read Archive
ERA; European Read Archive (EBI)
Short Read Archive (NCBI)
Trace Archive
DTA; DDBJ Trace Archive
Ensembl Trace Server (EBI)
Trace Archive (NCBI)
The outcomes of the two meetings are summarized below.

Items Discussed and To Be Studied

Sequence data from the next generation sequencers

As mentioned above, the databases collecting output from next-generation sequencers joined INSDC in 2009. INSDC will ask major scientific journals to require that DRA/ERA/SRA accession numbers for the corresponding sequence data be included in paper submissions.
DDBJ/EMBL-Bank/GenBank reject submissions of EST sequence data produced by 454 sequencers (GS-20, GS-FLX, etc.). In principle, such EST data should be accepted only by DRA/ERA/SRA.

The database for project ID

Since 2005, INSDC has been discussing the assignment of project IDs as a flag to specify genome projects, and how to indicate project IDs in flat files.
In 2008, INSDC decided to use project IDs not only for genome/metagenome projects but also for many kinds of large-scale sequencing projects, including transcriptomes.
DDBJ and GenBank indicate project IDs on the DBLINK line of flat files. EMBL-Bank indicates project IDs on the PR line of flat files. For genome/metagenome projects, we have almost completed the assignment of project IDs.
DDBJ/EMBL-Bank/GenBank require project IDs to be described for TSA submissions and their primary entries.
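
For readers who process flat files with scripts, the following is a minimal sketch of how project IDs might be picked out of DBLINK and PR lines. The exact line layouts shown in the comments, the function name find_project_ids, and the sample records are assumptions made for illustration; this issue does not specify them, so please check them against actual flat files before relying on this.

```python
import re

# Assumed line layouts (not specified in this newsletter):
#   DDBJ/GenBank flat file:  "DBLINK      Project:12345"
#   EMBL-Bank flat file:     "PR   Project:12345;"
PROJECT_LINE = re.compile(r"^(?:DBLINK|PR)\s+Project:\s*(\d+)")


def find_project_ids(flat_file_text):
    """Return the project IDs found on DBLINK or PR lines, in order of appearance."""
    ids = []
    for line in flat_file_text.splitlines():
        match = PROJECT_LINE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


# Hypothetical sample records (the accession and project ID are placeholders).
ddbj_style = "LOCUS       XX000001\nDBLINK      Project:12345\n"
embl_style = "ID   XX000001;\nPR   Project:12345;\n"

print(find_project_ids(ddbj_style))
print(find_project_ids(embl_style))
```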

Termination of strain level taxonomy ID assignment for microorganism genome submission

All organism names that are represented in the sequence data of DDBJ/EMBL-Bank/GenBank are registered in the taxonomy database. The taxonomy database has assigned strain-level taxonomy IDs to whole-genome-scale submissions of microorganisms in order to flag those genome projects.
Since INSDC now provides project IDs as a solution for indexing genome projects, we discussed terminating the assignment of strain-level taxonomy IDs for microorganism genomes. However, since many institutes have already cited those strain-level IDs, we should carefully consider that the policy change could cause confusion.

Frame mismatched candidates of protein coding regions of high-throughput sequence data

With the increasing number of submissions of large-scale draft sequence data, submitters often want to annotate frame-mismatched candidates of protein-coding regions with CDS features, avoiding translation errors by using operatively joined locations.
To distinguish these kinds of CDS features, we will prepare a new qualifier, /artificial_location, as a flag. In this regard, however, DDBJ/EMBL-Bank/GenBank will accept only submissions from whole-genome-scale projects, including large-scale transcriptomes.

Structured COMMENT/CC line to capture metadata

Recently, GenBank started to use a structured COMMENT approach to capture metadata related to a biological sample that has been sequenced.
The concept behind the structured COMMENT is to provide submitters with a mechanism for supplying a set of tag/value data elements that are not currently supported by the Feature Table.
DDBJ/EMBL-Bank/GenBank will discuss the format of the structured COMMENT/CC line in order to use it in a formalized way.

Changes to the Feature Table Document: Features and Qualifiers

The following items will be applied from October 2009 with the revision of the Feature Table Definition, unless otherwise specified.

The /pseudo qualifier to be separated into /pseudogene and /non_functional

The word "pseudo" is likely to be associated with "pseudogene", but the qualifier is used for both putative pseudogenes and non-functional forms. The /pseudo qualifier will therefore be separated into /pseudogene and /non_functional, to better reflect its actual usage.
The modification will be applied in April 2010.

The value "annotated by transcript or proteomic data" will be legal for the /exception qualifier.

A new qualifier, /haplogroup, will be legal for the source feature.

For the /strain qualifier, it is no longer legal to describe multiple equivalent names.

Previously (before May 2009), DDBJ accepted sequence data that described multiple names in a /strain qualifier:
      /strain="ATCC #### (= JCM ### = NBRC ###)"
To describe equivalent strain names, appropriate use of the /note qualifier is recommended:
      /note="strain coidentity: JCM ### = NBRC ###"
      /strain="ATCC ####"
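
Submitters who keep older annotation in the multiple-name style may find a small conversion helper useful. The following is a minimal sketch, assuming that old values always follow the "PRIMARY (= NAME = NAME)" pattern shown above; the function name split_strain and the strain numbers in the example are placeholders, not an official DDBJ tool or real strain identifiers.

```python
import re

# Matches the old style: a primary name followed by "(= name = name ...)".
OLD_STRAIN = re.compile(
    r"^(?P<primary>[^()=]+?)\s*\(\s*=\s*(?P<others>[^()]+?)\s*\)$"
)


def split_strain(value):
    """Split an old-style /strain value into (strain, note).

    Returns (value, None) when the value holds only a single name.
    """
    value = value.strip()
    match = OLD_STRAIN.match(value)
    if match is None:
        return value, None
    others = [name.strip() for name in match.group("others").split("=")]
    note = "strain coidentity: " + " = ".join(others)
    return match.group("primary").strip(), note


# Placeholder strain numbers, for illustration only.
strain, note = split_strain("ATCC 0000 (= JCM 000 = NBRC 000)")
print('/note="%s"' % note)
print('/strain="%s"' % strain)
```

The first name is kept in /strain and the remaining names are moved to /note, mirroring the recommended form above.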

A new qualifier, /artificial_location, will be legal for the CDS feature.

As mentioned above, we are preparing our check tools for the /artificial_location qualifier. Details of the changes will be made available shortly.
The modification will be applied in December 2009.

Improvement of the format of /inference qualifier

In order to describe inferential support more effectively, the format of the /inference qualifier will be improved. The discussion has continued since 2008. Details of the changes will be made available shortly.

A new directory, "tpa", was created under "ddbj_database" on the DDBJ anonymous FTP site. Before this change, TPA (Third-Party Annotation) data were spread across three directories: "ddbj", "ddbjnew" (for daily updates), and "wgs". Now, all TPA-associated data can be downloaded from the new "tpa" directory.
Moreover, another new directory, "dra", was also created under "ddbj_database" on the DDBJ anonymous FTP site. Now, all released data from the DDBJ Sequence Read Archive can be downloaded from the new "dra" directory.
The "trace" directory was renamed to "dta", following the official launch of the DDBJ Trace Archive (DTA). For details of the changes in the "ddbj_database" directory and its subdirectories, see README.TXT in the directory.

If you automatically monitor the DDBJ anonymous FTP site, please check your monitoring program and update it if necessary.
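
As one possible way to update a monitoring script, the following minimal sketch rewrites stored paths for the "trace" to "dta" rename. The function name update_monitored_path and the example file names are hypothetical. TPA data are deliberately not remapped here, because the layout inside the new "tpa" directory is described only in README.TXT.

```python
# Directory renames directly under "ddbj_database" announced in this issue.
RENAMED_DIRS = {"trace": "dta"}


def update_monitored_path(path):
    """Return the path with any renamed ddbj_database subdirectory replaced."""
    parts = path.split("/")
    for i in range(len(parts) - 1):
        if parts[i] == "ddbj_database" and parts[i + 1] in RENAMED_DIRS:
            parts[i + 1] = RENAMED_DIRS[parts[i + 1]]
    return "/".join(parts)


# Hypothetical monitored paths, for illustration only.
for old_path in ["ddbj_database/trace/README.TXT", "ddbj_database/ddbj/README.TXT"]:
    print(old_path, "->", update_monitored_path(old_path))
```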
 Redistribution of genomic sequence (build 4) of the cultivar Nipponbare of Japanese rice (Oryza sativa Japonica Group) assigned with RAP annotation Aug. 11, 2009

The entries corresponding to the complete sequences of the japonica rice chromosomes were updated from build 3 to build 4.
Their accession numbers are AP008207-AP008218.
The updated contents are the results of annotation by The Second Rice Annotation Project Meeting (RAP2), an international rice genome annotation project organized by Japanese research groups.
With this update, the AP008207-AP008218 entries were redistributed with approximately 28,000 CDS (protein-coding region) features.

The accession numbers  (Anonymous FTP)  are as follows ;

Reference sites
(Photo: Integrated Database Project)
 Release of new silkworm (Bombyx mori) small RNA MGA 4,448,218 entries Jun. 3, 2009

DDBJ newly released 4,448,218 entries of MGA data derived from silkworm (Bombyx mori), which had been submitted by the University of Tokyo.

The accession numbers are as follows;
  • AHAAB0000001-AHAAB0547473
  • AHAAC0000001-AHAAC1704525
  • AHAAD0000001-AHAAD2196220
  • (total 4,448,218 entries)
Related site: About Mass sequence for Genome Annotation (MGA) entry
Anonymous FTP: AH_resource_index

These entries were released as DDBJ daily updates on June 1, 2009.
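
The three accession ranges listed above can be checked against the stated total with a short script. This is a minimal sketch, assuming that each accession consists of an alphabetic prefix followed by a consecutive, zero-padded number; the function name range_size is introduced here for illustration.

```python
import re

ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def range_size(accession_range):
    """Count the accessions in a 'FIRST-LAST' range sharing one prefix."""
    first, last = accession_range.split("-")
    prefix_a, number_a = ACCESSION.match(first).groups()
    prefix_b, number_b = ACCESSION.match(last).groups()
    if prefix_a != prefix_b or len(number_a) != len(number_b):
        raise ValueError("range endpoints do not match: " + accession_range)
    return int(number_b) - int(number_a) + 1


RANGES = [
    "AHAAB0000001-AHAAB0547473",
    "AHAAC0000001-AHAAC1704525",
    "AHAAD0000001-AHAAD2196220",
]

total = sum(range_size(r) for r in RANGES)
print(total)  # 4448218, the total stated in the list above
```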
 Release of new tobacco (Nicotiana tabacum) EST 65,102 entries Jul. 24, 2009

DDBJ newly released 65,102 entries of EST data derived from tobacco (Nicotiana tabacum), which had been submitted by Japan Tobacco Inc.

The accession numbers  (Anonymous FTP)  are as follows;

These entries were released as DDBJ daily updates on Jul. 23, 2009.

(Photo: Integrated Database Project)
  • The nucleotide sequence database collected and maintained by DDBJ is released online to the public quarterly. We completed DDBJ Release 79.0 on Sep. 25, 2009. DDBJ Release 79.0 consists of 108,593,519 entries, and the number of bases has reached 106,684,379,504. See also the DDBJ release note.
  • DDBJ amino acid database (DAD) Rel. 49.0 was released at DDBJ on Oct. 14, 2009. DAD Rel. 49.0 consists of 15,359,639 entries, and the total number of residues has reached 4,200,060,817.
The periodic releases and the new data are available by FTP download from the "FTP/Web API" page.
The MAFFT service was opened to the public in Web API for Biology (WABI). MAFFT is a fast and accurate multiple alignment program developed by Dr. Kato of Kyushu University (http://align.bmr.kyushu-u.ac.jp/mafft/software/).

Genome Information Broker for Viruses (GIB-V) extracted 52,088 complete virus genome or segment records from DDBJ release 78 and enhanced the host information. Host information that is not written in a DDBJ entry is added using the cross-reference information of the corresponding UniProt entry. Furthermore, common names are converted to scientific names using the TxSearch system.

A paper about Web API for Biology (WABI) was published in the 2009 Web Server Issue of Nucleic Acids Research (Volume 37, Issue 12, July 2009).

"Web API for biology with a workflow navigation system"
Yeondae Kwon, Yasumasa Shigemoto, Yoshikazu Kuwana and Hideaki Sugawara
Nucleic Acids Research, 2009, Vol. 37, No. suppl_2 W11-W16

Published by:
DNA Data Bank of Japan (DDBJ)
Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ)
National Institute of Genetics (NIG)
Research Organization of Information and Systems
1111 Yata, Mishima, Shizuoka 411-8540, JAPAN