DNA Data Bank of Japan
DDBJ Mail Magazine 
No. 22 (November 15, 2005)
This page is translated from the Japanese version. Distribution of the "E-mail magazine" has not started yet.
Copyright © 1995-2006
DDBJ All rights reserved.
 ♦ Smiley sits on the mushroom  
This is the web version of the 22nd issue of the bimonthly DDBJ Mail Magazine.
The CIB-DDBJ building is surrounded by a beautiful green environment. Here and there, we can find many kinds of mushrooms (or toadstools !?), which grow among fallen leaves or hold on to the stems of some trees.
The picture shows a yellow smiley midget enjoying the view of NIG (National Institute of Genetics) from the top of a mushroom clinging to a cherry tree. Do you want to join him?
Select an article title below to jump to the topics that interest you.
  1. Public collections of DNA and RNA sequence reach 100 gigabases
  2. Rice Genome analysis completed
  3. Chimpanzee Genome sequencing completed
  4. Release of transcript sequences derived from human and mouse with a huge scale
  5. Macaca fascicularis cDNA database (QFbase)
  6. New items are added to the DDBJ HP
  7. Change the format of a Variable record of MGA data
  8. DDBJ Rel. 63 Completed
  9. The report of INSDC annual meeting
  10. Termination of SF gate-WAIS and malign service
If you have any questions or opinions about DDBJmag, please don't hesitate to write to ddbjmag@ddbj.nig.ac.jp.
We really want to hear from you !!!!

 ♦ Public collections of DNA and RNA sequence reach 100 gigabases 
The world's three leading public repositories for DNA and RNA sequence information have reached 100 gigabases (100,000,000,000 bases; the 'letters' of the genetic code) of sequence. Thanks to their data exchange policy, which has paved the way for the global exchange of many types of biological information, the three members of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), namely EMBL-Bank (Hinxton, UK), GenBank (Bethesda, USA) and the DNA Data Bank of Japan (Mishima, Japan), all reached this milestone together.
Takashi Gojobori, Director of the Center for Information Biology and DNA Data Bank of Japan, says: "The INSDC has laid the foundations for the exchange of many types of biological information: as we enter the era of systems biology and researchers begin to exchange complex types of information such as the results of experiments that measure the activities of thousands of genes, or computational models of entire processes, it is important to celebrate the achievements of the three databases that pioneered the open exchange of biological information."
Graham Cameron, Associate Director of EMBL's European Bioinformatics Institute, says: "This is an important milestone in the history of the nucleotide sequence databases. From the first EMBL Data Library entry made available in 1982 to today's provision of over 55 million sequence entries from at least 200,000 different organisms, these resources have anticipated the needs of molecular biologists and addressed them - often in the face of a serious lack of resources."
David Lipman, Director of the National Center for Biotechnology Information, adds: "Today's nucleotide sequence databases allow researchers to share completed genomes, the genetic make-up of entire ecosystems, and sequences associated with patents. The INSDC has realized the vision of the researchers who initiated the sequence database projects, by making the global sharing of nucleotide sequence information possible."
EMBL-Bank and GenBank started the International Nucleotide Sequence Database activities in 1980, and the DNA Data Bank of Japan (DDBJ) joined as the third collaborative partner in 1987. DDBJ has received data submissions from countries all over the world, mainly from Japan. By exchanging the collected data with the other two databanks, DDBJ has contributed to the development of the International Nucleotide Sequence Database Collaboration.
For details of DDBJ, please refer to http://www.ddbj.nig.ac.jp/.

 ♦ Rice Genome analysis completed 
The rice genome (Oryza sativa (japonica cultivar-group) cv. Nipponbare) was sequenced by the IRGSP (International Rice Genome Sequencing Project) in December 2004, with Japan taking a leading role in the project.
Recently, a detailed analysis of the high-quality sequence of the rice genome was featured in Nature (vol.436, pp.793-800; Aug.11, 2005).
According to the National Institute of Agrobiological Sciences (NIAS) and the Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries (STAFF), a total of 37,544 genes were identified. About 71% of these genes are similar to genes found in Arabidopsis, a model dicotyledonous plant whose genome was completely sequenced in 2000. A total of 2,859 genes (ca. 8%) had no homologs in Arabidopsis. These genes may be rice-specific or monocot-specific.
The data were submitted to the DDBJ/EMBL/GenBank International Nucleotide Sequence Databases and released under the accession numbers AP008207 - AP008218. The sequence information is available via the DDBJ getentry system.
Drs. Takashi Gojobori (Director of CIB-DDBJ) and Hisakazu Iwama (with DDBJ until October 2004; currently Associate Professor at Kagawa University) participated in the International Rice Genome Sequencing Project, in charge of annotation and analysis.

-IRGSP (International Rice Genome Sequencing Project)
-DDBJ HP:Link to Other web pages on Information Biology (databases for Rice)
-IRGSP Release (Build4.0)(2005.8.31)
-IRGSP Release (Build 3.0)(2005.2.3)

 ♦ Chimpanzee Genome sequencing completed 
A draft sequence of the chimpanzee genome was analyzed by the Chimpanzee Sequencing and Analysis Consortium and published in Nature (vol.437, pp.69-87; Sep.1, 2005).
The chimpanzee is humankind's closest living relative. Comparing the human and chimpanzee genomes will help us understand human-specific functions and the mechanism of evolution from primates to humans.
The data were submitted to the DDBJ/EMBL/GenBank International Nucleotide Sequence Databases as whole genome shotgun (WGS) sequence data, which are available via the DDBJ getentry system. The data can also be downloaded from the "Anonymous FTP of the DDBJ" page or the WGS ftp site (AACZ.gz, AADA.gz).
Sequencing of chimpanzee chromosome 22 had already been completed by the International Chimpanzee Chromosome 22 Consortium last May, and the result was published in Nature.

 ♦ Release of transcript sequences derived from human and mouse with a huge scale 
The FANTOM Consortium, which is led by the Institute of Physical and Chemical Research (RIKEN) of Japan, has comprehensively studied the mouse genome and transcriptome. On September 2, they published two papers in Science on the integrative analysis of human and mouse transcripts (Science, vol.309, pp.1559-1563; Sep.2, 2005 and Science, vol.309, pp.1564-1566; Sep.2, 2005).
In these papers, they reported the finding of many protein-coding and non-protein-coding transcripts in both mouse and human. They suggested that these non-protein-coding RNAs (ncRNAs) regulate the expression of normal transcripts encoding proteins and provide important information about the mechanism that controls gene expression in mammals.
Professor T. Gojobori (Head of the Center for Information Biology and DNA Data Bank of Japan; CIB-DDBJ), Associate Professor K. Ikeo and their colleagues participated in the FANTOM Consortium from CIB-DDBJ. They are co-authors of the papers mentioned above. In addition, the activities of the Genome Network Project are also related to this project.
Approximately 2,000,000 EST (expressed sequence tag), about 110,000 HTC (high throughput cDNA sequence), and 8,800,000 MGA (Mass sequence for Genome Annotation) entries, which were referred to in these papers, have already been registered and released from DDBJ. All of the data can be retrieved, displayed and acquired by using the DDBJ retrieval tool, getentry.

- Press release on RIKEN site (Japanese Only)
- FANTOM Database
- The list of accession numbers assigned to sequences used in the research; cited from the web site of the Genome Exploration Research Group, Genome Sciences Center, RIKEN

 ♦ Macaca fascicularis cDNA database (QFbase) 
Cynomolgus monkeys (Macaca fascicularis) are common laboratory animals widely used for medical and pharmaceutical research. The Macaca fascicularis cDNA database (QFbase) was constructed by JCRB Genebank, National Institute of Biomedical Innovation, Japan, in collaborative research with the Laboratory for DNA Data Analysis, CIB-DDBJ, National Institute of Genetics, Japan, and the Division of Genetic Diagnosis, Institute of Medical Sciences and Department of Medical Genome Sciences, University of Tokyo, Japan.
The database provides about 85,000 5' or 3' EST sequences from cynomolgus monkey cDNA libraries derived from the brain, liver, and testis. About 4,000 full-length cDNA sequences are also included in the database, and 1,700 of them contain information on human-macaque differences in the protein coding region.
Accession numbers released from DDBJ are BB873801 - BB894695 (20895 entries, 3'EST) and CJ430287 - CJ493524 (63238 entries, 5'EST). These entries can be retrieved via the DDBJ getentry system.
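The entry counts quoted for the two accession ranges can be checked with simple arithmetic. The following is a minimal sketch in Python, assuming that an accession number is a letter prefix followed by a serial number and that every number in a range is assigned; the function name count_accessions is hypothetical and is not part of any DDBJ tool.

```python
import re

# Assumed accession layout: uppercase letter prefix followed by digits.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def count_accessions(first: str, last: str) -> int:
    """Count the entries in an inclusive accession range."""
    m_first = _ACCESSION.match(first)
    m_last = _ACCESSION.match(last)
    if m_first is None or m_last is None:
        raise ValueError("expected a letter prefix followed by digits")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("both accessions must share the same prefix")
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


three_prime = count_accessions("BB873801", "BB894695")  # 3' EST range
five_prime = count_accessions("CJ430287", "CJ493524")   # 5' EST range
print(three_prime, five_prime, three_prime + five_prime)
```

The two ranges give 20895 and 63238 entries, 84133 in total, which agrees with the "about 85,000" EST sequences mentioned above.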
The cDNA clones are ready to be provided through the Human Science Research Resource Bank (HSRRB) for further functional experiments. All data are available at http://genebank.nibio.go.jp/gbank/index_e.html.
"QFbase" is also introduced on the Link to DDBJ/CIB web pages on Information Biology and Link to Other web pages on Information Biology: Databases for Other Mammals pages. For your reference, please see these pages, too.

 ♦ New items are added to the DDBJ HP  
We added the following items to the DDBJ HP.
 ♦ Change the format of a Variable record of MGA data  
The format of the Variable record of MGA (Mass sequence for Genome Annotation) data was slightly changed, as agreed at the International Nucleotide Sequence Database Collaboration Meeting in 2005. The modification removes the "//" line that was inserted between two consecutive entries in the Variable record. We appreciate your attention to the change.
Old format
>ZZZZZ0000001|ABC1004AA60F1902|10|9B|lipidosis-related protein Lipidosin| MGI:2385656|
(Skip the rest)
New format
>ZZZZZ0000001|ABC1004AA60F1902|10|9B|lipidosis-related protein Lipidosin| MGI:2385656|
(Skip the rest)
About Mass sequence for Genome Annotation (MGA) entry
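For readers who keep Variable records in the old format, the change amounts to dropping the "//" separator lines. The following is a minimal sketch in Python under stated assumptions: the separator is taken to be a line containing only "//", and the second header line in the sample is a made-up illustration, because the rest of the real record is skipped in the example above. The function name strip_entry_separators is hypothetical.

```python
def strip_entry_separators(lines):
    """Return the lines of a Variable record without the old '//' separator lines.

    Assumption: a separator is a line that contains nothing but '//'.
    """
    return [line for line in lines if line.strip() != "//"]


# The first header line is taken from the example above; the second one is a
# hypothetical entry added only to show two consecutive entries.
old_record = [
    ">ZZZZZ0000001|ABC1004AA60F1902|10|9B|lipidosis-related protein Lipidosin| MGI:2385656|",
    "//",
    ">ZZZZZ0000002|HYPOTHETICAL_TAG|1|1A|hypothetical example entry||",
    "//",
]

new_record = strip_entry_separators(old_record)
for line in new_record:
    print(line)
```

Lines that merely contain "//" as part of longer text are left untouched, so only the separator lines are removed.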

 ♦ DDBJ Rel. 63 Completed 
The nucleotide sequence database collected and maintained by DDBJ is released online to the public every quarter. We completed DDBJ Release 63 on September 30, 2005. DDBJ Release 63 consists of 47,741,593 entries, and the number of bases reached 52,246,110,341.
FTP site for periodical release and new data download.


 ♦ The report of INSDC annual meeting 
To operate and implement the collaborative construction of the international nucleotide sequence database, the three data banks (DDBJ, EMBL-Bank/EBI and GenBank/NCBI) hold an international collaborators meeting every May.
In 2005, the meeting was held at DDBJ in Mishima on May 16-18.
DDBJ, EMBL and GenBank each reported on their activities over the past year and discussed practical matters for maintaining and developing the nucleotide sequence database, as follows:

Items Discussed and To Be Studied
  • It was agreed that the international nucleotide sequence database constructed through the collaboration among DDBJ, EMBL and GenBank be called INSDC (International Nucleotide Sequence Database Collaboration).
    INSDC has made its web site public: http://www.insdc.org/
  • Since 2003, we have discussed the format of INSD_XML. DDBJ has made trial data in INSD_XML format available at its FTP site and through its retrieval tool, getentry, in advance of EMBL and GenBank.
  • Since 2004, we have accepted the submission of MGA data. We reconsidered the rules for acceptance and the format for distribution. What is MGA?: http://www.ddbj.nig.ac.jp/sub/mga-e.html
  • Since 2002, we have accepted TPA submissions. In the past, some biological evidence was required for a TPA submission. Now, we are planning to accept sequences inferred from non-experimental evidence. We will continue to discuss the guidelines for acceptance and classification of TPA submissions.
  • Since 2003, the locus_tag qualifier has been used as an identifier for tracking purposes by many genome projects. In the past, we allowed submitters to choose the prefixes for their locus_tag freely. However, as we are concerned that this could cause disruption in the future, we will manage and assign locus_tag prefixes to keep them unique across the whole database.
    In association with this, we will improve our flat file format to include the PROJECT_ID, which can be used to specify the project (mainly for genome projects).
  • rRNA sequences are not consistently annotated in the database with respect to their strandedness and partiality. It was agreed that we should check them, and also that the same preference for plus-strand annotation should be applied to other single features.
Changes to the Feature Table Document: Features and Qualifiers
  • For features, especially CDS, database users want to know whether the feature description is based on a biological experiment or only on inference, for example from sequence similarity. To make this evidence information available, the "evidence" qualifier will be split into two new qualifiers, "experiment" and "inference":
a) An argument of the feature based on some biological experiment (instead of /evidence=experimental)
/experiment="free text" (less than 1000 letters)
b) An argument of the feature not based on any biological experiment (instead of /evidence=not_experimental)
/inference="[TYPE] (same species):[evidence basis]"
(The values of [TYPE] will be controlled by a list.)
- note - The old qualifier, /evidence=experimental or /evidence=not_experimental, will be replaced by the following, respectively:
     /experiment="experimental evidence, no additional details recorded"
     /inference="non-experimental evidence, no additional details recorded"
  • Recently, the number of entries from research on environmental sampling and the diversity of life (e.g. the BARCODE project) has increased significantly. For these submissions, it is important to describe the specimen from which the sequence was obtained. Therefore, five new qualifiers will become legal on the "source" feature:
- /collection_date="DD MMM YYYY" or "MMM YYYY" or "YYYY"
     DD ; two digits for the day,
     MMM ; three-letter abbreviation of the month,
     YYYY; four digits for the year
- /lat_lon="###.## [N or S], ###.## [E or W]"
- /collected_by="[Name of the person who collected the specimen.]"
- /identified_by="[Name of the person who identified the specimen.]"
- /PCR_primers="fwd_name:[name],fwd_seq:[sequence],rev_name:[name], rev_seq:[sequence]"
  • The "pseudo" qualifier will be legal on "intron" and "misc_RNA" features.
  • The "rpt_unit" qualifier will be split into two new qualifiers, "rpt_unit_range" and "rpt_unit_seq".
  • Two new qualifiers, "ribosomal_slippage" and "trans_splicing" will be valid on the CDS feature.
  • "hydrogenosome" will be added to the list of legal values for the "organelle" qualifier.
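The replacement of the old "evidence" qualifier and the formats of the new "collection_date" and "lat_lon" qualifiers described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not an official INSDC validator: the function names are hypothetical, the month abbreviations are assumed to be the English ones (Jan to Dec), each "#" in the lat_lon pattern is assumed to stand for a digit, and the day is only checked against the range 1 to 31.

```python
import re

# Replacement values quoted in the note on the old /evidence qualifier.
EVIDENCE_REPLACEMENTS = {
    "experimental":
        '/experiment="experimental evidence, no additional details recorded"',
    "not_experimental":
        '/inference="non-experimental evidence, no additional details recorded"',
}

# Assumption: English three-letter month abbreviations.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# "DD MMM YYYY", "MMM YYYY" or "YYYY"; a day is only allowed with a month.
_DATE = re.compile(r"^(?:(?:(\d{2}) )?([A-Za-z]{3}) )?(\d{4})$")

# Assumption: one to three integer digits and exactly two decimals.
_LAT_LON = re.compile(r"^(\d{1,3}\.\d{2}) ([NS]), (\d{1,3}\.\d{2}) ([EW])$")


def migrate_evidence(qualifier: str) -> str:
    """Map an old /evidence qualifier to its replacement; pass others through."""
    prefix = "/evidence="
    if not qualifier.startswith(prefix):
        return qualifier
    value = qualifier[len(prefix):].strip()
    if value not in EVIDENCE_REPLACEMENTS:
        raise ValueError(f"unknown evidence value: {value!r}")
    return EVIDENCE_REPLACEMENTS[value]


def is_valid_collection_date(value: str) -> bool:
    """Check a /collection_date value against the three allowed layouts."""
    match = _DATE.match(value)
    if match is None:
        return False
    day, month, _year = match.groups()
    if month is not None and month not in MONTHS:
        return False
    if day is not None and not 1 <= int(day) <= 31:
        return False
    return True


def is_valid_lat_lon(value: str) -> bool:
    """Check a /lat_lon value such as '35.12 N, 138.91 E' (made-up example)."""
    match = _LAT_LON.match(value)
    if match is None:
        return False
    latitude, longitude = float(match.group(1)), float(match.group(3))
    return latitude <= 90.0 and longitude <= 180.0


print(migrate_evidence("/evidence=not_experimental"))
print(is_valid_collection_date("15 Nov 2005"), is_valid_lat_lon("35.12 N, 138.91 E"))
```

The date check is deliberately loose (it would accept "31 Feb 2005"); a stricter version could build a real date with the standard datetime module.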
Changes to Other Items
    The rules for the description of locations will be changed:
  • combinations of "join" and "order" operators in one location will be illegal.
  • the use of two identical location construction operators within one location will be illegal.
    (Ex. '100..100' will be illegal)
  • the usage of '^' will be restricted to adjacent nucleotides.
    (Ex. '100^200' will be illegal)
  • the use of range (m.n) descriptor within location spans will be illegal.
    (Ex. '(5.10)..100' will be illegal)
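The location rules above can be illustrated with a small checker. The following Python sketch is based only on the four rules and their examples as listed here; it is not a full location parser, the function name location_problems is hypothetical, and it does not handle special cases such as the junction between the last and first base of a circular sequence.

```python
import re
from typing import List


def location_problems(location: str) -> List[str]:
    """Return a list of rule violations found in a feature location string."""
    problems = []

    # Rule 1: "join" and "order" must not be combined in one location.
    if "join(" in location and "order(" in location:
        problems.append("join and order are combined in one location")

    # Rule 2 (as in the example '100..100'): identical start and end in a span.
    for start, end in re.findall(r"(\d+)\.\.(\d+)", location):
        if start == end:
            problems.append(f"span {start}..{end} has identical start and end")

    # Rule 3: '^' may only be used between adjacent nucleotides.
    for left, right in re.findall(r"(\d+)\^(\d+)", location):
        if int(right) - int(left) != 1:
            problems.append(f"{left}^{right} does not join adjacent nucleotides")

    # Rule 4: the range descriptor (m.n) is not allowed inside a span.
    if re.search(r"\(\d+\.\d+\)", location):
        problems.append("range descriptor (m.n) is used")

    return problems


for example in ("join(1..5,10..20)", "100..100", "100^200", "(5.10)..100"):
    print(example, location_problems(example))
```

A location such as "100^101" passes the check, while the three examples marked illegal above are each reported with one problem.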
 ♦ Termination of SF gate-WAIS and malign service 
The SF gate-WAIS and malign services were terminated on September 30. As substitutes for keyword-based data retrieval provided by DDBJ, "ARSA" and "SRS" are available. For multiple alignment, "ClustalW" (clustalw@nig.ac.jp) is available. Thank you very much for your understanding and cooperation.

Published by: DNA Data Bank of Japan (DDBJ)
  Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ)
National Institute of Genetics (NIG)
Research Organization of Information and Systems
1111 Yata, Mishima, Shizuoka 411-8540, JAPAN
Last modified: Oct. 07, 2011