DNA Data Bank of Japan
DDBJ Mail Magazine 
October 16, 2006
Published by DDBJ
This page is translated from the Japanese version. Distribution of the e-mail magazine has not started yet.
Copyright © 1995-2006
DDBJ All rights reserved.
 ♦ Good season!!! 
In Japan, we tend to think of fall as the season for reading, the arts, and food. The weather is calm and good for reading books over a hot drink. The leaves change color day by day to yellow, red, and brown. And the food! We should be careful not to eat too much!!!
If you have any questions or suggestions about DDBJmag, please don't hesitate to write to ddbjmag@ddbj.nig.ac.jp. We really want to hear from you!!!

 ♦ Completion of DDBJ Release 67 
The Nucleotide Sequence Database collected and managed by DDBJ is distributed quarterly and published online. DDBJ Release 67 has been completed. It contains 61,144,621 entries and 65,443,024,193 bases. The FTP download site for the regular release and newly arrived data is as follows:

At present, all files of the DDBJ release except some indexes (ddbjacc#.idx, ddbjjou#.idx, and ddbjkey#.idx) are at most 300 MB in size. Starting with release 68 (December 2006), we will raise the maximum file size from 300 MB to 1.5 GB, because network capacity has increased remarkably. From then on, each file named ddbj***##.seq will be at most 1.5 GB, like the index files.
For more details, please refer to the [DDBJ Release Note 67].

 ♦ Redistribution of whole rice genome data (O. sativa ssp. japonica) with RAP annotation 
The entries corresponding to the complete sequences of the japonica rice chromosomes have been updated. Their accession numbers are AP008207-AP008218. The updates contain the annotation results of the First Rice Annotation Project Meeting (RAP1), an international rice genome annotation project organized by Japanese research groups. With this update, the AP008207-AP008218 entries were redistributed with approximately 26,800 CDS (protein coding region) features.

The data can be obtained from the following sites. The RAP1 annotation results are also available in the Rice Annotation Project Database (RAP-DB) at the National Institute of Genetics.


 ♦ Release of 10,000 new porcine full-length cDNA entries 
DDBJ has released about 10,000 new porcine cDNA entries submitted by the National Institute of Agrobiological Sciences. These entries were released in the DDBJ daily updates on Sep. 16, 2006.
Reference URL: http://pede.dna.affrc.go.jp/
The accession numbers and the file name for anonymous FTP are as follows:
AK230469-AK240615 (10147 entries) anonymous FTP: Sus_scrofa_HTC_060916_1.seq.gz
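
The entry count can be checked against the accession range. The following is a minimal sketch in Python, assuming that every accession in the range exists and that both ends share the same prefix and digit width. The helper name `expand_accession_range` is hypothetical and is not part of any DDBJ tool.

```python
import re

def expand_accession_range(range_text):
    """Expand a range such as 'AK230469-AK240615' into a list of accessions.

    Assumes both ends share the same alphabetic prefix and digit width,
    and that the numbering inside the range is consecutive.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", range_text)
    if match is None:
        raise ValueError(f"not an accession range: {range_text!r}")
    prefix_a, start, prefix_b, end = match.groups()
    if prefix_a != prefix_b or len(start) != len(end):
        raise ValueError("range ends must share prefix and digit width")
    width = len(start)
    return [f"{prefix_a}{n:0{width}d}" for n in range(int(start), int(end) + 1)]

accessions = expand_accession_range("AK230469-AK240615")
print(len(accessions))  # 10147, matching the entry count given above
```

The same helper applied to the rice chromosome range AP008207-AP008218 yields 12 accessions.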

 ♦ Report on the 19th International Collaborators Meeting 
To operate and develop the international nucleotide sequence database collaboratively, the three data banks (DDBJ, EMBL-Bank/EBI, and GenBank/NCBI) hold an international collaborators meeting every May.
In 2006, the meeting was held at GenBank in Bethesda, Maryland, USA, on 15-17 May. DDBJ, EMBL, and GenBank each reported on their activities over the past year and discussed practical matters for maintaining and developing the nucleotide sequence database, as follows:

Items Discussed and To Be Studied
  • No restrictions on the use of INSDC data
    INSDC confirmed that we should not accept any submission that restricts free public access.

  • Non-submission of data to INSDC
    Since 2005, INSDC has made its web site public at http://www.insdc.org/. The three banks agreed to add more content to the web site.

    Since 2003, we have discussed the schema of a common XML description named INSD-XML. Since 2005, the three banks have exchanged data in INSD-XML format on a trial basis. After thoroughly reviewing the trial, we discussed improvements to INSD-XML so that it can serve as the common XML description among the three banks.

  • locus_tag
    Since 2003, many genome projects have used the locus_tag qualifier as an identifier for tracking purposes. In the past, we allowed submitters to choose the prefixes of their locus_tag values freely. Since 2005, to keep the values unique across INSDC, we have discussed how to manage and assign locus_tag prefixes.
    The framework for assigning locus_tag prefixes will be available in the near future.

Changes to the Feature Table Document: Features and Qualifiers
  • New amino acid abbreviations, "J" and "O"
    1) Pyl (O); Pyrrolysine
    The 22nd naturally encoded amino acid, pyrrolysine, has been discovered. The JCBN IUBMB-IUPAC (the Joint Commission on Biochemical Nomenclature of IUBMB and IUPAC) has agreed to recommend Pyl (three-letter abbreviation) and O (one-letter abbreviation) for this amino acid.
    2) Xle (J); Leucine or Isoleucine
    The residue abbreviations Xle (three-letter abbreviation) and J (one-letter abbreviation) are reserved for cases in which leucine cannot be experimentally distinguished from isoleucine.
    We will therefore add the following abbreviations:
    Abbreviation   1-letter abbreviation   Amino acid name
    Xle            J                       Leucine or Isoleucine
    Accordingly, INSDC will use "J" and "O" in the values of translation qualifiers in CDS features.
  • The two old qualifiers 'transposon' and 'insertion_seq' will be merged into a new qualifier, 'mobile_element'. The new qualifier will be legal only on the 'repeat_region' feature, as below:
    Comment: The specified value for 'mobile_element' is one of the following:
    insertion sequence
    non-LTR retrotransposon
  • "viral cRNA" is added to the specified values of the 'mol_type' qualifier, which indicates the in vivo molecule type of the sequence on the source feature.
    Definition of "viral cRNA"
    cRNA is a plus-strand copy of a minus-strand RNA genome that serves as a template for making viral progeny genomes.
  • The 'operon' qualifier will be legal on the 'rRNA' feature.
  • EC_number values will be more strictly controlled. In addition, we will accept the symbol "n" (the initial of "new") to indicate that a code is not yet available and will be assigned later.
  • In the values of the 'PCR_primers' qualifier, modified base codes (e.g. "i" for inosine) must be enclosed in angle brackets, "<" and ">".
    For example :
    /PCR_primers="fwd_name; hoge1, fwd_seq;cgkgtgtatcttact
    rev_name; hoge2, rev_seq;cg<i>gtgtatcttact"
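
Two of the changes above lend themselves to a quick check in code: the extended one-letter amino acid alphabet, and the angle-bracket notation for modified bases in primer sequences. The following Python sketch is illustrative only. The function names are hypothetical, and the exact character set accepted by INSDC validation is an assumption rather than something stated in this report.

```python
import re

# One-letter amino acid codes: the 20 standard residues, the ambiguity
# codes B, Z and X, selenocysteine (U), plus the newly added J and O.
# Assumption: the exact set accepted by INSDC validators may differ.
AMINO_ACID_CODES = set("ACDEFGHIKLMNPQRSTVWY" "BZXU" "JO")

def invalid_residues(translation):
    """Return the sorted list of characters not in AMINO_ACID_CODES."""
    return sorted(set(translation.upper()) - AMINO_ACID_CODES)

# A primer token is either a modified base enclosed in <...> or a single
# lowercase base code.
PRIMER_TOKEN = re.compile(r"<[^<>]+>|[a-z]")

def tokenize_primer(seq):
    """Split a primer sequence into base tokens, keeping '<i>' as one unit."""
    tokens = PRIMER_TOKEN.findall(seq)
    # findall silently skips unmatched characters, so compare the rejoined
    # tokens with the input to detect anything unexpected.
    if "".join(tokens) != seq:
        raise ValueError(f"unexpected characters in primer: {seq!r}")
    return tokens

print(invalid_residues("MKOJLV"))            # [] because J and O are accepted
print(tokenize_primer("cg<i>gtgtatcttact"))  # '<i>' is kept as a single token
```

Treating the bracketed code as one token means the reverse primer in the example counts as 15 bases rather than 17 characters.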

Changes to Other Items
  • The rules for describing locations will change slightly:
    the use of the range descriptor "(m.n)" will be discontinued.

 ♦ New function of getentry in CONTIG entries search 
getentry is an entry retrieval system that looks up entries by accession number and other identifiers; DDBJ provides it via WWW and e-mail servers. We have changed how the web version of getentry returns results when searching CONTIG entries.
When you choose to receive the results on the WWW, sequence retrieval now works as follows:
  • consecutive DNA sequence data, including gaps, can be retrieved in FASTA format
  • the actual sequence of a CONTIG entry can be retrieved (e.g. CM 000230)

 ♦ Failure in part of search services in the WWW getentry 
getentry is an entry retrieval system that looks up entries by accession number and other identifiers; DDBJ provides it via WWW and e-mail servers.
We recently found that some entries had not been displayed by the web version of getentry for a certain period (see below for details).
The affected entries are contig entries released by DDBJ and GenBank. They include some of the 7,269 CON entries corresponding to the 223 thousand WGS entries of the medaka strain Hd-rR, released by DDBJ on April 20, 2006.
We kindly ask users who used the web version of getentry during the affected period to repeat their searches with the relevant keywords. The web version of getentry is now working normally.
Fortunately, the e-mail version of getentry was not affected by this problem.
Details are as follows:
  • Affected periods:
    [entries released by DDBJ] April 24, 2006 - May 24, 2006
    [entries released by GenBank] January 4, 2005* - May 24, 2006
    *The actual date is calculated from the first release date of the noticed CH entries (e.g. CM00126)
  • Cause: a bug in the search program of the web version
  • Condition: During the affected period, users could not get complete results under the following conditions.
    - When ID was set to [Accession], DATABASE to [Flat File (DDBJ)], and Result to [WWW], and the query box contained any of the undisplayable entries.
  • Undisplayable entries: (-->see the accession numbers list)
    [entries released by DDBJ] 1121 entries
    [entries released by GenBank] 1092 entries
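
To decide whether an earlier search needs to be repeated, the affected periods listed above can be encoded directly. This is a minimal Python sketch and not a DDBJ tool: the function name `search_was_affected` is hypothetical, and treating both end dates as inclusive is an assumption.

```python
from datetime import date

# Affected periods for the web version of getentry, taken from the list above.
# Assumption: both the start and end dates are inclusive.
AFFECTED_PERIODS = {
    "DDBJ": (date(2006, 4, 24), date(2006, 5, 24)),
    "GenBank": (date(2005, 1, 4), date(2006, 5, 24)),
}

def search_was_affected(search_date, released_by):
    """Return True if a web search on search_date may have missed entries."""
    start, end = AFFECTED_PERIODS[released_by]
    return start <= search_date <= end

print(search_was_affected(date(2006, 5, 1), "DDBJ"))     # True
print(search_was_affected(date(2006, 6, 1), "GenBank"))  # False
print(1121 + 1092)  # 2213 undisplayable entries in total
```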

Published by: DNA Data Bank of Japan (DDBJ)
  Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ)
National Institute of Genetics (NIG)
Research Organization of Information and Systems
1111 Yata, Mishima, Shizuoka 411-8540, JAPAN
Last modified: Oct. 07, 2011