
No. 71 & 72 Jul. 26, 2012
Published by DDBJ
Mt. Fuji climbers
-
More than 300,000 people try climbing Mt. Fuji from July to August.
Most people start to trek from one of the four 5th stations on Mt. Fuji.
This photo shows the starting point "Subashiri 5th Station".
Mt. Fuji is crowded with climbers right now.
DDBJ has issued DDBJ Web Magazine-e (No. 71 & 72). The name has been changed from DDBJ Mail Magazine to DDBJ Web Magazine-e.
If you have any questions or suggestions about Web Magazine-e, please do not hesitate to write to us. We would like to hear from you.
-
DDBJ Rel. 89.0
- Date: Jun. 27, 2012
- 153,273,314 entries 141,016,380,296 bases
- DDBJ Release Note
- Latest Release Information
DAD Rel. 59.1
DAD Rel. 59.1 was released because a problem was found in part of Rel. 59.0 (released in June 2012).
- Date: Jul. 2, 2012
- 23,691,501 entries: 6,844,427,098aa (total number of residues)
- DAD Release Note
- Latest Release Information
- Feature Table Definition (FT-Doc) is the annotation manual shared by the three databanks (DDBJ, EMBL-Bank, GenBank) for constructing the DDBJ/EMBL/GenBank International Nucleotide Sequence Database. It was revised in May 2012; the current version is 10.1.
-
DDBJ Pipeline has now been released.
The Pipeline system has been moved to the new NIG supercomputer, which came into service in March 2012.
On the new NIG supercomputer, DDBJ Pipeline uses computers with 10 TB and 2 TB of memory for de novo assembly analysis.
Other new features are:
1) The Velvet tool is available again.
2) A new tool (Trinity: RNA-seq de novo) has been added.
3) Mate-paired reads are supported.
4) The DRA data available for import are updated every day.
Please see the Pipeline manual for further details on the new release.
-
MiGAP (Microbial Genome Annotation Pipeline) is an automatic annotation tool for microbial contigs and genomes, intended for novices and experienced users alike.
The service was resumed on May 1, 2012, on the DDBJ website.
http://migap.ddbj.nig.ac.jp
To use the service, users need an account to log in to the MiGAP site.
Please obtain an account via "Apply the use of Web service" under "Apply the use of super computer system" (in Japanese).
Manual:
How to Use MiGAP (in Japanese)
Note: At present, only new annotation is available. Viewing of old results annotated before Feb. 2012 is in preparation.
-
INSDC has newly released whole genome sequence data of Solanum lycopersicum, which had been submitted by the Boyce Thompson Institute for Plant Research.
Kazusa DNA Research Institute plays a major role in the International Tomato Genome Sequencing Project.
- Reference URL: Boyce Thompson Institute for Plant Research (News) "Tomato genome becomes fully sequenced paving the way for healthier fruits and vegetables"
- The accession numbers are as follows; (Available by getentry and DRASearch)
-
DDBJ has released EST data derived from a species of planarian (Dugesia japonica), which had been submitted by RIKEN. (Available by getentry)

The accession numbers are as follows;
- FY925127 - FY960824 ( 35,698 entries) 5'-EST
- FY960825 - FY979285 ( 18,461 entries) 3'-EST
total: 54,159 entries
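
The entry counts above follow from the accession ranges, because the numeric parts of the accession numbers are consecutive. A minimal Python sketch of that check is shown here; the helper names `split_accession` and `count_entries` are illustrative and are not part of any DDBJ tool, and the sketch assumes both ends of a range share the same letter prefix.

```python
import re


def split_accession(accession):
    """Split an accession such as 'FY925127' into ('FY', 925127)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    return match.group(1), int(match.group(2))


def count_entries(first, last):
    """Count the accessions in an inclusive range with a shared prefix."""
    first_prefix, first_number = split_accession(first)
    last_prefix, last_number = split_accession(last)
    if first_prefix != last_prefix or last_number < first_number:
        raise ValueError("range must share a prefix and be ascending")
    return last_number - first_number + 1


five_prime = count_entries("FY925127", "FY960824")
three_prime = count_entries("FY960825", "FY979285")
print(five_prime, three_prime, five_prime + three_prime)
# prints: 35698 18461 54159
```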
-
DDBJ has newly released GSS data derived from the African clawed frog (Xenopus laevis), which had been submitted by the National Institute of Genetics.

The accession numbers are as follows;
- GA131508 - GA388245 (256,738 entries)
-
DDBJ has released EST data derived from the Asian Swallowtail (Papilio xuthus) and the Common Mormon (Papilio polytes), which had been submitted by the National Institute of Advanced Industrial Science and Technology.
The accession numbers are as follows;
-
Papilio xuthus
- FY174038-FY192407 ( 18,370 entries) 5'-EST
- FY192408-FY210626 ( 18,219 entries) 3'-EST
total: 36,589 entries
-
Papilio polytes
- FY302525-FY312114 ( 9,590 entries) 5'-EST
- FY312115-FY319715 ( 7,601 entries) 3'-EST
- FY319716-FY339489 ( 19,774 entries) 5'-EST
- FY339490-FY358875 ( 19,386 entries) 3'-EST
total: 56,351 entries
(Photo by NBDC)
-
DDBJ has newly released EST data derived from human (Homo sapiens), which had been submitted by RIKEN.
The accession numbers are as follows;
- HY000001 - HY183282 ( 183,282 entries) 5'-EST
- HY183283 - HY377477 ( 194,195 entries) 3'-EST
total: 377,477 entries
4. Flat File structure for Japan Patent Office (JPO) -Second part-
-
Introduction
In the first part of this column, I summarized the Flat File (FF) structure of Japan Patent Office (JPO) data. In this second part, I describe the FF structure in more detail, the process of converting organism names from the JPO submission file, and the format of the patent publication number.
(1) FF structure and description contents
Fig. 1 shows the FF structure for nucleotide sequence data, divided into six color-coded parts ([A] LOCUS Block, [B] SOURCE Block, [C] REFERENCE Block, [D] COMMENT Block, [E] Feature Block and [F] Sequence Block), together with sample data and description contents.
Fig. 1 Correspondence relation of JPO nucleotide sequence data
1-1: [A] LOCUS Block
The LOCUS Block consists of the LOCUS, DEFINITION, ACCESSION, VERSION and KEYWORDS lines (Table 1).
Table 1: Description Contents of LOCUS Block
(NA: Nucleotide sequence data. AA: Amino acid sequence data)
Line name: Description Contents
LOCUS ([Example] NA: Fig. 1, AA: Fig. 2)
- Accession number
- Sequence length number (NA: bp, AA: aa)
- Molecule type (NA: DNA or RNA, AA: PRT)
- Molecular form (NA: linear, AA: not described)
- Division (PAT)
- Last release date (If the entry is updated and re-released to the public site, this date will be changed.)
DEFINITION
- Publication number and invention title (same as the TITLE line of the REFERENCE Block)
- Example: JP 2010599999-A/100: Genetic Markers Expressed In Tumors
ACCESSION
- Accession number
- The accession number prefixes of JPO data are as follows;
  NA: E, BD, DD, DJ, DL, DM, FU, FV, FW, FZ, GB, HV
  AA: E, BD, DD
- The accession number prefix of KIPO data is as follows;
  NA: DI
  AA: DI
VERSION
- NA: Sequence version number appended to the accession number
  Example: ZZ000001.1
  Data released to the public for the first time have version number "1". If the sequence is updated, the version number is increased.
- AA: Not set on the VERSION line
KEYWORDS
- Patent publication number
- JP header: Number of a publication of patent applications in Japan, or of a Japanese translation of a PCT international publication for patent applications.
- WO header: Number of an international application published under the Patent Cooperation Treaty (PCT)
- See section (3) for the number formats.
For amino acid sequence data, the LOCUS line has a different output format (Fig. 2). The line contains the accession number, sequence length number, molecule type, division and last release date.
Fig.2 LOCUS line for Amino acid sequence data
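
As a rough illustration of the VERSION and ACCESSION conventions in Table 1, the Python sketch below splits a VERSION value into accession number and version number, and looks up the patent office from the accession prefix. The function names are hypothetical, the prefix sets are copied from the nucleotide (NA) lists above and may not cover prefixes assigned later, and `DI000001` is a made-up accession used only for the demonstration.

```python
import re

# Accession prefixes for nucleotide (NA) data, copied from Table 1.
JPO_NA_PREFIXES = {"E", "BD", "DD", "DJ", "DL", "DM",
                   "FU", "FV", "FW", "FZ", "GB", "HV"}
KIPO_NA_PREFIXES = {"DI"}


def parse_version(value):
    """Split a VERSION value such as 'ZZ000001.1' into (accession, version)."""
    accession, dot, version = value.partition(".")
    if not accession or dot != "." or not version.isdigit():
        raise ValueError(f"unexpected VERSION value: {value!r}")
    return accession, int(version)


def patent_office(accession):
    """Look up the patent office from the letter prefix of an NA accession."""
    match = re.match(r"[A-Z]+", accession)
    prefix = match.group(0) if match else ""
    if prefix in JPO_NA_PREFIXES:
        return "JPO"
    if prefix in KIPO_NA_PREFIXES:
        return "KIPO"
    return "unknown"


print(parse_version("ZZ000001.1"))  # ('ZZ000001', 1)
print(patent_office("DI000001"))    # KIPO (made-up accession)
print(patent_office("ZZ000001"))    # unknown
```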
1-2: [B] SOURCE Block
The scientific name on the SOURCE and ORGANISM lines is derived from the NCBI Taxonomy Database (Table 2). See also section (2).
Table 2: Description Contents of SOURCE Block
Line name: Description Contents
SOURCE
- Scientific name (Common name)
- The SOURCE line holds the scientific name and the common name.
- The scientific name is converted from the organism name on the OS line of the COMMENT Block, based on the NCBI Taxonomy Database. If the scientific name has a common name (example: human) in the NCBI Taxonomy Database, the common name is placed after the scientific name.
- [Example] SOURCE Homo sapiens (human)
ORGANISM
- Scientific name on the first line, followed by its lineage information based on the NCBI Taxonomy Database from the second line.
- [Example]
  ORGANISM Homo sapiens
  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
  Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo.
1-3: [C] REFERENCE Block
The REFERENCE Block consists of the AUTHORS, TITLE and JOURNAL lines (Table 3).
Table 3: Description Contents of REFERENCE Block
Line name: Description Contents
AUTHORS
- Inventor name
- The inventor's full name is given on the PI line of the COMMENT Block.
TITLE
- Invention title
- The same value is also set on the DEFINITION line.
JOURNAL
- The first line gives the publication number and publication date after the fixed value “Patent:”.
- The applicant name is given on the second line. In the FF, applicant information appears only on the JOURNAL line.
- [Example]
  JOURNAL Patent: JP 2010599999 100 29-SEP-2010;
  DNA Data Bank of Japan
1-4: [D] COMMENT Block
The COMMENT Block describes the patent application information (Table 2). Each description is given a line header name (Table 4).
Table 4: Description Contents of COMMENT Block
Line name: Description Contents
- OS: Organism name in the JPO submission file. The SOURCE line, ORGANISM line, /organism and /db_xref are constructed from this organism name.
- PN: Publication number and sequence number
- PD: Publication date
- PF: Application date, application number
- PR: Priority date, priority application number
- PI: Inventor name
- CC: Comment
- FH: Feature header (Fixed value: Key Location/Qualifiers)
- FT: Feature information
*Some old JPO data have a PC line describing the international patent classification (IPC) code.
1-5: [E] Feature Block
JPO data have only the source feature, as do Korean Intellectual Property Office (KIPO) data (Table 5).
Nucleic acid sequence data have the /mol_type, /db_xref and /organism qualifiers.
Amino acid sequence data do not have the /db_xref qualifier.
Table 5: Description Contents of Feature Block
(NA: Nucleotide sequence data. AA: Amino acid sequence data)
Qualifier name: Description Contents
- /mol_type: NA: unassigned DNA, unassigned RNA. AA: Not set
- /db_xref: Taxonomy ID of the NCBI Taxonomy Database, placed after the fixed value "taxon:"
- /organism: Scientific name based on the NCBI Taxonomy Database
(Organism information update based on NCBI Taxonomy Database)
DDBJ started adding the Taxonomy ID in /db_xref to JPO and KIPO data in May 2010. Once a year, DDBJ will rebuild the SOURCE line, ORGANISM line and /organism based on the Taxonomy ID.
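
The qualifier rules for the source feature can be summarized in a short Python sketch. The function `source_qualifiers` and its parameters are hypothetical names chosen for this illustration, and the printed layout is only a convenient rendering, not the exact FF output. The Taxonomy ID 9606 used in the demonstration is the NCBI Taxonomy ID of Homo sapiens.

```python
def source_qualifiers(data_type, scientific_name, taxonomy_id=None, molecule="DNA"):
    """Return the source-feature qualifiers for one entry as a dict.

    data_type is 'NA' (nucleotide) or 'AA' (amino acid), as in Table 5.
    """
    if data_type == "NA":
        if taxonomy_id is None:
            raise ValueError("nucleotide data need a Taxonomy ID for /db_xref")
        if molecule not in ("DNA", "RNA"):
            raise ValueError("molecule must be 'DNA' or 'RNA'")
        return {
            "mol_type": f"unassigned {molecule}",
            "db_xref": f"taxon:{taxonomy_id}",
            "organism": scientific_name,
        }
    if data_type == "AA":
        # Amino acid data: /mol_type is not set and /db_xref is absent.
        return {"organism": scientific_name}
    raise ValueError("data_type must be 'NA' or 'AA'")


for name, value in source_qualifiers("NA", "Homo sapiens", 9606).items():
    print(f'/{name}="{value}"')
```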
1-6: [F] Sequence Block
Nucleotide sequence data have a BASE COUNT line (Fig. 1), which gives the numbers of adenine (a), cytosine (c), guanine (g) and thymine (t).
For amino acid sequence data, the BASE COUNT line is not output (Fig. 3).
Fig.3 Example of Sequence Block for amino acid data
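
The numbers on the BASE COUNT line are simple per-base tallies of the sequence. A minimal Python sketch, assuming a lower-case or upper-case nucleotide string as input; the function name `base_count` and the short sample sequence are made up for illustration, and the exact column layout of the BASE COUNT line is not reproduced here.

```python
from collections import Counter


def base_count(sequence):
    """Count adenine, cytosine, guanine and thymine in a nucleotide sequence."""
    counts = Counter(sequence.lower())
    # Report only the four bases named on the BASE COUNT line.
    return {base: counts.get(base, 0) for base in "acgt"}


print(base_count("atggcgtacgttag"))
# prints: {'a': 3, 'c': 2, 'g': 5, 't': 4}
```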
(2) Conversion process of scientific name
2-1: Conversion of scientific name from JPO submission file
The original organism name in the JPO submission file is given on the OS line of the COMMENT Block. Based on the NCBI Taxonomy Database, this name is converted to a scientific name, which is set on the SOURCE line, the ORGANISM line and the /organism qualifier. The lineage information is also constructed and set on the ORGANISM line.
2-2: Unidentified organism name
If the organism name given by the applicant is not found in the NCBI Taxonomy Database, it is converted to "unidentified" on the SOURCE line, the ORGANISM line and the /organism qualifier (Fig. 4).
The original organism name given by the applicant remains on the OS line of the COMMENT Block (Fig. 1, Table 4).
Fig.4 Example of Unidentified organism name (extracted FF)
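
The conversion described in 2-1 and 2-2 can be sketched as a lookup with a fallback. In the Python example below, the `TAXONOMY` dictionary is a tiny hypothetical stand-in for the NCBI Taxonomy Database, the function name `convert_organism` is illustrative, and case-insensitive matching is an assumption of this sketch rather than a documented DDBJ rule.

```python
# Hypothetical stand-in for the NCBI Taxonomy Database:
# lower-cased organism name -> (scientific name, common name or None).
TAXONOMY = {
    "homo sapiens": ("Homo sapiens", "human"),
}


def convert_organism(os_name):
    """Derive SOURCE, ORGANISM and /organism values from the OS line name."""
    entry = TAXONOMY.get(os_name.strip().lower())
    if entry is None:
        # Name not found: every derived field becomes "unidentified".
        scientific = source = "unidentified"
    else:
        scientific, common = entry
        source = f"{scientific} ({common})" if common else scientific
    return {
        "OS": os_name,  # the applicant's original name is kept as-is
        "SOURCE": source,
        "ORGANISM": scientific,
        "/organism": scientific,
    }


print(convert_organism("Homo sapiens")["SOURCE"])         # Homo sapiens (human)
print(convert_organism("my new bacterium X")["SOURCE"])   # unidentified
```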
(3) Format of patent publication number and Description parts on FF
3-1: Format of patent publication number
DDBJ receives three kinds of patent publications: [1] publications of patent applications, [2] PCT (Patent Cooperation Treaty) international publications for patent applications, and [3] Japanese translations of PCT international publications for patent applications (Table 6).
Publications of patent applications and Japanese translations of PCT international publications share the same publication number format, with “JP” at the head of the number. PCT international publications have “WO” at the head of the publication number.
Table 6: Format of Patent publication number
3-2: Description parts on FF
The publication number is set on the KEYWORDS line of the LOCUS Block, the JOURNAL line of the REFERENCE Block and the PN line of the COMMENT Block (Fig. 5).
Fig.5 Patent publication number on DDBJ FF
*The publication number appears on the lines highlighted in yellow in the FF.
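
To make the JOURNAL line layout concrete, the Python sketch below parses the example given for the REFERENCE Block, "Patent: JP 2010599999 100 29-SEP-2010;". The function name and regular expression are illustrative only. Treating the middle field (100 in the example) as the sequence number is an assumption based on the PN line description, and the pattern accepts any non-blank token as the number because the detailed JP and WO number formats are not reproduced here.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Header meanings as described for the KEYWORDS line.
HEADERS = {
    "JP": "Japanese publication or Japanese translation of a PCT publication",
    "WO": "PCT international publication",
}

JOURNAL_PATTERN = re.compile(
    r"Patent:\s+(?P<header>JP|WO)\s+(?P<number>\S+)\s+(?P<seq>\d+)\s+"
    r"(?P<day>\d{2})-(?P<month>[A-Z]{3})-(?P<year>\d{4});"
)


def parse_journal(line):
    """Extract header, number, assumed sequence number and date from a JOURNAL line."""
    match = JOURNAL_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unexpected JOURNAL line: {line!r}")
    return {
        "header": match["header"],
        "kind": HEADERS[match["header"]],
        "number": match["number"],
        "sequence_number": int(match["seq"]),  # assumption, see lead-in
        "publication_date": date(int(match["year"]),
                                 MONTHS[match["month"]],
                                 int(match["day"])),
    }


record = parse_journal("JOURNAL   Patent: JP 2010599999 100 29-SEP-2010;")
print(record["header"], record["number"], record["publication_date"])
# prints: JP 2010599999 2010-09-29
```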
Author comments
This is the final installment of the Patent column. If you would like to know about the FF structure and the properties of JPO and KIPO patent data, please refer to my columns. If I have the chance, I will explain how to search the patent data with DDBJ tools and describe improvements to the JPO and KIPO FF.
Published by:
DNA Data Bank of Japan (DDBJ)
DDBJ Center
National Institute of Genetics
Research Organization of Information and Systems
1111 Yata, Mishima, Shizuoka 411-8540, JAPAN
