DDBJ Web Magazine-e No.71 & 72


No. 71 & 72   Jul. 26, 2012
Published by DDBJ


Mt. Fuji climbers
More than 300,000 people try climbing Mt. Fuji from July to August.
Most people start to trek from one of the four 5th stations on Mt. Fuji.
This photo shows the starting point "Subashiri 5th Station".
Mt. Fuji is now crowded with climbers.


DDBJ has issued DDBJ Web Magazine-e (No. 71 & 72). The name has been changed from DDBJ Mail Magazine to DDBJ Web Magazine-e.
If you have any questions or suggestions about Web Magazine-e, please do not hesitate to write to us. We would like to hear from you.

DDBJ Rel. 89.0, DAD (DDBJ amino acid database) Rel. 59.1 Completed
DDBJ Rel. 89.0


DAD Rel. 59.1

DAD Rel. 59.1 was released because a problem had been found in part of Rel. 59.0 (released in June 2012).
Reference: Apologies for the trouble in DAD release 59.0.
DDBJ/EMBL/GenBank Feature Table Definition revised
Feature Table Definition (FT-Doc) is the annotation manual shared by the three databanks (DDBJ, EMBL-Bank, GenBank) for the construction of the DDBJ/EMBL/GenBank International Nucleotide Sequence Database. It was revised in May 2012; the current version is 10.1.
DDBJ Read Annotation Pipeline resumed
The DDBJ Pipeline service is available again.

The Pipeline system has been moved to the new NIG supercomputer, which came into service in March 2012.
On the new NIG supercomputer, DDBJ Pipeline uses computers with 10 TB and 2 TB of memory for de novo assembly analysis.

Other new features are as follows:
  • The Velvet tool has been restored.
  • A new tool, Trinity (de novo RNA-seq assembly), has been added.
  • Mate-paired reads are supported.
  • The DRA data available for import is updated daily.

Please see the Pipeline manual for further explanation of the new release.
MiGAP renewal from DDBJ
MiGAP (Microbial Genome Annotation Pipeline) is an automatic annotation tool for microbial contigs and genomes, designed for novices and experienced users alike.
The service resumed on May 1, 2012, on the DDBJ website.
http://migap.ddbj.nig.ac.jp

To use the service, users need an account to log in to the MiGAP site.
Please apply for an account via "Apply the use of Web service" under "Apply the use of super computer system" (in Japanese).

Manual:
How to Use MiGAP  (in Japanese)

Note: At present, only new annotation is available. Viewing of old results annotated before Feb. 2012 is in preparation.
Release of whole genome sequence data of Solanum lycopersicum
INSDC newly released whole genome sequence data of Solanum lycopersicum, which had been submitted by the Boyce Thompson Institute for Plant Research.
Kazusa DNA Research Institute plays a major role in the International Tomato Genome Sequencing Project.
Reference URL:  Boyce Thompson Institute for Plant Research (News)     Tomato genome becomes fully sequenced  paving the way for healthier fruits and vegetables
The accession numbers are as follows (available via getentry and DRASearch):
genome
  • WGS:      AEKE02000001 - AEKE02026877 (AEKE.gz) ( 26,877 entries)
  • Chr CON: CM001064 - CM001075 ( 12 entries)
  • CON:       GL758110 - GL761332 ( 3,223 entries)
RNA-Seq data
Release of sequence data from DDBJ (May – Jul. 2012)
DDBJ released EST data derived from a species of planarian (Dugesia japonica), which had been submitted by RIKEN. (Available via getentry.)

The accession numbers are as follows:
  • FY925127 - FY960824 ( 35,698 entries) 5'-EST
  • FY960825 - FY979285 ( 18,461 entries) 3'-EST
total: 54,159 entries
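
The entry counts listed for each accession range can be checked with simple arithmetic. The sketch below assumes that the accession numbers in a range are consecutive and share one prefix; the helper names `split_accession` and `count_entries` are hypothetical and are not part of any DDBJ tool.

```python
import re

def split_accession(accession):
    """Split an accession such as 'FY925127' into its prefix and number."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    return match.group(1), int(match.group(2))

def count_entries(first, last):
    """Count entries in an inclusive range, assuming consecutive numbering."""
    first_prefix, first_number = split_accession(first)
    last_prefix, last_number = split_accession(last)
    if first_prefix != last_prefix or last_number < first_number:
        raise ValueError(f"not a valid range: {first} - {last}")
    return last_number - first_number + 1

# Ranges taken from the planarian EST list above.
five_prime = count_entries("FY925127", "FY960824")
three_prime = count_entries("FY960825", "FY979285")
print(five_prime, three_prime, five_prime + three_prime)  # 35698 18461 54159
```

The same calculation reproduces the counts given for the other ranges in this issue, for example 12 entries for CM001064 - CM001075.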

DDBJ newly released GSS data derived from the African clawed frog (Xenopus laevis), which had been submitted by the National Institute of Genetics.

The accession numbers are as follows:
  • GA131508 - GA388245  (256,738 entries)


DDBJ released EST data derived from the Asian Swallowtail (Papilio xuthus) and the Common Mormon (Papilio polytes), which had been submitted by the National Institute of Advanced Industrial Science and Technology.

The accession numbers are as follows:

    Papilio xuthus
  • FY174038-FY192407 ( 18,370 entries) 5'-EST
  • FY192408-FY210626 ( 18,219 entries) 3'-EST
total: 36,589 entries


    Papilio polytes
  • FY302525-FY312114 (  9,590 entries) 5'-EST
  • FY312115-FY319715 (  7,601 entries) 3'-EST
  • FY319716-FY339489 ( 19,774 entries) 5'-EST
  • FY339490-FY358875 ( 19,386 entries) 3'-EST
total: 56,351 entries
DDBJ newly released EST data derived from human (Homo sapiens), which had been submitted by RIKEN.

The accession numbers are as follows:
  • HY000001 - HY183282 ( 183,282 entries) 5'-EST
  • HY183283 - HY377477 ( 194,195 entries) 3'-EST
total: 377,477 entries
4. Flat File structure for Japan Patent Office (JPO) -Second part-
Hideo Aono
DDBJ Patent Annotator

Patent column 1. 2. 3.

Introduction
In the first part of this column, I gave a summary of the Flat File (FF) structure for Japan Patent Office (JPO) data. In this second part, I introduce the FF structure in more detail, together with the process for converting organism names from the JPO submission file and the format of the patent publication number.

(1) FF structure and description contents

Fig. 1 shows the FF structure for nucleotide sequence data, divided into six color-coded parts ([A] LOCUS Block, [B] SOURCE Block, [C] REFERENCE Block, [D] COMMENT Block, [E] Feature Block and [F] Sequence Block), together with sample data and a description of the contents.

    Fig. 1 Correspondence relation of JPO nucleotide sequence data

1-1: [A] LOCUS Block
The LOCUS Block consists of the LOCUS, DEFINITION, ACCESSION, VERSION and KEYWORDS lines (Table 1).

    Table. 1: Description Contents of LOCUS Block
Line name: Description contents
(NA: Nucleotide sequence data. AA: Amino acid sequence data)

LOCUS (examples: NA in Fig.1, AA in Fig.2)
  • Accession number
  • Sequence length (NA: bp, AA: aa)
  • Molecule type (NA: DNA or RNA, AA: PRT)
  • Molecular form (NA: linear, AA: not described)
  • Division (PAT)
  • Last release date (this date changes if the entry is updated and re-released to the public site)

DEFINITION
  Publication number and invention title (the same title as on the TITLE line of the REFERENCE Block)
  [Example] JP 2010599999-A/100: Genetic Markers Expressed In Tumors

ACCESSION
  Accession number
  The accession number prefixes of JPO data are as follows:
  NA: E, BD, DD, DJ, DL, DM, FU, FV, FW, FZ, GB, HV
  AA: E, BD, DD
  The accession number prefix of KIPO data is as follows:
  NA: DI
  AA: DI

VERSION
  NA: Accession number with sequence version number
  [Example] ZZ000001.1
  Data released to the public for the first time has version number 1. The version number is incremented when the sequence is updated.
  AA: Not set on the VERSION line

KEYWORDS
  Patent publication number
  JP header: Number of a publication of patent applications in Japan or of a Japanese translation of a PCT international publication for patent applications.
  WO header: Number of an international application published under the Patent Cooperation Treaty (PCT)
  Please also refer to section (3) for the number format.
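
As an illustration of the ACCESSION and VERSION rows of Table 1, the following sketch splits a VERSION value into accession number and version number, and looks up the patent office from the nucleotide accession prefix. The prefix sets are copied from Table 1; the function names and the sample accession `DI000001` are hypothetical, and the assumption that an accession is a run of capital letters followed by digits is mine.

```python
import re

# Nucleotide (NA) accession prefixes copied from Table 1.
JPO_NA_PREFIXES = {"E", "BD", "DD", "DJ", "DL", "DM",
                   "FU", "FV", "FW", "FZ", "GB", "HV"}
KIPO_NA_PREFIXES = {"DI"}

def split_version(value):
    """Split a VERSION value such as 'ZZ000001.1' into (accession, version)."""
    match = re.fullmatch(r"([A-Z]+\d+)\.(\d+)", value)
    if match is None:
        raise ValueError(f"unexpected VERSION value: {value!r}")
    return match.group(1), int(match.group(2))

def patent_office(accession):
    """Return 'JPO' or 'KIPO' for a listed prefix, otherwise None."""
    match = re.fullmatch(r"([A-Z]+)\d+", accession)
    if match is None:
        return None
    prefix = match.group(1)
    if prefix in JPO_NA_PREFIXES:
        return "JPO"
    if prefix in KIPO_NA_PREFIXES:
        return "KIPO"
    return None

accession, version = split_version("ZZ000001.1")  # example value from Table 1
print(accession, version)         # ZZ000001 1
print(patent_office("DI000001"))  # KIPO (hypothetical accession)
```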
For amino acid sequence data, the LOCUS line has a different output format (Fig.2). The line contains the accession number, sequence length, molecule type, division and last release date.

    Fig.2 LOCUS line for Amino acid sequence data
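
The difference between the two LOCUS line formats can be sketched as a small parser. This is only an illustration: the two sample lines are hypothetical (they are not the lines shown in Fig.1 and Fig.2), the function name `parse_locus` is made up, and the parser assumes the fields are separated by whitespace in the order listed in Table 1.

```python
def parse_locus(line):
    """Parse a LOCUS line into a dict, assuming whitespace-separated fields."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    fields = tokens[1:]
    if len(fields) == 7:
        # Nucleotide: accession, length, unit, molecule type, form, division, date
        accession, length, unit, mol_type, form, division, date = fields
    elif len(fields) == 6:
        # Amino acid: the molecular form is not described
        accession, length, unit, mol_type, division, date = fields
        form = None
    else:
        raise ValueError(f"unexpected number of LOCUS fields: {len(fields)}")
    return {
        "accession": accession,
        "length": int(length),
        "unit": unit,
        "molecule_type": mol_type,
        "molecular_form": form,
        "division": division,
        "last_release_date": date,
    }

# Hypothetical sample lines; the exact column layout is an assumption.
NA_LINE = "LOCUS       ZZ000001     100 bp    DNA     linear   PAT 29-SEP-2010"
AA_LINE = "LOCUS       ZZ000002      50 aa    PRT              PAT 29-SEP-2010"

print(parse_locus(NA_LINE)["molecular_form"])  # linear
print(parse_locus(AA_LINE)["molecular_form"])  # None
```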

1-2: [B] SOURCE Block
The scientific name on the SOURCE and ORGANISM lines is obtained by conversion based on the NCBI Taxonomy Database (Table 2). Please also refer to section (2).

    Table. 2: Description Contents of SOURCE Block
Line name: Description contents

SOURCE
  Scientific name (Common name)
  The SOURCE line holds the scientific name and the common name.
  The scientific name is converted from the organism name on the OS line of the COMMENT Block, based on the NCBI Taxonomy Database. If the scientific name has a common name (example: human) in the NCBI Taxonomy Database, the common name is set after the scientific name.
  [Example] SOURCE Homo sapiens (human)

ORGANISM
  Scientific name on the first line, followed from the second line by its lineage information based on the NCBI Taxonomy Database.
  [Example]
  ORGANISM Homo sapiens
                     Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
                     Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo.
1-3: [C] REFERENCE Block
The REFERENCE Block consists of the AUTHORS, TITLE and JOURNAL lines (Table 3).

    Table. 3: Description Contents of REFERENCE Block
Line name: Description contents

AUTHORS
  Inventor name
  The inventor's full name is set on the PI line of the COMMENT Block.

TITLE
  Invention title
  The same value is also set on the DEFINITION line.

JOURNAL
  The first line gives the publication number and publication date after the fixed value "Patent:".
  The applicant name is given on the second line. In the FF, applicant information is set only on the JOURNAL line.
  [Example] JOURNAL Patent: JP 2010599999 100 29-SEP-2010;
                                  DNA Data Bank of Japan
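
To make the JOURNAL layout concrete, here is a minimal sketch that parses the example above. It assumes the first line always has the token order shown in the example, and it treats the token "100" as the sequence number, by analogy with "/100" in the DEFINITION example; that reading is my assumption, not something the table states. The function name `parse_journal` is hypothetical.

```python
from datetime import date

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def parse_journal(lines):
    """Parse the two JOURNAL lines of a patent entry into a dict."""
    first = lines[0].split()
    # Assumed token order: JOURNAL Patent: <header> <number> <sequence no.> <date>;
    if len(first) != 6 or first[:2] != ["JOURNAL", "Patent:"]:
        raise ValueError("unexpected JOURNAL line")
    header, number, sequence_number, raw_date = first[2:]
    day, month, year = raw_date.rstrip(";").split("-")
    return {
        "header": header,                        # "JP" or "WO"
        "publication_number": number,
        "sequence_number": int(sequence_number),  # assumed meaning of this token
        "publication_date": date(int(year), MONTHS[month.upper()], int(day)),
        "applicant": lines[1].strip(),
    }

journal = parse_journal([
    "JOURNAL Patent: JP 2010599999 100 29-SEP-2010;",
    "        DNA Data Bank of Japan",
])
print(journal["header"], journal["publication_date"], journal["applicant"])
# JP 2010-09-29 DNA Data Bank of Japan
```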
1-4: [D] COMMENT Block
The COMMENT Block contains the patent application information (Table 2). Each item is introduced by a line header name (Table 4).

    Table. 4: Description Contents of COMMENT Block
Line name: Description contents

OS
  Organism name in the JPO submission file.
  The SOURCE line, ORGANISM line, /organism and /db_xref are constructed from this organism name.
PN
  Publication number and sequence number
PD
  Publication date
PF
  Application date, Application number
PR
  Priority date, Priority application number
PI
  Inventor name
CC
  Comment
FH
  Feature header (Fixed value: Key Location/Qualifiers)
FT
  Feature information
*Some old JPO data have a PC line that gives the International Patent Classification (IPC) code.

1-5: [E] Feature Block
JPO data has only the source feature, as does Korean Intellectual Property Office (KIPO) data (Table 5).
Nucleotide sequence data has the /mol_type, /db_xref and /organism qualifiers.
Amino acid sequence data does not have the /db_xref qualifier.

    Table. 5: Description Contents of Feature Block
Qualifier name: Description contents
(NA: Nucleotide sequence data. AA: Amino acid sequence data)

/mol_type
  NA: unassigned DNA, unassigned RNA
  AA: Not set
/db_xref
  Taxonomy ID from the NCBI Taxonomy Database, placed after the fixed value "taxon:"
/organism
  Scientific name based on the NCBI Taxonomy Database
(Organism information update based on NCBI Taxonomy Database)
DDBJ started adding the Taxonomy ID as /db_xref to JPO and KIPO data in May 2010. Once a year, DDBJ will reconstruct the SOURCE line, ORGANISM line and /organism qualifier based on the Taxonomy ID.
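
The qualifier rules in Table 5 can be summarized in a short sketch. The function name `source_qualifiers` is hypothetical, the qualifier order simply follows Table 5, and the quoting style follows the usual feature table notation; none of this is taken from DDBJ's own code.

```python
def source_qualifiers(organism, taxonomy_id=None, mol_type=None):
    """Build source feature qualifiers.

    For nucleotide data, pass mol_type and taxonomy_id.
    For amino acid data, omit both: only /organism is set.
    """
    qualifiers = []
    if mol_type is not None:
        if mol_type not in ("unassigned DNA", "unassigned RNA"):
            raise ValueError(f"unexpected /mol_type value: {mol_type!r}")
        if taxonomy_id is None:
            raise ValueError("nucleotide data needs a Taxonomy ID")
        qualifiers.append(f'/mol_type="{mol_type}"')
        # Fixed value "taxon:" followed by the NCBI Taxonomy ID.
        qualifiers.append(f'/db_xref="taxon:{taxonomy_id}"')
    qualifiers.append(f'/organism="{organism}"')
    return qualifiers

# 9606 is the NCBI Taxonomy ID of Homo sapiens.
print(source_qualifiers("Homo sapiens", 9606, "unassigned DNA"))
print(source_qualifiers("Homo sapiens"))
```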

1-6: [F] Sequence Block
Nucleotide sequence data has a BASE COUNT line (Fig.1), which gives the numbers of adenine (a), cytosine (c), guanine (g) and thymine (t) bases.
For amino acid sequence data, the BASE COUNT line is not output (Fig.3).

    Fig.3 Example of Sequence Block for amino acid data
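
The content of the BASE COUNT line is a simple tally of the four bases. The sketch below shows that tally for a short, made-up sequence; the function names are hypothetical, and the spacing of the printed line is simplified rather than the exact column layout of the FF.

```python
from collections import Counter

def base_count(sequence):
    """Count a, c, g and t in a nucleotide sequence (case-insensitive)."""
    counts = Counter(sequence.lower())
    return {base: counts[base] for base in "acgt"}

def base_count_line(sequence):
    """Format the counts as a simplified BASE COUNT line."""
    counts = base_count(sequence)
    return "BASE COUNT " + " ".join(f"{counts[base]} {base}" for base in "acgt")

# Made-up example sequence, not taken from any entry.
print(base_count_line("acgtacgtaa"))  # BASE COUNT 4 a 2 c 2 g 2 t
```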

(2) Conversion process of scientific name

2-1: Conversion of scientific name from JPO submission file
The original organism name in the JPO submission file is given on the OS line of the COMMENT Block. Based on the NCBI Taxonomy Database, this name is converted to a scientific name, which is set on the SOURCE line, the ORGANISM line and the /organism qualifier. Its lineage information is also constructed and set on the ORGANISM line.

2-2: Unidentified organism name
If the organism name given by the applicants is not found in the NCBI Taxonomy Database, the name is converted to "unidentified" on the SOURCE line, the ORGANISM line and the /organism qualifier (Fig.4).
The original organism name given by the applicants is kept on the OS line of the COMMENT Block (Fig.1, Table 4).

    Fig.4 Example of Unidentified organism name (extracted FF)
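
The conversion described in 2-1 and 2-2 can be sketched as a lookup with a fallback. In this sketch a tiny dictionary stands in for the NCBI Taxonomy Database, the case-insensitive matching is my assumption about how names might be compared, and the function name and the unknown organism name are hypothetical. Lineage construction is left out.

```python
# Hypothetical stand-in for the NCBI Taxonomy Database:
# lower-cased name -> (scientific name, common name or None)
TAXONOMY = {
    "homo sapiens": ("Homo sapiens", "human"),
}

def convert_organism(os_name):
    """Derive SOURCE, ORGANISM and /organism values from the OS line name."""
    entry = TAXONOMY.get(os_name.strip().lower())
    if entry is None:
        # Name not found: it is converted to "unidentified".
        scientific, common = "unidentified", None
    else:
        scientific, common = entry
    source = f"{scientific} ({common})" if common else scientific
    return {
        "OS": os_name,            # original name is kept on the OS line
        "SOURCE": source,
        "ORGANISM": scientific,
        "/organism": scientific,
    }

print(convert_organism("Homo sapiens")["SOURCE"])        # Homo sapiens (human)
print(convert_organism("Example organism X")["SOURCE"])  # unidentified
```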

(3) Format of patent publication number and Description parts on FF

3-1: Format of patent publication number
DDBJ receives three kinds of patent publication: [1] Publication of patent applications, [2] Patent Cooperation Treaty (PCT) international publication for patent applications and [3] Japanese translations of PCT international publication for patent applications (Table 6).
Publications of patent applications and Japanese translations of PCT international publications for patent applications share the same publication number format, with "JP" at the head of the number. PCT international publications for patent applications have "WO" at the head of the publication number.

    Table 6: Format of Patent publication number

3-2: Description parts on FF
The publication number is set on the KEYWORDS line of the LOCUS Block, the JOURNAL line of the REFERENCE Block and the PN line of the COMMENT Block (Fig.5).

    Fig.5 Patent publication number on DDBJ FF
*The publication number appears on the lines highlighted in yellow in the FF.



Author comments
This is the final installment of the Patent column. If you would like to know about the FF structure and the properties of patent data for JPO and KIPO data, please refer to my columns. If I have the chance, I will explain how to search the patent data with DDBJ tools and describe improvements to the JPO and KIPO FF.
Published by: DNA Data Bank of Japan (DDBJ)
Center for DNA Data Bank of Japan
National Institute of Genetics (NIG)
Research Organization of Information and Systems
1111 Yata, Mishima, Shizuoka 411-8540, JAPAN
