4. Flat File structure for Japan Patent Office (JPO) -Second part-

Hideo Aono
DDBJ Patent Annotator

Introduction
In the first part of this column, I summarized the Flat File (FF) structure of Japan Patent Office (JPO) data. In this second part, I describe the FF structure in more detail, the process of converting organism names from the JPO submission file, and the format of the patent publication number.

(1) FF structure and description contents

Fig. 1 shows the FF structure for nucleotide sequence data, divided into six parts shown in different colors ([A] LOCUS Block, [B] SOURCE Block, [C] REFERENCE Block, [D] COMMENT Block, [E] Feature Block and [F] Sequence Block), together with sample data and the description contents of each part.

    Fig. 1 Correspondence relation of JPO nucleotide sequence data


1-1: [A] LOCUS Block
The LOCUS Block consists of the LOCUS, DEFINITION, ACCESSION, VERSION and KEYWORDS lines (Table 1).

    Table 1: Description Contents of LOCUS Block

Line name   Description Contents
            (NA: Nucleotide sequence data. AA: Amino acid sequence data)

LOCUS       [Example] NA: Fig.1, AA: Fig.2
            Accession number
            Sequence length (NA: bp, AA: aa)
            Molecule type (NA: DNA or RNA, AA: PRT)
            Molecular form (NA: linear, AA: not described)
            Division (PAT)
            Last release date (this date changes when the entry is updated and released to the public site again)

DEFINITION  Publication number and invention title (same as the TITLE line of the REFERENCE Block)
            [Example] JP 2010599999-A/100: Genetic Markers Expressed In Tumors

ACCESSION   Accession number
            The accession number prefixes of JPO data are as follows:
            NA: E, BD, DD, DJ, DL, DM, FU, FV, FW, FZ, GB, HV
            AA: E, BD, DD
            The accession number prefix of KIPO data is as follows:
            NA: DI
            AA: DI

VERSION     NA: Accession number with sequence version number
            [Example] ZZ000001.1
            Data released to the public for the first time has version number "1". The version number increases when the sequence is updated.
            AA: Not set on the VERSION line.

KEYWORDS    Patent publication number
            JP header: Number of a publication of patent applications in Japan, or of a Japanese translation of a PCT international publication for patent applications.
            WO header: Number of an international application published under the Patent Cooperation Treaty (PCT).
            See also the number formats in section (3).
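
The accession and version rules in Table 1 can be sketched in a few lines of Python. This is only an illustration, not DDBJ's own code. The pattern "one or two uppercase letters followed by digits" and the step of exactly one per update are assumptions; the prefix sets are the nucleotide (NA) lists from Table 1, of which the amino acid lists are a subset.

```python
import re

# Prefix lists copied from Table 1 (nucleotide entries).
JPO_NA_PREFIXES = {"E", "BD", "DD", "DJ", "DL", "DM",
                   "FU", "FV", "FW", "FZ", "GB", "HV"}
KIPO_NA_PREFIXES = {"DI"}

# Assumption: 1-2 uppercase letters, then digits, then an optional
# ".<version>" suffix as on the VERSION line.
_ACCESSION_RE = re.compile(r"^([A-Z]{1,2})(\d+)(?:\.(\d+))?$")


def parse_accession(text):
    """Split 'ZZ000001.1' into (prefix, digits, version); version may be None."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an accession-like string: {text!r}")
    prefix, digits, version = match.groups()
    return prefix, digits, int(version) if version is not None else None


def patent_office(prefix):
    """Return 'JPO', 'KIPO', or None for a prefix not listed in Table 1."""
    if prefix in JPO_NA_PREFIXES:
        return "JPO"
    if prefix in KIPO_NA_PREFIXES:
        return "KIPO"
    return None


def next_version(text):
    """Return the accession.version string after one more sequence update.

    Assumption: each update raises the version number by exactly 1.
    """
    prefix, digits, version = parse_accession(text)
    if version is None:
        raise ValueError("no version number to increment")
    return f"{prefix}{digits}.{version + 1}"
```

For the sample value ZZ000001.1 from Table 1, parse_accession returns the prefix "ZZ", which patent_office reports as None because "ZZ" is a placeholder and not one of the listed prefixes.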

For amino acid sequence data, the LOCUS line has a different output format (Fig.2): it contains the accession number, sequence length, molecule type, division and last release date, but no molecular form.

    Fig.2 LOCUS line for Amino acid sequence data
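
As a rough sketch of the difference between the two LOCUS layouts, the following Python function reads both. The field order follows Table 1, but the exact column positions are not given in the text, so the code simply splits on whitespace; that, and the sample lines used with it, are assumptions.

```python
def parse_locus_line(line):
    """Parse a LOCUS line into a dict, for nucleotide (NA) or amino acid (AA) data.

    Assumption: fields are whitespace-separated and appear in the order
    listed in Table 1. AA lines have no molecular form field.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    fields = tokens[1:]
    if len(fields) == 7 and fields[2] == "bp":
        accession, length, unit, mol_type, form, division, date = fields
    elif len(fields) == 6 and fields[2] == "aa":
        accession, length, unit, mol_type, division, date = fields
        form = None  # "not described" for amino acid data
    else:
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    return {
        "accession": accession,
        "length": int(length),
        "unit": unit,
        "molecule_type": mol_type,
        "molecular_form": form,
        "division": division,
        "last_release_date": date,
    }
```

With a hypothetical nucleotide line such as "LOCUS ZZ000001 100 bp DNA linear PAT 29-SEP-2010", the result has molecular_form "linear"; for the corresponding amino acid line ("... 100 aa PRT PAT 29-SEP-2010") it is None.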


1-2: [B] SOURCE Block
The scientific name on the SOURCE and ORGANISM lines is derived from the NCBI Taxonomy Database (Table 2). See also section (2).

    Table 2: Description Contents of SOURCE Block

Line name   Description Contents
SOURCE      Scientific name (Common name)

            The SOURCE line holds the scientific name and the common name.
            The scientific name is converted from the organism name on the OS line of the COMMENT Block, based on the NCBI Taxonomy Database. If the scientific name also has a common name (example: human) in the NCBI Taxonomy Database, the common name is set after the scientific name.

            [Example] SOURCE Homo sapiens (human)

ORGANISM    Scientific name on the first line, followed from the second line by its lineage information based on the NCBI Taxonomy Database.

            [Example]
            ORGANISM Homo sapiens
                               Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
                               Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo.
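
The layout of the SOURCE Block in Table 2 can be sketched as follows. This is only an illustration: the 12-character label column and the 80-character line width are assumptions, not values given in this column, and the function name is hypothetical.

```python
import textwrap


def format_source_block(scientific_name, lineage, common_name=None,
                        width=80, indent=12):
    """Build SOURCE and ORGANISM lines from a name and a lineage list.

    Assumptions: labels are padded to `indent` columns and the lineage
    is wrapped at `width` characters.
    """
    source_value = scientific_name
    if common_name:
        # The common name follows the scientific name in parentheses.
        source_value += f" ({common_name})"
    # The lineage is joined with "; " and closed with a period.
    lineage_text = "; ".join(lineage) + "."
    pad = " " * indent
    lines = [
        "SOURCE".ljust(indent) + source_value,
        "  ORGANISM".ljust(indent) + scientific_name,
    ]
    lines += textwrap.wrap(lineage_text, width=width,
                           initial_indent=pad, subsequent_indent=pad,
                           break_on_hyphens=False)
    return lines
```

Calling format_source_block("Homo sapiens", [...], common_name="human") with the lineage from the example above yields the SOURCE line "Homo sapiens (human)", the ORGANISM line and the wrapped lineage lines.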

1-3: [C] REFERENCE Block
The REFERENCE Block consists of the AUTHORS, TITLE and JOURNAL lines (Table 3).

    Table 3: Description Contents of REFERENCE Block

Line name   Description Contents
AUTHORS     Inventor name
            The inventor's full name is set on the PI line of the COMMENT Block.
TITLE       Invention title
            The same value is also set on the DEFINITION line.
JOURNAL     The first line gives the publication number and publication date after the fixed value "Patent:".
            The applicant name is given on the second line. In the FF, applicant information is set only on the JOURNAL line.

            [Example] JOURNAL Patent: JP 2010599999 100 29-SEP-2010;
                                            DNA Data Bank of Japan
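
The JOURNAL example above can be taken apart with a short Python sketch. The pattern is inferred from that single example only. Reading the field between the publication number and the date ("100") as the sequence number is an assumption, based on the "/100" in the DEFINITION example.

```python
import re
from datetime import date

# Month abbreviations as they appear in FF dates such as 29-SEP-2010.
_MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Assumed layout: JOURNAL Patent: <header> <number> <sequence no.> <DD-MON-YYYY>;
_JOURNAL_RE = re.compile(
    r"^JOURNAL\s+Patent:\s+(?P<header>[A-Z]{2})\s+(?P<number>\S+)\s+"
    r"(?P<seq>\d+)\s+(?P<day>\d{2})-(?P<mon>[A-Z]{3})-(?P<year>\d{4});$"
)


def parse_journal(first_line, second_line):
    """Return publication number, sequence number, date and applicant."""
    match = _JOURNAL_RE.match(first_line.strip())
    if match is None:
        raise ValueError(f"unexpected JOURNAL line: {first_line!r}")
    if match["mon"] not in _MONTHS:
        raise ValueError(f"unknown month: {match['mon']!r}")
    return {
        "publication_number": f"{match['header']} {match['number']}",
        "sequence_number": int(match["seq"]),
        "publication_date": date(int(match["year"]),
                                 _MONTHS[match["mon"]],
                                 int(match["day"])),
        # The applicant name is the whole second line.
        "applicant": second_line.strip(),
    }
```

Applied to the two example lines, this gives the publication number "JP 2010599999", sequence number 100, the date 2010-09-29 and the applicant "DNA Data Bank of Japan".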

1-4: [D] COMMENT Block
The COMMENT Block describes the patent application information (Table 2). Each item is labeled with a line header name (Table 4).

    Table 4: Description Contents of COMMENT Block

Line name   Description Contents
OS          Organism name in the JPO submission file.
            The SOURCE line, ORGANISM line, /organism and /db_xref are constructed from this organism name.
PN          Publication number and sequence number
PD          Publication date
PF          Application date, Application number
PR          Priority date, Priority application number
PI          Inventor name
CC          Comment
FH          Feature header (Fixed value: Key Location/Qualifiers)
FT          Feature information

*Some old JPO data have a PC line that describes the International Patent Classification (IPC) code.
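
To show how the line headers in Table 4 could be used, here is a minimal Python sketch that groups COMMENT Block lines by header. It assumes that each line starts with the two-letter header followed by a space and the value, and that the leading "COMMENT" keyword has already been removed; the exact layout is not described in this column, and the sample values in the usage note are hypothetical.

```python
from collections import defaultdict

# Line headers from Table 4, plus PC, which appears in some old JPO data.
KNOWN_HEADERS = ("OS", "PN", "PD", "PF", "PR", "PI", "CC", "FH", "FT", "PC")


def parse_comment_lines(lines):
    """Group COMMENT Block lines by their line header.

    Headers such as CC or FT may repeat, so every header maps to a list
    of values in the order in which they appear.
    """
    fields = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        header, _, value = line.partition(" ")
        if header not in KNOWN_HEADERS:
            raise ValueError(f"unknown line header: {header!r}")
        fields[header].append(value.strip())
    return dict(fields)
```

For example, the lines "OS   Homo sapiens" and "PN   JP 2010599999-A/100" produce {"OS": ["Homo sapiens"], "PN": ["JP 2010599999-A/100"]}.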

1-5: [E] Feature Block
JPO data has only the source feature, as does Korean Intellectual Property Office (KIPO) data (Table 5).
Nucleotide sequence data has the /mol_type, /db_xref and /organism qualifiers.
Amino acid sequence data does not have the /db_xref qualifier.

    Table 5: Description Contents of Feature Block

Qualifier   Description Contents
            (NA: Nucleotide sequence data. AA: Amino acid sequence data)
/mol_type   NA: unassigned DNA, unassigned RNA
            AA: Not set
/db_xref    Taxonomy ID of the NCBI Taxonomy Database, set after the fixed value "taxon:"
/organism   Scientific name based on the NCBI Taxonomy Database

(Organism information update based on NCBI Taxonomy Database)
DDBJ started adding the Taxonomy ID as /db_xref to JPO and KIPO data in May 2010. Once a year, DDBJ will rebuild the SOURCE line, ORGANISM line and /organism qualifier based on the Taxonomy ID.
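
The qualifier rules of the source feature can be summarized in a small Python sketch. The function name and its arguments are hypothetical, and the qualifiers are returned in the order in which they are listed in Table 5, which is not necessarily the order in the FF itself.

```python
def source_qualifiers(scientific_name, data_type, taxonomy_id=None, molecule="DNA"):
    """Return (qualifier, value) pairs for the source feature.

    data_type is "NA" (nucleotide) or "AA" (amino acid).
    """
    if data_type == "NA":
        if molecule not in ("DNA", "RNA"):
            raise ValueError("molecule must be 'DNA' or 'RNA'")
        if taxonomy_id is None:
            raise ValueError("nucleotide data needs a Taxonomy ID for /db_xref")
        return [
            ("/mol_type", f"unassigned {molecule}"),
            ("/db_xref", f"taxon:{taxonomy_id}"),  # fixed value "taxon:" + ID
            ("/organism", scientific_name),
        ]
    if data_type == "AA":
        # Amino acid data: /mol_type is not set and there is no /db_xref.
        return [("/organism", scientific_name)]
    raise ValueError("data_type must be 'NA' or 'AA'")
```

For Homo sapiens (Taxonomy ID 9606), nucleotide data gets all three qualifiers, while amino acid data gets only /organism.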

1-6: [F] Sequence Block
Nucleotide sequence data has a BASE COUNT line (Fig.1), which gives the numbers of adenine (a), cytosine (c), guanine (g) and thymine (t).
For amino acid sequence data, the BASE COUNT line is not output (Fig.3).

    Fig.3 Example of Sequence Block for amino acid data
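
The numbers on the BASE COUNT line are simple per-base totals. A minimal Python sketch of that count is shown below. Ignoring every character other than a, c, g and t (including position numbers, spaces and other base codes) is an assumption of this sketch.

```python
from collections import Counter


def base_count(sequence):
    """Count adenine, cytosine, guanine and thymine in a nucleotide sequence.

    The input is lowercased first; characters other than a, c, g, t
    (digits, spaces, other base codes) are not counted.
    """
    counts = Counter(sequence.lower())
    return {base: counts.get(base, 0) for base in "acgt"}
```

For the sequence "acgtacgtaa", base_count returns 4 for a and 2 each for c, g and t.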


(2) Conversion process of scientific name

2-1: Conversion of scientific name from JPO submission file
The original organism name in the JPO submission file is recorded on the OS line of the COMMENT Block. Based on the NCBI Taxonomy Database, this name is converted to a scientific name, which is set on the SOURCE line, ORGANISM line and /organism qualifier. Its lineage information is also constructed and set on the ORGANISM line.

2-2: Unidentified organism name
If the organism name given by the applicant is not found in the NCBI Taxonomy Database, it is converted to "unidentified" on the SOURCE line, ORGANISM line and /organism qualifier (Fig.4).
The original organism name given by the applicant remains on the OS line of the COMMENT Block (Fig.1, Table 4).

    Fig.4 Example of Unidentified organism name (extracted FF)
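
The conversion and its "unidentified" fallback can be sketched in Python as follows. The small TAXONOMY dictionary is a hypothetical stand-in for the NCBI Taxonomy Database, and case-insensitive matching of the OS value is an assumption; how DDBJ actually matches names is not described here. The Taxonomy ID used for "unidentified" entries is also not stated in this column, so the sketch leaves it empty.

```python
# Hypothetical stand-in for the NCBI Taxonomy Database, keyed by lowercased name.
TAXONOMY = {
    "homo sapiens": {"scientific_name": "Homo sapiens",
                     "common_name": "human", "taxonomy_id": 9606},
    "mus musculus": {"scientific_name": "Mus musculus",
                     "common_name": "house mouse", "taxonomy_id": 10090},
}


def convert_organism(os_value, taxonomy=TAXONOMY):
    """Convert the OS-line organism name to the values used in the FF.

    The original OS value is always kept. Names that are not found are
    converted to "unidentified".
    """
    record = taxonomy.get(os_value.strip().lower())
    if record is None:
        return {"os": os_value, "scientific_name": "unidentified",
                "common_name": None, "taxonomy_id": None}
    return {"os": os_value, **record}


def source_line_value(converted):
    """Return the SOURCE line value: scientific name plus optional common name."""
    name = converted["scientific_name"]
    if converted["common_name"]:
        return f"{name} ({converted['common_name']})"
    return name
```

An OS value of "homo sapiens" gives the SOURCE value "Homo sapiens (human)", while a name that is not in the dictionary gives "unidentified" and keeps the applicant's wording under "os".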


(3) Format of patent publication number and Description parts on FF

3-1: Format of patent publication number
DDBJ receives three kinds of patent publications: [1] publications of patent applications, [2] Patent Cooperation Treaty (PCT) international publications for patent applications and [3] Japanese translations of PCT international publications for patent applications (Table 6).
Publications of patent applications [1] and Japanese translations of PCT international publications [3] share the same publication number format, which begins with "JP". PCT international publications [2] have publication numbers that begin with "WO".

    Table 6: Format of Patent publication number


3-2: Description parts on FF
The publication number is set on the KEYWORDS line of the LOCUS Block, the JOURNAL line of the REFERENCE Block and the PN line of the COMMENT Block (Fig.5).

    Fig.5 Patent publication number on DDBJ FF

*The lines that contain the publication number are highlighted in yellow in the FF.

Author comments
This is the final Patent column. If you would like to know about the FF structure and the properties of JPO and KIPO patent data, please refer to my columns. If I have the chance, I will explain how to search patent data with DDBJ tools and discuss points for improvement in the JPO and KIPO FF.

This entry was posted in Mail Magazine.
