Last updated: 2016.3.22.

How to Make Annotation File

Example: annotation file
Entry   Feature    Location                           Qualifier    Value
COMMON  SUBMITTER                                     ab_name      Robertson,G.R.
                                                      ab_name      Mishima,H.
                                                      contact      Hanako Mishima
                                                      email        mishima@ddbj.nig.ac.jp
                                                      phone        81-55-981-6853
                                                      fax          81-55-981-6853
                                                      phext        3207
                                                      institute    National Institute of Genetics
                                                      department   DNA Data Bank of Japan
                                                      country      Japan
                                                      state        Shizuoka
                                                      city         Mishima
                                                      street       Yata 1111
                                                      zip          411-8540
        REFERENCE                                     title        Mouse Genome Sequencing
                                                      ab_name      Robertson,G.R.
                                                      ab_name      Mishima,H.
                                                      year         2012
                                                      status       Unpublished
        COMMENT                                       line         Please visit our website
                                                      line         URL: http://www.ddbj.nig.ac.jp/
CLN01   source     1..12297                           organism     Mus musculus
                                                      mol_type     genomic DNA
                                                      clone        PC0110
                                                      chromosome   8
        CDS        join(<1..456,609..879,1070..1213)  product      protein kinase
                                                      codon_start  2
CLN02   source     1..12393                           organism     Mus musculus
                                                      mol_type     genomic DNA
                                                      clone        PC0210
                                                      chromosome   8
        CDS        9365..9640                         product      hypothetical protein

Basic rules for making annotation file

  • An annotation file consists of five columns: Entry, Feature, Location, Qualifier, and Value.
  • Items shown in red letters in the example above are mandatory. Please be sure to enter them correctly.
  • Enter the entry name in the Entry column. Each entry name must correspond to a name in the sequence file, as described in How to Make Sequence File. Leave the Entry column blank on the following lines until the first line of the next entry.
  • There are two types of Features: Biological features and DDBJ original features. Detailed descriptions of the Features are given below.
  • Leave the Feature column blank on the following lines until the first line of the next feature.
  • A Location is entered in the Location column next to a Feature column that contains either a Biological feature or the PRIMARY_CONTIG feature.
  • In principle, a Qualifier is entered on every line. Whether a Qualifier is mandatory, optional, or not allowed depends on the Feature. Details are explained below.
  • The format of a Value depends on its Qualifier. Details are explained below.
  • The annotation file is judged to end at the first blank line. Therefore, when you enter multiple entries, do not leave any blank line before the end of the file.
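The row-grouping rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DDBJ's own parser: it assumes the five columns are separated by tab characters, and `parse_annotation` and `Feature` are hypothetical names. A filled Entry cell opens a new entry, a filled Feature cell opens a new feature, every Qualifier/Value pair is attached to the current feature, and reading stops at the first blank line.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """One Feature row plus the Qualifier/Value rows that follow it."""
    name: str
    location: str
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def parse_annotation(text: str) -> dict[str, list[Feature]]:
    """Group annotation rows by entry and feature (assumes tab-separated cells)."""
    entries: dict[str, list[Feature]] = {}
    entry: str | None = None
    feature: Feature | None = None
    for raw in text.splitlines():
        if not raw.strip():
            break  # the first blank line marks the end of the annotation file
        # Pad short rows so that there are always exactly five cells.
        entry_cell, feature_cell, location, qualifier, value = (raw.split("\t") + [""] * 5)[:5]
        if entry_cell:  # a filled Entry cell starts a new entry
            entry = entry_cell
            entries.setdefault(entry, [])
            feature = None
        if feature_cell:  # a filled Feature cell starts a new feature
            if entry is None:
                raise ValueError("Feature found before any Entry name")
            feature = Feature(feature_cell, location)
            entries[entry].append(feature)
        if qualifier:
            if feature is None:
                raise ValueError(f"Qualifier {qualifier!r} is not under any Feature")
            feature.qualifiers.append((qualifier, value))
    return entries
```

Under this logic, any entries that follow an accidental blank line are never read, which is why blank lines must be avoided before the end of the file.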

COMMON entry for the common information to all entries

  • In the annotation file, the entry name COMMON can be entered in the Entry column to describe information common to all entries.
  • The information described in the COMMON entry is reflected in all entries.
  • Usually, COMMON is used for SUBMITTER/REFERENCE/DATE/COMMENT, but it can also be used for Biological features when the Feature, Location, Qualifiers, and Values are all common to all entries.
    • See "Use of COMMON entry", in particular "Meta-description '@@[entry]@@' is available for clone, note, ff_definition".
  • DATE and hold_date, if used, must be described in the COMMON entry.
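The effect of the COMMON entry can be pictured with the hypothetical helper below. It is a simplified sketch that represents each entry only by its list of Feature names, and it assumes that COMMON features are placed before an entry's own features (the order this page describes for REFERENCE and COMMENT). It also rejects a SUBMITTER given both in COMMON and in another entry, as required in the SUBMITTER section.

```python
from __future__ import annotations


def apply_common(entries: dict[str, list[str]]) -> dict[str, list[str]]:
    """Prepend the COMMON entry's features to every other entry (hypothetical helper)."""
    common = entries.get("COMMON", [])
    merged: dict[str, list[str]] = {}
    for name, features in entries.items():
        if name == "COMMON":
            continue  # COMMON itself is not an entry in the released data
        if "SUBMITTER" in common and "SUBMITTER" in features:
            raise ValueError(f"{name}: SUBMITTER is already given in COMMON")
        merged[name] = common + features  # COMMON features come first
    return merged
```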

Supplementary documents

When making the annotation file, please also see the following supplementary documents:

 

SUBMITTER

Example: SUBMITTER in annotation file
Entry   Feature    Location  Qualifier   Value
COMMON  SUBMITTER            ab_name     Robertson,G.R.
                             ab_name     Mishima,H.
                             consrtm     Mouse Genome Consortium
                             contact     Hanako Mishima
                             email       mishima@ddbj.nig.ac.jp
                             url         http://www.ddbj.nig.ac.jp
                             phone       81-55-981-6853
                             fax         81-55-981-6853
                             phext       3207
                             institute   National Institute of Genetics
                             department  DNA Data Bank of Japan
                             country     Japan
                             state       Shizuoka
                             city        Mishima
                             street      Yata 1111
                             zip         411-8540
List of Qualifiers for SUBMITTER
Qualifier: legal characters for each Value (remarks); maximum number of letters
ab_name (abbreviation of author name): alphabets, .[period], ,[comma], -[hyphen], '[apostrophe]; max. 64
contact (contact person): alphabets, .[period], ,[comma], -[hyphen], '[apostrophe], [space] (enter first, middle, and last names in this order, delimited with a space); max. first 64, middle 128, last 64
consrtm (consortium): alphabets, digits, [space], -[hyphen], '[apostrophe], .[period], _[underscore], ,[comma], ( ) # & @ / ; : + *; max. 255
email: alphabets, digits, @, .[period], -[hyphen], _[underscore]; max. 64
url: all printable characters except [space]; max. 255
phone, fax, phext: digits, -[hyphen] (DO NOT enter + before the country code); max. 16
institute, department: all printable characters except [back-slash] and ` [back-quote]; max. 255
country, state: alphabets, digits, [space], -[hyphen], '[apostrophe], .[period], _[underscore], ,[comma], ( ) # & @ / ; : + *; max. 32
city: alphabets, digits, [space], -[hyphen], '[apostrophe], .[period], _[underscore], ,[comma], ( ) # & @ / ; : + *; max. 64
street: alphabets, digits, [space], -[hyphen], '[apostrophe], .[period], _[underscore], ,[comma], ( ) # & @ / ; : + *; max. 255
zip: alphabets, digits, -[hyphen]; max. 16
  • Mandatory items (shown in red letters): Feature: SUBMITTER; Qualifiers: ab_name, contact, email, phone, fax, institute, country, city, street, zip. Please be sure to enter them correctly. (If you enter Qualifier: consrtm, ab_name and contact are not mandatory.)
  • In principle, one SUBMITTER must be entered for each entry. However, COMMON can be used to describe a SUBMITTER that is common to all entries. When SUBMITTER is described in COMMON, SUBMITTER cannot be used for any other entry in the same annotation file.
  • Submitters are the persons who are responsible for the contents of the submitted data and have the right to update the data.
  • Qualifier: ab_name in SUBMITTER can be repeated for multiple submitters; the submitters are shown in the released file in the same order as in this annotation file.
  • Use Qualifier: contact to specify the contact person whom DDBJ will contact about the data.
  • In the Value of Qualifier: ab_name, enter the abbreviated author name in the REFERENCE author format.
    • Value format:
          last name[comma]initial of first name[period]initial of middle name[period]
    • Example:
      • Miyashita,Y.
      • Robertson,G.R.

    Some names (e.g. names containing a hyphen) may trigger a warning about format errors, but they can still be entered.

  • Qualifiers in SUBMITTER other than ab_name cannot be repeated; they describe only the contact person. If you would like to submit information for multiple institutes, please contact us before your submission.
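A rough pre-submission check for some SUBMITTER Values might look like this. It is a sketch, not DDBJ's validator: the regular expressions are assumptions derived from the table and the ab_name format above ("alphabets" is taken to mean ASCII letters), and `check_ab_name` and `check_phone` are hypothetical. As described above, a name that uses only legal characters but does not fit the last name,initials pattern (for example, a hyphenated last name) yields a warning rather than an error.

```python
from __future__ import annotations

import re

# Legal characters for ab_name per the table (assumption: "alphabets" = ASCII letters).
AB_NAME_CHARS = re.compile(r"[A-Za-z.,'\-]+")
# Expected shape: last name, a comma, then one or more initials, each followed by a period.
AB_NAME_FORMAT = re.compile(r"[A-Za-z']+,(?:[A-Za-z]\.)+")
# phone, fax, phext: digits and hyphens only, so a leading + is rejected.
PHONE_CHARS = re.compile(r"[0-9\-]+")


def check_ab_name(value: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for an ab_name Value; an unusual format is only a warning."""
    errors: list[str] = []
    warnings: list[str] = []
    if len(value) > 64:
        errors.append("longer than 64 letters")
    if not AB_NAME_CHARS.fullmatch(value):
        errors.append("contains an illegal character")
    elif not AB_NAME_FORMAT.fullmatch(value):
        warnings.append("does not match last name,initials format")
    return errors, warnings


def check_phone(value: str) -> bool:
    """True if value is a legal phone, fax, or phext Value (max. 16 letters)."""
    return len(value) <= 16 and PHONE_CHARS.fullmatch(value) is not None
```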

 

REFERENCE

Example: REFERENCE in annotation file
Entry   Feature    Location  Qualifier   Value
        REFERENCE            title       Sequence and analysis of mouse ch.8
                             ab_name     Robertson,G.R.
                             ab_name     Mishima,H.
                             status      Published
                             year        2003
                             journal     Nature
                             volume      8
                             start_page  15
                             end_page    20
List of Qualifiers for REFERENCE
Qualifier: legal characters for each Value (remarks); maximum number of letters
title: all printable characters except [back-slash] and ` [back-quote]; max. 255
ab_name (abbreviation of author name): alphabets, .[period], ,[comma], -[hyphen], '[apostrophe]; max. 64
consrtm (consortium): alphabets, digits, [space], -[hyphen], '[apostrophe], .[period], _[underscore], ,[comma], ( ) # & @ / ; : + *; max. 255
status: one of the following: Unpublished, In press, Published
year: digits (4-digit year A.D.); 4
journal: all printable characters except [back-slash] and ` [back-quote] (PubMed-style abbreviation); max. 128
volume, start_page, end_page: alphabets, digits, -[hyphen]; max. 8
  • Mandatory items (shown in red letters): ab_name (or consrtm), title, status, and year.
  • In the Value of Qualifier: ab_name, enter the abbreviated author name in the REFERENCE author format.
    • Value format:
          last name[comma]initial of first name[period]initial of middle name[period]
    • Example:
      • Miyashita,Y.
      • Robertson,G.R.

    Please disregard any warning about name format errors (e.g. for a name with a hyphen).

  • If the Value of status is "In press", Qualifier: journal is also mandatory.
  • If the Value of status is "Published", Qualifiers: journal, volume, start_page, and end_page are also mandatory.
  • At least one REFERENCE must be specified for each entry. However, COMMON can be used to describe the REFERENCE that is common to all entries.
  • If you are not preparing any publication, enter "Unpublished" in status.
  • If available, enter the ISO abbreviation of the journal name in journal.
  • If you enter two or more REFERENCE features, put first the REFERENCE directly related to your sequences, followed by the other(s) that would be helpful for understanding the data.
  • When you use REFERENCE features in both the COMMON entry and other entries, the REFERENCE feature(s) specified for each entry are loaded into DDBJ after the one(s) given in the COMMON entry.
  • When you cite two or more REFERENCE features for an entry, they are shown on the DDBJ flat file in the same order as in the annotation file.
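The conditional rules for REFERENCE (journal for "In press"; journal, volume, start_page, and end_page for "Published") can be expressed as a small lookup table. The function below is a hypothetical sketch that takes one REFERENCE feature's (Qualifier, Value) pairs and returns the names of missing mandatory Qualifiers; it does not check Value formats.

```python
from __future__ import annotations

ALWAYS_REQUIRED = ("title", "status", "year")
# Additional Qualifiers required by each legal status Value.
REQUIRED_BY_STATUS = {
    "Unpublished": (),
    "In press": ("journal",),
    "Published": ("journal", "volume", "start_page", "end_page"),
}


def missing_reference_qualifiers(qualifiers: list[tuple[str, str]]) -> set[str]:
    """Return the mandatory Qualifiers absent from one REFERENCE feature."""
    names = {name for name, _ in qualifiers}
    missing = {name for name in ALWAYS_REQUIRED if name not in names}
    if "ab_name" not in names and "consrtm" not in names:
        missing.add("ab_name (or consrtm)")
    status = next((value for name, value in qualifiers if name == "status"), None)
    if status is not None:
        if status not in REQUIRED_BY_STATUS:
            raise ValueError(f"illegal status {status!r}")
        missing |= {name for name in REQUIRED_BY_STATUS[status] if name not in names}
    return missing
```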

 

DATE

Example: DATE/hold_date in annotation file
Entry   Feature  Location  Qualifier  Value
COMMON  DATE               hold_date  20161125
  • If you want to keep your data confidential until a specific date, set that date as 8 digits in YYYYMMDD order (e.g. 20161125).
  • Delimiters (e.g. - [hyphen], / [slash]) are not allowed in the Value of hold_date.
  • If your data should be made public immediately, do not enter DATE.
  • DATE must be described in the COMMON entry. If the date is not common to all entries, please prepare separate annotation files, one for each date.
  • If you set a hold_date, your data will be released according to the "Hold-Until-Published" data release principle.
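A hold_date Value can be checked with the Python standard library as sketched below. The 8-digit YYYYMMDD reading is inferred from the example 20161125; `parse_hold_date` is a hypothetical name, and the check does not cover any other DDBJ rules on acceptable dates.

```python
import re
from datetime import date, datetime


def parse_hold_date(value: str) -> date:
    """Validate a hold_date Value: exactly 8 ASCII digits (YYYYMMDD), no delimiters."""
    if not re.fullmatch(r"[0-9]{8}", value):
        raise ValueError("hold_date must be 8 digits without delimiters, e.g. 20161125")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(value, "%Y%m%d").date()
```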

 

COMMENT/ST_COMMENT

Example: COMMENT and ST_COMMENT in annotation file
Entry   Feature     Location  Qualifier                 Value
        COMMENT               line                      This clone was obtained at our laboratory.
        COMMENT               line                      Please visit our web site.
                              line                      URL:http://www.ddbj.nig.ac.jp
        ST_COMMENT            tagset_id                 Genome-Assembly-Data
                              Finishing Goal            High-Quality Draft
                              Current Finishing Status  High-Quality Draft
                              Assembly Method           GS De Novo Assembler v. 2.0
                              Assembly Name             Mmus_1.0
                              Genome Coverage           50x
                              Sequencing Technology     454 GS FLX; ABI 3730

There are two kinds of COMMENTs, general COMMENT and structured COMMENT.

COMMENT (General COMMENT)

  • Use a general COMMENT to describe additional information about your data. The text is automatically wrapped onto a new line every 60 letters (including spaces). To start a new line at another point, begin a new Qualifier: line.
  • All printable characters except [back-slash] are allowed in the Value of Qualifier: line.
  • The COMMON entry can be used to describe a COMMENT that is common to all entries.
  • When you enter multiple COMMENT features, enter COMMENT in the Feature column separately for each one.
  • When an entry has both its own COMMENT features and COMMENT features described in the COMMON entry, the COMMENT feature(s) from the COMMON entry are loaded into DDBJ and shown on the DDBJ flat file first, followed by those specific to the entry.
  • When an entry has two or more COMMENT features, they are shown on the DDBJ flat file in DDBJ format, in the same order as in the annotation file.
  • For EST submissions, specific COMMENT descriptions are required.
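To show how COMMENT features map onto rows, the hypothetical helper below turns a list of comments into tab-separated annotation lines (the tab separator is an assumption). Each comment becomes one COMMENT feature, each explicit line break becomes a new Qualifier: line, and the Entry name appears only on the first row; automatic wrapping at 60 letters is left to DDBJ. It treats "printable" as ASCII 0x20 to 0x7E, which is also an assumption.

```python
from __future__ import annotations


def comment_rows(comments: list[list[str]], entry: str = "") -> list[str]:
    """Build tab-separated annotation rows for COMMENT features (hypothetical helper).

    Each inner list is one COMMENT feature; each string becomes one Qualifier: line.
    """
    rows: list[str] = []
    for comment in comments:
        for i, text in enumerate(comment):
            # Assumption: "printable" means ASCII 0x20-0x7E; back-slash is excluded.
            if "\\" in text or not all(" " <= ch <= "~" for ch in text):
                raise ValueError(f"illegal character in line Value: {text!r}")
            feature = "COMMENT" if i == 0 else ""  # Feature cell only on a COMMENT's first row
            rows.append("\t".join([entry, feature, "", "line", text]))
            entry = ""  # the Entry name appears on the entry's first row only
    return rows
```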

ST_COMMENT (Structured COMMENT)

  • ST_COMMENT is a feature to describe structured COMMENT.
  • Although ST_COMMENT can be defined by user communities, an ST_COMMENT in a predetermined format is required when submitting sequence data derived from a genome project (including WGS) or a transcriptome project (including TSA).
  • An ST_COMMENT is composed of a dataset name (tagset_id), item names (user-defined Qualifiers), and item values (Values).
  • On the first line of the ST_COMMENT feature, enter tagset_id as the Qualifier and the dataset name as its Value.
    • For a genome project, enter "Genome-Assembly-Data" as the Value of the tagset_id Qualifier.
    • For a transcriptome project, enter "Assembly-Data" as the Value of the tagset_id Qualifier.
  • On the following lines, enter each item name as a Qualifier and its value as the Value.
    • For Genome-Assembly-Data, use the following Qualifiers.
      List of Qualifiers for Genome-Assembly-Data (Qualifier: designation and content)
      Finishing Goal: finishing goal of the genome project. Use controlled vocabulary.
      Current Finishing Status: current finishing status of the genome project. Use controlled vocabulary.
      Assembly Method: name and version of the program used to assemble the sequences. Mandatory.
      Assembly Name: name that the submitter has given to this assembly of the genome. Mandatory for eukaryotes.
      Genome Coverage: approximate sequencing depth. Mandatory.
      Sequencing Technology: platform(s) used to generate the sequence. Mandatory.
    • For Assembly Name, we recommend the format [abbreviated species name or common name of the organism] + [version] (e.g. Btau_4.0).
    • For Finishing Goal and Current Finishing Status, use one of the following terms:
      "Standard Draft", "High-Quality Draft", "Improved High-Quality Draft", "Noncontiguous Finished", "Finished"
    • For Assembly-Data, use the following Qualifiers.
      List of Qualifiers for Assembly-Data (Qualifier: designation and content)
      Assembly Method: name and version of the program used to assemble the sequences. Mandatory.
      Assembly Name: name and version of the assembled sequences.
      Coverage: approximate sequencing depth.
      Sequencing Technology: platform(s) used to generate the sequence. Mandatory.
  • If you have any questions about describing ST_COMMENT, please contact us by email before your submission.
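The Genome-Assembly-Data rules above (mandatory items and the controlled vocabulary for the two finishing fields) can be checked with a sketch like the following. `check_genome_assembly` is hypothetical; it takes the ST_COMMENT as a {Qualifier: Value} dictionary and must be told whether the organism is a eukaryote, since Assembly Name is mandatory only in that case.

```python
from __future__ import annotations

FINISHING_TERMS = {
    "Standard Draft", "High-Quality Draft", "Improved High-Quality Draft",
    "Noncontiguous Finished", "Finished",
}
ALWAYS_MANDATORY = ("Assembly Method", "Genome Coverage", "Sequencing Technology")


def check_genome_assembly(items: dict[str, str], eukaryote: bool) -> list[str]:
    """List problems in a Genome-Assembly-Data ST_COMMENT given as {Qualifier: Value}."""
    problems: list[str] = []
    if items.get("tagset_id") != "Genome-Assembly-Data":
        problems.append("tagset_id must be Genome-Assembly-Data")
    required = ALWAYS_MANDATORY + (("Assembly Name",) if eukaryote else ())
    for name in required:
        if not items.get(name):
            problems.append(f"missing {name}")
    # Finishing fields are optional here, but if present they must use the controlled terms.
    for name in ("Finishing Goal", "Current Finishing Status"):
        if name in items and items[name] not in FINISHING_TERMS:
            problems.append(f"{name}: {items[name]!r} is not a controlled term")
    return problems
```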

 

Biological Feature

Example: source and CDS features in annotation file
Entry   Feature  Location                                  Qualifier    Value
        source   1..12297                                  organism     Mus musculus
                                                           mol_type     genomic DNA
                                                           chromosome   8
                                                           clone        PC0110
        CDS      join(<1..456,609..879,1070..1213)         product      protein kinase
                                                           codon_start  2
        rRNA     1279..3000                                product      18S rRNA
        CDS      complement(join(3213..4981,9901..11677))  gene         tbpA
                                                           product      TATA-box binding protein

Feature/Location/Qualifier

  • For detailed definitions and descriptions of Biological features, please read the Feature Table Definition.
  • In the Feature Table Definition, each Qualifier name is preceded by a / [slash]; do not use slashes in Qualifier names in the annotation file.
  • The Qualifiers organism and mol_type (shown in red letters) are mandatory. Each entry must have a source feature and at least one other feature. Please be sure to enter them correctly.
  • The rules for describing Location are given in Description of Location.
  • The Feature/Qualifier Usage Matrix shows which Qualifiers are legal for each Feature; some Features have mandatory Qualifier(s). Specify Features and Qualifiers exactly by the names given in the table: the names are strictly defined, including case (upper vs. lower case), the use of "_" [underscore], and so on.
  • See also Sample annotation file and Example of Submission.
  • Files containing CDS feature(s) should be checked with UME or transChecker.
  • When you describe CDS features, see also Protein Coding Sequence; CDS feature.
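Some of the entry-level rules above can be checked mechanically. The sketch below is hypothetical and covers only three of them: the source feature must exist and carry organism and mol_type, at least one other feature must be present, and Qualifier names must not keep the leading slash used in the Feature Table Definition. It assumes the input lists only one entry's Biological features; legality against the Feature/Qualifier Usage Matrix and CDS checks (UME, transChecker) are out of scope.

```python
from __future__ import annotations


def check_entry_features(features: list[tuple[str, list[tuple[str, str]]]]) -> list[str]:
    """Check one entry's Biological features, given as (Feature, [(Qualifier, Value), ...])."""
    problems: list[str] = []
    names = [name for name, _ in features]
    if "source" not in names:
        problems.append("missing source feature")
    if not any(name != "source" for name in names):
        problems.append("at least one feature other than source is required")
    for name, qualifiers in features:
        qualifier_names = {q for q, _ in qualifiers}
        if name == "source":
            for required in ("organism", "mol_type"):
                if required not in qualifier_names:
                    problems.append(f"source lacks {required}")
        for q in sorted(qualifier_names):  # sorted for a stable message order
            if q.startswith("/"):
                problems.append(f"{name}: write {q[1:]} without the leading slash")
    return problems
```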

Value

 

Use of COMMON entry

Meta-base position 'E' for the location description

Example: rRNA feature in COMMON entry
Entry   Feature  Location  Qualifier  Value
COMMON  rRNA     <1..>E    product    16S rRNA

Many submissions, such as phylogenetic studies with rRNA sequences, have Features whose Qualifiers and Values are common to all entries and whose Locations differ only because the sequence lengths differ.

In such cases, you can describe the common Feature in the COMMON entry by using the meta-base position 'E' in its Location in place of the number for the sequence end point.
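Conceptually, each entry gets its own copy of the COMMON feature with 'E' replaced by that entry's sequence length. A minimal sketch, assuming the lengths are already known from the sequence file; `expand_end` is a hypothetical name:

```python
import re


def expand_end(location: str, seq_length: int) -> str:
    """Replace the meta-base position E in a Location with the sequence length."""
    # \b restricts the match to a standalone uppercase E, as in "<1..>E" or "1..E".
    return re.sub(r"\bE\b", str(seq_length), location)
```

For example, expanding <1..>E for entries of 1523 and 1498 bases gives <1..>1523 and <1..>1498.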

Meta-description '@@[entry]@@' is available for clone, note, ff_definition

Example: source feature in COMMON entry
Entry   Feature  Location  Qualifier      Value
COMMON  source   1..E      organism       Homo sapiens
                           mol_type       genomic DNA
                           note           contig: @@[entry]@@
                           ff_definition  @@[organism]@@ DNA, contig: @@[entry]@@

Some submissions, such as EST, GSS, WGS, and WGS scaffold (CON division) submissions, have Feature information whose Qualifiers and Values are common to all entries except for the Locations and the clone or contig names.

In such cases, you can describe Feature: source in the COMMON entry, provided that you use the clone or contig names as entry names.

  • You can use the meta-base position 'E' in its Location instead of the number for the sequence end point.
  • For the Values of clone, note, and ff_definition, the meta-description @@[entry]@@ (the word entry enclosed by "@@[" and "]@@") can be used to quote entry names. It is replaced by each entry name taken from the sequence file.
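The substitution can be sketched as a plain string replacement restricted to the three Qualifiers named above. `expand_entry_meta` is hypothetical; it deliberately leaves other meta-descriptions, such as the @@[organism]@@ used in the example, untouched, because their handling is not described here.

```python
META_QUALIFIERS = {"clone", "note", "ff_definition"}


def expand_entry_meta(qualifier: str, value: str, entry_name: str) -> str:
    """Substitute @@[entry]@@ with the entry name for clone, note, and ff_definition."""
    if qualifier in META_QUALIFIERS:
        return value.replace("@@[entry]@@", entry_name)
    return value  # other Qualifiers are returned unchanged
```

With the example above, note contig: @@[entry]@@ in a hypothetical entry named contig001 would become contig: contig001.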