Researchers who use DDBJ are probably concerned mainly with genomic sequences, mRNA transcripts, and protein-coding genes. Some researchers who focus on molecular taxonomy may, of course, also be interested in rRNA sequences. Is it then sufficient for a sequence database to provide just four types of features: mRNA, CDS, rRNA, and source (the origin of the sequence)?

In fact, 62 feature keys are officially defined by DDBJ/EMBL/GenBank in the Feature Table Definition Document (FT-Doc). The FT-Doc is revised every April and October. Before the latest revision in October 2007, there were 65 feature keys. This does not mean that we, DDBJ/EMBL/GenBank, simply abolished three old feature keys: after very careful consideration, we removed five old feature keys and created two new ones. We not only collect and provide submitted sequence data, but also make every effort to keep up with the progress of research and development in biology.
To discuss revisions of the FT-Doc and other issues, we hold the International Collaborators Meeting every year (ref: The Report for the 20th International Collaborators Meeting).

One of the major topics at this year's meeting (2007) was "How to describe the diverse RNA transcripts?". In recent years, many new kinds of RNA transcripts (miRNA, siRNA, and so on) have been discovered and characterized. Before this revision of the FT-Doc, such new RNA genes were described with the old feature key misc_RNA, which means "other RNA transcript". That was really a stopgap for a rapidly developing research area. We decided it was time to reflect the fruits of that research in the definitions of the feature keys.

We, DDBJ/EMBL/GenBank, addressed the problem by adding two RNA feature keys, ncRNA and tmRNA. The number of non-protein-coding RNA families is expected to keep increasing, and such families will be classified under the ncRNA feature. In addition, three old feature keys, snRNA, snoRNA, and scRNA, are being merged into ncRNA by this month, December 2007. Some old data with misc_RNA features will be changed to ncRNA or tmRNA as appropriate.

We call this kind of modification, fitting legacy data to new rules, a "retrofit". Every year, we apply the new feature rules to the data schema by October and retrofit the data by December. The retrofit in December 2007 is symbolic of our struggle to keep up with the paradigm shift surrounding the RNA world.
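
As a rough illustration of what a retrofit of the merged keys might look like, here is a minimal Python sketch. It is not DDBJ's actual procedure: the dict-based feature record, the function name retrofit_feature, and the choice to carry the old key over as the value of an ncRNA_class qualifier are all assumptions made for this example. Features with misc_RNA are deliberately left alone, since deciding between ncRNA and tmRNA for them requires a judgment about each entry.

```python
# Hypothetical sketch of a retrofit step. The record layout (a dict with
# "key", "location", "qualifiers") is an assumption, not a DDBJ format.

# Old feature keys that the text says are merged into ncRNA.
MERGED_KEYS = frozenset({"snRNA", "snoRNA", "scRNA"})


def retrofit_feature(feature):
    """Return a new feature record rewritten under the new rules.

    Features whose key is not one of the merged keys are copied unchanged.
    """
    qualifiers = dict(feature.get("qualifiers", {}))
    key = feature["key"]
    if key in MERGED_KEYS:
        # Assumption: the old key is preserved as the ncRNA class value.
        qualifiers["ncRNA_class"] = key
        key = "ncRNA"
    return {"key": key, "location": feature["location"], "qualifiers": qualifiers}


if __name__ == "__main__":
    old = {"key": "snoRNA", "location": "10..95", "qualifiers": {"gene": "U3"}}
    print(retrofit_feature(old))
```

The function returns a new record instead of modifying its input, so the legacy record stays available for comparison after the retrofit.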

(The following is quoted from "The Report for the 20th International Collaborators Meeting")

  • New ncRNA feature
    A variety of new types of RNA transcripts, "miRNA", "siRNA", and so on, have been introduced in recent years. Because the number of non protein coding RNA families is quite likely to continue to expand, a new ncRNA feature that can flexibly accommodate them will be introduced.

    Furthermore, snRNA, snoRNA, and scRNA features are merged into ncRNA feature by December 2007.
  • New /ncRNA_class qualifier

    The new feature, ncRNA, will utilize a new qualifier called /ncRNA_class, with a controlled vocabulary to indicate what type of non-protein-coding feature is being represented.

    Format: /ncRNA_class="<ncRNA_class_TYPE>"
    Example: /ncRNA_class="miRNA"

    <ncRNA_class_TYPE> should be selected from the following list:
    "antisense_RNA"   "autocatalytically_spliced_intron"   "telomerase_RNA"
    "hammerhead_ribozyme"   "RNase_P_RNA"   "RNase_MRP_RNA"
    "guide_RNA"   "rasiRNA"   "scRNA"
    "siRNA"   "miRNA"   "piRNA"
    "snoRNA"   "snRNA"   "SRP_RNA"
    "vault_RNA"   "Y_RNA"   "other"

  • New tmRNA feature
    To support a class of RNA transcripts that have dual tRNA-like and mRNA-like behaviors, a new tmRNA feature will be legal. See tmRDB and tmRNA Website, which provide some background information about the tmRNAs.
  • New /tag_peptide qualifier
    To indicate the nucleotide region encoding the proteolysis tag
    peptide of tmRNA, a new qualifier, /tag_peptide, will be used for the
    tmRNA feature.

    Format: /tag_peptide=<base_range>
    Example: /tag_peptide=90..122
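
The two qualifier formats quoted above could be checked mechanically along the following lines. This is only a sketch, written in Python, and not part of the report or of any DDBJ tool: the function names are invented for this example, and it assumes that <base_range> is a simple start..end pair of 1-based positions, as in the example given (other location forms are not handled).

```python
import re

# Controlled vocabulary copied from the list quoted above.
NCRNA_CLASSES = frozenset({
    "antisense_RNA", "autocatalytically_spliced_intron", "telomerase_RNA",
    "hammerhead_ribozyme", "RNase_P_RNA", "RNase_MRP_RNA",
    "guide_RNA", "rasiRNA", "scRNA",
    "siRNA", "miRNA", "piRNA",
    "snoRNA", "snRNA", "SRP_RNA",
    "vault_RNA", "Y_RNA", "other",
})

_NCRNA_CLASS_RE = re.compile(r'^/ncRNA_class="([^"]*)"$')
# Assumption: <base_range> is a plain "start..end" pair.
_TAG_PEPTIDE_RE = re.compile(r'^/tag_peptide=(\d+)\.\.(\d+)$')


def parse_ncrna_class(text):
    """Return the class value of a /ncRNA_class qualifier, or raise ValueError."""
    match = _NCRNA_CLASS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a /ncRNA_class qualifier: {text!r}")
    value = match.group(1)
    if value not in NCRNA_CLASSES:
        raise ValueError(f"value not in the controlled vocabulary: {value!r}")
    return value


def parse_tag_peptide(text):
    """Return (start, end) of a /tag_peptide qualifier, or raise ValueError."""
    match = _TAG_PEPTIDE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a /tag_peptide qualifier: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid base range: {start}..{end}")
    return start, end


if __name__ == "__main__":
    print(parse_ncrna_class('/ncRNA_class="miRNA"'))
    start, end = parse_tag_peptide("/tag_peptide=90..122")
    print(start, end, end - start + 1)  # the range is inclusive at both ends
```

For the example range 90..122, the inclusive length is 122 - 90 + 1 = 33 bases, which corresponds to 11 codons.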