20th INSDC meeting report: May 21-23 2007, Hinxton, UK

2007

20th INSDC meeting report: May 21-23 2007, Hinxton, UK

International Nucleotide Sequence Database Collaboration (INSDC), the three data banks; DDBJ, EMBL-Bank/EBI, GenBank/NCBI hold the international meeting every year.
In 2007, the meeting was held at EMBL-Bank/EBI in UK, 21-23 May.

DDBJ, EMBL-Bank, GenBank reported each bank activities in the last year, discussed practical matters to maintain and to update INSDC.
The outcomes of the meeting are summarized below.

The Items; Discussed and To Be Studied

INSDC web site

The three banks agree with that we are to add some samples for standardized submissions as the contents of the INSDC web site

Alternative assemblies

With large amount of reads of draft sequences available in the public, scientists are asking if they can submit assemblies of the reads to INSDC. We need to develop a policy for who can submit alternative assemblies and what we would do with the data once it is submitted i.e. would we start a TPA -like database for alternative assemblies? Three banks would ask to the advisors meeting.

GSC and MIGS

The Genomic Standards Consortium (GSC) is to support the community-based development of a standard datasets of information about complete genomes and metagenomic ones. It is currently working together towards the ‘Minimal Information about a Genome Sequence(MIGS)’ specification. Overall, the three banks agreed that a cooperative approach to GSC activities was preferred over a competitive approach.

EST/GSS clone library ID

A registration system to assign unique IDs for both academic and commercial EST and GSS libraries will be studied.

Controlled vocabulary in KEYWORDS line

The three banks agree to use following three keywords in common.

Two terms to describe the direction and location of EST
“5’-end sequence (5’-EST)”
“3’-end sequence (3’-EST)”
A term to indicate an entry belonging to a full length insert cDNA project
“FLI_CDNA”

Changes to the Feature Table Document: Features and Qualifiers

The following items will be applied from October 2007 with the revision of Feature Table Definition, if not otherwise specified.

New ncRNA feature

A variety of new types of RNA transcripts, “miRNA”, “siRNA”, and so on, have been introduced in recent years. Because the number of non protein coding RNA families is quite likely to continue to expand, a new ncRNA feature that can flexibly accommodate them will be introduced.
Furthermore, snRNA, snoRNA, and scRNA features are merged into ncRNA feature by December 2007.
New /ncRNA_class qualifier

The new feature, ncRNA, will utilize a new qualifier called /ncRNA_class, with a controlled vocabulary to indicate what type of non-protein-coding feature is being represented

Format: /ncRNA_class=”<ncRNA_class_TYPE>”
Example: /ncRNA_class=”miRNA”
<ncRNA_class_TYPE> should be selected from the following list;
```
"antisense_RNA", "autocatalytically_spliced_intron", "telomerase_RNA", 
"hammerhead_ribozyme", "RNase_P_RNA", "RNase_MRP_RNA", "guide_RNA", 
"rasiRNA", "scRNA", "siRNA", "miRNA", "snoRNA", "snRNA", "SRP_RNA", 
"vault_RNA", "Y_RNA", "other" 
```
New tmRNA feature

To support a class of RNA transcripts that have dual tRNA-like and mRNA-like behaviors, a new tmRNA feature will belegal. See tmRDB that provide some background information about the tmRNAs.
New /tag_peptide qualifier

To indicate the nucleotide region encoding the proteolysis tag peptide of tmRNA, a new qualifier, /tag_peptide, will be used for the tmRNA feature.

Format: /tag_peptide=<base_range>
Example: /tag_peptide=90..122
“tmRNA” is added to the specified values for the /mol_type qualifier that indicates molecule type of the sequence in vivo on the source feature.

The value of /specimen_voucher qualifier will be become structured, consisting of an institution code, a collection code, and a specimen identifier, as well as the existing unstructured values.

Format:
  /specimen_voucher="[<institution_code>:[<collection_code>:]]<specimen_id>"
  There are three forms of specimen_voucher qualifiers; 
  <specimen_id>
  <institution_code>:<specimen_id>
  <institution_code>:<collection_code>:<specimen_id>

If the value of includes one or more colons, “:”, it is ‘structured’. Structured vouchers include institution_codes (and optional collection_codes) taken from a controlled vocabulary that denote the museum or herbarium collection where the specimen resides

Example:
     /specimen_voucher="UAM:Mamm:52179"
     /specimen_voucher="AMCC:101706"
     /specimen_voucher="USNM:field series 8798"
     /specimen_voucher="personal collection:Dan Janzen:99-SRNP-2003"
     /specimen_voucher="99-SRNP-2003"

The two new qualifiers, /culture_collection and /bio_material, will be legal for the source feature.

These qualifiers will utilize the same format as /specimen_voucher.
culture_collection; Institution code and identifier for the culture from which the nucleic acid sequenced was obtained.
```
Format: 
    /culture_collection="<institution_code>:[<collection_code>:]<culture_id>"
Example: 
    /culture_collection="ATCC:26370"
```
bio_material; Identifier for the biological material from which the nucleic acid sequenced was obtained
```
Format:  
    /bio_material="[<institution_code>:[<collection_code&'gt;:]]<material_id>"

Example: 
    /bio_material="CGC:CB3912" 
        CGC; Caenorhabditis Genetics Center
```
The feature, old_sequence will be illegal for new entries.
At DDBJ, the two features, repeat_unit and satellite, will be illegal for new entries

Both repeat_unit and satellite features will be merged into repeat_reigon feature.
The two features, 5’clip and 3’clip will be no longer used.
The /organism qualifier will be no longer used for misc_recomb.
The /operon qualifier will be legal on the protein_bind feature.
The term, “alignment”, will be legal for [TYPE] of the /inference qualifier