The three data banks of the International Nucleotide Sequence Database Collaboration (INSDC), namely DDBJ, EMBL-Bank/EBI, and GenBank/NCBI, hold an international collaborators meeting every year.
In 2007, the meeting was held at EMBL-Bank/EBI in the UK on 21-23 May.
DDBJ, EMBL-Bank, and GenBank each reported on their activities over the past year and discussed practical matters for maintaining and updating INSDC.
The outcomes of the meeting are summarized below.
The three banks agreed to add sample entries for standardized submissions to the contents of the INSDC web site.
With a large number of draft-sequence reads publicly available, scientists are asking whether they can submit assemblies of those reads to INSDC. We need to develop a policy on who can submit alternative assemblies and what we would do with the data once they are submitted; for example, would we start a TPA-like database for alternative assemblies? The three banks will put this question to the advisors' meeting.
The Genomic Standards Consortium (GSC) supports the community-based development of a standard set of information about complete genomes and metagenomes. The consortium is currently working towards the 'Minimum Information about a Genome Sequence (MIGS)' specification. Overall, the three banks agreed that a cooperative approach to GSC activities was preferred over a competitive approach.
A registration system to assign unique IDs for both academic and commercial EST and GSS libraries will be studied.
The three banks agreed to use the following three keywords in common.
Unless otherwise specified, the following items will take effect from October 2007 with the revision of the Feature Table Definition.
A variety of new types of RNA transcripts, such as "miRNA" and "siRNA", have been introduced in recent years. Because the number of non-protein-coding RNA families is quite likely to continue to expand, a new ncRNA feature that can flexibly accommodate them will be introduced.
Furthermore, the snRNA, snoRNA, and scRNA features will be merged into the ncRNA feature by December 2007.
The new feature, ncRNA, will utilize a new qualifier called /ncRNA_class, with a controlled vocabulary to indicate what type of non-protein-coding feature is being represented.
Format: /ncRNA_class="<ncRNA_class_TYPE>"
Example: /ncRNA_class="miRNA"
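The qualifier format above, together with the controlled vocabulary given for <ncRNA_class_TYPE> in this section, can be checked mechanically. The following is a minimal sketch in Python, not part of any INSDC tool; the function name parse_ncrna_class and the regular expression are assumptions made for illustration only.

```python
import re

# Controlled vocabulary as listed in this announcement for <ncRNA_class_TYPE>.
NCRNA_CLASSES = frozenset({
    "antisense_RNA", "autocatalytically_spliced_intron", "telomerase_RNA",
    "hammerhead_ribozyme", "RNase_P_RNA", "RNase_MRP_RNA", "guide_RNA",
    "rasiRNA", "scRNA", "siRNA", "miRNA", "snoRNA", "snRNA", "SRP_RNA",
    "vault_RNA", "Y_RNA", "other",
})

# Hypothetical pattern for a single qualifier line: /ncRNA_class="<value>"
_QUALIFIER_RE = re.compile(r'^/ncRNA_class="([^"]*)"$')


def parse_ncrna_class(line: str) -> str:
    """Return the class named in an /ncRNA_class qualifier line.

    Raises ValueError if the line is malformed or the class is not in
    the controlled vocabulary.
    """
    match = _QUALIFIER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an /ncRNA_class qualifier: {line!r}")
    value = match.group(1)
    if value not in NCRNA_CLASSES:
        raise ValueError(f"unknown ncRNA class: {value!r}")
    return value
```

For instance, parse_ncrna_class('/ncRNA_class="miRNA"') returns "miRNA", while a value outside the vocabulary raises ValueError.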
<ncRNA_class_TYPE> should be selected from the following list;
"antisense_RNA", "autocatalytically_spliced_intron", "telomerase_RNA", "hammerhead_ribozyme", "RNase_P_RNA", "RNase_MRP_RNA", "guide_RNA", "rasiRNA", "scRNA", "siRNA", "miRNA", "snoRNA", "snRNA", "SRP_RNA", "vault_RNA", "Y_RNA", "other"
To support a class of RNA transcripts that have dual tRNA-like and mRNA-like behaviors, a new tmRNA feature will be legal. See the tmRDB and the tmRNA Website, which provide some background information about tmRNAs.
To indicate the nucleotide region encoding the proteolysis tag peptide of tmRNA, a new qualifier, /tag_peptide, will be used for the tmRNA feature.
Format: /tag_peptide=<base_range>
Example: /tag_peptide=90..122
Format:
/specimen_voucher="[<institution_code>:[<collection_code>:]]<specimen_id>"
The specimen_voucher qualifier takes one of three forms:
<specimen_id>
<institution_code>:<specimen_id>
<institution_code>:<collection_code>:<specimen_id>
If the value of the qualifier includes one or more colons, ":", it is 'structured'. Structured vouchers include institution_codes (and optional collection_codes) taken from a controlled vocabulary that denote the museum or herbarium collection where the specimen resides.
Example:
/specimen_voucher="UAM:Mamm:52179"
/specimen_voucher="AMCC:101706"
/specimen_voucher="USNM:field series 8798"
/specimen_voucher="personal collection:Dan Janzen:99-SRNP-2003"
/specimen_voucher="99-SRNP-2003"
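The three forms can be told apart by counting colons. The sketch below is one possible reading in Python, not an official parser: it splits the value positionally, assumes that any colon after the second belongs to the specimen_id, and does not check the codes against the controlled vocabulary. The names Voucher and parse_specimen_voucher are hypothetical.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution_code: Optional[str]
    collection_code: Optional[str]
    specimen_id: str


def parse_specimen_voucher(value: str) -> Voucher:
    """Split the quoted value of /specimen_voucher into its parts.

    Assumption: at most two colons act as separators; anything after
    the second colon is kept as part of specimen_id.
    """
    parts = value.split(":", 2)
    if any(not part for part in parts):
        raise ValueError(f"empty component in voucher: {value!r}")
    if len(parts) == 1:
        # Unstructured voucher: <specimen_id>
        return Voucher(None, None, parts[0])
    if len(parts) == 2:
        # <institution_code>:<specimen_id>
        return Voucher(parts[0], None, parts[1])
    # <institution_code>:<collection_code>:<specimen_id>
    return Voucher(parts[0], parts[1], parts[2])
```

Applied to the examples above, "UAM:Mamm:52179" yields institution_code "UAM", collection_code "Mamm", and specimen_id "52179", whereas "99-SRNP-2003" yields only a specimen_id.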
The /culture_collection and /bio_material qualifiers will utilize the same format as /specimen_voucher.
culture_collection: institution code and identifier for the culture from which the sequenced nucleic acid was obtained.
Format:
/culture_collection="<institution_code>:[<collection_code>:]<culture_id>"
Example:
/culture_collection="ATCC:26370"
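Read literally, the format line above makes the institution code mandatory for /culture_collection, unlike /specimen_voucher. A minimal Python sketch of that check follows; the function name split_culture_collection is hypothetical, and the institution code is not validated against any registry.

```python
from typing import Optional, Tuple


def split_culture_collection(value: str) -> Tuple[str, Optional[str], str]:
    """Return (institution_code, collection_code, culture_id).

    Assumption: the institution code is required, so a value without
    any colon is rejected.
    """
    parts = value.split(":", 2)
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"institution code and culture id required: {value!r}")
    institution_code = parts[0]
    collection_code = parts[1] if len(parts) == 3 else None
    culture_id = parts[-1]
    return institution_code, collection_code, culture_id
```

For the example above, split_culture_collection("ATCC:26370") returns ("ATCC", None, "26370"); a bare identifier such as "26370" is rejected.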
bio_material: identifier for the biological material from which the sequenced nucleic acid was obtained.
Format:
/bio_material="[<institution_code>:[<collection_code>:]]<material_id>"
Example:
/bio_material="CGC:CB3912"
CGC; Caenorhabditis Genetics Center
Both the repeat_unit and satellite features will be merged into the repeat_region feature.