18th INSDC meeting report: May 16-18 2005, Mishima, Japan

2005

18th INSDC meeting report: May 16-18 2005, Mishima, Japan

To operate and implement the collaborative construction of the international nucleotide sequence database, the three data banks; DDBJ, EMBL-Bank/EBI, GenBank/NCBI hold the international meeting every year.
In 2005, the meeting was held at DDBJ in Mishima, 16-18 May.

DDBJ, EMBL-Bank, GenBank reported each databank activities in the last year, discussed practical matters to maintain and to update the nucleotide sequence database. The outcomes of the meeting are summarized below.

The Items; Discussed and To Be Studied

The international nucleotide sequence database constructed by the collaboration among DDBJ, EMBL-Bank and GenBank was agreed to be calledINSDC; International Nucleotide Sequence Database Collaboration.
INSDC has made public its web site, www.insdc.org/.

Since 2003, we have discussed the format of INSDSeq-XML.
DDBJ has made the trial data in INSDSeq-XML format available at its FTP site and its retrival tool, getentry, in advance of EMBL-Bank and GenBank.

Since 2004, we have accepted the submission of MGA data.
We reconsidered the rules for acceptance and the format for distribution.

Since 2002, we have accepted TPA submission.
In the past, some biological evidence was required for the TPA submission.
Now, we are planning to accept the inferred sequences by non-experimental evidence.
We will continue to discuss the guideline for acceptance and classification of TPA submissions.

Since 2003, the /locus_tag qualifier has been used as the identifier for the tracking purpose by many genome projects.
In the past, we allowed submitters to use the flexible prefixes for their locus_tag.
However, since we are afraid that it would cause some disruption in the future, we will manage and assign prefixes of locus_tag to keep uniquness through the whole database.
In association with it, we will improve our flat file format to include the project ID that can be utilized to specify the project (mainly for genome projects).

Relating to their strandness and partiality, rRNA sequences are not consistently annotated in the database.
It was agreed that we should check them,and also that the same preference for plus stranded annotation should be applied to other single feature.

Changes to the Feature Table Document: Features and Qualifiers

For features, especially CDS, the database users demand the information if the feature description is based on some biological experiment or only inference based on sequence similarity or so.
To make the evidence information available, evidence qualifier will be split into two new qualifiers, /experiment and /inference;

a) An argument of the feature based on some biological experiment

instead of /evidence=experimental
/experiment=”free text” (less than 1000 letters)

b) An argument of the feature not based on any biological experiment

instead of /evidence=not_experimental
/inference=”[TYPE]( same species):[evidence basis]”
The values of [TYPE] will be controlled by the list.

- note -

The old qualifier, /evidence=experimental or not_experimental, will be replaced by folowings, respectively;

/experiment=”experimental evidence, no additional details recorded”
/inference=”non-experimental evidence, no additional details recorded”

Recently, the number of the entries for the research of environmental sampling and divergence of the life (e.g. the BARCODE project) is significantly increased.
For these submission, it is important to describe the information on specification of the specimen from which the sequence is obtained.
So, five new qualifiers will become legal on the source feature;

/collection_date=”DD MMM YYYY”, “MMM YYYY” or “YYYY”
- DD ; two-digit for the date,
- MMM ; three letter for the month abbreviation
- YYYY; four-digit for the year
/lat_lon=”###.## [N or S], ###.## [E or W]”
/collected_by=”[Name of the person who collected the specimen.]””
/identified_by=”[Name of the person who identified the specimen.]”
/PCR_primers=”fwd_name:[name],fwd_seq:[sequence],rev_name:[name], rev_seq:[sequence]”

The /pseudo qualifier will be legal on intron and misc_RNA features.

The /rpt_unit qualifier will be split into two new qualifiers; /rpt_unit_range and /rpt_unit_seq will beintroduced.

Two new qualifiers, /ribosomal_slippage and /trans_splicing will be valid on the CDS feature.

“hydrogenosome” will be added to the list of legal values for the /organelle qualifier.

Changes to Other Items

The rules for the description of location will be changed;: Combinations of “join” and “order” operators in one location will be illegal.; The use of two identical location construction operators within one location will be illegal.
i.e. “100..100” will be illegal.; The usage of “^” will be restricted to adjacent nucleotides
i.e. “100^200” will be illegal.; The use of range (m.n) descriptor within location spans will be illegal.
i.e. “(5.10)..100” will be illegal.