DDBJ has provided XML-formatted data (DDBJ-XML*) through its anonymous FTP site and its sequence retrieval system, getentry, since 2001. In addition, since April 29 of this year, DDBJ has also been providing INSD-XML formatted data on a trial basis**. INSD-XML is a DTD (Document Type Definition) used in common by the International Nucleotide Sequence Databases (INSD), which comprise DDBJ, EMBL, and GenBank, for their data releases.
* DDBJ-XML was designed to produce documents that are easy for computer programs to process while remaining readable by humans. It is based on the "flat file format", which has been widely used to date. INSD-XML, by contrast, was designed around the Features/qualifiers structure rather than for similarity to the flat file format.
** Although INSD-XML has been released to the public, the International Nucleotide Sequence Database Collaboration (INSDC) databases will confirm within one year that all entries can be exchanged without problems and that XML documents can be written and read in a common format. The DTD itself may therefore change, and changes may also be made at the stage of generating XML documents from the DTD.
The anonymous FTP directory and file names published on the DDBJ web pages are shown below. At present, only the daily updates of new data are available from the anonymous FTP site. Periodic release data will become available with the next DDBJ release, release 62 (June 2005).
Use of INSD-XML at FTP
URL : ftp.ddbj.nig.ac.jp/database/ddbjnew/xml/insd/
File name : DDBJNEWr##.XXX.insd_xml.gz (e.g. DDBJNEWr61.062.insd_xml.gz)
(##: release number, XXX: serial number)
The release number and serial number are the same as those of the daily updated anonymous FTP data.
TPA and CON data are also included in the same file.
The DTD for INSD-XML is INSD_INSDSeq.dtd, which is located in the same directory.
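The naming scheme above can be handled programmatically. The following Python sketch is illustrative only: the helper names (build_file_name, parse_file_name) are hypothetical, the three-digit zero padding of the serial number is inferred from the single example file name, and the "ftp://" scheme prefix on the directory is an assumption.

```python
import re
from typing import Tuple

# Directory as listed above; the "ftp://" prefix is an assumption.
BASE_URL = "ftp://ftp.ddbj.nig.ac.jp/database/ddbjnew/xml/insd/"

# Pattern for DDBJNEWr##.XXX.insd_xml.gz (## release number, XXX serial number).
FILE_NAME_RE = re.compile(
    r"^DDBJNEWr(?P<release>\d+)\.(?P<serial>\d+)\.insd_xml\.gz$"
)


def build_file_name(release: int, serial: int) -> str:
    """Build a file name; zero padding to three digits is assumed from the example."""
    return f"DDBJNEWr{release}.{serial:03d}.insd_xml.gz"


def parse_file_name(name: str) -> Tuple[int, int]:
    """Return (release number, serial number) extracted from a file name."""
    match = FILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an INSD-XML daily update file name: {name!r}")
    return int(match.group("release")), int(match.group("serial"))


if __name__ == "__main__":
    name = build_file_name(61, 62)
    print(BASE_URL + name)
    print(parse_file_name(name))
```

Keeping the pattern in one place makes it easy to adjust if the naming convention changes during the trial period.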
Use of INSD-XML at getentry
If you select "INSD-XML" from the DNA databases box, you will obtain the data in INSD-XML format.
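Because INSD-XML is organized around the Features/qualifiers structure, a retrieved document can be walked with a standard XML parser. The Python sketch below is a rough illustration, not a reference implementation: the element names (INSDSet, INSDSeq, INSDSeq_locus, INSDFeature, INSDFeature_key, INSDQualifier, and so on) are assumptions that should be checked against INSD_INSDSeq.dtd, which may change during the trial, and the sample record is a made-up placeholder rather than real data.

```python
import xml.etree.ElementTree as ET

# Made-up placeholder record; element names are assumptions to be
# verified against INSD_INSDSeq.dtd.
SAMPLE = """<INSDSet>
  <INSDSeq>
    <INSDSeq_locus>EXAMPLE01</INSDSeq_locus>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>source</INSDFeature_key>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>organism</INSDQualifier_name>
            <INSDQualifier_value>Example organism</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
  </INSDSeq>
</INSDSet>"""


def list_features(xml_text):
    """Return (locus, feature key, {qualifier name: value}) for each feature."""
    root = ET.fromstring(xml_text)
    result = []
    for seq in root.iter("INSDSeq"):
        locus = seq.findtext("INSDSeq_locus")
        for feature in seq.iter("INSDFeature"):
            key = feature.findtext("INSDFeature_key")
            qualifiers = {
                q.findtext("INSDQualifier_name"): q.findtext("INSDQualifier_value")
                for q in feature.iter("INSDQualifier")
            }
            result.append((locus, key, qualifiers))
    return result


if __name__ == "__main__":
    for locus, key, qualifiers in list_features(SAMPLE):
        print(locus, key, qualifiers)
```

For the gzip-compressed files on the FTP site, the same function could be applied to text read through Python's gzip module, although very large files may call for incremental parsing instead.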