Release of a large number of transcript sequences in the new category MGA (Mass sequence for Genome Annotation)

The International Nucleotide Sequence Databases (INSD) discussed how to receive and release a huge number of transcript sequences, each of which is a short 5'-end transcript approximately 20 bp in length. They are produced by the CAGE (Cap Analysis Gene Expression) method and are thus called CAGE sequences. INSD then agreed that the sequences be submitted and released in a new category, MGA (Mass sequence for Genome Annotation), because they are of a new type and do not fit any of the existing divisions or categories.

Definition of MGA

MGA is defined as sequences that are produced in large quantities for the purpose of genome annotation.

The first set of MGA data was released on January 24, 2005. The MGA data for mouse were submitted by Dr. Yoshihide Hayashizaki and his colleagues at the Genomic Sciences Center, RIKEN. This first release contains 383,264 sequence entries. The MGA data can be downloaded from the FTP site:

URL:ftp://ftp.ddbj.nig.ac.jp/ddbj_database/mga/project_index.html

You can also reach the above FTP site from the "Anonymous FTP of the DDBJ" page on the DDBJ home page.
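
For readers who prefer to script the download, the following is a minimal sketch in Python of how the index page at the URL above might be retrieved over anonymous FTP. The function names split_ftp_url and fetch_anonymous are hypothetical helpers introduced here for illustration, and the sketch assumes that the server still accepts anonymous logins and still serves this path; neither is guaranteed.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URL quoted in the announcement; whether it is still served is an assumption.
MGA_INDEX_URL = "ftp://ftp.ddbj.nig.ac.jp/ddbj_database/mga/project_index.html"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory, filename). Hypothetical helper."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL")
    # Everything before the last "/" is the directory, the rest is the file name.
    directory, _, filename = parts.path.rpartition("/")
    return parts.hostname, directory or "/", filename


def fetch_anonymous(url, dest):
    """Download one file over anonymous FTP and write it to dest. Hypothetical helper."""
    host, directory, filename = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(directory)
        with open(dest, "wb") as out:
            ftp.retrbinary("RETR " + filename, out.write)
```

Calling fetch_anonymous(MGA_INDEX_URL, "project_index.html") would save the index page to the current directory, provided the assumptions above hold.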

Most of the present MGA sequences are associated with gene expression data that have already been released at CIBEX (http://cibex.nig.ac.jp/index.jsp), one of the international gene expression databases, operated at the Center for Information Biology and DNA Data Bank of Japan.