Last updated: 2017.9.29.


This line shows the accession number of the entry.

Conventional sequence data

Each of the three data banks issues a unique accession number to the data submitter. The accession number consists of 1 letter and 5 digits (e.g. A12345) or 2 letters and 6 digits (e.g. AB123456). The former style was used in the 1980s; the latter style was introduced later because of the rapid growth in data.
The letter part is called the "prefix". Please refer to the prefix list.
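
The two layouts described above can be checked with a short pattern. The following is only a minimal sketch in Python: the names CONVENTIONAL_ACCESSION, is_conventional_accession, and accession_prefix are hypothetical, and the sketch assumes upper-case letters. It checks the layout only and does not consult the prefix list.

```python
import re

# Assumed layouts, taken from the description above:
# 1 letter + 5 digits, or 2 letters + 6 digits.
CONVENTIONAL_ACCESSION = re.compile(r"[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6}")


def is_conventional_accession(text):
    """Return True if text has one of the two conventional layouts."""
    return CONVENTIONAL_ACCESSION.fullmatch(text) is not None


def accession_prefix(text):
    """Return the leading letters (the prefix) of a conventional accession number."""
    if not is_conventional_accession(text):
        raise ValueError(f"not a conventional accession number: {text!r}")
    # Strip the trailing digits; what remains is the letter part.
    return text.rstrip("0123456789")
```

For example, accession_prefix("AB123456") returns "AB", while a string such as "A123456" (1 letter and 6 digits) is rejected.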

If multiple entries are merged into one entry, or if an entry is extensively modified after submission, the responsible data bank may assign a new accession number to it. In these cases, the new accession number is called the primary accession number, and the old accession number(s) are called the secondary accession number(s). In the flat file, the primary accession number is listed first, followed by the secondary accession number(s). The updated entry can be retrieved with either the primary or a secondary accession number.

ACCESSION   AB999999 AB888888 AB777777
AB999999 -- primary accession number
AB888888 AB777777 -- secondary accession numbers
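
Reading the primary and secondary accession numbers from such a line could look like the sketch below. The function name parse_accession_line is hypothetical. The sketch assumes a conventional entry whose ACCESSION line fits on a single line, and it treats every whitespace-separated token after the keyword as one accession number.

```python
def parse_accession_line(line):
    """Split a flat-file ACCESSION line into (primary, secondaries)."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "ACCESSION":
        raise ValueError(f"not an ACCESSION line: {line!r}")
    # The first number is the primary; any others are secondary.
    return fields[1], fields[2:]


primary, secondaries = parse_accession_line("ACCESSION   AB999999 AB888888 AB777777")
print(primary)      # AB999999
print(secondaries)  # ['AB888888', 'AB777777']
```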

Bulk sequence data; WGS, TSA, and TLS data

The accession number assigned to each entry of WGS, TSA, and TLS data consists of 4 letters and 8 digits (9 or 10 digits if necessary).
The letter part is called the prefix.
See also For Large Scale Data (four prefix).

Example: ZZZZ01000001
  ZZZZ -- 4 letters -- Prefix to distinguish each project, project_id
    01 -- 2 digits -- Version number of the data set, set_version
000001 -- 6 digits -- ID of each individual sequence (may be 7 or 8 digits, depending on the number of entries)

The set_version goes up for every update of the dataset.

ACCESSION   ZZZZ01000001 ZZZZ01000000
ZZZZ01000001 -- primary accession number
ZZZZ01000000 -- set ID
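
The layout of a bulk accession number can be taken apart as in the following sketch. The names BULK_ACCESSION and split_bulk_accession are hypothetical. Treating an all-zero sequence part as the set ID is an assumption inferred from the example above, not a stated rule.

```python
import re

# Assumed layout, from the description above: 4-letter prefix,
# 2-digit set_version, then 6 to 8 digits identifying the sequence.
BULK_ACCESSION = re.compile(r"([A-Z]{4})([0-9]{2})([0-9]{6,8})")


def split_bulk_accession(accession):
    """Split a WGS/TSA/TLS accession number into its parts."""
    match = BULK_ACCESSION.fullmatch(accession)
    if match is None:
        raise ValueError(f"not a bulk accession number: {accession!r}")
    project_id, set_version, sequence = match.groups()
    return {
        "project_id": project_id,
        "set_version": int(set_version),
        "sequence_id": int(sequence),
        # Assumption: a sequence part of all zeros marks the set ID.
        "is_set_id": int(sequence) == 0,
    }
```

With this sketch, split_bulk_accession("ZZZZ01000001") yields project_id "ZZZZ", set_version 1, and sequence_id 1, while "ZZZZ01000000" is flagged as a set ID.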

For MGA data

This (ACCESSION) line shows the number assigned by INSDC to a resource.
The number consists of 5 letters and 7 digits (e.g. ABCDE0000001).
The accession number assigned to each entry of a resource unit is displayed in the MGA lines.

Example: ABCDE0000001
     AB -- first two characters      -- identifier of each project.
    CDE -- third to fifth characters -- identifier of each resource in a project.
0000001 -- 7-digit number            -- number of each sequence entry in a resource.
    *1 Information about each project ID is available at the project_index page.
    *2 "Resource" here means a unit of identical origin, such as a tissue or cells, from which sequences are obtained.

ZZZZZ0000000 -- number assigned to a resource unit
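
The three parts of an MGA number can be separated in the same way. This is a minimal sketch; the names MGA_ACCESSION and split_mga_accession are hypothetical, and the pattern assumes upper-case letters.

```python
import re

# Assumed layout, from the description above: 2 letters for the project,
# 3 letters for the resource, and a 7-digit entry number.
MGA_ACCESSION = re.compile(r"([A-Z]{2})([A-Z]{3})([0-9]{7})")


def split_mga_accession(accession):
    """Return (project, resource, entry_number) for an MGA number."""
    match = MGA_ACCESSION.fullmatch(accession)
    if match is None:
        raise ValueError(f"not an MGA accession number: {accession!r}")
    project, resource, number = match.groups()
    return project, resource, int(number)
```

For example, split_mga_accession("ABCDE0000001") returns ("AB", "CDE", 1).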