In the BASE COUNT line of the DDBJ flat file, nine digits are allocated to each of the counts of a (adenine), c (cytosine), g (guanine) and t (thymine). For RNA sequences, uracil is written as "t", following the rules of the International Nucleotide Sequence Database Collaboration. The nucleotide symbols are defined in Nucleotide Base Codes.

Following the relaxation of the sequence length limit, GenBank dropped the BASE COUNT line from its flat file format as of GenBank Release 138 (Oct. 2003). DDBJ has decided to keep the BASE COUNT line in its flat file format, on the view that GC content is still important information for characterizing a sequence.
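As a rough illustration of the counting rule described above, the following Python sketch tallies a, c, g and t in a sequence, treats uracil as "t", and right-aligns each count in a nine-character field. The function name `base_count_line` and the exact spacing between fields are assumptions made for this example; they are not taken from the DDBJ specification.

```python
from collections import Counter


def base_count_line(sequence: str) -> str:
    """Build a BASE COUNT-style line from a nucleotide sequence (illustrative only)."""
    # Lower-case the input and write uracil as "t", as described for RNA sequences.
    normalized = sequence.lower().replace("u", "t")
    counts = Counter(normalized)
    # Each count is right-aligned in a 9-character field.
    # The separators between fields are an assumption, not the official layout.
    fields = " ".join(f"{counts[base]:>9d} {base}" for base in "acgt")
    return f"BASE COUNT  {fields}"


if __name__ == "__main__":
    print(base_count_line("AACGUU"))
```

Symbols other than a, c, g and t are simply ignored by this sketch.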
Example: BASE COUNT 102 a 119 c 131 g 98 t
- 102 a: there are 102 adenine nucleotides in the sequence data.
- 119 c: there are 119 cytosine nucleotides in the sequence data.
- 131 g: there are 131 guanine nucleotides in the sequence data.
- 98 t: there are 98 thymine nucleotides in the sequence data.
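Since the BASE COUNT line is retained mainly so that GC content can be read off easily, a minimal Python sketch of that calculation may be useful. It parses the example line above and computes the fraction of g and c among the four counted bases. The helper names `parse_base_count` and `gc_content` are hypothetical, and the parser assumes whitespace-separated "count symbol" pairs as in the example rather than fixed column positions.

```python
def parse_base_count(line: str) -> dict[str, int]:
    """Parse a BASE COUNT line into a {symbol: count} mapping (illustrative only)."""
    tokens = line.split()
    if tokens[:2] != ["BASE", "COUNT"]:
        raise ValueError("not a BASE COUNT line")
    # The remaining tokens alternate: count, symbol, count, symbol, ...
    values = tokens[2:]
    return {symbol: int(count) for count, symbol in zip(values[0::2], values[1::2])}


def gc_content(counts: dict[str, int]) -> float:
    """Return the fraction of g and c among all counted bases."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts.get("g", 0) + counts.get("c", 0)) / total


if __name__ == "__main__":
    counts = parse_base_count("BASE COUNT 102 a 119 c 131 g 98 t")
    print(counts)
    print(f"GC content: {gc_content(counts):.1%}")
```

For the example line, the counts sum to 450 bases, of which 250 are g or c, giving a GC content of about 55.6%.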
Change in "BASE COUNT line" of the DDBJ flat file (Dec. 3rd, 2003)