Last updated: 2016.3.22.


The COMMENT line holds information about an entry that cannot be described using FEATURES or the other fields. For instance, if the submitter has an affiliation other than the one given in REFERENCE 1, it can be described on the COMMENT line.

COMMENT     Human cDNA sequencing project.
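
As a minimal sketch of the layout shown above, the following Python function wraps free text into COMMENT lines. The 12-character keyword column is taken from the examples on this page; the 79-character line limit and the function name format_comment are assumptions made for illustration, not part of any official tool.

```python
import textwrap

INDENT = 12  # keyword column width, as seen in the examples on this page
WIDTH = 79   # assumed maximum line length; adjust to the actual flatfile rules


def format_comment(text, width=WIDTH):
    """Wrap free text into COMMENT lines (hypothetical helper)."""
    # Collapse runs of whitespace, then wrap to the space left after the indent.
    chunks = textwrap.wrap(" ".join(text.split()), width=width - INDENT)
    lines = []
    for i, chunk in enumerate(chunks):
        # Only the first line carries the keyword; the rest are indented.
        prefix = "COMMENT".ljust(INDENT) if i == 0 else " " * INDENT
        lines.append(prefix + chunk)
    return "\n".join(lines)


print(format_comment("Human cDNA sequencing project."))
```

Longer text is continued on following lines that start with 12 spaces, as in the multi-line COMMENT examples further down this page.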


Structured COMMENT

A structured COMMENT is a format for describing and sharing datasets that are not defined as features or qualifiers.
Using structured COMMENTs, such datasets can be shared among submitters and users through INSDC flatfiles.
In a structured COMMENT, the dataset is described on the COMMENT line as structured pairs of [names of items] and [values of items].
Some predetermined structured COMMENT formats are required when submitting certain kinds of sequence data, such as data derived from genome projects (including WGS) and transcriptome projects (including TSA).

COMMENT     ##Genome-Assembly-Data-START##
            Finishing Goal           :: Finished
            Current Finishing Status :: High Quality Draft
            Assembly Method          :: Newbler v. 2.3
            Genome Coverage          :: 30x
            Sequencing Technology    :: 454 GS Junior; Illumina GA II
            ##Genome-Assembly-Data-END##

##Genome-Assembly-Data-START## -- The first line of the structured COMMENT defined as "Genome-Assembly-Data".
##Genome-Assembly-Data-END## -- The last line of the structured COMMENT defined as "Genome-Assembly-Data".

The above example shows the additional information, "Genome-Assembly-Data", that is required for genome projects.
Between the START and END lines, each line gives an item name and its value, separated by " :: ".
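
The layout described above (a START line, item lines delimited by " :: ", and an END line) can be read with a few lines of Python. This is a sketch only: the function name parse_structured_comment and the regular expression are assumptions for illustration, and the sample text is the "Genome-Assembly-Data" example with its END line written out.

```python
import re

# Matches marker lines such as ##Genome-Assembly-Data-START##
MARKER = re.compile(r"^##(?P<name>.+)-(?P<kind>START|END)##$")


def parse_structured_comment(lines):
    """Return (block_name, {item: value}) for the first structured block."""
    name, items = None, {}
    for raw in lines:
        # Drop the COMMENT keyword if present, then surrounding whitespace.
        text = raw[7:] if raw.startswith("COMMENT") else raw
        text = text.strip()
        marker = MARKER.match(text)
        if marker:
            if marker.group("kind") == "START":
                name = marker.group("name")
                continue
            break  # the END marker closes the block
        if name is not None and "::" in text:
            # Split on the first delimiter only, so values may contain "::".
            key, value = text.split("::", 1)
            items[key.strip()] = value.strip()
    return name, items


sample = """\
COMMENT     ##Genome-Assembly-Data-START##
            Finishing Goal           :: Finished
            Current Finishing Status :: High Quality Draft
            Assembly Method          :: Newbler v. 2.3
            Genome Coverage          :: 30x
            Sequencing Technology    :: 454 GS Junior; Illumina GA II
            ##Genome-Assembly-Data-END##
""".splitlines()

name, items = parse_structured_comment(sample)
print(name)
for key, value in items.items():
    print(f"{key} = {value}")
```

Splitting on the first "::" and stripping whitespace means the column alignment of the item names does not affect the result.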

Finishing Goal           :: Finished -- The final goal of the genome project is "Finished" level.
Current Finishing Status :: High Quality Draft -- The current status of the genome project is "High Quality Draft" level.
Assembly Method          :: Newbler v. 2.3 -- The software used to assemble the sequence reads is Newbler, version 2.3.
Genome Coverage          :: 30x -- The sequencing depth of the genome sequences is approximately 30 fold.
Sequencing Technology    :: 454 GS Junior; Illumina GA II -- The platforms (sequencers) used to determine the genome sequences are "454 GS Junior" and "Illumina GA II".
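
The values explained above are plain strings, so a consumer that wants a numeric coverage or a list of platforms has to interpret them. The sketch below assumes the conventions visible in this one example (an "x" suffix for coverage, ";" between platforms); real entries may use other forms, and the helper names are hypothetical.

```python
import re
from typing import List


def parse_coverage(value: str) -> float:
    """Turn a coverage string such as '30x' into a number."""
    # Assumed form: digits, optional decimal part, then 'x' or 'X'.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*[xX]\s*", value)
    if match is None:
        raise ValueError(f"unrecognized coverage value: {value!r}")
    return float(match.group(1))


def split_technologies(value: str) -> List[str]:
    """Split a ';'-separated platform list and trim each name."""
    return [part.strip() for part in value.split(";") if part.strip()]


print(parse_coverage("30x"))
print(split_technologies("454 GS Junior; Illumina GA II"))
```

Raising an error on an unrecognized coverage string, rather than guessing, keeps unexpected formats visible to the caller.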


For MGA data

For MGA submissions, the COMMENT describes the process used to obtain the submitted sequence data (e.g., the methods for preparing sequences from tissues or cells and for processing the sequences for submission).

COMMENT     The CAGE (cap analysis gene expression) is based on preparation
            and sequencing of concatamers of DNA tags deriving from the
            initial 20/21 nucleotides from 5' end mRNAs.
            Full-length cDNAs were at first selected with the Cap-Trapper
            method. Then, a specific linker (Linker1, some linker contain 5 bp
            sequences that have 15 variations for each rna sample) containing
            the ClassIIs restriction enzyme site MmeI was then ligated to the
            single-strand cDNA and then the second strand of cDNA synthesized.
            The resulting double-stranded cDNA was cleaved by the restriction
            enzyme MmeI and a second linker (Linker2) was ligated to the 2 bp
            overhang at the MmeI cleaved site, to produce a 5' 20/21 tag
            having two linkers at both sides. The ligation products were
            separated from unmodified DNA with magnetic beads. The 5' end cDNA
            tags were released from the beads, and the DNA fragments were
            amplified in a PCR step by using the two linker-specific primers
            (Primer1 (uni-PCR), Primer2 (MmeI-PCR)). The desired 32-37 bp tags
            were purified and ligated to form concatamers, and then the
            concatamer were fractionated and ligated to the plasmid ZErO-2.
            The ligations were finally electroporated into DH10b cells
            (Invitrogen) and obtained plasmids were sequenced with forward
            CAGE libraries were sequenced with forward primers essentially as
            described with minor modifications to use zeocin for selection of
            recombinants. We used in-house developed algorithms for the
            extraction of tags and for masking the vectors. CAGE tags were
            extracted with the following parameters: vector masking, minimum
            12 bp recognition allowed; linker (13 bp) masking: maximum
            mismatch, 2 bp allowed; XmaJI site maximum mismatch, 2 bp allowed;
            tag length, 17-24 bp.
            Linker1: "Upper oligonucleotide GN6":
            biotin-agagagagacctcgagtaactataacggtcctaaggtagcgacctagg (5 bp)
            tccgacGNNNNN and "Upper oligonucleotide N6":