The DRA generates fastq files from the raw-data SRA files by using fastq-dump in the NCBI SRA Toolkit with the following options.

fastq-dump -M 25 -E --skip-technical --split-3 -W <SRA file>

  • -M 25: Minimum read length to output is 25 (default is 25)
  • -E: No sequences starting or ending with >= 10N
  • --skip-technical: Dump only biological reads
  • --split-3: Legacy 3-file splitting for mate-pairs: first and second biological reads satisfying dumping conditions are placed in files *_1.fastq and *_2.fastq, respectively. If only one biological read is present, it is placed in *.fastq.
  • -W: Apply left and right clips
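
Because --split-3 can produce up to three files per run, it can help to check which of them are actually present after a dump. The following is a minimal sketch, not part of the DRA pipeline; the function name list_split3_outputs and its prefix argument are hypothetical, and it assumes the output files are named after a common prefix such as the run accession.

```shell
# List the fastq files that --split-3 produced for one run.
# "prefix" is the output name without extension (for example a run accession);
# it is an illustrative parameter, not something fastq-dump defines.
list_split3_outputs() {
    prefix="$1"
    for f in "${prefix}_1.fastq" "${prefix}_2.fastq" "${prefix}.fastq"; do
        # Print only the files that actually exist
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_split3_outputs with the prefix of a paired run would be expected to print the *_1.fastq and *_2.fastq files, plus *.fastq if some spots had only one biological read that passed the dumping conditions.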

Because reads are filtered and trimmed according to the dumping conditions above, the number of reads in the fastq files is generally smaller than that in the SRA file. Users can generate unfiltered and untrimmed fastq files by using the following fastq-dump options.

fastq-dump -M 1 --split-3 <SRA file>
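
To see how many reads the filtering removed, the read counts of the filtered and unfiltered fastq files can be compared. The sketch below is one simple way to do this; the function name count_fastq_reads is hypothetical, and it assumes each record occupies exactly four lines (header, sequence, "+" line, quality), with no wrapped sequence lines.

```shell
# Count reads in a fastq file, assuming four lines per record.
# count_fastq_reads is an illustrative helper, not part of the SRA Toolkit.
count_fastq_reads() {
    lines=$(wc -l < "$1")
    # Integer division: every 4 lines make one read
    echo $(( lines / 4 ))
}

# Example usage (file names are placeholders):
#   count_fastq_reads filtered/RUN_1.fastq
#   count_fastq_reads unfiltered/RUN_1.fastq
```

If the two dumps are written to separate directories, running the function on the same file name in each gives the number of reads dropped by the -M 25, -E and -W conditions.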