Sequence Read Archive
Example of metadata
The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). When the sequencing data to be submitted contain technical reads, submitters need to create Experiment XML files and describe the technical reads in <SPOT_DESCRIPTOR>.
Experiment (Spot, Platform)
454 single reads
Read
Read composition
Read Index : | 0 | 1 |
Read : | TCAG | ATAGAGTTGATCCTGGCTCAT…………… |
Base Coordinate : | 1 | 5 80 |
Read Type : | Adapter | Forward |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Technical Read | Adapter | BaseCoord = 1 |
1 | Application Read | Forward | BaseCoord = 5 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Technical Read</READ_CLASS>
<READ_TYPE>Adapter</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>1</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>5</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model |
---|---|
LS454 | 454 GS FLX Titanium |
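The read-spec table and the SPOT_DESCRIPTOR XML in this example map onto each other row by row, so the XML can be generated from the table. The following is a minimal sketch using only the Python standard library. The function name `build_spot_descriptor` and the dictionary keys are hypothetical, and the sketch only covers reads ordered by BASE_COORD.

```python
import xml.etree.ElementTree as ET


def build_spot_descriptor(read_specs, spot_length=None):
    """Build a SPOT_DESCRIPTOR element from a list of read-spec dicts.

    Hypothetical helper: each dict carries read_class, read_type and
    base_coord, mirroring the columns of the read-spec table.
    """
    descriptor = ET.Element("SPOT_DESCRIPTOR")
    decode_spec = ET.SubElement(descriptor, "SPOT_DECODE_SPEC")
    if spot_length is not None:
        # Fixed-length platforms in the examples also state SPOT_LENGTH.
        ET.SubElement(decode_spec, "SPOT_LENGTH").text = str(spot_length)
    for index, spec in enumerate(read_specs):
        read_spec = ET.SubElement(decode_spec, "READ_SPEC")
        ET.SubElement(read_spec, "READ_INDEX").text = str(index)
        ET.SubElement(read_spec, "READ_CLASS").text = spec["read_class"]
        ET.SubElement(read_spec, "READ_TYPE").text = spec["read_type"]
        ET.SubElement(read_spec, "BASE_COORD").text = str(spec["base_coord"])
    return descriptor


# The 454 single-read layout: a 4-base adapter followed by the forward read.
reads_454_single = [
    {"read_class": "Technical Read", "read_type": "Adapter", "base_coord": 1},
    {"read_class": "Application Read", "read_type": "Forward", "base_coord": 5},
]
spot_xml = ET.tostring(build_spot_descriptor(reads_454_single), encoding="unicode")
print(spot_xml)
```

The output is a single unindented line; it contains the same elements in the same order as the SPOT_DESCRIPTOR shown for this example.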
454 paired reads
Read
Read composition
Read Index : | 0 | 1 | 2 | 3 |
Read : | TCAG | ATAGAGT……………CCTGG | TCGTAT……………TATTACG | CTCAT…………… |
Base Coordinate : | 1 | 5 | ||
Read Type : | Adapter | Forward | Linker | Forward |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Technical Read | Adapter | BaseCoord = 1 |
1 | Application Read | Forward | BaseCoord = 5 |
2 | Technical Read | Linker | ExpectedBasecallTable |
3 | Application Read | Forward | RelativeOrder |
Expected Basecall Table
Base Call | Min Match | Max Mismatch | Match Edge |
---|---|---|---|
TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG | 38 | 5 | full |
CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA | 38 | 5 | full |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Technical Read</READ_CLASS>
<READ_TYPE>Adapter</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>1</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>5</BASE_COORD>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>2</READ_INDEX>
<READ_CLASS>Technical Read</READ_CLASS>
<READ_TYPE>Linker</READ_TYPE>
<EXPECTED_BASECALL_TABLE>
<BASECALL min_match="38" max_mismatch="5" match_edge="full">TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG</BASECALL>
<BASECALL min_match="38" max_mismatch="5" match_edge="full">CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA</BASECALL>
</EXPECTED_BASECALL_TABLE>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>3</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<RELATIVE_ORDER follows_read_index="2"/>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model |
---|---|
LS454 | 454 GS FLX Titanium |
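In the paired layout the linker has no fixed base coordinate, so its position has to be found by matching the spot against the expected basecall table. The sketch below illustrates that idea with a plain mismatch count over full-length windows. It is one reading of `min_match`, `max_mismatch` and `match_edge="full"`, not the archive's actual matching algorithm, and the names `find_linker`, `reverse_complement` and the example spot are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_linker(spot, linker, min_match, max_mismatch):
    """Return the 0-based offset of the best full-length match, or None.

    Assumption of this sketch: every window of len(linker) bases is
    compared position by position, and the best window is accepted only
    if it satisfies both thresholds.
    """
    best_offset, best_mismatches = None, None
    for offset in range(len(spot) - len(linker) + 1):
        window = spot[offset:offset + len(linker)]
        mismatches = sum(a != b for a, b in zip(window, linker))
        if best_mismatches is None or mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    if best_mismatches is None:
        return None  # spot is shorter than the linker
    matches = len(linker) - best_mismatches
    if matches >= min_match and best_mismatches <= max_mismatch:
        return best_offset
    return None


# The two BASECALL entries from the expected basecall table.
LINKER = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
LINKER_RC = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

# Hypothetical spot: adapter, forward read, linker, second forward read.
spot = "TCAG" + "ATAGAGTTGATCCTGG" + LINKER + "CTCATCAGTTGAT"

offset = find_linker(spot, LINKER, min_match=38, max_mismatch=5)
linker_end = offset + len(LINKER)
reads = [spot[:4], spot[4:offset], spot[offset:linker_end], spot[linker_end:]]
print(reads)
```

The second table entry is the reverse complement of the first (`reverse_complement(LINKER) == LINKER_RC`), which is presumably why both orientations are listed. Once the linker is located, read 1 ends just before it and read 3 starts just after it, which is what `RELATIVE_ORDER follows_read_index="2"` expresses.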
Illumina single reads
Read
Read composition
Read Index : | 0 |
Read : | ATAGAGTTGATCCTGG……………CCTGGCTCA |
Base Coordinate : | 1 72 |
Read Type : | Forward |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Forward | BaseCoord = 1 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<SPOT_LENGTH>72</SPOT_LENGTH>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model | Sequence Length |
---|---|---|
Illumina | Illumina Genome Analyzer IIx | 72 |
Illumina paired reads
Read
Read composition
Read Index : | 0 | 1 |
Read : | ATAGAGTTGATCCTGG…………… | CCTGGCTCATCAGTTGAT…………… |
Base Coordinate : | 1 | 101 200 |
Read Type : | Forward | Reverse |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Forward | BaseCoord = 1 |
1 | Application Read | Reverse | BaseCoord = 101 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<SPOT_LENGTH>200</SPOT_LENGTH>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>1</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Reverse</READ_TYPE>
<BASE_COORD>101</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model | Sequence Length |
---|---|---|
Illumina | Illumina Genome Analyzer IIx | 200 |
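For fixed-length layouts such as this one, the extent of each read follows from BASE_COORD and SPOT_LENGTH alone. The sketch below shows the arithmetic, assuming each read runs up to the base before the next BASE_COORD and the last read runs to SPOT_LENGTH. The function name `read_ranges` is hypothetical.

```python
def read_ranges(base_coords, spot_length):
    """Return 1-based inclusive (start, end) ranges, one per read."""
    ranges = []
    for i, start in enumerate(base_coords):
        if i + 1 < len(base_coords):
            end = base_coords[i + 1] - 1  # base before the next read starts
        else:
            end = spot_length  # last read runs to the end of the spot
        ranges.append((start, end))
    return ranges


# Illumina paired reads: BASE_COORD 1 and 101 in a 200-base spot.
illumina_paired = read_ranges([1, 101], spot_length=200)
print(illumina_paired)
```

With these inputs the forward read covers bases 1 to 100 and the reverse read covers bases 101 to 200, matching the read composition table.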
SOLiD single reads
Read
Read composition
Read Index : | 0 |
Read : | TTGATCCTGG……………CGCTCA |
Base Coordinate : | 1 50 |
Read Type : | Forward |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Forward | BaseCoord = 1 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<SPOT_LENGTH>50</SPOT_LENGTH>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model | Sequence Length |
---|---|---|
ABI SOLID | AB SOLiD System 3.0 | 50 |
SOLiD paired reads
Read
Read composition
Read Index : | 0 | 1 |
Read : | ATCCTGG…………… | CATCAGTTGAT…………… |
Base Coordinate : | 1 | 26 50 |
Read Type : | Forward | Forward |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Forward | BaseCoord = 1 |
1 | Application Read | Forward | BaseCoord = 26 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<SPOT_LENGTH>50</SPOT_LENGTH>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>1</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>26</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model | Sequence Length |
---|---|---|
ABI SOLID | AB SOLiD System 3.0 | 50 |
Ion Torrent single reads
Read
Read composition
Read Index : | 0 |
Read : | AGCCGTATATAG……………CGTCAGAA |
Base Coordinate : | 1 (x) |
Read Type : | Forward |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Forward | BaseCoord = 1 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model | Sequence Length |
---|---|---|
Ion torrent | Ion torrent PGM/Proton | |
PacBio single reads (Standard sequencing)
Metadata
Spot (Read Spec)
Adapter Spec |
---|
ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT |
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Forward | BaseCoord = 1 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model |
---|---|
PacBio SMRT | PacBio RS |
Experiment Attribute
Tag | Value | Units |
---|---|---|
Sequencing Protocol | Standard sequencing | |
Insert size | 8000 | |
Experiment XML (EXPERIMENT_ATTRIBUTES)
<EXPERIMENT_ATTRIBUTES>
<EXPERIMENT_ATTRIBUTE>
<TAG>Sequencing Protocol</TAG>
<VALUE>Standard sequencing</VALUE>
</EXPERIMENT_ATTRIBUTE>
<EXPERIMENT_ATTRIBUTE>
<TAG>Insert size</TAG>
<VALUE>8000</VALUE>
</EXPERIMENT_ATTRIBUTE>
</EXPERIMENT_ATTRIBUTES>
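Experiment attributes are plain TAG/VALUE pairs, so they are easy to read back for a quick check before submission. A minimal sketch with the Python standard library follows. The variable names are hypothetical, and the XML string is the attribute block from this example.

```python
import xml.etree.ElementTree as ET

ATTRIBUTES_XML = """
<EXPERIMENT_ATTRIBUTES>
<EXPERIMENT_ATTRIBUTE>
<TAG>Sequencing Protocol</TAG>
<VALUE>Standard sequencing</VALUE>
</EXPERIMENT_ATTRIBUTE>
<EXPERIMENT_ATTRIBUTE>
<TAG>Insert size</TAG>
<VALUE>8000</VALUE>
</EXPERIMENT_ATTRIBUTE>
</EXPERIMENT_ATTRIBUTES>
"""

root = ET.fromstring(ATTRIBUTES_XML)
# Collect every TAG/VALUE pair; values stay strings, as in the XML.
attributes = {
    attribute.findtext("TAG"): attribute.findtext("VALUE")
    for attribute in root.findall("EXPERIMENT_ATTRIBUTE")
}
print(attributes)
```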
PacBio single reads (Circular consensus sequencing)
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Application Read | Other | BaseCoord = 1 |
Experiment XML (SPOT_DESCRIPTOR)
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
Platform | Instrument Model |
---|---|
PacBio SMRT | PacBio RS |
Experiment Attribute
Tag | Value | Units |
---|---|---|
Sequencing Protocol | Circular consensus sequencing | |
Insert size | 700 | |
Experiment XML (EXPERIMENT_ATTRIBUTES)
<EXPERIMENT_ATTRIBUTES>
<EXPERIMENT_ATTRIBUTE>
<TAG>Sequencing Protocol</TAG>
<VALUE>Circular consensus sequencing</VALUE>
</EXPERIMENT_ATTRIBUTE>
<EXPERIMENT_ATTRIBUTE>
<TAG>Insert size</TAG>
<VALUE>700</VALUE>
</EXPERIMENT_ATTRIBUTE>
</EXPERIMENT_ATTRIBUTES>
Typical examples causing errors in data validation
If a validation error occurs, stop the validation process, correct the metadata, and then re-upload the data files.
- Reads having no application read
- Reads with inconsistent base coordinate
- Reads with relative order which cannot be specified
Reads having no application read
Read
Read composition
Read Index : | 0 | 1 |
Read : | ATCCGG | CATCAGTTGAT………………………………………………… |
Base Coordinate : | 1 | 7 50 |
Read Type : | Primer | Linker (should have at least one application read) |
Reads with inconsistent base coordinate
Read 1
Read composition
Read Index : | 0 | 1 |
Read : | ATCCGG…………… | CATCAG…………… |
Base Coordinate : | 1 | 1 (should be > 1) |
Read Type : | Forward | Reverse |
Read 2
Read composition
Read Index : | 0 | 1 | 2 |
Read : | TCAG | ATAGAGTTG……… | TCGTATAACTTCGTATAATGTATGCTATACGAAGTT |
Base Coordinate : | 1 | 5 | 4 (should be > 5) |
Read Type : | Adapter | Forward | Reverse |
Read 3
Read composition
Read Index : | 0 | 1 |
Read : | ATCCGGGTGTGTCATCAG | CATCAG…………… |
Base Coordinate : | 2 (should start at 1) | 19 |
Read Type : | Adapter | Forward |
Reads with relative order which cannot be specified
Read
Read composition
Read Index : | 0 | 1 | 2 | 3 |
Read : | TCAG | ATAGA…………… | ………………… | CTCAT………………………………………………………… |
Base Coordinate : | 1 | 5 | ||
Read Type : | Adapter | Forward | Linker | Forward (the position of this forward read cannot be determined) |
Metadata
Spot (Read Spec)
Read Index | Read Class | Read Type | Ordering Method |
---|---|---|---|
0 | Technical Read | Adapter | BaseCoord = 1 |
1 | Application Read | Forward | BaseCoord = 5 |
2 | Technical Read | Linker | RelativeOrder |
3 | Application Read | Forward | RelativeOrder |
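The three error patterns above can be caught with simple checks before uploading. The sketch below is an illustration only and is not the DRA validator: the function name `validate_read_specs` and the dictionary keys are hypothetical, and the relative-order rule is a conservative assumption (a RelativeOrder read is accepted only when it directly follows an ExpectedBasecallTable read, as in the valid 454 paired example). Under that assumption the linker at read index 2 is flagged as well as the forward read at index 3, because nothing marks where read 1 ends.

```python
def validate_read_specs(read_specs):
    """Return a list of problem descriptions; an empty list means no findings.

    Each spec is a dict with read_class, ordering and, for BaseCoord
    ordering, base_coord.
    """
    problems = []
    if not any(s["read_class"] == "Application Read" for s in read_specs):
        problems.append("no application read")
    previous_coord = None
    for index, spec in enumerate(read_specs):
        ordering = spec["ordering"]
        if ordering == "BaseCoord":
            coord = spec["base_coord"]
            if index == 0 and coord != 1:
                problems.append("read 0: base coordinate should start at 1")
            if previous_coord is not None and coord <= previous_coord:
                problems.append(
                    f"read {index}: base coordinate {coord} should be > {previous_coord}"
                )
            previous_coord = coord
        elif ordering == "RelativeOrder":
            # Assumed rule: only an expected-basecall read gives a usable anchor.
            anchored = index > 0 and read_specs[index - 1]["ordering"] == "ExpectedBasecallTable"
            if not anchored:
                problems.append(f"read {index}: relative order cannot be resolved")
    return problems


# The failing layout from the table above: the linker has no basecall table.
unresolved = [
    {"read_class": "Technical Read", "ordering": "BaseCoord", "base_coord": 1},
    {"read_class": "Application Read", "ordering": "BaseCoord", "base_coord": 5},
    {"read_class": "Technical Read", "ordering": "RelativeOrder"},
    {"read_class": "Application Read", "ordering": "RelativeOrder"},
]
for problem in validate_read_specs(unresolved):
    print(problem)
```

Changing the linker's ordering method to ExpectedBasecallTable, as in the 454 paired reads example, makes the same layout pass these checks.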
Experiment (Pipeline)
Example 1 of Experiment XML Pipeline
<PROCESSING>
<PIPELINE>
<PIPE_SECTION section_name="Base Caller">
<STEP_INDEX>1</STEP_INDEX>
<PREV_STEP_INDEX>NIL</PREV_STEP_INDEX>
<PROGRAM>Casava</PROGRAM>
<VERSION>V1.8.3_V3.2.1</VERSION>
<NOTES/>
</PIPE_SECTION>
<PIPE_SECTION section_name="Quality Scores">
<STEP_INDEX>2</STEP_INDEX>
<PREV_STEP_INDEX>1</PREV_STEP_INDEX>
<PROGRAM>Casava</PROGRAM>
<VERSION>V1.8.3_V3.2.1</VERSION>
<NOTES/>
</PIPE_SECTION>
</PIPELINE>
</PROCESSING>
Example 2 of Experiment XML Pipeline
<PROCESSING>
<PIPELINE>
<PIPE_SECTION>
<STEP_INDEX>1</STEP_INDEX>
<PREV_STEP_INDEX/>
<PROGRAM>bwa</PROGRAM>
<VERSION>0.5.9-r16</VERSION>
<NOTES>BWA-MEM algorithm alignment</NOTES>
</PIPE_SECTION>
<PIPE_SECTION>
<STEP_INDEX>2</STEP_INDEX>
<PREV_STEP_INDEX>1</PREV_STEP_INDEX>
<PROGRAM>Picard</PROGRAM>
<VERSION>1.74(1243)</VERSION>
<NOTES>Duplicate reads marked</NOTES>
</PIPE_SECTION>
<PIPE_SECTION>
<STEP_INDEX>3</STEP_INDEX>
<PREV_STEP_INDEX>2</PREV_STEP_INDEX>
<PROGRAM>GATK</PROGRAM>
<VERSION>1.4-29</VERSION>
<NOTES>Indel realignment</NOTES>
</PIPE_SECTION>
<PIPE_SECTION>
<STEP_INDEX>4</STEP_INDEX>
<PREV_STEP_INDEX>3</PREV_STEP_INDEX>
<PROGRAM>GATK</PROGRAM>
<VERSION>1.4-29</VERSION>
<NOTES>Base quality score recalibration</NOTES>
</PIPE_SECTION>
</PIPELINE>
</PROCESSING>
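In both pipeline examples each PIPE_SECTION points back to the step before it through PREV_STEP_INDEX. The sketch below reads such a block and checks that every back-reference names a step that has already appeared. The function names are hypothetical, and treating both `NIL` and an empty element as "no previous step" is an assumption drawn from the two examples. The XML string is a shortened copy of Example 2.

```python
import xml.etree.ElementTree as ET


def pipeline_steps(processing_xml):
    """Return (step_index, prev_step_index, program) for each PIPE_SECTION."""
    root = ET.fromstring(processing_xml)
    steps = []
    for section in root.iter("PIPE_SECTION"):
        prev = (section.findtext("PREV_STEP_INDEX") or "").strip()
        # Assumption: "NIL" and an empty element both mean "first step".
        prev = None if prev in ("", "NIL") else prev
        steps.append((section.findtext("STEP_INDEX"), prev, section.findtext("PROGRAM")))
    return steps


def broken_links(steps):
    """Return the step indexes whose PREV_STEP_INDEX was not seen earlier."""
    seen, broken = set(), []
    for step, prev, _program in steps:
        if prev is not None and prev not in seen:
            broken.append(step)
        seen.add(step)
    return broken


PROCESSING_XML = """
<PROCESSING>
<PIPELINE>
<PIPE_SECTION>
<STEP_INDEX>1</STEP_INDEX>
<PREV_STEP_INDEX/>
<PROGRAM>bwa</PROGRAM>
</PIPE_SECTION>
<PIPE_SECTION>
<STEP_INDEX>2</STEP_INDEX>
<PREV_STEP_INDEX>1</PREV_STEP_INDEX>
<PROGRAM>Picard</PROGRAM>
</PIPE_SECTION>
</PIPELINE>
</PROCESSING>
"""

steps = pipeline_steps(PROCESSING_XML)
print(steps)
print(broken_links(steps))
```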