Genomic Expression Archive
Single-cell submission guide
How to submit single-cell data
For single-cell gene expression data, submit raw data to DRA and processed data to GEA. When the study contains up to a few dozen cells (samples), submit de-multiplexed (divided) sample and data files. When there are more cells and de-multiplexing the data would affect reproducibility, submit multiplexed (mixed) sample and data files.
For 10x Genomics data files, please refer to “What format of 10x Genomics data should I submit to NCBI GEO/SRA?”.
Library information
In both de-multiplexed and multiplexed submissions, describe the methods and the name and version of the kit (e.g., Smart-seq2, 10x, Drop-seq) used for single-cell library construction in the Library Construction Protocol field of the DRA Experiment. For 10x technology, also describe the version of the 10x chemistry (e.g., v1, v2). Select “GENOMIC SINGLE CELL” or “TRANSCRIPTOMIC SINGLE CELL” in Library Source.
Data file formats
Submit raw data in fastq or bam to DRA. Include barcode sequences.
For 10x bam files without barcode sequences, submit fastq instead. Please see “Generating FASTQs with cellranger mkfastq”.
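One way to see whether an aligned bam file still carries barcodes is to inspect the optional tags on its records. The sketch below is illustrative only: it assumes the records have been dumped as SAM text (for example with `samtools view`) and that cell barcodes are stored in the `CB` or `CR` tags, as Cell Ranger does. The function names are hypothetical.

```python
def has_cell_barcode(sam_line, tags=("CB", "CR")):
    """Return True if a SAM record carries one of the given barcode tags."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    optional = fields[11:]
    return any(field.split(":", 1)[0] in tags for field in optional)


def fraction_with_barcode(sam_lines):
    """Fraction of non-header records that carry a barcode tag."""
    records = [line for line in sam_lines
               if line.strip() and not line.startswith("@")]
    if not records:
        return 0.0
    return sum(has_cell_barcode(r) for r in records) / len(records)
```

If the fraction is zero for a 10x bam file, the guidance above applies: submit fastq instead.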
GEA processed data for single-cell studies should be cell-level data.
GEA Experiment Type
Select ‘RNA-seq of coding RNA from single cells’ or ‘RNA-seq of non coding RNA from single cells’. See GEA Experiment Type.
De-multiplexed submission
BioSample
Create a sample for each cell in BioSample and describe cell-specific information in sample attributes.
*sample_name | … | single_cell_identifier | inferred_cell_type | single_cell_well_quality |
---|---|---|---|---|
sample 1 | … | cell 1 | cell type A | OK |
sample 2 | … | cell 2 | cell type B | OK |
sample 3 | … | cell 3 | not applicable | 2 cells |
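For more than a handful of cells, the per-cell attribute table is easier to generate than to type. The following is a minimal sketch, assuming the cell information is already available as a list of dictionaries; the keys `id`, `cell_type` and `well_quality` and both function names are hypothetical, and the attributes elided as "…" in the table above are not covered.

```python
import csv
import io

COLUMNS = [
    "*sample_name",
    "single_cell_identifier",
    "inferred_cell_type",
    "single_cell_well_quality",
]


def build_biosample_rows(cells):
    """Build one BioSample row per cell, numbering samples from 1."""
    rows = []
    for i, cell in enumerate(cells, start=1):
        rows.append({
            "*sample_name": f"sample {i}",
            "single_cell_identifier": cell["id"],
            # Follow the table above: cells without an inferred type
            # are recorded as "not applicable".
            "inferred_cell_type": cell.get("cell_type") or "not applicable",
            "single_cell_well_quality": cell["well_quality"],
        })
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The resulting text can be pasted into a spreadsheet and merged with the remaining sample attributes before submission.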
DRA
Submit fastq or bam files de-multiplexed for each cell (sample).
GEA
Submit data de-multiplexed for each cell (sample) as processed data files.
When submitting multi-omics studies (ADT, HTO, TCR, BCR, GDO, CMO) that use 10x Genomics protocols and software, you must submit the feature_reference.csv file so that the data can be correctly interpreted. List the different omics libraries on separate rows in the SDRF, for example:
sample1_GEX |
sample1_TCR |
sample1_ADT |
sample1_HTO |
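The row names above follow a sample-plus-library-type pattern, which can be generated as sketched below. This is an illustration under assumptions: the `_GEX`-style suffix is taken from the example rows rather than from a stated rule, and the function name is hypothetical.

```python
# Library types listed in this guide, plus GEX as used in the example rows.
KNOWN_LIBRARY_TYPES = ("GEX", "ADT", "HTO", "TCR", "BCR", "GDO", "CMO")


def sdrf_library_names(sample, library_types):
    """Return one row name per omics library of a sample."""
    unknown = [lib for lib in library_types if lib not in KNOWN_LIBRARY_TYPES]
    if unknown:
        raise ValueError(f"unrecognised library types: {unknown}")
    return [f"{sample}_{lib}" for lib in library_types]
```

Calling `sdrf_library_names("sample1", ["GEX", "TCR", "ADT", "HTO"])` yields the four row names shown above.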
Multiplexed submission
BioSample
Create a sample for each library (which usually contains hundreds to thousands of cells) in BioSample.
*sample_name | … | tissue |
---|---|---|
library 1 | … | liver |
library 2 | … | heart |
library 3 | … | brain |
DRA
Submit fastq or bam files that include barcode sequences. For 10x bam files without barcode sequences, submit fastq instead (see “Generating FASTQs with cellranger mkfastq”).
GEA
For processed data files, submit Cell Ranger software output files (barcodes.tsv, features.tsv, matrix.mtx), H5 or HDF5 archives, or RDS objects. Processed data for single-cell TCR and BCR samples should include contig annotations and cell barcode information.
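Before uploading a Cell Ranger matrix directory, it can help to confirm that all three files are present. The sketch below is a simple check using only the standard library; it assumes the files may also appear gzip-compressed with a `.gz` suffix, and the function name is hypothetical.

```python
from pathlib import Path

REQUIRED = ("barcodes.tsv", "features.tsv", "matrix.mtx")


def missing_matrix_files(directory):
    """Return the required file names absent from the directory.

    A file counts as present if it exists either as-is or with a
    ".gz" suffix.
    """
    directory = Path(directory)
    missing = []
    for name in REQUIRED:
        plain = directory / name
        gzipped = directory / (name + ".gz")
        if not plain.exists() and not gzipped.exists():
            missing.append(name)
    return missing
```

An empty list means the directory holds the complete set of matrix files named above.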
When submitting multi-omics studies (ADT, HTO, TCR, BCR, GDO, CMO) that use 10x Genomics protocols and software, you must submit the feature_reference.csv file so that the data can be correctly interpreted.
Because no information about individual cells is available at the sample annotation or file level, include the analysis results, cell-specific attributes, read count matrix, and barcode sequences in the processed data files.
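One possible way to carry cell-specific attributes in the processed data is a tab-separated table keyed by barcode. The following is a minimal sketch, not a required format: the column layout, the empty value for cells with no annotation, and the function name are all assumptions made for illustration.

```python
import csv
import io


def cell_annotation_tsv(barcodes, annotations, columns):
    """Write one row per barcode with the requested attribute columns.

    barcodes:    list of barcode strings, in matrix column order
    annotations: dict mapping barcode -> dict of attribute values
    columns:     attribute names to emit after the barcode column
    """
    unknown = sorted(set(annotations) - set(barcodes))
    if unknown:
        # An annotation for a barcode absent from the matrix is likely a mismatch.
        raise ValueError(f"annotations for barcodes not in matrix: {unknown}")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["barcode", *columns])
    for bc in barcodes:
        attrs = annotations.get(bc, {})
        # Missing attributes are left empty rather than guessed.
        writer.writerow([bc, *(attrs.get(col, "") for col in columns)])
    return buf.getvalue()
```

Keeping the rows in the same order as barcodes.tsv makes it easy to line the annotations up with the columns of the count matrix.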