Genomic Expression Archive


Submit sequencing experiment

GEA submission flow

1. Obtain a submission account

  • Create a D-way submission account.
  • To enable GEA submission, register a public key and a center name to your account.

2. Register a BioProject, BioSamples and Sequence Read Archive (DRA)

BioProject and BioSample registration is required for DRA submission.

BioProject

  • A description of the research effort.

BioSample

  • A description of biologically or physically unique samples used to generate experimental data.

DRA

  • Raw sequencing reads and alignments.

Metadata can be submitted as a tab-delimited text file.

DDBJ

  • If reference sequences used in functional genomics analysis are not registered to INSDC, submit reference genome or transcriptome shotgun assembly sequences to DDBJ.

3. Upload processed data files

  • Upload processed data files into the GEA submission directory.

4. Select a DRA submission

  • Select a DRA submission which contains raw sequencing reads for the GEA submission.
  • Select multiple DRA submissions when the GEA experiment has data processed from raw sequencing reads in more than one DRA submission.

5. Select a BioProject

  • Select a BioProject used in the DRA submission when the GEA experiment and DRA submission belong to the same project.
  • Select a BioProject not used in the DRA submission when the GEA experiment and DRA submission belong to different projects.

6. Prepare IDF and SDRF

IDF

  • The IDF (Investigation Description Format) file is used to give an overview of the experiment, including the experimental design, protocols and publication information.

SDRF

  • The SDRF (Sample and Data Relationship Format) describes the sample characteristics and the relationship between samples, data files etc.
  • An SDRF template is generated from the selected BioProject, BioSample(s) and DRA submission(s). Enter the additional information.

7. Submit IDF and SDRF and validate data files

  • After you submit the IDF and SDRF metadata in the submission web system, validation of the uploaded data files starts automatically.
  • A submission that passes validation will be reviewed.

Pre-submission checklist

Single-cell sequencing experiment

Refer to the ArrayExpress single-cell submission guide. Please contact the GEA team to upload any additional files for custom spike-ins or to facilitate data analysis.

More than one technology per experiment

GEA will ask you for the technology and name of the array, and applies it to the whole submission. If you have used different types of technologies for the same set of samples, we ask you to create separate submissions. Please make sure that the submissions have distinct titles (even though they may belong to the same study) in order to avoid mistakes. If you have samples from more than one array design in your experiment, it is possible to submit only one experiment. If you wish to do this, please contact the GEA team.

Sequencing experiment submission

Create a new submission

Log in to D-way; the top page is displayed. Move to the GEA submission site from the “GEA” menu at the top.

Create a new sequencing experiment submission by selecting “Sequencing” and clicking [New submission]. At the same time, a corresponding subdirectory is created under the submitter’s GEA upload directory on the DDBJ file server (ftp-private.ddbj.nig.ac.jp). Upload data files to this subdirectory.

If a submitter does not reply within three months of initial contact, the submission will be cancelled.
The maximum number of assays per submission is 1,000. If you have more than 1,000 assays, please create multiple submissions with the same BioProject reference.
Create a new submission
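
The 1,000-assay limit noted above can be handled by splitting the assay list before creating submissions. The following is a minimal sketch, assuming the assay names are held in a plain Python list; `split_assays` and `MAX_ASSAYS_PER_SUBMISSION` are hypothetical names for illustration and are not part of any GEA tool.

```python
from typing import List

MAX_ASSAYS_PER_SUBMISSION = 1000  # limit stated in this guide


def split_assays(assays: List[str], limit: int = MAX_ASSAYS_PER_SUBMISSION) -> List[List[str]]:
    """Split assay names into consecutive groups of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [assays[i:i + limit] for i in range(0, len(assays), limit)]


if __name__ == "__main__":
    # Hypothetical example: 2,350 assays are split into three groups.
    names = [f"assay_{n}" for n in range(2350)]
    groups = split_assays(names)
    print([len(group) for group in groups])  # [1000, 1000, 350]
```

Each resulting group would then go into its own submission, all referencing the same BioProject.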

The submission statuses are listed below. The GEA team reviews submissions whose status is “submission_validated” or “data_error”.

List of submission status

Status           | Explanation
New              | Metadata are not submitted.
Data Submitted   | Metadata and data files are submitted.
Data Validating  | Validating data files.
Validation Error | Error occurred in data validation process.
Data Validated   | Metadata and data are validated.
Curating         | Curator is reviewing the submission.
Accession Issued | Accession number is issued to the submission.
Confidential     | Archive files are created and submission is kept private.
Public           | Released to public.

Upload processed data files

Regarding how to upload your data files, please see “Data upload”.

Submission

Set the hold date within four years. Submitters’ names and affiliations will be public, but e-mail addresses will not be disclosed.

You can delete an un-submitted GEA submission.
Enter submission information

DRA

Select a DRA submission registered in your account. If no DRA submission is registered, please go to the DRA submission site and submit to DRA.

To use DRA submissions registered under another account, please contact the GEA team.

Select a DRA submission for the GEA experiment

BioProject

Select a project registered in your account. If a BioProject is not registered, please go to the BioProject submission site and submit a project.

Select a BioProject used in the DRA submission when the GEA experiment and DRA submission belong to the same project (usual case). When the GEA experiment and DRA submission belong to different projects, select a BioProject not used in the DRA submission.

To use a project registered under another account, please contact the GEA team.

Select a BioProject for the GEA experiment

IDF

Enter information for IDF (Investigation Description Format).

Example IDF.

  • Protocol: Pre-checked protocols are mandatory.
  • Publication: Describe associated publications by PubMed ID or DOI. For unpublished manuscripts, please inform us of the ID once it is assigned.
  • Data File Type: Processed data files are required for sequencing experiment submission. See “Accepted Data Files Formats for sequencing experiment”. We strongly recommend submitting a processed data file per sample.
Enter information for IDF

SDRF

Download an SDRF template file

Enter information for SDRF (Sample and Data Relationship Format).

Example SDRF.

Auto-filled fields.

  • Name columns and attribute columns for Source Name: Generated from BioSamples.
  • SDRF rows: 1 row for 1 Run.
  • Protocols: Protocols described in the IDF are inserted at the appropriate positions in the SDRF with temporary protocol IDs (e.g., ESUB000352_Protocol_1).
  • Technology Type: “sequencing assay” for sequencing submission.
  • SRA Experiment and Run Comments to Extract and Assay Names: Generated from DRA Experiment and Run.

Enter required fields by overwriting <Required: fill in the content> tags.
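
Before selecting the edited SDRF, it can help to confirm that no placeholder tag is left behind. The following is a minimal sketch, assuming the SDRF is tab-delimited text with a single header row; `find_unfilled` is a hypothetical helper and not part of the GEA validator.

```python
import csv
import io
from typing import List, Tuple

REQUIRED_TAG = "<Required: fill in the content>"  # placeholder text quoted in this guide


def find_unfilled(sdrf_text: str) -> List[Tuple[int, str]]:
    """Return (row number, column header) for every cell still holding the placeholder."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, [])
    missing = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, cell in zip(header, row):
            if REQUIRED_TAG in cell:
                missing.append((row_number, column))
    return missing


if __name__ == "__main__":
    # Hypothetical two-sample SDRF fragment; the second sample is incomplete.
    sample = (
        "Source Name\tMaterial Type\tFactor Value[strain]\n"
        "S1\ttotal RNA\tAT76\n"
        "S2\t<Required: fill in the content>\tKU-2003\n"
    )
    for row_number, column in find_unfilled(sample):
        print(f"row {row_number}: {column} is not filled in")
```

An empty result means every placeholder was overwritten; it does not check that the entered values are valid.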

Fields you need to add.

  • Material Type: Enter controlled terms.
    • total RNA
    • polyA RNA
    • cytoplasmic RNA
    • nuclear RNA
    • genomic DNA
    • protein
    • other
  • Derived Array Data File and Comment[Derived Array Data File md5]: Enter filename and md5 checksum pair for each processed data file.
  • A list of filenames and their md5 checksums (output of the md5sum command) can be provided as a file <GEA submission ID>.md5 (e.g., ESUB000001.md5). When checksum values are provided in both the SDRF and the .md5 file, those in the .md5 file are used.
  • Factor Value[enter experiment factor name here]: A user-defined name for each experimental factor studied by the experiment. These experimental factors represent the variables within the investigation (e.g. growth condition, genotype, organism part). The actual values of these variables will be listed in the “Factor Value []” columns. Example:
    • Factor Value[strain]
    • AT76
    • KU-2003
    • KU-PI499262
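
The checksum list described above can also be produced without the md5sum command. The following is a minimal sketch that writes lines in the usual md5sum output layout (checksum, two spaces, filename); the function names are hypothetical, and whether GEA accepts any variation of this layout is not stated in this guide.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_list(files: Iterable[Path], output: Path) -> None:
    """Write one 'checksum  filename' line per file, as md5sum prints them."""
    lines = [f"{md5_of_file(path)}  {path.name}\n" for path in files]
    output.write_text("".join(lines))
```

For example, `write_md5_list(sorted(Path("processed").glob("*.txt")), Path("ESUB000001.md5"))` would list every `.txt` file in a local `processed` directory; both the directory name and the file pattern are placeholders.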
SDRF template, yellow-highlighted fields need to be filled by submitter

Select the entered SDRF file and continue.

Select entered SDRF file

Overview and submit

You can download the IDF and SDRF files and check them. When a correction is necessary, go back to the previous tab and correct the metadata.

Submit the IDF and SDRF metadata by clicking the “Submit” button.

Check the IDF and SDRF and submit

Validation

When data files described in the IDF and SDRF are not found in the submission directory, an error message “Data file is not uploaded” is shown and the submission is aborted.
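
To avoid the “Data file is not uploaded” error, the filenames in the SDRF can be compared with the uploaded files before submitting. The following is a minimal sketch, assuming a tab-delimited SDRF and a list of uploaded filenames that you obtain yourself (for example from a directory listing); the helper names are hypothetical and this is not the GEA validator.

```python
import csv
import io
from typing import Iterable, List

DATA_FILE_COLUMN = "Derived Array Data File"  # column name used in this guide


def referenced_files(sdrf_text: str) -> List[str]:
    """Collect the distinct filenames listed in the data file column(s), in order."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, [])
    indexes = [i for i, name in enumerate(header) if name.strip() == DATA_FILE_COLUMN]
    found: List[str] = []
    for row in reader:
        for i in indexes:
            name = row[i].strip() if i < len(row) else ""
            if name and name not in found:
                found.append(name)
    return found


def missing_files(sdrf_text: str, uploaded: Iterable[str]) -> List[str]:
    """Return filenames referenced in the SDRF that are absent from `uploaded`."""
    present = set(uploaded)
    return [name for name in referenced_files(sdrf_text) if name not in present]
```

An empty list from `missing_files` suggests that every referenced file has a counterpart in the upload directory; it does not verify file contents or checksums.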

The validator checks the submitted IDF and SDRF files against the validation rules and gives warning and error messages. Errors must be resolved before the submission can proceed.

Warning and error messages

Accession numbers

GEA accession numbers are issued to the completed GEA experiment. You can give reviewers access to private records by sharing a reviewer access token.

GEA accession numbers

Update submission

Update in each database

Database                         | Update
Annotated sequence database      | Request updates from web form
Sequence Read Archive (DRA)      | Log in to D-way and update metadata. To add or withdraw sequencing data, request updates from web form
Genomic Expression Archive (GEA) | Request updates from web form
BioProject/BioSample             | Request updates from web form

Withdraw archived objects

To withdraw an archived experiment, please contact us.

MD5 checksum value

See “MD5 checksum value” for how to obtain MD5 checksum values.