Submit microarray experiment

GEA submission flow

1. Obtain a submission account

2. Register a BioProject and BioSample(s)

BioProject

  • A description of the research effort.

BioSample

  • A description of biologically or physically unique samples used to generate experimental data.

Metadata can be submitted as a tab-delimited text file.

3. Upload raw and processed data files

  • Upload raw and processed data files into the GEA submission directory.
  • [Optional] When an array design is not available in ArrayExpress/GEA, upload an array design file into the GEA submission directory along with data files.

4. Select a BioProject and BioSample(s)

  • Select a registered BioProject for the GEA submission.
  • Select registered BioSample(s) for the GEA submission. The majority of GEA submissions require more than one sample.

5. Prepare IDF and SDRF

IDF

SDRF

  • The SDRF (Sample and Data Relationship Format) describes the sample characteristics and the relationship between samples, arrays, data files etc.
  • An SDRF template is generated from the selected BioProject and BioSample(s). Enter the additional information.

6. Submit IDF, SDRF and validate data files

  • After submitting IDF and SDRF in the submission web system, validation of the uploaded data files will automatically begin.
  • Submissions that pass validation will be reviewed.

Pre-submission checklist

Two-color microarray experiment

At the moment, the GEA submission interface supports only one type of two-color workflow (see graphic here), in which two samples are connected to one common raw data file that includes both channels. If you select the dual-channel option in the IDF tab, the interface expects one file for the two samples that were hybridized together. Some recent two-color microarray technologies generate two separate raw data files (usually one per channel); connecting a single file to each sample will cause validation to fail.

If you have separate files for each channel, please contact the GEA team.
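The supported two-color layout can be checked before submission with a short script. The following is a minimal sketch, not the GEA validator: it assumes each SDRF row has already been read into a dict keyed by the column names "Source Name" and "Array Data File", and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def check_two_color_files(rows):
    """Return raw data files that are not shared by exactly two samples.

    Assumption: in the supported dual-channel layout, every raw data file
    is referenced by exactly two SDRF rows (one per channel).
    """
    samples_by_file = defaultdict(set)
    for row in rows:
        samples_by_file[row["Array Data File"]].add(row["Source Name"])
    # Keep only the files whose sample count differs from two.
    return {
        filename: sorted(samples)
        for filename, samples in samples_by_file.items()
        if len(samples) != 2
    }


# Hypothetical rows: slide1 follows the supported layout, slide2 does not.
rows = [
    {"Source Name": "control_1", "Label": "Cy3", "Array Data File": "slide1.txt"},
    {"Source Name": "treated_1", "Label": "Cy5", "Array Data File": "slide1.txt"},
    {"Source Name": "control_2", "Label": "Cy3", "Array Data File": "slide2_cy3.txt"},
    {"Source Name": "treated_2", "Label": "Cy5", "Array Data File": "slide2_cy5.txt"},
]
problems = check_two_color_files(rows)
print(problems)
```

Files reported by this check correspond to the one-file-per-channel case, for which the GEA team should be contacted.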

Single-cell sequencing experiment

Refer to the ArrayExpress single-cell submission guide. Please contact the GEA team if you need to upload additional files for custom spike-ins or to facilitate data analysis.

More than one technology per experiment

GEA asks you for the technology and the name of the array and applies them to the whole submission. If you have used different types of technologies for the same set of samples, please create separate submissions. To avoid mistakes, make sure that the submissions have distinct titles, even though they may belong to the same study. If the samples in your experiment come from more than one array design, they can still be submitted as a single experiment; if you wish to do this, please contact the GEA team.

Microarray experiment submission

Create a new submission

Log in to D-way (https://trace.ddbj.nig.ac.jp/D-way); the top page is displayed. Move to the GEA submission site from the "GEA" menu at the top.

Create a new microarray experiment submission by selecting "Microarray" and clicking [New submission]. At the same time, a corresponding subdirectory is created under the submitter’s GEA upload directory on the DDBJ file server (ftp-private.ddbj.nig.ac.jp). Upload data files to this subdirectory.

If a submitter does not reply within three months of the initial contact, the submission will be cancelled.

Create a new microarray experiment submission

The submission statuses are listed below. The GEA team reviews submissions whose status is "submission_validated" or "data_error".

List of submission status

Status            Explanation
New               Metadata are not submitted.
Data Submitted    Metadata and data files are submitted.
Data Validating   Validating data files.
Validation Error  Error occurred in data validation process.
Data Validated    Metadata and data are validated.
Curating          Curator is reviewing the submission.
Accession Issued  Accession number is issued to the submission.
Confidential      Archive files are created and submission is kept private.
Public            Released to public.

Upload raw and processed data files

Upload files by using terminal (Linux/Mac OS X)

Upload files by executing,

$ scp <Your Files> <D-way Login ID>@ftp-private.ddbj.nig.ac.jp:~/gea/<GEA Submission ID>

  • <Your Files> Files to be transferred. Ex: file1 file2 (file1 and file2), file* (all files whose filenames start with "file")
  • <D-way Login ID> D-way Login ID (ex. test07)
  • <GEA Submission ID> GEA Submission ID (ex. ESUB000350)
  • Command example: scp Arabidopsis_control_rep_1.CEL test07@ftp-private.ddbj.nig.ac.jp:~/gea/ESUB000350

Enter the passphrase set for the keys.
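When many files are uploaded, the scp invocation can be assembled by a script. The sketch below only builds the argument list in the form of the command example above; the function name is hypothetical, and it assumes that scp and the submitter's key pair are already set up on the local machine.

```python
UPLOAD_HOST = "ftp-private.ddbj.nig.ac.jp"


def build_scp_command(files, login_id, submission_id):
    """Build the scp argument list for uploading files to a GEA submission directory."""
    if not files:
        raise ValueError("at least one file is required")
    # Destination follows the documented pattern: <login>@<host>:~/gea/<submission ID>
    destination = f"{login_id}@{UPLOAD_HOST}:~/gea/{submission_id}"
    return ["scp", *files, destination]


command = build_scp_command(["Arabidopsis_control_rep_1.CEL"], "test07", "ESUB000350")
print(command)
# To run the transfer, pass the list to subprocess.run(command, check=True);
# scp will then prompt for the passphrase set for the keys.
```

Building a list rather than a single string avoids quoting problems when filenames contain spaces.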

You can handle the transferred files directly by logging in to the server. Log in via SSH by executing,

$ ssh <D-way Login ID>@ftp-private.ddbj.nig.ac.jp

Enter the passphrase set for the keys.

After logging in successfully, the following prompt is displayed.

The login environment is private to the submitter; other users cannot access the data. Executable commands are restricted to the following ones. Users can delete unnecessary files.
ls cd cp mv rm more mkdir tar gzip gunzip bzip2 bunzip2 zip unzip

Upload files by using WinSCP (Windows)

Install and run WinSCP (http://winscp.net/eng/download.php).

Set the items as below and click the [Advanced...] button.

Be sure to select the "binary mode" for file transfer. Do NOT select the "text mode".

  • File protocol: SFTP
  • Host name: ftp-private.ddbj.nig.ac.jp
  • Port number: 22
  • User name: (D-way Login ID)
  • Password: (Leave empty)

Generate private key 1

Select the private key that you created beforehand under "Private key file" in "Authentication".

Generate private key 2

Finally, click the [Login] button at the bottom center.

Login to the WinSCP

The first time you log in, a warning message is displayed; select "Yes" (this message will not be displayed again). Next, enter the passphrase set for the keys.

After logging in successfully, a folder on your PC is displayed on the left and your private directory on the server is displayed on the right. Select files in the left window and drag and drop them into the right window to transfer them to the server.

Transfer files by using the WinSCP

You can delete the transferred files by selecting the files and clicking the [Delete] button.

Upload files by using Cyberduck (Mac OS X)

Download and install Cyberduck (http://cyberduck.ch).

Run Cyberduck and click the [Open Connection] button in the Cyberduck menu.

Open connection by using Cyberduck

Select "SFTP (SSH File Transfer Protocol)".

SFTP in Cyberduck

Set as follows and check "Use Public Key Authentication" under More Options.

  • Server: ftp-private.ddbj.nig.ac.jp
  • Port: 22
  • Username: (D-way Login ID)
  • Password: (Leave empty)
  • Add to Keychain: (Check)

Key authentication in Cyberduck

By default, the private key is created in "User’s home folder > .ssh folder (invisible in Finder) > id_rsa".

Private key in Mac OS X

The first time you log in, a warning message is displayed; select "Always" (this message will not be displayed again).

After logging in successfully, your private directory on the server is displayed in the window. Select files on your computer and drag and drop them into the window to transfer them to the server.

Transfer files by using Cyberduck

Users can log in to the ftp-private.ddbj.nig.ac.jp server via SSH by using a private key. Executable commands are restricted to the following ones. Users can delete unnecessary files.
ls cd cp mv rm more mkdir tar gzip gunzip bzip2 bunzip2 zip unzip

Submission

Set a hold date within four years, or choose immediate release once the submission is processed. Submitters' names and affiliations will be public, but e-mail addresses will not be disclosed.
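A hold date can be sanity-checked before it is entered. The sketch below is only an illustration: the text does not state from which day the four years are counted, so it assumes they are counted from the submission date, and both function names are hypothetical.

```python
from datetime import date


def latest_hold_date(submission_date):
    """Return the date four years after submission_date (assumed start of the window)."""
    try:
        return submission_date.replace(year=submission_date.year + 4)
    except ValueError:
        # 29 February with a target year that is not a leap year: fall back to 28 February.
        return submission_date.replace(year=submission_date.year + 4, day=28)


def is_valid_hold_date(hold_date, submission_date):
    """Check that hold_date lies between the submission date and four years later."""
    return submission_date <= hold_date <= latest_hold_date(submission_date)


submitted = date(2024, 1, 15)
print(is_valid_hold_date(date(2027, 1, 1), submitted))
print(is_valid_hold_date(date(2028, 1, 16), submitted))
```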

You can delete an unsubmitted GEA submission with "Delete submission".

Enter submission information

BioProject

Select a submitted project registered in your account. If no BioProject is registered, please go to the BioProject submission site and submit a project.

To use a project registered under another account, please contact the GEA team.

Select a BioProject for the GEA experiment

BioSample

Select submitted BioSamples registered in your account. If no BioSamples are registered, please go to the BioSample submission site and submit samples.

To use samples registered under another account, please contact the GEA team.

Select BioSamples for the GEA experiment

IDF

Enter information for IDF (Investigation Description Format).

Example IDF

  • Protocol: Pre-checked protocols are mandatory.
  • Publication: Describe associated publications by PubMed ID or DOI. For unpublished manuscripts, please inform us of the publication ID once it has been assigned.
  • Array Design: When an array design is available in ArrayExpress/GEA, enter an array design accession number "A-XXXX-n". When an array design is not available, register a new array design by uploading an array design file into the GEA submission directory.
  • Data File Type: Raw and processed data files are required for microarray experiment submission. We strongly recommend submitting one raw and one processed data file per sample. See "Accepted Data Files Formats for microarray experiment".

Enter information for IDF

SDRF

Download an SDRF template file

Enter information for SDRF (Sample and Data Relationship Format).

Example SDRF

Auto-filled fields.

  • Name columns and attribute columns for Source Name: Generated from BioSamples.
  • SDRF rows: 1 row for 1 BioSample.
  • Protocols: Protocols described in IDF are inserted to appropriate positions of SDRF with temporary protocol IDs (e.g., ESUB000350_Protocol_1)
  • Technology Type: "array assay" for microarray submission.
  • Array Design REF: array design accession or filename described in IDF.

Enter required fields by overwriting <Required: fill in the content> tags.
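Leftover placeholder tags are easy to miss in a wide spreadsheet. The following sketch scans SDRF content for cells that still contain the tag; it assumes the SDRF is tab-delimited text with a single header row, and the function name and sample content are hypothetical.

```python
import csv
import io

PLACEHOLDER = "<Required: fill in the content>"


def find_unfilled_cells(sdrf_text):
    """Return (row number, column name) pairs for cells that still hold the placeholder."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    unfilled = []
    # Row numbers are 1-based and count the header as row 1.
    for row_number, row in enumerate(reader, start=2):
        for column, value in zip(header, row):
            if PLACEHOLDER in value:
                unfilled.append((row_number, column))
    return unfilled


# Hypothetical SDRF excerpt with two cells left unfilled.
sdrf_text = (
    "Source Name\tMaterial Type\tLabel\n"
    "sample1\ttotal RNA\t<Required: fill in the content>\n"
    "sample2\t<Required: fill in the content>\tbiotin\n"
)
unfilled = find_unfilled_cells(sdrf_text)
print(unfilled)
```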

Fields you need to add.

  • Material Type: Enter controlled terms.
    • total RNA
    • polyA RNA
    • cytoplasmic RNA
    • nuclear RNA
    • genomic DNA
    • protein
    • other
  • Label: Enter label compounds used to label the extracted molecule such as biotin, Cy3 and Cy5.
  • Array Data File and Comment[Array Data File md5]: Enter filename and md5 checksum pair for each raw data file.
  • Derived Array Data File and Comment[Derived Array Data File md5]: Enter filename and md5 checksum pair for each processed data file.
  • A list of filenames and their md5 checksums (the output of the md5sum command) can be provided as a file named <GEA submission ID>.md5 (e.g., ESUB000001.md5). When checksum values are provided in both the SDRF and the .md5 file, those in the .md5 file are used.
  • Factor Value[enter experiment factor name here]: A user-defined name for each experimental factor studied by the experiment. These experimental factors represent the variables within the investigation (e.g. growth condition, genotype, organism part). The actual values of these variables will be listed in the "Factor Value []" columns. Example:
    • Factor Value[strain]
    • AT76
    • KU-2003
    • KU-PI499262
SDRF template, yellow-highlighted fields need to be filled by submitter
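The precedence rule for checksums given in both the SDRF and the .md5 file can be illustrated with a short sketch. This is not GEA's implementation: it assumes the .md5 file holds one "checksum filename" pair per line, as md5sum prints them, and the function names and checksum values are hypothetical.

```python
def parse_md5_listing(text):
    """Parse md5sum-style lines into a {filename: lowercase checksum} dict."""
    checksums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, filename = line.split(maxsplit=1)
        checksums[filename.strip()] = digest.lower()
    return checksums


def merge_checksums(sdrf_checksums, md5_file_checksums):
    """Combine both sources; values from the .md5 file override those from the SDRF."""
    merged = dict(sdrf_checksums)
    merged.update(md5_file_checksums)
    return merged


listing = (
    "9F6E6800CFAE7749EB6C486619254B9C  file1\n"
    "B636E0063E29709B6082F324C76D0911  file2\n"
)
from_md5_file = parse_md5_listing(listing)
from_sdrf = {"file1": "0" * 32, "file3": "1" * 32}
merged = merge_checksums(from_sdrf, from_md5_file)
print(merged)
```

Here the SDRF value for file1 is replaced by the one from the .md5 listing, while file3, which appears only in the SDRF, keeps its SDRF value.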

Select the entered SDRF file and continue.

Select entered SDRF file

Overview and submit

You can download the IDF and SDRF files and check them. When a correction is necessary, go back to the previous tab and correct the metadata.

Submit the IDF and SDRF metadata by clicking the "Submit" button.

Check the IDF and SDRF and submit

Validation

When data files described in the IDF and SDRF are not found in the submission directory, an error message "Data file is not uploaded" is shown and the submission is aborted.
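The "Data file is not uploaded" error can be anticipated by comparing the filenames referenced in the SDRF with the files actually present. The sketch below runs against a local directory standing in for the submission directory; the row dicts, function name and filenames are hypothetical.

```python
import tempfile
from pathlib import Path

DATA_FILE_COLUMNS = ("Array Data File", "Derived Array Data File")


def find_missing_data_files(rows, directory):
    """Return referenced data filenames that are absent from the directory."""
    present = {path.name for path in Path(directory).iterdir() if path.is_file()}
    referenced = {
        row[column]
        for row in rows
        for column in DATA_FILE_COLUMNS
        if row.get(column)
    }
    return sorted(referenced - present)


# Demonstration with a temporary directory holding only the raw data file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample1.CEL").write_bytes(b"raw")
    rows = [
        {"Array Data File": "sample1.CEL", "Derived Array Data File": "sample1_norm.txt"}
    ]
    missing = find_missing_data_files(rows, tmp)

print(missing)
```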

The validator checks submitted IDF and SDRF files according to the validation rules and gives warning and error messages. Errors need to be resolved for submission.

Warning and error messages

Accession number

A GEA accession number is issued to each completed GEA experiment.

You can give reviewers access to private records by sending them a reviewer access token.

GEA accession numbers

Update submission

Update in each database

Withdraw archived objects

To withdraw an archived Experiment, please contact us.

Supplement: MD5

MD5 (Message Digest Algorithm 5) is a hash function that calculates a hash value (an MD5 number of 32 hexadecimal characters) for a given file. Because the MD5 number of a damaged file differs from that of the original, you can check whether a transferred file is intact by comparing the numbers before and after the file transfer.
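The before-and-after comparison can also be scripted. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file is read in chunks so that large data files do not need to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 number of a file as 32 lowercase hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Compare a file's MD5 number with the one recorded before transfer."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file1"
    sample.write_bytes(b"hello")
    before = md5_of_file(sample)
    intact = is_intact(sample, before.upper())

print(before, intact)
```

The comparison ignores letter case, so checksums written in uppercase match those written in lowercase.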

Obtain MD5 number (Linux)

Obtain the MD5 numbers of the files by executing,

$ md5sum file1 file2
9f6e6800cfae7749eb6c486619254b9c  file1
b636e0063e29709b6082f324c76d0911  file2

Obtain MD5 number (Mac OS X)

Obtain the MD5 numbers of the files by executing,

$ md5 file1 file2
MD5 (file1) = 9f6e6800cfae7749eb6c486619254b9c
MD5 (file2) = b636e0063e29709b6082f324c76d0911

Obtain MD5 number (Windows)

Install and run Fsum Frontend (sourceforge.net/projects/fsumfe/).
First, check "md5".

Generate md5 in the tool 1

After clicking the [+] button, open the data files that you need. You can select multiple files at the same time.

Generate md5 in the tool 2

Click the [Calculate hashes] button. The MD5 numbers of the files are displayed.
By clicking the [Export] button, you can obtain the list of MD5 numbers as an HTML, CSV, or XML file.

Generate md5 in the tool 3