DDBJ Data Analysis Challenge (Closed)

Last updated: 2016.9.30
"DDBJ Data Analysis Challenge 2016" has closed. Thank you very much for your participation.

The "DDBJ Data Analysis Challenge" is a machine learning competition using data from the International Nucleotide Sequence Database, one of the life science big data resources provided by DDBJ. Through this Challenge, college students and researchers outside the life sciences can also gain experience in machine learning and data mining. To make it easier to join, DDBJ provides the NIG Supercomputer System as a computing resource.

Participants

Anyone can participate in this Challenge. If you wish, you can use the NIG Supercomputer System for your data analysis. (Use of the NIG Supercomputer System is subject to the "Criteria for issuing user login accounts". Please read that page before applying for the NIG Supercomputer System.)

Time Line and Due Dates

 Start Accepting Applications: June 27, 2016
NIG Supercomputer System application
NIG Supercomputer System OSS Installation request
 Start Date: July 6, 2016
 Deadline for Applications: August 21, 2016
NIG Supercomputer System application
NIG Supercomputer System OSS Installation request
 End Date: August 31, 2016
 Result: September 30, 2016

Challenge Task

The DNA Data Bank of Japan (DDBJ) supports a big data resource called the DDBJ Sequence Read Archive (DDBJ SRA), which contains DNA sequences generated by high-throughput DNA sequencers. A secondary analytical database, the ChIP-Atlas database (Dr. Oki of Kyushu Univ.), provides annotation data for chromatin feature regions on genome sequences.

In this challenge task, please predict whether the genomic regions corresponding to the input DNA sequences include chromatin feature regions. A chromatin feature region is related to the on-off switching of gene expression and corresponds to a peak region on a genome sequence in the ChIP-Atlas database.
In the ChIP-Atlas database, each combination of a tissue/cell type condition (CellType class) and a functional type (Antigen class) is curated as one experimental condition.

The Challenge's target species is a plant. The number of conditions for the target plant is often over 100. For the Challenge, this number is reduced to eight conditions to save time spent on trial and error in data modelling.

------------------------------------
Input training data: 60,000 DNA sequences
Input test data: 10,000 DNA sequences
Output training data: correct answer sets for 8 conditions
------------------------------------

 Input
One input sequence consists of 200 bases, that is, an ACGT sequence fragment of length 200.
Each sequence is encoded as a 0/1 code [Example: AATGC ... = 10001000000100100100 ...], so
the encoded length of a sequence is 800 digits.
Code table: A = 1000, C = 0100, G = 0010, T = 0001, any other character = 0000
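The encoding above can be reproduced with a few lines of Python. This is only a minimal sketch based on the code table given here, not the organizers' own tooling. The function names `encode` and `decode`, the upper-casing of the input, and the use of "N" for an undecodable group are assumptions made for illustration.

```python
# Code table from the challenge description; any other character maps to 0000.
CODE = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}
DECODE = {digits: base for base, digits in CODE.items()}


def encode(seq: str) -> str:
    """Encode a DNA string as 0/1 digits, 4 digits per base."""
    # Upper-casing the input is an assumption; the source only lists A, C, G, T.
    return "".join(CODE.get(base, "0000") for base in seq.upper())


def decode(bits: str, unknown: str = "N") -> str:
    """Invert encode(); 0000 or any unrecognized group becomes `unknown`."""
    if len(bits) % 4 != 0:
        raise ValueError("encoded length must be a multiple of 4")
    groups = (bits[i:i + 4] for i in range(0, len(bits), 4))
    return "".join(DECODE.get(group, unknown) for group in groups)


print(encode("AATGC"))             # 10001000000100100100
print(len(encode("ACGT" * 50)))    # 800
```

A 200-base fragment therefore becomes 800 digits, and `decode` can be used to inspect a row of the data as a readable sequence.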

 Output
The correct answer sets for the 8 conditions are also encoded as a 0/1 code.
A value of one is a true answer, meaning that the input DNA sequence contains a chromatin feature region.
Likewise, zero is a false answer, meaning that it does not include a chromatin feature region.

At the submission stage, please submit the predicted probabilities of a true answer as a matrix with 10,000 rows (test axis)
and 8 columns (condition axis) on the University of Big Data website.
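Before uploading, it may help to check that a prediction matrix has the expected shape. The following is a minimal sketch in Python. The helper names `validate_submission` and `to_csv` are hypothetical, and CSV output is an assumption, since the file format expected by the website is not stated here.

```python
import csv
import io

N_TEST = 10_000      # rows: one per test sequence
N_CONDITIONS = 8     # columns: one per condition


def validate_submission(probs):
    """Raise ValueError unless probs is N_TEST rows of N_CONDITIONS probabilities."""
    if len(probs) != N_TEST:
        raise ValueError(f"expected {N_TEST} rows, got {len(probs)}")
    for i, row in enumerate(probs):
        if len(row) != N_CONDITIONS:
            raise ValueError(f"row {i}: expected {N_CONDITIONS} columns, got {len(row)}")
        for p in row:
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"row {i}: {p!r} is not a probability in [0, 1]")


def to_csv(probs) -> str:
    """Serialize the matrix as CSV text (assumed format, not confirmed by the source)."""
    validate_submission(probs)
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(probs)
    return buf.getvalue()


# Placeholder predictions: 0.5 everywhere, standing in for real model output.
baseline = [[0.5] * N_CONDITIONS for _ in range(N_TEST)]
text = to_csv(baseline)
print(text.count("\n"))  # 10000
```

Each value is the predicted probability that the corresponding test sequence contains a chromatin feature region under that condition, not a hard 0/1 label.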

 Data
(1) Available on the University of Big Data website (DDBJ-challenge.mat)
(2) NIG Supercomputer Phase2: /home/challenge/data/DDBJ-challenge.mat

Submit a Challenge

To submit an entry, please enroll in "University of Big Data". A Google account is required to sign up.
Please submit the predicted probabilities of a true answer as a matrix with 10,000 rows (test axis) and 8 columns (condition axis) on the University of Big Data website.
When you submit an entry, intermediate ranks and intermediate scores are displayed on the University of Big Data website. Intermediate ranks are displayed by nickname.

Challenge Award

We will announce the top three DDBJ Challenge winners on September 30, 2016. (If you wish to remain anonymous, we will announce your nickname.) Final ranks will be released by nickname on the University of Big Data website on September 1, 2016, 0:00 (JST).
The award winners will join as co-authors of a paper reporting on the DDBJ Data Analysis Challenge. For this reason, the top three Challenge winners will be asked to submit a report on their data model. We will also release their reports online.
We will list the nicknames of all DDBJ Data Analysis Challenge participants in the Acknowledgments of the paper.

 Award Winners of the DDBJ Data Analysis Challenge 2016 (September 30)

Use of NIG Supercomputer System

 Applications for the NIG Supercomputer System will open on June 27, 2016.
 NIG Supercomputer System application
When applying, please read "Criteria for issuing user login accounts".
We accept NIG Supercomputer System application requests from June 27 to August 21, 2016 from here. Please enter "DDBJ Challenge" as the Purpose of use. (example form)
Your account is valid until August 31, 2016 (JST).
If you would like to continue using the NIG Supercomputer System for your own life science research, please apply to renew your account from here.
Please specify the following items when you apply.
 Select topic: Please specify "NIG Supercomputer System".
 Subject: Application for Continuing DDBJ Challenge account
* At the end of each fiscal year, a report must be submitted on the results or progress made in using the NIG Supercomputer.
We will send the password by postal mail within two weeks of your application.
 If you already have an NIG Supercomputer account: we will create a group for the DDBJ Data Analysis Challenge in your Supercomputer account. Please apply from here.

 NIG Supercomputer System OSS Installation request
We accept NIG Supercomputer System OSS Installation requests from June 27 to August 21, 2016 from here.
* Please note that installation takes 7 - 10 days after sending your request, and in some cases installation is unavailable.
 Please refer to the following pages for basic procedures for using the NIG Supercomputer System.
 Login connection, submit SGE job
 Set up your programming environment (R, MATLAB, Python)

Use of MATLAB

During the Challenge period, MathWorks Japan Inc. provides MATLAB software for the DDBJ Data Analysis Challenge.

 MATLAB is available only to DDBJ Data Analysis Challenge participants.
 There are two ways to use MATLAB R2016a.
(1) Install on a local PC
(2) NIG Supercomputer GPU node
 To download MATLAB to a local PC, please apply from here.
(Anyone can apply, whether or not they are a student.)
Please specify the following items when you apply.

 University name: Please enter your company name or school name.
 Team name: Please enter your name or a nickname.
 Team member: Please enter "1".

Contact us

 Question about DDBJ Data Analysis Challenge
 Question about NIG Supercomputer System application, OSS Installation request
Please send your question via the "Contact us" web form.

Links

 Machine Learning Solution Page
 Deep Learning with MATLAB

 Training a Deep Neural Network for Digit Classification – Example Code for MATLAB
 Machine Learning Made Easy (Web Seminar)
 Machine Learning with MATLAB (Web Seminar)

Corporate Sponsor

 MathWorks Japan
During the Challenge period, MathWorks Japan Inc. provides MATLAB software for the DDBJ Data Analysis Challenge.
 About license configuration

DDBJ Challenge Committee

 DDBJ Challenge Committee

Eli Kaminuma, PhD : Center for Information Biology, National Institute of Genetics, Assistant Professor
Hisashi Kashima, PhD : Department of Intelligence Science and Technology, Kyoto University, Professor
Toshihisa Takagi, PhD : Center for Information Biology, National Institute of Genetics, Professor

The DDBJ Data Analysis Challenge has been approved through ethical review by the NIG Institutional Review Board (IRB).
