September 30, 2016
DDBJ Data Analysis Challenge Committee

The DNA Data Bank of Japan (DDBJ) held the DDBJ Data Analysis Challenge from July 6 through August 31, 2016. The DDBJ Data Analysis Challenge is a machine learning competition using the International Nucleotide Sequence Database, the life science big data provided by DDBJ, EBI, and NCBI. Participants submitted their machine learning models to the collaborative website UnivOfBigData. The task for this challenge was "Predicting chromatin features from DNA sequence". There were 38 participants in total, with a cumulative total of 360 model submissions. We are pleased to announce the top three Award Winners and the Student Prize winner, awarded to the highest-ranked student participant, of the DDBJ Data Analysis Challenge 2016.

DDBJ Data Analysis Challenge 2016 Award Winners

1st Prize of DDBJ Challenge Awards 2016
MOCHIZUKI Masahiro (Information and Mathematical Science and Bioinformatics Co., Ltd.)

2nd Prize of DDBJ Challenge Awards 2016
MATSUMOTO Hirotaka (representative*) and OZAKI Haruka* (RIKEN ACCC Bioinformatics Research Unit)
*They participated in this Challenge as a team.

3rd Prize of DDBJ Challenge Awards 2016
OKAYAMA Toshitsugu (BITS Co., Ltd.)

Student Prize of DDBJ Challenge Awards 2016
KATO Takuya (first-year master's student, Graduate School of Information Science and Technology, The University of Tokyo)


Results

1st Prize (AUC 0.94564)
  Model Design:
    * 2 Classifiers (Extremely Randomized Trees, CNN)
    * Ensemble Learning (Stacking)
    * External Data (Genomic Position, Gene Structure Annotation)
  Tool Versions:
    python=3.5
    scikit-learn=0.17.1
    chainer=1.10.0

2nd Prize (AUC 0.89859)
  Model Design:
    * 2 Classifiers (CNN, Product of Genomic Distance Decay Parameter and Nearest Training Data Output)
    * Ensemble Learning (Averaged)
    * External Data (Genomic Position)
  Tool Versions:
    julia=0.4.6
    python=2.7.10
    skflow (tensorflow=0.8.0)

3rd Prize (AUC 0.85428)
  Model Design:
    * 7 Classifiers (Naive Bayes for Multivariate Bernoulli Models, Logistic Regression, Random Forest, Gradient Boosting, Extremely Randomized Trees, eXtreme Gradient Boosting, CNN)
    * Ensemble Learning (Stacking)
  Tool Versions:
    python=2.7.11
    numpy=1.10.4
    scikit-learn=0.17
    chainer=1.11.0
    xgboost=0.4a30

Student Prize (AUC 0.84318)
  Model Design:
    * 3 Classifiers (LeNet-like CNN, DeepBind-like CNN, Variable-filter DeepBind-like CNN)
    * Ensemble Learning (Soft Voting)
  Tool Versions:
    python=2.7
    lasagne=0.2.dev1
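
For readers unfamiliar with the stacking design used by the 1st and 3rd Prize entries, the sketch below shows a minimal two-level stacking ensemble in scikit-learn. It uses synthetic data from make_classification in place of the actual sequence-derived features, and a GradientBoostingClassifier as a stand-in for the CNN branch; it is an illustrative sketch only, not any winner's actual pipeline.

# Minimal stacking sketch: base learners produce out-of-fold probabilities,
# and a logistic-regression meta-learner combines them. Synthetic data stands
# in for the challenge's sequence features; this is not the winning pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for sequence-derived features and chromatin labels.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

base_models = [
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),  # stand-in for the CNN branch
]

# Level-1 features: out-of-fold predicted probabilities of each base model.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit base models on the full training set to build test-time meta-features.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# Level-2 meta-learner combines the base-model outputs.
meta = LogisticRegression()
meta.fit(train_meta, y_train)
print("stacked AUC:", roc_auc_score(y_test, meta.predict_proba(test_meta)[:, 1]))

By contrast, the soft-voting design of the Student Prize entry would simply average the base models' predicted probabilities, without a second-level learner.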

Contact: Please contact us via the DDBJ Contact Web Form.