The DNA Data Bank of Japan (DDBJ) held the DDBJ Data Analysis Challenge from July 6 through August 31, 2016. The DDBJ Data Analysis Challenge is a machine learning competition that uses the “International Nucleotide Sequence Database”, the life science big data provided by DDBJ, EBI, and NCBI. Participants submit their machine learning models to a collaborative website, UnivOfBigData. The task for this challenge was “Predicting chromatin features from DNA sequence”. A total of 38 participants made 360 model submissions. We announce the top three award winners and the Student Prize winner, who ranked highest among all student participants in the DDBJ Data Analysis Challenge 2016.
DDBJ Data Analysis Challenge 2016 Award Winners
| Award | Winner |
|---|---|
| 1st Prize of DDBJ Challenge Awards 2016 | Information and Mathematical Science and Bioinformatics Co., Ltd. MOCHIZUKI Masahiro |
| 2nd Prize of DDBJ Challenge Awards 2016 | RIKEN ACCC Bioinformatics Research Unit MATSUMOTO Hirotaka (representative)* and OZAKI Haruka* (*participated in this Challenge as a team) |
| 3rd Prize of DDBJ Challenge Awards 2016 | BITS Co., Ltd. OKAYAMA Toshitsugu |
| Student Prize of DDBJ Challenge Awards 2016 | Master's Degree Program 1, Graduate School of Information Science and Technology, The University of Tokyo KATO Takuya |
Results
| DDBJ Challenge Award | AUC | Model Design | Tool Version |
|---|---|---|---|
| 1st Prize | 0.94564 | 2 Classifiers (Extremely Randomized Trees, CNN); Ensemble Learning (Stacking); External Data (Genomic Position, Gene Structure Annotation) | python=3.5, scikit-learn=0.17.1, chainer=1.10.0 |
| 2nd Prize | 0.89859 | 2 Classifiers (CNN, Product of Genomic Distance Decay Parameter and Nearest Training Data Output); Ensemble Learning (Averaged); External Data (Genomic Position) | julia=0.4.6, python=2.7.10, skflow (tensorflow=0.8.0) |
| 3rd Prize | 0.85428 | 7 Classifiers (Naive Bayes for Multivariate Bernoulli Models, Logistic Regression, Random Forest, Gradient Boosting, Extremely Randomized Trees, eXtreme Gradient Boosting, CNN); Ensemble Learning (Stacking) | python=2.7.11, numpy=1.10.4, scikit-learn=0.17, chainer=1.11.0, xgboost=0.4a30 |
| Student Prize | 0.84318 | 3 Classifiers (LeNet-like CNN, DeepBind-like CNN, Variable-filter DeepBind-like CNN); Ensemble Learning (Soft Voting) | python=2.7, lasagne=0.2.dev1 |
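For readers unfamiliar with the terms in the Results table, the following is a minimal sketch of averaging-style ensembling (soft voting) and of the AUC metric. It is not any winner's code and not necessarily how the organizers computed the scores. The function names `soft_vote` and `roc_auc` and all data values are hypothetical, chosen only for illustration, and the sketch uses plain Python rather than the libraries listed above.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities of several models, example by example."""
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is O(n_pos * n_neg), fine for a toy example only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (made up): 1 = chromatin feature present, 0 = absent.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.6, 0.4, 0.3, 0.8, 0.2]  # hypothetical model A probabilities
model_b = [0.6, 0.2, 0.7, 0.5, 0.3, 0.4]  # hypothetical model B probabilities

ensemble = soft_vote([model_a, model_b])

if __name__ == "__main__":
    print("AUC model A :", roc_auc(labels, model_a))
    print("AUC model B :", roc_auc(labels, model_b))
    print("AUC ensemble:", roc_auc(labels, ensemble))
```

In this toy case the two models make different mistakes, so the averaged scores rank the examples better than either model alone. Stacking, used by the 1st and 3rd Prize entries, differs in that a second-level model is trained on the base models' outputs instead of taking a fixed average.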
Contact: Please contact us via the DDBJ Contact Web Form.