Joaquin Vanschoren

Welcome. I am an assistant professor of Machine Learning at the Eindhoven University of Technology. My research focuses on the automation of machine learning and networked science. I founded OpenML.org, a collaborative machine learning platform where scientists can automatically log and share data, code, and experiments, and which automatically learns from all this data to help people perform machine learning better and more easily. My other passion is large-scale data analysis on all types of data (social, streams, geo-spatial, sensors, networks, text).

Curriculum Vitae

News

2016.12.09 – Invited Talk @ NIPS 2016 - Challenges in Machine Learning Workshop

This session will be about Gaming and Education. Looking forward to seeing you at NIPS, and thanks to Isabelle Guyon and the other organizers.

2016.11.11 – Keynote Talk @ Dutch Society for Pattern Recognition

A very inspirational event with many examples of machine learning on medical data. Thanks to Veronika Cheplygina for inviting me!

2016.11.09 – OpenML won the Dutch Data Prize!

Thanks so much to the organizers for stimulating open science through this award, and thanks to the fantastic OpenML team for making it all happen!

2016.10.28 – Open Science Radio interviewed Heidi Seibold and me about OpenML. Have a listen!

Thanks to Matthias Fromm and Konrad Förstner for running a super-interesting podcast, and for giving us the opportunity to talk about OpenML!

2016.06.22 – Talk @ IBM Watson Research Center, NY

Thanks to Meinolf Sellmann, Horst Samulowitz and Josep Pon for a great day and interesting discussions at IBM.

2015.12.16 – Invited Talk @ Data@Sheffield [Slides]

A tutorial on OpenML targeted at scientists from many domains, at the Open Data Science @ Sheffield workshop and Data Hide event. Many, many thanks to Neil Lawrence and the Open Data Science Initiative for a splendid visit and engaging discussions.

2015.11.17 – Talk @ High Tech Campus Technology Seminar

Short introduction to OpenML, with applications in Healthcare, at the High Tech Campus Eindhoven.

2015.10.22 – Horizon Talk @ IDA 2015 [Slides]

In this Horizon talk, I proposed the idea of a data science collaboratory, where scientists across domains can collaborate effortlessly using each other's data and code. Joint work with Bernd Bischl, Frank Hutter, Michele Sebag, Balazs Kegl, Matthias Schmid, Giulio Napolitano, Katy Wolstencroft, Alan R. Williams, and Neil Lawrence.

2015.08.10 – Invited Talk @ RGU IDEA Seminar

I had the opportunity to present OpenML to the Robert Gordon University CS department and BCS Aberdeen. Thanks to Daniel C. Doolan and Farzan Majdani who made my visit possible. Thanks to Norman Bain for the video.

2015.07.21 – Invited Talk @ Statistical Computing 2015

On networked science, OpenML and using OpenML from statistical environments such as R. Followed by a hands-on tutorial by Giuseppe Casalicchio and Bernd Bischl. Thanks to Matthias Schmid.

2015.07.11 – Invited Talk @ ICML 2015 - AutoML Workshop [Slides]

On OpenML and building systems that learn from machine learning experiments, to assist people while analyzing data, or automate the process altogether. Thanks to Balazs Kegl and Frank Hutter.

2014.10.20 – Successful OpenML 2014 Workshop @ TU/e

Including a 4-day hackathon and great presentations. All presentations archived by the TIB (German National Library for Science and Technology).

2014.08.11 – KDnuggets discusses OpenML

Nice article by Ran Bi.

2014.07.04 – Invited Talk @ ECDA 2014 [Slides]

On open science, machine learning, OpenML and the benefits it brings for machine learning research, individual scientists, as well as students and practitioners.

2014.06.17 – Talk @ VIPx Eindhoven

On designed serendipity, or how discoveries are made by openly sharing data and ideas.

2013.09.20 – Invited Talk @ CLADAG 2013

Presenting the first beta version of OpenML. Thanks to John Shawe-Taylor and the PASCAL 2 Network.

2012.10.23 – HARVEST Grant from the PASCAL 2 Network.

The funding received will support work on OpenML, a new system to automatically share and reuse reproducible machine learning experiments. Together with Bernd Bischl, Luis Torgo, KNIME and RapidMiner.

2012.09.26 – Quest magazine covers our large-scale sensor data analysis research.

BiGGrid also interviewed me and made a nice video.

2012.06.26 – Free Competition research grant from the Dutch Scientific Research Foundation.

The funding received will support work on Massively Collaborative Data Mining. Master student Jan N. van Rijn will start his PhD on this topic.

2012.04.12 – Invited Talk @ Dutch Hadoop User Group (NL-HUG)

Presenting work on large-scale sensor data analysis using Hadoop.

2010.12.07 – Best Application Award @ SARA Hadoop training program

For programming Hadoop procedures for terabyte-scale sensor data analysis.

2009.09.11 – Best Demo Award @ ECMLPKDD 2009

For a demonstration of Experiment Databases for machine learning. Together with Hendrik Blockeel.

Research

Publications

Also see Google Scholar

Journals and journal proceedings

  1. Mantovani, R.G., Horvath, T., Cerri, R., Carvalho, A.P.L.F., Vanschoren, J. Hyper-parameter Tuning of a Decision Tree Induction Algorithm. Brazilian Conference on Intelligent Systems (BRACIS 2016)
  2. Eerikainen, L.M., Vanschoren, J., Rooijakkers, M.J., Vullings, R., Aarts, R.M. Reduction of false arrhythmia alarms using signal selection and machine learning. Physiological Measurement, 37 (8), 1204-1216, 2016
  3. Bischl, B., Kerschke, P., Kotthoff, L., Lindauer, M., Malitsky, Y., Frechette, A., Hoos, H., Hutter, F., Leyton-Brown, K., Tierney, K., Vanschoren, J. ASlib: A Benchmark Library for Algorithm Selection. Artificial Intelligence, 237, 41-58, 2016
  4. Gao, B., Berendt, B. and Vanschoren, J. Towards understanding online sentiment expression - An interdisciplinary approach with subgroup comparison and visualization. Social Network Analysis and Mining, 6 (1), 68:1-68:16, 2016
  5. van Rijn, J.N., Holmes, G., Pfahringer, B., Vanschoren, J. Having a Blast: Meta-Learning and Heterogeneous Ensembles for Data Streams. IEEE Proceedings of ICDM 2015
  6. Vanschoren, J., Bischl, B., Hutter, F., Sebag, M., Kegl, B., Schmid, M., Napolitano, G., Wolstencroft, K., Williams, A.R., Lawrence, N. Towards a Data Science Collaboratory. Advances in Intelligent Data Analysis XIV (IDA 2015), Lecture Notes in Computer Science 9385, XIX-XXI
  7. van Rijn, J.N., Abdulrahman, S.M., Brazdil, P. and Vanschoren, J. Fast Algorithm Selection Using Learning Curves. Advances in Intelligent Data Analysis XIV (IDA 2015), Lecture Notes in Computer Science 9385, 298-309
  8. Vanschoren, J., van Rijn, J.N. and Bischl, B. Taking machine learning research online with OpenML. JMLR Workshop and Conference Proceedings (BigMine 2015), 41, 1-4, 2015
  9. Eerikainen, L.M., Vanschoren, J., Rooijakkers, M.J., Vullings, R., Aarts, R.M. Decreasing the False Alarm Rate of Arrhythmias in Intensive Care Using a Machine Learning Approach. IEEE Computing in Cardiology, 42, 293-297, 2015
  10. Gao, B., Berendt, B. and Vanschoren, J. Who is more positive in private? Analyzing sentiment differences across privacy levels and demographic factors in Facebook chats and posts. IEEE/ACM Proceedings of ASONAM 2015, 605-610
  11. Mantovani, R.G., Rossi, A.L.D., Vanschoren, J., Bischl, B. and Carvalho, A.C.P.L.F. To tune or not to tune: Recommending when to adjust SVM hyper-parameters via meta-learning. IEEE Proceedings of IJCNN 2015, 1-8
  12. Mantovani, R.G., Rossi, A.L.D., Vanschoren, J., Bischl, B. and Carvalho, A.C.P.L.F. Effectiveness of Random Search in SVM hyper-parameter tuning. IEEE Proceedings of IJCNN 2015, 1-8
  13. van Rijn, J.N., Holmes, G., Pfahringer, B. and Vanschoren, J. Algorithm Selection on Data Streams. Proceedings of Discovery Science 2014. Lecture Notes in Computer Science 8777, 325-336.
  14. Vanschoren, J., van Rijn, J.N., Bischl, B. and Torgo, L. OpenML: networked science in machine learning. ACM SIGKDD Explorations, 15 (2), 49-60, 2013
  15. Serban, F.*, Vanschoren, J.*, Kietz, J.U. and Bernstein, A. A Survey of Intelligent Assistants for Data Analysis. ACM Computing Surveys, 45 (3), Art. 31, 2013
  16. Vanschoren, J., Blockeel, H., Pfahringer, B. and Holmes, G. Experiment Databases: A new way to share, organize and learn from experiments. Machine Learning, 87(2), 127-158, 2012
  17. van Rijn, J., Bischl, B., Torgo, L., Gao, B., Umaashankar, V., Fischer, S., Winter, P., Wiswedel, B., Berthold, M.R., and Vanschoren, J. OpenML: A Collaborative Science Platform. Proceedings of ECMLPKDD 2013, Lecture Notes in Computer Science 8190, 645-649
  18. Reuttemann, P., Vanschoren, J. Scientific Workflow Management with ADAMS. Proceedings of ECMLPKDD 2012, Lecture Notes in Computer Science 7524, 833-837
  19. Vespier, U., Knobbe, A.J., Nijssen, S., Vanschoren, J. MDL-Based Analysis of Time Series at Multiple Time-Scales. Proceedings of ECMLPKDD 2012, Lecture Notes in Computer Science 7524, 371-386
  20. Leite, R., Brazdil P., Vanschoren, J. Selecting Classification Algorithms with Active Testing. Proceedings of MLDM 2012, Lecture Notes in Computer Science 7376, 117-131
  21. Vespier, U., Knobbe, A., Vanschoren, J., Miao, S., Koopman, A., Obladen, B., and Bosma, C. Traffic Events Modeling for Structural Health Monitoring. Proceedings of IDA 2011, Lecture Notes in Computer Science 7014, 276-387
  22. Vanschoren, J., Blockeel, H. A community-based platform for machine learning experimentation. Proceedings of ECMLPKDD 2009, Lecture Notes In Computer Science 5782, 750-754
  23. Vanschoren, J., Pfahringer, B., Holmes, G. Learning from the past with experiment databases. Proceedings of PRICAI 2008, Lecture Notes in Artificial Intelligence 5351, 485-496
  24. Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G. Organizing the world's machine learning information. Proceedings of ISOLA 2008, Communications in Computer and Information Science, 17, 693-708
  25. Vanschoren, J., Blockeel, H. Investigating classifier learning behavior with experiment databases. Proceedings of GfKL 2008, Data Analysis, Machine Learning and Applications, 421-428
  26. Blockeel, H.*, Vanschoren, J.* Experiment databases: Towards an improved experimental methodology in machine learning. Proceedings of ECMLPKDD 2007, Lecture Notes in Computer Science 4702, 6-17
  (* Joint first author)

Peer reviewed conference and workshop proceedings

  1. Zhang, C., van Wissen, A., Lakens, D., Vanschoren, J., de Ruyter, B.E.R., IJsselsteijn, W.A. Anticipating habit formation: a psychological computing approach to behavior change support. UbiComp Adjunct, 2016: 1247-1254
  2. Bischl, B., Bossek, J., Casalicchio, G., Hofner, B., Kerschke, P., Kirchhoff, D., Lang, M., Seibold, H., Vanschoren, J. Connecting R to the OpenML project for Open Machine Learning. useR Conference 2016
  3. Abdulrahman, S., Brazdil, P., van Rijn, J.N., Vanschoren, J. Algorithm Selection via Meta-learning and Sample-based Active Testing. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 55-66
  4. Mantovani, R.G., Rossi, A.L.D., Vanschoren, J., Carvalho, A.C.P.L.F. Meta-learning Recommendation of Default Hyper-parameter Values for SVMs in Classification Tasks. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 80-92
  5. van Rijn, J.N., Vanschoren, J. Sharing RapidMiner Workflows and Experiments with OpenML. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 93-103
  6. Vukicevic, M., Radovanovic, S., Vanschoren, J., Napolitano, G., Delibasic, B. Towards a Collaborative Platform for Advanced Meta-Learning in Healthcare Predictive Analytics. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 112-114
  7. Knobbe, A.J., Meeng, M., Vanschoren, J., Rees Jones, S., Merlo Penning, S. Reconstructing Medieval Social Networks from English and Latin Charters. Population Reconstruction 2014
  8. van Rijn, J.N., Holmes, G., Pfahringer, B. and Vanschoren, J. Towards Meta-learning on Data Streams. MetaSel Workshop @ ECAI 2014, CEUR Workshop Proceedings, 1201, 37-38
  9. Vanschoren, J., Braun, M. and Ong, C.S. Open science in machine learning. Proceedings of CLADAG 2013, 462-465.
  10. van Rijn, J., Umaashankar, V., Fischer, S., Bischl, B., Torgo, L., Gao, B., Winter, P., Wiswedel, B., Berthold, M.R., and Vanschoren, J. A RapidMiner extension for Open Machine Learning. Proceedings of RCOMM 2013, 59-70.
  11. van Rijn, J. and Vanschoren, J. OpenML: An Open Science Platform for Machine Learning. Machine Learning Conference of Belgium and The Netherlands 2013, 99-100
  12. Miao, S., Vespier, U., Vanschoren, J., Knobbe, A.J., De Gouveia da Costa Cachucho, R.E. Modeling Sensor Dependencies between Multiple Sensor Types. Machine Learning Conference of Belgium and The Netherlands 2013, 66-73
  13. Vanschoren, J. The Experiment Database for machine learning. PlanLearn Workshop @ ECAI 2012, CEUR Workshop Proceedings, 950, 30-37
  14. Leite, R., Brazdil P., Vanschoren, J. Selecting Classification Algorithms with Active Testing on Similar Datasets. PlanLearn Workshop @ ECAI 2012, CEUR Workshop Proceedings, 950, 30-37
  15. Vespier, U., Knobbe, A., Nijssen, S., Vanschoren, J. MDL-Based Identification of Relevant Temporal Scales in Time Series. Workshop on Information Theoretic Methods in Science and Engineering, WITMSE 2012
  16. Gao, B. and Vanschoren, J. Visualizations of Machine Learning Behavior with Dimensionality Reduction Techniques. Machine Learning Conference of Belgium and The Netherlands 2011, 35-42.
  17. Miao, S., Knobbe, A., Vanschoren, J., Vespier, U., Koopman, A., Cachucho, R., Chen, X. A Range of Data Mining Techniques to Correlate Multiple Sensor Types. Dutch-Belgian Database Day 2011, Art.5
  18. Vanschoren, J., Soldatova, S. Exposé: An Ontology for Data Mining Experiments. Workshop on Third Generation Data Mining @ ECMLPKDD 2010, 31-46
  19. Vanschoren, J., Soldatova, S. Collaborative Meta-Learning. Planning to Learn workshop @ ECAI 2010, 37-46
  20. Vanschoren, J., Blockeel, H. Stand on the shoulders of giants: towards a portal for collaborative experimentation in data mining. 3rd Generation Data Mining Workshop @ ECMLPKDD 2009, 88-99
  21. Bauzá, M., Vanschoren, J., Funes, M.P., Barrera, G.M., López De Luise, D. Sistema de Autentificación Facial. Congreso de Inteligencia Computacional Aplicada (CICA) 2009
  22. Vanschoren, J. Experiment databases for machine learning. NIPS Workshop on Machine Learning Open Source Software @ NIPS 2008
  23. Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G. Experiment databases: Creating a new platform for meta-learning research. Planning to Learn Workshop @ ICML 2008, 10-15
  24. Vanschoren, J., Van Assche, A., Vens, C., Blockeel, H. Meta-learning from experiment databases: An illustration. Machine Learning Conference of Belgium and The Netherlands 2007, 120-127
  25. Vanschoren, J., Blockeel, H. Towards understanding learning behavior. Machine Learning Conference of Belgium and The Netherlands 2006, 89-96

Book chapters

  1. Lawrynowicz, A., Esteves, D., Panov, P., Soru, T., Dzeroski, S., Vanschoren, J. An Algorithm, Implementation and Execution Ontology Design Pattern. In: Studies on the Semantic Web (forthcoming), 2016
  2. Vanschoren, J., Vespier, U., Miao, S., Cachucho, R. and Knobbe, A. Large-scale sensor network analysis. In: Big Data Management, Technologies, and Applications (W-C. Hu, N. Kaabouch, ed.), IGI Global, 2013
  3. Vanschoren, J. Meta-learning architectures. In: Meta-learning in Computational Intelligence (N. Jankowski, W. Duch, K. Grabczewski, ed.), Springer, 2011
  4. Berendt, B., Vanschoren, J. and Gao, B. Datenanalyse und -visualisierung. In: Handbuch Forschungsdatenmanagement (S. Büttner, H-C. Hobohm, L. Müller, ed.), Bock+Herchen, 2011
  5. Vanschoren, J., Blockeel, H. Experiment Databases. In: Inductive Databases and Constraint-Based Data Mining (S. Dzeroski, B. Goethals, P. Panov, ed.), Springer, 2010

Books and proceedings edited

  1. Vanschoren, J., Brazdil, P., Giraud-Carrier, C.G., Kotthoff, L. (Eds.) Proceedings of the 2015 International Workshop on Meta-Learning and Algorithm Selection @ ECMLPKDD. CEUR Workshop Proceedings 1455, CEUR 2015
  2. Vanschoren, J., Brazdil, P., Soares, C., Kotthoff, L. (Eds.) Proceedings of the 2014 International Workshop on Meta-Learning and Algorithm Selection @ ECAI. CEUR Workshop Proceedings 1201, CEUR 2014
  3. Vanschoren, J., Brazdil, P., Kietz, J-U. (Eds.) Proceedings of the International Workshop on Planning to Learn @ ECAI. CEUR Workshop Proceedings 950, CEUR 2012
  4. Vanschoren, J., Duivesteijn, W. (Eds.) The Silver Lining. Proceedings of the International Workshop on Learning from Unexpected Results @ ECMLPKDD. Leiden University
  5. van der Putten, P.H.W., Veenman, C., Vanschoren, J., Israel, M., Blockeel, H. (Eds.) Proceedings of the 20th Annual Belgian-Dutch Conference on Machine Learning. Leiden University, 2011

Dissertations

  1. Vanschoren, J. Understanding Machine Learning Performance with Experiment Databases. PhD Thesis, Katholieke Universiteit Leuven, 2010
  2. Vanschoren, J. A framework for high-level perception. MSc Thesis, Katholieke Universiteit Leuven, 2005

Invited Talks

  1. OpenML in research and education. Workshop on Challenges in Machine Learning @ NIPS 2016, 9 December 2016
  2. Democratizing and Automating Machine Learning. Dutch Society for Pattern Recognition, 11 November 2016
  3. Collaborative Machine Learning. IBM Watson Research Center, 22 June 2016
  4. Collaborative Machine Learning. Open Data Science Sheffield, 16 December 2015
  5. Towards a Data Science Collaboratory (Horizon Talk). Intelligent Data Analysis 2015, 22 October 2015
  6. Towards Networked and Automated Machine Learning. IDEA Seminar, Robert Gordon University, 10 August 2015
  7. OpenML: Networked Science in Machine Learning. Statistical Computing 2015, 21 July 2015
  8. OpenML: A Foundation for Networked and Automatic Machine Learning. AutoML Workshop @ ICML 2015, 11 July 2015
  9. OpenML: Networked science in machine learning. Université Paris-Saclay, INRIA, 4 November 2014
  10. OpenML: Open science in machine learning. ECDA 2014, 4 July 2014
  11. OpenML: Open science in machine learning. TU Dortmund, CS Department, 30 January 2014
  12. Open science in machine learning. CLADAG 2013, 20 September 2013
  13. Data Science and sensor data. Dutch Hadoop User Group, 12 April 2012

Awards

  1. Best Demo Award, ECMLPKDD 2009
  2. Best Application Award, SARA Hadoop Day 2010

Service

PhD Jury Membership

  1. Jakub Smid, Charles University Prague, Sep 2016
  2. Bo Gao, Katholieke Universiteit Leuven, Dec 2015

Conference organization

  1. General Chair, Learning and Intelligent Optimization Conference (LION 2016)
  2. Associate Chair, European Conference on Machine Learning (ECMLPKDD 2013)
  3. Program Chair, Machine Learning Conference of Belgium and the Netherlands (Benelearn 2011)
  4. Program Chair, Machine Learning Conference of Belgium and the Netherlands (Benelearn 2010)

Workshop chair

  1. Configuration and Selection of Algorithms (COSEAL 2016)
  2. Open Machine Learning Developer Workshop (OpenMLdev 2016)
  3. Automatic Machine Learning Workshop (AutoML)
  4. Open Machine Learning @ Lorentz Center (OpenML 2016)
  5. Open Machine Learning (OpenML 2015)
  6. Metalearning and Algorithm Selection @ ECMLPKDD 2015 (MetaSel 2015)
  7. Open Machine Learning (OpenML 2014)
  8. Metalearning and Algorithm Selection @ ECAI 2014 (MetaSel 2014)
  9. The Silver Lining, Learning from Unexpected Results @ ECMLPKDD 2012 (Silver 2012)
  10. Planning to Learn @ ECAI 2012 (PlanLearn 2012)

Journal referee

  • Machine Learning Journal (MLJ)
  • Journal of Machine Learning Research (JMLR)
  • Data Mining and Knowledge Discovery (DaMi)
  • Semantic Web Journal (SWJ)
  • Computational Intelligence (COIN)

Programme committee member

  • ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016)
  • European Conference on Machine Learning (ECMLPKDD 2012-2015)
  • Extended Semantic Web Conference (ESWC 2011 2015)
  • European Conference on Artificial Intelligence (ECAI 2014)
  • Knowledge Discovery and Information Retrieval (KDIR 2010-2012)

Research visits

  • Robert Gordon University, Aberdeen, UK (August 9-12, 2015)
  • University of Bournemouth, UK (February 16-19, 2015)
  • INRIA-Saclay, Paris, France (November 3-7, 2014)
  • University of Dortmund, Germany (January 27-31, 2014)
  • University of Waikato, New Zealand (February-March 2011)
  • Universities of Geneva and Zurich, Switzerland (June 14-18, 2010)
  • University of Porto, Portugal (June 7-11, 2010)
  • University of Aberystwyth, UK (July-August, 2009)
  • Jozef Stefan Institute, Slovenia (July 4-11, 2009)
  • University of Waikato, New Zealand (March-June, 2008)
  • University of Indiana, USA (August 2004)

Web Technology

The web today is a growing universe of interlinked web pages and web apps, teeming with interactive content. It is the result of the ongoing efforts of an open web community that helps define web technologies, like HTML5, CSS3, JavaScript libraries and Web frameworks. This course provides the student with knowledge of and insight into the rapidly evolving field of web technology. The focus is on hands-on experience with a wide variety of these technologies, enabling students to develop their own web applications, from small interactive sites to the next Facebook.

Course details

Please check Canvas for course details and other relevant information.

Foundations of Data Mining

Machine learning is the science of making computers act without being explicitly programmed. Instead, algorithms are used to find patterns in data. It is so pervasive today that you probably use it dozens of times a day without knowing it, for instance in web search, speech recognition, and (soon) self-driving cars. It is also a crucial component of data-driven industry (Big Data), scientific discovery, and modern healthcare. In this class, you will learn the foundations of how data mining and machine learning work internally, understand when and how to use key concepts and techniques, and gain hands-on experience in getting them to work for yourself. You'll learn about the theoretical underpinnings of data analysis, and leverage that to quickly and powerfully apply this knowledge to tackle new problems.

This course on Canvas.

This course on OASE.

People

  • dr. ir. Joaquin Vanschoren (j.vanschoren@tue.nl) MF 7.104a - Responsible Lecturer
  • dr. Mykola Pechenizkiy (m.pechenizkiy@tue.nl) MF 7.099 - Lecturer
  • dr. Anne Driemel (a.driemel@tue.nl) MF 7.073 - Lecturer

Learning objectives

By the end of this course, you should be able to:

  • Understand how data mining algorithms work: how they find patterns in data.
  • Reason about when and how to use them, and apply them successfully in practice.
  • Understand the mathematical foundations of data mining techniques, and use this to derive fundamental properties.
  • Run practical experiments to experience first-hand how data mining algorithms behave on real data.
  • Explore how algorithm parameters and data properties affect the effectiveness of predictive models, and how better models can be built.
  • Formulate data analysis problems in the terminology of data mining.
  • Understand the challenges and common problems that occur when approaching data mining/machine learning problems (such as overfitting, curse of dimensionality) and how to counter these challenges (bias/variance trade-off, dimensionality reduction).

Required prior knowledge: While there are no strict requirements, it is highly recommended to have a working knowledge of statistics, and to have programming experience. Programming is part of the assignments. The course will mostly feature examples from R, but languages such as Python can also be used.

Course Structure

The course has the following weekly contact hours:

  • Mondays, 9:30 - 10:30: Q&A session (PAV J17)
  • Mondays, 10:45 - 12:30: Plenary Lectures (PAV J17)
  • Thursdays, 13:45 - 15:30: Plenary Lectures (AUD 16)

Evaluation

There is no exam. Students are evaluated using a series of 3 problem sets, containing both theoretical and practical assignments. Students work in teams of 2 people, and teams are rotated between problem sets.

Materials

We use Canvas for posting announcements, assignments, lecture slides, etc. It is your responsibility to keep up to date with postings and activities, but these will also be clearly announced in class or by email.

Schedule

This schedule is preliminary. The order may change and parts of lectures may be removed (or added).

Feb 1 Introduction to Data Mining
An overview of the field
Vanschoren
Feb 4 Similarity and Distances
Nearest neighbor, Jaccard similarity, Locality sensitive hashing, MinHashing, Nearest Neighbor search
Driemel
Spring break
Feb 15 Clustering
Lloyd's algorithm (kMeans), Gonzalez' algorithm
Driemel
Feb 18 Dimensionality Reduction
High-dimensional spaces, Random projections, PCA, Multidimensional scaling
Driemel
Feb 22 Metric embeddings
IsoMap, Frechet’s embedding, Bourgain’s embedding
Driemel
Feb 25 Machine Learning software
Interactive workshop on machine learning with R, Python and OpenML
Vanschoren
Feb 29 Rules and decision trees (Symbolic Learning)
Rule learning, separate-and-conquer, covering algorithm. Growing decision trees, information gain, regularization (pruning). Overfitting and other issues. First-Order rules, inverse deduction.
Vanschoren
Mar 3 Evaluation and optimization
Avoiding overfitting. Cross-validation. ROC analysis, Bias-Variance analysis. Optimizing hyperparameters.
Pechenizkiy
Mar 7 Instance-based learning (Learning by Analogy 1)
k-Nearest Neighbor, Locally weighted regression
Vanschoren
Mar 10 Kernel methods (Learning by Analogy 2)
Linear models, least-squares, Support Vector Machines, maximal margin, Kernel methods.
Vanschoren
Mar 14 Ensemble Learning (Cancelled due to illness)
Bagging, RandomForests, Boosting, AdaBoost
Pechenizkiy
Mar 17 Ensemble Learning (Cancelled due to illness)
Gradient boosting, Stacking
Pechenizkiy
Mar 21 Neural Networks (Connectionist Learning)
The perceptron. Single-layer neural networks.
Vanschoren
Mar 24 Neural Networks (Connectionist Learning)
Multi-layer neural networks, backpropagation. Deep learning, autoencoders.
Vanschoren
Mar 28 No lecture (TU/e closed)
Mar 31 Ensemble Learning (Catch up on cancelled lectures)
Vanschoren

Deadlines:

  • Assignment 1a: Feb 18
  • Assignment 1b: Feb 25
  • Assignment 1c: Mar 3
  • Assignment 2a: Mar 10
  • Assignment 2b: Mar 24
  • Assignment 2c: Mar 31
  • Assignment 3: Apr 14

Course Policies

Participation. As this class endeavors to teach professional skills, we ask that students act professionally and treat all course participants with respect. We also encourage you to offer your ideas and thoughts to the class and to question the material presented.

Assignments. Assignments are due at the time and in the manner specified in the assignment description. Late work will lose 33% of its original point-value for each day late, and once solutions are posted or discussed, late submissions will not be accepted.
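To make the late penalty concrete, here is a minimal sketch of the arithmetic. It assumes that days late are counted as whole days and that a score cannot drop below zero; neither detail is stated in the policy above, and the function name `late_score` is only illustrative. The rule that late submissions are refused once solutions are posted is not modelled.

```python
def late_score(original_points: float, days_late: int,
               penalty_per_day: float = 0.33) -> float:
    """Points kept after the late penalty (illustrative sketch only).

    Assumptions not stated in the policy: days_late is a whole number
    of days, and the result is clamped so it never goes below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Each day late removes 33% of the ORIGINAL point value,
    # so the penalty grows linearly rather than compounding.
    remaining_fraction = max(0.0, 1.0 - penalty_per_day * days_late)
    return original_points * remaining_fraction


if __name__ == "__main__":
    for days in range(5):
        print(days, round(late_score(10.0, days), 2))
```

Under these assumptions, a 10-point assignment handed in two days late keeps about 3.4 points, and after four days nothing is left.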

Plagiarism. Plagiarism and cheating will not be tolerated. University policy will be adhered to in all such cases. There is a difference between collaboration and plagiarism. Plagiarism is the act of using another’s work without giving them credit for it. Collaboration is the exchange of ideas, the debate of issues and the examination of readings among each other that enables you to arrive at your own independent thoughts and designs.

People

Current students

Graduate students and PhD students are the heart of the creative research and development work. At present I’m fortunate to work with the following PhD and Master students:

  • Jan N. van Rijn, PhD Student, Meta-learning on Stream data and OpenML
  • Rafael Mantovani, PhD Student, Meta-learning and Optimization
  • Chao Zhang, PhD Student, e-Coaching for Continuous Personal Health
  • Bo Gao, PhD Student, Social Networks and Privacy
  • Sjoerd van Bavel, Master Student, Predicting Heat Capacity in Greenhouses, 2016-2017
  • Roy Haanen, Master Student, Predicting Aircraft Performance on Final Approach, 2015-2016

Former Doctoral students

  • Karthik Srinivasan (PDEng), Preventing Burglaries and Other Incidents, TU Eindhoven, 2014-2015.

Former Master students

  • Chung-Kit Lee, Burglary Prediction Model, 2015-2016
  • Hilda F. Bernard, Enhanced Sleepiness Prediction with Improved Algorithm Selection and Hyperparameter optimization, 2015-2016
  • Mikhail Evchenko, Frugal Learning: Applying Machine Learning with Minimal Resources, 2015-2016
  • Kris van Tienhoven, Gamification for OpenML, 2015-2016
  • Ruben Moonen, Object Recognition Framework using information retrieval and machine learning techniques, 2013-2014
  • Anton den Hoed, MapReduce Algorithms for Time Series Data, 2011-2012
  • Mohammed Alaeikhanehshir, Data mining to improve customer service, 2011-2012
  • Thomas De Craemer, Algorithm for a Recommendation Engine, 2010-2011
  • Wouter Deroey, Semi-automated Corpus-based Ontology Population, 2010-2011
  • Xushuang Gao, Active meta-learning, 2009-2010
  • Bo Gao, Advanced visualizations for learning behavior, 2009-2010
  • Jeroen Peelaerts, Visualizing learning behavior, 2007-2008
  • Jan Callewaert, Simulating Biologically Inspired Brood Sorting in Ant-Like Agents, 2005-2006
  • Anton Dries, DM_square, Analysis of Data Mining Results Through Data Mining, 2005-2006