This page is intended for PhD students performing their research activities in the framework of the SmartData@PoliTO lab. While the lab is concerned with an array of different research activities covering multiple application areas, all of these are underpinned by the concept that handling large amounts of data requires smart methods and algorithms. Therefore, PhD students working in the lab can benefit greatly from learning and employing state-of-the-art methods in data science and machine learning during their doctoral program.
To this end, the lab offers its first- and second-year PhD students a streamlined selection of PhD courses providing hard skills, which together form an educational PhD track on Data Science and Machine Learning. The track covers foundational and application-oriented topics, with an interdisciplinary approach that leverages the educational offer of several departments of the Politecnico, including Electronics and Telecommunications, Control and Computer Engineering, Mathematical Sciences, Applied Science and Technology, and Energy. The courses are organized as follows. For each course we provide the number of hours, the department organizing the course, and a description of the relevance of the course to data science and machine learning.
Suggested educational credits
Statistics is a fundamental component of modern Data Science. This course reviews Probability Theory and Calculus (conditional probability, Bayes theorem, independence, limit theorems, marginal and conditional distributions, basic stochastic processes, simulation), the fundamental concepts of frequentist Statistics (hypothesis testing, confidence theory, prediction, design of experiments) and of Bayesian Statistics (prior/posterior dynamics, credible regions, decision theory), both univariate and multivariate, and their various computationally intensive implementations.
Note: students who already have a suitable background in statistics are advised to replace this course with Topological Data Analysis from the list of additional credits below.
The course is held by Pietro Laface and Sandro Cumani.
The availability of huge amounts of data in the most diverse application domains highlights the need for automatic algorithms and tools for information management and extraction, which allow mining relevant and actionable information from data. The course will provide an introduction to the main data mining techniques (classification, clustering and association rule mining) and will describe algorithmic implementations of the most relevant analysis techniques.
- Stochastic optimization and optimal learning: The course addresses decision making under uncertainty. This topic needs strong mathematical foundations. We consider, among other things, dynamic programming as the proper foundation of reinforcement learning. While standard methods deal with exogenous uncertainty, in several practical settings we may influence our degree of uncertainty by a proper choice of experiments. In this course, we consider online learning, i.e., stochastic dynamic programming models in which the state includes current knowledge. We do so by integrating optimization theory and Bayesian statistics, which is another essential topic in machine learning. Link to course page
- Numerical Optimization: Machine learning and numerical optimization are closely intertwined fields, since most machine learning problems call for the solution of a numerical (usually constrained) optimization problem. In particular, the availability of numerical methods well suited to solving different kinds of optimization problems effectively (such as linear, quadratic, nonlinear, and semidefinite programming) makes it possible to enlarge the set of models that can be considered in the machine learning framework. Furthermore, in order to tackle very large sets of data, it is of paramount importance to resort to numerical methods specifically designed for large-scale optimization problems. Link to course page
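As a small illustration of how a learning problem reduces to numerical optimization, the sketch below fits a line to a handful of points by gradient descent on the mean squared error. It is only an illustrative example, not material taken from the course: the data, the step size, the iteration count, and the function name `fit_line` are all assumptions, and the problem is unconstrained for simplicity.

```python
def fit_line(xs, ys, step=0.05, iterations=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error.

    The step size and iteration count are illustrative choices that
    happen to converge for the small data set used below.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        # Residuals of the current model on every data point.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(residual^2) with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= step * grad_w
        b -= step * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

On this tiny data set the iterates approach the generating parameters. Constrained and large-scale problems, as mentioned in the course description, call for more specialized methods.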
The goal of the course is to give an overview of the main data mining and machine learning techniques for analyzing textual data, and to introduce the main open-source tools currently available for text preparation and analysis. It presents the most common techniques for text preparation, transformation, and summarization. Furthermore, it discusses how to adapt established machine learning approaches to cope with textual datasets.
The course is held by Luca Cagliero.
This course addresses random projections, a powerful method for data representation and hashing which can be used for dimensionality reduction, security and a number of applications involving big data. The course covers the mathematical aspects of compressed sensing, algorithms for reconstructing the data from random projections, and a few key applications in data analysis and communications. Moreover, it also deals with recent techniques for joint design of hardware and algorithms based on lightweight signal adaptation.
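To give a concrete feel for random projections as a dimensionality reduction tool, the following minimal sketch maps a few high-dimensional points to a lower dimension with a Gaussian random matrix and compares pairwise distances before and after. It is an illustration only, not course material: the dimensions, the seed, the number of points, and all function names are assumptions chosen for the example.

```python
import math
import random


def random_projection_matrix(k, d, rng):
    """Return a k x d matrix with i.i.d. Gaussian entries of variance 1/k."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Multiply the projection matrix by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_ratios(points, matrix):
    """Ratio of projected to original distance for every pair of points."""
    projected = [project(matrix, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratios.append(
                distance(projected[i], projected[j]) / distance(points[i], points[j])
            )
    return ratios


rng = random.Random(0)  # fixed seed so the run is repeatable
d, k = 500, 200         # original and reduced dimensions (illustrative)
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(6)]
matrix = random_projection_matrix(k, d, rng)
ratios = distance_ratios(points, matrix)
print(f"distance ratios range from {min(ratios):.3f} to {max(ratios):.3f}")
```

The ratios stay close to 1, which is the distance-preserving behavior that makes random projections useful for dimensionality reduction. Reconstructing the original data from such projections, as in compressed sensing, additionally requires structure such as sparsity and is not shown here.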
This course presents a selection of prominent approaches for constructing predictive models of complex systems, starting from a finite but possibly large set of data collecting response samples from direct measurements or first-principle simulations. Such reduced-order models aim at reproducing the dynamic behavior of the underlying system in a compact approximate form, without any knowledge of its internal structure. From a methodological standpoint, this course bridges various approaches from the disciplines of system identification, approximation theory, model order reduction and machine learning.
The courses above amount to 130 hours of hard skills.
Additional educational credits
After a basic introduction to probability theory and statistics, the course will offer a wide overview of current problems and techniques in machine learning. The perspective is a probabilistic one, mainly focusing on maximum likelihood estimators. Among the topics discussed are regression problems, clustering, principal component analysis, classification with deep neural networks, reinforcement learning, and generative models.
Differences and correlations among the different approaches are shown, together with the backgrounds and the breakthroughs that led to the current “Deep learning” hype.
The course is held by Giovanni Squillero.
Link to course page
Electrical load pattern analysis is based on metered data at different spatial and time resolutions, and at different levels of aggregation. The course addresses a number of data analytics applications that use the large amount of data gathered from the electricity meters, seen from the point of view of the domain expert on energy systems. The applications illustrated refer to electrical load modelling (with deterministic and probabilistic representations), categorisation, profiling, management, forecasting and control. Various techniques are exploited, including unsupervised clustering, neural networks and other machine learning tools.
Link to course page
The course is intended for students interested in machine learning, including those who are beginning a career in deep learning and artificial intelligence research. The other target audience is software engineers who do not have a machine learning or statistics background but want to rapidly acquire one and begin using deep learning in their product or platform. Deep learning has already proved useful in many software disciplines, including computer vision, speech and audio processing, natural language processing, robotics, bioinformatics and chemistry, video games, and search engines.
The course is organized in three parts. The first one introduces the basic machine learning concepts. The second one describes the most established deep learning algorithms, which are essentially solved technologies. The final part describes more speculative ideas that are widely believed to be important for future research in deep learning.
Link: not available yet (the course is new with respect to previous years)
Link to course page
In 2017/2018 we contributed to