This page is intended for PhD students performing their research activities in the framework of the SmartData@PoliTO lab. While the lab is concerned with an array of different research activities covering multiple application areas, all of these are underpinned by the idea that handling large amounts of data requires smart methods and algorithms. PhD students working in the lab can therefore benefit greatly from learning and employing state-of-the-art methods in data science and machine learning during their doctoral program.
To this end, the lab offers first- and second-year PhD students involved in the lab a rationalized selection of hard-skill PhD courses, forming an educational PhD track on Data Science and Machine Learning. The track comprises a selection of suitable PhD courses covering foundational and application-oriented topics, with an interdisciplinary approach that leverages the educational offer of several departments of the Politecnico, including Electronics and Telecommunications, Control and Computer Engineering, Mathematical Sciences, Applied Science and Technology, and Energy. The courses are organized as follows: for each course we provide the number of hours, the department organizing it, and a description of its relevance to data science and machine learning.
Suggested educational credits
Period: April – July
Statistics is a fundamental component of modern Data Science. This course reviews Probability Theory and Calculus (conditional probability, Bayes' theorem, independence, limit theorems, marginal and conditional distributions, basic stochastic processes, simulation), the fundamental concepts of frequentist Statistics (hypothesis testing, confidence intervals, prediction, design of experiments) and of Bayesian Statistics (prior/posterior dynamics, credible regions, decision theory), both univariate and multivariate, as well as their various computationally intensive implementations.
Note: for students who already have a suitable background in statistics, we suggest replacing this course with Topological Data Analysis from the list of additional credits below.
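To give a flavor of the prior/posterior dynamics covered by the course, here is a minimal sketch of a Beta-Binomial conjugate update; the numbers are made up for illustration and are not course material.

```python
# A Beta-Binomial conjugate update: the simplest instance of
# prior/posterior dynamics. All numbers are illustrative.

def posterior_params(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior + Binomial data -> Beta posterior."""
    return alpha + successes, beta + failures

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1); observe 7 successes and 3 failures.
a, b = posterior_params(1, 1, 7, 3)
mean = posterior_mean(a, b)   # 8 / 12 = 0.666...
```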
Period: January
The availability of huge amounts of data in the most diverse application domains highlights the need for automatic algorithms and tools for information management and extraction that allow mining relevant and actionable information from data. The course will provide an introduction to the main data mining techniques (classification, clustering, and association rule mining) and will describe algorithmic implementations of the most relevant analysis techniques.
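As a taste of the clustering techniques the course introduces, here is a minimal k-means sketch on synthetic two-dimensional data; the data and the initialization are illustrative choices, not course material.

```python
import numpy as np

# Minimal k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.

def kmeans(X, centers, iters=50):
    centers = centers.copy()
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

# Two well-separated synthetic blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
# init: the first point and the point farthest from it
init = np.stack([X[0], X[np.argmax(((X - X[0]) ** 2).sum(-1))]])
labels, centers = kmeans(X, init)
```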
Period: November
The goal of the course is to provide an overview of the main data mining and machine learning techniques for analyzing textual data, and to introduce the main open-source tools currently available for text preparation and analysis. It presents the most common techniques for text preparation, transformation, and summarization, and discusses how to adapt established machine learning approaches to cope with textual datasets.
The course is held by Luca Cagliero.
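As a small taste of the text-preparation step, here is a toy sketch of whitespace tokenization followed by TF-IDF term weighting; the documents are made up and this is not course material.

```python
import math
from collections import Counter

# Toy text pipeline: tokenize, then weight terms with TF-IDF.
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs differ"]
tokens = [d.lower().split() for d in docs]

def tf_idf(term, doc_tokens, corpus):
    tf = Counter(doc_tokens)[term] / len(doc_tokens)   # term frequency
    df = sum(term in doc for doc in corpus)            # document frequency
    return tf * math.log(len(corpus) / df)             # tf * idf

# "cat" appears only in the first document, so it gets a higher weight
# there than the ubiquitous "the".
w_cat = tf_idf("cat", tokens[0], tokens)
w_the = tf_idf("the", tokens[0], tokens)
```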
Period: TBD
This course addresses random projections, a powerful method for data representation and hashing that can be used for dimensionality reduction, security, and a number of applications involving big data. The course covers the mathematical aspects of compressed sensing, algorithms for reconstructing the data from random projections, and a few key applications in data analysis and communications. It also deals with recent techniques for the joint design of hardware and algorithms based on lightweight signal adaptation.
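A minimal sketch of the core idea behind the course: a Gaussian random projection approximately preserves pairwise distances (the Johnson-Lindenstrauss phenomenon). The dimensions below are arbitrary illustrative choices.

```python
import numpy as np

# Project high-dimensional points with a random Gaussian matrix and
# check that a pairwise distance is roughly preserved.
rng = np.random.default_rng(0)
d, k, n = 1000, 200, 10                    # ambient dim, projected dim, #points
X = rng.normal(size=(n, d))
P = rng.normal(size=(d, k)) / np.sqrt(k)   # random projection matrix
Y = X @ P                                  # projected data

orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig   # close to 1: distances are roughly preserved
```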
Period: TBD
This course presents a selection of prominent approaches for constructing predictive models of complex systems, starting from a finite but possibly large set of data collecting response samples from direct measurements or first-principle simulations. Such reduced-order models aim at reproducing the dynamic behavior of the underlying system in a compact approximate form, without any knowledge of its internal structure. From a methodological standpoint, this course bridges various approaches from the disciplines of system identification, approximation theory, model order reduction and machine learning.
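As an illustration of data-driven model construction in the spirit of the course, the following sketch fits an AR(2) model to simulated response samples by least squares; the true coefficients and noise level are made up, and this is not course material.

```python
import numpy as np

# Fit an AR(2) model  y[t] = a1*y[t-1] + a2*y[t-2]  from observed
# response samples, with no knowledge of the system's internals.
a1, a2 = 1.5, -0.7                 # a stable, oscillatory system (made up)
rng = np.random.default_rng(0)
y = np.zeros(300)
y[0], y[1] = 1.0, 0.5
for t in range(2, len(y)):
    y[t] = a1 * y[t - 1] + a2 * y[t - 2] + 1e-3 * rng.normal()

# Each regression row is [y[t-1], y[t-2]]; the target is y[t].
A = np.column_stack([y[1:-1], y[:-2]])
b = y[2:]
a1_hat, a2_hat = np.linalg.lstsq(A, b, rcond=None)[0]
```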
The courses above amount to 130 hours of hard skills.
Period: April – June
Topological Data Analysis (TDA) is a novel framework of techniques, mainly devoted to producing summaries of complex data sets. Based on algebraic topology, TDA has several applications and presents many open research problems. The course will start with a crash introduction to homology, followed by a thorough treatment of persistent homology. Applications to real data, computational issues, and an account of existing software will also be presented.
Link to course page
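To give a concrete flavor of persistent homology in dimension 0: as a distance threshold grows, connected components of a point cloud merge, and each merge distance records the "death" of a component. Below is a minimal union-find sketch; the point cloud is made up and this is not course material.

```python
import numpy as np

# 0-dimensional persistence: process all pairwise edges in order of
# length; whenever an edge joins two components, one component dies.

def persistence_0d(X):
    n = len(X)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    edges = sorted((float(np.linalg.norm(X[i] - X[j])), i, j)
                   for i in range(n) for j in range(i + 1, n))
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # two components merge:
            parent[ri] = rj                 # one of them dies at `dist`
            deaths.append(dist)
    return deaths

# Two tight clusters far apart: the largest death is the gap between them.
X = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [5.0, 0], [5.1, 0]])
deaths = persistence_0d(X)
```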
Period: March
Generative Adversarial Networks (GANs) have achieved impressive results in generating synthetic images that look realistic. GANs introduced the concept of adversarial training, in which two neural networks are trained by playing a game against each other. Adversarial training is a powerful concept that has been applied to a wide variety of problems across different fields, including generative models for images, time series, or DNA/protein sequences, unsupervised image-to-image translation, domain adaptation, regularization of inverse problems, and many more. This course will provide students with the theoretical foundations of adversarial training as well as the most recent practical examples of its use. The target audience is cross-disciplinary, as adversarial training has proved to be a staple of modern deep learning across many fields.
Period: March
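As a small taste of the theory behind the GAN course above: for fixed data and generator densities, the inner maximization of the adversarial game has a closed-form solution, D*(x) = p_data(x) / (p_data(x) + p_g(x)) (Goodfellow et al., 2014). Here is a numerical sketch with two illustrative Gaussian densities.

```python
import numpy as np

# Optimal discriminator for fixed densities: D* is close to 1 where only
# real data lives, close to 0 where only generated data lives, and
# exactly 1/2 where the two densities cross.

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 2001)
p_data = gauss(x, 0.0, 1.0)      # "real" density (illustrative)
p_g = gauss(x, 3.0, 1.0)         # generator's density (illustrative)
d_star = p_data / (p_data + p_g)
# The two densities cross at x = 1.5, where D* = 1/2.
```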
This course gives a broad yet rigorous introduction to machine learning and statistical pattern recognition. It focuses on supervised generative and discriminative learning models, analyzing important topics such as model architectures, training, and evaluation techniques.
The course is held by Pietro Laface and Sandro Cumani.
Link to course page
Period: TBD
“Mimetic Learning” illustrates several heuristic methodologies able to tackle complex problems. While the techniques differ in many respects, they all share an attempt to learn the optimal strategy by mimicking natural processes. In more detail, the course introduces the vast family of algorithms that have been labeled either “machine learning” or “evolutionary algorithms”, and puts them into a historical perspective. Differences and correlations among the different approaches are shown, together with the backgrounds and the breakthroughs that led to the current “deep learning” hype.
The course is held by Giovanni Squillero.
Period: May-June
In this course, we will explore data analysis and prediction topics using complex network datasets, with a particular focus on temporal and structural data. We will exploit real complex networks from popular repositories. The course will cover the following topics: introduction to network science, probabilistic network models, graph visualization techniques, random walks over graphs, cascades, and time series. During each class, we will see examples using the Python programming language, and each student will complete small programming assignments involving data analysis and visualization, possibly using data from their own research.
The course is held by Luca Vassio and Martino Trevisan.
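In the spirit of the in-class Python examples, here is a minimal sketch of a random walk over a small undirected graph: its stationary distribution is proportional to node degree. The graph below is made up for illustration.

```python
import numpy as np

# Random walk over an undirected graph, tracked in distribution:
# repeatedly multiplying by the transition matrix converges to the
# stationary distribution, which is proportional to node degree.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # adjacency matrix
P = A / A.sum(axis=1, keepdims=True)        # row-stochastic transitions

pi = np.full(4, 0.25)     # start from the uniform distribution
for _ in range(200):
    pi = pi @ P           # one step of the walk, in distribution

degrees = A.sum(axis=1)
expected = degrees / degrees.sum()   # [0.2, 0.3, 0.3, 0.2]
```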
Period: TBD in A.A. 2020-2021
The course aims at introducing some of the main tools supporting machine learning algorithms. In particular, the focus will be on computing numerical solutions of very large-scale nonlinear optimization problems. Both unconstrained and constrained problems will be tackled, describing different methods suitable for the various problem classes (e.g., nonlinear least squares problems, quadratic programming problems, purely nonlinear problems, …). Foundations of stochastic dynamic programming will also be introduced.
Period: March – April
Period: TBD
The course is intended for students interested in machine learning, including those beginning a career in deep learning and artificial intelligence research. It also targets software engineers who do not have a machine learning or statistics background but want to acquire one rapidly and begin using deep learning in their product or platform. Deep learning has already proved useful in many software disciplines, including computer vision, speech and audio processing, natural language processing, robotics, bioinformatics and chemistry, video games, and search engines.
The course is organized in three parts. The first one introduces the basic machine learning concepts. The second one describes the most established deep learning algorithms, which are essentially solved technologies. The final part describes more speculative ideas that are widely believed to be important for future research in deep learning.
Link to course page not available yet
Period: January
Artificial Intelligence (AI) has become increasingly present and integrated in our daily lives, with complex and often unpredictable solutions. This complexity and unpredictability make it difficult for users to understand, trust, and adopt such solutions with success. The course aims at introducing methodologies and techniques to design and build intelligent interactive systems that are usable and useful for all, also reflecting on the impact that AI applications have and will have on their users. A combination of lectures and workshop-style sessions will introduce students to different techniques for creating such intelligent systems across a few selected domains.
Period: TBD
- Stochastic Optimization and Optimal Learning: The course addresses decision making under uncertainty, a topic that needs strong mathematical foundations. We consider, among other things, dynamic programming as the proper foundation of reinforcement learning. While standard methods deal with exogenous uncertainty, in several practical settings we may influence our degree of uncertainty by a proper choice of experiments. In this course, we consider online learning, i.e., stochastic dynamic programming models in which the state includes current knowledge. We do so by integrating optimization theory and Bayesian statistics, which is another essential topic in machine learning.
Link to course page
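The idea that "the state includes current knowledge" can be illustrated with a toy two-armed Bernoulli bandit, where the decision state is the Beta posterior over each arm's win rate, played by Thompson sampling; the arm probabilities and horizon are made up, and this is a sketch rather than course material.

```python
import numpy as np

# Thompson sampling on a two-armed Bernoulli bandit: the "state" is the
# Beta posterior (wins, losses) per arm, updated after every decision.
rng = np.random.default_rng(0)
true_p = [0.3, 0.7]    # hidden arm win rates (made up)
wins = np.ones(2)      # Beta posterior alpha (starts at the uniform prior)
losses = np.ones(2)    # Beta posterior beta
pulls = np.zeros(2)

for _ in range(2000):
    samples = rng.beta(wins, losses)   # sample a plausible win rate per arm
    arm = int(np.argmax(samples))      # play the arm that looks best
    reward = rng.random() < true_p[arm]
    wins[arm] += reward                # update the knowledge state
    losses[arm] += 1 - reward
    pulls[arm] += 1
# The better arm (index 1) ends up pulled far more often.
```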
- Numerical Optimization: Machine learning and numerical optimization are strictly interlaced fields, since most machine learning problems call for the solution of a numerical (usually constrained) optimization problem. In particular, the availability of numerical methods well suited to effectively solving different kinds of optimization problems (such as linear, quadratic, nonlinear, and semidefinite programming) enlarges the set of models that can be considered in the machine learning framework. Furthermore, in order to tackle very large data sets, it is of paramount importance to resort to numerical methods specifically designed for large-scale optimization problems.
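As a small example of the interplay described above, here is a sketch of one constrained-optimization building block: an equality-constrained quadratic program solved directly via its KKT system. The problem data are made up for illustration.

```python
import numpy as np

# Minimize (1/2) x^T Q x - c^T x  subject to  A x = b  by solving the
# KKT conditions  Q x + A^T mu = c  and  A x = b  as one linear system.
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])
c = np.array([2.0, 4.0])
A = np.array([[1.0, 1.0]])     # constraint: x0 + x1 = 1
b = np.array([1.0])

K = np.block([[Q, A.T],
              [A, np.zeros((1, 1))]])
rhs = np.concatenate([c, b])
sol = np.linalg.solve(K, rhs)
x, mu = sol[:2], sol[2:]       # x = [0, 1], multiplier mu = [2]
```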