Current PhD | SmartData@PoliTO

2021

Development of advanced modelling, optimization, and prediction techniques for the vehicle domain, based on telematics and IoT system data

Objectives

The aim of this project is twofold, namely: i) Sample and compress fine‐grained vehicle data to optimize IoT data transmission: CAN bus data and miscellaneous vehicle data are transmitted through telematic systems to enable vehicle‐related data monitoring and analysis. The granularity of the transmitted data strongly influences the quality of the performed analyses. ii) Explore the added value of fine‐grained data to support intelligent vehicle state definition: According to type, model, and context of usage, industrial vehicles in the fleet are characterized by different usage patterns.

Exploring the use of Deep Natural Language Processing models to analyze documents in cross-lingual and multi-domain scenarios

Objectives

The main goal of Deep Natural Language Processing (DNLP) techniques is to exploit Deep Learning models in order to extract significant information from large document corpora. The underlying idea is to transform the original textual units into high-dimensional vector representations, which incorporate most semantic text relationships. The main goal of the proposal is to study, develop, and test new DNLP solutions tailored to multilingual and multidomain contexts, paying particular attention to contexts in which there is a lack of training data or the use of abstractive models is preferable.

Advanced learning strategies for visual understanding

Objectives

The PhD student will work on effective training strategies for DNNs. One focus is investigating how incorporating prior knowledge (usually coming from linguistic corpora like WordNet or ConceptNet) or logical constraints can improve the training process by: i) compensating for the lack of labelled training data, and ii) overcoming the natural tendency of DNNs towards shortcut solutions, by providing high-level inductive biases capable of guiding the network towards the desirable solution. A second focus is novel strategies for automatically constructing training curricula for DNNs.

Graph network models for Data Science

Objectives

The aim of the research is to define new methodologies for semantics embedding, propose novel algorithms and data structures, explore applications, investigate limitations, and advance the solutions based on different emerging Theory-guided Data Science approaches. The final goal is to contribute to improving the Machine Learning model performance by reducing the learning space thanks to the exploitation of existing domain knowledge in addition to the (often limited) available training data, pushing towards more unsupervised and semantically richer models.

Machine Learning based solutions to manage and secure communication networks

Objectives

This thesis will focus on anomaly detection based on machine learning on complex multivariate time‐series collected from honeypot deployments and communication networks in general. It will tackle how to effectively deploy honeypots to obtain a comprehensive picture of operational network attacks, how to efficiently collect and process the time‐series monitored from the honeypots to spot anomalies in a timely fashion using machine learning solutions, and how to identify anomalies considering a complete view of the security monitoring environment, as opposed to identifying anomalies on independent time‐series.

Bayesian statistics in RNA sequencing and epidemiology

Objectives

The aim of the project is to develop better statistical methodologies that can improve the analysis of data that technological development (and the need to fight the pandemic) has made available in recent years. A first line of research will be to extract more information from single cell RNA sequencing data (scRNAseq). Some statistical tools for the analysis of this data are already available, but both the technology and our capacity to interpret the underlying phenomena are advancing fast. We believe that better methodologies are feasible, and that Bayesian statistics can play an important role.

Tiny Machine Learning For Satellite Applications

Objectives

The overall FDIR (Fault Detection, Isolation and Recovery) problem in satellites can be decomposed into three phases: 1) detection/prediction of failure, 2) identification of the failing subsystem, 3) planning and actuation of the recovery action. The work proposed for this PhD mainly focuses on 1) and only partially on 2). The input for 1) will be a set of time‐series coming from various satellite sensors, provided by Thales‐Alenia Space, and its output is a set of alerts indicating that something is failing right now or is going to fail in the near future.

Machine Learning Techniques for Detection of Coordinated Events in the Internet

Objectives

Several research questions arise when investigating the network traffic reaching darknets. The aspects to investigate include: developing machine learning methodologies to identify coordinated events; modelling the events as a graph and extracting emerging patterns; identifying the characteristics and activity of traffic sources, highlighting possible botnets; and studying widespread botnets and their evolution (e.g., Mirai) as well as anomalies in coordinated darknet events.

2020

AI-Powered Darknets/Honeypots for Supporting Network Anomaly Detection

Objectives

Several research topics arise in realizing an AI-assisted approach to network security based on multiple darknets and honeypots. This project will investigate how to incrementally learn new anomalies without relying on centralized data repositories and while respecting privacy; how to describe the anomalies using multiple data sources so as to provide useful knowledge and context to operators; how to provision the monitoring probes in a network to cover the vast majority of anomalies while optimizing data capture and processing; and how to detect new anomalies and exchange models and knowledge among monitoring probes.

From data science algorithms to data “storyboarding”

Objectives

The following research objectives will be addressed to understand structures and models hidden in a data collection. To automatically drive the exploration of the search space at different levels, a set of interestingness metrics will be studied to evaluate and compare the meaningfulness of the discovered knowledge. The characterization of the knowledge significance in terms of innovative, possibly unconventional, transparent criteria will be addressed to rank the results so that the most relevant pieces of information can emerge. To enhance the self-learning capabilities of the proposed solutions, user interactions with the presented data “stories” will be used to collect feedback and re-train models for the discovery of exciting end-goals.

Machine-Learning for QoE of video-conference services

Objectives

The objective of the PhD is to propose a holistic yet practical solution for the traffic classification and management problems of multiparty online collaboration applications. Several video-conference tools will be analyzed. The possibility to use ML in conjunction with Deep Packet Inspection approaches will also be considered.

eXplainable Artificial Intelligence techniques for Natural Language Processing tasks

Objectives

The research will address the following issues.
xAI solutions for NLP tasks. A huge amount of the data collected from people’s daily lives (e.g. web searches, social networks, e‐commerce) is textual. Black‐box predictive models tailored to NLP tasks increase the risk of inheriting human prejudices, racism, gender discrimination and other forms of bias.
Concept‐drift detection for xAI solutions. When dealing with large data collections or complex textual datasets, a model trained in the past may no longer be valid.
Data and knowledge visualization. Visualization techniques help humans to correctly interpret, interact with, and exploit data and its value. Innovative visual representations will be studied to enhance the interpretability of the internal algorithm mechanics.

Big Spatio-Temporal Data Analytics

Objectives

The main objective of the research activity will be the design of big data analytics algorithms and systems for the analysis of heterogeneous big spatio-temporal data (e.g., satellite images, sensor measurements), aiming at generating predictive and descriptive models. The main issues that will be addressed are the following.
Scalability. The amount of big spatio-temporal data has increased significantly in recent years, and some data collections are individually very large.
Heterogeneity. Several heterogeneous sources are available. Each source represents a different facet of the analyzed events and provides important insight into them.
Near-real-time constraints. In several domains, timely responses are needed.

Creativity-Injection into AI-Powered Multimedia Storyboards

Objectives

The objective of this research proposal is to support the automated production of storyboards in the context of film-making, photography, and music-making. We target advanced semantic features based on a creativity-aware AI approach, with the following research objectives. The candidate will study state-of-the-art techniques addressing the enhancement of the creativity patterns of data and its main algorithms. The candidate will define innovative metrics to select the most creative patterns that can be effectively transformed into actions by domain experts. The candidate will also study and define a novel algorithm to support a self-learning methodology exploiting a KDB (Knowledge DataBase).

Data-driven Study and Design of Innovative Solutions for Urban Mobility

Objectives

The aim of this project is to study mobility in smart cities where citizens take advantage of public transportation, shared platforms, electric vehicles, and multi-modal means. To envision this future, the first objective is to study and model current mobility habits. The model will be parametrized so that different parameter values can describe the possible future scenarios. With the models built and the scenarios simulated, the candidate will be able to answer many research questions about different aspects of electric mobility in smart cities.

Robust Machine Learning Models for High Dimensional Data Interpretation

Objectives

The overall objective of this research proposal is to explore novel ways to bridge the gap between deep representation learning and symbolic knowledge representation in multi-dimensional data analysis, leveraging recent advances in the field of neural-symbolic integration and probabilistic modeling. Specifically, this research proposal targets the area of neural-symbolic integration, which can be used to encode symbolic representation techniques, such as fuzzy and/or probabilistic logic, as tensors in a deep neural network. The candidate will target the application of neural-symbolic integration techniques to solve problems in different image analysis tasks, such as image-level classification, segmentation and object detection. Of particular interest is the possibility of integrating image-level data with other sources of information, for instance prior (possibly causal) information.

2019

Strengthening IoT Privacy and Security

Objectives

The research has two main objectives: first, to design and build a testbed for the analysis of IoT privacy and security implications, and then to investigate new automatic methods to fingerprint IoT devices and detect anomalies in their traffic. The testbed will also allow running honeypots, i.e., IoT devices with known vulnerabilities that are left open to exploitation, so as to capture real data about security attacks. This can lead to innovative Machine Learning (ML) and cognitive methodologies, such as unsupervised anomaly detection algorithms, Generative Adversarial Networks, and Deep Reinforcement Learning, which are gaining momentum as ways to solve these problems. Such techniques will be used to understand implications for users’ privacy.

Machine Learning in network management for improving QoE in web applications

Objectives

The research should tackle the problem by collecting, storing and processing large amounts of data using a big data framework. The candidate will leverage these solutions, designing and engineering novel machine learning techniques to tackle the fine-grained traffic classification problem. The candidate will design a complete solution, from data collection through feature engineering, feature selection and model selection to model training and testing. Next, the candidate should revisit existing QoS mechanisms and propose new ones to address the specific needs of the applications and the network.

Knowledge Discovery from Financial Data

Objectives

The PhD student will address the study and application of smart Big data analytics solutions to solve relevant financial issues. The research activities involve multidisciplinary knowledge and are aimed at finding smart solutions that can be easily extended and adapted to different use cases. More specifically, the research will address the following key issues related to financial Big data analysis: Lack of public data, Risk exposure, Temporal correlation between financial trends, Volume and variety of the data sources.

Machine-Learning Based Optimization of Navigation Algorithms and embedded implementation techniques for Service Robotics

Objectives

The main objective of the PhD activities will be the development of algorithms, technologies and systems that can effectively jointly optimize the performance of the considered systems in terms of traditional communication metrics (e.g., bandwidth, latency) and application-layer metrics (e.g., utility in terms of lowest cost navigation paths) which will be defined on a case-by-case basis depending on the application, while at the same time minimizing all costs (e.g., computational complexity, storage cost, number of required devices and their economic cost).

Artificial Intelligence and Simulation for tackling complexity in engineering applications: deriving data driven models and reduced order models for fast evaluation

Objectives

This work will contribute to understanding the role of Deep Learning in several engineering fields where simulations, complex models, and multi-scale phenomena play a fundamental role. A topic which deserves investigation concerns the optimal construction of learning sets in order to conform to the statistical properties of the quantities of interest.

Inference and control of dynamic processes on large-scale networks: from data to models

Objectives

The main objective of this thesis is the development and improvement of approximate inference methods for dynamical processes on networks. The PhD student will become familiar with the most advanced techniques in the field, from both the statistical mechanics and machine learning communities, and will learn how to deal with large datasets and perform data analysis. Such cutting-edge theoretical notions and computational techniques are likely to be very useful for his/her future career, both in academic research and in the industrial applications of the ICT sector.

Machine learning for sentiment analysis

Objectives

The objective of the research activity is the definition of novel sentiment analysis approaches aiming at improving the detection performance by considering heterogeneous information sources. The following steps (and milestones) are envisioned:
Data collection and exploration;
Sentiment analysis algorithms design and development;
Deployment in real world applications.

2018

Data and Information Extraction from multiple data sources for transport Innovation and Sustainability (Mobility as a Service)

Objectives
The research aims at exploiting the full potential of big data to describe mobility patterns, extract information from existing massive data sources, crossing it with context-specific understanding of human behaviour, in order to analyse the different ways people interact with one another (Onnela, 2011). The project will provide – through an innovative approach, less invasive than current travel surveys – a cloud based framework for collecting, analysing, and extracting urban mobility information from several massive data sources.

Topological methods for dimensionality reduction with application to simulation and privacy

Objectives
One of the characteristics of so-called Big Data is the occurrence of high dimensional data. The analysis of high dimensional data is affected by the so-called curse of dimensionality. As an example, one aspect of this phenomenon is the impossibility of efficiently sampling points from the neighborhood of a data set.
There are several methods used to overcome this difficulty, many of them based on projecting data onto lower-dimensional subspaces.
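
The sampling difficulty mentioned above can be illustrated with a small sketch. It is not part of the project itself: it assumes the simplest possible setting, where the "neighborhood" is the unit ball and candidate points are drawn uniformly from the enclosing cube [-1, 1]^d. The function name is hypothetical. The fraction of the cube occupied by the ball, which is also the acceptance rate of naive rejection sampling, collapses as the dimension d grows.

```python
import math


def ball_to_cube_volume_ratio(d: int) -> float:
    """Fraction of the cube [-1, 1]^d occupied by the inscribed unit ball.

    This equals the probability that a point drawn uniformly from the cube
    lands inside the ball, i.e. the acceptance rate of rejection sampling.
    """
    # Closed-form volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1).
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    # The enclosing cube has side length 2.
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume


if __name__ == "__main__":
    for d in (1, 2, 3, 5, 10, 20):
        print(f"d={d:2d}  ratio={ball_to_cube_volume_ratio(d):.3e}")
```

In low dimension most uniform samples fall inside the ball, while in ten dimensions fewer than one in a hundred do, which is one reason projection onto lower-dimensional subspaces is attractive.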

Deep learning models for next-generation video codec

Objectives
This project aims to devise new approaches and technologies for the compression of data using deep learning models, and to show their potential to produce more accurate and visually pleasing reconstructions at much higher compression levels for both image and video data.
The objective of this Ph.D. project is to advance image/video compression by exploiting recent advances in machine learning, and particularly to develop deep learning techniques for next-generation video coding.

Deep learning techniques for advanced audio-visual media understanding

Objectives
Cognitive data analysis comprises the set of algorithms, applications and computing platforms that perform tasks mimicking human intelligence. Image tagging, video highlight creation, and automatic subtitling are just some examples of how cognitive data analysis can facilitate and speed up complex tasks usually done by hand in the media domain. Deep Neural Networks (DNNs) are a family of technologies for implementing such systems.
The objective of this PhD project is to develop advanced techniques for audio-visual media understanding applicable in a media industry context.

Modelling cancer evolution through the development of artificial intelligence-based techniques on temporal and spatial molecular data

Objectives
Cancer is an evolving entity and the evolutionary properties of each tumor are likely to play a critical role in shaping its natural behavior and how it responds to therapy. However, effective tools and metrics to measure and categorize tumors based on their evolutionary characteristics have yet to be identified. We plan to combine mathematical modelling and AI‐based approaches to develop a new generation of cancer classifiers based on tumor evolutionary properties and proxy data. The project will be developed in collaboration with the Department of Oncology at the University of Torino. The proposed research activity fits within the SmartData@PoliTo interdepartmental centre, which brings together competences from different fields, ranging from modelling to computer programming, from communications to statistics. The candidate will join this interdisciplinary team of experts and collaborate with them.

Machine Learning algorithms and their embedded implementation for service robotics applications in precision agriculture

Objectives
Several studies have demonstrated the need to significantly increase the world’s food production by 2050. Technology could help farmers, but its adoption is limited because farms usually do not have power or Internet connectivity, and farmers are typically not technology savvy. We are working towards an end‐to‐end approach, from sensors to the cloud, to solve the problem. Our goal is to enable data‐driven precision farming. We believe that data, coupled with the farmer’s knowledge and intuition about his or her farm, can help increase farm productivity and reduce costs. However, getting data from the farm is extremely difficult, since there is often no power in the field and no Internet on the farm. As part of the PIC4SeR project, we are developing several unique solutions to solve these problems using low‐cost sensors, drones, rovers, vision analysis and machine learning algorithms. The research activity fits within the SmartData@PoliTo interdepartmental centre, which brings together competences from different fields, ranging from modelling to computer programming, from communications to statistics. The candidate will join this interdisciplinary team of experts and collaborate with them.

Generative adversarial network for cybersecurity applications

Objectives
Cybersecurity is one of the biggest problems in the information society and affects all modern communication networks. More and more complicated threats are found on a daily basis, which makes identifying them and designing countermeasures increasingly difficult. Machine learning and Big Data offer scalable solutions to learn from labelled datasets and build models that can be used to detect attacks. Unfortunately, in the cybersecurity context, we lack the ability to obtain large datasets of labelled attacks, since threats continue to evolve over time. This calls for novel solutions to face the problem. Recently, generative adversarial networks have been proposed as a means to generalize a sample labelled dataset and create artificially richer datasets. They involve two neural networks contesting with each other in a zero‐sum game framework: one generates candidates, while the second learns how to discriminate instances. The generative network’s training objective is to increase the error rate of the discriminative network (i.e., “fool” the discriminator network by producing novel synthesized instances that appear to have come from the true data distribution). The research activity fits within the SmartData@PoliTo interdepartmental centre, which brings together competences from different fields, ranging from modelling to computer programming, from communications to statistics. The candidate will join this interdisciplinary team of experts and collaborate with them.
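
For reference, the zero‐sum game between the generative and discriminative networks described above is commonly written as the minimax objective below. This is the standard formulation from the GAN literature, not necessarily the variant this project would adopt, and the symbols are notation introduced here: G is the generator, D the discriminator, p_data the true data distribution, and p_z the noise distribution fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes V by assigning high probability to real samples and low probability to generated ones, while the generator minimizes V by producing samples G(z) that the discriminator misclassifies as real.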

Big data techniques for assessing the impact of web distribution strategies on performance in the hospitality industry

Objectives
The objective of the research is twofold. First, the research will investigate the use of big data techniques to gather data from the Internet about hotels’ visibility, reputation, pricing and distribution strategies on the main online channels. Second, the research will complement the big data algorithms used to analyse the gathered data with econometric analyses, with the aim of investigating how Internet distribution strategies affect operational and economic performance, on both a daily and a yearly basis. In this regard, a collaboration with some companies operating in channel management can be envisaged to access their proprietary data on hotels’ pricing and distribution strategies in the online world.

2017

Reconstruction problems and tomography

Objectives

This Ph.D. position will be devoted to theoretical aspects of reconstruction problems, with special emphasis on the adaptive TAP method for the analysis of Bayesian problems with non-linear prior information coming from real datasets. This will involve successfully modelling distributions of real data in some subdomain (e.g. natural and tomographic images), and developing methods to approximately solve the resulting Bayesian problem.

Exploiting Semantics to Enhance Deep Learning Models

Objectives

The objective of the research activity is the definition of big data analytics approaches capable of extracting and managing knowledge of heterogeneous types (e.g., structured data, textual information, images).

Homological summarization of high dimensional static and dynamic simplicial data

Objectives

The novelty of TDA (Topological Data Analysis) is that it studies the shape of topological spaces at the mesoscopic scale by going beyond the standard measures defined on pairs of data points. This is done by moving from networks to simplicial complexes. The latter are obtained from elementary objects, called simplices, built from such simple polyhedra as points, line segments, triangles, tetrahedra, and their higher dimensional analogues, glued together along their faces.

Data-driven machine learning methods for physically-based simulations

Objectives

The candidate will focus on studying the interplay between machine learning (among other techniques), computer simulations and statistical models: by using ML techniques to analyze the configuration space produced by a known mathematical/statistical model, we will try to identify the relevant parameters and to refine or simplify the model. The candidate will then use the knowledge acquired to infer, from big data, models whose configuration space approximates the given data, and use these simplified models to turn correlation into causation so that our information finally becomes “actionable”. We will also use the developed framework to enlarge high-quality but small datasets and to work on model reduction, assessment and validation in the area of FEM.

Big Data Algorithms for IoT, Profiling and Predictive Maintenance

Objectives

The objective of the research activity is the definition of big data analytics approaches to analyze IoT streams for a variety of applications (e.g., sensor data streams from instrumented cars).
The following steps (and milestones) are envisioned.
Data collection and exploration. The design of a framework to store relevant information in a data lake. Heterogeneous data streams encompassing custom proprietary data and publicly available data will be collected in a common data repository. Tools for explorative analysis will be exploited to characterize data and drive the following analysis tasks.
Big data algorithms design and development. State-of-the-art tools and novel algorithms designed for the specific data analysis problem will be defined (e.g., to predict component failures).
Knowledge/model interpretation. Understanding a discovered behavior requires interaction with domain experts, which will allow operational validation of the proposed approaches.

Fraudulent reviews identification in online reputation systems and their impact on economy

Objectives

The goal of the research activities is to i) design smart data collection crawlers that can automatically collect data from recommendation systems, i.e., design smart policies to sample the very large amount of data they potentially expose; ii) model the data using graph-based modeling, and design and test algorithms to automatically identify possible anomalies which reflect possible pollution in the recommendation system; iii) correlate the reputation of a business entity with its economic performance in the real world, and build models to predict how the performance will eventually improve based on a change in the reputation on the online recommendation systems.