Data and Information Extraction from multiple data sources for transport Innovation and Sustainability (Mobility as a Service)
The research aims to exploit the full potential of big data to describe mobility patterns, extracting information from existing massive data sources and cross-referencing it with context-specific understanding of human behaviour, in order to analyse the different ways people interact with one another (Onnela, 2011). The project will provide, through an innovative approach that is less invasive than current travel surveys, a cloud-based framework for collecting, analysing, and extracting urban mobility information from several massive data sources.
One of the characteristics of so-called Big Data is the occurrence of high-dimensional data. The analysis of high-dimensional data is affected by the so-called curse of dimensionality. One aspect of this phenomenon, for example, is the impossibility of efficiently sampling points from the neighborhood of a data set.
There are several methods to overcome this difficulty, many of them based on projecting the data onto lower-dimensional subspaces.
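To make the projection idea concrete, the following is a minimal sketch (the specific technique, a Johnson-Lindenstrauss-style random projection, is an illustrative choice and not one prescribed by the project) of how a random linear map compresses high-dimensional points while approximately preserving pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 points in a 10,000-dimensional ambient space
n, d, k = 500, 10_000, 50
X = rng.standard_normal((n, d))

# Random projection matrix with entries N(0, 1/k), so that
# E[||Px - Py||^2] = ||x - y||^2 for any pair of points x, y.
P = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ P  # projected data, shape (500, 50)

# Check distance preservation on one pair of points
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"distortion ratio: {proj / orig:.3f}")  # close to 1
```

Larger target dimensions k yield tighter distortion bounds, at the cost of less compression.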
This project aims to devise new approaches and technologies for the compression of data using deep learning models, and to show their potential to produce more accurate and visually pleasing reconstructions at much higher compression levels for both image and video data.
The objective of this Ph.D. project is to advance image/video compression by exploiting recent advances in machine learning, and particularly to develop deep learning techniques for next-generation video coding.
Cognitive data analysis refers to the set of algorithms, applications and computing platforms that perform tasks mimicking human intelligence. Image tagging, video highlight creation, and automatic subtitling are just some examples of how cognitive data analysis can facilitate and speed up complex tasks usually done by hand in the media domain. Deep Neural Networks (DNNs) are a family of technologies used to implement such systems.
The objective of this PhD project is to develop advanced techniques for audio-visual media understanding applicable in the media industry context.
Modelling cancer evolution through the development of artificial intelligence-based techniques on temporal and spatial molecular data
Cancer is an evolving entity, and the evolutionary properties of each tumor are likely to play a critical role in shaping its natural behavior and how it responds to therapy. However, effective tools and metrics to measure and categorize tumors based on their evolutionary characteristics have yet to be identified. We plan to combine mathematical modelling and AI-based approaches to develop a new generation of cancer classifiers based on tumor evolutionary properties and proxy data. The project will be developed in collaboration with the Department of Oncology at the University of Torino. The proposed research activity fits in the SmartData@PoliTo interdepartmental centre, which brings together competences from different fields, ranging from modelling to computer programming, from communications to statistics. The candidate will join this interdisciplinary team of experts and collaborate with them.
Machine Learning algorithms and their embedded implementation for service robotics applications in precision agriculture
Several studies have demonstrated the need to significantly increase the world's food production by 2050. Although technology could help farmers, its adoption is limited because farms usually lack power and Internet connectivity, and farmers are typically not technology-savvy. We are working towards an end-to-end approach, from sensors to the cloud, to solve the problem. Our goal is to enable data-driven precision farming. We believe that data, coupled with the farmer's knowledge and intuition about his or her farm, can help increase farm productivity and also help reduce costs. However, getting data from the farm is extremely difficult, since there is often no power in the field and no Internet connection on the farm. As part of the PIC4SeR project, we are developing several unique solutions to these problems using low-cost sensors, drones, rovers, vision analysis and machine learning algorithms. The research activity fits in the SmartData@PoliTo interdepartmental centre, which brings together competences from different fields, ranging from modelling to computer programming, from communications to statistics. The candidate will join this interdisciplinary team of experts and collaborate with them.
Cybersecurity is one of the biggest problems in the information society and affects all modern communication networks. Increasingly sophisticated threats are found on a daily basis, making it more and more difficult to identify them and design countermeasures. Machine learning and Big Data offer scalable solutions to learn from labelled datasets and build models that can be used to detect attacks. Unfortunately, in the cybersecurity context, we lack the ability to obtain large datasets of labelled attacks, since threats continue to evolve over time. This calls for novel solutions to face the problem. Recently, generative adversarial networks (GANs) have been proposed as a means to generalize a labelled sample dataset and create artificially richer datasets. They involve two neural networks contesting with each other in a zero-sum game framework: one generates candidates, while the other learns to discriminate real instances from synthetic ones. The generative network's training objective is to increase the error rate of the discriminative network (i.e., to "fool" the discriminator by producing novel synthesized instances that appear to have come from the true data distribution). The research activity fits in the SmartData@PoliTo interdepartmental centre, which brings together competences from different fields, ranging from modelling to computer programming, from communications to statistics. The candidate will join this interdisciplinary team of experts and collaborate with them.
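As a hedged illustration of the adversarial game described above (a toy sketch with one-layer "networks", not a production GAN; the target distribution, learning rate and step count are all assumptions), a generator G(z) = w*z + b can be trained against a logistic discriminator to imitate samples from N(3, 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Real data distribution the generator must imitate: N(3, 0.5)
def real_batch(n):
    return 3.0 + 0.5 * rng.standard_normal(n)

# Generator G(z) = w*z + b and discriminator D(x) = sigmoid(a*x + c):
# the smallest "networks" that still exhibit the adversarial game.
w, b = 1.0, 0.0   # generator parameters
a, c = 0.1, 0.0   # discriminator parameters
lr, n = 0.01, 64

for step in range(2000):
    # --- discriminator update: logistic regression, real=1 vs fake=0 ---
    z = rng.standard_normal(n)
    x_fake = w * z + b
    x_real = real_batch(n)
    p_real, p_fake = sigmoid(a * x_real + c), sigmoid(a * x_fake + c)
    # cross-entropy gradient: (prediction - label) * input
    grad_a = np.mean((p_real - 1) * x_real) + np.mean(p_fake * x_fake)
    grad_c = np.mean(p_real - 1) + np.mean(p_fake)
    a -= lr * grad_a
    c -= lr * grad_c

    # --- generator update: non-saturating loss -log D(G(z)) ---
    z = rng.standard_normal(n)
    x_fake = w * z + b
    p_fake = sigmoid(a * x_fake + c)
    dx = -(1 - p_fake) * a          # d(-log D)/dx at each fake sample
    w -= lr * np.mean(dx * z)
    b -= lr * np.mean(dx)

print(f"generated samples ~ N({b:.2f}, {abs(w):.2f}); target N(3, 0.5)")
```

The generator's mean b drifts towards the real mean because its only way to "fool" the discriminator is to produce samples from the region the discriminator labels as real; in the cybersecurity setting, the same mechanism is what lets a GAN enrich a scarce dataset of labelled attacks.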
Big data techniques for assessing the impact of web distribution strategies on performance in the hospitality industry
The objective of the research is twofold. First, the research will investigate the use of big data techniques to gather data from the Internet about hotels' visibility, reputation, pricing and distribution strategies on the main online channels. Second, the research will complement big data algorithms for analysing the gathered data with econometric analyses, with the aim of investigating how Internet distribution strategies impact operational and economic performance, on both a daily and a yearly basis. In this regard, a collaboration with companies operating in channel management can be envisaged, to access their proprietary data on hotels' pricing and distribution strategies in the online world.
This Ph.D. position will be devoted to theoretical aspects of reconstruction problems, with special emphasis on the adaptive TAP method for the analysis of Bayesian problems with non-linear prior information coming from real datasets. This will involve successfully modelling distributions of real data in some subdomain (e.g., natural and tomographic images), and developing methods to approximately solve the resulting Bayesian problem.
The objective of the research activity is the definition of big data analytics approaches capable of extracting and managing knowledge of heterogeneous types (e.g., structured data, textual information, images).
The novelty of TDA (Topological Data Analysis) is that it studies the shape of topological spaces at the mesoscopic scale, going beyond the standard measures defined on pairs of data points. This is done by moving from networks to simplicial complexes. The latter are built from elementary objects, called simplices: simple polyhedra such as points, line segments, triangles, tetrahedra, and their higher-dimensional analogues, glued together along their faces.
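A minimal sketch of this construction (the Vietoris-Rips complex, one standard way to build simplicial complexes from point clouds; the sample data and scale parameter are illustrative assumptions) connects points closer than a scale eps by edges and fills in triangles whose three edges are all present:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Sample points roughly on a circle: a data set with a 1-dimensional hole
theta = rng.uniform(0, 2 * np.pi, 20)
pts = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((20, 2))

def vietoris_rips(points, eps):
    """Simplices of a Vietoris-Rips complex up to dimension 2: edges
    between points closer than eps, and triangles whose three edges
    are all present."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    edges = [(i, j) for i, j in combinations(range(n), 2) if d[i, j] < eps]
    edge_set = set(edges)
    triangles = [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edge_set and (j, k) in edge_set and (i, k) in edge_set
    ]
    return edges, triangles

edges, triangles = vietoris_rips(pts, eps=0.6)
print(f"{len(edges)} edges, {len(triangles)} triangles at scale 0.6")
```

Varying eps and tracking which topological features persist across scales is exactly the "mesoscopic" viewpoint (persistent homology) that distinguishes TDA from pairwise network measures.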
The candidate will focus on studying the interplay between machine learning (but not only), computer simulations and statistical models: by analyzing with ML techniques the configuration space produced by a known mathematical/statistical model, we will try to identify the relevant parameters and to refine/simplify the model. The candidate will then use the acquired knowledge to infer, from big data, models whose configuration space approximates the given data, and to use these simplified models to turn correlation into causation, making our information finally "actionable". The developed framework will also be used to augment high-quality, low-quantity datasets and to work on model reduction, assessment and validation in the area of FEM.
The objective of the research activity is the definition of big data analytics approaches to analyze IoT streams for a variety of applications (e.g., sensor data streams from instrumented cars).
The following steps (and milestones) are envisioned.
Data collection and exploration. The design of a framework to store relevant information in a data lake. Heterogeneous data streams, encompassing custom proprietary data and publicly available data, will be collected in a common data repository. Tools for exploratory analysis will be used to characterize the data and drive the subsequent analysis tasks.
Big data algorithm design and development. State-of-the-art tools will be adapted, and novel algorithms will be designed, for the specific data analysis problem (e.g., to predict component failures).
Knowledge/model interpretation. Understanding a discovered behavior requires interaction with domain experts, which will allow operational validation of the proposed approaches.
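The steps above can be sketched on a single stream (the sensor, the injected fault and the rolling z-score baseline are all illustrative assumptions, chosen because such a simple model is easy for domain experts to validate):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sensor stream from an instrumented car: a stable engine
# temperature reading with a sudden step change near the end that
# mimics a component fault.
stream = 90.0 + 0.5 * rng.standard_normal(500)
stream[450:] += 6.0   # injected fault

def rolling_zscore_alerts(x, window=100, threshold=4.0):
    """Flag samples whose deviation from the trailing-window mean
    exceeds `threshold` standard deviations: a simple, interpretable
    baseline to characterize the data before more complex predictive
    models are designed."""
    alerts = []
    for t in range(window, len(x)):
        ref = x[t - window:t]
        z = (x[t] - ref.mean()) / ref.std()
        if abs(z) > threshold:
            alerts.append(t)
    return alerts

alerts = rolling_zscore_alerts(stream)
print(f"first alert at t={alerts[0]}" if alerts else "no alerts")
```

In the envisioned pipeline, such flagged events are exactly what gets taken back to domain experts for the interpretation and validation step.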
The goal of the research activities is to i) design smart data-collection crawlers that can automatically collect data from recommendation systems, i.e., smart policies to sample the humongous amount of data they potentially expose; ii) model the data using graph-based modelling, and design and test algorithms to automatically identify anomalies that reflect possible pollution of the recommendation system; iii) correlate the reputation of a business entity with its economic performance in the real world, and build models to predict how performance will evolve in response to a change in reputation on online recommendation systems.
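A minimal sketch of point ii) (the reviewer and business names, the ratings, and the deviation-based score are all hypothetical; real pollution detection would use far richer graph features): model reviews as a bipartite reviewer-business graph and flag reviewers who consistently deviate from each business's consensus rating:

```python
from collections import defaultdict

# Hypothetical review graph: (reviewer, business, stars)
reviews = [
    ("alice", "cafe_a", 4), ("alice", "cafe_b", 2), ("alice", "cafe_c", 5),
    ("bob",   "cafe_a", 5), ("bob",   "cafe_b", 1), ("bob",   "cafe_c", 4),
    ("carol", "cafe_a", 4), ("carol", "cafe_b", 2),
    # "shill" always awards 5 stars, against the consensus
    ("shill", "cafe_a", 5), ("shill", "cafe_b", 5), ("shill", "cafe_c", 5),
]

# One side of the bipartite graph: per-business rating lists
by_business = defaultdict(list)
for reviewer, business, stars in reviews:
    by_business[business].append(stars)
consensus = {b: sum(v) / len(v) for b, v in by_business.items()}

# Score each reviewer by mean absolute deviation from the consensus;
# consistently deviating reviewers are candidate polluters.
deviation = defaultdict(list)
for reviewer, business, stars in reviews:
    deviation[reviewer].append(abs(stars - consensus[business]))
scores = {r: sum(v) / len(v) for r, v in deviation.items()}

suspect = max(scores, key=scores.get)
print(f"most anomalous reviewer: {suspect} (score {scores[suspect]:.2f})")
```

On real recommendation-system data the same structure scales to graph algorithms run over the crawled dataset, with the anomaly scores feeding the reputation-vs-performance models of point iii).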