Big Spatio-Temporal Data Analytics

PhD in Computer and Control Engineering


Paolo Garza

PhD Student: Luca Colomba

Context of the research activity

The amount of spatio-temporal data has increased rapidly in recent years, ranging from satellite images to ground-based sensor measurements. This large volume of heterogeneous data can be profitably exploited in several domains, such as natural hazard prevention, pollution analysis, and traffic planning.

We are currently overloaded by spatio-temporal data. For instance, the Copernicus Earth observation programme, supported by the European Commission, collects more than 12 terabytes of data per day.

To transform this overload of data into valuable knowledge, we need to (i) integrate several sources, (ii) design data tailoring techniques that select the data relevant to the target analysis, and (iii) design data mining algorithms that extract knowledge offline or in near-real time. Current analyses mainly focus on a single type of data/source at a time (e.g., satellite images or ground-based measurements). Integrating several sources into big data analytics systems capable of building accurate predictive and descriptive models will provide effective support in several application domains.
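As an illustration of steps (i) and (ii), the sketch below is a minimal, hypothetical example in plain Python (a real system would use a framework such as Spark): it aligns two heterogeneous sources, satellite-derived measurements and ground-sensor readings, on a shared spatial grid cell and hourly time bucket, so that only co-located, co-temporal records are paired for analysis. All record values and the grid resolution are made up for illustration.

```python
from collections import defaultdict

GRID = 0.25  # grid cell size in degrees (hypothetical resolution)

def bucket(lat, lon, hour):
    """Map a reading to a (cell_x, cell_y, hour) key used for joining."""
    return (int(lat // GRID), int(lon // GRID), hour)

# Hypothetical records: (lat, lon, hour, value)
satellite = [(45.05, 7.65, 10, 0.82), (45.30, 7.70, 10, 0.40)]
sensors = [(45.06, 7.66, 10, 21.5), (46.00, 8.00, 11, 19.0)]

def integrate(src_a, src_b):
    """Join two sources on the shared spatio-temporal bucket."""
    index = defaultdict(list)
    for lat, lon, h, v in src_a:
        index[bucket(lat, lon, h)].append(v)
    joined = []
    for lat, lon, h, v in src_b:
        for va in index.get(bucket(lat, lon, h), []):
            joined.append((bucket(lat, lon, h), va, v))
    return joined

pairs = integrate(satellite, sensors)
```

Only the first sensor reading shares a cell and hour with a satellite record, so `pairs` contains a single integrated tuple; the bucketing function doubles as the data tailoring step, since buckets outside a region of interest can simply be filtered out.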

The PhD candidate will design, implement and evaluate novel big spatio-temporal data analytics solutions. 


The main objective of the research activity will be the design of big data analytics algorithms and systems for the analysis of heterogeneous big spatio-temporal data (e.g., satellite images, sensor measurements), with the aim of generating predictive and descriptive models.

The main issues that will be addressed are the following.

Scalability. The amount of big spatio-temporal data has increased significantly in recent years, and individual items can be very large (e.g., satellite images). Hence, big data solutions must be exploited to analyze them, particularly when historical data analyses are performed.

Heterogeneity. Several heterogeneous sources are available. Each source represents a different facet of the analyzed events and provides important insights into them. The efficient integration of the available spatio-temporal data sources is an important issue that must be addressed in order to build more accurate predictive and descriptive models.

Near-real time constraint. In several domains, timely responses are needed. For example, to effectively tackle natural hazards and extreme weather events, timely responses are needed to plan emergency activities. Moreover, large amounts of streaming and time series spatial data are generated (e.g., environmental measurements), and their integration and analysis are extremely useful. Current big data streaming systems (e.g., Spark and Storm) provide limited support for real-time and incremental data mining and machine learning algorithms. Hence, novel algorithms must be designed and implemented.
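A basic building block for such incremental algorithms is an online statistic that is updated per record in constant memory, without revisiting history. The sketch below (plain Python with hypothetical sensor values; in a real deployment this state would live inside a streaming engine) maintains a running mean and variance with Welford's online algorithm.

```python
class OnlineStats:
    """Welford's online algorithm: constant-memory mean/variance
    updates, suitable for unbounded streams of sensor measurements."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = OnlineStats()
for reading in [20.1, 20.4, 19.8, 20.0]:  # hypothetical sensor stream
    stats.update(reading)
```

Because each update touches only three scalars, the same pattern scales to many sensors by keeping one such state object per spatial cell, which is the kind of incremental primitive that current streaming systems leave to the developer.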

Skills and competencies for the development of the activity

The candidate should have excellent programming skills, good knowledge of scalable big data frameworks (Hadoop and Spark) and the associated programming paradigms, and good knowledge of traditional data mining and machine learning algorithms.

Further information about the PhD program at Politecnico can be found here
