Data and Information Extraction from multiple data sources for transport Innovation and Sustainability (Mobility as a Service)

PhD in Urban and Regional Development


Cristina Pronello
Silvia Chiusano
Giovanni Malnati

PhD Student: Pinky Kumawat

Context of the research activity

Nowadays, the massive availability of data from the World Wide Web makes it possible to track words as well as locations, which are analysed and matched across several databases. This allows people’s activities to be predicted and makes expensive and time-consuming statistical surveys obsolete (Hilbert, 2013).
Today there are several data sources useful for the project: open data as well as directed, automated or volunteered sources (Kitchin, 2014), including mobile phone data. The impressive growth in the volume of data generated annually (Manyika et al., 2011) should have greatly improved knowledge of urban mobility. However, such knowledge is still scarce and fragmented, often too aggregated, and either tied to specific goals or “spot-based” (e.g. automated sources such as traffic sensors, cameras and automatic counters), while the use of data from social media is still in its infancy. Such data could help in understanding the reasons behind travel behaviour, which until now have been investigated through ad hoc surveys that cannot reach a large number of people.
Some previous studies exploited data mining techniques for both understanding and profiling the dynamics of people’s mobility, a crucial issue for transport management (Zheng et al., 2014). To this end, some approaches analysed information extracted from mobile call data, possibly coupled with spatio-temporal data on trips gathered from GPS-equipped devices (Nanni et al., 2014; Gabrielli et al., 2014).
The Web 2.0 phenomenon then gave rise to social media and, with them, to the “creative consumers” (Berthon et al., 2012). Data published on social media can be mined to extract relevant information; for example, data from Twitter, currently the leading microblogging social network, can be used to understand mobility patterns. Both the spatial and temporal information associated with tweets and their textual content have been used in such analyses. For traffic and navigation analysis, spatio-temporal patterns have been mined from large tweet collections to identify mobility hotspots, evaluate urban flows, and discover traffic anomalies (Wei et al., 2012). Tweet data have also been analysed to profile user behaviours and activities and to discover personal places of interest (Ferrari et al., 2013).
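As an illustration of the hotspot identification mentioned above, the following is a minimal sketch, not the method used in the cited studies. It assumes geotagged posts are available as (latitude, longitude, hour) tuples and simply counts posts per square grid cell; the sample coordinates, the cell size and the function names are hypothetical.

```python
from collections import Counter
from math import floor

# Hypothetical sample of geotagged posts: (latitude, longitude, hour of day).
# The values are invented for illustration only.
POSTS = [
    (45.0625, 7.6785, 8),
    (45.0629, 7.6781, 8),
    (45.0621, 7.6789, 9),
    (45.0703, 7.6869, 18),
    (45.0707, 7.6862, 18),
    (45.0301, 7.6402, 13),
]


def grid_cell(lat, lon, cell_deg=0.005):
    """Map a coordinate to the integer index of a square grid cell."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def hotspots(posts, min_count=2, cell_deg=0.005):
    """Return grid cells holding at least min_count posts, busiest first."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon, _ in posts)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    for cell, n in hotspots(POSTS):
        print(cell, n)
```

A fixed grid is the simplest possible choice; density-based clustering would be a natural alternative when hotspots have irregular shapes.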
Rudat et al. (2014) have shown how users’ behaviour when posting messages depends on the audience to whom the message is addressed; thus, correctly interpreting social media content and being able to assign that content to a topic of discussion is a key element for the proper use of such data (Steiger et al., 2015).
Concerning textual analysis, the literature offers some interesting examples regarding the classification of moods in tweets (Silva et al., 2014; Baecchi et al., 2015), the location of events (Paltoglou, 2015) and the evaluation of the quality of services (Thakor and Sasi, 2015). However, there are still few studies aimed at extracting mobility patterns from social media. Some studies focus only on the geo-localisation of data coming from social networks (Hasan et al., 2013), and only in a few cases is a linguistic and ontological analysis of the content carried out (Maghrebi et al., 2015). Even the transport companies that use social networks as a communication channel, spreading real-time information about their service to their customers (Chan and Schofer, 2014; Gal-Tzur et al., 2014), do not analyse such data.
Understanding the factors that influence mobility patterns and travel behaviour is the key to ensuring the acceptance of innovations and services that could redirect mobility patterns towards more sustainable behaviours and optimise investments in transport systems.


The research aims to exploit the full potential of big data to describe mobility patterns, extracting information from existing massive data sources and combining it with a context-specific understanding of human behaviour, in order to analyse the different ways people interact with one another (Onnela, 2011). Through an innovative approach that is less invasive than current travel surveys, the project will provide a cloud-based framework for collecting, analysing, and extracting urban mobility information from several massive data sources. Such information is useful for several purposes: the planning and programming of public transport as well as the control of the quality of its service; the management of mobility; the supply of new services for customers; and the study of the complexity of the interaction between information and travel behaviour.
To this end, the research will be organised according to two key objectives:

  1. analysis, integration, and extraction of information from social media, mobile devices and network operators, aimed at building mobility datasets/patterns as well as behavioural and interaction patterns;
  2. evaluation of the collected data in terms of: a) quality and reliability; b) mobility patterns and clusters of users; c) definition of Key Performance Indicators (KPIs) useful to decision makers for improving services in transport systems; d) identification of new mobility services (Mobility as a Service).
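
To make the notion of a mobility dataset more concrete, the following is a minimal sketch, under stated assumptions, of how origin-destination counts could be derived from time-stamped location records. The record layout (user, timestamp in minutes, zone label), the sample data and the function name are hypothetical and are not part of the project description.

```python
from collections import Counter, defaultdict

# Hypothetical records: (user_id, timestamp in minutes, zone label).
# The values are invented for illustration only.
RECORDS = [
    ("u1", 480, "A"), ("u1", 510, "B"), ("u1", 1080, "A"),
    ("u2", 495, "A"), ("u2", 530, "B"),
    ("u3", 600, "C"), ("u3", 640, "C"),
]


def od_counts(records):
    """Count zone-to-zone moves between consecutive records of each user."""
    by_user = defaultdict(list)
    for user, ts, zone in records:
        by_user[user].append((ts, zone))
    counts = Counter()
    for sightings in by_user.values():
        sightings.sort()  # order each user's records by time
        for (_, origin), (_, dest) in zip(sightings, sightings[1:]):
            if origin != dest:  # skip records that stay in the same zone
                counts[(origin, dest)] += 1
    return counts


if __name__ == "__main__":
    for (origin, dest), n in od_counts(RECORDS).items():
        print(origin, "->", dest, n)
```

Aggregated counts of this kind could feed the evaluation of mobility patterns and the definition of KPIs, although real data would first need the quality and reliability checks listed in the second objective.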


Skills and competencies for the development of the activity

The candidate should have excellent programming skills and a good knowledge of machine learning algorithms and of MATLAB.


Further information about the PhD program at Politecnico can be found here
