From data science algorithms to data “storyboarding”

PhD program in Computer and Control Engineering

Supervisors

Tania Cerquitelli – tania.cerquitelli@polito.it

PhD Student: Paolo Bethaz

Context of the research activity

In today’s world, vast amounts of data are available for analysis, and extracting useful insights from them has become a fundamental task in many sectors. However, the value extraction process requires a data scientist with significant expertise in the field, who often spends a long time finding a good trade-off between the efficiency of data-driven algorithms and the quality of the discovered insights, which should be easy to translate into actions.

To streamline the knowledge extraction process and make data analytics tasks more user-friendly, the Ph.D. student will design and develop a new generation of data analytics solutions that automatically discover descriptive, predictive, and prescriptive models hidden in the data without requiring the intervention of a data scientist. A key feature of such solutions will be the automation of the full data-analytics workflow, from heterogeneous data ingestion to result presentation. The proposed solutions will automatically set the analysis end goal, perform the whole process, and build and visualize the data storyboard in a human-friendly and actionable way. End users, such as domain experts, will be provided with the “story” of their data:

  • automatically orchestrated by picking the most meaningful results among the plethora of parameters, choices, and tricks;
  • carefully presented to make the knowledge human-readable and exploitable, so that domain experts can focus on putting such knowledge into actions.

Objectives

To automatically create the data storyboard in a human-friendly and actionable way, the following research objectives will be addressed:

Data characterization.

To understand structures and models hidden in a data collection, a set of descriptive metrics will be defined by exploiting unconventional statistical indexes and new algorithms to model underlying data structures.

Algorithm selection and optimization.

In the literature, several alternative algorithms are available for performing a given data mining task, and in most cases, no algorithm is universally superior. To automatically drive the exploration of the search space at different levels (e.g., algorithm class, implementation, parameter setting), a set of interestingness metrics will be studied to evaluate and compare the meaningfulness of the discovered knowledge.
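The exploration described above can be illustrated with a minimal sketch. It assumes a toy one-dimensional grouping task, two hypothetical candidate algorithms (`equal_width_bins` and `equal_frequency_bins`), and a made-up interestingness metric (share of variance explained, with a small penalty per group). None of these names or choices come from the project itself; they only show how a search over algorithm class and parameter setting could be driven by a single comparable score.

```python
from itertools import product
from statistics import mean


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k groups of roughly equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


def interestingness(values, labels, penalty=0.05):
    """Hypothetical metric: explained variance minus a penalty per group."""
    overall = mean(values)
    total = sum((v - overall) ** 2 for v in values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    within = sum(sum((v - mean(vs)) ** 2 for v in vs) for vs in groups.values())
    explained = 1 - within / total if total else 0.0
    return explained - penalty * len(groups)


# Search space: algorithm class x parameter setting (both illustrative).
ALGORITHMS = {
    "equal_width": equal_width_bins,
    "equal_frequency": equal_frequency_bins,
}


def select_configuration(values, algorithms, ks):
    """Score every (algorithm, k) pair and return the best (score, name, k)."""
    scored = [
        (interestingness(values, algo(values, k)), name, k)
        for (name, algo), k in product(algorithms.items(), ks)
    ]
    return max(scored)


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 9.0, 9.1, 8.8]  # made-up sample
    print(select_configuration(data, ALGORITHMS, ks=[2, 3, 4]))
```

An exhaustive grid is used here only because the toy search space is tiny; a realistic space would need a smarter exploration strategy, and the metric itself is one of the things the research is meant to study.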

Knowledge navigation, visualization, and exploitation.

The data mining process performed on databases may lead to the discovery of huge amounts of apparent knowledge that is usually hard to harness. As a result, in-depth analysis may be required to pick out the most actionable and meaningful pieces. The characterization of knowledge significance in terms of innovative, possibly unconventional, transparent criteria will be addressed to rank the results so that the most relevant pieces of information can emerge. Data “stories” based on innovative visualization frameworks will be designed to help end users capture the full data processing flow and foster actionable knowledge exploitation.
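As one possible illustration of ranking results by transparent criteria, the sketch below assumes the discovered knowledge takes the form of association rules and ranks them by lift after filtering on a minimum support. The `Rule` class, the `rank_rules` helper, the threshold, and the sample rules are all hypothetical; the standard definitions of confidence and lift are used as stand-ins for the criteria the project would actually define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    support: float             # fraction of records matching both sides
    antecedent_support: float  # fraction of records matching the antecedent
    consequent_support: float  # fraction of records matching the consequent

    @property
    def confidence(self) -> float:
        """How often the consequent holds when the antecedent holds."""
        return self.support / self.antecedent_support

    @property
    def lift(self) -> float:
        """Confidence relative to the baseline frequency of the consequent."""
        return self.confidence / self.consequent_support


def rank_rules(rules, min_support=0.05, top_n=3):
    """Drop rare rules, then sort by lift (ties broken by support)."""
    kept = [r for r in rules if r.support >= min_support]
    return sorted(kept, key=lambda r: (r.lift, r.support), reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up rules, for illustration only.
    rules = [
        Rule("weekday", "high_load", 0.30, 0.40, 0.70),
        Rule("sensor_fault", "downtime", 0.10, 0.12, 0.25),
        Rule("rare_event", "alarm", 0.01, 0.01, 0.05),
    ]
    for r in rank_rules(rules):
        print(f"{r.antecedent} -> {r.consequent}: lift={r.lift:.2f}")
```

Each ranking criterion here is a simple, inspectable formula, which is what keeps the ordering explainable to a domain expert; the support filter prevents very rare rules with inflated lift from dominating the top of the list.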

Self-learning methodologies based on human-in-the-loop.

To enhance the self-learning capabilities of the proposed solutions, user interactions with the presented data “stories” will be used to collect feedback and re-train models for the discovery of interesting end goals. By providing a way of exploiting user interactions, the overall system can be easily customized and adapted to different application scenarios, while the overall design is kept as general-purpose as possible.

Skills and competencies for the development of the activity

The candidate should have a good background in machine learning, data science, descriptive, predictive, and prescriptive analytics, and self-tuning strategies for automatically configuring algorithm parameters, as well as good programming skills.

Further information about the PhD program at Politecnico can be found here
