Homological summarization of high dimensional static and dynamic simplicial data

PhD Program in Computer and Control Engineering

Supervisors

Francesco Vaccarino – francesco.vaccarino@polito.it
Paolo Garza – paolo.garza@polito.it)
Francesco Bonchi – francesco.bonchi@isi.it


Phd Student: Alessandro De Gregorio

Context of the research activity

While network approaches can describe the fabric of relations between agents in a systems or molecules in an organism, they are constrained in their descriptive power to pairwise interactions (i.e. edges), which might not always be justified when focusing on phenomena that involve group dynamics (e.g. scientific collaboration, genetic pathways) or higher-order descriptions (e.g. viral evolution, molecule folding).
Against this background, a set of new techniques for data analysis, based on a set-theoretic (hence topological and geometry-independent) formalism, has been gaining traction over the last decade and has come to be collectively referred to as Topological Data Analysis (TDA).

The novelty of TDA is that it studies the shape of topological spaces at the mesoscopic scale by going beyond the standard measures defined on data points’ pairs. This is done by moving from networks to simplicial complexes. The latter are obtained from elementary objects, called simplices, built from such simple polyhedral as points, line segments, triangles, tetrahedra, and their higher dimensional analogues glued together along their faces.

TDA is still very much developing as a branch of data science. While it provides a new paradigm, based in algebraic topology, to how we think about data, and has obtained its first successes, there still are many challenges to be met to fully exploit its potential. The most pressing one is the computational scalability of persistent homology which still prevents large-scale applications. Classic algorithms for topological features extraction had a memory and time complexity which scales exponentially on the number of simplices. On the other hand, large-scale static and dynamic graphs can be challenging to process and store, due to their size and the continuous change of communication patterns between nodes. The problem of summarizing largescale dynamic graphs, maintaining the evolution of their structure and the communication patterns has been successfully addressed by F. Bonchi and cooperators. Their approach was based on grouping the nodes of the graph in supernodes according to their connectivity and communication patterns. The resulting summary graph preserves the information about the evolution of the graph within a time window. They also proposed two online, distributed, and tunable algorithms for summarizing this type of graphs.

In this context, the research activities of the candidate will focus on exploring to which extent it will be possible to adapt to the TDA setting techniques developed in studying large-scale static and dynamic graphs.

Objectives

The goal of the research activity is threefold:

  1. To find new efficient ways to represents in a compressed way simplicial complexes losing in a
    controllable way the topological information stored thereby;
  2. To define new algorithms to find “canonical bases” of the homology groups, i.e. the
    harmonics, which, in the one dimensional case are nothing but the (polygonal) cycles of the
    underlying graph.
  3. Apply 1 and 2 to real data

At the end, the candidate is expected to maturate competences in applied mathematics and computer science. They will gain a strong expertise in topological data analysis applied to big data, design of algorithms for graph and simplicial complexes, competences in big data handling and processing.

Skills and competencies for the development of the activity

The candidate is required to have very good competences in linear algebra, topology, geometry and discrete mathematics, experience in algorithm design/analysis and good programming skills.

 

Further information about the PhD program at Politecnico can be found here

Back to the list of PhD positions