In recent years, biological research has been revolutionized by experimental high-throughput techniques. Unprecedented amounts of data are accumulating, creating an urgent need for data-driven modeling approaches that unveil the information hidden in raw data and thereby increase our understanding of complex biological systems. To give a specific example, proteins show a remarkable degree of structural and functional conservation over billions of years of evolution, despite the large variability of their amino-acid sequences. I will present recent developments around the so-called Direct-Coupling Analysis (DCA), a statistical-inference approach linking sequence variability to protein structure and function. I will show that DCA can be used (i) to infer contacts between residues, and thus to guide 3D-structure prediction of proteins and their complexes, and (ii) to reconstruct mutational landscapes, and thus to predict the effects of mutations. Beyond these direct inference tasks, I will present evidence that our models can be used (iii) to develop novel approaches to data-driven de novo protein design.
Since 2011, Martin Weigt has been Professor of Bioinformatics at Université Pierre et Marie Curie / Sorbonne Université in Paris, France, where he has built up the “Statistical Genomics and Biological Physics” team. In recent years he has dedicated his research to the development of innovative statistical-inference methods for molecular biology, drawing inspiration from his statistical-physics background. His most significant contributions in the context of this project have been the above-mentioned Direct-Coupling Analysis, its application to protein and RNA structure, and, more generally, statistical approaches to the inference of regulatory and signaling networks.