Deep learning models for next-generation video codec

PhD in Electrical, Electronics and Communications Engineering

Funded by RAI Radiotelevisione Italiana (Italy)


Enrico Magli –
Roberto Iacoviello – RAI

Context of the research activity

Deep learning has recently emerged as a powerful approach to complex tasks such as object and image recognition and classification, speech recognition, and natural language processing.
Deep learning architectures aim to extract hierarchical, compact representations of data from the analysis of large-scale databases of raw content (images and videos), using deep networks with multiple layers of nonlinear transformations.
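As a purely illustrative sketch of the idea, the forward pass below maps an image patch through two nonlinear encoder layers to a compact code and back. The layer sizes and random (untrained) weights are assumptions chosen for the example, standing in for parameters that would be learned from a large database:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def dense(n_in, n_out):
    # Random weights stand in for parameters that would be learned
    # from a large-scale database of raw content.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)

# Encoder: two nonlinear layers map a flattened 8x8 patch (64 values)
# to a compact 8-dimensional code; the decoder mirrors the encoder.
W1, W2 = dense(64, 32), dense(32, 8)
W3, W4 = dense(8, 32), dense(32, 64)

patch = rng.random(64)                # flattened 8x8 image patch
code = relu(relu(patch @ W1) @ W2)    # compact hierarchical representation
recon = relu(code @ W3) @ W4          # reconstruction from the code

print(code.shape, recon.shape)        # (8,) (64,)
```

Transmitting the 8-value code instead of the 64 pixels is, in miniature, the compression gain such architectures aim for; real codecs would train the weights end to end under a rate-distortion objective.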
On the other hand, traditional image and video compression algorithms (High Efficiency Video Coding – HEVC – and its successor, the Joint Exploration Model – JEM) have some limitations: they are agnostic to the semantics of the data being compressed, since they operate at a low level (i.e., the pixel level), and they rely on exhaustive search for rate-distortion optimization.
While some experiments on deep-learning compression of still images exist in the literature, there are very few examples of deep networks applied to the compression of video sequences (i.e., networks that are 'temporally deep').
This project aims to devise new approaches and technologies for data compression using deep learning models, and to show their potential to produce more accurate and visually pleasing reconstructions at much higher compression ratios, for both image and video data.
In this context, it must also be taken into account that subjective video quality assessment is demanding in terms of time and complexity, and does not scale.
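Objective metrics are the usual scalable complement to subjective testing; the simplest is PSNR, sketched below in NumPy. The random test images are purely illustrative:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64))
# Simulate mild coding noise on top of the reference.
noisy = np.clip(ref + rng.normal(0.0, 5.0, size=ref.shape), 0, 255)

print(round(psnr(ref, noisy), 1))
```

PSNR is cheap to compute but correlates imperfectly with perceived quality, which is precisely the gap that learned quality-of-experience metrics would target.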


The objective of this Ph.D. project is to advance image/video compression by exploiting recent advances in machine learning, and in particular to develop deep learning techniques for next-generation video coding. The following list shows a few examples of approaches that could be investigated during the project:

  • Novel image and video reconstruction algorithms providing high image quality with manageable complexity
  • New deep-learning-based methods to evaluate quality of experience, possibly working in real time and for a broad range of video types
  • Use of data augmentation and transfer learning for automated ground-truth generation
  • Reduction of compression artifacts (visual quality improvement)
  • Smart bit allocation driven by visual saliency maps correlated with human visual perception
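As a toy illustration of the last item, a saliency-weighted distortion measure biases rate-distortion decisions toward regions viewers actually attend to. The maps and weighting scheme in this NumPy sketch are hypothetical assumptions, not part of the project:

```python
import numpy as np

def saliency_weighted_mse(ref, recon, saliency):
    # Normalize the saliency map into weights summing to 1, then weight
    # each pixel's squared error by how much attention it attracts.
    w = saliency / saliency.sum()
    return float(np.sum(w * (ref - recon) ** 2))

ref = np.zeros((4, 4))
saliency = np.ones((4, 4))
saliency[:2, :2] = 10.0  # viewer's attention focused on the top-left region

# Two reconstructions with an identical plain-MSE error magnitude:
err_salient = ref.copy(); err_salient[0, 0] = 4.0        # error where people look
err_background = ref.copy(); err_background[3, 3] = 4.0  # same error, unnoticed region

print(saliency_weighted_mse(ref, err_salient, saliency) >
      saliency_weighted_mse(ref, err_background, saliency))  # True
```

An encoder minimizing this weighted distortion would spend more bits on the salient region, which is the intuition behind perceptually driven bit allocation.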

The project may also provide contributions to international bodies such as MPEG or AOM (Alliance for Open Media) for the development of next-generation video codecs.

Skills and competencies for the development of the activity

Suitable candidates should have the following skills:

  • Very good programming skills in Matlab and C/C++ languages
  • A strong mathematical and signal processing background
  • Good image and video processing skills (transforms, image processing operators, statistical signal processing/machine learning)


Further information about the PhD program at Politecnico can be found here
