Five Years at the Edge: Watching Internet from the ISP Network

This repository contains the data and supporting information for the paper Five Years at the Edge: Watching Internet from the ISP Network, by Martino Trevisan, Danilo Giordano, Idilio Drago, Marco Mellia, Maurizio Munafò, ACM CoNEXT 2018, Heraklion, Crete.

In the paper, we provide an in-depth longitudinal view of Internet traffic. We take the point of view of a nationwide ISP and analyze 5 years of rich flow-level measurements (about 250 billion traffic records). We evaluate the providers’ costs in terms of traffic consumption imposed by users and services. We show that an ordinary broadband subscriber nowadays downloads more than twice as much as they did 5 years ago. Bandwidth-hungry video services drive this change, while social messaging applications boom (and vanish) at an incredible pace. We study how protocols and service infrastructures evolve over time, highlighting unpredictable events that may hamper traffic management policies. In the rush to bring servers closer and closer to users, we witness the birth of the sub-millisecond Internet, with caches located directly at the ISP edge. The picture we take shows a lively Internet that constantly evolves and suddenly changes.

 

A: YouTube starts serving videos over HTTPS; B: Google starts deploying QUIC in the wild; C: Time for SPDY; D: Google disables QUIC for security issues; E: the birth of HTTP/2; F: Facebook comes out with its own transport protocol, FB-ZERO.

The dataset

Rise and death of Snapchat
Instagram popularity growth
Exponential growth of Instagram traffic

We cannot disclose the whole dataset due to its size (more than 40 TB of compressed data) and for obvious privacy reasons. Instead, we share the aggregated data here. Researchers interested in accessing the raw data can contact us to discuss what kind of data we can share and under which policy.

The following tables report the average amount of data exchanged with each service every day:

The following tables report the percentage of users who visited each service every day:

  • ADSL
  • FTTH

Enjoy!

The rules

Below you can find the regular expressions we used to extract the service from the raw data. We list the regexes for each service.

Audio

spotify: "\.spotify\.com$", "\.scdn\.co$", "\.scdn\.com$"
deezer: "\.deezer\.com$", "\.dzcdn\."

Video

youtube: "\.googlevideo\.com$", "\.ytimg\.com$", "\.youtube\.com$", "\.gvt1\.com$", "\.youtube-nocookie\.com$"
netflix: "\.netflix\.", "\.nflxext\.", "\.nflximg\.", "\.nflxvideo\.", "\.nflxso\."
vimeo: "vimeo\.com$", "\.vimeocdn\.com$", "vimeopro\.com$"
adult: "porn", "\.ypncdn\.", "\.phncdn\.", "\.xvideos\.", "\.megasesso\.", "\.xnxx\.", "\.livejasmin\.", "\.xhamster\.", "imlive\.com$", "\.youjizz\.com$", "\.hclips\.com$", "culonudo", "tnaflix\.com$"
rai: "\.rai\.it$", "\.raiplay\.it$", "\.raiplayradio\.it$", "\.raitalia\.it$", "everyrai-", "[0-9]sspushrai[0-9]-", "$rai*.akamaihd\.net$"
mediaset: "\.mediaset\.it$", "\.mediasetpremium\.it$", "\.mediaset\.net$", "msp\.ticdn\.it$", "^rtinfinity", "msf\.ticdn\.it"
sky: "\.sky\.it$", "\.sky\.com$", "skylivehssctv\.cdn\.fastweb\.it", "sky\.ticdn\.it", "skyvodabr\.cdn\.fastweb\.it", "skylivehssctv\.cdn\.fastweb.it$"

Social

facebook: "\.facebook\.com$", "\.fbcdn\.net$", "\.facebook\.net$", "^fbcdn", "^fbstatic", "^fbexternal", "\.fbsbx\.com$"
twitter: "\.twitter\.", "\.twimg\.", "^twitter\.com$", "twitter\.com\.edgesuite\.net", "twitter-any\.s3\.amazonaws\.com", "twitter-blog\.s3\.amazonaws.com"
linkedin: "\.linkedin\.com$", "\.licdn\.com$", "\.lnkd\.in$"
instagram: "\.instagram\.com$", "\.cdninstagram\.com$", "^igcdn"

Search engine

google: "^www\.google\.it$"
bing: "\.bing\.com$"
yahoo: "\.yahoo\.com$", "\.yahoo\.net$", "\.yimg\.com$"
duckduck: "\.duckduckgo\."

Ecommerce

ebay: "\.ebay\.", "\.ebaystatic\.com$", "\.ebayimg\.com$", "\.ebayrtm\.com$", "\.ebaydesc\.com$", "\.ebayinc\.com$"
amazon: "\.amazon\.it", "\.fls-eu.amazon\.", "\.images-amazon\.com$", "images-eu\.amazon\.com$"
alibaba: "\.alibaba\.com$", "\.alicdn\.com$", "\.taobao\.com$"

Chat

whatsapp: "\.whatsapp\.com$", "\.whatsapp\.net$"
telegram: "\.telegram\.org$", "^telegram\.org$"
viber: "\.viber\.", "^viber\.kayako\.com$"
snapchat: "\.snapchat\.com$", "feelinsonice\.appspot\.com$", "feelinsonice-hrd\.appspot\.com$", "feelinsonice\.l\.google\.com$"
skype: "\.skypeassets\.com$", "\.skype\.com$", "\.skype\.net$"
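
As an illustration of how rules like the ones above can be applied, here is a minimal Python sketch. It rests on assumptions that this page does not state: that each regex is matched against a lowercased server domain name with search semantics (the pattern may match anywhere unless it is anchored with ^ or $), and that the first service with a matching rule wins. The names RULES, COMPILED and classify are hypothetical, and only a handful of the rules are copied in to keep the example short.

```python
import re

# A small subset of the rules listed above, copied verbatim.
RULES = {
    "spotify": [r"\.spotify\.com$", r"\.scdn\.co$", r"\.scdn\.com$"],
    "youtube": [r"\.googlevideo\.com$", r"\.ytimg\.com$", r"\.youtube\.com$",
                r"\.gvt1\.com$", r"\.youtube-nocookie\.com$"],
    "instagram": [r"\.instagram\.com$", r"\.cdninstagram\.com$", r"^igcdn"],
    "telegram": [r"\.telegram\.org$", r"^telegram\.org$"],
    "google": [r"^www\.google\.it$"],
}

# Compile every pattern once so that classifying many names stays cheap.
COMPILED = {
    service: [re.compile(pattern) for pattern in patterns]
    for service, patterns in RULES.items()
}


def classify(domain):
    """Return the first service with a rule matching `domain`, else None.

    Assumption: rules are applied with search semantics to the
    lowercased domain name, and the first matching service wins.
    """
    name = domain.lower()
    for service, patterns in COMPILED.items():
        if any(pattern.search(name) for pattern in patterns):
            return service
    return None


if __name__ == "__main__":
    for name in ["www.youtube.com", "telegram.org", "example.org"]:
        print(name, "->", classify(name))
```

Under these assumptions, a rule such as "\.youtube\.com$" matches www.youtube.com but not the bare name youtube.com, because the pattern requires a dot before "youtube". Services that should also match the bare domain carry an explicit anchored rule, as telegram does with "^telegram\.org$".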

 

 
