Users and Online Social Networks at the Time of COVID-19

This repository contains data and information for the paper:

M. Trevisan, L. Vassio, D. Giordano, “Users and Online Social Networks at the Time of COVID-19”, submitted to Online Social Networks and Media.

Beyond its heavy impact on healthcare, the COVID-19 pandemic is also changing people’s habits and society. Countries such as Italy stayed in lockdown for months, with most people forced to remain at home. Online social networks, more than ever, represent an alternative venue for social life, allowing users to interact and debate with each other. Hence, their study is of paramount importance to understand the impact of the pandemic and human behavior in general.

In this paper, we analyze how user habits varied during the first six months of 2020 in Italy, focusing on two popular social networks: Instagram and Facebook. We collect a large dataset including more than 54 million comments on over 140 thousand posts, covering the period surrounding the lockdown in Italy and focusing on the top public figures in Italy.
We analyze and compare user engagement and participation, providing quantitative figures about their activity. In particular, we show how the nature and the volume of interactions evolved during and after the lockdown, with a general growth of activity and a sizable shift in the daily and weekly patterns. We also analyze the users’ sentiment through the psycholinguistic properties of comments, and document the rapid rise and decline of topics related to the pandemic. To support new analyses, we make our anonymized dataset available here.

Open dataset (New)

Our dataset is open to anyone interested in reproducing our results or performing further analysis. 

As explained in the paper, commenter IDs and comment text are the most privacy-sensitive information in the data. Hence, commenter IDs are anonymized immediately by the collection crawler. In the public dataset linked above, the comment text is also removed so as not to harm the users’ privacy. To retain information about each comment and let other researchers reproduce our results, we include metadata, namely LIWC scores, the presence of trending topics, the comment length, and the presence of external links.

To detect trending topics we used the following keywords:

Covid: "covid", "coronavirus", "epidemia", "pandemia", "contagio", "quarantena", "virus", "lockdown", "iorestoacasa".
Bakery: "lievito", "pizza", "impast".
Remote Working: "smart work", "microsoft team", "webex", "zoom", "skype", "hangout", "videoconf", "video conf", "telelavor", "confcall".
Conspiracy: "bill gates", "5g", "chip", "illuminat", "ordine mondiale", "complott", "scie chimiche".
Dole: "inps", "bonus", "indennit", "600".
School: "scuol", "scolastic", "universit", "docent", "professor", "aula", "banchi", "maestr".
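The keyword lists above can be applied to a comment with a simple matching routine. The following is a minimal sketch, not necessarily the procedure used in the paper: it assumes case-insensitive substring matching (several keywords, such as "impast" or "illuminat", look like word stems), and the names TOPIC_KEYWORDS and detect_topics are hypothetical. Note that plain substring matching on short keywords such as "5g", "chip" or "600" may produce false positives.

```python
from typing import Dict, List, Set

# Keyword lists copied from this README. The matching strategy below
# (lowercasing + substring search) is an assumption, not the paper's
# documented method.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "Covid": ["covid", "coronavirus", "epidemia", "pandemia", "contagio",
              "quarantena", "virus", "lockdown", "iorestoacasa"],
    "Bakery": ["lievito", "pizza", "impast"],
    "Remote Working": ["smart work", "microsoft team", "webex", "zoom",
                       "skype", "hangout", "videoconf", "video conf",
                       "telelavor", "confcall"],
    "Conspiracy": ["bill gates", "5g", "chip", "illuminat",
                   "ordine mondiale", "complott", "scie chimiche"],
    "Dole": ["inps", "bonus", "indennit", "600"],
    "School": ["scuol", "scolastic", "universit", "docent", "professor",
               "aula", "banchi", "maestr"],
}


def detect_topics(comment: str) -> Set[str]:
    """Return the set of topics with at least one keyword in the comment."""
    text = comment.lower()
    return {
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


if __name__ == "__main__":
    # A comment can match several topics at once.
    print(sorted(detect_topics("Oggi pizza fatta in casa durante la quarantena")))
```

Because the public dataset does not include the comment text, a routine like this is useful only on newly collected comments; for the released data, the per-comment topic flags are already provided as metadata.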