Creating hyper-realistic digital twins of social networks by combining data mining, natural language processing and agent-based modelling
Social media has become a primary source of information and is the “go-to” place for people’s daily news consumption. Understanding how information spreads on these platforms is a critical research endeavour, with applications ranging from online marketing to the spread of misinformation and the formation and bursting of social-network-driven financial bubbles (e.g., GameStop). Compared to traditional news outlets, the distribution of information on social media is decentralized and depends on the sharing behaviour of every single entity in the network. The spread of information online is thus an emergent property of agents’ micro-actions, which makes agent-based modelling an ideal approach for modelling the dynamics of information spread in social media networks.
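To illustrate what "emergent property of agents' micro-actions" means in practice, the following is a minimal sketch of an agent-based cascade, not the model used in this work. It assumes a hypothetical follower graph (`followers`) and a hypothetical per-agent resharing probability (`share_prob`); every exposure to a post gives a follower one independent chance to reshare it. All names and numbers are illustrative assumptions.

```python
import random


def simulate_cascade(followers, share_prob, seeds, rng):
    """Return the set of agents that end up sharing a post.

    followers:  dict mapping each agent to the agents who see its posts
                (hypothetical toy network, not real data).
    share_prob: dict mapping each agent to its probability of resharing
                a post each time it is exposed to it (assumed values).
    seeds:      agents that share the post initially.
    rng:        a random.Random instance, passed in for reproducibility.
    """
    shared = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for agent in frontier:
            for follower in followers.get(agent, []):
                # Micro-action: an exposed agent decides whether to reshare.
                if follower not in shared and rng.random() < share_prob[follower]:
                    shared.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return shared


# Toy usage: four agents, "a" posts first.
followers = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
share_prob = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
reached = simulate_cascade(followers, share_prob, ["a"], random.Random(42))
print(sorted(reached))
```

No agent in this sketch knows the size of the cascade; the overall reach arises only from the individual resharing decisions and the network structure.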
However, while agent-based models (ABMs) have a clear appeal in this situation (as they allow the underlying processes to be modelled exactly), their usage has often remained somewhat theoretical and abstract. A potential reason for this is that ABMs often model abstract populations of agents that resemble real-world populations with respect to some summary statistics (e.g., average node degree and assortativity) but do not map directly onto any real-world entities. Thus, agent-based modelling has been useful for understanding general rules about how information spreads in social networks but has (to our knowledge) rarely been used to make predictions about specific real-world situations.
Here we present a workflow to address this gap, which combines data mining from social media platforms, natural language processing, behavioural modelling, and agent-based simulations. This workflow enables the creation of hyper-realistic digital twins of real-world social networks. Importantly, we show that this level of realism leads to highly accurate predictions about which information will spread in a social network of interest and how it will spread, outperforming other machine-learning techniques.