Monday, June 05, 2017

Comparing four modeling approaches using a Susceptible-Infected-Recovered (SIR) epidemic model

Over the years several modeling styles have been developed, but it is often unclear what the differences between them are. In this joint post, we (Yang Zhou and myself) would like to compare and contrast four modeling approaches widely used in Computational Social Science, namely: System Dynamics (SD) models, Agent-based Models (ABM), Cellular Automata (CA) models, and Discrete Event Simulation (DES). For a review of the underlying mechanisms and core components of each, readers are referred to Gilbert and Troitzsch's (2005) "Simulation for the Social Scientist".

To compare and contrast the differences in how these models work and how their underlying mechanisms generate outputs, we needed a common problem to test them against with the same set of model parameters. While one could choose a more complex example, here we decided to choose one of the simplest models we know. Specifically, we chose to model the spread of a disease using a Susceptible-Infected-Recovered (SIR) epidemic model. Our inspiration for this came from the SD model outlined in the great book “Introduction to Computational Science: Modeling and Simulation for the Sciences” by Shiflet and Shiflet (2014), which was implemented in NetLogo from the accompanying website. For the remaining models (i.e. the ABM, CA, and DES) we created models from scratch in NetLogo. Below we introduce how we built each model, before showing results from the four models with the same set of parameters, which allows us to compare them. The source code and further documentation for the four models can be found over at Yang Zhou's website and GitHub page.

The System Dynamics Model

In the system dynamics model from Shiflet and Shiflet (2014), one person is infected at the start. Infected people can infect susceptible people. The population of infected will always increase by (number of infected * number of susceptible * InfectionRate * change in time dt). The infected people may recover: the number of people that recover in an iteration is always equal to (number of infected * RecoveryRate * change in time dt). Figure 1 illustrates the system dynamics process while Figure 2 shows the SIR process as a flowchart.

Figure 1. System Dynamics process (source: Shiflet and Shiflet, 2014)

Figure 2. System Dynamics flowchart
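To make the two flows concrete, here is a minimal Python sketch of the SD model (the original implementation from Shiflet and Shiflet is in NetLogo); the function name and the Euler-style update loop are our own, but the flow equations and parameter values match those described in this post.

```python
def run_sd_sir(s=2500.0, i=1.0, r=0.0,
               infection_rate=0.002, recovery_rate=0.5, dt=0.001):
    """Euler-style iteration of the SD stocks until (almost) no one is infected."""
    history = [(s, i, r)]
    while i >= 1.0:  # stop when less than one person remains infected
        new_infections = infection_rate * s * i * dt  # I * S * InfectionRate * dt
        new_recoveries = recovery_rate * i * dt       # I * RecoveryRate * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

history = run_sd_sir()
s, i, r = history[-1]
```

Because the stocks are continuous and there is no randomness, every run of this sketch produces the same trajectory.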

The Agent-based Model

As in the SD model, at the beginning of the simulation one agent is infected. Agents are randomly distributed on the landscape, and at the beginning of each iteration they turn to a random direction and move forward by one cell. During each iteration, an infected agent may infect other agents on the same cell. This differs from how the SD model works, specifically in the probability of getting infected. In the SD model, the infection rate applies to the entire population. In the ABM, the probability of becoming infected is equal to the infection rate divided by the probability of an agent being in the same cell, multiplied by the change in time. Each infected agent has a probability of recovering in each time period, equal to the recovery rate multiplied by the change in time. The equations in the ABM are the following:

P(becoming infected) = InfectionRate / P(same cell) * dt
P(recovering) = RecoveryRate * dt

Where P(same cell) = probability of being on the same cell, equal to 1 divided by the total number of cells, and dt = change in time. Figure 3 illustrates the agent decision process while Figure 4 shows the display of the ABM.

Figure 3. Agent-based Modeling: agent decision process

Figure 4. Display of the ABM. Green = susceptible. Red = infected. Blue = recovered.
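The agent loop described above can be sketched in Python as follows (the actual model is in NetLogo). The movement rule is simplified here to a random one-cell step, and the function and parameter names are our own.

```python
import random

def run_abm_sir(n_agents=2500, width=50, height=50,
                infection_rate=0.002, recovery_rate=0.5, dt=0.001,
                seed=42, max_ticks=100000):
    rng = random.Random(seed)
    # infection probability per encounter = rate / P(same cell) * dt
    p_infect = infection_rate / (1.0 / (width * height)) * dt
    p_recover = recovery_rate * dt
    pos = [(rng.randrange(width), rng.randrange(height)) for _ in range(n_agents)]
    state = ['S'] * n_agents
    state[0] = 'I'                       # one infected agent at setup
    ticks = 0
    while 'I' in state and ticks < max_ticks:
        ticks += 1
        # each agent takes a random one-cell step (wrapped landscape)
        pos = [((x + rng.choice((-1, 0, 1))) % width,
                (y + rng.choice((-1, 0, 1))) % height) for x, y in pos]
        cells = {}
        for idx, p in enumerate(pos):
            cells.setdefault(p, []).append(idx)
        newly_infected = []
        for members in cells.values():
            if any(state[i] == 'I' for i in members):
                newly_infected += [i for i in members
                                   if state[i] == 'S' and rng.random() < p_infect]
        for i in range(n_agents):        # recoveries
            if state[i] == 'I' and rng.random() < p_recover:
                state[i] = 'R'
        for i in newly_infected:         # apply infections after recoveries
            state[i] = 'I'
    return ticks, state.count('S'), state.count('I'), state.count('R')
```

Unlike the SD sketch, this one is stochastic, so different seeds yield different epidemic curves.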

The Cellular Automata Model

At the beginning of the simulation, one cell is infected. During each iteration (dt), the infected cell can infect other cells in its Moore neighborhood (i.e. the 8 surrounding cells). The landscape is an n by n square, where n is equal to the square root of the number of people created at the beginning of the simulation. Wrapping is enabled both horizontally and vertically. As with the ABM, we would like to map the probability of becoming infected to the one in the SD model. In the CA model, the probability of becoming infected is equal to the infection rate divided by the probability of being in the Moore neighborhood, multiplied by the change in time. Each infected cell has a probability of recovering in each time period, equal to the recovery rate multiplied by the change in time. The equations here are:

P(becoming infected) = InfectionRate / P(Moore neighborhood) * dt
P(recovering) = RecoveryRate * dt

Where P(Moore neighborhood) = probability of being in a given cell's Moore neighborhood, equal to 8 divided by the total number of cells.

Figure 5 shows the changing process of the cells while Figure 6 shows the display of the CA model.

Figure 5. Cellular Automata cell changing process

Figure 6. Display of the CA model. Green = susceptible. Red = infected. Blue = recovered.
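A rough Python sketch of the CA logic (again, the actual model is in NetLogo). The scaling of P(Moore neighborhood) as 8 divided by the total number of cells is our assumption, made by analogy with the ABM's same-cell scaling.

```python
import random

def run_ca_sir(n_people=2500, infection_rate=0.002, recovery_rate=0.5,
               dt=0.001, seed=42, max_ticks=100000):
    n = int(n_people ** 0.5)                  # n-by-n wrapped grid
    # assumption: P(Moore neighborhood) = 8 / total number of cells
    p_infect = infection_rate / (8.0 / (n * n)) * dt
    p_recover = recovery_rate * dt
    rng = random.Random(seed)
    grid = [['S'] * n for _ in range(n)]
    grid[n // 2][n // 2] = 'I'                # one infected cell at setup
    ticks = 0
    while ticks < max_ticks and any('I' in row for row in grid):
        ticks += 1
        to_infect, to_recover = [], []
        for y in range(n):
            for x in range(n):
                if grid[y][x] != 'I':
                    continue
                if rng.random() < p_recover:
                    to_recover.append((x, y))
                for dx in (-1, 0, 1):         # Moore neighborhood, wrapped
                    for dy in (-1, 0, 1):
                        if dx == 0 and dy == 0:
                            continue
                        nx, ny = (x + dx) % n, (y + dy) % n
                        if grid[ny][nx] == 'S' and rng.random() < p_infect:
                            to_infect.append((nx, ny))
        for x, y in to_infect:
            grid[y][x] = 'I'
        for x, y in to_recover:
            grid[y][x] = 'R'
    flat = [c for row in grid for c in row]
    return ticks, flat.count('S'), flat.count('I'), flat.count('R')
```

Because cells are static, the infection can only diffuse outward from the seed cell, which is what makes the CA curves in Figure 8 flatter than the other three models'.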

The Discrete Event Simulation Model

In a Discrete Event Simulation model (a.k.a. a queuing model), there are three abstract types of objects: 1) servers, 2) customers, and 3) queues, which is quite different from the CA and ABM approaches.

To implement an SIR model as a DES, the servers are the processes of becoming infected and recovering, and the durations people stay with the servers represent the processes of becoming infected and becoming recovered. Customers are susceptible people waiting to be infected and infected people waiting to recover. We assume there are two queues in this model. As susceptible objects (i.e. individuals) are created, a queue forms of people waiting to be infected. As people get infected, they form a second queue, waiting to recover. During each iteration (dt), each object in the first queue has a probability of becoming infected, while each infected object has a probability of recovering based on the recovery rate. After objects recover, they enter the sink of recovered people. The equations can be written as follows:

P(becoming infected) = InfectionRate * (number of infected) * dt
P(recovering) = RecoveryRate * dt

The whole process is illustrated in Figure 7.
Figure 7. Discrete Event Simulation process.
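A hedged Python sketch of the DES logic: the two queues are represented as simple counters, and the assumption that a susceptible object's per-iteration infection probability scales with the number of currently infected people (so that the expected flow mirrors the SD model) is ours.

```python
import random

def run_des_sir(n_people=2501, infection_rate=0.002, recovery_rate=0.5,
                dt=0.001, seed=42, max_ticks=100000):
    rng = random.Random(seed)
    queue_to_infect = n_people - 1    # susceptible people waiting to be infected
    queue_to_recover = 1              # infected people waiting to recover
    recovered = 0                     # the sink of recovered people
    ticks = 0
    while queue_to_recover > 0 and ticks < max_ticks:
        ticks += 1
        # assumption: infection probability scales with the number of
        # infected, so expected new infections match the SD model's flow
        p_infect = infection_rate * queue_to_recover * dt
        p_recover = recovery_rate * dt
        newly_infected = sum(rng.random() < p_infect
                             for _ in range(queue_to_infect))
        newly_recovered = sum(rng.random() < p_recover
                              for _ in range(queue_to_recover))
        queue_to_infect -= newly_infected
        queue_to_recover += newly_infected - newly_recovered
        recovered += newly_recovered
    return ticks, queue_to_infect, queue_to_recover, recovered
```

Since there is no space in this model, every object in a queue is interchangeable, which is why the DES results track the SD model so closely.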

Results from the Implementations

Now that the models have been briefly described, we turn to how using the same set of parameters leads to different results. The default parameters used in each model are: number of susceptible people at setup = 2500, infection rate = 0.002, recovery rate = 0.5, change in time (dt) = 0.001; the numbers of people in each status are recorded. Since the SD model has no randomness and will always give the same result, it is run only once. Each of the other three models was run ten times (feel free to run them more if you wish), and we then took the average of the ten results, shown in Figure 8. The stop condition is that no infected individuals remain.

Figure 8. Results for the different models. Clockwise from top left: SD model, ABM, DES and CA

In the four models, we observe the same overall pattern: the number of susceptible people decreases, the number of infected people first increases and then decreases again, and the number of recovered people increases over time. However, each model realization also shows many differences in how these patterns play out.

First of all, the SD model has the smallest number of iterations before no one is infected. The numbers of iterations shown on the graphs are the averages of the ten runs, as individual runs vary (except for the SD model, which only has one run). The SD model took only 17451 iterations to stop, while the ABM took 19145 iterations on average and the DES model 18645. The CA model took the longest on average before no more individuals were infected, at 25680 iterations.

The results of the SD, ABM, and DES models appear very similar to one another, in the sense that the number of infected people increases quickly at first and reaches a peak of over 1500 after more than 2000 iterations (2272 for the SD, 2403 for the ABM, 2538 for the DES). On the other hand, in the CA model, the number of infected people increases much more slowly due to the diffusion mechanism of the CA model, and never reaches an amount as high as in the other models.

An important characteristic of the SD model is that there is no randomness in it: no matter how many times you run this model, you will get the same result. In the other three models, getting infected or recovering always depends on a probability function, so every run differs.

Furthermore, people in the SD model and the DES model are homogeneous: everyone has the same probability of becoming infected or recovering from an infection, and although these rates change over time, they do not vary among the different people in the population. On the other hand, in the ABM and the CA model, people (represented by moving agents or static cells) are heterogeneous in the sense that they have different locations; only susceptible people around an infected individual can be infected. It is interesting that when people can move around, as in the ABM, the result is similar to the SD model, though the ABM takes a little longer for everyone to recover (19145 iterations in the ABM vs. 17451 iterations in the SD model). When people are static and the number of people in the same space is limited (one person per cell in this case), as in the CA model, the infection process becomes slower and it takes longer for everyone to recover.

To test how sensitive the models are to a specific parameter, we increased the infection rate in each model from 0.002 to 0.02; the results are shown in Figure 9. As expected, with the higher infection rate the number of susceptible people decreases at a much faster rate. However, the SD, ABM, and DES models are still similar to each other, while the infection in the CA model is slower. The average numbers of iterations for these models are: 15807 (SD), 15252 (ABM), 16937 (CA), 16677 (DES). By increasing the infection rate, the total number of iterations of each model has decreased, with the CA model still taking the longest time to converge. The peaks of infected people in each model are, on average: 2363 people at 255 iterations (SD), 2310 people at 363 iterations (ABM), 2035 people at 1019 iterations (CA), 2340 people at 286 iterations (DES). The CA model takes a longer time and reaches a lower peak.

Figure 9. Results for the different models with infection rate = 0.02. Clockwise from top left: SD model, ABM, DES and CA.

These models are only simple examples of how an SIR model can be implemented using different modeling techniques. In reality, if we were to model disease propagation in more detail, we would need to consider many other things, such as people being both mobile (e.g. traveling to work) and static (e.g. staying at home), and the fact that the capacity of each cell is limited.

Gilbert, N. and Troitzsch, K.G. (2005), Simulation for the Social Scientist (2nd Edition), Open University Press, Milton Keynes, UK.

Shiflet, A.B. and Shiflet, G.W. (2014), Introduction to Computational Science: Modeling and Simulation for the Sciences (2nd Edition), Princeton University Press, Princeton, NJ.
For more information about the models and to download them, please visit Yang Zhou's website.

Thursday, April 20, 2017

Zika in Twitter: Health Narratives

In a recent paper we explored how health narratives and event storylines pertaining to the recent Zika outbreak emerged in social media and how they related to news stories and actual events.

Specifically, we combined actors (e.g. Twitter users), locations (e.g. where the tweets originated) and concepts (e.g. emerging narratives such as pregnancy) to gain insights into the mechanisms that drive participation, contributions, and interactions on social media during a disease outbreak. Below you can read a summary of our paper along with some of the figures which highlight our methodology and findings.

An overview of the Twitter narrative analysis approach, starting with data collection, and proceeding with preprocessing and data analysis to identify narrative events, which can be used to build an event storyline.

Background: The recent Zika outbreak witnessed the disease evolving from a regional health concern to a global epidemic. During this process, different communities across the globe became involved in Twitter, discussing the disease and key issues associated with it. This paper presents a study of this discussion in Twitter, at the nexus of location, actors, and concepts.
Objective: Our objective in this study was to demonstrate the significance of 3 types of events: location-related, actor-related, and concept-related, for understanding how a public health emergency of international concern plays out in social media, and Twitter in particular. Accordingly, the study contributes to research efforts toward gaining insights on the mechanisms that drive participation, contributions, and interaction in this social media platform during a disease outbreak.
Methods: We collected 6,249,626 tweets referring to the Zika outbreak over a period of 12 weeks early in the outbreak (December 2015 through March 2016). We analyzed this data corpus in terms of its geographical footprint, the actors participating in the discourse, and emerging concepts associated with the issue. Data were visualized and evaluated with spatiotemporal and network analysis tools to capture the evolution of interest on the topic and to reveal connections between locations, actors, and concepts in the form of interaction networks. 
Results: The spatiotemporal analysis of Twitter contributions reflects the spread of interest in Zika from its original hotspot in South America to North America and then across the globe. The Centers for Disease Control and World Health Organization had a prominent presence in social media discussions. Tweets about pregnancy and abortion increased as more information about this emerging infectious disease was presented to the public and public figures became involved in this. 
Conclusions: The results of this study show the utility of analyzing temporal variations in the analytic triad of locations, actors, and concepts. This contributes to advancing our understanding of social media discourse during a public health emergency of international concern.

Keywords: Zika Virus; Social Media; Twitter Messaging; Geographic Information Systems.

Spatiotemporal participation patterns and identifiable clusters over 4 weeks of our twelve-week study. The top left panel shows the data during the first week, and time progresses from left to right and from top to bottom.

Subsets of the full retweet network pertaining to the WHO (left) and CDC (right), and clusters identified within them. Magenta clusters are centered upon health entities, green upon news organizations, orange upon political entities.

Visualizing a narrative storyline across locations (blue), actors (red), and concepts (green).

Full Reference:
Stefanidis, A., Vraga, E., Lamprianidis, G., Radzikowski, J., Delamater, P.L., Jacobsen, K.H., Pfoser, D., Croitoru, A. and Crooks, A.T. (2017). “Zika in Twitter: Temporal Variations of Locations, Actors, and Concepts”, JMIR Public Health and Surveillance, 3 (2): e22. (pdf)

As normal, any feedback or comments are most welcome. 

Saturday, April 08, 2017

Talk from the AAG

The last few days I have been attending the Association of American Geographers (AAG) Annual Meeting in Boston. A common theme at the AAG sessions I attended (to me at least) seemed to be the rise of new sources of data which give us new ways to explore geographical problems, and the challenges of working with bigger data sets. Perhaps where this was most explicitly expressed was in the Geographic Data Science sessions, which were pitched to be at the nexus of data science and geography.

While at the meeting I participated in a panel under the theme of "Geographic Data Science", and as part of the Symposium on Human Dynamics in Smart and Connected Communities, I co-organized two sessions entitled Agents - the 'atomic unit' of social systems? which also included Agent-Bingo. Finally, I gave a presentation on our current research at Mason, entitled "Megacities through the Lens of Computational Social Science"; more details can be seen below. For those wanting to know more about the synthetic population generation, click here.

Geographic Data Science Panel

Megacities through the Lens of Computational Social Science


Currently there are over 35 megacities, cities with over 10 million inhabitants, and the number of such cities is expected to grow in the coming years. These habitats represent many challenges from an agent-based modeling perspective. Their size and density, the diverse behaviors of their inhabitants, and their evolving social network of communities along with multiple interacting subsystems need to be understood, captured and modeled. To capture and link the dynamics that shape and form these systems, we must grapple with them in their entirety. While there have been many models applied to specific subsystems of megacities (e.g. traffic, disease spread, urban growth etc.), their interactions often go untouched.

The lens of computational social science (CSS), the interdisciplinary science of complex social systems and their investigation through computational modeling and related techniques, can be used to understand and model megacities. Given the advances in computational power and the availability of fine scale datasets, what are the opportunities offered to us with respect to exploring megacities? In an attempt to answer this question we will demonstrate how new sources of data (e.g. volunteered geographical information) can be fused with more traditional data (e.g. census data) to create the basis of a megacity model, both in terms of its physical environment and its social environment. We will then show results from a simulated disaster that explores how people might react and behave during an evolving crisis within a megacity.

Keywords: Megacities, GIS, Agent-based modeling, Social Networks, Behavior

Full References:
Crooks A.T., Kennedy W.G., Burger, A. Oz, T. and Heppenstall, A. (2017), Megacities through the Lens of Computational Social Science, The Association of American Geographers (AAG) Annual Meeting, 5th-9th, April, Boston, MA. (pdf)

Tuesday, April 04, 2017

Smart Cities in IEEE Pervasive Computing

We are excited to announce that the special issue we organized for IEEE Pervasive Computing is now out. The special issue, entitled "Smart Cities", demonstrates the state of the art of pervasive computing technologies that collect, monitor, and analyze various aspects of urban life. The articles and departments in the special issue highlight the coming revolution in urban data via some of the different approaches researchers are taking to build tools and applications to better inform decision making (to reduce energy consumption or improve visitor flows, for example). Such research will be critical to setting goals for sustainable urban development within different global contexts. We need to better understand cities and their underlying systems if we want to improve the quality of urban life. To this end, the special issue contains an introduction (editorial) followed by a number of articles, an interview and a research spotlight.
We hope you enjoy them. Thank you to the authors who submitted papers, the reviewers, Rob Kitchen for giving an interview, and Barbara Lenz and Dirk Heinrichs for discussing their research. Lastly, we would also like to thank the IEEE Pervasive Computing team for ensuring that the special issue came to fruition.

Full Reference to the Introduction: 
Crooks, A.T., Schechtner, K., Day, A.K and Hudson-Smith, A (2017), Creating Smart Buildings and Cities, IEEE Pervasive Computing, 16 (2): 23-25. (pdf)

Friday, March 10, 2017

Geovisualization of Social Media

Figure 1: Map mashup of Twitter data, where each dot represents a tweet; the text corresponds to the selected tweet marked with a star.
In the recently released "The International Encyclopedia of Geography: People, the Earth, Environment, and Technology" we were asked to write a brief entry entitled "geovisualization of social media". Below is a summary of our chapter:

The proliferation of social media over the last decade is presenting substantial computational challenges associated with the management, processing, analysis and visualization of the corresponding massive volumes of data. Furthermore, this new form of information also imposes new-found challenges upon the geographical community due to the unique nature of its content, as analyzing such data calls for a hybrid mix of spatial and social analysis. The spatial content of social media comprises primarily coordinates from which the contributions originate, or references to specific locations. At the same time, these data have a strong social component, as they can reveal the underlying social structure of the user community through manifestations of their interactions. Analyzing both the spatial and social content of social media feeds is referred to as geosocial analysis. Within this entry we explore the geovisualization opportunities and challenges that are emerging as social media are becoming the subject of study of the geographical community.
In more detail, we start off discussing how the geographic content of social media feeds represents a new type of geographic information. It transcends the early definitions of crowdsourcing or volunteered geographic information as it is not the product of a process through which citizens explicitly and purposefully contribute geographic information to update or expand geographic databases. Instead, the type of geographic information that can be harvested from social media feeds can be referred to as Ambient Geographic Information; it is embedded in the content of these feeds, often across the content of numerous entries rather than within a single one, and has to be somehow extracted. Nevertheless, it is of great importance as it communicates instantaneously information about emerging issues. At the same time, it provides an unparalleled view of the complex social networking and cultural dynamics within a society, and captures the temporal evolution of the human landscape.

In many cases, the geovisualization of social media feeds predominantly takes the form of web map mashups, in essence portraying the location of social media usage on a map. One such early attempt to visualize social media is shown in Figure 1. We argue that while this approach is informative, it often falls short of capturing the depth, richness, and complexity of the information that can be gleaned from social data. As a result, a need arises for more advanced geovisualization approaches capable of better capturing and communicating the complexity and multidimensionality of social media, and this is the focus of our chapter. We briefly discuss the geovisualization of network structures (such as shown in Figure 2), the geovisualization of network structure dynamics, and the geovisualization of social media content (such as shown in Figure 3), along with the visualization of social media analysis (Figure 4), and conclude the chapter with a list of emerging research challenges.

Figure 2: Visualizing communities: a social network of an interest group (A), and the geovisualization of the largest community shown over the contiguous U.S. (B).

Figure 3: Visualizing social media content dynamics by coupling a Twitter stream viewer (A), a Twitter activity density map (B), a ranked list of top hash-tags (C) and top authors (E), a time slider (D), and author/hash-tag time series graphs.
Figure 4: Visualizing spatiotemporal clusters of tweets following the 2013 Boston bombing. Red circles indicate the approximate radius of each cluster, and color is used to indicate time.

We hope you enjoy it. As always, any feedback or comments are most welcome. Please note this chapter was written a couple of years ago and more recent work has since been done by us; click here to see some.

Full Reference:
Croitoru, A., Crooks, A.T., Radzikowski, J. and Stefanidis, A. (2017), Geovisualization of Social Media, in Richardson, D., Castree, N., Goodchild, M. F., Kobayashi, A. L., Liu, W. and Marston, R. (eds.), The International Encyclopedia of Geography: People, the Earth, Environment, and Technology, Wiley Blackwell. DOI: 10.1002/9781118786352.wbieg0605 (PDF)