Friday, June 07, 2024

A comparison of social surveys and social media for vaccine hesitancy

In the past we have looked at various ways to study vaccine hesitancy. Keeping with this theme, we have a new paper published in PLOS ONE entitled "Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media" with Kuleen Sasse, Ron Mahabir, Olga Gkountouna and Arie Croitoru.

In the paper we use social, demographic and economic (e.g., US Census) variables to predict COVID-19 vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). Using machine learning algorithms (e.g., linear regression, random forest regression, and XGBoost regression), we compare a set of baseline models that contain only these variables with models that incorporate survey data and social media (i.e., Twitter) data separately.

We find that the algorithms perform differently, and that influential variables such as age, ethnicity, occupation, and political inclination vary across the five hesitancy classes (i.e., “definitely get a vaccine”, “probably get a vaccine”, “unsure”, “probably not get a vaccine”, and “definitely not get a vaccine”). Further, we find that the application of the models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. In summary, this paper shows social media data’s potential for understanding vaccine hesitancy and for tailoring interventions to specific communities. If this sounds of interest, below we provide the abstract to the paper along with our mixed methods matrix, the data sources used and the results from the various MSAs. At the bottom of the post, you can see the full reference and the link to the paper so you can read more if you so desire.

The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. 
In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
Mixed methods matrix showing the data, processing, and model development steps used in our study.

Data sources used in our study.

MSA model performance (Bolded adjusted R2 values represent the best performing model for each modeling technique and MSA).
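
For readers less familiar with the adjusted R2 values reported in the table, here is a minimal sketch of how the statistic is usually computed. It is the textbook formula, not code from the paper, and the function name and the toy numbers in the usage note are made up for illustration.

```python
def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R2: R2 penalized for the number of predictors in the model.

    Textbook formula: 1 - (1 - R2) * (n - 1) / (n - p - 1),
    where n is the number of observations and p the number of predictors.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")

    mean_obs = sum(observed) / n
    # Residual sum of squares (unexplained variation)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares (variation around the mean)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

For example, `adjusted_r2([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], 1)` gives 1.0 for a perfect fit, while a model that only ever predicts the mean scores below zero because of the penalty for its one predictor. This is why adjusted R2 is a fairer way to compare a baseline model with one that adds survey or social media variables.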

Monday, June 03, 2024

Skiing and Modeling

Locker room layouts
(Source: Gao et al., 2024)

One of my favorite winter activities is skiing, and now that all the ski areas in the Northeast have closed (for those interested, Killington, VT closed last Saturday), I thought it would be interesting to see how people have been using various modeling techniques to explore ski areas. While what follows is not a comprehensive list of all the works, these are some that I have come across. If you know more, feel free to leave a comment below.

Models have ranged from looking at the spatial arrangement of locker rooms at ski resorts (Gao et al., 2024) to lift lines (congestion) in places such as La Plagne in the French Alps (Poulhès and Mirial, 2017) or the Austrian ski resort of Fanningberg (Heinrich et al., 2023). Others have simulated entire ski areas including lift lines, slopes used etc. (Kappaurer 2022), while Pons et al. (2014) developed an agent-based model to see how climate change might impact where skiers go. Others have explored how climate change might impact ski areas and their associated water usage for making snow (e.g., Soboll and Schmude 2011). Keeping with the climate theme, Revilloud et al. (2013) used agent-based simulations to simulate snow height on ski runs based on skiers' movements in order to facilitate snow cover management (i.e., reduce the production cost of artificial snow and thus water and energy consumption). Murphy (2021) developed a simpler agent-based model of how skiers might ski during a powder day and explored the area of terrain they may cover based on ability.

Simulation of skiers (Source: Revilloud et al., 2013)

Similar to some of the other models above, but in light of COVID-19, Integrated Insight (2020), an analytics consulting company, shows in the movie below how one can use simulation to explore crowd management in the base areas of ski resorts.

References / papers discussed above:
As noted above, if you know more, feel free to leave a comment below. 

Tuesday, May 28, 2024

EPB turns 50

It's hard to believe that Environment and Planning B (EPB) is turning 50. To celebrate the fact, we have a special issue. Among the many contributed commentaries and editorials, Mike Batty discusses the history of the journal, and I have a paper that reflects on how EPB (and the authors who have published in it) has shaped my own thinking about cities.

We also have another commentary with Na (Richard) Jiang and the current editors of the journal (Linda See, Seraphim Alvanides, Dani Arribas-Bel, Levi Wolf, Mike Batty and myself) that explores papers published in the journal over the last 50 years. By taking the abstracts from all the papers, we did some content analysis to look at the trends and themes over the decades. To some extent it was not surprising to see a rising number of publications over the decades and a decrease in the number of single-author papers. But what was quite evident from just generating some word clouds of the key terms in the abstracts is that ‘design’, ‘analysis’ and ‘development’ have remained key concepts throughout the 50 years, as shown below.

The top 50 words appearing in the titles of all 50 years and from the 1970s to 2020s
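
If you want to try this kind of word counting yourself, a minimal sketch could look like the following. This is not the code used in the commentary: the stop word list is a tiny placeholder, the function name is hypothetical, and the example titles in the usage note are made up.

```python
import re
from collections import Counter

# Placeholder stop word list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "for", "on", "with"}


def top_words(texts, n=50):
    """Return the n most frequent non-stop-words across a list of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)
```

Calling `top_words(["Urban design and analysis", "Analysis of urban development", "Design of urban models"], 3)` on these made-up titles puts "urban" first with three occurrences. Feeding the function the texts from one decade at a time gives the counts needed to draw a word cloud per decade.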

We also dug deeper into the abstracts using the Bidirectional Encoder Representations from Transformers (BERT) framework, which looks for relationships between words among different abstracts. BERTopic was used to classify all abstracts into 40 topics, including one irrelevant or outlier topic. Excluding this outlier topic leaves 39 topics. These topics give a good overview of the types of papers that have appeared in EPB over the last 50 years, as shown below.

The evolution of topics over time based on topic modeling.

From the topic modeling, one can see the architectural history of the journal and the publication of quite specialized papers on shape grammars, which is evident in topics 2, 5, 13, 14, and 17, while models of urban growth (topic 0) represent one of the main topics of research in the journal over time. Other topics represent emerging challenges, for example, topic 15 on the Covid pandemic and topic 7 on climate change related research. If you wish to find out more about this analysis, have a look at the commentaries listed below.

Crooks, A.T. (2024), Environment and Planning B: Its Shaping of Urban Modeling and Me, Environment and Planning B, 51(5): 1020-1022. (pdf)
Crooks, A.T., Jiang, N., See, L., Alvanides, S., Arribas-Bel, D., Wolf, L.J. and Batty, M. (2024), EPB Turns 50 Years Old: An Analytical Tour of the Last Five Decades, Environment and Planning B, 51(5): 1028-1037. (pdf)

Monday, May 13, 2024

Using ABM to simulate Covid-19 vaccine uptake

In past blog posts we have discussed how one can use social media to study vaccine discussions and even tried to build a very simple disease model where vaccination rates were a factor in the spread of an outbreak. However, when it comes to vaccinations, especially Covid-19 vaccines, there have been intense discussions in the physical (e.g., family), hybrid (e.g., work, school) and cyber (e.g., social media) spaces we inhabit.

One thing that is unclear is how the discussions in these various hybrid spaces impact our decision to get vaccinated or not. To this end, in a new paper published in the International Journal of Geographical Information Science with Fuzhen Yin, Li Yin and myself, entitled “How information propagation in hybrid spaces affects decision-making: using ABM to simulate Covid-19 vaccine uptake”, we explore this.

More specifically, through opinion dynamics modeling we explore how agents choose whether or not to vaccinate and how much emphasis they place on physical, relational and cyber spaces. Using Chautauqua County in New York State as a case study, our model captures the temporal dynamics of vaccination progress with small errors, but we also find that different age groups demonstrate various preferences for different spaces to receive vaccine related information.

If this sounds of interest, below you can read the abstract of the paper and see a flow chart of the model logic and some of the results, while at the bottom of the post you can find the full reference and link to the paper. Furthermore, Fuzhen has also provided a detailed Overview, Design Concepts and Details Protocol (ODD) document along with the source code and data needed to run the model at CoMSES Net.

The notion of physical space has long been central in geographical theories. However, the widespread adoption of information and communication technologies (ICTs) has freed human dynamics from purely physical to also relational and cyber spaces. While researchers increasingly recognize such shifts, rarely have studies examined how the information propagates in these hybrid spaces (i.e., physical, relational, and cyber). By exploring the vaccine opinion dynamics through agent-based modeling, this study is the first that combines all hybrid spaces and explores their distinct impacts on human dynamics from an individual’s perspective. Our model captures the temporal dynamics of vaccination progress with small errors (MAE=2.45). Our results suggest that all hybrid spaces are indispensable in vaccination decision making. However, in our model, most of the agents tend to give more emphasis to the information that is spread in the physical instead of other hybrid spaces. Our study not only sheds light on human dynamics research but also offers a new lens to identifying vaccinated individuals which has long been challenging in disease-spread models. Furthermore, our study also provides responses for practitioners to develop vaccination outreach policies and plan for future outbreaks. 

Keywords: Agent-based modeling, hybrid space, opinion dynamics, Covid-19, vaccination. 

Flowchart of the modeling process. 

Comparing predicted and observed vaccination rates of all populations by giving physical, relational, cyber spaces different weights. Mean absolute error (MAE) and root mean square error (RMSE) are reported to evaluate the quality of predictions.
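
For those who want to compute the same two error measures on their own predictions, here is a minimal sketch using the standard definitions of MAE and RMSE. The function names are ours and the rates in the usage note are invented for illustration; they are not results from the paper.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: the average size of the errors."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: like MAE, but large errors weigh more."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))
```

With invented observed rates of `[10, 20, 30]` and predictions of `[12, 18, 33]`, the errors are 2, 2 and 3, so the MAE is 7/3 and the RMSE is the square root of 17/3. RMSE is never smaller than MAE, and a large gap between the two hints at a few time steps with large errors.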

Comparing predicted and observed vaccination rates among different age groups by using the weight combination 3 (physical), 1 (relational), 1 (cyber) for hybrid spaces. 
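
To give a feel for what a weight combination such as 3 (physical), 1 (relational), 1 (cyber) means, below is a deliberately simplified sketch of a weighted opinion update. It is not the model from the paper (see the ODD and source code at CoMSES Net for that). The opinion scale of 0 to 1, the decision threshold and all names are assumptions made purely for illustration.

```python
def combined_opinion(space_opinions, weights):
    """Weighted average of the mean opinion an agent hears in each space.

    space_opinions: dict mapping a space name to a list of opinions in [0, 1]
                    (assumption: 1 = fully in favor of vaccination).
    weights:        dict mapping a space name to its weight.
    Returns None if the agent hears no opinions at all.
    """
    total = 0.0
    weight_sum = 0.0
    for space, opinions in space_opinions.items():
        if not opinions:
            continue  # nobody to listen to in this space
        mean_opinion = sum(opinions) / len(opinions)
        total += weights[space] * mean_opinion
        weight_sum += weights[space]
    if weight_sum == 0:
        return None
    return total / weight_sum


def decides_to_vaccinate(space_opinions, weights, threshold=0.5):
    """Hypothetical decision rule: vaccinate if the combined opinion reaches the threshold."""
    opinion = combined_opinion(space_opinions, weights)
    return opinion is not None and opinion >= threshold
```

For an agent hearing `{"physical": [1.0, 0.0], "relational": [1.0], "cyber": [0.0, 0.0]}` with weights `{"physical": 3, "relational": 1, "cyber": 1}`, the combined opinion is (3 × 0.5 + 1 × 1.0 + 1 × 0.0) / 5 = 0.5. Raising the physical weight pulls the result toward what the agent hears from people nearby, which is the kind of sensitivity the weight combinations in the figures explore.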

Comparing predicted and observed vaccination rates by varying weights of hybrid spaces for different age groups.

Spatial distribution of Covid-19 vaccines. (a)-(d) Point density of vaccination allocation at different time steps. (e) Predicted vaccination rates at census block group level.

Full Reference:
Yin, F., Crooks, A.T. and Yin, L. (2024), How information propagation in hybrid spaces affects decision-making: using ABM to simulate Covid-19 vaccine uptake, International Journal of Geographical Information Science, (pdf)

Tuesday, April 30, 2024

Presentations at the AAG

At the recent American Association of Geographers (AAG) Annual Meeting in Honolulu, Hawaii, our group had several presentations showcasing some of the research we are doing here at the University at Buffalo with respect to agent-based modeling, social media analysis and machine learning. If these sound of interest, feel free to reach out to us to find out more.

First up was Na (Richard) Jiang, who presented a paper entitled "Populating Digital Twins with Humans: A Framework Utilizing Artificial Agents". In this presentation he showcased our workflow of embedding agents in models of cities using examples from simple commuting models (like that shown below) to the spread of diseases.

Over the last few decades, considerable efforts have been placed in creating digital virtual worlds. Ranging in applications from engineering, geography, industry, and translation. More recently, with the growth of computational resources and the explosion of spatial data sources (e.g., satellite imagery, aerial photos, and 3-dimensional urban data), creating detailed virtual urban environments or urban digital twins has become more widespread. However, these works emphasize on the physical infrastructure and built environment of the urban areas instead of considering the key element acting within the urban system, which are the humans. In this paper, we would like to remedy this by introducing a framework that utilizes agent-based modeling to add humans to such urban digital twins. Specifically, this framework consists of two major components: 1) synthetic population datasets generated with 2020 Census Data; and 2) pipeline of using the population datasets for agent-based modeling applications. To demonstrate the utility of this framework, we have chosen representative applications that showcase how digital twins can be created for study various urban phenomena. These include building evacuations, traffic congestion and disease transmission. By doing so, we believe this framework will benefit any modeler wishing to build an urban digital twin to explore complex urban issues with realistic populations.

Keywords: agent-based model, geosimulation, urban digital twins

Following this talk, I presented work on behalf of Fuzhen Yin and Na (Richard) Jiang entitled "Modeling Covid Vaccination uptake in New York State: An agent-based modeling perspective". In this paper we utilize Richard's workflow and add social media networks to it in order to explore vaccination uptake for the whole of New York State. We do this through the lens of opinion dynamics and agent-based modeling, in the sense of how people may change their opinions about whether or not to get vaccinated based on information from different sources (e.g., family, friends or online).

The effect of the recent COVID pandemic has been significantly curtailed with the introduction of vaccinations. However, not everyone has been vaccinated for a multitude of reasons. For example, people might be influenced by what they read online or the opinions of others. To explore the changes in people’s views on vaccination, we have developed a geographically explicit agent-based model utilizing opinion dynamics. The model captures people’s opinions on COVID vaccination and how this relates to actual vaccination trends. Using the entire state of New York with a population of over 22 million agents, we model vaccination uptake from January 1, 2021, until May 15, 2022. Agents within the model synthesize information from the other agents they are connected with either in physical or cyberspace and decide whether to vaccinate or not. We compare these vaccination statuses among different age groups with actual vaccination rates provided by New York State. Our results suggest that there is an interplay between different spaces and ages when it comes to agents making a decision to vaccinate or not. As such the model offers a novel way to explore vaccination decisions from the bottom up.
Keywords: Agent-based modeling, Covid, Vaccine, Geosimulation, Social Networks

After this, Qingqing Chen presented work with Ate Poorthuis and myself entitled "Mapping the Invisible: Decoding Perceived Urban Smells through Geosocial Media in New York City" in which we explore how social media can be used to map smells in large metropolitan areas.

Smells can shape people’s perceptions of urban spaces, influencing how individuals relate themselves to the environment both physically and emotionally. Although the urban environment has long been conceived as a multisensory experience, research has mainly focused on the visual dimension, leaving smell largely understudied. This paper aims to construct a flexible and efficient bottom-up framework for capturing and classifying perceived urban smells from individuals based on geosocial media data. Thus, increasing our understanding of this relatively neglected sensory dimension in urban studies. We take New York City as a case study and decode perceived smells by teasing out specific smell-related indicator words through text mining and network analysis techniques from a historical set of geosocial media data (i.e., Twitter). The dataset consists of over 56 million data points sent by more than 3.2 million users. The results demonstrate that this approach, which combines quantitative analysis with qualitative insights, can not only reveal “hidden” places with clear spatial smell patterns, but also capture elusive smells that may otherwise be overlooked. By making perceived smells measurable and visible, we can gain a more nuanced understanding of smellscapes and people’s sensory experiences within the urban environment. Overall, we hope our study opens up new possibilities for understanding urban spaces through an olfactory lens and, more broadly, multi-sensory urban experience research.

Keywords: Smellscape, Urban smells, Geosocial media, Text mining, Network analysis, Multi-sensory urban experiences.

Last but not least, Boyu Wang presented his work entitled "Simulating urban flows with geographically explicit synthetic populations". In this talk, Boyu showed how a deep learning spatial-temporal urban flow model can be trained to predict the aggregated inflows and outflows within regions, with these predictions then fed directly into an agent-based model.


Urban human mobility is an active research field that studies movement patterns in urban areas at both the individual and aggregated population levels. Through individual’s movement, higher level phenomena such as traffic congestion and disease outbreaks emerge. Understanding how and why people move around a city plays an important role in urban planning, traffic control, and public health. An abundance of agent-based models have been built by researchers to simulate human movements in cities and are often integrated with a GIS component to realistically represent the study area. In this work we build a geographically explicit agent-based model where agents move between their home and workplaces, to simulate people’s daily commuting patterns within a city. In order to build this model, we develop a geographically explicit synthetic population based on census data. A deep learning spatial-temporal urban flow model is trained to predict the aggregated inflows and outflows within regions of the study area, which are subsequently used to drive individual agents’ movements. To validate results from the agent-based model, agents’ movements are aggregated and evaluated along with the urban flow model. Commuting statistics are also collected and compared to existing travel surveys. As such we aim to demonstrate how urban simulation models can be complemented by recent advancements in GeoAI techniques. Conversely, the aggregated deep learning model predictions can be investigated at a fine-grained individual level. This extends traffic patterns forecasting from just looking at the patterns to the processes that lead to these patterns emerging.

Keywords: Agent-Based Modeling, Urban Flow, GeoAI, Urban Simulation, Synthetic Populations 


Yin, F., Jiang, N. and Crooks, A.T. (2024), Modeling Covid Vaccination uptake in New York State: An Agent-based Modeling Perspective, The American Association of Geographers (AAG) Annual Meeting, 23rd–27th April, Honolulu, HI. (pdf)

Jiang, N., Crooks, A.T., Wang, B. and Yin, F. (2024), Populating Digital Twins with Humans: A Framework Utilizing Artificial Agents, The American Association of Geographers (AAG) Annual Meeting, 23rd–27th April, Honolulu, HI. (pdf)

Chen, Q., Poorthuis, A. and Crooks, A.T. (2024), Mapping the Invisible: Decoding Perceived Urban Smells through Geosocial Media in New York City, The American Association of Geographers (AAG) Annual Meeting, 23rd–27th April, Honolulu, HI. (pdf)

Wang, B. and Crooks, A.T. (2024), Simulating Urban Flows with Geographically Explicit Synthetic Populations, The American Association of Geographers (AAG) Annual Meeting, 23rd–27th April, Honolulu, HI. (pdf)