Friday, June 07, 2024

A comparison of social surveys and social media for vaccine hesitancy

In the past we have explored vaccine hesitancy in various ways, and in keeping with this theme we have a new paper published in PLOS ONE entitled "Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media" with Kuleen Sasse, Ron Mahabir, Olga Gkountouna and Arie Croitoru.

In the paper we use social, demographic, and economic variables (e.g., from the US Census) to predict COVID-19 vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). Using machine learning algorithms (i.e., linear regression, random forest regression, and XGBoost regression), we compare a set of baseline models that contain only these variables with models that incorporate survey data and social media (i.e., Twitter) data separately.
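To make the baseline-versus-augmented comparison a bit more concrete, here is a minimal sketch in Python using NumPy. It is not the paper's code or data: the predictors and the outcome are synthetic, the extra column merely stands in for a survey- or Twitter-derived feature, and only ordinary least squares is shown (random forest and XGBoost are omitted). The two models are compared with adjusted R², the metric reported in the MSA results table, which penalizes the augmented model for its additional predictor.

```python
import numpy as np

def adjusted_r2(y, y_hat, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns fitted values."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta

rng = np.random.default_rng(42)
n = 500
# Hypothetical census-style predictors (e.g., median age, share with a degree)
census = rng.normal(size=(n, 2))
# Hypothetical extra feature standing in for survey- or Twitter-derived data
extra = rng.normal(size=(n, 1))
# Synthetic "hesitancy" score that depends on both sets of variables plus noise
hesitancy = (0.5 * census[:, 0] - 0.3 * census[:, 1]
             + 0.8 * extra[:, 0] + rng.normal(scale=0.5, size=n))

# Baseline: census-style variables only (p = 2)
baseline = adjusted_r2(hesitancy, fit_ols(census, hesitancy), p=2)
# Augmented: census-style variables plus the extra feature (p = 3)
augmented_X = np.hstack([census, extra])
augmented = adjusted_r2(hesitancy, fit_ols(augmented_X, hesitancy), p=3)

print(f"baseline adj. R2:  {baseline:.3f}")
print(f"augmented adj. R2: {augmented:.3f}")
```

Because the synthetic outcome was built to depend on the extra feature, the augmented model should score noticeably higher here; with real data, as the mixed MSA results in the paper show, such a gain is not guaranteed.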

We find that the algorithms differ in performance (XGBoost consistently performs best, though only marginally better than random forest) and that influential variables such as age, ethnicity, occupation, and political inclination vary across the five hesitancy classes (i.e., “definitely get a vaccine”, “probably get a vaccine”, “unsure”, “probably not get a vaccine”, and “definitely not get a vaccine”). Further, we find that applying the models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. In summary, this paper shows social media data’s potential for understanding vaccine hesitancy and for tailoring interventions to specific communities. If this sounds of interest, below we provide the abstract of the paper along with our mixed methods matrix, the data sources used, and the results from the various MSAs. At the bottom of the post, you can see the full reference and the link to the paper so you can read more if you so desire.

Abstract:
The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. 
In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
Mixed methods matrix showing the data, processing, and model development steps used in our study.

Data sources used in our study.

MSA model performance (bolded adjusted R² values represent the best-performing model for each modeling technique and MSA).

Monday, June 03, 2024

Skiing and Modeling

Locker room layouts
(Source: Gao et al., 2024)

One of my favorite winter activities is skiing, and now that all the ski areas in the Northeast have closed (for those interested, Killington, VT closed last Saturday), I thought it would be interesting to see how people have used various modeling techniques to explore ski areas. What follows is not a comprehensive list, but these are some of the works I have come across. If you know of more, feel free to leave a comment below.

Models have ranged from looking at the spatial arrangement of locker rooms at ski resorts (Gao et al., 2024) to lift lines (congestion) in places such as La Plagne in the French Alps (Poulhès and Mirial, 2017) or the Austrian ski resort of Fanningberg (Heinrich et al., 2023). Others have simulated entire ski areas, including lift lines, slopes used, etc. (Kappaurer, 2022), while Pons et al. (2014) developed an agent-based model to see how climate change might impact where skiers go. Others have explored how climate change might impact ski areas and their associated water usage for snowmaking (e.g., Soboll and Schmude, 2011). Keeping with the climate theme, Revilloud et al. (2013) used agent-based simulations to simulate snow height on ski runs based on skiers' movements in order to facilitate snow cover management (i.e., to reduce the production cost of artificial snow and thus water and energy consumption). Murphy (2021) developed a simpler agent-based model of how skiers might ski during a powder day, exploring the area of terrain they may cover based on their ability.
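Several of the works above are agent-based models. As a purely illustrative toy (not the model of Murphy (2021) or of any other cited work), the Python sketch below has skiers of a given ability descend a grid-shaped slope and counts the distinct cells they collectively cover. The grid size, the 0 to 1 ability scale, and the rule that more able skiers traverse more widely across the slope are all hypothetical assumptions.

```python
import random

GRID_WIDTH = 20   # hypothetical slope width in cells
GRID_LENGTH = 50  # hypothetical slope length in cells

class Skier:
    def __init__(self, ability, rng):
        self.ability = ability  # hypothetical scale: 0.0 (novice) to 1.0 (expert)
        self.x = GRID_WIDTH // 2  # every skier starts at the top, mid-slope
        self.y = 0
        self.rng = rng
        self.visited = {(self.x, self.y)}

    def step(self):
        # Assumption: more able skiers make wider lateral moves per row
        max_shift = 1 + int(self.ability * 3)
        new_x = self.x + self.rng.randint(-max_shift, max_shift)
        self.x = min(GRID_WIDTH - 1, max(0, new_x))  # stay within the slope edges
        self.y += 1  # each step moves one row downhill
        self.visited.add((self.x, self.y))

def run(ability, n_skiers=30, seed=0):
    """Return the number of distinct cells covered by n_skiers of one ability."""
    rng = random.Random(seed)
    terrain = set()
    for _ in range(n_skiers):
        skier = Skier(ability, rng)
        while skier.y < GRID_LENGTH - 1:
            skier.step()
        terrain |= skier.visited
    return len(terrain)

for label, ability in [("novice", 0.0), ("intermediate", 0.5), ("expert", 1.0)]:
    print(label, run(ability))
```

Under these assumptions, the expert group covers far more of the slope than the novice group; a real model would add terrain, snow conditions, and interactions between skiers.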

Simulation of skiers (Source: Revilloud et al., 2013)

Similar to some of the other models above, but in light of COVID-19, Integrated Insight (2020), an analytics consulting company, shows in a video how one can use simulation to explore crowd management in the base areas of ski resorts.
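The lift-line and crowd-management work above rests on queueing behavior. A minimal discrete-time sketch in Python, assuming hypothetical arrival probabilities, a hypothetical pool of 20 potential arrivals per minute, and a fixed lift capacity per minute (none of these numbers come from the cited studies or from Integrated Insight), shows how a queue stays short when average arrivals are below lift capacity and grows steadily when they exceed it.

```python
import random

def simulate_lift_queue(minutes, arrival_prob, lift_capacity,
                        potential_arrivals=20, seed=1):
    """Return the queue length at the end of each simulated minute."""
    rng = random.Random(seed)
    queue = 0
    history = []
    for _ in range(minutes):
        # Each potential skier joins the queue this minute with probability arrival_prob
        arrivals = sum(rng.random() < arrival_prob for _ in range(potential_arrivals))
        queue += arrivals
        # The lift loads up to lift_capacity skiers per minute
        queue -= min(queue, lift_capacity)
        history.append(queue)
    return history

quiet = simulate_lift_queue(120, arrival_prob=0.3, lift_capacity=8)
busy = simulate_lift_queue(120, arrival_prob=0.6, lift_capacity=8)
print("max queue (quiet):", max(quiet))
print("max queue (busy):", max(busy))
```

In the quiet scenario, expected arrivals are 0.3 × 20 = 6 skiers per minute against a capacity of 8, so the queue hovers near zero; in the busy scenario they are 12 per minute, so the queue grows by roughly 4 skiers per minute.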



References / papers discussed above:
As noted above, if you know more, feel free to leave a comment below.