Thursday, November 07, 2024

A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the US

In numerous posts, we have discussed synthetic populations and their use in agent-based modeling, but many other modeling styles also utilize synthetic populations. In our own work we often spend significant amounts of time creating such populations, especially those grounded in data, because of the time needed to collect and preprocess the data and to generate the final synthetic population. To alleviate this, we (Na (Richard) Jiang, Fuzhen Yin, Boyu Wang and myself) have a new paper published in Scientific Data, entitled "A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States." Our aim in this paper is to build and provide a geographically explicit synthetic population along with its social networks using open data, including that from the latest 2020 U.S. Census, which can be used in a variety of geo-simulation models.

Summary of the Resulting Datasets.

Specifically, in the paper we outline how we created a synthetic population of 330,526,186 individuals representing America's 50 states and Washington D.C. Each individual has a set of geographical locations that represent their home, work or school addresses. Additionally, these individuals are not isolated: they are embedded in a larger social setting based on their household, working and studying relationships (i.e., social networks).
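To make the idea of individuals embedded in household, work and school networks more concrete, here is a minimal Python sketch. It is not the code from the paper's repository: the record layout, the identifiers and the `build_network` helper are all hypothetical, and linking everyone who shares a group is a deliberate simplification rather than the network-generation method used in the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (person_id, household_id, workplace_or_school_id).
# None means the person has no work or school location.
people = [
    ("p1", "h1", "w1"),
    ("p2", "h1", "s1"),
    ("p3", "h2", "w1"),
    ("p4", "h2", "s1"),
    ("p5", "h2", None),
]

def build_network(records, group_index):
    """Link every pair of people who share the same group id.

    Assumption for illustration only: groups are fully connected.
    """
    groups = defaultdict(list)
    for record in records:
        group_id = record[group_index]
        if group_id is not None:
            groups[group_id].append(record[0])

    edges = set()
    for members in groups.values():
        # Sort so each undirected edge is stored once as (smaller, larger).
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges

household_edges = build_network(people, 1)  # ties between household members
activity_edges = build_network(people, 2)   # ties between co-workers or classmates
```

Keeping each network as its own edge set means a model can treat household ties differently from work or school ties, for example by giving them different contact rates.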

The work (e.g., data collection, data preprocessing and generation processes) was coded using Python 3.12 and all the scripts used are available at: https://github.com/njiang8/geo-synthetic-pop-usa while the resulting datasets (85 GB uncompressed) are available at OSF: https://osf.io/fpnc2/.  

To give you a sense of the paper, below we provide its abstract, along with some results and our efforts to validate the synthetic population. The full reference and link to the paper can be found at the bottom of the post.

Abstract:

Within the geo-simulation research domain, micro-simulation and agent-based modeling often require the creation of synthetic populations. Creating such data is a time-consuming task and often lacks social networks, which are crucial for studying human interactions (e.g., disease spread, disaster response) while at the same time impacting decision-making. We address these challenges by introducing a Python based method that uses the open data including that from 2020 U.S. Census data to generate a large-scale realistic geographically explicit synthetic population for America's 50 states and Washington D.C. along with the stylized social networks (e.g., home, work and schools). The resulting synthetic population can be utilized within various geo-simulation approaches (e.g., agent-based modeling), exploring the emergence of complex phenomena through human interactions and further fostering the study of urban digital twins.

Keywords: Synthetic Population, U.S. Census 2020, Agent-Based Modeling, Geo-Simulation, Social Networks.

Data Generation Workflow and Resulting Datasets.

A Sample of the Social Networks for One Household and their Home, Work and Educational Social Networks from the Generated Data.

Sample of Generated Social Networks Extracted from the City of Buffalo, New York: (a) Household; (b) Work; (c) School; (d) Daycare.

Validation of the Synthetic Population at Different Levels: (a) Population under 18 Different Age Groups; (b) Household under Different Household Types.

Full Reference: 

Jiang, N., Yin, F., Wang, B. and Crooks, A.T. (2024), A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States, Scientific Data, 11, 1204. https://doi.org/10.1038/s41597-024-03970-1 (pdf)




Friday, November 01, 2024

Pattern of Life Human Mobility Simulation (Demo)

In the past we have written about how we can use agent-based models to capture basic patterns of life, and have even developed simulations, but until now we have never really demonstrated how we go about this. However, at the SIGSPATIAL 2024 conference we (Hossein Amiri, Will Kohn, Shiyang Ruan, Joon-Seok Kim, Hamdi Kavak, Dieter Pfoser, Carola Wenk, Andreas Zufle and myself) have a demonstration paper entitled "The Pattern of Life Human Mobility Simulation" in which we show:

  1. How to run the Patterns of Life Simulation with the graphical user interface (GUI) to visually explore the mobility patterns of a region.
  2. How to run the Patterns of Life Simulation headless (without GUI) for large-scale data generation.
  3. How to adapt the simulation to any region in the world using OpenStreetMap data.
  4. How recent scalability improvements allow us to simulate hundreds of thousands of agents.

If this sounds of interest, below we show the GUI of the model, along with the steps to generate a trajectory dataset or a new map for the simulation. At the bottom of the post you can see the paper's full reference and a link to download it, while at https://github.com/onspatial/generate-mobility-dataset you can find the source code for the enhanced simulation and data-processing tools for you to experiment with.

Abstract: 

We demonstrate the Patterns of Life Simulation to create realistic simulations of human mobility in a city. This simulation has recently been used to generate massive amounts of trajectory and check-in data. Our demonstration focuses on using the simulation twofold: (1) using the graphical user interface (GUI), and (2) running the simulation headless by disabling the GUI for faster data generation. We further demonstrate how the Patterns of Life simulation can be used to simulate any region on Earth by using publicly available data from OpenStreetMap. Finally, we also demonstrate recent improvements to the scalability of the simulation that allow simulating up to 100,000 individual agents for years of simulation time. During our demonstration, as well as offline using our guides on GitHub, participants will learn: (1) The theories of human behavior driving the Patterns of Life simulation, (2) how to simulate to generate massive amounts of synthetic yet realistic trajectory data, (3) running the simulation for a region of interest chosen by participants using OSM data, (4) learn the scalability of the simulation and understand the properties of generated data, and (5) manage thousands of parallel simulation instances running concurrently.

Keywords: Patterns of Life, Simulation, Trajectory, Dataset, Customization

A screenshot of the graphical user interface of the Patterns of Life Simulation. The GUI shows the map and the movements of agents on the left side and the social network of agents and their statistical properties on the right side. 

Steps to generate a trajectory dataset.
Steps to generate a new map for the simulation.

Full reference: 

Amiri, H., Kohn, W., Ruan, S., Kim, J-S., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Zufle, A. (2024) The Pattern of Life Human Mobility Simulation (Demo Paper), ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Atlanta, GA. (pdf)

Thursday, October 31, 2024

Studying Contagious Disease Spread: An ABM Framework

In the past we have written about synthetic populations and their use in agent-based models. We are finding such synthetic populations to be extremely useful in the creation or initialization of agent-based models. To give you a sense of how we are utilizing such synthetic populations, at the 7th ACM SIGSPATIAL International Workshop on Geospatial Simulation (GeoSim 2024), Na (Richard) Jiang and myself have a new paper entitled "Studying Contagious Disease Spread Utilizing Synthetic Populations Inspired by COVID-19: An Agent-based Modeling Framework."

In the paper we show how we can utilize a method to create a geographically-explicit synthetic population, along with capturing its social networks, and how this can be used to study contagious disease spread (and various lineages of the disease) in Western New York. If this sounds of interest, below you can read the abstract from the paper, see some of the results and find the full reference and the link to the paper. The model itself and the data needed to run it are available at https://osf.io/zrtuj/
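As a rough illustration of how a disease can spread over a social network of agents, the sketch below advances a toy Susceptible-Exposed-Infectious-Recovered (SEIR) process one step at a time. It is not the model from the paper: the `step` function, the three-agent contact list and all probability values are assumptions made up for this example.

```python
import random
from collections import Counter

S, E, I, R = "S", "E", "I", "R"

def step(states, contacts, p_infect, p_onset, p_recover, rng):
    """Advance every agent by one time step and return a new state map.

    All transitions are evaluated against the states at the start of the
    step, so the order in which agents are visited does not matter.
    """
    new_states = dict(states)
    for agent, state in states.items():
        if state == S:
            infectious = sum(1 for n in contacts.get(agent, ()) if states[n] == I)
            # Probability that at least one infectious contact transmits.
            if rng.random() < 1 - (1 - p_infect) ** infectious:
                new_states[agent] = E
        elif state == E and rng.random() < p_onset:
            new_states[agent] = I
        elif state == I and rng.random() < p_recover:
            new_states[agent] = R
    return new_states

# Hypothetical contact network: a chain a - b - c, with one seed infection.
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
states = {"a": I, "b": S, "c": S}

rng = random.Random(42)  # fixed seed so a run can be repeated
for day in range(10):
    states = step(states, contacts, p_infect=0.3, p_onset=0.5, p_recover=0.2, rng=rng)

print(Counter(states.values()))
```

Because the contact list is just a dictionary of neighbours, the same function could be pointed at household, work or school networks from a synthetic population without changing the update rule.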

Abstract

The COVID-19 pandemic has reshaped societies and brought to the forefront simulation as a tool to explore the spread of the diseases including that of agent-based modeling. Efforts have been made to ground these models on the world around us using synthetic populations that attempt to mimic the population at large. However, we would argue that many of these synthetic populations and therefore the models using them, miss the social connections which were paramount to the spread of the pandemic. Our argument being is that contagious diseases mainly spread through people interacting with each other and therefore the social connections need to be captured. To address this, we create a geographically-explicit synthetic population along with its social network for the Western New York (WNY) Area. This synthetic population is then used to build a framework to explore a hypothetical contagious disease inspired by various of COVID-19. We show simulation results from two scenarios utilizing this framework, which demonstrates the utility of our approach capturing the disease dynamics. As such we show how basic patterns of life along with interactions driven by social networks can lead to the emergence of disease outbreaks and pave the way for researchers to explore the next pandemic utilizing agent-based modeling with geographically explicit social networks.

Keywords: Agent-based Modeling, Synthetic Populations, Social Networks, COVID-19, Disease Modeling.

Single Lineage Results: (a) Overall SEIR Dynamic; (b) Contact Tracing Example.

Western New York Commuting Pattern.

Disease Dynamics when Two Lineages are Introduced.

Reference: 

Jiang, N. and Crooks, A.T. (2024), Studying Contagious Disease Spread Utilizing Synthetic Populations Inspired by COVID-19: An Agent-based Modeling Framework, Proceedings of the 7th ACM SIGSPATIAL International Workshop on Geospatial Simulation (GeoSim 2024), Atlanta, GA., pp. 29-32. (pdf)

Wednesday, October 30, 2024

Agent-Based Models and Geography

Just a quick post: in the recently released Encyclopedia of Human Geography, edited by Barney Warf, we were asked to write a short chapter entitled "Agent-based Models and Geography." In the chapter we discuss how, over the last several decades, agent-based modeling has gained widespread adoption in geography, and introduce the reader to what agent-based models are, how they have developed and the types of geographical applications that can be explored with them, especially when linked to Geographical Information Systems (GIS). The chapter concludes with a brief summary along with a discussion of challenges and opportunities with agent-based modeling (ABM). If this sounds of interest, below you can find the full reference and link to the chapter. 

Example application domains for agent-based models over various spatial and temporal scales. More examples and further details can be found at https://www.gisagents.org/

Full Reference:

Crooks, A.T. and Jiang, N. (2024), Agent-based Models and Geography, in Warf, B. (ed.), The Encyclopedia of Human Geography, Springer, Cham, Switzerland, https://doi.org/10.1007/978-3-031-25900-5_258-1. (pdf)

An Agent-based model of COVID-19 Vaccine uptake in New York State

In the past we have explored how agent-based modeling can be used to study vaccine uptake and how the diffusion of different vaccine opinions in hybrid spaces (e.g., physical, relational and cyber) can affect individuals’ vaccination decisions. But this prior work was limited to just one small area. However, we know that urban and rural communities have different levels of digital connectivity, and we were wondering if our initial findings are applicable to other counties which are more urban or to a larger study area. To explore this, at the 7th ACM SIGSPATIAL International Workshop on Geospatial Simulation (GeoSim 2024) we (Fuzhen Yin, Na Jiang, Lucie Laurian and myself) have a paper entitled "Agent-based Modeling of COVID-19 Vaccine uptake in New York State: Information Diffusion in Hybrid Spaces". 

This paper significantly extends our previous work in a number of ways. First, we move from a single rural county to the entire state of New York, which has 62 counties that differ substantially in socioeconomic status. Furthermore, we move from a small population of 120,000 to over 20 million agents. Doing so allows us to compare vaccination uptake in different areas (e.g., urban versus rural communities, second home destinations versus college towns). We also use different parameters to initialize hybrid spaces for urban and rural populations to understand how individuals' preferences on hybrid spaces affect information diffusion and vaccination rates at a macro level. Lastly, we updated the decision-making rules for minors (i.e., ages under 18), which allows us to better simulate young population groups. Specifically, we make the assumption that minors need to have at least one of their guardians in the family network vaccinated already before they can take vaccines. By extending the model we can accurately simulate the vaccination rates for New York state (mean absolute error=6.93) and for the majority of counties within it (81%).
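The guardian rule for minors and the mean absolute error (MAE) used to compare simulated and observed vaccination rates can both be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the guardian identifiers and the county rates below are invented for the example and are not taken from the Mesa model or its data. The real model also involves opinion dynamics that are not shown here.

```python
def can_take_vaccine(age, guardian_ids, vaccinated_ids):
    """Return True if the guardian rule does not block this agent.

    Adults are never blocked by this rule. A minor (under 18) needs at
    least one guardian from the family network who is already vaccinated.
    """
    if age >= 18:
        return True
    return any(g in vaccinated_ids for g in guardian_ids)

def mean_absolute_error(simulated, observed):
    """Average absolute gap between simulated and observed rates by county."""
    return sum(abs(simulated[c] - observed[c]) for c in observed) / len(observed)

# Made-up vaccination rates (in percent) for two hypothetical counties.
simulated_rates = {"County A": 60.0, "County B": 70.0}
observed_rates = {"County A": 65.0, "County B": 68.0}

print(can_take_vaccine(12, ["g1", "g2"], {"g2"}))            # one guardian vaccinated
print(mean_absolute_error(simulated_rates, observed_rates))  # (5 + 2) / 2 = 3.5
```

Keeping the eligibility check separate from the opinion model makes it easy to switch the guardian assumption on or off when comparing scenarios.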

If this sounds of interest, below you can read the abstract of our paper, see our various hybrid spaces over New York state along with our updated model logic and the aggregate results. The full reference and the link to the paper can be found at the bottom of the post. The model itself, which was created in Mesa, and the data needed to run it can be found at: https://osf.io/3khyq/. We share our modeling scripts, input data and results so that interested readers can reproduce or extend our work as they see fit, and also to conform with the FAIR principles (findable, accessible, interoperable and reusable).

Abstract
During the COVID-19 pandemic, social media become an important hub for public discussions on vaccination. However, it is unclear how the rise of cyber space (i.e., social media) combined with traditional relational spaces (i.e., social circles), and physical space (i.e., spatial proximity) together affect the diffusion of vaccination opinions and produce different impacts on urban and rural population's vaccination uptake. This research builds an agent-based model utilizing the Mesa framework to simulate individuals' opinion dynamics towards COVID-19 vaccines, their vaccination uptake and the emergent vaccination rates at a macro level for New York State (NYS). By using a spatially explicit synthetic population, our model can accurately simulate the vaccination rates for NYS (mean absolute error=6.93) and for the majority of counties within it (81%). This research contributes to the modeling literature by simulating individuals vaccination behaviors which are important for disease spread and transmission studies. Our study extends geo-simulations into hybrid-space settings (i.e., physical, relational, and cyber spaces).

Keywords: Agent-based modeling, GIS, Information diffusion, Hybrid spaces, Social networks, Health informatics, Vaccines, COVID-19. 

Schematic representation of hybrid spaces. Physical space includes family and group quarter networks. Relational space represents people's social circles in work, school and daycare. Cyber space is a social media network. This figure only displays 2% of the total population in NYS (around 200,000 agents) for visualization purposes.

Modeling process and structure: from data to agent-behaviors.

Mapping the differences (i.e., mean absolute error (MAE)) in vaccination rate between simulated and ground truth data. 

Reference
Yin, F., Jiang, N., Crooks, A.T. and Laurian, L. (2024), Agent-based Modeling of Covid-19 Vaccine Uptake in New York State: Information Diffusion in Hybrid Spaces, Proceedings of the 7th ACM SIGSPATIAL International Workshop on Geospatial Simulation (GeoSim 2024), Atlanta, GA., pp. 11-20. (pdf)

Sunday, October 27, 2024

Retention in Higher Education: An Agent-Based Model

The effects of educational attainment on individuals and society have been the subject of much research. However, there is still a need to study what factors matter the most, and what is worth investing more time and resources into, and how new methods of analysis can provide additional ways of looking into some of the challenges faced by higher education. 

To this end, at the 2024 International Conference of the Computational Social Science Society of the Americas (CSSSA), Amira Al-Khulaidy Stine and myself had a paper entitled "Retention in Higher Education: An Agent-Based Model of Social Interactions and Motivated Agent Behavior." In the paper we introduce an agent-based model which explores retention, where we focus on students and their levels of motivation (i.e., "grit"), their immediate connections (i.e., sense of belonging) and institutional support. At the same time we capture institutional locales (i.e., urban and rural) and their selectivity. Taken all together, the model explores how these factors impact student success outcomes and retention. We find that students' level of motivation is a reliable factor in determining student outcomes, which is in line with others' findings, but the model also suggests that for certain student populations (i.e., those that have average motivation and mid-range GPAs), their sense of belonging, represented through their social connections with other students, and institutional support (i.e., hubs of support) can be more important. 

If this sounds of interest, below we have the abstract to the paper, along with some images relating to the data that was used to inform aspects of the model, the graphical user interface of the model, its logic and some of the results that emerge from it. At the bottom of the post you can find the full citation and a link to the paper itself. The model itself, which was developed in NetLogo, along with a detailed Overview, Design concepts, and Details (ODD) document can be found at https://www.comses.net/codebase-release/53302b0a-0a97-463a-8498-d604cb246e4c/.

Abstract:

In the United States, educational attainment and student retention in higher education are two of the main focuses of higher education research. Institutions are constantly looking for ways to identify areas of improvement across different aspects of the student experience on university campuses. This paper combines Department of Education data over a 10 year period, U.S. Census data, and higher education theory on student retention, to build an agent-based model of student behavior. Furthermore we model student social interactions with their peers along with considering environmental components (e.g., urban vs. rural campuses) and institution personnel to explore the elements that increase the likelihood of student retention. Results suggest that both social interactions and environmental components make a difference in student retention. Suggesting that higher education institutions should consider new ways to accommodate learning needs that promote better student outcomes.

Keywords: Agent-Based Model, College Campuses, Higher Education, Department of Education,  Social Interactions, Student Retention.

Student retention 2007-2021 by institutional support and urbanicity for Urban, Suburban, Town, and Rural areas.
Model Graphical User Interface.
Process Overview and Model Logic.

Retention results for Urban (left) and Rural (right) settings of low support and low sense of belonging while high motivation (grit).

Reference: 

Stine, A.A. and Crooks, A.T. (2024), Retention in Higher Education: An Agent-Based Model of Social Interactions and Motivated Agent Behavior, Proceedings of the 2024 International Conference of the Computational Social Science Society of the Americas, Santa Fe, NM. (pdf)

Wednesday, October 23, 2024

Call for Abstracts - AAG 2025: Geo-simulation Sessions



We are building upon last year’s successful sessions related to geosimulation, where various topics and issues from across the urban, social and environmental fields and the resulting application areas were discussed. More excitingly, we are witnessing the integration of cutting-edge techniques (e.g., machine learning and generative AI), which is energizing the geosimulation community as they offer new approaches for advancing geosimulations. 

This year, the 2025 AAG Annual Meeting will take place in Detroit, Michigan from March 24 to March 28. We are continuing to organize sessions on "Geosimulations for Addressing Societal Challenges," and we encourage you to submit abstracts if this area aligns with your research interests.

Session Description:

There is an urgent need for research that promotes sustainability in an era of societal challenges ranging from climate change, population growth, aging and wellbeing to that of pandemics. These need to be directly fed into policy. We, as a Geosimulation community, have the skills and knowledge to use the latest theory, models and evidence to make a positive and disruptive impact. These include agent-based modeling, microsimulation and, increasingly, machine learning methods. However, there are several key questions that we need to address which we seek to cover in this session. For example, what do we need to be able to contribute to policy in a more direct and timely manner? What new or existing research approaches are needed? How can we make sure they are robust enough to be used in decision making? How can geosimulation be used to link across citizens, policy and practice and respond to these societal challenges? What are the cross-scale local trade-offs that will have to be negotiated as we re-configure and transform our urban and rural environments? How can spatial data (and analysis) be used to support the co-production of truly sustainable solutions, achieve social buy-in and social acceptance, and thereby co-produce solutions with citizens and policy makers?

We are particularly interested in presentations that will discuss issues relating to:
  • Agent-based modeling and microsimulation techniques for responding to societal challenges;
  • Agent-based models used for policy formation;
  • Data driven modeling;
  • Utilizing machine learning for geosimulation;
  • Creating really big models using exascale computation;
  • Model validation and assessment;
  • Participatory methods for agent-based modeling;
  • Approaches to connect and share (open source) data and models;
  • Revealing, quantifying, and reducing socio-economic inequalities with Geosimulation.

Next Steps:

If this sounds of interest, please e-mail the abstract and keywords with your expression of intent to Richard Jiang (njiang8@buffalo.edu) by October 29 (2 days before the AAG session deadline). Please make sure that your abstract conforms to the AAG guidelines in relation to title, word limit and keywords (see https://aag.secure-platform.com/aag2025/page/abstracts/abstract-guidelines). An abstract should be no more than 250 words and should describe the presentation’s purpose, methods, and conclusions.

Timeline:

  • October 29, 2024: Please send abstract and keywords with your expression of intent to Richard Jiang (njiang8@buffalo.edu).
  • October 30, 2024: Session finalization and author notification.
  • October 31, 2024: Final abstract submission to AAG, via https://aag.secure-platform.com/aag2025.  All participants must register individually via this site. Upon registration you will be given a participant number (PIN). Send the PIN and a copy of your final abstract to Richard Jiang. Neither the organizers nor the AAG will edit the abstracts. 
  • February 6, 2025: Final Abstract/Session Editing and Presentation Conversion deadlines for AAG, via https://aag.secure-platform.com/aag2025. 
  • March 24-28, 2025: AAG in Detroit.

Organizers

Friday, September 27, 2024

Genomic profiling and spatial SEIR modeling of COVID-19 transmission

Lineage distribution of SARS-CoV-2 across geographic regions of Ontario, Canada, Western New York, and New York City over time.

In the past we have posted on using agent-based models to explore the spread of diseases. We have been keeping up with this work, especially in light of COVID-19. To this end we are excited to introduce our new paper entitled "Genomic Profiling and Spatial SEIR Modeling of COVID-19 Transmission in Western New York" published in Frontiers in Microbiology. In this paper we have been collaborating with other researchers at the University at Buffalo who focus on the genomic sequencing of the various lineages of SARS-CoV-2. What is special about this new paper is that we explore how such lineages change over space and time and how this relates to movement patterns. If this sounds of interest, below you can read the abstract of the paper, see some of the lineages in different regions which change over space and time, and our agent-based model which explores how different lineages might spread through people's movement patterns. At the bottom of the post, you can see the full reference and the link to the paper itself.

Abstract: 

The COVID-19 pandemic has prompted an unprecedented global effort to understand and mitigate the spread of the SARS-CoV-2 virus. In this study, we present a comprehensive analysis of COVID-19 in Western New York (WNY), integrating individual patient-level genomic sequencing data with a spatially informed agent-based disease Susceptible-Exposed-Infectious-Recovered (SEIR) computational model. The integration of genomic and spatial data enables a multi-faceted exploration of the factors influencing the transmission patterns of COVID-19, including genetic variations in the viral genomes, population density, and movement dynamics in New York State (NYS). Our genomic analyses provide insights into the genetic heterogeneity of SARS-CoV-2 within a single lineage, at region-specific resolutions, while our population analyses provide models for SARS-CoV-2 lineage transmission. Together, our findings shed light on localized dynamics of the pandemic, revealing potential cross-county transmission networks. This interdisciplinary approach, bridging genomics and spatial modeling, contributes to a more comprehensive understanding of COVID-19 dynamics. The results of this study have implications for future public health strategies, including guiding targeted interventions and resource allocations to control the spread of similar viruses.
Phylogenetic and spatial–temporal distribution of omicron BA.2.12.1. (A) Geographic introduction and organization of BA.2.12.1 lineage from February 2022 to November 2022, by percentage of SARS-CoV-2 circulating in each county per month. N/A represents counties with no BA.2.12.1 cases sequenced. (B) Phylogenetic clustering of jukes-cantor distance estimations between consensus sequences of 2,737 samples. Lineages on the phylogenetic tree are color-coded by county; Erie County (pink), Monroe County (green), Onondaga County (blue), and Westchester County (chartreuse). (C) Hierarchical clustering of sample-to-sample distance estimation of 2,737 BA.2.12.1 lineages in four counties across NYS, with k-means clustering k = 4.
SEIR model schematic and dynamics. (A) Schematics of SEIR model including general parameter and synthetic population parameter sets, and model initialization and function (B) R0 = 3 Susceptibility, Exposed, Infectious, and Recovered curves based on the introduction of two infected agents, monitored over time. (C) R0 = 5, (D) R0 = 8.
Commuter behavior dynamics in WNY. Estimated commuter populations originating in a specific county. (A) Commuter behavior with Erie County origins. (B) Commuter behavior from Niagara County origin. (C) Commuter behavior from Monroe County origin. (D) Composite Commuter behavior network.

Full Reference: 

Bard, J.E., Jiang, N., Emerson, J., Bartz, M., Lamb, N.A., Marzullo, B.J., Pohlman, A., Boccolucci, A., Nowak, N.J., Yergeau, D.A., Crooks, A.T. and Surtees, J. (2024), Genomic Profiling and Spatial SEIR Modeling of COVID-19 Transmission in Western New York, Frontiers in Microbiology, 15. Available at  https://doi.org/10.3389/fmicb.2024.1416580  (pdf)

Monday, September 23, 2024

Social Simulation Conference (SSC 2024)

Last week I had the honor to give a keynote talk entitled "Exploring the World from the Bottom Up with GIS and Agent-based Models: Past, Present and Future" at the 19th annual Social Simulation Conference, which is the European Social Simulation Association (ESSA) annual conference. Attending the conference was a great experience: being exposed to various applications of social simulation, catching up with old friends and meeting many new people. For anyone interested, below I have pasted the abstract from my talk, and the slides from the talk can be found here.

Abstract

We have seen an explosion in the availability of data along with the use of such data in agent-based models. At the same time, we have seen a huge growth in computational power and in the linking of agent-based models to real world locations through the use of geographical information systems (GIS). This talk will explore how geographically explicit agent-based models have grown and evolved over the last 20 years, taking advantage of the explosion of data and computational power. It will showcase a selection of applications of agent-based models and how they can be used to explore the world from the bottom up, with a specific emphasis on cities and regions. Through examples, I will demonstrate how GIS can be used to build agent-based models, ranging from using spatial data to create the artificial worlds that the agents inhabit to utilizing demographic data to build synthetic populations. However, it is not just data that is important when building agent-based models but also how we incorporate human behavior and theory into such models, along with considerations of connecting agents through various types of social and spatial networks. While this might appear simple, there are many challenges associated with this which will be discussed using representative examples ranging from basic patterns of life to vaccination uptake. The talk will conclude with what opportunities are emerging in light of the recent growth in artificial intelligence (AI) with respect to building agent-based models. 

Keywords: Agent-based modeling, AI, GIS, Social Networks, Cities.

Types of Problems Agent-Based Models have Explored
Growth of Geographical Agent-based models.

Reference: 
Crooks, A.T. (2024), Exploring the World from the Bottom Up with GIS and Agent-based Models: Past, Present and Future. The 19th Annual Social Simulation Conference, 16th-20th September, Cracow, Poland.

Wednesday, August 21, 2024

In Silico Human Mobility Data Science

In the past we have written about using simulation to build synthetic datasets for trajectory analysis due to the limited availability of comprehensive real-world datasets. In relation to this work we (Andreas Züfle, Dieter Pfoser, Carola Wenk, Hamdi Kavak, Taylor Anderson, Joon-Seok Kim, Nathan Holt, Andrew DiAntonio and myself) have a new vision paper entitled "In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data" published in Transactions on Spatial Algorithms and Systems.

In the paper we sketch out a framework for in silico mobility data science. The rationale is that mobility data alone does not tell us much about why people do what they do. To quote from the paper: "but imagine a world where we can go back in time to ask people about the purpose of their mobility to understand why an individual visited a place of interest." By building models (i.e., agent-based models) we can do just that, which therefore allows us to build in silico human mobility data.

To build this argument, in the paper we review existing data sets of individual human mobility and their limitations in terms of size and representativeness. We then survey existing simulation frameworks that generate individual human mobility data and comment on their limitations before presenting our vision of a scalable in silico world that captures realistic human patterns of life and allows us to generate massive datasets as sandboxes for human mobility data science. Building off this we describe a small sample of applications and research directions that would be enabled by such massive individual human mobility datasets if our vision came true.

If this sounds of interest, below we provide the abstract to the paper, some of the figures we use to highlight our argument and our envisioned framework that could exhibit both realistic behavior and realistic movement. Finally at the bottom of the post we provide a reference and a link to the paper itself. As always, any thoughts or comments are most welcome. 

Abstract:

Human mobility data science using trajectories or check-ins of individuals has many applications. Recently, we have seen a plethora of research efforts that tackle these applications. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals while data sets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear if findings from the human mobility data science community would generalize to large populations. Since obtaining massive, representative, and individual-level human mobility data is hard to come by due to privacy considerations, the vision of this paper is to embrace the use of data generated by large-scale socially realistic microsimulations. Informed by both real data and leveraging social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science.

Key Words: Spatial Simulation, Mobility Data Science, Trajectory Data, Location Based Social Network Data, In Silico

The envisioned in silico mobility data science process. (left:) A massive microsimulation is created to simulate realistic human behavior specified by a user through an AI-supported builder tool. (middle:) The microsimulation generates massive datasets, including high-fidelity trajectories of all individuals over years of simulation time. This data, which is 100% accurate and complete (in the simulated world), is then sampled to generate realistic datasets. (right:) These datasets are then used to perform mobility data science tasks in the simulated in silico world as if it was the real world. The results of these tasks can then be compared to the ground truth data (of the simulated in silico world) for validation.
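
To make the simulate, sample and validate loop described in the caption above a little more concrete, here is a minimal Python sketch. It is only a toy illustration and not the framework from the paper: the purpose labels, the number of agents, the 10% sampling rate and all function names are assumptions made up for this example.

```python
import random
from collections import Counter

# Hypothetical purpose labels; in the toy world every visit has a known purpose.
PURPOSES = ["home", "work", "food", "recreation"]


def simulate(num_agents, num_days, seed=0):
    """Toy 'in silico world': record every visit of every agent with its purpose."""
    rng = random.Random(seed)
    visits = []
    for agent in range(num_agents):
        for day in range(num_days):
            # Each agent makes between 2 and 5 visits per day.
            for _ in range(rng.randint(2, 5)):
                visits.append((agent, day, rng.choice(PURPOSES)))
    return visits


def purpose_shares(visits):
    """Share of visits by purpose (the statistic we want to recover)."""
    counts = Counter(purpose for _, _, purpose in visits)
    total = sum(counts.values())
    return {p: counts[p] / total for p in PURPOSES}


def sample_observed(visits, rate, seed=1):
    """Keep each visit with probability `rate`, mimicking sparse check-in data."""
    rng = random.Random(seed)
    return [v for v in visits if rng.random() < rate]


def max_abs_error(truth, estimate):
    """Largest absolute difference between the true and estimated shares."""
    return max(abs(truth[p] - estimate[p]) for p in PURPOSES)


# 1. Simulate the complete world (ground truth).
world = simulate(num_agents=200, num_days=7)
truth = purpose_shares(world)

# 2. Sample it to create a "realistic" sparse dataset.
observed = sample_observed(world, rate=0.1)

# 3. Run the analysis on the sample and validate against the ground truth.
estimate = purpose_shares(observed)
error = max_abs_error(truth, estimate)
print(f"visits in world: {len(world)}, observed: {len(observed)}, max error: {error:.3f}")
```

The point of the sketch is simply that, because the simulated world is fully known, the error of any analysis carried out on the sampled data can be measured directly, which is not possible with real world mobility data.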

The Patterns of Life Simulation. A video of the simulation can be found at: https://www.youtube.com/watch?v=rP1PDyQAQ5M.
Envisioned framework for a simulation that exhibits both realistic behavior and realistic movement.

Full reference: 

Züfle, A., Pfoser, D., Wenk, C., Crooks, A.T., Kavak, H., Anderson, T., Kim, J-S., Holt, N. and DiAntonio, A. (2024), In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data (Vision Paper), Transactions on Spatial Algorithms and Systems (pdf).

Monday, July 01, 2024

Call for Abstracts: Future Map @ AGU


Call for Abstracts! 

At the 2024 American Geophysical Union (AGU) meeting, to be held from the 9th to the 13th of December in Washington, D.C., Carter Christopher, Wenwen Li, Gautam Thakur and myself are organizing a session entitled: “GC077: Future Map: The Convergence of Generative GeoAI, Population Synthesis, and Agent-Based Modeling to Develop Geographic Futures for Climate Assessments”

Abstract
The climate community has long developed reliable climate models grounded in trusted Earth systems data and physics, but it has not been until recently that human dynamics and feedbacks have been viewed as a necessary coupling within these models. Including human dynamics within integrated models necessitates a forecasted understanding of human transitions within the landscape. The geospatial science domain has typically not looked forward through simulations. Advances in agent-based modeling, synthetic population generation, and GeoAI/GenAI are presenting new opportunities for generating future-oriented representations of human landscapes, enabling the development of scenario-specific forecasted datasets, such as synthetic satellite imagery, land cover/land use, the built environment, and more. This session will explore the boundaries of geospatial modeling, data synthesis, and microsimulations for forecasting. Emphasis will be placed on research and studies that show how synthetic forecasted data can enable high fidelity assessments of climate futures and population impacts.

If this sounds of interest and you want to be part of this session, further details can be found at: https://agu.confex.com/agu/agu24/prelim.cgi/Session/229712

Key Thinkers on Space and Place


In the recent edition of Key Thinkers on Space and Place edited by Mary Gilmartin, Phil Hubbard, Rob Kitchin and Sue Roberts, I was asked to write a chapter about Mike Batty.

While I have known Mike for a while, to say that writing the chapter was not easy is an understatement. We had a word constraint (3,000 words plus references), and trying to sum up his biographical details and theoretical context, his spatial contributions along with his key advances and controversies, and his key works was a challenge. Anyway, if you would like to read a draft of my contribution to the book and my attempt to sum up Mike's work, you can find the reference and the link to the chapter below.

Full reference:  
Crooks, A.T. (2024), Michael Batty, in Gilmartin, M., Hubbard, P., Kitchin, R. and Roberts, S. (eds.), Key Thinkers on Space and Place (3rd edition), Sage, London, UK. pp. 37-43. (pdf)

Friday, June 07, 2024

A comparison of social surveys and social media for vaccine hesitancy

In the past we have explored various ways to study vaccine hesitancy, and keeping with this theme we have a new paper published in PLOS ONE entitled "Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media" with Kuleen Sasse, Ron Mahabir, Olga Gkountouna and Arie Croitoru.

In the paper we use social, demographic and economic (e.g., US Census) variables to predict COVID-19 vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). By using machine learning algorithms (e.g., linear regression, random forest regression, and XGBoost regression) we compare a set of baseline models that contain only these variables with models that incorporate survey data and social media (i.e., Twitter) data separately.

We find that different algorithms perform differently, and that the influential variables, such as age, ethnicity, occupation, and political inclination, vary across the five hesitancy classes (i.e., “definitely get a vaccine”, “probably get a vaccine”, “unsure”, “probably not get a vaccine”, and “definitely not get a vaccine”). Further, we find that the application of the models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. But in summary, this paper shows social media data’s potential for understanding vaccine hesitancy and tailoring interventions to specific communities. If this sounds of interest, below we provide the abstract to the paper along with our mixed methods matrix, data sources used and the results from the various MSAs. At the bottom of the post, you can see the full reference and the link to the paper so you can read more if you so desire.

Abstract:
The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. 
In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
Mixed methods matrix showing the data, processing, and model development steps used in our study.

Data sources used in our study.

MSA model performance (Bolded adjusted R2 values represent the best performing model for each modeling technique and MSA).
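
For readers less familiar with the adjusted R2 metric used in the table above, the short Python sketch below shows the standard formula, which penalizes R2 for the number of predictors so that models with different numbers of variables (e.g., a baseline model versus one with added survey or social media variables) can be compared more fairly. The function names and the numbers in the example are hypothetical and purely illustrative; they are not values from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Standard adjusted R2: penalizes R2 for the number of predictors used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical example: the same R2 looks less impressive with more predictors.
baseline = adjusted_r_squared(0.60, n_samples=101, n_predictors=10)
augmented = adjusted_r_squared(0.60, n_samples=101, n_predictors=20)
print(f"baseline: {baseline:.3f}, augmented: {augmented:.3f}")
```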

Monday, June 03, 2024

Skiing and Modeling

Locker room layouts
(Source: Gao et al., 2024)

One of my favorite winter activities is skiing, and now that all the ski areas in the Northeast have closed (for those interested, Killington, VT closed last Saturday), I thought it would be interesting to see how people have used various modeling techniques to explore ski areas. While what follows is not a comprehensive list of all the works, these are some that I have come across. If you know more, feel free to leave a comment below.

Models have ranged from looking at the spatial arrangement of locker rooms at ski resorts (Gao et al., 2024) to lift lines (congestion) in places such as La Plagne in the French Alps (Poulhès and Mirial, 2017) or the Austrian ski resort of Fanningberg (Heinrich et al., 2023). Others have simulated entire ski areas including lift lines, slopes used etc. (Kappaurer 2022), while Pons et al. (2014) developed an agent-based model to see how climate change might impact where skiers go. Others have explored how climate change might impact ski areas and their associated water usage for making snow (e.g., Soboll and Schmude 2011). Keeping with the climate theme, Revilloud et al. (2013) used agent-based simulations to simulate snow height on ski runs based on skiers' movements in order to facilitate snow cover management (i.e., reduce the production cost of artificial snow and thus water and energy consumption). Murphy (2021) developed a simpler agent-based model of how skiers might ski during a powder day and explores the area of terrain they may cover based on ability.

Simulation of skiers (Source: Revilloud et al., 2013)

Similar to some of the other models above, but in light of COVID-19, Integrated Insight (2020), an analytics consulting company, shows in the movie below how one can use simulation to explore crowd management in the base areas of ski resorts.



References / papers discussed above:
As noted above, if you know more, feel free to leave a comment below.