Thursday, April 11, 2024

Addressing equifinality in agent-based modeling

In the past we have blogged about the challenges of agent-based modeling, but one thing we have not written much about is the challenge of uncertainty, especially when it comes to model calibration. This uncertainty becomes a real challenge in situations where various parameter sets fit the observed data equally well. This is known as equifinality: a principle in systems theory whereby different paths can lead to the same final state or outcome.
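To make the idea concrete, here is a toy Python example (not from the paper) in which two very different parameter sets produce exactly the same coarse observable, which is precisely the bind a modeler faces when calibrating against aggregated data:

```python
def coarse_outcome(speed, directness, n_steps=100):
    # Toy movement model: net displacement grows with both speed and
    # how directed the walk is, so only their product matters for the
    # aggregate statistic we can actually observe.
    return speed * directness * n_steps

# Two very different parameter sets...
params_a = dict(speed=2.0, directness=0.3)
params_b = dict(speed=0.6, directness=1.0)

# ...yield the same coarse observable: the model is equifinal.
print(coarse_outcome(**params_a))  # 60.0
print(coarse_outcome(**params_b))  # 60.0
```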

In a new paper with Moongi Choi, Neng Wan, Simon Brewer, Thomas Cova and Alexander Hohl entitled "Addressing Equifinality in Agent-based Modeling: A Sequential Parameter Space Search Method Based on Sensitivity Analysis" we explore this issue. More specifically, we introduce a Sequential Parameter Space Search (SPS) algorithm to confront the equifinality challenge when calibrating fine-scale agent-based simulations with coarse-scale observed geospatial data, ensuring accurate model selection using a pedestrian movement simulation as a test case.

If this sounds of interest and you want to find out more, below you can read the abstract to the paper, see the logic of our simulation and some of the results. At the bottom of the page, you can find a link to the paper along with its full reference. Furthermore, Moongi has made the data and code for the indoor pedestrian movement simulation and the Sequential Parameter Space Search algorithm openly available at https://zenodo.org/doi/10.5281/zenodo.10815211 and https://zenodo.org/doi/10.5281/zenodo.10815195.

Abstract 

This study addresses the challenge of equifinality in agent-based modeling (ABM) by introducing a novel sequential calibration approach. Equifinality arises when multiple models equally fit observed data, risking the selection of an inaccurate model. In the context of ABM, such a situation might arise due to limitations in data, such as aggregating observations into coarse spatial units. It can lead to situations where successfully calibrated model parameters may still result in reliability issues due to uncertainties in accurately calibrating the inner mechanisms. To tackle this, we propose a method that sequentially calibrates model parameters using diverse outcomes from multiple datasets. The method aims to identify optimal parameter combinations while mitigating computational intensity. We validate our approach through indoor pedestrian movement simulation, utilizing three distinct outcomes: (1) the count of grid cells crossed by individuals, (2) the number of people in each grid cell over time (fine grid) and (3) the number of people in each grid cell over time (coarse grid). As a result, the optimal calibrated parameter combinations were selected based on high test accuracy to avoid overfitting. This method addresses equifinality while reducing computational intensity of parameter calibration for spatially explicit models, as well as ABM in general. 

Keywords: Agent-based modeling; equifinality; calibration; sequential calibration approach; sensitivity analysis.
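To give a flavor of the idea (this is an illustrative sketch, not the code from the paper; the function and parameter names are placeholders), a sequential parameter space search can be thought of as repeatedly filtering a pool of candidate parameter combinations against one observed outcome at a time, so that later (finer-grained, more expensive) comparisons only run on the survivors:

```python
import itertools

def sequential_search(param_grid, observed, simulate, error, keep_frac=0.2):
    """Score every parameter combination against the first observed
    outcome, keep only the best fraction, and repeat with each further
    outcome -- reducing the number of expensive simulation runs."""
    candidates = [dict(zip(param_grid, values))
                  for values in itertools.product(*param_grid.values())]
    for i, obs in enumerate(observed):
        scored = sorted(
            ((error(simulate(params, outcome=i), obs), params)
             for params in candidates),
            key=lambda pair: pair[0])
        n_keep = max(1, int(len(scored) * keep_frac))
        candidates = [params for _, params in scored[:n_keep]]
    return candidates  # surviving parameter combinations

# Hypothetical usage, mirroring the paper's three outcomes:
# grid = {"speed": [0.5, 1.0, 1.5], "group_prop": [0.1, 0.5, 0.9]}
# best = sequential_search(grid, [obs_counts, obs_fine, obs_coarse],
#                          simulate=run_pedestrian_model,
#                          error=mean_squared_error)
```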

Detailed model structures and process of the simulation.
Pedestrian simulation ((a) Position by ID, Grouped proportion – (b) 0.1, (c) 0.5, (d) 0.9).
Multiple sub-observed data ((a) # grid cells passed by each individual, (b) # individuals in 1x1 grid, (c) # individuals in 2x2 grid cells).

Validation results with train and test dataset ((a) Round 1, (b) Round 2, (c) Round 3).

Full Reference: 

Choi, M., Crooks, A.T., Wan, N., Brewer, S., Cova, T.J. and Hohl, A. (2024), Addressing Equifinality in Agent-based Modeling: A Sequential Parameter Space Search Method Based on Sensitivity Analysis, International Journal of Geographical Information Science. https://doi.org/10.1080/13658816.2024.2331536. (pdf)

Wednesday, March 27, 2024

Community resilience to wildfires: A network analysis approach by utilizing human mobility data

Quantifying community resilience, especially after a disaster, is an open research challenge. However, with the growth in mobility datasets such as SafeGraph, we have new opportunities to study how communities rebound from disasters.

To this end, in a new paper with Qingqing Chen and Boyu Wang entitled "Community resilience to wildfires: A network analysis approach by utilizing human mobility data", published in Computers, Environment and Urban Systems, we develop a framework to quantify resilience after a disaster using network analysis. To showcase this framework, we use human mobility data associated with two wildfires in California (the Mendocino Complex and Camp wildfires) and measure the robustness and vulnerability of different communities over time.

Our results show that community resilience is closely tied to the socioeconomic and built-environment traits of the affected areas, and as such our approach paves the way for studying disasters and their long-term impacts on society. If this sounds of interest, below you can read the abstract to the paper and see some of the figures we use to explain and demonstrate our approach, while at the end of the post you can find the full reference along with a link to the paper.

Abstract:
Disasters have been a long-standing concern to societies at large. With growing attention being paid to resilient communities, such concern has been brought to the forefront of resilience studies. However, there is a wide variety of definitions with respect to resilience, and a precise definition has yet to emerge. Moreover, much work to date has often focused only on the immediate response to an event, thus investigating the resilience of an area over a prolonged period of time has remained largely unexplored. To overcome these issues, we propose a novel framework utilizing network analysis and concepts from disaster science (e.g., the resilience triangle) to quantify the long-term impacts of wildfires. Taking the Mendocino Complex and Camp wildfires - the largest and most deadly wildfires in California to date, respectively - as case studies, we capture the robustness and vulnerability of communities based on human mobility data from 2018 to 2019. The results show that demographic and socioeconomic characteristics alone only partially capture community resilience, however, by leveraging human mobility data and network analysis techniques, we can enhance our understanding of resilience over space and time, providing a new lens to study disasters and their long-term impacts on society.

Keywords: Wildfire, Community resilience, Network analysis, Resilience triangle, Human mobility data.   
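For readers curious about the mechanics, the sketch below is illustrative only (the input format and function names are assumptions, not the paper's actual pipeline): it shows how one might compute degree centrality for each Census Block Group (CBG) from origin-destination trips with networkx, and then derive simple resilience-triangle-style metrics from the resulting time series:

```python
import networkx as nx
import numpy as np

def degree_centrality_series(weekly_trips):
    """weekly_trips: one list of (origin_cbg, dest_cbg, count) tuples per
    week. Returns {cbg: [degree centrality per week]} (lists may be ragged
    if a CBG is absent from some weeks)."""
    series = {}
    for trips in weekly_trips:
        g = nx.Graph()
        for origin, dest, count in trips:
            g.add_edge(origin, dest, weight=count)
        for cbg, centrality in nx.degree_centrality(g).items():
            series.setdefault(cbg, []).append(centrality)
    return series

def resilience_triangle(values, event_idx):
    """A simple discrete take on the resilience-triangle idea:
    robustness = how far activity falls relative to the pre-event
    baseline (1.0 means no drop); vulnerability = total activity lost
    below the baseline after the event."""
    baseline = np.mean(values[:event_idx])
    post = np.asarray(values[event_idx:])
    robustness = post.min() / baseline
    vulnerability = np.sum(np.clip(baseline - post, 0, None))
    return robustness, vulnerability
```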

Resilience triangle. (a) The original resilience triangle (adapted from Bruneau et al., 2003); (b) The modified resilience triangle used in this study.

An overview of the research outline.
(a) The zoomed-in study areas of the two wildfires, where the blue areas highlight the Census Block Groups; (b) The spatial distribution of wildfire density from 2005 to 2022; (c) The distribution of annual wildfires and acres in the U.S.
The distribution of degree centrality for each census block group colored by different clusters. (a) The Camp wildfire; (b) The Mendocino Complex wildfire.
The results of resilience triangles of clustered CBGs and resilience features. (a) The determined resilience triangles of clustered CBGs for Camp wildfire; (b) The determined resilience triangles of clustered CBGs for Mendocino Complex wildfire; (c) Vulnerability of CBGs within the two wildfire areas; (d) Robustness of CBGs within the two wildfires.

Full Reference: 
Chen, Q., Wang, B. and Crooks, A.T. (2024), Community Resilience to Wildfires: A Network Analysis Approach by Utilizing Human Mobility Data, Computers, Environment and Urban Systems, 110: 102110. (pdf)

Monday, February 19, 2024

Exploring the New Frontier of Information Extraction through Large Language Models in Urban Analytics

Over the last year or so there has been a lot of hype around artificial intelligence (AI), and Large Language Models (LLMs) in particular, such as Generative Pre-trained Transformers (GPT) like ChatGPT. In a recent editorial in Environment and Planning B, written by Qingqing Chen and myself, we discuss how LLMs could lower the barrier for researchers wishing to study urban problems through the lens of urban analytics. For example, analyzing street view images in the past required training models and segmenting such data, a time-consuming and rather technical task. But what can be done using ChatGPT? To test this we provided ChatGPT with some images from Flickr and Mapillary:

Examples of using ChatGPT for extracting information from imagery.

And then asked it some questions and we were quite amazed by the answers:  

Examples questions and responses when using ChatGPT for extracting information from imagery.
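While we posed our questions through the ChatGPT interface directly, the same kind of image question can be scripted. Below is a minimal sketch using the OpenAI Python SDK; the model name, prompt and image URL are placeholders, not what we used in the editorial:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe the land use, greenery and street "
                     "condition visible in this street-level image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street_view.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```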

If this sounds of interest, I encourage you to read the editorial and think about how you could leverage LLMs for your own research.

Full Reference: 

Crooks, A.T. and Chen, Q. (2024), Exploring the New Frontier of Information Extraction through Large Language Models in Urban Analytics, Environment and Planning B. Available at https://doi.org/10.1177/23998083241235495. (pdf)

Tuesday, December 19, 2023

Crowdsourcing Dust Storms in the United States Utilizing Flickr

In the past on this site we have written about how one can use social media to study the world around us. Often the focus has been on Twitter, but that is not the only social media platform available. Another is Flickr, and while in past posts we have shown how we can use this platform to explore bird sightings, wildfires and human migration, we are now turning our attention to other phenomena, one of which is dust storms. Working with Festus Adegbola and Stuart Evans, we have just presented a poster at the 2023 American Geophysical Union Fall Meeting entitled "Crowdsourcing Dust Storms in the United States Utilizing Flickr".

In this research we compare Flickr images with National Weather Service advisories and the VIIRS Deep Blue aerosol product from the Suomi-NPP satellite. Our preliminary findings show that Flickr images of dust storms have substantial co-occurrence with regions of NWS blowing dust advisories. If this sounds of interest, below you can read our abstract, see our workflow and view the poster itself.

Abstract

Dust storms are natural phenomena characterized by strong winds carrying large amounts of fine particles which have significant environmental and human impacts. Previous studies have limitations due to available data, especially regarding short-lived, intense dust storms that are not captured by observing stations and satellite instruments. In recent years, the advent of social media platforms has provided a unique opportunity to access a vast amount of user-generated data. This research explores the utilization of Flickr data to study dust storm occurrences within the United States and their correlation with National Weather Service (NWS) advisories. The work ascertains the reliability of using crowdsourced data as a supplementary tool for dust storm monitoring. Our analysis of Flickr metadata indicates that the Southwest is most susceptible to dust storm events, with Arizona leading in the highest number of occurrences. On the other hand, the Great Plains show a scarcity of Flickr data related to dust storms, which can be attributed to the sparsely populated nature of the region. Furthermore, seasonal analysis reveals that dust storm events are prevalent during the Summer months, specifically from June to August, followed by Spring. These results are consistent with previous studies of dust occurrence in the US, and Flickr-identified images of dust storms show substantial co-occurrence with regions of NWS blowing dust advisories. This research highlights the potential of unconventional user-generated data sources to crowdsource environmental monitoring and research.
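As a rough illustration of the co-occurrence step (a hedged sketch only; the file and column names are placeholders rather than our actual data layout, and the temporal matching of photo dates to advisory periods is omitted), one could spatially join geotagged Flickr photos with advisory polygons using geopandas:

```python
import geopandas as gpd

# Placeholder inputs: geotagged photo points and advisory polygons
photos = gpd.read_file("flickr_dust_photos.geojson")
advisories = gpd.read_file("nws_dust_advisories.geojson")

# Put both layers in the same coordinate reference system
photos = photos.to_crs(advisories.crs)

# Keep photos that fall inside any advisory polygon
joined = gpd.sjoin(photos, advisories, how="inner", predicate="within")

co_occurrence = len(joined) / len(photos)
print(f"{co_occurrence:.1%} of Flickr dust-storm photos fall within "
      "NWS advisory areas")
```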

Data collection and workflow.
Distribution of Flickr identified dust storm occurrences and NWS dust storm advisories.

Full Reference: 

Adegbola, F., Crooks, A.T. and Evans, S. (2023), Crowdsourcing Dust Storms in the United States Utilizing Flickr, American Geophysical Union (AGU) Fall Meeting, 11th – 15th December, San Francisco, CA. (abstract, poster)

Tuesday, November 14, 2023

Massive Trajectory Data Based on Patterns of Life

Following on from the last post, we (Hossein Amiri, Shiyang Ruan, Joon-Seok Kim, Hyunjee Jin, Hamdi Kavak, Dieter Pfoser, Carola Wenk, Andreas Züfle and myself) have a paper in the Data and Resources track at the 2023 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems entitled "Massive Trajectory Data Based on Patterns of Life".

This Data and Resources paper introduces readers to a large set of simulated individual-level trajectory and location-based social network data we have generated from our Urban Life Model (click here to find out more about the model). The data cover 4 suburban and urban regions: 1) the George Mason University campus area, Fairfax, Virginia, 2) the French Quarter of New Orleans, Louisiana, 3) San Francisco, California, and 4) Atlanta, Georgia. For each of the 4 study regions, we run the simulation with 1K, 3K, 5K, and 10K agents for 15 months of simulation time. We also provide simulations covering 10 and 20 years with 1K agents for each of the 4 regions of interest. For each dataset, three items are provided: 1) check-ins, 2) social network links, and 3) trajectory information per agent per five-minute tick. As such, we argue in the paper that our datasets are orders of magnitude larger than existing real-world trajectory and location-based social network (LBSN) datasets.

If this sounds of interest, we encourage readers to check out the paper (see the bottom of this post), while the datasets, as well as additional documentation, can be found on OSF (https://osf.io/gbhm8/) and the data generator (model) can be found at https://github.com/azufle/pol.
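As a quick-start illustration (the file and column names below are assumptions based on the paper's description of per-agent positions at five-minute ticks; check the OSF documentation for the actual schema), loading and inspecting one agent's trajectory might look like this:

```python
import pandas as pd

# Hypothetical file name for one of the downloadable trajectory datasets
traj = pd.read_csv("atlanta_1k_trajectories.csv.gz",
                   parse_dates=["timestamp"])

# One row per agent per five-minute tick
agent = traj[traj["agent_id"] == 0].sort_values("timestamp")
print(agent[["timestamp", "x", "y"]].head())
```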

Abstract: Individual human location trajectory and check-in data have been the driving force for human mobility research in recent years. However, existing human mobility datasets are very limited in size and representativeness. For example, one of the largest and most commonly used datasets of individual human location trajectories, GeoLife, captures fewer than two hundred individuals. To help fill this gap, this Data and Resources paper leverages an existing data generator based on fine-grained simulation of individual human patterns of life to produce large-scale trajectory, check-in, and social network data. In this simulation, individual human agents commute between their home and work locations, visit restaurants to eat, and visit recreational sites to meet friends. We provide large datasets of months of simulated trajectories for two example regions in the United States: San Francisco and New Orleans. In addition to making the datasets available, we also provide instructions on how the simulation can be used to re-generate data, thus allowing researchers to generate the data locally without downloading prohibitively large files.

Full Reference: 

Amiri, H., Ruan, S., Kim, J., Jin, H., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Züfle, A. (2023), Massive Trajectory Data Generation using a Patterns of Life Simulation, Proceedings of the 2023 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Hamburg, Germany. (pdf)