Tuesday, November 14, 2023

Massive Trajectory Data Based on Patterns of Life

Following on from the last post, we (Hossein Amiri, Shiyang Ruan, Joon-Seok Kim, Hyunjee Jin, Hamdi Kavak, Dieter Pfoser, Carola Wenk, Andreas Züfle and myself) have a paper in the Data and Resources track at the 2023 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems entitled "Massive Trajectory Data Based on Patterns of Life".

This Data and Resources paper introduces readers to large sets of simulated individual-level trajectory and location-based social network (LBSN) data that we have generated from our Urban Life Model (click here to find out more about the model). The data covers four suburban and urban regions: 1) the George Mason University Campus area, Fairfax, Virginia, 2) the French Quarter of New Orleans, Louisiana, 3) San Francisco, California, and 4) Atlanta, Georgia. For each of the four study regions, we ran the simulation with 1K, 3K, 5K, and 10K agents for 15 months of simulation time. We also provide simulations for 10 years and 20 years, with 1K agents for each of the four regions of interest. For each dataset, three items are provided: 1) check-ins, 2) social network links, and 3) trajectory information per agent per five-minute tick. As such, we argue in the paper that our datasets are orders of magnitude larger than existing real-world trajectory and LBSN datasets.
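To get a feel for the scale, here is a back-of-the-envelope sketch of how many trajectory records a single 15-month run produces. It assumes 30-day months and exactly one record per agent per five-minute tick; both are simplifying assumptions for illustration rather than figures taken from the paper, and the function name is hypothetical.

```python
# Rough estimate of trajectory records per simulation run.
# Assumptions (not from the paper): 30-day months, one record per agent per tick.

TICK_MINUTES = 5
DAYS_PER_MONTH = 30  # assumed for illustration


def trajectory_records(agents: int, months: int) -> int:
    """Return the estimated number of (agent, tick) records for one run."""
    ticks_per_day = 24 * 60 // TICK_MINUTES  # 288 five-minute ticks per day
    ticks = ticks_per_day * DAYS_PER_MONTH * months
    return agents * ticks


if __name__ == "__main__":
    for agents in (1_000, 3_000, 5_000, 10_000):
        records = trajectory_records(agents, 15)
        print(f"{agents:>6} agents, 15 months: {records:,} records")
```

Under these assumptions, a single 10K-agent run yields roughly 1.3 billion trajectory records for one region.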

If this sounds of interest, we encourage readers to check out the paper (see the bottom of this post). The datasets, as well as additional documentation, can be found at OSF (https://osf.io/gbhm8/) and the data generator (model) can be found at https://github.com/azufle/pol.

Abstract: Individual human location trajectory and check-in data have been the driving force for human mobility research in recent years. However, existing human mobility datasets are very limited in size and representativeness. For example, one of the largest and most commonly used datasets of individual human location trajectories, GeoLife, captures fewer than two hundred individuals. To help fill this gap, this Data and Resources paper leverages an existing data generator based on fine-grained simulation of individual human patterns of life to produce large-scale trajectory, check-in, and social network data. In this simulation, individual human agents commute between their home and work locations, visit restaurants to eat, and visit recreational sites to meet friends. We provide large datasets of months of simulated trajectories for two example regions in the United States: San Francisco and New Orleans. In addition to making the datasets available, we also provide instructions on how the simulation can be used to re-generate data, thus allowing researchers to generate the data locally without downloading prohibitively large files.

Full Reference:

Amiri, H., Ruan, S., Kim, J., Jin, H., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Züfle, A. (2023), Massive Trajectory Data Generation using a Patterns of Life Simulation, Proceedings of the 2023 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Hamburg, Germany. (pdf)

Monday, November 13, 2023

Synthetic Geosocial Network Generation

In the past, the blog has explored the creation of social networks for models. Keeping with this vein of research, I was fortunate to work with Ketevan Gallagher, Taylor Anderson and Andreas Züfle to consider the role of the location of individuals when generating social networks. This work has resulted in a new paper entitled "Synthetic Geosocial Network Data Generation" which was presented at the 7th ACM SIGSPATIAL Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising (LocalRec 2023). If this sounds of interest, below you can read the abstract to the paper, see some of the generated geosocial networks and find the full reference and link to the paper. In addition to this, the Python code and data used to generate the networks are available at https://github.com/KetevanGallagher/Synthetic-Geosocial-Networks.

Abstract: Generating synthetic social networks is an important task for many problems that study humans, their behavior, and their interactions. Geosocial networks enrich social networks with location information. Commonly used models to generate synthetic social networks include the classical Erdos-Renyi, Barabasi-Albert, and Watts-Strogatz models. However, these classic social network models do not consider the location of individuals. Real-world geosocial networks do exhibit a strong spatial autocorrelation, thus having a higher likelihood of a social connection between agents that are spatially close. As such, recent variants of the three classical models have been proposed to consider location information. Yet, these existing solutions assume that individuals are located on a uniform lattice and exhibit certain limitations when applied to real-world data that exhibits clusters. In this work, we discuss these limitations and propose new approaches to extend the three classic social network generation models to geosocial networks. Our experiments show that our generated synthetic geosocial networks address the shortcomings of the state-of-the-art models and generate realistic geosocial networks that exhibit high similarity to real-world geosocial networks. 
Keywords: Geosocial Networks, Network Generation, Synthetic Social Networks, Erdos-Renyi, Watts-Strogatz, Barabasi-Albert.
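To make the idea of spatial autocorrelation concrete, here is a minimal sketch of a distance-aware variant of the Erdős–Rényi model, in which the probability of a link decays with the distance between two agents. It is not the method from the paper: the exponential decay, the parameter names (`base_p`, `decay`) and the function name are assumptions chosen purely for illustration.

```python
import math
import random
from itertools import combinations


def spatial_erdos_renyi(points, base_p=0.8, decay=1.0, seed=None):
    """Return a set of undirected edges (i, j) with i < j.

    Each pair is linked with probability base_p * exp(-decay * distance),
    so nearby agents are more likely to be connected than distant ones.
    The decay function is an illustrative assumption.
    """
    rng = random.Random(seed)
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        dist = math.dist(points[i], points[j])
        if rng.random() < base_p * math.exp(-decay * dist):
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    rng = random.Random(42)
    # Two spatial clusters of agents, far apart from each other.
    cluster_a = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(20)]
    cluster_b = [(rng.uniform(10, 11), rng.uniform(10, 11)) for _ in range(20)]
    edges = spatial_erdos_renyi(cluster_a + cluster_b, seed=1)
    # An edge is "within" a cluster when both endpoints fall in the same half.
    within = sum(1 for i, j in edges if (i < 20) == (j < 20))
    print(f"{len(edges)} edges, {within} within clusters")
```

Setting `decay=0` recovers the classical Erdős–Rényi model with link probability `base_p`, since distance then plays no role.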

Real-World Geosocial Network using Facebook Social Connectedness Data between Zone Improvement Plan (ZIP) Region Centroids for the State of Virginia, USA.
Geosocial graphs using Virginia ZIP code data.
Graphs using Fairfax Census Tract data.

Full Reference:
Gallagher, K., Anderson, T., Crooks, A.T. and Züfle, A. (2023), Synthetic Geosocial Network Data Generation, Proceedings of the 7th ACM SIGSPATIAL Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising (LocalRec 2023), Hamburg, Germany. (pdf)

Friday, November 03, 2023

Geographically Synthetic Populations for ABM: A Gallery of Applications

Often when building geographically explicit agent-based models, we spend a lot of time creating the synthetic population to instantiate our artificial world. We have tried to overcome this by creating methods to generate such populations (see this old blog post). Building on this work, Na (Richard) Jiang, Fuzhen Yin, Boyu Wang and myself have a new paper entitled "Geographically-Explicit Synthetic Populations for Agent-based Models: A Gallery of Applications" which was presented at the 2023 Computational Social Science Society of the Americas conference. In the paper we extend the synthetic population to the whole of New York state, while at the same time introducing a pipeline for using the population datasets for model initialization. To show this pipeline, we present several case studies utilizing Python and Mesa. These models range from commuting to disease spread and vaccination uptake. If this sounds of interest, below we provide the abstract to the paper along with some of the key figures including our pipeline and example applications. At the bottom of the page we provide the full reference and a link to the paper which has links to the models and data.
Abstract: Over the last two decades, there has been a growth in the applications of geographically-explicit agent-based models. One thing such models have in common is the creation of synthetic populations to initialize the artificial worlds in which the agents inhabit. One challenge such models face is that it is often difficult to create reusable geographically-explicit synthetic populations with social networks. In this paper, we introduce a Python based method that generates a reusable geographically-explicit synthetic population dataset along with its social networks. In addition, we present a pipeline for using the population datasets for model initialization. With this pipeline, multiple spatial and temporal scales of geographically-explicit agent-based models are presented focusing on Western New York. Such models not only demonstrate the utility of our synthetic population on commuting patterns but also how social networks can impact the simulation of disease spread and vaccination uptake. By doing so, this pipeline could benefit any modeler wishing to reuse synthetic populations with realistic geographic locations and social networks. 
Keywords: Agent-Based Model, Geographically-Explicit Agent-Based Models, Synthetic Population, Python, Mesa.
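As a rough illustration of the pipeline idea, the sketch below initializes agents from a synthetic population table plus a social network edge list and then runs a simple contagion over that network. It uses plain Python rather than Mesa, and the field names (`id`, `age`, `home`), the toy records and the transmission rule are all hypothetical; the paper's datasets and models may be structured quite differently.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Person:
    pid: int
    age: int
    home: str
    infected: bool = False
    friends: list = field(default_factory=list)


def build_agents(population, network):
    """Create agents from population records and attach social ties."""
    agents = {row["id"]: Person(row["id"], row["age"], row["home"]) for row in population}
    for a, b in network:
        agents[a].friends.append(b)
        agents[b].friends.append(a)
    return agents


def step(agents, beta, rng):
    """One tick: each infected agent infects each susceptible friend with probability beta.

    Infections are applied after the loop, so spread advances one hop per tick.
    Returns the number of newly infected agents.
    """
    newly = set()
    for person in agents.values():
        if person.infected:
            for fid in person.friends:
                if not agents[fid].infected and rng.random() < beta:
                    newly.add(fid)
    for fid in newly:
        agents[fid].infected = True
    return len(newly)


if __name__ == "__main__":
    # Hypothetical population records and social network edges.
    population = [
        {"id": 0, "age": 34, "home": "tract_A"},
        {"id": 1, "age": 61, "home": "tract_A"},
        {"id": 2, "age": 19, "home": "tract_B"},
        {"id": 3, "age": 45, "home": "tract_B"},
    ]
    network = [(0, 1), (1, 2), (2, 3)]

    agents = build_agents(population, network)
    agents[0].infected = True  # seed a single infection
    rng = random.Random(7)
    for tick in range(5):
        new_cases = step(agents, beta=0.5, rng=rng)
        total = sum(p.infected for p in agents.values())
        print(f"tick {tick}: {new_cases} new, {total} infected in total")
```

Keeping the population table and the edge list as separate inputs is what makes the population reusable: the same two files could initialize a commuting model, a disease model or an opinion model.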
Pipeline of Utilizing Synthetic Population Resulting Datasets in Agent-Based Models.

Large Scale Disease Spread Model Structure.

Disease Dynamics for Two Diseases.

Vaccination Opinion Dynamic Model.

Simulation Vaccination Rate vs. Real Vaccination Records: (A) All Population; (B) Different Age Groups of Population.

Full Reference:

Jiang, N., Crooks, A.T., Yin, F. and Wang, B. (2023), Geographically-Explicit Synthetic Populations for Agent-based Models: A Gallery of Applications, Proceedings of the 2023 Conference of The Computational Social Science Society of the Americas, Santa Fe, NM. (pdf)

Monday, October 23, 2023

Evaluating the incentive for soil organic carbon sequestration from carinata production

Over the years we have developed several agent-based models that have explored various aspects of farming, ranging from farmers selling their land for development to that of water reuse. Keeping with this theme, we have a new paper with Kazi Ullah and Gbadebo Oladosu in the "Journal of Environmental Management" entitled "Evaluating the incentive for soil organic carbon sequestration from carinata production in the Southeast United States". 
In the paper we developed an agent-based model to evaluate what incentives might be needed for farmers to sequester soil organic carbon (SOC) when adopting a new bioenergy crop, namely carinata. We simulated two carinata management scenarios: business as usual and climate-smart (no-till). The model finds that SOC sequestration incentives reduce the seed price needed to reach maximum adoption rates. Incentives also lead to higher adoption rates, SOC sequestration, and profitability with no-till farming.
If this sounds of interest, below you can read the abstract to the paper, get a sense of the agent logic and see some of the results. At the bottom of the page, you can find the full reference and a link to the paper. The model (created in NetLogo) and the data needed to run it are available on Kazi's GitHub page: https://github.com/KaziMaselUllah/Incentive_SOC_Carinata.

Abstract: Soil organic carbon (SOC) can be increased by cultivating bioenergy crops to produce low-carbon fuels, improving soil quality and agricultural productivity. This study evaluates the incentives for farmers to sequester SOC by adopting a bioenergy crop, carinata. Two agricultural management scenarios – business as usual (BaU) and a climate-smart (no-till) practice – were simulated using an agent-based modeling approach to account for farmers’ carinata adoption rates within their context of traditional crop rotations, the associated profitability, influences of neighboring farmers, as well as their individual attitudes. Using the state of Georgia, US, as a case study, the results show that farmers allocated 1056 × 10³ acres (23.8%; 2.47 acres is equivalent to 1 ha) of farmlands by 2050 at a contract price of $6.5 per bushel of carinata seeds and with an incentive of $50 Mg⁻¹ CO2e SOC sequestered under the BaU scenario. In contrast, at the same contract price and SOC incentive rate, farmers allocated 1152 × 10³ acres (25.9%) of land under the no-till scenario, while the SOC sequestration was 483.83 × 10³ Mg CO2e, which is nearly four times the amount under the BaU scenario. Thus, this study demonstrated combinations of seed prices and SOC incentives that encourage farmers to adopt carinata with climate-smart practices to attain higher SOC sequestration benefits.

Keywords: Agent-based model, Bioenergy, Climate-smart agriculture, Soil organic carbon, Incentives, Sustainable aviation fuel.


Process, overview and scheduling of the model

An example simulation output of a model run (SOC incentive = $50 Mg⁻¹ CO2e, Carinata contract price = 6.5, Expanded diffusion, Low initial willingness scenario).

The total number of farmers who adopted carinata over the years for two farming scenarios at five levels of incentives for SOC sequestration and at the four price levels.

The mean land allocation area for four scenarios and their associated standard deviations (error bar).

Full Reference:  

Ullah, K.M., Oladosu, G.A. and Crooks, A.T. (2023), Evaluating the Incentive for Soil Organic Carbon Sequestration from Carinata Production in the Southeast United States, Journal of Environmental Management, 348: 119418. Available at https://doi.org/10.1016/j.jenvman.2023.119418 (pdf)

Wednesday, October 04, 2023

Leveraging newspapers to understand urban issues

In the past, this blog has explored several aspects of Detroit, such as how well it is covered with Volunteered Street View Imagery or how, through the use of agent-based models, one can explore issues with urban shrinkage. Keeping up with the theme of shrinkage and Detroit, but at the same time utilizing our growing interest in natural language processing (especially topic modeling), we (Na (Richard) Jiang, Hamdi Kavak, Wenjing Wang and myself) have a new paper entitled "Leveraging newspapers to understand urban issues: A longitudinal analysis of urban shrinkage in Detroit" published in Environment and Planning B.

In the paper, we take 6794 English news articles published by national and local press organizations (e.g., Forbes, The New York Times, Newsweek, The Detroit News) between 1975 and 2021, collected using the keywords “Detroit”, “shrink” and “decline.” These keywords were selected based on the characteristics of the study area (i.e., Detroit) and the phenomenon of urban shrinkage. With these data we then use BERTopic to detect and classify all collected news articles into topics. We chose BERTopic because it captures the semantic relationships among words by converting sentences and words to embeddings and automatically generates the topics, unlike other NLP topic modeling techniques (e.g., LDA). Our topic modeling results identify several insights with respect to Detroit's shrinkage. For example, we can detect the side effects of the 2007-2009 economic recession on Detroit's automobile industry, local employment status, and the housing market. If this sounds of interest and you want to find out more, below we provide the abstract, some figures from the paper including the methodology workflow and an example of the resulting topics over time. Finally, at the bottom of the page you can see the full reference and a link to the paper itself.


Abstract: Today we are awash with data, especially when it comes to studying cities from a diverse data ecosystem ranging from demographic to remotely sensed imagery and social media. This has led to the growth of urban analytics providing new ways to conduct quantitative research within cities. One area that has seen significant growth is using natural language processing techniques on text data from social media to explore various issues relating to urban morphology. However, we would argue that social media only provides limited insights when dealing with longer-term urban phenomena, such as the growth and shrinkage of cities. This relates to the fact that social media is a relatively recent phenomenon compared to longer-term urban problems that take decades to emerge. Concerning longer-term coverage, newspapers, which are increasingly becoming digitized, provide the possibility to overcome the limitations of social media and provide insights over a timeframe that social media does not. To demonstrate the utility of newspapers for urban analytics and to study longer-term urban issues, we utilize an advanced topic modeling technique (i.e., BERTopic) on a large number of newspaper articles from 1975 to 2021 to explore urban shrinkage in Detroit. Our topic modeling results reveal insights related to how Detroit shrinks. For example, side effects of 2007 to 2009 economic recessions on Detroit’s automobile industry, local employment status, and the housing market.

Key Words: Natural Language Processing, Topic Modeling, Newspapers, Urban Shrinkage, Urban Analytics.


Vacancy status change from 1970 to 2010 for city of Detroit and surrounding area.
Topic modeling work flow.
Topics over time (a) urban, (b) population, (c) shrinkage, (d) economy, (e) job, (f) house.

Full Reference:

Jiang, N., Crooks, A.T., Kavak, H. and Wang, W. (2023), Leveraging Newspapers to Understand Urban Issues: A Longitudinal Analysis of Urban Shrinkage in Detroit, Environment and Planning B. Available at https://doi.org/10.1177/23998083231204695. (pdf)