Thursday, November 07, 2024

A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the US

In numerous posts, we have discussed synthetic populations and their use in agent-based modeling, but many other modeling styles also utilize synthetic populations. In our own work we often spend significant amounts of time creating such populations, especially those grounded in data, because of the time needed to collect and preprocess the data and to generate the final synthetic population. To alleviate this, we (Na (Richard) Jiang, Fuzhen Yin, Boyu Wang and myself) have a new paper published in Scientific Data, entitled "A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States." Our aim in this paper is to build and provide a geographically explicit synthetic population, along with its social networks, using open data, including that from the latest 2020 U.S. Census, which can be used in a variety of geo-simulation models.

Summary of the Resulting Datasets.

Specifically, in the paper we outline how we created a synthetic population of 330,526,186 individuals representing America's 50 states and Washington D.C. Each individual has a set of geographical locations that represent their home, work or school addresses. Additionally, these individuals are not isolated; they are embedded in a larger social setting based on their household, working and studying relationships (i.e., social networks).
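To make the idea of individuals embedded in household and work networks more concrete, here is a minimal sketch in Python. The field names (`id`, `household`, `workplace`) and the toy records are purely hypothetical and are not the schema of the released datasets. Likewise, fully connecting everyone who shares a group is a simplification chosen for illustration, not necessarily how the networks in the paper were generated.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: field names are illustrative assumptions,
# not the actual columns of the published datasets.
people = [
    {"id": "p1", "household": "h1", "workplace": "w1"},
    {"id": "p2", "household": "h1", "workplace": "w2"},
    {"id": "p3", "household": "h1", "workplace": None},
    {"id": "p4", "household": "h2", "workplace": "w1"},
]


def group_edges(records, key):
    """Return undirected ties linking every pair of people who share `key`."""
    groups = defaultdict(list)
    for person in records:
        if person[key] is not None:  # skip people without this affiliation
            groups[person[key]].append(person["id"])
    edges = set()
    for members in groups.values():
        # Sorting gives each undirected tie a single canonical (a, b) form.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


household_edges = group_edges(people, "household")
work_edges = group_edges(people, "workplace")
```

In this toy example the three members of household `h1` are tied to one another, while `p1` and `p4` are linked only through their shared workplace, so each person can appear in several networks at once.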

The work (e.g., data collection, data preprocessing and generation processes) was coded using Python 3.12 and all the scripts used are available at: https://github.com/njiang8/geo-synthetic-pop-usa while the resulting datasets (85 GB uncompressed) are available at OSF: https://osf.io/fpnc2/.  

To give you a sense of the paper, below we provide its abstract, along with some results and our efforts to validate the synthetic population. The full reference and a link to the paper can be found at the bottom of the post.

Abstract:

Within the geo-simulation research domain, micro-simulation and agent-based modeling often require the creation of synthetic populations. Creating such data is a time-consuming task and often lacks social networks, which are crucial for studying human interactions (e.g., disease spread, disaster response) while at the same time impacting decision-making. We address these challenges by introducing a Python based method that uses the open data including that from 2020 U.S. Census data to generate a large-scale realistic geographically explicit synthetic population for America's 50 states and Washington D.C. along with the stylized social networks (e.g., home, work and schools). The resulting synthetic population can be utilized within various geo-simulation approaches (e.g., agent-based modeling), exploring the emergence of complex phenomena through human interactions and further fostering the study of urban digital twins.

Keywords: Synthetic Population, U.S. Census 2020, Agent-Based Modeling, Geo-Simulation, Social Networks.

Data Generation Workflow and Resulting Datasets.

A Sample of the Social Networks for One Household and their Home, Work and Educational Social Networks from the Generated Data.

Sample of Generated Social Networks Extracted from the City of Buffalo, New York: (a) Household; (b) Work; (c) School; (d) Daycare.

Validation of the Synthetic Population at Different Levels: (a) Population under 18 Different Age Groups; (b) Household under Different Household Types.
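As a rough illustration of what validating a synthetic population against reference counts can look like, the sketch below computes the percentage difference per age group. The age groups and counts are made-up toy numbers, not values from the paper or the Census, and the function name `percent_error` is our own for this example.

```python
def percent_error(synthetic, reference):
    """Percentage difference of synthetic counts relative to reference counts."""
    return {
        group: 100.0 * (synthetic[group] - reference[group]) / reference[group]
        for group in reference
    }


# Toy numbers for illustration only (not taken from the paper or the Census).
reference_counts = {"0-4": 1000, "5-9": 1200, "10-14": 800}
synthetic_counts = {"0-4": 990, "5-9": 1212, "10-14": 800}

errors = percent_error(synthetic_counts, reference_counts)
for group, err in errors.items():
    print(f"{group}: {err:+.1f}%")
```

The same comparison could be repeated for other attributes, such as household types, by swapping in the corresponding counts.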

Full Reference:

Jiang, N., Yin, F., Wang, B. and Crooks, A.T. (2024), A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States, Scientific Data, 11, 1204. https://doi.org/10.1038/s41597-024-03970-1 (pdf)




Friday, November 01, 2024

Pattern of Life Human Mobility Simulation (Demo)

While in the past we have written about how we can use agent-based models to capture basic patterns of life, and even developed simulations, until now we have never really demonstrated how we go about this. However, at the SIGSPATIAL 2024 conference we (Hossein Amiri, Will Kohn, Shiyang Ruan, Joon-Seok Kim, Hamdi Kavak, Dieter Pfoser, Carola Wenk, Andreas Zufle and myself) have a demonstration paper entitled "The Pattern of Life Human Mobility Simulation" in which we show:

  1. How to run the Patterns of Life Simulation with the graphical user interface (GUI) to visually explore the mobility patterns of a region.
  2. How to run the Patterns of Life Simulation headless (without GUI) for large-scale data generation.
  3. How to adapt the simulation to any region in the world using OpenStreetMap data.
  4. How recent scalability improvements allow us to simulate hundreds of thousands of agents.

If this sounds of interest, below we show the GUI of the model, along with the steps to generate a trajectory dataset or a new map for the simulation. At the bottom of the post you can find the paper's full reference and a link to download it, while at https://github.com/onspatial/generate-mobility-dataset you can find the source code for the enhanced simulation and data-processing tools for you to experiment with.

Abstract: 

We demonstrate the Patterns of Life Simulation to create realistic simulations of human mobility in a city. This simulation has recently been used to generate massive amounts of trajectory and check-in data. Our demonstration focuses on using the simulation twofold: (1) using the graphical user interface (GUI), and (2) running the simulation headless by disabling the GUI for faster data generation. We further demonstrate how the Patterns of Life simulation can be used to simulate any region on Earth by using publicly available data from OpenStreetMap. Finally, we also demonstrate recent improvements to the scalability of the simulation that allow simulating up to 100,000 individual agents for years of simulation time. During our demonstration, as well as offline using our guides on GitHub, participants will learn: (1) The theories of human behavior driving the Patterns of Life simulation, (2) how to simulate to generate massive amounts of synthetic yet realistic trajectory data, (3) running the simulation for a region of interest chosen by participants using OSM data, (4) learn the scalability of the simulation and understand the properties of generated data, and (5) manage thousands of parallel simulation instances running concurrently.

Keywords: Patterns of Life, Simulation, Trajectory, Dataset, Customization

A screenshot of the graphical user interface of the Patterns of Life Simulation. The GUI shows the map and the movements of agents on the left side and the social network of agents and their statistical properties on the right side. 

Steps to generate a trajectory dataset.
Steps to generate a new map for the simulation.

Full reference:

Amiri, H., Kohn, W., Ruan, S., Kim, J-S., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Zufle, A. (2024) The Pattern of Life Human Mobility Simulation (Demo Paper), ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Atlanta, GA. (pdf)