Thursday, November 06, 2025

HD-GEN: A Software System for Large-Scale Human Mobility Data Generation Based on Patterns of Life


 
Human mobility datasets are essential for investigating human behavior, mobility patterns, and traffic dynamics. In the past we have written about how one can use agent-based models to generate patterns-of-life trajectory datasets. Building on this work, at the ACM SIGSPATIAL 2025 conference we (Hossein Amiri, Richard Yang, Shiyang Ruan, Joon-Seok Kim, Hamdi Kavak, Andrew Crooks, Dieter Pfoser, Carola Wenk and Andreas Züfle) had a paper entitled "HD-GEN: A Software System for Large-Scale Human Mobility Data Generation Based on Patterns of Life".

In this paper, we extend our previous work by introducing a software system that provides a new suite of tools built on top of the Patterns of Life simulation framework. Specifically, this work consolidates our contributions into a unified data generation pipeline that includes:

  1. additional discussion of the motivation and applications of large-scale simulated trajectory data, 
  2. detailed instructions on running the simulation and generating datasets, 
  3. extended analysis of the shared dataset, and 
  4. an integrated GitHub repository.

The proposed system enables large-scale synthetic dataset generation, either by statistically replicating real-world data or by creating datasets with user-defined properties. If this sounds of interest, below you can read the abstract to the paper and find the poster that accompanies it. We have also provided detailed instructions on how to reproduce the generated datasets, and made the code and data available at https://github.com/onspatial/large-scale-dataset-generator.

Abstract

Understanding individual human mobility is critical for a wide range of applications. Real-world trajectory datasets provide valuable insights into actual movement behaviors but are often constrained by data sparsity and participant bias. Synthetic data, by contrast, offer scalability and flexibility but frequently lack realism. To address this gap, we introduce a comprehensive software pipeline for generating, calibrating, and processing large-scale human mobility datasets that integrate the realism of empirical data with the control and extensibility of Patterns-of-Life simulations. Our system consists of three integrated components. First, a genetic algorithm–based calibration module fine-tunes simulation parameters to align with real-world mobility characteristics, such as daily trip counts and radius of gyration, enabling realistic behavioral modeling. Second, a data generation engine constructs geographically grounded simulations using OpenStreetMap data to produce diverse mobility logs. Third, a data processing suite transforms raw simulation logs into structured formats suitable for downstream applications, including model training and benchmarking. 

Keywords: GeoLife, Patterns of Life, Simulation, Realistic Trajectory Datasets
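
To make the calibration targets mentioned in the abstract a little more concrete, here is a minimal Python sketch of how a radius of gyration could be computed for one agent's trajectory, together with a simple error score that a calibration loop might try to minimize. This is not the code from the HD-GEN repository: the function names (`haversine_km`, `radius_of_gyration_km`, `calibration_error`), the use of the mean latitude/longitude as the center of mass, and the relative-error score are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Root-mean-square distance of visited points from their center of mass.

    `points` is a list of (lat, lon) tuples in degrees. The center of mass is
    approximated by the mean latitude and longitude (an assumption that is
    reasonable for city-scale trajectories away from the poles and antimeridian).
    """
    if not points:
        raise ValueError("trajectory must contain at least one point")
    n = len(points)
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    mean_sq = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points) / n
    return math.sqrt(mean_sq)


def calibration_error(simulated, target):
    """Hypothetical score: sum of relative errors over the target statistics.

    Both arguments are dicts such as {"trips_per_day": 3.2, "radius_of_gyration_km": 5.0};
    target values must be nonzero. Lower is better.
    """
    return sum(abs(simulated[key] - target[key]) / abs(target[key]) for key in target)
```

For example, `radius_of_gyration_km(agent_points)` could be averaged over all simulated agents and passed, along with the average daily trip count, to `calibration_error` as the score for one candidate parameter set.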

Dataset creation phases with HD-GEN software.

Full Reference: 

Amiri, H., Yang, R., Ruan, S., Kim, J-S., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Züfle, A. (2025). HD-GEN: A Software System for Large-Scale Human Mobility Data Generation Based on Patterns of Life. In The 33rd ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL ’25), November 3–6, 2025, Minneapolis, MN. (pdf) (poster)
