In the past we have written a number of posts on synthetic populations. However, one thing we have not done is compare the various techniques that can be used to create them. This has now changed with a new paper entitled "Quantitative Comparison of Population Synthesis Techniques", which was recently presented at the 2025 Winter Simulation Conference.
In this paper, we (David Han, Samiul Islam, Taylor Anderson, Hamdi Kavak and myself) investigate five synthetic population generation techniques (i.e., Iterative Proportional Fitting, Conditional Probabilities, Simple Random Sampling, Hill Climbing and Simulated Annealing) in parallel to synthesize population data for different North American settings (e.g., Fairfax County, VA, USA and Metro Vancouver, BC, Canada). Our findings suggest that the iterative proportional fitting and conditional probabilities techniques perform best. At the same time, they suggest that it is important to consider the basis for choosing certain methods over others when generating synthetic populations for a given geographic domain.
If this sounds of interest, below you can read the abstract to the paper and see some of the figures and tables that support our discussion, while at the bottom of the post you can find the full reference and a link to the paper. Moreover, in an effort to allow for reproducible science, all code and data are available to interested readers in our GitHub repository located at https://github.com/kavak-lab/synthetic-pop-comparison.
Abstract
Synthetic populations serve as the building blocks for predictive models in many domains, including transportation, epidemiology, and public policy. Therefore, using realistic synthetic populations is essential in these domains. Given the wide range of available techniques, determining which methods are most effective can be challenging. In this study, we investigate five synthetic population generation techniques in parallel to synthesize population data for various regions in North America. Our findings indicate that iterative proportional fitting (IPF) and conditional probabilities techniques perform best in different regions, geographic scales, and with increased attributes. Furthermore, IPF has lower implementation complexity, making it an ideal technique for various population synthesis tasks. We documented the evaluation process and shared our source code to enable further research on advancing the field of modeling and simulation.
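To give a flavor of why IPF is often considered simple to implement, here is a minimal sketch of the two-dimensional case in plain Python. It is not the code from the paper or its repository: the function name `ipf`, the seed table, and the marginal totals are all hypothetical and chosen purely for illustration.

```python
def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Fit a 2-D seed table to row and column totals.

    Illustrative sketch only; names and defaults are assumptions,
    not the paper's implementation.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so that its sum matches the row target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Scale each column so that its sum matches the column target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns now match exactly, so convergence is judged on the rows.
        max_row_error = max(
            abs(sum(table[i]) - target) for i, target in enumerate(row_targets)
        )
        if max_row_error < tol:
            break
    return table


# Hypothetical example: age group (rows) by household size (columns).
seed = [[10, 20, 5], [15, 10, 10]]   # cross-tabulation from a small sample
row_targets = [400, 600]             # made-up totals per age group
col_targets = [300, 500, 200]        # made-up totals per household size
fitted = ipf(seed, row_targets, col_targets)
```

The fitted table keeps the association structure of the seed while reproducing both sets of totals. In practice the resulting cell weights would then be turned into whole individuals or households, a step this sketch leaves out.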
A conceptual depiction of the IPF process for population synthesis.
Our four-step process used in this study.
Average R² values by geographic level and method (standard deviations in italics).
% Total absolute error (% TAE) comparison by attribute for Fairfax County.
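For readers less familiar with the two measures named in these captions, the sketch below shows one common way to compute them for a single attribute. The exact definitions used in the paper may differ: here % TAE is assumed to be the sum of absolute differences divided by the observed total, and R² is assumed to be the coefficient of determination of synthetic against observed counts. The function names and the counts are hypothetical.

```python
def percent_tae(observed, synthetic):
    """Total absolute error as a percentage of the observed total (assumed definition)."""
    tae = sum(abs(o - s) for o, s in zip(observed, synthetic))
    return 100.0 * tae / sum(observed)


def r_squared(observed, synthetic):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, synthetic))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up counts for the categories of one attribute.
observed = [120, 340, 280, 260]
synthetic = [118, 345, 276, 261]

tae_pct = percent_tae(observed, synthetic)
r2 = r_squared(observed, synthetic)
```

Lower % TAE and R² closer to 1 both indicate that the synthetic counts sit closer to the observed ones.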
Han, D., Islam, S., Anderson, T., Crooks, A.T. and Kavak, H. (2025), Quantitative Comparison of Population Synthesis Techniques, in Azar, E., Djanatliev, A., Harper, A., Kogler, C., Ramamohan, V., Anagnostou, A. and Taylor, S.J.E. (eds.), Proceedings of the 2025 Winter Simulation Conference, Seattle, WA, ACM. (pdf)




