In the past we have written about using Multimodal Large Language Models (MLLMs) like ChatGPT for coding models and for analyzing images. One advantage we see in MLLMs is that, unlike traditional approaches that require extensive expertise in computational analysis (e.g., computer vision and deep learning), they leverage pre-trained capabilities that simplify the analytical process. This accessibility enables a larger group of researchers to incorporate MLLMs into their analyses when studying the form and function of cities at scale. To this end, we (Qingqing Chen, Linda See and myself) have a new book chapter entitled "Evaluating the Feasibility of ChatGPT for Mapping Building Attributes" published in the open access book "Geography According to Foundation Models", edited by Krzysztof Janowicz, Rui Zhu, Gengchen Mai, Song Gao, Yingjie Hu, Zhangyu Wang, Ling Cai and Lauren Bennett.
In this chapter we evaluate the potential of MLLMs, in our case ChatGPT, to extract building attributes (e.g., age, use and height) from Mapillary street view images. We find that ChatGPT was good at extracting some information and less good at extracting other information. For example, it correctly identified 87% of the residential buildings. We also discuss ways to improve the results (e.g., using higher quality street view images, altering and refining the prompts). If you wish to find out more about our findings, we encourage you to read the chapter. To give you a better sense of this research, below we provide the abstract of the chapter, our case study area along with our workflow, and a sample of the results. Finally, at the bottom of the post, you can find the full reference to the chapter along with a link to it.
Abstract:
With increasing rates of urbanization, many challenges are emerging regarding urban sustainability such as the energy usage of buildings. Coinciding with this is the growing attention of urban climate models for energy demand estimation and climate adaptation strategies. However, the applicability of these models is constrained by the lack of detailed urban surface information. Therefore, creating comprehensive datasets that capture urban surface information at a granular scale is crucial for responding to our rapidly urbanizing world. Recent advancements in Multimodal Large Language Models (MLLMs) have opened new opportunities in urban studies, offering accessible methods for information extraction. In this chapter we explore the feasibility of ChatGPT to extract building attributes from images. Taking New York City as a case study, we collect building images from Street View Imagery and process them through ChatGPT by posing specific questions to extract building attributes (e.g., height, functions, age). These attributes are then compared with authoritative data. The proposed method helps address the current dearth of fine-grained surface data on urban issues, therefore enhancing the accuracy and utility of urban climate models. Overall, this study demonstrates the practical applications of ChatGPT in geographic knowledge extraction, advancing the understanding of MLLMs in geographic contexts, and more broadly to the discourse on Artificial Intelligence (AI) in urban modeling and climate science.
[Figure: The spatial distribution of Mapillary images within the study area, shown on the left, and the distribution of images by variance showing increasing image quality on the right.]
[Figure: An overview of the research workflow.]
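To make the question-posing step of the workflow more concrete, here is a minimal sketch of how a prompt could be assembled and how a reply could be checked before comparing it with authoritative data. It is not the chapter's actual prompt or code: the answer options in `OPTIONS`, the prompt wording, and the helper names `build_prompt` and `parse_reply` are all assumptions made for illustration, and the step of sending the image and prompt to ChatGPT is deliberately left out.

```python
import json

# Hypothetical answer options, chosen for illustration only;
# the chapter's actual categories may differ.
OPTIONS = {
    "use": ["residential", "commercial", "mixed", "other"],
    "height": ["low-rise", "mid-rise", "high-rise"],
}


def build_prompt(options):
    """Compose one question that asks for a JSON answer per attribute."""
    lines = [
        "Look at the building in the centre of this street view image.",
        "Answer with a JSON object using exactly these keys and options:",
    ]
    for attribute, choices in options.items():
        lines.append(f'- "{attribute}": one of {", ".join(choices)}')
    lines.append('Use "unknown" if the image does not show enough to decide.')
    return "\n".join(lines)


def parse_reply(reply, options):
    """Validate a reply; anything outside the options becomes 'unknown'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for attribute, choices in options.items():
        value = str(data.get(attribute, "unknown")).strip().lower()
        result[attribute] = value if value in choices else "unknown"
    return result


if __name__ == "__main__":
    print(build_prompt(OPTIONS))
    # A made-up reply, standing in for what a model might return.
    print(parse_reply('{"use": "Residential", "height": "tower"}', OPTIONS))
```

Restricting answers to a fixed set of options is one way to make the replies directly comparable with categorical ground truth; refining the prompt wording is also one of the improvements we discuss in the chapter.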
[Figure: Comparison of the building period of construction from the ground truth data and the classifications from ChatGPT. (a) A confusion matrix which details the distribution of buildings classified within each period by ChatGPT compared to the ground truth data; (b) A chord diagram illustrating the patterns of agreement and confusion among the categories.]
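For readers less familiar with confusion matrices, the short sketch below shows how one can be tabulated from paired labels, along with the share of each ground truth class that was classified correctly (the kind of figure behind a statement like "87% of the residential buildings"). The period labels and the six example buildings are made up for illustration and are not the chapter's categories or data.

```python
from collections import Counter


def confusion_matrix(truth, predicted, labels):
    """Rows are ground truth classes, columns are predicted classes."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]


def recall_per_class(matrix, labels):
    """Share of each ground truth class that was classified correctly."""
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / row_total if row_total else None
    return recalls


if __name__ == "__main__":
    # Made-up labels and toy data, for illustration only.
    labels = ["pre-1940", "1940-1979", "1980+"]
    truth = ["pre-1940", "pre-1940", "1940-1979", "1980+", "1980+", "1940-1979"]
    predicted = ["pre-1940", "1940-1979", "1940-1979", "1980+", "1940-1979", "1940-1979"]

    matrix = confusion_matrix(truth, predicted, labels)
    for label, row in zip(labels, matrix):
        print(label, row)
    print(recall_per_class(matrix, labels))
```

Off-diagonal cells show which periods get mistaken for which, which is the same information the chord diagram presents visually.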
Full Reference:
Chen, Q., See, L. and Crooks, A.T. (2026), Evaluating the Feasibility of ChatGPT for Mapping Building Attributes, in Janowicz, K., Zhu, R., Mai, G., Gao, S., Hu, Y., Wang, Z., Cai, L., and Bennett, L. (eds), Geography According to Foundation Models, IOS Press, Amsterdam, The Netherlands, pp. 107-120. (pdf)





