Monday, June 01, 2026

Evaluating the Feasibility of ChatGPT for Mapping Building Attributes

In the past we have written about using Multimodal Large Language Models (MLLMs) like ChatGPT for coding models and for analyzing images. One advantage we see for MLLMs is that, unlike traditional approaches that require extensive expertise in computational analysis, such as computer vision and deep learning, MLLMs leverage pre-trained capabilities that simplify the analytical process. This accessibility enables a larger group of researchers to incorporate MLLMs in their analyses when it comes to studying the form and function of cities at scale. To this end, we (Qingqing Chen, Linda See and myself) have a new book chapter entitled "Evaluating the Feasibility of ChatGPT for Mapping Building Attributes" published in the open access book "Geography According to Foundation Models", edited by Krzysztof Janowicz, Rui Zhu, Gengchen Mai, Song Gao, Yingjie Hu, Zhangyu Wang, Ling Cai and Lauren Bennett.

In this chapter we evaluate the potential of MLLMs, in our case ChatGPT, to extract building attributes (e.g., age, use and height) from Mapillary street view images. We find that ChatGPT extracted some attributes well and others less so. For example, it correctly identified 87% of the residential buildings. We also discuss ways to improve the results (e.g., using higher quality street view images, altering and refining the prompts). If you wish to find out more about our findings, we encourage you to read the chapter. To give you a better sense of this research, below we provide the abstract of the chapter, our case study area along with our workflow, and a sample of the results. Finally, at the bottom of the post, you can find the full reference to the chapter along with a link to it.
 
Abstract:
With increasing rates of urbanization, many challenges are emerging regarding urban sustainability such as the energy usage of buildings. Coinciding with this is the growing attention of urban climate models for energy demand estimation and climate adaptation strategies. However, the applicability of these models is constrained by the lack of detailed urban surface information. Therefore, creating comprehensive datasets that capture urban surface information at a granular scale is crucial for responding to our rapidly urbanizing world. Recent advancements in Multimodal Large Language Models (MLLMs) have opened new opportunities in urban studies, offering accessible methods for information extraction. In this chapter we explore the feasibility of ChatGPT to extract building attributes from images. Taking New York City as a case study, we collect building images from Street View Imagery and process them through ChatGPT by posing specific questions to extract building attributes (e.g., height, functions, age). These attributes are then compared with authoritative data. The proposed method helps address the current dearth of fine-grained surface data on urban issues, therefore enhancing the accuracy and utility of urban climate models. Overall, this study demonstrates the practical applications of ChatGPT in geographic knowledge extraction, advancing the understanding of MLLMs in geographic contexts, and more broadly to the discourse on Artificial Intelligence (AI) in urban modeling and climate science.
The spatial distribution of Mapillary images within the study area, shown on the left, and the distribution of images by variance showing increasing image quality on the right.
An overview of the research workflow.
Comparison of the building period of construction from the ground truth data and the classifications from ChatGPT. (a) A confusion matrix which details the distribution of buildings classified within each period by ChatGPT compared to the ground truth data; (b) A chord diagram illustrating the patterns of agreement and confusion among the categories.
Comparison of building type classifications. (a) A confusion matrix detailing the distribution of ChatGPT’s classifications against the hand labels from experts; (b) A chord diagram illustrating the proportion of classifications for each building type as labeled by experts compared to ChatGPT’s classifications.
A comparison of building heights from ChatGPT and the NYC Open Data. (a) The correlation of height between the ground truth and ChatGPT; (b) The distribution of ground truth heights and the predicted heights; (c) The difference in the heights.
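
For readers less familiar with confusion matrices, a figure such as the share of residential buildings identified correctly can be read as a per-class recall: the number of buildings in a ground truth class that received the same label from the model, divided by the total number of buildings in that class. The snippet below is only a minimal sketch of that calculation in Python. The function names and the handful of labels are made up for illustration; they are not the code or the data used in the chapter.

```python
from collections import Counter


def confusion_counts(truth, predicted):
    """Count how often each (ground truth label, predicted label) pair occurs."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    return Counter(zip(truth, predicted))


def per_class_recall(truth, predicted):
    """For each ground truth class, the share of its items labeled correctly."""
    counts = confusion_counts(truth, predicted)
    totals = Counter(truth)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Hypothetical labels for illustration only; not data from the chapter.
truth = ["residential", "residential", "commercial", "mixed", "residential"]
predicted = ["residential", "commercial", "commercial", "residential", "residential"]

recall = per_class_recall(truth, predicted)
for label, value in recall.items():
    print(f"{label}: {value:.2f}")
```

The off-diagonal entries returned by `confusion_counts` (for example, a residential building labeled as commercial) are the kind of disagreement that a confusion matrix or chord diagram makes visible.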

Full Reference:
Chen, Q., See, L. and Crooks, A.T. (2026), Evaluating the Feasibility of ChatGPT for Mapping Building Attributes, in Janowicz, K., Zhu, R., Mai, G., Gao, S., Hu, Y., Wang, Z., Cai, L., and Bennett, L. (eds), Geography According to Foundation Models, IOS Press, Amsterdam, The Netherlands, pp. 107-120. (pdf)