Thursday, February 06, 2025

From print to perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews

In previous posts we have noted how one can explore urban issues through newspapers, while at the same time we have used social media to explore trends in vaccinations. In a recently published paper in PLOS Digital Health entitled "From print to perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews" with Qingqing Chen, Adam Sullivan, Jennifer Surtees, Laurene Tumiel-Berhalter and myself, we thought we would explore how COVID-19 was reported in newspapers and how this varied from interviews. 

The rationale behind this was that the COVID-19 pandemic has led to diverse experiences influenced by public health measures like lockdowns and social distancing. To explore these dynamics, we introduce a novel “big-thick” data approach that integrates extensive U.S. newspaper data with detailed interviews. By employing natural language processing (NLP) and geoparsing techniques, we identify key topics related to the pandemic and vaccinations in both newspapers and personal narratives from interviews, and compare the (spatial) convergences and divergences between them.
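To make the idea of convergence and divergence concrete, here is a minimal sketch in Python. It is not the pipeline from the paper (that code is on figshare); it assumes each document has already been assigned a single topic label, and the labels, function names and example data are all hypothetical.

```python
from collections import Counter


def topic_shares(labels):
    """Return each topic's share of the documents in one source."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


def compare_sources(newspaper_labels, interview_labels):
    """Split topics into those found in both sources and those found in only one."""
    news = topic_shares(newspaper_labels)
    talk = topic_shares(interview_labels)
    shared = sorted(set(news) & set(talk))
    return {
        # For shared topics, keep the (newspaper share, interview share) pair.
        "convergent": {t: (news[t], talk[t]) for t in shared},
        "newspaper_only": sorted(set(news) - set(talk)),
        "interview_only": sorted(set(talk) - set(news)),
    }


# Hypothetical labels, one per document; these are NOT data from the paper.
newspaper_labels = ["policy", "economy", "daily life", "policy"]
interview_labels = ["daily life", "news reliability", "daily life", "emotions"]

result = compare_sources(newspaper_labels, interview_labels)
print(result)
```

In this toy example "daily life" shows up in both sources (a convergence), while the remaining topics appear in only one source (divergences). A real analysis would derive the topic labels from a topic model rather than assign them by hand.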

We found that both sources converge to highlight the profound impacts of the pandemic on daily life. However, newspapers provide a macro-level perspective, predominantly covering policy, public health efforts and economics, while interviews reveal the nuanced impacts at the micro-level, focusing on personal experiences, emotions and concerns. An intriguing finding is the pronounced concern, voiced in the interviews, about the reliability of news information. By showcasing both convergences and divergences in identified topics, our study enhances the understanding of key issues that are both disseminated to and resonate with the public, contributing to the development of more effective communication strategies for future public health crises.

If this sounds of interest, below you can read the abstract to the paper and see some of the figures, which include our workflow and some of the results. At the bottom of the post you can see the full reference and a link to the actual paper. The code we used for our analysis is available at https://figshare.com/s/339b1c0d059c189dd6a4?file=44583661.

Abstract:

In the face of the unprecedented COVID-19 pandemic, various government-led initiatives and individual actions (e.g., lockdowns, social distancing, and masking) have resulted in diverse pandemic experiences. This study aims to explore these varied experiences to inform more proactive responses for future public health crises. Employing a novel “big-thick” data approach, we analyze and compare key pandemic-related topics that have been disseminated to the public through newspapers with those collected from the public via interviews. Specifically, we utilized 82,533 U.S. newspaper articles from January 2020 to December 2021 and supplemented this “big” dataset with “thick” data from interviews and focus groups for topic modeling. Identified key topics were contextualized, compared and visualized at different scales to reveal areas of convergence and divergence. We found seven key topics from the “big” newspaper dataset, providing a macro-level view that covers public health, policies and economics. Conversely, three divergent topics were derived from the “thick” interview data, offering a micro-level view that focuses more on individuals’ experiences, emotions and concerns. A notable finding is the public’s concern about the reliability of news information, suggesting the need for further investigation on the impacts of mass media in shaping the public’s perception and behavior. Overall, by exploring the convergence and divergence in identified topics, our study offers new insights into the complex impacts of the pandemic and enhances our understanding of key issues both disseminated to and resonating with the public, paving the way for further health communication and policy-making.
An overview of the research workflow.

The monthly distribution of collected articles in the United States from January 2020 to December 2021.

An example of identified entities labeled with predefined entity types.

The spatial distribution of newspaper articles by different scales.


The spatial distribution of identified newspaper topics across different regions in New York State.

Ordered rank of identified topics by percentage from interviews.

Full reference:
Chen, Q., Crooks, A.T., Sullivan, A.J., Surtees, J.A. and Tumiel-Berhalter, L. (2025). From Print to Perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews, PLOS Digital Health. Available at https://doi.org/10.1371/journal.pdig.0000736. (pdf)

Friday, January 31, 2025

New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI

In previous posts, we have written about how large language models (LLMs) like ChatGPT can be used in various urban analytical applications. We have kept exploring this potential, especially with respect to citizen science applications. To this end, we have just published a new paper in iScience entitled "New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI". In the paper, led by Linda See, we discuss how multi-modal LLMs (MLLMs), which are like LLMs but can take different forms of input (e.g., text, images, video) and output multi-modal information (e.g., take an image and output a description), could be leveraged to enhance citizen science land cover/land use mapping campaigns. If this sounds of interest, below you can read the abstract to the paper and see some of the figures we use to build our argument, while at the bottom of the post you can see the full reference and a link to the actual paper.
Abstract: 
As more satellite imagery has become openly available, efforts in mapping the Earth’s surface have accelerated. Yet the accuracy of these maps is still limited by the lack of in-situ data needed to train machine learning algorithms. Citizen science has proven to be a valuable approach for collecting in-situ data through applications like Geo-Wiki and Picture Pile, but better approaches for optimizing volunteer time are still required. Although machine learning is being used in some citizen science projects, advances in generative Artificial Intelligence (AI) are yet to be fully exploited. This paper discusses how generative AI could be harnessed for land cover/land use mapping by enhancing citizen science approaches with multi-modal large language models (MLLMs), including improvements to the spatial awareness of AI.
Visual interpretation tasks undertaken by ChatGPT for (a) a wetland/mangrove landscape in South America (b) an agricultural area in central Europe.
Visual interpretation tasks undertaken by ChatGPT for identification of natural and non-natural ecosystems where ChatGPT misclassified the images as non-natural for locations in (a) Chad and (b) Austria. In (c), the image from Colombia was classified as unsure by validators but natural by ChatGPT.
Integrating multi-modal Large Language Models (MLLMs) in a citizen science visual interpretation workflow.
Full reference:
See, L., Chen, Q., Crooks, A., Bayas, J.C.L., Fraisl, D., Fritz, S., Georgieva, I., Hager, G., Hofer, M., Lesiv, M., Malek, Ž., Milenković, M., Moorthy, I., Orduña-Cabrera, F., Pérez-Guzmán, K., Schepaschenko, D., Shchepashchenko, M., Steinhauser, J. and McCallum, I. (2025), New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI, iScience, doi: https://doi.org/10.1016/j.isci.2025.111919 (pdf)