Monday, June 29, 2020

Location-Based Social Network Data Generation

Continuing and building upon our previous work on Location-Based Social Networks (LBSNs), at the 21st IEEE International Conference on Mobile Data Management we have a paper entitled "Location-Based Social Network Data Generation Based on Patterns of Life." In the paper we discuss how LBSN research has become active in a variety of areas, including the description of mobility patterns, location recommendation and friend recommendation systems. However, we argue that real-world LBSN data sets (e.g., Gowalla, BrightKite) are a rather scarce resource due to the privacy implications of making such data publicly available. Furthermore, in many publicly available LBSN data sets, the vast majority of users have fewer than ten check-ins, or the number of locations recorded for a user is usually only a small portion of all the locations that user has visited (as shown in the table below).

Publicly Available Real-World LBSN Data Sets.

To overcome these weaknesses, in this paper we present an LBSN simulation (an agent-based model created in MASON) capable of creating multiple artificial but socially plausible, large-scale LBSN data sets. If this sounds of interest to you, below we provide a little more information about the paper: specifically, its abstract, a depiction of LBSNs, our case studies, the resulting simulations we used to develop LBSN data based on patterns of life (PoL), and some sample results. In addition, as the conference was virtual, Joon-Seok Kim also made a great movie of the conference paper. At the bottom of this post we provide the full reference and a link to the paper.

We would also like to draw the reader's attention to the online resources which accompany this paper. For example, to allow others to use and extend our work, the source code and scripts used to generate these data sets are available at: https://github.com/gmuggs/pol, while all of the generated data sets can be found at OSF (https://osf.io/e24th/?view_only=191fdd0c640847b5b85597ab0e57186d). For more details about this model and data, readers are referred to the webpage created by Joon-Seok Kim: https://mdm2020.joonseok.org.

Abstract:
Location-based social networks (LBSNs) have been studied extensively in recent years. However, utilizing real-world LBSN data sets in such studies yields several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground-truth. To overcome these weaknesses, we leverage a large scale LBSN simulation to create a framework to simulate human behavior and to create synthetic but realistic LBSN data based on human patterns of life. Such data not only captures the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of “needs” that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real-world. As such it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different real-world urban environments obtained from OpenStreetMap. The simulation software and data sets which comprise gigabytes of spatio-temporal and temporal social network data are made available to the research community.
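To make the needs-driven idea in the abstract a little more concrete, here is a minimal Python sketch. It is not the MASON model from the paper: the need names, the decay rates, and the rule of always serving the most urgent need are illustrative assumptions only.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a need to the type of place that satisfies it.
NEED_TO_PLACE = {
    "sleep": "home",
    "food": "restaurant",
    "money": "work",
    "social": "recreation",
}

# Hypothetical per-step decay of each need's satisfaction level (range 0..1).
DECAY = {"sleep": 0.05, "food": 0.10, "money": 0.02, "social": 0.04}


@dataclass
class Agent:
    name: str
    # Every need starts fully satisfied (1.0).
    needs: dict = field(default_factory=lambda: {n: 1.0 for n in NEED_TO_PLACE})
    # Complete log of (time, agent, place) check-ins.
    checkins: list = field(default_factory=list)

    def step(self, t):
        # All needs decay a little at every time step.
        for need, rate in DECAY.items():
            self.needs[need] = max(0.0, self.needs[need] - rate)
        # Visit the place that serves the most urgent (lowest) need.
        urgent = min(self.needs, key=self.needs.get)
        place = NEED_TO_PLACE[urgent]
        self.needs[urgent] = 1.0  # the visit satisfies that need
        self.checkins.append((t, self.name, place))
        return place


agent = Agent("agent-1")
for t in range(24):
    agent.step(t)
```

Because every visit is logged, `agent.checkins` holds one record per time step, which is the sense in which a simulation can observe 100% of its (simulated) population.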
LBSN Overview

Case Studies: A: New Orleans, Louisiana (NOLA), Mississippi River, Lake Pontchartrain, and the ‘French Quarter’. B: George Mason University (GMU), Fairfax, VA. C: Synthetic Villages - Small (Left) and Large (Right).
Environments Populated with Agents. Clockwise from Top Left: GMU, NOLA, Large and Small Synthetic Villages.
Data Sets Resulting from Location-Based Social Network Simulation
Average Social Network Degree over Time (1K).
Social Network





Full Reference:
Kim, J-S., Jin, H., Kavak, H., Rouly, O.C., Crooks, A.T., Pfoser, D., Wenk, C. and Züfle, A. (2020), Location-Based Social Network Data Generation Based on Patterns of Life, The 21st IEEE International Conference on Mobile Data Management, Versailles, France. (pdf)

Friday, June 19, 2020

Call for Papers: ACM SIGSPATIAL 2020 International Workshop on Geospatial Simulation (GeoSim 2020)


Building upon two successful GeoSim workshops, the 2020 GeoSim Workshop (held in conjunction with the ACM SIGSPATIAL 2020 conference) is seeking papers.

The 3rd GeoSim workshop will focus on all aspects of simulation as a general paradigm to model and predict spatial systems and generate spatial data. New simulation methodologies and frameworks, not necessarily coming from the SIGSPATIAL community, are encouraged to participate. Also, this workshop is of interest to everyone who works with spatial data. The simulation methods that will be presented and discussed in the workshop should find a wide application across the community by producing benchmark datasets that can be parameterized and scaled.

The workshop seeks high-quality full (8-10 pages) and short (up to 4 pages) papers that will be peer-reviewed. Once accepted, at least one author is required to register for the workshop and the ACM SIGSPATIAL conference, as well as attend the workshop to present the accepted work which will then appear in the ACM Digital Library.

We solicit novel and previously unpublished research on all topics related to geospatial simulation including, but not limited to:
  • Disease Spread Simulation
  • Urban Simulation
  • Agent Based Models for Spatial Simulation
  • Multi-Agent Based Spatial Simulation
  • Big Spatial Data Simulation
  • Spatial Data/Trajectory Generators
  • Road Traffic Simulation
  • Environmental Simulation
  • GIS using Spatial Simulation
  • Modeling and Simulation of COVID-19
  • Interactive Spatial Simulation
  • Spatial Simulation Parallelization and Distribution
  • Geo-Social Simulation and Data Generators
  • Social Unrest and Riot Prediction using Simulation
  • Spatial Analysis based on Simulation
  • Behavioral Simulation
  • Verifying and Validating Spatial Simulations
  • Applications for Spatial Simulation

Special Topic
The special topic for GeoSim 2020 brings focus to current trends in disease spread simulations, their practicality in predictive and prescriptive analytics, and the challenges they face in their use.

Workshop Information

Wednesday, June 10, 2020

New Paper: A Thematic Similarity Network Approach for Analysis of Places Using VGI

Building upon our work on volunteered geographical information (VGI) and ambient geographic information (AGI) and how such data (e.g. social media) can be used to understand place, Xiaoyi Yuan, Andreas Züfle and I have a new paper entitled "A Thematic Similarity Network Approach for Analysis of Places Using Volunteered Geographic Information" in the ISPRS International Journal of Geo-Information. In this paper we use textual data from crowdsourced reviews originating with TripAdvisor, together with geo-located Twitter data, and leverage this unstructured geographical information to comprehend the complexity of places at scale. Specifically, we explore the connectedness and relationships of places through thematic (i.e., topical) similarity networks, using Manhattan, New York as a case study. If such work sounds of interest to you, below we provide the abstract of the paper so that you can gain a greater understanding of the work, along with some figures that show our workflow and how communities were connected, before presenting some of our results. Finally, at the bottom of the post, the full reference and a link to the paper are provided. For those interested in extending or utilizing this work, the Python code used in our analysis is available at: https://bitbucket.org/xiaoyiyuan/network_vgi/

Abstract:
The research presented in this paper proposes a thematic network approach to explore rich relationships between places. We connect places in networks through their thematic similarities by applying topic modeling to the textual volunteered geographic information (VGI) pertaining to the places. The network approach enhances previous research involving place clustering using geo-textual information, which often simplifies relationships between places to be either in-cluster or out-of-cluster. To demonstrate our approach, we use a case study in Manhattan (New York) that compares networks constructed from three different geo-textual data sources: TripAdvisor attraction reviews, TripAdvisor restaurant reviews, and Twitter data. The results showcase how the thematic similarity network approach enables us to conduct clustering analysis as well as node-to-node and node-to-cluster analysis, which is fruitful for understanding how places are connected through individuals’ experiences. Furthermore, by enriching the networks with geodemographic information as node attributes, we discovered that some low-income communities in Manhattan have distinctive restaurant cultures. Even though geolocated tweets are not always related to the place they are posted from, our case study demonstrates that topic modeling is an efficient method to filter out the place-irrelevant tweets and therefore refine how places can be studied.

Keywords: Geo-Textual Data, Volunteered Geographic Information, Crowdsourcing, Similarity Network Analysis, Topic Modeling
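As a rough illustration of the idea in the abstract (places linked by how similar their topics are), the following Python sketch builds a small similarity network. It is not the code from the paper: the place names and topic vectors are made up, cosine similarity and the 0.9 threshold are assumptions, and connected components are used here only as a crude stand-in for a real community detection algorithm.

```python
from itertools import combinations
from math import sqrt

# Hypothetical topic distributions per place (e.g., the output of a topic model).
place_topics = {
    "Museum A": [0.8, 0.1, 0.1],
    "Museum B": [0.7, 0.2, 0.1],
    "Diner C": [0.1, 0.8, 0.1],
    "Diner D": [0.2, 0.7, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def similarity_edges(topics, threshold):
    """Keep only the pairs of places whose similarity reaches the threshold."""
    edges = {}
    for p, q in combinations(sorted(topics), 2):
        s = cosine(topics[p], topics[q])
        if s >= threshold:
            edges[(p, q)] = s
    return edges


def components(nodes, edges):
    """Group nodes into connected components using a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for p, q in edges:
        parent[find(p)] = find(q)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


edges = similarity_edges(place_topics, threshold=0.9)
groups = components(place_topics, edges)
```

With these toy vectors the two museums end up in one group and the two diners in another. Unlike a hard clustering, the edge weights in `edges` are kept, so node-to-node similarity can still be inspected.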

Work flow from data input to the construction of the thematic similarity network and analysis (i.e., community detection and unique nodes discovery).

A stylized network demonstrating the process of community detection from a fully-connected similarity network.


Network visualization of all communities from the thematic similarity networks with major communities highlighted. Only the major communities are shown on the map for the sake of clarity. Major communities in Network visualization and mapping for each network are colored the same and thus the legend applies for both.


Two examples of communities with boundary nodes and their respective topics.

Full Reference:
Yuan, X., Crooks, A.T. and Züfle, A. (2020), A Thematic Similarity Network Approach for Analysis of Places Using Volunteered Geographic Information, ISPRS International Journal of Geo-Information, 9(6), 385, https://doi.org/10.3390/ijgi9060385. (pdf)

Tuesday, June 02, 2020

Location-Based Social Simulation for Prescriptive Analytics of Disease Spread

Building upon our previous work on Location-Based Social Networks (LBSNs) and how agent-based modeling could provide an alternative to real-world data sets, in the latest SIGSPATIAL Special Newsletter, we (Joon-Seok Kim, Hamdi Kavak, Chris Rouly, Hyunjee Jin, Dieter Pfoser, Carola Wenk, Andreas Züfle and I) have an article entitled "Location-Based Social Simulation for Prescriptive Analytics of Disease Spread."

In this article we discuss a geographically explicit agent-based model that we have been developing, which is capable not only of simulating human behavior but also of creating synthetic but realistic LBSN data based on human patterns-of-life. Furthermore, in the article we discuss how such data and models can be used to explore the parameter space of possible prescriptions to find optimal strategies (or policies) to achieve a desired system state and outcome. We refer to such a search for optimal policies as prescriptive analytics (readers wishing to learn more about prescriptive analytics can see the 1st ACM KDD Workshop on Prescriptive Analytics for the Physical World).

To give an example of such prescriptions, in the article we make use of a simple hypothetical disease model and explore two prescribed policies to mitigate the spread of the disease. The first policy requires all agents to wear simulated Personal Protective Equipment (PPE) that reduces the chance of infection by 50%. The second policy enforces strict social distancing measures on a fixed proportion (50%) of the population. Those who follow the social distancing order avoid visiting recreational sites to meet people, although they still go to restaurants. In addition to these two policies, as a baseline, we also ran a “null-prescription” in which no intervention was prescribed. We find that the social distancing prescription was extremely effective. On the other hand, our simulation results for the PPE policy showed that merely wearing protective gear without any change in behavior has no significant effect (for the case of this disease).
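For readers who like to see such prescriptions written down, here is a small Python sketch of how the null baseline and the two policies could be encoded as parameters. It is only an illustration and not the code of our model: the `Prescription` class, the base infection chance of 0.2, and the way compliant agents are picked are all assumptions, and the sketch makes no attempt to reproduce the results reported in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prescription:
    name: str
    infection_multiplier: float  # 1.0 = no protection, 0.5 = PPE halves the chance
    distancing_fraction: float   # share of agents who skip recreational sites


# The baseline and the two policies described in the text.
NULL = Prescription("null", 1.0, 0.0)
PPE = Prescription("ppe", 0.5, 0.0)
DISTANCING = Prescription("social distancing", 1.0, 0.5)


def infection_chance(base_chance, rx):
    """Per-contact infection chance under a prescription."""
    return base_chance * rx.infection_multiplier


def follows_distancing(agent_id, population, rx):
    """Assumption: the first fraction of agent ids are the compliant ones."""
    return agent_id < int(population * rx.distancing_fraction)


def allowed_sites(agent_id, population, rx):
    """Compliant agents skip recreational sites but still visit restaurants."""
    sites = ["home", "work", "restaurant", "recreation"]
    if follows_distancing(agent_id, population, rx):
        sites.remove("recreation")
    return sites


# Hypothetical base chance of infection per contact.
for rx in (NULL, PPE, DISTANCING):
    print(rx.name, infection_chance(0.2, rx), allowed_sites(0, 1000, rx))
```

Keeping each prescription as a small immutable record makes it straightforward to sweep over many candidate policies, which is the kind of parameter-space search that prescriptive analytics refers to.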

If this type of research is of interest to you, below we provide the abstract of the paper, a movie of a representative simulation run, some of our results for the prescriptions described above and a link to the paper itself. Further information about the model and data can be found at https://geosocial.joonseok.org/p/epidemic.html and the data is available at https://osf.io/e24th/. Also, as we are currently going through COVID-19, we thought it appropriate to include a brief write-up and links to some disease models and discussions of related modeling efforts.

Abstract: 
Human mobility and social networks have received considerable attention from researchers in recent years. What has been sorely missing is a comprehensive data set that not only addresses geometric movement patterns derived from trajectories, but also provides social networks and causal links as to why movement happens in the first place. To some extent, this challenge is addressed by studying location-based social networks (LBSNs). However, the scope of real-world LBSN data sets is constrained by privacy concerns, a lack of authoritative ground-truth, their sparsity, and small size. To overcome these issues we have infused a novel geographically explicit agent-based simulation framework to simulate human behavior and to create synthetic but realistic LBSN data based on human patterns-of-life (i.e., a geo-social simulation). Such data not only captures the location of users over time, but also their motivation, and interactions via temporal social networks. We have open sourced our framework and released a set of large data sets for the SIGSPATIAL community. In order to showcase the versatility of our simulation framework, we added a disease model that simulates an outbreak and allows us to test different policy measures such as implementing mandatory mask use and various social distancing measures. The produced data sets are massive and allow us to capture 100% of the (simulated) population over time without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data.

Screenshot of the epidemic simulator depicting the French Quarter, New Orleans, LA, USA.



New cases and SEIR epidemic course.


Full Reference:
Kim, J-S., Kavak, H., Rouly, C.O., Jin, H., Crooks, A.T., Pfoser, D., Wenk, C. and Züfle, A. (2020), Location-Based Social Simulation for Prescriptive Analytics of Disease Spread, SIGSPATIAL Special, 12(1): 53-61. (pdf)

The Washington Post's Disease Model
While this post is not about COVID per se, if you are interested in disease models the Washington Post had a great article about COVID several months ago entitled "Why outbreaks like coronavirus spread exponentially, and how to “flatten the curve”." This article generated a lot of discussion, such as on the SIMSOC mailing list, and was cited in a paper in the Journal of Artificial Societies and Social Simulation (JASSS) entitled "Computational Models That Matter During a Global Pandemic Outbreak: A Call to Action." Other good discussions of COVID-related models (particularly agent-based models) can be found on the Review of Artificial Societies and Social Simulation (RofASSS) website (here) and the CoMSES Net Discourse Forum (along with links to past epidemic models), while the Sociology and Complexity Science Blog has some very good posts on modeling and public health.

Tuesday, May 26, 2020

Crowdsourcing Street View Imagery: A Comparison of Mapillary and OpenStreetCam


In the past we have written extensively on Volunteered Geographic Information (VGI) such as OpenStreetMap or Twitter. However, we have not really explored Street View Imagery (SVI), at least not until now. Within the realm of VGI, SVI has emerged in recent years as a novel and rich source of data on cities from which geographic information can be derived.

Perhaps the most well-known example of SVI utilization is that of Google Street View (GSV). While SVI has been traditionally collected by governmental agencies and companies alike, we are now also witnessing the emergence of Volunteered Street View Imagery (VSVI), which relies on a crowdsourced effort to provide geotagged street-level imagery coverage of traversable pathways (e.g., a street or trail). Such imagery, similar to GSV, provides detailed information about the location of objects such as cars, road markings, traffic lights and signs, and allows for the automatic extraction of features at scale. Such imagery can also be mined using machine learning algorithms to automatically derive points of interest (POI) databases (e.g., locations of coffee shops and fire hydrants) without the intervention of the citizen.

To explore VSVI we have just published a new paper entitled "Crowdsourcing Street View Imagery: A Comparison of Mapillary and OpenStreetCam" in the ISPRS International Journal of Geo-Information. In this paper we examine VSVI data collected from two different platforms, Mapillary and OpenStreetCam (OSC), for four metropolitan areas in the United States (i.e., Washington (District of Columbia), San Francisco (California), Phoenix (Arizona), and Detroit (Michigan)). Both of these online platforms accept sequences of images captured from mobile devices and uploaded via an app on the device (like those shown in the image to the right). Images are geolocated using the device’s global positioning system (GPS). More specifically, the paper examines:
  • the level of spatial coverage of each platform in order to assess the overall potential of such platforms to provide adequate coverage of geographic information.
  • user contribution patterns in Mapillary and OSC in order to understand how users are contributing to these platforms.
Results from our systematic and quantitative analysis of these two emerging VGI sources indicate that most Mapillary and OSC contributions occurred along control-access highways and local roads, and that the overall coverage in these sources is variable in comparison to an authoritative source (i.e., TIGER). Furthermore, our results showed that while the number of contributors varied across sites, only a few contributors were responsible for producing most of the raw data. User contribution patterns were also different in Mapillary and OSC. Specifically, we found that while patterns in coverage were variable for the different OSC sites, coverage patterns in Mapillary tended to be similar among sites. This finding may be linked to several factors, including differences in mapping practice, or issues with participation inequality, a topic that has been highly researched for other VGI platforms such as OSM, but which is still lacking within VSVI. Lastly, user contributions in Mapillary tended to be higher around 8:00 am, 1:00 pm and 5:00 pm (local time). This finding suggests that VSVI contributions tend to coincide with the morning and afternoon commute, and the lunch hour of the contributors.
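To show the kind of calculation behind the contribution findings above, here is a short Python sketch that computes the share of images coming from the top contributors and a histogram of capture hours. The records are invented for illustration (they are not Mapillary or OSC data), and the function names are our own.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (contributor id, local capture time).
records = (
    [("u1", datetime(2020, 1, 6, 8, m)) for m in range(6)]
    + [("u2", datetime(2020, 1, 6, 13, m)) for m in range(2)]
    + [("u3", datetime(2020, 1, 6, 17, 5)), ("u4", datetime(2020, 1, 6, 17, 40))]
)


def top_share(records, k):
    """Fraction of all images contributed by the k most active users."""
    counts = Counter(user for user, _ in records)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(records)


def hourly_histogram(records):
    """Number of images captured in each local hour of the day."""
    return Counter(ts.hour for _, ts in records)


share = top_share(records, k=1)
by_hour = hourly_histogram(records)
```

In this toy example a single user accounts for 6 of the 10 images, and all captures fall in the 8:00, 13:00 and 17:00 hours, mirroring in miniature the participation inequality and commute-time patterns discussed above.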

If you wish to find out more about this work, below we provide the abstract of the paper, a visual flowchart of our workflow and some of our results. The full reference and a link to the paper are provided at the bottom of the post.

Abstract:
Over the last decade, Volunteered Geographic Information (VGI) has emerged as a viable source of information on cities. During this time, the nature of VGI has been evolving, with new types and sources of data continually being added. In light of this trend, this paper explores one such type of VGI data: Volunteered Street View Imagery (VSVI). Two VSVI sources, Mapillary and OpenStreetCam, were extracted and analyzed to study road coverage and contribution patterns for four US metropolitan areas. Results show that coverage patterns vary across sites, with most contributions occurring along local roads and in populated areas. We also found that a few users contributed most of the data. Moreover, the results suggest that most data are being collected during three distinct times of day (i.e., morning, lunch and late afternoon). The paper concludes with a discussion that while VSVI data is still relatively new, it has the potential to be a rich source of spatial and temporal information for monitoring cities.

Keywords: Crowdsourcing; Volunteered Geographic Information; Street View Imagery; Mapillary; OpenStreetCam
Overview of methodology

Spatial distribution of road networks.
Spatial comparison of roads in kilometers.


Full Reference: 
Mahabir, R., Schuchard, R., Crooks, A.T., Croitoru, A. and Stefanidis, A. (2020), Crowdsourcing Street View Imagery: A Comparison of Mapillary and OpenStreetCam, ISPRS International Journal of Geo-Information. 9(6), 341; https://doi.org/10.3390/ijgi9060341 (pdf)