Monday, June 29, 2020

Location-Based Social Network Data Generation

Continuing and building upon our previous work on Location-Based Social Networks (LBSNs), we have a paper at the 21st IEEE International Conference on Mobile Data Management entitled "Location-Based Social Network Data Generation Based on Patterns of Life." In the paper we discuss how LBSNs have become an active research topic in a variety of areas, including describing mobility patterns, location recommendation and friend recommendation systems. However, we make the argument that real-world LBSN data sets (e.g., Gowalla, BrightKite) are a rather scarce resource due to the privacy implications of making such data publicly available. Furthermore, in many publicly available LBSN data sets, the vast majority of users have fewer than ten check-ins, or the locations recorded for a user are usually only a small portion of all the locations that user has actually visited (as shown in the table below).

Publicly Available Real-World LBSN Data Sets.

To overcome these weaknesses, in this paper we present an LBSN simulation (an agent-based model created in MASON) capable of creating multiple artificial but socially plausible, large-scale LBSN data sets. If this sounds of interest to you, below we provide a little more information about the paper: specifically, its abstract, a depiction of LBSNs, our case studies, the resulting simulations we used to develop LBSN data based on patterns of life (PoL), and some sample results. In addition to this, as the conference was virtual, Joon-Seok Kim also made a great movie of the conference paper. At the bottom of this post we provide the full reference and link to the paper.

We would also like to draw the reader's attention to the online resources which accompany this paper. For example, to allow others to use and extend our work, the source code and scripts used to generate these data sets are available at: https://github.com/gmuggs/pol, while all of the generated data sets can be found at OSF (https://osf.io/e24th/?view_only=191fdd0c640847b5b85597ab0e57186d). For more details about this model and data, readers are referred to the webpage created by Joon-Seok Kim: https://mdm2020.joonseok.org.

Abstract:
Location-based social networks (LBSNs) have been studied extensively in recent years. However, utilizing real-world LBSN data sets in such studies yields several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground-truth. To overcome these weaknesses, we leverage a large scale LBSN simulation to create a framework to simulate human behavior and to create synthetic but realistic LBSN data based on human patterns of life. Such data not only captures the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of “needs” that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real-world. As such it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different real-world urban environments obtained from OpenStreetMap. The simulation software and data sets which comprise gigabytes of spatio-temporal and temporal social network data are made available to the research community.
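To make the "needs" idea from the abstract a little more concrete, here is a minimal sketch in Python. It is not the MASON model from the paper: the need names, decay rates, site types and the `simulate` function are all assumptions made up for illustration. It also leaves out the map and the social network entirely. Each agent simply lets its needs decay, visits the kind of site that satisfies its most depleted need, and logs a check-in.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a need to the kind of place that satisfies it,
# following the examples in the abstract (tired -> home, hungry -> restaurant,
# financial -> work, social -> recreational site).
SITE_FOR_NEED = {
    "rest": "home",
    "food": "restaurant",
    "money": "workplace",
    "social": "recreation",
}

# Assumed per-step decay rates; the values are illustrative only.
DECAY = {"rest": 0.10, "food": 0.15, "money": 0.05, "social": 0.08}


@dataclass
class Agent:
    agent_id: int
    # Every need starts fully satisfied (1.0) and decays towards 0.0.
    needs: dict = field(default_factory=lambda: {n: 1.0 for n in SITE_FOR_NEED})

    def step(self, t):
        """Decay all needs, then visit the site for the most depleted one."""
        for name, rate in DECAY.items():
            self.needs[name] = max(0.0, self.needs[name] - rate)
        most_urgent = min(self.needs, key=self.needs.get)
        # Assumption: a single visit fully satisfies the need.
        self.needs[most_urgent] = 1.0
        # A check-in record: (time step, agent id, site type).
        return (t, self.agent_id, SITE_FOR_NEED[most_urgent])


def simulate(num_agents, num_steps):
    """Return a flat check-in log for all agents over all time steps."""
    agents = [Agent(i) for i in range(num_agents)]
    return [agent.step(t) for t in range(num_steps) for agent in agents]
```

Calling `simulate(3, 5)` returns one check-in per agent per time step. Because every agent in this sketch starts identical and behaves deterministically, they all follow the same routine. A more realistic model would vary the agents and tie the sites to real map locations.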
LBSN Overview

Case Studies: A: New Orleans, Louisiana (NOLA), Mississippi River, Lake Pontchartrain, and the ‘French Quarter’. B: George Mason University (GMU), Fairfax, VA. C: Synthetic Villages - Small (Left) and Large (Right).
Environments Populated with Agents. Clockwise from Top Left: GMU, NOLA, Large and Small Synthetic Villages.
Data Sets Resulting from Location-Based Social Network Simulation
Average Social Network Degree over Time (1K).
Social Network





Full Reference:
Kim, J-S., Jin, H., Kavak, H., Rouly, O.C., Crooks, A.T., Pfoser, D., Wenk, C. and Züfle, A. (2020), Location-Based Social Network Data Generation Based on Patterns of Life, The 21st IEEE International Conference on Mobile Data Management, Versailles, France. (pdf)
