Over the summer, Arie Croitoru and I took part in the George Mason University Aspiring Scientists Summer Internship Program. We worked with three very talented high-school students who, over the course of the seven-and-a-half-week program, produced some excellent research on agent-based modeling and social media analysis. An overview of their work can be seen in the posters and abstracts that the students produced at the end of the internship.
Lawrence Wang explored how social media could be used to predict election results in a project entitled "And the Winner Is? Predicting Election Results using Social Media". Below you can read Lawrence's abstract and see his poster.
"The 2012 U.S. presidential election demonstrated how Twitter can serve as a widely accessible forum of political discourse. Recently, researchers have investigated whether social media, particularly Twitter, can function as a predictive tool. In the past decade, multiple studies have claimed to successfully predict the results of elections using Twitter data. However, many of these studies fail to account for the inherent population bias present in Twitter data, leading to ungeneralizable results. In this project, I investigate the prospects of using Twitter data as an alternative to poll data for predicting the 2012 presidential election. The tweet corpus consisted of tweets published one month before the November election day. Using VADER, a sentiment analysis tool, I analyzed over 140,000 tweets for political sentiment. I attempted to circumvent the Twitter population bias by comparing age, race, and gender metrics of the Twitter population with those of the U.S. population. Furthermore, I utilized Bayesian inference with prior distributions from the results of the 2008 presidential election in order to mitigate the effects of limited tweet data in certain states. The resulting model correctly predicted the likely outcomes of 46 of the 50 states and predicted that President Obama would be reelected with a probability of 0.945. Such a model could be used to explore the forthcoming elections."
In a second project, Varun Talwar explored how knowledge bases could be utilized to better contextualize social media discussions, in a project entitled "Context Graphs: A Knowledge-Driven Model for Contextualizing Twitter Discourse." Below you can read Varun's project abstract and see his end-of-project poster.
"Introduction: User-posted content on online social media (SM) platforms has emerged in recent years as a rich field for narrative analysis of the topics captured in the discourse. In particular, collective discourse has been used to manually contextualize public perception of health-related events.
Objective: As SM feeds tend to be noisy, automated detection of the context of a given SM discourse stream has proven to be a challenging task. The primary objective of this research is to explore how existing knowledge bases could be utilized to better contextualize SM discussions through topic modeling and mining. By utilizing such existing knowledge it would then be possible to explore to what extent a given discourse is related to a known or a new context, as well as compare and contrast SM discussions through their respective contexts.
Methods: In order to accomplish these goals this research proposes a novel approach for contextualizing SM discourse. In this approach, topic modeling is combined with a knowledge base in a two-step process. First, key topics are extracted from a SM data corpus by applying a statistical topic-modeling algorithm, a process that also results in data dimensionality reduction. Once a set of salient topics is extracted, each topic is then used to mine the knowledge base for subgraphs that represent the contextual linkages between knowledge elements. Such subgraphs can then further disambiguate the topic modeling results, and be utilized for qualifying context similarity across SM discussions.
Results: The time-series analysis of the Twitter discourse via graph-matching algorithms reveals the change in topics as evidenced by the emergence of the terms “pregnancy” and “abortion” as information about the virus propagated through the Twitter community."
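To make the second step of that two-step process more concrete, here is a rough sketch and not Varun's actual implementation. It assumes the topic terms have already been extracted, uses a tiny hand-made knowledge base stored as an adjacency dictionary, links topic terms by shortest paths, and compares two contexts by the Jaccard overlap of their edges. The knowledge base entries, the function names, and the choice of Jaccard overlap as a stand-in for graph matching are all my own illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, goal, max_hops=3):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def context_graph(graph, topic_terms, max_hops=3):
    """Undirected edges that link the topic terms through the knowledge base."""
    edges = set()
    terms = [t for t in topic_terms if t in graph]
    for a, b in combinations(terms, 2):
        path = shortest_path(graph, a, b, max_hops)
        if path:
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return edges


def context_similarity(edges_a, edges_b):
    """Jaccard overlap of two edge sets (0 = disjoint, 1 = identical)."""
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)


# Hypothetical toy knowledge base (undirected adjacency lists).
KB = {
    "virus": {"mosquito", "infection"},
    "mosquito": {"virus"},
    "infection": {"virus", "pregnancy"},
    "pregnancy": {"infection", "birth defect"},
    "birth defect": {"pregnancy"},
}

# Hypothetical topic terms from two time windows of a discussion.
early = context_graph(KB, ["virus", "mosquito"])
later = context_graph(KB, ["virus", "mosquito", "pregnancy"])
similarity = context_similarity(early, later)
print(len(early), len(later), round(similarity, 3))
```

A drop in similarity between consecutive time windows is one simple signal that the context of a discussion has shifted, for example when a new term enters the topic set.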
Elizabeth Hu explored the current migration crisis in Europe in a project entitled "Across the Sea: A Novel Agent-Based Model for the Migratory Patterns of the European Refugee Crisis". Below are Elizabeth's abstract, poster, and an example model run.
"Since 2010, a growing number of refugees have sought asylum in European nations, fleeing violence and military conflict in their home countries. Most of the refugees originate from Syria, Iraq, Afghanistan, and African nations. The vast majority of refugees risk their lives on the popular yet perilous Mediterranean Sea route, which is prone to boat accidents and subsequent migrant deaths. The flow of millions of refugees has introduced a humanitarian crisis not seen since World War II. European nations are struggling to cope with the influx of refugees through various border policies.
In order to explore this crisis, a geographically explicit agent-based model has been developed to study the past and future patterns of refugee flows. Traditional migration models, which represent the population as an aggregate, fail to consider individual decision-making processes based on personal status and intervening opportunities. By contrast, the novel agent-based migration model developed here allows population behavior to emerge as the result of individual decisions. Initial population, city, and route attributes are based upon data from the UNHCR, EU agencies, crowd-sourced databases, and news articles. The agents, representing refugees, select goal destinations in accordance with the Law of Intervening Opportunities. Thus, goals are prone to change with fluctuating personal needs. Agents choose routes based not only on distance, but also on other relevant route attributes. The resulting migration flows generated by the model under various circumstances could provide crucial guidance for policy and humanitarian aid decisions."
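For readers unfamiliar with the Law of Intervening Opportunities, the sketch below shows one common way to turn it into a destination choice. It is not Elizabeth's model code. It assumes Schneider's exponential formulation, in which candidate destinations are ranked by distance and the chance of settling at one of them falls as more opportunities lie closer to the origin. The city names, distances, opportunity counts, and the parameter `L` are hypothetical.

```python
import math
import random


def intervening_opportunity_probs(destinations, L=0.01):
    """destinations: list of (name, distance, opportunities) tuples.

    Returns {name: probability}, normalized over the candidates, where the
    unnormalized weight of a destination is
    exp(-L * opportunities closer than it) - exp(-L * opportunities up to and including it).
    """
    ranked = sorted(destinations, key=lambda d: d[1])  # nearest first
    probs = {}
    cumulative = 0.0
    for name, _distance, opportunities in ranked:
        before = math.exp(-L * cumulative)
        cumulative += opportunities
        after = math.exp(-L * cumulative)
        probs[name] = before - after
    total = sum(probs.values())
    return {name: p / total for name, p in probs.items()}


def choose_destination(destinations, rng, L=0.01):
    """Sample one goal destination for an agent from the probabilities above."""
    probs = intervening_opportunity_probs(destinations, L)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical candidates: (name, distance in km, number of opportunities).
candidates = [
    ("City A", 100, 50),
    ("City B", 300, 50),
    ("City C", 900, 50),
]

probs = intervening_opportunity_probs(candidates)
goal = choose_destination(candidates, random.Random(1))
print({name: round(p, 3) for name, p in probs.items()}, goal)
```

Because the opportunity counts can differ per agent (for instance, depending on personal needs), re-evaluating these probabilities during a run is one way an agent's goal could change over time.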