A team of graduate student researchers led by Stony Brook’s Andrew Schwartz, an assistant professor in the College of Engineering and Applied Science’s Department of Computer Science, and Stanford University’s Johannes Eichstaedt is using Twitter to track and analyze COVID-19 symptoms and mental health in U.S. communities. Large-scale analysis of linguistic patterns in social media offers one of the few (if not the only) instruments for measuring the physical and psychological health of populations down to the county level, daily, across most of the country. The group also produces what seems to be the only county-level COVID time-tracker available.
The response to COVID-19 (social distancing, sheltering in place, etc.) is said to be the largest psychological disruption of society since World War II, and the economic impact of unemployment and financial precarity potentially creates additional distress. Due to these factors, the entire risk distribution will likely be affected, ranging from the well-adjusted (a lowering of subjective well-being) to the vulnerable (an increase in mental illness).
The social networking site Twitter has been used in the past to track both communicable and non-communicable diseases (e.g., the flu and heart disease, respectively). As a dynamic, ever-changing data set, Twitter has the unique advantage of providing retrospective baselines from which changes can be detected. In addition, the latest advances in this area allow researchers to actively track changes across time in psychological and medical variables from social media data. Through post-stratification, which adjusts for demographic biases in the Twitter samples, the researchers can present more representative estimates. By combining big social media data with improved methods for inferring psychological and health information from it, a Twitter-based surveillance architecture can be a valuable tool to inform COVID-19-related public health decisions.
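To make the post-stratification step concrete, here is a minimal sketch in Python. It is an illustration of the general technique only, not the team's actual pipeline: the function name `poststratified_mean`, the age-group strata, and the population shares are all hypothetical. Each demographic group's average is weighted by that group's share of the real population rather than by its share of the Twitter sample.

```python
from collections import defaultdict

def poststratified_mean(sample, population_share):
    """Reweight a sample mean so each stratum counts by its population share.

    sample: list of (stratum, value) pairs, e.g. per-user scores (hypothetical).
    population_share: dict mapping stratum -> share of the true population.
    Every stratum seen in the sample must appear in population_share.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        sums[stratum] += value
        counts[stratum] += 1

    # Renormalize over the strata actually observed in the sample.
    observed_share = sum(population_share[s] for s in counts)
    weighted = sum(population_share[s] * (sums[s] / counts[s]) for s in counts)
    return weighted / observed_share

# Hypothetical example: younger users are over-represented in the sample.
sample = [("18-29", 1.0), ("18-29", 1.0), ("18-29", 1.0), ("65+", 0.0)]
shares = {"18-29": 0.5, "65+": 0.5}
raw_mean = sum(v for _, v in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, shares)
```

In this toy case the raw sample mean is pulled toward the over-represented group, while the adjusted mean weights both groups equally, matching the assumed population shares.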
The research team is using AI-based language assessment and statistical techniques to isolate dependable signals of active COVID-19 infections (such as discussion of symptoms or of seeking testing). A “sociolinguistic COVID-19 base rate” will be formed from the rate of these linguistic patterns in social media, controlled for general trends in coronavirus discussion.
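The article does not say how the rate is "controlled for" general coronavirus discussion. One common way to do this, shown here purely as an assumption, is to regress the daily symptom-language rate on the daily rate of general coronavirus talk and keep the residual, which is the part of the symptom signal not explained by overall discussion volume. The function name `residualize` and the numbers are hypothetical.

```python
def residualize(symptom_rate, general_rate):
    """Return the part of symptom_rate not linearly explained by general_rate.

    Both arguments are equal-length lists of daily rates (hypothetical data).
    Fits ordinary least squares y = intercept + slope * x and returns y - fit.
    """
    n = len(symptom_rate)
    mean_x = sum(general_rate) / n
    mean_y = sum(symptom_rate) / n
    var_x = sum((x - mean_x) ** 2 for x in general_rate)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(general_rate, symptom_rate))
    # If general discussion never varies, there is nothing to control for.
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(general_rate, symptom_rate)]

# Hypothetical daily rates per 10,000 tweets in one county.
general = [1.0, 2.0, 3.0, 4.0]
symptoms = [2.0, 4.0, 6.0, 9.0]
adjusted = residualize(symptoms, general)
```

A positive residual on a given day means symptom-related language was higher than the general level of coronavirus discussion alone would predict.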
Building on recently validated methods, the researchers are measuring the impact of the virus and of social distancing and shelter-in-place orders on mental health (including depression, anxiety and loneliness) and subjective well-being across counties on a weekly basis, and using retrospective base rates to detect relevant changes.
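As a rough illustration of detecting a change against a retrospective base rate, the sketch below compares one week's score for a county with the mean and spread of earlier weeks. This is a simple z-score, assumed here for illustration; the article does not specify the team's statistical test, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev

def change_z_score(baseline_weeks, current_week):
    """Standardized deviation of the current week from the historical baseline.

    baseline_weeks: weekly scores from before the period of interest
    (at least two values). current_week: the score being evaluated.
    """
    mu = mean(baseline_weeks)
    sigma = stdev(baseline_weeks)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has no variation; z-score is undefined")
    return (current_week - mu) / sigma

# Hypothetical weekly anxiety-language scores for one county.
baseline = [10.0, 12.0, 11.0, 9.0, 13.0]
z = change_z_score(baseline, 15.0)
```

A large positive or negative value suggests the current week departs from what the county's own history would lead one to expect.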