“Using and Developing the Weighted Cosine Similarity Score.” Paper presented at the Midwest Political Science Association Annual Meeting, Chicago, IL, April 2016. With Brice Acree, Eric Hansen, and Josh Jansa.

We highlight flaws in extant measures of text similarity used in automated text analysis and introduce a new technique, weighted cosine similarity, to address these flaws.
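
The abstract above does not spell out the weighting scheme, so the following is only a minimal sketch of one generic form a weighted cosine similarity can take, not necessarily the definition used in the paper. The function name `weighted_cosine`, the per-term weight vector `w`, and the toy count vectors are all assumptions made for illustration.

```python
import math

def weighted_cosine(a, b, w):
    """Cosine similarity in which term i contributes in proportion to w[i].

    Assumed generic form (not taken from the paper):
        sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b)))
    With all weights equal to 1 this reduces to ordinary cosine similarity.
    """
    num = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
    norm_b = math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return num / (norm_a * norm_b)

# Hypothetical term-count vectors over a three-word vocabulary.
doc_a = [2, 0, 1]
doc_b = [1, 1, 0]

uniform = weighted_cosine(doc_a, doc_b, [1, 1, 1])         # plain cosine
downweighted = weighted_cosine(doc_a, doc_b, [0.1, 1, 1])  # shared first term counts less
```

In this toy case the two documents overlap only on the first term, so shrinking that term's weight lowers the similarity score relative to the unweighted version.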

For the working paper, click here.

“Modeling Dynamic Issue Frames Through Supervised Text Mining of Newspapers”

I test the ability of four algorithms (including Naive Bayes, LASSO regression, and boosted trees) to classify frames over time. To do so, I use articles on immigration and same-sex marriage from the Media Frames Corpus (Card et al. 2015), published between 1988 and 2012, that have been hand-annotated for frame use. I train the algorithms on the most recent five-year window and use the learned models to predict frame use in the preceding years. To evaluate the models, I compare their predictions to the coder labels. This analysis shows that the models do not classify all frames equally well over time. To address this problem, I then modify a detection algorithm, drawing on findings from the literatures on concept drift and adaptive learning, to identify when substantial linguistic change has occurred: change large enough that existing methods can no longer reliably classify a frame.
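
The evaluation design described above (train on the most recent five-year window, predict earlier years, compare to coder labels) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the records in `DOCS` are made-up toy examples rather than Media Frames Corpus data, the frame labels are hypothetical, and a small hand-written multinomial Naive Bayes stands in for the classifiers compared in the paper.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up records (year, text, coder-assigned frame). Illustration only.
DOCS = [
    (2012, "border security crime enforcement", "security"),
    (2011, "jobs wages economy workers", "economic"),
    (2010, "crime enforcement border patrol", "security"),
    (2009, "economy jobs taxes workers", "economic"),
    (2008, "security border enforcement", "security"),
    (2005, "workers wages jobs", "economic"),
    (2005, "border crime patrol", "security"),
    (1995, "aliens lawbreaking jobs", "security"),
    (1995, "labor wages taxes", "economic"),
]

def split_by_window(docs, last_year, window=5):
    """Train on the most recent `window` years; hold out everything earlier."""
    first_train_year = last_year - window + 1
    train = [d for d in docs if d[0] >= first_train_year]
    test = [d for d in docs if d[0] < first_train_year]
    return train, test

def train_naive_bayes(train):
    """Collect the counts needed for multinomial Naive Bayes."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for _, text, label in train:
        label_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_docs)
        for w in text.split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy_by_year(model, test):
    """Share of held-out documents per year whose prediction matches the coder label."""
    hits, totals = Counter(), Counter()
    for year, text, label in test:
        totals[year] += 1
        hits[year] += predict(model, text) == label
    return {y: hits[y] / totals[y] for y in sorted(totals)}

train, test = split_by_window(DOCS, last_year=2012)
model = train_naive_bayes(train)
per_year = accuracy_by_year(model, test)
```

On this toy data the held-out 2005 documents share vocabulary with the training window and are classified correctly, while one 1995 document uses older wording and is misclassified. Reporting accuracy by year, rather than pooled, is what makes that kind of decline visible.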

For a poster summarizing the working paper, click here.