My research agenda to date in political methodology centers on developing and validating techniques to measure framing and capture dynamics within text. My primary area of research concerns extracting meaning from political text that evolves over time, which requires developing and validating new methods. This includes my dissertation, described above. A second area of research within political methodology concerns developing rigorous active learning processes for classifying text and building desired measures.
Below I summarize some of this research agenda and related efforts. Please contact me with any questions or requests for working papers at: email@example.com
Kelsey Shoub. “Variations on a Theme: Semi Supervised Classification of Dynamic Concepts.”
While there has been an explosion in the availability of text, accompanied by new methods for its analysis, no efficient text mining method within political science has been shown to be resistant to concept drift. Researchers face two problems. First, a researcher who wishes to identify specific concepts in documents must hand label a sufficient number of documents (that is, construct a training set) that can be used to train a computer to label the remaining documents. Second, there is concept drift: the shift over time in the language used to talk about topics and invoke different concepts (e.g., sentiment). Here I develop and validate a dynamic co-training style algorithm to address these problems. It does so by: 1) supplementing the small training set by iteratively and automatically adding the unlabeled documents whose predicted labels are most likely to be accurate; and 2) addressing concept drift by using a moving window to construct the training set, generate the models, and apply labels. To validate the algorithm proposed here, I test it on two classification tasks: labeling the topics discussed in one-minute speeches from the US House and the frames used in newspaper articles.
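The two steps above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it swaps in single-classifier self-training (rather than co-training with two views), uses a toy naive Bayes model, and every name and parameter (`windowed_self_training`, `window`, `threshold`, `max_rounds`) is hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a tiny multinomial naive Bayes model with add-one smoothing."""
    word_counts = {}
    class_counts = Counter(labels)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_proba(model, tokens):
    """Return {label: probability}; tokens outside the vocabulary are ignored."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:
                score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def windowed_self_training(documents, seed_labels, window=2,
                           threshold=0.6, max_rounds=5):
    """Label documents period by period using a moving window.

    documents: list of (period, tokens) pairs, with integer periods.
    seed_labels: {document index: label} for the hand-labeled documents.
    Assumption: a prediction counts as "probably accurate" when its
    probability is at least `threshold`.
    """
    labels = dict(seed_labels)
    for current in sorted({period for period, _ in documents}):
        # Only documents from the last `window` periods are used.
        in_window = [i for i, (period, _) in enumerate(documents)
                     if current - window < period <= current]
        for _ in range(max_rounds):
            train_idx = [i for i in in_window if i in labels]
            if not train_idx:
                break
            model = train_nb([documents[i][1] for i in train_idx],
                             [labels[i] for i in train_idx])
            added = False
            for i in in_window:
                if i in labels:
                    continue
                probs = predict_proba(model, documents[i][1])
                best = max(probs, key=probs.get)
                if probs[best] >= threshold:
                    labels[i] = best  # promote into the training set
                    added = True
            if not added:
                break
    return labels
```

Because each window is trained only on recent documents, including those labeled automatically in earlier windows, vocabulary that enters the corpus later can still be tied back to the original hand-labeled concepts.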
Brice Acree, Josh Jansa, Kelsey Shoub, and Eric Hansen. “Using and Developing the Weighted Cosine Similarity Score.” Paper presented at the Midwest Political Science Association Annual Meeting. Chicago, IL, April 2016.
We highlight flaws in extant measures of text similarity used in automated text analysis and introduce a new technique, weighted cosine similarity, to address these flaws.
For the working paper, click here.
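The abstract above does not spell out the weighting scheme, so the following is only a generic sketch of one common way to weight a cosine similarity: each term's contribution is scaled by a nonnegative per-term weight. The function name `weighted_cosine` and the `weights` argument are assumptions made for illustration and may differ from the measure developed in the paper.

```python
import math


def weighted_cosine(a, b, weights=None):
    """Cosine similarity between two term-count dicts with per-term weights.

    a, b: {term: count}. weights: {term: nonnegative weight}; terms missing
    from `weights` default to 1.0, which gives ordinary cosine similarity.
    """
    weights = weights or {}
    terms = set(a) | set(b)
    dot = sum(weights.get(t, 1.0) * a.get(t, 0) * b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(weights.get(t, 1.0) * a.get(t, 0) ** 2 for t in terms))
    norm_b = math.sqrt(sum(weights.get(t, 1.0) * b.get(t, 0) ** 2 for t in terms))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no weighted content to compare
    return dot / (norm_a * norm_b)
```

Under this formulation, giving a term a weight of zero removes it from the comparison entirely, so two documents that share only uninformative terms no longer appear similar.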