Overview of Methods | An Epidemiology of Information

We integrated traditional interpretive analysis (close readings of texts) with two computational methods: dynamic temporal segmentation (topic modeling and segmentation) and tone analysis. We developed a dynamic temporal segmentation algorithm that automatically partitions the total time period covered by the documents in the collection so that segment boundaries mark the points at which the topics evolve and reorganize. The goal of tone analysis was to use computational analysis to interpret some forms of textual meaning, giving humanities and social science researchers a way to automate the analysis of textual sentiment in large datasets.

Dynamic Temporal Segmentation (Topic Modeling and Segmentation)

Analyzing temporal text data sets poses two challenges: summarizing large volumes of text and capturing dynamic reorganizations and trends over time. We developed a dynamic temporal segmentation algorithm that wraps around topic modeling algorithms to identify change points where significant shifts in topics occur.
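To make the idea of a change point concrete, here is a minimal sketch in Python. It is not the project's algorithm. It assumes a topic model has already produced one topic mixture per time window, and it flags a boundary wherever the Jensen-Shannon divergence between adjacent windows exceeds a threshold. The function names, the threshold of 0.2, and the sample mixtures are all invented for illustration.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def find_change_points(topic_mix_by_window, threshold=0.2):
    """Return each index i where window i differs sharply from window i - 1.

    The threshold is an illustrative assumption, not a value from the project.
    """
    return [
        i
        for i in range(1, len(topic_mix_by_window))
        if js_divergence(topic_mix_by_window[i - 1], topic_mix_by_window[i]) > threshold
    ]


def to_segments(n_windows, change_points):
    """Turn change points into half-open (start, end) window ranges."""
    bounds = [0] + list(change_points) + [n_windows]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Invented topic mixtures for four consecutive windows (each row sums to 1).
windows = [
    [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.75, 0.15],
]
change_points = find_change_points(windows)
segments = to_segments(len(windows), change_points)
```

With these sample mixtures, the dominant topic switches between the second and third windows, so the sketch splits the timeline there. A fuller approach would also decide how wide each window should be, which this sketch takes as given.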

Tone Analysis

The ability to recognize and identify tones across a large dataset would mean that computational analysis could actually interpret some forms of textual meaning. We wanted to see how tone analysis could help us understand shifts of tone in the coverage of influenza across time and space, identify tones associated with particular aspects of news coverage, and assess changes in reporting as the disease waxed and waned.
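As a rough illustration of tracking tone over time, the sketch below labels each article with a tone and tallies the labels per period. It is a stand-in, not the project's tone classifier: it uses a tiny hand-made cue-word lexicon, and the tone names, cue words, period labels, and sample texts are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tone categories and cue words, chosen only for illustration.
TONE_LEXICON = {
    "alarmed": {"epidemic", "deaths", "spreading", "danger"},
    "reassuring": {"subsiding", "recovering", "mild", "improving"},
}


def classify_tone(text):
    """Return the tone whose cue words appear most often, or 'neutral' if none do."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for tone, cues in TONE_LEXICON.items():
        hits[tone] = sum(1 for token in tokens if token in cues)
    tone, count = hits.most_common(1)[0]
    return tone if count > 0 else "neutral"


def tone_by_period(articles):
    """Tally tone labels for each period from (period, text) pairs."""
    tally = defaultdict(Counter)
    for period, text in articles:
        tally[period][classify_tone(text)] += 1
    return tally


# Invented sample articles, not drawn from the project's corpus.
articles = [
    ("period 1", "Epidemic spreading; deaths mount"),
    ("period 1", "Danger grows as epidemic widens"),
    ("period 2", "Cases subsiding, patients recovering"),
    ("period 2", "City council meets on Tuesday"),
]
tally = tone_by_period(articles)
```

Comparing the per-period tallies shows how the balance of tones shifts as coverage moves from one period to the next. A trained classifier would replace `classify_tone` while the tallying step stays the same.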

You can find the code and instructions for the tone classifier here and for the topic modeling and segmentation algorithm here.

You can find a full discussion of our methods in the Project Research Report.