Digital Humanities Projects with Small and Unusual Data:
Some Experiences from the Trenches

UC Irvine Data Science Initiative on Data Science and Digital Humanities, February 5, 2016

Scott Kleinman, California State University, Northridge / scott.kleinman@csun.edu

http://scottkleinman.github.io/revealjs-presentations/UCI-Symposium/dh-projects-with-small-and-unusual-data.html

Simple Schema of the Humanities Interpretive Process

1. Primary Texts (data)

2. Disciplinary expertise (metadata on speed)

3. Interpretive acts (close reading, synecdoche, rhetorical persuasion)

The Dictionary of Old English and Old English Corpus
London, British Library, Cotton Tiberius A vi, recently digitised

But really...


Digital Humanities
=
Nineteenth-Century Studies?

Lexos BubbleViz of two Classical Chinese texts

Lexos MultiCloud showing part of a topic model of The New York Times produced by Alan Liu

Cluster Analysis of Daniel and Azarias
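Cluster analysis of this kind starts by measuring how far apart text segments are. The following is a minimal sketch of that first step only, assuming whitespace tokenisation, relative word frequencies, and Euclidean distance; the slides do not state which settings were used in Lexos, and the function names and toy segments here are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def relative_frequencies(text):
    """Map each word to its share of all word tokens in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two word-frequency profiles."""
    vocab = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in vocab))


def closest_pair(segments):
    """Return the names of the two segments with the smallest distance.

    A hierarchical clustering would merge this pair first, then repeat.
    """
    profiles = {name: relative_frequencies(t) for name, t in segments.items()}
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: euclidean(profiles[pair[0]], profiles[pair[1]]),
    )


# Placeholder segments (not real texts), only to show the call.
toy_segments = {"seg1": "a a b c", "seg2": "a a b d", "seg3": "x y z z"}
print(closest_pair(toy_segments))
```

A dendrogram like the ones on these slides is what results from repeating the merge step until every segment has been joined.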

Some Orthographic Variations from Beowulf

eaðe/eaðe/yðe      licgan/licgean
fah/fag            maþmum/madmum
feaxe/fexe         sellic/syllic
gedryht/gedriht    wlonc/wlanc
gæst/gast          libban/lifcgan
yldum/eldum

From Katherine O'Brien O'Keeffe, Visible Song: Transitional Literacy in Old English Verse (Cambridge University Press, 1990)

Cluster Analysis of Beowulf showing Scribe B

Sample Entropy Values

Texts                                         Bits per Letter
Modern English (ave.)                         4.03
Beowulf                                       4.18
Beginning of Bede's Historia ecclesiastica    4.24
Daniel and Azarias (ave.)                     4.30
Early Middle English Texts (ave.)             4.60
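The slides do not say how the bits-per-letter values were computed. One plausible reading is the Shannon entropy of the letter distribution, H = -Σ p(x) log2 p(x), and the sketch below assumes that reading; the function name and the decision to count only alphabetic characters are assumptions, so it should not be expected to reproduce the figures above exactly.

```python
import math
from collections import Counter


def bits_per_letter(text):
    """Shannon entropy of the letter distribution, in bits per letter.

    Assumption: case is folded and only alphabetic characters are counted,
    so letters such as ð, þ, and æ are included but spaces and punctuation
    are not.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    total = len(letters)
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Four equally frequent letters need two bits each.
print(bits_per_letter("abcd"))
```

On this measure, a text with more spelling variation spreads its letters more evenly and so scores higher, which is consistent with the higher value reported for the Early Middle English texts.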
British Library, Cotton MS Cleopatra C vi, f. 4r

Sample Early Middle English Corpus

  • The AB language is a term coined in 1929 by J.R.R. Tolkien to refer to the standardised language of two manuscripts of Ancrene Wisse, a guide for anchoresses.
  • The language is shared by a group of texts from the English West Midlands including Hali Meiðhad ("Holy Maidenhood"), Sawles Warde ("Refuge of the Souls"), and a life of Saint Juliana.
  • The Lambeth Homilies is a collection of sermons which also comes from the West Midlands but does not share the AB language forms. The Kentish Sermons come from southeastern England.

Cluster Analysis of Early Middle English Texts

Word Clouds of Early Middle English texts

Cluster Analysis of Early Middle English texts after consolidations
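The slides do not list the consolidations that were applied. As a rough illustration of the idea, the sketch below assumes a hand-built table that maps variant spellings to a single form before counting words. The pairs are borrowed from the Beowulf variations shown earlier purely as examples, and the choice of which spelling stands for the other is arbitrary; the names `CONSOLIDATIONS` and `consolidate` are hypothetical.

```python
# Hypothetical mapping: variant spelling -> form chosen to stand for it.
# The direction of each mapping is an arbitrary choice for illustration.
CONSOLIDATIONS = {
    "fag": "fah",
    "syllic": "sellic",
    "wlanc": "wlonc",
    "eldum": "yldum",
}


def consolidate(text, mapping=CONSOLIDATIONS):
    """Replace each listed variant with its chosen form; leave other words alone."""
    tokens = text.lower().split()
    return " ".join(mapping.get(token, token) for token in tokens)


print(consolidate("wlanc fag eldum"))
```

Running the consolidated text through the same cluster analysis then shows how much of the grouping depended on spelling alone.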

Snippet of AEME code

Some Lessons from the Trenches

  • We need some entry-level literature on applying statistical methods to humanities materials, particularly when our type of data differs from the standard types used by statisticians. [Lexos In the Margins]
  • It is important to emphasise workflow and to involve students in all stages of it.
  • We need to develop methods and resources for processing highly entropic data.
  • The pay-off for processing such data is often unclear, and we may need to cultivate (and reward) a sort of playful hermeneutics in working with it.
  • There is a tremendous pay-off for students in participating in these processes.

Lexos


Online: http://lexos.wheatoncollege.edu/

On GitHub: https://github.com/WheatonCS/Lexos

