At first glance, I wasn’t sure what to do with Voyant, the digital text analysis tool. I spend most of my time working with ancient texts, and there just isn’t that much analysis you can do in translation. The originals, written in ancient inflected languages, require the kind of parsing tools built into the wonderful Perseus Project. Instead I dropped in some texts I ‘liked’ (GK Chesterton pulled off Gutenberg, etc.), but it wasn’t very productive to find out that the most common word in The Man Who Was Thursday, after articles and proper nouns, was “professor.”
Ultimately, and somewhat arbitrarily, I decided to compare two translations of fragments from the “Epic of Gilgamesh” (also found in text form on Gutenberg). The Pennsylvania tablet and the Yale tablet do not cover the same part of the story: the former is an account of the meeting and wrestling of Gilgamesh and Enkidu, and the latter of the fight against Humbaba. But the transcriptions and translations were made by the same person, so, in theory, there should be consistency of English vocabulary selection, grammatical construction, and versification.
A visual inspection of the versification alone showed that the two texts were clearly not in the same style (no tool needed to tell us that!), but playing with word counting a bit yielded some interesting results that, if not novel to the close reader of a mere two texts, suggest some of the possible benefits and problems of applying the tools to a larger corpus. For example, I searched for the word “god,” which appears 7 times in the two texts combined. The graphical display of relative frequencies showed two curves for its appearances across sections of the Penn text (organised by my own division of the verses when I copied the text); its uses in the Yale text are entirely at the end. The sample is so small (the fragments contain only 2,274 words between them, 599 of them unique, and a mere 15 words other than proper nouns occur more than 7 times) that I’m not sure the tool is remotely useful at this scale, but I can see how such tools might guide our attention to neglected details.
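For the curious, the kind of counting behind such a graph can be roughed out in a few lines of Python. This is only a minimal sketch under my own assumptions, not Voyant’s actual method: the function names (`tokenize`, `summary`, `segment_counts`) are made up for illustration, the text is cut into equal slices of tokens rather than by verse, and the sample string is an invented placeholder, not a passage from either tablet.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def summary(text):
    """Return (total word count, number of unique word forms)."""
    tokens = tokenize(text)
    return len(tokens), len(set(tokens))


def segment_counts(text, word, n_segments=10):
    """Count occurrences of `word` in each of n_segments equal slices.

    Assumption: slices are equal runs of tokens, which only approximates
    a division by verse or section.
    """
    tokens = tokenize(text)
    if not tokens:
        return [0] * n_segments
    size = -(-len(tokens) // n_segments)  # ceiling division
    return [
        tokens[i * size:(i + 1) * size].count(word)
        for i in range(n_segments)
    ]


# Invented placeholder text, not a quotation from the translations.
sample = (
    "how like a god he stood the people watched him "
    "how like a god he fought"
)

print(summary(sample))
print(segment_counts(sample, "god", n_segments=4))
```

A word that clusters at the start and end of a text shows up as counts at both ends of the list, which is what a two-curve graph is plotting; with only a handful of hits, though, the shape says very little.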
The frequencies are made more interesting by the “keywords in context” tool. The Pennsylvania tablet uses the word “god” in the opening and climax of the fragment, exclusively as a heroic descriptor: “how like a god.” The Yale tablet, with all its appearances at the end, instead refers to a god indirectly, in connection with prayers or offerings. This is clearly a different poetic form, whether or not it contains a different story.
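A keywords-in-context listing is simple enough to sketch by hand. The following is a rough illustration in Python, assuming a plain split into word tokens and a fixed window of neighbouring words; the function name `kwic` and the sample sentences are my own placeholders, and Voyant’s own tool may well work differently.

```python
import re


def kwic(text, keyword, width=3):
    """List each occurrence of `keyword` with `width` words on either side.

    Returns (left context, matched word, right context) tuples.
    Matching is case-insensitive and on whole tokens only.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented placeholder sentences, not quotations from the translations.
sample = "How like a god he stood. They brought offerings to the god of the city."

for left, word, right in kwic(sample, "god", width=2):
    print(f"{left:>12} [{word}] {right}")
```

Reading down the bracketed column is what makes the contrast visible: the same word turns up once as a comparison and once as the recipient of an offering.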
Interestingly, the Penn text’s frequent words produce many double-curve graphs, and the Yale text’s many single curves. One wonders whether the Yale text’s relative incompleteness also affects the visual output (though with only single examples, any apparent trends are illusions).
I can see this tool lending great power to the analysis of large corpora, but I do hope that such tools can be developed to handle ancient texts as well. I tried a little Latin just for fun (Cicero, Somnium Scipionis), but managing the stop-word list for an inflected language was too time-consuming to test properly. Wordle supposedly handles Latin words, but the “common Latin words” filter still left me with a goodly number of pronoun forms and pulled words like vita (= life) that, although “common,” were not what I had in mind to remove!
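To show why the stop-word list gets out of hand, here is a small sketch in Python. Everything in it is an assumption of mine for illustration: the stop list is hand-typed and covers only the forms of the relative pronoun qui plus a few small words, the function name `top_words` is made up, and the sample is a placeholder string of Latin word forms, not a quotation from Cicero.

```python
import re
from collections import Counter

# Hand-typed, illustrative stop list: one pronoun already needs many entries.
QUI_FORMS = {
    "qui", "quae", "quod", "cuius", "cui", "quem", "quam",
    "quo", "qua", "quorum", "quarum", "quibus", "quos", "quas",
}
SMALL_WORDS = {"et", "in", "non", "ad", "est"}
STOP_WORDS = QUI_FORMS | SMALL_WORDS


def top_words(text, stop_words, n=5):
    """Return the n most frequent word forms that are not in stop_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept).most_common(n)


# Placeholder string of Latin word forms, not a quotation.
sample = "vita quae vita quem vitam quorum et vita in vitae"

print(top_words(sample, STOP_WORDS, n=3))
```

Two problems show up even here: a single pronoun takes over a dozen entries in the list, and the forms vita, vitam and vitae are counted as three separate words unless something like the Perseus parsing tools groups them under one headword first.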