Data Management (Week 11 PR)

(Note to Class: some of the LoC materials are back up as of this morning)

This week’s practicum led me to think a little more carefully about what kinds of data are part of my proposed project for our class, and about what I will actually try to build in the next four weeks. In terms of central data, my project consists at first of ‘other people’s data’ only (geospatial data and metadata about objects), which is brought into my tool to be worked on. There are, however, other kinds of data to deal with: there’s an application to store, namely the code that drives the site where my tool is accessed. In the greater project I dreamed up, there are also a forum and other discussion places such as news and an FAQ wiki, plus tool development and sharing. The ultimate form of the project includes data management, storage, and resharing of the data for processing through my GIS tool. Drawing from our chapters in the NINCH Guide and from Digital Preservation in a Box, I made a little list of concerns and possible solutions for my data.

General:
-Use advisory boards (people) and systems (tools) for planning data in workflow and backup (draw pictures of the data and use calendars to manage regular processes such as backing up information)

Webspace:
-regular backups need to be made of the living website: forum discussion, blog posts, tools developed and shared, etc. An automated procedure (to be checked on by a human at scheduled intervals) would be ideal, where the data is migrated according to rules for naming and date and time stamping files straight onto backup media. Is it possible to have data archived in an off-site location automatically if both machines are online?
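A minimal sketch of the naming and date-and-time-stamping rule described above, in Python. The function names, the `label_YYYYMMDDTHHMMSSZ.tar.gz` pattern, and the directory layout are all assumptions made for illustration; copying the archive to an off-site machine and scheduling the job are left out.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_name(label: str, when: datetime) -> str:
    """Build a sortable archive name, e.g. 'forum_20240131T235900Z.tar.gz'.

    The pattern is a hypothetical naming rule, not a standard.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{label}_{stamp}.tar.gz"


def make_backup(source_dir: Path, backup_dir: Path, label: str,
                when: Optional[datetime] = None) -> Path:
    """Archive source_dir into backup_dir under a time-stamped name."""
    when = when or datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / backup_name(label, when)
    # Write a gzip-compressed tar archive containing the whole directory.
    with tarfile.open(target, "w:gz") as archive:
        archive.add(source_dir, arcname=source_dir.name)
    return target
```

Because the stamp runs from year down to second, the archives sort chronologically by file name, which makes the scheduled human check a matter of reading a directory listing.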

GIS point data and object metadata:
-meet metadata standards – this shouldn’t be challenging since my project consists first of stealing other people’s data and borrowing standards developed by one of my target repositories for additional database items; but we see in the readings how critical standards are for storage and preservation as well as working functionality in processing tools. This includes having subject thesauri and authority files; for input some level of data input control is necessary (for this, though, I’ve envisioned nearly complete control through drop downs).
-Storage of data: considerations include allowing for partial privacy for users of proprietary or protected archaeological data and partial open access. A hard drive on which to back up all of the data is advisable, but my main concerns for storage and retrieval in an emergency are a) getting data to users quickly from anywhere (with varying degrees of access to protected data), b) giving back to the distributed web by reduplicating and resupplying the data to other types of users elsewhere. What I’m thinking about is a second server or backup server. Cloud storage also sounds interesting, and a pay-for-storage model on an otherwise free tool would provide a means of purchasing and maintaining it. From the link on the Preservation in a Box page that compares commercial platforms, SpiderOak sounds the best so far for cost, storage, and privacy, although they all have some serious downsides, namely that they are consumer products and lack support for certain operating systems and platforms.
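The drop-down style of input control mentioned in the metadata item above amounts to checking each field against an authority list. Here is a rough sketch in Python; the field names and vocabulary terms are invented placeholders, not drawn from any repository’s actual standard.

```python
# Hypothetical authority lists; a real project would borrow these
# from the target repository's published standard.
VOCABULARIES = {
    "object_type": frozenset({"amphora", "coin", "lamp"}),
    "period": frozenset({"Archaic", "Hellenistic", "Roman"}),
}


def validate_record(record: dict, vocabularies: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, allowed in vocabularies.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif value not in allowed:
            errors.append(f"{field}: '{value}' is not in the authority list")
    return errors
```

A record such as `{"object_type": "coin", "period": "Roman"}` passes, while a free-typed `"vase"` would be rejected, which is the same guarantee a drop-down gives at the interface.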

Week 10 Practicum

I commented on Fitzpatrick’s book as we’d already read it carefully! I have to say that I enjoyed the ability to take my scribbled marginalia and actually ask the author something… delightful! I did worry that my questions were inane (maybe I’m not the planned audience, I should be more versed in the issues, etc.), but throughout our weekly readings, I was very concerned that the issues of value and product and market were being danced around. Where do the humanities think they’re going in such a hurry? Again, it was great to ask the author about it, and maybe she’ll reply or folks will think about how to address those questions as they do their own future writing. It did seem community-geared and maybe even productive.

I draw my Guidelines for Evaluating Digital History Scholarship from reading William Thomas, the Sherman Dorn section of “Part 1: Re-Visioning Historical Writing,” Writing History in the Digital Age, and from thinking I did about paralleling digital history with public history while reading the Working Group on Evaluating Public History Scholarship Report and White Paper. They’re pretty rough, but here’s a selection of thoughts on the matter (the idea is to follow guidelines for assigning credit for tenure or otherwise):

1) Identify the type of project: does it parallel the argument form, archival work, or tool building? Judge its worth based on comparable principles; furthermore, ascertain whether or not creating a digital work is justified: does it have more utility, value, or dimension in the digital medium than it would in another? Or, in the case of an online exhibit, does it meet preservation concerns for material culture?

2) Identify any component projects. Did the researcher have to do substantial digital archival work, for example, before preparing a digital work? programming? etc.? How much more time and resources did this use? (this for assigning more credit; especially if the baseline work is reusable by other researchers)

3) For an online argument, I would ask, broadly, whether or not it conveyed its argument in a legible manner (this concern comes from our readings, where a new mode, like using hypertext, sometimes seemed to breach our conventions of reading a little too jarringly). For an archive or tool, I would ask whether it conveyed up front its order, purpose, reasoning, and options to alter or reuse in new ways.

I look forward to seeing the other students’ answers to this question.

Scholarly Communications and Digital Scholarship (Week 10, Reading Reflection): Musings on the Crisis in the Academy

Discussions in our readings about new modes of practise and publication surrounded the notion of the Academy in crisis. Kathleen Fitzpatrick (see especially, 14 of Introduction) and the Report from the 8th Scholarly Communications Institute both emphasise a need to become more relevant to groups ranging from the academy writ large to the public, i.e. more marketable on some level. We’re all familiar with the knee-jerk reaction to that sort of talk: the job of scholarship is not to pander to the public (I’m not really a snob, but have you seen what’s on television these days?); the point of good study is not necessarily “relevance”. I have to admit that, reading the articles, I experienced the same twist of the gut at the idea of scholarship and the academy becoming more capitalistic than they already have.

This is not to oversimplify the arguments, especially Fitzpatrick’s; her tongue-in-cheek concern that we might rationalise the fall of the academy as the “fate of genius in a world of mediocrity” (Fitzpatrick, 20), or blame “the ingrained anti-intellectualism in US culture” (14), distracts somewhat from her real discussion of the creation of value within the academy itself. A very interesting point in the SCI 8 Report reminds us why the academic hard sciences are so well funded: it is believed by our society that they lead directly and inevitably to advances in applied science, industry, medicine, and so forth. Which they do not, but this in turn points to something interesting going on in our culture. We like to believe that we hold the tangible to be most worthwhile, but that’s not really how we play (let’s face it: one way of looking at the trouble we’re in now, so the rhetoric goes, is that it comes from playing too much with imaginary numbers). I’m excited by this idea and by the question of what sleight of hand might keep humanities scholarship moving, or the smashing of what “Idols” might, to borrow the terms in which Dan Cohen’s equally interesting discussion of value was cast in last week’s readings.

Themes in historical practise can shift and have shifted with the times, I think. We’ve gone from an interest in history as the educational key to civic duty, patriotism, and nationalism towards a growing interest in new methods by which to recover the subaltern, the atextual (and theoretically non-white, non-male, ergo non-literary voices), and towards an interest in systems and globalism (intriguingly, the global past as well as present). By way of social value, history is or was part of the liberal education impressed on all educated and highly employable people. Changes in that mechanism have to do with dramatic changes in the system within which the academy is set and in which it has arguably become too entangled. I perceive the crisis in the academy to be a simple extension of a much larger crisis. I’m still not sure it is solvable with new modes of practise such as Fitzpatrick recommends, although what she does recommend lends itself to the question of weighted value (say between hard sciences and humanities) in discovering new ways for the community to propel itself into greater and more cohesive activity. And possibly, she shows us ways to save money and resources together through consortium and academy-wide resource archiving and sharing, and through collaboration on publishing and promoting/selecting.

I am still musing though on the question of history, whether it should be “relevant” (or the converse, can it really avoid it?), and how our society-in-flux, including those who categorise as members of the academy, will want to understand and use the past and future, organically or constructed.

Week 7: In other mapping news…

Here is my georectified map of Rome Under Trajan, Anne-sourced by Map Warper. It was quite fun to make. Both this layer and the base map are in a nice conventional Mercator projection so it didn’t contort much when rectified.

All I want to say is that I’m so pleased this tool is out there and free now. The pain known as rubber-sheeting maps (manually or digitally) even six-odd years ago has been lessened for map-people. I want to share with non-mapping people that in the past, working a historic map so that it could be overlaid on a base map and layers in a GIS was a nightmare beyond imagining! (OK, maybe I hyperbolise, but it was tedious and time-consuming.)

This was a great tool to get a handle on in the interest of class project and beyond.

Spatial History (Week 7): Practicum and Reading Reflection

Maps…

I have an old bias regarding maps. It dates back to somewhere between my own roots in scientific training and to further formation in undergraduate GIS and archaeology coursework. Maps are tools of analysis, not necessarily of presentation (unless you’re making maps for use finding roads or in atlases); maps are inherently ideological, and all the more so the more shaped to presentation they become. So, the Richard White article resounded most, and at first glance, only, with the mode I’m used to: use the tool to find out something you couldn’t find through other means, share findings, maybe take some nice screen shots to illustrate your findings, with rich text descriptions to support them, the pictures void of meaning except in the context of the described processes of discovery.

White discusses some of the problems historians and humanities folks in general might have with using a GIS, the analytical mapping tool, for inquiry: it just doesn’t handle fuzzy data that well. Even a seemingly clearcut archaeological puzzle like mapping clusters of artefacts is ultimately going to call for a tool that understands distance in terms not easily managed by programmes that deal in absolute space and were made to handle the land parcels of bureaucrats. We need it to understand the intangibles of Moretti’s Marxist-informed “forces” or the set of liberties and constraints in White’s “motion.” White’s solution was to make ugly maps, essentially: to make maps that defy use as presentation tools, illegible network diagrams, distorted space. I like this. This is not Martin Jessop’s interactive object, manipulable from the end of data or of visualisation; this is a two-headed serpent, where data and visualisation are not discrete, cannot be manipulated separately.

Yet our other readings focused on maps and other spatially driven visualisations as media for the communication of scholarship. The most popular humanities solution to Lewis Mumford’s charge against “asking the kinds of questions only computers can answer” (cited first in our readings in Tim Hitchcock’s “Place and the Politics of the Past”), and maybe also against ‘showing them in a way only a computer (or a specialist) can read,’ seems to culminate in the kind of object we’re calling the DDM (alas, this acronym turns my unresisting brain towards a popular roleplaying game…). I’m not sure I like this.

I experimented with Neatline Sandbox and tried to plot out some temples of Venus in Italy and Sicily (the spread and syncretism of the cults of Venus-Aphrodite-Astarte being one of my pet projects) on the timeline and with images. It was kind of fun, but I’m not sure I see the point. Even with the complete, sophisticated projects featured in Hitchcock and on the HyperCities site and feature article by Todd Presner, I couldn’t get much more out of it than the idea that it was a quaint and charming way of presenting information, as White put it, “discovered by other means” than mapping (and quite expensive to boot as he also said!).* Will someone talk me out of this impression? What am I missing?

*my judgement doesn’t extend to 3D modeling, which, although treated together in Presner’s discussion of good practise, to my eyes, has clearer value as an experimental device for humanities research.

Site Planning and Design, Agile Development (Reading Reflection, Week 4)

This was our nuts-and-bolts week for starting to think about developing projects, and I’m glad this reading came towards the beginning. Cohen and Rosenzweig’s chapters in Digital History quickly delineate the considerations one must bear in mind when planning, and do this in such a way that when closer inspection is required, the reader can return to this section, or later chapters can be consulted.

The aim is to give us enough base knowledge that we can obey the motto for this class meeting and begin planning, designing, and developing, agilely. The principle expressed by Cohen and Rosenzweig is “plan first,” and plan the project before the technology: the technology should be secondary and subservient to the goal. We do, however, need some basic knowledge of what’s possible and how laborious it is before we begin to plan. Do we need a server? Do we have or can we afford the tools to put multimedia online?

Our List Apart selections get into the nitty-gritty of site planning, evaluation, and reevaluation processes long before execution, and not necessarily after getting the hands dirty on the machinery. They confirm the necessity of planning first, and constructing the plan in such a way as to be able to change it easily, agilely, through processes of feedback, testing, and renegotiating.

Planning still hinges on the core question of goal. What do you want your project to do? How have others done it and where could tool choices be improved is the second question. It all comes down to design. I enjoy the slight play inherent in the word “design” when talking about the web: attractive interfaces or architecture? Even though I can map architecture with a simple matrix (hopefully…), and a different visualisation is required to look at interface, these aren’t really discrete categories, just different axes that describe the goal: the form of the argument.

Voyant – at First Glance

At first glance, I wasn’t sure what to do with Voyant, the digital text analysis tool. I primarily spend my time working with ancient texts, and there just isn’t that much analysis you can do in translation. Originals for ancient inflected languages require the level of parsing tools built into the wonderful Perseus Project. I instead dropped in some texts I ‘liked’ (GK Chesterton pulled off Gutenberg, etc.) but it wasn’t very productive to find out that the most common word in The Man Who Was Thursday after articles and proper nouns was “professor.”

Ultimately and somewhat arbitrarily, I decided to compare two translations of fragments from the “Epic of Gilgamesh” (also found in text form on Gutenberg). The Pennsylvania tablet and the Yale tablet are not the same part of the story, the former being an account of the meeting and wrestling of Gilgamesh and Enkidu and the latter, the fight against Humbaba, but the transcriptions and translations were conducted by the same person, so, in theory, there should be consistency of English vocabulary selection, grammatical construction, and versification.

The two texts were clearly not in the same style from a visual inspection of versification alone (no tool needed to tell us that!), but playing with word counting a bit yielded some interesting results that, if not novel to the close reader of a mere two texts, suggest some possibilities for the benefits and problems of applying the tools to a larger corpus. For example, I searched for the word “god,” which appears 7 times in both texts together. The graphical display of relative frequencies showed two curves for the appearance in sections of the Penn text (organised by my own division of the verses when I copied the text); uses in the Yale text are entirely at the end. The sample is so small (the fragments contain only 2,274 words between them, 599 of those unique, and a mere 15 words other than proper nouns recurring more than 7 times) that I’m not sure the tool is remotely useful at this scale, but I can see how such tools might guide our attention to neglected details.
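For readers curious about what sits behind counts like these, here is a rough Python sketch of word counting and of relative frequency across sections of a text. It is not Voyant’s own method: the tokenising rule and the equal-sized segments are my assumptions (my sections in Voyant followed my own verse divisions), and the function names are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_stats(text: str) -> tuple:
    """Return (total words, unique words, Counter of word frequencies)."""
    tokens = tokenize(text)
    return len(tokens), len(set(tokens)), Counter(tokens)


def relative_frequency_by_segment(text: str, word: str, segments: int = 5) -> list:
    """Share of tokens equal to `word` in each roughly equal-sized segment."""
    tokens = tokenize(text)
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    shares = []
    for start in range(0, len(tokens), size):
        chunk = tokens[start:start + size]
        shares.append(chunk.count(word) / len(chunk))
    return shares
```

Plotting the list returned by `relative_frequency_by_segment` against segment number gives the kind of curve described above; with a sample this small, each point rests on only a handful of words.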

The frequencies are made more interesting by the “keywords in context” tool. The Pennsylvania tablet uses the word “god” in the opening and climax of the fragment exclusively as a heroic descriptor: “how like a god.” The Yale tablet, with all appearances at the end, however addresses a god indirectly with respect to prayers or offerings: this is clearly a different poetic form whether it contains a different story or not.
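A “keywords in context” view is simple enough to sketch. The Python below is an illustration only, not Voyant’s implementation; the function name, the bracket formatting, and the default window of four words are assumptions.

```python
import re


def keywords_in_context(text: str, keyword: str, width: int = 4) -> list:
    """List each occurrence of keyword with `width` words on either side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Brackets mark the keyword so that it lines up visually.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines
```

Run over each tablet separately, a listing like this makes the difference in usage visible at a glance: descriptive phrases in one, addresses and offerings in the other.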

Interestingly, the Penn text’s frequent words produce many double-curve graphs, and the Yale text’s many single curves. One wonders if the Yale text’s relative incompleteness also affects the visual output (with single examples, though, any trends are illusions).

I can see this tool lending great power to text analysis of large corpora, but I do hope that such tools can be developed to handle ancient texts as well. I tried a little Latin just for fun (Cicero, Somnium Scipionis) but trying to manage the stop words list with an inflected language was too time-consuming to test. Wordle supposedly handles Latin words, but the “common Latin words” filter still left me with a goodly number of pronoun forms and pulled words like vita (=life) that, although “common,” were not what I had in mind to remove!