(Note to Class: some of the LoC materials are back up as of this morning)
This week’s practicum led me to think a little more carefully about what kinds of data are part of my proposed project for our class, and about what I will actually try to build in the next four weeks. In terms of central data, my project consists at first of ‘other people’s data’ only (geospatial data and metadata about objects), which is brought into my tool to be worked on. There are, however, other kinds of data to deal with: there is the application itself to store, the code that drives the site where my tool is accessed. In the larger project I dreamed up, there are also a forum and other discussion spaces, such as a news and FAQ wiki and areas for tool development and sharing. The ultimate form of the project includes data management, storage, and resharing of the data for processing through my GIS tool. Drawing on our chapters from the NINCH Guide and from Digital Preservation in a Box, I made a little list of concerns and possible solutions for my data.
-Use advisory boards (people) and systems (tools) to plan data workflows and backups (draw diagrams of the data, and use calendars to schedule regular processes such as backing up information)
-Regular backups of the living website need to be made: forum discussions, blog posts, tools developed and shared, etc. An automated procedure (checked on by a human at scheduled intervals) would be ideal, in which the data is migrated straight onto backup media according to rules for naming and date/time stamping files. Is it possible to have data archived to an off-site location automatically if both machines are online?
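The naming-and-timestamping rule above can be sketched in a few lines. This is a minimal illustration, not a real backup system: the directory names (`site_export`, `backups`) and the archive name pattern are assumptions, and an off-site copy (for example, rsync or scp to a second machine on a schedule) would follow the same naming rule.

```python
"""Sketch of one automated, timestamped backup step (hypothetical paths)."""
import shutil
from datetime import datetime
from pathlib import Path

def backup_site(source: str = "site_export", dest: str = "backups") -> Path:
    # Name each archive with a date/time stamp so backups sort
    # chronologically and never overwrite one another.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    Path(dest).mkdir(parents=True, exist_ok=True)
    base = Path(dest) / f"site-backup-{stamp}"
    # make_archive zips the whole export directory into <base>.zip
    return Path(shutil.make_archive(str(base), "zip", source))
```

A scheduler (cron, for instance) could run a script like this nightly, with the human check reduced to scanning the dated filenames.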
GIS point data and object metadata:
-Meet metadata standards – this shouldn’t be challenging, since my project begins with ‘stealing’ other people’s data and borrowing standards already developed by one of my target repositories for additional database items; but we see in the readings how critical standards are for storage and preservation, as well as for working functionality in processing tools. This includes having subject thesauri and authority files; some level of data input control is also necessary (here, though, I’ve envisioned nearly complete control through drop-downs).
-Storage of data: considerations include allowing partial privacy for users of proprietary or protected archaeological data alongside partial open access. A hard drive on which to back up all of the data is advisable, but my main concerns for storage and retrieval in an emergency are a) getting data to users quickly from anywhere (with varying degrees of access to protected data), and b) giving back to the distributed web by duplicating and resupplying the data to other kinds of users elsewhere. What I’m thinking of is a second server or backup server. Cloud storage also sounds interesting, and a pay-for-storage model on an otherwise free tool would provide a means of purchasing and maintaining one. From the link on the Preservation in a Box page that compares commercial platforms, SpiderOak sounds best so far for cost, storage, and privacy, although all of them have some serious downsides, namely that they are consumer products and lack support for certain OSes and platforms.
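The “varying degrees of access” idea can be made concrete as an ordered set of tiers, where a user sees a record only if their clearance meets the record’s level. The tier names and the rule itself are assumptions for illustration, not part of any repository’s actual policy:

```python
"""Sketch of tiered access to protected vs. open records (hypothetical tiers)."""

# Ordered from most open to most restricted.
ACCESS_LEVELS = {"public": 0, "registered": 1, "restricted": 2}

def can_view(user_level: str, record_level: str) -> bool:
    # A user may view any record at or below their own clearance,
    # so open data stays open while protected data stays gated.
    return ACCESS_LEVELS[user_level] >= ACCESS_LEVELS[record_level]
```

The same check would apply when resupplying data outward: only records at the "public" tier would be mirrored to other users elsewhere.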