Monday 30 January 2012

What is your legacy?

'...whereas scientific data tends to be large scale, homogenous, numeric, and generated (or collected/sampled) automatically, humanities data has a tendency to be fuzzy, small scale, heterogeneous, of varying quality, and transcribed by human researchers, making humanities data difficult (and different) to deal with computationally.'

Melissa Terras, 'Number Crunching Historians' http://melissaterras.blogspot.com/2012/01/number-crunching-historians.html

I knew the legacy data being assessed for SHARD was going to be interesting. I knew how unique and valuable these once-off accumulations would be, but this data is more fascinating than I expected. It is also deeply frustrating to be deprived of it, whether because of technical or intellectual problems. This is especially so when you realise that all it might have taken was a brief document detailing some codes, technical specifications and whatever else might be useful for the non-expert, which is probably everyone else in the world, as the primary investigator will usually be the authority on the data.

It is important to remember that I am looking at the data not with a view to its content (however distracting) but rather at how well it has been managed to enable access and sharing over time. How easy would it be to open up this data and use it again if I were not the data owner or creator? How much time and money would it cost to recreate these rich data experiences? I think we can easily forget how things used to be when it came to research, before databases and spreadsheets and PCs. I am old enough to remember. Many researchers are not. Electronic data is powerful: it allows us to use and reuse data in myriad ways to further research with a speed which would have been unimaginable 20 years ago.

With this 'power', so to speak, comes responsibility. To preserve this data we must look after it. We have a responsibility to mind it so that these rich and diverse accumulations of data are kept for future researchers. One day you may well embark on a research project and find some research data which complements or informs your topic. You find you are unable to open it due to software and hardware issues. Even if you can open it, you find that you can't comprehend the data, as perhaps the codes used have not been written down and there is no guide to the data.

Just bear this in mind, that some day that researcher might well be you.

Thursday 26 January 2012

Data matters

I am underway with the first bit of work for SHARD. This involves building up a knowledge base from various investigations, one of which involves looking at legacy data. Before any of this can happen, of course, we must get our hands on the data: a basic yet essential requirement. Most data, understandably given its sheer size, is kept locally and often doesn't make it to centralised storage or a repository. The findings or conclusions, by contrast, are usually well maintained and stored in a suitable repository, ensuring access to them over time.

A simple procedure such as moving a dataset internally from one drive to another may not always be as simple as it sounds.

This made me think up some rules of internal data sharing.

1. Context matters. Different people have differing needs; it is not a one-size-fits-all approach. Not all data is equal; some is more special or unique than the rest.

2. Procedures matter. There should be procedures and these should be agreed on. The oral tradition of remembering belongs in a folk archive, not a data archive.

3. Metadata matters. Information about data matters almost as much as the data itself, so ensure that it is shared along with the data. Otherwise the data can be meaningless and without context.

4. Trust: hard to win, easy to lose and very hard to regain. Personal connections often gain trust, so be nice to each other.

That's all for now. More very soon!

Nice to meet you

Introductions never go amiss, I think, so I thought it was about time we introduced ourselves. I'm Patricia (2nd on left in this photo of our Digital Archives and Repositories Team at ULCC). I'm an archivist who has worked in ULCC for 15 years now on many projects, one of which was the National Digital Archive of Datasets (NDAD). I have also spent time delivering and developing training on digital preservation with the Digital Preservation Training Programme (DPTP). Everyone has a dream, and it has been mine to join these two up, working with the Institute of Historical Research (IHR) and Jane to develop appropriate, accessible preservation training for researchers who create data in the course of their research, and so avoid the loss of this valuable, unique resource, which often gets lost over time after funding ends. SHARD is an opportunity to do this and, hopefully, to get people thinking about how access to this valuable resource can be maintained over time. The very experienced Ed Pinsent (first on left) will also be helping along the way.

My name’s Jane and I’m based at the Institute of Historical Research. I’m responsible both for our traditional publications activity and also for managing our digital projects. Recently, we’ve been putting a lot of time and effort into developing online research training materials for historians, building on our longstanding face-to-face courses. The SHARD project is a wonderful opportunity to raise awareness of data preservation among historians, and to present this specialist training in the more general framework of History SPOT, thereby helping to ‘demystify’ it. It’s great to have the opportunity to work with Patricia and her colleagues at ULCC, and to help embed their work within historical research practice.