'...whereas scientific data tends to be large scale, homogenous, numeric, and generated (or collected/sampled) automatically, humanities data has a tendency to be fuzzy, small scale, heterogeneous, of varying quality, and transcribed by human researchers, making humanities data difficult (and different) to deal with computationally.'
Melissa Terras, 'Number Crunching Historians' http://melissaterras.blogspot.com/2012/01/number-crunching-historians.html
I knew the legacy data being assessed for SHARD was going to be interesting. I knew how unique and valuable these once-off accumulations would be, but this data is more fascinating than I expected. It is also deeply frustrating to be deprived of it, whether because of technical or intellectual problems. This is especially so when you realise that all it might have taken was a brief document detailing some codes, technical specifications and whatever else might be useful for the non-expert, which is probably everyone else in the world, as the primary investigator will usually be the authority on the data.
It is important to remember that I am looking at the data not with a view to its content (however distracting) but rather at how well it has been managed to enable access and sharing over time. How easy would it be to open up this data and use it again if I were not the data owner or creator? How much time and money would it cost to recreate these rich data experiences? I think we can easily forget how things used to be when it came to research, before databases and spreadsheets and PCs. I am old enough to remember. Many researchers are not. Electronic data is powerful: it allows us to use and reuse data in myriad ways to further research with a speed which would have been unimaginable 20 years ago.
With this 'power', so to speak, comes responsibility. To preserve this data we must look after it. We have a responsibility to mind it so that these rich and diverse accumulations of data are kept for future researchers. One day you may well embark on a research project and find some research data which complements or informs your topic. You find you are unable to open it due to software and hardware issues, and even if you can open it you can't comprehend the data, perhaps because the codes used have not been written down and there is no guide to the data.
Just bear in mind that some day that researcher might well be you.