SHARD

Monday 13 August 2012

Our online training course is now live

The online training course on 'Digital preservation for historians' is now live at http://historyspot.org.uk/. You will need to register to access the modules, which include short tests and exercises, but they are completely free to use. We'd love to hear what you think, so do let us know in the comments below.

Wednesday 18 July 2012

Here is the full leaflet introducing the benefits of research data preservation and the skills necessary to implement it. We’re really pleased with the result! Thanks Malcolm Raggett for the great design work!

The leaflet has been jointly developed by the DICE, SHARD and PrePARe projects, and designed and printed at LSE. It is intended for printing A4 double-sided and folding into 3.

How do I preserve my research data? FAQs

The good people of LSE hosted a meeting in March where we (DICE, PrePARe and SHARD) decided to devise a list of FAQs on the preservation of research data. We drafted them together on a wiki hosted by the University of Cambridge. They will eventually be hosted on the IHR website, but for now here they are.

What material and data should I preserve?

To enable the use and reuse of research data over time by others it is important to ensure that you provide documentation which describes the research data as well as the context of its creation as part of the research project. Technical information about the research data should also be kept to enable its reuse. If the data is encoded then code details must be kept. So in addition to the core research material you should provide a clear introduction to the entirety of the research data to enable future understanding and use.

Documentation such as emails and other material accompanying the core research data may seem irrelevant, but it all provides important context for the research project and can be appraised for relevance. Cambridge University uses terms such as embedded, supported and catalogue data to describe data which should accompany the research data itself.
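As a rough illustration of what "documentation which describes the research data" could look like in practice, here is a minimal Python sketch that writes a plain description file alongside a folder of research material. The field names, the file name DESCRIPTION.json and the function names are our own assumptions for the example, not a standard required by any repository.

```python
import json
from pathlib import Path


def describe_dataset(folder: Path, title: str, creator: str,
                     context: str, codes: dict[str, str]) -> dict:
    """Build a simple description of the research material in a folder.

    All field names here are illustrative assumptions, not a formal schema.
    """
    files = [
        {
            "name": path.relative_to(folder).as_posix(),
            # Technical information: record the format of each file.
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "size_bytes": path.stat().st_size,
        }
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    ]
    return {
        "title": title,
        "creator": creator,
        "context": context,        # why and how the material was created
        "code_details": codes,     # meaning of any codes used in the data
        "files": files,
    }


def write_description(folder: Path, record: dict,
                      name: str = "DESCRIPTION.json") -> Path:
    """Save the description next to the data so the two travel together."""
    target = folder / name
    target.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                      encoding="utf-8")
    return target
```

Calling describe_dataset on your project folder and then write_description leaves a readable file that a future user (or a repository) can open without any special software. A plain text README would serve the same purpose; the point is simply that the explanation is stored with the data.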

Will I lose control over the material if I preserve it?

A significant number of research funders require that data produced in the course of the research they fund should be made available for other researchers to discover, examine and build upon, so that new knowledge can be discovered through use, reuse, comparison of data and so on. However, you are responsible for deciding what data is legally obliged to be open or closed according to various pieces of legislation such as FOI and data protection. This should be stated at time of deposit.

Why shouldn't I just keep my data/material on my hard drive?

Keeping all your research data in one place is not a good idea in general. It is essential not to keep your research data only on your hard drive, as hard drives inevitably fail and you will lose your data. You should always back up your data to at least two more devices or systems (ideally including a repository) external to your hard drive.

I have all my data on an external hard drive - do I need to do anything else?

Ensure that your data is well documented and held on at least two external devices/systems, ideally including an institutional digital repository.
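For anyone comfortable with a little scripting, the "at least two external devices/systems" advice can be sketched roughly as follows. This is only a minimal Python illustration, assuming the backup locations are ordinary folders (for example two mounted external drives); the function names are hypothetical, and copying to an institutional repository would normally go through that repository's own deposit process instead.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy every file under source to each destination and verify each copy.

    Returns, for each destination, whether all copies matched the originals.
    Destinations are assumed to lie outside the source folder.
    """
    results = {}
    for destination in destinations:
        all_match = True
        for original in source.rglob("*"):
            if not original.is_file():
                continue
            copy = destination / original.relative_to(source)
            copy.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, copy)  # copy2 also keeps timestamps
            # Compare checksums so a silently corrupted copy is noticed.
            if sha256_of(copy) != sha256_of(original):
                all_match = False
        results[str(destination)] = all_match
    return results
```

You would call back_up with your research folder and a list of two or more backup folders. The checksum comparison matters more than the copy itself: a backup you have never verified is only a hope.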

Why should I preserve research material?

Researchers from all disciplines accumulate material in the course of their research. Considerable time, effort and money are spent in this endeavour. The preservation of research data is essential in order to further research through sharing of the data, to enable validation of results, and to demonstrate the process behind the conclusions and results of research.

What is a digital repository?

A digital repository is a system which provides a convenient infrastructure through which to store, manage, re-use and preserve digital materials. Repositories are used by a variety of communities, may carry out many different functions, and can take many forms, but essentially they are a secure way to keep data safe and accessible.

What archives/repositories are there for preserving my data?

There is no single UK repository for research data. Instead many are being developed within universities. The OpenDOAR initiative provides a comprehensive list of open repositories worldwide and in the UK. Here are some UK-wide repositories for specific types of data:

The Archaeology Data Service supports research, learning and teaching with freely available, high quality and dependable digital resources. It does this by preserving digital data in the long term, and by promoting and disseminating a broad range of data in archaeology. The ADS promotes good practice in the use of digital data in archaeology, provides technical advice to the research community, and supports the deployment of digital technologies.
The University of Oxford Text Archive develops, collects, catalogues and preserves electronic literary and linguistic resources for use in Higher Education, in research, teaching and learning. It also gives advice on the creation and use of these resources, and is involved in the development of standards and infrastructure for electronic language resources.
The History Data Service (HDS) collects, preserves, and promotes the use of digital resources, which result from or support historical research, learning and teaching. The History Data Service is a successor service to AHDS History which from 1996 to March 2008 was one of the five centres of the Arts and Humanities Data Service.

Can I use my institutional repository for data preservation?

Yes, you should be able to do this, if your institution has an institutional repository which collects research material. You should enquire of your institution if this is the case.

Can/should I deposit in more than one repository/archive?

No, it should be more than adequate to deposit in one repository but it depends on the service offered by the specific repository, e.g. does it guarantee that it will maintain access to the data over time?

Note: This page was developed by LSE/Cambridge/University of London and is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Thursday 5 July 2012

SHARD: Sending your research material into the future

It has been a very busy time for SHARD. We have been preparing content for the online course on the preservation of research data which will be going onto IHR's excellent History Spot website. Our course is aimed at researchers and we hope we demonstrate that we have listened well and come up with appropriate material. We did a lot of research ourselves through interviews, hearing about current practice, reviewing legacy data and also studying the existing courses available. Not only that, but Malcolm Raggett at LSE has come up with an eye-catching leaflet about the preservation of research data. All the related projects funded by JISC (LSE, the Universities of Cambridge and Bristol, and us here in the University of London) contributed to the content and we think it demonstrates some simple, effective ways to keep your research data safe and sound. It has four bits of advice about keeping your research data over time: start thinking/planning early; explain it well; store it safely; and share it. The three JISC projects have also drafted between us a set of frequently asked questions (FAQs) about the preservation of research data. These will go live sometime soon. That's it for now.

Tuesday 27 March 2012

Research data preservation projects meeting.

"It's a question of discipline," the little prince told me later on. "When you've finished washing and dressing each morning, you must tend your planet." ~Antoine de Saint-Exupéry, The Little Prince, 1943.

We had a JISC Research data preservation projects meeting on Thursday 22 March. We had attendees from Cambridge, LSE, Bristol, and of course ULCC. Each project updated on its findings and progress including briefing on past and current training provision. Each project summarised the findings of their user survey. Although we had approached this in different ways (structured interviews, workshop, on-line questionnaire) our findings were remarkably similar. Even Bristol/DataSafe, which is concentrating on support staff and records data preservation, found resonances.

An important point which we see emerging again and again (and Cambridge, LSE and ULCC had all found this in our research) is that the phrase “research data” was not recognised by most researchers, especially in the arts and humanities. They simply could not relate to the term. “Data” implies science and structured data in large amounts, whereas we are using it to mean all and any information in any form used for research purposes. LSE has already adopted “research material and data” as a catch-all phrase, which is a more accessible term. ULCC have not used the term metadata in relation to training as we consider it an alienating term.

We looked at comparing the identified needs of the trainee community. There was much discussion about the attitudinal aspects of managing research data. People often despair of the lack of appreciation of the value of research data and why it should be kept. I personally think that we, the information managers, must share responsibility for this. Terms such as 'data' and 'metadata', for example, are meaningless and alienating for most people not involved in the management of information. In a way we need to address our attitude to working with the research community. We have to develop ways of tailoring our approach/language to the non-information-management community. In a sense what I feel we need is to tend to ourselves first in order to make sure that we communicate effectively with the outside world.

Otherwise I noted that people are simply not getting advice or assistance from the institutions that are fostering their research, hence the value of our projects if we pitch our advice correctly. Very often there are no guidelines available on the management of research data, so researchers are very much left to their own devices. This can be demonstrated by their storage solutions, as almost everyone we interviewed uses the cloud in one way or another. The exceptions were the few who knew the risks of the cloud (or who actually read the terms and conditions).

Issues in preservation skills included: choosing and using appropriate file formats; incorporating data preservation into their project; working with repository criteria for research data deposit.

We agreed that no one method of delivery or approach would suit all our target audiences, but having material that could be re-purposed for several modes (e.g. group training and on-line learning) would be the best tactic. Furthermore, all projects are constrained in what they will produce by their project scopes and institution-specific requirements. We did, though, identify several areas where collaboration would be mutually beneficial, so we agreed the following joint action:
Cambridge will set up a wiki to enable us to develop firstly a structure and set of questions for a FAQ, then secondly to develop where possible generic answers to these questions, accepting that some will need to be tailored for each institution;

LSE will develop and design a top-level brochure about research data preservation containing the core points and links to further information. This will be adapted from the similar-but-independent 4-point structures proposed by ULCC and Cambridge, namely: Explain it – Store it Safely – Share it – Start Early. And as the little prince said, it is all a question of discipline, but communicate the 'why bother' effectively and it will be a less bitter pill to swallow, with remarkably beneficial results.

So, a good get together. More later.

Tuesday 20 March 2012

Thomas Hobbes and the preservation of research data.

Thomas Hobbes had a bleak view of humanity to put it mildly.
He considered that in the state of nature, competing desires amongst essentially equal human beings for limited supplies generate conflict and, in Hobbes' most famous phrase, the life of man is 'solitary, poor, nasty, brutish and short'.

Let's say we apply this to data management and why not? I am not a philosopher but if his idea is true then we would think that people are not interested in sharing resources or thinking beyond their immediate desires and needs. This research data is mine, hands off!

I am pessimistic at the best of times but after running our training on the preservation of research data entitled 'What's in it for me?', I felt less so by the end of the day. It seemed that people do want to share their research data after publication, as they want to enhance existing research and contribute to the body of work which is essential to the understanding of the thought processes involved in research output. And yes there is the unaltruistic side to us all, a bit of appealing to the immediate desires and needs as ultimately sharing your research data will enhance your standing in the community of expertise if it is well and often cited.

The premise of our training on March 14th was to lure folk in to speak about their experiences of preserving research data in the course of their research while we learnt a whole lot from them and what they need so we can best plan and design an online course on this for the great History Spot site at IHR. Our cohort of people attending our training day came from a variety of research backgrounds and made it a rich day for information gathering about their needs and 'desires'.

'I lost my data in a USB key which fell into a cup of coffee' - Anonymous.

So why did people come to our workshop? People spoke about various drivers which brought them to us. Experiencing the loss of data seems to sharpen the mind somewhat when it comes to preservation of data. People also spoke about being 'swamped with data and the information overload', wanting to take care of the material they had gathered over the years and worrying that they might lose it. Language struck a chord with many around the table. A lot of people don't use the word 'data' to describe their research material. The term 'data' is regarded as scientific and as a result people in the Humanities often feel alienated. We also reflected on the project so far, and on the knowledge base which we are gathering from legacy data assessment and interviews.

The good, the bad and the ugly of research data preservation

We thought that it would be good to show them examples of what I had found in the assessments and what I had heard in the interviews. Feedback was without exception good for the whole day and people seemed to take to this particular session! It demonstrated a variety of practical examples of documentation for research data, ranging from well documented to inadequate to nothing at all. Lack of documentation about research data is a severe inhibitor to allowing access to it in the future. If the researcher does not write down information, both descriptive and technical, about the data we will lose the capability to access it both intellectually and literally. Lack of safe storage was another point: people often didn't back up, and relied heavily on the cloud for storage without really knowing what they were agreeing to when they accepted the terms and conditions of cloud services. However, some were well advanced, with good storage solutions and backups, good formats for preservation, and consideration of how to future-proof their material.

Intellectual Property Rights (IPR) rears its inevitable head and, as Kit Good has rightly pointed out, Data Protection and Freedom of Information can affect research data. Some people had data on living individuals and this would have implications in relation to data protection. Many people interviewed simply did not remember what permissions they had regarding use of the primary material they had copied or recorded. They had signed a piece of paper in the library or archive but didn't remember what it said. As a result they would not be able to share this data in the future, as copyright and usage were not clear.

Four good ideas

We gave an overview of four things which they could all do to enhance the preservation of their research data. Here are the main ideas, for each of which we gave practical solutions.

1. Write everything down.
2. Store your data safely.
3. Interventions are needed, the earlier the better!
4. Consider sharing, the why and how.

Golden Opportunities

A vital part of the workshop had to be participation. We really needed to find out what these delegates thought about the preservation of research data. We gave them six opportunities. These opportunities allowed everyone time to work alone or in pairs to think about various aspects of digital preservation. This was done using the innovative pen and paper method. Everyone had a chance this way to express their opinion as we made them do so! We then wrote up all this feedback and presented it all back to them for review at the end of the afternoon.

These questions included:

1. Why bother keeping research data?
2. What are the risks of not keeping your research data?
3. Give us your examples of good and bad practice
4. What are your storage needs?
5. If you could have a single magic tool to do this, what would it be?
6. Are you comfortable with sharing your data at any time? If yes, why and if no, why?

We got tremendous answers which will guide us while developing our online course.

What feedback did we get?

The feedback was either 'good' or 'excellent'. What pleases us more, though, are the comments:

'More information on non microsoft software, I use Mac OS and open source which is often space hungry to me.' Patricia Croot

'Very well communicated and outlined, whole presentation extremely helpful and well organised. Look forward to more detailed training'...

'Selection of material for preservation will be necessary, I think, as every researcher generates so much data that preserving it all will be a full time job in itself' Pernille Richards.

Thanks to everyone for such a good afternoon, and now to work honing our Moodle skills!

Thursday 8 March 2012

Personal data, public information - Research data and information law

The SHARD project is looking at the preservation of research data for the traditional requirements of peer review, re-use and retention of digital assets. I have been asked to briefly cover – and it’s the ‘briefly’ that’s the challenge – for the project blog the subject of ‘access to information’ legislation and how it relates to the management of research data. Preserving your research data also has a legal context that is worthy of serious consideration.

What do I mean by ‘access to information’? Let’s get the acronyms established early on for three pieces of legislation: the first is the Data Protection Act 1998 (DPA), which is concerned with the ‘personal data’ of living identifiable individuals. The other two – the Freedom of Information Act 2000 (FOIA) and the Environmental Information Regulations 2004 (EIR) – are concerned with ‘public’ information held by ‘public authorities’. Research data can be covered by all three.

Many researchers are looking at these issues already. Data management plans are a routine requirement for many research funding bodies. If there is no data management plan available for your research, use one of the available templates provided by your institution or by organisations such as JISC and the Digital Curation Centre.

Personal data

Research data that contains reference to living individuals – interview scripts, contact details, even statistical information relating to small numbers of individuals etc. – should be managed according to the eight principles of the DPA. I won’t go into too much detail about this here, as there is so much guidance already available, suffice to say that the following should be considered:

Do the individuals identified in your research data know how and for what purpose their data is being held? Have they given their consent?

Is there provision to store the personal data safely and securely?

How long are you planning to hold the personal data for? If the answer is ‘forever’, can you anonymise it and still retain its value?

If you are unsure, do ask your institution’s Data Protection Officer or similar information compliance contact. They will be keen to help. The Information Commissioner’s Office (ICO), the UK’s information and privacy regulator, now has enforcement powers to fine organisations up to £500,000 for the loss or unauthorised access of personal data. The rigour of a data management plan is therefore vital not just in protecting your research project, but your institution as a whole.

Freedom of Information

Since 2005, all organisations defined as ‘public authorities’ in England, Wales and Northern Ireland have been subject to the Freedom of Information Act 2000 (FOIA). In Scotland they follow the similar (with at least one important difference for research data, as we will see) Freedom of Information (Scotland) Act 2002 (FOISA). The crux of the Act is that the public has a right of access to information ‘held’ by public authorities. If asked for information, the authority has to confirm that it is held and provide it, unless a legal exemption applies. The Environmental Information Regulations 2004 provide a right of access to ‘environmental information’ under similar timescales and with some slight differences in detail to FOIA but, for the purposes of this blog, my statements should generally cover both.

Universities are defined as public authorities by the Act and therefore obliged to respond to FOIA requests. This is not always as simple as it sounds, in that unlike many other public authorities, Universities operate in a competitive, increasingly international environment with an ever-decreasing proportion of public funding. More nuanced still is the relationship of the individual academic with ‘their’ research data, produced in everything from solitary sabbatical study to global partnerships of research institutions. At the same time, there is a significant ‘open access’ movement in academia which is arguing for the pro-active publication of research data through online journals and repositories.

FOIA and EIR requests have been made for research data and in some cases have required the Information Commissioner’s Office (ICO) to issue a ‘Decision Notice’ in order to ensure disclosure. Queen’s University Belfast were ordered by the ICO to release over 40 years of research data on tree rings, used for climate research (see the news item) under the EIR legislation.

There are, however, several exemptions in the FOI Act that can apply to research data requests: Section 22 ‘Intended for future publication’ allows a University to exempt information that will later be published. Section 43 ‘Commercial Interests’ exempts the disclosure of information which could prejudice the commercial interests of the University or another party, such as a partner institution or research funding body. If your research data contains personal data, then parts of it are likely to be exempt under Section 40 ‘Personal Information’. FOISA includes a specific research data exemption - Section 27(2) – but even so this derives from the general principle of ‘intended for future publication’ and is unlikely to prevent disclosure of research data held in the manner of the ‘tree ring’ dataset.

It is definitely worth reading the ICO’s guidance for the Higher Education sector around FOIA.

Once again, if you are unsure, do ask your institution’s Freedom of Information Officer or similar information compliance contact. Try and envisage in your data management plan how you would deal with a request for your research data. It may be that public disclosure of research data is a desired outcome of the project; it may require some serious consideration and discussion amongst the research team.

Conclusion

Access to information legislation in the UK can apply to research data. This can have important implications for a research project and therefore acts as another driver for ensuring that your data is managed and preserved effectively. Ensure that you create a data management plan when starting on a new project and discuss any issues with the FOI/DPA compliance officers at your institution.
